Although I have posted this to the devel list, as Steve pointed out, putting patches to sandbox is not the best way to do things. So I open this thread. I have measured the time for build a node query in this is about 0.22-0.24ms (on an Athlon 933 MHz machine).

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Dries’s picture

I'm not likely to commit this. It's not conform with Drupal's coding conventions, but more importantly, a node query builder doesn't solve any real problems. It just adds a different way of doing things without offering a significant advantage.

chx’s picture

FileSize
5.67 KB

At this moment there are direct queries into the node table everywhere. After node_access_*_sql calls were invented, a lot of queries needed to be changed, and I think we will find more to be inserted. If someone does sg. else with the queries, permission, language etc. he needs to patch 60+ quieries in core alone. This is not a good thing. That's why I am proposing a central node query builder.

As for code standards, code-style.pl node.module returns 17 errors, none of them comes from my patch. However, I should admit, there were a few spaces missing from the database.inc patch. So I resubmit.

chx’s picture

FileSize
5.67 KB

OK, I was wrong, there is a space missing in one of the rewritten queries. But I doubt this matters too much, as this is only a proposal and I think there will be a lot of revisions before it gets to the core.

drumm’s picture

The blank lines between the comments and function starts will confuse the documentation parser on drupaldocs.org.

I'm not sure if I support this quite yet. Reducing the number of queries and whatnot is great, but this patch doesn't do that quite yet. I'd only want this if the new system is more readable and easy to write. I'm not sure if this is true.

As for code-style.pl, I don't think this is actually used much and it doesn't catch everything. Such as, most operators (=, =>, +, etc) like a space on either side of themselves. I'd support removing code-style.pl and throwing it in contributions somewhere. (If this needs further discussion or action, please file another issue, lets keep this focused on this patch).

I'm going to wait for a few more revisions of this before I weigh in a +1 or -1.

Jose Reyero’s picture

I can really think of a number of advantages of this, and though not really for committing it yet, think is very good to see some discussion on the issue and some concrete implementations.

When implementing i18n I run into this problem, of having to patch lots of queries -actually too many- in about all the modules doing any kind of node listing. This far, I'm stuck at this point. Then I realized it was just the same for node permissions, and will be the same again for whatever new functionality you want to add in the future which affects node listings.

I think actually that direct access to 'node' table should be avoided when possible out of the node.module. And the only way to achieve this is to have some kind of query builder. And also think that chx's approach is very good. it provides some upper layer at the 'node' level, which can use some object level semantics while relying on a db layer query builder, which could be used later for other objects -users, taxonomy terms...-

Just think of all the work that this could have saved when implementing permissions system, which I think is ugly and dirty anyway because of all that node_join, node_where...

In the long run, adding such complex logic as sql code, patching and re-patching queries is bad, very bad, making any new change one higher level of complexity and driving us away from a OO model, which I think we should tend to. It's not using PHP objects. It is thinking of nodes as 'objects' and coding according to it.

Oh, yes, sorry, this is growing too long to write it here :-). But we could talk also about db portability...

moshe weitzman’s picture

I should add that the Organic Groups module could benefit from this node listing SQL builder. I have awkwardly worked around the problem for now but I don't like it.

I don't know if this patch is good or not, but I do lean toward the functionality that it offers. Remember that any developer is free not to use the query builder and issue direct SQL as needed. If the query builder is making life hard, simply don't use, just like today.

Bèr Kessels’s picture

I +1 for this functionality (i have no time to comment on coding style etc).
It will not only improve maintainability, but will allow a far easier implementation of the i18n.
i18n currently has a lot of patches, simpley to add logic to all those node sql queries referred to.

chx’s picture

Title: Node query builder proposal » Centralize node queries somehow
FileSize
1.63 KB

OK, maybe the former approach was too complex. How about this? This requires a lot less change to the queries and no change to database.inc and ten times faster.

As usual, a sample query is rewritten and an example of the proposed hook is provided.

chx’s picture

Final thoughts for this day. If you like the last version, there is a possibility to automate the whole thing by adding the following three lines to the beginning of db_prefix_tables:

  if (preg_match('/^SELECT.+\{node\}/',$sql)) {
    $sql = node_query($sql);
  }

Of course, there is very small chance of a loop here, so this would require more thought, but I think at the end this would be a good thing.

Dries’s picture

I like the second approach better as it keep things readable and gives me a bit more control over the order in which things are written. Looking for more feedback from the others. Did you try updating more queries to make sure it works as intended in all cases?

chx’s picture

FileSize
1.23 KB

Yes, I tried with many different queries. And I have found one bug -- forgot the underscore in the regexp following FORM so I have rewritten the whole regexp. It is now a lot simpler :)

And node_query is now reentrant, so if a function in the process of hook_sql calls node_query, it won't fall into an infinite loop. This situation did not occur but it's better to forestall such things.

moshe weitzman’s picture

This patch is a step forward. I thought of another feature which becomes easier with the proposed hook_sql(). think of filtering like freshmeat.net. At freshmeat, you can say that you only want to see projects where OS=Windows and License=GPL (for example). A Drupal equivalent is 'only show me nodes related to Democrats and Poverty'. This sort if global node filtering is very hard in Drupal today. With this hook, it becomes easy.

Dries’s picture

It's starting to look good (and you're getting more support)!

chx’s picture

Here is a sum of changes in this new revision:

  • Following Dries' advice, I read those threads and now DISTINCT(nid) is used only if there is something joined. Whether the access_grants system is used or not, is determined at the module configuration screen, so for efficiency reasons, I have patched system.module.
  • I the very first version of my patch (you know, with arrays) I passed on the whole query so hook_sql implementations could easily determine whether they are interested in this query or not. In a very cool brainstorming with Jose, we found out, that passing the whole query is not a good thing, hints are enough. So when I will patch the forum queries, I will define a NODE_SQL_FORUM hint, and a filter which does something cool with forums (I have heard something about private forums not implemented -- I maybe wrong here) may do so.
  • We now have a separate helper function to call hook_sql which is cool, 'cos with it the last node_access_*_sql calls which were lurking in node_search could be elinimated.
chx’s picture

FileSize
6.46 KB

Funny thing. Forgot to attach the patch.

chx’s picture

Another day, another revision. As this patch now has nothing to do with queries, but it is about rewriting node SQL statements, I have renamed it node_rewrite_sql, the helper _node_rewrite_sql and there is hook_node_rewrite_sql. hook_node_rewrite_sql expected return value has changed: it can return an array('join'=> ..., 'where'=>..., 'distinct'=> ) so it may determine whether DISTINCT is needed or not. Neither index is mandatory. That hook_node_rewrite_sql impementations will decide whether they are asking for DISTINCT or not, is -- I think -- a good opportunity to optimize for speed. For eg. I doubt every node access mechanism requires DISTINCT.

One last note: hook return value can be also can be a string with a WHERE statement for the sake of simplicity.

node_rewrite_sql now works with multiline queries.

chx’s picture

FileSize
7.43 KB

I can't believe I once again did not attach the patch

chx’s picture

FileSize
6.65 KB

Moshe, thanks for your patience and wisdom.

  • Extract is thrown out, list is used.
  • node_node_rewrite_sql looks better now.
  • access_grants_present is determined in run time, once per page.
Dries’s picture

The hint-stuff looks a bit hairy to me. It would be nice if this could be avoided but I see what it is necessary.

The semaphore stuff is vague too. I don't see how you'd be able to get into an endless loop.

It might actually be useful to turn node_access_grants_present() into a more generic module.inc function. Maybe something along the lines of module_implements($hook), which would return a list of modules implementing the specified $hook.

I suggest you update more core modules. If you do, I'll measure the performance gain against a copy of drupal.org's database.

chx’s picture

FileSize
15.72 KB

The hint stuff is something is pretty useful: hook implementations can fine tune themselves when to fire and when not. Good for features and performance, too.

module_implements is done.

Semaphore -- we will have tons of node_rewrite_sql calls in the core. Now someone writes a hook_node_rewrite_sql implementation which calls a function which in turn calls another one which calls node_rewrite_sql . I would rather not be the one who debugs such a scenario.

Thanks for the offer! forum.module is now patched. Others will follow tomorrow.

Jose Reyero’s picture

I think this looks good enough, so I'm starting to test and use this for i18n. If this get into the node module, I don't think I will need any node.module patch anymore...

A pair of performance hints:
- As this node_rewrite_sql is going to be called several times for each page, we can cache the list of modules implementing the hook in a static variable.
- We can even cache the whole result for each 'hint' value, so the whole hook will only be called once or twice for each page.

Some suggested 'hint' values to be defined:

NODE_REWRITE_SQL_LIST -> This one for simple generic list of nodes for the main page, could be default.

NODE_REWRITE_SQL_ADMIN_LIST -> List of nodes for administration pages

NODE_REWRITE_SQL_RELATED_LIST -> List of objects related to nodes, but no nodes themselves -like
comments-, node table may have to be joined...

NODE_REWRITE_SQL_BLOCK_LIST -> Listing for a block

NODE_REWRITE_SQL_NO_LANGUAGE -> No need to add language conditions. This could hapen when you need a listing for other languages, or mixed languages, for translators or for administrators..

I would need something like this last one for i18n, just to have the permissions working and some flexibility with language conditions.

And also the one you already have in place ...._SEARCH.

All this will allow for options like search, admin, list current/all languages...

Jose Reyero’s picture

I found some problems when using the patch. The where conditions are not merged well if there are more than once, and also conditions need some parenthesis around.

So I replaced the $where and $join strings in _node_rewrite_sql by arrays which are imploded at the end.

Also added some 'hint' definitions. Patches for i18n module are coming next.

This works like charm for node listings, comment listings, searches, etc...

Btw, I also think 'semaphores' are not really needed.

Dries’s picture

I don't see how the hint system can be portable: are contributed modules supposed to define their own hits?

chx’s picture

The hints -- I was in a hurry yesterday evening, so I have forgot to define a NODE_REWRITE_SQL_FORUM in the forum.module patch and also forgot to use the hints. So I think core node types will have their hint defined.

I have a vague idea about how contrib modules could define their own hint type: we shall have a NODE_REWRITE_SQL_HINT_MAX, and every hint define should like this -- beware this is not working! This is just an idea.

<?
define(NODE_REWRITE_SQL_COMMENT, NODE_REWRITE_SQL_HINT_MAX);
define(NODE_REWRITE_SQL_HINT_MAX, NODE_REWRITE_SQL_HINT_MAX+NODE_REWRITE_SQL_HINT_MAX);
?>

Well, if defines does not work, maybe a global node_rewrite_sql_hints array should be introduced...?

chx’s picture

FileSize
16.69 KB

Here is a list of changes:

  • Complete rethought of hint system. Uses array of strings instead of defines. There are a number of examples.
  • the new comment.module patch shows a good usage example for the third parameter of node_rewrite_sql which is at the moment called node_alias. It's really not a good name, it should be "the alias of the table which holds the nid field for this query". BTW. there should be an index on comment.nid.
  • Incorporated Jose's "replaced the $where and $join strings in _node_rewrite_sql by arrays which are imploded at the end". I was a fool to lose these, the first version (which used array) used a proper implode for WHEREs.
Dries’s picture

I have two fundamental problems with it this patch:

  1. The loop detection mechanism has to go.
  2. The hints are difficult to understand and I'm not convinced they are good/needed. Does the patch contain a good example of a node_rewrite_sql function/hook that uses these hints? I can't find one. I won't like these hints until I'm convinced they are needed and that they represent the best possible solution. Convince me.

    Could you extend the documentation about the hint-system? Currently it reads "An array of hint strings about the query, passed on to hook_sql handlers.". Clearly, that doesn't help me at all. The documentation should answer the following questions: (1) what are the hints for, (2) when should I use them (or not use them) and (3) what does hints look like (what is their format/syntax)?

Note that I can't compare performance unless more queries have been updated.

Thanks chx.

Dries’s picture

Why does the hints in the comment module include 'related_list'. This is confusing.

Jose Reyero’s picture

Well, it seems we have different views about the 'hints'.

At this point, the node_rewrite has two main uses: permissions and language conditions (i18n). I don't know whether they could have any use for permission system, but for i18n, some extra information about the rewritten queries will be needed:

- Most of the queries will just select fields in the node table. Here, only a where condition is needed, but language condition may be added or not depending on the kind of query:

_LIST would be a list for the main page, so only in this case, decissions can be made depending on the path. For some pages, you may want a list of nodes for all languages and for some others, only current language.

_ADMIN_LIST is a list of nodes for administration. An option will be added in i18n for 'administering all languages together' or 'administer each language separately', or even a language field could be added in the search form. So language conditions will be merged diferently

_BLOCK_LIST is a list of nodes for blocks. I'm thinking here of a global option like 'show mixed languages in blocks'. Anyway, this is a list for a block and you cannot have any logic depending on path here.

_SEARCH, would be only for searching, and in this case you may decide to search 'all languages' or only the current one. This can be a global option or some extra field in search form. When doing text searches, searching all languages together will make a lot of sense, but also restricting search by language.

About _RELATED_LIST, this is the case of comments, is a list of objects which are not nodes themselves, but related objects in a different table. Here, the node table itself may have to be joined as it is not yet in the query.

When dealing with the translation system, the language conditions will be added or not by the module requesting the query itself, so no further language conditions will be needed. Queries will have to be run anyway through 'node_rewrite_sql' to have permissions in place.
That's what the _NO_LANGUAGE hint is about.

So this is my proposed use for the 'hint' thing, though maybe the hints themselves could be better defined or elaborated, but this is the main idea. I mean no specific hints for modules but generic ones for 'kind of listing' you are requesting.

Maybe for the permission system, it could be also some use, as you could decide you want to show a list of objects to anyone, but deny access to the objects themselves, something like 'subscribe or register if you want to read the article'. So permissions here will apply differently depending on what you are doing.

We can think also about more creative use of permissions like 'you can read latest articles' but old ones will be available only for selected people, or the opposite 'only old content available for everybody'. Or even 'you can read articles but not comments'. This kind of permissions would need also some 'hints'.

chx’s picture

FileSize
26.89 KB

More queries rewritten. Loop detection gone. Will gather my thoughts on hint system later, but I do hope this amount of queries will be enough for a performance measurement. Now only blogapi, blog, book calls node_access_*_sql, all others are done.

chx’s picture

FileSize
26.88 KB

Forgot to save taxonomy.module before rolling the patch, so a DISTINCT remained :(

chx’s picture

I see Jose has written a lot about hints, thanks! "Well, it seems we have different views about the 'hints'." -- "we" probably means Dries vs. "Jose and I" 'cos I agree with Jose, but I see some more uses, namely what I have said before on making a filter on forums. If you want to that, you need to know which queries are coming from the forum.module. What shall we do? Pass on the whole query, and let the filter proccess the query to decide whether now it is the forum block it hunts for? I doubt. Let's give it a hint it's a forum and a block query.

Some kind of extensionable hint mechanism is necessary, 'cos what about a contrib module which filters another contrib node, say flexinode?

Dries’s picture

Could we drop the hint system if we'd choose not to support the il8n module for now?

Jose Reyero’s picture

:-(

Ok, let's try to think positively.

Could we at least pass the query along so i18n -or whatever module tries to implement this hook- could support itself?

chx’s picture

Well, passing along the query would mean no problems, but I wonder whether a better documented hint system would be better? I had no problems with letting the loop detection system go it was just a safety belt for some distant future possibility. But the hint system is -- in my point -- is needed.

So, here are the valid hint strings:

node, forum, comment etc.
name of the module from the query originates, like forum, comment, flexinode etc.
generic
this would be be for generic listings, like node_default_page, forum_get_forums etc.
related
node table is not joined so if you want to use other fields from node table than nid you shall join node table yourself. However, if you have your table which is like JOIN {mytable} ON mt.nid=$nid_alias.nid that's fine.
admin
something meant only for administrators.
block
nodes in a block
search
something to do with a search
chx’s picture

FileSize
28.31 KB

Killes said I did not had -F^f in the last version.

chx’s picture

FileSize
28.77 KB

Automatic version as discussed and args for Killes.

chx’s picture

FileSize
28.31 KB

Hm. It seems we have agreed on no automatization. However, arbitrary arguments are kept for Killes.

Dries’s picture

The last version of this patch improved the performance of my test site's main page by 257%. That is, my machine can serve 2.57 times as much pages in the same time interval.

Out of curiousity, I investigated this some more and it turns out this can be tracked down to the useless DISTINCT's which this patch removes.

Unless someone objects to this patch or the approach taken, I'm going to commit this shortly. Please review this patch carefully as it marks an important change.

chx’s picture

FileSize
44.18 KB

More DISTINCTs cleared, now there are no more node_access_*_sql calls in the core modules. This may be a final version.

Or we can find a few more to rewrite. I've searched for {node} selects, dropped those which had nid = %d as I thought those do not need rewrite (I may be wrong) and dropped also those that are rewritten, dropped those that are indexing or cron. Here are the remaining queries, which may need rewriting:

blogapi.module:364:  $result = db_query_range('SELECT n.nid, n.title,'. ($bodies ? ' n.body,' : '') ." n.created, u.name FROM {node} n, {users} u WHERE n.uid=u.uid AND n.type = 'blog' AND n.uid = %d ORDER BY n.created DESC",  $user->uid, 0, $params[3]);
node.module:386:  $node = db_fetch_object(db_query('SELECT n.*, u.uid, u.name, u.picture, u.data FROM {node} n INNER JOIN {users} u ON u.uid = n.uid WHERE '. implode(' AND ', $cond)));
node.module:771:  $result = pager_query('SELECT n.*, u.name, u.uid FROM {node} n INNER JOIN {users} u ON n.uid = u.uid '. $filters[$filter][1], 50);
queue.module:57:  $result = db_query('SELECT COUNT(nid) FROM {node} WHERE moderate = 1');
search.module:118:  $nodes = max(1, db_result(db_query('SELECT COUNT(*) FROM {node}')));
upload.module:269:    $result = db_query("SELECT SUM(f.filesize) FROM {files} f INNER JOIN {node} n ON f.nid = n.nid WHERE uid = %d", $uid);
upload.module:272:    $result = db_query("SELECT SUM(f.filesize) FROM {files} f INNER JOIN {node} n ON f.nid = n.nid");

Also, a recent upload module patch (#14545 by nysus) and the comment query update made by Jose in this thread shows that there could be a few more queries like these two which SELECTs something from a table which has nid as foreign key but still needs rewriting.
node_rewrite_sql is prepared for such queries by it's second parameter, which I've renamed $nid_alias is useful, 'cos it's not the alias of the node table, but the alias of the table which has nid field, it could be f for file, c for comment etc.

Chris Johnson’s picture

This patch seems like both a Good Thing and a Bad Thing to me.

The idea of centralizing node queries is a good one, I think. It seems to be the original motivation for this patch. It allows better SQL optmization and hence performance. It puts all the code in one place for better future maintenance and enhancement.

On the other hand, constructing complex SQL statements from programmatic logic is full of peril. The task quickly becomes very large and complex.

Manipulating the SQL language elements via code means it is harder to see when one has made a syntactic mistake. Most such errors will be fatal. As the SQL construction code becomes more complex with development, the number of paths through it become huge and impossible to exhaustively test. Some day, someone will pass parameters that cause fatal SQL syntax to be generated in a "production" system.

It also obscures SQL syntax differences between supported databases. It's more likely to introduce usage that works with MySQL but fails with Postgres.

I've long been in favor of unifying and centralizing all of the database access in Drupal, to the greatest extent which does not adversely affect performance. So in this way, I think this patch is a big step in that general direction.

But I've also been involved in maintaining a search engine which used this technique to create the SQL statements on the fly. It became extremely complex over time as one, then two, then three, etc. new possibilities were added (e.g. new usage, new search criteria). The more complex it became, the more regression errors we had and the harder to debug it became. It got to the point where making one small addition would result in 3 new bugs. It worked most of the time, and when it worked, it was fast. But it failed fatally in a significant number of ways.

I'm not sure whether to vote +3 or -4 on this one. It might be good; it might be terrible.

chx’s picture

FileSize
44.18 KB

Chris, I understand your concerns, but this method -- unlike the original -- tries to keep things as simple as possible. You can only JOIN tables and add WHERE conditions. We are not creating complex SQL on the fly, I tried that originally, just read comment #1 for the warm welcome by Dries: "I'm not likely to commit this". So do not worry, while I may be a little hotheaded, Dries keeps order with an iron hand.

And the actual rewrite mechanism is also very simple.

Ps. Two typos corrected in this version.

Dries’s picture

I'm going to test and benchmark the latest version of this patch over the weekend. If all goes well, I'll commit this to HEAD as no concrete alternatives have been proposed. (I can't think of a better approach and believe that this patch does a good job balancing different values/requirements.)

It would be much appreciated if those who maintain node-level permission modules could take a look at this patch to evaluate its usefulness/performance. If you maintain a module that queries the node table you might want to look at this as well -- after all, you'll have to update your module if this hits core.

(Note that the PHPdoc code in the patch is not up-to-date.)

chx’s picture

FileSize
43.97 KB

PHPdoc is upgraded, I hope now it is up to date.

chx’s picture

FileSize
43.98 KB

module_implementations had a minor, but performance eating bug, it's caching was broken.

Jose Reyero’s picture

The function _node_rewrite_sql shoult have a default value for the $query param, as it is called at least once with no arguments. You get a 'warning' otherwise.

chx’s picture

FileSize
43.99 KB

Yes, that's true. Default value of query added to helper _node_rewrite_sql.

Dries’s picture

Benchmarked some more: the main page (q=node) is 64% faster, the forum listing page (q=forum) is 416% faster and the forum topic listing (q=forum/1) is 57% faster. All pages had a 'Forum topics' block.

Dries’s picture

Committed to HEAD. TODO: update the upgrade guide. Marking this active.

moshe weitzman’s picture

Title: Centralize node queries somehow » Centralize node queries

the upgrading guide dedicates a whole page to this - http://drupal.org/node/12347, marked as fixed.

chx’s picture

The node queries page is actually http://drupal.org/node/17669 , what moshe pointed out is the general update your module from 4.5 to HEAD.

Anonymous’s picture