It appers that during cron run query rewrite may produce undesired results - in my case not indexing site content. Please see more details on http://drupal.org/node/64333

Proposed changes:
1. disable rewrite during cron run altogether
or
2. add one more parameter to node_load( ....$rewrite=true) with obvious meaning and change calls from node_update_index

Sincerely
Maris

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mgabalins’s picture

FileSize
86.46 KB

Hi,

I have attached updated node.module - have not yet figured out how to create proper patch, however maybe for somebody this will be usefull. At least it has solved bugs in my case.

Sincerely
Maris

greggles’s picture

Please see http://drupal.org/node/323 for how to create patch files for Drupal.

Heine’s picture

Status: Active » Needs review
FileSize
2.67 KB

I've attached it in patch form. You can find a guide on creating patches at http://drupal.org/diffandpatch. Try it, it's quite easy :)

Dries’s picture

This looks ilke a hack. Soon we'll be adding $rewrite flags to all functions that use db_rewrite_sql()? I think we need to look for another solution.

mgabalins’s picture

Other possibilities as I see are:

1. Change rewrite logic within OG (in my case) or/and other modules, since as I understood this behaviour has been seen also in other cases,

2. Implement special user name for CRON jobs (I recall it was suggested somewhere here on site) which has appropriate priviledges to have needed access.

As I understand there is no really clear method for function in module to determine that it has been called from CRON job. Relaying on variable 'cron_busy' can create potential security risk. So actually first option can not be really implemented without the second one.

I cannot really imagine any of those changes happening quickly and they may turn out to be disruptive for number of modules relaying on CRON functionality.

On the other hand I believe that proposed change (besides that it solved my problem) does not break general logic and, seems to me, does not create any side effects.

Only other things which I see is replace node_load call with select, however it requires repeating bunch of code lines and I really think it is not a good approach.

Sincerely
Maris

killes@www.drop.org’s picture

Version: 4.7.0 » x.y.z

no new features to cvs. Look at mailhandler on how to switch user sessions.

mgabalins’s picture

I guess I should have called it a bug - what it is in fact.

This patch does not introduce any new behaviour other than one additional paramater to node_load,

Sincerely
Maris

pwolanin’s picture

Version: x.y.z » 4.7.2
Category: feature » bug

At first I didn't see this problem, but maybe some nodes got indexed when I ran cron.php manually as user #1. I agree that this is a major bug, not a feature request- I'm trying to build up a site with content restricted using taxonomy_access, and this is really going to be a problem. The results of search are already filted using sql_reqrite, so why does it need to be done during indexing?

I'm looking into killes's suggestion of changing user as per mailhandler module.

pwolanin’s picture

Title: node.module - disable sql rewrite during cron run » search.module - index all content during cron run
Component: node system » search.module
FileSize
676 bytes

Changing the title, etc since I think this is a problem with search.module not node.module

Patch is attached which sets the user to be user #1 during search indexing based on the code in mailhandler.module. This seems to fix the problem if I re-index my site after installing this patch. Nodes which didn't appear in the search results do appear for a user with access to the node. New (restriceted) content gets added to the index as well.

i'm not sure what, if any, the security implications of this patch are if cron times out or fails. Maybe we can deal with this in the shutdown_function? What would happen if this function set user to #0 (anonymous) or forced a logout?

killes@www.drop.org’s picture

I agree that this should be fixed for 4.7.3, however I am not too fond of either of the proposed solutions. I recall that we added the db_rewrite_sql just as a general stop gap measure in case of sloppy contributed modules which wouldn't do proper permission checks.

I think the best woul dbe to remove it again.

Commit message from revision 1.488 of node.module:

Patch by chx: added a db_rewrite_sql() in node_load() just to make sure.

pwolanin’s picture

Component: search.module » node.module
FileSize
1.85 KB

Attached patch removes the sql_rewite calls in node_load() as suggested.

This seems to work- if I re-index my site and run cron.php as an anonymous user, the private content still now shows up in search reults when I'm logged in.

If i'm logged out, node_search already rewrites its query, so private content does not appear in the search results.

pwolanin’s picture

has anyone else tried this patch?

pwolanin’s picture

Title: search.module - index all content during cron run » node.module - index all content for search during cron run
Version: 4.7.2 » x.y.z
FileSize
1.82 KB

Problematic code still exists in HEAD- changing version to CVS. Attached patch is against HEAD, patch in #11 should be suitable for 4.7 backport.

Dries’s picture

I'm OK with this patch, although it looks less secure. I'd encourage you to add a code comment explaining why we are not rewriting the queries.

pwolanin’s picture

Ok, will add a small comment and post a revised patch as RTBC.

In terms of security, my understanding from #10 that this is really a reversion back to the 4.6(?) code, and really should not matter if modules are correctly using db_rewrite_sql.

pwolanin’s picture

Ok, here's a patch for 4-7 with a comment

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
2.11 KB

Patch for HEAD and RTBC. Note, patch for 4.7 does not apply to HEAD becuase of change to the code to remove n.moderate.

merlinofchaos’s picture

I'm a little concerned about this one, but as I've thought about the ramifications of this, I'm a little torn.

On one hand, I'm not sure we really want to be protecting from sloppy contrib.

On the other hand, it seems like we could pretty easily flag node_load so that the rewrite is optional, and I can see the logic to that. Very few cases really *need* to load privileged content, the search index is kind of a special case. Perhaps many things during cron are special cases.

But they are a minority of cases, and being a minority does lend some justification to adding a flag to node_load to skip the check.

pwolanin’s picture

My point of view is that if the API is being used corrrectly, you don't need the rewrite- it's being done ahead of time to choose the nodes to load. Anyone using a pager query, for example, will need to use the rewrite so they know how many nodes they have. The node_menu() function checks via node_access('view', $node) before displaying the node.

So, correct me if I'm wrong, but I don't think there is any code in core that expects the rewrite to be done in node_load(). Any "sloppy" use in contrib modules would be noticed quickly, I think, and corrected.

killes@www.drop.org’s picture

applied to 4.7

drumm’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD.

Anonymous’s picture

Status: Fixed » Closed (fixed)
Dave Cohen’s picture

Admins should be advised to go to admin/settings/search/wipe after applying this patch.