CCK supports indexing of fields but ONLY if they are accessible to anonymous users.
That means that if you enable the Content Permissions module and set a field as not viewable by anonymous users (for instance, to restrict access to this field to privileged users), this field won't be indexed at all, and privileged users won't be able to search on it.

This is because of line 768 in content.module (6.X-2.4):

          '#access' => $formatter_name != 'hidden' && content_access('view', $field),

which should be replaced by:

          '#access' => ($formatter_name != 'hidden') && (($context == NODE_BUILD_SEARCH_INDEX) || content_access('view', $field)),

As a matter of fact, when a scheduled cron is run, it runs as the anonymous user, so without the fix above, fields not accessible to anonymous users are not indexed.
On the other hand, if you reset indexing and run cron manually from the admin pages, the original code above will work and the fields will be indexed (this is because the admin user can access anything).
So to me this is a bug: CCK fields not accessible to anonymous users are not indexed in one case and are indexed in another.
All fields, whether they are accessible to anonymous users or not, should be indexed by CCK (as long as they are set to be indexed in the CCK display settings for search). It should then be up to the "search" module to display only the results that the current user is allowed to view.

The fix above works for me, and I would appreciate it if it could be reviewed by the community.
And if someone could make a patch...
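
To make the difference concrete, here is a minimal standalone sketch in plain PHP (not the actual CCK code) of the two `#access` expressions. Everything prefixed with `sketch_` or `SKETCH_` is a hypothetical stand-in: the constants replace Drupal's build modes such as NODE_BUILD_SEARCH_INDEX, and the stub replaces content_access() under the assumption that only an admin role may view a private field.

```php
<?php
// Hypothetical stand-ins for Drupal's node build modes (assumed values).
define('SKETCH_BUILD_NORMAL', 'normal');
define('SKETCH_BUILD_SEARCH_INDEX', 'search_index');

// Stub for content_access('view', $field): assumes only the 'admin' role
// may view fields flagged as private.
function sketch_content_access($field, $role) {
  return !$field['private'] || $role === 'admin';
}

// Original expression: the field is rendered only if the current user may view it.
function sketch_access_original($formatter_name, $field, $role, $context) {
  return $formatter_name != 'hidden' && sketch_content_access($field, $role);
}

// Proposed expression: the permission check is skipped while building the search index.
function sketch_access_patched($formatter_name, $field, $role, $context) {
  return ($formatter_name != 'hidden')
    && (($context == SKETCH_BUILD_SEARCH_INDEX) || sketch_content_access($field, $role));
}

$private_field = array('private' => TRUE);

// Original code: the result depends on who runs cron.
var_dump(sketch_access_original('default', $private_field, 'anonymous', SKETCH_BUILD_SEARCH_INDEX)); // bool(false)
var_dump(sketch_access_original('default', $private_field, 'admin', SKETCH_BUILD_SEARCH_INDEX));     // bool(true)

// Patched code: indexed for any user, but still hidden on normal page views.
var_dump(sketch_access_patched('default', $private_field, 'anonymous', SKETCH_BUILD_SEARCH_INDEX));  // bool(true)
var_dump(sketch_access_patched('default', $private_field, 'anonymous', SKETCH_BUILD_NORMAL));        // bool(false)
```

With the original expression the same private field is indexed or skipped depending on the user running cron; with the proposed one the index no longer depends on that user, while normal display still honours the field permission.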

Comments

markus_petrux’s picture

Status: Needs review » Needs work

If fields not accessible to anonymous users are indexed, anonymous users will get hits for something they cannot see, and the mere fact that a hit can be returned exposes a privacy issue that would have to be resolved, IMO.

jonathan1055’s picture

gpk’s picture

I'm not hugely familiar with the inner workings of core's search.module but I wonder if this is actually a limitation of search.module. To handle fields with different permissions, each field might need to be indexed separately; whether the fields can then be matched against the search query at the same time as the basic node (content) search I don't know. This might in any case be beyond the scope of the existing content permissions module which is intended to be fairly simple.

Always running cron as anonymous (per #431776: Cron should run as anonymous when invoked via the run-cron link on the status report page) would at least sort out the privacy issue (#1). For better search behaviour wrt permissioned fields it might be necessary to use Faceted Search and Field Indexer, but from the text on the project pages there appears to be a performance hit, presumably because core's search.module does most of its work in 2 queries, I think, whereas pulling all the relevant search result info from individual fields into the overall search results is (obviously?!) much more work... I've not actually tried this, so I don't know whether it will do what is required. Might be worth investigating though.

aufumy’s picture

FileSize
1.04 KB

The above patch did not work for me. Attached is a patch that I created fairly quickly to solve my issue.

The way I see it, search should have access to index all content, and I do agree that in search each field should be indexed separately to be able to work with content permissions.

With solr, it might be possible to specify what fields not to search in.

With sql search, it might be quite difficult to search, if there is a lot of content and cck fields, as each cck field could be in its own table, requiring a lot of joins.

aufumy’s picture

Status: Needs work » Needs review
markus_petrux’s picture

Version: 6.x-2.4 » 6.x-2.x-dev
Category: bug » feature

This is not a bug, but a feature request. As such, it needs to be related to 6.x-2.x-dev version.

That being said, I'm tempted to say won't fix, because of the reason described in #1, and also because it implies an architectural design change in how CCK works and manages field permissions. It is too late in the life cycle of CCK for D6, IMHO.

aufumy’s picture

Regarding privacy issues, that has to be handled on the search side and not on the cck side.

Currently even if a user has full permission to view all fields, they will not be able to search for any of the fields that anonymous users do not have access to, because of cron running as anonymous user.

Search needs to be able to index the content. Once it is indexed, it should be up to search to return the appropriate results.

Also, if the 'search content' permission is not assigned to anonymous users, this prevents anonymous users from seeing any search results.
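
As a rough illustration of "index everything, filter when returning results", here is a small standalone PHP sketch. It is not how core's search.module stores its index: the per-field index array, the role map and the `sketch_search()` function are all hypothetical, and the field name `field_salary` is made up for the example.

```php
<?php
// Hypothetical per-field index: each entry records which field a word came from.
$sketch_index = array(
  array('nid' => 1, 'field' => 'body', 'word' => 'drupal'),
  array('nid' => 2, 'field' => 'field_salary', 'word' => 'drupal'),
);

// Hypothetical permission map: which roles may view which field.
$sketch_field_roles = array(
  'body' => array('anonymous', 'admin'),
  'field_salary' => array('admin'),
);

// Return the node IDs matching $word, counting only fields that $role may view.
function sketch_search($index, $field_roles, $word, $role) {
  $nids = array();
  foreach ($index as $entry) {
    $viewable = in_array($role, $field_roles[$entry['field']], TRUE);
    if ($entry['word'] === $word && $viewable) {
      // Key by nid so a node matching in several fields is listed once.
      $nids[$entry['nid']] = $entry['nid'];
    }
  }
  return array_values($nids);
}

print_r(sketch_search($sketch_index, $sketch_field_roles, 'drupal', 'anonymous')); // node 1 only
print_r(sketch_search($sketch_index, $sketch_field_roles, 'drupal', 'admin'));     // nodes 1 and 2
```

The point of the sketch is only that the index itself is complete, and the permission check happens per user at query time, so a hit in a private field is never reported to someone who cannot view that field.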

deltab’s picture

I agree with #7: search engines should index everything. What to show, and to whom, is a workflow/authorization issue, not the indexer's business.

eikes’s picture

+1 subscribing

mErilainen’s picture

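I'm using custom code in hook_nodeapi to hide some fields from users who are not allowed to see them.

<?php
// Fragment from inside the switch ($op) of a hook_nodeapi() implementation.
// $gid is assumed to have been set earlier to the node's group ID.
case 'view':
  if (user_access('show all fields')) {
    // Full access: leave all fields visible.
  }
  elseif (og_is_group_member($gid)) {
    // Deny access to some fields.
  }
  else {
    // Deny access to the rest of the fields.
  }
  break;
?>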

This works well and shows anonymous users some fields, group members some more fields and user with "show all fields" permission all fields.

When I search as an anonymous user for a word that appears in the hidden fields, those nodes are listed in the results, but the content is not visible. Instead, only the string "bad" appears in place of the normal content. Shouldn't it be possible somehow not to show that node at all if the user cannot access the field where the search word is located?

Jody Lynn’s picture

Category: feature » bug

I understand that getting the ideal behavior with core search is not going to happen, but I still believe the current behavior is a bug.

As indicated in the original post here, if you run cron as superuser from the status report, it will index all the private fields, and then would show private content to other searchers. That is a major problem.

Also, when a site builder uses content permissions, there is nothing other than this issue currently to indicate to them that their site search is potentially going to become useless. In my case, I came upon this for a site which has nearly all private fields and only allows admins to use search, but search was not working at all (patch in #4 was appropriate in this case and worked, but would not be a good general solution).

koyama’s picture

+1 subscribe

Magnus’s picture

Category: bug » feature

I have three sites where almost all content is for internal use. I tried the patch in #4 and it works for me, even though I know it isn't the right approach.

anrikun’s picture

Category: feature » bug

Sorry to turn this into a bug again but please read my first post again and #11 too.

There is some inconsistency here:
- when a scheduled cron is run, it runs as the anonymous user, so fields not accessible to anonymous users are not indexed.
- but if you reset indexing and run cron manually from the admin pages, the same private fields will be indexed (this is because the admin user can access anything).

As a result, CCK fields not accessible to anonymous users are not indexed in one case and are indexed in another.
This inconsistency is the bug.

Edit: By the way, I haven't tried 6.x-3.x yet, is this problem still present?

anrikun’s picture

Here is a patch for 6.x-2.9 implementing the change described in the issue summary.

Jody Lynn’s picture

Status: Needs review » Reviewed & tested by the community

#4 stopped working for me in 2.9 and #15 does work.

jonathan1055’s picture

Status: Reviewed & tested by the community » Needs work

Hi Jody,
Your comments imply that the patches are not ready to be committed yet. When this is the case, the status should be 'needs work'.

'reviewed and tested by the community' actually means 'reviewed and tested by the community, and everything is good and ready to be committed'. Interestingly, RTBC seems to have the dual meaning 'Reviewed and Tested By the Community' and 'Ready To Be Committed'.

Jonathan

anrikun’s picture

@jonathan1055:
I think that Jody wanted to say that she had reviewed & tested the patch at #15 and that it worked.
So unless you found that the patch at #15 does not work, why revert to Needs work?

jonathan1055’s picture

Status: Needs work » Reviewed & tested by the community

Oops, I'm very sorry, I mis-read the line. I thought it was '#15 does not work'.
I can't explain why I made that mistake! Re-setting the status.
Jonathan

anrikun’s picture

I understand better now :-D No problem!

grndlvl’s picture

FileSize
713 bytes

Re-rolling so automated patches.make applies patch without hassle.