Problem/Motivation

A search result page shows title(s) and the excerpt in each search result item. However, it contains any item even though a user doesn't have any permission to view an item since Search module bypasses hook_node_access while building the list of the search results.

This is the expected result, since hook_node_access does not operate on listings.

However, when using a node access module that writes node access grants, a change to the grants and a rebuild of the node access table may cause search results to appear based on the prior grants.

Proposed resolution

Invalidate the search index cache tags when node_access_rebuild() is called.

Remaining tasks

None.

User interface changes

None.

API changes

None.

Data model changes

None.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

yas created an issue. See original summary.

yas’s picture

yas’s picture

Status: Active » Needs review

Status: Needs review » Needs work

The last submitted patch, 2: 2784135-2.patch, failed testing.

yas’s picture

Status: Needs work » Needs review
FileSize
916 bytes

Status: Needs review » Needs work

The last submitted patch, 5: 2784135-3.patch, failed testing.

The last submitted patch, 5: 2784135-3.patch, failed testing.

yas’s picture

Status: Needs work » Needs review
yas’s picture

CI error occurred, so trying once again.

Peter Majmesku’s picture

Status: Needs review » Reviewed & tested by the community

I can confirm that this patch does the job. This is exactly what I was searching regarding an issue in my contrib module: https://www.drupal.org/node/2831165. Please apply this patch to the core.

yas’s picture

Issue summary: View changes
tstoeckler’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/search/src/Plugin/SearchPluginBase.php
@@ -95,10 +95,19 @@ public function getType() {
+        $permissions = \Drupal::moduleHandler()->invokeAll('node_access', array($result['node'], 'view', $user));

This should not invoke hook_node_access() manually, but instead call $node->access(). Then we also don't need this array check, but can just check $access_result->isAllowed() on the return value of that.

yas’s picture

Status: Needs work » Needs review
FileSize
829 bytes

@tstoeckler

Thank you for your guidance. Please let me suggest the new patch.

yas’s picture

Revamped.

tstoeckler’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

Thanks, the patch looks better now in my opinion. I found two minor things that I think we can improve, but more importantly in order for this to go in, we need an automated test proving that it works.

  1. +++ b/core/modules/search/src/Plugin/SearchPluginBase.php
    @@ -95,10 +95,18 @@ public function getType() {
    +        $access_result = $result['node']->access('view', $user, $return_as_object = TRUE);
    

    You can remove the $return_as_object = part, just pass TRUE.

  2. +++ b/core/modules/search/src/Plugin/SearchPluginBase.php
    @@ -95,10 +95,18 @@ public function getType() {
    +        if ($access_result->isAllowed() === FALSE) {
    

    I think it would be more readable to instead to if (!$access_result->isAllowed())

Berdir’s picture

You could actually remove the third argument completely, then you get a boolean back.

But, we might lose cacheability metadata then, calling it with TRUE is correct, but we need to merge the cacheability metadata of that into the render array then.

yas’s picture

Status: Needs work » Needs review
FileSize
800 bytes

@tstoeckler

Thank you for your review. Yes, I agree to your comment and actually those are more natural to me since I wrote exactly the same code that you pointed out at first. However I know that I suggested the previous patch with redundant code in order to make it more explicit:

  1. I learned about the third parameter of $node->access() by looking into the Drupal core APIs. I thought that someone like me wonders and needs to learn about the same thing in the future when s/he reads the code where we put only TRUE on the third parameter in the $node->access() method.
  2. I prefer the way of describing !$variable as "NOT" but we can also find the code such as $variable === FALSE widely in the Drupal core code. I thought that we could give the more explicit message to the code readers.

However anyway, I respect your feedback.

Peter Majmesku’s picture

Regarding the automated test hint from #15: Must it be a functional test by inheriting from KernelTestBase or WebTestBase or can it be an unit test, which inherits from PHPUnit_Framework_TestCase?

Peter Majmesku’s picture

@yas: What is the state regarding the automated test (see #15)?

Peter Majmesku’s picture

Status: Needs review » Needs work
yas’s picture

@jepSter

Good timing. Now I can tell you what I did and what I am thinking:

  1. I thoroughly investigated Drupal core code regarding how we can create the test case. I grepped the combination of keywords like view, own, content, access, permission, node_access and/or isallowed.
  2. Unfortunately, I only found that Drupal has a permission called view own unpublished nodes as a result of my study. As we know, Drupal doesn't have a permission such as view own published nodes.
  3. I think we can create a test case based on \Drupal\search\Tests\SearchNodeUpdateAndDeletionTest as the basic logic. However still we need our custom permission like view own node content.
  4. Therefore, I assume that we need to add an extra module for the test such as book_breadcrumb_test.module in Drupal core code or node_view_permissions
  5. I am thinking we can create the test case by combining \Drupal\search\Tests\SearchNodeUpdateAndDeletionTest and node_view_permissions

Then, I wanted to ask the Drupal community if we can include node_view_permissions as it is (that is, 100% code is the same) into search module test case or not. If yes, we can just change the class name to SearchNodeAccess and the function name to search_node_access.

Any comment, hint, suggestions or strategy are welcome.

Peter Majmesku’s picture

@yas: Regarding the automated test. Why so complicated as this issue is about an if-case for a loop continue statement? I would write an unit test, mock the dependency and check only if the loop works as expected. What do you think?

yas’s picture

@jepSter

Thank you for your message. SearchPluginBase::buildResults() is a callback function. My questions are:

  1. How can we create a node associated a user who has a 'view' permission? We can use drupalCreateNode and after creating it, we can change the owner (the user) of the node for the test, however how can we set up a 'view' permission associated to the user? (Again, Drupal only have an 'access content' permission.)
  2. How can we mock the dependencies?
Peter Majmesku’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
5.49 KB

I have extended your patch by the unit test. It checks that SearchPluginBase::buildResults() retrieves the expected amount of nodes. Based on the return of isAllowed() within the mocked node objects. So you find the detail answer to question number 2 in my patch.

Regarding question number 1: I have no idea why you would need this. I think it is enough to check if the access result retrieves TRUE via isAllowed(). In your implementation you can expect that the user object just works and it has nothing to do with the context of your small code change.

For respecting the code standards, $access_result needs to be named in camel case ($accessResult). I have done so within the new patch.

tstoeckler’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: -Needs tests

Please do not set your own patches to RTBC. (I'm assuming this was a simple mistake, but still noting here for people following along at home.)

Some notes on the patch:

  1. --- a/core/modules/search/src/Plugin/SearchPluginBase.php
    +++ b/core/modules/search/src/Plugin/SearchPluginBase.php
    

    I just saw that this is in the generic search plugin, but the code is node specific. So I guess we would have to add this to NodeSearch instead? Although that's unfortunate as then UserSearch would not benefit from this.

  2. +++ b/core/modules/search/src/Plugin/SearchPluginBase.php
    @@ -95,10 +95,18 @@ public function getType() {
    +        $accessResult = $result['node']->access('view', $user, TRUE);
    

    Let's use $access_result as the variable name for now for consistency with the rest of the class.

  3. +++ b/core/modules/search/src/Plugin/SearchPluginBase.php
    @@ -95,10 +95,18 @@ public function getType() {
    +        if (!$accessResult->isAllowed()) {
    

    As @Berdir mentioned (and I overlooked :-/) we have to merge the cacheability metadata of the access result into the (weirdly named) $build variable.

    Something like this should work:
    CacheableMetadata::createFromObject($access_result)->applyTo($built);

  4. +++ b/core/tests/Drupal/Tests/Core/Access/SearchResultsAccessTest.php
    @@ -0,0 +1,195 @@
    +    $user = $this->prophesize(AccountProxyInterface::class);
    ...
    +    $mock = $this
    +      ->getMockBuilder($namespace);
    

    Maybe it's just me but I find it confusing to use both Prophecy and the PHPUnit mocking system in one test. I personally prefer Prophecy, but at least I think we should stick to one.

  5. +++ b/core/tests/Drupal/Tests/Core/Access/SearchResultsAccessTest.php
    @@ -0,0 +1,195 @@
    +    return [
    +      [
    

    It's nice to use proper keys for the test cases, i.e. something like "disallowed-disallowed-allowed" or "1-allowed" or something as this will show up in test results and makes debugging a bit easier.

  6. +++ b/core/tests/Drupal/Tests/Core/Access/SearchResultsAccessTest.php
    @@ -0,0 +1,195 @@
    +          [
    +            'node' => $this->createDisallowedMock(),
    +          ],
    

    (Here and below) Personally, I think this would be easier to read if this were on one line. I guess that's arguable, though, so feel free to ignore.

  7. +++ b/core/tests/Drupal/Tests/Core/Access/SearchResultsAccessTest.php
    @@ -0,0 +1,195 @@
    +}
    \ No newline at end of file
    

    Missing newline here.

Peter Majmesku’s picture

Status: Needs work » Needs review

Right now I have doubts, if we should implement any code for this issue at all and the source of the problem is our comprehension of the search index build process. Because it looks like the node grants are checked on search index build and hook_node_access() is obsolet in this case. This might be the way to go and we are running risk that our code optimizations are a waste of time.

What I have done:

  • An node which was only granted for a certain user via an node access record has been setup to be visible for all users via the grants system.
  • I search for this node - I cannot find it.
  • I rebuild the permissions via URL call (/admin/reports/status/rebuild)
  • I re-index the site via the button at /admin/config/search/pages
  • I run the cron via the button at /admin/config/system/cron to rebuild the search index.
  • Then I search for the desired node - I am getting it listed within the search results.

So before we invest time in any code tweaks, we should make sure that our code suits Drupal's behavioral specifications. Is it meant to re-build the search index after each node grant update?

If so: this is performance expensive, when it comes to restrictive websites, which should definitely keep the search index and the node grants records in sync. But it "could" work for sites with < 1.000 nodes.

The performance footprint is not that big, when we add the access check for each node in the search results listing process. So it might enrich Drupal's access functionality. Also such websites can consider to disable the search module's functionality.

Please tell what the specifications and your points of view are.

tstoeckler’s picture

Tagging "Needs subsystem maintainer review" in the hopes that one of the search.module maintainers will see this. I can't really say much to #26 I'm not that knowledgable about the innerworkings of the search system.

pwolanin’s picture

Status: Needs review » Postponed (maintainer needs more info)

This looks like it works as designed and the module you reference should have better documentation to explain that it will not filter listing pages or search results.

Views will also include title or field of nodes in listings unless you use are supplying node access grants and node access records. Have you tried seeing if restricted content is visible in views?

hook node access is only meant to limit viewing the node page directly. I think it was a mistake actually to have added it in Drupal 7 since this confusion comes up regularly. If a module wants to control listing is needs to be a node access module and implement hook_node_grants() and hook_node_access_records().

I will make this closed, but wanted to make sure you see this comment first, or if you have any more info that would support it being a bug.

Peter Majmesku’s picture

Status: Postponed (maintainer needs more info) » Needs review

This looks like it works as designed and the module you reference should have better documentation to explain that it will not filter listing pages or search results.

Well, I am the maintainer of the Permissions by Term module. The module is filtering nodes in views listings and search result listings. But it cannot do it quite good, because of the Drupal core.

Views will also include title or field of nodes in listings unless you use are supplying node access grants and node access records. Have you tried seeing if restricted content is visible in views?

Right now I re-create all node access records on every node, user and taxonomy term save (update + create). So the node items are filtered in views. However this is not very performant. Imagine that you have > 10.000 nodes and you rebuild all node access records on each node save. This will take some time and annoy people. So I am looking for a more performant way of doing it. I could not make it work by modifying the views queries.

Regarding the search result pages: Compared to views support, the entire search index must be rebuild after a content permission update has happened. After some thinking I find it a pitty that we cannot add the access check into the search result pages build process. That would make Drupal saver. I can imagine that there are many websites which restrict their content via node access records (because this is Drupal's standard) and there's always a gap between node access records update and the search index rebuild.

This would work for any magazine websites, which need to release their content periodically. But this is not usable for intranets or applications where content needs to be shared quickly between user groups and the access needs to be kept in sync with the access restriction settings (sensible documents like contracts). This sites are just loosing Drupal's core search functionality, if they do not want to mention this documents in the search in any way (not showing up the title and not provide search functionality within this sensible documents).

I hope that I have overlooked any Drupal functionality for filtering Views (without loosing the pager functionality, like via hook_views_post_execute()) and core search. Any helpful hint would be appreciated.

pwolanin’s picture

Category: Bug report » Feature request
Priority: Major » Normal
Status: Needs review » Needs work

Yes, you ned to rebuild the node access table if you are changing the permissions, but not the search index.

If you are having to rebuild records when any node, term, or user is saved that sounds like a flawed system.

In any event, this still sounds like it is working as designed - not a bug. And the patch as you have it would break paging, so doesn't seem like a good solution.

Peter Majmesku’s picture

Status: Needs work » Needs review

If you are having to rebuild records when any node, term, or user is saved that sounds like a flawed system.

Unfortunately yes.

Yes, you ned to rebuild the node access table if you are changing the permissions, but not the search index.

The updated records in the node access table take only effect in the search results, when I re-index the site and run the cron to rebuild the search index. So there is a gap between the node access records and the search results. If this is "not" the expected behavior, then we have a bug here.

pwolanin’s picture

I'm sorry, but I don't actually believe that you would need to re-index to have the node access entries affect the node module search results.

Nothing about the node access information is stored as part of the search index (see schema).

You should check if perhaps there is a failure to clear cached searched results that makes it seems as though the new node access grants are not being applied.

CREATE TABLE `search_index` (
  `word` varchar(50) NOT NULL DEFAULT '' COMMENT 'The search_total.word that is associated with the search item.',
  `sid` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'The search_dataset.sid of the searchable item to which the word belongs.',
  `langcode` varchar(12) CHARACTER SET ascii NOT NULL DEFAULT '' COMMENT 'The languages.langcode of the item variant.',
  `type` varchar(64) CHARACTER SET ascii NOT NULL COMMENT 'The search_dataset.type of the searchable item to which the word belongs.',
  `score` float DEFAULT NULL COMMENT 'The numeric score of the word, higher being more important.',
  PRIMARY KEY (`word`,`sid`,`langcode`,`type`),
  KEY `sid_type` (`sid`,`langcode`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Stores the search index, associating words, items and…' 
pwolanin’s picture

Ah, yes - that's probably it - see the setting of cache tags in \Drupal\search\Controller\SearchController::view

There may be a bug there in the sense that triggering a node access rebuild should probably invalidate the search index cache tags some way.

pwolanin’s picture

Title: Search module bypasses hook_node_access while listing the search results » Search module seems to bypasses hook_node_access while listing the search results
Category: Feature request » Bug report
Issue summary: View changes
FileSize
870 bytes

Here's a possible fix for the cache issue.

pwolanin’s picture

Title: Search module seems to bypasses hook_node_access while listing the search results » Search module seems to bypasses node access while listing the search results
Peter Majmesku’s picture

I have tested this patch. I can confirm that it works properly in relation to the records in the node_access database table. I have edited a restricted node a few times to check and the search results were as expected. Great that this can work without rebuilding the entire search index. Please apply to core.

Peter Majmesku’s picture

Status: Needs review » Reviewed & tested by the community

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 34: 2784135-34.patch, failed testing.

Peter Majmesku’s picture

@pwolanin: There is some functional test failing. Will you take care of it?

pwolanin’s picture

Status: Needs work » Needs review

Looks like this is an unrelated and sporadic fail due to a bug in the test runner mentioned recently at #2829040: [meta] Known intermittent, random, and environment-specific test failures

Status: Needs review » Needs work

The last submitted patch, 34: 2784135-34.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
pwolanin’s picture

Priority: Normal » Major

This may be major due to the information disclosure risk

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Peter Majmesku’s picture

Well, then this patch is probably ready. However it is always weird, when automated tests are failing periodically. There is always a reason for this behavior. Is this automated testing issue a blocker for the release of this patch?

pwolanin’s picture

Issue tags: +Needs tests

There is another issue to fix the sporadic fails.

We should add tests here to prove the bug and the fix.

Peter Majmesku’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
2.6 KB

I have created a functional test, which proves that the cache tag "search_index:node_search" is not contained in the cache after node_access_rebuild() has been called. Since Cache::invalidateTags(['search_index:node_search']) has been implemented into node_access_rebuild() to ensure that nodes with updated node access records aren't listed in the search results for unauthorized users.

Peter Majmesku’s picture

FileSize
4.13 KB

Some test cleanup.

Peter Majmesku’s picture

Once more.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/node/node.module
@@ -1211,6 +1211,7 @@ function node_access_rebuild($batch_mode = FALSE) {
+    Cache::invalidateTags(['search_index:node_search']);

@@ -1284,6 +1285,7 @@ function _node_access_rebuild_batch_finished($success, $results, $operations) {
+    Cache::invalidateTags(['search_index:node_search']);

This is adding search dependent code to the node module. Which has no dependency on search. I think we should clear a node_access cache tag or something like that and add this cache tag to the search listing. And we should add documentation that anything that needs to respond to a node access change needs to have this cache tag.

Peter Majmesku’s picture

I think we should clear a node_access cache tag or something like that and add this cache tag to the search listing.

More precisely the node module should be just separated from the search module. If we want to separate the search module from the node module, we could use an event listener in the search module. However I do not know any Drupal event, which could be triggered by an event listener inside the search module. Can you point me to the right direction? I know the current event options by this page: https://api.drupal.org/api/drupal/core%21core.api.php/group/events/8.2.x.

alexpott’s picture

@Peter Majmesku - it's fine to use the cache tag system - we just need to clear a more appropriately named tag. It's fine for whatever is tagging cache entries with search_index:node_search to also add a tag of node_access_rebuild or whatever.

alexpott’s picture

    // If this plugin uses a search index, then also add the cache tag tracking
    // that search index, so that cached search result pages are invalidated
    // when necessary.
    if ($plugin->getType()) {
      $build['search_results']['#cache']['tags'][] = 'search_index';
      $build['search_results']['#cache']['tags'][] = 'search_index:' . $plugin->getType();
    }

Looks like we'll have to add a new method and interface to the search plugins to get additional tags.

Berdir’s picture

We have the standard entity cache tags, invalidating those would require a query for all search pages using that plugin, but that's not that much of an overhead and then we don't need to invent/add additional cache tags, which need to be checked on every search page.

There's also plenty of search specific code in various places in node module, for example node_cron(): https://api.drupal.org/api/drupal/core%21modules%21node%21node.module/fu.... We could wrap it in a module exists and then that would be OK.

That said, wondering if that's the only scenario where this could be a problem. Wouldn't normal node lists also be affected? What if we invalidate the node_list cache tag, is that present on search pages?

alexpott’s picture

node_list sounds very sensible to invalidate - but we're going to need to still have a way for search plugins to declare additional tags.

Peter Majmesku’s picture

I may repeat here some stuff, which you have already mentioned, but I want to make the picture clearer here and summarize.

An option for search plugins to declare additional tags

In modules/node/node.module:1195 we have one line which resets the cache for 1 node.

      $nids = $entity_query->execute();
      foreach ($nids as $nid) {
        $node_storage->resetCache([$nid]);
        $node = Node::load($nid);
        // To preserve database integrity, only write grants if the node
        // loads successfully.
        if (!empty($node)) {
          $grants = $access_control_handler->acquireGrants($node);
          \Drupal::service('node.grant_storage')->write($node, $grants);
        }
      }

Search result pages "should" be tied to an entire search result list. In modules/search/src/Controller/SearchController.php:139 is a cache tag set for a view.

    // If this plugin uses a search index, then also add the cache tag tracking
    // that search index, so that cached search result pages are invalidated
    // when necessary.
    if ($plugin->getType()) {
      $build['search_results']['#cache']['tags'][] = 'search_index';
      $build['search_results']['#cache']['tags'][] = 'search_index:' . $plugin->getType();
    }

So we do not have a way there to flush the cache for a more general way. @Berdir: Do you have an example, which we could re-use to flush the search result cache here more properly? In which already existing cache tag we could move the search results cache tag?

Please mind that the search_index:node_search cache tag is also covered by multiple tests. They would need to be modified, if we remove this cache tag. Luckily the tests are only asserting that this cache tag exists. So there is not much to refactor.

My proposal for keeping the search module separate from the node module

We have symfony's event system in Drupal 8 - so we should treat hooks for the lack of oop-extensibility and event handling overhead by PHP 4 style as deprecations at all. So in my proposal we would add such a line into node_access_rebuild() and _node_access_rebuild_batch_finished():

$this->getEventDispatcher()->dispatch(NodeEvents::ACCESS_REBUILD);

The new cache tag "idea" seems unformed to me

Afterwards we would be able to trigger the event by a event listener. We would still be able to just clear the already existing search_index:node_search cache tag if we decide that an reworked cache tag is outside of the scope of this issue. The current suggestion regarding the "standard entity" cache tag seems unclear to me and I am not sure about the implementation. Since this major issue here is active since August last year, I would be able to solve this issue via the event approach in a small time amount. If you are clutching to the cache tag rewrite, then please provide verbose guidance.

alexpott’s picture

So as it turns out SearchPluginBase implements \Drupal\Core\Cache\RefinableCacheableDependencyTrait
All we need to add to NodeSearch is:

  /**
   * {@inheritdoc}
   */
  protected $cacheTags = ['node_list'];

And that'll add the node_list cache tag to node searches. And then we can clear that in the node access rebuild code.

The test added in the latest patch needs a bit of work. It is too coupled to the implementation using cache tags. It should be testing a node search that gets cached and then something should be updated that requires a node access rebuild and then we should do the search again and ensure that the search result is updated.

Peter Majmesku’s picture

Sounds like an easy implementation. The test on the other hand seems a bit more complicated to me.

The test added in the latest patch needs a bit of work. It is too coupled to the implementation using cache tags. It should be testing a node search that gets cached and then something should be updated that requires a node access rebuild and then we should do the search again and ensure that the search result is updated.

Do you have an example in the current code base which could be used as a starting point for updating the node access and create a search result list?

Peter Majmesku’s picture

Status: Needs work » Closed (cannot reproduce)

I cannot reproduce the issue anymore in 8.4.x-dev. What I have done:

  1. Installed a new Drupal instance in version 8.4.x-dev
  2. Created a node
  3. Re-index site and run cron
  4. Search does reveal the node within the search results
  5. Added the node to a restricted realm via an node access record
  6. The search results do not reveal the restricted node anymore. Without re-indexing.
  7. If I remove the node restriction, the search results are revealing the restricted node. Also without re-indexing.

So I assume, that something else has already taken care of this issue. I am closing this issue. Please re-open, if you can reproduce this issue.