Problem/Motivation
A search result page shows title(s) and the excerpt in each search result item. However, it contains any item even though a user doesn't have any permission to view an item since Search module bypasses hook_node_access
while building the list of the search results.
This is the expected result, since hook_node_access
does not operate on listings.
However, when using a node access module that writes node access grants, a change to the grants and a rebuild of the node access table may cause search results to appear based on the prior grants.
Proposed resolution
Invalidate the search index cache tags when node_access_rebuild() is called.
Remaining tasks
None.
User interface changes
None.
API changes
None.
Data model changes
None.
Comment | File | Size | Author |
---|---|---|---|
#49 | 2784135-49.patch | 2.54 KB | Peter Majmesku |
#48 | 2784135-48.patch | 4.13 KB | Peter Majmesku |
#34 | 2784135-34.patch | 870 bytes | pwolanin |
#24 | 2784135-17.patch | 5.49 KB | Peter Majmesku |
#17 | 2784135-16.patch | 800 bytes | yas |
Comments
Comment #2
yasComment #3
yasComment #5
yasComment #8
yasComment #9
yasCI error occurred, so trying once again.
Comment #10
Peter MajmeskuI can confirm that this patch does the job. This is exactly what I was searching regarding an issue in my contrib module: https://www.drupal.org/node/2831165. Please apply this patch to the core.
Comment #11
yasComment #12
tstoecklerThis should not invoke
hook_node_access()
manually, but instead call$node->access()
. Then we also don't need this array check, but can just check$access_result->isAllowed()
on the return value of that.Comment #13
yas@tstoeckler
Thank you for your guidance. Please let me suggest the new patch.
Comment #14
yasRevamped.
Comment #15
tstoecklerThanks, the patch looks better now in my opinion. I found two minor things that I think we can improve, but more importantly in order for this to go in, we need an automated test proving that it works.
You can remove the
$return_as_object =
part, just passTRUE
.I think it would be more readable to instead to
if (!$access_result->isAllowed())
Comment #16
BerdirYou could actually remove the third argument completely, then you get a boolean back.
But, we might lose cacheability metadata then, calling it with TRUE is correct, but we need to merge the cacheability metadata of that into the render array then.
Comment #17
yas@tstoeckler
Thank you for your review. Yes, I agree to your comment and actually those are more natural to me since I wrote exactly the same code that you pointed out at first. However I know that I suggested the previous patch with redundant code in order to make it more explicit:
$node->access()
by looking into the Drupal core APIs. I thought that someone like me wonders and needs to learn about the same thing in the future when s/he reads the code where we put onlyTRUE
on the third parameter in the$node->access()
method.!$variable
as "NOT" but we can also find the code such as$variable === FALSE
widely in the Drupal core code. I thought that we could give the more explicit message to the code readers.However anyway, I respect your feedback.
Comment #18
Peter MajmeskuRegarding the automated test hint from #15: Must it be a functional test by inheriting from KernelTestBase or WebTestBase or can it be an unit test, which inherits from PHPUnit_Framework_TestCase?
Comment #19
Peter Majmesku@yas: What is the state regarding the automated test (see #15)?
Comment #20
Peter MajmeskuComment #21
yas@jepSter
Good timing. Now I can tell you what I did and what I am thinking:
view
,own
,content
,access
,permission
,node_access
and/orisallowed
.view own unpublished nodes
as a result of my study. As we know, Drupal doesn't have a permission such asview own published nodes
.\Drupal\search\Tests\SearchNodeUpdateAndDeletionTest
as the basic logic. However still we need our custom permission likeview own node content
.book_breadcrumb_test.module
in Drupal core code ornode_view_permissions
\Drupal\search\Tests\SearchNodeUpdateAndDeletionTest
andnode_view_permissions
Then, I wanted to ask the Drupal community if we can include
node_view_permissions
as it is (that is, 100% code is the same) into search module test case or not. If yes, we can just change the class name toSearchNodeAccess
and the function name tosearch_node_access
.Any comment, hint, suggestions or strategy are welcome.
Comment #22
Peter Majmesku@yas: Regarding the automated test. Why so complicated as this issue is about an if-case for a loop continue statement? I would write an unit test, mock the dependency and check only if the loop works as expected. What do you think?
Comment #23
yas@jepSter
Thank you for your message.
SearchPluginBase::buildResults()
is a callback function. My questions are:'view'
permission? We can usedrupalCreateNode
and after creating it, we can change the owner (the user) of the node for the test, however how can we set up a'view'
permission associated to the user? (Again, Drupal only have an'access content'
permission.)Comment #24
Peter MajmeskuI have extended your patch by the unit test. It checks that SearchPluginBase::buildResults() retrieves the expected amount of nodes. Based on the return of isAllowed() within the mocked node objects. So you find the detail answer to question number 2 in my patch.
Regarding question number 1: I have no idea why you would need this. I think it is enough to check if the access result retrieves TRUE via isAllowed(). In your implementation you can expect that the user object just works and it has nothing to do with the context of your small code change.
For respecting the code standards, $access_result needs to be named in camel case ($accessResult). I have done so within the new patch.
Comment #25
tstoecklerPlease do not set your own patches to RTBC. (I'm assuming this was a simple mistake, but still noting here for people following along at home.)
Some notes on the patch:
I just saw that this is in the generic search plugin, but the code is node specific. So I guess we would have to add this to
NodeSearch
instead? Although that's unfortunate as thenUserSearch
would not benefit from this.Let's use
$access_result
as the variable name for now for consistency with the rest of the class.As @Berdir mentioned (and I overlooked :-/) we have to merge the cacheability metadata of the access result into the (weirdly named)
$build
variable.Something like this should work:
CacheableMetadata::createFromObject($access_result)->applyTo($built);
Maybe it's just me but I find it confusing to use both Prophecy and the PHPUnit mocking system in one test. I personally prefer Prophecy, but at least I think we should stick to one.
It's nice to use proper keys for the test cases, i.e. something like "disallowed-disallowed-allowed" or "1-allowed" or something as this will show up in test results and makes debugging a bit easier.
(Here and below) Personally, I think this would be easier to read if this were on one line. I guess that's arguable, though, so feel free to ignore.
Missing newline here.
Comment #26
Peter MajmeskuRight now I have doubts, if we should implement any code for this issue at all and the source of the problem is our comprehension of the search index build process. Because it looks like the node grants are checked on search index build and hook_node_access() is obsolet in this case. This might be the way to go and we are running risk that our code optimizations are a waste of time.
What I have done:
So before we invest time in any code tweaks, we should make sure that our code suits Drupal's behavioral specifications. Is it meant to re-build the search index after each node grant update?
If so: this is performance expensive, when it comes to restrictive websites, which should definitely keep the search index and the node grants records in sync. But it "could" work for sites with < 1.000 nodes.
The performance footprint is not that big, when we add the access check for each node in the search results listing process. So it might enrich Drupal's access functionality. Also such websites can consider to disable the search module's functionality.
Please tell what the specifications and your points of view are.
Comment #27
tstoecklerTagging "Needs subsystem maintainer review" in the hopes that one of the search.module maintainers will see this. I can't really say much to #26 I'm not that knowledgable about the innerworkings of the search system.
Comment #28
pwolanin CreditAttribution: pwolanin as a volunteer commentedThis looks like it works as designed and the module you reference should have better documentation to explain that it will not filter listing pages or search results.
Views will also include title or field of nodes in listings unless you use are supplying node access grants and node access records. Have you tried seeing if restricted content is visible in views?
hook node access is only meant to limit viewing the node page directly. I think it was a mistake actually to have added it in Drupal 7 since this confusion comes up regularly. If a module wants to control listing is needs to be a node access module and implement hook_node_grants() and hook_node_access_records().
I will make this closed, but wanted to make sure you see this comment first, or if you have any more info that would support it being a bug.
Comment #29
Peter MajmeskuWell, I am the maintainer of the Permissions by Term module. The module is filtering nodes in views listings and search result listings. But it cannot do it quite good, because of the Drupal core.
Right now I re-create all node access records on every node, user and taxonomy term save (update + create). So the node items are filtered in views. However this is not very performant. Imagine that you have > 10.000 nodes and you rebuild all node access records on each node save. This will take some time and annoy people. So I am looking for a more performant way of doing it. I could not make it work by modifying the views queries.
Regarding the search result pages: Compared to views support, the entire search index must be rebuild after a content permission update has happened. After some thinking I find it a pitty that we cannot add the access check into the search result pages build process. That would make Drupal saver. I can imagine that there are many websites which restrict their content via node access records (because this is Drupal's standard) and there's always a gap between node access records update and the search index rebuild.
This would work for any magazine websites, which need to release their content periodically. But this is not usable for intranets or applications where content needs to be shared quickly between user groups and the access needs to be kept in sync with the access restriction settings (sensible documents like contracts). This sites are just loosing Drupal's core search functionality, if they do not want to mention this documents in the search in any way (not showing up the title and not provide search functionality within this sensible documents).
I hope that I have overlooked any Drupal functionality for filtering Views (without loosing the pager functionality, like via hook_views_post_execute()) and core search. Any helpful hint would be appreciated.
Comment #30
pwolanin CreditAttribution: pwolanin as a volunteer commentedYes, you ned to rebuild the node access table if you are changing the permissions, but not the search index.
If you are having to rebuild records when any node, term, or user is saved that sounds like a flawed system.
In any event, this still sounds like it is working as designed - not a bug. And the patch as you have it would break paging, so doesn't seem like a good solution.
Comment #31
Peter MajmeskuUnfortunately yes.
The updated records in the node access table take only effect in the search results, when I re-index the site and run the cron to rebuild the search index. So there is a gap between the node access records and the search results. If this is "not" the expected behavior, then we have a bug here.
Comment #32
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedI'm sorry, but I don't actually believe that you would need to re-index to have the node access entries affect the node module search results.
Nothing about the node access information is stored as part of the search index (see schema).
You should check if perhaps there is a failure to clear cached searched results that makes it seems as though the new node access grants are not being applied.
Comment #33
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedAh, yes - that's probably it - see the setting of cache tags in \Drupal\search\Controller\SearchController::view
There may be a bug there in the sense that triggering a node access rebuild should probably invalidate the search index cache tags some way.
Comment #34
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedHere's a possible fix for the cache issue.
Comment #35
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedComment #36
Peter MajmeskuI have tested this patch. I can confirm that it works properly in relation to the records in the node_access database table. I have edited a restricted node a few times to check and the search results were as expected. Great that this can work without rebuilding the entire search index. Please apply to core.
Comment #37
Peter MajmeskuComment #39
Peter Majmesku@pwolanin: There is some functional test failing. Will you take care of it?
Comment #40
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedLooks like this is an unrelated and sporadic fail due to a bug in the test runner mentioned recently at #2829040: [meta] Known intermittent, random, and environment-specific test failures
Comment #42
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedComment #43
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedThis may be major due to the information disclosure risk
Comment #45
Peter MajmeskuWell, then this patch is probably ready. However it is always weird, when automated tests are failing periodically. There is always a reason for this behavior. Is this automated testing issue a blocker for the release of this patch?
Comment #46
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedThere is another issue to fix the sporadic fails.
We should add tests here to prove the bug and the fix.
Comment #47
Peter MajmeskuI have created a functional test, which proves that the cache tag "search_index:node_search" is not contained in the cache after node_access_rebuild() has been called. Since Cache::invalidateTags(['search_index:node_search']) has been implemented into node_access_rebuild() to ensure that nodes with updated node access records aren't listed in the search results for unauthorized users.
Comment #48
Peter MajmeskuSome test cleanup.
Comment #49
Peter MajmeskuOnce more.
Comment #50
alexpottThis is adding search dependent code to the node module. Which has no dependency on search. I think we should clear a node_access cache tag or something like that and add this cache tag to the search listing. And we should add documentation that anything that needs to respond to a node access change needs to have this cache tag.
Comment #51
Peter MajmeskuMore precisely the node module should be just separated from the search module. If we want to separate the search module from the node module, we could use an event listener in the search module. However I do not know any Drupal event, which could be triggered by an event listener inside the search module. Can you point me to the right direction? I know the current event options by this page: https://api.drupal.org/api/drupal/core%21core.api.php/group/events/8.2.x.
Comment #52
alexpott@Peter Majmesku - it's fine to use the cache tag system - we just need to clear a more appropriately named tag. It's fine for whatever is tagging cache entries with search_index:node_search to also add a tag of node_access_rebuild or whatever.
Comment #53
alexpottLooks like we'll have to add a new method and interface to the search plugins to get additional tags.
Comment #54
BerdirWe have the standard entity cache tags, invalidating those would require a query for all search pages using that plugin, but that's not that much of an overhead and then we don't need to invent/add additional cache tags, which need to be checked on every search page.
There's also plenty of search specific code in various places in node module, for example node_cron(): https://api.drupal.org/api/drupal/core%21modules%21node%21node.module/fu.... We could wrap it in a module exists and then that would be OK.
That said, wondering if that's the only scenario where this could be a problem. Wouldn't normal node lists also be affected? What if we invalidate the node_list cache tag, is that present on search pages?
Comment #55
alexpottnode_list sounds very sensible to invalidate - but we're going to need to still have a way for search plugins to declare additional tags.
Comment #56
Peter MajmeskuI may repeat here some stuff, which you have already mentioned, but I want to make the picture clearer here and summarize.
In modules/node/node.module:1195 we have one line which resets the cache for 1 node.
Search result pages "should" be tied to an entire search result list. In modules/search/src/Controller/SearchController.php:139 is a cache tag set for a view.
So we do not have a way there to flush the cache for a more general way. @Berdir: Do you have an example, which we could re-use to flush the search result cache here more properly? In which already existing cache tag we could move the search results cache tag?
Please mind that the search_index:node_search cache tag is also covered by multiple tests. They would need to be modified, if we remove this cache tag. Luckily the tests are only asserting that this cache tag exists. So there is not much to refactor.
My proposal for keeping the search module separate from the node module
We have symfony's event system in Drupal 8 - so we should treat hooks for the lack of oop-extensibility and event handling overhead by PHP 4 style as deprecations at all. So in my proposal we would add such a line into node_access_rebuild() and _node_access_rebuild_batch_finished():
The new cache tag "idea" seems unformed to me
Afterwards we would be able to trigger the event by a event listener. We would still be able to just clear the already existing search_index:node_search cache tag if we decide that an reworked cache tag is outside of the scope of this issue. The current suggestion regarding the "standard entity" cache tag seems unclear to me and I am not sure about the implementation. Since this major issue here is active since August last year, I would be able to solve this issue via the event approach in a small time amount. If you are clutching to the cache tag rewrite, then please provide verbose guidance.
Comment #57
alexpottSo as it turns out SearchPluginBase implements \Drupal\Core\Cache\RefinableCacheableDependencyTrait
All we need to add to NodeSearch is:
And that'll add the node_list cache tag to node searches. And then we can clear that in the node access rebuild code.
The test added in the latest patch needs a bit of work. It is too coupled to the implementation using cache tags. It should be testing a node search that gets cached and then something should be updated that requires a node access rebuild and then we should do the search again and ensure that the search result is updated.
Comment #58
Peter MajmeskuSounds like an easy implementation. The test on the other hand seems a bit more complicated to me.
Do you have an example in the current code base which could be used as a starting point for updating the node access and create a search result list?
Comment #59
Peter MajmeskuI cannot reproduce the issue anymore in 8.4.x-dev. What I have done:
So I assume, that something else has already taken care of this issue. I am closing this issue. Please re-open, if you can reproduce this issue.