While testing usasearch_api to populate our search index, I saw error messages and other content in search results that should not have been included. In the JSON object sent via the API, the content property contained the full rendered markup for the page, including the page header and footer. It also included messages and contextual links that should not be visible to anonymous users. I can search for text that should not be visible to anonymous users and get results from the search index, even though that text is not rendered when I view the page as an anonymous user.
I believe this may be occurring because we use panels to control the node display rather than the display mode set on the content type. Our panelization for content types includes the page header, footer, breadcrumbs, navigation, system messages, individual node fields, and other custom panes (including some which reference other content).
From the code that generates the content to be indexed, there doesn't appear to be a way to change how the document content is generated; it can only be changed afterwards. The alter hook that appears to exist receives only the document, not the original node used to generate the content.
Comment | File | Size | Author |
---|---|---|---|
#5 | edit_content_sent_to_be-2848883-2.patch | 3.62 KB | schiavone |
#2 | edit_content_sent_to_be-2848883-1.patch | 1.97 KB | schiavone |
Comments
Comment #2
schiavone (Snake Hill Web Agency) commented:
Here's a patch. Give it a try. This should resolve the issue. If it does, we'll roll it into a release.
Comment #3
pixlkat commented:
This check still references the $anon_user variable, which has been removed above. Either the variable should not be removed, or the assignment of the anonymous user to the global $user variable should be moved ahead of this check and used in the call to node_access() in place of $anon_user. As it stands, node_access() will receive a NULL value for the account, which is not the desired behavior.
Otherwise, the code resulted in the removal of content which should not be visible to anonymous users.
The other issue is strictly panels-related as the view mode is replaced with the panelized version of the node. I think we might be able to solve this by adding a variable for the content display mode which would default to 'full'. We could then override that and remove the header/footer and extra content we don't want indexed along with the node.
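To make the two suggestions above concrete, here is a rough Drupal 7 sketch, not the module's actual code. The function name `example_render_node_as_anonymous()` and the variable name `example_index_view_mode` are hypothetical; only `node_access()`, `node_view()`, `drupal_render()`, `drupal_anonymous_user()`, `drupal_save_session()`, and `variable_get()` are core APIs.

```php
<?php
/**
 * Hypothetical helper: renders a node as an anonymous visitor would see it.
 *
 * Returns an empty string when anonymous users may not view the node.
 */
function example_render_node_as_anonymous($node) {
  global $user;

  // Remember the current account and session-saving state for later.
  $original_user = $user;
  $original_session_state = drupal_save_session();

  // Disable session saving while impersonating, then switch accounts.
  drupal_save_session(FALSE);
  $user = drupal_anonymous_user();

  $content = '';
  // $user now holds the anonymous account, so node_access() never gets NULL.
  if (node_access('view', $node, $user)) {
    // Assumed variable name; defaults to the 'full' view mode.
    $view_mode = variable_get('example_index_view_mode', 'full');
    $build = node_view($node, $view_mode);
    $content = drupal_render($build);
  }

  // Restore the original account and session-saving state.
  $user = $original_user;
  drupal_save_session($original_session_state);

  return $content;
}
```

Setting the assumed variable to a dedicated view mode would let a site index a display without the panelized header, footer, and extra panes, provided that view mode is not itself panelized.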
Comment #4
schiavone (Snake Hill Web Agency) commented:
Thanks for the feedback, @pixlkat. I agree that users will want some control over the view mode for the indexed content, so that will be a good addition. I'll re-roll with the added functionality.
Comment #5
schiavone (Snake Hill Web Agency) commented:
I've re-rolled the patch with the new feature for selecting a view mode for the content that is submitted to the index. This adds the ability to explicitly remove fields from what gets indexed. It is similar to the setting that gives the same control over what text is displayed in the search results.
@pixlkat, please help out by trying the patch. Once reviewed, it will get rolled into a new release.