While testing usasearch_api to populate our search index, I saw error messages and other content in search results that should not have been included. The JSON object sent via the API shows that the content property includes the full rendered markup for the page, including the page header and footer. Messages and contextual links that should not be visible to anonymous users are also included. I can search for text that should not be visible to anonymous users and get results from the search index, yet when I view the page as an anonymous user, that text is not rendered.

I believe this may be occurring because we use panels to control the node display rather than the display mode set on the content type. Our panelization for content types includes the page header, footer, breadcrumbs, navigation, system messages, individual node fields, and other custom panes (including some which reference other content).

From the code that generates the content to be indexed, there does not appear to be a way to change how the document content is generated; it can only be altered afterwards. The alter hook that appears to exist receives only the document, not the original node used to generate the content.

Comments

pixlkat created an issue.

schiavone’s picture

Here's a patch. Give it a try. This should resolve the issue. If it does, we'll roll it into a release.

pixlkat’s picture

+++ b/usasearch_api/usasearch_api.module
@@ -170,31 +170,37 @@ function usasearch_api_preprocess_node(&$vars) {
+    if (node_access('view', $node, $anon_user) || $force) {

This check still references the $anon_user variable which has been removed above. It should either not be removed above, or the assignment of the global $user variable to the anonymous user should be moved ahead of this and the variable in the call to node_access() replaced. As it stands, node_access() will receive a NULL value for the account, which is not the desired behavior.
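For illustration, here is a minimal sketch of the first option (keeping an explicit anonymous account for the access check). The function name `example_node_visible_to_anonymous()` is hypothetical and not part of the patch; only `drupal_anonymous_user()` and `node_access()` are Drupal 7 core functions. It assumes a fully bootstrapped Drupal 7 site. Note that when `node_access()` is given an empty account, core falls back to the current global user, which is why passing NULL here would test the wrong account.

```php
<?php

/**
 * Hypothetical helper: checks whether a node may be indexed.
 *
 * Not the patch's actual code; a sketch assuming Drupal 7 core is loaded.
 */
function example_node_visible_to_anonymous($node, $force = FALSE) {
  // Build an explicit anonymous account (uid 0) rather than relying on an
  // undefined variable, so the check never falls back to the current user.
  $anon_user = drupal_anonymous_user();

  // Index the node if it is forced or if anonymous users may view it.
  return $force || node_access('view', $node, $anon_user);
}
```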

Otherwise, the patch worked as intended: content that should not be visible to anonymous users was removed from what gets indexed.

The other issue is strictly panels-related as the view mode is replaced with the panelized version of the node. I think we might be able to solve this by adding a variable for the content display mode which would default to 'full'. We could then override that and remove the header/footer and extra content we don't want indexed along with the node.
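A rough sketch of that idea, assuming a bootstrapped Drupal 7 site: the variable name `example_index_view_mode` and the function name are hypothetical, and whether Panels or Panelizer takes over a given view mode depends on site configuration. The node is rendered while temporarily switched to the anonymous user, following the session-safe impersonation pattern from Drupal 7 core.

```php
<?php

/**
 * Hypothetical helper: renders a node for indexing in a configurable view mode.
 *
 * A sketch only; 'example_index_view_mode' is an assumed variable name.
 */
function example_render_for_index($node) {
  global $user;

  // View mode used for indexed content; defaults to 'full'.
  $view_mode = variable_get('example_index_view_mode', 'full');

  // Temporarily switch to the anonymous user without touching the session.
  $original_user = $user;
  $old_state = drupal_save_session();
  drupal_save_session(FALSE);
  $user = drupal_anonymous_user();

  // Build and render only the node in the chosen view mode.
  $build = node_view($node, $view_mode);
  $output = drupal_render($build);

  // Restore the original user and session-saving state.
  $user = $original_user;
  drupal_save_session($old_state);

  return $output;
}
```

A site could then set the variable to a dedicated view mode that omits the header, footer, and other panes that should not be indexed.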

schiavone’s picture

Thanks for the feedback @pixlkat. I agree that users will want some control over the view mode for the indexed content, so that will be a good addition. I'll re-roll with the added functionality.

schiavone’s picture

I've re-rolled the patch with the new feature for selecting a view mode for the content that is submitted to the index. This adds the ability to explicitly remove fields from what gets indexed. This is similar to the setting that gives the same control over what text is displayed in the search results.

@pixlkat please help out by trying the patch. Once reviewed, it will get rolled into a new release.

  • schiavone committed 432f1d9 on 7.x-5.x
    Issue #2848883 by schiavone, pixlkat: Content sent to be indexed...
schiavone’s picture

Status: Active » Fixed
schiavone’s picture

Assigned: Unassigned » schiavone

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.