I am having an odd issue.

If you search the site for a simple string like "stand", the first two pages of results are legitimate, but then on page three the results turn to news articles that all show the same (incorrect) content excerpt. None of these news stories contains the search term, yet each one shows an excerpt from a different page that does contain it.
https://www.tballiance.org/search/search_by_page/stand

The excerpt that is repeated over and over appears ONLY on this page - https://www.tballiance.org/portfolio - and is a block placed there using the Context module, with a condition checking for the path "portfolio".

Similar results occur if you search for another word that exists in that same block, such as "administer" - https://www.tballiance.org/search/search_by_page/administer
Partway down the first page, the same incorrect repeated results begin.

Any ideas? Let me know if I explained that poorly.

Thanks

Comments

jamesrice created an issue. See original summary.

jhodgdon’s picture

Title: False results » False results with Context module

Hm... So, let me explain a bit how this module works:

a) Pages are indexed during Drupal "cron" runs. During this process, each page is rendered, and then the text in the Content region is added to the search index.

b) When you search for something, the search index is used to find matches between the entered keyword(s) and pages. So only things that get into the search index should be found in search results.

c) During the display of search results, the text that was found during indexing for the Content region is used to generate an excerpt.

So it looks like this block is getting added to the search index for all the pages in (a), which means something is going wrong during the rendering in the Search by Page module, causing this block to be displayed on more pages than the Context module should be putting it on.
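To make steps (a) to (c) concrete, here is a minimal toy model in Python. It is not Search by Page's actual code; the function names, page paths, and block text are all made up for illustration. It only shows why a block that wrongly ends up in the indexed Content text makes unrelated pages match the keyword and show the same excerpt.

```python
import re

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(rendered_pages):
    """Step (a): map each word to the set of paths whose rendered content has it."""
    index = {}
    for path, content in rendered_pages.items():
        for word in tokenize(content):
            index.setdefault(word, set()).add(path)
    return index

def excerpt(content, keyword, width=30):
    """Step (c): cut a snippet around the first keyword hit in the indexed text."""
    pos = content.lower().find(keyword.lower())
    if pos == -1:
        return ""
    start = max(0, pos - width)
    return content[start:pos + len(keyword) + width].strip()

def search(index, rendered_pages, keyword):
    """Step (b): look the keyword up in the index, then build excerpts."""
    paths = sorted(index.get(keyword.lower(), set()))
    return [(p, excerpt(rendered_pages[p], keyword)) for p in paths]

# Hypothetical block text and pages (assumptions, not the real site content).
BLOCK = "We stand with our partners."
rendered = {
    "portfolio": "Our portfolio. " + BLOCK,
    "news/1": "Trial results announced. " + BLOCK,  # block leaked in during indexing
    "news/2": "New grant awarded. " + BLOCK,        # block leaked in during indexing
}

index = build_index(rendered)
results = search(index, rendered, "stand")
for path, snippet in results:
    print(path, "->", snippet)
```

In this model the news pages match "stand" only because the block text was part of what got indexed for them, which is the symptom reported above.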

My guess is that it is a caching problem, because a bunch of pages are rendered during the same cron run. Context isn't used to that; it probably assumes that once it has decided what the context of a page is, it can continue using that until PHP starts up again. Search by Page breaks that assumption, because it renders multiple pages during cron. Search by Page does try to restore the page-related cache stuff after each page that it renders, but probably there's something in Context that also needs to be cleared out... not sure what it is, but that would be my best guess.

You could possibly get around this by setting the number of pages to be indexed per cron run to a very low number, like 1, and running cron often. You'd also need to reset the search index so that everything will get reindexed.
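As a rough illustration of why a batch size of 1 could help, here is a toy Python simulation. It assumes (as guessed above) that the block list is cached once per request and that each cron run is a separate request; `active_blocks`, `cron_run`, and `index_site` are hypothetical names, not functions from Context or Search by Page.

```python
def active_blocks(path, static_cache):
    """Toy stand-in for a block list that is computed once per request and cached."""
    if "block_list" not in static_cache:
        static_cache["block_list"] = ["portfolio_block"] if path == "portfolio" else []
    return static_cache["block_list"]

def cron_run(queue, items_per_run):
    """One cron run is modeled as one request, so its static cache starts empty."""
    static_cache = {}
    indexed = {}
    for _ in range(min(items_per_run, len(queue))):
        path = queue.pop(0)
        indexed[path] = list(active_blocks(path, static_cache))
    return indexed

def index_site(paths, items_per_run):
    """Run cron repeatedly until every queued page has been indexed."""
    queue, indexed = list(paths), {}
    while queue:
        indexed.update(cron_run(queue, items_per_run))
    return indexed

paths = ["portfolio", "news/1", "news/2"]
batched = index_site(paths, items_per_run=10)    # all pages share one cache
one_by_one = index_site(paths, items_per_run=1)  # fresh cache for every page

print("batched:   ", batched)
print("one by one:", one_by_one)
```

In the batched case the news pages inherit the block list computed for the first page; with one page per run, each page gets its own calculation. This only models the suspected mechanism, so it does not prove the workaround fixes the real site.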

Alternatively, we could dig into Context and figure out what needs to be reset after each page build during the cron run.

jamesrice’s picture

Thanks for the suggestion. I am going to set up a test environment and try the 1 page per cron run suggestion you made. I will report back with the result.

This particular block only goes on one page and uses some very simple logic. If I used the regular block configuration instead of Context, would that prevent this problem?

Thanks for the quick reply

jhodgdon’s picture

It should, but I am not absolutely certain. You could try that and then hit the button to re-index the site (which will leave the existing search index intact as it runs through everything again and updates it).

jhodgdon’s picture

Status: Active » Postponed (maintainer needs more info)

Hi, it's been a while on this issue... Did setting the cron count to 1 fix your problem?

I think that may be the only viable solution. I took a look at the code in the Context module, and came to these conclusions...

First I looked at the file plugins/context_reaction_block.inc, which is the plugin that decides what blocks to show on the page. It has a function block_list(), which does the work:
http://cgit.drupalcode.org/context/tree/plugins/context_reaction_block.i...
That function uses a call to &drupal_static('context_reaction_block_list'); to static-cache the list of blocks to be displayed. In an ordinary page load, that is fine, because the static cache only lasts for the duration of the page load. But during Search by Page indexing, this causes problems: the static cache variable is not cleared between pages, so the pages that are indexed later will use the blocks from the first page that was rendered and indexed.

Inside that calculation, you'll also notice a call to context_active_contexts(), which figures out which contexts are active for the given page load. This is also a problem, and not only for blocks: it static-caches information about which contexts are relevant for the page, so other types of reactions could also change how the later pages are rendered.
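The two static caches described above can be modeled with a short Python sketch. This is a toy analogue, not Context's code: `_static` stands in for drupal_static() storage, the cache keys and context/block names are invented, and the `reset_between` option is a hypothetical illustration of clearing static data after each page build. Whether clearing those entries would be enough in the real module is not established here.

```python
_static = {}  # toy analogue of per-request static storage

def active_contexts(path):
    """Decide which contexts apply, caching the answer for the whole request."""
    if "active_contexts" not in _static:
        _static["active_contexts"] = ["portfolio_context"] if path == "portfolio" else []
    return _static["active_contexts"]

def block_list(path):
    """Build the block list from the active contexts, also cached per request."""
    if "block_list" not in _static:
        contexts = active_contexts(path)
        _static["block_list"] = ["portfolio_block"] if "portfolio_context" in contexts else []
    return _static["block_list"]

def render_content(path):
    """Render the Content region for one path (here: just list its blocks)."""
    return {"path": path, "blocks": list(block_list(path))}

def index_pages(paths, reset_between=False):
    """Render several pages in one request, optionally clearing statics between them."""
    _static.clear()  # a new request starts with empty static storage
    rendered = []
    for path in paths:
        rendered.append(render_content(path))
        if reset_between:
            _static.clear()
    return rendered

leaky = index_pages(["portfolio", "news/1"])
clean = index_pages(["portfolio", "news/1"], reset_between=True)

print("without reset:", leaky)
print("with reset:   ", clean)
```

Without a reset, the second page reuses both the cached contexts and the cached block list from the first page. Note that both caches have to be cleared in this model; clearing only the block list would rebuild it from the stale contexts.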

So... I think that setting the cron size to 1 is unfortunately the only solution if you want to use Search by Page on a site with the Context module, if Context is being used to put blocks into the Content region of the page (which is the part that Search by Page indexes).

I'm curious if that worked for you? If so, I might add a note to the README file about this issue, because it seems like it could be a common problem.

jhodgdon’s picture

Category: Bug report » Support request
Status: Postponed (maintainer needs more info) » Fixed

Given that setting the cron size to 1 is probably the only viable solution, I'm going to just set this to be a support request and mark it fixed. It seems that indexing with Context, if you have different blocks in the content region of your page on different pages, is just going to cause problems. I'll add a note to the README though.

  • jhodgdon committed d439842 on 7.x-1.x
    Issue #2833567 by jhodgdon: Document how to overcome problems with...
jhodgdon’s picture

I also added the ability in Search by Page to set the number of items indexed per cron run to 1. Previously the lowest choice was 10 items per cron run.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.