Updated comment #15. The original summary was very different, see https://drupal.org/node/2123073/revisions/6822963/view/

Currently the general search configuration page at admin/config/search/settings includes an indexing throttle option. Confusingly, this is used by the NodeSearch plugin, but not by the UserSearch plugin as there is no index for users. Even if multiple search plugins do need an indexing throttle (e.g. for a contrib plugin), it is likely that the throttle will need to be set per-plugin, as different content types will index at different speeds.

Note: There is not a simple mapping between a search plugin, a search index and a search page. It is perfectly possible for a site to use, for example, 2 search plugins, 3 search indexes and 4 search pages. This is because some plugins don't require any index, some can share an index with other plugins of the same type, and others need their own index.

Proposed implementation 1

Proposed by jhodgon in #12

Configure separately for each defined NodeSearch page. The NodeSearch::updateIndex() method will be called once per defined NodeSearch page.

We do want to have the search cron setting be per-page though so you can say "each cron run I want this page to index this many nodes" -- and really what will happen is that the total for all pages will be indexed since each NodeSearch page will just grab the next few nodes to do.

Advantages:

Easiest to implement
Clearly linked to the content being indexed - not even shown if you don't have any NodeSearch pages configured.

Disadvantages:

Confusing UX: If you have two node search pages configured, and can afford to index 100 nodes per cron run, then you should set each search page to index 50 nodes. If you then disable on of the search pages and don't update the other, your indexing rate will halve.
Performance: There are some summarising operation performed after indexing. This would be performed once per search page per cron run, rather than just once per cron run.
There is no obvious reason why different instances of NodeSearch would index at different speeds.

Proposed implementation 2

Move the setting to the NodeSearch edit page, but share the same value across all NodeSearch pages. This is similar to Content type (node) configuration, where some Field settings apply everywhere that Field is used, but others only apply to the current content type.

When multiple plugins of the same type are configured, this will need to ensure that updateIndex() either only gets called once per cron run for NodeSearch (and other plugins that use a single index) or returns immediately if it is called multiple times.

Advantages:

Clearly linked to the content being indexed.
Performance

Disadvantages:

Still a slightly confusing UX, because the same setting can be configured in multiple places.

Proposed implementation 3

Keep the indexing throttle setting on the main search configuration page, but extend it so that it can take different values for each index. If you had two user searches, two node searches and two widget searches (which require separate indexes), then the form might look as follows:

INDEXING THROTTLE

Number of content items to index per cron run [ 100 ]
Number of widgets to index per cron run for search page foo [ 20 ]
Number of widgets to index per cron run for search page bar [ 20 ]
User search does not require an index.

Advantages

Clearly communicates what will happen on each cron run.

Disadvantages

Most complicated implementation. Will presumably require each search plugin to define whether it needs a unique, shared or no index.
Clutters up the main settings page.
Separates the settings from the page-specific settings.

Files: 
CommentFileSizeAuthor
#39 search-2123073-39.patch14.17 KBianthomas_uk
PASSED: [[SimpleTest]]: [MySQL] 60,002 pass(es). View
#37 interdiff-37.txt3.22 KBianthomas_uk
#37 search-2123073-37.patch15.04 KBianthomas_uk
PASSED: [[SimpleTest]]: [MySQL] 60,005 pass(es). View

Comments

ianthomas_uk’s picture

Issue tags: +Configuration system

Tag

jhodgdon’s picture

Actually, I think a better approach would be
#1560820: Use queue for indexing content

And if we don't change to using queues, the limit should be made into a per-plugin setting rather than a global Search module setting. Each search plugin has the opportunity to add settings to the search settings page (node is already doing this), so it would be easy to move this setting over to the node module.

swentel’s picture

Category: bug » task

Yeah, anyone is free to read in settings from any module/configuration. So I would go by moving the limit to the node module instead of directly from search. However, that would mean we'd have todo that for user as well.

jhodgdon’s picture

That is incorrect. User search does not do anything during cron -- it just queries the {users} table directly when you search. There is no indexing involved.

swentel’s picture

Oh, right, I was completely mixing that variable up with the limit per page - which even doesn't exist in core as a setting, but I was working on a contrib module that exposes that. Sorry for that mix up.

So yeah, just moving that node would be fine enough here.

jhodgdon’s picture

Component: node system » search.module

We've been managing search-related issues in the search.module component rather than node, even though this would be under node.module. So, changing component.

And regarding number of results per page, can I just mention
#702940: Make the the number of search results per page settings configurable?
I don't see a strong reason why we couldn't get this fixed in core. It's been requested for a loooooong time... :)

ianthomas_uk’s picture

ianthomas_uk’s picture

The changes for #2042807: Convert search plugins to use a ConfigEntity and a PluginBag mean there will be per-search page settings, but this is actually a per-search page type setting (due to 1-1 mapping between search page type and search index).

jhodgdon’s picture

What are you saying in #8? Each search page will have its own settings. The ranking stuff is done at search time and does not go into the index. And the search index is separated out by the page, not by the page type, anyway.

ianthomas_uk’s picture

This issue is about where the indexing throttle option should go. If you're right that the index is separated by page, then this won't be a problem, but it looks to me that it is separated by page type.

At the moment we have one page of settings for all search configuration, so we're mixing settings that apply to
a) all searches, e.g. the cjk handling option.
b) all search pages of a specifc type, e.g. the indexing throttle option, which applies to all node-based searches.
c) specific search pages, e.g. the node ranking options.

2042807 splits (c) out onto it's own pages, but also introduces the ability to have more than one search page of the same type, meaning b and c aren't necessarily the same.

jhodgdon’s picture

The indexing throttle setting should go in the individual search.page configuration for indexable searches, and should be removed from the main search settings page. We should wait until #2042807: Convert search plugins to use a ConfigEntity and a PluginBag goes in however (or combine it with that issue). Different indexable plugins of different types would have different sensible defaults, and would want different settings. (I ran into that in D7 with the Search by Page contrib module, for instance -- I just had a separate setting for the throttle for that module since indexing took longer.)

And yes I am certain that the index is by page, at least once 2042807 goes in. The indexing code in NodeSearch calls:

      search_index($node->id(), $this->getPluginId(), $text, $language->id);

and $this->getPluginID() returns the individual page config's ID (machine name). Then when searching, the execute method does this:

      ->searchExpression($keys, $this->getPluginId());

which sets up the query to match only records that were indexed by this plugin.

Before 2042807 goes in, the index is by plugin ID as well, but the plugin ID is one per plugin type, since each plugin only has one page.

jhodgdon’s picture

I was partly wrong in #11 and really in the NodeSearch plugin specifically, we are sharing an index across all individual page plugin instances.

However, the NodeSearch::updateIndex() method will be called once per defined NodeSearch page.

We do want to have the search cron setting be per-page though so you can say "each cron run I want this page to index this many nodes" -- and really what will happen is that the total for all pages will be indexed since each NodeSearch page will just grab the next few nodes to do.

ianthomas_uk’s picture

Status: Postponed » Needs work
jhodgdon’s picture

What we should really do in this issue is make the cron limit a configuration option in NodeSearch instead of in search.module.

ianthomas_uk’s picture

Title: Stop reading search.settings config in node module » Move index.cron_limit setting to NodeSearch
Issue summary: View changes

Rewrote summary as the issue has drifted quite a lot.
Added two additional potential solutions.

Do these three solutions make sense? Which is the best? Is anyone even going to configure multiple search pages of the same type?

Personally I'm not keen on option 1, but I keep changing my mind about whether option 2 or option 3 has the better UX, and I'm not sure how much harder option 3 would be to implement.

jhodgdon’s picture

I think Proposal #1 is the best idea.

Given how things are right now in Core, I don't think anyone would want to configure multiple NodeSearch pages, because the only difference between them would be the rankings.

However, for some contrib plugins (such as the one I need to create for my contrib Search by Page module), you may have each search page have its own index space, in which case you definitely want each search page to have its own cron throttle limit.

So given that it's going to need to be per-page for that type of plugin, I think we should also just make it per-page for the Node search plugin. We should just make it clear (with a description text?) on the configuration page that you are saying how many nodes to index for that particular page per cron run. Which will probably be clear anyway since you are configuring a particular page.

I think proposal 2 (having one setting for all of NodeSearch but displaying it on each NodeSearch page config screen) would be too confusing for users, and too difficult to implement (we'd have to change the way cron is invoked in search.module but only for NodeSearch, because my other search plugin is going to need it invoked for each page as it is now).

Proposal 3 (having the setting be on the main search page but have different settings for each type of index) also seems difficult to implement, and then also NodeSearch will need to get that value, which defeats the original intent of this issue (decoupling search.settings from NodeSearch).

ianthomas_uk’s picture

Assigned: ianthomas_uk » Unassigned
Status: Needs work » Needs review
FileSize
7.42 KB
FAILED: [[SimpleTest]]: [MySQL] 59,576 pass(es), 2 fail(s), and 0 exception(s). View

This is an implementation of option 1 - @jhodgon convinced me that it was the pragmatic option, at least until there is a good reason for a site to have multiple NodeSearch instances.

The patch isn't quite there yet, but I'm about to be afk for a week so wanted to record my progress. At the very least SearchMultilingualEntityTest will need to be updated. Feel free to work on it while I'm away, otherwise I'll pick it up when I get back.

Status: Needs review » Needs work

The last submitted patch, 17: search-2123073-16.patch, failed testing.

jhodgdon’s picture

This looks good! It seems like all we need to do now is fix that one test that is setting the cron limit in the old way, and also fix the config update test.

ianthomas_uk’s picture

I had a chat with chx in irc, we should just remove the migration for this and file an issue in https://drupal.org/project/issues/2105305 that we need to migrate this (and rankings) to the config entity.

ianthomas_uk’s picture

jhodgdon’s picture

Excellent.

ianthomas_uk’s picture

FileSize
9.7 KB
2.28 KB

This should fix the migration failure. @chx could you check the interdiff and confirm that I've done the right thing please (ignore the last hunk)?

I updated the other failing test, but when I ran it locally it still failed. I'm not sure why, but I really have run out of time now so can't look into it any further. See you next week.

chx’s picture

Yes, the migration is correct there is #2171087: Migrate search page settings for the new migration.

ianthomas_uk’s picture

Status: Needs work » Needs review
FileSize
10.05 KB
PASSED: [[SimpleTest]]: [MySQL] 59,753 pass(es). View
2.06 KB

Minor reroll and fixed the broken test (I hadn't taken account of the configuration now being in a config entity). I've also removed an unnecessary cast, because that's enforced by the config system.

ianthomas_uk’s picture

As far as migration is concerned, this removes the cron_limit configuration. That will need a quick followup in the migrate sandbox once this has been committed.

chx’s picture

Right, I see no problems at all: we will need to add the search_cron_limit to the source (which is d6_variable anyways), 'configuration:cron_limit': search_cron_limit to process and we are done. To test, add the variable and a value of it to the dump, assert to the test and done.

jhodgdon’s picture

The patch looks OK to me... but it bothers me that there was apparently not a test that checked the ability to set the cron limit from the UI that had to be changed? As it stands right now, it looks like we'll have to test that it works manually, and some later patch could break it without triggering a test failure?

ianthomas_uk’s picture

Do you want me to write a test?

I could change the existing SearchMultilingualEntityTest to use the UI, but that would need us to either create and index 11 nodes/translations, which is fairly time consuming for the testbots and I wonder if that cost is worth the benefit when multiplied by every time the tests are run.

Alternatively we could change the UI so that we can index 5 at a time, but that means our tests would be driving the UI, which is the wrong way round.

chx’s picture

or the test can form alter so that you can index 2...

jhodgdon’s picture

We could have kind of a hybrid test:
a) Verify that changing settings in the UI updates the configuration value.
b) The test that we have now that verifies that changing the configuration value affects what is indexed in each index run.

[Edit: added] Along with a comment in the test code explaining why it's a hybrid (because the min value in the UI is 10 or whatever it is).

jhodgdon’s picture

This patch also needs to remove all the existing update code that attempts to update the search cron limit config, or else we'll end up with another issue like #2178205: Delete obsolete config key active_plugins from search module

ianthomas_uk’s picture

FileSize
1.77 KB
11.82 KB
PASSED: [[SimpleTest]]: [MySQL] 59,915 pass(es). View

Here's a test for form > config.

RE #32, existing update functions are due for removal by #2168011: Remove all 7.x to 8.x update hooks and disallow updates from the previous major version, so we do not need to maintain them.

Status: Needs review » Needs work

The last submitted patch, 33: search-2123073-33.patch, failed testing.

ianthomas_uk’s picture

Status: Needs work » Needs review

33: search-2123073-33.patch queued for re-testing.

jhodgdon’s picture

Status: Needs review » Needs work

Your patch is removing the settings save from the SearchPageListController::submitForm() method:

-    $search_settings
-      ->set('index.cron_limit', $form_state['values']['cron_limit'])
-      ->save();
ianthomas_uk’s picture

Status: Needs work » Needs review
FileSize
15.04 KB
PASSED: [[SimpleTest]]: [MySQL] 60,005 pass(es). View
3.22 KB

Oops! Here's a fix, and a test so it doesn't happen again. Thanks.

jhodgdon’s picture

OK, down to nitpicks...

a) Do we need a fieldset/details element for this setting on the config page? It doesn't seem necessary to me to have a fieldset with only one item in it.

b) I know you just copied this from the search settings page, but in the description for that field:

t(...<a href="@cron">cron maintenance task</a> ..., array('@cron' => url('admin/reports/status'))

The url() function is deprecated -- this should use a route. Also, we really should not be sending people to admin/reports/status when we have a Cron config page in the system.module, so can we send them there instead?

c) It looks like the code that saved the search settings in SearchPageListController::submitForm() is still being removed:

-    $search_settings
-      ->set('index.cron_limit', $form_state['values']['cron_limit'])
-      ->save(); 

I don't see the $search_settings->save() being put back in anywhere? Maybe we need a test added to make sure you can save the main search page settings (possibly out of scope for this issue, but then again maybe not)?

d) Oh wait, that patch is very very odd. There are two sections for SearchPageListController... Can you make a new patch? I can't read the existing patch if it has multiple sections for the same file. Weird.

I do see something added to

ianthomas_uk’s picture

FileSize
14.17 KB
PASSED: [[SimpleTest]]: [MySQL] 60,002 pass(es). View

Thanks. Here's #37 without the formatting problem. This patch may no longer be relevant depending in decisions in #2178643: Indexing status can show a negative percentage of content as having been indexed though, so I won't address the rest of your review just yet.

(If you were wondering what happened, a bit of my workflow leaked into that patch. Generally I apply to the old patch to my git staging area, and work on my changes in my working copy. Then I can create an interdiff from the working copy, but I need to stage my changes before creating the patch. If I forget, I end up with a patch like #37).

jhodgdon’s picture

Um... So do you want a review of this patch, or should we postpone on the other issue?

ianthomas_uk’s picture

Status: Needs review » Postponed

Lets postpone this on a followup to #2178643: Indexing status can show a negative percentage of content as having been indexed which will switch from a count-based to a time-based throttle.

ianthomas_uk’s picture

This is postponed on #2180389: Switch from item-based to time-based index throttling. If that issue is fixed, then this becomes a won't fix.

jhodgdon’s picture

Status: Postponed » Active
jhodgdon’s picture

Status: Active » Needs work
Issue tags: -Configuration system

If we are going to do this issue, the current patch will not quite work:
- More tests have been added to the throttling test in SearchMultilingualEntityTest.php, so that will need to be updated.
- I don't think we need the additional tests in SearchConfigSettingsFormTest -- one place for testing throttling should be fine, and I think the SearchMultilingualEntityTest covers all the necessary tests.
- It looks like the migration changes are removing the setting from being migrated. Don't they also need to add it back in in the new location for this setting?

However, now that #2178651: Consider UX implications of indexed vs non-indexed search pages has been taken care of, the UI confusion problems are greatly reduced, since the UI now shows us which search pages use the index and which don't. So, I'm not sure we need to do this at all?

Thoughts?

jhodgdon’s picture

Issue tags: +beta deadline

According to my read of the Beta data model policy, I think this would be a beta deadline issue, because it would change the config schema.

So if we want to do this, we'd better do it now.

catch’s picture

I'm not sure about the status of this issue overall, but I'd allow minor changes to configuration files between beta and when we provide a beta-beta upgrade path if they're needed to resolve issues (i.e. not for the sake of just changing the file structure).

Once the beta-to-beta upgrade path is supported, I'm not sure the change warrants the risk of an upgrade path between betas.

However if there's a usability improvement in the end, we could possibly do it as an 8.1.x change if there's no API change and there's a well tested update path.

That makes this 'beta target', 'D8 upgrade path', 'minor version target'.

jhodgdon’s picture

Status: Needs work » Closed (won't fix)

I'm thinking that with #2211241: Refactor search_reindex() into separate functions, which adds notes to the API docs and UI help/descriptions explaining that this setting is a default for pages using the index, and since we've made other changes recently in the UI to explain indexed vs. non-indexed pages, we probably do not really need to do this.

The Node plugin (and any others that want to) can go ahead and use the default setting, and any other plugin that wants to can override with its own setting. So... I'm going to go ahead and close this as "won't fix". We can reopen if we change our minds.