Updated comment #15. The original summary was very different, see https://drupal.org/node/2123073/revisions/6822963/view/
Currently the general search configuration page at admin/config/search/settings includes an indexing throttle option. Confusingly, this is used by the NodeSearch plugin, but not by the UserSearch plugin as there is no index for users. Even if multiple search plugins do need an indexing throttle (e.g. for a contrib plugin), it is likely that the throttle will need to be set per-plugin, as different content types will index at different speeds.
Note: There is not a simple mapping between a search plugin, a search index and a search page. It is perfectly possible for a site to use, for example, 2 search plugins, 3 search indexes and 4 search pages. This is because some plugins don't require any index, some can share an index with other plugins of the same type, and others need their own index.
Proposed implementation 1
Proposed by jhodgon in #12
Configure separately for each defined NodeSearch page. The NodeSearch::updateIndex() method will be called once per defined NodeSearch page.
We do want to have the search cron setting be per-page though so you can say "each cron run I want this page to index this many nodes" -- and really what will happen is that the total for all pages will be indexed since each NodeSearch page will just grab the next few nodes to do.
Advantages:
Easiest to implement
Clearly linked to the content being indexed - not even shown if you don't have any NodeSearch pages configured.
Disadvantages:
Confusing UX: If you have two node search pages configured, and can afford to index 100 nodes per cron run, then you should set each search page to index 50 nodes. If you then disable on of the search pages and don't update the other, your indexing rate will halve.
Performance: There are some summarising operation performed after indexing. This would be performed once per search page per cron run, rather than just once per cron run.
There is no obvious reason why different instances of NodeSearch would index at different speeds.
Proposed implementation 2
Move the setting to the NodeSearch edit page, but share the same value across all NodeSearch pages. This is similar to Content type (node) configuration, where some Field settings apply everywhere that Field is used, but others only apply to the current content type.
When multiple plugins of the same type are configured, this will need to ensure that updateIndex() either only gets called once per cron run for NodeSearch (and other plugins that use a single index) or returns immediately if it is called multiple times.
Advantages:
Clearly linked to the content being indexed.
Performance
Disadvantages:
Still a slightly confusing UX, because the same setting can be configured in multiple places.
Proposed implementation 3
Keep the indexing throttle setting on the main search configuration page, but extend it so that it can take different values for each index. If you had two user searches, two node searches and two widget searches (which require separate indexes), then the form might look as follows:
INDEXING THROTTLE
Number of content items to index per cron run [ 100 ]
Number of widgets to index per cron run for search page foo [ 20 ]
Number of widgets to index per cron run for search page bar [ 20 ]
User search does not require an index.
Advantages
Clearly communicates what will happen on each cron run.
Disadvantages
Most complicated implementation. Will presumably require each search plugin to define whether it needs a unique, shared or no index.
Clutters up the main settings page.
Separates the settings from the page-specific settings.
Comment | File | Size | Author |
---|---|---|---|
#39 | search-2123073-39.patch | 14.17 KB | ianthomas_uk |
#37 | interdiff-37.txt | 3.22 KB | ianthomas_uk |
#37 | search-2123073-37.patch | 15.04 KB | ianthomas_uk |
Comments
Comment #1
ianthomas_ukTag
Comment #2
jhodgdonActually, I think a better approach would be
#1560820: Use queue for indexing content
And if we don't change to using queues, the limit should be made into a per-plugin setting rather than a global Search module setting. Each search plugin has the opportunity to add settings to the search settings page (node is already doing this), so it would be easy to move this setting over to the node module.
Comment #3
swentel CreditAttribution: swentel commentedYeah, anyone is free to read in settings from any module/configuration. So I would go by moving the limit to the node module instead of directly from search. However, that would mean we'd have todo that for user as well.
Comment #4
jhodgdonThat is incorrect. User search does not do anything during cron -- it just queries the {users} table directly when you search. There is no indexing involved.
Comment #5
swentel CreditAttribution: swentel commentedOh, right, I was completely mixing that variable up with the limit per page - which even doesn't exist in core as a setting, but I was working on a contrib module that exposes that. Sorry for that mix up.
So yeah, just moving that node would be fine enough here.
Comment #6
jhodgdonWe've been managing search-related issues in the search.module component rather than node, even though this would be under node.module. So, changing component.
And regarding number of results per page, can I just mention
#702940: Make the number of results per page configurable?
I don't see a strong reason why we couldn't get this fixed in core. It's been requested for a loooooong time... :)
Comment #7
ianthomas_ukPostponed on #2042807: Convert search plugins to use a ConfigEntity and a PluginBag
Affects UI changed in #1559244: Clean up search settings page
Comment #8
ianthomas_ukThe changes for #2042807: Convert search plugins to use a ConfigEntity and a PluginBag mean there will be per-search page settings, but this is actually a per-search page type setting (due to 1-1 mapping between search page type and search index).
Comment #9
jhodgdonWhat are you saying in #8? Each search page will have its own settings. The ranking stuff is done at search time and does not go into the index. And the search index is separated out by the page, not by the page type, anyway.
Comment #10
ianthomas_ukThis issue is about where the indexing throttle option should go. If you're right that the index is separated by page, then this won't be a problem, but it looks to me that it is separated by page type.
At the moment we have one page of settings for all search configuration, so we're mixing settings that apply to
a) all searches, e.g. the cjk handling option.
b) all search pages of a specifc type, e.g. the indexing throttle option, which applies to all node-based searches.
c) specific search pages, e.g. the node ranking options.
2042807 splits (c) out onto it's own pages, but also introduces the ability to have more than one search page of the same type, meaning b and c aren't necessarily the same.
Comment #11
jhodgdonThe indexing throttle setting should go in the individual search.page configuration for indexable searches, and should be removed from the main search settings page. We should wait until #2042807: Convert search plugins to use a ConfigEntity and a PluginBag goes in however (or combine it with that issue). Different indexable plugins of different types would have different sensible defaults, and would want different settings. (I ran into that in D7 with the Search by Page contrib module, for instance -- I just had a separate setting for the throttle for that module since indexing took longer.)
And yes I am certain that the index is by page, at least once 2042807 goes in. The indexing code in NodeSearch calls:
and $this->getPluginID() returns the individual page config's ID (machine name). Then when searching, the execute method does this:
which sets up the query to match only records that were indexed by this plugin.
Before 2042807 goes in, the index is by plugin ID as well, but the plugin ID is one per plugin type, since each plugin only has one page.
Comment #12
jhodgdonI was partly wrong in #11 and really in the NodeSearch plugin specifically, we are sharing an index across all individual page plugin instances.
However, the NodeSearch::updateIndex() method will be called once per defined NodeSearch page.
We do want to have the search cron setting be per-page though so you can say "each cron run I want this page to index this many nodes" -- and really what will happen is that the total for all pages will be indexed since each NodeSearch page will just grab the next few nodes to do.
Comment #13
ianthomas_ukComment #14
jhodgdonWhat we should really do in this issue is make the cron limit a configuration option in NodeSearch instead of in search.module.
Comment #15
ianthomas_ukRewrote summary as the issue has drifted quite a lot.
Added two additional potential solutions.
Do these three solutions make sense? Which is the best? Is anyone even going to configure multiple search pages of the same type?
Personally I'm not keen on option 1, but I keep changing my mind about whether option 2 or option 3 has the better UX, and I'm not sure how much harder option 3 would be to implement.
Comment #16
jhodgdonI think Proposal #1 is the best idea.
Given how things are right now in Core, I don't think anyone would want to configure multiple NodeSearch pages, because the only difference between them would be the rankings.
However, for some contrib plugins (such as the one I need to create for my contrib Search by Page module), you may have each search page have its own index space, in which case you definitely want each search page to have its own cron throttle limit.
So given that it's going to need to be per-page for that type of plugin, I think we should also just make it per-page for the Node search plugin. We should just make it clear (with a description text?) on the configuration page that you are saying how many nodes to index for that particular page per cron run. Which will probably be clear anyway since you are configuring a particular page.
I think proposal 2 (having one setting for all of NodeSearch but displaying it on each NodeSearch page config screen) would be too confusing for users, and too difficult to implement (we'd have to change the way cron is invoked in search.module but only for NodeSearch, because my other search plugin is going to need it invoked for each page as it is now).
Proposal 3 (having the setting be on the main search page but have different settings for each type of index) also seems difficult to implement, and then also NodeSearch will need to get that value, which defeats the original intent of this issue (decoupling search.settings from NodeSearch).
Comment #17
ianthomas_ukThis is an implementation of option 1 - @jhodgon convinced me that it was the pragmatic option, at least until there is a good reason for a site to have multiple NodeSearch instances.
The patch isn't quite there yet, but I'm about to be afk for a week so wanted to record my progress. At the very least SearchMultilingualEntityTest will need to be updated. Feel free to work on it while I'm away, otherwise I'll pick it up when I get back.
Comment #19
jhodgdonThis looks good! It seems like all we need to do now is fix that one test that is setting the cron limit in the old way, and also fix the config update test.
Comment #20
ianthomas_ukI had a chat with chx in irc, we should just remove the migration for this and file an issue in https://drupal.org/project/issues/2105305 that we need to migrate this (and rankings) to the config entity.
Comment #21
ianthomas_ukIssue filed: #2171087: Migrate search page settings
Comment #22
jhodgdonExcellent.
Comment #23
ianthomas_ukThis should fix the migration failure. @chx could you check the interdiff and confirm that I've done the right thing please (ignore the last hunk)?
I updated the other failing test, but when I ran it locally it still failed. I'm not sure why, but I really have run out of time now so can't look into it any further. See you next week.
Comment #24
chx CreditAttribution: chx commentedYes, the migration is correct there is #2171087: Migrate search page settings for the new migration.
Comment #25
ianthomas_ukMinor reroll and fixed the broken test (I hadn't taken account of the configuration now being in a config entity). I've also removed an unnecessary cast, because that's enforced by the config system.
Comment #26
ianthomas_ukAs far as migration is concerned, this removes the cron_limit configuration. That will need a quick followup in the migrate sandbox once this has been committed.
Comment #27
chx CreditAttribution: chx commentedRight, I see no problems at all: we will need to add the search_cron_limit to the source (which is d6_variable anyways),
'configuration:cron_limit': search_cron_limit
to process and we are done. To test, add the variable and a value of it to the dump, assert to the test and done.Comment #28
jhodgdonThe patch looks OK to me... but it bothers me that there was apparently not a test that checked the ability to set the cron limit from the UI that had to be changed? As it stands right now, it looks like we'll have to test that it works manually, and some later patch could break it without triggering a test failure?
Comment #29
ianthomas_ukDo you want me to write a test?
I could change the existing SearchMultilingualEntityTest to use the UI, but that would need us to either create and index 11 nodes/translations, which is fairly time consuming for the testbots and I wonder if that cost is worth the benefit when multiplied by every time the tests are run.
Alternatively we could change the UI so that we can index 5 at a time, but that means our tests would be driving the UI, which is the wrong way round.
Comment #30
chx CreditAttribution: chx commentedor the test can form alter so that you can index 2...
Comment #31
jhodgdonWe could have kind of a hybrid test:
a) Verify that changing settings in the UI updates the configuration value.
b) The test that we have now that verifies that changing the configuration value affects what is indexed in each index run.
[Edit: added] Along with a comment in the test code explaining why it's a hybrid (because the min value in the UI is 10 or whatever it is).
Comment #32
jhodgdonThis patch also needs to remove all the existing update code that attempts to update the search cron limit config, or else we'll end up with another issue like #2178205: Delete obsolete config key active_plugins from search module
Comment #33
ianthomas_ukHere's a test for form > config.
RE #32, existing update functions are due for removal by #2168011: Remove all 7.x to 8.x update hooks and disallow updates from the previous major version, so we do not need to maintain them.
Comment #35
ianthomas_uk33: search-2123073-33.patch queued for re-testing.
Comment #36
jhodgdonYour patch is removing the settings save from the SearchPageListController::submitForm() method:
Comment #37
ianthomas_ukOops! Here's a fix, and a test so it doesn't happen again. Thanks.
Comment #38
jhodgdonOK, down to nitpicks...
a) Do we need a fieldset/details element for this setting on the config page? It doesn't seem necessary to me to have a fieldset with only one item in it.
b) I know you just copied this from the search settings page, but in the description for that field:
The url() function is deprecated -- this should use a route. Also, we really should not be sending people to admin/reports/status when we have a Cron config page in the system.module, so can we send them there instead?
c) It looks like the code that saved the search settings in SearchPageListController::submitForm() is still being removed:
I don't see the $search_settings->save() being put back in anywhere? Maybe we need a test added to make sure you can save the main search page settings (possibly out of scope for this issue, but then again maybe not)?
d) Oh wait, that patch is very very odd. There are two sections for SearchPageListController... Can you make a new patch? I can't read the existing patch if it has multiple sections for the same file. Weird.
I do see something added to
Comment #39
ianthomas_ukThanks. Here's #37 without the formatting problem. This patch may no longer be relevant depending in decisions in #2178643: Indexing status can show a negative percentage of content as having been indexed though, so I won't address the rest of your review just yet.
(If you were wondering what happened, a bit of my workflow leaked into that patch. Generally I apply to the old patch to my git staging area, and work on my changes in my working copy. Then I can create an interdiff from the working copy, but I need to stage my changes before creating the patch. If I forget, I end up with a patch like #37).
Comment #40
jhodgdonUm... So do you want a review of this patch, or should we postpone on the other issue?
Comment #41
ianthomas_ukLets postpone this on a followup to #2178643: Indexing status can show a negative percentage of content as having been indexed which will switch from a count-based to a time-based throttle.
Comment #42
ianthomas_ukThis is postponed on #2180389: Switch from item-based to time-based index throttling. If that issue is fixed, then this becomes a won't fix.
Comment #43
jhodgdonI do not think we are really going to do #2180389: Switch from item-based to time-based index throttling and #2178643: Indexing status can show a negative percentage of content as having been indexed is fixed, so let's move this back to Active.
Comment #44
jhodgdonIf we are going to do this issue, the current patch will not quite work:
- More tests have been added to the throttling test in SearchMultilingualEntityTest.php, so that will need to be updated.
- I don't think we need the additional tests in SearchConfigSettingsFormTest -- one place for testing throttling should be fine, and I think the SearchMultilingualEntityTest covers all the necessary tests.
- It looks like the migration changes are removing the setting from being migrated. Don't they also need to add it back in in the new location for this setting?
However, now that #2178651: Consider UX implications of indexed vs non-indexed search pages has been taken care of, the UI confusion problems are greatly reduced, since the UI now shows us which search pages use the index and which don't. So, I'm not sure we need to do this at all?
Thoughts?
Comment #45
jhodgdonAccording to my read of the Beta data model policy, I think this would be a beta deadline issue, because it would change the config schema.
So if we want to do this, we'd better do it now.
Comment #46
catchI'm not sure about the status of this issue overall, but I'd allow minor changes to configuration files between beta and when we provide a beta-beta upgrade path if they're needed to resolve issues (i.e. not for the sake of just changing the file structure).
Once the beta-to-beta upgrade path is supported, I'm not sure the change warrants the risk of an upgrade path between betas.
However if there's a usability improvement in the end, we could possibly do it as an 8.1.x change if there's no API change and there's a well tested update path.
That makes this 'beta target', 'D8 upgrade path', 'minor version target'.
Comment #47
jhodgdonI'm thinking that with #2211241: Refactor search_reindex() into separate functions, which adds notes to the API docs and UI help/descriptions explaining that this setting is a default for pages using the index, and since we've made other changes recently in the UI to explain indexed vs. non-indexed pages, we probably do not really need to do this.
The Node plugin (and any others that want to) can go ahead and use the default setting, and any other plugin that wants to can override with its own setting. So... I'm going to go ahead and close this as "won't fix". We can reopen if we change our minds.