As part of the effort to leverage Search API on Drupal.org, I realized that SearchApiViewsHandlerFilterOptions stores all the possible options in the views data definition.

This is bad for two reasons:

  • First, it unnecessarily clobbers the Views data, which is already very big
  • Second, it makes impossible to have a dynamic option list down the line, like the one we have as part of Dereference list

I suggest we just store the minimal information required to rebuild the wrapper at runtime, and call ->getOptions('list') on that.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Status: Active » Needs review
FileSize
6.01 KB

Here is what I have in mind.

drunken monkey’s picture

Hm, I do see the problem of storing too much data in the Views data table – however, having to retrieve the options again each time the view is displayed could also be a massive performance drain. Take taxonomy term references for example – if I'm not mistaken, the options list callback for such fields actually calls taxonomy_get_tree(). With lots of taxonomy terms on a site, this will take ages.

So, I think I'd prefer if you could just override the filter handler for that one field – should be enough, right?

Any other opinions on that?

mh86’s picture

I ran into the same issue as I was wondering why a cache clear is eating up all my memory.

I have 10-15 taxonomy fields that get indexed in a few different Search API indexes. For all of them all possible taxonomy terms get loaded on a cache rebuild. Most vocabularies are pretty small, except one with regional data that has around 10.000 entries. For none of the bigger vocabularies it makes sense to display all possible values in an exposed filter. For this use case we should stick to autocompletes. Furthermore no one would expect that you have to clear the cache to see newly inserted terms in your exposed filters (second point Damien mentioned).

I did not test Damien's patch yet, just did some quick testing with and without calls to $wrapper->getOptions('view'), the differences for my use case are huge.

With $wrapper->getOptions('view'):

  • Cache Clear with Drush, Total Inc. Peak Memory Use (XHProf): 330MB
  • views_data cache object in the cache_view database table: 2,7MB

Without $wrapper->getOptions('view'):

  • Cache Clear with Drush, Total Inc. Peak Memory Use (XHProf): 260MB
  • views_data cache object in the cache_view database table: 2,2MB

When enabling the i18n_taxonomy module, my memory consumption on cache clear even goes up to 400MB, because it tries to load translations for each term.

Not sure what the best solution is. The options list callback is also used for several other fields / properties, for some use cases the pre-calculation makes sense (e.g. list field with fixed values), for others it doesn't.
Furthermore I currently don't see any possibilities to alter the field definitions before the Search API executes the options list callback.

drunken monkey’s picture

Title: SearchApiViewsHandlerFilterOptions should delay use of $wrapper->getOptions('view') » Fix performance issues of Views options filter handler for huge options lists
FileSize
7.08 KB

Attached is a re-roll of the patch from #1, which also fixes several issues it had. It seems to work well, and the options filters continue working as before. I also took a look at the performance, and it's still pretty OK – the maximum, for a term reference on a related entity, was a bit more than 10 ms (for get_value_options()), in a ~1 s page request.

However, some fields or properties in contrib which I didn't test might use more complicated options list callbacks, so it might still be a good idea to leave both methods in – probably via a variable. We could also look at the number of options available in hook_views_data() and include options lists below a certain size (~50 items?) directly in the cached data. The size could be configurable, with settings to always or never store the options also available.
What would you say, does that make sense?
(This would also eliminate one further problem I could imagine, i.e., that a property might have an options list callback which returns an empty array. Currently, this doesn't get the "options" type, which makes sense I think, but with the attached patch it would.)

Furthermore I currently don't see any possibilities to alter the field definitions before the Search API executes the options list callback.

Ah, of course, that's true. But since you'll probably never need the options list for that vocabulary anyways, you could just alter the Entity API metadata of the term reference field(s) directly.

mh86’s picture

Hey Thomas,

thanks for taking a look at it. I've just applied your patch and tested it a little bit - my performance issues on cache clear are gone and the options list filters are still working.

Even though it helps for my use case it would be great to get the opinion of others, especially use cases with lots of exposed filters would be interesting for performance comparison.

We could also look at the number of options available in hook_views_data() and include options lists below a certain size (~50 items?) directly in the cached data. The size could be configurable, with settings to always or never store the options also available.

Once we start loading the items to count them, the performance problem will be back.
Maybe it makes sense to do the pre calculation for simple properties, like list fields with text, integers or floats, but not for properties that are referencing other entities, like taxonomy terms, or more generally properties for entity reference fields.

Ah, of course, that's true. But since you'll probably never need the options list for that vocabulary anyways, you could just alter the Entity API metadata of the term reference field(s) directly.

I would really like to avoid that, maybe some other modules are using the options list callback in a meaningful way.

drunken monkey’s picture

thanks for taking a look at it. I've just applied your patch and tested it a little bit - my performance issues on cache clear are gone and the options list filters are still working.

Even though it helps for my use case it would be great to get the opinion of others, especially use cases with lots of exposed filters would be interesting for performance comparison.

Excellent, good to hear! And seconded, if someone else could test it that would be great!

Once we start loading the items to count them, the performance problem will be back.

Yes, but you could deactivate that with the variable.

Maybe it makes sense to do the pre calculation for simple properties, like list fields with text, integers or floats, but not for properties that are referencing other entities, like taxonomy terms, or more generally properties for entity reference fields.

I think that distinction would be a bit arbitrary. Of course, it makes sense most of the time, especially with core, but in contrib you can't know what kind of options list callbacks people might have.

I would really like to avoid that, maybe some other modules are using the options list callback in a meaningful way.

Even so, it sounds like that would bring your site to its knees …
But of course, this isn't really a solution, we should definitely work around this problem somehow in the Search API. If we can fix it with this patch, then great!

mh86’s picture

If we add a variable, which variant should be used by default? I'm not so sure about it...

drunken monkey’s picture

A threshold of 50 could be the default, and people who see problems could switch it off altogether.
But we should probably have someone test it with a lot of exposed term filters – if the performance loss is negligible we can just use the patch as-is and not bother with a variable. Also, with this patch it's at least possible to override the behavior to regain the old one – as you said, it's currently not possible to stop Search API Views from loading the options list.

drunken monkey’s picture

OK, new development: in #2111899: Specialized filter for taxonomy terms drumm proposes (and presents) a dedicated filter for taxonomy terms. The current patch only changes the handler for free-tagging term reference fields, but when we'd include all of them that would also solve this issue, at least for taxonomy terms. You'd only get a select list with hierarchy if you expressly choose it.
With that out of the way, we could maybe then implement this issue with the variable, so if someone comes along another giant options list, they can at least do something about it. (And we'd also save space for large-but-not-humongous options lists.)

klausi’s picture

Issue summary: View changes
Status: Needs review » Needs work
+++ b/contrib/search_api_views/includes/handler_filter_options.inc
@@ -1,17 +1,83 @@
+  /**
+   * Stores the values which are available on the form.
+   *
+   * @var array
+   */
+  var $value_options = NULL;

The "var" keyword should not be used, and the visibility is missing. Protected? And the variable name should use camelCase, not underscores. See https://drupal.org/node/608152#visibility .

Otherwise the patch makes sense - so where do you want to put the variable to determine the behavior? On the index or on the Views options?

klausi’s picture

Ok, I think I misunderstood the issue a bit. mh86 explained to me that the variable would be a global variable_get() in the views data hook, which runs before any concrete index or view.

I don't think that is necessary, it makes way more sense to load available options on the fly. And if that is expensive people need to use proper autocomplete widgets.

So apart from the minor code style issue this should be ready to go.

drunken monkey’s picture

Status: Needs work » Fixed

The "var" keyword should not be used, and the visibility is missing. Protected? And the variable name should use camelCase, not underscores.

Oops, thanks for spotting this! I'd never use var myself, but it seems I missed it when reviewing Damien's original patch. So thanks for reviewing so thoroughly!
Regarding the property name, I'd normaly concur but Views has its own twisted, incompatible code style "do-what-you-likes", and while I loathe them and ignore them whenever possible, I do think that in this case just keeping with Views' lowercase-with-underscores style makes the code look a bit more consistent. So leaving that for now, and praying to the powers-that-be that being in core has improved Views' code style in D8.

Anyways, your general review is along the lines of what I think, too. So: committed.
Thanks again for your work and input, everyone!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.