I have a small site: 120 nodes, about 20.000 words indexed, partial search enabled.
A normal search takes about 11seconds, while the cached result loads in 1second. The fewer results a search returns, the faster it loads. No results loads in about 1second, 50 results takes 11seconds.
I placed a timer on SearchApiPageController.php::page(), at the start and right before return $build. That function seems to take up about 10seconds of the 11 it takes to build the page. So it's clear that the search is responsible.
I narrowed it down to one line, taking op 9.75seconds of the 9.78 used in the function:
$result = $query->execute();
This seems related to: https://www.drupal.org/project/search_api/issues/2898327
Although I tried all patches there and the search time does not change.
I will continue investigating this week and I'll report back.
| Comment | File | Size | Author |
|---|---|---|---|
| #12 | 3015114-12--highlight_performance.patch | 2.24 KB | drunken monkey |
Comments
Comment #2
mvogel commentedThank you for investigating. I got kinda the same issue, especially when I add a second entity type so you search for node and user for example.
It seems to be related with the "content access" preprocessor when I disable it the search is fast.
Comment #3
weseze commentedI tried disabling access checks in the search API, but still 11 seconds for 50 results. I will start digging deeper.
Comment #4
weseze commentedSo I've narrowed it down to the highlight processor. As soon as I disable that, the load time drops from 11seconds to 1 second.
Comment #5
legolasboThis issue should live in the #790418: Search API issue queue because this has been narrowed down to the highlight processor, which is part of Search API, not Search API Pages. That said, I do have a patch in the making to address this issue.
Comment #6
legolasboI managed to reproduce this issue on a vanilla Drupal installation by taking the following steps:
drupal crnWhile looking through the highlighting code I noticed a couple of inefficiënt code constructs and changed those to be more efficiënt. That reduced the waiting time to 1-2 seconds when highlighting with partial matching is enabled.
@weseze Can you please test the attached patch and see if it resolves your issue?
Comment #7
legolasboComment #8
borisson_Oh wow, this looks like it makes a ton of difference. I'm going to review this properly later today but at first glance this is looking really really awesome.
Assigning this to me, in the hope that that makes me remember to do that :)
Comment #9
mkalkbrennerI already aware of these kind of optimizations. But shouldn't we follow the recommendation for PHP 5.6+?
Comment #10
legolasbo@mkalkbrenner,
I agree that the php 5.6+ approach would be better, but as far as I am aware search_api still supports php 5.5, because that's the minimum version supported by Drupal 8 core, which is why I went for the old solution. I'd be more than happy to change this if the maintainers agree.
Comment #11
borisson_I had this assigned to me, but I don't have the bandwith to actually look at this. This looks super solid though, so it get's a +1 from me.
Comment #12
drunken monkeyWow, if the performance gain of this is really as large as you say, that would of course be huge! Thanks a lot for finding this, weseze – are you maybe able to test the patch and verify the improvements for you?
Of course many thanks to legolasbo as well, nice job with the optimizations! Just two notes:
Have you actually tested whether this is an improvement?
I remember benchmarking this once and array_diff() is really incredibly slow. At least at that time, flipping the array and then doing array_diff_key() instead was faster – and the current code does something similar.
It’s of course nice that this is more concise, so if there’s no difference we might as well still opt for that change – but I still always get a slight jolt when seeing
array_diff()used, thinking, “this could be a performance problem”.This will result in a warning if
$itemis empty, and it’s not obvious that it can’t be (even with a bit of research). SO, let’s just check for that to be on the safe side.Comment #13
legolasboI did not test this change specifically, but the patch as a whole. Developed and tested on a vm running php7.2, so array_diff might still be "slow" on php 5.6. That said, it probably won't matter that much, because the number of elements in both arrays being compared won't be high enough to matter (or the site in question will have far worse problems caused by the excessive number of fields in use)
Your change looks good to me and assuming that you agreed with the rest of the patch, I think this is safe to mark RTBC. That said, I think it would be best for @weseze to test if this actually solves his problem and mark it RTBC if it does.
Comment #14
borisson_Array_diff is indeed a bit slower, even on php 7.3.x I used https://3v4l.org to test this:
With array_diff - this takes 55ms/14,8mb, with array_diff_key it takes 27ms/14,6mb.
So there is a difference on that one line, but it still seems to be faster based on
Comment #15
drunken monkeyThank a lot for benchmarking, Joris! Good to know this is still a valid concern!
However, I guess Legolasbo is still right: This is called just a single time, with pretty small arrays, so there is little chance that this will ever actually matter. So, we might as well just use the more concise version for that, even though it’s misplaced as part of a performance patch.
Also agree that it would be great if weseze could test this so we can be sure of the improvements here, and that they fix the problem. (Would really be an incredible improvement, so would be great to get this committed.)
(Also, note to Legolasbo/#5: the
[#NID]markdown only works for issues, you unfortunately can’t link projects or other nodes that way.)Comment #16
borisson_Yes - so safe to ignore my comment there.
I agree it would be great to see a comment by @weseze but it has been quite some time since he opened this issue. Benchmarking this now with the help of @Nick_vh
Comment #17
borisson_In the testing I did, I couldn't find a way to see this perform better but I also did not see it slower. This gets a +1 from me based on the results by @legolasbo.
Comment #19
drunken monkeyAlright, thanks for testing!
I contacted weseze and he says he’ll have to see when he has time for testing. So, I agree, let’s just commit this.
Thanks again, everyone!
Comment #21
thomaswalther commentedI have about 90 nodes from different types and I got 300 seconds timeout or max memory size of 1258291200 bytes exhausted.
I use highlight with checked excerpt. If I deactivate the excerpt, it worked. So performance with the excerpt (indexed field) is a problem for me.
If I changed the node types from 4 to 2 its a lot faster.
Comment #22
borisson_@thomaswalther please open a new issue.