I'm not sure why, but it appears that the search api sometimes marks items to be deleted via a task instead of via a tracked item change. However, when the search service (solr) is unavailable for a while, a lot of tasks can quickly pile up. In my case I ran into a scenariou where those were ~6. 10^6 tasks - obviously search API fails to handle *all* of those tasks at once before indexing and thus becomes broken.
So either the task-handling code needs to become a reliable (queue?) implementation or maybe the problem is having so much item deletions in there at all?
Estimated Value and Story Points
This issue was identified as a Beta Blocker for Drupal 8. We sat down and figured out the value proposition and amount of work (story points) for this issue.
Value and Story points are in the scale of fibonacci. Our minimum is 1, our maximum is 21. The higher, the more value or work a certain issue has.
Value : 2
Story Points: 5
Comment | File | Size | Author |
---|---|---|---|
#9 | 2319103-9--fix_server_tasks_scalability.patch | 47.16 KB | drunken monkey |
Comments
Comment #1
drunken monkeyWell, if so many items get deleted on your site, there's hardly a way we can change that. But I agree, the potential of having so many tasks was never considered when implementing this system, and having them would of course lead the whole system to collapse.
Probably the best way forward here would be to a) limit the tasks executed in each call of
search_api_server_tasks_check()
(or its D8 equivalent) and b) fail if there were more than that. That would at least ensure that tasks get processed at all, even if other operations on the server would fail in that time. A nice addition would then be if we'd also provide a button on the server's "View" page that would execute all pending tasks using the Batch API. That should somewhat help if there are much too many tasks, as in your case.Oh, now that I think about it, there's another problem in D8: as for all other operations on the server, we check for pending tasks before doing a "delete all" operation. However, if we delete all items, pending "delete" operations shouldn't keep us from doing/trying that, so we should first invalidate all those pending tasks.
In your case, I would recommend either removing and re-adding the index from/to the server to automatically clear all the tasks; or manually deleting these items and removing the tasks in code.
Comment #2
Nick_vhComment #3
drunken monkeyThis implements all my suggestions from #1, also adding test coverage, and should therefore (hopefully) fix this problem. It also expands the general capabilities of the server task manager somewhat, especially regarding overridability.
One thing I wondered, though, is why
Server::deleteAllItems()
doesn't have any kind of task system integration currently. The issue where it's added (#2230925: Implement server tasks system) doesn't mention anything about this.Does anyone have any idea?
Comment #4
borisson_I love the new tests! Very readable and they look like they cover everything, woo! No idea about the question in #3.
I think
$pending_tasks
is a better variable name for this, but I don't feel very strong about it.Do you think we can inject this service instead? That might be something to discuss / implement in a follow-up though.
Again,
$pending_tasks
?Can we use strict comparison for comparing strings?
Execute the SELECT query and store the used limit, the query object doesn't have a method to retrieve the range.
sounds more confident. :)Can we mention here, that if no server is passed along, we results for all servers will be returned?
This is just a \stdClass, right? we can typehint on that or at least change the documentation to \stdClass instead of object.
Comment #5
drunken monkeyProbably not everything, but I tried my best. Thanks for the praise!
Damn.
Any opinion on whether we should change that? (In a follow-up, probably, though.)
Maybe we ran into troubles since it uses
deleteAllIndexItems()
internally, which has task-integration itself? Would have to see about that.No, unfortunately entities cannot use dependency injection at all, otherwise we'd use a lot there. Really a hole in the framework there.
However, no such excuse for the
ServerStatusForm
– added DI there.If we know that both will be strings, I see no reason to do that.
However, this check didn't seem to work, for some reason, so that code is gone anyways.
Maybe, but it doesn't put the blame so elegantly on Core's doorstep. And I'd think most people reading this would agree that it's weird.
No, definitely not.
stdClass
is never anything that should be relied upon, in my opinion – and, generally, you also shouldn't type-hint class names. Especially ones you cannot subclass.I don't think adding that in the documentation would give us any benefit, either, but it might be wrong in some cases.
Fixed your other comments, tested a bit and also added a general "Execute all server tasks" functionality for the overview page (and linked to in the status report). From what I can tell, everything seems to work nicely – except that the action on the overview page is sometimes not displayed, and it's access check method isn't called. As of now, I have no idea why that would be the case. Any hints would be appreciated! (I do know much too little about our new routing system in general, I'm afraid. Just copied bits and pieces from Core, and from our existing definitions.)
Comment #8
borisson_I think we should probably make those changes, to integrate server::deleteAllItems with the task system in a followup. Not sure why the access check isn't called as expected.
I think we should import this class, so we don't have to do this with the full namespace.
Comment #9
drunken monkeySeems like the test has the same problem (even after eliminating all the other problems it had), so I had to debug that anyways.
What I found, though, seems to be a Core bug: #2722237: "Local actions" block doesn't take cache data of route access information into account. Let's see how that turns out. At the moment, we can just clear the render cache before accessing the page and the tests will pass. (Not ideal for actual users, of course, but then again having this link present is hardly vital, since we have the same functionality accessible through several other ways.)
Yes, of course. For some reason, PhpStorm autocompletes it like this sometimes, and it seems I missed that. Good catch!
Comment #11
borisson_Comment #13
drunken monkeyGreat, thanks a lot!
Committed.
Moving back to D7 – in case someone wants to pick it up. I probably won't get to it in the foreseeable future.