Problem/Motivation
Currently, onDependencyRemoval()
of Search API Index entity doesn't handle the case when dependent fields are removed. It results in removing whole index config entity.
In addition, we set a dependency on field definition instead of field storage definition.
Proposed resolution
Remaining tasks
Extend onDependencyRemoval()
to consider removing field storage dependencies.
User interface changes
API changes
Data model changes
Estimated Value and Story Points
This issue was identified as a Beta Blocker for Drupal 8. We sat down and figured out the value proposition and amount of work (story points) for this issue.
Value and Story points are in the scale of fibonacci. Our minimum is 1, our maximum is 21. The higher, the more value or work a certain issue has.
Value : 3
Story Points: 5
Comment | File | Size | Author |
---|---|---|---|
#51 | 2541206-51--field_dependency_removal.patch | 23.21 KB | drunken monkey |
#25 | 2541206-25--field_dependency_removal.patch | 17.74 KB | drunken monkey |
#14 | consider_field_storage-2541206-14.patch | 11.69 KB | mbovan |
| |||
#11 | consider_field_storage-2541206-11-interdiff.txt | 9.17 KB | mbovan |
#11 | consider_field_storage-2541206-11.patch | 7.57 KB | mbovan |
|
Comments
Comment #1
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedChanged dependency name for fields to field storage definition config dependency name.
Extend
onDependencyRemoval()
on Search API Index to support removing field storage dependencies.There are 2 @todos:
- Making the field name generic through datasource
- Removing additional fields
Comment #2
BerdirSo the hard part is being able to route this through the data source so that it can give us the field_id to remove.
Maybe something like $datasource->getFieldIdsForDependencies($dependencies) or something like that?
Comment #3
drunken monkeyThanks a lot for reporting this problem and providing a patch! It already looks quite good. A few notes, though:
That needs to be a
|=
instead of an=
, unless I'm mistaken. Otherwise, other changes might be ignored.That's a bit weird to have if there's no
addField()
method.There is already the method of
$index->getFields()[$field_id]->setIndexed(FALSE, TRUE)
which is a bit more verbose, but should work fine.How about just making
getFieldDependencies()
public and letting it return a mapping between Search API fields and dependencies (no matter which way – we just have to keep in mind that that's an N:M relationship). It would then be called from the index automatically, of course, the datasource would just return (in the case of theContentEntity
datasource) the provider of the entity type it's configured for as the sole dependency. And theonDependencyRemoval()
code would have it easy to determine which fields need to be removed. (So it seems for both (existing) use cases the format[$dependency => $field_ids]
would be handier.)I think that would better be handled along with the previous item – when getting the dependencies from the datasource, it would just list all Search API fields that depend on each of the Field API fields – i.e., we would know that both
entity:node/field_editor
andentity:node/field_editor:entity:name
depend on thefield_editor
field. (However, we of course have to keep in mind when defining the return value ofgetFieldDependencies()
(and implementing it) that, e.g.,entity:node/field_editor:entity:field_project
would depend on two fields –field_editor
andfield_project
.The point that we should depend on field storages, not field instances, was already discussed in #2472917: Make index configuration depend on field storage not instances. Since there's no patch for that issue, yet, it's OK to fix it here, of course. I've just closed that issue as a duplicate now.
Anyways, thanks again for the great work!
Comment #4
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedUploading a new patch with some points applied from #3. Additional fields are still not considered.
There are quite some caching issues (?), not directly related to this one:
1. Create a new field (Drupal field), it won't appear on the Search API Index page until you clear the cache.
2. Create an index, enable some fields, field dependencies are not calculated each time you save changes? You have to clear the cache and save it again to see configuration updated.
3.
getFieldsByDatasource()
sometimes returns empty array even if there are fields present in the index configuration. You have to clear the cache and re-save the index configuration...Comment #5
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedJust removing unused
removeField()
method.Comment #6
drunken monkeyThanks for the reworked patch! Looks already better, but I don't think you've correctly understood my proposal?
It was about re-using the already existing (protected)
getFieldDependencies()
by making it public and letting it return a mapping of all dependencies mapped to the fields that depend on them.If done properly, this would already take care of additional fields, too.
If, for some reason, the field isn't present in the index, this will lead to a fatal error. Please check for the existence of the field before calling the method (and only set
$changed
if it is there.If the caching issues are not related, please create a new issue for them. It seems we just shouldn't cache the unindexed fields?
Comment #7
BerdirWe discussed that and couldn't really make sense of what you were suggesting since the function seems to be opposite of what we need (needs fields, returns dependencies).
But I think I understand what you are saying now. That it should just return the mapping for all fields and their dependencies, not just those that are passed in?
We can try that tomorrow.
Comment #8
drunken monkeyExactly, yes. Instead of just returning an array with the dependency keys as values, put the dependency keys into the returned array's keys and store the Search API field(s) each dependency comes from in the values. That way, you can easily use the same method for both determining the dependencies (
array_keys()
) and determining the fields which should be removed when a certain dependency is removed (foreach ($dependency_fields[$dependency] …
). Less code, less potential for bugs. And the issue with additional fields should also already be solved (though it seems we don't handle non-standard entity references, yet (i.e., non-Field API properties that reference entities) – but that's a different issue).Comment #11
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedProviding a mapping of dependencies and fields approach patch.
We tried to solve caching issues here by introducing
$useComputedFieldsCache
flag on Index because we are blocked with it and cannot properly test things. #4 1 and #4 2 should be solved with it.Currently, I have a problem on
onDependencyRemoval()
in Index when calling$this->getFieldsByDatasource($datasource_id)
. It gives me list of fields on the given datasource but without (Search API) fields that have dependency on the Drupal field that is going to be removed. I guess that's happening because ofcomputeFields()
and the$useComputedFieldsCache
flag we added, but cannot confirm.I would appreciate some feedback here.
Comment #12
drunken monkeyThanks a lot for your work on this!
First off, some remarks on the code (interdiff) I had at a glance:
We currently don't use the short array syntax in this module, I think mixing this is more confusing than useful.
What is that
@todo
supposed to mean?I think you can just save the fields in a variable before and then call
setIndexed()
andunset()
when removing fields.What do you mean with that? Don't you do that already?
As said, now that that method is public I would move this code to the
Index
class (since it will be the same for all datasources).Please remove the empty line.
Then, I'm not sure I understand the purpose of the
$useComputedFieldsCache
property correctly.At the very least, if it is
FALSE
, I don't think you should set the persistent cache to the newly calculated value?Also, for #4.1, isn't that what the new
entity_field_info
cache tag solves? Otherwise, why was that added?And #4.2 should be solved by just clearing our cache when the index is saved. Which, I think, our cache tags should already take care of? Otherwise, we should debug it, but if it hasn't got anything to do with this issue, I wouldn't want to introduce a haphazard fix here, which might (as you say yourself) interfere with our solution for this issue.
(If it makes testing harder/impossible, we'll just have to solve that issue first, before working further on this one.)
Speaking of tests: If possible, it would be great to get some automated test coverage for this issue. It's turning out really complex, and once it's fixed we should ensure that it stays fixed.
Hm, maybe for this code, with the sensitive time it's executed (in between various config changes) it might just be easier/better to only use low-level operations instead of our helper methods. I.e., loop through
$this->options['fields']
directly, and unset the removed fields directly there. Maybe that will help? And it might be possible to merge some of the loops that way, because currently your looping (explicitly and implicitly) a lot through the datasources and fields. Maybe just try to build a more suited data structure for the method and then go with that.If I can help you with something else, feel free to ping me again via mail or IRC.
Comment #13
BerdirOn the caching changes, we worked on that together, it is a bit complicated. Field definitions are pretty wrong in HEAD because of that, they're at least one save delayed. Which isn't *directly* related but it makes it very hard to debug/work on this issue.
First, those caches need to be invalidated if new fields are added, which is why we add the entity_field_data cache tag. However, that's only the start.
When you are saving the index and change the fields, we need to request them again during presave. in HEAD, that doesn't work at all because it just keeps using the computed field arrays that haven't been reset. So when the field configuration changes, then we now unset the computed field arrays. But even that is not enough, because it falls back to the cache and loads that again (which has not yet been invalidated as the view has not yet been saved).
That's why we added that flag to prevent it from using the cache at all when it has been changed in the current request. We could also delete the cache, but that method is called for every field so we would repeatedly have to delete that cache.. and there might be parallel requests to the site that are building it again, who knows. But yes, we also shouldn't write the cache if that flag is set.
Comment #14
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedRe #12:
1. Changed to array()
2. That was the line where was the problem - not loading the right fields.
3. Changed.
4. This is the line where we add dependencies. I was wondering if we need to add dependencies if there are only additional fields selected, but probably not.
5. Moved to Index class now.
6. Removed.
It should be set only if it is TRUE?
Added new test assertions in IntergrationTest which should fail now (with this patch and without).
Cannot use $this->options['fields'] as it only contains type and not property path what we currently use. Should we modify options, or
getFieldDependencies()
?Comment #16
drunken monkey@ Sascha: Thanks for the explanation! Makes sense, of course.
Still, I a) think the solution looks a bit hacky and b) worry that with just the check in
setOption()
this will not cover all cases. I.e., there's alsosetOptions()
andset()
which might change the fields, and maybe we'll add other code later.Safer, I think, would be to maybe store alongside the cached data the fields options for which it is valid – i.e., make the cache self-validating. Maybe with just a hash of
$this->options['fields']
(and additional fields) attached to the data? That way, you could right away detect when the information that was used to build the cache isn't the one for which you want the fields data now – no matter what events led to this situation. (On the other hand, I don't think this is a pattern employed elsewhere (e.g., in Core), so maybe there's a good reason not to do that. I'ts just an idea.)Also, if we end up agreeing on using the flag instead, the property should be added to the
__sleep()
method (I'd say).Yes, I'd certainly say so. Otherwise you'll cache the fields information for a version of the entity which isn't yet saved in the database – if something changes the fields along the way, you'll end up with a bad cache.
If that's the only issue, the property path can be trivially extracted from the field's machine name, with
list(, $property_path) = Utility::splitCombinedId($field_id)
.Comment #17
Nick_vhComment #18
BerdirOk. This works. Removed the flag and instead doing it with that hash.
Also had the problem that it still removed the index despite saying it would not. Which was somehow related to static caches that had already removed the field and didn't know about it anymore. So if I find an unexpected field now, I'm just flagging the entity as changed so that core recalculates the dependencies and then we're fine.
Comment #19
Nick_vhThis is great. I discussed this with Berdir in Drupalcamp Vienna and we need one more test that tests the same use case but with the additional fields. Eg, when you add a the title of an image to the search index as well and we remove that field, we need to make sure that also updates the index so that it no longer uses that additional field.
Comment #20
Nick_vhComment #21
borisson_Added tests tag.
Comment #22
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedI made a small modification, added support for nested Search API fields and provided tests for additional fields.
We still don't have support for deep nested fields, but as this is a rare case probably we can do it in a followup issue.
Comment #23
BerdirYeah, the dependencies code certainly isn't perfect yet, but this seems like a good start and this improves the situation a lot.
Comment #24
borisson_Looks great, rtbc++.
Comment #25
drunken monkeyThanks for implementing it like this, this does look a lot a lot better now.
Still, there were several things suboptimal about this, see the attached patch.
Also a few remarks:
These could also check whether the fields actually changed. But probably unnecessary for now, maybe when we move it to its own property (cf. #2317771: Move processor and/or field settings to top-level properties on the index?).
Why this change, shouldn't that be completely the same? Neither does the locked configuration depend on the status, nor is it possible to lock the index's server.
Doesn't mean that we'll always return
TRUE
when a field gets deleted, no matter whether we index it, or whether it's even on the same entity type?That sounds pretty inefficient, would be great to find out how we can more reliably detect that case, and/or what causes it.
Or does returning
TRUE
not effect a significant performance penalty?Also, what exactly doesn't work about ("deeply") nested entities? To me, it looks like it would just recurse again. (But the code is pretty complicated, of course, so I can't be sure.
It would also be great not to hard-code
:entity:
as the separator for an entity reference – even though that's in Core like this, it's still possible to have a reference to an entity in some other way.But sure, after these questions are cleared up we can at least commit this as a temporary partial fix and try to improve on it subsequently. Would be great if one of you could then create the follow-up issue, to make sure a) we don't forget it and b) it explains correctly what is still missing.
Comment #26
Berdir3. No. We're only called if we actually depend on the thing being deleted. There is no other way as far as I see.
Comment #27
drunken monkeyAh, of course, my mistake.
Comment #28
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedAfter #2638116: Clean up caching of Index class method results (especially fields) the situation about Items/search api fields is much more cleaner.
I adapted patch #25 to the latest Search API changes, removed some unnecessary workarounds, refactored etc... Beside automatic tests, tested manually as well, the patch worked fine.
Btw, no interdiff since last patch is ~2 months old.
Comment #31
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedAh, forgot to implement
TestDatasource::getFieldDependencies()
.Edit: Created a followup issue #2656916: Overhaul or remove ContentEntity::getFieldDependenciesForEntityType() based on #25.3
Comment #32
borisson_Is there an issue we can link to?
Overall I think the code here looks great, is it possible to add a testcase for this in the dependencyRemovalTest as well? That's a kernel test so probably a different test to the one added in the integration test. But that makes it easier to test the code in isolation.
Comment #33
drunken monkeyI wonder if that isn't needlessly restrictive. Why not also allow non-config dependencies there, by using an additional level of nesting?
Several things here:
Why is this public? And, if there is a good reason, the method should definitely also be on the interface. No public methods without interface, as a rule.
Why does it return the dependencies nested by datasource? Does that really provide any benefit? If there is really a need for that, we could also fulfill it by just adding an $datasource_id = NULL parameter.
This will have the complete wrong format for $dependency_data. Please add a loop that correctly nests it within 'always' and 'field'.
This is a completely different style than the rest of the method. Instead, add your code within the current framework, checking for a $plugin_type of 'fields' and then removing the fields when they come up there.
Unless I am very mistaken, the entity should never be saved inside an onDependencyRemoval() method.
See the attached patch for some further corrections.
Furthermore, I think the
ContentEntity::getFieldDependenciesForEntityType()
method takes a needlessly difficult approach. Instead, it should just take the property paths, descend into them (see\Drupal\search_api\Utility::retrieveNestedProperty()
for example code in that direction) and, at each level, check whether the property is an instance ofFieldDefinitionInterface
– if so, add the necessary dependency. (That way, #2656916: Overhaul or remove ContentEntity::getFieldDependenciesForEntityType() would also be fixed – and adding code for other kinds of dependencies would be much easier.)More importantly: is this actually correct? As said before, to me it looks like this would recurse correctly also for "deeply" nested fields.
We should definitely add a test for this, either to verify it fails and needs to be fixed or to make sure it works and stays working.
As you say, adding this to the existing
DependencyRemovalTest
would also make a lot of sense, yes. Although, of course, having both is even better, so we can naturally also keep the addition to the integration test.Also, the age of the last patch is not an issue, an interdiff could still make sense. But if you mostly just re-rolled, you're right, it would indeed be pointless.
Comment #34
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedTried to address some points #33 (1-5), but it seems that main purpose of the patch is not fulfilled anymore. :D Anyway, it might help someone to continue work.
Re #33:
- (2) Changed to private. Field dependencies are returned by datasource and that was the reason to return index fields dependencies keyed by datasource id.
- (3) Added 'always', 'fields', not sure if it is right tho...
- (4) Changed, I guess this is the part where new patch breaks.
- (5) Removed.
Removed @todo, as this seems to be working, indeed.
Added a test in DependencyRemovalTest class as well...
Edit:
Not sure why PhpStorm changed this.
Comment #37
mikemiles86Comment #38
mikemiles86Comment #39
drunken monkeyManaged to find the bug, re-wrote the field dependency detection code, but it's still there. The problem is (both with the previous and my current code) that, if we depend on the field storage config (instead of the field config itself), that at the point where
onDependencyRemoval()
is called the actual field configs have already been deleted – and with them our internal record of having them on the datasource. Thus, at the point where the field is actually deleted (not at the time where the resulting config chances are displayed – that correctly lists the index as "to be updated"), we can't actually find any dependency to the field, thus don't remove any dependency,$changed
remainsFALSE
and the index is deleted (even though it wouldn't even have any (reported) dependency on the deleted entity anymore (the field would remain on the index, though, which is of course not OK).The underlying problem is that the deletion of the storage upon deletion of the field config is not actually triggered by a dependency, but by a
postDelete()
operation. (Come to think of it, it's a bit of a mystery why the index is even listed on the field deletion page. Probably special code to include both dependents of the field and the storage (when both would be deleted).)In any case, I'm attaching my current state of work. Any ideas or input would be much appreciated – not sure how we can solve this properly.
Comment #42
BerdirYes. That was a critical that I worked on, in Barcelona IIRC.
Comment #43
drunken monkeyAh, that at least explains that. Thanks!
Any ideas for solving our problem here?
What I thought about that might help is adding caching for our dependency data. That way, the dependency data built for the "Predicted config changes" list would also be used for the actual dependency removal process. The caching would have to be persistent, though, as a cloned version of the index will be used for the "prediction" operation.
The problem here, however: I don't think it's impossible, internally, to remove a field config (or config entity in general) without first doing a "dry run" dependency check. And in that case, this would still fail. So, probably not practical either.
Comment #44
drunken monkeyNext idea: Just storing a field's dependencies along with its other information in the index. A bit brute force, maybe, but should generally be fine. Would at least be a solution if we can't think of any cleaner one.
Comment #45
drunken monkeyThis implements the "saving dependencies with the fields" method.
Seems to work well enough – I guess we should just commit that, we can always come back and improve on it later, if we have an idea.
Anyways, reviews welcome!
Comment #48
drunken monkeySeems #2670372: Index settings do not synchronize across environments fixed that problem. Attaching a re-roll of #45.
Comment #51
drunken monkeyMessed up while re-rolling, this one should work.
Comment #52
drunken monkeyStill applies – no volunteers for reviewing?
Then unless anyone objects, I'll commit this early next week. High time this finally got fixed.
Comment #53
BerdirThis is not an easy patch to review ;)
Can't spot anything obviously wrong, seems to work, so +1 from me.
Comment #55
drunken monkeyOK, then let's go the alternative route: commit first and then wait for people to complain.
Anyways, thanks for trying it out and looking it over!
Committed.
Comment #56
borisson_I looked at this while on the plane earlier today, these were my remarks:
Overall, I think this looks great, I did spot a couple of small suggestions.
/s/IDs/ID/?
This is a great comment!
This is a great comment, it perfectly describes what's going on. I'd only change "enable" for "unlock".
Comment #58
drunken monkeyI think "property paths keyed by field IDs" is fine like it is, but you're right about 3. – "enable" doesn't really make sense there. Committed the follow-up, thanks!