Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
For more information see discussion "#2126979: Overhaul the "Multi-Index Searches" module"
Comments
Comment #1
Nick_vhComment #2
drunken monkeySee the bottom of #2232253-29: Make the indexing batch process work for my thoughts on how tracking could/should work with multiple types per index.
It therefore probably makes sense to have the tracking stable (again) before implementing this change. I'll discuss this shortly in today's meeting – either someone else implements the now more or less agreed-upon tracking solution, or I'll do it myself, before starting on this issue.
Comment #3
amateescu CreditAttribution: amateescu commentedMarking as postponed on #2232253: Make the indexing batch process work for better issue tracking.
Comment #4
amateescu CreditAttribution: amateescu commentedErr, I meant this.
Comment #5
amateescu CreditAttribution: amateescu commented#2232253: Make the indexing batch process work is in!
Comment #6
drunken monkeyOK, then let's see how this could be implemented …
Index::options['fields']
into sub-arrays for each typeComment #7
BerdirJust thinking out lout, what about switching to passing around an object? IndexItem, SearchItem, something like that...
Then you can more or less avoid the concatenated string vs. multiple arguments question and while it's going to be a larger initial change (that should possibly happen as a separate issue, without introducing an further changes) but it would be much easier to add language to it I think.
The biggest problem could be that AFAIK right now, we often pass a list of indexable data arrays around, we can't use an IndexItem object as array key, but switching to type + id would require that to be nested anyway. Maybe an extended object, like IndexItemData?
Could look like this:
And when you add language to the mix, you can add a new getLangcode() and extend the unique identifier. Not sure if the unique identifier should be unique within the index or globally, but I assume within the index is enough. Names are just suggestions of course.
And for the data, something like this?
Or you could only have a single interface and define that the data methods only return something in certain cases, but two interfaces might be a nice separation for that?
We unfortunately can't type hint on an array of specific objects but at least PHPStorm understands @param IndexItemDataInterface[] $items just fine.
Comment #8
drunken monkeyYes, that's definitely something I wanted to do. Nested arrays are just sooo D7. ;)
I see we don't have an issue for this yet, but yes, basically I completely agree with you there. Could you maybe create an issue with basically just a copy of your above comment (and reference this one)? I do agree, also, that it should be a separate issue. This change is large enough as it is, for now I'll just add another magic property (after
#item
).Also, again, I have to apologize to Frédéric here, who already implemented this partly in his initial port.
Btw, I've now dabbled a bit with this and am gonna commit what I have to a new branch soon. If you want, feel free to take a look – though it's of course far, far from finished. If someone wants, you can also work on this until tomorrow morning, I'll just review any new commits when starting again (and ask in IRC whether someone is currently working on it).
I'll also add a temporary "notes" file to that branch, where I keep track of things I shouldn't forget for this change.
Comment #9
freblasty CreditAttribution: freblasty commented@Berdir: Did you create the issue or can I go ahead and create it?
Comment #10
BerdirI did not :)
Comment #11
drunken monkeyThe branch is now also pushed to the sandbox. Yesterday, I failed "a little" and pushed it to the normal Search API project instead. Luckily, deleting branches is supported by the Drupal.org repos, otherwise we'd now have several hundred useless commits in the history there (let alone a new branch).
Continuing work on this now, as soon as I've looked at the other new comments in this issue queue.
Comment #12
drunken monkeyAfter a little more thought on the matter, I think for the fields a prefixed solution would be better, since it would hide this complexity for systems that don't care about it – e.g., the service classes, or processors.
E.g., the node author's name would then be
entity:node|uid:name
.One thing that makes this ugly currently is the possibility that datasource plugin IDs have colons in them, so we can't use a colon to separate the datasource from the rest of the field identifier. But it seems the colon method for derivatives is a core standard, right?
Then, what would be a good separator here? A pipe (|), as suggested above?
Comment #13
drunken monkeyUnfortunately, I'm done again already for today, busy day. :-/ I'll hopefully have a lot more time tomorrow (weekend probably not so much).
I committed again my current state, feel free to take a look or continue work (e.g., the tasks in the "notes" file, especially adapting the Fields tab). Tomorrow I'll probably finish the index and go on to the tracker – datasource can stay untouched, I think.
Comment #14
Nick_vhThis sounds good to me, gives us some flexibility for the UI also as you said. Hoping to get some more feedback
Comment #15
amateescu CreditAttribution: amateescu commentedIs that possibility a historical one (e.g. coming from 7.x)? And even if it is, can't we change it for the 8.x branch?
Yep, it is.
Is there something else preventing us from using a period instead of a pipe?
Comment #16
aspilicious CreditAttribution: aspilicious commentedAccording to this function, it only gets split on the first colon...
We should test this.
And some other copy paste snippet from irc
Comment #17
drunken monkeyNo, on the contrary. I meant that colons are now present in some datasource IDs because they are derivatives.
We didn't have that problem in D7.
No, of course not. Anything not valid in a datasource ID is OK here. Would you think a period makes more sense?
@ Bram: Thanks for the additional information!
However, it seems to me I have caused some misunderstandings here. (Or maybe not and I'm just misunderstanding.) I'll try to rehash the problem here:
uid:name
).entity
, as specified in the annotation) should never contain a colon (we don't want to allow that), due to the possibility of having derivatives (that use colons), complete datasource IDs (likeentity:node
) can contain colons.entity.node
instead. That's why I asked if derivatives only work with colons.So, a "normal" (non-derivative) plugin ID containing a colon is definitely not the issue here, we don't want to allow that.
I hope I could clear this up.
Continuing work on the branch now.
Comment #18
amateescu CreditAttribution: amateescu commentedThanks, that is indeed much more clear and I do think a period makes sense in this case :)
Comment #19
drunken monkeyOther votes for pipe, period or other? (I have now extracted the separator into a constant, thus making it trivial to change.)
Furthermore, I have given more thought to the "combined vs. pure item ID" question (i.e., "Should the datasource ID be included in the item IDs, or not?") and am now unsure again. (Reminder: We previously settled on keeping it the pure ID from the datasource, and passing the datasource separately along with the items/IDs.)
A lot of functionality simply shouldn't/doesn't care about the separate parts, it just needs unique IDs. Having those centrally defined makes things a lot easier. Primarily the service classes really shouldn't care at all from which datasource an item comes. They get the extracted fields and should index them.
With "pure" item IDs, most of them would have to construct a combined ID anyways (the DB backend could circumvent this, but that would make deletes uglier), but the Search API couldn't use that. Instead, methods like
deleteItems()
would need to receive the item IDs plus the datasource they belong to, leading to (conceptually) completely unnecessary changes in the server plugin interface.On the other hand, if we include the datasource in the item ID, the datasource would need to remove that prefix when loading items. Also, the tracker might want to store the datasource ID separately.
I would therefore suggest that we generally use combined item IDs (maybe just using the same separator as for field names, and renaming it from
FIELD_ID_SEPARATOR
toDATASOURCE_ID_SEPARATOR
, or something), but introduce wrapper methodsloadItem()
andloadMultipleItems()
in the index class which automatically take care of removing this prefix, so the datasource plugins don't have to deal with them. We should then just properly document this, and also come up with clear terminology to refer to the different types of item IDs to avoid confusion (same for the machine names).The tracker could internally of course do what it wants – extract the datasource ID and store it separately, or don't care about it and use
LIKE
in queries that make it necessary. (Our implementation should of course still have a separate column, I do think that makes sense.)However, we could also decide to let the tracker also use the "pure" IDs of the datasource, and continue to, e.g., return remaining item IDs grouped by datasource. When we have two kinds of IDs, ideally only the index would ever have to interact with its datasources and its tracker, so that the complexity of translating between the IDs would be entirely encapsulated in the index class. (Therefore, we'd need to create proxy methods on the index for all generally useful datasource methods, like
getItemUrl()
andviewItem()
.)Any comments?
Comment #20
drunken monkey#2244787: Ensure index state invariants always hold contains a new side track of this, as I came across
Index::postSave()
which didn't really do what it should.Also, I'm finished for today and, most likely, the week. I sadly probably won't have much time the next days. If someone could, e.g., start the whole UI (fields, index create/edit), that would be great!
Comment #21
amateescu CreditAttribution: amateescu commentedThe suggestion from #19 makes sense to me, with the same disclaimer that I'm not yet familiar with the entire codebase :/
That said, I'm going to work on the UI bits today.
Comment #22
amateescu CreditAttribution: amateescu commentedStarted some work on the Index form, not so much progress but I'll get back to it later in the weekend.
Comment #23
amateescu CreditAttribution: amateescu commentedI finished the index form yesterday, going to work on fields and filters today :)
Comment #24
amateescu CreditAttribution: amateescu commentedThe fields page is done, started some work on processors but those need more thought.
Comment #25
drunken monkeySorry I didn't manage to look at this earlier. Your code so far looks mostly very good, though! Just two notes:
In any case, we can discuss more in half an hour!
Comment #26
amateescu CreditAttribution: amateescu commentedThanks for reviewing :) As discussed today, we shouldn't update the filters page to set processors per datasource, so we just need to revert this commit: http://drupalcode.org/sandbox/daeron/2091893.git/commit/63e0ebda1d46133d...
About the validation code for datasources, that's a good question. It was there before and I just made it work.. :)
Comment #27
drunken monkeyOK, reverted that commit. Now working on rewriting the processors themselves – shouldn't be much, though.
However, I've already hit a snag: I'd imagined
Index::getPropertyDefinitions()
would now receive the properties for only one datasource at a time. Thus, we'd also only pass the properties of a single datasource at a time to the processors for altering.What should a processor now do, though, if they are adding a new field for all datasources? For now, they'll need to add it to the properties array for all datasources separately. However, since we currently don't allow datasource-independent fields at all, there wouldn't be an easy way around this anyways, I guess.
Thinking about it, though, having datasource-independent fields is important for several different use cases, so we will probably have to re-add them before merging this into the master branch. Or maybe afterwards, to get this in as soon as possible? Creating a new issue, for now: #2247451: Allow datasource-independent fields.
Also, side note @ Andrei: Why did you comment out the cache set in
Index::getFields()
?Comment #28
drunken monkeyAlso²: Do the Fields forms already work for you? For me, they just throw fatals when submitting (either with "Save changes" or "Update" for the "Add related fields" subform).
If this isn't done, then that's something you could work at in parallel with me, if you want. As long as you only touch the fields form class, there shouldn't be any conflicts (otherwise, we'd have to coordinate).
Comment #29
amateescu CreditAttribution: amateescu commentedUgh, I committed that? Sorry :( I was debugging why the processors alter step from
Index::getPropertyDefinitions()
was makinggetFields()
return an empty array and it was easier than clearing the cache all the time.About the fields page, yep, adding regular and additional fields on a index with two datasources was working for me. I'll test again tomorrow morning and work only in that form if there's something to fix.
Comment #30
amateescu CreditAttribution: amateescu commentedI managed to get that fatal error after testing with different entity types and committed a fix for it.
Comment #31
drunken monkeyAh, OK. It's no problem, I was just wondering. (Personally, I always add "dpm" to lines I temporarily comment out, so I can quickly check for any forgotten debugging code by greping for that.)
Comment #32
drunken monkeyOK, this is now working quite well already, as far as I can see. I also wouldn't know of any areas that are still incompatible with the change.
However, I think there are still some fails in the tests which I couldn't quite figure out. Also, I've run out of time for this week, so it would be great if someone else could finalize this. Otherwise, we can try on Monday, then we should really have enough time I'd say. ;)
(Btw, I figured out some of my local fails. They were due to the wrong usage of paths in the tests, which lead to incorrect links on my installation (which isn't in the root directory of the domain). Such errors also appear at some places in our normal code, I think, at least until recently. Maybe we can fix that, too, in Zürich.)
Comment #33
amateescu CreditAttribution: amateescu commentedFixed
SearchApiLocalTasksTest
last night and I'm working on fixing the others now.Comment #34
amateescu CreditAttribution: amateescu commentedThe last fails I'm hitting are actually caused by a core bug in config entity query: #2248567: Config entity query doesn't work when matching an array property
Comment #35
amateescu CreditAttribution: amateescu commentedAaaaand we have a working branch!
Comment #36
drunken monkeyExcellent, thanks a lot! Looks great, too.
I only had one minimal code clarity fix, but other than that I think we can merge this now. Does anyone still want to have a look, or has some other objection?
Comment #37
amateescu CreditAttribution: amateescu commentedI think it's safe to merge it, everyone can express any objection next week at the sprint :)
Comment #38
drunken monkeyOK, did that. So, this is finally fixed, yay! ;)
Comment #39
amateescu CreditAttribution: amateescu commentedWoot!
Comment #40
Nick_vhAwesome work guys!