So Drupal 7 used only numeric IDs to store entities but Drupal 8 is changing and adapting to the philosophy of none numeric IDs. We are building an application and using entities that use URIs as IDs.
The issue came when I tried to integrate with SOLR (I don't know about the generic framework but I tried to solve the issue within it so I'm reporting it here). The indexing and loading of the entities failed with errors regarding the ID.
Looking a bit more into it, we found 2 places within the code that are not enough ID agnostic and actually work when the character ':' and the character '/' are not part of the ID.
So I propose that we should change the way the ID splitting occures in the code. All we need to check is that when splitting by '/', the strpos should definetly be used instead of strrpos and also we should replace explode function in Utilities.php with a preg_split().
Credit also goes to sandervd
| Comment | File | Size | Author |
|---|---|---|---|
| #50 | 2671228-50--uris_as_entity_ids.patch | 5.99 KB | drunken monkey |
| #39 | 2671228.patch | 6.21 KB | dimilias |
| #10 | support_for_non_numeric-2671228-2.patch | 1.3 KB | dimilias |
Comments
Comment #2
dimilias commentedComment #5
dimilias commentedSorry for the previous patch, Filezilla messed the format...
Comment #6
dimilias commentedComment #9
pfrenssenI ran the test locally, it doesn't output why it fails but this is the reason:
This happens when an item is indexed with the language code "l0".
Comment #10
dimilias commentedOk so the previous one was failing because I did not include the upper case and numbers for the preg_split.
I though that language codes are just small case and dashes like "fr" or "en-us". Am I thinking it wrong? Various tests are using generators including both upper case and numbers as well.
Comment #11
pfrenssen@dimilias looking at
ConfigurableLanguageTest::testName()it appears that alphanumeric characters are accepted. Dashes are not tested there, and it only tests 2 characters.Now I did some more digging, and I found
ConfigurableLanguage::createFromLangcode()which points to a long list of supported language codes inLanguageManager::getStandardLanguageList(). They are all consisting of lowercase characters and dashes. Most are 2 characters but some are longer (e.g. 'xx-lolspeak'). None of them contain numbers, but sinceConfigurableLanguageTest::testName()actively tests that numbers are supported I would leave them in.I would like to see a specific test added for this issue. This regex stuff is always a bit tricky, so it would be good to have a failing test that proves the problem exists and is conclusively fixed.
Thanks for working on this!
Comment #12
dimilias commentedI've started on a test and it is not working for some reason (my xdebug is failing to communicate with me)..
Can you test it??
Comment #13
pfrenssenSome of these use statements are not used in the current scope.
It uses EntityTestMul here of which the schema is not available, it should be EntityTestStringId instead.
Comment #14
drunken monkeyThanks for reporting this problem and providing a patch!
The title is a bit misleading, though – string entity IDs are already well supported, it's just that it seems we have problems with specific special characters in there. But you're right, we definitely didn't think of that use case! (Actually, I thought content entities are still required to have integer IDs. But apparently I was mistaken.)
There will always be a language code at the end, so why not just use
strrpos()?The correct fix for this problem is not to change the
splitPropertyPath()method, but not to use it (or to use it in the proper way) insplitCombinedId(). It probably was a bad idea to begin with – it's just that the code was so similar and I apparently was too aggressive in trying to avoid code duplication.In any case, a test case demonstrating this problem would really be helpful!
Comment #15
dimilias commentedSorry for taking so long. I will try to write the test soon but I'm quite busy at the moment. :/
Comment #16
dimilias commentedOk, I have built the test. sorry for taking so long. This is also the first test I create so sorry if I am missing some logic.
There are two test methods in the test. The first one tests plain string Ids and the second one is testing uri ids.
The limits of the ID length are also described in the test regarding the db backend.
Comment #17
dimilias commentedInitially that was also my thought but I think (don't remember quite well) that there were various cases and the checks would need to be more and then again maybe there where use cases that I wouldn't think of. Like initially I thought that the only characters supported in a language code was letters. Then my tests were failing and it was because language codes also get '-' and numbers. So I went with this safe solution instead.
Comment #18
dimilias commentedHere is a better version of the previous test.
Comment #19
pfrenssenA new coding standards rule was accepted yesterday - these @file docblocks should be removed for namespaced classes. Finally :)
See #2304909: Relax requirement for @file when using OO Class or Interface per file.
This test is very similar to the first one. You can use a @dataProvider to provide a range of IDs to test. That way you can reuse the same test code to test a whole bunch of different IDs.
Edit: This was a review for the previous version of the patch :) Ilias can implement this faster than I can explain it :)
Comment #20
dimilias commentedComment #21
dimilias commented@pfrenssen Sorry! I just fixed this really fast. New patch available.
Comment #30
dimilias commentedI'm too clumsy.. Coding standards met.
Comment #31
dimilias commentedPatch with test also submitted now.
Comment #32
borisson_back to NR, so the testbot can have a look at the last patch.
Comment #33
borisson_Actually doing #32
Comment #34
pfrenssenStrange that #30 is not tested?
Comment #35
pfrenssenOk anyway the patch in #21 contains only the test, this one fails correctly to prove that it can correctly detect the bug. The one in #30 which is not tested only fixes some coding standards. The patch in #31 provides both the patch + the test and this one is green. So the test and the fix are proven to be functional. That's great!
Assigning to me for code review.
Comment #36
pfrenssenThis patch looks really great! I just found some minor code style and comment cleanups to make it perfect.
The two remarks of @drunkenmonkey were not addressed but you answered them. I also think that
strrpos()would be a viablealternative, but the regex also is fine for me.
I agree with the second remark of @drunkenmonkey:
splitPropertyPath()should not have been used in the first place insplitCombinedId(). In this patch we now had to implement a workaround becausesplitPropertyPath()was not used for its intended purpose. The workaround now actually can cause it to return wrong results if the $separator is a different value.I propose to remove the workaround from
splitPropertyPath()and replacesplitCombinedId()with the following:This should satisfy the tests, and we don't need to pollute
splitPropertyPath().Here's the rest of the review:
Tests indexing entities that are using string IDs.
This is overriding a property of KernelTestBase, so this docblock should be:
Are you sure this is correct? This says that it is "enabling" translation, but it sets the value to FALSE, which seems to be disabling translation? Am I seeing this incorrectly?
This is really minor, but in new code written for D8 we typically use the new short array syntax.[]instead ofarray()If you decide to fix this, have a look for other places in the test where this syntax is used.Edit: @borisson_ points out that Search API uses the classic array syntax by default.
Tests URIs as IDs.
This test is really great! I would never be able to tell this is your first test. Great job!
There is only 1 entity in this test, so you can just name the variable
$entity.This documentation is left over from an earlier version of the patch I think. The "One should fail" part is not relevant any more it seems.
Provides string IDs to test.
Here the description should be on a second line. I know, the coding standards are complicated ;)
Leave a space between 'return' and '['.
Comment #37
pfrenssenComment #38
borisson_#36.4 - we always use the long notation in the entire Search API module, so let's keep it to the long notation everywhere in this patch.
While this is great information, I don't think we need it in the docblock here (because it'll go out of date when this tests gets updated)
Should be
public function, there also needs to be a space after the brackets:() {I agree with @pfrenssen that this looks great!
Comment #39
dimilias commentedImplemented latest required changes.
@borisson what is NR?
Also, do I have to select anything specific for my patch to go through testing?
Comment #40
borisson_The issue status should be on "Needs Review".
Comment #43
pfrenssenThis is not needed any more since we are no longer calling
splitPropertyPath()fromsplitCombinedId().We should take a look at the failures too.
Comment #44
pfrenssenI'm thinking that it would be better if we simplify
splitPropertyPath()so that it always gives back all three parts, instead of doing all this magic with returning the first or the last part. That would make the logic more straightforward, and we don't need the additional arguments any more.If it returns an array with three items and you need only the language then you can do
list(,,$language) = splitPropertyPath($path). This is very similar to how the entity info was returned by Entity API in D7.Comment #45
pfrenssenHopefully this solves the failures. Also addressed my remark from #43. Going to have a look if it is feasible to simplify splitPropertyPath(), but that's probably better handled in a separate ticket. We now also don't have a need for the
$separatorargument any more. I did not change this.Comment #48
dimilias commentedHmm nice one.! And I thought I was missing something trivial when I tried to fix this myself.. I did not notice about the null case. :/
Comment #49
borisson_We can open a followup to simplify
splitPropertyPath.Comment #50
drunken monkeyWow, great work in this issue, thanks a lot everyone!
Before the switch to use
splitPropertyPath(),splitCombinedId()used the following code:However, the performance benefit is surely negligible, and I guess your new version is slightly better to read, so we can just stick with that.
I don't understand the argument for
preg_split(), so I changed the code to usestrrpos(), as suggested.I shortly considered also putting in a
$pos === FALSEcheck as a safeguard, but I actually think throwingPlease see the attached patch for this and some other small nit-picks – if that's still OK with you, I can commit it.
In any case, thanks again!
Comment #51
drunken monkeyComment #52
mkalkbrennerComment #53
pfrenssenIf this is a side effect of unexpected input, wouldn't it be better to throw an
\InvalidArgumentException?Comment #55
drunken monkeyNo, I don't think so. If you try to load a node with a string as a NID, you will also just get back
NULL(or an empty array, if multi-loading). Drupal is generally generous regarding error reporting when loading entities, and it seems natural to mimick that behavior here, I'd say.So, since this is blocking other issues, if this is the only complaint, I now just committed.
If you still feel we should throw an exception, please either re-open the issue or create a new one!
@ mkalkbrenner: How exactly is that issue related? (Thanks to the link, I noticed it was out-dated, though.)
Comment #56
mkalkbrennerDocumentation ;-)