Support for arbitrary strings as entity IDs [#2671228]

Comment	File	Size	Author
#50	2671228-50--uris_as_entity_ids.patch	5.99 KB	drunken monkey
#50	2671228-50--uris_as_entity_ids--interdiff.txt	3.66 KB	drunken monkey
#49	support_for_arbitrary-2671228-48.patch	5.61 KB	borisson_
#49	interdiff.txt	1.32 KB	borisson_
#45	interdiff.txt	1.08 KB	pfrenssen
#45	2671228-45.patch	5.64 KB	pfrenssen
#45	2671228-45-test-only.patch	4.43 KB	pfrenssen
#39	2671228.patch	6.21 KB	dimilias
#31	2671228.patch	6.23 KB	dimilias
#30	2671228_test.patch	4.93 KB	dimilias
#21	2671228_test.patch	5.02 KB	dimilias
#18	2671228_18_test.patch	5.02 KB	dimilias
#16	2671228_16_test.patch	6.66 KB	dimilias
#12	2671228_12.patch	4.28 KB	dimilias
#10	support_for_non_numeric-2671228-2.patch	1.3 KB	dimilias
#5	support_for_non_numeric-2671228-2.patch	1.29 KB	dimilias
#2	support_for_non_numeric-2671228-2.patch	1.32 KB	dimilias

Comment #1

18 February 2016 at 14:51

dimilias created an issue. See original summary.

Status	File	Size
new	support_for_non_numeric-2671228-2.patch	1.32 KB

Comment #3

18 February 2016 at 15:04

Status:

Needs review

» Needs work

The last submitted patch, 2: support_for_non_numeric-2671228-2.patch, failed testing.

Comment #4

18 February 2016 at 15:04

The last submitted patch, 2: support_for_non_numeric-2671228-2.patch, failed testing.

Comment #5

dimilias commented 18 February 2016 at 15:22

Status	File	Size
new	support_for_non_numeric-2671228-2.patch	1.29 KB

Sorry for the previous patch; FileZilla messed up the format.

Comment #6

dimilias commented 18 February 2016 at 15:24

Status:

Needs work

» Needs review

Comment #7

18 February 2016 at 15:27

Status:

Needs review

» Needs work

The last submitted patch, 5: support_for_non_numeric-2671228-2.patch, failed testing.

Comment #8

18 February 2016 at 15:27

The last submitted patch, 5: support_for_non_numeric-2671228-2.patch, failed testing.

Comment #9

pfrenssen commented 18 February 2016 at 16:00

I ran the test locally. It doesn't output why it fails, but this is the reason:

PHPUnit_Framework_Error_Notice: Undefined offset: 1 in src/Plugin/search_api/datasource/ContentEntity.php on line 372

This happens when an item is indexed with the language code "l0".

Comment #10

dimilias commented 18 February 2016 at 16:03

Status:

Needs work

» Needs review

Status	File	Size
new	support_for_non_numeric-2671228-2.patch	1.3 KB

OK, so the previous one was failing because I did not include upper case letters and numbers in the preg_split() pattern.
I thought that language codes were just lowercase letters and dashes, like "fr" or "en-us". Am I thinking about this wrong? Various tests use generators that include both upper case letters and numbers as well.
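
A minimal sketch of the failure reported in #9 and of the fix described above. The item ID 'some:string:id:l0' is made up, and the lowercase-only character class is an assumption about the earlier pattern, inferred from this comment rather than copied from the patch:

```php
<?php
// Made-up item ID: "<entity_id>:<langcode>", where the entity ID itself
// contains colons and the language code contains a digit.
$item_id = 'some:string:id:l0';

// Assumed earlier pattern: lowercase letters and dashes only. The lookahead
// never matches, because "l0" contains a digit and every other colon is
// followed by more colons, so the string comes back unsplit. Unpacking it
// with list($entity_id, $langcode) then reports an undefined offset 1.
$narrow = preg_split('/:(?=[a-z-]+$)/', $item_id);
// $narrow === array('some:string:id:l0')

// Pattern from #10: upper case letters and digits are allowed as well, so
// only the last colon matches.
$wide = preg_split('/:(?=[a-zA-Z0-9-]+$)/', $item_id);
// $wide === array('some:string:id', 'l0')
```

With this approach, every character that may appear in a language code has to be listed in the class, which is what makes it fragile.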

Comment #11

pfrenssen commented 18 February 2016 at 16:16

Status:	Needs review	» Needs work
Issue tags:		+Needs tests

@dimilias looking at ConfigurableLanguageTest::testName() it appears that alphanumeric characters are accepted. Dashes are not tested there, and it only tests 2 characters.

Now I did some more digging, and I found ConfigurableLanguage::createFromLangcode(), which points to a long list of supported language codes in LanguageManager::getStandardLanguageList(). They all consist of lowercase characters and dashes. Most are 2 characters, but some are longer (e.g. 'xx-lolspeak'). None of them contain numbers, but since ConfigurableLanguageTest::testName() actively tests that numbers are supported, I would leave them in.

I would like to see a specific test added for this issue. This regex stuff is always a bit tricky, so it would be good to have a failing test that proves the problem exists and is conclusively fixed.

Thanks for working on this!

Comment #12

dimilias commented 19 February 2016 at 15:09

Status	File	Size
new	2671228_12.patch	4.28 KB

I've started on a test, but it is not working for some reason (my Xdebug is failing to communicate with me).
Can you test it?

Comment #13

pfrenssen commented 19 February 2016 at 15:20

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,146 @@
+use Drupal\Component\Render\FormattableMarkup;
+use Drupal\entity_test\Entity\EntityTestMul;
+use Drupal\KernelTests\KernelTestBase;
+use Drupal\language\Entity\ConfigurableLanguage;
+use Drupal\search_api\Entity\Index;
+use Drupal\search_api\Entity\Server;
```

Some of these use statements are not used in the current scope.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,146 @@
+    /** @var \Drupal\entity_test\Entity\EntityTestMul $entity_1 */
+    $entity_1 = EntityTestMul::create(array(
```

It uses EntityTestMul here, whose schema is not available; it should be EntityTestStringId instead.

Comment #14

drunken monkey commented 27 February 2016 at 15:49

Title:

Support for non Numeric IDs

» Support for arbitrary strings as entity IDs

2 files were hidden/shown/deleted

Status	File	Size
hidden	support_for_non_numeric-2671228-2.patch	1.32 KB
hidden	support_for_non_numeric-2671228-2.patch	1.29 KB

Thanks for reporting this problem and providing a patch!
The title is a bit misleading, though – string entity IDs are already well supported, it's just that it seems we have problems with specific special characters in there. But you're right, we definitely didn't think of that use case! (Actually, I thought content entities are still required to have integer IDs. But apparently I was mistaken.)

```
+++ b/src/Plugin/search_api/datasource/ContentEntity.php
@@ -369,7 +369,7 @@ class ContentEntity extends DatasourcePluginBase {
-      list($entity_id, $langcode) = explode(':', $item_id, 2);
+      list($entity_id, $langcode) = preg_split('/:(?=[a-zA-Z0-9-]+$)/', $item_id);
```

There will always be a language code at the end, so why not just use strrpos()?
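
A minimal sketch of the strrpos() alternative, assuming (as stated above) that every item ID ends in ':<langcode>'. The helper name split_item_id() is hypothetical and not part of Search API:

```php
<?php
// Hypothetical helper: split "<entity_id>:<langcode>" at the LAST colon, so
// colons inside the entity ID (e.g. in a URI) are left untouched.
function split_item_id($item_id) {
  $pos = strrpos($item_id, ':');
  if ($pos === FALSE) {
    // No language code present: treat the ID as invalid.
    return NULL;
  }
  return array(substr($item_id, 0, $pos), substr($item_id, $pos + 1));
}

list($entity_id, $langcode) = split_item_id('http://example.com/resource/1:en');
// $entity_id === 'http://example.com/resource/1', $langcode === 'en'
```

Since only the last colon matters, the character set of language codes never has to be spelled out.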

```
+++ b/src/Utility.php
@@ -367,7 +367,8 @@ class Utility {
   public static function splitPropertyPath($property_path, $separate_last = TRUE, $separator = ':') {
-    $function = $separate_last ? 'strrpos' : 'strpos';
+    $is_default_separator = $separator != IndexInterface::DATASOURCE_ID_SEPARATOR;
+    $function = $separate_last && $is_default_separator ? 'strrpos' : 'strpos';
```

The correct fix for this problem is not to change the splitPropertyPath() method, but not to use it (or to use it in the proper way) in splitCombinedId(). It probably was a bad idea to begin with – it's just that the code was so similar and I apparently was too aggressive in trying to avoid code duplication.

In any case, a test case demonstrating this problem would really be helpful!
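
A sketch of why a last-separator split goes wrong for combined IDs. It assumes that IndexInterface::DATASOURCE_ID_SEPARATOR is '/' and that combined IDs have the form '<datasource_id>/<item_id>'; split_at() is a hypothetical, simplified stand-in for the strpos()/strrpos() choice in splitPropertyPath(), and the URI is made up:

```php
<?php
// Assumed value of IndexInterface::DATASOURCE_ID_SEPARATOR.
const DATASOURCE_ID_SEPARATOR = '/';

// Hypothetical: split at the first or the last separator, like the
// $separate_last flag. No handling for a missing separator in this sketch.
function split_at($combined_id, $separate_last) {
  $pos = $separate_last
    ? strrpos($combined_id, DATASOURCE_ID_SEPARATOR)
    : strpos($combined_id, DATASOURCE_ID_SEPARATOR);
  return array(substr($combined_id, 0, $pos), substr($combined_id, $pos + 1));
}

$combined_id = 'entity:entity_test_string_id/http://example.com/a:und';

// Last separator: the URI item ID is cut apart.
$wrong = split_at($combined_id, TRUE);
// First separator: datasource ID and item ID come back intact.
$right = split_at($combined_id, FALSE);
```

Datasource IDs such as 'entity:entity_test_string_id' are assumed not to contain the separator, so splitting at its first occurrence is the safe choice, while an item ID may contain any number of them.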

Comment #15

dimilias commented 14 March 2016 at 23:27

Sorry for taking so long. I will try to write the test soon, but I'm quite busy at the moment. :/

Comment #16

dimilias commented 12 April 2016 at 13:14

Status	File	Size
new	2671228_16_test.patch	6.66 KB

OK, I have written the test; sorry for taking so long. This is also the first test I have written, so sorry if I am missing some logic.

There are two test methods in the test. The first one tests plain string IDs and the second one tests URI IDs.
The ID length limits of the DB backend are also described in the test.

Comment #17

dimilias commented 12 April 2016 at 13:20

There will always be a language code at the end, so why not just use strrpos()?

Initially that was also my thought, but I think (I don't remember quite well) that there were various cases that would need more checks, and there might also be use cases that I wouldn't think of. For example, I initially thought that the only characters supported in a language code were letters. Then my tests were failing, and it was because language codes can also contain '-' and numbers. So I went with this safer solution instead.

Comment #18

dimilias commented 12 April 2016 at 13:35

Status	File	Size
new	2671228_18_test.patch	5.02 KB

Here is a better version of the previous test.

Comment #19

pfrenssen commented 12 April 2016 at 13:37

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,198 @@
+/**
+ * @file
+ * Contains \Drupal\Tests\search_api\Kernel\EntityStringIdTest.
+ */
```
A new coding standards rule was accepted yesterday: these @file docblocks should be removed for namespaced classes. Finally :)

See #2304909: Relax requirement for @file when using OO Class or Interface per file.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,198 @@
+  /**
+   * Tests Uris as Ids.
+   *
+   * We are testing a uri because it is not matched by the regex character set
+   * represented by \w because they contain the characters ':' and '/'
+   * which are used to split the string saved in the index
+   * and should not affect it.
+   */
+  public function testUriStringId() {
```

This test is very similar to the first one. You can use a @dataProvider to provide a range of IDs to test. That way you can reuse the same test code to test a whole bunch of different IDs.

Edit: This was a review for the previous version of the patch :) Ilias can implement this faster than I can explain it :)
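
For reference, a @dataProvider method returns an array of data sets, and PHPUnit runs the test method once per data set, passing that set's values as arguments. The plain-PHP sketch below emulates that loop so it runs without PHPUnit; the provider contents and check_item_id_round_trip() are hypothetical, not the test from the patch:

```php
<?php
// Provider: each entry is one data set; its values become the test method's
// arguments. String keys name the data sets in PHPUnit's output.
function entity_string_id_list() {
  return array(
    'plain string' => array('short_string_id'),
    'URI' => array('http://example.com/a'),
    'colons and slashes' => array('urn:x:y/z'),
  );
}

// Hypothetical check: build "<id>:und" and split it at the last colon.
function check_item_id_round_trip($entity_id) {
  $item_id = $entity_id . ':und';
  $pos = strrpos($item_id, ':');
  return substr($item_id, 0, $pos) === $entity_id
    && substr($item_id, $pos + 1) === 'und';
}

// Emulates what PHPUnit does for "@dataProvider entityStringIdList".
$results = array();
foreach (entity_string_id_list() as $name => $arguments) {
  $results[$name] = call_user_func_array('check_item_id_round_trip', $arguments);
}
```

Adding another ID to test then only means adding one entry to the provider.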

Comment #20

dimilias commented 12 April 2016 at 13:37

Status:

Needs work

» Needs review

Comment #21

dimilias commented 12 April 2016 at 13:40

Status	File	Size
new	2671228_test.patch	5.02 KB

3 files were hidden/shown/deleted

Status	File	Size
hidden	2671228_12.patch	4.28 KB
hidden	2671228_16_test.patch	6.66 KB
hidden	2671228_18_test.patch	5.02 KB

@pfrenssen Sorry! I just fixed this really fast. New patch available.

Comment #22

12 April 2016 at 13:42

The last submitted patch, 12: 2671228_12.patch, failed testing.

Comment #23

12 April 2016 at 13:43

The last submitted patch, 12: 2671228_12.patch, failed testing.

Comment #24

12 April 2016 at 13:45

The last submitted patch, 16: 2671228_16_test.patch, failed testing.

Comment #25

12 April 2016 at 13:46

The last submitted patch, 16: 2671228_16_test.patch, failed testing.

Comment #26

12 April 2016 at 13:48

The last submitted patch, 18: 2671228_18_test.patch, failed testing.

Comment #27

12 April 2016 at 13:49

The last submitted patch, 18: 2671228_18_test.patch, failed testing.

Comment #28

12 April 2016 at 13:49

Status:

Needs review

» Needs work

The last submitted patch, 21: 2671228_test.patch, failed testing.

Comment #29

12 April 2016 at 13:50

The last submitted patch, 21: 2671228_test.patch, failed testing.

Comment #30

dimilias commented 12 April 2016 at 14:29

Status	File	Size
new	2671228_test.patch	4.93 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	2671228_test.patch	5.02 KB

I'm too clumsy. Coding standards are now met.

Comment #31

dimilias commented 12 April 2016 at 14:31

Status	File	Size
new	2671228.patch	6.23 KB

Patch with test also submitted now.

Comment #32

borisson_ commented 12 April 2016 at 15:07

1 file was hidden/shown/deleted

Status	File	Size
hidden	2671228_test.patch	4.93 KB

Back to NR so the testbot can have a look at the last patch.

Comment #33

borisson_ commented 12 April 2016 at 15:07

Status:

Needs work

» Needs review

Actually doing #32

Comment #34

pfrenssen commented 12 April 2016 at 15:45

Strange that #30 is not tested?

Comment #35

pfrenssen commented 12 April 2016 at 15:49

Assigned:

Unassigned

» pfrenssen

OK, anyway: the patch in #21 contains only the test, and it fails, which proves that the test correctly detects the bug. The one in #30, which was not tested, only fixes some coding standards issues. The patch in #31 provides both the fix and the test, and it is green. So the test and the fix are proven to be functional. That's great!

Assigning to me for code review.

Comment #36

pfrenssen commented 12 April 2016 at 17:27

Status:	Needs review	» Needs work
Issue tags:	-Needs tests

This patch looks really great! I just found some minor code style and comment cleanups to make it perfect.

The two remarks of @drunkenmonkey were not addressed, but you answered them. I also think that strrpos() would be a viable alternative, but the regex is also fine by me.

I agree with the second remark of @drunkenmonkey: splitPropertyPath() should not have been used in the first place in splitCombinedId(). In this patch we now had to implement a workaround because splitPropertyPath() was not used for its intended purpose. The workaround now actually can cause it to return wrong results if the $separator is a different value.

I propose to remove the workaround from splitPropertyPath() and replace splitCombinedId() with the following:

```
  public static function splitCombinedId($combined_id) {
    return explode(IndexInterface::DATASOURCE_ID_SEPARATOR, $combined_id, 2);
  }
```

This should satisfy the tests, and we don't need to pollute splitPropertyPath().
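
A sketch of how explode() with a limit of 2 behaves here, and of the NULL case it leaves open: without a separator, explode() returns a single element, so unpacking the result with list() would hit an undefined offset. The fallback to array(NULL, $combined_id) mirrors the older code quoted in #50; the '/' separator value and the function name split_combined_id() are assumptions:

```php
<?php
// Assumed value of IndexInterface::DATASOURCE_ID_SEPARATOR.
const DATASOURCE_ID_SEPARATOR = '/';

// Hypothetical standalone version of splitCombinedId().
function split_combined_id($combined_id) {
  // A limit of 2 splits only at the first separator; any later "/" (e.g. in
  // a URI item ID) stays in the second part.
  $parts = explode(DATASOURCE_ID_SEPARATOR, $combined_id, 2);
  if (count($parts) < 2) {
    // No datasource prefix: report NULL as the datasource ID.
    return array(NULL, $combined_id);
  }
  return $parts;
}
```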

Here's the rest of the review:

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+/**
+ * Tests entity indexing that are using string IDs.
+ *
```

Tests indexing entities that are using string IDs.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+  /**
+   * Modules to enable for this test.
+   *
+   * @var string[]
+   */
+  public static $modules = array(
```

This is overriding a property of KernelTestBase, so this docblock should be:

```
/**
 * {@inheritdoc}
 */
```

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+    // Enable translation for the entity_test module.
+    \Drupal::state()->set('entity_test_string_id.translation', FALSE);
```

Are you sure this is correct? This says that it is "enabling" translation, but it sets the value to FALSE, which seems to be disabling translation? Am I seeing this incorrectly?

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+    // Create a test server.
+    $this->server = Server::create(array(
+      'name' => 'Test Server',
+      'id' => 'test_server',
+      'status' => 1,
+      'backend' => 'search_api_test_backend',
+    ));
+    $this->server->save();
```

~~This is really minor, but in new code written for D8 we typically use the new short array syntax [] instead of array()~~.

~~If you decide to fix this, have a look for other places in the test where this syntax is used.~~

Edit: @borisson_ points out that Search API uses the classic array syntax by default.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+  /**
+   * Tests Uris as Ids.
+   *
```

Tests URIs as IDs.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+  public function testUriStringId($entity_id) {
```

This test is really great! I would never be able to tell this is your first test. Great job!

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+    $entity_1 = EntityTestStringId::create(array(
```

There is only 1 entity in this test, so you can just name the variable $entity.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+    // Test indexing the new entities. One should fail so only one should be indexed.
```

This documentation is left over from an earlier version of the patch, I think. The "One should fail" part no longer seems relevant.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+  /**
+   * Provides string ids to test.
+   *
```

Provides string IDs to test.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+   * @return array An array of arrays which contain a list of parameters to be
+   *   passed to the appropriate tests.
```

Here the description should be on a second line. I know, the coding standards are complicated ;)

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+    return[
```

Leave a space between 'return' and '['.

Comment #37

pfrenssen commented 12 April 2016 at 16:59

Assigned:

pfrenssen

» Unassigned

Comment #38

borisson_ commented 12 April 2016 at 17:04

#36.4 - we always use the long notation in the entire Search API module, so let's keep it to the long notation everywhere in this patch.

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+ * For this test we are using:
+ *
+ * entity:entity_test_string_id:<string_id>:und
+ * entity = 6 characters.
+ * entity_test_string_id = 21 characters.
+ * und = 3 characters.
+ * 3 x ':' = 3 characters.
+ * 50 - 6 - 21 - 3 - 3 = 17 characters left for the ID.
```

While this is great information, I don't think we need it in the docblock here, because it will go out of date when this test gets updated.
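
The docblock's arithmetic can also be derived from the strings themselves instead of being written out by hand. A minimal sketch, assuming the 50-character limit and the 'entity:entity_test_string_id:<string_id>:und' layout stated in the docblock; max_entity_id_length() is a hypothetical helper:

```php
<?php
// Hypothetical helper: characters left for the entity ID once the fixed
// prefix and suffix of the item ID are accounted for.
function max_entity_id_length($limit, $prefix, $suffix) {
  return $limit - strlen($prefix) - strlen($suffix);
}

// Layout from the docblock: entity:entity_test_string_id:<string_id>:und
$available = max_entity_id_length(50, 'entity:entity_test_string_id:', ':und');
// $available === 17, matching 50 - 6 - 21 - 3 - 3.
```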

```
+++ b/tests/src/Kernel/EntityStringIdTest.php
@@ -0,0 +1,160 @@
+  function entityStringIdList(){
```

This should be a public function, and there needs to be a space after the parentheses: () {

I agree with @pfrenssen that this looks great!

Comment #39

dimilias commented 13 April 2016 at 07:52

Status	File	Size
new	2671228.patch	6.21 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	2671228.patch	6.23 KB

Implemented latest required changes.

@borisson what is NR?
Also, do I have to select anything specific for my patch to go through testing?

Comment #40

borisson_ commented 13 April 2016 at 08:03

Status:

Needs work

» Needs review

The issue status should be set to "Needs review".

Comment #41

13 April 2016 at 08:05

Status:

Needs review

» Needs work

The last submitted patch, 39: 2671228.patch, failed testing.

Comment #42

13 April 2016 at 08:06

The last submitted patch, 39: 2671228.patch, failed testing.

Comment #43

pfrenssen commented 13 April 2016 at 08:42

```
+++ b/src/Utility.php
@@ -365,7 +365,8 @@ class Utility {
-    $function = $separate_last ? 'strrpos' : 'strpos';
+    $is_default_separator = $separator != IndexInterface::DATASOURCE_ID_SEPARATOR;
+    $function = $separate_last && $is_default_separator ? 'strrpos' : 'strpos';
```

This is not needed any more since we are no longer calling splitPropertyPath() from splitCombinedId().

We should take a look at the failures too.

Comment #44

pfrenssen commented 13 April 2016 at 09:08

I'm thinking that it would be better if we simplify splitPropertyPath() so that it always gives back all three parts, instead of doing all this magic with returning the first or the last part. That would make the logic more straightforward, and we don't need the additional arguments any more.

If it returns an array with three items and you need only the language then you can do list(,,$language) = splitPropertyPath($path). This is very similar to how the entity info was returned by Entity API in D7.
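
A sketch of the idea applied to a combined item ID. split_all_parts() is hypothetical (the patch does not add it), and it assumes the layout '<datasource_id>/<entity_id>:<langcode>', with the datasource ID ending at the first '/' and the language code starting after the last ':':

```php
<?php
// Hypothetical: always return all three parts instead of choosing one.
function split_all_parts($combined_id) {
  $slash = strpos($combined_id, '/');
  $colon = strrpos($combined_id, ':');
  return array(
    substr($combined_id, 0, $slash),
    substr($combined_id, $slash + 1, $colon - $slash - 1),
    substr($combined_id, $colon + 1),
  );
}

// Callers that need only the language skip the first two elements.
list(, , $language) = split_all_parts('entity:node/http://example.com/a:en');
```

Empty slots in list() skip elements, so each caller picks only the parts it needs.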

Comment #45

pfrenssen commented 13 April 2016 at 09:40

Status:

Needs work

» Needs review

Status	File	Size
new	2671228-45-test-only.patch	4.43 KB
new	2671228-45.patch	5.64 KB
new	interdiff.txt	1.08 KB

Hopefully this solves the failures. I also addressed my remark from #43. I'm going to have a look at whether it is feasible to simplify splitPropertyPath(), but that's probably better handled in a separate ticket. We also no longer need the $separator argument, but I did not change this.

Comment #46

13 April 2016 at 09:42

The last submitted patch, 45: 2671228-45-test-only.patch, failed testing.

Comment #47

13 April 2016 at 09:43

The last submitted patch, 45: 2671228-45-test-only.patch, failed testing.

Comment #48

dimilias commented 13 April 2016 at 09:48

Status:

Needs review

» Needs work

Hmm, nice one! And I thought I was missing something trivial when I tried to fix this myself. I hadn't noticed the NULL case. :/

Comment #49

borisson_ commented 13 April 2016 at 09:51

Status:

Needs work

» Reviewed & tested by the community

Status	File	Size
new	interdiff.txt	1.32 KB
new	support_for_arbitrary-2671228-48.patch	5.61 KB

3 files were hidden/shown/deleted

Status	File	Size
hidden	2671228-45-test-only.patch	4.43 KB
hidden	2671228-45.patch	5.64 KB
hidden	interdiff.txt	1.08 KB

We can open a follow-up to simplify splitPropertyPath().

Comment #50

drunken monkey commented 21 April 2016 at 17:53

Status	File	Size
new	2671228-50--uris_as_entity_ids--interdiff.txt	3.66 KB
new	2671228-50--uris_as_entity_ids.patch	5.99 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	support_for_arbitrary-2671228-48.patch	5.61 KB

Wow, great work in this issue, thanks a lot everyone!

Before the switch to use splitPropertyPath(), splitCombinedId() used the following code:

```
$pos = strpos($combined_id, IndexInterface::DATASOURCE_ID_SEPARATOR);
if ($pos === FALSE) {
  return array(NULL, $combined_id);
}
return array(substr($combined_id, 0, $pos), substr($combined_id, $pos + 1));
```

However, the performance benefit is surely negligible, and I guess your new version is slightly easier to read, so we can just stick with that.

I don't understand the argument for preg_split(), so I changed the code to use strrpos(), as suggested.
I briefly considered also putting in a $pos === FALSE check as a safeguard, but I actually think throwing

Please see the attached patch for this and some other small nit-picks – if that's still OK with you, I can commit it.
In any case, thanks again!

Comment #51

drunken monkey commented 21 April 2016 at 17:54

Status:

Reviewed & tested by the community

» Needs review

Comment #52

mkalkbrenner commented 22 April 2016 at 07:17

Comment #53

pfrenssen commented 23 April 2016 at 14:31

```
+++ b/src/Plugin/search_api/datasource/ContentEntity.php
@@ -368,7 +368,14 @@ public function loadMultiple(array $ids) {
+      // This can only happen if someone passes an invalid ID, since we always
+      // include a language code. Still, no harm in guarding against bad input.
+      if ($pos === FALSE) {
+        continue;
+      }
```

If this is a side effect of unexpected input, wouldn't it be better to throw an \InvalidArgumentException?

Comment #54

30 April 2016 at 09:20

drunken monkey committed a29edab on 8.x-1.x authored by dimilias

Issue #2671228 by dimilias, pfrenssen, borisson_, drunken monkey: Fixed...

Comment #55

drunken monkey commented 30 April 2016 at 09:37

Status:

Needs review

» Fixed

If this is a side effect of unexpected input, wouldn't it be better to throw an \InvalidArgumentException?

No, I don't think so. If you try to load a node with a string as a NID, you will also just get back NULL (or an empty array, if multi-loading). Drupal is generally lenient about errors when loading entities, and it seems natural to mimic that behavior here, I'd say.
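
A sketch of the lenient behavior described above: item IDs without a ':<langcode>' suffix are skipped instead of triggering an exception, much like loading a non-existent entity simply returns nothing. The function name and the grouping by language are hypothetical, not the committed loadMultiple() code:

```php
<?php
// Hypothetical: group entity IDs by language code, silently skipping
// item IDs that carry no language code.
function group_ids_by_language(array $item_ids) {
  $grouped = array();
  foreach ($item_ids as $item_id) {
    $pos = strrpos($item_id, ':');
    // Invalid input: skip it, mirroring lenient entity loading.
    if ($pos === FALSE) {
      continue;
    }
    $langcode = substr($item_id, $pos + 1);
    $grouped[$langcode][] = substr($item_id, 0, $pos);
  }
  return $grouped;
}

$grouped = group_ids_by_language(array('http://example.com/a:en', 'bogus', 'x:y:de'));
```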

So, since this is blocking other issues and this is the only complaint, I have now committed it.
If you still feel we should throw an exception, please either re-open the issue or create a new one!

@mkalkbrenner: How exactly is that issue related? (Thanks to the link, I noticed it was outdated, though.)

Comment #56

mkalkbrenner commented 30 April 2016 at 10:09

@mkalkbrenner: How exactly is that issue related? (Thanks to the link, I noticed it was outdated, though.)

Documentation ;-)

Comment #57

14 May 2016 at 10:14

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.
