Now that string ID is part of Drupal 8 we have come across with an issue regarding the length of the ID supported.
The current item ID length is 50 characters long. Back in drupal 7 with the numeric IDs, even 10 characters are more than enough but this could be a problem for Drupal 8.

Use case scenarios are for example DOI assignment where each item could get an ID of an identifier in the format of <prefix_number>-<item_number>-<subitem_number> etc that could end up in something like 001test02-12290-cd1222. More like a Uuid generator.
What makes this even worse is that the current saved string's format (regarding the item ID) is something like
entity:<entity_type_id>/<entity_id>:<language_code>
In the provided test, I am using the core entity_test_string_id entity type which is already an id of 21 characters long. The word 'entity' is 6 character long and if, let's say, we assume 'und' as the language code, plus the separators ':' and '/', we are up to 33 characters. Which leaves us with 17 characters for the ID. Even the Uuid generator returns more than 30 characters long string.

Searching around a bit, I think that we can increase the limit to 191 characters before we hit some database limit but feeling the need to not be that greedy, I have provided a patch as well that can increase the limit to 150 so that e.g. for the test, the limit goes up to 117 characters.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dimilias created an issue. See original summary.

The last submitted patch, 2: 2704761_test.patch, failed testing.

idimopoulos’s picture

For log reasons I am also submitting a patch_only version (because I need to refer to it from the composer).

pfrenssen’s picture

Issue summary: View changes
pfrenssen’s picture

Realistically this should support entity IDs that are as long as the ones that core supports. I looked this up and the example implementation EntityTestStringId uses the basic string data type without any limitation. This means a maximum of 255 characters (ref. StringItem::defaultStorageSettings()).

Then we also need to take into account the maximum length of an entity type machine name and the language code. Both of these can be pretty long. Entity types can be user generated (e.g. node types), and these are typically up to 32 characters. The language code can be for example 'xxx-lolspeak' which is 12 characters.

I think it would be better to split up the parts that make up this ID and store them in dedicated table columns.

pfrenssen’s picture

Status: Needs review » Postponed
Related issues: +#2671228: Support for arbitrary strings as entity IDs

The patch is building on the test written for #2671228: Support for arbitrary strings as entity IDs, so let's wait until that is in first.

drunken monkey’s picture

Issue tags: -search_api_db, -entity_id, -string id

I completely support this, 50 characters is really very little for our combined IDs. Almost as soon as some entity uses string IDs, this will be too restrictive.
However, unless I'm mistaken, especially with columns in the primary key increasing the size does have its repercussions, so I wouldn't go overboard here. 150 is probably good enough for almost all use cases.
I would have suggested to just use a config setting for determining this, but then we'd always need to adapt the search_api_item.item_id column definition to these changes, which would be really tricky. And a UI would probably be overkill (and might still be problematic).
So yes, let's wait for the other patch to go in, and then commit this.

PS: It seems you (like many others – it's really easy to misinterpret) are confused by the "Issue tags" field. As the guidelines state, they aren't meant for free text tags related to the issue, but only for specific categorization purposes, usually by module maintainers.
So, if you aren't sure your current usage is correct, please just leave the field empty.

pfrenssen’s picture

Yes indeed 150 characters should be sufficient for 99% of the use cases that Search API supports. If this is surpassed it is going to be a very special use case anyway.

We are such a special snowflake in that we are using URIs as entity IDs which can theoretically be very long (over 2000 characters I believe). In practice we have control over the URIs we generate so we are able to stay within the 150 character limit for content we generate ourselves. If we are forced to go over this limit in the future we'll come up with a custom solution like a lookup table or something.

drunken monkey’s picture

Status: Postponed » Needs review

The other patch is committed, this one still applies. Would you say it's RTBC?
Do we need tests for this?

(Also, we might want to profile at some point whether this doesn't cost too much, performance-wise, in the DB backend. If that's the case, we might want to switch to auto-increment IDs just for the DB backend, and an additional table for translating them back to their real item IDs. But that's a follow-up which can be done at any time.)

idimopoulos’s picture

I cannot really suggest something on the profiling but in any case, since the other issue is merged now, I am re submitting the test and the patch as new files (since I am only extending the previous test by 2 lines).

The last submitted patch, 11: 2704761_only_test.patch, failed testing.

drunken monkey’s picture

Makes sense, thanks!
However, we (currently) don't use the short array syntax in this module, so this should stay consistent.
Also, I don't think using a UUID there is very clear – we can just generate a long string value, that's much more explicit.

Also changing the test data to have labels. Also a nice touch, I think, and we can save the comments that way.

The last submitted patch, 13: 2704761-13--increase_id_column_length--tests_only.patch, failed testing.

borisson_’s picture

Status: Needs review » Needs work

The test-only patch doesn't fail on php7, so we should write a better test.

drunken monkey’s picture

Status: Needs work » Needs review

I think it's not related to PHP 7 but to SQLite data types. I guess they don't have a size restriction on their columns, so item IDs could have had arbitrary length there before, too. (Not too familiar with SQLite, though – but it seems a reasonable assumption, I'd say. I wouldn't know what PHP 7 would have to do with this.)

drunken monkey’s picture

(Added a test for PHP 7 and MySQL to verify this hypothesis. Also one for PHP 7 and Postgres, because I'm stupid.)

pfrenssen’s picture

Status: Needs review » Reviewed & tested by the community

Cool, thanks!

drunken monkey’s picture

Status: Reviewed & tested by the community » Fixed

OK, then. Committed.
Thanks again, everyone!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.