Token length is not checked and causes exceptions if it is longer than 50 chars [#2471509]

SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'word' at row 302: INSERT INTO @search_api_db_search_api_database_index_text (item_id, field_name, word, score) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3),

Token was: e4cbaihGUxdzUUogNst2iFA8Qph8AmQ4vpcXSpYbtMqoPzVYlR0UPIsBydNL5pWFLiBtAEA6pg (75 chars)

We need to check what Drupal 7 did here and make sure we test this with unit tests.

Comment	File	Size	Author
#7	2471509-7-search_api-index_overlong_tokens.patch	2.09 KB	ekes
#3	2471509-3--index_overlong_tokens.patch	2.08 KB	drunken monkey
#1	2471509-1-search_api-token_length_is_not.patch	2.54 KB	ekes

Comments

Comment #1

ekes commented 15 April 2015 at 16:36

Status: Active » Needs review

Status	File	Size
new	2471509-1-search_api-token_length_is_not.patch	2.54 KB

D7 relied on the switch falling through from the text case to the token case, where overlong strings were shortened and a log message was written.

In D8, the fall-through was removed in commit b00c4ce, "Tokenizing is not working as expected after adding the HTML filter to the chain in the DB tests".

The patch adds a 50-character check to the text case and truncates longer tokens.
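The check can be sketched as a standalone function. This is not the patch's code: convert_text_value() and the $log array are hypothetical stand-ins (a real backend would write to its logger), and this sketch counts characters using PCRE's UTF-8 mode so that it needs no mbstring. The patch itself cuts by bytes, as the mb_strcut() line quoted in comment #4 shows.

```php
<?php
/**
 * Hypothetical stand-in for the backend's "text" case: splits a fulltext
 * value into words and shortens every word longer than $max_chars, since
 * the 'word' column of the text table only holds 50 characters.
 * Assumes $text is valid UTF-8.
 */
function convert_text_value(string $text, array &$log, int $max_chars = 50): array
{
    $tokens = [];
    foreach (preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Count characters, not bytes: with /u, "." matches one UTF-8 character.
        $length = preg_match_all('/./us', $word);
        if ($length > $max_chars) {
            // Keep the first $max_chars characters and record what happened.
            preg_match('/^.{' . $max_chars . '}/us', $word, $match);
            $log[] = sprintf('Overlong token (%d characters) shortened to %d characters.', $length, $max_chars);
            $word = $match[0];
        }
        $tokens[] = $word;
    }
    return $tokens;
}

// Usage with the token from the error report.
$log = [];
$tokens = convert_text_value('foo e4cbaihGUxdzUUogNst2iFA8Qph8AmQ4vpcXSpYbtMqoPzVYlR0UPIsBydNL5pWFLiBtAEA6pg bar', $log);
```

Truncating rather than dropping the word keeps the item indexable, while the log entry still records that data was shortened.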

Comment #2

ekes commented 15 April 2015 at 16:37

Comment #3

drunken monkey commented 15 April 2015 at 20:00

Status	File	Size
new	2471509-3--index_overlong_tokens.patch	2.08 KB

Thanks for the patch!
Your solution seems to make sense, at least as long as the initial decision to remove the fall-through was really correct/necessary, so I'll commit it.
However, why did you create a new test method instead of just using the existing regressionTests2() method? (Also, if a separate method is necessary, it is missing a doc comment.)
The attached patch would change that (and also removes a stray newline from your patch).

Comment #4

nick_vh commented 15 April 2015 at 22:01

+++ b/search_api_db/src/Plugin/search_api/backend/Database.php
@@ -1114,6 +1114,10 @@ class Database extends BackendPluginBase {
+              $v = mb_strcut($v, 0, 50);

Are we allowed to use mb_strcut() here? Didn't I just see an issue saying we should not rely on the mbstring extension?
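Why reach for mb_strcut() rather than plain substr()? Both count bytes, but substr() can stop in the middle of a multibyte UTF-8 character. A small standalone illustration follows; the sample token and the is_valid_utf8() helper are made up for this purpose, and the mb_strcut() call only runs when the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: PCRE in UTF-8 mode (/u) fails on malformed UTF-8,
// so an empty pattern matches only if the whole subject is valid UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Illustrative token: one ASCII byte followed by 30 "ü" (2 bytes each in UTF-8).
// Byte offset 49 is the lead byte of a "ü", so a 50-byte cut splits that character.
$token = 'a' . str_repeat("\u{00FC}", 30);

// substr() counts raw bytes: 50 bytes, ending in a dangling lead byte.
$naive = substr($token, 0, 50);

// mb_strcut() also counts bytes, but ends the cut on a character boundary.
$safe = extension_loaded('mbstring') ? mb_strcut($token, 0, 50, 'UTF-8') : null;
```

A byte-oriented limit is what matters for fitting the column, but the cut point has to respect character boundaries, which is exactly what substr() alone does not do.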

Comment #5

ekes commented 16 April 2015 at 08:33

I made a separate issue for the mbstring extension. There are more places that use it, so this patch is consistent with the current code, and it gets the test in first.

I made a separate test because regression tests 1 and 2 are logical units, each testing a particular piece of functionality. Only the HTML filter test switches body indexing on and off, so this is a test with body indexing on.

Comment #6

drunken monkey commented 16 April 2015 at 10:34

Re #5: "I made a separate test because regression tests 1 and 2 are logical units, each testing a particular piece of functionality. Only the HTML filter test switches body indexing on and off, so this is a test with body indexing on."

OK, I guess that would make sense, but I prefer it this way: the regression tests are split into groups based on when they should run, and not along other lines as well.
Also, wouldn't it work just as well to test with the long word in the "name" property?

Comment #7

ekes commented 18 April 2015 at 09:20

Status	File	Size
new	2471509-7-search_api-index_overlong_tokens.patch	2.09 KB

For the test, the field needs to be fulltext-indexed and able to hold a value longer than 50 characters. The entity_test name is a varchar(32), so the body is easiest.

As the mbstring patch landed first, I've updated this patch to use Unicode::truncateBytes().
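Unicode::truncateBytes() lets the backend cut by bytes without depending on mbstring. Its implementation is not shown in this issue; the following is a hedged, pure-PHP sketch of the general idea only, and truncate_utf8_bytes() is a hypothetical name. If the byte at the cut position is a UTF-8 continuation byte, the cut falls inside a character, so step back to that character's lead byte before cutting.

```php
<?php
/**
 * Hypothetical helper: truncates a UTF-8 string to at most $max_bytes bytes
 * without splitting a multibyte character. Needs no mbstring extension.
 * Assumes $max_bytes >= 0 and valid UTF-8 input.
 */
function truncate_utf8_bytes(string $string, int $max_bytes): string
{
    if (strlen($string) <= $max_bytes) {
        return $string;
    }
    $len = $max_bytes;
    // Continuation bytes have the bit pattern 10xxxxxx. While the first byte
    // we would drop is one of them, move the cut back by one byte.
    while ($len > 0 && (ord($string[$len]) & 0xC0) === 0x80) {
        $len--;
    }
    return substr($string, 0, $len);
}

// Usage with the 50-byte limit discussed in this issue.
echo strlen(truncate_utf8_bytes(str_repeat('x', 75), 50)), "\n"; // 50
echo strlen(truncate_utf8_bytes('a' . str_repeat("\u{00FC}", 30), 50)), "\n"; // 49
```

For pure ASCII tokens such as the one in the report, this is identical to substr(); only multibyte input is cut slightly shorter than the limit.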

Comment #8

drunken monkey commented 18 April 2015 at 10:44

Status: Needs review » Fixed

OK, thanks!
And true, of course; I didn't think about that. Good work!
The test bot is happy, too, so: committed.
Thanks again!

Comment #9

18 April 2015 at 10:45

drunken monkey committed d7945eb on 8.x-1.x authored by ekes

Issue #2471509 by ekes, drunken monkey: Fixed errors for untokenized...

Comment #10

2 May 2015 at 10:54

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.
