QLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'word' at row 302: INSERT INTO @search_api_db_search_api_database_index_text (item_id, field_name, word, score) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3),

Token was: e4cbaihGUxdzUUogNst2iFA8Qph8AmQ4vpcXSpYbtMqoPzVYlR0UPIsBydNL5pWFLiBtAEA6pg (75 chars)

We need to check what Drupal 7 did here and make sure we test this with unit tests.

Comments

ekes’s picture

Status: Active » Needs review
StatusFileSize
new2.54 KB

D7 relied on falling through the switch cases from text to token where strings were shortened, with a log message.

D8 the fall through was removed: b00c4ce "Tokenizing is not working as expected after adding the HTML filter to the chain in the DB tests".

Patch adds a check within text case for 50 characters, and truncates.

ekes’s picture

drunken monkey’s picture

StatusFileSize
new2.08 KB

Thanks for the patch!
Your solution seems to make sense, at least as long as the initial decision to remove the fall-through was really correct/necessary. So, I'll commit it.
However, why did you create a new method on the test, not just use the existing regressionTests2() method? (Also, the doc comment would be missing if it's necessary.)
The attached patch would change that (and also removes a stranded newline in your patch).

nick_vh’s picture

+++ b/search_api_db/src/Plugin/search_api/backend/Database.php
@@ -1114,6 +1114,10 @@ class Database extends BackendPluginBase {
+              $v = mb_strcut($v, 0, 50);

Are we allowed to use the mb_strcut here? Didn't I just see an issue we should not rely on the mb extension?

ekes’s picture

MB extension I made a separate issue. There are more instances, so this is consistent with present, and add the test first.

I made a separate test because the 1 and 2 are logical units testing particular activity; only the HTML filter test switches body indexing on and off; so this is a test with body indexing on.

drunken monkey’s picture

I made a separate test because the 1 and 2 are logical units testing particular activity; only the HTML filter test switches body indexing on and off; so this is a test with body indexing on.

OK, would make sense I guess, but I think I like it more like this, just having the regression tests split in groups based on when they should run, not along other lines as well.
Also, wouldn't it work just as well to test with the long word in the "name" property?

ekes’s picture

For the test it needs to be Fulltext indexed, and able to be 50 chars. entity_test name is varchar(32). So body is easiest.

As the MB patch has landed first, I've updated this to work with Unicode::truncateBytes().

drunken monkey’s picture

Status: Needs review » Fixed

OK, thanks!
And true, of course, didn't think about that. Good work!
Test bot is happy, too, so: committed.
Thanks again!

  • drunken monkey committed d7945eb on 8.x-1.x authored by ekes
    Issue #2471509 by ekes, drunken monkey: Fixed errors for untokenized...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.