When Preprocessors "Transliteration" or "Ignore characters" to replace/filter out emojis, the Type of the Field doesn't matter.

Steps to reproduce
- Install Drupal with search_api and search_api_db (I've used Core 8.7.6 and sear_api 8-x.1.14)
- Add a Search Server (all default options, name Server)
- Add a Search Index (name Index, tick Datasources - Content, select Server Server)
- Go to [site_url]/admin/config/search/search-api/index/Index/fields and add field.
- Click 'Add' button on Content - Title, click 'Done'
- Click 'Save changes', leaving the Type as the default String.
- Create a Basic page with Emojis in the title.
- Run Cron to trigger indexing of the new page.
- Go to watchdog log and see this SQL-Error:

SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\xF0\x9F\x92\x96 ...' for column 'value' at row 1: INSERT INTO @search_api_db_b_title (item_id, value) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => entity:node/1:en [:db_insert_placeholder_1] => 💖 💎, Daisy Meadows 👑💎 )

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

destinationsound created an issue. See original summary.

drunken monkey’s picture

Component: General code » Database backend
Status: Active » Postponed (maintainer needs more info)

This should already be supported since a few years ago. And we even have tests for it, I’m pretty sure.
Which database system are you using?

borisson_’s picture

Yes, make sure your database supports and is created as utf8_mb4 and then it should just work :)

Spokje’s picture

Status: Postponed (maintainer needs more info) » Active

I re-activated this one since I'm also experiencing problems with Emojis.

It all works fine if you use Type "Text" for your Fields. If using any other type: Nastiness.

Steps to reproduce
- Install Drupal with search_api and search_api_db (I've used Core 8.7.6 and sear_api 8-x.1.14)
- Add a Search Server (all default options, name Server)
- Add a Search Index (name Index, tick Datasources - Content, select Server Server)
- Go to [site_url]/admin/config/search/search-api/index/Index/fields and add field.
- Click 'Add' button on Content - Title, click 'Done'
- Click 'Save changes', leaving the Type as the default String.
- Create a Basic page with Emojis in the title.
- Run Cron to trigger indexing of the new page.
- Go to watchdog log and see this SQL-Error:

SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\xF0\x9F\x92\x96 ...' for column 'value' at row 1: INSERT INTO @search_api_db_b_title (item_id, value) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => entity:node/1:en [:db_insert_placeholder_1] => 💖 💎, Daisy Meadows 👑💎 )

Note: Your mileage on the exact error will vary with your chosen title for the Basic Page.

Spokje’s picture

Title: adding emoji support for indexing? » Indexing fields with Emojis will not work for fields that aren't Typed as Text.
Issue summary: View changes

Updated Title and Issue Summary

Spokje’s picture

Category: Feature request » Bug report

Changed Category from Feature request to Bug report

Spokje’s picture

It looks like we're shooting ourselves in the foot in Class Drupal\search_api_db\DatabaseCompatibility\MySql where we set
tables to the Emoji-unfriendly utf8/utf8_general_ci whenever we _not_ use Type text:

$charset = $type === 'text' ? 'utf8mb4' : 'utf8';
$collation = $type === 'text' ? 'utf8mb4_bin' : 'utf8_general_ci';

https://git.drupalcode.org/project/search_api/blob/8.x-1.x/modules/searc...

According to the comment above this code, this is used to be able to use the full 255 characters as a primary key.
My patch will probably completely ignore that, but I'm curious to see what current tests will break.

I think at the very least we would need something filtering out non-utf8 characters when not using utf8mb4 to prevent the mention MySQL error. And also, of course a Test that proves my current reproduction steps are valid.

Spokje’s picture

Version: 8.x-1.13 » 8.x-1.x-dev
Spokje’s picture

Right, that didn't go well...

On my setup (D8.7.6, search_api 1.14 with locally Ver 15.1 Distrib 10.3.9-MariaDB, for Win64 (AMD64) and on server Ver 15.1 Distrib 10.1.41-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2 I only see primary keys with the maximum length of 150, so for my case, I think I could get away with my patch (and drush pmu search_api_db, drush en search_api_db, drush sapi-c and drush sapi-i)

Maybe a forced processor 'Ignore characters' so when using Field Type String?

Anyway, we need a failing test to reproduce first. I'll try to create one tomorrow, please feel free to create one before I can.

Spokje’s picture

Status: Active » Needs work
VasyOK’s picture

Thanx Spokje! 🤝

Your solution works.

While patching with Composer, it write:
Could not apply patch! Skipping.

In current version of serach_api this fragment something changed:

try {
    $this->database->query("ALTER TABLE {{$table}} CONVERT TO CHARACTER SET '$charset' COLLATE '$collation'");
}

Actual patch attached.

Yes, after uninstall search_api_db some settings are gone.
So is needed:

  • export drupal configuration
  • uninstall search_api_db
  • import drupal configuration
  • and reindex searched items

It's comfortable with drush:

drush cex
drush pmu search_api_db
drush en search_api_db
drush cim
drush sapi-c
drush sapi-i