I have set the entity title as the unique value, with GUID as the unique target.

When Feeds processes an import, "météo" and "meteo" are treated as the same value. Differences in diacritics are ignored, so the different characters (é and e) are considered equal.

This should be changed, because on a multilingual website, words that differ only in a diacritic often have a different meaning in another language. The internet domain name system also treats such strings as distinct.

Conclusion: strings that differ in diacritics (e.g. météo and meteo) should be handled as unique, different strings, or at least a setting should allow for this.

Currently, when importing météo and meteo, only one item is kept: Feeds overwrites its values with those of the second item. As a result, one entity is never created and the other ends up with the wrong data.


Comments

Yuri created an issue.

MegaChriz

Status: Active » Needs review
File size: 877 bytes

This is caused by the database backend. With MySQL, Drupal uses a case-insensitive collation by default (utf8_general_ci), and that collation also ignores accents, so "é" and "e" compare as equal. According to http://stackoverflow.com/questions/4558707/case-sensitive-collation-in-m..., setting the collation to utf8_bin would fix the issue. I tested this by changing the collation of the "guid" column of the "feeds_item" table to utf8_bin, and that worked.
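To illustrate why the two GUIDs collide, here is a minimal Python sketch that compares an accent- and case-insensitive key with a binary key. It is only an approximation: the folding in `general_ci_key` (NFD decomposition, dropping combining marks, case-folding) is an assumption that roughly mimics utf8_general_ci, not MySQL's actual collation tables, and `import_items` is a hypothetical stand-in for the "feeds_item" lookup, not Feeds code.

```python
import unicodedata


def general_ci_key(value: str) -> str:
    """Rough approximation of an accent- and case-insensitive collation.

    Assumption: decompose to NFD, drop combining marks, then case-fold.
    MySQL's real utf8_general_ci tables differ in details.
    """
    decomposed = unicodedata.normalize("NFD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def bin_key(value: str) -> bytes:
    """utf8_bin-style comparison: the raw UTF-8 bytes."""
    return value.encode("utf-8")


def import_items(items, key):
    """Hypothetical stand-in for the GUID lookup: one slot per key."""
    store = {}
    for guid, data in items:
        # An item whose key matches an earlier one overwrites it.
        store[key(guid)] = (guid, data)
    return store


items = [("météo", "weather"), ("meteo", "something else")]
print(len(import_items(items, general_ci_key)))  # 1: second item overwrote the first
print(len(import_items(items, bin_key)))         # 2: both items kept
```

With the folded key, only one entry survives and it holds the second item's data, which matches the behaviour described in the issue; with the binary key, both items are kept.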

The attached patch makes the column "guid" of the table "feeds_item" case sensitive.

This change could cause issues with existing installs that rely on the case-insensitive behaviour. Such sites could probably work around the problem by using Feeds Tamper's "Convert case" plugin (or by manually setting the collation back to utf8_general_ci).

Note: I also checked whether this issue could be fixed using Feeds Tamper, but I did not find an appropriate Tamper plugin to convert the value. I did find that the MD5 hash of "météo" differs from the MD5 hash of "meteo", but there is no Tamper plugin that generates an MD5 hash for a single value. There is one that generates a hash of the whole item, but that would not work, because then the item as a whole is considered unique. That would result in a new item whenever anything besides the unique value changes.
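The hashing idea can be checked with a short Python sketch using the standard `hashlib` module. It only demonstrates the property mentioned above and is not a Feeds Tamper plugin: an MD5 hex digest consists solely of lowercase ASCII characters 0-9 and a-f, so two different digests stay different even under an accent- and case-insensitive collation.

```python
import hashlib


def md5_hex(value: str) -> str:
    """MD5 hex digest of a single value, encoded as UTF-8."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


a = md5_hex("météo")
b = md5_hex("meteo")
print(a != b)  # True: the digests differ
# Digests contain only lowercase hex characters, so a collation that ignores
# accents or case has nothing to fold.
print(set(a) <= set("0123456789abcdef"))  # True
```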