Problem/Motivation

The 8.x change "Search removes diacritics in indexing rather than relying on database collation" described in https://www.drupal.org/node/2447357 (based on the issue at https://www.drupal.org/node/731298) is incompatible with several languages. It adds a call to a new removeDiacritics function, \Drupal::service('transliteration')->removeDiacritics($text), to the search_simplify function. The call always runs (both during indexing and during actual searches) and removes all diacritical marks from the input text. This happens before hook_search_preprocess implementations have had a chance to process the text, which makes stemming (a suggested use case for the hook), for example, impossible whenever the stemmer relies on accented characters.

The following Snowball stemmers are examples of common stemming algorithms that need accented characters in their input to produce reliable results:

  • Swedish: cannot replace 'löst' with 'lös' in step 3 when input only has 'lost',
  • Danish: similarly, in step 3, cannot replace 'løst' with 'løs',
  • Italian: in step 1, 'ità' won't get removed; in step 2, 'erà', 'erò' or 'irà' won't get removed,
  • Spanish: all steps rely on the existence of diacritical marks, and
  • French: all steps rely on the existence of diacritical marks.
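To make the Swedish case concrete, here is a minimal Python sketch (not Drupal code). Both function names are hypothetical: strip_diacritics is only a stand-in for a diacritic-removal step, built on Unicode decomposition, and toy_swedish_rule imitates just the single 'löst' to 'lös' replacement rather than the full Snowball stemmer. Note that this stand-in only affects characters that decompose into a base letter plus a combining mark, so it does not model the Danish 'ø'.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical stand-in for diacritic removal (not Drupal's implementation).

    Decomposes characters (NFD) and drops the combining marks.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Normalize the suffix once so the comparison does not depend on how this
# source file happens to encode the accented character.
_LOEST = unicodedata.normalize("NFC", "löst")


def toy_swedish_rule(word: str) -> str:
    """Toy version of one stemmer rule: replace a trailing 'löst' with 'lös'."""
    word = unicodedata.normalize("NFC", word)
    if word.endswith(_LOEST):
        return word[:-1]  # drop the final 't'
    return word


if __name__ == "__main__":
    word = "arbetslöst"
    # Rule applied to the original word: the suffix matches.
    print(toy_swedish_rule(word))
    # Rule applied after diacritics were stripped: the suffix no longer matches.
    print(toy_swedish_rule(strip_diacritics(word)))
```

With the accent intact the rule shortens the word; once the input has been reduced to 'arbetslost', the rule has nothing to match and the word passes through unstemmed.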

Removing diacritics in the actual search phase also makes the search too greedy, producing results not related to what the user was searching for.

Proposed resolution

There are several possibilities:

  1. Don't remove diacritical marks.
  2. Make removing them optional per language (and probably per diacritical mark for best results) as originally planned in the linked issue, with sensible defaults.
  3. Run the diacritical mark removal function after the hook_search_preprocess implementations.
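The difference between the current order and option 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Drupal code: simplify_current, simplify_option3, strip_diacritics and toy_italian_rule are all hypothetical names, the preprocessors list stands in for hook_search_preprocess implementations, and the toy rule only imitates the removal of a trailing 'ità'.

```python
import unicodedata
from typing import Callable, List

Preprocessor = Callable[[str], str]


def strip_diacritics(text: str) -> str:
    """Hypothetical stand-in for diacritic removal: NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


_ITA = unicodedata.normalize("NFC", "ità")


def toy_italian_rule(word: str) -> str:
    """Toy preprocessor: remove a trailing 'ità' (one small piece of a stemmer)."""
    word = unicodedata.normalize("NFC", word)
    if word.endswith(_ITA):
        return word[: -len(_ITA)]
    return word


def simplify_current(text: str, preprocessors: List[Preprocessor]) -> str:
    """Order described in the issue: diacritics are removed first."""
    text = strip_diacritics(text)
    for preprocess in preprocessors:
        text = preprocess(text)
    return text


def simplify_option3(text: str, preprocessors: List[Preprocessor]) -> str:
    """Option 3: preprocessors see the accented text, removal happens last."""
    for preprocess in preprocessors:
        text = preprocess(text)
    return strip_diacritics(text)


if __name__ == "__main__":
    hooks = [toy_italian_rule]
    print(simplify_current("università", hooks))   # rule never fires
    print(simplify_option3("università", hooks))   # rule fires, then marks are stripped
```

In this sketch the reordering keeps diacritic-insensitive matching for the final index terms while still letting language-specific preprocessing see the original characters. Option 2 would instead make the strip_diacritics step conditional on language.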

Workaround

It's possible to change the provider of the transliteration service to a custom class. This custom class can extend PhpTransliteration to retain all of its functionality, but have its own removeDiacritics function that does not alter the input text. This function will get called instead of the one in PhpTransliteration.

Comment  File                          Size      Author
#4       drupal-search-diacritics.PNG  38.74 KB  ataimist

Comments

ataimist created an issue.

ataimist commented:

Related issues: +#2858595: Search matching is too greedy
Attachment: new, 38.74 KB

Clarified the issue description per the discussion in https://www.drupal.org/node/731298#comment-11973067 and added a screenshot. Moved the search fuzzy matching problem to a new issue: https://www.drupal.org/node/2858595

ataimist commented:

Version: 8.3.x-dev » 8.4.x-dev

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Version: 8.6.x-dev » 8.7.x-dev

Version: 8.7.x-dev » 8.8.x-dev

Version: 8.8.x-dev » 8.9.x-dev

Version: 8.9.x-dev » 9.1.x-dev

Version: 9.1.x-dev » 9.2.x-dev

Version: 9.2.x-dev » 9.3.x-dev

Version: 9.3.x-dev » 9.4.x-dev

Version: 9.4.x-dev » 9.5.x-dev

Version: 9.5.x-dev » 10.1.x-dev

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

smustgrave commented:

Status: Active » Postponed (maintainer needs more info)
Issue tags: +stale-issue-cleanup

Thank you for reporting this problem. We rely on issue reports like this one to resolve bugs and improve Drupal core.

Since there has been no activity here for over 8 years, we are asking whether this problem persists on a currently supported version of Drupal. To help, add a comment explaining whether the problem still occurs. Any extra detail you can provide can help others who have experienced this.

Since we need more information to move forward with this issue, the status is now Postponed (maintainer needs more info). If we don't receive additional information to help with the issue, it may be closed after three months.
Thanks!

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

quietone commented:

Status: Postponed (maintainer needs more info) » Closed (outdated)

Another 6 months have passed with no indication of whether this is still valid. Therefore, I am closing this issue.

If this is incorrect, re-open the issue. Or you can create a new issue and reference this one.

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.