Search API Japanese Normalizer

This project is not covered by Drupal’s security advisory policy.

Search API Japanese Normalizer is a module that provides a processor for the Drupal Search API module. This processor standardizes variations in Japanese text, improving search accuracy.

Features

This module normalizes Japanese text variations according to the following rules:

  • Convert full-width alphanumeric characters to half-width.
  • Convert half-width Katakana to full-width Katakana.
  • Normalize characters similar to hyphen-minus.
  • Normalize characters similar to the long vowel mark.
  • Replace consecutive long vowel marks with a single one.
  • Remove characters similar to the tilde (~).
  • Convert full-width symbols commonly used in half-width form to half-width.
  • Convert half-width symbols commonly used in full-width form to full-width.
  • Convert full-width spaces to half-width spaces.
  • Replace multiple consecutive half-width spaces with a single one.
  • Remove half-width spaces between any two characters from this set: Hiragana, full-width Katakana, half-width Katakana, Kanji, and full-width symbols.
  • Remove half-width spaces between a character from that set and a half-width alphanumeric character.

The implementation is based on the normalization rules used by NEologd, a dictionary for morphological analyzers. For the detailed conversion rules, see NEologd Normalization Rules.
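
To make the rules above more concrete, here is a minimal Python sketch of a subset of them. It is an illustration only, not the module's actual PHP code: the function name normalize_japanese, the character sets, and the Unicode ranges are assumptions chosen for this example, and the two symbol-conversion rules are left out.

```python
import re
import unicodedata

# Full-width digits and Latin letters map to ASCII by a fixed code point offset.
FULLWIDTH_ALNUM = {
    cp: cp - 0xFEE0
    for start, end in ((0xFF10, 0xFF19), (0xFF21, 0xFF3A), (0xFF41, 0xFF5A))
    for cp in range(start, end + 1)
}

# Illustrative look-alike sets (assumed, not exhaustive).
HYPHEN_LIKE = "\u2010\u2011\u2012\u2013\u2212"
LONG_VOWEL_LIKE = "\uFF0D\uFF70\u2014\u2015\u2500\u2501\u30FC"
TILDE_LIKE = "~\u223C\u301C\uFF5E"

# Approximate ranges: Hiragana and Katakana, common Kanji,
# CJK punctuation, and half-width/full-width forms.
JP = "\u3040-\u30FF\u4E00-\u9FFF\u3000-\u303F\uFF00-\uFFEF"


def normalize_japanese(text: str) -> str:
    """Apply a subset of the normalization rules (hypothetical helper)."""
    # Full-width alphanumerics to half-width.
    text = text.translate(FULLWIDTH_ALNUM)
    # Half-width Katakana to full-width; NFKC also composes voiced marks.
    text = re.sub(
        "[\uFF66-\uFF9F]+",
        lambda m: unicodedata.normalize("NFKC", m.group()),
        text,
    )
    # Hyphen look-alikes become hyphen-minus.
    text = re.sub(f"[{HYPHEN_LIKE}]", "-", text)
    # Long vowel look-alikes become the long vowel mark; runs collapse to one.
    text = re.sub(f"[{LONG_VOWEL_LIKE}]+", "\u30FC", text)
    # Tilde look-alikes are removed.
    text = re.sub(f"[{TILDE_LIKE}]", "", text)
    # Full-width spaces become half-width; repeated spaces collapse to one.
    text = text.replace("\u3000", " ")
    text = re.sub(" {2,}", " ", text)
    # Drop spaces between Japanese characters, and between Japanese
    # characters and half-width alphanumerics (in either order).
    text = re.sub(f"(?<=[{JP}]) (?=[{JP}])", "", text)
    text = re.sub(f"(?<=[{JP}]) (?=[0-9A-Za-z])", "", text)
    text = re.sub(f"(?<=[0-9A-Za-z]) (?=[{JP}])", "", text)
    return text


print(normalize_japanese("スーーパーーー"))   # スーパー
print(normalize_japanese("アルゴリズム C"))  # アルゴリズムC
```

The lookaround patterns leave the neighboring characters unconsumed, so a sequence such as "あ い う" loses both spaces in a single pass. Spaces between two half-width alphanumeric words, as in "Search API", are left alone.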

Example Conversions

Before          After
ドルーパル      ドルーパル
スーーパーーー  スーパー
アルゴリズム C  アルゴリズムC

Post-Installation

After installation, the "Japanese Normalizer" processor appears on the "Processors" tab of the Search API index settings. Enabling it normalizes the variations in Japanese text described under Features, which improves search accuracy.

Additional Requirements

This module requires the Search API module. For setup instructions, see the Search API module documentation.

Similar projects

  • Search API Kana Convert - A module specializing in converting between Hiragana, Katakana, and Romaji representations.

A description in Japanese is available here.
