This project is not covered by Drupal’s security advisory policy.
Search API Japanese Normalizer provides a processor for the Drupal Search API module. The processor normalizes variant forms of Japanese text (for example, full-width versus half-width characters), improving search accuracy.
Features
This module normalizes Japanese text variations according to the following rules:
- Convert full-width alphanumeric characters to half-width.
- Convert half-width Katakana to full-width Katakana.
- Normalize characters similar to hyphen-minus.
- Normalize characters similar to the long vowel mark.
- Replace consecutive long vowel marks with a single one.
- Remove characters similar to the tilde (~).
- Convert full-width symbols commonly used in half-width form to half-width.
- Convert half-width symbols commonly used in full-width form to full-width.
- Convert full-width spaces to half-width spaces.
- Replace multiple consecutive half-width spaces with a single one.
- Remove half-width spaces between any two characters that are Hiragana, full-width Katakana, half-width Katakana, Kanji, or full-width symbols.
- Remove half-width spaces between one of those characters and a half-width alphanumeric character.
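The rules above can be approximated in a few lines of Python. This is a minimal sketch for illustration, not the module's own code (the module is a Drupal PHP plugin). The function name `normalize_ja` and the exact character sets are assumptions modeled on the NEologd rules. The ordering follows NEologd's approach of temporarily widening ASCII symbols so that the space-removal rules treat them as full-width symbols.

```python
import re
import string
import unicodedata

# Assumed character sets, modeled on the NEologd rules (not taken from the module).
HYPHENS = "\u02d7\u058a\u2010\u2011\u2012\u2013\u2043\u207b\u208b\u2212\uff0d"
LONG_VOWELS = "\ufe63\uff70\u2014\u2015\u2500\u2501\u30fc"
TILDES = "~\u223c\u223e\u301c\u3030\uff5e"

# Full-width digits and Latin letters, plus the half-width Katakana block.
WIDTH_FOLD = re.compile("[\uff10-\uff19\uff21-\uff3a\uff41-\uff5a\uff61-\uff9f]+")

# "Japanese" side: CJK symbols, Hiragana, Katakana, Kanji, full-width forms.
JP = "\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff\uff00-\uffef"
ALNUM = "0-9A-Za-z"
# A space is dropped if it sits between two JP characters, or between a JP
# character and a half-width alphanumeric (in either order).
SPACE_TO_DROP = re.compile(
    f"(?<=[{JP}]) (?=[{JP}{ALNUM}])|(?<=[{ALNUM}]) (?=[{JP}])"
)


def normalize_ja(text: str) -> str:
    """Hypothetical NEologd-style normalizer (illustrative sketch only)."""
    s = text.strip()
    # Full-width alphanumerics -> half-width; half-width Katakana -> full-width.
    # Whole runs go through NFKC so that "ﾄﾞ" composes into a single "ド".
    s = WIDTH_FOLD.sub(lambda m: unicodedata.normalize("NFKC", m.group()), s)
    # Hyphen look-alikes -> "-".
    s = re.sub(f"[{HYPHENS}]+", "-", s)
    # Long-vowel look-alikes -> "ー"; any run of them collapses to one.
    s = re.sub(f"[{LONG_VOWELS}]+", "\u30fc", s)
    # Tilde look-alikes are removed.
    s = re.sub(f"[{TILDES}]", "", s)
    # Temporarily widen ASCII symbols so the space rules see them as full-width.
    s = s.translate({ord(c): ord(c) + 0xFEE0 for c in string.punctuation})
    # Full-width spaces -> half-width; runs of spaces collapse to one.
    s = re.sub("[ \u3000]+", " ", s).strip()
    # Drop spaces between two JP characters, or between JP and alphanumerics.
    s = SPACE_TO_DROP.sub("", s)
    # Narrow full-width symbols back to ASCII, but keep "＝" (U+FF1D) full-width.
    s = re.sub(
        "[\uff01-\uff1c\uff1e-\uff5e]",
        lambda m: chr(ord(m.group()) - 0xFEE0),
        s,
    )
    # Curly quotes -> ASCII quotes.
    return s.translate({0x2019: "'", 0x201D: '"'})


if __name__ == "__main__":
    for before in ["ﾄﾞﾙｰﾊﾟﾙ", "スーーパーーー", "アルゴリズム C", "Coding the Matrix"]:
        print(before, "->", normalize_ja(before))
```

Under these assumptions the sketch collapses "スーーパーーー" to "スーパー" and joins "アルゴリズム C" into "アルゴリズムC". It leaves "Coding the Matrix" untouched, because every space there sits between two Latin letters.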
The normalization rules are based on those used by NEologd, a dictionary for morphological analyzers. For the detailed conversion rules, refer to the NEologd Normalization Rules.
Example Conversions
| Before | After |
|---|---|
| ドルーパル | ドルーパル |
| スーーパーーー | スーパー |
| アルゴリズム C | アルゴリズムC |
Post-Installation
After installation, a "Japanese Normalizer" processor is available on the "Processors" tab of the Search API index settings. Enabling it normalizes variations in Japanese text automatically, improving search accuracy.
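The accuracy gain comes from applying the same normalization to stored text and to what users type. The sketch below illustrates this with a toy substring search in Python. `fold`, `search`, and the sample documents are hypothetical, and `fold` (NFKC plus long-vowel collapsing) is a much simpler stand-in for the processor's rules. Search API processors can run at indexing time and at query time, but whether this module also normalizes search keys is an assumption here.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Hypothetical stand-in normalizer: NFKC plus collapsing runs of "ー".

    Far simpler than the processor's full rule set; illustration only.
    """
    s = unicodedata.normalize("NFKC", text)
    return re.sub("\u30fc+", "\u30fc", s)


def search(docs: dict[str, str], query: str, normalize=None) -> list[str]:
    """Return ids of documents whose text contains the query as a substring.

    If ``normalize`` is given, it is applied to both the stored text and the
    query, as a processor running at index time and query time would be.
    """
    norm = normalize if normalize is not None else (lambda s: s)
    q = norm(query)
    return [doc_id for doc_id, body in docs.items() if q in norm(body)]


DOCS = {
    "a": "ドルーパルのスーパー検索",
    "b": "ＰＲＭＬ 副読本",
}

if __name__ == "__main__":
    for query in ["ﾄﾞﾙｰﾊﾟﾙ", "スーーパー", "PRML"]:
        # Raw matching first, then matching with normalization on both sides.
        print(query, search(DOCS, query), search(DOCS, query, fold))
```

Without `fold`, none of the three queries matches. With it applied on both sides, each query finds its document. Normalizing only one side would not be enough: a query typed in half-width letters would still miss text stored in full-width letters.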
Additional Requirements
The Search API module is required for this module to function. For setup instructions, please refer to the Search API module documentation.
Recommended modules/libraries
- Search API Japanese Tokenizer: Optimizes search indexes using natural language processing and resolves issues related to N-grams.
Similar projects
- Search API Kana Convert: A module specializing in conversion between Hiragana, Katakana, and Romaji representations.
Project information
Maintenance fixes only: considered feature-complete by its maintainers.
- Project categories: Site search
- Ecosystem: Search API
- 7 sites report using this module
- Created by u7aro
Releases
First release.
Development version: 1.0.x-dev updated 4 Feb 2025 at 00:52 UTC


