The purpose of http://api.drupal.org/api/function/hook_search_preprocess/7 is to let stemming modules (and perhaps others) pre-process text before it is added to the search module's index, or used as input to a query on that index. As it currently stands in Drupal 6 and 7, the hook takes a single parameter: the text to be pre-processed.
That works fine for a single-language site, where you would likely have only one stemming module. But on a multilingual site, with presumably multiple stemming modules, the stemming modules have no way of knowing which language the text is in. And you definitely shouldn't pre-process English text with a German or Spanish stemming module, or vice versa. The Porter Stemmer project has an issue about this, for instance.
Given that, I think that hook_search_preprocess needs to have a second input giving the language the text is in, so that stemming modules can decide not to change the text if it is not in their language. For that to happen, several other functions would also need to have new language arguments. Let's see...
- hook_search_preprocess() is invoked from http://api.drupal.org/api/function/search_invoke_preprocess/7
- search_invoke_preprocess() is called from http://api.drupal.org/api/function/search_simplify/7 (actually, search_invoke_preprocess() seems kind of pointless, since it is only 3 lines long and called in just one place?)
- search_simplify() is called from both http://api.drupal.org/api/function/search_index_split/7 and http://api.drupal.org/api/function/search_parse_query/7
- search_index_split() is called from http://api.drupal.org/api/function/search_index/7 -- that is the search API function that modules call to have text indexed during their hook_update_index() implementations.
- search_parse_query() is called from http://api.drupal.org/api/function/do_search/7 -- that is the search API function that modules can call to do a search query.
So all the functions listed above would need to have language awareness. At the API level, search_index() would need to have a language input parameter that the modules would pass in to tell search_index() what language the text is in. And do_search() should be able to glean from the environment what language is in use by the user doing the search (or have that be an input to the function with a default being the current language).
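To make the proposal concrete, here is a minimal, self-contained sketch of how a language code could be threaded through the call chain listed above and used by stemming modules to skip text in other languages. Everything in it is an assumption for illustration: the `sketch_*` functions are hypothetical stand-ins for the Drupal functions they are named after, the `$langcode` parameter name is my own choice, and the "stemmers" are toy regular expressions. It is not the core code, nor the patch attached to this issue.

```php
<?php
// Hypothetical stand-ins only; none of these functions exist in Drupal.

/** Sketch of a language-aware hook_search_preprocess() for a German stemmer. */
function sketch_german_search_preprocess($text, $langcode = NULL) {
  // Leave the text alone unless it is known to be German.
  if ($langcode !== 'de') {
    return $text;
  }
  // Toy "stemming": strip a trailing "en" from words of 5+ letters.
  return preg_replace('/(\w{3,})en\b/', '$1', $text);
}

/** Sketch of the English counterpart. */
function sketch_english_search_preprocess($text, $langcode = NULL) {
  if ($langcode !== 'en') {
    return $text;
  }
  // Toy "stemming": strip a trailing plural "s" from words of 4+ letters.
  return preg_replace('/(\w{3,})s\b/', '$1', $text);
}

/** Stand-in for search_invoke_preprocess(): hands the language to each hook. */
function sketch_invoke_preprocess($text, $langcode = NULL) {
  $hooks = array('sketch_german_search_preprocess', 'sketch_english_search_preprocess');
  foreach ($hooks as $hook) {
    $text = $hook($text, $langcode);
  }
  return $text;
}

/** Stand-in for search_simplify(): lowercases (ASCII only), then runs hooks. */
function sketch_simplify($text, $langcode = NULL) {
  return sketch_invoke_preprocess(strtolower($text), $langcode);
}

/** Stand-in for search_index_split(): simplifies, then splits into words. */
function sketch_index_split($text, $langcode = NULL) {
  $simplified = trim(sketch_simplify($text, $langcode));
  return preg_split('/\s+/', $simplified, -1, PREG_SPLIT_NO_EMPTY);
}

/** Stand-in for search_index(): the calling module states the text language. */
function sketch_index($text, $langcode) {
  return sketch_index_split($text, $langcode);
}

/** Hypothetical lookup of the language used by the person searching. */
function sketch_current_language() {
  return 'en';
}

/** Stand-in for do_search(): falls back to the current language. */
function sketch_do_search($keywords, $langcode = NULL) {
  if (!isset($langcode)) {
    $langcode = sketch_current_language();
  }
  return sketch_index_split($keywords, $langcode);
}
```

With these stand-ins, `sketch_index('cats sleep', 'en')` yields `array('cat', 'sleep')`, while `sketch_index('cats sleep', 'de')` leaves both words untouched because the English stemmer declines text that is not in its language. Giving the new parameter a default of `NULL` is one possible way to keep existing single-argument implementations working, though whether that is the right trade-off is open for discussion.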
|#82||search-preprocess-changelog.patch||588 bytes||Gábor Hojtsy|
PASSED: SimpleTest: [MySQL] 41,219 pass(es).
|#73||adding_language_to_hook_511594-73.patch||10.62 KB||Gábor Hojtsy|
PASSED: SimpleTest: [MySQL] 40,672 pass(es).
|#73||interdiff.txt||858 bytes||Gábor Hojtsy|
FAILED: SimpleTest: [MySQL] Invalid PHP syntax in core/modules/search/lib/Drupal/search/Tests/SearchPreprocessLangcodeTest.php.
PASSED: SimpleTest: [MySQL] 40,670 pass(es).