I've run into an issue with multibyte strings. The search_excerpt() function provides some context around matched keywords, and in some rare cases that context can include part of a multi-byte character. In that case the excerpt will not render. I fixed this by replacing all string functions with their multi-byte counterparts.
You can apply the patch with this command:
$ patch modules/search/search.module -i multibyte-search-excerpt.patch
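To illustrate the kind of failure described above, here is a minimal sketch in Python (not the PHP used by search_excerpt(), and not the patch itself). It assumes the excerpt is cut at byte offsets around the keyword; the function names `byte_excerpt`, `char_excerpt`, and `is_valid_utf8` are hypothetical and exist only for this illustration. The sample string is taken from the Russian test text later in this thread, with the stress marks removed.

```python
def byte_excerpt(text: str, keyword: str, context: int) -> bytes:
    """Cut `context` BYTES on each side of the keyword (byte-oriented slicing)."""
    raw = text.encode("utf-8")
    key = keyword.encode("utf-8")
    pos = raw.find(key)
    if pos < 0:
        return b""
    start = max(pos - context, 0)
    end = min(pos + len(key) + context, len(raw))
    # The slice boundaries may fall in the middle of a multi-byte character.
    return raw[start:end]


def char_excerpt(text: str, keyword: str, context: int) -> str:
    """Cut `context` CHARACTERS on each side of the keyword (character-aware slicing)."""
    pos = text.find(keyword)
    if pos < 0:
        return ""
    start = max(pos - context, 0)
    end = min(pos + len(keyword) + context, len(text))
    return text[start:end]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "традиционная русская матрёшка"

# Each Cyrillic letter here is two bytes in UTF-8, so a 4-byte context
# starts and ends in the middle of a character.
broken = byte_excerpt(sample, "русская", 4)
intact = char_excerpt(sample, "русская", 4)
```

In this sketch `broken` is not valid UTF-8, which is one plausible reason an excerpt would fail to render, while `intact` is a well-formed string. Whether search_excerpt() fails in exactly this way is an assumption; the report does not include a failing input.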
| Comment | File | Size | Author |
|---|---|---|---|
| #1 | drupal_substr-1324652-1.patch | 953 bytes | ptrl |
| | multibyte-search-excerpt.patch | 771 bytes | barnaba.turek |
Comments
Comment #1
ptrl: I had the same problem and solved it by using the drupal_substr() function.
Comment #2
jhodgdon: We need to see if this is still a problem in Drupal 8, and we should probably wait until #916086 ("search_excerpt() doesn't highlight words that are matched via search_simplify()") is done, because it totally reworks the search_excerpt() function and might fix the problem.
We also need a test. Can you provide a suggestion for a call to search_excerpt() that breaks in Drupal 7 due to multi-byte characters?
Comment #3
jhodgdon: The issue this was postponed on is now resolved.
We need to test and see if this is still a problem in D8. If so, we need a test. Unfortunately, the original reporter did not say how to trigger the problem. Can we figure out how to trigger it?
Comment #4
jhodgdon: It's also not just a "minor" issue if search isn't working right for multi-byte characters.
Comment #5
jhodgdon: I attempted to test this today in both D7 and D8.
I created a node with the following multi-byte character data in it (Japanese, Hebrew, Russian, and Greek text that I got from random places):
----
以呂波耳・ほへとち。リヌルヲ。
בְּרֵאשִׁית, בָּרָא אֱלֹהִים, אֵת הַשָּׁמַיִם, וְאֵת הָאָרֶץ.
Сло́жно найти́ си́мвол Росси́и бо́лее популя́рный, чем традицио́нная ру́сская матрёшка.
εἰς τὸ κιβούριον τῆς ἁγίας Σοφίας ἃς οἱ πλάνοι καθεῖλον ἐνθάδ᾽ εἰκόνας ἄνακτες ἐστήλωσαν εὐσεβεῖς πάλιν.
----
After running cron to index the node, I was able to search for words in any of those languages without a problem. The excerpts worked fine.
So unless someone can come up with a way to reproduce this issue, I'm going to close this as "cannot reproduce".