I've run into an issue with multi-byte strings. The search_excerpt() function provides some context around matched keywords; in some rare cases that context can include only part of a multi-byte character. In such a case the excerpt will not render. I fixed this by replacing all string functions with their multi-byte counterparts.

You can apply the patch using the command:
$ patch modules/search/search.module -i multibyte-search-excerpt.patch
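
For readers unfamiliar with the failure mode, here is a minimal sketch of what happens when excerpt context is cut by byte offsets rather than character offsets. It is written in Python purely as an illustration and is not Drupal code; the helper names (`byte_excerpt`, `char_excerpt`, `is_valid_utf8`) and the sample string are hypothetical, and it assumes the text is UTF-8 encoded.

```python
# Illustration only, not Drupal code: byte-based slicing can split a UTF-8 character.

def byte_excerpt(text: str, start: int, length: int) -> bytes:
    """Cut by byte offsets, as byte-oriented string functions do."""
    return text.encode("utf-8")[start:start + length]


def char_excerpt(text: str, start: int, length: int) -> str:
    """Cut by character offsets, as multi-byte-aware functions do."""
    return text[start:start + length]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


sample = "русская"  # hypothetical sample; each Cyrillic letter is 2 bytes in UTF-8

# Starting at byte 1 lands in the middle of the first letter,
# so the resulting excerpt is not valid UTF-8 and cannot be rendered.
broken = byte_excerpt(sample, 1, 6)
print(is_valid_utf8(broken))

# The character-based cut always yields whole characters.
print(char_excerpt(sample, 1, 6))
```

Whether a given search triggers the problem depends on where the cut happens to land, which would be consistent with it showing up only in rare cases.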

Comments

ptrl’s picture

Version: 7.8 » 7.x-dev
Assigned: barnaba.turek » Unassigned
Status: Active » Needs review
Attached file: new, 953 bytes

I had the same problem and solved it by using the drupal_substr function.

jhodgdon’s picture

Version: 7.x-dev » 8.x-dev
Issue summary: View changes
Status: Needs review » Postponed
Issue tags: +Needs tests, +Needs backport to D7
Related issues: +#916086: search_excerpt() doesn't highlight words that are matched via search_simplify()

We need to see if this is still a problem in Drupal 8, and probably we should wait until #916086: search_excerpt() doesn't highlight words that are matched via search_simplify() is done, because it is totally reworking the search_excerpt() function and might fix the problem.

We also need a test. Can you provide a suggestion for a call to search_excerpt() that breaks in Drupal 7 due to multi-byte characters?

jhodgdon’s picture

Status: Postponed » Needs work

The issue this was postponed on is now resolved.

We need to test and see if this is still a problem in D8. If so, we need a test. The original reporter did not say how to trigger the problem, unfortunately... can we figure out how to trigger the problem?
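
One possible way to look for triggering input, sketched in Python rather than as a Drupal test: list the byte offsets in a string that fall inside a multi-byte character, then choose text and keywords so that an excerpt boundary would land on one of them. This assumes the problem comes from byte-offset cuts in UTF-8 text, as the original report suggests; the function names and sample strings are hypothetical.

```python
# Illustration only: find byte offsets that would split a UTF-8 character.

def mid_character_offsets(text: str) -> list:
    """Return byte offsets that fall inside a multi-byte UTF-8 character."""
    data = text.encode("utf-8")
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return [i for i, b in enumerate(data) if (b & 0xC0) == 0x80]


def cut_splits_character(text: str, byte_offset: int) -> bool:
    """Return True if cutting at byte_offset would split a character."""
    return byte_offset in mid_character_offsets(text)


# "ж" occupies bytes 1 and 2, so a cut at byte offset 2 splits it.
print(mid_character_offsets("aж b"))
print(cut_splits_character("aж b", 2))
print(cut_splits_character("aж b", 3))
```

Pure ASCII text has no such offsets, so any test string would need multi-byte characters near the point where the excerpt is cut.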

jhodgdon’s picture

Priority: Minor » Normal

It's also not just a "minor" issue if search isn't working right for multi-byte characters.

jhodgdon’s picture

Status: Needs work » Closed (cannot reproduce)

I attempted to test this today in both D7 and D8.

I created a node with the following multi-byte character data in it (Japanese, Hebrew, Russian, and Greek text that I got from random places):

----

以呂波耳・ほへとち。リヌルヲ。

בְּרֵאשִׁית, בָּרָא אֱלֹהִים, אֵת הַשָּׁמַיִם, וְאֵת הָאָרֶץ.

Сло́жно найти́ си́мвол Росси́и бо́лее популя́рный, чем традицио́нная ру́сская матрёшка.

εἰς τὸ κιβούριον τῆς ἁγίας Σοφίας ἃς οἱ πλάνοι καθεῖλον ἐνθάδ᾽ εἰκόνας ἄνακτες ἐστήλωσαν εὐσεβεῖς πάλιν.
----

After running cron to index the node, I was able to search for words in any of those languages without a problem. The excerpts worked fine.

So unless someone can come up with a way to reproduce this issue, I'm going to close this as "cannot reproduce".