In the function views_break_phrase_string(), there is this line:
if (preg_match('/^(\w+[+ ])+\w+$/', $str)) {
"\w" is intended to match word characters, but as written (without the /u modifier) the pattern is applied byte by byte and "\w" does not match multibyte UTF-8 characters.
Hence, for example, if the argument is "あ+い" (Japanese characters), preg_match() does not match and the argument is treated as an invalid string.
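
A minimal reproduction of the behaviour described above, outside of Views. It assumes the file is saved as UTF-8 and that PHP runs with its default "C" locale; the variable names are only for illustration.

```php
<?php
// The pattern from views_break_phrase_string(), unchanged.
$pattern = '/^(\w+[+ ])+\w+$/';

// ASCII words separated by "+" are accepted.
$ascii = preg_match($pattern, 'a+b');

// Multibyte UTF-8 words are rejected, because without the /u modifier
// \w is tested against single bytes, and these bytes are not ASCII
// word characters.
$japanese = preg_match($pattern, 'あ+い');

var_dump($ascii, $japanese);
```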

A workaround: use the following code instead of the above:
if (preg_match('/^([^+ ]+[+ ])+[^+ ]+$/', $str)) {

However, this may not be the correct fix, since it no longer checks whether the characters are word characters at all.
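
To illustrate that concern, here is a small comparison of the workaround with a Unicode-aware "word character" pattern. The second pattern is only one possible alternative and is an assumption of this sketch, not something proposed in the report; it relies on PCRE being built with Unicode property support (\p{L}, \p{N}).

```php
<?php
// Workaround from the report: anything except "+" and space is allowed.
$workaround = '/^([^+ ]+[+ ])+[^+ ]+$/';

// Hypothetical alternative: Unicode letters, digits and underscore only.
$unicode_words = '/^([\p{L}\p{N}_]+[+ ])+[\p{L}\p{N}_]+$/u';

$results = array();
foreach (array('あ+い', 'foo-bar+baz') as $arg) {
  $results[$arg] = array(
    'workaround' => preg_match($workaround, $arg),
    'unicode_words' => preg_match($unicode_words, $arg),
  );
}

// Both patterns accept "あ+い". Only the workaround accepts "foo-bar+baz",
// because "-" is not a word character.
print_r($results);
```

Which of the two behaviours is wanted depends on whether arguments containing dashes or other symbols should be valid.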


Comments

dawehner commented:

It would be cool if you could provide a real patch to solve this issue.

dawehner commented:

If you provide a patch it would be even more cool if you could change views_handler.test::test_views_break_phrase_string to use utf8 words as well.

josaku commented:

Our site decided to use filters instead of contextual filters with UTF-8 characters, so I cannot spend time on the fix now. I'm sorry. I hope someone else can do this...

David_Rothstein commented:

Title: break_phrase for UTF-8 does not work in handlers.inc » Various characters (UTF-8 characters, dashes, and symbols) cause views_break_phrase_string() not to work
Version: 7.x-3.1 » 7.x-3.x-dev
Status: Active » Needs review
FileSize: 4.37 KB

I ran into this problem with dashes as well; the issue is actually more general. See also #1403078: Mulitple email addresses as args doesn't recognize + (a support request which I will mark duplicate of this one) and #672606: Hyphens and forward slashes (-/) break Views contextual filters (which has a patch related to this issue although I think the original issue was about something a bit different).

Seems like the way to go here is just to allow as many characters as we can? We pretty much only want to treat plus signs and commas (and sometimes spaces) as special, since they're the delimiters, but everything else should be allowed to go through, I think.

The attached patch does that, and also adds tests.
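
As a rough illustration of the "only the delimiters are special" idea described above, here is a standalone sketch. It is not the attached patch: the function name sketch_break_phrase() and its return format are hypothetical, and it deliberately ignores the $handler argument that views_break_phrase_string() takes.

```php
<?php
/**
 * Hypothetical sketch: split an argument on "+"/space (or) or "," (and),
 * letting every non-delimiter character through.
 */
function sketch_break_phrase($str) {
  // Any character that is not a delimiter counts as part of a word.
  $or_char = '[^ +,]';
  // With commas as the delimiter, spaces may be part of a word.
  $and_char = '[^+,]';

  if (preg_match("/^({$or_char}+[+ ])+{$or_char}+$/", $str)) {
    return array('operator' => 'or', 'value' => preg_split('/[+ ]/', $str));
  }
  if (preg_match("/^({$and_char}+,)*{$and_char}+$/", $str)) {
    return array('operator' => 'and', 'value' => explode(',', $str));
  }
  // Empty words (doubled delimiters) or mixed delimiters are invalid.
  return NULL;
}

$utf8 = sketch_break_phrase('あ+い');
$symbols = sketch_break_phrase('foo-bar+a@b.com');
$commas = sketch_break_phrase('wõrd1,wõrd2,wõrd');
$invalid = sketch_break_phrase('word1 word2++word');
```

Because the character classes only exclude ASCII delimiters, this works on UTF-8 input without the /u modifier: no byte of a multibyte character can be mistaken for "+", "," or a space.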

A note on the tests (because you'll see I actually made a lot of changes to them). These two:

$this->assertEqualValue(array('word1', 'word2', 'word'), views_break_phrase_string('word1 word2++word', $handler));
$this->assertEqualValue(array('word1', 'word2', 'word'), views_break_phrase_string('word1,,word2,word', $handler));

were passing tests previously, but in fact they never worked at all. The tests were actually completely bogus, because they were passing the same $handler into views_break_phrase_string() over and over again, but if $handler->value already exists then views_break_phrase_string() will keep using the old value (even if an invalid string is passed in), so the tests weren't actually testing much.

I ran into this when I was adding the new tests, so I fixed the tests to work properly and removed the ones (above) that never passed.
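
The pitfall with reusing $handler can be shown with a simplified model. toy_break_phrase() below is a hypothetical stand-in written for this illustration, not the Views function; it only mimics the behaviour described above, where an existing $handler->value is left alone when the new string is invalid.

```php
<?php
/**
 * Hypothetical toy model of a parser that keeps state on $handler.
 */
function toy_break_phrase($str, $handler = NULL) {
  if (!$handler) {
    $handler = new stdClass();
  }
  if (!isset($handler->value)) {
    $handler->value = array();
  }
  if (preg_match('/^([^ +,]+,)*[^ +,]+$/', $str)) {
    $handler->value = explode(',', $str);
  }
  elseif (empty($handler->value)) {
    // The error marker is only set if no earlier value exists.
    $handler->value = array(-1);
  }
  return $handler;
}

// Reusing one handler: the invalid string silently keeps the old value.
$shared = new stdClass();
toy_break_phrase('a,b', $shared);
$stale = toy_break_phrase('a,,b', $shared)->value;

// A fresh handler per call exposes the failure.
$fresh = toy_break_phrase('a,,b')->value;

var_dump($stale, $fresh);
```

With the shared handler, an assertion on the second call checks the result of the first call rather than the string under test, which is why each test case needs its own handler.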

dawehner commented:

Status: Needs review » Fixed
"... were passing tests previously, but in fact they never worked at all. The tests were actually completely bogus, because they were passing the same $handler into views_break_phrase_string() over and over again, but if $handler->value already exists then views_break_phrase_string() will keep using the old value (even if an invalid string is passed in), so the tests weren't actually testing much."

OH!

+++ b/tests/views_handlers.test
@@ -55,23 +55,42 @@ class ViewsHandlersTest extends ViewsSqlTest {
+    $handler = views_break_phrase_string('wõrd1,wõrd2,wõrd');
+    $this->assertEqualValue(array('wõrd1', 'wõrd2', 'wõrd'), $handler);

I like having UTF-8 testing here.

The patch looks fine, especially now that we have real testing. Committed to 7.x-3.x and 8.x-3.x.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.