Hi,
I had a user post an issue for Textimage (#465492: Text wrapper splits words (seemingly at random)) where the text wrapping wasn't working properly. It turned out that preg_match() was reporting an incorrect position due to a Unicode character ('æ') in the string, which offset the cut point.
After a bit of searching I discovered a user-submitted mb_preg_match() function that fixes the issue: http://php.nusa.net.id/manual/en/function.preg-match.php#71571.
Is there any chance that this function, once modified to better suit Drupal's current Unicode functions, will make it into core?
Or am I barking up the wrong tree and should be solving the above issue with a cleaner solution that I've clearly overlooked?
Cheers,
Deciphered.
Comments
Comment #1
Damien Tournoud commented:
Hi Deciphered,
I cannot say I understand completely what the issue is, but it seems clear that preg_match() is only dealing with offsets in bytes, not in characters, so you have to use substr() and strlen(), not drupal_substr() and drupal_strlen().
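To make the byte/character distinction concrete, here is a minimal sketch. The sample string and variable names are made up for illustration, and it assumes the mbstring extension is available (mb_strlen() stands in for drupal_strlen() so that the snippet runs outside Drupal).

```php
<?php
// Hypothetical sample string; "\u{00E6}" is 'æ', which takes 2 bytes in UTF-8.
$text = "\u{00E6}ble kage";

// PREG_OFFSET_CAPTURE turns each match into array(matched text, offset).
preg_match('/kage/u', $text, $matches, PREG_OFFSET_CAPTURE);
$byte_offset = $matches[0][1];

// The offset is counted in bytes, so byte-based substr() cuts at the right place.
$before = substr($text, 0, $byte_offset);

// Counting the characters in that prefix gives the character offset.
$char_offset = mb_strlen($before, 'UTF-8');

echo $byte_offset, ' ', $char_offset, "\n";
```

Here the reported offset is 6 while 'kage' starts at character 5, so passing the reported offset to a character-based function would cut one character too late.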
One other thing you might want to consider is using preg_split() to cut your string at punctuation marks first, and then deal with the wrapping. The advantage is that after the string is cut, you don't have to worry about bytes and characters anymore: you can do all the other operations using characters (i.e. with drupal_*() functions).
Regardless of everything, a function like this could indeed be useful:
Comment #2
Deciphered commented:
Hi Damien,
I had originally fixed the issue by using substr() over drupal_substr() and had also considered breaking the words into an array, but this code does the job quite nicely and feels like the right way to go about Unicode support.
It would be nice to see this code make its way into core, but based on my searches it's clearly not a common issue.
Cheers,
Deciphered.
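For reference, the split-first approach suggested in comment #1 could look roughly like the sketch below. It is not the function discussed in this thread and not Textimage or core code: example_wrap() is a hypothetical helper, it splits on whitespace rather than punctuation to keep things short, and it assumes the mbstring extension (inside Drupal, drupal_strlen() would take the place of mb_strlen()).

```php
<?php
// Hypothetical helper: wraps UTF-8 text so that no line exceeds $width
// characters, breaking only between words.
function example_wrap($text, $width) {
  // Split on runs of whitespace; the /u modifier treats the input as UTF-8.
  $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
  $lines = array();
  $line = '';
  foreach ($words as $word) {
    $candidate = ($line === '') ? $word : $line . ' ' . $word;
    // Lengths are measured in characters, not bytes.
    if ($line !== '' && mb_strlen($candidate, 'UTF-8') > $width) {
      $lines[] = $line;
      $line = $word;
    }
    else {
      $line = $candidate;
    }
  }
  if ($line !== '') {
    $lines[] = $line;
  }
  return $lines;
}

$lines = example_wrap("r\u{00F8}d gr\u{00F8}d med fl\u{00F8}de", 8);
```

Because the string is cut into words before any measuring happens, no byte offsets are involved at all. A word longer than $width simply ends up on a line of its own in this sketch.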
Comment #3
mdupont commented:
Bumping feature request to 8.x-dev
Comment #10
dpi commented:
drupal_preg_match no longer