We use Pathauto for a news site that we run in the Khmer language. Khmer is a little bit special: we don't separate words with spaces. When computerizing the language, we instead use so-called zero-width spaces (U+200B, which is \xE2\x80\x8B in UTF-8) to support line breaks and so forth.

I have come up with the attached one-line patch for handling this; maybe you would be interested in including it upstream. It has helped us tremendously. For example, Boost would fail to cache nearly all our articles because of the occurrence of zero-width spaces in titles. Maybe it can help someone else :-)
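
For readers who want to see the general idea outside Drupal, here is a minimal Python sketch. It is not the attached patch (which is PHP and is not reproduced here), and the helper names `strip_zero_width_spaces` and `make_alias` are hypothetical, made up for illustration. It only shows that the bytes \xE2\x80\x8B decode to U+200B and that removing that character before building an alias leaves a clean string.

```python
# The UTF-8 byte sequence E2 80 8B decodes to U+200B ZERO WIDTH SPACE.
ZWSP = b"\xe2\x80\x8b".decode("utf-8")


def strip_zero_width_spaces(title: str) -> str:
    """Return the title with every zero-width space removed.

    Hypothetical helper for illustration; not part of Pathauto.
    """
    return title.replace(ZWSP, "")


def make_alias(title: str, separator: str = "-") -> str:
    """Rough stand-in for alias generation (an assumption, not Pathauto's logic).

    Strips zero-width spaces first, then joins the remaining
    whitespace-separated parts with the separator.
    """
    cleaned = strip_zero_width_spaces(title)
    return separator.join(cleaned.split())


if __name__ == "__main__":
    sample = "news" + ZWSP + "item today"
    print(repr(sample))       # the invisible character shows up as \u200b
    print(make_alias(sample))
```

The character is stripped rather than replaced with the separator, because in Khmer text it marks a possible line break and not a visible gap. Whether that is the right choice for a given site is a judgment call.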

Attachment: zero_width_space.patch (698 bytes, uploaded by pppost)