Postponed
Project: Bbcode
Version: 5.x-1.x-dev
Component: Code
Priority: Major
Category: Bug report
Assigned: Unassigned
Reporter:
Created: 19 Dec 2007 at 19:10 UTC
Updated: 30 Jan 2011 at 09:52 UTC
I'm currently working on a website running on a domain containing an umlaut (ä, ü, ï for example). When I use the following BB code, it's not being parsed, but the code is simply displayed on the page: [url]http://exämple.com/[/url] (Obfuscated domain name for privacy reasons). As soon as I remove the character containing the umlaut, the link works like a charm.
Comments
Comment #1
naudefj commented
I do understand the problem, but cannot figure out how to solve it.
Any suggestions?
Comment #2
naudefj commented
Issue postponed until someone submits a patch.
Comment #3
gilcot commented
Characters that are not pure ASCII should (or must) be encoded; see l().
Comment #4
yngens commented
Same issue here.
gilcot, non-ASCII characters do not necessarily have to be encoded.
I believe non-ASCII characters should be included in the following lines in 'bbcode-filter.inc', but I don't know how. I need to get Cyrillic characters to work with BBcode.
Comment #5
lars skjærlund commented
I'm bitten by this one, too.
But it's not just a small bug, I'm afraid: It seems to be a serious localization problem. In my case, I'm unable to use filenames with Danish characters - which, of course, should be perfectly legal in a modern world.
I've tried playing with the code a bit, but was quickly struck by a showstopper: Most of the bbcode-filter is regular expressions, and we really need Unicode to support more than mere ASCII characters. Unicode _is_ supported by the PHP preg_ functions - that is, in principle, and if and only if the underlying PCRE library is compiled with Unicode support. On my Linux distro, it isn't.
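For anyone who wants to check their own installation, here is a minimal sketch of a test for Unicode support in the PCRE library behind preg_. The function name pcre_has_unicode_support() is made up for this example and is not part of the bbcode module or of PHP. It relies on preg_match() returning FALSE when a pattern using the /u modifier or \p{L} cannot be compiled.

```php
<?php
// Hypothetical helper (not part of bbcode or PHP itself).
// Returns TRUE when preg_* accepts the /u modifier together with
// Unicode properties such as \p{L}.
function pcre_has_unicode_support() {
  // "\xC3\xA4" is the UTF-8 encoding of "ä", written as explicit bytes
  // so the result does not depend on how this file is saved.
  // @ hides the warning PHP raises when PCRE cannot compile the pattern;
  // in that case preg_match() returns FALSE rather than 0 or 1.
  return @preg_match('/^\p{L}$/u', "\xC3\xA4") === 1;
}

var_dump(pcre_has_unicode_support());
```

If this prints bool(false), Unicode-aware patterns will not work on that system, whatever the module does.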
Of course I could upgrade my Linux - but in that case I'll get PHP 5.3 as well, and that's another huge Drupal issue, as we all know...
So the bbcode module needs to be rewritten to support non-ASCII characters: For starters that means that all occurrences of \w should be replaced by something else. There are a lot of them - for those of you not so familiar with regular expressions, \w matches "word" characters, meaning the letters a..z and A..Z, the digits 0..9 and the underscore. Nothing more than that, at least without Unicode mode - so all of us non-English speaking people are left behind.
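To illustrate the difference, here is a small sketch, assuming a PCRE build with Unicode support and PHP's default "C" locale. The two function names are invented for the example and the patterns are not taken from bbcode-filter.inc. The first function uses plain \w, the second uses the Unicode classes \p{L} (letters) and \p{N} (numbers) with the /u modifier.

```php
<?php
// Hypothetical helpers for illustration only; not code from bbcode-filter.inc.

// Leading run of "word" characters using plain \w (byte-oriented, no /u).
function leading_word_ascii($text) {
  return preg_match('/^\w+/', $text, $m) ? $m[0] : '';
}

// Leading run of letters, digits and underscores in any script.
// Needs PCRE compiled with UTF-8 and Unicode property support.
function leading_word_unicode($text) {
  return preg_match('/^[\p{L}\p{N}_]+/u', $text, $m) ? $m[0] : '';
}

// "exämple" with the ä written as explicit UTF-8 bytes.
$host = "ex\xC3\xA4mple";

var_dump(leading_word_ascii($host));   // stops before the ä
var_dump(leading_word_unicode($host)); // covers the whole word
```

So one possible replacement for \w would be [\p{L}\p{N}_] plus the /u modifier, but only where the PCRE library supports it.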
And next it should be decided if the module should continue to use the preg_ function family as this requires Unicode support in the underlying PCRE library which, unfortunately, seems not to be the norm.
Until that happens, I'm afraid the bbcode module is for English-speaking people only.
BTW - if you want to know why some of us need Unicode support, look no further than to my name!
Regards,
Lars