Every page has a body_id set from the URL alias, but this fails when the URL uses a non-Latin character set. For a URL such as .../ru/Таиланд/Паттайя/Главная it prints <body id="pid--", whereas the English equivalent, thailand/pattaya/home, works fine and gives <body id="pid-thailand-pattaya-home"

I looked at this snippet, #501800: Page titles in Acquia Marina body class, but then got confused about how the body id was being set to the URL alias in the first place, since there is no stock template.php file.

I ran into a similar problem with the Menu Names CSS module, which takes a menu title and creates an id for the li tag for extra flexibility in menu theming. A preg_replace call in that module stripped out all special characters, so I had to tweak the regex to accept foreign character sets. Is something similar going on here? Where does this code live?


Comments

goody815’s picture

druplicate,

are you only running into this issue with the Acquia Marina theme?

goody815’s picture

Assigned: Unassigned » goody815
druplicate’s picture

Title: Body id uses URL alias but barfs with Cyrillic characters » body id uses URL alias but barfs with Cyrillic characters
Project: Fusion » Acquia Marina
Version: 6.x-1.0 » 6.x-3.1
Assigned: Unassigned » goody815
Status: Needs review » Active

I didn't try other themes. It would have to be a Fusion-based theme, and because I have a lot of tpl.php mods, the site would probably not work. With some effort I could ascertain this, but I've since decided to abandon translating URL arguments. It seems it's not that important for SEO and may even cause parsing problems with some crawlers. I also think it does not display in the address bar of early versions of IE, and it does not copy correctly when pasted into email.

A quick survey of a few sites in other countries/languages showed hardly anyone using anything other than Latin characters in URLs, even though ICANN has officially sanctioned their use in TLDs. One notable exception: different language versions of Wikipedia use native character sets in the URL. Still, I think it is potentially a big headache to be avoided for the time being.

UPDATE: The problem is in the template.php file for Fusion on line 79:
$vars['body_id'] = 'pid-' . strtolower(preg_replace('/[^a-zA-Z0-9-]+/', '-', drupal_get_path_alias($_GET['q']))); // Add a unique page id

For URL aliases that are in other character sets, the regex has to be modified. See here: http://www.regular-expressions.info/unicode.html and this PDF: http://www.icu-project.org/docs/papers/iuc26_regexp.pdf
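To illustrate the kind of change meant here, below is a minimal sketch, not the committed Fusion patch. The function names legacy_body_id() and example_body_id() are hypothetical, and the sketch assumes the mbstring extension is available. Because the original pattern has no /u modifier and only allows ASCII letters and digits, every byte of a Cyrillic alias, along with the slashes, collapses into a single hyphen, which is where "pid--" comes from. Using the Unicode properties \p{L} and \p{N} together with the /u modifier keeps the letters instead.

```php
<?php
// Hypothetical helpers for illustration only; neither is part of Fusion.

// Mirrors the expression quoted above from Fusion's template.php.
function legacy_body_id($alias) {
  return 'pid-' . strtolower(preg_replace('/[^a-zA-Z0-9-]+/', '-', $alias));
}

// Unicode-aware variant (assumes the mbstring extension is loaded).
function example_body_id($alias) {
  // \p{L} = any Unicode letter, \p{N} = any Unicode number.
  // The /u modifier makes PCRE treat pattern and subject as UTF-8.
  $slug = preg_replace('/[^\p{L}\p{N}-]+/u', '-', $alias);
  // mb_strtolower() lowercases multibyte letters; strtolower() is only
  // reliable for ASCII.
  return 'pid-' . mb_strtolower($slug, 'UTF-8');
}

echo legacy_body_id('Таиланд/Паттайя/Главная'), "\n";   // pid--
echo example_body_id('Таиланд/Паттайя/Главная'), "\n";  // pid-таиланд-паттайя-главная
echo example_body_id('thailand/pattaya/home'), "\n";    // pid-thailand-pattaya-home
```

One thing to weigh before adopting something like this: HTML 4 and XHTML 1.0 restrict id values to ASCII letters, digits, and a few punctuation characters, so a Cyrillic id may not validate under those doctypes even though browsers generally accept it.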

This should be moved to the Fusion issue queue.

druplicate’s picture

Title: body id uses URL alias but barfs with Cyrillic characters » Body id uses URL alias but barfs with Cyrillic characters
Project: Acquia Marina » Fusion
Version: 6.x-3.1 » 6.x-1.0
Assigned: goody815 » Unassigned

I have decided to use only standard Latin characters in the URL, so I don't need to do this anymore. However, I'm posting this here for future reference.

More on this subject here: #278490: school website: spanish, german, french

aquariumtap’s picture

Status: Active » Needs review
FileSize
1.46 KB

Hi druplicate, I see the bug you're talking about. If you decide to use Cyrillic characters again in your URLs, give this patch a try.

druplicate’s picture

Title: body id uses URL alias but barfs with Cyrillic characters » Body id uses URL alias but barfs with Cyrillic characters
Project: Acquia Marina » Fusion
Version: 6.x-3.1 » 6.x-1.0
Assigned: goody815 » Unassigned
Status: Active » Needs review

Thanks. I was reluctant to just remove that part of the regex without a thorough investigation. There are ways to accept only certain character sets, but maybe that's overkill.

I'm trying to find other ways around this problem. It arises because I have a taxonomy that gets translated for many areas of the site and is also used in the URL. There doesn't seem to be any way to use only the English version of a term for the URL and yet use the appropriate translation everywhere else. What's needed are language-specific tokens. I'll probably have to write my own or, if worse comes to worst, I'll fall back to using automatically transliterated terms for the URL - ugh.

The issue of making language-specific taxonomy tokens has been fixed in D7 but is not easily backported to D6, as discussed here: #736178: Add a [node:source] token for source node of a translated node

aquariumtap’s picture

Status: Needs review » Fixed

Marking as fixed. Patch has been committed in 6.x-1.11. Please re-open if the issue resurfaces.

aquariumtap’s picture

Status: Fixed » Closed (fixed)