I have a problem generating the correct alias for similar words containing diacritics.
Example for my path using [node:title] token.
- bara displays example.com/bara
- bară displays example.com/bara-0 and should display example.com/bară
I tried to manually replace bara-0 with bară, but it tells me bara is already in use. bară should be example.com/bară, which is different from example.com/bara.

Does anyone know how to deal with this? Thank you!

Comments

designarti’s picture

Could anyone tell me if this is an encoding issue? Would using ISO-8859-1 solve my problem? Or is it maybe a database problem, even though it uses the same UTF-8... I've been trying to sort this out for days now, asked on IRC, and I'm beat.
For example,
http://www.example.com/sinonime/amend%C4%83
is treated the same as
http://www.example.com/sinonime/amenda
although they are two different words.
The correct URL here is
http://www.example.com/sinonime/amendă
using the right accent.

What can I do?
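To illustrate the two URLs above: the following is a minimal Python sketch (not Drupal code) showing that `amend%C4%83` is simply the percent-encoded UTF-8 form of amendă, and that it is a different string from amenda. The variable names are made up for this illustration.

```python
from urllib.parse import quote, unquote

plain = "amenda"
accented = "amend\u0103"  # "amendă"; the last letter is U+0103 (a with breve)

# Percent-encode the UTF-8 bytes of the accented word, as a browser would.
encoded = quote(accented)

# Decode the form that shows up in the address bar.
decoded = unquote("amend%C4%83")

print(encoded)              # the percent-encoded path segment
print(decoded == accented)  # the encoded URL really is the accented word
print(decoded == plain)     # and it is not the unaccented word
```

So the URL encoding itself keeps the two words apart; if they are treated as the same alias, the comparison is happening somewhere else.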

pwiniacki’s picture

I guess bară is transliterated to bara, which creates a duplicate. Maybe the Transliteration module has an option to map bară not to bara but to something else, like barA (instead of bara-0, which is the default).

designarti’s picture

Yeah, thanks, but I need the diacritics; this is a dictionary website.

neal.zupancic’s picture

I think it's a database issue. MySQL treats bara and bară as equal under the default collation for most languages. For instance, if you are using character set utf8, the default collation is utf8_general_ci, under which bara = bară. If you change the collation to utf8_bin you will get the behavior you are looking for: bara will not equal bară, so Pathauto will not consider it a duplicate or try to uniquify it, and drupal_lookup_path will resolve the different aliases correctly.
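The difference between the two collations can be sketched in Python. This is only an approximation for illustration: it does not reproduce MySQL's actual collation tables. It assumes that an accent- and case-insensitive comparison behaves roughly like "strip combining marks, then fold case", and that a binary comparison behaves like comparing the UTF-8 bytes. The function names are hypothetical.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Rough stand-in for an accent- and case-insensitive collation."""
    # Decompose letters into base letter + combining marks, then drop the marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def binary_key(text: str) -> bytes:
    """Rough stand-in for a binary collation: compare the raw UTF-8 bytes."""
    return text.encode("utf-8")


a = "bara"
b = "bar\u0103"  # "bară"

general_ci_equal = accent_insensitive_key(a) == accent_insensitive_key(b)
bin_equal = binary_key(a) == binary_key(b)

print(general_ci_equal)  # the insensitive comparison sees one word
print(bin_equal)         # the byte comparison sees two different words
```

Under the first kind of comparison a uniqueness check would report bară as already taken by bara, which matches the bara-0 suffix described in the original post.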

You can change the alias column in the dr_url_alias table in your database to the utf8_bin collation using phpMyAdmin. You could also try out a language-specific collation for your language. Also, make sure transliteration and ASCII filtering are turned off in the Pathauto settings. I did this and was able to auto-generate appropriate aliases for bara and bară.

k_zoltan’s picture

Component: I18n stuff » Code

@neal.zupancic you just saved my day.

The problem really is in the database structure.
The lookup query returns more than one result, and in that case the module always uses the last one (the one with the larger pid).

After changing the collation of the alias field in the url_alias table from utf8_general_ci to utf8_bin, the query no longer returns multiple results, so the path leads where it should.
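The behavior described here can be imitated with a small Python sketch. It is not Drupal's lookup code: the rows, the `lookup` helper, and the key functions are all hypothetical, and the "highest pid wins" rule is taken from the description in this comment.

```python
import unicodedata


def insensitive_key(text):
    """Rough stand-in for an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def binary_key(text):
    """Rough stand-in for a binary collation."""
    return text.encode("utf-8")


# Hypothetical alias rows: (pid, source, alias)
rows = [
    (1, "node/1", "bara"),
    (2, "node/2", "bar\u0103"),  # "bară"
]


def lookup(alias, key):
    """Return the source of the matching row with the largest pid, or None."""
    matches = [row for row in rows if key(row[2]) == key(alias)]
    if not matches:
        return None
    return max(matches, key=lambda row: row[0])[1]


print(lookup("bara", insensitive_key))  # two rows match, the larger pid wins
print(lookup("bara", binary_key))       # only the exact alias matches
```

With the insensitive comparison, a request for bara lands on the node that belongs to bară; with the byte comparison each alias resolves to its own node.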

It's not related to i18n. It only happens when there are non-English characters in the URL.

In some cases the Transliteration module could also fix the problem: https://www.drupal.org/project/transliteration

k_zoltan’s picture

There is only one problem with this solution:

Whereas utf8_general_ci is case-insensitive, utf8_bin isn't.

Even when the "Case Sensitive URL Checking" checkbox is checked in the Global Redirect module, the redirect doesn't work.
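The trade-off can be shown with another small Python sketch. Again this only approximates collation behavior and is not MySQL or Drupal code; the function names are hypothetical. The second function shows what a comparison that ignores case but still respects accents would look like. Whether a collation with that behavior is available in a given MySQL setup is not something this thread establishes.

```python
import unicodedata


def binary_key(text):
    """Rough stand-in for a binary collation: exact bytes, so case matters."""
    return text.encode("utf-8")


def case_insensitive_accent_sensitive_key(text):
    """Fold case but keep diacritics (illustrative only)."""
    return unicodedata.normalize("NFC", text).casefold()


upper = "Bara"
lower = "bara"
accented = "bar\u0103"  # "bară"

# Binary comparison: differing case means different aliases.
bin_case_equal = binary_key(upper) == binary_key(lower)

# Case folded, accents kept: "Bara" matches "bara", but "bară" stays distinct.
folded_case_equal = (
    case_insensitive_accent_sensitive_key(upper)
    == case_insensitive_accent_sensitive_key(lower)
)
folded_accent_equal = (
    case_insensitive_accent_sensitive_key(lower)
    == case_insensitive_accent_sensitive_key(accented)
)

print(bin_case_equal, folded_case_equal, folded_accent_equal)
```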

designarti’s picture

I delayed the dictionary site launch to wait for Drupal 8, just to see if the upgrade would fix this, but I still cannot get my website online because of it.
Changing the collation seems to be only a partial fix. Thank you @neal.zupancic! Diacritics are now shown in the URL, but they collide with aliases that use only English characters.
This time, as I said, I am using Drupal 8.
For example, I have been able to get example.com/bară, but I was not able to generate example.com/bara if the former had already been generated. It is seen as a duplicate and ends up as example.com/bara-0.
I tried both utf8_bin and utf8_romanian_ci, with the same result. The HTML charset is set to UTF-8.