I have often run into cases where a token contains accents or diacritics and pathauto generates a URL that contains the percent-encoded bytes for those letters.
E.g. for a page node with the title "Härkää sarvista" and the [title] token, pathauto creates the alias http://mysite.org/h%C3%A4rk%C3%A4%C3%A4-sarvista, which is not at all SEO friendly.
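To show where the `%C3%A4` sequences come from, here is a minimal sketch using plain PHP rather than pathauto itself. It assumes the alias is stored as UTF-8 and is percent-encoded when it is written into a URL; the variable names are only for illustration.

```php
<?php
// "\u{00E4}" is the letter ä (the escape needs PHP 7+). Using the escape
// keeps the example independent of the encoding of this file.
$alias = "h\u{00E4}rk\u{00E4}\u{00E4}-sarvista";

// In UTF-8, ä is the two bytes 0xC3 0xA4, so each ä becomes "%C3%A4".
$encoded = rawurlencode($alias);

echo $encoded, "\n";
```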
This can be easily solved by calling a function that replaces predefined words containing diacritics.
After calling this function, pathauto generates an SEO-friendly URL like http://mysite.org/harkaa-sarvista
I have created a patch that calls this function, replaces the listed words, and creates an SEO-friendly URL.
This is the function, which can be called from any module:
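function _pathauto_cleanaccent($string) {
  // Map of search strings to their replacements.
  $accents = array(
    "searchword" => "replacewith",
    "Härkää" => "harkaa",
  );
  $output = $string;
  // Apply each replacement in turn; str_replace() is case-sensitive.
  foreach ($accents as $key => $value) {
    $output = str_replace($key, $value, $output);
  }
  return $output;
}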
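A short usage sketch of the same word-replacement idea, written as standalone PHP. The helper name `example_clean_words()` and the hyphenation step are assumptions for illustration and are not part of pathauto or the attached patches.

```php
<?php
// Hypothetical helper mirroring the function above; not pathauto code.
function example_clean_words($string, array $map) {
  // str_replace() accepts arrays and applies each pair in order.
  return str_replace(array_keys($map), array_values($map), $string);
}

// "\u{00E4}" is ä (PHP 7+ escape).
$map = array("H\u{00E4}rk\u{00E4}\u{00E4}" => 'harkaa');

$cleaned = example_clean_words("H\u{00E4}rk\u{00E4}\u{00E4} sarvista", $map);
// Stand-in for the separator handling that alias generation would do.
$alias = str_replace(' ', '-', $cleaned);
echo $alias, "\n";

// The match is case-sensitive, so an already lowercased title is unchanged.
$lower = example_clean_words("h\u{00E4}rk\u{00E4}\u{00E4} sarvista", $map);
```

One consequence of the case-sensitive match is that the replacement has to run before any lowercasing, or the map needs entries for each casing.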
| Comment | File | Size | Author |
|---|---|---|---|
| #18 | clean_alias.patch | 534 bytes | liquidcms |
| #6 | cleanstrings_hook.patch | 536 bytes | liquidcms |
| #3 | pathauto_accent_replace.1.1.patch | 432 bytes | janwari |
| #1 | pathauto_accent_replace.1.patch | 431 bytes | janwari |
Comments
Comment #1
janwari commented
Oops, sorry.
Here is the patch.
Comment #2
greggles
When we discussed this in IRC, I thought your patterns were more complex, but if you want to replace ä with a, that should be possible with the i18n-ascii.txt file.
Also, by calling this "cleanaccent" we limit it to the idea of cleaning accents. It should be pathauto_cleanstring or something like that instead.
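A per-character substitution of the kind mentioned here (ä to a) can be sketched with PHP's `strtr()`. The map below is made up for illustration; the actual contents of i18n-ascii.txt and the way pathauto loads it are not shown in this thread.

```php
<?php
// Illustrative character map only (assumption): ä, Ä and ö to ASCII.
$char_map = array(
  "\u{00E4}" => 'a',
  "\u{00C4}" => 'A',
  "\u{00F6}" => 'o',
);

// With an array argument, strtr() replaces every occurrence of each key,
// so no per-word entries are needed.
$clean = strtr("H\u{00E4}rk\u{00E4}\u{00E4} sarvista", $char_map);
echo $clean, "\n";
```

Unlike a word list, this handles any title made of the mapped characters, but it cannot express whole-word rewrites.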
Comment #3
janwari commented
Yes, the patterns I'm trying to replace are more complex than the example I mentioned above. The patterns I am replacing are Bahá'ís, Bahá'u'lláh, etc.
And as you asked, I renamed the function to pathauto_cleanstrings.
Comment #4
greggles
Better title.
Comment #5
liquidcms commented
the correct patch for this is:
and the correct usage is:
thanks for pointing me in the right direction.
Comment #6
liquidcms commented
and here's the patch
Comment #7
janwari commented
Thanks for writing a patch.
Comment #8
Freso commented
Adding to "my issues" for later review. :)
Comment #9
greggles
Marked #277331: Add option to replace ampersand (&) with 'and' as a duplicate.
Also updating version.
Comment #10
catch
Following the post about moving some of the UI to another module, I think the whole string replacement stuff could be factored out with an alter hook, maybe enough that pathauto itself wouldn't need to worry about it (I've never, ever touched the punctuation settings or needed to).
Comment #11
deciphered commented
My view on this is that the whole cleanup functionality should be broken out into a completely separate module. This would allow any module that needs to run cleanup on tokens to do so without duplicating functionality.
As for the UI, it should be fairly simple to let each module that hooks into this new module choose between using a centralized settings section and integrating the settings into its own settings form.
If anyone is interested, I will likely start work on this new module in the next few weeks and would be more than happy to co-develop it with anyone capable.
Comment #12
aterchin commented
Edit: Guess I didn't read this whole thread through. My patch is only loosely similar: it specifies an alternative selector for punctuation.
http://drupal.org/node/425164
Comment #13
mitchell commented
Updating title.
Comment #14
greggles
@Mitchell, the new title you proposed is not the direct purpose of this issue. Changing back.
Comment #15
mitchell commented
@greggles: will this issue provide the needed functionality to expose pathauto_cleanstring()?
Comment #16
dave reid
We now have hook_pathauto_clean_alias().
Comment #18
liquidcms commented
Dave, not completely sure, but I think the patch in #16 is incorrect.
I think so for a couple of reasons. As it is, the result of the module_invoke_all call isn't being used; it likely needs to be assigned to $output (call by reference won't work here, as the value is passed through the module_invoke_all function).
But even that I don't think is good enough, as module_invoke_all simply builds an array of all the results that the modules implementing the hook ADD to the output. Since this hook is likely intended to work on a string which is possibly "cleaned" by more than one module, I think the code needs to be something like:
and the patch for this is attached
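The difference described in this comment, collecting every module's result into an array versus passing one string through each module in turn, can be sketched in standalone PHP. This is not the attached patch and it does not call Drupal; every function name below is a hypothetical stand-in.

```php
<?php
// Stand-ins for two modules' hook implementations (hypothetical names).
function modulea_clean_alias($alias) {
  return str_replace('&', 'and', $alias);
}
function moduleb_clean_alias($alias) {
  return str_replace(' ', '-', $alias);
}

// Collect-style dispatch: every callback sees the ORIGINAL input and the
// return values are gathered into an array.
function example_invoke_collect(array $callbacks, $alias) {
  $results = array();
  foreach ($callbacks as $callback) {
    $results[] = $callback($alias);
  }
  return $results;
}

// Chain-style dispatch: each callback receives the previous one's output.
function example_invoke_chain(array $callbacks, $alias) {
  foreach ($callbacks as $callback) {
    $alias = $callback($alias);
  }
  return $alias;
}

$callbacks = array('modulea_clean_alias', 'moduleb_clean_alias');
$collected = example_invoke_collect($callbacks, 'salt & pepper');
$chained = example_invoke_chain($callbacks, 'salt & pepper');
```

With collect-style dispatch the caller gets two partially cleaned strings and still has to pick one; with chain-style dispatch it gets a single string with both cleanups applied.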
Comment #19
liquidcms commented
also, I see this started as D5 but is now D7 - not sure what that means.
my patch is against D6 - it is a bug there and likely the same bug in HEAD (D7)
Comment #20
greggles
I think the same thing is in #788304: hook_pathauto_clean_alias doesn't do anything.