I have often come across situations where, because a token contains accents or diacritics, pathauto generates a URL that contains the percent-encoded bytes of those letters.
E.g. for a page node with the title Härkää sarvista and the [title] token, pathauto creates the alias http://mysite.org/h%C3%A4rk%C3%A4%C3%A4-sarvista, which is not at all SEO friendly.

This can be easily solved by calling a function that replaces predefined words containing diacritics.
After calling this function, pathauto generates an SEO-friendly URL like http://mysite.org/harkaa-sarvista

I have created a patch that calls this function to replace certain words and create an SEO-friendly URL.
And this is the function, which can be called from within any module.

function _pathauto_cleanaccent($string) {
  // Map of search words to their replacements.
  $accents = array(
    "searchword" => "replacewith",
    "Härkää" => "harkaa",
  );

  // Apply each replacement in turn.
  $output = $string;
  foreach ($accents as $key => $value) {
    $output = str_replace($key, $value, $output);
  }

  return $output;
}
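
For comparison, here is a minimal sketch of the same idea using strtr() with a replacement map instead of a str_replace() loop. The function name example_clean_string() and the contents of the map are assumptions made up for illustration; they are not part of pathauto or of the attached patch. The reason to consider strtr() is that it tries the longest key first and does not rescan text it has already replaced, so whole-word patterns and single-character fallbacks can live in one map without interfering with each other.

```php
<?php
// Hypothetical helper: the name and the map contents are illustrative only.
function example_clean_string($string) {
  $replacements = array(
    // Whole-word patterns.
    "Bahá'u'lláh" => 'bahaullah',
    "Bahá'ís" => 'bahais',
    // Single-character fallbacks for anything not matched above.
    'á' => 'a',
    'ä' => 'a',
    'í' => 'i',
  );
  // strtr() tries the longest matching key first and never searches
  // inside text that it has already replaced.
  return strtr($string, $replacements);
}
```

With this map, "Bahá'ís" is matched as a whole word, while a title such as Härkää sarvista only has its ä characters replaced and keeps its capital H. The file has to be saved as UTF-8 for the accented keys to match UTF-8 input.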

Comments

janwari’s picture

File status: new
File size: 431 bytes

Oops sorry.

Here is the patch.

greggles’s picture

Status: Needs review » Needs work

When we discussed this in IRC, I thought that your patterns were more complex, but if you want to replace ä with a, that should be possible with the i18n-ascii.txt file.

Also, by calling this "cleanaccent" we limit it to the idea of cleaning accents. It should be pathauto_cleanstring or something like that instead.

janwari’s picture

File status: new
File size: 432 bytes

Yes, the patterns I'm trying to replace are more complex than the example I mentioned above. The patterns I am replacing are Bahá'ís, Bahá'u'lláh, etc.

And as you asked, I renamed the function to pathauto_cleanstrings.

greggles’s picture

Title: Accent replace to get clean URL » allow other modules to affect strings to help with custom accent/string replacement

Better title.

liquidcms’s picture

the correct patch for this is:

  // Let modules define other replacements
  $cleaned = module_invoke_all('pathauto_cleanstrings', $output);
  $output = $cleaned ? $cleaned[0] : $output;

and the correct usage is:

function mymodule_pathauto_cleanstrings($string) {
  $replacements = array(
    "&" => "_and_",
  );
  
  $output = $string;
  foreach($replacements as $key => $value) {
    $output = str_replace($key, $value, $output);
  }
  
  return $output;
}

thanks for pointing me in the right direction.

liquidcms’s picture

File status: new
File size: 536 bytes

and here's the patch

janwari’s picture

Thanks for writing a patch.

Freso’s picture

Adding to "my issues" for later review. :)

greggles’s picture

Version: 5.x-2.x-dev » 7.x-1.x-dev
Assigned: janwari » Unassigned

Marked #277331: Add option to replace ampersand (&) with 'and' as a duplicate.

Also updating version.

catch’s picture

Following the post about moving some of the UI to another module, I think the whole string replacement stuff could be factored out with an alter hook - maybe enough that pathauto itself wouldn't need to worry about it (I've never, ever touched the punctuation settings or needed to).

deciphered’s picture

My view on this is that the whole cleanup functionality should be broken out into a completely separate module. This would allow for any module that needs to run cleanup on tokens to do so without the need to duplicate functionality.

As for the UI, it should be fairly simple to allow each module that hooks into this new module to choose between using a centralized settings section and integrating the settings form into its own settings form.

If anyone is interested, I will likely start work on this new module in the next few weeks and would be more than happy to co-develop it with anyone capable.

aterchin’s picture

Edit: Guess I didn't read this whole thread through. My patch is similar but not quite the same: it specifies an alternative selector for punctuation.
http://drupal.org/node/425164

mitchell’s picture

Title: allow other modules to affect strings to help with custom accent/string replacement » Expose pathauto_cleanstring() to other modules
Issue tags: +clean path

Updating title.

greggles’s picture

Title: Expose pathauto_cleanstring() to other modules » allow other modules to affect strings (to help with custom accent/string replacement)

@Mitchell, the new title you proposed is not the direct purpose of this issue. Changing back.

mitchell’s picture

@greggles: will this issue provide the needed functionality to expose pathauto_cleanstring()?

dave reid’s picture

Status: Needs work » Fixed

We now have hook_pathauto_clean_alias().

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

liquidcms’s picture

Status: Closed (fixed) » Needs review
File status: new
File size: 534 bytes

Dave, I'm not completely sure, but I think the patch in #16 is incorrect.

I think so for a couple of reasons. As it is, the result of the module_invoke call isn't being used; it likely needs to be assigned to $output (call by reference won't work here, as the string is passed through the module_invoke_all function).

But even that, I think, isn't good enough, as module_invoke_all simply builds an array of all the results that the modules implementing the hook ADD to the output. Since this hook is likely intended to handle a string that may be "cleaned" by more than one module, I think the code needs to be something like:

  // Give other modules a chance to clean this alias.
  foreach (module_implements('pathauto_clean_alias') as $name) {
    $function = $name . '_pathauto_clean_alias';
    $output = $function($output);
  }

and the patch for this is attached
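
To make the difference concrete, here is a small standalone sketch that runs without Drupal. The modules "alpha" and "beta", their hook implementations, and the example_* helpers are all hypothetical; example_implementations() merely stands in for module_implements('pathauto_clean_alias'). It contrasts collecting each implementation's result in an array (roughly what happens to string return values under module_invoke_all) with chaining the result of one implementation into the next.

```php
<?php
// Two hypothetical modules, each implementing the hook.
function alpha_pathauto_clean_alias($alias) {
  return str_replace('&', 'and', $alias);
}

function beta_pathauto_clean_alias($alias) {
  return str_replace('ä', 'a', $alias);
}

// Stand-in for module_implements('pathauto_clean_alias').
function example_implementations() {
  return array('alpha', 'beta');
}

// Collecting: every implementation receives the ORIGINAL string, and
// the caller ends up with one partially cleaned copy per module.
function example_collect($alias) {
  $results = array();
  foreach (example_implementations() as $name) {
    $function = $name . '_pathauto_clean_alias';
    $results[] = $function($alias);
  }
  return $results;
}

// Chaining: each implementation receives the output of the previous
// one, so all of the cleanups accumulate in a single string.
function example_chain($alias) {
  foreach (example_implementations() as $name) {
    $function = $name . '_pathauto_clean_alias';
    $alias = $function($alias);
  }
  return $alias;
}
```

For the input 'härkä & sarvi', example_collect() returns two strings, neither of them fully cleaned, while example_chain() returns 'harka and sarvi'. Taking only the first element of the collected array would silently drop the work done by every other module.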

liquidcms’s picture

Category: feature » bug

also, i see this started as D5 but now is D7 - not sure what that means.

my patch is against D6 - it is a bug there and also likely the same bug in HEAD (D7)

greggles’s picture

Status: Needs review » Closed (fixed)