Node Title: a&b and " and c & d'

Path generated: &b-and-"-and-c-&-d'

I expected special characters to be stripped by this bit of code at line 101 of pathauto.inc, but they still show up in the generated path:

// Preserve alphanumerics, everything else becomes a separator.
$pattern = '/[^a-zA-Z0-9]+/ ';
$output = preg_replace($pattern, $separator, $output);
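
For reference, here is a minimal stand-alone sketch of what that pattern does when it is run directly on the raw title. The function name `demo_alnum_only` and the `-` separator are assumptions made for illustration, not pathauto code. The sketch only shows the regex in isolation and says nothing about whether or when pathauto applies it.

```php
<?php
// Hypothetical helper for illustration only; not part of pathauto.
function demo_alnum_only($text, $separator = '-') {
  // Each run of non-alphanumeric characters collapses into one separator.
  return preg_replace('/[^a-zA-Z0-9]+/', $separator, $text);
}

echo demo_alnum_only("a&b and \" and c & d'"), "\n";
// prints: a-b-and-and-c-d-
```

Run this way, the pattern leaves no `&`, `"` or `'` behind, so the reported path suggests the string was not processed like this.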

Comments

therainmakor commented:

Actually, I'm assuming the token module replaces special characters with HTML character entities. I replaced this code, starting at line 106 of pathauto.inc:

  // Get rid of words that are on the ignore list
  $ignore_re = "\b". preg_replace("/,/", "\b|\b", variable_get('pathauto_ignore_words', $ignore_words)) ."\b";
  $output = preg_replace("/$ignore_re/ie", "", $output);

  // Always replace whitespace with the separator.
  $output = preg_replace("/\s+/", $separator, $output);

with this:

  // Get rid of words that are on the ignore list
  $ignore_re = "\b". preg_replace("/,/", "\b|\b", variable_get('pathauto_ignore_words', $ignore_words)) ."\b";
  $output = preg_replace("/$ignore_re/ie", "", $output);
  
  // Remove HTML character entities caused by token module
  $output = preg_replace("/\&[_a-zA-Z]*\;/", "", $output);

  // Always replace whitespace with the separator.
  $output = preg_replace("/\s+/", $separator, $output);

This works great for me and resolved my issue.

therainmakor commented:

Title: Special characters not being removed » Mistake in last bit of code

Use this code instead; I forgot about entities that use a number, like "#039".

  // Get rid of words that are on the ignore list
  $ignore_re = "\b". preg_replace("/,/", "\b|\b", variable_get('pathauto_ignore_words', $ignore_words)) ."\b";
  $output = preg_replace("/$ignore_re/ie", "", $output);
  
  // Remove HTML character entities caused by token module
  $output = preg_replace("/\&#?[_a-zA-Z0-9]*\;/", "", $output);

  // Always replace whitespace with the separator.
  $output = preg_replace("/\s+/", $separator, $output);
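
The difference between the two entity patterns above can be checked with a small stand-alone sketch. The sample string assumes the title arrives already entity-encoded (`&amp;`, `&quot;`, `&#039;`), which is the guess made earlier in this thread and is not confirmed. The function names and the `-` separator are hypothetical.

```php
<?php
// Hypothetical helpers for illustration only; not part of pathauto.

// First attempt: matches named entities only.
function demo_strip_named_entities($text) {
  return preg_replace('/&[_a-zA-Z]*;/', '', $text);
}

// Revised attempt: also matches numeric entities such as &#039;.
function demo_strip_entities($text) {
  return preg_replace('/&#?[_a-zA-Z0-9]*;/', '', $text);
}

// Assumed input: the example title after entity encoding.
$encoded = 'a&amp;b and &quot; and c &amp; d&#039;';

echo demo_strip_named_entities($encoded), "\n";
// prints: ab and  and c  d&#039;

// Revised pattern, followed by the whitespace-to-separator step.
echo preg_replace('/\s+/', '-', demo_strip_entities($encoded)), "\n";
// prints: ab-and-and-c-d
```

Note that stripping an entity removes it outright instead of turning it into a separator, so "a&amp;b" becomes "ab", not "a-b".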

therainmakor commented:

Title: Mistake in last bit of code » Special characters not being removed

Accidentally changed the title.

greggles commented:

Status: Active » Closed (duplicate)