I have an issue where our URL aliases are based on the node title, and our titles often contain dashes. Dashes are different from hyphens and come in two varieties: en dashes and em dashes.

Because there is currently no way to ignore them in the Pathauto settings, I was getting URLs like this:

Node Title:

Downloads – Install additional software

Pathauto URL alias:

http://www.root-of-site.com/software-requirements/downloads-–-install-additional-software

Notice that the en dash is left in place, surrounded by the two hyphens that replaced the spaces around it. Not a desirable result.

I have created a patch that adds both en and em dashes to the list of punctuation that can be removed from URLs in the settings. The resulting URL is this:

http://www.root-of-site.com/software-requirements/downloads-install-additional-software
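To illustrate why adding the dashes to the punctuation list fixes the alias, here is a simplified sketch. It is not Pathauto's actual cleaning code, which does more than this; the function example_clean_alias() and its parameters are hypothetical. It assumes spaces are turned into the separator and that repeated separators are collapsed: a dash that is not removed survives between two hyphens, while a removed dash leaves a run of spaces that becomes a single hyphen.

```php
<?php
// Hypothetical, simplified illustration of alias cleaning.
// This is NOT Pathauto's real implementation.
function example_clean_alias(string $title, array $remove, string $separator = '-'): string {
  // Drop each punctuation character configured for removal.
  $clean = str_replace($remove, '', $title);
  // Lowercase (ASCII letters only; the dashes are already gone or left untouched).
  $clean = strtolower($clean);
  // Replace any run of whitespace with a single separator.
  $clean = preg_replace('/\s+/u', $separator, $clean);
  // Collapse repeated separators, then trim them from both ends.
  $quoted = preg_quote($separator, '/');
  $clean = preg_replace('/' . $quoted . '{2,}/', $separator, $clean);
  return trim($clean, $separator);
}

$title = 'Downloads – Install additional software';
echo example_clean_alias($title, []), "\n";          // downloads-–-install-additional-software
echo example_clean_alias($title, ['–', '—']), "\n";  // downloads-install-additional-software
```

The first call reproduces the unwanted alias; the second matches the alias produced after the dashes are added to the removal list.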

Please consider rolling this into dev.


Comments

Toby Wild

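For those waiting for this to be included in the module, you can also add these characters yourself by implementing Pathauto's hook_pathauto_punctuation_chars_alter() in a custom module (replace MODULE with your module's machine name):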

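/**
 * Implements hook_pathauto_punctuation_chars_alter().
 *
 * Adds en and em dashes to Pathauto's punctuation settings so they can be
 * removed from generated aliases. Save this file with UTF-8 encoding.
 */
function MODULE_pathauto_punctuation_chars_alter(array &$punctuation) {
  // En dash (U+2013).
  $punctuation['ndash'] = array(
    'value' => '–',
    'name' => t('En Dash'),
  );

  // Em dash (U+2014).
  $punctuation['mdash'] = array(
    'value' => '—',
    'name' => t('Em Dash'),
  );
}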

Also, in case anyone has the same issue I had, make sure your text editor saves the file with UTF-8 encoding.
Notepad++ defaults to ANSI, which doesn't save these characters correctly.

dpovshed

Status: Needs review » Reviewed & tested by the community

@jaydee1818, your patch works fine for me, so I am changing the status of the issue.

However, for my task I will use the hint from @Toby Wild to define even more characters, which end users in one of my projects love to use. My hook looks like this:

/**
 * Implements hook_pathauto_punctuation_chars_alter().
 */
function MODULE_pathauto_punctuation_chars_alter(array &$punctuation) {
  // Dashes.
  $punctuation['ndash'] = array(
    'value' => '–',
    'name' => t('En Dash'),
  );
  $punctuation['mdash'] = array(
    'value' => '—',
    'name' => t('Em Dash'),
  );
  // Typographic ("curly") single and double quotation marks.
  $punctuation['single_quota_open'] = array(
    'value' => '‘',
    'name' => t('Single Quotation Open'),
  );
  $punctuation['single_quota_close'] = array(
    'value' => '’',
    'name' => t('Single Quotation Close'),
  );
  $punctuation['double_quota_open'] = array(
    'value' => '“',
    'name' => t('Double Quotation Open'),
  );
  $punctuation['double_quota_close'] = array(
    'value' => '”',
    'name' => t('Double Quotation Close'),
  );
}

Thanks to both of you!

Toby Wild

Fantastic, can't wait to see this released.

Content authors love their special characters in page titles even though I keep telling them not to.

Dave Reid

Status: Reviewed & tested by the community » Active

KeithC

Hi,

This is causing issues (in particular with the Rate module) on a client's site.

Is this change likely to be included in a stable release any time soon?

Thanks

rdellis87

Thanks, jaydee1818. The patch appears to be working great for me.

whthat

An alternative to using this patch is to install the Transliteration module and turn on "Transliterate prior to creating alias" at /admin/config/search/path/settings. After updating the aliases for the affected nodes, the em and en dashes in titles were removed, along with apostrophes and other unwanted characters.

zombree

whthat's tip is helpful for the D8 version of this module, but the "Transliterate prior to creating alias" option is not available in the D7 settings.

whthat

Just to update the previous comment: in D7 you need the Transliteration module installed for that option to appear.