Problem/Motivation

Documentation on hook_pathauto_alias_alter() suggests that the source path can be altered for a pattern. This works when creating the initial alias, but subsequent saves of the entity create duplicate aliases because the source hasn't been changed yet when checking for an existing alias.

Steps to reproduce

Implement hook_pathauto_alias_alter(), change $context['source'], and save an entity twice. Two duplicate aliases are created.

Proposed resolution

Move "This can be altered by reference." documentation for $source to hook_pathauto_pattern_alter(), and ensure $source is passed by reference.

API changes

Both hooks change, but it wouldn't affect anything that was previously working.

Issue fork pathauto-3187945

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

akalam created an issue. See original summary.

akalam’s picture

Status: Active » Needs review
StatusFileSize
new2.81 KB

Status: Needs review » Needs work

The last submitted patch, 2: pathauto-3187945-2.patch, failed testing. View results

akalam’s picture

akalam’s picture

StatusFileSize
new15.36 KB

The previous patch #4 was missing a call to getPatternByEntity, while the new method is called getPatternsByEntity(). Here is a patch fixing it.

KarlShea made their first commit to this issue’s fork.

karlshea’s picture

Title: Provide an alter hook for source url » Fix hooks and documentation to alter source path
Category: Feature request » Bug report
Issue summary: View changes
Status: Needs work » Needs review

I don't think a new alter hook is needed, I just think there's a bug here.

The documentation for $context['source'] on hook_pathauto_alias_alter() states "This can be altered by reference." That is true, altering there will save an alias with a different source. However, re-saves of the same entity will create multiple aliases because the source hasn't been altered when checking for an existing alias.

Moving that documentation to hook_pathauto_pattern_alter() and passing $source by reference to that hook works through multiple saves.

mably made their first commit to this issue’s fork.

mably’s picture

Code review of MR #69

The commit fixes the issue where altering the source path in hook_pathauto_alias_alter() caused duplicate aliases on subsequent saves.

Root cause

In PathautoGenerator::createEntityAlias(), the flow is:

  1. $source is set to the entity's internal path (e.g. /node/1)
  2. The $context array is built with $source
  3. hook_pathauto_pattern_alter() fires
  4. The existing alias lookup runs using $source (loadBySource($source))
  5. The alias is generated from the pattern and tokens
  6. hook_pathauto_alias_alter() fires
  7. The alias is saved using $source

Before the fix, $context['source'] was passed by value at step 2, and only became a reference to $source right before step 6. This meant that if a module altered $context['source'] in hook_pathauto_alias_alter() (e.g. changing it from /node/1 to /custom/1), the alias was created with the altered source at step 7. But on the next save, the existing alias lookup at step 4 still used the original /node/1, could not find the alias created with /custom/1, and created a duplicate.

The fix

The $source variable is now passed by reference ('source' => &$source) from the start when building the $context array. This means source alterations should now be done in hook_pathauto_pattern_alter() (step 3), which fires before the existing alias lookup (step 4). On subsequent saves, the same alteration happens again before the lookup, the existing alias is found, and no duplicate is created.

The redundant $context['source'] = &$source line before hook_pathauto_alias_alter() is removed since the reference is already established at the top.

Documentation changes

In pathauto.api.php:

  • hook_pathauto_pattern_alter(): signature changed from array $context to array &$context, and "This can be altered by reference" is added to the source parameter — this is now the correct place to alter the source path.
  • hook_pathauto_alias_alter(): "This can be altered by reference" is removed from the source parameter — altering source here is too late for the existing alias lookup and leads to duplicates.

The fix is correct and does not break existing code that alters source in hook_pathauto_alias_alter() — the reference is still there — but it steers developers to the right hook where the alteration has the intended effect.

mably’s picture

Test added for source path alteration

Added a kernel test (PathautoSourceAlterTest) and a test module (pathauto_source_alter_test) to verify the fix. The test module implements hook_pathauto_pattern_alter() to rewrite the source path from /node/N to /custom/N. The test creates a node, verifies the alias is stored with the altered source, then re-saves the node and asserts that no duplicate alias is created. Without the fix, the test fails because the source alteration is ignored and the alias is created with the original /node/N source.

mably’s picture

Assigned: Unassigned » berdir
mably’s picture

Won't we have the same problem with $context['language']?

Example code from pathauto.api.php:

function hook_pathauto_alias_alter(&$alias, array &$context) {
  // Add a suffix so that all aliases get saved as 'content/my-title.html'.
  $alias .= '.html';

  // Force all aliases to be saved as language neutral.
  $context['language'] = Language::LANGCODE_NOT_SPECIFIED;
}

Here is what says Claude about it:

The same duplicate alias problem also applies to language changes in hook_pathauto_alias_alter().

The documented example in pathauto.api.php shows:

// Force all aliases to be saved as language neutral.
$context['language'] = Language::LANGCODE_NOT_SPECIFIED;

This has the same root cause as the source alteration issue: loadBySource() is called with the original $langcode before hook_pathauto_alias_alter() runs. On resave, the lookup uses the original language again, fails to find the alias (which was saved with the altered language), and creates a duplicate.

Example scenario:

  1. First save: entity langcode is en, hook changes $context['language'] to fr, alias saved with langcode fr
  2. Second save: loadBySource('/node/1', 'en') — looks for en alias, doesn't find it (was saved as fr). Fallback only checks und and default language, not fr. So $existing_alias is NULL.
  3. Hook changes langcode to fr again
  4. New fr alias created — duplicate

The special case of und happens to work because lookupBySystemPath() always falls back to und. But any other language change would produce duplicates on resave.

The fix in MR !69 (passing $source by reference early in the $context array) should probably also move the language alteration documentation from hook_pathauto_alias_alter() to hook_pathauto_pattern_alter(), where it would run before loadBySource(). Note that $context['language'] is already passed by reference ('language' => &$langcode) so it would work in hook_pathauto_pattern_alter() without any code change — just a documentation update.

mably’s picture

Pushed a voluntarily failing test that generates a duplicate alias when language is modified in hook_pathauto_alias_alter() (as shown in PHP doc) to emphasize that this needs fixing too.

anybody made their first commit to this issue’s fork.

anybody’s picture

Assigned: berdir » Unassigned
Status: Needs review » Needs work

Tests are failing

mably’s picture

Tests are voluntarily failing as per #14 to emphasize that fixing the problem for source is probably not enough and that we must also fix language here.

anybody’s picture

Thanks @mably and sorry, I didn't see #14

mably’s picture

No pb :) Would still be happy to have your feedback on this.

mably’s picture

Status: Needs work » Needs review
mably’s picture

Assigned: Unassigned » berdir
berdir’s picture

I'm not quite sure what the use cases are for altering the source path. What would that achieve, doesn't the source path need to match the entity path?

We only generate one alias per entity, there's an issue that would allow to set up multiple patterns, for example one for the edit path.

Also, what happens with the widget, bulk update processes, those wouldn't know about a changed path and would not show existing/create duplicates?

Maybe @karlshea or @akalam can shed some light on their use case?

Similar to language. There are definitely use cases there, lots of people would like to have language undefined aliases across translations. But that too has a long and complicated history, again conflicts with the widget and so on (I always pushed back on pathauto supporting this scenario as long as the core path widget doesn't support that).

mably’s picture

The problem is not really about allowing changing source and language here, but rather allowing it to be done after having loaded the existing entity, generating a risk to create duplicates, the new created alias being then different than the existing one.

What's your point of view on this @berdir?

berdir’s picture

The question to me is whether we are fixing something that is meant to work or not and what the implications of that are. I'd rather make it clear (with docs updates and possibly code changes, maybe with deprecations) that you're *not* supposed to do that and that it's not supported, if it doesn't properly and consistently work in the scenarios that I mentioned.

That's why I'd like to hear what the use cases here are.

karlshea’s picture

Hey @berdir, I'm using this on a project where we use pathauto normally for all of our content entities, but we have a custom entity where there were two issues:

1. The entity is used to alter some branding on the site, and we're using Pathauto to create the aliases for that entity, but that entity isn't meant to be viewed. Instead, we're pointing the alias for that entity to a custom controller that handles that altering/session setting.

Pathauto was basically the easiest solution to "We need users to create entities where they can type in a path + more info, and when you hit that path it calls a controller to get that entity's info back" because it's already doing the alias -> source mapping, editing, and cleanup.

2. Admins needed to enter exactly the alias that they wanted on the entity form and I'm using hook_pathauto_alias_alter() so we don't run cleanTokenValues() in that particular case:

function themodule_pathauto_pattern_alter(PathautoPatternInterface $pattern, array &$context) {
  if ($context['module'] === 'themodule' && $context['bundle'] === 'site_brand') {
    /** @var \Drupal\themodule\Entity\SiteBrand $brand */
    $brand = $context['data']['site_brand'];

    // Change the source path for site brands to point to the set custom brand controller.
    $context['source'] = '/thesite_custom_brand/' . $brand->id();
  }
}

function themodule_pathauto_alias_alter(&$alias, array &$context) {
  if ($context['module'] === 'themodule' && $context['bundle'] === 'site_brand') {
    /** @var \Drupal\pathauto\AliasCleaner $alias_cleaner */
    $alias_cleaner = \Drupal::service('pathauto.alias_cleaner');
    $token = \Drupal::token();

    $pattern = $context['pattern'];
    $data = $context['data'];
    $langcode = $context['language'];

    // Recreate the alias without the Pathauto cleanTokenValues callback, the
    // alias should be exactly what they entered.
    $alias = $token->replace($pattern->getPattern(), $data, [
      'clear' => TRUE,
      //'callback' => [$this->aliasCleaner, 'cleanTokenValues'],
      'langcode' => $langcode,
      'pathauto' => TRUE,
    ], new BubbleableMetadata());

    $alias = $alias_cleaner->cleanAlias($alias);
  }
}

I don't remember exactly the specifics without debugging through everything again, but I think without this patch something wasn't quite right? I think we were getting multiple aliases on every entity save? Probably because the alias alter wasn't seeing the change from the pattern alter. Or later on in the process after the pattern alter something wasn't linked together? Something like that.

Also having tried to work through all of this it just didn't seem to make sense to not allow the source alter in pattern_alter() but to allow it in alias_alter(). The only place altering it works end-to-end is if you do it in the first one, not the second one.

karlshea’s picture

Sorry, edited my comment with more context if you're only seeing the email.

berdir’s picture

Thanks for the feedback.

I think for your scenario, I'd consider not relying on pathauto at all. I see how themodule_pathauto_pattern_alter() wouldn't work without this change, but I can't see how this would result in a consistent state. If you have a pathauto widget then how should that know that the source path is '/thesite_custom_brand/' . $brand->id();? How should the bulk update feature know that and figure out which aliases already exist? Why not point the canonical link template of your custom entity to that? If you hardcode stuff like this, does pathauto really add a benefit or just complexity?

I think for pathauto, the canonical source path must be consistent, it can not be consistent when just using this hook.

Also, for the second part, did you consider using the safe tokens setting in pathauto? That's the thing that allows more complex tokens that typically are existing paths with / and other special characters to bypass the sanitization. We use that for a similar use case in our projects.

Even though it's possible to generate code now that satisfy practically every use case, I think it's our responsibility as maintainers to balance use cases against code complexity and consistency. *If* we were to support it, I think it should need to be backed by a functional test to ensure something does work consistently. Which I don't think it can using this approach.

So, instead, I'd propose that we deprecate the ability to do what you do and instead suggest different approaches. The way to do that would be to detect if the source path was attempted to be altered in a hook and trigger a deprecation message or even warning in that case (as you only see deprecations in tests, which are unlikely to be triggered in custom code).

A possible alternative would be #3217830: Allow patterns to affect non-canonical entity URLs.

mably’s picture

@berdir, how do you suggest we proceed with this issue?

How do we fix the risk of alias duplication if source or language is modified in the pathauto_alias_alter hook?

berdir’s picture

I propose that we do not allow to alter them in the first place.

mably’s picture

OK, I’ll create a new MR for this.

Should we prevent source and language from being passed by reference in $context, or simply trigger a deprecation notice / warning if hooks modify them?

The example below in pathauto.api.php will also need to be updated to reflect this change:

function hook_pathauto_alias_alter(&$alias, array &$context) {
  // Add a suffix so that all aliases get saved as 'content/my-title.html'.
  $alias .= '.html';

  // Force all aliases to be saved as language neutral.
  $context['language'] = Language::LANGCODE_NOT_SPECIFIED;
}

mably’s picture

I tried to implement something in MR 164.

Deprecated altering $context['source'] and $context['language'] by reference in both hook_pathauto_alias_alter() and hook_pathauto_pattern_alter(), with runtime E_USER_DEPRECATED warnings and updated PHPDoc with @deprecated tags, to be removed in pathauto:2.0.0.

karlshea’s picture

> Why not point the canonical link template of your custom entity to that? If you hardcode stuff like this, does pathauto really add a benefit or just complexity?

I... don't know why I didn't think of that. I updated our code and that is totally working, I was able to remove a bunch of hacks (I guess I had also needed to create a custom AliasType probably to fix one of the other issues you brought up).

Thanks for the tip! We no longer need this patch.

> Also, for the second part, did you consider using the safe tokens setting in pathauto?

Not without that setting affecting all of the regular content entity types, and we needed to allow *anything*. These were promo paths for other companies used on their marketing materials and couldn't be changed, and were imported when I originally built the site on Drupal 7.

berdir’s picture

Didn't carefully review yet, but yes, the MR is what I had in mind.

#33 confirms that at least for that use case, there are better solutions.

In regards to the second part about the safe tokens, the setting is global but it's also per token. If the field that holds those values isn't already unique, you could add your own token like [entity-type:special-safe-whatever-token] and then you only allow "special-safe-whatever-token" and that should work? And if it's already something that's unique to your entity type, you could just use that. Note that the examples are just a single token part like alias, but the setting should work if you use the complete token name, such as entity_type:field_name, see \Drupal\pathauto\AliasCleaner::cleanTokenValues.

karlshea’s picture

Oh sure enough, I can remove that other hook too. Thanks for the tip!

berdir’s picture

Status: Needs review » Reviewed & tested by the community

Didn't test this myself, but we have confirmation that those alter hooks and tricks aren't needed and there are better solutions.

  • mably committed 165cff91 on 8.x-1.x
    fix: #3187945 Fix hooks and documentation to alter source path
    
    By:...
mably’s picture

Assigned: berdir » Unassigned
Status: Reviewed & tested by the community » Fixed

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.