Support from Acquia helps fund testing for Drupal Acquia logo

Comments

nagba’s picture

Status: Active » Needs review
FileSize
1.25 KB

Pathauto can be set to create alias for a given entity - lets call this ALIAS-A. The alias may change (ALIAS-B) on update and at this point the redirect module may create a redirect from the old alias to the node. Now when a new entity is being created and it gets ALIAS-A then due to the existing redirect loading ALIAS-A will get the visitor to ALIAS-B, so suddenly two aliases lead to the same page even though the entities are completely different.

I'm attaching a patch for consideration that would delete existing aliases when a path would take on it. The other option (or one that could complete this approach) is to make sure that when pathauto generates a new path the alias would be altered when the above mentioned case would happen - considering current hook possibilities it might not be the prettiest though.

nagba’s picture

bit the bullet in the end, new patch which acts as if pathauto detected that the new path already exists

azinck’s picture

This can happen in other ways, even without pathauto. The root problem is that it's possible to have a url alias that collides with a redirect but which each point to different endpoints.

Redirect:
mypathname => node/10

While at the same time there's an alias:
mypathname => node/5

Currently the redirect is preferred over the alias. But I'd argue the alias should be preferred over the redirect. I'm not sure of the best place to put the logic to detect this...perhaps redirect_can_redirect()? Or in redirect_redirect() itself? What say others?

By the way, this logic could exist alongside the patch in #2. #2 avoids one way of creating this situation in the first place but there are still other ways it can happen (e.g., by manually creating aliases or using other modules besides pathauto which might create aliases) which is why I think we need the logic I've outlined above.

heddn’s picture

I had to update patch from #2 because it wasn't working. The logic check for if a redirect already exists wasn't working.

I started writing up a patch to address the concerns in #3. It would incur a serious performance hit to check for this small edge case. Even if we did a variable_get, we'd have to call off to the DB on every page render to get that variable. Are we sure this is an important enough scenario to account for? At this point, I've left it out of the attached patch.

For posterity's sake, here's the logic if we wanted to include a check for #3:

--- a/redirect.module
+++ b/redirect.module
@@ -323,7 +323,9 @@ function redirect_init() {

   // Fetch the current redirect.
   if ($redirect = redirect_get_current_redirect()) {
-    redirect_redirect($redirect);
+    if (redirect_no_path_alias_exists($redirect) ) {
+      redirect_redirect($redirect);
+    }
   }

   $redirect_global = FALSE;

@@ -1176,6 +1183,21 @@ function redirect_can_redirect() {
 }

 /**
+ * Checks if an override path alias exists.
+ *
+ * @link http://drupal.org/node/1288768
+ */
+function redirect_no_path_alias_exists($redirect) {
+  global $base_root;
+  $return = TRUE;
+  if ($base_root . drupal_get_path_alias() == url($redirect->redirect, array('absolute' => TRUE) + $redirect->redirect_options)) {
+    watchdog('redirect', 'A duplicate redirect exists for @path', array('@path' => $redirect->redirect));
+    $return = FALSE;
+  }
+  return $return;
+}
+

heddn’s picture

OK, here's a better approach. Almost the same as previous #4 but it adds some logic to remove a redirect if it would clash with a pathauto alias. It then lets pathauto proceed with creating the alias.

nagba’s picture

personally i dont agree with #3.
consider SEO or UX. The redirect is in place because we want the visitors to find the page which has moved - a url could have been bookmarked, etc. If someone creates a new content and you blindly kill the redirect cos you think the new content should get priority then confusion will rise for users.

heddn: hm. weird. i think #2 works on Drupal Gardens.

azinck’s picture

Hi nagba: there are downsides to either way of handling such a conflict but in my mind, a URL alias is more authoritative than a redirect.

Consider this situation again:

Redirect:
mypathname => node/10

While at the same time there's an alias:
mypathname => node/5

In this case, any links to node/5 that get passed through the url() function will get re-written to 'mypathname', but then a user attempting to use that link will end up viewing node/10. To me, that's far worse than any SEO considerations.

That said, heddn's patch looks like a great step in the right direction. It deletes redirects if they exactly match aliases and prevents collisions between automatically generated aliases and redirects. It looks to me like it solves the problem well for any collisions that might have happened "automatically". So if there are logic problems that remain at least they're confined to situations where someone has explicitly created conflicting aliases.

nagba’s picture

not just SEO but also user and user confusion.
and also consider timing.
if mypathname => node/10 was there first it should stay there and any AUTOMATIC pathalias generation should not reuse the alias, but rather use something else.
if the user tries to alter node/5 to use mypathname as alias then 1) alert him 2) make him delete the redirect . Automatically deleting a redirect is bad, cos you are deciding what is best for the user instead of lettimg him clear up the situation.

azinck’s picture

nagba: If you read my comment in #7 carefully I think you'll notice that I agree with you about not allowing automatically generated paths to collide with redirects and about not deleting redirects. I think the only point of disagreement regards whether to prefer a redirect or path alias when two are in direct conflict with each other.

As it is, that discussion doesn't really belong on this issue so I won't muddy it up any further. heddn has provided a patch in #5 that looks like it should address the problem at hand. It needs review (I'm not able to do proper testing on it right now myself due to rather all-consuming life events at the moment).

heddn’s picture

Last patch (hopefully). This one fixes an undefined index error when no redirect exists on a page.

Dave Reid’s picture

Status: Needs review » Needs work
michaelfavia’s picture

Our 2c:

While solving the pathauto side of this is important it is equally important to solve the higher level path.module issue.

Pathauto is just an automated version of what a user might accidentally do by hand via the standard path alias system and that is where the problem should be solved if possible.

I agree that the redirect module should block off the paths previously claimed by the path module. It should also not allow the alias module to reclaim them.

There are lots of solid arguments for why this should be the case but i dont think there is sufficient hookage in path_save(). It doenst have an module_invoke_all() for "alter" or "validation".

If it isnt possible to prevent path.moudle from creating the problem aliases, then redirect should be prevented from overriding known path aliases.

This took us a really long time to unwind today and it has and will bite others too. Thanks for the great work.

thtas’s picture

I've fixed this using the patch #1 for the reasons outlined in #12 - We needed the fix for both automatic aliases and manually entered aliases

narendraR’s picture

narendraR’s picture

donquixote’s picture

Issue summary: View changes

@michaelfavia (#12) is right.

Imo, the problem should be solved like this:

Ideally, whatever was first should win:
- If the alias was first, don't create a conflicting redirect.
- If the redirect was first, don't create a conflicting alias.

However, since we don't have any hooks in path_save(), and this function is called all over the place, this ideal situation won't ever happen.

So, what we should do instead:

  1. Implement hook_path_insert() and hook_path_update(), and delete conflicting redirects.
    If we get this far, path has won, and all we can do is leave.
  2. Hook into pathauto and maybe other things that create aliases, to intercept *before* path_save() is called.
    This should be in addition to (1.). So if this stuff all fails, point (1.) kicks in as a fallback.

Imo we should implement step (1.) asap and then talk about step (2.)

Pere Orga’s picture

Thanks for the comments #12 and #16. However, what happens with languages? Redirects (and aliases?) can be created for specific languages or for all languages, so it may happen that an alias exists for a specific language, and the same path exists a redirect for all (other) languages. In that case it would make sense to have both aliases and redirects for the same paths, ideally giving more priority to the alias/redirect that has more specificity.

And couldn't we create a hook in Drupal core path_save()?

Pere Orga’s picture

I've just closed #1459800: Re-assign old alias leads to duplicate as a duplicate of this one

donquixote’s picture

A new idea:

Create an entry in url_alias for every redirect.

The alias target url would be a path like redirect/%, with a page callback saying "You are not supposed to be here.".
Or the page callback could already do the redirect..

But since redirect does its magic in hook_init(), it does not really matter what the page callback does.
hook_init() redirect will only happen if the entry in url_alias is present and points to a redirect/% url.

Pere Orga’s picture

I like that idea, that's very different of what is being done.

I'm closing #1988450: From (source) should not change aliases to node numbers as a duplicate of this one.

mian3010’s picture

I have recently run into a problem with this patch.

I have applied patch from #10, but the condition that deletes a redirect, if the generated alias is the same will never be executed.

The function redirect_load_by_source returns an array, indexed by the rid, and therefore $redirect->redirect will never be set.

I have attached a revised patch from #10 that fixes this.

Dave Reid’s picture

Issue tags: +Needs tests

Now that #1796596: Fix and prevent circular redirects is resolved, how does that affect this issue? Could we get a test-only patch to confirm what the problem is?

mian3010’s picture

And i do not seem to have taken into account when no redirects are found. I have corrected this in attached patch.

mian3010’s picture

I can see that i have not been focused enough while solving this problem. The if-sentence below is now never true. Fixed in attached patch.

Sorry for all the rookie-mistakes ;-)

smustgrave’s picture

Has anyone made any progress on this issue? I believe this is the same issue I'm currently having where a redirect is created but in the redirect list it's displaying as a node number. I made a small tweak in the redirect.module file at line 385$path['source']; to $path['alias'];. What this seems to do is store the alias in the database under the 'redirect' column vs storing the node number. Thoughts anyone?

smustgrave’s picture

Please review and give any feedback. I'm hoping this helps solve the problem

smustgrave’s picture

Assigned: Unassigned » smustgrave
Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 26: 1288768-26_redirect-pathauto-conflict-fix.patch, failed testing.

smustgrave’s picture

Status: Needs work » Active
smustgrave’s picture

Updated the patch I previously submitted. Now a the redirect_alias table will get a value when a redirect is added between two different nodes as well as when a node changes it's url. The failure the patch is having is because there is no default value but I'm not entirely too sure how to solve that one.

smustgrave’s picture

smustgrave’s picture

Assigned: smustgrave » Unassigned
jyraya’s picture

Hello,

I read carefully each proposal of this issue and I think we should not automate so much the treatment of this case.

From feature point of view, we should treat this edge case as is:

  1. As already mentioned in previous comments, keep the alias of the new page untouched because it is the one required by the Information architecture and SEO needs.
    Why should we go against the site functional requirements?
  2. As suggested by donquixote, if I get well what it is suggested, we keep also the redirect data in order to support a "You are not supposed to be here." case.
    I think that we could display a message on the "new" page that uses the "old" path of the redirection as current path.
    This message could look like: "The page presents a new content. If you want to consult the old content, please follow this [URL]."
    The message is more than perfectible. :-) but the idea is there.
  3. In the redirect interface, foresee a screen listing all these problematic cases and allow users to fix them according to their needs; I.E: by deleting the redirection, by reviewing the implied URLs and changing them if required, or by keeping active the message mechanism described in my 2nd point.

So, we give enough flexibility to the webmasters to choose the right solution compared to their constraints.

scm6079’s picture

Since this ticket was first created, workflow has been introduced into Drupal core and has a very real impact - there will be more webmasters wanting to create redirects from nodes that are archived. In that case, the pathauto aliases will exist, and users will be fed a 403 error instead of a redirect from this module that may exist at that path.

It seems worthwhile to consider this potentially common use-case. Brainstorming possible workarounds, we could use the EventSubscriberInterface and alterResponse to check for nodes that are not published and perform a check for the redirect there only if the status code is 403 - allowing pathauto to remain untouched and handling the case where a node may become active again based on workflow.

I'm not sure this use-case justifies a unique ticket, but it may? What are everyone's thoughts?