Function path_set_alias produces wrong results if there are already different aliases for the same node and something should be changed.
Here's some code to reproduce the issues (normally aliases are strings like 'foo.html', 'foo.htm' or 'foo' and paths are nodes like 'node/4711'):

Wrong deletion of an alias:

    path_set_alias('pathA', 'aliasA');
    path_set_alias('pathB', 'aliasB');
    // change path for aliasA (aliasA should point to the same node as aliasB and aliasC)
    path_set_alias('pathB', 'aliasA');

expected result:

mysql> select src, dst from url_alias;
+-------+--------+
| src   | dst    |
+-------+--------+
| pathB | aliasB |
| pathB | aliasA |
+-------+--------+

real result:

mysql> select src, dst from url_alias;
+-------+--------+
| src   | dst    |
+-------+--------+
| pathB | aliasA |
+-------+--------+

aliasB has been deleted but should not be affected by the change of aliasA.

Ignorance of path change for an alias:

    path_set_alias('pathA', 'aliasA');
    path_set_alias('pathB', 'aliasB');
    path_set_alias('pathB', 'aliasC');
    // change path for aliasA (aliasA should point to the same node as aliasB)
    path_set_alias('pathB', 'aliasA');

expected result:

mysql> select src, dst from url_alias;
+-------+--------+
| src   | dst    |
+-------+--------+
| pathB | aliasA |
| pathB | aliasB |
| pathB | aliasC |
+-------+--------+

real result:

mysql> select src, dst from url_alias;
+-------+--------+
| src   | dst    |
+-------+--------+
| pathA | aliasA |
| pathB | aliasB |
| pathB | aliasC |
+-------+--------+

Path for aliasA should be changed to pathB but nothing happens.

I attached a patch that fixes both issues.

CommentFileSizeAuthor
#25 path-set-alias.patch1.01 KByonailo
#18 path-set-alias.patch2.08 KBAnonymous (not verified)
#13 path-set-alias.patch1.67 KBAnonymous (not verified)
#9 327535-path-set-alias-multiple.patch834 bytesdamien tournoud
path_set_alias_multiple_alias.patch843 bytesmkalkbrenner

Comments

Anonymous’s picture

Status: Needs review » Needs work

It should be impossible to have more than one alias for the same src and vice versa. The columns src and dst should both be unique.

I don't think your patch gives the expected results either.

If I have

pathA, aliasA

then

path_alias_set(pathA, aliasB);
db_query("SELECT src, dst FROM {url_alias}");

should give

pathA, aliasB

and then

path_alias_set(pathB, aliasB);
db_query("SELECT src, dst FROM {url_alias}");

should give

pathB, aliasB
mkalkbrenner’s picture

@earnie

I disagree. In the current implementation only 'dst' has an unique constraint in the database which is correct!

Thinking about a migration of an existing site where you want to keep existing urls but you don't have an 1:1 relationship between the new and the old site regarding the structure you're forced to map two or more urls to one node. (Yes, you can use apache for this too.)

Example:
Old site:
/marketing/about_us.html
/sales/about_us.html

New site:
/marketing_and_sales/about_us.html

In this case you need three urls (aliases) for one node.

Currently this is a feature of drupal 5 which works well as long as you don't modify such an alias after it's creation.

So your example isn't the expected result.

The correct result of the first part of your example (which currently works regardless if you use the current or my patched version) is:

pathA, aliasA

then

path_alias_set(pathA, aliasB);
db_query("SELECT src, dst FROM {url_alias}");

should give

pathA, aliasA
pathA, aliasB

And now we face the bug:

and then

path_alias_set(pathB, aliasB);
db_query("SELECT src, dst FROM {url_alias}");

should give

pathA, aliasA
pathB, aliasB
mkalkbrenner’s picture

BTW I made a mistake regarding the comments when posting my reproduce code. Right comments are:

Wrong deletion of an alias:

    path_set_alias('pathA', 'aliasA');
    path_set_alias('pathB', 'aliasB');
    // change path for aliasA (aliasA should point to the same node as aliasB)
    path_set_alias('pathB', 'aliasA');

Ignorance of path change for an alias:

    path_set_alias('pathA', 'aliasA');
    path_set_alias('pathB', 'aliasB');
    path_set_alias('pathB', 'aliasC');
    // change path for aliasA (aliasA should point to the same node as aliasB and aliasC)
    path_set_alias('pathB', 'aliasA');
Gurpartap Singh’s picture

@earnie More than one path alias can actually correspond to the same system path. Like node/10 can have about_us and about as aliases.

The vice versa is of course not true.

Anonymous’s picture

No, there should only be one path to the same page for SEO reasons and code sanity. If you enter the src path the system will replace it with the dst alias. If there is more than one alias, which one is it to replace it with. The src and dst column should remain unique.

Gurpartap Singh’s picture

To tackle the disputed "Google dislikes duplicate content" problem, there's another issue against path.module to have a primary or active alias/path to which a user is redirected, from any other alias that system path may have.

Anonymous’s picture

But there can only be one alias to a given src. Pathauto goes out of its way to add a suffix, rename an existing or delete an existing alias. It is assumed to be a unique src and a unique dst. You can't have more than one alias for a given path. If you give node/1 an alias of ALIAS1 and an alias of ALIAS2 which is supposed to be displayed in the URL? The alias also substitutes the URL information for the given path and if you enter node/1 it changes it to ALIAS1. More than one alias for the same path makes no sense at all and is convoluted to support.

danielb’s picture

I posted a similar issue at http://drupal.org/node/326490 will mark as dupe.

damien tournoud’s picture

Version: 5.12 » 5.x-dev
Priority: Critical » Normal
Status: Needs work » Needs review
StatusFileSize
new834 bytes

The "Wrong deletion of an alias" is a false issue: the first parameter of path_set_alias() is the destination, the second one is the source. There can only be one destination for a given source, so the current behavior is expected.

But the "Ignorance of path change for an alias" is a true issue, that comes from a minor bug in the function, fixed by the attached patch: a given path can be pointed at by several alias, so the check for $path_count == 1 should be $path_count >= 1.

Moreover, I don't see why we remove all other aliases of the target path, so I also removed that. But this change will have to be reviewed very closely, because the original behavior goes way back to the first commits of the path modules in 2003, and might break other modules down the road.

Note that this only affects 5.x.

mkalkbrenner’s picture

@Damien Tournoud

The "Wrong deletion of an alias" is a false issue: the first parameter of path_set_alias() is the destination, the second one is the source. There can only be one destination for a given source, so the current behavior is expected.

That's not correct. The first parameter is source, the second is destination:

function path_set_alias($path = NULL, $alias = NULL, $pid = NULL) {
  ...
  db_query("INSERT INTO {url_alias} (src, dst) VALUES ('%s', '%s')", $path, $alias);
  ...
}

So "Wrong deletion of an alias" is a true issue!

mkalkbrenner’s picture

@earnie

I understand your arguments why path and alias should be a 1:1 relation. But regardless my bug report that's not how drupal 5 core works. Changing the relation from 1:n to 1:1 will be change of the API which is not good for a stable version.
From my point of view the core should allow 1:n and the optional pathauto module has to ensure 1:1 relations.

BTW If you only install the core you can access a node via it's alias and it's number. From a SEO point of view that's already duplicate content. So duplicate content could not be an argument to force 1:1 relations.

Anonymous’s picture

Status: Needs review » Needs work

From: http://api.drupal.org/api/function/path_set_alias/5

Description

Set an aliased path for a given Drupal path, preventing duplicates.

There shouldn't be duplicates for src or dst. This is what the current code is trying to do. Remove both the existing alias based on $alias and then based on $path.

If no path but an alias is given there is a delete statement where dst = $alias. If path is given and no alias there is a delete statement where src = $path. There should be zero duplication for src and dst.

Anonymous’s picture

StatusFileSize
new1.67 KB

Note the code changed drastically in D6 http://api.drupal.org/api/function/path_set_alias/6 from patch #154517: Path alias saving is not language-aware. So, I'm thinking a patch for the path_set_alias function should be made based on the code from 6 first. What do others think? Otherwise the attached patch is what I believe to be the answer.

Anonymous’s picture

Status: Needs work » Needs review
mkalkbrenner’s picture

@earnie
I interpret "Set an aliased path for a given Drupal path, preventing duplicates" as preventing duplicates of aliased path / Drupal path combinations.

Nevertheless path_set_ailias is buggy and a patch is required. The question is, in which direction?

If you decide to set a unique constraint on 'src' or on the combination of 'src' and 'dst' in the database the update will fail at some of our installations and I'm convinced it will fail at others too.

What do you think about extending the path module and make the behavior configurable?

mkalkbrenner’s picture

The patch attached to comment #13 doesn't work.

function path_set_alias($path = NULL, $alias = NULL, $pid = NULL) {
  ...
  else if ($path && $alias) {
    ...
    $alias_count = db_result(db_query("SELECT COUNT(dst) FROM {url_alias} WHERE dst = '%s'", $alias));
    ...
    else if ($path_count && !$alias_count) {
      if ($pid) {
        ...
      }
      else {
        db_query("UPDATE {url_alias} SET src = '%s', dst = '%s' WHERE src = '%s' OR dst = '%s'", $path, $alias, $path, $alias);

When reaching the UPDATE statement, $alias could not already exist as dst in database. So "OR dst = '%s'" is senseless.
Additionally if there are already stored several aliases (dst) for the source (src) they will all be updated to the same values for dst and src which will fail with a database error due to a unique constraint violation on dst.

Anonymous’s picture

Status: Needs review » Needs work

Ok, so I've found evidence that I'm just blowing smoke in the wind. In particular in the admin/build/path help text it states that a system path can have multiple aliases.

Anonymous’s picture

Status: Needs work » Needs review
StatusFileSize
new2.08 KB

Attached is a patch that matches some of the cleanup in D6 but not all and also uses David's fix of removing the delete of all system paths.

damien tournoud’s picture

#18 is functionally exactly #9. I'm -1 to the cleanup and the move of drupal_clear_path_cache(). This has nothing to do here, as Drupal 5 is a maintenance release.

Please review and test #9 instead.

mkalkbrenner’s picture

@Damien Tournoud

Please review and test #9 instead.

Did you see my comment #10? I think there's a still problem with your patch in #9.

damien tournoud’s picture

@mkalkbrenner: please test and report.

mkalkbrenner’s picture

@Damien Tournoud
Your patch solves both issues. Expected and real result are the same in both cases.

But there's a little thing remaining which is a difference between our patches.
In your case the behavior remains that the dataset for aliasA is not changed but deleted and created again which results in a new pid due to the auto_increment.

I'm of the opinion that if someone uses the third parameter pid of path_set_alias to modify an alias, he expects that it's stable and not changed.

Same thing happens if you simply set the same alias again:

  path_set_alias('pathX', 'aliasX');
mysql> select pid, src, dst from url_alias;
+-----+-------+--------+
| pid | src   | dst    |
+-----+-------+--------+
|   1 | pathX | aliasX |
+-----+-------+--------+
  path_set_alias('pathX', 'aliasX');
mysql> select pid, src, dst from url_alias;
+-----+-------+--------+
| pid | src   | dst    |
+-----+-------+--------+
|   2 | pathX | aliasX |
+-----+-------+--------+
drumm’s picture

Status: Needs review » Needs work

In the DRUPAL-5 branch, the following modules do a SELECT pid... FROM {url_alias}:

edit_term
epublish
find_path
gsitemap
pathauto
sitedoc
sysinfo
url_access

pathauto does preserve pids, but I didn't notice anything that would depend on it staying the same. gsitemap does seem to use the pid, copying it over it its own table.

So I do think we need to preserve pids.

mkalkbrenner’s picture

@drumm

So I do think we need to preserve pids.

So please review the patch I initially posted. It fixes all the issues mentioned in this thread and is preserving pids.

yonailo’s picture

StatusFileSize
new1.01 KB

@Damien Tournoud
@mkalkbrenner
@all

I think that fixing the $pid issue that @mkalkbrenner tells in #22 is quite easy, just add $pid to the recursive call.

To avoid incrementing the $pid when you make the same alias twice, I would propose also checking that the alias does not exist already.

Please review the attach patch.

dpearcefl’s picture

Status: Needs work » Closed (won't fix)

Considering the lack of activity on this issue and that Drupal v5 is no longer supported by for fixes or patches, I am going to close this ticket.