Function path_set_alias produces wrong results if there are already different aliases for the same node and something should be changed.
Here's some code to reproduce the issues (normally aliases are strings like 'foo.html', 'foo.htm' or 'foo' and paths are nodes like 'node/4711'):
Wrong deletion of an alias:
path_set_alias('pathA', 'aliasA');
path_set_alias('pathB', 'aliasB');
// change path for aliasA (aliasA should point to the same node as aliasB and aliasC)
path_set_alias('pathB', 'aliasA');
expected result:
mysql> select src, dst from url_alias;
+-------+--------+
| src | dst |
+-------+--------+
| pathB | aliasB |
| pathB | aliasA |
+-------+--------+
real result:
mysql> select src, dst from url_alias;
+-------+--------+
| src | dst |
+-------+--------+
| pathB | aliasA |
+-------+--------+
aliasB has been deleted but should not be affected by the change of aliasA.
Ignorance of path change for an alias:
path_set_alias('pathA', 'aliasA');
path_set_alias('pathB', 'aliasB');
path_set_alias('pathB', 'aliasC');
// change path for aliasA (aliasA should point to the same node as aliasB)
path_set_alias('pathB', 'aliasA');
expected result:
mysql> select src, dst from url_alias;
+-------+--------+
| src | dst |
+-------+--------+
| pathB | aliasA |
| pathB | aliasB |
| pathB | aliasC |
+-------+--------+
real result:
mysql> select src, dst from url_alias;
+-------+--------+
| src | dst |
+-------+--------+
| pathA | aliasA |
| pathB | aliasB |
| pathB | aliasC |
+-------+--------+
Path for aliasA should be changed to pathB but nothing happens.
I attached a patch that fixes both issues.
Comments
Comment #1
Anonymous (not verified) commentedIt should be impossible to have more than one alias for the same src and vice versa. The columns src and dst should both be unique.
I don't think your patch gives the expected results either.
If I have
then
should give
and then
should give
Comment #2
mkalkbrenner@earnie
I disagree. In the current implementation only 'dst' has an unique constraint in the database which is correct!
Thinking about a migration of an existing site where you want to keep existing urls but you don't have an 1:1 relationship between the new and the old site regarding the structure you're forced to map two or more urls to one node. (Yes, you can use apache for this too.)
Example:
Old site:
/marketing/about_us.html
/sales/about_us.html
New site:
/marketing_and_sales/about_us.html
In this case you need three urls (aliases) for one node.
Currently this is a feature of drupal 5 which works well as long as you don't modify such an alias after it's creation.
So your example isn't the expected result.
The correct result of the first part of your example (which currently works regardless if you use the current or my patched version) is:
then
should give
And now we face the bug:
and then
should give
Comment #3
mkalkbrennerBTW I made a mistake regarding the comments when posting my reproduce code. Right comments are:
Wrong deletion of an alias:
Ignorance of path change for an alias:
Comment #4
Gurpartap Singh commented@earnie More than one path alias can actually correspond to the same system path. Like node/10 can have about_us and about as aliases.
The vice versa is of course not true.
Comment #5
Anonymous (not verified) commentedNo, there should only be one path to the same page for SEO reasons and code sanity. If you enter the src path the system will replace it with the dst alias. If there is more than one alias, which one is it to replace it with. The src and dst column should remain unique.
Comment #6
Gurpartap Singh commentedTo tackle the disputed "Google dislikes duplicate content" problem, there's another issue against path.module to have a primary or active alias/path to which a user is redirected, from any other alias that system path may have.
Comment #7
Anonymous (not verified) commentedBut there can only be one alias to a given src. Pathauto goes out of its way to add a suffix, rename an existing or delete an existing alias. It is assumed to be a unique src and a unique dst. You can't have more than one alias for a given path. If you give node/1 an alias of ALIAS1 and an alias of ALIAS2 which is supposed to be displayed in the URL? The alias also substitutes the URL information for the given path and if you enter node/1 it changes it to ALIAS1. More than one alias for the same path makes no sense at all and is convoluted to support.
Comment #8
danielb commentedI posted a similar issue at http://drupal.org/node/326490 will mark as dupe.
Comment #9
damien tournoud commentedThe "Wrong deletion of an alias" is a false issue: the first parameter of path_set_alias() is the destination, the second one is the source. There can only be one destination for a given source, so the current behavior is expected.
But the "Ignorance of path change for an alias" is a true issue, that comes from a minor bug in the function, fixed by the attached patch: a given path can be pointed at by several alias, so the check for
$path_count == 1should be$path_count >= 1.Moreover, I don't see why we remove all other aliases of the target path, so I also removed that. But this change will have to be reviewed very closely, because the original behavior goes way back to the first commits of the path modules in 2003, and might break other modules down the road.
Note that this only affects 5.x.
Comment #10
mkalkbrenner@Damien Tournoud
That's not correct. The first parameter is source, the second is destination:
So "Wrong deletion of an alias" is a true issue!
Comment #11
mkalkbrenner@earnie
I understand your arguments why path and alias should be a 1:1 relation. But regardless my bug report that's not how drupal 5 core works. Changing the relation from 1:n to 1:1 will be change of the API which is not good for a stable version.
From my point of view the core should allow 1:n and the optional pathauto module has to ensure 1:1 relations.
BTW If you only install the core you can access a node via it's alias and it's number. From a SEO point of view that's already duplicate content. So duplicate content could not be an argument to force 1:1 relations.
Comment #12
Anonymous (not verified) commentedFrom: http://api.drupal.org/api/function/path_set_alias/5
There shouldn't be duplicates for src or dst. This is what the current code is trying to do. Remove both the existing alias based on $alias and then based on $path.
If no path but an alias is given there is a delete statement where dst = $alias. If path is given and no alias there is a delete statement where src = $path. There should be zero duplication for src and dst.
Comment #13
Anonymous (not verified) commentedNote the code changed drastically in D6 http://api.drupal.org/api/function/path_set_alias/6 from patch #154517: Path alias saving is not language-aware. So, I'm thinking a patch for the path_set_alias function should be made based on the code from 6 first. What do others think? Otherwise the attached patch is what I believe to be the answer.
Comment #14
Anonymous (not verified) commentedComment #15
mkalkbrenner@earnie
I interpret "Set an aliased path for a given Drupal path, preventing duplicates" as preventing duplicates of aliased path / Drupal path combinations.
Nevertheless path_set_ailias is buggy and a patch is required. The question is, in which direction?
If you decide to set a unique constraint on 'src' or on the combination of 'src' and 'dst' in the database the update will fail at some of our installations and I'm convinced it will fail at others too.
What do you think about extending the path module and make the behavior configurable?
Comment #16
mkalkbrennerThe patch attached to comment #13 doesn't work.
When reaching the UPDATE statement, $alias could not already exist as dst in database. So "OR dst = '%s'" is senseless.
Additionally if there are already stored several aliases (dst) for the source (src) they will all be updated to the same values for dst and src which will fail with a database error due to a unique constraint violation on dst.
Comment #17
Anonymous (not verified) commentedOk, so I've found evidence that I'm just blowing smoke in the wind. In particular in the admin/build/path help text it states that a system path can have multiple aliases.
Comment #18
Anonymous (not verified) commentedAttached is a patch that matches some of the cleanup in D6 but not all and also uses David's fix of removing the delete of all system paths.
Comment #19
damien tournoud commented#18 is functionally exactly #9. I'm -1 to the cleanup and the move of drupal_clear_path_cache(). This has nothing to do here, as Drupal 5 is a maintenance release.
Please review and test #9 instead.
Comment #20
mkalkbrenner@Damien Tournoud
Did you see my comment #10? I think there's a still problem with your patch in #9.
Comment #21
damien tournoud commented@mkalkbrenner: please test and report.
Comment #22
mkalkbrenner@Damien Tournoud
Your patch solves both issues. Expected and real result are the same in both cases.
But there's a little thing remaining which is a difference between our patches.
In your case the behavior remains that the dataset for aliasA is not changed but deleted and created again which results in a new pid due to the auto_increment.
I'm of the opinion that if someone uses the third parameter pid of path_set_alias to modify an alias, he expects that it's stable and not changed.
Same thing happens if you simply set the same alias again:
Comment #23
drummIn the DRUPAL-5 branch, the following modules do a SELECT pid... FROM {url_alias}:
edit_term
epublish
find_path
gsitemap
pathauto
sitedoc
sysinfo
url_access
pathauto does preserve pids, but I didn't notice anything that would depend on it staying the same. gsitemap does seem to use the pid, copying it over it its own table.
So I do think we need to preserve pids.
Comment #24
mkalkbrenner@drumm
So please review the patch I initially posted. It fixes all the issues mentioned in this thread and is preserving pids.
Comment #25
yonailo@Damien Tournoud
@mkalkbrenner
@all
I think that fixing the $pid issue that @mkalkbrenner tells in #22 is quite easy, just add $pid to the recursive call.
To avoid incrementing the $pid when you make the same alias twice, I would propose also checking that the alias does not exist already.
Please review the attach patch.
Comment #26
dpearcefl commentedConsidering the lack of activity on this issue and that Drupal v5 is no longer supported by for fixes or patches, I am going to close this ticket.