Hi,
I'm using version 6.x-1.1 on a site with the locale and content translation modules disabled, so there is no multi-lingual functionality. However, the default page nodetype creates pages that are automatically tagged as language 'en' and other node types, such as images, are by default created as language-neutral (language is empty).
This creates the possibility of duplicate aliases because of improper handling in _pathauto_alias_exists. It only checks for existing aliases with the same language code as the one we are creating. So, if I create a page with the title 'mynode', an alias with language code 'en' will be created. If I then create an image with the title 'mynode', the function will only check for other language-independent nodes with the same alias. So, the image node gets created with the same alias, 'mynode', instead of 'mynode-0'.
If we are creating a language-neutral node, then we want to check for existing aliases in any language. If we are creating a node (like a page) that is associated with a language, then we want to check for existing aliases that are in that language or are language-neutral.
I'm attaching a patch to pathauto.inc that seems to have solved the problem for me.
Comments
Comment #1
JirkaRybka CreditAttribution: JirkaRybka commentedThis is indeed critical, as it causes nasty indirect effects, but is sometimes difficult to track down/notice (which is why it stands so long, I believe).
In my case, the site in question runs only one language (which is not English), and the aliases are duplicated between legacy content from Drupal 5 (upgraded site, old aliases are language neutral) and new entries of the same type with titles repeated (new aliases are created with the only available language explicitly). Result is that links for both nodes point to the same alias, so that people complain of duplicated contents, of mismatch between link and actual node title (where the difference was just upper/lower case, resulting in the same alias), of incorrect data in nodes (actually viewing another one), but interestingly no-one noticed that several older posts are effectively inaccessible/uneditable unless you know nid and type unaliased path manually. Quite a mess anyway, even with no "I18n stuff" used on the site.
I independently came to the same solution, and then tested this patch on 6.x-1.2 successfuly, so marking as RTBC.
It applied with 1 line fuzz, so I re-rolled the patch for both 6.x-1.2 and 6.x-1.x-dev [the latter separately due to more fuzz, applies also to 6.x-2.x-dev with just a bit of offset]. I didn't change anything, these are just pure re-rolls due to unrelated changes in neighboring code. The affected bit of code is identical in all the versions, so the change should be fine in all of them.
The 6.x-1.x-dev patch should be the one to go, per Drupal workflow, but the 6.x-1.2 version might be useful for users as temporary solution until next release.
(I also marked a couple of issues as Duplicates: #644724: Repeated alias, and #537800: duplicate url aliases which mentions also a collision between different entity types, like identical node title and taxonomy term, leading to quite the same bug.)
Comment #2
Dave ReidThis should probably just use
Comment #3
JirkaRybka CreditAttribution: JirkaRybka commentedIf you mean just the difference between
(language = '%s' OR language = '')
andlanguage IN ('%s', '')
, then yes, these two are essentially the same, but I see no harm in having it written one way or the other.If you mean the line as a sole replacement for the original line (i.e. without the if($language) construct), then no. Although I did use it that way on my site at first and it worked, it's only usable on a single-language site. What we want here is to avoid duplicates between aliases in a language, and language-neutral ones (as these apply to all languages, causing a collision). So, when creating alias in a language, we need to check for an existing alias in that language, or a language-neutral alias (the first part of the code in patch); an alias in a different language is not a problem. When creating a language-neutral alias though, we want to check for existing aliases that are either language-neutral too, or in *any* language (because our new neutral alias is going to apply to *all* languages as well), so in the end just any duplicate alias should be detected no matter the language (that's why we let the language condition completely out in the second part in the patch).
So I see the patch as correct, still.
Comment #4
greg.harveyPatch works for me. Another +1 for R&TBC.
Comment #5
greg.harveyActually, with the patch applied, if you create a node and you're not using the source language you still get a duplicate alias message. I don't think it's quite there. Here are the settings to try to replicate:
Languages are English and French (English is default).
Edit English and give it a path prefix of 'en'.
Under Configure, language negotiation is set to "Path prefix only".
Everything else under i18n settings is default.
So, content type is Page - checked multilingual options are:
Require language
Normal - All enabled languages will be allowed.
Synchronising Moderate, Taxonomy & Comment settings
Create a new Page, set the language to French, write some stuff and save. I get a duplicate alias error on save still. I *don't* get an error when I create an English (source language) page and subsequent French translations, so I think the patch caught that.
Comment #6
JirkaRybka CreditAttribution: JirkaRybka commentedDoes your comment mean, that there may be a collision between two aliases, even if they are in different (explicit) languages? If so, then an easy way out might be to just remove the language part from the query completely, so that *all* aliases will be unique. Just thinking out loud, this needs to be confirmed first, and I don't have the time now (and no experience with multilingual sites neither).
Comment #7
greg.harveyHi,
I'll try to confirm it with a totally blank Drupal 6 installation later today. I'm having a doubt - maybe something else caused this - I'll only install locale and content translate, plus pathauto, and see if I can reproduce this.
Comment #8
sdrycroft CreditAttribution: sdrycroft commentedThe patch works for me (completely clean Drupal install, just Content Translation, Pathauto, and their dependencies installed).
I do however think that this issue isn't really a Pathauto one, and that it should be fixed in Drupal Core. I'm able to recreate exactly the same behaviour as is experienced in this bug report, but without using Pathauto (I can set the path to a Story node, which isn't set to use the translation module, as "somepath", and then set the path to a Language Neutral Page node, which is set to use the translation module, also as "somepath". No warning is displayed, and they both receive the same path:
pid 1 is the Story node (given the default language "en"), whilst pid 2 is the Page node which I didn't specify a language for.
Comment #9
Dave ReidI think #269877: path_set_alias() doesn't account for same alias in different languages is the root issue in core?
Comment #10
greg.harveyHi Dave - that would make sense, I think. So the trick is to make sure something unique to a language either prefixes or suffixes the automatic alias, as a workaround until that core bug is got. Or maybe that's just good practice anyway...
Comment #11
Dave ReidAssigning to myself to fix before the next release.
Comment #12
Dave ReidPatch against D7.
Comment #13
Dave ReidPatch with a test case to ensure this stays fixed.
Comment #14
Dave ReidCommitted #13 to CVS.
http://drupal.org/cvs?commit=398062
Attaching patch for review against 6.x-2.x.
Comment #15
Dave ReidComment #17
Dave Reid#15: 593048-pathauto-alias-exists-language.patch queued for re-testing.
Comment #18
Dave ReidPassed the bot, so committed to 6.x-2.x!
http://drupal.org/cvs?commit=398094
Will be backported to 6.x-1.x as a branch merge.
Comment #20
klonosThanx Dave! I know there are A LOT of important issues still to be fixed in pathauto (imagine that the ones I monitor alone are more than 20), but there were quite a few bugs reported in other module's queues that were depended on this one. So, could there be a stable version released including this fix?
PS: you can see from my posts that I always use dev versions where available, so please do not take this as another 'we need a stable release' nag. I am not like that at all! It's just that other module developers cannot suggest to their audience to install dev versions of other modules to fix their issues. ...well they can, but it'll come back and byte them. I'm sure of it.
Thank you for considering this.
Comment #21
klonos... my comment should have gone here: #845498: Please make a release on the 2.x branch and I should have read this. Sorry ;)
/goes here: http://groups.drupal.org/node/85954 to help with testing/
Comment #22
tjmoyer CreditAttribution: tjmoyer commentedIt seems to me that we should make sure there is no existing alias, independent of language, as there are still conflicts and duplicates when implementing a multi-language site.
For instance, with this patch (and the latest version of pathauto includes it), if I have a node with the default language (in our case 'en') assigned and I try to assign the same path to a new node with language neutral assigned, this only checks against language being empty in the url_alias table and comes back empty, allowing a duplicate alias to be created for the new node.
This can cause problems with caches, proxies, and search engines, to say the least. I suggest removing the language check in _pathauto_alias_exists() all together to ensure there are no duplicates at all.
Comment #23
Dave Reidhuh? your case shouldn't allow the node with language neutral (langcode of '') to have the same alias as the english node as they would conflict. I'm not sure I see how the current code in CVS or latest release has a problem and why this patch would need to be applied
Comment #24
klonosI'm lost TJ :/
Can you please post an example (including some example paths) of what you propose against examples of how it is handled now? Thanx.
Comment #25
tjmoyer CreditAttribution: tjmoyer commentedScenario:
There's an entry in the url_alias table:
src = 'node/123' dst = 'About' language = 'en'
I create a new node with the following:
src = 'node/124' dst = 'About' language = ''
The current query to check for existing aliases is:
SELECT pid FROM {url_alias} WHERE src <> 'node/123' AND dst = 'About' AND language IN ('', '');
This will return empty, as it's checking for neutral language and current node's language (which is also neutral). And it will go ahead and insert a new entry in url_alias for the same alias, 'About'. Now all attempts to get to About will resolved only to one of these two entries.
What I propose is to exclude the language check to ensure, no matter what mix of language you use, there will be no alias conflicts.
The other option would be to check again new node's language, language neutral, and the default language for the site.
Comment #26
Dave ReidAh, the problem is you should not be allowed to create language neutral nodes as far as I know. They always have to have some language. I think without locale module installed Drupal uses the default language code 'en' for {node}.language.
Comment #27
tjmoyer CreditAttribution: tjmoyer commentedWe have locale installed, but not all of our content types are translatable. So the default that's being assigned to nodes without translation turned on is language neutral (or an empty entry in the url_alias table), while those created in the default language ('en') get 'en' assigned. Also, some of the content types were originally not translatable, but have been changed to be translatable, so there are conflicts between nodes within the same node type, as well.
Comment #28
JirkaRybka CreditAttribution: JirkaRybka commentedAlso all nodes existing on 5.x (before upgrade to 6.x) are language neutral - stating this just for record and to complete the info above. (This won't be a problem with newly created nodes, as the language-specific ones will go in later).
Otherwise, I believe that creation of new language-neutral nodes makes sense, whether allowed in core or not, or at the very least language neutral aliases. With CCK, with Image module, or whatever, we can have nodes whose contents is just an image, audio file, numbers, list of persons (names)... These are language neutral by nature, and having alias for such node only in some given language makes no sense.
Comment #29
johnvsubscribing. Having the same problem with Clone module 6.x-1.2 and Pathauto 6.x-1.4
Comment #30
rv0 CreditAttribution: rv0 commentedcrosspost D7 issue:
#2067191: pathauto handles language-neutral aliases wrong resulting in duplicates