Hi,

I'm using version 6.x-1.1 on a site with the locale and content translation modules disabled, so there is no multi-lingual functionality. However, the default page nodetype creates pages that are automatically tagged as language 'en' and other node types, such as images, are by default created as language-neutral (language is empty).

This creates the possibility of duplicate aliases because of improper handling in _pathauto_alias_exists. It only checks for existing aliases with the same language code as the one we are creating. So, if I create a page with the title 'mynode', an alias with language code 'en' will be created. If I then create an image with the title 'mynode', the function will only check for other language-independent nodes with the same alias. So, the image node gets created with the same alias, 'mynode', instead of 'mynode-0'.

If we are creating a language-neutral node, then we want to check for existing aliases in any language. If we are creating a node (like a page) that is associated with a language, then we want to check for existing aliases that are in that language or are language-neutral.

I'm attaching a patch to pathauto.inc that seems to have solved the problem for me.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

JirkaRybka’s picture

Version: 6.x-1.1 » 6.x-1.x-dev
Status: Needs review » Reviewed & tested by the community
FileSize
1.07 KB
1.05 KB

This is indeed critical, as it causes nasty indirect effects, but is sometimes difficult to track down/notice (which is why it stands so long, I believe).

In my case, the site in question runs only one language (which is not English), and the aliases are duplicated between legacy content from Drupal 5 (upgraded site, old aliases are language neutral) and new entries of the same type with titles repeated (new aliases are created with the only available language explicitly). Result is that links for both nodes point to the same alias, so that people complain of duplicated contents, of mismatch between link and actual node title (where the difference was just upper/lower case, resulting in the same alias), of incorrect data in nodes (actually viewing another one), but interestingly no-one noticed that several older posts are effectively inaccessible/uneditable unless you know nid and type unaliased path manually. Quite a mess anyway, even with no "I18n stuff" used on the site.

I independently came to the same solution, and then tested this patch on 6.x-1.2 successfuly, so marking as RTBC.

It applied with 1 line fuzz, so I re-rolled the patch for both 6.x-1.2 and 6.x-1.x-dev [the latter separately due to more fuzz, applies also to 6.x-2.x-dev with just a bit of offset]. I didn't change anything, these are just pure re-rolls due to unrelated changes in neighboring code. The affected bit of code is identical in all the versions, so the change should be fine in all of them.

The 6.x-1.x-dev patch should be the one to go, per Drupal workflow, but the 6.x-1.2 version might be useful for users as temporary solution until next release.

(I also marked a couple of issues as Duplicates: #644724: Repeated alias, and #537800: duplicate url aliases which mentions also a collision between different entity types, like identical node title and taxonomy term, leading to quite the same bug.)

Dave Reid’s picture

This should probably just use

$alias_pid = db_result(db_query_range("SELECT pid FROM {url_alias} WHERE dst = '%s' AND src <> '%s' AND language IN ('%s', '')", array($alias, $src, $language), 0, 1));
JirkaRybka’s picture

If you mean just the difference between (language = '%s' OR language = '') and language IN ('%s', ''), then yes, these two are essentially the same, but I see no harm in having it written one way or the other.

If you mean the line as a sole replacement for the original line (i.e. without the if($language) construct), then no. Although I did use it that way on my site at first and it worked, it's only usable on a single-language site. What we want here is to avoid duplicates between aliases in a language, and language-neutral ones (as these apply to all languages, causing a collision). So, when creating alias in a language, we need to check for an existing alias in that language, or a language-neutral alias (the first part of the code in patch); an alias in a different language is not a problem. When creating a language-neutral alias though, we want to check for existing aliases that are either language-neutral too, or in *any* language (because our new neutral alias is going to apply to *all* languages as well), so in the end just any duplicate alias should be detected no matter the language (that's why we let the language condition completely out in the second part in the patch).

So I see the patch as correct, still.

greg.harvey’s picture

Patch works for me. Another +1 for R&TBC.

greg.harvey’s picture

Status: Reviewed & tested by the community » Needs work

Actually, with the patch applied, if you create a node and you're not using the source language you still get a duplicate alias message. I don't think it's quite there. Here are the settings to try to replicate:

Languages are English and French (English is default).
Edit English and give it a path prefix of 'en'.
Under Configure, language negotiation is set to "Path prefix only".
Everything else under i18n settings is default.

So, content type is Page - checked multilingual options are:

Require language
Normal - All enabled languages will be allowed.
Synchronising Moderate, Taxonomy & Comment settings

Create a new Page, set the language to French, write some stuff and save. I get a duplicate alias error on save still. I *don't* get an error when I create an English (source language) page and subsequent French translations, so I think the patch caught that.

JirkaRybka’s picture

Does your comment mean, that there may be a collision between two aliases, even if they are in different (explicit) languages? If so, then an easy way out might be to just remove the language part from the query completely, so that *all* aliases will be unique. Just thinking out loud, this needs to be confirmed first, and I don't have the time now (and no experience with multilingual sites neither).

greg.harvey’s picture

Hi,

I'll try to confirm it with a totally blank Drupal 6 installation later today. I'm having a doubt - maybe something else caused this - I'll only install locale and content translate, plus pathauto, and see if I can reproduce this.

sdrycroft’s picture

The patch works for me (completely clean Drupal install, just Content Translation, Pathauto, and their dependencies installed).

I do however think that this issue isn't really a Pathauto one, and that it should be fixed in Drupal Core. I'm able to recreate exactly the same behaviour as is experienced in this bug report, but without using Pathauto (I can set the path to a Story node, which isn't set to use the translation module, as "somepath", and then set the path to a Language Neutral Page node, which is set to use the translation module, also as "somepath". No warning is displayed, and they both receive the same path:

SELECT * FROM url_alias;
+-----+--------+----------+----------+
| pid | src    | dst      | language |
+-----+--------+----------+----------+
|   1 | node/1 | somepath | en       | 
|   2 | node/2 | somepath |          | 
+-----+--------+----------+----------+

pid 1 is the Story node (given the default language "en"), whilst pid 2 is the Page node which I didn't specify a language for.

Dave Reid’s picture

greg.harvey’s picture

Hi Dave - that would make sense, I think. So the trick is to make sure something unique to a language either prefixes or suffixes the automatic alias, as a workaround until that core bug is got. Or maybe that's just good practice anyway...

Dave Reid’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Assigned: idmacdonald » Dave Reid

Assigning to myself to fix before the next release.

Dave Reid’s picture

Status: Needs work » Needs review
FileSize
1.63 KB

Patch against D7.

Dave Reid’s picture

Patch with a test case to ensure this stays fixed.

Dave Reid’s picture

Version: 7.x-1.x-dev » 6.x-2.x-dev
FileSize
2.78 KB

Committed #13 to CVS.
http://drupal.org/cvs?commit=398062

Attaching patch for review against 6.x-2.x.

Dave Reid’s picture

Status: Needs review » Needs work

The last submitted patch, 593048-pathauto-alias-exists-language.patch, failed testing.

Dave Reid’s picture

Status: Needs work » Needs review
Dave Reid’s picture

Status: Needs review » Fixed

Passed the bot, so committed to 6.x-2.x!
http://drupal.org/cvs?commit=398094

Will be backported to 6.x-1.x as a branch merge.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

klonos’s picture

Thanx Dave! I know there are A LOT of important issues still to be fixed in pathauto (imagine that the ones I monitor alone are more than 20), but there were quite a few bugs reported in other module's queues that were depended on this one. So, could there be a stable version released including this fix?

PS: you can see from my posts that I always use dev versions where available, so please do not take this as another 'we need a stable release' nag. I am not like that at all! It's just that other module developers cannot suggest to their audience to install dev versions of other modules to fix their issues. ...well they can, but it'll come back and byte them. I'm sure of it.

Thank you for considering this.

klonos’s picture

... my comment should have gone here: #845498: Please make a release on the 2.x branch and I should have read this. Sorry ;)

/goes here: http://groups.drupal.org/node/85954 to help with testing/

tjmoyer’s picture

It seems to me that we should make sure there is no existing alias, independent of language, as there are still conflicts and duplicates when implementing a multi-language site.

For instance, with this patch (and the latest version of pathauto includes it), if I have a node with the default language (in our case 'en') assigned and I try to assign the same path to a new node with language neutral assigned, this only checks against language being empty in the url_alias table and comes back empty, allowing a duplicate alias to be created for the new node.

This can cause problems with caches, proxies, and search engines, to say the least. I suggest removing the language check in _pathauto_alias_exists() all together to ensure there are no duplicates at all.

Dave Reid’s picture

Status: Closed (fixed) » Postponed (maintainer needs more info)

huh? your case shouldn't allow the node with language neutral (langcode of '') to have the same alias as the english node as they would conflict. I'm not sure I see how the current code in CVS or latest release has a problem and why this patch would need to be applied

klonos’s picture

Status: Postponed (maintainer needs more info) » Closed (fixed)

I'm lost TJ :/

Can you please post an example (including some example paths) of what you propose against examples of how it is handled now? Thanx.

tjmoyer’s picture

Scenario:
There's an entry in the url_alias table:
src = 'node/123' dst = 'About' language = 'en'

I create a new node with the following:
src = 'node/124' dst = 'About' language = ''

The current query to check for existing aliases is:
SELECT pid FROM {url_alias} WHERE src <> 'node/123' AND dst = 'About' AND language IN ('', '');

This will return empty, as it's checking for neutral language and current node's language (which is also neutral). And it will go ahead and insert a new entry in url_alias for the same alias, 'About'. Now all attempts to get to About will resolved only to one of these two entries.

What I propose is to exclude the language check to ensure, no matter what mix of language you use, there will be no alias conflicts.

The other option would be to check again new node's language, language neutral, and the default language for the site.

Dave Reid’s picture

Ah, the problem is you should not be allowed to create language neutral nodes as far as I know. They always have to have some language. I think without locale module installed Drupal uses the default language code 'en' for {node}.language.

tjmoyer’s picture

We have locale installed, but not all of our content types are translatable. So the default that's being assigned to nodes without translation turned on is language neutral (or an empty entry in the url_alias table), while those created in the default language ('en') get 'en' assigned. Also, some of the content types were originally not translatable, but have been changed to be translatable, so there are conflicts between nodes within the same node type, as well.

JirkaRybka’s picture

Also all nodes existing on 5.x (before upgrade to 6.x) are language neutral - stating this just for record and to complete the info above. (This won't be a problem with newly created nodes, as the language-specific ones will go in later).

Otherwise, I believe that creation of new language-neutral nodes makes sense, whether allowed in core or not, or at the very least language neutral aliases. With CCK, with Image module, or whatever, we can have nodes whose contents is just an image, audio file, numbers, list of persons (names)... These are language neutral by nature, and having alias for such node only in some given language makes no sense.

johnv’s picture

subscribing. Having the same problem with Clone module 6.x-1.2 and Pathauto 6.x-1.4

rv0’s picture