I'm using Drupal 6.x to build a multilingual website on a single codebase and a single set of database tables (detailed explanation). Since Drupal 6.x and the i18n module allow a domain (language domain) to select the current language while sharing a single node table through the "language" field on {node}, I ask whether:

http://it.mysite.com/sitemap.xml
http://en.mysite.com/sitemap.xml

could contain two different sitemaps, one per language. Currently, on my installation, only nodes written in one language (English) are listed.

Maybe something like:
http://it.mysite.com/sitemap_it.xml
http://en.mysite.com/sitemap_en.xml

should be provided or, instead, a flexible Views integration could allow more specific customizations (e.g. through a "Node: Language = Current user's language" view filter). I don't know whether a single sitemap.xml with multiple languages is problematic for search engines, but even that alternative is better than a sitemap in a single language.


Comments

kiamlaluno’s picture

Title: Multilingual / multisite support on 6.x or views integration » Multilingual support or views integration
kiamlaluno’s picture

Status: Active » Postponed
kiamlaluno’s picture

Component: xmlsitemap_node » xmlsitemap
Priority: Critical » Normal
Status: Active » Postponed

The development snapshot of reference is the 6.x-1.x-dev. The code changes for that version will not be backported to 6.x-0.x-dev.

erick.mattos’s picture

Version: 7.x-2.x-dev » 6.x-1.x-dev
Component: xmlsitemap » xmlsitemap_node
Priority: Normal » Critical
Status: Postponed » Active

This module really needs to find the right address of each node, taking into account the path prefix and any other language settings.
If the rest of the system knows the real address shown to visitors, so should this module.
Thanks in advance to whoever fixes this.

kiamlaluno’s picture

Component: xmlsitemap » xmlsitemap_node
Priority: Normal » Critical
Status: Postponed » Active

I ran some tests with locale.module enabled, and I noticed that all the URLs of a web site remain available without the language prefix that locale.module would add; in that case, the language is reported as the default one set by the administrator.

For XML Site Map this would mean that the site map would be accessible from http://example.com/sitemap.xml, http://example.com/it/sitemap.xml, http://example.com/es/sitemap.xml, etc...

What do you think the module is supposed to do for each of these requests? Should it list only the content in the requested language (Italian for http://example.com/it/sitemap.xml, etc.)? If so, when would the module report the content for all the languages?
In my opinion, it should list only the content nodes in the language Drupal detects as the requested one (locale.module simply checks whether the URL has a prefix associated with a language); when the site map is requested in the default language, XML Sitemap should have an option that lets the administrator decide whether it contains only content nodes in the default language, only language-neutral content nodes, content nodes in any language, or a combination of these.
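
A minimal sketch of the options described above, written in Python rather than the module's PHP. The names `Link`, `links_for_request` and `default_policy` are hypothetical and are not part of XML Sitemap; the sketch only shows one way the selection could work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    path: str
    language: Optional[str]  # None stands for language-neutral content


def links_for_request(links, requested, default, default_policy=frozenset({"default"})):
    """Pick the links for a site map requested in language `requested`.

    `default_policy` is a set drawn from {"default", "neutral", "all"} and is
    consulted only when the requested language is the default one.
    """
    if requested != default:
        # A non-default language lists only its own content.
        return [link for link in links if link.language == requested]
    if "all" in default_policy:
        return list(links)
    selected = []
    for link in links:
        if link.language == default and "default" in default_policy:
            selected.append(link)
        elif link.language is None and "neutral" in default_policy:
            selected.append(link)
    return selected
```

Calling `links_for_request(links, "it", "en")` would return only the Italian links, while the default-language request depends on what the administrator selected.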

Bear in mind that submitting http://example.com/sitemap.xml to a search engine doesn't automatically make it search for http://example.com/it/sitemap.xml.

Any suggestion is welcome.

kiamlaluno’s picture

Status: Active » Postponed (maintainer needs more info)
Darren Oh’s picture

I recall deciding to postpone this feature after considering the complexity of the administration page which would be required to allow the submission URLs to be edited separately for each language. Other than that, it should be fairly simple.

chirale’s picture

Maybe an XML Sitemap Views integration modeled on the RSS feed behavior could do the trick: in that case, you could define the path directly in the view settings and filter by language using "Current user's language" (so a single view would serve all languages, varying with the language path). I don't know how hard that integration would be, but building on the RSS feed Views integration might speed up the task.

kiamlaluno’s picture

@Darren Oh: there is no need to set a different site map URL for every language; with locale.module enabled, and with the code as it is now, the module returns a site map for http://example.com/it/sitemap.xml, http://example.com/es/sitemap.xml, http://example.com/eo/sitemap.xml, etc.
The module should just check which language Drupal has set and whether it is the default language, and act accordingly.

What "act accordingly" means must be clearly defined. As I reported in my previous post, we must define which URLs go in the site map when a specific language is set, and which when the default language is used.

kiamlaluno’s picture

@chirale: there is no need for Views integration to have a different URL for every defined language; if a module already defines its own menu callbacks (e.g., http://example.com/menu-url), it can already answer URLs like http://example.com/it/menu-url and http://example.com/es/menu-url.
That happens when locale.module is enabled (otherwise it would not make sense to support different languages); the module just needs to check which language Drupal detects (based on the URL used to access a resource) and change its output accordingly.

Correct me if I am wrong, but Views is not able to change the output of a view based on the current language.

erick.mattos’s picture

Status: Postponed (maintainer needs more info) » Active

Let me clarify the needed behavior:

This module is mainly designed to let searchbots find website pages.
The correct behavior is to list all pages in every language, using the URLs people have configured in their multilingual settings. In my particular case, that means the prefix en-us for English and no prefix for Portuguese, as in http://www.somesite.com/en-us/page and http://www.somesite.com/page respectively.
Some sites use http://brasil.somesite.com/page and http://us.somesite.com.
The fact is that every node has its own address, and that address should be in the XML sitemap so that bots can index it.
The final result should be:

http://www.somesite.com/
http://www.somesite.com/en-us
http://www.somesite.com/page
http://www.somesite.com/en-us/page
...

Or in the second case:
http://brasil.somesite.com/
http://us.somesite.com/
http://brasil.somesite.com/page
http://us.somesite.com/page
...

The current buggy behavior is that this module generates the sitemap only for the last language edited, and without any prefix.
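
To make the expected output concrete, here is a small Python sketch (not the module's PHP) that builds the addresses listed above for both setups. The function `node_url`, its parameters, and the language code "pt-br" are assumptions made for illustration only.

```python
def node_url(path, language, negotiation, base_url="", prefixes=None, domains=None):
    """Build the public URL of `path` for `language`.

    negotiation == "prefix": `prefixes` maps language -> path prefix ("" for none).
    negotiation == "domain": `domains` maps language -> site root URL.
    """
    if negotiation == "domain":
        root = domains[language]
        segments = [path]
    elif negotiation == "prefix":
        root = base_url
        segments = [prefixes.get(language, ""), path]
    else:
        raise ValueError("unknown negotiation: %r" % negotiation)
    # Join the non-empty segments; an empty result is the front page.
    tail = "/".join(s.strip("/") for s in segments if s)
    return root.rstrip("/") + "/" + tail


# Hypothetical settings matching the two cases described above.
PREFIXES = {"pt-br": "", "en-us": "en-us"}
DOMAINS = {"pt-br": "http://brasil.somesite.com", "en-us": "http://us.somesite.com"}
```

With these settings, `node_url("page", "en-us", "prefix", base_url="http://www.somesite.com", prefixes=PREFIXES)` gives the prefixed address, and the "domain" mode gives the per-domain one.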

Regards to all.

kiamlaluno’s picture

Component: xmlsitemap_node » xmlsitemap

I am changing XML Sitemap to support the language settings.
Supporting the language settings means that, e.g., http://example.com/it/sitemap.xml will contain just the links to nodes whose language is Italian, and http://example.com/es/sitemap.xml will contain the links to nodes whose language is Spanish. This is because a site map located at http://example.com/it/sitemap.xml cannot contain links starting with http://example.com/es/.

Florian’s picture

I think the best approach is to use a sitemapindex showing both sitemaps. This is what I have done for years for a multisite installation:

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="http://www.puzzle.ro/sites/all/modules/gsitemap/gss.xsl" ?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">

<sitemap>
  <loc>http://www.puzzle.ro/en/sitemap.xml</loc>
</sitemap>
<sitemap>
  <loc>http://www.puzzle.ro/ro/sitemap.xml</loc>
</sitemap>

</sitemapindex>

So, the idea is to generate a sitemapindex.xml which includes all sitemaps.
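
A sitemap index like the one above could also be generated rather than written by hand. This is only a sketch in Python using the standard library; the function name `build_sitemap_index` is hypothetical, and the stylesheet and schemaLocation attributes from the example are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document listing one <sitemap> per URL."""
    # Serialize the sitemap namespace as the default one (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element("{%s}sitemapindex" % SITEMAP_NS)
    for url in sitemap_urls:
        sitemap = ET.SubElement(root, "{%s}sitemap" % SITEMAP_NS)
        ET.SubElement(sitemap, "{%s}loc" % SITEMAP_NS).text = url
    return ET.tostring(root, encoding="unicode")


index_xml = build_sitemap_index([
    "http://www.puzzle.ro/en/sitemap.xml",
    "http://www.puzzle.ro/ro/sitemap.xml",
])
```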

kiamlaluno’s picture

That is not practical, as it means generating all the links in all the languages at the same time; there is already a problem creating the site map for web sites with a lot of Drupal pages, and creating all the links for all the active languages would make the situation worse.

The proposed solution is also not suitable when a different domain is associated with each language. In that case, the site map chunk URLs would point to a different domain than the one where the site map file is served; even if the site map chunk URLs were on the same domain as the site map URL, the links in the site map would still point to domains different from the site map domain.
In both situations the site map protocol says that the links must be discarded.

kiamlaluno’s picture

Assigned: Unassigned » kiamlaluno
Status: Active » Postponed
jack in the box’s picture

Just subscribing to follow. I would also like to add that there are multilingual sites with different domain names, and also multiple sites on the same Drupal installation; the other site shouldn't serve the default domain's site map. Multilingual sites can be any of the following:

http://www.site.net/en/
http://www.site.net/pt/

or

http://en.site.net/
http://pt.site.net/

or

http://www.site-in-en.net/
http://www.site-in-pt.net/

Looking forward to the solution of this.. Cheers..

vendeka’s picture

Subscribing to follow. I am looking for a way to generate a different sitemap for each domain name on the same Drupal installation, like below:

http://www.website-english.com/sitemap.xml --> Sitemap of English Nodes, Menus, etc
http://www.website-german.com/sitemap.xml --> Sitemap of German Nodes, Menus, etc
http://www.website-russian.com/sitemap.xml --> Sitemap of Russian Nodes, Menus, etc
http://www.website-chinese.com/sitemap.xml --> Sitemap of Chinese Nodes, Menus, etc

mlondon77’s picture

I'm guessing you already understand the issue, but I'll post just in case.

I also have a bi-lingual website in English and Spanish. The links to the English pages have no prefix and are located at www.beachcondoplayadelcarmen.com

The Spanish pages have the prefix /es/sample_page. However in the sitemap located at www.beachcondoplayadelcarmen.com/es/sitemap.xml, the links show up without the prefix. So for example where the URL should be:

http://www.beachcondoplayadelcarmen.com/es/ligas_playa_del_carmen

it's showing up as:

http://www.beachcondoplayadelcarmen.com/ligas_playa_del_carmen

As a result, Google does not find these pages and I'm showing up with errors in my Google sitemap analytics for all my pages in Spanish.

Hopefully, this issue is being addressed?

Thanks for all your work on this module.

kiamlaluno’s picture

The issue will be addressed. I set the issue to postponed only because I am working on other issues that need to be fixed first, such as the fact that the project modules don't check whether the link being added to the site map is already present.

I am currently making commits to a branch that is not associated with any release.

mlondon77’s picture

Component: xmlsitemap » Code

Thanks again for your work on this module. All the best for the New Year.

ThaboGoodDogs’s picture

Workaround? Does anyone know of a quick and dirty workaround for this problem? A little PHP script that looks at the regular (English) sitemap.xml file, sees whether other languages exist, and creates the appropriate sitemap.xml for each language. It doesn't seem like it would be too hard to create something like this. I suppose I could hand-code a file and upload it to my site, but that's not a very elegant solution.

Thanks again for this great module, probably one of the most important modules ever created - all bow to the Google gods ...... :)

Cheers
Magrathea

yo-1’s picture

Just a remark / question about how a multilingual sitemap will be implemented. Have you considered the option of using multiple indexed sitemaps:

http://www.sitemaps.org/protocol.php#index

Just curious and eager to learn how this will work once it's ready...

kiamlaluno’s picture

That is already used to split the site map when the number of links is greater than the number set in the module settings.

It cannot be used for multi-language sites because a site map at http://example.com/it/sitemap.xml cannot contain links like http://example.com/node/1.

yo-1’s picture

hmm, I already thought this solution was too good and easy to be true.

But looking at it from a top-down perspective, would the following approach not be feasible either?

A main sitemap index at //example.com/sitemap.xml (so at root level, not at language level) that refers, one per language, to a //example.com/sitemap1.xml containing all items for the first language, a sitemap2.xml for the second language, etc.?

kiamlaluno’s picture

Site maps cannot contain more than 50,000 links; site map chunks are already used to keep the site map from containing more than 50,000 links, or more than the number of links set in the module settings.

yo-1’s picture

I'm sorry, but I don't really see why the 50,000-link limit would be a reason not to choose such a solution.
As an indexed sitemap can contain at most 1,000 sub-sitemaps, I would think there is room for both:
- a split per language used;
- a split at each 50,000 links within a language.
Just a humble opinion...
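
The double split suggested above can be sketched as follows, in Python and with hypothetical names (`plan_sitemaps`, `chunk_size`). It only illustrates the bookkeeping; it does not address where each resulting file may be served from.

```python
from collections import defaultdict

MAX_LINKS_PER_SITEMAP = 50_000


def plan_sitemaps(links, chunk_size=MAX_LINKS_PER_SITEMAP):
    """Group (language, url) pairs into per-language chunks.

    Returns a dict keyed by (language, chunk_number), each value being
    a list of at most `chunk_size` URLs.
    """
    by_language = defaultdict(list)
    for language, url in links:
        by_language[language].append(url)

    plan = {}
    for language, urls in by_language.items():
        for start in range(0, len(urls), chunk_size):
            plan[(language, start // chunk_size)] = urls[start:start + chunk_size]
    return plan
```

Each key of the returned dict would then become one entry of the sitemap index.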

kiamlaluno’s picture

That is because it's not possible. The module outputs a site map also for URLs like http://example.com/it/sitemap.xml, which cannot contain links for other languages.

valthebald’s picture

FileSize
2.19 KB

Someone has asked for a quick and dirty solution? Try attached patch against 6.x-1.x-dev release.
What it actually does is prepend the language code to the sitemap cache file name, and add " WHERE language={lang} " when searching for nodes.
This is a VERY quick fix, so use it with care.
I plan to spend some time improving it if necessary.

valthebald’s picture

Status: Postponed » Needs review
kiamlaluno’s picture

Status: Needs review » Postponed

The patch seems made for 6.x-0.x-dev, and not 6.x-1.x-dev.
6.x-1.x-dev doesn't create a sitemap.xml.gz file anymore (see the CVS repository for the DRUPAL-6--1 branch of xmlsitemap.module).
Therefore, the patch you attached needs to be changed.

Please, don't change the status of the report.

eliosh’s picture

subscribe

eliosh’s picture

I didn't find my comment in other issues, and because the problem is very similar to this sitemap issue (the location of the sitemap.xml file for a third-level domain or subdirectory), I'm writing it here: integration with the subdomain module. With Drupal 6 the subdomain module is very useful and stable, so it would be great to see these two modules work together.

Just my 2€c :-D

kiamlaluno’s picture

Title: Multilingual support or views integration » Multilingual support or Views integration
Priority: Critical » Normal

@eliosh: This report is about multilingual support, or Views integration. If you want your feature request to be seen by others, it's better to open a separate feature request; if other people show interest in the feature, it may get implemented.

eliosh’s picture

Ok, i'll open ASAP.
Thanks

valthebald’s picture

FileSize
6.82 KB

Patch against the 1.x-dev version (database schema 6115).
Basically, the patch adds a 'language' column to the xmlsitemap table, and the appropriate WHERE clauses.
Only the xmlsitemap_node module has been changed to work in a multilingual environment.
P.S. Please consider this patch a "quick and dirty solution". I personally wouldn't recommend it for production-level sites, but that's why this branch has '-dev' in its name, no?
Anyway, it can probably serve as a starting point for full multilingual support in the xmlsitemap module.
And, just to clarify, the patch assumes that the module is installed in the sites/all/modules directory.
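
For readers who have not opened the patch, the idea of a language column plus a WHERE clause can be illustrated with a toy example. This is a Python/SQLite sketch, not the patch itself: only the `xmlsitemap` table name and its `language` column come from the description above, while the `loc` column, the sample rows and the function `links_in_language` are assumptions.

```python
import sqlite3


def links_in_language(conn, language):
    """Return the stored links whose language matches `language`."""
    rows = conn.execute(
        "SELECT loc FROM xmlsitemap WHERE language = ? ORDER BY loc",
        (language,),
    )
    return [loc for (loc,) in rows]


# In-memory toy table; the real module schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xmlsitemap (loc TEXT PRIMARY KEY, language TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO xmlsitemap (loc, language) VALUES (?, ?)",
    [("node/1", "en"), ("node/2", "it"), ("node/3", "en")],
)
```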

2xe’s picture

subscribe

I'm also in need of sitemaps for our multidomain setup; I get the point that there are some difficult issues regarding setups with prefixed paths -- but sitemaps for multidomain setups should be fairly clean and simple to generate, shouldn't they?

bennos’s picture

I have followed the thread a little. I do not understand why a sitemap should be divided by language.
Every search engine creates its own categories and does its own language detection. If you say these are all Greek pages and these are all English pages, Google does not trust you and determines the language by itself.

For me there is no reason why a sitemap should be divided into different languages on a multilingual site.

kiamlaluno’s picture

See what http://sitemaps.org/protocol says about the links that can be placed in a site map.
In short, if the site map appears at http://example.com/it/sitemap.xml, it cannot contain links like http://example.com/eo/esperanto.
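
The location rule can be expressed as a small check. This is a sketch in Python of my reading of the protocol (same scheme and host, and the link must sit under the directory that holds the site map); the function name `allowed_in_sitemap` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url, link_url):
    """Tell whether `link_url` may be listed in the site map at `sitemap_url`."""
    sitemap = urlsplit(sitemap_url)
    link = urlsplit(link_url)
    # Scheme and host (including any port) must be the same.
    if (sitemap.scheme, sitemap.netloc.lower()) != (link.scheme, link.netloc.lower()):
        return False
    # Directory that holds the site map, always ending in "/".
    directory = posixpath.dirname(sitemap.path).rstrip("/") + "/"
    return (link.path or "/").startswith(directory)
```

With this check, a link such as http://example.com/eo/esperanto is rejected for http://example.com/it/sitemap.xml but accepted for a site map at the root of the same host.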

2xe’s picture

But it seems you should be able to deliver a sitemap including all links from all languages from the root level?

kiamlaluno’s picture

That could be possible, in a multilingual site, in just one case: when the language negotiation is set to none; in the other cases, the URL of every menu callback is changed to reflect the language used.

Depending on the language settings, the URL can be changed to include a language prefix, and the site map would have a URL like http://example.com/it/sitemap.xml; if a different domain has been selected for each language, the URL for the site map could be http://it.example.com/sitemap.xml. In both cases, the links reported in the site map cannot be something like http://example.com/es/articulo/1 or http://eo.example.com/artikolo/2.

2xe’s picture

Different domains should, and must, be treated as individual sites. Nobody should want to mix URLs from en.example.com and de.example.com; they are completely different sites, although they may provide content from the same database/installation. So to be clear: a multilingual site with domain-name-negotiated language settings should provide a single sitemap for each domain, with no references whatsoever to any documents belonging to any of the other languages.

I really don't care about the path prefix issue - but I don't get why you would want to put the sitemap in the prefixed path www.example.com/de/sitemap.xml, and not just at root www.example.com/sitemap.xml, containing all URLs for all languages. I guess this was what bennos meant too.

kiamlaluno’s picture

That is not something XML Sitemap can control.

When locale.module is enabled, and more than one language is set (which is the case for multilingual web sites), every URL used by Drupal will have a language prefix (supposing that Drupal is set to use a language prefix).
If the active languages in the Drupal-powered site are Italian and Spanish, the site map would be accessible from http://example.com/it/sitemap.xml and http://example.com/es/sitemap.xml.

About the different domains: if they use the same database tables, XML Sitemap cannot know which language a domain is associated with. Every URL is generated by url(), which can change the URL based on the language settings (and the language currently set).

That said, I can add a setting that allows adding the links for content in all the languages to the default language. This means that when the requested site map is at http://example.com/sitemap.xml (for which the language set is the default), the site map will contain the links for the content in all the languages (if the administrator has set XML Sitemap to do so).
If the site map is not correct with that setting on, the administrator can always turn the option off.

2xe’s picture

> About the different domains: if they use the same database tables, XML Sitemap cannot know
> which language a domain is associated with. Every URL is generated by url(), which can
> change the URL based on the language settings (and the language currently set).
I don't understand why XML Sitemap can't know which domain a language is associated with; at all times the global $language variable holds information about the current language and domain. The localization configuration can easily be extracted as well. Drupal knows where to deliver, and so should XML Sitemap.

> That said, I can add a setting that allows adding the links for content in all the languages to the
> default language.
You're back to path prefixes, right? "all URLs in a Sitemap must be from a single host" - so this wouldn't
be allowed for a multidomain setup.

> This means that when the requested site map is at http://example.com/sitemap.xml (for which the
> language set is the default), the site map will contain the links for the content in all the languages
> (if the administrator has set XML Sitemap to do so). If the site map is not correct with that
> setting on, the administrator can always turn the option off.
If this is regarding prefixed paths, this is exactly what I suggested.

Maybe we should split this issue in two? One for multidomain setups and one for prefixed paths?

kiamlaluno’s picture

Title: Multilingual support or Views integration » Multilingual support
  • I will clarify what I mean: checking whether a URL is on the same domain as another (which would mean comparing all the URLs to each other) is more expensive than checking whether the language associated with the link being added is the language currently set by Drupal.
    The problem then is not related to the domain being used, but to the absolute URL; knowing whether the domain is the same is just half of the problem, as Drupal could be set to add the language prefix (two URLs could be on the same domain, but have different language prefixes).
  • The correct sentence should have been "to add the links for content in all the languages to the default language site map". The default language site map is the only one without a language prefix; therefore, it can contain all the links of the site.
    The language prefixes are one of the options locale.module has; XML Sitemap must then be able to work with that option too. That you don't use language prefixes doesn't mean XML Sitemap should not consider both possible options.
  • The additional option I am talking about would be useful when the admin user submits the site map to the search engines and declares that the only site map is at http://example.com/sitemap.xml. In that case, he probably wants to see all the links in the default language site map; if he doesn't want this to happen, he simply doesn't select the checkbox for the setting.
    Whether the setting must be unavailable when each language is associated with a different domain is an implementation detail.

There is no need to split the report; XML Sitemap must work with whichever the language settings are.

P.S.: The possible options for the language are three; I didn't mention the third one (no language negotiation), because with that option the URLs would not be altered (like it happens with a Drupal site which has locale.module disabled).

2xe’s picture

Any way you solve this - it will get much more expensive than your current solution, at least for the multilingual sites. When we add the first new language to our site, we close to double the amount of pages we provide. So, in principle - the whole job that is done now, has to be repeated for every language. I don't think there are that many shortcuts you can make; every path needs to be looked up for every language.

I haven't reviewed the new code and scheme, but after a quick look - I would say every table that stores links will need a language field...

kiamlaluno’s picture

Actually, the language is associated just to nodes, or path aliases.

2xe’s picture

Ok - good. What will be the path forward on this issue? Will you attempt an implementation of multilingual support or will it remain postponed?

kiamlaluno’s picture

I have already added some code toward supporting multilingual sites. If you look, the xmlsitemap table already has the language field.

2xe’s picture

Could you give us an idea of how long you will postpone this issue? (weeks, months...)

kiamlaluno’s picture

The correct question would be how long it will take for the feature to be completely implemented.
I think about a month, considering I am also fixing some problems XML Sitemap has in generating the site map (memory exhaustion, PHP timeouts), and those take precedence.

jack in the box’s picture

can we test or implement any code on the way of completing this?

D -18 days & counting.. =)

kiamlaluno’s picture

The code now has an additional setting that allows the site map for the default language, which is the language that doesn't have any path prefix added, to contain the links for all the languages; the setting is not used when Drupal uses a different domain for each language, or when Drupal doesn't use any mechanism to detect the language to use.

2xe’s picture

Hey! Good for people who use prefixed paths! ;)

What's the plan/status for installations with language negotiated by domain name?

2xe’s picture

Sorry to stress this one, but will you attempt an implementation for domain name negotiated multilingual sites? If not, it's best for us to get to know this now rather than later...

kiamlaluno’s picture

The code is able to handle both the path prefix, and the domain name negotiations.

kiamlaluno’s picture

Status: Postponed » Needs review

The code has been implemented for some weeks; it now needs to be reviewed.

rondev’s picture

I use "Path prefix with language fallback." Only nodes in one language are taken into account, despite the option "Add all the links to the default language site map".

kiamlaluno’s picture

How many languages are enabled, and what is the default language?

rondev’s picture

3 languages enabled ("en" "fr" "es") and "en" is default.

kiamlaluno’s picture

I enabled 3 languages (en, it, la) with en being the default language; I then created content in two languages (it, la).
When I check the site map, the links to the content in both the languages appear in the site map.

I must say that the site map doesn't show the new content immediately; I first had to force an update of the site map content by visiting the language settings page and changing the options there (the language negotiation option is set to Path prefix with language fallback).

rondev’s picture

I noticed that when reinstalling xmlsitemap, I get the following message at about 80% of the process:
An error occurred. /fr/batch?id=64&op=do <br /> <b>Fatal error</b>: Allowed memory size of 33554432 bytes exhausted (tried to allocate 6728 bytes) in <b>/example.com/www/includes/image.gd.inc</b> on line <b>190</b><br />
I don't know whether this is related to my issue.

I've updated the language negotiation page and the language list, emptied the caches, and run cron. Nothing changed.

2xe’s picture

I've tested this on a multilingual site with domain names; and it seems to be working very smoothly... I have tested with three languages, and all of them have separate sitemaps with (what looks to be) the correct content. I have around 16000 links in the sitemap, so I guess I really won't get the true answer before google has checked my links against the sitemap ;)

Thank you - great work!

Btw. when will it move from -dev to release?

earnie’s picture

Status: Needs review » Fixed

Based on #62 marking fixed. If there is still an issue please change the status to active.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

bobighorus’s picture

I've created a related topic in the forum that maybe can be helpful.
The thread is located here: http://drupal.org/node/467838 .

nicorac’s picture

I have a multilingual site configured like this:
Languages: it, en
Default lang: en
Language negotiation: Path prefix with language fallback
XMLSiteMap version: 6.x-1.0-beta6

I set the path prefix for the default language too, so for English it's not blank but en.
The option "Add all the links to the default language sitemap" doesn't work because when I request the file http://www.example.org/sitemap.xml, my browser is immediately redirected to http://www.example.org/en/sitemap.xml, thus it can't contain other ("it") links as already said.

Then I created a static sitemap index file and I saved it into my website root, so it's directly accessible through the URI http://www.example.org/sitemap.xml :

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
			  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
			  xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
  <sitemap>
    <loc>http://www.example.org/en/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.org/it/sitemap.xml</loc>
  </sitemap>
</sitemapindex>

That said, here are a bug and a request:

BUG
I get an error from Google; it downloads the index correctly, and likewise the two language sitemaps.
It then complains about the first link (homepage) in each one: http://www.example.org/en and http://www.example.org/it (please note the missing trailing "/").
Google error: This url is not allowed for a Sitemap at this location.
If I submit the two language sitemaps separately through Webmaster Tools, they are accepted with no errors.
Is there a way to fix it? Am I missing something?

REQUEST
Which is the sitemap automatically submitted when a content changes?
Is it the one related to the content language?
What about untranslated contents?
Is my index file useless?

Sorry for being so long...

kiamlaluno’s picture

Assigned: kiamlaluno » Unassigned

It would be better to open a new report, as this one, which was valid before beta6 was created, is too old. It is also a closed report that nobody will read anymore.