Hi,

My Drupal setup is as follows:

www.domain.com --> For English
www.domain.de --> For German

The XMLSiteMap base URL is configured as http://www.domain.com
English and German languages are enabled on the XMLSiteMap settings page.
I am using the xmlsitemap_menu module to generate the sitemap.

The problems I am facing are:
1. http://www.domain.com/sitemap.xml contains all English and German menus (it should only contain English menus).
2. http://www.domain.de/sitemap.xml cannot be accessed by anonymous users, who receive the following error:

The XML page cannot be displayed
Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.
--------------------------------------------------------------------------------
Access is denied.

3. http://www.domain.de/sitemap.xml can be accessed by the root user, but the file format is not correct.
4. http://www.domain.de/sitemap.xml can be accessed by the root user, but again it contains all English and German menus (it should only contain German menus).

Comments

Dave Reid’s picture

Component: xmlsitemap_menu.module » Code
Status: Active » Postponed (maintainer needs more info)

Hmm...I don't think I have any kind of multilingual restriction on the menus yet. What module are you using to use certain menus for each language?

vendeka’s picture

Status: Postponed (maintainer needs more info) » Active

I am using the Drupal core Menu module with the Internationalization module.

By the way, I also tried with xmlsitemap_node module but it also has the same issues.

Dave Reid’s picture

Status: Active » Postponed (maintainer needs more info)

That's odd. When I was trying out multilingual stuff, my Drupal install didn't hide any content that wasn't my 'current' language. Maybe it's when domain negotiation is enabled, which I couldn't test. Marking back as needs more info.

vendeka’s picture

Status: Postponed (maintainer needs more info) » Active

Here is more detailed information about my setup:

* admin/settings/language/i18n
Content selection mode: Only current language.

* admin/settings/language/configure
Language negotiation: Domain name only

When creating a new node, I specify the language of content. (I do not set language setting of node as "Language Neutral")

Dave Reid’s picture

Aha...I didn't have the contrib internationalization module enabled. That would be the important point. :) I'll take a look into this again.

apaderno’s picture

Title: Multilingual Multidomain Issues » Multilingual multidomain issues

hass’s picture

Subscribe. I also reported this for 6.x-1.x, but the cases have been closed without a fix by Kiam. There are also a few more i18n issues with path-based detection, if I remember correctly. The main source is url(), which cannot be used; xmlsitemap needs to implement its own function that behaves differently.

Dave Reid’s picture

Please expand on why url() cannot be used.

hass’s picture

It returns a URL for a language in which you do not have content. This happens if node/7 (http://example.com/foo) has an English version but is not yet translated to German, and you request a URL on http://example.de/foo. Then url() returns the German path, but the content is English, and we do not want English content on a German site in the search engines. If you build the sitemap you will add the German URL to it, but there is only English content (wrong). This is only one reason; there are 3-4 variants I cannot remember now. I need to search the issue queue :-( and try it out again.

hass’s picture

I may have found what I wrote... a very long time ago...

http://drupal.org/node/157533#comment-797857
http://drupal.org/node/157533#comment-800788 ***

With path-based detection the default sitemap is also added as /de/sitemap.xml (if the default site language is DE), if I remember correctly. But it needs to be /sitemap.xml and contain all nodes in all languages of the site, not only German. I haven't tested this for a long time, so this may have been solved.

Dave Reid’s picture

Ok, so I had been planning on including the language of the node in the {xmlsitemap}.language column. When a chunk of a sitemap is generated for a specific language (let's say German), it would include in the SQL: WHERE language IN ('', 'de'), which selects only language-neutral or German nodes. Anything wrong with that approach?
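
A minimal sketch of that filter, written in Python with an in-memory SQLite table rather than Drupal's database layer. Only the xmlsitemap table name and its language column come from the proposal above; the loc column, the sample rows, and the links_for_language() helper are assumptions for illustration.

```python
import sqlite3


def links_for_language(conn, langcode):
    """Return links that are language-neutral ('') or match langcode."""
    rows = conn.execute(
        "SELECT loc FROM xmlsitemap WHERE language IN ('', ?) ORDER BY loc",
        (langcode,),
    )
    return [loc for (loc,) in rows]


# Hypothetical sample data: one English, one German, one neutral node.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xmlsitemap (loc TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO xmlsitemap VALUES (?, ?)",
    [("node/1", "en"), ("node/2", "de"), ("node/3", "")],
)

german_links = links_for_language(conn, "de")
english_links = links_for_language(conn, "en")
```

With this data the German chunk would hold the German and the neutral node, and the English chunk the English and the neutral node.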

hass’s picture

I cannot say for sure, but it sounds good. There are so many variants and I haven't taken a look at the 2.x code... plus it's too long ago that I spent days on this functionality and how it should work... sorry. The only thing I can remember is that the url() or drupal_lookup_path() function often returns something that is OK for a user surfing a site, but incorrect for the sitemap.

I hope to find some time to do a detailed test again, but it would make more sense if you know that such issues cannot occur and I'm doing a review afterwards to figure out if the behaviour is correct with all language detection modes...

vendeka’s picture

Ok, so I had been planning on including the language of the node in the {xmlsitemap}.language column. When a chunk of a sitemap is generated for a specific language (let's say German), it would include in the SQL: WHERE language IN ('', 'de'), which selects only language-neutral or German nodes. Anything wrong with that approach?

It seems the best approach to me, but it shouldn't be limited to nodes only. Menu items should get the same treatment.

apaderno’s picture

Drupal core code assigns a language to nodes, but not to menus; maybe there is a third-party module that assigns a language to the menu being shown, but Drupal core code doesn't do that.

Anonymous’s picture

Menus use t() internally unless a 'title callback' is given. See http://api.drupal.org/api/function/hook_menu/6. The description always uses t().

Anonymous’s picture

Are we using menu_link_load?

apaderno’s picture

Using t() for a string doesn't mean that a language is associated with menus. In the table used to save the menu data there isn't a language field, and when you create a menu, you are not asked for a language; if there were such a possibility, one could have a menu that appears only when the current language is a specific one.

Dave Reid’s picture

Yes we are using menu_link_load in 6.x-2.x.

vendeka’s picture

Using t() for a string doesn't mean that a language is associated with menus. In the table used to save the menu data there isn't a language field, and when you create a menu, you are not asked for a language; if there were such a possibility, one could have a menu that appears only when the current language is a specific one.

I am not talking about menus but menu items. Menu items have a language option when they are created. (I think this is also an i18n feature, but I'm not sure.)

The menu_links.options field contains the language information for that menu item:
'a:2:{s:10:"attributes";a:1:{s:5:"title";s:16:"Website Feedback";}s:8:"langcode";s:2:"en";}'

For instance, I have a menu with 6 menu items. 3 of these items are specified as English and 3 of them as German. They are only shown when the specific language is requested (in my case, that means the specific domain).
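
To illustrate where the language sits in that value, here is a rough Python sketch that pulls the langcode entry out of the serialized string with a regular expression. It is not a full PHP unserializer, and the menu_link_langcode() helper name is hypothetical; the sample string is the one quoted above.

```python
import re

# Matches the serialized pair: key "langcode" followed by a string value.
LANGCODE_RE = re.compile(r's:8:"langcode";s:\d+:"([^"]*)"')


def menu_link_langcode(serialized_options):
    """Return the langcode stored in a menu link's options, or '' if none."""
    match = LANGCODE_RE.search(serialized_options)
    return match.group(1) if match else ""


options = (
    'a:2:{s:10:"attributes";a:1:{s:5:"title";s:16:"Website Feedback";}'
    's:8:"langcode";s:2:"en";}'
)
langcode = menu_link_langcode(options)
```

An item without a langcode entry would come back as an empty string, which could be treated as language-neutral.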

Anonymous’s picture

I am not talking about menus but menu items. Menu items have a language option when they are created. (I think this is also an i18n feature, but I'm not sure.)

The menu_links.options field contains the language information for that menu item:
'a:2:{s:10:"attributes";a:1:{s:5:"title";s:16:"Website Feedback";}s:8:"langcode";s:2:"en";}'

For instance, I have a menu with 6 menu items. 3 of these items are specified as English and 3 of them as German. They are only shown when the specific language is requested (in my case, that means the specific domain).

Yikes, that eliminates the need for t(). If the translation is done at the menu editing level then there is no need to use t() at all. I suspect that the i18n module is altering the menu links with a hook. This would mean that we have to handle the i18n-enabled case differently.

apaderno’s picture

I am not talking about menus but menu items. Menu items have a language option when they are created.

That is not a feature that is present in a plain Drupal core installation; the feature you are talking of must be implemented by i18n.
Still, in the Drupal core table, there isn't a field for the language associated with a menu item.

vendeka’s picture

That is not a feature that is present in a plain Drupal core installation;

I am aware that this feature is not part of Drupal core.

the feature you are talking of must be implemented by i18n.

You mean multilingual menu support for xmlsitemap should be implemented by i18n? I didn't get exactly what you mean.

Still, in the Drupal core table, there isn't a field for the language associated with a menu item.

I think earnie clarified that it is a hook used by i18n module.

Dave Reid’s picture

Issue tags: +6.x-2.0-alpha blocker

Tagging as alpha blocker.

eMPee584’s picture

I just hit this issue too. I wanted to 'quickly' implement this, but soon found out that a) the current implementation's (2.x) db schema and API have to be thoughtfully modified, and b) I have different priorities than getting my sitemap localized...
I think the best way to handle this is to add a $language parameter to hook_xmlsitemap_links and a corresponding column to the xmlsitemap table. For example, the xmlsitemap_menu module would then query for items in the enabled menus, check the language of each item and, in case it's not the wanted one, check for a translated version of the node. If that's not available but the language is not set either: keep it, else throw it out.
By the way, there's what I believe is a bug: in xmlsitemap_menu_xmlsitemap_links(), the $menus array's *values* are used to fetch the relevant entries from the menu_links table, but the *key names* are actually the machine-readable names and the values are the titles.

Dave Reid’s picture

Adding the db schema column and a couple of lines to each hook_xmlsitemap_links() is not a big issue. It just hasn't been as high a priority as other things.

I'm not sure what you mean by the $menus values vs keys. xmlsitemap_menu_xmlsitemap_links() uses xmlsitemap_menu_get_menus() which runs $menus = array_keys(menu_get_menus());, so only the machine-readable menu names are used.

eMPee584’s picture

well you're right dave, i'm a fool *g
line 103 contains $menus = menu_get_menus(); and that confused me, sorry. (of course the module wouldn't even work the way it does without this being correct..)

Dave Reid’s picture

Assigned: Unassigned » Dave Reid

I just added *basic* support for multilingual node sitemap selection by adding a new hook_xmlsitemap_query_alter() and implementing i18n_xmlsitemap_query_alter() inside xmlsitemap.module on behalf of i18n.module.

Dave Reid’s picture

Title: Multilingual multidomain issues » Integrate with i18n.module and add hook_xmlsitemap_query_alter()
Category: bug » feature
Issue tags: -6.x-2.0-alpha blocker +6.x-2.0-beta blocker

I'll keep slowly working on multilingual menu items and taxonomy terms, but I think this has moved from a bug report to a feature request now that we've solved the node problem. Also moving back to beta blocker for this for the remaining items in this issue.

Dave Reid’s picture

Status: Active » Fixed

I finished the implementation of i18n_xmlsitemap_query_alter() and I'm also pretty sure I got the support for multilingual menu items and taxonomy terms working as well.
http://drupal.org/cvs?commit=253976
http://drupal.org/cvs?commit=253978
http://drupal.org/cvs?commit=253980

I'm going to consider this fixed for now. Just will need some testing from all you i18n users out there.

Dave Reid’s picture

FYI I've decided to move the i18n.module integration into a separate sub-module, xmlsitemap_i18n, so the base xmlsitemap module can stay as trim as possible.

vendeka’s picture

I just did a clean install of the latest dev build with the xmlsitemap and xmlsitemap_node modules, but unfortunately the generated sitemaps (http://www.domain.com/sitemap.xml and http://www.domain.de/sitemap.xml) contain nodes in all languages. I checked the language field in the xmlsitemap table and the node languages are correct, but the generated sitemap.xml files still contain all languages.

P.S. I haven't changed the status; maybe the latest dev build is not up to date.

Dave Reid’s picture

Yes, I just made the changes and the development build only regenerates every 12 hours automatically.

manfer’s picture

I tested the latest CVS code after this on a test site with i18n enabled, some nodes translated into both languages, and XML sitemap internationalization enabled:

  • Tested with language negotiation by prefix. All nodes appear in the sitemap for both languages. The only difference is that the default-language ones have links like http://example.com/blablabla and the other-language ones (es) are shown as http://example.com/es/blablabla. Nodes with aliases are shown with the alias in their own language, while in the other language they are shown as /node/xxx (that's the only difference).
  • Tested with language negotiation by domain. All nodes appear in the sitemap for both languages. The only difference is that the default-language ones have links like http://example.com/blablabla and the other-language ones (es) are shown as http://example.es/blablabla. Nodes with aliases are shown with the alias in their own language, while in the other language they are shown as /node/xxx (that's the only difference).

In both cases I've been able to access any language sitemap as an authenticated user and as anonymous.

But I don't know exactly what objective you want to reach and how this affects all cases.

I suppose that for a multilingual site managed by domain name it is great to have a different sitemap for each language, with only its corresponding nodes, menus, taxonomies, ..., and then submit each sitemap to search engines as the sitemap for that language's domain.

But is it the same situation for multilingual sites managed by prefix (http://www.example.com/, http://www.example.com/es, http://www.example.com/de), where the domain is the same for all languages (http://www.example.com), or for a multilingual site using only the user language preference? How does that affect submission to search engines? I can't verify the site http://www.example.com/es on search engines. Will the different sitemaps for every language be submitted to search engines for the domain http://www.example.com?

Dave Reid’s picture

Heh, so there were some major bugs, mainly the new query alter hook never being called. I'm tagging an unstable3 that actually works with multilingual support. I even wrote tests to make sure and that's how I found this wasn't actually working. :)

manfer’s picture

Depending on the option chosen as the selection mode for the multilingual system, every language sitemap shows only the nodes in that language, only the nodes in that language plus language-neutral nodes, and so on. Now it works fine.

Tested with language negotiation by prefix and by domain.

How is the submission to search engines managed? This is something I can't test, and I would like to know how it would be done for a multilingual site with language negotiation by prefix.

Dave Reid’s picture

@manfer Yay! Thanks very much for testing it!

To answer your question, the xmlsitemap_engines.module will ping the search engines with all the selected-language sitemaps on admin/settings/xmlsitemap. So if you have English (default lang) and French sitemaps enabled and you have just the Google engine selected, when the sitemap is updated, your site will ping Google with:
http://www.google.com/webmasters/tools/ping?sitemap=http://example.com/sitemap.xml
http://www.google.com/webmasters/tools/ping?sitemap=http://example.com/fr/sitemap.xml
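
As a rough sketch of how that list of ping URLs comes together (Python for illustration only, not the module's actual code): it assumes path-prefix negotiation, where the default language has no prefix, and it leaves the sitemap URL unencoded to mirror the URLs as written above. The sitemap_url() and ping_urls() helpers are hypothetical names.

```python
GOOGLE_PING = "http://www.google.com/webmasters/tools/ping?sitemap="


def sitemap_url(base_url, langcode, default_langcode):
    """Build the sitemap URL for a language; the default language gets no prefix."""
    prefix = "" if langcode == default_langcode else langcode + "/"
    return f"{base_url}/{prefix}sitemap.xml"


def ping_urls(base_url, langcodes, default_langcode):
    """One ping URL per enabled language sitemap."""
    return [
        GOOGLE_PING + sitemap_url(base_url, langcode, default_langcode)
        for langcode in langcodes
    ]


urls = ping_urls("http://example.com", ["en", "fr"], "en")
```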

hass’s picture

Dave, this sounds wrong. The module should only notify Google about http://www.google.com/webmasters/tools/ping?sitemap=http://example.com/s... and no other file. The file http://www.google.com/webmasters/tools/ping?sitemap=http://example.com/f... should never be accessible. The nodes with /fr/* should be included in the main file http://example.com/sitemap.xml

Dave Reid’s picture

@hass: If people are using the i18n.module with *content selection settings*, then nodes with /fr/* would and should not be included in the English sitemap. That matches what happens on the actual Drupal site. If visiting the site in /fr/* mode, you don't see English content.

There is also no harm in submitting all the language sitemaps. How would Google know about the French sitemap if we only submitted the English sitemap and we don't have robotstxt.module enabled?

manfer’s picture

With language negotiation by domain, it looks fine to have different sitemaps and submit the corresponding sitemap to the specific domain for each language.

But I still have many doubts about the correct way for a Drupal multilingual site with language negotiation by prefix.

My knowledge is not enough to discuss this. It would be nice if people who have the knowledge could clarify.

So far there are two totally opposite opinions. :(

One of my doubts is:

Would Google really accept:
http://www.google.com/webmasters/tools/ping?sitemap=http://example.com/fr/sitemap.xml

and ones like that?

manfer’s picture

Another issue (in case submitting that kind of sitemap with a language prefix is accepted) would be with the selection mode option for the multilingual site. If it is not set to Only current language, you'll end up with a lot of duplicated nodes in the submitted sitemaps. I'll explain better with examples:

  • Multilingual site with language negotiation by prefix. Selection mode for multilingual site set to Current language and language neutral
    Language-neutral nodes will appear in the sitemaps for every language, and you end up submitting those nodes many times to search engines.
  • Multilingual site with language negotiation by prefix. Selection mode for multilingual site set to Mixed current language (if available) or default language (if not) and language neutral
    Again you end up submitting language-neutral nodes many times, and nodes with no translations would be submitted many times too.
  • ...
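
To make the first case concrete, here is a small Python sketch that counts how often each node would be submitted across the per-language sitemaps. The node list, the include_neutral flag (standing in for the "Current language and language neutral" mode), and both helper names are assumptions for illustration; the mixed mode is not modelled.

```python
from collections import Counter


def sitemap_nodes(nodes, langcode, include_neutral):
    """nodes is a list of (path, language); language '' means language-neutral."""
    wanted = {langcode, ""} if include_neutral else {langcode}
    return [path for path, language in nodes if language in wanted]


def submission_counts(nodes, langcodes, include_neutral):
    """Count how many language sitemaps each node ends up in."""
    counts = Counter()
    for langcode in langcodes:
        counts.update(sitemap_nodes(nodes, langcode, include_neutral))
    return counts


# Hypothetical site: one English node, one Spanish node, one neutral node.
nodes = [("node/1", "en"), ("node/2", "es"), ("node/3", "")]
with_neutral = submission_counts(nodes, ["en", "es", "fr"], include_neutral=True)
current_only = submission_counts(nodes, ["en", "es", "fr"], include_neutral=False)
```

With three language sitemaps the neutral node is submitted three times in the first mode and not at all under Only current language.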

I'm not sure, but I think with language by prefix only one sitemap is needed, with all nodes in it, and just that sitemap submitted to search engines. Search engines would have no problem identifying the language of each node.

English nodes:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">

Spanish nodes:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="es" lang="es" dir="ltr">

French nodes:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr" dir="ltr">

and so on.

Dave Reid’s picture

What this integration is all about is mirroring exactly what content is on the Drupal site when you view it in each language, because each language is basically its own individual Drupal site. If you view http://example.com/fr and it includes language-neutral content, it makes sense that a sitemap on http://example.com/fr/sitemap.xml includes the links to the same content as well. This is still at the basic, but complete, step of integration and could probably be improved. But right now the sitemap content matches exactly what is controlled by i18n.module.

Dave Reid’s picture

If you want all the links in one sitemap, only enable the default language sitemap and don't enable the xmlsitemap_i18n.module. Easy as that.

What I could do is not allow people to select multiple language sitemaps unless the xmlsitemap_i18n.module is enabled. That seems to make the most sense.

Dave Reid’s picture

Ok I've moved all multilingual XML sitemap features to xmlsitemap_i18n.module. So by default (without this sub-module enabled) you can only have one sitemap and all your content is in that one sitemap.

EDIT:
If the user has the xmlsitemap_i18n.module enabled, they will see a message on the "Generate sitemaps for the following languages" option: "Each language's sitemap will respect the multilingual content selection mode."

I think this is a much better approach now. Thanks for your help in reaching this manfer. :)

FYI you could have a sitemap in any 'location' of your site as per the sitemaps.org protocol. It could be at:
http://example.com/sitemap.xml
http://example.com/funky_folder/sitemap.xml
http://example.com/funky_folder/strange_folder/what_is_going_on/sitemap.xml

The only restriction is that all the links inside a sitemap should/need to reside in the same location:

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but can not include URLs starting with http://example.com/images/.
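
A minimal sketch of that location rule in Python, treating it as a plain string-prefix check on the sitemap's directory. The helper names are hypothetical and this ignores any other conditions the protocol may impose.

```python
def sitemap_scope(sitemap_url):
    """Return the directory URL that contains the sitemap file."""
    return sitemap_url.rsplit("/", 1)[0] + "/"


def allowed_in_sitemap(sitemap_url, link_url):
    """True if link_url starts with the sitemap's directory URL."""
    return link_url.startswith(sitemap_scope(sitemap_url))


catalog_sitemap = "http://example.com/catalog/sitemap.xml"
catalog_ok = allowed_in_sitemap(catalog_sitemap, "http://example.com/catalog/item")
images_ok = allowed_in_sitemap(catalog_sitemap, "http://example.com/images/logo.png")
```

By the same check, a sitemap at http://example.com/fr/sitemap.xml would only cover URLs under http://example.com/fr/.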

If people are using different domains for their Drupal language sites, they should be using i18n (and xmlsitemap_i18n) to help control what is in each site and avoid being dinged by Google for having duplicate content.

manfer’s picture

Yes, I tested just now and, as you say, with xmlsitemap_i18n.module disabled you get all nodes in the sitemap. And yes, then it doesn't make much sense to select different languages on the xmlsitemap settings form (something you have just changed while I wrote this :)).

It is good to know in case this turns out to be what I need, which I still don't know.

It is not just that I want everything in one sitemap. If everything is correct with more than one sitemap, that is fine for me. But I have doubts about submitting sitemaps with the prefix and about possibly submitting the same content more than once. If both things are fine for search engines, I would probably prefer the solution with more than one sitemap, as that could be useful for other use cases and not only search engine submissions.

vendeka’s picture

Hi again,

First of all, you have made great progress, Dave, and I want to thank you for your efforts.

I decided to test initially xmlsitemap_menu and xmlsitemap_i18n module with my multilingual setup. Below, you can see the issues I faced:

  • I enabled 2 menus for xmlsitemap: the footer menu and the header menu. Both have a "Contact Us" menu item, and this item appears twice in the generated sitemaps. These menu items have the same path, so I think the xmlsitemap_menu module should check for duplicates to prevent search engine duplicate-content violations.

    Info: Someone may ask why I add the same menu item to different menus. The answer is quite simple: to increase the accessibility of the "Contact Us" link :)

  • I have different front pages for each language; however, www.domain.de/sitemap.xml contains the www.domain.com URL. It should be www.domain.de.

    Info: I set the homepage for each language from admin/settings/site-information by setting the "Default front page" variable. However, this variable is not multilingual by default, so I added the following setting to my settings.php file:

    $conf['i18n_variables'] = array(
      'site_frontpage',
    );
    

    There was a patch for it, but I think it is still not included in a release; it should be in the near future.
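
On the duplicate "Contact Us" entry from the first point, the kind of check I have in mind could look like the following Python sketch: keep only the first link seen for each path. The (menu name, path) tuples and the unique_links() helper are made up for illustration and are not how the module stores its links.

```python
def unique_links(menu_links):
    """Keep the first occurrence of each path across all enabled menus."""
    seen = set()
    result = []
    for menu_name, path in menu_links:
        if path in seen:
            continue  # Same path already added from another menu.
        seen.add(path)
        result.append(path)
    return result


# Hypothetical links from the header and footer menus.
menu_links = [
    ("header", "contact"),
    ("header", "about"),
    ("footer", "contact"),
    ("footer", "imprint"),
]
links = unique_links(menu_links)
```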

Dave Reid’s picture

@vendeka:
- Issue for duplicate links: #454442: Disable duplicate links during regeneration. It's not a huge priority since the sitemap still works.
- The front page uses url() with the proper $language object parameter to generate the link of the frontpage. So I'm not sure what else we can do there.

vendeka’s picture

For the frontpage issue, I checked with xmlsitemap-6.x-2.0-unstable2 and it works as expected.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

vendeka’s picture

Status: Closed (fixed) » Active

Hi,

I checked the problem with the front page url() and found the exact cause. The front page URL is taken from the URL that cron is run from and is used for all language sitemaps. To illustrate:

If you run cron through www.domain.com (domain name used for English content only), all generated sitemaps include frontpage URL as www.domain.com (domain.de sitemap should contain front page URL as domain.de)

If you run cron through www.domain.de (domain name used for German content only), all generated sitemaps include frontpage URL as www.domain.de (domain.com sitemap should contain front page URL as domain.com)

To solve this issue, the xmlsitemap_i18n module should take the front page URL from the Language domain setting under admin/settings/language/edit/.
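
Roughly what I mean, as a Python sketch: look up the front page URL per language from a language-to-domain mapping instead of reusing the domain cron was run from. The mapping format (bare host names), the http scheme, and the front_page_url() helper are assumptions for illustration.

```python
def front_page_url(language_domains, langcode, fallback_base_url):
    """Use the language's own domain if one is configured, else the base URL."""
    domain = language_domains.get(langcode)
    if domain:
        return f"http://{domain}/"
    return fallback_base_url.rstrip("/") + "/"


# Hypothetical copy of the per-language domain settings.
language_domains = {"en": "www.domain.com", "de": "www.domain.de"}

german_front = front_page_url(language_domains, "de", "http://www.domain.com")
english_front = front_page_url(language_domains, "en", "http://www.domain.com")
```

This way the result does not depend on whether cron runs through www.domain.com or www.domain.de.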

Dave Reid’s picture

@vendeka: I can confirm the same problem. I hadn't tried out the domain negotiation before. I don't understand why it isn't working however.

Dave Reid’s picture

Status: Active » Fixed

Ok I think the latest commits should be good:
http://drupal.org/cvs?commit=261478
http://drupal.org/cvs?commit=261508

vendeka’s picture

I just updated xmlsitemap and I confirm latest commits fix the issue.

Status: Fixed » Closed (fixed)
Issue tags: -multilingual multidomain, -6.x-2.0-beta blocker

Automatically closed -- issue fixed for 2 weeks with no activity.