I know there is a separate issue where people want a separate sitemap per i18n language (); this is a bit different, I suppose. I'd like a single sitemap that includes all translated nodes.
Whenever the sitemap is regenerated, it includes only nodes with no language and nodes in the currently selected language (that of the user who triggered the regeneration). A quick example:
1. Create a node, set its language to English, and save it.
2. The sitemap is regenerated with all "no language" nodes and all English nodes. No nodes in other languages are included.
3. Now translate that new English node into French and save the translation.
4. The sitemap is regenerated with all "no language" nodes and all French nodes. All the English nodes are dropped from the sitemap.
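To make the steps above concrete, here is a minimal Python model of the filtering as it appears to behave. It is not xmlsitemap or i18n code: the Node class and the current_sitemap() function are hypothetical stand-ins, with None representing a "no language" node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    nid: int
    language: Optional[str]  # None models a "no language" node


def current_sitemap(nodes: List[Node], active_language: str) -> List[int]:
    """Hypothetical model of the observed behaviour: keep only
    language-neutral nodes and nodes in the language that was active
    for the user who triggered the regeneration."""
    return [n.nid for n in nodes if n.language is None or n.language == active_language]


nodes = [Node(1, None), Node(2, "en"), Node(3, "fr")]
print(current_sitemap(nodes, "en"))  # [1, 2]  (French node missing)
print(current_sitemap(nodes, "fr"))  # [1, 3]  (English node missing)
```

Whichever language triggers the regeneration, the nodes in every other language drop out, which matches steps 2 and 4.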
From some quick debugging, I can see that _xmlsitemap_node_links() selects all eligible nodes, including translations. However, the translated nodes are filtered out somewhere later and never make it into the sitemap.
That led me to this comment from Darren Oh (http://drupal.org/node/182442#comment-676408):
i18n was adding the language prefix to every URL, so we use i18n_get_lang_prefix() to get the URL without a language prefix.
This will be the situation until someone who knows the i18n code well provides a patch that can ensure that only the appropriate links have a language prefix, or can split the languages into separate site maps.
I don't understand why you are doing this. By default, i18n adds a language prefix to every path, so search engines should be looking for the paths with the language prefix. If you don't want language prefixes for your default language, you can patch i18n so it doesn't add them (). I have applied that patch, so none of my English nodes get a language prefix, and the URLs of articles I wrote before installing i18n did not change, so I have no SEO problems with them. For translated nodes, I always put the language code in the path alias itself, so it is stored in the url_alias table.
Example path aliases on some nodes:
English node: /articles/new-car
German node: /de/articles/new-car (just an example; I didn't bother translating the words)
So I tried an experiment: I commented out the i18n_get_lang_prefix($result, TRUE); line that strips the language prefixes, and regenerated the sitemap.
1. With English selected, the sitemap looks correct, with no language prefixes (thanks to the i18n patch above), but no translated nodes are included. This shows that stripping the language prefixes is a mistake: i18n should control prefixes completely, and you can get a sitemap without language prefixes without stripping them manually.
2. With French selected, every path in the sitemap has the FR language prefix (except the front page), but no English nodes are included. Apart from the missing English nodes, this is again correct behaviour: when browsing the site in French, you want untranslated nodes to carry the FR prefix so your language setting is preserved.
But the big problem is obvious: the sitemap's output depends on the language that is active for whoever triggers the regeneration.
Ideally, the sitemap should be generated using the default language and should include all translated nodes, with their language prefixes as supplied by Drupal core + i18n.
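The desired behaviour can be sketched as a small Python model. It is an illustration under assumptions, not xmlsitemap or i18n code: Node, localized_path() and ideal_sitemap() are hypothetical, the "/about" alias is made up, and the prefix rule (no prefix for the default language or for language-neutral nodes, otherwise "/<langcode>") mirrors the patched i18n setup and the example aliases described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    nid: int
    language: Optional[str]  # None models a "no language" node
    alias: str               # hypothetical path alias without any language prefix


def localized_path(node: Node, default_language: str) -> str:
    # Assumed rule matching the patched i18n behaviour: default-language and
    # language-neutral nodes get no prefix; every other language gets "/<code>".
    if node.language is None or node.language == default_language:
        return node.alias
    return f"/{node.language}{node.alias}"


def ideal_sitemap(nodes: List[Node], default_language: str = "en") -> List[str]:
    # Every node is included; the language of the user who triggers the
    # regeneration plays no part, because there is no active-language input.
    return [localized_path(n, default_language) for n in nodes]


nodes = [
    Node(1, None, "/about"),
    Node(2, "en", "/articles/new-car"),
    Node(3, "de", "/articles/new-car"),
]
print(ideal_sitemap(nodes))
# ['/about', '/articles/new-car', '/de/articles/new-car']
```

The key design point is that the function takes only the site's default language, never the current user's language, so regenerating the sitemap from any session yields the same output.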