I have a site that has "translations" for USA users, different from the standard English content.
Detecting and changing the locale by browser language is problematic because some other English-speaking countries have their browsers installed with en-us as the default language (e.g. here in Australia, Microsoft does that all the time).
So I would like to be able to determine the initial language setting by using the user's IP address instead. The IP to country module makes detecting this trivial. My question is: where do I need to insert the detection code?
_i18n_get_lang() looks like a good candidate.
Has anyone else done this, and are there any pitfalls? I realise that you could well have a non-English speaker using a computer while in France for example, but I'm prepared to live with that.
Comments
Comment #1
mr.j commented
I tried a few things and here's some code that I've found works nicely.
It goes in i18n.module in the _i18n_get_lang() function:
Note that the module will never use the browser to detect the language if you use this hack.
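The code originally posted with this comment is not preserved here. As a rough, language-neutral illustration of the idea only (not the actual patch to `_i18n_get_lang()`), the selection order could look like the sketch below. All names (`choose_language`, `COUNTRY_TO_LANGUAGE`, `lookup_country`) and the `en-US` / `en` codes are assumptions made for the example; `lookup_country` stands in for whatever the IP to country module provides.

```python
from typing import Callable, MutableMapping, Optional

# Hypothetical values for illustration; a real site would use its own codes.
DEFAULT_LANGUAGE = "en"
COUNTRY_TO_LANGUAGE = {"US": "en-US"}


def choose_language(
    session: MutableMapping[str, str],
    ip_address: str,
    lookup_country: Callable[[str], Optional[str]],
) -> str:
    """Pick a language from the session first, then from the visitor's country.

    The browser's language preference is never consulted in this sketch.
    """
    saved = session.get("language")
    if saved:
        # A language chosen earlier (or set manually by the user) wins.
        return saved

    # First visit: map the visitor's country to a language, else use the default.
    country = lookup_country(ip_address)
    language = COUNTRY_TO_LANGUAGE.get(country, DEFAULT_LANGUAGE)

    # Remember the result so detection only runs once per session.
    session["language"] = language
    return language
```

With a stub lookup such as `lambda ip: "AU"`, a first-time visitor gets the default language, and the choice is stored in the session so a later manual switch is respected.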
Comment #2
mr.j commented
Just an update that a week later, this is working perfectly for my needs.
Comment #3
hass commented
Comment #4
hass commented
I would never ever do such detection again. We used this on our site and after a few months we disabled it completely, as we learned the hard way that we had forgotten about Google. With this technical solution you may end up showing only English content to Google and no other content. I suggest you never do this... otherwise, create an extra module outside of i18n to achieve this.
Marking as design, as it's killing sites in the SERPs.
Comment #5
hass commented
Comment #6
mr.j commented
Interesting, but you are probably wrong on the SERPs. We never encountered that problem because the detection only tripped the first time anyone browsed to the site, just like the default browser language detection. Once that is hit, the language is saved in a session variable and can be manually changed by the user, and nothing stops them from clicking through to a translated page in another language. So assuming you had a link to your other languages on the page (e.g. the standard language/translation block), Google would have spidered them fine, and I know for sure that it did.
Perhaps you had another issue intermingled somewhere, or maybe you never displayed a language selection block (in which case Google would never spider your alternate languages anyway), or maybe i18n in D6 behaves differently (I have not used it).
Anyway, it's a moot point for me as we have abandoned i18n in favour of a Domain Access / multi-site implementation.
Comment #7
hass commented
You may have misunderstood this... the user is Google. Google does not select anything in select boxes, nor does it keep a session variable while browsing your site. Google's robots could be in the USA or somewhere else; you cannot be sure where. When Google crawls your site, the language is always detected from wherever the robot is currently located (and this can change on every request, since requests come from different IPs), so your site ends up indexed in that language. Google does not crawl your site all at once... it hits one page now, then the second a few minutes later, and always with a new session. This will typically kick you out of the SERPs for every non-US language.
We have had this issue and it was caused only by geolocation. We know this for sure: never use geolocation-only language detection if you do not want to get killed in the SERPs. Geolocation in general is a nice idea for usability, but it has so many side effects that it is a no-go. Use only domain-based or path-based language detection to solve this issue.
I wrote this in the localizer issue queue ~2 years ago and others confirmed it. It was a hard lesson for us, as we also had the UX in mind first and forgot about Google, which kicked us out of the German Google SERPs for 6-8 months. :-(((
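The mechanism described in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions: the IP addresses are documentation-range placeholders, the lookup tables and function names are invented for the example, and it models a stateless client in general, not Google's actual crawling behaviour.

```python
from typing import Iterable, Set

# Made-up lookup data for the example (documentation-range IP addresses).
IP_TO_COUNTRY = {"192.0.2.10": "US", "198.51.100.20": "DE"}
COUNTRY_TO_LANGUAGE = {"US": "en", "DE": "de"}
SUPPORTED_LANGUAGES = {"en", "de"}
DEFAULT_LANGUAGE = "en"


def language_from_ip(ip_address: str) -> str:
    """Geolocation-only detection: the language depends on who is asking."""
    country = IP_TO_COUNTRY.get(ip_address)
    return COUNTRY_TO_LANGUAGE.get(country, DEFAULT_LANGUAGE)


def language_from_path(path: str) -> str:
    """Path-prefix detection: the language depends only on the URL."""
    first_segment = path.strip("/").split("/", 1)[0]
    return first_segment if first_segment in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


def languages_seen(path: str, client_ips: Iterable[str], use_path_prefix: bool) -> Set[str]:
    """Collect the languages a stateless client sees for one URL.

    Every request is treated as a brand-new session, so nothing is remembered
    between requests.
    """
    seen = set()
    for ip_address in client_ips:
        if use_path_prefix:
            seen.add(language_from_path(path))
        else:
            seen.add(language_from_ip(ip_address))
    return seen
```

Calling `languages_seen("/products", IP_TO_COUNTRY, use_path_prefix=False)` returns both languages for the same URL, while `languages_seen("/de/products", IP_TO_COUNTRY, use_path_prefix=True)` returns only `de`, whichever address the request comes from.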
Comment #8
fletchgqc commented
Interesting thread, thanks for the info. Hass, one thing I wondered whilst reading your comments: does the same principle apply when using the standard "path prefix with language fallback" option for deciding which content to show the user?
Comment #9
tntclaus commented
Hass is not completely correct (afaik he is completely incorrect). When you use a language prefix, in either the URL path or the domain, Google will crawl all pages in all languages. Hass says that Google doesn't store any session variables, but it doesn't have any locale information either (unlike a browser). So how does it get any localized pages when the language is detected from the browser?
Comment #10
tntclaus commented
Noticed the issue was closed, but I think this integration will be useful for some folks.
Comment #11
hass commented
Do your homework first, then you *may* understand the complexity and the major issues. This is all correct, but you may have missed something.
Comment #12
tntclaus commented
Hass, you're bm.
It seems you're a smart guy. Please tell us how browser language detection works for the Google crawler, given that it is not a browser. But I will tell you: it doesn't.
>>Google does not select any selectboxes
I am also curious why the Google crawler would not crawl mysite.com/de (fr/es/ru etc.) and on into the other localized pages, if the website has links to the localized front pages on the default main page. Anyone can put those links in the footer section if they prefer to use select boxes in the header.
Seems like you do not know that
So there should be no problems with any search engine crawling localized pages if you use alternative language detection instead of the built-in browser language detection. Language detection by IP won't cause the problems you faced. You probably did not use a path/domain prefix, or had no links to the localized front pages, so Google couldn't crawl them.
Any alternative way of language detection will be appreciated by the community, and I do not see any reason to close this issue. If you have any, then provide some arguments instead of bming to the people you do not know.
Comment #13
hass commented
Now you got it: language detection with geolocation and without path prefix. It simply fails and your site drops out of every SERP that is not English. We were punished on a 5-language site with geolocation enabled... it took ~4-5 months until Google had removed us completely. After we got the idea that this might have something to do with geo language and browser client language detection, and the reasons behind it, we removed the geo detection and came back in the SERPs... this was a really hard lesson, as it cost us tens of thousands in lost money while we were not in the German SERPs.
Nobody should risk this (as it will happen), and some people above also confirmed this, here or in other issues where we also discussed it. It's extremely difficult to analyse this type of SERP death...
Comment #14
tntclaus commented
>> language detection with geolocation and without path prefix
I can't see such an option in the Drupal 6.22 language settings. There are the following options:
But if there was such an option earlier, you surely shouldn't use it, as GoogleBot provides no locale data at all.
>> We have been punished with a 5 languages site with geolocation enabled...
You were punished for disabling path prefix language detection. In no case should you serve different-language content at the same URL.
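One way to keep a single language per URL while still using some detected preference is to let the detection choose only a redirect target, never the content served at an unprefixed URL. The sketch below assumes a hypothetical helper `canonical_path` and an example language list; it is not how Drupal's language negotiation is implemented.

```python
from typing import Tuple

# Example language list; the first entry doubles as the site default here.
SUPPORTED_LANGUAGES: Tuple[str, ...] = ("en", "de", "fr")


def canonical_path(path: str, detected_language: str) -> str:
    """Return the language-prefixed path that a request should end up on.

    A path that already carries a supported prefix is returned unchanged, so
    its language never depends on the visitor. Only unprefixed paths use the
    detected language, and only to pick where to redirect.
    """
    trimmed = path.strip("/")
    first_segment = trimmed.split("/", 1)[0]
    if first_segment in SUPPORTED_LANGUAGES:
        return "/" + trimmed

    # Unknown or unsupported detection results fall back to the site default.
    if detected_language not in SUPPORTED_LANGUAGES:
        detected_language = SUPPORTED_LANGUAGES[0]

    return "/" + detected_language + ("/" + trimmed if trimmed else "")
```

For example, `canonical_path("/about", "de")` gives `/de/about`, while `canonical_path("/de/about", "en")` stays `/de/about` regardless of what was detected.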
Comment #15
jose reyero commented
It would be nice for a new module. Anyway, closing *all* feature requests for 6.x