I have a site that has "translations" for USA users, different from the standard English content.
Detecting and changing the locale by browser language is problematic because browsers in some other English-speaking countries are installed with en-us as the default language (e.g. here in Australia, Microsoft does that all the time).

So I would like to determine the initial language setting from the user's IP address instead. The IP to country module makes detecting this trivial. My question is: where do I need to insert the detection code?

_i18n_get_lang() looks like a good candidate.

Has anyone else done this, and are there any pitfalls? I realise that you could well have a non-English speaker using a computer while in France for example, but I'm prepared to live with that.

Comments

mr.j’s picture

I tried a few things and here's some code that I've found works nicely.

It goes in i18n.module in the _i18n_get_lang() function:

  elseif ($user->uid && $user->language && array_key_exists($user->language, $languages)) {
    $i18n_lang = $user->language;
  }
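##### START INSERT #####
  // Otherwise look up the visitor's country from the IP address via ip2cc.
  elseif (module_exists('ip2cc')) {
    $country = module_invoke('ip2cc', 'get_country', $_SERVER['REMOTE_ADDR']);
    $ccode = strtolower($country->country_code);
    // Use the country code as the language only if such a language is enabled.
    $i18n_lang = array_key_exists($ccode, $languages) ? $ccode : i18n_default_language();
  }
##### END INSERT #####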
  elseif (variable_get("i18n_browser",0) && $lang=i18n_get_browser_lang()) {
    $i18n_lang=$lang;
  }

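Note that with this hack (and ip2cc enabled) the module will never use the browser to detect the language: the inserted branch always sets a language, falling back to the default, so the browser check after it is never reached.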
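
For anyone who wants to keep the browser fallback, the order of checks could be arranged roughly as below. This is only a standalone sketch, not i18n code: `pick_language()` and its parameters are hypothetical names, and the country code is assumed to have been looked up already (for example by ip2cc).

```php
<?php
// Hypothetical helper, not part of i18n or ip2cc: try the user's saved
// preference, then the IP country code, then the browser language, and
// finally fall back to the site default.
function pick_language(array $enabled, $default, $user_lang, $country_code, $browser_lang) {
  $candidates = array($user_lang, strtolower((string) $country_code), $browser_lang);
  foreach ($candidates as $candidate) {
    // Only accept codes that match an enabled language.
    if ($candidate !== NULL && $candidate !== '' && array_key_exists($candidate, $enabled)) {
      return $candidate;
    }
  }
  return $default;
}

// Example: an anonymous visitor from Germany whose browser reports French.
$enabled = array('en' => 'English', 'de' => 'German', 'fr' => 'French');
echo pick_language($enabled, 'en', NULL, 'DE', 'fr'), "\n";  // prints "de"
```

With this ordering, a country code that matches no enabled language falls through to the browser language instead of jumping straight to the default.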

mr.j’s picture

Just an update that a week later, this is working perfectly for my needs.

hass’s picture

Title: Select locale by user's ip address? » Select locale by geolocation
Version: 5.x-2.2 » 6.x-1.x-dev
Category: support » feature
hass’s picture

Status: Closed (works as designed) » Active

I would never ever do such a detection again. We used this on our site and after a few months disabled it completely, as we learned the hard way that we had forgotten about Google. With this technical solution you may end up showing only English content to Google and no other content. I suggest you never do this... otherwise you may create an extra module outside of i18n to achieve this.

Marking as by design, as it's killing sites in the SERPs.

hass’s picture

Status: Active » Closed (works as designed)
mr.j’s picture

Status: Active » Closed (works as designed)

Interesting, but you are probably wrong on the SERPs. We never encountered that problem because the detection only tripped the first time anyone browsed to the site, just like the default browser language detection. Once that is hit, the language is saved in a session variable and can be manually changed by the user, and nothing stops them from clicking through to a translated page in another language. So assuming you had a link to your other languages on the page (e.g. standard language/translation block) then Google would have spidered them fine, and I know for sure that it did.
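
A rough sketch of that behaviour, with a plain array standing in for the session. `current_language()` and the `'lang'` key are made-up names for illustration, not what i18n actually uses.

```php
<?php
// Hypothetical sketch: run detection only when the session holds no
// language yet; an explicit choice by the visitor always wins.
function current_language(array &$session, $detected, $chosen = NULL) {
  if ($chosen !== NULL) {
    // The visitor picked a language, e.g. from the language block.
    $session['lang'] = $chosen;
  }
  elseif (!isset($session['lang'])) {
    // First request in this session: store the detected language.
    $session['lang'] = $detected;
  }
  return $session['lang'];
}

$session = array();                                // stand-in for $_SESSION
$first  = current_language($session, 'en');        // detection runs
$second = current_language($session, 'de');        // ignored, session already set
$third  = current_language($session, 'de', 'fr');  // manual switch
echo "$first $second $third\n";                    // prints "en en fr"
```

A client that does not keep the session cookie starts with an empty `$session` on every request, so for it the detection runs each time.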

Perhaps you had another issue intermingled somewhere, or maybe you never displayed a language selection block (in which case Google would never spider your alternate languages anyway), or maybe i18n in D6 behaves differently (have not used it).

Anyway, it's a moot point for me as we have abandoned i18n in favour of a Domain Access / multi-site implementation.

hass’s picture

You may have misunderstood this... the user is Google. Google does not select anything from select boxes, nor does it keep a session variable while browsing your site. Google's robots could be in the USA or somewhere else - you cannot be sure where. When Google crawls your site, the language is always detected from where the robot is currently located (and this can change on every request, since requests come from different IPs), so your whole site ends up detected as being in that language. Google does not crawl your site all at once... it hits one page now, then the second a few minutes later, and always with a new session. This will typically kick you out of the SERPs for every non-US language.

We have had this issue and it was caused only by geolocation. We know this for sure - never rely on geolocation alone for language detection if you do not want to get killed in the SERPs. Geolocation in general is a nice idea for usability - but it has so many side effects that it is a no-go. Use only domain-based or path-based language detection to avoid these issues.

I wrote about this in the localizer issue queue ~2 years ago and others also confirmed it. It was a hard lesson for us - we also had the UX in mind first and forgot about Google, which kicked us out of the German Google SERPs for 6-8 months. :-(((

fletchgqc’s picture

Interesting thread - thanks for the info. Hass one thing I wondered whilst reading your comments - does the same principle apply when using the standard "path prefix with language fallback" option for deciding which content to show the user?

tntclaus’s picture

Hass is not completely correct (afaik he is completely incorrect). When you use a language prefix, in either the URL path or the domain, Google will crawl all pages of all languages. Hass says that Google doesn't store any session variables, but it also doesn't send any locale information (like a browser does). Then how does it get any localized pages when the language is detected from the browser settings?

tntclaus’s picture

Status: Closed (works as designed) » Active

Noticed issue was closed, but I think this integration will be useful for some folks.

hass’s picture

Status: Active » Closed (works as designed)

Do your homework first, then you *may* understand the complexity and the major issues. This is all correct, but you may have missed something.

tntclaus’s picture

Status: Closed (works as designed) » Active

Hass, you're bm.

It seems you're a smart guy. Please tell us how Google crawler language detection works, given that it is not a browser either. But I will tell you: it doesn't.

>>Google does not select any selectboxes

I am also curious why the Google crawler would not crawl mysite.com/de (fr/es/ru etc.) and on into the other localized pages, if the website has links to the localized front pages on the default main page. Anyone can put those links in the footer section if they prefer to use select boxes in the header.

Seems like you do not know that

language is determined by examining the path for a language code or other custom string that matches the path prefix (if any) specified for each language. If a suitable prefix is not identified, the display language is determined by the user's language preferences from the My Account page, or by the browser's language settings. If a presentation language cannot be determined, the default language is used.

So there should be no problem with any search engine crawling localized pages if you use an alternative language detection instead of the built-in browser language detection. Language detection by IP won't cause the problems you faced. You probably did not use a path/domain prefix, or had no links to the localized front pages, so Google couldn't crawl them.

Any alternative way of language detection will be appreciated by community and I do not see any reasons to close this issue. If you have any, then provide some arguments instead of bming to the people you do not know.

hass’s picture

Now you got it - language detection with geolocation and without path prefix. It simply fails and your site drops out of every SERP that is not English. We were punished on a 5-language site with geolocation enabled... it took ~4-5 months until Google had removed us completely. After we got the idea that this might have something to do with geo language and browser client language detection, and understood the reasons behind it, we removed the geo detection and came back into the SERPs... this was a really hard lesson, as it cost us tens of thousands in lost money while we were not in the German SERPs.

Nobody should risk this (as it will happen), and some above also confirmed this, here or in other issues where we also discussed it. It's absolutely difficult to analyse this type of SERP death...

tntclaus’s picture

>> language detection with geolocation and without path prefix

I can't see such an option in the Drupal 6.22 language settings. There are the following options:

Language negotiation settings determine the site's presentation language. Available options include:

  • None. The default language is used for site presentation, though users may (optionally) select a preferred language on the My Account page. (User language preferences will be used for site e-mails, if available.)
  • Path prefix only. The presentation language is determined by examining the path for a language code or other custom string that matches the path prefix (if any) specified for each language. If a suitable prefix is not identified, the default language is used. Example: "example.com/de/contact" sets presentation language to German based on the use of "de" within the path.
  • Path prefix with language fallback. The presentation language is determined by examining the path for a language code or other custom string that matches the path prefix (if any) specified for each language. If a suitable prefix is not identified, the display language is determined by the user's language preferences from the My Account page, or by the browser's language settings. If a presentation language cannot be determined, the default language is used.
  • Domain name only. The presentation language is determined by examining the domain used to access the site, and comparing it to the language domain (if any) specified for each language. If a match is not identified, the default language is used. Example: "http://de.example.com/contact" sets presentation language to German based on the use of "http://de.example.com" in the domain.

The path prefix or domain name for a language may be set by editing the available languages. In the absence of an appropriate match, the site is displayed in the default language.

But if there was such an option earlier, you surely shouldn't use it, as GoogleBot provides no locale data at all.

>> We have been punished with a 5 languages site with geolocation enabled...

You were punished for disabling path prefix language detection. In no case should you serve different-language content at the same URL.
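
One way the two could be combined, sketched under assumptions: the path prefix always decides the language, and the IP-derived language is used only to pick a redirect target for unprefixed URLs. `negotiate()` and its return format are hypothetical, not Drupal API, and the redirect itself is my assumption rather than something Drupal 6 does out of the box.

```php
<?php
// Hypothetical sketch: every URL maps to exactly one language via its
// path prefix; the geolocated language only chooses where to redirect
// requests that carry no prefix.
function negotiate($path, array $enabled, $default, $geo_lang = NULL) {
  $rest = trim($path, '/');
  $parts = explode('/', $rest);
  $prefix = $parts[0];
  if ($prefix !== '' && array_key_exists($prefix, $enabled)) {
    // Prefixed URL: the prefix alone determines the language.
    return array('lang' => $prefix, 'redirect' => NULL);
  }
  // Unprefixed URL: use the geolocated language if it is enabled.
  $is_enabled = $geo_lang !== NULL && array_key_exists($geo_lang, $enabled);
  $lang = $is_enabled ? $geo_lang : $default;
  return array('lang' => $lang, 'redirect' => '/' . $lang . ($rest === '' ? '' : '/' . $rest));
}

$enabled = array('en' => 'English', 'de' => 'German');
$a = negotiate('/de/contact', $enabled, 'en', 'en');
$b = negotiate('/contact', $enabled, 'en', 'de');
echo $a['lang'], ' ', $b['redirect'], "\n";  // prints "de /de/contact"
```

Here a request for `/de/contact` is German no matter where the client is located, so a crawler following links to the prefixed pages gets the same content from any IP.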

jose reyero’s picture

Component: Code » Blocks
Status: Active » Closed (won't fix)

It would be nice for a new module. Anyway, closing *all* feature requests for 6.x