Bug description:

On multilingual sites, when caching is enabled and language negotiation is set to "Path prefix with language fallback", page caches are set to the first visitor's browser language preferences in cases of fallback. All subsequent visitors get pages in that first visitor's language rather than their own.

Cause:

Setting the language based on browser language means there are multiple languages in which a particular page/path can be served. However, since page caching is determined on the basis of the request URI, only one cache can be set per page/path.

More generally: The Drupal cache system doesn't support multiple versions of a page per URL. Any time a particular URL may be served in more than one version, the first visitor's version will determine the cache.

Steps to reproduce:

See http://drupal.org/node/330156#comment-1116611

Potential fixes:

-------

1. Drop browser language detection support (or limit it to authenticated users).

This approach would restrict browser language detection to authenticated users, who don't get late page caches, or else remove the "Path prefix with language fallback" option altogether.

Doing so would address the issue but lose some valuable functionality.

2. In the case of a language fallback, forward to the new path, including the correct prefix.

By forwarding, we could ensure that the path matches the language of the page served.

This approach would mean:

  • Move the DRUPAL_BOOTSTRAP_LANGUAGE bootstrap phase to come before DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE. This is needed because we need to determine a potential language/path mismatch before consulting the cache.
  • In the case that the path doesn't include a prefix for the negotiated language, abort bootstrapping and goto a prefixed path.

3. Include language information in the cache key.

If we determine language before caching, we can include language information in the cache key and so have language-appropriate cache results.

This approach would mean:

  • Move the DRUPAL_BOOTSTRAP_LANGUAGE bootstrap phase to come before DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE.
  • In page_get_cache() and page_set_cache(), call a helper function to add the language information to the cache key. E.g., append something like 'cache_language=' . $language->language.

-------

I think all these options are worth considering, as well as other approaches others may come up with.

Options 2 and 3 are significant API changes. They represent two different basic approaches. Option 2 reflects the position that we should never have two different versions of the same URL. Option 3 assumes that having two different versions of the same URL should sometimes be supported. (See this related issue/discussion: #282191: TF #1: Allow different interface language for the same path.)

And even if we solve late page caching, we still have early page caching to worry about. Neither option 2 nor option 3 addresses the problem presented by DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE. If we have file-based caching, invoked before we have database access etc., how can we negotiate language ahead of invoking that cache so we know which version of a page we're concerned with?

Files: 
CommentFileSizeAuthor
#139 drupal-cached-language-fix-339958-139.patch624 bytesnodoxi
FAILED: [[SimpleTest]]: [MySQL] 40,373 pass(es), 14 fail(s), and 0 exception(s). View
#87 browser_redirect.zip1.54 KBplach
#69 cache_negotiation.patch4.99 KBcatch
Failed: 10700 passes, 6 fails, 0 exceptions View
#66 cache_negotiation.patch9.63 KBcatch
Failed: 10702 passes, 8 fails, 0 exceptions View
#64 cache_negotiation_D6.patch752 bytescatch
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_negotiation_D6.patch. Unable to apply patch. See the log in the details link for more information. View
#63 cache_negotiation.patch4.13 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Setup environment: failed to drop checkout database. View
#57 cache_negotiation.patch5.41 KBcatch
Failed: 10700 passes, 6 fails, 0 exceptions View
#57 cache_negotiation-339958-D6.patch1.68 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_negotiation-339958-D6_0.patch. Unable to apply patch. See the log in the details link for more information. View
#56 cache_negotiation-339958-D6.patch5.5 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_negotiation-339958-D6.patch. Unable to apply patch. See the log in the details link for more information. View
#54 cache_negotiation.patch9.63 KBcatch
Failed: 10702 passes, 8 fails, 0 exceptions View
#51 cache_redirect.patch9.83 KBcatch
Failed: 10703 passes, 7 fails, 0 exceptions View
#48 cache_redirect.patch9.86 KBcatch
Failed: Failed to install HEAD. View
#45 cache_redirect_339958-D6.patch6.52 KBDavid Lesieur
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_redirect_339958-D6.patch. Unable to apply patch. See the log in the details link for more information. View
#44 cache_redirect.patch9.94 KBcatch
Failed: 9291 passes, 3 fails, 3 exceptions View
#42 cache_redirect.patch9.94 KBcatch
Failed: Failed to run tests. View
#40 cache_redirect.patch9.75 KBcatch
Failed: Failed to apply patch. View
#39 language_redirect_backport_d6.patch6.24 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch language_redirect_backport_d6.patch. Unable to apply patch. See the log in the details link for more information. View
#36 cache_redirect.patch9.75 KBcatch
Failed: 8030 passes, 0 fails, 12 exceptions View
#35 cache_redirect.patch9.4 KBcatch
Failed: Failed to apply patch. View
#32 language_redirect.patch8.95 KBcatch
Failed: Failed to apply patch. View
#28 failing_language_test.patch2.92 KBcatch
Failed: 7694 passes, 2 fails, 0 exceptions View
#27 language_redirect.patch6.22 KBcatch
Failed: Failed to apply patch. View
#25 language_redirect.patch5.59 KBcatch
Failed: Failed to apply patch. View
#20 page_cache.patch8.33 KBcatch
Failed: 7659 passes, 1 fail, 0 exceptions View
#18 page_cache.patch8.29 KBcatch
Failed: 7588 passes, 17 fails, 0 exceptions View
#14 page_cache_1.patch8.42 KBcatch
Failed: 7659 passes, 1 fail, 0 exceptions View
#12 cache_language_test.patch3.19 KBcatch
Failed: 7487 passes, 4 fails, 0 exceptions View
#10 page_cache_d6.patch6.42 KBnedjo
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch page_cache_d6.patch. Unable to apply patch. See the log in the details link for more information. View
#9 page_cache.patch8.42 KBnedjo
Failed: 7641 passes, 1 fail, 0 exceptions View
#5 page_cache.patch6.58 KBcatch
Failed: 7659 passes, 1 fail, 0 exceptions View
#4 page_cache.patch5.31 KBcatch
Failed: 7640 passes, 1 fail, 0 exceptions View
#1 language-page-cache.patch5.31 KBnedjo
Failed: 7421 passes, 6 fails, 0 exceptions View

Comments

nedjo’s picture

FileSize
5.31 KB
Failed: 7421 passes, 6 fails, 0 exceptions View

Here's a draft patch for the third approach I outlined above.

Gábor Hojtsy’s picture

Status: Active » Needs review

In Drupal 6, to keep the caching keys intact, we should redirect to the proper language URL instead when identified based on browser or user prefs (case 2 above). I am not sure what people think about applicability of that to Drupal 7.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

FileSize
5.31 KB
Failed: 7640 passes, 1 fail, 0 exceptions View

drupal_get_cache_id() was missing a return. Only failure remaining is the hook_boot() hook_exit() test now.

catch’s picture

Status: Needs work » Needs review
FileSize
6.58 KB
Failed: 7659 passes, 1 fail, 0 exceptions View

And the hook_boot/exit test had a bad check for page cache id, replaced it with a similar assertion to the page cache test itself. So back to code needs reviews. Will try to benchmark this next, see what the impact is.

catch’s picture

This is on a vanilla install of HEAD with no content, all numbers are requests per second.

No measurable difference without page cache, as you'd expect.

Aggressive caching, APC disabled:

HEAD:
66.29
67.31
67.98

Patch:
66.65
68.43
68.59

With APC enabled:
HEAD
213.60
215.24
214.45

Patch:
211.05
216.68
214.73

Normal cache:
HEAD:
190.17
196.45
191.77

Patch:
188.93
194.83
188.51

As far as I can see, no appreciable difference either way despite the extra include.

c960657’s picture

+ return $base_root . request_uri() . $language->language;
I suggest adding a space (or some other character not valid in URLs) before the language code to avoid conflicts in rare cases where one language code is a substring of another.

Example:
URL=http://example.org/foo language=es-sv (Spanish, as spoken in El Salvador)
URL=http://example.org/fooes- language=sv (Swedish)

nedjo’s picture

Thanks catch!

If we're introducing the ability to set cache keys enabling different versions of the same page, language is only one of the possible reasons this might be desired.

So, instead of hard-coding language into the page cache key, maybe we introduce a method for setting the page cache key.

And then rework the order in which page_get_cache and hook_boot are called.

If we want modules to be able to affect the cache ID, we need to call hook_boot before fetching the cache. Our current order has page_get_cache called before hook_boot. This order seems designed to allow calling of hook_boot when no cached page is found even if aggressive caching is enabled. But why, since we explicitly say that hook_boot is incompatible with aggressive caching?

I'll look at working that into the next patch iteration.

nedjo’s picture

FileSize
8.42 KB
Failed: 7641 passes, 1 fail, 0 exceptions View

Changes in this patch as per #8:

1. Introduce a method for setting the page cache key. Call this method in language negotiation. Resulting page cache keys will look like:

http://example.com/node/21?language=en

2. Change bootstrapping in DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE to call hook_boot before fetching the cache, unless aggressive caching is enabled, in which case call it only after looking for the cache. This way, modules can alter the page cache key if e.g. they use a different method for language negotiation at hook_boot, or otherwise provide different versions of a given page.

@c960657: thanks. This iteration adds a key as well as a value and so shd address potential conflicts.

nedjo’s picture

FileSize
6.42 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch page_cache_d6.patch. Unable to apply patch. See the log in the details link for more information. View

Here's a patch for D6 based on Gábor's suggested approach (my original approach 2) of redirecting to language-prefixed paths for Drupal 6 to avoid needing the change cache keys.

Gábor, can you elaborate on the problems that changing the key would cause?

We need to initialized the language system before consulting the cache. And, to do so, we need to initialize the variable system.

I've introduced a new global variable, $language_redirect, that is set to TRUE if user preference or browser settings have determined language. If so, we first call hook_boot - allowing modules to affect the outcome, e.g., by implementing their own language determination - and then conditionally redirect.

The redirection has to take into account the default language. Unprefixed paths are reserved for the default language. We only redirect if we are at a language other than the default one.

And we need to determine the normal path of $_GET['q]. This part of the patch seems particularly awkward--probably I'm overlooking a cleaner way to do this.

catch’s picture

The main reason I can think why changing the caching keys would be a problem is immediately after update - leading to a lot of stale keys in the cache_page table, and having to rebuild the cache. While re-building the page cache is expensive, it's a one-off expense, and we'd probably only need to do a cache_clear_all(NULL, 'page') on sites with language determination - the others could be left to slowly clear out and rebuild their page caches over time.

Ideally keys would be exactly the same for monolingual sites - it looks like #9 already accounts for that. While rebuilding the page cache is expensive, that's a one-off expense. Adding the redirect means two bootstraps for every user who needs to be redirected - although I'm not at all clear how many of such redirects might happen on multilingual sites in day to day operation. So, if we could decide whether its relatively safe to change the cache keys for multi-lingual sites (which are the ones affected by this anyway), then it would seem ideal to go for that approach.

Having said that, in both recent patches I'm concerned about moving hook_boot() around, especially for Drupal 6 - and having its execution change dependent on locale settings. Seems more potentially fragile than just going ahead and changing the cache keys. While it makes sense to leave things open for modules to intercept the page cache key setting and generation, some other modules use hook_boot() for completely different purposes, and we'd be adding that weight to normal cached pages for everyone. For Drupal 7 we can document the change (or even add another hook_intercept_page_cache()).
.

catch’s picture

FileSize
3.19 KB
Failed: 7487 passes, 4 fails, 0 exceptions View

Here's an initial test to confirm the original bug - it's not working properly yet, I had the language negotiation picking up browser language at one point, but doesn't seem to be any more...

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

FileSize
8.42 KB
Failed: 7659 passes, 1 fail, 0 exceptions View

hmm, thought .txt didn't get tested (it's supposed to fail, but not those specific failures). reattaching #9.

catch’s picture

Status: Needs work » Needs review

And changing status.

c960657’s picture

I suggest using something other than ?, e.g. space, for separating the URL and the remaining part of the cache key. Otherwise the key looks like a URL, but not the actual URL being cached. Using space as separator will make it clearer that the key is actuall a URL and some more.

Xano’s picture

I don't know how much of the pages are cached, but there's a good chance this approach will conflict with #282178: Language negotiation overhaul, since the patch in this issue relies on Drupal having one language set rather than two (content/interface). Perhaps this patch could prepare Drupal for the patch in #282178: Language negotiation overhaul by taking those two languages in account already?

catch’s picture

FileSize
8.29 KB
Failed: 7588 passes, 17 fails, 0 exceptions View

Some more benchmarks on #14.

- I unhid system_test.module so that we can see the impact of calling hook_boot() earlier. system_test.module adds a watchdog call to every page request. With HEAD I get about c. 6.45/6.6 reqs/sec with it disabled, c.6.3/6.42 reqs/sec enabled, with no page caching - very small but were it to make a difference on cached pages we'd notice I think.

This is on a completely blank Drupal install, ab -c -n500 - all numbers are requests per second.

APC disabled, no page cache.
HEAD:
6.36
6.37

Patched:
/node
6.49 reqs
6.35

Normal page cache:
HEAD:
42.51
41.35

Patch:
42.45
41.11

Aggressive:
HEAD
67.16
69.65

Patch
66.00
68.21

So no appreciable difference that I can see.

Patch applied with offset, re-rolled to include c960657's feedback - so using a space rather than '?' - also used drupal_query_string_encode() rather than http_build() there. We possibly don't need the encoding at all since this is just a cache key rather than a URL though.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
8.33 KB
Failed: 7659 passes, 1 fail, 0 exceptions View

Need to look into those failures, just re-rolled for offset this time.

c960657’s picture

I think using http_build_query() is better than drupal_query_string_encode(). The latter depends on drupal_urlencode() whose output depends on a variable, and the function is about to be modified in #339958: Cached pages returned in wrong language when browser language used, so http_build_query() is more stable.

If you use http_build_query(), you should supply the third argument – otherwise the separator character is based on the arg_separator.output PHP ini setting (for some reason, Drupal's default.settings.php sets this to & instead of &).

Damien Tournoud’s picture

Status: Needs review » Needs work

I think the only acceptable approach is #2.

The approach currently worked on here can cause issues, for a very simple reason: a paged served from the cache will be returned with Last-Modified and ETag headers, and any proxy server on the way could return the page cached in a different language.

After all, it was a main design objective of the language negotiation system of D6 to always return the same content from a given URL. After all URLs are supposed to be stable, and it's the only way HTTP/1.1 caching could possibly work.

c960657’s picture

If Drupal emits a Vary: Accept-Language header, proxy servers are able to deal with the same URL returning a page in two different languages. So I think #3 is also feasible, even with HTTP caching.

Damien Tournoud’s picture

It's technically possible, but it's always better to have stable URL: one URL should as most as possible always return the same content. I think it's easier and cleaner just to redirect the user.

catch’s picture

Status: Needs work » Needs review
FileSize
5.59 KB
Failed: Failed to apply patch. View

This is a re-roll of #10 for D7 which enforces a redirect. All tests pass but I've not done much manual testing and no new tests introduced. Note I had trouble writing a test for this earlier in the issue trying to alter ACCEPT_LANGUAGE for a page request - any ideas on how to get around that would be great.

c960657’s picture

It recently became possible to add HTTP request headers to the internal browser in SimpleTest. You should be able to set HTTP_ACCEPT_LANGUAGE using e.g. $this->drupalGet('', array(), array('Accept-Language: en')).

catch’s picture

FileSize
6.22 KB
Failed: Failed to apply patch. View

And some modifications in line with the other patch to ensure we don't do page_get_cache() redundantly if we're going to redirect anyway.

catch’s picture

FileSize
2.92 KB
Failed: 7694 passes, 2 fails, 0 exceptions View

c960657 - thanks!

Here's a test which simply demonstrates the bug in HEAD rerolled from the one I posted above. Should have two failures.

Status: Needs review » Needs work

The last submitted patch failed testing.

nedjo’s picture

Nice work catch.

Some of the complexity of this issue is related to the special use of non-prefixed languages. By default, en doesn't have a prefix. But what if the default language does have a prefix? In that case, we will again have duplicate pages. E.g., if 'fr' is the default language and has a prefix of 'fr', the pages 'node' and 'fr/node' will be the same. Should we instead forward in that case even if we haven't found a user-preferred language and are defaulting to the default language (the final fallback in language_initialize())?

To decide this question, it might help to add a test where we set the default language to one with a prefix. Do the tests still pass?

Or for simplicity should we just ignore the question of non-prefixed languages and leave it to the existing issues? #146084: Default path prefix for English (and DBTNG it), #338055: Ensure non-default languages always have a path prefix.

nedjo’s picture

Status: Needs work » Needs review

Since it confirmed the bug we're addressing, that patch failure was a success ;)

catch’s picture

FileSize
8.95 KB
Failed: Failed to apply patch. View

Here's #27 with a slightly modified test - can't work out how to get simpletest to follow the redirect properly (yet?), but it confirms that the cache is only getting set in the correct language. On default 'en' path and requiring prefixes we should probably let those other issues deal with those and confine this to the bootstrap changes.

c960657’s picture

        require_once DRUPAL_ROOT . './includes/common.inc';

The literal string should not start with a period. This generates a warning:
Warning: require_once(/home/chsc/www/drupal7./includes/common.inc): failed to open stream: No such file or directory in /home/chsc/www/drupal7/includes/bootstrap.inc on line 1152

Without this warning, simpletest automatically follows the HTTP redirect. You can get test the actual URL using $this->assertEqual($base_url . '/fr', $this->getUrl()).

The patch actually redirects to http://example.org/fr/node rather than just http://example.org/fr.

You may use the $this->drupalGetHeader('ETag') trick used in BootstrapPageCacheTestCase to verify that the page isn't just saved to the cache but also served from the cache on the second page load.

c960657’s picture

Status: Needs review » Needs work
catch’s picture

Status: Needs work » Needs review
FileSize
9.4 KB
Failed: Failed to apply patch. View

Modified test to check for Etag headers and minus the extra dot - all passes. Thanks again for the in depth review c960657.

catch’s picture

FileSize
9.75 KB
Failed: 8030 passes, 0 fails, 12 exceptions View

Added some additional code comments to the test, otherwise no changes.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Issue tags: +i18n sprint
catch’s picture

Status: Needs work » Needs review
FileSize
6.24 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch language_redirect_backport_d6.patch. Unable to apply patch. See the log in the details link for more information. View

Here's a backport for D6. I confirmed it doesn't break massively, but didn't do thorough testing yet.

And a re-roll for D7 to keep the test bot happy.

catch’s picture

Issue tags: +needs backport to D6
FileSize
9.75 KB
Failed: Failed to apply patch. View

And the D7 patch this time.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
9.94 KB
Failed: Failed to run tests. View

Re-rolled after the user session changes.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
9.94 KB
Failed: 9291 passes, 3 fails, 3 exceptions View

Here's one which doesn't completely break user logins.

David Lesieur’s picture

FileSize
6.52 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_redirect_339958-D6.patch. Unable to apply patch. See the log in the details link for more information. View

Backported to D6. Seems to work nicely! One particular thing: The call to variable_init() had to be moved up to DRUPAL_BOOTSTRAP_LANGUAGE, because drupal_init_language() needs the variables. We don't have the DRUPAL_BOOTSTRAP_VARIABLES step under D6.

David Lesieur’s picture

Mmmh. I'm not too sure that the patch (at least the D6 one I have just submitted) works that well. bootstrap_invoke_all('exit'); appears just before a call to drupal_goto(), which will also invoke hook_exit().

Even without that apparently redundant call, we might still have issues with the statistics module (that relies on hook_exit()), since if we redirect to the same node, the counter will be incremented once before the redirect (from drupal_goto()'s invocation of hook_exit()), and a second time after (when bootstrapping the redirect's target page).

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
9.86 KB
Failed: Failed to install HEAD. View

Good catch about the duplicate call to hook_exit() - here's the HEAD patch re-rolled without that.

For statistics module, I don't think we need to deal with that here - hitting refresh has the same effect - see #90468: Only record unique hits in node counter stats

catch’s picture

Priority: Normal » Critical

Since this makes language negotiation and page caching incompatible, I'm bumping to critical.

c960657’s picture

Status: Needs review » Needs work
+ * @return
+ *   Boolean indicating whether the language set is valid for the current
+ *   path.
  */
 function language_initialize() {

AFAICT the return value is stdClass or NULL.

       // Get the page from the cache.
-      $cache = $cache_mode == CACHE_DISABLED ? FALSE : page_get_cache(TRUE);

The comment should be removed together with the line it is referring to.

The comments do not wrap at 80 characters:

+    // Visit the page again with browser language set to English, and confirm that

This line is 82 characters.

+      // If the language has been reset from the default, redirect
+      // to the new language.

The first line is only 66 characters, so there is room for several more words.

catch’s picture

Status: Needs work » Needs review
FileSize
9.83 KB
Failed: 10703 passes, 7 fails, 0 exceptions View

Thanks for the review, I think think this should cover everything.

catch’s picture

FileSize
9.63 KB
Failed: 10702 passes, 8 fails, 0 exceptions View

Here's a somewhat simpler approach. If we fall back to browser language, and the language isn't the same as the default language, then we neither serve a cached page, nor cache the page request.

This is done by setting a $no_cache global, and unfortunately still requires changing the bootstrap order to avoid serving cached pages to anonymous users in the site's default language when negotiation is on (if we did that, we might as well completely remove language negotiation from browser as an option).

This version still passes all the tests, including the slightly modified new tests added with the patch.

Note that there's a more comprehensive solution to this for Drupal 7 at #54238: page cache might show messages - but that won't fix the issue here since we still need to have language_initialize() run before BOOTSTRAP_LATE_PAGE_CACHE. I'm hoping we can commit this, backport to Drupal 6 quickly, and then concentrate efforts on the more generic patch for Drupal 7.

catch’s picture

FileSize
5.5 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_negotiation-339958-D6.patch. Unable to apply patch. See the log in the details link for more information. View

Here's a straight backport for D6.

catch’s picture

FileSize
1.68 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_negotiation-339958-D6_0.patch. Unable to apply patch. See the log in the details link for more information. View
5.41 KB
Failed: 10700 passes, 6 fails, 0 exceptions View

Thought about this some more, and realised that there's no point changing the bootstrap just to avoid serving cached pages to anonymous users where the language is taken from the browser.

So this updated patch is a lot smaller, and doesn't touch bootstrap.inc at all. Also updated the simpletest to remove the check for this edge case.

Quick summary:

The main bug here, is that if your site is in English, German and Swahili, and someone visits your front page with their browser set to request Swahili, all requests for the cached version of that page will see it in Swahili if the browser fallback is enabled. Then 10 minutes later, the cache is cleared and the first anonymous visitor has their language set to German, and so on.

This (hopefully final) version of the patch prevents the page being cached when language determination is used. However to avoid bootstrap re-ordering, visitors will be served cached versions of pages in the site default language when available - since language negotiation happens after the cached page is served. This is a much smaller edge case which only affects a subset of users in certain circumstances, and IMO that particular edge case is won't fix.

This means the approach here will be compatible with the upcoming changes in #370454: Simplify page caching (which are a big win), and it also means much less intrusive changes to Drupal 6.

stella’s picture

I think this looks good and can be marked RTBC. Catch has summarized the underlying issue in the last comment, but to briefly summarize the patch:

  • Previously the page wasn't cached for anonymous users if caching was disabled. Now, in addition to that, the page isn't cached if the language set is the browser language but isn't the site default language.
  • Includes a set of simple tests for ensuring that cached pages are served in the correct language.
nedjo’s picture

Status: Needs review » Reviewed & tested by the community

stella: agreed.

Nice work catch. This latest patch addresses the issue without the potentially disruptive bootstrap changes in previous iterations.

Introducing a new global shouldn't be done lightly, but in this case it seems warranted and could be used for purposes beyond this language-specific use case.

sun’s picture

Status: Reviewed & tested by the community » Needs work
+  global $user, $no_cache;
...
+  if (variable_get('cache', CACHE_DISABLED) != CACHE_DISABLED && $no_cache != TRUE) {

$no_cache is not defined anywhere else. So the proper test is isset($no_cache) && !$no_cache.

However, while I can see and understand the actual issue, the entire, negated logic feels very poor. Please try hard to think of a way to do the opposite, or, at the very least, find a better name for that global variable.

toemaz’s picture

I applied the #57 D6 patch + the advice from sun at #61 to a production site (http://musescore.org) which was having this issue. Will report back my findings later. Thanks in advance to all developers involved.

catch’s picture

Status: Needs work » Needs review
FileSize
4.13 KB
FAILED: [[SimpleTest]]: [MySQL] Setup environment: failed to drop checkout database. View

Even smaller patch. No need to set a new global variable because we already have one (hat tip quicksketch on the imagecache in core issue).

It's not pretty, but it's a tiny change to fix a critical issue, and we absolutely need to backport this - so assuming it can go in I'd support leaving this issue open for a better system in Drupal 7 once that's done.

catch’s picture

FileSize
752 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_negotiation_D6.patch. Unable to apply patch. See the log in the details link for more information. View

And for Drupal 6.

quicksketch’s picture

Thanks catch for pointing me to this issue. You mentioned in the #371374: Add ImageCache UI Core issue that "we need a generic way of doing this" (temporarily disabling the page cache), but the ImageCache issue was not the place to do it. This patch could use a generic way also, and then the ImageCache patch could use it. Would this be the appropriate place to add such a generic function?

Here's a stab at an implementation, which would live in bootstrap.inc below page_get_cache()

/**
 * Temporarily disable the page cache for this request
 */
function page_set_cache_temporary() {
  global $conf;

  // The global conf variable affects all variable_get() requests, but does not persist across requests.
  // Setting the cache value in memory disables the cache only for a single request.
  $conf['cache'] = CACHE_DISABLED;
}
catch’s picture

FileSize
9.63 KB
Failed: 10702 passes, 8 fails, 0 exceptions View

Doing it as a new function in D6 wouldn't do any harm, so let's try to fix this here. It's only a couple lines change either way anyway.

Don't like the function name though, sounds like we're going to set the page cache temporarily. Here's one with drupal_skip_page_cache() which was the least worst I could think of.

quicksketch’s picture

Nice, I like drupal_skip_page_cache(). Looks like you attached the wrong patch however. ;)

catch’s picture

FileSize
4.99 KB
Failed: 10700 passes, 6 fails, 0 exceptions View

oh dear.

Status: Needs review » Needs work

The last submitted patch failed testing.

Damien Tournoud’s picture

As I stated in #22 and #24, I don't believe that simply skipping page cache is the way to go. I would still prefer that we redirect the user to the proper, language-prefixed, URL. If we don't want to do that, we at least really need to output a Vary: Accept-Language header so as not to mess up with proxy and client-level caches.

catch’s picture

@Damien - that was implemented around #48 or so, but requires changing the bootstrap order - a change which would have to be backported to D6, and would also conflict with chx's page caching revamp which makes the page caching the first or second bootstrap phase. Are you going to support a bootstrap re-ordering in D6 (or do you have an approach which doesn't require this up your sleeve?).

Damien Tournoud’s picture

@catch: If language_initialize() redirects the user to the proper, language-prefixed, URL, this page will never get cached. Basically having the same effect as your "don't cache" approach.

catch’s picture

Summary of irc conversation with Damien:

if we do #73 (which the patches around #48 did) - then we still break language negotiation for any user visiting a cached page - so it's still a partial fix, just one which requires slightly more work. The only way the bug can be properly fixed for all users is by re-ordering the bootstrap to put language negotiation before cached pages are served.

Two other possibilities - ensuring that Drupal never outputs non-prefixed pages. Or add something to the session when language negotiation is done, and disable page caching for the visitor until that's in place.

quicksketch’s picture

Damien, if we don't use the temporary disabling of the page cache, can you think of an alternative for #371374: Add ImageCache UI Core? I was crossing my fingers that this would get done here, since currently we need it for that patch also.

plach’s picture

subscribing

sun’s picture

Would a Content-Language HTTP header be taken into account by HTTP proxies/caches?

intuited’s picture

People looking for an immediate workaround for this issue (for Drupal 6) may have luck using the globalredirect module. This should prevent a path from displaying in two different languages at the same path.
Note that globalredirect is not normally active on the front page, however there is a patch posted in issue #317305: When browser language is used, foreign language redirection breaks on the front page. that will make it so.

niQo’s picture

For your information, there is dirty workaround using apache rewrite rules which solve this problem by addind hard coded languages in htacess. It's only for browser language detection and drop "/" frontpage.

This is not the drupal's way but perhap's can help people who absolutely need language browser detection with drupal cache.

We did this for example :

"Path prefix with language fallback" should be activated in drupal.

  RewriteCond %{HTTP:Accept-Language} ^fr [NC]
  RewriteCond %{REQUEST_URI} ^/$
  RewriteRule .* /fr [L,R]

  RewriteCond %{HTTP:Accept-Language} ^de [NC]
  RewriteCond %{REQUEST_URI} ^/$
  RewriteRule .* /de [L,R]

## Default language is /en, we don't use /
  RewriteCond %{REQUEST_URI} ^/$
  RewriteRule .* /en [L,R]
toemaz’s picture

@niQo Thats a pretty interesting fix you have there. Any way to write a condition + rule for all language in one time, using some kind of reg exp?

toemaz’s picture

I didn't test this yet, but perhaps it's in the right direction:

RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{HTTP:Accept-Language} ^(de|es|fr|it|ja|ru|en) [NC]
RewriteRule .* /%1 [L,R]
Xano’s picture

Does this mean Drupal should be able to edit .htaccess depending on the available languages?

toemaz’s picture

@Xano No, Drupal won't be able to edit the htaccess file. You'll have to do it manually for each language enabled.

vnb’s picture

subscribing

plach’s picture

Status: Needs work » Fixed

This issue has been fixed in #282191: TF #1: Allow different interface language for the same path by disabling the browser detection when page cache is enabled and we have an anonymous user. The reasons are explained in #282191-16: TF #1: Allow different interface language for the same path. In #50-#61 we agree that we might want to find an alternative solution.

Marking as fixed, if we want to find another way to fix this and enhancing the current solution we should open a new issue taking into account the current situation.

Re-open the issue if you think we should keep handling this here (at least change the issue title in that case).

plach’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Needs work

This still needs a solution for Drupal 6.

plach’s picture

FileSize
1.54 KB

[please ignore this post, I forgot I should never work on critical core bugs this late ;)]

plach’s picture

Status: Needs work » Needs review

What about something like the following?

function locale_boot() {
  global $user, $language;
  $cache = variable_get('cache', CACHE_DISABLED);
  $language_negotiation = variable_get('language_negotiation', LANGUAGE_NEGOTIATION_NONE);

  // If we have an anonymous user, page cache is enabled, the URL has no prefix
  // and browser language negotiation is set, we perform a redirect to the
  // prefixed URL. 
  if (empty($user->uid) && $cache != CACHE_DISABLED && $language_negotiation == LANGUAGE_NEGOTIATION_PATH) {
    $args = isset($_GET['q']) ? explode('/', $_GET['q']) : array();
    $prefix = array_shift($args);

    // Initialize language as we need the proper language negotiation to be
    // performed.
    drupal_bootstrap(DRUPAL_BOOTSTRAP_LANGUAGE);

    if (!empty($language->prefix) && $prefix != $language->prefix) {
      // We need full path support to perform the redirect.
      drupal_bootstrap(DRUPAL_BOOTSTRAP_PATH);

      require_once './includes/common.inc';
      $url = url($_GET['q'], array('query' => drupal_query_string_encode($_GET, array('q'))));
      $code = variable_get('locale_redirect_http_code', 301);

      // Since we are in a hook_boot() implementation cache mode is normal, so
      // we can safely call hook_exit().
      bootstrap_invoke_all('exit');
      header("Location: $url", TRUE, $code);
      exit;
    }
  }
}
ahwebd’s picture

subscribing

toemaz’s picture

I solved this issue with Varnish cache, avoiding the issue with Drupal. See http://groups.drupal.org/node/62768 and browse through the comments.

tomsm’s picture

subscribing

aidanlis’s picture

subscribe

m.sant’s picture

Version: 6.x-dev » 6.19

Subscribing. I'm experiencing this issue on a site in 22 languages!

I also need the ip2locale, ip2cc, ip2country, geoip modules and therefore I cannot avoid this issue using Varnish cache.

plach’s picture

This is present in the latest development snapshot, please don't change the version.

m.sant’s picture

Status: Needs review » Active

Changed status to active as that patch didn't work.

m.sant’s picture

Version: 6.19 » 6.x-dev

Sorry didn't see previous comment...

plach’s picture

Priority: Critical » Major

Downgrading to major as this does not render completely unusable the site.

aidanlis’s picture

Priority: Major » Critical

Upgrading back to critical: I know my english guests seeing a mongolian front page don't agree with you plach :)

toemaz’s picture

If your problem is so critical, all I can say is that this worked for me in case you are 'not' using Varnish.

toemaz’s picture

@aidanlis also, don't forget to apply this patch on the D core in case you work with exotic languages which I think you do (3 versions of Chinese is exotic).

aidanlis’s picture

Thanks toemaz, I have the patches applied and everything is working for me, but I think the "critical" status still applies here until it's fixed in D6. Plach is your patch in #88 ready for review?

plach’s picture

@aidanlis:

I know my english guests seeing a mongolian front page don't agree with you plach

I'm sorry but looking at the priority guidelines I'd say they are wrong, obviously the perception of priority is subjective :)

Plach is your patch in #88 ready for review

It's only a proposal to introduce a redirect to the prefixed version of the page as suggested by Damien. It's not a patch and can only be tested by manually applying it.

aidanlis’s picture

Priority: Critical » Major

@plach you're indeed correct

netgenius.co.uk’s picture

Status: Needs review » Active

I'm also seeing this with a standard install. For me, the easiest work-around is MultiLink Redirect which provides a redirect from generic front-page url to language-prefixed url - but, this is my own module therefore I'm happy with it - it's overkill if all you need is a simple fix to the front-page caching problem.

Apache redirect as suggested in http://drupal.org/node/339958#comment-2503184 also seems to work fine.

Anonymous’s picture

@#99: That didn't work for me, and neither did the patch in #63.

I have my site set such that domains are used to distinguish languages, e.g. www.example.com and zh.example.com. Language negotiation works fine for every other page on the site as they have different URLs for Chinese pages, but on the most critical page, the frontpage, I either get cached English or Chinese no matter the URL (depending on which of www.example.com and zh.example.com was visited first after clearing of caches).

I don't need to detect the anonymous user's browser language, as there are two distinct URLs for the different languages. English as default is just fine. I just need the page to display in Chinese when the user switched to zh.example.com via the language switcher!

EDIT: Using CacheExlude (http://drupal.org/project/cacheexclude) to disable caching on the frontpage seemed to do the job for me, at the cost of, obviously, a cached frontpage.

jlbretton’s picture

Status: Active » Needs review

#63: cache_negotiation.patch queued for re-testing.

tomsm’s picture

Status: Active » Needs review

Cache Exclude is a good solution for the problem. I excluded <front> and enabled normal caching resulting in a much better performance for anonymous users. Thanks for the tip sypl !

plach’s picture

I have an improved version of #88 deployed on http://graphicdesignworlds.it which seems to perform well and is implemented through a contrib custom module. I ain't sure we can make such a big change for D6, perhaps it should just live in contrib for D6/D7.

We definitely need Gabor's and/or webchick's feedback before going on with a core patch implementing #88.

Edit: perhaps we just want to backport the D7 solution, i.e. disabling browser fallback for cached pages and make #88 live in contrib.

k3vin’s picture

+1

plach’s picture

Issue tags: +Needs committer feedback
mr.j’s picture

Subscribing

iamEAP’s picture

This may be specific to my install, but is anyone else still seeing this behavior for anything other than the homepage?

All pages other than the homepage seem to successfully redirect and have the path-prefixed URLs cached.

iamEAP’s picture

I spoke too soon. It was specific to my install, but the contrib module that was giving the (desirable) behavior was Global Redirect, as first suggested by intuited in #78.

I've since submitted a patch at the Global Redirect issue mentioned in #77 which solves this issue completely in contrib. It borrows ideas from #88.

I recommend you try Global Redirect and the patch here.

iamEAP’s picture

Nevermind, the solution above does not work.

goldenboy’s picture

Subscribing

fietserwin’s picture

i know this is for D6, but I created a sandbox module that prevents this problem from happening by executing a real redirect instead of serving the content under the home page URL: Front page redirect.

I guess it will be very easy to backport it: the lookup using the i18n_variable module should be replaced by looking in $conf['i18n_variables'].

iamEAP’s picture

@fietserwin, took a quick look at your code and you're falling into a similar trap. hook_init() isn't called on cached pages; you'd want to implement it in a hook_boot(). That being said, you may be duplicating functionality of the Global Redirect module.

fietserwin’s picture

The page should not be cached because I redirect (the redirect may be cached though). I am still doing some tests though, also in a first D6 backport version.

netgenius.co.uk’s picture

On most sites, the front-page would be one of the most frequently accessed pages, so preventing it from being cached using the Cache Exclude module does not seem to me like the best solution for performance reasons.

Redirecting via a Drupal module such as Global Redirect or MultiLink Redirect is not the best solution either for performance, since quite a lot of Drupal code and MySQL queries will be run before the the user is redirected to the correct (cached) page.

So, my advice would be to redirect using Apache Rewrite, as suggested by toemaz in #81 - I have implemented this on client sites and it works well. For situations where mod_rewrite is not available then my choice would be MultiLink Redirect, but I'm biased :) Global Redirect should work too I think.

I'll do some performance comparisons and post them here.

fietserwin’s picture

The solution form #81 won't always cope correctly with a header like "Accept-Language: nl,en;q=0.7,fr;q=0.3". On a French-English site I want the English version. But performance wise it may still be a good bet.

There is now a patch for global redirect that works fine for me and solves this problem for D6 in contrib, though your default language must have a prefix. I guess #81 and this patch can both be applied and you will get the fast redirect if the most preferred language of the user is supported, and the slower contrib solution if not.

fietserwin’s picture

[Edit] Oops, posted twice...

netgenius.co.uk’s picture

I'm now wondering whether an Apache rewrite would be better/faster than a redirect - could make it look like the relevant language version really was at the front-page address. Just an idea, haven't tried it.

fietserwin’s picture

Should be faster. But be careful with (intermediate) caching when serving different content under the same URL. The rewrite should also add a Vary header.

knalstaaf’s picture

Will there be a solution for D7 or D8?

Fabianx’s picture

It is possible to solve this in D6 and D7 in contrib by including a file from settings.php:

For those that need this in contrib without patching core, you can use a similar approach (only implemented for Apache so far) in a settings.inc that is included from settings.php before the page cache is used:

#1768556: Context Mobile Detect does not work with page cache enabled has a patch that makes a contrib module work that needs contextual page cache.

The idea is very simple. We adjust the request_uri() by changing the parameters to include a "?language=de" or "&language=fr" depending if the request uri has parameters or not.

On non Apache browsers will it just not work as I did not implement the other edge cases for request_uri(). As request_uri() is used for the cache key, this will always work and allow dynamic page cache keys, but the code needs to be run before bootstrap in settings.php, so it is very limited in functionality.

Enjoy!

- Fabian

caktux’s picture

Version: 6.x-dev » 7.x-dev
Status: Needs review » Needs work
Issue tags: +needs backport to D7

This is still a serious problem in D7...

plach’s picture

Version: 7.x-dev » 6.x-dev
Status: Needs work » Needs review
Issue tags: -needs backport to D7

The bug reported in the OP does not occur in D7. Please, leave the version alone.

paolomainardi’s picture

Version: 6.x-dev » 7.x-dev

@plach this problem occur in D7 as well.

plach’s picture

Version: 7.x-dev » 6.x-dev

Browser language negotiation is disabled when page cache is enabled in D7. Hence the bug reported in the OP cannot happen. Please don't change the version again without a list of steps to reproduce the bug described in the OP (page cached in the wrong language) on a fresh D7 installation.

oliverpolden’s picture

I would like to propose another solution. #81 works great unless your default language doesn't have a path prefix in which case you need a bit extra...

In your settings.php (there may be a more appropriate place but hook_init() is too late) you can put:

if ($_SERVER['REQUEST_URI'] == '/') {
  $_SERVER['HTTP_ACCEPT_LANGUAGE'] = 'en';
}

This basically forces the root domain to always be in English whatever the browser's language. Since this runs after mod_rewrite it works perfectly well in conjunction with #81.

amacias’s picture

Version: 6.x-dev » 7.18

This is how I got around this issue. It seems that is working fine for me (I'm using Drupal 7):

my_module_init(){
  if(!user->uid){
    global $language;
    require_once DRUPAL_ROOT . '/includes/locale.inc';
    $languages = language_list('enabled');
    $languages = $languages[1];
    $language = $languages[locale_language_from_browser($languages)];
    $language->provider = 'locale-session';
  }
}
amacias’s picture

Version: 7.18 » 6.x-dev
pabsta’s picture

As this issue been solved? I have the exact same problem with drupal 7.

StephenN’s picture

Liked the look of #131 but couldn't get it to work. My solution for Drupal 7 below. n.b you will need to resave your language negotiation settings on the admin page to get the hook to apply.

function MYmodule_language_negotiation_info_alter(array &$language_providers) {
	//as standard language detection by browser is disabled if page caching is on
	//this is to prevent language specific versions of pages being cached based on a user preference
	//we circumvent this issue in MYmodule_init and so disable the page cache test here
	unset($language_providers['locale-browser']['cache']);
}

function MYmodule_init(){
	//if no language has been specified in the URL we prevent content on this path being cached
	//this allows fallback to language detection via browser settings
	//performance should not be significantly hindered as the vast majority of pages will be accessed with a language prefix
	//see also MYmodule_language_negotiation_info_alter
	//n.b requires all languages to have a prefix set and URL detection to be the primary form of language negotiation 
	
	//get the language set in the URL prefix
	$languages = language_list('enabled');
    $languages = $languages[1];  
	$path = urldecode($_SERVER['REQUEST_URI']);
	$path = substr($path, 1); //strip leading slash or language_url_split_prefix will always return false
	$urlLang =  language_url_split_prefix($path, $languages);
	
	//if no language specified in URL prevent page from being cached
	$defaultCacheBool = drupal_page_is_cacheable();
	if (!$urlLang[0]) {
		drupal_page_is_cacheable(FALSE);
	} else {
		drupal_page_is_cacheable($defaultCacheBool);
	}
}

Works for me but very hacky. Would still like to see a genuine fix.

lmeurs’s picture

For D7 we based our fix on #131 as well, but in our case to redirect anonymous users to the frontpage at a language-prefixed URL. Since the frontpage is cached in the default language, we only need to redirect users with another preferred language set in their browsers.

function my_module_boot() {
  global $user;
  $request_uri = request_uri(); // Get URI cross platform.
  $parsed_url = parse_url($request_uri); // Separate path from query string.

  // If the user is anonymous and the requested path equals Drupal's base path (= frontpage without language prefix).
  if (empty($user->uid) && $parsed_url['path'] === $GLOBALS['base_path']) {
    // Get enabled languages.
    require_once DRUPAL_ROOT . '/includes/locale.inc';
    $enabled_languages = language_list('enabled');

    // Get the browser's preferred language (langcode) or false on failure.
    $langcode_from_browser = locale_language_from_browser($enabled_languages[1]);

    // If the browser's preferred language is found and it differs from the site's default langcode.
    if ($langcode_from_browser && $langcode_from_browser !== language_default()->language) {
      // Redirect the user to a language prefixed URL.
      header('Vary: Accept-language', false);
      header('Content-Language: ' . $langcode_from_browser);
      header('Location: /' . $langcode_from_browser . $request_uri, true, 302);
      exit;
    }
  }
}

Is redirecting a SEO safe approach?

fietserwin’s picture

Is redirecting a SEO safe approach?

If you add a "Vary: Accept-Language" header it should be.

lmeurs’s picture

Hi fietserwin,

Thanks for the comment. I changed our code by adding the "Vary: Accept-language" and "Content-language: [langcode]" headers, but also changing the HTTP response code from 301 (moved permanently) to 302 (found).

Thanks again!

caktux’s picture

#135 is exactly the behavior I expect from Drupal core, thanks a lot for this.

nodoxi’s picture

Version: 6.x-dev » 7.22
FileSize
624 bytes
FAILED: [[SimpleTest]]: [MySQL] 40,373 pass(es), 14 fail(s), and 0 exception(s). View

I made changes to prevent saving wrong cache when language is detected by browser

Status: Needs review » Needs work

The last submitted patch, drupal-cached-language-fix-339958-139.patch, failed testing.

plach’s picture

Version: 7.22 » 6.x-dev

Please see #129.

knalstaaf’s picture

There's a quick'n dirty fix for D7 in #81, if you're not familiar with writing modules. Some say that the Boost module does the job as well for Drupal 7.

RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{HTTP:Accept-Language} ^(bg|hr|sr|sl|hu|pt|es|en) [NC]
RewriteRule .* /%1 [L,R]
kumkum29’s picture

Hello knalstaaf,

I use your fix and it's useful for the main domain (www.domain.com). But, I have several subdomains with no languages, and this fix is functionnal on my subdomains. Now the subdomains aren't accessible.
Is it possible to change the condition and create the fix for only the master domain?

Thanks.

Status: Needs work » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.

xjm’s picture

Issue tags: -i18n sprint, -Needs committer feedback