We're having problems with the cache on a few sites. I originally thought it was because of ye old "Apache and Drupal both compressing pages" problem, but now I'm not so sure.
I can repeat the following problem on a few sites running CVS and beta2. All of these sites are in subdirectories right now (which is important, see later), as they're on our new server:
- Empty the cache.
- Use http://web-sniffer.net/ to hit the front page of a site with "accept-encoding: gzip" disabled. This will make Drupal generate a cache entry in the database.
- Repeat the hit with "accept-encoding: gzip" enabled. Binary is returned instead of HTML.
If I repeat this test on an empty cache starting with gzip enabled, then hit it again with it enabled, I get an error instead of binary:
Warning: gzinflate(): data error in /var/www/drupal-4.7.0-beta2/includes/bootstrap.inc on line 624
Warning: Cannot modify header information
I forced the cache functions to watchdog some debug info on what they were trying to cache, and noticed that toggling the gzip header can make Drupal cache pages as plain HTML instead of compressed. I assumed that this was the cause of the errors but couldn't figure out where the logic was going wrong in the functions. It seems that if a cached page is generated by a client that doesn't support gzip, it can't be viewed by a client that does support compression. The opposite is also true, but with different results.
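To illustrate why mixed cache entries are a problem, here is a minimal sketch in Python (not Drupal's PHP, and not its actual code). It assumes a cache that sometimes holds gzip data and sometimes plain HTML, with nothing recording which; `inflate_cached`, `plain_entry` and `gzip_entry` are made-up names for this example.

```python
import gzip


def inflate_cached(data: bytes) -> bytes:
    """Inflate a cache entry that is assumed to always be gzip data."""
    return gzip.decompress(data)


# The same page, cached in two different formats depending on the request.
plain_entry = b"<html><body>Hello</body></html>"  # built for a non-gzip client
gzip_entry = gzip.compress(plain_entry)           # built for a gzip client

# A compressed entry round-trips as expected.
assert inflate_cached(gzip_entry) == plain_entry

# A plain entry cannot be inflated: this is the same class of failure
# as the gzinflate() "data error" warning in the report.
try:
    inflate_cached(plain_entry)
except OSError as err:  # gzip.BadGzipFile is a subclass of OSError
    print("cannot inflate plain entry:", err)
```

The reverse mistake is quieter: sending `plain_entry` with a gzip content encoding, or `gzip_entry` without one, raises nothing on the server, and the client just sees garbage.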
Then we noticed that the "cid" field in the cache table was broken. Instead of a valid path for each cached page, we have:
http://<server_ip_here>/~leafish/test/~leafish/test/
This is due to the following call to cache_set:
cache_set($base_url . request_uri(), $data, CACHE_TEMPORARY, drupal_get_headers());
$base_url and request_uri() overlap if you're running a site in a subdirectory, which results in fruity paths in the database. This may be the cause of our problems, or a separate issue.
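The overlap can be sketched as follows, in Python rather than Drupal's PHP. `naive_cid` mirrors the concatenation in the line above, and `cid_without_overlap` is one hypothetical way to strip the duplicated subdirectory, not the fix that was eventually chosen. The host `example.com` stands in for the elided server IP.

```python
from urllib.parse import urlsplit


def naive_cid(base_url: str, request_uri: str) -> str:
    """Concatenate the two values as-is, like $base_url . request_uri()."""
    return base_url + request_uri


def cid_without_overlap(base_url: str, request_uri: str) -> str:
    """Drop the base path from the request URI before concatenating."""
    base_path = urlsplit(base_url).path.rstrip("/")
    if base_path and (
        request_uri == base_path or request_uri.startswith(base_path + "/")
    ):
        request_uri = request_uri[len(base_path):]
    return base_url.rstrip("/") + request_uri


base = "http://example.com/~leafish/test"  # site installed in a subdirectory
uri = "/~leafish/test/"                    # request URI already includes it

print(naive_cid(base, uri))            # subdirectory appears twice
print(cid_without_overlap(base, uri))  # subdirectory appears once
```

A site at the web root has an empty base path, so the second function leaves its request URIs untouched.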
I wanted to provide a patch for this before creating the issue, but I think there are two or three different problems here that are confusing the hell out of me. I'm not sure what the best way to fix the $base_url problem is. I didn't want to hack something into bootstrap to fix it here, and changing the function that generates this variable will probably break other things. Wah!
The server is running:
Apache 2.0.52 (mod_deflate disabled)
PHP 4.4.1
MySQL 4.1.11
eAccelerator caching.
Comment | File | Size | Author |
---|---|---|---|
#100 | image001.jpg | 30.54 KB | carligraph |
#95 | 43462_gzip_D6.patch | 4.2 KB | andypost |
#88 | cache-gzip-16.patch | 9.07 KB | c960657 |
#81 | cache-gzip-15.patch | 9.06 KB | c960657 |
#74 | cache-gzip-14.patch | 9.06 KB | c960657 |
Comments
Comment #1
killes@www.drop.org commented:
First of all, thanks for the detailed report. :)
Second, request_uri() being broken when Drupal runs in a subdir is a long-standing bug which nobody seems to care about. I do not think that it is related to the issue at hand. http://drupal.org/node/10917
The issue is that apparently unzipped or scrambled data gets inserted into the cache table if a client which does not support gzip accesses the page in question before one that does.
This needs to be investigated. The code is in bootstrap.inc. I will try to have a look, but am a bit short on time.
Comment #2
moshe weitzman commented:
@killes: we probably need to only write to page cache for gzip browsers. but that means that many crawlers will never get cached. of course what we do today is even worse - we show the crawler garbage which can't be good for pagerank.
Comment #3
Dries commented:
This needs to be investigated more closely. Can anyone reproduce this?
Comment #4
killes@www.drop.org commented:
Ok, I've thought a bit about it and come to the conclusion that I cannot reproduce the problem.
I have created a patch that will create some debug info. Can somebody run it and report back?
What should happen is that if you have the problem, you will have "data compressed" info in your logs. The patch would also fix the problem if it exists for some combination of apache and php versions and settings.
Comment #5
ixis.dylan commented:
I'll take a look at this patch when I get time, and report back.
I debugged the code in a similar way myself by watchdogging the data that's being cached. It alternated between compressed content and plain text, depending on what kind of request generated it.
Comment #6
ixis.dylan commented:
I can't reproduce this problem in HEAD, with or without the patch. Tested with HTTP 1.1/1.0, through a proxy, and with accept-encoding gzip on and off. Cache data is now always stored compressed, and delivered in the correct format described in the accept-encoding header.
Could still be an issue, but if nobody else can reproduce this for now then I'll close it. Lowering the priority until then.
Comment #7
Bèr Kessels commented:
I can reproduce.
Visit dev.newsphoto.nl/ with IE6, visit a page twice (first to make a cache entry), then you'll see that the page served from cache comes out blank.
www.newsphoto.nl/ has the same codebase, but with the cache switched off.
Bèr
Comment #8
Bèr Kessels commented:
Comment #9
Bèr Kessels commented:
The previous post contains a log from my site. The problems occur from about line 35. Those are the headers sent, as per killes' request.
Comment #10
moshe weitzman commented:
@Ber - those can't be all the request headers. no 'accept encoding' or http 1.0/1.1 are specified.
Comment #11
ksoonson commented:
Can anyone please take a look at this also?
http://drupal.org/node/46272
Comment #12
Gerhard Killesreiter commented:
The attached patch is meant to be used to debug the problem, not for inclusion in core.
Comment #13
webchick commented:
I've come across a site that's getting these errors randomly. Will see if we can apply killes's debugging patch.
Comment #14
ricabrantes commented:
any news about this?
Comment #15
ixis.dylan commented:
The news is that it's still a problem, due to Drupal being a bit crap and issues never being resolved.
Comment #16
webchick commented:
Hm. I think you meant "Due to no one trying the debugging patch that Gerhard posted at http://drupal.org/node/43462#comment-367659 and reporting back with their full results on a site that was having this problem." :)
We need people who can reproduce the problem to give us enough information to fix it. Until then, it won't be.
Comment #17
alexanderpas commented:
Putting this to critical so it gets the attention it needs...
Also critical since binary returns and WSODs are not good!
(needs to be fixed before 7 ships...)
Comment #18
millions commented:
any news on this?
Comment #19
c960657 commented:
Steps to reproduce:
- Add "zlib.output_compression On" to .htaccess.
- telnet example.org 80
- Resulting output:
Warning: gzinflate(): data error in /home/chsc/www/drupal7/includes/bootstrap.inc on line 735
page_set_cache() in common.inc assumes that ob_get_contents() returns compressed data if zlib.output_compression is enabled. This is not the case at my box, and IMO it would be surprising if it did considering the layered nature of PHP's output buffers. But this comment on the issue where this code was added seems to indicate otherwise. Can anybody explain that?
The attached patch fixes the problem on my box.
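The general idea of not trusting the output buffer to be compressed already can be sketched like this, in Python rather than Drupal's PHP. This is not the attached patch: `store_page`, `serve_page` and the `compressed` flag are hypothetical names for this example. The page is compressed explicitly when stored, and the stored format is recorded so that serving never has to guess.

```python
import gzip


def store_page(cache: dict, cid: str, html: bytes, page_compression: bool = True) -> None:
    """Compress explicitly at store time and record the format used."""
    data = gzip.compress(html) if page_compression else html
    cache[cid] = {"data": data, "compressed": page_compression}


def serve_page(cache: dict, cid: str, client_accepts_gzip: bool):
    """Return (headers, body) in a form the requesting client can read."""
    entry = cache[cid]
    if entry["compressed"]:
        if client_accepts_gzip:
            # Send the stored bytes untouched and label them as gzip.
            return {"Content-Encoding": "gzip"}, entry["data"]
        # The client cannot handle gzip: inflate before sending.
        return {}, gzip.decompress(entry["data"])
    # Stored uncompressed: send as-is to every client.
    return {}, entry["data"]


cache = {}
store_page(cache, "front", b"<html>hi</html>")
print(serve_page(cache, "front", client_accepts_gzip=False))
```

With this shape, the order in which gzip and non-gzip clients hit an empty cache no longer decides what ends up in the cache table.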
Comment #20
swentel commented:
Could reproduce this, patch fixes the problem. Patch applies with a bit of fuzz though.
Comment #21
swentel commented:
Changing status
Comment #22
Dries commented:
It is really hard to test this patch. It would be best if we could write some simpletests for this. Does anyone want to try and take this on?
Comment #24
c960657 commented:
I'll write some tests when #330582: Retrieve HTTP response headers lands.
Comment #25
yonailo commented:
subscribing
Comment #26
c960657 commented:
Reroll. Has been tested with and without zlib.output_compression and the mod_deflate Apache module.
Comment #27
yonailo commented:
Are you sure this is working well?
I don't understand why you are wrapping the bootstrap.inc test that determines whether the browser accepts gzipped data with the following "if":
What would happen if I disable zlib.output_compression, enable mod_deflate and disable the 'page_compression' variable? AFAIK, this would still save compressed data in the cache_page table, because ob_get_contents() returns compressed data (thanks to mod_deflate). Then inside bootstrap.inc, as I don't have 'page_compression' enabled, cache->data could not be decompressed if the browser requests no compression. Am I right?
Comment #28
c960657 commented:
I cannot reproduce that with PHP 5.2.0 and Apache 2.2.3. I'd be surprised if PHP's output is sent to mod_deflate and then back into PHP's output buffer. Do you really get the output from mod_deflate with ob_get_contents()?
Comment #29
yonailo commented:
Sorry I was wrong. I was thinking that my compressed data was coming from the last comment in the following snippet of code:
But it was coming from the "zlib_get_coding_type() === FALSE" case, because I have zlib.output_compression = Off
I think that I have suffered the same issue you describe in #19... that was the reason we decided to set zlib.output_compression = Off.
Thanks for the patch!
Comment #30
yonailo commented:
But I still don't quite understand. I am reading your first patch where it says:
I don't understand why you say "this option should be disabled when using a webserver that performs compression.". I am using zlib.output_compression = Off, mod_deflate = On (so I am using web server compression), but still I want to save anonymous pages in compressed format. This is how my system is running and I'm not experiencing any issues. If I follow your "description", I would have to set "page_compression" to FALSE, and then I wouldn't save compressed pages in cache_page, right?
Comment #31
yonailo commented:
Well, I am assuming that the following line:
is compressing my PHP output, because the PHP output has mimetype "text/html", right?
Maybe I am confused because this line is not compressing Drupal's output. That might be my mistake.
I have found on Google that to compress PHP output with mod_deflate the mimetype should be "application/x-httpd-php"?
Comment #32
yonailo commented:
Sorry this will be my last post, I swear :)
So, what's the best option in terms of performance ?
0. zlib.output_compression = On, mod_deflate = On (ERROR)
1. zlib.output_compression = On, mod_deflate = Off (cache_page compressed data thx to zlib)
2. zlib.output_compression = Off, mod_deflate = Off (cache_page compressed data thx to zlib invoked directly)
3. zlib.output_compression = Off, mod_deflate = On (cache_page not compressed)
Your patch tries to fix an issue with the assumption that ob_get_contents() always returns compressed data when zlib.output_compression is On, which is number 1 in my list. Am I absolutely right? :)
Comment #33
c960657 commented:
Yes, that is right. The contents of {cache_page}.data is only compressed when page_compression is TRUE. But pages would be compressed by mod_deflate on the fly.
Note that I didn't write this description. It was originally written in #121820: Cache with zlib results in double compression and later modified in #100581: Interface text for "page caching" -- query & suggested revision. AFAICT the reporter only talks about conflicts with zlib.output_compression and output_handler=ob_gzhandler, and not with external modules like mod_deflate. I believe the problem with zlib.output_compression is fixed by my patch. I am not sure there is a problem with mod_deflate.
I'd guess you get the best performance by using Drupal's page_compression in combination with either zlib.output_compression or mod_deflate (with page_compression data is compressed once when it is saved to the cache instead of on every request). But I think we should discuss this elsewhere :-)
No. The output from ob_get_contents() is never compressed.
Comment #35
c960657 commented:
Reroll.
I just noticed that cache-gzip-2.patch contained an unrelated file. This patch is much smaller.
Comment #36
c960657 commented:
Comment #37
andypost commented:
subscribe
Comment #39
c960657 commented:
Reroll (the latest test failures were due to #346529: Failures in node.test with assertFieldByXPath).
Comment #40
c960657 commented:
Comment #42
c960657 commented:
Reroll.
Comment #43
c960657 commented:
Comment #45
c960657 commented:
I could not reproduce the testbot failure on my machine. But here is a reroll.
Comment #46
c960657 commented:
jrizzo commented:
Hi -
The monitoring tool I am using does not support gzip encoding, and although "accept-encoding" is not set in the header, the server almost always sends Drupal cached content with gzip encoding. With caching enabled in Drupal, page compression can be enabled or disabled and the encoding is almost always gzip. If caching is disabled the encoding is NOT gzip.
zlib.output_compression is set to off in php.ini and mod_deflate is commented out in httpd.conf.
Drupal version 6.5 is being used. The cache-gzip-6 patch was applied; however that seemed to make no difference in the behavior.
Is it possible for cached data to not be sent with gzip encoding?
Any help is appreciated.
Joe
Comment #48
jrizzo commented:
My apologies. I found the solution to my problem in this post: http://drupal.org/node/273618.
My issue was related to the cacherouter module.
Joe
Comment #49
c960657 commented:
Comment #50
alexanderpas commented:
Comment #51
c960657 commented:
I hope that some of you who are experiencing this bug will help review or test the patch. There is a description of how to reproduce the bug in comment #19. Thanks :-)
Comment #53
c960657 commented:
Reroll.
Comment #54
kenorb commented:
Related topics:
#187912: Problems with cache and zlib.output_compression
Possible duplicates:
#324890: weird unreadable compressed front page
#97847: cached pages appear blank, or garbage, or gzinflate error
Comment #55
kenorb commented:
Comment #57
c960657 commented:
Reroll.
Comment #58
sun commented:
Function summaries should go on one line and _not_ be wrapped at 80 chars. If further description is required, add a blank line after the summary, followed by the description, wrapped at 80 chars.
Aside from that, patch looks good to go.
Comment #59
c960657 commented:
Updated with reformatted comment blocks.
Comment #60
sun commented:
Much better.
However, we need some feedback from testing framework maintainers for this patch, because zlib.output_compression might not be enabled on testbots (so this test won't ever run).
Comment #61
boombatower commented:
The testing servers meet the D7 requirements and a few specific memory requirements we put in place. Other than that I do not have direct access to the testing servers themselves. This is something we hope to fix with the deployment of the second generation framework.
In the meantime I'll leave a note (with Drupalcon) to ask hunmonk, DamZ, and greggles who control the three testing servers that are running. (I only have access to testing.drupal.org itself which does not run tests)
Comment #62
Damien Tournoud commented:
This looks like the letter O.
Comment #63
Damien Tournoud commented:
Two remarks:
* zlib.output_compression cannot be enabled at runtime, but it can be disabled?
* Why not simply disable output compression in .htaccess? Or make it a requirement that it is disabled?
Changing the configuration of test slaves to enable output compression, while that is generally not recommended, makes no sense at all to me.
Comment #64
c960657 commented:
Apparently so :-/ See also http://bugs.php.net/bug.php?id=35936
It seems that some users are using it (e.g. these). By default, Drupal only supports compression of anonymous page views, so there is a legitimate reason for using it. Personally I wouldn't miss this feature, but perhaps somebody would? Anyway, supporting it isn't a big problem (it's only one extra line of code in drupal_page_cache_header()), so for now I suggest we stick with that.
I think we should treat zlib.output_compression like we treat magic_quotes_gpc: We support it, but don't write explicit tests for it and (for now) we don't test it on the test slaves. This doesn't prevent people using unusual configurations from running the test suite and report issues for tests that are failing under certain circumstances.
I have updated the patch to not alter zlib.output_compression during tests. It should still be possible to run the complete test suite on a machine with zlib.output_compression turned on.
I agree. If we had a *lot* of test slaves, we could make some of them run the tests under various obscure configurations (with output compression and magic quotes), but for now this isn't possible.
Comment #65
greggles commented:
My testing slave is set with
If it should be changed let me know.
Comment #66
markus_petrux commented:
Not sure if this is covered already. When page cache is NOT enabled... I think one should be able to enable zlib.output_compression in PHP and exempt Drupal from doing the job. If so, when Drupal enables output buffering, then it probably needs to use ob_start('ob_gzhandler') rather than simply ob_start().
Also, would it be possible to back port this to D6?
Comment #67
c960657 commented:
This is already possible. Drupal itself does not compress pages when the cache is disabled or bypassed (authenticated users always bypass the cache), but zlib.output_compression will enable output compression for all page views. An alternative is mod_deflate.
Yes, definitely.
Comment #69
c960657 commented:
Reroll.
Comment #71
c960657 commented:
Reroll (following #477944: Fix and streamline page cache and session handling).
Comment #72
andypost commented:
The logic is clearer with the patch; tested locally with FF3 and IE6.
The help text of the front page may change, so I suppose it is better to check for the site name.
Comment #73
c960657 commented:
I changed the test to look for the title instead. The same pattern is used in simpletest.test and common.test, so changing it should be easy if the default title changes some day.
Comment #74
c960657 commented:
I changed the tests to simply look for </html>, which is much simpler.
Comment #75
christefano commented:
#187912 is a duplicate of this.
Comment #78
boombatower commented:
Test client crapped.
Comment #80
c960657 commented:
Reroll (due to #500866: [META] remove t() from assert message).
Comment #81
c960657 commented:
... and here is the patch.
Comment #82
andypost commented:
@c960657 Are you sure about removing the check for gzip unavailability?
Comment #83
andypost commented:
Is there any UX about page compression? I found only http://groups.drupal.org/node/24318, where the most recommended is the "apache-driven" way.
Comment #84
c960657 commented:
@andypost
zlib_get_coding_type() only reflects the current encoding used by zlib.output_compression if that is enabled. It is not necessarily an indication of whether gzip is supported in the client or server. These are checked at the top of drupal_serve_page_from_cache(): client support is detected via $_SERVER['HTTP_ACCEPT_ENCODING'], and server support using extension_loaded('zlib').
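As a rough illustration of the client-side check, here is a Python sketch (not the code in drupal_serve_page_from_cache(), whose exact test is not shown in this thread). The helper name `client_accepts_gzip` is made up. It is deliberately simplified: it only looks for an explicit `gzip` entry, honours `q=0`, and ignores the `*` wildcard.

```python
def client_accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header value allows gzip."""
    for part in accept_encoding.split(","):
        coding, _, params = part.partition(";")
        if coding.strip().lower() != "gzip":
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0  # treat an unparsable weight as "not acceptable"
        return quality > 0
    return False


for header in ("gzip, deflate", "deflate", "gzip;q=0", ""):
    print(repr(header), client_accepts_gzip(header))
```

An absent or empty header is treated as "no gzip", which matches the web-sniffer test case from the original report where the header was disabled.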
Comment #85
dshieh commented:
Newcomer here. I'd like to try the patch as I'm experiencing this issue on my 6.13 site. Would someone provide me with some instructions? Thanks.
Comment #87
andypost commented:
This patch is outdated, so it needs a reroll.
Comment #88
c960657 commented:
Reroll.
Comment #89
andypost commented:
1) Added line to .htaccess:
php_flag zlib.output_compression On
Directive | Local Value | Master Value |
---|---|---|
zlib.output_compression | Off | Off |
2) Clean setup of cvs HEAD with normal profile + enabled page compression at /admin/config/development/performance
Results of testing:
3) telnet drupal7 80
4) clear cache and again but different order
So I suppose this one is fixed!
Comment #90
andypost commented:
Now tests without patch but with zlib.output_compression On Off
Comment #91
kenorb commented:
This could be related to that issue as well: #440182: Sometimes website generate Content Encoding Error
Comment #92
lordsilk commented:
tracking this
Comment #93
andypost commented:
Tested on another server, suppose it's ready
Comment #94
webchick commented:
Nice! Looks like this simplifies the logic quite a bit. Dries's concern was lack of tests and we have tests now (nice to have for this feature), although I understand it might be inconsistently run by testing slaves.
Would've been nice for at least one of the people formerly crabbing in this issue about the problem to have tested the freaking patch, but can't have everything, I suppose. :P Thanks to andypost and a few others for your help, and c960657 for your excellent testing instructions.
Committed to HEAD. Thanks! Since this changes no APIs and this is apparently a bug in 6.x too, marking down to be ported.
Comment #95
andypost commented:
So here is the backport.
Comment #96
rolfmeijer commented:
subscribing
Comment #97
c960657 commented:
Successfully tested on D6.
Comment #98
lias commented:
Will this be backported for drupal 5.x ? Thanks
Comment #99
andypost commented:
This one is still not committed to 6, so chances are low to see it in D5.
Comment #100
carligraph commented:
The patch for D6 doesn't seem to work, because I got an error like the one in the picture below (content encoding failure: the file cannot be shown, because you're using an unknown kind of encoding).
I commented out ob_gzhandler and it works again.
Comment #101
c960657 commented:
carligraph, did you clear the cache after having installed the patch?
Comment #102
asb commented:
subscribing
Comment #103
Gábor Hojtsy commented:
Great, thanks, committed.