This is really hard to report and we've tried thousands of things, but we still can't fully describe the pattern on what's happening and why. Still trying and I hope someone has a clue and what's going on.

Environment

We are hosting on Pantheon for a client of us and we're using the following components:

  • Drupal 7.34
  • Lot's of contrib and custom modules
  • Redis
  • Search404
  • Caching: Drupal's caching is all enabled and Pantheon also have their Varnish at the front door

AdvAgg context

We started off with version 2.7 and when we started running into problems and during research we realized that a lot has been addressed in the dev release already, so we switched. We have only enabled the main AdvAgg module and kept the default settings appart from "obscure path settings", we enabled base_url and converting absolute paths.

The problem

Step 1: JS aggregation disabled. All works OK.

Step 2: Enable JS aggregation, clear cache and force new aggregates by incrementing global counter. The site is still working on the German content but it doesn't on the Engish one, which is the default language. There is just no javascript executed and it fails silently, no errors on the console.

Looking deeper

The javascript aggregate ends up with 3 js files. The first two js files are perfectly OK, the third one causes problems. However, not everywhere, just on the default language site.

The broken js file showed different effects during our tests, it seems that they are unpredictable. Here are some of those that we have been able to identify. Not all of them are present all the time, but at least one at any given time:

  • File size is zero. When looking into the file system, however, the file is present, it is almost 24 hours old and it has legitimate content.
  • File contains binary content. When we saw that, we disabld gz compression in AdvAgg settings as it looked as if the webserver was probably compressing again - Pantheon is using nginx. But that didn't help. The binary content is gone but now the file is empty again.
  • Http response 504: The browser doesn't receive the file because of that response. But still it is nto clear why we would get this response, the file is present and has valid content.

Different behaviour on different domains

The live site is available at http://www.example.com (default language english) and the German site is available at http://www.example.com/de.

The test stage is available at http://test-site.pantheon.io and the dev stage is available at http://dev-site.pantheon.io

Guess what: only the main domain in English is showing this problem, not all the other variations.

Debugging problem

It is very hard to debug as the site went live early this week and the continued availability of the site is business critical for our client. So we have disabled JS aggregation on the live site as this is working OK for now.

But on the other stages we can't reproduce the problem so we don't know how to move this forward without taking the risk to disapoint site visitors when testing on the live stage again.

Oh, and yes we also asked Pantheon for their support. And their response has been that we should disable AdvAgg altogether as it wouldn't deliver any value anyway. I haven't found that really helpful.

Is anyone around with an idea or maybe similar experiences who is able to help or give us some hints on what to try next?

Edit:

  • We have applied the tip from https://www.drupal.org/node/2200003#comment-9103045, with no luck.
  • The problems are independent of browser and operating system.
  • Even with domain names like dev.example.com instead of dev-site.pantheon.io, the problem is not reproducable other than on the live stage.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jurgenhaas’s picture

Issue summary: View changes
jurgenhaas’s picture

Category: Bug report » Support request
Status: Active » Postponed

Revisiting the troubleshooting section made us think we should again try the fast 404 settings. We had skipped them before as we're using search404 and missed the point that js and css files are not covered by it, so then the fast 404 settings turned out to be important again.

It looks as if that has resolved the issues. At least we haven't seen an incident since and we will be monitoring it more deeply next week, when editors and developers are going back to work on the site which will then be producing updates and will cause advagg to update some of the files. I'll report back here on how things are going.

However, I'm still wondering why this is only happening on the english part of www.example.com and not on any of the others. But we probably may never find out.

mikeytown2’s picture

Ok so it sounds like I need to add in some checks on the status report page if https://www.drupal.org/project/search404 is enabled.

mikeytown2’s picture

Another thing to note is I had to do quite a bit of work inside of advagg_url_inbound_alter() due to language hooks thinking it needed to add /en or /de to the URL and thus causing drupal to 404 when it hit the advagg_css or advagg_js path. So that is something to look into if you think this is still an issue.

jurgenhaas’s picture

It seems to be working fine now for me. Regarding your other suggestions I'd say ...

  • Adding some checks: it would be great to test for search404 and if css and js files are not excluded properly, then a warning on the status page seems to make sense. Also, checking for fast_404 settings (both from core and from the module) would be great too.
  • URL altering seems to be OK as it is now. I haven't seen any issues where the URL would have been wrong.

So, from my end, this issue could be closed but I leave it as is so that you can decide what to do with the other items.

Thanks a lot for an amazing module and great support!

  • mikeytown2 committed cff0615 on 7.x-2.x
    Issue #2451801 by mikeytown2: Add in check for the search404 module.
    
mikeytown2’s picture

Title: "Sometimes" empty JS and/or CSS files on multilingual site » Add checks for the search404 module
Status: Postponed » Fixed
FileSize
9.88 KB

Changing title to reflect what was committed. Also did some readability improvements to the code, thus the large patch size.

jurgenhaas’s picture

Status: Fixed » Active

Unfortunately, my original problem just returned. I haven't used the patch from #7 yet but I guess that wouldn't make any difference anyway. What I had to do was to force new aggregates by incrementing the global counter. Not sure what I could or should do to identify the real cause of the issue. Just noting that we have not yet changed the settings for nginx according to the readme file as the server is hosted by Pantheon and they don't seem to like that solution in the first place, like described above. Any ideas what I could do next?

jurgenhaas’s picture

Unfortunately Pantheon is not willing to change the nginx settings, there comments is:

Unfortunately we do not support the modification of the nginx configuration as all of the containers run a standard profile and we have opted to keep this config out to keep the nginx.conf lean.

We will ask our documentation team to add a note about this to our documentation.

Not sure if that is the related to our problem above but I wanted to let you know.

mikeytown2’s picture

Title: Add checks for the search404 module » Nginx & Pantheon - Empty JS on multilingual's default language pages; sometimes binary (gzip) or 504 as well.

Updated title again.

The nginx settings are mainly for far-future headers and gzip handling, things that shouldn't cause an issue like you're describing. The status report will complain about it, but it shouldn't cause empty js files.

Next time it happens can you capture the source html? Maybe there's something in there.

jurgenhaas’s picture

OK, I'll do that and also I'll try to capture all the referenced assets if possible.

mikeytown2’s picture

Looking into this more and I think it has to do with 307s in the advagg code.

mikeytown2’s picture

Title: Nginx & Pantheon - Empty JS on multilingual's default language pages; sometimes binary (gzip) or 504 as well. » Empty JS on multilingual's default language pages; sometimes binary (gzip) or 504 as well. 307 issue.
Status: Active » Fixed
FileSize
10.38 KB

What was committed. Also threw in a couple of other fixes.

  • mikeytown2 committed e1a4c62 on 7.x-2.x
    Issue #2451801 by mikeytown2: Empty JS on multilingual's default...
jurgenhaas’s picture

Thanks, we'll give that a try.

jurgenhaas’s picture

Status: Fixed » Active
FileSize
169 KB

The problem is back and I have saved the source (see attachment). The studie.html file is loading 4 JS files of which one came in with a length of 0KB (see studie.jpg) and because of the cache settings, there is no way to force load that file from the server.

Note, the JS file with length 0KB on my client is perfectly valid on the server, it's just a problem on the client that they once received a 0-length JS file and they will be stuck with it for a pereiod of time.

This makes the problem even worse: as there are different sets of JS files and as the problem happens on clients, we can't tell if that problem currently happens for some of the visitors, because if this is a timing problem with the server infrastructure, then there could be thousands of users out there with this problem without us being able to tell.

BTW, with this experience I no longer believe that this is a multi-language issue, it just seem that on some pages on either language are showing this problem depending on when they got loaded.

Any suggestions on what we could do?

mikeytown2’s picture

Status: Active » Postponed (maintainer needs more info)

Are there any watchdog messages under advagg that say "Write to file system failed"? After writing to the disk I do a check to see if the file exists and is not zero bytes, advagg will remove the file from the filesystem if it is a zero byte file and choose to always serve the file from PHP until the file exists and the filesize is not zero. See advagg_save_aggregate().

In your case, what's more interesting is the fact that the file names listed in the html document for css & js do not match the pattern that advagg uses. Inside the IE conditional html for CSS you can see that the advagg file structure is used.

<!--[if IE 9]>
<link type="text/css" rel="stylesheet" href="/sites/default/files/advagg_css/css__5xTuj2mwu7YCM5tdqJCSTqb8IFaJsuZtBdlkhPWAvtI__zECT2EO-mYfEcAdyaoiuFULID5G4LnjbTyTBuPvoOcE__cO_Q8asjpbDxPaTmZsGOUZ1ZNBjsRlmOEAPGPY4bGIQ.css" media="all" />
<![endif]-->

We have 3 groups of 43 base64 characters generated by drupal_hash_base64().

5xTuj2mwu7YCM5tdqJCSTqb8IFaJsuZtBdlkhPWAvtI
zECT2EO-mYfEcAdyaoiuFULID5G4LnjbTyTBuPvoOcE
cO_Q8asjpbDxPaTmZsGOUZ1ZNBjsRlmOEAPGPY4bGIQ

1st group is a hash of the files used; 2nd is a hash of the files contents (used for far-future versioning); 3 is a hash of the settings used to generate the files. With the 1st and 3rd data points I can re-generate the file as it was intended to be created.

But for the main CSS you have this structure

<link type="text/css" rel="stylesheet" href="studie-Dateien/css__N3Qa2SmcuP7F06cOB-qAUg2J-lYifmvizNvYl8-OAIg__jQl4fyZi1M.css" media="all">

Looks like the first group of 43 is preserved but after that it doesn't match at all.

N3Qa2SmcuP7F06cOB-qAUg2J-lYifmvizNvYl8-OAIg
jQl4fyZi1M
???

It's also not in the advagg directory.

Something above Drupal is intercepting the HTML, scanning for CSS & JS files and rewriting them on the fly. Code for the JS

<script src="studie-Dateien/js__EUF5gK55x70ZwlK-s8-fYLZZPRk-tnNquiNr6ZUnRxs__l9kxUZWt1PpJ.js"></script>

Has the same structure as the CSS above

EUF5gK55x70ZwlK-s8-fYLZZPRk-tnNquiNr6ZUnRxs
l9kxUZWt1PpJ
???

AdvAgg can not be blamed for the issue you're having because advagg is not being used in the HTML.

jurgenhaas’s picture

Sorry for the confusion here. The way I created the offline HTML was to hit CTRL-S in Firefox and obviously that modifies the location and file names of the CSS and JS assets. Those file names within the IE 9 section are unchanged as Firefox doesn't recognize them as additional ressources.

Next time this is going to happen, I will be using "Show HTML Source" and then copy and paste the HTML content into a file, that will leave everything as it was delivered by the server.

Looking into http://www.kavo.com/de/node/2654 as an example shows that our files are following the structure you describe.

With regard to your first question, if we see any advagg entries in watchdog, I can answer that with a "no". All watchdog entries are reported over to Loggly, so we're having a fairly long hostory of watchdog entries without slowing down the system by reporting to the database, we can search back a couple of weeks and there has been no single entry from advagg.

mikeytown2’s picture

Any news on this?

jurgenhaas’s picture

Status: Postponed (maintainer needs more info) » Closed (fixed)

Thanks for the reminder. We haven't seen any issues for the last couple of weeks and so let's close this as fixed. Great support @mikeytown2, both on this issue and more so on maintaining this great module in general.

mr.baileys’s picture

Status: Closed (fixed) » Active

Tentatively re-opening this as we have recently switched on Advanced Aggregation on a high-traffic site hosted at Pantheon, and are running into the same issue described here: occasionally, the some of the js files generated by advagg are 0 bytes (on disk).

  • We're seeing empty js files for different sets of source js files
  • There are no advanced aggregation watchdog errors in the last couple of days, nor are there any errors/warnings in watchdog around the time of creation

For example:
Nginx access log entry:

10.223.192.191 - - [17/Dec/2015:00:57:01 +0000]  "GET /sites/default/files/advagg_js/js__ZuBmvscliaCjMOn9xLUXCuMFQZIWLK-xK479mu_9GWQ__HcYUy370g4g3kXMcthBj9ZH4LzmuRjdF1Qs7PMiGRjE__FEmgguGjeV4P05hgIH0yZtC5S61T1qSZis52FTUlYEU.js HTTP/1.1" 200 0 "-" "Amazon CloudFront" 0.000 "24.7.29.209, 54.239.203.106, 127.0.0.1, 127.0.0.1”

AdvAgg detailed info:

Array
(
    [0] => js
    [1] => ZuBmvscliaCjMOn9xLUXCuMFQZIWLK-xK479mu_9GWQ
    [2] => HcYUy370g4g3kXMcthBj9ZH4LzmuRjdF1Qs7PMiGRjE
    [3] => Array
        (
            [variables] => Array
                (
                    [advagg_gzip] => 1
                    [is_https] => 1
                    [advagg_global_counter] => 3
                    [base_path] => /
                    [advagg_ie_css_selector_limiter] => 
                    [advagg_ie_css_selector_limiter_value] => 4095
                    [advagg_scripts_scope_anywhere] => 
                    [advagg_devel] => 
                    [advagg_convert_absolute_to_relative_path] => 
                    [advagg_convert_absolute_to_protocol_relative_path] => 1
                    [advagg_force_https_path] => 
                    [advagg_css_dir] => public://advagg_css
                    [advagg_js_dir] => public://advagg_js
                    [language] => en-BE
                    [cdn_mode] => basic
                    [cdn_farfuture_status] => 0
                    [cdn_https_support] => 1
                    [cdn_status] => 2
                    [cdn_request_is_https] => 1
                    [cdn_check_drupal_path] => 1
                )

            [hooks] => Array
                (
                    [advagg_get_css_file_contents_pre_alter] => Array
                        (
                        )

                    [advagg_get_css_file_contents_alter] => Array
                        (
                        )

                    [advagg_get_css_aggregate_contents_alter] => Array
                        (
                        )

                    [advagg_get_js_file_contents_alter] => Array
                        (
                        )

                    [advagg_get_js_aggregate_contents_alter] => Array
                        (
                        )

                    [advagg_save_aggregate_pre_alter] => Array
                        (
                        )

                    [advagg_save_aggregate_alter] => Array
                        (
                            [0] => advagg
                        )

                    [advagg_current_hooks_hash_array_alter] => Array
                        (
                        )

                    [advagg_get_root_files_dir_alter] => Array
                        (
                        )

                    [advagg_context_alter] => Array
                        (
                            [0] => advagg
                        )

                )

        )

    [files] => Array
        (
            [sites/all/modules/contrib/jquery_update/replace/ui/external/jquery.cookie.js] => Array
                (
                    [filesize] => 3655
                    [mtime] => 1450276723
                    [filename_hash] => klaD6918zJ9f6wr59c7bd2RiEZ5F-6lyEP5Mn1v9u6k
                    [content_hash] => S6A-VyA-pXjsUfVtMXppzCu4OvCTN4BoOJD9ngRrZuU
                    [linecount] => 89
                    [data] => sites/all/modules/contrib/jquery_update/replace/ui/external/jquery.cookie.js
                    [fileext] => js
                    [cache_id] => advagg:file:klaD6918zJ9f6wr59c7bd2RiEZ5F-6lyEP5Mn1v9u6k
                )

            [modules/contextual/contextual.js] => Array
                (
                    [filesize] => 1804
                    [mtime] => 1450276722
                    [filename_hash] => sVG21x9WNvrB-RJ68lpPFU1c1nC69Q6vpoE0ZOCXYhQ
                    [content_hash] => re2u0k4LnMflnE_vdtbf895Qi_SkkP2LsRyeZRrC4hQ
                    [linecount] => 52
                    [data] => modules/contextual/contextual.js
                    [fileext] => js
                    [cache_id] => advagg:file:sVG21x9WNvrB-RJ68lpPFU1c1nC69Q6vpoE0ZOCXYhQ
                )

            [sites/all/modules/contrib/jquery_update/replace/misc/jquery.form.min.js] => Array
                (
                    [filesize] => 10594
                    [mtime] => 1450276723
                    [filename_hash] => -2nlfPkds-PeNi-TwIp315clJ-HShrTMUWaGUzNHBDk
                    [content_hash] => zrdc5QneENbHKJYZrS_CG439iKiONm34SFfC5jKU-F8
                    [linecount] => 10
                    [data] => sites/all/modules/contrib/jquery_update/replace/misc/jquery.form.min.js
                    [fileext] => js
                    [cache_id] => advagg:file:-2nlfPkds-PeNi-TwIp315clJ-HShrTMUWaGUzNHBDk
                )

            [sites/all/modules/contrib/views/js/jquery.ui.dialog.patch.js] => Array
                (
                    [filesize] => 1182
                    [mtime] => 1450276723
                    [filename_hash] => 27VuufBZ1fX6jvvIoDMTkI1d-lHd7QsTh-tUhZE3xGw
                    [content_hash] => U1ynZDKcn0SaDsyNXmDwRnz57Bh-H_McW5X9KUfWnrg
                    [linecount] => 27
                    [data] => sites/all/modules/contrib/views/js/jquery.ui.dialog.patch.js
                    [fileext] => js
                    [cache_id] => advagg:file:27VuufBZ1fX6jvvIoDMTkI1d-lHd7QsTh-tUhZE3xGw
                )

            [sites/all/modules/contrib/token/token.js] => Array
                (
                    [filesize] => 2980
                    [mtime] => 1450276723
                    [filename_hash] => is_d3NE-y8GDPg3Zz4UAIXV2XfyIQQOdu8qdbTy0Vd0
                    [content_hash] => SWhgc68aUP4g9KhUWIE0T__sUSAPfzvbUBtSxFYvZHY
                    [linecount] => 87
                    [data] => sites/all/modules/contrib/token/token.js
                    [fileext] => js
                    [cache_id] => advagg:file:is_d3NE-y8GDPg3Zz4UAIXV2XfyIQQOdu8qdbTy0Vd0
                )

            [misc/form.js] => Array
                (
                    [filesize] => 2460
                    [mtime] => 1450276722
                    [filename_hash] => iNq7saNX8iaIXFQKkCLYfDxlmrn_oyC98jPLHTfPLjw
                    [content_hash] => uagYkBY5L2ttNjbunTXZasaLg3L2C9WjlxwNubeA3K4
                    [linecount] => 78
                    [data] => misc/form.js
                    [fileext] => js
                    [cache_id] => advagg:file:iNq7saNX8iaIXFQKkCLYfDxlmrn_oyC98jPLHTfPLjw
                )

            [misc/ajax.js] => Array
                (
                    [filesize] => 25017
                    [mtime] => 1450276722
                    [filename_hash] => iKnaOK61DIrdI--cleGK5s98K2cZWzAgC6-zYs0vpOA
                    [content_hash] => LqvhuMbf3pLq5HoX7phJWA_r1QpssnS0eczwY5fBVM0
                    [linecount] => 677
                    [data] => misc/ajax.js
                    [fileext] => js
                    [cache_id] => advagg:file:iKnaOK61DIrdI--cleGK5s98K2cZWzAgC6-zYs0vpOA
                )

            [sites/all/modules/contrib/jquery_update/js/jquery_update.js] => Array
                (
                    [filesize] => 302
                    [mtime] => 1450276723
                    [filename_hash] => gRfGHcNBOJmcbzFREEMytZcNUPe8UunKWfMbwyi9eGU
                    [content_hash] => -jhdxDgl_J9yMVOtCoRetm1vBOGgnHFpF4H3zzM6Su8
                    [linecount] => 7
                    [data] => sites/all/modules/contrib/jquery_update/js/jquery_update.js
                    [fileext] => js
                    [cache_id] => advagg:file:gRfGHcNBOJmcbzFREEMytZcNUPe8UunKWfMbwyi9eGU
                )

        )

)
mikeytown2’s picture

@mr.baileys
Thanks for the detailed info. I have checks that should delete an aggregate file it its size ends up being zero inside of advagg_save_aggregate() and I have some other checks with rename and write failures inside of advagg_save_data(). The use of rename is done with temporary file names to ensure atomic file operations. What you're seeing tells me that somehow this is still happening even though I check for this. Where else can I add a check in the file writing process? It appears I'm missing something. The only other option I can think of is that the filesystem is lying to php/advagg saying write is good, filesize is not zero and then later on that not being true.

If the filesystem is lying then having a cron job remove zero byte files; having the aggressive cache check the disk (slow); and somehow schedule a filesize check 2 minutes after a write are the only ways I can see to work around a buggy filesystem. Open to ideas here.

You are using the latest version or latest dev correct?

mikeytown2’s picture

Added more reporting in regards to file writes; advagg will now test a file write on the status report page and it should give some detailed info back if it fails #2645582: advagg creates lots of file_advagg js files in my sites/default/files folder

mikeytown2’s picture

Status: Active » Postponed (maintainer needs more info)

#21 was helpful but I couldn't say I found the smoking gun. Anyone having this issue and using the latest dev of AdvAgg?

mikeytown2’s picture

Status: Postponed (maintainer needs more info) » Active

I was able to reproduce part of this bug on our production environment. The new watchdog messages reported exactly what happened. If you have 3 or more webheads then this can affect you, depending on how you deploy code. This doesn't affect core because it doesn't use image style file generation.

Long story short a deployment happens; request hits the new code box and the aggregate request hits the old code box. Generally with 2 webheads during a deployment only one box will be active at a time so this can't happen; once you get to 3 or more webheads then you have the issue where old code and new code are running at the same time.

Thinking that turning off "Use HTTPRL to generate aggregates." will fix most of the occurrences of this issue but doesn't guarantee that it will not happen. Pushing out a new file is where this issue is likely to happen as the aggregate cache will not be valid and thus it will have to be created.

In terms of a fix I can have it not write the aggregate file if reading from the filesystem failed on a file not in the public directory.

catch’s picture

This looks like the same issue I ran into with agrcache, see #2628918: Address deployment race conditions.

mikeytown2’s picture

@catch
Thanks for pointing to that issue; it's re-assuring that we both came to the same conclusion in terms of not writing the file. Good idea on using the filename checksum as a final check before writing the file; might be a little tricky with core css compression so I'll be sure to look into this at a deep level.

mikeytown2’s picture

Got the first version of this patch done. Will be fixing a couple of things; one of them is fixing SRI, got some bad SRI hashes on a second page load (first will be inevitable). Noted that after an hour of the file missing or of the hash not matching it will write the aggregate to disk, this is the advagg_file_read_failure_timeout setting that will need to be documented in the readme.

  • mikeytown2 committed 7b65ce9 on 7.x-2.x
    Issue #2451801 by mikeytown2: Fix issue with empty aggregates durning a...
mikeytown2’s picture

Status: Needs review » Fixed
FileSize
17.39 KB

This should be fixed with this committed patch. Worse case with SRI is it will fail on the first load but work on the second.

mikeytown2’s picture

Fixed a php notice

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.