I've noticed a few of these error messages in my logs over the last day or two. It seems a page was shared on Facebook, and Facebook links to the page with a bunch of parameters appended to the URL. The resulting cache file name ends up longer than 255 characters (the file-name limit on several filesystems, including ext3, ext4, FAT, and NTFS), so Boost can't create the file. The failure isn't caught, so the subsequent drupal_chmod() call on the missing file throws an error.
I've attached a patch that tries to catch this and print a warning instead of an error.
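For illustration only, here is a minimal Python sketch of the same idea. The actual patch is PHP against Boost's own write function, which isn't shown here. The sketch writes to a temporary file and renames it into place. If the rename fails, it logs a warning and deletes the temporary file rather than going on to chmod a file that was never created. `write_cache_file` and the logger name are hypothetical.

```python
import logging
import os
import tempfile

log = logging.getLogger("boost_sketch")  # hypothetical logger name


def write_cache_file(cache_dir: str, filename: str, data: bytes) -> bool:
    """Write data to cache_dir/filename via a temp file; return True on success."""
    # Create the temp file in the target directory so the rename stays on one
    # filesystem (and is therefore atomic on POSIX).
    fd, tmp_path = tempfile.mkstemp(prefix="boost", dir=cache_dir)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    target = os.path.join(cache_dir, filename)
    try:
        os.replace(tmp_path, target)
    except OSError as exc:
        # e.g. ENAMETOOLONG when the name exceeds the filesystem's limit.
        log.warning("Could not rename %s to %s: %s", tmp_path, target, exc)
        os.unlink(tmp_path)  # don't leave boostXXXXXX files behind
        return False
    os.chmod(target, 0o644)  # only chmod once the file really exists
    return True
```

On ext4, for example, renaming to a name longer than 255 bytes fails with ENAMETOOLONG, so execution lands in the `except` branch.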
Comments
Comment #1
ChrisLaFrancis commented: #1542436: cache folder: Lots of files without an extension, and with names like boost1WP9Du could be related, since the code as it was before never got as far as unlinking the temporary file.
Comment #2
ChrisLaFrancis commented: Forgot to mention how to reproduce: visit any page on your site as an anonymous user and append a really long (>255 characters) query string. Check the log and you should see errors about file permissions on a file named similarly to the address you just visited.
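To see why such a request fails, here is a Python sketch. It builds a test URL with a 300-character parameter and measures the cache file name that URL would produce. The naming scheme in `naive_cache_name` (last path segment, `_`, query string, `.html`) is an assumption for illustration and is not necessarily Boost's exact format.

```python
from urllib.parse import urlencode

NAME_MAX = 255  # file-name limit on ext3/ext4 (bytes) and NTFS/FAT long names


def long_test_url(base: str, length: int = 300) -> str:
    """Return `base` plus one query parameter whose value is `length` chars long."""
    return base + "?" + urlencode({"q": "a" * length})


def naive_cache_name(last_segment: str, query: str) -> str:
    # Assumed naming scheme: last path segment, '_', query string, '.html'.
    return f"{last_segment}_{query}.html"


url = long_test_url("https://example.com/node/1")
query = url.split("?", 1)[1]
name = naive_cache_name("1", query)
print(len(name), len(name) > NAME_MAX)  # prints: 309 True
```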
Comment #3
ChrisLaFrancis commented: I moved the renaming logic into a private function. The code looks a lot cleaner now.
Comment #4
ChrisLaFrancis commented: Just FYI, I've been running this patch in production since I posted it without any observed negative effects or watchdog errors.
Comment #5
heddn commented: This doesn't check the file name length, and I feel it should. If the rename fails, check the file name length and log errors appropriately.
Comment #6
heddn commented: Here's a patch that does just what is recommended in #2050849-5: Boost failing to cache long paths (views exposed filters support).
Comment #7
heddn
Comment #8
ChrisLaFrancis commented: Makes sense to me. I'll test the patch when I have time, but the code looks good.
Comment #9
ChrisLaFrancis commented: I tweaked the code you added to my original patch. The logic is the same; it's just a little more concise.
Comment #10
ChrisLaFrancis commented: Actually, this doesn't work. PHP_MAXPATHLEN defines the maximum path length supported by PHP, not by the filesystem. As far as I can tell, there's no way to programmatically determine the filesystem's maximum path length from within PHP. Regardless, since 255 characters is the limit on pretty much all common filesystems, I think it's a safe value to use. I've reverted most of the changes you made to my original patch, but I now write a warning to the log if the file path exceeds 255 characters.
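The distinction here is between two limits. One is the maximum length of a whole path: PHP_MAXPATHLEN was 4096 on the system in #12. The other is the maximum length of a single file-name component (NAME_MAX), which is 255 bytes on ext3/ext4. For comparison only, Python exposes POSIX pathconf(), which can report the per-directory limit. This sketch falls back to the 255 chosen in this comment when pathconf() isn't available. `name_max` and `name_fits` are hypothetical helpers.

```python
import os

DEFAULT_NAME_MAX = 255  # assumed fallback when the OS can't tell us


def name_max(directory: str) -> int:
    """Longest single file name allowed in `directory` (NAME_MAX), if queryable."""
    if hasattr(os, "pathconf") and "PC_NAME_MAX" in os.pathconf_names:
        try:
            value = os.pathconf(directory, "PC_NAME_MAX")
            if value > 0:  # -1 would mean "no determinate limit"
                return value
        except OSError:
            pass
    return DEFAULT_NAME_MAX


def name_fits(directory: str, filename: str) -> bool:
    # Compare encoded bytes: ext3/ext4 count bytes, not characters.
    return len(os.fsencode(filename)) <= name_max(directory)
```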
Comment #11
heddn commented: I think what we want is PHP_MAXPATHLEN, because the only way we can interact with files and file paths is through PHP. I'd need to test more to confirm.
Why log twice? Maybe log only once?
Comment #12
ChrisLaFrancis commented: PHP_MAXPATHLEN is the maximum length PHP can handle, not the maximum the underlying filesystem can handle. http://php.net/manual/en/reserved.constants.php
So on my system, PHP_MAXPATHLEN is 4096, but I'm using ext3, which can only handle file names up to 255 characters.
The first log message records that the file can't be renamed. The second is logged only if the length is greater than 255. I was considering making it a notice instead of a warning. We could log just one message, but then the if statement would contain more or less duplicate strings, and I liked the way this looked better.
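One way to answer the "why log twice" question without duplicating strings is to build a single message and append the length explanation only when it applies. Here is a Python sketch with a hypothetical helper name. Note that it checks only the last path component, because the 255 limit applies per file name rather than to the full path.

```python
import os

NAME_MAX = 255  # assumed file-name limit, as chosen in comment #10


def rename_failure_message(tmp_path: str, target: str) -> str:
    """Return one log message for a failed rename, noting an over-long name."""
    message = f"Could not rename {tmp_path} to {target}."
    name_len = len(os.path.basename(target))
    if name_len > NAME_MAX:
        message += f" The file name is {name_len} characters; the limit is {NAME_MAX}."
    return message
```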
Comment #13
heddn commented: Marking #10 as RTBC. It works. A more complete solution could be handled in the follow-up: #410730: System limits: Number of files in a single directory
Comment #14
bendev commented: Patch #10 works fine.
Removing empty parameters from the URL (e.g. a view's exposed filters) can help:
https://drupal.stackexchange.com/questions/136312/remove-empty-url-argum...
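The linked answer isn't reproduced here because the link is truncated. As a separate illustration, this Python sketch drops empty query parameters, such as unused exposed filters, before a URL is used as a cache key. `drop_empty_params` is hypothetical. Because the query is re-encoded, characters like `[` and `]` may come back percent-encoded.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_empty_params(url: str) -> str:
    """Return url without query parameters whose value is empty."""
    parts = urlsplit(url)
    # parse_qsl() ignores blank values by default (keep_blank_values=False),
    # so "type=&color=red" yields only [("color", "red")].
    pairs = parse_qsl(parts.query)
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```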
Comment #15
captainack commented: I was going to post a comment, then #410730-43: System limits: Number of files in a single directory took the words right out of my mouth, especially your concluding sentence. I didn't post this there, because I think my issue with a few views exposed filters is more relevant here.
When I thought about how to actually do it, I realized we can normalize/order it in code, but I don't think we can do it in .htaccess (without, again, shelling out, which also makes me feel dirty).
But I don't think this is a huge deal, because at least with views exposed filters, the parameters always come out in the same order anyway as far as I can tell. I'm not quite sure about the Facebook example cited.
So I think if we can live with potential dupe cache pages from diabolical people manually reordering the query string, this is the way to go.
If we really want to get fancy, we can also use symlinks to avoid that, but there are a lot of different conditions and races to cover that way. I think I'll give it a shot unless someone corrects me and there IS a way to sort the query string in .htaccess.
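Normalizing in code could look like this Python sketch (a hypothetical helper; Boost would need a PHP equivalent). Requests that differ only in parameter order produce the same string, so they could share one cache file. The .htaccess lookup side is the part that can't easily do the same.

```python
from urllib.parse import parse_qsl, urlencode


def canonical_query(query: str) -> str:
    """Return the query string with parameters in a stable, sorted order."""
    pairs = parse_qsl(query, keep_blank_values=True)
    # Stable sort on the key only, so repeated keys (e.g. tid[]=1&tid[]=2)
    # keep their relative order.
    pairs.sort(key=lambda kv: kv[0])
    return urlencode(pairs)
```

For example, `canonical_query("b=2&a=1")` and `canonical_query("a=1&b=2")` both return `"a=1&b=2"`.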
Comment #16
captainack commented: I made a patch, and now long-as-heck views exposed filters are cached. A few comments are below:
1) It splits off up to nine query string items into directories, using the following structure:
/(REQUEST_URI_FIRST)/.../(REQUEST_URI_LAST)(BOOST_STR)
/(BOOST_STR)
/(QUERY_STRING_FIRST)/.../(QUERY_STRING_LAST)
/(BOOST_STR).(extension)
Note that BOOST_STR is used a) to trail the last part of the REQUEST_URI, b) as a directory separating the REQUEST_URI from the QUERY_STRING, and c) as the file name. If you're wondering why so many BOOST_STRs: I'm trying to minimize the likelihood of collisions, especially given the default character.
2) I was torn whether to wipe off trailing slashes or not. In the end, I decided to take the purist route and treat /path/to/a and /path/to/a/ differently. AFAIK, this was the original behavior too, but it can result in potential duplicates if people add unnecessary slashes.
3) I've tested the following:
- path combos: trailing slash/no trailing slash, query string/no query string, and any combinations thereof.
- htaccess generation with gz/no gz, html/html+json, and any combinations thereof.
4) Deduplication feature: so far I've added generation of the canonical path, but I stopped because I realized that handling the symlinks to canonical paths in .htaccess might be tricky. Hopefully someone can help finish it off, or has some insight on handling the .htaccess part without making .htaccess even more horrendous. In case I ever come back to this, I think the pseudocode for boost_write_file should be something like this:
// notation: "symlink LINK -> TARGET"
skip_canonical_symlink = (filename_canonical == filename)
if (!exists(filename_canonical)) {
    // presumably a cache miss: create filename and the canonical symlink
    rename tempfile to filename
    if (!skip_canonical_symlink && (symlink filename_canonical -> filename FAILS)) {
        // race condition: someone else beat us to it, so roll back and symlink to the symlink ;)
        remove filename
        symlink filename -> filename_canonical
    }
} else {
    // reordered cache hit: the content is already cached, so just symlink
    remove tempfile
    if (!skip_canonical_symlink) { symlink filename -> filename_canonical }
}
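Here is a runnable Python sketch of the layout in item 1 combined with the pseudocode above. It is not the code from the patch and rests on several assumptions:
- `cache_relpath` and `write_with_canonical` are hypothetical names.
- The nine-item cap is left out.
- Query items are re-encoded with `urlencode`, so a `/` inside a value becomes `%2F` instead of an extra directory.
- The canonical form sorts items by key.

```python
import os
import tempfile
from urllib.parse import parse_qsl, urlencode

BOOST_STR = "_"  # default separator discussed in this thread


def cache_relpath(path: str, query: str, ext: str = "html", canonical: bool = False) -> str:
    """Map a request path and query string onto the directory layout from item 1."""
    segments = path.lstrip("/").split("/")
    if any(seg in (".", "..") for seg in segments):
        raise ValueError("dot segments are not allowed in cache paths")
    segments[-1] += BOOST_STR               # trail the last REQUEST_URI part
    parts = segments + [BOOST_STR]          # separate REQUEST_URI from QUERY_STRING
    if query:
        items = parse_qsl(query, keep_blank_values=True)
        if canonical:
            items.sort(key=lambda kv: kv[0])  # stable: repeated keys keep their order
        parts += [urlencode([item]) for item in items]  # one directory per item
    parts.append(f"{BOOST_STR}.{ext}")      # the file name itself
    return os.path.join(*parts)


def write_with_canonical(root: str, path: str, query: str, data: bytes) -> None:
    """Store data for path?query, sharing one real file across reordered queries."""
    filename = os.path.join(root, cache_relpath(path, query))
    canonical = os.path.join(root, cache_relpath(path, query, canonical=True))
    skip_canonical_symlink = filename == canonical
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    # Temp file in root so os.replace() stays on one filesystem.
    fd, tmp = tempfile.mkstemp(prefix="boost", dir=root)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    if not os.path.lexists(canonical):
        # Presumably a cache miss: store the real file, then link canonical -> filename.
        os.replace(tmp, filename)
        if not skip_canonical_symlink:
            os.makedirs(os.path.dirname(canonical), exist_ok=True)
            try:
                os.symlink(filename, canonical)
            except FileExistsError:
                # Race: another request created the canonical entry first,
                # so roll back and point filename at it instead.
                os.unlink(filename)
                os.symlink(canonical, filename)
    else:
        # Reordered cache hit: content already exists, so drop the temp file.
        os.unlink(tmp)
        if not skip_canonical_symlink and not os.path.lexists(filename):
            os.symlink(canonical, filename)
```

With this layout, /p?b=2&a=1 is stored as p_/_/b=2/a=1/_.html. A later request for /p?a=1&b=2 finds the canonical symlink p_/_/a=1/b=2/_.html already present and discards its temporary file.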
There were a few semi-related things I saw as I was going through it that I'd like to mention:
1) Safety of _ as the default BOOST_STR: I think we need a better choice for this, in case someone wants to poison our cache by causing collisions that serve the wrong markup. A character that's legal in filesystems but is given special treatment in URLs would be better, for example '&'. I haven't tried it yet, but I think it should work and make it harder to inject junk into the Boost cache.
2) Existing race conditions: I think I noticed a couple, and I'll try to find them and open separate issues. Off the top of my head, in boost_mkdir, if two cache misses for the same directory come in together, I think the second one will fail.
3) A "?" at the end of the rewrite; otherwise the query string is duplicated. I've already incorporated this one into the patch.
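For the boost_mkdir race in item 2, the usual fix is to treat "already exists" as success, as long as the path is a directory. In PHP that would mean checking is_dir() after a failed mkdir(). The Python sketch below shows the same pattern; `os.makedirs(path, exist_ok=True)` already does this internally. `mkdir_race_safe` is a hypothetical name.

```python
import os


def mkdir_race_safe(path: str, mode: int = 0o775) -> None:
    """Create `path` and its parents; succeed if a concurrent request already did."""
    try:
        os.makedirs(path, mode=mode)
    except FileExistsError:
        # Another process won the race; that's fine as long as it's a directory.
        if not os.path.isdir(path):
            raise
```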
Comment #17
captainack commented: Sorry to bump, but I realized the title was kind of cryptic.
I've been using #16 for a month now. Just curious if anyone has any input on the approach (breaking path/querystring into subdirectories) and the potential gotchas.
Comment #18
dave bruns commented: Just a quick note to mention a possible .htaccess "fix" here: http://drupal.stackexchange.com/questions/45113/boost-not-working-on-pag...
Also, while researching this, I noticed that when Boost encounters requests longer than 255 characters, a temporary file also remains in the cache (its name begins with "boost").
This is noted in a separate issue here: https://www.drupal.org/node/1542436
I assume this is a result of the rename failure and have linked that issue to this one.