First off, thanks a bunch for this module. My site kept crashing before I installed it and has been running pretty darn well since I installed it.

I have one request. My website has 640,000 nodes. They are static content, but because the cached pages are saved in the node folder, that folder can get pretty large within a day or so (after running for about 8 hours we are at 21,000 nodes). It is a shame to have to run cron to delete everything, only to have to re-store all of the content again when a new user comes across it. But after about 300,000 pages in the node folder, Apache shows signs of not appreciating that the files are all stored in one folder.

Can I suggest that the files be saved into subfolders, so that there are only so many files in any single folder within the node folder? Or some other solution?

Attachments:
#15 boost.tar_.gz (26.77 KB, mikeytown2)
#12 boost_no_symlink_1.patch (8.38 KB, mikeytown2)


mikeytown2:

Interesting problem.
Split the node folder up into smaller portions... How many nodes does it take before you notice a slowdown? I know ext3 has a subdirectory limit of 31,998, so we can't put each node into its own folder. Have you tried this patch: #174380: Remove symlink creation. Let each path have own file? I'm assuming you're using some sort of path naming.
If boost had a longer cache period, would that help you out as well?

samdeskin:

FileZilla won't give me a directory listing anymore. I think about 40,000 files was the limit for being able to get to the files.

This is what I was thinking: The node folder could have folders/files like this:

so that there is a maximum of 10,000 files in each folder. What do you think of that?
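The example layout in this post did not survive. As a hypothetical illustration only (the function name and the node/<bucket>/<nid>.html layout are assumptions, not Boost's behavior), integer division of the node ID is enough to cap each folder at 10,000 pages:

```php
<?php
/**
 * Hypothetical helper (not part of Boost): map a node ID to a bucketed
 * cache path so that no bucket folder holds more than $per_folder pages.
 */
function example_bucket_path(int $nid, int $per_folder = 10000): string {
  // Integer division puts IDs 0..9999 in bucket 0, 10000..19999 in bucket 1, ...
  $bucket = intdiv($nid, $per_folder);
  return "node/{$bucket}/{$nid}.html";
}

// example_bucket_path(123)   returns "node/0/123.html"
// example_bucket_path(25000) returns "node/2/25000.html"
```

With 640,000 nodes this gives 65 bucket folders (0 to 64), none holding more than 10,000 pages. mod_rewrite has no arithmetic, so the .htaccess side would need a text-based rule that arrives at the same folder.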

A longer cache period would also be great - the files don't change so replacing them is a waste of resources.

I haven't tried that patch. Will it work on a Linux server?

Also, cron has crapped out. For now, can I simply delete the files in chron/ to let it start over?

Thanks for your time.

mikeytown2:

Does your site have a path structure, or is it all node/23423? If you have a path structure, the patch might work. I haven't used it myself, but if it does what it says it does, it might solve your problem. Test it on one of your dev boxes; it will work on Linux. alex s has written some good patches. Use it with the 6.x dev version of Boost.

mikeytown2:

Thinking about this after reading the posts on the other thread: can you verify the slowdown with some data? Use Firefox's Firebug extension; open the Net panel and report back what a semi-empty cache does compared to one that has 300k nodes in it. You could also use Pingdom. Here's a link on how to apply a patch: Applying patches.

EvanDonovan:

As for cron causing you issues, you could always comment out the cron function in the Boost module.

samdeskin:

Mikeytown2, my server just crashed. I'm on the phone with GoDaddy right now, and they say it was because there were too many files in one folder: the node folder. I have not set up Pingdom; they seem to want money. GoDaddy is deleting the files off the server, because I could no longer see the files using FTP, let alone delete them, let alone have the server actually serve them. Does that help?

mikeytown2:

Pingdom is free: give it a URL and it tells you how long the page took to load. That's what we are trying to figure out: how bad the slowdown is.

Have you applied the patch?
What is your URL structure?

samdeskin:

Title: System limits: Number of files in a single directory » Storing of pages in node folder
Version: 6.x-1.x-dev » 6.x-1.0-alpha1
Priority: Normal » Critical
Status: Postponed » Active

I set up Pingdom.

I have not applied the patch so that we can see the effect of having all nodes stored in one folder.

It is a random page among the nodes that would end up in the node folder.

mikeytown2:

The patch will put nodes into different folders; right now every node is in the node folder. The patch removes these, so it should fix your problem.

Have you applied any patches to Boost before?

samdeskin:

No, I have never applied a patch ... I was a bit nervous about doing it for the first time on such a big website. Is it going to be part of the next release of boost? When will that come out?

samdeskin:

We tried to apply the patch, but we had a problem:

[root@ip-208-109-205-115 boost]# patch -p0 < boost.patch
can't find file to patch at input line 4
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
|diff -upr sites/all/modules/boost_orig/boost.module sites/all/modules/boost/boost.module
|--- sites/all/modules/boost_orig/boost.module 2008-10-25 21:30:34.000000000 +0400
|+++ sites/all/modules/boost/boost.module 2009-01-29 13:29:22.000000000 +0300
File to patch: boost.module
patching file boost.module
Hunk #1 FAILED at 55.
Hunk #2 FAILED at 73.
Hunk #3 FAILED at 281.
Hunk #4 FAILED at 385.
Hunk #5 FAILED at 408.
Hunk #6 FAILED at 443.
Hunk #7 FAILED at 454.
Hunk #8 FAILED at 486.
Hunk #9 FAILED at 494.
Hunk #10 FAILED at 515.
Hunk #11 FAILED at 622.
11 out of 11 hunks FAILED -- saving rejects to file boost.module.rej


mikeytown2:

Attached: boost_no_symlink_1.patch (8.38 KB)

You need to remove the first line of the patch and change the paths. Try this one; it's exactly the same, just with those changes.

samdeskin:

GoDaddy told me that this patch had a similar problem.

When do you think the next release will come out?

samdeskin:

Hi Mike,

Is there any chance you could help me with another patch? My server is slowing to a crawl.

mikeytown2:

Attached: boost.tar_.gz (26.77 KB)

All I did was apply the patch for you. Use at your own risk.

samdeskin:

Seems to work great. Thanks Mikey.

mikeytown2:

Status: Active » Fixed
giorgio79:

Hello Guys,

Has this been committed?

Just to clarify, files in the nodes folder are split up into subdirectories?

Is this for the taxonomy terms folder as well? I have a large taxonomy with 40,000 terms :)

mikeytown2:

This patch has not been committed because it conflicts with the latest dev. It works with alpha 2; I'll start to tackle the conflict in a week or so.

rsvelko:

The patch you talk about does not create subfolders of node/.

It creates a file for every alias: for the URL "forums/topic-title" it will create a forums folder with a topic-title.html in it.

So if you have a nice url hierarchy you will get file hierarchy for free...

Works the same with "en/article" or "de/title" urls ...

PS 1 hour ago the patch for the latest 6-dev got ready - enjoy and report back - #174380-59: Remove symlink creation. Let each path have own file
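A hypothetical sketch of the mapping rsvelko describes, from an alias to a cache file path. The function name and the cache root are illustrative assumptions, not the patch's actual code:

```php
<?php
/**
 * Hypothetical illustration (not the patch's actual code): every alias
 * segment but the last becomes a directory, and the last segment becomes
 * the cached .html file.
 */
function example_alias_cache_path(string $cache_root, string $alias): string {
  // Split "forums/topic-title" into segments, skipping empty ones ("a//b").
  $segments = array_values(array_filter(explode('/', trim($alias, '/')), 'strlen'));
  $file = array_pop($segments) . '.html';
  // Remaining segments form the directory; a one-segment alias sits in the root.
  $dir = $segments ? $cache_root . '/' . implode('/', $segments) : $cache_root;
  return $dir . '/' . $file;
}

// example_alias_cache_path('cache/example.com', 'forums/topic-title')
// returns "cache/example.com/forums/topic-title.html"
```

Flat aliases such as tags/<term> still put every file into a single folder, which is the limitation raised in the next comment.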

giorgio79:

Thanks guys for this explanation.
For my free tagging terms there is no hierarchy, so it sounds like they would all go into the tags folder, which won't really solve the original issue entirely :)
I am sure many of us have a large free tagging vocabulary...

#2 samdeskin's idea would be a possible solution, or perhaps instead of numbers we could use the alphabet and create a folder for each letter...

What do you think?
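A hypothetical sketch of the per-letter idea (the function name and the "_" catch-all folder are assumptions). Whatever rule is picked has to be mirrored exactly by the .htaccess rules, so simpler is better:

```php
<?php
/**
 * Hypothetical helper for the "folder per letter" idea: file the page under
 * a folder named after its first letter (lowercased); digits, punctuation
 * and non-ASCII first characters share a "_" folder.
 */
function example_letter_bucket(string $path): string {
  $segments = explode('/', trim($path, '/'));
  $page = array_pop($segments);
  $first = strtolower(substr($page, 0, 1));
  $segments[] = preg_match('/^[a-z]$/', $first) === 1 ? $first : '_';
  $segments[] = $page;
  return implode('/', $segments);
}

// example_letter_bucket('tags/Drupal')     returns "tags/d/Drupal"
// example_letter_bucket('tags/42-answers') returns "tags/_/42-answers"
```

With 40,000 terms, 27 folders average about 1,500 files each, although real term names are not spread evenly across letters.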

giorgio79:

Status: Fixed » Needs work

Do you mind if I put this on needs work? :)

rsvelko:

Seems like this needs some UI settings as well as some patches to the Boost functions.

The .htaccess part seems the hardest of the three...

Detailed Ideas?

mikeytown2:

Title: Storing of pages in node folder » System limits: Number of files in a single directory
Version: 6.x-1.0-alpha1 » 6.x-1.x-dev
Category: feature » bug
Priority: Critical » Normal
rsvelko:

I am thinking of a possible solution at the Linux/ext3 level instead of at the Drupal level...

(Note: if you have a huge site, you would probably be a dedicated/VPS user, so you will have the required control.)

Please, if someone knows a Linux file systems geek, ask them to help here. These guys are talking about the same problem on VMS, and as I understood it there is a way to pre-allocate things:

Even (especially) if they are not in sequence, pre-allocating the
directory would likely have been a big help here. Knowing that
190,000(!!!) files were coming in, I'd have pre-allocated the directory
to 190000 blocks to prevent the system having to find a new contiguous
extent every time it needs to extend. I could always SET FILE/TRUNCATE
it later, if needs be.



Max number of files: variable, allocated at creation time [1]

and [1] leads to :

The maximum number of inodes (and hence the maximum number of files and directories) is set when the file system is created. If V is the volume size in bytes, then the default number of inodes is given by V/2^13 (or the number of blocks, whichever is less), and the minimum by V/2^23. The default was deemed sufficient for most applications. The max number of subdirectories in one directory is fixed to 32000.

(For a 10 GB volume: 10,737,418,240 bytes / 2^13 = 1,310,720 inodes by default.)

This quote says that there is a max subdir count and implies that max file count depends on the size of the partition and is in practice determined by performance ...

My hope is that with pre-allocating the blocks, or some other file system technique, we can speed things up...
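Working the quoted formulas through for the same 10 GB volume (V = 10 × 2^30 bytes) gives both the default and the minimum inode counts. Assuming 4 KiB blocks, the block count V/2^12 = 2,621,440 is larger than V/2^13, so the block count does not lower the default. These are limits for the whole file system, not per directory.

```latex
\frac{V}{2^{13}} = \frac{10 \cdot 2^{30}}{2^{13}} = 10 \cdot 2^{17} = 1\,310\,720 \ \text{(default)},
\qquad
\frac{V}{2^{23}} = \frac{10 \cdot 2^{30}}{2^{23}} = 10 \cdot 2^{7} = 1\,280 \ \text{(minimum)}
```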


mikeytown2:

Found an old issue in regard to this:
#171444: Too many files in 'node' folder slow down the web server
There's a fairly interesting solution in there, by the way, but it's set per directory (in this case the node dir), and the latest dev sort of makes that not work anymore. Plus it's a complicated, half-working hack that requires a separate cron job every 10 minutes. What can be taken from it is counting the file name length and using that as part of the directory structure. The only problem is I have no idea why the .htaccess rules work; all the [0123456789] code is a little confusing.
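A hypothetical sketch of the "file name length as a directory level" idea (the function name and the layout are assumptions; this is not the #171444 code):

```php
<?php
/**
 * Hypothetical sketch of the idea taken from #171444: use the length of the
 * page's file name as an extra directory level.
 */
function example_length_bucket_path(string $dir, string $name, string $ext = 'html'): string {
  // "12345" is 5 characters long, so node/12345.html becomes node/5/12345.html.
  return sprintf('%s/%d/%s.%s', $dir, strlen($name), $name, $ext);
}
```

For numeric node IDs this only groups by digit count: every ID from 10000 to 99999 still lands in node/5/ (90,000 files), so on its own it would not help a 640,000-node site.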

mikeytown2:

Status: Needs work » Postponed

Got some more info on some creative ways to do this: Cache-Friendly File Names.
Also, after reading a lot... I mean a lot of stuff, I'm starting to get some of the mod_rewrite voodoo. That being said, this is a huge monster that I really don't have the time to do correctly right now. #171444 is doing it wrong (the PHP needs to be changed as well), but it has the right idea. So for now I'm marking this as postponed; the degree of difficulty is too high. If this ever gets done, it will involve some nutty Regular Expressions and probably Environmental Variables for URL Rewriting.

mikeytown2:

Status: Postponed » Needs review

Been thinking about this more, and it would have to be a per-folder optimization. It would use the first letter of the page's name as a new folder. Rewrite rules would look something like this (untested):

# non root
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{REQUEST_URI} ^/folder/(.)(.*)
RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/folder/%1/%2.html -f
RewriteRule ^(.*)$ cache/%{SERVER_NAME}/folder/%1/%2.html [L]

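PHP: in boost_file_path(), add this right above the return line (untested).

// Split the last path segment (the page name) into its first character
// and the remainder, e.g. folder/abc -> folder/a/bc.
$folders = explode('/', $path);
$page = array_pop($folders);
$folders[] = substr($page, 0, 1);
$folders[] = substr($page, 1);
$path = implode('/', $folders);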
mikeytown2:

The above php would also need some logic to make sure it only hacked up the path if it was in the correct folder.
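A hypothetical sketch of that guard (the function name is an assumption, and $path is assumed to be relative to the host's cache directory). It only splits pages that sit directly inside the folder, because for a nested path such as folder/sub/page the rewrite rule's ^/folder/(.)(.*) would capture "s" while splitting the last segment would produce "p":

```php
<?php
/**
 * Hypothetical guard around the boost_file_path() snippet above: split only
 * pages directly under $folder, leaving every other path unchanged.
 */
function example_split_if_in_folder(string $path, string $folder): string {
  $prefix = rtrim($folder, '/') . '/';
  // Leave paths outside the target folder untouched.
  if (strpos($path, $prefix) !== 0) {
    return $path;
  }
  $page = (string) substr($path, strlen($prefix));
  // Only split pages that sit directly in the folder; nested paths would not
  // line up with the ^/folder/(.)(.*) capture in the rewrite rule.
  if ($page === '' || strpos($page, '/') !== false) {
    return $path;
  }
  return $prefix . substr($page, 0, 1) . '/' . substr($page, 1);
}

// example_split_if_in_folder('folder/abc', 'folder') returns "folder/a/bc"
// example_split_if_in_folder('other/abc', 'folder')  returns "other/abc"
```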

samdeskin:

Title: Storing of pages in node folder » System limits: Number of files in a single directory
Version: 6.x-1.0-alpha1 » 6.x-1.x-dev
Category: feature » bug
Priority: Critical » Normal
Status: Active » Needs review


I have been using this patch:

It seems to have worked well - eg. my server has not crashed for a while - so thank you.

I have gotten hold of my syslog and found a few errors that might interest you, so I thought I would point them out:

Apr 6 16:39:27 ip-208-109-205-115 drupal:|1239061167|php||||1||Invalid argument supplied for foreach() in /var/www/vhosts/ on line 625.

Apr 6 16:39:27 ip-208-109-205-115 drupal:|1239061167|php||||1||is_dir() []: open_basedir restriction in effect. File(cache/ is not within the allowed path(s): (/var/www/vhosts/ in /var/www/vhosts/ on line 626.

Apr 6 16:39:27 ip-208-109-205-115 drupal:|1239061167|php||||1||is_file() []: open_basedir restriction in effect. File(cache/ is not within the allowed path(s): (/var/www/vhosts/ in /var/www/vhosts/ on line 630.

I should point out that all of these errors are from the same day I applied your patch; none of them seem to have affected the site since then.

There are 220,000 lines in this syslog, so if there is a string that it might help for me to search for, please let me know.

mikeytown2:

Are you running the latest alpha4? That fixed a bunch of errors & includes the symlink patch. You should use alpha 4 :)

Anyway, here's an error breakdown:
Line 625, 600 - Fixed #356613: Boost cron run produces error because of empty cache directory.
Line 492, 493, 626, 630 - In short, permission errors on your server.
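The "Invalid argument supplied for foreach()" warning is what PHP 5 and 7 emit when foreach() receives something that is not an array, such as the false returned by a failed directory read. A hypothetical guard (the function name is assumed; this is not the actual #356613 fix) looks like this:

```php
<?php
/**
 * Hypothetical sketch: list cached files so that a missing, unreadable or
 * empty cache directory never feeds a non-array into foreach().
 */
function example_list_cache_files(string $dir): array {
  if (!is_dir($dir)) {
    return [];
  }
  // scandir() returns false on failure; treat that as "no files".
  $entries = scandir($dir);
  if ($entries === false) {
    return [];
  }
  $files = [];
  foreach ($entries as $entry) {
    // Skip "." and "..", and keep regular files only.
    if ($entry !== '.' && $entry !== '..' && is_file($dir . '/' . $entry)) {
      $files[] = $entry;
    }
  }
  return $files;
}
```

This only avoids the foreach() warning; the open_basedir messages on lines 626 and 630 come from the server configuration and need fixing there.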

mikeytown2:

Looking into the "open_basedir restriction in effect" error: this code is supposed to take care of that.

// line 328 or so
function _boost_ob_handler($buffer) {
  // Ensure we're in the correct working directory, since some web servers (e.g. Apache) mess this up here.
  // ... (rest of the handler not shown)
}

If this error keeps coming up, and only then, I will try adding the chdir code to boost_init() as well.

So once you're running alpha4, search your log for any new "open_basedir restriction in effect" errors, and report back in a new thread.

rsvelko:

Wow, mikey, you are making some remarkable progress with this one. I will review it some time soon. I've regained my hope for this issue now.

mikeytown2:

Category: bug » feature
mikeytown2:

Status: Needs review » Postponed

This is postponed until someone says that they need it, and even then it can't be generalized. So this will be a $100 feature (it will take me several hours to do), customized for your site, if you need it. Contact me via my contact page if you're interested in this. Or, if anyone posts a patch that is generalized and a good idea, it will be added to Boost's code.

zyxware:

Category: bug » feature
Status: Needs review » Active

Is this still a $100 feature? Can we chip in to make this a generic feature?

mikeytown2:

I would rather not have to code this. ext4 would be something to consider.

Break 32,000 subdirectory limit
In ext3 the number of subdirectories that a directory can contain is limited to 32,000. This limit has been raised to 64,000 in ext4, and with the "dir_nlink" feature it can go beyond this (although it will stop increasing the link count on the parent). To allow for continued performance given the possibility of much larger directories, Htree indexes (a specialized version of a B-tree) are turned on by default in ext4. This feature is implemented in Linux kernel 2.6.23. Htree is also available in ext3 when the dir_index feature is enabled.

AFAIK there is no way to make this generic; it involves some specific logic on the Apache and PHP side. See this thread for an example of what would take place: #787286: some ajax json may not cache. In short, I'm not here to make money on the feature; it's just something that's hard to do, so I want to place a financial barrier so I don't get tons of requests asking for it. If someone comes up with a good way to do it, I'm all for it and would do it for free. I haven't come up with a good way to do this as a generic feature. I could ease the pain by having the additional htaccess rules be auto-generated, but then I have a huge mess in terms of the Boost rules from site to site. Like I stated above, it's not something I would want to do, but it can be done if needed.

zyxware:

@mikeytown2 - Thanks for the quick reply. I would really like to contribute some effort here to work through this problem.

I am thinking out loud here. Not sure if this is doable or whether this would help/hurt performance.

One generic approach I thought of is to put the first 2 (or n) characters of every part of the path into a directory of their own.

For example

should map to

This should solve the directory limit issue I would believe. I hope the depth of directories is not an issue and that it would not hurt the performance seriously.
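The example mapping in this post did not survive. One hypothetical reading of the idea (the function name and exact layout are assumptions) puts each segment under a folder named after its first n characters:

```php
<?php
/**
 * Hypothetical reading of the "first n characters of every path part"
 * idea: each segment is preceded by a folder named after its first
 * $n characters.
 */
function example_prefix_path(string $path, int $n = 2): string {
  $out = [];
  foreach (explode('/', trim($path, '/')) as $segment) {
    $out[] = substr($segment, 0, $n); // e.g. "node" -> "no"
    $out[] = $segment;
  }
  return implode('/', $out);
}

// example_prefix_path('node/123456') returns "no/node/12/123456"
```

With n = 2 and numeric IDs up to 640,000, the folder no/node/12/ would still hold 11,111 pages (12, 120 to 129, 1200 to 1299, 12000 to 12999 and 120000 to 129999), so n has to grow with the number of nodes.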

However I couldn't think of a way in which Rewritecond could correctly test for existence of a cache file given this approach.

Could we use a generic rewrite rule, backed by a Perl script through the RewriteMap directive, that handles the rewrite for both logged-in and anonymous users? In Perl, the condition checking allows a lot more flexibility.

It would be great if you could provide some tips and directions. Thanks for your time.
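The same idea can be sketched in PHP instead of Perl. Apache starts a prg: map program once, sends it one lookup key per line on standard input, and expects exactly one line back per key, with the string NULL meaning "no match". The key format (the request path), the cache layout (page name filed under a folder of its first two characters) and the function names are assumptions for illustration:

```php
<?php
/**
 * Hypothetical RewriteMap "prg:" program written in PHP instead of Perl.
 * Assumptions: the lookup key is the request path (e.g. "/node/123456"),
 * and cached pages are filed under a folder named after the first two
 * characters of the page name, e.g. node/12/123456.html.
 */
function example_map_lookup(string $key, string $cache_root): string {
  $key = trim($key, '/');
  // Refuse empty keys and anything that could climb out of the cache root.
  if ($key === '' || strpos($key, '..') !== false) {
    return 'NULL';
  }
  $segments = explode('/', $key);
  $page = array_pop($segments);
  $segments[] = substr($page, 0, 2);
  $segments[] = $page . '.html';
  $relative = implode('/', $segments);
  // Apache treats the literal string NULL as "no value found".
  return is_file($cache_root . '/' . $relative) ? $relative : 'NULL';
}

/**
 * Answer one lookup per input line, as Apache expects from a prg: map.
 */
function example_map_serve($in, $out, string $cache_root): void {
  while (($line = fgets($in)) !== false) {
    fwrite($out, example_map_lookup(rtrim($line, "\r\n"), $cache_root) . "\n");
    // Apache blocks until it reads the answer, so never leave it buffered.
    fflush($out);
  }
}

// In the real map script: example_map_serve(STDIN, STDOUT, '/path/to/cache');
```

RewriteMap can only be declared in the server or virtual host config, not in .htaccess, e.g. RewriteMap boostmap prg:/path/to/boost_map.php (the script would need a #! line and execute permission), and is then referenced as ${boostmap:%{REQUEST_URI}}. Every lookup goes through that single process, which fits the performance concern raised in the reply.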

zyxware:

The 64,000 limit does not help me either, because the site has the nid as a separate item in the URL path and there are about 200,000 nodes.

mikeytown2:

I think Wikipedia's data is old; in short, ext4 should be unlimited with a newer kernel.

Give it a shot. If ext4 doesn't work, then we can mess around with htaccess stuff; but with RewriteMap, performance will probably be cut in half.

zyxware:

Status: Active » Postponed

@mikeytown2 - Thanks for the offer to help. I am sorry I didn't get to try the ext4 solution. The client decided to move to Varnish. I have marked this issue as postponed like earlier.

For those who stumble upon this thread, there is a workaround other than the ext4 solution or a customized Boost hack: use the token hooks to add tokens that can be used in Pathauto, so that the number of items in a folder never gets anywhere near the limit.

Eg: A [nid-split] token that will split [nid], say 12345 into 1/2/3/4/5

Another option would be to insert year month date tokens into the path.
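Hypothetical helpers showing only the string manipulation behind these two tokens (the function names are assumptions; registering them as tokens for Pathauto is not shown):

```php
<?php
/**
 * Hypothetical helpers for the Pathauto token workaround. Only the string
 * manipulation is shown; registering these as tokens is left out.
 */

// [nid-split]: 12345 -> "1/2/3/4/5", so each folder only ever holds
// entries named 0 to 9.
function example_nid_split(int $nid): string {
  return implode('/', str_split((string) $nid));
}

// Date-based alternative from the node's creation timestamp (UTC),
// e.g. 1239061167 -> "2009/04/06".
function example_date_path(int $created): string {
  return gmdate('Y/m/d', $created);
}
```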

augustofagioli:

Sounds good to me! Got to give it a try.

heddn:

Version: 6.x-1.x-dev » 7.x-1.x-dev
Related issues: +#2050849: Watchdog messages "Error: The file permissions could not be set on" ...

The discussion in this issue is the more appropriate long-term solution for
Here's a discussion I had in chat with folks from my company after asking if anyone had a good solution for this problem. To summarize the discussion, I'd say that developing a front-end controller that hashed the file path and split files out into directories was the preferred solution.

So the easy answer would be to alter boost to hash the final path part and query params.
can you do an md5sum in .htaccess?
and if you can, how efficient is it?

hashing in htaccess requires: 
you'd need to shell out to do it somehow...
i feel dirty just entertaining that idea

Of course, not sure if boost is already doing this, but you'd also want to normalize the query parameters to always present them in the same order.
use ksort

you could have a lightweight frontcontroller for drupal do it
rather than having all your logic in .htaccess, if the rewrite attempt fails you pass it to sites/all/modules/contrib/boost/frontcontroller.php
frontcontroller.php checks for long filenames and query parameters with an md5 hash, and if that fails, passes it to /index.php
you could still get by without bootstrapping drupal for those long filenames, even if you have to start up the interpreter.

or we could just use a filesystem that supports longer filename lengths

but eventually you hit a limit, so hashing the filename is a more certain solution. but then you have to split it out so you don't have more than a certain number of files in each folder.
otherwise you run into filesystem limits on files in the same folder.

wait a minute… what if you just have the ? in the URL drop into another subdirectory?
you couldn't have it named exactly the same as the file, so include the ? or a replacement for it… then the different combinations of query parameters each generate their own file
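A hypothetical sketch of the hashing and front-controller ideas from this chat (function names, the two-level ab/cd/ fan-out and the .html suffix are assumptions): the query is normalized with ksort(), path and query are hashed with md5(), and the first four hex characters become two folder levels.

```php
<?php
/**
 * Hypothetical sketch: normalize the query, hash path + query with md5, and
 * fan the hash out over two folder levels, e.g. "ab/cd/abcd....html".
 */
function example_hashed_cache_path(string $path, array $query = []): string {
  // ?b=1&a=2 and ?a=2&b=1 must map to the same file.
  ksort($query);
  $key = trim($path, '/') . '?' . http_build_query($query);
  $hash = md5($key);
  return substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . $hash . '.html';
}

/**
 * Front-controller lookup: return the cached file if it exists, else null
 * (the caller would then hand the request to index.php).
 */
function example_front_controller_lookup(string $cache_root, string $path, array $query): ?string {
  $file = $cache_root . '/' . example_hashed_cache_path($path, $query);
  return is_file($file) ? $file : null;
}
```

Two levels of two hex characters give 65,536 leaf folders, so 640,000 cached pages would average about 10 files per folder, and the hashed names have a fixed length regardless of how long the query string is. A frontcontroller.php built on this could send a hit with readfile() and fall back to index.php on a miss, without bootstrapping Drupal.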
heddn:

#410730-43: System limits: Number of files in a single directory also tries to solve the problem that there is often a length limit on the file path. This is seen most commonly with query string arguments, which can come from, e.g., Facebook Like links and long URLs from Views with lots of exposed filters. This is compounded by the fact that there is a limit to the number of files in a single folder.

Looking for feedback on the front end controller suggestion.