There are thousands and thousands of files under the cache/*/0/node folder, so the 'node' folder is not very efficient. All files are kept on remote file servers, not on the web server, which means that every time a folder is accessed, every single file has to be accessed over the remote LAN connection.

I think it would help to divide those files into many sub-folders, with only about 100 files in each sub-folder, to speed up site access.

For example:
Use a path like cache/*/0/node/1/2/3/4/5/6/7.html to save cached files, instead of cache/*/0/node/1234567.html
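
As a rough illustration of the proposed layout (this is not code from the module), the mapping from a node ID to a one-digit-per-folder path could be sketched in shell like this. The function name digit_path is made up for the example, and it assumes purely numeric IDs.

```shell
# Hypothetical helper: turn a numeric node ID into a path with one folder
# per digit, keeping the last digit as the file name.
# The body runs in a subshell ( ... ) so its variables do not leak out.
digit_path() (
  id=$1
  case $id in
    ''|*[!0-9]*) exit 1 ;;      # only plain digit strings are accepted
  esac
  path=""
  while [ "${#id}" -gt 1 ]; do
    rest=${id#?}                # everything after the first digit
    path="$path${id%"$rest"}/"  # append the first digit as a folder
    id=$rest
  done
  printf '%s%s.html\n' "$path" "$id"
)

digit_path 1234567   # prints 1/2/3/4/5/6/7.html
```

With one digit per level, no folder holds more than ten sub-folders and ten files, at the cost of deeper paths.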

But I don't know how to modify the RewriteCond and RewriteRule directives in the .htaccess file to do that.

My website has been suspended by DreamHost because it is overloading and slowing down the whole web server.

DreamHost told me: "Anything you can do to make smaller folders would speed up your sites access."

Please help me out. Thank you very much.

Comments

Arto’s picture

Title: too many files in 'node' folder to slow down the web server » Too many files in 'node' folder slow down the web server
Assigned: Unassigned » Arto

Hmm, interesting problem. Your proposed solution is reasonable, but I don't think mod_rewrite will allow it, and I can't immediately think of a workable alternative. Suggestions welcome.

bingjiw’s picture

I finally figured out a way to solve this problem.

First, add the following lines to your .htaccess file to redirect requests to the sub-folders.

	  # Node IDs with 5 or more digits: serve node/<digits 1-2>/<digits 3-4>/<id>.html
	  #RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
	  RewriteCond %{REQUEST_METHOD} ^GET$
	  RewriteCond %{QUERY_STRING} ^$
	  RewriteCond %{REQUEST_URI} ^/node/[0-9][0-9][0-9][0-9][0-9]+
	  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html -f
	  RewriteRule ^node/([0-9][0-9])([0-9][0-9])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html [L]

	  # Node IDs with 3 or more digits: serve node/<digits 1-2>/<id>.html
	  #RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
	  RewriteCond %{REQUEST_METHOD} ^GET$
	  RewriteCond %{QUERY_STRING} ^$
	  RewriteCond %{REQUEST_URI} ^/node/[0-9][0-9][0-9]+
	  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$1$2.html -f
	  RewriteRule ^node/([0-9][0-9])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$1$2.html [L]
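
To make explicit which layout these two rules expect, here is a minimal shell sketch that prints the cache path for a given node ID, relative to cache/<server>/0/. The helper name sharded_path is hypothetical and not part of Boost; it assumes purely numeric IDs and two-digit folder levels, as in the rules above.

```shell
# Hypothetical helper mirroring the two rewrite rules: print where the
# cached page for a node ID is expected, relative to cache/<server>/0/.
# The body runs in a subshell ( ... ) so its variables do not leak out.
sharded_path() (
  id=$1
  case $id in
    ''|*[!0-9]*) exit 1 ;;                 # only plain digit strings
  esac
  if [ "${#id}" -ge 5 ]; then
    rest=${id#??};        level1=${id%"$rest"}         # digits 1-2
    remainder=${rest#??}; level2=${rest%"$remainder"}  # digits 3-4
    printf 'node/%s/%s/%s.html\n' "$level1" "$level2" "$id"
  elif [ "${#id}" -ge 3 ]; then
    rest=${id#??};        level1=${id%"$rest"}         # digits 1-2
    printf 'node/%s/%s.html\n' "$level1" "$id"
  else
    printf 'node/%s.html\n' "$id"          # short IDs stay in node/
  fi
)

sharded_path 12345   # prints node/12/34/12345.html
sharded_path 1234    # prints node/12/1234.html
```
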

Then I set up a cron job to run the following bash script every 10 minutes. The script moves cached files from 'node' into the corresponding sub-folders.

#!/bin/bash
### Stop if the cache folder is missing, so nothing is moved elsewhere
cd /home/wbj123/wbj123.com/cache/wbj123.com/0/node || exit 1

for file in ./*.html
do
  ### now $file is something like: ./filename.html

  ### If there are no .html files the pattern is left unexpanded; skip it
  [ -e "$file" ] || continue

  ### To let fileName = filename.html
  fileName=${file##*/}

  ### Get length of file name, minus the 5 characters of ".html"
  length=${#fileName}
  length=$((length - 5))

  ### now, length contains the count of characters in main file name

  ### for filename has 3 or more characters, for example: 123.html
  if [ "$length" -ge 3 ]
  then
    ### get folder name for level 1, for example: 12
    level1FolderName=${fileName:0:2}

    mkdir -p "$level1FolderName"
    mv --target-directory="$level1FolderName" -- "$fileName"

    ### for filename has 5 or more characters, for example: 123456.html
    if [ "$length" -ge 5 ]
    then
      ### get folder name for level 2, for example: 34
      level2FolderName=${fileName:2:2}

      mkdir -p "$level1FolderName/$level2FolderName"
      mv --target-directory="$level1FolderName/$level2FolderName" -- "$level1FolderName/$fileName"
    fi
  fi
done

I have tested this solution on my website wbj123.com, and it works well.

toma’s picture

Has this solution been added to the new release or not? I have the same problem on my server.

Thanks for this great module

Arto’s picture

No, the release does not have any new features per se. I still need to review your solution.

Hetta’s picture

OK, I've tested this.

I added the .htaccess bits to the end of the boost part of the .htaccess file. The rewrite part now looks like this:

</IfModule>
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{REQUEST_URI} ^/$
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/index.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/index.html [L]
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{REQUEST_URI} !^/cache
  RewriteCond %{REQUEST_URI} !^/user/login
  RewriteCond %{REQUEST_URI} !^/admin
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI} -d
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}/index.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1/index.html [L]
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{REQUEST_URI} !^/cache
  RewriteCond %{REQUEST_URI} !^/user/login
  RewriteCond %{REQUEST_URI} !^/admin
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1.html [L]
  #RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789][0123456789][0123456789]+
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html -f
  RewriteRule ^node/([0123456789][0123456789])([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$2/$1$2$3.html [L]
  #RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{QUERY_STRING} ^$
  RewriteCond %{REQUEST_URI} ^/node/[0123456789][0123456789][0123456789]+
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/node/$1/$1$2.html -f
  RewriteRule ^node/([0123456789][0123456789])(.+)$ cache/%{SERVER_NAME}/0/node/$1/$1$2.html [L]
  # BOOST END

The bash script works as advertised, and moves pages from /node/ into subfolders.

However, after I've run the bash script, /node/12345.html is moved to /node/12/34/12345.html, AND whenever I look at a page as user 0 (= anonymous), a new page gets generated under cache/$server/0/node/12345.html.

That's not what I'd call caching ... @bingjiw, what else did you do, to get this to work?

Thanks!
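
For anyone who wants to check whether the same thing happens on their site, a rough shell sketch like the following could list node IDs that exist both as a flat file in node/ and as a copy in a sub-folder. The function name list_regenerated is made up for this example, the node cache folder is passed as an argument, and find with -mindepth (GNU or BSD find) is assumed to be available.

```shell
# Hypothetical check: print node IDs that have a flat copy in the node
# cache folder ($1) as well as a copy somewhere in its sub-folders.
# The body runs in a subshell ( ... ) so the cd does not affect the caller.
list_regenerated() (
  cd "$1" || exit 1
  for flat in ./*.html; do
    [ -e "$flat" ] || continue        # no flat .html files at all
    name=${flat##*/}
    # Look for a file of the same name at least one folder level down.
    if [ -n "$(find . -mindepth 2 -type f -name "$name" | head -n 1)" ]; then
      printf '%s\n' "${name%.html}"
    fi
  done
)
```

Running it right after the cron script should print nothing; any IDs printed later are pages that were generated again in node/ after being moved.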

bingjiw’s picture

Yes, so you have to run that bash script often. For my website, I have set up cron to run it every ten minutes. That's my solution.

Hetta’s picture

The problem is that this defeats the whole idea of caching, and thus it is a non-solution to this particular problem.

Have you set the expiry for your files to 10 minutes, too? Personally, I think the longer the better, so I'd love to have a week in there ... 1 day max isn't all that much, for a mostly static site.

bingjiw’s picture

You cannot expect a real solution without modifying the core code of this module. The idea of 12/34/12345.html needs to be written into this module to solve this problem at its root. For now, my "solution" can avoid the heavy load on the web server. That's it.

mikeytown2’s picture

Component: Code » Caching logic
Priority: Critical » Normal
Status: Active » Closed (fixed)

Moved to #410730: System limits: Number of files in a single directory. We are no longer dealing with the node folder, and running a separate cron doesn't sound like an ideal solution.