This patch introduces an option for file-based caching. When enabled, a 'cache' subdirectory is created in the files/ directory, and cache data is written there. The best performance improvement will be seen when both the file cache is enabled and a minimum cache lifetime is set.

A few comments about the current design:

  • The cache can be 'disabled', 'database', or 'file'
  • The cache subdirectory is hardcoded to the name 'cache' within the directory_path as t() is unavailable during early bootstrapping. An additional configuration option could be provided to manually set the path to and the name of the cache directory, but I'm choosing to avoid this extra level of administrative overhead.
  • To come up with a unique name for each cache entry that is filesystem safe, I'm using a simple md5 of the cache key. Hash collisions are extremely unlikely, but if that were ever to become a problem, cache_filename() could be enhanced.
  • Cache garbage collection is less efficient at the filesystem level than at the database level, so a _cron hook is responsible for cleaning up old cache entries. Cache entries can still be expired, and expired entries are detected by looking at their creation date.
  • This improves upon my earlier filecache efforts (for Drupal 4.0) in that everything is cached to the filesystem, not just pages.
  • Nothing has been done to address potential cache-coherency issues when dealing with one database backend and multiple webserver frontends. Ideas, suggestions, and/or patches are welcome.
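The naming and expiry scheme described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: cache_filename() is named in the patch description, but its signature here is a guess, file_cache_set() and file_cache_get() are hypothetical helpers, and the file modification time stands in for the "creation date".

```php
<?php
// Hypothetical sketch; only the name cache_filename() comes from the patch description.

// Map a cache key to a filesystem-safe path inside the hardcoded 'cache' subdirectory.
function cache_filename($key, $directory_path = 'files') {
  return $directory_path . '/cache/' . md5($key);
}

// Write serialized data to the cache file, creating the directory if needed.
function file_cache_set($key, $data, $directory_path = 'files') {
  $file = cache_filename($key, $directory_path);
  if (!is_dir(dirname($file))) {
    mkdir(dirname($file), 0775, true);
  }
  return file_put_contents($file, serialize($data), LOCK_EX) !== false;
}

// Return cached data, or NULL if the entry is missing or older than $lifetime seconds.
function file_cache_get($key, $lifetime, $directory_path = 'files') {
  $file = cache_filename($key, $directory_path);
  if (!is_file($file) || (time() - filemtime($file)) > $lifetime) {
    return NULL;
  }
  return unserialize(file_get_contents($file));
}
```

Because the key is hashed, the file name reveals nothing about the entry it holds, which is the debugging trade-off raised later in the thread.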

On my devel system the patch appears to work as intended. Additional testing would be greatly appreciated, as would benchmarking comparisons.

(I'll be offline January 20th through January 29th, but I'll pick this up again when I return if no-one else has run with it. The ultimate goal is to get support for file-based caching in core.)

Comments

moshe weitzman’s picture

cool.

so what are the implications of "Nothing has been done to address potential cache-coherency issues ..."? i assume each web server will maintain its own cache. and the site could look slightly different as a user bounced between web servers? i think thats acceptable during site distress. the database would always be coherent. in fact it would be sleeping away while the web servers serve flat files :)

can we use var_export() and skip the serialize/unserialize? should be faster.
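A minimal sketch of the var_export() idea, assuming the cached value is a plain array of scalars. The helper names cache_export_write() and cache_export_read() are hypothetical, and whether this is actually faster than serialize()/unserialize() would need to be benchmarked.

```php
<?php
// Sketch only: cache_export_write() and cache_export_read() are hypothetical names.

// Store data as PHP source so it can be loaded with include instead of unserialize().
function cache_export_write($file, $data) {
  $code = '<?php return ' . var_export($data, true) . ';';
  return file_put_contents($file, $code, LOCK_EX) !== false;
}

// Load the data back; include yields the value of the file's return statement.
function cache_export_read($file) {
  return include $file;
}

$file = tempnam(sys_get_temp_dir(), 'vex');
$data = array('site_name' => 'Example', 'cache' => 1, 'roles' => array('anonymous user'));
cache_export_write($file, $data);
$loaded = cache_export_read($file);
unlink($file);
```

One possible upside of this layout is that an opcode cache can keep the exported file compiled in memory, but that depends on the server configuration.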

i'll try to get some testing in.

Dries’s picture

I only looked at the code briefly.

  1. I'm not convinced this system buys us much. Unless you can convince me otherwise (with benchmark results), I believe the design of this patch is flawed. When people think "file-based caching", they hope or believe that Apache will be serving static pages, effectively nullifying the overhead of having to load, parse and initialize Drupal. The approach taken in this patch does not shortcut Drupal. That is, whether the page is served from disk, or whether the page is served from MySQL, we incur almost the exact same overhead. If that is the case, the performance gain might be negligible and not worth the added complexity of the code.
  2. Does this mean that everything gets cached on the filesystem? (Not just pages but also cached copies of the menus, filtered content, etc.)
Jeremy’s picture

Mosh: so what are the implications of "Nothing has been done to address potential cache-coherency issues ..."? i assume each web server will maintain its own cache. and the site could look slightly different as a user bounced between web servers? i think thats acceptable during site distress.

You could load the same page three times, and get a different page each time. That's ugly. And it could happen any time the site gets content posted. Perhaps a special timer could be used to prevent this, but it's something I'd want to look at after the basic functionality is fully tested.

Mosh: can we use var_export() and skip the serialize/unserialize?

I've not played with that before, but if it helps, great.

Dries: I'm not convinced this system buys us much. Unless you can convince me otherwise (with benchmark results), I believe the design of this patch is flawed.

I'm very hopeful that someone will contribute proper benchmarks while I'm away. This should help significantly, as it short-circuits the database.

Dries: The approach taken in this patch does not shortcut Drupal. That is, whether the page is served from disk, or whether the page is served from MySQL, we incur almost the exact same overhead.

This is the same approach I took in 4.0 days, and it offered a _massive_ performance improvement. The database no longer need be involved in displaying most pages to anonymous users. (Add a few more lines to the patch and you could disable the database but still be serving web pages). Enable a PHP cache on your web server, and things should fly.

Dries: Does this mean that everything gets cached on the filesystem? (Not just pages but also cached copies of the menus, filtered content, etc.)

Yes. Everything is cached to the filesystem.

Anyone with a good environment for benchmarking, please do some comparisons between no-cache, the database cache, and the filesystem cache. If you could also try with and without a PHP cache, that would be excellent.

Amazon’s picture

Title:File-based caching» File-based caching-jpcache performance results

I am working on getting this patch tested. But Khalid pointed out a file cache implemented by jpcache had the biggest performance improvements.

http://www.jpcache.com/main.php?content=globalis

Amazon’s picture

Title:File-based caching-jpcache performance results» File-based caching-loading techniques

Here are some techniques for creating load and testing.
Siege: http://www.joedog.org/siege
Siege command:

siege -c 32 -i -t 11m -d 5 -f url_list.txt

You can do benchmarking with the apache ab benchmarking tool: http://httpd.apache.org/docs/1.3/programs/ab.html

ab -c 1 -t 600 http://server/page.htm

You could also use the following file:

<?php
/*
There are two settings:
   First, set $file to the server and page that you want to benchmark.
   Second, set $iter to the number of times you want it loaded.
*/
$file = "http://localhost/index.php";
$iter = 1000;

// Return the current time in seconds as a float.
function getmtime()
{
  $a = explode(' ', microtime());
  return (double) $a[0] + (double) $a[1];
}

$loadtime = 0;
for ($i = 0; $i < $iter; $i++)
{
  $start = getmtime();
  file($file);
  $intertime = getmtime() - $start;
  $loadtime += $intertime;
  echo $intertime . "<br>";
}
// Average load time over all iterations.
$avgload = $loadtime / $iter;
echo "<p><b>" . $avgload . "</b>";
?>
Dries’s picture

After I enabled the file cache, I got a couple warnings:
Warning: fopen(files/cache/87cd8b8808600624d8c590cfc2e6e94b): failed to open stream: No such file or directory in includes/bootstrap.inc on line 376

Dries’s picture

I ran some benchmarks and the average page request time goes down from 140ms (database caching) to 139ms (filesystem caching). That is 0.7% improvement. This confirms my statements: the design is flawed and the complexity is not worth the gain.

(Serving the exact same file as a static page without intervention of PHP, takes 2.10ms on average.)

Dries’s picture

The only way to speed-up Drupal is to (i) execute less code (and less MySQL queries), (ii) to flush the cache less frequently or (iii) to optimize the bootstrap code.

Page caching (and database caching!) can only be made more efficient if you shortcut more of Drupal's bootstrap process. This means losing some functionality like not executing the _init and _exit hooks (node counters, access logs, who's online block, throttle module, hostname banning, etc will stop working). You could also choose not to initialize the $user object in absence of a session cookie. Then again, most of this is not necessarily specific to page-caching, and can be partially achieved with the throttle module (though not nearly as aggressive).

Dries’s picture

Did some quick experiments. When I comment out the execution of the _init and _exit hooks, page generation times go down from 140ms to 50ms (2.8 times faster at the expense of losing some functionality). When I comment out initialization of the session, page generation times go down to 40ms (20% speedup compared to 50ms). At that point, I probably don't have to invoke variable_init(); we only need one or two variables after that so there might be some additional cycles to save.
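The ratios quoted above can be checked with a few lines of arithmetic. The variable names are illustrative, and the combined figure in the last line is derived here rather than stated in the comment.

```php
<?php
// Timings reported in the comment above, in milliseconds.
$full = 140;        // cached page request with everything enabled
$no_hooks = 50;     // _init and _exit hooks commented out
$no_session = 40;   // session initialization also commented out

$speedup_hooks = $full / $no_hooks;                     // 2.8 times faster
$gain_session = ($no_hooks - $no_session) / $no_hooks;  // 0.2, i.e. 20%
$speedup_total = $full / $no_session;                   // combined effect of both changes

printf("%.1fx, %d%%, %.1fx\n", $speedup_hooks, round($gain_session * 100), $speedup_total);
```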

I'm not saying we should go down this road, I'm merely trying to figure out and illustrate where the overhead is located.

bslade’s picture

I just started a new job, so I don't have time to set this up and test it, but I have a question or two, and a point to make. The questions first:

1) When you say "This patch introduces an option for file-based caching", what exactly is being cached and what is the key to the cache? Are you caching whole pages from a Drupal website using the web address as the key? Or are you caching nodes keyed by node id?

2) How does the caching interact with logged in users? How do you cache page options that are different from user to user? (and different between logged in and non logged in users)

Lastly, a point about performance testing. One of the key benefits of a file caching solution is not so much that it makes things faster, but that it can serve a lot more page hits without slowing much. So an additional test that might be useful would be to see the peak sustained page hit rate of the website with caching on and off (spread across 40-50 nodes).

Good luck.

Ben in DC

mgifford’s picture

I implemented it on a 4.7 install that I was working on. Perhaps I did something wrong in setting it up but I didn't notice much of a speed enhancement with the patch. I could confirm that there were cache files in the files/cache directory. I would need to set up an identical install on the same machine (or on a dedicated box) in order to get results that are meaningful.

I do think that cutting out the 50-100 queries per page would be sufficient to get reasonable responses on many servers, but I wasn't able to see much benefit through the apache ab benchmarking tool.

I've also found it is useful with file cache tools to have file names include some path information for debugging purposes. Perhaps even something like $filename = $cid . md5($cid);
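A raw $cid can contain slashes, colons or query strings, so a readable prefix would probably need sanitizing first. A rough sketch, with cache_debug_filename() and its 40-character prefix limit as purely hypothetical choices:

```php
<?php
// Hypothetical helper: prefix the md5 hash with a sanitized, truncated copy of the cache ID.
function cache_debug_filename($cid, $max_prefix = 40) {
  // Collapse anything that is not filesystem-safe (slashes, colons, dots) to underscores.
  $prefix = preg_replace('/[^A-Za-z0-9_-]+/', '_', $cid);
  // The md5 suffix keeps names unique even when two IDs sanitize to the same prefix.
  return substr($prefix, 0, $max_prefix) . '-' . md5($cid);
}

echo cache_debug_filename('http://example.com/node/42'), "\n";
```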

Mike

Jeremy’s picture

Sorry, I've been offline since I posted this patch and am only now catching up with the replies...

Dries: After I enabled the file cache, I got a couple warnings:

This happened once, or consistently? I haven't seen any errors like that during my testing.

Dries: That is 0.7% improvement. This confirms my statements: the design is flawed and the complexity is not worth the gain.

It depends on what you're trying to solve. On a sufficiently powered server in which the database has enough RAM and isn't stressed from other uses, the filecache isn't going to offer the type of benefit you seem to be looking for in your tests. But on an underpowered database server, or on a shared server with many busy sites sharing the same database, then the filecache should offer a noticeable benefit.

Back in 4.0 days when I originally wrote this filecaching logic, kerneltrap was on a very underpowered server and thus the database was always a bottleneck. In that situation, file-based caching offered a huge benefit.

How did you conduct your benchmarks? Just simply running ab against an idle server? Or something more complex? Did you also try with a PHP cache enabled?

My first goal was to verify basic functionality, gaining confidence that caching everything to the filesystem doesn't cause any problems. That done, I will move on to simulating a stressed multi-site environment and try to offer some benchmarks showing when file-based caching actually helps.

Dries: The only way to speed-up Drupal is to (i) execute less code (and less MySQL queries), (ii) to flush the cache less frequently or (iii) to optimize the bootstrap code.

Agreed, and all three improvements will benefit both the database cache and the file cache. Again, at this time I'm targeting when the database itself is a bottleneck such as is often the case on shared hosting solutions.

Dries: shortcut more of Drupal's bootstrap process. This means losing some functionality

At this time I'm only interested in improvements that don't sacrifice functionality.

bslade: what exactly is being cached and what is the key to the cache?

Everything that normally was cached to the database is now cached to the filesystem. This includes pages, as well as the variable table, etc... We use the same key as is used when caching to the database (though we md5 it to make it filesystem friendly).

bslade: Lastly, a point about performance testing. One of the key benefits of a file caching solution is not so much that it makes things faster, but that it can serve a lot more page hits without slowing much. So an additional test that might be useful would be to see the peak sustained page hit rate of the website with caching on and off (spread across 40-50 nodes).

Yes, measuring the benefit of this method of caching will require more than the standard use of ab... Help from anyone with the knowledge, interest and time to perform more complex benchmarks would be very welcome.

Jeremy’s picture

Title:File-based caching-loading techniques» File-based caching
Status: new, FileSize: 12.38 KB

While performing my own benchmarks, I ran into a major bug in the filecache patch: once you enabled the file cache it was not possible to disable it unless you manually deleted all the cache files. A call such as cache_clear_all('variables') was being ignored, so while the variable was being properly updated in the database, the old no-longer-valid cached version was being used. This means if you first tested the file-caching functionality prior to running your benchmarks, then all your benchmarks were with the file-cache enabled (even if you thought you had enabled the database cache). That would explain why there was next to no difference detected between file-caching and database-caching.

The new attached version of the patch fixes this bug. Please run your benchmarks again.

Jeremy’s picture

Status: new, FileSize: 17.65 KB

During further testing I found another bug: the variable cache wasn't being used, so all variables were being loaded from the database one at a time even for cached pages. The fix involved a design change: configuration of the file cache and the minimum cache lifetime was moved into settings.php.

New features:
- if the database is down or otherwise unresponsive, the file-cached version of the page, if it exists, will still be displayed rather than an error
- the ability to disable _init() and _exit() hooks for all users, or on a per-role basis

Jeremy’s picture

Status: new, FileSize: 20.44 KB

Here's an update to the file-based caching patch that introduces a new bootstrap phase called DRUPAL_BOOTSTRAP_FILE. This new phase allows file-cached pages to be displayed to anonymous users without initializing Drupal.

Features:

  • introduces a block of new performance oriented options to settings.php
  • $file_cache is used for caching to the file-system instead of the database
  • $file_cache_fastpath uses the new DRUPAL_BOOTSTRAP_FILE phase to serve file-cached pages directly to anonymous users without initializing Drupal
  • $cache_lifetime was moved to settings.php so it is accessible to file-based caching without loading the variables table
  • $_init is used to specify which roles execute the Drupal _init hook (defaults to ALL)
  • $_exit is used to specify which roles execute the Drupal _exit hook (defaults to ALL)
  • in the event of a failure to connect to the database, if the requested page exists in the file cache it will be displayed rather than displaying an error. (this means users can continue to browse your site even when the database is disabled for maintenance or too busy to process more requests)
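The database-failure fallback in the last item could look roughly like the sketch below. Everything here is an assumption for illustration: the function name, the use of callables for connecting and for building the page, and the md5-named cache files are not taken from the patch.

```php
<?php
// Sketch under assumptions: $connect is any callable that returns false when the
// database is unreachable; the function name and cache layout are hypothetical.
function serve_page_or_fallback($cache_key, $cache_dir, $connect, $build_page) {
  if (call_user_func($connect) !== false) {
    // Database is available: build the page normally.
    return call_user_func($build_page);
  }
  // Database is down: fall back to the file cache if a copy of the page exists.
  $file = $cache_dir . '/' . md5($cache_key);
  if (is_file($file)) {
    return file_get_contents($file);
  }
  // No cached copy either, so an error message is all that is left.
  return 'Site off-line: unable to connect to the database.';
}
```

The fallback only helps for pages that were cached before the outage; anything not yet in the file cache still ends in an error.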

Benchmarks:
I performed very simplistic benchmarks (ab2 -c10 -n100 localhost/), running each test 5 times, throwing out the fastest and slowest result. Essentially, running with the file-cache alone (on IDE drives) was a performance hit, but with the fastpath enabled it was a significant (2x) performance boost.

With database caching, I averaged 20.206 ms per request and 49.5 requests per second. With file caching and fastpath I averaged 9.363 ms per request and 106.8 requests per second.
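The two sets of figures above are consistent with each other if the per-request time is ab's mean across all concurrent requests, which is an assumption here. Under that reading, requests per second is just the reciprocal of the time per request:

```php
<?php
// Mean time per request in milliseconds, as reported above.
$db_ms = 20.206;
$file_fastpath_ms = 9.363;

// Requests per second is the reciprocal of the mean time per request.
$db_rps = 1000 / $db_ms;
$fastpath_rps = 1000 / $file_fastpath_ms;

// Relative speedup of file cache with fastpath over the database cache.
$speedup = $db_ms / $file_fastpath_ms;

printf("%.1f req/s, %.1f req/s, %.2fx\n", $db_rps, $fastpath_rps, $speedup);
```

The ratio comes out slightly above 2, which matches the "significant (2x)" description.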

moshe weitzman’s picture

Interesting ... Why do we need to distinguish which roles get the init and exit hooks?

The new uid cookie would be useful in sess_read(). Basically, the absence of that cookie tells us that we have an anon user and thus we can skip a query. This is a big deal for a slashdotting scenario (when file cache is disabled). My one concern is clients who do not accept cookies. Maybe there is some way to deal with that.

I was hoping for more dramatic improvements in speed. I guess the benefit of this patch is not so much performance but rather reliability.

Jeremy’s picture

"Interesting ... Why do we need to distinguish which roles get the init and exit hooks?"

I suppose we don't need to, but this allows you to for example only track the actions of your site administrators (ie, for security logging the actions of privileged users).

"The new uid cookie would be useful in sess_read(). Basically, the absence of that cookie tells us that we have an anon user and thus we can skip a query. This is a big deal for a slashdotting scenario (when file cache is disabled). My one concern is clients who do not accept cookies. Maybe there is some way to deal with that."

Offhand I see two ways to handle users who do not accept cookies:

  1. Don't allow them to log in, sending a message telling them they have to enable cookies (this is common on the Internet)
  2. Add the uid info to the URL, much like how PHP adds the session id to the URL

"I was hoping for more dramatic improvements in speed. I guess the benefit of this patch is not so much performance but rather reliability."

I did not in any way optimize my system for these tests. Some things that could greatly improve performance:

  • tune the drive with hdparm
  • replace the IDE drive in my laptop with a SCSI drive
  • enable an opcode cache to get rid of the overhead of PHP
Jeremy’s picture

I performed some further benchmarks to test the filecaching patch. I installed 4.7-beta4, added the patch, then with the devel module scripts created 10,000 nodes, 50,000 comments and 500 users. I then initiated a siege attack with 50 concurrent connections in internet mode using a urls.txt with 10,779 entries. The siege was left running through all the tests, then ab was used to collect the following statistics.

"ab -n1000 -c50" was run against the site's front page while under siege. Siege ran for ~5 minutes before starting each test to give time to populate the cache.

Database cache:

  • Total time to load 1,000 pages: 34.12544 seconds
  • Successful requests: 923 (92.3%)
  • Average pages per second: 29.38
  • Average time per page: 34.12 ms

File cache:

  • Total time to load 1,000 pages: 28.65989 seconds
  • Successful requests: 1,000 (100%)
  • Average pages per second: 36.1
  • Average time per page: 28.54 ms

File cache with FastPath:

  • Total time to load 1,000 pages: 5.23 seconds
  • Successful requests: 1,000 (100%)
  • Average pages per second: 191.12
  • Average time per page: 5.233 ms
Cvbge’s picture

What is fastpath?

killes@www.drop.org’s picture

How is this going to deal with sites that have non-public content?

scroogie’s picture

What is fastpath?

$file_cache_fastpath uses the new DRUPAL_BOOTSTRAP_FILE phase to serve file-cached pages directly to anonymous users without initializing Drupal

naudefj’s picture

Status:Needs review» Reviewed & tested by the community

I'm impressed! Everything seems to be working as advertised. I trust this will be committed ASAP.

robertDouglass’s picture

Off topic, but I'm wondering if the fastpath bootstrap, or something similar, could also be used for handling requests for private file downloads in order to avoid the full bootstrap just to serve a binary file?

Jeremy’s picture

Cvbge: What is fastpath?

FastPath is a new bootstrap phase that can serve file-cached pages without connecting to the database. This bypasses session management and _init/_exit hooks.

How is this going to deal with sites that have non-public content?

Are you referring to the private versus public download method? If you enable FastPath, pages that are cached will be displayed to anonymous users without session management or _init/_exit hooks being executed. If private downloads are not cached, then they will be accessed through the normal bootstrap process.

I'm sure there are some websites that require session management for anonymous users, or require _init/_exit hooks for anonymous users. At this time those sites can enable file-caching, but not FastPath. Enabling file-caching without FastPath still increases performance and reliability.

I'm wondering if the fastpath bootstrap, or something similar, could also be used for handling requests for private file downloads

It should be simple enough, but realize that you wouldn't have any session control nor _init/_exit hooks, so you'd not be able to offer any access control on the files from Drupal if you use the FastPath bootstrap.

markus_petrux’s picture

So the difference between "file cache" and "file cache with fastpath" is the former initializes sessions and invokes init/exit hooks? ...or are there more differences?

I believe many modules require sessions and/or init/exit hooks. For instance, statistics (including the content read counters) are recorded at hook_exit time.

Dries’s picture

Looks like the fastpath name is somewhat confusing. Maybe someone can think of a better name?

It might be worth testing database caching with a "fastpath".

bslade’s picture

Jeremy said:

Everything that normally was cached to the database is now cached to the filesystem. This includes pages, as well as the variable table, etc... We use the same key as is used when caching to the database (though we md5 it to make it filesystem friendly).

Ok, so it's a replacement of the Drupal database based caching scheme with a file based caching scheme. Just to review, this means:

  1. Logged in users do not use caching (as per cron and caching). Caching might still be very useful because, in my experience, it's the anonymous load that overwhelms big public sites.
  2. Only whole pages are cached with the requested URL being the key. This is probably fine for the initial version of this patch, but one thing to watch out for is spammers/hackers accessing the same page (er, requesting the same node) over and over again, but using a slightly different URL each time to try and guess things about the target site. This can kill a website with cpu/disk load by writing millions of spurious entries into the cache.

An "it would be nice someday once we thought about the implications" feature might be to cache based on the node being requested rather than the URL that invoked the request. Would it make sense to have an option to only cache simple node-request URLs? (eg. http://myurl.com/node/node_id) For the immediate implementation, is there any way to limit the size of the filecache?

Jeremy said:

with database cache: Average [web page access] time per page: 34.12 ms, with File cache with FastPath: Average [web page access] time per page: 5.233 ms

So a better than 6 times performance improvement. Respectable.

Did you happen to notice if you were CPU bottlenecked during this FastPath run? If not, then you might have been bottlenecked on I/O, which is actually sort of cool (it would mean you're running very efficiently). Also what architecture/system were you running on?

Dries said:

fastpath name is somewhat confusing. Maybe someone can think of a better name?

How about "BypassDrupalInitFileCache"? Disclaimers should be added that this might cause some Drupal statistics to not be recorded.

Other thoughts:

Would a possible future direction be to cache sub-components of a web page for logged in users? I think this might translate to caching not just on the whole requested URL/node, but also caching the results of specific call backs within a module for a node. Ie., change the key to access the cache from node (which is actually requested URL now), to node + callback + certain params. Eg. the most recent stories block which have a cached entry by node num, module name (which is "node"?), calling arg=most_recent_stories_requested ("view"?)

moshe weitzman’s picture

@bslade - caching of smaller sections of a page is already possible by the modules themselves using the cache API. and yes, we should consider doing more of this for expensive sections of code. lets not muddy this issue though with future enhancements.

Stefan Nagtegaal’s picture

What is wrong with "File-based cache"??

Jeremy’s picture

Status: new, FileSize: 21.49 KB

I did some more testing of this patch and found/fixed a few bugs. I'm also resyncing it with HEAD.

Changes:
- bugfix: always make $base_url available globally
- bugfix: remove calls to variable_get from DRUPAL_BOOTSTRAP_FILE, they're not available yet
- bugfix: don't display cached pages when $_POST is set, allows use of forms with fastpath enabled (such as leaving a comment)
- feature: delete expired cache files as detected with new system_exit function (before cache pages were only expired with _cron hook)
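Putting the pieces from this thread together, the fastpath decision might be sketched as below. This is not the patch's code: the function name, the 'uid' cookie name and the md5-named cache files are assumptions, and the treatment of a zero cache lifetime is left out.

```php
<?php
// Hypothetical sketch of the fastpath decision; the function name, the 'uid'
// cookie name, and the cache layout are assumptions, not taken from the patch.
function fastpath_lookup($url, $post, $cookies, $cache_dir, $lifetime) {
  // Form submissions (e.g. posting a comment) must go through a full bootstrap.
  if (!empty($post)) {
    return NULL;
  }
  // A uid cookie marks a logged-in user, who is never served a cached page.
  if (isset($cookies['uid'])) {
    return NULL;
  }
  $file = $cache_dir . '/' . md5($url);
  // Serve only cache files that exist and are no older than the cache lifetime.
  if (is_file($file) && (time() - filemtime($file)) <= $lifetime) {
    return file_get_contents($file);
  }
  return NULL;
}
```

A NULL result would mean "continue with the normal bootstrap", so the fastpath never has to touch the database or the session code.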

Dries: Looks like the fastpath name is somewhat confusing.

Another name that comes to mind is ShortCircuit. Unless you want something a lot longer and more descriptive.

Amazon’s picture

Status: new, FileSize: 87 bytes

If you are looking to test this patch and want to test writes as well as reads try this:

ab -p loadtest.txt -T "application/x-www-form-urlencoded" http://localhost/head/node/add/story

where loadtest.txt is the attached file.

nathandigriz’s picture

Will this patch work with 4.6.5?

olet’s picture

will this work with 4.6.5?
will this work with 4.6.5?
will this work with 4.6.5?
will this work with 4.6.5?
thanks!!!!

killes@www.drop.org’s picture

This is clearly labelled as development for HEAD, so don't bother asking.

Can we please have a module that disables accounts of users who use more exclamation or question marks than necessary?

gte451f’s picture

I might be missing something but if this patch is ready to be committed, am I right in assuming it is not included in the new 4.7 RC?
Does that mean that we have to install and patch up to current version?

Tobias Maier’s picture

yes, because we have a feature freeze for drupal 4.7 to get it out as fast as possible.

njivy’s picture

I like where this is going, Jeremy. I've been testing the patch with "fastpath" enabled on a low-traffic site without much trouble.

As I understand it, the file cache is currently cleared either
1) site-wide by a cron job or
2) per-page by hook_exit() when the next anonymous person visits an invalid page.

Since "fastpath" disables hook_exit(), the file cache is cleared only by cron jobs. Was this behavior intended? I find myself manually removing the cache files after updating a site because I'm impatient.

Jeremy’s picture

Status:Reviewed & tested by the community» Needs review

Hi Nick, thanks for testing!

Yes, the design is such that with FastPath enabled the cache is only cleared by a cron job. This is because clearing the cache involves the potentially time consuming task of scanning an unknown number of cache files and deleting some or all of them. If using FastPath, you will want cron to run very frequently - probably equal to what you've set your "minimum cache lifetime" to. For example, enabling FastPath, hitting cron.php every 5 minutes, and setting your minimum cache life to 5 minutes ought to offer very good performance.
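The cron-time scan described above might look roughly like this. It is a sketch under assumptions: file_cache_cron_cleanup() is a hypothetical name, the cache is taken to be a flat directory of files, and file modification time is used as the age of an entry.

```php
<?php
// Hypothetical cron-time cleanup: delete cache files older than $lifetime seconds.
function file_cache_cron_cleanup($cache_dir, $lifetime) {
  $deleted = 0;
  $files = glob($cache_dir . '/*');
  if ($files === false) {
    $files = array();
  }
  foreach ($files as $file) {
    // Skip subdirectories; remove only regular files past the cache lifetime.
    if (is_file($file) && (time() - filemtime($file)) > $lifetime) {
      if (unlink($file)) {
        $deleted++;
      }
    }
  }
  return $deleted;
}
```

With a 5 minute minimum cache lifetime, this would be called with 300 from a cron run every 5 minutes, so an expired page lingers for at most roughly one extra cron interval.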

njivy’s picture

This patch affects the $conf variable in settings.php.

During conf_init(), which occurs after the file caching code, the $conf variable is reset for security purposes. (Note, this is the new conf_init() and not the old conf_path()-style function.)

See http://drupal.org/node/59274 for a discussion.

gte451f’s picture

I already know the answer to this of course but for those following at home, what file do I patch with this patch?
I don't see a filecache file anywhere in the drupal install.

lennart’s picture

Very interesting. My host runs MySQL on a separate server which is often overburdened or the connection between the webserver and the database server is slow. This file-caching with fastpath would be a huge improvement for me. Is the patch up-to-date?

Jeremy’s picture

Status: new, FileSize: 19.9 KB

Resynch with HEAD. Also removed the role-based _init and _exit hook skipping configuration to just focus on getting the file cache and fastpath functionality into core for now.

Jeremy’s picture

I have backported the file-based caching patch to Drupal 4.6.6, hopefully allowing more people to test it out.

tayknight’s picture

What happens if the site isn't using cron, but is instead using poormanscron? If I'm using poormanscron and I enable ShortCircuit (or fastpath, or FileBasedCaching), and a registered user doesn't go to the site for a while, will the cache ever get expired?

Cool module. Btw. I've found at Dreamhost the SQL queries are definitely the bottleneck. For relatively static public websites, this is pretty cool.

moshe weitzman’s picture

if nothing visits your site, then the cache expiration is a moot point.

drumm’s picture

Status:Needs review» Needs work

I'd like to see more work on the settings page.

- It looks like the variable setup above the removed code becomes dead.
- There is no UI to verify that the settings in settings.php are working that I can see. I expect it would disable the MySQL page cache.

Jeremy’s picture

Status: new, FileSize: 23.78 KB

I've attached a new patch to resync with HEAD (required by changes to the system.module).

Addressing Neil's concerns I've also updated the cache settings page to reflect the fact that the page cache can be stored to the database or to the filesystem. Finally, I added a new "cache location" entry providing feedback as to the current cache configuration. Suggestions for better wording are welcome.

gte451f’s picture

This may be a stupid question, but what file do you patch? I am used to the file name (in this case filecache) being the file that needs to be patched but I can't find any existing filecache files.

Can anyone give me a hint as to how I can implement this on my brand new 4.7 install?

luperry’s picture

there seems to be an issue when anonymous poll voting is enabled.

if the cached page stores a page that shows the vote has already been cast, the next anonymous user won't be able to vote unless they go into the node. and if the cached page stores a page that shows the vote has not been cast, then those anonymous users who already voted won't be able to see the result.

another issue is, when the recent comment block is enabled, the timestamps on it don't get updated for anonymous users.

I wonder if there is a way to bypass these problems.

Taran’s picture

Hmm. Tried to apply this to 4.7 (reference patch in #47) - bootstrap.inc puked on trying to find db_active, for some reason... Weird.

drubeedoo’s picture

Tracking... this looks like a perfect solution for oversold shared hosting environments. Thanks so much, Jeremy, for tackling this issue!

laura s’s picture

I would be delighted to test this on 4.6.x and 4.7.x. As HEAD has moved past 4.7, I'm wondering if the patch on #47 would still be the appropriate one for 4.7, or should one of the earlier versions apply?

(PS - You guys are awesome!)

pwolanin’s picture

Is this related to the file-based caching scheme described in this comment? http://drupal.org/node/50243#comment-124723

pwolanin’s picture

I answered my own question by contacting Atro directly; it seems they are complementary approaches.

Jeremy’s picture

Status: new
FileSize: 23.51 KB

Attached is the file-based caching patch, resynched with Drupal HEAD. I've also done a little cleanup (fixing the cookie handling per an issue reported elsewhere, no longer modifying session.inc, and properly handling the fastpath when cache_lifetime is set to 0).

To make it easier for those that would like to test but aren't good with patches, I'm also including links to pre-patched tarballs:

Please test and post your feedback here.

Jeremy’s picture

Assigned: Unassigned » Jeremy
Status: Needs work » Needs review

Review would be nice. I'd love to get this functionality merged into core.

Dries’s picture

I'll try to do some proper benchmarking.

Dries’s picture

The only reservation I have with this patch is the "fast path" stuff. While it improves performance significantly, it is also dangerous. The number of people who could use this is probably less than one hundred; 99.5% of the users don't have the expertise to understand the implications. It would be nice to hear from some other people what they think about this.

Jeremy’s picture

Status: new
FileSize: 19.83 KB

I agree that some will be confused by the fastpath, and by the implications of enabling it. That said, I think it offers a very powerful performance boost to those that are managing relatively static sites with Drupal.

In the interest of getting the file cache itself merged into core, I've broken the patch into two pieces. The first patch adds support for file caching. The second patch should be applied after the first, and adds support for fastpath. Perhaps we could focus on what's involved in merging just the first patch for now?

Jeremy’s picture

Status: new
FileSize: 8.05 KB

Here's the second half of the filecache patch, which adds support for the "fastpath".

luperry’s picture

I hope this is not a stupid question, but can someone explain to me the shortcomings of "fastpath"? I understand it lets you load pages directly from the cache without making any DB queries, but why is it dangerous?

Jeremy’s picture

From a comment the patch adds to settings.php:

* file_cache_fastpath:
* When file_cache_fastpath is enabled, pages that have been cached to the
* filesystem will be displayed to anonymous users without making any database
* queries. This bypasses Drupal's session management, as well as all _init
* and _exit hooks. As always, logged in users will not be displayed the
* cached pages.

The full implications of bypassing Drupal's session management for anonymous users probably need to be further explored. Registered users are not affected by the fastpath.
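
To make the fastpath behaviour described above concrete, here is a minimal sketch in Python. It is an illustration only, not the patch's PHP code. The function names, the use of a session cookie as the "might be logged in" test, and the cache directory layout are all assumptions; only the md5-of-the-key file naming comes from the original description of the patch.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a filesystem-safe file name via md5 (hypothetical layout)."""
    return cache_dir / hashlib.md5(url.encode("utf-8")).hexdigest()


def fastpath_lookup(cache_dir: Path, url: str, has_session_cookie: bool,
                    fastpath_enabled: bool = True) -> Optional[str]:
    """Return the cached page, or None if a full bootstrap is needed."""
    # Assumption: a session cookie is the cheap test for "not anonymous".
    if not fastpath_enabled or has_session_cookie:
        return None
    path = cache_path(cache_dir, url)
    if not path.is_file():
        return None
    # No database connection, session handling, or hooks are involved here,
    # which is exactly why per-request side effects are skipped as well.
    return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)
        cache_path(cache_dir, "node/1").write_text("<html>cached</html>",
                                                   encoding="utf-8")
        print(fastpath_lookup(cache_dir, "node/1", has_session_cookie=False))
        print(fastpath_lookup(cache_dir, "node/1", has_session_cookie=True))
```

The early return for the cached file is the whole trade-off: anything that normally runs on every request, such as session setup or statistics gathered in _exit hooks, never happens for a fastpath hit.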

Dries’s picture

See also: http://drupal.org/node/67675.

Maybe we can create a cache.db.inc and a cache.fs.inc ? Still trying to wrap my head around it.

I agree that focussing on part of this patch is a lot easier. I'll do some performance benchmarks.

robertDouglass’s picture

Does the new bootstrap phase that you introduce help us in any way with the problem of serving private files? It seems to me that these two concerns intersect and that changing the bootstrap for one but not the other isn't in our interest.

robertDouglass’s picture

in the event of a failure to connect to the database, if the requested page exists in the file cache it will be displayed rather than displaying an error. (this means users can continue to browse your site even when the database is disabled for maintenance or too busy to process more requests)

I have to say that this alone is a massively valuable new feature.
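
The fallback quoted above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the patch itself: `DatabaseUnavailable`, `serve_page`, and the dictionary standing in for the file cache are hypothetical names chosen for the example.

```python
from typing import Callable, Dict


class DatabaseUnavailable(Exception):
    """Stand-in for a failed or refused database connection."""


def serve_page(url: str,
               render_from_db: Callable[[str], str],
               file_cache: Dict[str, str]) -> str:
    """Render a page normally, falling back to the file cache if the DB is down."""
    try:
        return render_from_db(url)
    except DatabaseUnavailable:
        # The database is busy or offline: serve the cached copy if one exists.
        if url in file_cache:
            return file_cache[url]
        # Nothing cached for this URL, so the error still surfaces.
        raise


if __name__ == "__main__":
    def broken_db(url: str) -> str:
        raise DatabaseUnavailable("too many connections")

    print(serve_page("node/1", broken_db, {"node/1": "<html>cached copy</html>"}))
```

Only pages that were cached before the outage can be served this way; a request for an uncached page still fails.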

robertDouglass’s picture

LOL, me and my bad memory. I just realized that I asked the same question twice (serving private files). So the question becomes, what are the solutions being proposed from the private-file bootstrap crowd (Walkah, where are you?) and are they compatible with Jeremy's changes, and should any common functionality be considered before going forward with one or the other?

lennart’s picture

I installed this using the tarball on a couple of sites. As expected the improvements are significant due to the fact that the database server is the weak link. I am not able to give any objective numbers since I cannot benchmark it, but my impression is that the improvements are very significant. That is my subjective test :D

moshe weitzman’s picture

my incomplete proposal for private files is at http://www.tejasa.com/node/113 ... i don't think it is helpful to cloud this issue further with discussion of private files. when someone wants to tackle that issue, they will change bootstrap if needed. my .02

chx’s picture

-1. I do not want this to become part of core. Instead, I propose http://drupal.org/node/67675, which will make it possible to push this into contrib. I am willing to work on that as much as needed.

Jeremy’s picture

I am not opposed to splitting the various caching methods into unique files. This is something we've talked about many times in previous years, but it seems there was always opposition. Getting your patch merged would be a step in the right direction.

That said, I am still very interested in getting file caching methods into core. It is a repeatedly requested feature, and has proven that it can be highly beneficial on busy websites. It is simple to configure, and the functionality is simple to maintain. In addition, some of the benefits of file caching still require modifications to core. For example, the filecache patch makes it possible for a page to be displayed even if the database server is too busy or is temporarily disabled for maintenance. This is provided through a patch to database.*.inc, and I think it's still important that this functionality can be provided.

Leeteq’s picture

Out of curiosity, partly off-topic, but relevant in terms of if such a module can cater for security concerns:

Let's say we have a site where there is some sensitive content that is organised into "closed groups". Not even (some of the) admins have/should have access to the secured content, and the concern for security also involves limiting which admins have access to the database tables (i.e. dba/phpmyadmin access/shell, etc.).

Would it be possible to _ensure_ by configuration that selected content is NOT cached in the file system?
(from certain "parts of the system", say - OG's or taxonomy terms, depending on which security modules are used)

Crosswinds’s picture

I am relatively new to Drupal (it won the choice I had to make for a CMS+ system).

I very much like the sound of this module (just hit it today) and FastPath in particular, with one caveat: most of my users are almost zealous when it comes to how many hits they are getting on their site/page, anonymous or not. If what I read is correct, FastPath will short-circuit the _exit hook and not update their view count... this will make a lot of them mad.

So I have to balance their wishes with my requirements. Are there any recommendations on how it would be possible to utilize most of the benefits of FastPath and still have it update the read count?

dvsouza’s picture

And what about people who prefer memcached instead of the filesystem? I think that we should modularize this instead.
Please take a look at my post at http://drupal.org/node/69206.
I prefer to use a cache server like memcached (if Wikipedia lives well with it, I think that I have nothing to lose... =P)

I agree that a filesystem cache is also a good thing (when you have fast discs+RAID, etc). It definitely pays off and keeps the database cool, but when you have a different server just for caching, with 2GB of RAM and Gbit between the servers (or just a crossover connection if you have just one webserver), things can get pretty exciting =) I'm not a fan of full-page caches (had very bad experiences), but I think that small caches (for example, caching the variables table, or caching _some_ lazy SELECT queries) may help a LOT. We have a Drupal website (lots of nodes and lots of taxonomy) that went from 8s to 0.8s just by caching the taxonomy stuff (with the default database cache system). In a website with a large number of visitors, I think that it doesn't really matter if you post something now and it shows up in the "recent news" only 2 minutes later, for the sake of being able to navigate through the website with a 0.8s delay instead of 8s for every hit.

In summary, I think that the cache should be modularized. =)
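
The "small caches" idea above, where an expensive lookup is allowed to be up to a couple of minutes stale, can be sketched like this. It is a generic Python illustration, not Drupal's cache API; `TTLCache`, its `get` method, and the injectable clock are hypothetical names used for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache computed values in memory for a fixed lifetime (in seconds)."""

    def __init__(self, lifetime_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime = lifetime_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, recomputing it once it has expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.lifetime:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(lifetime_seconds=120)

    def slow_taxonomy_query() -> list:
        # Stand-in for a slow SELECT; the real query is not shown in this thread.
        return ["news", "reviews"]

    print(cache.get("taxonomy", slow_taxonomy_query))
    print(cache.get("taxonomy", slow_taxonomy_query))  # served from the cache
```

The same interface could sit in front of a database table, a directory of files, or memcached, which is the point of modularizing the storage behind it.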

firstov’s picture

What caching mechanism do people use on shared hosting servers? If the database is the bottleneck then this utility should be a great help.
What is the current status of this patch?
Thanks

beginner’s picture

Jeremy: do you use this patch somewhere, on a live web site?

cwagar’s picture

I just took Jeremy's patches and rolled them against DRUPAL-4-7, and pushed them to our site. More information tomorrow, but the first hour is very, very, very good.

Since 80%+ of our traffic is anonymous, this is looking to be a great improvement. I'm in touch with Jeremy and will work to modularize this for the already-committed cache improvements in HEAD.

matt westgate’s picture

Status: Needs review » Closed (fixed)

Jeremy and I tag-teamed to turn this into a contributed module, since chx made it possible to do so without core hacks. You can see the results here.