Closed (works as designed)
Project: Boost
Version: 7.x-1.0-beta2
Component: Caching logic
Priority: Normal
Category: Support request
Reporter:
Created: 31 Mar 2013 at 08:42 UTC
Updated: 13 May 2013 at 18:51 UTC
Comments
Comment #1
Anonymous (not verified) commented:
Boost only works for anonymous users, so the first thing to do is log out, view a page, refresh it (in case the page was not cached on the first viewing), and then view the source. If Boost is working, there will be a comment at the bottom of the page saying when the page was cached and when it should expire.
If no comment appears, then you have a module that sets a user id (or your site is https), and Boost has disabled itself. (Boost turns itself off for things like login pages so that usernames are not cached if someone enters them incorrectly.)
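The "view source and look for the comment" check can be scripted. This is a minimal sketch, assuming the marker text is "Page cached by Boost" (an assumption; use whatever your Boost version actually writes). The function name and the sample HTML are made up for illustration.

```python
import re
from typing import Optional

# Assumed marker text; adjust it to what your Boost version really outputs.
ASSUMED_MARKER = "Page cached by Boost"


def find_boost_comment(html: str, marker: str = ASSUMED_MARKER) -> Optional[str]:
    """Return the last HTML comment containing the marker, or None."""
    comments = re.findall(r"<!--(.*?)-->", html, flags=re.DOTALL)
    for comment in reversed(comments):
        if marker in comment:
            return comment.strip()
    return None


# Made-up page sources, one with and one without the trailing comment.
cached = "<html><body>Hi</body></html>\n<!-- Page cached by Boost @ 2013-03-31 08:42:00 -->"
uncached = "<html><body>Hi</body></html>"

print(find_boost_comment(cached))
print(find_boost_comment(uncached))
```

Feed it the source of a page fetched while logged out; a result of None after a refresh suggests the page was not served from the Boost cache.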
You could easily set your Boost lifetime to a month and not have any further use for the setting. Every time you edit a page, Boost automatically wipes it out and regenerates it using the crawler, which never crawls the whole site; it only deals with pages that are edited or deleted, or related pages with links on them.
Since Boost does not deal with logged-in users, if the majority of your visitors are logged in, you would not see any improvement and should look at Authcache or another caching mechanism. Boost only deals with PHP pages being turned into HTML; the rest of your server resources are split among many things. One is network bandwidth, in which case you should be looking at your gzip settings (and making sure that you are not wasting CPU cycles by zipping already compressed items like images). Another is the amount of things you are trying to send the user in the first place. All of this can be analysed by installing Firebug in Firefox, or using the developer tools in Chrome, and selecting the network tab, which will give you a breakdown of what your page is doing. With Boost installed, the first page call will be "normal"; a second call to a cached page will be much faster, as the HTML is just sent out. Once that is out of the way, you can focus on what else is slowing down the site. Your CSS and JS files should be aggregated in Drupal's normal caching mechanism, and a long lifetime should be chosen for those.
Whatever your logging mechanism is, it should be analysed carefully to determine the bottleneck. High CPU usage can be database/PHP related, or come from server compression. If you have a lot of 404 errors, you could be the victim of dumb bots probing your site for vulnerabilities that don't exist, since they run through a list of Joomla, WordPress, Zen Cart, etc. exploits which are irrelevant but will use up your resources. You should certainly consider the Fast 404 Drupal module, which sends a static page out rather than hogging your db resources.
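To see whether bots probing for non-existent pages are a real part of the load, the 404s in an access log can be counted per path. This is a rough sketch, assuming Apache "common" or "combined" style log lines; the sample lines, addresses and function name are invented for illustration.

```python
import re
from collections import Counter

# Assumes lines shaped like: ... "METHOD /path HTTP/x.y" STATUS SIZE ...
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def count_404_paths(log_lines):
    """Count how often each requested path returned a 404."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "404":
            hits[match.group(1)] += 1
    return hits


# Invented sample lines; a real run would read the server's access log instead.
sample = [
    '203.0.113.5 - - [31/Mar/2013:08:42:00 +0000] "GET /wp-login.php HTTP/1.1" 404 512 "-" "bot"',
    '203.0.113.5 - - [31/Mar/2013:08:42:01 +0000] "GET /wp-login.php HTTP/1.1" 404 512 "-" "bot"',
    '198.51.100.7 - - [31/Mar/2013:08:42:02 +0000] "GET /en/ HTTP/1.1" 200 10240 "-" "browser"',
]

for path, count in count_404_paths(sample).most_common():
    print(count, path)
```

A long list of paths that belong to other CMSs is the pattern described above.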
Comment #2
palazis commented:
Thanks for the fast and detailed response.
Yes, more than 90% of my site visitors are anonymous, so I need the Boost module.
You are right - I can't find the comment on any page saying when the page was cached and when it should expire.
So boost is not working, perhaps I did something wrong in the configuration process.
This is probably why the site's statistics show a heavier server load than normal, since core's "Cache pages for anonymous users" is also disabled.
Installation steps.
1) Clean URLS: OK
2) Boost module enabled: OK
3) Cache pages for anonymous users is unchecked
4) Administer > Configuration > System > Boost > Boost Settings: Seems OK
5) Administer > Configuration > System > Boost > File System
Seems OK since I have a cache folder (permissions 0775) and inside that a normal folder.
Inside the normal folder I have many folders (en, el, de, etc), one for each language.
In there I can find lots of html files so I believe this step is OK
6) .htaccess modification. Using Notepad++ (I have LF now), the following code was added below # RewriteBase /
# RewriteBase /
### BOOST START ###
# Allow for alt paths to be set via htaccess rules; allows for cached variants (future mobile support)
RewriteRule .* - [E=boostpath:normal]
# Caching for anonymous users
# Skip boost IF not get request OR uri has wrong dir OR cookie is set OR request came from this server OR https request
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD)$ [OR]
RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add|comment/reply))|(/(edit|user|user/(login|password|register))$) [OR]
RewriteCond %{HTTPS} on [OR]
RewriteCond %{HTTP_COOKIE} DRUPAL_UID [OR]
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [S=7]
# GZIP
RewriteCond %{HTTP:Accept-encoding} !gzip
RewriteRule .* - [S=3]
RewriteCond /home/www/drupal7/cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.html -s
RewriteRule .* cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html,E=no-gzip:1]
RewriteCond /home/www/drupal7/cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.xml -s
RewriteRule .* cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.xml [L,T=text/xml,E=no-gzip:1]
RewriteCond /home/www/drupal7/cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.json -s
RewriteRule .* cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.json [L,T=text/javascript,E=no-gzip:1]
# NORMAL
RewriteCond /home/www/drupal7/cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.html -s
RewriteRule .* cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]
RewriteCond /home/www/drupal7/cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.xml -s
RewriteRule .* cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.xml [L,T=text/xml]
RewriteCond /home/www/drupal7/cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.json -s
RewriteRule .* cache/%{ENV:boostpath}/studiesinuk.net%{REQUEST_URI}_%{QUERY_STRING}\.json [L,T=text/javascript]
### BOOST END ###
What can be wrong???
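When debugging rules like these, it can help to restate the "skip Boost" conditions outside Apache. The sketch below mirrors the five OR-ed RewriteCond lines above in Python; the function and parameter names are mine, and it only approximates how mod_rewrite evaluates these patterns.

```python
import re

# Same alternation as the RewriteCond on REQUEST_URI in the pasted rules.
SKIP_URI = re.compile(
    r"(^/(admin|cache|misc|modules|sites|system|openid|themes|node/add|comment/reply))"
    r"|(/(edit|user|user/(login|password|register))$)"
)


def boost_is_skipped(method, uri, https="off", cookie="", redirect_status=""):
    """True when any OR-ed condition matches, i.e. the [S=7] skip rule fires."""
    return bool(
        not re.search(r"^(GET|HEAD)$", method)   # REQUEST_METHOD !^(GET|HEAD)$
        or SKIP_URI.search(uri)                  # REQUEST_URI pattern
        or re.search(r"on", https)               # HTTPS on
        or re.search(r"DRUPAL_UID", cookie)      # HTTP_COOKIE DRUPAL_UID
        or re.search(r"200", redirect_status)    # ENV:REDIRECT_STATUS 200
    )


print(boost_is_skipped("GET", "/en/"))
print(boost_is_skipped("GET", "/en/", cookie="DRUPAL_UID=1"))
print(boost_is_skipped("GET", "/cache/normal/studiesinuk.net/en/_.html"))
```

Any request for which this returns True never reaches the cache lookup rules, so a stray cookie or an https request is enough to bypass Boost.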
Comment #3
Anonymous (not verified) commented:
Do you have anything being created under the folder cache/normal? Also, I notice when visiting your site that there is an almost immediate redirect to /en/, which may be the problem for a rewrite path. There are several threads on multilingual module configuration with .htaccess examples. I would cut the .htaccess right back and only use HTML for the time being to debug it.
Also I noticed something.
http://studiesinuk.net/cache/normal/studiesinuk.net/ gives a 500 error. You may need to use the SymLinksIfOwnerMatch option, which is available in the dev version of Boost and is required for some web hosting services.
Comment #4
palazis commented:
My structure is like this:
cache-->normal-->studiesinuk.net-->en-->lots of html files there
Under normal I also have eshop.palazis.net, palazis.net, www.palazis.net, www.studiesinuk.net
I am not sure I understand what this immediate redirect to /en/ is.
I will also try the dev version.
Comment #5
Anonymous (not verified) commented:
When I went to the site by just pasting the domain from your .htaccess file into the browser, it immediately redirected to the /en/ path.
I have noticed a problem however. You should be able to access the cache files directly using the directory structure you mentioned
http://studiesinuk.net/cache/normal/studiesinuk.net/en/_.html
should give the cached index page, but if placed in the browser it is redirected to
http://studiesinuk.net/en/cache/normal/studiesinuk.net/en/_.html
and gives a 404 error, which is correct. It looks like there is something incorrect with the multilingual module, and I suggest you look through the threads; I believe the problem was solved for a Japanese site, if memory serves. It appears Boost is functioning correctly but the rewrite rules need tweaking.
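The file name `_.html` follows directly from the rewrite rules posted in comment #2: the request URI, an underscore, the query string, then the extension. Here is a small sketch of that mapping, using the paths from that .htaccess; the function names are mine.

```python
import os


def boost_cache_file(doc_root, host, request_uri, query_string="",
                     ext="html", boostpath="normal"):
    """Build the path tested by the RewriteCond ... -s lines in the pasted rules."""
    return f"{doc_root}/cache/{boostpath}/{host}{request_uri}_{query_string}.{ext}"


def would_be_served(path):
    """Mimic mod_rewrite's -s test: a regular file with size greater than zero."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


print(boost_cache_file("/home/www/drupal7", "studiesinuk.net", "/en/"))
# /home/www/drupal7/cache/normal/studiesinuk.net/en/_.html
```

If the printed path does not exist on disk for a page that has been visited anonymously, the rules will fall through to index.php every time.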
Comment #6
joeyb583 commented:
I'm having the same issue. All setup looks fine, the status report is good, and I copied the info to the .htaccess file, but I'm not getting the comment at the bottom of the source pages. Any help would be much appreciated. Here's the site link...
http://www.green-clinic.com.php53-2.dfw1-2.websitetestlink.com/
Comment #7
Anonymous (not verified) commented:
The pages are being generated. I can see that much, as the comment appears on
http://www.green-clinic.com.php53-2.dfw1-2.websitetestlink.com/cache/nor...
The Boost code is very specific about its location: it must be placed after
RewriteBase
and before the standard rewrite URL code. I cannot see any cookies being set that would override Boost. I'd certainly recommend a lifetime much greater than 1 hour.
Comment #8
joeyb583 commented:
That's interesting. Why would it not be outputting the Boost comment on the home page? Also, what would you recommend the max timeout be set to?
Here is the .htaccess code...
Comment #9
Anonymous (not verified) commented:
If you have the Drupal code in a virtual host configuration file, then that can also do the same thing. Boost itself is working, but your web server is not sending out the pages, so the rewrite rules are being missed for some reason. You may want to add a simple "deny from all" to your .htaccess file just to check that .htaccess is working. This kind of error normally comes down to one of two things: a module installed that logs a user in even if they are anonymous (and no DRUPAL_UID cookie is set on your site), or the rewrite rules being ignored.
As for the length of time: Boost can be set for a massively long time, especially if you have httprl, cache expiry and boost_crawler installed (a bad name for it, as it regenerates pages that are updated, inserted or deleted rather than crawling the site). Your anonymous users would then build a cache of static files, and the only reason ever to delete the cache would be if the site underwent style changes that you wanted to propagate to older pages.
Comment #10
joeyb583 commented:
I'll be honest, I'm not sure of either of those. We use Rackspace Cloud Sites for hosting, so I'm not sure about the virtual host configuration file.
I also have no clue about a module potentially logging a user in. How would I know or even check on that? I'm fairly new to this. I'm not familiar with httprl, cache expire or boost crawler either. I'm just looking for a solution to increase speed on all our sites, which will have mostly anonymous users. As you can see from that site, it's pretty slow. I really appreciate your help with this.
Comment #11
Anonymous (not verified) commented:
Your problem is the rewrite one. If you are using a cloud server then you almost certainly have a root login, and you probably need someone a little more experienced to go through your Apache configuration if you are not comfortable with editing files directly on the server. The site is rather slow, and the main speed loss is from processing the PHP; I can see that looking at it through Firebug. If you rename your .htaccess file (you must remove the .ht at the beginning) and Drupal still works, then there is configuration elsewhere that has set up your system.
Comment #12
joeyb583 commented:
Yeah, I'm comfortable doing that. I've been developing front-end stuff for a while, just not much Drupal and PHP, and server administration is not my cup of tea.
When I was setting up clean URLs, I had to edit the RewriteBase in the .htaccess file. I added some dummy text at the end of it to make sure the site was using that file, and it broke the site, if that tells you anything. Could it still be using rewrite configurations elsewhere?
Comment #13
Anonymous (not verified) commented:
It is very likely that it is using rewrite configuration elsewhere. I've seen it a few times, including on the websites of companies with multi-billion dollar turnover :) If the site continues to function then that's the issue. If not, you'll need to enable a rewrite debug log file (RewriteLog in Apache 2.2) to see what the problem is, which can only be done by editing the main Apache configuration and turning it on and off again.
Comment #14
joeyb583 commented:
Gotcha. I'll chat with some Rackspace server admins tomorrow and get them to take a look and see what we can't figure out. I'll update accordingly. Thanks again.
Comment #15
joeyb583 commented:
Not much help from those guys. Let's go back to this...
order allow,deny
deny from all
What is that and what does it do?
I also came across this...
http://drupal.org/node/1888588
One guy specifically mentions the vhost here...
http://drupal.org/node/1888588#comment-6939236
Comment #16
Anonymous (not verified) commented:
The deny from all statement is just a test to see if your .htaccess is working. If it is, your site would stop with a 403 Forbidden error. The same test can be achieved by renaming the .htaccess file and seeing if the site still works.
Comment #17
joeyb583 commented:
Yeah, I just did it and I got the "Forbidden, you do not have access" message. Does that mean it's not being overridden elsewhere?
Comment #18
Anonymous (not verified) commented:
No, it just means that the .htaccess file is being read. You need to remove or rename the file and see if the site still works to know whether there is configuration elsewhere.
Comment #19
joeyb583 commented:
Well, I just renamed the file to access and the site still works, but the clean URLs did not. When I added ?q= to the path, it did.
Comment #20
Anonymous (not verified) commented:
You are going to have to read up about enabling the rewrite debug log in your virtual host/Apache config to try and work out what's going on. If clean URLs aren't working with a disabled .htaccess, then the root .htaccess is probably in control and you'll need the debug log to work out why the Boost rules are being ignored.
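The two tests used so far (adding deny from all, then renaming .htaccess) lead to three possible conclusions. The sketch below just writes that reasoning down as a function; the name, parameters and wording of the results are mine, and it only summarises the advice given in this thread.

```python
def diagnose_htaccess(deny_all_blocks_site: bool, clean_urls_work_when_renamed: bool) -> str:
    """Summarise the two .htaccess tests discussed in this thread."""
    if not deny_all_blocks_site:
        # The deny from all line had no effect, so Apache never read the file.
        return ".htaccess is not being read at all"
    if clean_urls_work_when_renamed:
        # Clean URLs survive without the file, so something else rewrites them.
        return "rewrite rules are also configured elsewhere, e.g. in a virtual host file"
    return ".htaccess controls the rewrites; a rewrite log is needed to see why the Boost rules are skipped"


# The situation reported above: Forbidden with deny from all, no clean URLs after renaming.
print(diagnose_htaccess(deny_all_blocks_site=True, clean_urls_work_when_renamed=False))
```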
Comment #21
joeyb583 commented:
Hey man, I just figured it out. I stumbled upon this article in the Rackspace knowledge base, and step #10 was the key...
http://www.rackspace.com/blog/optimizing-your-drupal-site/
Now that I've got that, what would you suggest the length be set to?
Thanks for all your help.
Comment #22
Anonymous (not verified) commented:
That article is not quite correct. For one thing, you need to disable page caching (the other pages are fine): under Performance, turn off the check box for
Cache pages for anonymous users
I also do not understand the comment about length.
Comment #23
joeyb583 commented:
Yeah, I did that. I just skimmed it until I saw something different, and step #10 is what stood out.
The length was referring to the length to set the cache to. Here's the site link...
http://www.green-clinic.com.php53-2.dfw1-2.websitetestlink.com/
Comment #24
Anonymous (not verified) commented:
Boost cache length is entirely dependent on what else you have installed. If you have the crawler component installed, then any page that is edited, updated or deleted will be regenerated, so the cache length can be infinite if you untick the box "remove old files on cron", which is my personal preference. However, if you are making major stylistic changes then you are going to want to alter that, because your cached pages would still have "the old layout".
Cron in Drupal 7 is pretty much automatic as long as an authorised user logs in or someone hits a page that Boost does not cache. How often it should run depends on the frequency of updates to the site, and it is really most useful for search indexing. Although the crawler does use cron, it has already expired the pages, so an anonymous user could regenerate them before cron gets around to it; fresh content is always going to be provided. The other aspect of cron is checking for available updates. That's a security issue, so cron should run at least once a week even if nothing else is happening to the site.
So the solution to the issue seems to be that your %{DOCUMENT_ROOT} does not match the filesystem on your cloud server. This would have been diagnosed with the rewrite log enabled, as it would have given you the path that mod_rewrite was looking for in the filesystem.
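Without access to the rewrite log, a mismatch like this can be narrowed down from a shell on the server by testing which candidate root actually contains the cache file. This is a rough sketch; the candidate paths, host name and function name are hypothetical, and the `cache/normal/` layout is the one used in the rules earlier in the thread.

```python
import os
import tempfile


def find_cache_root(candidate_roots, host, request_uri, query_string=""):
    """Return the first root under which the Boost cache file exists and is non-empty."""
    relative = f"cache/normal/{host}{request_uri}_{query_string}.html"
    for root in candidate_roots:
        path = os.path.join(root, relative)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return root
    return None


# Demonstration with a throwaway directory standing in for the real document root.
with tempfile.TemporaryDirectory() as real_root:
    cached = os.path.join(real_root, "cache/normal/example.com/en/_.html")
    os.makedirs(os.path.dirname(cached))
    with open(cached, "w") as handle:
        handle.write("<html></html>")
    # Hypothetical: the first entry is what Apache reports, the second is the real location.
    roots = ["/path/reported/by/apache", real_root]
    found = find_cache_root(roots, "example.com", "/en/")
    print(found == real_root)
```

Whichever root matches is the one the RewriteCond lines need to test against, whether through %{DOCUMENT_ROOT} or a hard-coded path.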
Comment #25
joeyb583 commented:
I will look into the boost crawler. It seems like the more efficient way to go, so the pages are cached automatically upon expiration rather than requiring a user to load the page again before caching (assuming I'm understanding that correctly).
So with the "remove old files on cron" option, if it's enabled Boost will delete the expired pages when cron is run?
If it's disabled and using the crawler, the crawler will automatically regenerate the cached file upon expiration so the user doesn't need to hit the page first in order to cache it?
Am I understanding that correctly?
In regards to the rewrite log, wouldn't that need to be enabled in the server configurations? That's what I've read. If so, I don't have access to those settings, nor am I familiar with them.
Comment #26
Anonymous (not verified) commented:
Yes, otherwise you have to rely on cron deleting the pages and anonymous users creating the cache.
Yes, but I find there is little point in deleting stale cache files, apart from saving disk space, unless the layout has changed, since you then still require an anonymous user to visit the page to re-create the cache. There is a theoretical advantage that the db tables would be "in memory" for non-cached pages, but those tend to be few and far between.
No, not on expiration, on modification. Plus, if the title is changed and the page appears in blocks on other pages, then the other pages are also deleted and then regenerated.
Yes, but it is unusual to have a cloud server with no access to the configuration; that's more like cheap shared hosting.
Comment #27
joeyb583 commented:
Awesome. Installing Boost crawler now. Thanks for all your help on this. Definitely learned a lot. Top notch support and assistance.
Comment #28
joeyb583 commented:
Last question. So I need the expire module for this to work as you described? If not, what happens?
Comment #29
Anonymous (not verified) commented:
If you don't have expire then Boost will run on cron and you'll need to have the setting turned on for "expire pages on cron run". This would only affect anonymous users, but they could get way out of date pages where comments never updated.
One interesting point. If you put your site into maintenance mode, then anonymous users will still see "the site" as the rewrite rules hit the boost cache before going to index.php so the cache needs to be flushed if any major correction work needs to take place.
Comment #30
joeyb583 commented:
So to make sure I understand correctly, the cache either gets cleared when cron is run or when the expire module detects a content update of some sort. At that point, I've got the boost crawler and httprl modules installed, which will automatically regenerate the cached pages that were cleared without a user having to hit the page. Is that correct?
In regards to the maintenance mode, that makes sense. Put the site in maintenance mode and then flush all cache so the users will see the maintenance page.
Comment #31
Anonymous (not verified) commented:
Exactly.
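The behaviour agreed on above can be captured in a toy model. This is not Boost's code, only a sketch of the rules as described in this thread: anonymous visits create cache files, content updates flush and regenerate the affected pages when the crawler is enabled, and cron only deletes expired files if "remove old files on cron" is ticked. All names are invented.

```python
class ToyBoostCache:
    """Toy model of the behaviour described in this thread, not Boost's actual code."""

    def __init__(self, crawler_enabled=True):
        self.crawler_enabled = crawler_enabled
        self.pages = {}  # path -> expiry timestamp

    def anonymous_visit(self, path, now, lifetime):
        # An anonymous hit on an uncached page generates the static file.
        self.pages.setdefault(path, now + lifetime)

    def content_updated(self, path, related, now, lifetime):
        # The edited page and related pages are flushed; the crawler regenerates them.
        for page in [path, *related]:
            self.pages.pop(page, None)
            if self.crawler_enabled:
                self.pages[page] = now + lifetime

    def cron(self, now, remove_old_files):
        # Expired files are only deleted when "remove old files on cron" is ticked.
        if remove_old_files:
            self.pages = {p: exp for p, exp in self.pages.items() if exp > now}


cache = ToyBoostCache()
cache.anonymous_visit("/en/about", now=0, lifetime=3600)
cache.cron(now=7200, remove_old_files=False)
print("/en/about" in cache.pages)  # True: the stale file is kept
cache.cron(now=7200, remove_old_files=True)
print("/en/about" in cache.pages)  # False: cron removed the expired file
cache.content_updated("/en/news", related=["/en/"], now=7200, lifetime=3600)
print(sorted(cache.pages))  # ['/en/', '/en/news']
```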
Comment #32
joeyb583 commented:
Awesome. Last question, for real this time. What do you recommend for cron settings and the internal performance settings, mainly minimum cache lifetime and expiration of cached pages? Or do those even matter since we're using Boost and expire?
Comment #33
Anonymous (not verified) commented:
Difficult subject; it depends on many things.
You need cron running at least once a week for update checking, possibly more often if there is a large amount of new content every day, as cron also controls your search indexing.
Comment #34
joeyb583 commented:
Yeah, I don't foresee it changing much. I'll set it to run once a week.
What about the internal performance settings? I updated the last post maybe after you read the initial one. Any suggestions there?
Comment #35
Anonymous (not verified) commented:
If you have the crawler, then cache everything for a long, long time, turn off the cron setting that removes stale files, and only ever clear the cache if the style of the site changes. The crawler would handle any updates. The only other thing that might need changing would be if you had the Google tracking modules enabled and they changed the JavaScript code, which would then require clearing out the old cache. But it will drop your CPU usage etc. down to virtually nothing and speed up everything, especially for spiders indexing old information, which can be a big drain.
Comment #36
joeyb583 commented:
Gotcha. What about the built-in performance settings, minimum cache lifetime and expiration of cached pages? Do those matter since I'm using Boost? The max for those is 1 day.
Comment #37
Anonymous (not verified) commented:
You can ignore those settings for the pages but not for the blocks (though any performance increase is going to be very minimal). That highlights a gap in my knowledge, as I do not know if those settings affect the CSS and JavaScript aggregation functions inside Drupal.
Comment #38
joeyb583 commented:
So I've updated the order of menu items, and upon hitting the site in a different browser, not logged in, it's not updating. Does the cache expiry and regeneration happen upon update, or is it basically "flagged" and queued until cron is run?
Comment #39
joeyb583 commented:
I tried running cron and it didn't work. Once I cleared the cache, it updated the menu. Is this because this is more than just content?
Comment #40
Anonymous (not verified) commented:
Yes, a menu is a block, so it is controlled by Drupal's caching mechanism, not Boost's. Boost caches the output to the browser at one instant. If you move your links around or play with stylesheets, Boost will either not find the new styles or the links will be the old ones, just as if you had saved a page to your hard drive.
Comment #41
joeyb583 commented:
Gotcha. So here's my setup; tell me what you think. I believe I'm at a good level of understanding...
Performance Settings
Cache Blocks
Minimum cache lifetime left to .
Expiration of cached pages left to .
Aggregates both CSS and JS files.
Cache Expiration
Left all defaults
Boost Settings
text/html - Max cache lifetime set to 12 months 4 days
text/html - Min cache lifetime set to 12 months 4 days
Boost Cache Configuration
Uncheck remove old cached files on cron
Boost Crawler
Checked Enable the cron crawler
How does that look to you?
Comment #42
Anonymous (not verified) commented:
That all looks fine as long as you remember that menus are blocks and that themes work exactly the same way: Boost will show the "old one" unless content is updated. It is only my personal opinion that cron should be unticked, but it's based on seeing sites that are spidered and use thousands of CPU cycles to regenerate old content, which can run into thousands of pages.
Comment #43
joeyb583 commented:
What do you mean specifically by untick cron? You mean the general cron setting?
Also, I went home last night and hit the site from there and it wasn't cached using boost. When I went to another page and then back to the home page, I could see it was using the boost cached pages. I thought the cached pages were automatically generated using boost crawler and httprl? Is that not right? If it should generate on cron run, it's set at every 3 hours so it should have run by the time I hit the page last night. I'm a bit confused by this.
Comment #44
Anonymous (not verified) commented:
Look through the Boost settings and there is a tick box, "remove stale files on cron". I don't think that should be ticked.
The crawler only crawls for changes; anonymous users generate the cache. It's a frequent naming error.
Comment #45
joeyb583 commented:
Yes, I did uncheck that box.
I guess the user has to generate the cache initially, huh? No other way to do that?
Comment #46
Anonymous (not verified) commented:
There are lots of ways of doing it, from paid-for systems to submitting your site to a search engine, which will spider it. There's not much when you're on shared hosting, but there is a thread somewhere here on some PHP that people have put together to mimic a spider.
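The "mimic a spider" idea amounts to requesting every page once as an anonymous visitor so that the cache files get written. Below is a minimal sketch in Python rather than PHP, assuming the site exposes a standard sitemap (an assumption; this thread does not say it does). The function names and sample URLs are invented, and the demo uses a fake fetcher so nothing is requested over the network.

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def fetch_status(url):
    # No cookies are sent, so the request looks like an anonymous visitor.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.status


def warm_cache(urls, fetch=fetch_status):
    """Request each URL once and record the status code or the error text."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as error:  # URLError and HTTPError are OSError subclasses
            results[url] = str(error)
    return results


# Invented sample sitemap; the fake fetcher avoids real network traffic in this demo.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/en/</loc></url>
  <url><loc>http://example.com/en/about</loc></url>
</urlset>"""

urls = urls_from_sitemap(SAMPLE)
print(urls)
print(warm_cache(urls, fetch=lambda url: 200))
```

Dropping the `fetch=` argument makes it issue real requests, one at a time, which is gentler on shared hosting than a parallel crawl.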
Comment #47
joeyb583 commented:
Yeah, I think I'll stay away from that. What are your thoughts on the Boost expire module?
Comment #48
Anonymous (not verified) commented:
As far as I remember, it serves the same function as the crawler and the cache expiration module, so it is not a core part of Boost. It is unlikely that Boost would be directly integrated with cache expiration, as other modules rely on cache expiration. The change from Boost 6 to Boost 7 was a total redesign that took a very large piece of code that was quite difficult to manage and split it into discrete chunks. I doubt that one would go backwards, and I suspect that boost_expire was created without fully understanding how the crawler functions, as it appears to be a duplicate project.
Comment #49
joeyb583 commented:
Gotcha. I did read that but just wanted your opinion on it, as I have a pretty knowledgeable co-worker using it and figured he just wasn't aware of the situation you described. I think I've got this figured out, and I appreciate all your help and patience with me.