Closed (fixed)
Project:
Boost
Version:
6.x-1.x-dev
Component:
User interface
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
4 May 2009 at 18:40 UTC
Updated:
13 Aug 2009 at 17:47 UTC
Comments
Comment #1
mikeytown2 commented
This is what I've come up with after looking over the code.
Its root is Boost. I'm 95% sure it is based on Alpha1.
A database table controls everything. It has a nice-looking GUI, but using it on over 1,000 URIs could be a little daunting. Integration with Views would be an interesting approach. Referer is an interesting feature as well. The best idea for this is VBO & Actions. I see the light; this will be awesome once the wildcard issue #443736: Smarter boost_cache_expire(). Path Changes & Wildcard Support gets fixed.
Code base has been simplified, some of the baggage from the previous way (4.7) of doing things is gone. (Step in the right direction)
Any code that is hard to understand has been removed (no regular expressions anywhere). (Might be a good idea, if feasible)
URLs are handled differently; queries are not in the rewrite rules. Uses $_SERVER['REQUEST_URI'] to set $path; option to not cache URL variables. ('REQUEST_URI' is not a bad idea, might fix rewrite complexities with subdir installs.) Only 3/6 rules instead of 4/8.
Looks like better multi-site controls.
Smarter cache control vs the block system that we use.
Somewhat clearer module page; you know what it does and how it works.
In short, there are some good ideas in this module. I just wish the author had dropped a line on the Boost issue queue, because it looks like this took him a lot of time to do. There are also most likely some bugs in here because of the Alpha1 base.
Comment #2
mikeytown2 commented
Hook into http://drupal.org/project/filebrowser for the cache directory structure if feasible, for general cache management #454070: Hooks/Actions. FTP-like control of the cache, if it works.
VBO will come first though, actions are fairly easy to write.
Comment #3
sebyoga commented
Hello,
I deliberately used $_SERVER['REQUEST_URI'] to keep the management of cache files simple. If a site is 100% cached, the cached files could easily be served on their own. That would be very practical for a company's internal content management system where the back office lives on a different server from the front end.
It is clear that a GUI for managing over 1,000 cached URLs is not very simple; that is why a small search engine is built in.
My aim is not to take the place of the Boost module, but to offer a second approach to caching, depending on the needs of the user.
Shall I write up the list of my module's features?
Best regards,
Sébastien
Comment #4
mikeytown2 commented
The best method for pushing content IMHO would be something like http://drupal.org/project/html_export OR http://drupal.org/project/savetoftp; it allows you to hide the server and have full control of the site. The disadvantage of those 2 projects is the lack of the query strings being cached & the need to manually publish the content. Both Boost & Cache Static have the potential to take care of the query string issue, in regards to hiding your Drupal server.
Using $_SERVER['REQUEST_URI'] is a smarter way to do it, if you're going to cache the query string. Boost is working in this regard, so this cosmetic change is fairly low on the priority list.
I would rather hook into an existing system for cache organization/management than reinvent the wheel. What are your thoughts about VBO & #368366: Boost Block - Button: Flush Page's Cache?
Why make 2 slightly different modules when we can have 1 superior way of doing it? The core of both modules works the same for the most part; time & effort was wasted IMHO. These changes could have been added on to Boost from the beginning.
Did I miss any other new features/behaviors of your module?
Please write patches/code for Boost; not a lot of people can develop/maintain the complexities that surround a file (HTML) based cache. You have some great ideas, and the drive to get it done :)
Comment #5
sebyoga commented
I think we should keep it simple but effective. We always know the URL of the current page (via $_SERVER['REQUEST_URI']), so we are able to empty the cache for that page.
When a new page is added there is no problem: it will be cached on the next visit to the page.
For parameters, with $_SERVER['REQUEST_URI'] the page domain.tld/cat/cat/node?toto=Good is cached as domain.tld/cat/cat/node.html, and the .htaccess rules accept GET parameters.
You want a block to control the cache?
If the page is served from the cache, how do you see the block when the cached page was generated without it?
In cacheStatic, I use JavaScript to fetch the toolbar, and only if the user has permission.
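The URL-to-file mapping described in this comment can be sketched roughly as follows. This is a standalone Python illustration, not code from either module; the function name `cache_path_for`, the `cache` root directory, and the `index` name for the front page are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit


def cache_path_for(request_uri, host, cache_root="cache"):
    """Map a request URI to a static cache file path, ignoring the query string."""
    # urlsplit separates the path from "?toto=Good"; only the path is kept.
    path = urlsplit(request_uri).path.strip("/")
    # The front page has an empty path; "index" is an assumed file name.
    name = path if path else "index"
    return posixpath.join(cache_root, host, name + ".html")
```

With this sketch, `cache_path_for("/cat/cat/node?toto=Good", "domain.tld")` gives `cache/domain.tld/cat/cat/node.html`, so every query-string variant of a page shares one cache file. A real implementation would also need to reject paths containing `..`.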
Comment #6
mikeytown2 commented
A simple way to add fine-grained cache expiration is to add a setting to every page, exactly as Meta Tags does. That, along with the minimum cache lifetime on the performance page, can be used with touch()'s $time argument to control the expiration. Changes go into boost_cache_write(), plus hook_form_alter() for the UI; lift code from Meta Tags for the UI.
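One possible reading of the touch() idea is to store each file's expiry time in its modification time, so the file itself carries its own lifetime. The sketch below assumes that interpretation; it is Python rather than Drupal PHP, and `write_with_expiry` and `is_expired` are hypothetical names, not Boost functions.

```python
import os
import time


def write_with_expiry(path, html, lifetime, now=None):
    """Write a cache file and record its expiry time as the file's mtime."""
    now = time.time() if now is None else now
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    expires = now + lifetime
    # Equivalent in spirit to PHP's touch($file, $time): set atime and mtime.
    os.utime(path, (expires, expires))
    return expires


def is_expired(path, now=None):
    """A file is stale once the clock passes the mtime stored at write time."""
    now = time.time() if now is None else now
    return os.path.getmtime(path) <= now
```

The per-page setting would then only need to supply `lifetime`; the cleanup pass compares the mtime against the current time and needs no other bookkeeping.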
Comment #7
mikeytown2 commented
Been thinking about this more. _boost_rmdir_rf() traverses the entire cache/example.com & cache/gz/example.com dirs, looking at the timestamp of each file; if it is past the cache expiration, it deletes the file. This is slow, so I'm going to switch Boost over to database operations.
File - Path to file - Primary Key
Created - Time of creation
Expiration - Time of expiration
Related - Serialized array of files that should be expired along with this one (planning for the future).
Going to keep _boost_rmdir_rf() for cache flushing, so everything gets cleared (even if the file is not in the DB anymore). Speed shouldn't be any different since everything needs to get nuked in this case.
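A rough sketch of the database-backed approach, using Python's sqlite3 so it runs standalone. The table and column names follow the list above but are assumptions about the eventual schema, and the helper names are hypothetical.

```python
import sqlite3


def open_cache_db():
    """Create an in-memory table tracking one row per cached file."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE boost_cache ("
        " filename TEXT PRIMARY KEY,"
        " created INTEGER NOT NULL,"
        " expire INTEGER NOT NULL)"
    )
    # The index lets cron find stale rows without scanning the whole table.
    db.execute("CREATE INDEX boost_cache_expire ON boost_cache (expire)")
    return db


def record_file(db, filename, now, lifetime):
    """Insert or refresh the row for a file that was just written."""
    db.execute(
        "INSERT OR REPLACE INTO boost_cache VALUES (?, ?, ?)",
        (filename, now, now + lifetime),
    )


def expired_files(db, now):
    """Return the files cron should delete, found by one indexed query."""
    rows = db.execute(
        "SELECT filename FROM boost_cache WHERE expire <= ? ORDER BY filename",
        (now,),
    )
    return [row[0] for row in rows]
```

Cron then deletes only the files returned by `expired_files()` instead of stat-ing every file under the cache directory.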
Comment #8
mikeytown2 commented
Comment #9
mikeytown2 commented
Wrong post...
Comment #10
mikeytown2 commented
Comment #11
mikeytown2 commented
Prototype schema
'length' => 2047 comes from http://www.boutell.com/newfaq/misc/urllength.html
Comment #12
mikeytown2 commented
First step:
Boost & Core have different cache expiration times.
Block displays # of pages in cache.
Cron uses DB so you should be able to run cron more often on large sites.
This should speed up boost.
Comment #13
yhager commented
subscribing
Comment #14
mikeytown2 commented
Some thoughts on the schema
CID: int - Primary Key;
filename: text - normal; varchar doesn't go up to 2047
created: int; currently redundant, may drop
expire: int;
duration: int; used to set the expire time. Will allow for very fine grained control of the cache
push: int - tiny, used as bool, default true; pre-cache this page via crawler
Holding off on the related field since I don't see a smart way to set it right now.
Each page will be able to set the duration & push via the block and/or VBO actions and/or a setting that's on every page.
Comment #15
mikeytown2 commented
Committed the non-cache parts.
Comment #16
mikeytown2 commented
If and only if using the above patch, run this once, then re-run Update 6100 after applying this attached patch to the latest dev
Going to stick with varchar 255 for now.
Comment #17
mikeytown2 commented
Need to save the state of lifetime & push in the database. Currently gets overwritten with defaults.
lifetime default set to -1.
push default set to -1; add setting to select default pre-caching.
If -1 then use default setting.
Need to add in a boost reset button that calls this
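The -1 convention above can be sketched as a small lookup: a stored -1 means "no per-page override, use the site-wide default". This is an illustrative Python sketch; `USE_DEFAULT`, `effective_setting` and `effective_page_settings` are hypothetical names, and the row is modelled as a plain dict rather than a database record.

```python
USE_DEFAULT = -1  # sentinel stored in the DB when no per-page value is set


def effective_setting(stored, default):
    """Return the per-page value, or the site default when the sentinel is stored."""
    return default if stored == USE_DEFAULT else stored


def effective_page_settings(row, defaults):
    """Resolve lifetime and push for one page; missing keys count as -1."""
    return {
        key: effective_setting(row.get(key, USE_DEFAULT), defaults[key])
        for key in ("lifetime", "push")
    }
```

Because -1 is never a meaningful lifetime or boolean, changing the site default later automatically affects every page that has not been given its own value.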
Comment #18
mikeytown2 commented
Display # of pages for each cache & ability to flush entire cache (core or boost) on performance page.
Comment #19
mikeytown2 commented
If and only if using one of the above patches, run this once, then re-run Update 6100 after applying this attached patch to the latest dev
This patch will:
Save the state of lifetime & push in the database.
Boost reset button in advanced settings on performance page.
Display # of pages for each cache & ability to flush entire cache (core or boost) on performance page.
Reorganized performance page.
Boost block works off of database.
Comment #20
mikeytown2 commented
Need to store page_callback in the DB as well. page_callback is the function that is called to generate the page. This will allow Boost to set the expiration time for each content type. Example: views set at 10 min, nodes set at 12 hrs. Stuff like that.
#453908: Hook for panel node types - expiration of static cache; panel containing multiple nodes
Comment #21
mikeytown2 commented
Should add in a page_callback_id column. Storage: for views, the vid; nodes, the nid; panels, the pid; etc.
Should add in a page_callback_id_name column. Storage: for views pages, it contains the name from the views_view table; nodes, the type; panels, the name; etc.
Taxonomy doesn't use page_callback, so put taxonomy_vid in there; then page_callback_id is the tid and page_callback_id_name is the name.
Comment #22
mikeytown2 commented
Going to go for this
This will act as a helper for setting the content expiration (something to query).
Views with the system name * can be set for 10 min.
Nodes with the content type * can be set for 2 hrs.
Default (everything else) can be set at 12 hrs.
I might need to add in a second DB table so the admin can set the cache lifetime and new content of that type inherits these settings.
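The lookup this comment describes could be sketched as below: try an exact (kind, name) entry, then the wildcard for that kind, then the global default. The values mirror the examples above (10 min, 2 hrs, 12 hrs); the dict layout and the name `lifetime_for` are assumptions for illustration, written in Python rather than Drupal PHP.

```python
DEFAULT_LIFETIME = 12 * 3600  # everything else: 12 hrs

# Keys are (kind, name); "*" matches any name of that kind.
LIFETIMES = {
    ("view", "*"): 10 * 60,   # views: 10 min
    ("node", "*"): 2 * 3600,  # nodes: 2 hrs
}


def lifetime_for(kind, name, lifetimes=LIFETIMES, default=DEFAULT_LIFETIME):
    """Return the cache lifetime in seconds, most specific setting first."""
    for key in ((kind, name), (kind, "*")):
        if key in lifetimes:
            return lifetimes[key]
    return default
```

A second table of admin-defined entries, as suggested above, would simply populate `lifetimes`; new content of a given type then inherits the wildcard entry without any per-page row.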
Comment #23
mikeytown2 commented
Code that will set page_callback & page_arguments in the boost DB table.
Comment #24
mikeytown2 commented
If and only if using one of the above patches, run this once, then re-run Update 6100 after applying this attached patch to the latest dev
This patch will store page_callback & page_arguments in the DB. New table called boost_cache_settings.
Comment #25
mikeytown2 commented
Committed above patch.
Next step is the interface so admin can set expiration times per content type. If you manually add/change things in the DB, it should work as expected.
Comment #26
mikeytown2 commented
Comment #27
mikeytown2 commented
Comment #28
mzytaruk commented
Will this feature affect cases where multiple front ends are using the same database and have to expire their caches?
Comment #29
mikeytown2 commented
Interesting point...
boost_cache_db_expire() will expire all files if they are in the same database, which isn't necessarily a bad thing... boost_expire_cron will have the same settings since it is using the same database.
boost_cache_clear_all() (what happens when you click a flush all btn) will reset the entries in the database, but will not clear all the files. So this is a slight problem for the admin because it might say this page is not cached when in fact it is.
Long story short, it's a slight issue IFF you're running a multisite with one shared database; the issues have to do with what the admin sees. Boost will not clear the other sites' cached files, only the database entries, because it does this:
db_query("UPDATE {boost_cache} SET expire = %d", 0);
Looks like I need to add a new column to the DB just for the site name. I think I'll use the base URL as another index; it will help out with the crawler then.
Comment #30
mzytaruk commented
I was more wondering about using a front end server farm, rather than a multisite installation. Since each front end has to expire its own cache, will one of the front ends update the DB when it expires its cache and cause the other servers to not expire theirs? Thanks!
Comment #31
mikeytown2 commented
As it is right now, because they are all using the same database, when cron is run it will expire all "stale" content in the database and delete the HTML file associated with that DB entry.
Comment #32
mikeytown2 commented
Add a new column to the DB called page_id. For views, the vid; nodes, the nid; panels, the pid; terms, the tid; etc. Mainly useful for the tid, since it will allow term page expiration when a term is updated. Might also be able to use CCK node reference as well...
Comment #33
mikeytown2 commented
Record timer_read('page') (http://api.drupal.org/api/function/timer_read) in the DB, so slow pages can be crawled first.
Comment #34
mikeytown2 commented
Summary:
Fix single db multi-site issue - add base url column.
Add a new column to the db called page_id - allows for better cache expiration.
Use a DB LIKE query instead of glob() in boost_cache_expire() - don't hit the disk unless necessary.
Expiration settings per page - boost block.
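The first summary item can be sketched as scoping the flush query by site. This is a standalone Python/sqlite3 illustration of the idea, not the actual Boost schema; the `base_url` column name, `make_db` and `flush_site` are assumptions.

```python
import sqlite3


def make_db():
    """Create an in-memory boost_cache table that records each row's site."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE boost_cache ("
        " filename TEXT PRIMARY KEY,"
        " base_url TEXT NOT NULL,"
        " expire INTEGER NOT NULL)"
    )
    db.execute("CREATE INDEX boost_cache_base_url ON boost_cache (base_url)")
    return db


def flush_site(db, base_url):
    """Mark every cached page of one site as expired; return rows touched."""
    cur = db.execute(
        "UPDATE boost_cache SET expire = 0 WHERE base_url = ?", (base_url,)
    )
    return cur.rowcount
```

Compared with the unscoped `UPDATE {boost_cache} SET expire = %d` quoted earlier in the thread, the extra WHERE clause leaves the rows of other sites sharing the database untouched.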
Comment #35
mikeytown2 commented
Patch for
Expiration settings per page - boost block
Comment #36
mikeytown2 commented
Committed.
Comment #37
mikeytown2 commented
Next steps
1: Fix multisite cache flushing
Add a column to boost_cache to store the site's name.
2: Smarter Cache Expiration
Add a new column to boost_cache & boost_cache_settings called page_id. The reason for adding it to boost_cache_settings has to do with the new boost block interface: it allows a setting to work on all pages of the view with that ID; otherwise different pages of the same view will have different expiration times.
3: Set page cache for content type
Add a fieldset setting group to admin/settings/performance/boost which sets cache expiration times.
4: Kill glob() in boost_cache_expire()
Don't hit the disk unless necessary. use a LIKE % query.
5: Add Timer Columns
Will be a nice feature.
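As a sketch of how a timer column might be used by the crawler (slow pages first, as proposed earlier in the thread), assuming each row carries the recorded generation time in a hypothetical `timer` field and the `push` flag from the schema notes. Rows are modelled as plain dicts in Python to keep the example standalone.

```python
def crawl_order(pages):
    """Return filenames of pages flagged for pre-caching, slowest first."""
    # Only pages with push enabled are pre-cached by the crawler.
    queued = [page for page in pages if page["push"]]
    # Largest generation time first, so the most expensive pages are warmed first.
    queued.sort(key=lambda page: page["timer"], reverse=True)
    return [page["filename"] for page in queued]
```

In SQL terms this would correspond to ordering the crawl query by the timer column in descending order.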
Comment #38
mikeytown2 commented
Issues 1, 2 & 5 are in this patch... still need to implement usage for 2.
Comment #39
mikeytown2 commented
Steps 1, 2 & 5 are in this patch.
Still need to do:
3: Set page cache for content type
Add a fieldset group to admin/settings/performance/boost which can selectively clear the boost_cache_settings table.
4: Kill glob() in boost_cache_expire()
Don't hit the disk unless necessary. use a LIKE % query.
Comment #40
mikeytown2 commented
Steps 1, 2, 3 & 5 are in this patch.
Selective clearing of the boost_cache_settings table is done via the boost block.
4: Kill glob() in boost_cache_expire()
Don't hit the disk unless necessary. use a LIKE % query; use page_id.
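A minimal sketch of step 4, replacing a glob() over the cache directory with a prefix LIKE query against the filename column. It uses Python's sqlite3 so it can run standalone; the table layout and the names `like_prefix` and `files_under` are assumptions. Note that `_` and `%` are LIKE wildcards, so they are escaped before the trailing `%` is appended.

```python
import sqlite3


def like_prefix(prefix):
    """Escape LIKE wildcards in prefix and append a trailing %."""
    escaped = (
        prefix.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return escaped + "%"


def files_under(db, prefix):
    """Return cached filenames starting with prefix, without touching the disk."""
    rows = db.execute(
        "SELECT filename FROM boost_cache "
        "WHERE filename LIKE ? ESCAPE '\\' ORDER BY filename",
        (like_prefix(prefix),),
    )
    return [row[0] for row in rows]
```

Only the files returned by the query then need to be unlinked, so the disk is hit once per matching file rather than once per directory entry.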
Comment #41
mikeytown2 commented
Committed. Going to leave #4 hanging as it's not that important right now.
Comment #43
giorgio79 commented
For the expiration settings per page, how about having that on the node edit page?
This way users won't have to configure a block and set it so that only admin users can see it, etc. :)
Comment #44
mikeytown2 commented
@giorgio79
Write the code, and I'll add it in.
Comment #45
giorgio79 commented
Thanks Mikey, actually I just tested it, and I see that only my admin user sees it by default.
It is also quite easy to add a block control, so it only shows on node edit pages.
Cheers