The main difference between Boost and Cache Static is a database table. A table would allow for a lot of interesting things, like a UI for the entire cache, different expiration times for each page, etc. This thread is here to help with merging Cache Static's new features into Boost. The first step is to get a complete list of the differences between the two projects.

Comments

mikeytown2’s picture

This is what I've come up with after looking over the code.

Its root is Boost. 95% sure about it being Alpha1.
Database table controls everything; has a nice-looking GUI, but using it on over 1,000 URIs could be a little daunting. Integration with Views would be an interesting approach. Referer is an interesting feature as well. Best idea for this is VBO & Actions. I see the light; this will be awesome once the wildcard issue (#443736: Smarter boost_cache_expire(). Path Changes & Wildcard Support) gets fixed.
Code base has been simplified, some of the baggage from the previous way (4.7) of doing things is gone. (Step in the right direction)
Any code that is hard to understand has been removed (no regular expressions anywhere). (Might be a good idea, if feasible)
URLs are handled differently; queries are not in the rewrite rules. Uses $_SERVER['REQUEST_URI'] to set $path; option to not cache URL variables. ('REQUEST_URI' is not a bad idea; it might fix rewrite complexities with subdir installs.) Only 3/6 rules instead of 4/8.
Looks like better multi-site controls.
Smarter cache control vs the block system that we use.
Somewhat clearer module page; you know what it does and how it works.

In short, there are some good ideas in this module; I just wish the author had dropped a line on the Boost issue queue, because it looks like this took him a lot of time to do. There are also some bugs in here, most likely because of the alpha1 base.

mikeytown2’s picture

Hook into http://drupal.org/project/filebrowser for the cache directory structure, if feasible, for general cache management (#454070: Hooks/Actions). FTP-like control of the cache, if it works.
VBO will come first though, actions are fairly easy to write.

sebyoga’s picture

Hello,

I deliberately used $_SERVER['REQUEST_URI'] to simplify the management of the cache files. If a site were 100% cached, we could easily use the cached files on their own. That would be very practical for a company's internal content management system where the back office runs somewhere other than the front-end server.

It is clear that a GUI for managing over 1,000 cached URLs is not very simple. That is why a small search engine is integrated.

My purpose is not to take the place of the Boost module, but to offer a second approach to caching, depending on the needs of the user.

Shall I write up the list of my module's features?

Best regards,

Sébastien

mikeytown2’s picture

The best method for pushing content, IMHO, would be something like http://drupal.org/project/html_export or http://drupal.org/project/savetoftp; it allows you to hide the server and have full control of the site. The disadvantage of those two projects is that query strings are not cached and content has to be published manually. Both Boost & Cache Static have the potential to take care of the query string issue when it comes to hiding your Drupal server.
Using $_SERVER['REQUEST_URI'] is a smarter way to do it if you're going to cache the query string. Boost already works in this regard, so this cosmetic change is fairly low on the priority list.

I would rather hook into an existing system for cache organization/management than reinvent the wheel. What are your thoughts about VBO & #368366: Boost Block - Button: Flush Page's Cache?

Why make two slightly different modules when we can have one superior way of doing it? The cores of both modules work the same for the most part; time & effort were wasted, IMHO. These changes could have been added on to Boost from the beginning.

Did I miss any other new features/behaviors of your module?

Please write patches/code for boost, not a lot of people can develop/maintain the complexities that surround a file (html) based cache. You have some great ideas, and the drive to get it done :)

sebyoga’s picture

I think we should keep it simple but effective. We always know the URL of the current page (from $_SERVER['REQUEST_URI']), so we are able to empty the cache for that page.

When a new page is added, there is no problem: the cache is populated on the next visit to the page.

As for parameters, with $_SERVER['REQUEST_URI'] the page domain.tld/cat/cat/node?toto=Good is cached as domain.tld/cat/cat/node.html; in .htaccess, you accept the GET parameters.

Do you want a block to control the cache?

If the page is served from the cache, how will you see the block when the cached page was generated without it?

In Cache Static, I use JavaScript to fetch the toolbar, and only if the user has permission.

mikeytown2’s picture

A simple way to add fine-grained cache expiration is to add a setting to every page, exactly as Meta Tags does. That, along with the minimum cache lifetime on the performance page, can be used with touch()'s $time to control the expiration. Changes go into boost_cache_write() and hook_form_alter() for the UI; lift code from Meta Tags for the UI.
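As a rough illustration of the touch() idea, here is a minimal sketch in plain PHP. It assumes the expiry moment is stamped into the file's modification time; the function names are hypothetical and are not part of Boost.

```php
<?php
/**
 * Hypothetical helper: write a cached page and stamp its mtime with the
 * moment it should expire (an assumption about how touch() would be used).
 */
function sketch_cache_write($filename, $html, $lifetime, $now) {
  if (file_put_contents($filename, $html) === FALSE) {
    return FALSE;
  }
  // touch()'s second argument sets the modification time explicitly.
  return touch($filename, $now + $lifetime);
}

/**
 * Hypothetical helper: a file is stale once its stamped mtime has passed,
 * or if it does not exist at all.
 */
function sketch_cache_is_expired($filename, $now) {
  // Drop PHP's cached stat data so a fresh mtime is read.
  clearstatcache();
  $mtime = @filemtime($filename);
  return $mtime === FALSE || $mtime <= $now;
}
```

With this layout a per-page lifetime only changes the value passed as $lifetime; the expiry check itself stays the same for every file.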

mikeytown2’s picture

Been thinking about this more. _boost_rmdir_rf() traverses the entire cache/example.com & cache/gz/example.com directories, looks at each file's timestamp, and deletes the file if it is past the cache expiration. This is slow, so I'm going to switch Boost over to database operations.

File - Path to file - Primary Key
Created - Time of creation
Expiration - Time of expiration
Related - Serialized array of files that should be expired along with this one (planning for the future).

Going to keep _boost_rmdir_rf() for cache flushing, so everything gets cleared (even if the file is not in the DB anymore). Speed shouldn't be any different since everything needs to get nuked in this case.
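To make the difference concrete, the sketch below expires files from a list of rows instead of walking the cache directory. It is plain PHP with no database: $rows stands in for what a query on the proposed table would return, and the function name is hypothetical.

```php
<?php
/**
 * Hypothetical sketch of database-driven expiration.
 *
 * $rows stands in for the result of a query along the lines of
 *   SELECT filepath, expire FROM boost_cache
 * The expiry filter is repeated in PHP so the example runs without a database.
 * Returns the file paths whose rows should then be removed or reset.
 */
function sketch_expire_from_rows(array $rows, $now) {
  $expired = array();
  foreach ($rows as $row) {
    if ($row['expire'] > $now) {
      // Still fresh: the disk is never touched for this entry.
      continue;
    }
    if (is_file($row['filepath'])) {
      unlink($row['filepath']);
    }
    $expired[] = $row['filepath'];
  }
  return $expired;
}
```

Only stale entries cause any disk access, which is the point of moving the bookkeeping into a table; a full flush can still fall back to a recursive directory delete.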

mikeytown2’s picture

Title: Merge Cache Static project into boost » Merge Cache Static into boost - Use Database instead of _boost_rmdir_rf()
mikeytown2’s picture

Title: Merge Cache Static into boost - Use Database instead of _boost_rmdir_rf() » Merge Cache Static project into boost

wrong post...

mikeytown2’s picture

Title: Merge Cache Static project into boost » Merge Cache Static into boost - Use Database instead of _boost_rmdir_rf()
mikeytown2’s picture

prototype schema

function boost_schema() {
  $schema['boost_cache'] = array(
    'description' => t('Table for the list of the cached page'),
    'fields' => array(
      'filepath' => array(
        'description' => 'Path of the cached file relative to Drupal root.',
        'type' => 'varchar',
        'length' => 2047,
        'not null' => TRUE,
        'default' => '',
      ),
      'created' => array(
        'description' => t('UNIX timestamp for when the page was cached.'),
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'default' => 0,
      ),
      'expire' => array(
        'description' => t('UNIX timestamp for the expiration date of cached page.'),
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'default' => 0,
      ),
    ),
    'primary key' => array('filepath'),
  );
  return $schema;
}

'length' => 2047 comes from http://www.boutell.com/newfaq/misc/urllength.html

mikeytown2’s picture

Status: Active » Needs review
Attached patch (new): 11.86 KB

First step:
Boost & Core have different cache expiration times.
Block displays # of pages in cache.
Cron uses DB so you should be able to run cron more often on large sites.
This should speed up boost.

yhager’s picture

subscribing

mikeytown2’s picture

Status: Needs review » Needs work

Some thoughts on the schema

CID: int - Primary Key;
filename: text - normal; varchar doesn't go up to 2047
created: int; currently redundant, may drop
expire: int;
duration: int; used to set the expire time. Will allow for very fine grained control of the cache
push: int - tiny, used as bool, default true; pre-cache this page via crawler
Holding off on the related field since I don't see a smart way to set it right now.

Each page will be able to set the duration & push via the block and/or VBO actions and/or a setting that's on every page.

mikeytown2’s picture

committed the non cache parts

mikeytown2’s picture

Status: Needs work » Needs review
Attached patch (new): 8.23 KB

If and only if using the above patch, run this once, then re-run Update 6100 after applying this attached patch to the latest dev

db_query("DROP TABLE {boost_cache}");

Going to stick with varchar 255 for now.

mikeytown2’s picture

Need to save the state of lifetime & push in the database. Currently gets overwritten with defaults.
lifetime default set to -1.
push default set to -1; add setting to select default pre-caching.
If -1 then use default setting.
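The -1 convention can be sketched as a one-line resolver. This is a minimal, hypothetical helper in plain PHP, not code from the patch; it only shows how a stored -1 falls back to the site-wide default while any other stored value (including 0) is kept.

```php
<?php
/**
 * Hypothetical helper: resolve a per-page setting (lifetime or push).
 * A stored value of -1 means "no override, use the site default".
 */
function sketch_effective_setting($stored, $site_default) {
  return ((int) $stored === -1) ? (int) $site_default : (int) $stored;
}
```

Keeping the sentinel in the database, rather than copying the default into every row, means that changing the default later still affects every page that never had an explicit override.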

Need to add in a boost reset button that calls this

db_query("DELETE FROM {boost_cache}");
boost_cache_clear_all();
mikeytown2’s picture

Status: Needs review » Needs work

Display # of pages for each cache & ability to flush entire cache (core or boost) on performance page.

mikeytown2’s picture

Status: Needs work » Needs review
Attached patch (new): 30.57 KB

If and only if using one of the above patches, run this once, then re-run Update 6100 after applying this attached patch to the latest dev

db_query("DROP TABLE {boost_cache}");

This patch will:
Save the state of lifetime & push in the database.
Boost reset button in advanced settings on performance page.
Display # of pages for each cache & ability to flush entire cache (core or boost) on performance page.
Reorganized performance page.
Boost block works off of database.

mikeytown2’s picture

Status: Needs review » Needs work

Need to store page_callback in db as well. page_callback is the function that is called to generate the page. Will allow for boost to set the expiration time for each content type. Example: views set at 10 min, nodes set at 12 hrs. Stuff like that.
#453908: Hook for panel node types - expiration of static cache; panel containing multiple nodes

mikeytown2’s picture

Should add a page_callback_id column. Storage: for views, the vid; for nodes, the nid; for panels, the pid; etc.
Should add a page_callback_id_name column. Storage: for views pages, the name from the views_view table; for nodes, the type; for panels, the name; etc.

Taxonomy doesn't use page_callback, so put taxonomy_vid in there; then page_callback_id is the tid and page_callback_id_name is the name.

mikeytown2’s picture

Going to go for this

      'page_callback' => array(
        'description' => 'The name of the function that renders the page.',
        'type' => 'varchar',
        'length' => 255,
        'not null' => TRUE,
        'default' => ''
      ),
      'page_callback_content_name' => array(
        'description' => 'The name of the content type.',
        'type' => 'varchar',
        'length' => 255,
        'not null' => TRUE,
        'default' => ''
      ),

This will act as a helper for setting the content expiration (something to query).
Views with the system name * can be set for 10 min.
Nodes with the content type * can be set for 2 hrs.
Default (everything else) can be set at 12 hrs.

I might need to add in a second DB table so the admin can set the cache lifetime and new content of that type inherits these settings.
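A sketch of how such a lookup might work, in plain PHP. The rule list stands in for rows of a settings table; the callback names and the rule layout are illustrative assumptions, and the lifetimes are the ones mentioned above (10 min, 2 hrs, 12 hrs default).

```php
<?php
/**
 * Hypothetical rule set: one entry per (page_callback, content name) pair.
 * '*' matches any content name for that callback.
 */
function sketch_example_rules() {
  return array(
    array('page_callback' => 'views_page', 'content_name' => '*', 'lifetime' => 600),
    array('page_callback' => 'node_page_view', 'content_name' => '*', 'lifetime' => 7200),
  );
}

/**
 * Hypothetical helper: return the lifetime of the first matching rule,
 * or $default when nothing matches.
 */
function sketch_lifetime_for(array $rules, $page_callback, $content_name, $default) {
  foreach ($rules as $rule) {
    if ($rule['page_callback'] !== $page_callback) {
      continue;
    }
    if ($rule['content_name'] === '*' || $rule['content_name'] === $content_name) {
      return $rule['lifetime'];
    }
  }
  return $default;
}
```

For example, sketch_lifetime_for(sketch_example_rules(), 'views_page', 'frontpage', 43200) picks the 10-minute rule, while an unknown callback falls through to the 12-hour default.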

mikeytown2’s picture

Code that will set page_callback & page_argument in the boost db table.

// Resolve the router item for the current path, the same way menu_get_item() does.
$path = $_GET['q'];
$original_map = arg(NULL, $path);
$parts = array_slice($original_map, 0, MENU_MAX_PARTS);
list($ancestors, $placeholders) = menu_get_ancestors($parts);
$router_item = db_fetch_array(db_query_range('SELECT page_callback, page_arguments FROM {menu_router} WHERE path IN ('. implode(',', $placeholders) .') ORDER BY fit DESC', $ancestors, 0, 1));
// menu_unserialize() needs the path map to fill in numeric placeholders.
$router_item['page_arguments'] = menu_unserialize($router_item['page_arguments'], $original_map);
if (arg(0) == 'node' && is_numeric(arg(1))) {
  // Nodes: store the content type.
  $node = node_load(arg(1));
  $router_item['page_arguments'] = $node ? $node->type : '';
}
elseif (arg(0) == 'taxonomy' && is_numeric(arg(2))) {
  // Taxonomy terms: store the vocabulary name.
  $term = taxonomy_get_term(arg(2));
  $vocab = $term ? taxonomy_vocabulary_load($term->vid) : FALSE;
  $router_item['page_arguments'] = $vocab ? $vocab->name : '';
}
elseif (is_array($router_item['page_arguments'])) {
  // Everything else: keep the first string argument (e.g. a view name).
  $first_string = '';
  foreach ($router_item['page_arguments'] as $string) {
    if (is_string($string)) {
      $first_string = $string;
      break;
    }
  }
  $router_item['page_arguments'] = $first_string;
}
if (empty($router_item['page_arguments'])) {
  $router_item['page_arguments'] = '';
}

// Debug output.
echo $router_item['page_callback'];
echo '<br>';
echo $router_item['page_arguments'];
mikeytown2’s picture

Status: Needs work » Needs review
Attached patch (new): 35.34 KB

If and only if using one of the above patches, run this once, then re-run Update 6100 after applying this attached patch to the latest dev

db_query("DROP TABLE {boost_cache}");

This patch will store page_callback & page_arguments in the DB. New table called boost_cache_settings.

mikeytown2’s picture

Status: Needs review » Needs work

Committed above patch.

Next step is the interface so admin can set expiration times per content type. If you manually add/change things in the DB, it should work as expected.

mikeytown2’s picture

Title: Merge Cache Static into boost - Use Database instead of _boost_rmdir_rf() » Merge Cache Static into boost - Create GUI for database operations
mikeytown2’s picture

Status: Needs work » Active
mzytaruk’s picture

Will this feature affect cases where multiple front ends are using the same database and have to expire their caches?

mikeytown2’s picture

interesting point...
boost_cache_db_expire() will expire all files if they are in the same database, which isn't necessarily a bad thing... boost_expire_cron will have the same settings since it is using the same database.

boost_cache_clear_all() (what happens when you click a flush-all button) will reset the entries in the database, but will not clear all the files. So this is a slight problem for the admin, because it might say a page is not cached when in fact it is.

Long story short, it's a slight issue if and only if you're running a multisite with one shared database; the issues have to do with what the admin sees. Boost will not clear the other sites' cached files, only the database entries, because it runs db_query("UPDATE {boost_cache} SET expire = %d", 0);. Looks like I need to add a new column to the DB just for the site name. I think I'll use the base URL as another index, which will also help out with the crawler.

mzytaruk’s picture

I was wondering more about using a front-end server farm, rather than a multisite installation. Since each front end has to expire its own cache, will one of the front ends update the DB when it expires its cache and cause the other servers to not expire theirs? Thanks!

mikeytown2’s picture

As it is right now, because they are all using the same database, when cron is run it will expire all "stale" content in the database and delete the html file associated with that db entry.

mikeytown2’s picture

Add a new column to the DB called page_id. For views, the vid; nodes, the nid; panels, the pid; terms, the tid; etc. Mainly useful for the tid, as it will allow term page expiration when a term is updated. Might also be able to use CCK node references as well...

mikeytown2’s picture

Record timer_read('page') (http://api.drupal.org/api/function/timer_read) in the DB, so slow pages can be crawled first.
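A sketch of how a stored timer value could drive the crawl order, in plain PHP. The 'timer' key and the function names are hypothetical; the rows stand in for what would be read back from the cache table.

```php
<?php
/**
 * Comparison callback: larger timer values (slower pages) sort first.
 */
function sketch_cmp_slowest_first($a, $b) {
  if ($a['timer'] == $b['timer']) {
    return 0;
  }
  return ($a['timer'] > $b['timer']) ? -1 : 1;
}

/**
 * Hypothetical helper: order pages so the crawler regenerates the
 * slowest ones first. Each page is array('url' => ..., 'timer' => ms).
 */
function sketch_crawl_order(array $pages) {
  usort($pages, 'sketch_cmp_slowest_first');
  return $pages;
}
```

In a database-backed version the same ordering would presumably come from an ORDER BY on the timer column rather than from sorting in PHP.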

mikeytown2’s picture

Summary:
Fix single db multi-site issue - add base url column.
Add a new column to the db called page_id - allows for better cache expiration.
Use a DB LIKE query instead of glob() in boost_cache_expire() - don't hit the disk unless necessary.
Expiration settings per page - boost block.

mikeytown2’s picture

Status: Active » Needs review
Attached patch (new): 5.23 KB

Patch for
Expiration settings per page - boost block

mikeytown2’s picture

Status: Needs review » Active

committed

mikeytown2’s picture

Next steps

1: Fix multisite cache flushing
Add a column to boost_cache to store the site's name.

2: Smarter Cache Expiration
Add a new column to boost_cache & boost_cache_settings called page_id. The reason for adding it to boost_cache_settings has to do with the new Boost block interface: it lets a setting apply to all pages of the view with that ID; otherwise different pages of the same view would have different expiration times.

3: Set page cache for content type
Add a fieldset setting group to admin/settings/performance/boost which sets cache expiration times.

4: Kill glob() in boost_cache_expire()
Don't hit the disk unless necessary. use a LIKE % query.

5: Add Timer Columns
Will be a nice feature.

mikeytown2’s picture

Status: Active » Needs work
Attached patch (new): 16.27 KB

Issues 1, 2 & 5 are in this patch... still need to implement usage for 2.

mikeytown2’s picture

Status: Needs work » Needs review
Attached patch (new): 23.13 KB

steps 1, 2 & 5 are in this patch.

Still need to do:

3: Set page cache for content type
Add a fieldset group to admin/settings/performance/boost which can selectively clear the boost_cache_settings table.

4: Kill glob() in boost_cache_expire()
Don't hit the disk unless necessary. use a LIKE % query.

mikeytown2’s picture

Attached patch (new): 23.13 KB

steps 1, 2, 3 & 5 are in this patch.
selective clearing of the boost_cache_settings table done via boost block.

4: Kill glob() in boost_cache_expire()
Don't hit the disk unless necessary. use a LIKE % query; use page_id.
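One detail a LIKE-based replacement for glob() has to handle is that cached file paths can contain LIKE's own wildcards (% and _). The sketch below, in plain PHP with a hypothetical function name, builds a prefix pattern with those characters escaped; it is an illustration, not the code that would go into boost_cache_expire().

```php
<?php
/**
 * Hypothetical helper: turn a cache path prefix into a LIKE pattern that
 * matches every path starting with it.
 *
 * Backslash, % and _ are escaped so they match literally; this assumes the
 * database treats backslash as the LIKE escape character.
 */
function sketch_like_prefix($path_prefix) {
  return addcslashes($path_prefix, '\\%_') . '%';
}
```

The resulting string would be passed as a query argument, e.g. to a condition of the form filename LIKE '%s', so that only matching rows (and then only their files) are touched.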

mikeytown2’s picture

Status: Needs review » Fixed

committed. Going to leave #4 hanging as it's not that important right now.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

giorgio79’s picture

For the expiration settings per page, how about having them on the node edit page?

This way users won't have to configure a block and set it so that only admin users can see it, etc. :)

mikeytown2’s picture

@giorgio79
Write the code, and I'll add it in.

giorgio79’s picture

Thanks Mikey, actually I just tested it, and see that only my admin user sees it by default.

It is also quite easy to add a block control, so it only shows on node edit pages.

Cheers