Create a grid of the ways a node can be expired and of which actions expire which pages.
These "Actions" can currently flush a node (they call boost_expire_node):
* Voting API
* Comments
* Nodes

Once that node has been signaled to be flushed, it can flush:
* Self
* Front page if promoted
* Tagged taxonomy term pages
* Other items contained in the menu it belongs to (nodes, views, etc.)
* CCK node reference fields
* Views containing this node

There is also the option of killing the file or merely marking it as expired in the database.

Example:
New Comments - Kills: Node, and views. Not: front page (if promoted), CCK references, taxonomy terms
Edited Node - Kills: Node, front page, CCK references, views. Expires: taxonomy terms
New Vote - Expires: Node. Kills: Views. Not: Front page, CCK references, taxonomy terms
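
A minimal sketch of how the example above could be written down as data, assuming plain PHP arrays. The action keys, target keys and function names (`example_expiration_grid`, `example_targets_for`) are hypothetical and are not part of Boost; only the kill/expire/none values come from the example.

```php
<?php
// Hypothetical grid: action => target => 'kill' | 'expire' | 'none'.
// Key names are illustrative only; the values mirror the example above.
function example_expiration_grid() {
  return array(
    'comment_new' => array(
      'node' => 'kill',
      'views' => 'kill',
      'front_page' => 'none',
      'cck_references' => 'none',
      'taxonomy_terms' => 'none',
    ),
    'node_edit' => array(
      'node' => 'kill',
      'front_page' => 'kill',
      'cck_references' => 'kill',
      'views' => 'kill',
      'taxonomy_terms' => 'expire',
    ),
    'vote_new' => array(
      'node' => 'expire',
      'views' => 'kill',
      'front_page' => 'none',
      'cck_references' => 'none',
      'taxonomy_terms' => 'none',
    ),
  );
}

// Returns the targets that the given action handles in the given mode.
// Unknown actions yield an empty list.
function example_targets_for($action, $mode) {
  $grid = example_expiration_grid();
  if (!isset($grid[$action])) {
    return array();
  }
  return array_keys($grid[$action], $mode, TRUE);
}
```

With this layout, `example_targets_for('node_edit', 'expire')` would list only the taxonomy term pages, and adding a new action means adding one more row to the grid.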

Open to ideas, patches, etc. This will take some time to do.

Comments

hunvreus’s picture

On a fairly large site that I run on Boost+NGINX, I need to define more refined policies for the maximum cache lifetime of certain content types or specific pages. I know you can do this with the Boost blocks (choosing a scope and then selecting the specific max cache lifetime), but I would rather have a central place to edit these settings. If that sounds like something others may be interested in, I can clean up and commit back what I am working on.

mikeytown2’s picture

Have you thought about doing it with VBO? I'm always interested in stuff like that.

hunvreus’s picture

Hmmm, not sure that could be done in VBO, unless we were to offer batch settings for specific nodes. What I am more interested in, in the first place, is content-type-specific settings; I think that's what most people would use. What exactly are the types of "scopes" you have defined for Boost? Content type, NID and what else?

mikeytown2’s picture

So for the node it's:
Node
Node type like 'page'
NID

View
view name
view display like 'page' or 'page_1' or 'default'

taxonomy
vocabulary
TID

Code that does this

/**
 * Gets page_callback & page_arguments from the menu_router table.
 *
 * Allows any content type to have its own cache expiration.
 * TODO Better support of panels.
 */
function _boost_get_menu_router() {
  $router_item = menu_get_item();

  // Handle nodes
  if (arg(0) == 'node' && is_numeric(arg(1))) {
    $node = node_load(arg(1));
    $router_item['page_callback'] = 'node';
    $router_item['page_type'] = $node->type;
    $router_item['page_id'] = arg(1);
    return $router_item;
  }
  // Handle taxonomy
  if (arg(0) == 'taxonomy' && is_numeric(arg(2))) {
    $term = taxonomy_get_term(arg(2));
    $vocab = taxonomy_vocabulary_load($term->vid);
    $router_item['page_callback'] = 'taxonomy';
    $router_item['page_type'] = $vocab->name;
    $router_item['page_id'] = arg(2);
    return $router_item;
  }
  // Handle users
  if (arg(0) == 'user' && is_numeric(arg(1))) {
    $account = user_load(array('uid' => arg(1)));
    $router_item['page_callback'] = 'user';
    // user_load() returns FALSE for an unknown uid; use an empty type then.
    $router_item['page_type'] = $account ? implode(', ', $account->roles) : '';
    $router_item['page_id'] = arg(1);
    return $router_item;
  }
  // Handle views
  if ($router_item['page_callback'] == 'views_page') {
    $router_item['page_callback'] = 'view';
    $router_item['page_type'] = array_shift($router_item['page_arguments']);
    $router_item['page_id'] = array_shift($router_item['page_arguments']);
    // See http://drupal.org/node/651798 for the reason why this if is needed
    if (is_array($router_item['page_id'])) {
      $router_item['page_id'] = array_shift($router_item['page_id']);
    }
    return $router_item;
  }

  // Try to handle everything else
  if (is_array($router_item['page_arguments'])) {
    foreach ($router_item['page_arguments'] as $string) {
      if (is_string($string)) {
        $router_item['page_type'] = $string;
        break;
      }
    }
  }
  // Set empty if page_arguments is an empty object.
  if (!isset($router_item['page_type']) && empty($router_item['page_arguments'])) {
    $router_item['page_type'] = '';
  }
  // Use the first item if page_arguments is still an array; cast it to a string.
  if (!isset($router_item['page_type']) && is_array($router_item['page_arguments'])) {
    if (is_object($router_item['page_arguments'][0])) {
      $router_item['page_type'] = (string)get_class($router_item['page_arguments'][0]);
    }
    else {
      $router_item['page_type'] = (string)$router_item['page_arguments'][0];
    }
  }


  // Handle panels
  if (strpos($router_item['page_callback'], 'page_execute') !== FALSE) {
    // Default to "not found" so $pid is defined when neither table exists.
    $pid = FALSE;
    if (db_table_exists('delegator_pages')) {
      $pid = db_fetch_array(db_query_range("SELECT pid FROM {delegator_pages} WHERE name = '%s'", $router_item['page_type'], 0, 1));
    }
    elseif (db_table_exists('page_manager_pages')) {
      $pid = db_fetch_array(db_query_range("SELECT pid FROM {page_manager_pages} WHERE name = '%s'", $router_item['page_type'], 0, 1));
    }
    $router_item['page_id'] = $pid ? $pid['pid'] : 0;
  }

  return $router_item;
}
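
To tie this back to the per-scope lifetime question, here is a rough sketch of how the three values set above (page_callback, page_type, page_id) could drive a "most specific scope wins" lookup. The function `example_cache_lifetime`, the `callback:type:id` key format and the sample numbers are assumptions for illustration, not how Boost stores its settings.

```php
<?php
// Hypothetical lookup: try callback:type:id, then callback:type, then callback.
// Falls back to $default when no scope matches.
function example_cache_lifetime(array $router_item, array $settings, $default) {
  $callback = isset($router_item['page_callback']) ? $router_item['page_callback'] : '';
  $type = isset($router_item['page_type']) ? $router_item['page_type'] : '';
  $id = isset($router_item['page_id']) ? $router_item['page_id'] : '';

  $candidates = array(
    $callback . ':' . $type . ':' . $id,
    $callback . ':' . $type,
    $callback,
  );
  foreach ($candidates as $key) {
    if (isset($settings[$key])) {
      return $settings[$key];
    }
  }
  return $default;
}

// Illustrative settings (seconds); the values are made up.
$example_settings = array(
  'node' => 3600,
  'node:page' => 86400,
  'node:page:42' => 600,
  'view:frontpage' => 300,
);
```

A central settings form would then only need to edit one array like `$example_settings`, instead of per-page block settings.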

hunvreus’s picture

Awesome, I am getting to it later on today and will let you know as soon as I have a usable patch.

mikeytown2’s picture

I like the idea of copying pathauto for node presets & not caching certain node types. There is a lot of potential here.

mikeytown2’s picture

Title: Expiration Grid » Expiration Grid - road map for this module
Project: Boost » Cache Expiration
Version: 6.x-1.x-dev »
Component: User interface » Code

What needs to happen:
Scan every view looking for paths. Each view that contains a path is treated like a node type. Detect other entries in the boost_cache table and generate configuration options for them as well.

Each cache type can have options for (some will be specific to the content container; node, view, etc...)
* min & max cache lifetime
* is-cacheable setting
* pager/url-query setting <- auto detect and have smart defaults (nodes, no pager; views with exposed filters will allow all; etc...)
* promoted can flush front page; pager support.
* node reference flush: forwards, backwards, both
* taxonomy control; certain vocabularies can have different flushing options
* menu tree options
* views handling; figure out the paging issue. flush first 2 pages asap (configurable), expire rest over a period of time.
* expire or flush (expire marks it for a crawler; flush kills from the cache instantly)
* support for coded advanced configuration (hooks!)
* make this "exportable"
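
The "flush first 2 pages asap, expire rest" item could look roughly like this. It is only a sketch: `example_split_view_pages` is a hypothetical helper, and it assumes zero-based pager page numbers (as in Drupal's `?page=N` query) and that the total page count is already known.

```php
<?php
// Splits a view's pager pages into those flushed immediately and those
// that are only marked as expired. $flush_first is the configurable count.
function example_split_view_pages($total_pages, $flush_first = 2) {
  $flush = array();
  $expire = array();
  for ($page = 0; $page < $total_pages; $page++) {
    if ($page < $flush_first) {
      $flush[] = $page;
    }
    else {
      $expire[] = $page;
    }
  }
  return array('flush' => $flush, 'expire' => $expire);
}
```

The expired pages could then be picked up by a crawler over time, which is the point of keeping the two lists separate.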

Different actions can trigger different expiration/flush settings. Example: New comments should only flush the view page that the node lives on, not the entire view with all its pagers. Or if the theme doesn't indicate the comment count on the view then have the option to not flush the view on comments. Or if the view is directly related to comments then the full view should be flushed in a graceful manner.

The configuration file (the "exportable") will come before the GUI, because the GUI will be quite complicated & it will take some time to make this graceful. Once the GUI is in place, make the manual configuration part as minimal as possible by being smart with detection and defaults.
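
As a rough idea of what such an exportable could look like before any GUI exists: a plain array of per-type overrides merged over defaults. Everything here is hypothetical (the function names, the setting keys and the default values); it only illustrates the "smart defaults, minimal manual configuration" approach.

```php
<?php
// Hypothetical defaults applied to every cache type.
function example_default_settings() {
  return array(
    'cacheable' => TRUE,
    'min_lifetime' => 0,
    'max_lifetime' => 3600,
    'mode' => 'expire', // 'expire' marks for a crawler, 'flush' kills instantly.
    'flush_front_page' => FALSE,
  );
}

// Merges the exported overrides for one cache type over the defaults.
function example_settings_for($type, array $exported) {
  $overrides = isset($exported[$type]) ? $exported[$type] : array();
  return array_merge(example_default_settings(), $overrides);
}

// The "exportable": only values that differ from the defaults are listed.
$example_exported = array(
  'node:story' => array('max_lifetime' => 600, 'flush_front_page' => TRUE),
  'view:archive' => array('mode' => 'flush'),
);
```

Because only the overrides are stored, the file stays short and a later GUI can write the same structure.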

Make all of this support domain access and be multisite friendly. This is the road map; in short I'll be taking out the smarts from boost and putting it in this module. Boost does most of this right now in some sort of fashion; making it happen based on set rules is key to success.

sdboyer’s picture

interesting. subscribe.

Steven Jones’s picture

Version: » 6.x-1.x-dev

This all becomes mind bogglingly complex after a while, and then someone will go, hey I use OG, can you make this N-dimensional too?

yhahn has been looking into this stuff for OA, so might be worth getting him involved too.

Subscribe.

Vacilando’s picture

Subscribing.

mikeytown2’s picture

How to deal with all the writes that happen to the boost_cache table: Don't update all fields on each cache creation "action".

How to deal with all the writes that happen to the boost_cache_relationship table: Have a dirty flag so if the parent or child entity gets updated then it knows to recreate the relationship on cache creation. Will need some smart logic for views pagers. This is a major priority. Once I get this figured out I can then bring in views to the expires module.
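
A small in-memory sketch of the dirty-flag idea, to make the intended flow concrete. The class `ExampleRelationshipStore` and its methods are hypothetical stand-ins for the boost_cache_relationship table; the real table layout is not shown in this thread.

```php
<?php
// In-memory stand-in for the relationship table. Each row links a parent
// page to a child entity and carries a dirty flag.
class ExampleRelationshipStore {
  private $rows = array();

  // Records (or re-records) a relationship and clears its dirty flag.
  public function save($parent, $child) {
    $this->rows[$parent . '|' . $child] = array(
      'parent' => $parent,
      'child' => $child,
      'dirty' => FALSE,
    );
  }

  // Marks every relationship touching the entity as dirty. This is the
  // cheap write done when a parent or child entity is updated.
  public function markDirty($entity) {
    foreach ($this->rows as $key => $row) {
      if ($row['parent'] === $entity || $row['child'] === $entity) {
        $this->rows[$key]['dirty'] = TRUE;
      }
    }
  }

  // TRUE when the parent's relationships must be recreated on the next
  // cache creation: either a row is dirty or nothing is recorded yet.
  public function needsRebuild($parent) {
    $known = FALSE;
    foreach ($this->rows as $row) {
      if ($row['parent'] === $parent) {
        if ($row['dirty']) {
          return TRUE;
        }
        $known = TRUE;
      }
    }
    return !$known;
  }
}
```

On cache creation the relationship rows are rewritten only when `needsRebuild()` says so, which is what keeps the number of writes down.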

mikeytown2’s picture

Views need to store the arguments given to them as well as the page number they are on. Finding new content on views with arguments will be a challenge; for taxonomy I can make it work, but other types of arguments will not be as easy to handle automatically. Current progress on the views page cache logic is going on here:
http://drupal.org/node/785766#comment-3341042

ogi’s picture

subscribe

a_c_m’s picture

subscribe

achton’s picture

Subscribing.

SqyD’s picture

Just opened the D7 branch. see msg here: http://drupal.org/node/1151684#comment-4932604

How would you feel about releasing the current 6.x branch as a stable 1.0? I've been using it together with Purge on a production site for some time and it works like a charm.

I would also like to revamp the project page. Add a descriptive, up-to-date list of features and integration options. And no more "playground" etc. This is some serious cache whipping we're doing here ;-)

mikeytown2’s picture

1.0 sounds like a plan. Go ahead and publish a release; if you don't I might get around to it by Friday.

This might be of interest to you: http://drupal.org/project/httprl I had some free time yesterday so I put the code together into a module. The possibilities that a non-blocking http request brings to the table are mind-boggling. Any task that is not directly associated with generating the current page's html can in theory be spun off into a background task. Something to keep in mind as you develop the code.

SqyD’s picture

Interesting. In purge I already use parallel execution of the requests through the use of curl_multi objects. Did you compare your approach with curl_multi? I'll investigate the background task option. Sounds like it's what I need to get the option to refetch the object after purging to perform reasonably. I was thinking of making it work with drupal_http_request as a failsafe. Will use this as a third option.

SqyD’s picture

The 1.0 is there. No changes to the code itself.
I've also improved (I hope you agree) the project page a bit. Will now start with the D7 port...

mikeytown2’s picture

Not all hosts have curl & I'm not sure if you can have curl "ping" a url (non-blocking mode in httprl). I might want to add in a curl implementation to httprl as well as one using sockets (like d6) as a fallback; because some hosts have socket_select disabled.

SqyD’s picture

On the http request library issue:
The "ping" idea sounds cool. I will investigate; I have never seen it in any php_curl documentation, but then again, many things are not documented, as I learned the hard way (and through Google ;-)
The good thing about the purge request I send to varnish is that it doesn't hit a backend and I only check for the error code, ignoring output. So for my use case those requests are fast and "cheap" and the error return code is very useful, but just for debugging. In the end we're bound to hit some performance bottleneck.

Seen some interesting node.js stuff at DrupalCon London. Maybe a "cache-director" daemon on top of node.js could solve our quest for clean but warm caches. Just no idea where to start on that idea.

I guess in the end what we really need is this: #64866: Pluggable architecture for drupal_http_request() . I would love to get some movement in that long standing issue.

On "porting" boost code to expire:
I've been reading through some of the 7.x code of boost but so far cannot find any of the node/comment/use api stuff I was expecting after reading through expire-6.x. Am I right to assume boost 7.x acts on completely different logic? Could you give me some pointers on where to start ripping out the expiration parts?
In the meantime I have been porting the expire 6.x hooks to 7.x. That might get the job done too, it is a good 7.x API coding exercise, and I am ready to rip it out when you come up with a better idea.

D7 port status: Configure form and "drush xu" already work. Tested with Purge and Varnish :-)

mikeytown2’s picture

Found this library for curl that has something similar to the non blocking mode of httprl: https://github.com/jmathai/php-multi-curl

7.x boost is dumb currently. I was going to put the smarts in expire. Your best bet is to translate what is in 6.x and move it forward to 7.x. I still don't have any 7.x sites so any code that is of 7.x series in any of my modules is fairly basic.

attiks’s picture

subscribe

SqyD’s picture

Sorry to do this but I just changed the recommended version back to the 1.x branch for now.
I applaud the progress that has been made over the past weeks but I encountered a few critical issues with it, mainly the drush integration, which seems to be completely broken. While it is not a core feature, many users rely on a stable feature set. I'll try to spend some time on the drush integration and do some additional testing to make sure we can revert this change asap.

Spleshka’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev

Sure, ping me if you need any help.

Spleshka’s picture

@SqyD,

Any progress in fixing drush issues for 7.x-1.x branch?

Spleshka’s picture

@SqyD,

Please tell us about issues that you encountered in 7.x-2.x release so I can fix them.

bibo’s picture

Spleshka wrote: "Please tell us about issues that you encountered in 7.x-2.x release so I can fix them."

From what I've noticed in some tests, 7.x-2.x just doesn't flush individual pages from Varnish, with drush or otherwise. Both would work just fine once this expire issue is fixed:
#2113941: hook_expire_cache() should not give absolute URLs but internal Drupal paths.

Or alternatively, if this Varnish issue gives a better and more global solution:
#2017097: Ban/purge requests to varnish are malformed..

Currently almost all Varnish purges fail because the page_cache implementation of the Varnish module expects to receive only internal paths, but expire (7.2-series at least) feeds it URLs that include the base (domain/hostname/base_url). I noticed the same problem in the autocache module, which aims to provide something similar but with less configuration.
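
To illustrate the mismatch: a minimal sketch of stripping the base URL so that only the internal path is handed on. `example_internal_path` is a hypothetical helper written for this comment, not code from expire or the Varnish module, and it ignores details such as language prefixes and query strings.

```php
<?php
// Strips $base_url from the front of $url and returns the internal path.
// Input that does not start with the base URL is returned as-is, apart
// from leading slashes being trimmed.
function example_internal_path($url, $base_url) {
  $base = rtrim($base_url, '/');
  if ($base !== '' && strpos($url, $base) === 0) {
    $rest = (string) substr($url, strlen($base));
    if ($rest === '') {
      // The URL is the base itself, i.e. the front page.
      return '';
    }
    // Only accept a match that ends on a path boundary, so that
    // "http://example.com" does not match "http://example.community".
    if ($rest[0] === '/') {
      return ltrim($rest, '/');
    }
  }
  return ltrim($url, '/');
}
```

For example, with a base URL of `http://example.com/sub/`, the URL `http://example.com/sub/node/1` comes out as `node/1`.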

This problem could/should also be considered a Varnish module bug, because the standard database page_cache includes the full URL in the cid. The Varnish module issue queue has a lot of activity, but full releases seem rare.

Nevertheless, I hope for a quick and simple fix (getting the expire-admin setting at least would be nice), so I can officially recommend the expire-7.2 series for company-wide use, partially replacing cache_actions, which hasn't seen much action lately.

bibo’s picture

Issue summary: View changes

My patch here addresses both the Varnish/hostname problem and adds some drush functionality. If it is deemed a good solution, I hope to see 7.2 officially out.

jelo’s picture

The initial question was which actions can expire nodes. I would like to add/request the option for date comparisons. To me this seems to be a very common use case, e.g. anything event driven such as deadlines, registration dates etc.

Use case: in the case of events we may publish the event description, the event date, a date when registration opens, a date when registration closes etc. In most cases we may want the page to change when these milestones are hit, e.g. show a registration button when the registration opens, remove the registration button when the registration closes, add a note once the event is in the past etc.

Unfortunately, none of these items are actions that are triggered on the node. After the node is saved with x date fields populated, the changes need to happen on a date comparison against those fields.

Maybe this could be achieved with something like an advanced cache expiration schedule. We identify content types with date fields and expose a setting in the UI that lets site builders choose whether a date field should be considered for cache expiration. If the date field is selected, any node action (create, edit, delete) grabs the actual date and writes it into the cache clear schedule table. Then on each cron run we go through that schedule table and expire any nodes for which the date has been reached.
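
A rough sketch of that schedule idea, using plain arrays instead of a real table. The function names, the array layout and the field names are all hypothetical, and it assumes the date fields hold Unix timestamps.

```php
<?php
// Builds one schedule entry per selected date field that has a value.
// $node is a plain array here; $date_fields lists the fields chosen in the UI.
function example_schedule_entries(array $node, array $date_fields) {
  $entries = array();
  foreach ($date_fields as $field) {
    if (!empty($node[$field])) {
      $entries[] = array(
        'nid' => $node['nid'],
        'expire_at' => (int) $node[$field],
      );
    }
  }
  return $entries;
}

// Returns the node IDs whose scheduled date has been reached at $now.
// This is what a cron run would expire; each nid is listed once.
function example_due_nids(array $schedule, $now) {
  $due = array();
  foreach ($schedule as $entry) {
    if ($entry['expire_at'] <= $now) {
      $due[$entry['nid']] = $entry['nid'];
    }
  }
  return array_values($due);
}
```

A real implementation would also have to remove or mark processed entries so that a node is not expired again on every later cron run.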

Cheers, J.