Hello,
I'm using the Boost module together with the Boost_Expiration module for the expiration logic, and the httprl module as well.
I set the maximum cache lifetime to 3 months and enabled the "Ignore a cache flush command if cron issued the request." and "Remove old cache files on cron." options.
I noticed that the cache files are flushed after each cron run and are only regenerated after I visit the page.
Is there any way to fix this problem, please?
Thank you very much.

Comments

Anonymous’s picture

Priority: Major » Minor

This frequently happens because the Boost rules are not placed correctly in the .htaccess file, or because a rewrite rule in the virtual hosting environment redirects everything to index.php before the Boost rules are reached. The Boost rules should come after the RewriteBase line and before any other rewrite rules; otherwise the request hits index.php and Drupal generates the file again.

aminebourkadi’s picture

Here is my .htaccess file:

#
# Apache/PHP/Drupal settings:
#

# Protect files and directories from prying eyes.
<FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)$|^(\..*|Entries.*|Repository|Root|Tag|Template)$">
  Order allow,deny
</FilesMatch>

# Don't show directory listings for URLs which map to a directory.
Options -Indexes

# Follow symbolic links in this directory.
Options +FollowSymLinks

# Make Drupal handle any 404 errors.
ErrorDocument 404 /index.php

# Set the default handler.
DirectoryIndex index.php index.html index.htm

# Override PHP settings that cannot be changed at runtime. See
# sites/default/default.settings.php and drupal_environment_initialize() in
# includes/bootstrap.inc for settings that can be changed at runtime.

# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
  php_flag magic_quotes_gpc                 off
  php_flag magic_quotes_sybase              off
  php_flag register_globals                 off
  php_flag session.auto_start               off
  php_value mbstring.http_input             pass
  php_value mbstring.http_output            pass
  php_flag mbstring.encoding_translation    off
</IfModule>

# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
  # Enable expirations.
  ExpiresActive On

  # Cache all files for 2 weeks after access (A).
  ExpiresDefault A1209600

  <FilesMatch \.php$>
    # Do not allow PHP scripts to be cached unless they explicitly send cache
    # headers themselves. Otherwise all scripts would have to overwrite the
    # headers set by mod_expires if they want another caching behavior. This may
    # fail if an error occurs early in the bootstrap process, and it may cause
    # problems if a non-Drupal PHP file is installed in a subdirectory.
    ExpiresActive Off
  </FilesMatch>
</IfModule>
### BOOST START ###

  # Allow for alt paths to be set via htaccess rules; allows for cached variants (future mobile support)
  RewriteRule .* - [E=boostpath:normal]

  # Caching for anonymous users
  # Skip boost IF not get request OR uri has wrong dir OR cookie is set OR request came from this server OR https request
  RewriteCond %{REQUEST_METHOD} !^(GET|HEAD)$ [OR]
  RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add|comment/reply))|(/(edit|user|user/(login|password|register))$) [OR]
  RewriteCond %{HTTPS} on [OR]
  RewriteCond %{HTTP_COOKIE} DRUPAL_UID [OR]
  RewriteCond %{ENV:REDIRECT_STATUS} 200
  RewriteRule .* - [S=3]

  # GZIP
  RewriteCond %{HTTP:Accept-encoding} !gzip
  RewriteRule .* - [S=1]
  RewriteCond %{DOCUMENT_ROOT}/cache/%{ENV:boostpath}/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.html -s
  RewriteRule .* cache/%{ENV:boostpath}/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html,E=no-gzip:1]

  # NORMAL
  RewriteCond %{DOCUMENT_ROOT}/cache/%{ENV:boostpath}/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.html -s
  RewriteRule .* cache/%{ENV:boostpath}/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]

  ### BOOST END ###
  
# Various rewrite rules.
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Block access to "hidden" directories whose names begin with a period. This
  # includes directories used by version control systems such as Subversion or
  # Git to store control files. Files whose names begin with a period, as well
  # as the control files used by CVS, are protected by the FilesMatch directive
  # above.
  #
  # NOTE: This only works when mod_rewrite is loaded. Without mod_rewrite, it is
  # not possible to block access to entire directories from .htaccess, because
  # <DirectoryMatch> is not allowed here.
  #
  # If you do not have mod_rewrite installed, you should remove these
  # directories from your webroot or otherwise protect them from being
  # downloaded.
  RewriteRule "(^|/)\." - [F]

  # If your site can be accessed both with and without the 'www.' prefix, you
  # can use one of the following settings to redirect users to your preferred
  # URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
  #
  # To redirect all users to access the site WITH the 'www.' prefix,
  # (http://example.com/... will be redirected to http://www.example.com/...)
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} !^www\. [NC]
  # RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
  #
  # To redirect all users to access the site WITHOUT the 'www.' prefix,
  # (http://www.example.com/... will be redirected to http://example.com/...)
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
  # RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

  # Modify the RewriteBase if you are using Drupal in a subdirectory or in a
  # VirtualDocumentRoot and the rewrite rules are not working properly.
  # For example if your site is at http://example.com/drupal uncomment and
  # modify the following line:
  # RewriteBase /drupal
  #
  # If your site is running in a VirtualDocumentRoot at http://example.com/,
  # uncomment the following line:
  # RewriteBase /

  # Pass all requests not referring directly to files in the filesystem to
  # index.php. Clean URLs are handled in drupal_environment_initialize().
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^ index.php [L]

  # Rules to correctly serve gzip compressed CSS and JS files.
  # Requires both mod_rewrite and mod_headers to be enabled.
  <IfModule mod_headers.c>
    # Serve gzip compressed CSS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.css $1\.css\.gz [QSA]

    # Serve gzip compressed JS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.js $1\.js\.gz [QSA]

    # Serve correct content types, and prevent mod_deflate double gzip.
    RewriteRule \.css\.gz$ - [T=text/css,E=no-gzip:1]
    RewriteRule \.js\.gz$ - [T=text/javascript,E=no-gzip:1]

    <FilesMatch "(\.js\.gz|\.css\.gz)$">
      # Serve correct encoding type.
      Header set Content-Encoding gzip
      # Force proxies to cache gzipped & non-gzipped css/js files separately.
      Header append Vary Accept-Encoding
    </FilesMatch>
  </IfModule>
</IfModule>
  
Anonymous’s picture

And therein lies the problem. The Boost .htaccess generation instructions state:

Copy this into your .htaccess file below

# If your site is running in a VirtualDocumentRoot at http://example.com/,
# uncomment the following line:
# RewriteBase /

and above

# Pass all requests not referring directly to files in the filesystem to
# index.php. Clean URLs are handled in drupal_environment_initialize().

and your Boost rules are out of place: they sit above the mod_rewrite section instead of inside it.
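
The placement rule above can be checked mechanically. The following is only a rough sketch in Python, not part of Boost: the function name `boost_block_is_well_placed` and the two sample strings are made up for illustration, and the check assumes the markers appear exactly as in the file posted above (`RewriteEngine on`, `### BOOST START ###`, `### BOOST END ###`, `RewriteRule ^ index.php`).

```python
def boost_block_is_well_placed(htaccess_text: str) -> bool:
    """Return True if the Boost block sits after 'RewriteEngine on' and
    before Drupal's catch-all rule that sends requests to index.php.

    Hypothetical helper for illustration; it only compares line positions
    and does not evaluate the rewrite rules themselves.
    """
    lines = [line.strip() for line in htaccess_text.splitlines()]

    def first_index(prefix):
        # Index of the first line starting with `prefix`, or None if absent.
        for i, line in enumerate(lines):
            if line.startswith(prefix):
                return i
        return None

    engine_on = first_index("RewriteEngine on")
    boost_start = first_index("### BOOST START ###")
    boost_end = first_index("### BOOST END ###")
    catch_all = first_index("RewriteRule ^ index.php")

    if None in (engine_on, boost_start, boost_end, catch_all):
        return False
    return engine_on < boost_start < boost_end < catch_all


# Layout as posted above: Boost block before the mod_rewrite section.
MISPLACED = """### BOOST START ###
### BOOST END ###
<IfModule mod_rewrite.c>
  RewriteEngine on
  RewriteRule ^ index.php [L]
</IfModule>
"""

# Layout the instructions ask for: below RewriteBase, above the catch-all.
WELL_PLACED = """<IfModule mod_rewrite.c>
  RewriteEngine on
  # RewriteBase /
  ### BOOST START ###
  ### BOOST END ###
  RewriteRule ^ index.php [L]
</IfModule>
"""

if __name__ == "__main__":
    print("misplaced layout ok?  ", boost_block_is_well_placed(MISPLACED))
    print("well placed layout ok?", boost_block_is_well_placed(WELL_PLACED))
```

The sketch only looks at ordering; a rewrite rule defined at the virtual host level, as mentioned earlier in the thread, would not be detected by it.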

aminebourkadi’s picture

Sorry, my English is not very good. Should I uncomment both of these lines or just one?


# If your site is running in a VirtualDocumentRoot at http://example.com/,
# uncomment the following line:
 RewriteBase /

and these lines?

# Pass all requests not referring directly to files in the filesystem to
# index.php. Clean URLs are handled in drupal_environment_initialize().

Anonymous’s picture

Status: Active » Closed (works as designed)

It is impossible for me to say with 100% certainty, as I have no knowledge of your domain configuration, but since your site is working you should probably comment out the

RewriteBase /

line. This is basic Drupal clean URL setup. The Boost rules then go after that line.

aminebourkadi’s picture

Thank you very much, it works now.

aminebourkadi’s picture

It works, but I found another critical problem:
the entire cache is flushed after each new node submission. I'm using the boost_expire module to manage Boost cache changes.

Anonymous’s picture

Install boost_crawler. The name is not accurate: it regenerates only edited, inserted, or deleted pages.

aminebourkadi’s picture

boost_crawler is already installed, and I verified this many times today: before I inserted a node there were around 1000 cached files, but afterwards they were all gone!

RAWDESK’s picture

I see a symptom similar to the one described in #7.
When an anonymous user (most of the time a spam bot crawler) hits node/add/blog, my whole static cache folder /cache/mydomain.com gets flushed.
This happens even though a 401 is returned, so no actual content has been created yet!

Apparently triggered via a cron thread:
[Screenshot: Schermafbeelding 2017-02-25 om 17.59.21.png]

I have boost crawler enabled, as well as the "Ignore a cache flush command if cron issued the request." and "Remove old cache files on cron." options.

Looking at the responsible piece of code in boost.module:

/**
 * Implements hook_flush_caches(). Deletes all static files.
 */
function boost_flush_caches() {
  // Remove all files from the cache
  global $_boost;

  // This may not have been invoked in hook_init because of the quick
  // check to avoid caching requests from the CLI
  $_boost = boost_transform_url();

  // The lock_may_be_available() checks to see if the flush was requested by
  // the core cron, since we may want to ignore it (boost_ignore_flush)
  if (isset($_boost['base_dir']) && (lock_may_be_available('cron') || variable_get('boost_ignore_flush', BOOST_IGNORE_FLUSH) == FALSE)) {
    $count = _boost_rmdir($_boost['base_dir'], TRUE);
    watchdog('boost', 'Flushed all files (%count) from static page cache.', array('%count' => $count), WATCHDOG_NOTICE);
  }
  return;
}

I conclude that the first condition, lock_may_be_available('cron'), was met, since 'boost_ignore_flush' evaluates to TRUE, which makes the second operand of the || FALSE.

I have little understanding of what this lock function actually evaluates in relation to Boost cached nodes.
Apparently it checks the existence and the expire value of the 'cron' entry in the semaphore table:

function lock_may_be_available($name) {
  $lock = db_query('SELECT expire, value FROM {semaphore} WHERE name = :name', array(':name' => $name))->fetchAssoc();
  if (!$lock) {
    return TRUE;
  }
  $expire = (float) $lock['expire'];
  $now = microtime(TRUE);
  if ($now > $expire) {
    // We check two conditions to prevent a race condition where another
    // request acquired the lock and set a new expire time. We add a small
    // number to $expire to avoid errors with float to string conversion.
    return (bool) db_delete('semaphore')
      ->condition('name', $name)
      ->condition('value', $lock['value'])
      ->condition('expire', 0.0001 + $expire, '<=')
      ->execute();
  }
  return FALSE;
}

Sorry, but this puzzles me, since I would instead expect some query that evaluates the expiration of the cached nodes.
In my case the semaphore table is completely empty, so the function returns TRUE and my static cache is flushed.
I wonder whether the Ultimate cron installation conflicts with this evaluation, since Boost might be expecting core cron settings and evaluations.
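
To make the reasoning above concrete, here is a simplified Python model of the two quoted functions. It is a sketch, not the Drupal code: a plain dict stands in for the semaphore table, the race-condition handling of the real delete query is left out, and the names `semaphore` and `boost_would_flush` are made up for this illustration.

```python
import time
from typing import Dict, Optional


def lock_may_be_available(semaphore: Dict[str, float], name: str,
                          now: Optional[float] = None) -> bool:
    """Simplified model of the quoted function.

    `semaphore` maps a lock name to its expiry timestamp (assumption:
    a dict standing in for the {semaphore} table).
    """
    now = time.time() if now is None else now
    expire = semaphore.get(name)
    if expire is None:
        return True           # no row at all: the lock counts as free
    if now > expire:
        del semaphore[name]   # stale lock is removed (no race handling here)
        return True
    return False              # lock is currently held


def boost_would_flush(semaphore: Dict[str, float], ignore_flush: bool,
                      now: Optional[float] = None) -> bool:
    # Mirrors: lock_may_be_available('cron') || boost_ignore_flush == FALSE
    return lock_may_be_available(semaphore, "cron", now) or not ignore_flush


if __name__ == "__main__":
    # Empty semaphore table, "ignore flush on cron" enabled.
    print(boost_would_flush({}, ignore_flush=True))
    # A 'cron' lock that is still valid at time 100, same setting.
    print(boost_would_flush({"cron": 200.0}, ignore_flush=True, now=100.0))
```

In this model the flush is skipped only while a non-expired 'cron' row exists. If the cron runner never creates that row, which is what the empty semaphore table reported here suggests, the setting has no effect.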

RAWDESK’s picture

Priority: Minor » Major
Status: Closed (works as designed) » Active

Reopened, since no explanation was given and there are now two reports of similar symptoms with boost crawler enabled.

RAWDESK’s picture

Honestly, I believe the check should look like this:

  // The lock_may_be_available() checks to see if the flush was requested by
  // the core cron, since we may want to ignore it (boost_ignore_flush)
  if (isset($_boost['base_dir']) && (lock_may_be_available('cron') && variable_get('boost_ignore_flush', BOOST_IGNORE_FLUSH) == FALSE)) {

instead of:

  // The lock_may_be_available() checks to see if the flush was requested by
  // the core cron, since we may want to ignore it (boost_ignore_flush)
  if (isset($_boost['base_dir']) && (lock_may_be_available('cron') || variable_get('boost_ignore_flush', BOOST_IGNORE_FLUSH) == FALSE)) {

This reflects what the comment above it says the code should do: the boost_ignore_flush setting should apply in combination with the lock check (AND), not as an alternative to it (OR).
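
The difference between the two variants is easiest to see as a truth table. The sketch below, in Python, models only the boolean expression from the two snippets; `lock_free` stands for the result of lock_may_be_available('cron') and `ignore_flush` for the boost_ignore_flush setting. The function names are invented for illustration.

```python
from itertools import product


def flush_with_or(lock_free: bool, ignore_flush: bool) -> bool:
    # Current check: lock_may_be_available('cron') || boost_ignore_flush == FALSE
    return lock_free or not ignore_flush


def flush_with_and(lock_free: bool, ignore_flush: bool) -> bool:
    # Proposed check: lock_may_be_available('cron') && boost_ignore_flush == FALSE
    return lock_free and not ignore_flush


def truth_table():
    """Rows of (lock_free, ignore_flush, flush with ||, flush with &&)."""
    rows = []
    for lock_free, ignore_flush in product([True, False], repeat=2):
        rows.append((lock_free, ignore_flush,
                     flush_with_or(lock_free, ignore_flush),
                     flush_with_and(lock_free, ignore_flush)))
    return rows


if __name__ == "__main__":
    print("lock_free ignore_flush  ||     &&")
    for lock_free, ignore_flush, with_or, with_and in truth_table():
        print(f"{lock_free!s:9} {ignore_flush!s:12} {with_or!s:6} {with_and!s}")
```

Judging by the expression alone, the && variant never flushes while boost_ignore_flush is enabled, whatever the lock state. That would also cover flushes that were not requested by cron, so the change is worth testing against a manual cache clear before relying on it.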

smitty’s picture

Same problem here: all Boost files are flushed every time system cron runs.

The problem is that I don't know what lock_may_be_available('cron') does or what it should return. In my case it returns TRUE every time, no matter whether cron is running or not. My semaphore table is completely empty, too.

So I doubt that lock_may_be_available('cron') really checks whether the flush was requested by the core cron, as the code comment claims.

I found out that there is a variable _cron_executing_job in $GLOBALS that tells whether a cron job is running. To be able to see when the Boost cache is flushed without cron, I also added a message. So I came up with this code:

  // The $_cron_executing_job checks if the flush was requested by system_cron,
  // since we may want to ignore it (boost_ignore_flush)
  global $_cron_executing_job;
  // TRUE only while the system_cron job is the one being executed.
  $cron_is_running = (isset($_cron_executing_job) && $_cron_executing_job == 'system_cron');
  if (isset($_boost['base_dir']) && (!$cron_is_running || variable_get('boost_ignore_flush', BOOST_IGNORE_FLUSH) == FALSE)) {
    $count = _boost_rmdir($_boost['base_dir'], TRUE);
    watchdog('boost', 'Flushed all files (%count) from static page cache.', array('%count' => $count), WATCHDOG_NOTICE);
    // Show a message only when the flush did not come from system_cron.
    if (!$cron_is_running) {
      drupal_set_message(t('Flushed all files (@count) from static page cache.', array('@count' => $count)), 'status');
    }
  }

instead of:

  // The lock_may_be_available() checks to see if the flush was requested by
  // the core cron, since we may want to ignore it (boost_ignore_flush)
  if (isset($_boost['base_dir']) && (lock_may_be_available('cron') || variable_get('boost_ignore_flush', BOOST_IGNORE_FLUSH) == FALSE)) {
    $count = _boost_rmdir($_boost['base_dir'], TRUE);
    watchdog('boost', 'Flushed all files (%count) from static page cache.', array('%count' => $count), WATCHDOG_NOTICE);
  }

Proteo’s picture

I just faced a similar problem and can perhaps provide some help for future visitors. I've had the same issue for months with one of the largest sites I manage. At first I thought it wasn't a big deal, until the site started to exhibit performance issues. Administrators started to notice frequent slowdowns, and after checking the watchdog I realized that the whole Boost cache was being flushed every few minutes (sometimes several thousand perfectly valid entries were being deleted). I spent an entire morning trying to figure out why, until I realized I was making a stupid mistake.

The site uses Ultimate cron to manage cron, which is invoked by a cron task every minute (as suggested). I was running the task like this:

* * * * * /usr/bin/drush cron -q

And that's the problem. By invoking cron in that way, every task in the cron was being executed every time cron ran, and the caches were being flushed.

As per Ultimate cron's instructions, you shouldn't invoke cron in this way, but instead use something like this:

* * * * * /usr/bin/drush cron-run --uri=www.mysite.com --root=/var/www/public_html > /dev/null 2>&1

After making the change, everything worked as expected. I'll keep an eye on the watchdog for a couple of days, but after 4 or 5 hours things look great.

Proteo’s picture

Quick follow-up. Today I noticed that the system cron task (which runs every 12 hours on the site) still flushed the whole Boost cache. It is still an improvement over before, but definitely not ideal. As it turns out, it's a known issue with Ultimate cron and the lock_may_be_available() function (as suggested above), but it has been fixed:

https://www.drupal.org/project/boost/issues/1829832

After applying the patch in that thread the issue has been completely solved.