Hi,

I have installed and configured Varnish, to the best of my knowledge, and I can confirm it works. The problem is that the captchas are also cached, even though the performance admin page says the captcha module will disable caching on pages with a captcha.

My understanding is that captchas are not cached by the standard Drupal cache. But is the same true for an external cache (i.e. Varnish)?

Now I know this is probably a Varnish config issue and not a captcha issue. I have seen a few other items in the issues queue similar to this but none with a solution. So is there a setting(s) that can be added to the varnish VCL to disable caching of the pages with a captcha?

Or have I totally missed the point?

I would appreciate any pointers.

Regards,
Nick

Comments

nickbits’s picture

Hi,

Firstly forgot to say I am using PressFlow 6.23 and the latest dev release for Captcha and Re-captcha.

The only way I can get Varnish not to cache the captcha is to list in the VCL file the pages (e.g. blog, user, contact, etc.) that have a form/captcha on them. It is a solution but not an ideal one.

Nick

------------oOo----------------------
Nick Young (www.nickbits.co.uk)

John_B’s picture

My understanding, and please correct me if I am wrong, is that if you use Varnish with the Drupal Varnish module, then it uses Varnish for Drupal's built-in caching, which means that Drupal can invalidate the Varnish cache when it wants to. But if instead you set settings.php to use reverse proxy (on D7 or Pressflow 6), it caches cachable pages without reference to Drupal and takes away from Drupal the ability to invalidate cache entries. This raises the question whether you are using the Drupal Varnish Integration module in your Pressflow installation? Without the module Pressflow will cache pages in Varnish but will not allow Drupal to invalidate the cache, AFAIK. But I am not an expert on Pressflow, so I may be wrong about that. However, if I am right, installing http://drupal.org/project/varnish should ensure that Drupal can successfully invalidate the cache if told to do so by the Captcha module you are using.

Incidentally, there are many threads on Mollom giving false positives, including open issues in the issue queue, which make no mention of caching. But maybe it is not Mollom at fault but caching: http://drupal.org/node/727668#comment-4488310

BTW I like your site.

Digit Professionals specialising in Drupal, WordPress & CiviCRM support for publishers in non-profit and related sectors

nickbits’s picture

Hi John,

Thanks for the reply.

I am using the standard Drupal Captcha ( drupal.org/project/captcha ) module at present, with Math Captcha. My understanding was that this module doesn't allow Drupal to cache pages with a captcha on them. However, I only know that to be true for the default Drupal cache, and I know you are correct that PressFlow/D7 with Varnish caches all the pages.

I have installed the Varnish module and thought it would do exactly what you describe. However, it doesn't appear to do that, unless I have misunderstood what it does or misconfigured it.

I am now investigating this further with the expire and other modules, double checking all the documentation, etc.

Thanks,
Nick


Fabianx’s picture

Hi,

As Varnish is sitting before Drupal, it can't know that captcha disabled the cache for this particular page.

The reason is:

The captcha module only does:

$GLOBALS['conf']['cache'] = FALSE;

in the captcha_init() function.

What it would additionally need to do is send some headers telling the remote proxy not to cache this page, as in the following function:

/**
 * Implements hook_init().
 */
function custom_varnish_skip_init() {
  // Make sure $_GET['q'] is initialized so that arg() works.
  if (empty($_GET['q'])) {
    drupal_init_path();
  }

  // CUSTOM_VARNISH_SKIP_PATH_ARG0 is assumed to be a constant defined elsewhere in this module.
  if (arg(0) == CUSTOM_VARNISH_SKIP_PATH_ARG0) {
    // Disable drupal cache. Based on http://drupal.org/project/cacheexclude module.
    $GLOBALS['conf']['cache'] = FALSE;
    // Set headers according to https://www.varnish-cache.org/trac/ticket/79
    drupal_add_http_header('Pragma', 'no-cache');
    drupal_add_http_header('Cache-Control', 's-maxage=0, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0');
    drupal_add_http_header('Expires', 'Sun, 19 Nov 1978 05:00:00 GMT');
  }
}

If you are smart, you can take this code and, instead of checking against a path, check against $GLOBALS['conf']['cache'], giving the module a higher weight so that it runs after the captcha module. Voila: problem solved.
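
On the Varnish side, the VCL also has to respect those headers. A minimal sketch, assuming Varnish 3 syntax and a VCL that does not already override the TTL (the regex is only an illustration, adjust it to the headers your backend really sends):

```vcl
sub vcl_fetch {
  # Assumption: Drupal marks uncacheable pages with a Cache-Control
  # header containing no-cache, no-store or private, as in the
  # hook_init() function above.
  if (beresp.http.Cache-Control ~ "(no-cache|no-store|private)") {
    # Do not store this response; pass later requests for it to the backend.
    return (hit_for_pass);
  }
}
```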

Best Wishes,

Fabian

nickbits’s picture

Hi Fabian,

Thank you for the reply.

I do understand about the:

$GLOBALS['conf']['cache'] = FALSE;

in the captcha_init() function.

My first thought was to set a cookie on pages with a captcha and not on the other pages. It is easy enough to set a VCL rule in Varnish not to cache if a specific cookie is seen. I can see a few issues with doing that though.
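
Such a cookie rule might look roughly like this. It is only a sketch, assuming Varnish 3 syntax; the cookie name VARNISH is a placeholder, and the check has to come before any part of the VCL that strips cookies:

```vcl
sub vcl_recv {
  # Placeholder cookie name; pass every request that carries it.
  if (req.http.Cookie ~ "(^|;\s*)VARNISH=") {
    return (pass);
  }
}
```

One of the issues: the browser only sends the cookie on requests made after it has received it, so the very first request for a captcha page can still be answered from the cache.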

I will take a look at your suggestion and links.

The simplest solution I have found that works, although not ideal, is to add in the VCL file something like:

if (req.url ~ "^/(myblog|contact)") {
  return (pass);
}

Obviously I would need to do that for a whole path (i.e. blog/* - wildcards!). The issue is that every blog post allows comments, and the comment form requires a captcha. The alternative, to improve performance, is to put the comment form on a separate page from the blog entry, in combination with a rule like the one above.
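
For the whole-path case no wildcard is needed, since the match is a regular expression. A rough sketch, assuming Varnish 3 syntax and that the paths really are /blog and /contact:

```vcl
sub vcl_recv {
  # Matches /blog, /blog/, /blog/anything and /blog?page=1,
  # and the same for /contact, but not e.g. /blogroll.
  if (req.url ~ "^/(blog|contact)(/|\?|$)") {
    return (pass);
  }
}
```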

Thanks for the help.

Regards,
Nick


dasjo’s picture

Here's a patch that solves this in the varnish module:
https://www.drupal.org/node/2490186#comment-9933032

nickbits’s picture

Hi,

Thank you all for the feedback and comments. I am still looking at some of the information provided, including adding a hook. At present what I have done is to simply add a rule to the VCL file to bypass varnish if the path matches. So we have:

if ((req.url ~ "contact|comment/reply/(.*)") && (req.http.host ~ "^(www\.)?example\.com$")) {
  return (pass);
}

So this works. Not ideal though. Why? Simply because I have multiple websites. So what, do the same? Well, to be honest I may have to; however, some of the sites are larger and it will take a bit more effort to check which paths must not be cached. Hence it would be nice to have a module/patch so that this was all automatic, i.e. any page with a captcha was skipped automatically.

I did try one other method. This was just a proof of concept to see if it would work. Using the captcha module (http://drupal.org/project/captcha) I:

1. Open captcha.inc and add to the _captcha_insert_captcha_element function:

setcookie('VARNISH', 'Y', $_SERVER['REQUEST_TIME'] + ($lifetime + 300), '/', $cookie_domain);

2. Open captcha.module and add in the captcha_form_alter function:

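if (isset($_COOKIE['VARNISH'])) {
  unset($_COOKIE['VARNISH']);
  // Expire the cookie with the same path and domain it was set with,
  // otherwise the browser may not remove it.
  setcookie('VARNISH', '', $_SERVER['REQUEST_TIME'] - 3600, '/', $cookie_domain);
}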

As I said, proof of concept and would have to either be submitted as a formal patch to the module or perhaps into its own module. It did appear to work, although I would prefer not to have to set cookies and not too sure if it is the best way.

I am still looking at this and will take a look at Fabian's suggestions.

Thanks,
Nick


John_B’s picture

This is interesting.

However, on a more pragmatic level, I find Honeypot does an adequate job of spam protection at present, at least for the kind of sites I run. So I see no need to use Captcha. I think Spamicide is similar but I have not used it.


nickbits’s picture

Hi John,

I tried Honeypot before, but it blocked a lot of legit users, it even blocked the local council from viewing one site. It was effective at stopping spammers, mostly, but it blocked too many legit users for it to be useful, for me at least. I have looked at other solutions such as ZB Block, but so far not found a really effective solution.

I have tried other scripts, and I have my server firewall set up with various blacklists that update daily, but spammers still get through. Although few like captchas, I find they are necessary and one of the best defenses against spam, in my opinion.

I think captchas and spammers is one of those topics that could go on forever, no one way that works 100%.

Anyway, the captcha caching issue I started with is resolved, although not with the best possible solution.

Regards,
Nick


abbybaby’s picture

My platform is WordPress and not Drupal, but the situation is similar. My problem was that the contact form worked when I was logged in, but stopped working when I logged out.

When I disabled Varnish and ran the website on Nginx alone, everything worked fine.

So I thought something funny was going on with POST, but why did the problem only happen when I was not logged in?

This is what I found

sub vcl_fetch {
  ....
  # don't cache response to posted requests or those with basic auth
  if (req.request == "POST" || req.http.Authorization) {
    return (hit_for_pass);
  }
  ...
}

But vcl_fetch only runs after a response has been fetched from the backend (the Nginx server), and there was nothing for POST in vcl_recv, so I added this:

sub vcl_recv {
  ....
  if (req.request == "POST") {
    # Note: this ban invalidates every cached object on each POST.
    ban("req.url ~ /");
    return (pass);
  }
  ....
}

Ref: https://gist.github.com/ijin/3038775/raw/06dfeab7d0034664dba396c8d54873b...

Voila! My contact us forms were working again. Hope it helps you too.
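
If flushing the whole cache on every form submission is not wanted, a lighter variant could simply pass anything that is not a GET or HEAD, without the ban. This is only a sketch, assuming Varnish 3 syntax, and it mirrors what the default vcl_recv does when it is not bypassed by an earlier return:

```vcl
sub vcl_recv {
  # Only GET and HEAD are candidates for caching; everything else
  # (POST, PUT, ...) goes straight to the backend.
  if (req.request != "GET" && req.request != "HEAD") {
    return (pass);
  }
}
```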