I suggest providing an additional static page with the Antibot response that is requested via GET and can therefore be cached by Varnish or other proxies.
If a web server rule is added (in .htaccess, nginx.conf, etc.) to redirect /antibot requests to /antibot-static, the /antibot-static page is loaded via GET without additional parameters, and the response can be cached.

Comments

maximpodorov created an issue. See original summary.

maximpodorov’s picture

Status: Active » Needs review
Attached file (status: new, size: 447 bytes)
mstef’s picture

Interesting point. At first, I was going to suggest simply making the path configurable, so you could set it to whatever you wanted, but the point about POST requests being uncacheable is a good one.

Is there a way we can build this into the module so it does not require any web server configuration to redirect the POST request to a GET?

maximpodorov’s picture

I don't think so. Any action executed in Drupal requires bootstrapping Drupal. We need to write documentation with configuration examples for Apache, Nginx, and maybe other servers.

mstef’s picture

I'm not following, then. Your patch adds another Drupal controller. That would invoke Drupal in the same way as what I am suggesting. If you truly want to avoid Drupal altogether, why not just add a static HTML file and use htaccess to redirect to it on any request to /antibot?

sinn’s picture

Yes, static HTML is also possible, but in that case you will need to implement the header and footer yourself.
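
As a rough illustration of the static-file variant, the .htaccess rule could look like the sketch below. The file name antibot.html is hypothetical (the module does not ship such a file), and the rule is assumed to sit before Drupal's own front-controller rewrite rules.

```apache
# Sketch only: send any request for /antibot (optionally language-prefixed)
# to a hand-written static page. "antibot.html" is a hypothetical file
# placed in the document root; it is not provided by the Antibot module.
RewriteEngine On
RewriteRule ^([a-z\-]+/)?antibot$ /antibot.html [R=302,L]
```

With Drupal's default .htaccess, an existing file is served by Apache directly, so Drupal is not bootstrapped for these requests, but the page does not pick up the site theme.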

antibot-static will be cached by Varnish, so Drupal will be invoked just once.

mstef’s picture

There is nothing preventing you from requesting /antibot via a GET request. If you need to tinker with htaccess to hijack requests, why not just redirect any POST request to /antibot to a GET request on /antibot? There's no difference between that page and the one your patch adds. Maybe I'm missing something?

maximpodorov’s picture

I use the following redirect configuration in .htaccess (for the possible language prefixes, ([a-z\-]+/)? is used):

# Redirect antibot (GET or POST) to antibot-static (GET)
RewriteRule ^([a-z\-]+/)?antibot$ /$1antibot-static [R=302,L]

# Deny access to antibot-static (POST)
RewriteCond %{REQUEST_METHOD} POST [NC]
RewriteRule ^([a-z\-]+/)?antibot-static$ - [F,L]

# Deny access to antibot-static (GET with query parameters)
RewriteCond %{REQUEST_METHOD} GET [NC]
RewriteCond %{QUERY_STRING} ^.+$
RewriteRule ^([a-z\-]+/)?antibot-static$ - [F,L]

This is useful if Drupal responses are cached by Varnish or another caching service.

The first rule prevents POST requests with arbitrary POST data from reaching Drupal, while legitimate browsers are still able to see the /antibot-static page.
The second rule prevents POST requests to /antibot-static from reaching Drupal (some bots may try this).
The third rule prevents GET requests to /antibot-static with query parameters from reaching Drupal (some bots may try this).

So the only requests that reach Drupal are GET requests to /antibot-static without query parameters, and those responses are cached. Without these .htaccess rules, the Antibot module accepts POST requests with arbitrary POST data, and such requests can't be cached (unless Varnish is configured to do so, which is not always possible).

Since GET requests to /antibot-static are cached, Drupal is reached only once per page cache lifetime, not for every POST request that bots send. This does not hold if the site uses language prefixes, because Antibot doesn't generate the "action" attribute properly:
$build['#action'] = base_path() . 'antibot';
So /antibot is redirected to /antibot-static, and /antibot-static is then redirected to /[langcode]/antibot-static, which adds an unnecessary second redirect.
Instead, something like this should be used:
$build['#action'] = Url::fromRoute('antibot.antibot')->toString();

maximpodorov’s picture

Attached file (status: new, size: 1.16 KB)

So it's better to change the "action" attribute calculation.

sinn’s picture

> There is nothing preventing you from requesting /antibot via a GET request...
Each form has hidden input fields with unique values: form_id, form_token, form_build_id. Bots also send unique data. Yes, we can use a GET request, but all these requests will still be handled by Drupal because of their uniqueness. They are cached by the reverse proxy only AFTER they have been processed by Drupal. We have tons of such requests and want to decrease the load on the servers by having Varnish cache them. We can't change the Varnish configuration due to the complex infrastructure of the cloud hosting, so this htaccess trick was invented.
With this solution, the Antibot module will be able not only to protect forms but also to reduce load on the servers.

mstef’s picture

Status: Needs review » Needs work

I understand the issue and agree there should be something in this module to address it. I do not like the current patch for a few reasons:

  1. It doesn't actually provide anything functionally. You still need server configuration to do anything with it.
  2. The added route is identical to the current one. I still don't quite understand why you cannot just redirect the user to that route instead while doing the same manipulation of the request to make it a GET without any data.
  3. This route could easily be added in a custom module. In fact, since you're manually configuring your server, you might as well just direct it to any page or content on your site.
  4. If we end up with a solution that does require server configuring, the patch should contain some documentation about it so others can make use of it.

I would like to brainstorm a solution for this which does not require any server configuration. I would think there's a way we can do it, even though it may not be quite as performant as using something like htaccess. Maybe there isn't, though.

maximpodorov’s picture

2. It would be possible just to redirect POST requests from /antibot to /antibot, and normal browsers will issue a GET request after receiving such a redirect response, but I want to create some defense against stupid bots which may send the POST request again after receiving the redirect. In my solution above, the second POST request would go to /antibot-static, and such requests are blocked.
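
The simpler alternative described in point 2 might be written as follows. This is only a sketch: the 303 status is an assumption, chosen because it tells the client to repeat the request as a GET, and the rule is assumed to sit before Drupal's front-controller rewrite rules.

```apache
# Sketch only: answer POSTs to /antibot with a redirect to the same path.
# A client that follows the redirect arrives with a GET, which the
# condition below no longer matches, so there is no redirect loop.
RewriteCond %{REQUEST_METHOD} =POST
RewriteRule ^([a-z\-]+/)?antibot$ /$1antibot [R=303,L]
```

Note that this variant does nothing about GET requests carrying arbitrary query parameters, which would still reach Drupal.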

I don't think it's possible to provide just a Drupal solution (without changing web server or proxy server configuration). If bots send POST requests, Drupal will be reached in any case. If bots send GET requests with arbitrary query parameters, Drupal will be reached if query parameters are different from the parameters of the previous requests.

mstef’s picture

Status: Needs work » Closed (won't fix)

Okay, I understand.

If this cannot be entirely solved in Drupal, then I don't think committing this makes much sense. You could very simply add a route to a custom module - or even use an existing page on your site, since you have control over the redirect path via htaccess. I would think not relying on this Antibot page is even better so you have full control over the wording/content. The route in antibot does nothing special or specific to meet your needs. It just outputs a generic message.

Please reopen this ticket if there's some way we can even marginally improve performance through this module.

maximpodorov’s picture

I think it would be better to describe this performance optimization in the module documentation.

maximpodorov’s picture

Attached file (status: new, size: 1.18 KB)

Re-roll for 8.x-1.4.

kuldeepbarot’s picture

Attached file (status: new, size: 1.23 KB)

Updated this patch to make it compatible with Drupal 10.5.10