Only allow X read/write streams to be open at a time. While you can issue over a thousand requests in under a second, that might not be the best idea.



hass

Not only a good idea: per the RFC, you need to limit it to 8 requests per server. We may use the module in linkchecker, and then, yes, we will push 1000 URLs into the array and hammer the servers...

mikeytown2

Which RFC is that? I'd like to reference it in a code comment.

hass

Phew... I can't remember. It's the limit that browsers have implemented... IE had a limit of 2 in the past, which you can change via the registry. Some Firefox add-ons raised it to 20, but with a note that this may cause trouble. Maybe some sites lock down after 8 connections...

Just to clarify: this does not mean 8 in total, it's 8 per domain or hostname... So you could run 1000 requests from your server, as long as you make sure that any single hostname is only hit by 8 simultaneous requests.
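To make the per-hostname distinction concrete, here is a minimal Python sketch (not HTTPRL's code, which is PHP) that splits a flat list of URLs into rounds in which no hostname appears more than 8 times. The function name plan_rounds and the round-based model are assumptions for illustration; a real non-blocking client would start a new request to a host as soon as one of its slots frees up, rather than waiting for the whole round to finish.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def plan_rounds(urls, per_host=8):
    """Split urls into rounds holding at most per_host URLs per hostname.

    Hypothetical helper for illustration only.
    """
    queues = defaultdict(deque)  # hostname -> pending URLs, in input order
    for url in urls:
        host = urlsplit(url).hostname or ""  # .hostname is already lowercased
        queues[host].append(url)

    rounds = []
    while any(queues.values()):  # an empty deque is falsy
        current = []
        for pending in queues.values():
            # Take up to per_host URLs from this host for this round.
            for _ in range(min(per_host, len(pending))):
                current.append(pending.popleft())
        rounds.append(current)
    return rounds
```

With 1000 URLs spread evenly over 200 hostnames (5 each), everything fits into a single round, so the total can be large while each host sees at most 8 concurrent requests. Twenty URLs on a single hostname, by contrast, need three rounds of 8, 8 and 4.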

mikeytown2

RFC 2616 (HTTP 1.1)

Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy. A proxy SHOULD use up to 2*N connections to another server or proxy, where N is the number of simultaneously active users. These guidelines are intended to improve HTTP response times and avoid congestion.

It looks like this is not in RFC 1945 (HTTP/1.0).

HTTPRL should define 2 limits: a global max and a per-domain max. A domain max of 8 works for me. The global max would be something like 128, meaning it will keep at most 128 connections open at a time. Non-blocking requests close the connection, so it is still possible to flood a server (there is no good way around this issue).

Something to note is that HTTPRL does not use persistent connections and is an HTTP/1.0 client; it sets "Connection: close". A persistent connection would send "Connection: keep-alive" instead. Being nice to servers is generally a good idea, so setting this to 8 sounds like a plan.
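The two proposed limits (a global max of 128 open connections and a per-domain max of 8) can be sketched with asyncio semaphores. This is a hedged Python illustration, not HTTPRL's PHP implementation: fetch_all, the inner fetch coroutine and the peak counters are hypothetical names, and the network call is replaced by asyncio.sleep so the example runs offline. Each request holds both its host's semaphore and the global semaphore while it is in flight.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit

GLOBAL_MAX = 128   # total open connections (figure from the discussion)
PER_HOST_MAX = 8   # open connections per hostname


async def fetch_all(urls, global_max=GLOBAL_MAX, per_host_max=PER_HOST_MAX):
    global_sem = asyncio.Semaphore(global_max)
    # One semaphore per hostname, created lazily on first use.
    host_sems = defaultdict(lambda: asyncio.Semaphore(per_host_max))
    active_total = 0
    active_per_host = defaultdict(int)
    peak = {"total": 0, "per_host": 0}  # observed maxima, for checking

    async def fetch(url):
        nonlocal active_total
        host = urlsplit(url).hostname or ""
        # Take the per-host slot first, so a request queued behind a busy
        # host does not sit on one of the scarcer global slots.
        async with host_sems[host]:
            async with global_sem:
                active_total += 1
                active_per_host[host] += 1
                peak["total"] = max(peak["total"], active_total)
                peak["per_host"] = max(peak["per_host"], active_per_host[host])
                await asyncio.sleep(0.001)  # stand-in for the real HTTP request
                active_per_host[host] -= 1
                active_total -= 1
        return url

    # gather() returns results in the same order as the input URLs.
    results = await asyncio.gather(*(fetch(u) for u in urls))
    return results, peak
```

For example, asyncio.run(fetch_all(urls)) over 1000 URLs spread across 20 hosts lets the global cap bind (20 × 8 = 160 > 128), while a single busy host never exceeds 8 concurrent requests. As noted above, this caps open connections only; it does not by itself slow down the overall request rate.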

mikeytown2

Priority: Normal » Critical
mikeytown2

Status: Active » Fixed
2.93 KB

This patch has been committed.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

mc0e

I think this per-server limit should be revised downwards. If I saw a process hitting my server at this sort of speed, I'd block it. Many sites have mechanisms to do so automatically.

The limits referred to in the RFCs are meant for browsers and are not appropriate in this context: most browser requests are for static files, and the usage pattern is characterised by brief bursts of activity with long gaps in between.

In this context it's more appropriate to refer to recommendations for crawler behaviour, which generally specify how many seconds to wait between fetches from a given domain, not how many fetches to run in parallel. Major crawlers such as Google and Yahoo typically wait about 3 seconds between hits on a site with a few hundred thousand pages, and mostly go slower on smaller sites.
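A crawler-style politeness rule spaces fetches to the same domain by a fixed delay instead of capping parallelism. The sketch below (the name schedule_fetches is hypothetical, and the 3-second default is simply the figure quoted above) computes a start time for each URL. It treats fetches as instantaneous so that only the per-domain spacing is modelled; fetches to different domains do not delay each other.

```python
from urllib.parse import urlsplit


def schedule_fetches(urls, delay=3.0, start=0.0):
    """Return (start_time, url) pairs, sorted by time, such that fetches
    to the same hostname start at least `delay` seconds apart.

    Hypothetical helper; fetch duration is ignored.
    """
    next_allowed = {}  # hostname -> earliest time its next fetch may start
    schedule = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        t = next_allowed.get(host, start)
        schedule.append((t, url))
        next_allowed[host] = t + delay
    # sorted() is stable, so URLs with equal start times keep input order.
    return sorted(schedule, key=lambda item: item[0])
```

For a.example/1, a.example/2, b.example/1 and a.example/3, this yields start times 0 and 0 for the first page on each host, then 3 and 6 for the remaining a.example pages. A real client would sleep until each start time (for example against time.monotonic()) before issuing the request.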

hass

The current limit has been reduced to only two requests per domain.

mikeytown2