I have had problems with the aggregator module being unable to retrieve feeds from certain sites. It has been happening ever since I started using Drupal. I tried to investigate by adding print statements here and there to follow the drupal_http_request() flow, but all I could see was that the requests were answered with 302 redirects to other URLs where there was no feed, so aggregator was unable to fetch anything. I couldn't work out the cause, so I left it as it was... but today I found it!

The problem is caused by these lines:

    // RFC 2616: "non-standard ports MUST, default ports MAY be included". We always add it.
    'Host' => "Host: $uri[host]:$port",

combined with mod_rewrite rules that check the host without taking into account that the port may be present.

Sometimes, webmasters do something like this:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^subdomain\.example\.com$
    RewriteRule .* http://subdomain.example.com/ [L,R]

or something like:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.example\.com$
    RewriteRule .* - [F]

The aim is to force visitors to use a particular host name to access the site. That makes some sense, as it may help keep search engines from indexing unwanted URLs, or there may be some other reason. It is the webmaster's right to decide how to do it.

The problem with this kind of rule is that requests made by the drupal_http_request() function always include the port, so a request to www.example.com is sent as "Host: www.example.com:80". That value does not match the anchored host pattern, so the request is redirected or rejected on servers where rules like the ones above are in place.
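To make the mismatch concrete, here is a minimal sketch in Python (not Drupal's PHP, and not Apache itself). It assumes Python's `re` module treats this simple pattern the same way mod_rewrite's regex engine does, and `rule_fires` is a hypothetical helper name used only for illustration:

```python
import re

# Pattern copied from the second RewriteCond above. The condition is
# negated with "!", so the rule fires when the pattern does NOT match.
HOST_PATTERN = re.compile(r"^www\.example\.com$")


def rule_fires(host_header_value: str) -> bool:
    """Return True when the negated RewriteCond would trigger the rule."""
    return HOST_PATTERN.search(host_header_value) is None


print(rule_fires("www.example.com"))     # False: request is served
print(rule_fires("www.example.com:80"))  # True: request is redirected/forbidden
```

The trailing `$` anchor is what rejects the ":80" suffix, even though both values refer to the same host on the default port.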

I hope you see what I mean. Since it is perfectly correct to leave the port out of the header when the standard port is used, I propose changing drupal_http_request() to read:

    // RFC 2616: "non-standard ports MUST, default ports MAY be included".
    // We don't add the port, to avoid breaking rewrite rules that check
    // the host without taking the port into account.
    'Host' => "Host: $uri[host]",

Please see the attached patch.

Comments

markus_petrux:

I forgot to say that you cannot control how other sites set up their redirection rules, yet they may be offering feeds that you cannot actually read because the port is included in the HTTP request. So we have a problem.

markus_petrux:

Sorry, my previous patch was not correct. It never sent the port, but the port must be sent when it is non-standard.
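The corrected rule (leave the port out when it is the default, include it otherwise) can be sketched roughly as follows. This is only an illustration in Python, not the patch committed to Drupal; `host_header` and `DEFAULT_PORTS` are hypothetical names, and the sketch assumes only http and https URLs with ordinary (non-IPv6) host names:

```python
from urllib.parse import urlsplit

# Default ports per scheme (assumption: only http and https matter here).
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(url: str) -> str:
    """Build a Host header line, adding the port only when it is non-standard."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    port = parts.port  # None when the URL does not name a port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        # RFC 2616: default ports MAY be omitted.
        return f"Host: {parts.hostname}"
    # RFC 2616: non-standard ports MUST be included.
    return f"Host: {parts.hostname}:{port}"


print(host_header("http://www.example.com/feed"))       # Host: www.example.com
print(host_header("http://www.example.com:8080/feed"))  # Host: www.example.com:8080
```

With this behaviour, a host-only rewrite condition still matches for ordinary sites, while servers on unusual ports keep receiving the port they need.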

Damien Tournoud:

I confirm the problem and the solution 100%.

Example of such feed: http://linuxfr.org/backend/news/rss20.rss

- DamZ

markus_petrux:

Thanks for checking :-)

Dries:

Gerhard, is this something you can look at? I believe you were involved with the original change (but I might be mistaken).

killes@www.drop.org:

Status: Needs review » Reviewed & tested by the community

Yes, I was involved and I didn't want the port included, but you and chx disagreed...

Also see http://drupal.org/node/46928 for a related problem.

So I am quite OK with the patch.

Dries:

Status: Reviewed & tested by the community » Fixed

OK. Committed. Thanks Markus.

Anonymous:

Status: Fixed » Closed (fixed)