I am getting lots of log entries saying page not found. Anyone know why?

page not found 07/20/2007 - 13:06 feed/index.php/0/taxonomy/term/taxonomy/0/index.php Anonymous
page not found 07/20/2007 - 13:05 term/index.php/0/taxonomy/term/taxonomy/0/feed Anonymous
page not found 07/20/2007 - 13:05 taxonomy/term/index.php/0/taxonomy/term/taxonomy/0/feed Anonymous

Comments

Ev0’s picture

Yes, I'm receiving a few similar messages, where the URL just repeats itself. Are all of yours coming from the same IP address? When I looked up the IP address, mine all turned out to be coming from Amazon.com, for some reason.

jase951’s picture

It may well be that these are crawlers or even spam bots trying to guess URLs.

JirkaRybka’s picture

Some time ago, I had lots of such messages, and it was an issue with URL-format somewhere (a picture in my custom theme, but that's not relevant here).

You can have two styles of URLs:
--- Absolute: http://domain.com/something/something_else, or just /something/something_else
--- Relative: something/something_else

The absolute ones are based on your document root, while relative ones are - well, relative to the current document. You should always avoid relative URLs, as they cause problems when the current URL contains variables.

In my case, with no clean URLs in use, the current document was /drupal/?q=something and the relative path in my theme was images/foo.gif. Any smart browser builds the real URL as /drupal/images/foo.gif, which is correct, but there are also stupid ones (perhaps crawlers, yes) that send requests to /drupal/?q=something/images/foo.gif. See what happens? The relative path is appended to the whole string, including the variable, so it's Drupal that gets the images/foo.gif part to deal with, not the webserver. In my case, the resulting "not found" page contained the image again, making it even worse: the crawler appended another occurrence of the string, then one more... I ended up with thousands of messages in watchdog, each repeating the string almost endlessly.
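To make the difference concrete, here is a small Python sketch of the two behaviours described above. It is only an illustration: example.com is a placeholder host, and resolve_naively is a made-up function that guesses at what a sloppy crawler might be doing, not code taken from any real crawler.

```python
from urllib.parse import urljoin

# Hypothetical values modelled on the post; example.com is a placeholder host.
page_url = "http://example.com/drupal/?q=something"
relative_link = "images/foo.gif"


def resolve_correctly(base, link):
    """Standard resolution: the query string of the base URL is dropped."""
    return urljoin(base, link)


def resolve_naively(base, link):
    """Assumed crawler bug: glue the link onto the whole URL, query included."""
    return base + "/" + link


print(resolve_correctly(page_url, relative_link))
# http://example.com/drupal/images/foo.gif
print(resolve_naively(page_url, relative_link))
# http://example.com/drupal/?q=something/images/foo.gif
```

In the second result the webserver never sees images/foo.gif as a file path; everything after ?q= is handed to Drupal, which logs it as "page not found".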

Okay, it was just a stupid crawler, not stripping the variables from the URL. But you seem to be using clean URLs, and in THIS case the browser may not know what's a variable and what's just a subdirectory. The problem with relative paths is then unavoidable... So make sure that all your URLs start with at least a slash (or, better, the full "http://"). In a theme, you can just print base_path() in front of your path to make it absolute.
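The clean-URL case can be sketched the same way. The snippet below is a rough simulation, not a reproduction of the original poster's site: the host, the taxonomy/term/5 path and the crawl helper are all invented for illustration, and BASE_PATH merely stands in for what base_path() would return on a site installed at the document root. It assumes, as in the story above, that every "not found" page contains the same link again.

```python
from urllib.parse import urljoin

SITE = "http://example.com"  # placeholder host
BASE_PATH = "/"              # stand-in for base_path() on a root install


def crawl(start_url, link, hops):
    """Follow the same link repeatedly, as a crawler would if every
    page it reached (including 404 pages) contained that link again."""
    url = start_url
    visited = []
    for _ in range(hops):
        # Standards-compliant resolution against the current page.
        url = urljoin(url, link)
        visited.append(url)
    return visited


start = SITE + "/taxonomy/term/5"

# Relative link: each hop resolves against a deeper "directory".
relative = crawl(start, "taxonomy/term/5/feed", 3)

# Same link prefixed with the base path: every hop lands on the same URL.
absolute = crawl(start, BASE_PATH + "taxonomy/term/5/feed", 3)

for url in relative:
    print(url)
for url in absolute:
    print(url)
```

With the relative link the first hop already lands on /taxonomy/term/taxonomy/term/5/feed and the path keeps growing from there, much like the repeated segments in the log entries at the top of this thread. With the leading slash every hop resolves to /taxonomy/term/5/feed.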

jamesclarke’s picture

I have been having a similar problem . . . I think it has actually been causing performance issues on a relatively low-traffic site! So, here's hoping I can figure out where this repetitive loop is!

I'll try to remember to report back if it worked.

jamesclarke’s picture

I'm not sure if this has solved all of my Drupal problems, but it seems to have addressed a lot of them. I had one relative link with an extra / and that seemed to send a crawler into fits . . . I also think that this was causing database problems because of all the 404 entries in the watchdog table. Time will tell.

suzanne.aldrich’s picture

Subscribing.