This is Day 3 of Ubersoft.net running live on Drupal 5.1. Other than Drupal.org freaking out my hosting provider (they thought I was being DoS'd and temporarily suspended my site, lol), everything is going relatively smoothly. However, I am trying to figure out how to reduce the amount of resources my setup is using.

I'm seeing this in my watchdog list:

 page not found 23 Mar 2007 - 8:30am d/20040506.html Anonymous  
 page not found 23 Mar 2007 - 8:30am d/19970227.html Anonymous  
 page not found 23 Mar 2007 - 8:30am d/19970306.html Anonymous  
 page not found 23 Mar 2007 - 8:30am comic.rss Anonymous  
 page not found 23 Mar 2007 - 8:29am d/20010308.html Anonymous  
 page not found 23 Mar 2007 - 8:29am d/20010814.html Anonymous  
 page not found 23 Mar 2007 - 8:29am d/20000522.html Anonymous  
 page not found 23 Mar 2007 - 8:28am comics/hd20070323.png Anonymous  
 page not found 23 Mar 2007 - 8:27am comics/hd20070322.png Anonymous  
 page not found 23 Mar 2007 - 8:26am d/20001226.html Anonymous  
 page not found 23 Mar 2007 - 8:25am comics/hd20070321.png Anonymous  
 page not found 23 Mar 2007 - 8:25am favicon.ico Anonymous 

Essentially, "Anonymous" users are trying to access files from my old site structure, back when it was all static pages. I expect a certain amount of this, since these links have been largely unchanged since 2000, but the vast majority of watchdog entries are anonymous users requesting files that no longer exist.

I have two questions:

1. Would users requesting these files consume Drupal/database resources (i.e., is Drupal trying to interpret these as permalinks, finding nothing, and consuming resources in the process)?

2. Would watchdog logging all of these requests consume Drupal/database resources?

If the answer to either of these is "yes," what's the best way to deal with them?

Thanks for any help you can provide...

Comments

ubersoft

... that this might be a more appropriate question for the "Performance and scalability" forum. I'm not sure, but if an admin thinks it would be a better fit there, feel free to move this thread.

catch

Depending on your setup, Drupal will serve a 404 page for every request to a page that isn't found. That's a full page load every time, and hence database queries (I'm not really up on this, and how much it costs depends on page caching). I don't think it does this at all for images etc., though, at least in 5.x, and 5.x also doesn't call blocks on 404s any more.

There are a couple of reasons for the 404s, which you'll know about: external links from other sites, and search engine indexes that haven't updated to the new URLs. The best way to deal with both, as far as I know, is to set up 301 redirects for the most common 404 errors, pointing to an equivalent page or file. The Path Redirect module makes this simple, or you can write .htaccess rules if you're clever. This is supposed to speed up search engine re-indexing (and will avoid some damage to your rankings, if you're worried about that), and it also means people who don't update their RSS feeds will still get them.

If you're getting lots and lots of failed requests, your watchdog table can get quite full, which could slow things down and eat resources, so it's worth setting the watchdog history to only a few days or so. Also make sure you're checking/optimizing/rebuilding your MySQL tables every so often, since some of them, like watchdog and cache, can build up a lot of overhead.

I think it's fairly easy to stop Drupal from serving 404 pages at all, so you can fall back to the standard Apache one (or a static HTML page). That would probably make a fair difference to resource consumption.

ubersoft

At the moment I'd like to use a static page for my 404s. How do I disable it in Drupal? The only setting I see is a way to replace Drupal's default message with a custom page.

ubersoft

This is actually very worrisome... this is a long quote, so bear with me.

This is the latest info I have from watchdog:

warning	page not found	23 Mar 2007 - 6:57pm	d/20021219.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:54pm	d/20020626.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:51pm	d/20001009.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:51pm	comics/hd20070309.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:50pm	kpanic	Anonymous	
warning	page not found	23 Mar 2007 - 6:50pm	comics/hd20060720.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:48pm	kpanic/comics/kp20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:45pm	comics/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:45pm	comic.rss	Anonymous	
warning	page not found	23 Mar 2007 - 6:44pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:44pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:44pm	d/20031008.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:42pm	comics/hd20070305.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:42pm	images/banner/banner02.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:41pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:41pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:39pm	comics/hd20060720.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:38pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:38pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:38pm	kpanic/index.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:38pm	d/19990903.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:37pm	d/20070319.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:36pm	kpanic/comics/kp20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:35pm	doubleclick/DARTIframe.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:34pm	comics/hd20070308.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	kpanic/comics/kp20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070323e.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070323g.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070323c.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070323a.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070322g.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070322.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070322e.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070321.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070322a.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070322c.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070321g.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070321a.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070321c.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	comics/hd20070321e.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:33pm	atom/feed	Anonymous	
warning	page not found	23 Mar 2007 - 6:32pm	files/comics/kp/osw20070110.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:31pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:31pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:30pm	files/comics/kp/osw20070110	Anonymous	
warning	page not found	23 Mar 2007 - 6:30pm	files/comics/kp/kp20061228.	Anonymous	
warning	page not found	23 Mar 2007 - 6:30pm	files/comics/hd/hd20070323.	Anonymous	
warning	page not found	23 Mar 2007 - 6:30pm	comic.rss	Anonymous	
warning	page not found	23 Mar 2007 - 6:30pm	features/bothbarrels/20010316.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:30pm	features/bothbarrels/20010316.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:29pm	d/20000727.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:29pm	d/20050929.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:28pm	comics/kp/osw20070110	Anonymous	
warning	page not found	23 Mar 2007 - 6:28pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:24pm	d/archives.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:23pm	comics/hd20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:23pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:23pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:22pm	comics/kp/kp20061228.	Anonymous	
warning	page not found	23 Mar 2007 - 6:22pm	comics/hd/hd20070323.	Anonymous	
warning	page not found	23 Mar 2007 - 6:22pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:16pm	comics/hd20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:16pm	comics/kp20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:15pm	comic.rss	Anonymous	
warning	page not found	23 Mar 2007 - 6:13pm	d/20010711.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:13pm	d/20070312.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:10pm	d/19990920.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:10pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:10pm	images/banner/banner27.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:08pm	features/apostrophecolon/reallygreat.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:07pm	doubleclick/DARTIframe.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:07pm	links/png.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:05pm	feed/atom	Anonymous	
warning	page not found	23 Mar 2007 - 6:05pm	d/20050413.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:04pm	d/20070323.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:03pm	files/comics/hd/hd20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 6:03pm	d/19990316.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:02pm	atom/feed	Anonymous	
warning	page not found	23 Mar 2007 - 6:01pm	d/20050711.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:01pm	d/20020807.html	Anonymous	
warning	page not found	23 Mar 2007 - 6:00pm	comic.rss	Anonymous	
warning	page not found	23 Mar 2007 - 6:00pm	d/20070323.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:59pm	doubleclick/DARTIframe.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:59pm	comics/hd20060405.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:58pm	d/20070312.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:55pm	d/20061110.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:49pm	comics/comics/hd/hd20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:47pm	d/20021106.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:47pm	comics/comics/hd/hd20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:47pm	comics/comics/hd/hd20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:45pm	comic.rss	Anonymous	
warning	page not found	23 Mar 2007 - 5:45pm	links/linux.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:42pm	comics/hd20060405.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:40pm	comics/hd20060405.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:40pm	features/bothbarrels/20010820.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:38pm	d/20031029.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:38pm	d/20021212.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:38pm	comics/hd20060405.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:38pm	d/20050721.html	Anonymous
warning	page not found	23 Mar 2007 - 5:37pm	d/20050718.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:36pm	comics/hd20031114.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:36pm	comics/hd20031117.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:36pm	comics/hd20031114.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:35pm	kpanic/index.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:34pm	images/banner/banner13.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:30pm	comic.rss	Anonymous	
warning	page not found	23 Mar 2007 - 5:30pm	comics/hd20011205c.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:30pm	wp-comments-post.php	Anonymous	
warning	page not found	23 Mar 2007 - 5:30pm	siteinfo.xml	Anonymous	
warning	page not found	23 Mar 2007 - 5:28pm	d/20041102.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:26pm	d/20030220.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:26pm	d/19980111.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:26pm	d/20030526.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:26pm	comics/hd20070309.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:26pm	d/20000526.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:24pm	comics/hd20060405.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:19pm	d/20001222.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:19pm	d/19990816.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:18pm	d/20010509.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:17pm	d/20001221.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:17pm	images/banner/banner25.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:15pm	comic.rss	Anonymous	
warning	page not found	23 Mar 2007 - 5:09pm	kpanic	Anonymous	
warning	page not found	23 Mar 2007 - 5:08pm	d/19990908.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:08pm	comic.rss	Christopher Wright	
warning	page not found	23 Mar 2007 - 5:06pm	d/20051212.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:06pm	comics/hd20031114.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:06pm	comics/hd20031117.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:06pm	comics/hd20031114.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:05pm	comics/hd20031113.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:05pm	comics/hd20031113.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:05pm	comics/hd20031114.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:05pm	feed/atom	Anonymous	
warning	page not found	23 Mar 2007 - 5:05pm	d/20010607.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:04pm	d/20070319.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:04pm	d/20020211.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:03pm	d/20041129.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:02pm	doubleclick/DARTIframe.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:01pm	d/20070323.html	Anonymous	
warning	page not found	23 Mar 2007 - 5:00pm	kpanic/comics/kp20070324.png	Anonymous	
warning	page not found	23 Mar 2007 - 5:00pm	comic.rss	Anonymous	
warning	page not found	23 Mar 2007 - 5:00pm	d/20070323.html	Anonymous	
warning	page not found	23 Mar 2007 - 4:54pm	comics/hd20061101.png	Anonymous	
warning	page not found	23 Mar 2007 - 4:53pm	d/archives.html	Anonymous	
warning	page not found	23 Mar 2007 - 4:53pm	d/20070306.html	Anonymous	
warning	page not found	23 Mar 2007 - 4:53pm	d/20060109.html	Anonymous	
warning	page not found	23 Mar 2007 - 4:52pm	comics/hd20070323.png	Anonymous	
warning	page not found	23 Mar 2007 - 4:51pm	d/20010503.html	Anonymous	
warning	page not found	23 Mar 2007 - 4:49pm	d/20050412.html	Anonymous

This goes on for pages and pages and pages. Every few minutes someone tries to access files from the OLD location. It's not the same person (the IP addresses differ, though there's a lot of activity from the 74.6.69.x range), but it happens every few minutes.
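To put "every few minutes" into numbers, and to see what catch's suggestion of keeping only a few days of watchdog history means in table rows, here is a small Python sketch. It uses only the row count and the first/last timestamps of the excerpt above (150 rows between 4:49pm and 6:57pm); treating that rate as constant is a simplifying assumption, not a measurement of the whole site.

```python
from datetime import datetime

# Watchdog date format as shown above, e.g. "23 Mar 2007 - 6:57pm".
# Assumes an English/C locale for the month name and am/pm.
FMT = "%d %b %Y - %I:%M%p"

def entries_per_minute(count, first, last):
    """Average number of log entries per minute between two timestamps."""
    span = datetime.strptime(last, FMT) - datetime.strptime(first, FMT)
    return count / (span.total_seconds() / 60)

def rows_retained(rate_per_minute, history_days):
    """Rough watchdog table size if entries older than history_days are
    discarded and the request rate stays constant (truncated to an int)."""
    return int(rate_per_minute * 60 * 24 * history_days)

# Figures read off the excerpt: 150 "page not found" rows, 4:49pm to 6:57pm.
rate = entries_per_minute(150, "23 Mar 2007 - 4:49pm", "23 Mar 2007 - 6:57pm")
print(f"~{rate:.2f} entries per minute")
for days in (1, 3, 7, 30):
    print(f"{days:>2} day(s) of history -> ~{rows_retained(rate, days):,} rows")
```

At that rate a one-day history holds roughly 1,700 rows and a week roughly 12,000, which is why a short history setting keeps the table small.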

I've modified my .htaccess file as described here to keep Drupal from generating a database-driven 404 for many of those kinds of files (alas, not png), but that only works for requests for files in the root directory. Files like "d/20050412.html" are still treated as if they were permalinks.

Is there some way to just turn off Drupal's error messages? The shtml files my provider supplies will be fine for me.

chrisschaub

You need to set up mod_rewrite rules for these old pages in your site's .htaccess. A global rewrite rule for *.html would be bad, because Google could flag you for having too many requests all pointing to the same page. So, could you map the old URLs to new Drupal pages? If so, create the mod_rewrite rules using a 301 redirect. That way Google and others will update themselves and stop hammering you. You could use the 404search module to redirect not-found requests to searches, but Google will hate that, since you will never return a 404 error; Google checks that missing pages actually return a 404. Hopefully you have a backup of the old site and can create the mappings. mod_rewrite is your friend. The Redirect module also lets you do this within Drupal and set a 301; it doesn't carry forward any query strings, but it will do the redirect. Triage: do your most-requested files first, using Redirect. Hope this helps.
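One way to find the "most requested files" for triage is simply to tally the path column of the watchdog listing. The Python sketch below assumes you have copied those paths into a list; the sample is fourteen consecutive rows from the excerpt above, starting at the 6:00pm comic.rss entry. It is plain counting, not a Drupal API.

```python
from collections import Counter

# Fourteen consecutive paths from the watchdog excerpt (6:00pm back to 5:40pm).
PATHS = """\
comic.rss
d/20070323.html
doubleclick/DARTIframe.html
comics/hd20060405.png
d/20070312.html
d/20061110.html
comics/comics/hd/hd20070323.png
d/20021106.html
comics/comics/hd/hd20070323.png
comics/comics/hd/hd20070323.png
comic.rss
links/linux.html
comics/hd20060405.png
comics/hd20060405.png
""".split()

def top_missing(paths, n=3):
    """Most frequently requested missing paths, highest count first."""
    return Counter(paths).most_common(n)

def by_section(paths):
    """Group hits by the first path segment (d, comics, kpanic, ...)."""
    return Counter(p.split("/", 1)[0] for p in paths)

print(top_missing(PATHS))
print(by_section(PATHS).most_common())
```

Grouping by section shows whether one pattern-based rule (say, for everything under comics/) would cover more hits than a pile of one-off redirects.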

ubersoft

in my /d/ directory alone on the old site. There are some specific files I've redirected to nodes, but to do that to all of them... ouch.

Quint

I've done that kind of matchup before.

Do you have lists of the old and the new? Is there any logic that can be used (like whole folders being relocated)? If not, it's still not so hard to paste the lists into Excel and drag the cells around and match up the old and new, then format for exact syntax. ... worth considering. Maybe you can just relate the top-hammered URLs only.

Quint
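The "format for exact syntax" step can also be scripted instead of done by hand in Excel. The Python sketch below assumes the matched lists have been saved as CSV with two columns named old and new; the /node/NNN targets are hypothetical placeholders, not real ubersoft.net nodes. It emits one mod_alias `Redirect 301 URL-path URL` line per row.

```python
import csv
import io

SITE = "http://ubersoft.net"

# Hypothetical old,new pairs. The node numbers are placeholders.
MAPPING_CSV = """\
old,new
/d/20070323.html,/node/101
d/20070319.html,/node/100
/features/bothbarrels/20010316.html,/node/42
"""

def redirect_lines(csv_text, site=SITE):
    """Turn old,new CSV rows into 'Redirect 301 old-path new-URL' lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        old = row["old"].strip()
        new = row["new"].strip()
        if not old.startswith("/"):
            old = "/" + old  # Redirect's first argument is a URL-path starting with /
        lines.append(f"Redirect 301 {old} {site}{new}")
    return lines

print("\n".join(redirect_lines(MAPPING_CSV)))
```

This still produces one line per file, so it suits the "top-hammered URLs only" approach; for long runs of files that follow a naming pattern, a single RedirectMatch rule is far more compact.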

ubersoft

Meanwhile, I modified the robots.txt file to exclude the directories that no longer exist -- hopefully that will help some.
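The thread doesn't show the actual robots.txt, so the one below is an assumption built from the old directories seen in the watchdog log (d/, comics/, kpanic, features/). Python's standard urllib.robotparser can check that such a file blocks the old paths without accidentally blocking new ones like /files/comics/... or /comic/hd/archives/...

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents; the real file isn't shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /d/
Disallow: /comics/
Disallow: /kpanic
Disallow: /features/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="*"):
    """True if a compliant crawler using `agent` may fetch this path."""
    return parser.can_fetch(agent, "http://ubersoft.net" + path)

for path in ["/d/20040506.html", "/comics/hd20070323.png",
             "/kpanic/index.html", "/files/comics/hd/hd20070324.png",
             "/comic/hd/archives/2000"]:
    print(path, "allowed" if allowed(path) else "blocked")
```

Two caveats: robots.txt only affects compliant crawlers, so feed readers and people following old links (e.g. the repeated comic.rss requests) are unaffected; and a crawler that is told not to fetch an old URL also won't see any 301 you later put on it, so this works against the redirect approach for those paths.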

ubersoft

... but I'd like to know everyone's opinion on this first.

Last night I tried setting up 301 redirects for specific files. This worked extremely well (and was fast, too), BUT I have so many files that handling each one individually would make my .htaccess file over 200K in size!

So I thought I'd try using RedirectMatch, like so:

RedirectMatch 301 /comics/hd(.*)\.png$ http://ubersoft.net/files/comics/hd$1
RedirectMatch 301 /kp/comics/kp(.*)\.png$ http://ubersoft.net/files/comics/kp$1
RedirectMatch 301 /comics/osw(.*)\.png$ http://ubersoft.net/files/comics/osw$1

RedirectMatch 301 /d/hd1996(.*)\.html$ http://ubersoft.net/comic/hd/archives/1996
RedirectMatch 301 /d/hd1997(.*)\.html$ http://ubersoft.net/comic/hd/archives/1997
RedirectMatch 301 /d/hd1998(.*)\.html$ http://ubersoft.net/comic/hd/archives/1998
RedirectMatch 301 /d/hd1999(.*)\.html$ http://ubersoft.net/comic/hd/archives/1999
RedirectMatch 301 /d/hd2000(.*)\.html$ http://ubersoft.net/comic/hd/archives/2000
RedirectMatch 301 /d/hd2001(.*)\.html$ http://ubersoft.net/comic/hd/archives/2001
RedirectMatch 301 /d/hd2002(.*)\.html$ http://ubersoft.net/comic/hd/archives/2002
RedirectMatch 301 /d/hd2003(.*)\.html$ http://ubersoft.net/comic/hd/archives/2003
RedirectMatch 301 /d/hd2004(.*)\.html$ http://ubersoft.net/comic/hd/archives/2004
RedirectMatch 301 /d/hd2005(.*)\.html$ http://ubersoft.net/comic/hd/archives/2005
RedirectMatch 301 /d/hd2006(.*)\.html$ http://ubersoft.net/comic/hd/archives/2006
RedirectMatch 301 /d/hd2007(.*)\.html$ http://ubersoft.net/comic/hd/archives/2007

This takes all the PNGs for each individual comic, as they were placed in the old locations, and redirects them to their new locations. It also takes all the HTML files for each individual comic and redirects them to the specific year's archive in the new site structure.

Is this feasible?
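A quick way to check the wildcards is to model what RedirectMatch does: mod_alias searches the regex anywhere in the URL-path (it is not anchored unless you add ^) and substitutes $1, $2, ... into the target. The Python sketch below is an approximation of that behavior, not Apache itself. It shows two things about the hd PNG rule as posted: the capture stops before ".png", so the target loses its extension; and without ^ the pattern also matches inside "/files/comics/hd...", so once ".png" is restored the new URL would redirect to itself. The target directory is kept as posted; some logged requests (files/comics/hd/hd20070324.png) hint that the files may actually live one level deeper, which I can't confirm from the thread.

```python
import re
from urllib.parse import urlsplit

def redirect_match(pattern, target, path):
    """Rough model of mod_alias RedirectMatch: search (not implicitly
    anchored) the URL-path, then replace $1..$9 with captured groups."""
    m = re.search(pattern, path)
    if m is None:
        return None  # rule doesn't apply; the request falls through
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))) or "", target)

# The hd rule exactly as posted.
POSTED = (r"/comics/hd(.*)\.png$", "http://ubersoft.net/files/comics/hd$1")
# Same idea, anchored with ^ and with .png appended to the target.
ANCHORED = (r"^/comics/hd([^/]+)\.png$", "http://ubersoft.net/files/comics/hd$1.png")

old = "/comics/hd20070323.png"
print(redirect_match(*POSTED, old))    # .../files/comics/hd20070323 (no .png)
print(redirect_match(*ANCHORED, old))  # .../files/comics/hd20070323.png

# Unanchored with .png restored: the new path matches its own rule.
new_path = "/files/comics/hd20070323.png"
looped = redirect_match(r"/comics/hd(.*)\.png$",
                        "http://ubersoft.net/files/comics/hd$1.png", new_path)
print(urlsplit(looped).path == new_path)    # True -> redirect loop
print(redirect_match(*ANCHORED, new_path))  # None -> served normally
```

The same check applies to the kp rule: it matches /kp/comics/..., while the logged requests are for kpanic/comics/kp20070323.png, so as written it would not fire for those.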

Quint

ubersoft

... the problem is that all the examples assume you're moving from a static file in one location to a static file in another.

In this case, however, one of the things I'd be doing is moving from a static file to a database query, i.e., /d/hd20000101.html to /comic/hd/archives/2000.

My attempts at this have not worked, and I'm unsure whether it's because I'm using the wildcards improperly or because Apache thinks I'm trying to redirect a file to a directory and refuses to be involved in that kind of nonsense.
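On the file-vs-directory worry: mod_alias never looks at the target. It just answers with a 301 and a Location header pointing at whatever URL you give it, so redirecting a static .html path to a Drupal path is fine. A more likely culprit, judging by the log, is the pattern itself: the posted rules require "hd" right after /d/, but the logged requests look like d/20000522.html and d/19970227.html, with no "hd". The Python sketch below reuses a rough model of RedirectMatch and compares one posted rule against a single hypothetical rule that makes "hd" optional and captures the year. It sticks to bracket classes rather than shorthand like \d, as an assumption about portability across Apache's regex engines.

```python
import re

def redirect_match(pattern, target, path):
    """Rough model of mod_alias RedirectMatch: search the URL-path,
    then replace $1..$9 in the target with the captured groups."""
    m = re.search(pattern, path)
    if m is None:
        return None
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))) or "", target)

# One of the twelve per-year rules as posted.
POSTED_2000 = (r"/d/hd2000(.*)\.html$",
               "http://ubersoft.net/comic/hd/archives/2000")

# Hypothetical single rule: optional "hd" prefix, 4-digit year captured as $2.
ONE_RULE = (r"^/d/(hd)?([0-9]{4})[0-9]{4}\.html$",
            "http://ubersoft.net/comic/hd/archives/$2")

# Paths as they appear in the watchdog log, plus the form from the post.
for path in ["/d/20000522.html", "/d/19970227.html",
             "/d/hd20000101.html", "/d/archives.html"]:
    print(path, "->", redirect_match(*POSTED_2000, path),
          "|", redirect_match(*ONE_RULE, path))
```

If that behaves the way you want, the corresponding (untested) .htaccess line would be:

RedirectMatch 301 ^/d/(hd)?([0-9]{4})[0-9]{4}\.html$ http://ubersoft.net/comic/hd/archives/$2

chrisschaub's caution still applies: this sends every daily page for a year to the same archive page, which is exactly the many-URLs-to-one-page pattern he warned Google may not like.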