It's good to know what our 'page not found' errors are. However, it's more important to determine what link rot we have. Does anyone have ideas on how to determine our link rot?

Top 'page not found' errors
Count  Message
2808 node/index.php
1304 themes/xtemplate/default/logo.gif
786 node/user/register
673 tools/send_reminders.php
666 node/user/login
665 errors.php
531 node/tools/send_reminders.php
524 node/errors.php
367 node/4342
363 project/tools/send_reminders.php
342 poll/comments.php
316 project/poll/comments.php
314 8/forum/8
302 17/forum/8
290 project/forum
286 modules/Forums/admin/admin_db_utilities.php
251 project/Modules/watchdog-ok.png
250 includes/functions_portal.php
240 inc/cmses/aedatingCMS.php
240 node/modules/Forums/admin/admin_db_utilities.php
235 _vti_bin/owssvr.dll
235 MSOffice/cltreq.asp
233 atom/feed
232 appserv/main.php
230 node/inc/cmses/aedatingCMS.php
227 index2.php
220 node/appserv/main.php
216 project/advpoll/poll/comments.php
210 project/webcal/tools/send_reminders.php
207 node.php

Comments

scor

This is quite easy in Google Webmaster Tools: go to 'Links' > 'Pages with external links' and open the 'Find a page' fieldset. Enter the URL and click 'See details'. On the right is the number of external links to that page; click on that number to display the external URLs pointing to that 'page not found' page.

Edit: the same feature exists for 'Pages with internal links'. That is probably more relevant here, i.e. it will list all the drupal.org pages containing broken links to other drupal.org pages.

sepeck

@scor - yes, Google Webmaster Tools is useful.
On 'page not found' for us, we have 29 pages, but the vast majority are broken/dead links to spammer accounts.

On the other one... well, that's an interesting process, but it seems to require a manual check for each page, and Webmaster Tools shows, well... 'page 1 to 30 of 15987'.

I believe what Amazon is looking for is more in the nature of automation: a way to determine live/dead links and report on them without bringing our database to its knees.
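A minimal sketch of that kind of automation (Python 3; the urls.txt input file, the HEAD-request approach and the one-second delay are illustrative assumptions, not a finished tool):

    #!/usr/bin/env python3
    """Check a list of URLs (one per line) and print each HTTP status,
    sleeping between requests so the target server isn't hammered."""
    import sys
    import time
    import urllib.error
    import urllib.request

    def check(url, timeout=10):
        """Return the HTTP status code for url, or an error string."""
        # HEAD avoids downloading response bodies.
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code  # e.g. 404 for a dead link
        except (urllib.error.URLError, OSError) as err:
            return "error: %s" % err  # DNS failure, timeout, refused, ...

    def main(path):
        with open(path) as handle:
            for line in handle:
                url = line.strip()
                if not url:
                    continue
                print(check(url), url)
                time.sleep(1)  # throttle: at most one request per second

    if __name__ == "__main__":
        main(sys.argv[1] if len(sys.argv) > 1 else "urls.txt")

Something like this only talks HTTP to the front end, so the database never sees more than normal page traffic; grep the output for 404s to get the dead-link report.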

catch

Well, there's this: http://validator.w3.org/docs/checklink.html

I've never used the non-web version, but it's supposed to be able to handle large sites without putting too much load on them.
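For example, the command-line version can be invoked with something like the following (flags vary by version, so check checklink --help; the depth of 2 is just an assumption to keep the crawl small):

    checklink --summary --broken --recursive --depth 2 http://drupal.org/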

scor

Alternatively, there is also the Ubuntu linkchecker package, written in Python.
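For instance (again, options may differ between versions, so verify with linkchecker --help; the recursion depth here is an arbitrary example):

    linkchecker --check-extern -r 2 http://drupal.org/

It can also write its report as CSV or HTML, which would make the follow-up triage easier.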

apaderno

Status: Active » Closed (fixed)

After more than one year, either this task has been done or it will never be done.
I am closing this report.