I installed the latest release after uninstalling the old one. After rebuilding, opening my.site/sitemap.xml returns a 404 Not Found error. Rebuilding the cache triggers the creation of the file xmlsitemap-en-0.xml (attached) in the directory sites/default/files/xmlsitemap, and populates records in the xmlsitemap table of the site's MySQL schema.

Attached file: xmlsitemap-en-0.txt (2.83 KB), uploaded by smoldovansky

Comments

smoldovansky’s picture

One update: when I generate the sitemap for another language, I can access the file without any problems. In my case the language is bg, and I was able to access my.site/bg/sitemap.xml.

apaderno’s picture

What is the default language of your site?

Dave Reid’s picture

It looks like Apache is intercepting the request for http://www.balkan-fs.com/sitemap.xml. If it were a Drupal problem, you'd see Drupal's 404 page. Check your rewrite rules for anything that could cause this. Also double-check that there isn't actually a sitemap.xml file in your Drupal root directory.

Dave Reid’s picture

For example, when we manually use the un-clean Drupal path http://www.balkan-fs.com/?q=sitemap.xml, it works just fine.
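For anyone repeating this check on their own site, the clean path can be converted to the un-clean form mechanically. The following is only a small Python sketch: the helper name `unclean_url` is made up for illustration, it is not part of Drupal or the module, and it assumes Drupal is installed at the root of the domain.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def unclean_url(clean_url: str) -> str:
    """Turn http://host/some/path into http://host/?q=some/path.

    Assumption: Drupal lives at the domain root, so the whole URL path
    is the Drupal path that index.php expects in the q parameter.
    """
    parts = urlsplit(clean_url)
    drupal_path = parts.path.lstrip("/")
    # Keep any query arguments that were already present, after q.
    query = [("q", drupal_path)] + parse_qsl(parts.query, keep_blank_values=True)
    return urlunsplit(
        (parts.scheme, parts.netloc, "/", urlencode(query, safe="/"), parts.fragment)
    )


print(unclean_url("http://www.balkan-fs.com/sitemap.xml"))
```

If the un-clean URL loads while the clean one returns a 404, the request is probably being stopped by the web server before Drupal ever sees it.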

smoldovansky’s picture

The default language is English.

smoldovansky’s picture

Thank you for the prompt response. My rewriting rules are as follows:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !^/$
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !^/(files|misc|uploads)(/.*)?
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !\.(php|ico|png|jpg|gif|css|xml|js|html?)(\W.*)?
RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]
What could be wrong there?

smoldovansky’s picture

Please ignore this.

apaderno’s picture

From what I can see, that is the normal set of rules Drupal comes with.

Dave Reid’s picture

@Kiam/8: Nope, those are not the stock Drupal 6 rewrite rules.

@smoldovansky/7: Are you saying that you resolved the issue? Should I mark this as fixed?

apaderno’s picture

I guess he is saying to ignore comment #7, because it was a duplicate.
Nobody is able to write a comment one second after posting the previous one.

Dave Reid’s picture

@Kiam, I didn't notice the timestamps, but I also can't tell when people edit comments.

I confirmed that the poster's rewrite rules fail to load paths like sitemap.xml and rss.xml (the default RSS feed provided by Drupal).

I'm not an expert on rewrite rules, but when I changed:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !\.(php|ico|png|jpg|gif|css|xml|js|html?)(\W.*)?
to:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} !\.(php|ico|png|jpg|gif|css|js|html?)(\W.*)?
it worked just fine. As such, I am marking as fixed.
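To see why removing `xml` from the list matters, the negated condition can be modelled outside Apache. This is a rough Python sketch, not Apache itself: it only models the third RewriteCond, it assumes Python's `re` treats this simple pattern the same way Apache's regex engine does, and the document root `/var/www/html` is a made-up example path.

```python
import re

# Third RewriteCond pattern as posted, and the same pattern without "xml".
ORIGINAL = re.compile(r"\.(php|ico|png|jpg|gif|css|xml|js|html?)(\W.*)?")
FIXED = re.compile(r"\.(php|ico|png|jpg|gif|css|js|html?)(\W.*)?")

DOCROOT = "/var/www/html"  # hypothetical document root


def is_rewritten(pattern: re.Pattern, test_string: str) -> bool:
    """Return True if this condition lets the RewriteRule fire.

    The condition is negated with "!", so it only holds when the
    pattern does NOT match anywhere in the test string.
    """
    return pattern.search(test_string) is None


for path in ("/sitemap.xml", "/rss.xml", "/node/1", "/misc/drupal.js"):
    target = DOCROOT + path
    print(path, is_rewritten(ORIGINAL, target), is_rewritten(FIXED, target))
```

With the original pattern, anything ending in .xml is never handed to index.php, so Apache looks for a real file and returns its own 404. With `xml` removed, sitemap.xml and rss.xml reach Drupal. Note that the edited rule sends every .xml request to Drupal, including files that exist on disk; the stock `!-f` and `!-d` conditions quoted later in this thread avoid that by testing whether the file exists instead of testing the extension.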

Dave Reid’s picture

Status: Active » Fixed

Looks like this has been fixed on smoldovansky's site as well. Really marking as fixed now. :)

apaderno’s picture

Category: bug » support
Status: Fixed » Active

If the problem is the rewrite rules used, then it's not a bug in the module.
I am changing the category of the report.

apaderno’s picture

Status: Active » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

anoopjohn’s picture

Version: 6.x-2.x-dev » 6.x-1.1
Status: Closed (fixed) » Active

I have the same symptom on the 6.x-1.1 version of the module. However, when I disable the 'XML sitemap node' module, the problem disappears. But without 'XML sitemap node', the nodes are not listed in the sitemap.

Dave Reid’s picture

Status: Active » Postponed (maintainer needs more info)

Did you run cron after enabling the XML sitemap node module?

anoopjohn’s picture

Thanks for the reply. Yes, cron ran several times after the module was set up and no associated errors were reported in the log with the cron run.

apaderno’s picture

Status: Postponed (maintainer needs more info) » Active

Anonymous’s picture

Status: Active » Postponed (maintainer needs more info)
  • Did you have a previous 6.x-1.x version installed?
  • If yes, did you run update.php following the directions in INSTALL.txt?
  • Does clicking the "Clear cached data" button on the admin/settings/performance page help?

anoopjohn’s picture

Status: Postponed (maintainer needs more info) » Active

This is a new installation of the module.
I cleared the Drupal cache and also cleared the XML sitemap cache files using the tools.

Anonymous’s picture

Status: Active » Postponed (maintainer needs more info)
  • What about your rewrite rules?
  • What does the admin/reports/dblog show for xmlsitemap entries?
  • What is the complete list of contrib modules that you have active?

anoopjohn’s picture

Status: Postponed (maintainer needs more info) » Active

1) What about your rewrite rules?

Clean URLs is working fine for all modules. So are URL aliases.

RewriteCond %{HTTP_HOST} ^benzinga\.com$ [NC]
RewriteRule ^(.*)$ http://www.benzinga.com/$1 [L,R=301]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

These are the rewrite rules inside .htaccess.

2) What does the admin/reports/dblog show for xmlsitemap entries?

It shows cron entries for successful submissions to search engines, and occasional errors for submissions to Yahoo and Ask.

3) What is the complete list of contrib modules that you have active?

I have disabled xmlsitemap_node temporarily to prevent the 404 error.

admin_menu
adsense
adsense_managed
betterselect
better_formats
bzcustom
bzstocks
content
content_profile
content_profile_registration
ctools
date
date_api
date_timezone
feedapi
feedapi_node
filefield
globenews
googleanalytics
googlenews
imageapi
imageapi_gd
imageapi_imagemagick
imagecache
imagecache_ui
imagefield
insert_view
link
marketwire
menu_block
mollom
nodequeue
node_breadcrumb
optionwidgets
page_manager
panels
panels_export
panels_mini
panels_node
parser_common_syndication
pathauto
plus1
prnewswire
quicktabs
service_links
simplenews
simplenews_scheduler
submitagain
taxonomy_image
text
token
views
views_content
views_groupby
views_ui
votingapi
webform
webformblock
wysiwyg
xmlsitemap
xmlsitemap_engines

Anonymous’s picture

  • I have disabled xmlsitemap_node temporarily to prevent the 404 error.

    I don't see how xmlsitemap_node can affect the sitemap.xml URL! It doesn't contain the menu entry for it. Is there an admin/reports/dblog entry for the error?

  • Do you have a path alias assigned that uses sitemap.xml as either the src or dst?
  • Do you have a sitemap.xml file in your DocumentRoot directory?

apaderno’s picture

I don't see how xmlsitemap_node can affect the sitemap.xml URL

Indeed, XML sitemap node cannot affect the sitemap URL, which is defined by XML sitemap.

Dave Reid’s picture

@anoopjohn: Would you mind sharing the link to the site you're having problems with? If you want to keep it confidential, you can use my contact form to send the link to me.

anoopjohn’s picture

Thanks everybody for the replies. I have more information about the errors.

The 404 error that is being shown when trying to access the sitemap.xml is actually because of an internal server error that is then causing a 404 on the internal_server.html page

I checked the Apache error log, and the internal server error is logged as "Premature end of script headers: index.php".

Would that be a PHP memory_limit issue? The memory limit is 90MB and it can't be increased as the site is on a Dreamhost PS which is kind of a limited VPS.

I don't have any url alias with src or dst as xmlsitemap.

I don't have a sitemap.xml in the root folder. Had there been one, .htaccess would have pointed the request to the file, wouldn't it?

@earnie and @KiamLaLuno - Couldn't the xmlsitemap_node module require higher memory allocation triggering a PHP error while the module is on because of the above memory limitation and nothing when it is off?

@Dave Reid - No problem sir :-) It is www.benzinga.com.

anoopjohn’s picture

One more piece of information: the site has 12000 nodes.

apaderno’s picture

I don't have a sitemap.xml in the root folder.

That is normal. XML sitemap doesn't create that file; it uses some cache files, but it will never create a sitemap.xml in Drupal root directory (also because the modules are not allowed to write into that directory).

anoopjohn’s picture

@KiamLaLuno - Thanks for the answer. I was only answering earnie's question about whether there is a sitemap.xml file in the documentroot.

Anonymous’s picture

Couldn't the xmlsitemap_node module require higher memory allocation triggering a PHP error while the module is on because of the above memory limitation and nothing when it is off?

This would be logged into the watchdog table for the admin/reports/dblog report. At the very least there should be an entry in the php.log file on the server. But xmlsitemap_node shouldn't be in play during the display of sitemap.xml so I don't understand how it can matter.

Would that be a PHP memory_limit issue? The memory limit is 90MB and it can't be increased as the site is on a Dreamhost PS which is kind of a limited VPS.

You have a lot of heavy-duty modules listed. Certainly something could be going over that memory limit, or perhaps some other limit. There is also a PHP log somewhere; are there errors in it?

The 404 error that is being shown when trying to access the sitemap.xml is actually because of an internal server error that is then causing a 404 on the internal_server.html page

This seems to indicate that this isn't a problem with Drupal but something external to it. Drupal itself doesn't serve up internal_server.html. Perhaps MySQL is timing out the connection during the rebuild of the cache files. What do the MySQL logs tell you?

anoopjohn’s picture

I checked out the files/xmlsitemap folder. There are no files in there. The folder has write permissions.

Drupal itself doesn't serve up internal_server.html but apache could be configured to look for internal_server.html and the actual error could be from within drupal itself. Couldn't it?

I have asked the host for the MySQL log.

The reason I suspected that xmlsitemap_node has something to do with this is that the moment I turn off that module, sitemap.xml becomes accessible and the 404 error disappears. But the problem is that no nodes are then listed in sitemap.xml.

Anonymous’s picture

Drupal itself doesn't serve up internal_server.html but apache could be configured to look for internal_server.html and the actual error could be from within drupal itself. Couldn't it?

No, the actual error is outside of Drupal if the internal_server.html file is served to the user. If it were within Drupal, Drupal's own soft 404 page would be displayed and a watchdog table entry would be present. Drupal registers its own error handlers to handle errors within Drupal. Take a look at the index.php file and you'll see what Drupal does for a 404 error.

anoopjohn’s picture

Sorry if I was misunderstood. The internal_server.html page is not shown to the end user; the Drupal 404 error is. Drupal log records that the 404 error is due to a request to an internal_server.html page.

apaderno’s picture

Drupal log records that the 404 error is due to a request to an internal_server.html page.

XML sitemap doesn't try to access internal_server.html. I guess that something else is wrong here, and that the problem is not caused by the XML sitemap modules at all. Correct me if I am wrong.

Anonymous’s picture

Status: Active » Fixed

So Drupal's 404 error is given because, for some unknown reason, a request is made to the internal_server.html page, which Drupal cannot find. You'll have to go through your server logs to determine why you are receiving an HTTP 500 error. Since this is outside the realm of xmlsitemap, the issue is being set to fixed. Feel free to comment even with a status of fixed.

I really suspect your DB is the issue here. You may even find a query that is out of line; please open a different issue if that is the case.

Dave Reid’s picture

I've had this happen before when my PHP memory limit ran out. PHP tried to redirect to internal_error.html which doesn't exist and Drupal serves up a 404 page. What I'm guessing might be happening is when you enable the xmlsitemap_node, it needs to regenerate the sitemap cached files with all of the node data. When it tries to do that, it runs out of memory before it can finish, so the files are not generated and you keep getting redirected to a 404. When you disable the xmlsitemap_node module, it no longer has to include your 12000 nodes in the sitemap, and so PHP doesn't run out of memory.

You might want to try out the 6.x-2.x branch which was a rewrite for performance and scalability. It's been tested on sites with thousands and thousands of nodes.

anoopjohn’s picture

@Dave - thanks for the pointer to the 2.x branch. I have set up the 2.x branch and it seems to be working fine. I am no longer using the 1.x module, so this shouldn't matter anymore; 2.x seems to do a lot of things differently in terms of performance, and I am not going back to 1.x.

I think I have found the specific part of the module that was causing the timeouts and the memory overshoot. It was in the xmlsitemap_node_xmlsitemap_links() function. I commented out the module invoke code and the whole thing went through.

For those who come across this post, a simple message: if your site has a large number of nodes, try 2.x.

If your site already has a lot of nodes and you are setting up the xmlsitemap module, don't even think about the 1.x version: it has a heavy function, xmlsitemap_node_xmlsitemap_links(), that retrieves all the nodes in your site and processes them one by one. You will end up getting 404 errors if you enable the xmlsitemap_node module and your server has PHP memory limits, or if you are on a shared host with limited processing power.
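For readers wondering what the usual alternative to loading every node at once looks like, here is a generic Python sketch of batched processing. It is not the code of either branch of the module, and I have not checked how 2.x actually does it; `fetch_page`, `iter_batches` and `write_sitemap` are hypothetical names used only for this illustration.

```python
from typing import Callable, Iterator, List, TextIO
from xml.sax.saxutils import escape

# Hypothetical callback: fetch_page(offset, limit) returns up to `limit`
# URLs starting at `offset`, the way a LIMIT/OFFSET database query would.
FetchPage = Callable[[int, int], List[str]]


def iter_batches(fetch_page: FetchPage, batch_size: int) -> Iterator[List[str]]:
    """Yield URLs in fixed-size batches so only one batch is in memory."""
    offset = 0
    while True:
        rows = fetch_page(offset, batch_size)
        if not rows:
            return
        yield rows
        offset += len(rows)


def write_sitemap(out: TextIO, fetch_page: FetchPage, batch_size: int = 500) -> int:
    """Stream a sitemap to `out` batch by batch; return the number of URLs."""
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    count = 0
    for rows in iter_batches(fetch_page, batch_size):
        for url in rows:
            # Each entry is written immediately instead of being collected.
            out.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            count += 1
    out.write("</urlset>\n")
    return count
```

The point of the design is that peak memory depends on the batch size rather than on the total number of nodes, which is what matters on a host with a fixed PHP memory limit like the one described above.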

2.x seems to be going beautifully well in terms of performance. So go ahead and try 2.x

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.