Hi,
I am unable to generate a sitemap with 6.x-2.0-alpha2.
I have a Drupal 6.13 site with almost 200k nodes, of which I would ideally like to include about 70k in the sitemap.

I've installed the module and followed the readme.txt instructions, but I can't get it to generate a sitemap, either on cron or via the rebuild function. Cron times out whenever the module is installed. When I use the rebuild function, the progress bar shows the records being processed (and the xmlsitemap table in the database does seem to be populated), but the process terminates with an error like: "An error has occurred. Please continue to the error page. An HTTP error 500 occurred. /batch?id=13&op=do". Returning to admin/settings/xmlsitemap confirms that the sitemap is out of sync and needs to be rebuilt, and there is no data at sitemap.xml (the module has sometimes produced an index page of links to sitemap pages, but those pages are empty).

I am aware that the volume of pages I want to index is an issue, so I have experimented with "Maximum number of sitemap links to process at once" and other settings. I have also set content to be excluded from the sitemap, even trying to exclude everything except a few nodes or menu links, and I have uninstalled and reinstalled the module twice (NB: I did receive error messages on one of these uninstalls, as per http://drupal.org/node/705226 - not sure if that is related?). However, the same errors, both with cron and with the rebuild function, persist regardless.

Before upgrading to 6.x-2.0-alpha2, I was using 6.x-2.0-unstable5. With that version I was able to build the sitemap successfully - the rebuild function worked exactly as expected - although a little while later I had problems with cron timeouts (which I think started once the "Minimum sitemap lifetime" period I had specified expired).

I hope you can help point me to a solution. I'm not sure if this is a bug I've uncovered or an issue with how I have configured the module. Is there a site size limitation on the module?
Thanks very much for the module, and appreciate your help.

Comment | File | Size | Author
#62 | xmlsitemap.jpg | 18.06 KB | BMBRV
#57 | Screenshot.png | 28.45 KB | AlexisWilke

Comments

Dave Reid’s picture

Status: Active » Postponed (maintainer needs more info)

A couple of questions:
1. What version of PHP are you using?
2. What are your PHP memory limit and Apache/PHP max execution time limits?
3. Do you see any odd errors at admin/reports/dblog? Are any cron errors logged there as well?

danieldd’s picture

Thanks for the quick reply:

1) PHP version 5.2.9
2) memory_limit 64M, max_execution_time 300
3) Nothing other than the cron errors: Cron run exceeded the time limit and was aborted.
Nothing shows up at admin/reports/dblog with the errors on rebuilding the sitemap.

anoopjohn’s picture

OK, I think I have hit a similar issue while rebuilding the sitemap. My site has around 130k nodes. Initially I was getting "An error has occurred. Please continue to the error page. An HTTP error 500 occurred." during the rebuild process. I was using mod_fcgid, which I changed to prefork, and now the rebuild gets stuck at different node numbers. With the batch size set to 5 it got stuck around 600 nodes; with 100 it got stuck at around 7800. There were no errors in watchdog.

webcons’s picture

The same problem here. I have about 150K entries; after rebuilding about 30K it crashes. I increased the PHP memory limit from 128MB to 256MB and now it crashes at 70K, so I am left with only a truncated version of the sitemap.

An error occurred. /batch?id=15&op=do { "status": true, "percentage": 24, "message": "Remaining 4 of 5.\x3cbr/\x3eNow processing \x3cem\x3enode\x3c/em\x3e 553279 (33400 of 154301)." }
Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 16444555 bytes) in /home/______.com/public_html/includes/common.inc on line 626

After that I get a list of errors:

user error: Negative changecount value. Please report this to http://drupal.org/node/516928.
array ( 'type' => 'node', 'id' => '519881', 'subtype' => 'contact', 'loc' => 'node/519881', 'status' => '1', 'status_default' => '1', 'status_override' => 0, 'priority' => '0.5', 'priority_default' => '0.5', 'priority_override' => 0, 'changefreq' => 0, 'changecount' => -1, 'lastmod' => '1258105774', 'access' => true, 'language' => '', ) in /home/f______.com/public_html/modules/xmlsitemap/xmlsitemap.module on line 471.

The error was also in alpha1

Drupal 6.12
MySQL database 5.0.51a
PHP 5.2.6-1+lenny3
PHP memory limit 256M
PHP register globals Disabled
Web server Apache/2.2.9 (Debian) DAV/2 PHP/5.2.6-1+lenny3 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g

danieldd’s picture

Version: 6.x-2.0-alpha2 » 6.x-2.0-alpha1

For others with this problem, were you able to use earlier versions of XML Sitemap successfully?

I haven't fully tested this, but my issue seemed to appear on upgrading from 6.x-2.0-unstable5 to 6.x-2.0-alpha2

webcons’s picture

The same problem was in 6.x-2.0-alpha1

webcons’s picture

Tested 6.x-2.0-unstable5. The same problem.

An error occurred. /batch?id=16&op=do { "status": true, "percentage": 58, "message": "Remaining 4 of 9.\x3cbr/\x3eNow processing \x3cem\x3exmlsitemap_node\x3c/em\x3e link 35100." }
Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 21912988 bytes) in /home/________.com/public_html/includes/common.inc on line 642

zyxware’s picture

@webcons - I think you have hit a memory limit issue. How many nodes are you processing in one go?

webcons’s picture

Actually, cron is running fine with 5000 nodes per run (it was 100 before). So it looks like it will index all the content; the problem is that I cannot rebuild the sitemap.

So I have in yellow "The XML sitemap data is out of sync and needs to be completely rebuilt."
When I press "rebuild" I get errors.
PHP memory limit is 256 MB

zyxware’s picture

OK, I think I have a working solution for the "cannot rebuild sitemap" problem. Please try this:

a) Disable and uninstall XMLSitemap completely
b) Install XMLsitemap
c) Make sure that no content types are enabled for the sitemap
d) Rebuild the sitemap while no content types are enabled. This should run to completion successfully
e) Enable the content types you wish to include in the sitemap
f) Set the number of items to be processed per cron run to a small number and increase the cron frequency if you want. I used 100 per cron run and cron every 1 min
g) Wait until all the nodes are processed
h) Verify that you can see your sitemap and subpages at the given sitemap URL
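For step f), a minimal crontab entry to trigger Drupal's cron every minute could look like the following (the URL is a placeholder; Drupal 6's cron.php does not require a key, unlike later versions):

```shell
# Hypothetical crontab entry: hit Drupal 6 cron every minute.
# Replace example.com with your site's base URL.
* * * * * wget -q -O /dev/null http://example.com/cron.php
```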

Anoop

zyxware’s picture

Status: Postponed (maintainer needs more info) » Active

Changing status back to active as there are a few people who have hit this issue

Dave Reid’s picture

Hmm...I wonder if this is bad node data causing this.

zyxware’s picture

I have more information, possibly related to this, hence posting it here instead of as a new issue. Cron has been timing out since yesterday, after I issued the rebuild and added the content types. I ran cron from the command line and got this error:

PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 83 bytes) in /sites/all/modules/xmlsitemap/xmlsitemap.generate.inc on line 31

The code says

    $query = db_query("SELECT src, dst FROM {url_alias} WHERE language = '' ORDER BY pid");
    while ($alias = db_fetch_array($query)) {
      $aliases['all'][$alias['src']] = $alias['dst'];
    }

Isn't that a lot of strings to load into the array for a site like mine with around 150K nodes?
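If prefetching has to stay, one way to keep memory bounded would be to fetch only the aliases needed for the current batch of links instead of every row of {url_alias}. A rough sketch, not the module's actual code ($paths and the per-batch approach are my assumptions):

```php
// Sketch: $paths holds the system paths for the current batch only,
// e.g. array('node/123', 'node/124', ...). db_placeholders() is the
// Drupal 6 helper for building the IN (...) placeholder list.
$aliases = array();
$placeholders = db_placeholders($paths, 'varchar');
$query = db_query("SELECT src, dst FROM {url_alias} WHERE language = '' AND src IN ($placeholders) ORDER BY pid", $paths);
while ($alias = db_fetch_array($query)) {
  // Later rows (higher pid) override earlier ones, matching core behavior.
  $aliases[$alias['src']] = $alias['dst'];
}
// $aliases can be discarded after the batch, so memory use stays
// proportional to the batch size, not the total alias count.
```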

danieldd’s picture

Version: 6.x-2.0-alpha1 » 6.x-2.0-alpha2
Category: support » bug

Hi, I was just wondering if there had been a solution or workaround to this issue?

I have tried the workaround in #10, but this did not do anything. I still get "An HTTP error 500 occurred. /batch?id=20&op=do" on rebuilding the sitemap. Cron also times out whenever this module is enabled.
This is even when no nodes or any other content are set to be included in the sitemap, and when "maximum number of sitemap links to process at once" is set to 5.

(Note that I did receive error messages on one of my uninstalls of this module - as per issue- http://drupal.org/node/705226 - not sure if this has contributed to these problems?)

Unfortunately, it is preventing me from being able to use this module or generate a sitemap.

Is there anyone who is successfully using the module on a largeish site (>50,000 nodes) who has found a way of configuring it? webcons/ zyxware, did you manage to solve this? Or are there any other modules/ solutions anyone can recommend?

I'm assuming now that this behaviour is not intended, so I'm recategorising this as a bug report.

Thanks for any help.

zyxware’s picture

I have the module running fine (mostly) on this site, where I have close to 200,000 nodes. I have not ventured to rebuild the sitemap since I had the problems, though, so I cannot confirm whether the rebuild is working fine.

Anoop

Dave Reid’s picture

Rebuilding should only be used if absolutely necessary. Ideally you shouldn't ever have to do it.

danieldd’s picture

Thanks, although in my case I'm also unable to run cron to build the sitemap. Cron times out whenever this module is installed, even if I do not include any content in the sitemap.
How did you manage to get cron to work?

Anonymous’s picture

Status: Active » Postponed (maintainer needs more info)

What do you mean by "cron times out"? Cron runs for 240 seconds, and that is for all hook_cron implementations. What do your server-side log files indicate for PHP? Are you sure you have a big enough memory_limit value? How about admin/reports/dblog - does it show any indication of failure?

Maybe try the suggestions from #361171-40: How to debug cron stoppage problems with bad PHP content?

danieldd’s picture

Sorry, I mean I get a WSOD, as a result of a PHP memory error:

[02-Apr-2010 17:24:37] PHP Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 39 bytes) in /home/director/public_html/includes/database.mysqli.inc on line 161

My memory limit is 64M, which I cannot increase as I am on a shared server. Does the module require more memory?

I just tried the rebuild function and got the same error, although earlier I was getting this memory error: [02-Apr-2010 09:50:41] PHP Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 83 bytes) in /home/director/public_html/sites/all/modules/xmlsitemap/xmlsitemap.generate.inc on line 39

If relevant, the related code here is:

 if ($language && $last_language != $language) {
    unset($aliases[$last_language]);
    $aliases[$language] = array();
    $query = db_query("SELECT src, dst FROM {url_alias} WHERE language = '%s' ORDER BY pid", $language);
    while ($alias = db_fetch_array($query)) {
      $aliases[$language][$alias['src']] = $alias['dst'];
    }
    $last_language = $language;
  }

Could #13 above, have something to do with this?

Thanks for the suggestions on debugging cron (NB: I don't have any PHP data in my nodes, and cron runs fine without XML sitemap installed, even with Search, so I don't think this is a bad-data issue).

Really appreciate the help and any more suggestions anyone has.

danieldd’s picture

PS- admin/reports/dblog confirms the error on Cron.

cron 04/02/2010 - 22:24 Cron run exceeded the time limit and was aborted. admin

I also installed a bit of code to help debug Cron (http://drupal.org/node/123269#comment-644012) and this shows that Cron has hit xmlsitemap prior to exceeding the time limit.

Anonymous’s picture

OK. Did you check your Apache error log? Did you check your PHP log? Did you check the syslog? Any other errors indicated in watchdog before or after your "hit xmlsitemap" message? Have you tried installing the elysia_cron module and dtools->wsod to help debug the issue, as I suggested earlier? I'll suggest elysia_cron to anyone; it should become part of the core cron management system. With elysia_cron you can run the various cron hooks with different timings from each other, and even block a cron hook from running while still being able to force it to run manually. The dtools->wsod is a mess, and I've been hacking on it for a project where I have Drupal installed. I stopped the output to watchdog if the user is anonymous, and I added a '>' string to the $output variable so that seemingly empty messages have a link I can open.

danieldd’s picture

Thanks.

I've installed elysia_cron and have run the various cron hooks in isolation. Everything is executing except: xmlsitemap_cron (w) XML sitemap ping. [run]

(NB- The Xml sitemap subsidiary modules have executed according to elysia_cron:
xmlsitemap_engines_cron (w) - [run]
xmlsitemap_menu_cron (w) - [run]
xmlsitemap_node_cron (w) Update XML sitemap with new nodes [run]

However, the sitemap is not being populated, presumably because the main module will not run. )

/admin/reports/dblog is recording the following messages when I attempt to run cron:
cron 04/06/2010 - 05:00 Unexpected temination of cron context default, aborted. Anonymous
warning cron 04/06/2010 - 05:00 Cron run exceeded the time limit and was aborted. Anonymous
cron 04/06/2010 - 05:00 Cron context default run started. Anonymous
cron 04/06/2010 - 05:00 hit elysia_cron cron Anonymous

Note it does not record any message when I try and run individual cron hooks in isolation.

Regarding error messages, the same memory error messages I mention in #19 above, repeat when I try to execute xmlsitemap_cron - sometimes I get:
[06-Apr-2010 03:15:20] PHP Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 83 bytes) in /home/director/public_html/sites/all/modules/xmlsitemap/xmlsitemap.generate.inc on line 39

Or sometimes I get:
[06-Apr-2010 04:03:16] PHP Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 37 bytes) in /home/director/public_html/includes/database.mysqli.inc on line 161

NB - the code related to the first of these error messages is in post #19, above. The code related to the second error message, if relevant, is:

/**
 * Fetch one result row from the previous query as an array.
 *
 * @param $result
 *   A database query result resource, as returned from db_query().
 * @return
 *   An associative array representing the next row of the result, or FALSE.
 *   The keys of this object are the names of the table fields selected by the
 *   query, and the values are the field values for this result row.
 */
function db_fetch_array($result) {
  if ($result) {
    $array = mysqli_fetch_array($result, MYSQLI_ASSOC);
    return isset($array) ? $array : FALSE;
  }
}

These messages are from my error log. I'm afraid I do not know where to find a PHP log or syslog. There are no messages in my watchdog other than the ones I have mentioned above.

I have not yet installed dtools, because the site is a production site and I am a bit nervous about some of the critical bugs reported in that module, as well as its dependencies. And, I should stress, I am not getting a WSOD on the site itself; I just cannot get cron to run with xmlsitemap installed.

I hope this gives you a few more clues, and I'm happy to run more tests or retrieve more info. Really appreciate your help

Anonymous’s picture

Status: Postponed (maintainer needs more info) » Closed (works as designed)

Now that is the type of information we needed. Can you copy the production data to a test system and reproduce the error? I need to take a look at the wsod code and produce something a little better for one of my own projects, but that won't be until later in the month, so it is not a help at the moment. NOTE: this won't matter, since nothing can be done without increasing the memory_limit as described below.

Looking at

[06-Apr-2010 03:15:20] PHP Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 83 bytes) in /home/director/public_html/sites/all/modules/xmlsitemap/xmlsitemap.generate.inc on line 39

I see that the issue is that there are more alias paths than will fit in your allowed memory space.

Given that, you would need to either increase the amount of memory allowed or decrease what is being loaded into memory. The database query being executed fills the alias array for the specified language; it has already filled the alias array for the 'all' language (i.e. the alias paths where the language isn't specified). In #19 you stated:

My memory limit is 64M, which I cannot increase as on a shared server.

Is this because you can't change the php.ini file? You can add php_value memory_limit 128M to the .htaccess file to increase the memory_limit. Or it may be time to buy more resources or a different hosting provider.
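For reference, the .htaccess approach mentioned above looks like this (it only takes effect when PHP runs as an Apache module and the host permits php_value overrides; under CGI/FastCGI a local php.ini is usually needed instead):

```apache
# Raise the PHP memory limit for this site only.
# Ignored (or a 500 error) if the host disallows php_value in .htaccess.
php_value memory_limit 128M
```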

danieldd’s picture

Thanks, that makes sense. It seems I can increase the memory a little so will experiment and post back the results.

Although there are still a couple of mysteries. First, the rebuild function (and possibly cron) was initially working on my site with version 6.x-2.0-unstable5. It then stopped working (even though the site had only grown slightly and memory had not decreased), and also did not work when I upgraded to the latest release, or when I uninstalled and tried downgrading again. If it is a hard memory constraint I've hit, it seems strange that it previously worked.
Second, I've tried the module while excluding almost all content from the XML sitemap, yet cron still does not run. Wouldn't excluding that content reduce the alias array that the module loads into memory and avoid the problem I've experienced?

Thanks again for all your time & help.

Dave Reid’s picture

I think most of the problem is caused by pre-fetching of the url aliases. I'll add an option to have that disabled by default.

danieldd’s picture

Just to update: I increased memory to 96M. The good news is that cron now executes, and the sitemap rebuild function also executes (at least when XML sitemap is set to include only minimal content). There are now no PHP memory errors in the error log. Having said that, I think #25 is an extremely good idea, as it is obviously preferable for this amount of memory not to be necessary.

The bad news is that although I no longer experience the memory issue, sadly I am still experiencing a problem. I'll post it here as appears related, but if you think it needs its own issue, let me know and I'll start a new one.

When I run cron or the rebuild function I now receive this error message:
user warning: Error writing file '/var/lib/mysql/tmp/MY5u4L3x' (Errcode: 28) query: SELECT src, dst FROM url_alias WHERE language = 'en' ORDER BY pid in /home/director/public_html/sites/all/modules/xmlsitemap/xmlsitemap.generate.inc on line 37.

and the sitemap.xml file does not seem to be updated properly. A sitemap has been created, but it seems to index only a tiny handful of pages, and it contains node/1234 links rather than their aliases. admin/reports/dblog also shows the above error message, although cron is reported as having run successfully and the rebuild function as having completed. There are no messages in the error log.

I understand that error 28 implies that the disk is full. I've spoken to my host and understand it was trying to write a 1/2 GB file to the temporary folder, which is beyond my disk space allowance, and also implies something not working as it should.

I hope this makes sense & please let me know if you need any more info. Thanks,

Dave Reid’s picture

I'm pretty sure that has to deal with the pre-fetching of the URL aliases. Try the latest 6.x-2.x-dev and turn that option off.

danieldd’s picture

I've tested the latest 6.x-2.x-dev with pre fetching of aliases unchecked - thanks for adding that feature to the module.

This does indeed resolve the above error. Both Cron and the rebuild function now run with no reported errors either in admin/reports/dblog or the error log.

However, I hate to say this, but it's still not quite working.

Running cron does not seem to update or populate the sitemap. xmlsitemap/settings does indicate that links are being indexed and made visible when cron runs, but they are not visible at sitemap.xml, and the "list" page of the xmlsitemap module did not indicate that links and pages were being added to the sitemap. The sitemap did not update on cron even when only a small handful of nodes were set to be included in it. I have also tried setting the maximum number of links to process at once to 5.

The rebuild function, however, did seem to work more effectively: I was able to run it to build a sitemap for c. 60k nodes. However, I had set "Number of links in each sitemap page" to automatic, and the module created 2 pages, each with 30k links. As these pages were too large to load, I reran the rebuild function with the number of links per page set to 1000. Although the "List" page on the module now indicates that a sitemap of c. 60 pages has been created, only 2 pages, each with 1000 links, are visible at sitemap.xml.

Really grateful for all your help with this.

danieldd’s picture

Sorry, I misreported above. It seems the rebuild did actually create the new sitemap properly, with c. 60 pages; I guess the reason it did not update earlier must have been a caching issue. So the rebuild function now more or less works for larger sites (I did experience one crash, but other than that it seems to have worked fine).

Although further testing confirms that Cron is still not building the sitemap. It is updating the xmlsitemap database table, but this data is not then getting written to the xml file.

Thanks,

vpapadim’s picture

Version: 6.x-2.0-alpha2 » 6.x-2.0-beta1

subscribe
I really need to have this up and running....

danieldd’s picture

Title: Cannot generate sitemap- module causes cron timeouts and error on Rebuild » Problems running cron to build the sitemap on large sites
Status: Closed (works as designed) » Active
seanburlington’s picture

subscribing

amysteen’s picture

I have this same issue. I have about 75 links included in the sitemap and 128M of memory (dedicated virtual host). My error log says "Cron run exceeded the time limit and was aborted."

I unchecked the option for "Prefetch URL aliases during sitemap generation."

I got this cron error as soon as the XML sitemap module was installed; it was ticking along fine before.

*subscribe*

seanburlington’s picture

As our site has grown this module has become unusable

I've created a more basic sitemap based on the code at

http://www.seo-expert-blog.com/tutorial/drupal-6-xml-sitemap-for-nodes

This builds very quickly - lists only node pages - but it works.

Dave Reid’s picture

"As our site has grown this module has become unusable" is not at all helpful to me. What is wrong? What error messages do you get? Have you tried some of the performance tips recommended in the handbook pages and/or issue pages?

seanburlington’s picture

I see timeout or memory exhaustion errors - depending on the client I use to access the page.

I've looked around for solutions and tried various versions of this module - I'm not asking for help or complaining - just sharing my solution as personally I find it helpful when people post alternatives on issue queues.

Dave Reid’s picture

Status: Active » Postponed (maintainer needs more info)

It's still hard to say how we can address anything when we don't know what's at fault.

tomsm’s picture

subscribing

HS’s picture

I saw the following in my logs today: "Cron run exceeded the time limit and was aborted."

Cron had not run for two days and when I tried to re-run cron manually I see this: Fatal error: Call to undefined function lock_acquire() in /homepages/8/****/htdocs/Domains/****/sites/all/modules/xmlsitemap/xmlsitemap.module on line 1162

I am on D6 using 6.x-2.0-beta1 and I do not even have 5000 nodes.

Anonymous’s picture

@HS: What version of D6? You may need to upgrade to a more recent version of Drupal because lock_acquire is new. The "Cron run exceeded ..." is a normal warning when all of the cron hooks exceed 240 seconds. You can use Elysia Cron module to control which module's cron hooks are executing at a given time.

HS’s picture

Hi earnie,

This site is on Drupal 6.14.

BTW, the only way I can get cron to run is by rebuilding the site map.

HS’s picture

One more thing: cron runs every half hour as it should on my site, but once a day it fails. I have 'Minimum sitemap lifetime' set to one day as well, which leads me to believe that regenerating the sitemap is what is causing cron to fail or exceed the time limit.

Anonymous’s picture

@HS: The warning that cron exceeded the time limit isn't a failure; it is normal. There are 240 seconds to execute all cron hooks in all modules. Once the 240 seconds are exceeded, Drupal recovers by removing or resetting the variables that tell the cron process it is already executing, and gives you a warning. You can use a module named Elysia Cron to help minimize this. I hope Elysia Cron is inserted into core in D8.

HS’s picture

@earnie: Thank you. I'm going to give Elysia Cron a look now.

HS’s picture

earnie, that looks pretty confusing. How would that help me and how can I set it up? Can you please help?

Thank you!

chawl’s picture

In our case, this was the solution.

FYI.

AlexisWilke’s picture

Version: 6.x-2.0-beta1 » 6.x-2.x-dev
Status: Postponed (maintainer needs more info) » Active

The problem is because of this function:

_xmlsitemap_create_cache_files()

Someone wrote everything properly to generate chunks one by one, and then (probably that same) someone wrote the cache function that loops through ALL the chunks to generate the entire cache all at once. That's not appropriate. If I am querying chunk 3, why do you generate chunks 0, 1 and 2 first? And then go on and generate chunks 4, 5 and 6? Really! I only need chunk 3 right now. 8-P

The cache files function needs to get the chunk as a parameter and generate only that one chunk as required.

I think that the needs update flag needs to be used in a completely different way. In xmlsitemap_output(), do something like this:

if (variable_get('xmlsitemap_sitemap_needs_update', FALSE)) {
  xmlsitemap_erase_cache();
  variable_set('xmlsitemap_sitemap_needs_update', FALSE);
}

Then have a new check & create function pair with $chunk as a parameter:

  if (xmlsitemap_check_cache_file($chunk)) {
     xmlsitemap_create_cache_file($chunk);
  }

Now, if I were to rewrite that file, I would have ONE function to generate the filename of a given file. Right now, this is a HUGE mess. Maybe even create an array (an object?!?!?) with all the parameters and pass that around the different functions so we can generate things as required.
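The "one function to generate the filename" idea could be as small as this sketch (the function name and the 'xmlsitemap' files subdirectory are my assumptions; file_directory_path() is core Drupal 6):

```php
// Sketch: a single helper that builds the cache filename for one chunk,
// so every check/create/erase path composes names the same way.
function _sketch_xmlsitemap_cache_filename($chunk) {
  return file_directory_path() . '/xmlsitemap/sitemap-' . (int) $chunk . '.xml';
}
```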

I'll be happy to show you what I can do (so it actually works for websites with 1 million nodes), but you have to promise me that you will check in my patch (once it works, which may not be the 1st iteration, but you know...)

Thank you.
Alexis Wilke

Dave Reid’s picture

@Alexis Wilke: Please do us a favor and double-check which version you're using compared to this issue, because there is no function like that in 6.x-2.x.

AlexisWilke’s picture

Dave,

Thank you for reminding me! 8-) I have version 2 of another module and thought it was this one...

Anyway, I can see that the system was changed to make use of a batch. But I don't think a batch can work in a cron call. Instead, each step of the batch should be "manually" called one at a time on each cron run. That's the way I would do it.

Anyway, outside of that, version 2.x looks a lot better, but as I commented on another issue, it's missing a few lines of code with regard to the proper contexts... and without those no XML sitemap is generated.

Update 1: the context is not necessary for the nodes to be added; however, you need to make sure that they are enabled (under Content types). There is an issue with regard to creating a new XML sitemap & contexts... it is not necessary to have any context to create a first (basic/default) sitemap. However, one was not automatically created on the upgrade path.

Update 2: I can now see the "Default base URL" entry in the "Advanced settings". This is good for my first problem (i.e. www. versus secure. - the same website, but I want the XML sitemap on the www. pages and never on the secure. pages). This works. I still have a problem with sitemaps on other URLs that use the same site to show a few pages, but that is a rather special case. So at this point I'm good with the current 2.x version.

Thank you.
Alexis Wilke

anshuman’s picture

I can confirm a combination of comment #10 and comment #25 worked for me. My site has about 15k articles and 10k taxonomy terms.

This is what I did -

- Disable and uninstall XMLSitemap completely - this will clean up the XMLsitemap tables too
- Clear out all cached xmlsitemaps from sites/default/files/xmlsitemap
- Install XMLsitemap (I used 2x-beta)
- Disable pre-fetching of the url aliases (IMPORTANT! This is one of the main culprits if memory is a constraint)
- Make sure that no content types or taxonomies are enabled for the sitemap
- Rebuild the sitemap while no content types or taxonomy are enabled.
- Enable content types you wish to include in the sitemap
- Set the number of items to be processed per cron run to 5000 (or a smaller number if your cron times out)
- Run cron manually -> note that the XMLsitemap tables increase in size, and sitemap is generated in sites/default/files/xmlsitemap folder
- Repeat manual cron run until all the nodes are processed
- now enable taxonomies to be included
- repeat manual cron runs until all of it is indexed -> check index status in XMLsitemap settings page
- verify that you can see your sitemap and subpages at the given sitemap URL
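To watch the indexing progress between cron runs, you can also query the module's table directly. This is a guess based on the columns shown in the "Negative changecount" error dump earlier in this thread (type, status, etc.), so verify against your actual schema first:

```sql
-- Count sitemap links per entity type and status in the xmlsitemap table.
SELECT type, status, COUNT(*) AS links
FROM xmlsitemap
GROUP BY type, status;
```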

Cheers,
Anshuman

AlexisWilke’s picture

It would be a good idea to test how much memory is left during the loop and end the process early if you are running low on memory (or time). That way you avoid problems. I don't know whether it's possible to stop the loop without preventing those nodes from making it into the XML sitemap on the next cron call.
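A memory guard along those lines could look like this minimal sketch (not module code; the 25% headroom threshold and function name are arbitrary choices of mine):

```php
// Returns TRUE when PHP memory usage is within $headroom of the limit,
// so a processing loop can bail out and resume on the next cron run
// instead of hitting a fatal "Allowed memory size exhausted" error.
function _sketch_memory_low($headroom = 0.25) {
  $limit = ini_get('memory_limit');
  if ($limit == -1) {
    return FALSE; // No limit configured.
  }
  // Convert shorthand notation like "64M" to bytes.
  $unit = strtoupper(substr($limit, -1));
  $bytes = (float) $limit;
  if ($unit == 'G') $bytes *= 1024 * 1024 * 1024;
  elseif ($unit == 'M') $bytes *= 1024 * 1024;
  elseif ($unit == 'K') $bytes *= 1024;
  return memory_get_usage() > (1 - $headroom) * $bytes;
}

// Usage inside a processing loop:
// foreach ($links as $link) {
//   if (_sketch_memory_low()) {
//     break; // Remaining links are picked up on the next cron run.
//   }
//   // ...process $link...
// }
```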

giozzz’s picture

Hello, I had this issue, then solved it by restarting Apache. Now there's something else worrying me: the links in the XML sitemap have been rebuilt, but on the configuration page I see that only 1 page has been indexed, and I can't understand what this means. In the settings I've included all the content types I want indexed, and the list there shows they've actually been indexed... so what is the meaning of "links 150" "Pages 1"?
Another scary fact is that the address mysite/sitemap.xml returns a white page... I remember I could see all the XML links before.
In the Google Webmaster Tools page no errors are reported concerning the XML sitemap. I know these may be stupid questions, but I can't understand it and need an explanation. Thanks everybody for the support!

AlexisWilke’s picture

giozz,

(1) the word "Pages" is indeed confusing; I made a comment about that. It should be "Chunks" as defined in the XML sitemap documentation (i.e. how many XML sitemap files are required to cover your entire site).

(2) if you are logged in on your website, you may not be able to see the XML sitemap because only anonymous users are supposed to do so.

giozzz’s picture

Thank you for this explanation! Everything seems to work right... the only thing is I'm still unable to see the XML links page, even as anonymous. I hope this ain't a problem.

AlexisWilke’s picture

giozz,

You may want to give it a try without the CSS style sheet support, in case you had that turned on. But you should be able to see it. Another possibility is that you somehow got a cached version somewhere along the way.

Thank you.
Alexis

giozzz’s picture

OK... but what can I do to find/delete this cached version you refer to? I've tried clearing the cache with "CSS optimization" unchecked, but still no results... but maybe that's not what you meant... thanks again for your commitment!

AlexisWilke’s picture

FileSize
28.45 KB

I meant the style sheet reference in the sitemap.xml file.

I'm attaching a screenshot of the XML sitemap settings (under admin/settings/xmlsitemap/settings)

Notice the first checkbox? That's the one that you should try unchecked. Then make sure that the XML sitemap is regenerated at least once and try again to access it at /sitemap.xml

Thank you.
Alexis

afagioli’s picture

Thanks for http://drupal.org/comment/reply/720878/3239180#comment-3239180

I've added a couple of URL arguments to handle count limits.
This way, hopefully, I'll get rid of all those memory/time/resource issues.

function nu_sitemap() {

  // Expect two numeric URL arguments: lower and upper nid bounds.
  if (!is_numeric(arg(1)) || !is_numeric(arg(2))) {
    echo 'please provide valid integers as arg(1) and arg(2)';
    exit;
  }
  $min_nid = (int) arg(1);
  $max_nid = (int) arg(2);

  $https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off';
  $base  = ($https ? 'https://' : 'http://') . $_SERVER['SERVER_NAME'] . base_path();
  $urls  = array();

  // Fetch the aliases of all published nodes in the given nid range.
  $qry  = "SELECT ua.dst, n.changed ";
  $qry .= "FROM {node} n ";
  $qry .= "INNER JOIN {url_alias} ua ON ua.src = CONCAT('node/', n.nid) ";
  $qry .= "WHERE n.status = 1 ";
  $qry .= "AND n.nid > %d ";
  $qry .= "AND n.nid <= %d ";
  $qry .= "ORDER BY n.changed DESC";

  $result = db_query($qry, $min_nid, $max_nid);

  while ($r = db_fetch_object($result)) {
    // Skip aliases named after the error pages.
    if ($r->dst != 404 && $r->dst != 403) {
      $urls[$base . $r->dst] = $r->changed;
    }
  }

  header('Content-Type: text/xml');
  $xml = <<<EOF
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>$base</loc><changefreq>daily</changefreq></url>
EOF;
  foreach ($urls as $url => $t) {
    $xml .= '<url>';
    $xml .= '<loc>' . $url . '</loc>';
    $xml .= '<lastmod>' . date('Y-m-d', $t) . '</lastmod>';
    $xml .= '</url>';
  }
  $xml .= '</urlset>';
  print $xml;
  exit();
}
giozzz’s picture

Thanks again! now i can see it!!! Sorry for my sluggishness! :)

wuinfo - Bill Wu’s picture

I had this same problem when the node count went over 150,000. Cron stopped running because of this module. In the end, I had to disable the module, generate the sitemap manually with some SQL, and put the file under the website root folder.

Anonymous’s picture

Sites that have a large number of nodes really need to consider using the Elysia Cron module to help manage the cron hooks, especially knowing that you have 240 seconds to execute all cron hooks. Executing individual hooks more or less frequently is a plus, and doable with Elysia Cron.

BMBRV’s picture

Issue tags: +image xmlsitemap BMBRV
FileSize
18.06 KB

Hi all, my XML sitemap doesn't pick up new nodes.
My site has more than 500k nodes, with over 2,500 new nodes per day.
PHP 5.2.16, PHP memory limit 256M, CentOS server; cron runs every minute with a maximum of 5,000 nodes per run, finishing the sitemap for 500k nodes in 8 hours.
After 2 days the indexed count increased by over 7,000 nodes, but the visible count did not change.
It looks like this: http://baomuabanraovat.com/xmlsitemap.jpg
If I need a new sitemap, I must remove the old sitemap links from the module and run Elysia cron for over 9 hours.
Please advise.

Beanjammin’s picture

Running 6.x-2.0-beta3 on a site with 111k nodes, attempting to build the initial XML sitemap. The site is running Pressflow 6.20.97 with PHP 5.2.6.

Preload path aliases is disabled.

xmlsitemap is configured to build a default, english, and spanish sitemap.

Initially I tried via the web interface, but a page reload timed out during the batch process, and when I reloaded the page, while it appeared to continue on its merry way, the counter reached 400k+ of 111k after a full day, so I stopped it. I had hoped the counter was just off by 3x due to the multiple languages.

I then tried via drush; however, it failed with the following:

drush xmlsitemap-rebuild
Drush command terminated abnormally due to an unrecoverable error.                                                                                           [error]
Error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 24 bytes) in
/includes/cache.inc, line 33

Yes, that is 1GB of RAM allocated.

In its current state cron runs OK (only 100 nodes processed per run), but the sitemap is incomplete and there is a warning that the whole sitemap needs to be rebuilt.

I would appreciate suggestions on how to proceed.

Beanjammin’s picture

To follow up on my earlier post: I tried running the rebuild process via the web interface and, rather than starting over at the beginning, it appeared to pick up where the earlier drush rebuild had run out of RAM. I say this because it completed the rebuild very quickly.

All appears to be working now, including cron runs.

shaiman’s picture

I'm working with a site running Drupal 6.19 and XML sitemap 6.x-2.0-beta2 that is obviously pushing some limits, with more than 1 million nodes and custom links that need to be included in the generated sitemap. At present I'm unable to complete a manual or scheduled cron run. The client times out and, although I'm not seeing any errors in the dblog, the process itself appears to eventually time out as well. I can see a number of .xml files generated in the sitemap cache directory, but eventually those cease to be generated, leaving the most recent file empty. The next attempt to run cron results in the error "Cron has been running for more than an hour and is most likely stuck.", but a subsequent attempt runs, appearing to start over, and again times out as before.

I've unchecked the option to "Prefetch URL aliases during sitemap generation" and even attempted to reduce the "Maximum number of sitemap links to process at once" option to its minimum value of 5, but have not been able to complete a cron run or even rebuild the sitemap since increasing the number of node/links for inclusion.

Am I just pushing the limitations of this module too far, or are there other steps I can take to allow this to complete normally?

AlexisWilke’s picture

The cron process, from what I've seen, does not take the "Maximum number of sitemap links to process at once" setting into account. So it will try to generate the 1 million + extra links all at once. You will obviously exhaust your memory or time out before that happens.

I'm afraid that your best bet in this case is to write an external process that creates your sitemap.xml "offline" (not via Apache).
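Such an "offline" generator could be as simple as a crontab entry running a CLI PHP script outside Apache's request limits. A sketch; generate_sitemap.php is hypothetical here, and would contain logic like nu_sitemap() above, writing to a file instead of printing:

```shell
# Hypothetical crontab entry: regenerate the sitemap nightly at 03:00,
# outside Apache, with the CLI's own (larger) memory/time limits.
0 3 * * * /usr/bin/php -d memory_limit=512M /var/www/scripts/generate_sitemap.php > /var/www/html/sitemap.xml
```

The web server then just serves the static file, so no request ever hits PHP's execution-time limit.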

Anonymous’s picture

The cron process, from what I've seen, does not take the "Maximum number of sitemap links to process at once" setting into account. So it will try to generate the 1 million + extra links all at once. You will obviously exhaust your memory or time out before that happens.

Yeah, but Dave's rewrite was supposed to manage this.

AlexisWilke’s picture

earnie,

And most of the code is there, but it's still not 100% correct yet... 8-)

Still, with 1 million pages, going through CRON to update those will slow down the site much more than creating a separate system (I think).

Anonymous’s picture

Alexis,

I don't disagree, but xmlsitemap should provide the external process for initial loads or a complete rebuild. Only changes should be managed by cron. Perhaps a queue design, where each changed nid is stored in a queue table that is then processed in FIFO order. And if you have a lot of modules with hook_cron implementations, then please use the Elysia Cron module to control the individual hook_cron runs: give xmlsitemap more time and others less in cron.
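A plain-PHP simulation of that queue idea (an assumption for illustration: in Drupal 6 the queue would be a real table, e.g. a hypothetical {xmlsitemap_queue} with a nid column, drained with db_query() inside hook_cron; arrays stand in for it here):

```php
<?php
// Sketch of the suggested design: changed node IDs go into a queue,
// and each cron run drains at most $per_run of them in FIFO order.
// Hypothetical helpers; an array stands in for the queue table.

function queue_push(array &$queue, $nid) {
  $queue[] = $nid; // newest entries go to the tail
}

function queue_drain(array &$queue, $per_run) {
  // Take the oldest entries first (FIFO); the rest wait for next cron.
  return array_splice($queue, 0, $per_run);
}
```

With nodes 10, 11, 12 queued and $per_run = 2, one run drains 10 and 11 and leaves 12 for the next cron call, so no single run ever exceeds its memory or time budget.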

shaiman’s picture

Earnie,

Agreed. The initial generation of the sitemap is a single occurrence in my case. Subsequent updates will be a substantially smaller fraction of the initial number. It would be nice to pay that cost up front once and only have a small hit for incremental updates. This actually reflects my earlier understanding of a rebuild operation vs. the continuing cron operations.

anoopjohn’s picture

What is the status of this issue? I have not run into problems in later versions of xmlsitemap. Can we mark this as fixed?

gobinathm’s picture

I believe there won't be any support/fix for a problem identified in 6.x going forward, given that D6 is already EOL. Hence I guess this issue can be closed.

Changing the status; if incorrect, please revert it.

gobinathm’s picture

Status: Active » Closed (outdated)