Please review
#1435264-23: Xml stops working Uncaught exception 'XMLSitemapGenerationException' error
#1435264-34: Xml stops working Uncaught exception 'XMLSitemapGenerationException' error
Original Issue
XMLSitemap was working fine with 6.x-2.0-rc1 generating two pages (links are more than 10.000)
Some day i noticed there was an error preventing cron to finish.
Unknown error occurred while writing to file sites/default/files/xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml. στο αρχείο /var/www/vhosts/moriasnow.gr/httpdocs/sites/all/modules/xmlsitemap/xmlsitemap.generate.inc στη γραμμή 158.
also i found this in apache's error log:
2012-02-10 09:42:17.497 [NOTICE] [194.219.230.130:57503-0#APVH_moriasnow.gr] [STDERR] PHP Fatal error: Uncaught exception 'XMLSitemapGenerationException' with message 'Unknown error occurred while writing to file sites/default/files/xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml.' in /var/www/vhosts/moriasnow.gr/httpdocs/sites/all/modules/xmlsitemap/xmlsitemap.xmlsitemap.inc:162
Stack trace:
#0 /var/www/vhosts/moriasnow.gr/httpdocs/sites/all/modules/xmlsitemap/xmlsitemap.generate.inc(155): XMLSitemapWriter->endDocument()
#1 /var/www/vhosts/moriasnow.gr/httpdocs/sites/all/modules/xmlsitemap/xmlsitemap.generate.inc(338): xmlsitemap_generate_page()
#2 /var/www/vhosts/moriasnow.gr/httpdocs/includes/batch.inc(189): xmlsitemap_regenerate_batch_generate(Object(stdClass), 1)
#3 /var/www/vhosts/moriasnow.gr/httpdocs/includes/form.inc(2550): _batch_process()
#4 /var/www/vhosts/moriasnow.gr/httpdocs/sites/all/modules/xmlsitemap/xmlsitemap.module(1436): batch_process()
#5 /var/www/vhosts/moriasnow.gr/httpdocs/sites/all/modules/xmlsitemap/xmlsitemap.module(222): xmlsitemap_run_unprogressive_batch('NXhscRe0440 in /var/www/vhosts/moriasnow.gr/httpdocs/sites/all/modules/xmlsitemap/xmlsitemap.xmlsitemap.inc on line 162
I downloaded the 6.x-2.x.-dev version with the same errors.
Disabling XMLSitemap module (and uninstalling) seems to work fine again but when the xml file gets bigger with cron jobs, the problem comes again. (for rc1 and dev version also)
Trying to rebuild sitmapxml file, the batch proccess never ends successfully.
Any ideas ?
Comments
Comment #1
Anonymous (not verified) CreditAttribution: Anonymous commentedI've no idea. The parent class XMLWriter set a status that caused xmlsitemap to throw a stackdump while executing the endDocument method. Well, that's not exactly true. The this::status is set during the flush operation. The documentation for XMLWriter::flush states the return value as
Dave's use of this value is wrong, IMO. It is possible for the number of bytes written with a flush to be zero and thus cause the status to be set to bool FALSE.
If you change the xmlsitemap.xmlsitemap.inc file line 161 from
to
do you get a valid sitemap.xml files?
I am going to add here:
FINALLY, someone with enough logging and know how to give us a proper report. I believe this issue is the crux of a bunch of other issues plaguing this module.
Comment #2
Alex Andrascu CreditAttribution: Alex Andrascu commentedI get this in 7.2.x-dev too while trying to regenerate the sitemap via drush.
Worth mentioning this happened after a huge taxonomy vocabulary inclusion (about 4 million entries) and that it was working before with 20K links split in 4 pages (sitemap.xml?page=1 / sitemap.xml?page=2 / sitemap.xml?page=3 ...)
Number of links in each sitemap page 5000.
Maximum number of sitemap links to process at once: 50 (not sure if this affects drush )
I'd say earnie is pretty close to answer it.
Please let me know
Comment #3
Anonymous (not verified) CreditAttribution: Anonymous commentedRe: #2
Did you try my suggested fix from #1?
Comment #4
Alex Andrascu CreditAttribution: Alex Andrascu commentedYep...applied that...still running but got way far than previous runs. I'll let you know how it ends up :)
Comment #5
Alex Andrascu CreditAttribution: Alex Andrascu commentedWell it went past that error it seems but it eventually died throwing this:
wich i think it's unrelated.
I'm gonna keep trying.
EDIT:
changed drupal_set_time_limit(240); to drupal_set_time_limit(0);
fingers crossed
Comment #6
Anonymous (not verified) CreditAttribution: Anonymous commentedLook at your DB logs. Make sure there isn't a DB lock. If one batch set doesn't finish in 4 minutes then you've other issues to deal with. Setting the time limit to unlimited will only cause you to wait indefinitely for the process to end and if there is a DB lock then well...
Also, rebuilding the xmlsitemap should happen during site maintenance. Probably should be a feature to refuse to do the rebuild if the site is active.
Comment #7
FanisTsiros CreditAttribution: FanisTsiros commentedApplying #1 i got a valid 2 pages xml file with 12.000 links!
Not sure what #1 does though. Is this safe to keep ?
(sorry i have no idea about xmlsitemap module...)
Now i'll wait for cron jobs to run to see if the problem is still there. I have just rebuilt links (not in maintenance mode, but with cron stopped as long as the batch was working).
earnie, thanks for your time dealing with this.
Comment #8
Alex Andrascu CreditAttribution: Alex Andrascu commentedApparently it went fine generating 5004 xml chunks but then thrown this:
Looks like it had a hard time writing to tmpfs wich looks like:
Question for earnie:
If i restart xmlsitemap-regenerate now what happens ? Is it smart enough to pick up where it left or it'll start all over from the beggining. I don't particulary want to read trough all the module's code but i'll gladly help sorting this module out as currently is kinda useless for big data / high volume websites.
Here's more info about my server just to clarify any memory hog questions:
EDIT:
I got my answer...it seems like it's starting over. Maybe this can be a nice optimization ? Add an index of all sane xml chunks and only regenerate from there on. Or maybe add an option to partialy regenerate.
Comment #9
Anonymous (not verified) CreditAttribution: Anonymous commentedRe #8, You're MySQL server is running out of memory. You need to look into re-configuring the memory allocations for it.
Re #7, yes it is safe to keep. The original code is making an incorrect assumption about the return value of the flush. I'll try to find some time for a proper patch but Dave can feel free to go ahead with a patch.
Comment #10
Anonymous (not verified) CreditAttribution: Anonymous commentedComment #11
gordon CreditAttribution: gordon commentedThis fixed my problem that I was having. here is a patch.
Comment #12
gordon CreditAttribution: gordon commentedFixed up the paths so it will apply
Comment #13
Anonymous (not verified) CreditAttribution: Anonymous commentedWhile the patch is a work around we really need to modify or possibly remove the getStatus() function and we need to apply it to D7 first then role it back to D6.
Comment #14
troyer CreditAttribution: troyer commentedPatch from #12 fixed my issue in 6.x-2.0-rc1. Sitemap generation is now way faster too.
Comment #15
rooby CreditAttribution: rooby commentedHere is a version of the workaround patch in #12 for D7.
It fixed the issue for me.
Comment #16
Anonymous (not verified) CreditAttribution: Anonymous commentedSee my comment in #13, it still applies. Rolling the same "needs work" patch doesn't resolve the issue that caused it to need work.
Comment #17
rooby CreditAttribution: rooby commentedI never said it did, and you might notice I didn't set it back to needs review for that very reason.
It is just there as a drupal 7 version of the workaround that was previously provided for drupal 6 for those who either don't know how to fix it better or don't have time to dig into it.
Comment #18
mgriego CreditAttribution: mgriego commentedPatch against 7.x that removes the getStatus() method and associated code.
Comment #19
mgriego CreditAttribution: mgriego commentedSame patch against 6.x.
Comment #21
mgriego CreditAttribution: mgriego commentedResubmit corrected 7.x patch.
Comment #22
mgriego CreditAttribution: mgriego commented*sigh*
Comment #23
mgriego CreditAttribution: mgriego commentedIt's apparently been a while since I did this... apologies.
Comment #24
mgriego CreditAttribution: mgriego commentedAnd, once again, here's the 6.x patch in the proper format.
Comment #26
mgriego CreditAttribution: mgriego commentedFixing the status back.
Comment #27
Anonymous (not verified) CreditAttribution: Anonymous commentedThe patch presented in #24 is opposite of the patch in #23. However, we don't want to get rid of the flush function.
Comment #28
mgriego CreditAttribution: mgriego commentedSorry, was obviously in a hurry while I was doing this, so I got my file order backwards when rolling the 6.x patch. It's not otherwise identical to #23, though, as the offsets are a few lines off in the 6.x patch due to other differences in that file between 6.x and 7.x.
To your other point, though, I'm not sure why you want to keep the flush() method around... That method's *sole* purpose is to set the status property based on the return value from the actual XMLWriter::flush() method. Since the only place that the status property is used is in getStatus, and we've already determined that the return from XMLWriter::flush() isn't useful to what we're doing, I'm not sure why it needs to be kept. If the getStatus() method is removed, then flush() is setting a property that will never get read.
Comment #29
Anonymous (not verified) CreditAttribution: Anonymous commentedThe flush() itself has a purpose. We base status on the return value which is either FALSE on failure or the number of bytes written. The issue is that 0 is a valid return value which equates to FALSE. So the change should be something like the following.
Comment #30
mgriego CreditAttribution: mgriego commentedI'm definitely missing something, then. The PHP docs for XMLWriter::flush() don't state anywhere that the flush() method *might* return FALSE... It just says that it simply returns either the buffer or the number of written bytes, never FALSE...?
Comment #31
Anonymous (not verified) CreditAttribution: Anonymous commentedWell, sorry for the noise in #29. We still need the flush() it is used elsewhere but setting status with the return value is bogus. So the endDocument function should check the return value from the parent, the getStatus function should go away, the setting of the status variable should go away. I don't think anything else is extending XMLSitemapWriter so we should be good with that.
Comment #32
mgriego CreditAttribution: mgriego commentedRight, OK... Checking the $return from the parent::endDocument() is exactly what my patch does, so we're good there.
As to the flush() method, here is what it was before:
Notice that the *only* thing it does, other than calling parent::flush() is set the value of the status property, which we are no longer using. So, again, I see no reason to override it. Removing the method from the XMLSitemapWriter *sub*class won't remove the method from the class... it will just inherit the method from its parent, as all subclasses do. And, since it would no longer be doing anything that the inherited method wasn't, there's no reason to override it. Hence, I removed the method override in my patch... no longer necessary.
And, for the record, I checked the rest of the code, and the *only* place I see the flush() method being called directly is from the writeSitemapElement(), and it's calling it in the object context, so it will use the inherited method if there is no overriding method, just as it should.
So, based on all that, I still don't see what's wrong with the patch I supplied.
Comment #33
Anonymous (not verified) CreditAttribution: Anonymous commentedYour points are accepted as valid and I have no further objection. I'm really not far from a OO beginner in my logic thinking (I'm a strong procedural programmer) and didn't think about the fact that removing the overridden function doesn't remove the function from the object in total. But I agree with everything you've said. We just need to correct the D6 version of the patch. Add -D6 to the file name on the upload so that the testbot leaves it alone.
Comment #34
mgriego CreditAttribution: mgriego commentedSure thing, here ya go.
Comment #35
Anonymous (not verified) CreditAttribution: Anonymous commentedShould we add a parent::flush() after the end of the if {} in the endDocument() function since we've written data with the endDocument()?
Comment #36
mgriego CreditAttribution: mgriego commentedI wouldn't think so... I'm making an assumption here, but I would think that calling endDocument would actually force a flush of the buffer since you're closing out the XML document. Though, if you do decide to add a flush call, you don't need to call parent::flush(), rather $this->flush().
Comment #37
Anonymous (not verified) CreditAttribution: Anonymous commentedIt doesn't say so in the documentation so therefore let's not assume that it does and add it.
Comment #38
pianomansam CreditAttribution: pianomansam commentedJust reporting in that patch from #34 seems to be working for me.
Comment #39
Taxoman CreditAttribution: Taxoman commentedSo, is it time to push this one to 7.x-dev?
Comment #40
Anonymous (not verified) CreditAttribution: Anonymous commentedAs soon as the patch is corrected for #35 - #37. I haven't accepted the fact that we do not need parent::flush().
Comment #41
Carsten Müller CreditAttribution: Carsten Müller commentedHi,
i had the same problem. It appears if the return value in the function flush is 0.
then (bool) $return will become to FALSE and will not change back again.
From the header in parent::flush
so the number of written bytes seems to be 0.
I just changed the function flush to the following lines and it works
Comment #43
Anonymous (not verified) CreditAttribution: Anonymous commentedRe: Carsten Müller
We already have a patches for this. Your patch provides a result that is no different than the already bogus result and will not be accepted.
See the following comments for the patches needing review:
#1435264-23: Xml stops working Uncaught exception 'XMLSitemapGenerationException' error
#1435264-34: Xml stops working Uncaught exception 'XMLSitemapGenerationException' error
Comment #44
asb CreditAttribution: asb commentedI'm experiencing the 'XMLSitemapGenerationException' error from tome to time with D6. The patch from #34 seems to fix the problem, including Cron runs through Drush. Thanks @mgriego!
Since there also are other confirmations that this patch is fine (#38), I'd like to set the status to RTBC, but I can't comment on #35-#37. I can just say that before applying the patch, cron dies and sitemap generation fails, and after applying the patch, cron finishes and sitemap generation succeeds (tested on a site with split sitemaps with 132,209 links).
Comment #45
FanisTsiros CreditAttribution: FanisTsiros commented#1435264-23: Xml stops working Uncaught exception 'XMLSitemapGenerationException' error Works for me ...finally.
I had the same issue for my new website built on drupal 7. This is a new News Aggregator site collecting news with feeds from more than a thousand sites. At the time, we have about 600.000 nodes. Xmlsitemap is 13 pages in a month.
Without the patch sitemap xml generation stopped sometime when we had about 10 pages. Rebulding links was not working throwing unknown error. With the patch #23, i rebuild links successfully and everything goes smoothly !
Comment #46
klausiPatch from #23 seems to solve the problem.
Comment #46.0
klausiAdd the comments containing the patches for review.
Comment #47
lmakarovJust another confirmation that #34 fixes this on 6.x-2.0-rc2. Thanks!
Comment #48
edrokov CreditAttribution: edrokov commented#34 fixes this on 6.x-2.0-rc2. Thanks!
Comment #49
bessone CreditAttribution: bessone commented#23 fixed my problem with 7.x-2.0-rc2
Comment #50
Dave ReidCommitted #23 to 7.x-2.x and #34 to 6.x-2.x. Sorry everyone for the wait.
http://drupalcode.org/project/xmlsitemap.git/commit/5c78eec
http://drupalcode.org/project/xmlsitemap.git/commit/d7f36f6
Comment #51
lostboys CreditAttribution: lostboys commentedxmlsitemap-7.x-2.0 and xmlsitemap-7.x-2.x-dev both give me:
XMLSitemapGenerationException: Generated public://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml resulted in an empty file. in XMLSitemapWriter->endDocument() (line 156 of /var/www/www.******.com/public/sites/all/modules/xmlsitemap/xmlsitemap.xmlsitemap.inc).
Running the cron gives me a connection reset.
Any idea how to get it working?
thnx
Comment #52
markusa CreditAttribution: markusa commentedresponse to comment #51, check out this issue, looks like a different problem (with solution) : https://drupal.org/node/2220707
Comment #53
bryanhidalgo CreditAttribution: bryanhidalgo commentedSolution on #52 fixed my problem
Comment #54
iantresman CreditAttribution: iantresman commented#52 pointing to "Error on sitemap generation: "An AJAX HTTP request terminated abnormally." #6 resolved this for me, except that in XML Sitemap 6.x-2.0, the offending line is on line 161 in the file xmlsitemap.xmlsitemap.inc.
I was getting blank white pages on "Rebuild links", and running the cron. I am running Drupal 6.30.