When an error occurs (especially one caused by wrong parameters passed to xmlsitemap_output()), the module returns a 404 error. It would be better to return XML content without links than to return an error that leads users to open an issue report.
| Comment | File | Size | Author |
|---|---|---|---|
| #7 | xmlsitemap_pages_092398.patch | 2.38 KB | avpaderno |
Comments
Comment #1
andreiashu commented
Hi Kiam,
I'm not really sure your suggested approach would be OK. We should not look at the problem from a user's point of view but from a search engine's point of view (SEs are the ones that matter in this case).
I'm not sure whether it is better to return an XML file without links than a 404. Could an XML file without links tell the search engine that the website doesn't have any content (so it is not worth checking)?
PS: btw, congratulations on the beta release! Very nice :)
Comment #2
avpaderno
I am not sure search engines would treat a website with no links in its sitemap as a website not worth checking.
The current code returns a 404 error even when the table that should contain the links is empty; in that case it would be better to return an empty sitemap, which is the most accurate representation of the facts.
Also, users interpret the 404 error as a permanent error; in other words, they think the error will keep being returned until somebody fixes the code, or until they are told what to do to remove the error message they see.
If the project returned an empty sitemap, we could tell them to be patient; the sitemap will get populated over time, which is exactly what happens (considering that the individual modules populate their database tables at cron time).
Comment #3
avpaderno
That is also the assumption made by the search engines, which report the 404 error on the page they offer to webmasters. In such cases, the webmaster page suggests verifying that everything is working and set up correctly; the search engines don't think "oh well, let us wait a little longer and see what happens", but report the error as soon as they get it.
As I said, when the project's modules have only recently been installed, it is only a matter of waiting for the database tables to get populated; returning a 404 error makes everybody think something is broken and that they need to fix it in some way.
Comment #4
Anonymous (not verified) commented
Perhaps a better approach is for the hook_cron of xmlsitemap.module to recheck the access of links in the xmlsitemap table and update the priority to -1. The issue with that is we would need to do it in chunks and store the last lid processed.
Comment #5
Anonymous (not verified) commented
Scratch the previous comment. I finally had some coffee. ;D
In case of errors such as those described in the OP, the course of action would be to log a watchdog entry and have only the site link in the sitemap.
Comment #6
avpaderno
The topic of the report is a little different.
xmlsitemap_output() calls drupal_not_found() when it detects an error. As the 404 error is interpreted as a permanent error, I suggested that the function simply return an empty chunk, or an empty sitemap, because most of the time the error is caused by the xmlsitemap table not being populated. This happens because the individual modules don't update their tables immediately; the tables are populated when a node is created/updated, when a taxonomy term is created/updated, when a user is created, or when hook_cron() is executed. Before that happens, the individual tables don't contain data, and the central table cannot contain data either.
EDIT: I had my coffee earlier; that is the reason I can think more brilliantly than usual. :-)
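The behaviour proposed in this comment, combined with the "site link only" fallback from comment #5, can be sketched roughly as follows. This is a language-agnostic illustration in Python, not the module's PHP code: `build_sitemap()`, `respond()` and `fallback_url` are hypothetical names invented for the example, and the only assumption about the output is the standard sitemaps.org `urlset` namespace. Whether search engines accept a `urlset` with no entries is not settled in this thread, so the optional fallback keeps at least one link in the output.

```python
from xml.sax.saxutils import escape

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(links, fallback_url=None):
    """Build a sitemap document from a list of URLs (hypothetical helper).

    If there are no links yet (for example, the tables have not been
    populated by cron), use fallback_url as the only entry when given;
    otherwise emit a urlset with no entries.
    """
    if not links and fallback_url is not None:
        links = [fallback_url]
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="%s">' % SITEMAP_NS,
    ]
    for loc in links:
        # Escape &, < and > so the URL is safe inside XML text.
        lines.append("  <url><loc>%s</loc></url>" % escape(loc))
    lines.append("</urlset>")
    return "\n".join(lines)


def respond(links, fallback_url=None):
    """Return (status, headers, body) without using 404 for an empty table."""
    body = build_sitemap(links, fallback_url)
    return 200, {"Content-Type": "text/xml; charset=utf-8"}, body
```

Calling `respond([])` gives a 200 response whose body is a `urlset` with no entries, while `respond([], fallback_url="http://example.com/")` gives a sitemap containing only the site link.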
Comment #7
avpaderno
This is the proposed patch.
EDIT: The proposed patch doesn't work; file_transfer() can be used for files contained in the files directory, and not for files contained in module directories.
Comment #8
avpaderno
This has been changed in CVS.