What I would like to see is a new hook (e.g. hook_gsitemap()) that allows the inclusion of pages that are not node based.

For example, user profile pages (user/n) could be included in the sitemap.

Maybe it could be done with a single function that can be called to update the data, and then a hook which can collect any additional data.

I ask because I have a bibliography module whose data is not held in nodes, as I didn't really want the overhead of a node, but its pages need to be indexed as well.

Comment | File | Size | Author
#4 | ghook.tar_0.gz | 4.16 KB | SamAMac
#3 | ghook.tar.gz | 4.16 KB | SamAMac

Comments

Tobias Maier’s picture

+1 for the hook_gsitemap

I think we need it.

peter_n’s picture

Yes, me too. My member directory (/profile) is an important part of my site that google should know about.

SamAMac’s picture

Assigned: Unassigned » SamAMac
Status: Active » Needs review
Status | File | Size
new | ghook.tar.gz | 4.16 KB

I have implemented a simple hook_gsitemap. I have attached a modified version of gsitemap.module 4.6 as well as a simple module demonstrating usage of the hook. I would appreciate it if you could try it out and offer any suggestions before I commit it to CVS.

gordon’s picture

I took a look at this, and the problem I can see is that it will not scale infinitely without running out of memory. If you were to try to load an additional 100,000 links through the hook, it would fail.

Maybe you could implement this along the lines of the search implementation: when a link gets updated, the module calls an API function with enough information to populate the gsitemap table, and the gsitemap table then holds everything needed to build the XML.

Also, maybe the API should provide a way to override the Google submit option, so that when you are doing global updates you can delay the submission to Google.
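The push-style approach described above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite table; the table layout, `gsitemap_update_link`, `bulk_update`, and the simulated `submit_to_google` are assumptions made for the example and do not reflect the actual gsitemap schema or API.

```python
import sqlite3

# Assumption: a simple two-column table; the real gsitemap table may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gsitemap (path TEXT PRIMARY KEY, lastmod TEXT)")

submissions = []  # records each simulated submission to Google


def submit_to_google():
    # Placeholder: a real module would notify Google here.
    submissions.append("ping")


def gsitemap_update_link(path, lastmod, submit=True):
    """Store or refresh one link; optionally skip the immediate submit."""
    conn.execute(
        "INSERT OR REPLACE INTO gsitemap (path, lastmod) VALUES (?, ?)",
        (path, lastmod),
    )
    if submit:
        submit_to_google()


def bulk_update(rows):
    """Update many links, then submit once at the end."""
    for path, lastmod in rows:
        gsitemap_update_link(path, lastmod, submit=False)
    submit_to_google()


def iter_links():
    """Stream rows from the table so the XML can be built row by row."""
    yield from conn.execute("SELECT path, lastmod FROM gsitemap ORDER BY path")
```

Because links are written one at a time and read back through a cursor, no module has to hold its full link list in memory, and a global update triggers a single submission.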

SamAMac’s picture

Forget about having it "scale infinitely." Google imposes a hard limit of 50,000 URLs per sitemap anyway.

Google's solution to this is a sitemap index file: http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#sitemapF.... However, a sitemap index file can only contain links to sitemaps. Thus, my suggestion is that if you have a legitimate need for 100,000 non-node links, you should create your own sitemap index file and point it at the gsitemap sitemap as well as at sitemaps you define yourself with your own links. The solution I have provided does not use an inordinate amount of memory, and is fine for adding a few random links.

User profile pages, on the other hand, are something the gsitemap module itself can legitimately include, and I am working on adding that functionality.
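To make the sitemap index suggestion concrete, here is a small Python sketch that computes how many sitemaps a given number of URLs needs under the 50,000-URL limit and writes an index pointing at several sitemaps. The example URLs are made up, and the namespace used is the one from the current sitemaps.org protocol, which is an assumption; the Google protocol version in use when this thread was written may specify a different one.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000  # the limit cited in this thread


def chunk_count(total_urls, limit=MAX_URLS_PER_SITEMAP):
    """Number of sitemap files needed for total_urls (ceiling division)."""
    return -(-total_urls // limit)


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document listing the given sitemap URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        # Assumption: sitemaps.org 0.9 namespace, not the 2005-era schema.
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        # escape() turns &, < and > into XML entities.
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(url))
    lines.append("</sitemapindex>")
    return "\n".join(lines)
```

For instance, `chunk_count(100000)` gives 2, so the index would list the gsitemap-generated sitemap plus at least two custom ones.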

gordon’s picture

I didn't realise that Google has a limit. Well, this does mean that drupal.org wouldn't be able to use this, as it has over 50,000 nodes.

What I need it for is to add an additional 2,400 URLs, which is going to be a big array. That is why switching it the other way around would make sense for scaling.

SamAMac’s picture

Well, I am loath to do anything that would require changing the database schema, which would be required to include any non-node links in the gsitemap table.

I'd suggest a much simpler method: in the hook_gsitemap, simply print out your own entries and return an empty array. This approach not only avoids building a large array in memory, but also works without modifying the code I submitted.
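The "print and return an empty array" trick can be sketched like this. It is a Python illustration only: `render_sitemap`, `biblio_gsitemap`, the `out` stream parameter, and the example base URL are hypothetical, and the real hook in the attached module may be called differently.

```python
import io
from xml.sax.saxutils import escape


def biblio_gsitemap(out, base_url="http://example.com", count=2400):
    """Hook that writes its own entries directly and returns nothing to merge."""
    for n in range(1, count + 1):
        loc = "%s/biblio/%d" % (base_url, n)
        out.write("<url><loc>%s</loc></url>\n" % escape(loc))
    return []  # empty list: the caller has nothing left to add


def render_sitemap(hooks, out):
    """Write the sitemap body, letting each hook return or print entries."""
    out.write("<urlset>\n")
    for hook in hooks:
        for entry in hook(out):  # empty for hooks that print directly
            out.write("<url><loc>%s</loc></url>\n" % escape(entry))
    out.write("</urlset>\n")
```

Each entry is written as soon as it is produced, so the 2,400 URLs never exist as a single array. This only works if the hook is invoked while the sitemap body is being written, which is assumed here.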

SamAMac’s picture

Status: Needs review » Fixed

Applied to all versions.

Anonymous’s picture

Status: Fixed » Closed (fixed)