Hi,
My client has a bunch of PDF files (plus a number of .doc, .xml and .txt files) that need to be indexed so that their content is searchable. I've looked at different solutions and decided to go with Google. The problem is that Google doesn't index all of my PDF files, even though the pages they're accessible from are in the sitemap.xml. I can make sure it indexes a PDF file by adding it specifically through the custom pages sub-module that's included, but otherwise Google only seems to index a few random files. There are too many files for me to add them all manually, and new files need to be indexed automatically as they are added. I've created a PHP script that displays all of the files as links and tried adding it manually as a custom page, but to no avail.

Is there any way I can add each file dynamically to the sitemap using this kind of script?
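
To make the idea concrete, here is a minimal sketch of a standalone script that walks a files directory and writes one sitemap entry per matching file. It is not part of the XML sitemap module; the directory path, base URL and function names are hypothetical, and it assumes the resulting file would be submitted to Google as a separate sitemap.

```python
import os
from urllib.parse import quote
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# File types mentioned in this issue.
EXTENSIONS = (".pdf", ".doc", ".xml", ".txt")


def file_urls(files_dir, base_url):
    """Return sorted public URLs for matching files under files_dir.

    Assumes files_dir is served at base_url (a hypothetical mapping).
    """
    urls = []
    for root, _dirs, names in os.walk(files_dir):
        for name in names:
            if name.lower().endswith(EXTENSIONS):
                rel = os.path.relpath(os.path.join(root, name), files_dir)
                rel = rel.replace(os.sep, "/")
                # Percent-encode spaces and other unsafe characters.
                urls.append(base_url.rstrip("/") + "/" + quote(rel))
    return sorted(urls)


def build_sitemap(urls):
    """Serialize the URLs as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual site layout.
    found = file_urls("sites/default/files", "https://example.com/sites/default/files")
    print(build_sitemap(found))
```

Run from cron, a script like this would pick up new files without anyone adding them by hand.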

Comments

Anonymous’s picture

#1047178: Include .pdf Files in the XML Site Map

A workaround may be to create a custom module that defines a custom content type with a field for the file path, and creates an alias based on the value of that field and node/%NID.

alandor’s picture

Thanks, but that sounded quite complicated. I haven't made a module before and I don't know how I would make it create custom content types or how to add each of those to the sitemap.

Shouldn't Google be able to index the files if they are accessible from a page that is in the sitemap? Maybe there's just something wrong with my Google settings?

Anonymous’s picture

Status: Active » Postponed (maintainer needs more info)

If you have a page with a file link on it, and the page is in the sitemap.xml file, then Google should find the link to the file as well from the page data, unless you tell it not to follow the link.
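
As a rough way to check for that last condition, the sketch below scans a page's HTML for file links and reports whether each carries rel="nofollow", and whether the page has a robots meta tag containing nofollow. The class and function names are made up for illustration, and the extension list is only the one mentioned in this issue.

```python
from html.parser import HTMLParser

# File types mentioned in this issue.
FILE_EXTENSIONS = (".pdf", ".doc", ".xml", ".txt")


class FileLinkParser(HTMLParser):
    """Collect file links and any nofollow hints found in a page."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # <meta name="robots" content="...nofollow...">
        self.links = []             # (href, link_has_rel_nofollow)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "nofollow" in (attrs.get("content") or "").lower():
                self.page_nofollow = True
        elif tag == "a":
            href = attrs.get("href") or ""
            # Ignore any query string when testing the extension.
            if href.lower().split("?", 1)[0].endswith(FILE_EXTENSIONS):
                rel = (attrs.get("rel") or "").lower().split()
                self.links.append((href, "nofollow" in rel))


def audit_file_links(html):
    """Return (page_nofollow, [(href, link_nofollow), ...]) for one page."""
    parser = FileLinkParser()
    parser.feed(html)
    return parser.page_nofollow, parser.links


if __name__ == "__main__":
    sample = '<a href="/files/report.pdf" rel="nofollow">Report</a>'
    print(audit_file_links(sample))
```

Feeding it the HTML of the page that lists the files shows quickly whether any of the links are marked as not to be followed.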

flaviotorelli’s picture

You can add your files to sitemap.xml using the "Custom links" feature at /admin/settings/xmlsitemap/custom.

Yes, Google should be able to index the files by a single link.
Make sure that you have no rule in your robots.txt blocking a file extension, such as .pdf or .txt.
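
A small sketch of how that robots.txt check could be scripted: it collects the Disallow patterns that apply to all crawlers or to Googlebot and reports which ones match a given file URL. The function names are hypothetical, and the matching is deliberately simplified: it understands the `*` and `$` wildcards that Google documents, but ignores Allow rules that might override a Disallow, and it only looks at the URL path.

```python
import re
from urllib.parse import urlsplit


def disallow_patterns(robots_txt, agents=("*", "googlebot")):
    """Collect Disallow patterns from groups that apply to the given agents."""
    patterns = []
    group_agents = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            applies = any(agent in agents for agent in group_agents)
            if field == "disallow" and value and applies:
                patterns.append(value)
    return patterns


def pattern_matches(pattern, path):
    """Match a robots.txt path pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def blocking_rules(robots_txt, url):
    """Return the Disallow patterns that match the path of url."""
    path = urlsplit(url).path or "/"
    return [p for p in disallow_patterns(robots_txt) if pattern_matches(p, path)]


if __name__ == "__main__":
    # Made-up robots.txt content for illustration.
    sample = "User-agent: *\nDisallow: /*.pdf$\n"
    print(blocking_rules(sample, "https://example.com/sites/default/files/report.pdf"))
```

An empty result for a file URL means no Disallow rule in this simplified reading blocks it; a non-empty result lists the rules worth a closer look.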

jenlampton’s picture


I'm working on a new module to add PDFs to the sitemap. I will update this comment with a link as soon as it's ready.

edit: It's ready for testing: https://www.drupal.org/project/xmlsitemap_pdf