By Vako on
On my website people can convert documents to PDF using the Print-PDF module. That module saves the files in a cache folder. How do I prevent search engines from indexing this folder and the PDF files in it?
I have used the Disallow option to exclude the folder and the extension in the robots.txt file, but it's not working for me.
I don't want to put a password on the PDF file either.
Comments
Hi,
In robots.txt, use Noindex alongside Disallow, like:
Disallow: /page-one.html
Noindex: /page-two.html
Alternatively, you can implement a CAPTCHA as well.
https://www.drupal.org/node/22265
Thank you for the quick reply
Thank you for the quick reply. As mentioned, I already use the robots.txt with the following lines:
Also, the files are not HTML; they have a .pdf extension and are created on the fly by the module. All I know is the directory where they are stored and the extension, and both are disallowed in robots.txt.
Google is finding the link to the PDF file, which I am trying to exclude from Google searches.
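A point worth noting here: Disallow only stops crawling, so a URL that Google has already discovered through a link can still appear in search results. A common way to actually keep files out of the index is to serve a noindex header for them. A minimal sketch, assuming an Apache server with mod_headers enabled (the pattern below is illustrative, not from the original thread):

```apache
# In .htaccess at the docroot: send a noindex header for all PDF responses
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

For the header to take effect, the crawler must be able to fetch the file and see it, so files handled this way should not also be blocked by a Disallow rule.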
It fixes easily in the robots
This is easily fixed in robots.txt: you just need to disallow indexing of *.pdf pages. If you need to hide some specific documents but not all of them, you can write a "disallow" rule for each of them separately, or do so with this app https://4000a-125-2-form.pdffiller.com/ — it's more about editing PDF forms' output, but it also works for editing restrictions. Note that it's a paid one, though there's a free trial period right from the start.
The PDF files get generated
The PDF files get generated on the fly by visitors, so I don't know the file names; all I know is the folder location.
So I want a way to disallow a specific FOLDER from being indexed.
I used the Disallow directive in the robots.txt file, but for some reason it's not working. Maybe someone can post the Disallow syntax for folders?
Syntax for Disallowing a complete folder in robots.txt
For anyone still needing an answer to the above, here is an example that prevents the entire core folder from being crawled:
Disallow: /core/
In the same way, you can disallow any folder using its path relative to the docroot. For example, to exclude .pdf files in a specific files folder called donotindex:
Disallow: /sites/default/files/donotindex/*.pdf
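These rules can be sanity-checked locally with Python's standard-library robots.txt parser. One caveat: urllib.robotparser does simple prefix matching and does not understand wildcard rules like *.pdf, so this sketch verifies the folder-level rule only (the paths mirror the example above and are illustrative):

```python
from urllib import robotparser

# The folder-level rule from the example above
rules = """\
User-agent: *
Disallow: /sites/default/files/donotindex/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A PDF inside the disallowed folder is blocked from crawling...
print(rp.can_fetch("*", "https://example.com/sites/default/files/donotindex/report.pdf"))
# ...while a normal page elsewhere on the site remains crawlable.
print(rp.can_fetch("*", "https://example.com/node/1"))
```

This confirms the Disallow syntax is being interpreted as a path prefix; anything under /sites/default/files/donotindex/ is covered without listing individual file names.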
Lynn Haas