Hey,

Generated a test sitemap of my nearly complete site.

I was surprised to see links I did not know existed - like URLs pointing to taxonomy results (the site gathered all the elements with a specific taxonomy term and created a page from them). This MIGHT have something to do with the Exposed Filters I used.

Since I have never built a Drupal site, I am not sure how to "block" those pages from:
1 - Being indexed in my sitemap (do I just erase those entries in the XML file?)
2 - Being crawled (I am guessing this is done with a robots.txt file, though I have never created one before)
3 - Being viewed (stumbled upon) - is this even possible to block?

I am just looking for confirmation of these approaches, or pointers in other directions.

Really appreciate it.

Thanks!

Comments

sprite

1. What method did you use to create the "sitemap" for your Drupal website?

2. Drupal distributions already come with a robots.txt file, which you can and should edit, version control, and diff when updating core.

3. If you are using the xmlsitemap module to generate an XML file to submit to search engines, the xmlsitemap admin UI includes options for excluding taxonomy terms from the sitemap, either all of them at once or individually on each taxonomy term. Sometimes taxonomy terms are entries that should appear in the sitemap: for example, on a commerce site that uses a taxonomy to store all its product categories (with pathauto module URL aliases for each of them, of course), each taxonomy product category is an important landing page within the site.
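For reference, a taxonomy term entry in the generated XML is just a standard sitemap <url> element, roughly like the sketch below (the domain and term ID here are made up). Erasing entries by hand works, but the file gets overwritten the next time the sitemap is regenerated, so the module's exclusion settings are the durable fix:

<url>
  <loc>https://example.com/taxonomy/term/12</loc>
  <changefreq>monthly</changefreq>
  <priority>0.5</priority>
</url>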

spritefully yours
Technical assistance provided to the Drupal community on my own time ...
Thank yous appreciated ...

adminMN2023

1 - I was trying out the xmlsitemap module, but it was not picking up everything; I need to read more about it. So I went to an online sitemap builder that did the job great - and pointed me to the issues that prompted this thread.

2 - I take it from your response that I'm on the right track - I'll definitely edit the core robots.txt file.

3 - See number 1. I really wish the XML sitemap module started with everything and then let you pare back. With this module you have to know, at the beginning, what you want to list - which also means you have to know exactly how all the parts work to begin with. That makes it a lot more difficult for a beginner. (Same reason I'm struggling with the Pathologic module; conversely, the Metatag module is awesome and easy to understand right from the install.)

Thanks for the help. I'm getting there, albeit slowly!

ThirstySix

Simple! Add a Disallow rule to your robots.txt file:

Disallow: /taxonomy/
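A note on that pattern (a sketch, assuming the stock Drupal robots.txt and the default unaliased term paths): the rule belongs inside the existing User-agent: * block, and it only matches URLs that actually start with /taxonomy/ - if your term pages have Pathauto aliases, each aliased path would need its own rule. In context it might look like:

User-agent: *
# ... existing core Disallow rules ...
Disallow: /taxonomy/
Disallow: /index.php/taxonomy/

The /index.php/ variant mirrors the no-clean-URL entries that newer core robots.txt files already include.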

And add a noindex, nofollow meta tag to taxonomy pages using the Metatag module (/admin/config/search/metatag):

<meta name="robots" content="noindex, nofollow" />
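One caveat worth adding, assuming standard crawler behavior: a crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex tag. If the goal is getting an already-indexed page out of the index, let it be crawled and rely on the meta tag alone; a variant like this (the content value is illustrative) drops the page from the index while still letting crawlers follow its links:

<meta name="robots" content="noindex, follow" />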