Drupal generates a page at /filter/tips, and that page is indexed by search engines, offering it up to the public as a destination on your website.

Currently, robots.txt only blocks sub-pages of /filter/tips/. To keep indexing engines away from the tips page itself, the trailing slash must be removed. Patch #28, posted by Tor Arne Thune, addresses this by removing the trailing slash.

Consideration should be given to re-scoping and/or rolling this issue in with similar issues related to robots.txt.

Committed and pushed 4300e616cc to 8.6.x and 17c9a8a27a to 8.5.x. Thanks!

Comments

jensimmons’s picture

Add this: Disallow: /filter/tips to line 32 of robots.txt

Like this:
[Screenshot: the robots.txt file with one line of code added to exclude this page]

Does someone want to make a patch?

BrockBoland’s picture

FileSize
0 bytes

Sure!

BrockBoland’s picture

Status: Active » Needs review
FileSize
302 bytes

Aw, fer cryin - fixed attached.

Dave Reid’s picture

Version: 7.0 » 7.x-dev
Category: bug » feature
Status: Needs review » Needs work

We need to add the un-clean URL version of it as well. Note that the page is not a file, so it should go one section lower.

tim.plunkett’s picture

Version: 7.x-dev » 8.x-dev
Category: feature » bug
Status: Needs work » Reviewed & tested by the community
Issue tags: +Needs backport to D6, +Needs backport to D7

Looks good.

Dave Reid’s picture

Category: bug » feature
Status: Reviewed & tested by the community » Needs work
Issue tags: -Needs backport to D6, -Needs backport to D7

And as always, fix in 8.x first, then backport easily.

tim.plunkett’s picture

Status: Needs work » Needs review
FileSize
528 bytes

Updated. Crazy cross-posts. It's what happens when Jen asks for Drupal things on twitter.

BrockBoland’s picture

I've read and understand the backport policy (http://drupal.org/node/767608), but what's the actual process for an issue like this? For a simple item like this, it makes sense that a single patch can be applied to D7 and D8, but in more complex cases where the patches differ, should a separate issue be spun off for the D7 version?

Apologies for being a newb - I haven't done any core patches before.

ksenzee’s picture

should a separate issue be spun off for the D7 version?

No, normally it all stays in the same issue. It works fine for simple stuff like this but it's kind of a messy process for complicated issues.

Also, subscribing. I saw drupal.org/filter/tips in some Google results the other day and said huh what?

jensimmons’s picture

Twitter FTW!!

Yeah, the CVS-centric workflow's been to update things in the dev version of Drupal (now D8), then backport to the current version (D7), and then the one-older version (D6). IMO, this workflow could/should/might change now that we have Git, and we can work with branches instead of patches.... but that's not happened yet. So meanwhile, we are following the same rules that were used two years ago when D6 was brand-new and D7 development had just opened. (Or was that three years ago?)

Issues like this one are the test. Super easy to understand. Super easy to write the code. Not much to debate.... now let's see how long it takes to get this into D7, with the crazy D8-first-rule. Especially since we don't have a D8 co-maintainer, and Angie (webchick) doesn't have commit access to D8. Will this be no-biggy? Or will it take months to fix? Our process post-switch-to-git is still evolving.

Meanwhile, welcome BrockBoland to core development! You've been awarded the "My First Drupal Core Patch" badge. :D YAY!

ksenzee’s picture

I don't think the rule about committing to the newest version first is likely to change just because of git. The process is being discussed over at #1050616: Figure out backport workflow from Drupal 8 to Drupal 7.

ksenzee’s picture

Status: Needs review » Reviewed & tested by the community

Oh, and this passed tests since I was last here, so RTBC.

Dave Reid’s picture

+1 from me as well, although now I will no longer be able to google for sites that have their full html input filter on...which is a good thing!

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Makes sense to me.

Committed to 8.x and 7.x. Thanks!

pillarsdotnet’s picture

Version: 8.x-dev » 6.x-dev
Status: Fixed » Needs review
Issue tags: -Needs backport to D7
FileSize
700 bytes

Requested d6 backport:

Dave Reid’s picture

Status: Needs review » Needs work

Don't forget the Disallow: /?q=filter/tips
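
As a small sketch of the point above, the helper below emits the clean-URL rule together with its un-clean counterpart for one internal path. The function name `disallow_rules` is hypothetical and not part of Drupal, and the `/?q=` form is taken from the comment above. Drupal ships robots.txt as a static file, so this only illustrates which lines a patch would need to contain.

```python
def disallow_rules(internal_path):
    """Build Disallow lines for a Drupal internal path such as 'filter/tips'.

    Returns the clean-URL form and the '?q=' (un-clean URL) form.
    A trailing slash, if present, is kept as given.
    This helper is illustrative only; it is not part of Drupal.
    """
    path = internal_path.lstrip("/")
    return [
        f"Disallow: /{path}",
        f"Disallow: /?q={path}",
    ]


print("\n".join(disallow_rules("filter/tips")))
```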

pillarsdotnet’s picture

Status: Needs work » Needs review
FileSize
1.4 KB

Oops.

pillarsdotnet’s picture

(sigh) Probably better as one patch. Sorry for the noise.

Damien Tournoud’s picture

Status: Needs review » Reviewed & tested by the community

juliangb’s picture

This has been RTBC for 3 months.

I'm using this patch on my live sites and would greatly like for new D6 releases to include this as standard.

Is there anything stopping this from being committed?

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 6.x too, thanks!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

juliangb’s picture

Version: 6.x-dev » 8.x-dev
Status: Closed (fixed) » Active

I'm now finding that Google is not blocking filter/tips because the line in robots.txt has a trailing slash.

We need to remove the slash to ensure that Google always knows to block this page.

pillarsdotnet’s picture

The Redirect module has an option to remove trailing slashes.

juliangb’s picture

Actually the redirect module doesn't help in this instance.

The issue is that in the robots.txt the paths all have trailing slashes, which means that Google does not block any paths without the trailing slashes.

To ensure that it catches everything, we should include a version without the trailing slash in robots.txt.

pillarsdotnet’s picture

Ah. That explains the module which *adds* trailing slashes to everything.

Write a patch, please?

Tor Arne Thune’s picture

Category: feature » bug
Status: Active » Needs review
Issue tags: +Quick fix, +Novice, +Needs backport to D7
FileSize
628 bytes

juliangb is right. It should not have a trailing slash. Attaching a patch that corrects it. As for the suggestion to add a non-trailing-slash-version of paths with a trailing slash, I feel that it deserves its own issue.

Tor Arne Thune’s picture

juliangb’s picture

Status: Needs review » Needs work

Thanks for posting the patch, Tor Arne - a good reminder for me seeing this pop up in my issues tracker.

I disagree with fixing the other links in a separate issue though, hence the "needs work" for now. This would leave a slightly "hacked" state until the other issue was fixed.

GaëlG’s picture

Issue summary: View changes
Issue tags: +#amsterdam2014

I'm on it.

GaëlG’s picture

Status: Needs work » Needs review
FileSize
1.12 KB

Here's a new patch. I checked in the router table to see if the path can have subpaths. If so, we need to list both formats (end slashes and no end slashes).
/search/ does indeed need to be listed to keep search results from being indexed, but I see no harm in the search landing page being indexed. That's why I did not add /search.

oenie’s picture

Issue tags: -#amsterdam2014 +Amsterdam2014

fixing the amsterdam sprint tag to amsterdam2014

ronaldmulero’s picture

cilefen’s picture

The scope of this issue is /filter/tips only and that is all that should be fixed here, considering #180379: Fix path matching in robots.txt exists. So, proceed from #28.

ericjenkins’s picture

I'm at a sprint in Los Angeles. I'm going to check that the patch in #28 still applies to D8 core.

ericjenkins’s picture

Patch #28 still applies successfully into robots.txt. I will seek a way to test it against an indexing validator.

ericjenkins’s picture

I'm hiding Patch #32 because it was beyond the scope of this ticket.

ericjenkins’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
9.64 KB

I've tested the indexing of filter/tips using a personal development machine with the Google Webmaster Tools robots.txt tester. I confirmed that, prior to applying Patch #28, the tips page was indexed by Google. After applying Patch #28, the tips page is no longer indexed by Google. This validates the removal of the trailing slash on filter/tips.

YesCT’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: -Novice +Needs issue summary update

Seems like the problem is that we have some listings where we intended to disallow, but they are not being disallowed because of an erroneous trailing slash.

Run the same test in Google Webmaster Tools on node/add, for example, to see.

If so, I would suggest retitling and rescoping this issue to just address that problem. #180379: Fix path matching in robots.txt might be about a variety of problems.

(also, an issue summary update would be nice, explaining the back and forth of the direction of the issue)

Depending on the result, maybe add the novice tag back, with explicit next steps.

ericjenkins’s picture

I adjusted Add Node permissions on my Drupal 8 test site to allow anonymous browsing to node/add and also to node/add/article. Here are the results of my findings from Webmaster Tools index testing of node/add/, with and without the trailing slash in robots.txt:

Trailing slash:
Disallow: /node/add/
Disallow: /index.php/node/add/
The node/add page is indexed, but sub-URLs of node/add are blocked.

No trailing slash:
Disallow: /node/add
Disallow: /index.php/node/add
The node/add page is blocked, and sub-URLs of node/add are blocked.
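
The results above can be reproduced offline, as a rough sketch, with Python's standard-library `urllib.robotparser`, which applies plain prefix matching. This is a stand-in for the Webmaster Tools tester under stated assumptions, not a model of Googlebot. The helper name `blocked` and the `example.com` host are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules, path):
    """Return True if `path` is disallowed for all agents under `rules`.

    `rules` is a list of Disallow path prefixes. The helper name and the
    example.com host are illustrative only.
    """
    lines = ["User-agent: *"] + [f"Disallow: {rule}" for rule in rules]
    parser = RobotFileParser()
    parser.parse(lines)
    return not parser.can_fetch("*", f"http://example.com{path}")


with_slash = ["/node/add/", "/index.php/node/add/"]
without_slash = ["/node/add", "/index.php/node/add"]

# Check the landing page and one sub-URL against each rule set.
for label, rules in (("trailing slash", with_slash),
                     ("no trailing slash", without_slash)):
    for path in ("/node/add", "/node/add/article"):
        print(f"{label:18} {path:18} blocked={blocked(rules, path)}")
```

Because the match is a plain prefix test, a rule without the trailing slash would also cover any other path that merely starts with the same characters, which is worth keeping in mind when shortening rules.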

ericjenkins’s picture

Issue summary: View changes

ericjenkins’s picture

Issue summary: View changes

mgifford’s picture

Status: Needs review » Needs work

Needs re-roll.

opdavies’s picture

@mgifford: Which patch and branch are you testing with? The patch in #28 applies cleanly to both 8.0.x and 8.1.x.

mgifford’s picture

opdavies’s picture

Status: Needs work » Needs review

It looks like it's trying to apply a Drupal 7 patch to 8.0.x.

mgifford’s picture

My bad... I see what I did wrong. My main goal was looking at the bots not being able to test the patches.

I'm just going to re-upload the patch from #28.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

  • webchick committed 01ac764 on 8.3.x
    Issue #1137848 by BrockBoland, tim.plunkett, jensimmons: Disallow /...

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

  • webchick committed 01ac764 on 8.4.x
    Issue #1137848 by BrockBoland, tim.plunkett, jensimmons: Disallow /...

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

FiNeX’s picture

Hi, will this patch be included on the next Drupal release? Thanks!

FiNeX’s picture

In a multilingual environment, the path could take the form /LANGCODE/filter/tips. Do we need to patch robots.txt manually in order to Disallow all of those pages?
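
To illustrate the question, here is a hedged sketch using Python's `urllib.robotparser`, which does plain prefix matching and does not model any particular search engine. It suggests that a rule for `/filter/tips` alone does not cover a language-prefixed path, while one rule per prefix would. The language codes `fr` and `de`, the host `example.com`, and the helper `is_blocked` are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(disallow_paths, path):
    """Check `path` against Disallow prefixes for all user agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *"] + [f"Disallow: {p}" for p in disallow_paths])
    return not parser.can_fetch("*", f"http://example.com{path}")


base_rules = ["/filter/tips"]
langcodes = ["fr", "de"]  # hypothetical path prefixes configured on a site
per_language = base_rules + [f"/{code}/filter/tips" for code in langcodes]

print(is_blocked(base_rules, "/fr/filter/tips"))    # base rule alone
print(is_blocked(per_language, "/fr/filter/tips"))  # with per-language rules
```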

Pancho’s picture

Status: Needs review » Reviewed & tested by the community

The followup still isn’t committed.
Very straightforward, #48 does the job: the trailing slash must go.

alexpott’s picture

Issue summary: View changes
Status: Reviewed & tested by the community » Fixed
Issue tags: -Needs backport to D6

Re #59 this is true for all of the things listed in robots.txt and as such we need a general solution. This patch does not make the situation worse. There are other issues around this topic. For example doing something like #1032234: Use Robots Meta Tag rather than robots.txt when possible would be a better solution.

Let's proceed with this small improvement.

Credit is a bit of a mess for this issue, so I'm just going with everyone who added a file since the last commit.

Committed and pushed 4300e616cc to 8.6.x and 17c9a8a27a to 8.5.x. Thanks!

Drupal 7 backports are now filed as separate issues linked to this one.

  • alexpott committed 4300e61 on 8.6.x
    Issue #1137848 by Tor Arne Thune, mgifford, GaëlG, ericjenkins: /filter/...

  • alexpott committed 17c9a8a on 8.5.x
    Issue #1137848 by Tor Arne Thune, mgifford, GaëlG, ericjenkins: /filter/...

Pancho’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.