Problem/Motivation

Search engines are indexing the media/oembed links on websites,
so the oEmbed remote media URLs show up in search results.
Following such a result opens a landing page that contains only the remote media.

Add the oEmbed media links to robots.txt so that they are disallowed:

# Oembed media
Disallow: /media/oembed
Disallow: /en/media/oembed
Disallow: /fr/media/oembed
Disallow: /es/media/oembed
Disallow: /ar/media/oembed
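
On sites with several language prefixes, the per-language rules above could be generated rather than typed by hand. The following is only a minimal Python sketch: the helper name `oembed_disallow_rules` is hypothetical, and it assumes each language's URL prefix is identical to its language code, which may not hold on every site.

```python
def oembed_disallow_rules(langcodes, path="/media/oembed"):
    """Build robots.txt lines that disallow the oEmbed path.

    Assumption: every language prefix equals its language code
    (for example "fr" maps to the "/fr" prefix).
    """
    # Start with the comment line and the unprefixed rule.
    lines = ["# Oembed media", f"Disallow: {path}"]
    # Add one rule per language prefix.
    for code in langcodes:
        lines.append(f"Disallow: /{code}{path}")
    return lines


if __name__ == "__main__":
    print("\n".join(oembed_disallow_rules(["en", "fr", "es", "ar"])))
```

Called with `["en", "fr", "es", "ar"]`, the helper returns the same six lines listed above.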

This is a recurring issue that the support team faces when starting the handover process.

Drupal core's robots.txt file does not have the Disallow: /media/oembed rule
and only follows the conventions described at http://www.robotstxt.org/robotstxt.html

Proposed resolution

Add the line "Disallow: /media/oembed" to Drupal core's default robots.txt so that it is always included.
Variations for other languages would be added by the delivery team according to the languages available on the website.

Cover both single and multilingual sites.

Disallow: /media/oembed
Disallow: /*/media/oembed

And both clean and non-clean URLs:

Disallow: /index.php/media/oembed
Disallow: /index.php/*/media/oembed
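
To see which paths the four rules above would block, the matching can be sketched in Python. This is an illustration, not how any particular crawler is implemented: it assumes the wildcard semantics of RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match), and the names `rule_to_regex` and `is_disallowed` are hypothetical. Python's standard `urllib.robotparser` is not used here because it does not interpret `*` inside a path.

```python
import re

# The four rules proposed in this issue.
RULES = [
    "/media/oembed",
    "/*/media/oembed",
    "/index.php/media/oembed",
    "/index.php/*/media/oembed",
]


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (RFC 9309): '*' matches any sequence of
    characters, a trailing '$' anchors the end of the path, and
    anything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path, rules=RULES):
    """Return True if any rule matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for path in ["/media/oembed?url=x", "/fr/media/oembed", "/node/1"]:
        print(path, is_disallowed(path))
```

Under these assumed semantics, `/*/media/oembed` already matches the `/index.php/...` variants as well, so the explicit `/index.php` rules mainly matter for crawlers that do not support wildcards.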

Remaining tasks

  • ✅ File an issue
  • ✅ Addition/Change/Update/Fix
  • ✅ Testing to ensure no regression
  • ✅ Automated unit testing coverage
  • ✅ Automated functional testing coverage
  • ➖ UX/UI designer responsibilities
  • ➖ Readability
  • ➖ Accessibility
  • ✅ Performance
  • ✅ Security
  • ➖ Documentation
  • ✅ Code review by maintainers
  • ✅ Full testing and approval
  • ➖ Credit contributors
  • ➖ Review with the product owner
  • ✅ Release notes snippet
  • ❌ Release

User interface changes

  • N/A

API changes

  • N/A

Data model changes

  • N/A

Release notes snippet

  • Drupal's default robots.txt file (used to tell web crawlers which paths not to index) has been updated to disallow indexing of oEmbed media links.

Issue fork drupal-3271222

Comments

Rajab Natshah created an issue. See original summary.

rajab natshah’s picture

Assigned: rajab natshah » Unassigned
Status: Active » Needs review

yogeshmpawar made their first commit to this issue’s fork.

rajab natshah’s picture

Thanks, Yogesh, for the change.
It is the right format.

What do you think of having the following:

Disallow: /media/oembed
Disallow: /*/media/oembed

That is, not only Disallow: /media/oembed, so that multilingual sites are supported too.

rajab natshah’s picture

Changing to use

Disallow: /media/oembed
Disallow: /*/media/oembed
rajab natshah’s picture

Issue tags: +SEO
rajab natshah’s picture

Issue summary: View changes
Issue tags: +Drupal SEO
rajab natshah’s picture

Issue summary: View changes

Cover both single and multilingual sites.

Disallow: /media/oembed
Disallow: /*/media/oembed

And both clean and non-clean URLs:

Disallow: /index.php/media/oembed
Disallow: /index.php/*/media/oembed
rajab natshah’s picture

Title: Include Disallow Oembed media link in the robots.txt file » Include Disallow Oembed media link in the robots.txt file for better Drupal SEO
yogeshmpawar’s picture

@Rajab Natshah - The changes look good. This will cover both scenarios.

rajab natshah’s picture

Title: Include Disallow Oembed media link in the robots.txt file for better Drupal SEO » Include Disallow Oembed media links in the robots.txt file for better Drupal SEO
alaa jwiehan’s picture

Status: Needs review » Reviewed & tested by the community

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs change record, +Needs release note

We need a change record and a release note. See #3123285: Actually exclude user register, login, logout, and password pages from search results in robots.txt (current rules are broken) for a previous example of one for a robots.txt change.

bramdriesen’s picture

Assigned: Unassigned » bramdriesen

I will make a change record and release note.

bramdriesen’s picture

Assigned: bramdriesen » Unassigned
Issue summary: View changes
Status: Needs work » Needs review

Change record created: https://www.drupal.org/node/3313604
Release note added to the issue description.

alexpott’s picture

Status: Needs review » Reviewed & tested by the community

Thanks @BramDriesen, the issue summary changes and the CR look good.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: +9.5.0 release notes, +10.0.0 release notes

Committed and pushed 198b03a0e5 to 10.1.x and 584a723c91 to 10.0.x and ab6f2ee90b to 9.5.x. Thanks!

  • alexpott committed 198b03a on 10.1.x
    Issue #3271222 by Rajab Natshah, yogeshmpawar, BramDriesen: Include...

  • alexpott committed 584a723 on 10.0.x
    Issue #3271222 by Rajab Natshah, yogeshmpawar, BramDriesen: Include...

  • alexpott committed ab6f2ee on 9.5.x
    Issue #3271222 by Rajab Natshah, yogeshmpawar, BramDriesen: Include...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.