Problem/Motivation
Search engines are indexing the /media/oembed links on websites.
The oEmbed URLs for remote media then show up in search results.
Following one of those results opens a landing page that contains only the remote media.
To prevent this, the oEmbed media link has to be disallowed in robots.txt:
# Oembed media
Disallow: /media/oembed
Disallow: /en/media/oembed
Disallow: /fr/media/oembed
Disallow: /es/media/oembed
Disallow: /ar/media/oembed
This is a recurring issue that the support team faces when starting the handover process.
Drupal core's robots.txt file does not have the Disallow: /media/oembed rule
and only follows http://www.robotstxt.org/robotstxt.html
Proposed resolution
Add the line "Disallow: /media/oembed" to Drupal core's default robots.txt so that it is always included.
Variations for other languages would be added by the delivery team according to the languages available on the website.
Cover both single-language and multilingual sites:
Disallow: /media/oembed
Disallow: /*/media/oembed
And both clean and non-clean URLs:
Disallow: /index.php/media/oembed
Disallow: /index.php/*/media/oembed
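To illustrate which paths the four proposed rules would cover, here is a minimal Python sketch of a robots.txt path matcher. It assumes the wildcard semantics described in RFC 9309 (rules match from the start of the path, "*" matches any run of characters, a trailing "$" anchors the end); individual crawlers may behave differently. The function names and the sample paths are hypothetical and are not part of Drupal core or of the patch.

```python
import re

# The four rules proposed in this issue.
DISALLOW_RULES = [
    "/media/oembed",
    "/*/media/oembed",
    "/index.php/media/oembed",
    "/index.php/*/media/oembed",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into a compiled regex.

    Assumption: "*" matches any sequence of characters and a trailing
    "$" anchors the end of the path; otherwise the rule is a prefix match.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape literal parts and join them with ".*" where the rule had "*".
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_disallowed(path: str, rules=DISALLOW_RULES) -> bool:
    """Return True if any rule matches the given URL path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for sample in [
        "/media/oembed?url=https%3A//example.com/video",
        "/fr/media/oembed",
        "/index.php/media/oembed",
        "/index.php/ar/media/oembed",
        "/node/1",
    ]:
        print(sample, is_disallowed(sample))
```

With only the first rule, a language-prefixed path such as /fr/media/oembed would not match, which is why the wildcard variant is needed for multilingual sites. Under this interpretation the wildcard rule also happens to match the index.php paths; the literal index.php rules still help with crawlers that do not support wildcards.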
Remaining tasks
- ✅ File an issue
- ✅ Addition/Change/Update/Fix
- ✅ Testing to ensure no regression
- ✅ Automated unit testing coverage
- ✅ Automated functional testing coverage
- ➖ UX/UI designer responsibilities
- ➖ Readability
- ➖ Accessibility
- ✅ Performance
- ✅ Security
- ➖ Documentation
- ✅ Code review by maintainers
- ✅ Full testing and approval
- ➖ Credit contributors
- ➖ Review with the product owner
- ✅ Release notes snippet
- ❌ Release
User interface changes
- N/A
API changes
- N/A
Data model changes
- N/A
Release notes snippet
- Drupal's default robots.txt file (used for informing web crawlers which paths not to index) has been updated to disallow indexing of oEmbed media links.
Issue fork drupal-3271222
Comments
Comment #3
rajab natshah
Comment #4
rajab natshah
Comment #5
rajab natshah
Comment #7
rajab natshah
Thanks, Yogesh, for the change.
It is the right format.
What do you think of having the following, and not only
Disallow: /media/oembed, to support multilingual sites too?
Comment #8
rajab natshah
Comment #9
rajab natshah
Changing to use
Comment #10
rajab natshah
Comment #11
rajab natshah
Comment #12
rajab natshah
Comment #13
rajab natshah
Cover both single-language and multilingual sites.
And both clean and non-clean URLs.
Comment #14
rajab natshah
Comment #15
yogeshmpawar
@Rajab Natshah - The changes look good. This will cover both scenarios.
Comment #16
rajab natshah
Comment #17
rajab natshah
Comment #18
alaa jwiehan commented
Comment #20
alexpott
We need a change record and a release note. See #3123285: Actually exclude user register, login, logout, and password pages from search results in robots.txt (current rules are broken) for a previous example of one for a robots.txt change.
Comment #21
bramdriesen
I will make a change record and release note.
Comment #22
bramdriesen
Change record created: https://www.drupal.org/node/3313604
Release note added to the issue description.
Comment #23
bramdriesen
Comment #24
bramdriesen
Comment #25
alexpott
Thanks @BramDriesen, the issue summary changes and the CR look good.
Comment #26
alexpott
Committed and pushed 198b03a0e5 to 10.1.x and 584a723c91 to 10.0.x and ab6f2ee90b to 9.5.x. Thanks!