I have the Global Redirect and RobotsTxt modules enabled on a multilingual site. Unfortunately this means that www.mysite.com/robots.txt gets redirected to www.mysite.com/en/robots.txt.

While this seems ok in a browser, Google does not follow redirects for robots.txt so it has stopped indexing my site. This seems to have only started recently during an automatic module version update. It seems that something in globalredirect.module is adding a prefix to the robots.txt path.

Is there a way to stop this with the RobotsTxt module? Or do I need to report a bug against Global Redirect so that it never redirects /robots.txt?


Comments

hass’s picture

Project: RobotsTxt » Global Redirect
Version: 7.x-1.1 » 7.x-1.x-dev
Issue summary: View changes
Mnilionic’s picture

I still have this problem. Is there a solution?

Esculap’s picture

dewalt’s picture

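The patch breaks the globalredirect module.

This happens because the conditions listed in #3 conflict with each other. A condition of the form $a != $b || $a != $c makes no sense: whenever $b and $c differ, $a cannot equal both of them, so the condition always returns TRUE. The patch is therefore equivalent to: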

function _globalredirect_is_active($settings) {
  return FALSE;
}
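
To illustrate why the condition collapses, here is a small standalone PHP sketch. The helper name and the sample values are placeholders chosen for illustration, not the ones used in the patch in #3; it only shows that a condition of this shape is TRUE for any $a as long as $b and $c differ.

```php
<?php
// Hypothetical helper mirroring the shape of the condition discussed above.
// $b and $c stand in for the two different values the patch compared against.
function example_condition($a, $b, $c) {
  return $a != $b || $a != $c;
}

$b = '/index.php';
$c = '/robots.txt';

// $a can equal at most one of $b and $c, so at least one inequality holds.
$results = array();
foreach (array('/index.php', '/robots.txt', '/anything-else') as $a) {
  $results[$a] = example_condition($a, $b, $c);
}
var_dump($results);
```

Every entry comes out TRUE, which is why the module ends up disabled for every request.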
dewalt’s picture

@mnilionic

I still have this problem. Is there a solution?

Disable "Language Path Checking" option, or use domain-based or session-based language detection.

awm’s picture

The patch above does break the module. I think the module should be disabled when the request is for robots.txt, not the opposite. In other words, the if statement should use "==" instead of "!=". Patch updated.

awm’s picture

awm’s picture

awm’s picture

FileSize
556 bytes
dewalt’s picture

@awm, but if the previous check is passed:

if ($_SERVER['SCRIPT_NAME'] != $GLOBALS['base_path'] . 'index.php') {
    return FALSE;
}

then $_SERVER['SCRIPT_NAME'] is already known to be index.php, so the check in the patch can never match. I think a possible solution is to check current_path() for 'robots.txt'.
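
A minimal sketch of that idea, assuming Drupal 7, where current_path() returns the internal path without a leading slash. The helper name example_is_robots_txt_request() is hypothetical and not part of Global Redirect, and this is not necessarily what any of the attached patches do. It takes the path as an argument so the comparison can be run outside Drupal.

```php
<?php
/**
 * Returns TRUE when the given internal path is the robots.txt route.
 *
 * Hypothetical helper: inside Drupal 7 it would be called as
 * example_is_robots_txt_request(current_path()).
 */
function example_is_robots_txt_request($path) {
  // Trim slashes defensively, then compare against the exact route.
  return trim($path, '/') === 'robots.txt';
}

// Sketch of how an early return in _globalredirect_is_active() might look
// (assumption, shown as a comment because it needs a Drupal bootstrap):
// if (example_is_robots_txt_request(current_path())) {
//   return FALSE;
// }

var_dump(example_is_robots_txt_request('robots.txt'));
var_dump(example_is_robots_txt_request('node/1'));
```

Unlike the SCRIPT_NAME comparison, this looks at the requested Drupal path, which is what actually distinguishes a robots.txt request once index.php is handling it.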

awm’s picture

@dewalt I understand.
Well, perhaps this issue is not within the scope of globalredirect. More importantly, I don't think it is still true that Google does not follow redirects for robots.txt.
From Google's documentation:

3xx (redirection)
Redirects will generally be followed until a valid result can be found (or a loop is recognized). We will follow a limited number of redirect hops (RFC 1945 for HTTP/1.0 allows up to 5 hops) and then stop and treat it as a 404.

Source:
https://developers.google.com/webmasters/control-crawl-index/docs/robots...

If that's the case, I think this issue can be closed as "won't fix".

JeroenT’s picture

Status: Active » Needs review
FileSize
459 bytes

Created a new patch.

aesuk’s picture

I seem to have the same issue here (7.x-1.6). I'm on a multisite setup, so we have to use the RobotsTxt module, and globalredirect redirects the address to https://www.domain.com/en/robots.txt, which returns a 404 error for us. I presume that is because the robotstxt module does not kick in for the en prefix. But even if it did, that would not help, because according to Google:

Crawlers will not check for robots.txt files in subdirectories.

https://developers.google.com/search/reference/robots_txt#file-location-...

@JeroenT Did you have any success with that patch?

cmseasy’s picture

#13 worked for me

Multisite Drupal, multilanguage, php7

aesuk’s picture

#13 worked for me too, thanks JeroenT

Chris Matthews’s picture

Status: Needs review » Reviewed & tested by the community

The two-year-old patch in #13 to globalredirect.module applied cleanly to the latest 7.x-1.x-dev and has two community reviews, so I am taking the liberty of changing the status to RTBC.

The last submitted patch, 4: global_redirect-robots_txt-2027705-4.patch, failed testing.