Hi
We use Google Custom Search Elements for our site so the accuracy of Google's index is very important. We have found that, when we unpublish content (which we do as part of our versioning, worklow etc), the links aren't removed from Google's index. On investigation, I realised that the redirect 403 to login gives a '302 temporararily moved' status code. This means that Google won't remove it from its search index. I guess this is expected behaviour but it is really bad for us as it means that out of date content remains in google's search index so is shown to users when they search our site, even within the site. Is anyone else experiencing this? Does anyone have any suggestions?
Thanks
Chris
| Comment | File | Size | Author |
|---|---|---|---|
| #33 | 1887930-fu-33.patch | 458 bytes | pwolanin |
| #31 | 1887930-6.x-31.patch | 4.11 KB | pwolanin |
| #28 | 1887930-7.x-27.patch | 4.69 KB | pwolanin |
| #26 | alternate_1887930_needs_work.patch | 2.59 KB | greggles |
| #25 | 1887930-7.x-25.patch | 4.25 KB | pwolanin |
Comments
Comment #1
deekayen commentedAre you proposing a change?
Comment #2
macman commentedHi
Sorry for the delay in getting back to you. It seems this module is working as designed. I just wanted to share the issues that it causes us with our setup and hopefully stimulate some discussion on how to solve this.
Cheers
Comment #3
michalmajzlik commentedSolved this by modifying one line on r4032login.module, where is hardcoded "302", so I changed it to "301".
Comment #4
michalmajzlik commentedHere is a patch.
Comment #5
deekayen commentedSince I don't entirely agree with you, how about changing the patch to make the redirection code configurable? Then admins could pick 301, 302, or 307.
Comment #6
michalmajzlik commentedGood idea. Here is updated patch.
Comment #7
michalmajzlik commentedComment #8
greggles@deekayen can you clarify why you think 302 is the right error here? It seems that Google's interpretation of 302 results in clearly undesirable behavior in this scenario.
Comment #9
deekayen commentedI don't feel strongly enough to block this any longer. Committed #4. I wasn't a fan of #6's ability to insert any random text into the HTTP header.
Committed to 7.x-1.x, 6.x-1.x, and 5.x-1.x.
Comment #11
pwolanin commentedI think this change was a mistake. Yes, for unpublished content you'd want a 301 perhaps. However, if the goal is to e.g. help an admin to log in if they got logged, you don't want a 301 which may be cached in their browser.
Comment #12
deekayen commentedI would entertain inserting some conditional checks to do alternations between 301 and 302.
Comment #13
pwolanin commentedHere's a 7.x version that allows you to set a default code and match paths for the other code.
Comment #14
pwolanin commentedoops, that has a typo in the code comment. I'll fix after rolling the 6.
Comment #15
pwolanin commentedhere's the corresponding 6.x patch
Comment #16
deekayen commentedSince we're injecting a user-defined HTTP code, is there any point in adding extra #validate to the settings form to make sure a valid 301 or 302 setting was actually submitted vs someone firebugging the form?
Comment #17
pwolanin commented@deekayen - normally the Drupal Form API only allows you to post the options defined in the form. If you modify the options Drupal gives this error message:
"An illegal choice has been detected. Please contact the site administrator."
Comment #18
pwolanin commentedHere's the 7.x patch with typo fixed.
Comment #19
pwolanin commentedAnd with support for the Variable module.
Comment #20
pwolanin commentedupdating title. Since this wasn't released yet, moving back to feature.
Comment #21
pwolanin commentedcommitted to 7.x
Comment #22
pwolanin commentedand 6.x
Comment #23
gregglesThis fix lacks real documentation of when to use which code or why one might be better than the other for a particular path. If we're going to give people this kind of interface I think we owe it to them to explain when they would want 302 vs. 301. I think it would be ideal to add some documentation before a releases explaining which one people should use and the conditions that they might want the other.
Mostly this makes me think the underlying issues are not understood well enough and we should explore the problem and potential solutions some more.
I think we need a bigger solution (that might be bigger than this issue). What about returning the 403 and putting the login form inside the page instead of doing a redirect. Is there a reason the module doesn't do that?
Comment #24
pwolanin commentedI agree I'm still not sure this is the right solution.
Maybe we should change this so instead of letting you have pages with a different redirect code, we instead skip the redirect and so serve the 403.
That should be effective in removing e.g. outdated content from search engines. We could have a default node/* pattern
Comment #25
pwolanin commentedthis changes it so we skip the redirect on match, and by default don't redirect any node pages.
Comment #26
gregglesHow about this?
It keeps the 403 header so that search engines can do the right thing.
Comment #27
deekayen commentedI'm alright with keeping 403.
Comment #28
pwolanin commentedCombines the 2 approaches
Comment #29
gregglesThis works in testing of a few scenarios.
Comment #30
pwolanin commentedOk, thanks - since this provides everything the OP wanted, and also leaves the default behavior unchanged w.r.t. the prior release, let's include it.
Comment #31
pwolanin commentedcommitted to 7.cx, here's the corresponding patch for 6.x
Comment #32
pwolanin commentedtested locally and committed.
Comment #33
pwolanin commentedfollow-up to delete variables in uninstall
Comment #34
pwolanin commentedcommitted to 6 and 7