Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
I get a lot of Google requests like "/node/comment/reply/13". This URL is invalid (it should be "/comment/reply/13"), yet invoking it returns the main page and a status of 200: OK. This leads to duplicated content, which is penalized by Google and others.
I suggest to instead return a 404: Not Found, this is because the page requested was - well - not found, actually.
To achieve this, lines 2109p in node.module have to be changed from
>
drupal_set_title('');
return node_page_default();
to
drupal_not_found();
return;
Comment | File | Size | Author |
---|---|---|---|
#30 | node_64.patch | 852 bytes | chx |
#24 | node404.5.0.patch_2.txt | 880 bytes | dww |
#22 | node404.5.0.patch_1.txt | 881 bytes | dww |
#15 | node404.5.0.patch | 633 bytes | gerd riesselmann |
#3 | node.module.404.4-7.patch | 333 bytes | gerd riesselmann |
Comments
Comment #1
gerd riesselmann CreditAttribution: gerd riesselmann commentedSilly me, the case of /node needs to be handled. Thats what the code should look like:
Comment #2
owahab CreditAttribution: owahab commentedI had this issue too.
I think the way drupal handles the second argument (node/$arg) is confusing for me, I tried fixing this in the node.module file but it's not an easy job, at least for me.
Dozens of dependancies I can't foresee.
Comment #3
gerd riesselmann CreditAttribution: gerd riesselmann commentedComment #4
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedlooks good, moving to HEAD for more reviews.
Comment #5
m3avrck CreditAttribution: m3avrck commentedWould like to see this backported to 4.7 as well--- once it gets committed.
Code looks good and makes sense to me.
Comment #6
ChrisKennedy CreditAttribution: ChrisKennedy commentedComment #7
m3avrck CreditAttribution: m3avrck commentedThis is a bug, if a page is not found, it should not redirect to another page claiming it still exists :-p
This should go into 5 and be backported to 4.7. However, this patch doesn't apply to HEAD because node_page() is gone now.
Comment #8
m3avrck CreditAttribution: m3avrck commentedIMO, this is indeed a critical bug. It would be trivial to create a spamming solution using this. Easy enough to ruin a sites reputation in a search engine too.
Bumping this to 6 though, this would not be a trivial patch. This would be great to incorporate into the thinking going behind the new menu system proposed for 6 too.
Comment #9
gerd riesselmann CreditAttribution: gerd riesselmann commentedI don't know about 5 and 6, but in the meantime: can somebody just apply the existing patch to the 4.7 branch? ;-)
Comment #10
moshe weitzman CreditAttribution: moshe weitzman commentedi proposed a more generic solution for this at http://drupal.org/node/74347. i think this is a duplicate of that one, but i'll wait for someone else to change status.
Comment #11
gerd riesselmann CreditAttribution: gerd riesselmann commentedYes, your patch handles the same problem, but I doubt it will be incorporated in the foreseeable future, since it proposes a rather big change. At least it won't be incorporated into 4.7.
Since everybody commenting here actually
I'd like to ask again for someone to apply it to this branch - please.
Comment #12
moshe weitzman CreditAttribution: moshe weitzman commentedi think this is a pretty good fix. this has to be applied to HEAD before it can be backported. moving ...
Comment #13
gerd riesselmann CreditAttribution: gerd riesselmann commentedAs m3avrck pointed out in #7, this patch cannot be applied to HEAD, since HEAD menu system works different in many ways. So what's the sense of porting this patch to HEAD, which means to completely rewrite it, just to port it back afterwards?
My point of view is that this patch solves a 4.7-specific critical bug in a 4.7-specific way. If there's are similar unsolved problem in HEAD or 5.0, so it may be. But this cannot be an argument to not fix the original bug!
Comment #14
gerd riesselmann CreditAttribution: gerd riesselmann commentedI've investigated 5.0 and the patch does not apply to this, either, since a URL like http://drupal.org/node/ispamdrupalorg gets routed to node_default_page() without further argument checking. URLs with patter node/id however are mapped to node_page_view().
In 4.7 node_page() does handle both cases and does internal argument checking - and that is where my patch comes in: It modifies the argument checking case statement.
Therefore, for 5 or HEAD a completely different solution needs to be found.
Comment #15
gerd riesselmann CreditAttribution: gerd riesselmann commentedSo, to end moving responsibilities around, here is a proposal to fix this for 5.0. In HEAD however, menu system has changed again and seems to be in the flow, so I second the patch proposed by moshe in #10.
I suggest somebody applies the 4.7-fix, and afterwards we discuss the 5.0 patch ;-). Agree?
Comment #16
m3avrck CreditAttribution: m3avrck commentedI think this might be a better general fix:
http://drupal.org/node/74347
So this might be a duplicate of that as Moshe pointed out above.
Comment #17
moshe weitzman CreditAttribution: moshe weitzman commentedseems that i've made this too complicated. lets fix DRUPAL-5 with ted's patch and then 4.7 with gerd's patch. reassigning accordingly.
Comment #18
BioALIEN CreditAttribution: BioALIEN commentedI agree with Moshe, lets get this applied for 4.7.x and 5.x and then concentrate on D6 maybe using the new menu system. Drupal is widely used for its good SEO features. This patch addresses a major bug with regards to SEO. I've seen Google and Yahoo! bots send out test checks on some of our client sites in the form of google+hash.html so they do carry out 404 tests on websites.
+1 to getting this committed asap.
Comment #19
m3avrck CreditAttribution: m3avrck commentedI agree with the said comments :-)
However, I didn't write a patch yet :-p
Moshe, can you clarify *which* comments have the correct patches to be committed? I'm sure one of the core committers would ask this question, so might as well ask it for them :-p
Comment #20
moshe weitzman CreditAttribution: moshe weitzman commentedoops again. gerd's most recent patches are the ones to go with. one for 5.0 and one for 4.7
Comment #21
dwwgah, hate to do this, but http://drupal.org/files/issues/node404.5.0.patch from #15 contains
drupalnot_found()
. someone should re-roll that.http://drupal.org/files/issues/node.module.404.4-7.patch is indeed RTBC.
oh, and +1 for fixing bugs in older versions, even if HEAD is in flux or bugs aren't present in newer versions. ;)
Comment #22
dwwhere you go. ;)
to repeat, http://drupal.org/files/issues/node.module.404.4-7.patch from #3 is also RTBC for 4.7.x.
cheers,
-derek
Comment #23
dwwalas, whoops. previous patch (and therefore, my latest for 5.x) had syntax errors, too. hehe, guess i was a little hasty with that RTBC. ;)
Comment #24
dwwi couldn't get empty() to play nicely with arg(). this seemed like the simplest approach. another pair of eyes, please. ;)
Comment #25
m3avrck CreditAttribution: m3avrck commentedI'm not clear where this $validateargs comes from -- maybe a note about that in a comment? Should it be called $validate_args for clarity as well? Lowercase "true" should be "TRUE".
Wouldn't !isset(arg(1)) be faster than != as well?
Comment #26
dwwFYI: i was just making a quick pass through the RTBC queue, and came upon this. i don't personally care much, and i have no time to work on it, which is why it's not Assigned to dww. ;) someone who cares will have to pick up the torch here. just wanted to make sure i hadn't created any false expectations...
cheers,
-derek
Comment #27
gerd riesselmann CreditAttribution: gerd riesselmann commentedSince everyone agrees that 4.7-patch is RTBC, I marked this bug accordingly. I suggest to set it to 5.x.dev and status "code needs review" after the 4.7-patch has been applied.
Comment #28
RobRoy CreditAttribution: RobRoy commentedShould go on HEAD first and then be backported. Marking back to review to make sure this applies cleanly to HEAD.
Comment #29
gerd riesselmann CreditAttribution: gerd riesselmann commentedAargh! Please do not start moving responsibilities around, again! There is a solution of a critical bug for 4.7 around for 4 months, and it doesn't get applied because everyones wants it to be ported to higher versions first. This is not good.
Rob, please read a thread before changing status: porting this to 5.0, or HEAD means complete rewriting, backporting therefore obviously means rewriting the rewritten - or applying the actual patch, that everyone already agrees on.
Comment #30
chx CreditAttribution: chx commentedWho is the everyone that agreed this is RTBC? The code is downright ugly. Why the attached patch is not enough?
Comment #31
BioALIEN CreditAttribution: BioALIEN commentedI didn't have enough time to test chx's approach but once again,
Comment #32
gerd riesselmann CreditAttribution: gerd riesselmann commentedOooookay. Let's get things straight.
However, since menu system has changed from 4.7 to 5.0 patches to either version are completely different. This means that talking about "backporting" is downright nonsense, since "porting" in this special case means "rewriting".
Now, since 5.0 and HEAD still need to be discussed but 4.7 is ready, and unfortunately the bug reporting system doesn't allow us to express this situation properly, I proposed in #27 to mark this bug as RTBC for 4.7, wait till it gets committed, and afterwards mark it as 5.0, "code needs review".
Can we agree on this, please? I fear this bug else will never get fixed for 4.7 but endlessly discussed.
Or maybe two bug should be filed, one for 4.7 and one for 5.0?
Comment #33
BioALIEN CreditAttribution: BioALIEN commentedSomeone please commit patch in #3 to 4.7.x branch. Then we'll move on to the 5.x discussions.
Comment #34
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedgerd, that's not the workflow we follow. We follow the sequence HEAD, 5, 4.7 in order to avoid inconsistencies.
Comment #35
Dries CreditAttribution: Dries commentedWhen I try to access 'http://localhost/drupal-cvs/node/comment/reply/13' as per the original bug report, I get an access denied message. Could it be that this is no longer an issue for CVS HEAD?
Update: I also get an SQL error, I'm afraid. Not sure if that's related to the 'access denied' but I'm guessing it might be. We'll want to fix the 'access denied' message in CVS HEAD before we can properly test this. :/
Comment #36
Wesley Tanaka CreditAttribution: Wesley Tanaka commented"Yes Please" (subscribing)
I've found a bunch of hits from googlebot of the form:
/node/?page=&
because the default 'node' page has a pager at the bottom......
Comment #37
Wesley Tanaka CreditAttribution: Wesley Tanaka commentedErr, that was supposed to say:
/node/[some string]?page=[large number]&[maybe other stuff]
This bug appears to be a duplicate of http://drupal.org/node/82764
To clarify the 5.x status:
Comment #38
Wesley Tanaka CreditAttribution: Wesley Tanaka commentedSorry. http://drupal.org/files/issues/node-not-found.2.diff doesn't work on non-numeric node paths.
So the only patch which works for me on 5.1 is http://drupal.org/files/issues/node404.5.0.patch_2.txt
(and to answer the question in #24, you could assign the return from arg() to a variable and test that variable with empty())
Comment #39
matt_paz CreditAttribution: matt_paz commentedsubscribing
Comment #40
Dries CreditAttribution: Dries commentedThis is a bit unwieldy. So, what is considered to be the best patch for CVS HEAD? :)
Comment #41
fractile81 CreditAttribution: fractile81 commentedhttp://drupal.org/files/issues/node404.5.0.patch_2.txt
This patch appears to work for me as well. It appears to handle this problem on my site correctly (5.1, had to apply by hand and move the changes to line 2372). Will keep testing it, of course.
Comment #42
kbahey CreditAttribution: kbahey commentedAs per Dries in #35, I can't reproduce the original issue with today's HEAD.
If you type ?q=node/something, you get an access denied, plus the following errors:
If you type a URL that references a non existent node such as q=comment/reply/123456, you get a clean access denied.
If you type an invalid URL such as q=node/comment/reply/123456, you get the access denied plus the above error.
So, what is the problem we are trying to solve? Doesn't exist anymore?
Comment #43
drummI went ahead and reviewed and committed #30 to Drupal 5 at chx's advice. In the future, I think multiple issues for different versions of Drupal with significantly different patches is okay.
Comment #44
chx CreditAttribution: chx commentedGerhard, #3 is OK and HEAD needs a totally different patch (menu.inc level, I am thinking allowing callback arguments to be FALSE not an array in cases like this).
Comment #45
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedApplied #3 to 4.7.
moving back to HEAD
Comment #46
m3avrck CreditAttribution: m3avrck commentedWhere's the 6.x patch? Fixed in 4.7, 5.x but not 6.x? *doh*
Comment #47
xqus CreditAttribution: xqus commentedAs far as I can see this is not an issue in 6.x.
Comment #48
chx CreditAttribution: chx commentedindeed. if node/dfghjk is not defined that any path beginning with that will come back with 404 thanks to node_load returning FALSE thus translate returning FALSE thus menu_get_item returning FALSE
Comment #49
(not verified) CreditAttribution: commentedComment #50
xsean CreditAttribution: xsean commentedver 6.x got this issue as when the link as below
valid url: http://www.example.com/node/4
invalid url: http//www.example.com/node/4/invalid
where the invalid show the same content as valid url. it show direct to 404:page not found and will cause duplicate content
i search the solution for whole day but still cannot find it...anyone can help here? thanks!
Comment #51
looney_toons CreditAttribution: looney_toons commentedI wonder why this issue is ignored.
On 6.17 I see the following:
example.com/"termalias"/"something-malformed" shows the content of
example.com/"termalias"
As a result we can have several urls pointing Google to the term's page if the url of some node of this term (which is example.com/"termalias"/"date"/"title") is changed and/or some other node gets deleted.
Why does Drupal in this case show the content of some url it has found while it should just redirect to this url?
Comment #52
looney_toons CreditAttribution: looney_toons commentedI will keep bumping this issue till I find the solution.
Comment #53
dww@looney_toons: Please don't do that. It's just annoying and adds noise. Only bump the issue if you have something useful to contribute to it. Thanks.
Comment #54
looney_toons CreditAttribution: looney_toons commentedWith boost module enabled this can also lead to wasting disk space if a lot of malformed URLs are accessed forcing boost to generate static files for these URLs.
Comment #55
Teddd CreditAttribution: Teddd commentedThis is awkwardly from seo point it makes the frontpage looks as duplicated content
also the crawlers gets confused but keep coming back butt for no reason for pages that not exist anymore or never have been.
I know setup a disallow in robot.txt may help, but this not solves it.
Comment #56
grendzy CreditAttribution: grendzy commentedThis is the way the menu router works; some modules actually rely on this behavior via the arg() function. Given that, it probably can't be changed until Drupal 8. Also note this isn't the only way to create duplicate URLs, you can add arbitrary query strings like node/1?foo and the query string will be ignored.
The real solution is to avoid outputting incorrect links in the first place.