There are many handbook pages, code snippets, forum posts, issues and so on that are relevant only to Drupal 4.7 or earlier and don't apply to the two currently supported versions of Drupal (Drupal 5 and 6). This situation will only get worse over the next year or two: once Drupal 5 is EOL'ed, its content will become less and less relevant as the userbase for that version shrinks away to nothing.

This body of information, while still useful for a small minority of people who are unable to upgrade their installations for various reasons, is nearly useless for the vast majority of Drupal's users. More importantly, it comes up in searches and sharply lowers the signal-to-noise ratio of search results for the average user on drupal.org.

As I stated above, this situation is only going to become worse once D5 is EOL'ed. Then we will have 200k+ nodes of information which are irrelevant to the average Drupal user.

I would like to see some suggestions of ways to handle this situation.

I would like to put forward the idea of allowing users of the site to flag nodes as 'obsolete', and then either ranking those nodes lower in the default search or removing them from it unless a 'remove obsoletes' box is unchecked in the advanced search.

I don't know if that is possible with the current architecture, or feasible for the webmaster team. Also, there is abuse potential. However, it's the best idea I've had so far.

Perhaps it would be possible to restrict the search to a given date range. The average D6 user probably isn't concerned with any nodes over 18 months old, or thereabouts.

Thoughts?

Comments

Damien Tournoud’s picture

It would be relatively easy to boost search results according to the Core compatibility vocabulary. I suggest we try the following boost factors:

7.x    13
6.x    21
5.x    8
4.7.x  0.2
4.6.x  0.1
4.5.x  0
4.4.x  0
4.3.x  0
4.2.x  0
4.1.x  0
4.0.x  0
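
To make the effect of these factors concrete, here is a minimal sketch in Python. It is not drupal.org's search code: the function names, the tuple layout, and the choice to leave untagged content at a neutral boost of 1.0 are all assumptions made for illustration. Only the factors themselves come from the list above.

```python
# Boost factors copied from the list above, keyed by Core compatibility term.
BOOST_BY_VERSION = {
    "7.x": 13, "6.x": 21, "5.x": 8,
    "4.7.x": 0.2, "4.6.x": 0.1,
    "4.5.x": 0, "4.4.x": 0, "4.3.x": 0,
    "4.2.x": 0, "4.1.x": 0, "4.0.x": 0,
}

# Assumption: content with no (or an unknown) version term keeps its raw score.
DEFAULT_BOOST = 1.0


def boosted_score(base_score, versions):
    """Scale a relevance score by the largest boost among the node's version terms."""
    boosts = [BOOST_BY_VERSION.get(v, DEFAULT_BOOST) for v in versions]
    return base_score * (max(boosts) if boosts else DEFAULT_BOOST)


def rank(results):
    """Order (title, base_score, versions) tuples by boosted score, best first."""
    return sorted(results, key=lambda r: boosted_score(r[1], r[2]), reverse=True)
```

With these numbers, a weakly matching 6.x page outranks a strongly matching 4.7.x page, and anything tagged only 4.5.x or older drops to a score of zero. How untagged content should be treated is left open here.
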
Gábor Hojtsy’s picture

I think the real problem is that most content is not tagged with the version (properly)?

Damien Tournoud’s picture

In that case, there is nothing useful we can really do. Having to tag content with "obsolete" is not easier than just tagging for the correct version(s).

brianV’s picture

Leaving it as something filtered by a taxonomy term, whether the existing 'version' term or a new term, won't work, since only webmasters and original authors could apply the terms to existing content.

This would have to be some kind of flag, that could be set by any authenticated users without needing access to edit the node. I know there is abuse potential, but we need something that the community can do by itself. 300k+ nodes is a lot to try to organize with a team the size of the webmaster team here.

What if we set up a flag of some sort that had to be set by three different users before a node is considered obsolete? That would make it a little tougher for a single user to abuse.
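
A rough sketch of that threshold idea in Python, purely to illustrate the logic. The class and method names are made up, and nothing here reflects an existing drupal.org module. Flags are stored as a set of user ids per node, so repeat flags from the same account don't count twice.

```python
from collections import defaultdict

OBSOLETE_THRESHOLD = 3  # the "three different users" suggested above


class ObsoleteFlags:
    """Hypothetical store of who has flagged which node as obsolete."""

    def __init__(self, threshold=OBSOLETE_THRESHOLD):
        self.threshold = threshold
        self._flaggers = defaultdict(set)  # node id -> set of user ids

    def flag(self, nid, uid):
        # A set ignores duplicates, so one user can only ever count once.
        self._flaggers[nid].add(uid)

    def unflag(self, nid, uid):
        self._flaggers[nid].discard(uid)

    def is_obsolete(self, nid):
        # Obsolete only once enough distinct users agree.
        return len(self._flaggers.get(nid, ())) >= self.threshold
```

Usage would be along the lines of calling `flag(nid, uid)` when a user clicks the flag, and having search skip or demote any node for which `is_obsolete(nid)` is true.
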

add1sun’s picture

Just FYI, we already have a vocab for book pages for status, that includes an "outdated" term. SOP for completely obsolete pages is to flag them with the "outdated" term, prefix the node title with ARCHIVE:, and move them to the Archive book. There has been a lot of discussion about what to do with ancient stuff and we don't really have a good solution. We stopped deleting them (which we used to do) and started moving them to the Archive book last year because people were concerned about link rot. I'm certainly open to a real solution, since I think the current one is a hack at best.

Note that if it is simply an old page that could still be applicable in newer versions, it should not be archived, but tagged with the "Needs updating" term so that we can update the content to the newest versions of Drupal, rather than losing them altogether. The handbook policy pretty much follows the larger Drupal support policy in that we do not maintain or preserve docs for unsupported versions of Drupal. The only exception that we have kept to date is the upgrade path docs.

Jennifer_M’s picture

I'm interested in this. I often find myself searching on d.o for something and having to wade through lots of 4.7, 5.x, 6.x etc. results before finding what I need, which is currently for D7.

I find it a recurring frustration that the "Refine your search" option doesn't refine it in any way that's actually useful for this! I don't mind if the answer comes via a forum post or documentation or somewhere under "modules", I just want it to be relevant to the software I'm actually using!

=

Sometimes I switch to ordering by date just to get rid of some of the old stuff - but ordering by date currently means not ordering by relevance, so then I get lots with only a very tenuous connection to my search.

An interim fix would be to be able to limit by date and then order by relevance within that. That wouldn't require any extra tagging. So yeah, definitely a +1 for date range limits.

Dating doesn't really solve it, though. For one thing, a thread can start several years earlier and have its most recent comment today (or, for a module page, be edited lots of times since its origin many years ago). So the date isn't necessarily all that helpful even when you know it.
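
For what it's worth, the "limit by date, then order by relevance" idea can be sketched in a few lines of Python. This is only an illustration: the `Result` fields and function name are invented, and it deliberately leaves open which date (creation or latest change) the filter should use, which is exactly the weakness described above.

```python
from collections import namedtuple
from datetime import date

# Hypothetical search hit; "changed" could be the creation date or the date of
# the latest edit/comment, depending on which one the site chooses to index.
Result = namedtuple("Result", "title relevance changed")


def search_within_dates(results, earliest, latest=None):
    """Keep only hits inside the date range, then order them by relevance."""
    latest = latest or date.max
    in_range = [r for r in results if earliest <= r.changed <= latest]
    return sorted(in_range, key=lambda r: r.relevance, reverse=True)
```

The point is just that filtering and ordering are separate steps, so narrowing by date doesn't have to cost you the relevance ordering.
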

=

As for tagging, I would vote for version tagging rather than "obsolete" tagging.

a) Neither D6 nor D7 is obsolete at this moment, but someone might easily want only one of them and not the other. (That's another reason why dating isn't an ideal solution. There's always going to be a long phase when many of the posts over any given recent time-span are based around D(x) and many are D(x+1). Presumably at some point in the migration it's 50% each :-) )

b) And ideally I think you'd want people to tag as they write, not have to come by later specially just to tag something obsolete.

=

A few other thoughts:

* The same thread can contain discussion of more than one version. (See for instance http://drupal.org/node/37767 - picked only because I was reading it earlier today: original post is a code snippet for D6, but a commenter provides a D7 adaptation and much of the ensuing thread is D7.) Likewise a module can have D5 and D6 versions or whatever. So multiple tags per page would be essential.

* Some help could potentially be automated. E.g. software could do initial unconfirmed tags for all the threads just by searching on the text, though you'd probably want to distinguish between "computer-guessed" and "human-confirmed". And on an ongoing basis, e.g. if a thread wasn't already tagged for "D7" but then someone used the expressions "D7" or "Drupal 7" in a comment or edit, the software could intervene to ask "Is this relevant to D7?", or tentatively set the flag to that effect.

But there's no getting away from the need for human input, and the resulting challenges of scaling.

* Some threads might remain un-human-confirmed (or, if no initial automated tagging, untagged) a long time. Depending on how that was done, you might want an "untagged" option in search (same as eBay has a "not specified" tickbox in some of its tickbox categories, and you can choose to include such results or not). In that case, you might want to be able to explicitly say "Not D7" - to distinguish "Not D7" from "Not yet tagged as D7 but might in fact be D7".
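
To make the automated-guess and "Not D7" ideas a bit more concrete, here is a small Python sketch. Everything in it is hypothetical: the regular expression is a crude heuristic, and the state names and functions are invented for illustration rather than taken from any existing module.

```python
import re
from enum import Enum

# Heuristic only: matches mentions such as "D7", "Drupal 7", "Drupal 4.7", "D6.x".
VERSION_PATTERN = re.compile(
    r"\b(?:Drupal\s*|D)(\d+(?:\.\d+)?)(?:\.x)?\b", re.IGNORECASE
)


class TagState(Enum):
    GUESSED = "computer-guessed"
    CONFIRMED = "human-confirmed"
    REJECTED = "explicitly not this version"  # the "Not D7" case


def normalise(raw):
    """Keep the minor number only for the 4.x series ("4.7"), else just the major."""
    major, _, minor = raw.partition(".")
    return f"{major}.{minor}" if major == "4" and minor else major


def guess_versions(text):
    """Return the set of Drupal versions that the text appears to mention."""
    return {normalise(m.group(1)) for m in VERSION_PATTERN.finditer(text)}


def initial_tags(text):
    """Unconfirmed tags produced by software, awaiting human confirmation."""
    return {version: TagState.GUESSED for version in guess_versions(text)}


def matches_filter(tags, version, include_untagged=True):
    """Decide whether a page shows up in a search limited to one version."""
    state = tags.get(version)
    if state is TagState.REJECTED:
        return False             # someone explicitly said "Not D7"
    if state is None:
        return include_untagged  # the "not specified" tickbox
    return True                  # guessed or confirmed
```

A page with no entry for a version is "not yet tagged", which stays distinct from `REJECTED`, so a searcher can choose whether to include the unknowns.
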

=

If a version tagging system did exist, I for one would be very happy to tag old posts as I read them. There's quite a high degree of self-interest in that exercise, because every time I tag a thread as "Not D7", it saves me having to ever look at it again :-)

=

I agree that abuse potential is something to be thought about. Wrong tagging of this nature could hide useful pages, which is arguably more serious than most spam and obnoxiousness.

Therefore I'd suggest that tagging should have a separate permission, or more than one depending on the category of page, and that the system kept a record of who had set each tag. Then if someone left a trail of misleading tags they could just have the permission revoked.

By "more than one depending on category of page", I mean e.g. Documentation team could be the only ones to have permission to tag official documentation, whereas ordinary posters would be able to tag their own, and either nearly-all-users or a subset of users could have permission to tag other people's forum threads.

E.g. optionally, the permission to tag other people's forum threads could be auto-granted only after the person has made $minimum_number of other comments & posts without being picked up for nuisance behaviour.
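
Roughly what I have in mind, as a Python sketch. The names, the node type strings, and the value of `MINIMUM_POSTS` are all placeholders of mine, not anything that exists on drupal.org.

```python
from collections import namedtuple

MINIMUM_POSTS = 10  # stands in for $minimum_number above; the value is arbitrary

# One audit-trail entry: which user set which version tag on which node.
TagRecord = namedtuple("TagRecord", "nid version uid")


class TagLog:
    """Hypothetical permission check plus a record of who set each tag."""

    def __init__(self):
        self.records = []
        self.revoked = set()

    def can_tag(self, uid, post_count, node_author, node_type, is_docs_team=False):
        if uid in self.revoked:
            return False                  # permission withdrawn after misuse
        if node_type == "documentation":
            return is_docs_team           # official docs: Documentation team only
        if uid == node_author:
            return True                   # posters may tag their own content
        return post_count >= MINIMUM_POSTS  # others need a track record first

    def add(self, nid, version, uid):
        self.records.append(TagRecord(nid, version, uid))

    def revoke(self, uid):
        """Withdraw the permission and return that user's tags for review."""
        self.revoked.add(uid)
        return [r for r in self.records if r.uid == uid]
```

Because every tag carries the uid that set it, cleaning up after a bad actor is a lookup rather than a hunt.
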

=

Well those are some relevant thoughts off the top of my head... if there's more discussion of this elsewhere which I'm inadvertently duplicating here, feel free to point me at it :-)

lisarex’s picture

Hi Jennifer_M, the search results problem is fairly well known, and one fix is #1101962: Add date filtering to search.

When you say "something" are you looking for your answer, regardless of whether it's found in documentation, forum posts or issue queues?

tvn’s picture

Status: Active » Closed (won't fix)

Closing old issues. Please re-open if needed.

brianV’s picture

Status: Closed (won't fix) » Active

This problem hasn't been solved, and in fact it's *worse* now than it was when originally reported.

Old D4.7 content still comes up in searches when I am looking for D7 info.

drumm’s picture

Title: Lots of obsolete data coming up in searches. » Lots of obsolete documentation coming up in searches
Project: Drupal.org site moderators » Drupal.org customizations
Version: » 7.x-3.x-dev
Component: Site organization » Code
Status: Active » Postponed (maintainer needs more info)
Issue tags: +drupal.org search
Parent issue: » #658048: Users should get an expected result when using search

Biasing on version, in a way that we don't forget to update when Drupal 9 comes out, would help a lot. Book pages also have page statuses.

Lots of specific examples would help immensely. I need:

  • Search keywords
  • Great result(s) that should show
  • Horrible result(s) that do show
  • (And okay results that are worth tracking)

Ideally, a patch to http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php, or something I can easily add in.
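
For anyone collecting examples, the four items above could be captured in a structure like the following Python sketch. This is not the format of site-search-test.php, which is not shown in this issue. The field names, the placeholder paths, and the `evaluate` helper are assumptions meant only to show what a usable report looks like.

```python
# Placeholder test case; replace the values with real keywords and node paths.
EXAMPLE_CASE = {
    "keywords": "placeholder keywords",
    "great": ["node/GREAT"],        # results that should show
    "horrible": ["node/HORRIBLE"],  # results that do show but should not
    "okay": ["node/OKAY"],          # results worth tracking
}


def evaluate(case, ranked_paths, top_n=10):
    """Compare a ranked list of result paths against one test case."""
    top = ranked_paths[:top_n]
    return {
        # Great results that failed to reach the first page.
        "missing_great": [p for p in case["great"] if p not in top],
        # Horrible results that did reach the first page.
        "shown_horrible": [p for p in case["horrible"] if p in top],
        # 1-based position of each okay result, if it appears at all.
        "okay_positions": {
            p: ranked_paths.index(p) + 1
            for p in case["okay"]
            if p in ranked_paths
        },
    }
```

An empty `missing_great` and `shown_horrible` would mean the search behaves as hoped for that set of keywords.
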