We have this archive book, which has become bloated with content that doesn't deserve to be searchable or visible, because it is outdated or bad or whatever.

It would be nice if when someone moved a Book page to the Archive book, it would also become unpublished.

Comments

jhodgdon’s picture

Related task (both of these are from the Vancouver docs meeting 2010):
#995322: Unpublish everything in the Archive book

LeeHunter’s picture

This is also something that may be needed in the docs management View from time to time (i.e. "show me a list of all the pages that have been recently unpublished").

jhodgdon’s picture

RE #2 -- If we do
#995360: Create a doc admin management view
then you should be able to filter to "unpublished" and sort by most recent update, so that should do it.

pwolanin’s picture

It would be pretty trivial to code up an admin page, etc., to exclude certain books from the Solr index even if they are still published.

Not sure how we are managing robots.txt, but maybe there is a way to handle that as well?

jhodgdon’s picture

But we don't want them accessible or indexed by Google etc. either. We really just want to kill them, but rather than delete automatically, unpublish is a bit less drastic (in case of a mistake).

jhodgdon’s picture

Project: Drupal.org infrastructure » Drupal.org customizations
Component: Drupal.org module » Code

Moving project - this is custom code needed

zzolo’s picture

Just a thought about this.

1) It would be trivial for someone to move a page they don't like to the Archive, and it may be difficult (even with a few filters in the Docs View) for anyone to notice it is gone until it's too late.
2) There should be some way for registered users to get to this content, as it may have some value in writing new content.

Just some thoughts. My main concern is number 2. I don't have a strong preference so don't let it slow down any progress.

jhodgdon’s picture

Version: » 6.x-2.x-dev

Good points zzolo.

Maybe we should reconsider. It's now very easy with the Management view to view a list of all pages in the Archive book that are still published, and then an admin can unpublish them. So maybe we should just add this to our weekly/monthly To Do list instead of automatically unpublishing?

Who has permission to move pages around in book outlines, though? Just anyone?

scor’s picture

Unpublishing nodes will increase link rot on Drupal.org. Redirecting instead of just giving an access denied will surely be more appreciated by search engines and users alike. Many of these pages have been linked not only on Drupal.org itself but also elsewhere on the Web, and it's bad web architecture practice to shut them down without any clue as to why they've been taken down. Ideally there would be a 301 (Moved Permanently) redirect, but I understand that a 1:1 mapping won't always be possible. At the least, there could be a default landing page explaining that the content is outdated; maybe it could be the main documentation page with a drupal_set_message() at the top. How about a node reference field pointing to the new relevant node, so that loading an unpublished node redirects to that node? (Maybe there is already a contrib module for that.)
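
To make the redirect idea concrete, here is a minimal sketch of the decision logic, written in Python purely for illustration (on Drupal.org it would live in a module, not in Python). The `Node` class, the `replacement_nid` field, the `ARCHIVE_LANDING_PATH` value, and the choice of a 302 for the fallback are all assumptions, not anything that exists on the site.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical path of a landing page explaining that the content was archived.
ARCHIVE_LANDING_PATH = "/documentation"


@dataclass
class Node:
    nid: int
    published: bool = True
    # Hypothetical node reference pointing at the page that replaces this one.
    replacement_nid: Optional[int] = None


def resolve_request(node: Node) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a request to this node."""
    if node.published:
        # Published pages are served normally.
        return (200, None)
    if node.replacement_nid is not None:
        # A known replacement exists: redirect permanently to it.
        return (301, f"/node/{node.replacement_nid}")
    # No 1:1 replacement: fall back to the landing page (status is an assumption).
    return (302, ARCHIVE_LANDING_PATH)
```

The point of the sketch is only that an unpublished page with a known replacement never has to end in a bare access denied.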

jhodgdon’s picture

That sounds pretty complicated scor, and a lot to ask of maintainers to figure out the correct redirect (not to mention that not too many people have permission to create redirects on d.o., so they'd need to file an issue to ask for a redirect to happen).

But the main thing here is that we do not want the content that goes into the Archive to be indexed by d.o's search engine or outside search engines (google et al). It's not current, and is screwing up search results, and that is why we have unpublished what is currently in the archive. I realize it may be inconvenient, but we haven't really heard any complaints about excessive bad links since we did this (we unpublished everything in the archive several weeks back).

Changing the message on a 404 page is a possibility ... or actually these would be 403 (not authorized) pages, if you are a user who cannot see unpublished pages. So we could make that happen.

mlncn’s picture

More speech, not censorship ;-) I'll work to improve Term Message to meet the minimum requirements (as covered here), and tools making redirect links like path redirect/redirect can be considered in due time.

jhodgdon’s picture

mlncn: The objective is not to censor. The objective is to make doc searches work. More discussion: #1026542: What to do with Drupal 5 documentation? comments #11 and #12. If you would like to maintain an archive of documentation that we've unpublished, we can definitely export it for you, but we don't want it on drupal.org available in regular searches.

We have path_redirect on drupal.org already, by the way.

mlncn’s picture

Link rot is bad. Pages people used to be able to find disappearing is worse than some possibly obsolete data.

Why is this page access denied now and where has it gone? http://drupal.org/node/128513
It was the Drupal handbook page “sed - replace text in single or multiple files”, and I referred to it in a Def. Guide to Drupal 7 chapter. Now my editors had to tell me it goes to access denied. :-/ How does 'sed' go out of date? [UPDATE: Found it marked as archived in this issue even though comments were that it needed updating: http://drupal.org/node/1012476#comment-4354732 (and again, archive should not mean unpublished).]

We have control over our own Apache SOLR indexing and can even tell Google and other search engines not to index. We can archive / deprecate but unpublishing is not a true solution to the challenge of curating content.

And if we can't change this on Drupal.org, I will take a dump/feed of all unpublished handbook pages. We're throwing out some gold in the pages we are unpublishing *and* making it harder, not easier, for people to find the newer pages, compared to headlining obsolete pages with "look over here". It's all the old text that makes it possible for people to find these pages; even a redirect is not a full answer.

scor’s picture

@jhodgdon: it seems your main concern is searchability, i.e. you'd be OK if the pages were still published as long as they are not indexed by either drupal.org search or Google et al. Is that right?

If that's the case, all we would need is a taxonomy term such as "deprecated" or "exclude from index" to tag these nodes with, plus an implementation of hook_apachesolr_node_exclude() in drupalorg. I'd be happy to roll a patch for that. Alternatively, with no need for a new taxonomy term, we can just check whether the book page is part of the Archive book. As for Google et al., we could leverage the same logic to add a content="noindex" meta tag to the page.
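
Roughly, the rule would look like the sketch below. It is Python only to show the logic; the real change would be a hook implementation in the drupalorg module. The `Node` class, the `ARCHIVE_BOOK_ID` value, and the function names are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ID of the Archive book's top-level page (assumption).
ARCHIVE_BOOK_ID = 1000


@dataclass
class Node:
    nid: int
    # ID of the book this page belongs to, or None if it is not in a book.
    book_id: Optional[int] = None


def in_archive(node: Node) -> bool:
    """True when the page is part of the Archive book."""
    return node.book_id == ARCHIVE_BOOK_ID


def exclude_from_search_index(node: Node) -> bool:
    """Return True when the site search indexer should skip this node."""
    return in_archive(node)


def robots_meta_tag(node: Node) -> str:
    """Return the meta tag to add to the page head, or '' if none is needed."""
    if in_archive(node):
        return '<meta name="robots" content="noindex" />'
    return ""
```

Both checks share `in_archive()`, so moving a page into the Archive book would be the only action an editor has to take.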

jhodgdon’s picture

scor/#14: That sounds workable - yes, being part of the Archive book means we don't want it on Google or Solr. No taxonomy is needed.

Want to make a patch to make that happen?

mgifford’s picture

Version: 6.x-2.x-dev » 7.x-3.x-dev
Issue summary: View changes

@scor still willing to roll a patch?

If this is still needed, I think your suggestion has consensus.

drumm’s picture

Status: Active » Closed (won't fix)

We removed the archive book completely: #2744863: Delete Archive book