We have this archive book, which has become bloated with content that doesn't deserve to be searchable or visible, because it is outdated or bad or whatever.
It would be nice if when someone moved a Book page to the Archive book, it would also become unpublished.
Comments
Comment #1
jhodgdon: Related task (both of these are from the Vancouver docs meeting 2010):
#995322: Unpublish everything in the Archive book
Comment #2
LeeHunter: This is also something that may be needed in the docs management View from time to time (i.e. "show me a list of all the pages that have been recently unpublished").
Comment #3
jhodgdon: RE #2 -- If we do
#995360: Create a doc admin management view
then you should be able to filter to "unpublished" and sort by most recent update, so that should do it.
Comment #4
pwolanin: It would be pretty trivial to code up an admin page, etc., to exclude certain books from the Solr index even if they are still published.
Not sure how we are managing robots.txt, but maybe there is a way to handle that as well?
Comment #5
jhodgdon: But we don't want them accessible or indexed by Google etc. either. We really just want to kill them, but rather than delete automatically, unpublish is a bit less drastic (in case of a mistake).
Comment #6
jhodgdon: Moving project - this needs custom code.
Comment #7
zzolo: Just a thought about this.
1) It would be trivial for someone to move a page they don't like to the Archive, and it may be difficult (even with a few filters in the Docs View) for anyone to notice it is gone until it's too late.
2) There should be some way for registered users to get to this content, as it may have some value in writing new content.
Just some thoughts. My main concern is number 2. I don't have a strong preference so don't let it slow down any progress.
Comment #8
jhodgdon: Good points, zzolo.
Maybe we should reconsider. It's now very easy with the Management view to view a list of all pages in the Archive book that are still published, and then an admin can unpublish them. So maybe we should just add this to our weekly/monthly To Do list instead of automatically unpublishing?
Who has permission to move pages around in book outlines, though? Just anyone?
Comment #9
scor: Unpublishing nodes will increase the link rot on Drupal.org. Redirecting instead of just giving an access denied will surely be more appreciated by search engines and users alike. Many of these pages have been linked not only on Drupal.org itself but also elsewhere on the Web; it's bad web architecture practice to shut them down without any kind of clue as to why they've been taken down. Ideally there would be a 301 (Moved Permanently) redirect, but I understand that a 1:1 linkage won't always be possible. At least there could be a default landing page explaining that the content is outdated; maybe it could be the main documentation page with a drupal_set_message() at the top. How about a node reference field pointing to the new relevant node, so that loading an unpublished node redirects to that node? (Maybe there is already a contrib module for that.)
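The fallback order suggested in this comment could be sketched roughly as follows. This is an illustration in Python, not Drupal code: the `Node` class, the `replacement_nid` field (standing in for the proposed node reference field), `DEFAULT_LANDING_PATH`, and `resolve_request` are all hypothetical names, and the choice of a temporary 302 for the landing-page fallback is an assumption, since the comment only names 301 for the 1:1 case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical path of a landing page explaining that the content is outdated.
DEFAULT_LANDING_PATH = "/documentation"


@dataclass
class Node:
    """Hypothetical stand-in for a handbook page; not Drupal's node object."""
    nid: int
    published: bool
    # Stand-in for the proposed node reference field pointing to newer content.
    replacement_nid: Optional[int] = None


def resolve_request(node: Node) -> Tuple[int, str]:
    """Return (HTTP status, path) for a request to the given page."""
    if node.published:
        # Published pages are served normally.
        return 200, f"/node/{node.nid}"
    if node.replacement_nid is not None:
        # A known 1:1 replacement exists: redirect permanently.
        return 301, f"/node/{node.replacement_nid}"
    # No replacement known: send the visitor to the explanatory landing page
    # (302 is an assumption; the thread does not specify a status here).
    return 302, DEFAULT_LANDING_PATH
```

The point of the ordering is that a visitor following an old link never lands on a bare access-denied page: they get either the replacement page or an explanation.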
Comment #10
jhodgdon: That sounds pretty complicated, scor, and a lot to ask of maintainers to figure out the correct redirect (not to mention that not too many people have permission to create redirects on d.o., so they'd need to file an issue to ask for a redirect to happen).
But the main thing here is that we do not want the content that goes into the Archive to be indexed by d.o's search engine or outside search engines (google et al). It's not current, and is screwing up search results, and that is why we have unpublished what is currently in the archive. I realize it may be inconvenient, but we haven't really heard any complaints about excessive bad links since we did this (we unpublished everything in the archive several weeks back).
Changing the message on a 404 page is a possibility ... or actually these would be 403 (not authorized) pages, if you are a user who cannot see unpublished pages. So we could make that happen.
Comment #11
mlncn: More speech, not censorship ;-) I'll work to improve Term Message to meet the minimum requirements (as covered here), and tools making redirect links like path redirect/redirect can be considered in due time.
Comment #12
jhodgdon: mlncn, the objective is not to censor. The objective is to make doc searches work. More discussion: #1026542: What to do with Drupal 5 documentation? comments #11 and #12. If you would like to maintain an archive of documentation that we've unpublished, we can definitely export it for you, but we don't want it on drupal.org available in regular searches.
We have path_redirect on drupal.org already, by the way.
Comment #13
mlncn: Link rot is bad. Pages people used to be able to find disappearing is worse than some possibly obsolete data.
Why is this page access denied now and where has it gone? http://drupal.org/node/128513
It was the Drupal handbook page “sed - replace text in single or multiple files”, and i referred to it in a Def. Guide to Drupal 7 chapter. Now my editors had to tell me it goes to access denied. :-/ How does 'sed' go out of date? [UPDATE: Found it marked as archived in this issue even though comments were that it needed updating: http://drupal.org/node/1012476#comment-4354732 (and again, archive should not mean unpublished).]
We have control over our own Apache SOLR indexing and can even tell Google and other search engines not to index. We can archive / deprecate but unpublishing is not a true solution to the challenge of curating content.
And if we can't change this on Drupal.org, i will take a dump/feed of all unpublished handbook pages. We're throwing out some gold in pages we are unpublishing *and* making it harder for people to find the newer pages, not easier, compared to headlining obsolete pages with "look over here"-- it's all the old text that makes it possible for people to find these pages, even redirect is not a full answer.
Comment #14
scor: @jhodgdon: it seems your main concern is searchability, i.e. you'd be ok if the pages were still published as long as they are indexed by neither drupal.org search nor Google et al. Is that right?
If that's the case, all we need would be a taxonomy term "deprecated" or "exclude from index" to tag these nodes with and implement hook_apachesolr_node_exclude() in drupalorg, I'd be happy to roll a patch for that. Alternatively, no need for a new taxonomy term, we can just check if the book page is part of the Archive book. As for google et al., we could leverage the same logic to throw in a content="noindex" meta tag in the page.
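A minimal sketch of the rule proposed here, written in Python purely for illustration; the actual change would be a PHP hook implementation in the drupalorg module. The `Node` class, `ARCHIVE_BOOK_ID`, and both function names are hypothetical. Only the rule itself comes from this thread: a page that belongs to the Archive book is skipped by the indexer and carries a noindex tag.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ID standing in for the Archive book; the real value is not
# given in this thread.
ARCHIVE_BOOK_ID = 1000


@dataclass
class Node:
    """Hypothetical stand-in for a book page; not Drupal's node object."""
    nid: int
    book_id: Optional[int]  # ID of the book this page belongs to, if any


def exclude_from_index(node: Node) -> bool:
    """Return True when the page should be skipped by the search indexer."""
    return node.book_id == ARCHIVE_BOOK_ID


def robots_meta_tag(node: Node) -> str:
    """Return a robots noindex tag for archived pages, else an empty string."""
    if exclude_from_index(node):
        return '<meta name="robots" content="noindex">'
    return ""
```

Reusing one predicate for both the indexer and the meta tag keeps the two behaviours from drifting apart, which is the "same logic" idea in the comment above.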
Comment #15
jhodgdon: scor/#14: That sounds workable - yes, being part of the Archive book means we don't want it on Google or Solr. No taxonomy is needed.
Want to make a patch to make that happen?
Comment #16
mgifford: @scor still willing to roll a patch?
If this is still needed, I think your suggestion has consensus.
Comment #17
drumm: We removed the archive book completely: #2744863: Delete Archive book