Postponed (maintainer needs more info)
Project:
XML sitemap
Version:
7.x-2.x-dev
Component:
Other
Priority:
Major
Category:
Support request
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
14 May 2014 at 02:42 UTC
Updated:
18 Sep 2024 at 12:21 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
bhushan.nagaonkar commentedI am also facing exactly the same issue.
Will subscribe.
Comment #2
martini9011 commentedRan into this issue as well on the 7.x-2.2 branch. I created a patch for both versions (See links below).
I added a property condition to the fetch query in xmlsitemap.generate.inc.
7.x-2.x: https://www.drupal.org/files/issues/2266933-exclude-unpublished-nodes.patch
7.x-2.2: https://www.drupal.org/files/issues/2266933-exclude-unpublished-nodes-2-...
Let me know if this fixes the issue for you.
Comment #3
chegor commentedVery strange error, I tested 7.x-2.2 on the site with 10.000 nodes. And all is ok, unpublished node disappears from the site map after the launch of cron.
Comment #4
patrickwatzeels commentedI have the same error. However I also include Taxonomy pages to my sitemap which will occur in an error when using the patches given in #2. Further, this will only work when you use the 'rebuild sitemap' button in the admin panel. When the cron job runs the unpublished nodes are still visible in the xmlsitemap.
You don't want to rebuild your sitemap every time when you change the publication status of a node.
I believe there is a more pragmatic solution in a new patch which 'uses' the access flag in the 'xmlsitemap' table. When a node status changes from published to unpublished it will set the 'access' flag to 0. This flag is normally used to determine whether the anonymous user has access to a page or not but has a default fallback of 1.
It solved the issue in my site and I am also able to add other entities to my sitemap besides nodes.
Comment #5
patrickwatzeels commentedI will have to recall my previous comment and revoke the patch. With the patch in #4 it will not work properly when you use the 'rebuild sitemap' button in the admin.
I have updated the patch in commit #2 and added the changes in this patch combined with the changes in #4.
This solves the creation of the sitemap in both cases. Both the Cron job and the 'rebuild sitemap'
Comment #6
martini9011 commentedConfirming that #5 works, and is probably a better solution since it prevents errors when handling entities that are not nodes (don't have a status column).
Comment #7
bc commentedI ran into a bug with the posted patch where $node->status is a string, and the strict test === 0 didn't work... interdiff & new patch here.
Comment #9
bc commentedoops bad patch, i created it against the wrong repository... re-created so we can test :)
Comment #10
bc commentedI ended up finding a better way of altering the inclusion of certain entities in the sitemap by using a... surprise... hook_alter :)
Rather than patching xmlsitemap (whose functionality works fine on a drupal site without weird user_access hacks), you can do something like this:
Comment #11
Nick Robillard commented#9 works as expected.
Comment #12
dave reidThis needs a test to prove the current behavior is broken, which I'm not convinced it is. Is some other contrib module not implementing hook_node_access() properly?
Comment #13
bc commentedDave, I agree that xmlsitemap seems to work as intended.
The codebase I was dealing with uses view_unpublished, but I wasn't aware of its messing with node access.
Would it be beneficial to include a more detailed example in xmlsitemap.api.php that shows the full extent of what you can change in
hook_xmlsitemap_link_alter's&$link?I just wrote a quick test to explicitly verify things work as intended (although this is implicitly covered in the existing testNodeSettings() test.
Comment #14
dimetry commentedThere is incompatibility between xmlsitemap and nodeaccess projects.
Patch from #9 helps to hide unpublished nodes from sitemap.
Comment #15
tshorock commentedWe've been having the same issue as noted above (unpublished content showing sitemaps) in certain content. Other node access modules are granting permissions.
Rather than patching this module, we've added another ultra-tiny module just to catch this particular case.
Comment #16
tbpixel commentedHi bc,
I've incorporated your test case into a new patch that I just wanted to submit for future users and potentially making it easier to incorporate the patch into the xmlsitemap module.
Comment #17
bwaindwain commentedWe're using view_unpublished and some other modules which implement hook_node_grants(). I found we needed this as well as https://www.drupal.org/project/xmlsitemap/issues/690120 to fix problems with new pages being added to sitemap.xml.
Comment #18
norman.lolComment #19
darrell_ulm commentedYes seeing this issue and this looks like the patch to fix.
Comment #20
jtsnow commentedThis should be fixed in the 7.x-2.4 release of XML Sitemap.
See https://www.drupal.org/sa-contrib-2018-053
Comment #21
caspervoogt commentedVersion 7.x-2.4 did not resolve this for me, but the patch from #9 did. The patch from #16 also worked for me. Not sure why this is marked "needs work" now. We have a test and we have a working patch..
Comment #22
caspervoogt commentedComment #23
rajeevchoudhary commentedI reviewed patch from #9 and #16, Its worked for me.
Comment #24
chris matthews commentedThe patches in #9 and #16 do not apply to the latest xmlsitemap 7.x-2.x-dev and need to be rerolled.
Comment #25
den tweed commentedRerolled patch #16 for 2.7 dev and fixed typo in one of the comments
Comment #26
bill.zero commentedApplied #25, cleared caches, updated sitemap, ran cron, updated sitemap, deleted sitemap, created new sitemap, updated sitemap, ran cron ... and after every step unpublished nodes were/are still displaying for unauthenticated users. Maybe I missed something ...
xmlsitemap 7.x-2.6, drupal 7.64, PHP 7.1.26, MySQL 5.5x
Comment #27
bill.zero commentedRethinking my comment above (#26), could a taxonomy term that has "Included" in the term's XML settings and is present on my node be overriding the unpublished status of the node, forcing it onto the sitemap?
Comment #28
esteinborn commentedConfirming that patch in #25 does not remove the unpublished nodes from the sitemap.xml file as intended.
#9 and #16 do not apply to latest 7.x-2.x-dev branch either.
Comment #29
gladden commentedI think I've seen an issue in a site I work with that's related to this thread. We have an event content type in our college's website.
Sometimes events, that were scheduled to be unpublished by the Scheduler module, won't fall out of the sitemap (sitemap.xml) on their own via the cron job that runs to update the sitemap. Some solutions to get them out of the sitemap are to go into each event and save it manually as unpublished or go to the XML Sitemap module settings in Drupal and click 'Rebuild sitemap'. I wonder if there's something that can be fixed to get cron to pull these unpublished events out of the sitemap?
Comment #30
rpayanmComment #31
rpayanmComment #33
erwangel commentedAll patches here are altering function xmlsitemap_node_create_link(), then they bypass the xmlsitemap_node_view_access() which is called inside this function replacing it by the result of the $node->status only test. It looks to me that the right place to test if a node is unpublished is inside xmlsitemap_node_view_access() so we preserve the other access tests.
// Check if node is 'unpublished'.
if ($node->status == 0) {
return FALSE;
}
Comment #34
damienmckennaThis doesn't need a reroll anymore, but the suggestion from #33 still stands.
Comment #35
damienmckennaIt might also be modified e.g. using hook_init() and hardcoding specific values in drupal_static('xmlsitemap_node_view_access').. maybe?
Comment #36
rajeevchoudhary commentedI reviewed suggestions on #34
Its worked for me.
Comment #37
monph commentedReviewed and tested #33. Works for me on the latest version 7.x-2.6.
Comment #41
anybodyJust ran into this and created a MR from #33 as I guess that's correct. I documented that, as it's important at which point the check happens (after other checks which might allow access).
Please review.
Comment #43
anybodyMR4 fixed the problem for me, could you please test and review @all here?
https://git.drupalcode.org/project/xmlsitemap/-/merge_requests/4.patch
Comment #44
dave reidI'm still not sure why this change is needed? By my reading, the current code seems to be the same as the node_access() function, which should default to FALSE at the end, much like the current logic does if none of the conditions evaluate to TRUE. I need more of a reason or some deeper debugging into why this needs to be added. And also needs tests to show why this is failing currently, because I don't believe that with core only that this is, there must be some kind of contributed module that maybe isn't using hook_node_access() like it should.
Comment #45
rajeevchoudhary commented