I found unpublished nodes can be excluded from sitemap on first setup. But if a node is unpublished afterwards, it still display in sitemap even cron job is running regularly.

I would be appreciated if you can give me some advice. Thanks.

Issue fork xmlsitemap-2266933

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

bhushan.nagaonkar’s picture

I am also facing exactly the same issue.

Will subscribe.

martini9011’s picture

Component: xmlsitemap.module » Other
Assigned: Unassigned » martini9011
Status: Active » Needs review
StatusFileSize
new580 bytes
new580 bytes

Ran into this issue as well on the 7.x-2.2 branch. I created a patch for both versions (See links below).
I added a property condition to the fetch query in xmlsitemap.generate.inc.

7.x-2.x: https://www.drupal.org/files/issues/2266933-exclude-unpublished-nodes.patch
7.x-2.2: https://www.drupal.org/files/issues/2266933-exclude-unpublished-nodes-2-...

Let me know if this fixes the issue for you.

chegor’s picture

Very strange error, I tested 7.x-2.2 on the site with 10.000 nodes. And all is ok, unpublished node disappears from the site map after the launch of cron.

patrickwatzeels’s picture

StatusFileSize
new1.12 KB

I have the same error. However I also include Taxonomy pages to my sitemap which will occur in an error when using the patches given in #2. Further, this will only work when you use the 'rebuild sitemap' button in the admin panel. When the cron job runs the unpublished nodes are still visible in the xmlsitemap.

You don't want to rebuild your sitemap every time when you change the publication status of a node.

I believe there is a more pragmatic solution in a new patch which 'uses' the access flag in the 'xmlsitemap' table. When a node status changes from published to unpublished it will set the 'access' flag to 0. This flag is normally used to determine whether the anonymous user has access to a page or not but has a default fallback of 1.

It solved the issue in my site and I am also able to add other entities to my sitemap besides nodes.

patrickwatzeels’s picture

StatusFileSize
new1.72 KB

I will have to recall my previous comment and revoke the patch. With the patch in #4 it will not work properly when you use the 'rebuild sitemap' button in the admin.

I have updated the patch in commit #2 and added the changes in this patch combined with the changes in #4.

This solves the creation of the sitemap in both cases. Both the Cron job and the 'rebuild sitemap'

martini9011’s picture

Confirming that #5 works, and is probably a better solution since it prevents errors when handling entities that are not nodes (don't have a status column).

bc’s picture

I ran into a bug with the posted patch where $node->status is a string, and the strict test === 0 didn't work... interdiff & new patch here.

The last submitted patch, 7: 2266933-exclude-unpublished-nodes-weaktest-7.patch, failed testing.

bc’s picture

oops bad patch, i created it against the wrong repository... re-created so we can test :)

bc’s picture

I ended up finding a better way of altering the inclusion of certain entities in the sitemap by using a... surprise... hook_alter :)

Rather than patching xmlsitemap (whose functionality works fine on a drupal site without weird user_access hacks), you can do something like this:

/**
 * Implements hook_xmlsitemap_link_alter().
 *
 * Don't include certain links based on a taxonomy term or status.
 */
function MODULE_xmlsitemap_link_alter(array &$link, array $context) {
  if ($link['type'] == 'node') {
    $has_some_term = ($context['node']->field_some_term[LANGUAGE_NONE][0]['tid'] == 138);
    $is_unpublished = ($node->status != 1);

    if ($is_paidissue || $has_some_term) {
      $link['access'] = 0;
    }
  }
}
Nick Robillard’s picture

#9 works as expected.

dave reid’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

This needs a test to prove the current behavior is broken, which I'm not convinced it is. Is some other contrib module not implementing hook_node_access() properly?

bc’s picture

Dave, I agree that xmlsitemap seems to work as intended.

The codebase I was dealing with uses view_unpublished, but I wasn't aware of its messing with node access.

Would it be beneficial to include a more detailed example in xmlsitemap.api.php that shows the full extent of what you can change in hook_xmlsitemap_link_alter's &$link?

I just wrote a quick test to explicitly verify things work as intended (although this is implicitly covered in the existing testNodeSettings() test.


  /**
   * Ensure unpublished content is not included in sitemap.
   */
  function testNodeStatus() {
    $unpublished_node = $this->drupalCreateNode(array('status' => FALSE, 'uid' => $this->normal_user->uid));
    $published_node = $this->drupalCreateNode(array('status' => TRUE, 'uid' => $this->normal_user->uid));

    $this->assertSitemapLinkValues('node', $unpublished_node->nid, array('access' => 0, 'status' => 0, 'priority' => 0.5, 'status_override' => 0, 'priority_override' => 0));
    $this->assertSitemapLinkValues('node', $published_node->nid, array('access' => 1, 'status' => 1, 'priority' => 0.5, 'status_override' => 0, 'priority_override' => 0));
  }

dimetry’s picture

There is incompatibility between xmlsitemap and nodeaccess projects.
Patch from #9 helps to hide unpublished nodes from sitemap.

tshorock’s picture

We've been having the same issue as noted above (unpublished content showing sitemaps) in certain content. Other node access modules are granting permissions.

Rather than patching this module, we've added another ultra-tiny module just to catch this particular case.

function xmlsitemap_diezombies_xmlsitemap_link_alter(array &$link, array $context) {
  if($link['type']=='node' && $context['node']->status==false) {
    $link['access']=false;
  }
}
tbpixel’s picture

Hi bc,

I've incorporated your test case into a new patch that I just wanted to submit for future users and potentially making it easier to incorporate the patch into the xmlsitemap module.

bwaindwain’s picture

We're using view_unpublished and some other modules which implement hook_node_grants(). I found we needed this as well as https://www.drupal.org/project/xmlsitemap/issues/690120 to fix problems with new pages being added to sitemap.xml.

norman.lol’s picture

Priority: Normal » Major
darrell_ulm’s picture

Yes seeing this issue and this looks like the patch to fix.

jtsnow’s picture

This should be fixed in the 7.x-2.4 release of XML Sitemap.

See https://www.drupal.org/sa-contrib-2018-053

caspervoogt’s picture

Version 7.x-2.4 did not resolve this for me, but the patch from #9 did. The patch from #16 also worked for me. Not sure why this is marked "needs work" now. We have a test and we have a working patch..

caspervoogt’s picture

Status: Needs work » Needs review
rajeevchoudhary’s picture

I reviewed patch from #9 and #16, Its worked for me.

chris matthews’s picture

Assigned: martini9011 » Unassigned
Status: Needs review » Needs work
Issue tags: -XML sitemap, -unpublish +Needs reroll

The patches in #9 and #16 do not apply to the latest xmlsitemap 7.x-2.x-dev and need to be rerolled.

den tweed’s picture

StatusFileSize
new3.18 KB

Rerolled patch #16 for 2.7 dev and fixed typo in one of the comments

bill.zero’s picture

Applied #25, cleared caches, updated sitemap, ran cron, updated sitemap, deleted sitemap, created new sitemap, updated sitemap, ran cron ... and after every step unpublished nodes were/are still displaying for unauthenticated users. Maybe I missed something ...

xmlsitemap 7.x-2.6, drupal 7.64, PHP 7.1.26, MySQL 5.5x

bill.zero’s picture

Rethinking my comment above (#26), could a taxonomy term that has "Included" in the term's XML settings and is present on my node be overriding the unpublished status of the node, forcing it onto the sitemap?

esteinborn’s picture

Confirming that patch in #25 does not remove the unpublished nodes from the sitemap.xml file as intended.

#9 and #16 do not apply to latest 7.x-2.x-dev branch either.

gladden’s picture

I think I've seen an issue in a site I work with that's related to this thread. We have an event content type in our college's website.
Sometimes events, that were scheduled to be unpublished by the Scheduler module, won't fall out of the sitemap (sitemap.xml) on their own via the cron job that runs to update the sitemap. Some solutions to get them out of the sitemap are to go into each event and save it manually as unpublished or go to the XML Sitemap module settings in Drupal and click 'Rebuild sitemap'. I wonder if there's something that can be fixed to get cron to pull these unpublished events out of the sitemap?

rpayanm’s picture

Status: Needs work » Needs review
StatusFileSize
new2.92 KB
rpayanm’s picture

StatusFileSize
new2.94 KB

Status: Needs review » Needs work

The last submitted patch, 31: 2266933-31.patch, failed testing. View results

erwangel’s picture

All patches here are altering function xmlsitemap_node_create_link(), then they bypass the xmlsitemap_node_view_access() which is called inside this function replacing it by the result of the $node->status only test. It looks to me that the right place to test if a node is unpublished is inside xmlsitemap_node_view_access() so we preserve the other access tests.

// Check if node is 'unpublished'.
if ($node->status == 0) {
return FALSE;
}

damienmckenna’s picture

Issue tags: -Needs reroll

This doesn't need a reroll anymore, but the suggestion from #33 still stands.

damienmckenna’s picture

It might also be modified e.g. using hook_init() and hardcoding specific values in drupal_static('xmlsitemap_node_view_access').. maybe?

rajeevchoudhary’s picture

I reviewed suggestions on #34

Its worked for me.

monph’s picture

Reviewed and tested #33. Works for me on the latest version 7.x-2.6.

Anybody made their first commit to this issue’s fork.

anybody’s picture

Status: Needs work » Needs review

Just ran into this and created a MR from #33 as I guess that's correct. I documented that, as it's important at which point the check happens (after other checks which might allow access).
Please review.

anybody’s picture

MR4 fixed the problem for me, could you please test and review @all here?

https://git.drupalcode.org/project/xmlsitemap/-/merge_requests/4.patch

dave reid’s picture

Status: Needs review » Postponed (maintainer needs more info)

I'm still not sure why this change is needed? By my reading, the current code seems to be the same as the node_access() function, which should default to FALSE at the end, much like the current logic does if none of the conditions evaluate to TRUE. I need more of a reason or some deeper debugging into why this needs to be added. And also needs tests to show why this is failing currently, because I don't believe that with core only that this is, there must be some kind of contributed module that maybe isn't using hook_node_access() like it should.

rajeevchoudhary’s picture