Normally when a new node inserted to drupal, xmlsitemap should rebuild the sitemap on next cron task. but i found it never on 6.x-2.x-dev.
I look into the code and found function xmlsitemap_node_create_link(&$node) cause this issue.

note the line:

$node->xmlsitemap['access'] = $node->nid ? (bool) node_access('view', $node, drupal_anonymous_user()) : 1;

$node->xmlsitemap['access'] will always has a value '0' for new submitted node.

because when this function being called,the access data for this node is empty on the database.

why?

see the code from node_save($node)


  // Call the node specific callback (if any).
  node_invoke($node, $op);
  node_invoke_nodeapi($node, $op);

  // Update the node access table for this node.
  node_access_acquire_grants($node);

Let me know if i am wrong.

xmlsitemap_node.module


function xmlsitemap_node_create_link(&$node) {
  if (!isset($node->xmlsitemap)) {
    $node->xmlsitemap = array();
  }

  $node->xmlsitemap += array(
    'type' => 'node',
    'id' => $node->nid,
    'subtype' => $node->type,
    'loc' => 'node/'. $node->nid,
    'status' => variable_get('xmlsitemap_node_status_' . $node->type, 1),
    'status_default' => variable_get('xmlsitemap_node_status_' . $node->type, 1),
    'status_override' => 0,
    'priority' => variable_get('xmlsitemap_node_priority_' . $node->type, 0.5),
    'priority_default' => variable_get('xmlsitemap_node_priority_' . $node->type, 0.5),
    'priority_override' => 0,
    'changefreq' => $node->nid ? xmlsitemap_calculate_changefreq(xmlsitemap_node_get_timestamps($node)) : 0,
    'changecount' => $node->nid ? count(xmlsitemap_node_get_timestamps($node)) - 1 : 0,
  );

  // The following values must always be checked because they are volatile.
  $node->xmlsitemap['lastmod'] = isset($node->changed) ? $node->changed : REQUEST_TIME;
  $node->xmlsitemap['access'] = $node->nid ? (bool) node_access('view', $node, drupal_anonymous_user()) : 1;
  $node->xmlsitemap['language'] = isset($node->language) ? $node->language : '';

  return $node->xmlsitemap;
}
Members fund testing for the Drupal project. Drupal Association Learn more

Comments

Dave Reid’s picture

Priority: Critical » Normal

Hmm, not exactly critical since its not exposing secret nodes and the next time a node is saved the correct access value is saved, but I'm not sure how this could be fixed from XML sitemap. This would appear to be either by design or a core bug.

Dave Reid’s picture

I filed a core issue at #690520: Calling node_access() from inside hook_node_save() results in wrong access for new nodes and I'll try and get some higher-ups involved in the issue to see what they recommend.

Dave Reid’s picture

Status: Active » Postponed
ricsonhoo’s picture

I feel it is critical, since the access is always 0 for new node, then $flag will has value FALSE, then variable "xmlsitemap_regenerate_needed" will never has value TRUE. until someday, maybe 5 days maybe 3 month later you update a node. (I am not sure if update a node will trigger a "access" recheck on all nodes?)

  if ($changed && $flag) {
    variable_set('xmlsitemap_regenerate_needed', TRUE);
  }

THEN no matter how many new nodes you have posted, it will never trigger a sitemap rebuild action, 'cause the cron will break on varible xmlsitemap_regenerate_needed == FALSE,

function xmlsitemap_cron() {
  // If there were no new or changed links, skip.
  if (!variable_get('xmlsitemap_regenerate_needed', FALSE)) {
    return;
  }
......

call "node_access_acquire_grants($node); " inside function "xmlsitemap_node_create_link" will work,

function xmlsitemap_node_create_link(&$node) {
  if (!isset($node->xmlsitemap)) {
    $node->xmlsitemap = array();
  }

  $node->xmlsitemap += array(
    'type' => 'node',
    'id' => $node->nid,
    'subtype' => $node->type,
    'loc' => 'node/'. $node->nid,
    'status' => variable_get('xmlsitemap_node_status_' . $node->type, 1),
    'status_default' => variable_get('xmlsitemap_node_status_' . $node->type, 1),
    'status_override' => 0,
    'priority' => variable_get('xmlsitemap_node_priority_' . $node->type, 0.5),
    'priority_default' => variable_get('xmlsitemap_node_priority_' . $node->type, 0.5),
    'priority_override' => 0,
    'changefreq' => $node->nid ? xmlsitemap_calculate_changefreq(xmlsitemap_node_get_timestamps($node)) : 0,
    'changecount' => $node->nid ? count(xmlsitemap_node_get_timestamps($node)) - 1 : 0,
  );
  node_access_acquire_grants($node);
  // The following values must always be checked because they are volatile.
  $node->xmlsitemap['lastmod'] = isset($node->changed) ? $node->changed : REQUEST_TIME;
  $node->xmlsitemap['access'] = $node->nid ? (bool) node_access('view', $node, drupal_anonymous_user()) : 1;
  $node->xmlsitemap['language'] = isset($node->language) ? $node->language : '';

  return $node->xmlsitemap;
}
Dave Reid’s picture

Core and a good majority of users aren't using any kind of node access modules, so this function returns the proper values for them. This only returns the improper value if there are node access/grant modules enabled.

kiamlaluno’s picture

Calling node_access_acquire_grants(), which is already called from Drupal core modules, doesn't seem a correct work-around.
Maybe it is better to first verify if there are modules implementing hook_node_grants(), and then take the necessary actions; the node data could be marked as needing updats, and the module could verify later if the anonymous user has view access to the node.

Dave Reid’s picture

Title: 'access' value always be 0 for new node » Cannot reliably check node_access from inside hook_node_save() with node grants
Version: 6.x-2.x-dev » 7.x-2.x-dev
Status: Postponed » Active
Dave Reid’s picture

I think #786484: Use a 'link update' queue for delayed link changes is going to be how we solve this problem. Instead of trying to update the links from within hook_node_update() or hook_node_insert(), we put them into a 'link update queue' and process them at the end of the page request or during cron.

Any alternative ideas are welcome, but I think that's our best option.

petrica.martinescu’s picture

Temporary solution: altering the query used for selecting elements that goes into sitemap.xml file, using an edited node access function. There is actually a left join with the node access table only for node items in the sitemap table (the access field is not taken into account anymore). Add the following code into a custom module.

/**
 * Temporary fix bug: http://drupal.org/node/690120
 */
function module_query_xmlsitemap_generate_alter(&$data, $sitemap) {
  $data['FROM'] .= " LEFT JOIN {node} n ON n.nid = x.id AND x.type = 'node' LEFT JOIN {node_access} na ON n.nid = na.nid";
  $data['WHERE'] = "WHERE x.status = 1 AND ((x.type <> 'node' AND x.access = 1) OR " . module_node_access_where_sql('view', 'na', drupal_anonymous_user()) . ')';
}

function module_node_access_where_sql($op = 'view', $node_access_alias = 'na', $account = NULL) {
  $grants = array();
  foreach (node_access_grants($op, $account) as $realm => $gids) {
    foreach ($gids as $gid) {
      $grants[] = "($node_access_alias.gid = $gid AND $node_access_alias.realm = '$realm')";
    }
  }

  $grants_sql = '';
  if (count($grants)) {
    $grants_sql = 'AND ('. implode(' OR ', $grants) .')';
  }

  $sql = "$node_access_alias.grant_$op >= 1 $grants_sql";
  return $sql;
}
alex_wang’s picture

I do a patch which can rebuild the xmlsitemap in every cron....

This is not a proper solution but at least can get a reliable result.

earnie’s picture

#10 is off topic for this issue. Please find or create an appropriate issue for it.

madmanmax’s picture

#4 is actually onto something: node_access_acquire_grants will call in the end node_access_write_grants which will first wipe all records from node_access for that given node. I agree it's not the best solution but for now it's just one line of code that temporary fixes the problem.

nagba’s picture

The issue with #4 is that it is possible to get a SQL Integrity error. Say, you have a site with a node access module enabled and you create a new node. With the change the node_access_acquire_grants will save a row into the table, and then later node_save will try to save the same thing again. The queue approach feels safer, also probably going to include less hacks.

mstef’s picture

I'm having this problem with Workbench moderation. In some cases, when hook_node_update() is invoked, xmlsitemap_node_view_access() will return the access based on the OLD node (obviously a big problem). It seems to me that the issue is the grants table being outdated at this point.

This is how I resolved for my case. Perhaps the solution might help for a "fix" for this module..

/**
 * Implements hook_node_update().
 */
function mymodule_node_update($node) {
  // Check if this is a live revision being saved from workbench moderation.
  // @see workbench_moderation_store()
  // @see workbench_moderation_node_unpublish()
  if (isset($node->workbench_moderation['updating_live_revision'])) {
    // Rebuild access grant records for the node
    node_access_acquire_grants($node);

    // Clear out XML Sitemap Node's access cache
    drupal_static_reset('xmlsitemap_node_view_access');
    drupal_static_reset('xmlsitemap_node_view_access:grants');

    // Generate the sitemap link for this node (again)
    if ($link = xmlsitemap_node_create_link($node)) {
      xmlsitemap_link_save($link);
    }
  }
}
mstef’s picture

Does seem a bit strange that the grants would be updated after the hooks... (in core)

Kristen Pol’s picture

This has been a (difficult) long standing issue and there does not appear to be progress on the core bug side of things or on the 'link update' queue side of things so I've attached a patch with the temporary "fix" from @ricsonhoo on #4 so that non-coders can apply the patch to get this working. Thanks @ricsonhoo !

chrisdarke42@gmail.com’s picture

That patch in #16 based off #4 works great, node access now gets set to 1. I haven't tried anything beyond published/unpublished, but that is all I need right now, and for that it works perfectly.

Rudi Teschner’s picture

Thanks for the patch. Even though "temporary", it solves the problems and the sitemap now regenerates properly.