While investigating why a URL generates a permission error even though I'm the administrator (user 1), I stumbled on this code in the function _linkchecker_link_node_ids():

  if (!empty($node_author_account)) {
    $nodes = db_query(db_rewrite_sql('SELECT n.nid
      FROM {node} n
      INNER JOIN {linkchecker_nodes} ln ON ln.nid = n.nid
      INNER JOIN {node_revisions} r ON r.vid = n.vid
      WHERE ln.lid = %d AND (n.uid = %d OR r.uid = %d)'), $link->lid, $node_author_account->uid, $node_author_account->uid);
  }

If the account is defined, only nodes from that "author" are returned, even when the "author" is the administrator (user 1), who should see all the pages. I think the if() here should be changed to the following:

  if (!empty($node_author_account) && $node_author_account->uid != 1) {

That way, if you are the almighty administrator, you see everything instead of only the nodes you created or edited.

The function handling comments has the same check, which I think should also be bypassed when the uid is 1:

  if (!empty($comment_author_account) && $comment_author_account->uid != 1) {
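For illustration, here is a standalone PHP sketch of what the proposed condition changes in the node query. It assumes Drupal 6 style account objects with a uid property, and it assumes that without an account the function selects every node referencing the link. The function name is hypothetical (not part of linkchecker), and the db_rewrite_sql() wrapping and the db_query() call are left out.

```php
<?php
// Hypothetical sketch, not linkchecker's actual code: build the SQL and
// arguments used to find the nodes referencing link $lid, skipping the
// author restriction for user 1 as proposed above.
function linkchecker_sketch_node_ids_query($lid, $account = NULL) {
  $sql = 'SELECT n.nid FROM {node} n'
    . ' INNER JOIN {linkchecker_nodes} ln ON ln.nid = n.nid';
  $args = array($lid);
  if (!empty($account) && $account->uid != 1) {
    // Ordinary users only see nodes they authored or revised.
    $sql .= ' INNER JOIN {node_revisions} r ON r.vid = n.vid'
      . ' WHERE ln.lid = %d AND (n.uid = %d OR r.uid = %d)';
    $args[] = $account->uid;
    $args[] = $account->uid;
  }
  else {
    // No account given, or user 1 (assumed): every node referencing the link.
    $sql .= ' WHERE ln.lid = %d';
  }
  return array($sql, $args);
}

// Example: user 7 gets the author filter, user 1 does not.
list($sql, $args) = linkchecker_sketch_node_ids_query(42, (object) array('uid' => 7));
```

The same one-line change to the condition would apply to the comment query.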

That said, when I get such errors, it is most often because the links table contains links that no longer match any link in a page. In particular, I noticed one link that changed because of a 301, yet its entry in the linkchecker_links table was not updated. That link shows "Permission restrictions deny you access" even though I have permission to view the page; the actual problem is that the link no longer exists in it. Where the permission restriction message is generated, there is no way to tell these two cases apart.


Comments

hass

Status: Active » Postponed (maintainer needs more info)

Do you know how I can reproduce this? I haven't seen this before and need to investigate further.

AlexisWilke

One of the links I had a problem with simply had an entry in linkchecker_links and a corresponding link/node relation in linkchecker_nodes, but the link no longer existed in the node.

That particular link was marked as "301 Permanently Redirected" and the node actually had the new link. I'm not 100% sure whether I replaced the link or it was replaced automatically. I don't remember changing it, especially because I probably wouldn't have bothered in the first place. So I suspect the system acted on the 301 but somehow did not properly update the old link/node relationship.

hass

We need to reproduce this. A link can stay in linkchecker link for some time after it is no longer in use, but the references in linkchecker node must be removed if the node is saved and no longer has the link in the content.

I have no idea how this can cause a link to show in a report. It sounds like something is wrong, but none of what you describe should happen, and I have no clue about the root cause. Do you have a backup of the database that has the issue?

Are you running auto updates? Does the auto update user have FULL permissions? What node modules are you using? Any forward revisioning modules like workbench? Any other modules that may confuse the node update hook?

AlexisWilke

> We need to reproduce this. A link can stay in linkchecker link for some time after it is no longer in use, but the references in linkchecker node must be removed if the node is saved and no longer has the link in the content.

I would imagine that when I save a page, the permissions of the person saving it apply.

However, I have pages that use the [node:123] tag from the https://drupal.org/project/InsertNode module. That can cause the issue where I change node 123 but the nodes containing the [node:123] tag do not get updated. That would explain why a node/link reference remains in the tables.

> I have no idea how this can cause a link to show in a report. It sounds like something is wrong, but none of what you describe should happen, and I have no clue about the root cause. Do you have a backup of the database that has the issue?

I have 30 days of backups. Just in case, I will keep a snapshot of the database with the problem. It is a 61 MB PostgreSQL database. I'm not sure how I could release it, though, because some of the data is proprietary and the rest is covered by user privacy...

> Are you running auto updates?

If you mean the auto-update from Drupal core, yes. That module is up and running on that very site.

As for linkchecker, I have cron running once an hour.

> Does the auto update user have FULL permissions?

CRON runs as an anonymous user. The contents of the [node:123] tags are accessible to anonymous users. However, it is very likely that some of the errors appear on pages that anonymous users cannot access. Yet that only happens for a small set of links: many hidden pages show the problem for one or two links, but most of their links are not reported.

> What node modules are you using?

I also run node_privacy_byrole. This is how I hide a number of nodes used for things such as emailing people who register. But again, most of the links on those pages are not reported as errors.

> Any forward revisioning modules like workbench?

I tested revision_moderation a while back, probably well before I installed linkchecker. It's not in place now, and most certainly not used on the specific nodes generating a problem.

> Any other modules that may confuse the node update hook?

I think https://drupal.org/project/insert_block would have the same problem as the InsertNode module mentioned earlier, since it shows links from another block.

I suppose any module that dynamically shows content from another page, block, comment, view, etc. would be a likely culprit for this sort of problem.

I can think of one way to fix the problem: if you notice such a problem with a node, mark it for an update on the next cron run. Then rescan the node and adjust the tables as required, deleting stale link/node connections where necessary.
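The repair suggested above boils down to a set difference between the link IDs the tables record for a node and the link IDs a fresh scan finds. A minimal plain-PHP sketch, assuming both lists are already loaded as arrays; the function name is hypothetical, and the actual reads from and writes to {linkchecker_nodes} are left out.

```php
<?php
// Hypothetical sketch of the proposed cron-time repair: compare the link IDs
// recorded for a node in {linkchecker_nodes} with the link IDs found when the
// node is rescanned, and report which relations to delete and which to add.
function linkchecker_sketch_reconcile(array $stored_lids, array $current_lids) {
  return array(
    // Relations whose link no longer appears in the node content.
    'delete' => array_values(array_diff($stored_lids, $current_lids)),
    // Links present in the content but not yet recorded.
    'insert' => array_values(array_diff($current_lids, $stored_lids)),
  );
}

// A node was saved when it contained links 10, 11 and 12; a rescan now finds
// 11 and 13 (e.g. link 10 was replaced after a 301).
$changes = linkchecker_sketch_reconcile(array(10, 11, 12), array(11, 13));
// $changes['delete'] is array(10, 12); $changes['insert'] is array(13).
```

Deleting only the stale relations, rather than the link rows themselves, leaves links that other nodes still reference untouched.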

hass

Category: bug » support
Status: Postponed (maintainer needs more info) » Fixed

> However, I have pages that use the [node:123] tag from the https://drupal.org/project/InsertNode module. That can cause the issue where I change node 123 but the nodes containing the [node:123] tag do not get updated. That would explain why a node/link reference remains in the tables.

This is a configuration fault. Make sure that InsertNode is checked in "Filters disabled for link extraction" on the link checker settings page. This ensures you only need to fix node 123 and no other. With your incorrect configuration, the link is kept in the database until all the nodes with the InsertNode placeholder [node:123] are updated. But if you clear caches, it may look as if the link is no longer in the content. This is the root cause of this malfunction. Disable InsertNode on the link checker settings page!

> If you mean the auto-update from Drupal core, yes. That module is up and running on that very site.

No, I mean the "Impersonate user account" and "Update permanently moved links" (auto-update) options in the Error handling settings on the link checker settings page. If these are disabled, all is fine.

> CRON runs as an anonymous user.

That's fine. The linkchecker module switches to the "Impersonate user account" if "Update permanently moved links" is enabled. But if you use "Update permanently moved links", the impersonated user account must have full permissions on the website.

> I also run node_privacy_byrole.

I'm more concerned about forward revisioning modules that may confuse linkchecker. See #2067317: Support revisioning and workflow modules to remove unpublished nodes from broken link report.

> I tested revision_moderation a while back, probably well before I installed linkchecker. It's not in place now, and most certainly not used on the specific nodes generating a problem.

If not in place, there should be no problem.

> I think https://drupal.org/project/insert_block would have the same problem as the InsertNode module mentioned earlier, since it shows links from another block.

Absolutely. Same rule as for InsertNode: disable it in the settings.

If you have any other filters that reference other real content, make sure they are also disabled on the link checker settings page. This is critical for linkchecker, as such filters cause many of these malfunctions. That's why the red "flashing" (Recommended) marker is shown next to the filter setting on the link checker settings page.

After changing the "Filters disabled for link extraction" setting, make sure you save and then run "Reanalyze content for links" in the Maintenance section of the link checker settings page! This will clean up all your links and references in the linkchecker tables.

Changing to a support request, as it looks like the source of all your issues is the incorrect filter settings. I was not aware that this can cause permission denied issues... really interesting.

AlexisWilke

Great! That helped quite a bit! I still have 3 links that generate an error. I'll look into those to see what happens, but adding the InsertNode filter to the list of disabled filters cleared many others.

AlexisWilke

Darn, we cannot edit our own posts anymore...

Anyway, I see that the last 3 nodes I have a problem with are unpublished. You still check the links of unpublished nodes, but you do not update those links!

I see several possible ways to fix this problem, which, as far as I can tell, also involves another issue: node types.

I suppose the idea is that an unpublished node must be ignored (see the $node->status test and the n.status = 1 condition):

    // nodeapi code snippet
    case 'insert':
    case 'update':
      // The node is going to be published.
      if (_linkchecker_scan_nodetype($node->type) && $node->status) {
        _linkchecker_add_node_links($node);
      }
      break;
    //
    // _linkchecker_batch_import_nodes code snippet
    $result = db_query('SELECT n.nid FROM {node} n WHERE n.status = %d AND n.type IN (' . db_placeholders($node_types, 'varchar') . ') ORDER BY n.nid', array_merge(array(1), $node_types));

I think the simplest fix is to change the extraction process to return array() when the node status is 0. That way, all nodes are still checked, but unpublished nodes always report that they have no links.
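A minimal sketch of that idea in plain PHP, assuming a node object with status and body properties as in Drupal 6. It is not the attached patch: both function names are hypothetical, the real extraction is passed in as a callable, and the toy href extractor exists only to make the example run. It also assumes the caller removes link/node relations for any link not returned.

```php
<?php
// Hypothetical sketch of the proposed fix: wrap link extraction so that an
// unpublished node ($node->status == 0) always reports no links.
function linkchecker_sketch_node_links($node, callable $extract) {
  if (empty($node->status)) {
    // Unpublished: report no links, so a caller that drops relations for
    // links not returned will clear this node's references.
    return array();
  }
  return $extract($node->body);
}

// Toy extractor used only for this example: collects double-quoted href values.
function linkchecker_sketch_extract_hrefs($html) {
  preg_match_all('/href="([^"]+)"/i', $html, $matches);
  return $matches[1];
}

$node = (object) array('status' => 0, 'body' => '<a href="http://example.com/">x</a>');
$links = linkchecker_sketch_node_links($node, 'linkchecker_sketch_extract_hrefs');
```

Returning an empty list keeps every node on the same code path, which is why both the nodeapi save and the batch reanalysis would clean up unpublished nodes without separate status checks.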

I'm attaching a patch, which I tested on my "broken" system; it worked perfectly.

So... steps to reproduce:

1. create a node with an invalid link

2. make sure the node is parsed and you see the invalid link in your list of broken links

3. go edit the node, unset the Published flag, save

4. go back to the list of broken links, see that the system generates one of those errors

5. apply my patch

6. edit the node + save

7. go to the list of broken links, it worked

8. to test that the patch fixes the batch process, repeat steps 1 to 5

9. go to linkchecker settings

10. run Reanalyze content for links

11. go to the list of broken links, it worked

So... I agree that InsertNode was my mistake (misuse of the module); however, the last 3 "broken" links are caused by an issue in the module, as I just showed.

Side note:

All the other things you mentioned were properly set up. Thank you for listing them, though!

AlexisWilke

Issue summary updated. Attached patch (1.76 KB).
AlexisWilke

Sorry... there is actually one more thing to do when unpublishing the node:

3. edit the node, unset the Published flag, delete the broken link, save

You have to remove the broken link in step 3, so that it no longer exists in the node; the module handles that case badly, and bang: permission errors in the list of links.

AlexisWilke

Title: About error "Permission restrictions deny you access" » About error "Permission restrictions deny you access" and unpublished nodes
Assigned: Unassigned » AlexisWilke
Category: Support request » Bug report
Priority: Normal » Major
Status: Fixed » Needs review

Trying to get used to the new Drupal issue queue! Changing the issue settings this time...

hass

Now you may have found a bug I forgot to backport from D7. Please look into D7. Your patch can be a lot simpler. Comments may require this, too.

hass

Title: About error "Permission restrictions deny you access" and unpublished nodes » About error "Permission restrictions deny you access"
Assigned: AlexisWilke » Unassigned
Category: Bug report » Support request
Priority: Major » Normal
Status: Needs review » Fixed

Please keep this issue clean. I opened #2125719: Links in unpublished nodes are still checked.

hass

Title: About error "Permission restrictions deny you access" » Incorrect filter configuration may cause "Permission restrictions deny you access"

Changing title.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.