Support from Acquia helps fund testing for Drupal Acquia logo

Comments

hass’s picture

Status: Active » Postponed (maintainer needs more info)

I'm not sure what exactly you are talking about.

omaster’s picture

It is quite a hard issue to explain really. But what basicly happens is if a cron job is run by the system then any page with a link that is getting updated by link checker because it was recorded as "moved permanently" loses any extra information like which Taxonomy it belonged to. It also does not actually create a revision of this updated page which makes it hard to see what changes link checker actually made to the page.

hass’s picture

I looked into the code and I think this is a permission issue. The cron task need to load the user 1 user object or it may have not enough permissions. What type of restrictions are you using in taxonomy?

omaster’s picture

user 1 has full admin permission on everything. The same as my user. If I run the cron myself manually then it works. I get a new revision and I get the correct taxonomy. but if I let it run by cron itself then both of those do not happen.

hass’s picture

Version: 6.x-2.5 » 6.x-2.x-dev
Category: support » bug
Priority: Normal » Major
Status: Postponed (maintainer needs more info) » Active

Yeah, that is what I mean. There is a bug in the module as cron runs unter anonymous user account and do not have permission. I need to load the admin user; if the cron job updates your nodes.

For now, please disable the auto update until we have a working fix implemented.

hass’s picture

Title: Fails to keep Taxonomy. » Cron task does not impersonate to admin user on automatic content updates
hass’s picture

achton’s picture

I want to help with this issue, but I can't replicate the bug.

I installed D6.25 using drush si and installed latest 6.x-2.x-dev. I then used devel generate to create content and assign taxonomy automatically. I edited a node and added a 301 link to the body, and then proceeded to call cron thus: wget --no-check-certificate -O - -q -t 1 http://dev.site/an/drupal-6.25/cron.php

As you can see from attached screenshots, cron ran as Anonymous, adding a new revision and watchdog log. The node was updated correctly, retaining all taxonomy.

So what am I missing? Are there other access restrictions in place in the original bug report that could muddy the waters?

hass’s picture

Status: Active » Postponed (maintainer needs more info)

Than we need more infos.

omaster’s picture

In my site Anonymous users are not allowed to edit pages so that may be where the issue comes.
You may need to remove all permissions from Anonymous to replicate the issue. Infact the site has no anonymous accessibility at all.

I think you are right about the issue being permission and it is probably in the effect that this does need to be run as admin user as opposed to anonymous.

omaster’s picture

I have actually managed to fix it using the link to safely impersonating another user.

I will do a patch and submit it for testing.

hass’s picture

Status: Postponed (maintainer needs more info) » Active

I think we should allow users to configure a custom user account for the updates. So we don't need to force user/1 in this case. Something like recommending to create a Linkchecker user or Automatic Link Updater. For this we need a setting in linkchecker settings page. We can start with an integer (user/1), but usability wise we should make it more an autocomplete field or something like this that shows the username behind the userID.

omaster’s picture

That shouldn't be too hard. Just need to add the field to the admin page for selecting the user and have it so that if a user is selected in there then that is the user to add in the function _linkchecker_status_handling which is where I placed the user fix.

function _linkchecker_status_handling($link, $response) {
global $user;
$original_user = $user;
$old_state = session_save_session();
session_save_session(FALSE);
$user = user_load(1);
 
Lots and lots of lines of code (Entire content of the function)


  $user = $original_user;
session_save_session($old_state);
}

Then have it that where it uses $user= user_load(1) that the 1 can be the value for the user in that field. The only issue with it is that a user can be impersonated without any authentication what so ever which is not a good thing. But I think that is more a drupal issue then a link checker one ;).

achton’s picture

I just thought of another issue in relation to this today. On a site that I maintain, admin users are able to render unpublished nodes (might be Panels that makes this possible), but most regular users will not, and will instead receive a HTTP 403 (should be 404, I know).

In any case, when LC runs as user/1, links to this content won't show in the reports, since user/1 has access to the content.

So it seems perfectly valid for LC to assume the anonymous role when visiting content, and find another solution for the permissions issue, if it is indeed valid as described in the OP.

I like the feature on specifying a specific user, but I think these are seperate issues.

hass’s picture

Unpublished nodes or links in them are not checked, updated, etc. This logic has nothing to do with permissions.

achton’s picture

@hass: I'm talking about published nodes that contain links to unpublished nodes. If user/1 has permissions/is able to render an unpublished node, Link Checker will never tell me about links to unpublished content.

It is my understanding that on a vanilla Drupal site, links to unpublished content will be detected by Link Checker. But maybe I'm wrong?

hass’s picture

We only impersonate the content updates. The check is not impersonated. It's still an anonymous http request.

omaster’s picture

Yeah the location that I placed the impersonation only affects the after effects of the check. So when an action must be performed.

achton’s picture

Ah, okay - thanks for clearing that up. That seems obvious now that I think about it. Sorry for hijacking the thread, then :-/

Still want to help with the bug/feature/patch though :)

hass’s picture

@omaster: Are you working on the D6 and D7 patches? Please assign case to you if you do.

omaster’s picture

Assigned: Unassigned » omaster

I have found one more problem now. It appears that it is not actually doing the changes. Merely creating the revision and adding a comment to it. But not actually changing the link. I have to look at it further.

I will work on testing the D6 a little more and creating a patch for this.

omaster’s picture

I am attaching my patch that I did for now that does not fully work but allows impersonating a user so maybe if anyone else wants to test as well and see where it is going wrong they can.

omaster’s picture

Status: Active » Needs work
omaster’s picture

Status: Needs work » Needs review
omaster’s picture

I am guessing that the problem may be that is loses it's impersonation when it changes functions. So we may need to impersonate multiple times. But I will test and let you know.

omaster’s picture

Adding it to sub functions did not help. Need to work out why it is not editing the nodes.

Not sure if it is related or not.

hass’s picture

Please keep in mind that the module is currently not able to update internal or relative links automatically. Don't waste time on this task. :-)

hass’s picture

Status: Needs review » Needs work

There are many unrelated changes in the patch and the UI option is missing.

omaster’s picture

Yeah I noticed the unrelated changes. Have no idea how that happens when I just downloaded the copy I compared to and made no changes to those files. Will try and rectify and yes UI option is the next step. Just wasn't sure about the function because my links do not change. ;)

Ok so obviously once this is finished I will need all links to be able to be updated so I am thinking that it may end up as my next task unless something is already happening on it? :D

hass’s picture

We should add one more patch that compare if node content has changed. If not, don't save a new revision. Changing local links can be quite tricky...

omaster’s picture

Yes it sounds like a good idea.

And then work on the local links. Because that is where my biggest issue is seeing as my site in an Intranet so majority of linking is internal. :) With 1500 pages having been created and still active in 1 year of it being live and about 1000 documents linked to the pages it means there is a lot of links flying around. :D. And at present 95% of all broken / moved links are internal. :)

BTW for me it is both External and Internal links that are not changing. Not sure why.

hass’s picture

External must really work. You can debug in the link replace function. Dump $text before and after.

Replacing internal links is extremly difficult. I think it would be best to replace internal links with drupal raw paths, but this requires pathologic module. So the result depends on what we have installed on the system and can break fairly soon as it's nearly impossible to reproduce all user entered path variants. I don't like to educatate everyone to install pathologic and never use aliases, etc. I strongly suggest You to use pathologic and you will never run into this issue again.

omaster’s picture

I am not sure I understand the problem.

Is it not possible to just change the text in question with the correct Alias. I mean if there is a redirection to the correct page and the system has been able to determine this should this not just be a case of swapping the information.

Or am I not understanding something?

Like for example if I change the alias and a redirection is created by the Path redirect module then shouldn't it just be able to change any reference to the old link to the new link?

hass’s picture

The problems come mainly all from input filters and I'm sure I cannot list or guess all variants that are possible. Some filters like pathologic will always show the correct link as long as the node exists. BBCode, markup are different and any other module makes more indivisual issues. Others people may use relative links, that are difficult to find on replace and so on and so on. Better safe than sorry.

Use pathologic for future and you will never run into internal path issues again... I'm using it everywhere.

omaster’s picture

So I have done a dump now and the before and after end up the same. Going to investigate why.

omaster’s picture

Ok now I am done testing. I found out the issue. The pages I was testing with were missing the a tag. The link was because the text was a https:// and it automatically was creating the link which linkchecker found but did not change because of the fact the link was in the main body text not a tag.... It should be possible to make this change also though?

hass’s picture

We should go with #287292: Add functionality to impersonate a user and use http://drupal.org/node/42552 autocomplete in admin settings form.

hass’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev
Status: Needs work » Needs review
FileSize
11.48 KB

Patch attached.

Status: Needs review » Needs work
hass’s picture

New patch.

Status: Needs review » Needs work
hass’s picture

Hopefully that's it.

hass’s picture

Here is a D6 version.

hass’s picture

Status: Needs review » Fixed
hass’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.