If a node contains a link such as http://example.com/foo/bar, and http://example.com/foo receives a 301 redirect from the remote site to http://example.com/example1, the link http://example.com/foo/bar is changed to http://example.com/example1/bar. This may break the other link.
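The effect can be reproduced with a plain substring replacement. This is a minimal sketch, not the module's actual code: the node body and the variable names are made up for illustration, and only the URLs come from the report above.

```python
# Hypothetical node body containing two distinct links.
body = (
    '<a href="http://example.com/foo">short</a> '
    '<a href="http://example.com/foo/bar">long</a>'
)

old_url = "http://example.com/foo"       # URL that answered with a 301
new_url = "http://example.com/example1"  # redirect target

# Naive substring replacement: it also rewrites the prefix of the longer URL.
naive = body.replace(old_url, new_url)
print(naive)
```

After the replacement the second link points to http://example.com/example1/bar, although only http://example.com/foo was redirected.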

Actions:
1. The code needs to verify that a checkable link is surrounded by single quotes, double quotes, or blanks, so we can be sure it is a complete link and not only a chunk of a link.
2. Or the code needs to update only links inside a link element (<a href="http://example.com/foo/bar">example</a>).
3. Other links need to be changed by hand only, to be safe.
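Action 1 could look roughly like the following sketch. The function name `replace_complete_link` is hypothetical, and the set of accepted delimiters (whitespace, single or double quotes, or the start or end of the text) is an assumption taken from the wording of the action, not from any committed patch.

```python
import re

def replace_complete_link(text, old_url, new_url):
    """Replace old_url only where it stands as a complete link.

    A match must be preceded and followed by whitespace, a single or
    double quote, or the start/end of the text (assumed delimiters).
    """
    pattern = re.compile(
        r'(?<![^\s"\'])' + re.escape(old_url) + r'(?![^\s"\'])'
    )
    # A function is used as replacement so new_url is inserted literally.
    return pattern.sub(lambda match: new_url, text)

text = (
    'see "http://example.com/foo" and http://example.com/foo/bar '
    "and http://example.com/foo here"
)
result = replace_complete_link(
    text, "http://example.com/foo", "http://example.com/example1"
)
print(result)
```

The check is deliberately conservative: a URL directly followed by other characters, for example a trailing period or a `<`, is left alone, which matches action 3 (change the remaining cases by hand).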

Comments

hass’s picture

Priority: Normal » Critical

AlexisWilke’s picture

What would the browser do if http://example.com/example1/bar is incorrect?

hass’s picture

It may be listed on the next link check as "file not found", but we cannot say for sure how others configure their servers. If they answer every link that does not exist with a 301 to their home page, we change the link again to http://example.com/, and then you have lost this link. You may not be able to reconstruct the original link, and then it is gone.

I have added very specific regexes to the extraction process within the last few days. We need to adapt these and use a regex replace with nearly the same regexes, replacing only links that we know for sure are inside a link element or one of the other HTML elements, and not links that are only part of a link written in plain text or similar.
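A rough sketch of such an element-aware replacement follows. It is not the regex from the patches: the function name `replace_href` and the pattern are assumptions, and only the href attribute of a elements is covered, while the other HTML elements mentioned above would each need their own pattern.

```python
import re

def replace_href(html, old_url, new_url):
    """Replace old_url only where it is the complete href value of an <a> tag."""
    pattern = re.compile(
        # Group 1: tag start up to "href=", group 2: the opening quote.
        r'(<a\s(?:[^>]*?\s)?href\s*=\s*)(["\'])'
        + re.escape(old_url)
        + r'\2',  # the same quote must close the value right after the URL
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: m.group(1) + m.group(2) + new_url + m.group(2), html
    )

html = (
    '<a href="http://example.com/foo">a</a> '
    '<a href="http://example.com/foo/bar">b</a> '
    "plain http://example.com/foo"
)
fixed = replace_href(
    html, "http://example.com/foo", "http://example.com/example1"
)
print(fixed)
```

Because the closing quote has to follow the URL directly, http://example.com/foo/bar is not touched, and the plain-text occurrence stays as it is for manual review.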

hass’s picture

Status: Active » Needs review
Attached files:
Status: new, Size: 10.46 KB
Status: new, Size: 10.46 KB

Patches attached.

hass’s picture

New patches: fewer lines, should be faster, and fix numbering bugs in the replacement string.

hass’s picture

Status: Needs review » Fixed

Above patches are partly no-brainers...

I've committed better patches that have been tested on some links and seem to work reliably. It would be good to hear from some people whether they work well or not.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.