If a node contains a link such as http://example.com/foo/bar, and http://example.com/foo gets a 301 from the remote site to http://example.com/example1, the link http://example.com/foo/bar is changed to http://example.com/example1/bar. This may break the other link.
Actions:
1. The code needs to verify that a checkable link is surrounded by single or double quotes or blanks, so we can be sure it is a complete link and not only a chunk of a link.
2. Or the code needs to update only links inside a link element (<a href="http://example.com/foo/bar">example</a>).
3. Other links should be changed by hand only, to be safe.
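
Action 1 could be sketched roughly as follows. This is a minimal Python illustration, not the module's code: the function name `replace_whole_url` and the choice of delimiters (quotes, whitespace, or start/end of text) are assumptions made for the example.

```python
import re


def replace_whole_url(text: str, old_url: str, new_url: str) -> str:
    """Replace old_url only where it stands as a complete link.

    A match counts as complete when it is not preceded and not followed
    by any character other than a quote or whitespace (start and end of
    the text also count as boundaries).
    """
    pattern = re.compile(
        r"(?<![^\s\"'])" + re.escape(old_url) + r"(?![^\s\"'])"
    )
    # A function is used as the replacement so that backslashes or group
    # references inside new_url are not interpreted by re.sub.
    return pattern.sub(lambda match: new_url, text)


text = (
    '<a href="http://example.com/foo">a</a> '
    '<a href="http://example.com/foo/bar">b</a> '
    "see http://example.com/foo now"
)

# Naive replacement also rewrites the longer link, which is the bug.
naive = text.replace("http://example.com/foo", "http://example.com/example1")

# Boundary-checked replacement leaves http://example.com/foo/bar alone.
safe = replace_whole_url(
    text, "http://example.com/foo", "http://example.com/example1"
)
```

The check is deliberately conservative: a plain-text link followed directly by punctuation is left unchanged and would still need to be edited by hand.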
| Comment | File | Size | Author |
|---|---|---|---|
| #5 | linkchecker_replace_only_save_urls2-D52.patch | 6.26 KB | hass |
| #5 | linkchecker_replace_only_save_urls2-D62.patch | 6.26 KB | hass |
| #4 | linkchecker_replace_only_save_urls-D52.patch | 10.46 KB | hass |
| #4 | linkchecker_replace_only_save_urls-D62.patch | 10.46 KB | hass |
Comments
Comment #1
hass commented
Comment #2
AlexisWilke commented
What would the browser do if http://example.com/example1/bar is incorrect?
Comment #3
hass commented
It may be listed on the next link check as "file not found", but we cannot say for sure how others configure their servers. If they answer every link that does not exist with a 301 to their home page, we change the link again to http://example.com/, and then you have lost this link. You may not be able to reconstruct the original link, and then it is gone.
I have added very specific regexes to the extraction process within the last few days. We need to adapt this and use regex replace with nearly the same regexes, replacing only links we know for sure are inside a link or one of the other HTML elements, and not merely part of a link written in plain text.
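
As a rough illustration of that idea (not the regexes used by the module or the attached patches), an attribute-aware replacement might look like the Python sketch below. The function name `replace_in_link_attributes`, the element list (`a`, `area`, `img`, `link`) and the attribute list (`href`, `src`) are all assumptions chosen for the example.

```python
import re


def replace_in_link_attributes(html: str, old_url: str, new_url: str) -> str:
    """Replace old_url only when it is the full value of a quoted
    href/src attribute on one of a few known elements."""
    pattern = re.compile(
        # Group 1: tag start up to and including 'href=' or 'src='.
        r"(<(?:a|area|img|link)\b[^>]*?\b(?:href|src)\s*=\s*)"
        # Group 2: the opening quote, which must also close the value.
        r"([\"'])" + re.escape(old_url) + r"\2",
        re.IGNORECASE,
    )
    # Rebuild the attribute with the same quote character on both sides.
    return pattern.sub(
        lambda m: m.group(1) + m.group(2) + new_url + m.group(2), html
    )


html = (
    "<p>Plain http://example.com/foo text</p>"
    '<a href="http://example.com/foo">x</a>'
    "<a href='http://example.com/foo/bar'>y</a>"
)

result = replace_in_link_attributes(
    html, "http://example.com/foo", "http://example.com/example1"
)
```

Because the closing quote must follow the old URL immediately, http://example.com/foo/bar is never touched, and the plain-text occurrence is left for manual editing.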
Comment #4
hass commented
Patches attached.
Comment #5
hass commented
New patches: fewer lines, should be faster, and numbering bugs in the replacement string are fixed.
Comment #6
hass commented
The above patches are partly no-brainers...
I've committed better patches that have been tested on some links and seem to work reliably. It would be good to hear from some people whether they work well or not.