From time to time, after Link checker checks the links in a node, `&#13;` appears in the body at every line break.
How can I stop it?
Comment | File | Size | Author |
---|---|---|---|
#28 | linkchecker_add_on_every_linebreak-2454335-28.patch | 734 bytes | Skabbkladden |
Comments
Comment #1
Dandily commented
Comment #2
Haessler commented: `&#13;` on every line. As I saw in the history, this error was already present in Drupal 6. Simply unprofessional.
Comment #3
Dandily commented: Are we talking about the same thing?
In my case, if a node contains a link (in the body or in another field), Link checker re-saves it together with the body.
And the body changes to this:
If Link checker checks this node again, it becomes:
And so on.
I think this happens when Link checker finds a -11000 error or some other error, because it does not add `&#13;` to the body every day.
Where should I look to find this error and fix it?
Comment #4
hass commented: I don't think this is caused by Link checker; it is your content and filter settings. I guess you copied and pasted the content from somewhere on an Apple Mac, which added `\r` characters to the body. The core content filters now correct these `\r` characters, which are disallowed, and therefore replace them with an HTML entity. What the core filters do is not in Link checker's hands. Please check your core filter settings. Link checker is only a victim, as it is the next user that saves the node.
If this is a Link checker issue, you need to write a reproduction case first. I have never seen this myself and have no idea how to reproduce it. I think you may also see this issue with other manual edits, but maybe you need to switch between Mac and Windows.
http://stackoverflow.com/questions/5082253/what-is-html-entity-13
Comment #5
hass commented
Comment #6
Dandily commented: I haven't yet tried to reproduce this from Windows with a clean Drupal installation, but I have about 4 Drupal sites and I write articles on them from my Mac. This problem appears only on the site running Link checker, and only in nodes whose body text contains links.
I ran some experiments, and now I can say that it happens only if one of the links returns a 301 (Moved Permanently) response.
I made a custom module page with:
In the Link checker settings, I have "Update permanently moved links: two times".
After I add a new link to my custom page in the body text of some node and run cron twice (so the link is checked twice), those symbols appear on every line of the body text.
After looking into Link checker's code, I found that those symbols appear in the function _linkchecker_link_replace(), around line 1951 of linkchecker.module. The variable $text receives these changes in a line near the end of the function:
Adding some lines after it:
... solves my problem, but it's not the best way...
Comment #7
hass commented: Can you check whether it is also there at the beginning of this function?
Comment #8
Dandily commented: I checked this before and after.
It only shows up after, if I first clear the old "13" from the node. With the old "13" still in the body, it shows up at both points.
Comment #9
hass commented: Sounds like a PHP configuration issue or a bug in filter_dom_serialize() or filter_dom_load().
Can I get a file dump of such a $text from before the change, please? Then I can check whether I get the same result on my box.
Comment #10
hass commented: Please try this. Add it to linkchecker.module at line 1984.
Comment #11
Dandily commented: In a clean linkchecker.module, line 1984 is:
Do I need to place the code between lines 1983 and 1984, or after 1984?
Comment #12
hass commented: No, earlier. Compare the line numbers with the latest dev version.
Comment #13
Dandily commented: I have version 7.x-1.2, but I looked at the dev version and placed this code after:
And yes, it works: I see this message in the log.
Comment #14
hass commented: DEV is stable. Don't be afraid to install it. You can just go back to 7.x-1.2 after the tests.
Does the `$text` variable at line 1984 already contain the entities? If so, dump `$node` at line 619 (linkchecker.module) and see whether it already contains the HTML entities. I guess it will. What PHP version are you running?
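A minimal sketch for that check, assuming you can pass the string being dumped (`$text`, or a body value taken from `$node`) to a plain PHP function. `debug_line_endings()` is a hypothetical helper, not part of Link checker or Drupal core; it counts CRLF pairs, lone CRs, bare LFs and literal `&#13;` entities, so comparing a dump taken before and after a step shows where raw `\r` characters turn into entities.

```php
<?php
// Hypothetical debugging helper (not part of Link checker or Drupal core):
// summarize which line endings and how many &#13; entities a string contains.
function debug_line_endings(string $text): array {
  $crlf = substr_count($text, "\r\n");
  return array(
    // Windows-style line endings (0x0d 0x0a).
    'crlf' => $crlf,
    // Carriage returns not followed by a line feed (old Mac style).
    'lone_cr' => substr_count($text, "\r") - $crlf,
    // Line feeds not preceded by a carriage return (Unix style).
    'lf_only' => substr_count($text, "\n") - $crlf,
    // Carriage returns that have already been serialized as an entity.
    'entity_13' => substr_count($text, '&#13;'),
  );
}

// One CRLF, one bare LF, one lone CR and one entity:
// crlf => 1, lone_cr => 1, lf_only => 1, entity_13 => 1
print_r(debug_line_endings("a\r\nb\nc\rd&#13;"));
```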
Comment #15
hass commented
Comment #16
vinmassaro commented: This is an issue for us on nodes whose links have been updated by Link checker 7.x-1.3. Please let me know any information I can provide to help debug this. Thanks.
Comment #17
hass commented: See above. It is still unclear how to reproduce this, and the root cause needs to be identified. If you can help here, we may be able to solve the issue.
Comment #18
hass commented: No feedback.
Comment #19
vinmassaro commented: If I get time to test, I will reopen. Thanks.
Comment #20
alt36 commented: I think the problem does indeed come about due to the way filter_dom_serialize() works. Consider the following PHP snippet:
$text = "first line
second line";
print_r(filter_dom_serialize(filter_dom_load($text)));
If the newline in the middle of $text is a CRLF (i.e. 0x0d 0x0a, achieved by "set ff=dos" in vim), then the printed output contains `&#13;` in the middle. If instead the newline is just an LF (i.e. 0x0a, achieved by "set ff=unix" in vim), then there's no `&#13;` in the output.
Looking at the filter_dom_serialize() source, I think this arises in part from its use of saveXML(). Continuing my example, if you then run
$dom = filter_dom_load($text);
print($dom->saveHTML());
print($dom->saveXML());
then the generated `<body>` node is the same in both cases if $text uses an LF, but in the case of a CRLF, saveXML() includes the `&#13;` entity whereas saveHTML() does not.
Comment #21
alt36 commented: Code to reproduce: create a node with the following fragment (which includes a link that I've checked returns a 301):
Then run Link checker; a `&#13;` entity will appear in the node body because of the way CRLFs are handled, as described in my previous comment.
Comment #22
vinmassaro commented: Reopening because @alt36 provided some new information. I'm not sure how to work around it.
Comment #23
hass commented: That is an interesting analysis. Now we only need to figure out why this issue does not show up in core, and add the same code... I suspect there is a regex somewhere that converts all Windows/Mac line feeds to Unix line feeds before the text is saved.
Comment #24
alt36 commented: Grepping for \r in the code of the core filter module, I find
https://api.drupal.org/api/drupal/modules!filter!filter.module/function/...
which uses a str_replace() to standardise line endings:
check_markup() isn't called inside the filter module itself, but it is called from, e.g., https://api.drupal.org/api/drupal/modules%21field%21modules%21text%21tex... so it seems like a plausible thing to add to Link checker.
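A minimal sketch of that kind of line-ending normalization, assuming the goal is to turn Windows (CRLF) and old Mac (CR) line endings into a single LF before any DOM processing. `normalize_newlines()` is a hypothetical helper name, not a core or Link checker function. The order of the search array matters: `"\r\n"` has to be replaced before a lone `"\r"`, otherwise every CRLF would become two newlines.

```php
<?php
// Hypothetical helper: convert Windows (CRLF) and old Mac (CR) line endings
// to Unix (LF). str_replace() applies the search array in order, so "\r\n"
// must come before "\r"; the reverse order would turn each CRLF into "\n\n".
function normalize_newlines(string $text): string {
  return str_replace(array("\r\n", "\r"), "\n", $text);
}

// Three lines with mixed endings come out separated by plain LFs.
echo normalize_newlines("first line\r\nsecond line\rthird line");
```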
Comment #25
hass commented: The same code is in the Link checker module. See http://cgit.drupalcode.org/linkchecker/tree/linkchecker.module?h=7.x-1.x...
I only use the core API to save nodes, so the code should follow the same code flow as a node form save. Normally, code in node forms is never altered. Filters run only on node view.
Comment #26
alt36 commented: Thanks for the info, @hass. The str_replace() call you link to in linkchecker.module happens too late in this case, though, because by that point $text already contains `&#13;` instead of `\r`, which happens due to the call to filter_dom_serialize() in _linkchecker_link_replace(). I've not fully traced all the function calls, but if I print some simple debug messages, I can see that _linkchecker_link_replace() is called before _linkchecker_check_markup(). So perhaps one option would be to do the same str_replace() immediately before the filter_dom_serialize() call at http://cgit.drupalcode.org/linkchecker/tree/linkchecker.module?h=7.x-1.x... ?
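A standalone illustration of the effect described here, using plain PHP DOM calls instead of Drupal's filter_dom_load()/filter_dom_serialize(). It assumes the relevant part is that nodes are serialized with saveXML(), as comment #20 describes. A raw carriage return inside a text node is written out as a character reference (`&#13;` in the reports in this thread; the exact spelling, decimal or hex, may depend on the libxml2 version), while the same text normalized to LF first serializes without one. `serialize_paragraph()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: put $text into a <p> text node and serialize that
// element with DOMDocument::saveXML(), the method filter_dom_serialize() uses.
function serialize_paragraph(string $text): string {
  $dom = new DOMDocument('1.0', 'UTF-8');
  $p = $dom->createElement('p');
  // createTextNode() stores the string verbatim, including any "\r".
  $p->appendChild($dom->createTextNode($text));
  $dom->appendChild($p);
  return $dom->saveXML($p);
}

// CRLF input: the CR is escaped as a character reference in the XML output.
echo serialize_paragraph("first line\r\nsecond line"), "\n";
// Same text with the CRLF normalized to LF first: no character reference.
echo serialize_paragraph(str_replace(array("\r\n", "\r"), "\n", "first line\r\nsecond line")), "\n";
```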
Comment #27
oggsmith commented: I'm also seeing the same issue. Saving content doesn't add `&#13;`, but Link checker auto-fixing broken links does.
Comment #28
Skabbkladden at Digitaliseringsdirektoratet commented: After some testing, it seems that doing the str_replace() call before building the DOM in _linkchecker_link_replace() (around line 2141) solves the problem. See patch.
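A rough standalone sketch of the approach described in this comment, not the patch itself and not Link checker's `_linkchecker_link_replace()`. Assumptions: `replace_link_normalized()` is a hypothetical function; it loosely mimics filter_dom_load() by wrapping the fragment in a `<body>`, mimics filter_dom_serialize() by serializing the body's children with saveXML(), and only rewrites `href` attributes that exactly match the old URL. The point it shows is ordering: once line endings are normalized before parsing, no raw `\r` reaches the DOM, so none can come back out as an entity.

```php
<?php
// Hypothetical sketch of the ordering in comment #28: normalize line endings
// before the HTML is parsed into a DOM, then rewrite links and serialize.
function replace_link_normalized(string $text, string $old_url, string $new_url): string {
  // Step 1: CRLF/CR -> LF, so no raw "\r" ever reaches the DOM.
  $text = str_replace(array("\r\n", "\r"), "\n", $text);

  // Step 2: parse the fragment inside a <body>, loosely like filter_dom_load().
  $dom = new DOMDocument();
  @$dom->loadHTML('<!DOCTYPE html><html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '</head><body>' . $text . '</body></html>');

  // Step 3: point every link to the old URL at the new URL.
  foreach ($dom->getElementsByTagName('a') as $link) {
    if ($link->getAttribute('href') === $old_url) {
      $link->setAttribute('href', $new_url);
    }
  }

  // Step 4: serialize the body's children with saveXML().
  $body = $dom->getElementsByTagName('body')->item(0);
  $html = '';
  foreach ($body->childNodes as $child) {
    $html .= $dom->saveXML($child);
  }
  return $html;
}

$fragment = "<p>first line\r\nsee <a href=\"http://old.example.com/\">this page</a></p>";
echo replace_link_normalized($fragment, 'http://old.example.com/', 'http://new.example.com/');
```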
Comment #29
hass commented: It is still not clear to me where and why the entities are added. Doing random string replacements is not the right way to solve bugs.
Can you write a test that shows how it breaks, please?