The drupal_html_to_text() function was broken by collateral damage from #356074: Provide a sequences API

This is a regression in d7 versus d6 that affects other projects such as Simplenews.

A comprehensive patch/fix is being discussed, but may never get resolved to everyone's satisfaction. As a stopgap, therefore, I suggest that we merely reverse the damage so that the d7 function performs at least as well as the d6 version.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

pillarsdotnet’s picture

Status: Active » Needs review
FileSize
1008 bytes
1008 bytes

Patches for d7 and d8:

Simon Georges’s picture

Subscribe, in case this one goes faster than the original one (#299138: Improve \Drupal\Core\Utility\Mail::htmlToText()).

pillarsdotnet’s picture

You can help by posting a review of the patch in #1. An acceptable review would:

  • Confirm that the patch actually solves the problem. Describe a reproducible test where the current code fails and the patched code succeeds.

  • Change the issue status from "Needs Review" to "Reviewed and Tested By the Community".

If, after thorough testing, you cannot in good conscience affirm the above, please explain what is lacking in the current patch and I will do my best to address your concerns.

drupjab’s picture

Is this the same patch as http://drupal.org/node/299138#comment-4193400 (which is for D7)? It looks like it from the text in the actual patch code, but just want to confirm.

pillarsdotnet’s picture

@rrcjab -- Yes, it's the same patch; just re-rolled against a fresh 8.x checkout.

pillarsdotnet’s picture

Status: Closed (duplicate) » Needs review
Issue tags: -Quick fix, -quickfix
drupjab’s picture

@pillarsdotnet Both of them (D7 & D8) have worked from my functional tests. Plain text, full and filtered HTML. Your test suite looks good to me too. I read the discussion, so what's the next step to actually get them in the next official release(s)?

pillarsdotnet’s picture

Issue tags: +Quick fix, +quickfix

@rrcjab -- The next step is someone other than the patch-writer having sufficient confidence and fortitude to mark the issue RTBC. After that, well... (looking) The oldest patch in a similar status has been waiting for less than a month.

Damien Tournoud’s picture

Status: Needs review » Closed (duplicate)
drupjab’s picture

@pillarsdotnet - Well, it was a good effort, :/

I'll start catching up on the other thread now.

pillarsdotnet’s picture

Issue tags: -Needs committer feedback

Patch in #1 still applies, and the the other issue may still be in debate when D9 ships. Let's fix what we broke in D7 first.

pillarsdotnet’s picture

Issue tags: +Quick fix, +Needs committer feedback, +quickfix
Damien Tournoud’s picture

Status: Needs review » Closed (duplicate)
pillarsdotnet’s picture

Status: Closed (duplicate) » Needs review

It is still a duplicate of #299138:

No, it's not. This is a only a single-line quick-fix reversal of a known regression introduced in D7. #299138 is a much larger issue, which no one is willing to commit until it reaches a level of perfection that may not happen in my lifetime.

Damien Tournoud’s picture

Status: Needs review » Needs work
Issue tags: -Regression, -quickfix +Needs tests

Well. The other issue transformed into a massive feature request (partly on your own impulse... support for tables... really?), so let's keep this one for the bug fix.

Can we incorporate some of the tests from the other issue?

Damien Tournoud’s picture

Title: Regression: restore original functionality to drupal_html_to_text(). » Regression: line-breaks are mangled by drupal_html_to_text()
Priority: Normal » Major
pillarsdotnet’s picture

@#15 -- Added tests, made them pass, (by "expecting" whatever results were generated), then removed the ones which are totally n/a.

support for tables... really?

Wasn't that hard, actually, since I already had a fully-parsed DOM tree.

pillarsdotnet’s picture

Status: Needs work » Needs review

Looks like testbots are fixed.

Status: Needs review » Needs work
Issue tags: -Quick fix, -Needs tests, -Needs backport to D7

The last submitted patch, drupal_html_to_text-fix_regression-1130198-17.patch, failed testing.

pillarsdotnet’s picture

Issue tags: -Needs backport to D7

Dunno; tests pass locally and the testbot is not displaying any of the $this->verbose() output. Trying again...

The last submitted patch, drupal_html_to_text-fix_regression-1130198-17.patch, failed testing.

pillarsdotnet’s picture

Moved the verbose() text back into the assertion message so I can figure out why the tests are failing.

pillarsdotnet’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, drupal_html_to_text-fix_regression-1130198-22.patch, failed testing.

pillarsdotnet’s picture

Status: Needs work » Needs review
FileSize
17.93 KB

Grrr....

Status: Needs review » Needs work

The last submitted patch, drupal_html_to_text-fix_regression-1130198-25.patch, failed testing.

pillarsdotnet’s picture

Status: Needs work » Needs review
FileSize
19.25 KB

Status: Needs review » Needs work

The last submitted patch, drupal_html_to_text-fix_regression-1130198-26.patch, failed testing.

pillarsdotnet’s picture

Status: Needs work » Needs review
FileSize
20.79 KB

Status: Needs review » Needs work

The last submitted patch, drupal_html_to_text-fix_regression-1130198-28.patch, failed testing.

pillarsdotnet’s picture

Status: Needs work » Needs review
FileSize
21.73 KB

Status: Needs review » Needs work
Issue tags: -Quick fix, -Needs tests, -Needs backport to D7

The last submitted patch, drupal_html_to_text-fix_regression-1130198-31.patch, failed testing.

pillarsdotnet’s picture

Status: Needs work » Needs review
Issue tags: +Quick fix, +Needs tests, +Needs backport to D7
Damien Tournoud’s picture

Status: Needs review » Needs work
+/**
+ * Unit tests for drupal_html_to_text().
+ */
+class DrupalHtmlToTextTestCase extends DrupalUnitTestCase {
[...]
+  /**
+   * Test that footnote references are properly generated.
+   */
+  function testFootnoteReferences() {
[...]
+      . "\n[3] " . url('node/1', array('absolute' => TRUE))
[...]
+  }
+}

There is no way, ever, to use url() in a DrupalUnitTestCase, as this function needs to access the database to resolve URL aliases. I'm not sure how the tests can pass for your locally :)

Damien Tournoud’s picture

Please switch to DrupalWebTestCase. It is simply not valid to use url() inside a DrupalUnitTestCase. I'm not completely sure why it still somehow work here, running this through the UI rightly gives me a PDO error.

Also, could you please add @todo everywhere in the test where you think the current behavior is not the intended one? Specifically, testDrupalHtmltoTextCollapsesWhitespace should say that the current behavior is in no way correct.

Damien Tournoud’s picture

Issue tags: -Needs tests
pillarsdotnet’s picture

Status: Needs review » Needs work
Issue tags: -Needs tests

Also, could you please add @todo everywhere in the test where you think the current behavior is not the intended one?

That would be just about everywhere, all right. Working on it...

pillarsdotnet’s picture

Status: Needs work » Needs review
Issue tags: +Needs backport to D7
FileSize
22.16 KB

Switched to DrupalWebTestCase and added @todo lines. Removing the "Quick fix" tag, as it no longer applies.

The last submitted patch, drupal_html_to_text-fix_regression-1130198-39.patch, failed testing.

pillarsdotnet’s picture

Damien Tournoud’s picture

+      // @todo Footer urls should be absolute.
       '<a href = "/">Homepage</a>' => "Homepage [1]\n\n[1] /\n",

This is likely to fail, and likely to be due to the use of DrupalUnitTestCase before.

The rest looks super nice. Waiting for the test bot verdict.

pillarsdotnet’s picture

Bot is green.

Damien Tournoud’s picture

Issue tags: +Needs backport to D7

Weirdest thing. The tests are failing here locally, both on 8.x and 7.x. It see the proper behavior around absolute URLs:

result = "Homepage [1]\n\n[1] http://drupal.lab/\n"
expected = "Homepage [1]\n\n[1] /\n"

It seems that the test bots are running with a broken base URL.

Anyway, you should restore those calls to url() in there, they seem valid.

pillarsdotnet’s picture

Will do, after I re-test locally. @sun gave me some pointers on IRC. My local test must run with clean urls turned off, and with Drupal installed in a subdirectory, else the results aren't valid.

pillarsdotnet’s picture

sun’s picture

pillarsdotnet’s picture

pillarsdotnet’s picture

Issue and patch summary

October 15, 2009
Dries committed a patch from #407452-58.
May 21, 2010
Arjenk posted #299138-48 which claimed that part of the patch had broken drupal_html_to_text() functionality:
diff -u -p -r1.25 mail.inc
--- includes/mail.inc	1 Sep 2009 17:40:27 -0000	1.25
+++ includes/mail.inc	15 Oct 2009 05:58:51 -0000
@@ -466,7 +496,7 @@ function drupal_html_to_text($string, $a
         $chunk = $casing($chunk);
       }
       // Format it and apply the current indentation.
-      $output .= drupal_wrap_mail($chunk, implode('', $indent)) . "\n";
+      $output .= drupal_wrap_mail($chunk, implode('', $indent));
       // Remove non-quotation markers from indentation.
       $indent = array_map('_drupal_html_to_text_clean', $indent);
     }
August 1, 2010
Kscheirer posted #299138-50, explaining:

What wasn't obvious (because we had no tests for this function!) was that many of the tags were relying on that extra \n to format themselves properly. Stuff like h1, h2, dl, br, p, ol, ul, li, dl, dt, dd, etc. I think your patch was trying to make this function more general - but it doesn't seem to fulfill its original purpose now. Also _drupal_html_to_text_pad() is doing a substr, relying on the fact that there was an extra \n to throw away.

August 31, 2010
After some discussion, Kscheirer summarized in #299138-56:

Agreed, we're back to square one.

  • Define what exactly this function is supposed to do
  • Write tests for it
  • Fix the function to pass tests
April 17, 2011
Pillarsdotnet posted a patch in #299138-75 to reverse the chunk that caused the initial problem.
Sun changed the status back to "needs work" with #299138-76, saying:

It would tremendously help if people would re-roll the proper patch. See #57.

Pillarsdotnet posted a revised patch in #299138-84 which passed all but one of the proposed tests, saying:

Removed the "Full HTML Document" test because:

  1. I couldn't get it to pass no matter what I did.
  2. I think it's going about things the wrong way.
Sun rejected the patch in #299138-86, saying:

Removing a test that has been added for a good reason is a bad idea. The stated reasons are, in fact, critical reasons to keep it.

Pillarsdotnet opened #1130198 to offer the patch from #299138-75 as an interim solution.
April 23, 2011
Damien Tournoud closed #1130198-9, saying:

This is an obvious duplicate of #299138: Improve \Drupal\Core\Utility\Mail::htmlToText()

April 17 - June 4, 2011
An additional 58 patches were posted to #299138 (34 of which passed all tests), and the patch size grew from 9.5Kb to 42.5 Kb as newly-requested features and tests were added.
June 4, 2011
Pillarsdotnet re-opened #1130198-11, saying:

Let's fix what we broke in D7 first. ... This is a only a single-line quick-fix reversal of a known regression introduced in D7.

After some discussion, Damien Tournoud agreed, and requested in #1130198-15:

Can we incorporate some of the tests from the other issue?

June 5, 2011
Damien Tournoud added in #1130198-36:

Also, could you please add @todo everywhere in the test where you think the current behavior is not the intended one?

June 6, 2011
Pillarsdotnet posted a patch in #1130198-46 that appears to resolve all objections raised so far.
pillarsdotnet’s picture

Adding tests-only and tests+fix patch to highlight exactly what this patch changes.

clayball’s picture

Status: Needs review » Reviewed & tested by the community

The patch applied cleanly to 8.x and 7.x.
Email from my 7.x install looks MUCH better now.
Thank you for doing this.

Status has been updated.
However, maybe someone else would like to/should confirm?

webchick’s picture

Status: Reviewed & tested by the community » Needs review

Yeeaahhh...

pillarsdotnet, please don't shoot me, but I would feel better about sign-off on this from Damien, since he seems to be one of few people who understands the intricacies here tech-wise and could give a final sign-off review.

I would really like to get this into 7.3, though. Thanks so much for your work on this!!

pillarsdotnet’s picture

(sigh)

pillarsdotnet’s picture

attiks’s picture

subscribing

Damien Tournoud’s picture

Status: Needs review » Reviewed & tested by the community

This patch is perfect. Thanks @pillarsdotnet for your hard work on this.

*Please*, let's get this in 7.3. It's part of the things in Drupal 7 that are *so broken* it's not even funny.

Damien Tournoud’s picture

In addition, the fix of this patch is so tiny and it adds so many tests that it cannot possibly break anything.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Sorry. This was marked RTBC too late for 7.3/7.4. But 7.5, that sure sounds nice! :)

Great work on this patch, and all of the extensive tests. Really happy to see this fixed, and I know all mail-using Drupal folks will be, too!

Committed to 8.x and 7.x.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

coredumperror’s picture

Priority: Major » Normal
Status: Closed (fixed) » Needs review
FileSize
648 bytes

I'm sorry to bring this up in such an old, closed issue, but there's a slight problem with this patch: it doesn't support the $conf['mail_line_endings'] override that MAIL_LINE_ENDINGS's documentation refers to, on line 11 of includes/mail.inc. The only other place in core that refers to MAIL_LINE_ENDINGS (modules/system/system.mail.inc) uses it like this:

    $line_endings = variable_get('mail_line_endings', MAIL_LINE_ENDINGS);

While this patch uses it without the possibility of override:

      $output .= drupal_wrap_mail($chunk, implode('', $indent)) . MAIL_LINE_ENDINGS;

I'm bumping this down to normal, since it only affects people who use the override.

I've attached a patch that remedies this problem. However, I have Drupal 7.16, so my patch may not have the right line numbers for Drupal 8's version of includes/mail.inc. It's just a one-liner, though, so it's easy enough to apply manually.

Status: Needs review » Needs work

The last submitted patch, mail-line_endings_fix-1130198-60.patch, failed testing.

sun’s picture

I also noticed that over in #1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5 — we should additionally adjust some tests (as in the patch over there) to force MAIL_LINE_ENDINGS to be \n and make mail formatting tests pass on Windows (which are failing currently, since the actual formatted output uses \r\n).

I asked myself why @coredumperror re-opened this issue instead of creating a separate follow-up, but yeah, #58 states this was committed to both D8 and D7, so it's indeed a regression.

ianthomas_uk’s picture

Title: Regression: line-breaks are mangled by drupal_html_to_text() » Regression: line-breaks are mangled by \Drupal\Core\Utility\Mail::htmlToText()
Issue summary: View changes
Berdir’s picture

Version: 8.0.x-dev » 7.x-dev
Status: Needs work » Closed (fixed)
Issue tags: -Needs backport to D7

Looks like that follow-up has been addressed somewhere else in Drupal 8, please open a new issue if it's still an issue for D7.