Drupal uses UTF-8 for content, so DomDocument needs to be told about this. The simplest solution is to prefix the partial HTML with a charset meta element.

$dom->loadHTML($html);

Needs to become

$dom->loadHTML('<meta charset="UTF-8">' . $html);
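For anyone who wants to see the effect outside the module, here is a minimal standalone sketch of the change above. The sample string and the fragment_text() helper are made up for illustration and are not part of smart trim. Without the hint the non-ASCII characters may come back garbled, and the exact result can depend on the PHP and libxml2 versions in use.

```php
<?php
// Hypothetical sample input: a UTF-8 HTML fragment with non-ASCII characters.
$html = '<p>Café naïve</p>';

/**
 * Hypothetical helper: parses markup with DOMDocument and returns its text.
 */
function fragment_text(string $markup): string {
  $dom = new DOMDocument();
  // Collect libxml parse warnings internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors(TRUE);
  $dom->loadHTML($markup);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  // loadHTML() builds a full document; documentElement is the <html> element.
  return $dom->documentElement->textContent;
}

// No charset hint: the parser has to guess the encoding of the bytes.
$without_hint = fragment_text($html);

// With the hint proposed above: the parser is told the bytes are UTF-8.
$with_hint = fragment_text('<meta charset="UTF-8">' . $html);

echo "without hint: $without_hint\n";
echo "with hint:    $with_hint\n";
```

Comparing the two printed lines on your own environment shows whether the prefix is enough there.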

Sorry for not providing a patch. Not able to roll one right now.


Comments

casey created an issue. See original summary.

mikeyk’s picture

@casey - Thanks for posting, I was having the same issue too which this fixes.
Attached is a patch with your change.

mikeyk’s picture

Status: Active » Needs review

adamwhite’s picture

I've tested and applied this on the latest dev and it solves the reported problem for me.

mikeyk’s picture

@casey @adamwhite - Attached is an updated patch to solve this problem. I found the original solution worked fine on our test environment (PHP 5.x, Windows) but didn't on our production site (PHP 7.x, Linux). This patch works fine on both.
I'm not entirely sure what the cause of the difference is. It could be the PHP version, the OS or something else entirely, but it would be good to have feedback on this.

alexpott’s picture

Priority: Normal » Critical

Drupal 8 has some helper functions to manipulate snippets of HTML using DomDocument, so let's use them. This reduces some of the complexity. I've added tests too. We need to seriously beef up the test coverage of TruncateHTML: it has complex recursive logic and has to deal with user input, so it is super easy to break. For example, in 8.x-1.x any text that is not wrapped in an HTML element like div gets wrapped in a p tag. The patch here fixes that too, because it uses the core tools to manipulate the HTML snippet in a DomDocument.
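To illustrate the general idea, here is a standalone sketch of the "wrap the snippet in a full UTF-8 document, then serialize only the body children" approach. This is an assumption about how such helpers work, not the committed patch: the comment above does not name the core functions (they are presumably Html::load() and Html::serialize() from Drupal\Component\Utility\Html), and load_fragment() and serialize_fragment() below are hypothetical stand-ins written in plain PHP so the example runs without Drupal.

```php
<?php
/**
 * Hypothetical helper: parses a UTF-8 HTML fragment inside a full document.
 */
function load_fragment(string $html): DOMDocument {
  // An explicit <head> with a charset declaration and an explicit <body>
  // tell the parser both the encoding and where the fragment lives.
  $document = '<!DOCTYPE html><html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
    . '</head><body>' . $html . '</body></html>';
  $dom = new DOMDocument();
  // Collect libxml parse warnings internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors(TRUE);
  $dom->loadHTML($document);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  return $dom;
}

/**
 * Hypothetical helper: returns only the markup that sits inside <body>.
 */
function serialize_fragment(DOMDocument $dom): string {
  $body = $dom->getElementsByTagName('body')->item(0);
  $html = '';
  if ($body !== NULL) {
    // Serialize each child in turn so the wrapper elements are left out.
    foreach ($body->childNodes as $node) {
      $html .= $dom->saveHTML($node);
    }
  }
  return $html;
}

// Hypothetical input: bare text (no wrapping element) followed by markup.
$fragment = 'Bare text with an é, then <strong>markup</strong>';
$round_trip = serialize_fragment(load_fragment($fragment));

echo $round_trip . "\n";
```

Because the fragment is placed inside an explicit body element, the bare text should come back without an added p wrapper, and the non-ASCII character should survive the round trip.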

Given that the whole point of smart trim is to trim user input, I think that if it is messing with said user input and corrupting it, we should consider that a critical bug for this module.

  • markie committed 21b9857 on 8.x-1.x authored by alexpott
    Issue #2639188 by mikeyk, alexpott: Encoding issue
    
markie’s picture

Status: Needs review » Fixed

All,
I have added patch #6 and pushed it. Please test the dev version and verify this is working for you. If no one screams about it in the next week I want to do a full release by Wednesday.

thanks!

mbaynton’s picture

Just encountered encoding issues on trimmed text and can confirm a git pull fixed me right up. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.