Drupal uses UTF-8 for content, so DOMDocument needs to be told about this. The simplest solution is to prefix the partial HTML with a charset meta element.
$dom->loadHTML($html);
Needs to become
$dom->loadHTML('<meta charset="UTF-8">' . $html);
Sorry for not providing a patch. Not able to roll one right now.
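For anyone who wants to reproduce this outside of Drupal, here is a minimal standalone sketch of the change described above. The helper name `first_paragraph_text()` and the sample string are made up for illustration and are not part of smart_trim. It assumes a reasonably recent libxml2; older builds may only recognise the longer `http-equiv` form of the meta element.

```php
<?php
// Hypothetical helper (not part of smart_trim): parse an HTML fragment and
// return the text of its first <p> element.
function first_paragraph_text(string $html, bool $declare_utf8): string {
  if ($declare_utf8) {
    // The fix from this issue: tell the parser the fragment is UTF-8.
    $html = '<meta charset="UTF-8">' . $html;
  }
  $dom = new DOMDocument();
  // Suppress parser warnings about imperfect markup.
  @$dom->loadHTML($html);
  $p = $dom->getElementsByTagName('p')->item(0);
  return $p === NULL ? '' : $p->textContent;
}

// "\u{00E9}" is e-acute, written as an escape so the example does not depend
// on the encoding of this file.
$html = "<p>Caf\u{00E9}</p>";

// With the prefix, the text should survive the round trip unchanged.
var_dump(first_paragraph_text($html, TRUE) === "Caf\u{00E9}");
// Without it, the parser has to guess the encoding, so this may be FALSE.
var_dump(first_paragraph_text($html, FALSE) === "Caf\u{00E9}");
```

Comparing the two results on your own environment is a quick way to check whether the prefix is being honoured there.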
Comment | File | Size | Author |
---|---|---|---|
#6 | 2639188-6.patch | 5.83 KB | alexpott |
#5 | smart_trim-encoding-issue-2639188-5.patch | 1.61 KB | mikeyk |
#2 | smart_trim-encoding-issue-2639188-2.patch | 527 bytes | mikeyk |
Comments
Comment #2
mikeyk commented
@casey - Thanks for posting, I was having the same issue too which this fixes.
Attached is a patch with your change.
Comment #3
mikeyk commented
Comment #4
adamwhite at JMR Logics commented
I've tested and applied this on the latest dev and it solves the reported problem for me.
Comment #5
mikeyk commented
@casey @adamwhite - Attached is an updated patch to solve this problem. I found the original solution worked fine on our test environment (PHP 5.x, Windows) but not on our production site (PHP 7.x, Linux). This patch works fine on both.
I'm not entirely sure what causes the difference. It could be the PHP version, the OS, or something else entirely, but it would be good to have feedback on this.
Comment #6
alexpott commented
Drupal 8 has some helper functions for manipulating snippets of HTML using DOMDocument, so let's use them. This reduces some of the complexity. I've added tests too. We need to seriously beef up the test coverage of TruncateHTML: it has complex recursive logic and has to deal with user input, so it is super easy to break. For example, in 8.x-1.x any text that is not wrapped in an HTML element like div is wrapped with a p tag. The patch here fixes that too, because it uses the core tools to manipulate the HTML snippet in a DOMDocument.
Given that the whole point of smart trim is to trim user input, I think if it is messing with said user input and corrupting it we should consider that a critical bug for this module.
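For readers without the patch at hand: the core helpers referred to are presumably `Html::load()` and `Html::serialize()` in `Drupal\Component\Utility\Html` (an assumption, since the comment does not name them). The following plain-PHP sketch only approximates that approach so it can run outside Drupal. The function names `load_snippet()` and `serialize_snippet()` are hypothetical, and this is neither the core code nor the contents of the patch.

```php
<?php
// Hypothetical stand-in for a "load" helper: wrap the snippet in a full
// document that declares UTF-8, so the parser does not guess the encoding
// and bare text ends up directly inside <body>.
function load_snippet(string $html): DOMDocument {
  $document = '<!DOCTYPE html><html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
    . '</head><body>' . $html . '</body></html>';
  $dom = new DOMDocument();
  // Suppress parser warnings about imperfect user markup.
  @$dom->loadHTML($document);
  return $dom;
}

// Hypothetical stand-in for a "serialize" helper: return only the children
// of <body>, dropping the wrapper added by load_snippet().
function serialize_snippet(DOMDocument $dom): string {
  $body = $dom->getElementsByTagName('body')->item(0);
  $out = '';
  if ($body !== NULL) {
    foreach ($body->childNodes as $node) {
      $out .= $dom->saveHTML($node);
    }
  }
  return $out;
}

$dom = load_snippet('plain <em>text</em>');
echo serialize_snippet($dom), "\n";
```

Because the snippet is placed inside an explicit body element, text that is not wrapped in any element stays a plain text node, which is the behaviour change described above.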
Comment #8
markie at Mediacurrent commented
All,
I have added patch #6 and pushed it. Please test the dev version and verify this is working for you. If no one screams about it in the next week I want to do a full release by Wednesday.
thanks!
Comment #9
mbaynton commented
Just encountered encoding issues on trimmed text and can confirm a git pull fixed me right up. Thanks!