Problem/Motivation
Many HTML tags add horizontal space or a line break between them when rendered.
PlainTextOutput::renderFromHtml() simply removes HTML tags but does not add spaces, which can lead to a result that is surprising for users.
Steps to reproduce
Call PlainTextOutput::renderFromHtml('<p>Foo</p><p>Bar</p>'); .
The result will be FooBar but it would make more sense to have Foo Bar.
We noticed this when using HTML fields as metatag tokens. Some sentences are joined together without a space between them.
Proposed resolution
The method could add a space between each tag before stripping the tags.
Remaining tasks
Review
User interface changes
NA
Introduced terminology
NA
API changes
NA
Data model changes
NA
Release notes snippet
NA
| Comment | File | Size | Author |
|---|---|---|---|
| #7 | Screenshot 2023-07-11 at 5.36.21 PM.png | 48.9 KB | keshavv |
Issue fork drupal-3372446
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #3
prudloff commentedComment #4
cilefen commentedComment #5
smustgrave commentedComment #6
cilefen commented😉
Comment #7
keshavv commentedI have tested the MR with this code snippet.
It results

Comment #8
prudloff commentedI added some tests.
Comment #9
smustgrave commentedFixed up summary just slightly.
Left 1 comment on MR
If you are another contributor eager to jump in, please allow the previous poster at least 48 hours to respond to feedback first, so they have the opportunity to finish what they started!
Comment #10
smustgrave commentedYou are absolutely correct. That was the only feedback I had
Comment #11
nod_I'll put that at least to NW, it's possibly a won't fix situation depending on how much exists out there that could help with the situation.
So the fix here is to add a space between all tags, this breaks pretty fast, for example:
Here we would not expect a space to be added. Whitespace in HTML is very complex, see https://blog.dwac.dev/posts/html-whitespace/ so hand crafting rules by hand is not a reasonable solution. We could parse the string as HTML and use
textContentproperty fromDOMNodeand hope it does things correctly with html5.Comment #12
prudloff commentedI did a quick test like this:
But it seems it never adds any space.
A more robust solution would be to use the html2text library but it would be a bit overkill to add a new dependency just for this.
Comment #13
prudloff commentedI did more tests and I agree we won't be able to handle various scenarios without having complex rules or add a new dependency, so I'm closing as wontfix.
Sites that need this can use a library like html2text instead.
Comment #14
anybodyWe recently ran into this with Metatag module and created #3555964: PlainTextOutput::renderFromHtml removes important whitespace from HTML as we didn't find this one.
Over there, I suggested the following solutions:
From my perspective, this definitely isn't a feature request, but a (major) bug! The issue with this is, that the outcome of the function is wrong and unexpected and it's used in several places in core and contrib.
One super widely used that has wrong results that are a real-world issue for SEO is Metatags: #3555694: :summary Tokens with HTML are shortened too much - first strip HTML, then trim
Should we reopen this and call it a bug or proceed in #3555964: PlainTextOutput::renderFromHtml removes important whitespace from HTML?
Using a library here instead of the poor-man's
strip_tags()might also be able to fix #2896735: PlainTextOutput::renderFromHtml may render potentially dangerous outputI totally understand that this was closed as feature-request, but after I saw the consequences in Metatag and potentially other core places, I'd strongly vote to improve the situation someway.
Comment #15
michael.j.gleeson commentedHave to second this,
It is a bug, the metatags are not rendering correctly.
I am getting spaces not being rendered at the end of sentences and headers running into paragraphs.
The HTML tag stripping is not allowing for common sense white spaces.
e.g.
meta description="HeaderThis is my paragraph......"
Comment #16
michael.j.gleeson commentedFor those finding this issue one solution is to use Smart trim, See documentation