Problem/Motivation

CKEditor does a somewhat expected feature if you add a space to the end of an <h2> tag, it will preserve the space by converting it to &nbsp;. This is true for both Drupal 10.1 and 10.2.

The difference that is problematic for TOC_api is that that whitespace was previously converted to a space character in Drupal 10.1 is now left alone as a &nbsp; in 10.2.

The result toc_api/src/TocBuilder.php:90

Warning: DOMDocumentFragment::appendXML(): Entity: line 2: parser error : Entity 'nbsp' not defined in Drupal\toc_api\TocBuilder->renderContent() (line 90 of modules/contrib/toc_api/src/TocBuilder.php).

Drupal\toc_api\TocBuilder->renderContent(Object) (Line: 176)
Drupal\toc_filter\Plugin\Filter\TocFilter->process('TEXT HERE', 'en') (Line: 118)
Drupal\filter\Element\ProcessedText::preRenderText(Array)
call_user_func_array(Array, Array) (Line: 111)
Drupal\Core\Render\Renderer->doTrustedCallback(Array, Array, 'Render #pre_render callbacks must be methods of a class that implements \Drupal\Core\Security\TrustedCallbackInterface or be an anonymous function. The callback was %s. See https://www.drupal.org/node/2966725', 'exception', 'Drupal\Core\Render\Element\RenderCallbackInterface') (Line: 858)
Drupal\Core\Render\Renderer->doCallback('#pre_render', Array, Array) (Line: 421)
Drupal\Core\Render\Renderer->doRender(Array, 1) (Line: 240)
Drupal\Core\Render\Renderer->render(Array, 1) (Line: 165)
Drupal\Core\Render\Renderer->Drupal\Core\Render\{closure}() (Line: 627)
Drupal\Core\Render\Renderer->executeInRenderContext(Object, Object) (Line: 166)
Drupal\Core\Render\Renderer->renderPlain(Array) (Line: 62)
Drupal\text\TextProcessed->getValue() (Line: 154)
Drupal\Core\Field\FieldItemBase->__get('processed') (Line: 143)
node_tokens('node', Array, Array, Array, Object)
call_user_func_array(Object, Array) (Line: 409)
Drupal\Core\Extension\ModuleHandler->Drupal\Core\Extension\{closure}(Object, 'node') (Line: 388)
Drupal\Core\Extension\ModuleHandler->invokeAllWith('tokens', Object) (Line: 416)
Drupal\Core\Extension\ModuleHandler->invokeAll('tokens', Array) (Line: 364)
Drupal\Core\Utility\Token->generate('node', Array, Array, Array, Object) (Line: 241)
Drupal\Core\Utility\Token->doReplace(1, '[node:summary]', Array, Array, Object) (Line: 191)
Drupal\Core\Utility\Token->replace('[node:summary]', Array, Array, NULL) (Line: 66)
Drupal\metatag\MetatagToken->replace('[node:summary]', Array, Array) (Line: 789)
Drupal\metatag\MetatagManager->processTagValue(Object, Array, Array, , 'en') (Line: 628)
Drupal\metatag\MetatagManager->generateRawElements(Array, Object) (Line: 61)
Drupal\metatag\Plugin\Field\MetatagEntityFieldItemList::Drupal\metatag\Plugin\Field\{closure}() (Line: 627)
Drupal\Core\Render\Renderer->executeInRenderContext(Object, Object) (Line: 62)
Drupal\metatag\Plugin\Field\MetatagEntityFieldItemList->computeValue() (Line: 32)
Drupal\metatag\Plugin\Field\MetatagEntityFieldItemList->ensureComputedValue() (Line: 114)
Drupal\metatag\Plugin\Field\MetatagEntityFieldItemList->isEmpty() (Line: 162)
Drupal\Core\TypedData\Validation\RecursiveContextualValidator->validateNode(Object) (Line: 164)
Drupal\Core\TypedData\Validation\RecursiveContextualValidator->validateNode(Object, Array, 1) (Line: 106)
Drupal\Core\TypedData\Validation\RecursiveContextualValidator->validate(Object, NULL, NULL) (Line: 93)
Drupal\Core\TypedData\Validation\RecursiveValidator->validate(Object) (Line: 132)
Drupal\Core\TypedData\TypedData->validate() (Line: 518)
Drupal\Core\Entity\ContentEntityBase->validate() (Line: 188)
Drupal\Core\Entity\ContentEntityForm->validateForm(Array, Object)
call_user_func_array(Array, Array) (Line: 82)
Drupal\Core\Form\FormValidator->executeValidateHandlers(Array, Object) (Line: 275)
Drupal\Core\Form\FormValidator->doValidateForm(Array, Object, 'node_page_edit_form') (Line: 118)
Drupal\Core\Form\FormValidator->validateForm('node_page_edit_form', Array, Object) (Line: 593)
Drupal\Core\Form\FormBuilder->processForm('node_page_edit_form', Array, Object) (Line: 325)
Drupal\Core\Form\FormBuilder->buildForm(Object, Object) (Line: 73)
Drupal\Core\Controller\FormController->getContentResult(Object, Object) (Line: 39)
Drupal\layout_builder\Controller\LayoutBuilderHtmlEntityFormController->getContentResult(Object, Object)
call_user_func_array(Array, Array) (Line: 123)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}() (Line: 627)
Drupal\Core\Render\Renderer->executeInRenderContext(Object, Object) (Line: 124)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->wrapControllerExecutionInRenderContext(Array, Array) (Line: 97)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}() (Line: 181)
Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object, 1) (Line: 76)
Symfony\Component\HttpKernel\HttpKernel->handle(Object, 1, 1) (Line: 58)
Drupal\Core\StackMiddleware\Session->handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\KernelPreHandle->handle(Object, 1, 1) (Line: 106)
Drupal\page_cache\StackMiddleware\PageCache->pass(Object, 1, 1) (Line: 85)
Drupal\page_cache\StackMiddleware\PageCache->handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\NegotiationMiddleware->handle(Object, 1, 1) (Line: 36)
Drupal\Core\StackMiddleware\AjaxPageState->handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\StackedHttpKernel->handle(Object, 1, 1) (Line: 704)
Drupal\Core\DrupalKernel->handle(Object) (Line: 19)

Proposed resolution

Options:

  1. Strip it out again at the line again
  2. Sort out why the DOM parser has an issue with appending &nbsp; HTML entities

Issue fork toc_api-3412644

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

joelpittet created an issue. See original summary.

joelpittet’s picture

Issue summary: View changes
Status: Active » Needs review
joelpittet’s picture

Issue summary: View changes
joelpittet’s picture

Further investigation is that it seems to be related to
Html::normalize()
https://www.drupal.org/node/2441811

joelpittet’s picture

https://stackoverflow.com/a/9760247/80281

Use DOMDocument::loadHTMLFile() instead of load(). That's what it has been made for. HTML is not XML.

XML does not know the named entity  . However if you use loadHTML, the XML parser will get the HTML named entities loaded so the error goes away.

See as well: XML parser error: entity not defined.

I think this makes sense, that masterminds/html5 has been in there I believe since Drupal 8.0 release because I remember solving problems with it in the tests prior to twig being included in core.

joelpittet’s picture

I'm not a fan of the DOMDocument API, so my apologies for being verbose. If only there was an appendHTML() this would have been a 1 liner.

joelpittet’s picture

I do believe it's ready to review though, this seems to work and will work in both D10.2 and earlier

alan delval’s picture

Tested but didn't work. It fails silently, headers on nodes text are missing.

My workaround to avoid filling logs, is decoding nbsp to " ".

szy’s picture

The same as #9 - my headers (h2 tags) in text are lost.

The latest Drupal, the latest ToC combo.

Szy.

ericgsmith’s picture

I am curious about the reported failures - I tested and have not be able to reproduce the headings getting lost. Is anybody able to share sample HTML they used when the headers go lost & config?

It looks good to me, resolves the errors reported and I can still see the headings in the text.

szy’s picture

@ericgsmith, are you leaving at least one nbsp at the end of any h2? :)

This is a condition, iirc.

Szy.

hephaestus’s picture

Can confirm that we encounterd this issue (in our case, an &nbsp; at the start of a header) and the commit in the MR (1c95fbbb) resolved the issue.

We didn't experience the problem Matthew reported in the MR comment.

kiwad’s picture

Status: Needs review » Needs work

I had same result as Szy.

After applying patch, h2 were vanishing from text

Liam Morland made their first commit to this issue’s fork.

liam morland’s picture

Status: Needs work » Needs review

I have put on the merge request a fix that uses html_entity_decode(). This is fixing the error messages and not removing the h2 elements.

liam morland’s picture

StatusFileSize
new954 bytes

This is the changed mentioned in #16 as a patch.

liam morland’s picture

Status: Needs review » Needs work

In further testing, I found this approach does not work because it outputs characters instead of entities for something that need to be entities in XML.

It could be fixed by just replacing &nbsp; with a UTF-8 non-breaking space character. The problem could come up again for a different character.

There could be code that finds all the entities in the text, if they are entities in XML, skip them, for the others, use html_entity_decode() to replace them with their character.

ericgsmith’s picture

@ericgsmith, are you leaving at least one nbsp at the end of any h2? :)

Yes - hence saying that the patch resolved this issue.

I am curious - people reporting the headers are missing - do you have a custom toc-header.html.twig file that adds additional elements to the header? There is a bug in the original patch in that it expects only 1 root level element in the header, I have been able to reproduce HTML being removed with a custom template that was adding additional elements in the header element.

*edit* - have checked again and theme dubug mode can also trigger this behaviour as the first node is the html comment.

I am going to open a new MR with some changes to the original approach, I don't want to revert the changes made to the original in case the issue still persists for other - but I think if there are 2 different solutions for how to handle an issue, we should use 2 different MRs rather than reverting previous work.

ericgsmith’s picture

Status: Needs work » Needs review
StatusFileSize
new2.01 KB

Ok, I have opened MR12 for reasons described above (a change in approach IMO should be done in a new MR so I didn't want to revert the approach introduced MR11)

This goes back to the original approach from @joelpittet and includes a minor change to account for instances where the toc header may return more than 1 child element.

I believe the issue some described with missing text was caused by either theme debug being enabled, or by having a template what introduced additional elements prior to the heading.

smulvih2’s picture

#21 works for me on 1.4.0

nojj’s picture

for us the patch#21 seems to solve the problem too.
would be nice, to have an update of the module.

aaronpinero’s picture

I can also confirm that patch#21 seems to solve the problem in Drupal 10.2.5

jurgenhaas’s picture

Status: Needs review » Reviewed & tested by the community

Patch from #21 fixes the issue for us with Drupal 10.2.5 as well.

marcoka’s picture

I can confirm #21 works here too with Drupal 10.2.4

jrockowitz made their first commit to this issue’s fork.

  • jrockowitz committed 2bf377e9 on 8.x-1.x authored by ericgsmith
    Issue #3412644: Whitespace HTML entities break DOM parsing in Drupal 10....
jrockowitz’s picture

Status: Reviewed & tested by the community » Fixed

#21 also works for me.

jrockowitz’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.