Problem/Motivation

When invoking $this->getTextContent() or $this->assertSession()->pageTextContains() on BrowserTestBase then the plain text page content is not correct. The text content of <style> and <script> tags is included in the page text, but it should only contain text that a human can see on the page.

The problem is that Mink just uses PHP's DOMDocument extension to get the nodeValue from the document, which extracts the text content of all tags regardless of their meaning.

Proposed resolution

Override Mink's Session class to return our own DocumentElement when the page content is requested. That way we can filter out styles and scripts when the text version of a page is requested. This also fixes the problem for assertSession()->pageTextContains() and getTextContent() in one place and we can avoid overriding the Mink driver class.

Remaining tasks

Patch.

User interface changes

None.

API changes

None.

Data model changes

None.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

klausi created an issue. See original summary.

klausi’s picture

dawehner’s picture

I'm a little bit worried with changing behaviour of mink, given that this makes potential mink documentation invalid.
On the other hand I totally agree that the page text should not contain CSS styles nor JS settings.

+++ b/core/tests/Drupal/Tests/DocumentElement.php
@@ -0,0 +1,46 @@
+      $remove[] = $script;
...
+      $remove[] = $style;
...
+      $item->parentNode->removeChild($item);

Is there a reason we don't just remove the elements as we go? One could also do a array_merge($scripts, $styles) to skip 2 of the 3 foreach loops

klausi’s picture

The problem is that $scripts and $styles are not arrays, but traversable DOMNodeList objects. As explained in the comment DOMNodeList behaves weirdly :-(

dawehner’s picture

The problem is that Mink just uses PHP's DOMDocument extension to get the nodeValue from the document, which extracts the text content of all tags regardless of their meaning.

Is this actually an upstream issue we are aware of?

The problem is that $scripts and $styles are not arrays, but traversable DOMNodeList objects. As explained in the comment DOMNodeList behaves weirdly :-(

Ah I see, so we could use iterator_to_array(), as that, obviously, takes a traversable.

klausi’s picture

Good idea with iterator_to_array(), implemented that!

I don't know of any upstream issue in Mink about this yet, but it might be worth to create one.

dawehner’s picture

Status: Needs review » Reviewed & tested by the community

Ah yeah this is indeed now looking much nicer

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 6: plain-page-2784427-6.patch, failed testing.

klausi’s picture

Status: Needs work » Reviewed & tested by the community

Random test fail, back to RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 6: plain-page-2784427-6.patch, failed testing.

klausi’s picture

Status: Needs work » Needs review
FileSize
4.55 KB

Rerolled, no other changes.

dawehner’s picture

Status: Needs review » Reviewed & tested by the community

That's a pure bug fix, this should totally be possible for 8.2.x and 8.3.x

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 11: plain-page-2784427-11.patch, failed testing.

SuWagner’s picture

Assigned: Unassigned » SuWagner
SuWagner’s picture

Assigned: SuWagner » Unassigned

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

jofitz’s picture

Status: Needs work » Needs review
FileSize
4.09 KB

Re-rolled.

Status: Needs review » Needs work

The last submitted patch, 17: plain-page-2784427-17.patch, failed testing.

jofitz’s picture

Status: Needs work » Needs review
FileSize
1013 bytes
4.87 KB

Ensure BrowserTestBase instantiates the new Drupal\Tests\Session rather than a Behat\Mink\Session.

The last submitted patch, 17: plain-page-2784427-17.patch, failed testing.

klausi’s picture

Status: Needs review » Needs work
+++ b/core/tests/Drupal/Tests/WebAssert.php
@@ -25,8 +24,8 @@ class WebAssert extends MinkWebAssert {
+   * @param Session $session
+   *   The session object;

we need a fully qualified name here for Session.

Otherwise looks good, thanks!

klausi’s picture

+++ b/core/tests/Drupal/Tests/Session.php
@@ -0,0 +1,43 @@
+  public function __construct(DriverInterface $driver, SelectorsHandler $selectorsHandler = NULL) {
+      parent::__construct($driver, $selectorsHandler);
+
+      // Use our own document class.
+      $this->page = new DocumentElement($this);
+  }

indentation errors as reported by the testbot, use only 2 spaces instead.

jofitz’s picture

Status: Needs work » Needs review
FileSize
1.13 KB
4.88 KB
  • Added fully qualified name for Session.
  • Corrected indentation.
klausi’s picture

Status: Needs review » Needs work

Sorry, one more thing:

+++ b/core/tests/Drupal/Tests/DocumentElement.php
@@ -0,0 +1,37 @@
+use DOMDocument;

We don't create use statements for classes in the global namespace, see https://www.drupal.org/docs/develop/coding-standards/namespaces . Use \ directly in code for that class instead.

jofitz’s picture

Status: Needs work » Needs review
FileSize
679 bytes
4.86 KB

You mean the use statement you added in #2? :p

klausi’s picture

Status: Needs review » Reviewed & tested by the community

hihihi, yes, I like finding issues in my own work. Looks good now!

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

In simpletest we strip the head content...

  /**
   * Retrieves the plain-text content from the current raw content.
   */
  protected function getTextContent() {
    if (!isset($this->plainTextContent)) {
      $raw_content = $this->getRawContent();
      // Strip everything between the HEAD tags.
      $raw_content = preg_replace('@<head>(.+?)</head>@si', '', $raw_content);
      $this->plainTextContent = Xss::filter($raw_content, []);
    }
    return $this->plainTextContent;
  }

See #2546582: AssertContentTrait::getTextContent() includes the @import css statement, that is nuts

I don't think pageTextContains should find title text either - for example.

jofitz’s picture

Status: Needs work » Needs review
FileSize
1.8 KB
5.01 KB

Removed the head tag within getText().

Status: Needs review » Needs work

The last submitted patch, 28: plain-page-2784427-28.patch, failed testing.

michielnugter’s picture

Issue tags: +phpunit initiative

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Tess Bakker’s picture

Assigned: Unassigned » Tess Bakker
Tess Bakker’s picture

Issue tags: +DevDaysLisbon
Tess Bakker’s picture

Discussed this with LenDude and we don't see the need to extend Mink in this way.

New features in Mink can break now, also it will break \Drupal\Tests\WebAssert::titleEquals()

Tess Bakker’s picture

Assigned: Tess Bakker » Unassigned
dawehner’s picture

@Tessa Bakker
Do I understand it correctly that you decided to keep the existing behaviour of mink? I'd be totally fine with that, especially as this means less code for us to maintain. Do you want to mark the issue as fixed?

Tess Bakker’s picture

@dawehner, Yes, I think it would be better to leave this 'as is' and mark this issue as 'won't fix'. If a test is successful because of (inline) JS or CSS, then the test was probably wrong ;)

dawehner’s picture

Status: Needs work » Fixed

Nice!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

alexpott’s picture