Problem/Motivation

I have a test for a custom module that does

$assert->pageTextNotContains('SBS');

This fails randomly if the theme token generated for drupal settings contains the string SBS. Unfortunately, \Behat\Mink\Element\Element::getText will return the text included inside the script tags in the body. The problematic one is:

<script type="application/json" data-drupal-selector="drupal-settings-json">{"path":{"baseUrl":"\/","scriptPath":null,"pathPrefix":"en\/","currentPath":"admin\/telus","currentPathIsAdmin":true,"isFront":false,"currentLanguage":"en"},"pluralDelimiter":"\u0003","suppressDeprecationErrors":true,"ajaxPageState":{"libraries":"admin_toolbar\/toolbar.tree,admin_toolbar_search\/search,admin_toolbar_tools\/toolbar.icon,classy\/base,classy\/messages,core\/drupal.active-link,core\/html5shiv,core\/normalize,seven\/global-styling,telus_toolbar\/toolbar,toolbar\/toolbar,toolbar\/toolbar.escapeAdmin,toolbar_menu\/icons,user\/drupal.user.icons","theme":"seven","theme_token":"qmmJrXPeEdpihFizSBSnD-kZanHo-4WzDf3L2F63puk"},"ajaxTrustedUrl":[],"toolbar":{"breakpoints":{"toolbar.narrow":"only screen and (min-width: 16.5em)","toolbar.standard":"only screen and (min-width: 38.125em)","toolbar.wide":"only screen and (min-width: 61em)"},"subtreesHash":"6pedNI4W-Gv5uhLYRwkHwC5MGZxlWKDbctxk3jJqFuM"},"adminToolbarSearch":{"loadExtraLinks":true},"user":{"uid":"2","permissionsHash":"d8e278451f423cb7d81f32499f7dc040316897215570611b7c41419636a2f629"}}</script>

as it contains a number of cryptographic strings that can contain any set of characters.

Steps to reproduce

Proposed resolution

I think this requires an upstream fix. I'll check Behat / Symfony's crawler for issues. The problem is that eventually this is getting the \DOMNode::$nodeValue for the page - see we're talking about core PHP functionality - see \Symfony\Component\DomCrawler\Crawler::text()

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Issue fork drupal-3175718

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alexpott created an issue. See original summary.

alexpott’s picture

Issue summary: View changes
alexpott’s picture

I think if we raise this upstream they're going to point out that doing document.body.textContent in the browser console will return include the drupal settings json too.

alexpott’s picture

alexpott’s picture

Priority: Normal » Major

This almost certainly causes core random fails. In \Drupal\Tests\media_library\FunctionalJavascript\MediaOverviewTest::testAdministrationPage() we do $assert_session->pageTextNotContains('Dog');. Bumping priority as these types of random fails confuse and waste contributors time.

longwave’s picture

Maybe we just need to discourage pageTextContains() and be more specific with elementTextContains()?

alexpott’s picture

@longwave yeah that'd help but when thinking about contrib / custom - the mind boggles. Also we have over 1000 pageTextContains/pageTextNotContains in core.

jonathanshaw’s picture

I was wondering if anyone had raised a mink issue for this, but look what I found: #2784427: Browser tests: Plain page text contains CSS styles and JS settings, complete with patch from @klausi.

longwave’s picture

Nice find, @alexpott even commented on that :)

https://www.drupal.org/files/issues/plain-page-2784427-28.patch looks close to what we are asking for here, should we just revive that issue?

longwave’s picture

alexpott’s picture

So this is almost certainly an upstream bug in https://github.com/minkphp/MinkBrowserKitDriver. Here's why:

I'm going to open an issue and see what happens.

alexpott’s picture

alexpott’s picture

FWIW the fact that \Behat\Mink\Driver\Selenium2Driver works as expected means that #5 is incorrect.

alexpott’s picture

We have

    $assert_session->pageTextContains('ID');
    $assert_session->pageTextContains('1');

\Drupal\Tests\layout_builder\Functional\LayoutBuilderTest::testLayoutBuilderUi() - the second of those is unlikely to ever fail.

alexpott’s picture

Here's proof of our problem. \Drupal\Tests\node\Functional\NodeTypeTest::testNodeTypeEditing will fail.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

jungle’s picture

Status: Active » Needs review
FileSize
1.87 KB
1.39 KB

>Maybe we just need to discourage pageTextContains() and be more specific with elementTextContains()?

How about a shorthand wrapping up elementTextContains() to check against the body element explicitly, and replace pageTextContains(), pageTextNotContains() usages in core further?

For example: naming them pageBodyTextContains() and pageBodyTextNotContains()

jungle’s picture

+++ b/core/tests/Drupal/Tests/WebAssert.php
@@ -897,6 +897,26 @@ public function pageTextNotContains($text) {
+   *  Checks that the body element does not contains text.

Self-review: should be `does not contain`, not `does not contains`.

jungle’s picture

FileSize
1.89 KB
1.88 KB
jungle’s picture

Or assuming calling pageTextContains() is to check against the text of body. Then inside pageTextContains(), calls elementTextContains()

The last submitted patch, 20: 3175718-20.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 21: 3175718-21.patch, failed testing. View results

mondrake made their first commit to this issue’s fork.

mondrake’s picture

Priority: Major » Critical

@jungle the drupal-settings-json is part of a <script> tag that is contained in the <body>, so #21 is insufficient.

I feel we need to revive the approach to override the Session and its getPage method from #2784427: Browser tests: Plain page text contains CSS styles and JS settings, and cleanup the script tags.

I propose to bump to critical because now WebAssert::pageTextContains having replaced AssertLegacyTrait::assertText, we may have false negatives. It's also blocking replacement of AssertLegacyTrait::assertNoText in #3191935: Replace usages of AssertLegacyTrait::assertNoText, which is deprecated.

mondrake’s picture

Status: Needs work » Needs review

Instead of swapping the entire Session class like in #2784427: Browser tests: Plain page text contains CSS styles and JS settings, patch from MR is just cleaning up the page text from contents of the script tags before processing the WebAssert checks.

longwave’s picture

Interesting set of fails!

Worth adding some unit tests for that private method?

longwave’s picture

Status: Needs review » Needs work
mondrake’s picture

Assigned: Unassigned » mondrake

The script cleanup does not work as expected, that may explain some fails. On it.

mondrake’s picture

Assigned: mondrake » Unassigned

Yes, interesting failures...

mondrake’s picture

mondrake’s picture

Uhm, at least one FunctionaJavascript failure is due to the text still being present on the page, BUT its style attribute being set to display: none. Changed one test to be more accurate anch check the attribute value rather thatn overall presence of text on the page.

Will need to check the remaining issues.

longwave’s picture

Maybe out of scope for this but in FunctionalJavascript tests can we ask the browser for all the visible text?

alexpott’s picture

> Maybe out of scope for this but in FunctionalJavascript tests can we ask the browser for all the visible text?
Yep we should be able to.

I still feel that this should be fixed upstream in https://github.com/minkphp/MinkBrowserKitDriver/issues/153

alexpott’s picture

I think we should consider a different fix. We could alias and copy \Behat\Mink\Element\DocumentElement and add getText method that does:

    public function getText() {
      // Trying to simulate what the user sees, given that it removes all text
      // inside the head tags, removes inline JavaScript, fix all HTML entities,
      // removes dangerous protocols and filtering out all HTML tags, as they are
      // not visible in a normal browser.
      $raw_content = preg_replace('@<head>(.+?)</head>@si', '', $this->getContent());
      $page_text = Xss::filter($raw_content, []);
    }
alexpott’s picture

Hmmm.... after looking at this at bit I've found that the legacy methods didn't properly remove drupal settings from the text content either.

      // Trying to simulate what the user sees, given that it removes all text
      // inside the head tags, removes inline JavaScript, fix all HTML entities,
      // removes dangerous protocols and filtering out all HTML tags, as they are
      // not visible in a normal browser.
      $raw_content = preg_replace('@<head>(.+?)</head>@si', '', $this->getSession()->getPage()->getContent());
      $page_text = Xss::filter($raw_content, []);

This bit from \Drupal\FunctionalTests\AssertLegacyTrait::assertText() / \Drupal\FunctionalTests\AssertLegacyTrait::assertNoText() did NOT remove the drupal settings json either.

jungle’s picture

FileSize
1.86 KB

As we have symfony/dom-crawler and symfony/css-selector in core, can we just use CSS selector to get rid of the script tag in the body with selector body :not(script) or both the script tag and style tag (if it has) with the selector body :not(script):not(style). For example:

$crawler = new Crawler($html);

// Default to empty string.
$text = '';

$crawler
  // Benefit from symfony/css-selector making CSS selector blow works.
  ->filter('body :not(script):not(style)')
  ->each(function (Crawler $node) use (&$text){
    // Set $normalizeWhitespace to FALSE to preserve spaces.
    $text .= $node->text(NULL, FALSE);
  });

var_dump($text);

See the attached file for the full demo.

mondrake’s picture

Status: Needs work » Reviewed & tested by the community

Looks good to me, @alexpott++

alexpott’s picture

@jungle - I'm concerned about performance for doing that. To do that in a performant way we're going to need to swap out \Behat\Mink\Driver\BrowserKitDriver and do stuff and I'm not sure this is correct. I think this fix really belongs upstream in https://github.com/minkphp/MinkBrowserKitDriver/issues/153 but a work around for this in \Drupal\Tests\DocumentElement is acceptable - swapping out BrowserKitDriver is way more fraught as the scope is much larger.

Also trying to implement and making \Drupal\Tests\DocumentElement::getText

 public function getText() {
    if ($this->getDriver() instanceof BrowserKitDriver) {
      $crawler = new Crawler(NULL, $this->getDriver()->getCurrentUrl());
      $crawler->addContent($this->getDriver()->getContent());
      // Remove inline Javascript and styles, especially Drupal settings json.
      $text = $crawler->filter('body :not(script)')->text();
      // To match \Behat\Mink\Element\Element::getText() include the page title.
      $title_element = $this->find('css', 'title');
      if ($title_element) {
        $text = $title_element->getText() . ' ' . $text;
      }
      // To match \Behat\Mink\Element\Element::getText() remove new lines and
      // normalize spaces.
      $text = str_replace("\n", ' ', $text);
      $text = preg_replace('/ {2,}/', ' ', $text);
      return trim($text);
    }

    // If using a real browser fallback to the \Behat\Mink\Element\Element
    // implementation.
    return parent::getText();
  }

And running \Drupal\FunctionalTests\BrowserTestBaseTest::testGoTo() shows that the selector 'body :not(script)' unfortunately does not work as hoped.

alexpott’s picture

I changed

-      // Use Xss::filter() to:
-      // - remove inline JavaScript
-      // - fix all HTML entities
-      // - remove dangerous protocols
-      // - filter out all HTML tags, as they are not visible in a normal browser.
-      $text = Xss::filter($raw_content, []);
+      // Filter out all HTML tags, as they are not visible in a normal browser.
+      $text = strip_tags($raw_content);

I think using Xss::filter() is unnecessary. We can call strip_tags() as that's what we want to do. The other features of Xss::filter() - fixing HTML entities / removing dangerous protocols I think get in the way of the test and shouldn't really be part of preparing the text output for browser testing.

jungle’s picture

>I'm concerned about performance for doing that

Got it.

> $text = $crawler->filter('body :not(script)')->text();

@alexpott, it only returns the first child's text of the body. So in my example, called ->each() further to traverse all its children and concat the text.

> And running \Drupal\FunctionalTests\BrowserTestBaseTest::testGoTo() shows that the selector 'body :not(script)' unfortunately does not work as hoped.

Probably, the above is the reason.

// Default to empty string.
$text = '';

$crawler
  // Benefit from symfony/css-selector making CSS selector blow works.
  ->filter('body :not(script):not(style)')
  ->each(function (Crawler $node) use (&$text){
    // Set $normalizeWhitespace to FALSE to preserve spaces.
    $text .= $node->text(NULL, FALSE);
  });
alexpott’s picture

@jungle yep you're correct... if I do

  public function getText() {
    if ($this->getDriver() instanceof BrowserKitDriver) {
      $crawler = new Crawler($this->getDriver()->getContent(), $this->getDriver()->getCurrentUrl());
      // Remove inline Javascript and styles, especially Drupal settings json.
      $text = '';
      $crawler->filter('body :not(script):not(style)')->each(function(Crawler $node) use (&$text) {
        $text .= ' ' . $node->text();
      });
      // To match \Behat\Mink\Element\Element::getText() include the page title.
      $title_element = $this->find('css', 'title');
      if ($title_element) {
        $text = $title_element->getText() . ' ' . $text;
      }
      // To match \Behat\Mink\Element\Element::getText() remove new lines and
      // normalize spaces.
      $text = str_replace("\n", ' ', $text);
      $text = preg_replace('/ {2,}/', ' ', $text);
      return trim($text);
    }

    // If using a real browser fallback to the \Behat\Mink\Element\Element
    // implementation.
    return parent::getText();
  }

That everything works as expected.

I still think the solution in the patch is okay until we can land a much better fix upstream.

longwave’s picture

The HTML5 spec lists what are considered "hidden elements": https://html.spec.whatwg.org/#hidden-elements

[hidden], area, base, basefont, datalist, head, link,
meta, noembed, noframes, param, rp, script, source, style, template, track, title {
  display: none;
}

Do we need to consider any of these tags as well? Or is that out of scope for this issue?

alexpott’s picture

@longwave I think we should solve the big issue at hand - ie. the inclusion of json and then try again upstream.

larowlan’s picture

Updating issue credits

  • larowlan committed 49eb106 on 9.3.x authored by mondrake
    Issue #3175718 by mondrake, alexpott, jungle, longwave: Random fails due...

  • larowlan committed f342570 on 9.2.x authored by mondrake
    Issue #3175718 by mondrake, alexpott, jungle, longwave: Random fails due...
larowlan’s picture

Version: 9.3.x-dev » 9.2.x-dev
Status: Reviewed & tested by the community » Fixed
Issue tags: +Bug Smash Initiative

Merged to 9.3.x

As this is a critical bug-fix, is test-code only and for testing system consistency, backported to 9.2.x

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

maosmurf’s picture

FWIW the change in f342570 has broken tests on my 9.2.1 after upgrade:

Error in bootstrap script: Error:
Class 'Behat\Mink\Element\TraversableElement' not found

Caused by tests/bootstrap.php

class_alias('\Drupal\Tests\DocumentElement', '\Behat\Mink\Element\DocumentElement', TRUE);

because I had no mink installed and therefore \Drupal\Tests\DocumentElement could not extend \Behat\Mink\Element\TraversableElement.

Fixed by composer require --dev behat/mink locally into the project.

chegor’s picture

#51 helped me.

capysara’s picture

Thanks @maosmurf! I needed that.

claudiu.cristea’s picture

I have the same problem in my project with Behat tests and I badly need this class alias. First I thought it could alias in the project's composer.json, in autoload-dev.files. That works for Behat tests but when I'm running the kernel/functional tests, I'm, obviously, getting:

Warning: Cannot declare class \Behat\Mink\Element\DocumentElement, because the name is already in use in web/core/tests/bootstrap.php on line 157

Would it be possible to:

  • EITHER move the class aliasing in core/composer.json, under autoload-dev.files (place it in a file)
  • OR in web/core/tests/bootstrap.php check if the class has been aliased and only do the alias if not
fkelly12054@gmail.com’s picture

After upgrading from 9.5.4 to 9.5.5 (including fixing some composer validation errors in an attempt to lay the foundation for 10.0.5) I started to encounter the error listed here:

Error : Class "Behat\Mink\Driver\BrowserKitDriver" not found

Looking in my vendor directory, behat only had the geckodriver.exe program and no mink subdirectory. I ran the fix listed in #51 and now I have a mink subdirectory under the behat directory. There is a new line in composer.json after the require:

"require-dev": {
"behat/mink": "^1.10"

However, I still get the

Error : Class "Behat\Mink\Driver\BrowserKitDriver" not found message when doing PHPunit testing under PHPstorm. I have been running those tests successfully for several months trying to fix tests in a contrib module.

I can see that the patch listed in #47 has been applied. Looking on my hosted production system (where I don't run tests or need Behat) there is no behat directory at all under vendor.

Since it's been 8 months since the last post in this thread, I am wondering if this issue just resurfaced between 9.5.4 and 9.5.5. There has never been a response to #54 nor do I have any way to know if it would fix my immediate problem.

Edit: I don't think that behat/mink:^1.10 is the correct version. Researching that further.