Problem/Motivation

I am working on a site which requires quite a specific and detailed element/attribute whitelist. My "allowed_html" setting is 1200 characters long. When updating this, it's impossible to tell from a diff what changed. In addition to this, we've found that changes manually made to the attributes and classes are reverted, chopped and changed by Drupal.behaviors.filterFilterHtmlUpdating erroneously, possibly from a bug in the JS or one of the filters we have installed.

It would be great if the schema that stored these settings provided a better diff.

Proposed resolution

Update the schema to look something like:

    allowed_html:
      type: sequence
      label: 'Allowed HTML'
      sequence:
        type: sequence
        label: 'Attributes'
        sequence:
          type: string
          label: 'Value'

Which would transform settings like:

allowed_html: '<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type>'

Into:

      allowed_html:
        a:
          href: '*'
          hreflang: '*'
        em: {  }
        strong: {  }
        cite: {  }
        blockquote:
          cite: '*'
        code: {  }
        ul:
          type: '*'
        ol:
          start: '*'
          type: '*'

Remaining tasks

Validate & see if this is something we can achieve without breaking BC.

User interface changes

API changes

Data model changes

CommentFileSizeAuthor
#93 Screenshot from 2024-03-14 13-16-53.png104.3 KBacbramley
#74 2871354-filter_html-schema-74.patch43.72 KBSam152
#74 interdiff.txt3.16 KBSam152
#72 2871354-filter_html-schema-72.patch50.92 KBSam152
#72 interdiff.txt4.6 KBSam152
#70 2871354-filter_html-schema-70.patch52.02 KBSam152
#70 interdiff.txt23.91 KBSam152
#68 2871354-filter_html-schema-68.patch48.88 KBSam152
#66 2871354-filter_html-schema-66.patch48.84 KBSam152
#46 interdiff.txt1.22 KBSam152
#46 2871354-filter_html-schema-46.patch48.8 KBSam152
#42 2871354-filter_html-schema-42.patch47.93 KBSam152
#42 interdiff.txt1013 bytesSam152
#40 2871354-filter_html-schema-40.patch47.93 KBSam152
#40 interdiff.txt3.8 KBSam152
#37 2871354-filter_html-schema-37.patch47.03 KBSam152
#37 interdiff.txt528 bytesSam152
#36 2871354-filter_html-schema-36.patch47.28 KBSam152
#36 interdiff.txt1.89 KBSam152
#34 2871354-filter_html-schema-34.patch48.12 KBSam152
#34 interdiff.txt4.73 KBSam152
#31 2871354-filter_html-schema-31.patch47.67 KBSam152
#31 interdiff.txt17.68 KBSam152
#23 2871354-filter_html-schema-23.patch47.36 KBSam152
#23 interdiff.txt1.27 KBSam152
#22 2871354-filter_html-schema-22.patch47.25 KBSam152
#22 interdiff.txt4.06 KBSam152
#20 2871354-filter_html-schema-20.patch46.78 KBSam152
#20 interdiff.txt463 bytesSam152
#18 2871354-filter_html-schema-18.patch46.76 KBSam152
#18 interdiff.txt4.59 KBSam152
#16 2871354-filter_html-schema-16.patch43.54 KBSam152
#16 interdiff.txt7.47 KBSam152
#14 2871354-filter_html-schema-14.patch38.38 KBSam152
#14 interdiff.txt3.46 KBSam152
#12 2871354-filter_html-schema-12.patch35.66 KBSam152
#12 interdiff.txt14.87 KBSam152
#10 2871354-filter_html-schema-10.patch28.39 KBSam152
#8 2871354-filter_html-schema-8.patch10.75 KBSam152
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Sam152 created an issue. See original summary.

Sam152’s picture

Component: ckeditor.module » editor.module
Wim Leers’s picture

dawehner’s picture

In case someone needs help implementing this (and writing the update function/tests), please ping me, I'd so like to see this landing.

Wim Leers’s picture

Thanks, @dawehner!

Sam152’s picture

This setting had a lot more touch points with things like tests, migrations, clientside code and existing config then the related issue. Starting on that one first to see how far it gets.

Sam152’s picture

Issue summary: View changes

I've been going through a few iterations of designing this schema. It turns out it's very helpful to split out the attribute name and value when parsing the string into the schema, so that this information doesn't have to be put back into a DOM node and parsed down the track when calculating getHTMLRestrictions(). For that reason it had to be altered a bit.

Here is one which seemed to work, but the issue was it also seems quite verbose:

From:

<a href hreflang> <em class="foo" test="*"> <test data-*> <foo *> <strong> <cite>

To:

    settings:
      allowed_html:
        -
          element: a
          attributes:
            -
              attribute: href
            -
              attribute: hreflang
        -
          element: em
          attributes:
            -
              attribute: class
              value: foo
            -
              attribute: test
              value: '*'
        -
          element: test
          attributes:
            -
              attribute: 'data-*'
        -
          element: foo
          attributes:
            -
              attribute: '*'
        -
          element: strong
        -
          element: cite

The other option would be to use the tag name and attribute name as the key for the sequences. I liked the result of this a lot better:

    settings:
      allowed_html:
        a:
          href: '*'
          hreflang: '*'
        em:
          test: '*'
          class: foo
        test:
          'data-*': '*'
        foo:
          '*': '*'
        strong: {  }
        cite: {  }

Since no tag or attribute should be repeated, I think this works well. Going to continue with it.

Sam152’s picture

Assigned: Unassigned » Sam152
Status: Active » Needs review
Issue tags: +Needs tests
FileSize
10.75 KB

Uploading to see which tests break and for any feedback on the schema and approach thus far. Still needs tests and upgrade path.

Status: Needs review » Needs work

The last submitted patch, 8: 2871354-filter_html-schema-8.patch, failed testing.

Sam152’s picture

Assigned: Sam152 » Unassigned
Status: Needs work » Needs review
FileSize
28.39 KB

Some more progress.

Status: Needs review » Needs work

The last submitted patch, 10: 2871354-filter_html-schema-10.patch, failed testing.

Sam152’s picture

Status: Needs work » Needs review
FileSize
14.87 KB
35.66 KB

Some more test fixes. Since migrate needed the transformation and the update hook was using reflection, I think it makes sense to just move the array => string transformation into its own class. Benefit being they can also be unit tested.

Status: Needs review » Needs work

The last submitted patch, 12: 2871354-filter_html-schema-12.patch, failed testing.

Sam152’s picture

Status: Needs work » Needs review
FileSize
3.46 KB
38.38 KB

Status: Needs review » Needs work

The last submitted patch, 14: 2871354-filter_html-schema-14.patch, failed testing.

Sam152’s picture

Status: Needs work » Needs review
FileSize
7.47 KB
43.54 KB

Still need to figure out a way to set default allowed_html without re-adding the elements them every time the filter is saved. When it was a single string, the value in the plugin configuration was always overriding the default. Now that's it's a mapping the whole structure is merged in, as you would expect for typical nested settings.

Sam152’s picture

I've been looking into how to set some defaults for the allowed_html value without forcing the defaults to always appear. The conflict is in FilterPluginCollection::initializePlugin:

    if (isset($this->configurations[$instance_id])) {
      $configuration = NestedArray::mergeDeep($configuration, $this->configurations[$instance_id]);
    }

The plugin doesn't have a chance to act before being created, the plugin collection does a mergeDeep in the settings from the annotation into the saved configuration.

Sam152’s picture

An approach to dealing with the default value problem.

Status: Needs review » Needs work

The last submitted patch, 18: 2871354-filter_html-schema-18.patch, failed testing.

Sam152’s picture

Status: Needs work » Needs review
FileSize
463 bytes
46.78 KB

Every time :)

Status: Needs review » Needs work

The last submitted patch, 20: 2871354-filter_html-schema-20.patch, failed testing.

Sam152’s picture

Status: Needs work » Needs review
FileSize
4.06 KB
47.25 KB

Fails were tracked down to FilterPluginCollection. The kind of funky thing going on is the initial default config is merged based on the plugin definition (FilterPluginCollection::initializePlugin), then later removed from the collection if the config matches the method call (FilterPluginCollection::getConfiguration), so this is a bit of a work around to not use NestedArray::mergeDeep for configuration, but still have an array for default configuration.

Sam152’s picture

dawehner’s picture

+++ b/core/modules/filter/config/schema/filter.schema.yml
@@ -51,8 +51,14 @@ filter_settings.filter_html:
     allowed_html:
-      type: string
+      type: sequence
       label: 'Allowed HTML'
+      sequence:
+        type: sequence
+        label: 'Attributes'
+        sequence:
+          type: string
+          label: 'Value'

Is there any way how we could add some constrains to validate these a bit more.

Sam152’s picture

I don't know if we could start being more restrictive with the storage, because what we're upgrading from is essentially a string with no schema at all. We are emulating the way that string was parsed for the purposes of restricting HTML, but instead of doing some filtering we're storing that structure in the config object. This ensures the settings we're storing will behave in an exact BC fashion once they've been upgraded from the old string.

We could add extra validations to the form, but those validations wouldn't be blocked by this issue. The current UI is very permissive so we could do that work without changing the schema at all.

dawehner’s picture

I don't know if we could start being more restrictive with the storage, because what we're upgrading from is essentially a string with no schema at all.

I try to understand why filter.install doesn't resolve it.

Sam152’s picture

I was going to try catch you on on IRC to discuss. Perhaps you could show me what you mean by #26. Not sure I follow.

Wim Leers’s picture

Status: Needs review » Needs work
Issue tags: -Needs upgrade path, -Needs update path tests, -Needs tests +Needs change record

Sorry for the delay in reviewing this, I somehow lost track of this after DrupalCon! And wow, very solid work: impressive eye to detail!

#7: thanks for sharing your thought process! So very important :) I agree using the elements as keys makes sense.

#16 + #17: this reminds me of #2361539: Config export key order is not predictable for sequences, add orderby property to config schema — see the CR at https://www.drupal.org/node/2852566.

Patch review time!

  1. +++ b/core/modules/ckeditor/tests/modules/src/Kernel/CKEditorTest.php
    @@ -49,7 +49,22 @@ protected function setUp() {
    +              'id' => [
    +                'id' => '*',
    +              ],
    

    The top-level key here should be 'h2'.

  2. +++ b/core/modules/filter/filter.install
    @@ -0,0 +1,35 @@
    +function filter_update_dependencies() {
    +  // Run the system install hook to add extra attributes to HTML tags before
    +  // updating the schema.
    +  $dependencies['filter'][8001] = [
    +    'system' => 8009,
    +  ];
    +}
    

    Wow, I didn't even know this was a thing!

  3. +++ b/core/modules/filter/filter.install
    @@ -0,0 +1,35 @@
    +      $updated_allowed_html = FilterHtmlAllowedMarkupSchema::generateSettings($allowed_html);
    

    Note: in principle we don't let update functions call external functions, because there's a risk that those external (non-update) functions change over time, and hence break the update path.

    In this case, that seems impossible, because it's literally about a 1:1 transformation from one representation to another.

  4. +++ b/core/modules/filter/src/FilterHtmlAllowedMarkupSchema.php
    @@ -0,0 +1,80 @@
    +class FilterHtmlAllowedMarkupSchema {
    

    I'm wondering if we should these static methods be moved into \Drupal\filter\Plugin\Filter\FilterHtml?

  5. +++ b/core/modules/filter/src/FilterHtmlAllowedMarkupSchema.php
    @@ -0,0 +1,80 @@
    +   * Generate an array to represent the allowed HTML settings.
    

    Nit: s/Generate/Generates/

  6. +++ b/core/modules/filter/src/FilterHtmlAllowedMarkupSchema.php
    @@ -0,0 +1,80 @@
    +  public static function generateSettings($allowed_html_string) {
    

    I'd rename this to convertAllowedHtmlStringToArray().

    (Because we're not generating anything.)

  7. +++ b/core/modules/filter/src/FilterHtmlAllowedMarkupSchema.php
    @@ -0,0 +1,80 @@
    +      if ($node->hasAttributes()) {
    +        // Iterate over any attributes, and mark them as allowed.
    +        foreach ($node->attributes as $name => $attribute) {
    +          // Put back any trailing * on wildcard attribute name.
    +          $name = str_replace($star_protector, '*', $name);
    +          $value = str_replace($star_protector, '*', $attribute->value);
    +          if (empty($value)) {
    +            $value = '*';
    +          }
    +          $allowed_html_settings[$tag][$name] = $value;
    +        }
    +      }
    

    This portion deviates from the existing code in HEAD in \Drupal\filter\Plugin\Filter\FilterHtml::getHTMLRestrictions(). Why?

    EDIT: oh, because there's a different need for getHTMLRestrictions(): that is complying with an API that expects a certain return value. By changing the allowed_html setting from a string to an actual key-value map, our transformation needs match the needs of getHTMLRestrictions() almost exactly. The difference lies in those final details. getHTMLRestrictions() can itself then become a lot simpler, because it won't need to parse/transform a string anymore!

  8. +++ b/core/modules/filter/src/FilterHtmlAllowedMarkupSchema.php
    @@ -0,0 +1,80 @@
    +  public static function generateString($allowed_html_settings) {
    

    I'd rename this to convertAllowedHtmlArrayToString().

    (Because we're not generating anything.)

  9. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -20,7 +21,7 @@
    - *     "allowed_html" = "<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type='1 A I'> <li> <dl> <dt> <dd> <h2 id='jump-*'> <h3 id> <h4 id> <h5 id> <h6 id>",
    + *     "allowed_html" = TRUE,
    
    @@ -82,6 +105,53 @@ public function setConfiguration(array $configuration) {
    +  public function defaultConfiguration() {
    +    $default_configuration = parent::defaultConfiguration();
    +    $default_configuration['settings']['allowed_html'] = [
    +      'a' => [
    +        'href' => '*',
    +        'hreflang' => '*',
    +      ],
    +      'em' => [],
    +      'strong' => [],
    +      'cite' => [],
    +      'blockquote' => [
    +        'cite' => '*',
    +      ],
    +      'code' => [],
    +      'ul' => [
    +        'type' => '*',
    +      ],
    +      'ol' => [
    +        'start' => '*',
    +        'type' => '1 A I',
    +      ],
    +      'li' => [],
    +      'dl' => [],
    +      'dt' => [],
    +      'dd' => [],
    +      'h2' => [
    +        'id' => 'jump-*',
    +      ],
    +      'h3' => [
    +        'id' => '*',
    +      ],
    +      'h4' => [
    +        'id' => '*',
    +      ],
    +      'h5' => [
    +        'id' => '*',
    +      ],
    +      'h6' => [
    +        'id' => '*',
    +      ],
    +    ];
    +    return $default_configuration;
    +  }
    

    I don't understand this. Why can't the annotation list the actual default setting?

  10. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -65,14 +70,32 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    // The allowed_html setting is stored as an array of elements and
    +    // attributes. The default annotation configuration for a plugin is merged
    +    // into the actual configuration. So that we can support a default
    +    // configuration that isn't merged and thus certain elements forced to be
    +    // enabled, only set the configuration when allowed_html is TRUE as
    +    // specified in the annotation.
    +    if ($configuration['settings']['allowed_html'] === TRUE) {
    +      $configuration['settings']['allowed_html'] = $this->defaultConfiguration()['settings']['allowed_html'];
         }
    

    I don't understand this either. "So that we can support a default configuration that isn't merged" -> what does this mean? Is this for BC reasons?

  11. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -251,33 +321,16 @@ public function getHTMLRestrictions() {
    +      // If no attributes are specified, the element is allowed, but with
    +      // no attributes.
    +      if (empty($element_attributes)) {
    +        $restrictions['allowed'][$element_name] = FALSE;
           }
    
    @@ -285,25 +338,21 @@ public function getHTMLRestrictions() {
    -      else {
    -        // Mark the tag as allowed, but with no attributes allowed.
    -        $restrictions['allowed'][$tag] = FALSE;
    -      }
    

    Nit: If we'd change this to if (!empty(…, then we could keep the original code order, and the diff would be slightly easier to read. No strong opinion though, if you prefer this way, that's okay.

  12. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -251,33 +321,16 @@ public function getHTMLRestrictions() {
    +          // Parse the allowed attribute values.
    +          $allowed_attribute_values = preg_split('/\s+/', $attribute_value, -1, PREG_SPLIT_NO_EMPTY);
    

    Hm, it's kind of strange to convert everything from strings to key-value maps, except for attribute values. I suspect you have a good reason for it though?

  13. +++ b/core/modules/filter/src/Tests/Update/FilterHtmlSchemaUpdateTest.php
    @@ -0,0 +1,61 @@
    +    $this->assertEqual([
    +      'no-attributes' => [],
    +      'single-attribute' => [
    +          'foo' => '*',
    +      ],
    +      'attribute-with-value' => [
    +        'foo' => 'bar',
    +      ],
    +      'multiple-attributes' => [
    +        'foo' => 'value',
    +        'bar' => 'value',
    +      ],
    +      'wildcard-attributes' => [
    +        '*' => '*',
    +      ],
    +      'wildcard-attribute-name' => [
    +        'data-*' => '*',
    +      ],
    +      'wildcard-attribute-name-with-value' => [
    +        'data-*' => 'foo',
    +      ],
    +      'wildcard-attribute-value' => [
    +        'foo' => '*',
    +      ],
    +    ], $filter_html_configuration['settings']['allowed_html']);
    +  }
    

    This looks great, but there's one case I'm missing: when an attribute has multiple allowed attribute values listed.

  14. +++ b/core/modules/filter/tests/src/Functional/FilterHtmlAdminTest.php
    @@ -0,0 +1,63 @@
    +  public function testFilterHtmlAdminForm() {
    

    I think that if we can remove the defaultConfiguration() method I reviewed above, that we also no longer need this test.

  15. +++ b/core/modules/filter/tests/src/Kernel/FilterAPITest.php
    @@ -214,7 +223,18 @@ public function testFilterFormatAPI() {
    +                'class' => 'foo bar-* *',
    
    +++ b/core/modules/filter/tests/src/Unit/FilterHtmlTest.php
    @@ -22,7 +22,33 @@ class FilterHtmlTest extends UnitTestCase {
    +          'class' => "pretty boring align-*",
    

    Ah, here is the one example of multiple allowed attribute values. Doesn't it look strange to you too that this is the only place where we're not dealing with an array, where you still have a string that you must parse?

    (Of course, this is something you use fairly rarely, so you definitely already solved 90% of the problem. Still, it's curious/noteworthy, hence I'm calling it out.)

    EDIT: found another. I'm glad and relieved to see that we did have some test coverage for this.

  16. +++ b/core/modules/filter/tests/src/Unit/FilterHtmlAllowedMarkupSchemaTest.php
    @@ -0,0 +1,71 @@
    +class FilterHtmlAllowedMarkupSchemaTest extends UnitTestCase {
    

    <3

  17. +++ b/core/modules/filter/tests/src/Unit/FilterHtmlAllowedMarkupSchemaTest.php
    @@ -0,0 +1,71 @@
    +    $this->assertEquals($settings, FilterHtmlAllowedMarkupSchema::generateSettings($string));
    ...
    +    $this->assertEquals($string, FilterHtmlAllowedMarkupSchema::generateString($settings));
    

    Can we change these to assertSame()?

  18. +++ b/core/modules/filter/tests/src/Unit/FilterHtmlTest.php
    @@ -79,18 +105,4 @@ public function providerFilterAttributes() {
    -  public function testSetConfiguration() {
    

    Why are we removing this?

  19. +++ b/core/modules/system/src/Tests/Update/FilterHtmlUpdateTest.php
    index 92224c2..566c2b1 100644
    --- a/core/profiles/standard/config/install/filter.format.basic_html.yml
    
    +++ b/core/profiles/standard/config/install/filter.format.basic_html.yml
    index 323d077..b7d4ded 100644
    --- a/core/profiles/standard/config/install/filter.format.restricted_html.yml
    

    I reviewed the changes here very carefully, I didn't find any problems. They are 1:1 translations.

Sam152’s picture

Thank you for the review, this is awesome feedback. I'm going to take the time to go through and make the required changes tomorrow.

In the meantime, let me explain in some more detail the issues with the annotation containing the default allowed_html configuration, since to me that's probably one of the funkier things going on here. I should probably have pointed out they are deep merged. The result of that is, if you save an instance of the plugin and remove some elements/attributes, they come back as soon as it's initialised again. We want the merge to stop at allowed_html and not consider what's inside. Unless the plugin works around this, I don't know if there is a clean BC way to do this in FilterPluginCollection.

// \Drupal\filter\FilterPluginCollection::initializePlugin
  /**
   * {@inheritdoc}
   */
  protected function initializePlugin($instance_id) {
    // Filters have a 1:1 relationship to text formats and can be added and
    // instantiated at any time.
    // @todo $configuration is the whole filter plugin instance configuration,
    //   as contained in the text format configuration. The default
    //   configuration is the filter plugin definition. Configuration should not
    //   be contained in definitions. Move into a FilterBase::init() method.
    $configuration = $this->manager->getDefinition($instance_id);
    // Merge the actual configuration into the default configuration.
    if (isset($this->configurations[$instance_id])) {
      $configuration = NestedArray::mergeDeep($configuration, $this->configurations[$instance_id]);
    }
    $this->configurations[$instance_id] = $configuration;
    parent::initializePlugin($instance_id);
  }
Wim Leers’s picture

In the meantime, let me explain in some more detail the issues with the annotation containing the default allowed_html configuration, since to me that's probably one of the funkier things going on here.

Indeed it is! Very unexpected!

The result of that is, if you save an instance of the plugin and remove some elements/attributes, they come back as soon as it's initialised again.

Okay, so this is due to FilterPluginCollection always merging the default configuration, and that coming back to bite us for allowed_html, because this issue is changing that into an array?

Sam152’s picture

Status: Needs work » Needs review
FileSize
17.68 KB
47.67 KB

Really appreciate the thorough review. Feedback addressed as follows:

  1. Fixed, thanks.
  2. :)
  3. I think we could copy it in, but overall I would have hoped the tests would catch anything that breaks the upgrade path.
  4. I flipped flopped a bit over this initially, but I think this makes more sense. Moved + moved the test.
  5. Fixed.
  6. Good point. Fixed.
  7. Yep, getHTMLRestrictions should be exactly the same before and after. We’ve just moved the parsing from when the string is being filtered, to when the plugin is saved.
  8. Makes sense. Done.
  9. As explained in #29. You nailed it in #30.
  10. As per point 9.
  11. Fixed. I’m all for minimising disruption in the diff.
  12. I hadn’t considered this to be honest. I think just eyeballing the schema, it seemed like the readability aspect had been accomplished. We can get this done here, but I think you're probably right that the current iteration works better 90% of the time. Consider these two examples from standard profile:

    Before attribute value arrays:

            h2:
              id: '*'
            h3:
              id: '*'
            h4:
              id: '*'
            h5:
              id: '*'
            h6:
              id: '*'
    

    After:

            h2:
              id:
                - '*'
            h3:
              id:
                - '*'
            h4:
              id:
                - '*'
            h5:
              id:
                - '*'
            h6:
              id:
                - '*'
    
  13. Added a new test case for this.
  14. Yeah, if we can fix it without having to hack around it in the plugin, I’m all for it. A better way isn’t jumping out at me though.
  15. See 12.
  16. :)
  17. TIL
  18. The string version of the settings are never stored any more, so I’ve instead tested what would be the equivalent of this by asserting ::convertAllowedHtmlStringToArray deals correctly with whitespace.
  19. Awesome :)
Wim Leers’s picture

Component: editor.module » filter.module

I just realized this is first and foremost a filter.module issue. Moving.

Wim Leers’s picture

Status: Needs review » Needs work
  1. Right, I hadn't even thought about that — that's an excellent point!
  1. Yep, I now understand how you arrived here. Thanks for the explanation in #29, it indeed clicked for me in #30 :). That being said, we would ideally be able to keep the default setting in the annotation. I wonder if we can change FilterPluginCollection::getConfiguration() to not do this auto-merging-by-default-always thing. Although that'd be a BC break :/ So, let's instead document this very clearly, with an explicit @see \Drupal\filter\FilterPluginCollection::getConfiguration().
  1. Agreed. Basically: Do not decompose everything, decompose to maximize readability and minimize complexity.
  1. Agreed, sadly. :)
  1. Makes sense!

Next steps

So the only remaining thing to be done is to clarify the docs for point 9, and a change record. Then I'll RTBC! :) Thanks so much for this! Sorry for the long wait since #31, this fell of my radar.

+++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
@@ -20,7 +21,7 @@
+ *     "allowed_html" = TRUE,

@@ -65,14 +70,32 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
+    if (!isset($configuration['settings']['allowed_html']) || $configuration['settings']['allowed_html'] === TRUE) {

And actually, it might be a good idea to change the default value here from TRUE to something more descriptive. Such as dynamically_set_to_not_break_bc (better name TBD).

Sam152’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
4.73 KB
48.12 KB

Thank you for the review, really appreciate it! One thing which I hadn't considered but might work is to use the string representation in the annotation instead and convert it when the plugin is initialised.

Adding the CR shortly.

Sam152’s picture

+++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
@@ -88,13 +88,15 @@ public function preRenderAllowedHtml($element) {
+    if (!isset($configuration['settings']['allowed_html']) || is_string($configuration['settings']['allowed_html'])) {
       $configuration['settings']['allowed_html'] = $this->defaultConfiguration()['settings']['allowed_html'];
     }

I wonder if for BC reasons, any string passed to setConfiguration (from the annotation or elsewhere) should be converted into the array representation, instead of just assuming it's the default.

Sam152’s picture

Slightly altered approach where any config presented as a string will be converted, not just the default in the annotation.

Sam152’s picture

Removing unused use from previous iteration.

Sam152’s picture

Issue tags: -Needs change record
Wim Leers’s picture

Status: Needs review » Needs work

CR looks good! I only improved its formatting.

One thing which I hadn't considered but might work is to use the string representation in the annotation instead and convert it when the plugin is initialised.

Me neither! Genius! :D Furthermore, this means no change at all to the annotation, which is even better, because fewer BC concerns!

#36 did exactly what I wanted to suggest when reviewing #34.

In a last pass, I found one small problem that a core committer will definitely block this for. And in your last few rerolls, I think some of the comments could be clearer. So… just one more reroll :)

  1. +++ b/core/modules/filter/filter.install
    @@ -0,0 +1,35 @@
    +function filter_update_8001() {
    

    Must be filter_update_8401().

  2. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -65,14 +69,34 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    // The allowed_html setting is stored as a sequence of elements and
    

    of allowed HTML tags, each containing more configuration values.

  3. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -65,14 +69,34 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    // So that we aren't deep merging the default configuration with the actual
    

    s/So that we aren't/To prevent/


Minor improvement suggestions, which can happen in a follow-up and don't need to block commit:

  1. +++ b/core/modules/filter/config/schema/filter.schema.yml
    @@ -51,8 +51,14 @@ filter_settings.filter_html:
           label: 'Allowed HTML'
    

    Allowed HTML tags

  2. +++ b/core/modules/filter/config/schema/filter.schema.yml
    @@ -51,8 +51,14 @@ filter_settings.filter_html:
    +        label: 'Attributes'
    

    Allowed attributes

  3. +++ b/core/modules/filter/config/schema/filter.schema.yml
    @@ -51,8 +51,14 @@ filter_settings.filter_html:
    +          label: 'Value'
    

    Allowed attribute values

Sam152’s picture

Status: Needs work » Needs review
FileSize
3.8 KB
47.93 KB

To fix the tests, it looks like the code to change the string into the array in defaultConfiguration is still required because that's what \Drupal\filter\FilterPluginCollection::getConfiguration uses to determine if the plugin is enabled for any given filter. Removing it causes it to consider filter_html enabled when it isn't.

As for the feedback:

1. Good catch.
2. Updated as suggested.
3. Fixed.
1. Better label, fixed.
2. Fixed.
3. Fixed.

Thanks for looking at the CR too.

Wim Leers’s picture

Status: Needs review » Needs work
+++ b/core/modules/filter/filter.install
@@ -0,0 +1,35 @@
+  $dependencies['filter'][8001] = [

+++ b/core/modules/filter/src/Tests/Update/FilterHtmlSchemaUpdateTest.php
@@ -0,0 +1,64 @@
+   * Tests filter_update_8001().

You forgot to update these to 8401… :) Once those are fixed, insta-RTBC!

Sam152’s picture

Status: Needs work » Needs review
FileSize
1013 bytes
47.93 KB

Fixed, good catch.

Wim Leers’s picture

Status: Needs review » Reviewed & tested by the community

🎉

larowlan’s picture

+++ b/core/modules/ckeditor/tests/modules/src/Kernel/CKEditorTest.php
@@ -49,7 +49,22 @@ protected function setUp() {
-            'allowed_html' => '<h2 id> <h3> <h4> <h5> <h6> <p> <br> <strong> <a href hreflang>',
+            'allowed_html' => [

Seeing this alerts me to the fact that there would be many contrib/client project tests that include something similar. Can we update the change record to say that if you have tests that create filter formats with this plugin, you have to change your allowed_html from a string to an array. But we should mention that you can use the static method inline to make the conversion with minimal fuss. We should provide a code sample showing this in the change record.

Other than that, it looks awesome and will be a great improvement to many Drupal8 projects in the wild!

larowlan’s picture

I'm not sure if #44 constitutes a BC break, asking some others for clarification.

How much work would it be for the plugin to be smart enough to juggle both a string and an array when used at the API level (obviously at the form level it doesn't need to)?

Sam152’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
48.8 KB
1.22 KB

We actually got this for free in #36 but never added a test to prove it, added that.

larowlan’s picture

Status: Needs review » Reviewed & tested by the community

Thanks, this is ready to go in my book

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/editor/tests/editor_private_test/config/install/filter.format.private_images.yml
@@ -15,7 +15,12 @@ filters:
-      allowed_html: '<img src alt data-entity-type data-entity-uuid>'
+      allowed_html:
+        img:
+          src: '*'
+          alt: '*'
+          data-entity-type: '*'
+          data-entity-uuid: '*'

We are not supposed to change config key types given we don't have versioning for them. To keep BC AFAIS we would need to add a new key that would have the new type definition and deprecate the old key. Otherwise what happens on a site with the new code that tries to import from a staged version of the site with the old config? Allowed HTML settings lost? That could lead to security problems even.

Wim Leers’s picture

Is this the first issue where we're trying to improve config schema? Do we have prior experience with this?

Sam152’s picture

Do we really support importing config between different versions of Drupal? This seems like it should fall under "undefined behavior", I do wonder how well past schema changes have dealt with this or if this was even a consideration. I was under the impression the accepted practice was that an updb was always accompanied by a config export for this reason.

dawehner’s picture

The only main problem I can see is configuration shipped with install profiles/modules. The code, which is part of this patch, ensures on runtime, that the values are converted properly, but maybe tools like config inspector get confused?

We did a sort of similar change in views: #2459289: Boolean default values are not saved. It was possible for us to switch from booleans to strings as long we provided an update path + a conversion on runtime.

Gábor Hojtsy’s picture

In #1977498-7: Add version tracking for configuration schema and data 3 years ago I summarized the discussion we had with this paragraph:

It is an issue but I think we decided we ignore it for now and just assume that whatever is in releases will need to keep backwards compatibility for configuration structures, so if you used a config key for something before, you'll need to keep backwards compatibility on it going forward. With the same thinking, we marked schema change issues in core critical and D8 upgrade path blocking, because we would need to pick new config keys later on if we need to change the structures.

Also #2543150: Document consequences of contrib changing config schema without core's API supporting config version tracking has a bunch of more detail, eg. comment #2 there again from me:

As per our discussions given that #1977498: Add version tracking for configuration schema and data is not resolved, core and contrib alike should only make additive changes to schema, it should not make backwards incompatible changes in any way whatsoever. Eg. if a views plugin need to make backwards incompatible changes to their settings, they need to modify their plugin_id and add schema under the new plugin_id from then on. If a module needs to change its modulename.settings file in a backwards incompatible compatible way, it will need to change the config name to something else, eg. modulename.configuration or something else. Adding keys and stopping the use of some keys is fine, so if a views plugin needs to use now new keys for its settings and abandon old ones, that is fine too.

Wim Leers’s picture

Following that reasoning/advice, would we need to modify \Drupal\filter\Plugin\Filter\FilterHtml's plugin ID from filter_html to something else?

Wim Leers’s picture

Issue tags: +Configuration system
Gábor Hojtsy’s picture

Well, if the new plugin cannot be made to work with the deprecated key and the newly added key then yes. I think based on the above it could work with both so no need to rename the plugin.

Sam152’s picture

The only main problem I can see is configuration shipped with install profiles/modules.

If the profile has any tests that use strict config schema checking, they'll fail as soon as the schema doesn't match what's in the profile. For this reason our internal process for aGov has been:

  1. Install profile on current version.
  2. Update the code to latest versions, run updb.
  3. Export the configuration.

But I guess schema changes would technically mean a module/profile would have a minimum version requirement on the major version of core the config was created from, so I can see the pitfalls here.

I'll read over the issues that refer to BC, thanks for pointing those out @Gábor Hojtsy.

This issue was largely a DX improvement, so I'm hesitant to go down the path of doing this at the expense of a lot of ugly code, so will have to see how hard it is to deprecate the old keys in favour of the new. Given the "easy upgrades forever" policy, I'm guessing all improvements to config schema might face this issue in the future too?

mallezie’s picture

Wim Leers’s picture

#56:

Given the "easy upgrades forever" policy, I'm guessing all improvements to config schema might face this issue in the future too?

Yes. This issue is just the first.

#57: Interesting links!

Gábor Hojtsy’s picture

Also this is the same process/thing we do for PHP APIs because we don't have API versioning there either.

effulgentsia’s picture

This reminds me of how CKEditor has an allowedContent configuration that can be expressed in either string format or object format.

I don't know if we can satisfy all the requirements referenced in #52 with a schema type that can be either a string or a sequence. I don't think we have any precedent for that, but maybe it's worth considering, I don't know.

Alternatively, we could create a new key, named something like allowed_html_object_format (in JSON, this would be an object), allowed_html_array_format (in PHP, it's an array), allowed_html_sequence_format (in YAML schema, it's a sequence), or allowed_html_extended_format (if we want programming-language-neutral terminology).

Just some food for thought.

Additionally, do we want this issue's scope to include making a similar change in aggregator.settings:items.allowed_html or would we rather discuss that in a follow-up?

effulgentsia’s picture

allowed_html_extended_format might be a confusing name, if people think that the "extended format" suffix is being applied to "html" (i.e., some special flavor of html) rather than applying to the format of the configuration value. So, if anyone has better ideas, please share.

effulgentsia’s picture

+++ b/core/modules/editor/tests/editor_private_test/config/install/filter.format.private_images.yml
@@ -15,7 +15,12 @@ filters:
     settings:
+      allowed_html:
+        img:
+          data-entity-type: '*'

Do we have any precedent for allowing - characters in config key names? And more generically, other characters that are allowed in HTML element and attribute names, such as : or .? In #2293773: Field allowed values use dots in key names - not allowed in config, we had to change config schema to avoid the problems with ..

Gábor Hojtsy’s picture

Possible alternate names: allowed_html_tags, allowed_html_elements, allowed_html_list, etc.

Sam152’s picture

#62 is correct. "." were previously accepted in attribute names but no longer work with the patch applied.

"allowed_html_elements" seems like a good key to me.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Sam152’s picture

Status: Needs work » Needs review
FileSize
48.84 KB

Straight reroll, none of the feedback addressed yet.

Status: Needs review » Needs work

The last submitted patch, 66: 2871354-filter_html-schema-66.patch, failed testing. View results

Sam152’s picture

Status: Needs work » Needs review
FileSize
48.88 KB

Needed another reroll + test fix.

Sam152’s picture

Status: Needs review » Needs work
Sam152’s picture

Status: Needs work » Needs review
FileSize
23.91 KB
52.02 KB

Having a go at what BC for the schema might look like before addressing #64. Seems to work pretty well, I also introduced FilterLegacyConfigFormatTest to try and prove the old format will import and work as expected, however it might insufficient, not totally sure at this stage.

Status: Needs review » Needs work

The last submitted patch, 70: 2871354-filter_html-schema-70.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Sam152’s picture

Status: Needs work » Needs review
FileSize
4.6 KB
50.92 KB

Test fixes.

Status: Needs review » Needs work

The last submitted patch, 72: 2871354-filter_html-schema-72.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Sam152’s picture

Status: Needs work » Needs review
FileSize
3.16 KB
43.72 KB

Test fixes.

Sam152’s picture

Status: Needs review » Needs work
Wim Leers’s picture

Thanks for picking this up again, @Sam152!

Sam152’s picture

Only bits and pieces here and there :). Sorry for the noise in the issue!

Wim Leers’s picture

No worries at all! Very glad & grateful to see some movement here!

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

mstrelan’s picture

This is still very relevant today. We should match ckeditor, for example from core/profiles/standard/config/install/editor.editor.basic_html.yml:

settings:
  plugins:
    ckeditor5_sourceEditing:
      allowed_tags:
        - '<cite>'
        - '<dl>'
        - '<dt>'
        - '<dd>'
        - '<a hreflang>'
        - '<blockquote cite>'
        - '<h2 id>'
        - '<h3 id>'
        - '<h4 id>'
        - '<h5 id>'
        - '<h6 id>'
isalmanhaider’s picture

I agree that this issue remains pertinent, especially considering the structure used by CKEditor in Drupal core.

Adopting a similar format for the "allowed_html" setting, as demonstrated in core/profiles/standard/config/install/editor.editor.basic_html.yml, would not only ensure consistency with CKEditor's configuration but also improve clarity and manageability for developers and site administrators.

Such alignment could greatly enhance user experience and reduce configuration errors.

Wim Leers’s picture

Title: Create a schema for "allowed_html" which provides a better config diff » Improve schema for "allowed_html" to provide a better config diff
Issue tags: +Configuration schema, +validation, +blocker
Related issues: +#3421946: [PP-2] Make FilterFormat config entities fully validatable

This at minimum soft-blocks #3421946: [PP-2] Make FilterFormat config entities fully validatable and potentially hard-blocks it.

acbramley’s picture

Glad I found this issue, this is a problem every single time we change filter formats in any way, especially when (for example) adding new styles.

What do people think of an interim solution where we keep the existing unstructured string and at the very least sort the tags alphabetically? That would at least alleviate very hard to parse diffs like the following:

Filter format diff

This is an example where a new style had been added to the table element, but all the other tags were randomly reordered.

Wim Leers’s picture

#93++ — let's do that here so we can close this already ~100-comment long issue? 😄