Problem/Motivation
I am working on a site which requires quite a specific and detailed element/attribute whitelist. My "allowed_html" setting is 1200 characters long. When updating this, it's impossible to tell from a diff what changed. In addition to this, we've found that changes manually made to the attributes and classes are reverted, chopped and changed by Drupal.behaviors.filterFilterHtmlUpdating erroneously, possibly from a bug in the JS or one of the filters we have installed.
It would be great if the schema that stored these settings provided a better diff.
Proposed resolution
Update the schema to look something like:
allowed_html:
type: sequence
label: 'Allowed HTML'
sequence:
type: sequence
label: 'Attributes'
sequence:
type: string
label: 'Value'
Which would transform settings like:
allowed_html: '<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type>'
Into:
allowed_html:
a:
href: '*'
hreflang: '*'
em: { }
strong: { }
cite: { }
blockquote:
cite: '*'
code: { }
ul:
type: '*'
ol:
start: '*'
type: '*'
Remaining tasks
Validate & see if this is something we can achieve without breaking BC.
User interface changes
API changes
Data model changes
Comment | File | Size | Author |
---|---|---|---|
#93 | Screenshot from 2024-03-14 13-16-53.png | 104.3 KB | acbramley |
#74 | 2871354-filter_html-schema-74.patch | 43.72 KB | Sam152 |
#74 | interdiff.txt | 3.16 KB | Sam152 |
#72 | 2871354-filter_html-schema-72.patch | 50.92 KB | Sam152 |
#72 | interdiff.txt | 4.6 KB | Sam152 |
Comments
Comment #2
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedComment #3
Wim Leers+1!
Comment #4
dawehnerIn case someone needs help implementing this (and writing the update function/tests), please ping me, I'd so like to see this landing.
Comment #5
Wim LeersThanks, @dawehner!
Comment #6
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedThis setting had a lot more touch points with things like tests, migrations, clientside code and existing config then the related issue. Starting on that one first to see how far it gets.
Comment #7
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedI've been going through a few iterations of designing this schema. It turns out it's very helpful to split out the attribute name and value when parsing the string into the schema, so that this information doesn't have to be put back into a DOM node and parsed down the track when calculating getHTMLRestrictions(). For that reason it had to be altered a bit.
Here is one which seemed to work, but the issue was it also seems quite verbose:
From:
To:
The other option would be to use the tag name and attribute name as the key for the sequences. I liked the result of this a lot better:
Since no tag or attribute should be repeated, I think this works well. Going to continue with it.
Comment #8
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedUploading to see which tests break and for any feedback on the schema and approach thus far. Still needs tests and upgrade path.
Comment #10
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedSome more progress.
Comment #12
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedSome more test fixes. Since migrate needed the transformation and the update hook was using reflection, I think it makes sense to just move the array => string transformation into its own class. Benefit being they can also be unit tested.
Comment #14
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedComment #16
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedStill need to figure out a way to set default allowed_html without re-adding the elements them every time the filter is saved. When it was a single string, the value in the plugin configuration was always overriding the default. Now that's it's a mapping the whole structure is merged in, as you would expect for typical nested settings.
Comment #17
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedI've been looking into how to set some defaults for the allowed_html value without forcing the defaults to always appear. The conflict is in FilterPluginCollection::initializePlugin:
The plugin doesn't have a chance to act before being created, the plugin collection does a mergeDeep in the settings from the annotation into the saved configuration.
Comment #18
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedAn approach to dealing with the default value problem.
Comment #20
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedEvery time :)
Comment #22
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedFails were tracked down to FilterPluginCollection. The kind of funky thing going on is the initial default config is merged based on the plugin definition (FilterPluginCollection::initializePlugin), then later removed from the collection if the config matches the method call (FilterPluginCollection::getConfiguration), so this is a bit of a work around to not use NestedArray::mergeDeep for configuration, but still have an array for default configuration.
Comment #23
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedComment #24
dawehnerIs there any way how we could add some constrains to validate these a bit more.
Comment #25
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedI don't know if we could start being more restrictive with the storage, because what we're upgrading from is essentially a string with no schema at all. We are emulating the way that string was parsed for the purposes of restricting HTML, but instead of doing some filtering we're storing that structure in the config object. This ensures the settings we're storing will behave in an exact BC fashion once they've been upgraded from the old string.
We could add extra validations to the form, but those validations wouldn't be blocked by this issue. The current UI is very permissive so we could do that work without changing the schema at all.
Comment #26
dawehnerI try to understand why filter.install doesn't resolve it.
Comment #27
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedI was going to try catch you on on IRC to discuss. Perhaps you could show me what you mean by #26. Not sure I follow.
Comment #28
Wim LeersSorry for the delay in reviewing this, I somehow lost track of this after DrupalCon! And wow, very solid work: impressive eye to detail!
#7: thanks for sharing your thought process! So very important :) I agree using the elements as keys makes sense.
#16 + #17: this reminds me of #2361539: Config export key order is not predictable for sequences, add orderby property to config schema — see the CR at https://www.drupal.org/node/2852566.
Patch review time!
The top-level key here should be
'h2'
.Wow, I didn't even know this was a thing!
Note: in principle we don't let update functions call external functions, because there's a risk that those external (non-update) functions change over time, and hence break the update path.
In this case, that seems impossible, because it's literally about a 1:1 transformation from one representation to another.
I'm wondering if we should these static methods be moved into
\Drupal\filter\Plugin\Filter\FilterHtml
?Nit: s/Generate/Generates/
I'd rename this to
convertAllowedHtmlStringToArray()
.(Because we're not generating anything.)
This portion deviates from the existing code in HEAD in
\Drupal\filter\Plugin\Filter\FilterHtml::getHTMLRestrictions()
. Why?EDIT: oh, because there's a different need for
getHTMLRestrictions()
: that is complying with an API that expects a certain return value. By changing theallowed_html
setting from a string to an actual key-value map, our transformation needs match the needs ofgetHTMLRestrictions()
almost exactly. The difference lies in those final details.getHTMLRestrictions()
can itself then become a lot simpler, because it won't need to parse/transform a string anymore!I'd rename this to
convertAllowedHtmlArrayToString()
.(Because we're not generating anything.)
I don't understand this. Why can't the annotation list the actual default setting?
I don't understand this either. "So that we can support a default configuration that isn't merged" -> what does this mean? Is this for BC reasons?
Nit: If we'd change this to
if (!empty(…
, then we could keep the original code order, and the diff would be slightly easier to read. No strong opinion though, if you prefer this way, that's okay.Hm, it's kind of strange to convert everything from strings to key-value maps, except for attribute values. I suspect you have a good reason for it though?
This looks great, but there's one case I'm missing: when an attribute has multiple allowed attribute values listed.
I think that if we can remove the
defaultConfiguration()
method I reviewed above, that we also no longer need this test.Ah, here is the one example of multiple allowed attribute values. Doesn't it look strange to you too that this is the only place where we're not dealing with an array, where you still have a string that you must parse?
(Of course, this is something you use fairly rarely, so you definitely already solved 90% of the problem. Still, it's curious/noteworthy, hence I'm calling it out.)
EDIT: found another. I'm glad and relieved to see that we did have some test coverage for this.
<3
Can we change these to
assertSame()
?Why are we removing this?
I reviewed the changes here very carefully, I didn't find any problems. They are 1:1 translations.
Comment #29
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedThank you for the review, this is awesome feedback. I'm going to take the time to go through and make the required changes tomorrow.
In the meantime, let me explain in some more detail the issues with the annotation containing the default allowed_html configuration, since to me that's probably one of the funkier things going on here. I should probably have pointed out they are deep merged. The result of that is, if you save an instance of the plugin and remove some elements/attributes, they come back as soon as it's initialised again. We want the merge to stop at allowed_html and not consider what's inside. Unless the plugin works around this, I don't know if there is a clean BC way to do this in FilterPluginCollection.
Comment #30
Wim LeersIndeed it is! Very unexpected!
Okay, so this is due to
FilterPluginCollection
always merging the default configuration, and that coming back to bite us forallowed_html
, because this issue is changing that into an array?Comment #31
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedReally appreciate the thorough review. Feedback addressed as follows:
Before attribute value arrays:
After:
Comment #32
Wim LeersI just realized this is first and foremost a
filter.module
issue. Moving.Comment #33
Wim LeersFilterPluginCollection::getConfiguration()
to not do this auto-merging-by-default-always thing. Although that'd be a BC break :/ So, let's instead document this very clearly, with an explicit@see \Drupal\filter\FilterPluginCollection::getConfiguration()
.Next steps
So the only remaining thing to be done is to clarify the docs for point 9, and a change record. Then I'll RTBC! :) Thanks so much for this! Sorry for the long wait since #31, this fell of my radar.
And actually, it might be a good idea to change the default value here from
TRUE
to something more descriptive. Such asdynamically_set_to_not_break_bc
(better name TBD).Comment #34
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedThank you for the review, really appreciate it! One thing which I hadn't considered but might work is to use the string representation in the annotation instead and convert it when the plugin is initialised.
Adding the CR shortly.
Comment #35
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedI wonder if for BC reasons, any string passed to setConfiguration (from the annotation or elsewhere) should be converted into the array representation, instead of just assuming it's the default.
Comment #36
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedSlightly altered approach where any config presented as a string will be converted, not just the default in the annotation.
Comment #37
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedRemoving unused use from previous iteration.
Comment #38
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedComment #39
Wim LeersCR looks good! I only improved its formatting.
Me neither! Genius! :D Furthermore, this means no change at all to the annotation, which is even better, because fewer BC concerns!
#36 did exactly what I wanted to suggest when reviewing #34.
In a last pass, I found one small problem that a core committer will definitely block this for. And in your last few rerolls, I think some of the comments could be clearer. So… just one more reroll :)
Must be
filter_update_8401()
.of allowed HTML tags, each containing more configuration values.
s/So that we aren't/To prevent/
Minor improvement suggestions, which can happen in a follow-up and don't need to block commit:
Allowed HTML tags
Allowed attributes
Allowed attribute values
Comment #40
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedTo fix the tests, it looks like the code to change the string into the array in defaultConfiguration is still required because that's what \Drupal\filter\FilterPluginCollection::getConfiguration uses to determine if the plugin is enabled for any given filter. Removing it causes it to consider filter_html enabled when it isn't.
As for the feedback:
1. Good catch.
2. Updated as suggested.
3. Fixed.
1. Better label, fixed.
2. Fixed.
3. Fixed.
Thanks for looking at the CR too.
Comment #41
Wim LeersYou forgot to update these to 8401… :) Once those are fixed, insta-RTBC!
Comment #42
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedFixed, good catch.
Comment #43
Wim Leers🎉
Comment #44
larowlanSeeing this alerts me to the fact that there would be many contrib/client project tests that include something similar. Can we update the change record to say that if you have tests that create filter formats with this plugin, you have to change your allowed_html from a string to an array. But we should mention that you can use the static method inline to make the conversion with minimal fuss. We should provide a code sample showing this in the change record.
Other than that, it looks awesome and will be a great improvement to many Drupal8 projects in the wild!
Comment #45
larowlanI'm not sure if #44 constitutes a BC break, asking some others for clarification.
How much work would it be for the plugin to be smart enough to juggle both a string and an array when used at the API level (obviously at the form level it doesn't need to)?
Comment #46
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedWe actually got this for free in #36 but never added a test to prove it, added that.
Comment #47
larowlanThanks, this is ready to go in my book
Comment #48
Gábor HojtsyWe are not supposed to change config key types given we don't have versioning for them. To keep BC AFAIS we would need to add a new key that would have the new type definition and deprecate the old key. Otherwise what happens on a site with the new code that tries to import from a staged version of the site with the old config? Allowed HTML settings lost? That could lead to security problems even.
Comment #49
Wim LeersIs this the first issue where we're trying to improve config schema? Do we have prior experience with this?
Comment #50
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedDo we really support importing config between different versions of Drupal? This seems like it should fall under "undefined behavior", I do wonder how well past schema changes have dealt with this or if this was even a consideration. I was under the impression the accepted practice was that an updb was always accompanied by a config export for this reason.
Comment #51
dawehnerThe only main problem I can see is configuration shipped with install profiles/modules. The code, which is part of this patch, ensures on runtime, that the values are converted properly, but maybe tools like config inspector get confused?
We did a sort of similar change in views: #2459289: Boolean default values are not saved. It was possible for us to switch from booleans to strings as long we provided an update path + a conversion on runtime.
Comment #52
Gábor HojtsyIn #1977498-7: Add version tracking for configuration schema and data 3 years ago I summarized the discussion we had with this paragraph:
Also #2543150: Document consequences of contrib changing config schema without core's API supporting config version tracking has a bunch of more detail, eg. comment #2 there again from me:
Comment #53
Wim LeersFollowing that reasoning/advice, would we need to modify
\Drupal\filter\Plugin\Filter\FilterHtml
's plugin ID fromfilter_html
to something else?Comment #54
Wim LeersComment #55
Gábor HojtsyWell, if the new plugin cannot be made to work with the deprecated key and the newly added key then yes. I think based on the above it could work with both so no need to rename the plugin.
Comment #56
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedIf the profile has any tests that use strict config schema checking, they'll fail as soon as the schema doesn't match what's in the profile. For this reason our internal process for aGov has been:
But I guess schema changes would technically mean a module/profile would have a minimum version requirement on the major version of core the config was created from, so I can see the pitfalls here.
I'll read over the issues that refer to BC, thanks for pointing those out @Gábor Hojtsy.
This issue was largely a DX improvement, so I'm hesitant to go down the path of doing this at the expense of a lot of ugly code, so will have to see how hard it is to deprecate the old keys in favour of the new. Given the "easy upgrades forever" policy, I'm guessing all improvements to config schema might face this issue in the future too?
Comment #57
mallezieDoes #2762235: Ensure that ConfigImport is taking place without outstanding updates and #2628144: Ensure that ConfigImport is taking place against the same code base remove the need for the 'backwards compatible' 'hack' which should / could be needed here?
Comment #58
Wim Leers#56:
Yes. This issue is just the first.
#57: Interesting links!
Comment #59
Gábor HojtsyAlso this is the same process/thing we do for PHP APIs because we don't have API versioning there either.
Comment #60
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThis reminds me of how CKEditor has an allowedContent configuration that can be expressed in either string format or object format.
I don't know if we can satisfy all the requirements referenced in #52 with a schema type that can be either a string or a sequence. I don't think we have any precedent for that, but maybe it's worth considering, I don't know.
Alternatively, we could create a new key, named something like
allowed_html_object_format
(in JSON, this would be an object),allowed_html_array_format
(in PHP, it's an array),allowed_html_sequence_format
(in YAML schema, it's a sequence), orallowed_html_extended_format
(if we want programming-language-neutral terminology).Just some food for thought.
Additionally, do we want this issue's scope to include making a similar change in
aggregator.settings:items.allowed_html
or would we rather discuss that in a follow-up?Comment #61
effulgentsia CreditAttribution: effulgentsia at Acquia commentedallowed_html_extended_format
might be a confusing name, if people think that the "extended format" suffix is being applied to "html" (i.e., some special flavor of html) rather than applying to the format of the configuration value. So, if anyone has better ideas, please share.Comment #62
effulgentsia CreditAttribution: effulgentsia at Acquia commentedDo we have any precedent for allowing
-
characters in config key names? And more generically, other characters that are allowed in HTML element and attribute names, such as:
or.
? In #2293773: Field allowed values use dots in key names - not allowed in config, we had to change config schema to avoid the problems with.
.Comment #63
Gábor HojtsyPossible alternate names: allowed_html_tags, allowed_html_elements, allowed_html_list, etc.
Comment #64
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commented#62 is correct. "." were previously accepted in attribute names but no longer work with the patch applied.
"allowed_html_elements" seems like a good key to me.
Comment #66
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedStraight reroll, none of the feedback addressed yet.
Comment #68
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedNeeded another reroll + test fix.
Comment #69
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedComment #70
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedHaving a go at what BC for the schema might look like before addressing #64. Seems to work pretty well, I also introduced
FilterLegacyConfigFormatTest
to try and prove the old format will import and work as expected, however it might insufficient, not totally sure at this stage.Comment #72
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedTest fixes.
Comment #74
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedTest fixes.
Comment #75
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedComment #76
Wim LeersThanks for picking this up again, @Sam152!
Comment #77
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedOnly bits and pieces here and there :). Sorry for the noise in the issue!
Comment #78
Wim LeersNo worries at all! Very glad & grateful to see some movement here!
Comment #90
mstrelan CreditAttribution: mstrelan at PreviousNext commentedThis is still very relevant today. We should match ckeditor, for example from
core/profiles/standard/config/install/editor.editor.basic_html.yml
:Comment #91
isalmanhaider CreditAttribution: isalmanhaider as a volunteer commentedI agree that this issue remains pertinent, especially considering the structure used by CKEditor in Drupal core.
Adopting a similar format for the "allowed_html" setting, as demonstrated in
core/profiles/standard/config/install/editor.editor.basic_html.yml
, would not only ensure consistency with CKEditor's configuration but also improve clarity and manageability for developers and site administrators.Such alignment could greatly enhance user experience and reduce configuration errors.
Comment #92
Wim LeersThis at minimum soft-blocks #3421946: [PP-2] Make FilterFormat config entities fully validatable and potentially hard-blocks it.
Comment #93
acbramley CreditAttribution: acbramley at PreviousNext for Service NSW commentedGlad I found this issue, this is a problem every single time we change filter formats in any way, especially when (for example) adding new styles.
What do people think of an interim solution where we keep the existing unstructured string and at the very least sort the tags alphabetically? That would at least alleviate very hard to parse diffs like the following:
This is an example where a new style had been added to the
table
element, but all the other tags were randomly reordered.Comment #94
Wim Leers#93++ — let's do that here so we can close this already ~100-comment long issue? 😄