Problem/Motivation

  • Drupal 8 core outputs <!DOCTYPE html> at the top of html.html.twig, which tells the browser that the contents are HTML5.
  • http://www.w3.org/TR/html5/single-page.html#non-conforming-features lists many tags and attributes from older versions of HTML that the HTML5 specification defines as: Elements in [this] list are entirely obsolete, and must not be used by authors.
  • Install Standard Profile, and go to /node/add/article. Decide that you can't think of anything original to write so you go searching for interesting content on other web pages to copy and paste. You stumble upon https://www.cs.tut.fi/~jkorpela/www/justify.html, which is a perfectly valid HTML 4 page and find that you like the section "What is justification?", so you select that whole section and copy it to your clipboard. You then switch back to your node form and paste that into the body field, click Save, and voila, you now have a page on your site with some lovely justified and right-aligned paragraphs.
  • Problem is, align is not a valid HTML5 attributes. Current browsers still respect it, but there's nothing stopping a future browser from ignoring it. And meanwhile, you now have a site that will be failing strict html validation (if you care about that sort of thing).
  • If instead, non-html5-valid attributes were filtered out, you'd have the opportunity to discover a more correct (i.e., standards-compliant, and therefore future-proof) way of aligning your paragraphs, such as going to your format configuration and adding the alignment/justification buttons to your toolbar.

Proposed resolution

Fortunately, we already have all of the tooling necessary to fix this. FilterInterface::getHTMLRestrictions() provides the API for expressing attribute (and other) restrictions and Html::load() gives us a DOM document that we can use in order to apply those rules. So the proposed resolution consists of changing the allowed_html setting of html_filter to include a whitelist of attributes, and for each attribute, an optional whitelist of allowed attribute values.

Remaining tasks

Review.

User interface changes

The format configuration UI.

API changes

None.

Data model changes

The allowed_html key of the filter_html filter's configurations is expanded to allow for whitelisting attributes. So, e.g. <a> to <a href hreflang dir>, <h4> to <h4 id>, et cetera.

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because content authors shouldn't be required to know the HTML5 specification in order to avoid generating invalid web pages.
Issue priority Major because we should not change the format configurations of existing sites, so if we don't prioritize this for 8.0.0, then sites built between 8.0.0 and when this is fixed will have invalid html content potentially for a long time.
Prioritized changes The main goal of this issue is usability, in the sense of helping content authors generate content that is standards-compliant and therefore future-proof.
Disruption Requires an upgrade path and is disruptive to some existing D8 sites (depending on which attributes their content depends on, they may need to manually add additional attributes and attribute values to be whitelisted). Potentially disruptive for contributed and custom modules that implement filters that run after 'filter_html' but use internal attributes that 'filter_html' will now be filtering out. However, these modules have ways to resolve that, such as instructing site builders on the required configuration for compatibility and/or using the FilterInterface::prepare() method.
CommentFileSizeAuthor
#149 increment.txt4.11 KBpwolanin
#149 2549077-149.patch63.99 KBpwolanin
#147 interdiff.txt776 bytesWim Leers
#147 2549077-147.patch63.48 KBWim Leers
#145 interdiff.txt1.21 KBWim Leers
#145 2549077-145.patch63.34 KBWim Leers
#140 2549077-140.patch63.34 KBpwolanin
#133 interdiff.txt1.04 KBlauriii
#133 allow_the_limit-2549077-133.patch63.35 KBlauriii
#132 increment.txt1 KBpwolanin
#132 2549077-132.patch63.27 KBpwolanin
#131 increment.txt9.83 KBpwolanin
#131 2549077-131.patch63.27 KBpwolanin
#115 increment.txt1.02 KBpwolanin
#115 2549077-115.patch62.89 KBpwolanin
#114 increment.txt3.15 KBpwolanin
#114 2549077-114.patch63.29 KBpwolanin
#112 increment.txt4.36 KBpwolanin
#112 2549077-112.patch62.29 KBpwolanin
#108 increment.txt2.05 KBpwolanin
#108 2549077-108.patch59.77 KBpwolanin
#104 2549077-102.patch59.76 KBpwolanin
#102 increment.txt2.1 KBpwolanin
#102 2549077-102.patch59.61 KBpwolanin
#99 increment.txt3.24 KBpwolanin
#99 2549077-99.patch59.59 KBpwolanin
#98 increment.txt6.19 KBpwolanin
#98 2549077-98.patch60.06 KBpwolanin
#97 increment.txt413 bytespwolanin
#97 2549077-97.patch59.49 KBpwolanin
#91 interdiff.txt3.2 KBWim Leers
#91 2549077-attribute-filtering-91.patch59.45 KBWim Leers
#90 interdiff.txt1.27 KBWim Leers
#90 2549077-attribute-filtering-90.patch59.03 KBWim Leers
#87 2549077-87.patch58.5 KBpwolanin
#84 interdiff.txt7.71 KBWim Leers
#84 2549077-attribute-filtering-84.patch59.12 KBWim Leers
#80 increment.txt9.99 KBpwolanin
#80 2549077-80.patch54.63 KBpwolanin
#72 increment.txt9.5 KBpwolanin
#72 2549077-71.patch50.59 KBpwolanin
#68 increment.txt4.6 KBpwolanin
#68 2549077-67.patch49.41 KBpwolanin
#61 2510138-js-wip-commits-do-not-test.patch61.18 KBWim Leers
#52 increment.txt4.92 KBpwolanin
#52 2549077-52.patch49.08 KBpwolanin
#51 increment.txt3.53 KBpwolanin
#51 2549077-51.patch49.94 KBpwolanin
#49 increment.txt1.4 KBpwolanin
#49 2549077-49.patch49.32 KBpwolanin
#49 z.output.txt1.76 KBpwolanin
#49 z.php_.txt3.95 KBpwolanin
#48 increment.txt1.36 KBpwolanin
#48 2549077-48.patch49.14 KBpwolanin
#47 increment.txt20.96 KBpwolanin
#47 2549077-47.patch49.97 KBpwolanin
#46 interdiff-2549077-45-46.txt3.13 KBphenaproxima
#46 2549077-46.patch42.3 KBphenaproxima
#45 increment.txt7.21 KBpwolanin
#45 2549077-45.patch40.61 KBpwolanin
#44 interdiff.txt2.78 KBWim Leers
#44 filter_html_attribute_whitelisting-2549077-44.patch36.97 KBWim Leers
#43 interdiff.txt1.9 KBWim Leers
#43 filter_html_attribute_whitelisting-2549077-43.patch36.63 KBWim Leers
#42 increment.txt4.23 KBpwolanin
#42 2549077-42.patch35.52 KBpwolanin
#40 interdiff.txt5.8 KBWim Leers
#40 filter_html_attribute_whitelisting-2549077-40.patch35.28 KBWim Leers
#39 increment.txt4.59 KBpwolanin
#39 2549077-39.patch35 KBpwolanin
#38 filter_html_attribute_whitelisting-2549077-38.patch47.78 KBWim Leers
#36 increment.txt6.04 KBpwolanin
#36 2549077-36.patch35.49 KBpwolanin
#31 increment.txt4.26 KBpwolanin
#31 2549077-31.patch34.47 KBpwolanin
#29 2549077-29.patch34.11 KBpwolanin
#27 2549077-27.patch34.07 KBpwolanin
#5 filter_html_attribute_whitelisting-2549077-5.patch43.45 KBWim Leers
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

effulgentsia created an issue. See original summary.

effulgentsia’s picture

Issue summary: View changes
effulgentsia’s picture

Issue summary: View changes
effulgentsia’s picture

Assigned: Unassigned » Wim Leers

Wim has already put together a first draft patch. Assigning this to him to post that work.

Wim Leers’s picture

This patch was created using $ git format-patch origin/8.0.x...HEAD --stdout > patchfile, so it can be applied using git am. This ensures you continue to see all individual commits, which should make the review process significantly easier.

The 8 commits contained in this patch, in commit order, are:

  1. Update FilterHtml's processing to support attribute and attribute value whitelisting. Also updated FilterUnitTest to already have reasonable correctness guarantees.
  2. Update other Filter tests, and update shipped configuration's allowed_html settings.
  3. D6 migration path updated to add the same default allowed attributes that D8 ships with.
  4. Update CKEditor module & test coverage: it needs to generate a different allowedContent setting now, to deal with attribute whitelisting.
  5. Fix bugs in HEAD: #2510138-5: Unidirectional editor configuration -> filter settings syncing breaks when button name does not equal the command name
  6. Update the parsing of the user-entered setting into Drupal.FilterHtmlRule objects.
  7. Update the generating of the setting based on the parsed user-entered setting plus additional tags/attributes required by the text editor.
  8. Update the calculating of the additional allowed tags and attributes based on the text editor configuration. EVERYTHING WORKS AGAIN!

Commits 1-4 affect the PHP code. Commits 5–8 affect the JS code for filter setting syncing (i.e. the stuff that #1894644: Unidirectional editor configuration -> filter settings syncing brought to core). And point 5 specifically is a fix to pre-existing bugs in HEAD, the patch for which has been filed in the corresponding issue.

Wim Leers’s picture

Issue summary: View changes
Issue tags: +Needs upgrade path, +Needs change record

The IS was wrong about BC.

While any existing value for the allowed_html key of the filter_html filter's configurations remains valid (and is BC-compatible in that sense), keeping the exact same value means that all attributes of those allowed tags are stripped. Because this patch only allows whitelisted attributes and attribute values to remain.

Also, if this truly were BC-compatible, I wouldn't be working on this now, but I'd do it in 8.1.0. Doing this in 8.1.0 would cause an extremely painful upgrade path, one that would by definition require manual intervention. It'd be better to do that before 8.0.0 for that reason.

The whole point of this issue is that no attributes are allowed by default. The only thing we can do in terms of upgrade path (and this patch already updates the D6 upgrade path in that way too), is assigning the attribute whitelist for each tag that D8 ships with to the tags that are allowed in D6 (and D7, once we have a D7 upgrade path). That sounds very complex, but it simply means mapping <a> to <a href hreflang dir>, <h4> to <h4 id>, et cetera.

The beta evaluation got it right though:

Major because we should not change the format configurations of existing sites, so if we don't prioritize this for 8.0.0, then sites built between 8.0.0 and when this is fixed will have invalid html content potentially for a long time.

IS updated accordingly.

Dave Reid’s picture

I'm assuming this will affect Entity Embed and URL Embed?

Wim Leers’s picture

The attributes they need would need to be whitelisted, but if their CKEditor plugins mark those attributes as required (as they should), then they'll be whitelisted automatically when their corresponding CKEditor buttons are enabled. Which is just like enabling the image button for CKEditor, that causes the data-align and data-caption attributes to be whitelisted on the <img> tag.

RainbowArray’s picture

What this boils down to me is how simple is it to whitelist attributes? Where does that happen?

Is there a way to whitelist an attribute so it can be used on every allowed tag? Or do you need to manually whitelist every attribute for every single allowed tag? If the format I'm seeing in the patch is how that will be done, if somebody wants to whitelist a number of attributes for every tag, that could get very long quickly.

The initial set of attributes looks like it's a pretty small subset of the valid html5 attribute list. And this doesn't prevent anybody from whitelisting an attribute that isn't valid in html5. It might be more difficult than right now, but it's certainly still possible.

JS frameworks all too often use non-valid attributes to do their magic. You'd hope an editor wouldn't need to use those attributes within a rich text field, but I'm betting that will happen.

From my personal experience, all too often I've gotten really frustrated when a wysiwyg like ckeditor wrecks my markup by stripping out returns and tabs and classes and, heaven forbid it is necessary (but sometimes it is), inline styles. And while it may be possible to then dig into the innards of the configuration, hopefully in the UI, to prevent that from happening again, in the meantime a bunch of markup that I wrote now has to be rewritten.

So I'm not sure that when somebody tries to use a non-html5 attribute they'll instantly realize, hey, I can only use valid html5, and then they'll behave from here on out. Because that's not true, there's a whole lot of html5 attributes they can't use either. So the more likely outcome is that if they really wanted to do something, they dig into the config and add in the align attribute so things work again.

I don't know, maybe I'm wrong. I'm just not sure that if the explicit goals is to teach content editors not to use attributes that are no longer valid in html5, that this will do that. It will do other things, and maybe those things are good too.

Overall, if tags can be whitelisted, it probably makes sense that attributes can be whitelisted too.

Dave Reid’s picture

@WimLeers The problem with Entity Embed is that we do not know in advance what all attributes could be used. We make it very possible for any arbitrary attributes to be added to the <drupal-entity> element. For example the image field formatters add in alt and title attributes. Another use case could be that something adds in data-field-name-override="Overridden value". I guess we could basically whitelist anything that starts with data-* in Entity Embed, but I'm not sure if that's possible.

Wim Leers’s picture

#10: But those formatters are run in a filter, right? So that filter should just run after filter_html.

effulgentsia’s picture

@mdrummond, @Dave Reid: thanks for joining the conversation on this issue. It's really necessary to bring in multiple perspectives for the outcome to be a successful one.

Overall, if tags can be whitelisted, it probably makes sense that attributes can be whitelisted too.

Exactly. That's the bottom line. On D6 and D7, attribute whitelisting requires a contrib module (there are several popular ones to choose from), but my point in this issue is that with D8 core outputting HTML5, attribute whitelisting needs to be supported by core.

The initial set of attributes looks like it's a pretty small subset of the valid html5 attribute list.

Right. Just like the default tag list is a small subset of valid html tags. If you want to add a <table> tag to the Restricted HTML format, you need to edit the format and do that. Similarly, if you want to add a draggable attribute, edit the format and do it. Standard profile's default configuration should only include what we consider to be very broadly necessary.

And this doesn't prevent anybody from whitelisting an attribute that isn't valid in html5. It might be more difficult than right now, but it's certainly still possible...So I'm not sure that when somebody tries to use a non-html5 attribute they'll instantly realize, hey, I can only use valid html5, and then they'll behave from here on out. Because that's not true, there's a whole lot of html5 attributes they can't use either. So the more likely outcome is that if they really wanted to do something, they dig into the config and add in the align attribute so things work again.

The difference is that for the person to do this, they need permission to edit that format's configuration. Which means they're acting in a "site building" capacity rather than a "content authoring" capacity. We can't stop site builders from making poor configuration choices. But we can help content authors generate content that complies with technical standards that we shouldn't require them to know.

I'm just not sure that if the explicit goals is to teach content editors not to use attributes that are no longer valid in html5, that this will do that. It will do other things, and maybe those things are good too.

HTML5 is the example I wrote the IS with, because that's the specification that a fresh core installation claims to comply with (via the doctype output in html.html.twig). But it's not just about that: it's about site builders being able to define the specifications of the site they build (HTML5 being one component of that). We shouldn't have to teach content editors about those specifications: the tools should automatically ensure that the content complies.

RainbowArray’s picture

Title: Standard profile outputs an HTML5 doctype declaration, but allows content authors to create content with non-conforming attributes » Allow content authors to use whitelisted attributes with standard profile

I think this issue title better reflects how this is working.

effulgentsia’s picture

Title: Allow content authors to use whitelisted attributes with standard profile » Restrict content authors using core profiles to whitelisted HTML attributes

Or even this.

davidhernandez’s picture

Reading through this, I agree with mdrummond's concerns. They are not unfounded. The high level goal here is a worth one, but this does seem to be some amount of babysitting. The example in the IS is a good one. I can imagine that exact scenario happening, and that editor not having any idea how to fix it. That's the big concern for me.

I do agree with effulgentsia in #12:

...they need permission to edit that format's configuration. Which means they're acting in a "site building" capacity rather than a "content authoring" capacity.

We have to always side with allowing the flexibility to do something, even if it isn't the default. Listen to the person, not the machine.

Did we get an answer as to why the whitelist has to be per tag? That seems cumbersome. And looking briefly at the patch, there are a bunch of tags that don't have any attributes whitelisted, and class is not allowed on anything?

Wim Leers’s picture

Did we get an answer as to why the whitelist has to be per tag?

Because many attributes only make sense on specific tags. And perhaps you e.g. only want to allow images to be aligned, and not blockquotes.

That seems cumbersome.

We could support a * tag whose attribute whitelist would apply to all tags. E.g. <* id> would allow any ID attribute on any tag.

And looking briefly at the patch, there are a bunch of tags that don't have any attributes whitelisted, and class is not allowed on anything?

First, the specific default whitelist in the patch is just an example, it definitely needs scrutiny (aka bikeshedding).
Second, the class attribute is not whitelisted because many sites don't want to use classes. If you want to use classes, enable CKEditor's "Styles" button and add additional styles there; they'll become whitelisted automatically then. I.e. if you configure

p.callout|Callout
img.hero|Hero image

in CKEditor's Styles button configuration, then you'll effectively be whitelisting <p class="callout"> <img class="hero">. Allowing any class to be used is undesirable. If you specifically make classes available for the content editor, then you have to tell the text editor configuration about it, which well then automatically cause the text format (filter) configuration to be kept in sync.

davidhernandez’s picture

We could support a * tag whose attribute whitelist would apply to all tags.

I think that would be an improvement.

Allowing any class to be used is undesirable.

Undesirable for whom? Why make these kinds of value judgements? If the editor is this restrictive out of the box, it would be a fail for users, especially new ones. People will not understand what is happening, and it could potentially increase the perception that Drupal is just too hard to use.

I understand that some sites want a very restricted editing experience, ones run by teams and with more structure, but not every site is like that. (Do we need this to look even more enterprise-like, with the perception that you really need a team of experts to even run a Drupal site?) I support having the ability for sites to create that level of restriction, to control the things their editors can do, but I would not force it upon people.

Just thinking about myself, I use one-off CSS classes all the time for something simple. Having to add them to the styles drop-down of the editor just to make sure they don't get filtered would be terrible. Those styles are only for things editors use frequently, and need easy access to. I would not want everything to appear there. And I don't want to have to change the editor configuration every time I need to use a class. The styles drop down also doesn't support multiple selections, as far as I know. I can't choose "Special subhead" and "Red" together. But, in any case, how would I whitelist that? Does every combination of class names have to be whitelisted?

....

After ranting to myself a bit I'm looking at this patch some more. I don't really understand enough about the filters, but I'm trying to figure some of it out. This looks like it is tied to the filter itself, and not the text format? So you are proposing adding these restrictions to the "Limit allowed HTML tags" filter, which is active for "Basic HTML" and "Restricted HTML", but not "Full HTML". Basic HTML being the default format used after install. That means changing your format from Basic to Full would turn off the filtering completely? Is that right?

If that is the case, that makes this less controversial. And it is a lot easier to tell people to change their text format than to figure out how to change their CKEditor configuration to support exact specifics of what they are doing. What I would prefer then is to see if this could be a separate filter instead of it being part of full_html, or an option of some kind. A separate "Limit allowed HTML attributes". That would make it easier to turn off and on by itself. Then we could argue about whether it should be on by default, but it would available for use.

greggles’s picture

I think the need here really depends on the site. We've seen cases on drupal.org where people abuse classes and ids to get styling on pages to make things look very different from the reality. For example, there was an installation profile a few years ago that manually made a download area that looked like the real one and was "green" and had download links on a completely different site that contained insecure versions of core/contrib. The real download area is (or was) smart enough to turn yellow if core was out of date, but this manually created table kept being green for long after.

I believe this would be part of the filter so advanced users who are trusted with more advanced filters could be trusted with more permissive sets of attributes and attribute values.

Further, some sites could get by with a wildcard (if only trusted users are inputting content), while others will want to strictly limit the attributes. That feels like a very Drupally solution to me.

David_Rothstein’s picture

Title: Restrict content authors using core profiles to whitelisted HTML attributes » Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small whitelist of attributes by default
Issue tags: +Needs backport to D7

Better title, I think, based on the above discussion. Definitely if you're using Full HTML you should not have any restrictions, but for formats that use the HTML filter it makes sense to have this as an option -- in general it is a good principle to only give people access to what they need.

For example, if you paste some HTML from somewhere on the internet and it has class names that overlap with class names used on your site, you could accidentally get some styling that you don't want, etc.

I believe WordPress, for example, only allows unprivileged users (e.g. people leaving comments) access to a limited set of HTML attributes by default. In Drupal you currently need to install something like HTML Purifier or WYSIWYG Filter to accomplish this, which are good but a fair amount more complex than the filters in core.

We should consider this for backport to Drupal 7 also (although without the change in default behavior, especially for existing sites).

Wim Leers’s picture

#17:

So you are proposing adding these restrictions to the "Limit allowed HTML tags" filter, which is active for "Basic HTML" and "Restricted HTML", but not "Full HTML".

Correct.

Basic HTML being the default format used after install.

Correct. And for a good reason: we don't want to actively encourage people from creating all sorts of crazy HTML and having articles output that.

That means changing your format from Basic to Full would turn off the filtering completely? Is that right?

Yes, that's right. And it's nothing new.

This has already been the case in Drupal 8 for a few years, since shortly after we have CKEditor. When using "Basic HTML", CKEditor will not allow you to use (or paste) <table> (just an example) unless the "Limit allowed HTML tags" filter allows that tag. In other words: CKEditor helps see content creators immediately what to expect, rather than happily showing any HTML and then filtering it away on output, which is extremely frustrating.

Since Full HTML does not use the "Limit allowed HTML tags" filter, there are no restrictions, and hence you can do anything, including <script> tags. (Which is another reason why you should not ever allow non-admin users to use "Full HTML").

Finally, this actually also is the case in Drupal 7 and before. The only new thing here is to also whitelist attributes. But the same exemptions in case of "Full HTML" apply: anything goes when using Full HTML.


#18: Thanks, great anecdotal examples of why this is useful.


#19:

in general it is a good principle to only give people access to what they need.

Exactly!

I believe WordPress, for example, only allows unprivileged users (e.g. people leaving comments) access to a limited set of HTML attributes by default.

Indeed, in that regard we are lagging WordPress. (Though WordPress still does its filter-on-input thing, see https://www.drupal.org/node/263002 for why Drupal does not.)


Now, let's look at things from a different perspective. Overall, I would personally say that Drupal really is about data modeling. You create content types, you add fields to them. You configure the fields' settings. This issue is following those same lines, and is addressing something we've overlooked for quite some time.

We currently allow any sort of crazy attributes to be set on tags (only style and on* are disallowed). Which means you still can't reason properly about the data that Drupal stores, because the semantic vocabulary of markup that this particular site, with this particular use case, and this particular audience… is simply not defined. It's only defined coarsely, at the tag level.

davidhernandez’s picture

But what you are proposing doesn't have much of a middle ground. It is either all or nothing. You are untrusted and have no access to tags or attributes or you are an admin user and can do anything. The patch doesn't apply any more so I can't check, but does this give us a configuration to choose the allowed attributes like the option we have to choose the allowed tags? I assume not based on the prior discussion of the whitelisting and wildcard. At a bare minimum that would be needed, in my opinion.

Which means you still can't reason properly about the data that Drupal stores, because the semantic vocabulary of markup that this particular site, with this particular use case, and this particular audience… is simply not defined.

I don't see why that is a concern. It may be for some sites, but it certainly isn't for every site. Again, I'm not against functionality to give a site owner the ability, but we should be careful about forcing it down everyone's throat and making the interface more difficult to use.

Wim Leers’s picture

The patch doesn't apply any more so I can't check, but does this give us a configuration to choose the allowed attributes like the option we have to choose the allowed tags? I assume not based on the prior discussion of the whitelisting and wildcard. At a bare minimum that would be needed, in my opinion.

Dragging in the "Image" button would cause all attributes needed for that button to function correctly to be whitelisted, so it'd add <img src alt data-align data-caption data-entity-type data-entity-uuid> to be whitelisted.

Adding additional styles to CKEditor (e.g. p.callout|Callout, p.intro|Introduction and blockquote.choice-quote|Choice quote) also cause the corresponding attributes to be whitelisted (in the examples given: <p class="callout|intro"> and <blockquote class="choice-quote">.

In other words: as you change the text editor configuration, the corresponding attributes that need to be whitelisted are updated automatically.

we should be careful about forcing it down everyone's throat and making the interface more difficult to use.

Yep, I totally agree with that sentiment, and I'm glad you're voicing these concerns :) I think we just may be doing a poor job of explaining this right now.


I'll get a green patch again.

davidhernandez’s picture

Adding additional styles to CKEditor...

I don't think this is flexible enough. I wouldn't want all classes, especially all block level ones, in the CKEditor config. I also wouldn't want to have to constantly update the configuration. The WYSIWYG is a tool to help build markup. It shouldn't be the gatekeeper. Not to mention some of the limitations it has doing something more than plain vanilla.

I presume that if the editor is turned off all the whitelisting goes with it? Adding the image button would whitelist the img tag and those attributes, so not having the button or the editor means those go away?

I'd like to also see some opinions from more content focused people. I know there are many circles that are trending away from being wysiwyg dependent. That could actually play either way.

pwolanin’s picture

Disucussed with Wim that we should try to allow specifying allowed attribute values using a space separated list - in other words just like you would write a class on html like:

<a class="foo bar baz">

to allow an A tag with any/all of the foo, bar, baz class values.

pwolanin’s picture

We need to check that html entities in attribute values are correctly handled in filtering and restored in the output

I think it tends to decode entities when you fetch an attribute value from the DOM node.

pwolanin’s picture

Perhaps the conversion in attribute values of the basic XML entites like &amp; isn't a problem, since it looks like the patch is also using DOM to extract the allowed values.

We do need to fix the check for allowed attribute values sice it looks like the patch noly supports a single allowed value per attribute, but I expect things like class need to support multiple?

I think we also need a way to specify attribute names and values allowed for all tags? maybe something like <global > or <_attr > or even <_ > which seems to get through the parser?

From the comments on the PHP manual page "if your DOMDocument was loaded from HTML, where element and attribute names are case-insensitive, the DOM parser converts them all to lower-case"

This seems to be true, and I see that reflected in the output of Html::serialize(), so that behavior is not changed by the patch, but something worth documenting perhaps?

pwolanin’s picture

Attempting to reroll. This is missing some changes from HEAD to add H2 and H3 to the allowed tags list.

Status: Needs review » Needs work

The last submitted patch, 27: 2549077-27.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
34.11 KB

oops - double "use" statement.

Status: Needs review » Needs work

The last submitted patch, 29: 2549077-29.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
34.47 KB
4.26 KB

Add h2 and h3 back, and fix up test.

effulgentsia’s picture

I think we also need a way to specify attribute names and values allowed for all tags?

This was also mentioned in #9, #15, #16, and maybe some other comments. I agree with that being useful for some attributes (especially id and dir), but let's also keep in mind that the list of global html5 attributes is very small.

effulgentsia’s picture

+++ b/core/profiles/standard/config/install/filter.format.basic_html.yml
@@ -11,7 +11,7 @@ filters:
-      allowed_html: '<a> <em> <strong> <cite> <blockquote> <code> <ul> <ol> <li> <dl> <dt> <dd> <h2> <h3> <h4> <h5> <h6> <p> <br> <span> <img>'
+      allowed_html: '<a href hreflang dir> <em> <strong> <cite> <blockquote cite dir> <code> <ul> <ol start> <li> <dl> <dt> <dd> <h4 id> <h5 id> <h6 id> <p id dir> <br> <span dir> <img src alt height width data-entity-type data-entity-uuid data-align data-caption>'

Speaking of dir (#32), let's also change these to enumerate the allowed values of it (e.g., dir="ltr rtl") per http://www.w3.org/TR/html5/dom.html#the-dir-attribute. Not sure if we want to add "auto" to that value list in our default configuration, given that spec's statement of "Authors are urged to only use this value as a last resort".

effulgentsia’s picture

Also, wondering if it make any sense to include dir in our default config without also including lang on the same elements?

pwolanin’s picture

Maybe we should just add id, lang, and dir to the globally allowed list of attributes?

pwolanin’s picture

Here's a start on globally allowing those 3.

Also fixes a logic bug in adding rel=nofollow

Clearly there is missing test coverage for that.

Status: Needs review » Needs work

The last submitted patch, 36: 2549077-36.patch, failed testing.

Wim Leers’s picture

I rebased my local branch, which is identical to the patch in #5. The advantage is that I'm now able to still roll patches that contain multiple commits, which make this much, much easier to review.

This is identical to #5, but rebased, and with two additional commits: to import that changes in #31 and #37. One commit of #5 became obsolete: commit 5, because that was just a straight bugfix being included that was necessary here, and that bugfix was since committed: #2510138: Unidirectional editor configuration -> filter settings syncing breaks when button name does not equal the command name.

#31 looks fine, I have some questions for #37.

@pwolanin, you can apply these commits locally, so that you have the exact same branch! :)

pwolanin’s picture

Status: Needs work » Needs review
FileSize
35 KB
4.59 KB

Fixing test fails.

Wim Leers’s picture

@pwolanin and I are now collaborating in a sandbox, easier than multi-commit patches like #5 and #38. See http://cgit.drupalcode.org/sandbox-dereine-2031809/log/?h=2549077-attrib....

This addresses #34.

Status: Needs review » Needs work

The last submitted patch, 40: filter_html_attribute_whitelisting-2549077-40.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
35.52 KB
4.23 KB

test and logic fixes

Wim Leers’s picture

Added a small bit of missing test coverage.

Wim Leers’s picture

Given that id/lang/dir are now globally allowed and hardcoded attributes, update the migration process plugin accordingly. D7 now also has a filter format migration template, add a @todo to update that too.

pwolanin’s picture

Adding more test coverage and test fixes.

phenaproxima’s picture

Fixed the @todo in the d7_filter_format migration.

pwolanin’s picture

Moving the real code to a method on the filter class, and starting a unit test specifically for the attribute code.

Also, tried to improve the performance of FilterHtml by saving the calculated restrictions array - this might make a difference if you are e.g. rendering a long comment thread and calling it multiple times. Also allows for the new method to be more stand-alone.

pwolanin’s picture

I think we can also remove some of the special casing for the rel=nofollow setting in FilterHtml::getHTMLRestrictions(), since that now is added after the other filtering.

pwolanin’s picture

Here's a little change to getHTMLRestrictions() that makes the settings parsing code 5x faster.

Quick microbench script and example output also attached.

dawehner’s picture

This looks great!

  1. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,159 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    $global_allowed_attrs = array_filter($restrictions['allowed']['*']);
    

    Let's not use short names, there is just no reason to do so, unlike we are in fortran

  2. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,159 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +      foreach ($xpath->query('//' . $allowed_tag) as $node) {
    ...
    +      foreach ($allowed_attributes as $attribute => $allowed_values) {
    ...
    +        foreach ($xpath->query('//' . $allowed_tag . '[@' . $attribute . ']') as $node) {
    

    I'm curious whether it would make sense to save some of the xpath queries and instead do the logic in php?

  3. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,159 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +            if (in_array($value, $allowed_values)) {
    

    Should we better we strict here?

  4. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,159 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    $html = strtr($this->settings['allowed_html'], ['>' => ' />']);
    

    Any reason to not just go with str_replace, which is IMHO easier to read for simple usecases and a bit faster

  5. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,159 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    // The 'id', 'lang', and 'dir' attributes apply to all elements and are
    +    // always allowed. The value whitelist for the 'dir' attribute is enforced
    +    // by _filter_html().
    

    It would be nice to document why id, lang and dir are always allowed. Why for example are classes not always allowed but IDs are? So yeah I wonder why this attributes are not just part of the global allowed attributes

pwolanin’s picture

per feedback from dawehner

pwolanin’s picture

Additional tweak to use an xpath selector for any attribute to make the code a little simpler and reduce the number of loops.

regilero’s picture

We are talking of filtered content. I don't understand why we should allow 'id' attribute for contributed contents.

"id" should be a unique attribute in a final page. A regular website will usually use "id" on the layout composition, css theming for main blocs. But having some "id" in the dynamic content may lead to duplicates on the page (i know that browsers all kind enough with this, but by definition it is bad). Some bad usages of ids are replacing regular usage of classes, but forbidding classes and allowing ids will maybe push users to replace "class" classical usages with "id", that would be worst.

What are the real world usage of "id" in contributed contents (I'm maybe wrong on that, just that I do not see real examples) ?

Why would you need to whitelist this attribute?

pwolanin’s picture

I think we have some mismatches now with the docs on \Drupal\filter\Plugin\FilterInterface::getHTMLRestrictions()

pwolanin’s picture

@regilero - it's very commonly used to make jump links - take a look at drupal.org handbook pages. Also, I discussed with @alexpott and Drupal 8 has removed most or all use of ID in CSS selectors.

However, I had been debating myself if there should be a setting to disallow ID as part of the filter.

effulgentsia’s picture

From #19:

For example, if you paste some HTML from somewhere on the internet and it has class names that overlap with class names used on your site, you could accidentally get some styling that you don't want, etc.

I think this applies to ID as well. If you copy content from another web page, what it used for an element's ID might overlap with an ID you already have on your page, and then you end up with duplicate IDs and thereby invalid HTML. So at a minimum I think it should be possible for site builders to opt-out of allowing ID attributes to their content authors. As to whether the default configuration should allow ID, ...

I believe WordPress, for example, only allows unprivileged users (e.g. people leaving comments) access to a limited set of HTML attributes by default.

I think this is an area where WordPress might have already done a lot of good thinking on this. So, do they allow ID by default in their content inputs?

Status: Needs review » Needs work

The last submitted patch, 52: 2549077-52.patch, failed testing.

effulgentsia’s picture

One thought might be to allow ID by default in basic_html, but not in restricted_html, but again, I think we should research what WordPress does, just to have that as a data input to this decision.

regilero’s picture

@pwolanin, so a default filter policy allowing "id" for "a" tags only would make sense. But this 'withelist id for anchor' elements should be removable. In a perfect world I would like a filter allowing "id" only on anchors having no other attributes like "href", very sad that html is using anchors for both external linking and internal maps, that's a very different type of anchor.

Also about id usage in css, you cannot expect all themers to forbid usage of ids in css, even if D8 core is doing so, as "id" seems a pretty nice tool for layouts.

RainbowArray’s picture

You can put id on any element to create an in-page anchor. Yes, ideally an id should not be used for CSS selectors, but there are other legitimate uses of id.

Wim Leers’s picture

So, I've been working on the JS. The challenge here is to keep #2510138: Unidirectional editor configuration -> filter settings syncing breaks when button name does not equal the command name working. The code that that issue introduced is very complex, because it needs to parse generic filter settings (in core's case only filter_html's) and generic text editor (in core's case only CKEditor) settings. And then compare the two, and smartly determine which tags, attributes and attribute values are missing.

The problem is that this was never designed to work with attribute values in general, but only classes and styles. Making that work is a huge undertaking.

On the bright side: this allows us to make the JS match \Drupal\filter\Plugin\FilterInterface::getHTMLRestrictions() in all cases: far fewer data structures are necessary.


But this is such a massive amount of changes on the JS side, after having worked on this for >20 hours in the past few days, the end is still not in sight. Due to the lack of JS tests, I'm also continuously breaking things, and so progress is very slow and EXTREMELY frustrating.

Hopefully somebody else is willing to take this over. Here's a patch that contains multiple commits on top of #44, but it should apply just as well to the latest patch.


Or, perhaps we can make it work with FAR less changes and just have it work for styles and classes initially? And break BC for this in 8.1, with a better, more generic JS API to do the editor->filter setting syncing?

My JS-fu is not strong enough anymore, this would take me several more DAYS, fulltime, to get it all to work again at the current pace. Not to mention I just want to kick my computer after having worked on this the entire day. I need a break from this, and I won't have the energy to work on this again tomorrow.

pwolanin’s picture

@Wimleers - I think just supporting styles and classes in 8.0.x would be fine, as long as the editor JS will ignore other attributes.

The last submitted patch, 52: 2549077-52.patch, failed testing.

RainbowArray’s picture

Wait, so you're saying removing the ability for anything except classes and styles as attributes without the possibility of whitelisting other attributes?

If that's the case, I'd say that it's better to not do this at all. I certainly understand that this is complex and time-consuming. But I think this would be a huge frustration point to remove the ability to add attributes with no recourse except for Full HTML.

Wim Leers’s picture

(Sorry for the unclear, emotional comment in #61 — it was a very frustrating day working on the JS side of this.)

Wait, so you're saying removing the ability for anything except classes and styles as attributes without the possibility of whitelisting other attributes?

No, I'm saying that we'll only do the automatic editor configuration -> filter settings syncing of attribute values for the class attribute. Tag syncing and allowed attribute syncing will continue to work. But only the class attribute will get the attribute value syncing.

That still allows you to whitelist specific attribute values for any attribute, it just won't happen automatically. But the 99% use case is whitelisting specific class attribute values anyway.

Sorry for being so unclear!

effulgentsia’s picture

So as an example, you could either add lang as an attribute without listing values (which would mean, any value is allowed), or you could list values, like lang="en de" if you only wanted those languages allowed. But, if you did the latter, and then used a CKEditor plugin that allowed adding a language selector to the toolbar, and that plugin had a UI for adding languages, this patch would not handle the automatic addition of those languages to your filter configuration: you'd need to edit the filter settings manually in addition to configuring the CKEditor plugin. I think that's acceptable. If at some point, someone with the JS chops wants to take on improving that, I think that can be a normal priority issue for 8.1, or maybe even 8.0.1.

RainbowArray’s picture

I think based on my understanding, sounds good. As long as the allowed attribute list is configurable, which it sounds like it is, +1. Hope tomorrow is a better day Wim!

pwolanin’s picture

Looking at \Drupal\filter\Plugin\FilterInterface it seems that the data structure for attribute values wasn't following the described structure - each value should be a key, and the value TRUE or FALSE.

pwolanin’s picture

Status: Needs work » Needs review

grr

The last submitted patch, 38: filter_html_attribute_whitelisting-2549077-38.patch, failed testing.

The last submitted patch, 38: filter_html_attribute_whitelisting-2549077-38.patch, failed testing.

pwolanin’s picture

Ok, in response to comments above, taking out ID as globally allowed and going back to Wim's original approach of putting it just on H2 - H6 tags by default.

trying to figure out what's going with the lang attribute, chx pointed me to #1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5

A couple points after looking around.

We seem to not be using the html5 library except in one test, and I can only assume it's much slower or has other problems.

This patch makes the html corrector filter pointless (redundant) if you are using the html filter, so we should possibly remove it if they are both, and change the UI text. from "Limit allowed HTML tags" to e.g. "Limit allowed HTML tags and Correct faulty HTML"

Dave Reid’s picture

I'm a little uncertain about modules like Entity Embed which use data attributes on a <drupal-entity> element, but we don't know what attributes will be actually used at run-time, as individual display options/formatters can inject additional attributes to the element. Are all attributes starting with data- whitelisted by default? Is there any way to wildcard all attributes of a tag? Must we absolutely know what attributes are supported at all times?

effulgentsia’s picture

individual display options/formatters can inject additional attributes to the element

That sounds like something that's done during the Entity Embed filter (i.e., render the <drupal-entity> element into whatever the HTML display of that entity is, which is where formatters would come into play). Per #11, don't you need the filter_html filter running before the Entity Embed filter anyway? Since the rendered entity can end up being output with all sorts of tags not allowed by filter_html as well.

Must we absolutely know what attributes are supported at all times?

If I'm right about the above, then what's the use case where you don't know ahead of time all of the data attributes that are possible on the <drupal-entity> element itself (prior to its rendering by Entity Embed)?

Dave Reid’s picture

Yes, we definitely have filter_html running before the Entity Embed rendering filter.

Use case: I want to override a field on a node when embedding it. I form alter the Entity Embed dialog form to add:

$form['attributes']['data-my-field-override'] = array(
  '#type' => 'text',
  ...
);

This form value is automatically saved as <drupal-entity ... data-my-field-override="Submitted value"></drupal-entity>. While I believe whoever form alters this could also alter our filter plugin definition or anything else alterable, the same attribute is definitely *not* going to be added to our CKEditor plugin JS, since that's hard-coded to certain attributes only by Entity Embed (and has no way to alter them).

Dave Reid’s picture

We *need* this flexibility to permit any additional arbitrary attributes to be added to entity embeds. It's critical to making sure that our project is successful in real-world use. We would need to be able to whitelist all attributes on an element (if not just anything that starts with data-*), and not just certain predefined ones.

Gábor Hojtsy’s picture

@pwolanin asked me to look at this issue specifically re id support. I agree that supporting id, lang and other common attributes by default or making it very easy to configure their support would be important.

Wim Leers’s picture

#68:

+++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
@@ -207,7 +217,7 @@ public function getHTMLRestrictions() {
-      'dir' => ['ltr', 'rtl'],
+      'dir' => ['ltr' => TRUE, 'rtl' => TRUE],

Yep, this was wrong, also fixed in #61.


#73:

Are all attributes starting with data- whitelisted by default?

Not in the current patch.

Is there any way to wildcard all attributes of a tag?

No, that'd largely defeat the purpose: it's likely that most sites would then end up allowing all attributes on some or all tags. That'd then make this entire patch pointless.

I think data-my-field-override is a very, very bad idea, because it can easily conflict with other HTML. IMO it should be namespaced to the Entity Embed module. So: data-entity-embed-my-field-override. If we'd then specify <drupal-entity data-entity-embed-*>, then it should work. On the CKEditor side it will already work. But on the PHP side (in the HTML filter), wildcards are not yet supported.

The current installation instructions of Entity Embed say:

Add <drupal-entity> to the 'Allowed HTML tags'

then that could be changed to:

Add <drupal-entity data-entity-type data-entity-uuid data-view-mode data-entity-embed-*> to the 'Allowed HTML tags'

… and in fact all that could happen automatically if you updated Entity Embed's CKEditor plugin to not specify allowedContent: 'drupal-entity[*]' and requiredContent: 'drupal-entity[*]', both of which are very generic. Specify the necessary attributes explicitly and they'll end up being whitelisted. See core/modules/ckeditor/js/plugins/drupalimage/plugin.js for an example.

Hopefully this addresses your concerns :)


@pwolanin Thoughts about supporting prefix-* support for attribute whitelisting?

pwolanin’s picture

@WimLeers - I was thinking about prefix-* as an important use case, thoguh clearly you can't really integrate that with CKEditor?

In terms of the PHP code, it would be a little ugly or less performant, but certainly doable. XPATH seems to support @* but not @prefix-*, so we need to use strpos or preg_match

pwolanin’s picture

Here's with a first pass addition of * support for attribute names and values. Also change the name in the UI and removed htmlcorrector as a default filter.

Wim Leers’s picture

Looks awesome at first sight!

I'll work on the JS tomorrow, as agreed in #61, #62, #66.

Status: Needs review » Needs work

The last submitted patch, 80: 2549077-80.patch, failed testing.

pwolanin queued 80: 2549077-80.patch for re-testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
59.12 KB
7.71 KB

Now the JS is done!


I did a lot of manual testing.

My testing data set for the "Styles" dropdown settings:

a.foo|Foo
a.bar|Bar
a.foo.bar|Foobar
a.foo.bar.baz|Foobar
x.Z|X

Starting from Basic HTML, I added the "Styles" dropdown and then pasted in the above settings. Removed again. Then saved it with the first, second and fifth line. Adding just the third line and removing it again works as expected. Adding the fourth line and removing it again works as expected.

(Remember: once the text format is saved, any styles dropdown setting that is removed — or generally, any button that is removed — does NOT result in the HTML filter's setting being modified. Because in the mean time, content may have been created that depends on those tags/attributes/attribute values being allowed, and therefore we should always only *add* to the filter setting, not *remove*. Site builders can still remove things manually of course, but the automatic updating of the setting should never cause any problems. Hence this design decision, made a long time ago in #2510138: Unidirectional editor configuration -> filter settings syncing breaks when button name does not equal the command name, and still respected now.)

Status: Needs review » Needs work

The last submitted patch, 84: 2549077-attribute-filtering-84.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review

DrupalCI is green, PIFR is too slow (HEAD has already changed by the time it began testing).

Back to NR.

pwolanin’s picture

Needed a re-roll though for conflicts in D6 and d7 migration files. (new branch in sandbox: 2549077-attribute-filtering-rebase)

not sure I got it right - let's see about the tests.

Status: Needs review » Needs work

The last submitted patch, 87: 2549077-87.patch, failed testing.

The last submitted patch, 87: 2549077-87.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
59.03 KB
1.27 KB

Fixed the incorrectly resolved conflicts (interdiff is relative to #87). Should be green again.

This is identical to #84, and just slightly smaller, because core already introduced part of the changes this needed in #2562073: Filter format migrations need to handle php_code filter.

The last submitted patch, 90: 2549077-attribute-filtering-90.patch, failed testing.

Wim Leers’s picture

  1. +++ b/core/modules/filter/filter.module
    @@ -13,7 +13,7 @@
    +use Drupal\filter\Plugin\FilterBase;
    

    Unused.

  2. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    unset($restrictions['allowed']['*']);
    +    $text = Xss::filter($text, array_keys($restrictions['allowed']));
    +    return new FilterProcessResult($this->filterAttributes($text));
    

    This currently feels kinda mysterious. I think it'd be great to explain:
    - that the first two lines do tag filtering, and for that we reuse the well-tested, battle-hardened Xss::filter(); the asterisk tag does not exist, so there's no point in passing that to Xss::filter()
    - after we've done tag filtering, we do attribute and attribute value filtering

  3. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +      if ($tag_attributes === FALSE) {
    +        $tag_attributes = [];
    +      }
    +      // By default, no attributes are allowed for a tag, but due to the
    +      // globally whitelisted attributes, it is impossible for a tag to actually
    +      // completely disallow attributes.
    

    The comment also applies to this if-statement, so I think it should come first.

  4. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +      $allowed_attributes = ['?' => []];
    

    I think we want a comment explaining this mysterious question mark :)

  5. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +      // Find all nodes that have any attributes and filter the attributes.
    

    Nit: s/filter the attributes/filter the attributes and attribute values/

  6. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +          // Remove attributes not in the white list.
    ...
    +            // Check the attribute values white list.
    

    s/white list/whitelist/ ?

  7. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +  /**
    +   * Helper function to prepare attribute values.
    +   *
    +   * @param bool|array $attribute_values
    +   *   TRUE, FALSE, or an array of allowed values.
    +   *
    +   * @return bool|array
    +   */
    +  protected function prepareAttributeValues($attribute_values) {
    

    This exists mainly (solely?) because of whitelisted attribute values supporting wildcards. Would be great to have that mentioned in the docblock?

  8. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
       public function getHTMLRestrictions() {
    

    Nit: in earlier patches (e.g. #4), this also took into account the filter_html_nofollow setting. Also not the case in HEAD though, so it's fine to consider that out of scope.

  9. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    $star_protector = '__zqh6vxfbk3cg__';
    

    Nit: s/star/asterisk/

    Although this makes it sound much more epic, so scratch that :D

  10. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,226 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +          // Fourth, if the attribute value is not the empty string, this means an
    

    Nit: 80 cols violation.

  11. +++ b/core/modules/filter/src/Plugin/migrate/process/FilterSettings.php
    @@ -0,0 +1,49 @@
    +  protected $allowedHtmlDefaultAttributes = [
    +    '<a>' => '<a href hreflang>',
    +    '<blockquote>' => '<blockquote cite>',
    +    '<ol>' => '<ol start>',
    +    '<img>' => '<img src alt height width>',
    +  ];
    

    Needs to be updated to match the current defaults (which includes e.g. <h2 id>).

  12. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -407,41 +408,82 @@ function testHtmlFilter() {
    +    // Restrict the whitelisted "llama" attribute on <a> to only allow the value
    +    // "majestical", and the class on code to be pretty or boring.
    +    $filter->setConfiguration(array(
    +      'settings' => array(
    +        'allowed_html' => '<a href llama="majestical epic"> <em> <strong> <cite> <blockquote> <code class="pretty boring"> <ul> <ol> <li> <dl> <dt> <dd> <br>',
    +        'filter_html_help' => 1,
    +        'filter_html_nofollow' => 0,
    +      )
    +    ));
    +    $f = (string) $filter->process('<a kitten="cute" llama="awesome">link</a>', Language::LANGCODE_NOT_SPECIFIED);
    +    $this->assertIdentical($f, '<a>link</a>', 'HTML filter removes allowed attributes that do not have an explicitly allowed value.');
    +    $f = (string) $filter->process('<a kitten="cute" llama="majestical">link</a>', Language::LANGCODE_NOT_SPECIFIED);
    +    $this->assertIdentical($f, '<a llama="majestical">link</a>', 'HTML filter keeps explicitly allowed attributes with an attribute value that is also explicitly allowed.');
    +    $f = (string) $filter->process('<a kitten="cute" llama="awesome">link</a>', Language::LANGCODE_NOT_SPECIFIED);
    +    $this->assertNormalized($f, '<a>link</a>', 'HTML filter removes allowed attributes that have a not explicitly allowed value.');
    +    $f = (string) $filter->process('<a kitten="cute" llama="majestical">link</a>', Language::LANGCODE_NOT_SPECIFIED);
    +    $this->assertNormalized($f, '<a llama="majestical">link</a>', 'HTML filter keeps explicitly allowed attributes with an attribute value that is also explicitly allowed.');
    +    $f = (string) $filter->process('<a kitten="cute" llama="epic">link</a>', Language::LANGCODE_NOT_SPECIFIED);
    +    $this->assertNormalized($f, '<a llama="epic">link</a>', 'HTML filter keeps explicitly allowed attributes with an attribute value that is also explicitly allowed.');
    +    $f = (string) $filter->process('<a href="/beautiful-animals" kitten="cute" llama="epic majestical">link</a>', Language::LANGCODE_NOT_SPECIFIED);
    +    $this->assertIdentical($f, '<a href="/beautiful-animals" llama="epic majestical">link</a>', 'HTML filter keeps explicitly allowed attributes with an attribute value that is also explicitly allowed.');
    

    This test no longer makes sense.

    The comment doesn't match the configuration.

    There is a lot of duplication.

    Seems like the duplication is a copy/paste error that removed the additional test coverage that is now missing?

    See the test coverage I added in #43.

Status: Needs review » Needs work

The last submitted patch, 91: 2549077-attribute-filtering-91.patch, failed testing.

The last submitted patch, 90: 2549077-attribute-filtering-90.patch, failed testing.

The last submitted patch, 91: 2549077-attribute-filtering-91.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
59.49 KB
413 bytes

Fix for the 1 test fail.

pwolanin’s picture

Fix #93.1-10

pwolanin’s picture

Trying to address the last 2 points.

Some of the test coverage (the CODE tag) was moved to the phpunit test.

Status: Needs review » Needs work

The last submitted patch, 99: 2549077-99.patch, failed testing.

The last submitted patch, 99: 2549077-99.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
59.61 KB
2.1 KB

fix up format strings

Status: Needs review » Needs work

The last submitted patch, 102: 2549077-102.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
59.76 KB

Rebase for minor conflicts in HEAD in core/modules/ckeditor/src/Tests/CKEditorTest.php

Status: Needs review » Needs work

The last submitted patch, 104: 2549077-102.patch, failed testing.

Status: Needs work » Needs review

pwolanin queued 104: 2549077-102.patch for re-testing.

Wim Leers’s picture

Status: Needs review » Needs work

NW only for the first point, the rest is nits (and happy observations)!

  1. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -17,10 +18,10 @@
    + *     "allowed_html" = "<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul> <ol start type='1 A I'> <li> <dl> <dt> <dd> <h2> <h3> <h4> <h5> <h6>",
    

    Missing id attribute on the headers. We do add the id attribute everywhere else: in the default text formats for the Standard install profile, and in the migration path for 6 and 7. So we should do so here as well, in the default settings for this filter.

  2. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,236 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +   * and the other for values that are wildcard prefixes.
    +   *
    +   *
    +   * @param bool|array $attribute_values
    

    Nit: Two blank lines, should be one.

  3. +++ b/core/modules/filter/tests/src/Unit/FilterHtmlTest.php
    @@ -0,0 +1,87 @@
    +      'allowed_html' => '<a href> <p> <em> <strong> <cite> <blockquote> <code class="pretty boring align-*"> <ul alpaca-*="wooly-* strong"> <ol llama-*> <li> <dl> <dt> <dd> <br> <h3 id>',
    

    <3 wooly alpaca!

    @pwolanin++++++++++++++++

  4. +++ b/core/modules/filter/tests/src/Unit/FilterHtmlTest.php
    @@ -0,0 +1,87 @@
    +      ['<ol style="display: none;" llama-wim="noble majestic"></ol>', '<ol llama-wim="noble majestic"></ol>'],
    +      ['<ol style="display: none;" LlamA-Wim="majestic"></ol>', '<ol llama-wim="majestic"></ol>'],
    

    ROFL

  5. +++ b/core/modules/filter/tests/src/Unit/FilterHtmlTest.php
    @@ -0,0 +1,87 @@
    +      // Both wildcard names and values
    

    Nit: missing trailing period.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
59.77 KB
2.05 KB

Thanks - easy fixes.

pwolanin’s picture

Wim asked me about hist comment in #6. In the patch I think we already have that in core/modules/filter/src/Plugin/migrate/process/FilterSettings.php?

Wim Leers’s picture

That's a migration path, not an upgrade path. I'm specifically referring to the Needs upgrade path issue tag, and what we need to do to be able to remove that.

This also still needs a change record.

pwolanin’s picture

@Wim Leers - ok, yes. I see what you mean. We should basically be able to use the same code? Or copy/paste it into an update hook?

pwolanin’s picture

Adds update function, test, and fixes debug change to a test class I left in the patch.

Wim Leers’s picture

Status: Needs review » Needs work
Issue tags: -Needs upgrade path

Yes! Almost:

+++ b/core/modules/system/system.install
@@ -1642,5 +1642,33 @@ function system_update_8007() {
+    '<img>' => '<img src alt height width>',

If the caption or alignment filters are enabled, we'll need their data- attributes to be allowed as well.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
63.29 KB
3.15 KB

ok, adding handling for those too

pwolanin’s picture

oops, duplicate code from local rebase.

The last submitted patch, 114: 2549077-114.patch, failed testing.

The last submitted patch, 114: 2549077-114.patch, failed testing.

Wim Leers’s picture

Status: Needs review » Reviewed & tested by the community

Manually tested the upgrade path with content. Works great.

I've thoroughly reviewed the PHP. The PHP is ~90% different compared to the initial patch I wrote in #4. Peter has thoroughly tested the JavaScript and was not able to break it.

Even though I'd very much like more reviews, I do feel this is very important for Drupal 8 going forward, and it must land before RC, so: marking RTBC.


+++ b/core/modules/system/system.install
@@ -1642,5 +1642,44 @@ function system_update_8007() {
+ * Add allowed attributes to existing html filters.

Nit: Add default allowed attributes to existing text formats that use the filter_html filter. Can be fixed on commit.

Wim Leers’s picture

Issue tags: +rc deadline
Gábor Hojtsy’s picture

@pwolanin asked me to look at this patch. The concept makes sense. I spot-checked a few areas and things look fine. Even though the change is backwards compatible, I think it would be great to have a change record (as the existing tag says so as well :) for notifying people of the change.

susanb’s picture

I am starting to write the change record.

susanb’s picture

Issue tags: -Needs change record

I wrote a draft change record, pls check it!

znerol’s picture

I like that patch a lot, this makes HTML filtering a lot more robust. I Verified the new behavior manually in the browser. While reading the code I stumbled across some questions / nitpicks.

  1. +++ b/core/modules/filter/filter.filter_html.admin.js
    @@ -142,45 +139,161 @@
    +        // Let the browser do the parsing work for us.
    +        sandbox.innerHTML = allowedTags[t];
    +        node = sandbox.firstChild;
    +        tag = node.tagName.toLowerCase();
    

    Wondering how risky this might be. I guess that if someone manages to poison `Drupal.settings`, then we have more serious problems?

    Is there a specific reason why this work cannot be done on the server-side? This looks a lot like `FilterFormat::getHtmlRestrictions()`.

  2. +++ b/core/modules/filter/filter.module
    @@ -451,25 +450,6 @@ function template_preprocess_filter_tips(&$variables) {
     /**
    - * Provides filtering of input into accepted HTML.
    - */
    -function _filter_html($text, $filter) {
    

    Great!

  3. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -17,10 +18,10 @@
    + *     "allowed_html" = "<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul> <ol start type='1 A I'> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id>",
    
    +++ b/core/modules/filter/src/Plugin/migrate/process/FilterSettings.php
    @@ -0,0 +1,55 @@
    +    '<ol>' => '<ol start type>',
    +    '<ul>' => '<ul type>',
    
    +++ b/core/modules/filter/src/Tests/Migrate/d6/MigrateFilterFormatTest.php
    @@ -42,7 +42,7 @@ public function testFilterFormat() {
    +    $this->assertIdentical('<a href hreflang> <em> <strong> <cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd>', $filters['filter_html']['settings']['allowed_html']);
    
    +++ b/core/modules/filter/src/Tests/Migrate/d7/MigrateFilterFormatTest.php
    @@ -68,7 +68,7 @@ public function testFilterFormat() {
    +    $this->assertIdentical('<div> <span> <ul type> <li> <ol start type> <a href hreflang> <img src alt height width>', $config['settings']['allowed_html']);
    
    +++ b/core/modules/system/src/Tests/Update/FilterHtmlUpdateTest.php
    @@ -0,0 +1,56 @@
    +      'basic_html' => '<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h4 id> <h5 id> <h6 id> <p> <br> <span> <img src alt height width data-align data-caption>',
    +      'restricted_html' => '<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h4 id> <h5 id> <h6 id>',
    
    +++ b/core/modules/system/system.install
    @@ -1642,5 +1642,44 @@ function system_update_8007() {
    +    '<ol>' => '<ol start type>',
    +    '<ul>' => '<ul type>',
    
    +++ b/core/profiles/standard/config/install/filter.format.basic_html.yml
    @@ -11,7 +11,7 @@ filters:
    +      allowed_html: '<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id> <p> <br> <span> <img src alt height width data-entity-type data-entity-uuid data-align data-caption>'
    
    +++ b/core/profiles/standard/config/install/filter.format.restricted_html.yml
    @@ -11,7 +11,7 @@ filters:
    +      allowed_html: '<a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul> <ol start type> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id>'
    

    The list type attribute (for both ul and ol) sometimes is restricted, sometimes not, sometimes has a value whitelist. Not sure whether or not it is important to be consistent here.

  4. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,235 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +          // Fourth, if the attribute value is not the empty string, this means
    

    Fourth?

  5. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,235 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    // allowed since the the correct values of lang and dir may only be known
    

    s/the the/the

  6. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +70,235 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    // available style, so specific values should be intentionally whitelisted.
    

    Maybe s/intentionally/explicitly/

droplet’s picture

  /**
   * Implements \Drupal\ckeditor\Plugin\CKEditorPluginInterface::getConfig().
   */
  public function getConfig(Editor $editor) {
    // Reasonable defaults that provide expected basic behavior.
    $config = array(
      'customConfig' => '', // Don't load CKEditor's config.js file.
      'pasteFromWordPromptCleanup' => TRUE,
      'resize_dir' => 'vertical',
      'justifyClasses' => array('text-align-left', 'text-align-center', 'text-align-right', 'text-align-justify'),
      'entities' => FALSE,
    );

In D8, we set CKEditor to use Class for alignment. Is it possible to config it to allow 4 classes only ?

Wim Leers’s picture

Status: Reviewed & tested by the community » Needs work

Many thanks, @znerol and @droplet, for your excellent reviews!

  • #123.1: indeed, that'd be a zillion times bigger problem.
  • #123.3: It's only the Restricted HTML text format that doesn't have <ul type> AFAICT? I agree that it should be consistent.
  • #123.4: First, Second, Third, Fourth… so … what do you mean? :)
  • #124: If you align images, the data-align attribute is set. The FilterAlign filter then adds those classes upon filtering. But, you're right that those classes are used when specifically enabling CKEditor's alignment buttons (which are disabled by default because that is a layout decision that ideally a content creator should not be able to make, but specific sites can choose to opt in to that). In that case, dragging the alignment buttons into the toolbar on the CKEditor toolbar configuration page should already cause those classes to become whitelisted :) And if that is not the case, that's IMO a follow-up, because it doesn't impact default functionality.

NW just for #123.3.

znerol’s picture

#123.4: First, Second, Third, Fourth… so … what do you mean? :)

It looks a bit weird if Fourth comes out of nothing, I cannot see First, Second and Third anywhere.

droplet’s picture

few more:

1. Config doesn't sync in /admin/config/content/formats/manage/basic_html. Drag img icon out and still remain a `img` tag.
2. ESLint warning :)
3.

+++ b/core/modules/system/system.install
@@ -1642,5 +1642,44 @@ function system_update_8007() {
+    '<h2>' => '<h2 id>',
+    '<h3>' => '<h3 id>',
+    '<h4>' => '<h4 id>',
+    '<h5>' => '<h5 id>',
+    '<h6>' => '<h6 id>',

Assumed they used ID in h2 ~ h6 before this patch ?

pwolanin’s picture

@droplet re: #127.1 this is by design (but maybe bneeds to be better documented). The JS only adds more allowed tags it never removes them from the format since you might have existing content with IMG tags (in this case)

#127.2 what's the warning?

#127.3 all attributes (including ID) were allowed before, so it makes sense to at least allow ID the same as the new default.

znerol’s picture

Regarding #123.1: I realize that this piece of JavaScript code is used in the filter / editor admin screen only and not on content editing forms. No concerns from my side anymore.

pwolanin’s picture

re: #123.1 - we should possibly create a follow-up to also make sure the values are aligned in the form submit processing, instead of only relying on JS, but yes, it's not used on the node edit form only on the configuration form

laurii ran ESLint for me and found:

core/modules/filter/filter.filter_html.admin.js
  146:7   error  Split 'var' declarations into multiple statements                                                           one-var
  147:7   error  The body of a for-in should be wrapped in an if statement to filter unwanted properties from the prototype  guard-for-in
  197:40  error  Infix operators must be spaced                                                                              space-infix-ops
  197:68  error  Infix operators must be spaced                                                                              space-infix-ops
  200:63  error  Infix operators must be spaced                                                                              space-infix-ops
  230:7   error  Split 'var' declarations into multiple statements  
pwolanin’s picture

Status: Needs work » Needs review
FileSize
63.27 KB
9.83 KB
pwolanin’s picture

1 more JS lint fix.

lauriii’s picture

This fixes all the remaining eslint errors.

The last submitted patch, 131: 2549077-131.patch, failed testing.

The last submitted patch, 131: 2549077-131.patch, failed testing.

The last submitted patch, 131: 2549077-131.patch, failed testing.

The last submitted patch, 132: 2549077-132.patch, failed testing.

znerol’s picture

Status: Needs review » Reviewed & tested by the community

Reading through the interdiff #123, #124 and #127 are addressed. Back to RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 133: allow_the_limit-2549077-133.patch, failed testing.

pwolanin’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
63.34 KB

simple conflict in core/profiles/standard/config/install/filter.format.restricted_html.yml

pwolanin’s picture

Assigned: Unassigned » catch

for catch

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 140: 2549077-140.patch, failed testing.

Status: Needs work » Needs review

pwolanin queued 140: 2549077-140.patch for re-testing.

Status: Needs review » Needs work

The last submitted patch, 140: 2549077-140.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
63.34 KB
1.21 KB

#2573667: Entity uninstallation does not clean up field schema data in state properly already added system_update_8008(), rerolling with an incremented update ID.

catch’s picture

Status: Reviewed & tested by the community » Needs review

Mostly nits:

  1. +++ b/core/modules/ckeditor/js/views/ControllerView.js
    @@ -221,6 +221,9 @@
           var featureName = this.model.get('buttonsToFeatures')[button.toLowerCase()];
    +      if (!featureName) {
    +        featureName = button.toLowerCase();
    +      }
    

    Why would !featureName be false - could use an inline comment.

  2. +++ b/core/modules/filter/filter.filter_html.admin.js
    @@ -142,45 +139,170 @@
    +      // objects (to allow comparing userTags with autoTags).
    +      for (featureName in newFeatures) {
    +        if (newFeatures.hasOwnProperty(featureName)) {
    +          feature = newFeatures[featureName];
    +          for (var f = 0; f < feature.length; f++) {
    +            featureRule = feature[f];
    +            for (var t = 0; t < featureRule.required.tags.length; t++) {
    +              tag = featureRule.required.tags[t];
    

    Three nested for loops is a lot - can we factor some out to a method and reduce the nesting?

  3. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +74,235 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +      foreach ($xpath->query('//' . $allowed_tag . '[@*]') as $node) {
    +        $modified_attributes = [];
    +        foreach($node->attributes as $name => $attribute) {
    +          // Remove attributes not in the whitelist.
    +          $allowed_value = $this->findAllowedValue($allowed_attributes, $name);
    +          if (empty($allowed_value)) {
    +            $modified_attributes[$name] = FALSE;
    +          }
    +          elseif ($allowed_value !== TRUE) {
    +            // Check the attribute values whitelist.
    +            $attribute_values = preg_split('/\s+/', $attribute->value, -1, PREG_SPLIT_NO_EMPTY);
    +            $modified_attributes[$name] = [];
    +            foreach ($attribute_values as $value) {
    +              if ($this->findAllowedValue($allowed_value, $value)) {
    +                $modified_attributes[$name][] = $value;
    +              }
    +            }
    +          }
    +        }
    +        // If the $allowed_value was TRUE for an attribute name, it does not
    +        // appear in this array so the value on the DOM node is left unchanged.
    +        foreach ($modified_attributes as $name => $values) {
    +          if ($values) {
    +            $node->setAttribute($name, implode(' ', $values));
    +          }
    +          else {
    +            $node->removeAttribute($name);
    +          }
    +        }
    +      }
    +    }
    

    This could also possibly be factored out to a helper - takes an HTML node and filters the attributes on it.

  4. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +74,235 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +    $star_protector = '__zqh6vxfbk3cg__';
    

    I'm imagining the space opera with llamas that someone is slowly writing into Drupal 8 source code.

  5. +++ b/core/modules/filter/src/Plugin/Filter/FilterHtml.php
    @@ -61,22 +74,235 @@ public function settingsForm(array $form, FormStateInterface $form_state) {
    +        // Mark the tag as allowed, but with no attributes allowed.
    

    Not sure what that comment means. If attributes were allowed it'd be an array instead of FALSE?

  6. +++ b/core/modules/filter/src/Tests/FilterAPITest.php
    @@ -108,7 +108,15 @@ function testFilterFormatAPI() {
    +          '*' => array('style' => FALSE, 'on*' => FALSE, 'lang' => TRUE, 'dir' => array('ltr' => TRUE, 'rtl' => TRUE)),
    

    If style and on* are always disallowed, why are they specified here? It looks like you could flip the FALSE to TRUE and allow them but that's not the case.

  7. +++ b/core/modules/system/system.install
    @@ -1663,5 +1663,44 @@ function system_update_8008() {
     /**
    + * Add allowed attributes to existing html filters.
    + */
    +function system_update_8009() {
    

    Is this going to need a follow-up issue for the filter migration from 6.x/7.x too?

Wim Leers’s picture

Only addressing the JS & llama bits:

  1. This is a curious CKEditor internals thing. E.g. the "Source" button doesn't have a feature name, nor does the set of default rules (which whitelists the minimum tags CKEditor needs to function at all: <p> and <br>). Basically, anything that doesn't have an associated command does not have a feature name by default. Documented.
  2. It cannot very easily be factored out into a separate method without impeding legibility. The @todo references #2567801: Deprecate core/modules/editor/js/editor.admin.js JS APIs in Drupal 10, for removal in Drupal 11, which will greatly improve the data structures, which will in turn greatly simplify the logic. I ask to leave this as-is. The overall quality of the JS in this area is not great, and I take responsibility for that. We should be able to refactor that in the future though, but for understandability, I'm certain this is better for now (speaking as the maintainer of this module).
  1. ROFL :D
pwolanin’s picture

re:

#146.6 This is an expression of what the filter does that other APIs can read, so even if it's suggestive that it could be changed I don't think it's wrong. You also cannot save '*' settings, and if you tried the code should overwrite it here:

+    $restrictions['allowed']['*'] = [
+      'style' => FALSE,
+      'on*' => FALSE,
+      'lang' => TRUE,
+      'dir' => ['ltr' => TRUE, 'rtl' => TRUE],
+    ];

#146.7 There is already a new class in the patch to handle migrations of filter settings: \Drupal\migrate\Plugin\migrate\process\FilterSettings

pwolanin’s picture

re:

#146.3 split this code into a protected helper method and renamed $node to $element to reflect the actual class we are working with.

#146.5 added a comment to the allowed section in the IF branch. In that case it's either TRUE or an array

catch’s picture

#146 and #149 interdiffs both look good to me.

Wim Leers’s picture

Status: Needs review » Reviewed & tested by the community

@pwolanin's changes in #149 look great, AFAICT all feedback is addressed, and my changes in #147 are only comments.

Back to RTBC.

  • catch committed f09046a on 8.0.x
    Issue #2549077 by pwolanin, Wim Leers, phenaproxima, lauriii: Allow the...
catch’s picture

Status: Reviewed & tested by the community » Fixed

OK. It looks like mdrummond's concerns were addressed, and while the code is complex, I'll take Wim's point that there are existing issues to simplify what's there in HEAD which would then simplify things here.

Since this is security hardening, and we may have a beta tomorrow, seems better to get this in earlier and tested rather than later.

I forgot to add effulgentsia to the attribution when committing, but added to the issue.

Committed/pushed to 8.0.x, thanks!

effulgentsia’s picture

Version: 8.0.x-dev » 7.x-dev
Assigned: catch » Unassigned
Status: Fixed » Patch (to be ported)
Issue tags: -rc deadline

Moving to the 7.x queue per #19.

@David_Rothstein: the backport for this won't be entirely trivial, given the need to not break BC for existing 7.x sites. And this issue is already >150 comments. Therefore, would you prefer a new issue or keep using this one?

Status: Patch (to be ported) » Needs work

The last submitted patch, 149: 2549077-149.patch, failed testing.

DuaelFr’s picture

David_Rothstein’s picture

Status: Needs work » Patch (to be ported)

I think it might be a (somewhat) straightforward backport actually? Mostly we'd just need to leave out a large part of the patch (including the update function) which wouldn't be relevant for Drupal 7.

The main actual difference is that for backwards compatibility we don't want to change the meaning of the 'allowed_html' setting in Drupal 7. So we'd need to leave that as is, and add a new setting ('allowed_html_attributes'?) that has the same syntax as Drupal 8's 'allowed_html'. The duplication would be a little unfortunate there, but not a huge deal since this is already a very advanced area of the admin UI anyway.

So if a tag appears in 'allowed_html' but not 'allowed_html_attributes', it's allowed with all non-style/on* attributes (backwards compatibility). If a tag appears in both, attributes are filtered. If it appears in 'allowed_html_attributes' but not 'allowed_html', it gets ignored (the tag is still forbidden).

I hope that makes sense and covers everything.

David_Rothstein’s picture

By the way, did the Drupal 8 version of this patch miss changing the admin UI?

I'm looking at an up-to-date version of Drupal 8 right now, and the setting is still titled "Allowed HTML tags" with the description set to "A list of HTML tags that can be used. JavaScript event attributes, JavaScript URLs, and CSS are always stripped." That doesn't seem right anymore...

(for example, at admin/config/content/formats/manage/basic_html)

pwolanin’s picture

@David_Rothstein for Drupal 7 I'd instead think you'd want a checkbox to enable attribute filtering. Then you could keep the same list of tags you have now, but if you enable attribute filtering the attributes are parsed from them as in this patch. I don't think it makes any sense to have 2 lists of allowed tags.

David_Rothstein’s picture

If we stored it that way we'd still be changing the meaning of the 'allowed_html' setting - for example, you could no longer use explode() to get a list of the allowed tags.

But maybe we could use a checkbox like that in the UI, then change it to two separate settings on save.

pwolanin’s picture

pwolanin’s picture

@David_Rothstein - since the meaning of the setting is internal to the filter I don't understand why it matter that explode() would no longer work and you'd need ot use the technique that D8 has.

You think code external to the filter is trying to read this setting in D7?

David_Rothstein’s picture

You think code external to the filter is trying to read this setting in D7?

I don't see why it wouldn't. It's part of the public API, e.g. as returned by filter_list_format().

  • catch committed f09046a on 8.1.x
    Issue #2549077 by pwolanin, Wim Leers, phenaproxima, lauriii: Allow the...
bneil’s picture

I'm interested in working on the patch for D7, but would like more insight on which direction to go: Create a separate field in the UI for attributes, or go with the checkbox? Personally a checkbox seems like the best UX to me.

David_Rothstein’s picture

@bneil, that would be great if you're still interested!

My vote is for what I wrote in #160: A checkbox in the UI (since that's the best user experience), but then stored as two separate tag lists in the database (for backwards compatibility at the API level).

stefan.r’s picture

Just bumping this as this would be good to get into 7.x

  • catch committed f09046a on 8.3.x
    Issue #2549077 by pwolanin, Wim Leers, phenaproxima, lauriii: Allow the...

  • catch committed f09046a on 8.3.x
    Issue #2549077 by pwolanin, Wim Leers, phenaproxima, lauriii: Allow the...
Fabianx’s picture

Issue tags: +Drupal bugfix target
Wim Leers’s picture

Status: Patch (to be ported) » Fixed

Per changed policies, this needs a separate issue for D7.

Wim Leers’s picture

Version: 7.x-dev » 8.0.x-dev
David_Rothstein’s picture

Status: Fixed » Patch (to be ported)

Yes, but it is left open until that issue has been created and linked here.

pwolanin’s picture

Here it is: #2826458: Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small list of attributes by default

Leaving this open for now until we can capture any key comments about D7 into the issue summary there.

Wim Leers’s picture

Status: Patch (to be ported) » Fixed

Thanks for opening that issue, @pwolanin! I think that issue can still find those key comments while this issue is marked fixed. To ensure it, I left a comment: #2826458-1: Allow the "Limit allowed HTML tags" filter to also restrict HTML attributes, and only allow a small list of attributes by default.

Now finally closing this issue again.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

effulgentsia’s picture

+++ b/core/profiles/standard/config/install/filter.format.basic_html.yml
@@ -36,12 +36,6 @@ filters:
-  filter_htmlcorrector:

This seemed harmless enough, but it broke an assumption in text_summary(). See #3067116: text_summary() returns malformed (not normalized) HTML for basic_html and other formats that use filter_html instead of filter_htmlcorrector.

Mêlée’s picture

Can users be given the choice of which tags are filtered from their code when using Ckeditor please.

Seems strange to remove a useful feature that was included with previous versions.