Beta phase evaluation
Issue category | "Bug" because common use case (SASS / Compass) is broken |
---|---|
Issue priority | "Normal" |
Problem/Motivation
If a css file is encoded in UTF-8 with BOM (byte order mark) it breaks the design when enabling css aggregation.
In accordance with the CSS3 spec, Drupal should support UTF-8 encoding in style sheets:
http://www.w3.org/TR/2013/WD-css-syntax-3-20130919/#determine-the-fallba...
Starting with Sass 3.4, the BOM is added automatically when non-ascii content is detected, so this problem will become increasingly common (see https://github.com/sass/sass/blob/3.4.0/lib/sass/tree/visitors/to_css.rb... ).
To reproduce, run
php ./vendor/phpunit/phpunit/phpunit --filter=CssOptimizerUnitTest
with 1833356_test.patch.
Proposed resolution
Checks the encoding on input files (per the fallbacks in the CSS3 spec), and converts non-unicode input to UTF8.
Remaining tasks
Reviews needed
User interface changes
N/A
API changes
N/A
Original report by sammuell
If a css file is encoded in UTF-8 with BOM (byte order mark) it breaks the design when enabling css aggregation. The byte order mark will be inserted at the beginning of every aggregated file (showing up as a questionmark in the code), breaking the first css declaration.
span.AveragePoints{font-size:12pt;margin-bottom:20px;}
?div#Timeline{width:100%;height:200px;text-align:center;}
The issue can be fixed by encoding the files in UTF-8 without BOM, e.g. in Notepad++. However this information does currently not seem to be documented. Where is the best place to put this information? drupal_add_css()?
Comment | File | Size | Author |
---|---|---|---|
#42 | core-bom-aggregation-1833356-42.patch | 2.92 KB | xeraseth |
#33 | 1833356-33-interdiff.txt | 5.59 KB | grendzy |
#33 | css_files_encoded_in-1833356-33.patch | 11.51 KB | grendzy |
#27 | interdiff_27.txt | 2.07 KB | grendzy |
#27 | 1833356_27.patch | 6.47 KB | grendzy |
Comments
Comment #1
longwaveAlternatively, shouldn't the aggregator just strip any BOM from all files it includes?
Comment #2
jhodgdonThis does sound like a bug in the aggregation routines. Anyway, whether we fix it there or in the documentation, it needs to be addressed in 8.x first then backported to 7.x.
And... can someone please attach a CSS file with a BOM in it (I have no idea how to make one) so that we can create a test for this? Thanks!
Comment #3
jhodgdonforgot backport tag
Comment #4
Wim LeersConsidering how very little reaction there has been to this, I think it's safe to assume that almost nobody runs into this problem. Probably because it's recommended in the specs to *not* have the BOM.
If you reopen, please provide a patch that fixes it along with test coverage.
Comment #5
grendzy CreditAttribution: grendzy commentedComment #6
grendzy CreditAttribution: grendzy commentedYou probably can't see it in your browser, but this patch adds a <U+FEFF> BOM to one of the test files, and produces test failures.
Comment #7
grendzy CreditAttribution: grendzy commentedComment #8
Wim LeersComment #10
Wim LeersThere's a weird seemingly random test failure. But there's also this:
which indeed proves it's a problem.
grendzy, are you planning to work on a patch to fix this?
Comment #11
grendzy CreditAttribution: grendzy commentedI'll give it a shot. A few notes:
Comment #12
Heine CreditAttribution: Heine commentedGrendzy, great work that covers the most common cases.
Some remarks:
http://www.w3.org/TR/CSS2/syndata.html#charset prescribes match rules (table) to determine the sheet character set. Should that be used for completeness?
Comment #13
Wim LeersThanks for the review, Heine! I'm very glad you're reviewing this hardcore stuff! :)
Comment #14
grendzy CreditAttribution: grendzy commentedGreat feedback! I'll make a new patch soon. For the third point, CSS3 seems to have simplified this matching, in that it requires the @charset at-rule to match in hex 40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B. (ASCII or windows-1252). Would it be acceptable then to not look for @charset rules in UTF-16 and UTF-32, as CSS3 implies this is impossible? (UTF-16 and UTF-32 would still be supported using BOM).
Comment #15
grendzy CreditAttribution: grendzy commentedNew patch:
convertToUtf8()
is essentially no-op. Still, I think it's still possible for convertToUtf8() to fail on systems with no Unicode support, so it's worth checking.Comment #16
grendzy CreditAttribution: grendzy commentedComment #18
grendzy CreditAttribution: grendzy commentedComment #19
gregglesWith this change to sass in 3.4+ I think more and more people will be encountering this issue. For me it came up because font-awesome has utf8 in it so sass added a byte-order-mark to that file which then got aggregated in the middle of my other files.
My workaround was to add this sed command to my deploy script (command source is stack overflow question on BOMs:
Comment #22
grendzy CreditAttribution: grendzy at Metal Toad commentedReroll.
Comment #23
grendzy CreditAttribution: grendzy at Metal Toad commentedJust the tests:
Comment #25
grendzy CreditAttribution: grendzy at Metal Toad commentedComment #26
jhedstromJust some coding style nitpicks here:
Need a period here.
This should be
elseif
.Needs to end with a new line.
Comment #27
grendzy CreditAttribution: grendzy at Metal Toad commentedComment #28
grendzy CreditAttribution: grendzy at Metal Toad commentedComment #29
grendzy CreditAttribution: grendzy at Metal Toad commentedComment #30
jhedstromAssuming #27 goes green, I think this is good to go. It adds a method, so is technically an API change, but it is completely non-breaking.
Comment #31
anavarreThose should be wrapped at 80 cols.
Comment #32
alexpottI really like that we're fixing this - it has caught out front devs on projects I've worked on many times.
Two methods with the same one line method description? I think it'd be great to be able to discern the difference by reading the one liner.
Also what about javascript aggregation?
Comment #33
grendzy CreditAttribution: grendzy at Metal Toad commentedThanks for the reviews! New patch:
convertToUtf8UsingBOM
is nowencodingFromBOM
– it just returns the encoding - this is more flexible for future users that may not want to load a large file entirely into memory.JsOptimizer::optimize()
(plus tests).Comment #34
jhedstromAssuming #33 goes green, I think this has addressed the remaining feedback. Back to RTBC.
Snowman FTW!
Comment #35
alexpott@grendzy nice changes - really like the the flexibility introduced.
This issue is a normal bug fix, and doesn't include any disruptive changes, so it is allowed per https://www.drupal.org/core/beta-changes. Committed 10c6e1d and pushed to 8.0.x. Thanks!
This would have saved so much aggro and time on Capgemini projects - really happy to see a fix here.
Comment #36
jhedstromI don't think this was pushed.
Comment #37
alexpottComment #42
xeraseth CreditAttribution: xeraseth commentedWe ran into this issue in D7. Attached is the patch I applied, tests still need to be written if someone wants to take a stab.