CSS files encoded in UTF-8 with BOM break the design when enabling CSS aggregation [#1833356]

Reference: https://www.drupal.org/core/beta-changes
Issue category	"Bug" because a common use case (Sass / Compass) is broken
Issue priority	"Normal"

Comment	File	Size	Author
#42	core-bom-aggregation-1833356-42.patch	2.92 KB	xeraseth
7.x: PHP 7 & MySQL 5.5 2,042 pass
#33	1833356-33-interdiff.txt	5.59 KB	grendzy
#33	css_files_encoded_in-1833356-33.patch	11.51 KB	grendzy
#27	interdiff_27.txt	2.07 KB	grendzy
#27	1833356_27.patch	6.47 KB	grendzy
#23	1833356_23__TEST_ONLY_DEMONSTRATING_FAILURE.patch	3.97 KB	grendzy
#22	1833356_22.patch	6.56 KB	grendzy
#18	1833356_18.patch	9.65 KB	grendzy
#15	1833356_15b.patch	8.4 KB	grendzy
#11	1833356_11.patch	6.35 KB	grendzy
#5	1833356_test.patch	451 bytes	grendzy

Comment #1

CreditAttribution: longwave commented 6 November 2012 at 14:24

Alternatively, shouldn't the aggregator just strip any BOM from all files it includes?
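
A minimal Python sketch of that idea (illustrative only: the function names are hypothetical and this is not Drupal's PHP aggregator). It assumes every input file is UTF-8 and drops one leading BOM from each file before joining. Without that step, the BOM of any file after the first ends up in the middle of the bundle, where it is just an ordinary U+FEFF character that typically attaches to the next selector and stops that rule from matching.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop one leading UTF-8 byte order mark, if present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def aggregate_css(files: list) -> bytes:
    """Concatenate CSS file contents (assumed UTF-8) into one bundle.

    Stripping per file matters: a BOM left on any file after the first
    would sit mid-bundle and corrupt the selector that follows it.
    """
    return b"\n".join(strip_utf8_bom(data) for data in files)


bundle = aggregate_css([b"\xef\xbb\xbfbody{color:red}", b"\xef\xbb\xbfp{margin:0}"])
print(bundle)  # b'body{color:red}\np{margin:0}'
```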

Comment #2

CreditAttribution: jhodgdon commented 7 November 2012 at 15:14

Version:	7.16	» 8.x-dev
Component:	documentation	» base system
Issue tags:		+Needs tests

This does sound like a bug in the aggregation routines. Anyway, whether we fix it there or in the documentation, it needs to be addressed in 8.x first and then backported to 7.x.

And... can someone please attach a CSS file with a BOM in it (I have no idea how to make one) so that we can create a test for this? Thanks!

Comment #3

CreditAttribution: jhodgdon commented 7 November 2012 at 15:15

Issue tags:	+Needs backport to D7

Forgot the backport tag.

Comment #4

CreditAttribution: Wim Leers commented 4 July 2013 at 15:15

Status:	Active	» Closed (won't fix)

Considering how little reaction there has been to this, I think it's safe to assume that almost nobody runs into this problem, probably because the specs recommend *not* including the BOM.

If you reopen, please provide a patch that fixes it along with test coverage.

Comment #5

grendzy CreditAttribution: grendzy commented 19 November 2014 at 08:14

Issue summary:	View changes
Status:	Closed (won't fix)	» Active

File	Size
1833356_test.patch	451 bytes

Comment #6

grendzy CreditAttribution: grendzy commented 19 November 2014 at 08:16

You probably can't see it in your browser, but this patch adds a <U+FEFF> BOM to one of the test files, which produces test failures.
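
Because the BOM is invisible, it can be easier to reason about its bytes. The Python snippet below (illustrative only; write_css_with_bom is a hypothetical helper) prints how U+FEFF is encoded in the Unicode encodings a stylesheet might use. The UTF-8 form, EF BB BF, is the one at issue here, and prepending it to an ordinary CSS file is one way to build a test fixture like the one requested in #2.

```python
from pathlib import Path

BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE, used as a byte order mark

# How the same code point looks on disk in each Unicode encoding.
for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(name, BOM.encode(name).hex(" "))
# utf-8 ef bb bf
# utf-16-be fe ff
# utf-16-le ff fe
# utf-32-be 00 00 fe ff
# utf-32-le ff fe 00 00


def write_css_with_bom(path: str, css: str) -> None:
    """Write css as UTF-8 with a leading BOM, e.g. to create a test file."""
    Path(path).write_bytes(BOM.encode("utf-8") + css.encode("utf-8"))
```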

Comment #7

grendzy CreditAttribution: grendzy commented 19 November 2014 at 08:18

Issue summary:	View changes

Comment #8

CreditAttribution: Wim Leers commented 19 November 2014 at 09:45

Status:	Active	» Needs review

Comment #9

19 November 2014 at 10:25

Status:	Needs review	» Needs work

The last submitted patch, 5: 1833356_test.patch, failed testing.

Comment #10

CreditAttribution: Wim Leers commented 19 November 2014 at 10:39

There's a weird, seemingly random test failure. But there's also this:

Drupal\Tests\Core\Asset\CssOptimizerUnitTest                   8 passes   1 fails

which indeed proves it's a problem.

grendzy, are you planning to work on a patch to fix this?

Comment #11

grendzy CreditAttribution: grendzy commented 21 November 2014 at 08:56

Status:	Needs work	» Needs review

File	Size
1833356_11.patch	6.35 KB

I'll give it a shot. A few notes:

Be careful when editing the test CSS files not to change their encoding. They have unusual encodings on purpose (UTF-8 with BOM, and Latin-9).
testOptimizeRemoveCharset() is removed, because the new tests cover not only @charset removal but conversion as well.
@charset statements are not legal in an HTML element's style attribute or inside a <style> element, so the charset handling is done during file loading, not in processCss().
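
The load-time handling described above can be sketched in Python as follows. This is an illustrative assumption about the flow, not the patch's PHP code, and load_css is a hypothetical name. A BOM, if present, decides the encoding; otherwise a strict leading @charset rule does; otherwise UTF-8 is assumed. After decoding, the now-stale @charset rule is removed, because the output is UTF-8 and an @charset rule in the middle of an aggregated bundle would be invalid anyway. Unknown charset names raise LookupError in this sketch.

```python
import re

# Longest BOMs first: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE one (FF FE).
BOMS = (
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)
# Only the strict form at byte 0 counts: @charset "name";
CHARSET_AT_START = re.compile(rb'@charset "([^"]*)";')
CHARSET_RULE_TEXT = re.compile(r'@charset "[^"]*";\s*')


def load_css(data: bytes) -> str:
    """Decode a stylesheet to text and drop its @charset rule (sketch)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # A BOM overrides any @charset rule; decode what follows it.
            text = data[len(bom):].decode(encoding)
            break
    else:
        match = CHARSET_AT_START.match(data)
        # Python's codec lookup ignores case, so "utf-8" and "UTF-8" both work.
        encoding = match.group(1).decode("ascii") if match else "utf-8"
        text = data.decode(encoding)
    # The output will be UTF-8, so a leading @charset rule (even one that
    # followed a BOM) is now stale and is removed.
    rule = CHARSET_RULE_TEXT.match(text)
    return text[rule.end():] if rule else text
```

For a Latin-9 file like the test fixture mentioned above, load_css(b'@charset "ISO-8859-15";\np{content:"\xa4"}') returns 'p{content:"€"}'.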

Comment #12

Heine CreditAttribution: Heine commented 21 November 2014 at 10:28

Status:	Needs review	» Needs work

Grendzy, great work that covers the most common cases.

Some remarks:

Charset names should probably be matched case-insensitively: @charset "utf-8"; is the recommended declaration for new stylesheets per http://www.w3.org/International/questions/qa-css-charset.en.php. The CSS spec, however, refers to preferred names, which are uppercase.
An @charset rule can appear after a BOM. The preg_match should either be anchored on ^BOM or operate on the content with the BOM stripped.
In certain encodings, '@charset' may not match the ASCII byte sequence '@charset'.

http://www.w3.org/TR/CSS2/syndata.html#charset prescribes matching rules (a table) to determine the style sheet's character set. Should that be used for completeness?

Comment #13

CreditAttribution: Wim Leers commented 21 November 2014 at 11:17

Thanks for the review, Heine! I'm very glad you're reviewing this hardcore stuff! :)

Comment #14

grendzy CreditAttribution: grendzy commented 25 November 2014 at 07:00

Great feedback! I'll make a new patch soon. For the third point, CSS3 seems to have simplified this matching: it requires the @charset at-rule to match the exact hex byte sequence 40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B (ASCII or windows-1252). Would it then be acceptable not to look for @charset rules in UTF-16 and UTF-32, since CSS3 implies they can't occur there? (UTF-16 and UTF-32 would still be supported via the BOM.)
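
Expressed as a byte-level regular expression, the CSS3 rule quoted above looks roughly like the Python sketch below (illustrative only; declared_charset is a hypothetical helper). Because it matches raw ASCII bytes starting at offset 0, a rule written in UTF-16 or UTF-32 never matches, which is why those encodings would have to rely on the BOM alone.

```python
import re
from typing import Optional

# 40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B, anchored at the first byte.
CHARSET_BYTES = re.compile(
    rb"\A\x40\x63\x68\x61\x72\x73\x65\x74\x20\x22([^\x22]*)\x22\x3b"
)


def declared_charset(data: bytes) -> Optional[str]:
    """Return the name from a strict leading @charset rule, else None."""
    match = CHARSET_BYTES.match(data)
    # Charset names are ASCII; latin-1 is used only because it never raises.
    return match.group(1).decode("latin-1") if match else None


print(declared_charset(b'@charset "ISO-8859-15";'))  # ISO-8859-15
print(declared_charset(b"@charset 'utf-8';"))  # None (single quotes)
print(declared_charset('@charset "UTF-8";'.encode("utf-16-le")))  # None
```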

Comment #15

grendzy CreditAttribution: grendzy commented 25 November 2014 at 19:53

File	Size
1833356_15b.patch	8.4 KB

New patch:

Charset case-sensitivity has been addressed. The check is almost unnecessary; it was intended as a performance optimization that proved unwarranted. Micro-benchmarks showed that passing the same input and output charset to convertToUtf8() is essentially a no-op. Still, I think it's possible for convertToUtf8() to fail on systems with no Unicode support, so it's worth checking.
The BOM plus @-rule combination has been fixed and tested.
Non-ASCII-compatible @-rules are ignored, per my understanding of the CSS3 spec. However, I did add a test with a UTF-16 BOM.
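
On the case-sensitivity point and the "is conversion needed?" check, one option is to normalize charset names through a codec registry rather than comparing strings. In Python that registry is codecs.lookup(); the sketch below is illustrative only and is not what the PHP patch does (canonical_charset and needs_conversion are hypothetical names).

```python
import codecs


def canonical_charset(name: str) -> str:
    """Normalize a charset label, e.g. "UTF8" or "utf-8" -> "utf-8".

    Raises LookupError for names Python does not know.
    """
    return codecs.lookup(name).name


def needs_conversion(declared: str) -> bool:
    """True unless the declared charset is already UTF-8."""
    return canonical_charset(declared) != "utf-8"


print(needs_conversion("UTF-8"))  # False
print(needs_conversion("utf8"))  # False
print(needs_conversion("ISO-8859-15"))  # True
```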

Comment #16

grendzy CreditAttribution: grendzy commented 25 November 2014 at 19:53

Status:	Needs work	» Needs review
Issue tags:	-Needs tests

Comment #17

25 November 2014 at 19:55

Status:	Needs review	» Needs work

The last submitted patch, 15: 1833356_15b.patch, failed testing.

Comment #18

grendzy CreditAttribution: grendzy commented 26 November 2014 at 03:06

Status:	Needs work	» Needs review

File	Size
1833356_18.patch	9.65 KB

Comment #19

CreditAttribution: greggles commented 26 January 2015 at 22:35

With this change to Sass in 3.4+, I think more and more people will encounter this issue. For me it came up because Font Awesome contains UTF-8 characters, so Sass added a byte order mark to that file, which then got aggregated into the middle of my other files.

My workaround was to add this sed command to my deploy script (command source: a Stack Overflow question on BOMs):

find sites/all/themes/themename/cssoutputdirectory/ -name '*.css' -exec sed -i '1 s/^\xef\xbb\xbf//' {} \;
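
For deploy scripts that may run where GNU sed's -i option or \x escapes are not available, the same workaround can be written in Python. This is an illustrative sketch (strip_css_boms is a hypothetical name): like the sed command, it only removes the three BOM bytes at the very start of each *.css file found recursively under the given directory.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def strip_css_boms(root: str) -> list:
    """Remove a leading UTF-8 BOM from every *.css file under root.

    Returns the paths that were rewritten; files without a BOM are untouched.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.css")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if data.startswith(UTF8_BOM):
            path.write_bytes(data[len(UTF8_BOM):])
            changed.append(path)
    return changed
```

Usage would be along the lines of strip_css_boms("sites/all/themes/themename/cssoutputdirectory"); running it twice is harmless, since the second run finds no BOMs.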

Comment #20

9 March 2015 at 20:16

mgifford queued 18: 1833356_18.patch for re-testing.

Comment #21

9 March 2015 at 20:18

Status:	Needs review	» Needs work

The last submitted patch, 18: 1833356_18.patch, failed testing.

Comment #22

grendzy CreditAttribution: grendzy at Metal Toad commented 15 May 2015 at 18:11

Status:	Needs work	» Needs review

File	Size
1833356_22.patch	6.56 KB

Reroll.

Comment #23

grendzy CreditAttribution: grendzy at Metal Toad commented 15 May 2015 at 18:26

File	Size
1833356_23__TEST_ONLY_DEMONSTRATING_FAILURE.patch	3.97 KB

Just the tests:

Comment #24

15 May 2015 at 18:59

Status:	Needs review	» Needs work

The last submitted patch, 23: 1833356_23__TEST_ONLY_DEMONSTRATING_FAILURE.patch, failed testing.

Comment #25

grendzy CreditAttribution: grendzy at Metal Toad commented 15 May 2015 at 18:59

Status:	Needs work	» Needs review

Comment #26

CreditAttribution: jhedstrom at Phase2 commented 15 May 2015 at 20:40

Status:	Needs review	» Needs work

Just some coding style nitpicks here:

+++ b/core/lib/Drupal/Core/Asset/CssOptimizer.php
@@ -125,6 +126,18 @@ public function loadFile($file, $optimize = NULL, $reset_basepath = TRUE) {
+      // Check for BOM

Needs a period here.

+++ b/core/lib/Drupal/Core/Asset/CssOptimizer.php
@@ -125,6 +126,18 @@ public function loadFile($file, $optimize = NULL, $reset_basepath = TRUE) {
+      else if (preg_match('/^@charset "([^"]+)";/', $contents, $matches)) {

This should be elseif.

+++ b/core/tests/Drupal/Tests/Core/Asset/css_test_files/css_input_with_bom.css
@@ -0,0 +1,3 @@
\ No newline at end of file

+++ b/core/tests/Drupal/Tests/Core/Asset/css_test_files/css_input_with_bom_and_charset.css
@@ -0,0 +1,4 @@
\ No newline at end of file

+++ b/core/tests/Drupal/Tests/Core/Asset/css_test_files/css_input_with_charset.css
@@ -0,0 +1,4 @@
\ No newline at end of file

Needs to end with a newline.

Comment #27

grendzy CreditAttribution: grendzy at Metal Toad commented 15 May 2015 at 21:00

Status:	Needs work	» Needs review

File	Size
1833356_27.patch	6.47 KB
interdiff_27.txt	2.07 KB

Comment #28

grendzy CreditAttribution: grendzy at Metal Toad commented 15 May 2015 at 21:13

Issue summary:	View changes

Comment #29

grendzy CreditAttribution: grendzy at Metal Toad commented 15 May 2015 at 21:14

Issue summary:	View changes

Comment #30

CreditAttribution: jhedstrom at Phase2 commented 15 May 2015 at 21:17

Status:	Needs review	» Reviewed & tested by the community

Assuming #27 goes green, I think this is good to go. It adds a method, so is technically an API change, but it is completely non-breaking.

Comment #31

CreditAttribution: anavarre at Acquia commented 17 May 2015 at 11:28

+++ b/core/lib/Drupal/Component/Utility/Unicode.php
@@ -186,6 +186,37 @@ public static function check() {
+   *   The data to be converted. The encoding is detected based on the byte order mark.

+++ b/core/lib/Drupal/Core/Asset/CssOptimizer.php
@@ -125,6 +126,18 @@ public function loadFile($file, $optimize = NULL, $reset_basepath = TRUE) {
+      // If no BOM, check for fallback encoding. Per CSS spec the regex is very strict.

Those should be wrapped at 80 columns.

Comment #32

CreditAttribution: alexpott at Chapter Three commented 20 May 2015 at 11:21

Status:	Reviewed & tested by the community	» Needs work

I really like that we're fixing this - it has caught out front devs on projects I've worked on many times.

+++ b/core/lib/Drupal/Component/Utility/Unicode.php
@@ -186,6 +186,37 @@ public static function check() {
   /**
    * Converts data to UTF-8.
...
+  /**
+   * Converts data to UTF-8.

Two methods with the same one line method description? I think it'd be great to be able to discern the difference by reading the one liner.

Also, what about JavaScript aggregation?

Comment #33

grendzy CreditAttribution: grendzy at Metal Toad commented 21 May 2015 at 18:01

Status:	Needs work	» Needs review

File	Size
css_files_encoded_in-1833356-33.patch	11.51 KB
1833356-33-interdiff.txt	5.59 KB

Thanks for the reviews! New patch:

80-column fix per #31
Fixed docs per #32. convertToUtf8UsingBOM is now encodingFromBOM; it just returns the encoding, which is more flexible for future callers that may not want to load a large file entirely into memory.
Added JS support for the BOM and the charset attribute. It's only about five lines of actual code to run the same check in JsOptimizer::optimize() (plus tests).
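
The "just return the encoding" design can be illustrated with a short Python sketch (not Drupal's PHP API; encoding_from_bom and read_script are hypothetical names). Sniffing needs only the first four bytes, and a script then falls back to a declared charset, such as one taken from its charset attribute, and finally to UTF-8.

```python
from typing import Optional

# Longest BOMs first so UTF-32LE (FF FE 00 00) is not mistaken for UTF-16LE (FF FE).
BOMS = (
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def encoding_from_bom(head: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None.

    Only the first four bytes are needed, so a caller can sniff a large
    file without reading all of it.
    """
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def read_script(path: str, charset: Optional[str] = None) -> str:
    """Decode a JS file: BOM first, then a declared charset, else UTF-8."""
    with open(path, "rb") as fh:
        head = fh.read(4)
        encoding = encoding_from_bom(head) or charset or "utf-8"
        data = head + fh.read()
    text = data.decode(encoding)
    # These Python codecs keep U+FEFF when decoding, so drop it here.
    return text[1:] if text.startswith("\ufeff") else text
```

With a UTF-8 file containing a BOM followed by var utf8BOM = '☃'; this returns the source without the BOM, and a BOM-less Latin-9 file read with charset="ISO-8859-15" decodes byte A4 as €.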

Comment #34

CreditAttribution: jhedstrom at Phase2 commented 21 May 2015 at 18:29

Status:	Needs review	» Reviewed & tested by the community

Assuming #33 goes green, I think this has addressed the remaining feedback. Back to RTBC.

+++ b/core/tests/Drupal/Tests/Core/Asset/js_test_files/utf8_bom.js.optimized.js
@@ -0,0 +1 @@
+var utf8BOM = '☃';

Snowman FTW!

Comment #35

CreditAttribution: alexpott at Chapter Three commented 21 May 2015 at 19:10

Version:	8.0.x-dev	» 7.x-dev
Status:	Reviewed & tested by the community	» Patch (to be ported)

@grendzy nice changes, really like the flexibility introduced.

This issue is a normal bug fix, and doesn't include any disruptive changes, so it is allowed per https://www.drupal.org/core/beta-changes. Committed 10c6e1d and pushed to 8.0.x. Thanks!

This would have saved so much aggro and time on Capgemini projects - really happy to see a fix here.

Comment #36

CreditAttribution: jhedstrom at Phase2 commented 21 May 2015 at 20:09

Version:	7.x-dev	» 8.0.x-dev
Status:	Patch (to be ported)	» Reviewed & tested by the community

I don't think this was pushed.

Comment #37

CreditAttribution: alexpott at Chapter Three commented 21 May 2015 at 22:36

Version:	8.0.x-dev	» 7.x-dev
Status:	Reviewed & tested by the community	» Patch (to be ported)

Comment #38

21 May 2015 at 22:37

alexpott committed 10c6e1d on 8.0.x

Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

Comment #39

19 November 2015 at 17:53

alexpott committed 10c6e1d on 8.1.x

Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

Comment #40

3 August 2016 at 00:04

alexpott committed 10c6e1d on 8.3.x

Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

Comment #41

3 August 2016 at 00:05

alexpott committed 10c6e1d on 8.3.x

Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

Comment #42

xeraseth CreditAttribution: xeraseth commented 18 January 2017 at 23:03

File	Size
core-bom-aggregation-1833356-42.patch	2.92 KB
7.x: PHP 7 & MySQL 5.5 2,042 pass

We ran into this issue in D7. Attached is the patch I applied; tests still need to be written if someone wants to take a stab.

Comment #43

27 January 2017 at 17:53

alexpott committed 10c6e1d on 8.4.x

Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

Comment #44

27 January 2017 at 17:52

alexpott committed 10c6e1d on 8.4.x

Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

Original report by sammuell
