Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category "Bug" because common use case (SASS / Compass) is broken
Issue priority "Normal"

Problem/Motivation

If a css file is encoded in UTF-8 with BOM (byte order mark) it breaks the design when enabling css aggregation.

In accordance with the CSS3 spec, Drupal should support UTF-8 encoding in style sheets:
http://www.w3.org/TR/2013/WD-css-syntax-3-20130919/#determine-the-fallba...

Starting with Sass 3.4, the BOM is added automatically when non-ascii content is detected, so this problem will become increasingly common (see https://github.com/sass/sass/blob/3.4.0/lib/sass/tree/visitors/to_css.rb... ).

To reproduce, run
php ./vendor/phpunit/phpunit/phpunit --filter=CssOptimizerUnitTest
with 1833356_test.patch.

Proposed resolution

Checks the encoding on input files (per the fallbacks in the CSS3 spec), and converts non-unicode input to UTF8.

Remaining tasks

Reviews needed

User interface changes

N/A

API changes

N/A

Original report by sammuell

If a css file is encoded in UTF-8 with BOM (byte order mark) it breaks the design when enabling css aggregation. The byte order mark will be inserted at the beginning of every aggregated file (showing up as a questionmark in the code), breaking the first css declaration.

span.AveragePoints{font-size:12pt;margin-bottom:20px;}
?div#Timeline{width:100%;height:200px;text-align:center;}

The issue can be fixed by encoding the files in UTF-8 without BOM, e.g. in Notepad++. However this information does currently not seem to be documented. Where is the best place to put this information? drupal_add_css()?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

longwave’s picture

Alternatively, shouldn't the aggregator just strip any BOM from all files it includes?

jhodgdon’s picture

Version: 7.16 » 8.x-dev
Component: documentation » base system
Issue tags: +Needs tests

This does sound like a bug in the aggregation routines. Anyway, whether we fix it there or in the documentation, it needs to be addressed in 8.x first then backported to 7.x.

And... can someone please attach a CSS file with a BOM in it (I have no idea how to make one) so that we can create a test for this? Thanks!

jhodgdon’s picture

Issue tags: +Needs backport to D7

forgot backport tag

Wim Leers’s picture

Status: Active » Closed (won't fix)

Considering how very little reaction there has been to this, I think it's safe to assume that almost nobody runs into this problem. Probably because it's recommended in the specs to *not* have the BOM.

If you reopen, please provide a patch that fixes it along with test coverage.

grendzy’s picture

Issue summary: View changes
Status: Closed (won't fix) » Active
FileSize
451 bytes
grendzy’s picture

You probably can't see it in your browser, but this patch adds a <U+FEFF> BOM to one of the test files, and produces test failures.

grendzy’s picture

Issue summary: View changes
Wim Leers’s picture

Status: Active » Needs review

Status: Needs review » Needs work

The last submitted patch, 5: 1833356_test.patch, failed testing.

Wim Leers’s picture

There's a weird seemingly random test failure. But there's also this:

Drupal\Tests\Core\Asset\CssOptimizerUnitTest                   8 passes   1 fails

which indeed proves it's a problem.

grendzy, are you planning to work on a patch to fix this?

grendzy’s picture

Status: Needs work » Needs review
FileSize
6.35 KB

I'll give it a shot. A few notes:

  • Be careful if editing the test CSS files, to not change the encoding. They have unusual encodings on purpose (UTF-8 with BOM, and Latin-9).
  • testOptimizeRemoveCharset() is removed, because the new tests cover not only @charset removal, but conversion as well.
  • @charset statements are not legal in a style attribute on an HTML element or inside the <style> element. So, the charset handling is done during file loading, not processCss().
Heine’s picture

Status: Needs review » Needs work

Grendzy, great work that covers the most common cases.

Some remarks:

  • Charset names should probably be matched case-insensitive: @charset="utf-8" is the recommended charset for new stylesheets per http://www.w3.org/International/questions/qa-css-charset.en.php. The CSS spec however, refers to preferred names, those are uppercase.
  • @charset="" is possible after a BOM. The pregmatch should be anchored on ^BOM, or operate on a stripped BOM
  • In certain encodings, '@charset' may not match the ASCII bytesequence '@charset'.

http://www.w3.org/TR/CSS2/syndata.html#charset prescribes match rules (table) to determine the sheet character set. Should that be used for completeness?

Wim Leers’s picture

Thanks for the review, Heine! I'm very glad you're reviewing this hardcore stuff! :)

grendzy’s picture

Great feedback! I'll make a new patch soon. For the third point, CSS3 seems to have simplified this matching, in that it requires the @charset at-rule to match in hex 40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B. (ASCII or windows-1252). Would it be acceptable then to not look for @charset rules in UTF-16 and UTF-32, as CSS3 implies this is impossible? (UTF-16 and UTF-32 would still be supported using BOM).

grendzy’s picture

FileSize
8.4 KB

New patch:

  • charset case-sensitivity has been addressed. The check is almost unnecessary, it was intended as a performance optimization that proved unwarranted. Micro-benchmarks showed that passing equivalent in/out charsets to convertToUtf8() is essentially no-op. Still, I think it's still possible for convertToUtf8() to fail on systems with no Unicode support, so it's worth checking.
  • BOM and @-rule combination has been fixed, and tested.
  • Non-ascii-compatible @-rules are ignored, per my understanding of CSS3 spec. However I did add a test with a UTF-16 BOM.
grendzy’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests

Status: Needs review » Needs work

The last submitted patch, 15: 1833356_15b.patch, failed testing.

grendzy’s picture

Status: Needs work » Needs review
FileSize
9.65 KB
greggles’s picture

With this change to sass in 3.4+ I think more and more people will be encountering this issue. For me it came up because font-awesome has utf8 in it so sass added a byte-order-mark to that file which then got aggregated in the middle of my other files.

My workaround was to add this sed command to my deploy script (command source is stack overflow question on BOMs:

find sites/all/themes/themename/cssoutputdirectory/ -name '*.css' -exec sed -i '1 s/^\xef\xbb\xbf//' {} \;

mgifford queued 18: 1833356_18.patch for re-testing.

Status: Needs review » Needs work

The last submitted patch, 18: 1833356_18.patch, failed testing.

grendzy’s picture

Status: Needs work » Needs review
FileSize
6.56 KB

Reroll.

grendzy’s picture

Just the tests:

Status: Needs review » Needs work

The last submitted patch, 23: 1833356_23__TEST_ONLY_DEMONSTRATING_FAILURE.patch, failed testing.

grendzy’s picture

Status: Needs work » Needs review
jhedstrom’s picture

Status: Needs review » Needs work

Just some coding style nitpicks here:

  1. +++ b/core/lib/Drupal/Core/Asset/CssOptimizer.php
    @@ -125,6 +126,18 @@ public function loadFile($file, $optimize = NULL, $reset_basepath = TRUE) {
    +      // Check for BOM
    

    Need a period here.

  2. +++ b/core/lib/Drupal/Core/Asset/CssOptimizer.php
    @@ -125,6 +126,18 @@ public function loadFile($file, $optimize = NULL, $reset_basepath = TRUE) {
    +      else if (preg_match('/^@charset "([^"]+)";/', $contents, $matches)) {
    

    This should be elseif.

  3. +++ b/core/tests/Drupal/Tests/Core/Asset/css_test_files/css_input_with_bom.css
    @@ -0,0 +1,3 @@
    \ No newline at end of file
    
    +++ b/core/tests/Drupal/Tests/Core/Asset/css_test_files/css_input_with_bom_and_charset.css
    @@ -0,0 +1,4 @@
    \ No newline at end of file
    
    +++ b/core/tests/Drupal/Tests/Core/Asset/css_test_files/css_input_with_charset.css
    @@ -0,0 +1,4 @@
    \ No newline at end of file
    

    Needs to end with a new line.

grendzy’s picture

Status: Needs work » Needs review
FileSize
6.47 KB
2.07 KB
grendzy’s picture

Issue summary: View changes
grendzy’s picture

Issue summary: View changes
jhedstrom’s picture

Status: Needs review » Reviewed & tested by the community

Assuming #27 goes green, I think this is good to go. It adds a method, so is technically an API change, but it is completely non-breaking.

anavarre’s picture

+++ b/core/lib/Drupal/Component/Utility/Unicode.php
@@ -186,6 +186,37 @@ public static function check() {
+   *   The data to be converted. The encoding is detected based on the byte order mark.

+++ b/core/lib/Drupal/Core/Asset/CssOptimizer.php
@@ -125,6 +126,18 @@ public function loadFile($file, $optimize = NULL, $reset_basepath = TRUE) {
+      // If no BOM, check for fallback encoding. Per CSS spec the regex is very strict.

Those should be wrapped at 80 cols.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

I really like that we're fixing this - it has caught out front devs on projects I've worked on many times.

+++ b/core/lib/Drupal/Component/Utility/Unicode.php
@@ -186,6 +186,37 @@ public static function check() {
   /**
    * Converts data to UTF-8.
...
+  /**
+   * Converts data to UTF-8.

Two methods with the same one line method description? I think it'd be great to be able to discern the difference by reading the one liner.

Also what about javascript aggregation?

grendzy’s picture

Status: Needs work » Needs review
FileSize
11.51 KB
5.59 KB

Thanks for the reviews! New patch:

  • 80-column fix per #31
  • Fixed docs per #32. convertToUtf8UsingBOM is now encodingFromBOM – it just returns the encoding - this is more flexible for future users that may not want to load a large file entirely into memory.
  • Added JS support for BOM and charset attribute. It's only about 5 lines of actual code to run the same check on JsOptimizer::optimize() (plus tests).
jhedstrom’s picture

Status: Needs review » Reviewed & tested by the community

Assuming #33 goes green, I think this has addressed the remaining feedback. Back to RTBC.

+++ b/core/tests/Drupal/Tests/Core/Asset/js_test_files/utf8_bom.js.optimized.js
@@ -0,0 +1 @@
+var utf8BOM = '☃';

Snowman FTW!

alexpott’s picture

Version: 8.0.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

@grendzy nice changes - really like the the flexibility introduced.

This issue is a normal bug fix, and doesn't include any disruptive changes, so it is allowed per https://www.drupal.org/core/beta-changes. Committed 10c6e1d and pushed to 8.0.x. Thanks!

This would have saved so much aggro and time on Capgemini projects - really happy to see a fix here.

jhedstrom’s picture

Version: 7.x-dev » 8.0.x-dev
Status: Patch (to be ported) » Reviewed & tested by the community

I don't think this was pushed.

alexpott’s picture

Version: 8.0.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

  • alexpott committed 10c6e1d on 8.0.x
    Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

  • alexpott committed 10c6e1d on 8.1.x
    Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

  • alexpott committed 10c6e1d on 8.3.x
    Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

  • alexpott committed 10c6e1d on 8.3.x
    Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...
xeraseth’s picture

We ran into this issue in D7. Attached is the patch I applied, tests still need to be written if someone wants to take a stab.

  • alexpott committed 10c6e1d on 8.4.x
    Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...

  • alexpott committed 10c6e1d on 8.4.x
    Issue #1833356 by grendzy: CSS files encoded in UTF-8 with BOM break the...