Problem/Motivation

From .gitattributes:

# Auto-detect text files, ensure they use LF.
*         text=auto eol=lf

This line is wrecking havoc when adding themes or dependencies that contain binary files with CRLF line endings, such as font files (*.ttf). I just spent a couple of hours debugging an error I was getting in a custom theme for D8. The error was: "warning: CRLF will be replaced by LF in foobar.ttf". Removing that line from .gitattributes made the error go away.

To reproduce:

  • add this line to composer.json: "symfony/console": "2.6.*@dev",
  • composer update
  • git add core/vendor/symfony/console/
  • warning: CRLF will be replaced by LF in core/vendor/symfony/console/Symfony/Component/Console/Resources/bin/hiddeninput.exe.
    The file will have its original line endings in your working directory.

At the moment, we set a general rule that all files in git should use LF line endings, except for a handful of binary extension that we define at the end of .gitattributes. We shouldn't dictate how people should manage any possible file extension in git, we should only enforce the "Auto-detect text files, ensure they use LF." rules on the files that we know belong to Drupal. Having such a rule for all files is not going to scale for joe the developer (that was me today) who is using a theme that happens to include binary files that have the extension *.foobar and have CRLF line endings.

Proposed resolution

Remove the general rule for all extension, and only list the files we know should be versioned in Git (the way we have it now just below the * rule). As a result, we can remove the list of exceptions at the end of .gitattributes.

Remaining tasks

patch

User interface changes

n/a

API changes

n/a

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

scor’s picture

Issue summary: View changes
scor’s picture

Issue summary: View changes
Iztok’s picture

Yes, and there are many other file types out there - like video files (where I found the issue).

I also commented out line 20 to fix this.

dawehner’s picture

Two points:

  • We should provide a example.gitattributes file, given that people also maybe want to adapt it
  • I think the rule should be just applied to core/* given that this is the code which is developed inside the Drupal repository
ultimike’s picture

We hit a related issue with a D8 Migrate in Core patch (#2410623) that includes a .gz file (of a D6 mysql dump). I was able to create and apply the patch locally, but the d.o. test bot couldn't apply it cleanly. chx figured out that it was the fact that the patch was created using a normal "diff" for the binary file which was the root of the problem. We solved this issue by adding a .gitattributes file to the proper directory with:

d6.gz -text -diff

This makes it so there is no diff text for the d6.gz file, instead a binary diff is included in the patch.

It is my understanding that in some cases spaces and line breaks that mess with binary diffs, rendering them problematic.

Here is the issue and patch: https://www.drupal.org/node/2410623#comment-9566443

-mike

neclimdul’s picture

#2498515: Update additional Symfony Components to 2.7.0 changed the console .exe file but this is still a bit of an issue.

kenorb’s picture

Some git clients wrongly detect some files as in text format. This includes DOC files, and many other binaries, therefore corrupting them.

Example of such file: https://github.com/ruflin/Elastica/blob/master/test/data/test.doc

Related:

- http://stackoverflow.com/questions/18536863/git-refuses-to-reset-discard...
- http://stackoverflow.com/questions/15958446/sourcetree-app-says-uncommit...

kenorb’s picture

ztl8702’s picture

Priority: Normal » Critical

Actually even .JPG, .jpeg, etc are mistakenly treated as text files.
This could cause damage of data.

I am using git 2.1.4 on Ubuntu 15.04.

kenorb’s picture

@ztl8702 I had issues with images as well. If you could please test the patch and see if that helps.

swentel’s picture

Priority: Critical » Normal

This is really not critical

ztl8702’s picture

@kenorb yes it works.
I think as long as we remove * text=auto eol=lf it should no longer corrupt binary files.

But the problem then is will some file types that were supposed to be auto-detected be missed?

ztl8702’s picture

Version: 8.0.x-dev » 8.2.x-dev

This issue should be moved to the newest dev version now.

ztl8702’s picture

Version: 8.2.x-dev » 8.1.x-dev
Jon@s’s picture

Just wanted to add that instead of removing the line
* text=auto eol=lf

You can just add the other file types to the end of the file (these lines):

# Define binary file attributes.
# - Do not treat them as text.
# - Include binary diff in patches instead of "binary files differ."
*.gif     -text diff
*.gz      -text diff
*.ico     -text diff
*.jpg     -text diff

Had to do that with a bunch of font file types.

tstoeckler’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me.

ztl8702’s picture

@Jon@s
Be aware that capitalization does matter. .JPG .jPG .jpg and .jPg etc. need seperate lines, and we cannot gurantee which one will be for user uploaded files. Base on my testing, Drupal will not change the file extensition to lower case if the user uploads something like "photo.JPG".

Therefore I suggest we stick to the patch on #8.

Jon@s’s picture

@ztl8702
True, but without * text=auto eol=lf you now have to go and explicitly name every text file type, which is many more than binary file types and chances are a new file type is a text format. Also while capitalization matters, .JPG .jPg and the like should never appear anyways (although especially .JPG was common on older Microsoft system).

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

Here's a list of extensions of all files in the git repository...

bak
config
css
data
dist
engine
gif
gitignore
gz
html
ico
inc
install
jpeg
jpg
js
json
lock
make
map
md
module
orig
php
png
po
profile
save
script
sh
sql
svg
svgz
swo
swp
theme
twig
txt
xml
xtmpl
yml

I think at the very least we need to ensure all of these are present and correct. Many of the odder ones come from the htaccess test.

kenorb’s picture

Status: Needs work » Needs review

@alexpott This could be list of extensions on specific repository, but it doesn't guarantee that the rules won't corrupt any other file formats which are part of theme or modules. Such as project/module documents (docx), spreadsheets (xlsx), fonts (woff). The list can be endless.

alexpott’s picture

@kenorb I agree with removing the rule - I just think we need to cover all the extensions that are included in core.

kenorb’s picture

The issue is about bug report where file are being corrupted, because of current version of .gitattributes. Listing all file extensions from the core to implement additional rules sounds like a separate feature request.

alexpott’s picture

Status: Needs review » Needs work

@kenorb nope - removing the rule affects all of these files.

mlncn’s picture

I was frustrated by Drupal's apparently .gitattributes overriding my own (globally and a level up in the repository) without knowing what was going wrong. If this approach stays we should at least add a couple more font files to the "this is not text please stop breaking everything" list:

*.eot     -text diff
*.woff    -text diff
kenorb’s picture

Some people say (at least 3) that using text=auto and eol=lf at the same time is misleading and both options contradict each other, since eol option disables automatic detection of text files, in other words eol=lf effectively sets text. I'm not sure if that's valid claim. Any comments?

alexpott’s picture

Status: Needs work » Needs review
FileSize
3.29 KB
  • bak - did not add because in general these files shouldn't be added to .git - the only example is HtAccessTest
  • config - web.config in the root should be text
  • css - already is text
  • data - this only exists because of a weird test - should file a followup to change to .php
  • dist - should be text cause of phpcs.xml.dist
  • engine - already is text
  • gif - not text
  • gitignore - git should do this right
  • gz - not text
  • html - already is text
  • ico - not text
  • inc - already is text
  • install - already is text
  • jpeg - not text
  • jpg - not text
  • js - already is text
  • json - already is text
  • lock - already is text
  • make - did not add because in general these files shouldn't be added to .git - the only example is HtAccessTest
  • map - should be text cause of .map for minified js
  • md - already is text
  • module - already is text
  • orig - did not add because in general these files shouldn't be added to .git - the only example is HtAccessTest
  • php - already is text
  • png - not text
  • po - already is text
  • profile - should be text cause of standard.profile
  • save - did not add because in general these files shouldn't be added to .git - the only example is HtAccessTest
  • script - already is text
  • sh - already is text
  • sql - already is text
  • svg - tricky it can be both text or binary :) https://commons.wikimedia.org/wiki/Help:SVG
  • svgz - should be binary
  • swo - did not add because in general these files shouldn't be added to .git - the only example is HtAccessTest
  • swp - did not add because in general these files shouldn't be added to .git - the only example is HtAccessTest
  • theme - already is text
  • twig - should be text
  • txt - already is text
  • xml - already is text
  • xtmpl - did not add because in general these files shouldn't be added to .git - the only example is HtAccessTest
  • yml - already is text

Additionally I've removed .info and .test as we no longer have files with these extensions.

rootwork’s picture

I would back up @mlncn's suggestion in #24 that we add those font-file extensions. I've run into that a bunch of times with theming.

alexpott’s picture

@rootwork if we do #26 then we don't need to as

+++ b/.gitattributes
@@ -16,27 +16,27 @@
-# Auto-detect text files, ensure they use LF.
-*         text=auto eol=lf
-

Will mean that fonts will no longer be treated as text.

rootwork’s picture

Oh I get it now. Sorry for being dense!

joelpittet’s picture

Status: Needs review » Reviewed & tested by the community

Looks like this is on target now. Thanks for helping get this straight

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 26: 2333243-26.patch, failed testing.

alexpott’s picture

Status: Needs work » Reviewed & tested by the community

Unrelated test fail.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 26: 2333243-26.patch, failed testing.

joelpittet’s picture

Status: Needs work » Reviewed & tested by the community
kenorb’s picture

Re #25, see: EOL conversion on checkout for text files only and What will `* text=auto eol=lf` in gitattributes do? which makes text=auto and eol=lf invalid combination as it is currently (not a patch).

xjm’s picture

Version: 8.1.x-dev » 8.2.x-dev
Status: Reviewed & tested by the community » Fixed
Issue tags: +8.2.0 release notes

As a result, we can remove the list of exceptions at the end of .gitattributes.

The current patch does not seem to implement this part of the proposed resolution. However, I think this is fine -- being explicit is better to make sure we do not corrupt files. #26 seems comprehensive for me.

I think #24 is also already addressed by the current patch and anything else can be a followup. I think #4 in particular is potentially worth followups.

Since this file is in the root of the Drupal site and could override user customizations, I think it is best to commit to 8.2.x only to avoid disruptions. We typically warn people when we change such configuration files in release notes, so tagging for that.

  • xjm committed 8db32d1 on 8.2.x
    Issue #2333243 by kenorb, alexpott, ztl8702, scor, Jon@s, ultimike,...
toomanypets’s picture

@alexpott -- #26 does not address #24. I've tested. In order to treat .eot, .woff, and .woff2 files as binary, they must be explicitly listed in .gitattributes. I'm reopening #2711855: Add binary font file types to .gitattributes.

alexpott’s picture

@toomanypets can you detail how you tested so I can repeat the test?

toomanypets’s picture

FileSize
1.37 KB

@alexpott This is how I tested...

$ drush dl drupal-8.2.x --drupal-project-rename=drupal-8.2.x-2333243
$ cd drupal-8.2.x-2333243
$ curl https://fonts.gstatic.com/s/roboto/v15/CWB0XYA8bzo0kSThX0UTuA.woff2 --create-dirs -o themes/junk/fonts/Roboto.woff2
$ git init -q && git add -A && git commit -q -m "Initial commit."
$ truncate -s +10 themes/junk/fonts/Roboto.woff2
$ git add -A && git commit -q -m "Added 10 bytes to the end of Roboto.woff2"
$ git diff HEAD^ HEAD

The resulting diff simply indicates that the files differ -- this is not the desired behavior with binary files.

$ echo "*.woff2 -text diff" >> .gitattributes
$ git diff HEAD^ HEAD

The resulting diff is correct.

I've attached a bash script if you'd prefer to do it the easy way.

  • xjm committed 8db32d1 on 8.3.x
    Issue #2333243 by kenorb, alexpott, ztl8702, scor, Jon@s, ultimike,...

  • xjm committed 8db32d1 on 8.3.x
    Issue #2333243 by kenorb, alexpott, ztl8702, scor, Jon@s, ultimike,...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.