file_entity_file_presave() overwrites filemime data generated by other modules. It seems from the comment that the intention is to just make sure the filemime is up-to-date ("// Always ensure the filemime property is current"). However in practice it overwrites the filemime any time there is an original file even if the filemime is already present:
function file_entity_file_presave($file) {
// Always ensure the filemime property is current.
if (!empty($file->original) || empty($file->filemime)) {
$file->filemime = file_get_mimetype($file->uri);
}
This is an issue because file_get_mimetype() is not a reliable way of getting a filemime from remote files that have content-types like "application/octet-stream" or "content-type: text/csv; charset=utf-8"
The following patch only updates the filemime if it has changed since the original:
// Always ensure the filemime property is current.
if (!empty($file->original) || empty($file->filemime) &&
isset($file->original->filemime) &&
$file->filemime != $file->original->filemime) {
$file->filemime = file_get_mimetype($file->uri);
}
Comment | File | Size | Author |
---|---|---|---|
#8 | file_entity-avoid_bashing_mime_type-2570377-8-7.x.patch | 3.73 KB | GuyPaddock |
| |||
#2 | file-entity-preserve-filemime.patch | 533 bytes | acouch |
Comments
Comment #2
acouch CreditAttribution: acouch at NuCivic (formerly Nuams) commentedComment #3
acouch CreditAttribution: acouch at NuCivic (formerly Nuams) commentedComment #4
acouch CreditAttribution: acouch at NuCivic (formerly Nuams) commentedComment #5
GuyPaddock CreditAttribution: GuyPaddock at Inveniem commented@acouch I think the intent of this code was that if the content of a file record was replaced (e.g. you replace a GIF with a JPG using "edit file"), the mime type gets updated along with the content change. It doesn't seem like your proposed patch addresses that scenario, since you're only updating the mime type if the mime type has changed (it might not change if this is the place in the code that would handle that update).
Since the mime type detection tends to be based on a filename change, it may be safer here to do one of the following:
file_get_mimetype()
returns"application/octet-stream"
, since that's the default fallback if it cannot detect the mime type (it's a fancy way to say just "data").file_get_mimetype()
on the original file and running it on the new file returns a different MIME type for each (i.e. if it returns "image/gif" on the original but "image/jpeg" on the new file, change the mime type; but if both return"application/octet-stream"
, leave it alone).file_get_mimetype()
on the original file returns the same MIME type as the original's MIME type, but running it on the new file returns a different MIME type.Comment #6
GuyPaddock CreditAttribution: GuyPaddock at Inveniem commentedAssigning to myself to try to fix this.
Comment #7
joseph.olstadCool!
Comment #8
GuyPaddock CreditAttribution: GuyPaddock at Inveniem commentedAttached is a patch that implements what I described as approach 4 in #5 above.
This modifies how
$file->filemime
is populated so that we don't bash a MIME type that has been manually set by code. The idea is that we only set a MIME type if there is no MIME type already on the file, or if the file is being updated and the previous MIME type of the file matches the MIME type that would have been automatically detected from the original file.The reasoning is that, if the file had a MIME type that matched what we would auto-detect, there's good reason to believe that the MIME type of that file is not being managed manually. But, if the original MIME type is custom (or, if auto-detection is falling back to
application/octet-stream
) then there is no reason to believe that auto-MIME-type-detection will be reliable for the updated file, so we skip it.Comment #9
joseph.olstadnice work, had a quick look at the patch, but haven't tested it yet. Not sure if I have come accross this issue though but seems legit
passes automated tests too! :)
Comment #10
joseph.olstadHi GuyPaddock, I can see you've spent a lot of time on this and done great work. Is code change this something that everyone should have or is this kind of an edge case?
What module requires this? a custom module?
I've read the history above , I'm hesitant to put this in without really knowing the scope. Any specific use cases that can be simply explained? I didn't see any other contrib modules being mentioned, is there one in particular that requires this?
Thanks!
Comment #11
GuyPaddock CreditAttribution: GuyPaddock at Inveniem commentedIn my case, it's required by the Views Data Export PDF module. That module extends Views Data Export (VDE) to support generating HTML and PDF exports of views tables (the HTML format is more for debugging/styling the PDF output, but it's nonetheless an output format). The way that VDE modules work is they create temporary files that report output are written to during a batch process, and then after the report is generated, the output is sent to the user's browser. If the user bookmarks the link to the results page, he or she can retrieve the report again for a limited time, until the temporary file expires. The mime type on the file determines what type of report is being retrieved (e.g. PDF, HTML, CSV, XLSX, etc).
In an older version of Views Data Export PDF (prior to me becoming a maintainer), there was a limitation that PDFs could only be downloaded but not rendered directly in the browser window. In looking into why this was the case, there were multiple issues, but after fixing the issues in the module itself, I discovered that one major problem was that the mime-type being set by the module was always being overridden by File Entity, from "application/pdf" to "application/octet-stream" because File Entity did not recognize the mime type based on the filename. When File Entity was not installed, this problem did not occur.
This is a code change that everyone should have for the simple reason that File Entity breaks the contract core defines for the
filemime
field on files, and it does so for all files being saved to the database, even when File Entity is not being asked to manage those files (e.g. temporary files). Core's contract is that an application can set thefilemime
field to the specific MIME type of the file that is in the managed files table, but with File Entity this field becomes a read-only representation of the MIME type that is intrinsically linked to the name of the file. File Entity always has the last word on what the MIME type will be on a file any time it is saved through normal File API means.Modules cannot set a custom MIME type for a file at file creation or update time. The MIME type is always set by File Entity, no matter what, even if the MIME type that File Entity sets is more generic or less useful than what a module has set.
The thorny part of this problem is that sites and other contrib modules likely depend on this behavior. So, we cannot simply remove the logic that sets the MIME type, nor can we only set the MIME type at creation time (as was the approach in #2) because this would break the experience for users who use the "Replace file" functionality in the Admin UI to swap, for example, a PNG image for a JPEG image. The approach I took, therefore, is that File Entity should only force the MIME type if:
Without this patch, the only workarounds I have found are:
drupal_write_record()
directly to modify thefilemime
field of the new record without going through the File API. This works but is a kludge, and it comes with the performance penalty of doubling the number of UPDATE queries, which can be a problem for operations creating or updating a large number of files.Comment #12
joseph.olstadOk thanks for the detail explanation, looking more at the patch code,
there is a return value for
however it is not used by the caller.
Should the caller be concerned about the return value in this case?
Comment #13
joseph.olstadHi @GuyPaddock, looked closer at the patch, seeing as the return value is not used, assuming it is probably safe to just remove the return line in that function.
Are you ok with this.?
Joseph.Olstad
Comment #14
GuyPaddock CreditAttribution: GuyPaddock at Inveniem commented@Joseph.Olstad: The return type isn't strictly required, no. I was providing it just in case another module needed to know whether or not the MIME type was updated or not without having to replicate the logic here. But, applying YAGNI principle, it can be removed, yes.
Comment #15
joseph.olstadSorry I released file_entity 7.x-2.28 without this patch, maybe next release.