file_entity_file_presave() overwrites filemime data generated by other modules. It seems from the comment that the intention is to just make sure the filemime is up-to-date ("// Always ensure the filemime property is current"). However in practice it overwrites the filemime any time there is an original file even if the filemime is already present:

function file_entity_file_presave($file) {
  // Always ensure the filemime property is current.
  if (!empty($file->original) || empty($file->filemime)) {
    $file->filemime = file_get_mimetype($file->uri);
  }

This is an issue because file_get_mimetype() is not a reliable way of getting a filemime from remote files that have content-types like "application/octet-stream" or "content-type: text/csv; charset=utf-8"

The following patch only updates the filemime if it has changed since the original:

  // Always ensure the filemime property is current.
  if (!empty($file->original) || empty($file->filemime) &&
  isset($file->original->filemime) &&
  $file->filemime != $file->original->filemime) {
    $file->filemime = file_get_mimetype($file->uri);
  }
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

acouch created an issue. See original summary.

acouch’s picture

Issue summary: View changes
FileSize
533 bytes
acouch’s picture

Issue summary: View changes
acouch’s picture

Issue summary: View changes
GuyPaddock’s picture

@acouch I think the intent of this code was that if the content of a file record was replaced (e.g. you replace a GIF with a JPG using "edit file"), the mime type gets updated along with the content change. It doesn't seem like your proposed patch addresses that scenario, since you're only updating the mime type if the mime type has changed (it might not change if this is the place in the code that would handle that update).

Since the mime type detection tends to be based on a filename change, it may be safer here to do one of the following:

  1. Only update the MIME type if the filename changed between the original and the new record.
  2. Only update the MIME type if file_get_mimetype() returns "application/octet-stream", since that's the default fallback if it cannot detect the mime type (it's a fancy way to say just "data").
  3. Only update the MIME type if running file_get_mimetype() on the original file and running it on the new file returns a different MIME type for each (i.e. if it returns "image/gif" on the original but "image/jpeg" on the new file, change the mime type; but if both return "application/octet-stream", leave it alone).
  4. Only update the MIME type if running file_get_mimetype() on the original file returns the same MIME type as the original's MIME type, but running it on the new file returns a different MIME type.
GuyPaddock’s picture

Assigned: Unassigned » GuyPaddock
Status: Active » Needs work

Assigning to myself to try to fix this.

joseph.olstad’s picture

Cool!

GuyPaddock’s picture

Assigned: GuyPaddock » Unassigned
Status: Needs work » Needs review
FileSize
3.73 KB

Attached is a patch that implements what I described as approach 4 in #5 above.

This modifies how $file->filemime is populated so that we don't bash a MIME type that has been manually set by code. The idea is that we only set a MIME type if there is no MIME type already on the file, or if the file is being updated and the previous MIME type of the file matches the MIME type that would have been automatically detected from the original file.

The reasoning is that, if the file had a MIME type that matched what we would auto-detect, there's good reason to believe that the MIME type of that file is not being managed manually. But, if the original MIME type is custom (or, if auto-detection is falling back to application/octet-stream) then there is no reason to believe that auto-MIME-type-detection will be reliable for the updated file, so we skip it.

joseph.olstad’s picture

nice work, had a quick look at the patch, but haven't tested it yet. Not sure if I have come accross this issue though but seems legit
passes automated tests too! :)

joseph.olstad’s picture

Hi GuyPaddock, I can see you've spent a lot of time on this and done great work. Is code change this something that everyone should have or is this kind of an edge case?

What module requires this? a custom module?

I've read the history above , I'm hesitant to put this in without really knowing the scope. Any specific use cases that can be simply explained? I didn't see any other contrib modules being mentioned, is there one in particular that requires this?

Thanks!

GuyPaddock’s picture

What module requires this? a custom module?

In my case, it's required by the Views Data Export PDF module. That module extends Views Data Export (VDE) to support generating HTML and PDF exports of views tables (the HTML format is more for debugging/styling the PDF output, but it's nonetheless an output format). The way that VDE modules work is they create temporary files that report output are written to during a batch process, and then after the report is generated, the output is sent to the user's browser. If the user bookmarks the link to the results page, he or she can retrieve the report again for a limited time, until the temporary file expires. The mime type on the file determines what type of report is being retrieved (e.g. PDF, HTML, CSV, XLSX, etc).

In an older version of Views Data Export PDF (prior to me becoming a maintainer), there was a limitation that PDFs could only be downloaded but not rendered directly in the browser window. In looking into why this was the case, there were multiple issues, but after fixing the issues in the module itself, I discovered that one major problem was that the mime-type being set by the module was always being overridden by File Entity, from "application/pdf" to "application/octet-stream" because File Entity did not recognize the mime type based on the filename. When File Entity was not installed, this problem did not occur.

Is code change this something that everyone should have or is this kind of an edge case?

I've read the history above , I'm hesitant to put this in without really knowing the scope. Any specific use cases that can be simply explained? I didn't see any other contrib modules being mentioned, is there one in particular that requires this?

This is a code change that everyone should have for the simple reason that File Entity breaks the contract core defines for the filemime field on files, and it does so for all files being saved to the database, even when File Entity is not being asked to manage those files (e.g. temporary files). Core's contract is that an application can set the filemime field to the specific MIME type of the file that is in the managed files table, but with File Entity this field becomes a read-only representation of the MIME type that is intrinsically linked to the name of the file. File Entity always has the last word on what the MIME type will be on a file any time it is saved through normal File API means.

Modules cannot set a custom MIME type for a file at file creation or update time. The MIME type is always set by File Entity, no matter what, even if the MIME type that File Entity sets is more generic or less useful than what a module has set.

The thorny part of this problem is that sites and other contrib modules likely depend on this behavior. So, we cannot simply remove the logic that sets the MIME type, nor can we only set the MIME type at creation time (as was the approach in #2) because this would break the experience for users who use the "Replace file" functionality in the Admin UI to swap, for example, a PNG image for a JPEG image. The approach I took, therefore, is that File Entity should only force the MIME type if:

  1. At creation time: It has not already been set.
  2. At update time: The previous MIME type of the file matches what the File Entity module would have set for the previous filename -- indicating that the MIME type was auto-generated at creation time, and therefore it needs to be updated.

Without this patch, the only workarounds I have found are:

  • Try to change the name of the file to get the desired MIME type (assuming it's a file extension that File Entity can recognize); OR
  • After creating/updating a file through the File API, use drupal_write_record() directly to modify the filemime field of the new record without going through the File API. This works but is a kludge, and it comes with the performance penalty of doubling the number of UPDATE queries, which can be a problem for operations creating or updating a large number of files.
joseph.olstad’s picture

Ok thanks for the detail explanation, looking more at the patch code,

there is a return value for

  return $need_to_update_mimetype;

however it is not used by the caller.

  file_entity_update_mimetype($file);

Should the caller be concerned about the return value in this case?

joseph.olstad’s picture

Hi @GuyPaddock, looked closer at the patch, seeing as the return value is not used, assuming it is probably safe to just remove the return line in that function.

Are you ok with this.?

Joseph.Olstad

GuyPaddock’s picture

@Joseph.Olstad: The return type isn't strictly required, no. I was providing it just in case another module needed to know whether or not the MIME type was updated or not without having to replicate the logic here. But, applying YAGNI principle, it can be removed, yes.

joseph.olstad’s picture

Sorry I released file_entity 7.x-2.28 without this patch, maybe next release.