1. drupal_static misses the & sign, no static caching.
  2. If a field_type is given which holds a file id in a column not called fid, the EntityFieldQuery bombs. Entityreference might be an example.
  3. Even if we fix that, we run as many queries as many fields that field type has.
  4. If a field type type is not given, then we run one EFQ for every. single. field. in the site. With a hardwired fid column, mind you, so the query bombing will very quickly fatal.
  5. If you call this function multiple times with different fids, you always get the same results.
  6. If you call this function multiple times with different values for $age, you always get the same results.
  7. You can't leave $field AND $field_type empty, see point 4.
  8. The @return value is absolutely hilarious. The best docs is in wiredocs.module.

Solutions:

  1. Add a & sign.
  2. Figure out the column name from the field foreign keys. Entityreference does maintain this and so do the core fields.
  3. Use file_list_usage to run one query.
  4. This is already fixed in the previous two points.
  5. Cache by fid as well.
  6. Cache by age as well.
  7. This is already fixed as in points 4.
  8. Write proper @return doxygen.

Some of the code already exists in file_file_download which, in itself benefits from this patch: instead of loading entities one by one we load them multiple by the entity type. Also, other consumers of this function will benefit from not getting pseudo-entities but real ones.

Comments

chx’s picture

Assigned: Unassigned » quicksketch
Status: Active » Needs review
StatusFileSize
new6.82 KB
andypost’s picture

+++ b/core/modules/file/file.field.incundefined
@@ -930,6 +930,26 @@ function file_field_formatter_view($entity_type, $entity, $field, $instance, $la
+ * Helper to determine whether a field references files.

...references a {file_managed}

+++ b/core/modules/file/file.field.incundefined
@@ -930,6 +930,26 @@ function file_field_formatter_view($entity_type, $entity, $field, $instance, $la
+ * @param array $field

Needs description, probably a field_instance

+++ b/core/modules/file/file.field.incundefined
@@ -930,6 +930,26 @@ function file_field_formatter_view($entity_type, $entity, $field, $instance, $la
+ * @return bool

blank line needed before

+++ b/core/modules/file/file.moduleundefined
@@ -1670,24 +1648,67 @@ function file_icon_map(File $file) {
+      elseif (!isset($file_fields[$entity_type][$bundle])) {
+        $instances = field_info_instances($entity_type, $bundle);
+        $file_fields[$entity_type][$bundle] = array();
+        foreach ($instances as $field_name => $instance) {

need more inline docs

chx’s picture

StatusFileSize
new7.99 KB

Added a lot of comments.

Status: Needs review » Needs work

The last submitted patch, 1805690_3.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
StatusFileSize
new8.15 KB

See, we do have test coverage for this. And, I went overboard with caching. Also, I clarified another weird edge case where both field and field_type is specified but the field is not the correct type. Added more and more comments.

chx’s picture

I would like to point out that foreign keys is implemented in entityreference_field_schema so the only contrib module I can think of already is reliable. And, it makes an excellent argument of why you can't just presume fid.

quicksketch’s picture

@chx has been asking me to review this for ages. Upon initial inspection it all seems to make sense to me. It's a more logical way of collecting file usages than what we're doing currently, which is basically legacy code from before we had the file_usage table to reference.

chx’s picture

Instead of this monstrosity of code I put together can we remove this and use the file_usage table in file_file_download?

quicksketch’s picture

Actually upon investigation (and trying to compare to the previous code), it looks like this entire patch was actually committed in #1801726: EntityFieldQuery v2. Doh.

Well the new approach may be more efficient but it certainly is really ugly. Any new cleanup to simplify the code would be great, but seems like it should be a different issue.

quicksketch’s picture

Assigned: quicksketch » Unassigned
quicksketch’s picture

Issue summary: View changes

Added Entityreference to the summary.

devin carlson’s picture

Version: 8.x-dev » 7.x-dev
Issue summary: View changes
StatusFileSize
new8.69 KB

As mentioned in #9 this was already committed to Drupal 8.

I've attached a backport of the current code for Drupal 7 assuming that's the remaining task here with cleanup done in another issue. Otherwise, please feel free to update the issue title/status.

magtak’s picture

StatusFileSize
new8.68 KB

So, I was debugging a site using Commons with a private filesystem. The root cause of the issue is quite irrelevant at this point, but it led me down the rabbit hole to realise that there is an issue with the latest patch. Let me elaborate.

$usage_list = file_usage_list($file);

file_usage_list, will return an array mapping (in the deepest level) IDs to counts https://api.drupal.org/api/drupal/includes%21file.inc/function/file_usag...

thus resulting in an array eg:

array { 
    ["file"]=> array { 
        ["node"]=> array { 
            [216]=>  "1" } 
        }
    }
}

Later on in the patch, there is this foreach:

foreach ($file_usage_list as $entity_type => $entity_ids) {

In which, entity_ids is an array eg:

array { 
    [216]=>  "1",
 }

Notice how the entity ID is actually the key to the array and not the value?

Further down, this array is "fed" into entity_load:

$entities = entity_load($entity_type, $entity_ids);

which does not comply with the array structure entity_load expects, loading node/1 in every case (for the site I was troubleshooting at least - other numbers may come up), which in turn propagates to the root issue I started investigating 3 hours ago :)

I'm attaching a patch that adds an

$entity_ids = array_keys($entity_ids);

That should ensure that the correct entities are loaded in every case.

Status: Needs review » Needs work

The last submitted patch, 12: 1805690_12.patch, failed testing.

magtak’s picture

Status: Needs work » Needs review

12: 1805690_12.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 12: 1805690_12.patch, failed testing.

magtak’s picture

StatusFileSize
new8.73 KB

Attempting to recreate the patch.

chx’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 16: 1805690.patch, failed testing.

chx’s picture

Status: Needs work » Needs review

16: 1805690.patch queued for re-testing.

magtak’s picture

@chx, sorry if the patch is failing, it's my first attempt at committing an actual patch and not proposing the the chance in code blocks :/

chx’s picture

No, no, it's just the bot. Your code is great! Thanks much. I will follow up with a proper review. Do you think you could add tests?

magtak’s picture

Not at this point I'm afraid. Time limitations. Sorry.

BarisW’s picture

It seems that this is already committed to core?

chx’s picture

Not in D7, I believe.

Paul B’s picture

StatusFileSize
new8.74 KB

Re-roll for Drupal 7.43

fietserwin’s picture

Issue summary: View changes
Status: Needs review » Needs work

A few nitpicks and it is OK.

  1. +++ b/modules/file/file.module
    @@ -1042,27 +1022,84 @@ function file_icon_map($file) {
      * @return
    - *   An integer value.
    

    Let's add the type: array or even array[].

    In fact, let's specify the type of the params as well. it is not a must for D7, but it clarifies a lot (while looking at this code, all the time I was thinking that $field was a string...).

  2. +++ b/modules/file/file.module
    @@ -1042,27 +1022,84 @@ function file_icon_map($file) {
    +     $entity_ids = array_keys($entity_ids); ¶
    

    space at end of line after ;

  3. +++ b/modules/file/file.module
    @@ -1042,27 +1022,84 @@ function file_icon_map($file) {
    +     $entity_ids = array_keys($entity_ids); ¶
    +     $entity_info = entity_get_info($entity_type);
    +      // The usage table contains usage of every revision. If we are looking
    

    add 1 space of indentation. to the code lines.

Other remarks:

1) This patch solves bugs that are not described in the IS:
- If you call this function multiple times with different fids, you always get the same results.
- If you call this function multiple times with different values for $age, you always get the same results.
- You can't leave $field AND $field_type empty (hmmm, it is described but differently).
I updated the IS for this.

2) I closed #993082: file_get_file_references() @return doc totally wrong as a duplicate.

Paul B’s picture

Status: Needs work » Needs review

Fietserwin, I think you reviewed the Drupal 8 patch (which has already been committed), not the Drupal 7 patch.

fietserwin’s picture

Status: Needs review » Needs work

I reviewed the patch in #25 ?!

Paul B’s picture

That is the right patch, I need glasses.

Meanwhile I noticed a more serious problem with the patch:

    $usage_list = file_usage_list($file);
    $file_usage_list = isset($usage_list['file']) ? $usage_list['file'] : array();

If a file is referenced by something other than the file module, the references are not found.
In fact I have a custom module that references files. It's based on the file module.
I had been wondering why I'm spammed with errors 'Did not delete temporary file "foo" during garbage collection, because it is in use by the following modules: bar'

I'm not sure if it's a defect in the patch, or if I'm doing something weird?
I can' t test it directly without the patch because without it the function doesn't work with mongo_entities
because of the EntityFieldQuery that's missing an 'entity_type' entityCondition.

fietserwin’s picture

Yes: you can refer to images/files on the public/private file system directly or via a managed file. If you use the insert module to get inline images you even have both.

But I think that the function file_get_file_references() and thus this patch is about references to a managed file (a record in the table file_managed), not to the underlying file/image. So, for me, this patch is not incorrect because of this.

I am at this moment writing a module to find and remove duplicate images(/files) and there, besides searching in the table file_managed, yes, I also have to search through all long text fields and such.

Paul B’s picture

But my custom module does write records to file_managed and file_usage. Other modules than 'file' can do that,
otherwise why would file_usage_list() be keyed by module?
Looking at the image module, it seems to do the same thing. In image_field_update_field() it calls

file_save($file_new);
 file_usage_add($file_new, 'image', 'default_image', $field['id']);

file_save() writes to file_managed and file_usage_add() writes to file_usage.
What is weird is that if I insert an image and then look at the file_usage table, the "module" and "type" are somehow overwritten: module becomes "file" and type becomes "node". In my custom module, no such overwriting takes place.
Anyway, file_get_file_references() for the image also returns an empty array, with or without the patch. Is it supposed to do that?

Paul B’s picture

OK my tests with images were a bit confused - image_field_update_field() is only for default images, also to get the image with file_get_file_references() you have to pass the fourth argument. file_file_download() does that when called from image_file_download().

fietserwin’s picture

[EDIT: updated usages of file_usage_add() by contrib]

@Paul B: you are completely right with your remark in #29: only usages via the file module will be considered.

But what does this mean? I dove into how file_usage_add() is used:

  • Image fields update usage via the file module, thus with 'file' for module.
  • Image field settings defines a default image, this is stored under module 'image', but field settings are not an entity thus this usage should better be ignored if we want to return a multidimensional array keyed on (a.o.) entity type and id.
  • Media updates usage under their own name, i.e. module='media'. These will be missed.
  • File_entity does not call file_usage_add() itself but leaves that (correctly) up to the file module.
  • Migrate correctly corrects user_save() that does not update usage on account creation.
  • Feeds does add its own usage when importing a file because it keeps track of imported stuff (feeds_source table). However the type is the fetcher (class) type whereas the id is the source id. Thus not to be seen as an entity_type/id pair.
  • Imce's use of file_usage_add() is bogus: if an image is uploaded via IMCE, a managed file is created and that managed file (type = 'file, id = $file->fid) is stored as a usage of ... itself. Thus this usage is also better ignored.
  • Entityreference: if this module is used to create a field that can reference managed files, usage is not updated via file_usage_add(). But I guess we would like to get these usages. BTW: entityreference does specify foreign keys in the schemas of its fields.
  • User pictures are correctly added as a usage. The field is named picture. However this is not specified as a foreign key in the schema, nor is it a field (it is a property of the user entity), so file_field_find_file_reference_column() will not find this.
  • Webform records file submissions via webforms under 'webform' for module and, correctly, 'submission' as type with its submission id.
  • I do not know about other modules, including Paul B's custom module.

What does this mean:

  • file_usage_list() will return all references to managed files from file and image fields under the key 'file'.
  • file_usage_list() will return references from media fields but these are not processed by the current patch.
  • file_usage_list() will return references from user pictures but these are not processed by the current patch.
  • file_usage_list() will return some other usages under module 'image', 'imce', 'webform', 'feeds' but these are better ignored.
  • file_usage_list() will not return references from custom reference fields created via entityreference.
  • file_get_file_references() is defined in file.module and should thus be considered a file field specific function. Remember that the file module does not define managed files (that is done in system), but a file field. In that sense this patch does what it should do.
  • IMO, it should thus better be renamed to file_get_file_field_references(). At least it should be clearly documented that other usages than via file or image fields are not returned and these other usages include usages from the user module and contrib modules like media and entityreference.
Paul B’s picture

I think it may be more of a private function, so it should be renamed to _file_get_file_references. Anyway, any bugfixes and improvements have to be added to Drupal 8 first, so I think they're out of scope for this issue. The fix added in #1805690-12: file_get_file_references() is rather bogus has already been added to Drupal 8, see #1997716: file_get_file_references() doesn't load entities correctly.

That would leave only the trailing whitespace issue I guess, however
if I understand #1452100: Private file download returns access denied, when file attached to revision other than current correctly, the new code doesn't work for revisions, and can't work in general, see #1452100-16: Private file download returns access denied, when file attached to revision other than current.

It shouldn't matter for file_file_download() since that function calls file_get_file_references() with FIELD_LOAD_CURRENT (but also see
#992674-22: Private file download returns access denied, but if the old code worked correctly that would still be a regression.

fietserwin’s picture

Yeah, let's do the remarks of #26 and update the function documentation that it is a file field module function and thus only returns file field references (and image field as that "extends" file field) and nothing else. Also renaming is out of scope for D7, so we also don't do that.

ey’s picture

Status: Needs work » Needs review
StatusFileSize
new9.25 KB

I've done the remarks of #26 and re-rolled the patch. Please review.

Status: Needs review » Needs work

The last submitted patch, 36: file-get-file-references-is-rather-bogus-1805690-36.patch, failed testing.

ey’s picture

Status: Needs work » Needs review
Paul B’s picture

Re-rolled it again for 7.56

Paul B’s picture

StatusFileSize
new9.36 KB

previous patch had an extraneous path component

Paul B’s picture

Reroll. I'm not sure if SA-CORE-2017-003 affects this, but the tests should determine that.

idebr’s picture

StatusFileSize
new9.67 KB

Reroll against current 7.x

klausi’s picture

The entity field query in file_get_file_references() is unlimited and will bomb your site if you use a reference many times. Opened #3207669: file_get_file_references() can cause PHP out of memory errors for that small aspect.

avpaderno’s picture

Title: file_get_file_references is rather bogus » file_get_file_references() is rather bogus
Related issues: +#3207669: file_get_file_references() can cause PHP out of memory errors

Status: Needs review » Closed (outdated)

Automatically closed because Drupal 7 security and bugfix support has ended as of 5 January 2025. If the issue verifiably applies to later versions, please reopen with details and update the version.