Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
Currently File entities can not be effectively deserialized because the file which the URI points to does not get copied locally.
Proposed resolution
Temporarily, create a FileEntityNormalizer. Once File is an EntityNG entity, and its properties are handled as such, this could potentially be handled in a field normalizer instead.
Comment | File | Size | Author |
---|---|---|---|
#40 | 1988426-40.patch | 7.66 KB | damiankloip |
#13 | interdiff-1988426-13.txt | 614 bytes | damiankloip |
#11 | 1988426-11.patch | 3.37 KB | damiankloip |
#11 | interdiff-1988426-11.txt | 1.18 KB | damiankloip |
#9 | 1988426-9.patch | 2.19 KB | damiankloip |
Comments
Comment #1
linclark CreditAttribution: linclark commentedHere's a simple start. It assumes that the URI is an HTTP URI.
This opens a potential security exploit if a site's temp directory is publicly accessible, as was discussed in #1927648: Allow creation of file entities from binary data via REST requests. I'm not sure whether the Serializer needs to guard against this, or whether the site should give a warning (if it doesn't already) when a temp directory is publicly accessible.
Comment #2
linclark CreditAttribution: linclark commentedComment #3
Dave ReidI think actually if you use system_retrieve_file($url, NULL, TRUE) it will do the file/entity saving for you.
This also assumes that all files would go in public directory, regardless of where they came from. Not sure how this would work for temporary or remote (YouTube or Amazon S3 files).
Regardless, this needs to account for a FALSE on entity save or HTTP failure.
Comment #4
linclark CreditAttribution: linclark commentedWe don't want the entity to be saved in the Serializer, but just want the file to be pulled in. My stance is that Serializer should alter state as little as possible, so I think saving the file to the temp directory is the desired behavior.
Comment #5
Dave ReidOh duh, I read entity_create() as entity_save().
I'm not sure how system_retrieve_file() would work for private files either.
Comment #6
benjifisher#1: 1988426-01-file-entity-deserialize.patch queued for re-testing.
Comment #7
linclark CreditAttribution: linclark commentedberdir tells me that the File NG patch is getting close, #1818568: Convert files to the new Entity Field API. The patch might change based on this.
Comment #8
moshe weitzman CreditAttribution: moshe weitzman commentedAll entities are NG now. This is unblocked AFAICT. Any changes needed?
Comment #9
damiankloip CreditAttribution: damiankloip commentedI think the normalization from the EntityNormalizer gives us what we need there? Made a couple of other changes.
Comment #11
damiankloip CreditAttribution: damiankloip commentedHmm, we will have to make sure that the class/interface exists in the check if we start adding normalizers for other core modules in hal.
So we could do this..
Alternatively, this normalizer could live in file module. But then it will have a soft dep on hal module which doesn't seem right at all.
Comment #12
dawehnerI guess we should rather go with the the FileInterface instead.
Do we actually have to support classes at all?
Comment #13
damiankloip CreditAttribution: damiankloip commentedI changed this supported interface to FileInterface, although implementing that interface will not guarantee that denormalizing code will work as you could spit out something totally different form the properties, it's pretty specific to the File class behaviour.
We should definitely support classes too. People should be able to target interfaces or specific classes that serialization works for.
Comment #14
dawehner#1892320: Add deserialize for JSON/AJAX moves stuff to the base class, so this patch is fine and in scope.
Comment #15
effulgentsia CreditAttribution: effulgentsia commentedI don't think that's what this patch does though. system_retrieve_file() called without a $destination saves to public://, not to temporary://. Should we add tests here for:
- deserializing without it being followed by a $entity->save()
- deserializing with it being followed by a $entity->save()
?
Also, what do we expect in terms of idempotence? If we post the same payload twice, seems like this patch would create two separate local copies of the file: is that what we want? In either case, seems like we should have a test for that as well.
Comment #16
linclark CreditAttribution: linclark commentedUnless I'm really mistaken, then Serializer should definitely not save to public. This could open a very easily exploitable security issue.
Comment #17
linclark CreditAttribution: linclark commentedComment #18
damiankloip CreditAttribution: damiankloip commentedYep, seems this should save files to tmp only, if anything. Ideally serializer should never save anything :-)
Comment #19
moshe weitzman CreditAttribution: moshe weitzman commentedIt looks like we just have to pass 'temporary://' as 2nd param to system_retrieve_file(). I would reroll but not on my pc at the moment.
Comment #20
damiankloip CreditAttribution: damiankloip commentedI think you're right Moshe. Here we go.
Comment #21
linclark CreditAttribution: linclark commentedThat makes sense. Let's make sure this doesn't get to RTBC without tests, though. I would also prefer we get security review before committing this since it got to RTBC with an easy to exploit hole before.
Comment #22
linclark CreditAttribution: linclark commentedIt looks like there is some potentially related work in #2128783: Files are not testable.
Comment #23
moshe weitzman CreditAttribution: moshe weitzman commented20: 1988426-20.patch queued for re-testing.
Comment #24
dawehnerMaybe we should explain what is going on here.
... we do have an interface now.
Comment #25
damiankloip CreditAttribution: damiankloip commentedOk, so I've added a test for this too.
There were are couple of issues with the current patch that certainly needed fixing, like 'Drupal\file\FileInterface' and not 'Drupal\file\Entity\FileInterface'. As well as adding a normalize() method to create a file url. Otherwise there is no way it can be consumed in denormalize by system_retrieve_file().
Comment #26
dawehnerNot needed is a bit odd, does that mean it is actually not expected to be there, or would it be just ignored?
Comment #27
damiankloip CreditAttribution: damiankloip commentedTrue, let's just remove that, it will be ignored if it's not needed, like any other properties.
Comment #28
linclark CreditAttribution: linclark commentedSo in the test,
Why are we using the filename druplicon.txt. I think this is kind of confusing since it isn't a druplicon.
So here, it seems we are showing that you can normalize a file entity and get the same data when you denormalize.
However, I think the functionality that we really want to test is what happens when you provide a URI which doesn't already have a corresponding file entity.
Comment #29
damiankloip CreditAttribution: damiankloip commentedCome on! :) It is not that confusing, plus this same pattern is used in a few other places.
Not just that, mainly that a temporary file is created.
Why does this matter? We shouldn't care about saving the actual file entity once it has been denormalized. I don't see any problems with already having this saved in that case? The uri provided by me in the test is the one we are using, as we need to generate a full uri like http://mytestsite.com/wherever/the/file/is anyway.
I can try to remove the saving. It will make it slightly harder to test and won't give us any benefits though :/ (at least I think not).
Comment #30
linclark CreditAttribution: linclark commentedFor example, what happens if you provide just the URI? Not the extra metadata that Drupal's file class provides for you (and which is then exported using normalize).
The main reason for implementing this functionality is services... making it possible for people to POST files. Are we saying that people should send all of the metadata when they post it? Or can they just post the URI?
If we are requiring certain metadata (such as the filemime) then we should be clear about which properties are required. We should have a test that ensures that it works when only those properties are submitted.
Comment #31
damiankloip CreditAttribution: damiankloip commentedOK, me and Lin spoke about this yesterday. It is a good plan to add tests for just a uri, as it could be a perfectly valid case that people will only have the file uri but want to post this data and create a file entity.
Comment #32
BerdirHah, I was going to suggest that we then need to make sure that File::preCreate() sets the default for mime time, but we already do that, nice :)
What's a bit inconsistent is that mimetime is set on preCreate and size on preSave but both are ensured to be set.
Comment #33
damiankloip CreditAttribution: damiankloip commentedYes, I was pleasantly surprised to find out I didn't really have to do very much here :)
Comment #34
moshe weitzman CreditAttribution: moshe weitzman commentedCode and new tests look good to go.
Comment #35
catchI wrote a longer comment but lost it. Short version, system_retrieve_file() doesn't look fit for purpose here - for example it dsm()s rather than logs errors. That might be the best function we have to use here, but at least could use a follow-up to make it a bit less rubbish?
Comment #36
BerdirLet's just rely Guzzle instead of using that function :)
See http://stackoverflow.com/questions/16939794/copy-remote-file-using-guzzle
Probably needs drupal_basename() call or something like that to set the target file name, but should be fairly easy other than that?
Or do we need to use some of our crazy wrappers to avoid problems with strange file system?
Comment #37
damiankloip CreditAttribution: damiankloip commentedAlright, alright :)
This changes it to inject the Guzzle client instead, and use that. This doesn't do any exception handling, I would guess we just want them to be thrown? Then you would know you HTTP request went sour pretty quick.
Comment #38
Berdir37: 1988426-37.patch queued for re-testing.
Comment #40
damiankloip CreditAttribution: damiankloip commentedRerolled after changes to hal service injected went in.
Comment #41
BerdirNot 100% sure if need the unmanaged file API, but it doesn't hurt.
Comment #42
catchThanks, that looks better. Committed/pushed to 8.x, thanks!