Problem/Motivation

Currently File entities can not be effectively deserialized because the file which the URI points to does not get copied locally.

Proposed resolution

Temporarily, create a FileEntityNormalizer. Once File is an EntityNG entity, and its properties are handled as such, this could potentially be handled in a field normalizer instead.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

linclark’s picture

Here's a simple start. It assumes that the URI is an HTTP URI.

This opens a potential security exploit if a site's temp directory is publicly accessible, as was discussed in #1927648: Allow creation of file entities from binary data via REST requests. I'm not sure whether the Serializer needs to guard against this, or whether the site should give a warning (if it doesn't already) when a temp directory is publicly accessible.

linclark’s picture

Status: Active » Needs review
Dave Reid’s picture

+++ b/core/modules/hal/lib/Drupal/hal/Normalizer/FileEntityNormalizer.phpundefined
@@ -0,0 +1,44 @@
+    $file_data['uri'] = system_retrieve_file($data['uri']);
+

I think actually if you use system_retrieve_file($url, NULL, TRUE) it will do the file/entity saving for you.

This also assumes that all files would go in public directory, regardless of where they came from. Not sure how this would work for temporary or remote (YouTube or Amazon S3 files).

Regardless, this needs to account for a FALSE on entity save or HTTP failure.

linclark’s picture

We don't want the entity to be saved in the Serializer, but just want the file to be pulled in. My stance is that Serializer should alter state as little as possible, so I think saving the file to the temp directory is the desired behavior.

Dave Reid’s picture

Oh duh, I read entity_create() as entity_save().

I'm not sure how system_retrieve_file() would work for private files either.

benjifisher’s picture

linclark’s picture

berdir tells me that the File NG patch is getting close, #1818568: Convert files to the new Entity Field API. The patch might change based on this.

moshe weitzman’s picture

All entities are NG now. This is unblocked AFAICT. Any changes needed?

damiankloip’s picture

Component: serialization.module » hal.module
FileSize
2.54 KB
2.19 KB

I think the normalization from the EntityNormalizer gives us what we need there? Made a couple of other changes.

Status: Needs review » Needs work

The last submitted patch, 1988426-9.patch, failed testing.

damiankloip’s picture

Status: Needs work » Needs review
FileSize
1.18 KB
3.37 KB

Hmm, we will have to make sure that the class/interface exists in the check if we start adding normalizers for other core modules in hal.

So we could do this..

Alternatively, this normalizer could live in file module. But then it will have a soft dep on hal module which doesn't seem right at all.

dawehner’s picture

  1. +++ b/core/modules/hal/lib/Drupal/hal/Normalizer/FileEntityNormalizer.php
    @@ -0,0 +1,53 @@
    +   */
    +  protected $supportedInterfaceOrClass = 'Drupal\file\Entity\File';
    

    I guess we should rather go with the the FileInterface instead.

  2. +++ b/core/modules/hal/lib/Drupal/hal/Normalizer/NormalizerBase.php
    @@ -48,7 +48,7 @@ public function supportsNormalization($data, $format = NULL) {
    +    if (in_array($format, $this->formats) && (class_exists($this->supportedInterfaceOrClass) || interface_exists($this->supportedInterfaceOrClass))) {
    

    Do we actually have to support classes at all?

damiankloip’s picture

FileSize
614 bytes
3.38 KB

I changed this supported interface to FileInterface, although implementing that interface will not guarantee that denormalizing code will work as you could spit out something totally different form the properties, it's pretty specific to the File class behaviour.

We should definitely support classes too. People should be able to target interfaces or specific classes that serialization works for.

dawehner’s picture

Status: Needs review » Reviewed & tested by the community

#1892320: Add deserialize for JSON/AJAX moves stuff to the base class, so this patch is fine and in scope.

effulgentsia’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs tests

My stance is that Serializer should alter state as little as possible, so I think saving the file to the temp directory is the desired behavior.

I don't think that's what this patch does though. system_retrieve_file() called without a $destination saves to public://, not to temporary://. Should we add tests here for:
- deserializing without it being followed by a $entity->save()
- deserializing with it being followed by a $entity->save()
?

Also, what do we expect in terms of idempotence? If we post the same payload twice, seems like this patch would create two separate local copies of the file: is that what we want? In either case, seems like we should have a test for that as well.

linclark’s picture

Unless I'm really mistaken, then Serializer should definitely not save to public. This could open a very easily exploitable security issue.

linclark’s picture

Status: Needs review » Needs work
damiankloip’s picture

Yep, seems this should save files to tmp only, if anything. Ideally serializer should never save anything :-)

moshe weitzman’s picture

It looks like we just have to pass 'temporary://' as 2nd param to system_retrieve_file(). I would reroll but not on my pc at the moment.

damiankloip’s picture

Status: Needs work » Needs review
FileSize
810 bytes
3.4 KB

I think you're right Moshe. Here we go.

linclark’s picture

Issue tags: +Needs security review

That makes sense. Let's make sure this doesn't get to RTBC without tests, though. I would also prefer we get security review before committing this since it got to RTBC with an easy to exploit hole before.

linclark’s picture

It looks like there is some potentially related work in #2128783: Files are not testable.

moshe weitzman’s picture

20: 1988426-20.patch queued for re-testing.

dawehner’s picture

  1. +++ b/core/modules/hal/lib/Drupal/hal/Normalizer/FileEntityNormalizer.php
    @@ -0,0 +1,53 @@
    +    $file_data = $data;
    +    unset($file_data['_links']);
    +    $file_data['uri'] = system_retrieve_file($data['uri'][0]['value'], 'temporary://');
    +
    +    return $this->entityManager->getStorageController('file')->create($file_data);
    

    Maybe we should explain what is going on here.

  2. +++ b/core/modules/hal/lib/Drupal/hal/Normalizer/FileEntityNormalizer.php
    @@ -0,0 +1,53 @@
    +   * @param \Drupal\Core\Entity\EntityManager $entity_manager
    ...
    +   * @var \Drupal\Core\Entity\EntityManager
    ...
    +  public function __construct(EntityManager $entity_manager) {
    

    ... we do have an interface now.

damiankloip’s picture

Issue summary: View changes
FileSize
6.14 KB
5.13 KB

Ok, so I've added a test for this too.

There were are couple of issues with the current patch that certainly needed fixing, like 'Drupal\file\FileInterface' and not 'Drupal\file\Entity\FileInterface'. As well as adding a normalize() method to create a file url. Otherwise there is no way it can be consumed in denormalize by system_retrieve_file().

dawehner’s picture

+++ b/core/modules/hal/lib/Drupal/hal/Normalizer/FileEntityNormalizer.php
@@ -0,0 +1,64 @@
+    // Unset '_links' property, it's not needed for file entities.
+    unset($data['_links']);

Not needed is a bit odd, does that mean it is actually not expected to be there, or would it be just ignored?

damiankloip’s picture

Issue tags: -Needs tests
FileSize
6.04 KB
805 bytes

True, let's just remove that, it will be ignored if it's not needed, like any other properties.

linclark’s picture

So in the test,

  1. +++ b/core/modules/hal/lib/Drupal/hal/Tests/FileDenormalizeTest.php
    @@ -0,0 +1,65 @@
    +      'uri' => 'public://druplicon.txt',
    ...
    +      'filename' => 'druplicon.txt',
    

    Why are we using the filename druplicon.txt. I think this is kind of confusing since it isn't a druplicon.

  2. +++ b/core/modules/hal/lib/Drupal/hal/Tests/FileDenormalizeTest.php
    @@ -0,0 +1,65 @@
    +    $normalized_data = $serializer->normalize($file, 'hal_json');
    +    $denormalized = $serializer->denormalize($normalized_data, 'Drupal\file\Entity\File', 'hal_json');
    

    So here, it seems we are showing that you can normalize a file entity and get the same data when you denormalize.

    However, I think the functionality that we really want to test is what happens when you provide a URI which doesn't already have a corresponding file entity.

damiankloip’s picture

Why are we using the filename druplicon.txt. I think this is kind of confusing since it isn't a druplicon.

Come on! :) It is not that confusing, plus this same pattern is used in a few other places.

So here, it seems we are showing that you can normalize a file entity and get the same data when you denormalize.

Not just that, mainly that a temporary file is created.

However, I think the functionality that we really want to test is what happens when you provide a URI which doesn't already have a corresponding file entity.

Why does this matter? We shouldn't care about saving the actual file entity once it has been denormalized. I don't see any problems with already having this saved in that case? The uri provided by me in the test is the one we are using, as we need to generate a full uri like http://mytestsite.com/wherever/the/file/is anyway.

I can try to remove the saving. It will make it slightly harder to test and won't give us any benefits though :/ (at least I think not).

linclark’s picture

Why does this matter?

For example, what happens if you provide just the URI? Not the extra metadata that Drupal's file class provides for you (and which is then exported using normalize).

The main reason for implementing this functionality is services... making it possible for people to POST files. Are we saying that people should send all of the metadata when they post it? Or can they just post the URI?

If we are requiring certain metadata (such as the filemime) then we should be clear about which properties are required. We should have a test that ensures that it works when only those properties are submitted.

damiankloip’s picture

FileSize
6.97 KB
1.98 KB

OK, me and Lin spoke about this yesterday. It is a good plan to add tests for just a uri, as it could be a perfectly valid case that people will only have the file uri but want to post this data and create a file entity.

Berdir’s picture

Hah, I was going to suggest that we then need to make sure that File::preCreate() sets the default for mime time, but we already do that, nice :)

What's a bit inconsistent is that mimetime is set on preCreate and size on preSave but both are ensured to be set.

damiankloip’s picture

Yes, I was pleasantly surprised to find out I didn't really have to do very much here :)

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

Code and new tests look good to go.

catch’s picture

Status: Reviewed & tested by the community » Needs review
+++ b/core/modules/hal/lib/Drupal/hal/Normalizer/FileEntityNormalizer.php
@@ -0,0 +1,62 @@
+    $data['uri'] = system_retrieve_file($data['uri'][0]['value'], 'temporary://');

I wrote a longer comment but lost it. Short version, system_retrieve_file() doesn't look fit for purpose here - for example it dsm()s rather than logs errors. That might be the best function we have to use here, but at least could use a follow-up to make it a bit less rubbish?

Berdir’s picture

+++ b/core/modules/hal/lib/Drupal/hal/Normalizer/FileEntityNormalizer.php
@@ -0,0 +1,62 @@
+   */
+  public function denormalize($data, $class, $format = NULL, array $context = array()) {
+    $data['uri'] = system_retrieve_file($data['uri'][0]['value'], 'temporary://');
+

Let's just rely Guzzle instead of using that function :)

See http://stackoverflow.com/questions/16939794/copy-remote-file-using-guzzle

Probably needs drupal_basename() call or something like that to set the target file name, but should be fairly easy other than that?

Or do we need to use some of our crazy wrappers to avoid problems with strange file system?

damiankloip’s picture

FileSize
7.44 KB
2.39 KB

Alright, alright :)

This changes it to inject the Guzzle client instead, and use that. This doesn't do any exception handling, I would guess we just want them to be thrown? Then you would know you HTTP request went sour pretty quick.

Berdir’s picture

37: 1988426-37.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 37: 1988426-37.patch, failed testing.

damiankloip’s picture

Status: Needs work » Needs review
FileSize
7.66 KB

Rerolled after changes to hal service injected went in.

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

Not 100% sure if need the unmanaged file API, but it doesn't hurt.

catch’s picture

Status: Reviewed & tested by the community » Fixed

Thanks, that looks better. Committed/pushed to 8.x, thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.