Problem/Motivation

Using S3:// schema, files are stored as private on AWS requiring a signed URL to access (avoid bucket discovery), the intent is to keep the timeout short in case permission to access the file is revoked.

Page cache of the getExternalURL() response causes the URL not to be regenerated after the presign timeout causing the S3 request to be denied as expired.

Drupal 9.0.8 for testing. 8.x-3.0-alpha16 with the validScheme patch.

Steps to reproduce

$settings['s3fs.upload_as_private'] = TRUE;
Presigned URLs "60|.*"

Assign a file upload to S3 on a content type.
Upload a new file and publish the content
Visit the node and click the link. This should work in most cases (unless slow to click the link)
Wait 60 seconds for the presign to expire and reload the node,
Node will generate using the same URL and presign as previously which will now return "Access Denied Request has expired" from S3.

If the dynamic_page_cache is disabled S3FS can generate a new presign each page load.

Proposed resolution

1) Publish an expire time for the getExternalURL() function (if possible) that populates up to cause the cache to invalidate.
(Not sure this can be done, I know this can be set at the render level but from this deep I'm not aware of a method)

2) Route all requests for presigned URL's through a route/controller callback similar to how Image Style generation is first done so that the call can be redirected with updated presign signature.

Remaining tasks

User interface changes

API changes

Data model changes

Comments

cmlara created an issue. See original summary.

cmlara’s picture

Status: Closed (outdated) » Active

Accidentally opened as a closed report, correcting.

cmlara’s picture

Version: 8.x-3.0-alpha16 » 8.x-3.x-dev

#2352009: Bubbling of elements' max-age to the page's headers and the page cache likely relevant. Even if there were a method to set how long the link should be cached core appears to have an issue with bubbling it to the top. I see jansete commented in that thread so this might be a duplicate ticket. I will keep my eyes open and adjust should I come across a parent issue.

Cache concerns also raised in #3007043: Image styles not generated due to page cache that may need to be considered if a controller is used.

cmlara’s picture

It looks like that we may need #2867355: file_create_url() causes LogicException when used with certain stream wrappers or #2669074: Convert file_create_url() & file_url_transform_relative() to service, deprecate it to accomplish this.

Option of creating a controller isn't viable due to security concerns (no good way to know its safe to allow access to the file).

This leads us either

A) Bubbling the metadata up, which isn't viable yet without the above issue(s) being committed to core.
B) Accepting this is 'Works as Designed' related to the site administrator (in this case myself) failing to match cache limits with presign timeouts.

Setting as postponed on core as it would be nice to bubble this up, but for now it appears this needs to be a site configuration issue.

helioha’s picture

Is there a workaround for this issue other than turning off page cahe?

cmlara’s picture

Unfortunately there is no real good method around this at all with the Internal Page Cache (IPC) as it 100% ignores max-age, see #2352009: Bubbling of elements' max-age to the page's headers and the page cache for more details. The IPC being enabled is basically a no-go.

For the Internal Dynamic Page Cache (DPC) #2669074: Convert file_create_url() & file_url_transform_relative() to service, deprecate it wasn't the answer to all our problems that I thought it would be. I haven't looked in depth since that issue landed to see alternative paths.

A quick search reveals it may be worth looking into an OutboundPathProcessor to add the metadata as noted in https://www.drupal.org/node/2480761 however I believe many locations may use FALSE on toString() which renders this solution non viable. We could try adding cache bubbling inside a FileUrlGenerator service decorator however I'm not sure that will work either (but worth a shot).

If neither of those works, I have mocked up (however it isn't production ready) some code to use #lazy_builder's. This requires that the content only be in areas that are for logged in users (so the DPC is used) or for the IPC to be disabled. #lazy_builder's involve rendering the (majority) of the page and storing it in the DPC however at the 'last minute' a token placeholder in the page is replaced via a callback function with a freshly generated link before being sent. Potentially this could be worked in to have some level of caching of the page rather than using the DPC for every hit for anonymous users however I never got that far in mockups. For reference see: https://www.drupal.org/node/2498803. If you are interested I could likely pull it out of my dev lab and post it, however it is very rough.

kingdutch’s picture

Title: S3 Access Denied Request has expired - getExternalUrl() cached beyond presign timeout » [PP-1] S3 Access Denied Request has expired - getExternalUrl() cached beyond presign timeout
Related issues: +#3358113: [Meta] Allow StreamWrapper's to provide cacheability metadata
grasmash’s picture

cmlara’s picture

RE:CCO

I do not believe so.

Firstly because there is currently no ability to even provide metadata for caching from a StreamWrapper, meaning there is nothing to bubble up to CCO.

Secondly:
IIRC CCO has known problems that can only be solved by Core fixing its metadata handling problems

grasmash’s picture

I found a workaround for this that is good enough for my own needs. Basically, you need to user alter hooks to add cachebility metadata at the entity level. In my case, I wanted to add a max-age to a node when it had a file field with a presigned url that was set to expire.

This could be simplified. I had two different fields I had to take into account. Posting for posterity:

/**
 * Implements hook_node_load().
 */
function mymodule_node_load($entities) {
  $map = [
    // node-type => s3 key.
    'song' => [
      's3_key' => 'songs',
      // We only override the max-age if one of the file fields in actually populated.
      'required_fields' => ['field_song_audio', 'field_song_archive'],
    ],
    'firmware_release' => [
      's3_key' => 'firmware-releases',
      'required_fields' => ['field_firmware_archive'],
    ]
  ];

  foreach ($entities as $entity) {
    if ($entity->getEntityTypeId() == 'node' && in_array($entity->getType(), array_keys($map))) {
      if (!shouldOverrideCacheHeaders($map, $entity)) {
        continue;
      }

      $config = \Drupal::config('s3fs.settings');
      if ($config->get('presigned_urls')) {
        $presigned_urls = getPresignedUrls($config->get('presigned_urls'));
        $s3_key = $map[$entity->getType()]['s3_key'];
        foreach ($presigned_urls as $blob => $timeout) {
          if (preg_match("^$blob^", $s3_key)) {
            $cachebility_metadata = new CacheableMetadata();
            $cachebility_metadata->setCacheMaxAge((int) $timeout);
            $entity->addCacheableDependency($cachebility_metadata);
            break;
          }
        }
      }
    }
  }
}

/**
 * @param $map
 * @param $entity
 *
 * @return bool
 */
function shouldOverrideCacheHeaders($map, $entity): bool {
  foreach ($map[$entity->getType()]['required_fields'] as $field_name) {
    if (!$entity->{$field_name}->isEmpty()) {
      return TRUE;
    }
  }

  return FALSE;
}

/**
 * @param string $presigned_urls_config
 *
 * @return array
 * @see \Drupal\s3fs\StreamWrapper\S3fsStream->__construct())
 */
function getPresignedUrls(string $presigned_urls_config): array {
  $presigned_urls = [];
  foreach (explode(PHP_EOL, $presigned_urls_config) as $line) {
    $blob = trim($line);
    if ($blob) {
      if (preg_match('/(.*)\|(.*)/', $blob, $matches)) {
        $blob = $matches[2];
        $timeout = $matches[1];
        $presigned_urls[$blob] = $timeout;
      }
      else {
        $presigned_urls[$blob] = 60;
      }
    }
  }
  return $presigned_urls;
}