The attached patch could probably be expanded to apply to the entire advagg_install_get_first_advagg_file function, but I wanted to keep it limited to first get your feedback.

advagg_install_get_first_advagg_file relies on the PHP scandir function to find files. I think a better approach would be to use file_scan_directory.

The file_scan_directory function supports remote wrappers, allows for a regex mask to retrieve specific files (css, js, gz, br, etc.), and automatically skips any hidden file starting with a period.

The patch adds a conditional for s3fs to build the $scanned_directory array using file_scan_directory. To keep it simple, the array is an exact match for the format of the current $scanned_directory.

For this to be a more thorough solution, the rest of the function should be reworked to use the default file_scan_directory array format. For example:

Array
(
    [public://advagg_js/js__-RcatjcMVenN_967GSANe2P8oIeNBGgC4UKzvpzlbLQ__VcPXp-Bs_pakqWd250XB33QSCDRceKeiaG2R06_vYXg__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js] => stdClass Object
        (
            [uri] => public://advagg_js/js__-RcatjcMVenN_967GSANe2P8oIeNBGgC4UKzvpzlbLQ__VcPXp-Bs_pakqWd250XB33QSCDRceKeiaG2R06_vYXg__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js
            [filename] => js__-RcatjcMVenN_967GSANe2P8oIeNBGgC4UKzvpzlbLQ__VcPXp-Bs_pakqWd250XB33QSCDRceKeiaG2R06_vYXg__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js
            [name] => js__-RcatjcMVenN_967GSANe2P8oIeNBGgC4UKzvpzlbLQ__VcPXp-Bs_pakqWd250XB33QSCDRceKeiaG2R06_vYXg__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw
        )

    [public://advagg_js/js__05vTYExjJRw2KKp2ZbSjdZ6BU2T5ryX2k9hjnQmB6rc__VIwUkVSWyPwIiPBsETa3yOspbnRI_1C_6UyiYmO0waM__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js] => stdClass Object
        (
            [uri] => public://advagg_js/js__05vTYExjJRw2KKp2ZbSjdZ6BU2T5ryX2k9hjnQmB6rc__VIwUkVSWyPwIiPBsETa3yOspbnRI_1C_6UyiYmO0waM__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js
            [filename] => js__05vTYExjJRw2KKp2ZbSjdZ6BU2T5ryX2k9hjnQmB6rc__VIwUkVSWyPwIiPBsETa3yOspbnRI_1C_6UyiYmO0waM__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js
            [name] => js__05vTYExjJRw2KKp2ZbSjdZ6BU2T5ryX2k9hjnQmB6rc__VIwUkVSWyPwIiPBsETa3yOspbnRI_1C_6UyiYmO0waM__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw
        )

Please review the attached patch and appreciate your feedback. Thanks.

Comments

ron_s created an issue. See original summary.

sgdev’s picture

Status: Active » Needs review
sgdev’s picture

The previous version of this patch contains code that should be part of Issue #2972528:

Add consistent support for the s3fs no_rewrite_cssjs setting
https://www.drupal.org/project/advagg/issues/2972528

The new patch removes these items. Also, I don't believe this patch solves the core problems I have been seeing with AdvAgg. I am ok with closing this issue if you don't want to include this approach, or believe it does not add value.

mikeytown2’s picture

A better rewrite looks like a good idea here!

sgdev’s picture

StatusFileSize
new1.83 KB

Attached is an updated patch for review, including a typo I made in the previous one.

I didn't do a full rewrite of the loop used to identify the file. All of the next() and prev() pointers were turning into more complex code than already exists.

sgdev’s picture

Thinking about this a little more... is this code essentially attempting to find the first non-compressed JS or CSS file it can load?

If so, can all of the checking code be replaced with the right regex (with the exception of maybe checking when using s3fs and the file is compressed but no gz/br extension)?

sgdev’s picture

Here is an idea... this regex would check for the filename being 91 characters or longer, and filter out any filename that does not follow the specific name format or have gz/br extensions. Obviously would replace "js" with $type to account for css too:

/^(?=.{91,}$)(js__(.*)__(.*)__(.*)\.js)$/

Could then check to see if s3fs + advagg_gzip/advagg_brotli is being used. Your thoughts?

mikeytown2’s picture

see advagg_match_file_pattern()

/**
 * Checks if the filename matches the advagg file pattern.
 *
 * @param string $filename
 *   Path to check.
 *
 * @return int
 *   Returns 1 if the pattern matches, 0 if it does not.
 */
function advagg_match_file_pattern($filename) {
  return preg_match('/.*(j|cs)s' . ADVAGG_SPACE . '[A-Za-z0-9-_]{43}' . ADVAGG_SPACE . '[A-Za-z0-9-_]{43}' . ADVAGG_SPACE . '[A-Za-z0-9-_]{43}\.(j|cs)s$/', $filename);
}
sgdev’s picture

Title: Use file_scan_directory to find first advagg file for s3fs » Use file_scan_directory to find first advagg file
StatusFileSize
new4.38 KB

Attached is an updated patch. Removed the checks for files ending in .gz and .br, since there won't be any due to using the regex. Same for filenames less than 91 characters.

One item where I'd like your opinion is the s3fs files. Right now, the advagg s3fs code generates compressed aggregates with css/js extensions, since S3 used to not accept compressed files. This should eventually be changed, since S3 can now receive gz/br extension files just like a local directory. There is no need to have separate code.

This is not something I have time to update right now. As a bypass, we could include a check for files in this situation to skip scanning. However, I found this leads to problems in attempting to find a first file for s3fs. If I include this, it causes errors for S3, but without it, it is able to access the files (even though they are technically compressed).

Here is the code I mocked up. Would be interested in your thoughts:

  // If using s3fs, and gzip or brotli (but not both) are enabled, the aggregate files are
  // compressed but do not include the .gz or .br extension. In this scenario, all CSS/JS files
  // returned from a file_scan_directory will be compressed, and the file scan should be skipped.
  //
  // @todo: S3 now supports files with .gz/.br extensions. Modify AdvAgg to create the same type
  // of aggregates for S3 as it does for a local directory.
  $scan_files = TRUE;
  if (module_exists('s3fs')) {
    if (variable_get('advagg_gzip', ADVAGG_GZIP) ||
      (variable_get('advagg_brotli', ADVAGG_BROTLI) &&
        function_exists('brotli_compress') && defined('BROTLI_TEXT'))) {
          $scan_files = FALSE;
    }
  }

  if ($scan_files) {
    ...
mikeytown2’s picture

How can you have s3 serve the .gz version of a file? I know CloudFront will do so but I haven't found anything with s3

sgdev’s picture

Sorry, I should have been a bit more specific.

I was referring to serving from CloudFront or something similar, but you make a good point. There are those that might serve directly from S3.

I'm sure I am probably missing something, but if .gz files have public permissions in S3, they can be accessed via URL.

For nginx, would it be possible to use the gzip, gzip_types, and gzip_proxied directives (http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_proxied) for proxied requests?

Supposedly this also requires gzip_http_version and proxy_http_version to match.

Might be totally off base, but just thought I would mention.

mikeytown2’s picture

CloudFront supports origin pull so that will do gzip no problem.

sgdev’s picture

Yes, that is my understanding as well. We are using CloudFront, but need S3 for file storage.

I was trying to understand if there was a way to store files in S3 with the gz extension, rather than requiring separate code to compress and then remove the extension, and then pull this with CloudFront.

I don't think it's worth trying to overanalyze this right now. The current code works just fine, and there could very well be other remote storage solutions that need the same type of process. Ignore my question in #9.

mikeytown2’s picture

StatusFileSize
new1.09 KB

IGNORE; wrong thread

mikeytown2’s picture

Status: Needs review » Closed (works as designed)

Going to go ahead a close this out. Change back if there's more here.