Use file_scan_directory to find first advagg file [#2973392]

The attached patch could probably be expanded to apply to the entire advagg_install_get_first_advagg_file function, but I wanted to keep it limited to first get your feedback.

advagg_install_get_first_advagg_file relies on the PHP scandir function to find files. I think a better approach would be to use file_scan_directory.

The file_scan_directory function supports remote wrappers, allows for a regex mask to retrieve specific files (css, js, gz, br, etc.), and automatically skips any hidden file starting with a period.

The patch adds a conditional for s3fs to build the $scanned_directory array using file_scan_directory. To keep it simple, the array is an exact match for the format of the current $scanned_directory.

For this to be a more thorough solution, the rest of the function should be reworked to use the default file_scan_directory array format. For example:

Array
(
    [public://advagg_js/js__-RcatjcMVenN_967GSANe2P8oIeNBGgC4UKzvpzlbLQ__VcPXp-Bs_pakqWd250XB33QSCDRceKeiaG2R06_vYXg__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js] => stdClass Object
        (
            [uri] => public://advagg_js/js__-RcatjcMVenN_967GSANe2P8oIeNBGgC4UKzvpzlbLQ__VcPXp-Bs_pakqWd250XB33QSCDRceKeiaG2R06_vYXg__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js
            [filename] => js__-RcatjcMVenN_967GSANe2P8oIeNBGgC4UKzvpzlbLQ__VcPXp-Bs_pakqWd250XB33QSCDRceKeiaG2R06_vYXg__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js
            [name] => js__-RcatjcMVenN_967GSANe2P8oIeNBGgC4UKzvpzlbLQ__VcPXp-Bs_pakqWd250XB33QSCDRceKeiaG2R06_vYXg__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw
        )

    [public://advagg_js/js__05vTYExjJRw2KKp2ZbSjdZ6BU2T5ryX2k9hjnQmB6rc__VIwUkVSWyPwIiPBsETa3yOspbnRI_1C_6UyiYmO0waM__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js] => stdClass Object
        (
            [uri] => public://advagg_js/js__05vTYExjJRw2KKp2ZbSjdZ6BU2T5ryX2k9hjnQmB6rc__VIwUkVSWyPwIiPBsETa3yOspbnRI_1C_6UyiYmO0waM__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js
            [filename] => js__05vTYExjJRw2KKp2ZbSjdZ6BU2T5ryX2k9hjnQmB6rc__VIwUkVSWyPwIiPBsETa3yOspbnRI_1C_6UyiYmO0waM__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw.js
            [name] => js__05vTYExjJRw2KKp2ZbSjdZ6BU2T5ryX2k9hjnQmB6rc__VIwUkVSWyPwIiPBsETa3yOspbnRI_1C_6UyiYmO0waM__BnaddM1d0_Vxx6gPx5Kv7Ke6Klx40k62BA3pbd0Tkpw
        )

Please review the attached patch and appreciate your feedback. Thanks.

Comment	File	Size	Author
#14	advagg-s3fs-2973392-14.patch	1.09 KB	mikeytown2
#9	advagg-use_file_scan_directory_for_first_file-2973392-9.patch	4.38 KB	sgdev
#5	advagg-use_file_scan_directory_for_first_file-2973392-5.patch	1.83 KB	sgdev
#3	advagg-use_file_scan_directory_for_s3-2973392-3.patch	1.14 KB	sgdev
	advagg-use_file_scan_directory_for_s3-1.patch	3.21 KB	sgdev

Comments

Comment #1

16 May 2018 at 18:18

ron_s created an issue. See original summary.

Comment #2

sgdev commented 16 May 2018 at 18:25

Status:

Active

» Needs review

Comment #3

sgdev commented 19 May 2018 at 20:59

Status	File	Size
new	advagg-use_file_scan_directory_for_s3-2973392-3.patch	1.14 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	advagg-use_file_scan_directory_for_s3-1.patch	3.21 KB

The previous version of this patch contains code that should be part of Issue #2972528:

Add consistent support for the s3fs no_rewrite_cssjs setting
https://www.drupal.org/project/advagg/issues/2972528

The new patch removes these items. Also, I don't believe this patch solves the core problems I have been seeing with AdvAgg. I am ok with closing this issue if you don't want to include this approach, or believe it does not add value.

Comment #4

mikeytown2 commented 30 May 2018 at 02:21

A better rewrite looks like a good idea here!

Comment #5

sgdev commented 31 May 2018 at 17:39

Status	File	Size
new	advagg-use_file_scan_directory_for_first_file-2973392-5.patch	1.83 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	advagg-use_file_scan_directory_for_s3-2973392-3.patch	1.14 KB

Attached is an updated patch for review, including a typo I made in the previous one.

I didn't do a full rewrite of the loop used to identify the file. All of the next() and prev() pointers were turning into more complex code than already exists.

Comment #6

sgdev commented 31 May 2018 at 17:45

Thinking about this a little more... is this code essentially attempting to find the first non-compressed JS or CSS file it can load?

If so, can all of the checking code be replaced with the right regex (with the exception of maybe checking when using s3fs and the file is compressed but no gz/br extension)?

Comment #7

sgdev commented 31 May 2018 at 18:08

Here is an idea... this regex would check for the filename being 91 characters or longer, and filter out any filename that does not follow the specific name format or have gz/br extensions. Obviously would replace "js" with $type to account for css too:

/^(?=.{91,}$)(js__(.*)__(.*)__(.*)\.js)$/

Could then check to see if s3fs + advagg_gzip/advagg_brotli is being used. Your thoughts?

Comment #8

mikeytown2 commented 31 May 2018 at 23:16

see advagg_match_file_pattern()

/**
 * Checks if the filename matches the advagg file pattern.
 *
 * @param string $filename
 *   Path to check.
 *
 * @return int
 *   Returns 1 if the pattern matches, 0 if it does not.
 */
function advagg_match_file_pattern($filename) {
  return preg_match('/.*(j|cs)s' . ADVAGG_SPACE . '[A-Za-z0-9-_]{43}' . ADVAGG_SPACE . '[A-Za-z0-9-_]{43}' . ADVAGG_SPACE . '[A-Za-z0-9-_]{43}\.(j|cs)s$/', $filename);
}

Comment #9

sgdev commented 3 June 2018 at 22:17

Title:

Use file_scan_directory to find first advagg file for s3fs

» Use file_scan_directory to find first advagg file

Status	File	Size
new	advagg-use_file_scan_directory_for_first_file-2973392-9.patch	4.38 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	advagg-use_file_scan_directory_for_first_file-2973392-5.patch	1.83 KB

Attached is an updated patch. Removed the checks for files ending in .gz and .br, since there won't be any due to using the regex. Same for filenames less than 91 characters.

One item where I'd like your opinion is the s3fs files. Right now, the advagg s3fs code generates compressed aggregates with css/js extensions, since S3 used to not accept compressed files. This should eventually be changed, since S3 can now receive gz/br extension files just like a local directory. There is no need to have separate code.

This is not something I have time to update right now. As a bypass, we could include a check for files in this situation to skip scanning. However, I found this leads to problems in attempting to find a first file for s3fs. If I include this, it causes errors for S3, but without it, it is able to access the files (even though they are technically compressed).

Here is the code I mocked up. Would be interested in your thoughts:

  // If using s3fs, and gzip or brotli (but not both) are enabled, the aggregate files are
  // compressed but do not include the .gz or .br extension. In this scenario, all CSS/JS files
  // returned from a file_scan_directory will be compressed, and the file scan should be skipped.
  //
  // @todo: S3 now supports files with .gz/.br extensions. Modify AdvAgg to create the same type
  // of aggregates for S3 as it does for a local directory.
  $scan_files = TRUE;
  if (module_exists('s3fs')) {
    if (variable_get('advagg_gzip', ADVAGG_GZIP) ||
      (variable_get('advagg_brotli', ADVAGG_BROTLI) &&
        function_exists('brotli_compress') && defined('BROTLI_TEXT'))) {
          $scan_files = FALSE;
    }
  }

  if ($scan_files) {
    ...

Comment #10

mikeytown2 commented 5 June 2018 at 21:25

How can you have s3 serve the .gz version of a file? I know CloudFront will do so but I haven't found anything with s3

Comment #11

sgdev commented 6 June 2018 at 00:19

Sorry, I should have been a bit more specific.

I was referring to serving from CloudFront or something similar, but you make a good point. There are those that might serve directly from S3.

I'm sure I am probably missing something, but if .gz files have public permissions in S3, they can be accessed via URL.

For nginx, would it be possible to use the gzip, gzip_types, and gzip_proxied directives (http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_proxied) for proxied requests?

Supposedly this also requires gzip_http_version and proxy_http_version to match.

Might be totally off base, but just thought I would mention.

Comment #12

mikeytown2 commented 6 June 2018 at 01:28

CloudFront supports origin pull so that will do gzip no problem.

Comment #13

sgdev commented 6 June 2018 at 06:01

Yes, that is my understanding as well. We are using CloudFront, but need S3 for file storage.

I was trying to understand if there was a way to store files in S3 with the gz extension, rather than requiring separate code to compress and then remove the extension, and then pull this with CloudFront.

I don't think it's worth trying to overanalyze this right now. The current code works just fine, and there could very well be other remote storage solutions that need the same type of process. Ignore my question in #9.

Comment #14

mikeytown2 commented 24 March 2021 at 23:16

Status	File	Size
new	advagg-s3fs-2973392-14.patch	1.09 KB

IGNORE; wrong thread

Comment #15

mikeytown2 commented 7 February 2022 at 18:27

Status:

Needs review

» Closed (works as designed)

Going to go ahead a close this out. Change back if there's more here.

Use file_scan_directory to find first advagg file

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Related issues

Referenced by

News items

Our community

Documentation

Drupal code base

Governance of community