Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Updated with #89.
Problem/Motivation
The file_scan_directory()
function includes a default nomask
pattern that is redundant. It excludes the CVS directory which is now long time history and therefore very unlikely to be used.
Proposed resolution
Remove the default nomask.
Remaining tasks
None
User interface changes
None.
API changes
It is possible that a specific contrib module that was relying on the default nomask
pattern will have to supply a custom one.
Comment | File | Size | Author |
---|---|---|---|
#90 | file_scan_directory-nomask-1165694-90.patch | 2.77 KB | Sutharsan |
#90 | interdiff-1165694-89-90.txt | 817 bytes | Sutharsan |
#89 | file_scan_directory-nomask-1165694-89.patch | 2.68 KB | Sutharsan |
#89 | interdiff-1165694-86-89.txt | 1.35 KB | Sutharsan |
#86 | file_scan_directory-nomask-1165694-86.patch | 2.67 KB | Sutharsan |
Comments
Comment #1
pillarsdotnet CreditAttribution: pillarsdotnet commentedPatch.
Comment #2
pillarsdotnet CreditAttribution: pillarsdotnet commentedComment #3
droplet CreditAttribution: droplet commented- need to change function doc as well
- consider remove CVS or also add .svn, .hg, the common version control folders..
Comment #4
pillarsdotnet CreditAttribution: pillarsdotnet commentedAdded a subset of the exclusions used by rsync in its cvs-exclude mode, and changed docs to match.
Comment #5
pillarsdotnet CreditAttribution: pillarsdotnet commentedComment #6
pillarsdotnet CreditAttribution: pillarsdotnet commented#4: file_scan_directory-nomask_defaults-1165694-4.patch queued for re-testing.
Comment #7
pillarsdotnet CreditAttribution: pillarsdotnet commented#4: file_scan_directory-nomask_defaults-1165694-4.patch queued for re-testing.
Comment #8
pillarsdotnet CreditAttribution: pillarsdotnet commented#4: file_scan_directory-nomask_defaults-1165694-4.patch queued for re-testing.
Comment #9
pillarsdotnet CreditAttribution: pillarsdotnet commented#4: file_scan_directory-nomask_defaults-1165694-4.patch queued for re-testing.
Comment #10
pillarsdotnet CreditAttribution: pillarsdotnet commented#4: file_scan_directory-nomask_defaults-1165694-4.patch queued for re-testing.
Comment #11
pillarsdotnet CreditAttribution: pillarsdotnet commented#4: file_scan_directory-nomask_defaults-1165694-4.patch queued for re-testing.
Comment #12
Aron NovakWhat about the infamous __MACOSX directory? Maybe it should be excluded as well.
Otherwise, the patch seems to be great.
Comment #13
pillarsdotnet CreditAttribution: pillarsdotnet commentedI fail to understand why a __MACOSX directory would be deliberately included within a drupal website, and if it were, why we would want to exclude it by default.
I originally got involved in this issue because the automatic scans of my
.git
folders were killing performance, and I was unwilling to delete them. It is my understanding that __MACOSX folders, if they exist, have only a few small files.Comment #14
Aron NovakThey can be copied to files directory by any user by accident. AFAIK some contrib modules suggest you to copy files/directories directly into /files/ for later processing - that's why it can appear in the filesystem.
Those folders can be big or not, not sure if it is false or not: "Some take as much as 30% of the file." - from:
http://superuser.com/questions/104500/what-is-macosx-folder
Personally i've seen such folders under drupal site's files folder, the question if they can grow big or not actually.
Why we should exclude it by default? Cause it's a metadata what would even not appear if the originating ZIP is extracted under macosx.
Comment #15
pillarsdotnet CreditAttribution: pillarsdotnet commentedWhatever. Nobody is going to RTBC this anyway.
Comment #16
pillarsdotnet CreditAttribution: pillarsdotnet commented#15: file_scan_directory-nomask_defaults-1165694-15.patch queued for re-testing.
Comment #17
Everett Zufelt CreditAttribution: Everett Zufelt commented+1 on the concept. Patch applies cleanly on 8.x.
Sadly not familiar enough w/ regex to give a +1 on implementation.
Comment seems a bit verbose, can we summarize what is included (cvs, git, bzr, temp, etc... and let those who care look at api.d.o?
Comment #18
pillarsdotnet CreditAttribution: pillarsdotnet commentedI suppose I should write some unit tests to prove that it works, such as creating a directory structure containing at least one of each excluded type, plus at least one of each commonly-included type, and test that a file_scan_directory() over the directory tree returns only the expected results.
That is exactly what the comments are intended to summarize.
And that is exactly where the comments would be published.
Really, I don't understand what, if anything, you are objecting to here. Can you provide a patch, or at least an annotated quote, to illustrate?
Comment #19
Everett Zufelt CreditAttribution: Everett Zufelt commented* The default includes the following:
* - Current, parent, CVS, RCS, SCCS, SVN, GIT and BZR directories. Temporary file prefixes and suffixes
Basically I don't see a need to document all the regex patterns in the doc block when they are in the code.
Comment #20
pillarsdotnet CreditAttribution: pillarsdotnet commentedFine. CNW until I get around to rolling another patch, as well as unit tests.
But honestly, I've kinda lost hope of this ever making it into core.
Comment #21
Everett Zufelt CreditAttribution: Everett Zufelt commentedI definitely think it should be in core, but maintainers only look at issues like this when crit / maj bugs and tasks are under their thresholds. * note I realize that this has been sitting around for a while.
With tests and the doc block change I would be happy to set RTBC.
Comment #22
pillarsdotnet CreditAttribution: pillarsdotnet commentedInteresting. I find that even with
$options = array('nomask' => '/^NOMASK$/')
, directories beginning with a leading'.'
are excluded fromfile_scan_directory()
results.Setting CNR for testbot anyway.
Comment #24
droplet CreditAttribution: droplet commentedThis line from file_scan_directory excluded '.' (All file start with DOT)
Comment #25
pillarsdotnet CreditAttribution: pillarsdotnet commentedThanks; dunno how I missed that. A little digging reveals that this feature was added to exclude
.svn
directories.See #91592: file_scan_directory() scans subversion working base (.svn) by default
Gonna upload two different strategies; y'all tell me which one looks better.
Comment #26
pillarsdotnet CreditAttribution: pillarsdotnet commentedCNR for bot.
Comment #27
pillarsdotnet CreditAttribution: pillarsdotnet commentedJust looked at those test results in detail. WTF is "dummy-remote" ??!!!!
Comment #28
pillarsdotnet CreditAttribution: pillarsdotnet commented(sigh)
Comment #30
pillarsdotnet CreditAttribution: pillarsdotnet commentedTrying again.
Comment #31
pillarsdotnet CreditAttribution: pillarsdotnet commentedPosted testbot bug report at http://drupal.org/node/1282144
Comment #33
pillarsdotnet CreditAttribution: pillarsdotnet commentedWhoops! The current and parent directories must *always* be excluded.
Comment #34
pillarsdotnet CreditAttribution: pillarsdotnet commentedBetter tests -- removed a couple of unnecessary asserts and drupal_realpath() calls.
Comment #35
pillarsdotnet CreditAttribution: pillarsdotnet commentedRemoving tag since tests are written and working.
Comment #36
droplet CreditAttribution: droplet commentedIt's few comments
It matched: xCVSx
Same as above, will match xRCSx..etc
mercurial is quick popular now. I remember the temp is .hg ?
(http://mercurial.selenic.com)
I'm not 100% sure if PHP will do it automatically. We don't need capture back here. so our regex can be (?:xxx), performance optimized. (a little bit increase only I think)
10 days to next Drupal core point release.
Comment #37
pillarsdotnet CreditAttribution: pillarsdotnet commented@droplet -- I see that you are reviewing the alllow patch rather than the exclude patch. Is that the version you prefer?
I disagree, and will add tests to prove it.
Thanks; will add this.
Yup; will do.
Comment #38
pillarsdotnet CreditAttribution: pillarsdotnet commentedComment #39
droplet CreditAttribution: droplet commentedahh. I can see it now. I missed this 2 => "^" "$".
It's quick hard to read.
can we rewrite it to:
I found an example. I think the code standard allowed.
http://api.drupal.org/api/drupal/includes--common.inc/function/filter_xss/8
*** I will try to figure out the different between 2 patch later, maybe tmr ***
Comment #40
pillarsdotnet CreditAttribution: pillarsdotnet commentedI think you mean "quite" hard to read?
Can do. Hopefully @sun will chime in and tell me whether I'm violating his coding standards.
I'm sorry.
.bzr
.darcs
.hg
.svn
Comment #42
pillarsdotnet CreditAttribution: pillarsdotnet commented#40: file_scan_directory-exclude_dot_dirs-1165694-40.patch queued for re-testing.
Comment #43
pillarsdotnet CreditAttribution: pillarsdotnet commented#40: file_scan_directory-include_dot_dirs-1165694-40.patch queued for re-testing.
Comment #45
droplet CreditAttribution: droplet commentedI voted Exclude all .dot dirs
Okay. One more problem I've seen.
These TWO can be optimized
suffixes
patch regex:
I suggested:
It will not attempt to match at character 0,1,2,3,4 to end.
e.g.
abcdefgh.rej
patch regex attempt to match:
a
ab
abc...
......
.............
abcdefgh.
abcdefgh.r
abcdefgh.re
abcdefgh.rej
(until find it or fail)
New one:
a (failed. and move to next)
(SKIP)
. (matched now)
.r
.re
.rej
prefixes
patch regex:
I suggested :
It will stop after matched the pattern.
OLD:
cvslog.F
cvslog.FI
cvslog.FIL
cvslog.FILE
NEW:
cvslog.F
(bingo !!)
I'm not explain very well but hope it make sense.
Well done @pillarsdotnet. Thanks!!
Oh. 04:17 am on my side. Time to bed :)
10 days to next Drupal core point release.
Comment #46
pillarsdotnet CreditAttribution: pillarsdotnet commented@droplet -- like this?
Comment #48
boombatower CreditAttribution: boombatower commentedsub.
mac specific ignore --
Comment #49
boombatower CreditAttribution: boombatower commentedcross-post
Comment #50
pillarsdotnet CreditAttribution: pillarsdotnet commentedFixed -- switching to extended regex requires that I backslash-escape the embedded '#'.
Comment #52
pillarsdotnet CreditAttribution: pillarsdotnet commented@droplet -- Took me a long time to figure out how to use your methods while combining prefixes, suffixes, and whole-name matches into a single regex,
Comment #52.0
pillarsdotnet CreditAttribution: pillarsdotnet commentedRevise using issue summary template.
Comment #53
droplet CreditAttribution: droplet commentedsee what testbots said.
Comment #54
pillarsdotnet CreditAttribution: pillarsdotnet commentedYup. Fixing indentatation...
Comment #54.0
pillarsdotnet CreditAttribution: pillarsdotnet commentedUpdated links to point to latest patch files.
Comment #54.1
pillarsdotnet CreditAttribution: pillarsdotnet commentedUpdate links to latest patch versions.
Comment #55
droplet CreditAttribution: droplet commented@pillarsdotnet
my one with a DOT at the end. I assume all prefixes will contain at least one more char ? e.g.: cvslog.a
Without it, im happy with the CODE part now :)
10 days to next Drupal core point release.
Comment #56
pillarsdotnet CreditAttribution: pillarsdotnet commentedAh. So "cvslog." should be included, while "cvslog.a" should be excluded, in your opinion?
I suppose -- Should I include that distinction in the tests?
Comment #57
droplet CreditAttribution: droplet commented@pillarsdotnet,
I rethink again. it do not necessarily.
Comment #58
pillarsdotnet CreditAttribution: pillarsdotnet commentedWell, I already rewrote the code and tests -- if they pass locally, I'm going to upload.
Comment #59
pillarsdotnet CreditAttribution: pillarsdotnet commentedTests pass locally, and the updated test code is both simpler and more robust, I think.
Comment #59.0
pillarsdotnet CreditAttribution: pillarsdotnet commentedSeparation of the two points of contention.
Comment #60
pillarsdotnet CreditAttribution: pillarsdotnet commentedWhitespace error.
Comment #60.0
pillarsdotnet CreditAttribution: pillarsdotnet commentedUpdated patch links and added two dot-prefixes.
Comment #61
pillarsdotnet CreditAttribution: pillarsdotnet commentedMoved to a separate test case. Since I can't get 'dummy-remote' to create files, there's no sense in adding this test to the "File API (remote)" section.
Comment #62
pillarsdotnet CreditAttribution: pillarsdotnet commentedBetter comments in the tests.
Comment #62.0
pillarsdotnet CreditAttribution: pillarsdotnet commentedUpdated patch links.
Comment #62.1
pillarsdotnet CreditAttribution: pillarsdotnet commentedUpdated patch links again.
Comment #63
Everett Zufelt CreditAttribution: Everett Zufelt commented+ * - 'nomask': The preg_match() regular expression for files to excluded.
files to be excluded.
Looks good to me other wise.
Comment #64
pillarsdotnet CreditAttribution: pillarsdotnet commentedFacepalm!
Comment #65
Everett Zufelt CreditAttribution: Everett Zufelt commentedThis is looking pretty good to me.
Comment #66
sunAlmost all of these look unnecessary to me. Users of file_scan_directory() are supposed to pass in an explicit mask that does not yield false-positive backup files in the first place. This means that .old, .bak, .orig, .rej files are not in collected anyway.
.git, .svn, and .whatever are excluded already now, since we do not recurse into hidden directories (starting with a dot).
The other prefixes and suffixes look like big assumptions to me, which don't necessarily hold water. And in combination with aforementioned point, should not be found by the scan in the first place.
Not sure whether it makes sense to introduce these RCS/LOG/SCCS/.adm cases at this point in time. 6 years ago, that might have been a good idea. But adding them today is pretty much pointless.
This addition to nomask is more or less the only in this patch that might make sense. But highly debatable on its own, as many would consider it a user mistake/error to upload these directories.
9 days to next Drupal core point release.
Comment #67
pillarsdotnet CreditAttribution: pillarsdotnet commentedOf course @sun is right, as always. Sorry I wasted everybody's time.
Comment #68
droplet CreditAttribution: droplet commented@pillarsdotnet,
No. :)
something I think we can do here:
- enable it to able recurse into dir start with DOT?
we can remove CVS now?
\.\. should be removed since a DOT is always exclued.
check $filename[0] != '.' first, better performance.
8 days to next Drupal core point release.
Comment #69
sunWhat's the use-case of doing that? Never heard of anyone wanting to recurse into hidden subdirectories in the past 5 years. Normally, hidden directories are hidden on purpose, not arbitrarily.
. is, but .. is not.
That assumes that there'd be more hidden files/directories than non-hidden, which I don't think is the case (except for, perhaps, in your user home directory). But of course, if you can prove that the other order is faster, then I'll happily shut up. ;)
Comment #70
pillarsdotnet CreditAttribution: pillarsdotnet commentedThe only thing else left to do is to remove the nomask pattern entirely.
Comment #71
droplet CreditAttribution: droplet commented$filename[0] != '.'
$filename = .
$filename = ..
$filename[0] will be . DOT, no?
at least this one can be rewrite from
\.\.?
to
\.\.
Comment #71.0
droplet CreditAttribution: droplet commentedMinor spelling/grammar correction.
Comment #73
pillarsdotnet CreditAttribution: pillarsdotnet commentedThe attached patch hardcodes the exclusion of dot-files, making the nomask option unnecessary.
If that is undesirable, I leave it up to you experts to debate which of the following is fastest:
(!preg_match('/^\.\.?/', $filename) && !preg_match($options['nomask'], $filename))
(!preg_match($options['nomask'], $filename) && !preg_match('/^\.\.?/', $filename))
($filename !== '.' && $filename !== '..' && !preg_match($options['nomask'], $filename))
(!preg_match($options['nomask'], $filename) && $filename !== '.' && $filename !== '..')
Comment #74
pillarsdotnet CreditAttribution: pillarsdotnet commentedComment #76
pillarsdotnet CreditAttribution: pillarsdotnet commentedFixed typo.
Comment #78
pillarsdotnet CreditAttribution: pillarsdotnet commentedOkay, have to insert an isset() check.
Comment #80
pillarsdotnet CreditAttribution: pillarsdotnet commented(sigh)
Comment #81
pillarsdotnet CreditAttribution: pillarsdotnet commentedMake docs match code.
Comment #82
MaxWesten CreditAttribution: MaxWesten commented#81 seems to work like a charm....
Site working with .svn much much faster.
Comment #83
sunIf this is about performance now, then we need benchmarks.
I can't believe this change makes a big difference. If it really does, then we might gain a similar improvement by front-anchoring the default nomask PCRE (with
'^'
).Comment #84
xjmComment #84.0
xjmEntire focus changed by direction of @sun.
Comment #85
areke CreditAttribution: areke commentedThe proposed patch no longer applies, so it will need to be re-rolled.
Comment #86
Sutharsan CreditAttribution: Sutharsan commentedRerolled patch from #81.
Since #1808200: Unit test performance significantly decreased since system list config conversion already changed the content of the default 'nomask' and introduced the more efficient filename checking using
$filename[0] !== '.'
, this issue is no longer about performance improvement. Removed the corresponding tags accordingly. The patch now only removes the default value of 'nomask'.Comment #87
sunLooks good to me now.
The code of file_scan_directory() was copied into Drupal\Core\SystemListing::scanDirectory(), but the offending default $nomask does not exist in that copy.
Comment #88
alexpottis the comparison operation
!==
necessary? I don't think it is. We're not using this inSystemListing::scanDirectory()
Also since the scope of the patch has changed we need an issue summary update.
Comment #89
Sutharsan CreditAttribution: Sutharsan commentedComment #90
Sutharsan CreditAttribution: Sutharsan commentedThe above reverts a change in the documentation of the function introduced in previous patches and changes 'period' into 'dot'. Further the comment of #88 is processed.
This one fixes a typo in the comment.
Comment #91
sunThanks! Summary appears to have been updated as well.
Comment #93
alexpottCommitted 9d5e802 and pushed to 8.x. Thanks!
Comment #94
xjmTagging "Missing change record", as completing the change record updates here blocks a beta: #2083415: [META] Write up all outstanding change notices before release
Comment #95
Gábor HojtsyAdded/published change record at https://drupal.org/node/2188553