Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
http://symfony.com/doc/2.0/components/finder.html
Symfony2 provides a library that handles various issues that deal with files including:
- getting the path of a file
- retrieving file metadata
- handle custom stream wrappers
We currently have a mature handling of files that is improving by the day with the effort to make files to be entities. Perhaps adopting this library will help that effort. Perhaps it won't. We should find out.
Comment | File | Size | Author |
---|---|---|---|
#25 | 1451320-25.patch | 5.23 KB | claudiu.cristea |
Comments
Comment #1
Crell CreditAttribution: Crell commentedI don't think Symfony's Finder system is really a replacement for File Entities. It works at a lower level, and is mostly about file system traversal and manipulation. They're not directly comparable.
Comment #2
Damien Tournoud CreditAttribution: Damien Tournoud commentedIf I read the OP properly, the point is about replacing some of our lower-level helper functions (drupal_mkdir(), drupal_chmod(), drupal_dirname(), etc. and possibly some of the methods of DrupalStreamWrapperInterface). We might have to contribute some stuff back (I'm not sure Symfony has our streamwrapper-friendly chmod, for example).
Comment #3
Crell CreditAttribution: Crell commentedDamien: Yes, I was clarifying the OP's statement that this is relating to files becoming entities. I don't think it is. It's a good thing to consider all on its own. :-)
I don't think anyone would object to us contributing back upstream, but we should do it soon as I think 2.1 is supposed to hit stable well before Drupal 8 does.
Comment #4
brianV CreditAttribution: brianV commentedI just read through the Finder component documentation, and I fail to see how it applies to our current file system handling.
It is, as the name implies, primarily built around finding files in a filesystem based on set criteria:
1. You can specify in which directories to search for files, which to exclude, and whether to search recursively.
2. Specify whether the returned list contains is files, directories, or files and directories.
3. How the resulting list should be sorted.
4. Filter results to match or exclude entries based on name, filesize, date, or a few other criteria.
Since the paths to which we've saved files are stored in the database, we never really have the use case in which we are searching the drive (or appropriate stream) for files. This *could* make for some interesting additional functionality at some point, but doesn't appear to offer much to simplify existing file handling.
That is, it has no support for moving, renaming, copying, chmodding etc. files.
Comment #5
pounardThere is some features such as the drupal_scan_files() or whatever high level functions that could use such high level API, but even before looking at Symfony for this I'd first look at SPL, with classes such as DirectoryIterator or RecursiveDirectoryIterator which are supposedly (when we look at various benchmarks) highly faster than
glob()
,readdir()
and friends, which can be used with all kind of FilterIterator implementation such as the RegexIterator which can be used upon the SplFileInfo class.EDIT: After a short reading of Finder component code, it uses all of that and seems to do some work about stream wrappers, it's probably a good idea to use it.
Comment #6
ardas CreditAttribution: ardas commentedThats for sure!
Since we all want to move towards Symfony, we would like to see their Finder Component inside Drupal... At least Library API module can use it to traverse 'libraries' directory to gather libraries.
I can say that file read and write operations - is one of the slowest things in Drupal (right after amount of SELECTs and loading ALL modules on bootstrap FULL stage).
Comment #7
sunJust happened to have a chance to have a deeper look into Finder for some other issue.
I took the most common use-case we have in core (
drupal_system_listing()
), extracted the effectivefile_scan_directory()
arguments of it, and compared that to Finder. The results are not in favor of Finder:Effective bench code:
It's also noteworthy that Finder is not really flexible/customizable. E.g., we'd typically pass the
FilesystemIterator::UNIX_PATHS | FilesystemIterator::SKIP_DOTS
flags to skip hidden files and achieve platform-agnostic filepaths, but Finder doesn't allow to customize the $flags currently.Comment #8
Crell CreditAttribution: Crell commentedIs there anything we could legally push upstream to improve the Finder component to make it more compelling for us, and thus reduce the total amount of code in the world?
Comment #9
sunBased on my tonight's investigation, Finder would have to be completely rewritten from scratch, in order to leverage RecursiveDirectoryIterator instead of DirectoryIterator, and likewise, re-implementing all filters as RecursiveFilterIterator instead of FilterIterator.
Essentially, Finder is running into the same trap like a gazillion of PHP code snippets I found on the net:
However, to perform filtering before recursing until the end of the world, the RecursiveDirectoryIterator has to be wrapped with a RecursiveFilterIterator, before the RecursiveIteratorIterator is invoked.
E.g., like this:
A pure RecursiveDirectoryIterator implementation with proper RecursiveFilterIterators is able to cut down the total time approx. by half on my machine, but that is still 3x times slower than
file_scan_directory()
.See also: #1833732: Find a way to skip ./config directories without skipping core/modules/config directory in drupal_system_listing()
Comment #10
pounardDon't forget the RegexIterator class too, which can be used for filtering by pattern. It should be tested in place of the SystemListRecursiveFilterIterator. And you should check performances of SystemListRecursiveFilterIterator too.
Comment #11
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe point of using Symfony components is not and has never been performance. The amount of indirection everywhere in Symfony is going to slow down Drupal 8 by orders of magnitude. This is by design.
Scanning the filesystem being a relatively infrequent operation anyway, could we just decide we don't care?
Comment #12
pounardI agree with Damien about this one, the finder looks good for us. But even without the finder we still need to consider using the SPL right, which could drastically reduce Drupal system listing code to a 3 iterators objects instanciation and a simple foreach.
A patch written by chx is actually doing that for bootstrap/kernel stuff
, I don't remember which one exactly. EDIT: See #1831350: Break the circular dependency between bootstrap container and kernel and https://drupal.org/files/1831350_22.patchComment #13
sunAs long as Finder uses iterators instead of recursive iterators, it cannot be considered for core. 5 times slower is not acceptable.
I asked upstream whether there are any plans to convert it to recursive iterators. @fabpot didn't object to it, but someone would have to perform the conversion (which isn't particularly trivial). I'm also not sure whether the switch to recursive iterators wouldn't demand for a changed iterator architecture in Finder — i.e., I think a lot of investigation and architectural design work is needed there. Given the remaining time we have until D8 feature freeze, it rather appears unlikely to be able to 1) fix the library upstream, and 2) get it ready for core inclusion afterwards.
Filesystem scans are not as rare as you might think. Any performance decrease there significantly slows down the installer, update.php, and from my perspective most importantly, tests. Drupal also has to perform filesystem scans in case all caches are empty — the slower the scan is, the higher the chance for race conditions and parallel requests getting (b)locked. The performance impact is measurable and visible on all fronts, both for users and developers.
E.g., only just recently, we had to tweak the existing scan functions, so as to get the performance of unit tests back under control.
I certainly know of RecursiveRegexFilter and I tested it in my early benchmarks — it performed very slow. I think it only makes sense to use that filter when subclassing it or when using it within a stack of other filters.
Comment #14
pounardDid you benchmark the GlobIterator?
I did use RecursiveWhatnotIterator and RegexIterator a lot, for parsing a huge volume of XML and HTML files (converting static site to a CMS content) and I never experienced any performance issues. Actually, in most case, those iterators runtime was so small compared to whatever I had to do arround that it was insignificant to me.
I'd be curious to know in which conditions you tested those iterators, which filesystem, which kind of harddrive, and on which OS.
From there http://stackoverflow.com/questions/11652481/php-fast-recursive-directory... the post actually bencmarks 30 secondes for reading 60,000 files/subdir, I guess this is a lot slower than the hardcore more C-like version, but I don't think we're ever gonna parse 60,000 files/subdir in Drupal. I'd still like to see more benchmarks before saying this is not acceptable.
Parsing files, in most cases especially during a normal runtime, is not acceptable, when we're dealing with modules finding for example we don't really care about performances because it's a pure administrative task that we are not supposed to ever do during normal runtime.
Even thought Simpletest sounds like an edge case we're never gonna encounter in such volumetry elsewhere in core, I'd be happy to use the slower iterators everywhere and make exceptions in cases such as Simpletest.
EDIT: And final note, trying to benchmark the iterators with different flags (for example not returning SplFileInfo objects, but just filename instead) might also change benchmark results.
Comment #15
bzitzow CreditAttribution: bzitzow commentedThis is an interesting thread. Did the conversation continue elsewhere?
Comment #18
phenaproximaIt may be time to re-open this discussion, because from a cursory look at the Finder component's main class, it appears they are now using recursive iterators.
Strictly from a DX standpoint, I am very in favour of Drupal using the Finder component. But I recognize the far-reaching performance implications of such a change, so clearly it should be carefully researched. I'm not a performance guy, but I'd be willing to help in any way I could so that we could get a clear answer on whether this is something we would want to include in core.
Comment #19
stefan.r CreditAttribution: stefan.r commentedComment #22
andypostComment #23
andypostComment #25
claudiu.cristeaHere's a benchmark I ran with the script from the patch. Is seems that the Symfony finder performance has improved dramatically. However, the Drupal scanner is still better. But a discussion would be necessary as the Finder is a modern library:
Comment #26
dawehner@claudiu.cristea
What would happen if we would move
file_scan_directory
into a nicer to use component? Conceptually I don't see a reason not to be able to provider a better developer experience while keeping the current limitation and optimisations up.Comment #28
kim.pepperCreated #3035312: Move file_scan_directory() to file_system service
Comment #30
andypostFinder also added in #2242947: Integrate Symfony Console component to natively support command line operations
Comment #31
andypostOne more issue needs finder
Comment #32
andypostComment #38
andypostOne more usage gonna be added in #3272110: Drupal 9 and 10's Drupal\Component composer.json files are totally out of date