http://symfony.com/doc/2.0/components/finder.html

Symfony2 provides a library that handles various file-related tasks, including:

  • getting the path of a file
  • retrieving file metadata
  • handling custom stream wrappers

We currently have mature file handling that is improving by the day, thanks to the effort to make files entities. Perhaps adopting this library will help that effort; perhaps it won't. We should find out.

Comment | File | Size | Author
#25 | 1451320-25.patch | 5.23 KB | claudiu.cristea

Comments

Crell

Issue tags: +symfony

I don't think Symfony's Finder system is really a replacement for File Entities. It works at a lower level, and is mostly about file system traversal and manipulation. They're not directly comparable.

Damien Tournoud

If I read the OP properly, the point is about replacing some of our lower-level helper functions (drupal_mkdir(), drupal_chmod(), drupal_dirname(), etc. and possibly some of the methods of DrupalStreamWrapperInterface). We might have to contribute some stuff back (I'm not sure Symfony has our streamwrapper-friendly chmod, for example).

Crell

Damien: Yes, I was clarifying the OP's statement that this relates to files becoming entities. I don't think it does. It's a good thing to consider on its own. :-)

I don't think anyone would object to us contributing back upstream, but we should do it soon, as I think Symfony 2.1 is supposed to become stable well before Drupal 8 does.

brianV

I just read through the Finder component documentation, and I fail to see how it applies to our current file system handling.

It is, as the name implies, primarily built around finding files in a filesystem based on set criteria:

1. Specify which directories to search for files, which to exclude, and whether to search recursively.
2. Specify whether the returned list contains files, directories, or both.
3. Specify how the resulting list should be sorted.
4. Filter results to match or exclude entries based on name, file size, date, or a few other criteria.

Since the paths to which we've saved files are stored in the database, we never really have the use case in which we are searching the drive (or appropriate stream) for files. This *could* make for some interesting additional functionality at some point, but doesn't appear to offer much to simplify existing file handling.

That is, it has no support for moving, renaming, copying, or chmodding files.

pounard

There are some features, such as drupal_scan_files() or whatever the high-level functions are called, that could use such a high-level API. But even before looking at Symfony for this, I'd first look at SPL, with classes such as DirectoryIterator or RecursiveDirectoryIterator, which are supposedly (judging by various benchmarks) much faster than glob(), readdir() and friends. They can be combined with all kinds of FilterIterator implementations, such as RegexIterator, which can operate on SplFileInfo objects.

EDIT: After a quick read of the Finder component code: it uses all of that and seems to do some work on stream wrappers, so it's probably a good idea to use it.

ardas

That's for sure!

Since we all want to move towards Symfony, we would like to see its Finder component in Drupal. At the very least, the Library API module could use it to traverse the 'libraries' directory and gather libraries.

I can say that file read and write operations are among the slowest things in Drupal (right after the number of SELECT queries and loading ALL modules during the full bootstrap phase).

sun

I just had a chance to take a deeper look at Finder for another issue.

I took the most common use case we have in core (drupal_system_listing()), extracted the effective file_scan_directory() arguments it uses, and compared that to Finder. The results do not favor Finder:

$ php bench.drupal.system-listing.php
ref: refs/heads/8.x
Peak memory before test: 2,912.03 KB
Iterations: 10

nothing:                      0 seconds
function no_op():             0 seconds
file_scan_directory():   2.6179 seconds
Finder:                 12.2466 seconds

Peak memory after test: 3,401.31 KB
Memory difference: +489.28 KB

Effective bench code:

$dir = 'core/modules';
$mask = '/^' . DRUPAL_PHP_FUNCTION_PATTERN . '\.module$/';

// file_scan_directory():
$files = file_scan_directory($dir, $mask, array(
  'key' => 'name',
  'min_depth' => 1,
  'nomask' => '/^(CVS|lib|templates|css|js)$/',
));

// Symfony Finder:
$finder = new Symfony\Component\Finder\Finder();
$finder
  ->files()
  // Finder counts files directly inside $dir as depth 0, so '> 1' skips
  // depth-1 files such as core/modules/node/node.module; '>= 1' would be
  // the equivalent of 'min_depth' => 1 above.
  ->depth('> 1')
  ->name($mask)
  ->exclude(array('lib', 'templates', 'js', 'css', 'config'))
  ->in($dir);
$files = array();
foreach ($finder as $file) {
  $file->uri = $file->getPathname();
  $file->filename = $file->getFilename();
  $file->name = pathinfo($file->filename, PATHINFO_FILENAME);
  $files[$file->name] = $file;
}
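The harness that produced the numbers above isn't included in the issue. A minimal timing loop in the same spirit might look like the following; example_bench() and the case labels are hypothetical, and wall-clock timing via microtime(TRUE) is an assumption about how the original was measured.

```php
<?php
/**
 * Hypothetical timing harness: runs each callable $iterations times and
 * returns the total wall-clock seconds per label.
 */
function example_bench(array $cases, $iterations) {
  $results = array();
  foreach ($cases as $label => $callable) {
    $start = microtime(TRUE);
    for ($i = 0; $i < $iterations; $i++) {
      $callable();
    }
    $results[$label] = microtime(TRUE) - $start;
  }
  return $results;
}

function no_op() {
}

$results = example_bench(array(
  'nothing' => function () {},
  'function no_op()' => 'no_op',
  // Real cases would wrap the file_scan_directory() and Finder code above
  // in closures here.
), 10);

foreach ($results as $label => $seconds) {
  printf("%-24s %.4f seconds\n", $label . ':', $seconds);
}
printf("Peak memory: %s KB\n", number_format(memory_get_peak_usage() / 1024, 2));
```

Running each case several times and comparing totals keeps one-off filesystem cache effects from dominating a single measurement.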

It's also worth noting that Finder is not very flexible or customizable. For example, we'd typically pass the FilesystemIterator::UNIX_PATHS | FilesystemIterator::SKIP_DOTS flags to skip the "." and ".." entries and get platform-agnostic file paths, but Finder currently doesn't allow customizing $flags.
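A quick way to check what those two flags actually do: the sketch below builds a throwaway directory under sys_get_temp_dir() (created only for illustration) and shows that SKIP_DOTS removes the "." and ".." entries, while a dot-file such as .hidden is still returned and needs its own filter.

```php
<?php
// Throwaway fixture directory (illustration only).
$dir = sys_get_temp_dir() . '/flags_demo_' . uniqid();
mkdir($dir);
touch("$dir/visible.txt");
touch("$dir/.hidden");

$flags = FilesystemIterator::UNIX_PATHS | FilesystemIterator::SKIP_DOTS;
$names = array();
// Iterating the RecursiveDirectoryIterator directly lists only the top level.
foreach (new RecursiveDirectoryIterator($dir, $flags) as $file) {
  $names[] = $file->getFilename();
}
sort($names);

// SKIP_DOTS removed "." and "..", but ".hidden" is still listed.
echo implode(', ', $names), PHP_EOL; // prints ".hidden, visible.txt"

// Clean up the fixture.
unlink("$dir/visible.txt");
unlink("$dir/.hidden");
rmdir($dir);
```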

Crell

Is there anything we could legally push upstream to improve the Finder component to make it more compelling for us, and thus reduce the total amount of code in the world?

sun

Based on my investigation tonight, Finder would have to be completely rewritten from scratch to leverage RecursiveDirectoryIterator instead of DirectoryIterator and, likewise, to re-implement all filters as RecursiveFilterIterator instead of FilterIterator.

Essentially, Finder falls into the same trap as a gazillion PHP code snippets I found on the net:

        $iterator = new \RecursiveIteratorIterator(
            new Iterator\RecursiveDirectoryIterator($dir, $flags),
            \RecursiveIteratorIterator::SELF_FIRST
        );

# ...which translates into:

        $directory = new Iterator\RecursiveDirectoryIterator($dir, $flags);
        // ^^ This is completely unfiltered *AND* recursive; i.e., all files, all directories.

        // vv As soon as this is invoked, the total filesystem scan happens, unfiltered.
        $iterator = new \RecursiveIteratorIterator($directory, \RecursiveIteratorIterator::SELF_FIRST);

        // ...whereas Finder only starts to filter _here_.

However, to filter before recursing to the end of the world, the RecursiveDirectoryIterator has to be wrapped in a RecursiveFilterIterator before the RecursiveIteratorIterator is invoked.

E.g., like this:

$flags     = FilesystemIterator::UNIX_PATHS | FilesystemIterator::SKIP_DOTS;
$directory = new RecursiveDirectoryIterator('core/modules', $flags);
// SystemListRecursiveFilterIterator is a custom RecursiveFilterIterator subclass (not shown here).
$filter    = new SystemListRecursiveFilterIterator($directory, 'module', array('lib', 'config', 'js', 'css', 'templates'));
$iterator  = new RecursiveIteratorIterator($filter);

$files = array();
foreach ($iterator as $filename => $file) {
  $file->uri = $file->getPathname();
  $file->filename = $file->getFilename();
  $file->name = pathinfo($file->filename, PATHINFO_FILENAME);
  $files[$file->name] = $file;
}
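The filter class used above isn't part of SPL, and its code isn't in this issue. A hypothetical sketch of what such a SystemListRecursiveFilterIterator could look like, assuming the constructor takes an extension and a list of directory names to skip. The key detail is that getChildren() re-wraps each subdirectory, so excluded directories are never descended into. The fixture directory exists only for the example.

```php
<?php
/**
 * Hypothetical sketch: prunes excluded directories before recursion and
 * keeps only files with the requested extension.
 */
class SystemListRecursiveFilterIterator extends RecursiveFilterIterator {

  protected $extension;
  protected $blacklist;

  public function __construct(RecursiveIterator $iterator, $extension, array $blacklist = array()) {
    parent::__construct($iterator);
    $this->extension = $extension;
    $this->blacklist = $blacklist;
  }

  public function accept(): bool {
    $file = $this->current();
    if ($file->isDir()) {
      // Rejecting a directory here means it is never descended into.
      return !in_array($file->getFilename(), $this->blacklist, TRUE);
    }
    return pathinfo($file->getFilename(), PATHINFO_EXTENSION) === $this->extension;
  }

  public function getChildren(): RecursiveFilterIterator {
    // Wrap children with the same settings, or nested levels go unfiltered.
    return new static($this->getInnerIterator()->getChildren(), $this->extension, $this->blacklist);
  }
}

// Throwaway fixture: one module, one module inside an excluded directory.
$root = sys_get_temp_dir() . '/syslist_demo_' . uniqid();
mkdir("$root/node/lib", 0777, TRUE);
touch("$root/node/node.module");
touch("$root/node/node.info");
touch("$root/node/lib/hidden.module");

$flags = FilesystemIterator::UNIX_PATHS | FilesystemIterator::SKIP_DOTS;
$filter = new SystemListRecursiveFilterIterator(new RecursiveDirectoryIterator($root, $flags), 'module', array('lib', 'config', 'js', 'css', 'templates'));
$files = array();
foreach (new RecursiveIteratorIterator($filter) as $file) {
  $files[pathinfo($file->getFilename(), PATHINFO_FILENAME)] = $file->getPathname();
}

// Clean up the fixture.
unlink("$root/node/lib/hidden.module");
rmdir("$root/node/lib");
unlink("$root/node/node.module");
unlink("$root/node/node.info");
rmdir("$root/node");
rmdir($root);
```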

A pure RecursiveDirectoryIterator implementation with proper RecursiveFilterIterators cuts the total time roughly in half on my machine, but that is still about 3x slower than file_scan_directory().

See also: #1833732: Find a way to skip ./config directories without skipping core/modules/config directory in drupal_system_listing()

pounard

Don't forget the RegexIterator class either; it can be used to filter by pattern. It should be tested in place of SystemListRecursiveFilterIterator, and you should check the performance of SystemListRecursiveFilterIterator too.
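For comparison, a minimal RegexIterator sketch (temporary fixture directory made up for the example). One caveat worth testing: here RegexIterator wraps the already-flattened RecursiveIteratorIterator, so it filters the results but does not stop the walk from entering directories such as css/ or lib/.

```php
<?php
// Throwaway fixture: one .module file and one .css file in a subdirectory.
$root = sys_get_temp_dir() . '/regex_demo_' . uniqid();
mkdir("$root/node/css", 0777, TRUE);
touch("$root/node/node.module");
touch("$root/node/css/node.css");

$flags = FilesystemIterator::UNIX_PATHS | FilesystemIterator::SKIP_DOTS;
// Every directory is still walked; filtering happens on the flattened result.
$all = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($root, $flags));
// USE_KEY matches against the key, which is the pathname by default.
$modules = new RegexIterator($all, '/\.module$/', RegexIterator::MATCH, RegexIterator::USE_KEY);

$found = array();
foreach ($modules as $pathname => $file) {
  $found[] = basename($pathname);
}

// Clean up the fixture.
unlink("$root/node/css/node.css");
rmdir("$root/node/css");
unlink("$root/node/node.module");
rmdir("$root/node");
rmdir($root);
```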

Damien Tournoud

The point of using Symfony components is not and has never been performance. The amount of indirection everywhere in Symfony is going to slow down Drupal 8 by orders of magnitude. This is by design.

Scanning the filesystem being a relatively infrequent operation anyway, could we just decide we don't care?

pounard

I agree with Damien on this one; Finder looks good for us. But even without Finder, we should still consider using SPL properly, which could reduce Drupal's system listing code to instantiating three iterator objects and a simple foreach.

A patch written by chx is actually doing that for the bootstrap/kernel code; I don't remember which one exactly. EDIT: See #1831350: Break the circular dependency between bootstrap container and kernel and https://drupal.org/files/1831350_22.patch
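Roughly what "three iterator objects and a foreach" could look like. This sketch swaps in RecursiveCallbackFilterIterator (PHP 5.4+) instead of a dedicated filter class; example_system_listing() and its parameters are hypothetical, and the fixture paths exist only for the example.

```php
<?php
/**
 * Hypothetical system listing: pathnames keyed by file name without the
 * extension, skipping excluded directories before descending into them.
 */
function example_system_listing($dir, $extension, array $exclude) {
  $flags = FilesystemIterator::UNIX_PATHS | FilesystemIterator::SKIP_DOTS;
  $directory = new RecursiveDirectoryIterator($dir, $flags);
  $filter = new RecursiveCallbackFilterIterator($directory, function ($file, $key, $iterator) use ($extension, $exclude) {
    if ($iterator->hasChildren()) {
      // Directory: prune it here if its name is excluded.
      return !in_array($file->getFilename(), $exclude, TRUE);
    }
    return $file->getExtension() === $extension;
  });
  $files = array();
  foreach (new RecursiveIteratorIterator($filter) as $pathname => $file) {
    $files[$file->getBasename('.' . $extension)] = $pathname;
  }
  return $files;
}

// Throwaway fixture.
$root = sys_get_temp_dir() . '/listing_demo_' . uniqid();
mkdir("$root/node/templates", 0777, TRUE);
mkdir("$root/views");
touch("$root/node/node.module");
touch("$root/node/templates/bad.module");
touch("$root/views/views.module");
touch("$root/views/views.info.yml");

$result = example_system_listing($root, 'module', array('lib', 'templates', 'css', 'js'));
ksort($result);

// Clean up the fixture.
unlink("$root/node/templates/bad.module");
rmdir("$root/node/templates");
unlink("$root/node/node.module");
rmdir("$root/node");
unlink("$root/views/views.module");
unlink("$root/views/views.info.yml");
rmdir("$root/views");
rmdir($root);
```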

sun

  1. As long as Finder uses iterators instead of recursive iterators, it cannot be considered for core. 5 times slower is not acceptable.

    I asked upstream whether there are any plans to convert it to recursive iterators. @fabpot didn't object, but someone would have to perform the conversion (which isn't particularly trivial). I'm also not sure whether switching to recursive iterators would require a different iterator architecture in Finder; I think a lot of investigation and architectural design work is needed there. Given the time remaining until the D8 feature freeze, it seems unlikely that we can 1) fix the library upstream, and 2) get it ready for core inclusion afterwards.

  2. Filesystem scans are not as rare as you might think. Any performance decrease there significantly slows down the installer, update.php, and, most importantly from my perspective, tests. Drupal also has to perform filesystem scans when all caches are empty; the slower the scan, the higher the chance of race conditions and of parallel requests getting (b)locked. The performance impact is measurable and visible on all fronts, for users and developers alike.

    E.g., only just recently, we had to tweak the existing scan functions, so as to get the performance of unit tests back under control.

  3. I certainly know of RecursiveRegexIterator, and I tested it in my early benchmarks; it performed very slowly. I think it only makes sense to use that filter when subclassing it or when using it within a stack of other filters.

pounard

Did you benchmark the GlobIterator?

I have used the Recursive*Iterator classes and RegexIterator a lot, for parsing a huge volume of XML and HTML files (converting a static site into CMS content), and I never experienced any performance issues. In most cases, the iterators' runtime was so small compared to whatever else I had to do around them that it was insignificant to me.

I'd be curious to know under which conditions you tested those iterators: which filesystem, which kind of hard drive, and which OS.

The post at http://stackoverflow.com/questions/11652481/php-fast-recursive-directory... benchmarks 30 seconds for reading 60,000 files/subdirectories. I guess this is a lot slower than the more hardcore, C-like version, but I don't think we're ever going to parse 60,000 files/subdirectories in Drupal. I'd still like to see more benchmarks before saying this is not acceptable.

Scanning files during normal runtime is not acceptable in most cases, but when we're discovering modules, for example, we don't really care about performance, because it's a purely administrative task that should never happen during normal runtime.

Even though Simpletest seems to be an edge case, with a volume we're never going to encounter elsewhere in core, I'd be happy to use the slower iterators everywhere and make exceptions for cases such as Simpletest.

EDIT: One final note: benchmarking the iterators with different flags (for example, returning just the filename instead of SplFileInfo objects) might also change the results.
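For the flags idea, SPL's FilesystemIterator::CURRENT_AS_PATHNAME makes the iterator yield plain strings instead of SplFileInfo objects. A minimal sketch with a temporary fixture directory assumed; whether it measurably changes the benchmark is untested here.

```php
<?php
// Throwaway fixture: a single file inside a subdirectory.
$root = sys_get_temp_dir() . '/pathname_demo_' . uniqid();
mkdir("$root/sub", 0777, TRUE);
touch("$root/sub/a.module");

$flags = FilesystemIterator::CURRENT_AS_PATHNAME
  | FilesystemIterator::SKIP_DOTS
  | FilesystemIterator::UNIX_PATHS;
$iterator = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($root, $flags));

$paths = array();
foreach ($iterator as $path) {
  // $path is a string here, not an SplFileInfo object.
  $paths[] = $path;
}

// Clean up the fixture.
unlink("$root/sub/a.module");
rmdir("$root/sub");
rmdir($root);
```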

bzitzow

Issue summary: View changes

This is an interesting thread. Did the conversation continue elsewhere?

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

phenaproxima

Quoting sun: "As long as Finder uses iterators instead of recursive iterators, it cannot be considered for core. 5 times slower is not acceptable."

It may be time to re-open this discussion, because from a cursory look at the Finder component's main class, it appears they are now using recursive iterators.

Strictly from a DX standpoint, I am very in favour of Drupal using the Finder component. But I recognize the far-reaching performance implications of such a change, so clearly it should be carefully researched. I'm not a performance guy, but I'd be willing to help in any way I could so that we could get a clear answer on whether this is something we would want to include in core.

stefan.r

Title: Evaluate Symphony2's Finder Component to simplify file handling » Evaluate Symfony2's Finder Component to simplify file handling

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

andypost

Version: 8.4.x-dev » 8.5.x-dev

andypost

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

claudiu.cristea

File: 1451320-25.patch (5.23 KB)

Here's a benchmark I ran with the script from the patch. It seems that the Symfony Finder's performance has improved dramatically. The Drupal scanner is still faster, but Finder is a modern library, so a discussion is warranted:

+----------+----------+----------+--------+
| pass #   | entries  | symfony  | drupal |
+----------+----------+----------+--------+
| Pattern: /.*/                  |        |
+----------+----------+----------+--------+
| #1       | 28306    | 1176 ms  | 967 ms |
| #2       | 28306    | 1264 ms  | 974 ms |
| #3       | 28306    | 1154 ms  | 997 ms |
| #4       | 28306    | 1149 ms  | 852 ms |
| #5       | 28306    | 1145 ms  | 866 ms |
+----------+----------+----------+--------+
| Average             | 1178 ms  | 931 ms |
+----------+----------+----------+--------+
| Pattern: /\.php$/              |        |
+----------+----------+----------+--------+
| #1       | 10950    | 1133 ms  | 610 ms |
| #2       | 10950    | 1110 ms  | 588 ms |
| #3       | 10950    | 1114 ms  | 593 ms |
| #4       | 10950    | 1088 ms  | 583 ms |
| #5       | 10950    | 1077 ms  | 605 ms |
+----------+----------+----------+--------+
| Average             | 1104 ms  | 596 ms |
+----------+----------+----------+--------+
| Pattern: /\.html\.twig$/       |        |
+----------+----------+----------+--------+
| #1       | 538      | 1091 ms  | 555 ms |
| #2       | 538      | 1084 ms  | 553 ms |
| #3       | 538      | 1092 ms  | 555 ms |
| #4       | 538      | 1075 ms  | 517 ms |
| #5       | 538      | 1086 ms  | 534 ms |
+----------+----------+----------+--------+
| Average             | 1086 ms  | 543 ms |
+----------+----------+----------+--------+
| Pattern: /\.css$/              |        |
+----------+----------+----------+--------+
| #1       | 487      | 1057 ms  | 549 ms |
| #2       | 487      | 1093 ms  | 545 ms |
| #3       | 487      | 1072 ms  | 560 ms |
| #4       | 487      | 1080 ms  | 539 ms |
| #5       | 487      | 1092 ms  | 549 ms |
+----------+----------+----------+--------+
| Average             | 1079 ms  | 548 ms |
+----------+----------+----------+--------+
| Pattern: /\.yml$/              |        |
+----------+----------+----------+--------+
| #1       | 2076     | 1094 ms  | 551 ms |
| #2       | 2076     | 1048 ms  | 559 ms |
| #3       | 2076     | 1089 ms  | 560 ms |
| #4       | 2076     | 1116 ms  | 567 ms |
| #5       | 2076     | 1085 ms  | 562 ms |
+----------+----------+----------+--------+
| Average             | 1086 ms  | 560 ms |
+----------+----------+----------+--------+

dawehner

@claudiu.cristea
What if we moved file_scan_directory() into a nicer-to-use component? Conceptually, I don't see why we couldn't provide a better developer experience while keeping the current limitations and optimisations.
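One possible shape for this, purely as a hypothetical sketch: a small fluent wrapper whose options mirror file_scan_directory()'s mask, nomask and min_depth, while still pruning excluded directories before recursing. ExampleFileScanner and its method names are illustrative, not an existing Drupal or Symfony API.

```php
<?php
/**
 * Hypothetical fluent scanner; not an existing Drupal or Symfony class.
 */
class ExampleFileScanner {

  protected $mask = '/.*/';
  protected $nomask = NULL;
  protected $minDepth = 0;

  public function name($mask) {
    $this->mask = $mask;
    return $this;
  }

  public function exclude($nomask) {
    $this->nomask = $nomask;
    return $this;
  }

  public function minDepth($depth) {
    $this->minDepth = $depth;
    return $this;
  }

  /**
   * Returns pathnames keyed by file name without extension.
   */
  public function scan($dir) {
    $flags = FilesystemIterator::UNIX_PATHS | FilesystemIterator::SKIP_DOTS;
    $nomask = $this->nomask;
    $filter = new RecursiveCallbackFilterIterator(
      new RecursiveDirectoryIterator($dir, $flags),
      function ($file) use ($nomask) {
        // Drop any entry (file or directory) whose name matches nomask;
        // rejected directories are never descended into.
        return $nomask === NULL || !preg_match($nomask, $file->getFilename());
      }
    );
    $iterator = new RecursiveIteratorIterator($filter);
    $files = array();
    foreach ($iterator as $pathname => $file) {
      // Depth 0 means "directly inside $dir", as with min_depth.
      if ($iterator->getDepth() >= $this->minDepth && preg_match($this->mask, $file->getFilename())) {
        $files[pathinfo($file->getFilename(), PATHINFO_FILENAME)] = $pathname;
      }
    }
    return $files;
  }
}

// Throwaway fixture.
$root = sys_get_temp_dir() . '/scanner_demo_' . uniqid();
mkdir("$root/node/lib", 0777, TRUE);
touch("$root/top.module");
touch("$root/node/node.module");
touch("$root/node/node.info.yml");
touch("$root/node/lib/skipped.module");

$scanner = new ExampleFileScanner();
$files = $scanner
  ->name('/\.module$/')
  ->exclude('/^(CVS|lib|templates|css|js)$/')
  ->minDepth(1)
  ->scan($root);

// Clean up the fixture.
unlink("$root/node/lib/skipped.module");
rmdir("$root/node/lib");
unlink("$root/node/node.module");
unlink("$root/node/node.info.yml");
rmdir("$root/node");
unlink("$root/top.module");
rmdir($root);
```

Here top.module is dropped by minDepth(1) and lib/ is pruned by the nomask pattern, so only node.module is returned.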

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

kim.pepper

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

andypost

andypost

One more issue needs Finder.

andypost

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

andypost

Version: 9.4.x-dev » 9.5.x-dev

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.