Problem/Motivation

MTimeProtectedFastFileStorage::save() has concurrency issues when multiple processes are dumping the container at the same time. This is because the mechanism relies on non-atomic file operations without locking. The two problematic parts are:

  1. The enclosing directory is recursively removed and recreated.
  2. The mechanism that synchronizes the directory modification time with the filename hash relies on a busy loop involving file operations.

Note that a change of the directory modification time invalidates any existing container. As a result, if multiple processes concurrently modify the same directory, there is a chance that newly written containers are immediately invalidated by another process.

Steps to reproduce (based on #2497243-214: Replace Symfony container with a Drupal one, stored in cache):

  1. Remove the compiled container from disk: cd sites/default/files/php && rm -rf service_container
  2. Add a line of code to MTimeProtectedFastFileStorage::save() that writes one line per save into a log file (a hypothetical snippet follows below).
  3. Start hitting your site hard with ab, 100 concurrent requests: ab -n 100000 -c 100 http://drupal-8.localhost/
  4. Observe that the log file ends up with far more than 100 lines; possibly the rebuilding never stops at all.
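For step 2, something like this hypothetical debug line inside save() is enough ($name is the method's first parameter):

// Hypothetical debug line for counting container writes.
file_put_contents('/tmp/phpstorage-saves.log', $name . ' ' . microtime(TRUE) . "\n", FILE_APPEND);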

Proposed resolution

  1. Never remove the enclosing directory.
  2. Instead, reset the modification time of the enclosing directory to that of the newly written file using touch().

This resolves the endlessness in #2497243: Replace Symfony container with a Drupal one, stored in cache. On busy systems there is still the possibility that multiple processes concurrently rebuild the container. However, because the storage relies on atomic filesystem operations, this will neither result in a deadlock nor in multiple processes invalidating each other's newly written containers. Per #2547827-7: PhpStorage: past, present and future, the multiple builds are a major bug; the endlessness is critical.
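
The essence of the new save() flow can be sketched as follows. This is a minimal, hypothetical simplification for illustration; mtime_protected_save() and its parameters are not the actual class code:

// Hypothetical sketch of the proposed save() flow: write to a temporary
// file, rename() it into place atomically, then touch() the enclosing
// directory so its mtime matches the value embedded in the filename hash.
function mtime_protected_save($directory, $filename, $data, $mtime) {
  // Ensure the enclosing directory exists, but never remove it.
  if (!is_dir($directory)) {
    @mkdir($directory, 0777, TRUE);
  }
  // Write to a uniquely named temporary file in the same directory so
  // that the subsequent rename() stays on one filesystem.
  $temporary_path = $directory . '/' . uniqid('', TRUE) . '.tmp';
  if (file_put_contents($temporary_path, $data) === FALSE) {
    return FALSE;
  }
  // Atomically move the file into place. Concurrent writers may race,
  // but the loser at worst replaces the file with an equally valid copy;
  // no reader ever sees a partially written file.
  if (!rename($temporary_path, $directory . '/' . $filename)) {
    return FALSE;
  }
  // Fix up the directory mtime to match the one used for the hash. The
  // trailing separator prevents touch() from creating a plain file if
  // the directory disappeared in the meantime.
  return touch($directory . '/', $mtime);
}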

Remaining tasks

Review.

User interface changes

None.

API changes

API addition: PhpStorageInterface::garbageCollection().

Data model changes

None.


Comments

znerol’s picture

Status: Needs work » Needs review
FileSize
2.84 KB
dawehner’s picture

Priority: Normal » Major

Nice research, nice work!

This is of course at least major, if not even critical, given what it solves.

Fabianx’s picture

Tagging security.

This is probably some of the most security-reviewed code in core, so it will unfortunately need another security review.

znerol’s picture

FYI, I can reproduce the original issue with the following setup:

  • LAMP in a vagrant box.
  • Drupal exported via NFS from host into the VM in data/htdocs/drupal-8-bootstrap.
  • ab -c 100 -n 1000000 -s120 http://127.0.0.1:8080/drupal-8-bootstrap/
  • rm -rf data/htdocs/drupal-8-bootstrap/sites/default/files/php/service_container

Then observe the container being rewritten constantly using:

watch -n0.1 find data/htdocs/drupal-8-bootstrap/sites/default/files/php/service_container

I've turned down the number of servers to only 8 and the issue still exists:

<IfModule mpm_prefork_module>
    StartServers          4
    MinSpareServers       4
    MaxSpareServers       8
    MaxClients            8
    MaxRequestsPerChild   0
</IfModule>

This is on a not-so-new but still decent laptop with an SSD.

znerol’s picture

To prevent touch() from creating a file (in place of the directory) when the directory does not exist, ensure that the path terminates with a directory separator.

PHP source of touch(). On Unix, updating the mtime is done via the utime() system call, which seems to be POSIX. I'm trying to find a Windows machine now to see whether it works the same way there.
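
A tiny illustration of the trailing-separator behavior (hypothetical usage; time() stands in for the computed mtime):

// A path ending in a directory separator can only refer to a directory,
// so touch() fails instead of creating a stray regular file when the
// directory has been removed.
if (!@touch($directory . '/', time())) {
  // The directory is gone; the next request simply gets a cache miss
  // and rebuilds the data.
}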

znerol’s picture

Confirmed that this approach also works on Windows.

znerol’s picture

Add garbage collection.

znerol’s picture

Issue summary: View changes
znerol’s picture

Issue summary: View changes
googletorp’s picture

Status: Needs review » Needs work
FileSize
6.79 KB
3.81 KB

The patch from #7 is a bit messed up: interdiff.txt is the actual patch and the patch file is empty. Fixed this.

googletorp’s picture

Status: Needs work » Needs review
znerol’s picture

FileSize
3.81 KB

Oh, sorry about that. Here is the interdiff for #5-#10.

alexpott’s picture

+++ b/core/lib/Drupal/Component/PhpStorage/MTimeProtectedFastFileStorage.php
@@ -78,43 +78,26 @@ public function save($name, $data) {
+    rename($temporary_path, $full_path);

Using an atomic file operation here is clever. Let's document why we're doing this.

znerol’s picture

Added a comment for the rename.

bjaspan’s picture

I am a crufty Drupal core developer who has not been involved for five years, and I had not looked at D8 before this week. So, it is entirely possible that I do not know what I'm talking about. However, I have some concerns with this patch.

First, specific comments on the code:

  1. +++ b/core/lib/Drupal/Component/PhpStorage/MTimeProtectedFastFileStorage.php
    @@ -78,43 +78,29 @@ public function save($name, $data) {
    +    rename($temporary_path, $full_path);
    

    This operation can fail, and you are not checking the return value.

  2. +++ b/core/lib/Drupal/Component/PhpStorage/MTimeProtectedFastFileStorage.php
    @@ -78,43 +78,29 @@ public function save($name, $data) {
    +    touch($directory . '/', $mtime);
    

    Ditto, even if you know the rename succeeded. For example, a network partition might have just separated you from the NFS server.

  3. +++ b/core/lib/Drupal/Component/PhpStorage/MTimeProtectedFastFileStorage.php
    @@ -162,6 +148,37 @@ public function delete($name) {
    +        foreach (new \DirectoryIterator($directory) as $fileinfo) {
    

    The directory might cease to exist between file_exists() and the creation of the iterator.

  4. +++ b/core/lib/Drupal/Component/PhpStorage/MTimeProtectedFastFileStorage.php
    @@ -162,6 +148,37 @@ public function delete($name) {
    +              @chmod($directory, 0777);
    +              @unlink($fileinfo->getPathName());
    

    Return codes.

  5. +++ b/core/lib/Drupal/Component/PhpStorage/MTimeProtectedFastFileStorage.php
    @@ -162,6 +148,37 @@ public function delete($name) {
    +          $this->unlink($name);
    

    Return codes.

Second, and more importantly, even if you address all of those comments, I'm still concerned. It is commonplace now for Drupal to run in multi-web environments using a distributed filesystem, and distributed filesystems are just not that reliable. A "POSIX compatible distributed filesystem" is basically a myth. It is not safe to assume that rename() is atomic or that touch() operations work or, basically, almost anything else "clever" about a filesystem. So, this patch will result in D8 being not that reliable.

Furthermore, even if I'm wrong and there is some way to make this work reliably using the filesystem (I doubt it), I'm *still* concerned, because then you will have forced D8 to be absolutely dependent on using a distributed filesystem that provides strictly POSIX-compatible rename() operations. There are a number of potentially compelling choices for Drupal's shared filesystem that do not meet this criteria (e.g. S3).

The common component in a Drupal stack that is capable of performing atomic operations is the SQL database. So, for operations that need to be locked or atomic, use the database. Or, use a locking service.

Does D8 have an existing service interface for locking across all web nodes?
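
For reference, D8 does ship a lock API that coordinates across web nodes when backed by a shared database: \Drupal\Core\Lock\LockBackendInterface, available via \Drupal::lock(). A minimal sketch of guarding a rebuild with it ('container_rebuild' is an illustrative lock name, not one core uses here):

$lock = \Drupal::lock();
if ($lock->acquire('container_rebuild')) {
  // ... rebuild and save the container ...
  $lock->release('container_rebuild');
}
else {
  // Another process holds the lock; wait for it to finish, then retry.
  $lock->wait('container_rebuild');
}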

chx’s picture

> because then you will have forced D8 to be absolutely dependent on using a distributed filesystem that provides strictly POSIX-compatible rename() operations

No one should use the mtime-protected storage on a shared filesystem. That's a piss-poor idea. #2301163: Create a phpstorage backend that works with a local disk but coordinates across multiple webheads and #2513326: Performance: create a PHP storage backend directly backed by cache both offer better ideas.

znerol’s picture

Please note that the term concurrency in the title does not refer to distributed filesystems, merely to parallel processing. We currently have a problem with multiple server processes invalidating each other's work when rebuilding the container.

Storing code on the filesystem has the advantage that we can leverage opcode caching. Especially for small sites this is probably the fastest option for retrieving generated code we currently have. The picture is certainly different in a load balanced / cluster setup. We already have the option in HEAD to swap out container and twig storage for something different.

Even if we eventually move container storage somewhere else, as #2497243: Replace Symfony container with a Drupal one, stored in cache aims to do, we will probably still be using PHP storage for other things. Therefore I think it is important to make it as robust as possible. The other option would be to ditch it completely.

Good call on the return values. In the garbageCollection() method we do not care, though, because nothing wants to know whether the operation went well or not. Regardless, it is possible to write this function more defensively.
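
A more defensive variant of the cleanup loop might look like this (hypothetical sketch, not the patch itself; $cutoff is an assumed timestamp threshold):

if (is_dir($this->directory)) {
  try {
    foreach (new \DirectoryIterator($this->directory) as $fileinfo) {
      // Only remove stale regular files; silently tolerate races where
      // another process deleted the file first.
      if ($fileinfo->isFile() && $fileinfo->getMTime() < $cutoff) {
        @unlink($fileinfo->getPathname());
      }
    }
  }
  catch (\UnexpectedValueException $e) {
    // The directory vanished between is_dir() and the iterator; fine,
    // there is nothing left to collect.
  }
}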

chx’s picture

> Storing code on the filesystem has the advantage that we can leverage opcode caching.

Still beating my own drum: both approaches linked in #16 keep this, via different means.

> Especially for small sites this is probably the fastest option for retrieving generated code we currently have.

That is certainly true -- querying a distributed data store to coordinate across webheads will certainly be slower than not querying one :) Once coordination is done, opcache hits, so it's a wash from there.

On to this patch: this is a very nice trick, making the directory mtime a (possibly) false mtime. Very nice. It requires PHP 5.3.0, but we are way past that.

pwolanin’s picture

Should we find a way to totally remove mtime storage before release if it's not reliable? Is this issue or the linked issues critical?

It feels like we should consider also facilitating the pattern of committing the built PHP to git and blocking config changes in production?

Berdir’s picture

> It feels like we should consider also facilitating the pattern of committing the built PHP to git and blocking config changes in production?

Locking down is all good and well, but at some point you need to actually do a config deployment, which can absolutely include new modules.

I really don't see how that is supposed to be possible unless you also deploy the whole database from staging which already has the new modules installed. Or have multiple prepared containers ready, which then depend on the list of enabled modules or something like that.

catch’s picture

> Or have multiple prepared containers ready, which then depend on the list of enabled modules or something like that.

We've got a bit closer to that with the addition of $deployment_identifier in settings.php, although that would mean the new container is used immediately on code deployment and before update.php is run or config is synced, so it'd still be out of sync until both of those are done. You could go further and write the deployment hash to memcache then read it in settings.php but that gets really nasty fast.

#2497243: Replace Symfony container with a Drupal one, stored in cache means the container is no longer written to PHP (and is critical), so the container will be completely out of phpstorage.

However, per #2429659: Race conditions in the twig template cache, Twig templates are still using phpstorage.

It would be possible to use read-only phpstorage for twig templates, but they can differ based on enabled modules (due to node visitors http://twig.sensiolabs.org/doc/advanced.html) - so while that could be pre-generated in a build step on a per-site basis, we can't add template building to d.o packaging etc. as it stands.

znerol’s picture

I believe that mtime-protected PHP storage can be implemented in a way such that it reliably works (i.e., does not obstruct itself). It is not even a problem if the underlying system fails to implement rename() atomically; the worst thing that can happen is a reduced cache hit rate.

The same goes for the touch() call. In most cases the timestamp of the enclosing directory is already correct; the call merely fixes the timestamp in the rare case of a mismatch. Even if the filesystem does not support updating the metadata, this simply results in a cache miss on a subsequent request, which will rebuild the data and retry storing it. In fact, I believe we only need the touch() to prevent random test failures; in production it is most likely irrelevant.

The problem with the current implementation is the recursive unlink(), which will happily remove work done by other processes.

catch’s picture

Priority: Major » Critical

Going to bump this to critical on the basis that:

- It fixes #2497243: Replace Symfony container with a Drupal one, stored in cache (even if I think that issue is release blocking for other reasons).

- Also fixes #2429659: Race conditions in the twig template cache even though we worked around it in twig - although the workaround requires an eval() which still concerns me a bit.

- Any other use of phpstorage will have the same race condition/stampede issue until this is in.

chx’s picture

alexpott’s picture

Status: Needs review » Reviewed & tested by the community

I did a line-by-line review of this. The use of atomic operations to avoid race conditions is the way to go. Nice work @znerol.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 17: resolve_concurrency-2527478-17.patch, failed testing.

znerol’s picture

That's weird. This used to pass on the 5.4 testbot, but it seems to break now on PHP 5.5 :/

znerol’s picture

chx’s picture

Title: Resolve concurrency issues in PHP storage » Resolve infinite stampede in mtime protected PHP storage
chx’s picture

Issue summary: View changes

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 17: resolve_concurrency-2527478-17.patch, failed testing.

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 17: resolve_concurrency-2527478-17.patch, failed testing.

alexpott’s picture

If you want to repeat the test result, you need to run the test without a parent site installed and with no settings.php at all, using the SQLite test runner. Something like:
php ./core/scripts/run-tests.sh --sqlite /tmp/coretest.sqlite --dburl sqlite://localhost/sites/default/files/db.sqlite --color --url http://your.url

alexpott’s picture

    if (!$this->addServiceFiles(Settings::get('container_yamls'))) {
      throw new \Exception('The container_yamls setting is missing from settings.php');
    }

Is this code actually valid? Is it really mandatory to have a container_yamls setting in settings.php? This looks wrong to me. Simpletest sets it to an empty array a lot so that it does not throw an exception.
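
For reference, the setting looks like this in a D8 settings.php (default.settings.php ships a similar line; an empty array also satisfies the check, which is what Simpletest does):

// Typical: register the per-site services file.
$settings['container_yamls'][] = __DIR__ . '/services.yml';
// Alternatively, an empty array avoids the exception without any files:
$settings['container_yamls'] = [];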

dawehner’s picture

Afaik #2547447: Stop creating services.yml by default will get rid of that requirement, so strictly speaking I don't think we would really need it.

Status: Needs work » Needs review
dawehner’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +D8 Accelerate

So the other issue indeed fixed the test failures ... back to RTBC.

catch’s picture

Status: Reviewed & tested by the community » Fixed

Really good to see this green. Committed/pushed to 8.0.x, thanks!

  • catch committed 8d12555 on 8.0.x
    Issue #2527478 by znerol, googletorp: Resolve infinite stampede in mtime...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.