Batch the configuration synchronisation process.

The Problem

There is a concern that the configuration synchronisation process could cause PHP to exceed its execution time limit or to exhaust its memory limit.

A rough outline of how the sync currently works:

  1. Build the storage comparer.

    When accessing the page we scan through all the files in the staging directory. Files which are the same as their counterparts in the active config are ignored. Files which differ or don't have counterparts are added to a list of files to write. Files in the active config which don't have a counterpart in staging are added to a list of files to delete.

    The comparison of the files which are present in both staging and active is done by parsing both files into PHP and comparing the objects for equivalence. Presumably we do this, rather than comparing the files directly, because simple re-orderings of variables shouldn't count as being different.

    Each file is loaded in its entirety, so the process will fall over if a file's size is greater than PHP's memory limit. YAML does lend itself to partial loading, so this might be addressable.

    Once the files have been compared the variables are destroyed and the memory is freed.

    In theory the list of changed files could exceed PHP's memory limit, but this seems unlikely. It is possible, though, that the number of files could reach the point where file IO time causes PHP to exceed the execution time limit, so this step is also a candidate for batching. We may also want to cache the result so that the user doesn't have to sit through a batch process every time they access the sync page.
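The equivalence check in step 1 can be sketched in plain PHP. The array contents below are hypothetical stand-ins for two parsed YAML files:

```php
<?php
// Two parsed config files with identical values but different key order,
// as produced by parsing the staged and active YAML into PHP arrays.
$active  = array('name' => 'My site', 'mail' => 'admin@example.com');
$staging = array('mail' => 'admin@example.com', 'name' => 'My site');

// Loose equality on PHP arrays ignores key order, so a file whose keys
// have merely been re-ordered is not reported as a change.
var_dump($active == $staging); // bool(true)
```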

  2. Show change summary.

    The storage comparer is used to build a change summary. At this point the user has to click continue to proceed.

  3. Prioritise the change set.

    Fire a prioritise event which allows modules to order the change set if necessary. For example system config tends to be handled earlier.

  4. Validate the changes

    Fire a validate event to allow modules to validate the changes.

    There doesn't seem to be any easy way to batch this event (or any other), so responsibility will have to devolve to the individual modules to batch potentially long-running actions here. Question: can we start a batch process from within an active batch process? In theory it should be fine, but we should check.

  5. Attempt to lock the sync process.

    This is probably the cheapest of the set up operations so should probably happen earlier.

  6. Invoke configuration owners

    Give modules the opportunity to take over the import process. We actually pass a copy of the old and new config at this step, which means we're doing two more file loads here for each changed file, again making it a good candidate for batching.

  7. Handle any other configuration changes

    We load the config entity for each item in the change list and delete the entities which are not present in staging and overwrite the active configurations of the ones that are.

    Presumably we load and write in this way, rather than just copying the directory across, because the configuration storage system is pluggable and so not necessarily file based.

    Again this is quite an intensive IO task and will probably need to be batched.
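A rough sketch of the load-and-write loop in step 7, assuming the change list groups configuration names by operation and that $staging_storage and $active_storage are the pluggable storage objects (the variable names are illustrative):

```php
<?php
// Delete active config that has no counterpart in staging.
foreach ($change_list['delete'] as $name) {
  $active_storage->delete($name);
}
// Create or overwrite everything else, reading and writing through the
// storage API so the backend need not be file based at all.
foreach (array_merge($change_list['create'], $change_list['update']) as $name) {
  $active_storage->write($name, $staging_storage->read($name));
}
```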

  8. Fire an import event.

    Notify modules that an import has happened.

  9. Release the lock.
  10. Reset the classes.
  11. Clear all caches.
  12. Delete the staged configuration files.
  13. Complete.

This commit attempts to address the import tasks which take place in steps 6 and 7.

Proposed solutions

The solution this commit implements is to use the batch API to split the task over multiple page loads where necessary. Each batch task will be a single operation (create/update/delete) on a single configuration entity. (Note to self: this doesn't mean only one of these will take place per page load. Batch API will try to execute as many as possible while keeping an eye on the running time, and will trigger a new page load as it approaches the PHP execution timeout.) This should solve the problem of the process potentially taking too long.
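Handing the import to Batch API might look roughly like this; the callback names follow the patch's ConfigSync form, but treat the details as a sketch rather than the final implementation:

```php
<?php
// A single multistep operation: Batch API re-invokes the worker until
// $context['finished'] reaches 1, spreading the work over as many page
// loads as it needs.
$batch = array(
  'title' => t('Synchronizing configuration'),
  'operations' => array(
    array(array($this, 'processBatch'), array($config_importer)),
  ),
  'finished' => array($this, 'finishBatch'),
);
batch_set($batch);
```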

The downside to this approach is that configuration updates which are codependent could be split across separate batch page loads, in which case Drupal will load with a mixed and broken configuration on the second batch page, potentially killing the site.

It would be nicer if we could write the imported config into another store ("active-pending", say) which we could then switch to quickly at the very end. This would solve the problem by making the configuration change atomic, and would additionally allow us to easily roll back in the case of an error. I assume the reason we didn't go for this solution is that the "active" config is special, in that Drupal and modules might have state associated with it that is not entirely captured by the configuration system. If it's not possible to entirely capture Drupal's state in config, then we can't simply swap config states; we would need to run some other sort of import process and would face the same problem again.

Another solution was to try to group codependent configuration changes so that they all happen at the same time. However, after a quick discussion we decided that this wasn't worthwhile and might not even be possible to do programmatically. Of course, there's nothing preventing the end user from doing this manually themselves.

Given that, it was decided that batching would be the best solution, with the caveats that the user be forewarned to back up and take their site offline beforehand, and that drush (configured to run for a longer period of time) be the preferred method.

Implementation details

This is a rough early attempt in order to get some feedback on the approach. Things still to address are:

  • The order in which configuration is imported differs between the batch process and the normal process.

    In the normal process all configuration entities which are handled by modules are imported first, and the remaining list is then handled by the importer.

    In the batched process the entities are imported in the order they appear in the change list, regardless of the method used.

    We should standardise on one order and use it for both.

  • When we set up the batch job we pass the importer as an argument to the worker function. The object is serialised and stored in the database. The problem is that this stored copy is not refreshed across page loads, so on subsequent page loads we end up with an importer whose state is out of date.

    I've worked around this by shifting the importer to the batch api context sandbox on the first call to the worker but this feels a little ugly. There might be a better way of handling this.
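    The workaround might be sketched as follows; doSyncStep() and isFinished() are hypothetical method names standing in for whatever the importer exposes:

```php
<?php
public static function processBatch(ConfigImporter $config_importer, &$context) {
  if (!isset($context['sandbox']['config_importer'])) {
    // First run: seed the sandbox with the importer passed in as an
    // operation argument.
    $context['sandbox']['config_importer'] = $config_importer;
  }
  // Later runs use the sandbox copy, which Batch API serializes and
  // restores between requests, so its state stays current.
  $importer = $context['sandbox']['config_importer'];
  $importer->doSyncStep($context);
  $context['finished'] = $importer->isFinished() ? 1 : 0.5;
}
```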

  • Currently, the batch process needs to know too much about how the import works. Ideally it should just be calling a batch() function and the specifics of what's happening will be handled by the object.
  • The code relies on the simplified change set from this issue:

  • There are a number of serialisation bugs which prevent the importer from being serialised. These need to be fixed properly. The issues are:
    • When serialising the database connection, the object clones itself and unsets the connection, schema, and driver objects to save space, since they can easily be re-created. This stripped-down clone is then serialised and destroyed, and the serialised value is returned.

      However, the MySQL version implements a destructor which tidies up the sequences table and so expects a connection to be available.

    • As mentioned above, when we unserialise the connection we recreate the PDO connection object. However, we don't re-set the custom statement class, so the statements returned aren't what Drupal expects.

    • The DrupalKernel inherits the serialize and unserialize functions of the Symfony kernel, which have a different signature. This causes an error to be thrown when we try to unserialize it.
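The clone-and-strip pattern from the first sub-point, and the destructor bug it trips, can be sketched like this (simplified; property names follow the description above):

```php
<?php
public function serialize() {
  // Work on a clone so the live connection keeps its PDO handle.
  $clone = clone $this;
  // Strip members that can be cheaply rebuilt on unserialize().
  unset($clone->connection, $clone->schema, $clone->driverClasses);
  $serialized = serialize(get_object_vars($clone));
  // When $clone falls out of scope here, the MySQL driver's destructor
  // runs, tries to tidy the sequences table, and finds no connection:
  // the first bug described above.
  return $serialized;
}
```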

Attachments:

  #19 interdiff.txt (591 bytes, xjm)
  #19 config-2004370-19.patch (13.84 KB, xjm). PASSED: [[SimpleTest]]: [MySQL] 64,579 pass(es).
  #18 interdiff.txt (1.33 KB, xjm)
  #18 config-2004370-18.patch (13.84 KB, xjm). FAILED: [[SimpleTest]]: [MySQL] Setup environment: Test cancelled by admin prior to completion.
  #15 2004370.15.patch (13.84 KB, alexpott). PASSED: [[SimpleTest]]: [MySQL] 64,443 pass(es).
  #15 10-15-interdiff.txt (3.01 KB, alexpott)
  #10 2004370.10.patch (12.99 KB, alexpott). FAILED: [[SimpleTest]]: [MySQL] 64,513 pass(es), 4 fail(s), and 1 exception(s).


Dean Reilly:

18.09 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in core/lib/Drupal/Core/DrupalKernel.php.

Patch attached.

beejeebus:

Issue tags: +Configuration system


beejeebus:

Status: Active » Needs review

The last submitted patch, batch-import.patch, failed testing.

xjm:

Issue tags: +beta blocker

Per @alexpott, this is a beta blocker.

Berdir:

Note that this isn't quite true:

    Batch API will try to execute as many as possible while keeping an eye on the running time and will trigger a page load when it approaches the PHP execution time out.

Batch has a hardcoded limit of 1 second per request, which does mean that there is considerable overhead involved when batch is used, but I do agree that we need to do this; it takes forever on large(r) config stagings.

catch:

Priority: Normal » Critical

If it's a beta blocker, it should also be critical.

xjm:

Issue tags: +sprint
alexpott:

Status: Needs work » Needs review
12.99 KB
FAILED: [[SimpleTest]]: [MySQL] 64,513 pass(es), 4 fail(s), and 1 exception(s).

The patch attached takes @Dean Reilly's work and brings it up to date with the latest changes. Seems to work nicely.

yched:

Just like FAPI callbacks, batch callbacks as methods is dangerous.

'callback' => array($this, 'someMethod') means $this is going to be serialized / unserialized in the $batch structure at each HTTP request during processing. It's doubly nasty when used for $batch['operations'] callbacks since each operation is stored in its own db record and $this will thus get serialized separately for each operation (patch here uses one single multistep operation, so at least this part doesn't explode).

In short: better to use *static* callback methods if at all possible, adding them to the batch as 'callback' => array(get_class($this), 'someStaticMethod').

Also, mind you that the 'operation' arguments (here $config_importer) will also be serialized, and then unserialized, on each HTTP request. Typically we only put ids as arguments, not full-fledged objects. Do we really need to inject it?
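The two callback styles from the comment above, side by side (someMethod and someStaticMethod are the placeholder names used there):

```php
<?php
// Instance callback: $this, the whole form object, is serialized into
// the batch record and unserialized again on every HTTP request.
$batch['finished'] = array($this, 'someMethod');

// Static callback: only the class name string is stored.
$batch['finished'] = array(get_class($this), 'someStaticMethod');
```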

Status: Needs review » Needs work

The last submitted patch, 10: 2004370.10.patch, failed testing.

alexpott:

@yched yes because we need to maintain the state of the configuration importer.

This will pass once we get the testbot on PHP 5.4

yched:

Well, maintaining the state of the BatchConfigImporter as the batch progresses is done by placing it in $context['sandbox']['config_importer'] on first processBatch() run. Then it's stored/serialized/unserialized as part of the 'sandbox' in the $batch on each request.

I wish we could avoid additionally storing/serializing it as part of the 'operation' argument, since that's only needed to kickstart the process, but it looks like it has lots of dependencies itself and thus needs to be created outside of the batch first, and then it does have to enter the processing loop somehow :-/
I just hope BatchConfigImporter extends DependencySerialization :-)

Still sad that we don't use static methods for the 'operation' & 'finished' callback instead of serializing the whole ConfigSync form. Avoiding needless data serialization/unserialization on each request could noticeably speed up processing (well, rather, reduce the batch overhead)
- processBatch() could easily be a static as is.
- finishBatch() only uses $this->t()

alexpott:

Status: Needs work » Needs review
3.01 KB
13.84 KB
PASSED: [[SimpleTest]]: [MySQL] 64,443 pass(es).

I've done the most I can to avoid serialising the ConfigSync form (although of course it does extend DependencySerialisation), and yes, this patch ensures the ConfigImporter extends DependencySerialisation too.

This patch also fixes the $batch['finished'] callback to work with PHP 5.3, since there is no reason why it should not.

yched:

Status: Needs review » Reviewed & tested by the community

Thanks Alex. Patch looks good if green.

catch:

Status: Reviewed & tested by the community » Needs work
+++ b/core/lib/Drupal/Core/Config/BatchConfigImporter.php
@@ -0,0 +1,88 @@
+   * Initialises the config importer in preparation of processing a batch.

While it's fun putting British spelling in core this probably needs a z, same with the method names.

xjm:

Status: Needs work » Needs review
13.84 KB
FAILED: [[SimpleTest]]: [MySQL] Setup environment: Test cancelled by admin prior to completion.
1.33 KB

That is so much more enjoyable from @catch. ;)

xjm:

13.84 KB
PASSED: [[SimpleTest]]: [MySQL] 64,579 pass(es).
591 bytes

Actually "in preparation of" doesn't make sense with a gerund, I don't think.

alexpott:

Status: Needs review » Reviewed & tested by the community

@xjm thank you for fixing my spelling mistakes :)

catch:

Status: Reviewed & tested by the community » Fixed

Committed/pushed to 8.x, thanks!

beejeebus:

so this releases the lock between parts of the batch.

which is another way of saying we don't lock the import process. (because it's unlikely another request will get the lock, right? it'll never happen.)

i guess that was seen as the lesser of two evils?

alexpott:

+++ b/core/lib/Drupal/Core/Config/BatchConfigImporter.php
@@ -0,0 +1,88 @@
+    if ($context['finished'] >= 1) {
+      $this->notify('import');
+      // The import is now complete.
+      $this->lock->release(static::ID);
+      $this->reset();
+    }

Only because of lock's releaseAll functionality which is triggered by _drupal_shutdown_function(). I think we better move discussion of the locking facility to another issue.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.