The current Drupal update framework (roughly: hook_update_N) is quite old. It is still firmly rooted in the procedural origins of Drupal. Also, it has several shortcomings. Some of those shortcomings were addressed in the new(ish) hook_post_update mechanism, but that does not replace the old system, it only augments it for update code that needs the full Drupal framework to be available.
This proposal is inspired by Laravel migrations. These work by defining a class in a pre-determined location and will simply be run “in order”. What in order means we’ll look into later.
Problems with the current system
Sequence numbering
The current system based around hook_update_N will check the “schema version” of a particular module and checks if there are any update hooks with a sequence number higher than that number. It will then execute those, in numerical order, for that particular module. In relation to other modules (an update for a particular module might depend on another update from another module to have run), the order can be influenced by hook_update_dependencies().
It is a clever mechanism, but it does mean that if the solution to an issue needs to add an update hook, that patch/PR ends up chasing the current schema version. Sometimes, a patch is fine, except for the sequence number of its update hook. Running a project with a patch containing an update hook means you need to be constantly on guard whether an update to the module actually introduces a new update hook.
Also, the actual numbering of update hooks is often inconsistent. There are guidelines as to what exactly this number should be. However, only the first digit has any technical bearing, causing the guidelines for the rest of these numbers to regularly be overlooked, even by popular contrib modules. Also, the technical meaning of that first digit, is that it corresponds to the major core version the module is compatible with, which in itself has become a troublesome concept since the new backwards compatible nature of major core versions.
Inconsistent descriptions
A similar problem as the one causing inconsistent sequence numbers applies to the docblock for an update hook; it should only contain a description for the update, which diverges from the regular coding guidelines for function docblocks. (The documentation for hook_update_N not being very explicit about that probably doesn't help; "The documentation block preceding this function is stripped of newlines and used as the description for the update on the pending updates task list, which users will see when they run the update.php script."). This, too, is often missed, causing e.g. "Implements ..." phrasing to creep into the description for update hooks. Other maintainers prefer to add the update number in the description – which actually is not a bad idea – but is not part of the convention so makes for an inconsistent overview of update hooks to run.
Procedural code in an OO world
With Drupal moving to object-oriented code more and more, it is time for the update framework to do the same. OO has many benefits, the discussion of which really is not in the scope of this proposal. It would allow update code to move away from getting essential parts of the Drupal API by calling the Drupal global object and instead use proper Dependency Injection.
The proposal
As said, this proposal is inspired by Laravel Migrations. They serve a very similar purpose to Drupal update hooks, that is, applying changes to the database to adapt it to updated code. One area where we might need to step up is automatically generating code; Laravel migrations get a timestamp automatically when generating them using Laravel’s standard artisan command line tool. Migrations are executed, by default, in the order of their timestamp. For Drupal, this could be a plugin-type named Update (Migration would be too confusing, since we already have a Migration framework that deals with content migrations).
Incidentally, in Laravel, the Migrations also define the database schema. This should probably be considered out of scope here, though; it also means that the database schema is not in a fixed location, but rather the result of the accumulated Migrations. This would not be a design choice I would make.
Although Laravel does support separate packages, interdependency of updates seems less of an issue that it would be with Drupal; the new system would also need a system like hook_update_dependencies to declare dependencies with the updates of other modules. Although it might seem attractive to open up this dependency system for updates from the module itself, this could easily lead to confusing update order and nightmarish dependency resolution; if a dependency arises with another Update from the same module, it should probably just be re-ordered to come after whatever Update it depends on, to keep the order transparent. "How is this different from the current system with a sequential number?", you might ask. The key is that the number is not recorded anywhere, some unique identifier of the update is (like with hook_post_update). This means that an update does not need to change unless there is an actual, technical reason to do so. Incidentally, using the timestamps to come up with a global order by default could potentially also help with implicit dependencies where a dependency actually exists that was not made explicit in code. Not a huge advantage, but nice nonetheless.
Requirements (MoSCoW)
Let’s define some requirements we want a new update framework to adhere to.
- M: Determine whether an update ran based on an Update machine name (as with
hook_post_update) (i.e. not a single sequence number). - M: updates (
hook_update_N) and postupdates use the same mechanism. They should live in the same namespace (Drupal\module_name\Updates?) and use e.g. an annotation to distinguish the type (i.e. ‘database’ for the oldhook_update_Nand ‘content’ forhook_post_update; a clarification in terminology seems in order here). - M: Sequence of execution given any one set of Updates - both within a single module, but also across modules - needs to be deterministic, even if no explicit dependencies were defined.
- M: It is possible to define dependencies on Updates from other modules using an annotation.
- M:Dependency injection is supported. Maybe the possible dependencies are restricted based on update type (see 2).
- C: A way to indicate an Update may run while the system is online (i.e. not in maintenance mode). Content updates can take a very long time, while it often isn’t really essential for the site to be offline while they run. This could also be something that could be manipulated using an alter mechanism, where site developers can opt in to online processing if they deem it responsible for their particular project. This is probably only safe or even possible for post_update-type (‘content’) updates. This would probably require the creation of a new addition to the system that uses a cron queue to process update jobs.
- C: Take into account blue/green deployments. I am not sure how realistic this is, or even how much connection there really is to this issue. It might need a separate ideas issue. The idea behind blue/green deployments is that database and code from one deployment to the next are backwards and forwards compatible, allowing multiple server nodes to be updated out of sync, allowing the site to remain accessible at all times. This seems like a tough challenge with Drupal, where any number of modules might make arbitrary changes to the database.
Examples
Since speaking is silver and code is gold, let's see how this could look.
Simple example
First, a simple one, system_update_8803(). This update is used to enable the new-in-8.8 path_alias module, which turned path aliases into proper entities. This demonstrates the naming of the file, the naming of the class (no real restrictions, except it’s probably a good idea to have the class name to end in ‘Update’, by convention) and the basic Annotation.
20200823_enable_path_alias.php:
<?php
use Drupal\Core\Updates\UpdateBase;
use Symfony\Component\DependencyInjection\ContainerInterface;
/**
* Class EnablePathAliasUpdate.
*
* Example update replicating system_update_8803().
*
* @Update(
* id = "enable_path_alias",
* label = @Translation("Install the 'path_alias' entity type."),
* type = "database",
* )
*/
class EnablePathAliasUpdate extends DatabaseUpdateBase {
/**
* {@inheritdoc}
*/
public function update() {
// Enable the Path Alias module if needed.
if (!$this->moduleHandler()->moduleExists('path_alias')) {
$this->moduleInstaller()->install(['path_alias'], FALSE);
return $this->t('The "path_alias" entity type has been installed.');
}
}
}
The class extends a DatabaseUpdateBase. The idea is that that every update would implement an UpdateInterface, which has several methods to support the update process, the most notable of which would be update(). DatabaseUpdateBase would be a base class that has several conveniences, like some useful dependencies it gets injected by default. In the example we seen moduleHandler and moduleInstaller used. There could also well be a ContentUpdateBase, with possibly different injected dependencies, to provide some guidance what sort of update would be acceptable for each type.
Batched example
Next, a more complicated example. An important aspect of many update hooks is batching. Batching is implemented in a slightly weird way, though. Most notable, it requires the use of a #finished key in the otherwise completely free to use “sandbox”. The example shows how these two principles could be separated. It takes system_update_8804() as its subject, which follows the previous update to actually convert existing path aliases to the new entity type.
20200824_convert_path_aliases.php:
<?php
use Drupal\Core\Updates\BatchedUpdateInterface;
use Drupal\Core\Updates\DatabaseUpdateBase;
use Symfony\Component\DependencyInjection\ContainerInterface;
/**
* Class ConvertPathAliasesUpdate.
*
* Example update replicating system_update_8804().
*
* @Update(
* id = "convert_path_aliases",
* label = @Translation("Convert path aliases to entities."),
* type = "database",
* depends = {
* 'system:enable_path_alias',
* },
* )
*/
class ConvertPathAliasesUpdate extends DatabaseUpdateBase {
/**
* {@inheritdoc}
*/
public function update() {
// Bail out early if the entity type is not using the default storage class.
$storage = $this->entityTypeManager()->getStorage('path_alias');
if (!$storage instanceof PathAliasStorage) {
$this->setCompletion(1);
return;
}
$step_size = 200;
$url_aliases = $this->database->select('url_alias', 't')
->condition('t.pid', $this->sandbox['current_id'], '>')
->fields('t')
->orderBy('pid', 'ASC')
->range(0, $step_size)
->execute()
->fetchAll();
if ($url_aliases) {
/** @var \Drupal\Component\Uuid\UuidInterface $uuid */
$uuid = $this->uuid();
$base_table_insert = $this->database->insert('path_alias');
$base_table_insert->fields(['id', 'revision_id', 'uuid', 'path', 'alias', 'langcode', 'status']);
$revision_table_insert = $this->database->insert('path_alias_revision');
$revision_table_insert->fields(['id', 'revision_id', 'path', 'alias', 'langcode', 'status', 'revision_default']);
foreach ($url_aliases as $url_alias) {
$values = [
'id' => $url_alias->pid,
'revision_id' => $url_alias->pid,
'uuid' => $uuid->generate(),
'path' => $url_alias->source,
'alias' => $url_alias->alias,
'langcode' => $url_alias->langcode,
'status' => 1,
];
$base_table_insert->values($values);
unset($values['uuid']);
$values['revision_default'] = 1;
$revision_table_insert->values($values);
}
$base_table_insert->execute();
$revision_table_insert->execute();
$this->sandbox['progress'] += count($url_aliases);
$last_url_alias = end($url_aliases);
$this->sandbox['current_id'] = $last_url_alias->pid;
// If we're not in maintenance mode, the number of path aliases could change
// at any time so make sure that we always use the latest record count.
$missing = $this->database->select('url_alias', 't')
->condition('t.pid', $this->sandbox['current_id'], '>')
->orderBy('pid', 'ASC')
->countQuery()
->execute()
->fetchField();
$this->setCompletion($missing ? $this->sandbox['progress'] / ($this->sandbox['progress'] + (int) $missing) : 1);
}
else {
$this->setCompletion(1);
}
if ($this->isFinished()) {
// Keep a backup of the old 'url_alias' table if requested.
if (Settings::get('entity_update_backup', TRUE)) {
$old_table_name = 'old_' . substr(uniqid(), 0, 6) . '_url_alias';
if (!$this->database->schema()->tableExists($old_table_name)) {
$this->database->schema()->renameTable('url_alias', $old_table_name);
}
}
else {
$this->database->schema()->dropTable('url_alias');
}
return t('Path aliases have been converted to entities.');
}
}
/**
* {@inheritdoc}
*/
public function initializeBatch() {
$this->sandbox['progress'] = 0;
$this->sandbox['current_id'] = 0;
}
}
First, update hooks that use batching usually start out with some sort of check whether the hook is being run for the first time, based on whether values are set in the sandbox. This shows that with a plugin approach this could just be separated out into a separate method, initializeBatch(). The framework could ensure this is always called first (the base class could include a dummy implementation that does nothing).
Second, this example shows a few other default dependencies from the UpdateBase; entityTypeManager and database. It also shows the base has a sandbox member variable, that functions much like the sandbox parameter passed to update hooks. Again, the update function is not much bothered with it; instead, the UpdateInterface would have a getSandbox() method (implemented by the base) that gets called by the framework. The returned sandbox would then be handled like it is in the procedural update hooks; gets serialized, entered into the batch table, and on the next run, retrieved, unserialized and passed into the plugin using a setSandbox() method.
Similar to startup clauses, finishing the batch is also handled in methods, allowing it to be largely standardized in the base class. The update() method may call setCompletion to let the outside world know how it is progressing (the value works the same as it would when passing it in the #finished key on an old-style sandbox; a float between 0 and 1, or 1 to indicate the process has finished).
Instead of having to check the value directly, the UpdateInterface also offers a method isFinished(). Nothing fancy, but it allows for some nice self-documenting code (the “real” update hook checked for $sandbox[#finished]>=1) before removing the old table and optionally creating a backup of it). The framework would call isFinished() as well, in order to determine if the update has finished, and if not, call a method getCompletion() to get the actual completion number.
How to proceed
Please feel free to discuss this proposal, or completely ignore it, although I am very interested in hearing what people think. This is my first ideas proposal, please be gentle. Let's see if we can modernise yet another part of Drupal!
Comments
Comment #2
eelkeblokComment #3
larowlanThanks for the effort taken to write up such a detailed proposal.
I can recall having the exact same conversation with a colleague about 6 years ago, and I'm pretty sure he'd proposed almost the same solution, in fact I think he even wrote the implementation. I'll see if I can find the issues he proposed
Comment #4
larowlanComment #5
eelkeblokInteresting. Now that you mention it, I do remember reading somewhere that Laravel took several cues from Rails. It indeed seems very similar. I do still think Migration is not a good term for Drupal (because it is already used for something else).
That issue also raises an interesting point: should we support rolling back updates (something I discarded early on as an idea and never really revisited – maybe I should have).
Comment #6
johnwebdev commentedNice write-up!
Laravel does have packages (similar to a Drupal module), which can have their own migrations.
What are these types? How many types do we have?
Comment #7
eelkeblokJust the two types replacing current hook_update_N (which I suggested we call update type 'database') and current hook_post_update_IDENTIFIER (which I suggested we call update type 'content'). This basically comes down to requirement 2.
Ah, yes, that's true. I did consider having some colleagues proofread this, should have done so.. Looks like they disregard the dependency-problem though (which may not be much of a problem for Laravel, not sure). Edit: changed the wording around that a bit.
Comment #8
eelkeblokComment #9
r_a_s_robin commentedNice suggestions Eelke. I do wonder if you should avoid the name 'migrations' or not. If you would adopt the same system as Laravel, you're also kinda adopting the system of Symfony. In both Laravel and Symfony it is named migrations. (and who knows in what other software too). Consistency about the name of a functionality between different toolings can be more important than clashing with another name used within Drupal.
Comment #10
eiriksmI think that sounds like a good idea too. Except if we introduce it now, it will probably become confusing for Drupal users.
I don't remember off the top of my head what framework has this, but the expression "database migration" is also used in the wild. It could feel consistent and meaningful for symfony/laravel users, while being possible to reason about for Drupal users? My 2 cents on naming :)
Comment #11
james.williamsNot sure if this thought is too much of an implementation detail... but plugins are usually autoloaded using the PSR standard that relies on the convention for the namespace + class name matching the file name + path. So having the datestamp in the file name but not the class name could be problematic. I guess the plugin manager would have to do some manual directory scanning and loading instead of relying on the autoloader. Just seems to introduce an awkward amount of complexity.
I really like the ideas around the batch class, though I’m also vaguely aware that efforts / ideas are going into a modernised batch class system somewhere too. So the update system should probably leverage (& inform?) that work rather than making its own stuff.
I’m not entirely convinced about what the problem is that is supposed to be solved by all this, aside from modernisation for its own sake. If it’s that update hooks are awkward with patches, then the problem may be the patch workflow, not necessarily the actual update system. I quite like the simplicity of the numbered updates. (Although I suppose the need for post-updates demonstrates they may be a little too simple.)
Comment #12
eelkeblokWhen boiled down, it is that if parallel issues (or more in general, developments) on the same module both need an update hook, they will be competing with each other for the update number. This is not just a small inconvenience, but a real headache and risk when you are one of these patches in a project; what happens when the base module gets another update hook? It will get a sequence number clashing with the one that your patch has/had and was already applied. Luckily it has remained a theoretical thing for me up till now, but I dread the day I will actually have to deal with this.
You are correct that this will probably not "just" work like regular plugins (although AFAICT the functioning of plugins is not directly dependent on the naming of classes and files; it is dependent on autoloading doing its work, which in turn is based on these naming concentions). Unfortunately, class names can not start with a number, or the problem of naming files and classes would be easy. It is possible to augment autoloading, though, so I did consider this an implementation detail for the moment. Especially since this is exactly what Laravel does, I figured it should be easy enough to figure out.
Comment #13
james.williamsThanks for clarifying. Yeah, let's not worry too much about that naming implementation detail, especially given that it's solved elsewhere already.
I do think this is a very big set of changes/suggestions for what is a theoretical headache. I'm a maintainer of some contrib modules - nothing too big - but I've come across having to handle multiple update hooks across multiple issues/comments. But to be honest, it doesn't bother me too much. Maybe that's just me! I do agree it's not ideal, but it's a big enough problem to warrant changes of this significance. If there was a solution with less impact, I could be up for that though. For the specific issue of keeping up with update hook numbers, that may be something that can be dealt with by changing processes/workflows rather than the system/code itself, maybe.
I don't mean to close the idea down at all, but I am wary of introducing larger change than necessary, or trying to solve the wrong problem(s). I could still be open to the idea, if we're clearer that there really is enough reason to warrant this much change/effort/impact.
Comment #14
dalinSince content is stored in the database I recommend “schema” and “content”.
The idea of running updates with the site online is intriguing, but then content updates get a whole lot more complicated because you don’t want to process entities that were created while your update is running. But maybe more helper method could smooth that out.
- - migrations
+ + updates
But I to wonder about the cost benefit ratio.
Comment #15
eelkeblokJust to come back to the why, the trouble with patches is only a single aspect (and is regardless of the exact workflow; the problem is no different, for example, now that we are moving to a merge request workflow - the problem is the inherent paralellism that is true for any software project that is being worked in in parallel).
However, other advantages were also mentioned;
WRT batch API, when I wrote the proposal, I was only aware of an effort that essentially just wraps the existing API in a generator class (I can't actually find that..). Now I've also found #1797526: Make batch a wrapper around the queue system, which seems a bit more encouraging. Will check it out to see if it suggests any changes to this proposal.
Comment #16
andypostComment #17
effulgentsia commentedThis is a cool idea! I haven't thought about it in enough detail yet to offer more feedback other than just that it sounds great on the surface. But in the meantime, retitling the issue to clarify that this is about running database update functions/methods, as opposed to updating the codebase.
Comment #18
aaronmchaleI like this a lot!
I especially like that we wouldn't have huge update files anymore, each update in its own file.
I also like the idea of being able to auto generate updates, this is something that has always bothered me a little that Drupal can't do, and I feel one of the few things that other frameworks (like Laravel or Django) have as an advantage over Drupal as an application framework.
Comment #19
nicxvan commentedIt should use attributes probably at this point instead of annotations.
The timestamp can be added to a parameter then most likely.
Comment #20
catchThis should really be in the core queue - it's an API proposal, the ideas queue is more for product changes.
Comment #21
catchComment #22
penyaskitoAs much as I would like this:
This is great for _linear_ updates. You have one website, with one upgrade path. Not sure how this translates when you have several versions as a "product" has. E.g. Upgrading from 10.3 to 11.1, jumping 10.4.
One of the cons of the actual system mentioned is catching up with numbering when a MR is in progress. Timestamps might solve this in some cases, but not when those are interrelated, which will happen quite frequently if they are part of the same module. So don't think it would solve this.
I'm not sure if this would change how updates are executed when there are several modules, if needs to change. All from module A, then all from module B? All based on timestamp sorting? What happens in the edge case two bundles have the same timestamp?
Can be problematic, but I guess this is solved in Laravel world with their bundles, I just never faced it.
Even if I love how Laravel works for this matter, not sure if it translates well for Drupal. But I'd love we could do OOO for updates.
Comment #23
catchYes this would need an equivalent of #3108658: Handling update path divergence between 11.x and 10.x, which itself took us over a decade to figure out.
The update hook numbering system is sometimes hard and confusing, but I think that's because especially with core, supporting updates across different possible combinations of major and minor versions (10.3 to 10.4, 10.3 to 11.0, 10.3 to 10.4 to 11.0, 10.3 to 11.2, 10.4 to 11.2, 10.3 to 10.4 to 11.0 to 11.1 to 11.2) is hard and confusing.
The thing I personally always get wrong / have to copy and paste from a different update is the batch handling.
Comment #24
murzIn Drupal 11.1.0 there is a great movement from legacy procedural hooks to OOP, see https://www.drupal.org/node/3442349
But
hook_update_N()is still procedural, so maybe try to reconsider converting it to OOP using the new native Drupal way?