Follow-up to MR !379. The current splitter uses two checks to decide whether the first segment of the legacy payload is a langcode:
- SHAPE: first segment matches /^[a-z]{2,3}(?:-[a-z0-9]+)*$/i (2-3 letters, optionally followed by hyphenated BCP47 tags);
- INSTALLED: first segment is in the configured-languages list.
If the SHAPE matches but the language is not installed, the legacy row is skipped and left in default storage for admin review.
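For reference, the two-check behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function name split_payload_two_checks is hypothetical, $payload is assumed to be the legacy name with the domain.config.{domain_id}. prefix already stripped, and $languages is assumed to be an array keyed by installed langcode.

```php
<?php

/**
 * Hypothetical stand-in for the current SHAPE + INSTALLED splitter.
 *
 * Returns [langcode or NULL, config name], or NULL when the row is skipped.
 */
function split_payload_two_checks(string $payload, array $languages): ?array {
  $first_dot = strpos($payload, '.');
  if ($first_dot === FALSE) {
    // No dot at all: nothing to split, treat the payload as the config name.
    return [NULL, $payload];
  }
  $first_segment = substr($payload, 0, $first_dot);

  // SHAPE: 2-3 letters, optionally followed by hyphenated subtags.
  $shape = '/^[a-z]{2,3}(?:-[a-z0-9]+)*$/i';
  if (preg_match($shape, $first_segment) !== 1) {
    return [NULL, $payload];
  }

  // INSTALLED: langcode-shaped but not installed means the row is skipped.
  if (!isset($languages[$first_segment])) {
    return NULL;
  }

  return [$first_segment, substr($payload, $first_dot + 1)];
}

// Assumed example site with English and French installed.
$languages = ['en' => TRUE, 'fr' => TRUE];
$translated = split_payload_two_checks('fr.system.site', $languages);
$skipped = split_payload_two_checks('de.system.site', $languages);
```

Under these assumptions, the third outcome (a NULL return) is the "skipped and left in default storage" path.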
Issue
The SHAPE check mis-classifies config names that belong to Drupal modules with 3-letter machine names, because their first segment also fits the langcode shape: eca.settings (ECA), seo.config, geo.…, gtm.…, oai.…, etc.
For a 2.x site that had a per-domain override on, say, eca.settings, the legacy row domain.config.{domain_id}.eca.settings would:
- match the langcode SHAPE (eca is 3 lowercase letters);
- fail the INSTALLED check (eca is not a Drupal-supported langcode);
- be silently skipped and left on disk.
The migration thus drops legitimate legacy module-config overrides on the floor for any 3-letter module on the site.
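The collision is easy to reproduce with the regex quoted above. The snippet below is only an illustration; the list of first segments is an assumed sample taken from the module names mentioned in this issue plus a few real langcodes.

```php
<?php

// SHAPE regex as quoted in this issue.
$shape = '/^[a-z]{2,3}(?:-[a-z0-9]+)*$/i';

// Assumed sample: 3-letter module names, a longer module name, real langcodes.
$first_segments = ['eca', 'seo', 'geo', 'gtm', 'oai', 'system', 'fr', 'pt-br'];

$looks_like_langcode = [];
foreach ($first_segments as $segment) {
  // preg_match() returns 1 on a match, 0 otherwise.
  $looks_like_langcode[$segment] = preg_match($shape, $segment) === 1;
}
```

Every 3-letter module name in the sample passes the SHAPE check exactly as fr and pt-br do, while system does not, so shape alone cannot distinguish a langcode from a short module name.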
Proposed fix
Drop the SHAPE regex entirely and use the installed-languages list as the only signal:
if (isset($languages[$first_segment])) {
  $langcode = $first_segment;
  $config_name = substr($payload, $first_dot + 1);
}
else {
  $langcode = NULL;
  $config_name = $payload;
}

3-letter module configs migrate correctly into the per-domain collection. The trade-off is symmetric to the one called out in MR !379's docblock, but smaller in scope:
- Before: legacy rows with langcode-shaped first segments for languages since uninstalled stay on disk in default storage; legacy rows with 3-letter module-name first segments are silently stranded too. Both invisible at runtime.
- After: legacy rows with first segments in the installed-languages list go to the per-(domain, langcode) collection; everything else goes to the per-domain collection under the original name. A row whose first segment is a langcode for a since-uninstalled language ends up in the per-domain collection under an invalid name -- preserved, never read at runtime, admin can clean up by hand. A row with a 3-letter module-name first segment ends up in the per-domain collection under its real name, where the runtime override mechanism picks it up correctly.
The trade-off favours correctness for the more common case (sites with 3-letter modules) at the cost of a rarely-real edge case (per-domain overrides for uninstalled languages).
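To make the "After" outcomes concrete, here is a small self-contained sketch of the installed-only split applied to the three kinds of row discussed above. The function name split_payload_installed_only and the sample language list are assumptions for illustration; the real migration derives $payload, $first_dot and $languages from its own context.

```php
<?php

/**
 * Hypothetical wrapper around the proposed installed-languages-only check.
 *
 * Returns [langcode or NULL, config name]; no row is ever skipped.
 */
function split_payload_installed_only(string $payload, array $languages): array {
  $first_dot = strpos($payload, '.');
  if ($first_dot !== FALSE) {
    $first_segment = substr($payload, 0, $first_dot);
    if (isset($languages[$first_segment])) {
      // Installed langcode: goes to the per-(domain, langcode) collection.
      return [$first_segment, substr($payload, $first_dot + 1)];
    }
  }
  // Everything else: per-domain collection under the original name.
  return [NULL, $payload];
}

// Assumed example site: English and French installed, German not.
$languages = ['en' => TRUE, 'fr' => TRUE];

$installed_language = split_payload_installed_only('fr.system.site', $languages);
$three_letter_module = split_payload_installed_only('eca.settings', $languages);
$uninstalled_language = split_payload_installed_only('de.system.site', $languages);
```

In this sketch eca.settings keeps its real name, while de.system.site is preserved verbatim as a config name, which is the edge case the trade-off accepts.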
Parallel work on 3.0.x
The same simplification is being applied to 3.0.x's DomainConfigMigration in #3589035 / MR !380. The collection-name strings produced by both branches are byte-identical, so an upgrade path 3.0.x → 3.1.x stays clean: a site that ran 3.0.x's patched migration lands in a state 3.1.x's _update_10002 can no-op past on the data side.
Tests
The kernel test DomainConfigOverrideMigrationTest needs:
- removal of testSkipsLangcodeShapedFirstSegmentWhenLanguageNotInstalled (no longer the behaviour);
- replacement of testLeavesLegacyEntryForUnknownLanguageInPlace with testTreatsUninstalledLangcodeFirstSegmentAsConfigName (positive assertion on the new outcome);
- a new testMigratesThreeLetterModuleConfigEntry pinning the eca.settings case;
- docblock cleanup on testDoesNotMisinterpretModuleNameAsLangcode to remove the obsolete shape-regex framing.
On the local branch: 17 tests, 101 assertions, PHPCS clean. Ready to push.
Issue fork domain-3589046
Comments
Comment #4
mably commented