Problem/Motivation
When a config patch is created it may create lots of entries to add and remove just because the integer keys don't match up.
We need to improve this to show only what is really added or removed. This would become more problematic when things are edited and then the diff shows that everything changed while it in fact only changed slightly.
ie we don't want to keep indexes when they are not relevant.
Steps to reproduce
split something affecting permissions for example.
Proposed resolution
Unfortunately it means we can not do the clever trick with the array utilities.. we have to copy that code to config_split and improve it to handle the situation with integer keys. Maybe we could patch drupal core as well, but maybe we will have to use the config schema and that would be outside of the relam of what the utility can do.
tbd.
Remaining tasks
Write a test with permissions and other config where this is an issue (in general sequences are a pain)
User interface changes
none
API changes
none
Data model changes
slimmer config patch.
| Comment | File | Size | Author |
|---|---|---|---|
| #39 | interdiff_28-38.txt | 2.86 KB | pyrello |
| #38 | 3232243-38.patch | 10.33 KB | pyrello |
| #10 | improve-patch-merge-sequences-3232243-10-FAILING.patch | 1.09 KB | pyrello |
Issue fork config_split-3232243
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #4
pyrello commentedI made a first attempt at getting the logic for sequences working better. Next step is to write a test for this, but I'll probably need some help on that front, since I don't have a lot of experience doing it.
Comment #7
pyrello commentedI'm starting over with a new branch and MR so that I can establish passing tests to start with.
Comment #8
pyrello commentedUgh. Didn't realize that using the ProphecyTrait would be a problem, so I had to remove it to get the test runner not to error out. Also reverted the last commit prior to that to establish the initial passing test.
Comment #9
pyrello commentedLast commit should fix the new test that was failing, but will fail on the existing
testSimpleMergeExampletest. I'm still not totally clear on the logic of what is happening in that test, but it fails on the last assertion.Comment #10
pyrello commentedAdding a patch with just the new permission (sequence) test that was added to the MR in the last commit to demonstrate that it is failing prior to the rest of the changes in the MR.
Comment #12
bircherThis is looking great already.
One small thing we need to add to the test you are adding (in #10): We should also test that the merging works again. I know the sorting can be off, that is a problem for another day.
But basically if you take the config patch (, revert it) and apply it to one of the examples you should get the other one.
We can also add more contrived examples that add and remove things etc with a dataprovider but that is not a hard requirement for a first version.
I think the failing test is testing pretty much this, though in a more involved workflow. So if you can make an easier-to-understand version like I described in the first paragraph of this comment then that will be helpful for all contributors.
Comment #13
birchermaybe it is as simple as switching the boolean value to the preserve integer keys on the merge utility.
Oh and the method should probably not be public, none of this is an API.
Comment #14
pyrello commentedThe problem that is causing the previously existing patch test to fail seems to happen because of the
NestedArray::mergeDeepArray()method, which is unfortunate. When you look at the existing test for this method (https://git.drupalcode.org/project/drupal/-/blob/9.3.x/core/tests/Drupal...), I think it is pretty clear that this use case wasn't envisioned. If$preserve_integer_keysisFALSE, there is noarray_searchto check if the value already exists so that it doesn't get added twice.I'm not sure if, in the world of config, it is a valid use case to have multiple elements in a sequence that are identical to one another or not. If that is a valid use case, then I'm not sure how to solve for both that and being able to merge config that may have duplicative elements that are not intended to be duplicated.
Comment #15
pyrello commentedComment #17
pyrello commentedI opened MR 11 because I was needing to test out the code from https://www.drupal.org/project/config_split/issues/3232667 and this issue at the same time and be able to show progress on the corresponding working that I am doing related to this. I can close it when I am able to create a "finalized" patch or alternately these two issues' MR's get merged.
Comment #18
pyrello commentedBased on my understanding of the issues that were raised in Slack about the necessity of possible duplicate entries in config sequences, it seems like we have two situations that we are trying to account for:
user.role.*.ymlfile.I believe that the first case is actually the more common use case for config and if we were going to solve one versus the other in the near term, that it should be that case. @bircher and I have discussed the possibility of using some sort of plugin system that would provide a finer degree of control over individual cases. I think that is probably going to be necessary without changes to core config schema properties. If such a system comes into being, I still think the default case should be de-duplicating config sequences.
I think MR 9 is mostly accomplishing the first case above.
As the most recent test failure shows, there may still be room for adding some sort of post-merge sorting to sequences, so they will be in a predictable order.
Comment #19
pyrello commentedExpecting this to go from 1 failing test to 2 failing tests.
Comment #20
pyrello commentedAnd then back to 1 failing test.
Comment #21
bircherI just had a crazy idea, what if we make the keys of sequencial arrays string keys with some smart logic before doing the diff?
Like if the array contains unique strings (such as for permissions or dependencies) we make the key
csp_unique $value. Or if the array contains another array we check if it has a uuid or id component and make the keycsp_uuid $value['uuid'].Then after merging we clean it up again and remove all the keys where we added the csp_ prefix. For legitimate duplicates we have its own key, and for arrays which dont have a uuid or where it is not unique we leave it as is or just pefix it but keep it an integer so that it will be removed and added as now.
Basically the idea is to try to identify the elemnents better so that the diff doesn't compare by identifying it as "the first element"
Comment #22
pyrello commentedWe had discussed how we might handle things like
core.entity_view_display.node.*, especially in cases where Layout Builder is turned on and being used to provide a content type template. I think ideally, we would be able to have some sort of object comparison go through and figure out what is different for each object it finds. But I also think that how it is working right now is fine for the time being.Here is an example of a patched config file of the above mentioned type: https://github.com/uiowa/uiowa/blob/0115ed14b9ed5115f21c149301d3a16488f2...
The issue we are trying to solve in this thread is something that makes CS2 unusable for certain types of configuration, which would be necessary for us to move forward with using it. Solving for more intelligent LB template handling and similar use cases seems like more of a nice to have that we could add in the future. So I propose a separate issue for tackling that and decoupling it from this issue.
If we are removing that from consideration, then I think we just need to figure out how to fix the sorting problem in the tests and make whatever adjustments are necessary there to make the tests work in a way that we have confidence in them.
Comment #23
alokbhatt commentedI was facing an issue while run config-split:import for user role permissions, in which I was changing permission for some roles, so I added a few permissions with config-split.patch file. Once import config-split, I noticed that some of the permissions has been changed assigned in main user.role permission configuration file and due to that admin toolbar was removed for a role.
I have applied a patch https://git.drupalcode.org/project/config_split/-/merge_requests/11.patch and that fixed the problem. Now only permission affected written in config-split.patch file.
Thanks for the patch.
Comment #24
pyrello commented@alokbhatt - Just a word of warning that work in this issue is ongoing and so the format could yet change.
Glad to hear the patch worked for you!
Comment #25
pyrello commented@bircher - If you are okay with deferring the more complex object handling that we have discussed until this issue: https://www.drupal.org/project/config_split/issues/3238855, then I think the only thing needed here is to resolve the issue that causes the merge test to fail, which is that the merged sequences don't match because we are not sorting them. I'm not 100% sure that the test itself doesn't cause this problem because we are using fictitious config that is not associated with a schema that would allow the Sorter to do its thing. Some guidance on how to proceed would be appreciated.
Comment #27
pyrello commentedThis solution resolves the failing test. Looking for feedback on whether this approach is okay.
Comment #28
pyrello commentedAdding a patch that I can reference from my project. Based on my testing, this seems stable enough for me to use for the time being.
Comment #29
bircherThis looks already very good. I am just a bit afraid of the sorting, but maybe it is just me being paranoid.
I will have to test this a bit more with some real data and maybe commit it in two steps so that we have more traces in the git log of how the array utility methods change.
It may be hypothetical (and in Drupal there is always someone that has a case where it becomes a practical case) but I don't know if this could handle things that are on purpose duplicated in sequences. (like the same layout builder block twice?).
That is why I thought it could be an option to do something more defensively, ie change keys from numeric to predictable strings in cases we know what to expect (ie know the type of values etc).
Also if this works for you and you are not @pyrello or @alokbhatt (and not working on the same teams) then by all means chime in and let me know! The more people that say that this fixes their problem and nobody that says that it makes things worse the more confident I am with the approach.
Comment #30
pyrello commentedFor what its worth, I believe that for block in layout templates in the
core.entity_view_display.node.*, each block (even duplicates) would have a different UUID, so this would not be an issue in that case. I can’t say for sure that this would never come up (your point about there always being a practical case for hypotheticals in Drupal is well taken). My testing with those particular files has seemed to indicate that they patch as expected.I get being paranoid about the sorting. That is that part of this that I am the most unsure about. But, it also doesn’t break any existing tests, so it is not entirely clear without additional information what the problem with it might be.
Comment #31
tedfordgif commentedHere are the configs in one of my codebases that have explicit integer array keys:
The integer key in devel.settings is for the error_handlers, which is used in devel.module:
system.site has 403 and 404, but those are siblings of the "front" key, so this would technically be OK with your patch. Feels a little squishy to me, if not fishy.
For views.view.* I have four scenarios:
Why don't we just make all configs strongly typed? <ducks>
Comment #32
pyrello commented@tedforgif Thanks for bringing up these cases! This truly is not an easy problem to solve.
I have no idea how to resolve the case of
devel.settings.yml:I think by the time we get to the point where we are doing the array merge, the above would be indistinguishable from:
For the
system.site.yml, that is exactly the reason why it only considers the keys to be numeric if all of them are numeric. I'm open to suggestions about how to improve it.I'll have to look into views a bit more. While I don't think we are currently using config split to override any existing views (which is where this patch merge code would come into play), it is something I had been thinking we would be able to take advantage of after moving to Config Split 2.
I will agree that this solution breaks (or potentially breaks) some of the above config entities if they are partially split. However, I would ask, does that seem like a bigger bug than the way it currently handles something like a
user.role.*.ymlwithout these changes? Here is an example of a patch export of a permissions file from my project: https://github.com/uiowa/uiowa/blob/b2f9d8847917aff4ca62637fe1055fffa931...Comment #33
tedfordgif commentedEdit: I know these don't address the "do we ignore the keys" question, but they are related. I don't have time to track down the existence of the "ignore?" ticket.
Comment #34
pyrello commentedI don't think that first issue will have any affect on the problem with sequence handling outlined by this issue. That issue is driven by the fact that the merging method used doesn't have an elegant way of handling sequences (which really just reflects the same underlying problem with PHP's
array_merge_recursive()function).I added a test to cover the
devel.settings.ymlcase.Comment #35
bircherhmm instead of checking if the keys are integers, I think we could use the same logic the yaml formatting is doing to decide whether to use a dash or the number. (I am not suggesting to call that function but to copy that code or idea)
I think it comes down to check if the array keys match
range(0, count($arr))or similar.I think it would also be good to clearly see where we deviate from the core utility.
I am not sure that we shouldn't also be able to deal with duplicates, but I don't have any good example of any config where that would be a thing. But for that I think the only way is to transform the keys so that this information can be kept.
What I like about this is that it solves a lot of problems.
And I think we should not interfere with the sorting here, its fine if it gets added to the end. The sorting should be a different concern. So for the test we can add a sorter that sorts everything recursively and add a dummy test name (I will refactor it anyway sooner or later to make the name required)
Or alternatively we make assertions that ignore the order.
Comment #36
pyrello commentedTo clarify, if we are going to do a check for duplicates, this would be within each array prior to merging. The merge by its very nature might produce duplicates and we wouldn't want to keep those. So, we would need a check prior to looping through the arrays to check this and to set the
$is_sequencevariable (introduced in the last commit) to be FALSE if duplicates were detected.Comment #37
pyrello commentedFor duplicates, I think we could add another private static method that essentially does something like this: https://stackoverflow.com/a/43635394/1375741 and checks each array before processing and sets
$is_sequenceto FALSE if duplicates are present.However, since we are not actually aware of any cases where valid duplicates might be used, would it make sense to leave this as is, for now, and add the duplicate check later if someone actually reports it as an issue?
Comment #38
pyrello commentedAdding another patch based on current progress.
Comment #39
pyrello commentedMessed up the interdiff. Here's a new one.
Comment #40
bircherRE #37: Yes that makes sense, we can ignore duplicates for now, and then deal with it when there is a real case.
I suspect that your proposed solution for duplicates won't work though because we would need to detect duplicates also when creating the patch. So we could have duplicates before merging, we could have duplicates in the patch or we could have duplicates in the array before applying the patch.
So in order not to get the perfect in the way of the good I think we can live with it for now.
Just a little thing: the new methods need to be private because they do certainly not constitute an API and I can already see people using them and then their code breaks... All of this here is just a sandbox to find out what an API in core or another module would have to do.
Comment #41
bircherI'll give it a proper review when I have some more time, but for now the public methods need work. (ie the tests need to be updated too)
I would prefer if we tested it through the public api of creating the patch and applying it, but I can live with a closure that uses
callto access private functions of other classes in a test or even reflection.Comment #42
pyrello commentedSeeing what happens with the tests.
Comment #43
pyrello commentedI haven't quite figured out the
testNonSequenceNumericKeyMergetest yet, but I think I have the rest of them working. If its obvious what I'm missing about how that test should be set up, let me know.Comment #44
pyrello commentedComment #45
bircherAh that is because the test you copied is way too complicated for what we want to test here.
All we need is:
$configA
$configB
then we create a $patchAB and $patchBA
then we assert that the patch applied to the config gives the other config. (ie it works both ways.)
And that one patch is the inverse of the other.
Comment #46
bircheroh and of course the config A and B could come from a data provider, and we could optionally add a expected patch.
Comment #47
pyrello commentedRefactored the test for numeric keys which are not a sequence. In the process of doing that, I noticed that the diffArrayRecursive method was not respecting the non-sequence keys, which was causing the test to fail. I refactored to fix this issue.
Ready for review!
Comment #48
pyrello commentedChanging the logic in diffArrayRecursive broke another test.
Comment #49
pyrello commentedThe test was implicitly using a mixed array (numeric and string keys), which is not what I think we are trying to test.
Comment #50
bircherI refactored the tests a bit. I hope it will be easier in the future to add test cases to the provider.
I also added it to the beginning so that it is the first thing you see and not the complicated test I made for myself.
Comment #51
pyrello commentedThese changes look great! Much more robust testing tools for the patch side of things!
Comment #52
pyrello commentedI checked with @tedfordgif and he indicated that he was okay with the changes here. I tested again against my setup and am still getting consistent results for
user.role.*config files on export. Marking as RTBC.Comment #54
bircherThanks for all the contribution!
This is a big improvement even though we left some cases unaccounted for.