Follow-up / Split-off from #2466097: uasort() does NOT preserve sort order, but core assumes so (Element::children, css sort, ...)
Use cases
- (most common *) An array of arrays, where each item may have a weight in
$item[$weight_key]. - (less common) An array of arrays, where each item may have multiple weights in
$item[$weight_key_0],$item[$weight_key_1], etc. - (less common) An array of objects, where each object may have a weight method e.g.
$weight = $item->getWeight(). - (less common) An array of items of whichever type, with a callback to determine a weight for each item. E.g.
$weight = $weight_callback($item);
(*) The most common example for the first case is render arrays with $item['#weight']. However, these should not be dealt with here, because there are special peculiarities we don't want to worry about. Esp the distinction of element children vs element properties.
#2522712-26: Simplify Element::children() stable sort.
Existing implementations, previous attempts
Currently there are different sort implementations for different instances of these use cases, often with uasort() and a comparison function.
One problem with this is that uasort() is not stable.
The other problem is that comparison functions are slow. Really slow. Compared to other solutions.
Benchmark results can be found in #2466097-91: uasort() does NOT preserve sort order, but core assumes so (Element::children, css sort, ...) ff.
Scope of this issue: Dedicated sort-by-weight
The goal of this issue is to provide dedicated reusable functions (static methods) for stable sorting based on item weights.
These should be simple, fast, and generic.
Example implementations can be found in https://github.com/donquixote/stable/tree/drupal-org-2466097-95-sort-keys
This issue is only about introducing the new functions + unit tests.
Actually using them should happen in another issue.
This way we keep the git history clean, and the issue readable, avoiding distractions.
This can happen in 8.1.x, because it will not BC-break anything.
DX questions
How do we want to distinguish asort() vs sort() ? (Preserve keys or not)
Distinct methods? Or an additional parameter?
How can the caller enable or disable (relatively expensive) checks such as is_array(), or possibly method_exists() etc? Currently in the proposed methods, the caller can use e.g. sortMixedByWeightKey() to force an is_array() check, or sortArraysByWeightKey() to omit such a check. But this brings a clutter of different methods.
Do we want to return a sorted array, or modify the first parameter by-reference, like the native PHP sort functions?
Do we want an options to make the sort non-stable? Just to get rid of comparison callbacks..
Do we want a fluid API?
This would make some of the API design questions above easier. But it would result in a bigger library than what we are ready to put in core.
Fluid API? Injectable sorts?
If we want a fluid API (which is not decided yet), there are two options:
// First option:
$items_sorted = StableSort::begin()
->byKey('weight', SORT_NUMERIC)
->sort($items_unsorted);
// Second option:
$items_sorted = StableSort::begin($items_unsorted)
->byKey('weight', SORT_NUMERIC)
->sort();
Only the first option leads us to reusable, injectable sorts. E.g.
$sort = StableSort::begin()
->byKey('weight', SORT_NUMERIC);
foreach ($items_all as $k => $items) {
$items_all[$k] = $sort->sort($items);
}
Why a new issue?
From #2466097-97:
This way, newcomers or people who want to understand the git history don't have to read 80 comments about an approach that was already discarded.
| Comment | File | Size | Author |
|---|---|---|---|
| #40 | 2757207-nr-bot.txt | 4.35 KB | needs-review-queue-bot |
| #26 | D8-2757207-26-vs-7-StableSort.interdiff.txt | 55.96 KB | donquixote |
| #26 | D8-2757207-26-StableSort.patch | 39.06 KB | donquixote |
Comments
Comment #2
donquixote commentedComment #3
donquixote commentedComment #4
donquixote commentedIt just came to my mind that in our current plans we don't have an option to sort descending instead of ascending.
The same is true for backdrop_sort(), see https://github.com/quicksketch/backdrop/blob/7bee8d0433e7401e7b5ff95790d...
I will add a parameter to choose a sort direction.
Comment #5
donquixote commentedSo far I am considering an algorithm with (simplified):
I just found a possible "weakness" of this algorithm.
(string)'0.0' is not the same as
(string)0.0or(string)'0.00'.https://3v4l.org/Ua5KU
This can lead to a non-stable sort behavior in some cases with SORT_NUMERIC.
I hope we can say it is the caller's responsibility to use "canonical" float representations.
Comment #6
donquixote commentedWhat should be the "neutral weight" for SORT_REGULAR and SORT_NATURAL?
Maybe '' is ok for all of them?
backdrop_sort() uses '' for SORT_STRING, and 0 otherwise. But it says somewhere that only SORT_STRING and SORT_NUMERIC are supported.
Maybe we should use 0 for SORT_NUMERIC, and '' for any other, to avoid this artificial limitation.
Comment #7
donquixote commentedComment #8
donquixote commentedUseful information about SORT_REGULAR: http://stackoverflow.com/a/38112705/246724
Comment #9
fabianx commentedQuick Drive-By-Review: Looks great already!
I think just StableSort is fine.
What about we put that as a protected static configuration map:
SORT_NUMERIC => '',
'default' => '',
Unless we are sure we never want to extend that.
Comment #10
fabianx commentedOh and it would be great if we could start some conversions of things from uasort() to this new method.
And we definitely should do some profiling.
Comment #11
donquixote commentedThanks!
I will wait with a new patch until there is more feedback :)
Ok.
I was undecided about this. I personally like to prepend *Util to classes that are not meant to be instantiated. But this does not seem to be a pattern in Drupal. So I'm fine without it.
People can override the
sortFlagsGetNeutralWeight()method, can't they?But I have a different proposal (which I think should be done more often in Drupal):
Seal the class with "final", and open it up when someone actually has a valid use case for extending.
At this point we can decide to introduce the protected configuration map, or anything else we deem useful. The
sortFlagsGetNeutralWeight()can remain, but will then use the conf map.Btw by "configuration map" I hope we agree this should be treated as a constant, right? No fiddling around with dynamic settings!
And before we think of inheritance, maybe there are better options:
SortInterface, with different implementations WeightKeySort, WeightMethodSort, and a MultiSort which accepts aSortInterface[]parameter for an array of SortInterface objects. This allows to combine different types of sort, so it is not limited to array with weight keys. Implementations can simply call the static methods, I don't care.I would prefer if we could keep this patch small, and also keep the issue discussion focused.
Maybe we can upload proof-of-concept patches somewhere, which are not going to be committed but just show that it works? Maybe here, or maybe in another issue.
I just hate if a "git blame" sends me to a huge commit and then an issue with a really long list of comments, where only a small part of the commit and the issue actually talk about what I am looking for.
And then people complain if I post without having read all the comments...
It would be great if we can avoid this.
Is it ok to do the profiling in the github library at https://github.com/donquixote/stable ?
The code has changed a bit, but performance should still be similar to what was measured there. There is a bit of added indirection due to the
ksortAndMergeGroups()being factored out. But this call is outside of any loops, so it happens only once per sort.Comment #12
fabianx commented#11:
- A proof of concept patch on-top of this patch would be great. It does not need to convert all of core, but show some examples.
- That patch can then be profiled with XhProf / Blackfire.io / Xhprof-Kit to see if there is an effect on overall core performance (should be faster as uasort is quite slow as we saw).
So happy to discuss that in another issue, but some conversions should be shown.
Comment #13
donquixote commentedI just noticed a case in Symfony that uses uasort() in a similar way as we do, likely with the same issues for performance and stability.
For me this is yet another indicator that this code should live in a 3rd party library.
This would also allow to follow through with the "advanced ideas" in #11.
I will proceed with this work as planned within core, until I get another opinion.
Comment #14
donquixote commentedSortArray::sortByKeyString()usesstrnatcasecmp().I suppose this is equivalent to sorting with SORT_NATURAL, but after applying strotolower().
system_sort_modules_by_info_name()usesstrcasecmp().The PHP manual says this is "binary safe". I wonder if the PHP sort functions / flags are also "binary safe", and how we can verify this.
We could rely on the SORT_FLAG_CASE to achieve case insensitivity.
But then the grouping would make the sort unstable.
The correct order can only be achieved by calling
strtolower()while building the groups. But this will likely destroy the performance gain.The next best option is to use
array_multisort()for such cases. Here the only question is whether this is "binary safe".Comment #15
donquixote commentedQuick note:
It turns out that the array_multisort() solution is equally fast as the grouping solution IF all or most items have different weights.
If a lot of items have the same weight, then the grouping solution (itemsByKey) is faster.
E.g. when sorting blocks by title, we can assume that each block has a different title, so array_multisort() is a good fit.
When sorting something by numeric weight, a lot of items will likely have weight === 0, which makes the grouping solution faster.
With string weights, calling strtolower() for each item's weight, which is necessary to achieve a stable case-insensitive sort by string weight, puts the nail in the coffin for the grouping solution, making it slower than the array_multisort() solution. Or even worse when using mb_strtolower() instead of strtolower().
There will be numbers. But for now I am only posting this quick note.
Comment #16
fabianx commented#15: Thank you very much for your research! Much appreciated!
Comment #17
donquixote commentedMore benchmark results! Smaller number = faster.
https://github.com/donquixote/stable/blob/drupal-org-2757207-16-benchmar...
Everyone is invited to reproduce this stuff and play with the code.
Percentiles (medians at different %) are better than averages!
http://apmblog.dynatrace.com/2012/11/14/why-averages-suck-and-percentile...
sortByWeightKey()
php bench/run.phpComparing sortByWeightKey() functions with SORT_NUMERIC, where the weight key is present in some items, but not all.
Size of sortable array reduced to 100.
This means: A sort which checks
is_array()for each item takes 2x the time than it would without this check.Whenever we already know that items are arrays, it is worthwhile to skip the is_array() check.
sortByKnownWeightKey()
php bench/run-knownKey.phpComparing sortByWeightKey functions with SORT_NUMERIC, where the weight key is present in each item, but the weight is 0 a lot of times. The version with _knownKey omits the
isset($item[$key])check. The version with _isArray checks for each item whether it is an array, in addition to the isset() check. _arrayKeyExists uses array_key_exists() instead of isset.We can see that omitting the isset() makes the sort slightly faster. But the difference is not big enough to justify a separate implementation.
array_key_exists() is not a good idea at all, from a performance point of view. Let's just forget about it. This was to be expected, but now we have it in numbers.
Sort by string weight, case insensitive, even distribution
php bench/run-strtolower-evendist.phpComparing sort by weight functions that sort by string weight with SORT_NATURAL, case insensitively.
Precise implementations can be seen in the git tag.
Here each item has a distinct string weight at $item['x'].
It should be noted that the first one, itemsByWeight(), while it looks great, will not work correctly in all cases, because it puts different uppercase / lowercase variations of a string into distinct groups, thus destroying the initial order.
Also the Schwartzian transform solution (google it!) does not behave accurately, because it does not allow any sort flags other than SORT_REGULAR. This is because it compares arrays, not strings or integers.
From those options that actually work, array_multisort() is a clear winner.
Sort by string weight, case insensitive
php bench/run-strtolower.phpComparing sort by weight functions that sort by string weight with SORT_NATURAL, case insensitively.
Precise implementations can be seen in the git tag.
Here 50% of items have distinct string weights at $item['x'], but the other 50% of items have $item['x'] === 'nn'.
It can be seen that the grouping solution (itemsByWeight) is faster if a lot of items have the same (string) weight.
If they don't, then array_multisort() is faster.
Other interesting findings:
- When using a callback, a function name is much faster than a closure. Funny, eh?
- A callback that computes all weights at once is faster than one that we have to call in every iteration. Not really surprising I'd say..
- Using call_user_func($weight_callback, $item) is slower than calling the callback directly with $weight_callback($item). But the latter does not work for some callback types.
But either way,
array_multisort()is the best of those options that actually work.Comment #18
donquixote commentedMaking the
array_multisort()option slightly faster usingarray_combine():https://github.com/donquixote/stable/blob/drupal-org-2757207-18-faster-m...
Output of bench/run.php:
There is a clear improvement from sortByWeightKey_arrayMultisort() to sortByWeightKey_arrayMultisort_arrayCombine().
Comment #19
Anonymous (not verified) commentedI just had a quick go with using fractions to stabilise the array:
Gives the following by your benchmark.
It does solve the issue with casting weights to strings, and decimals not being equal. You could try
(string) floatval($weight)?Other suggestions:
I'd avoid disallowing construction of instances like this, the reason being that really this should be a service in the dependency injection container (such as
utility.array_sort), and if I were going to use this in Symfony, I'd have to override that constructor again, or write a class that wraps that class, just so I can set up a service that can be injected. Really, since Drupal 8 is using the dependency injection component, static methods should be used sparingly (even though this isn't the case.) You can still call static methods on an instance of a class, so you could leave all your methods as static methods; I would personally discourage this also though, as it encourages people to call the static method on the class directly, rather than injecting the service.Comment #20
donquixote commentedThere is a place and a time for everything...
Really I don't think a component like this should be injected.
The sort functions do not depend on anything dynamic, and there is little reason why any component would want a different sort implementation assigned dynamically.
Knowing which sort implementation is being used makes this entire system more predictable, which is good.
If we want injectable sorts that actually make sense, I would suggest something very different:
This would allow a configurable / pluggable sort order for various components. E.g. one could decide that entity types should be sorted alphabetically, but backwards. Or by their string length.
I currently do not see a valid use case for this. But at least it provides actual value.
The implementation can still live in the static methods. Or.. it does not matter.
----
Imo a class should be designed either to be used statically, or to be instantiated, but not both.
And if it is designed to be instantiated, then it should look very different from what is being proposed here.
Well you can't, it is final :)
Comment #21
donquixote commentedIsn't this largely the same as the existing
sortByWeightKey_weightWithFraction()?https://github.com/donquixote/stable/blob/drupal-org-2757207-18-faster-m...
In #18 this has the same performance as array_multisort(), but is less flexible. So maybe just use array_multisort instead?
It would be more meaningful to actually compare this to something else :)
Ideally start with the git tag from #18, it has a nicer way of showing benchmark results.
If you can get faster than array_multisort(), then your proposal shall be considered! We could check the sort flags, and then choose the implementation that is fastest / suitable for the given sort flags.
True. It does prevent unstableness with e.g. '00.1' vs '0.1' vs '.1' vs '.100'.
But so does array_multisort(). So it has to be faster than array_multisort() to be considered an option.
I'm afraid this will kill the performance benefit of the grouping. And sometimes float gives us fraction dirt.
And this means we need different code for SORT_NUMERIC and other sorts.
------
Btw I just realized that SORT_NATURAL, which I suppose uses the same comparison as
strnatcmp(), also breaks stability with the grouping, even without SORT_FLAG_CASE. Because e.g.strnatcmp('0', '00') === 0, see https://3v4l.org/1V8MK.Maybe we have to ditch the grouping altogether, and use array_multisort() in all cases.
Comment #22
donquixote commentedFollowing your floatval() suggestion:
https://github.com/donquixote/stable/blob/drupal-org-2757207-22-float/sr...
floatval()is expensive, but(float)is cheap.I did not encounter any float dirt effects. I don't really know when these occur.
So it shall be
Comment #23
donquixote commentedhttps://github.com/donquixote/stable/blob/drupal-org-2757207-23-keys-mul...
With multiple keys, array_multisort() are sometimes faster, sometimes slower than the grouping solution.
But the results are quite "jumpy".. See
vs
Also it depends on how many items have the same weight.
I would say let's array_multisort() is generally preferable, because it is fast enough, and more reliable and versatile than other options.
The other solution might be ok for edge cases.
Comment #24
fabianx commentedI do agree with #23, array_multisort() also has been already used in core before.
Comment #25
donquixote commentedComment #26
donquixote commentedPatch contains the same StableSort class as in #2759479-25: Proof of concept: Replace uasort() with dedicated stable-sort-by-weight functions, where possible, but for 8.1.x, and without the changes to other code.
Comment #27
donquixote commentedExample for fluid API:
https://github.com/donquixote/stable/blob/drupal-org-2757207-27-fluid-AP...
(The tag still contains all the benchmark stuff and the demo implementations, which should be disregarded this time.)
SortInterface:
https://github.com/donquixote/stable/blob/drupal-org-2757207-27-fluid-AP...
Likely this has some performance overhead compared to the static methods, due to indirection and a bit of extra class loading.
But most of this overhead should be O(1) (constant), or maybe some of it O(n) (linear), because most of the indirect calls and builder operations happen at most once per sort operation.
Sort objects can be cached at runtime, which means that some possibly expensive decisions now only happen once per request, instead of once per sort operation.
This said: So far I am not planning to make this stuff part of Drupal, unless others specifically ask for it.
Comment #33
avpadernoComment #34
xjmThis would be a minor-only addition. Since 8.9.x and 9.0.x are now in beta, I'm moving this to 9.1.x. Thanks!
Comment #40
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.