Problem/Motivation
The Drupal\Core\KeyValueStore\DatabaseStorage uses a configurable serializer to store PHP objects. This serializer is injected as a service in core. The pre-configured serializer used there Drupal\Component\Serialization\PhpSerialize:
keyvalue.database:
class: Drupal\Core\KeyValueStore\KeyValueDatabaseFactory
arguments: ['@serialization.phpserialize', '@database']
serialization.phpserialize:
class: Drupal\Component\Serialization\PhpSerialize
This architecture allows contrib modules like https://www.drupal.org/project/igbinary to swap the serializer. For example igbinary shrinks the size of larger serialized PHP objects to one third without losing any performance. (In some cases the performance is increased as well.)
Having this kind of memory saving would be a huge benefit for all kind of drupal caches that store serialized PHP objects, especially if there using in memory solutions like memcache. Systems like redis or even the database will benefit from igbinary as well.
Unfortunately the Drupal\Core\Cache\DatabaseBackend and the Drupal\Core\Cache\MemoryBackend use a hard-coded PHP serialization. Therefore the could not be easily improved by modules like https://www.drupal.org/project/igbinary
Proposed resolution
Follow the pattern of Drupal\Core\KeyValueStore\DatabaseStorage and Drupal\Core\KeyValueStore\DatabaseStorageExpirable and inject the serialize as a service to the Drupal\Core\Cache\DatabaseBackend.
The default configuration in core should then "remain" as it is and inject Drupal\Component\Serialization\PhpSerialize.
Even if this improvement makes sense for the Drupal\Core\Cache\MemoryBackend as well, @catch decided to not modify it in core. This backend is heavily used in the tests which will be more complicated afterwards. Therefore the igbinary contrib module will provide it's own backend as drop-in replacement.
As proposed by @berdir we introduce a "new" service serialization.objectaware.default that is in core just an alias for serialization.phpserialize. The redis module will leverage this one as well as soon as this patch gets committed. This new default service eases the task for the modules like igbinary to switch (most of) the serilaizers system wide.
Remaining tasks
Review the existing patch.
User interface changes
None.
API changes
Similar to Drupal\Core\KeyValueStore\KeyValueDatabaseFactory and Drupal\Core\KeyValueStore\DatabaseStorage the constructor of Drupal\Core\Cache\DatabaseBackend an its factory need to be prepared for dependency injection.
In addition we define a new interface Drupal\Component\Serialization\ObjectAwareSerializationInterface that just extends Drupal\Component\Serialization\SerializationInterface to clearly indicate that implementing Serializers are suitable for PHP Object serialization.
Old:
DatabaseBackendFactory::__construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider);
DatabaseBackend::__construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, $bin);
New:
interface ObjectAwareSerializationInterface extends SerializationInterface {}
DatabaseBackendFactory::__construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, ObjectSerializationInterface $serializer = NULL);
DatabaseBackend::__construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, $bin, ObjectSerializationInterface $serializer = NULL);
These API changes don't break BC!
Data model changes
None.
| Comment | File | Size | Author |
|---|
Issue fork drupal-839444
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 839444-make-serializer-customizable
changes, plain diff MR !6296
Comments
Comment #1
jbrown commentedThis is D8 material.
Comment #2
ogi commentedsubscribe
Comment #3
rbayliss commentedI lhad no idea json_encode() was that much faster. Still, json_encode's inability to restore objects as anything but stdClass makes this less worthwhile IMO, especially now that we have the class registry. Maybe it's worth looking into alternative serialization for specific places we store large simple arrays though. I'm looking at drupal_get_schema() and _theme_load_registry(). We can always json_encode() (or whatever function) the data before passing it into cache_set() as long as we know how to decode it on the way back out.
Comment #4
damien tournoud commentedI have done some limited benchmark with serialize / igserialize / json_encode on PHP 5.3, and it seems that serialize is not slow anymore.
Comment #5
keichee commentedplease download the benchmark at
https://github.com/jbergstroem/instabench
please download the php version PHP 5.4.0alpha1 (latest version) in PHP
The latter handles most of the PHP data structures, while the former only handles Javascript arrays (a type of data that doesn't exist in PHP) and Javascript plain objects (that are relatively similar to PHP arrays).
> i think there are two PHP data structures, arrays and objects which json handles better
The latter is binary safe, while the former only handles valid Unicode strings (try json_encode(chr(255)), for example
> if there will be error in the code, i will later patch all invalid unicode strings to json friendly version
please do see the benchmark as serialize are intensively used by drupal, if this benchmark is correct, drupal will be 2.7x faster
i definitely want to choose 2.7x faster drupal and later problem unicode strings like chr(255)
how about others?
please check attached patch if will succeed, many thanks
others please post your benchmark, and let this patch succeed!
imagine, almost 3x speed than the latter. wow!
Comment #6
keichee commentedmore benchmarks, raw screenshots attached
imagine up to 4x speed of drupal than the latter, wow!
please test my patch, many thanks!
Comment #7
damien tournoud commentedThis is just a stupid benchmark. Take anything non trivial (like the attached - we serialize a node generated by Devel Generate), and you will get:
Comment #8
damien tournoud commentedYou can even better results when testing the
unserialize . serializeroundtrip.Comment #9
keichee commentedcan you review my patch if it will return success?
my goal is to change serialize/unserialize to drupal_json_encode/ drupal_json_decode
to make encoded versions in blobs in database more readable and more html5 and javascript friendly
against serialize versions.
there are no native serialize out there, but json can be directly readable to javascript.
added some tests in simpletest so it will return success.
Comment #11
sun#921300: Use json_encode/decode() instead of un/serialize() has been marked as duplicate.
Lessons learned from that issue: The problem with JSON is that it doesn't allow for mixed non-scalar data types. It also does not support 1) serialized class instances and 2) PHP references.
But Drupal stores a lot of serialized data containing mixed data types. The most obvious use-case is:
And that's just one of many examples throughout Drupal core. That's why I stopped working on aforementioned issue.
The only viable option I could see would be to introduce an optional flag, in particular for cache_set(), that would allow the caller to say "I absolutely and definitely know that this data is an array and doesn't contain any objects or references, so you may use JSON for serialization." -- combined with a new or revised "serialized" schema column that tells how to unserialize for each record.
Comment #12
star-szrClosed #1340260: Support for igbinary as a duplicate of this issue.
Comment #13
quicksketchThe topic of this issue doesn't really match the patch in #11, so I'm moving this back to active. I also get the feeling that the performance of unserialize() may depend heavily on which version of PHP you're running. In 2010, PHP apparently made an adjustment to HEAD that sped up unserialize signficantly. In that issue at least they say it couldn't be included in PHP 5.3 for compatibility reasons, so I'm guessing it's only in PHP 5.4.
I was doing research into the igbinary serializer today and found this issue. Against PHP 5.3.22 I didn't find igbinary serialize to be that much faster. According to XHProf, igbinary was only marginally faster (with its "compact_strings" option turned on).
However, the *size* of the serialized data was significantly reduced. Most entries in the cache_* tables were reduced by about half, and entries that had a lot of redundant data (such as cache_form) reduced entries to a 1/3 - 1/4 the original size. Larger reductions are seen the larger the form being serialized.
So overall, yay, I think I like igbinary_serialize() and igbinary_unserialize(). Combined with the reduction in overhead of transferring this data to the database or memcache, you could get some modest improvements.
However my opinion on this is that making the serializer customizable (per this issue title) is unnecessary. unserialize() is already getting faster in newer versions of PHP (apparently, I don't have numbers on that), but more over, if you want to replace the largest uses of serialize/unserialize, you can do so already by providing a custom cache back-end. Alternatively I think it would be a great idea for existing cache backends (Redis, Memcache, etc.) to support using igbinary_serialize() automatically if it is available. The largest unserialization strings we have (and most of them) are coming from cache_get(). Since this is already swappable, I don't think any additional effort is required on the part of core.
Comment #14
thmnhat commentedHey, sorry for digging this up but has anyone had solution for igbinary support? I'm thinking about swappable handler also, just like quicksketch mentioned above but still don't know how to implement it. Because when the cache is built with native serializer, and right after that we get and set new cache with igbinary, everything will be broken
I already made a copy here https://github.com/in2pire/drupal-igbinary just for research. Replaced all serialize() and unserialize() with igbinary functions.
Hope that we can find some ways to add the support for igbinary soon.
Comment #15
grom358 commentedI was talking to Pierre Joye at DrupalSouth and he mentioned we should not be using serialize due to a) it can break between PHP versions. b) its not secure. After this discussion it should perhaps be raised to higher priority.
Comment #16
Anonymous (not verified) commentedSee the contributed module for Drupal 7: https://www.drupal.org/project/igbinary_cache
Comment #18
DaPooch commentedigbinary_cache is a step in the right direction but there's a lot of other contributed modules that use serialize/unserialize function calls directly that could benefit. Truthfully it's unfortunate that there's not a direct drop-in replacement on the PHP level to handle this. Seems like it would make the most sense to do it there so that the syntax for serialize and unserialize could be independant of the actual library doing the serialization. They already do this with the session storage option would be nice to expand it into the functions themselves too.
Comment #20
hchonovComment #21
mkalkbrennerI created https://www.drupal.org/project/igbinary
It enables igbinary for Drupal's key value stores.
The next step would be to create a set of core patches to delegate serialization to services implementing Drupal\Component\Serialization\SerializationInterface instead of calling serialize() and unserialize() directly. In core they should leverage the PHPSerialize by default, just like the key value stores.
Comment #22
mkalkbrennerComment #24
mkalkbrennerDrupal-like hard-coded dependency injection :-(
Comment #25
mkalkbrennerComment #27
mkalkbrennerSorry, #24 was incomplete.
Comment #28
hchonovComment #30
mkalkbrennerComment #31
tstoecklerI think this looks great. Does it make sense to introduce a
CacheBackendBaseandCacheBackendFactoryBasefor this?Comment #33
mkalkbrennerI don't think so. Other Backends don't need that serialization. For example APCu handles serialization by itself (and could be configured in php.ini to use igbinary).
Comment #34
mkalkbrennerComment #36
mkalkbrennerComment #38
mkalkbrennerComment #39
mkalkbrennerComment #41
mkalkbrennerComment #43
mkalkbrennerComment #45
mkalkbrennerComment #47
mkalkbrennerComment #48
mkalkbrennernitpicks
Comment #49
hchonovKernel tests have access to the container so instead retrieving the php serializer through \Drupal:service you should use $this->container->get.
Beside that great job!
Comment #50
mkalkbrenner@hchonov: your comment is correct. But all the code nearby in the tests uses \Drupal::service(), for example:
For me it looks inconsistent to use $this->container-get() just for the serializaation service there. Maybe all of these service() calls have to be replaced, but that seems to be out of scope for this issue.
Comment #51
hchonovThis could be done without array_filter :
$this->traitSleep();
return ['_serviceIds'];
Also we need a comment here why simply returning $this->traitSleep(); is not sufficient.
Comment #52
mkalkbrennerThat's not true. This code always returns _serviceIds even if $this->traitSleep() didn't add that property!
Comment #53
hchonovBut it will add it always.. and thinking about it is correct for the method to add the property always as otherwise you do not have to use the trait. Take a look at DependencySerializationTrait::__wakeup it does not check if the property is set but accesses it directly.
Comment #54
mkalkbrennerI reviewed the code. You're right the property _serviceIds is always set.
Comment #56
mkalkbrennerre-roll against 8.3.x
Comment #58
mkalkbrennerThat's funny. I had to fix all new usages of the MemoryBackend in the tests. But these were erroneous anyway because they use the obsolete parameter 'bin' which was removed in 8.3.x.
Comment #59
hchonovI think the comment in the overridden sleep method should be something like this :
What do you think?
Comment #60
mkalkbrennerI randomly picked the key "locale.translation_status" from the key_value table from an existing installation. The size of the php serialized string is 27k the igbinary serialized is 7k :-)
I'll go on to get some more numbers and update the issue description.
Comment #61
mkalkbrennernew in 8.3.x setiings:
For sure this feature will benefit from using igbinary if this core patch is committed because of the much smaller memory footprint.
Comment #62
mkalkbrennerComment #63
mkalkbrennerAfter a discussion with @catch in IRC I came up with the idea to define an interface that ensures that only serializers that are able to handle PHP objects could be injected.
Comment #65
mkalkbrennerOK, the tests pass, except the ones that currently broken on the 8.3.x regardless of this patch here.
Comment #66
twistor commentedSeems like this patch could be simplified greatly if PhpSerialize was set as a default argument. At least for the MemoryBackend.
It's also a BC break, but not sure how those are counted.
Comment #67
mkalkbrennerDo you have an idea how to do this?
Comment #68
mkalkbrennerComment #69
twistor commentedComment #70
mkalkbrenner@twistor: Sure, that works. But my quick comment wasn't precise enough, sorry :-(
I agree that this will reduce the size of the patch. But that only affects test code.
But such a constructor will open the door for bad usages of the MemoryBackend if it could be instantiated without thinking about the serializer.
Especially for the MemoryBackend I want to inject a compressing serializer.
And the serializer service is a singleton. There's no need to create multiple instances of it - beside in test code.
So the correct default would be to get the serialization.phpserialize service. But this will not work in the test code when the service is not available.
Conclusion:
- Having this default will not simplify the code base, just the test code.
- Having this default allows developers to accidentally bypass the configured serializer.
What we can do is providing a derived TestMemoryBackend within the test code.
Comment #71
mkalkbrennerI adjusted the patch according to a discussion with @catch in IRC.
I renamed the interface to ObjectAwareSerializationInterfacea and reverted the changes to the MemoryBackend.
A new MemoryBackend will then be provided by the igbinary module.
Comment #72
mkalkbrennerComment #73
mkalkbrennerComment #74
mkalkbrennerJust opened the follow-up like @catch suggested.
#2839862: Ensure that serializers injectable into Drupal\Core\KeyValueStore are aware of PHP objects
Comment #76
mkalkbrennerJust the annoying standard 8.3 CI fails. The patch itself passes as the retest proves
Comment #77
berdirComing from #2143149: PHPRedis and igbinary support for read performance and memory reductions..
what about defining a new service that's specifically for this purpose? Something like serialization.object_serializer or some other generic service name. Then your igbinary module could just override that service and anything that uses it will automatically benefit from it and it won't need specific redis and $other_module integration?
Finding a good name for that might be a challenge.
I could imagine this is a class that has some custom subclasses in contrib and changing the constructor could break them.
what we did recently is to add the new argument last and fall back to \Drupal::service() if not provided.
Comment #78
mkalkbrennerThat sounds like a simple solution but I don't think that it would be a good solution. With the igbinary module installed you have 4 object aware serializers:
For redis it is very important to select a compressing serializer. For the default database cache backend and the key value stores you might want to choose compression as well. But I think you'll sooner or later face a situation where you don't want to have compression for a particular sub-system.
Or you use igbinary and compression for redis but you have another component that somehow shares serialized objects with a different system or exports serialized objects. In this case you should use the default PHP serializer without compression for the maximum compatibility.
I thought about that as well. I did it the current way to follow the given pattern of the key avlue store implementation that already exists in core. But if the maintainers prefer this solution I'm fine to change it.
Comment #79
berdirI think we might be able to "solve" that with a service name that explains what it is. serializer.optimized_internal sounds weird, but something in that direction.
I assume the developer that defines a service knows whether he can use that or needs to use phpserialize. And if we have a name that explains thoes developers when to use it and when not, then I don't see a problem with your use case?
Comment #80
mkalkbrennerBut if you introduce that service at least the key value store and the database cache backend will use it in core. And redis in contrib :-)
In this case - just like you proposed - it would be easy for the igbinary module to swap that service and to enable the "igbinary compressed" serializer for all of them.
But if you use mongodb with it's internal compression for the key value store and redis for the cache you need to different flavors of that service.
I know that these are edge cases ;-)
Finally that additionally service won't hurt and if others agree, I'm fine to introduce it.
Comment #81
mkalkbrennerI prepared a version of the patch that reflects @berdir's suggestions.
Comment #83
mkalkbrennerComment #84
mkalkbrennerComment #85
berdirAs commented in IRC, lets keep the explicit argument here, at the new position. The fallback is BC, and we don't want to rely on our own BC, so it will be easy to remove in 9.x
Comment #86
mkalkbrennerComment #88
mkalkbrennerComment #90
mkalkbrennerComment #92
mkalkbrennerCome on CI ...
Comment #93
wim leersHow was I now aware of this issue all this time?! :O
Awesome, how have I never heard of
igbinary?!Let's add a comment to this to explain the purpose of this alias.
This name sounds a bit strange to me. What's "object-aware" about this? It seems like it's just a serializer capable of handling any PHP data type, there's nothing object-specific about it?
Also, what does this add on top of
\Drupal\Component\Serialization\SerializationInterface?The comment does not make this clear at all for me.
"serialization class" vs "serialization interface" vs "serializer"
Let's please be consistent.
Why make this optional? For BC? https://www.drupal.org/core/d8-bc-policy says constructors do not need to maintain BC.
Comment #94
berdir4. We technically don't, but this is becoming a standard pattern, specifically in classes that are likely to be overriden by contrib. I know there is at least one customized cache backend in contrib that implements a max ttl, which likely overrides this. It might or might not have additional constructor arguments, not sure.
Comment #95
mkalkbrenner@Wim Leers:
2. @catch and myself agreed on the requirement for such an interface and that Interface name. Have a look at #71. JSON and YAML are suitable for other PHP types except objects. @catch also requested a follow-up to leverage the new interface for the key value store. See #2839862: Ensure that serializers injectable into Drupal\Core\KeyValueStore are aware of PHP objects
3. This inconsistency already exists in core. Have a look at the key value stores which are the blue print for the patch here. If we decide to clean that up, it should be a larger follow-up.
I agree on 1 but there should be a decision on 4 before a final patch.
Comment #96
wim leers#94: Yep, for a cache back end, that's fair.
#95.2: that sounds like a great reason… but can you then just add that reasoning to the interface itself as well? Then this would've been instantly clear :)
#95.3: Ok, but then let's at least be locally consistent.
Comment #97
mkalkbrennerAs discussed with @berdir in IRC there's no need for an object aware serialization default service in core and for example the redis module. Each component should simply use the serialization.phpserialize as it is. That's the defacto default.
I think I now addressed all concerns.
Comment #98
wim leersJust one incredibly stupid nit. I'm fine with this now. I'll leave it to @berdir to RTBC.
s/serilization/serialization/
Comment #99
mkalkbrennerFixed the nitpick :-)
Comment #100
berdirOk, lets do this. Will have to try out the igbinary module on our sites :)
Comment #101
berdirThat said, I think this belongs in the cache component.
Also not sure if this needs a change notice or not, e.g. for the new interface. It is a new thing, but is also something that very, very few people will need to worry about I think, so might be more noise than necessary?
Comment #102
berdirComment #104
tstoecklerThis failed on the new test introduced in #2793849: Handle offcanvas differently at lower widths. The question is: Is that a new random failure or is that related in some super bizarre way? Sending for a re-test anyway...
Comment #105
tstoecklerAhh, that was reverted now, so assuming this will come back green then.
Comment #107
mkalkbrennerRetest suceeded.
Comment #109
tstoecklerBot vs. human: Episode III
Comment #111
catchI'm still not 100% sure about this, for a couple of reasons, didn't really think of the second one during the irc conversation referenced above:
- This adds an additional dependency for the cache subsystem, which in the default case is required entirely to call serialize(). That's loading an interface and a class to call a PHP native function in 99.9% of cases. If we ever do a subtree split, that'll be even more obvious.
- I'm not sure how many classes we have to load to serve a cached page with the database backend, but as above this adds at least one interface and one class to that number. On a site not using the APCu class loader (and using the database cache, so not that unlikely) this might even be measurable.
Seems like we could refactor the serialization logic in the database cache backend to make it easier for the igbinary contrib module to extend from as an alternative.
Comment #112
mkalkbrennerWe use Redis for most of the caches.
The igbinary module simply swaps the serialization.phpserialize service to do the job for the database and the redis backend at once. The same would happen for any other backend that uses the serialization sevice.
From my point of view this default isn't a good choice at all. For a lot of use cases it makes a lot of sense to at least compress the serialized data. That hard-coded call to serialize() prevents us from optimizing the caching.
Comment #113
hchonovRe-roll only.
Comment #114
catchIt doesn't prevent it though. We could refactor the database cache backend so the serialize() call is in a method, then igbinary module could subclass and change that one method. Even as it is now it'd be possible to subclass, just not very efficient due to amount of duplicated code necessary.
Comment #116
kfritscheSimple re-roll of #113
Comment #118
kfritscheFixing the failing test.
Comment #120
borisson_This looks very similar to #2886405: Cache backend should use the serializer service., do we close one of them as duplicate?
Comment #121
Andre-BSome related work is being done over at #1281408: Add a compressing serializer decorator as well.
Comment #122
wim leersAdding related issues per #120 + #121.
Comment #123
fabianx commentedThis is just an idea, but perhaps to ensure something like #1281408: Add a compressing serializer decorator is possible, the serializer should potentially have control of the 'serialize' data field of the calling code.
So that several chained serializers can store their formats and don't need to create new objects, which need to be serialized again.
Comment #124
mkalkbrennerI just want to mention that https://www.drupal.org/project/igbinary is prepared for this patch here since 18 months.
Comment #125
andypostI think it's really bad idea to add this extra overhead (function calls + loading service) to every cache operation
PHP has common way to define serialization and I remember a lot of issues when "milti-head" was configured differently for php serializer
Making another wrapper for build-in php stuff makes it hard to understand/learn for other newcomers
If you wanna make serializer configurable - it require to replace all places where
serialize()orunserialize()used + all contrib which rely on itComment #126
fabianx commentedI think it is fine to allow contrib to override this easily, but what about we add an:
instead of injecting in core.
And yes switching serializer on the fly is always asking for trouble.
Comment #127
mkalkbrennerYou can swap the serializer for igbinary this way, but I don't think that you can replace the php serializer by a compressing one this way.
There're multiple serializers with fallback strategies within https://www.drupal.org/project/igbinary
Using them you don't crash your site by swapping the serializer!
I guess, Drupal\igbinary\Component\Serialization\PhpCompressSerialize will be sufficient for most people.
I don't agree. Having compression enabled gave us a performance boost regarding the complete stack (not just Drupal, but redis and DB caches).
As you might have noticed, there already exists such a service in core for a long time! It's simply not used in every place.
As far as I remember one reason for that was, that you want to ensure in some places that nobody injects a serializer that is not suitable for PHP objects. That's why the patch introduced the ObjectAwareSerializationInterface. Just have a look back in the comments.
Comment #128
tstoecklerRe-roll for #2853503: Remove all assert('string') calls from Drupal core because deprecated in PHP 7.2.
Comment #129
daffie commented+1 for me. Great idea to make the backend more flexible. I also agree with @andypost that this issue needs profiling.
Comment #130
claudiu.cristeaProbably we should make clear here that this param is optional only for BC reasons. Then we need a
@todoand D9 followup to make this param mandatory in D9.Comment #132
fabianx commentedI profiled this and the overhead (which is micro optimization) is well worth the win -- as many more sites have igbinary available out of the box and that saves 50% on serialization operations, which is pretty huge.
Therefore and by reviewing this patch:
RTBC
Comment #133
fabianx commentedI created #3014514: Make igbinary the default serializer if available, it saves 50% time on unserialize and memory footprint as followup.
I created #3014511: Increase performance and scalability of the cache subsystem as parent issue.
Comment #134
claudiu.cristeaFixed #130, added #3014548: Make the serializer mandatory param for DatabaseBackend and DatabaseBackendFactory as Drupal 9.x followup.
Comment #135
larowlanWe should be triggering a deprecated error here so that when we come to make it mandatory in D9 we've been warning people for 2+ years. We'll also need a legacy test for that.
Other than that, this looks great
Comment #136
claudiu.cristeaAdding the deprecation stuff.
Comment #137
claudiu.cristeaImplemented #135. Removed the reference to follow-up as is not a standard deprecation policy. We rely on error triggering.
Comment #138
jibranBack to RTBC as #135 is addressed.
Comment #139
tstoecklerThanks @claudiu.cristea!! Also wrote a change notice pretty much at the same time in order to help get this issue across the finish line... ...so we now have two. 🙈 Feel free to delete mine, whoever was the powers to do so.
EDIT: Well, technically, we now have three, because I also wrote a change notice for
ObjectAwareSerializationInterfacebut I think that one has merit on its own, so I think we should keep that one.Comment #140
larowlanAdding credit for those who reviewed to shape the patch, profiled, wrote change records
Comment #141
larowlanthere was no answer to #114 here
Comment #142
catchAlso the issues I raised in #3014514: Make igbinary the default serializer if available, it saves 50% time on unserialize and memory footprint could be solved by having distinct database tables, which lends towards a subclass too.
@Fabianx did you profile page cache hits without igbinary?
Comment #143
berdir@catch: I suppose an implementation that *dynamically* picks the better serializer has a much bigger issue with having to deal with it changing on the fly. If you do it based on manual configuration, it's not that crazy to expect to properly clear caches.
But if we want to, we could also make the table name based on the serializer class dynamically, e.g. check if it's not the default and in that case, use a few characters of the hash of the fully qualified service name or so?
Subclassing wouldn't just require a different class but also a different factory class. The advantage of using the serializer service is that the same can be used for different cache backends, redis already supports this as mentioned above.
Comment #144
fabianx commented#142 I did and the overhead for the new service injection is neglectable (compared to what Symfony base has as overhead already).
For #114:
As that is static configuration you should be matching that across environments. Also as we live in the "age of docker" it is much more likely to have local dev envs match production.
In the worst case an automatic igbinary service could add the format as a one number thing at the start.
For a cache it's not a big problem if it's invalid information as it can just return FALSE.
Though that is the problem, okay as FALSE is still valid for
$item->data.Hmm, so what we would need is that we can return an invalid value from ObjectSerializationInterface or throw an Exception that we can catch?
e.g. like a
class ObjectSerializationInvalidValuethat could be used to check that the result is valid?And as we are introducing a new interface it would be indeed good to get that right.
Because the compressor also has the same problem that it might not be able to decompress the data, if e.g. bzip2 extension is not available, etc. and then still needs to return something sane.
For the cache we can rebuild, but what about other uses where we store permanent non-recreatable data in serialized format?
I think if the object serialization interface says that you need to match it OR ensure you can handle decode failures, then that is okay.
I am setting to "Needs work" for potential interface changes.
Comment #145
mkalkbrennerWe use igbinary serialization in production for two years now using this core patch and a special branch of the drupal igbinary module: https://cgit.drupalcode.org/igbinary?h=serialization.objectaware.default
We use it for every kind of serialization in Drupal, like caches and the key value stores! If you have a closer look at https://cgit.drupalcode.org/igbinary/tree/src/Component/Serialization?h=... you'll see that I implemented a simple Fallback mechanism to deal with "old" entries that have been serialized earlier in a different format. If required, we can move these Serializers into core, too.
In the current mix of patches, some (small) parts are tricky to use igbinary everywhere. Maybe that could be solved in an easier way if we have everything in core. Meanwhile we use these settings to leverage igbinary early ;-)
Comment #147
andypostRe-roll and clean-up
Comment #148
batkorsubscribe
Comment #149
andypostFixed deprecation
Comment #151
andypostFix broken tests and paraphrased messages to fit new coder's standards
Comment #153
andypostFix message for factory
Comment #155
geaseSimple reroll against 8.9.x.
Comment #157
fabianx commentedI think once this passes tests again, it can go back to RTBC. This only adds the serializer for the cache, where re-creating values is not an issue.
Comment #158
geaseTest fixed by updating cache_factory service registration for kernel test.
Comment #159
geaseComment #160
fabianx commentedBack to RTBC
Only thing that might warrant a fallback thingy is that this breaks BC for anyone overriding the bootstrap container as there the PHP serialization service obviously does not exist.
Do maybe hardcoding the class to use is better instead of \Drupal::service() as it was hardcoded before?
Comment #161
Andre-B#160 maybe this should be coordinated with the modules which (might) do this: memcache, redis..?
Comment #165
hchonovI had to update the deprecation messages as this might not get committed in 8.8.0, therefore I changed the version from 8.8.0 to 8.8.x.
I am not sure whether the modules override the bootstrap container themselves or the site owners in their
settings.phpfiles.I would rather add the PHP serialization service to the retrieved bootstrap container definition. We could achieve this either by adding it directly or by merging the default bootstrap container definition into the retrieved one. I would prefer merging the bootstrap container definitions, as this is a more general solution and will allow us further adding new services to the bootstrap container definition without worrying about overridden bootstrap container definitions.
Here I upload 3 patches:
Comment #166
fabianx commentedMy feeling is the explicit declaration is better. Leaving decision to framework managers.
Back to RTBC
Comment #167
gnunes commentedI've re-rolled the patches for Drupal 9 and 8.9.
The interdiff contains the resolved conflict in DeprecationListenerTrait
It needs review, please
Comment #168
hchonovBack to RTBC as those are simply re-rolls without any functional changes. Thank you, @gnunes!
Comment #169
hchonovComment #170
alexpottThis is a bit tricky. In the Drupal 9 patch we shouldn't have the deprecation... but making the serializer argument not null is not possible because there are nullable arguments before them. In order to make it required we're going to need to swap arguments around which is possible by removing argument types and doing deprecations but it is not simple.
This needs thinking about.
Also the whole bc-bootstrap-service vs merge-bootstrap-definitions vs doing nothing needs to be chosen before this is RTBC. And if we decide to do merge-bootstrap-definitions then I think we should handle that separately in a blocking issue so we can add explicit test coverage. Putting my framework manager hat on I think ensuring all the services we think should be there are there is important. The bootstrap container is supposed to provide stable place from which to work and this has uncovered the unsurprising that replace-ability means that we need to do more work to make this so.
Also for now I'd target 9.0.x and not worry about 8.9.x and we should handle a backport if and when we get to that.
Comment #171
gnunes commentedI've removed the deprecation warning for 9.0.x
It needs review, please.
Comment #173
gnunes commentedI've removed the tests for the deprecating strings.
It needs review, please.
Comment #174
gnunes commentedFixed syntax mistake that prevented the test from successfully executing.
Comment #175
andypostit needs trigger_error() to point this parameter will be required
it could use PhpSerialize::class syntax
please wrap this long lines
this 2 unused https://www.drupal.org/pift-ci-job/1581642
Comment #176
hchonovRe #839444-175: Make serializer customizable for Cache\DatabaseBackend.1:
Actually it needs to be required instead of having the fallback - as per @alexpott's comment the deprecation should be removed in D9 - #839444-170: Make serializer customizable for Cache\DatabaseBackend . Therefore we need to remove the fallback and make the parameter required.
Comment #177
andypost@hchonov thank you, misread. Then it needs post update hook to require cache rebuild (container changed)
Comment #178
hchonovActually if we make it required then we will need to re-arrange the constructor arguments, because the previous one is optional. This is not nice, but it should be done. Also the service parameter should be placed before the setting parameters.
However if we make it required, then the cached definition will not have it and therefore it will break while trying to initialize the service. How do we solve this in core? I could think only of leaving it optional in the release where it is introduced, but provide an update for invalidating the container, so that after the rebuild the container will contain the new service definition. Then in a later release we could safely make it required.
Comment #179
xjmThis is a minor-only addition. Since 8.9.x is now in beta, I'm moving this to 9.1.x. Thanks!
Comment #181
andypostComment #182
ravi.shankar commentedAdded reroll of patch #174.
Comment #183
suresh prabhu parkala commentedTried to fix custom command failures. Please review.
Comment #184
sokru commentedResolving cspell notices.
Comment #185
andypostSymfony using term marshaller in context of cache (latest 5.3beta) https://github.com/symfony/symfony/pull/40602/files
Btw needs work to pass pre-checks
Comment #186
kapilv commentedFixed Custom Commands Failed.
Comment #188
wellsAnother attempt to fix cspell fail.
Comment #190
andypostneeds deprecation to be added according to https://www.drupal.org/about/core/policies/core-change-policies/drupal-d...
the same
could use PhpSerialize::class
Comment #191
andypostFixed #190
Comment #192
daffie commentedAll the code changes look good.
The IS and the CR's are in order.
The patch allows the igbinary module to override the default implementation.
For me it is RTBC.
Comment #193
quietone commentedComment #194
quietone commentedRe-uploading patch so test run with 10.0x
Comment #195
catch#839444-170: Make serializer customizable for Cache\DatabaseBackend is still unresolved - the argument order needs to be swapped, so that the parameter can be required in a later release.
I'm also still not convinced (see #839444-111: Make serializer customizable for Cache\DatabaseBackend from 2017) about adding a dependency on the Serialization component in the cache system, since the serialization component has a dependency on Symfony YAML.
If we wanted to move Drupal\Cache to a component, would we really want symfony/yaml as an implicit dependency? This could be solved by putting the interface somewhere else that's not that component. Symfony itself has had this problem, which is why it introduced
symfony/contracts.Comment #196
mkalkbrennerAfter a long time just living with the patch I really want to see that committed.
Using other serializers is such a performance boost.
I suggest to simply remove the deprection message entirely and to silently fallback to the default serializer if no other one is set. That is perfectly backward compatible and doesn't block anything new.
Comment #197
mkalkbrennerSorry, I missed the second occurrence.
Comment #198
mkalkbrennerComment #199
mkalkbrennerThe patches don't apply to D10 anymore.
I can port the patch again, but it would be great to have feedback on the version in #198 first and if this would be accepted.
Comment #200
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #201
catchThis is still the case, but I had another look and I don't think it should hold back this patch. It makes sense to use the serialization component here. If we don't want the dependency on Symfony YAML, we should consider splitting the YAML serialization into a separate SerializationYamlComponent, or making the YAML dependency suggested instead of required - but that wouldn't change anything here.
So to me, #198 looks good.
Comment #202
mkalkbrennerComment #203
mkalkbrennerComment #205
mkalkbrennerComment #208
mkalkbrennerUpdated patch for Drupal 10.1.5
Comment #209
mkalkbrennerComment #210
mkalkbrenner"No space left on device"
Drupal CI is down.
Comment #213
volegerYes, we have GitlabCI now. Rerolled #209 in the MR.
Comment #214
andypostone test fails
Drupal\KernelTests\Core\DependencyInjection\AutowireTestComment #215
volegerI left a couple of comments.
Comment #217
vakulrai commentedComment #218
mkalkbrennerThanks for fixing the tests!
@catch already agreed on the patch in #201 and we use it for years on multiple production sites. So, I "carefully" set it to RTBC.
Comment #219
kristiaanvandeneyndeNot going to weigh in on whether this can be approved as @catch has already done so, but setting back to NW for a small nitpick. Feel free to RTBC again once that is fixed.
Comment #220
andypostfixed version string and converted properties to constructor promotion
Comment #221
kristiaanvandeneyndeFrom a code perspective this looks good. Conceptually, I haven't been involved in this discussion at all so I'm still rolling with the approval given in #201
Comment #222
volegerThere is still a problem with argument ordering. The required method argument is placed after the optional. This is against the coding standard because this leads to optional argument becoming required.
Comment #223
andypost@voleger which argument? All added are optional, and last commit fixed it
Comment #224
kristiaanvandeneyndeI don't see it either, to me it shows as the last argument and is marked as optional. If #222 was a mistake than feel free to set back to RTBC
Comment #225
volegerThe deprecation message says that $serializer becomes a required field in drupal:11.0.0. The problem is that the previous argument $max_rows has to remain optional in drupal:11.0.0. We need to keep $max_rows optional.
See https://git.drupalcode.org/project/drupal/-/merge_requests/6296/diffs#51...
The same thing is related to another constructor.
See https://git.drupalcode.org/project/drupal/-/merge_requests/6296/diffs#51...
$settings have to be optional in drupal:11.0.0.
More about required arguments after optional https://php.watch/versions/8.0/deprecate-required-param-after-optional
Comment #226
andypostThank you! Good point, it requires to shif argument somehow
Comment #227
andypostBoth arguments added in #2526150: Database cache bins allow unlimited growth: cache DB tables of gigabytes!
So very probably it was just happened before out deprecation policy formed (core 8.4) https://www.drupal.org/node/2891281
So I reordered
$max_rowsand deprecated to create without settings https://git.drupalcode.org/project/drupal/-/merge_requests/6296/diffs?co...Comment #228
volegerThanks, the changes are reasonable. Good point on pre-deprecation policy changes - agree that settings need to be required.
+1 for RTBC
Comment #230
catchNice to get this done after (checks notes) 14 years!
Let's keep going in #3014514: Make igbinary the default serializer if available, it saves 50% time on unserialize and memory footprint.
Comment #231
alexpottThis causes warnings when doing a site-install using Drush...
It's because Drush has a DrupalKernel without a container so we should check for that too.
Opened #3419230: Drush error on site-install to fix this.
Comment #232
acbramley commentedThis issue conflicts heavily with #3113971: Replace REQUEST_TIME in services - one thing we did there was handle string typed $max_rows in DatabaseBackend which this issue didn't handle. That could lead to a BC break.
See https://git.drupalcode.org/project/drupal/-/merge_requests/4302#note_259560
Comment #233
catchLet's either open a quick follow-up for string-typed max rows or roll this one back. Just re-opening for now.
One other review point from #3113971: Replace REQUEST_TIME in services that's also relevant here is we should add the serialization component classes that are loaded on a regular bootstrap (or especially page cache hit) to the core/composer.json classmap so they're optimized in the autoloader.
Comment #234
acbramley commentedOpened #3419352: DatabaseBackend doesn't handle string typed $max_rows with $serializer I'll get an MR going.
Comment #235
catchOpened #3419354: Add relevant serializer classes to the classmap too.
Comment #236
acbramley commentedAnd a novice follow up for constructor prop promotion #3419356: Use constructor property promotion on $serializer in Drupal\Core\Cache\DatabaseBackend
Comment #237
andypostFollow-ups are filed, btw
$max_rowscould be fixed in #3113971: Replace REQUEST_TIME in servicesComment #238
acbramley commentedCRs for this issue are also still in draft (maybe that's right?) but do need version updates.
It also may be wise to combine the CRs around the new dependencies once 3113971 is in?
Comment #239
quietone commentedThanks to @acbramley who confirmed the three change records here are correct.
They are now published with correct dates.