Problem/Motivation
Annotations parsing is quite slow, but can be perfectly cached with FileCache (introduced by #2395143: YAML parsing is very slow, cache it with FileCache).
Proposed resolution
- Use FileCache for caching Annotations.
Remaining tasks
- Do it
- Fix test failures
User interface changes
- None
API changes
- None
Main Goal: Performance
Follow up from #2395143: YAML parsing is very slow, cache it with FileCache
| Comment | File | Size | Author |
|---|---|---|---|
| #36 | use_filecache_for-2447803-36.patch | 6.38 KB | fabianx |
| #36 | interdiff.txt | 485 bytes | fabianx |
| #27 | annotation_parser_file_cache.png | 99.55 KB | berdir |
Comments
Comment #1
anavarreComment #2
fabianx commentedComment #3
amateescu commentedThis is the most important followup to #2395143: YAML parsing is very slow, cache it with FileCache so I want to try and see if my idea that we need to store a serialized annotation works better than the initial version of this code from #2395143: YAML parsing is very slow, cache it with FileCache.
Comment #5
amateescu commentedFound the problem! Since image toolkit operation plugins were using a sub-namespace of image toolkit plugins, the file cache was returning image toolkit operation plugins next to image toolkit plugins, so
ImageToolkitManagerwas getting plugin definitions that actually belonged toImageToolkitOperationManager.Comment #6
wim leers@amateescu-debugging-skills++++
Comment #7
neclimdulis this acceptable? Seems like a regression and moving files seems pretty disruptive.
Comment #8
amateescu commented@neclimdul, investigation from @Berdir in the parent issue seems to suggest that using FileCache for annotations bring a very nice improvement, see #2395143-6: YAML parsing is very slow, cache it with FileCache and #2395143-131: YAML parsing is very slow, cache it with FileCache, so IMO the disruption is acceptable.
Comment #9
neclimdulI guess that's an argument for the disruption. is it valid for the regression?
Comment #10
fabianx commentedAfter discussing in IRC, an alternate approach would be to:
- use one prefix / collection per plugin manager asking for annotations in the file cache
That way we should avoid the mix-up in the caches and be more clean.
A cache really should not have side effects when used so that a renaming is needed: That is a bug in the cache or usage of the cache, not of the classes itself.
Comment #11
amateescu commentedOf course, it's much better to do it like this. At that time I was just too excited that I found the underlying problem so I wrote the first fix that came to mind :)
Comment #12
berdirYeah, doing nested folders like that seems problematic anyway but that's a separate issue, great if we can solve it like this :)
We should probably profile this a bit.. for example a cache clear in the UI or an installation to see how much we save.
Comment #14
amateescu commentedSadly.. not that much :/
I did quick`n`dirty benchmark with
time drush site-installwith an apache2 restart before each run so we get cold caches. These are the results:Without patch:
drush sis-sq 35,09s user 0,41s system 98% cpu 35,897 total
With patch:
drush sis-sq 33,12s user 0,45s system 99% cpu 33,739 total
So we save ~2 seconds in the installer :)
Comment #15
dawehnerIt feels a little bit ugly that we have to setup the prefix manually ... what about doing that in UnitTestCase instead ...
Comment #16
amateescu commentedSounds great :)
Comment #18
amateescu commentedThis should do it.
Comment #19
fabianx commentedPatch looks great to me!
Just some background info on this:
The reason we need to explicitly serialize / unserialize things here is:
a) classes that are used in here
b) APC(u) uses its own serializer. In APCu later this was (due to some bugs) switched to PHPs internal serializer, but APC is still using its internal one. So I think we need to keep those serialize / unserialize calls.
Unless there is a good way to quickly check if an annotation has a rich object ...
Not sure that is worth it, but unserialize is slow ... And can as its purely internal be a follow-up.
--
drush benchmarks are problematic for FileCache - unless you use a more persistent backend (which we all removed for now).
In the best case the annotations are already present in the APC cache, but command line you always start with a fresh new cache ...
So 2s out of 32s for drush is great!
--
This would be RTBC from my side.
Comment #20
fabianx commentedOh, lets just do it ...
Comment #21
catchI think we should leave the serialize()/unserialize() call in, but they need an explanatory comment. Also when trying to write one, I couldn't see why we'd have that comment here:
It makes sense to have this, but why not serialize in the APC backend ourselves? I don't think the annotation discovery should be worried about APC vs. APCu serialize implementations.
If we don't like that overhead for other uses of the backend where it's not strictly necessary, we could provide separate APC vs. APCu backends (not sure that's needed, and definitely not here).
Comment #22
fabianx commentedYes, agree lets add the serialize / unserialize to ApcFileCacheBackend.
We can optimize this later like in the normal CacheBackends with a 'serialized' => 1|0 flag - if we want to - by checking if the serialized array contains O: or C: entries (IIRC). (more work on cacheSet / less work on cacheGet).
Comment #23
berdirThe seriaize/unserialize has nothing to do with APC.
It is there because sometimes those things are objects, we use serialize/unserialize explicitly to break object references.
See the original test fails #2395143-14: YAML parsing is very slow, cache it with FileCache (comment 14). We have alter hooks for example that based on a setting do something, but then we changed the settings, cleared caches and we go the previous, altered object back.
So yes, we should have a comment to explain this.
Comment #24
fabianx commented#23: Okay, that makes sense.
So #22 is wrong, this is needed also for the FileCache directly, but we don't want to serialize all.
Only the caller has knowledge if what we store contains something that has an object reference, which must be serialized / unserialized to re-fetch it e.g. from the container.
I think for this issue we should just add a comment.
Then think about maybe detecting if the serialized data contains objects and store differently with a flag to not unserialize ... (in a follow-up)
This is slower for cache-sets, but faster for all cache backends to retrieve as for simple data PHP can just return the data directly, JSON can be used and for APC we do not need to serialize twice.
Comment #25
effulgentsia commentedWhat alter hooks are running for this? In #2395143-182: YAML parsing is very slow, cache it with FileCache.8, I raised a concern about making sure no event listeners or hooks are involved in processing the data that is cached in FileCache. (Thanks for the docs addition to FileCacheInterface that made it into the committed patch related to that.)
Comment #27
berdirForgot about your question sorry.
The alter hook is the standard entity type alter hook. Because we pass the entity type definitions around as objects, then any change that happens on those object is reflected in the static cache in there.
Here's the impact of this with my install profile. Keep in mind that there's a big xhprof overhead with this kind of stuff, but I still think this is very much worth doing.
Ignore the memory stats, that's completely messed up somehow... the parent call that does little more than just calling this method shows a 300% memory increase, from -36MB to +76MB. Which obviously makes no sense at all ;)
Comment #28
catchComment #30
fabianx commentedI am revisiting this to make tests faster (#2747075: [meta] Improve WebTestCase / BrowserTestBase performance by 50%) as I found it bottlenecks my local development.
I added the missing comment and also the case when a file does not have annotations.
Comment #31
berdirLooks good to me. I've been using the patch from #18 for a year now and didn't have any issues.
Install time of my install profile:
HEAD: 5m 17s
Patch from #18 that I used so far: 4m 56s
Latest patch: 4m 52s
Numbers are fairly stable, second run with latest patch was 4m 54s.
Comment #32
alexpottI think the cache suffix should also include $annotation_namespaces since if these are altered the results could change even if the files have not.
Comment #33
alexpott$annotation_namespaces was added in #2600926: Allow annotations to inherit across namespaces
Comment #34
fabianx commentedYes, that is a good point.
I added a hash of the annotation namespaces.
Comment #35
dawehnerWe also have to adapt the composer.json file for the plugin component.
Comment #36
fabianx commentedIndeed!
Comment #37
dawehnerThank you @Fabianx!
Comment #38
alexpottCommitted 1da4d15 and pushed to 8.2.x. Thanks!