This builds on #402896: Introduce DrupalCacheArray and use it for drupal_get_schema(), and on the memory usage, race condition and cache stampede problems discussed in #853864: views_get_default_view() - race conditions and memory usage.
Using CacheArray for the schema cache only optimizes how that cache item behaves at runtime. On a cache miss, you still have to rebuild the entire schema, alter it, and then cache it as one big lump. This is very memory intensive for several reasons. Every module's .install file has to be loaded, and even with APC enabled, .install files may well not be in the opcode cache. A very large array is built in memory. Writing that very large cache item is also costly, especially if it won't fit in the cache backend. Conversely, if we swapped that single cache item for lots of small ones, a single request could end up writing 400+ cache items, which is also time-consuming.
We can improve this by doing the following:
- Change hook_schema_alter() so that it only receives schema arrays at module scope (or even table scope, though that might be going a bit far), not for the entire site.
- When there is a cache miss for an individual table, only rebuild enough of the schema to find that table's definition, cache it, and stop looking for the rest.
- Given that, we could optimize the lookup further. If you're looking for the 'node' table, check the 'node' module first. Not all database tables are namespaced by module name, but lots are.
- The 'full schema' will need to be converted to a class that implements Iterator. Then you can foreach over the entire schema without holding all of it in memory the whole time.
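The steps above can be sketched roughly as follows. This is a minimal illustration written in Python for brevity, not Drupal's PHP API. It assumes each module exposes a callable returning `{table: definition}` (standing in for hook_schema()) and that alter hooks are invoked per module with only that module's tables. `LazySchema`, `invoked_modules` and the name-prefix heuristic are hypothetical. Python's `__iter__` plays the role of PHP's Iterator interface.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

TableDef = Dict[str, object]
# Hypothetical stand-in for hook_schema(): returns {table name: definition}.
SchemaHook = Callable[[], Dict[str, TableDef]]
# Hypothetical per-module hook_schema_alter(): receives (module, that module's tables).
AlterHook = Callable[[str, Dict[str, TableDef]], None]


class LazySchema:
    """Resolve table definitions one module at a time instead of building the whole schema."""

    def __init__(self, modules: Dict[str, SchemaHook], alter_hooks: List[AlterHook]):
        self.modules = modules
        self.alter_hooks = alter_hooks
        self.cache: Dict[str, TableDef] = {}  # one cache entry per table
        self.invoked_modules: List[str] = []  # records which modules were loaded, for illustration

    def _load_module(self, module: str) -> Dict[str, TableDef]:
        tables = self.modules[module]()
        for alter in self.alter_hooks:
            alter(module, tables)  # alter hooks only ever see this module's tables
        self.invoked_modules.append(module)
        return tables

    def _candidate_order(self, table: str) -> List[str]:
        # Heuristic: try modules whose name matches the table or its prefix first.
        first = [m for m in self.modules if table == m or table.startswith(m + "_")]
        rest = [m for m in self.modules if m not in first]
        return first + rest

    def get_table(self, table: str) -> Optional[TableDef]:
        if table in self.cache:
            return self.cache[table]
        for module in self._candidate_order(table):
            tables = self._load_module(module)
            if table in tables:
                self.cache[table] = tables[table]  # cache just this table, then stop looking
                return tables[table]
        return None

    def __iter__(self) -> Iterator[Tuple[str, TableDef]]:
        # Full-schema iteration, one module at a time; the whole schema is never held at once.
        for module in self.modules:
            yield from self._load_module(module).items()
```

With modules `{"system": ..., "node": ...}`, a call to `get_table("node_revision")` only invokes the `node` module, and repeat calls are served from the per-table cache. Iterating with `for name, definition in schema` trades repeated module loads for a bounded memory footprint.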
Comments
Comment #1
Anonymous (not verified) commented: Discussed this with catch in IRC. I agree with changing hook_schema_alter() to act only at the table level.
Then we can save each table's schema to a table (hahaha, recursion joke).
That gives us a simple path for read operations, plus caching for a single table.
Read operations that want the whole lot just SELECT * from the schema table, with no need for a full rebuild.
A real rebuild is still there for install, new fields, etc.
We can further optimise drupal_table_schema_save() to skip the write if nothing changed during a rebuild.
Here's a gist with a quick PoC implementation: https://gist.github.com/2823172
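The storage idea from this comment can be sketched as follows. This is a minimal illustration in Python, using the standard-library sqlite3 module as a stand-in for Drupal's database layer. It is not the code from the gist. The `schema_store` table, the JSON serialization, and `table_schema_save()` (a hypothetical analogue of the proposed drupal_table_schema_save()) are all assumptions.

```python
import json
import sqlite3
from typing import Dict, Iterator, Optional, Tuple

TableDef = Dict[str, object]


def open_schema_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # One row per table definition (the recursion joke, made literal).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_store (name TEXT PRIMARY KEY, definition TEXT NOT NULL)"
    )
    return conn


def table_schema_save(conn: sqlite3.Connection, name: str, definition: TableDef) -> bool:
    """Hypothetical analogue of drupal_table_schema_save(): write only if changed."""
    encoded = json.dumps(definition, sort_keys=True)  # stable encoding so equal dicts compare equal
    row = conn.execute("SELECT definition FROM schema_store WHERE name = ?", (name,)).fetchone()
    if row is not None and row[0] == encoded:
        return False  # unchanged: skip the write entirely
    conn.execute(
        "INSERT OR REPLACE INTO schema_store (name, definition) VALUES (?, ?)", (name, encoded)
    )
    conn.commit()
    return True


def table_schema_load(conn: sqlite3.Connection, name: str) -> Optional[TableDef]:
    # Simple read path for a single table.
    row = conn.execute("SELECT definition FROM schema_store WHERE name = ?", (name,)).fetchone()
    return None if row is None else json.loads(row[0])


def all_table_schemas(conn: sqlite3.Connection) -> Iterator[Tuple[str, TableDef]]:
    # "Select * from the schema table": rows are streamed from the cursor, not rebuilt.
    for name, encoded in conn.execute("SELECT name, definition FROM schema_store ORDER BY name"):
        yield name, json.loads(encoded)


def rebuild(conn: sqlite3.Connection, full_schema: Dict[str, TableDef]) -> int:
    """Full rebuild (install, new fields); returns how many rows were actually written."""
    return sum(table_schema_save(conn, name, definition) for name, definition in full_schema.items())
```

Comparing the serialized definition before writing is what lets a rebuild touch only the rows that changed. Deleting rows for tables that no longer exist is left out of this sketch.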
Comment #14
smustgrave at Mobomo commented: Wondering if this is still a needed task for D10?
Comment #15
catch commented: We're on our way towards getting rid of hook_schema(), so marking this as a duplicate.