Problem/Motivation

Views data has one cache item covering all database tables plus one per table, both built from views hooks. This makes it a good candidate for the cache_discovery bin because the data set is always going to be finite.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

Issue fork drupal-3587797


Comments

catch created an issue. See original summary.

catch’s picture

Status: Active » Needs review

smustgrave’s picture

Status: Needs review » Needs work

Started a new pipeline to run tests, and the JavaScript tests seem to be consistently failing.

yujiman85’s picture

Assigned: Unassigned » yujiman85
yujiman85’s picture

Assigned: yujiman85 » Unassigned
Status: Needs work » Needs review
smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Test updates seem good

alexpott’s picture

Version: main » 11.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Committed d4014a7 and pushed to main. Thanks!

Needs a new MR for 11.x

  • alexpott committed d4014a7d on main
    task: #3587797 Move views data cache to cache_discovery
    
    By: catch
    By:...
berdir’s picture

It is finite, but can be very large. I'm concerned this will fill apcu on large sites and cause a regression due to more frequent resets

catch’s picture

I think that's another reason to do #3586418: Set explicit ttl in the apcu backend and allow a default to be set for chained fast but not necessarily to avoid making this change.

berdir’s picture

The problem with apcu is that when it is full, it just resets (expunge counter). So I'm not sure how much the ttl will help.

We saw that on a few larger sites when we added the \Drupal\monitoring\Plugin\monitoring\SensorPlugin\ApcuSharedMemoryExpungesSensorPlugin sensor: APCu just continuously emptied the whole cache and started over from the beginning, which considerably hurts performance. It stopped, or now happens very rarely, once we adjusted the maximum size to fit the project.

The core status check does warn at 75%/90% full, but it doesn't check expunges; if you're over 90%, it's quite likely that you're getting frequent expunges, and you might not actually see it reach 90%. Blackfire APM reports on this as well. It's not trivial to monitor for this, as you need to account for how long it's been running. I think APCu complains if you have more than about one expunge per hour.
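As a minimal sketch of the expunge monitoring described above (assuming the APCu extension is loaded, and the "one per hour" threshold is taken from the comment, not from any official guidance):

```php
<?php

// Read APCu's expunge counter and normalize it by uptime. Frequent
// expunges mean the whole shared memory segment is being wiped and
// refilled, which is the regression scenario discussed above.
$info = apcu_cache_info(TRUE);
$uptime_hours = max(1, (time() - $info['start_time']) / 3600);
$expunges_per_hour = $info['expunges'] / $uptime_hours;

if ($expunges_per_hour > 1) {
  // More than roughly one full-cache reset per hour suggests
  // apc.shm_size is too small for this site's working set.
  printf("APCu expunges/hour: %.2f - consider raising apc.shm_size\n",
    $expunges_per_hour);
}
```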

The full bin should be used rarely to never at runtime. What if we just add the smaller per-table entries to apcu but not the big one?

catch’s picture

The per-table entries idea is a good one. We'll need to inject two cache bins, but we can do that.
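A rough sketch of what injecting two bins could look like (this is illustrative, not the committed patch; the class and bin choices are assumptions):

```php
<?php

use Drupal\Core\Cache\CacheBackendInterface;

// Hypothetical ViewsData-style service: the large "all tables" entry
// stays in a database-only bin, while the small per-table entries go
// to a ChainedFast (APCu-backed) bin as proposed in #12.
class ViewsDataSketch {

  public function __construct(
    // E.g. the "default" bin: database only, holds the big entry.
    protected CacheBackendInterface $persistentCache,
    // E.g. a ChainedFast bin such as "discovery": APCu + database,
    // holds only the small per-table entries.
    protected CacheBackendInterface $fastCache,
  ) {}

  public function getAll(): array {
    // The full array is large, so it never touches APCu.
    if ($cached = $this->persistentCache->get('views_data')) {
      return $cached->data;
    }
    $data = $this->buildAll();
    $this->persistentCache->set('views_data', $data);
    return $data;
  }

  public function get(string $table): array {
    // Per-table entries are small and hot, so they can live in APCu.
    $cid = 'views_data:' . $table;
    if ($cached = $this->fastCache->get($cid)) {
      return $cached->data;
    }
    $data = $this->getAll()[$table] ?? [];
    $this->fastCache->set($cid, $data);
    return $data;
  }

  protected function buildAll(): array {
    // Would invoke hook_views_data() / hook_views_data_alter() here.
    return [];
  }

}
```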

  • alexpott committed dcb51286 on main
    Revert "task: #3587797 Move views data cache to cache_discovery"
    
    This...
alexpott’s picture

Version: 11.x-dev » main
Status: Patch (to be ported) » Needs work

Discussed with @catch - #12 is a great idea. Thanks @berdir

berdir’s picture

(cross post)

I put together a pretty basic script. At first the numbers were surprisingly low, then I remembered that I'm using igbinary, which is significantly better on this kind of data structure because it's heavily repetitive.

The script only works on Redis and currently assumes that igbinary is used:

<?php

use Drupal\Core\StringTranslation\ByteSizeMarkup;
use Drupal\redis\RedisPrefixTrait;

class PrefixHelper {
  use RedisPrefixTrait;

  public function getPrefixHelper() {
    return $this->getPrefix();
  }
}

$client = \Drupal::service('redis.factory')->getClient();

$prefix = (new PrefixHelper())->getPrefixHelper();

$keys = $client->keys($prefix . ':default:views_data*');

$full_entries = [];
$table_entries = [];
foreach ($keys as $key) {
  $hash = $client->hgetall($key);
  $uncompressed = !empty($hash['gz']) ? gzuncompress($hash['data']) : $hash['data'];

  $unserialized = \Drupal::service('serialization.igbinary')->decode($uncompressed);
  $php_serialize = serialize($unserialized);

  if (substr_count($hash['cid'], ':') === 1) {
    $full_entries[] = [
      'raw' => strlen($hash['data']),
      'uncompressed' => strlen($uncompressed),
      'serialized' => strlen($php_serialize),
    ];
  }
  else {
    $table_entries[] = [
      'raw' => strlen($hash['data']),
      'uncompressed' => strlen($uncompressed),
      'serialized' => strlen($php_serialize),
    ];
  }
}

echo "full entries:" . count($full_entries) . "\n";
echo "full entries size raw: " . ByteSizeMarkup::create(array_sum(array_column($full_entries, 'raw'))) . "\n";
echo "full entries size igbinary: " . ByteSizeMarkup::create(array_sum(array_column($full_entries, 'uncompressed'))) . "\n";
echo "full entries size php serialize: " . ByteSizeMarkup::create(array_sum(array_column($full_entries, 'serialized'))) . "\n";
echo "table entries:" . count($table_entries) . "\n";
echo "table entries size raw: " . ByteSizeMarkup::create(array_sum(array_column($table_entries, 'raw'))) . "\n";
echo "table entries size igbinary: " . ByteSizeMarkup::create(array_sum(array_column($table_entries, 'uncompressed'))) . "\n";
echo "table entries size php serialize: " . ByteSizeMarkup::create(array_sum(array_column($table_entries, 'serialized'))) . "\n";

The output on a site with 4 languages:

full entries:4
full entries size raw: 495.49 KB
full entries size igbinary: 2.04 MB
full entries size php serialize: 13.76 MB
table entries:41
table entries size raw: 59.82 KB
table entries size igbinary: 146.53 KB
table entries size php serialize: 582.42 KB

(disclaimer: when I tested, one language was actually not in the cache, so it reported only 3 full entries; I manually visited it to fill that in)

With just 4 languages, on default PHP serialize, the 4 views_data cache entries use up almost 14 MB, and there are only ~10 table entries per language. (A second site had 145 entries and 1.6 MB in Redis, so I assume Redis had been evicting some infrequently used caches.)

The table entries make up only about 1/20th of the total here; it's more on the other site, but still a relatively small and manageable portion.

There are sites with 40 or 100 languages, so you can multiply that accordingly.

I think we should only store the table entries in bootstrap.

catch’s picture

Replied to #12 in the issue summary of #3586418: Set explicit ttl in the apcu backend and allow a default to be set for chained fast - short version: as long as the docs are right, that issue should help.

But with or without that issue, only storing the tables in chained fast is a great idea.


table entries size igbinary: 146.53 KB
full entries size php serialize: 582.42 KB
Should this last line say table?

And is it smaller because most of the table entries are never used during runtime?

Discussion reminded me of #3454277: Allow fields to be opted out of views data.

berdir’s picture

Should this last line say table?

Yes, it should. You can also see the typo in the PHP script; I'll edit that.

And is it smaller because most of the table entries are never used during runtime?

Yes, exactly. Due to the generic integration, views_data contains definitions for whole entity types that don't have any views. Paragraphs would be a typical example: you have lots of bundles and fields, but most sites don't use any views on paragraphs (er, paragraphs views integration is still completely broken). Opt-out is tricky, as whether or not they are used typically depends a lot on the specific site; there _are_ sites with paragraph views.

It might be more feasible to switch to an approach where we build the views data per entity type and remove the global cache entry completely, but BC around that sounds very challenging. Entity types that aren't used would then never be built, at least not at runtime (we'd still need to know which entity types you can add views for, e.g. for the add-view selector).

catch’s picture

Building by entity type is hard because views data is a giant array of tables and random other non-table things that doesn't inherently know about entity types.

In #3380145: ViewsData should not cache by language I was trying to figure out a way not to cache by language, which would dramatically reduce the total cache size if not the individual entries, but I got a bit stuck there.

I think it would be reasonable for views data to explicitly build in the concept of entities - instead of converting to and from database tables and back again, which would also help with #3395848: Entity Query views backend, but not sure where to even start with what that might look like.

One vague idea I had was adding hook_views_data_table() as a new hook, and then allowing the main views data entry to contain only a stub for the table. When it's requested, we'd then call hook_views_data_table($table_name) to get the data for that table. We'd still need to support hook_views_data_alter(), but that could be passed an array with one table in it (and alter implementations would need to check for the existence of array keys, which they really ought to do anyway).

hook_views_data_table() then would need an entity/field implementation that can identify the entity type and field name from the table name, either via reverse-engineering it or if we added minimal metadata to the main entry which we then expand to the full definition.
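A rough sketch of the hypothetical hook described above (nothing here is an agreed API; the hook name, the module name, and the consuming flow are all placeholders from the idea in #18):

```php
<?php

/**
 * Implements the hypothetical hook_views_data_table().
 *
 * Would be called only when views data for $table_name is actually
 * requested, expanding a stub left in the main views data entry.
 */
function mymodule_views_data_table(string $table_name): array {
  if ($table_name !== 'mymodule_example') {
    return [];
  }
  return [
    'mymodule_example' => [
      'table' => [
        'group' => t('Example'),
        'base' => ['field' => 'id', 'title' => t('Example')],
      ],
      'id' => [
        'title' => t('ID'),
        'field' => ['id' => 'numeric'],
      ],
    ],
  ];
}

// On the consuming side, ViewsData::get($table) would then roughly:
// 1. Look up $table in the main entry; if it's a stub, invoke
//    hook_views_data_table($table) across modules.
// 2. Pass the resulting one-table array through hook_views_data_alter(),
//    so alter implementations must check that array keys exist before
//    touching them.
```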