Hi,

We ran into an issue with code caching on a multisite installation. Our structure has several sites, and each of them has its own modules folder.

Case 1:
On site A we have an EventSubscriber with 2 events (E1 and E2).
On site B we have an EventSubscriber with 1 event (E1).
After a while, site B goes down because Drupal tries to call the callback for event E1, but that method does not exist.

Case 2:
On site A we have SomeForm with a redirect to page P1.
On site B we have SomeForm with a redirect to page P2.
After a while, submitting the form on site B redirects to page P1.

After some investigation, I found the approximate cause of the error. Drupal uses APCu to cache class loading, but the $site_path variable is not used when building the APCu prefix.

web/core/lib/Drupal/Core/DrupalKernel.php:1041

protected function initializeSettings(Request $request) {
    $site_path = static::findSitePath($request);
    $this->setSitePath($site_path);
    $class_loader_class = get_class($this->classLoader);
    Settings::initialize($this->root, $site_path, $this->classLoader);

    // Initialize our list of trusted HTTP Host headers to protect against
    // header attacks.
    $host_patterns = Settings::get('trusted_host_patterns', []);
    if (PHP_SAPI !== 'cli' && !empty($host_patterns)) {
      if (static::setupTrustedHosts($request, $host_patterns) === FALSE) {
        throw new BadRequestHttpException('The provided host name is not valid for this server.');
      }
    }

    // If the class loader is still the same, possibly
    // upgrade to an optimized class loader.
    if ($class_loader_class == get_class($this->classLoader)
        && Settings::get('class_loader_auto_detect', TRUE)) {
      $prefix = Settings::getApcuPrefix('class_loader', $this->root);

In the last line you can see that $site_path is not passed to the prefix builder.
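To illustrate why the missing argument matters, here is a minimal sketch. `sketch_apcu_prefix()` is a hypothetical, simplified stand-in for `Settings::getApcuPrefix()`, not the core implementation; the salt, root and site directory names are made up. It only assumes that the prefix is derived from the hash salt, the Drupal root and an optional site path.

```php
<?php
// Hypothetical, simplified stand-in for Settings::getApcuPrefix().
// Assumption: the prefix is an HMAC keyed on salt, root and optional site path.
function sketch_apcu_prefix(string $identifier, string $root, string $site_path = '', string $hash_salt = 'shared-salt'): string {
  $key = $hash_salt . '.' . $root . '/' . $site_path;
  return 'drupal.' . $identifier . '.' . hash_hmac('sha256', $identifier, $key);
}

// Made-up Drupal root shared by both sites of the multisite.
$root = '/var/www/web';

// As in the kernel excerpt: no site path is passed, so with a shared salt
// both sites end up with the same prefix.
$site_a = sketch_apcu_prefix('class_loader', $root);
$site_b = sketch_apcu_prefix('class_loader', $root);
var_dump($site_a === $site_b); // bool(true)

// Passing the site path (what the proposed change does) separates the sites.
$site_a_fixed = sketch_apcu_prefix('class_loader', $root, 'sites/a.example.com');
$site_b_fixed = sketch_apcu_prefix('class_loader', $root, 'sites/b.example.com');
var_dump($site_a_fixed === $site_b_fixed); // bool(false)
```

With identical prefixes, whichever site populates the class map first decides which file path every other site served by the same PHP process receives for a given class name.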

Comments

shalimanov created an issue. See original summary.

sstanislav’s picture

What do you think about this solution?

sstanislav’s picture

Attachment (status: new, size: 723 bytes)

Fixed the patch so it applies to a clean Drupal 8.6.x.

cilefen’s picture

Title: Acp caching on multisite » APCu does not distinguish between sites in a multisite in its prefix
Version: 8.4.4 » 8.5.x-dev
Component: cache system » base system
Category: Support request » Bug report
cilefen’s picture

Title: APCu does not distinguish between sites in a multisite in its prefix » APCu class loading does not distinguish between sites in a multisite in its key prefixing
sstanislav’s picture

Priority: Normal » Major
cilefen’s picture

Status: Active » Needs review
hart0554’s picture

We've dealt with this problem and have so far worked around it by ensuring every site in the multisite has a unique hash salt, since `Settings::getApcuPrefix` uses the hash salt as part of the prefix (that is, if 'apcu_ensure_unique_prefix' is set; it seems to be in my case because this has been working, but I'm not sure how to verify that it is actually set). The patch here would seem a more reliable solution, since $site_path would be included in the prefix regardless of the APCu setting.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

wim leers’s picture

Title: APCu class loading does not distinguish between sites in a multisite in its key prefixing » APCu class loading does not automatically distinguish between sites in a multisite that have per-site code
Related issues: +#2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects, +#2926309: Random fail due to APCu not being able to allocate memory, +#2934002: APCu cache backend can have unreasonable number of entries during testing or multi-site

#2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects introduced this. The goal was to avoid APCu fragmentation. Otherwise each individual site (both when testing and in case of multisite) would end up caching the exact same data multiple times under different keys.

\Drupal\Core\Site\Settings::getApcuPrefix() (introduced in that issue) has this in its docs:

   * Additionally, if a multi site implementation does not use site specific
   * module directories setting apcu_ensure_unique_prefix would allow the sites
   * to share APCu cache items.

And this issue says:

Our structure has several sites, and each of them has its own modules folder.

So in your case you want to set apcu_ensure_unique_prefix to TRUE in your settings.php. This is also what @hart0554 said.
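As a rough sketch of that advice, a per-site settings.php fragment could look like the following. The salt value is a placeholder, not a real value; per the comments in this thread, the prefix only differs between sites when their hash salts differ.

```php
// Fragment for each site's own settings.php (placeholder values).
// Keep the APCu prefix unique per site.
$settings['apcu_ensure_unique_prefix'] = TRUE;
// Assumption: every site gets its own long random value here.
$settings['hash_salt'] = 'replace-with-a-long-random-value-unique-to-this-site';
```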

That would make things work as expected. But is it a good idea?

APCu + multisite + per-site code

If you combine these three, like you are, then we again run into the problem originally fixed in #2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects. APCu is usually measured in tens of megabytes. If you need to share this space between 5 or 10 or even more sites, then very little space remains available. At that point: what is even the point of APCu caching?

The benefit of APCu is that it is a cache that lives inside the php process. Hence the I/O cost is very low, and it's very fast. Which is why we use it to cache code (PHP classes loaded by the class loader).

Compare it to a CPU's L1 cache: it's tiny, but it's super fast.

So I have to concur with core committer catch at #2926309-31: Random fail due to APCu not being able to allocate memory:

The way we use APCu in core is fine for single site situations, since the cache size is finite. It should never be used for multiple sites on one server and that’s what we’re doing with the test bot.

wim leers’s picture

Title: APCu class loading does not automatically distinguish between sites in a multisite that have per-site code » APCu class loading does not automatically distinguish between sites in a multisite that has per-site code
Issue tags: +D8 cacheability, +multisite, +Needs documentation
andypost’s picture

Version: 8.6.x-dev » 8.8.x-dev


Version: 8.8.x-dev » 8.9.x-dev


sstanislav’s picture

Issue summary: View changes
dpagini’s picture

@Wim Leers - your comment above helped me quite a bit, but I'm left with a few questions. It seems that `apcu_ensure_unique_prefix` defaults to TRUE, so it would be on for everyone? Or did other changes alter the way this is called? I see that DrupalKernel calls it with no $site_path, which is what the OP tries to change in the patch on this issue.

For the time being, I'm just making sure I set my hash_salt to a unique value per site. It looks like that will be included, again by default, in the APCu prefix generated in Settings.php. Previously my sites, built on BLT, were sharing a `hash_salt` across every site.

What I don't understand, though, is this: if this is my problem, how was this breaking for me in the first place? It seems like my site gets into a state where it cannot find site-specific module code. When I do a `$ drush cr` - it fixes the problem. If my APCu is effectively the same between all the sites, what is it about the drush cr that fixes the problem... and doesn't cause my site to share a single and corrupt APCu?

hugronaphor’s picture

Just want to share my experience

My stack is Kubernetes + APCu + multisite + NO per-site code

Without $settings['apcu_ensure_unique_prefix'] = TRUE; and a unique $settings['hash_salt'] per site, I get mixed-up config values and even CSS files from other sites, since these differ for each site.

Update: after running it in prod for 2 weeks, I notice considerably lower and more stable resource usage.

wim leers’s picture

@dpagini I'm sorry, but I worked on this nearly two years ago, I do not recall all details. I'd need to re-read all related code to provide you with a solid answer.

But it sounds like you are using APCu in a multi-site scenario, which is not recommended, as I wrote above:

So I have to concur with core committer catch at #2926309-31: Random fail due to APCu not being able to allocate memory:

The way we use APCu in core is fine for single site situations, since the cache size is finite. It should never be used for multiple sites on one server and that’s what we’re doing with the test bot.

… and it's why this issue is unlikely to ever get committed.


It seems that `apcu_ensure_unique_prefix` defaults to TRUE, so it would be on for everyone?

Yes, but if sites share the same hash_salts, they end up getting the same prefix anyway.

I see that DrupalKernel calls it with no $site_path, which is what the OP tries to change in the patch on this issue.

Correct — this is why hash_salts must be configured. The downside of the patch proposed here is that it'd become impossible for sites to share APCu key-value pairs, since different sites always have a different site path. That's probably a major reason why this issue has never gotten close to getting committed.

Finally,

If my APCu is effectively the same between all the sites, what is it about the drush cr that fixes the problem... and doesn't cause my site to share a single and corrupt APCu?

APCu is memory in the PHP process. If the PHP process serves multiple sites, then the data in APCu (which are really just a bunch of key-value pairs) is inevitably shared across all sites. There's nothing wrong or corrupt about this. It just means that key-value pairs set on any site are available to any other site served by the same PHP process. That's why apcu_ensure_unique_prefix is critical: because it prefixes the keys of a particular site with that site's hash_salt. This implies that it is indeed necessary to specify a site-specific hash_salt.
In other words: none of this configuration is magical, it's just about ensuring that sites with site-specific configuration also get site-specific key-value pairs. The way this works is by prefixing the keys with a site-specific prefix.
drush cr would have fixed the problem if and only if you did not yet have apcu_ensure_unique_prefix set or did not yet configure site-specific hash_salts, but it'd have only fixed it temporarily, for the site that you ran drush cr on: because you then forcibly wiped the entire APCu cache and forced it to be populated with that particular site's key-value pairs again, which would then have broken another site served by the same PHP process.
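The behaviour described above can be sketched with a toy model. This is only an illustration under stated assumptions: a plain PHP array stands in for the APCu memory of one PHP process, `sketch_find_file()` is a hypothetical helper rather than Drupal's class loader, and the class name, prefixes and file paths are made up.

```php
<?php
// Hypothetical stand-in for a prefixed class map kept in one shared store.
// A plain array plays the role of the APCu memory of a single PHP process.
function sketch_find_file(array &$store, string $prefix, string $class, string $own_file): string {
  $key = $prefix . $class;
  if (!isset($store[$key])) {
    // Cache miss: remember where *this* site found the class.
    $store[$key] = $own_file;
  }
  // Return the stored path, which may have been written by another site.
  return $store[$key];
}

$class = 'Drupal\\my_module\\EventSubscriber\\MySubscriber';
$file_a = 'sites/a/modules/my_module/src/EventSubscriber/MySubscriber.php';
$file_b = 'sites/b/modules/my_module/src/EventSubscriber/MySubscriber.php';

// 1. Shared prefix: site A fills the cache, site B then receives A's file.
$store = [];
sketch_find_file($store, 'shared.', $class, $file_a);
$seen_by_b = sketch_find_file($store, 'shared.', $class, $file_b);

// 2. Wiping the store and hitting site B first moves the problem to site A.
$store = [];
sketch_find_file($store, 'shared.', $class, $file_b);
$seen_by_a = sketch_find_file($store, 'shared.', $class, $file_a);

// 3. Site-specific prefixes: each site only ever sees its own entry.
$store = [];
$own_a = sketch_find_file($store, 'site-a.', $class, $file_a);
$own_b = sketch_find_file($store, 'site-b.', $class, $file_b);

var_dump($seen_by_b === $file_a, $seen_by_a === $file_b, $own_a === $file_a && $own_b === $file_b);
```

Step 2 corresponds to the "temporary fix" case: clearing the shared store helps the site that repopulates it first and hurts whichever site comes second.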

Whew, I hope I managed to explain that clearly enough 🤞😅

dpagini’s picture

@dpagini I'm sorry, but I worked on this nearly two years ago, I do not recall all details.

Understandable! Thanks for your response/time at all!

drush cr would have fixed the problem if and only if you did not yet have apcu_ensure_unique_prefix set

So this part still confuses me a bit. Was there ever a time this did not default to "ON"? I tried looking at the git history and it looks like it's been defaulted to TRUE for quite a long time. So in that case, the only way to not have had it set would have been to explicitly set it to FALSE, right?

but it'd have only fixed it temporarily, for the site that you ran drush cr on: because you then forcibly wiped the entire APCu cache and forced it to be populated with that particular site's key-value pairs again, which would then have broken another site served by the same PHP process.

Ok, so this is the most confusing part to me... b/c I've actually been using 2 sites for months now with little to no issue. I don't know what caused the problem the first time, but for the most part, these sites work OK. I remember reading something that the way drush cr rebuilds the cache is possibly a unique way, and maybe that's why it clears up the problem? Or maybe it's related to the PHP processes...? And we have somehow been getting VERY VERY lucky?
Part of what has made this issue so hard for me is I have no idea how to reproduce, which makes fixing this _very difficult_.

---

So at the end of the day... the recommendation would still be to make sure each site has a unique hash_salt and that should really be the end of it? As opposed to something like turning off APCu (is that even possible)?
And having a few (we are 3) multisites would just increase the APCu size by ~3x from what it is now, but there shouldn't be other fallout other than that?

Thanks again for weighing in. I'm ~95% sure this is the issue we've faced, but I don't like that I can't reproduce and confirm the fix, and that what I'm seeing doesn't exactly match up with what you're suggesting should only be a temporary fix when you do a drush cr.

dpagini’s picture

So I still need to solve this for Drupal 8, but it looks like this may not be a problem at all in Drupal 9 after this issue was addressed.
Specifically, this line now varies the APCu prefix based on a hash of the module list (site specific) in Drupal 9.

dpagini’s picture

Ok, just thinking out loud... to ask a different way... does anyone have any ideas how to manually recreate this? If I run a drush cr against both of my multisites, everything works fine...

Similarly, I think the OP is saying the same thing...

After a while, site B goes down because Drupal tries to call the callback for event E1, but that method does not exist.

This is what I'm seeing too... "After a while" - but I don't have any idea what triggers the problem.

roderik’s picture

(I don't have access to all documentation from my previous job anymore, so I hope I'm not making a mistake... but since very few people will likely see this thread, I'll do a quick answer anyway. Basically just confirming what you've discovered already - if that's still needed.)

So at the end of the day... the recommendation would still be to make sure each site has a unique hash_salt and that should really be the end of it? As opposed to something like turning off APCu (is that even possible)?

Correct. That should be the end of it.

The second-best option is bypassing the APCu class loader. You don't need to turn off APCu for that; you can set the 'class_loader_auto_detect' setting to FALSE. But varying the hash_salt is better. It's better practice in general. (The only reason that tech support would tell you otherwise is that you need to be prepared for a little theoretical bump at the moment you do so: in-flight form submissions might fail after the update, image styles might be regenerated, and anything in contrib/custom code that depends on the hash_salt may be affected. Hopefully that's nothing.)
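For reference, that second option would be a one-line settings.php fragment, sketched here under the assumption that the kernel reads the setting exactly as in the excerpt from the issue summary (`Settings::get('class_loader_auto_detect', TRUE)`):

```php
// Fragment for settings.php: skip the automatic upgrade to an optimized
// (APCu-backed) class loader. APCu itself stays available to other caches.
$settings['class_loader_auto_detect'] = FALSE;
```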

And having a few (we are 3) multisites would just increase the APCu size by ~3x from what it is now, but there shouldn't be other fallout other than that?

Again, correct. And the class loader isn't huge in comparison to other forms of caches that are stored in APCu, so you're likely to not notice. (Even though: no official guarantees, this is an average statement / depends on your specific site, etc: I've done rough statistics on it once - and the increase in cache size generally doesn't seem to be a reason to take the classloader out of APCu instead.)

Ok, just thinking out loud... to ask a different way... does anyone have any ideas how to manually recreate this? If I run a drush cr against both of my multisites, everything works fine...

Since your first "that would be the end of it" is correct, this may be less important.

I do not have a detailed answer to your question because I never dove into actually reproducing this situation. But what I witnessed indirectly, suggests that crashes started happening 'randomly' when sites started using different versions of custom modules or themes - so they have different sets of classes; one site will then load another site's classes with all kinds of weird potential outcomes.
In cases where sites don't have site-specific modules but do have site-specific themes: crashes would start happening when one site had the Bootstrap theme enabled (which includes classes) when another site did not. (Or has a different major version of the Bootstrap theme?)

How to reproduce exactly: no idea. This is just a 'prerequisite' for this to start happening. Also, I have vaguely heard about what you say about 'drush cr' fixing the issue, but what I would expect (and cannot confirm in practice) is that while the site where you executed the 'drush cr' would work correctly, another site (where e.g. the Bootstrap theme situation was different) may start seeing crashes from that moment on.

dpagini’s picture

Argh. So about a month ago, we pushed a change to our production site to vary hash_salt per multisite. Things were going OK until yesterday when we had a site start crashing pages with the same errors.

I'm still highly convinced what we are talking about here is my culprit, but for some reason this hash_salt change did not seem to fix the problem. I'm thinking it may still be an option to try this class_loader_auto_detect=FALSE change.

It's a bit concerning to me that I could change the hash_salt like we are talking about in this thread, but still have this issue. Not being able to reliably recreate this problem is maybe the most difficult part about this bug.

Version: 8.9.x-dev » 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.2.x-dev » 9.3.x-dev
bkosborne’s picture

@dpagini - it's been a while, how are things on your site now? I'm looking into this as we have hundreds of sites and I've noticed that our sites are reporting that the APCu cache is full, and I'm seeing some performance issues related to class loading. So I think we need to increase our APCu cache size, and that just led me down a rabbit hole of learning all about how Drupal uses it. I haven't experienced any issues with sites crashing or using incorrect cache values, though our sites don't have different versions of the same module or anything like that (though they do all have different modules enabled).
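A quick way to check how full the APCu cache is could look like the sketch below. It uses the APCu extension's `apcu_enabled()`, `apcu_sma_info()` and `apcu_cache_info()` functions; `sketch_apcu_usage()` is a hypothetical helper name. Note that the CLI usually has its own (often disabled) APCu cache, so this would have to run under the web server's PHP to say anything about the sites themselves.

```php
<?php
// Hypothetical helper: summarize APCu memory usage, or return NULL when the
// extension is missing or disabled for the current SAPI.
function sketch_apcu_usage(): ?array {
  if (!function_exists('apcu_enabled') || !apcu_enabled()) {
    return NULL;
  }
  // TRUE requests the limited form, without per-block or per-entry details.
  $sma = apcu_sma_info(TRUE);
  $cache = apcu_cache_info(TRUE);
  if ($sma === FALSE || $cache === FALSE) {
    return NULL;
  }
  return [
    'total_bytes' => $sma['num_seg'] * $sma['seg_size'],
    'free_bytes' => $sma['avail_mem'],
    'entries' => $cache['num_entries'],
    // A growing expunge count suggests the cache keeps filling up.
    'expunges' => $cache['expunges'],
  ];
}

$usage = sketch_apcu_usage();
var_dump($usage);
```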

Version: 9.3.x-dev » 9.4.x-dev


borisson_’s picture

It looks like the most recent commenters are saying that this is no longer an issue. I think we can close this as a duplicate of #3020296: Remove Symfony's classloader as it does not exist in Symfony 4

Version: 9.4.x-dev » 9.5.x-dev


needs-review-queue-bot’s picture

Status: Needs review » Needs work
Attachment (status: new, size: 183 bytes)

The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

Version: 9.5.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.
