Hi,
We ran into an issue with code caching on a multisite installation. Our setup has several sites, and each of them has its own modules folder.
Case 1:
On site A we have an EventSubscriber with 2 events (E1 and E2).
On site B we have an EventSubscriber with 1 event (E1).
After a while, site B goes down because Drupal tries to call the callback of the E1 event, but this method does not exist.
Case 2:
On site A we have SomeForm with a redirect to page P1.
On site B we have SomeForm with a redirect to page P2.
After a while, submitting the form on site B redirects to page P1.
After some investigation, I found the approximate cause of the error. Drupal uses APCu for caching, but the $site_path variable is not used when building the APCu prefix.
web/core/lib/Drupal/Core/DrupalKernel.php:1041
  protected function initializeSettings(Request $request) {
    $site_path = static::findSitePath($request);
    $this->setSitePath($site_path);
    $class_loader_class = get_class($this->classLoader);
    Settings::initialize($this->root, $site_path, $this->classLoader);
    // Initialize our list of trusted HTTP Host headers to protect against
    // header attacks.
    $host_patterns = Settings::get('trusted_host_patterns', []);
    if (PHP_SAPI !== 'cli' && !empty($host_patterns)) {
      if (static::setupTrustedHosts($request, $host_patterns) === FALSE) {
        throw new BadRequestHttpException('The provided host name is not valid for this server.');
      }
    }
    // If the class loader is still the same, possibly
    // upgrade to an optimized class loader.
    if ($class_loader_class == get_class($this->classLoader)
      && Settings::get('class_loader_auto_detect', TRUE)) {
      $prefix = Settings::getApcuPrefix('class_loader', $this->root);
In the last line you can see that $site_path is not passed to the prefix builder.
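To make the effect concrete, here is a minimal, self-contained sketch. It is not core's implementation: `sketch_apcu_prefix()` is a hypothetical stand-in that only assumes the prefix is a hash over the hash salt, the Drupal root, and an optional site path, and the root, salt, and site paths are made up. When the site path is left out, two sites under the same root that share a hash salt end up with the same prefix.

```php
<?php

/**
 * Hypothetical, simplified prefix builder (an assumption, not Drupal core code).
 */
function sketch_apcu_prefix(string $identifier, string $root, string $hash_salt, string $site_path = ''): string {
  // The salt, root and site path form the HMAC key, so any difference in
  // one of them yields a different prefix.
  return 'drupal.' . $identifier . '.' . hash_hmac('sha256', $identifier, $hash_salt . '.' . $root . '/' . $site_path);
}

$root = '/var/www/web';  // Made-up Drupal root shared by both sites.
$salt = 'shared-salt';   // Made-up hash salt shared by both sites.

// Site path omitted, as in the kernel call quoted above: both sites collide.
$prefix_a = sketch_apcu_prefix('class_loader', $root, $salt);
$prefix_b = sketch_apcu_prefix('class_loader', $root, $salt);

// Site path included: each site gets its own prefix.
$prefix_a_site = sketch_apcu_prefix('class_loader', $root, $salt, 'sites/a');
$prefix_b_site = sketch_apcu_prefix('class_loader', $root, $salt, 'sites/b');

var_dump($prefix_a === $prefix_b);            // bool(true)
var_dump($prefix_a_site === $prefix_b_site);  // bool(false)
```

The same separation can be reached without touching the kernel if the salts differ per site, because the salt is part of the hashed input in this sketch as well.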
| Comment | File | Size | Author |
|---|---|---|---|
| #30 | 2984232-nr-bot.txt | 183 bytes | needs-review-queue-bot |
| #3 | drupal-adding_site_path_to_cache_prefix-2984232-3.patch | 723 bytes | sstanislav |
Comments
Comment #2
sstanislav commented
What do you think about this solution?
Comment #3
sstanislav commented
Fixed to apply to a clean Drupal 8.6.x.
Comment #4
cilefen commented
Comment #5
cilefen commented
Comment #6
sstanislav commented
Comment #7
cilefen commented
Comment #8
hart0554 commented
We've dealt with this problem and have so far worked around it by ensuring every site in the multisite has a unique hash salt, since `Settings::getApcuPrefix` uses the hash salt as part of the prefix (that is, if 'apcu_ensure_unique_prefix' is set; it seems to be in my case because this has been working, but I'm not sure how to verify that it is actually set). The patch would seem a more reliable solution, since $site_path would be included in the prefix regardless of the APCu setting.
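As a rough illustration of that workaround, each site's own `settings.php` could look something like the fragment below. The site directory name and the salt value are placeholders, not values from this issue.

```php
// sites/site_a/settings.php (hypothetical site directory).

// Placeholder: use a long random value that is different for every site.
$settings['hash_salt'] = 'REPLACE_WITH_A_RANDOM_VALUE_UNIQUE_TO_THIS_SITE';

// Make the hash salt part of the APCu prefix for this site.
$settings['apcu_ensure_unique_prefix'] = TRUE;
```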
Comment #10
wim leers
#2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects introduced this. The goal was to avoid APCu fragmentation. Otherwise each individual site (both when testing and in case of multisite) would end up caching the exact same data multiple times under different keys.
\Drupal\Core\Site\Settings::getApcuPrefix() (introduced in that issue) has this in its docs:
And this issue says:
So in your case you want to set `apcu_ensure_unique_prefix` to TRUE in your `settings.php`. This is also what @hart0554 said.
That would make things work as expected. But is it a good idea?
APCu + multisite + per-site code
If you combine these three, like you are, then we again run into the problem originally fixed in #2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects. APCu is usually measured in tens of megabytes. If you need to share this space between 5 or 10 or even more sites, then very little space remains available. At that point: what is even the point of APCu caching?
The benefit of APCu is that it is a cache that lives inside the `php` process. Hence the I/O cost is very low, and it's very fast. Which is why we use it to cache code (PHP classes loaded by the class loader).
Compare it to a CPU's L1 cache: it's tiny, but it's super fast.
So I have to concur with core committer catch at #2926309-31: Random fail due to APCu not being able to allocate memory:
Comment #11
wim leers
Comment #12
andypost
Comment #15
sstanislav commented
Comment #16
dpagini commented
@Wim Leers - your comment above helped me quite a bit, but I'm left with a few questions. It seems that `apcu_ensure_unique_prefix` defaults to TRUE, so it would be on for everyone? Or did other changes alter the way this is called? I see that DrupalKernel calls it with no $site_path, which is what the OP tries to change in the patch on this issue.
For the time being, I'm just making sure I set my hash_salt to a unique value per site. It looks like that will be included, again by default, in the APCu prefix generated in Settings.php. Previously my sites, built on BLT, were sharing a `hash_salt` across every site.
What I don't understand, though, is this: if this is my problem, how was it breaking for me in the first place? It seems like my site gets into a state where it cannot find site-specific module code. When I do a `$ drush cr`, it fixes the problem. If my APCu is effectively the same across all the sites, what is it about the drush cr that fixes the problem... and doesn't cause my sites to share a single, corrupt APCu?
Comment #17
hugronaphor commented
Just want to share my experience.
My stack is Kubernetes + APCu + multisite + NO per-site code
Without `$settings['apcu_ensure_unique_prefix'] = TRUE;` and a unique `$settings['hash_salt']` per site, I'm getting mixed-up config values and even CSS files from other sites, as these are different for each site.
Update: After running it in prod for 2 weeks, I notice considerably lower and more stable resource usage.
Comment #18
wim leers
@dpagini I'm sorry, but I worked on this nearly two years ago and do not recall all the details. I'd need to re-read all related code to provide you with a solid answer.
But it sounds like you are using APCu in a multi-site scenario. Which is not recommended per
… and it's why this issue is unlikely to ever get committed.
Yes, but if sites share the same `hash_salt`s, they end up getting the same prefix anyway.
Correct: this is why `hash_salt`s must be configured. The downside of the patch proposed here is that it'd become impossible for sites to share APCu key-value pairs, since different sites always have a different site path. That's probably a major reason why this issue has never gotten close to getting committed.
Finally, APCu is memory in the PHP process. If the PHP process serves multiple sites, then the data in APCu (which is really just a bunch of key-value pairs) is inevitably shared across all sites. There's nothing wrong or corrupt about this. It just means that key-value pairs set on any site are available to any other site served by the same PHP process. That's why `apcu_ensure_unique_prefix` is critical: it prefixes the keys of a particular site with that site's `hash_salt`. This implies that it is indeed necessary to specify a site-specific `hash_salt`.
In other words: none of this configuration is magical, it's just about ensuring that sites with site-specific configuration also get site-specific key-value pairs. The way this works is by prefixing the keys with a site-specific prefix.
`drush cr` would have fixed the problem if and only if you did not yet have `apcu_ensure_unique_prefix` set or had not yet configured site-specific `hash_salt`s, but it'd have only fixed it temporarily, for the site that you ran `drush cr` on: because you then forcibly wiped the entire APCu cache and forced it to be populated with that particular site's key-value pairs again, which would then have broken another site served by the same PHP process.
Whew, I hope I managed to explain that clearly enough 🤞😅
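The explanation above can be modelled in a few lines. This is only a sketch under stated assumptions: a plain PHP array stands in for the APCu memory of one PHP process, `find_class_file()` is a hypothetical helper rather than a Drupal or APCu API, and the class name, prefixes, and paths are made up. Whichever site populates a key first determines what every other site using the same prefix reads back.

```php
<?php

/**
 * Hypothetical lookup: return the cached file for a class, or "discover" it
 * in the requesting site's own modules folder and cache it on a miss.
 */
function find_class_file(array &$store, string $prefix, string $class, string $site_path): string {
  $key = $prefix . $class;
  if (!isset($store[$key])) {
    $store[$key] = $site_path . '/modules/my_module/src/MySubscriber.php';
  }
  return $store[$key];
}

$class = 'Drupal\my_module\MySubscriber';  // Made-up class name.

// Shared prefix: site A warms the cache, then site B reads site A's entry.
$store = [];
find_class_file($store, 'shared.', $class, 'sites/a');
$shared_result = find_class_file($store, 'shared.', $class, 'sites/b');
echo $shared_result, PHP_EOL;    // sites/a/modules/my_module/src/MySubscriber.php

// Site-specific prefixes: site B resolves its own file.
$store = [];
find_class_file($store, 'site_a.', $class, 'sites/a');
$isolated_result = find_class_file($store, 'site_b.', $class, 'sites/b');
echo $isolated_result, PHP_EOL;  // sites/b/modules/my_module/src/MySubscriber.php
```

Emptying `$store` in the shared-prefix case and letting site B ask first would flip the outcome, which mirrors the "temporary fix for one site" behaviour described for `drush cr`.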
Comment #19
dpagini commented
Understandable! Thanks for your response and your time!
So this part still confuses me a bit. Was there ever a time this did not default to "ON"? I tried looking at the git history and it looks like it's been defaulted to TRUE for quite a long time. So in that case, the only way to not have had it set would have been to explicitly set it to FALSE, right?
Ok, so this is the most confusing part to me... because I've actually been using 2 sites for months now with little to no issue. I don't know what caused the problem the first time, but for the most part, these sites work OK. I remember reading something saying that the way `drush cr` rebuilds the cache is possibly unique, and maybe that's why it clears up the problem? Or maybe it's related to the PHP processes...? And we have somehow been getting VERY VERY lucky?
Part of what has made this issue so hard for me is that I have no idea how to reproduce it, which makes fixing it _very difficult_.
---
So at the end of the day... the recommendation would still be to make sure each site has a unique `hash_salt`, and that should really be the end of it? As opposed to something like turning off APCu (is that even possible)?
And having a few multisites (we have 3) would just increase the APCu size by ~3x from what it is now, but there shouldn't be any fallout other than that?
Thanks again for weighing in. I'm ~95% sure this is the issue we've faced, but I don't like that I can't reproduce it and confirm the fix, and that what I'm seeing doesn't exactly match what you're suggesting should only be a temporary fix when you do a `drush cr`.
Comment #20
dpagini commented
So I still need to solve this for Drupal 8, but it looks like this may not be a problem at all in Drupal 9 after this issue was addressed.
Specifically, this line now varies the APCu prefix based on a hash of the module list (site specific) in Drupal 9.
Comment #21
dpagini commented
Ok, just thinking out loud... to ask a different way... does anyone have any ideas how to manually recreate this? If I run a `drush cr` against both of my multisites, everything works fine...
Similarly, I think the OP is saying the same thing...
This is what I'm seeing too... "After a while" - but I don't have any idea what triggers the problem.
Comment #22
roderik
(I don't have access to all documentation from my previous job anymore, so I hope I'm not making a mistake... but since very few people will likely see this thread, I'll do a quick answer anyway. Basically just confirming what you've discovered already - if that's still needed.)
Correct. That should be the end of it.
The second best option is bypassing the APCu class loader - you don't need to turn off APCu for that, you can set the 'class_loader_auto_detect' setting to FALSE. But varying the hash_salt is better. It's better practice in general. (The only reason that tech support would tell you otherwise is that you need to be prepared for a small, mostly theoretical bump at the moment you do so: in-flight form submissions might fail after the update, image styles might be regenerated, and... anything in contrib/custom code that depends on the hash_salt might be affected. Hopefully that's nothing.)
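For reference, that second option is a one-line change in each affected site's `settings.php`. Treat it as a sketch of the setting named above, to be weighed against the per-site hash_salt approach rather than used as a drop-in recommendation.

```php
// settings.php: keep APCu available for other caches, but stop the kernel
// from automatically switching to the APCu-backed class loader.
$settings['class_loader_auto_detect'] = FALSE;
```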
Again, correct. And the class loader isn't huge in comparison to other forms of caches that are stored in APCu, so you're likely to not notice. (Even though: no official guarantees, this is an average statement / depends on your specific site, etc: I've done rough statistics on it once - and the increase in cache size generally doesn't seem to be a reason to take the classloader out of APCu instead.)
Since your first "that would be the end of it" is correct, this may be less important.
I do not have a detailed answer to your question because I never dove into actually reproducing this situation. But what I witnessed indirectly, suggests that crashes started happening 'randomly' when sites started using different versions of custom modules or themes - so they have different sets of classes; one site will then load another site's classes with all kinds of weird potential outcomes.
In cases where sites don't have site-specific modules but do have site-specific themes: crashes would start happening when one site had the Bootstrap theme enabled (which includes classes) when another site did not. (Or has a different major version of the Bootstrap theme?)
How to reproduce exactly: no idea. This is just a 'prerequisite' for this to start happening. Also, I have vaguely heard about what you say about 'drush cr' fixing the issue, but what I would expect (and cannot confirm in practice) is that while the site where you executed the 'drush cr' would work correctly, another site (where e.g. the Bootstrap theme situation was different) may start seeing crashes from that moment on.
Comment #23
dpagini commented
Argh. So about a month ago, we pushed a change to our production site to vary `hash_salt` per multisite. Things were going OK until yesterday, when one site started crashing pages with the same errors.
I'm still highly convinced that what we are talking about here is my culprit, but for some reason this hash_salt change did not seem to fix the problem. I'm thinking it may still be an option to try this `class_loader_auto_detect` = FALSE change.
It's a bit concerning to me that I could change the hash_salt like we are talking about in this thread, but still have this issue. Not being able to reliably recreate this problem is maybe the most difficult part of this bug.
Comment #26
bkosborne
@dpagini - it's been a while, how are things on your site now? I'm looking into this as we have hundreds of sites, and I've noticed that our sites are reporting that the APCu cache is full and I'm seeing some performance issues related to class loading. So I think we need to increase our APCu cache size, and that just led me down a rabbit hole of learning all about how Drupal uses it. I haven't experienced any issues with sites crashing or using incorrect cache values, though our sites don't have different versions of the same module or anything like that (though they do all have different modules enabled).
Comment #28
borisson_
It looks like the most recent commenters are saying that this is no longer an issue. I think we can close this as a duplicate of #3020296: Remove Symfony's classloader as it does not exist in Symfony 4
Comment #30
needs-review-queue-bot commented
The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.