Had a long discussion with Moshe and Alex today about unique deployment challenges in D8, thanks to compiled PHP, which will only be exacerbated when Twig lands in full.
Here are some details I remember... this will probably need a better issue summary:
- Drupal performs certain required bootstrap functions (e.g. service registration) as Symfony "compiler passes." This effectively means the compiled PHP service container is hit on every page load. With Twig, that's only going to become more true.
- On multi-head environments, with the default configuration core gives you, this compiled PHP directory will end up on an NFS mount or similar. These are not particularly fast, which is going to lead to major performance problems.
- The Drupal-tuned hosting companies (think Acquia, Pantheon, etc.) will probably invent workarounds for this with their various Very Smart People™, and high-end Drupal sites (think Examiner, Whitehouse, etc.) might manage workarounds like turning off the modules page on their servers so that the service container never needs to be recompiled. But this leaves Drupal SaaS providers, as well as users of generic multi-tier hosting platforms (think Rackspace Cloud), in a pretty bad place.
One workaround Alex suggested was disabling APC's stat option (apc.stat), which would improve performance on NFS, but that is unfriendly to other non-Drupal applications that might also run on the same box. It might also mean restarting Apache on every deploy, which seems like a deal-killer. :\
Since high-traffic sites are a significant portion of Drupal's target audience, this is a pretty big issue. In what ways can we mitigate these challenges prior to release?
Comments
Comment #1
webchick: Tagging.
Comment #1.0
webchick: x
Comment #1.1
webchick: Updated issue summary.
Comment #2
chx CreditAttribution: chx commented: You keep your container on a local disk, and the system will read the enabled module list from CMI to see whether it needs rebuilding; it will rebuild on every web head on demand. We went through this with msonnabaum, davidstrauss and bjaspan at BADCamp. Never, ever put the container on a shared disk. In DrupalKernel:
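(The DrupalKernel snippet chx pasted here did not survive in this copy of the issue. A rough sketch of the check he describes, with illustrative names rather than actual core code:)

```php
<?php
// Sketch only: the compiled container carries a record of the module
// list it was built from (the "tombstone" mentioned below); each web
// head compares it against the enabled module list read from CMI and
// rebuilds its local copy on mismatch. Names are illustrative.
function container_is_stale(array $enabled_modules, $recorded_hash) {
  return hash('sha256', serialize($enabled_modules)) !== $recorded_hash;
}

// On a module enable/disable, every web head notices the mismatch on
// its next request and rebuilds on demand from local disk.
```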
The only issue here is to find a similar tombstone record for Twig.
Comment #3
moshe weitzman CreditAttribution: moshe weitzman commented: Yeah, I think that's a reasonable approach for the container. Like you said, Twig is still outstanding.
The larger issue IMO is that we are introducing a new deployment complexity for multi-headed web sites. When big web sites evaluate Drupal, they now have one more complexity that can make them say "screw it". Just because they are big does not mean they want to deal with this, or have funds for Drupal dev/ops experts.
Comment #4
chx CreditAttribution: chx commented:
> The larger issue IMO is we are introducing a new deployment complexity for multi-headed web sites.
The container is already automated. Let's figure out Twig and move on.
Comment #5
moshe weitzman CreditAttribution: moshe weitzman commented: You missed my point, or are choosing to ignore it. Just because there is a recommended approach to deal with PHPStorage does not mean that the complexity is justified. The problem isn't just for multi-headed sites. It's also hated by developers who frequently re-install sites.
Comment #6
effulgentsia CreditAttribution: effulgentsia commented: For both the container and Twig, the compiled PHP files depend only on the state of the codebase (e.g., when you update core from 8.0 to 8.1, or change one of your custom theme's .twig files, the compiled PHP files must be deleted) and on which modules you have enabled. A multi-web-server site can choose one of the following approaches to manage this:
Approach 1:
a) Create a location on each web server that is writable by the PHP process. This is a new requirement that did not exist in D7. With D7, you deploy your code to each web server, but that code is read-only, and you put your sites/default/files on a shared file system and make it writable. What's new in D8 is needing to create a separate location on each web server and making it writable.
b) Change your settings.php to inform PHPStorage of that location.
c) Change whatever script you have that deploys your code from a central repository to each of your web nodes to also empty out that writable location on each web node. This is also a new requirement that did not exist in D7. In D7, you just had to run update.php / clear caches from a single web request, not do something (other than deploying the code itself) on each web node separately.
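The extra step in (c) might look something like the following on each web node; this is a hypothetical sketch, not a core-provided script, and the paths and names are illustrative:

```shell
#!/bin/sh
# Hypothetical sketch of Approach 1, step (c): after your normal code
# push to a web node, empty that node's local web-server-writable
# compiled-PHP location so the container and Twig templates are
# regenerated against the new codebase. Paths are illustrative.
reset_compiled_dir() {
  dir="$1"
  rm -rf "$dir"
  mkdir -p "$dir"
  # A real deployment would also ensure the PHP process can write
  # here, e.g. chown www-data "$dir".
}

# Example invocation (path is illustrative):
# reset_compiled_dir /var/lib/drupal/compiled-php
```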
Approach 2:
a) Decide to disable admin/modules, and treat changes to which modules are enabled on your site as a code deployment. Meaning you can only do it by modifying the config file that's in your repository, and then running the same process/script that you run to deploy code updates. This allows you to not need a web server writable directory on your web nodes.
b) Change your settings.php to use FileReadOnlyStorage and specify a non-web-server-writable location on your web node.
c) Change whatever script you have that deploys your code from a central repository to each of your web nodes to also compile the DIC and twig files and place those compiled files onto each web node.
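Approach 2's deploy flow could be sketched like this; everything here is hypothetical (as noted below, core does not yet ship the compile step), and the node names, paths, and commands are illustrative:

```shell
#!/bin/sh
# Hypothetical sketch of Approach 2: compile once on a build host,
# then push the read-only result to every web node. The compile step
# is a stand-in, since no such core-provided script exists yet.
set -e
BUILD_DIR="${BUILD_DIR:-/tmp/drupal-build}"
WEB_NODES="${WEB_NODES:-web1 web2}"

compile_php_artifacts() {
  # Stand-in for whatever compiles the DIC and Twig templates against
  # the checked-out code plus the enabled-modules config file.
  mkdir -p "$BUILD_DIR/php"
  echo "<?php // compiled container placeholder" > "$BUILD_DIR/php/container.php"
}

push_to_nodes() {
  for node in $WEB_NODES; do
    # e.g. rsync -a --delete "$BUILD_DIR/php/" "$node:/var/www/compiled-php/"
    echo "would push compiled PHP to $node"
  done
}
```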
I don't think we yet have what we need in Drupal HEAD to support the second approach, but I think we need to do that. For example, add a .sh script to do the compilation, and ensure that a config import of your new list of enabled modules works when the DIC represents your new modules rather than your old ones. Assuming we do that, though, I think the above two would be the most common setups for multi-headed sites. Fancier approaches are possible, like creating a PHPStorage implementation that stores only in APC, and not on disk at all, but this issue is about helping the typical big sites, not the ones that need a dev/ops expert for super tuning.
Yeah, I think the question in this issue is whether the approaches above (especially part c. of each) are within easy enough reach (both in terms of awareness/documentation and skill level needed to perform) of people who evaluate competing platforms for big sites and then administer those sites.
Comment #7
chx CreditAttribution: chx commented: (c) will soon be doable from a web request as well. #1872522: Compiled data in PHP storage is cleared too late in drupal_flush_all_caches(). Reviews welcome! :)
Comment #8
andypost: Also interesting: how to deploy core 8 in a multi-head infrastructure.
Comment #9
David_Rothstein CreditAttribution: David_Rothstein commented: Couldn't update.php and cache clears set a config variable which stores the current REQUEST_TIME? And then have the container-building code check that against the filemtime() of the code directory, and rebuild the container when it's newer? (As far as I understand, the filemtime() check is being done already, so no performance hit there.) Then everything should happen automatically, with no need for a custom script.
We'd want to be careful not to allow stampedes where multiple requests try to rebuild the container at once, but my guess is that the current container-building code is already vulnerable to stampedes anyway...
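One reading of the idea in #9, sketched as code; the names are illustrative, not actual core code:

```php
<?php
// Sketch only: update.php / cache clears record the current
// REQUEST_TIME in config (shared by all web heads), and each head's
// container-building code compares that against the mtime of its
// local compiled file. An older local file means a flush happened
// somewhere, so this head rebuilds. Names are illustrative.
function container_needs_rebuild($last_flush, $compiled_file) {
  return !file_exists($compiled_file)
    || filemtime($compiled_file) < $last_flush;
}
```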
I think this is a big problem, and the default configuration doesn't make much sense to me for any site (small or large). Why should code, configuration, and user-uploaded files all live in the same place? This could cause security headaches too, since it encourages backups of all those things to be mixed together (even though you might care a lot more about the security of your code + configuration than you do about user-uploaded files, which are already public).
Would it be possible for the default configuration to look more like this:
Sites with multiple web servers would then be instructed to have the first two in a shared filesystem but the third one not shared.
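(The example layout David posted did not survive in this copy of the issue. Judging from "the first two" and "the third one", it was presumably three directories along these lines; the names below are guesses, not the original proposal:)

```
sites/default/files/   - user-uploaded files  (shared across web heads)
sites/default/config/  - configuration        (shared across web heads)
sites/default/php/     - compiled PHP         (local to each web head)
```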
Comment #10
webchick: Hm. That's an idea. It makes installations harder/more complex, though... each one of these is going to get a big red error message on installation saying it needs to be a writable directory. It's a big enough pain to do this for one folder currently.
I was going to suggest setting up a "private, non-webroot" vs. "public, webroot" files directory on installation, since that would at least give a security benefit for the extra annoyance, but that still requires more complex instructions to separate config vs. code on multi-tier architecture.
Comment #11
David_Rothstein CreditAttribution: David_Rothstein commented: Well, the current instructions in INSTALL.txt say this:
So if you're following the first set of instructions it won't be any harder than it was before; once you've made sites/default writable the installer can create all the directories in there that it wants.
If you're following the "Alternatively" section at the end, it will be more of a pain. I'm not sure offhand if there are good reasons for doing that in typical cases.
Comment #12
moshe weitzman CreditAttribution: moshe weitzman commented: I think splitting up those directories is an improvement. My point in #5 about complexity still stands.
Comment #13
chx CreditAttribution: chx commented: Re #9, I think #1970276: Figure out file layouts is a separate issue for that.
Comment #13.0
chx CreditAttribution: chx commented: Less dramatic. :P~
Comment #14
catch: Removing 'revisit' tag. This needs sorting out, but it's not tied to any particular 8.x milestone IMO. Anything which changes the actual behaviour in core (like config in the db) can have its own issues.
Comment #15
webchick: This came up again at Dev Days as we explore various ways of improving Drupal 8's performance. Some of those plans involve putting more PHP into storage, so bumping and tagging this to get it back on folks' radars again.
Side-note: Why on earth are comments I posted attributed to chx in this issue..?
Comment #16
moshe weitzman CreditAttribution: moshe weitzman commented: Thanks @webchick. I think Alex's comment in #6 is very relevant. Let's only put stuff into PHPStorage that remains read-only until a code deploy happens.
Comment #17
Berdir: OK, here are my thoughts from today...
This came up again specifically for the APC classloader, but I don't see a problem there. It does an apc_fetch() for every class (which is pretty fast because it's just a string lookup), and on a cache miss we just fall back to the regular lookup. The only scenario where I think it can fail is when a module is moved, because then the same namespace + class name will be in a different location. But that's a code change, and on a single site you then need a cache clear anyway, which should also invalidate the classloader; with multiple web heads, you can restart Apache or clear it in a script as part of the deployment process.
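The miss-then-fallback behavior Berdir describes is the usual caching-decorator pattern; a simplified sketch (illustrative, not the actual Symfony ApcClassLoader source):

```php
<?php
// Simplified sketch of an APC-backed class loader decorator.
// Illustrative only; not the actual code used by core.
class ApcCachingClassLoader {
  private $decorated;
  private $prefix;

  public function __construct($decorated, $prefix) {
    $this->decorated = $decorated;
    $this->prefix = $prefix;
  }

  public function findFile($class) {
    // apc_fetch() of a short string key is cheap; on a miss, fall
    // back to the real loader and cache whatever it resolves.
    $file = apc_fetch($this->prefix . $class, $hit);
    if (!$hit) {
      $file = $this->decorated->findFile($class);
      apc_store($this->prefix . $class, $file);
    }
    return $file;
  }
}
```

If a module moves on disk, the cached path goes stale, which is exactly the clear-on-deploy case described above.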
For PhpStorage, my thoughts are similar to what moshe just wrote.
For things that *only* depend on code, like the Twig template cache (which template is used can change based on the enabled modules, configuration, whatever, but the compiled template is always the same until the actual code changes), it should be easy to point to local, temporary/fast storage. No problems there. On deployments, you just invalidate that and you're good.
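Pointing the Twig bin at local disk could be done from settings.php along these lines; the exact keys and class name follow the D8 PhpStorage pattern but should be checked against your core version, and the directory is illustrative:

```php
<?php
// settings.php sketch (illustrative): keep compiled Twig templates on
// fast local disk instead of the shared mount. Verify the key names
// against your D8 version before relying on this.
$settings['php_storage']['twig'] = array(
  'class' => 'Drupal\Component\PhpStorage\FileStorage',
  'directory' => '/var/tmp/drupal-twig',  // local, web-server-writable
);
```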
Things like container, that might change on production are harder. I can see three different ideas on how to address that:
* We do something like #2301163: Create a phpstorage backend that works with a local disk but coordinates across multiple webheads, but that means we have to do an additional database/redis/whatever query in super-early bootstrap. Not that great, I think.
* If we make the assumption that we can still serve a request, or at least bootstrap far enough, then we could do a similar check much later in the request, for example after page cache, possibly even in terminate. That has a certain risk of breaking that request, but most things should be fine.
* You have some kind of bus system in your environment, and when one web server does something like writing an updated container, it can emit an event to the others, which will then delete their local file so that it can be rebuilt. That requires some additional infrastructure, but it might be the fastest option, as it has no overhead until something happens.