Closed (fixed)
Project:
Provision
Version:
7.x-3.x-dev
Component:
Install process
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
5 Jul 2019 at 20:06 UTC
Updated:
21 May 2021 at 21:36 UTC
Comments
Comment #2
memtkmcc commented
IIRC, it was related to the fact that in the old days (I'm not sure whether that has actually improved or changed) you couldn't fire such a verify from the backend, because it would wipe out all aliases, while the subdir sites logic, the way it was built, relied on special aliases being present. This was because the backend verify, unlike a verify run via the frontend, didn't have access to the aliases list (in the database), so the aliases were then gone from both the vhost and the drushrc.php file. But I know we have modified our copy used in BOA to fire the verify as a hosting task, not a direct provision verify, and this worked for problems previously affecting cloning and migrating sites with aliases. This could be just a leftover related to these old problems, but having had our own workarounds in BOA for years, we stopped tracking this particular issue, so my explanation may have only historical value.
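As an illustration of the failure mode described above, here is a toy Python model. It is not Provision or Hosting code: every function and variable name is hypothetical, and it only assumes what the comment states, namely that the alias list lives in the front-end database and that a direct back-end verify regenerates the site config without it.

```python
# Toy model only: none of these names exist in Aegir, Provision or Drush.

# Stand-in for the front-end (Hostmaster) database, the only place where the
# alias list is stored in this model.
FRONTEND_DB = {
    "foo.aegir.local": ["www.foo.aegir.local", "foo.aegir.local/subsite"],
}


def write_site_config(site, aliases):
    """Stand-in for regenerating the vhost and drushrc.php from the given data."""
    return {"uri": site, "aliases": list(aliases)}


def frontend_verify(site):
    """A verify queued as a hosting task: aliases are read from the database."""
    return write_site_config(site, FRONTEND_DB.get(site, []))


def backend_verify(site):
    """A direct back-end verify: no database access, so no aliases are passed."""
    return write_site_config(site, [])


if __name__ == "__main__":
    print(frontend_verify("foo.aegir.local"))
    print(backend_verify("foo.aegir.local"))
```

In this model the front-end path preserves the aliases, while the back-end path rewrites the config with an empty alias list, which is the behaviour the workaround of queuing a hosting task was meant to avoid.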
Comment #3
colan
I always appreciate your background information. As I'm relatively new here, the history is really valuable, and provides much-needed context. (Please keep it up!)
So I suppose a good plan would be to try it from the back-end, and see if it works now. If not, I suppose we could unfork your work to trigger it from the front-end?
Comment #4
memtkmcc commented
Provision can see only what is already present in the drushrc.php files and doesn't access the database; only the frontend does that. So any task operating on vhosts (sites, not just platforms) needs to be a frontend (hosting) registered task. This behaviour hasn't changed, I think. We already unforked almost everything in the past (2016), so the extra verify sub-tasks (via frontend) we added to fix clone/migrate are already in. We sometimes modify something else during debugging, but we should try to submit any tested patches so we can stay in sync and mostly unforked. Example of the unforked stuff:
Comment #5
memtkmcc commented
That was Issue #2798143: Improve Clone task reliability with extra sub-tasks (BOA unfork)
Comment #6
colan
Makes sense. It shouldn't be that hard to do the same type of thing here: Get the parent ID and run that provision_backend_invoke() call on it.
Comment #7
spiderman
Digging into this today in an effort to fix it as @colan suggests, I ran across the following historical comment in the relevant code:
So if I understand, we should try re-instating the original provision_backend_invoke() call which doesn't use the frontend hosting_task to re-verify, and in particular test this in a scenario where the parent and subdir site are on different platforms using different install profiles (although from @memtkmcc's comments above, I suspect that still won't work).
Failing that, we can try using the trick from #2798143, calling provision_backend_invoke to create a hosting-task automatically. In that case, according to the code comment, we need to watch for the Hostmaster database being destroyed.
Just documenting my thought process here, and will try these two approaches shortly, and report back :)
Comment #8
spiderman
I've actually tried the second approach first, as it seemed slightly more likely to work. Attached is a patch which simply uncomments the 2 lines (in both Apache and Nginx versions of the Provision task) which try to create the front-end hosting task to re-verify the parent site.
I've tested this in a couple of scenarios, and they seem to work. My parent site is using a basic D8 platform, and is installed with the Minimal install profile:
* simple case: install a new subdir site on the same platform, using a different install profile (Umami)
* install a subdir site on a *different* platform, using a different install profile (Standard)
The only concern I have is the part of the old code comment which says "instead of creating separate re-verify task for the parent site, it will run it "inline", immediatelly, so in the wrong context, which, depending on other conditions will destroy *hostmaster* database". I'm not clear what these "other conditions" would be, but that doesn't appear to happen for me. That said, the comment also mentions Aegir 2.x, so maybe it's simply not an issue.
At any rate, further validation and testing (or guidance on what else to try/watch for) by someone with more knowledge of Aegir would be appreciated :)
Comment #9
spiderman
Confirmed this patch works for me on Apache as well as Nginx. Steps to reproduce:
1. Provision local aegir-dev-vm, add foo.aegir.local to /etc/hosts, pointing to aegir.local IP
2. Enable the hosting_subdirs module
3. Create 2 platforms, using a basic D8 install (https://gitlab.com/sensespidey/basic-d8.git)
4. Install parent site foo.aegir.local, using Minimal install profile on first D8 platform
5. Install subsite foo.aegir.local/subsite, using Standard install profile on second D8 platform
6. Observe that a Verify task on the foo.aegir.local site shows up in the queue, and runs cleanly
7. Observe that both subsite and parent site are installed and running normally.
Comment #10
colan
Comment #11
colan
Instead of using your repo when creating a platform, I just used the upstream official one (which is what I usually do for testing site installations):
I ran into no technical problems with testing, but it would be good to make the following changes:
Comment #12
ergonlogic
Re. provision_backend_invoke('@hostmaster', 'hosting-task', ...), I think this may break our ability to run Aegir headless. That is, the front-end (obviously) depends on the back-end for all the heavy lifting. However, we should be able to run drush provision <foo> without requiring an Aegir front-end site.
That said, we may already break this rule elsewhere in Aegir3, and since this is specific to subdir sites, maybe we shouldn't worry too much about it.
However, it seems like the underlying problem (according to #4) is that we aren't saving relevant info from the front-end, when generating the drushrc files. I think that's likely the "proper" solution here. That is, when we verify the subdir site, we should store a reference to the parent site (via Drush alias, ie, hosting name) in the subdir site's drushrc. That should allow us to just run something like:
Comment #13
spiderman
@ergonlogic Thanks for your comments. At least in this case, it seems that the SubdirVhost code has access to the parent site, so the question was only whether to call drush provision-verify or drush hosting-task verify on that site.
I take the point that the better way would be to use the backend provision process to handle this, but it's unclear to me whether the original issues in doing this (mentioned in the code comment I posted above) are still a problem or not. I could test this out, but @colan seems to be indicating we should just use the hosting-task method, even though it breaks the ability to run Aegir headless (if this is already broken elsewhere, maybe it doesn't much matter here?).
That said, I could probably test the alternate approach fairly quickly, but again my question would be: are we certain of all the cases/conditions that may break things?
Comment #14
memtkmcc commented
We are running front-end tasks from the backend in several places.
It was introduced for reasons I have explained above. One example issue: #1004526: Automatic aliases are not persisted across rename and clone
We should perhaps try to remove it and see if the old problems have been actually fixed elsewhere in the meantime and we can stop using such workarounds?
Comment #15
memtkmcc commented
Note that by removing all of them we will re-introduce the old UX problems -- migration failures caused by the fact that Aegir didn't update package versions because the site and its old and new platforms were not re-verified first, and you can't do that without going via a front-end task.
Comment #16
ergonlogic
Yeah, we ran afoul of #1004526: Automatic aliases are not persisted across rename and clone when working through #3036890: Simplify subdir site installation, and had to trigger an extra verify to re-instate the automatic subdir aliases. So, it's pretty likely we'd have problems here, at the very least losing any aliases on the parent site.
For expediency, I think triggering a hosting-task from the backend is reasonable. If anyone wants to tackle #1004526: Automatic aliases are not persisted across rename and clone, feel free, but it shouldn't hold up an immediate fix.
According to #11, it looks like we just have some documentation cleanup to add to the patch in #8, in order for this to be RTBC.
For whatever it's worth, this is an instance of Aegir's split-brain problem. Ægir5 (currently pre-alpha) considers the database driving the site (including APIs, console commands, etc.) as canonical, and eliminates the need for any code to pass user data to the back-end.
Comment #17
colan
I agree that we should continue with what we're doing here, and then the larger problem can be handled in a new issue. I'll leave that to @memtkmcc or @ergonlogic to create as you two have much more context than the rest of us.
So, @spiderman, please continue with what you were working on.
Comment #18
spiderman
I realized that this patch breaks login to the subdir site, so at @ergonlogic's suggestion, will try implementing hook_post_hosting_TASK_TYPE_task in hosting_subdirs module instead, in order to avoid the mess of spawning the Verify from the provision backend. Patch forthcoming, if I can get it to work :)
Comment #19
spiderman
Here's a pair of patches: one effectively comments out the warning so the install task will complete green, and replaces the older comment with a new one pointing to the replacement hosting_subdirs_post_hosting_install_task() hook implementation.
Combined, these seem to work for me, and I can see the drush_log message in the Install output:
A couple of notes that I discovered in testing:
* [...] subdir.d directory.
* [...] subdir.d directory (at least for Nginx, haven't yet tested for Apache) doesn't get cleaned up when subdir sites (or their parents) are deleted.
* [...] subdir.d folder that didn't get cleaned up, and creates the appropriate Vhost config to include them.
All of this means that to test the patch properly, you need to install a parent site using a name you haven't yet used on the aegir instance, or have manually cleared out its vhost config after deleting.
I believe the behaviour described above is a bug, and have filed a separate issue to address it.
Comment #20
spiderman
I've re-tested these latest 2 patches against a freshly built Aegir instance using Apache, to validate both webservers. Each is achieving the goal (auto-verify parent site to enable subdir site to work) without breaking the reset-login link (my earlier patch had this mysterious side effect).
I've also re-rolled the second (really, the main) patch to Hosting Subdirs, to remove the Note @colan had noted on the Add Site page (thanks Colan!)
If somebody can validate on a different setup, I think we can commit :)
Comment #21
ergonlogic
We're currently supporting a deployment where our clients don't have control over their organization's domains. They have a single DNS entry pointing to the Aegir server (eg. aegir.example.net), and need for that to be the parent for all subdir sites (eg. aegir.example.net/subdir1, aegir.example.net/subdir2, etc.).
This use-case is particularly problematic, since it results in the Hostmaster site going down every time a new site is installed, or an existing site migrated, or even just verified. This is fixed by simply verifying the Hostmaster site on the backend (ie, drush @hm provision-verify). However, no one should be expected to know this, and being faced with a 404 on your Hostmaster site would be understandably unnerving.
My understanding was that this only affected parent sites when the first subdir site was installed. But, as it stands, it appears to cause problems on even the most basic tasks (ie. verify). I'm planning to review the underlying vhost configs, to see if some refactoring there can alleviate this somewhat.
Comment #22
ergonlogic
It seems to me that there are 3 vhosts in play here:
* subdir.d/aegir.example.net/subdir1.conf
* vhost.d/subdir1.aegir.example.net
* vhost.d/aegir.example.net
Only the last one changes when the aegir.example.net/subdir1 site is verified:
So, basically, it looks like the parent site (aegir.example.net) placeholder vhost is being generated despite it already existing.
Comment #23
ergonlogic
Attached is a patch to clean up how we identify whether a parent site exists. I suspect that this'll require re-working the approach the other patches have taken. I'll continue working on this aspect.
Note that the behaviour I noted above (subdirs keep breaking the hostmaster site) is due to looking for Drush aliases, which generally take the form of example.com.alias.drushrc.php. However, there's a notable exception: hostmaster.alias.drushrc.php
Unfortunately, we don't have a good way to get context outside the specific site/platform/server that we're operating on, or directly linked entities. So the "parent" site will only be found if its primary URL is the parent URL, since that's what's used to look this up (on the backend). If the site uses the parent URL as an alias, it won't be detected.
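The lookup problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (alias_file_for, parent_alias_exists, demo) are hypothetical and are not Provision functions, and the only fact taken from the comment is the alias file naming (example.com.alias.drushrc.php versus hostmaster.alias.drushrc.php).

```python
import tempfile
from pathlib import Path


def alias_file_for(name: str) -> str:
    """Alias file naming as described in the comment: <name>.alias.drushrc.php."""
    return f"{name}.alias.drushrc.php"


def parent_alias_exists(alias_dir: Path, parent_uri: str) -> bool:
    """Hypothetical check: is there an alias file named after the parent URL?"""
    return (alias_dir / alias_file_for(parent_uri)).is_file()


def demo() -> dict:
    """Build a throwaway alias directory and run the lookup against it."""
    with tempfile.TemporaryDirectory() as tmp:
        alias_dir = Path(tmp)
        # An ordinary site: its alias file is named after its URL.
        (alias_dir / alias_file_for("example.com")).touch()
        # The Hostmaster site, assumed here to be served at aegir.example.net:
        # its alias file is named "hostmaster", not after the URL.
        (alias_dir / alias_file_for("hostmaster")).touch()
        return {
            "example.com": parent_alias_exists(alias_dir, "example.com"),
            "aegir.example.net": parent_alias_exists(alias_dir, "aegir.example.net"),
        }


if __name__ == "__main__":
    print(demo())
```

A lookup keyed purely on the parent URL finds the ordinary site but misses the Hostmaster site, which would lead to the placeholder vhost being generated even though the parent already exists.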
Comment #24
ergonlogic
It turns out the Apache and Nginx classes were identical, so I moved this logic up to a base class, and just inherit for both service types.
Comment #25
ergonlogic
Building on the recent changes I suggested above, it was pretty easy to trigger a verify of the parent site on installation with provision_backend_invoke(). The concerns in the (very old) comments no longer seem to apply. For example, if the parent site is using an alias, that persists across the installation of a subdir site, and the subsequent re-verify.
Comment #26
ergonlogic
Comment #29
colan
Thanks everyone. I've been testing this, and it's working well.
Comment #31
colan
Comment #32
colan
I started getting this again. Can anyone reproduce? Maybe there was a regression here?