Currently, when a site (e.g. example.com) exists, and a new child subdirectory site is created (say example.com/site1), the following warning shows up in the logs during the Install task:

Parent site @example.com re-verify required to include subdir config for example.com/site1

Why does the user need to do this? There's no reason this can't happen automatically.

Solution:

Instead of issuing a warning, simply add a new parent Verify task to the queue.

Comments

colan created an issue. See original summary.

memtkmcc’s picture

IIRC it was related to the fact that in the old days (I'm not sure whether this has since improved or changed) you couldn't fire such a verify from the backend, because it would wipe out all aliases, and the subdir sites logic, as built, relied on special aliases being present. The backend verify, unlike a verify run via the frontend, didn't have access to the aliases list (in the database), so the aliases were then gone from both the vhost and the drushrc.php file. But I know we have modified our copy used in BOA to fire the verify as a hosting task, not a direct provision verify, and this fixed problems previously affecting cloning and migrating sites with aliases. This could be just a leftover related to these old problems, but having had our own workarounds in BOA for years, we stopped tracking this particular issue, so my explanation may have only historical value.

colan’s picture

I always appreciate your background information. As I'm relatively new here, the history is really valuable, and provides much-needed context. (Please keep it up!)

So I suppose a good plan would be to try it from the back-end, and see if it works now. If not, I suppose we could unfork your work to trigger it from the front-end?

memtkmcc’s picture

Provision can see only what is already present in the drushrc.php files and doesn't access the database; only the frontend does that. So any task operating on vhosts (sites, not just platforms) needs to be a frontend (hosting) registered task. This behaviour hasn't changed, I think. We unforked almost everything back in 2016, so our extra verify sub-tasks (via the frontend) that fix clone/migrate are already in. We sometimes modify something else during debugging, but we should try to submit any tested patches so we can stay in sync and mostly unforked. Example of the unforked stuff:

function drush_provision_drupal_provision_clone($new_name, $platform = null) {
  drush_set_option('old_platform', d()->platform->name);

  // If the site is cloned between platforms and not just in the same platform,
  // we should update the info collected about source and target platform first.
  if (!is_null(d($platform)->name) && (d($platform)->name != d()->platform->name)) {
    provision_backend_invoke('@hostmaster', 'hosting-task', array(d()->platform->name, 'verify'), array('force' => TRUE));
    sleep(5); // A small trick to avoid high load and race conditions.
    provision_backend_invoke('@hostmaster', 'hosting-task', array(d($platform)->name, 'verify'), array('force' => TRUE));
    sleep(5); // A small trick to avoid high load and race conditions.
  }
  // We should update also the info collected about the site before running clone.
  $local_uri_verify = '@' . d()->uri;
  provision_backend_invoke('@hostmaster', 'hosting-task', array($local_uri_verify, 'verify'), array('force' => TRUE));
  sleep(5); // A small trick to avoid high load and race conditions.
  // ... (the rest of the function is not shown in this comment)
}

colan’s picture

Makes sense. It shouldn't be that hard to do the same type of thing here: Get the parent ID and run that provision_backend_invoke() call on it.
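
A minimal sketch of that idea, assuming the subdir alias has the form parent/subdir (as in example.com/site1). Both helper names are hypothetical and are not part of Provision; only the argument layout of the provision_backend_invoke() call is taken from the clone example above.

```php
<?php
/**
 * Hypothetical helper: derive the parent site name from a subdir alias.
 * Returns NULL when the alias has no subdirectory part.
 */
function example_subdir_parent_name($alias) {
  $pos = strpos($alias, '/');
  if ($pos === FALSE || $pos === 0) {
    return NULL;
  }
  return substr($alias, 0, $pos);
}

/**
 * Hypothetical helper: build the arguments for the hosting-task call,
 * mirroring the clone example (target, command, arguments, options).
 */
function example_parent_verify_call($alias) {
  $parent = example_subdir_parent_name($alias);
  if ($parent === NULL) {
    return NULL;
  }
  return array('@hostmaster', 'hosting-task', array('@' . $parent, 'verify'), array('force' => TRUE));
}

// Inside Provision the result could be passed on like this (not run here):
//   list($target, $command, $args, $options) = example_parent_verify_call($alias);
//   provision_backend_invoke($target, $command, $args, $options);
```

Keeping the name handling separate from the invoke call means it can be checked without a running Aegir instance.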

spiderman’s picture

Digging into this today in an effort to fix it as @colan suggests, I ran across the following historical comment in the relevant code:

            drush_log(dt('Parent site %vhost re-verify required to include subdir config for %alias', array('%vhost' => $site_name, '%alias' => $alias)), 'warning');
            //
            //   drush_invoke_process('@none', 'cache-clear', array('drush'));
            //   provision_backend_invoke($site_name, 'provision-verify');
            //   drush_invoke_process('@none', 'cache-clear', array('drush'));
            //
            // Running automated re-verify for the parent site is currently
            // too dangerous. It will destroy/delete the parent site's database
            // if the parent and the subdir site use different installation
            // profiles, unless both profiles exist in the same platform.
            //
            // This is too serious limitation and we need to find a better
            // way to automate parent site re-verify when needed to add
            // required include line, which enables all subdir sites.
            //
            // With Drush 4 this could be done with separate task created like this:
            //
            //   drush_log(dt('Run parent site %vhost Verify via frontend', array('%vhost' => $site_name)), 'notice');
            //   provision_backend_invoke('@hostmaster', 'hosting-task', array($site_name, 'verify'), array('force' => TRUE));
            //
            // Unfortunatelly, it doesn't work with Drush 5 and current Aegir 2.x,
            // and is even more dangerous, because instead of creating separate
            // re-verify task for the parent site, it will run it "inline",
            // immediatelly, so in the wrong context, which, depending on other
            // conditions will destroy *hostmaster* database, so it is mentioned
            // here as a nostalgic reminiscence of good old Drush 4, which allowed
            // to create frontend tasks from the backend, safely.
            //
            // Without this re-verify it is fully possible to use any profile
            // with any platform for multiple subdir sites, under the same,
            // single, parent URL roof, with or without the parent site hosted
            // on the main URL (domain name). Which is good news!

So if I understand, we should try re-instating the original provision_backend_invoke() call which doesn't use the frontend hosting_task to re-verify, and in particular test this in a scenario where the parent and subdir site are on different platforms using different install profiles (although from @memtkmcc's comments above, I suspect that still won't work).

Failing that, we can try using the trick from #2798143, calling provision_backend_invoke to create a hosting-task automatically. In that case, according to the code comment, we need to watch for the Hostmaster database being destroyed.

Just documenting my thought process here, and will try these two approaches shortly, and report back :)

spiderman’s picture

I've actually tried the second approach first, as it seemed slightly more likely to work. Attached is a patch which simply uncomments the 2 lines (in both Apache and Nginx versions of the Provision task) which try to create the front-end hosting task to re-verify the parent site.

I've tested this in a couple of scenarios, and they seem to work. My parent site is using a basic D8 platform, and is installed with the Minimal install profile:

* simple case: install a new subdir site on the same platform, using a different install profile (Umami)
* install a subdir site on a *different* platform, using a different install profile (Standard)

The only concern I have is the part of the old code comment which says "instead of creating separate re-verify task for the parent site, it will run it "inline", immediatelly, so in the wrong context, which, depending on other conditions will destroy *hostmaster* database". I'm not clear what these "other conditions" would be, but that doesn't appear to happen for me. That said, the comment refers to Aegir 2.x, so maybe it's simply no longer an issue.

At any rate, further validation and testing (or guidance on what else to try/watch for) by someone with more knowledge of Aegir would be appreciated :)

spiderman’s picture

Confirmed this patch works for me on Apache as well as Nginx. Steps to reproduce:

1. Provision local aegir-dev-vm, add foo.aegir.local to /etc/hosts, pointing to aegir.local IP
2. enable hosting_subdirs module
3. Create 2 platforms, using a basic D8 install (https://gitlab.com/sensespidey/basic-d8.git)
4. Install parent site foo.aegir.local, using Minimal install profile on first D8 platform
5. Install subsite foo.aegir.local/subsite, using Standard install profile on second D8 platform
6. Observe that a Verify task on the foo.aegir.local site shows up in the queue, and runs cleanly
7. Observe that both subsite and parent site are installed and running normally.

colan’s picture

Status: Active » Needs review
colan’s picture

Status: Needs review » Needs work

Instead of using your repo when creating a platform, I just used the upstream official one (which is what I usually do for testing site installations):

I ran into no technical problems with testing, but it would be good to make the following changes:

  1. Remove all of those old comments (and whatever else you've commented out yourself) from the code as they're no longer relevant.
  2. On the Add Site page, remove this as we no longer need it:
    Note: Once the first site in a subdirectory is created and the parent site also exists, the parent site must be re-verified (just once) to turn on the web server configuration for the first (and any future) sites in its subdirectory.
ergonlogic’s picture

Re. provision_backend_invoke('@hostmaster', 'hosting-task', ...), I think this may break our ability to run Aegir headless. That is, the front-end (obviously) depends on the back-end for all the heavy lifting. However, we should be able to run drush provision <foo> without requiring an Aegir front-end site.

That said, we may already break this rule elsewhere in Aegir3, and since this is specific to subdir sites, maybe we shouldn't worry too much about it.

However, it seems like the underlying problem (according to #4) is that we aren't saving relevant info from the front-end, when generating the drushrc files. I think that's likely the "proper" solution here. That is, when we verify the subdir site, we should store a reference to the parent site (via Drush alias, ie, hosting name) in the subdir site's drushrc. That should allow us to just run something like:

provision_backend_invoke($this_site->parent_site, 'provision-verify');
spiderman’s picture

@ergonlogic Thanks for your comments. At least in this case, it seems that the SubdirVhost code has access to the parent site, so the question was only whether to call drush provision-verify or drush hosting-task verify on that site.

I take the point that the better way would be to use the backend provision process to handle this, but it's unclear to me whether the original issues in doing this (mentioned in the code comment I posted above) are still a problem or not. I could test this out, but @colan seems to be indicating we should just use the hosting-task method, even though it breaks the ability to run Aegir headless (if this is already broken elsewhere, maybe it doesn't much matter here?).

That said, I could probably test the alternate approach fairly quickly, but again my question would be: are we certain of all the cases/conditions that may break things?

memtkmcc’s picture

Attachment: new, 686.81 KB

We are running front-end tasks from backend in several places.

hosting-task

It was introduced for reasons I have explained above. One example issue: #1004526: Automatic aliases are not persisted across rename and clone

We should perhaps try to remove it and see if the old problems have been actually fixed elsewhere in the meantime and we can stop using such workarounds?

memtkmcc’s picture

Note that by removing all of them we will re-introduce the old UX problems: migration failures caused by the fact that Aegir didn't update package versions because the site and its old and new platforms were not re-verified first, and you can't do that without going via a front-end task.

ergonlogic’s picture

Yeah, we ran afoul of #1004526: Automatic aliases are not persisted across rename and clone when working through #3036890: Simplify subdir site installation, and had to trigger an extra verify to re-instate the automatic subdir aliases. So, it's pretty likely we'd have problems here, at the very least losing any aliases on the parent site.

For expediency, I think triggering a hosting-task from the backend is reasonable. If anyone wants to tackle #1004526: Automatic aliases are not persisted across rename and clone, feel free, but it shouldn't hold up an immediate fix.

According to #11, it looks like we just have some documentation cleanup to add to the patch in #8, in order for this to be RTBC.

For whatever it's worth, this is an instance of Aegir's split-brain problem. Æegir5 (currently pre-alpha) considers the database driving the site (including APIs, console commands, etc.) as canonical, and eliminates the need for any code to pass user data to the back-end.

colan’s picture

I agree that we should continue with what we're doing here, and then the larger problem can be handled in a new issue. I'll leave that to @memtkmcc or @ergonlogic to create as you two have much more context than the rest of us.

So, @spiderman, please continue with what you were working on.

spiderman’s picture

I realized that this patch breaks login to the subdir site, so at @ergonlogic's suggestion, will try implementing hook_post_hosting_TASK_TYPE_task in hosting_subdirs module instead, in order to avoid the mess of spawning the Verify from the provision backend. Patch forthcoming, if I can get it to work :)

spiderman’s picture

Here's a pair of patches. The first effectively comments out the warning so the Install task completes green, and replaces the older comment with a new one pointing to the replacement hosting_subdirs_post_hosting_install_task() hook implementation. The second adds that hook implementation to hosting_subdirs.

Combined, these seem to work for me, and I can see the drush_log message in the Install output:

Run parent site foo.aegir.local Verify via frontend (from hosting_subdirs)

A couple of notes that I discovered in testing:

  • Only the *first* subdir site installed on a given parent actually needs the parent site re-verify task to be run. Subsequent subdir sites will Just Work(tm) because the parent vhost config already includes the relevant subdir.d directory.
  • The subdir.d directory (at least for Nginx, haven't yet tested for Apache) doesn't get cleaned up when subdir sites (or their parents) are deleted.
  • If I delete a parent site, then re-create it with the same URL, the install process seems to automatically pick up the residual subdir.d folder that didn't get cleaned up, and creates the appropriate Vhost config to include them.

All of this means that to test the patch properly, you need to install a parent site using a name you haven't yet used on the aegir instance, or have manually cleared out its vhost config after deleting.
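
The first note above can be checked mechanically before testing. This is only a rough sketch: it assumes the parent vhost pulls in subdir configs through an include line ending in subdir.d/<parent>/*.conf, and the function name is hypothetical rather than anything in Provision or hosting_subdirs.

```php
<?php
/**
 * Hypothetical check: does a vhost config already include the parent's
 * subdir.d directory? If it does, the parent should not need another
 * re-verify for later subdir sites.
 */
function example_vhost_includes_subdirs($vhost_config, $parent) {
  // Matches an Nginx "include" line or an Apache "Include"/"IncludeOptional"
  // line whose path ends in subdir.d/<parent>/*.conf (assumed layout).
  $pattern = '~^\s*include\w*\s+\S*subdir\.d/' . preg_quote($parent, '~') . '/\*\.conf;?\s*$~mi';
  return preg_match($pattern, $vhost_config) === 1;
}

// Example input for a parent that has already been re-verified.
$vhost = "server {\n  listen *:80;\n  server_name foo.aegir.local;\n"
  . "  include /var/aegir/config/server_master/nginx/subdir.d/foo.aegir.local/*.conf;\n}\n";
$already_included = example_vhost_includes_subdirs($vhost, 'foo.aegir.local');
```

If the check is true for a parent name you are about to reuse, leftover config is present and the test will not exercise the re-verify path.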

I believe the behaviour described above is a bug, and have filed a separate issue to address it.

spiderman’s picture

Assigned: Unassigned » spiderman
Status: Needs work » Needs review
Attachment: new, 2.58 KB

I've re-tested these latest 2 patches against a freshly built Aegir instance using Apache, to validate both webservers. Each is achieving the goal (auto-verify parent site to enable subdir site to work) without breaking the reset-login link (my earlier patch had this mysterious side effect).

I've also re-rolled the second (really, the main) patch to Hosting Subdirs, to remove the Note @colan had noted on the Add Site page (thanks Colan!)

If somebody can validate on a different setup, I think we can commit :)

ergonlogic’s picture

We're currently supporting a deployment where our clients don't have control over their organization's domains. They have a single DNS entry pointing to the Aegir server (eg. aegir.example.net), and need for that to be the parent for all subdir sites (eg. aegir.example.net/subdir1, aegir.example.net/subdir2, etc.).

This use-case is particularly problematic, since it results in the Hostmaster site going down every time a new site is installed, or an existing site migrated, or even just verified. This is fixed by simply verifying the Hostmaster site on the backend (ie, drush @hm provision-verify). However, no one should be expected to know this, and being faced with a 404 on your Hostmaster site would be understandably unnerving.

My understanding was that this only affected parent sites when the first subdir site was installed. But, as it stands, it appears to cause problems on even the most basic tasks (ie. verify). I'm planning to review the underlying vhost configs, to see if some refactoring there can alleviate this somewhat.

ergonlogic’s picture

It seems to me that there are 3 vhosts in play here:

  1. subdir.d/aegir.example.net/subdir1.conf
  2. vhost.d/subdir1.aegir.example.net
  3. vhost.d/aegir.example.net

Only the last one changes when the aegir.example.net/subdir1 site is verified:

 server {
-  include       fastcgi_params;
-
-  # Block https://httpoxy.org/ attacks.
-  fastcgi_param HTTP_PROXY "";
-
-  fastcgi_param MAIN_SITE_NAME aegir.example.net;
-  set $main_site_name "aegir.example.net";
-  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
-  fastcgi_param db_type   mysql;
-  fastcgi_param db_name   <db_name>;
-  fastcgi_param db_user   <db_user>;
-  fastcgi_param db_passwd <db_pass>;
-  fastcgi_param db_host   localhost;
-  fastcgi_param db_port   3306;
   listen        *:80;
   server_name   aegir.example.net;
-  root          /var/aegir/hostmaster-7.x-3.180;
-  # Extra configuration from modules:
-  include       /var/aegir/config/includes/nginx_vhost_common.conf;
   include       /var/aegir/config/server_master/nginx/subdir.d/aegir.example.net/*.conf;
 }

So, basically, it looks like the parent site (aegir.example.net) placeholder vhost is being generated despite it already existing.

ergonlogic’s picture

Assigned: spiderman » ergonlogic
Status: Needs review » Needs work
Attachment: new, 5.8 KB

Attached is a patch to clean up how we identify whether a parent site exists. I suspect that this'll require re-working the approach the other patches have taken. I'll continue working on this aspect.

Note that the behaviour I noted above (subdirs keep breaking the hostmaster site) is due to looking for Drush aliases, which generally take the form of example.com.alias.drushrc.php. However, there's a notable exception: hostmaster.alias.drushrc.php

Unfortunately, we don't have a good way to get context outside the specific site/platform/server that we're operating on, or directly linked entities. So the "parent" site will only be found if its primary URL is the parent URL, since that's what's used to look this up on the backend. If the site uses the parent URL only as an alias, it won't be detected.
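
To illustrate the lookup problem, here is a small sketch that matches on the alias record's URI rather than on the alias file name. It assumes each alias record carries a 'uri' key holding the site's primary URL; the function name and the sample data are hypothetical.

```php
<?php
/**
 * Hypothetical lookup: find the alias whose primary URI is the parent URL.
 * $aliases maps alias names to records with a 'uri' key (assumed layout).
 */
function example_find_parent_alias(array $aliases, $parent_url) {
  foreach ($aliases as $name => $record) {
    if (isset($record['uri']) && $record['uri'] === $parent_url) {
      return $name;
    }
  }
  return NULL;
}

$aliases = array(
  'example.com' => array('uri' => 'example.com'),
  // The Hostmaster alias is not named after its URL.
  'hostmaster' => array('uri' => 'aegir.example.net'),
);
$parent_alias = example_find_parent_alias($aliases, 'aegir.example.net');
```

Matching on the URI covers the hostmaster exception, but like the backend lookup it still misses a site that uses the parent URL only as an alias.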

ergonlogic’s picture

Attachment: new, 13.98 KB

It turns out the Apache and Nginx classes were identical, so I moved this logic up to a base class, and just inherit for both service types.

ergonlogic’s picture

Assigned: ergonlogic » Unassigned
Status: Needs work » Needs review
Attachment: new, 14.46 KB

Building on the recent changes I suggested above, it was pretty easy to trigger a verify of the parent site on installation with provision_backend_invoke(). The concerns in the (very old) comments no longer seem to apply. For example, if the parent site is using an alias, that persists across the installation of a subdir site, and the subsequent re-verify.

ergonlogic’s picture

  • ergonlogic authored c920c2e on 3066146-verify-subdir-parent
    Issue #3066146 by spiderman, ergonlogic, memtkmcc, colan, llamech:...

  • colan committed 5703f64 on 7.x-3.x
    Issue #3066146 by spiderman, ergonlogic, memtkmcc, colan: Merge branch '...
  • ergonlogic authored c920c2e on 7.x-3.x
    Issue #3066146 by spiderman, ergonlogic, memtkmcc, colan, llamech:...
colan’s picture

Status: Needs review » Fixed

Thanks everyone. I've been testing this, and it's working well.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

colan’s picture

I started getting this again. Can anyone reproduce? Maybe there was a regression here?

Parent site (aegir-dev.example.com) re-verify required to include subdir config for aegir-dev.example.com/site1