Under what circumstances would one want to run 'git pull' on a platform? Or 'git checkout', for that matter? I don't see any post hooks to update sites on the platform, or anything like that. It seems like we're allowing, even encouraging, the modification of platform code under existing sites. This has always been strongly discouraged, for good reasons.


Comments

ergonlogic created an issue. See original summary.

gboudrias’s picture

I've never been a fan of gitifying platforms and I think we should remove the functionality while there are still few users. The feature belongs in a more controlled workflow environment such as devshop.

formatC'vt’s picture

i do it when i need it =)
For example: upgrade core from 7.38 to 7.39

formatC'vt’s picture

I think we should do this:
1) Separate user permissions for git pull/checkout site and platform
2) Add post hooks to update sites on the platform
What do you think?

ergonlogic’s picture

TL;DR: Git pull tasks on platforms are dangerous and should probably be blocked pending rework to make them safer.

Let me elucidate my concerns:

Changing code under a running site leaves it in an inconsistent state. That is, the new code expects a certain database schema, but until update.php is run on all sites on the platform, this won't be the case. update.php ought to be called when the site is in maintenance mode.

So the workflow, at this point, would be:

  1. Put all sites on the platform under maintenance mode
  2. Git pull
  3. Run update.php on all sites
  4. Take all sites out of maintenance mode

There are several issues with this workflow. If we were to run update.php on all sites in parallel, this could be very resource intensive, to the point of crashing the server if you've got a couple hundred sites on the platform. Otherwise, we'd be running updates serially, and some unlucky site will be in maintenance mode for the entire time it takes to update all the other sites. The uptime of the next-to-last site won't be much better.
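To put rough numbers on the serial case, here is a minimal Python sketch. It assumes every site enters maintenance mode before the pull, every update takes the same fixed time, and each site comes back as soon as its own update finishes. The function name and the figures (200 sites, 30 seconds per update) are illustrative assumptions, not measurements.

```python
def serial_downtime_minutes(site_count, minutes_per_update, pull_minutes=0.0):
    """Return the downtime of each site, in update order, for a serial run.

    The site updated at position N waits for the pull plus N updates
    (its own and the N - 1 before it).
    """
    return [
        pull_minutes + minutes_per_update * position
        for position in range(1, site_count + 1)
    ]


downtimes = serial_downtime_minutes(site_count=200, minutes_per_update=0.5)
print("first site:", downtimes[0], "minutes")
print("next-to-last site:", downtimes[-2], "minutes")
print("last site:", downtimes[-1], "minutes")
```

Under these assumptions the last site is down 200 times longer than the first, which is the asymmetry described above.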

But it gets worse. What if something goes wrong on one or more sites? At the very least, we'll need to take backups of the sites before running the git pull. Triggering dozens or hundreds of backups in parallel is a recipe for disaster, from a disk I/O standpoint at the very least. Plus, we'd need to put the sites in maintenance mode prior to the backups, to avoid data loss if we had to roll back.

I'm not even sure what a rollback in such a case should look like. We cannot just revert back to the previous checkout on the platform, without restoring the backups for all the sites that succeeded prior to the error that triggered the rollback. So what do we do with any broken sites? I suppose we could build a new platform based on the prior commit, and deploy the broken sites' backups there. But if we're going to do that, we're pretty much back to Aegir's recommended process of migrating between platforms, just taking a long, arduous and risky way around.

Maybe I'm missing something obvious, but to me platform-level git repos are pretty much conflating sites and platforms. As soon as you have a second site on a platform, at least one of them will suffer increased downtime and risk. Even for a single site per platform, we should still follow the process above to ensure the safety and reliability of such updates.

I suppose a user could safely maintain two platforms off the same git repo, and migrate sites in a leap-frog fashion. That is, have all sites on one platform, run git pull on the other platform, migrate all sites to the updated platform, then reverse that process for the next git pull. If this is the workflow we want to support, then we'd probably only want to allow pull tasks on platforms without any sites.
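As a rough illustration of that leap-frog idea, here is a small Python sketch. The `Platform` class, its `pull()` guard and the `leapfrog()` helper are hypothetical stand-ins for this discussion, not Aegir code; moving the site list is a placeholder for the real Migrate task.

```python
class Platform:
    """Toy model of a platform checked out from a shared git repo."""

    def __init__(self, name):
        self.name = name
        self.sites = []
        self.revision = 0

    def pull(self):
        # The rule proposed above: only pull on a platform with no sites.
        if self.sites:
            raise RuntimeError(f"{self.name} still hosts sites; refusing to pull")
        self.revision += 1


def leapfrog(active, idle):
    """Update the idle platform, move all sites onto it, and swap roles."""
    idle.pull()
    idle.sites, active.sites = active.sites, []  # placeholder for Migrate
    return idle, active  # (new active, new idle)


platform_a = Platform("platform-a")
platform_b = Platform("platform-b")
platform_a.sites = ["site1.example.com", "site2.example.com"]

active, idle = leapfrog(platform_a, platform_b)
print(active.name, active.revision, active.sites)
active, idle = leapfrog(active, idle)
print(active.name, active.revision, active.sites)
```

The guard in `pull()` is the interesting part: the platform carrying the sites is never modified in place, so the previous checkout stays available as a rollback target.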

Again, unless I've missed something, we're looking at non-trivial work to support this. For now, I suggest that we simply disable pull tasks on platforms. If someone is already using it as part of a safe update process, I'd love to hear about it. In that case, we could split out platform-level git repos and pull tasks into a separate feature, and put a warning in the description pointing to documentation for the proper process.

I'm tempted to turn this into a bug report, since this functionality doesn't appear to conform to our usual policy of helping users not shoot themselves in the foot.

helmo’s picture

TL;DR: I do use it in two ways. Separating the permission and adding a warning seems fair.

1) For small sites I have a common platform in git, where the sites then also have their own git repo for the site specific stuff.

2) For larger sites I sometimes create a custom platform (often also for legacy reasons); these often existed in sites/default before I put them under Aegir.

My reasoning for a platform in Git:
* Complexity: I started with drush make years ago, but kept patching and debugging it. Maybe these days more of the edge cases are flushed out, but back then committing to git took way less time than figuring out the right make format for a library that comes as a zip, only available via a POST request, with the desired code in a subdirectory. Not to mention borky SSL certificates.

* Comparability: Another advantage of all code in Git is for me the ease of reviewing. I just drush dl --gitcommit, and git diff. I could run diff on /var/aegir/platforms/platform-x/sites/all/modules/contrib/views /var/aegir/platforms/platform-y/sites/all/modules/contrib/views after building it, but it's more typing/TAB-ing.

* Dependency: And I don't like to depend on remote sites to host my production code. If the download page for some fancy jquery plugin is down when Drupal releases a security update I'll have trouble building a new platform.

Situations to pull instead of migrating to a new platform:
* Small security updates that don't bring update hooks (yes I review that code before deploying)
And often from looking at the code I can see that maintenance mode is not needed, even with some update hooks.
I know it's not a best practice to do unless you know what you're doing.
* Minor CSS updates
* Updates already thoroughly tested on a staging site.

At the confirmation dialog we already have a checkbox for "Force: Reset --hard before pulling?" that could have an option to run db updates.

This also relates to #1456258: Limit git features by platform, where checkboxes for drush fra and drush cc all were also mentioned.

One of my thoughts is that rules integration could also help here to add more custom actions and safeguards. Unfortunately that D7 upgrade has not gotten finished, #2323959: Upgrade to 7.x-3.x

formatC'vt’s picture

Yes, it's dangerous stuff and the result can be a nightmare, but no one is pushing you; it's your decision whether to use git or not.

cweagans’s picture

Sorry for the late reply on this. As the original author of this code in (hosting|provision)_platform_git, I can give some info:

- This functionality was written for SLAC.
- All of their sites are deployed from a custom install profile (so if they need to recreate a site, they can just delete it and re-deploy it)
- They use it for pretty much everything (migrate is too slow for them: their sites are many GB between the files and database, and each of their environments (dev, stage, prod, etc.) is on a different web server cluster)
- In their case, there's literally no user generated content after about 8pm and before ~6am, so their backups run at night between those times.
- If they deploy new code and something breaks, they have an easy process to restore from a backup. Most of the time, it's CSS tweaks or security patches. Things like that. Even when they're deploying new functionality, though, it's more of a "We're adding this module, site owners. You're welcome to turn it on if you want." They don't *ever* remove anything.
- Putting individual sites in Git didn't really make much sense for them because there is no custom code running on each site. You can basically think of their infrastructure as a SaaS product for internal use.

Generally, the thought process was that while the Migrate task provides good reliability at a technical level, you can also achieve that reliability in other ways - in their case, from a meatspace process.

cweagans’s picture

(Oh, and generally +1 for keeping it and adding optional settings to improve reliability)

I'll invite the SLAC people to comment here.

ergonlogic’s picture

Okay, fair enough. What all these responses have in common is that the operators know the risks, and are mitigating them through their own processes and/or project architecture outside of Aegir itself. This entails fairly advanced knowledge of Aegir, Drupal and git.

So I propose that we:

  1. Split platform- and site-level git functionality into separate Hosting features
  2. Add more granular permissions
  3. Add a new 'Advanced' section to the Hosting features page, with appropriate warnings and caveats
  4. Add options to the task dialog to prompt for backups, and other pre- and post-task operations

(3) would be in Aegir core, and should be pretty simple. We can also put clustering and other edge-case features in there.

Thoughts?

niccolox’s picture

a few scatter-gun points

  1. git is universal, it's the industry standard, it's how the other Drupal hosting cloud lords operate today. Drush make is something you run locally or in a custom workflow, but not on the hosting platform. I understand Ant, composer.json, etc. are also there, but we are talking hosting and web based devops. Also, even projects with manifests like ant or composer.json offer git repos as an alternative download and deploy
  2. git platform has been in contrib for years now, and in production at SLAC for ?, so it's the biggest public case study; that's not a bug, it's a feature
  3. the single site per platform use case needs to be the basic unit, all mass-hosting oriented platforms start this way, and I assume there is often a master site that is cloned when new sites are created, and much testing is done on the master site
  4. devshop is irrelevant, we are talking aegir 3, not aegir 2, devshop is also aegir-based, not aegir mainline
  5. it seems to me that even on 1 site / 1 platform the missing element, or perhaps the assumed element is that there is a staged workflow, the standard and universal dev>test>live. We need a new contrib module that makes this work by default. Maybe something like Git Platform Deploy (dev>test>live)
  6. reading Cameron's description of SLAC, and having seen a demo at badcamp, I now understand why the interest in disposable container PAAS. ALL the PAAS I have seen and used (Deis, Dokku Alt, Flynn, Octohost, Pantheon, Acquia, Platform.sh, OpenShift, etc.) use Git.
  7. the love affair with Docker based PAAS is based on a git like workflow
  8. Git is likely to be the only common denominator between Aegir 3 and Aegir NG. Git is going to be the bridge between Aegir 3 and Aegir NG, not Drush make or Composer
  9. if we had Platform version, Sites created, tasks performed stored in a log file, ideally Ansible readable i.e. platformxyz-dev-1.0 and site1-xyz-1.0 we could leverage all sorts of Git reporting tools, graphing tools, CI etc etc

in short, instead of turning this feature off or hiding it away as "advanced", it's actually basic, and the most simple option: 1 site, 1 platform

we need a simple DEV>TEST>LIVE workflow for the single site/platform Git Platform use case, i.e. the theoretical Git Platform Deploy (dev>test>live feature)

for the complex mass hosted multiple site/per platform Git Platform use case, we need to offer even more guidance

one thing that I am learning more and more is that there are lots of POLICIES built into Aegir tools, and so it's a matter of gathering these use cases and best practices and baking them in as defaults, especially the simple and most common single site/single platform use case...

I would also add, that the Aegir Summit featured git based workflow, for platforms and sites, the genie is out of the bottle

cweagans’s picture

For multisite mass hosting (where Aegir excels right now), I'm not sure that this workflow (git platforms) is the best. SLAC was a pretty weird project from a tech stack standpoint. For many users of Aegir, the current migrate workflow is really the "right" way to do it, but I agree that we shouldn't lock people out of other workflows if they are deemed to be appropriate.

Note that the PaaS solutions we're implementing for Aegir 4 essentially do the exact same thing as the Migrate task, but under the hood. The main reason for doing this is so that we can roll back if something is horribly wrong with the new container, and that, IMO, is a good thing to suggest by default in older versions of Aegir too.

hosting_platform_git/provision_platform_git were open sourced approximately when they were written, so they've been used by at least one org for two years.

I think the solution here is a two-parter:

1) Disable git platforms for new installs and hide it in an "Advanced" section on the features page
2) Get the Rules integration working so that people can start informing Aegir of their human-driven workflows. If that means that "Update this platform" = "Git pull and hope for the best", that's an end user decision. It could also mean, however, that "Update this platform" = "Provision a new platform with this Git repo + tag/branch on the same server, clone sites to that platform, check that everything is okay on those sites, then remove aliases from the old sites and apply them to the new sites so that the real site URL points to the new platform". Both are perfectly valid workflows depending on the org requirements, but we shouldn't decide which one, IMO.

Just my $0.02.

niccolox’s picture

to be fair, the Git Pull Task is in Experimental and the Git Checkout is in Roles and Permissions

Features marked experimental have not been completed to a satisfactory level to be considered production ready, so use at your own risk.

some simple changes in description and grouping would help

Git pull task
Enables git pull tasks on sites and platforms.

Roles & permissions
Git checkout task
Enables git checkout tasks on sites and platforms.

Git Checkout Task

ergonlogic’s picture

FYI, 'Roles & Permissions' is a collapsed fieldset providing further details about the feature above.

The git functionality on platforms has distinct use-cases from that on sites, and so, should be separate features, imo. I really don't like seeing the 'git url' field when I'm creating a platform, for example, when what I really want is to keep site config under git. Likewise, in a single site per platform scenario, having git repos on sites is superfluous, as are their related tasks, etc.

I think we might want to consider some default behaviour for platforms under git. For example, we might want to lock the platform once a site is installed on it. This'd require an opt-in to the more risky workflows.

niccolox’s picture

one of the confusing things is that when you come from a cloud lord git-centric workflow and you see git and site, you think of git platform/site as the same thing

but in the Git Integration in Aegir it means a site folder, site level git repo

ergonlogic’s picture

Title: Use case for 'git pull' on platform? » Split site and platform Git features

Just to be clear, I don't want to limit anyone's ability to build custom workflows with Aegir; quite the opposite. I'm only expressing concern with what we recommend, i.e., default behaviours. I think Aegir users have a right to expect that we won't recommend dangerous workflows.

Git support is one of the biggest new features in Aegir3, as evidenced by it being among the very first golden contrib to be added. I'd like to move it out of 'experimental'. After all, most, if not all, Aegir maintainers are using it in some fashion in production. So I'm working on (1) and (3) from #10. The others can be worked on later, if desired, in other issues.

Aegir4 ought to support platform-level git repos properly, along with sites/apps that use Composer, Gemfiles, etc. These serve the same purpose as Drush Make, which will likely go the way of the dinosaur, since Composer is much better. Common Git upstreams are also a reasonable way to use platform repos in a multi-site-like fashion. I believe our efforts are better served working towards Aegir4. Containers are popular for good reasons. Kubernetes and OpenShift provide robust solutions to some of these very issues.

ergonlogic’s picture

Status: Active » Needs review

Pushed suggested changes to dev/2555129 branch. Note that it depends on 7a148998d4 in Hosting, where I added an 'advanced' group for Hosting features.

Note that I haven't tested this thoroughly, but the UI changes appear to work.

formatC'vt’s picture

thx, I will do some tests in the next few days

helmo’s picture

I only did a very limited amount of testing, but added an update hook and fixed a typo in the dev branch.

formatC'vt’s picture

Status: Needs review » Needs work

We have a problem with pull because the hook_form_alter call order is:

hosting_git_pull_form_alter
hosting_platform_git_form_alter
hosting_site_git_form_alter

but hosting_git_pull_form_alter should be called last, not first.
The module order is determined by system weight, then by module name.
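A small Python sketch of that ordering rule, to show why the pull module currently runs first and how a weight change would move it. It assumes all three modules sit at the default weight of 0, and the new weight of 1 is only an example value.

```python
def hook_call_order(module_weights):
    """Order modules by system weight first, then by module name."""
    return sorted(module_weights, key=lambda name: (module_weights[name], name))


# Assumption: all three modules share the default weight of 0.
weights = {
    "hosting_site_git": 0,
    "hosting_git_pull": 0,
    "hosting_platform_git": 0,
}
default_order = hook_call_order(weights)
print(default_order)

# Giving hosting_git_pull a higher weight pushes its alter hook to the end.
weights["hosting_git_pull"] = 1
adjusted_order = hook_call_order(weights)
print(adjusted_order)
```

With equal weights the names decide, so hosting_git_pull sorts ahead of the platform and site modules; any weight above theirs is enough to make it run last.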

And _hosting_git_site_or_platform_enabled() is undefined.

formatC'vt’s picture

I do think we need to merge the hosting_git, hosting_git_pull and hosting_git_checkout modules, so that hosting_git hosts all of the code. The platform/site modules could then implement nothing more than a variable_set when the module is enabled, just because hook_hosting_feature requires a module.

ergonlogic’s picture

I'm ambivalent about how to handle individual git tasks. Right now, they're all in the same Provision code, under hosting_git, but we have separate features for a couple tasks on the front-end, and no tasks for all the rest. We have feature requests for a number of these to be exposed via front-end tasks, though. If we go the route of individual modules per task, then we'd probably want to split up the backend accordingly.

Historically, we've tended towards monolithic code-bases, which complicates debugging and such, even if it saves on the extra boilerplate of separate modules. Considering how some of these tasks are fairly stable, it'd be nice to move them out of 'experimental'. If we're adding new tasks to the main hosting_git though, this'd keep us from doing that. At least, I'd argue that we shouldn't. Individual modules per task allows for a greater separation of concerns, and would allow us to add additional 'experimental' git tasks, while promoting the battle-tested ones.

As for the order of form_alter hooks, I'd suggest just lowering the weight of hosting_git.module, to ensure that it runs before the other alter hooks. Alternatively, increasing the weight of the others should accomplish the same thing.

Jon Pugh’s picture

See branch 2897894-git-hooks

This adds extensible "git hooks" to platforms and sites. Very simple to add:


/**
 * Implements hook_provision_git_hooks().
 *
 * @return array
 */
function provision_git_provision_git_hooks() {
  return array(
    'update' => 'provision-update',
    'cache' => 'provision-flush_cache',
    'registry' => 'provision-rebuild_registry',
    'revert' => 'provision-features_revert_all',
  );
}

There is a similar hosting hook.

To handle the platform/sites problem, if you run provision-git-pull or provision-git-checkout on a platform, it will run the configured git-hooks on all sites.
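To illustrate the fan-out described here, a minimal Python sketch follows. It is not the code from the branch: the `commands_after_git_task()` helper, its parameters and the example aliases are hypothetical, and only the hook-to-command mapping is taken from the example above.

```python
# Hook name -> command, copied from the example above.
GIT_HOOKS = {
    "update": "provision-update",
    "cache": "provision-flush_cache",
    "registry": "provision-rebuild_registry",
    "revert": "provision-features_revert_all",
}


def commands_after_git_task(context_type, alias, platform_sites, enabled_hooks):
    """List (site_alias, command) pairs to run after a pull or checkout.

    On a site, the enabled hooks run on that site only. On a platform,
    they run on every site hosted on it.
    """
    targets = platform_sites if context_type == "platform" else [alias]
    return [
        (site, GIT_HOOKS[hook])
        for site in targets
        for hook in enabled_hooks
    ]


plan = commands_after_git_task(
    "platform",
    "platform_example",  # hypothetical platform alias
    ["site-a.example.com", "site-b.example.com"],  # hypothetical sites
    ["update", "cache"],
)
for site, command in plan:
    print(site, command)
```

Note that the number of commands grows as sites times hooks, so the concerns raised earlier about running updates on a platform with many sites still apply.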

colan’s picture

As Aegir Hosting Git does some overly permissive stuff (updating platforms in place as described in the issue description, which should be discouraged), you can now use the Aegir Platform Git module for platforms instead. It won't allow you to do this. It's a submodule of Aegir Deploy.

colan’s picture

Category: Support request » Plan
Status: Needs work » Needs review

We should mention this on the project page here to make users aware of the danger, and provide a link to the other module.