This issue has addressed several intermingled issues, with lots of questions and suggestions. I'll try to summarize and synthesize them here.
Do we want to allow files/modules/themes/libraries to be moved out of the sites/ directories?
Sites are "owned" by clients, and just happen to live on platforms. Benefits would include the possibility of using a separate mount for files, and a much simpler setup of ACLs, SFTP, etc. We seem to be converging on these settings being system-wide, rather than per-platform or per-site, but this hasn't been discussed in depth. It would, however, break our current backup methods, so these would have to be addressed as well. Consensus seems to be that we should allow for it, but not make it the default.
Where should they go?
There appear to be two schools of thought about where a client's site directories should live.
The first is alongside everything else in the site's directory. That is: /var/aegir/clients/[client_name]/sites/[site_name]/files. The client's directory, since it stays stable through upgrades and such, would be a good candidate for a chroot (for SFTP, etc.). There is no automated way to delete a client's directory at present, but safeguards would have to be put in place to check for the existence of sites before deleting a client (as we do presently for platforms). Allowing these safeguards to be overridden could be good for quota management. Moving a site between clients would involve moving these files as well.
The other option is somewhere outside of the clients directory, such as /var/aegir/site-files/[site_name]. This avoids the possibility of accidental deletion of files when deleting clients, and could simplify changes in site owner/client, since it would just involve re-writing symlinks.
Either way, we'll need to allow this to be set or overridden easily, based on the policies and priorities of the Aegir admins.
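To make the two options concrete, here is a minimal sketch of how each layout could be computed. The function names are hypothetical and exist only for illustration; the paths are the ones quoted above, with /var/aegir assumed as the Aegir home.

```python
from pathlib import PurePosixPath

# Assumption: the default Aegir home, as used in the paths quoted in this summary.
AEGIR_ROOT = PurePosixPath("/var/aegir")


def client_layout(client_name: str, site_name: str) -> PurePosixPath:
    """Option 1: files live under the owning client's directory."""
    return AEGIR_ROOT / "clients" / client_name / "sites" / site_name / "files"


def standalone_layout(site_name: str) -> PurePosixPath:
    """Option 2: files live outside the clients directory."""
    return AEGIR_ROOT / "site-files" / site_name


print(client_layout("example", "example.com"))
print(standalone_layout("example.com"))
```

Note that only the first layout depends on the client name, which is why changing a site's owner implies moving files in option 1 but only rewriting symlinks in option 2.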
How would we do it?
First, we'll need a flag that lets the backend provision commands (install, deploy and verify) know about the fact that we're symlinking, so that those operations can move file directories into the symlink root and construct symlinks. We should consider splitting that function between directory creation and permission verification. This would allow ACL code to be merged in or hooked better, but would also allow us to avoid making modifications to core for every contrib module.
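The "move, then symlink" step described above could look roughly like the following. This is a sketch in Python rather than Provision's actual PHP, and `relocate_and_symlink` is a hypothetical helper, not an existing Provision function.

```python
import os
import shutil


def relocate_and_symlink(site_dir: str, data_root: str, name: str = "files") -> str:
    """Move site_dir/name into data_root and leave a symlink behind.

    Returns the new location. Calling it again is harmless: an existing
    symlink is left alone, which is what a verify run would want.
    """
    link_path = os.path.join(site_dir, name)
    target = os.path.join(data_root, name)

    if os.path.islink(link_path):
        return os.path.realpath(link_path)  # already relocated

    os.makedirs(data_root, exist_ok=True)
    if os.path.isdir(link_path):
        if os.path.exists(target):
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.move(link_path, target)      # install/deploy: carry existing files over
    else:
        os.makedirs(target, exist_ok=True)  # fresh install: nothing to move

    os.symlink(target, link_path)
    return target
```

Directory creation and the symlink are handled here; permission and group changes are deliberately left out, in line with the suggestion to split those concerns.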
We already have several patches that provide hooks to:
- override these optional install directories,
- adjust symlinks accordingly, and
- allow altering the chgrp behaviours.
Backups and clones would have to know about these new directories, or risk losing files from backups, or having multiple sites share the same files after clones. Since we cannot reliably follow symlinks, we may also need to adjust the backup command to add those files to the archive when doing backups, which in turn may mean doing the backup in two steps (tar then gzip) to be able to append to the archive. An alternative would be to start using "zip" instead of tgz, which supports appending natively.
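The two-step backup (tar, append, then gzip) can be sketched as follows. This is an illustration in Python, not the Provision backup command; `backup_site` and the `external_dirs` mapping are assumptions made for the example.

```python
import gzip
import os
import shutil
import tarfile


def backup_site(site_dir: str, external_dirs: dict, tar_path: str) -> str:
    """Archive site_dir, append symlinked directories, then compress.

    external_dirs maps a top-level name in the site directory (e.g. "files")
    to the real directory its symlink points to. Returns the .gz path.
    """
    # Step 1: plain tar of everything that physically lives in the site dir.
    with tarfile.open(tar_path, "w") as tar:
        for entry in sorted(os.listdir(site_dir)):
            if entry in external_dirs:
                continue  # a symlink; its real content is appended below
            tar.add(os.path.join(site_dir, entry), arcname=entry)

    # Step 2: append the external directories under their original names.
    # Append mode ("a") only works on uncompressed tar files.
    with tarfile.open(tar_path, "a") as tar:
        for arcname, real_dir in external_dirs.items():
            tar.add(real_dir, arcname=arcname)

    # Step 3: compress once the archive is complete.
    gz_path = tar_path + ".gz"
    with open(tar_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(tar_path)
    return gz_path
```

Because the external directories are stored under their original names, the resulting archive looks the same as one taken from a site without symlinks.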
The core of this refactoring needs to happen in the install and deploy hooks, most specifically in _provision_drupal_create_directories(). Verify also uses that function, but possibly just to change permissions. We'll probably also need to prevent backups from fetching files directories on a remote server, to cover cases where the symlink goes to an NFS share or a DFS share like Gluster.
Original issue description:
Here is what I am thinking:
1. Add attributes to sites for files and tmp directories - this attribute cannot be shared with other sites (in Hostmaster?)
2. Install, Verify and Migrate tasks merely create or move around symlinks to these directories
3. Backup and Restore tasks work normally
| Comment | File | Size | Author |
|---|---|---|---|
| #44 | hostmaster_move_sites.patch | 6.85 KB | ergonlogic |
| #44 | provision_move_sites.patch | 22.19 KB | ergonlogic |
| #43 | hostmaster_move_sites.patch | 6.12 KB | ergonlogic |
| #43 | provision_move_sites.patch | 14.35 KB | ergonlogic |
| #42 | hostmaster_move_sites.patch | 6.16 KB | ergonlogic |











Comments
Comment #1
j0nathan commented: Subscribing.
Comment #2
izmeez commented: Subscribing.
Comment #3
Steven Jones commented: I'm not sure that this would be the answer to improving migration performance. I think that the main bottleneck is actually gzipping the backup, so an idea might be to allow users to skip that step and just create a tarball without gzipping it.
Comment #4
j0nathan commented: Compressing big sites kills the server. This is a big issue we have here.
Comment #5
anarcat commented: Well, tarring also takes up significant I/O bandwidth... I think we should have a setting for this, which would basically be a "files root directory" where the files, modules and themes directories would end up if the directory is set. We are currently using something like this for sftp-only accounts, and it basically looks like this:
This is pretty "koumbit-specific" though, but I think we could figure something out for a more common pattern.
Maybe we could simply send the files in the "clients directory" we are already creating? That would mean:
This would basically reverse the current practice where /var/aegir/clients/example/example.com is a symlink to sites/example.com though, and it could be confusing for existing users...
But in general, I like the idea of *allowing* for this kind of optimization, but it should be made clear that it means that those things are not actually backed up when you "backup" your site!! (Or maybe we can make the actual backup cross symlink boundaries, but not migrate)
So I guess there are a few bullet points to resolve here:
* how (or if) to allow files/modules/themes/libraries to be moved out of the sites dirs - i vote for yes and to the clients directory, but not do it by default (make it a flag)
* whether the backup tasks will continue to backup those directories even if they are outside of the sites directory (by following symlinks) - i think we should, but make migrate not follow symlinks during its backup...
What do you guys think?
Comment #6
halcyonCorsair commented: Subscribing.
Comment #7
Josh Waihi commented: On another note, moving sites/foo/files out of the Drupal root is a great idea for web clustering. On non-Aegir sites we create an NFS partition at /var/lib/sitedata and symlink sites/foo/files to /var/lib/sitedata/foo/files.
So in a similar fashion, I'm all for a symlink from /var/aegir/clients/example/example.com/files to sites/example.com/files.
In the meantime, I've implemented a simple patch that invokes a drush command hook when provision creates directories. This allows me to implement a hook that can stop directories from being created and replace them with symlinks.
Comment #8
crea commented: Does symlinking work with Drupal at all?
I remember there were issues with images.
Comment #9
Josh Waihi commented: We've been using it for years. It's fine so long as your web server allows it.
Comment #10
crea commented: I meant issues like this one: #155781: "realpath" check breaks symbolic links in file directory
Comment #11
anarcat commented: Hi Josh!
First, thanks for the patch! This looks like a good thing that might be useful for contrib modules... However, I do not feel it belongs to this issue. Can you open another one?
Second, when you do upload patches, please mark the issue as needs review so that we notice it more easily...
Comment #12
anarcat
Comment #13
anarcat commented: The code above was moved to #1283738: Add new hook provision_drupal_create_directories in _provision_drupal_create_directories. Now we need code that implements that hook. @halcyonCorsair - can we see that code? :) Maybe that could be shipped as an option in 2.x...
Comment #14
crea commented: At least we should stop using gzip. Most data files of sites are already compressed media anyway, so a simple tar is enough.
Those who are interested in this should fix #1322964: Provision file extract() doesn't support extracting simple tar files
Comment #15
crea commented: Did anyone here try to use symlinks without the patch? _provision_drupal_create_directories() runs an is_dir() check, which should skip symbolic links, and tar should work with symbolic links too. It looks like it should just work.
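The behaviour described in this comment can be checked quickly outside Aegir. The sketch below uses Python's `os.path.isdir()`, which follows symlinks; the assumption is that PHP's is_dir() behaves the same way, so a symlinked directory passes the check and is not re-created. It also shows that a tar archive stores the link itself by default, not the files behind it.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "data", "files")
    site = os.path.join(tmp, "site")
    os.makedirs(real)
    os.makedirs(site)
    link = os.path.join(site, "files")
    os.symlink(real, link)

    is_dir = os.path.isdir(link)    # the directory check follows the link
    is_link = os.path.islink(link)  # yet the entry is still a symlink

    archive = os.path.join(tmp, "site.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(site, arcname="site")  # dereference defaults to False
    with tarfile.open(archive) as tar:
        stored_as_link = tar.getmember("site/files").issym()

print(is_dir, is_link, stored_as_link)
```

The last point is the catch: the archive is valid, but the uploaded files are not in it unless the backup follows or separately adds the link target.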
Comment #16
crea commented: I'm testing the symlinked file dir. For now, it seems to work without any patches. I only tried to migrate though.
Comment #17
crea commented: There's an issue with this approach: it removes the possibility of easy site experiments. With separate file systems, one could upload new content or delete existing content on site clones when experimenting. With this approach, clones share the files, so we must be careful. Also, if a site upgrade deleted or somehow broke the files, it's not reversible.
I'm trying to evaluate if these side effects are tolerable for me. I think, having all files backed up is a good thing. I don't want to lose the ability to experiment with site clones. I think, cloning should always create separate (symlinked) file directory, so the clone operations are fast and isolated from the main site at the same time.
In an ideal world, I would like being able to do copy-on-write snapshots of file system. Sweet dreams..
LVM snapshots could work, but integrating it with Aegir is not for the faint of heart..
Comment #18
crea commented: In the end, we would want to use something like copy-on-write file cloning instead of tarring and gzipping. Btrfs has it, but it's not suitable for production use yet.
Comment #19
izmeez commented: I'm fairly new to aegir and struggling with some symlink ideas and hope this is not off topic for this issue.
On a dev server, created and verified a platform then added a site.
Login to site is fine. Trying to change configuration > file system does not seem to take and keeps reverting to defaults.
Looking at the aegir platform site's settings.php file, it is generated and maintained by aegir and includes:
To add symlinks, would it be best to leave the settings.php file alone and replace 'sites/example.com/files' and 'sites/example.com/files/tmp' with symlinks that point to a sites_data/domain/files folder?
Will this allow aegir to continue to maintain the settings.php file and not require the domain/files to be copied when the site is migrated to a new platform?
I am confused by the discussion on symlinks related to the /var/aegir/clients/USER/domains and not sure if this is really the better place to create symlinks?
Thanks for your time and any suggestions.
Comment #20
omega8cc commented: To quote our IRC conversation with anarcat:
Comment #21
anarcat commented: I have done some analysis on this problem in #1484214: [meta] migrate/clone performance optimizations. I copy the relevant text here:
Comment #22
shaisachs commented: I've been poking around with symlinked file directories in Aegir (on 1.9) for a couple of days now. The patch in #7 is a great start! However, there are a couple of minor modifications needed: a) use drush_command_invoke_all_ref to allow changes to the directories array, and b) allow modules to alter the chgrp list as well.
I've attached a patch that should address both of these issues. So far it's working pretty well!
Comment #23
shaisachs commented: Well, turns out my last post was a little optimistic. On top of handling them in install and verify (which the patches in #7 and #22 address), full support for symlinked directories includes..
Turns out provision (1.x) needs some involved surgery to support all of that; this patch should take care of that stuff.
Comment #24
ergonlogic commented: I'd really like to see this get in for the Aegir 2.x release. From the standpoint of a hosting provider, this will greatly simplify setup of ACLs, SFTP, etc.; all priorities for Koumbit. However, it's horribly API-breaking, and so needs to go in as part of a major version upgrade.
To that end, here's a patch that modifies the default behaviour of _provision_drupal_create_directories(). It's pretty basic at this point. So far, it only works for site installs, and doesn't yet even handle install rollbacks, etc. But, I'd like some feedback on the general approach. Basically, it just moves the creation of all site sub-directories to /var/aegir/clients//sites/, and sets up symlinks from a site directory on the platform (where settings.php and drushrc.php live).
I figure backups can also start from within the client's site directory, since we're generating a dummy settings.php. Later, we could optimize migrations to just clone the db and write new symlinks on the target platform. Clones could do likewise, but copy the client's site directory first.
Comment #25
omega8cc commented: It is a bad idea to put sites directories/files in the client directory tree, and I have explained why in #20 above.
Comment #26
ergonlogic commented: @omega8cc: The gist of your concern there, as I read it, is that deleting a client would delete their sites. Presumably we could check for the existence of sites prior to allowing the deletion of a client. We could re-use provision_drupal_find_sites() for this, as we do currently for platforms, no?
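As a rough illustration of such a safeguard: the sketch below refuses to delete a client directory that still contains sites, unless explicitly forced. It is written in Python for brevity; `find_sites` is a hypothetical stand-in for provision_drupal_find_sites(), whose real behaviour is not shown in this thread, and detecting a site by its settings.php is an assumption.

```python
import os
import shutil


def find_sites(client_dir: str) -> list:
    """Return names of directories under client_dir/sites holding a settings.php."""
    sites_root = os.path.join(client_dir, "sites")
    if not os.path.isdir(sites_root):
        return []
    return sorted(
        name
        for name in os.listdir(sites_root)
        if os.path.isfile(os.path.join(sites_root, name, "settings.php"))
    )


def delete_client(client_dir: str, force: bool = False) -> None:
    """Remove a client directory, refusing if it still holds sites.

    force=True overrides the safeguard, as the summary suggests for
    quota management.
    """
    sites = find_sites(client_dir)
    if sites and not force:
        raise RuntimeError(f"client still owns sites: {', '.join(sites)}")
    shutil.rmtree(client_dir)
```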
Comment #27
omega8cc commented: @ergonlogic: Not only that. Also, what if you need to change the site owner, like when you want to delegate both front-end and backend (via clients/foo/bar.com) access to some other developer? You will have to move all those files there. On the contrary, if we manage all sites' files directories the same way we manage platforms and clients (they have dedicated sub-directories in the Aegir root/home directory), so that sites' files dirs exist in a ~/sites/{foo,bar}.com/files tree, all you need to do is manage symlinks, no matter which client is the current owner and which platform is currently used. At least this is how I think it could be done to keep things simple and backward compatible, since people already symlink the files dirs if they are big enough to cause timeouts and other issues in the archive/extract workflow we use now.
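"Managing symlinks" in this sense could be as small as the following sketch, which re-points a link at a new target with a single rename so there is no moment where the link is missing. `repoint_symlink` is a hypothetical helper written in Python for illustration; it is not part of Provision.

```python
import os


def repoint_symlink(link_path: str, new_target: str) -> None:
    """Point link_path at new_target, replacing an existing symlink if present."""
    if os.path.lexists(link_path) and not os.path.islink(link_path):
        raise ValueError(f"{link_path} exists and is not a symlink")

    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)           # leftover from an interrupted run
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)   # renames the link itself, not its target
```

With files kept in a stable location, a migrate or a change of owner would only need a call like this on the new platform or client path; the files themselves stay where they are.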
Comment #28
omega8cc commented: Or even simpler: a ~/sites/{foo,bar}.com.files tree, so no extra directory level per site is needed, just a directory named that way, or ~/sites/{foo,bar}-com-files, to make it less weird, etc.
Comment #29
helmo commented: The solution from #28 looks promising.
I'd probably prefer: ~/sites/{foo,bar}.com-files
Comment #30
ergonlogic commented: It seems to me that if you change the owning client, moving the files would make sense...
Which ~ are you talking about here? Aegir users/clients don't currently have system users by default. Or are you suggesting /var/aegir/sites/...?
Comment #31
omega8cc commented: Yes, I mean ~/ for the aegir system user, so by default /var/aegir. The extra directory could be /var/aegir/sites-files maybe, to avoid confusion? And yes, ~/sites-files/{foo,bar}.com-files looks better, since domains can also have dashes, and converting dots to dashes could only add extra confusion.
Comment #32
ergonlogic commented: I just updated the issue summary based on the discussion up to this point. Please feel free to point out anything I've missed or misinterpreted. I'm also changing the issue title to reflect what we're actually talking about doing here.
Comment #33
ergonlogic commented: Rather than make this a global setting in Aegir, I implemented this as a field on web servers, so that a source directory for site symlinks (e.g. /var/aegir/clients/sites) can be specified. This way, if it's left blank (the default), we can maintain our current method of building these directories on the platform. However, if it's set, we'll already have the new directory to build these in, the source for symlinks, etc. This could happen prior to calling hook_provision_drupal_create_directories_alter() implementations, to allow fine-grained controls, such as moving /files out of the site's directory, mounting these under NFS, etc.
FYI, I'm working on these over in my sandbox repos here:
Comment #34
omega8cc commented: We would also need to move the private directory.
Comment #35
omega8cc commented: Also, no, we didn't talk here about {modules,themes,libraries} before (at least I don't remember that), only about {files,private}, so the new summary is a bit misleading.
[EDIT] Of course, I was wrong, sorry. While it wasn't mentioned in the original description, anarcat listed it in his comment #5
If we want to move also {modules,themes,libraries} (probably to make creating backups simpler), we need to move them to *the same* directory where {files,private} will be moved, so effectively those "two schools" are about moving all directories to the clients subdirectory or its own, new subdirectory in the aegir root.
Comment #36
ergonlogic commented: @omega8cc If it's a matter of moving all site directories, that'll simplify things, as we won't have to special-case /files :) I'll update the summary.
Comment #36.0
ergonlogic commented: Update summary to reflect discussions.
Comment #37
helmo commented: Moving {modules,themes,libraries} was not really in my mind either, as I would not expect these to get BIG.
Comment #38
izmeez commented: I was also thinking of just the files directories, which are data and may be private or public, whereas sites/modules, themes and libraries are code and thus a different beast.
Comment #39
ergonlogic commented: @helmo, @izmeez Okay... but is there a reason to treat any of these differently by default? Bear in mind, we're still likely to have a hook_provision_drupal_create_directories_alter() to allow overrides, should one really want to treat them differently.
Comment #40
helmo commented: /var/aegir/sites/example.com/files and /var/aegir/sites/example.com/modules...
looks better to me than /var/aegir/sites-files/example.com and /var/aegir/sites-modules/example.com
This way it's also easy to make a tar of all site-specific files without having to follow symlinks (which has its security implications).
Comment #41
omega8cc commented: Yep, if we are going to move all those subdirectories there, we should use a path like /var/aegir/sites/example.com/
Comment #42
ergonlogic commented: Progress report: anarcat and I reviewed this on Friday, and based on his feedback and suggestions, I've refactored this patch pretty significantly. First off, we're now talking about moving the entire site directory out of the platform, rather than each of its sub-directories. On the http server in the front-end, when setting a 'data directory' in which we'll install sites, we have tokens available for dynamic data, like [client_name] and [uri]. This then gets expanded in _provision_drupal_create_directories(), and can be modified by hooks, etc. It defaults to '[platform_root]/sites/[uri]', which is where sites are created now anyway.
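For readers unfamiliar with the token idea, here is a minimal sketch of how such a pattern could be expanded. It is written in Python and is not the code from the patch; `expand_data_dir` is a hypothetical name, and only the tokens and the default pattern come from the comment above.

```python
import re

# Default quoted in the comment above: sites stay where they are created today.
DEFAULT_PATTERN = "[platform_root]/sites/[uri]"


def expand_data_dir(pattern: str, tokens: dict) -> str:
    """Replace each [token] in pattern with its value; fail on unknown tokens."""

    def replace(match):
        name = match.group(1)
        if name not in tokens:
            raise KeyError(f"unknown token [{name}]")
        return tokens[name]

    return re.sub(r"\[([a-z_]+)\]", replace, pattern)


site = {
    "platform_root": "/var/aegir/platforms/drupal-7",  # example value
    "client_name": "example",
    "uri": "example.com",
}
print(expand_data_dir(DEFAULT_PATTERN, site))
print(expand_data_dir("/var/aegir/clients/[client_name]/sites/[uri]", site))
```

Failing loudly on an unknown token is a design choice made for this sketch; a typo in the admin-supplied pattern would otherwise create a literal "[tokn]" directory.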
I'm now working on updating the site context to set 'site_path' to the new location, which I'm hoping will mean minimal need to refactor backup, etc. as they'll just start at the new site path.
Comment #43
ergonlogic commented: After some additional advice from darthsteven and anarcat, this now appears to be working properly for site installs, generating the proper settings.php and drushrc.php files. Unfortunately, it looks like Drush needs a site_path under the platform root. So, I had to add an additional property to site contexts: site_data_dir. The creation of settings files requires it, and presumably so will backups, clones, etc. This still needs testing of that functionality, and probably a switch from site_path to site_data_dir in most cases.
Comment #44
ergonlogic commented: I built some tests to check that the site's data directory is indeed backed up and moved, over in #1826074: Add tests to ensure site directories are migrated. The patch over there is incorporated into this one, as it depends on the same helper functions.
I added an additional set of tests to ensure migration still works after changing the server's 'data directory' setting. Surprisingly, everything appears to work without any additional changes!
I'm not sure where to go from here, as I think these patches now fulfill just about all elements of the feature request. Perhaps additional testing, of 'clone' functionality for example, could be useful. Also, we provide a token for [client_name], which could cause some odd behaviour when we change a site node's client. I'll investigate this further, as some additional development and tests may be in order.
Comment #45
anarcat commented: Could you make sure client name changes are tested? This is already a huge issue elsewhere in aegir; we don't want to add any more complications.
Also test what happens when the admin changes the token string...
Comment #46
anarcat commented: Did I mention this was awesome work and that you deserve a medal or commit access?
If not, here you go. :) I'll harass the other core devs to make that threat more real.
Ah, and I recommend you actually do commits on your git repository when you work on such patches. Then use git-format-patch to generate those patches, and you'll have proper attributions in the commits.
Comment #47
ergonlogic commented: Thanks for the kind feedback :)
Yeah, I haven't even tried this yet, and I expect there'll be more work required as a result.
So, we're already testing changing the token string, but I guess you mean after having installed some sites?
Comment #48
anarcat commented: Yes, that is what I mean.
Comment #49
anarcat commented: Just a reminder here: some testing is necessary before this patch goes in, to make sure client renames and token changes are working properly.
Comment #50
ergonlogic commented: I don't think this is ready yet. I'll move this into feature branches in the main repos for further work.
Comment #51
ergonlogic commented: I added new feature branches, both named dev-1205458-move_sites_out_of_platforms.
Comment #52
anarcat commented: This feature branch will need to be rerolled to hosting; see #1912134: split hosting in its standalone module.
Comment #52.0
anarcat commented: Update description of where to move site dirs.
Comment #53
mvc commented: I don't have anything to contribute here except an anecdote: I'm currently doing this manually with a bunch of symlinks for a client with large files/ directories, including one site which weighs in at about 200M plus 1.5T of videos. My main problem is that site clone operations require manual intervention, and my client keeps forgetting that step.
One note: if we choose to use /var/aegir/clients/[client_name]/sites/[site_name]/files, don't forget to account for the fact that not all Aegir instances have clients.
Comment #54
ergonlogic commented: This will be a priority for me once we shift back to 7.x-3.x dev. The plan is to use tokens to allow building custom site paths, probably with a couple of standard defaults. Defaults will likely be on the platform (status quo), and in the clients dir. Then we can have an 'advanced' setting, for those that want to hack the site location directly.
I think sites default to the 'admin' client, but I'd have to confirm that.
Comment #55
mvc
Comment #56
rumenyordanov commented: Hello,
Will the patches given in #44 be moved into the stable 2.x release?
Comment #57
ergonlogic commented: No, this requires significant refactoring, and so won't be added to the stable branch.
Comment #58
rumenyordanov commented: Hi Christopher,
I have a use case with multiple web nodes over the same platform, where the files live on NFS and the code is rsynced. I have customized Aegir to have this in place, but I am looking for a more stable and reusable solution that can be contributed back to the community. I need some advice on whether the direction I have taken is correct or whether I should consider an alternative solution. What I am thinking:
1) Add configuration in Aegir that defines NFS paths for the web nodes
2) Use provision_drupal_create_directories_alter to actually create files as symlink to the NFS path
3) Use the web cluster module to have the code rsync ( as the pack depends on NFS mount for the code )
Regards,
Rumen
Comment #59
cweagans
Comment #60
anarcat commented: So ergonlogic and I are looking into this again, and we like cweagans' approach very much. We're going to suspend work on this until we have #2185627: Add a storage service: Allow configurable file_directory_path per site figured out.
Comment #61
colan commented: There's no way this is going to make it into Aegir 3.
Comment #62
ergonlogic
Comment #63
tdnshah commented: I have installed Aegir 3.15 (the latest version), and we want to implement this symlink method of moving the sites folder out of the platform. I have the platform mounted on an SSD GlusterFS replication, and we need to move the sites folder to an HDD GlusterFS replication mount, as we have more space available on HDD. I tried adding the public files path setting in global.inc, as suggested in https://www.drupal.org/project/provision/issues/1260118, but it doesn't work properly, hence I would like to take this approach of symlinking the sites folder. After reading the whole conversation above, I think this approach should also work in our scenario, but I am confused about where I could configure this. Can anyone tell me how to actually implement and apply these patches? I couldn't find any such configuration to set my HDD mount path in my Aegir installation. Help would be appreciated a lot. Thanks in advance.