This is a hard requirement in order to implement #713102: Host sandbox Git repositories for any user who wants one. While we want anyone who has an itch to be able to contribute under the new system, we absolutely can't turn drupal.org into a free file host for whatever random crap people want to toss in there (warez, porn, etc.)

Some suggestions to help mitigate abuse of sandboxes:

- pre-commit hooks that check against a list of allowed file extensions, much like we have in the contributions/profiles directories: .inc, .php, .module, .install, .css, .js, .png...
- disallow committing of any files > 1MB in size (or similar)
- some kind of "auto-scanner" script run before commit that greps for the following and rejects them (and/or bans the account):
a) text/headers from licenses other than GPL
b) security violations e.g. http://ha.ckers.org/xss.html
c) binary data

Others? Who here knows how to code something like this? :)
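As a starting point, the extension allowlist above could be sketched as a server-side check. The allowlist and function name below are illustrative assumptions, not actual drupal.org policy:

```shell
#!/bin/sh
# Hypothetical server-side check: allow only whitelisted file extensions.
# The allowlist here is illustrative, not real drupal.org policy.
ALLOWED='inc|php|module|install|css|js|png|txt|info'

allowed_extension() {
  # Return 0 (success) if $1 ends in an allowlisted extension.
  case "$1" in
    *.*) ext="${1##*.}" ;;
    *)   return 1 ;;          # no extension at all: reject
  esac
  echo "$ext" | grep -qxE "(${ALLOWED})"
}

# In a Git update hook ($1=refname $2=oldrev $3=newrev), the files a push
# introduces could be walked with something like:
#   git diff --name-only "$2" "$3" | while read -r f; do
#     allowed_extension "$f" || { echo "rejected: $f" >&2; exit 1; }
#   done
```

The size and "auto-scanner" checks would slot into the same loop.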

Comments

jeremy’s picture

Subscribe. Sounds like fun; I'd like to make time to contribute to this piece. Where would these pre-commit hooks live, and how should they be implemented?

moshe weitzman’s picture

we already give people barely restricted sandboxes. why exactly is this a "hard requirement". controls are good, but i see no need to inject dependencies. i may be missing something tho.

gerhard killesreiter’s picture

Moshe: currently you only get a sandbox if you go through the "I want a CVS account" procedure. That's a mitigating factor, as it limits the number of people who get one. The new workflow would be "create account on d.o", "click button", "commit stuff".

damien tournoud’s picture

Title: Come up with pre-commit hooks to mitigate abuse of sandboxes » Come up with an update hook to mitigate abuse of sandboxes

I would be happy with the following minimalistic checks:

- no file > 1MB
- total size of the sandbox < 100MB

Oh, and this is called an "update" hook in Git.
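The per-file limit could be enforced in an update hook by asking git for the size of every blob the push introduces. A minimal sketch, assuming the 1MB limit from the comment above (the helper name is mine):

```shell
#!/bin/sh
# Sketch of a per-file size check for an update hook, using the 1MB
# limit suggested above. In a real hook, $1=refname $2=oldrev $3=newrev.
MAX_BLOB=1048576   # 1MB in bytes

blobs_within_limit() {
  # Reads "type size" pairs on stdin (as emitted by cat-file below)
  # and fails if any blob exceeds MAX_BLOB.
  awk -v max="$MAX_BLOB" '$1 == "blob" && $2 > max { bad = 1 }
                          END { exit bad }'
}

# In the hook itself, every object reachable from the new revision but
# not from the old one could be piped through the filter:
#   git rev-list --objects "$2..$3" | cut -d' ' -f1 |
#     git cat-file --batch-check='%(objecttype) %(objectsize)' |
#     blobs_within_limit || { echo "file over 1MB rejected" >&2; exit 1; }
```

The total-sandbox-size check would need a separate pass, since it is a property of the whole repository rather than of one push.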

gerhard killesreiter’s picture

I wouldn't consider this enough. We should check d.o to see whether any file actually needs to be in the vicinity of 1 MB, but I think a 100k limit is closer to what we'll end up with.

We should also make sure that anything you upload isn't easily exposed to third parties. Maybe only make stuff available over git-based checkout.

CorniI’s picture

A bit of information about update hooks can be found here:
http://book.git-scm.com/5_git_hooks.html
Also, every Git repo ships an example update hook in .git/hooks/update.sample, which is worth a look.
Besides this forced checking on the server, it would be nice to also offer a real pre-commit hook that runs locally on the client when a user commits, and rejects the commit there. A pre-commit hook is optional, since every user can choose whether to install it, but it would be nice to have one available. From reading the documentation, it would probably have to check the staging area against the same rules the update hook enforces.
We also need to investigate one wrinkle: if we use per-branch access control, gitolite already installs an update hook, and git natively doesn't support multiple update hooks, so we'll need some (bash) glue logic there.
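The glue logic could be a thin dispatcher: git only runs a single `update` hook, so one wrapper script can fan out to several. The `update.d/` directory layout below is an assumption, not how gitolite actually arranges things:

```shell
#!/bin/sh
# Hypothetical dispatcher: git runs only one `update` hook, so this
# wrapper runs every executable in a sibling update.d/ directory and
# rejects the push as soon as any one of them fails.
HOOK_DIR="$(dirname "$0")/update.d"

run_hooks() {
  dir="$1"; shift
  for hook in "$dir"/*; do
    [ -x "$hook" ] || continue       # skip non-executables and empty dirs
    "$hook" "$@" || return 1         # first failing hook rejects the push
  done
  return 0
}

run_hooks "$HOOK_DIR" "$@"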
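The glue logic could be a thin dispatcher: git only runs a single `update` hook, so one wrapper script can fan out to several. The `update.d/` directory layout below is an assumption, not how gitolite actually arranges things:

```shell
#!/bin/sh
# Hypothetical dispatcher: git runs only one `update` hook, so this
# wrapper runs every executable in a sibling update.d/ directory and
# rejects the push as soon as any one of them fails.
HOOK_DIR="$(dirname "$0")/update.d"

run_hooks() {
  dir="$1"; shift
  for hook in "$dir"/*; do
    [ -x "$hook" ] || continue       # skip non-executables and empty dirs
    "$hook" "$@" || return 1         # first failing hook rejects the push
  done
  return 0
}

run_hooks "$HOOK_DIR" "$@"
```

Gitolite's own hook would then become just one more entry in `update.d/`.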

damien tournoud’s picture

@CorniI: as far as I know, hooks are local to a repository and are not carried over when you clone one. A pre-commit hook would be pretty much useless.

CorniI’s picture

Yes, I know. A pre-commit hook, if installed, would just reject commits locally, so the user isn't forced to rebase after committing something that violates the d.o rules; with only an update hook, the user is notified of the violation at push time instead of commit time. Sure, it'll be totally optional, and thus lower priority than an update hook.
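Such an optional client-side hook could reuse the same allowlist as the server. A sketch, assuming a hypothetical extension allowlist (this is not actual d.o policy):

```shell
#!/bin/sh
# Hypothetical client-side pre-commit sketch: check staged filenames
# against the same extension allowlist a server-side update hook would
# enforce, so violations are caught at commit time rather than push time.
ALLOWED='inc|php|module|install|css|js|png|txt|info'

staged_files_ok() {
  # Reads filenames on stdin; fails on the first disallowed extension.
  while read -r f; do
    echo "${f##*/}" | grep -qE "\.(${ALLOWED})$" || {
      echo "pre-commit: disallowed file type: $f" >&2
      return 1
    }
  done
  return 0
}

# Installed as .git/hooks/pre-commit, the check would be:
#   git diff --cached --name-only --diff-filter=A | staged_files_ok || exit 1
```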

CorniI’s picture

Ah, and can we turn this into a general d.o update hook?
I'm pretty sure we want the project and issue repositories locked down tightly as well, and you'd do similar things there that you'd do in your sandbox.
And while we're there and forbidding stuff to be pushed, can we please deny non-annotated tags with a reasonable message?
Background:
Git supports two different kinds of tags, annotated and non-annotated. Non-annotated tags simply associate the tag name with a given commit ID. Annotated tags contain information like the tag date, a tag message, and the tagger, and they can be GPG-signed if necessary. Given that we don't even have a tag creation date for non-annotated tags, they're a PITA to support in the vcs_git module, and they don't look nearly as nice as annotated tags.
The update.sample hook from .git/hooks/ even contains a sample implementation of how to forbid non-annotated tags.
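The check in update.sample boils down to inspecting the object type behind refs/tags/*: an annotated tag pushes a "tag" object, while a lightweight one points straight at a "commit". A sketch (the helper name is mine, not part of git):

```shell
# Sketch of the lightweight-tag rejection that update.sample demonstrates.
# check_tag is a hypothetical helper name, not part of git itself.
check_tag() {
  # $1 = refname being updated, $2 = sha of the pushed object.
  case "$1" in
    refs/tags/*)
      # Annotated tags are "tag" objects; lightweight tags point
      # directly at a "commit" object.
      if [ "$(git cat-file -t "$2" 2>/dev/null)" != "tag" ]; then
        echo "*** Un-annotated tags are not allowed in $1; use 'git tag -a'." >&2
        return 1
      fi
      ;;
  esac
  return 0
}

# In an update hook: check_tag "$1" "$3" || exit 1
```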

pwolanin’s picture

The original post suggests limiting file name extensions - any reason not to do that?

webchick’s picture

Couple of quick clarifications:

1. This is not a requirement to move Drupal.org to Git. This is something we have slotted for "phase 3" at #714030: Meta-issue: Planning the migration to GIT, where phase 1 is just getting a read-only mirror set up so we can practice the migration and have a target to code against, and phase 2 is simply replacing CVS with Git and keeping our existing patch workflow. We do not need to give every Tom, Dick, and Harry Git access right out of the gate, and can keep our existing approval process in place for quite some time.

2. However, what we will inevitably want is tighter integration between Git and the Project* modules in order to facilitate collaboration, especially on the "big" patches. I describe a current ongoing Git horror story at http://groups.drupal.org/node/50438#comment-139963 which lays out exactly what we do not want to have happen with this move to Git. So tighter integration needs to come fairly soon after we do a straight-up CVS replacement, or we risk fragmenting the community despite this valiant effort.

3. The consensus at http://groups.drupal.org/node/50438 seems to be building toward one repo/branch per issue, where everyone working on a patch could hack on it at once. This way, instead of people having to leave a comment that says "There were some spelling errors at..." so that the original developer could integrate that feedback and re-roll the patch, you'd simply fix them yourself (if you were git-inclined). This would allow us to build improvements collaboratively, just as currently happens off-site on various initiatives like Fields in Core. It would be the ultimate awesome place to be in terms of community momentum.

4. However, in order for that to happen, we basically need to make commit access to Git wide-open, to emulate the current situation where anyone can roll a patch. Which means we need safe-guards in place so that unsavory assholes don't ruin it for the rest of us. That's what this issue is about. And so it is a blocker for opening commit access (and sandboxes) up for everyone, everywhere.

5. My personal opinion is that opening up sandboxes/commit access in general would be a wonderful thing for us to do regardless of the timeframe in which we move whole-sale to Git, because currently the over-bearing CVS application process is actively turning away new contributors and sending them off to places like GitHub instead of drupal.org. I've ranted about this separately at #703116: Our CVS account application requirements are obtuse and discourage contributions.

Hope this helps clarify things, and where this issue falls in terms of the time frame/priority list.

sdboyer’s picture

Issue tags: +git phase 3

tagging

kbahey’s picture

In addition to a precommit hook, we can also have a nightly check via script that runs from cron, and check the overall disk utilization per user, sorted. This could be emailed to some mailing list (infra?) if anyone exceeds a certain threshold.

Something like:

cd /where/git/lives
# report any sandbox directory using more than 100MB
du -sm * | sort -nr | awk '$1 > 100'

sdboyer’s picture

I've been thinking quite a bit about this recently. I've got some basic ideas about what to put into our hook scripts, but the problem tends to be that git repository size can vary significantly depending on how many packed vs. loose objects it has. Git repacks (pseudo-)automatically every once in a while, but it's a non-trivial operation that we shouldn't be running on every push. There are tools for differentiating between packed/unpacked size, and we can take advantage of those, too, but...yeah, some things to consider.

What's pretty clear, though, is that I absolutely do NOT want to do this on cron. We already have plenty of wasted I/O and CPU due to dumb cron-based infra processes that have to iterate over tons of data. This system can easily be event-based, so that's how we should build it.
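One event-based approach (an assumption on my part, not sdboyer's stated plan) is to read `git count-objects -v` after each push; it reports loose and packed storage separately, so a hook gets a cheap size reading without walking the filesystem:

```shell
# repo_size_kib is a hypothetical helper: it sums the loose ("size") and
# packed ("size-pack") figures that `git count-objects -v` reports, both
# in KiB, giving a cheap size reading for the current repository.
repo_size_kib() {
  git count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 }
         END { print total + 0 }'
}

# An update or post-receive hook could call this after each push and
# flag repositories over some quota, instead of a nightly cron sweep.
```

The caveat from the comment above still applies: the loose/packed split shifts whenever git repacks, so any quota would have to tolerate that variance.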

pwolanin’s picture

Is there a separate issue for blocking non-annotated tags?

yautja_cetanu’s picture

Could you stagger accounts? A basic sandbox for everyone, and an advanced sandbox for "verified" users?

Then you could impose restrictions such as files < 1MB but lift them once the account is verified. That way the restrictions wouldn't have to accommodate literally every possible Drupal module.

eliza411’s picture

Status: Active » Closed (won't fix)

Closing old issues. Please re-open if needed.