At DC-CPH, @webchick gave a talk about how difficult it is to get a CVS account these days. This thread also outlines a lot of the ideas. At the Drupal Core Developer Summit, we discussed how to change this process with the move to Git. We came up with the following workflow, as quoted from here:

  1. Getting a git account will be easy. You fill out an online form on drupal.org that says that you understand community values and legal issues. You get an account. The details of the form are being hashed out in #720670: Figure out wording of the "Yes, I promise to upload GPLv2+ code" checkbox
  2. If you have a git account, you can create a project and project page on Drupal.org.
  3. To create a release you must go through a process basically equivalent to the current CVS application process.

Though we came to that conclusion in CPH, this thread is to continue that discussion and get some sort of agreement on how to move forward, then create specific issues on how to implement the technical side on d.o.

Comments have already been made, following on that thread:
http://drupal.org/node/703116#comment-3660358
http://drupal.org/node/703116#comment-3660528
http://drupal.org/node/703116#comment-3661004
http://drupal.org/node/703116#comment-3661260

Comments

zzolo’s picture

To quote what was said in the other thread (formatting not really carried over):

Posted by sun on November 3, 2010 at 5:32am

2. If you have a git account, you can create a project and project page on Drupal.org.

Given the high amount of unwanted, bogus, or otherwise one-shot-off and test CVS applications created by people, I naturally and logically expect a vast increase of projects being registered on drupal.org that won't (ever) contain any code or releases.

The project short names and therefore namespaces of those projects will be unavailable for others, who might have working code already.

* Outsiders will probably rewrite their code to use a different, cryptic, but available namespace. Effectively adding burden on users and support channels; e.g., "Do you have the ftzgrp module installed?"
* Insiders will probably create a request to take over project ownership of the existing "fake" project. Effectively adding burden on d.o webmasters; unless a more automated project_conflict_resolution_process.module is developed.

If we'd want to prevent that from happening in the first place, then we'd slightly change the outlined process into:

* Create a git account by filling out a form on drupal.org
* If you have a git account, you can create unofficial projects without project page on drupal.org. I.e., like the current sandboxes, or similar. Code can be edited, improved, and also shared and worked on together with others.
* To create an official project and project page with a release, you must go through a process basically equivalent to the current CVS application process.

The technical difference to now would be that the (CVS/git) application review process is based on revisioned code already, so iterative code reviews will be much faster; but most importantly, the result of that application review process will merely be to grant the "create project content" user permission.

reply #96 Posted by zzolo on November 3, 2010 at 6:14am:

Hi @sun. Your points are very valid. But, the idea with this particular plan is to allow people to easily put up code and have all the amenities of a project page without the official approval process. Think GitHub; there is no barrier to entry there and people have a whole infrastructure for their code/project.

So, instead of limiting this, I would suggest possibly the following:

* Namespace is the problem. We probably need more than one namespace for "approved" projects, and sandbox projects. This way, project names will not get used up, except with "official" projects.
* Some sort of very distinct visual difference between an "approved" project and sandbox project.
* Some sort of indication to user and creator that this project has not been committed to in X number of weeks.

Also, if we are going to further this discussion, I would suggest the following:

* A new issue.
* Determine goals of new system before process/rules.

reply #97 Posted by greggles on November 3, 2010 at 7:18am

Github's solution is to prefix every project with the username, which seems like it solves the problem of conflicts but creates a new one: it's really hard to find the right version of a module.

In this case I think a solution where we are very accepting during the code commit, project node creation etc. process is GREAT, but we may also need to add some conditions to the abandoned project process to cover new situations of project node takeover and make it more logistically simple and socially acceptable to do it. For example the phrase "abandoned project takeover" sounds very hostile. Maybe "project transfer" would help. It's also very difficult right now to move project_issue nodes from one project to another, but if we are going to be doing more of this that will likely become necessary.

@mcfilms: the code would be accessible via git checkout, but not as a downloadable file until there was a community review process on the project/code.

reply #98 Posted by Michelle on November 3, 2010 at 8:17am

I'm probably going to get reamed for that but it has to be said... Maybe it's time to reconsider the whole "golden projects" idea. One repository that is whatever anyone wants to submit with lots of warnings all over it and one that has gone through some sort of approval process. That would lessen the burden on the security team if they only had to worry about the approved one, would lessen the burden of being the gatekeeper because the code would be there and available in the other repository so it wouldn't be such a huge deal if it took a while to get a review.

If we could say to people like this, "Here's your space, put up your project, promote it, get people using it. When you have something solid, ask for a review to get into the main repository." I think that's a whole lot better than, "Sorry, we only have one person actively doing code reviews and 20 bazillion people applying so you're going to have to wait approximately half a gazillion years before you can share your work with the community."

Michelle

zzolo’s picture

I think it's important to state what our goals are in trying to change the system. Here are the ideas that I get from all the discussion:

  • Reduce burden on Security Team
  • Make code contribution easier
  • Allow code ("official" or not) to grow on Drupal.org with little barrier
  • Keep some sort of peer review for "official" projects/releases

I think with some filling in of the details, the initial proposal above fits this.

rfay’s picture

Glad to get this into a good place. Subscribing.

sun’s picture

Very good follow-ups to mine. Food for thought.

Durrok’s picture

Good stuff, I'm excited to see where this goes.

mcfilms’s picture

My first inclination was that the "journeyman" level Drupal module developers should have a way to distribute their modules as down-loadable packages too. But maybe requiring users to use GIT to GET them is a good safety feature. It requires the person installing the module to be one step further along on the Drupal learning path.

By the way, my use of the word "journeyman" seems a little dated (and some might consider it has a gender-bias). What should these entry-level module developers be called? (hmmm... elm-devs?)

The reason I am concerned with what to call them is that I see a profoundly good reason to prefix the module name. However, I think that prefix should relate to this group in general. So instead of having durrock-widget.module and sun-somethingelse.module, the names could be elm-widget.module and elm-somethingelse.module. The project itself would be the elm-somethingelse project. As an end user, these names instantly tell me something useful.

Also, in terms of namespace, it allows these "ELM" developers to reserve a particular name that other ELM-level devs couldn't use. I suppose an already fully certified developer could trump their reservation and release a "somethingelse.module" that would cause the elm-dev to have to come up with a new name.

One last thing: What would be involved with letting the community-at-large vote or allocate userpoints to push certain modules up the evaluation queue? I mean if you are successfully using a developer's module on a production site and you have first-hand experience with them responding to their issue queue, wouldn't you be motivated to see them become anointed? It's the positive version of "voting someone off the island." And that sort of involvement and interactivity is precisely the sort of thing that makes a community stronger.

sreynen’s picture

My first inclination was that the "journeyman" level Drupal module developers should have a way to distribute their modules as down-loadable packages too. But maybe requiring users to use GIT to GET them is a good safety feature. It requires the person installing the module to be one step further along on the Drupal learning path.

Exactly. It's important that Drupal.org doesn't present inexperienced users with potentially dangerous unreviewed code. Allowing access only via git both raises the presumed experience level of the user and disassociates the code a bit from Drupal.org.

What would be involved with letting the community-at-large vote or allocate userpoints to push certain modules up the evaluation queue? I mean if you are successfully using a developer's module on a production site and you have first-hand experience with them responding to their issue queue, wouldn't you be motivated to see them become anointed?

If you're using someone's code on a production site, that'll show up in the usage stats. Heavy usage might encourage reviewers (which anyone can be) to look at the code, but ultimately it's that code review that determines whether the project is approved. We can't vote code into being secure or working correctly (though it would be awesome if we could), and that's what determines approval.

zzolo’s picture

I think voting is a good idea. I believe it has been talked about before for all projects. But, either way, it would provide the "Release Reviewer(s)" another metric to help make decisions. And yes, voting, or even usage statistics, cannot replace code/security reviews.

That said, I don't think voting is a necessity for the initial implementation of this process.

We do, indeed, need some vocabularies defined here:

  1. A person with Git access (but without release review privilege)
  2. A person who is applying for a release review
  3. A person doing a release review
  4. A person with Git access and release privilege

And there are a couple points missed above:

One thing that was not mentioned above, but that we discussed in Copenhagen, is the scope of access once a person has been approved for release. The main question here was: Do we require that every release be reviewed? Every project? The first few projects/release? Or just the first release? We had decided on the last one, just one review process for the initial release.

Another aspect was how to handle existing accounts. We decided that it was most appropriate to grandfather in current users with CVS access.

So, to re-iterate the points above:

  1. A GIT account is given to every user on Drupal.org that agrees to a few specific terms.
    The details of the terms are being hashed out in #720670: Figure out wording of the "Yes, I promise to upload GPLv2+ code" checkbox
  2. If a user has a GIT account, they can create a project and project page on Drupal.org. These projects should be visually distinct and noted that the user has not been peer-reviewed by the community.
    Namespace issues may need to be worked out here.
  3. To create their first release, a user must go through a peer-review process (basically equivalent to the current CVS application process). Each release and project after that will be considered approved.
  4. Existing users with CVS access are grandfathered in.
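
As a rough illustration only, the four points above could be written down as a small permission check. This is a sketch in Python, not drupal.org code: the `User` class, its field names, and both function names are hypothetical, and it assumes that approval is tracked per person and that grandfathered CVS users need no further step.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical stand-in for a drupal.org account."""
    name: str
    agreed_to_terms: bool = False        # point 1: accepted the Git terms
    passed_release_review: bool = False  # point 3: first release was peer-reviewed
    had_cvs_access: bool = False         # point 4: grandfathered in


def can_create_project(user: User) -> bool:
    # Point 2: a Git account is enough to create a project.
    return user.agreed_to_terms or user.had_cvs_access


def can_create_release(user: User) -> bool:
    # Points 3 and 4: one review per person, or existing CVS access.
    return can_create_project(user) and (
        user.passed_release_review or user.had_cvs_access
    )


newcomer = User("newcomer", agreed_to_terms=True)
veteran = User("veteran", had_cvs_access=True)

print(can_create_project(newcomer), can_create_release(newcomer))
print(can_create_project(veteran), can_create_release(veteran))
```

Because the check is made per person here, it says nothing about which project a release belongs to; a per-project rule would need a different shape.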

So, I just thought of something. Here is a one-off case: Someone without release approval creates a new project, then someone with release approval gets commit access to that project. What happens there? Do we not allow that situation to happen? Do we automatically allow the release to happen? Etc.

sreynen’s picture

Do we require that every release be reviewed? Every project? The first few projects/release? Or just the first release? We had decided on the last one...

The last one is "the first release," but does that mean the first release for the person or the first release for the project? Allowing releases at the project level rather than the person level would take care of the scenario zzolo mentioned, as well as make the review process seem a little less personal. On the other hand, it might be a lot of unnecessary work as it seems safe to assume anyone who passed the first review will only be creating better quality code after that.

If the reviews stay per-person, that scenario seems basically the equivalent of the co-maintainer CVS access option we have now. I think that part currently works pretty well, because co-maintainers are giving more thorough code reviews than someone unfamiliar with the project normally would. But per-person reviews would complicate the namespace issue, as the namespace is for the project, not the person.

brianV’s picture

Is there any merit to a Debian-style tiering of repositories, where tiers represent the amount of 'trust' a module should have?

  • Main: Highest tier of peer reviewed and security conscious modules - probably best saved for the most 'trusted' and widely used modules (CCK, Views, etc.)
  • Extra: The second tier is composed of modules that have at least one security audited release and are actively maintained (commits in the last X months?)
  • Universe: Modules that have been security audited at one point, but are no longer maintained.
  • Multiverse: The wild west. Multiverse module code is not hosted on drupal.org, and multiverse modules can not claim a namespace (ie, http://drupal.org/project/multiverse/112312). Modules in the multiverse may be unmaintained, or have security risks. Anyone can create multiverse modules, but security concerns and namespace conflicts are the responsibility of the site developers to deal with. Multiverse module pages have a big warning that warns potential users that multiverse modules are to be used 'at the user's own risk'...

This is just off the top of my head, and could use some refining, but this (to me at least) provides a balance between keeping the modules on d.org safe and non-conflicting from a namespace perspective while allowing any developer to self-host modules and register them in the Multiverse, making them findable, at least, by the Drupal community.

To move a module out of multiverse, the module maintainer would need to apply for a security review, and upon passing, they get permission to move their module onto d.org.
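
To make the tiers a bit more concrete, here is a minimal Python sketch of how a module might be sorted into them. Everything in it is an assumption for illustration: the `Module` fields, the `widely_trusted` flag (how "Main" is decided is left open above), and the 180-day default standing in for the unspecified "X months".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Module:
    """Hypothetical record describing a contributed module."""
    name: str
    hosted_on_drupal_org: bool
    security_audited: bool
    last_commit: Optional[date]
    widely_trusted: bool = False  # assumed flag for the "Main" tier


def tier(module: Module, today: date, max_idle_days: int = 180) -> str:
    # Unaudited or externally hosted code stays in the multiverse.
    if not module.hosted_on_drupal_org or not module.security_audited:
        return "Multiverse"
    # "Actively maintained" is approximated by a recent commit.
    active = (
        module.last_commit is not None
        and today - module.last_commit <= timedelta(days=max_idle_days)
    )
    if not active:
        return "Universe"
    return "Main" if module.widely_trusted else "Extra"


today = date(2010, 11, 3)
views = Module("views", True, True, date(2010, 10, 20), widely_trusted=True)
stale = Module("old_module", True, True, date(2008, 1, 1))
print(tier(views, today), tier(stale, today))
```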

mcfilms’s picture

In light of the fact that, as Michelle mentioned, there is currently ONE person reviewing modules, it seems to me that code review would have to be based on the individual. So even if they had three modules up for review, once granted "Full-Git" or "Anointed" status, ALL their modules would become available to the general public.

sun’s picture

When changing the overall process and system to have unofficial/unapproved user projects somewhere without releases, but which can be actively used and improved by others, I naturally expect an increase of application reviewers.

Fact is, the most annoying part of application reviews is that you're dealing with ugly tar/zip archives, you need to redo the entire code review after requested changes have been incorporated (as there is no diff), no one ever used the code aside from the applicant, among many other circumstances that make the current application review process hard and tough.

toomanypets’s picture

1. Getting a git account will be easy. You fill out an online form on drupal.org that says that you understand community values and legal issues. You get an account.

Good. This eliminates discouragement associated with the current application process. I am concerned that the discouraging nature of the current process has significantly harmed our organization far more than we suspect.

2. If you have a git account, you can create a project and project page on Drupal.org.

Good. Someone who wants to give back to the community, regardless of motivation or quality, will feel that they are making a difference, and that they are now part of the community instead of an outsider.

3. To create a release you must go through a process basically equivalent to the current CVS application process.

Good. But consider an additional requirement. Perhaps to create a release you must also be a member of the Drupal Association.

Pros:

  • Cash to support Association activities
  • Perhaps some percentage of this cash could be used as a stipend for a pool of reviewers. I spent some time this morning reviewing "won't fix" applications, and the workload for reviewers is absurd. If I were a reviewer, I'd probably be a lot less close to burnout with some compensation.
  • A membership (and the associated spend) is probably a fairly strong indication of commitment to the community and the project. I wonder if this would result in fewer duplicate and stale/abandoned projects?

Cons:

  • This is yet another barrier to participation. However, the barrier is relatively small (approximately $30) and potentially outweighed by the "Pros" listed above. This requires some serious thought, and perhaps a survey to truly understand the impact on participation.
  • Administrative overhead to ensure memberships are active at time of review, though I'm sure there's a clever way to automate this. The membership form could be presented to the user if they aren't currently a member or if their membership had expired.

From the drupal.org Getting Involved page:

It’s really the Drupal community and not so much the software that makes the Drupal project what it is. So fostering the Drupal community is actually more important than just managing the code base.
- Dries Buytaert

zzolo’s picture

Hi @toomanypets. I strongly disagree with monetary requirements. $30 is a huge barrier for a lot of people in this world and has no real benefit, IMO. This would really disrupt our culture and this community.

mcfilms’s picture

I can hear it now: "I've developed this great module for a project. I would like to make it easy to download for the community. It's my way of giving back. Tons of people will find it useful. Oh wait. I have to PAY to contribute it?"

brianV’s picture

Yep, making people pay in order to contribute code to the community would be a no-go for most contributors.

toomanypets’s picture

OK, bad idea, but here's why I suggested it. I went through the application process once, and it didn't leave a great taste in my mouth -- perhaps I'm too sensitive.

Have you ever heard the expression "It's not what you say, but how you say it?" By suggesting the membership requirement I was hoping to find a way to either reduce work load for those reviewing applications (so they have more time/energy to focus on "how they say it"), or provide some degree of compensation so that each potential contributor is treated like a paying customer.

For those of you who review applications: please do not misunderstand. I am very appreciative of your efforts, and I think you do a great job of filtering out the cruft and mentoring contributors. Frankly, after reviewing some of the "won't fix" applications, I am surprised there aren't comments on these applications such as "You've got to be kidding me" or "You actually want us to review this?" Even if I were technically qualified to do what you do, there is no way I could do what you do! Thank you.

Having spent the majority of my career in customer-facing roles, I've seen what happens when customer support professionals get overloaded. The reported issues still get resolved, but the interaction with the customer (how you say it) suffers.

Thanks for listening.

zzolo’s picture

Your experiences are invaluable, @toomanypets. This is the exact reason that we are proposing this solution. It allows new contributors to mature their code and get more feedback before it is up for review. That should hopefully lessen the load on reviewers, both by improving code quality and by reducing application cruft.

An important movement at the moment is to get more reviewers. I have been trying to head this up with the little time I have and got a number of people going at DrupalCon Copenhagen. But it's hard to keep up the momentum. Nonetheless, I made an in-depth handbook page on how to review and how to be nice and helpful. And I have put in a request for a new group on g.d.o for code reviewers. (http://groups.drupal.org/code-review)

I think your reasons and concerns are solid, but I feel, and I assume lots of others feel the same, that requiring a monetary donation to put code on d.o is very much not a good or viable solution for the community.

sun’s picture

I'm still worried about the clobbering of "official" drupal.org project namespaces and the resulting consequences of that.

So far, the only suggestion to resolve this has been to introduce semi-official projects that automatically get the username as prefix.

Technically, that would result in almost the same as we already have with the current CVS sandboxes (which only a small fraction of developers know about and use). However, with the difference that people - instead of having one dumping ground for everything - could register individual, atomic projects having a username prefix, which may or may not be turned into non-prefixed projects at some point.

But does that actually work for real world scenarios?

If you look into my sandbox, then you can see a couple of things that are under "initial" development currently. Why are those in my sandbox and not registered as regular projects?

  1. First and foremost: Finding a correct and proper module name is one of the biggest challenges at all.

    It's a one-time decision that has a long-term effect and cannot be changed easily afterwards. Changing an existing name is a huge project on its own: it requires writing a complex module upgrade path, leaves users behind with old versions of a replaced project, and screws up usage statistics, handbooks, and anything else that has ever been written or stated about a name/project.

  2. While the code could be ready for mass-consumption already, I don't plan to maintain it at this time, because it is "pre-alpha" and under heavy development; things will undergo major changes without notice, and, rapidly.

    Specifically, filing bug/task/feature issues against the current code would be nonsense and a total waste of time for everyone involved.

  3. My fellow Drupal friends and co-maintainers can actively contribute without any restrictions.

    There's no CVS access or any other bureaucracy involved with sandboxes. Everyone having a CVS account can commit to any sandbox. But of course, only my friends, who actually know about those developments and who discuss the code almost daily, have a "gentlemen's agreement" to basically commit freely while letting each other know of current developments and envisioned changes.

  4. It's the pro-active evolution and shaping progress of the Drupal community that matters, not the publishing for general mass-consumption.

    Too specific one-shot modules, only developed for a certain use-case, heavily harm the Drupal community, as they naturally lead to duplication, and in turn, to big confusion for everyone in the search for a module solving a use-case. By quickly drafting, reshaping, rewriting, discussing, and revamping stuff in a sandbox with others, the potential use-cases as well as general design questions can be fleshed out, compared with other projects, and be redone, as required.

If that could be done in a slightly more formalized fashion in the future, fine. But effectively, the only difference would be that every item in my sandbox would become a new project on drupal.org, prefixed with "sun_".

For example, the "markup_test" module would become "sun_markup_test", and it would get a corresponding project page. Normally, that would and should also mean that the module also has to use the name "sun_markup_test" instead of "markup_test". But regardless of that, what's a bit weird is that neither "sun_markup_test" nor "markup_test" will ever end up for public mass-consumption as is, because the module will end up using a different name, which we figured out during development. It would also have to change, in case an official "markup_test" project would have been registered by someone else in the meantime. While that may not be true for all new project/module ideas, we are often recommending a name change for new contributions in the current CVS application reviews.
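
The prefixing scenario described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sandbox_name` and `promote` are hypothetical helpers, and the rule that promotion simply strips the prefix is one possible reading, not an agreed design.

```python
def sandbox_name(username: str, short_name: str) -> str:
    """Prefix a sandbox project's short name with its owner's username."""
    return f"{username}_{short_name}"


def promote(sandbox: str, username: str, official_names: set) -> str:
    """Strip the prefix on promotion, refusing names that are already taken."""
    prefix = f"{username}_"
    if not sandbox.startswith(prefix):
        raise ValueError(f"{sandbox!r} is not one of {username}'s sandbox projects")
    candidate = sandbox[len(prefix):]
    if candidate in official_names:
        # Someone registered the unprefixed name in the meantime.
        raise ValueError(f"{candidate!r} is already an official project")
    return candidate


name = sandbox_name("sun", "markup_test")
print(name)
print(promote(name, "sun", official_names=set()))
```

If `official_names` already contains `markup_test`, `promote` raises, which is the forced-rename case described above.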

In the end, the current incarnation of "markup_test" will have to vanish from every place it currently exists in. Meaning: If there is any project page and any other stuff for it, then that has to be deleted, too. As of now, no posts but spam posts are actually deleted from drupal.org. Everything else is normally kept forever, but that makes no sense for the outlined scenario.

rfay’s picture

IMO the namespace issue is completely valid and worthy of consideration, but is not actually tied to this issue. Or should I say, we can't actually solve it in this issue. If we're actually going to take it on, I think we should open another issue. But it's a generic how-we-do-business-in-drupal issue. Nothing would fundamentally be different from what it is today.

sun’s picture

err? Of course it would be different: Everyone and the world can clobber drupal.org with half-baked project ideas, perhaps not even containing any code; searching for projects leads to gazillion results, and whatnot.

If we want to change the process, then we need to take the consequences of our actions into account. The consequences of simply opening up project registrations to everyone and only limiting releases, which is what I'm trying to communicate here, are unacceptable and their impact will be much more harmful than beneficial for the Drupal community and project at large.

If we leave out the free-for-all project registration and basically keep the current sandbox behavior, but merely allow more users to have a sandbox (which is what I originally proposed in my slightly altered process), then all of those consequences around clobbering drupal.org projects wouldn't exist. As of now, that's the only simple resolution I can see, which doesn't lead to negative consequences.

Durrok’s picture

Sun - Would splitting the development and release sections of the module perhaps provide a decent solution? Only modules that have been approved would show up on actual project pages and you have to have your first module of every project "approved" before it can be turned into a project page. Otherwise it is available only from git or maybe a "temp" project page with some random name (unix timestamp of first upload salted with the user's uid or something like that). I personally prefer the temp project page as it might be very hard to get people to test your module without some type of advertisement.
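
One way the "random name" idea might look, purely as a sketch: hash the first-upload timestamp together with the uid and keep a short slice of the digest. The function name, the `temp_` prefix, the use of SHA-1, and the ten-character length are all assumptions made for this example.

```python
import hashlib


def temp_project_name(first_upload_ts: int, uid: int) -> str:
    """Derive a throwaway project name from the upload time and the user's uid."""
    digest = hashlib.sha1(f"{first_upload_ts}:{uid}".encode("utf-8")).hexdigest()
    return f"temp_{digest[:10]}"


print(temp_project_name(1288771200, 12345))
```

The name is deterministic for a given timestamp and uid, so it can be recomputed later, but it carries no meaning for someone browsing the site.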

To help tackle the "searching for a module gives me 100s of development modules" we could not index these temp pages in the drupal search engine, index them but let you filter them out easily (probably should be filtered as the default), and/or put up warnings on these temp project pages ("WARNING: This module has not been peer reviewed and may contain security vulnerabilities or may conflict with other modules. USE AT YOUR OWN RISK!")
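
A minimal sketch of the "filtered by default, with a warning" option, in Python. The `Project` class, the `search` function, and the `include_unreviewed` parameter are hypothetical and do not describe drupal.org's actual search.

```python
from dataclasses import dataclass

WARNING = (
    "WARNING: This module has not been peer reviewed and may contain security "
    "vulnerabilities or may conflict with other modules. USE AT YOUR OWN RISK!"
)


@dataclass
class Project:
    name: str
    reviewed: bool


def search(projects, term, include_unreviewed=False):
    """Return (name, warning) pairs; unreviewed projects are hidden by default."""
    results = []
    for project in projects:
        if term not in project.name:
            continue
        if not project.reviewed and not include_unreviewed:
            continue
        results.append((project.name, None if project.reviewed else WARNING))
    return results


catalog = [Project("views", True), Project("views_experiment", False)]
print(search(catalog, "views"))
print(search(catalog, "views", include_unreviewed=True))
```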

sreynen’s picture

So far, the only suggestion to resolve this has been to introduce semi-official projects that automatically get the username as prefix

@zzolo made another suggestion in #96 of the previous thread (quoted in #1) to add a second namespace for sandbox projects. It seems like everyone's agreed that sandbox projects should be in some way prevented from using the approved project namespace, so I think @rfay is right that the discussion of how exactly that happens should be handled in a separate issue, unless there's some question of whether it will be possible at all.

mikey_p’s picture

The only problem with the suggestion from #23 is that Drupal fundamentally and technically only has a single namespace. The only way to namespace items is to use a different project name, otherwise the software itself won't have any way to determine whether 'foo' module came from the official namespace or the sandbox. This could lead to lots of confusion.
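
To illustrate the single-namespace point: Drupal hook implementations are plain functions named after the module (for example `foo_menu` for module `foo`), so two modules called `foo` cannot be told apart once installed. The Python sketch below models that; the `source` labels and the `find_collisions` helper are hypothetical and exist only for this example.

```python
def hook_function(module: str, hook: str) -> str:
    """Drupal implements hooks as functions named <module>_<hook>."""
    return f"{module}_{hook}"


def find_collisions(installed):
    """installed: list of (source, module) pairs; the source label is illustrative."""
    seen = {}
    collisions = []
    for source, module in installed:
        if module in seen and seen[module] != source:
            collisions.append((module, seen[module], source))
        seen.setdefault(module, source)
    return collisions


installed = [("official", "foo"), ("sandbox", "foo"), ("official", "views")]
print(hook_function("foo", "menu"))
print(find_collisions(installed))
```

Both copies of `foo` would define the same function names, which is why a separate sandbox namespace would still need distinct project names.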

sun’s picture

After talking to others, I (we) honestly don't understand why we are discussing project-pages-for-everyone at all.

  1. What's the effin' point of a project that contains no releases?
  2. Why is having a project page even considered to be a part of "barrier of entry" at all?
  3. What exactly is not possible with simple sandboxes?

My answer to all of those questions is: None / Nada / Nothing.

Did you care to have a look in my sandbox? Did you care to have a look into all of the other existing sandboxes? How do you come to think that anyone would remotely consider to look at any of those? How do you come to think that anyone would consider to use a module from an author that didn't go through any code or security review? Solving your use-case at the potential cost and very high risk of critical security issues?

Of course, the following list of projects exists for other reasons:

http://drupal.org/project/annotation
http://drupal.org/project/recorder
http://drupal.org/project/trip_currency
http://drupal.org/project/xssfilter
http://drupal.org/project/members
http://drupal.org/project/viewcount
http://drupal.org/project/form_mail

But you can expect a giant list of such projects with the proposed process. Now, what's the point? Do those help anyone or improve anything?

The exact opposite: All those projects do is confuse everyone. Most importantly, they confuse innocent end-users.

The consequence?

Drupal sucks! 18,000+ extensions, but ALL OF 'EM are empty/unusable! WTF?!?! Going back to #wordpress!

mikey_p’s picture

The benefits of having project pages are getting an issue queue for each project and a place for feedback for the author as well as increased visibility for sandbox items. This would be solved by having default project search not include sandbox projects, but having a separate facet or search that searches all, or only sandboxes. Suppose there is someone else who is working on something similar to sun's markup_test module. How would anyone find it? Our current sandboxes are very difficult to search, and probably won't get much better if it only exists as a repo, without any record on d.o.

sun’s picture

The benefits of having project pages are getting an issue queue for each project and a place for feedback for the author as well as increased visibility for sandbox items.

So it's all about having an issue queue for your code in the repo? Easy: sandbox_issue.module (à la project_issue)

Suppose there is someone else who is working on something similar to sun's markup_test module. How would anyone find it? Our current sandboxes are very difficult to search, and probably won't get much better if it only exists as a repo, without any record on d.o.

Sounds like promoting sandboxes to first-class entities. It's very beneficial to hear these actual use-cases and ideas that are currently hidden behind the "projects for everyone" proposal.

--
Overall, I have the impression that there has been some in-person discussion at some point/event, but we settled way too fast on the idea that "everyone should get a project", without specifying what exactly "project" means or could mean, and whether it's the same "project" we commonly mean with "project", and if it is, what the positive and negative consequences of the proposed simplicity of allowing more "projects" are.

zzolo’s picture

Hi @sun. Again your points are very valid, but you may be getting ahead of the discussion and assuming implementation details. This discussion is continuing what a group of us discussed and decided together on in Copenhagen. Nothing is moving forward yet, and this thread is to put the discussion out in the community. This is the exact reason why I tried to start with stating goals first, before worrying about implementation.

The main reason to even consider all of this is because getting a CVS account right now sucks a LOT, and we need to do all we can to fix it, and the migration to Git offers a good opportunity to make changes. Drupal is growing, exponentially, and that means people are writing more code and more people want to contribute to it. We can't avoid it. It's really awesome, but we can't assume our current infrastructure and policies will be able to handle the growth, so we have to think about changing and improving. We have to scale with it as a community, which means putting mechanisms in place to allow people to foster their code, as well as have more and more mentors to ensure that good code is what gets put out into the community.

Yes, the sandboxes are really awesome, but no one knows about them and there is no structured place to have actual discussions about those projects. Having a full d.o project allows people to have discussions, fix bugs, etc. in a more obvious, public space. Code without a community (i.e. at least a webpage) is kind of a waste of time, IMO, no matter how well it's written (I guess, maybe if you comment the hell out of it).

Also, no one has ever mentioned that any of these "unreviewed" projects would show up in searches (without specific filters), or get the full project treatment outside an issue queue. I always thought it was a safe assumption to keep them out of the "public" eye. I've also mentioned a couple of times that it is very important to have distinct visual indicators of projects that are "unreviewed". The module soup is a very real concern and I don't think anyone here wants it to grow without some sort of peer-review.

GitHub is a real example, and competition to d.o. There are lots of Drupal modules on GitHub, and none of them are even remotely supported or peer-reviewed (further than regular open source), but it's a real place for people to put and get Drupal modules. It's pretty safe to assume that it will just grow if we continue to allow our code contribution process to be so negative. On that note, we should also include the following as one of the goals; at least, I believe this to be one of the community's goals.

  • Attempt to keep Drupal code in one place (drupal.org)

@sun, it would be nice to know who was in that discussion, so that we can get some idea of how many people feel that way. It would be nice to know if you agree with the goals I stated above, and if so, then let's work towards fleshing this out, focusing on achieving those goals, and talk about how we can make this happen, instead of possible details that have not even been fully decided on yet.

mcfilms’s picture

Yes, the sandboxes are really awesome, but no one knows about them...

I have been using Drupal for a couple of years and, although I have an awareness that there is such a thing, I do not know where these sandboxes are or how to search them.

The reason I became interested in this topic was that I met developers who showed me modules that I found really useful. They did not have a CVS account (and still didn't a month later). Their modules were not available on some sort of "sandbox" page. Two were available as zip files in an issue queue. One could be e-mailed to me. If this is how the modules are getting distributed I think we can all agree that something is broken.

I came across a quote on a Joomla! site that made me raise my eyebrows. I don't agree with a lot of what this developer said. And his point was in a different context than what is discussed here. But the fact is that this individual felt this way. It made me wonder what made him think this:

Please, let's not mistake a monolithic, arrogant, restrictive development model (Drupal) for a flexible, open-minded, extensible architecture (Joomla!, Nooku et al).

(The full quote is at http://www.alltogetherasawhole.org/profiles/blogs/make-a-joomla-drupal-m... )

marvil07’s picture

A lot of people here talk about the sandboxes; here is the issue about it: #713102: Host sandbox Git repositories for any user who wants one

Please also note that the versioncontrol_account_status module (inside the versioncontrol project) handles the current (old) process of VCS account approval.

I really need to take a closer look at this issue, but I just wanted to subscribe and mention those things :-p

mlncn’s picture

A proposal based on all discussed above:

  1. Getting a git account is as easy as having a Drupal.org account.
  2. We need something equivalent to a sandbox that people are taught to use, that gives access to issue queues that may be tied to a person namespace instead of a project namespace but is still fairly analogous to having a 'real' project. (First to approval getting the namespace could cause some heartbreak, but some way of automating a change of namespace could reduce the injury added to the insult. In the Definitive Guide to Drupal 7, i recommend people give custom modules highly unique names without underscores so that changing the project name is easy with find/replace should they choose to contribute it. For sandbox projects, it's probably easier to let people use the namespace they ultimately hope to get, though, and we could even try having the namespace given to sandboxed projects provisionally reserved.)
  3. Every new project should have to go through the review process to be created. Established developers should experience a taste of what new developers go through (they may be treated differently, we can't control that, but at least they see the process).
  4. Practically, we cannot hold up a security release for another review, so only the first release for each project should have a review.

I agree that step 2 is not going to be that easy to figure out, but i also agree that it has to be figured out to conclude this longer discussion. Edit: Asked about issue queues with the personal git sandboxes.

A side effect of this setup: Maintainers should be able to add people to be co-maintainers with commit access without anyone else doing a review. This is the setup we currently have, really: people who know how broken the CVS system is get added via a co-maintainership for someone else's module.

webchick’s picture

Issue tags: +git phase 2

Just so folks are aware, the compromise we reached at Drupalcon CPH wasn't intended to be the Ultimate Awesome solution to this, but something that could get us able to launch the Git migration sooner with the infrastructure we have, and could be fleshed out more later with #713102: Host sandbox Git repositories for any user who wants one and some other "git phase 3" (aka, post-launch) components.

If people want to blast this discussion wide open again and start from ground zero once more, realize that this will push the Git migration off for probably another 2-3 months (which means after Drupalcon Chicago), and incur another several tens of thousands of dollars that we don't currently have. Unless we get a huge, miraculous influx of free, dedicated help or sponsorship from somewhere.

Anyway, I'm tagging this as something for the Git team to revisit and chime in on.

mikey_p’s picture

I hope this doesn't hold up the migration, and I'm not sure why it would. The worst case scenario is that we make the switch but keep manual approval process for a little while until this gets resolved, but at least all our current contributors could be using git for their projects, and for core. (I can't stop dreaming about that day)

webchick’s picture

The manual approval process is provided by cvslog.module, which specifically did not budget for porting, since there's a 200-reply issue (hilariously, now marked 'fixed') about how harmful this process is to the contributor community. To help address that problem, along with the absolute insanity of passing tarballs back and forth in the current application approval process, and to make the VCAPI code more streamlined and generally sane, we had built consensus 2 months ago at Drupalcon CPH to make the approval process simply around granting "create project releases" permissions, for a whole variety of reasons.

I'll let Sam chime in here in case I'm blowing things out of proportion, but I don't think so.

zzolo’s picture

Just to defend my action, I marked that other issue as "fixed" because the few action items on there were already addressed (like rewording), and the other items were either general topics that could not be taskable in such a broad discussion, or directly concerning this issue. And I did state that I was not fully comfortable marking it fixed. There. :)

I do think it's important to have this issue up for a moment on d.o to have others chime in. Drupal Core Developers Summit is hardly consensus, especially as a makeshift discussion while most people were coding. Nonetheless, I do agree with this and think we should move forward with implementation.

sun’s picture

It's nice and all that there's been some consensus 2 months ago, but obviously that discussion and consensus didn't take the full range of consequences into account. Sorry, it doesn't help at all to keep on repeating the story.

As mentioned earlier already, an intermediate step to keep on making progress would be to change

make the approval process simply around granting "create project releases" permissions

into

make the approval process simply around granting "create project content" permissions

webchick’s picture

That's not an intermediate step though, because in order to do that, we need to have sandboxes, which were not slated to happen until after initial launch. Hence the pushing timeline off 2-3 months thing.

sdboyer’s picture

@sun:

It's nice and all that there's been some consensus 2 months ago, but obviously that discussion and consensus didn't take the full range of consequences into account.

Please, the dismissive attitude isn't helpful. We (or at least I) did consider the namespace pollution problem, and while it's a concern, it's not the only one. Slighting the conclusions we did come to because we didn't adequately address your big concern doesn't help. We were trying to use the git migration as an opportunity to address some of the concerns raised in #703116: Our CVS account application requirements are obtuse and discourage contributions, and the solution we came up with hit pretty much all the bases.

Now that said, we probably could have given more consideration to the namespace issue, and there certainly is cause for concern. And I think a solution where people get instant access to sandboxes, but require approval to create 'full' projects, is a good one, even for the long term. But we need to be clear - as it pertains to the git migration, the goal of the discussion we're having here is NOT about the ultimate form we want this process to take. It's about what we can reasonably accomplish under the banner of the git migration. The proposal you made, which you re-summarized in #36 as

make the approval process simply around granting "create project content" permissions

and involves introducing sandboxes, is not an intermediate proposal - it's something much more like a final product, and one that would introduce a solid chunk of additional work for us to do pre-launch. I think webchick might be overstating slightly in #32 - it's probably only two months' extra work, at the outside. We'd need to set up an entirely separate system for managing auth, do more than we're planning on right now with drupal <-> vcs username mapping, generate additional UIs for managing the separate project types, set up a separate class of repo-triggering jobs, and other things that aren't coming to the top of my head. The reality is that the migration's launch would more than likely be delayed until after DC Chicago.

Really, our choice is between sticking with the same profoundly broken process we have now, or trying to improve it. If we try to improve it, we can go with the proposal we came up with in CPH, which opens the process up enormously - and then we can prioritize introducing something like the sandbox-based system you've described later. Or we can introduce sandboxes pre-launch, and delay it for a while longer. I'm more inclined towards the former, as I'd really like to avoid letting our barriers to contributions languish on any longer than they have to.

mikey_p’s picture

For my own and others' sake, could someone explain why restricting the 'create project content' permission would require sandboxes?

What I was picturing as the worst case scenario: only users that currently have CVS accounts would be able to create new projects, which would in turn create a repo for each project. To get that permission, users would have to apply, submit code, etc., roughly the same as now. I don't see how this would require sandboxes. Maybe I'm misunderstanding the definition of sandboxes? (we should probably clarify that as well)

webchick’s picture

I guess it would be more correct to state that if we move the approval process to the "Create project content" permission, "someone" is going to need to port the demonstrably-detrimental-to-our-community CVS application process that currently lives in cvslog.module to a new module that integrates with Version Control API. And there is no one on the Git migration team that I'm aware of with either the availability or the desire to do that, currently.

webchick’s picture

Oh. And in terms of definition of sandboxes, at least when I'm talking about them, I mean http://drupalcode.org/viewvc/drupal/contributions/sandbox/ in Git. A private place where you can commit whatever unholy things you want, without hapless end users coming across them, but where application reviewers can review with actual diffs and such.

Also, some background for the rationale of the decision.

One of the big reasons for the application process by its advocates is to allow for a mentorship period for new developers to teach them best practices before we turn them loose. This is a noble goal, in theory. But it currently falls flat down on its face, because there is nothing about the process that remotely mirrors actual module development. In the real Drupal world, we don't pass around tarballs to one another. We commit stuff to version control and read diffs. In the real Drupal world, we don't have one massive issue to keep track of every last thing that's wrong with something. We use issue queues and patches, and we enforce pretty strict guidelines on kitten-killing. In the real Drupal world, we do everything we can to remove barriers to contribution because we recognize that the 0.05% of us who are giving something back are our community's most precious resource. Yet, everyone who wants to contribute new code needs to go through this arcane ritual that only leaves a bad taste in their mouth about the Drupal community and causes a non-trivial percentage of them to host their code off Drupal.org.

So basically, we are initially setting new developers up with the totally wrong idea about our community, none of the skills they actually need for their day-to-day maintainership, plus coupling the concept of mentorship with people actively blocking a new contributor from participating. The two are not compatible, and performing the kind of acrobatics we do is not only frustrating for contributors, it's a frustrating experience for CVS application reviewers, as well (not being able to read a simple patch or diff drives me absolutely, stark-raving nuts).

Allowing anyone to create projects takes care of all of that, because new developers get an issue queue, a project page, a Git account, and more without us having to develop a bunch of infrastructure around sandboxes. And the only way for end users to get a copy of the code is if they know Git, effectively creating a safe place for experimental code.

We did take the full range of consequences into account, including namespace collisions and the risk that we become a warez host. But on balance, addressing these problems after launch seemed like a fair trade-off for removing this horrific blight on our community that actively and demonstrably drives contributors away.

So, in closing, if we want to keep the status quo, then someone needs to step up to port the application stuff from cvslog to VCAPI.

pwolanin’s picture

Of the choices in front of us and with a clear need to get to git yesterday, I think this makes sense as long as it's not too hard to "recycle" projects that are dead or abusive.

So as I understand it, the moderation would rest on granting, on a per-project basis, the permission to create releases.

I think we need to add some kind of flag for an intermediate state - something like "looking for contributors". This would be added by a moderator after there is an initial push of some code and the moderator gives it a 2 minute sniff test to make sure it's not obvious garbage.

This flag could be used to start showing projects in search results; see: http://drupal.org/node/970190

brianV’s picture

To avoid namespace collisions, perhaps namespace should not be regulated for non-reviewed / application modules, and they get assigned a 'numeric' project machine name:

ie, http://www.drupal.org/project/<project_node_nid>

Then, when a project for a CVS application is approved, the project node has its path updated to http://www.drupal.org/project/<project_machine_name>

At which point, the namespace is considered reserved...

This would essentially allow the full project functionality of a full-fledged project for the review process.

Of course, I am saying this with no concept of what role the project's machine name plays in the backend VCAPI stuff.
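To make the numeric-name idea a bit more concrete, here is a minimal sketch of how the path could switch at approval time. Everything in it is hypothetical: `Project`, `ProjectRegistry`, and the in-memory dictionaries are stand-ins for whatever d.o would actually store, and it says nothing about how VCAPI uses the machine name.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Project:
    nid: int                          # node ID assigned when the project node is created
    short_name: Optional[str] = None  # stays None until a reviewer approves the project

    @property
    def path(self) -> str:
        # Unreviewed projects are addressed by node ID, approved ones by short name.
        return f"project/{self.short_name or self.nid}"


class ProjectRegistry:
    """Hypothetical in-memory stand-in for the project records on d.o."""

    def __init__(self) -> None:
        self.projects: Dict[int, Project] = {}
        self.reserved: Dict[str, int] = {}  # short name -> nid that owns it

    def create(self, nid: int) -> Project:
        # Creating a project consumes no namespace at all.
        project = Project(nid)
        self.projects[nid] = project
        return project

    def approve(self, nid: int, short_name: str) -> Project:
        # The namespace is only reserved here, at approval time.
        owner = self.reserved.get(short_name)
        if owner is not None and owner != nid:
            raise ValueError(f"short name {short_name!r} is already reserved")
        project = self.projects[nid]
        project.short_name = short_name
        self.reserved[short_name] = nid
        return project
```

The point of the sketch is only that two unreviewed projects can both hope for the same short name without colliding; the collision is decided when one of them is approved.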

zzolo’s picture

Hey, on a related note, the new Code Review group got approved:
http://groups.drupal.org/code-review

I still need to spend some time making it look better and providing a better, more focused interface.

sdboyer’s picture

For the sake of my sanity (and since this issue is tagged git phase 2), I would prefer that this discussion be focused ONLY on solutions that are feasible to implement before/as part of the git migration. Hint: pretty much anything that involves adding new project meta-structures - like pseudo-projects, separately namespaced projects, etc. - is NOT feasible.

Long-term discussions are fine & very important - but they've got no place here.

mlncn’s picture

I apologize, it was not clear to me before that namespacing had been considered and rejected for now due to capacity constraints. Full speed ahead with dropping barriers to starting a project! This treats everyone equally and puts our problem in one space: improving the quality of all the projects at d.o/project/*. We'll want policies and tools for recycling project namespace soon after, and we're likely to have plenty of incentive to work on user-namespaced projects that enjoy all the amenities of regular projects, but if that's off the table for now let's see what opening the doors brings.

I think the concern about too many projects in listings is not an issue, as the default listing would not include projects that have no release.

So the only modification to the original post is to specify that only first releases get a review:

3. To create a first release you must go through a process basically equivalent to the current CVS application process.

with the benefit that each piece of the review can be posted as real, separate issues on the project.

Is it more convenient for people with the permission to publish first releases for the issues to remain posted in a special queue, or could this queue be dropped in favor of using tags on issues in each new project's queue?

mikey_p’s picture

@webchick in #40

Just to be clear, I'm not sure what the best path forward here is, but I'm trying to find a way forward that can get us switched to git as quickly as possible, and if that means leaving the namespace issues to be resolved later, I'd like to consider that possibility. With that in mind, quite a bit of the code for the current CVS application process lives in drupalorg module, mainly the submit handlers for creating an issue in the issue queue. All that is left in cvslog is a form that creates an email to send to the CVS applications mailing list.

tizzo’s picture

I joined the Git migration team partly out of my selfish desire to get CVS out of my life forever, so I hate to advocate anything that could slow that down but...

Namespace pollution is already a problem and allowing anyone to create project content is going to cause this to spiral out of control.

Could we possibly do something like the following:

  1. Allow creation of any number of repositories in a user's sandbox (maybe sandbox/[uid]/foo.git)
  2. Have a "mark sandbox for review" button in that user's list of sandboxes (flag module)
  3. Grant 'create project content' to that user upon approval (this could even be a VBO view for administrators)

Reasoning:

The final code to create new repositories isn't finished yet (though I built a working prototype) and the logic for adding sandbox sub directories would be trivial to add.

If an item was flagged, diffs could be reviewed and versioncontrol api would be logging all of the changes just as it does for a module repository without having to do any extra work (vcs_api already handles the change logging and building links to those diffs, etc).

When an application is approved, we grant 'create project content' permission and then they can create projects of any name and create releases as well (we retain the current functionality).

This solution could be implemented with very minimal effort and prevents namespace proliferation. Almost all of the administrative and UI stuff can be handled by views and flag module.

Limitations

We would lose the ability to have issue queues, etc. Frankly though, I don't think we want to encourage users to use sandbox code, and issue queues would encourage that. Git is hard to use; simply copying and pasting a 'git clone' command isn't. If we have project pages we'll have people using projects. More and more end user types are starting to grab OSS apps from GitHub and I think we want to keep sandboxes more obscure (e.g. in a list on a user's /user/[uid]/sandbox page).

Ideally it would be nice to have a way to 'promote' a project and perhaps for each project to undergo that as a review process. This would be easy to add with a custom action but would represent extra work. However that could be added later when we're all happily using Git.
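The three steps proposed above can be sketched as a small state model. This is an assumption-heavy illustration: the `User` class and the function names are invented for the example, and the real thing would be flag module, a view, and a permission grant rather than Python sets. Only the `sandbox/[uid]/foo.git` layout and the 'create project content' permission name come from the proposal.

```python
from dataclasses import dataclass, field
from typing import Set

CREATE_PROJECT = "create project content"


@dataclass
class User:
    uid: int
    permissions: Set[str] = field(default_factory=set)
    sandboxes: Set[str] = field(default_factory=set)  # sandbox repo names
    flagged: Set[str] = field(default_factory=set)    # sandboxes marked for review


def sandbox_path(user: User, name: str) -> str:
    # Step 1: sandboxes live under the user's uid, so names never collide globally.
    return f"sandbox/{user.uid}/{name}.git"


def create_sandbox(user: User, name: str) -> str:
    user.sandboxes.add(name)
    return sandbox_path(user, name)


def mark_for_review(user: User, name: str) -> None:
    # Step 2: the "mark sandbox for review" button.
    if name not in user.sandboxes:
        raise KeyError(f"user {user.uid} has no sandbox named {name!r}")
    user.flagged.add(name)


def approve(user: User, name: str) -> None:
    # Step 3: a reviewer approves a flagged sandbox and grants the permission.
    if name not in user.flagged:
        raise ValueError(f"sandbox {name!r} was not marked for review")
    user.flagged.discard(name)
    user.permissions.add(CREATE_PROJECT)
```

Note that approval here is per user, not per project, which matches the "retain the current functionality" part of the proposal.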

greggles’s picture

subscribing.

I wish I had more to say, but it was quoted in #1.

Summary which is friendly to both the project timeline and the needs of our new contributors: I think it's OK to let anyone create full project nodes as long as we make the "abandoning" process easier and more friendly.

arianek’s picture

Been following this thread for a while (I was at the discussion in CPH), and I still feel like the plan that we hashed out (and there *was* quite a bit of lively debate over all of the issues mentioned, including namespace etc.) is still pretty darned solid. (ie. outlines posted by sdboyer, zzolo, webchick) and that it is reasonably implementable without blowing budget/timeline out of the water.

I think BrianV's comment #43 with the idea of just giving numeric project names until a release is approved honestly isn't the worst idea ever as well. Might even help with making sure projects started by existing "approved" users that stagnate or never get to a full release don't end up hogging up namespace as well.

pwolanin’s picture

I *think* the short project name is needed to create the repo, etc - though perhaps all the repos could have numeric names and the short project names could just be aliases.

That might actually make the process of reclaiming a project short name even easier, since the old repo could live on and even be accessed via its numeric name.

tizzo’s picture

pwolanin: In our proof of concept code the shortname is used to create the repo but that could easily change, nothing's production ready yet and that code will be fairly isolated (in neither project.module nor versioncontrol_git themselves according to sdboyer & dww) so no worries there.

If we want to be able to do aliasing between the user readable repository name and the system name, that is something that we could add to the twisted daemon; we already have code in place to abstract the requested location from the disk location. While that could make it easier to handle namespace takeovers, I don't think we want our solution to rely on doing that regularly. We'll always need to do due process before any takeover (contact d.o folks, contact maintainer, wait, discuss in an issue) and with a community as inclusive and democratic as ours we're never going to make those decisions quickly.

With the setup I suggested in #48, each user has a sandbox namespaced by their uid (e.g. git@git.drupal.org:/sandbox/[uid]/experiment.git). When they get approval (either for their project or for when they get global 'create project content' permission) they would get their official non-uid namespaced repo. Then it would be as simple as git clone git@git.drupal.org:/sandbox/[uid]/experiment.git && cd experiment && git push git@git.drupal.org:/experiment.git master.

We should work on the project takeover workflow to make it suck less, but that's really not a tenable solution to the namespace issues at large.

sun’s picture

Closely related g.d.o discussion: How do we get more people to review code?

pwolanin’s picture

@tizzo - it's totally silly to have sandbox/[uid]. We don't need that - you made a brilliant suggestion, but maybe you don't realize it.

I don't need /sandbox AT ALL, IN ANY PHASE if users who have agreed to the GPL pledge can create projects that have no short name. Then we have as many sandboxes as we need, right away!!!

Name the real repos by the nid, e.g. project/[nid], e.g. git clone git@git.drupal.org:/project/961144.git

Or add some letters and leading 0's in front if a purely numeric repo name offends you - e.g.
git clone git@git.drupal.org:/project/p000961144.git

The first step of moderation would be for an admin to assign a requested short name (i.e. the URL alias, etc) as the initial moderation step described in #42 - someone sniffs the code and deems it to contain some sane PHP. We can then write some symlinks on disk to map the short name to the numeric repo name, etc.

This kind of process means we are largely relieved from worrying about name space consumption while also letting new contributors get started right away, have an issue queue, etc.
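A rough sketch of the symlink mapping, assuming a POSIX filesystem. The helper names are made up, the repos are plain directories rather than real bare repositories, and the nine-digit zero padding is just one reading of the p000961144 example above.

```python
import os


def repo_dir_name(nid: int) -> str:
    # Assumed naming scheme: "p" plus the node ID zero-padded to nine digits.
    return f"p{nid:09d}.git"


def create_repo(root: str, nid: int) -> str:
    """Create the numerically named repo directory (stand-in for a bare repo)."""
    path = os.path.join(root, "project", repo_dir_name(nid))
    os.makedirs(path)
    return path


def assign_short_name(root: str, nid: int, short_name: str) -> str:
    """Moderation step: map an approved short name onto the numeric repo."""
    alias = os.path.join(root, "project", f"{short_name}.git")
    if os.path.lexists(alias):
        raise FileExistsError(f"short name {short_name!r} is already taken")
    # Relative link target, so the whole tree can be moved without breaking aliases.
    os.symlink(repo_dir_name(nid), alias)
    return alias
```

Reclaiming a short name would then amount to removing or repointing one symlink, while the old repo stays reachable under its numeric name.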

webchick’s picture

Hmmmmmmmm!!! That definitely is worth some pondering!

eliza411’s picture

Issue tags: +git sprint 5

Tagging for consideration in git sprint 5

mikey_p’s picture

If we go with node IDs as project and repo identifiers, Project module would most likely need some work as it depends on the uri as the shortname in a number of places such as views argument, etc.

mikey_p’s picture

One thing that has been coming up is the idea of leaving Project nodes and issue queues out for sandbox projects. While I do partially agree with arguments for this approach, I thought one of the major purposes of d.o sandboxes was better visibility and issue integration. If we don't do project nodes and issue queues, what is the point of d.o sandboxes? Why not just tell people to do sandbox hosting on GitHub if this isn't going to tie into any d.o specific infrastructure?

webchick’s picture

Well, a big point is keeping all Drupal code in a central location, and thus keeping our community collaboration in a central location. This not only has profound cultural effects, but also lowers the barrier to contribution. Right now, initiatives are scattered all over the place, some on Launchpad, some on GitHub, some on private git repositories, etc. It makes it extremely challenging for a contributor to participate because they need to have accounts on all these disparate systems, and learn how if you want to clone from *this* repo you do *this* and so on. If everything's on drupal.org, it's easy.

There are technical considerations, as well. Right now, it's possible to checkout the entirety of the contributions repository and grep for, say, hook_node_blargy_blarg, and find out the overall impact of an API change on such a function with all public-facing code, or for the security team to get a list of things affected by a certain new vulnerability. With code scattered to the four winds we lose this ability. I ended up re-writing a module that already existed on GitHub because it didn't come up in my grep of the contributions repository, thus wasting hours of my time, for example.
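As an illustration of that kind of impact check, here is a small sketch that walks a local checkout and reports which projects mention a given string. The directory layout (one top-level directory per project) and the list of file extensions are assumptions for the example, not a description of the actual contributions repository.

```python
import os
from typing import Dict, List, Tuple

# Assumed set of extensions that typically hold Drupal PHP code.
PHP_EXTENSIONS: Tuple[str, ...] = (".module", ".inc", ".install", ".php")


def find_usages(root: str, needle: str,
                extensions: Tuple[str, ...] = PHP_EXTENSIONS) -> Dict[str, List[str]]:
    """Map each top-level project directory to the files that contain `needle`."""
    hits: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if needle not in handle.read():
                    continue
            relative = os.path.relpath(path, root)
            project = relative.split(os.sep)[0]
            hits.setdefault(project, []).append(relative)
    return hits
```

This only works for code that is actually in the checkout, which is the whole argument: a module hosted elsewhere never shows up in the result.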

My vast preference would be for even "experimental" projects to have their own projects/issue queues. Both because it teaches people "the Drupal way," and also because it makes collaboration easier. "Learn one set of tools that works anywhere" is the key to low barrier of entry for contributors.

eliza411’s picture

Assigned: Unassigned » sdboyer

Assigning to Sam to get a decision by the beginning of the second week of the sprint.

pwolanin’s picture

Using the Node ID instead of the short name should be even easier in a lot of places. But agreed that it means some work re-aligning project module.

michelle’s picture

I somehow missed that the issue I had been following forked over here. As much as I loathe subscribing, I want to keep up on this issue considering I'm one of the CVS app approvers. Not too much to say that hasn't already been said except a definite -1 to making someone who already has a project go thru the process again for a new project. Sounds like that was mainly early on in the issue, though, and maybe not of concern anymore.

Michelle

Durrok’s picture

Michelle - I think the idea is to make that process a lot easier and stop people from spawning projects with no supervision or peer review just because they have had one module that went through the process. Right now contributing to Drupal is a case of the haves and the have-nots. If you have not gotten a module approved you are locked out and can't contribute. If you have one approved module you are given free rein to create as many projects as you want. Somewhere in between is a far better solution.

michelle’s picture

@Durrok - The problem with that is you're putting up a barrier against an existing contributor. When I created Author Pane, it was because I pulled duplicated code out of two of my modules and made it a dependency. If I had had to wait through an approval process for AP to be approved, it would have been a real PITA to time everything just right between the three modules.

I think if we trust someone with the rights to create a project, we should continue to trust them. If a person becomes a problem, that can be dealt with on a case by case basis. But putting unneeded barriers in front of people who have already passed the initial screening seems like needless bureaucracy, a drain on already scarce man-power, and a source of frustration to anyone trying to move part of an existing module into another project and time everything correctly.

I think the assumption should be that most people will do the right thing once they've initially shown they are capable and leave it at that. Deal with exceptions; don't put more hoops in front of the 90% that will do just fine.

Michelle

pwolanin’s picture

@Michelle - well, the recent ideas have involved anyone being able to make a new project, but with levels of moderation to get it a short name and then to get to make releases.

Likely there will be some level of permission for people trusted to do every step of this themselves, but honestly, a little peer review would likely be helpful for almost everyone in the community.

webchick’s picture

Calling for more peer review when we have a 6-page backlog of approval requests going back 10+ weeks is nothing I can possibly support.

michelle’s picture

@pwolanin: In theory, peer review is great. In practice, it can mean delays and headaches and people giving up and putting their code elsewhere. I created Author Pane after weeks of frustration trying to get the maintainers whose module I was integrating with to add a smidge of integration code. I finally gave up and made a separate module for it all. If I'd had to try and get someone to then do a review on that module before I could release it, I would have been seriously pissed.

I can understand that core is sacred and every line needs to be scrutinized. But I think we need to loosen our grip on contrib, not tighten it further. We need to put less time into control and more time into making sure the cream rises to the top.

I got access to approve CVS accounts a very long time ago and all we did was give the module a cursory check to make sure it wasn't a complete piece of crap and then welcomed them to the contributor's club. There was never a backlog like there is now. Gatekeeping just doesn't scale. I seriously don't understand why the whole "golden repository" thing is so taboo. What is so terrible about having a set of modules that are stamped as being under the care of the security team and actively maintained and a set that has warnings to use at your own risk? Let the maintainers who really care about making their module the best it can be work to get approved into good standing and let the ones who just want to share what they did in case someone else can use it have their space.

Michelle

pwolanin’s picture

@Michelle - as above - I'm sure we will have a permission set that allows you to totally bypass moderation.

@webchick - I'm not thinking of anything that would lead to that kind of backlog - rather something that can be widely shared. Note that with the new process I hope we will implement (i.e. you can get a project with just a node ID immediately) 95% of that backlog will be gone.

webchick’s picture

Peter, so where are these mythical people who are going to do this mythical "widely shared" approval, and why aren't they helping with this right now, when we so desperately need them?

Until we get our CVS application backlog under control, I will fight to the death any attempt to introduce more bureaucracy into the contribution process. And after that, too. The people who want to give us their code are less than 0.05% of our community. It's completely asinine and idiotic the kind of backflips we make these people perform just to get into the door, so they can help us. The idea of expanding this approval process out further to every time you want to make a project is enough to cause me to want to move to GitHub. :(

Stepping back, here's a summary of stuff we need to figure out:

  1. We need to stop killing new code contributors' spirit before they even get their foot in the door. The agreed-upon course of action in CPH is letting anyone who wants to create a project on d.o, so they can get familiar with our tools before they go for approval, and account approvers have better tools with which to review the code. We still need to put some rudimentary checks in so that we don't become a warez site. That's covered in #720700: Come up with an update hook to mitigate abuse of sandboxes.
  2. We need to deal in some way with the namespace proliferation problem. Peter suggests numeric shortnames vs. human-readable shortnames prior to project approval. We could also go the other way and come up with a procedure for deleting projects after a certain period of time (merlinofchaos has one possible proposal at #855508: Policy Proposal: Reduce bad maintainer reliance on -dev modules).
  3. We need some means of differentiating "real" projects from experimental projects. Peter suggests numeric shortnames vs. textual shortnames. Howard suggests porting over the concept of per-user sandboxes. Michelle suggests "golden contrib".
  4. We also need some sort of process of moving from an experimental to a "real" project, and make the security team only responsible for "real" projects. The proposal the Git team is planning on implementing is around the one-time granting of "create project release" permissions by an approval team, based on existing code in Drupal.org's Git repository. Peter suggests an approval process around each time you want to add a human-readable shortname to a project. Michelle suggests getting a particular project into "golden contrib" requires an approval process.
  5. We need better tools/indicators for finding good modules vs. crappy modules. The initial redesign launch has actually helped with that a lot. Ratings/reviews are also coming down the pipeline. There are folks working on indicators for automated testing coverage too, afaik. Additional ideas (or, preferably, code) around this would help on a whole bunch of fronts, not just this one.
  6. And finally, we want to launch the Git migration by Drupalcon Chicago. That means we should probably not be trying to engineer the ultimate solution to fix every single one of our problems here, but the simplest thing that could possibly work, and that we can all live with. Please bear that in mind when suggesting we veer off course.

pwolanin’s picture

@webchick: I'm 100% on board with you. Clearly I'm not effectively communicating my thoughts in writing, however, since I seem to be provoking an unintended response.

I feel pretty strongly that the numeric project ID route will get us very far, very fast. I'm happy to help code it if someone can point me in the right direction in terms of assisting. I think the recent posts mainly point to the need to have a large and graduated set of permissions so we can adjust roles/permissions dynamically as we get used to the system.

tizzo’s picture

Peter I see where you are coming from, but I think that using project nodes for sandboxes is going to cause problems in the community without a lot more thought and effort.

The project shortname is mainly what is used in the URL (and other places irrelevant to sandboxes) and most new users don't pay attention to that (they're trained not to by ugly urls elsewhere on the web). While using nid's for shortnames solves the technical problem of Drupal.org namespace pollution, it does not solve the problem of decreasing confusion for end users, and it does not prevent namespace collisions in PHP code.

Think about this: many sandboxes are going to be forks of existing projects. In your proposal, drupal.org/project/views is going to be the views. drupal.org/project/12345 is going to be a views (where some user forked views to work on feature X). What will differentiate these? Both have issue queues and project pages. Will new users understand the difference? Will google?

It's better for the community to keep sandboxes in a quarantined area. Any solution that makes a sandbox look like a project is a bug, not a feature.

Furthermore, let's not forget the criteria for getting a project approved. It should pass the coder.module's coding standards tests, be free of gaping security holes and be a working module. We don't need full projects with proper queues for that. By the time issue queues really become relevant, this person should have a proper project! Having the other features on sandboxes is going to encourage end users to use the projects, and I think that's something we want to discourage. Sandboxes are for developers and collaborators, projects are for end users.

In my view, getting the code out of tarballs and into repos tracked by versioncontrol api (which can make views of each repo's commits, etc) is what's needed to start watching sandbox activity and reviewing diffs, and it gets us away from tarballs for the review process.

I think it would be a mistake to underestimate the number of new Drupal users that are savvy enough to `git clone` in a post-github world. Sandboxes that differ from projects only in their lack of release tarballs are bound to confuse people.

[edited to fix an unclosed tag and grammatical errors]

pwolanin’s picture

@tizzo - nowhere do we have PHP name space enforcement.

We can always style projects differently based on their status. And as above, I suggest we use robots.txt and our search integration code to make these show up last or not at all in searches and listings. I really think there will be little confusion if it's done right.

I feel pretty strongly that having a unified system for all this will be much more beneficial than trying to construct a walled-off sandbox area. At that point anyone serious will just go back to github.

sdboyer’s picture

Status: Active » Fixed

I've been reading and re-reading the various points brought up in this issue, and am a bit torn. In particular, the points raised by tizzo in #48 got me thinking, "would it really be THAT difficult to introduce true sandbox functionality? it would solve the namespace problem, and real sandboxes would be _great_." At the same time, pwolanin's suggestion in #54 (which I think was also brianV's suggestion in #43, but nobody seemed to notice it) seems like the simplest possible addition to our original plan from CPH that will take care of our problems.

So, in an attempt to bring some closure to this, let me start by stepping back for a big-picture view of what sandboxes could/should be all about. I think it's important because it highlights the range of issues relevant to the decision we make now - and I'd like to future-proof our decision, to the extent that it's possible & feasible.

In a hazy, rosy future...

Projects and sandboxes happily co-exist in the d.o universe, each as first-class items with distinct purposes. Anyone who's checked the box promising to submit GPL code can add new sandbox repositories, very much like github; in fact, sandboxes could be a place where we work directly with github. We provide buttons allowing projects to be created from sandbox repos, as well as buttons allowing you to pop off your own copy of a project repo into a sandbox repo (again think github: "hardcore forking action!") - and for such popped-off sandbox repos, a set of buttons to spawn new issues against the original project, using the sandbox repo as the base for a new per-issue repo. These sandbox repos have issue queues and can also be included in contributor coding activity statistics. But they can't have releases, they're not included in public-facing project listings on d.o, and only the sandbox owner will ever have push access. In other words, these are true one-person sandboxes - play areas that facilitate interaction around code within the d.o environment, act as training grounds for good project maintainership practices, but stay entirely out of view of drupal's 'consumers'.

I don't really care to articulate a longer-term vision for project creation, as that's something we're going to need to play quite a bit by ear. IMO, the best way we can serve that process is to build a good sandbox system, as doing so will be _the best_ thing we can do to support experimenting with & creating sane project creation workflows later. d.o sandboxes as first-class entities are a gift that keeps on giving.

My goal for phase 2, then, is an approach that slides as naturally as possible into these phase 3 goals, while requiring the minimum amount of work. That was the operant principle during CPH, too. The only thing that's changed is the namespace question. In retrospect I'll admit it could have had more consideration at CPH, as it has the potential to introduce some very hostile dynamics in fights over namespaces, even if we do establish some clear guidelines for it. Nice to avoid if possible.

I'm gonna opt in favor of brianV/pwolanin's basic nid-based approach: it's a nice under-the-hood trick with minimal additional complications to project, takes a minimal number of missteps into sandbox territory, and won't add to review burden. For now, we'll have projects & pseudo-sandboxes smooshed together. When a project gets approved/if the person creating the project has the right perm, it earns a real namespace, but until then it gets only a nid. It'll work well enough initially that we can stand to launch with it, and the workflow will probably be similar enough to what we eventually implement that it won't confuse people too much when we do implement real sandboxes. That'll take a little while, though, because I think that doing it properly is going to mean some non-trivial additions to project. But let's also be clear that doing it this way will, I strongly suspect, require more awkward hacks to project.

As for how we actually grant perms for project creation, not much has changed since CPH. Nevertheless, let's sum that up: anybody can post pseudo-sandbox projects, but such projects must go through an approval process in order to become full projects (which means inclusion in listings on d.o, the ability to generate releases, and occupying a real spot in our global namespace). There'll also be a perm which allows users to bypass that approval process and immediately create full projects (entirely analogous to what we have now).
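To make those rules concrete, here is a minimal Python sketch of the nid-versus-shortname model. It is only an illustration: the `Project` class, the `promote` and `create_project` helpers, and the `bypass_approval` flag are hypothetical names made up for this example, not the actual project.module schema or permission names. The `/project/<nid>` path follows the URL example given earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Project:
    """Hypothetical stand-in for a project node; not the real project.module schema."""
    nid: int
    title: str
    shortname: Optional[str] = None  # only set once the project is a full project

    @property
    def is_full_project(self) -> bool:
        return self.shortname is not None

    @property
    def path(self) -> str:
        # Pseudo-sandboxes are addressed by nid, full projects by shortname.
        return f"/project/{self.shortname or self.nid}"

    def can_release(self) -> bool:
        # Only full projects may generate releases.
        return self.is_full_project


def promote(project: Project, shortname: str, taken: Set[str]) -> None:
    """Give an approved project a real spot in the global namespace."""
    if shortname in taken:
        raise ValueError(f"shortname {shortname!r} is already taken")
    project.shortname = shortname
    taken.add(shortname)


def create_project(nid: int, title: str, shortname: str, taken: Set[str],
                   bypass_approval: bool = False) -> Project:
    """Create a project; only users with the (assumed) bypass perm get a shortname at once."""
    project = Project(nid=nid, title=title)
    if bypass_approval:
        promote(project, shortname, taken)
    return project
```

With this shape, a pseudo-sandbox never occupies a shortname, so an abandoned one costs nothing in the global namespace.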

I _think_ this adequately addresses all the big points. Please feel free to discuss further, but ONLY change the status from fixed if there is an actual glaring problem with the above proposed system which we really need to consider as part of phase 2.

sdboyer’s picture

I missed #71 and #72 while writing #73, but they don't change the direction I'm advocating. tizzo, I do agree that the ultimate model we should pursue for sandboxes is something more akin to what you're describing - but I think the immediate term portion of pwolanin's idea is better because there's simply no way to do sandboxes fully and well (and not walled-off) in the time frame we have, which means going the path of least-damage. The importance of having issue queues for these projects is something we talked over quite a bit in CPH, and ultimately advocated for sandboxes as sorts of "projects with training wheels" - a vision I still find compelling.

mlncn’s picture

For namespacing, should we officially recommend that people include their project node ID in their project until they get official? So it's always myhopedformodulename1120394_magic_ponies(), making renaming a module when it gets its namespace a simple matter of find-and-replace.
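As a rough illustration of that find-and-replace step, here is a small Python sketch. The helper name `rename_prefix` is made up for the example, and it only rewrites identifiers inside one source string; a real rename would presumably also have to cover file names and `.info` entries.

```python
import re


def rename_prefix(source: str, old_prefix: str, new_prefix: str) -> str:
    """Replace old_prefix with new_prefix wherever it starts an identifier."""
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])"      # not in the middle of a longer name
        + re.escape(old_prefix)
        + r"(?=_|\b)"             # followed by "_" or the end of the identifier
    )
    # A function is used as the replacement so new_prefix is inserted literally.
    return pattern.sub(lambda match: new_prefix, source)


php = (
    "function myhopedformodulename1120394_magic_ponies() {\n"
    "  return myhopedformodulename1120394_menu();\n"
    "}\n"
)
renamed = rename_prefix(php, "myhopedformodulename1120394", "myhopedformodulename")
```

The lookbehind and lookahead keep the replacement from touching an unrelated name that merely contains the temporary prefix.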

sdboyer’s picture

I think the easiest procedure could be that we simply don't allow projects to have a shortname until they're approved. Including the nid would be redundant on top of the other strategy, though - plus, people won't know the nid of the project until after it's been created. Chicken or egg, there :)

pwolanin’s picture

@sdboyer I think the suggestion related to PHP name spacing inside the module code.

@Benjamin - I think that's a fine suggestion (something like that could be documented as a best practice), but the likelihood of wide collisions is low enough that such a suggestion is only going to be adhered to by advanced contributors who won't need it anyhow.

mikey_p’s picture

I hate to drag out the issue after being closed, but many of the namespace concerns will be easier to address once #102102: Parse project .info files: present module list and dependency information is complete. When that's done we'll have an actual table with real data from .info files that we can query and expose to allow users to search for occupied namespaces directly on Drupal.org.

If you're wanting to help out with the namespace problem, that issue would be a great place to start with reviewing some existing code.
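To give a feel for what such a lookup could be built on, here is a hedged Python sketch. It is not the code from #102102: `parse_info` and `build_namespace_index` are hypothetical helpers, the parser only handles the simple `key = value` and `key[] = value` lines of a .info file, and the in-memory dict stands in for the real database table.

```python
import os
from collections import defaultdict


def parse_info(text: str) -> dict:
    """Parse simple `key = value` and `key[] = value` lines of a .info file."""
    data = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # blank line or comment
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip().strip('"')
        if key.endswith("[]"):
            data[key[:-2]].append(value)  # array-style entry, e.g. dependencies[]
        else:
            data[key] = value
    return dict(data)


def build_namespace_index(info_files: dict) -> dict:
    """Map module machine name -> parsed .info data plus the owning project.

    info_files maps (project, path to .info file) -> file contents.
    """
    index = {}
    for (project, filename), text in info_files.items():
        module = os.path.splitext(os.path.basename(filename))[0]
        index[module] = {**parse_info(text), "project": project}
    return index


files = {
    ("views", "views.info"): "name = Views\n; comment\ncore = 6.x\ndependencies[] = ctools\n",
    ("views", "modules/views_ui.info"): 'name = "Views UI"\ndependencies[] = views\n',
}
index = build_namespace_index(files)
```

Checking whether a namespace is occupied then reduces to a key lookup such as `"views_ui" in index`.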

sdboyer’s picture

Assigned: sdboyer » Unassigned
Status: Fixed » Active

Ugh. OK, there's been some apparent fuzziness on the whole namespace issue. There are two namespace issues - one is project namespaces, and the other is PHP namespaces. While tied together by convention, there is no necessary relationship between the two. And the only one we should be caring to discuss here at all is project namespaces. So yes @pwolanin in #77, I think I probably did misconstrue #75, but I agree that it'll likely be moot. And @mikey_p in #78, that's a PHP namespace issue, which while it could maybe be touched on at the level of packaging/project namespaces, is (I think) WAY beyond the scope of anything we need to care about for phase 2.

sdboyer’s picture

Assigned: Unassigned » sdboyer
Status: Active » Fixed

Dammit.

tizzo’s picture

Everything we’re considering is a marked improvement and I’m committed to help out with implementation however we decide to do it. That said:

I must have been sitting at the wrong table in CPH because I missed the meeting that everyone is referring to where we identified and agreed upon the vision of what sandboxes are actually for in our community and what problem they’re trying to solve.

In terms of the approval process, I thought we wanted sandboxes for one and all to provide new contributors with version control and to use that for the approval process, after which these contributions become real projects. I’m starting to get the sense that the problem we’re really trying to solve is to make the code review/project approval process less necessary and less time critical. Is the rationale “if people just have full-on projects, who cares if it takes them a long time to get approval”? If so, project based sandboxes aren’t just a solution, they’re the only solution. If that’s the case I’ve been missing the boat and I don’t think there is even a question here.

If not: aside from projects that have been created but not yet approved, is there a real use case for proper issue queues on sandboxes? (Let’s leave per-issue queue sandboxes aside here, they are a separate issue and need their own solution that integrates tightly with whatever issue they relate to (test-bot, etc)).

Re: @pwolanin in #72:

I feel pretty strongly that having a unified system for all this will be much more beneficial than trying to construct a walled-off sandbox area. At that point anyone serious will just go back to github.

Shouldn’t anyone serious about their project get a proper project?

We’re saying that this doesn’t need to be the ultimate solution, but if we use project.module for sandboxes, we’ll need to carry forward and/or migrate all of the pseudo-project features and data (issues, comments, etc) forever or people will yell ‘regression!’. Using project for sandboxes is definitely a commitment; moving to anything else would require replicating project_* features in a new sandbox_* suite, which is unlikely to ever happen. If we start anew it doesn't have to be the final solution; if we extend project we're at least committing to a minimum.

tizzo’s picture

My question really is this: do we want sandboxes to be full-on projects because it's good for sandboxes to be full-on projects or because we want a work around for the proper project review process?

If it's the former, what's the user story? If it's the latter, that seems like sort of a social hack but it might be the easiest solution to what is becoming a big problem.

sun’s picture

At first, I also struggled with this differentiation of goals, tasks, and infrastructure tools. However, thinking through it, I no longer think that there is a need to special-case the two basically different goals (or user stories).

What matters is that current sandboxes are used for arbitrary code. CVS applications also contain arbitrary code. Initially, neither has the goal of being published as an official project, nor should it. However, both could use a canonical resource/URL on d.o in order to facilitate better access to the code. And it seems to turn out that the simplest way of achieving both types of arbitrary code resources and repositories is to turn them into regular project nodes, merely using an internal flag to differentiate between official projects and sandboxes (and/or a missing/existing machine name).

webchick’s picture

The advantages of creating full-on projects for "sandbox" projects are:

1. The same collaboration tools work everywhere. If I am playing around with someone's experimental code and find a problem, I can file an issue and a patch. I don't need to somehow find the author's email address and email them a patch, which is what I currently have to do. (And so, as a result, usually don't.)

2. It also allows multiple people to work on experimental code by simply adding them as project co-maintainers (again, same collaboration tools work everywhere), rather than having to do "pull request" type stuff that our current infrastructure doesn't support, and won't until well into "phase next", if ever.

3. Because of #1, we can effectively provide training wheels for new contributors on how to be a maintainer, before their "formal" account is approved. Git application admins can also use the actual issue queue to post feedback rather than one gigantic monster issue keeping track of every jot and tittle (we call that "killing kittens" in the rest of the community...)

4. It's what GitHub does. GitHub doesn't make a distinction between a repository that's for a scrappy one-off thing you wrote while drunk at a party and, say, jQuery. Either way, you get an issue queue, a landing page, etc. If we want Drupal.org to remain the central collaboration hub for Drupal code, we need to offer at least that too, IMO.

tizzo’s picture

@webchick: I think you converted me.

The one caveat is that github also makes very clear whose repository you're dealing with. When I want to see whether this is the jQuery or if this is a jQuery, I look to see if this repository was forked from someone else, whose repository it is, and sometimes cross reference it with something else (blog post, project's homepage, whatever).

Github in no way tracks canonicality™, it leaves that to other resources and ownership is a key component of a github project. Canonicality™ is a key component of a Drupal project and what community members understand about them.

If we need this at launch, should we create issues like "Determine way to indicate that sandbox projects are untrusted, non-canonical or experimental code"?

Do we need a way to track project hierarchy (merlinofchaos's views begat dereine's views, begat tizzo's views) and is that something that belongs in the presentation of the sandbox project nodes (major feature creep / phase 3)?

We also need to figure out how people find these sandboxes. Ok, they don't turn up in search, how do we find them? Are they listed on a user's page?

Also: The title of this issue is still 'Determine/finalize code review and project approval process'. How do I flag my sandbox for review? Just link to it from the appropriate queue (that could be enough)?

webchick’s picture

Good point about Canonicality™. We do have the project author indicated on the node, so I think we're all right authorship-indicator-wise. But it's a good point that someone outside the community would not necessarily know/understand that the "Views" project by dereine is not the "real" Views module, and the "Views" project by merlinofchaos is. (Though it will help those people a lot that one won't have release nodes associated with it and the other will, and that one will have a URL like /project/views, and the other won't.)

But yes, "Determine way to indicate that sandbox projects are untrusted, non-canonical or experimental code" sounds like a good follow-up issue (please tag "git phase 2" and cross-link here). Could be as simple as force-altering the node title so there's no physical way to make another "Views" module, only "Views [experimental]". But anyway let's discuss over there. :)

Project hierarchy I think is "phase next." We don't have the capability to track that currently (for example, to list Feeds module as a "fork" module from Feed API), so no reason to hold up our release date on not having it for launch, IMO. Worth starting an issue about though (tagged "git phase 3") for later; it'd be nice if the 'canonical' project pages could show a list of all of their forks ala https://code.launchpad.net/pressflow.

And yes, these "experimental" projects would be linked to from various authors' profile pages, just like all the other projects that the person's committed to -- same collaboration tools work anywhere, remember? :)

And while it's true we didn't figure out the exact step-by-step future approval process here as the issue title would otherwise indicate, and we definitely need to, I think it's a separate concern from the technical implementation which is what this issue has been good for nailing down. So how about let's start a separate issue (again, "git phase 2" and cross-linked here) to discuss the mechanics/policies around how this will work in practice now that we've basically reached consensus on the technical implementation.

Cool?

webchick’s picture

Created:

#984734: Meta: Visually distinguish sandbox projects from regular projects when viewing them
#984730: Decide on actual process for promoting a "sandbox" project to a "real" project

I don't know enough about what other hosted repositories do around the concept of canonicality™ to feel comfortable writing up that issue, though.

webchick’s picture

Title: Determine/finalize code review and project approval process » Determine/finalize technical requirements for post-Git migration project approval process

Re-titling this to be more accurate. :)

Status: Fixed » Closed (fixed)
Issue tags: -git phase 2, -git sprint 5

Automatically closed -- issue fixed for 2 weeks with no activity.