Add project checkout version (i.e. git commit identifier) field to various plugins [#1666160]

There are certain plugin types (scan being one example) where re-running the job on a given release does not provide any additional value: as long as the job completes successfully, the results will never vary from one run to the next.

There are also jobs (such as plumber) which depend on the existence of a previous job (scan, in the plumber case).

If the prerequisite job (scan, in the plumber case) were executed independently and stored the project and commit against which it had been run, then when a dependent job is run at some point in the future, we could query the existing results dataset and potentially save a redundant job execution. (In other words, if a scan job was run as part of a plumber call, and then coder was invoked a few days later, we could identify the scan job with the same commit_id as the coder run and add it to the job properties instead of queuing up a second scan job on the identical codebase.)
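The reuse idea above can be sketched as a lookup keyed on (plugin, project, commit_id) that only queues a new scan when no earlier one exists. This is a minimal illustration, not conduit code: `JobResult`, `ResultStore` and `scan_for` are hypothetical names, and an in-memory list stands in for the real results dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class JobResult:
    """A job as it might be recorded on the conduit server (hypothetical shape)."""
    nid: int        # conduit node ID of the job
    plugin: str     # e.g. "scan", "coder", "plumber"
    project: str
    commit_id: str


@dataclass
class ResultStore:
    """Hypothetical in-memory stand-in for the existing results dataset."""
    results: List[JobResult] = field(default_factory=list)
    queued: List[Tuple[str, str, str]] = field(default_factory=list)
    next_nid: int = 1

    def find(self, plugin: str, project: str, commit_id: str) -> Optional[JobResult]:
        # A deterministic plugin such as scan gives the same result for the
        # same codebase, so any earlier run on this commit can be reused.
        for result in self.results:
            if (result.plugin, result.project, result.commit_id) == (plugin, project, commit_id):
                return result
        return None

    def scan_for(self, project: str, commit_id: str) -> int:
        """Return the nid of a scan for this commit, queuing one only if none exists."""
        existing = self.find("scan", project, commit_id)
        if existing is not None:
            return existing.nid
        nid = self.next_nid
        self.next_nid += 1
        self.queued.append(("scan", project, commit_id))
        # Record the new job right away so later requests for the same
        # commit find it instead of queuing a duplicate.
        self.results.append(JobResult(nid, "scan", project, commit_id))
        return nid
```

For example, a scan queued during a plumber call and a later request for a coder run on the same commit would both get the same nid, with only one scan queued. As comment #1 points out, if scans can run with different scopes, the lookup key would also need to include the scope.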

Using the commit_id as the property value also helps decouple the data stored on drupal.org from the data stored on the conduit server. As written today, creating a plumber job requires the calling service to know the node ID of the associated scan job on the conduit server, which means the entire conduit nid tree is duplicated in the calling service's database. The commit_id, however, is an independent data point already known to the calling service (i.e. drupal.org), from which the associated conduit node can be derived during conduit processing ... thus eliminating the need for the calling server to know remote node IDs when building jobs.
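A rough sketch of that decoupling: the calling service sends only the project and commit_id, and conduit fills in the scan node ID itself while processing the job. The `resolve_plumber_job` function, the `scan_nid` key and the `(project, commit_id) -> nid` index are all hypothetical, assumed here for illustration.

```python
from typing import Dict, Tuple

# Hypothetical index kept on the conduit server: (project, commit_id) -> scan nid.
ScanIndex = Dict[Tuple[str, str], int]


def resolve_plumber_job(job: dict, scan_index: ScanIndex) -> dict:
    """Fill in the scan node ID for a plumber job from its commit_id.

    The calling service (e.g. drupal.org) only sends data it already knows,
    the project and the commit_id; the conduit-side nid is derived here.
    """
    key = (job["project"], job["commit_id"])
    if key not in scan_index:
        raise LookupError(f"no scan recorded for {key[0]} at {key[1]}")
    resolved = dict(job)  # leave the caller's job definition untouched
    resolved["scan_nid"] = scan_index[key]
    return resolved
```

With `index = {("views", "abc123"): 42}`, a job of `{"plugin": "plumber", "project": "views", "commit_id": "abc123"}` resolves to one carrying `scan_nid` 42, and drupal.org never has to store conduit node IDs.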

Comments

Comment #1

boombatower commented 2 July 2012 at 05:25

I am really not sure how we want to solve the scan -> plumber coupling. We talked about this in IRC and seemed to be leaning towards scan => commit_identifier, but I am not sure that really makes sense, since we should be able to run multiple scans per commit with different scopes if we want.

Secondly, I am not sure scan should detect the commit it ran on, as we suggested; maybe it should. For drupal.org at least, it makes the most sense to just fire off a group of jobs with a particular commit, since issues should know the repo and latest commit related to them, and branch testing is triggered by a commit operation.

It seems like the vcs array should support checking out by commit identifier (which was in my original list of todos), but I am not sure that jobs like scan should record the value so that scan can be fired with no commit context and other jobs can get the context from scan. That seems cumbersome, and d.o should have all the information it needs to pass the commit ID when creating jobs.

My original issue was that the super simple vcs format I wanted to keep is a single string, which doesn't make it easy to distinguish a commit identifier from a tag/branch other than by string parsing, which is no sure thing. We may have to provide an optional, more verbose format (or possibly force everything to use it) that is split up into separate keys like drush make. The current format is nice and simple, but isn't necessarily obvious since it is conduit-specific. I looked for a generic format for representing a VCS and couldn't find much. If one exists, I would definitely be in favor of using it to solve the problem.
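To illustrate why string parsing is no sure thing, the sketch below guesses a ref's kind from its shape and contrasts that with a verbose format using separate keys, loosely modeled on drush make's `branch` / `tag` / `revision` download keys. Both `guess_ref_kind` and `VcsSource` are hypothetical. The rule that exactly one key must be given is a simplifying assumption of this sketch, not a drush make or conduit rule.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Heuristic: a full git SHA-1 commit identifier is 40 lowercase hex characters.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def guess_ref_kind(ref: str) -> str:
    """Guess whether a ref taken from a single-string format is a commit.

    Short SHAs are missed, and a branch or tag whose name happens to be
    40 hex characters would be misclassified as a commit.
    """
    return "commit" if SHA1_RE.fullmatch(ref) else "branch_or_tag"


@dataclass
class VcsSource:
    """Hypothetical verbose format with separate keys, in the spirit of drush make."""
    url: str
    branch: Optional[str] = None
    tag: Optional[str] = None
    revision: Optional[str] = None  # commit identifier

    def __post_init__(self) -> None:
        # Simplifying assumption for this sketch: exactly one ref key is set.
        given = [k for k in ("branch", "tag", "revision") if getattr(self, k)]
        if len(given) != 1:
            raise ValueError(f"specify exactly one of branch/tag/revision, got {given}")

    @property
    def ref_kind(self) -> str:
        # No guessing needed: the key itself says what the value is.
        if self.revision:
            return "commit"
        return "branch" if self.branch else "tag"
```

With the single string, the short SHA `abc1234` is guessed to be a branch or tag; with separate keys, `VcsSource(url="https://example.com/views.git", revision="abc1234")` is unambiguously a commit.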

Comment #2

jthorson commented 3 July 2012 at 00:59

The different scopes are a good point ... but I think having the commit id field in the results is also valuable to end users after the fact, as it makes it perfectly clear what codebase a particular test was executed against.

sdboyer is doing a bunch of work on the repos right now, and is now passing actual 'vcs' and 'repository' objects to PIFT ... there might be something in there we can use.

Comment #3

boombatower commented 9 July 2012 at 18:30

The properties are visible to the end user, and if we want to make the commit identifier more visible we can look into that, but that does not require any changes to data structures or anything else. Personally, assuming you know the system runs against the same commit, it should be fairly rare that one needs to see the commit identifier, and if you do, it would be available on the properties page, which also scales to as many repositories as we want.

As we talked about, the feature this allows for is running against a branch and just recording the commit it was on at the time. I can see that being useful for others, but not so much for d.o.

Overall, I think we should definitely support the commit identifier in properties. Recording it (for each repo) seems like a useful feature, but perhaps we hold off until someone needs it. If we were to do so, we would need some means of automatically referencing the recorded commit identifier: perhaps assume all jobs in a group run on that commit (which assumes they run serially, or at least after the first), define a special job, etc. That seems like much more than we need, considering d.o can just send the commit identifier it wants. It is definitely something to think about, but I would vote for keeping it simple for now.
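For comparison, the "all jobs in a group run on that commit" option can be sketched as follows, assuming the jobs in a group run serially. `run_group` and the `resolve_branch` callback (which maps a branch name to its current head commit) are hypothetical. If d.o already sends a commit_id, the first job simply uses it and no branch resolution happens.

```python
from typing import Callable, List, Optional


def run_group(jobs: List[dict], resolve_branch: Callable[[str], str]) -> List[dict]:
    """Pin every job in a serially-run group to one commit identifier.

    The first job uses its own commit_id if one was supplied, otherwise it
    resolves its branch to a commit and records it. Every later job is given
    that recorded commit instead of the (possibly moved) branch head.
    """
    pinned: Optional[str] = None
    results = []
    for job in jobs:
        if pinned is None:
            pinned = job.get("commit_id") or resolve_branch(job["branch"])
        results.append({**job, "commit_id": pinned})
    return results
```

Even if the branch head moves between jobs, every job in the group reports the commit the first job resolved, which is the extra machinery that simply passing the commit identifier from d.o avoids.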