Problem/Motivation

There's been many changes to our repository on Drupal.org:

1. First CVS
2. Moved to GIT with gitweb interface
3. Moved to cgit web interface

This has resulted in many broken links when people try to link to a commit. For example: the link

http://cgit.drupalcode.org/drupal/commit/?id=00cb147e146d24c87bc31462e9ff3aabb3dd1569

works today, but may not work if we switch to another web interface.

Proposed resolution

We could avoid any more broken links by creating a new code. For example: [commit:00cb147e1]

This way, Drupal.org can parse that and output the real link.

It also opens up other possibilities, like automatically showing commit author information, a list of contributors stored in git notes, etc.

Remaining tasks

Submit patches to implement this feature on Drupal.org.

User interface changes

Introduction of a new code (eg. [commit:00cb147e1]) that would be transformed into a URL, and include the commit message, contributors, etc.

API changes

None.

Comments

fizk’s picture

Issue summary: View changes
fizk’s picture

Issue summary: View changes
tvn’s picture

Project: Drupal.org webmasters » Project issue tracking
Version: » 7.x-2.x-dev
Component: Site organization » Issues
dww’s picture

Title: Create code for linking to git commits in issue comments » Create code for linking to git commits
Project: Project issue tracking » Version Control API -- Git backend
Version: 7.x-2.x-dev » 7.x-1.x-dev
Component: Issues » Code

+1 to the concept!

This would just [sic] add a new text filter that converted [commit:hash] to a useful link. project_issue doesn't know or care about Git at all. The reason the [#nid] filter lives in project_issue is because the filter output itself is the business of project_issue (e.g. status-specific coloring of the div, assigned info in the hover text, etc), not the fact that most times people are using that text filter they're doing so on issue nodes/comments.

So, the place to implement this new text filter would be in a module that knows about Git, the currently configured Git repo viewer, etc. That'd be versioncontrol_git.

Cheers,
-Derek

dww’s picture

Title: Create code for linking to git commits » Create input filter code for linking to git commits

Hopefully more self-documenting title...

marvil07’s picture

Project: Version Control API -- Git backend » Version Control / Project* integration

Sounds fine.

Since this assumes there is a repository based on the project the issue belongs to, I would say the place for this is versioncontrol_project_issue_git module at versioncontrol_project and should use versioncontrol abstraction for repository urls(call versioncontrol webviewer_url_handler plugin methods to get the url).

Patches are welcome!

dww’s picture

Why does this know anything about projects? Oh, because once you fork, you might have multiple projects with the same commit hashes. :/ That makes this filter much more tricky. Text filters don't normally have $node context. By default, they don't know what node they've been embedded in. You have to do some pretty unholy things to make that work (for example, see my patch at #1095014-4: Remember node context without <dme:context> tags. for one possible approach).

And what happens if you're not embedding a reference to a commit on a node that has a notion of a project (either an issue, project, or release)? E.g. how about if a book page wants to reference a commit using this? Or a block? Ooof.

Frankly, given all this, I'm much less enthusiastic about providing this feature. Someone will have to come up with a better syntax or proposal for how it works.

The code would be vastly cleaner if the user specifies both the hash and the project. But that makes it harder to use. Ugh. Something like this?

[commit:project_machine_name:hash]

E.g.

[commit:project_issue:661e6f3]
?

Bleh. ;)

fizk’s picture

dww, Thanks for the insights, [commit:project_machine_name:hash] seems very reasonable to me, and I'd be happy to use that notation.

sun’s picture

In any case, let's not repeat the epic syntax mistake of the issue link filter here.

Automatically enhanced strings like this should use a natural syntax instead of a [cryptic:hard:strange] syntax.

Natural means:

repository@a1b2c3f4

In words: Repository name, followed by a literal '@' sign, followed by at least 8 leading characters of the commit hash.

This bug was introduced in drupal@2da50857 yesterday.

See also https://help.github.com/articles/writing-on-github#references

markcarver’s picture

I agree with #9, it should just preg_match_all ([a-z0-9_]{2,})@([a-z0-9]{8}). It shouldn't/doesn't have to be in a [...] format.

Fabianx’s picture

Totally agree with #9, the syntax is nice!

Steven Jones’s picture

Any pointers on how one would go about getting the URL for a project's code viewer given a project short name + a hash?

I was going to try and write some code for this, but couldn't really see how you go about getting the URL.

markcarver’s picture

Actually you wouldn't have to retrieve anything. CgitMost web interfaces can use the shortened version of a hash: http://cgit.drupalcode.org/drupal/commit/?id=2da50857. All you'd need to do is something like the following (using an anonymous function, but you get the idea):

    $body = preg_replace_callback(
        '/([a-z0-9_]{2,})@([a-z0-9]{8})/g',
        function ($matches) {
            return '' . $matches[0] . '';
        },
        $body
    );
markcarver’s picture

Furthermore, it would also be nice if we could turn the long URLs so they are displayed as the shorthand version as well (like GH does). So a comment that contains just the URL like: http://cgit.drupalcode.org/drupal/commit/?id=2da508579c6a30f8e51b1cbeaee...

Would be displayed as: drupal@2da50857

    $body = preg_replace_callback(
        '/https?:\/\/cgit\.drupalcode\.org\/([^/]*)\/commit\/.*(?:\?|&)id=([a-z0-9]+)/g',
        function ($matches) {
            return '' . $matches[1] . '@' . substr($matches[2], 0, 8) . '';
        },
        $body
    );
Steven Jones’s picture

Hmm...it's more about the syntax that was suggested above of being able to type: drupal@2da508 and have that magically turn into a link, rather than having to go off, get a link, and then just get that re-written to a shorter syntax.
This implies going from these two items of information, the project name and a hash, and constructing the correct link. I believe that the repo viewer is abstracted out in Version control API so we shouldn't assume that it's always going to be cgit.

Steven Jones’s picture

Ah sorry, I see what you're saying.

Yeah, I guess the long links turning into a shorter ones is nice, but maybe not expected given that the same thing doesn't happen for issue links.

markcarver’s picture

In theory, we could use the t() function to replace variables more easily so we'd only have to change it in one place (if we move to something other than cgit in the future). I was a bit hesitant to use this method though because this isn't really what t() is supposed to be used for

(edit) we should just use format_string():

// Current web interface URL for commits.
define('VERSIONCONTROL_PROJECT_COMMIT_URL', 'http://cgit.drupalcode.org/!project/commit/?id=!hash');
/**
 * A callback for replacing commit hashes/links.
 *
 * Uses the currently implemented absolute URL and shorthand. It
 * assumes that the first grouping match is the project and the second
 * is the commit hash.
 */
function _versioncontrol_project_commit_url_callback($matches) {
  $project = $matches[1];
  $hash = $matches[2];
  return format_string('@commit_label', array(
    '!commit_url' => format_string(VERSIONCONTROL_PROJECT_COMMIT_URL, array(
      '!project' => $project,
      '!hash' => $hash,
    ),
    '@commit_label' =>$project . '@' . substr($hash, 0, 8),
  ));
}
// Replaces all 'project@hash1234' texts with links.
$body = preg_replace_callback('/([a-z0-9_]{2,})@([a-z0-9]{8})/g', '_versioncontrol_project_commit_url_callback', $body);

// Replace all absolute Cgit URLs with the current commit URL shorthand.
$body = preg_replace_callback('/https?:\/\/cgit\.drupalcode\.org\/([^/]*)\/commit\/.*(?:\?|&)id=([a-z0-9]+)/g', '_versioncontrol_project_commit_url_callback', $body);
Yeah, I guess the long links turning into a shorter ones is nice, but maybe not expected given that the same thing doesn't happen for issue links.

Just because we don't currently do this with absolute issue URLs, doesn't mean we cannot do it for this (or issue URLs in the future).

Steven Jones’s picture

@Mark Carver you're after format_string.

I was under the impression that the version control module had some classes that you basically pass that sort of information to, that then returns a link, so that indeed this can be changed in one place.
I'm after someone hopefully who can tell me if that's the case, and how to go about getting to such a handler.

markcarver’s picture

Heh, you're right... I was trying to remember if there was such a thing but couldn't remember off the top of my head. I've updated #17 accordingly.

markcarver’s picture

I was under the impression that the version control module had some classes that you basically pass that sort of information to, that then returns a link, so that indeed this can be changed in one place.

This really isn't about versioncontrol_project acting on a specific commit event per se, but rather versioncontrol_project providing a filter via hook_filter_info(). This will, unfortunately, have to be inside hook functions in the .module file opposed to interfacing with some existing class.

Steven Jones’s picture

Any pointers on how one would go about getting the URL for a project's code viewer given a project short name + a hash?

To answer my own question (and working backwards from how the code will need to work):

  1. We need to get hold of a class that implements: VersioncontrolWebviewerUrlHandlerInterface and then call the getCommitViewUrl method on it to get our URL.
  2. We can get this class by calling getUrlHandler on the repository object.
  3. versioncontrol_project will load this onto the project node in $node->versioncontrol_project['repo'].
  4. project_get_nid_from_machinename will give us the nids of any matching projects from a shortname/machinename so that we can begin the above hunt!

Writing the filter is actually quite trivial I think, especially as @Mark Carver provided a nice regex above.

markcarver’s picture

AFAIK (and after reading the issue summary), this issue is only about implementing new filter(s). I fail to see why interfacing with any class is necessary here?

Steven Jones’s picture

Status: Active » Needs review
FileSize
2.86 KB

Here's a patch that provides a filter for filtering git commits in the form: repository@a1b2c3f4.

@Mark Carver it should now be obvious why I was asking about classes etc, because that is the abstraction layer that the Versioncontrol project basically is. We ensure that our code is non-Drupal.org specific (though I'm fully aware that it basically is) and we also don't duplicate the settings for the location of the repo viewer by using it and its classes.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC, nice work!

markcarver’s picture

Status: Reviewed & tested by the community » Needs work

Ok, I see now what you mean. I didn't think it would need this level of abstraction, but in retrospect this is indeed better. I suppose we could add the secondary filter (#14) in drupalorg and just use the same versioncontrol_project_git_revision_process_callback there. I'll go ahead and create an issue there.

A couple things:

  1. +++ b/versioncontrol_project_git/versioncontrol_project_git.module
    @@ -121,3 +121,68 @@ function versioncontrol_project_git_views_api() {
    +    if (($nid = project_get_nid_from_machinename($machine_name)) && ($repo = versioncontrol_project_repository_load($nid)) && ($repo instanceof VersioncontrolRepository) && ($url_handler = $repo->getUrlHandler()) && ($url_handler instanceof VersioncontrolWebviewerUrlHandlerInterface)) {
    

    Minor nit pick, can we at least put these on separate lines for readability?

  2. +++ b/versioncontrol_project_git/versioncontrol_project_git.module
    @@ -121,3 +121,68 @@ function versioncontrol_project_git_views_api() {
    +  if (isset($url)) {
    ...
    +  }
    

    We really don't need an if isset statement here, just add the $url to the if statement above and then return the link from there.

    Also, if we could always limit the display text to the 8 character revision instead of just the full text match and open them in a new window, I think that would be better:

    return l($machine_name . '@' . substr($revision, 0, 8), $url, array(
      'attributes' => array(
        'target' => '_blank',
      ),
    ))
    
markcarver’s picture

I have created this as a follow-up.

markcarver’s picture

Since these were very minor nit picks, I just went ahead and implemented them. I also updated the docs a bit so they conform to standards are are a little more explanatory.

markcarver’s picture

Status: Needs work » Needs review
Steven Jones’s picture

Thanks for the review and the improvements Mark, we're moving in the right direction.

As we're being nit-picky...I'll point out that according to https://www.drupal.org/coding-standards#linelength we shouldn't be splitting the conditions of an if statement over multiple lines, but using some other structure. But, to be honest, I'm really not overly bothered about it to knock this back to needs work.

markcarver’s picture

No, you are right. In hindsight, we really shouldn't need the "instanceof" either since we are already in a try catch, right? This should be a bit cleaner.

Steven Jones’s picture

Status: Needs review » Needs work

we really shouldn't need the "instanceof" either since we are already in a try catch, right?

Ah, if only PHP was consistent, then yes. Sadly, if one of the methods returns say, FALSE, then you're then calling a method on a non-object, which is a fatal error, and that isn't catchable by the try/catch :(

So we will need those instanceof lines in there somewhere.

markcarver’s picture

Ah, I was hoping that wasn't the case. Oh well, here is an even better patch then :D

markcarver’s picture

Status: Needs work » Needs review

Ugh, forget the wrong patch filename. drush_iq is still working out the kinks (of which it isn't setting the issue status appropriately either).

markcarver’s picture

Woops, here is a patch that accounts for if the URL is empty.

Steven Jones’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me and seems to work nicely.

Wim Leers’s picture

OMG YAY, love this! Can't wait :)

marvil07’s picture

Thanks for the patches!

I did not yet read all the thread, but looking at the last patch it seems to be doing the right thing.
Hopefully I can get back to this soon.

Mixologic’s picture

Issue tags: +BDD

If/when this get added I'll want to create a test for it eventually.

Also, would linking to a commit in a sandbox work if I used the {sandbox id}@{commithash} ?

edit: I keep forgetting that < and > are bad things to wrap variables in.

marvil07’s picture

Status: Reviewed & tested by the community » Needs work

Now I had the time to read all the comments, so please bear with me.

Why does this know anything about projects? Oh, because once you fork, you might have multiple projects with the same commit hashes.

Yes, in theory a filter should be as independent as possible, but since the needed URL needs a versioncontrol repository object to be generated, I guess the proposed solution is good enough: including the project identifier (machine name) in the text to be filtered.

About multiple projects sharing a subsets of its commits, it is fine now that a project machine name is provided.
Notice that different projects have different repositories, so the even for the same commit hash in two different projects the urls shoudl be different.

And what happens if you're not embedding a reference to a commit on a node that has a notion of a project (either an issue, project, or release)? E.g. how about if a book page wants to reference a commit using this? Or a block? Ooof.

That is also a valid concern, but again, with the project machine name we should have enough context.

Automatically enhanced strings like this should use a natural syntax instead of a [cryptic:hard:strange] syntax.

Natural means:

repository@a1b2c3f4

In words: Repository name, followed by a literal '@' sign, followed by at least 8 leading characters of the commit hash.

IMHO that is not natural. @ makes me think about mail, so the elements are in the wrong position.
If we go with the proposed syntax, it should be hash@project_machine_name.

dww's proposal [commit:project_machine_name:hash] is really like redmine syntax(commit:c6f4d0fd).
Redmine does not have the concept of pages outside project context(like books as mentioned before), so they do not need the project machine name to get the context.
So that explains having "commit:" string included.

In our case, we already have a filter for project issue nodes(e.g. [#1234]) which assumes the syntax for issue nodes.
Based on that I would propose to: assume the syntax for commits(i.e. do not use "commit:" string) and use @ sign as in mail.

I'm not sure if we should use brakets or not, let's not use it for now, since mails are rarely used sites like this.

The resulting proposed syntax would be: hash@project_machine_name.

Also, about the hash, I would say we should suggest 7 characters, which is the git CLI default to shorten hashes(e.g. see git log --oneline).

Also, would linking to a commit in a sandbox work if I used the {sandbox id}@{commithash} ?

IIRC sandboxes machine name are the node id, so there should not be any problem.

Now some code review.

  1. +++ b/versioncontrol_project_git/versioncontrol_project_git.module
    @@ -121,3 +121,94 @@ function versioncontrol_project_git_views_api() {
    +    return t("References to git commits in the form of project_short_name@commit_hash turn into links automatically. You may reference sandboxes by referencing their numeric short name, and you may use any length of commit hash from 8-40. For example: drupal@2569242a, drupal@2569242afd5eec297e7d72065f4e3dd2586d0fd8");
    

    nitpick: please use ' instead of " since it is not needed here.

  2. +++ b/versioncontrol_project_git/versioncontrol_project_git.module
    @@ -121,3 +121,94 @@ function versioncontrol_project_git_views_api() {
    +  try {
    

    IMHO try/catch is not really needed here, since the exception messages are not used to log some error.

    I would suggest to use return when needed instead.
    e.g.

    if (!$nid) {
      // Not a valid project machine naem.
      return;
    }
    
  3. +++ b/versioncontrol_project_git/versioncontrol_project_git.module
    @@ -121,3 +121,94 @@ function versioncontrol_project_git_views_api() {
    +    $repo = versioncontrol_project_repository_load($nid);
    

    nitpick: please use $repository (long story on #1045496: Stop using 'repo' shorthand in internal variable names and functions).

  4. +++ b/versioncontrol_project_git/versioncontrol_project_git.module
    @@ -121,3 +121,94 @@ function versioncontrol_project_git_views_api() {
    +    return l($machine_name . '@' . substr($revision, 0, 8), $url, array(
    

    Here the $repository->getBackend()->formatRevisionIdentifier($revision, 'short') method can be used instead.

Steven Jones’s picture

We added the try/catch because getUrlHandler(); can throw exceptions in rare cases, but for us, they shouldn't be fatal, we can just not replace out the strings.

markcarver’s picture

Patch submitted by the Drush iq-submit command.

markcarver’s picture

Status: Needs work » Needs review

IMHO that is not natural. @ makes me think about mail, so the elements are in the wrong position.
If we go with the proposed syntax, it should be hash@project_machine_name.

dww's proposal [commit:project_machine_name:hash] is really like redmine syntax(commit:c6f4d0fd).

@dww's proposal has the project first and the revision second. repository@revision was a natural evolution of that order. Also, with many of us also on GH, this makes sense. The revision belongs to the project (ie: you have to know what the project is first before the revision). I do not agree with hash@project_machine_name.

We added the try/catch because getUrlHandler(); can throw exceptions in rare cases, but for us, they shouldn't be fatal, we can just not replace out the strings.

I took a look at this method and it indeed only throws a warning and it's the only method that throws an exception at all, so we can remove the try/catch entirely.

markcarver’s picture

marvil07’s picture

Status: Needs review » Needs work

@dww's proposal has the project first and the revision second. repository@revision was a natural evolution of that order. Also, with many of us also on GH, this makes sense. The revision belongs to the project (ie: you have to know what the project is first before the revision). I do not agree with hash@project_machine_name.

I guess I did not got into enough details.

dww's proposal is using a colon, and in that case I think it is ok to have project_machine_name:revision, but if we are going to use @ that historically references to mail, e.g. name@domain.com, for which I think it is un-natural to have project_machine_name@revision, since following the analogy it should be revision@project_machine_name.

Hopefully now it makes more sense.

markcarver’s picture

edit: we need other opinions. I personally think this is more about how one interprets @. I do not see this as an email address at all.

I see "project" is being referenced "at" this "revision" point, not
"revision point" "at" (belongs to) this "project"

I guess we could just do project:revision (replacing @ with :). I just don't think that it would be enough to differentiate the project from the hash (just looks like one line of text) because @ utilizes the full line height and character em width opposed to :

drupal:2da50857

vs.

drupal@2da50857

Fabianx’s picture

Status: Needs work » Needs review

https://help.github.com/articles/writing-on-github#references has

User@SHA: jlord@a5c3785
User/Project@SHA: jlord/sheetsee.js@a5c3785

So it would be good to adopt that to not confuse people coming from GH more than necessary.

Also in this case the natural speak means:

project _at_ revision

e.g.

This was fixed by 'drupal at 2da50857'

which references the revision this project was on at a certain point.

and is not meant as an address or something, but just read as 'at'.

YesCT’s picture

Issue tags: +d.o input filters