Problem/Motivation
There's been many changes to our repository on Drupal.org:
1. First CVS
2. Moved to GIT with gitweb interface
3. Moved to cgit web interface
This has resulted in many broken links when people try to link to a commit. For example: the link
http://cgit.drupalcode.org/drupal/commit/?id=00cb147e146d24c87bc31462e9ff3aabb3dd1569
works today, but may not work if we switch to another web interface.
Proposed resolution
We could avoid any more broken links by creating a new code. For example: [commit:00cb147e1]
This way, Drupal.org can parse that and output the real link.
It also opens up other possibilities, like automatically showing commit author information, a list of contributors stored in git notes, etc.
Remaining tasks
Submit patches to implement this feature on Drupal.org.
User interface changes
Introduction of a new code (eg. [commit:00cb147e1]) that would be transformed into a URL, and include the commit message, contributors, etc.
API changes
None.
| Comment | File | Size | Author |
|---|---|---|---|
| #41 | versioncontrol_project-create-input-filter-2281551-41.patch | 4.13 KB | markhalliwell |
Comments
Comment #1
fizk commentedComment #2
fizk commentedComment #3
tvn commentedComment #4
dww+1 to the concept!
This would just [sic] add a new text filter that converted
[commit:hash]to a useful link. project_issue doesn't know or care about Git at all. The reason the[#nid]filter lives in project_issue is because the filter output itself is the business of project_issue (e.g. status-specific coloring of the div, assigned info in the hover text, etc), not the fact that most times people are using that text filter they're doing so on issue nodes/comments.So, the place to implement this new text filter would be in a module that knows about Git, the currently configured Git repo viewer, etc. That'd be versioncontrol_git.
Cheers,
-Derek
Comment #5
dwwHopefully more self-documenting title...
Comment #6
marvil07 commentedSounds fine.
Since this assumes there is a repository based on the project the issue belongs to, I would say the place for this is versioncontrol_project_issue_git module at versioncontrol_project and should use versioncontrol abstraction for repository urls(call versioncontrol webviewer_url_handler plugin methods to get the url).
Patches are welcome!
Comment #7
dwwWhy does this know anything about projects? Oh, because once you fork, you might have multiple projects with the same commit hashes. :/ That makes this filter much more tricky. Text filters don't normally have $node context. By default, they don't know what node they've been embedded in. You have to do some pretty unholy things to make that work (for example, see my patch at #1095014-4: Remember node context without <dme:context> tags. for one possible approach).
And what happens if you're not embedding a reference to a commit on a node that has a notion of a project (either an issue, project, or release)? E.g. how about if a book page wants to reference a commit using this? Or a block? Ooof.
Frankly, given all this, I'm much less enthusiastic about providing this feature. Someone will have to come up with a better syntax or proposal for how it works.
The code would be vastly cleaner if the user specifies both the hash and the project. But that makes it harder to use. Ugh. Something like this?
[commit:project_machine_name:hash]E.g.
[commit:project_issue:661e6f3]?
Bleh. ;)
Comment #8
fizk commenteddww, Thanks for the insights,
[commit:project_machine_name:hash]seems very reasonable to me, and I'd be happy to use that notation.Comment #9
sunIn any case, let's not repeat the epic syntax mistake of the issue link filter here.
Automatically enhanced strings like this should use a natural syntax instead of a [cryptic:hard:strange] syntax.
Natural means:
In words: Repository name, followed by a literal '@' sign, followed by at least 8 leading characters of the commit hash.
See also https://help.github.com/articles/writing-on-github#references
Comment #10
markhalliwellI agree with #9, it should just preg_match_all
([a-z0-9_]{2,})@([a-z0-9]{8}). It shouldn't/doesn't have to be in a[...]format.Comment #11
fabianx commentedTotally agree with #9, the syntax is nice!
Comment #12
steven jones commentedAny pointers on how one would go about getting the URL for a project's code viewer given a project short name + a hash?
I was going to try and write some code for this, but couldn't really see how you go about getting the URL.
Comment #13
markhalliwellActually you wouldn't have to retrieve anything.
CgitMost web interfaces can use the shortened version of a hash: http://cgit.drupalcode.org/drupal/commit/?id=2da50857. All you'd need to do is something like the following (using an anonymous function, but you get the idea):Comment #14
markhalliwellFurthermore, it would also be nice if we could turn the long URLs so they are displayed as the shorthand version as well (like GH does). So a comment that contains just the URL like: http://cgit.drupalcode.org/drupal/commit/?id=2da508579c6a30f8e51b1cbeaee...
Would be displayed as: drupal@2da50857
Comment #15
steven jones commentedHmm...it's more about the syntax that was suggested above of being able to type: drupal@2da508 and have that magically turn into a link, rather than having to go off, get a link, and then just get that re-written to a shorter syntax.
This implies going from these two items of information, the project name and a hash, and constructing the correct link. I believe that the repo viewer is abstracted out in Version control API so we shouldn't assume that it's always going to be cgit.
Comment #16
steven jones commentedAh sorry, I see what you're saying.
Yeah, I guess the long links turning into a shorter ones is nice, but maybe not expected given that the same thing doesn't happen for issue links.
Comment #17
markhalliwellIn theory, we could use thet()function to replace variables more easily so we'd only have to change it in one place (if we move to something other than cgit in the future). I was a bit hesitant to use this method though because this isn't really whatt()is supposed to be used for(edit) we should just use
format_string():Just because we don't currently do this with absolute issue URLs, doesn't mean we cannot do it for this (or issue URLs in the future).
Comment #18
steven jones commented@Mark Carver you're after
format_string.I was under the impression that the version control module had some classes that you basically pass that sort of information to, that then returns a link, so that indeed this can be changed in one place.
I'm after someone hopefully who can tell me if that's the case, and how to go about getting to such a handler.
Comment #19
markhalliwellHeh, you're right... I was trying to remember if there was such a thing but couldn't remember off the top of my head. I've updated #17 accordingly.
Comment #20
markhalliwellThis really isn't about versioncontrol_project acting on a specific commit event per se, but rather versioncontrol_project providing a filter via
hook_filter_info(). This will, unfortunately, have to be inside hook functions in the .module file opposed to interfacing with some existing class.Comment #21
steven jones commentedTo answer my own question (and working backwards from how the code will need to work):
VersioncontrolWebviewerUrlHandlerInterfaceand then call thegetCommitViewUrlmethod on it to get our URL.getUrlHandleron the repository object.$node->versioncontrol_project['repo'].project_get_nid_from_machinenamewill give us the nids of any matching projects from a shortname/machinename so that we can begin the above hunt!Writing the filter is actually quite trivial I think, especially as @Mark Carver provided a nice regex above.
Comment #22
markhalliwellAFAIK (and after reading the issue summary), this issue is only about implementing new filter(s). I fail to see why interfacing with any class is necessary here?
Comment #23
steven jones commentedHere's a patch that provides a filter for filtering git commits in the form: repository@a1b2c3f4.
@Mark Carver it should now be obvious why I was asking about classes etc, because that is the abstraction layer that the Versioncontrol project basically is. We ensure that our code is non-Drupal.org specific (though I'm fully aware that it basically is) and we also don't duplicate the settings for the location of the repo viewer by using it and its classes.
Comment #24
fabianx commentedRTBC, nice work!
Comment #25
markhalliwellOk, I see now what you mean. I didn't think it would need this level of abstraction, but in retrospect this is indeed better. I suppose we could add the secondary filter (#14) in drupalorg and just use the same
versioncontrol_project_git_revision_process_callbackthere. I'll go ahead and create an issue there.A couple things:
Minor nit pick, can we at least put these on separate lines for readability?
We really don't need an if isset statement here, just add the
$urlto the if statement above and then return the link from there.Also, if we could always limit the display text to the 8 character revision instead of just the full text match and open them in a new window, I think that would be better:
Comment #26
markhalliwellI have created this as a follow-up.
Comment #27
markhalliwellSince these were very minor nit picks, I just went ahead and implemented them. I also updated the docs a bit so they conform to standards are are a little more explanatory.
Comment #28
markhalliwellComment #29
steven jones commentedThanks for the review and the improvements Mark, we're moving in the right direction.
As we're being nit-picky...I'll point out that according to https://www.drupal.org/coding-standards#linelength we shouldn't be splitting the conditions of an if statement over multiple lines, but using some other structure. But, to be honest, I'm really not overly bothered about it to knock this back to needs work.
Comment #30
markhalliwellNo, you are right. In hindsight, we really shouldn't need the "instanceof" either since we are already in a try catch, right? This should be a bit cleaner.
Comment #31
steven jones commentedAh, if only PHP was consistent, then yes. Sadly, if one of the methods returns say, FALSE, then you're then calling a method on a non-object, which is a fatal error, and that isn't catchable by the try/catch :(
So we will need those instanceof lines in there somewhere.
Comment #32
markhalliwellAh, I was hoping that wasn't the case. Oh well, here is an even better patch then :D
Comment #33
markhalliwellUgh, forget the wrong patch filename. drush_iq is still working out the kinks (of which it isn't setting the issue status appropriately either).
Comment #34
markhalliwellWoops, here is a patch that accounts for if the URL is empty.
Comment #35
steven jones commentedLooks good to me and seems to work nicely.
Comment #36
wim leersOMG YAY, love this! Can't wait :)
Comment #37
marvil07 commentedThanks for the patches!
I did not yet read all the thread, but looking at the last patch it seems to be doing the right thing.
Hopefully I can get back to this soon.
Comment #38
MixologicIf/when this get added I'll want to create a test for it eventually.
Also, would linking to a commit in a sandbox work if I used the {sandbox id}@{commithash} ?
edit: I keep forgetting that < and > are bad things to wrap variables in.
Comment #39
marvil07 commentedNow I had the time to read all the comments, so please bear with me.
Yes, in theory a filter should be as independent as possible, but since the needed URL needs a versioncontrol repository object to be generated, I guess the proposed solution is good enough: including the project identifier (machine name) in the text to be filtered.
About multiple projects sharing a subsets of its commits, it is fine now that a project machine name is provided.
Notice that different projects have different repositories, so the even for the same commit hash in two different projects the urls shoudl be different.
That is also a valid concern, but again, with the project machine name we should have enough context.
IMHO that is not natural. @ makes me think about mail, so the elements are in the wrong position.
If we go with the proposed syntax, it should be
hash@project_machine_name.dww's proposal
[commit:project_machine_name:hash]is really like redmine syntax(commit:c6f4d0fd).Redmine does not have the concept of pages outside project context(like books as mentioned before), so they do not need the project machine name to get the context.
So that explains having "commit:" string included.
In our case, we already have a filter for project issue nodes(e.g.
#1234: "Mark all as read" in forums.) which assumes the syntax for issue nodes.Based on that I would propose to: assume the syntax for commits(i.e. do not use "commit:" string) and use @ sign as in mail.
I'm not sure if we should use brakets or not, let's not use it for now, since mails are rarely used sites like this.
The resulting proposed syntax would be:
hash@project_machine_name.Also, about the hash, I would say we should suggest 7 characters, which is the git CLI default to shorten hashes(e.g. see git log --oneline).
IIRC sandboxes machine name are the node id, so there should not be any problem.
Now some code review.
nitpick: please use
'instead of"since it is not needed here.IMHO try/catch is not really needed here, since the exception messages are not used to log some error.
I would suggest to use return when needed instead.
e.g.
nitpick: please use $repository (long story on #1045496: Stop using 'repo' shorthand in internal variable names and functions).
Here the
$repository->getBackend()->formatRevisionIdentifier($revision, 'short')method can be used instead.Comment #40
steven jones commentedWe added the try/catch because
getUrlHandler();can throw exceptions in rare cases, but for us, they shouldn't be fatal, we can just not replace out the strings.Comment #41
markhalliwellPatch submitted by the Drush iq-submit command.
Comment #42
markhalliwell@dww's proposal has the project first and the revision second.
repository@revisionwas a natural evolution of that order. Also, with many of us also on GH, this makes sense. The revision belongs to the project (ie: you have to know what the project is first before the revision). I do not agree withhash@project_machine_name.I took a look at this method and it indeed only throws a warning and it's the only method that throws an exception at all, so we can remove the try/catch entirely.
Comment #43
markhalliwellComment #44
marvil07 commentedI guess I did not got into enough details.
dww's proposal is using a colon, and in that case I think it is ok to have
project_machine_name:revision, but if we are going to use@that historically references to mail, e.g. name@domain.com, for which I think it is un-natural to haveproject_machine_name@revision, since following the analogy it should berevision@project_machine_name.Hopefully now it makes more sense.
Comment #45
markhalliwelledit: we need other opinions. I personally think this is more about how one interprets
@. I do not see this as an email address at all.I see "project" is being referenced "at" this "revision" point, not
"revision point" "at" (belongs to) this "project"
I guess we could just do
project:revision(replacing@with:). I just don't think that it would be enough to differentiate the project from the hash (just looks like one line of text) because@utilizes the full line height and character em width opposed to:drupal:2da50857
vs.
drupal@2da50857
Comment #46
fabianx commentedhttps://help.github.com/articles/writing-on-github#references has
User@SHA: jlord@a5c3785
User/Project@SHA: jlord/sheetsee.js@a5c3785
So it would be good to adopt that to not confuse people coming from GH more than necessary.
Also in this case the natural speak means:
project _at_ revision
e.g.
This was fixed by 'drupal at 2da50857'
which references the revision this project was on at a certain point.
and is not meant as an address or something, but just read as 'at'.
Comment #47
yesct commentedComment #48
mlncn commented... bump. Good to maybe get this in and start updating hard-coded links, as a step that can be taken separately and before the continuing GitLab migration steps?