(note: this is kind of the same as http://drupal.org/node/77976 but that issue is asking about a rating system for projects, where this is talking about gathering quality metrics in an automated way)
Dries has come up with a mock-up which makes tons of sense, attached to this issue.
Let's talk implementation details.
"This item is one of the most frequently downloaded projects"
From what table can we grab this information? Does this come from the server logs? I looked briefly through the project module schema but don't see anything that seems to match. Likely it's through server logs, but we might want to create a project_download?file=whatever type of thing.
I also think it would be cool to specify how many downloads it has, like:
"This item is one of the most frequently downloaded projects, with 34,034 downloads this month."
with both "this month" and "one of the most frequently downloaded projects" configurable.
It is maintained by 2 maintainers
This one should be easy, just a query of cvs_project_maintainers. It does raise the question, though, of whether we should implement a _project_download_statistics hook or something that can be implemented by other modules, so that we don't create any dependencies on cvs module in project module and vice versa.
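To make the hook idea concrete, here is a rough sketch in Python (not Drupal PHP, and not code from project module). The names `register_stats_provider`, `collect_project_stats`, `cvs_stats` and `issue_stats` are invented for illustration, and the numbers are hard-coded where a real implementation would query its own tables. Drupal discovers hooks by function name rather than by a decorator, so treat this as an analogy only.

```python
# Registry of statistics providers. Each provider is a callable that takes
# a project id and returns a dict of {metric_name: value}.
_providers = []


def register_stats_provider(func):
    """Register a provider; stands in for a module implementing the hook."""
    _providers.append(func)
    return func


def collect_project_stats(project_id):
    """Ask every registered provider for its numbers and merge the results."""
    stats = {}
    for provider in _providers:
        stats.update(provider(project_id))
    return stats


@register_stats_provider
def cvs_stats(project_id):
    # Hypothetical: a CVS-aware module would read its own tables here.
    return {"maintainers": 2, "commits_past_month": 29}


@register_stats_provider
def issue_stats(project_id):
    # Hypothetical: an issue-tracker module would count issues here.
    return {"bugs_reported": 24, "bugs_fixed": 19}
```

The block would then only call `collect_project_stats()` and never touch the CVS or issue tables directly, which keeps the dependency pointing one way.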
that [should be who but who's counting ;)] appear to be very active
Same as above, I'd like to see this mention how many commits like:
"... appear to be very active, with 29 CVS commits in the past month"
This past month, 10 different people were active in the project's issue tracker
Should be able to determine this between the project_issue tables.
Together, they reported 24 bugs and 19 got fixed.
Same.
The last change was 4 hours ago
I think we can get this from the cvs_messages table.
I'd also specify which version here like,
"The last change was 4 hours ago, in the Drupal 4.7 version."
I'll attach my rendition of one idea for the admin interface for this in a moment.
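As a rough illustration of how the sentences above could be assembled once the numbers are available, here is a small Python sketch. The function name, the dict keys and the default wording are assumptions made for the example; only the sentence templates come from the mock-up text quoted above.

```python
def build_block_text(stats, period="this month",
                     popularity_label="one of the most frequently downloaded projects"):
    """Return the block sentences as a list of strings.

    `stats` is a plain dict of numbers gathered elsewhere; every key name
    used here is made up for the example.
    """
    return [
        # Both the label and the period are parameters, so they stay configurable.
        f"This item is {popularity_label}, with "
        f"{stats['downloads']:,} downloads {period}.",
        f"It is maintained by {stats['maintainers']} maintainers who appear to be "
        f"very active, with {stats['commits']} CVS commits in the past month.",
        f"This past month, {stats['active_users']} different people were active "
        f"in the project's issue tracker.",
        f"Together, they reported {stats['bugs_reported']} bugs and "
        f"{stats['bugs_fixed']} got fixed.",
        f"The last change was {stats['last_change']}, in the "
        f"{stats['last_change_version']} version.",
    ]


example = {
    "downloads": 34034,
    "maintainers": 2,
    "commits": 29,
    "active_users": 10,
    "bugs_reported": 24,
    "bugs_fixed": 19,
    "last_change": "4 hours ago",
    "last_change_version": "Drupal 4.7",
}
for sentence in build_block_text(example):
    print(sentence)
```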
Comment | File | Size | Author |
---|---|---|---|
#2 | mockup.module | 3.28 KB | webchick |
#1 | project_mockup.png | 57.54 KB | webchick |
 | project-information-at-a-glance.png | 155.58 KB | webchick |
Comments
Comment #1
webchick
Here's an idea of how the admin interface might look for this block.
Comment #2
webchick
And here's the skeleton code of that mockup, if anyone wants to tweak/change stuff.
The module is for 4.7 and the mockup is found @ admin/settings/mockup, just because I needed a quick and dirty way to mockup the form. ;)
Comment #3
killes@www.drop.org
Download stats are available from osuosl. We have awstats output and can screen-scrape that, or we need to ask for the original data.
Comment #4
dww
i'm actually thinking about making a "project_vcs.module" -- and moving a bunch of the crap out of cvs.module + schema into something that's VCS-independent. all the existing "CVS access" tab crap doesn't depend on cvs at all. similarly, i think every VCS has a notion of tags and branches, so a lot of the new code i'm going to write for the new release system could just live in here, and not add further dependencies on cvs.module.
basically, instead of cvs.module and svn.module implementing a bunch of the same stuff, each one should focus on the things that are absolutely *specific* to the particular VCS in question, and code/logic/data that can be shared across all VCS's to integrate w/ project nodes, should be in project_vcs.module. project_vcs.module would probably introduce a few hooks that could be implemented by cvs.module, svn.module or even someday a bzr.module (if anyone's inclined to write such a thing).
ideally, instead of the choice of VCS being site wide, each project could select if its code lives in cvs, svn, bzr, or has no code at all (e.g. the drupal.org website project). the cvs vs. svn thing isn't so much for d.o, where we're always going to have 1 repo, but for other sites where many projects might all live in the same place, and might live not just in different repos, but entirely different kinds of repos, too...
getting the download stats from the server into the DB would probably be a good idea. this block is going to get pretty expensive to compute. it's going to be even worse if it has to fopen() a server log and parse the whole thing, too. i'd rather we stuffed the download stats into the DB in some reasonable way, and let this block just worry about getting the info from there. but, i could be wrong about all that. sounds like killes might have some better ideas/info about how this does/should work. ;)
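A sketch of that split, in Python with SQLite purely for illustration: a periodic job parses the log once and stores per-project totals, and the block only runs a cheap lookup. The log line shape, the `/files/projects/` path layout, the table name `project_downloads` and all function names are assumptions, not how drupal.org actually stores anything.

```python
import re
import sqlite3

# Assumed log shape: an Apache-style request line such as
#   "GET /files/projects/<name>-<version>.tar.gz HTTP/1.0"
# The path layout is a guess made for this example.
DOWNLOAD_RE = re.compile(r'"GET /files/projects/([a-z0-9_]+)-[^ /]+\.tar\.gz ')


def count_downloads(log_lines):
    """Tally downloads per project name from an iterable of log lines."""
    counts = {}
    for line in log_lines:
        match = DOWNLOAD_RE.search(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts


def store_counts(conn, month, counts):
    """Write one row per (project, month); re-running replaces old totals."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS project_downloads ("
        "project TEXT, month TEXT, downloads INTEGER, "
        "PRIMARY KEY (project, month))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO project_downloads VALUES (?, ?, ?)",
        [(project, month, n) for project, n in counts.items()],
    )
    conn.commit()


def downloads_for(conn, project, month):
    """Cheap lookup the block would use instead of parsing logs itself."""
    row = conn.execute(
        "SELECT downloads FROM project_downloads WHERE project = ? AND month = ?",
        (project, month),
    ).fetchone()
    return row[0] if row else 0
```

The expensive part (`count_downloads`) would run from cron or similar; the block itself never opens the log.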
thanks for taking the lead on this, webchick!
-derek
Comment #5
webchick
project_vcs.module makes a lot of sense. I especially really like the idea of each contrib project's maintainer getting to choose what VCS to use for their project. That's a bit above my level though. ;)
For this block, what should I patch against? Project.module? That doesn't really seem appropriate because of all the sub-dependencies on issues and cvs. Should I just make a custom module like project_stats_block.module that could be downloaded as a separate contrib? Or..?
Comment #6
nedjo
I'd say for now a patch against project.module should make sense. It could be pulled out into something else later if appropriate.
Angie, you might want to have a look at the patch to this issue, http://drupal.org/node/66013. It shows how we can load data on project usage from the data sent to drupal.org by the drupal.module. The patch may not apply any more as it's been a few months, but in any case you could look at the code and adapt it as a new metric.
Your mockup.module idea seems like a good direction. We should consider a bit what type of thresholds make most sense for the intervals. Percentages? Set numbers, as in the current draft? Ideally we'd have something that didn't need adjusting when, e.g., the number of downloads goes up. Maybe we can find a handy script, e.g. for charting, that will break data into intervals for us. Likely there's useful stuff already in our project statistics generation. Possibly we could use a reusable function that we can send data to and return an appropriate response.
We would then return the text something vaguely like this:
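One way to get thresholds that do not need adjusting as download totals grow is to rank each project against all the others instead of using fixed numbers. The Python sketch below is an illustration of that approach only; it is not the snippet originally posted here, and the function names, the cut-offs and the label wording are all assumptions.

```python
from bisect import bisect_left


def percentile_rank(value, all_values):
    """Fraction (0.0 to 1.0) of projects whose value is strictly below `value`."""
    ordered = sorted(all_values)
    if not ordered:
        return 0.0
    return bisect_left(ordered, value) / len(ordered)


# (minimum percentile, label) pairs, checked from highest to lowest.
# Both the cut-offs and the wording are placeholders.
DEFAULT_LABELS = [
    (0.90, "one of the most frequently downloaded projects"),
    (0.50, "downloaded more often than most projects"),
    (0.00, "downloaded less often than most projects"),
]


def describe(value, all_values, labels=DEFAULT_LABELS):
    """Pick the first label whose minimum percentile the project reaches."""
    rank = percentile_rank(value, all_values)
    for minimum, label in labels:
        if rank >= minimum:
            return label
    return labels[-1][1]
```

Because the ranking is relative, the same function could be reused for commits, issue activity or any other metric by passing a different list of values and labels.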
Comment #7
dww
"For this block, what should I patch against?"
very good question...
"Should I just make a custom module like project_stats_block.module"
at this point, i'd probably vote for that. one of the things i've been trying to do is make the project codebase more modular. when i first took over, i was almost paralyzed by the size of the code and the complexity. it was a great learning experience ripping all the issue tracking out into a separate module. i'd rather keep moving in this direction, and making the separate chunks of functionality live in different files, so that it's easier to grok any individual file and work on it if that's the area of project where you've got an itch to scratch...
not everyone using project.module and friends will want/need this block, it's going to have its fingers in too many other modules to make sense in any of them, and it'll probably be easier to just have it as a stand-alone chunk of code. i'd have to think about it some more to decide if it should just be committed into a modules/project/contrib directory inside the main project directory (or right next to project.module, in fact), or if it should be a totally stand-alone project node/tarball for itself. for now, i'd just write it as a single .module file (i don't think it'll need any DB tables of its own) which you put in a sandbox or attach copies of to this issue, and we can decide later where to actually commit it for real. sound good?
thanks!
-derek
p.s. i just saw nedjo's post as i was writing this. i still think i'd rather this was in a separate module, not as a patch to project.module. but, other than that, a huge +1 to everything nedjo just said. ;)
Comment #8
webchick
Cool. Thanks, guys. I'm going to try and do some work on this this week. :)
Comment #9
webchick
Ok code freeze happened, so "this week" didn't happen. ;)
Bumping priority to critical and assigning to myself. Now my goal is to try and get this done for the 5.0 release. :)
Comment #10
webchick
I discussed with dww a plan for implementing this. Since he's hard at work on the new release system for contributed modules, the backend of project module and its sub-modules will probably change quite significantly in the next little while. Therefore, it probably makes more sense to start at the "high-level" architectural level rather than worry about implementation at this stage.
We were talking on IRC and drewish came up with the idea to make the various factors weighted so we could tweak it, and make adjustments later on. Ideally, this would be built in a general way so sites outside of Drupal.org could tweak it to their needs. So we come up with a list of criteria, and people can add/delete criteria and weight it according to what should make the project more/less "healthy"
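The weighting idea can be sketched as a simple weighted average. This is only an illustration in Python: the function name, criterion names and weights are invented, and each metric is assumed to have been normalised to a score between 0.0 and 1.0 beforehand.

```python
def health_score(metrics, weights):
    """Weighted average of metric scores, returned as a percentage (0 to 100).

    `metrics` maps criterion name -> score already normalised to 0.0-1.0.
    `weights` maps criterion name -> non-negative weight. Criteria missing
    from `metrics` count as 0, and a weight of 0 drops a criterion entirely.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(w * metrics.get(name, 0.0) for name, w in weights.items())
    return 100.0 * weighted / total_weight


# Hypothetical site configuration: issue responsiveness matters most here.
site_weights = {"recent_commits": 2, "issues_fixed_ratio": 3, "downloads": 1}
project_metrics = {"recent_commits": 1.0, "issues_fixed_ratio": 0.5, "downloads": 0.0}
print(round(health_score(project_metrics, site_weights), 1))
```

A site outside Drupal.org could add or delete criteria just by editing its weights dict, without touching the scoring code.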
So here are some of the criteria we came up with:
Comment #11
dww
@# of sites implementing this module -- cull from drupal module?
see: http://drupal.org/node/66013 and http://drupal.org/node/66015
i'm too tired to think more clearly about the rest of this right now. hopefully i'll find time/energy to look at this more closely in the near future...
Comment #12
agentrickard
Granted, I only maintain one module, and it's brand new (http://drupal.org/project/mysite), but I don't like this metric:
Or these:
I don't like this for the simple reason that I have tried (very hard) to minimize the number of commits that I have to make. If I have written the module correctly, I won't need to commit bugfixes frequently.
The frequency and volume of commits is not a good measure of the quality of maintenance. It may instead be an indication of bad code or features that weren't planned very well.
So, my argument is that these stats are an inaccurate representation of what we're really trying to accomplish, which is some indication that the module maintainer is paying attention.
How about, instead or in addition, we scrape the accesslog to see the last time the maintainer(s) looked at the issue queue (which I do daily). Or scrape the last time the maintainer added to / commented on / changed the issue queue.
Comment #13
webchick
Eh... maybe. I definitely wouldn't take CVS commits as a whole to be a measure of quality, but I think encompassed within the rest of the stats they're a nice indication of activity from the maintainer.
Drupal core gets frequent CVS commits and I don't think one would call it a poor quality project. And no matter how well you code something, there will always be bugs, and there will always be new features that users want to add. Frequent small commits also means you're backing up changes into logical chunks rather than one huge commit that "fixes 12 bugs with various stuff."
Scraping stats seems like rather a clunky solution to me. It also rules out folks who are getting issue notifications by e-mail, and only hit their issue queues when there's something to look at.
Comment #14
dww
basically, i'm w/ webchick on this. if nothing else, you *MUST* commit some changes every 6 months to port to the new version of core (since it *WILL* break your module) or you're not responsive/active. so, if the little box says "last commit: 13 months ago", i know this module is dead in the water.
and, in spite of how brilliant we all think we are, we will have bugs, or it won't quite work right on pgsql, or whatever. ;)
plus, lots of small commits is, ultimately, the better way to use a revision control system. minimizing your # of commits isn't necessarily something to strive for, and it doesn't really prove you're a great, thoughtful developer (usually it just means you leave a lot of uncommitted changes in a workspace and commit in large batches, which has its own problems). but, people have different styles, and ultimately, what matters is how good the code is, not the frequency and size of the commits.
however, the main point is that none of the various metrics proposed are supposed to be the only quality metric, they're all just "possible indicators of quality". better to provide all the stats we can easily display in an intelligent way, and let people decide for themselves which metrics are important to them and which are not.
activity in the issue queue (# of open vs. # of fixed, time since last fixed, etc) are all good indicators, and are discussed above. time since the maintainer looked at the issue queue i think is less interesting/informative than time since the maintainer resolved something. of course, small, well-written modules that Just Work(tm) won't have any issues, and then it'll be a long time since something was resolved, which is why the ratio of open to fixed is important.
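The open-versus-fixed ratio is easy to compute, but the "no issues at all" case needs its own answer so that a quiet, working module is not scored as neglected. A minimal Python sketch, with a made-up function name:

```python
def fixed_ratio(open_count, fixed_count):
    """Share of issues that are fixed, or None when there are no issues at all.

    Returning None (rather than 0) lets the caller display "no issues filed"
    instead of penalising a module that simply works.
    """
    total = open_count + fixed_count
    if total == 0:
        return None
    return fixed_count / total


print(fixed_ratio(5, 15))
print(fixed_ratio(0, 0))
```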
thanks,
-derek
Comment #15
agentrickard
I yield the floor on this one :-)
I have, though, edited my Module description to be helpful, as far as the info I would want when browsing.
http://drupal.org/project/mysite
Comment #16
moshe weitzman
subscribing ... this seems like good bang for the buck in terms of development hours.
Comment #17
webchick
Something Nate suggested when I showed this to him:
We should make the "overall percentage" value both visible and sortable so it's viewable from the listing "at a glance", rather than having to go to the project page to find it out. The percentage also is a very easy way for people to get the "hard data" (like what sourceforge does). Furthermore, there could be a little drop-down to restrict the module list to X percent.
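For the listing page, the sort and the "at least X percent" drop-down amount to a filter plus a descending sort. A small Python sketch, assuming the overall percentage has already been computed per project (the function name and the sample data are invented):

```python
def list_projects(projects, minimum_percent=0):
    """Return (name, percent) pairs at or above the cut-off, best first.

    `projects` maps project name -> overall percentage (0 to 100).
    """
    kept = [(name, pct) for name, pct in projects.items() if pct >= minimum_percent]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for three projects, restricted to 50 percent and up.
print(list_projects({"alpha": 40, "beta": 90, "gamma": 75}, minimum_percent=50))
```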
Comment #18
pwolanin
I know dww has been away, but any way we can make progress on this issue?
Or alternately, add a voting/rating system for end-user feedback?
Comment #19
webchick
Sure, if you want to take a stab at it, go for it!
I've found myself too busy to really make progress on it beyond the initial speccing out.
Comment #20
pwolanin
Well, I can't really imagine how I'd proceed far without access to something like scratch.d.o or some reasonable database dump I could use to make a localhost install.
Is the drupal module really reporting back useful data? I find it a bit hard to believe that it's being widely used, since it's disabled by default (and still has the bug that, even with distributed authentication turned off, the message shows up on the registration page).
Anyhow, from the link by nedjo, this looks like the essential elements of the drupal.module-related query, if it's to be used:
Comment #21
drewish
Subscribing. I got accepted for a SoC project to implement this. I just came across this issue while trying to write up a response to this comment.
Comment #22
NancyDru
There's one indicator of "quality" that I always look for, but I know from various forum posts that others don't look for it, or don't understand it. That is the issue queue.
Perhaps the system could create and display something like this:
Based on posts that I've seen many times, I think it's important for this information to be visible to the potential adopter, not hidden away for only the maintainer, many of whom don't know it exists.
As for download counts, it might be useful to the maintainer to see, but I think that's going to be used the wrong way if made completely public. I've seen too many posts from people wanting to download a module because it's popular rather than meeting a need.
And I think a report like above is much more useful than how many commits in such-and-such time. Simple and well-written modules just aren't going to have many commits and the way this is described above may give people false conceptions about a module.
Comment #23
volunteermama
Where is the current discussion on this concept?
I had some ideas (others may have already suggested them somewhere else):
Number of in-use instances divided by number of downloads... A high number means people consider and adopt it.
An acceleration (or deceleration) of whatever is being measured, like downloads or in-use counts, compared to (divided by) the average acceleration (to account for releases of Drupal, holidays)... This could show a growing new module even if in raw numbers it is smaller than the big modules.
I'm not writing it well ... Let me know if I should post somewhere else
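The two ideas above can be written down as small formulas. The Python sketch below is one possible reading, not a settled definition: "acceleration" is interpreted here as period-over-period growth, and the function names and the choice to return None for undefined cases are assumptions.

```python
def adoption_ratio(installs, downloads):
    """In-use installs per download; None if the module was never downloaded."""
    if downloads == 0:
        return None
    return installs / downloads


def relative_growth(project_counts, average_counts):
    """Project's period-over-period growth divided by the average growth.

    Each argument is a (previous_period, current_period) pair of counts.
    A result above 1.0 means the project grew faster than average, which
    cancels out site-wide effects such as a new Drupal release or a holiday.
    Returns None when the ratio is undefined (a zero in a denominator).
    """
    prev, cur = project_counts
    avg_prev, avg_cur = average_counts
    if prev == 0 or avg_prev == 0:
        return None
    project_growth = cur / prev
    average_growth = avg_cur / avg_prev
    if average_growth == 0:
        return None
    return project_growth / average_growth


# A small module doubling its downloads while the average grows by 10%.
print(relative_growth((100, 200), (1000, 1100)))
```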
Comment #25
dww
Current discussion is here: http://groups.drupal.org/node/7191
Comment #26
sun
See also: http://groups.drupal.org/node/10629
Comment #27
dww
Battle plan is now here: Project metrics for drupal.org redesign
Issue tag to follow is project metrics
For example:
#889886: Create a metrics backend framework module
#889888: Design, document and invoke an info hook for modules to advertise metrics they support
#889890: Finalize and implement data storage plan for project_metrics
#889892: Expose metrics data to views
#889894: Create a drush plugin to drive metrics processing
Comment #28
webchick
Tagging.
Comment #29
mgifford
Now that there is https://drupal.org/metrics, can we close this issue?
Comment #30
mgifford
adding related issues, but this has got to be close to the source.
Comment #31
mgifford
The type of metrics that Dries mentioned are much more user friendly than what we've got just now. I must have missed that in #29.
Adding more related issues and tags too.
Comment #32
MustangGB
Comment #33
YesCT
priority tag was left over from when d.o had office hours.
current process is different, and https://www.drupal.org/roadmap has the priorities
Comment #34
apaderno