(note: this is kind of the same as http://drupal.org/node/77976 but that issue is asking about a rating system for projects, where this is talking about gathering quality metrics in an automated way)

Dries has come up with a mock-up which makes tons of sense, attached to this issue.

Let's talk implementation details.

"This item is one of the most frequently downloaded projects"

From what table can we grab this information? Does this come from the server logs? I looked briefly through the project module schema but don't see anything that seems to match. Likely it's through server logs, but we might want to create a project_download?file=whatever type of thing.

I also think it would be cool to specify how many downloads it has, like:

"This item is one of the most frequently downloaded projects, with 34,034 downloads this month."

with both "this month" and "one of the most frequently downloaded projects" configurable.

It is maintained by 2 maintainers

This one should be easy, just a query of cvs_project_maintainers. It does raise the question, though, of whether we should implement a _project_download_statistics hook or something similar that other modules can implement, so that we don't create any dependency between cvs module and project module in either direction.
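A minimal sketch of how such a hook might keep the modules decoupled. The hook name `project_statistics`, the collector function, and the returned keys are all hypothetical; on a real site the collection would normally go through Drupal's module_invoke_all(), which is replaced here by a plain loop so the sketch runs on its own. The counts are the numbers from the mock-up text, hard-coded in place of real queries.

```php
<?php
// Hypothetical sketch: each module contributes metrics through a
// "<module>_project_statistics" function, so project.module never has to
// know about cvs.module (or the reverse). All names are illustrative.

// Would live in cvs.module; the hard-coded counts stand in for a query.
function cvs_project_statistics($project_id) {
  return array('maintainers' => 2, 'commits_past_month' => 29);
}

// Would live in the issue-tracking module.
function project_issue_project_statistics($project_id) {
  return array('active_participants' => 10, 'bugs_reported' => 24, 'bugs_fixed' => 19);
}

// Stand-in for module_invoke_all(): call the hook in every listed module
// that implements it and merge the results into one array.
function project_collect_statistics($project_id, $modules) {
  $stats = array();
  foreach ($modules as $module) {
    $function = $module . '_project_statistics';
    if (function_exists($function)) {
      $stats = array_merge($stats, $function($project_id));
    }
  }
  return $stats;
}

// 'svn' implements nothing here and is silently skipped.
$stats = project_collect_statistics(123, array('cvs', 'project_issue', 'svn'));
print_r($stats);
```

With this shape, the block only reads whatever keys come back, and a site without cvs.module simply gets fewer lines in the block.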

that [should be who but who's counting ;)] appear to be very active

Same as above, I'd like to see this mention how many commits like:

"... appear to be very active, with 29 CVS commits in the past month"

This past month, 10 different people were active in the project's issue tracker

Should be able to determine this from the project_issue tables.

Together, they reported 24 bugs and 19 got fixed.

Same.

The last change was 4 hours ago

I think we can get this from the cvs_messages table.

I'd also specify which version here like,

"The last change was 4 hours ago, in the Drupal 4.7 version."

I'll attach my rendition of one idea for the admin interface for this in a moment.


Comments

webchick’s picture

FileSize
57.54 KB

Here's an idea of how the admin interface might look for this block.

webchick’s picture

FileSize
3.28 KB

And here's the skeleton code of that mockup, if anyone wants to tweak/change stuff.

The module is for 4.7 and the mockup is found at admin/settings/mockup, just because I needed a quick and dirty way to mock up the form. ;)

killes@www.drop.org’s picture

Download stats are available from OSUOSL. We have AWStats output and can either screen-scrape that or ask for the original data.

dww’s picture

i'm actually thinking about making a "project_vcs.module" -- and moving a bunch of the crap out of cvs.module + schema into something that's VCS-independent. all the existing "CVS access" tab crap doesn't depend on cvs at all. similarly, i think every VCS has a notion of tags and branches, so a lot of the new code i'm going to write for the new release system could just live in here, and not add further dependencies on cvs.module.

basically, instead of cvs.module and svn.module implementing a bunch of the same stuff, each one should focus on the things that are absolutely *specific* to the particular VCS in question, and code/logic/data that can be shared across all VCS's to integrate w/ project nodes, should be in project_vcs.module. project_vcs.module would probably introduce a few hooks that could be implemented by cvs.module, svn.module or even someday a bzr.module (if anyone's inclined to write such a thing).

ideally, instead of the choice of VCS being site wide, each project could select if its code lives in cvs, svn, bzr, or has no code at all (e.g. the drupal.org website project). the cvs vs. svn thing isn't so much for d.o, where we're always going to have 1 repo, but for other sites where many projects might all live in the same place, and might live not just in different repos, but entirely different kinds of repos, too...

getting the download stats from the server into the DB would probably be a good idea. this block is going to get pretty expensive to compute. it's going to be even worse if it has to fopen() a server log and parse the whole thing, too. i'd rather we stuffed the download stats into the DB in some reasonable way, and let this block just worry about getting the info from there. but, i could be wrong about all that. sounds like killes might have some better ideas/info about how this does/should work. ;)
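A rough sketch of the "parse once, store the totals" idea: a function that tallies downloads per project from access-log lines, which a cron run could then write to a table for the block to read cheaply. The URL pattern, the function name, and the notion of a dedicated counts table are assumptions, and the log lines are made-up samples, not real data.

```php
<?php
// Hypothetical sketch: count downloads per project from access-log lines.
// A cron job would store the result (e.g. in a made-up
// {project_download_counts} table) so the block never parses logs itself.
function project_count_downloads($log_lines) {
  $counts = array();
  foreach ($log_lines as $line) {
    // Assumed tarball URL shape: /files/projects/<project>-<version>.tar.gz,
    // where the project name is everything before the first "-<digit>".
    if (preg_match('@GET /files/projects/([a-z0-9_]+)-\d[^ ]*\.tar\.gz@', $line, $m)) {
      $project = $m[1];
      $counts[$project] = isset($counts[$project]) ? $counts[$project] + 1 : 1;
    }
  }
  // Most-downloaded first, keeping project names as keys.
  arsort($counts);
  return $counts;
}

// Made-up sample lines in a common access-log layout.
$sample = array(
  '10.0.0.1 - - [01/Sep/2006:10:00:00 +0000] "GET /files/projects/views-4.7.0.tar.gz HTTP/1.1" 200 51234',
  '10.0.0.2 - - [01/Sep/2006:10:01:00 +0000] "GET /files/projects/cck-4.7.0.tar.gz HTTP/1.1" 200 40111',
  '10.0.0.3 - - [01/Sep/2006:10:02:00 +0000] "GET /files/projects/views-4.7.0.tar.gz HTTP/1.1" 200 51234',
  '10.0.0.4 - - [01/Sep/2006:10:03:00 +0000] "GET /node/1 HTTP/1.1" 200 900',
);
print_r(project_count_downloads($sample));
```

Note that this only sees tarball downloads; checkouts straight from CVS would not show up in such a count.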

thanks for taking the lead on this, webchick!
-derek

webchick’s picture

project_vcs.module makes a lot of sense. I especially really like the idea of each contrib project's maintainer getting to choose what VCS to use for their project. That's a bit above my level though. ;)

For this block, what should I patch against? Project.module? That doesn't really seem appropriate because of all the sub-dependencies on issues and cvs. Should I just make a custom module like project_stats_block.module that could be downloaded as a separate contrib? Or..?

nedjo’s picture

I'd say for now a patch against project.module should make sense. It could be pulled out into something else later if appropriate.

Angie, you might want to have a look at the patch to this issue, http://drupal.org/node/66013. It shows how we can load data on project usage from the data sent to drupal.org by the drupal.module. The patch may not apply any more as it's been a few months, but in any case you could look at the code and adapt it as a new metric.

Your mockup.module idea seems like a good direction. We should consider a bit what type of thresholds make most sense for the intervals. Percentages? Set numbers, as in the current draft? Ideally we'd have something that didn't need adjusting when, e.g., the number of downloads goes up. Maybe we can find a handy script, e.g. for charting, that will break data into intervals for us. Likely there's useful stuff already in our project statistics generation. Possibly we could use a reusable function that we can send data to and return an appropriate response.


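function project_interval($score, $min, $max, $intervals) {
  $count = count($intervals);
  if ($count == 0 || $max <= $min) {
    return NULL;
  }
  // Find out where $score falls between $min and $max, given the number of intervals.
  // Placeholder logic (an assumption, not settled): equal-width buckets.
  $position = ($score - $min) / ($max - $min);
  // Clamp so out-of-range scores and $score == $max stay within the array.
  $interval = max(0, min($count - 1, (int) floor($position * $count)));

  return $intervals[$interval];
}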

We would then return the text something vaguely like this:

// Number of downloads for project x.
$score = 1534;
// Number of downloads for most-downloaded project.
$max = 26876;
// Number of downloads for least-downloaded project.
$min = 140;
// Intervals.
$intervals = array(
  t('rarely downloaded'),
  t('not very popular'),
  t('fairly popular'),
  t('frequently downloaded'),
  t('one of our most popular'),
);

$downloads = project_interval($score, $min, $max, $intervals);
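
On the question of thresholds that don't need adjusting as download numbers go up, one possible direction is to pick the label from a project's rank among all projects instead of from fixed cut-offs. The sketch below is only an illustration: the function name and the list of other projects' counts are made up (only 140, 1534 and 26876 come from the example above), and t() is left out so it runs outside Drupal.

```php
<?php
// Hypothetical variant of project_interval(): choose the label from the
// share of projects that score lower, so the buckets follow the data.
function project_interval_by_rank($score, $all_scores, $labels) {
  $total = count($all_scores);
  $count = count($labels);
  if ($total == 0 || $count == 0) {
    return NULL;
  }
  // Count the projects with a strictly lower score.
  $below = 0;
  foreach ($all_scores as $other) {
    if ($other < $score) {
      $below++;
    }
  }
  // Map the rank onto the label list; clamp in case $score beats everything.
  $index = min($count - 1, (int) floor($below * $count / $total));
  return $labels[$index];
}

$labels = array(
  'rarely downloaded',
  'not very popular',
  'fairly popular',
  'frequently downloaded',
  'one of our most popular',
);
// Download counts for every project on the site (illustrative values).
$all_scores = array(140, 900, 1534, 5200, 26876);
print project_interval_by_rank(1534, $all_scores, $labels) . "\n";
```

With equal-width buckets a single very popular project pushes almost everything else into the lowest label; ranking avoids that, at the cost of needing every project's score.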

dww’s picture

For this block, what should I patch against?

very good question...

Should I just make a custom module like project_stats_block.module

at this point, i'd probably vote for that. one of the things i've been trying to do is make the project codebase more modular. when i first took over, i was almost paralyzed by the size of the code and the complexity. it was a great learning experience ripping all the issue tracking out into a separate module. i'd rather keep moving in this direction, and making the separate chunks of functionality live in different files, so that it's easier to grok any individual file and work on it if that's the area of project where you've got an itch to scratch...

not everyone using project.module and friends will want/need this block, it's going to have its fingers in too many other modules to make sense in any of them, and it'll probably be easier to just have it as a stand-alone chunk of code. i'd have to think about it some more to decide if it should just be committed into a modules/project/contrib directory inside the main project directory (or right next to project.module, in fact), or if it should be a totally stand-alone project node/tarball for itself. for now, i'd just write it as a single .module file (i don't think it'll need any DB tables of its own) which you put in a sandbox or attach copies of to this issue, and we can decide later where to actually commit it for real. sound good?

thanks!
-derek

p.s. i just saw nedjo's post as i was writing this. i still think i'd rather this was in a separate module, not as a patch to project.module. but, other than that, a huge +1 to everything nedjo just said. ;)

webchick’s picture

Cool. Thanks, guys. I'm going to try and do some work on this this week. :)

webchick’s picture

Assigned: Unassigned » webchick
Priority: Normal » Critical

Ok code freeze happened, so "this week" didn't happen. ;)

Bumping priority to critical and assigning to myself. Now my goal is to try and get this done for the 5.0 release. :)

webchick’s picture

I discussed with dww a plan for implementing this. Since he's hard at work on the new release system for contributed modules, the backend of project module and its sub-modules will probably change quite significantly in the next little while. Therefore, it probably makes more sense to start at the "high-level" architectural level rather than worry about implementation at this stage.

We were talking on IRC and drewish came up with the idea to make the various factors weighted so we could tweak it, and make adjustments later on. Ideally, this would be built in a general way so sites outside of Drupal.org could tweak it to their needs. So we come up with a list of criteria, and people can add/delete criteria and weight them according to what should make the project more/less "healthy".

So here are some of the criteria we came up with:

  • Age of the project
  • Number of releases
  • # of downloads (all agreed this should not be weighted highly, because a) it can be manipulated easily and b) a lot of people download stuff from cvs and never use the download link -- is there a way of tracking this somehow??)
  • # of sites implementing this module -- cull from drupal module?
  • Length of time since last commit
  • Average commits per (some time frame) -- indicates the love it's seeing from its maintainer
  • # of outstanding bugs
  • average length of time that bugs stay outstanding per (some time frame)
  • # of posts to an issue queue per (some time frame) -- represents how many people "care" about a module
  • Other??
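
One way the weighted criteria might be combined is a weighted average, sketched below. This assumes each criterion has already been normalised to a value between 0 and 1 (1 being healthiest); the function name, criterion keys, weights and scores are all made up for illustration, with downloads deliberately given a low weight as discussed in the list.

```php
<?php
// Hypothetical sketch of a weighted "health" score. Sites could add or
// remove criteria and change weights without touching this function.
function project_health_score($scores, $weights) {
  $total = 0;
  $weight_sum = 0;
  foreach ($weights as $criterion => $weight) {
    // Criteria a site has not measured are skipped rather than counted as 0.
    if (isset($scores[$criterion])) {
      $total += $scores[$criterion] * $weight;
      $weight_sum += $weight;
    }
  }
  // Weighted average, expressed as a whole percentage.
  return $weight_sum > 0 ? round(100 * $total / $weight_sum) : NULL;
}

// Illustrative weights and normalised scores.
$weights = array('downloads' => 1, 'recent_commit' => 3, 'open_bugs' => 3, 'issue_activity' => 2);
$scores = array('downloads' => 0.5, 'recent_commit' => 1.0, 'open_bugs' => 0.5, 'issue_activity' => 0.75);
print project_health_score($scores, $weights) . "%\n";
```

Dividing by the sum of the weights actually used keeps the result on the same 0 to 100 scale whichever criteria a given site enables.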

dww’s picture

@# of sites implementing this module -- cull from drupal module?
see: http://drupal.org/node/66013 and http://drupal.org/node/66015

i'm too tired to think more clearly about the rest of this right now. hopefully i'll find time/energy to look at this more closely in the near future...

agentrickard’s picture

Granted, I only maintain one module, and it's brand new (http://drupal.org/project/mysite), but I don't like this metric:

that [should be who but who's counting ;)] appear to be very active

Same as above, I'd like to see this mention how many commits like:

"... appear to be very active, with 29 CVS commits in the past month"

Or these:

# Length of time since last commit
# Average commits per (some time frame) -- indicates the love it's seeing from its maintainer

I don't like this for the simple reasons that I have tried (very hard) to minimize the number of commits that I have to make. If I have written the module correctly, I won't need to commit bugfixes frequently.

The frequency and volume of commits is not a good measure of the quality of maintenance. It may instead be an indication of bad code or features that weren't planned very well.

So, my argument is that these stats are an inaccurate representation of what we're really trying to accomplish, which is some indication that the module maintainer is paying attention.

How about, instead or in addition, we scrape the accesslog to see the last time the maintainer(s) looked at the issue queue (which I do daily)? Or scrape the last time the maintainer added to / commented on / changed the issue queue.

webchick’s picture

Eh... maybe. I definitely wouldn't take CVS commits as a whole to be a measure of quality, but I think encompassed within the rest of the stats they're a nice indication of activity from the maintainer.

Drupal core gets frequent CVS commits and I don't think one would call it a poor quality project. And no matter how well you code something, there will always be bugs, and there will always be new features that users want to add. Frequent small commits also mean you're backing up changes into logical chunks rather than one huge commit that "fixes 12 bugs with various stuff."

Scraping stats seems like rather a clunky solution to me. It also rules out folks who are getting issue notifications by e-mail, and only hit their issue queues when there's something to look at.

dww’s picture

basically, i'm w/ webchick on this. if nothing else, you *MUST* commit some changes every 6 months to port to the new version of core (since it *WILL* break your module) or you're not responsive/active. so, if the little box says "last commit: 13 months ago", i know this module is dead in the water.

and, in spite of how brilliant we all think we are, we will have bugs, or it won't quite work right on pgsql, or whatever. ;)

plus, lots of small commits is, ultimately, the better way to use a revision control system. minimizing your # of commits isn't necessarily something to strive for, and it doesn't really prove you're a great, thoughtful developer (usually it just means you leave a lot of uncommitted changes in a workspace and commit in large batches, which has its own problems). but, people have different styles, and ultimately, what matters is how good the code is, not the frequency and size of the commits.

however, the main point is that none of the various metrics proposed are supposed to be the only quality metric, they're all just "possible indicators of quality". better to provide all the stats we can easily display in an intelligent way, and let people decide for themselves which metrics are important to them and which are not.

activity in the issue queue (# of open vs. # of fixed, time since last fixed, etc) are all good indicators, and are discussed above. time since the maintainer looked at the issue queue i think is less interesting/informative than time since the maintainer resolved something. of course, small, well-written modules that Just Work(tm) won't have any issues, and then it'll be a long time since something was resolved, which is why the ratio of open to fixed is important.
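
A small sketch of the open-to-fixed idea, under the assumption that the two counts are already available from the issue tables. The function name is hypothetical. It returns NULL when a project has no issues at all, so a module that Just Works is reported as "no data" rather than as unresponsive.

```php
<?php
// Hypothetical helper: share of a project's bug reports that were fixed.
function project_fixed_ratio($open, $fixed) {
  $total = $open + $fixed;
  // No issues at all is "unknown", not "0% fixed".
  return $total > 0 ? $fixed / $total : NULL;
}

// Numbers from the mock-up text: 24 bugs reported, 19 fixed, so 5 still open.
$ratio = project_fixed_ratio(5, 19);
print round($ratio * 100) . "% of reported bugs fixed\n";
```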

thanks,
-derek

agentrickard’s picture

I yield the floor on this one :-)

I have, though, edited my Module description to be helpful, as far as the info I would want when browsing.

http://drupal.org/project/mysite

moshe weitzman’s picture

subscribing ... this seems like good bang for the buck in terms of development hours.

webchick’s picture

Something Nate suggested when I showed this to him:

We should make the "overall percentage" value both visible and sortable so it's viewable from the listing "at a glance", rather than having to go to the project page to find it out. The percentage also is a very easy way for people to get the "hard data" (like what sourceforge does). Furthermore, there could be a little drop-down to restrict the module list to X percent.
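
A minimal sketch of the "restrict to X percent, sortable" listing, assuming each project row already carries its overall percentage. The function names, the 'title' and 'score' keys, and the sample rows are all invented for illustration; on a large site the filtering and sorting would more likely happen in the SQL query.

```php
<?php
// Hypothetical sketch: keep projects at or above a minimum percentage and
// list the highest-scoring first.
function project_filter_by_score($projects, $min_percent) {
  $kept = array();
  foreach ($projects as $project) {
    if ($project['score'] >= $min_percent) {
      $kept[] = $project;
    }
  }
  usort($kept, 'project_compare_score');
  return $kept;
}

// Comparison callback: higher score first, ties broken by title.
function project_compare_score($a, $b) {
  if ($a['score'] == $b['score']) {
    return strcmp($a['title'], $b['title']);
  }
  return $a['score'] > $b['score'] ? -1 : 1;
}

// Made-up rows standing in for a module listing.
$projects = array(
  array('title' => 'Gamma', 'score' => 91),
  array('title' => 'Alpha', 'score' => 64),
  array('title' => 'Beta', 'score' => 91),
  array('title' => 'Delta', 'score' => 38),
);
foreach (project_filter_by_score($projects, 60) as $project) {
  print $project['title'] . ': ' . $project['score'] . "%\n";
}
```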

pwolanin’s picture

Version: x.y.z » 5.x-1.x-dev

I know dww has been away, but any way we can make progress on this issue?

Or alternately, add a voting/rating system for end-user feedback?

webchick’s picture

Assigned: webchick » Unassigned

Sure, if you want to take a stab at it, go for it!

I've found myself too busy to really make progress on it beyond the initial speccing out.

pwolanin’s picture

Well, I can't really imagine how I'd proceed far without access to something like scratch.d.o or some reasonable database dump I could use to make a localhost install.

Is the drupal module really reporting back useful data? I find it a bit hard to believe that it's being widely used, since it's disabled by default (and still has the bug that, even with distributed authentication turned off, the message shows up on the registration page).

Anyhow, from the link by nedjo, this looks like the essential elements of the drupal.module-related query, if it's to be used:

+        case 'mostused':
+          return array(
+            'fields' => array('COUNT(cs.name) AS mostused'),
+            'joins' => array('INNER JOIN {client_system} cs ON p.uri = cs.name'),
+            'group_bys' => array('n.nid'),
+            'order_bys' => array('mostused DESC', 'n.title ASC')
+          );

drewish’s picture

Assigned: Unassigned » drewish

Subscribing. I got accepted for a SoC project to implement this. I just came across this issue while trying to write up a response to this comment.

NancyDru’s picture

There's one indicator of "quality" that I always look for, but I know from various forum posts that others don't, or don't understand. That is the issue queues.

Perhaps the system could create and display something like this:

This display shows how long issues have remained in each status for this contribution.

Status                   Average          Longest
-----------------------  ---------------  ----------------
active                   1 week, 2 days   8 weeks, 1 day
active, needs more info  3 days           8 weeks, 2 days
patch needs review       3 weeks, 4 days  16 weeks, 3 days
fixed                    4 weeks, 5 days  32 weeks, 1 day
won't fix                1 week, 4 days   2 weeks, 5 days
closed                   6 weeks, 6 days  34 weeks, 3 days

Note that "closed" status happens automatically two weeks after an issue has been marked with one of the resolution statuses ("fixed," "by design," etc.) and all activity ceases. This can "inflate" the average times and should be taken into account.
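
A sketch of how the averages and maximums in such a table might be computed. It assumes each issue is available as an array with a 'status' and an 'age' in seconds (time spent in that status); the function name and that data shape are hypothetical, and the real values would have to come from the project_issue tables. Drupal's format_interval() could then turn the seconds into "3 weeks, 4 days" style text.

```php
<?php
// Hypothetical sketch: average and longest issue age per status.
function project_issue_age_summary($issues) {
  $summary = array();
  foreach ($issues as $issue) {
    $status = $issue['status'];
    if (!isset($summary[$status])) {
      $summary[$status] = array('count' => 0, 'total' => 0, 'longest' => 0);
    }
    $summary[$status]['count']++;
    $summary[$status]['total'] += $issue['age'];
    $summary[$status]['longest'] = max($summary[$status]['longest'], $issue['age']);
  }
  // Add the per-status average once all issues have been tallied.
  foreach ($summary as $status => $row) {
    $summary[$status]['average'] = $row['total'] / $row['count'];
  }
  return $summary;
}

$day = 86400;
// Made-up issues for illustration.
$issues = array(
  array('status' => 'active', 'age' => 2 * $day),
  array('status' => 'active', 'age' => 4 * $day),
  array('status' => 'fixed', 'age' => 7 * $day),
);
print_r(project_issue_age_summary($issues));
```

To deal with the automatic "closed" transition noted above, the two weeks could be subtracted from closed issues before they are passed in.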

Based on posts that I've seen many times, I think it's important for this information to be visible to the potential adopter, not hidden away for only the maintainer, many of whom don't know it exists.

As for download counts, it might be useful to the maintainer to see, but I think that's going to be used the wrong way if made completely public. I've seen too many posts from people wanting to download a module because it's popular rather than meeting a need.

And I think a report like above is much more useful than how many commits in such-and-such time. Simple and well-written modules just aren't going to have many commits and the way this is described above may give people false conceptions about a module.

volunteermama’s picture

Where is the current discussion on this concept?
I had some ideas (others may have already suggested them somewhere else):
Number of in-use instances divided by number of downloads... A high number means people consider and adopt it.

An acceleration (or deceleration) of whatever we measure, like downloads or in-use counts, compared to (divided by) the average acceleration (to account for releases of Drupal, holidays)... This could show a growing new module even if in raw numbers it is smaller than the big modules.
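
The two ideas above might look something like the sketch below. Both function names are hypothetical, and the second one is a simplification: it compares period-over-period growth rather than a true acceleration, and it assumes the site-wide average growth has been computed elsewhere. The numbers in the usage lines are made up.

```php
<?php
// Hypothetical sketch: share of downloads that end up as installed sites.
function project_adoption_ratio($in_use, $downloads) {
  return $downloads > 0 ? $in_use / $downloads : NULL;
}

// Hypothetical sketch: a project's growth relative to the average growth
// across all projects, which discounts site-wide effects such as a new
// Drupal release or a holiday. Growth is this period's count divided by
// the previous period's count.
function project_relative_growth($previous, $current, $average_growth) {
  if ($previous <= 0 || $average_growth <= 0) {
    return NULL;
  }
  return ($current / $previous) / $average_growth;
}

// A small module going from 40 to 60 downloads while the site as a whole
// grew by a factor of 1.2 is growing faster than average (result above 1).
print project_adoption_ratio(300, 1200) . "\n";
print project_relative_growth(40, 60, 1.2) . "\n";
```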

I'm not writing it well ... Let me know if I should post somewhere else

dww’s picture

Current discussion is here: http://groups.drupal.org/node/7191

webchick’s picture

Issue tags: +Drupal.org priority, +Developer improvements, +Business improvements, +Site builder improvements

Tagging.

mgifford’s picture

Can we close this issue now that there is https://drupal.org/metrics?

mgifford’s picture

Version: 5.x-1.x-dev » 7.x-2.x-dev
Issue tags: +core metrics on drupal.org, +metrics, +project metrics
Related issues: +#1979998: Track community metrics to gauge community health.

The type of metrics that Dries mentioned are much more user friendly than what we've got just now. I must have missed that in #29.

Adding more related issues and tags too.

MustangGB’s picture

Priority: Critical » Major

YesCT’s picture

priority tag was left over from when d.o had office hours.
current process is different, and https://www.drupal.org/roadmap has the priorities

apaderno’s picture

Assigned: drewish » Unassigned