I want to develop something like http://eaves.ca/2011/04/07/developing-community-management-metrics-and-t... for Drupal.

The first goal is to figure out a list of metrics that make sense to capture. This is happening in the Prairie group at http://groups.drupal.org/node/144624

Then I'll need to add a drupalorg_metrics module which outputs this data with pretty graphs. Will probably be important to cache the counts so this doesn't get regenerated on every page request, since some of these queries are likely to be expensive. :(

Marking postponed on the outcome of that discussion.

Comments

drumm’s picture

I'd like to see the sampler module used, since we put some time into it for the big Drupal.org redesign push and it should be used for project metrics too. I hear it is quite flexible.

drumm’s picture

And simply using the metrics from drupalorg_get_activity() would be a decent place to start.

dww’s picture

Yup, we definitely want to use sampler. In fact, we're ready to deploy that and start gathering data on about 5 Project*-related metrics ASAP, but sadly we never heard from nnewton to make it happen. :( See #893912: deploy Sampler API with project metrics for more...

moshe weitzman’s picture

Assigned: webchick » moshe weitzman
Status: Postponed » Needs review
Attachments (status, size):
  • new, 3.21 KB
  • new, 187 bytes

Attached is a good start on a /metrics page. You can see it at http://sampler-drupal.redesign.devdrupal.org/metrics (drupal/drupal for login). The VM data is a bit stale, so please don't review the charts for accuracy :). The charts are generated by Flot. It took me a long time to figure out Flot, but I chose it over Google Charts because I understand that it is preferred by the Infra team. If I am wrong, let me know.

The architecture for data collection is the same as the counts for issue links in the Contributor Links block. We gather the data during hook_cron() and save one variable, drupalorg_metrics_counts, to the variables table. All page requests just do a variable_get() during rendering.

For this iteration of the page, I'm omitting Downloads. Those are more complicated since they need to be summarized from the web logs and sampled using the Sampler module.

lars toomre’s picture

The indentation in the module file is a bit messed up. Perhaps tabs were used in places instead of two spaces? Also, I do not have permission to view the .info file.

dww’s picture

Cool and sad. Nice to see progress. *Really* wish that progress was in the form of getting sampler deployed, writing these things as sampler metrics, and solving the "how to chart sampler data" problem in general instead of custom flot work. But, anything is better than nothing and perfect can't be the enemy of good. I'm just sayin'... ;)

Thanks,
-Derek

moshe weitzman’s picture

Attachments (status, size):
  • new, 187 bytes

Renamed info file so it can be viewed (apparently d.o allows .info to be uploaded but not viewed - WTF).

drumm’s picture

Status: Needs review » Needs work
  • The full width of the page is 940px, not 900.
  • Does flot require inline styles? We have classes for all the grid widths, the whole page would be class="grid-12 alpha omega", alpha and omega because it is the first and last element in the row. The height would be best in Bluecheese's CSS, all our custom CSS goes to that one place.
  • Why only go back to 2007? Why not always end in the current year?
  • I am a bit concerned about the data storage since all variables are loaded on every page load. How big does this get? In drupalorg_get_activity():
    • We do not trigger on cron, instead we use a custom drush command, so we can individually control the schedules with Jenkins.
    • We use the drupalorg cache bin, not the variables system. On the live site, I believe it is backed by memcache.
    • Since it is a cache, and uncached requests are lengthy, we use a lock.
  • This counts unpublished spam.
  • I attached a patch fixing up some minor code style. Also note that mktime handles month overflow for you: "[Month] values greater than 12 reference the appropriate month in the following year(s)." The patch is made with git format-patch, see http://drupal.org/node/1054616.
  • I do agree with dww in #6.
  • I personally like the pile-everything-into-one-module strategy, organization can happen with include files as needed.
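To make the mktime remark in the list above concrete: a month value above 12 rolls over into the following year(s), so a loop can count months 1, 2, ..., 24 from a start year without resetting. The sketch below reproduces that arithmetic in Python, whose datetime rejects a month of 13, so the carry is done by hand. The function names are hypothetical, and UTC is assumed to keep the result independent of the server's timezone.

```python
from datetime import datetime, timezone


def normalize_month(year, month):
    """Roll a month outside 1..12 into the right year, as PHP's mktime() does."""
    # Use a zero-based month index so divmod produces the year carry.
    carry, index = divmod(month - 1, 12)
    return year + carry, index + 1


def month_start_timestamp(year, month):
    """Unix timestamp for the first second of a (possibly overflowing) month."""
    y, m = normalize_month(year, month)
    return int(datetime(y, m, 1, tzinfo=timezone.utc).timestamp())


def month_starts(first_year, last_year):
    """One timestamp per month, counting months upward from first_year."""
    total_months = 12 * (last_year - first_year + 1)
    return [month_start_timestamp(first_year, m) for m in range(1, total_months + 1)]
```

With this, month 13 of 2007 and month 1 of 2008 give the same timestamp, which is the behaviour the quoted sentence describes.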

drumm’s picture

Attachments (status, size):
  • new, 10.17 KB

I double checked the indentation - it is tabs instead of spaces. Attached is a full patch fixing that and removing trailing whitespace.

moshe weitzman’s picture

Status: Needs work » Needs review
Attachments (status, size):
  • new, 3.21 KB
  • new, 187 bytes

1. Done
2. I tried for a while to not use inline styles. I was unable to get flot to work that way. Would be great if someone else could figure this out.
3. We go back to 2007 because going back further expands the range of the X and Y axes, which makes it hard to make out detail. Further, I think prior years are not really relevant for monitoring how the community is doing. Research projects need much more detail than these graphs are intended to provide. The data collection always stops in the current month, so the graphs will always be current. Cron is not running on the demo site, so that's why it stops a bit early.
4. OK, switched to drush+cache_get system like drupalorg_get_activity(). The drush command you need is drush eval "drupalorg_metrics_record_stats(2007, 2030);". I don't see a need for locks here. The read to build the graphs is done in a different code path from the data collection. The data collection will be done during cron using a series of fully indexed queries that won't block a thing.
5. I added status checks on users and comments to combat the spam problem. I really think such spam should be deleted and not unpublished, but OK. I omitted the check for nodes because those get less spam and we deliberately unpublish content as it ages. Those past years shouldn't get penalized.
6. This patch is based on yours.
7. Duly noted.
8. I've kept the separate module. Your way has merit for sure. At the same time, it is handy to just disable a module when the server is crashing, or functionality is no longer wanted.

drumm’s picture

A little drush command would be good to have. The more we put under version control, the better. Our Jenkins configuration is not version controlled. I'm not too worried about changing 2007 or 2030 any time soon, so the extra work committing and deploying is fine.

Locking is good, even if there never is much contention. Drupal.org is a big, busy place, so I'd rather be prepared for the unlikely event that a bunch of people and/or robots are looking for metrics when the thing is uncached. (This was something that caused problems for the home page metrics when we launched the redesign; not the same traffic here, but the lesson stuck with me.)

How does this fit into the rest of the site? Should there be navigation to it? What links to it?

moshe weitzman’s picture

Attached are the drush command and a new module with locking. The same .info is attached as well.

drumm’s picture

Status: Needs review » Fixed

I did a bit more cleanup that you can see in the git log for this project and deployed: http://drupal.org/metrics.

moshe weitzman’s picture

Thanks Neil. Way to deliver! I will work on adding links to the page.

drumm’s picture

I went ahead and cleared out the sampler dev site since this is done and it is quite old. Doing a new round of sampler work is just installing modules. As always, request a new dev site if you need it.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

greggles’s picture

Status: Closed (fixed) » Needs work

This stopped displaying data at some point. The cached version in Google is from April 18th and looks fine, so it's a pretty recent breakage.

greggles’s picture

Status: Needs work » Closed (fixed)

And back. I guess it was just the time I hit it?

lizzjoy’s picture

No, it is not showing data right now.

greggles’s picture

Status: Closed (fixed) » Needs work

And empty again :/

mgifford’s picture

Status: Needs work » Fixed
Issue tags: +maintain

I'm going to close this because /metrics is working again.

However, I think that having access to this data is only part of what David Eaves was getting at here:
http://eaves.ca/2011/04/07/developing-community-management-metrics-and-t...

How do we use this to affect participation? See #2186377: Highlight projects that follow Best Practices

I'm not sure that this is fine-grained enough to let us address participation in Core or in the most popular D7 modules we all depend on.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

  • Commit 035f57c on 6.x-3.x, 7.x-3.x-dev by drumm:
    [#1182998] code cleanup.
  • Commit 252520a on 6.x-3.x, 7.x-3.x-dev by drumm:
    #1182998 Fix whitespace.
  • Commit 4453134 on 6.x-3.x, 7.x-3.x-dev by drumm:
    [#1182998] Fix typo.
  • Commit 4af1315 on 6.x-3.x, 7.x-3.x-dev by drumm:
    [#1182998] Comments #10 and #12- new drushrc, caching, locking, and...
  • Commit 8f43775 on 6.x-3.x, 7.x-3.x-dev by drumm:
    [#1182998] Remove cron, it is replaced by drush.
  • Commit b76b235 on 6.x-3.x, 7.x-3.x-dev by drumm:
    #1182998 code style.
  • Commit e3a957c on 6.x-3.x, 7.x-3.x-dev authored by moshe weitzman, committed by drumm:
    #1182998 Initial buildout of module.