I'm not exactly sure why, but https://www.drupal.org/api mentions that if you have API change requests, you should post them here. :) So, here goes!

I spend a good chunk of my Drupal life looking at issues in Google Spreadsheets in order to do issue triage, figure out what to fund through D8 Accelerate, etc. Google Spreadsheets has an API. Therefore, I thought a perfect use case for the Drupal.org API would be to populate a Google Spreadsheet with it. Here's what I found as I was doing so.

Problem #1: No way to do "AND/OR" queries within a given filter, esp. for status/version.

The most common thing I want to do is replicate the "- Open issues -" filter on any given issue view like https://www.drupal.org/project/issues/user. (Ideally, excluding "fixed," but I could always filter that in the spreadsheet.)

You can do a query like this to get critical, needs review issues against Drupal core 8.0.x, which is great:
https://www.drupal.org/api-d7/node.json?type=project_issue&field_project...

However, you can't do a query for "active|needs review|needs work|RTBC|postponed," so you must instead do one query for each status you care about. (In this case, at least 5 separate queries; more if there's a pager on any of them.)

So that's not ideal, but I can get by with 5 queries as long as I only want to query multiple values on a single filter. However, I'm always also going to want to replicate the issue view's "- 8.x issues -" filter, and so the number of queries multiplies. There are currently 19 8.0.x-something versions. 5*19 = 95 queries.
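To make the fan-out concrete, here is a rough Python sketch that builds one URL per status/version pair. The parameter names `field_issue_status` and `field_issue_version`, the status labels used as values, and the version labels are all assumptions for illustration (the example query above is truncated, and the real API may expect numeric IDs instead).

```python
from itertools import product
from urllib.parse import urlencode

# Assumed endpoint and parameter names; illustrative, not confirmed.
BASE_URL = "https://www.drupal.org/api-d7/node.json"


def build_queries(statuses, versions, extra=None):
    """Return one query URL per (status, version) pair."""
    urls = []
    for status, version in product(statuses, versions):
        params = {"type": "project_issue"}
        params.update(extra or {})
        params["field_issue_status"] = status    # hypothetical filter name
        params["field_issue_version"] = version  # hypothetical filter name
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


open_statuses = ["active", "needs review", "needs work", "RTBC", "postponed"]
# Placeholder labels standing in for the 19 real 8.0.x-something versions.
versions = [f"8.0.x-placeholder{i}" for i in range(19)]

urls = build_queries(open_statuses, versions)
print(len(urls))  # 5 * 19 = 95
```

Every extra multi-value filter multiplies the count again, and that is before any paging.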

(Since the criticals are pretty tightly triaged, I'm hard-coding version to 8.0.x-dev for now to avoid this problem, but there's definitely a chance I miss issues this way.)

One possible solution is to fix this in RestWS (relevant issue at #2308939: Make AND queries possible)

Another is to somehow expose those "- Open issues -" and "- 8.x issues -" filters at some weird ID like 999 or something so they won't conflict with anything else.

Problem #2: No way to determine a term's name without doing an extra query per term.

It's super-useful to be able to further filter this list by things like "has the Performance tag, but not the Triaged D8 Critical tag" or whatever. In theory, this is pretty easy, since an issue node returned by a query like the above has an array of its issue tags located in its taxonomy_vocabulary_9 property, so I could just concatenate them into comma-separated values and do a plain text search in the spreadsheet.

The problem is, the array only includes the term ID, not the term name, and I am just not nerdy enough to have all 10,000+ issue tag term IDs memorized (it is one of my many flaws). So I then need to query Drupal.org again for each term attached to an issue. Let's call the average number of tags attached to an issue (especially a critical Drupal core issue) 5. There are about 90 critical issues. That's an additional 450 requests to build this list. :(

(For now I am querying the database directly to populate a "look-up sheet" of common tag IDs/names to check first, hopefully cutting that number down significantly, but "normal" people will not have this option.)
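A minimal Python sketch of that look-up-first approach, under stated assumptions: `fetch_name` is a hypothetical callable that would make one API request per term ID, the term IDs and the `term-N` names are made up, and the tag array is simplified to a plain list of IDs.

```python
class TermNameCache:
    """Resolve term IDs to names, asking the API only for unknown IDs."""

    def __init__(self, fetch_name, preloaded=None):
        self._fetch_name = fetch_name         # hypothetical: one request per call
        self._names = dict(preloaded or {})   # the "look-up sheet"
        self.requests_made = 0

    def name(self, tid):
        if tid not in self._names:
            self._names[tid] = self._fetch_name(tid)
            self.requests_made += 1
        return self._names[tid]

    def tags_csv(self, tids):
        """Join term names into one comma-separated cell for the spreadsheet."""
        return ", ".join(self.name(tid) for tid in tids)


def fake_fetch(tid):
    # Stand-in for a real request to the term endpoint.
    return f"term-{tid}"


# 90 issues with 5 tags each, drawn from a made-up pool of 20 term IDs.
issues = [[(i + j) % 20 for j in range(5)] for i in range(90)]

cache = TermNameCache(fake_fetch, preloaded={0: "Performance", 1: "Triaged D8 Critical"})
rows = [cache.tags_csv(tids) for tids in issues]

print(rows[0])              # Performance, Triaged D8 Critical, term-2, term-3, term-4
print(cache.requests_made)  # 18 requests instead of 450
```

The saving depends entirely on how much the tags overlap between issues; with no overlap and an empty look-up sheet it is still one request per tag.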

I couldn't find a correlating RestWS issue, but adding the term name as a property in the taxonomy_vocabulary_9 array would be hugely helpful.

There are probably others, but that's as far as I've gotten so far.

Comments

Gábor Hojtsy

I also wrote some very basic drupal.org API code to crunch the data for http://hojtsy.hu/blog/2015-jan-06/2014-review-multilingual-drupal-perspe... and the hardest part is that where external data is referenced, we only get an ID and need to do bazillions of extra requests (and hopefully cache locally). Angie pointed the problem out for tags, but users have the same problem, e.g. you need another request to find out who an issue is assigned to. It would be amazing if some pre-loaded data were in the output, for labels at least, since that is the most common need for user names, terms, etc.

mglaman

Thanks for opening this webchick! I've been meaning to do a write up on my experience with the Drupal.org API + ContribKanban.com

In short, each list is its own query. Although it can't be, because the API doesn't support AND/OR/Exclude. The real issue here is being able to pick multiple status states. Currently I do a forEach() on the status array and space the requests out at 1 second each so things don't get bogged down. I cache responses, but eventually I have to invalidate them, so it's just a band-aid.
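A rough Python sketch of that throttle-and-cache loop (not the actual ContribKanban code, which uses a JavaScript forEach()). The `fetch` callable, the 300-second expiry, and the dict-based cache are assumptions for illustration; only the one request per status and the 1-second spacing come from the description above.

```python
import time


def fetch_all_statuses(statuses, fetch, cache, delay=1.0, ttl=300.0,
                       sleep=time.sleep, now=time.monotonic):
    """Fetch one result set per status, pausing between real requests.

    cache maps status -> (timestamp, data); entries older than ttl are refetched.
    """
    results = {}
    first_request = True
    for status in statuses:
        entry = cache.get(status)
        if entry is not None and now() - entry[0] < ttl:
            results[status] = entry[1]  # still fresh, no request needed
            continue
        if not first_request:
            sleep(delay)                # space out consecutive requests
        first_request = False
        data = fetch(status)
        cache[status] = (now(), data)
        results[status] = data
    return results


fetched = []   # records which statuses triggered a "request"
pauses = []    # records the pauses instead of actually sleeping


def fake_fetch(status):
    # Stand-in for one API request filtered by a single status.
    fetched.append(status)
    return [f"issue with status {status}"]


cache = {}
statuses = ["active", "needs review", "needs work"]
first = fetch_all_statuses(statuses, fake_fetch, cache, sleep=pauses.append)
second = fetch_all_statuses(statuses, fake_fetch, cache, sleep=pauses.append)

print(len(fetched), len(pauses))  # 3 requests and 2 pauses; the second pass hit the cache
```

The expiry is the weak point: too short and the requests come back, too long and the board shows stale issues.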

joshuami

Component: Other » Blocked IPs
Issue tags: +drupal.org hitlist

I agree these would be valuable improvements. I'm adding it to our hitlist, but it will be a while before I can free up staff to work on it due to other priorities on the roadmap. Is there someone in the community that would like to take a look and try to tackle this?

If anyone has any names, I'm happy to reach out to them.

mglaman

joshuami, I'd be more than willing to try to help out. I was going to try to work on the RestWS queue in the coming days.

Gábor Hojtsy

Component: Blocked IPs » Other

Not about blocked IPs?

joshuami

@mglaman, the work on RestWS would be awesome. Connect with mixologic and drumm on IRC in #drupal-infrastructure. We'll definitely need to include some performance testing before we deploy to production for Drupal.org.

@Gábor Hojtsy Thanks for catching that I accidentally added the wrong component. I was checking to see if we had an API component on Drupal.org customizations. @webchick is correct that it is a little odd to put these requests in infrastructure. The select list apparently grabbed the first component when I switched back to infrastructure. We might move this issue to a better place and update the documentation.

drumm

I'm not exactly sure why, but https://www.drupal.org/api mentions that if you have API change requests, you should post them here. :) So, here goes!

The API is affected by various projects, including some infrastructure pieces. Whether infrastructure or drupalorg is the best place to start triaging is a bit of a toss-up. I went with infrastructure because it is lower level.

#1978202: Return file info instead of a callback url is the most-reported issue I know about, for reducing the HTTP requests to get information about files. #1676732: Add the taxonomy term might be hijacked for terms.

With #2308939: Make AND queries possible, that might cover everything reported here.

drumm

And of course, parallel work on D8's core REST should happen, so we aren't risking regressions in the future.

webchick

Actually, yeah, between those three issues (assuming whatever's done for terms can also be done for users to get usernames) that'd address everything here.

Although klausi's reply at #1676732: Add the taxonomy term is not very promising.

drumm

Since RestWS and D8's core REST are related, I'd expect RestWS to follow D8's lead.

Or, because of any code freeze/slush, decide on the best practice way to alter in the extra data for both.

irinaz

@webchick, is this issue still relevant? If yes, could you give more details? If no, could we close this issue?