The Ooyala API has a rate limit of 300 requests per minute but unfortunately this module doesn't handle being rate limited very well.

Example incident

For example, we were using the ooyala_taxonomy submodule to sync labels, and the effect of being rate limited was that all the labels in one synchronised group got deleted, both in Drupal AND in Backlot. This had a serious impact on a production website which was using the labels to determine which videos to display in prominent locations.

Explanation

From what I can tell this is what happened:

  1. In ooyala_taxonomy_term_sync_pull_labels() the module tried to load all labels from the API with a call to ooyala_api_label_list() which itself triggers an API call to /v2/labels.
  2. The request to /v2/labels failed due to rate limiting (ie. the response was a 429 error).
  3. Rather than failing at this point, the ooyala_api_label_list() function simply returned an empty array()
  4. ooyala_taxonomy_term_sync_pull_labels() proceeded as though all the labels it knew about had been deleted and subsequently deleted all the mapped taxonomy terms in Drupal
  5. Deleting the labels in Drupal in turn caused ooyala_taxonomy_term_delete() to be invoked which began triggering ooyala_api_label_delete() for each deleted taxonomy term, thereby deleting the labels from Backlot as well (it seems that the timing may have been unfortunate, and the rate limit had just reset for a new minute at the point when the DELETE requests commenced).

This was fairly catastrophic, as it also resulted in the loss of the associations between the deleted labels and the assets they had been assigned to.

The cause

Part of the problem is that the OoyalaDrupalWrapper::request() wrapper function catches any HTTP exceptions coming from OoyalaApiV2::request() and simply logs them, rather than handling them, or re-throwing them.

Work around solution

This is one example of how this had a serious impact on a production site. There may be other scenarios that could be equally serious, so we handled this by adding code to re-throw the 429 Exception, and thus trigger a fatal error (see attached patch as an example). This was a quick, fail-safe solution to the immediate problem which I understand its not ideal, but which did prevent the incident reoccurring.

Do you have any thoughts on how to handle this better?

Comments

quicksketch’s picture

and the effect of being rate limited was that all the labels in one synchronised group got deleted

Holy moly! Wow, that sounds terrible. Sorry for causing such a catastrophe.

Fixing this could take quite an effort to handle gracefully throughout our code. For starters, your patch seems like a responsible thing to include even if we don't actually handle catching the exceptions. One thing I'd probably change before including however is using a more specific exception. The OoyalaAPIV2.php file already includes an OoyalaException class. Let's use that when rethrow instead of a generic Exception. That'll make catching the right exception easier in the future.

quicksketch’s picture

Status: Active » Fixed
StatusFileSize
new900 bytes

Here's the patch I've committed to the module. Same idea, just rethrowing the OoyalaException instead of a generic one, and elaborating on the message in the event that it bubbles all the way up to an end-user.

jamesharv’s picture

Many thanks @quicksketch. That patch looks good. I wonder if it might have been worth including the 429 error code in the re-thrown exception as well - again, to make it easier for callers to handle in the future?

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.