I'm trying to figure out the best way to handle the derivation process at archive.org with the engine API -- the process looks like:

1. Send source file to archive.org
2. Wait some undefined period of time (minutes to days)
3. Once the derivatives are available, store them as derivatives

I'm running into a challenge with step 1 -- hook_media_derivatives_create_derivative expects the actual derivative file to be returned. In this case that file isn't available for quite some time; it would be nice to be able to kick off the file transfer and encoding process and then store the derivatives later, once they become available.
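
To make the problem concrete, here is a rough sketch of the blocking contract as I understand it (only the hook name and the "return the finished file" expectation come from the module; the parameter names and the body are just illustrative):

<?php
/**
 * Implements hook_media_derivatives_create_derivative().
 *
 * Blocking sketch: the framework expects the finished derivative file (or
 * its URI) back from this very call.
 */
function myengine_media_derivatives_create_derivative($source_file, $derivative) {
  // A local engine can do its work inline -- here a plain copy stands in for
  // real transcoding -- and hand the result straight back.
  $destination = 'public://derivatives/' . $derivative->mdid . '.mp4';
  file_unmanaged_copy($source_file->uri, $destination);

  // The file has to exist right now, which is exactly what an archive.org
  // based engine cannot promise.
  return $destination;
}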

I could bypass the entire hook & configuration and use the derivatives_api structure to store derivatives, but I really like the configuration process you're building and would love to not re-create a bunch of that.

Any suggestions would be great. Thanks for all your work on this!


Comments

civicpixel’s picture

The path I'm currently taking is:

1. Handle initial file transfers of source material to archive.org outside of media_derivatives
2. Define a media_derivatives event, "new derivation available at archive.org", that fires as new files related to the source file become available at archive.org
3. Scheduled => Immediate
4. Conditions => matches filetype xyz (ex: mp4)

I think this will work out OK, but I'm still not entirely clear on where I'm going to store the file transfer settings. Some of them are defined by the engine settings, and others on a global settings page.
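
Roughly, step 1 might be wired up like this (the queue name, hook wiring and helper names are placeholders, not working code):

<?php
/**
 * Implements hook_file_insert().
 *
 * Placeholder sketch: queue newly uploaded source files for transfer to
 * archive.org instead of transferring them inline.
 */
function internet_archive_file_insert($file) {
  DrupalQueue::get('internet_archive_transfer')->createItem(array('fid' => $file->fid));
}

/**
 * Implements hook_cron_queue_info().
 */
function internet_archive_cron_queue_info() {
  return array(
    'internet_archive_transfer' => array(
      'worker callback' => 'internet_archive_transfer_worker',
      'time' => 120,
    ),
  );
}

/**
 * Queue worker: push a single file to archive.org (upload code omitted).
 */
function internet_archive_transfer_worker($item) {
  $file = file_load($item['fid']);
  // The actual transfer, using the settings mentioned above, would go here.
}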

slashrsm’s picture

That sounds like a tougher one .... :)

My first idea was to change the API for the 'create derivative' callback a bit, so it would be legal to return something like: 'I'm still processing this derivative, please keep it in the processing state, since I'll do this in the background. I'll inform you when it's done.'

We would then implement a function that you would call when a derivative is actually created, which would save it and change its state to 'finished', just like media_derivatives_start_encode() does now.

I think there will be other use cases where we'll need this kind of 'non-blocking' behaviour, so I'd like to find a solid solution for this. Do you think this is one, or is it mostly a hack?

civicpixel’s picture

I think that's a good solution -- I planned out what the workflow would then look like for Internet Archive:

PRESET: IA Transfer

  1. On file_insert for non-derivatives, internet_archive_media_derivatives_create_derivative would get called and return "still processing"
  2. The internet_archive module would add that file to the transfer queue (Drupal queue)
  3. On completion of a successful transfer, internet_archive would call your media_derivatives_update_derivative function, passing the path and setting the derivative to finished

PRESET: IA Derivative

  1. On archive_insert (a media_derivatives event defined by internet_archive), internet_archive_media_derivatives_create_derivative would get called and return "still processing"
  2. The internet_archive module would watch for the expected derivative when cron runs
  3. When the derivative has finished encoding and is available, internet_archive would call your media_derivatives_update_derivative function, passing the path and setting the derivative to finished

This would be ideal because I could still store the majority of the derivative information & status using media_derivatives.
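
For the "watch on cron" part I'm picturing something along these lines (the pending-derivatives helper is a placeholder, and media_derivatives_update_derivative() is only the name I used above, not an existing function):

<?php
/**
 * Implements hook_cron().
 *
 * Sketch of steps 2-3 of the "IA Derivative" preset above.
 */
function internet_archive_cron() {
  // Placeholder: a map of MDID => URL where each derivative is expected to
  // show up at archive.org.
  foreach (internet_archive_pending_derivatives() as $mdid => $expected_url) {
    $response = drupal_http_request($expected_url, array('method' => 'HEAD'));
    if ($response->code == 200) {
      // Hand the finished derivative back, passing its path so it can be
      // stored and the derivative marked as finished.
      media_derivatives_update_derivative($mdid, $expected_url);
    }
  }
}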

slashrsm’s picture

Assigned: Unassigned » slashrsm
Category: support » feature
Status: Active » Needs work

I'll try to code this next week.

civicpixel’s picture

Thanks for the update -- I have the engine working in a very basic state at the moment. If you can make accommodations for the above, I should be able to get a development release up. I have another issue, but I'll post it in a new thread.

slashrsm’s picture

Attached patch implements it... Please test and let me know if it works as expected.

An engine can now return MEDIA_DERIVATIVE_ENGINE_PROCESSING. This will leave the derivative in the processing state and wait for the engine to finish its job.

When you have your derivative, call media_derivatives_derivative_finished() and pass 3 arguments:
- the derivative object or MDID
- the derivative file object or URI string (this is what the engine would have returned before this patch)
- (optional) the source file object

If an error happened, call media_derivatives_derivative_error() and pass 2 arguments:
- the derivative object or MDID
- an instance of MediaDerivativesException
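
A minimal engine sketch against this API could look like the following (only MEDIA_DERIVATIVE_ENGINE_PROCESSING, the two media_derivatives_* functions and MediaDerivativesException come from the patch; the hook parameters and the internet_archive_* helpers are assumptions):

<?php
/**
 * Implements hook_media_derivatives_create_derivative().
 *
 * Start the remote job and keep the derivative in the processing state.
 */
function internet_archive_media_derivatives_create_derivative($source_file, $derivative) {
  // Placeholder: kick off the transfer/encode job at archive.org.
  internet_archive_start_remote_job($source_file, $derivative);
  return MEDIA_DERIVATIVE_ENGINE_PROCESSING;
}

/**
 * Called later (e.g. from cron) once archive.org has produced the file.
 */
function internet_archive_remote_job_finished($derivative, $uri, $source_file = NULL) {
  // Derivative object or MDID, file object or URI string, optional source file.
  media_derivatives_derivative_finished($derivative, $uri, $source_file);
}

/**
 * Called if the remote job fails.
 */
function internet_archive_remote_job_failed($derivative, MediaDerivativesException $e) {
  media_derivatives_derivative_error($derivative, $e);
}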

slashrsm’s picture

Status: Needs work » Needs review
camdarley’s picture

Maybe Media Derivatives should include batch operations in the module core, as most of the engines we could use take a large amount of time to process (transcoding, transferring, etc.). We could also have a UI listing all derivative processes currently running.
I was thinking about this for the engine I'm coding, but since it seems to be useful for almost all other engines, it shouldn't need to be re-coded in each one.
I'm not experienced enough to propose working code using queues and batch operations, but civicpixel seems to be doing good work on that.
What do you think about this?

slashrsm’s picture

I agree. I was planning to support batch jobs from the beginning. One of the reasons I have not developed it yet is my lack of experience with the Batch API. I am not completely sure how this should work. Maybe we can plan this together?

I never thought about a central list of jobs, but it sounds like a good idea! I am not completely sure if this can be done. Anyone?
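
For orientation, wiring a set of pending derivatives through core's Batch API looks roughly like this (only batch_set() and the callback structure come from core; all other names are placeholders):

<?php
/**
 * Build and start a batch run for a set of pending derivatives, e.g. from an
 * admin form submit handler.
 */
function mymodule_process_pending_derivatives(array $mdids) {
  $operations = array();
  foreach ($mdids as $mdid) {
    $operations[] = array('mymodule_batch_process_one', array($mdid));
  }
  batch_set(array(
    'title' => t('Processing media derivatives'),
    'operations' => $operations,
    'finished' => 'mymodule_batch_finished',
  ));
  // When called from a form submit handler, Form API starts the batch for us;
  // elsewhere, batch_process() would need to be called explicitly.
}

/**
 * Batch operation: process one derivative (placeholder body).
 */
function mymodule_batch_process_one($mdid, &$context) {
  // The long-running work for a single derivative would go here.
  $context['message'] = t('Processing derivative @mdid', array('@mdid' => $mdid));
}

/**
 * Batch 'finished' callback.
 */
function mymodule_batch_finished($success, $results, $operations) {
  if ($success) {
    drupal_set_message(t('All derivatives processed.'));
  }
}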

camdarley’s picture

As this is no longer related to this issue, I created a new one: #1323430: Per-preset parallelized engine processing

slashrsm’s picture

@civicpixel: Have you tried the attached patch? Is it enough for you to work with?

slashrsm’s picture

This patch could become deprecated if we implement this: #1323430: Per-preset parallelized engine processing.

@civicpixel: What do you think about this?