I'm trying to figure out the best way to handle the derivation process at archive.org with the engine API. The process looks like:
1. Send source file to archive.org
2. Wait some undefined period of time (minutes-days)
3. Once derivatives are available at archive.org, store them as derivatives locally
I'm running into a challenge with step 1 -- hook_media_derivatives_create_derivative expects the actual derivative file to be returned, but in this case that file isn't available for quite some time. It would be nice to kick off the file transfer and encoding process now and store the derivatives later, when they become available.
I could bypass the entire hook & configuration and use the derivatives_api structure to store derivatives, but I really like the configuration process you're building and would love not to re-create a bunch of it.
Any suggestions would be great. Thanks for all your work on this!
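To make the mismatch concrete, here is a minimal sketch of the contract described above. Only the hook name comes from this thread; the parameter list and the ia_engine_* helper names are assumptions for illustration.

```php
<?php
/**
 * Hypothetical engine implementing the hook described above. The hook
 * must return the finished derivative file, which is the problem:
 * archive.org may need minutes to days to produce it.
 */
function ia_engine_media_derivatives_create_derivative($source_file, $preset) {
  // Step 1: send the source file to archive.org (hypothetical helper).
  ia_engine_upload($source_file);

  // Steps 2-3: the derivative does not exist yet, but the hook has
  // nothing sensible to return other than the finished file, so a
  // naive implementation would have to block for minutes to days.
  return ia_engine_wait_for_derivative($source_file);
}
```

The blocking wait at the end is exactly what the request below asks to avoid.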
Comment | File | Size | Author |
---|---|---|---|
#6 | 1289710_non_blocking_processing_of_derivatives_6.patch | 8.04 KB | slashrsm |
Comments
Comment #1
civicpixel CreditAttribution: civicpixel commented
The path I'm currently taking is:
1. Handle initial file transfers of source material to archive.org outside of media_derivatives
2. Define a media_derivatives event, "new derivation available at archive.org", that fires as new files related to the source file become available at archive.org
3. Scheduled => Immediate
4. Conditions => matches filetype xyz (ex: mp4)
I think this will work out OK, but I'm still not entirely clear on where I'm going to store the file transfer settings. Some of them are defined by the engine settings, and others on a global settings page.
Comment #2
slashrsm CreditAttribution: slashrsm commented
That sounds like a tougher one... :)
My first idea was to change the API for the 'create derivative' callback a bit, so it would be legal to return something like: 'I'm still processing this derivative; please keep it in the processing state, since I'll do this in the background. I'll inform you when it's done.'
We would then implement some function that you would call when a derivative is actually created, which would save it and change its state to 'finished', just like media_derivatives_start_encode() does now.
I think there will be other use cases where we'll need this kind of 'non-blocking' behaviour, so I'd like to find a solid solution for this. Do you think this is one, or is it mostly a hack?
Comment #3
civicpixel CreditAttribution: civicpixel commented
I think that's a good solution -- I planned out what the workflow would then look like for the Internet Archive:
PRESET: IA Transfer
PRESET: IA Derivative
This would be ideal because I could still store the majority of the derivative information & status using media_derivatives.
Comment #4
slashrsm CreditAttribution: slashrsm commented
I'll try to code this next week.
Comment #5
civicpixel CreditAttribution: civicpixel commented
Thanks for the update -- I have the engine working in a very basic state at the moment. If you can accommodate the above, I should be able to get a development release up. I have another issue, but I'll post it in a new thread.
Comment #6
slashrsm CreditAttribution: slashrsm commented
The attached patch implements it. Please test it and let me know if it works as expected.
The engine can now return MEDIA_DERIVATIVE_ENGINE_PROCESSING. This will leave the derivative in the processing state and wait for the engine to finish its job.
When you have your derivative, call media_derivatives_derivative_finished() and pass three arguments:
- the derivative object or MDID
- the derivative file object or a URI string (this is what the engine returned before this patch)
- (optional) the source file object
If an error happens, call media_derivatives_derivative_error() and pass two arguments:
- the derivative object or MDID
- an instance of MediaDerivativesException
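The flow described above might look like this for the archive.org engine. The constant and the two media_derivatives_* functions come from this patch; the ia_engine_* helpers, the status object shape, and the exception message are assumptions for illustration.

```php
<?php
/**
 * Non-blocking engine callback: kick off the transfer and return
 * immediately, leaving the derivative in the processing state.
 */
function ia_engine_media_derivatives_create_derivative($source_file, $preset) {
  // Start the upload to archive.org (hypothetical helper).
  ia_engine_upload($source_file);
  // Tell Media Derivatives we will finish the job in the background.
  return MEDIA_DERIVATIVE_ENGINE_PROCESSING;
}

/**
 * Poll archive.org later (e.g. from hook_cron()) and close out the
 * derivative once its state is known.
 */
function ia_engine_poll($mdid) {
  // Hypothetical helper returning a status object for this MDID.
  $status = ia_engine_check_status($mdid);
  if ($status->finished) {
    // Second argument may be a file object or a URI string.
    media_derivatives_derivative_finished($mdid, $status->uri);
  }
  elseif ($status->failed) {
    // Constructor arguments are assumed here; check the patch for the
    // actual MediaDerivativesException signature.
    media_derivatives_derivative_error($mdid, new MediaDerivativesException('archive.org derivation failed'));
  }
}
```

This keeps all derivative bookkeeping inside media_derivatives while the slow archive.org work happens out of band.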
Comment #7
slashrsm CreditAttribution: slashrsm commented
Comment #8
camdarley CreditAttribution: camdarley commented
Maybe Media Derivatives should include batch operations in the module core, as most of the engines we could use take a large amount of time to process (transcoding, transferring, etc.). We could also have a UI listing all current derivative processes.
I was thinking about it for the engine I'm coding, but as it seems to be useful for almost all other engines, it shouldn't need to be re-coded in each one.
I'm not experienced enough to propose working code using the queue and batch operations, but civicpixel seems to be doing good work on that.
What do you think about this?
Comment #9
slashrsm CreditAttribution: slashrsm commented
I agree. I was planning to support batch jobs from the beginning. One of the reasons I have not developed it yet is my lack of experience with the Batch API. I am not completely sure how this should work. Maybe we can plan this together?
I never thought about a central list of jobs, but it sounds like a good idea! I am not completely sure if this can be done. Anyone?
Comment #10
camdarley CreditAttribution: camdarley commented
As it's no longer related to this issue, I created a new one: #1323430: Per-preset parallelized engine processing
Comment #11
slashrsm CreditAttribution: slashrsm commented
@civicpixel: Have you tried the attached patch? Does it work for you?
Comment #12
slashrsm CreditAttribution: slashrsm commented
This patch could become deprecated if we implement this: #1323430: Per-preset parallelized engine processing.
@civicpixel: What do you think about this?