Closed (fixed)
Project:
Advanced Queue
Version:
8.x-1.x-dev
Component:
Code
Priority:
Major
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
25 Oct 2017 at 23:43 UTC
Updated:
25 Sep 2024 at 16:22 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #2
killes@www.drop.org commentedWhat would be the best way to implement this? Seems like an important feature.
Comment #3
joachim commentedOff the top of my head, I'd do this:
- add a hash field to the advancedqueue table
- add a method to the job plugin interface to let each job type plugin define how a job payload gets turned into a hash (as there may be some data in the payload that should be ignored when considering what makes two jobs 'the same')
- enqueueJob() queries for the hash
Comment #4
killes@www.drop.org commentedQuestion is: would the maintainers be inclined to accept such a patch or should I try to implement my own backend?
Comment #5
bojanz commentedYes, I definitely want this.
Here's the issue that added the D7 version of the feature: #2846958: Support unique item creation.
Open questions:
1) The hashing approach VS forcing the caller to manually specify a uniqueness key.
I like the idea of a hash, though I'm not sure sure the job type comparison override is needed.
After all, the main problem we're solving is the same code being run twice, thus the jobs are usually identical.
Resque plugins always do uniqueness by constructing an ID out of the payload, which is equivalent to our hash idea.
2) Storage class implementation:
a. Add a $job->isUnique() and expand the SQL logic inside enqueueJobs() to perform the DB merge when needed. Have an empty SupportsUniqueJobsInterface that tells the outside world whether it's supported, or silently ignored.
b. Add separate enqueueUniqueJob() and enqueueUniqueJobs() methods via SupportsUniqueJobsInterface
Leaning towards a) at the moment.
What do you think?
Comment #6
killes@www.drop.org commentedMy usecase is as follows:
I need to update 50k products to keep prices up to date.
These products come from different suppliers, each using their own unique product ID (which however may clash with another supplier's).
So, I would like to use a key composed of supplier and product id, to keep it 1) unique and 2) be able to see on a glance which supplier it is from. ie I do not want to have to use a hash.
Comment #7
joachim commented> Resque plugins always do uniqueness by constructing an ID out of the payload, which is equivalent to our hash idea.
That would probably mean better performance, as you'd get unique IDs which are much shorter than a hash string and therefore, I assume, quicker to query for.
You'd still need a method on the plugin to turn the payload into an ID.
Comment #8
bojanz commentedNot necessarily. Our hash would be 32 chars, which is pretty short:
Comment #9
evaldas.uzkurasMade a unique jobs support. Quite similar to that was made in D7.
Used key instead of the hash, because I think, that job creator should decide how to handle uniqueness.
In my use case, I have different job types for the same object, so it was essential to have own key to override other jobs.
Comment #11
evaldas.uzkurasTrying to fix MySQL error: the key is too long.
Comment #13
evaldas.uzkurasFixed - Undefined index: unique_key
Comment #15
evaldas.uzkurasFixed last failed test
Comment #16
evaldas.uzkurasComment #17
evaldas.uzkurasFIxed coding standard.
Comment #18
evaldas.uzkurasComment #19
joachim commented> Have an empty SupportsUniqueJobsInterface that tells the outside world whether it's supported, or silently ignored.
My first thought was to define whether a backend plugin supports uniqueness or not on the plugin annotation.
The advantage of using the plugin definition is that a child class can change the support from TRUE to FALSE, whereas you can't remove an interface from a child class in PHP.
Comment #20
joachim commentedThe patch at #17 doesn't work in the way that previous comments were suggesting.
It's requiring the caller to handle passing in a unique key of their own devising, which is fiddly & also open to errors. (Case in point: while writing my patch, my test was failing because the code I wrote to create a unique key from a hash wasn't considering the queue name in the hash!)
So here is an updated patch with an aproach in like with what @bojanz was suggesting.
(Though @Evaldas Užkuras should be credited as I've kept all the .install file code, which was much appreciated as I hate writing DB update code!)
@bojanz concurred with this in #commerce on slack, which I'm taking the liberty of quoting:
> my intention was to allow callers to skip defining a unique key
> every time they enqueue a job
> I found that a bit annoying in D7
in #5:
> I like the idea of a hash, though I'm not sure sure the job type comparison override is needed.
I agree. Let's leave that off for now, as I don't know of any use cases where the job type plugin might want to customize how the hash is created. It's therefore most likely a YAGNI.
So for the hash, we need to ensure a job is unique only within its type and its queue. Otherwise, we could queue two different jobs for the same order, and the second one would be ignored! Therefore the hashing uses the job queue, job type, and the serialized payload:
> Have an empty SupportsUniqueJobsInterface that tells the outside world whether it's supported, or silently ignored.
Backends may or may not support unique jobs.
However, silently ignoring jobs that should be unique is dangerous, and as we've seen at #2931290: Multiple recurring orders get created if cron is badly configured can be extremely detrimental to a site (with Commerce Recurring, it leads to duplicate orders, which will in time lead to customers being overcharged, which is pretty bad!). Commerce Recurring's jobs absolutely MUST NOT be used with a backend that does not support unique jobs, and therefore, I think we should guard against this here.
So therefore, I want to introduce a concept of a job type *requiring* that its jobs be treated as unique. This can be declared on the job type plugin annotation.
I don't think we need finer granularity than this by allowing this on a job-by-job basis. The use cases I've seen, it's the job type as a whole that requires job payloads to be treated as unique. (If for some reason you need some jobs to be unique and some not, you can define a 2nd job type that inherits the class and has a different annotation, or you can cheat and chuck a timestamp into your payload ;)
However, what I don't massively like about the code I've written for this is that Database::enqueueJobs() has to call parent::enqueueJobs() to check that the backend is compatible with the jobs to queue. And obviously, any other backend implementations that exist in the wild will have to add that call to the parent too. Suggestions on how to make this more elegant welcome!
Comment #21
brunodboTested the patch in #20: I added
ensure_unique = TRUEto my job type plugin's annotation, and am seeing the hash being created when a job is enqueued.I'm seeing the following behavior though, maybe I'm misunderstanding how this is supposed to work:
- When I run cron, an existing job record's 'available date' is being updated (I haven't specified a delay for this job when enqueuing it)
- During the same cron run, the previous 'version' (i.e., with the previous 'available date') of the same job gets processed, leading to the same job being processed over and over again in subsequent cron runs. In my testing, the job never reached completion.
Is this expected? I was expecting the job record wouldn't change at all once it's been created, if the job type requires uniqueness, to guard against multiple processing. Does the prevention of multiple processing has to be taken care of in the job type plugin somehow?
Comment #22
joachim commented> - When I run cron, an existing job record's 'available date' is being updated (I haven't specified a delay for this job when enqueuing it)
That's an interesting question. IIRC the 'available_date' property on a job is the date it's available from, and by default, that's the moment it was placed in the queue. Arguably, attempting to re-queue it shouldn't change that date, because it's still the case that the job was available from the time it was queued the first time.
> - During the same cron run, the previous 'version' (i.e., with the previous 'available date') of the same job gets processed, leading to the same job being processed over and over again in subsequent cron runs. In my testing, the job never reached completion.
Ouch.
So the job got loaded and processed by one cron hook, and then requeued by another cron hook? But why didn't cron hook 1 mark the job as complete?
Can you explain the different cron hooks you had running, and their exact order?
Comment #23
brunodbo> Arguably, attempting to re-queue it shouldn't change that date, because it's still the case that the job was available from the time it was queued the first time.
I would say that the attempt to re-queue a job shouldn't change the date it should be available. Otherwise, there might be cases that a job never becomes available? For example, if I queue a job that should send a message at 'now + 30 days', and the 'available date' for that job keeps getting changed with 'now + 30 days', the job would never become available.
Should we add a check in
\Drupal\advancedqueue\Plugin\AdvancedQueue\Backend\Database::enqueueJobsto see if the job exists, and only call$job->setAvailableTime($this->time->getCurrentTime() + $delay);if it's a new job? Or simply bail if we have an existing job?> So the job got loaded and processed by one cron hook, and then requeued by another cron hook? But why didn't cron hook 1 mark the job as complete?
Yep, that's exactly what it looks like.
> Can you explain the different cron hooks you had running, and their exact order?
I have the 'Advanced queue' cron hook running first, and then my cron hook that queues the job runs right after (using Ultimate cron to check the order, but didn't change the order). Is it possible that the job gets re-queued because it isn't completed yet by the time my cron hook runs? When I disable my cron hook that does the queuing, the existing job completes (state 'Success').
Comment #24
joachim commented> Is it possible that the job gets re-queued because it isn't completed yet by the time my cron hook runs?
Do you have cron jobs running in parallel? My understanding is that with a normal setup they run in sequence, so AQ should finish processing the job and decide whether it succeeded before your own cron hook tries to queue things.
Can you add some logging statements to your code to get some finer detail on the order of operations?
Also, if you were able to reproduce this in a kernel test, that would be great!
Comment #25
joachim commented> I would say that the attempt to re-queue a job shouldn't change the date it should be available. Otherwise, there might be cases that a job never becomes available? For example, if I queue a job that should send a message at 'now + 30 days', and the 'available date' for that job keeps getting changed with 'now + 30 days', the job would never become available.
Ok, I'm convinced :)
> Should we add a check in \Drupal\advancedqueue\Plugin\AdvancedQueue\Backend\Database::enqueueJobs to see if the job exists, and only call $job->setAvailableTime($this->time->getCurrentTime() + $delay); if it's a new job? Or simply bail if we have an existing job?
Yes, quite probably. Which will make it slower than the merge query, unless there's a way we can set the merge query to not bother making a change at all if the item already exists.
We basically shouldn't change an existing job at all when attempting to queue a duplicate.
Comment #26
brunodbo> Do you have cron jobs running in parallel? My understanding is that with a normal setup they run in sequence, so AQ should finish processing the job and decide whether it succeeded before your own cron hook tries to queue things.
Apparently Ultimate Cron runs cron jobs in parallel, I just learned from its project page (I had only installed it yesterday to review the cron jobs' order) :) So with Ultimate Cron installed, my job gets queued and processed during the same cron run, since the cron jobs run in parallel.
I did some more poking around, added some logging code as you suggested, and realized that the 'unexpected' re-queuing I mentioned in #21 is my own doing :D. During my queuing cron hook, I look for users that are flagged, and queue a job to send a message to those users. But then in the queued job, I don't do anything that would prevent the same job from getting queued again (e.g., users aren't unflagged, since the job just sends a message)...
> We basically shouldn't change an existing job at all when attempting to queue a duplicate.
What if we add a
$job->exists()method that checks for a job with a matching hash, so we can call that at the start of\Drupal\advancedqueue\Plugin\AdvancedQueue\Backend\Database::enqueueJobs? I can imagine such a method could be useful in other places as well.Comment #27
brunodboThis adds a
hasDuplicateJobmethod to theDatabasebackend to check whether the to-be-queued job already has a matching job. Then inenqueueJobs(), we check for a duplicate, andreturnif we have one. If we don't, we fill in theunique_keyfield in theinsertquery. Not sure if we still need themergequery?The above also allows me to do something like this when queuing jobs, so
enqueueJob()doesn't get called in the first place:What do you think?
I'm also wondering whether we should move the unique key creation to the
Jobclass, though it makes sense that the backend is responsible for deciding how to implement uniqueness...Comment #29
brunodboBetter patch.
Comment #31
brunodboNot sure yet why
DatabaseBackendTestis failing, but this fixes an issue in theforeachinenqueueJobs()(continueinstead ofreturn), and should fix the coding standards test fail.Comment #32
brunodboComment #34
joachim commented> I did some more poking around, added some logging code as you suggested, and realized that the 'unexpected' re-queuing I mentioned in #21 is my own doing :D. During my queuing cron hook, I look for users that are flagged, and queue a job to send a message to those users. But then in the queued job, I don't do anything that would prevent the same job from getting queued again (e.g., users aren't unflagged, since the job just sends a message)...
Thanks for investigating that one!
So should it be the case that we don't queue a duplicate if the original is currently claimed?
We definitely need to consider a completed job as no longer unique -- Commerce Recurring requires this, for instance, as every billing period (eg every month) it will queue a job for the same order.
A few points about the latest patch (some of which are to do with my previous work, feel free to ignore those and I will get round to them later!) --
On reflection, I think I want to move this code to its own method.
It doesn't fix the problem that plugin implementations need to call it, but it's cleaner and better DX than having to call parent::enqueueJobs().
Needs to mention job status as well.
This needs to consider the job status, as mentioned above. A completed job doesn't count as a duplicate.
This should be added to SupportsUniqueJobsInterface.
Comment #35
brunodbo> So should it be the case that we don't queue a duplicate if the original is currently claimed?
I think so. That would mean that a job is unique (and shouldn't be re-queued) until it reaches the completed state.
We'll also a need a way to reflect the job's state in the unique key handling, to avoid integrity constraint violations for the 'unique_key' field when queuing a next iteration of the job (i.e., when a job is completed, and we queue it again). Should we update/empty the 'unique_key' field when a job reaches completion?
Comment #36
joachim commented> We'll also a need a way to reflect the job's state in the unique key handling,
That's not going to work. A job that's being queued is always going to be in the 'queued' state (or whatever, post-breakfast, don't have the code in front of me). We have a use case where the same order gets queued, say, once a month. So that's going to have the same hash each time.
> to avoid integrity constraint violations for the 'unique_key' field when queuing a next iteration of the job (i.e., when a job is completed, and we queue it again). Should we update/empty the 'unique_key' field when a job reaches completion?
... and because of this use case, we actually shouldn't have a database-level unique key constraint on the unique key column. That's actually a good catch, as up till now, our use case of 'same order should get queued a month later' would cause a DB crash!! We should add a test for this.
We're not making use of that constraint anyway, because we now do a DB load to check for a job before doing an insert.
Comment #37
joachim commentedOh, another thing to do -- we should log a notice when a duplicate job is rejected. At the moment, the patch just swallows the error. But that could lead to real developer headaches if there's a bug that's incorrectly causing a job to be a duplicate.
BTW, I really appreciate all your help with this issue! I thought the patch was pretty much done, and you've brought loads of problems to light! Yay for peer review!
Comment #38
brunodboHappy to help!
#34:
1. Attempt at refactoring this in attached patch. Method name suggestions welcome :)
2. Done, though could be made easier to read. Maybe split it up in 2 paragraphs?
3. WIP: jobs will be re-queued if previous jobs with the same hash have the state 'success'. There is still an issue with the query in
\Drupal\advancedqueue\Plugin\AdvancedQueue\Backend\Database::hasDuplicateJob(which is likely the reason the tests are failing):$unique_keyhas a different value than what is stored in the db, so it might be something with the way the hash field is being checked in the query?4. Done
#36:
> we actually shouldn't have a database-level unique key constraint on the unique key column.
Unique key constraint removed in attached patch.
> We should add a test for this.
Where should this test go? Testing newbie here...
#37:
Should we add the logger in this patch (AQ doesn't have a logger channel yet AFAICT)? Or should that be its own issue?
Comment #39
brunodboComment #41
joachim commentedChanges since the last patch:
- changed docs for hasDuplicateJob() not to mention key, as it's an implementation detail.
- fixed duplicate job check to consider queued or processing states only.
- add mention of states to hasDuplicateJob() docs.
- fixed duplicate job check not working due to no queue ID.
- changed method name.
- make protected, add to docs.
- removed pointless base method
- fixed non unique backend not calling helper.
- added logging.
Still todo, maybe:
- I'm not convinced hasDuplicateJob() should be public, on second thoughts. Queue users shouldn't need to worry about this, they should be able to let the queue handle this for them.
- I'm tempted to rename the unique_key DB column to something like hash, as it's not unique in the database sense of the word, so could confused developers.
Comment #42
brunodboYep, think so too - seems like that should be an internal thing. Didn't change this in the attached patch yet - should we do away with the interface then (since interface methods should be public)?
Yeah, I got confused by that as well :) Changed that in the patch here, and rename
createJobUniqueKey()tocreateJobHash()Also rewrote the docs for
hasDuplicateJob()a little, to (hopefully) make it easier to read.Comment #43
jonathanshawWe still need a way for a backend to be able to declare that it can handle jobs that require uniqueness, so that we can throw an exception if a job that requires uniqueness is passed to a backend that can't provide that.
But this doesn't have to be done by an interface: we could add a new annotation to the backend plugin type instead.
Comment #44
thronedigital commentedAny updates?
Comment #45
gordon commentedI have done some investigation into this as I required this for a client but it was not a good fit even though this would make the implementation very clean.
1. the method of using a hash for the uniqueness based upon the payload I think is not a good idea. Since it is based upon the entire payload you can't add in additional variable parameters. eg. I want the uniqueness based up on the node id, however I need to add in other data as well which I don't want to be a part of the uniqueness such as the a workflow status which I may want to change later.
2. So if you try to enqueue a new version of an existing job, if will simply discard the new job, in my case I don't care about the existing job but want to replace it with the new one, which may have a new available date, or other changes.
3. Also the enqueuing will crash if you put try to put a job which is unique onto a queue which backend doesn't support uniqueness. Since there is only the database backend this is not a huge problem but will create issues if/when alternate backends are created.
Comment #46
joachim commented>1. the method of using a hash for the uniqueness based upon the payload I think is not a good idea. Since it is based upon the entire payload you can't add in additional variable parameters. eg. I want the uniqueness based up on the node id, however I need to add in other data as well which I don't want to be a part of the uniqueness such as the a workflow status which I may want to change later.
To address that, we can do this:
> - add a method to the job plugin interface to let each job type plugin define how a job payload gets turned into a hash (as there may be some data in the payload that should be ignored when considering what makes two jobs 'the same')
Comment #47
jonathanshawThis use case interests me too.
Comment #48
gordon commentedI have made some changes so that it will now accept an unique id. So as a part of the definition you can when creating a job you can specify it and it will be used. However it will only use this if the job type is a unique type, otherwise
Will alway return NULL.
If you do not specify the unique id, then it will fall back to generating a hash from the payload.
Also I have changed the unique flag on the job to be one of the following
UNIQUE_DUPLICATES
UNIQUE_DONT_OVERWRITE
UNIQUE_OVERWRITE
So now you can specify if you want to overwrite the unique items or ignore it.
Lastly so you can I have changed how Job::create() works so you can specify the unique id using this interface rather than using new Job() to do this.
So in the 3rd parameter it is just a $definitions array and gets added to the payload, queue id and queue status.
I also added an index on the unique id field so that it will run faster.
I also found a couple of bugs in that the state and the available were getting updated on the job after it was being converted into an array.
Lastly I found a couple of bugs in that the
Comment #49
gordon commentedfound some bugs that broke it. fixed now.
Comment #50
jonathanshawWe lack test coverage for the changes introduced in #48.
Comment #51
daniel korteReroll plus some minor clean up
Comment #52
daniel korteSorry, ignore #51
Comment #53
daniel korteThird time is a charm.
Comment #54
jonathanshawLet's see what the testbot says
Comment #56
daniel korteWhoops, wrong folder.
Comment #57
gordon commentedJust adding an interdiff for the changes between 49 and 56
Comment #58
gordon commentedSince I am running this in live, I had a problem when updating because I would run the update on this patch again. I added in a check to make sure it does try to run this twice.
Comment #59
martijn de witComment #61
andypostThere's already implementation in https://www.drupal.org/project/queue_unique (it using md5 hash, that's only thing to improve)
Comment #62
andypostmaybe it should just skip queuing when item already queued or other reaction where same item in processing
needs injection of service
this hashing better to keep configurable via some container parameter
serialization could be dangerous/impossible
Comment #63
jonathanshawReally? I thought the payload got stored in a database field by the the database backend, so that probably does some serlializing.
Comment #64
andypostThis exception is thrown when nothing did check if the job serializable ... so you'll get serialization exception instead of expected
moreover payload serialized as json other places so it more confusing
Comment #65
andypostComment #66
johnpitcairn commentedPatch at #58 no longer applies to latest dev (or rc2).
Comment #67
jcnventuraComment #68
jonathanshawThis should be
return $job_type_definition['ensure_unique'] !== AdvancedQueueJobType::UNIQUE_DUPLICATES;Otherwise the UNIQUE_DONT_OVERWRITE option gets ignored and duplicates are allowed.
Comment #69
ankithashettyRe-rolled the patch in #58, made changes requested in #68 and fixed indentation issues in couple of files.
Retaining status as "Needs work" to address #62, thank you!
Comment #70
freality commentedThis current solution has the uniqueness hash being created within \Drupal\advancedqueue\Job. Which means a JobType has no influence on its hash criteria. It would be great if hash creation was delegated to JobType; like
JobTypeInterface::createHash(Job $job): string;.Furthermore, all job definitions should have a hash value; with JobTypeBase defining a default implementation [which could simply returns a uuid, db serial, some random string val]. There'd be no need for any logic to determine if a job requires uniqueness.
Comment #71
chi commentedA couple of hypotetical issues related to creating hash from the payload.
1. What if payload format is changed during deployment. The payload may still have same meaning for consumers but the hash will be different.
2. What if payload contains some volatile data, i.e. timestamps. The consumers could ingore redundant data but to ensure jub uniqueness they must be removed manually.
Comment #72
joachim commentedI think both those cases can be dealt with by overriding createJobHash():
1. Write compatibility code in createJobHash() which ensures the hashes are made in the same way. Admittedly, that does mean you're forever using the OLD format. But changing payload format is a deployment event which arguably requires the queue to be emptied completely and repopulated, because the data is changing fundamentally.
2. Filter out the volatile data in createJobHash()
Comment #73
chi commentedAnother relevant issue is ability to remove existing jobs. The typicall use case is like follows. A a job has been scheduled to notify users about some event, i.e. a new post on site. The related entity has been removed and you need to cancel the job before emails go out.
Comment #74
jsacksick commentedComment #75
jonathanshawI think this has to be out of scope here. Jobs should check they are still relevant at the point when they are executed, if there is a chance they are not. It's true you could try to (ab)use the job hash mechanism to do something like this, but I don't think we should burden ourselves with that as a consideration here. This feature is proving difficult to land even as it is.
Comment #76
zengenuity commentedRe-roll of the patch from #69.
Comment #77
rinasek commentedComment #78
jcnventuraThis is a really nice to have, but the module is already in use in 4K sites, and doesn't have this feature yet. We're in Release Candidate status for 2 years already. Removing the 'Release blocker' tag, but would be good if this gets to RTBC soon(ish).
It can always be added to 1.1-beta1.
Comment #79
crzdev commentedLatest patch #76 unable to apply into 1.0.0-rc6 (or 8.x-1.x), adding reroll
Comment #81
smovs commentedUpdated patch #79 for applying to module ver. 8.x-1.0-rc7
Comment #84
smovs commentedUpdated patch #79 for applying to module ver. 8.x-1.0-rc7 and removed logging in src/Plugin/AdvancedQueue/Backend/Database.php.
This logging adds a lot of logs while deploying and that is hard for maintenance.
Comment #86
jonathanshaw@SmovS please post an interdiff when posting patches, or please make a commit to an existing MR. If you do this, we can see the proposed change to the patch; in this case it would make it easier to see the logging code you have removed.
All this logger code can be removed too as nothing is using $this->logger AFAICS.
Comment #87
borisson_Added a review to the MR
Comment #88
emil stoianov commentedLatest changes by @smovs but as a patch file for those who need it.
Comment #89
joachim commented> public $ensure_unique = self::UNIQUE_OVERWRITE;
I think the default value for this should be UNIQUE_DUPLICATES, otherwise behaviour will be changed for existing job type plugins.
EDIT: also, the constant names are confusing!
I thought UNIQUE_DONT_OVERWRITE = 'leave' meant 'make a new job'. But the comment for it says 'Do not enqueue the new job'.
Comment #90
joachim commentedUnit tests are currently broken because the MR introduces a dependency on the kernel to the Job class:
Not sure how best to resolve this. Though I am not really keen on toArray() calling getUniqueId() to get the value rather than just returning what is set in the class. The order of precedence in getUniqueId / setUniqueId / constructor for handling $this->uniqueId doesn't seem right to me.
Comment #91
joachim commentedAnother issue: we're using the phrase 'unique ID' in comments and method names and var names.
But there is *already* a unique ID -- the Job ID $job->id which is a unique ID. And it's fairly common in core to casually prefix the word 'ID' with 'unique'.
So I think it's going to be really confusing to differentiate between 'the job ID, which is a unique ID', and the 'unique ID which is a hash to check for uniqueness'.
Comment #92
joachim commentedI'd really like to rename the constants for these, as they're confusing:
Something like:
Furthermore, what about making them an Enum now that we have those in PHP?
Comment #93
joachim commentedI'm also a bit confused about the need for BackendBase::ensureUniqueJobsSupport(). This says:
But the Queue class's enqueueJob/s() methods already have a check.
And the module's README has this code example showing that jobs should be queued by calling these:
However! Kernel tests, both in this MR and the main branch, repeatedly do this:
Again, not sure how best to resolve it as it's a bit of a confusing situation.
Comment #94
jcnventuraEveryone.. I truly dislike issues going over 100 comments, and this one is really in danger of that.. Can I ask those that are interested in this to review and test this, please??
@joachim Can you please update the issue summary to explain what is the current status of this, and if there's still any blocker / caveat that would prevent this to be marked RTBC by someone else?
Setting this to "Needs review" as I'm not sure the "Needs work" that was added back in #69 is still applicable.
Comment #95
stmh commentedI reviewed the code and have no objections in merging it back. I also tested the new functionality in a project where I need the functionality and it works flawlessly.
Mark as RTBC
Comment #96
joachim commentedI think this needs work, as per my various comments -- see #93 and #94.
Comment #97
stmh commentedRegarding your comments:
I think its safe to remove the comment, as
enqueueJobandenqueueJobswill throw an exception if you try to add a JobItem that needs uniqueness and the queue does not support it.Regarding the second comment. The Queue itself provides two function
enqueueJobandenqueueJobsto add jobs, which looks fine to me.Comment #98
joachim commentedOh and #92 as well.
Comment #99
stmh commented@joachim Oh yes, let's do this. I had my problems understanding them as well.
Comment #100
jcnventuraOK. Then we need to work on #92 and removing the first comment of #93?
Comment #102
stmh commentedWorked on #92 and #93, see my latest commits.
Comment #103
joachim commentedPushed a bunch of fixes. Tests now pass.
Still to do:
1. My comment in #93. I'm not sure how best to resolve this and it would be useful to have input from the original author of that code. But it looks like the checking in Queue was added in #48 back in 2019, and the checking in BackendBase even before that, so I don't know if those devs are even still around. Given that, I'd be inclined to remove the checks in Queue and let backends use the ensureUniqueJobsSupport() helper.
2. getDuplicateJobId could probably be made protected. That would make SupportsUniqueJobsInterface an empty interface, but there's already precedent for that in the backend interfaces.
3. I REALLY want to rename the DB column we're adding from unique_id to something like job_hash. It's going to be a source of confusion and bugs with the current name. I appreciate that sites are already using this patch.... but that's a risk you take when using a patch.
4. The MR makes changes to existing hook_update_N() implementations, which is out of scope AND generally not a good idea:
These were first introduced in #51 -- and later patches have added the same changes to newer implementations that were added to the module in the meantime!
Comment #104
bojanz commentedFWIW I've been using "fingerprint" as the column name for this purpose in a few other places. It is fairly common in various libraries.
I do agree that "job_hash" sounds better than "unique_id" at least. Every ID is unique so the column name sounds confusing.
Comment #105
jcnventuraJust wanted to comment that I think this should not be committed to the module before we release 1.0, as adding new features is somewhat against what we should be doing in "rc" status.
Then again, we should probably release 1.0 ASAP unless someone has any objections, and if there is any, please comment on #3220292: Roadmap to 1.0.
Comment #106
jcnventura1.0 has now been released. This can easily go into 1.1, we just need to get this in RTBC.
Comment #107
emil stoianov commentedThe patch with latest updates for those who are not comfortable downloading merge requests.
Comment #108
jonathanshawI think part of the problem we're having with naming things and figuring out how this should be organised is that the semantics of 'unique' are unhelpful.
Let's discard the whole term "unique" and focus on the language of duplicates and how we handle duplicates: we need to detect duplicates (by hash), and then manage them.
Concretely this leads me to the following suggestions:
instead of a single Drupal\advancedqueue\Plugin\AdvancedQueue\Backend\SupportsUniqueJobsInterface we have:
Drupal\advancedqueue\Plugin\AdvancedQueue\Backend\SupportsDetectingDuplicateJobsInterface
Our annotations become:
Notice that DUPLICATES_MERGE I've said requires SupportsDeletingJobsInterface instead of some new SupportsUpdatingJobsInterface - this is because the simplest possible way of updating a job is to delete the old one and insert a new one.
I'm saying 'merge' here instead of the previous 'overwrite' because the basic idea in order to avoid committing us to what the payload should be of the final updated job; in this first implementation we should just overwrite the old job with the new one, but in future we might allow the job type plugin to have an opinion about how the payload of old and new job are combined.
Comment #109
jonathanshawI agree that BackendBase::ensureUniqueJobsSupport() does not seem necessary. It's unused in the Database backend which is the only one we offer.
I agree SupportsUniqueJobsInterface doesn't need a public getDuplicateJobId() method.
I dislike the jobTypeRequiresUniqueness() and getJobTypeUniquenessType() methods on the JobTypeManager. I understand the logic for wanting to check this without instantiating a plugin, but the semantics seems very ugly. I think it would be nicer to have this as just a single method on the Job class: Job::getDuplicatesPolicy() which itself calls out to JobTypeManager::getDefinition().
I agree
$this->getBackend()->supportsJobType($job_type);is a nicer semantic than trying to figure this out on the Queue entity or in the JobTypeManager. Actually if we do this, then strictly speaking we don't even need the SupportsUnique/Duplicates interface at all.Comment #111
jonathanshawI've opened a new MR as I'm making a significant number of changes here:
- using 'duplicates' as the key term instead of 'unique'
- introducing the term 'duplicates policy' to describe the duplicates handling strategy specified on a job type.
- using 'fingerprint' instead of 'unique id' or 'job hash'
- removing jobTypeRequiresUniqueness() and getJobTypeUniquenessType() methods from the JobTypeManager, in favour of new method Job::getDuplicatesPolicy()
- removing SupportsUniqueJobsInterface altogether, adding no new interface
- adding a verifyJobSupport() method to BackendInterface and BackendBase.
However, doing this I've realised that a lot of the reason why this issues is getting tricky goes back to #20:
This goal makes sense. However, it is not really acheivable for the reasons pointed to in #93. It is perfectly possible and reasonable for custom code using this module to enqueue jobs with:
$queue->getBackend()->enqueueJobs([$jobs]);This means that calling code has direct access to the backend plugins. We can't stop calling code from passing jobs to backend plugins that can't properly handle them, and we can't force custom or contrib backend plugins to verify they can handle a job properly.
Everything we are doing with SupportsUniqueJobsInterface, Queue::verifyQueueSatisfiesJob(), BackendInterface::verifyJobSupport(), JobTypeManager::jobTypeRequiresUniqueness() etc is basically wishful thinking. We are trying to create protective mechanisms but we can't ensure that anyone uses them.
I think at this point we should:
- rip out these mechanisms (which in MR11 are basically simplified to BackendInterface::verifyJobSupport())
- create a new issue for discussing and working on this rather tricky problem, which may apply to other future features not just this duplicates feature
- document on the annotation type that duplicates handling may not be safe when using backend other than the Database backend.
Comment #112
joachim commented> Let's discard the whole term "unique" and focus on the language of duplicates and how we handle duplicates: we need to detect duplicates (by hash), and then manage them.
I sort of see what you're getting at, but the word 'duplicate' carries the same connotations as 'unique' -- ensuring there's only one of something.
> I think it would be nicer to have this as just a single method on the Job class: Job::getDuplicatesPolicy() which itself calls out to JobTypeManager::getDefinition().
> - removing jobTypeRequiresUniqueness() and getJobTypeUniquenessType() methods from the JobTypeManager, in favour of new method Job::getDuplicatesPolicy()
The Job class can't depend on the service container!!!! I *removed* that in my latest commits. It crashes the unit tests otherwise.
> Notice that DUPLICATES_MERGE I've said requires SupportsDeletingJobsInterface instead of some new SupportsUpdatingJobsInterface - this is because the simplest possible way of updating a job is to delete the old one and insert a new one.
Then it's not 'merge', it's 'replace'. Saying 'merge' when it's actually deleting the old one is misleading.
Comment #113
bojanz commentedIs there enough value in actually support the "merge" strategy? Or the concept of strategies at all?
In all other implementations I've used, a job/task is either unique/deduplicated (meaning that a job with the same fingerprint is rejected on enqueue) or it isn't.
My assumption without reading the code would be that I do:
enqueueJobs() can check if $job->getFingerprint() is non-empty, and if the backend does not implement SupportsUniqueJobsInterface, throw an exception.
Comment #114
jcnventuraI tend to support #113 as well. From the point of view of maintenance, it makes the code a lot simpler, and I think also from the point of view of an API user. As an API user, I'd try to create the job, and if I get the exception, I either propagate the error up, or I do something to make the job unique and try again. Normally, the API user is the one that would better understand the consequences of trying to create duplicate jobs.
Comment #115
jonathanshawThanks to everyone for the feedback.
What I've done is:
- simplified the job type annotation to a boolean
- given the job type plugin responsibility for deciding how duplicates should be handled
- made the base job type plugin's implementation simply throw an exception if there's a duplicate
- given the Queue entity, not the Database backend, responsibility for looking for duplicates.
This means that all of the duplicates functionality added in this issue is completely bypassed if you call $queue->getBackend()->enqueue(). But if you call $queue->enqueue() you get safe behavior whatever backend you use.
I believe in a follow-up issue we should deprecate Queue::getBackend(). This function creates a leaky abstraction of sorts; by putting API users in direct contact with the backend, we lose any ability to intermediate. Instead API users should interact with the queue entity, and we can use that to provide safety and backwards compatability as we add new features like this in future.
Comment #116
jonathanshawI like where this has got too. It seems clean and simple, but flexible enough to accomodate any use case.
Comment #117
jonathanshawGreen at last.
Comment #118
adamfranco commented@jonathanshaw, there's a typo in Merge request 11 on lines 132-133 of
advancedqueue.install. The variable initialized is$schemabut undefined$database_schemaname is used. See attached patch.Comment #119
jonathanshawThanks @adamfranco, I fixed the variable name in the update hook.
Comment #120
astonvictor commentedHi @jonathanjfshaw,
thanks for your work and your MR.
I tested your changes by:
- creating a new
AdvancedQueueJobTypeplugin withallow_duplicates = falseconfiguration;- added a code to create a new job after submitting a custom form.
as a result after creating the second job with the same value, I got a fatal error -
Drupal\advancedqueue\Exception\DuplicateJobException: Job rejected because it duplicates existing jobs.I guess, we should simply not add the second job to the queue and add a log with a notice.
FYI tested with drupal 10.1.5, PHP 8.1
What do you think?
Comment #121
jonathanshawYes, that behavior is by design. This strategy was chosen by the maintainer in #114:
Comment #122
astonvictor commentedyeah, it works for me with try-catch.
I guess it should be somehow documented on the module page / readme.
Comment #124
tbkot commentedPatch with the latest changes from MR to be used in #2931290: Multiple recurring orders get created if cron is badly configured
Comment #125
tbkot commentedUpdated patch for 1.0 version instead of 1.x-dev branch
Comment #129
jcnventuraThanks everyone! And there were a lot of you. As to how to document, this.. I guess it will be part of the release notes when the next release is created. If someone would be kind enough to create a change record, that would be amazing.
Comment #131
khiminrm commentedHi!
When testing the patch, I've noticed that it's not always working properly.
I will try to explain. I've tested it with
commerce_recurring.The job payload is simple:
['order_id' => $order->id()]So I've run the 'Commerce Recurring' cron job to process a recurring order. The job has created two queue items:
with types
commerce_recurring_order_closeandcommerce_recurring_order_renewwith payload{"order_id":"230185"}.Then I've run the 'Advanced Queue' cron job. It has processed the queue items. One of them was failed due to another issue not related to this one. And another one was successfully processed. So I had two queue items with states
failureandand they were found as duplicated.
So, I think we should remove condition where we check only two states in the
getDuplicateJobs()method:->condition('state', [Job::STATE_QUEUED, Job::STATE_PROCESSING, Job::STATE_FAILURE], 'IN');Please, review and re-test if possible. I'm attaching the updated patch from #125 for 1.0 and interdiff.
Thanks in advance!
Comment #132
jsacksick commentedHm... Perhaps in some cases we still want to allow queuing jobs with the same fingerprint?
One usecase I could think of is a job that notifies a 3rd party system after an entity was updated, you probably wouldn't to notify twice for the same update, but would still want to notify about subsequent updates but I'm guessing in this case the payload should contain what's supposedly unique about the update i.e. the changed timestamp for example in addition to the entity ID?
Not sure, but in the case of these recurring jobs, we never want to queue duplicate jobs...
Comment #133
khiminrm commentedCreated new issue https://www.drupal.org/project/advancedqueue/issues/3460188
Comment #134
bojanz commentedBy definition a fingerprint must be unique. Therefore it should be impossible to queue two jobs with the same fingerprint. Producers who wish to run a job after all can influence the fingerprint by including an always-unique component such as a timestamp.
Comment #135
jsacksick commented@bojanz: Agreed, wanted to come back and edit my message, but Roman posted a patch in that sense, a fingerprint should be unique and it'd be the responsability of the job type to make sure it is for usecases I mentioned in #132.
Comment #136
zaporylieDraft Change Record was created: https://www.drupal.org/node/3468315. Feel free to adjust and publish before making a tagged release that includes this feature. Thanks, everyone for a great job on this, hope to see it as part of the tagged release in a very short time 😉
Comment #137
jonathanshawWe overlooked something: we didn't harden the logic in the queue to handle the possibility that the job plugin's handleDuplicateJobs() method might return null to cancel the job.
I opened a follow-up: #3476814: Handle nulled duplicate jobs