Problem/Motivation
We recently ran into an issue with Commerce where errors during checkout weren't handled correctly. The result was orders being completed without attempting to charge the payment method the user had entered onto the order.
The issue is here: #2219483-29: Fields altered onto the checkout form outside of panes don't prevent form submission when they fail validation
Once the patch from that queue was applied, it revealed an issue with Commerce Discount that prevented users from moving forward to the next page during checkout.
The error appears to be happening because the commerce_discount line items set against the order are being recreated at some point so that when commerce_line_item_field_validate starts to iterate through the passed line item ids (around line 823), versus line items loaded via commerce_line_item_load_multiple and compare them the commerce_discount line items will never match up. This is consistently reproducible with the current dev version of Commerce Coupon and Commerce 7.x-1.11 with the above patch applied.
Product discounts don't appear to be affected. Just create an order discount (no coupon required) and it will make it so checkout pages will simply refresh (without any printed errors) prohibiting checkout completion.
It appears that Commerce Discount has been relying on this bug in Commerce which allows it to bypass validation of its line items.
Steps to reproduce
- Apply patch #2219483-29: Fields altered onto the checkout form outside of panes don't prevent form submission when they fail validation
- Create an order discount
- It will make it so checkout pages will simply refresh (without any printed errors) prohibiting checkout completion.
Proposed resolution
Defer deletion of discount line items until after order form has passed validation (applies to each step in checkout).
Remaining tasks
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #12 | commerce_discount_line_items_prevent_validation-2618488-12.patch | 1.36 KB | ryandekker |
Comments
Comment #2
merauluka commentedFixing a typo
Comment #3
mglamanLinking to related #2219483: Fields altered onto the checkout form outside of panes don't prevent form submission when they fail validation. merauluka, if the patch in that issue resolves some issues, or helps find where modules are introducing bugs by fixing a bug of its own, would you mind marking it RTBC? There's been a few people who have tossed their thumbs up.
Comment #4
merauluka commentedAnother user beat me to it, but I did comment that the patch in #2219483 fixed our payment skipping issue.
Comment #5
joelpittetTouch up issue summary to make the Steps to reproduce clear.
Comment #6
ryandekker commentedI've explored this issue a bit and I've noticed a couple things.
commerce_line_item_delete_multiple($line_items_to_delete);seems to have resolved the issue in checkout for me, albeit with the issue of adding yet more cruft in the commerce_line_item table. Ideal? No. But does address the issue of failed validation in the cart, and allows the order to remain essentially the same.So, if we deferred deletion until later (after form validation, not sure which hook) we could address both the cruft I'm seeing using the module normally (point 1), and point 2 wouldn't cause too much of an effect. Just throwing that out there as an option.
However, as I dug into this a bit more, it seems to me that that deletion thing as a whole is unnecessary. I haven't tested thoroughly tested this idea, but the calculations seem to stay constant by removing the deletion. I don't know if the module has morphed over time and that's a legacy strategy, but the
commerce_discount_set_existing_line_item_price()function already accounts for update situations.This patch merely removes several lines of code from the
commerce_discount_commerce_cart_order_refresh()function, which allows updates to the existing line item rather than the the deletion/re-creation cycle. In my testing, this addresses the issue at hand as well what I brought in point 1 above.It does seem a bit too good to be true to me, so would love some feedback on the problems this may cause. We'll be doing more testing ourselves and will report back with what we find.
Comment #8
mglamanSo the test failed because Commerce Order cleans up deleted line items. Free products need to be deleted. (added, totally forgot to finish my sentence on first post.)
Are the errors being caused happening when the "free products" condition is used? Or just the random create/delete of normal discount offers?
Comment #9
ryandekker commentedThe validation issue still occurs for me with that added back in (and with the deletion line). Also, that only resolved 2 out of the 6 failures. The 4
testCommerceDiscountCompatibilityStrategiesfailures are still present.Perhaps it is best to just handle deletion later on (even on cron, probably with a query that looks up straggling rows), as the order object remains correct without that cleanup. Just removing this line still passes all tests and will still pass form validation:
Thoughts?
Comment #10
mglamanWe could do something like this, where
$line_items_to_deleteis now an array of line item IDs.Then our cron worker callback invokes
I like this idea because it removes processing out of the order workflow and moves it to a later process.
Comment #11
torgospizzaI second this notion!
Comment #12
ryandekker commentedOk, here's the cron/queue thing implemented.
It seems to fix the issue at hand and still delete the same items that the original version does. Give it a spin and let me know what you think.
However, the issue I raised before, straggling line items in the commerce_line_item table, does still seem to be present. I'm not sure how, but line items get disconnected from their orders randomly. Granted, it's just a cleanliness issue, but it's so closely related to this one I'd really like to get it knocked out as well.
Here's a query that I ran to check how many of these straggling line_items we have in our database, run it for yours and see if you don't have many as well:
While we're doing a queue processor and doing cleanup later on, what about running a query similar to this and queueing the line_item_ids to be deleted using that same queue processor? Unless there's something in code that's adding these that can be addressed at the root of the issue, it just seems like hook_cron to look up these and queue them for deletion would address the random cruft that this module spawns on cart refreshes. Something to the effect of:
Comment #13
joelpittetI'm really not sure how using cron to clean this up will help solve the problem in the issue summary. Could you please explain?
Also if we don't delete the line items right away we could end up with some false positives, so that would need tests but I think that solution needs to be moved to a separate issue as it muddles this up a bit.
Comment #14
joelpittetComment #15
ryandekker commentedThe issue as noted in the summary is that the RTBC issue on Commerce will prevent all orders with discounts from working across the board. This is because the immediate deletion of line items (on cart refresh) causes the form to fail validation (without errors!). The patch in #12 postpones that deletion to allow form validation to pass. Cron has nothing to do with it, other than still needing to clean things up. I updated the proposed resolution in the summary to state this a bit more concisely.
Also... as that's likely to make it into a stable release within a month or two, I don't think Postponed is the right status for this. Were you having problems reproducing the issue? I think it's pretty easy to reproduce, unless this is a much more isolated issue than I believe it is. From what I can tell, if that fix makes it into Commerce before this is resolved then Commerce Discount will break all stores running discounts. We need this fixed before that or this will become a critical issue.
If you had a different idea as far as preventing the issue in the summary, let's discuss. I think the ideal would be not to delete the line items on refresh at all, but rather to update them. However, deletion seems to be an accepted method and converting to updates seems like it would have far-reaching implications for no real gain that I can see.
At any rate, what sort of false positives are you worried about? It still passes all tests and no real change has occurred, aside from when the line items are deleted.
As I mentioned in #12, neither the module in it's present state nor with that patch seems to catch all the extra line items which are no longer associated with orders, which is what I was getting at in the second part of that post. That's not a huge deal, just a cleanliness issue, so that can be it's own issue if need be; I just figured since we're already talking about a cron clean up task...
Comment #16
joelpittetcron has a bit to do with it, that job is run via cron 15 seconds by default after. I'm a bit afraid if discount line items are checked for their existence, and they are gone ~15 seconds later.
https://api.drupal.org/api/drupal/modules!system!system.api.php/function...
Comment #17
joelpittetWas considering moving the discounts on an order to their own table to keep track of them better and I can update/delete/add to that table more frequently without effecting the line existing line items at all. (hopes and dreams). I gotta put some time into that solution though.
The schema @mglaman and I discussed briefly was this in IRC:
Comment #18
ryandekker commentedI'm not totally tracking on #16... The default 15 seconds for cron queue workers is a timeout on the processor, not a delay until it runs. Are you concerned about a sort of race condition if cron fires too early? I suppose I hadn't thought of that... any ideas to avoid that? Though to be fair, that really just puts us back where we are without the patch. Perhaps we could queue a check after validation (say in form_submit) to address the deletions after we know the form is done with them. Thoughts?
Re: #17, it's probably best to move that into it's own issue. I'm also not sure that schema supports the entirety of options with discounts, but you would know that better than I. Calculated amount would really just act as a cache, not sure it's needed. Other than that, I think the only new info here would be the created field. I suppose that it could make things a bit more stable, but it seems like quite the undertaking for not a lot of payout. A shift like that probably deserves a 2.x branch. There's my off-the-cuff thoughts on it, take them or leave them.
Comment #19
joelpittet@ryandekker a timeout is a delay AFAIK. Yes I'm worried about a race condition but not that cron fires too early, that it fires too late and other conditionals check existence of line items types that are about to be deleted. Sorry I'm not very good at explaining things sometimes.
re #17 yes it's own issue, I'm mentioning the idea as a possible data integrity check. It's not part of this issue for sure, just related in what they are trying to solve. And yes calculated amounts would be more like cache and they make things super complicated(personally would like to remove that feature and leave all discounts on the cart)
Comment #20
joelpittetAnd no ideas/thoughts on how to resolve the race condition, which is why the cron idea worries me.
Comment #21
ryandekker commentedA timeout and a delay are different... at least as I'm using the terms. To try to get on the same page here (no condescension! ;) ):
Timeout means the amount of time the callback can run until it will be stopped. So if cron runs at 0:00:00, and this queue has 100 items in it, it'll process as many as it can before 0:00:15. So if those requests are slow and take 1 second each, it will only delete 15 line items. The rest will be saved for the next time cron runs. More likely, all 100 will process in a few seconds, and the next queue processor would start closer to 0:00:05. The timeout is just a way to be responsible so we don't hog too much of the 180 seconds cron sets PHP's max_execution_time to.
Delay means the time before the processing will begin. This isn't what the queue info is referring to. If cron processes 1 second after the line item is added to the queue, the line item would be deleted then. This is the issue I was worried could cause a race condition, as it would operate as it does now: causing validation to fail and preventing forward motion in the form. However, as long as cron isn't running constantly (not a good assumption, but still), the user could resubmit the form and it would be successful in their completion. (Cron queues will not fire until cron is executed, usually on a schedule, when invoked via drush or cron.php. If poorman's cron takes over, then I think (I'm not that familiar with core's poorman cron) it will still fire at the beginning of the request, so still shouldn't cause major issues.)
However, if your concern is that it'll take too long anyway... that's exactly the design of deferring the deletion of the straggling line items. Please note, though, that the straggling line items are still being removed from the order during the cart refresh hook via
$wrapper->commerce_line_items->offsetUnset($delta);. They are simply left hanging around in thecommerce_line_itemtable until cron deletes them (as it would normally do inhook_commerce_cart_order_refresh()).The main requirement is for the line items to still be in the table when
commerce_line_item_field_validate()fires, so thatcommerce_line_item_load_multiple()doesn't fail to load them (which is what causes validation to fail). This is a "new issue" introduced in #2219483-29: Fields altered onto the checkout form outside of panes don't prevent form submission when they fail validation where commerce is changing it's validation to not ignore some validation failures, including the ones which deleting items in the cart refresh causes.Hopefully that helps a bit... What exactly are the "other conditionals" you are concerned about? Like I said, the patch still passes all tests, so I'm not sure what those would be.
And yeah, race condition... I have some thoughts there as far a different take on cron to address that and other cruft in the line items table, but I don't want to go there until you're sold on cron to begin with.
Comment #22
mglamanWe need to get a stack trace for commerce_line_item_field_validate to see what is causing the line item field validation hook to fire. Generally there are checks to see if a line item still exists before operating on it. If we can get a trace we can find this out and maybe mitigate the whole "when we delete things."
If this is the case, we could still run into the error. Cron could run when an order is being refreshed and still get goofy.
Also, based on reviewing commerce_line_item_field_validate().. it looks like the line item ID is persisting?
ryandekker, if you can reproduce.. can you use xdebug or something to get a stacktrace from commerce_line_item_field_validate() when the logic gets goofy. The order is being saved somewhere without removing a reference somewhere.
Comment #23
mglamanComment #24
joelpittet@ryandekker thanks for the through explanation I didn't know it worked like that from reading the docs a few times. I'm coming at timeout from the JavaScript perspective so that's why I'm confused
setTimeout()waits before executing.Poormanscron will wait 3 hours(default in performance settings) before executing on a request and with your assumption regarding running it constantly, I run cron constantly and elysia_cron deals with divvying that time out.
Regardless of all that, I agree with #22 we need to find out where this validation failure is occurring. And introducing a cron hook just makes the problem more complicated in the mental model of when things are happening so I kind-of want to avoid that if possible. I may get my hand forced on this one if that issue goes in.
Comment #25
ryandekker commentedHere's a backtrace:
I'm also attaching a couple of images to help illustrate the issue.
1. field_is_commerce_line_items.png shows that the the field being validated is the main
commerce_line_itemsfield. What may not be clear (but see call stack above) is this is coming from the form, and no longer even represents the current state of the order (because the line item was deleted).2. commerce_discount_line_item_has_been_deleted.png shows the core issue: the order in
$form_state(I believe is where this comes from) has the discount line item deleted in the cart refresh. But when it tries to load it, it can't because it's already been deleted. This causes the mismatch I mentioned.The patch in #12 is a hack that just makes sure it's still there when this is called (by delaying that deletion). This is relatively acceptable in my mind because the error is not a real problem. Obviously this has the problematic possibility of still opening up a race condition (more likely under conditions where queue processing is constant) which maintains the issue currently present.
The alternate cron solution I had in mind was simply to add a queue processor at a different step, likely on completion of the checkout process as a whole, which wouldn't clean up line items until well after they were needed. I think it's actually quite a bit cleaner in general, and would have the added benefit of addressing the issue I mentioned where there were straggling line_items that are not deleted by the module as is stands now anyway.
I'm all for a resolution that addresses the issue in $form_state itself, but you'll have to dig in to find out how to do that as it's not apparent to me. Clearly there's something working for commerce shipping, but then again, it's a much simpler line item that doesn't need to be recalculated all the time. If you could think of a way to address line item updates without requiring deletion this would resolve itself, but that seems like quite the undertaking.
Comment #26
mglamanLinking to possibly related issue, #1804592: During AJAX form submission in checkout, the $order argument passed by the Form API is incorrect.. We're discovering a possible issue around the order in the form state it seems.
Comment #27
joelpittetI've tried to reproduce with the latest RTBC over there and can't reproduce from the steps in the issue summary. Is this something that still exists with the latest patch #2219483-35: Fields altered onto the checkout form outside of panes don't prevent form submission when they fail validation?
If so please add more to the steps to reproduce so that I can help test and sort this.
Comment #28
magicmyth commentedJust ran into this issue without using the patch from #2219483: Fields altered onto the checkout form outside of panes don't prevent form submission when they fail validation. I've not worked out exactly what is causing this issue to be triggered on my site yet but I'm using the latest Commerce release (7.x-1.13) along with commerce_eu_vat so maybe that is the combo. The patch from #12 resolved the issue of not being able to pass the first checkout page.
Comment #29
joelpittetPlease provide minimum steps to reproduce before setting rtbc because I've already tried reproducing with the provided steps
Comment #30
rszrama commentedWith the release of Commerce Discount 7.x-1.0-beta1, I really believe this issue is outdated. We completely revamped the way discount line items were deleted / recreated both to reduce unnecessary recreations and to fix discount compatibility checking.
There may still be some combination of modules that causes this issue to surface, in which case this issue can be reopened with steps to reproduce. However, in a local test with the patch from #2219483-35: Fields altered onto the checkout form outside of panes don't prevent form submission when they fail validation and an order discount, I can proceed through checkout just fine.