Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
...If not now, then eventually.
Currently, multistep forms save some seed data to the current session. This data isn't being properly cleared out, and as time goes on will mushroom to fill the sessions table. The attached patch makes sure that stale form seed information gets nuked if it's been around for more than a day.
Comment | File | Size | Author |
---|---|---|---|
#39 | multistep-form-session.patch | 1.68 KB | Gribnif |
#35 | session_4.patch.txt | 2.02 KB | Owen Barton |
#12 | session_3.patch | 1.98 KB | RobRoy |
#9 | multistep_sessions_3.patch | 1.94 KB | chx |
#5 | multistep_sessions_2.patch | 1.89 KB | chx |
Comments
Comment #1
eaton CreditAttribution: eaton commentedUnecessary hunk pulled out, unecessary date calculations replaced with a simple time value.
Comment #2
dopry CreditAttribution: dopry commented+1 the code looks sound. It doesn't break test installation.
I had some reservations about the time base session clearing, but chx and eaton explained there isn't enough context from within formAPI to know when a user is done with the form, so time based expiry was the best option.
It should probably be noted to developers that if they're paranoid about dragging around session data they should unset the data stored by their multistep forms when the form is completed.
Comment #3
chx CreditAttribution: chx commentedNow is the unnecessary hunk removed.
Comment #4
RobRoy CreditAttribution: RobRoy commentedSmall nit: It would be nice to have a bit more (and a period) in the header docs for the cleanup function so developers know that this cleans out day old data, not all data...something like
"Remove form information from the $_SESSION['form'] array that is more than one day old."
Comment #5
chx CreditAttribution: chx commentedComments are important but they shall not keep up a critical...
Comment #6
chx CreditAttribution: chx commenteduh, this can be read the wrong way, so I add: "so I have rerolled to further it".
Comment #7
RobRoy CreditAttribution: RobRoy commentedComments look good, that's all I can really comment on. Budump tssh! :)
Comment #8
m3avrck CreditAttribution: m3avrck commentedIt appears this clean function is *never* called in this last patch. Not sure where to put otherwise I would.
Comment #9
chx CreditAttribution: chx commentederm yes...
Comment #10
chx CreditAttribution: chx commentedI shall let Eaton RTBC this.
Comment #11
eaton CreditAttribution: eaton commentedThe conversation with Dopry raised some questions for me, too. Long-term, we may want to think through ways of making up-to-the-moment clearing more straightforward for modules. For now, though, in the final hours of beta 1, this solves the emergency.
RTBC it shall be.
Comment #12
RobRoy CreditAttribution: RobRoy commentedComments got borked in the new revision. This patch just changes some comments.
Comment #13
Dries CreditAttribution: Dries commentedSession data is already garbage collected. This is not needed.
Comment #14
chx CreditAttribution: chx commentedOnly when the session expires. That's by far not enough, if you have a busy ecommerce site (which will use #multistep heavily, I guess) and the default about two week session expire.
Comment #15
Dries CreditAttribution: Dries commentedShow me it is a real problem, and I'll reconsider this. You can store a LOT of form data before it becomes problematic.
Comment #16
eaton CreditAttribution: eaton commentedDries, any form that has #multistep set to true will store its initialization params in the session. Each instance of each page in each form with that flag will have its seed data stored. One by one it's not really a problem. But that means those unused seed values will accumulate in saved sessions. And if the users log on regularly, their sessions won't be garbage collected. On a site with a few hundred users, doing something like creating polls regularly, or using any new module that instantiates a multi-step form, the endless accumulation could be a serious problem.
Comment #17
eaton CreditAttribution: eaton commentedOK. To demonstrate, I set up a basic site, visited the modules page, turned on poll module, and made a poll. I previewed once, then saved the values. Then I checked the contents of $_SESSION.
Pay special attention to the entry for node_poll_form -- the *full form_values* array is persisted. This is necessary because the form values become a seed for the next step in many multistep forms. And we're lucky, because core only tends to use it on forms with a very small footprint. But persisting all that data is a BIG, big problem.
If I add something like ecommerce, which will use extensive multistep forms for *every purchase made by every user* the results will be pretty grim. It boils down to this: the information we store in the session is transient, and is not intended for long term persisting. There's no reason to keep it, and keeping it can become a problem in the future because of the eternal accumulation of data under some circumstances. My big question with this patch was whether waiting a day left too *much* opportunity for accumulating session bloat. The more I look at it, something like an hour might be much better. Leaving it to eventual session expiration is just bad. Very, very bad
Comment #18
eaton CreditAttribution: eaton commentedComment #19
drummI think this is okay. Is there any good way to be more proactive about storing less data?
Comment #20
chx CreditAttribution: chx commentedNo.
Comment #21
eaton CreditAttribution: eaton commenteddrumm: to expand on chx' statement, there's not really any way to reduce the amount of information we store without making a lot of assumtions about what subsequent steps in the multipart forms will require. In 6.0, we might consider being able to mark certain fields as explicitly multistep, and only having THOSE fields appear in the value collection that's passed on to the next step. That, though, is prone to its own set of weird hassles.
I'll take a closer look today to see if there are any other opportunities to automatically clear stuff, but I don't think we can do too much more.
Comment #22
eaton CreditAttribution: eaton commentedEr, more specifically, I don't see how we can do too much more without adding a LOT of code VERY late in the cycle.
Comment #23
Dries CreditAttribution: Dries commentedOr store it in the cache table and get GC for free?
Comment #24
eaton CreditAttribution: eaton commentedThat's one possibility, although it would rule out any possibility of using multistep forms in pre-bootstrap scenerios like the installer, and would lead to a few more DB hits each time a multistep is displayed. I'd have to look, too, to ensure that a cache_clear_all() by user a wouldn't destroy an in-progress form by user b.
It might be an acceptable solution, but it's not without drawbacks. chx, any thoughts?
Comment #25
chx CreditAttribution: chx commentedThis is a very dangerous path to walk on. If you are on a complex portal site with ecommerce and you are checking out and someone submits a node then boing! your cart is gone? I do not think I like the idea.
Comment #26
pwolanin CreditAttribution: pwolanin commentedSince there is now the capacity for additional cache tables, why not have a separate cache table for this purpose? I think cache_clear_all() only clears the page cache, so the multi-step forms cache could have its own GC performed via a cron hook.
Comment #27
drummI like the cache idea. Seems like a better fit for the data. I'm not sure how chx's scenario could happen with proper cache keys (uid would be in the key).
Comment #28
pwolanin CreditAttribution: pwolanin commentedwhy not construct a cache ID on some combination of uid, form_id, and session to make it maximally granular?
Comment #29
eaton CreditAttribution: eaton commentedThis still raises difficult questions: I'd prefer that we NOT use a table, at least at this point, to store the data. It will make multistep forms incompatible with the pre-db installation process, something that closes off future enhancements to us. If we do do this, I'd prefer a separate cache table just for transient form data. What's the 'process' for generating a new cache table?
Comment #30
Dries CreditAttribution: Dries commentedThe problem with using the cache table is that is know to be 'nukable'. That is, at any point in time, one might nuke the cache without excepting data loss, or causing trouble. The old cache data is always properly regenerated. The session data, on the other hand, is not supposed to get lost and can't be regenerated. So maybe using the cache isn't a good idea after all?
Comment #31
eaton CreditAttribution: eaton commentedPrecisely. The session *does* do exactly what we need; it's just a matter of clearing it out in a more timely fashion to prevent evil stuff like, 'five million people filled out my survey form, now the sessions are filling my database..."
Comment #32
m3avrck CreditAttribution: m3avrck commentedIt seems to me that the last patch posted solves this problem elegantly.
A separate table to track this is a farse. This multistep data is tied 100% to a user's session. We have a SESSIONS table just for that.
All that we need is to properly clear this table ONCE the user has completed this multistep form. You can't just clear this at anytime, what if you are to wipe a session that a user is in the middle of? A safe guard of 1 day is a reasonable guess for this.
The ideal solution would clear this session data as soon as the user submits the multistep form. In which case, we'd need a new param, like #form-final-step and then when that is submitted, we clear session data. That would be the BEST solution but the code to do so, might not be.
Comment #33
eaton CreditAttribution: eaton commentedWe seem to have come full circle. I still feel the solution in the patch is the best one we have until 6 rolls around and we can look at more comprehensive/agressive/api-breaking approaches if necessary. For 5, though, this s a great boon.
Comment #34
Owen Barton CreditAttribution: Owen Barton commentedI agree - relying on session gc won't be any good for sites with a regularly visiting users and/or long session expiration.
+1 for session_3.patch!
Comment #35
Owen Barton CreditAttribution: Owen Barton commentedRTBC - patch synced with HEAD
Comment #36
neclimdul+1 I've tried to break this and haven't been able too. It only changes things on multistep forms so this only affects sites which use a large number of multistep forms. Very good.
Comment #37
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD.
Comment #38
(not verified) CreditAttribution: commentedComment #39
Gribnif CreditAttribution: Gribnif commentedI really don't think this change, as it currently stands, fully resolves the issue. If I have a multistep form on my site, what's to keep someone from writing a script that reloads the page repeatedly until my database fills up?
The MySQL docs define the size of a LONGTEXT type as: "A TEXT column with a maximum length of 4,294,967,295 or 4GB (2^32 – 1) characters. The maximum effective (permitted) length of LONGTEXT columns depends on the configured maximum packet size in the client/server protocol and available memory."
Sounds dangerous to me.
Perhaps we can come up with some way to identify the fact that the user has revisited the first step of a given form via reload (as opposed to some sort of "back" button on the form itself) and then delete all of the previous session data? In this case, there's no sense keeping around other data from partially-completed versions of the same form, so just nuke it all.
The attached patch attempts to do exactly this, based on the page's URL and the $form_id parameter. It does still leave one copy of the form data hanging around after the form has been completely submitted, but that's not too bad IMHO. What does everyone think?
Comment #40
moshe weitzman CreditAttribution: moshe weitzman commentedIMO we need not bloat our codebase with "whats to stop someone from x". there are way way too many holes to plug.
Comment #41
chx CreditAttribution: chx commentedD6 won't use the session for this, for D5 , I am not inclined to 'fix' as Moshe said.