...If not now, then eventually.

Currently, multistep forms save some seed data to the current session. This data isn't being properly cleared out, and as time goes on will mushroom to fill the sessions table. The attached patch makes sure that stale form seed information gets nuked if it's been around for more than a day.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

eaton’s picture

FileSize
2.46 KB

Unecessary hunk pulled out, unecessary date calculations replaced with a simple time value.

dopry’s picture

+1 the code looks sound. It doesn't break test installation.

I had some reservations about the time base session clearing, but chx and eaton explained there isn't enough context from within formAPI to know when a user is done with the form, so time based expiry was the best option.

It should probably be noted to developers that if they're paranoid about dragging around session data they should unset the data stored by their multistep forms when the form is completed.

chx’s picture

FileSize
2.03 KB

Now is the unnecessary hunk removed.

RobRoy’s picture

Status: Needs review » Needs work

Small nit: It would be nice to have a bit more (and a period) in the header docs for the cleanup function so developers know that this cleans out day old data, not all data...something like

"Remove form information from the $_SESSION['form'] array that is more than one day old."

chx’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
1.89 KB

Comments are important but they shall not keep up a critical...

chx’s picture

uh, this can be read the wrong way, so I add: "so I have rerolled to further it".

RobRoy’s picture

Comments look good, that's all I can really comment on. Budump tssh! :)

m3avrck’s picture

Status: Reviewed & tested by the community » Needs work

It appears this clean function is *never* called in this last patch. Not sure where to put otherwise I would.

chx’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
1.94 KB

erm yes...

chx’s picture

Status: Reviewed & tested by the community » Needs review

I shall let Eaton RTBC this.

eaton’s picture

Status: Needs review » Reviewed & tested by the community

The conversation with Dopry raised some questions for me, too. Long-term, we may want to think through ways of making up-to-the-moment clearing more straightforward for modules. For now, though, in the final hours of beta 1, this solves the emergency.

RTBC it shall be.

RobRoy’s picture

FileSize
1.98 KB

Comments got borked in the new revision. This patch just changes some comments.

Dries’s picture

Status: Reviewed & tested by the community » Closed (won't fix)

Session data is already garbage collected. This is not needed.

chx’s picture

Status: Closed (won't fix) » Reviewed & tested by the community

Only when the session expires. That's by far not enough, if you have a busy ecommerce site (which will use #multistep heavily, I guess) and the default about two week session expire.

Dries’s picture

Status: Reviewed & tested by the community » Closed (won't fix)

Show me it is a real problem, and I'll reconsider this. You can store a LOT of form data before it becomes problematic.

eaton’s picture

Dries, any form that has #multistep set to true will store its initialization params in the session. Each instance of each page in each form with that flag will have its seed data stored. One by one it's not really a problem. But that means those unused seed values will accumulate in saved sessions. And if the users log on regularly, their sessions won't be garbage collected. On a site with a few hundred users, doing something like creating polls regularly, or using any new module that instantiates a multi-step form, the endless accumulation could be a serious problem.

eaton’s picture

Status: Closed (won't fix) » Needs work

OK. To demonstrate, I set up a basic site, visited the modules page, turned on poll module, and made a poll. I previewed once, then saved the values. Then I checked the contents of $_SESSION.

Array
(
    [form] => Array
        (
            [e4a171783b07279dca279496088672d4] => Array
                (
                    [0] => system_modules
                )
            [feba265b479eac97a574ffb3a1341d65] => Array
                (
                    [0] => system_modules
                )
            [0275691da1ebf1a266baf1058823b75a] => Array
                (
                    [0] => poll_node_form
                    [1] => Array
                        (
                             [uid] => 1
                             [name] => admin
                             [type] => poll
                        )
                )
            [617d62ef5e0e6d5c20ab4f5fcf900724] => Array
                (
                    [0] => poll_node_form
                    [1] => Array
                        (
                            [uid] => 1
                            [name] => admin
                            [type] => poll
                        )
                    [2] => Array
                        (
                            [title] => This is my sample poll
                            [changed] => 
                            [choices] => 5
                            [choice] => Array
                                (
                                    [0] => Array
                                        (
                                            [chtext] => Question 1
                                            [chvotes] => 0
                                        )
                                    [1] => Array
                                        (
                                            [chtext] => Question 2
                                            [chvotes] => 0
                                        )
                                    [2] => Array
                                        (
                                            [chtext] => Question 3
                                            [chvotes] => 0
                                        )
                                    [3] => Array
                                        (
                                            [chtext] => 
                                            [chvotes] => 0
                                        )
                                    [4] => Array
                                        (
                                            [chtext] => 
                                            [chvotes] => 0
                                        )
                                )
                            [morechoices] => 1
                            [active] => 1
                            [runtime] => 0
                            [form_build_id] => 0275691da1ebf1a266baf1058823b75a
                            [form_token] => d8408eeeba2594b40bac0704b26bf690
                            [form_id] => poll_node_form
                            [log] => 
                            [comment] => 2
                            [menu] => Array
                                (
                                    [title] => 
                                    [description] => 
                                    [pid] => 1
                                    [path] => 
                                    [weight] => 0
                                    [mid] => 0
                                    [type] => 86
                                )
                            [name] => admin
                            [date] => 
                            [status] => 1
                            [promote] => 1
                            [op] => Preview
                        )
                )
            [8a5f9625b4bfe1e7df793d9932b96053] => Array
                (
                    [0] => poll_node_form
                    [1] => Array
                        (
                            [uid] => 1
                            [name] => admin
                            [type] => poll
                        )
                )
[node_overview_filter] => Array
        (
        )
)

Pay special attention to the entry for node_poll_form -- the *full form_values* array is persisted. This is necessary because the form values become a seed for the next step in many multistep forms. And we're lucky, because core only tends to use it on forms with a very small footprint. But persisting all that data is a BIG, big problem.

If I add something like ecommerce, which will use extensive multistep forms for *every purchase made by every user* the results will be pretty grim. It boils down to this: the information we store in the session is transient, and is not intended for long term persisting. There's no reason to keep it, and keeping it can become a problem in the future because of the eternal accumulation of data under some circumstances. My big question with this patch was whether waiting a day left too *much* opportunity for accumulating session bloat. The more I look at it, something like an hour might be much better. Leaving it to eventual session expiration is just bad. Very, very bad

eaton’s picture

Status: Needs work » Needs review
drumm’s picture

I think this is okay. Is there any good way to be more proactive about storing less data?

chx’s picture

No.

eaton’s picture

drumm: to expand on chx' statement, there's not really any way to reduce the amount of information we store without making a lot of assumtions about what subsequent steps in the multipart forms will require. In 6.0, we might consider being able to mark certain fields as explicitly multistep, and only having THOSE fields appear in the value collection that's passed on to the next step. That, though, is prone to its own set of weird hassles.

I'll take a closer look today to see if there are any other opportunities to automatically clear stuff, but I don't think we can do too much more.

eaton’s picture

Er, more specifically, I don't see how we can do too much more without adding a LOT of code VERY late in the cycle.

Dries’s picture

Or store it in the cache table and get GC for free?

eaton’s picture

That's one possibility, although it would rule out any possibility of using multistep forms in pre-bootstrap scenerios like the installer, and would lead to a few more DB hits each time a multistep is displayed. I'd have to look, too, to ensure that a cache_clear_all() by user a wouldn't destroy an in-progress form by user b.

It might be an acceptable solution, but it's not without drawbacks. chx, any thoughts?

chx’s picture

This is a very dangerous path to walk on. If you are on a complex portal site with ecommerce and you are checking out and someone submits a node then boing! your cart is gone? I do not think I like the idea.

pwolanin’s picture

Since there is now the capacity for additional cache tables, why not have a separate cache table for this purpose? I think cache_clear_all() only clears the page cache, so the multi-step forms cache could have its own GC performed via a cron hook.

drumm’s picture

Status: Needs review » Needs work

I like the cache idea. Seems like a better fit for the data. I'm not sure how chx's scenario could happen with proper cache keys (uid would be in the key).

pwolanin’s picture

why not construct a cache ID on some combination of uid, form_id, and session to make it maximally granular?

eaton’s picture

This still raises difficult questions: I'd prefer that we NOT use a table, at least at this point, to store the data. It will make multistep forms incompatible with the pre-db installation process, something that closes off future enhancements to us. If we do do this, I'd prefer a separate cache table just for transient form data. What's the 'process' for generating a new cache table?

Dries’s picture

The problem with using the cache table is that is know to be 'nukable'. That is, at any point in time, one might nuke the cache without excepting data loss, or causing trouble. The old cache data is always properly regenerated. The session data, on the other hand, is not supposed to get lost and can't be regenerated. So maybe using the cache isn't a good idea after all?

eaton’s picture

The problem with using the cache table is that is know to be 'nukable'. That is, at any point in time, one might nuke the cache without excepting data loss, or causing trouble. The old cache data is always properly regenerated. The session data, on the other hand, is not supposed to get lost and can't be regenerated. So maybe using the cache isn't a good idea after all?

Precisely. The session *does* do exactly what we need; it's just a matter of clearing it out in a more timely fashion to prevent evil stuff like, 'five million people filled out my survey form, now the sessions are filling my database..."

m3avrck’s picture

It seems to me that the last patch posted solves this problem elegantly.

A separate table to track this is a farse. This multistep data is tied 100% to a user's session. We have a SESSIONS table just for that.

All that we need is to properly clear this table ONCE the user has completed this multistep form. You can't just clear this at anytime, what if you are to wipe a session that a user is in the middle of? A safe guard of 1 day is a reasonable guess for this.

The ideal solution would clear this session data as soon as the user submits the multistep form. In which case, we'd need a new param, like #form-final-step and then when that is submitted, we clear session data. That would be the BEST solution but the code to do so, might not be.

eaton’s picture

Status: Needs work » Needs review

We seem to have come full circle. I still feel the solution in the patch is the best one we have until 6 rolls around and we can look at more comprehensive/agressive/api-breaking approaches if necessary. For 5, though, this s a great boon.

Owen Barton’s picture

I agree - relying on session gc won't be any good for sites with a regularly visiting users and/or long session expiration.
+1 for session_3.patch!

Owen Barton’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
2.02 KB

RTBC - patch synced with HEAD

neclimdul’s picture

+1 I've tried to break this and haven't been able too. It only changes things on multistep forms so this only affects sites which use a large number of multistep forms. Very good.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD.

Anonymous’s picture

Status: Fixed » Closed (fixed)
Gribnif’s picture

Status: Closed (fixed) » Active
FileSize
1.68 KB

I really don't think this change, as it currently stands, fully resolves the issue. If I have a multistep form on my site, what's to keep someone from writing a script that reloads the page repeatedly until my database fills up?

The MySQL docs define the size of a LONGTEXT type as: "A TEXT column with a maximum length of 4,294,967,295 or 4GB (2^32 – 1) characters. The maximum effective (permitted) length of LONGTEXT columns depends on the configured maximum packet size in the client/server protocol and available memory."

Sounds dangerous to me.

Perhaps we can come up with some way to identify the fact that the user has revisited the first step of a given form via reload (as opposed to some sort of "back" button on the form itself) and then delete all of the previous session data? In this case, there's no sense keeping around other data from partially-completed versions of the same form, so just nuke it all.

The attached patch attempts to do exactly this, based on the page's URL and the $form_id parameter. It does still leave one copy of the form data hanging around after the form has been completely submitted, but that's not too bad IMHO. What does everyone think?

moshe weitzman’s picture

IMO we need not bloat our codebase with "whats to stop someone from x". there are way way too many holes to plug.

chx’s picture

Status: Active » Closed (won't fix)

D6 won't use the session for this, for D5 , I am not inclined to 'fix' as Moshe said.