Drush provides a feature called sql-sanitize that helps to delete data that shouldn't leave a production environment.

A drush file with these contents is all you should need.

function webform_drush_sql_sync_sanitize($site) {
  drush_sql_register_post_sync_op('webform_submitted_data',
    dt('Delete all data submitted to webforms (depending on the site config, may contain sensitive data).'),
    "TRUNCATE webform_submitted_data;");
}

If there are other tables that webform stores data in, then please do add them in a similar fashion.


Comments

greggles created an issue. See original summary.

greggles’s picture

Title: Sanitize private data in zendesk_users » Sanitize private data in webform_submitted_data
DanChadwick’s picture

Looking at the sql implementation, do we have to deal with table prefixes?
http://api.drush.org/api/drush/commands%21sql%21sync.sql.inc/function/sq...

This goes in the drush file with the drush commands?

Shall we delete submissions too? They contain a record of when the webform was submitted and by whom. If we don't, we will be left with empty submissions (which is fine if that's what is desired).

greggles’s picture

I suggest deleting submissions as well. I was looking for big tables.

I'm not sure there is a good way to deal with prefixes in these hooks. I think people with prefixes are used to the feature not working for them.
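As a rough illustration of the two points above (truncating the submissions table as well, and the prefix question), here is a minimal sketch. The helper example_webform_sanitize_queries() and its $prefix parameter are hypothetical names introduced only for this example; they are not part of Drush or Webform, and the sketch assumes a single string prefix at most. The table name webform_submissions is assumed to be the table holding the submission records.

```php
<?php
/**
 * Hypothetical helper (not part of Drush or Webform): builds one TRUNCATE
 * statement per table, optionally prepending a single string table prefix.
 */
function example_webform_sanitize_queries(array $tables, $prefix = '') {
  $queries = array();
  foreach ($tables as $table) {
    $queries[$table] = 'TRUNCATE ' . $prefix . $table . ';';
  }
  return $queries;
}

/**
 * Implements hook_drush_sql_sync_sanitize().
 *
 * Registers one post-sync operation per table, following the template in
 * the issue summary. No prefix is passed here, so prefixed sites would
 * still not be handled.
 */
function webform_drush_sql_sync_sanitize($site) {
  // Assumption: these are the tables that hold submission data.
  $tables = array('webform_submitted_data', 'webform_submissions');
  foreach (example_webform_sanitize_queries($tables) as $table => $query) {
    drush_sql_register_post_sync_op($table,
      dt('Delete all rows from @table (may contain sensitive data).', array('@table' => $table)),
      $query);
  }
}
```

Keeping the query building in a separate function makes it easy to check the generated SQL without running drush.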

DanChadwick’s picture

There's a bit of a problem with doing this in bulk. Every component gets a chance to delete something, and then every module gets a chance at deleting. See webform_submission_delete().

Also see the related issue with clearing large numbers of submissions.

And if webform_results_clear uses the batch api, then we could loop through all webform nodes. This could be written to not consume unlimited memory, but it could take a verrrry long time.

Can drush commands use the batch api?

DanChadwick’s picture

Status: Active » Postponed (maintainer needs more info)

Greg -- Can you clarify what is desired here?

  1. Delete the data directly from the database with SQL, bypassing hooks. Will be fast, but may result in latent database damage because clean-up code won't get called. For example, any uploaded files will be stranded and never deleted. Other contrib modules may have issues.
  2. Delete the data for every webform, for every submission, calling all the hooks. Will leave the database correct, but could be VERY lengthy for big databases.
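For comparison with option 1, option 2 might look roughly like the sketch below. This is only an outline under stated assumptions: the function name example_webform_delete_all_submissions() is hypothetical, it assumes a fully bootstrapped Drupal 7 site with Webform enabled, and it assumes webform_get_submissions() accepts an array('nid' => ...) filter. It loads all submissions for a node at once, so it does not address memory use or run time on large sites.

```php
<?php
/**
 * Hypothetical sketch: delete every submission of every webform node through
 * webform_submission_delete(), so that component and module hooks still run.
 *
 * Returns the number of submissions deleted.
 */
function example_webform_delete_all_submissions() {
  // The submission functions live in an include file of the Webform module.
  module_load_include('inc', 'webform', 'includes/webform.submissions');

  // Assumption: every node with a webform has a row in the {webform} table.
  $nids = db_query('SELECT nid FROM {webform}')->fetchCol();
  $deleted = 0;

  foreach ($nids as $nid) {
    $node = node_load($nid);
    if (!$node) {
      continue;
    }
    foreach (webform_get_submissions(array('nid' => $nid)) as $submission) {
      webform_submission_delete($node, $submission);
      $deleted++;
    }
    // Release the loaded node so memory does not grow with the node count.
    entity_get_controller('node')->resetCache(array($nid));
  }

  return $deleted;
}
```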

I am in the process of creating a drush command to delete a node's submissions, which is slightly related to this issue.

greggles’s picture

I think 2 is ideal, but I'm not sure that drupal is bootstrapped sufficiently to achieve it. It's worth trying it out.

DanChadwick’s picture

Status: Postponed (maintainer needs more info) » Active

"not sure that drupal is bootstrapped sufficiently to achieve it."

I thought that Drupal was fully bootstrapped for drush commands. I just wrote a drush command that deletes the submissions for one node. This command would do the same but for all webform nodes.

We can see if it is useful. Could be hours of execution for really big databases.

greggles’s picture

For a random drush command, it is, sure. For sql-sanitize it might not be. Can you try that in sql-sanitize?

DanChadwick’s picture

Status: Active » Needs review
FileSize
763 bytes

Ack. I don't think a proper job is possible. Consider:

function drush_sql_sanitize() {
  drush_sql_bootstrap_further();
  drush_include(DRUSH_BASE_PATH . '/commands/sql', 'sync.sql');
  drush_command_invoke_all('drush_sql_sync_sanitize', 'default');
  $options = drush_get_context('post-sync-ops');
  if (!empty($options)) {
    if (!drush_get_context('DRUSH_SIMULATE')) {
      $messages = _drush_sql_get_post_sync_messages();
      if ($messages) {
        drush_print();
        drush_print($messages);
      }
    }
  }
  if (!drush_confirm(dt('Do you really want to sanitize the current database?'))) {
    return drush_user_abort();
  }

  $sanitize_query = '';
  foreach($options as $id => $data) {
    $sanitize_query .= $data['query'] . " ";
  }
  if ($sanitize_query) {
    $sql = drush_sql_get_class();
    $result = $sql->query($sanitize_query);
  }
}

Regardless of the bootstrap question, the hook is invoked before the confirmation, and only for the purpose of collecting SQL statements that are run afterwards. I think the best we can do is truncate the relevant tables.

Here's a patch. Someone else can test this, please.

DanChadwick’s picture

Status: Needs review » Fixed

No testing? :( Well, let's commit it, since I'm pretty confident that the SQL is correct, as it follows Greg's template.

Committed to 7.x-4.x.

  • DanChadwick committed 4243541 on 7.x-4.x
    Issue #2555051 by DanChadwick, greggles: Added sanitize private data in...
DanChadwick’s picture

Version: 7.x-4.x-dev » 8.x-4.x-dev
Status: Fixed » Patch (to be ported)

Needs up-port to D8.

fenstrat’s picture

Version: 8.x-4.x-dev » 7.x-4.x-dev
Status: Patch (to be ported) » Fixed

Committed and pushed to 8.x-4.x.

  • fenstrat committed c2a5140 on 8.x-4.x
    Issue #2555051 by DanChadwick, greggles: Added sanitize private data in...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.