Currently the drush sql-sanitize command works by invoking a hook to identify data to delete or munge. This means that every time new tables or columns are added to a Drupal site the HOOK_drush_sql_sync_sanitize must be updated to cover those new tables/columns. If a developer forgets to update the hook to cover the new table/column then data will be preserved that possibly should be deleted.
It would be nice to have a system where specific data has to be whitelisted for preservation and/or munging so that newly added tables will be emptied by default until they are explicitly whitelisted for preservation.
I'm thinking of a system where a hook returns an array like so:
$tables = [
'users' => [
'queries' => [
'DELETE FROM {users} WHERE status = 0',
"UPDATE {users} SET name = concat('user', uid), mail = concat(uid, 'example.com')",
],
'fields' => ['name', 'mail'],
],
];
- The 'queries' should be a query with table prefixes for passing to db_query.
- Any columns that are not in the array would be set back to their default value as defined in hook_schema if possible, if not possible, the table should be truncated and an error logged.
- Any tables that are not listed would be truncated.
I'm currently planning to add this code into a module called paranoiasanitize.module which will be a sub-module of paranoia.
Comments
Comment #2
c4rl commentedIt seems to me that if the table isn't wholly truncated (by making absent in the structure), some (or all) data is kept either whole or munged.
Could then the
'action'key be'delete where', and it provides a SQL string WHERE clause? Thus, to keep the table whole, one omits'delete where'.I'd also propose that the
'columns'structure allow for three forms of return:Comment #3
gregglesI think that works, I'll give it a shot. My initial thought with explicitly saying truncate was that it might make it easier to understand what is happening, but maybe that's a bit anticipatory.
I believe a closure would require running an update on each row which seems likely to be dramatically slower than doing an update. But maybe this table will be dealing with so few rows that it would be acceptable if it made the code simpler? I'll reconsider as I get closer to that work.
Comment #4
coltraneI think a hook should return:
This drush command doesn't need to be too smart.
The drush command could:
The drush command needs to have default actions (truncate or e.g. update column to empty string) but otherwise its first responsibility is to keep private data out of the final DB and secondly to record when a table's registered sanitization is inadequate (and to fallback to safety). Leave the specifics up to the developer within each hook.
hook return for failing users table sanitization:
Comment #5
gregglesJust want to start saying that I think most of Ben's question in #4 deals with the column level data as opposed to the table level data.
OK, so I got to the point of having a working version of tables and here's what I've found as the node_paranoia_sql_sanitize_tables_list() implementation:
I think that restricting to the where clause is nice because it reduces the text that gets repeated, but it's also really limiting. For example, it forces the node_access/node_revision to be a subselect when it might be better for them to be written as a DELETE with an inner join.
I think Ben's proposed structure works quite well with some small adjustments, so I'm going to update the issue summary to show that.
Comment #7
gregglesI've now pushed some code that provides this feature to git. Given it's a new sub-module rather than a patch to the existing module that seemed like an easier and reasonable solution. It's had some internal review, so it's at least not horrible :)
I see that as the first version and am happy to revert or amend as appropriate. Reviews welcome!
Comment #9
coltraneFound that the generated hooks include prefixed table names and that drush_paranoiasanitize_paranoia_sql_sanitize_whitelist() also applies prefixes resulting in doubly-prefixed tables.
Comment #11
gregglesI fixed that by deleting the line and filed #2660678: Sort out how to ship hooks from d.o and handle prefixes to track future work on it.
I think it might be time to release this and followup work can be in followup issues.