Another ported bash script. Useful for determining which tables are changed by various operations on a Drupal site.

Comments

moshe weitzman’s picture

Code looks good.

So, this appears to dump each table to a file and then md5 the contents. Fair enough. Could you elaborate on how you use this during dev or deployment?

greg.1.anderson’s picture

Some co-workers of mine in another division have put up a new Drupal website for their Beta Labs program, and of course they want to make code changes on their dev site and content changes on their live site. The question posed to me was, "how do we know which tables to put into --structure-tables to insure that we get any database changes needed from our dev site changes without overwriting the blogs on the live site?" This led to a discussion on node tables, whether you run updatedb on the dev site or the live site, and other situations where the database tables change on the dev site (adding new modules, for example).

Now, to see what actually changed, you can take sql dumps and diff them, but these diffs are very noisy, and often, the only thing you really care about is whether the table changed or not. In the early days, I used a bash script to print our table names that had changed; sql-hash is a port and improvement of the script that split the database up into tables. The old script saved the whole database table, but since the specific changes of the table are not relevant, I switched to saving just the md5 hash.

The use case, then, goes something like this:

  1. Bring your live site and dev site in sync, so they are clones of each other
  2. Save the sql-hash output from one of the sites
  3. Do whatever changes are needed on your dev site. You users will probably add content to the live site while you're doing this.
  4. When you're ready to sync, take new sql-hash readings from dev and live
  5. Diff your three hash lists to see which tables changed in each place
  6. Do the actual re-sync using sql-sync --structure-tables

There are a couple of weaknesses to this technique.

  1. It is possible, perhaps even likely, that relevant changes might be made to the variables table of both sites (and thanks to the cron timestamps, it's about certain the both variable tables will have inconsequential changes.
  2. Since sql-hash calls sql-dump repeatedly, the hash values produced do not represent a single-moment-in-time snapshot of the database being hashed.

The variables table requires a little additional investigation. It's not currently an issue on any of my sites (I've got it easy -- I only need to preserve the comments table), but it has been in the back of my mind that a script to three-way-merge variable table changes might be useful in some circumstances.

As for the fact that the output of sql-hash is not atomic, that is not a big issue either, since we are only looking for database tables that change, and we use an atomic operation (sql-sync --structure-tables) to do the actual transfer. So, the only risk is that some type of change that wasn't already made earlier might happen in between the time you sql-hash and the time you sql-sync. This can be avoided with a little caution; just be sure to take the live site offline while you are investigating. Over time, you will come to learn that there are only certain tables that can change on the live site based on the kinds of edits that are allowed there, and the sql-hash operation then becomes unnecessary.

Of course, there are other solutions to to data-merge problem, such as exporting the data nodes from the live site and re-importing them on the dev site. This puts an upward limit on how much effort one should spend on syncing via --structure-tables, and of course there is also the option of using vcs checkins of the sql-dump to merge db changes together. Both of these are more complicated solutions (esp. the later, which can hurt you a lot if you're not careful). Even if --structure-tables-based syncing is not used, though, sql-hash remains a valuable early investigation tool to determine what changes when certain high-level operations are performed on a Drupal website.

greg.1.anderson’s picture

Status: Needs review » Needs work

Has a bug on mysql databases.

davidcm’s picture

Status: Needs work » Needs review
StatusFileSize
new6.84 KB

mysql database bug fixed, thanks Greg

I'll write up a nice how to/blog about how sql-hash helped solve my sync problem once my project is finished

greg.1.anderson’s picture

@moshe: I think this is ready to go; if you don't want it in drush core, I'll put it in drush_extras, but I think it would be good to have here.

moshe weitzman’s picture

Project: Drush » Drush extras
Version: » 6.x-2.0-alpha1

wow, i replied here a day ago and it went poof. odd.

thanks for the thorough use case info.

i mulled this and i think its better suited for drush extras.

greg.1.anderson’s picture

Status: Needs review » Needs work

Forgot to make a note here a hile back; this still has a problem with mysql. Seems that mysql includes timestamps and other variable data in the dump, so hashes are always different, rendering this command useless. For it to be useful, it would need to correctly strip out variable information from the dump before generating the hash.

greg.1.anderson’s picture

Status: Needs work » Fixed

Committed a variant of the above (d42e733) that strips out comments (including the timestamp inserted by mysql) before computing the hash. Also added an sql-compare command to compare tables from two different sites.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.