Problem/Motivation
Migrate source tests depend on a source DB that is currently stored as a .gz file, which complicates the patch creation process.
This issue was brought up by @alexpott over in #2281393: D6->D8 Blocks - Custom titles not imported.
Proposed resolution
I discussed this issue at a Florida DrupalCamp sprint with @mrjmd, @douggreen, @thomasVI, and @phenaproxima, and we discussed several solutions. The consensus was that the simplest solution was to get rid of the d6.gz file altogether in favor of a d6.sql file. Along with that, we discussed decreasing the size of the dump by exporting only the structure of certain tables (cache, session) to keep the size of the d6.sql file to a minimum. This solution will make the patch creation process for tests involving source DB changes the same as for the rest of Drupal core and remove the need for a binary file in patches. @mrjmd and @douggreen are working on this patch...
As an additional step, we also discussed getting rid of the d6.sql/.gz file completely by writing a script that generates a new database from the /core/modules/migrate_drupal/src/Tests/Table/d6/ table dumps. This script will also need to generate the necessary cache (and other) tables that aren't included in the table dumps. @thomasVI and @phenaproxima are working on this patch.
Remaining tasks
Write the patches, test, and discuss.
User interface changes
None.
API changes
None.
Comment | File | Size | Author |
---|---|---|---|
#73 | 2469623-73.patch | 625.26 KB | phenaproxima |
#71 | 2469623-71.patch | 612.14 KB | phenaproxima |
#69 | interdiff-2469623-69.txt | 812 bytes | phenaproxima |
#69 | 2469623-69.patch | 625.26 KB | phenaproxima |
#67 | interdiff-2469623-67.txt | 1.1 KB | phenaproxima |
Comments
Comment #1
douggreen CreditAttribution: douggreen commented
Attached patch writes an SQL file and excludes data from several cache tables. The original gz file is 211k and the new uncompressed SQL file is 237k.
Comment #2
ultimike
Ok - I think this patch looks pretty good, with one small change - we think we need to keep the menu_router data (no interdiff - sorry). I've updated the patch with this change. Here's how we tested:
Also - @mrjmd worked on this patch with both @douggreen and I and should be added to any commit message.
Thanks,
-mike
Comment #3
mrjmd CreditAttribution: mrjmd commented
The attached patch gets rid of our need for either a gzip or an SQL file by adding a new script that is the inverse of ./core/scripts/migrate-dump-d6.sh. It includes the PHP dump files for the schema of the previously excluded tables, and data for menu_router.
Written by @phenaproxima, @thomasVI.
Comment #4
phenaproxima
A couple of things I'd like to do, ideally, with this patch:
I'm sure more things will occur to me, but these are the major ideas for now.
Comment #5
ultimike
Ok - this is really interesting now. We've eliminated the need for the .sql dump file to be in the repo at all... Some thoughts:
The $databases['d6_migrate']['default'] settings.php entry for the connection to the D6 source DB.
Please change "Hahahahahaha" to "Mwahahaha" for more evil-ness.
We should probably re-write this to be similar to the format (numbered) as the migrate-dump-d6.sh (just for consistency).
This patch will also need to include the removal of the d6.gz file.
-mike
Comment #6
benjy CreditAttribution: benjy at CodeDrop commented
I actually wrote a script to go from PHP to DB a while back when we first moved to the automated dumps: https://www.drupal.org/files/issues/createdb.txt Not sure if that helps here.
The only thing that concerns me is how do we know that people used the Drupal 6 site to make changes to the dumps? We don't want people manually editing the dumps. Seeing the changes in the .gz file used to give us this confirmation that they knew to generate the binary file.
In general though, i'm +1 for this.
Comment #7
phenaproxima
At the risk of sounding ignorant, is it really a major problem if someone manually edits the dump files? As long as it's valid PHP and the data is correct, what's the difference between manually editing the files and using D6 to do it? If anyone posts a patch with modifications to the dump files, it will be way easier to review the changes and ensure they won't break anything, whereas this is impossible to do at a glance with a gzip. On the other hand, if we continued to use a gz file, we would be unable to tell if D6 was used to make the changes, or the MySQL command line.
Comment #8
benjy CreditAttribution: benjy at CodeDrop commented
This is the tricky bit to validate without using D6.
We used to manually manipulate the dumps and it was error prone. People made changes to the dumps because a change showed a bug or "seemed right" but didn't actually represent a state that Drupal 6 could get into. These were very difficult to track down when I converted to the automated dump creation, or just when they arose in general.
Comment #9
phenaproxima
It seems to me that the real issue there is less about the way we want to deliver the dump files (PHP or gz) and more about how to validate that the dumps are safe, since neither method prevents the database from being manually mucked with. Separate issue? :-?
Comment #10
ultimike
I guess it comes down to whether or not we need/want an automated method for validating any changes to the dump files **and** whether or not validating the dumps is important enough to hold up this issue.
How do we validate the dumps? Would we have to create a new DB from the dumps then run some automated tests on the resulting site? Wasn't this the plot of Inception?
- that's an understatement :\
Just throwing this out there, but is there some way we can pseudo-verify that the dumps were created using the script, via some sort of timestamp or something inserted into the dumps as a comment?
-mike
Comment #11
phenaproxima
How about using checksums? MySQL, it seems, natively supports them: https://dev.mysql.com/doc/refman/5.0/en/checksum-table.html. We could get a checksum for each table we dump, embed it in the dump file (as a public constant on the table class, say), and then verify that checksum after creating the table.
My mind also went to the rabbit-hole solution of spinning up a D6 and automatically testing it, but then I felt the beginnings of a headache and decided it wasn't a good idea.
Comment #12
benjy CreditAttribution: benjy at CodeDrop commented
The timestamp/checksum is a great idea. When you're reviewing migrate patches, you want an easy way to see that the patch was generated using these scripts and not by manually manipulating the dumps. The gz gave us that, and so will the table checksum.
+1
Comment #13
phenaproxima
I'll take a stab at a checksum patch.
Comment #14
amateescu CreditAttribution: amateescu commented
It would be awesome to have this issue fixed at some point; there are a few patches in the queue that have to update the d6.gz dump, and it's not an easy procedure, to say the least :)
For example, the most recent ones that I know of are #2454669: SQLite: Fix tests in migrate_drupal test group and #2477853: PostgreSQL: Add support for reserved field/column names.
Comment #15
benjy CreditAttribution: benjy at CodeDrop commented
Ha, you should have tried adding a CCK field by manually updating all the dump files; I think the pain that caused wounded most of us :)
But yes, would love to get this finished.
Comment #16
ultimike
A little birdy tells me that good progress will be made on this issue starting this week...
-mike
Comment #17
phenaproxima
Added an MD5 hash check to migrate-dump-d6.sh and migrate-generate-d6.sh. Sorry this took so long -- life got in the way!
I decided to use an MD5 of the dump code, rather than a table checksum provided by MySQL, after discussing this with @benjy on IRC, for two big reasons:
1) I don't know if table checksumming is supported by things other than MySQL;
2) Table checksums are extremely sensitive to schema variations, and rightfully so, but migrate-dump-d6.sh does not precisely recreate the schema that a normal Drupal 6 installation will generate. @benjy tells me that this does not affect the tests (at least not yet), but it's something to be aware of. Improving the exactness of the dump files' schema is probably a good idea, but I don't know if that belongs in this issue.
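The hash-the-dump-code approach described above can be sketched roughly like this. This is a simplified illustration only, not the actual patch code; the file contents and the idea of comparing a hash recorded at dump time against one recomputed at validate time are assumptions drawn from the discussion:

```shell
#!/bin/sh
# Simplified sketch of an MD5-based dump check (NOT the real
# migrate-dump-d6.sh code). The MD5 of a dump file's contents is
# recorded when the dump is generated, then recomputed and compared
# later to detect manual edits.
dump=$(mktemp)
printf '<?php\n// table dump body\n' > "$dump"

# Recorded at dump time:
recorded=$(md5sum "$dump" | cut -d' ' -f1)

# Recomputed at validate/restore time:
current=$(md5sum "$dump" | cut -d' ' -f1)

if [ "$recorded" = "$current" ]; then
  echo "hash OK"
else
  echo "hash mismatch: dump appears to have been edited manually" >&2
fi
rm -f "$dump"
```

Unlike MySQL's CHECKSUM TABLE, this works identically on any database backend, which matters for reason 1 above.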
Comment #18
benjy CreditAttribution: benjy at CodeDrop commented
I wonder if we need these cache tables?
Also, can we add a newline after the hash?
I'll try and give this a proper test soon.
Comment #19
phenaproxima
@benjy: Don't we at least need the schemas? My patch in #3 alters migrate-dump-d6.sh so that it will dump the schema for those tables, but not the data.
Comment #20
benjy CreditAttribution: benjy at CodeDrop commented
Well, we never read from those tables to do the migrations, so we don't need them for that.
I'm not sure if we need them when going back from D8 -> D6; it would depend on whether D6 automatically re-creates those tables if they're missing. If it doesn't, we would need the schemas, yes.
Comment #21
phenaproxima
@benjy: I think we'll need them because we're going to be using these dump files, rather than an SQL file, to generate new D6 DBs for people to make changes in.
Comment #22
ultimike
We'll have to document the process for people to generate a D6 DB from the dump files as well.
-mike
Comment #23
ultimike
I jumped the gun with the "Save" button on my previous post...
Would it be possible to modify the migrate-generate-d6.sh script with a "--check-md5" switch that won't actually import the DB, but just make sure all the table hashes match? This would make it easier for patch reviewers.
Also - holy menu_router table, Batman!
Finally:
The most inconsequential of comments: extra space.
thanks,
-mike
Comment #24
phenaproxima
This patch adds a --validate option, which will output the result of every table's hash check. If you omit the option, the tables will actually be imported, although the hash will still be checked and invalid tables will still be skipped.
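The flag behavior described here (check-only with --validate, import otherwise) could be parsed along these lines. This is a hedged sketch of the idea, not the actual script's option handling:

```shell
#!/bin/sh
# Hedged sketch of the --validate behavior described in #24 (assumed,
# not the real migrate-generate-d6.sh code): with --validate, only
# report hash-check results; without it, import the tables, skipping
# any whose hashes fail to match.
VALIDATE=0
for arg in "$@"; do
  case "$arg" in
    --validate) VALIDATE=1 ;;
  esac
done

if [ "$VALIDATE" -eq 1 ]; then
  echo "validate-only: reporting hash check results"
else
  echo "importing tables (skipping any with invalid hashes)"
fi
```

Run with no arguments it takes the import branch; passing --validate flips it to check-only, which is what makes the option handy for patch reviewers.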
Comment #25
ultimike
@phenaproxima - thanks for adding the --validate option - I think that will be really helpful.
I don't think I'm qualified to RTBC this, but it looks good to me. I think you may have missed benjy's request to add a new line after the hashes though...
So, in practice, I guess the hashes would work like this:
Scenario 1 - creating a new patch involving a change to the D6 DB tables
Scenario 2 - reviewing a patch involving a change to the D6 DB tables: use the --validate option to ensure the table dumps were created from actual tables and not edited manually.
Does that sound about right?
thanks,
-mike
Comment #26
andypost
Please add an interdiff.txt next time, to make the changes easier to follow.
PS: a review of 300k is hard
Comment #27
phenaproxima
Here's an updated patch with a --database option added to both scripts, allowing the user to specify which database to use instead of forcing them to use d6_migrate (d6_migrate is still the default, but it can now be overridden). As per discussion in the Google hangout last night, we will probably end up generalizing these scripts so that they can be used to create dumps for Drupal 6 and Drupal 7, so this adds a step towards that.
This patch also adds a new line after the hash in each dump file, as per @benjy's request.
Also my first go with interdiff. Yay!
Comment #28
phenaproxima
This fixes a bug in the previous patch and adds support for detecting the version of Drupal core being dumped (or restored). As far as I can tell, this makes both the dump and restore scripts D7-ready.
Comment #29
phenaproxima
This patch fixes an options-related bug in #28 and #27, so I've hidden those broken brothers. This one works, and is capable of dumping/restoring D6 and D7 databases alike, although it defaults to D6 (using the d6_migrate database). I'm not sure how this affects the work being done over in #2410625: Update D7 dumps to match the D6 dumps.
Ideally I'd like to refactor these two scripts into one, called migrate-db.sh, which combines the dump and restore functions. But in the meantime, would love to get input on this.
Comment #30
phenaproxima
Yet another version. This patch:
I tried to generate an interdiff, but it choked at some point; my guess is that this is caused by the presence of the binary data formerly known as d6.gz.
Comment #32
phenaproxima
Apparently, git doesn't like trailing white space when applying patches. This patch should appease the testbot.
Comment #33
ultimike
Looks pretty good - I love the fact that it is just a single script for D6/D7 and dump/restore at this point. So clean.
benjy mentioned he wanted to play with it a bit as well, so combining my comments below with his desire to check it out leads me to think it's not ready to RTBC just yet.
Are the scenarios I outlined in comment 25 above valid? Are there any others I'm not thinking of?
Some comments below.
What's up with all these 'info' diffs that look the same?
"Drupal source data" maybe?
Is there a default behavior if the user doesn't specify --dump, --restore, or --validate?
Why do we need to define the source directory? Can we just define the major Drupal version?
Same question about the source as above.
Ah - I see, if no --dump, --validate, or --restore, then error out.
-mike
Comment #34
phenaproxima
Looking at it closely, it seems the "datestamp" item in that serialized pile of detritus differs across diffs. Not sure if that presents a problem (I can't imagine why it would), but that seems to be the reason why those diffs are happening.
Just to normalize the input. If we asked for a version, someone could conceivably enter "d7", "drupal7", "Drupal 7", "7", or just about anything else. Asking for the path to the dump files removes all ambiguity. If that's too clumsy, one possibility is to define additional command-line switches to specify the major version, like this:
For Drupal 7:
migrate-db.sh --restore -7 --database=d7_migrate
For Drupal 6:
migrate-db.sh --validate -6
...which, actually, I really, really like. Will post an updated patch with that.
Comment #35
phenaproxima
This patch ditches the --source-dir option in favor of version options (-6 and -7). This makes it more consistent with the --dump behavior, which knows the path to the dump files anyway.
I again apologize for the lack of an interdiff. It really seems to hate binaries in patches.
Comment #36
phenaproxima
Whitespace errors, you will be the death of me. This applies cleanly and should pass the testbot.
Comment #37
phenaproxima
One more -- removed an errant mention of the defunct --source-dir option.
Comment #38
ultimike
Hmm - I wonder if those datestamps in the "info" field of the "system" table are going to be different for every row in the table each time someone runs the script with --dump? That would suck. Is that the timestamp when the module was installed, generated on d.o., or something else?
Also - for the version arguments, it might be easier to understand with something like --version=6 or --version=7 instead of just -6 or -7.
Thoughts?
-mike
Comment #39
phenaproxima
Replaced -6 and -7 with --core, which accepts the values 6, 6.x, 7, and 7.x. I agree that it makes for better UX.
As for the datestamps being odd... I'm not sure what's up with that. The dump files in the patch have been spun off from d6.gz as it exists in 8.0.x.
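A --core option that accepts 6, 6.x, 7, and 7.x presumably normalizes its input to a major version before use. A minimal sketch of that kind of normalization (assumed behavior, not the actual migrate-db.sh code):

```shell
#!/bin/sh
# Sketch of normalizing a --core value as described in #39 (assumed
# behavior, not the real migrate-db.sh). Accepts 6, 6.x, 7, or 7.x
# and reduces each to a bare major version number.
core_option="6.x"

case "$core_option" in
  6|6.x) CORE=6 ;;
  7|7.x) CORE=7 ;;
  *) echo "unsupported --core value: $core_option" >&2; exit 1 ;;
esac

echo "targeting Drupal $CORE"
```

Validating and rejecting unknown values up front is what makes this friendlier than bare -6/-7 switches: a typo fails loudly instead of silently selecting the wrong dump set.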
Comment #40
benjy CreditAttribution: benjy at CodeDrop commented
Contains \Drupal\migrate_drupal\Tests\Dump\Cache.
The docs are missing the d6/d7 part of the namespace.
Could we get a version of the patch with just the script? Make it easier to review?
Comment #41
phenaproxima
Here's an updated patch which fixes that. I'm also attaching migrate-db.sh, as contained in the patch. :)
Comment #42
phenaproxima
I've postponed #2410625: Update D7 dumps to match the D6 dumps until this issue is fixed, since the current version of the patch covers Drupal 6 and 7.
Comment #43
benjy CreditAttribution: benjy at CodeDrop commented
This is coming along really well. I love the validate option -- so handy!
I tried to restore though and had a fatal error?
Comment #44
phenaproxima
I've had that too when I try to restore while migrate_drupal isn't enabled. ;)
Comment #45
benjy CreditAttribution: benjy at CodeDrop commented
Seems like something worth adding a check for?
Comment #46
phenaproxima
Point. This adds that check to the --restore mode.
Comment #47
benjy CreditAttribution: benjy at CodeDrop commented
This return seems to be hindering my ability to restore? :)
Also, I don't think $connection is in scope here.
Comment #48
phenaproxima
/me is an idiot :)
Comment #49
phenaproxima
Got rid of an errant return statement pointed out by @benjy on IRC.
Comment #50
chx CreditAttribution: chx commented
The patch is big and frightening, but git diff HEAD --name-only | grep -v /Table/ reveals only three non-Table files changed, and if you add --diff-filter=D you will also see that two of them are deleted. The table changes surely are good since they pass tests.
So, the only file needing review is the new dumper script. For easier review I have attached it. The only worrisome code is $php = str_replace('{{PHP_VALUES}}', $values, $php); but since it is not a generic dumper script we won't have problems with {{PHP_VALUES}} in the table data. I also like the exclusive use of === without a single ==. Wish the rest of core did that.
The md5() hashing function is also fine for testing purposes -- although it is no longer used for runtime code, PHPUnit mock objects also use md5().
Comment #51
rocketeerbkw CreditAttribution: rocketeerbkw as a volunteer commented
Database name d6_migrate is still hardcoded here.
Should the router table be checked for, or removed from the comment?
Not skipping invalid tables anymore?
Comment #52
amateescu CreditAttribution: amateescu for Drupal Association commented
#51.1: Fixed.
#51.2: The code does not exclude it anymore, so I updated the comment.
#51.3: That's not needed because we're avoiding unneeded tables at dump time.
Setting back to RTBC because my changes are no more than minimal.
I also reviewed the rewritten generate script and it looks very good. Actually, so good that I'm going to use it as the base dump script for this critical issue: #2447573: [meta] Make sure 8.x - 8.x hook_update_N() testing is possible. Great work, @phenaproxima!
Comment #53
amateescu CreditAttribution: amateescu for Drupal Association commented
Oops, now with the patch.
Comment #54
amateescu CreditAttribution: amateescu for Drupal Association commented
@phenaproxima pointed out in IRC that I'm silly and got the database name wrong; he will post an updated patch :)
Comment #55
phenaproxima
Updated patch:
1) Fixes incorrect DB name being used when querying for table schema info.
2) Restores table validation during restore, as @rocketeerbkw suggested in #51, since that's a data integrity issue.
3) Documents all the helper functions.
Comment #56
phenaproxima
Whoops! Botched a return statement in the previous patch.
You know what they say: 56th time's the charm.
Comment #57
chx CreditAttribution: chx commented
Well, actually, the third time is the charm, so let's try an RTBC for the third time :)
Comment #60
phenaproxima
Setting this back to RTBC after testbot failure.
Comment #61
ultimike
I found a small bug in the new script with the --validate parameter - interdiff and patch included. Here's what I did:
Applied the patch from 56.
Restored the D6 DB.
I didn't make any changes, and I dumped the DB using the script. This resulted in a bunch of changes from a doc typo ("cores" -> "core") and a field int length(?). The bug fix in the interdiff is at the very bottom:
I also spoke with alexpott about this issue, and he likes everything about it but would like to see a phpunit test built for the --validate option.
-mike
Comment #62
ultimike
Ah bugger - I forgot to add the diffs to the last comment...
-mike
Comment #63
alexpott
We can add a PHPUnit test to ensure that the file and hash are correct. I'm very +1 to removing the .gz - it's gonna be way nicer when reviewing patches that change stuff.
Comment #64
vendion CreditAttribution: vendion as a volunteer commented
In the current state of the database script, neither UID 0 nor UID 1 gets created when running the restore function. Is there a reason for this? If we need to log in with admin access to update data for tests, this would create a hurdle.
I noticed that in the current migrate-db.sh script, the _db_get_query() function explicitly ignores UIDs 0 and 1.
Comment #65
benjy CreditAttribution: benjy at CodeDrop commented
Now that symfony/console is going into core as a dependency for the upgrade script, maybe we should use that instead: https://www.drupal.org/node/2447573#comment-9911479
It might make testing a bit easier with their CommandTester.
Comment #66
phenaproxima
Added a unit test of migrate-db.sh's validation procedure. See core/modules/migrate_drupal/tests/src/Unit/MigrateDBTableValidationTest.php. No changes were made to migrate-db.sh since #62, so I'm not bothering to attach it again.
Comment #67
phenaproxima
At @alexpott's request, posting the unit test by itself, since nothing else has changed in the previous patch.
Comment #68
amateescu CreditAttribution: amateescu for Drupal Association commented
@benjy, whether we can use Symfony Console or not has not been decided yet, and that issue will refactor the script added here anyway, so I suggest we move forward with the current patch.
Comment #69
phenaproxima
Changed the test as per @alexpott's request. It will now actually validate all table dumps, so that contributors will know, at patch submission time, if they screwed up by manually editing a dump file.
The interdiff is only a renamed copy of the test itself, since nothing else has changed.
Comment #71
phenaproxima
This patch regenerates the table dumps so the tests will pass. Beyond that, there are no changes to any code.
Comment #73
phenaproxima
Re-rolling yet again to account for DB changes made in ultimike's patch in #62.
Comment #74
benjy CreditAttribution: benjy at CodeDrop commented
Does the test just copy the logic from the script to show that it works? I don't see the point in that.
Also, in the patch in #73, I can't see MigrateDBTableValidationTest at all?
Comment #75
phenaproxima
@benjy, it does copy the script's logic -- but not to prove that it works; rather, to automatically test all the dumps at patch submission time. Let testbot help us, I say; CI for the win!
As for the name, it was changed to MigrateTableDumpTest. It's at the very top of the patch. :)
Comment #76
benjy CreditAttribution: benjy at CodeDrop commented
Ah ok, validating the hash in a test makes sense, nice call! RTBC again from me.
One small improvement could be a message on the assertion, rather than just "hash x does not equal hash y". It could be something like "Table x looks like it has been changed manually and the hash is outdated"?
Comment #77
phenaproxima
It sets the path of the dump file as the assert message for that exact reason.
Comment #78
benjy CreditAttribution: benjy at CodeDrop commented
Ah great, I was still looking at the patch in #67.
Comment #79
ultimike
RTBC +1 on this!
-mike
Comment #80
kdechant CreditAttribution: kdechant commented
I've long wanted a DB fixture system for use with Drupal automated tests (e.g., for custom modules). The script migrate_db.sh (and the files it outputs to "src/Tests/Table") is close to what I had in mind. Would it make sense to (eventually) convert this into a general-purpose fixture generator that isn't specific to Migrate?
Comment #81
benjy CreditAttribution: benjy at CodeDrop commented
@kdechant, I've actually used the existing script in core for such a purpose; I used it on the iFrame example module: https://www.drupal.org/sandbox/benjy/2422307
Comment #82
alexpott
re #80: that is exactly what @amateescu is planning to do on #2447573: [meta] Make sure 8.x - 8.x hook_update_N() testing is possible.
I think that Drupal\migrate_drupal\Tests\MigrateTableDumpTest could be a PHPUnit test - but let's do that in a followup.
Committed 78cc7d8 and pushed to 8.0.x. Thanks!