_registry_parse_file() and the {registry} table do not account at all for the fact that a class definition might move between files when a module is updated.

If you move a class definition from one file to another, and the file with the new definition is processed before the file that previously held it, you get an SQL error that breaks the site:

PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'Example-class' for key 'PRIMARY': INSERT INTO {registry} (name, type, filename, module, weight) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4); Array ( [:db_insert_placeholder_0] => Example [:db_insert_placeholder_1] => class [:db_insert_placeholder_2] => sites/all/modules/example/example.module [:db_insert_placeholder_3] => example [:db_insert_placeholder_4] => 0 ) in _registry_parse_file() (line 180 of includes/registry.inc).
Comment / File / Size / Author:
#49: stale.filetimes.kill_.kittens.patch, 1.61 KB, Anonymous (not verified)
#48: stale.filetimes.kill_.kittens.patch, 1.93 KB, Anonymous (not verified)
#18: registry.transaction.patch, 6.28 KB, Anonymous (not verified)
#9: registry.transaction.patch, 6.28 KB, Anonymous (not verified)
#3: 797024-registry-rebuild-merge-3.patch, 1.17 KB, pwolanin
#2: 797024-registry-rebuild-ordering-2.patch, 1.71 KB, pwolanin

Comments

pwolanin’s picture

One possible approach is to alter the {registry} primary key so that the filename is a third element.

pwolanin’s picture

Status: Active » Needs work
FileSize
1.71 KB

hmm, this patch was my first attempt, but it doesn't seem to fully resolve the problem. In particular, it will not resolve it if the class moves between two existing files.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
1.17 KB

Changing the insert query to a merge query resolves the problem for me. Obviously that's less efficient, but the name and type should be unique, so adding filename to the Primary key seems like a bad hack.
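The idea can be sketched with Drupal 7's query builder (a minimal sketch, not the attached patch; the variable names assume the scope of _registry_parse_file()):

```php
// Sketch, not the committed patch: upsert the registry entry keyed on
// (name, type), so re-parsing a class that moved to a new file updates
// its filename instead of raising a duplicate-key error on the primary
// key.
db_merge('registry')
  ->key(array(
    'name' => $name,
    'type' => $type,
  ))
  ->fields(array(
    'filename' => $filename,
    'module' => $module,
    'weight' => $weight,
  ))
  ->execute();
```

db_merge() issues an update if the keyed row exists and an insert otherwise, which is exactly the semantics needed here at the cost of an extra lookup per entry.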

Crell’s picture

Subscribing. No time to look at this right now, but I will try to soon.

Anonymous’s picture

hmm, ouch. i think the cleanest way to do it is to clear the resources of all to-be-parsed files before adding any new ones. no way we could contemplate that without wrapping this whole thing in a transaction, though.

pwolanin’s picture

@justinrandell - that might work too, but given the way the code is structured it seemed safer to do the merge query. Given that not all DBs will actually do transactions, can we rely on that to protect us against an empty table when we need the data?

Anonymous’s picture

pwolanin: without transactions, this whole process is open to creating WSODs on a run that doesn't complete. there's kind of a nod to that in the code comment "We update the filectime/filemtime after we've saved the files resources rather than here, so if we don't make it through this rebuild, the next run will reparse the file." it's a bigger window if we delete them all at once, but it's the same window.

its also rare for large numbers of files to change between runs, so the window is not that much bigger.

either way, there's no good reason not to wrap this in a transaction, but that's scope creep.

pwolanin’s picture

Sure, we could wrap the above patch with a transaction too.

Anonymous’s picture

FileSize
6.28 KB

patch attached that wraps a registry rebuild in a transaction, and deletes all resources for to-be-parsed files upfront before adding any new resources.
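The pattern described here can be sketched roughly as follows (a hypothetical simplification of _registry_parse_files(); names and structure are illustrative, not the patch itself):

```php
// Rough sketch of the rebuild pattern: wrap the whole rebuild in a
// transaction, delete stale resources for every changed file up front,
// then re-parse. A Drupal 7 transaction commits when $transaction goes
// out of scope and rolls back on ->rollback().
$transaction = db_transaction();
try {
  // Remove all registry entries for files that will be re-parsed,
  // before any new rows are inserted. This is what lets a class move
  // between two existing files without a duplicate-key collision.
  db_delete('registry')
    ->condition('filename', array_keys($files_to_parse), 'IN')
    ->execute();

  foreach ($files_to_parse as $filename => $file) {
    _registry_parse_file($filename, file_get_contents($filename), $file->module, $file->weight);
  }
}
catch (Exception $e) {
  $transaction->rollback();
  throw $e;
}
```

On a transactional engine, a failed run rolls back to the previous registry state; on MyISAM the delete-then-insert window remains, as discussed above.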

very lightly tested: only verified that a clean install works and that the registry tests pass. let's see what the bot thinks.

i guess we need a test for the "move a class from one file to another case".

Status: Needs review » Needs work

The last submitted patch, registry.transaction.patch, failed testing.

pwolanin’s picture

That's going to be a rather tricky test to write.

Anonymous’s picture

Status: Needs work » Needs review

#9: registry.transaction.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, registry.transaction.patch, failed testing.

pwolanin’s picture

There is only one fail, but the patch looks like it changes a lot more than really needed. Why not just add the transaction around each file?

Anonymous’s picture

@pwolanin: because we want it all to work or none of it to work? there's very little that has changed with the patch, it's mainly just whitespace.

that failing patch smells of some nasty coupled code somewhere. i'll have a look today.

Crell’s picture

While registry rebuild is a great example of where we want transactions, it's not per-file. It's per-rebuild.

The problem here is:

Before:
foo.php: class A {}
bar.php: nothing

After:
foo.php: nothing
bar.php: class A {}

If the registry rebuild hits foo.php first, then there is a brief period where class A exists nowhere, which is OK. If it hits bar.php first, then it tries to insert class A in bar.php and chokes on a duplicate key error.

Possible solutions include:

1) Wrap the entire rebuild in a transaction: This wouldn't protect us against the above breaking stuff, but the breakage wouldn't be as bad because we're not left with a half-built registry table. So net win, but only part of the solution.

2) Wrap each file in a transaction. This doesn't help us here unless one file contains the same name twice, which would break PHP anyway. What it would do if we parsed bar.php first is roll back bar.php's changes entirely, then process foo.php (unless we told it to stop) and clear out class A, and then on the next rebuild it would find the new location for A. But between those rebuilds (which are two user-triggered actions) Drupal won't know where A is, and that's bad.

3) Use a merge query instead of a multi-insert. This is actually what we did originally when we were still tracking functions. It's a bit slower, but is not going to fatal out on us. The side effect is that if a class really is defined in more than one location, whichever one parses second will win. That makes "define it twice and include just one of 'em" break, but that's OK since that's a poor pattern anyway, especially for classes. That's not supported in the first place. We will still need to delete all entries in a file before parsing so that we clear out removals.

So we want to do #1 and #3, I'd say.

Now, I should go look at the patch... :-)

Anonymous’s picture

@Crell - the example you raise in #16 is why i think the cleanest solution is to delete all resources for files that have changed upfront, before doing any inserts, and after starting a transaction. a failed rebuild can still lead to WSODs, but at a known state, giving some chance to roll back to the previous code and try again.

using inserts instead of merge does enforce "you can only have that class defined once" from 3. above. bug or feature?

Anonymous’s picture

Status: Needs work » Needs review
FileSize
6.28 KB

attaching patch again, i can't reproduce the fail locally. maybe the testbot is still running myisam? if so, maybe that's the issue? do we have a case here of transactions for the win, but we're still hobbled by myisam? sad panda :-(

the patch still uses the "delete all then insert" pattern.

i'm not sure we can support the "allow a class to be declared twice, then pick one at runtime" pattern and use autoloading. this is not a registry limitation, but a php one. to support that, we'd have to allow for a global flag to be set so that drupal_autoload_class() could check it and choose the right class. that is just evil and ugly. i'd rather we blow up early than silently load an arbitrary version of the class.

Status: Needs review » Needs work

The last submitted patch, registry.transaction.patch, failed testing.

catch’s picture

I'm pretty sure the test bot runs myisam for performance reasons.

Anonymous’s picture

sad panda. i just hacked includes/database/mysql/schema.inc to create myisam tables, and i still can't reproduce the flood fail.

pwolanin’s picture

no - as best I know, the testbot should be running on InnoDB, but that may vary by test instance.

catch’s picture

hmm, I double checked and according to http://qa.drupal.org/performance-tuning-tips-for-D7 it is indeed innoDB.

Anonymous’s picture

allllrighty. can anyone else confirm that the flood test passes on their local setup? if so, then we need to investigate the bots. if not, then we need to look at the difference between my setup (stock ubuntu lucid) and whoever can reliably get the test to fail.

Berdir’s picture

Just wanted to note that I can't reproduce the test fail either on my local system.

mikeryan’s picture

Status: Needs work » Needs review

#18: registry.transaction.patch queued for re-testing.

mikeryan’s picture

The test passes in my environment.

Status: Needs review » Needs work

The last submitted patch, registry.transaction.patch, failed testing.

Anonymous’s picture

i've pinged boombatower about this. we'll need to figure out why the bot is failing when three different developers can't reproduce the fail.

Berdir’s picture

Status: Needs work » Needs review

#18: registry.transaction.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, registry.transaction.patch, failed testing.

Damien Tournoud’s picture

Status: Needs work » Reviewed & tested by the community
YesCT’s picture

if it gets committed like this... won't all patches start failing the bot testing?

catch’s picture

Last patch now passes the test bot.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

It looks like all this really does is wrap the existing code in a transaction, which seems reasonable. This won't help those on MyISAM, of course, but at least we un-screw part of our userbase. :P It'd be nice to have test coverage for this, but I can't really envision a way for that to happen.

I'm a bit nervous about whether that test passage was a fluke or whether the previous fails were flukes. I guess if testbot starts crapping out after this, we'll have a good place to start on where to point the finger of BLAAAAAME. ;)

Committed to HEAD!

pwolanin’s picture

Status: Fixed » Needs work

ugh, wait.

I clearly did not look carefully enough at the last patch. If ALL it does is wrap the current code in a transaction, then it does not solve the bug. Every attempt to rebuild this will cause an exception and rollback, right?

Per #16, to solve the bug we MUST use a merge query like my original patch did.

pwolanin’s picture

also, this looks like a bug to me:

      $modified_file = !isset($file['filectime']) || !isset($file['filemtime'])
                  || $filectime != $file['filectime'] || $filemtime != $file['filemtime'];

It does not seem to account for the fact that the rebuild may be run on different webnodes, or that a DB dump may be pulled from a different server and run locally. Looks like a good use for hash functions instead.
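A hash-based check might look something like this (a hypothetical sketch, not the eventual patch from the follow-up issue; the `filehash` column is an assumed replacement for the stored filectime/filemtime):

```php
// Sketch: detect changed files by content hash instead of ctime/mtime,
// so the comparison still works after moving a DB dump between servers
// or running the rebuild on a different webnode.
// Assumes a hypothetical 'filehash' value stored per file.
$filehash = hash_file('sha256', $filename);
$modified_file = !isset($file['filehash']) || $filehash != $file['filehash'];
```

Hashing costs a full read of each file per rebuild, but it is immune to clock skew and filesystem-metadata differences across machines.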

webchick’s picture

There is a merge query hunk at the bottom, but I can't parse the earlier patches well enough to tell if it's equivalent.

Anonymous’s picture

Status: Needs work » Fixed

discussed this with pwolanin in #drupal, now marking as fixed.

we delete all resources for files we are going to parse, so we should be good.

pwolanin’s picture

Yes, I think this is ok as long as we assume we have reasonable transactions.

new issue for #37: #819388: Switch registry back to using hashes not file mtime and ctime

David_Rothstein’s picture

Status: Fixed » Needs work

This patch led to a massive increase in the amount of time it takes to install Drupal (from ~8 seconds to ~22 seconds on my machine). Granted, installations only happen once, so their performance is not the most important thing in the world, but I recall a fair amount of work a few months ago in the D7 dev cycle to shave some seconds off the installer so people didn't have to stare at that progress bar forever, and now we just doubled or even tripled the entire installation time :(

@pwolanin's earlier patch in this issue does not have a noticeable effect on the time it takes to install Drupal.

Also, the flood test failure seems to be happening out in the wild. It happened to a patch of mine earlier today even though that patch touched nothing but JavaScript (#819724: Improve tabledrag accessibility for screen-reader users with WAI-ARIA).

rfay’s picture

This massive increase of time has two other very serious effects:

1. It uses far more resources in our testing environment. Remember that we do hundreds or thousands of installs in each D7 test suite. I note that we've doubled the time it takes to run the test suite. Not so good.

2. The important work in #686196: Meta issue: Install failures on various hosting environments aims to make Drupal install before timeouts on run-of-the-mill shared hosting. This would be a step back for us.

catch’s picture

I'm surprised that just adding a transaction would take so much time.

Didn't profile, but ran the installer via drush as 'time drush -v si --db-url:d7 -y' with HEAD and with this patch reverted:

HEAD:


real	1m19.336s
user	0m47.867s
sys	0m15.121s

Patch reverted:


real	0m44.215s
user	0m26.682s
sys	0m10.165s

About what rfay reports.

That almost certainly means a regression on #686196: Meta issue: Install failures on various hosting environments then.

Crell’s picture

My gut intuition is that it's not the transaction that slows it down as much as the changes to the DB queries that result, although I've not tested to confirm that. But normally transactions should speed things up if anything since the DB doesn't need to commit to disk as many times.

Anonymous’s picture

investigationing.

Anonymous’s picture

ouch. can anyone reporting speeds also give info on their setup? chatting to chx in #drupal, he's using skip-innodb, so that's important to know. i'm using innodb on ubuntu lucid.

with patch reverted:

4.15user 0.35system 1:12.09elapsed 6%CPU (0avgtext+0avgdata 202448maxresident)k
64inputs+0outputs (1major+22878minor)pagefaults 0swaps
3.98user 0.30system 1:16.01elapsed 5%CPU (0avgtext+0avgdata 202432maxresident)k
0inputs+0outputs (0major+22892minor)pagefaults 0swaps

with patch:

10.60user 0.50system 1:16.04elapsed 14%CPU (0avgtext+0avgdata 203232maxresident)k
0inputs+0outputs (0major+23215minor)pagefaults 0swaps
10.26user 0.49system 1:16.44elapsed 14%CPU (0avgtext+0avgdata 203232maxresident)k
0inputs+0outputs (0major+23211minor)pagefaults 0swaps

Anonymous’s picture

hmm. on innodb, it's even slower without the transaction but with the same code reorganisation.

in other words, take HEAD and comment out the line that starts the transaction, and the try catch block.

14.31user 0.59system 3:33.73elapsed 6%CPU (0avgtext+0avgdata 203232maxresident)k
0inputs+0outputs (0major+23004minor)pagefaults 0swaps

Anonymous’s picture

Status: Needs work » Needs review
FileSize
1.93 KB

oh, dear. bit embarrassing how obvious this was.

the second loop in _registry_parse_files() was using stale $filectime and $filemtime values. so, umm, yeah, we were reparsing every file bar one, every run through.

attached patch keeps a list, and we're back to normal install times.

4.41user 0.23system 1:14.69elapsed 6%CPU (0avgtext+0avgdata 202336maxresident)k
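The shape of the fix is roughly this (a hypothetical sketch of the two-loop structure in _registry_parse_files(); variable names are illustrative, not the attached patch):

```php
// Sketch of the bug and fix: the first loop decides which files changed;
// the second loop must use each file's own recorded timestamps, not the
// stale $filectime/$filemtime left over from the last loop iteration.
$parsed_files = array();
foreach ($files as $filename => $file) {
  $filectime = filectime($filename);
  $filemtime = filemtime($filename);
  if (!isset($file['filectime']) || !isset($file['filemtime'])
      || $filectime != $file['filectime'] || $filemtime != $file['filemtime']) {
    // Remember this file's fresh timestamps for the second loop.
    $parsed_files[$filename] = array(
      'filectime' => $filectime,
      'filemtime' => $filemtime,
    );
    // ... queue $filename for re-parsing ...
  }
}
// After parsing succeeds, save each file's own times, so unchanged
// files are not reparsed on the next rebuild.
foreach ($parsed_files as $filename => $times) {
  // e.g. db_merge('registry_file')->key(array('filename' => $filename))
  //   ->fields($times)->execute();
}
```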

Anonymous’s picture

without the debug.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Before patch: real 0m12.535s. After patch: real 0m4.164s.

catch’s picture

Title: Moving a class between files can cause SQL errors on registry rebuild » Moving a class between files can cause SQL errors on registry rebuild (TESTING BROKEN)

Extra extra critical, patch looks fine.

pwolanin’s picture

ok, well let's get this in and then deal with the other issue of using file hashes instead of times

webchick’s picture

Title: Moving a class between files can cause SQL errors on registry rebuild (TESTING BROKEN) » Moving a class between files can cause SQL errors on registry rebuild
Status: Reviewed & tested by the community » Fixed

Oh, dear lord! :\ Thanks for the sleuthing, folks.

Committed to HEAD!

rfay’s picture

Yay! PIFR test time on client #32 is down to 15 minutes again from 40 minutes.

David_Rothstein’s picture

Hehe, awesome bugfix :) Does this fix the sporadic flood API test failure also? Well, if not, I guess we'll know about it soon.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.