Problem/Motivation
In #3174662: Encapsulate \PDOStatement instead of extending from it a PDO statement wrapper was introduced. The getIterator() method represents a performance regression relative to earlier versions of Drupal 8/9.
Previously if you did a foreach ($statement as $row) {} the PDO statement would fetch a single result from the database in each loop keeping memory consumption to a minimum.
Now, the getIterator() method fetches all results and constructs an ArrayIterator. For a large query site, this is large use of PHP memory.
Also, currently StatamentPrefetch can be rewound without raising any error (database statements cannot be rewound), and there is calling code in migrations that does so failing silently.
Steps to reproduce
Compare memory usage before/after this change.
Proposed resolution
See, the API changes.
Remaining tasks
Add patch and tests.
User interface changes
n/a
API changes
- Introduce
Drupal\Core\Database\StatementWrapperIteratorandDrupal\Core\Database\StatementPrefecthIteratorclasses all implementing \Iterator (instead of \IteratorAggregate) so that iteration on the statement object is kept in sync with the underlying client statement resultset cursor (net of the usage of PDO::MYSQL_ATTR_USE_BUFFERED_QUERY which is a PDO-only feature, other drivers do not necessarily have that, see #45 and #47). - Extract the implementation of \Iterator methods (valid(), next(), current(), rewind(), etc) to a self-contained trait used by both new classes. Do not use \Iterator methods directly in consuming code (they should be used internally by PHP when executing
foreachconstructs IMHO). - Let core MySql and PostgreSql database drivers implement the new StatementWrapperIterator class.
- Let core SQLite database driver implement the new StatementPrefetchIterator class.
Data model changes
n/a
Release notes snippet
| Comment | File | Size | Author |
|---|
Issue fork drupal-3265086
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3265086-fix-memory-usage
changes, plain diff MR !1842
Comments
Comment #2
pwolanin commentedDiscussed this with alexpott in Slack, and in terms of the change he said:
Patch takes that general approach but in the fallback case uses the generator syntax to fetch() one result at a time.
Comment #3
pwolanin commentedComment #5
mondrakeThis is trying to change StatementWrapper from implementing \IteratorAggregate to implement \Iterator instead. This would avoid pre-fetching all records when iterating over the statement.
Comment #6
mondrakeWorking on a more round patch…
Comment #7
mondrakeDeprecating
StatementWrapper::getIteratorand implementing methods and properties to track fetching status so that an \Iterator can be implemented.Comment #8
mondrakeComment #9
mondrakeComment #11
mondrakeComment #12
daffie commentedComment #13
mondrake@daffie besides a deprecation test for getIterator(), are there other tests needed?
Comment #14
mondrakeComment #15
pwolanin commentedWhat's the intent of these changes instead of delegating to the PDO statement? Does that not behave the same in PHP 7 and 8?
Comment #16
pwolanin commentedAh, I see, that is the problem:
So this is giving us consistency across PHP versions?
Comment #17
mondrakeTBH, I do not care. We have a StatementWrapper since 9.1 precisely to abstract away from any underlying statement object.
Comment #18
pwolanin commentedIs there any test code already that does a foreach() on the statement wrapper?
It might also be interesting to have a test that verifies that iterating a large result uses less memory that calling fetchAll() ? Of cours,e I'm not sure if that holds true for all of the databases.
Comment #19
mondrake#18 AFAICS,
Drupal\KernelTests\Core\Database\FetchTestis already testing traversing the executed statement viaforeachloops.I see the point in testing the memory consumption, but I am afraid such a test would be rather tricky/fragile.
Comment #20
mondrakeAdded a CR placeholder, to be filled in at later stage.
Comment #21
mondrakeAdded a deprecation test.
Comment #22
daffie commentedI am trying to understand this patch and why it does it the way it does. I need some help to understand it.
Comment #23
mondrake@daffie those methods are must haves for a class implementing \Iterator.
More about the iterator design pattern: https://refactoring.guru/design-patterns/iterator/php/example
Comment #24
daffie commentedI am still not convinced that we should do this issue.
If we do, than we need testing for the new methods: 'key', 'current', 'next', 'rewind' and 'valid'. Not if we need to add testing for the new class variables and protected helper methods.
Could we update the IS and CR to make more clear why we are doing this issue.
Comment #25
mondrake@daffie have a look at the StatementPrefetch class: this patch does something very similar. The difference is that StatementPrefetch loads ALL the rows in an array in memory to release the database for changes (IIRC SQLite wont' allow e.g. to INSERT or UPDATE if there's an open SELECT resultset, or something like that), here we do not do that, we just keep in synch the traversal of the StatementWrapper through foreach() or while() with fetching from the wrapped client statement.
In fact, currently in HEAD, if you do
foreach($statement as $row) {...}you do something same as StatementPrefetch: PHP looks for the iterator logic, finds that it is an \IteratorAggregate class, gets the iterator viagetIterator()that in the case at hand is afetchAll()to the wrapped client statement. After the patch, PHP finds \Iterator so it knows it has to use the specific methods (current, key, etc), that allow us to fetch one record at a time instead.Comment #26
pwolanin commentedI think this could be greatly simplified by using the Generator syntax and keeping the existing IteratorAggregate interface?
Comment #27
mondrakererolled
Comment #28
daffie commentedI have another thought about this and I see why we should do this.
No new tests are needed.
The CR needs an update.
Comment #30
mondrakeTest fails on SQLite, because it does not implement StatementWrapper. I'm not keen on introducing yet another conditional to skip tests based on the driver. Postpone on getting a more stable solution in #3231950: Split Database tests in 'core' ones and 'driver specific' ones.
Comment #32
daffie commented#3231950: Split Database tests in 'core' ones and 'driver specific' ones has landed.
Comment #33
daffie commentedComment #34
mondrakeRebased to 10.1.x
Comment #35
daffie commented2 unresolved threads on the MR.
Comment #36
mondrakeReplied.
Comment #37
daffie commentedOk, let keep it as it is.
All code changes look good to me.
I have updated the IS and CR.
For me it is RTBC.
Comment #38
alexpottHow come #26 has not been addressed? Like if we can fix this without an API change that'd be nice.
Comment #39
mondrakeCorrect me if I am wrong, but if we rely on the underlying PDOStatement,
1. we re-couple the StatementWrapper to PDO, which we struggled to decouple from
2. we do not solve the issue at hand (the IteratorAggregate would have to fetch all resultset and pass it back in single yields)
I was trying to make it consistent with StatementPrefetch, which implements
\Iteratorand is used by SQLite.Comment #40
daffie commentedThe question of @alexpott has been addressed.
Back to RTBC.
Comment #41
mondrakeRebased
Comment #42
mfbRegardless of this patch being applied, memory usage is surprisingly high if you select all rows of a large table without actually fetching them, for example:
And it takes more time to execute() the bigger the table. So is the whole table being read from the database even if you don't fetch it?
Comment #43
mfbLooks like the answer to #42 is that this is due to
\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => TRUEin modules/mysql/src/Driver/Database/mysql/Connection.php - if I change this to FALSE then the execute() on all rows of a large table uses almost no memory and returns almost instantly.Comment #44
mondrake#42 and #43 are very interesting, thanks @mfb.
It’s worth trying to do that in the MR and see the impact?
Comment #45
mfbCould be worth experimenting but according to https://www.php.net/manual/en/mysqlinfo.concepts.buffering.php USE_BUFFERED is needed if you want to keep executing other queries while iterating the results; and sounds like it should be more performant for typical queries.
Comment #46
mfbYeah, no that doesn't work, but at least this little side tangent allowed me to learn about buffered queries
Comment #47
mondrakeSo #44 is not a good idea, reverted and returned to RTBC.
#45is right, we can't do queries while an unbuffered query is active, see this clearly in Drupal\Tests\mysql\Kernel\mysql\SyntaxTest::testAllowSquareBrackets from the test run:
It would be hard to change that generally for Drupal now. So this issue could help for specific cases when passing a value for PDO::MYSQL_ATTR_USE_BUFFERED_QUERY in the query $options['pdo'] when iterating through large data sets in a controlled manner, i.e. ensuring no other queries are run in the in-between the cursor navigation.
Comment #48
mondrakeComment #49
mondrakeAdded some docs based on @catch feedback. Thanks!
Comment #50
daffie commentedBoth points of @catch have been addressed.
Back to RTBC.
Comment #51
mondrakeI have the mysqli driver failing tests with the patch here applied. Setting to NR while I figure out whether the problem is here or in the mysqli driver code.
Comment #52
mondrakeNo, the changes are not BC, because
::markResultsetIterableand friends are not called by downstream code ::execute and ::fetch* methods currently, obviously. So we'll need to deprecateStatementWrapperand introduce a new statement class instead, e.g. StatementWrapperIterator.Comment #53
mondrakeFor review again.
Comment #54
daffie commentedThe testbot is failing for PostgreSQL.
Comment #55
mondrakeI just picked for testing a version of pgsql too old for D10.
Comment #56
mondrakeComment #57
mondrakeUpdated IS with latest evolution
Comment #58
mondrakeUpdated CR.
Comment #59
daffie commentedAll the code changes look good to me.
The IS and the CR are in order.
For me it is RTBC.
Comment #60
alexpottIsn't the smallest change here
We don't need any deprecations - we don't couple this any more or any less to the underlying PDO Statement.
Comment #61
mondrake#60 didn't really think about it so far... definitely worth trying, thanks!
Comment #62
mondrake#60 as a patch.
Comment #63
mondrakeComment #64
mondrakeComment #65
mondrakeSo #62 fails on this
when rewinding the StatementWrapper iterator,
\ArrayIterator(current HEAD) is rewindable,Generator(after the patch) is not.https://www.php.net/manual/en/generator.rewind.php
So for me #62 is breaking BC. Back to RTBC for the MR.
Comment #66
alexpott@mondrake hmmm.... but in the MR
So i'm not sure that that test is actually real. You can't rewind the MR either. The MR just does NOT error properly.
Comment #67
mondrakeThe MR no-ops. The Generator throws an exception.
I do not know why the migrate plugin rewinds the iterator. But no matter why, HEAD and MR are fine, #62 fails. That’s enough for me to say #62 is not BC.
Note: StatementPrefetch (used by Sqlite) implements \Iterator and its rewind() method is exactly equal to the MR (actually, copy/pasted)
Comment #68
alexpottYeah but what it is doing fails. Like silently. So it's bogus API.
Comment #69
mondrakeYes. To fix that, though, we need to fix the bogus API and the bogus usage of the API.
IMHO we can do the former here, taking care also of StatementPrefetch, but as a deprecation, because otherwise we introduce a BC break.
The latter, though, should be a followup.
Comment #70
mondrakeDeprecated rewinding the statement.
Last commit should fail on the same tests as #62, but here with a deprecation instead. Only for mysql and pgsql here. Next, adjust StatementPrefetch to do the same for SQLite.
Comment #71
mondrakeSplit the \Iterator related methods in a trait, so that the new class uses it but also StatementPrefetch can start using it. Added a deprecation ignore for the statement rewinding error thrown by TrackChangesTest that should be fixed separately.
Comment #72
mondrakeEnded up rewriting entirely StatementPrefetchIterator, benefitting from the StatementIteratorTrait - now both StatementWrapperIterator and StatementPrefetchIterator share the \Iterator implementation code. In the process, added a test for
fetchAllAssoc()to FetchTest.Comment #73
mondrakeFiled #3339373: Drupal\migrate\Plugin\migrate\source\SourcePluginBase::rewind() is rewinding database statements as a separate issue.
Comment #74
mondrakeCouple of tweaks. Now the \Iterator methods (
current(),valid(),key()etc.) are implemented ONLY in the iterator trait code and not used anywhere by the statement code.In my opinion, these methods should be there only to allow PHP internals to execute the
foreach()loop on the object implementing \Iterator. They should *not* be called directly by consuming code.Comment #75
mondrakeRebased
Comment #76
daffie commentedThe MR looks good.
The IS and the CR need to be updated.
Comment #77
mondrakeUpdated IS and issue title. Appreciate a review before going for the CR update.
Comment #78
daffie commented@mondrake: Could you address my remark on calling markResultsetIterable() twice gives a unexpected result. The rest of the MR looks great.
Comment #79
mondrakeThanks @daffie for finding #78. All comments in MR addressed.
Comment #80
alexpottI still think we are taking the wrong more disruptive approach here. as pointed out in #66 the bit on #62 that fails is actually failing on HEAD it's just not telling us it is because we've incorrectly implemented the API. I think we should fix that here rather then introducing more disruption to correct usages.
Comment #81
mondrakeI do not think we are disrupting anything. We're improving and cleaning up the statement classes, rather. SQLite has code that has been useless for 15 years. Non-core drivers (are there many?) can still use the previous implementations until Drupal 11 comes. But, obviously that is my opinion. Please mark this as won't fix if you disagree so not to spend more time on moonwatching.
Comment #82
alexpottThis code is a big performance improvement so we need to do something. I just think that fixing this should not require us to refactor our statement wrapper classes yet again.
Comment #83
daffie commentedAll code changes look good to me.
I have updated the CR for SQLite.
For me it is RTBC.
Comment #84
alexpottThe current fix still allows the bogus rewind to occur in the migrate code. Doing a no op is not a fix.
Comment #85
alexpottThis MR adds and skips a deprecation. Without an on issue discussion as to whether this is correct here. In my opinion it is not correct. If we are to go the way of the most recent MRs then we need to fix the code that triggers the deprecation in core. Because adding it to the skip list makes it harder for code that triggers it to discover.
Comment #86
mondrakeLet's postpone on #3339373: Drupal\migrate\Plugin\migrate\source\SourcePluginBase::rewind() is rewinding database statements. Solving that will remove the need to skip the deprecation here, and hopefully can be done independently from this issue.
Comment #87
alexpott#86 has been fixed. Now I think we can do the following patch.
Comment #88
mondrake... or, alternatively, fail when a statement is tried to be rewound (deprecation error now, excpetion in d11), which I would prefer. See MR.
Comment #89
daffie commented#3339373: Drupal\migrate\Plugin\migrate\source\SourcePluginBase::rewind() is rewinding database statements has landed and the MR has been updated for it.
Back to RTBC.
Comment #90
alexpottThere's no need for all the change to issue a deprecation or even better an exception. If you iterate twice over a statement that's an error and your code is not working the way you think it is. In my opinion there is no need for an deprecation in this instance - we can move straight to an exception.
But also adding the deprecation/exception is not part of fixing the memory issue that this issue was originally filed for. That's fixed by #87 and it adds a test that proves consistent rewind behaviour on all core drivers.
Comment #91
alexpottHere's a patch that issues a deprecation on rewind without all the API change in the MR.
Comment #92
alexpottLooking at #91 in more depth I see no reason for no result statements to behave differently.
Comment #93
catch#92 looks a lot more straightforward. I think if we wanted to do the broader changes in the MR anyway, we could look at a follow-up task for that.
Comment #94
mondrakePlease correct me if I am wrong...
More conceptual than anything, but if I read correctly #92 will throw deprecation only if a cursor has been fully fetched. But, in #65, we found that
Aren't we getting different results in #92 for the case of full fetching vs. the case of partial fetching? Maybe we need a separate test for that?
Comment #95
alexpott@mondrake so your comment got me to thinking about writing a test for that.
Both the MR and patch fail this test and incorrect trigger a deprecation. I think we should do #87 and add the above test and then, maybe, file a follow-up to think about how to do the deprecation or exception on rewind. DB statements in Drupal have not been rewindable for years and only our error in #3174662: Encapsulate \PDOStatement instead of extending from it made this possible for MySQL and Postgres... but not SQLite.
Comment #96
alexpottWhen you change the test in #95 to assert on actual counts it reveals an existing bug in resumption of looping around a statement in \Drupal\Core\Database\StatementPrefetch - ie. SQLite. I;ve added the test here and added @todo for SQLite because I think adding the test coverage here is more important than fixing the existing bug here.
Comment #97
mondrakeI think this is wrong expectation... you cannot start a
foreach, break, and resume with anotherforeachfrom where you left.Internally PHP always calls (unconditionally, I assume)
rewind()when starting a foreach loop,https://www.php.net/manual/en/iterator.rewind.php
Comment #98
mondrakeWorking on getting the MR green.
Comment #99
mondrakeThe MR now incorporates tests from #96, with the assumption that
The beauty of the MR is that since all the iteration code is in the trait, fixing it there fixes both statment classes in core. I like the trait and am planning to use in in contrib for mysqli and DruDbal drivers, that require their own statement classes anyway (they're not PDO based).
Comment #100
daffie commentedThe added testes look good to me.
Back to RTBC.
Comment #101
mondrakeThank you.
I wonder whether we should use this occasion, we have two new statement classes, also to have a new StatementInterface, fully typehinted, so to finally leave PHP 5 behind.EDIT: Second thinking, not a good idea (yet). It would make sense to do some cleanup of the supported fetch modes first, #3345938: Deprecate support for unused \PDO::FETCH_* modes. That might reshape some methods.
Comment #102
alexpottWhat's interesting about #97 and that test is that it works the same as PdoStatement and it behaves like a cursor. If you change
to
on MySQL and PostgreSQL the test still passes. It's quite a bit harder to get the client statement from SQLite...
So we're choosing to depart from PdoStatement behaviour - which is also fine but should be noted.
The problem that prevents us committing the MR though is highlighted by the test in this test
This test will do something very different on HEAD or with \PdoStatement. The current behaviour in the MR makes it look like there are 3 rows whereas there are 4. Amusingly on HEAD this the final count will be 4 for Postgres and MySQL but it will be 5 for SQLite because of bugs in the StatementPrefetch class. Yes at least the MR is consistent :)
I think that this will be tricky to solve and perhaps rather and deprecating going straight to an exception is warranted because that's the behaviour we seem to actually want. If we want to pursue the deprecation then we need to make it behave as it has before we broke it in #3174662: Encapsulate \PDOStatement instead of extending from it - i.e. we need to make it behave like a \PdoStatement
Comment #103
mondrakeLet's go for an exception in place of a deprecation then, if all agree. The current state after #3174662: Encapsulate \PDOStatement instead of extending from it has been in place for more than 2 years with noone reporting the issue; so trying to go back to the situation prior than that change seems overkill to me.
Comment #104
alexpottDiscussed with @catch who suggested instead of doing an exception in Drupal 10 or a silenced deprecation we should do
in Drupal 10 and then change to an exception in Drupal 11. I think this gives us the best of both worlds. Code that is broken now will have a warning to fix but we won't cause any sites to stop functioning.
Comment #105
mondrakeOK, on that.
Comment #106
mondrakeComment #107
mondrakeComment #108
daffie commentedThe deprecation has been removed.
The IS has been updated.
Back to RTBC.
Comment #109
alexpottAdded a review to the MR. I have some concerns about scope. For example code is removed from \Drupal\sqlite\Driver\Database\sqlite\Statement that is not related to fixing this. We're also changing calls to low level PDO functions that are not part of fixing this change.
Comment #110
alexpottThe conversions from switches to matches could happen in a follow-up.
Comment #111
mondrakeno it is related to fixing this, because that piece of code is doing strange things with the resultset that are incompatible with the new trait that keeps the iterator related properties private on the base class. But yes, that can be removed independently from this, I opened a spin-off for that and we'll have to wait that before continuing here.
TIL
match(). Thanks!Postponed on #3346898: Remove obsolete code from Drupal\sqlite\Driver\Database\sqlite\Statement.
Comment #112
mondrakeMeant to postpone.
Comment #113
mondrakeWell that was fast. Working on @alexpott’s feedback.
Comment #114
mondrakeAddressed @alexpott's feedback inline in the MR, with the exception of two points where I'd like to have a further thinking before doing changes. Thanks!
Comment #115
daffie commentedComment #116
mondrakeComment #117
daffie commentedAll remarks have been addressed.
Back to RTBC.
Comment #118
mondrakeFound an issue with latest changes, which uncovers a lack of test coverage.
Comment #119
mondrakeThe problem was with '0' being confused with FALSE in fetchField, i.e. a valid scalar interpreted as end of recordset.
I was also very tempted to add a trait to split out the code that emulates the conversion, in StatementPrefecth, of data in FETCH_ASSOC format to other FETCH_* modes - would be very useful in contrib for non-pdo based drivers. But I'll leave that to a follow up.
Comment #120
mondrakeSecond thinking, I'm after the trait that will make everything more organised.
Comment #121
mondrakeFixed the CS of the snake_case vs camelCase.
Added a trait to manage emulation of fetch modes, implemented in the StatementPrefetchIterator. Will be very useful for the mysqli driver that is not PDO based and therefore con only emulate PDO’s fetch modes.
Comment #122
alexpottPosted some small nits to the MR review.
@mondrake FetchModeTrait is yet more API - the new trait is another API surface for us to support. Ideally improvements in API are not mixed in with bug fixes. Can't this be part of a follow-up? It's not necessary to fix the bug and unlike the introduction of the other trait we can't even argue that this is a necessary refactor to make iteration of all db statements consistent.
Comment #123
mondrakeOK, I tried. Let me revert that and put in a follow up.
Comment #124
mondrakeDone #122.
Comment #125
mondrakeFiled #3347497: Introduce a FetchModeTrait to allow emulating PDO fetch modes.
Comment #127
alexpottI added one question to the MR otherwise I think we're finally there with this one :)
@mondrake thanks for the recent changes and opening of follow-ups and blockers.
Comment #128
mondrake@VladimirAus can you please explain what was the purpose of your push, and did you use
push --force? Why?I'm working on the test addition.
Comment #129
mondrakeAdded a test. There can be certainly more (for instance, testing that fetchAll* returns the remaining resultset in a partially iterated statement), but test hardening can be postponed to later on IMHO.
Comment #130
mondrakeSo, the test uncovered a bug.
Comment #131
mondrakeMade fixes to the StatementWrapperIterator methods that are 'foreaching' the statement themselves - in a way that if the statement was already traversed fully, they can return an empty array straight ahead instead of erroring - in respect to the API.
Comment #132
daffie commentedFor the 2 remarks on the MR.
Comment #133
mondrakeComment #134
daffie commentedAll changes look good to me.
Back to RTBC.
Comment #135
mondrakeNeeds adjustment after commit of #3313355: Allow the database query log to be dispatched as log events. On it.
Comment #136
mondrakeComment #137
daffie commentedThe reroll looks good to me.
The testbots are all green.
Back to RTBC.
Comment #138
alexpottCommitted c6532d8 and pushed to 10.1.x. Thanks!
Comment #141
mondrakeA @todo that was left over from a previous version of the MR got committed inadvertently. Filed #3348788: Remove @todo leftover of #3265086 to remove it.
Comment #143
bnjmnmAdjusting issue credit as credit is not provided for button-click rebases without additional contribution.