Problem/Motivation

Found via #3377657: Add database query spans to otel traces.

Drupal uses the flood system for user logins.

The database flood backend lazy-creates the database table (using the same pattern as cache tables) which is good.

However, it only does this on writes, not selects.

For cache this is fine because cache misses usually result in a cache write, however a successful login only ever queries the flood table, it never writes to it. This means the flood table stays un-created on new sites until someone gets their password wrong, meaning schema queries against the flood table to check if it's there or not. On logins with the standard profile, this happens once in Flood::isAllowed() and twice after a successful login.

At some point someone will get their password wrong, or the flood table will be used for something else, but until then it's extra database queries every user login.

Steps to reproduce

Proposed resolution

Create the flood database table in Flood::isAllowed() too.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Before:
before screenshot showing queries against information schema in a trace

After:
after screenshot showing no queries against information schema

Issue fork drupal-3410312

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

catch created an issue. See original summary.

catch’s picture

Status: Active » Needs review
catch’s picture

Priority: Normal » Critical

Unless something very flukey is happening, which I am not ruling out yet, this reduces a core MR test run to 7m48s from ~10m.

catch’s picture

Confirmed that this is removing three queries against information_schema when running performance tests.

I haven't yet confirmed that we're running this query every time we log in a user during tests - if the installer queries flood at all, then it will, if not, it might only affect second or third logins (which the performance tests do).

catch’s picture

Modified StandardPerformanceTest::testLogin() to remove the cache warming.

This shows multiple information_schema queries on the first login in 11.x:
Before image showing multiple information schema queries

vs. with the MR, a single information_schema query, then a create table, then 'normal' flood queries after that:
After image showing table creation but no more information schema queries

The CREATE TABLE takes 46ms on my local machine, but a possible explanation for the faster test runs is that because gitlab is on a ramdisk, table creation is cheaper than running all those extra database queries. Or we're getting lucky with the runners today and it's not a full minute's improvement, but it shows that the performance tests are correctly diagnosing a bug!

Even though the first flood query in a test appears to be the first login, this is still potentially saving test time, because after a successful login, we also DELETE any flood entries for that user (one for the uid, one for the uid + IP if that's also configured), whether or not there are any - so every login until the flood table is created actually results in multiple queries to information_schema. I'm counting three, which will go down to one (plus the table creation).

catch’s picture

Issue summary: View changes
StatusFileSize
new445.01 KB

Before image had the wrong queries expanded, so updating the issue summary.

catch’s picture

Issue summary: View changes
StatusFileSize
new427.45 KB

Better 'before' screenshot too.

catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes
fjgarlin’s picture

Status: Needs review » Reviewed & tested by the community

I've reviewed the code and checked all the previous comments with performance information. It makes total sense to create the table the first time we query it and don't find it, rather than catching exceptions all the time and not creating the table.

We can see performance gains in the run too, which is a nice side effect too.

Marking RTBC.

mondrake’s picture

Great find!

Personally, I find on-the-fly table creation a rather bad approach. If we had the table created at installation, we wouldn’t have to care about this. On-the-fly table creation breaks transactions for DBs that do not support transactional DDL - not just the transaction where ensureTable is called, also any concurrent transaction from other sessions. But I suppose this train is lost now :(

catch’s picture

@mondrake we've got a few more options now - for example services that need to create tables could subscribe to a module installed event and create the storage then. The main reason for on-the-fly table creation originally was to avoid creating database tables from hook_schema() for services that were using alternative storage - like multiple cache tables when a site was using redis or memcache. Worth a new issue to discuss.

mondrake’s picture

Yeah… I was also thinking events would help in #3397622: Adding GIN and GIST indexes to PostgreSQL databases, to allow database drivers to decorate db objects specifications with db specific features. I will open a new issue to discuss.

catch’s picture

Not sure whether it's enabled by this issue or test runs have been slowly getting faster, but new tests we can mark with @group #slow which are overhanging the end of the test runs #3410396: Add @group slow to ForumTest, HelpTopicSearchTest, ModulesListFormWebTest.

  • larowlan committed f0c16a29 on 10.2.x
    Issue #3410312 by catch: Flood database backend ::isAllowed() should...

  • larowlan committed a3f1255f on 11.x
    Issue #3410312 by catch: Flood database backend ::isAllowed() should...
larowlan’s picture

Version: 11.x-dev » 10.2.x-dev
Status: Reviewed & tested by the community » Fixed

Committed to 11.x and backported to 10.2.x

Nice one!

catch’s picture

This narrowed the query range too much, opened #3410450: Fix random performance test failures. Ironically although I found this looking at performance test query spans, the assertions only count successful queries - if a query triggers an exception, the 'after' event doesn't get fired, so we don't see a strong decrease in the number of queries in the test coverage itself even though less are happening in reality. It might be possible to check if the start and end events match and log some kind of error if they don't to cover that case too.

mondrake’s picture

if a query triggers an exception, the 'after' event doesn't get fired

We may want to add another statement execution event on top of the start/end ones, e.g. StatementExecutionFailureEvent, so that exception handling may dispatch them.

mondrake’s picture

mondrake’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

catch’s picture