Problem/Motivation

DatabaseBackendUnitTest currently fails with PostgreSQL as the database backend. Note: #2443647: PostgreSQL: Fix system\Tests\Cache\ChainedFastBackendUnitTest fails for the same reason, so whichever one we fix first will fix the other issue too.

This is because DatabaseCacheTagsChecksum runs a SELECT query via the query() method, which we do not wrap in a transaction. Normally every SELECT query gets a savepoint added and released, which allows the cachetags table to be created on demand so Drupal can continue.
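To make the savepoint mechanics concrete, here is a minimal sketch using Python's sqlite3 module (SQLite is used only as a stand-in because it also supports SQL savepoints; on PostgreSQL the failed SELECT would otherwise abort the whole transaction, which is the behavior at issue). The savepoint name mirrors the one the pgsql driver uses, but the rest is illustrative, not Drupal code:

```python
import sqlite3

# Stand-in for the pgsql driver behavior: wrap a possibly-failing SELECT
# in a savepoint so a missing table does not kill the outer transaction.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage the transaction manually
cur = conn.cursor()

cur.execute("BEGIN")
cur.execute("SAVEPOINT mimic_implicit_commit")
try:
    cur.execute("SELECT tag, invalidations FROM cachetags")
except sqlite3.OperationalError:
    # Undo only the failed statement, keep the outer transaction alive,
    # then create the missing table on demand (as the cache backend does).
    cur.execute("ROLLBACK TO SAVEPOINT mimic_implicit_commit")
    cur.execute("CREATE TABLE cachetags (tag TEXT, invalidations INTEGER)")
cur.execute("RELEASE SAVEPOINT mimic_implicit_commit")
cur.execute("COMMIT")

print(cur.execute("SELECT COUNT(*) FROM cachetags").fetchone()[0])  # 0
```

Without the savepoint, PostgreSQL would refuse every statement after the failed SELECT until the whole transaction is rolled back.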

Options:

  • Change DatabaseCacheTagsChecksum to use the Select class?
  • Add transaction handling in DatabaseCacheTagsChecksum (I made a small patch, but it seemed to cause more failures).
  • Add transaction handling, but only for queries that preg_match() identifies as SELECTs, in Connection::query() in the pgsql driver.
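The third option would amount to a check along these lines inside the driver's query() method (a hypothetical sketch in Python; the pattern, the needs_savepoint() name, and the placement are illustrative, not from any actual patch):

```python
import re

# Hypothetical filter for option 3: only SELECT-ish statements would get
# the savepoint wrapping inside the pgsql Connection::query(). The cost
# is one regex match on every single query the site runs.
SELECT_RE = re.compile(r"^\s*\(?\s*SELECT\b", re.IGNORECASE)

def needs_savepoint(query: str) -> bool:
    """Return True when the statement looks like a SELECT."""
    return bool(SELECT_RE.match(query))

print(needs_savepoint("SELECT tag FROM {cachetags}"))               # True
print(needs_savepoint("INSERT INTO {cachetags} VALUES (:tag, 1)"))  # False
```

The per-query regex overhead is the main argument raised against this option later in the thread.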

Proposed resolution

Identify and fix the failing tests.

Remaining tasks

Write patch.

User interface changes

None.

API changes

None.


Comments

mradcliffe’s picture

Issue summary: View changes
bzrudi71’s picture

Status: Active » Needs review
FileSize
839 bytes

+1 for changing DatabaseCacheTagsChecksum to use the Select class. Seems like the cleanest and best solution to me, but I'm still open to other solutions :-)

Berdir’s picture

This is in the critical path just like cache get, where we explicitly avoid using the select method (see DatabaseBackend).

Why is this a problem but that one isn't? Why doesn't every single test fail because of this? Maybe they implicitly rely on set() being called first?

daffie’s picture

Status: Needs review » Reviewed & tested by the community

It all looks good to me.
I can confirm that the test fails on PostgreSQL without the patch and passes with it.
As far as I can see, the code changes but behaves the same.
So for me it is RTBC.

Berdir’s picture

Status: Reviewed & tested by the community » Needs review

My questions have not been answered yet.

amateescu’s picture

bzrudi71’s picture

Yep, #2371709: Move the on-demand-table creation into the database API seems to be the root cause and should fix this issue without adding quirks to the pgsql driver itself. Once the try/catch stuff is gone, PostgreSQL should pass the tests...
Thanks @amateescu for the link to the other issue.

andypost’s picture

Issue tags: +needs profiling

Also, using Select here is a performance regression because of the query-alter overhead.
The change should be measured.

mradcliffe’s picture

Would wrapping it in a transaction be worse for performance? If the query is not guaranteed to succeed, then it should be wrapped in a transaction for PostgreSQL (and MySQL strict mode).

I looked into driver-specific code, but I think that's too expensive for PostgreSQL because I had to preg_match() every SELECT query that runs through query(). This is just one query, and high-performance sites will hopefully be running a non-database backend for caching.

I think band-aid fixing this now is fine because the other issue stalled out.

bzrudi71’s picture

Usually transactions are not as bad for performance as one would think, so this seems like the way to go until we get rid of the root cause in [#2371707]. And please let us not even think about preg_matching queries any more :-) To make sure this one is good to go in, I will start a complete bot run tomorrow before RTBC.

Berdir’s picture

As commented before, I would still like to understand why this fails while the actual backend, which uses an absolutely identical pattern, works. And why does it *only* fail in that kernel test, and not everywhere? Surely we run into this scenario after *every* installation?

If you look at GenericCacheBackendUnitTestBase::getCacheBackend(), it explicitly calls deleteAll(), which should not be needed, because we should be able to assume that any backend will respect the test environment and already be empty. However, if you look at deleteAll(), it does a TRUNCATE query and contains the table-creation logic as well; my guess is that this is what creates the cache table?

But that doesn't explain what creates the table for web tests and real installations.

mradcliffe’s picture

I spent about 9 months going around and around debugging the installer sending garbage queries before I gave up and wrapped Select queries in implicit-commit savepoints (#2001350: [meta] Drupal cannot be installed on PostgreSQL and #2181291: Prevent a query from aborting the entire transaction in pgsql).

If I had to guess, I think that because KernelTestBase doesn't run through the installer, it doesn't emulate the same behavior; it then tries to query the cachetags table, which fails and aborts the transaction because there's no implicit commit, and the test fails. DatabaseBackendTagTest calls db_select() on cachetags, which gets wrapped in the savepoint handling by the Select class, so that's why it does not fail.

I think you're correct, and we could also add a transaction around the actual cache table SELECT to be safe, but the fewer of these transactions the better, in my opinion. Could a contrib module that wants to help manage caching end up in the same situation: switch to the database backend and try to query the cache table immediately? I don't see a specific test for swapping out the cache backends and immediately trying to get a cache entry, but I don't know if we already have coverage for that somewhere else.

I'd rather stick with the approach of adding transaction/savepoint for now until the cache backend schema situation gets a permanent solution (probably not until after 8.1.x development cycle).

bzrudi71’s picture

I vote for this band-aid fix for now. A todo has been added, and I have no idea how long it will take to fix the root cause. I guess we will not have it fixed for 8.0.0 :-(

+++ b/core/lib/Drupal/Core/Cache/DatabaseCacheTagsChecksum.php
@@ -108,11 +108,17 @@ public function calculateChecksum(array $tags) {
+        // cache-related tables are not gauranteed to exist.

Nitpick: gauranteed -> guaranteed

Berdir’s picture

Not sure, but anyway, we need to profile this, for example with the internal page cache.

As mentioned, this is one of the 2-3 queries that is executed there, even small changes can have quite an impact there, and it is also a query that is called a lot.

mradcliffe’s picture

Status: Needs review » Needs work

I'll try and get set up with Apache Benchmark and YSlow for phantom.js on Saturday and Sunday. The next step would be to add the transaction handling around the query('SELECT' ...) call in the cache DatabaseBackend class as well.

mradcliffe’s picture

I talked with @Crell yesterday about this, and another solution may be to create tagged services for PostgreSQL, MySQL, and SQLite for these backends, so that each backend can extend the parent for performance or niche bugs.

Berdir’s picture

Yes, possibly, but AFAIK that will require manual configuration to actually use those replacement services; it won't happen automatically.

It's easier if there's a factory, which is the case for the cache DatabaseBackend (we're adding a MySQL-specific implementation in #2336627: Deadlock on cache_config (DatabaseBackend::setMultiple())).

andypost’s picture

@mradcliffe please elaborate on the tagged-services idea.
I don't get how cachetags table creation relates to the cache backend... does it mean a sort of PgsqlCacheBackend could be useful somehow?

alexpott’s picture

These failures occur on PostgreSQL because setMultiple() is called before anything else within testSetMultiple(). If $backend = $this->getCacheBackend(); didn't delete all data from the cache bins under test, then we'd get an error on creating the cache table too.

alexpott’s picture

Status: Needs work » Needs review
FileSize
2.97 KB

The attached patch fixes the failures in PostgreSQL, and I think it will be performance neutral.

Different approach to the earlier patches so no interdiff.

Berdir’s picture

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -213,37 +245,11 @@ public function setMultiple(array $items) {
+        // Only pass the values to values since the order of $fields matches the
+        // order of the insert fields.
+        $query->values(array_values($fields));

Not sure I understand the array_values(); I get that we don't need the keys, but it doesn't hurt if we pass them in, does it?

If that fixes PostgreSQL, then that's fine with me.

alexpott’s picture

@Berdir it is a performance optimisation: it saves unnecessary looping...

  public function values(array $values) {
    if (is_numeric(key($values))) {
      $this->insertValues[] = $values;
    }
    else {
      // Reorder the submitted values to match the fields array.
      foreach ($this->insertFields as $key) {
        $insert_values[$key] = $values[$key];
      }
      // For consistency, the values array is always numerically indexed.
      $this->insertValues[] = array_values($insert_values);
    }
    return $this;
  }

If we pass in the keys we have to loop for every call to ->values(). Seems pointless here.
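For illustration, the two code paths of the quoted values() method can be mimicked outside PHP (a Python stand-in; the add_values() helper name is hypothetical, the field/value data is made up):

```python
def add_values(insert_fields, insert_values, values):
    # Mirrors the quoted values() logic: a positional list is appended
    # as-is (fast path), while a keyed mapping must be reordered to match
    # the declared field order, costing an extra loop on every call.
    if isinstance(values, list):
        insert_values.append(values)
    else:
        insert_values.append([values[key] for key in insert_fields])

fields = ["cid", "data"]
rows = []
add_values(fields, rows, ["cache:foo", "serialized-foo"])                # fast path
add_values(fields, rows, {"data": "serialized-bar", "cid": "cache:bar"})  # reorder loop
print(rows)
```

Passing array_values($fields) from setMultiple() keeps every call on the fast path, which matters for a query that runs on essentially every request.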

bzrudi71’s picture

Status: Needs review » Reviewed & tested by the community

Nice, we have a pass now! @Berdir's concerns seem addressed, and I found nothing obviously wrong while reviewing the new approach. Let's move forward here. RTBC

amateescu’s picture

I ran Drupal\system\Tests\Cache\DatabaseBackendUnitTest on SQLite with this patch applied and we're good :)

Test summary
------------

Drupal\system\Tests\Cache\DatabaseBackendUnitTest  160 passes                                      

Test run duration: 2 sec

bzrudi71’s picture

Thanks @amateescu for the additional SQLite test :-)

alexpott’s picture

alexpott’s picture

FileSize
1.25 KB
3.24 KB

After discussing with @catch, I improved the comments to explain why this function is the way it is, just in case anyone gets the urge to optimise before we have PostgreSQL testbots.

xjm’s picture

Assigned: Unassigned » catch

+1 for the comments and thanks for the SQLite run.

Seems like a catch patch.

catch’s picture

Status: Reviewed & tested by the community » Fixed

Committed/pushed to 8.0.x, thanks!

  • catch committed c6e77e5 on 8.0.x
    Issue #2443651 by alexpott, mradcliffe, bzrudi71: PostgreSQL: Fix system...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.