Problem/Motivation

I'm trying to fix D7 tests with SQLite in #3172878: Fix SQLite tests in Drupal 7.

It looks like enabling WAL in SQLite may be essential to allow the level of concurrency we use in drupalci (see below); it's a good thing to do anyway (#3172877: Enable WAL journal mode by default for SQLite database [D7 backport]).

However, when we do that in D7 core, it seems that testbot can no longer open the results database to generate the report at the end of a run.

I believe the problem is due to permissions on the directory; when using WAL the directory that the SQLite database is within needs to be writeable for the client to be able to run queries. This is because SQLite will create some temp files that sit alongside the db file, and if the directory is not writeable it can't do so.

See, for e.g. https://stackoverflow.com/a/20372658

The error we're getting is, e.g.:

https://dispatcher.drupalci.org/job/drupal_d7/94138/console

18:39:45 JavaScript 117 passes, 0 fails, and 0 exceptions
18:39:46 
18:39:46 Test run duration: 2 min 16 sec
18:39:46 
18:39:46 Attempting to connect to database server.
18:39:46 Database is active.
18:39:46 chown -R 1001 /var/www/html
18:39:46 chown -R 1001 /var/www/.composer/cache
18:39:46 chmod -R 777 /var/www/html
18:39:50 
18:39:50 In JunitXmlBuilder.php line 47:
18:39:50                                                                    
18:39:50   SQLSTATE[HY000]: General error: 14 unable to open database file 

Without WAL enabled in SQLite I think the tests quickly hit locking problems e.g. https://www.drupal.org/pift-ci-job/1830033

AJAX forms on cached pages 33 passes, 0 fails, and 0 exceptions
PDOException: SQLSTATE[HY000]: General error: 5 database is locked in /var/www/html/includes/database/prefetch.inc:168
Stack trace:
#0 /var/www/html/includes/database/prefetch.inc(168): PDOStatement->execute(Array)

Steps to reproduce

That error happens when \DrupalCI\Build\Artifact\Junit\JunitXmlBuilder::generate() tries to run a SELECT query on the results database.

https://git.drupalcode.org/project/drupalci_testbot/-/blob/production/sr...

  public function generate(array $test_groups, DatabaseInterface $results_database) {
    $mapped_results = [];

    $db = $results_database->connect($results_database->getDbname());

    $q_result = $db->query('SELECT * FROM simpletest ORDER BY test_id, test_class, message_id;');  // <= line 47

...and I believe that's because the directory containing the file is not writeable by the current user at that point.

Some more context; I put some debug code in just before this SELECT query and copied the db file elsewhere to make sure it existed and contained the expected data.

If I open that file with the sqlite3 client and run a SELECT, some temp files are created - e.g.:

testbot@drupalci:/var/lib/drupalci/workspace$ ll
total 24712
-rw-rw-rw- 1 testbot  testbot  12636160 Sep 25 11:03 db.sqlite
-rw-rw-rw- 1 testbot  testbot     32768 Sep 25 11:06 db.sqlite-shm # <== temp file
-rw-rw-rw- 1 testbot  testbot         0 Sep 25 11:06 db.sqlite-wal # <== temp file

If we change the permissions on the directory so that it's not writeable by testbot, and try to run a SELECT:

testbot@drupalci:/var/lib/drupalci/workspace$ chmod a-w .
testbot@drupalci:/var/lib/drupalci/workspace$ sqlite3 db.sqlite 
SQLite version 3.8.7.1 2014-10-29 13:59:56
Enter ".help" for usage hints.
sqlite> SELECT * FROM simpletest ORDER BY test_id, test_class, message_id;
Error: unable to open database file

Proposed resolution

I believe one way to fix this would be to ensure that the directory the db sits in is writeable before we try to query it for the final results.

There are probably different ways to do this - you could move the file to somewhere writeable like /tmp perhaps?

Or there are other places in the code that try to "open up the permissions" e.g. \DrupalCI\Build\Build::cleanupBuild():

    // Open up permissions on containers.
    $uid = posix_getuid();
    $environment = $this->container['environment'];
    $commands = [
                 'chown -R ' . $uid . ' ' . $environment->getExecContainerSourceDir(),
                 'chown -R ' . $uid . ' ' . $environment->getContainerComposerCacheDir(),
                 'chmod -R 777 ' . $environment->getExecContainerSourceDir(),
                ];
    $result = $environment->executeCommands($commands);
    $db_container = $environment->getDatabaseContainer();
    $db_dir = $this->container['db.system']->getDataDir();
    $commands = [
      'sudo chown -R ' . $uid . ' ' . $db_dir,
      'chmod -R 777 ' . $db_dir,
    ];
    if (!empty($db_container)) {
      $result = $environment->executeCommands($commands, $db_container['id']);
    }

Maybe we could make those permissions changes on the db_dir earlier?

I think the fix needs to happen in the testbot code rather than in D7 core if we're going to be able to use WAL at all (and I think we need to).

Remaining tasks

  • Figure out an acceptable way to make the results db accessible to generate the final report, and implement it.

User interface changes

n/a

API changes

n/a

Data model changes

n/a

CommentFileSizeAuthor
#2 3173146-2.patch1.01 KBmcdruid

Comments

mcdruid created an issue. See original summary.

mcdruid’s picture

StatusFileSize
new1.01 KB

Not sure if this is the ideal solution, but seems to have got it working for me.

mcdruid’s picture

Status: Active » Needs review
mcdruid’s picture

Issue summary: View changes
mcdruid’s picture

AFAICS this works with just the chown - seems the chmod -R 777 is not necessary.

I think the directory is owned by www-data rather than testbot, and that's the problem.

joseph.olstad’s picture

Project: DrupalCI: Drupal.org Testing Infrastructure » DrupalCI: Test Runner
Component: Code » Documentation

I believe this is the issue queue that is most active for this type of ci testing issue

joseph.olstad’s picture

Component: Documentation » Testrunner Tests
Category: Task » Support request
Mixologic’s picture

Category: Support request » Bug report
Status: Needs review » Fixed

I reviewed and tested this, deployed it to production. Drupalci now has this fix.

Thanks McDruid!

joseph.olstad’s picture

Go @McDruid Go!

lots of core patches to go yet! I was hoping for the variable_get stale cache fix to go in (asap) gotten bit too many times with this issue:
#973436-136: Overzealous locking in variable_initialize()

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.