Docker Remote API calls are timing out if the process ever hits 3 minutes without stdout output.

The container side of the stream closes the transaction.

Possible suspect could be https://github.com/docker/docker/blob/fe53c277857ba77caf9189b1d6d095835c...

Original Report:

During the Portland Sprints we observed that a

./drupalci build:php execution sometimes dies while compiling php the process in the end finishes without problems in the background.

we did not yet find out where the timeout occurs since it's hard to track where exactly the request terminates (most likely due to a timeout error)

output:

vagrant@vagrant-ubuntu-trusty-64:~/drupalci_testbot$ ./drupalci init:base --forcebuild
Executing init:base
Please select the numbers corresponding to which DrupalCI web environments to support. Separate multiple entries with commas. (Default: [4])
  [0] drupalci/base
  [1] drupalci/db-base
  [2] drupalci/php-base
  [3] drupalci/web-base
  [4] all
 > 3
Executing build ...
Building drupalci/web-base container
-------------------- Start build script --------------------
Step 0 : FROM drupalci/php-base

--------------------- End build script ---------------------

Comments

Mixologic’s picture

I've spent a considerable amount of time digging into this issue this week and have it isolated to the following information:

The drupalci script doesnt actually die, its still there in the process table, just not showing any more output from runtests.sh, which is also still running. What *does* happen is that the output stream from runtests.sh gets severed from the drupalci script - most likely caused by a timeout somewhere in the docker php library, however it could be that docker itself, when given a command to run, has a timeout. Whatever connects the output stream of docker to the output stream of dockerci is where the problem likely lies.

Mixologic’s picture

Oops. Ignore that last. Im talking about a different timeout and was confused. I need coffee.

dasrecht’s picture

Wow, great you looked into it. i think @zaporylie also looks into it currently at the montpellier code sprints.

/dasrecht

zaporylie’s picture

Yes, I am, although I havn't experienced that issue yet.

jthorson’s picture

We talked about this on our last call ... we decided it was the timeout on the rest call made by the docker_php library.

Did this not pan out?

Mixologic’s picture

I had adjusted every timeout variable I could find, but none of them appeared to have an impact on the running script. I suspect there may be something else going on in the host operating system.

@isntall has been running the container tests on his local machine with archlinux as the host, and never experiences this timeout issue.

cilefen’s picture

Status: Active » Needs review

I can't reproduce this with the composer update from #2476303: "Can't progress bar past 100%" errors during Drupalci Pull commands. Could that have just fixed it?

jthorson’s picture

Title: ./drupalci build:php times out but completes in background » ./drupalci commands timeout after 180 sec of no output (but completes in background)
Issue summary: View changes
cilefen’s picture

@jthorson That is a suspicious time span.

jaredsmith’s picture

After looking into this further at DrupalCon Los Angeles, it appears that the 180 second timeout is the default timeout in the Go library that Docker depends on. See https://github.com/docker/docker/blob/master/pkg/ioutils/readers.go#L71

  • Mixologic committed 70bdf6a on 2465451-random-stoppage-fix
    Issue #2465451: Fixes random stoppages in docker php library until pull...

  • Mixologic committed 1a003e7 on 2465451-random-stoppage-fix
    Issue #2465451: Added another commit to the pull request that flushes...
Mixologic’s picture

Finally got this one nailed. Turns out this is a bug in the way that the docker php library was handing the stream format that docker provides, which, btw, is a nonsensical mess. (For the curious, search for the word "implementation" on this page: https://docs.docker.com/reference/api/docker_remote_api_v1.19/)

It wasnt handling the format correctly, hitting a loop where it would go until it thought it was at the end of the stream (which just ended up being the same stream point over and over again) and then it would finally timeout after not sending any data. So setting higher timeouts would have just made the blocked socket take longer to respond.

Anyhow, this is fixed, and the changes are made to composer.json and composer.lock file until such time as this pull request gets filled: https://github.com/stage1/docker-php/pull/125 - at which time we can file a follow up and revert back to the canonical version of stage1/docker-php.

isntall’s picture

Status: Needs review » Reviewed & tested by the community

I've reviewed the before & after jobs.
Looks good.

It'll be nice to have full output on all of the docker-php commands.

  • Mixologic committed cf9751b on 2465451-dockerphp-stream-fix
    Issue #2465451: Fixes random stoppages in docker php library until pull...
  • Mixologic committed d1206e0 on 2465451-dockerphp-stream-fix
    Issue #2465451: Added another commit to the pull request that flushes...
Mixologic’s picture

This is merged into production now, we wont see anything until the next time we rebuild the amis or add a composer install step to the jenkins jobs.

isntall’s picture

Status: Reviewed & tested by the community » Fixed

There is a new AMI built with this change included.

We're going to need to keep an eye on the status of the patch and the project.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Mixologic’s picture

Ps, also added a default socket timeout, as some contrib tests were dying with one very long test.