Before DrupalCon Barcelona, there was an issue with max_allowed_packet while dumping pift_ci_job_result. While Basic increased that variable, it didn't solve the underlying issue:
mysqldump: Error 2020: Got packet bigger than 'max_allowed_packet' bytes when dumping table `pift_ci_job_result` at row: 147068065
Suffice it to say, all dev environments are blocked from receiving new data until this table is fixed. As an interim measure, I've committed ee9c69f55274c86e2d1c18e8d2b95f439a4d26ba, which skips exporting data from the table altogether. However, this is a larger issue with the architecture of pift_ci_job_result itself, which probably needs to be re-architected.
Some ideas / needs raised during the extended sprints:
- "We need the passing result data because when we make a new test, we need to see that the test actually executed"
- "What about making the whole page a static HTML blob and storing that in the database?"
- "It'd be great if someday we could see which tests are failing on the branch randomly. This would require some type of semantic setup, which couldn't be achieved by storing an HTML blob"
- "Export the data to another file, like JSON"
Right now the table is at 169272738 rows. Marking Critical because of the rapid growth of this table.
Comments
Comment #2
japerry
Comment #3
japerry
Comment #4
japerry
Comment #5
pwolanin commented
Putting the HTML blob in a row or (better) in a file loaded in the page callback, exporting JSON, CSV, or something else with the full data per result for later/off-line processing, and keeping just file references would be much more scalable, so we'd have 1 row per test run in the DB for now.
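A minimal sketch of the file-reference idea above, assuming a hypothetical layout of one JSON file per job. The function names, field names, and file naming are illustrative only and are not taken from the PIFT code:

```python
import json
import tempfile
from pathlib import Path


def export_job_results(job_id, results, export_dir):
    """Write every per-test result for one job to a JSON file.

    Returns the single row that would stay in the database: a small
    summary plus a reference to the file holding the full data.
    """
    path = Path(export_dir) / f"job-{job_id}.json"
    path.write_text(json.dumps(results, sort_keys=True))
    return {
        "job_id": job_id,
        "total": len(results),
        "failed": sum(1 for r in results if r["status"] != "pass"),
        "results_file": str(path),
    }


def load_job_results(row):
    """Read the full per-test data back for off-line processing."""
    return json.loads(Path(row["results_file"]).read_text())


# Hypothetical sample data; a real job has far more tests.
sample = [
    {"test": "NodeAccessTest::testView", "status": "pass"},
    {"test": "UserLoginTest::testFlood", "status": "fail"},
]
export_dir = tempfile.mkdtemp()
row = export_job_results(42, sample, export_dir)
```

The database then grows by one row per test run, while the per-test detail stays available in a structured form for later analysis.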
Comment #6
drumm
This was fixed by http://cgit.drupalcode.org/infrastructure/commit/?id=94adf1c86fbd8bbcfba... and new dev sites do work. mlhtest-drupal.redesign.devdrupal.org was built out today and has recent data in pift_ci_job_result. The dev site DB on disk did increase in size from 26G to 28G, probably a result of increased activity from DrupalCon in both testing and issues. It took 1h to build out, which is slightly faster than dev sites in the last couple weeks, which took 1h10m.
The root cause of the previous dev site not being built out was:
ERROR 2013 (HY000) at line 24681: Lost connection to MySQL server during query
This was most likely a simple connection blip between devwww and devdb.
Comment #7
drumm
mlhtest-drupal.redesign.devdrupal.org can be used for sprinting-related work. Please add URLs to /var/www/dev/mlhtest-drupal.redesign.devdrupal.org/comment as issue(s) are worked on, so we know what all is going on there.
Comment #8
drumm
That leaves the question of whether the row-per-test data storage makes sense. This isn't a new style of data storage; QA.Drupal.org does the equivalent. We do have a faster pace of testing happening now, and more tests in core than ever.
According to New Relic, SELECT queries on the pift_ci_job_result table are taking 28.4ms on average over the last 7 days, and the tallest spikes on the response time graph are 67ms. It does not make the top 20 most time-consuming queries for the site. For now, I think this shows it isn't a critical problem, if there is a problem.
When first launched, I did have to go through a few iterations on getting the table & keys right, but we are in at least an okay spot for now. The table has a not-too-large covering index for the query that hits it. We should keep an eye on the table's growth. A large number of rows alone isn't necessarily a problem, but is indeed worth investigating.
Comment #9
drumm
Rudy switched this table to be stored in a compressed format a while ago. That saved us 50% on disk and, as far as I know, has otherwise been doing well.
I implemented trimming test results from issues that have been Closed (fixed) for 6 months. That has only been run once. We have it available to run more frequently, but it isn't a significant savings.
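As a rough illustration of the trimming rule above, the selection could look like the sketch below. The field names, the status string, and the 182-day approximation of "6 months" are all assumptions for illustration, not the actual implementation:

```python
from datetime import datetime, timedelta


def issues_to_trim(issues, now, max_age=timedelta(days=182)):
    """Return ids of issues that have been Closed (fixed) for longer than max_age."""
    cutoff = now - max_age
    return [
        issue["id"]
        for issue in issues
        if issue["status"] == "closed (fixed)" and issue["closed_at"] < cutoff
    ]


# Hypothetical sample issues.
now = datetime(2016, 1, 1)
issues = [
    {"id": 1, "status": "closed (fixed)", "closed_at": datetime(2015, 3, 1)},
    {"id": 2, "status": "closed (fixed)", "closed_at": datetime(2015, 12, 1)},
    {"id": 3, "status": "needs review", "closed_at": datetime(2015, 1, 1)},
]
trim_ids = issues_to_trim(issues, now)
```

Only issue 1 qualifies here: issue 2 was closed too recently, and issue 3 is not in the Closed (fixed) state.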
This still leaves us with a lot of rows. A possible next step for issue results would be to not store successes matching the branch job’s results, basically storing only the differences. That would align well with the UI, since you are really only interested in the difference.
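A minimal sketch of that "store only the differences" idea, assuming results can be represented as a mapping from test name to status. The function name and the data shape are hypothetical. Failures are kept even when the branch job also failed, which is one possible reading of the proposal:

```python
def diff_against_branch(branch_results, issue_results):
    """Keep only the issue-job results that are not matching successes.

    Both arguments map a test name to its status. A result is dropped
    only when it is a pass that the branch job also recorded as a pass.
    """
    stored = {}
    for test, status in issue_results.items():
        if status == "pass" and branch_results.get(test) == "pass":
            continue  # identical success, nothing new to record
        stored[test] = status
    return stored


# Hypothetical sample data.
branch = {"A": "pass", "B": "pass", "C": "fail"}
issue = {"A": "pass", "B": "fail", "C": "fail", "D": "pass"}
stored = diff_against_branch(branch, issue)
```

Test D passes but does not exist on the branch, so it is still stored. That keeps the earlier requirement that a newly added test can be shown to have actually executed.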
Comment #10
drumm
We are running into disk space issues on staging now. While we can provision more disk, we can’t do that forever.
Comment #14
drumm
One more cache clear is needed as the last update starts:
DELETE FROM cache WHERE cid LIKE 'entity_%';
This is running slowly and successfully on staging. I’ll be deploying to production shortly.
Remaining work:
Comment #18
drumm
This has been running for quite a while on production. Over 5G has been cleared with only ~6% processed.
With the latest commits, only the diff will be stored for new jobs.
Currently, processing is still crawling along, with a hook_update_N() running for a very long time. The last part here is making a drush process to keep processing without blocking deployments. The drush process can also load each branch job result once, which should cut up to 1/3 off the processing time.
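The "load each branch job result once" optimization can be sketched as grouping issue jobs by their branch job before processing. This is only an illustration under assumed data shapes: `trim_issue_jobs`, the `(issue_job_id, branch_job_id)` pairs, and the `load_results` callable are hypothetical and do not reflect the actual drush command:

```python
from collections import defaultdict


def trim_issue_jobs(issue_jobs, load_results):
    """Trim issue jobs, loading each branch job's results only once.

    issue_jobs: iterable of (issue_job_id, branch_job_id) pairs.
    load_results: callable returning {test name: status} for a job id.
    Returns {issue_job_id: results that are not matching successes}.
    """
    by_branch = defaultdict(list)
    for issue_job_id, branch_job_id in issue_jobs:
        by_branch[branch_job_id].append(issue_job_id)

    trimmed = {}
    for branch_job_id, issue_job_ids in by_branch.items():
        branch = load_results(branch_job_id)  # one load per branch job
        for issue_job_id in issue_job_ids:
            issue = load_results(issue_job_id)
            trimmed[issue_job_id] = {
                test: status
                for test, status in issue.items()
                if not (status == "pass" and branch.get(test) == "pass")
            }
    return trimmed


# Hypothetical in-memory stand-in for the database.
FAKE_STORE = {
    1: {"A": "pass", "B": "pass"},   # branch job
    10: {"A": "pass", "B": "fail"},  # issue job tested against branch job 1
    11: {"A": "pass", "B": "pass"},  # issue job tested against branch job 1
}
loads = []


def fake_load(job_id):
    loads.append(job_id)
    return FAKE_STORE[job_id]


trimmed = trim_issue_jobs([(10, 1), (11, 1)], fake_load)
```

In this sample, branch job 1 is loaded a single time even though two issue jobs are compared against it.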
Comment #19
drumm
This is working well. We’ve freed 15G, and are on track to free another 60G.