Before DrupalCon Barcelona, there was an issue with max_allowed_packet while dumping pift_ci_job_result. While Basic increased that variable, it didn't solve the underlying issue:

mysqldump: Error 2020: Got packet bigger than 'max_allowed_packet' bytes when dumping table `pift_ci_job_result` at row: 147068065

Suffice it to say all dev environments are blocked from having new data until this table can be fixed. As an interim measure, I've committed ee9c69f55274c86e2d1c18e8d2b95f439a4d26ba, which skips exporting data from the table altogether. However, this is a larger issue with the architecture of the pift_ci_job_result table itself, which probably needs to be re-architected.
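For readers unfamiliar with skipping one table's data in a dump, here is a minimal sketch of one common approach: run mysqldump twice, once excluding the oversized table and once with --no-data so the table still exists (empty) on dev sites. This is not necessarily what the commit above does. The database name "drupal" and the helper name are assumptions for illustration, and the code only builds the command lines rather than running them.

```python
from typing import List, Tuple


def build_dump_commands(database: str, skip_data_tables: List[str]) -> Tuple[List[str], List[str]]:
    """Return (data_cmd, schema_cmd) argument lists for two mysqldump runs."""
    # Run 1: dump everything except the tables whose rows we want to skip.
    data_cmd = ["mysqldump", "--single-transaction"]
    data_cmd += [f"--ignore-table={database}.{table}" for table in skip_data_tables]
    data_cmd.append(database)
    # Run 2: structure only, so the skipped tables are still created (empty).
    schema_cmd = ["mysqldump", "--no-data", database] + list(skip_data_tables)
    return data_cmd, schema_cmd


# "drupal" is a hypothetical database name.
data_cmd, schema_cmd = build_dump_commands("drupal", ["pift_ci_job_result"])
print(" ".join(data_cmd))
print(" ".join(schema_cmd))
```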

Some ideas / needs raised during the extended sprints:

  • "We need the passing result data because when we make a new test, we need to see that the test actually executed"
  • "What about making the whole page a static HTML blob and storing that in the database?"
  • "It'd be great if someday we could see which tests are failing on the branch randomly. This would require some type of semantic setup, which couldn't be achieved by storing an HTML blob"
  • "Export the data to another file, like JSON"

Right now the table is at 169272738 rows. Marking Critical because of the rapid growth of this table.

Comments

japerry created an issue. See original summary.

japerry’s picture

Issue summary: View changes
pwolanin’s picture

Putting the HTML blob in a row or (better) in a file loaded in the page callback would be much more scalable. The full data per result could be exported as JSON, CSV, or something else for later/off-line processing, with the DB keeping just file references, so we'd have 1 row per test run in the DB for now.
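A rough sketch of that idea, assuming a simple list of per-test results: the full data goes to a JSON file, and the database would hold a single summary row with a file reference. The function names, the row fields, and the sample results are all hypothetical.

```python
import json
import os
import tempfile


def store_run(results, run_id, directory):
    """Write all per-test results for one run to a JSON file and return the one DB row."""
    path = os.path.join(directory, f"run-{run_id}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh)
    # Only this summary row would live in the database, not one row per test.
    return {
        "run_id": run_id,
        "results_file": path,
        "passed": sum(1 for r in results if r["status"] == "pass"),
        "failed": sum(1 for r in results if r["status"] == "fail"),
    }


def load_run(row):
    """Read the full per-test data back from the referenced file."""
    with open(row["results_file"], encoding="utf-8") as fh:
        return json.load(fh)


# Made-up sample data for illustration.
results = [
    {"test": "NodeTest::testCreate", "status": "pass"},
    {"test": "NodeTest::testDelete", "status": "fail"},
    {"test": "UserTest::testLogin", "status": "pass"},
]
tmp_dir = tempfile.mkdtemp()
row = store_run(results, 42, tmp_dir)
loaded = load_run(row)
print(row["passed"], row["failed"])
```

The semantic data stays queryable offline from the JSON files, which an HTML blob alone would not allow.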

drumm’s picture

Priority: Critical » Normal

While Basic increased that variable, it didn't solve the underlying issue:

mysqldump: Error 2020: Got packet bigger than 'max_allowed_packet' bytes when dumping table `pift_ci_job_result` at row: 147068065

Suffice it to say all dev environments are blocked from having new data until this table can be fixed.

This was fixed by http://cgit.drupalcode.org/infrastructure/commit/?id=94adf1c86fbd8bbcfba... and new dev sites do work. mlhtest-drupal.redesign.devdrupal.org was built out today and has recent data in pift_ci_job_result. The dev site DB on disk did increase in size from 26G to 28G, probably a result of increased activity from DrupalCon in both testing and issues. It took 1h to build out, which is slightly faster than dev sites in the last couple of weeks, which took 1h10m.

The root cause of the previous dev site not being built out was ERROR 2013 (HY000) at line 24681: Lost connection to MySQL server during query, most likely a simple connection blip between devwww and devdb.

drumm’s picture

mlhtest-drupal.redesign.devdrupal.org can be used for sprinting-related work. Please add URLs to /var/www/dev/mlhtest-drupal.redesign.devdrupal.org/comment as issue(s) are worked on, so we know what all is going on there.

drumm’s picture

Title: pift_ci_job_result table gets too large » Is pift_ci_job_result table getting too large?
Component: Development Environments » Servers

That leaves the question of whether the row-per-test data storage makes sense. This isn't a new style of data storage; QA.Drupal.org does the equivalent. We do have a faster pace of testing happening now, and more tests in core than ever.

According to New Relic, SELECT queries on the pift_ci_job_result table are taking 28.4ms on average over the last 7 days, and the tallest spikes on the response time graph are 67ms. It does not make the top 20 most time-consuming queries for the site. For now, I think this shows it isn't a critical problem, if there is a problem at all.

When first launched, I did have to go through a few iterations on getting the table & keys right, but we are in at least an okay spot for now. The table has a not-too-large covering index for the query that hits it. We should keep an eye on the table's growth. A large number of rows alone isn't necessarily a problem, but is indeed worth investigating.
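As a rough illustration of what a covering index means here, the sketch below uses SQLite as a stand-in for MySQL. The table and column names are made up and are not the real pift_ci_job_result schema. The point is only that when an index contains every column a query reads, the database can answer from the index without touching the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per test result.
conn.execute(
    "CREATE TABLE job_result ("
    "job_id INTEGER, test_class TEXT, status TEXT, message TEXT)"
)
# The index holds every column the query below needs, so it "covers" the query.
conn.execute("CREATE INDEX job_status ON job_result (job_id, status, test_class)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT test_class, status FROM job_result WHERE job_id = ?",
    (1,),
).fetchall()
# The last column of each plan row is a human-readable description.
detail = plan[0][3]
print(detail)
```

In SQLite the plan text mentions a covering index for this query; in MySQL the analogous signal is "Using index" in the Extra column of EXPLAIN.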

drumm’s picture

Rudy switched this table to be stored in a compressed format a while ago. That saved us 50% on disk and as far as I know has otherwise been doing well.

I implemented trimming test results from issues that have been Closed (fixed) for 6 months. That has only been run once. We have it available to run more frequently, but it isn't a significant savings.
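The selection rule for trimming can be sketched as follows. This is not the actual implementation; the field names, the 183-day approximation of 6 months, and the sample issues are assumptions for illustration.

```python
from datetime import datetime, timedelta


def issues_to_trim(issues, now, retention_days=183):
    """Return IDs of issues that have been Closed (fixed) longer than the retention period."""
    cutoff = now - timedelta(days=retention_days)
    return [
        issue["nid"]
        for issue in issues
        # The status check comes first, so open issues (closed=None) are never compared.
        if issue["status"] == "Closed (fixed)" and issue["closed"] <= cutoff
    ]


# Made-up sample data.
now = datetime(2016, 6, 1)
issues = [
    {"nid": 1, "status": "Closed (fixed)", "closed": datetime(2015, 9, 1)},
    {"nid": 2, "status": "Closed (fixed)", "closed": datetime(2016, 5, 1)},
    {"nid": 3, "status": "Needs review", "closed": None},
]
trim_ids = issues_to_trim(issues, now)
print(trim_ids)
```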

This still leaves us with a lot of rows. A possible next step for issue results would be to not store successes matching the branch job’s results, basically storing only the differences. That would align well with the UI, since you are really only interested in the difference.
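A minimal sketch of the diff idea, assuming results can be represented as a mapping from test name to status. The function names and sample data are hypothetical, and the real module's storage format may differ.

```python
def results_to_store(issue_results, branch_results):
    """Drop issue-test passes that also pass on the branch; keep everything else."""
    return {
        test: status
        for test, status in issue_results.items()
        if not (status == "pass" and branch_results.get(test) == "pass")
    }


def reconstruct(stored, branch_results):
    """Rebuild the full issue result set from the branch passes plus the stored diff."""
    full = {test: status for test, status in branch_results.items() if status == "pass"}
    full.update(stored)
    return full


# Made-up sample data.
branch = {"A": "pass", "B": "pass", "C": "fail"}
issue = {"A": "pass", "B": "fail", "C": "fail", "D": "pass"}
stored = results_to_store(issue, branch)
rebuilt = reconstruct(stored, branch)
print(stored)
```

One caveat with this simple version: a test that passes on the branch but is removed by the patch would be reconstructed as passing, so a real implementation would need some way to record removals.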

drumm’s picture

Title: Is pift_ci_job_result table getting too large? » Store only diff of results with branch for issue tests
Project: Drupal.org infrastructure » Project issue file test
Version: » 7.x-3.x-dev
Component: Servers » Code
Assigned: Unassigned » drumm
Category: Bug report » Feature request

We are running into disk space issues on staging now. While we can provision more disk, we can’t do that forever.

  • drumm committed 3518f7f on 2575797-result-diff
    Issue #2575797: Store only diff of results with branch for issue tests,...

  • drumm committed 8ef1b40 on 2575797-result-diff
    Issue #2575797: Clear the schema cache
    

  • drumm committed 3518f7f on 7.x-3.x
    Issue #2575797: Store only diff of results with branch for issue tests,...
  • drumm committed 8ef1b40 on 7.x-3.x
    Issue #2575797: Clear the schema cache
    
drumm’s picture

One more cache clear is needed as the last update starts: DELETE FROM cache WHERE cid LIKE 'entity_%';

This is running slowly and successfully on staging. I’ll be deploying to production shortly.

Remaining work:

  • Store new test results as diffs.
  • Update results in email notifications to show the diff.

  • drumm committed 60157c3 on 7.x-3.x
    Issue #2575797 by drumm: Adjust OOM protection
    

  • drumm committed dffa027 on 7.x-3.x
    Issue #2575797 by drumm: Refactor result fetching for UIs to use common...

  • drumm committed d2ec7ec on 7.x-3.x
    Issue #2575797 by drumm: Store diff for new results
    
drumm’s picture

This has been running for quite a while on production. Over 5G has been cleared with only ~6% processed.
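As a back-of-the-envelope estimate, assuming the savings scale linearly with the fraction processed (which may not hold, since older and newer results can differ in size), the figures above extrapolate as follows. The inputs are the rounded numbers from this comment, so the result is only a ballpark.

```python
cleared_gb = 5.0            # "Over 5G" cleared so far
fraction_processed = 0.06   # "~6%" processed

# Linear extrapolation: total savings if the rest behaves like the first 6%.
projected_total_gb = cleared_gb / fraction_processed
remaining_gb = projected_total_gb - cleared_gb
print(f"projected total: {projected_total_gb:.0f}G, still to free: {remaining_gb:.0f}G")
```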

With the latest commits, only the diff will be stored for new jobs.

Currently, processing is still crawling along, with a hook_update_N() running for a very long time. The last part here is making a drush process to keep processing without blocking deployments. The drush process can also load each branch job result once, which should cut up to 1/3 off the processing time.
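The "load each branch job result once" optimization amounts to grouping the backlog by branch job before processing. Here is a sketch under assumed data shapes; the field name branch_job_id and both callbacks are hypothetical, and the real drush command would be written in PHP.

```python
from collections import defaultdict


def process_backlog(issue_jobs, load_branch_results, process):
    """Process issue jobs grouped by branch job; return how many branch loads were needed."""
    by_branch = defaultdict(list)
    for job in issue_jobs:
        by_branch[job["branch_job_id"]].append(job)

    loads = 0
    for branch_job_id, jobs in by_branch.items():
        # Loaded once per branch job instead of once per issue job.
        branch_results = load_branch_results(branch_job_id)
        loads += 1
        for job in jobs:
            process(job, branch_results)
    return loads


# Made-up sample data: three issue jobs sharing two branch jobs.
issue_jobs = [
    {"job_id": 10, "branch_job_id": 1},
    {"job_id": 11, "branch_job_id": 1},
    {"job_id": 12, "branch_job_id": 2},
]
processed = []
loads = process_backlog(
    issue_jobs,
    load_branch_results=lambda branch_job_id: {"branch": branch_job_id},
    process=lambda job, branch_results: processed.append(job["job_id"]),
)
print(loads, processed)
```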

drumm’s picture

Status: Active » Fixed

This is working well. We’ve freed 15G, and are on track to free another 60G.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.