Posted by xjm

Yesterday, I queued all the 8.x RTBC patches for retesting, since most of them were more than a week old.

  • 108 patches were requeued.
  • The queue did not clear for 18 hours.
  • We started with 9 bots and were down to 6 after 8 hours. They were fixed overnight.
  • Roughly 10% of the patches failed due to testbot environment issues (failed to clear checkout directory, database tables not found, could not write to config, etc.) and had to be requeued.
  • 2 other patches had other random fails.
  • One bot went on a mini-rampage, repeatedly failing patches, and had to be disabled manually.
  • A couple dozen other patches entered the queue over the course of the day.
  • 9 patches (outside of the RTBCs) were redundant (superseded by a later patch on the same issue) so I canceled testing on them to take them out of the queue.

Among the first page of 50 RTBCs (2-5 weeks old):

  • 8 failed to apply.
  • 4 applied but failed tests legitimately (would have broken HEAD).
  • 7 were set back to NW for missing core gates requirements.

Conclusions:

  • An external browser script to retest all RTBCs once a week is probably a bad idea, since it would have all the problems described above but without a QA admin to keep an eye on things.
  • A weekly QA/PIFR cron job to retest all RTBCs once a week is also probably a bad idea for the same reasons.
  • A weekly anything to retest more than core RTBCs is most likely out of the question.
  • project_issue doesn't expose the data we need to create a truly first in, first out listing, so we need a mechanism that doesn't bump some but not all issues.
  • What might work is a nightly testbot cron job to retest RTBC patches between 7 and 8 days old. Since testbot only posts a comment if the patch fails, this would also prevent the reordering of the RTBC queue.
CommentFileSizeAuthor
#1 sorry_testbot.png20.76 KBxjm

Comments

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

.

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

Issue summary: View changes

Updated issue summary.

xjm’s picture

StatusFileSize
new20.76 KB

Also, related:
sorry_testbot.png
Sorry, testbot. :)

jthorson’s picture

Some related thoughts ...

- While testing is a time-consuming process, simply verifying whether a patch still applies is much less resource-intensive ... running an 'patch conflict' detection script on a daily basis (and automatically flagging those which need reroll) should be more feasible than a full test run.

- An ongoing retesting process doesn't necessarily have to have the same loading concerns over the long run. Testing each patch 'x' days after it was originally tested would result in a somewhat statistical distribution of the retests, and adding a variable portion to the retest period (i.e. random time between 6 and 8 days) will help smooth out any bumps caused by batch events.

- This could be further mitigated by implementing some sort of adjustable 'test priority' mechanism, which could be used to tweak the order that tests are picked up by the system. For example, prioritizing contrib before core helps get the 'short quick tests' out of the way before the longer core runs; we could likewise prioritize 'first test' runs before 'retest' runs and only process the retests when there is idle/available capacity. Such a mechanism could also provide the 'first in, first out' ordering you mentioned (ordering by test_id provides this).

- Give me a heads-up in advance next time you queue up 100+ tests! ;) (I was actually available all weekend to help deal with the fallout, but let a nasty cold keep me on the couch instead of in front of the computer.)

boombatower’s picture

Perhaps I read this too quickly, but we had a auto-rest mechanism that used to retest RTBC + Needs review every couple of days, but was disabled when we migrated to d6 and what not due to query running too slow. jthorson did some work to bring it back (as I recall), is that not the case?

Having priorities makes sense, but curious if we can get something running more quickly like we used to have.

rfay’s picture

Just FYI, as @boombatower points out, this used to work and #675460: Ensure/Refactor cron_retest() query to only re-test the last file on an issue; Automatically retest RTBC patches is the issue about getting it working again.

So this is a dup. But a very important one, so I won't close it.

rfay’s picture

Just FYI, as @boombatower points out, this used to work and #675460: Ensure/Refactor cron_retest() query to only re-test the last file on an issue; Automatically retest RTBC patches is the issue about getting it working again.

So this is a dup. But a very important one, so I won't close it.

jthorson’s picture

I did get it working, but it required a database change ... and I didn't want to go mucking with the table structure leading up to Denver ... and Munich ... and Sydney ... and Feature freeze ... and Feature freeze II ... you get the idea.

I think the D7 drupal.org upgrade might be an opportune time to make any table changes that are needed for this, and some other potential low-hanging PIFT fruit.

boombatower’s picture

Sure. Also sounds scarily familiar.

xjm’s picture

Awesome. <3

One note I had (hopefully this isn't too much in the weeds at this point) was about this:

While testing is a time-consuming process, simply verifying whether a patch still applies is much less resource-intensive ... running an 'patch conflict' detection script on a daily basis (and automatically flagging those which need reroll) should be more feasible than a full test run.

I'm somewhat hesitant about this unless we expose the last time the full test suite ran, and were careful to communicate that "this applies" ≠ "ship it". The reason is that there can be patches that still apply, but break stuff. (5-10% ish of patches that are at least two weeks old in my one datapoint above.) :)

Thanks!

jthorson’s picture

Re #8: True ... the intent would be to flag 'this no longer applies' patches, and do nothing with the ones that still do.

jthorson’s picture

Issue summary: View changes

Updated issue summary.

yesct’s picture

I think if it does apply, we *do* want to know if it still passes tests.

yesct’s picture

Issue summary: View changes

Removing myself from the author field so that I can unfollow issues. --xjm

jthorson’s picture

Adding Dev Tools Priority tag.

mgifford’s picture

Coming from #2201839: Remind Folks About Proper Protocol for Retesting Issues in on Confirmation Page I'm trying to think about how to best address this.

I very much believe that computer time is abundant and people time is scarce.

The more we can count on bots to retest stale patches, the faster we can have have people re-roll the patches and submit them again.

I remember a DrupalCon Panel about git ages ago with discussions about mythical adoptions of unicorn, pegasus & pegacorn systems. The vision at that time was that we could potentially build systems such that we didn't need to re-build patches all the time simply because Core changed. That seems to have been harder to implement than it was to dream about.

Anyways, we can do a lot to shape human behavior, but we also need to be investing in infrastructure such that we can rely on computers for what they are good for.

Do we need to put more CPU's towards solving this? It would be useful to see how long the issue queue wait line is. Maybe on the retest confirmation page we could actually show how many issues are sitting in the queue and the estimated time before this patch is tested.

Seeing the load on the servers and the length of the queue would matter. Right now though I never would have known that it were so high. 10 minutes per test?

jthorson’s picture

This implementation is already under development, and is next on the testbot priority list (once we unblock patch testing on PHP 5.4).

Automatic patch conflict detection and reroll is also on the roadmap for the next generation of the automated testing infrastructure. It's not the case that it's much harder to implement ... it's more due to the fact that the group of people supporting (and developing) the testing infrastructure has averaged about half a body for the last number of years.

The issue queue size is available on qa.drupal.org. On any particular day, we have outstanding capacity available to be used ... the typical wait time to start testing is ~2 min, and the typical test run time is ~2 hours. Some historical metrics on the testbot queue are available at https://groups.drupal.org/node/372153

mgifford’s picture

Thanks @jthorson

Really like the graphs!

jthorson’s picture

jthorson’s picture

Currently retesting passed RTBC patches every 24 hours. Tests are re-queued at a rate of up to 5 per every 15 minutes ... this may be subject to tweaking as we monitor the testbot queue.

xano’s picture

jthorson++++++++++++++++++++++++ and puppies!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.