Problem/Motivation

On Friday night, I committed a 12 MB diff to the 8.4.x and 8.3.x branches. Shortly thereafter, VCS integration on d.o stopped working. The commits list shows the 8.4.x commit, but not the 8.3.x one:
https://www.drupal.org/node/3060/commits

Currently has at the top:
https://www.drupal.org/commitlog/commit/2/52e3eec616efab4d5201e30d085a09...

Despite all these commits since:

git log --after=52e3eec616 | pbcopy

commit 679614f0165e9f0735c6ba87d99f3f49e63b77d5
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sun Mar 5 06:13:32 2017 -0600

    Issue #2668752 by mr.baileys, chx, tim.plunkett, Berdir: DefaultPluginManager::setCacheBackend has wrong information about language suffix

commit fdbbf4b673f123d1c568cd86a7f261f7129bb2a9
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 18:08:47 2017 -0600

    Issue #2857780 by vegantriathlete: Correct typo in comment in TagTest.php

commit 59ee98a2626b58ada48810a65e9138d5720e5434
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 15:09:09 2017 -0600

    Issue #2802663 by mpdonadio, Berdir, jibran, jhedstrom, amateescu: Exceptions due ChangedItem, CreatedItem, and TimestampItem implicit dependencies on datetime module

commit 1324a42f0c1303e889e6567337da6e6d430b71ed
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 14:07:24 2017 -0600

    Revert "Issue #2802663 by mpdonadio, jibran, Berdir, jhedstrom, amateescu: ChangedItem, CreatedItem, and TimestampItem fields have an implicit dependency on datetime module"
    
    This reverts commit cd176e1d2438e925950fd2d2e256fb8d6c57980f.

commit 9ee87105b9a5fe5ccf771a6af6ef1ff38590c23c
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 14:01:37 2017 -0600

    Issue #2856364 by drpal, Wim Leers: Reposition Quickedit overlay on toolbar orientation change

commit cd176e1d2438e925950fd2d2e256fb8d6c57980f
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 13:52:45 2017 -0600

    Issue #2802663 by mpdonadio, jibran, Berdir, jhedstrom, amateescu: ChangedItem, CreatedItem, and TimestampItem fields have an implicit dependency on datetime module

commit fa39f06433eebd7f2b52bc48a33a7fba05d6f150
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 12:42:17 2017 -0600

    Issue #2361539 by alexpott, webflo, AjitS, himanshu-dixit, tameeshb, xjm: Config export key order is not predictable for sequences, add orderby property to config schema

commit 47ec7f7d4f0146e1de578457028fbbf52ff1cfd0
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 12:27:34 2017 -0600

    Issue #2600836 by balagan, Sagar Ramgade, Chi, anil280988, joshi.rohit100, Jo Fitzgerald, vaplas, dawehner, alexpott, klausi: Make protected Simpletest test methods public for consistency

commit cd8aa44faea8c0d067664740b042be49da80e86b
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 12:06:48 2017 -0600

    Issue #2845731 by ashish-deynap, vaplas, chiranjeeb2410, Munavijayalakshmi, tramsaal, xjm: Theming guide links are wrong

commit 91f0f7b1268d569fc69f4ed07c2da26d5778f07e
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 09:20:03 2017 -0600

    Issue #2857410 by tim.plunkett, DamienMcKenna: Adding a page-level #prefix or #suffix causes it to be output twice on one-col layout

commit f621a88dd0bb760c74142125cda9056b3afd1ce8
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 09:04:49 2017 -0600

    Issue #2857714 by klausi: Upgrade Coder to 8.2.11

commit 72d65cc85cd3a71526f703ca480a2aae14fb65dc
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Sat Mar 4 08:02:01 2017 -0600

    Issue #2857822 by alexpott, klausi: Fix coding standards issues introduced mostly by array syntax conversion

commit 52e3eec616efab4d5201e30d085a09ba5c4817e6
Author: xjm <xjm@65776.no-reply.drupal.org>
Date:   Fri Mar 3 19:20:24 2017 -0600

    Issue #2776975 by joelpittet, dawehner, tim.plunkett, xjm, pfrenssen: March 3, 2017: Convert core to array syntax coding standards for Drupal 8.3.x RC phase

commit 3aba956a39a4852f42cad8483e4e6e307bedbd0e
Author: Alex Pott <alex.a.pott@googlemail.com>
Date:   Fri Mar 3 16:50:40 2017 +0000

    Issue #2666166 by jsst, Lendude, Andrej Galuf: View with not required relationship and aggregation enabled fails to execute

Additionally, neither that issue nor any other has had an automated commit comment posted since.

I don't want to just assume that it's the 12 MB patch, but… Was at the 12 MB patch? Did I break it?

Proposed resolution

?

Remaining tasks

?

FWIW, we have another large coding standards patch on the way (only 1 MB rather than 12): #2854529: Fix Drupal.Scope.MethodScope - all methods should have scopes

CommentFileSizeAuthor
#13 2857960.patch809 bytesdrumm
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

xjm created an issue. See original summary.

xjm’s picture

Issue summary: View changes
xjm’s picture

Issue summary: View changes
drumm’s picture

Assigned: Unassigned » drumm
drumm’s picture

I haven’t been able to track down the error yet. I’m switching to trying to reproduce this in staging. My suspicion is something that handles the list of changed files in the commit OOMed. This could take a few hours.

Since the Git repo is working, please don’t pause committing for this. This also takes out the testing on commit. I queued up https://www.drupal.org/pift-ci-job/614401 manually, you may want to queue more.

In general, when a project gets out of sync, we don’t catch up the commit comments. We do usually get the Git history on Drupal.org back in line, but that OOMs for core. It can be manually populated with a little work.

drumm’s picture

I remembered we have versioncontrol_sync_log to potentially help here. I reviewed and found nothing useful.

I also broke the repo sync lock with UPDATE versioncontrol_repositories SET locked = 0 WHERE repo_id = 2;, so the next commit will have a full shot at syncing.

drumm’s picture

Title: d.o VCS integration died » Avoid broken locks in commit processing
Priority: Major » Normal

Looks like the recent commits are going through normally. The problem was most likely an out of memory error when processing the large commit, leaving the lock open after the process died.

drumm’s picture

mlhess and I have been making decent progress getting Git staging into a good state for testing. Commit hooks are working. We still need to resize a disk that filled up and test everything.

drumm’s picture

Staging’s Git is working well now.

drumm’s picture

Yesterday’s missed Git event sync happened after a Git event that took 109.03 seconds to process. It was retried 4 times while the previous event was processed, but in quick succession within 3.6 seconds.

https://cgit.drupalcode.org/versioncontrol/tree/includes/VersioncontrolR... is what needs some work. Beanstalk does support queue item delays, which would be appropriate here, https://cgit.drupalcode.org/drupalorg/tree/drupalorg/drupalorg.drush.inc....

Need to decide on:

  • Hard-code Beanstalkd awareness into versioncontrol. Not technically correct, gets the job done.
  • DrupalQueue has static variables and/or common object instances in it, have drupalorg sneak the beanstalkd_params in. Might be tricky.
  • Implement our own queue class on top of Beanstalk that does this. Technically correct technical debt.
  • Just put a sleep() call in. Queue portability?
drumm’s picture

We now have zero locked repositories with the new layer of reposync code for GitLab support, and are monitoring for new locked repositories. Since the cutover, we haven’t had any reposync issues logged.

This won’t fix the underlying race condition, but we’ll know it when it happens, and be more-able to recover these. The words I said in #10 still look good.

drumm’s picture

Project: Drupal.org infrastructure » Version Control API
Version: » 7.x-1.x-dev
Component: Git » API module
Issue tags: +affects drupal.org

Hard-code Beanstalkd awareness into versioncontrol.

Going with this, since it is straightforward.

drumm’s picture

Status: Active » Needs review
FileSize
809 bytes

If the queue is Beanstalkd, this adds a configurable delay to the followup attempts. By default 5 seconds for the first attempt, then 10 for the second, etc.

drumm’s picture

This is now deployed on Drupal.org, we’ll see how effective it is over a few weeks.

In general, locked repositories were encountered much less frequently since the move to GitLab:

SELECT count(DISTINCT elid) events, substring(from_unixtime(start), 1, 7) month FROM versioncontrol_sync_log WHERE message LIKE 'Locked repository:%' AND start > unix_timestamp('2018-07-01') GROUP BY month;
+--------+---------+
| events | month   |
+--------+---------+
|     50 | 2018-07 |
|     51 | 2018-08 |
|     51 | 2018-09 |
|     72 | 2018-10 |
|     36 | 2018-11 |
|     56 | 2018-12 |
|     65 | 2019-01 |
|    215 | 2019-02 |
|    135 | 2019-03 |
|     23 | 2019-04 |
|     10 | 2019-05 |
|      9 | 2019-06 |
|     10 | 2019-07 |
|      4 | 2019-08 |
+--------+---------+

I’m doing a full sync on all 55 non-core repositories that hit a sync lock since April as a precaution. This is a bit overkill, since some of these would have synced on one of the 5 retries, it doesn't hurt to sync again.

drumm’s picture

We finally hit a couple syncs that ran into locked repositories, and this is working well.

A branch was updated on a project, which took 20 seconds. In the meantime, 7 seconds later, a tag was pushed too. That hit the lock, was retried 6 seconds later, also hit the lock, and was retried successfully 9 seconds after that.

#3078948: Git tags not available for new release creation hit a hard lock, the syncing crashed out with OOM. The retries were doomed, but they did get spaced out nicely.

  • marvil07 committed 15a9733 on 7.x-1.x authored by drumm
    Issue #2857960 by drumm: Avoid broken locks in commit processing
    
marvil07’s picture

Status: Needs review » Fixed

@drumm, thanks for working on this.
I agree with your suggested solution: it seems like a non-usual case, and beanstalk awareness is the easier way to get it working.
This is now on 7.x-1.x.
We may want to wrap the condition with a module_exists() with the module providing the class.

drumm’s picture

Thanks!

I double checked and … instanceof ClassThatDoesNotExist just returns FALSE and does not throw any notices/etc, so I think this is fine without extra double checking.

marvil07’s picture

@drumm Thanks for checking, it gives peace of mind :-)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.