Create a vc_git-compliant update hook that will not OOM when trying to update all repository default branch info [#1653262]

Comment	File	Size	Author
#6	git-read-branches.sh_.txt	430 bytes	damien tournoud
#6	git-branch-data.txt.gz	199.48 KB	damien tournoud

Comment #1

dww commented 23 June 2012 at 06:15

Sounds like this might be useful in the future, so if we're building it from scratch, I'd be in favor of building it such that it could be reused if possible. However, if that adds extra complication and delay, please ignore me. ;)

Thanks,
-Derek


Comment #2

damien tournoud commented 23 June 2012 at 08:19

Are we talking about reading and writing to the actual Git repositories or to the Drupal objects representing them? If the former, what is required?


Comment #3

eliza411 commented 23 June 2012 at 18:38

Issue tags:	+git deployment blocker

Tagging


Comment #4

sdboyer commented 24 June 2012 at 13:59

@dww - yes, it could and will be useful in the future. for sure. unfortunately figuring out a generic pattern for communicating with them is a little tricky, so we're just gonna one-off it here.

@DamZ - direct repository read, and database write. from the update hook, we need to enqueue one job for each of the 19000+ repositories. that job reads the current symbolic ref in HEAD from disk, then writes it to the db. preferably, we would also verify that the master branch still exists, as it does not for a number of repositories, then pick another of the available branches (it wouldn't be hard to come up with a rough canonicality metric), and set that as the default if master has been deleted, THEN update the db accordingly.

at least, that's what we have to do to be complete about it. if we were to skip the latter step, it would be pretty easy to just grep all .git/HEAD files for anything which is not set to master (only a scant few are, core included), write 'master' to the db for the rest, then just let people sort out re-setting their own default branches accordingly. i would prefer not to do that, though, since we'd be knowingly creating an inconsistent data state that the application can't normally create.

ultimately the concern is that simply loading 19000 fully-classed repository objects, then running operations directly on them within the update function, could run OOM. popping up the memory limit might be enough of an answer to that question for us to do it sloppily without enqueueing, but we'll need to figure out just what the memory cost is. and to dww's original point, i don't want to have to artificially increase the memory limit every time we run an operation against all repos.
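
A minimal sketch of the per-repository read described above, not the actual update hook. It assumes bare repositories that store branches as loose files under `refs/heads/` (no `packed-refs`); the function name `read_default_branch` is hypothetical.

```python
import os


def read_default_branch(repo_dir):
    """Return (branch_name, exists) for a bare repository directory.

    Assumption: loose refs only, so a branch exists exactly when
    refs/heads/<name> is a regular file inside the repository.
    """
    with open(os.path.join(repo_dir, "HEAD")) as fh:
        head = fh.read().strip()

    prefix = "ref: refs/heads/"
    if not head.startswith(prefix):
        # HEAD holds something other than a branch ref (e.g. a raw
        # commit hash), so there is no symbolic default branch.
        return None, False

    branch = head[len(prefix):]
    exists = os.path.isfile(os.path.join(repo_dir, "refs", "heads", branch))
    return branch, exists
```

A job would call this once per repository and write the result to the database only after deciding what to do when `exists` is false.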


Comment #5

damien tournoud commented 24 June 2012 at 15:03

I assume this could be done using a standard multi-step update function. But it would be even better just to load this metadata in one go from the filesystem and save it somewhere so that we can run the update path over and over again without needing to access the actual repositories.


Comment #6

damien tournoud commented 24 June 2012 at 15:25

Status	File	Size
new	git-branch-data.txt.gz	199.48 KB
new	git-read-branches.sh_.txt	430 bytes

I took five minutes to write a small script that does that. Here is the script and the raw HEAD+branches data from our repositories.


Comment #7

sdboyer commented 24 June 2012 at 22:51

oh cool, that helps. handy that we don't use packed refs (yet), so we can still get away with just reading what's under refs/. we don't need the full branch list, only whether or not the current default branch exists, as vcapi already has all that info, and can also access the info on whether or not a branch has a release associated with it. we need that info to make the smartest decision about what to set as the new default branch. my preferred criteria would be, in decreasing importance, a) that the branch has a release, b) that the branch is for D7, and c) that the branch has the highest available major version number. we've got a nontrivial number of repos to do this for:

grep '^project.*master;false' git-branch-data.txt | wc -l
1493

just about as many sandboxes to do, though the first criterion doesn't apply there.

grep '^sandbox.*master;false' git-branch-data.txt | wc -l
1757

we'll still need to enqueue jobs for all those repositories which need to have their default branch updated, but since we'll have a full list of all the repos to hit, we'll be able to single-load them all and enqueue the jobs, so no risk of memory explosion.
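
As a rough illustration of how the dump could be turned into that list, here is a sketch that sorts repositories into buckets. The line format is an assumption inferred only from the grep patterns in this comment (`path;head_branch;exists`, with any further fields ignored); the real layout of `git-branch-data.txt` may differ, and `classify` is a hypothetical name.

```python
def classify(lines):
    """Split repositories into buckets by the state of their default branch.

    Assumed line format (not confirmed by the thread):
        <repo path>;<branch HEAD points to>;<true|false>[;...]
    """
    buckets = {"ok": [], "needs_new_default": [], "manual": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, head, exists = line.split(";")[:3]
        if exists == "true" and head == "master":
            # Default is master and master still exists: nothing to fix.
            buckets["ok"].append(path)
        elif exists == "true":
            # HEAD points at an existing non-master branch.
            buckets["manual"].append(path)
        else:
            # HEAD points at a branch that is gone: enqueue a job.
            buckets["needs_new_default"].append(path)
    return buckets
```

Only the paths in `needs_new_default` would need a queued job; the other two buckets can be written to the database directly.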

the list of repos which have non-master branches as the default is much smaller, and we can just manually map those.

grep ';true' git-branch-data.txt | grep -v master | wc -l
9
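
The selection criteria listed at the top of this comment can be expressed as a single sort key. This is a sketch under assumptions: branch names follow the contrib pattern `7.x-2.x`, release information is already available as a boolean, and the `Branch` class and `pick_default` function are hypothetical stand-ins rather than vcapi objects.

```python
import re
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    has_release: bool


def major_version(name):
    """Return the major version from names like '7.x-2.x', else -1."""
    m = re.match(r"^\d+\.x-(\d+)\.x$", name)
    return int(m.group(1)) if m else -1


def pick_default(branches):
    """Pick a new default branch, or None if there are no candidates.

    Priority, in decreasing importance:
      a) the branch has a release,
      b) the branch is for D7,
      c) the branch has the highest major version number.
    """
    if not branches:
        return None
    return max(
        branches,
        key=lambda b: (
            b.has_release,
            b.name.startswith("7.x-"),
            major_version(b.name),
        ),
    )
```

Because tuples compare element by element, a released D6 branch still beats an unreleased D7 branch, which matches the stated ordering. For sandboxes, `has_release` would simply be false for every candidate.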


Comment #8

sdboyer commented 25 June 2012 at 00:31

Status:	Active	» Needs work

hard part's done though, thanks Damien - i can take care of the update hook.


Comment #9

sdboyer commented 25 June 2012 at 00:43

heh, turns out the mem usage isn't actually that bad - loading all 19k repos at once eats about 90MB. oh well :) then again, that's just loading the repos, not loading all of their branches or anything. doesn't obviate the need for a better strategy on this in the long term, either. some sort of multi-part or batch strategy.
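
One possible shape for such a batch strategy, sketched generically: only one batch of repositories is held in memory at a time. The `load` and `handle` callables and the batch size of 500 are placeholders, not part of versioncontrol's API.

```python
from itertools import islice


def in_batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_all(repo_ids, load, handle, size=500):
    """Load and handle repositories one batch at a time.

    `load` takes a list of ids and returns the loaded repositories;
    `handle` is called once per repository. Returns the number handled.
    """
    done = 0
    for batch in in_batches(repo_ids, size):
        for repo in load(batch):
            handle(repo)
            done += 1
    return done
```

Peak memory then scales with the batch size rather than with the total number of repositories.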


Comment #10

sdboyer commented 25 June 2012 at 16:36

Title:	Create a vc_git-compliant update hook that will not OOM when trying to update all repositories	» Create a vc_git-compliant update hook that will not OOM when trying to update all repository default branch info
Status:	Needs work	» Postponed

commushed, with a hardcoded path to a local version of that file. once we're ready to deploy, i'll regenerate the file so that it's as up to date as possible then ensure it's in place.

marking postponed so that we don't forget to do that on deploy.


Comment #11

damien tournoud commented 25 June 2012 at 21:33

Did you mean to commit the data file too?


Comment #12

sdboyer commented 26 June 2012 at 00:54

nope, intentionally left it out. i'm going to generate it just before we start and drop it in the repo then.


Comment #13

sdboyer commented 26 June 2012 at 00:59

meh, actually, no real harm putting it in now. commushed the script and a current data dump in there. we can remove them from the repo after this update is done.


Comment #14

23 May 2014 at 18:23

Commit 53e8008 on vcapi-deps, 7.x-3.x, 1548064-support-new-apachesolr, 7.x-3.x-dev by sdboyer:
```
Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...
```
Commit 6b98219 on vcapi-deps, 7.x-3.x, 1548064-support-new-apachesolr, 7.x-3.x-dev by sdboyer:
```
Issue #1653262: Add the branch data and data-generating script.
```


Comment #15

9 July 2014 at 20:13

sdboyer committed 53e8008 on 2299191-beta_project_issue_project_searchapi

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on 2299191-beta_project_issue_project_searchapi
```
Issue #1653262: Add the branch data and data-generating script.
```


Comment #16

28 September 2014 at 14:27

sdboyer committed 53e8008 on 2322267-migrate-country-field

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on 2322267-migrate-country-field

Issue #1653262: Add the branch data and data-generating script.


Comment #17

29 September 2014 at 09:55

sdboyer committed 53e8008 on 2322267-migrate-gender-field

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on 2322267-migrate-gender-field

Issue #1653262: Add the branch data and data-generating script.


Comment #18

1 October 2014 at 19:41

sdboyer committed 53e8008 on 2348121-missing-bio-information

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on 2348121-missing-bio-information

Issue #1653262: Add the branch data and data-generating script.


Comment #19

7 October 2014 at 11:35

sdboyer committed 53e8008 on 2350591-not-spammer-role

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on 2350591-not-spammer-role

Issue #1653262: Add the branch data and data-generating script.


Comment #20

7 October 2014 at 14:15

sdboyer committed 53e8008 on 2322267-bakery-sync-country

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on 2322267-bakery-sync-country

Issue #1653262: Add the branch data and data-generating script.


Comment #21

21 October 2014 at 12:52

sdboyer committed 53e8008 on random-supporter-logos

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on random-supporter-logos

Issue #1653262: Add the branch data and data-generating script.


Comment #22

21 October 2014 at 15:35

sdboyer committed 53e8008 on hosting-type-field

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on hosting-type-field

Issue #1653262: Add the branch data and data-generating script.


Comment #23

23 October 2014 at 13:23

sdboyer committed 53e8008 on filter-partners-by-sector

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on filter-partners-by-sector

Issue #1653262: Add the branch data and data-generating script.


Comment #24

24 October 2014 at 17:12

sdboyer committed 53e8008 on restrict-commit-issue-notifications

Issue #1653262 by sdboyer, Damien Tournoud: Create a vc_git-compliant...

sdboyer committed 6b98219 on restrict-commit-issue-notifications

Issue #1653262: Add the branch data and data-generating script.


Comment #25

drumm commented 17 April 2015 at 18:13

Version:	6.x-3.x-dev	» 7.x-3.x-dev
Issue summary:	View changes


Comment #26

drumm commented 5 June 2019 at 23:05

Status:	Postponed	» Fixed

Looks like the git-read-branches.sh script got us through the Git migration.

(The GitLab migration was done with a custom table tracking repositories with un-imported changes, and tricking the versioncontrol repomgr queue into running GitLab project imports.)


Comment #27

6 June 2019 at 13:13

drumm committed e5e3263 on 7.x-3.x

Issue #1653262: Remove git branch data and script


Comment #28

20 June 2019 at 13:14

Status:	Fixed	» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.
