Big repositories usually hit the memory limit during synchronization, so let's see whether memory consumption can be improved here.


Comments

marvil07’s picture

Assigned: Unassigned » marvil07

Status: Active » Needs review
FileSize
1.65 KB

I'm testing this patch on git7site by running a full sync via drush against the Drupal core repository, which is really big by our standards. Hopefully it's good enough.

marvil07’s picture

Status: Needs review » Needs work

Sadly, it is not enough yet:

$ time drush vc-sync 2
Beginning synchronization of repository drupal                                                                                                               [ok]
exec(): Unable to fork [git show --numstat --summary --pretty=format:"%H%n%P%n%an%n%ae%n%at%n%cn%n%ce%n%ct%n%B%nENDOFOUTPUTGITMESSAGEHERE"                   [warning]
'2ecc273c60f3b625fe3b44005813e73eba5ea57a'] VersioncontrolGitRepository.php:413
PHP Fatal error:  Allowed memory size of 838860800 bytes exhausted (tried to allocate 236978129 bytes) in /var/www/git-dev.drupal.org/htdocs/sites/all/modules/versioncontrol_git/includes/plugins/reposync/VersioncontrolGitRepositoryHistorySynchronizerDefault.class.php on line 181
Drush command terminated abnormally due to an unrecoverable error.                                                                                           [error]
Error: Allowed memory size of 838860800 bytes exhausted (tried to allocate 236978129 bytes) in
/var/www/git-dev.drupal.org/htdocs/sites/all/modules/versioncontrol_git/includes/plugins/reposync/VersioncontrolGitRepositoryHistorySynchronizerDefault.class.php,
line 181
The external command could not be executed due to an application error.                                                                                      [error]

real    423m15.116s
user    165m31.480s
sys     110m7.890s

  • Commit 121af53 on 7.x-1.x by marvil07:
    Issue #2226443: Direct query for current commits on full repository sync...
  • Commit 1f2e17a on 7.x-1.x by marvil07:
    Issue #2226443: Support empty-message annotated tags.
    
  • Commit d4e0507 on 7.x-1.x by marvil07:
    Issue #2226443: Pass repo_id instead of the full repository on...
marvil07’s picture

Status: Needs work » Active
FileSize
1.84 KB

Just an update about the last additions:

  • I have added the patch from comment 2, which deals with the main problem: loading too many operation objects at the same time takes a lot of memory.
  • I discovered a logic bug in an edge case (an annotated tag with no message, i.e. just a space) that caused an endless loop ending in memory exhaustion. That is fixed too.
  • During repository synchronization the default branch can be updated, and in that case a full repository object was passed to be stored in the queue. That can be avoided by passing just the repo_id, which I have now done.
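
The first point above follows a general pattern: instead of loading every operation object at once, walk the IDs and load a bounded batch at a time. A minimal sketch in Python (not the module's PHP code); `load_operations`, `process`, and the batch size are hypothetical stand-ins, not names from versioncontrol_git:

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_in_batches(
    ids: Iterable[int],
    load_operations: Callable[[List[int]], list],  # hypothetical loader
    process: Callable[[object], None],             # hypothetical handler
    size: int = 100,
) -> int:
    """Load and process operations one batch at a time.

    Only one batch of loaded objects is referenced at any moment, so peak
    memory depends on `size` rather than on the total number of operations.
    """
    handled = 0
    for batch in batched(ids, size):
        operations = load_operations(batch)
        for operation in operations:
            process(operation)
            handled += 1
        # `operations` is rebound on the next pass, releasing this batch.
    return handled
```

The same reasoning applies to the queue item in the third point: storing only an identifier and reloading the object when the item is processed keeps the serialized payload small.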

I could finally sync core. Note that it sometimes needs to be run several times to get it right, but memory usage does seem to be better.

Sadly, the core sandboxes have lots of commits still pending to be read, which means a lot of exec calls and a lot of time.
Maybe it is working now, but I cannot tell yet, since git7site is being re-initialized daily.
I will try again with imp, which is way behind (last time it failed after ~14 hours because of re-synchronization), and see how it goes.
Hopefully there is still something else I can do to improve performance, but for now the exec calls take most of the run time.

Maybe it is time to implement #973890: Create a reposync plugin using libgit2 php binding (use a PHP binding of a C library to access git data) or #1019976: Improve performance of log parser (use git-fast-export to avoid most git calls on the initial synchronization, and force flushing on big repositories or on those with a lot of data pending sync, e.g. imp)?
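
To illustrate why fewer git calls help, here is a rough Python sketch of a related but simpler technique than either issue proposes: a single `git log` process whose output is parsed as a stream, instead of one `git show` exec per commit. It is neither git-fast-export nor libgit2, and it is not the module's code. It reuses the end-of-record marker seen in the failing command above, leaves out `--numstat` and `--summary`, and assumes that no commit message contains the marker on a line of its own. The function names are made up for the example.

```python
import subprocess
from typing import Dict, Iterable, Iterator

SENTINEL = "ENDOFOUTPUTGITMESSAGEHERE"  # same marker as in the log above
# Hash, author name, author timestamp, raw message, then the marker.
LOG_FORMAT = "%H%n%an%n%at%n%B%n" + SENTINEL


def parse_log(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Turn a stream of `git log` output lines into commit records."""
    record = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line != SENTINEL:
            # Ignore blank lines that appear before a record starts.
            if record or line:
                record.append(line)
            continue
        if len(record) >= 3:
            yield {
                "hash": record[0],
                "author": record[1],
                "timestamp": int(record[2]),
                # An empty message simply yields an empty string.
                "message": "\n".join(record[3:]).strip(),
            }
        record = []


def iter_commits(repo_path: str, revision_range: str = "HEAD"):
    """Stream commits from one git process (hypothetical helper)."""
    cmd = ["git", "-C", repo_path, "log",
           "--pretty=format:" + LOG_FORMAT, revision_range]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True,
                          encoding="utf-8", errors="replace") as proc:
        yield from parse_log(proc.stdout)
```

Because `parse_log` is a generator over lines, only one commit record is held in memory at a time, and the cost of forking is paid once per run rather than once per commit.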

marvil07’s picture

Status: Active » Fixed

Let's call this fixed for now.

What I have tried on git7site:

  • Core syncs fast (little pending git data to sync):
    $ time drush vc-sync --nobatch 2
    Beginning synchronization of repository drupal                                                                                                               [ok]
    Successfully synchronized repository drupal                                                                                                                  [success]
    
    real    2m43.052s
    user    1m57.570s
    sys     0m10.280s
    
  • The WSCCI core sandbox is syncing OK too (a fair amount of git data to sync, but not too much):
    $ time drush vc-sync --nobatch --ignorelock 26466
    Beginning synchronization of repository 1260830                                                                                                              [ok]
    Successfully synchronized repository 1260830                                                                                                                 [success]
    
    real    392m45.565s
    user    299m56.700s
    sys     26m4.000s
    

So the IMP sandbox, for example, will probably take a long time, but I guess it will eventually succeed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.