I just noticed, while following the timings on a site install, that immediately after git_deploy was enabled during a drush site install, the time to enable the next modules jumped from about 25ms to about 4000ms, until the end of the install.

Sure enough, removing git_deploy from my deployment cut the drush site install time from about 2 minutes to 18 seconds. Same if I rename git_deploy_system_info_alter() to disable it.

While not a deal breaker since the install still work normally, it would be good if that function could notice it is being run during a site install and just skip its processing, which is AIUI useless during install anyway.

Members fund testing for the Drupal project. Drupal Association Learn more

Comments

fgm’s picture

More details: this hook turns out to be invoked more than once per module or theme. For instance, on a small site I am currently working on, it is invoked over 700 times when displaying the admin/modules page, apparently 3 times for each module, so some static caching is probably in order anyway.

tobiasb’s picture

fgm’s picture

Status: Closed (duplicate) » Active

Not entirely duplicate, I think.

Your other issue and patch addresses the lack of static caching, but there is also the issue of not invoking it /at all/ during a site-install operation.

eMPee584’s picture

Please try installation with the linked patch applied, then report back.

blueyed’s picture

Are you referring to the patch from #1354756: Prevent git parsing on every other admin page load: implement caching? This does not apply (to the 7.x-2.x branch) (anymore).

blueyed’s picture

Here is a patch implementing naive caching: instead of using a single cache object, it uses a cache object for every module's data (and includes filemtime in the key name / cid).

I do not know about best practice here, but this way Drupal will take care of the garbage collection automatically.

btw: please note that it would be easier to provide patches, if there's a return instead of a level of indentation/block, e.g.

if( $foo ) {
  return;
};
[...meat of code...]

vs.

if( ! $foo ) {
  [...meat of code...]
}

(This can be seen with the patch linked at #1354756: Prevent git parsing on every other admin page load: implement caching).

hswong3i’s picture

Status: Active » Needs review
FileSize
3.77 KB

I give a cleanup of above patch for cache, and reference with other else Drupal core cache_get()/cache_set() coding style.

Also remove the cache lifetime since if we flush all cache, typically this info cache will also cleanup, too ;-)

P.S. Sorry that by git format-patch, I merge patch from #1471178: Increases security by escaping shell arguments to here, too.

hswong3i’s picture

Sorry #7 will not works with git submodule since filemtime() can't get the correct last modify info as we original expected.

As an counter solution I use GIT HEAD hash as unique key by invoke "git rev-parse --verify", which give us a relatively fast response + solid, unique info for generating cache key.

hytse6c’s picture

Status: Needs review » Reviewed & tested by the community

It works! ty

Freso’s picture

Status: Reviewed & tested by the community » Needs work

Please have the patch only patch this one issue.
http://webchick.net/please-stop-eating-baby-kittens

hswong3i’s picture

Status: Needs work » Needs review
FileSize
1.78 KB

Patch updated for this issue only.

hswong3i’s picture

Sync coding style.

hytse6c’s picture

Status: Needs review » Reviewed & tested by the community

Tried. Looks good. ty

hswong3i’s picture

Any else update or information required?

Freso’s picture

If fgm would reply back whether this works or not, then I think it's ready to go in. I just tested it myself on a live install, and it seems to work. I didn't test it during an install though, which is what the original issue was (partly) about.

fgm’s picture

Checking right now.

fgm’s picture

OK, so here are some hard numbers about this "install" part, which is indeed the main topic of this issue. After initial inconclusive quick tests, I ran three series of 10 installs of Drupal 7 core and the "development" install profile (because it contains a larger number of modules than just core, amplifying the problem over just core), with default caching (DB):

  • one with git_deploy commented out in the profile: initial time 18s
  • one on top of the current 7.2 HEAD (d2ca3873f0968bf9f7c1bbc13d394d250de3d881): initial time 37s (33s on an earlier 7.2 version)
  • one with the patch in #12 applied on top of that HEAD: initial time 40s

To summarize, whatever its possible interest for other reasons, the patch does not help at all for this issue.

Run 7.x-2.x HEAD HEAD + patch Without module
1 69 71 22
2 72 69 25
3 72 69 28
4 69 69 26
5 63 70 26
6 63 69 23
7 64 70 24
8 65 71 23
9 63 71 25
10 63 70 24
Average 66,3 69,9 24,6
Median 64,5 70,0 24,5
S. Deviation 3,8 0,9 1,8
Note: Initial runs, not in a series, gave smaller numbers, likely due to disk caching in the OS. See below:
Initial runs 33 40 17
Freso’s picture

Status: Reviewed & tested by the community » Needs work

And back to "needs work" it goes. Thanks for the report, FGM! :)

fgm’s picture

If you wonder why the durations are so very different from the original issue, that's because the latest results have been taken on a SSD while the original ones were on a HD, which has a much greater sensitivity to disk activity.

hswong3i’s picture

Status: Needs work » Needs review

Refer to #17, something may not totally correct: git_deploy not ONLY massively slow down module installation, but also ALL admin/* pages loading, see #1354756: Prevent git parsing on every other admin page load: implement caching for more information.

IMHO, we can't/no way to prevent initial .git scanning for every new module, or else version data may not up-to-date; BTW, we may prevent all other else unnecessary console git query unless folder content do changed.

Some suggestion: what if also compare other else admin/* page loading speed for with/without #12 patch? Refer to my local testbed, the speed can improve from ~30sec down to >1sec, which similar as non-git module management style with tar.gz.

hswong3i’s picture

Issue fork to github as https://github.com/pantarei/drupal-git_deploy/tree/7.x-2.x-1511112

Retouch file with format-patch, and only focus on cache_set().

Lukas von Blarer’s picture

Does this patch need performance testing again?

hswong3i’s picture

For testing (or experience) with the patch effect, please kindly try with my testbed on github. Installation instruction can found from DruStack project page.

I have around 100 git submodule under sites/all/modules/contrib, and already coming with a patched version of git_deploy.

Feel free to replace the patched version of git_deploy with latest drupal.org official version, then we can "feel" the different when browsing admin/* pages, e.g. reduce from >30sec to <1sec ;-)

fearlsgroove’s picture

I can understand the need for more testing, but is there any chance you could push your testing branch into drupal's git repo, rather than a github mirror?

I'm using the patched version on a few projects and it helps considerably with the performance issues. I have yet to see any significant adverse effects.

adixon’s picture

This looks useful, thanks.

I use git deploy on a multi-site install, so it occurs to me that some kind of shared cache for all installs (at least for the remote git repository status information) would make sense, though I'm not sure there's a standard in Drupal to do this.

A simple solution would be a file based cache within the module folder, though it wouldn't fly as a standard Drupal solution, and you'd have to consider the file permissions issues.

Lukas von Blarer’s picture

In my opinion it works as well. What is there remaining to review? Can we mark this as RTBC?

@adixon I guess this is a separate issue. Lets get the basic caching functionality committed.

fgm’s picture

@adixon separate issue indeed, but just for reference there is something like this in Drush Make, which caches downloads globally.

Wim Leers’s picture

+1, I noticed this too. Also for update checking.

Same for the 6.x-2.x-dev branch.

adixon’s picture

Thanks for your notes about shared file caching.

Your D7 file applies almost perfectly to the 6.x-2.x-dev branch, and I added another simple layer underneath of shared file caching in /tmp, by adding two cover functions for cache_get/set, as below. I'll post it as a new issue if anyone else would like to use it.

function _git_deploy_cache_get($cid) {
  $cache_object = cache_get($cid);
  if (empty($cache_object)) {
    // try and get it from the shared file cache at /tmp
    if (!is_dir($tmpdir = '/tmp/git_deploy/')) {
      mkdir($tmpdir);
    }
    $tmpfid = str_replace(array(':','/'),'-',$cid);
    if ($cache_data = unserialize(file_get_contents($tmpdir.$tmpfid))) {
      cache_set($cid,$cache_data);
      $cache_object = cache_get($cid);
    }
  }
  return $cache_object;
}


function _git_deploy_cache_set($cid, $data) {
  cache_set($cid,$data);
  if (!is_dir($tmpdir = '/tmp/git_deploy/')) {
    mkdir($tmpdir);
  }
  $tmpfid = str_replace(array(':','/'),'-',$cid);
  file_put_contents($tmpdir.$tmpfid,serialize($data));
}

Will test the D7 next.

adixon’s picture

Okay, confirmed that D7 improves my modules page load time from about 60 seconds to 15 seconds (on my production machine).

The other admin pages are also much better.

It sure is a nice and simple patch.

My only question is - would it be worth putting these cache entries into their own bin? As it is, those cache entries are going to get pulled out and sit in memory for all your visitors, and they're only useful for a handful of visitors, typically.

Not a big thing, but the other advantage would be that it's own backend caching could be assigned appropriately.

adixon’s picture

For the 6.x branch, I've just posted a new issue #2009094: 6.x-2.x caching with a patch that includes the above code and two other enhancements (shared file cache + separate cache bin). Requires a db_update to create the new cache table.

adixon’s picture

FileSize
4.51 KB

And here's my revised patch for this issue, it includes:

1. An optional shared file cache [as per 29 above].
2. Separated cache bin [as per 30 above].

It's almost identical to the 6.x patch in #2009094: 6.x-2.x caching, just revised for D7. It should be applied to 7.x-2.x.

I've tested it on a new install and an upgrade to an existing one, working nicely on a production machine.

codycraven’s picture

Status: Needs review » Reviewed & tested by the community

I was able to apply #32 to latest 7.x-2.x and page performance is significantly improved, after running update.php.

Note there are a couple end of line whitespace issues in #32.

Lukas von Blarer’s picture

Status: Reviewed & tested by the community » Needs work

Then lets fix these whitespace issues.

Lukas von Blarer’s picture

The patch doesn't apply to latest dev. The README.txt contains changes that have already been committed and the .module file has some other issues. @adixon, can you please re-roll it?

Sweetchuck’s picture

FileSize
6.22 KB

For the latest dev.

Sweetchuck’s picture

FileSize
10.81 KB

This patch:
Move the static cache into hook_system_info_alter().
Add hook_flush_cache().

I do not know what will happened with the outdated files in directory "git_deploy_tmp_dir".
We need to describe this in the README.txt. We can not figure out programmatically what files need to be deleted in the "git_deploy_tmp_dir", because more than one Drupal instance can use this directory.

Possible improvements
I think the cache file format is not optimal.
current: $cache_id = "git_deploy:$directory:$head_hash";
maybe it could be: git_deploy:<project_type>:<project_name>:<head_hash>
Examples:
git_deploy:module:git_deploy:6ee53636697afc3a0b7377a06ffe6ecb744207ad

adixon’s picture

Nice ...

I haven't got any contribution to your cache_id proposal, except that from looking at my own cache files and watching the behaviour, I expect you're right - i.e. the directory shouldn't be in the cache id because I think if the same git deployed module exists in separate directories, we don't need separate caches of the git information.

On the question of the tmp dir - my assumption was that the OS would deal with cleaning up the tmp dir, however that is set up. For my CentOS machine, it gets cleaned up with a daily cron unless you put in an exception. As far as I know, Drupal doesn't clean up its tmp directories in other modules.

The only improvement I added to my own code since my last patch was to make the implementation of the static file cache less of a hassle by providing a simple way to hack the module and provide a single default git_deploy_tmp_dir, rather than have to put it in each sites config file [since typically, you want all installs using your code to share the same git_deploy_tmp_dir]. In other words a little change like the below.

I can roll a patch, but it's getting so complicated I'll probably get it wrong ...

/* modify the following line if you want to use a shared file cache */
define('GIT_DEPLOY_TMP_DIR','');


$tmp_dir = variable_get('git_deploy_tmp_dir', GIT_DEPLOY_TMP_DIR);

And finally - I do get irritating errors from drush, because the default permissions in the tmp directory don't let me create/access the static file cache. That's really just a file permissions error for me to figure out, but if there's something in there I don't really understand, maybe the static file cache should be disabled for drush usage.

Sweetchuck’s picture

FileSize
11.98 KB

I have a site with 150 contrib module. (Git submodule)

The cost of git_deploy_system_info_alter() in millisec with this patch:

  • 26897 - Empty cache.
  • 6500 - File cache.
  • 4000 - DB cache.
  • 30 - Second run in one page request.

@Todo
Better PhpDoc
testing
code review

adixon’s picture

Those look like excellent numbers. I assume the last number is related to how the hook gets run more than once for a single page load [which makes sense given that a module might be altering the system info at various stages]. Since this module's altering isn't going to be affected by any other module's changes, I had been wondering if some static variable cache might improve the second and third invocations of the hook, but 30 milliseconds makes me thing that it's not worth it.

Conclusion: thanks for those numbers.

Sweetchuck’s picture

@adixon: Yes, it is strange why run more than once the _system_rebuild_module_data(), but some contrib module trigger the _system_rebuild_module_data() by itself.
In most cases this happening on ?q=admin/modules page.
In my test environment the trigger functions are:

  1. update_get_projects()
  2. system_modules()
  3. xmlsitemap_help() -> _xmlsitemap_get_blurb() -> _xmlsitemap_get_version()
Sweetchuck’s picture

Status: Needs work » Needs review
FileSize
12.56 KB

Small modifications.

Sweetchuck’s picture

FileSize
13.29 KB

Changes:

  • In the static cache ($additions) the key changed to $git_dir, because more than one module can live in one Git repository. See the Panels, OG or Commerce.
    In this case to read the versioning informations from Git history will give us same result all the time. This is faster. If a repository contains more than one modules then we need to read the version information only once.
    Side effect: Only one cache entry is written in the local cache and the shared file cache. If the git_deploy_system_info_alter() called with different order than previously then the cache is not usable.
  • Better conditions in _git_deploy_file_cache_write().
Sweetchuck’s picture

FileSize
14.7 KB

Changes:
The format of the shared file cache changed to Windows INI like file format, because WebServer writable files can not be trusted and sending untrusted data unserialize() is dangerous.

If this issue will be fixed then we do not need to use the file cache any more. #467226: Support for per-bin key prefix

Lukas von Blarer’s picture

I confirm that the patch in #44 works. I have a performance increase from 41718 ms to 12893 ms.

What is missing to get this committed?

Sweetchuck’s picture

The dependency cannot be resolved if the *.info file contains a line like this:
dependencies[] = libraries (>=2.0)
Try the git_deploy together with colorbox and flexslider.

Lukas von Blarer’s picture

There is a different issue for this: #1905736: Version numbers not exposed to update.php I tried to merge them without success. The two patches change code at the same place...

Sweetchuck’s picture

FileSize
15.19 KB

Add hook_enable() to move git_deploy module to beginning of the module list.

OnkelTem’s picture

My two cents.
Time measurements of loading admin/modules page:

• Latest dev (08.11.2013), 5 runs:

∘ 10289
∘ 12337
∘ 10135
∘ 10573
∘ 11542
avg: 10975

• With git_deploy-1511112_5.patch, 5 runs:

∘ 5290
∘ 5954
∘ 5636
∘ 5104
∘ 4002
avg: 5197

• Without git_deploy, 5 runs:

∘ 1662
∘ 1549
∘ 1492
∘ 1371
∘ 1450
avg: 1504

Conclusions:

  • Patched (#48) version is 3.46 (346%) times slower, then clean Drupal (w/o git_deploy)
  • Patched (#48) version is 2.11 (211%) times faster, then unpatched git_deploy
  • I would recommend just disabling Git Deploy module on production sites or while active developing. Enable it for updates/localizations updates.
donquixote’s picture

Issue summary: View changes

The current patches, and the existing module, miss out on a much simpler optimization:

Currently a module repository that contains more than one module will be scanned multiple times, for no reason. This can be avoided by introducing a static variable cache that caches per .git directory, not per module.

The same can then be done with the persistent cache.

Attached is a patch (with git format-patch) which
- refactors and splits up git_deploy_system_info_alter().
- uses dirname() instead of the substr() logic, to go to the parent directory.
- introduces a static cache per .git directory.
- introduces the escapeshellargs() check, which I suppose is needed.

It does NOT introduce a persistent cache like the patches above. This can be done later.
No interdiff, it would be pointless.

The splitting-up / refactoring allows to test and benchmark the discovery logic without the cache.

I developed this change before I found this issue, which explains why it is so different.

donquixote’s picture

donquixote’s picture

Review of #48:

+/**
+ * @file
+ * Install, update and uninstall functions for the Git deploy module.
+ */

Is this still needed nowadays?

 *[-*-] This module add versioning information to projects checked out of [-git-]{+Git+}.

Really?

+  if (!array_key_exists($git_dir, $additions)) {

Oh, so the static cache actually uses $git_dir as a key. But the persistent cache uses $file->name.

donquixote’s picture

Here is a combined patch (with git format-patch to distinguish the steps) which combines my changes in #50 with the changes from #48 by Sweetchuck.
So it introduces the same persistent cache.

I am not really convinced this is useful, because this cache is wiped on drush cc all. My own goal was to speed up clearing caches, so this does not help. But the static variable *does* help.

Maybe we can simplify the git commands?

donquixote’s picture