Currently, Drush Make implements its own methods for downloading. Drush core provides some caching ability when downloading .tar.gz files from d.o., and makes use of git references to make repeated git cloning lightning fast. Drush Make would benefit from using these core methods when fetching files or repos from drupal.org.


Comments

jhedstrom’s picture

Project: Drush Make » Drush
Version: 6.x-3.x-dev »
Component: Code » Make
Category: feature » task

Moving to Drush issue queue.

moshe weitzman’s picture

This is the next step for make? Feel free to ask questions here or in IRC when implementing this.

jhedstrom’s picture

Now that all the tests are passing (aside from the d.o-specific plugins, which still need porting), I think this is indeed the next step. The direction jonhattan outlined here seems like a reasonable approach that would basically resolve this issue:

1. Drush's new release_info engine can replace drush_make_updatexml($project):

drush_make.drush.inc:function drush_make_updatexml($project) {
drush_make.drush.inc:function drush_make_update_xml_download($project) {
drush_make.drush.inc:function _drush_make_update_xml_best_version($major, $releases) {
drush_make.generate.inc:function _drush_generate_makefile_check_updatexml($name, $type, $status_url = '') {

2. Drush_make's download classes can be the foundation of Drush's new package fetcher engine:

drush_make.project.inc:class DrushMakeProject {
drush_make.project.inc:class DrushMakeProject_Core extends DrushMakeProject {
drush_make.project.inc:class DrushMakeProject_Library extends DrushMakeProject {
drush_make.project.inc:class DrushMakeProject_Module extends DrushMakeProject {
drush_make.project.inc:class DrushMakeProject_Profile extends DrushMakeProject {
drush_make.project.inc:class DrushMakeProject_Theme extends DrushMakeProject {
drush_make.project.inc:class DrushMakeProject_Translation extends DrushMakeProject {
jhedstrom’s picture

Status: Active » Needs review
FileSize
1.68 KB

Here is a very (very) rough patch. There are 4 failing tests, and many TODOs that I could use feedback on.

jhedstrom’s picture

FileSize
1.72 KB

Here's a patch that works a bit better (down to 1 failing test), still needs feedback on TODO items though.

jhedstrom’s picture

Also note, this patch only applies to normal downloads, it doesn't use pm-download for git clones yet.

moshe weitzman’s picture

Well, that looks nice and simple. Kudos.

I think we should use drush_invoke() here. You are probably best off just using drush_set_option() and drush_unset_option() to get rid of options which are no longer needed by the make command and are interfering with the call to pm-download. Alternatively, you could use drush_invoke_process() to completely isolate the pm-download call (similar to your current exec() call), but that shells out for every download, which is a small performance hit. Maybe we don't care. We can't use one call to pm-download with multiple projects since each project can have its own --download dir. Or we let --destination be a comma-separated list?
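To make the trade-off concrete, here is a rough sketch of the two approaches (the --no-core option and the variable names are illustrative only, not taken from any actual patch):

```php
// Option A: in-process drush_invoke(), juggling options around the call.
// Any option that interferes with pm-download must be cleared and restored.
$saved = drush_get_option('no-core');
drush_unset_option('no-core');
drush_invoke('pm-download', array($project_name));
drush_set_option('no-core', $saved);

// Option B: isolate pm-download in its own process. No option juggling,
// at the cost of spawning one subprocess per project.
drush_invoke_process('@none', 'pm-download', array($project_name), array(
  'destination' => $download_location,
));
```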

Maybe Greg has an opinion about how to proceed here.

moshe weitzman’s picture

What do we want to do if one of the downloads fails? Should make fail, and thus trigger a rollback? Seems like folks would be annoyed by that.

greg.1.anderson’s picture

My inclination would be to use drush_invoke_process. I'm wary of doing too much in the way of drush_set_option and drush_unset_option in complex environments like Drush make. drush_invoke_process may have a performance hit, but we can write the code that way without thinking too hard. Using drush_invoke, if you needed to call drush_unset_option on --no-core (for example, as your comment mentions) prior to calling pm-download, then you'd have to restore it to its previous value after pm-download completes. The best way to do that is with drush_invoke_process.

If performance is really an issue, then perhaps we could do a drush_invoke_process('make-download-all', ...), and pass in complex args and options that allow our subcommand to call drush_invoke with different --destination values for each project. We'd need something better than a comma-separated list of destinations, though; I went down the path of "aligned independent lists" of related variables once in Drush, and I could never feel good about that code and was glad to back it out. I guess we could json-encode an array and pass the encoded value as an option (cough). I tend to think that drush_invoke_process is going to be close enough performance-wise without getting into this sort of complexity.

jhedstrom’s picture

So, make already fails entirely if one part fails, and while that can be annoying, in the long run, it's much better than a partially built site. I'd advocate that we keep it that way.

Shelling out with drush_invoke_process, while it might be a performance hit on processing, would allow for concurrent downloads (I'm not sure whether that would break make as it stands now, though, since, for instance, patches would attempt to apply before the download had actually finished).

dww’s picture

+1 to fail and rollback if anything in the .make file can't be processed. Let's not change those semantics. Let's just change the implementation details so we can start to shed duplicate code between make and pm-download.

Thanks,
-Derek

moshe weitzman’s picture

I tend to think that drush_invoke_process is going to be close enough performance-wise without getting into this sort of complexity.

I agree, let's use drush_invoke_process(). But perhaps we should add a little complexity. Namely, Mark is ready to land a concurrency patch, specifically adding concurrency to pm-download when called with multiple projects. Make should ideally just benefit from that. We should add support for comma-separated values in --destination, or, if that's undesirable, add a new hidden option --project-destination-map which takes a proper array of projects and destinations. --project-destination-map can already be populated by drush_invoke_process(). With this done, make can just call drush_invoke_process() once with all the projects that need fetching via the same method. If there are dependencies (i.e. a patch needs to get applied), then make has to reorder operations a bit.
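A rough sketch of how make might populate that hypothetical option (--project-destination-map does not exist in Drush; the shape of $projects here is assumed):

```php
// Build one pm-download invocation covering every project, with a
// hypothetical --project-destination-map carrying per-project destinations.
$map = array();
foreach ($projects as $project) {
  $map[$project['name']] = $project['destination'];
}
drush_invoke_process('@self', 'pm-download', array_keys($map), array(
  'project-destination-map' => $map,
  'package-handler' => 'wget',
  'yes' => NULL,
));
```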

One hitch with the above plan is that I'm not sure that the concurrency patch can (or should) handle the use case where different option values need to be sent on each thread.

Don't we need to specify the wget package handler in drush_invoke_process()? If we don't, a site may be configured to use git clones by default for pm-download. This conflict might be why jonhattan advocated for a deeper refactor in #3.2

jhedstrom’s picture

drush_invoke_process isn't going to work without a larger refactoring (patches, for example, can't be applied to projects that are being downloaded concurrent to the make build execution).

Still trying to use drush_invoke, is there an easy way to unset all options, or hide them in the event that make still needs them? (Would a custom drush_set_context be appropriate here?)

greg.1.anderson’s picture

drush_invoke_process isn't going to work without a larger refactoring (patches, for example, can't be applied to projects that are being downloaded concurrent to the make build execution).

Is this really the case? Even with Mark's concurrency patch, drush_invoke_process should not return to the caller until every part of the invoke is complete. It is not an asynchronous call, it just runs in a new process. It does not return control to the caller any sooner than drush_invoke does.

Was that your concern in #13, or is there some other problem that I don't see?

I am on-board with #12 (just use --project-destination-map rather than a list in --destination). This should work fine, as long as you specify all relevant options (e.g. the package handler).

Of course, if pm-download were refactored so there was one call that took all parameters as function arguments, then you could just call that and not worry about drush_invoke vs. drush_invoke_process. That would be the cleanest solution, although you'd lose the concurrency feature. Concurrency might be nice to have.

dww’s picture

I think the problem with concurrency is that you might download the patch and try to apply it before you finish downloading the tarball of the thing you're trying to patch.

So, why not separate this into phases, each of which must happen in serial, but any subtasks per phase can happen in parallel. Namely:

1) download and unpack stuff
2) apply the patches you've already downloaded
3) ? (Anything else? Not sure what it'd be).

Probably that's going to require too much refactoring to happen at this point, but I really haven't looked at the internals enough to know.

Short of that, I think separate processes are fine if that's the easiest way to write the code. Honestly, separate processes for each project would be fine by me. The additional overhead of fork() & exec() are going to be *dwarfed* by all the I/O, even if we are hitting the download cache (although in many cases it'll be network I/O which will be even slower). It's pointless to try to shave a few milliseconds on an operation that takes 15 seconds to complete. Better to optimize for readable, maintainable code.

dww’s picture

PS: In case it's not clear from my comment, concurrency is worth complicating the code for, since doing 20 things in parallel that all take 15 seconds is a huge win. That's the only optimization worth worrying about. Of course, the I/O is probably still going to be the bottleneck, and it's not all going to complete in 15 seconds total, but that's the OS's problem, not ours. Certainly better to get all the I/O requests started ASAP so the OS can do its job instead of trickling them out in serial.

greg.1.anderson’s picture

#15 is not necessary. --project-destination-map is not necessary. Here is the best flow: refactor drush_invoke_process into two entrypoints: drush_invoke_process (same params and behavior) and drush_invoke_concurrent. It looks like this:

function drush_invoke_process($site_alias_record, $command_name, $commandline_args = array(), $commandline_options = array(), $backend_options = FALSE) {
  $invocations[] = array(
    'site' => $site_alias_record,
    'command' => $command_name,
    'args' => $commandline_args,
    'options' => $commandline_options,
    'backend-options' => $backend_options);
   return drush_invoke_concurrent($invocations);
}

function drush_invoke_concurrent($invocations, $common_options = array(), $common_backend_options = array(), $default_command = NULL, $default_site = NULL) {
  // Call through to Mark's backend_invoke here
}

Mark's concurrency patch already splits out options & c. like this before generating a command, so this shouldn't be too hard to fit together. The one minor complication is that we need to get rid of the deprecated drush_do_site_command from this call path (without breaking anything seriously. :> ).

To call drush_invoke_concurrent from Drush make, we just need to make sure that each project goes into its own item in $invocations[], paired with its own --destination option (and --package-handler & c., as needed). Options common to all calls could go into $common_options, and 'pm-download' could be put in as the default command.

As far as patches are concerned, the best way to handle that would be to add a --patches option that takes a list of patches to apply when the download is finished. Then each download + patch(es) operation is done serially, but collectively each project + patch set is executed concurrently.
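Sketched out, the make side of this might look as follows (drush_invoke_concurrent() is the signature proposed above; --patches is the hypothetical option just described):

```php
$invocations = array();
foreach ($projects as $project) {
  $invocations[] = array(
    'args' => array($project['name']),
    'options' => array(
      'destination' => $project['destination'],
      // Hypothetical option: patches applied serially after each download.
      'patches' => $project['patches'],
    ),
  );
}
// Options shared by every invocation; 'pm-download' as the default command.
$common_options = array('package-handler' => 'wget', 'yes' => NULL);
drush_invoke_concurrent($invocations, $common_options, array(), 'pm-download', '@none');
```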

This, I think, is the most clear and flexible implementation. I'm happy to help hook this up to Mark's concurrent backend invoke once that patch lands.

dww’s picture

#17 sounds good. The one incredibly minor downside there is that everything for a single project (the tarball itself and any patches) all have to be downloaded in serial. With something like #15, everything (tarballs, patches, whatever) can be downloaded in parallel. But, that's not worth complicating the code for, and #17 definitely seems likely to be simpler to implement and maintain. Apologies for commenting without having studied all the code, I'm just talking in the abstract based on first principles, not a detailed knowledge of drush (or even drush make) internals. ;)

moshe weitzman’s picture

drush_invoke_concurrent() is brilliant. nice one.

an alternative to consider is for make to do all the downloads in a call to drush_invoke_concurrent() and then apply patches in subsequent code that is make-specific.

jhedstrom’s picture

Status: Needs review » Needs work
FileSize
2.3 KB

Sorry for the confusion on drush_invoke_process(). I'd been confusing it with one of the functions that was asynchronous (and when testing, I'd been passing in @self, which caused failures).

This patch properly uses drush_invoke_process(), adds an option to disable the use of the cache (not sure if people would need that or not), and sets the DRUSH_AFFIRMATIVE context to bypass any prompts from pm-download.

There is one failing test (when it tries to download Drupal 6.17, the download succeeds, but is never moved into place)--I'm going to keep working on that for now. Once that's working, I'll start in on the larger refactoring of make so that pm-download can be called once with all projects.

dww’s picture

@moshe: Your alternative in #19 is exactly what I proposed in #15. The only benefit over #17 is the minor point I made in #18, but that's hardly a reason to make things more complicated and do a more major refactoring than is already underway with these two patches.

@jhedstrom: Yay, thanks for making actual progress here. ;) Sadly, today is a disaster for me, so I can't really carefully review and test, but I'll look forward to when you set this back to "needs review" and I hope I can be useful at that point.

jhedstrom’s picture

The issue with core using pm-download is this:

Previously, make would download core into a working directory, and add contrib projects directly into that directory, resulting in something like this:

  ./
   |->index.php
   |->...
   |->sites
         |->all
               |->modules
                    |->...

now, with pm-download, the result is this:

  ./
   |->drupal-6.17
   |->sites
       |->...

I've been playing with the --drupal-project-rename parameter, but that doesn't seem to get things building in the proper directories.

Similarly, running the raw pm-download command:

 drush pm-download --destination=.  drupal-6.17 views ctools

sticks the 3 projects into the same directory.

greg.1.anderson’s picture

I think that the drupal dl must be handled separately from the other modules. If you use --drupal-project-rename to create the working directory, thereafter you could just bootstrap to your working directory, and pm-download should select sites/all/modules automatically (if you don't add --destination).

jhedstrom’s picture

FileSize
2.97 KB

That makes sense. This patch fixes the core download issue, so all make tests are now passing. Leaving at needs work so we can do the all-in-one pm-download for contrib.

moshe weitzman’s picture

I'm a bit confused by drush_set_context('DRUSH_AFFIRMATIVE', TRUE);. That's being set in the parent process, but I think you want it set in the subprocesses that are doing the downloading. If I have that right, then remove that line and add $options['yes'] = NULL;
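In other words (a minimal sketch; the variable names are illustrative):

```php
// Pass --yes to the subprocess instead of setting DRUSH_AFFIRMATIVE in
// the parent; NULL-valued options become bare flags on the command line.
$options['yes'] = NULL;
drush_invoke_process('@none', 'pm-download', $args, $options);
```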

moshe weitzman’s picture

Also needs $options['package-handler'] = 'wget', or we proceed with the bigger refactor as per #3.2

jhedstrom’s picture

FileSize
3.45 KB

I took into account #25 and #26 in the latest version (attached).

To allow others to more easily work on this, I've pushed this up into a separate branch (make-pm-download).

jhedstrom’s picture

I opened (and closed) #1374020: Specify project-destination-map for pm-download before fully understanding the implications of #17.

As I understand it now, drush_invoke_concurrent would be added as part of #1053072: Concurrently execute a single drush command on multiple sites (con'd)? The refactor of make to grab all pm-downloadable projects at once is ready to go (and really simple)

  if (isset($projects['contrib'])) {
    // Download all projects using pm-download in a single command.
    $pm_downloads = array();
    foreach ($projects['contrib'] as $project) {
      if ($project->download['type'] !== 'pm') {
        continue;
      }
      $pm_downloads[$project->name] = $project;
    }
    make_download_pm_all($pm_downloads);

    // Download remaining projects, and process pm-download projects.
    foreach ($projects['contrib'] as $project) {
      $project->make();
    }
  }

where make_download_pm_all() would be a simple wrapper to pm-download via drush_invoke_concurrent().
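That wrapper might look roughly like this (a hypothetical sketch; the property names on $project and the drush_invoke_concurrent() signature from #17 are assumptions):

```php
function make_download_pm_all($projects) {
  $invocations = array();
  foreach ($projects as $name => $project) {
    $invocations[] = array(
      'args' => array($name),
      'options' => array('destination' => $project->download_location),
    );
  }
  // Shared options; 'pm-download' is the default command for each item.
  $common_options = array('yes' => NULL, 'package-handler' => 'wget');
  drush_invoke_concurrent($invocations, $common_options, array(), 'pm-download', '@none');
}
```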

moshe weitzman’s picture

Status: Needs work » Fixed

I merged in that branch and pushed to master. Nice job, all.

Lets get the concurrency in and also see if we can further integrate our download classes. Marking this fixed.

jhedstrom’s picture

Title: Drush Make should use Drush core's native download abilities for .tar.gz, and for git clones » Drush Make should use Drush core's native download abilities for .tar.gz

Updating the title, since git clones did not make it into this. I opened #1375270: Drush Make should use Drush core's native download abilities for git clones to track that.

Steven Jones’s picture

I've raised this issue: #1378474: Options used for drush pm-download are incorrect (doesn't use download cache) because the code that was committed in this issue actually makes things slower at the moment, because of: #1378528: Make downloads the same information twice.

helmo’s picture

One more follow-up issue: #1378992: Clean-up drush make output

moshe weitzman’s picture

MarkS mentioned that we might want to group projects by their --destination and then call invoke_process() with --concurrency. He thinks it simplifies the make command a bit. Might be splitting hairs.

drush_invoke_process('@self', 'pm-download', $projects, array('destination' => $dest, 'concurrency' => 5));
greg.1.anderson’s picture

Title: Drush Make should use Drush core's native download abilities for .tar.gz » Drush Make should use Drush core's native download abilities concurrently
Status: Fixed » Active

If you use concurrency, you must use '@none', not '@self' for the site. If you use '@self', then you will get one bootstrap per project, which may negate a lot of the benefit of concurrency.

The downside of using '@none' is that you must supply the destination, but that is already part of the proposal in #33, so it should not be a problem.

If you call drush_backend_invoke_concurrent directly, there is no need to group projects by destination:

$invocations = array();
foreach ($projects as $project) {
  $invocations[] = array(
    'args' => array($project['project']),
    'options' => array('destination' => $project['destination']),
  );
}
drush_backend_invoke_concurrent($invocations, array(), array('concurrency' => 5), 'pm-download', '@none');

The 'concurrency' feature is not hooked up in #1053072: Concurrently execute a single drush command on multiple sites (con'd) yet, but that is an easy thing to add.

moshe weitzman’s picture

@jhedstrom - I looked into this a bit, and #28 could be even more ambitious. Why not do all downloads concurrently as a first step? I'm including git, file, etc. One approach is to create a new hidden command called make-download and send invocations to it like below.

$invocations = array();
foreach ($projects as $project) {
  $invocations[] = array(
    'args' => array($project),
    'options' => array('destination' => $project['destination']),
  );
}
drush_backend_invoke_concurrent($invocations, array(), array('concurrency' => 5), 'make-download', '@none');

I think this applies to libraries as well as projects, but we could leave those for now.

Then we proceed with applyPatches, addLockfile, etc. Those can proceed as they do today if we want (i.e. one project at a time). I know that this is a deep refactor of make, but it would speed things up a lot and i don't think all that much code needs to change. It's just a little open heart surgery :)

Feedback wanted. I might take a stab at this.

jhedstrom’s picture

@moshe I think this would be an awesome way to proceed--if you get a chance to give this a go before I do, I'd be happy to review and test, otherwise, I may have time to get to this later in the week.

greg.1.anderson’s picture

+1 for #35. My thought was to put everything that make needed into pm-download, but making a wrapper command as suggested would greatly reduce the refactoring effort, I am guessing, and would still rock performance-wise.

n.b. Best performance gains will be seen in environments with high latency and high throughput. If you have very low latency and very low throughput, then gains will be small-to-zero. I am guessing that in typical environments, gains will be pretty good. You might want to make some measurements with invoke_concurrent before doing the whole refactor, though, just to be sure.

moshe weitzman’s picture

I have not found time to work on this, and probably won't for a while. Please proceed if you think you can take this on. Sorry for giving some false hopes in #35.

jhedstrom’s picture

Status: Active » Needs work
FileSize
2.19 KB

This isn't as ambitious as #35, nor is it working, but here's a start at getting projects that are downloadable by pm-download to be done concurrently.

The reason it isn't working is that only the first project is downloaded when calling drush_backend_invoke_concurrent(). Posting here for review to get some feedback on how that function should actually be called.

jhedstrom’s picture

Same broken patch with the obvious errors fixed.

Steven Jones’s picture

You need to add an additional entry to the array passed to the concurrent download:

+++ b/commands/make/make.download.inc
@@ -40,6 +40,45 @@ function make_download_pm($name, $download, $download_location) {
+    $invocations[] = array(
+      'args' => $args,
+      'options' => $options,
+    );

You need an additional entry in this array:

'site' => array(),

Making it:

+++ b/commands/make/make.download.inc
@@ -40,6 +40,45 @@ function make_download_pm($name, $download, $download_location) {
+    $invocations[] = array(
+      'args' => $args,
+      'options' => $options,
+      'site' => array(),
+    );

However, your code won't work, as it'll still try to make the projects that were downloaded concurrently, which will attempt to download them again, causing errors.

Also, in my testing I could only get it to work with concurrency == 1, if I increased it, then I got random 'directory not exists' errors.

greg.1.anderson’s picture

You shouldn't need to set 'site' if you are passing in '@none' as the default site. Code looks like it's basically right; I'll see if I have time to try it out and debug. Could be a problem in backend_invoke.

Steven Jones’s picture

I have this very almost working, don't have time to post a patch at the moment, but will do when I have time.

Steven Jones’s picture

Right, here's a patch that takes the approach of #40 and makes it work. The code is fairly rough, and I've had to go hacking around in a few places. However, I think that this is completely the wrong approach.

I reckon that if Crell reviewed this patch, he might say that: "This code stinks".

There's too much knowledge of low-level stuff at a high level: we have to hack in the fact that we've downloaded all the projects already, because as far as the make object is concerned, it hasn't. This also only speeds up downloading of the projects; patching is still painfully slow.

A better approach would be a more extreme version of #35 whereby we do the entire ->make() call in a new drush command, thereby allowing us to do them all concurrently, this would include patches, git etc. This would also mean that we could remove the special case of 'pm' (which already has far too many special cases in the drush make code now, IMHO) and effectively all makes are handled the same and all are able to be run concurrently, including future ones.

Looking at the code for DrushMakeProject it seems to be reasonably well self-contained, apart from a dependency on the static in make_tmp(). So it should be possible (in the worst case) to serialize the object, and pass that in as an argument to a hidden Drush command that will unserialize and call ->make() on it.
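The worst-case approach described above might be sketched like this (the make-process command name and its callback are hypothetical):

```php
// Caller: serialize each project object and hand it to a hidden command.
$invocations = array();
foreach ($projects as $project) {
  $invocations[] = array(
    'args' => array(serialize($project), make_tmp()),
  );
}
drush_backend_invoke_concurrent($invocations, array(), array('concurrency' => 5), 'make-process', '@none');

// Hidden command callback: rebuild the object and run its ->make().
function drush_make_process($serialized_project, $tmp_dir) {
  $project = unserialize($serialized_project);
  $project->make();
}
```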

jhedstrom’s picture

So it should be possible (in the worst case) to serialize the object

Rather than serializing the entire object, it might be better to simply serialize the specific project array, and then instantiate the make object on the hidden command.

greg.1.anderson’s picture

#44 makes some very good points, but I'm a bit confused about the conclusion. If #35 is not going to work out in a straightforward way, then we should go ahead and extend pm-download so that it supports all of the capabilities of Drush make (patching, downloading by additional mechanisms not yet supported in pm-*, etc.). Drush make, then, would just be an alternate way to specify options to pm-download. The special-casing of pm-download is effectively removed by refactoring everything in Drush make that isn't pm-download into pm-download (with perhaps some exceptions -- maybe library downloads?).

I think that interim steps such as #35 are worthwhile if they can be done cleanly; however, I don't think that serializing objects and passing them back in to hidden Drush commands just to re-use existing Drush Make code is a good idea. It is sounding to me like the interim solution might be getting to be more complex and more work than doing the full refactor.

Steven Jones’s picture

Is that the goal? To shoehorn all of Drush make's capabilities into pm-download? Drush make can do a lot, and most of it isn't tied to Drupal.org projects, which last time I looked at pm-download, is.

Why can't pm-download stay as just a way to download a project, and then add a wrapper command that can choose how to download a project, apply patches, recursively make etc.?

greg.1.anderson’s picture

I believe that adding non-d.o sources and the capability to apply patches to pm-download would be the cleanest solution. Processing make files and doing recursive makes obviously should remain the purview of Drush make -- limit my comments in #46 to downloading and patching projects, which is most, but not all of what Drush make does.

There may be other clean solutions that could also be employed at the option of the implementer, but long-term, I think that using Drush's native download capabilities in Drush Make means enhancing Drush's native download capabilities so that there is a good match between the features Drush Make needs (to download) and the features that Drush pm-download provides.

moshe weitzman’s picture

I'm hesitant to rely on pm-download as heavily as suggested in ##4. That function is about projects and it really shouldn't be downloading non-project files via GET, bzr, etc. At the same time, I'm at the point where I might take any approach that gets us concurrent downloading without getting too messy.

jhedstrom’s picture

limit my comments in #46 to downloading and patching projects

I don't think we can move patching of projects into pm-download, since make will still be downloading and potentially patching projects outside of pm-download. If we did add patching capabilities to pm-download, make would still need its own patching functionality.

As to the concurrency issue proposed in #35, I ran into trouble trying to implement that approach as a hidden command when needing to pass complex data structures to the command line. As Steven Jones suggests, that could be done by serializing them, so I'm inclined to pursue this approach again (a better alternative to serializing would be if the backend API allowed such complex structures to be passed in, but I don't know offhand if that's currently possible).

moshe weitzman’s picture

You can pass complex objects as arguments and options using backend, and you can receive complex objects as a return value. The backend API that adrian built is pretty amazing that way. More info at http://api.drush.ws/api/file/includes/backend.inc/master.

I would think that API functions like drush_invoke_process() automatically pass along complex values using the API above. If you need to receive a complex return value from a backend call, just call http://api.drush.ws/api/function/drush_backend_get_result/master

jhedstrom’s picture

This patch isn't fully working (I have to take off for the night, so posting current progress).

It takes some ideas from #44 (specifically, passing in 'site' => array() to the concurrency function alters behavior, which might indicate a bug, and also the change to the drush_tempdir has a positive effect that I don't fully understand in this context).

This patch also uses serialization of the project array (I wasn't able to determine how to pass non-string values to the concurrency function), which I think isn't ideal.

greg.1.anderson’s picture

To pass non-string values to the concurrency function, it should be sufficient to set 'method' => 'POST' in the common backend options parameter. Backend invoke will then do the json encoding for you, and drush will similarly unencode them on the other end, allowing you to simply use drush_get_option('my-encoded-option') to recover your array (presuming you passed in 'my-encoded-option' => array(...) as one of the items in the $options array passed to the concurrency function).

So, instead of:

      $invocations[] = array(
        'args' => array(
          serialize($project),
          make_tmp(),
        ),
        'site' => array(),
      );

Try:

      $invocations[] = array(
        'args' => array(
          make_tmp(),
        ),
        'options' => array(
          'project' => $project,
        ),
        'site' => array(),
      );

I didn't test this, but it should work. This looks like a good way to get concurrency into Drush quickly. I do agree with #49 in that I don't think that pm-* is the right place to handle downloads of libraries et. al.; perhaps someday down the road I'll have time to demonstrate my ideas in #48 with a patch, but the current approach is good for now.

Edit: Don't forget to add 'method' => 'POST' to $common_options if you try the snippet above.

jhedstrom’s picture

Here's a patch that uses method = POST as Greg suggests above. Note, I had to alter backend.inc to make this work with arrays, but I don't fully understand the potential ramifications, so this needs review:

diff --git a/includes/backend.inc b/includes/backend.inc
index 9f49004..d2d31e5 100644
--- a/includes/backend.inc
+++ b/includes/backend.inc
@@ -771,7 +771,7 @@ function _drush_backend_classify_options($site_record, $command_options, &$backe
   }
   foreach ($command_options as $key => $value) {
     $global = array_key_exists($key, $global_option_list);
-    $propagate = !is_array($value);
+    $propagate = TRUE;
     $special = FALSE;
     if ($global) {
       $propagate = (!array_key_exists('never-propagate', $global_option_list[$key]));

5 tests are failing, all revolving around patching and translations. The reason for this is that those parts of make are not currently robust enough to work concurrently. For instance, files are downloaded to a temp directory within the main temp directory called __download__, and then this directory is wiped out. When patches are concurrently being downloaded, one instance of make-process will wipe out another instance's downloads. Fixing this should be pretty straightforward, one option would be to get rid of _make_download_file and use drush_download_file, but a quick test indicates this will be more work than a simple swap.
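One possible shape for that fix (a sketch, assuming Drush core's drush_tempdir() and drush_download_file(); not the committed code): each concurrent invocation gets its own unique temp directory rather than the shared __download__ directory.

```php
// drush_tempdir() returns a unique directory that Drush removes on exit,
// so concurrent make-process invocations can no longer clobber each other.
$tmp = drush_tempdir();
$filename = $tmp . '/' . basename($url);
// drush_download_file() also goes through core's download cache when enabled.
drush_download_file($url, $filename);
```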

jhedstrom’s picture

jhedstrom’s picture

Status: Needs work » Needs review

The test that is failing is the one that uses POST to download. Moshe thinks this could safely be removed, and I can't personally remember ever using POST in a make file, so we may be able to remove that functionality.

jhedstrom’s picture

Status: Needs review » Needs work

#56 was for the wrong issue. Sorry.

greg.1.anderson’s picture

I guess the Aegir guys have not been using Drush-5 lately; backend invoke method POST is broken without the change called out in #54. It is not immediately clear to me whether there may be some problems associated with the suggested change; in theory, it should not be necessary to filter out array values like this, as options that may have been converted to arrays (e.g. --include) should be handled by the various option 'merge' features. I'll take a look at this when I'm back on a fast unmetered internet connection.

greg.1.anderson’s picture

I posted a fix to the backend invoke method POST problem in a separate issue: #1405576: Fix for backend invoke method POST. Let's get that in before committing here.

moshe weitzman’s picture

Status: Needs work » Reviewed & tested by the community

This code looks good to me. Since #1405576: Fix for backend invoke method POST is in, I think this can be committed when the tests are passing.

Steven Jones’s picture

This is going to break some existing make files, because before we could rely on the file being processed in serial, so could do things like download a library and pop it within a previously downloaded module. I'm not sure we could do things like that now?

Maybe we should increment the API version of the make files to 3.x?

greg.1.anderson’s picture

You should be able to force serial processing by passing --concurrency=1 on the commandline.

Maybe serial processing should be the default for Drush-5? Alternately, we could ensure that Drush make downloads all projects before downloading any libraries. Are there any other path or ordering dependencies that might cause problems in makefiles?
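To be concrete (the makefile and target names here are hypothetical), forcing serial execution from the command line would look like:

drush make --concurrency=1 distro.make ./docroot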

dww’s picture

Yeah, I continue to believe we want to have distinct phases here, each of which must happen in serial, but any subtasks of which can happen in parallel. Namely:
- Download projects
- Download libraries
- Download patches
- Apply patches

I don't think all of that should be pm's responsibility. I think we should just fetch projects in parallel via pm and concurrency, and then handle libraries and patches separately via make-specific commands.

greg.1.anderson’s picture

Proceeding per #63 is best for backwards compatibility. Since it is not possible to use drush_make (the external project) with Drush-5, I think it is important for the version of Drush make in Drush-5 to be able to process current (make api version 2) make files.

jhedstrom’s picture

Status: Reviewed & tested by the community » Needs work
FileSize
8.14 KB

Git clones are failing for reasons very similar to why concurrent downloads were failing. Currently, make uses a __git__ directory to do the cloning, then removes that entire directory (so, on concurrent checkouts, a git clone may have its directory ripped out from under it). The fix is simply to use unique temporary directories instead, via drush_tempdir().

That gets us down to 2 failing tests, which I suspect are for very similar reasons (but don't have time to confirm now).
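For the record, the shape of the fix is roughly this (a sketch, not the committed code; drush_tempdir(), drush_shell_exec() and drush_move_dir() are existing Drush APIs, the surrounding variables are illustrative):

// Each concurrent checkout clones into its own unique temporary directory.
$tmp_path = drush_tempdir();
drush_shell_exec('git clone %s %s', $download['url'], $tmp_path);
// ... check out the requested branch or tag here ...
drush_move_dir($tmp_path, $download_location, TRUE);
// Cleanup of the unique directory can be left to drush_tempdir()'s
// exit handling, or done explicitly to save space.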

greg.1.anderson’s picture

Looking good. When you use drush_tempdir(), you can remove the code that explicitly deletes the directory if you wish; it will be cleaned up automatically when drush exits. If you want to continue to delete it explicitly (e.g. to save space as soon as it is no longer needed), you can use drush_delete_tmp_dir instead of exec'ing rm -rf. Using the drush function will be better for Windows compatibility (we may cross that bridge someday...)

jhedstrom’s picture

Status: Needs work » Needs review
FileSize
8.53 KB

This patch has all tests passing.

The last 2 remaining failures were the result of not passing options to the hidden command. I used drush_redispatch_get_options() for this, but am not sure if there's a better way to do that.

I also removed the code that removes temp directories in the git cloning since drush_tempdir() will do this.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

drush_redispatch_get_options() looks fine to me. Would be good to get Greg's input.

"Drush callback: hidden file to process an individual project.": s/file/command

This is RTBC IMO.

greg.1.anderson’s picture

drush_redispatch_get_options() for the win -- this is exactly the right thing to do in this situation. I did not run the tests myself, but agree this is RTBC.

jhedstrom’s picture

Status: Reviewed & tested by the community » Fixed

Committed. Thanks everybody!

greg.1.anderson’s picture

Excellent -- thanks for all of your hard work!

One more thing I should have mentioned in #69: it looks to me as if the committed patch behaves as described in #63, so the concerns of #61 should not be an issue. Passing in --concurrency=1 on the cli will force everything to serial execution, so there is a workaround should any problems arise. Anyone who has problems should open up an issue to report them, of course.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

dww’s picture

Status: Closed (fixed) » Needs work

Sorry to reopen this, but I believe I just found a bug in this code. Either that, or I'm totally confused. ;) Probably some of both.

It's very confusing how make handles versions. See #1371298-8: Rewrite .info files inside drush make when checking out from git to include version and project info for more. Basically, version is a project-level attribute, not a download-level attribute. Therefore, I don't see how this code can work at all:

/**
 * Download project using drush's pm-download command.
 */
function make_download_pm($name, $download, $download_location) {
  $version = $name . '-' . $download['version'];
  ...

$download['version'] is generally undefined, unless people have invalid .make files in the first place. WTF? ;) Digging deeper, it turns out that if you don't specify any download info at all, then inside make_projects() we initialize it like so:

          $project['download'] = array(
            'type' => 'pm',
            'version' => $release['version'],
            'download_link' => $release['download_link'],
          );

That's the only reason the code in make_download_pm() works at all. If you put this (entirely valid input) into your .make file, you get total breakage:

projects[views][type] = module
projects[views][download][type] = pm
projects[views][version] = 3.1

Given this and my hacky patch at #1371298-9: Rewrite .info files inside drush make when checking out from git to include version and project info I'm tempted to open a totally separate issue about cleaning up how make handles and propagates version strings in these various data structures and methods. What we have now is fragile and confusing. But, I wanted to reopen this first to get feedback from the folks who've been working on all this.

Thanks,
-Derek

dww’s picture

Status: Needs work » Needs review
FileSize
2.14 KB

Based on an IRC chat with jhedstrom, we agreed that overloading download['version'] is too confusing. We decided to rename this to download['version_string'] in the hopes of helping to keep everything straight between the project-level 'version' attribute (which supports all kinds of shorthand, etc) and the download-level 'version_string' that's always expected to be a fully-formed, valid Drupal version string.

This also fixes the specific bug I mentioned in #73 in terms of not initializing version_string in all the cases we need to in the pm-download case.

greg.1.anderson’s picture

To me, the difference between version_string and version is still confusing. Could we rename version_string to requested_version? That is assuming I have correctly interpreted how this variable is used in the code. In general, differentiating a variable by the way it is used is better than differentiating it by its type.

jhedstrom’s picture

Status: Needs review » Reviewed & tested by the community

I think requested_version makes sense. I'm not at a machine I can commit from atm, but marking this RTBC.

dww’s picture

Status: Reviewed & tested by the community » Needs review

It's not about integer vs. string. It's about a fully-formed version string (e.g. '7.x-3.x-dev') vs. the various forms of short-hand that drush make supports in the project-level 'version' attribute (e.g. '3', '3.x', or '3.1'). Also, keep in mind no end-users are expected to define this. It's internal at this point.

Perhaps '_full_version' would be better? The underscore is to indicate it's an internal thing (the convention we use for function names) and 'full' to express that it's a full version string.

Either way, seems like #74 isn't RTBC itself. Setting back to needs review to get clear direction on what needs to happen here.

Thanks,
-Derek
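To make the distinction concrete, here is how the project-level shorthand relates to the full version string (the resolved values are illustrative, assuming a Drupal 7 make file where 3.1 is the latest 3.x release):

projects[views][version] = 3      ; latest recommended 3.x release, e.g. 7.x-3.1
projects[views][version] = 3.x    ; development snapshot, i.e. 7.x-3.x-dev
projects[views][version] = 3.1    ; exact release, i.e. 7.x-3.1

Whatever we name the internal download-level attribute, it would always hold the fully-formed string on the right.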

dww’s picture

Here's a re-roll with 'full_version' per IRC chat with Moshe.

dww’s picture

Now with passing tests. ;) Thanks test suite. I missed a spot for s/version/full_version/ that was breaking the translation tests.

I also cleaned up a few other local variables to use more self-documenting names instead of just $version all over the place.

moshe weitzman’s picture

Status: Needs review » Fixed

committed. thx

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.