I recently upgraded from 5.8 to 5.9 and now I'm getting DRUSH_BACKEND messages like this with "drush make make_file".

 >> field_group-7.x-1.1 downloaded.[ok]
 >> strongarm-7.x-2.0 downloaded.[ok]
 >> view_unpublished-7.x-1.1 downloaded.[ok]
 >> link-7.x-1.1 downloaded.[ok]
 >> jquery_update-7.x-2.3 downloaded.[ok]
>> DRUSH_BACKEND:{"type":"notice","message":"Command dispatch complete","timestamp":1367242226.1479,"memory":7515724,"error":null,"packet":"log"}
 >>
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 8.19 MB","timestamp":1367242226.1485,"memory":7516408,"error":null,"packet":"log"}
 >> Project xmlsitemap contains 9 modules: xmlsitemap_taxonomy, xmlsitemap_modal, xmlsitemap_i18n, xmlsitemap_node, xmlsitemap_user, xmlsitemap_menu, xmlsitemap_custom, xmlsitemap_engines, xmlsitemap.
 >> webmaster_menu-7.x-1.1 downloaded.[ok]
>> DRUSH_BACKEND:{"type":"ok","message":"xmlsitemap-7.x-2.0-rc2 downloaded.","timestamp":1367242226.2942,"memory":7577432,"error":null,"packet":"log"}
 >>
 >>
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 8.19 MB","timestamp":1367242226.2985,"memory":7516388,"error":null,"packet":"log"}</em>
 >> Project views_bulk_operations contains 2 modules: actions_permissions, views_bulk_operations.
 >> diff-7.x-3.2 downloaded.[ok]
 >> module_filter-7.x-1.7 downloaded.[ok]
 >> views_bulk_operations-7.x-3.1 downloaded.[ok]
 >> DRUSH_BACKEND:{"type":"notice","message":"Command dispatch complete","timestamp":1367242229.1381,"memory":7515772,"error":null,"packet":"log"}
 >>
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 8.19 MB","timestamp":1367242229.1386,"memory":7516456,"error":null,"packet":"log"}
 >> override_node_options-7.x-1.12 downloaded.[ok]
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 8.19 MB","timestamp":1367242229.194,"memory":7516572,"error":null,"packet":"log"}
 >> masquerade-7.x-1.0-rc5 downloaded.[ok]
 >> redirect-7.x-1.0-rc1 downloaded.[ok]
 >> Project metatag contains 8 modules: metatag_panels, metatag_twitter_cards, metatag_ui, metatag_context, metatag_dc, metatag_views, metatag_opengraph, metatag.
 >> metatag-7.x-1.0-beta7 downloaded.[ok]
 >> zen-7.x-5.1 downloaded.[ok]
 >> DRUSH_BACKEND:{"type":"notice","message":"Command dispatch complete","timestamp":1367242232.4251,"memory":7516092,"error":null,"packet":"log"}

and on and on...

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

moshe weitzman’s picture

Status: Active » Postponed (maintainer needs more info)

We can't see the command you ran.

jhedstrom’s picture

Version: 7.x-5.9 » 8.x-6.x-dev
Status: Postponed (maintainer needs more info) » Active

I've noticed this as well. Happens on 6.x as well. No exact command/makefile yet as it is sporadic.

DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 8.06 MB","timestamp":1367256699.0769,"memory":7060704,"error":null,"packet":"log"}
dpearcefl’s picture

I added the full command line to the original post.

jhedstrom’s picture

Using git bisect, I traced this down to commit cc8b76b, for issue #1829418: Drush invoke process can't handle large packets of data.

greg.1.anderson’s picture

Assigned: Unassigned » greg.1.anderson

Does this only happen for drush make?

jhedstrom’s picture

FileSize
8.31 KB

drush make is the only place I've seen it. Unfortunately, those messages appear to be sporadic. To bisect, I used a makefile that contained enough projects so it was fairly consistent in outputting those messages (attaching here).

greg.1.anderson’s picture

After this issue is fixed, it would be good to go back and investigate #1380948: Print errors from caught exceptions during updatedb.

achton’s picture

I'm seeing this on upgrading from 5.8 to 5.9 too:

Command: drush -y make --working-copy --no-core --contrib-destination=. ./my-makefile.make

Make new site in the current directory? (y/n): y
 >> Project adaptivetheme contains 3 themes: adaptivetheme_subtheme, adaptivetheme_admin, adaptivetheme.
 >> adaptivetheme-7.x-3.1 downloaded.                                                                                                                                                     [ok]
 >> Project xmlsitemap contains 9 modules: xmlsitemap_node, xmlsitemap_custom, xmlsitemap_i18n, xmlsitemap_menu, xmlsitemap_modal, xmlsitemap_engines, xmlsitemap_user, xmlsitemap_taxonomy, xmlsitemap.
 >> DRUSH_BACKEND:{"type":"ok","message":"xmlsitemap-7.x-2.0-rc2 downloaded.","timestamp":1368298070.1093,"memory":6074284,"error":null,"packet":"log"}
 >>
 >>
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 6.78 MB","timestamp":1368298070.1102,"memory":6015056,"error":null,"packet":"log"}
 >> wysiwyg-7.x-2.x-dev downloaded.                                                                                                                                                       [ok]
 >> DRUSH_BACKEND:{"type":"ok","message":"webform_validation-7.x-1.2 downloaded.","timestamp":1368298070.1414,"memory":6074748,"error":null,"packet":"log"}
 >>
 >>
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 6.78 MB","timestamp":1368298070.1426,"memory":6015588,"error":null,"packet":"log"}
 >> DRUSH_BACKEND:{"type":"ok","message":"webform-7.x-4.0-alpha6 downloaded.","timestamp":1368298070.7354,"memory":6074140,"error":null,"packet":"log"}
 >>
 >>
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 6.78 MB","timestamp":1368298070.7364,"memory":6015072,"error":null,"packet":"log"}
 >> DRUSH_BACKEND:{"type":"ok","message":"transliteration-7.x-3.1 downloaded.","timestamp":1368298070.8175,"memory":6074588,"error":null,"packet":"log"}
 >>
 >>
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 6.78 MB","timestamp":1368298070.8186,"memory":6015532,"error":null,"packet":"log"}
 >> DRUSH_BACKEND:{"type":"ok","message":"token-7.x-1.5 downloaded.","timestamp":1368298070.8696,"memory":6074840,"error":null,"packet":"log"}
 >>
 >>
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 6.78 MB","timestamp":1368298070.8705,"memory":6015832,"error":null,"packet":"log"}
 >> Project views contains 2 modules: views, views_ui.
 >> views-7.x-3.7 downloaded.                                                                                                                                                             [ok]
 >> strongarm-7.x-2.0 downloaded.                                                                                                                                                         [ok]

Etc. etc. ..

Damien Tournoud’s picture

We are seeing that happening too.

Looking at the code, it feels like the "out of the band" facility provided by drush_backend_packet() is basically broken:

  • drush_backend_packet() output a "packet" to STDERR,
  • _drush_backend_proc_open() (that call the backend command), only sets up pipes for STDIN and STDOUT.

The facility is only used in a few places, so it's possible that nobody every noticed.

greg.1.anderson’s picture

Historically, Drush redirects stderr to stdout in backend invoke. It could be that there are some places where this is not done.

The best solution would probably be to reserve stderr for log messages & out of band info, and to stop redirecting stderr to stdout in backend invoke. This might be time consuming.

As mentioned in #9, it was probably a mistake that the out of band info was sent to stderr; we could try a quick fix of just changing this to stdout. That would be a much quicker fix, if it worked; however, there is some advantage to the fact that old versions of drush are sending out of band information to stderr; it might make it easier to do the "best" solution in Drush 6, and still work with Drush 5.

Not sure how soon I could take this on; I know we want to release Drush 6 RSN.

Damien Tournoud’s picture

I assume something somewhere redirects STDERR to STDOUT when calling the backend command... so it kind-of works most of the time, but I haven't been able to find where.

Damien Tournoud’s picture

Here is my hypothesis on how this currently works:

  • For SSH backend invocations, STDERR is explicitly redirected to STDOUT (in _drush_backend_generate_command)
  • For local backend invocations, the output buffer (started by _drush_bootstrap_output_prepare()) is actually what converts STDERR into STDOUT, at least in most cases.

The output buffer set by _drush_bootstrap_output_prepare() will either discard everything except backend packets (with drush_backend_output_discard() - when --pipe is set), or will collect everything (with drush_backend_output_collect()), but in that case backend packets will be discarded a bit later by drush_backend_output().

I'm confused how all this is working in the first place. Seems like the whole system has grown out of control.

moshe weitzman’s picture

Priority: Normal » Major
Issue tags: +Release blocker

It sounds like this might be more widespread than what we hoped. Thanks for investigating, Damien.

This system might have been out of control when Adrian first wrote it :). Anyway, lets look for simplifications. FWIW, the concurrency that we added has really not worked out as well as I hoped. Only make uses it, and it causes surprises for folks there. I would not be opposed to removing it, if that would simplify the backend code.

greg.1.anderson’s picture

Backend has been hairy from the beginning; there have been some improvements, but concurrency and incremental output added additional logic which I think has resulted in a net increase in complexity. In any event, I think that we can all agree that improvements would be helpful.

I don't know that I'm in favor of removing concurrency. I am pretty sure msonnabaum is using it (where's my d.o feature to add someone as a follower to a thread? :> ), and it's useful for speeding up batch processing. I think it is of no practical benefit to make, though; while there is the possibility of some benefit under the right conditions, it rarely works out to any substantial increase in performance in practice.

I think that we could simplify the whole system if we managed to get consistent about stderr usage, and stopped using ob_start / ob_end in backend and pipe mode entirely. We could then also remove all of the places where we redirect stderr to stdout. We could also get rid of the dreaded (and problematic-for-Windows) `tty` construct if we handled stdin in backend invoke. That would involve -not- closing stdin at the beginning of the function, and avoiding the deadlock in some other way. Perhaps Drush could send a new special "start" and "end" log packet pairs. Backend invoke could close stdin on receipt of the "end" packet if "start" is received as the first packet, or on the receipt of any other packet if "start" is not the first packet. I think this should allow Drush 6 to talk to Drush 5 without deadlock.

greg.1.anderson’s picture

Title: DRUSH_BACKEND messages » Redesign and simplify Drush backend invoke to eliminate DRUSH_BACKEND messages in terminal output

Changing title.

anarcat’s picture

for the record, this is also a significant problem in aegir because some errors just disappear from our task logs, because they are not fed through stdout as they should be.

also note related issue #1721586: Improve Drush's handling of STDERR in backend invoke, which suggests adding an option of suppressing stderr (not a good idea in my opinion, because we may miss stuff) or just redirecting it to stdout in all cases (that I would prefer).

it is unclear to me why those packets go on stderr.

i'd love to discuss this during the code sprint tomorrow.

greg.1.anderson’s picture

Unfortunately, I cannot sprint tomorrow. :( I'll try to be in IRC at least, tho.

If you guys want to take out concurrency, I am not opposed.

anarcat’s picture

Is there a test to cover the problem that was fixed when this problem was introduced?

In other words, is there a test that checks the deadlock mentionned above?

greg.1.anderson’s picture

Unfortunately, the problem only happens on Windows. The issue is discussed here: http://drupal.org/node/766080#comment-4309936

moshe weitzman’s picture

I discussed concurrency with jhedstrom since make is its only customer right now. We agreed that concurrency has not worked out that well and can be removed. Hopefully Greg can help with that.

anarcat’s picture

Status: Active » Needs review
FileSize
880 bytes

actually, i don't know if I can really reproduce this directly in drush 5. what I see is that this may be drush 6 specific for core drush commands *buuut* i definitely see some backend packets output to stderr regularly in aegir. this shouldn't happen: all drush backend output should go out on stdout or stderr, but not both!

this is an api change that breaks everything.

so here's a patch to reverts a tiny part of 510735ac, which was introduced to get #1058874: drush_backend_invoke buffers output till end of command. Show progress. working. Apparently, the reason that data is sent to stderr is to avoid "Cannot modify header information - headers already sent" messages, as those packets are being sent too early. The thing is, that also means that those packets are lost to certain backend handlers. It also makes drush foo --backend | drush bar fail.

So I believe we should fix the "Cannot modify header information" instead of working around it.

I'm running the test suite now to see if this explodes.

greg.1.anderson’s picture

Based on some inconclusive tests I started, it also seems to me that there might be something wrong with the handling of the introduced null character. I might try changing this to something non-null later, to make it easier to debug. No time to continue now, though.

greg.1.anderson’s picture

Status: Needs review » Needs work
FileSize
854 bytes

Here is a fix that removes a highly-destructive 'trim' that can corrupt backend packet start patterns in the remainder buffer. This should take care of the primary symptom; however, because of the issue with 'drush foo --backend | drush bar', I agree that anarcat's STDERR changes are also necessary. I don't have time to address those at the moment, but I believe that with this patch, we're now pretty close.

greg.1.anderson’s picture

Title: Redesign and simplify Drush backend invoke to eliminate DRUSH_BACKEND messages in terminal output » 1
Status: Needs work » Needs review
FileSize
1.27 KB

Correct minor defects in backend invoke that can cause large output problems

greg.1.anderson’s picture

Title: 1 » Correct minor defects in backend invoke that can cause large output problems
Component: Make » Base system (internal API)

[Made a typo with drush iqs in #24; here is the correction of what should have appeared above.]

It turns out that #21 and #23 merge together quite nicely. All tests pass; I did not have any problems with "Cannot modify header information" or other defects. It seems that it is not necessary to rewrite backend invoke after all.

moshe weitzman’s picture

Status: Needs review » Active
Issue tags: -Release blocker

Committed to 8.x-6.x.

1. Does anything need backport?
2. Do we need any new tests?

Once those are resolved, we can mark this Fixed.

Let's release Drush6 with concurrency and evaluate it again for 7.

greg.1.anderson’s picture

Since the bug fix turned out to be trivial, I think that backporting to 7.x-5.x is appropriate. I cherry-picked the commit on the Drush 5 branch.

Could still use a new unit test. I think a good way to test this would be to repeatedly call drush_log from within a command called via backend invoke, with each successive call returning a message that is one character longer. The total backend packet length should go from smaller than the packet chunk size (4096, unless set to something else) to somewhat larger than same.

I'll put something together to do this later.

Steven Jones’s picture

Status: Needs work » Active

So...this looks to have introduced a regression in Drush 5.x.

We (the Aegir project) have some lame tests, that basically consist of running lots of Aegir specific Drush commands, and making sure that no errors are logged.
We run these tests with Jenkins, and we can run then against lots of different Drush versons, which is handy for testing compatibility.

You can see this matrix test here:

http://ci.aegirproject.org/job/Matrix%20test/76/

Note that Drush 7.x-5.x-dev is failing, but 7.x-5.9 is not.
Seems like the change to sending output to stdout rather than stderr does cause fatal errors for installing Open Atrium.
Also, note that we still have DRUSH_BACKEND in the output too :(

greg.1.anderson’s picture

Okay, I'm going to back out all of #24 from 7.x-5.x, and put the log packets back on STDERR in 8.x-6.x. We can continue to try to isolate DRUSH_BACKEND in the output as well, but it seems clear that moving away from STDERR is a problem.

Perhaps drush foo --backend 2>&1 | drush bar will have to do, at least for Drush 5. In Drush 6, perhaps we could consider moving the packets to STDOUT if we cached all of the log messages prior to the first call to drush_print; however, this would not be very desirable for long-running commands that output log messages, but no standard output. Maybe someone else might have a better idea here.

greg.1.anderson’s picture

Status: Active » Needs work
FileSize
3.34 KB

Here is a failing test. It sends longer and longer log messages, until a seven-character message causes Drush to drop a backend packet somewhere.

anarcat’s picture

Status: Active » Needs work

whoa... okay, this is so very weird.

we should really consider rewriting this whole packet handling thing from scratch at some point, it seems really confusing to me, and we just keep piling up weirder stuff on top of it all the time... :)

couldn't we just ob_start() around the bootstrap process to get rid of the 'headers already sent' crap?

is anyone going to fix this by friday? it's the first day i'll have free to figure this thing out and i'd like to release aegir beta2 with drush 6 support and have this bug fixed. ;)

greg.1.anderson’s picture

Status: Needs work » Needs review
FileSize
625 bytes
3.92 KB

I think it would take longer than 'til next Friday to rewrite the backend invoke mechanism; even if we had a candidate implementation, it would probably take longer than that to review. A full rewrite would be welcome, if someone had the time to do it. In the meantime, though, try this patch, which changes the end-of-packet marker from null + eol to eot + eol. Note that there is a code comment that says that we only care about the last null in the trailing string because there is only one null in a packet -- and yet each packet started and ended with a null. Bummer. All of the backend tests pass with this; I will run the full suite next. n.b. #24 was also a necessary fix; there were two packet-corrupting bugs here. Hopefully there is not yet another defect still lurking. ;)

greg.1.anderson’s picture

All unit tests pass with #32. Hopefully this will work for Aegir as well.

greg.1.anderson’s picture

This version is a little cleaner; it puts the terminal null at the very end of the packet, after the eol. This still works, because if the terminal null appears in the text string being processed, then the entire packet is complete and will be handled and removed. Placing the null in the penultimate position left a one-character window that could corrupt the packet.

greg.1.anderson’s picture

Issue tags: +Release blocker

We should make sure this is fixed prior to the Drush 6 release.

jhedstrom’s picture

With #34 I'm still seeing these sort of messages in the output:

 >> DRUSH_BACKEND:{"type":"ok","message":"environment_indicator-7.x-1.1 downloaded.","timestamp":1369850233.5548,"memory":7089296,"error":null,"packet":"log"}
 >> DRUSH_BACKEND:{"type":"memory","message":"Peak memory usage was 8.47 MB","timestamp":1369850233.5574,"memory":7022728,"error":null,"packet":"log"}
greg.1.anderson’s picture

I cannot reproduce; I do not see any of these messages when using the makefile attached to #6.

It could have something to do with concurrency; there is no unit test for split packets with concurrency at the moment. Could you try running your test again with --concurrency=1, and see if the problem goes away?

jhedstrom’s picture

Re: #37 with concurrency set to 1, those messages go away--however, they go away with and without this patch.

greg.1.anderson’s picture

#34 and #24 both correct rare intermittent bugs with split packet handling; neither of these two bugs are related to concurrency. It appears that in addition to these two defects, there is at least one more intermittent defect related to concurrency. By the symptoms, it also seems that this is a split packet defect; however, I cannot easily see the problem by inspections.

We have two ways to proceed:

1. Write a unit test that manages to illustrate the bug every time, to allow isolation and correction of the problem.

2. Remove concurrency.

I'm not sure which route I would take, if I picked this up again first; if someone else wants to submit a patch, either option would be okay with me -- whichever is easier.

Is everyone okay with committing #34?

greg.1.anderson’s picture

Here is a third option. This patch includes the fix in #34, and also changes the default concurrency to 1. This will mask the existing concurrency bugs, and still allow those who want to use it the option of turning it ack on. There is little motivation to do that for make, since the performance is about the same for concurrent and non-concurrent runs of make; however, some folks have gotten good performance increases running batch jobs concurrently. I recommend committing this, at least, and continue the discussion on whether to fix or rewrite or remove concurrency - maybe in Drush 7.

moshe weitzman’s picture

Status: Needs review » Active
Issue tags: -Release blocker

Committed #40. Leaving this open for further work and discussion. I think we could release as is though.

greg.1.anderson’s picture

Title: Correct minor defects in backend invoke that can cause large output problems » Correct defects in backend invoke that can drop backend packets when operating concurrently

Sounds good; adjusting title (although we could also just decide to remove concurrency to fix this as well).

anarcat’s picture

Title: Correct defects in backend invoke that can drop backend packets when operating concurrently » Correct minor defects in backend invoke that can cause large output problems
Status: Active » Needs review
Issue tags: +Release blocker

What's wrong with the idea of sending everything on the same stream (STDOUT) and buffering the bootstrap process?

It seems every time we touch this we make the code more complicated, I feel very unconfortable obscuring it further.

greg.1.anderson’s picture

Title: Correct minor defects in backend invoke that can cause large output problems » Reduce complexity of backend invoke; consider removing concurrency
Issue tags: -Release blocker

Sending everything to STDOUT has no benefit. Why would you want to do that? Is drush foo --backend | drush bar even a real use-case? Drush itself does not have any mechanisms to parse the backend output from stdin. Does something in Aegir do that? If so, you could add a 2?&1 if you really needed it.

Regarding obscuring the code, the last two commits fixed an off-by-one error, and removed a 'trim' that was stripping characters that should not be stripped. I agree that incremental output and concurrency complicated and obscured backend invoke, but the commits here did not further complicate or obscure it.

If you want to simplify backend invoke without removing features, I would recommend that we keep backend packets on STDERR as they currently are, and adjust backend invoke to read packets from STDERR (remove the 2>&1 from all of the places that Drush adds it when doing backend invokes). If we put the packets all on STDERR, then we could just read lines of text from STDERR (unlimited buffer size, everything up to the eoln at the end of each packet) and discard any line that was not a packet. Then all of the split-packet code would disappear.

Removing concurrency has also been discussed. This feature can be helpful when doing large batch processing, but I still would not object if we removed it.

However, I think that things work here well enough to release. It would be a bad idea to rewrite backend invoke at this point, unless doing so is absolutely necessary, as it would take a long time to write, review, debug and commit. Let's get Drush 6 out, and consider further improvements to backend invoke in Drush 7, as time and availability of contributors permits.

moshe weitzman’s picture

Priority: Major » Normal
Status: Needs review » Active
Steven Jones’s picture

Sorry to report bad news again, but according to Aegir's tests, this problem still exists:

http://ci.aegirproject.org/job/Matrix%20test/DRUSH_VERSION=8.x-6.x-dev/8...
(then search on that page for 'DRUSH_BACKEND')

You'll see lots of occurrences of the backend packets getting output.

greg.1.anderson’s picture

Bother. I still recommend shipping Drush-6 like this, and then fix per #44. For more rapid resolution, patches welcome.

moshe weitzman’s picture

I notice that all the stray messages happen at the beginning of the request or the end. That's probably significant. Maybe output buffer related? I see ob calls at http://api.drush.org/api/drush/includes%21bootstrap.inc/function/_drush_.... I'm not too familiar with this part of Drush

I see both these late log messages a lot: "type":"notice","message":"Command dispatch complete" and "type":"memory","message":"Peak memory usage was

The outlier is "type":"message","message":"Changed permissions of \/var\/aegir\/.drush\/platform_drupal7.alias.drushrc.php to 640" but perhaps that happens in an exit hook or something?

greg.1.anderson’s picture

I also noticed that it tended to be later log messages that were lost (and therefore emitted as plain text). The backend invoke buffer is large, so if you see two notices in a row, it probably means there was one incident where a packet was lost, and every remaining packet in the buffer was dumped to stdout.

This could happen if aegir writes a null to stdout near the end of a run. I could probably make this better by fixing that part in backend invoke that says "we only care about the last null...". That seems to be untrue.

Damien Tournoud’s picture

@moshe: I noticed that also, and my hunch is also that we are somewhat hitting the atomicity limit of the pipe.

Writing to a pipe is only atomic up to PIPE_BUF bytes, which is at least 512 bytes by POSIX standards, but 4096 bytes (a page) under linux. I suspect that somewhat Drush sub-processes are outputting everything to the pipe at the same time, and the last few messages end-up being interleaved, and as a consequence do not match the regexp and are just output to the screen.

Looking at the code, that should not happen because we are using the $chunk_size parameter of ob_start(). Hm. Or maybe the output buffer is properly flushed, but we need an additional flush() call to flush to the pipe?

Damien Tournoud’s picture

We could try adding a flush() right after:

    fwrite(STDERR, sprintf(DRUSH_BACKEND_PACKET_PATTERN, $data));
greg.1.anderson’s picture

I believe that in #64, concurrency is set to 1, and there is only one process writing log messages at the same time. Perhaps this should be confirmed, though.

greg.1.anderson’s picture

Also, I do not believe that packets are being interleaved; if they were, we would see them interleaved in the output. "Dropped" packets (those that are not interpreted and parsed) are echo'ed to stdout exactly as they are received.

Damien Tournoud’s picture

FileSize
651 bytes

I was poking at all the wrong places. The problem is just that if the master process is too slow to read the output of the children, multiple packets have to be read at the same time and the first and/or last packet in a batch fails to parse.

Here is an easy way to reproduce the problem.

Damien Tournoud’s picture

Status: Active » Needs review
FileSize
748 bytes

In that case, the problem is that the first packet is missing its facing \0, and the last packet its trailing \0.

Removing the trim() within _drush_backend_proc_open() that occurs before reading the backend output seems to fix the issue.

This actually explains all the issues that I have seen: when we read both some packets and the backend output in one go, the facing and/or trailing zeros are discarded, which makes the first and/or last packet fail to parse.

greg.1.anderson’s picture

Ah yes, you are correct. That trim must go. The code path above only runs when the final buffer is processed -- the one that contains the backend results (or at least the start of them). If this buffer also contains one or more backend packets, then these are discarded.

I'll run the unit tests with #55 and see how they fare.

greg.1.anderson’s picture

#55 introduces failures, because an extra end-of-line is inserted in the output in instances where there are no packets or output in the final buffer. In this case, the trailing string contains a single null, which the trim was removing. This null is also removed by drush_backend_parse_packets, so we just make sure that we do the 'empty' test again after all of the packets are removed from the data.

This works fine for Drush's unit tests; how do it still do the right thing with Aegir?

Steven Jones’s picture

I'm re-running the tests locally with a patched and non-patched version of Drush and will report back shortly.

Steven Jones’s picture

Status: Needs review » Reviewed & tested by the community

Victory! I've just tested locally, and the DRUSH_BACKEND messages go away with the patch from #57.

Any chance that the fix could get backported to 5.x too?

Amazing work everyone, sorry that I was only able to test, and didn't have the time to post patches.

Damien Tournoud’s picture

             if (!empty($trailing_string)) {

^ Let's try to avoid this if we can. What about $trailing_string === '' or strlen($trailing_string) == 0?

greg.1.anderson’s picture

Status: Reviewed & tested by the community » Patch (to be ported)

Committed #57, adjusted as suggested in #60.

This could be backported to 7.x-5.x, but note that there are multiple commits here, and we'd need to be sure to include them all.

greg.1.anderson’s picture

Status: Patch (to be ported) » Needs review
FileSize
9.21 KB

Okay, think I got them all. All Drush tests pass; feedback on how this works with Aegir would be appreciated.

greg.1.anderson’s picture

Version: 8.x-6.x-dev » 7.x-5.5

#62 is the backport to 7.x-5.x; setting version.

Steven Jones’s picture

Sure, will apply and test within 12 hours.

boombatower’s picture

Seems to work for me.

Steven Jones’s picture

Version: 7.x-5.5 » 7.x-5.x-dev
Status: Needs review » Reviewed & tested by the community

Tested 5.x with the patch from #62 and it seems to have fixed the DRUSH_BACKEND appearing in the output too.

greg.1.anderson’s picture

Version: 7.x-5.x-dev » 8.x-6.x-dev
Status: Reviewed & tested by the community » Active

Committed #62 to 7.x-5.x.

We can still consider removing concurrency and simplifying logic in Drush 7.

Steven Jones’s picture

Lovely. FWIW I don't think concurrency is actually too much of an issue, and I don't think it complicates this code path too much, but I do think that there's too much magical string handling going on in backend.inc and it's way too hard to follow. I would suggest that cleaning it up so that it was easier to follow what was going on would be better than just removing features to be able remove the corresponding code.

greg.1.anderson’s picture

It would be helpful to know if Aegir runs correctly with --concurrency=4. I expect it likely does now.

#44 contains an idea for simplifying the code that could work without removing concurrency.

Steven Jones’s picture

I would imagine that Aegir would, but we don't make active use of anything that could be run concurrently except for make, by that I mean that all our backend invokes are for a single command on a single target, so there's nothing to run in parallel.

greg.1.anderson’s picture

Earlier it was reported that Aegir was failing with concurrency, but worked when it was set to 1. I think this result was spurious, but it would be good to have confirmation.

Steven Jones’s picture

Sorry, do you know exactly where someone said that concurrency made a difference in Aegir land? This has not been the case in my testing, concurrency or not it's not been working correctly.

greg.1.anderson’s picture

Sorry, my mistake -- setting concurrency to 1 fixed make, not Aegir. I think all is well here -- but simplification would still be a good idea.

Steven Jones’s picture

Ah fine. We actually do a make as part of the Aegir tests, and we've seen that go from having DRUSH_BACKEND in it to those disappearing too, so make is looking good too.

anarcat’s picture

By the way, I found that using the "interactive" backend option hides the backend packets from us and they get properly integrated into the calling process. My guess is that this is because "interactive" uses proc_open() while ... non-interactive uses system() or something silly.

Yet, interactive just doesn't work in Debian and probably other linuxes, as reported elsewhere, because of the crazy `tty` stuff. I have opened a separate issue to remove this obscenity, see #2025527: > `tty` is evil and killed my cat.

anarcat’s picture

Also, just thought I'd mention that right now, on PHP 5.4 Wheezy, the drush 6.x test suite yields a ton of DRUSH_BACKEND packets in the output. This may mean that the whole test suite is compromised and simply not finding errors when it should, because they are lost in those DRUSH_BACKEND packets.

This is reproduced on travis too: https://travis-ci.org/drush-ops/drush/jobs/8107473 - and the test suite is actually failing right now, although travis doesn't detect it:

anarcat@angela:tests$ curl -s 'https://s3.amazonaws.com/archive.travis-ci.org/jobs/8107473/log.txt' | grep -v '"error":null' | grep error
DRUSH_BACKEND:{"error":"DRUSH_COMMAND_INSUFFICIENT_BOOTSTRAP","message":"Command core-cron needs a higher bootstrap level to run - you will need to invoke drush from a more functional Drupal environment to run this command.","packet":"set_error"}
DRUSH_BACKEND:{"error":"DRUSH_COMMAND_NOT_EXECUTABLE","message":"The drush command 'core-cron' could not be executed.","packet":"set_error"}
DRUSH_BACKEND:{"error":"DRUSH_NO_DRUPAL_ROOT","message":"A Drupal installation directory could not be found","packet":"set_error"}
The shell alias echotest uses replacements "%root". You must use this    [error]
Not found: @badalias                                                     [error]
anarcat’s picture

After fighting with this, I have come to believe we should always call drush with proc_open().

greg.1.anderson’s picture

#76: While sub-optimal, it is actually expected to see the DRUSH_BACKEND in the unit tests, because the tests are calling drush --backend directly.

greg.1.anderson’s picture

Also, Drush backend invoke "interactive" mode does not use --backend when calling Drush; you therefore don't see any packets in backend invoke, because no packets are generated. You also won't see any structured results from the command; all you get is the status result code from proc_open, and the raw stdout from the command. If you don't need to parse the Drush log output, or get the structured result of a command, then calling Drush in "interactive" mode is fine.

anarcat’s picture

It is not merely suboptimal - it is wrong. Errors are being skipped, as I demonstrated above.

greg.1.anderson’s picture

The tests in #76 are not wrong; some tests test the negative; the error messages you are seeing are from Drush, not from the test. The test expects Drush to fail.

This is the test that produces The shell alias echotest uses replacements "%root".:

public function testShellAliasReplacementNoAlias() {
    $options = array(
      'config' => UNISH_SANDBOX,
    );
    // echo test has replacements that are not satisfied, so this is expected to return an error.
    $this->drush('echotest', array(), $options, NULL, NULL, self::EXIT_ERROR);
  }

So, the situation is suboptimal, as it often tricks people into thinking that there is an error in the test; however, the test is actually working correctly.

anarcat’s picture

That is just one of the errors, how about this one:

DRUSH_BACKEND:{"error":"DRUSH_COMMAND_INSUFFICIENT_BOOTSTRAP","message":"Command core-cron needs a higher bootstrap level to run - you will need to invoke drush from a more functional Drupal environment to run this command.","packet":"set_error"}

This is in testDispatchUsingAlias(), I don't think this is an expected failure.

I really don't see why drush packets should be spewed on stderr, it just confuses everything.

greg.1.anderson’s picture

Maybe you mean this test, from testTarget() in backendTest.php? This test is expecting core-cron to fail for the reason specified in the log message quoted in #82.

    // Check error propogation by requesting an invalid command (missing Drupal site).
    $this->drush('core-cron', array(), array('backend' => NULL), NULL, NULL, self::EXIT_ERROR);

As for your next point, 'spewed' is kind of a charged word; I'm not completely sure what specific change you are suggesting here. If you mean that stderr output should be hidden in the unit tests, then I agree with you. It is standard behavior for phpunit to emit anything written to stderr, but we could take pains to hide stderr in the unit tests so that the output was cleaner. This improvement would be welcome. If you mean that backend packets should go to stdout, I still disagree per #44. In either event, though, the best way to continue the point would be with a patch.

anarcat’s picture

Right, sorry for the loaded words, I guess I get pretty upset with this. Thank you for your continued response and help with this issue.

I guess I never answered #44, let's see if I can do that now. To clarify: I don't believe drush packets should be sent to stderr, not only in the unit tests, but in drush as a whole.

Sending everything to STDOUT has no benefit. Why would you want to do that? Is drush foo --backend | drush bar even a real use-case? Drush itself does not have any mechanisms to parse the backend output from stdin. Does something in Aegir do that? If so, you could add a 2?&1 if you really needed it.

Yes, there is a real use case for drush | drush. We use the following contraption to safely transmit database credentials during the Debian package install process:

            su -s /bin/sh aegir -c "cd $VARLIB && drush hostmaster-install $FLAGS --backend $site_uri 2>&1 | drush backend-parse $DEBUG" <<EOF
{ "yes": 1,
  "version": "$VERSION",
  "aegir_db_host": "$AEGIR_DB_HOST",
  "aegir_db_user": "$AEGIR_DB_USER",
  "aegir_db_pass": "$AEGIR_DB_PASS",
  "client_email": "$EMAIL"
}
EOF

The reason this is done this way is that this is the only way to safely pass credentials so they do not show up in the process list, something drush has problems with elsewhere (e.g. #671906: mysql credentials leak in drush sqlc and archive-dump). --backend is necessary for that reason here - if there would be another way to make drush read JSON from stdin, that would be useful to our use case.

"backend-parse" is something that parses backend output from stdin. The source can be found here, it's quite trivial (and should probably be merged into core): http://drupalcode.org/project/provision.git/blob/HEAD:/parse.backend.inc

But this is not the only use case: a bunch of other places call drush using system() or similar, which doesn't properly connect the stderr pipe, which leaks to stderr. In our case, this is what is happening with drush hosting-queued, an alternative queue runner:

http://drupalcode.org/project/hosting.git/blob/refs/heads/6.x-2.x:/queue...

I suspect there are other occurrences in drush, but I am not familiar enough with all the commands to get one on top of my head.

So yes, there are use cases. :)

Regarding obscuring the code, the last two commits fixed an off-by-one error, and removed a 'trim' that was stripping characters that should not be stripped. I agree that incremental output and concurrency complicated and obscured backend invoke, but the commits here did not further complicate or obscure it.

You're right, this was uncalled for, and I apologize. I do have the feeling this code is way too complicated, but you can hardly be blamed for that. Thanks again for trying to fix it. :)

If you want to simplify backend invoke without removing features, I would recommend that we keep backend packets on STDERR as they currently are, and adjust backend invoke to read packets from STDERR (remove the 2>&1 from all of the places that Drush adds it when doing backend invokes). If we put the packets all on STDERR, then we could just read lines of text from STDERR (unlimited buffer size, everything up to the eoln at the end of each packet) and discard any line that was not a packet. Then all of the split-packet code would disappear.

If we put packets on stderr, we need to use proc_open() everywhere, that's for sure. That said, I don't understand why we really need to send data on STDERR. It seems like just a workaround, as the packets are lost instead of parsed... In my mind, we should use output buffering to avoid the "headers already sent message"... maybe I missed why this wasn't appropriate.

Removing concurrency has also been discussed. This feature can be helpful when doing large batch processing, but I still would not object if we removed it.

If concurrency works, I don't mind it being there. :) But in my mind, I can do concurrency on top of drush, not sure it should handle multiple children itself...

However, I think that things work here well enough to release. It would be a bad idea to rewrite backend invoke at this point, unless doing so is absolutely necessary, as it would take a long time to write, review, debug and commit. Let's get Drush 6 out, and consider further improvements to backend invoke in Drush 7, as time and availability of contributors permits.

Agreed. I wish I had time to rewrite the whole backend system again, but the truth is that I am struggling to get Aegir 2.x out the door and I can't afford to spend time on a full rewrite now. :)

Should we close this issue and open a new one specifically for refactoring backend.inc?

greg.1.anderson’s picture

This issue is specifically about reducing the complexity of backend invoke, so I think we can put refactoring work here.

I can agree with a lot of what is in #84. I don't think that ob_start is necessarily going to work here; note that _drush_bootstrap_drupal_full() is already using ob_start, and the message still is coming through. I think that my idea to suppress Drush log messages until after the Drupal bootstrap is dubious; it could run into problems with Drush commands that don't need to bootstrap Drupal, or that call bootstrap_max. If you come up with a patch that works, by one technique or another, we could consider it; it seems much simpler to keep the log messages on STDERR. We could make Drush read log messages from STDERR, and commands that want to pipe can use 2>&1.

I don't think that we really need drush backend-parse (except maybe for debugging?); what would be more useful would be a way to parse options from stdin without using --backend, which I think is your use-case above. It's harder to prevent credentials leaks in remote sql-syncs, but I agree that should ideally be fixed (separate issue, as quoted above).

greg.1.anderson’s picture

Status: Active » Closed (won't fix)
Issue tags: +Needs migration

This issue was marked closed (won't fix) because Drush has moved to Github.

If desired, you may copy this bug to our Github project and then post a link here to the new issue. Please also change the status of this issue to closed (duplicate).

Please ask support questions on Drupal Answers.

greg.1.anderson’s picture

Issue summary: View changes

Adding full command line

greg.1.anderson’s picture

Status: Closed (won't fix) » Closed (duplicate)