Closed (fixed)
Project:
Drupal core
Version:
7.x-dev
Component:
simpletest.module
Priority:
Critical
Category:
Bug report
Assigned:
Reporter:
Created:
27 Aug 2009 at 01:08 UTC
Updated:
31 Mar 2011 at 13:56 UTC
Jump to comment: Most recent file
Comments
Comment #1
boombatower commentedI thought about this the other day, and you most likely have confirmed it.
I think the current way of detecting fatal errors works 100% on single process running, but when running multiple processes concurrently like bot does it may not always work when the errors occur on the other side of an HTTP request. This is not a pifr issue, but instead a limitation of the error catching with simpletest.
This is a really fun issue to solve, but the solution here is to make it keep track of all the prefixes or something like that and then check all the error logs, or to use a common one, but I'm not sure if that will make write errors.
Comment #2
boombatower commentedComment #3
damien tournoud commentedThe only issue is that PIFR 1.x parse the result of running the run-tests shell script, and do not load the assertions that are in the DB. Because of the fatal error, the test results for a particular test are truncated. This will be solved by rolling PIFR 2.x. By the way, this only happens when fatal errors occurs on the parent site.
Comment #4
boombatower commentedYes, it does only happen on parent site, which is what I meant by on the request side.
I thought it dump out the message in run-tests...I'll double check, but I am fairly certain it does.
Also the limitation I brought up is an actually issue and can occur.
Either way both of these problems are with SimpleTest and not PIFR.
So we need to double check run-tests.sh and fix the issue I mentioned about concurrency.
Comment #5
carlos8f commentedsubscribe.
Comment #6
carlos8f commentedI hope we can get some attention to this issue, since it's obviously horrible for quality assurance. See #655814: Test: Error reporting with SimpleTest/PIFR where I did some testing on this.
Comment #7
chx commentedI wonder what the bot would say. This is a very C inspired patch hence the name :) Note that i registered printf to shutdown with because print is not callable like that.
Comment #9
chx commentedBah.
Comment #11
chx commentedWe now test for content-type for text/html before printing.
Comment #12
chx commentedOK, OK
Comment #13
carlos8f commentedHeader names returned from drupalGetHeaders() are normalized to lowercase. This has to be 'content-type' to work.
Should probably be drupal_get_http_header('content-type'), to be consistent with lowercase normalization on HTTP header names.
Why use a null byte? Seems kinda sneaky :) Why not use an HTML comment like
<!-- SimpleTest request succeeded -->?This patch is creative, but after getting it to work, unfortunately I couldn't verify that it improved error reporting at all. It seems like it only attempts to address incomplete drupalGet() responses due to fatal errors. From my testing today, PIFR currently catches these, but run-tests.sh does not (with concurrency mode on *or* off).
I'm working with run-tests.sh right now and noticing that it actually passes tests that fail both the web test interface and the PIFR bot. See http://qa.drupal.org/pifr/test/22368 for one example: this actually passed run-tests.sh with 16/0/0 passes/fails/exceptions in DrupalErrorCollectionUnitTest. How strange...
This review is powered by Dreditor.
Comment #14
boombatower commentedAssuming this is still about what I wrote the issue is when:
Comment #15
carlos8f commented@boombatower: i used three things: run-tests.sh with --concurrency 1, --concurrency 2, and no concurrency set. run-tests.sh passed one test class 16/0/0 while PIFR failed it 16/1/0 in each case. Unfortunately I can't reproduce exactly what command syntax I was using and with what patches, but I'll see if I can document it more.
I suspect it might have to do with batch size, i.e. running with a TestGroup or --class TestClass.
Anyway, I'm pretty determined to get to the bottom of this :)
Comment #16
boombatower commentedAs I attempted to document above, I can encounter the error or not even if failed test and concurrency > 1.
** > 1 simply means there is a chance. **
Essentially what has to happen is that a test with the fatal error has to complete after another test has started, thus the "last_prefix" field no longer points to the test prefix that had the fail, and it is never picked up.
There are two solutions:
1) Different method of detecting fatal errors, (not terribly simple)
2) Better way of grabbing errors from files.
As for #2 it would by simpler to pull from all files, but then we loss track of what test they are for and such, so we either need a place to store that array, maybe run-tests.sh only (prefer not, but web ui can't run multiple concurrency so doesn't have this issue).
Comment #17
carlos8f commentedThanks for the elaboration. As far as fatals in the .test code (which I think is the main issue, since PIFR has caught the ones in drupalGet() responses so far) it's extremely reproducible that entire test classes drop out of the PIFR results, so my thought is to have PIFR mark the entire class as failed if it enumerated it at the beginning and it didn't come back in the results. That would be like a 'Detect aborted test' check.
I will have more to say after I've installed PIFR locally and played around with it.
Comment #18
chx commentedcarlos8f, i used a 0 byte because we are writing after the so anything you write is illegal html anyways and need to be sliced off. I suspected something was wrong with the code and lower case it is :/ the problem with the latest path is the lack of Content-Type does not trigger the error handler, this needs investigation.
Comment #19
carlos8f commented@chx: since you're detecting and filtering out the null byte before parse() is called on the HTML, there seems to be no worry of invalid HTML. Not sure what you mean by lack of Content-Type, the content type is 'text/html; charset=utf-8' as far as I see. Anyhow, glad to have you working on this.
Comment #20
chx commentedWell I would be glad if others would now work on this :) What I meant is that the current patch does
if (isset($headers['Content-Type']) && $headers['Content-Type'] == 'text/html; charset=utf-8')but what if you have a fatal error and no Content-Type header? The previous patch ommitted this and then I got many failures. This requires a bit of investigation.Comment #21
carlos8f commentedOK, so the latest incident involves #653940: Clean-up: Tests do not report all errors in which a fatal error was introduced in comment.test. It crashes browser-based SimpleTest as well as run-tests.sh, but PIFR somehow marked the patch as 'passed' and it was committed. This is unacceptable and really requires some attention.
Comment #22
glacialheart commentedsubscribing
Comment #23
boombatower commentedThis should fix problem once I figure out how to pull db_prefixes used.
Edit: patch is a bit off, but doesn't work yet anyway.
Comment #24
boombatower commentedYea, so I've been staring at the code...there is no way to relate prefixes to test info, unless we log in in some manor. The reason being that the prefix is created in DWTC and if that process dies..there goes the relation.
The data must either be logged in some manor (several possibilities), or we refactor the code as I have suggested and attempted to create the environments outside of test run and place test run in isolated process.
This provides a number of advantages including:
1) performance gain from re-using schemes
2) easy tracking of errors
3) complete and proper isolation which will remove static issues completely
Comment #26
carlos8f commentedI'm at work on a patch for PIFR that will implement the sanity check I described in #17. That will plug the gap, but the SimpleTest concurrency improvements will still be necessary for proper error reporting.
Comment #27
sunsubscribing
Comment #28
carlos8f commentedOK, I think I have found a definitive solution in SimpleTest:
My idea is to preemptively fail every test method. If execution does not complete, the fail record is left in the database!
I spent some quality time with PIFR last night (strictly professional of course) and verified boombatower's reports that fatal errors don't get read out of the logs reliably with concurrency > 1. Since my test machine had 2 cores, I set concurrency=2. A fatal error was reported about half the time. With the production test clients, the situation is different because of higher concurrency settings. It seems that the fatal error is almost never caught in production, where it is most vital to catch it.
The test results of this patch are very conclusive, namely that the fails are not caused by the patch itself, but by fatal errors in HEAD. The Comment tests are indeed currently broken as I described in http://drupal.org/node/653940#comment-2366180, and this is the first time it has been revealed!
I also benchmarked the effect of the extra queries, and determined that it doesn't noticeably increase the runtime of tests. So that's also good.
Comment #30
carlos8f commentedAttached a screenshot of what the results look like.
Comment #31
carlos8f commentedNote that #28 will need a re-test after http://drupal.org/node/653940#comment-2369428 has landed.
Comment #32
sun+1 from me, very nice patch
Comment #33
dries commentedComment #24 sounds compelling -- care to elaborate boombatower?
Comment #34
chx commentedI emailed Jimmy because I had no queue access about something similar
And then at the end iterate simpletest_fatals and there you have it. I am not sure I like the idea of inserting and deleting an assert like that. You are taking away any OOP advantages we had by breaking the encapsulation.
Comment #35
carlos8f commented@chx, see my patch in #28 that implements this :) i've pinged boombatower and waiting to hear back.
I think what boombatower's overhaul entails is basically isolating the test runs as much as possible. That would mean generating the testId in a parent process, spawning a child process to run the test and nothing more, and then doing cleanup in the parent process which would not have crashed if a fatal error occurred in the test. This is perfectly logical since one of SimpleTest's traditional flaws is that it isn't properly isolated from PHP, thus a fatal error ruins everything unless there are multiple layers of PHP to provide a fallback.
I didn't understand the 'performance gain from re-using of schemes' part though. Elaboration, yes.
Comment #36
carlos8f commentedchx and I talked in IRC and decided to add a DrupalTestCase::deleteAssert() method since deleting it manually was kind of messy. DrupalTestCase::insertAssert() now returns a message ID to make it easier to delete.
Comment #38
carlos8f commentedOnce again, the fail in #36 is caused by BROKEN TESTS ELSEWHERE:
Running Option widget tests locally:
assertFieldValues() is in FieldTestCase, but OptionsWidgetsTestCase does NOT extend FieldTestCase. Problem!
This patch is worth it's weight in gold, as far as I'm concerned :)
Comment #39
carlos8f commentedLatest development in the saga:
http://drupal.org/node/552436#comment-2371556
Comment #41
chx commentedThis is very pretty especially with a specific id being deleted.
Comment #42
webchickThis is awesome-tastic awesome souflé smothered in awesomesauce.
Thanks for your great work on this, carlos! Committed to HEAD.
Comment #43
boombatower commentedSo we don't actually catch the fatal error, hmm. This is a nice temporary patch, but it should record the actual error, otherwise the fact that qa.d.o displays assertions will be useless and I will get complaints that the bot fails and they don't know why.
Glad to see some initiative in providing a quick fix.
Comment #44
carlos8f commentedboombatower:
It now points them to the file, method, and start line, and tells them 'fatal error', although obviously the actual error would be helpful. In a perfect world, every developer that writes a patch would run simpletest locally (at least with the tests that relate to their patch) before they submit to d.o. That way they would see the actual error 100% of the time and save our bot queue the trouble :)
Comment #45
boombatower commentedThat was the reason I didn't both including the assertion messages in the first place (and had issues with XML-RPC scaling). Yet, from my experience and complaints received this definitely isn't a perfect world. I agree with you, developers should debug tests locally.
Comment #46
chx commentedSo yeah but we needed pifr v2 and its cool detailed report because sometimes the bot fails differently to local to begin with. But, it's fine to know at least which method fatal'd.
Comment #48
chrisns commentedI did this patch which solved it for us
https://gist.github.com/896344/
simple but seems to work for now