Comments

boombatower’s picture

Project: Project Issue File Review » Drupal core
Version: 6.x-1.x-dev » 7.x-dev
Component: Code » simpletest.module

I thought about this the other day, and you most likely have confirmed it.

I think the current way of detecting fatal errors works 100% on single process running, but when running multiple processes concurrently like bot does it may not always work when the errors occur on the other side of an HTTP request. This is not a pifr issue, but instead a limitation of the error catching with simpletest.

This is a really fun issue to solve, but the solution here is to make it keep track of all the prefixes or something like that and then check all the error logs, or to use a common one, but I'm not sure if that will make write errors.

boombatower’s picture

Title: PIFR not catching fatal error in this patch » SimpleTest does not *always* catch PHP fatal errors on HTTP request side when running in concurrency mode
damien tournoud’s picture

Title: SimpleTest does not *always* catch PHP fatal errors on HTTP request side when running in concurrency mode » PIFR 1.x do not catch PHP fatal errors on parent site
Project: Drupal core » Project Issue File Review
Version: 7.x-dev » 6.x-1.x-dev
Component: simpletest.module » Code
Status: Active » Closed (won't fix)

The only issue is that PIFR 1.x parse the result of running the run-tests shell script, and do not load the assertions that are in the DB. Because of the fatal error, the test results for a particular test are truncated. This will be solved by rolling PIFR 2.x. By the way, this only happens when fatal errors occurs on the parent site.

boombatower’s picture

Title: PIFR 1.x do not catch PHP fatal errors on parent site » SimpleTest does not *always* catch PHP fatal errors on HTTP request side when running in concurrency mode
Project: Project Issue File Review » Drupal core
Version: 6.x-1.x-dev » 7.x-dev
Component: Code » simpletest.module
Status: Closed (won't fix) » Active

Yes, it does only happen on parent site, which is what I meant by on the request side.

I thought it dump out the message in run-tests...I'll double check, but I am fairly certain it does.

Also the limitation I brought up is an actually issue and can occur.

Either way both of these problems are with SimpleTest and not PIFR.

So we need to double check run-tests.sh and fix the issue I mentioned about concurrency.

carlos8f’s picture

subscribe.

carlos8f’s picture

Title: SimpleTest does not *always* catch PHP fatal errors on HTTP request side when running in concurrency mode » Fatal PHP errors don't cause tests to fail
Priority: Normal » Critical

I hope we can get some attention to this issue, since it's obviously horrible for quality assurance. See #655814: Test: Error reporting with SimpleTest/PIFR where I did some testing on this.

chx’s picture

Status: Active » Needs review
StatusFileSize
new2.05 KB

I wonder what the bot would say. This is a very C inspired patch hence the name :) Note that i registered printf to shutdown with because print is not callable like that.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
StatusFileSize
new2.06 KB

Bah.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
StatusFileSize
new2.6 KB

We now test for content-type for text/html before printing.

chx’s picture

StatusFileSize
new2.6 KB

OK, OK

carlos8f’s picture

Status: Needs review » Needs work
+++ modules/simpletest/drupal_web_test_case.php	2009-12-10 19:14:57 +0000
@@ -1285,8 +1285,18 @@ class DrupalWebTestCase extends DrupalTe
+    $headers = $this->drupalGetheaders();
+    if (isset($headers['Content-Type']) && $headers['Content-Type'] == 'text/html; charset=utf-8') {

Header names returned from drupalGetHeaders() are normalized to lowercase. This has to be 'content-type' to work.

+++ includes/bootstrap.inc	2009-12-10 18:54:43 +0000
@@ -2247,3 +2250,9 @@ function drupal_static_reset($name = NUL
+  if (drupal_get_http_header('Content-Type') == 'text/html; charset=utf-8') {

Should probably be drupal_get_http_header('content-type'), to be consistent with lowercase normalization on HTTP header names.

+++ includes/bootstrap.inc	2009-12-10 18:54:43 +0000
@@ -2247,3 +2250,9 @@ function drupal_static_reset($name = NUL
+    print chr(0);

Why use a null byte? Seems kinda sneaky :) Why not use an HTML comment like <!-- SimpleTest request succeeded -->?

This patch is creative, but after getting it to work, unfortunately I couldn't verify that it improved error reporting at all. It seems like it only attempts to address incomplete drupalGet() responses due to fatal errors. From my testing today, PIFR currently catches these, but run-tests.sh does not (with concurrency mode on *or* off).

I'm working with run-tests.sh right now and noticing that it actually passes tests that fail both the web test interface and the PIFR bot. See http://qa.drupal.org/pifr/test/22368 for one example: this actually passed run-tests.sh with 16/0/0 passes/fails/exceptions in DrupalErrorCollectionUnitTest. How strange...

This review is powered by Dreditor.

boombatower’s picture

Assuming this is still about what I wrote the issue is when:

run-tests.sh --concurrency > 1
carlos8f’s picture

@boombatower: i used three things: run-tests.sh with --concurrency 1, --concurrency 2, and no concurrency set. run-tests.sh passed one test class 16/0/0 while PIFR failed it 16/1/0 in each case. Unfortunately I can't reproduce exactly what command syntax I was using and with what patches, but I'll see if I can document it more.

I suspect it might have to do with batch size, i.e. running with a TestGroup or --class TestClass.

Anyway, I'm pretty determined to get to the bottom of this :)

boombatower’s picture

As I attempted to document above, I can encounter the error or not even if failed test and concurrency > 1.

** > 1 simply means there is a chance. **

Essentially what has to happen is that a test with the fatal error has to complete after another test has started, thus the "last_prefix" field no longer points to the test prefix that had the fail, and it is never picked up.

There are two solutions:
1) Different method of detecting fatal errors, (not terribly simple)
2) Better way of grabbing errors from files.

As for #2 it would by simpler to pull from all files, but then we loss track of what test they are for and such, so we either need a place to store that array, maybe run-tests.sh only (prefer not, but web ui can't run multiple concurrency so doesn't have this issue).

carlos8f’s picture

Thanks for the elaboration. As far as fatals in the .test code (which I think is the main issue, since PIFR has caught the ones in drupalGet() responses so far) it's extremely reproducible that entire test classes drop out of the PIFR results, so my thought is to have PIFR mark the entire class as failed if it enumerated it at the beginning and it didn't come back in the results. That would be like a 'Detect aborted test' check.

I will have more to say after I've installed PIFR locally and played around with it.

chx’s picture

carlos8f, i used a 0 byte because we are writing after the so anything you write is illegal html anyways and need to be sliced off. I suspected something was wrong with the code and lower case it is :/ the problem with the latest path is the lack of Content-Type does not trigger the error handler, this needs investigation.

carlos8f’s picture

@chx: since you're detecting and filtering out the null byte before parse() is called on the HTML, there seems to be no worry of invalid HTML. Not sure what you mean by lack of Content-Type, the content type is 'text/html; charset=utf-8' as far as I see. Anyhow, glad to have you working on this.

chx’s picture

Well I would be glad if others would now work on this :) What I meant is that the current patch does if (isset($headers['Content-Type']) && $headers['Content-Type'] == 'text/html; charset=utf-8') but what if you have a fatal error and no Content-Type header? The previous patch ommitted this and then I got many failures. This requires a bit of investigation.

carlos8f’s picture

Status: Needs work » Active

OK, so the latest incident involves #653940: Clean-up: Tests do not report all errors in which a fatal error was introduced in comment.test. It crashes browser-based SimpleTest as well as run-tests.sh, but PIFR somehow marked the patch as 'passed' and it was committed. This is unacceptable and really requires some attention.

glacialheart’s picture

subscribing

boombatower’s picture

StatusFileSize
new1.52 KB

This should fix problem once I figure out how to pull db_prefixes used.

Edit: patch is a bit off, but doesn't work yet anyway.

boombatower’s picture

Yea, so I've been staring at the code...there is no way to relate prefixes to test info, unless we log in in some manor. The reason being that the prefix is created in DWTC and if that process dies..there goes the relation.

The data must either be logged in some manor (several possibilities), or we refactor the code as I have suggested and attempted to create the environments outside of test run and place test run in isolated process.

This provides a number of advantages including:
1) performance gain from re-using schemes
2) easy tracking of errors
3) complete and proper isolation which will remove static issues completely

Status: Active » Needs review

biocomp.pat requested that failed test be re-tested.

carlos8f’s picture

Status: Needs review » Needs work

I'm at work on a patch for PIFR that will implement the sanity check I described in #17. That will plug the gap, but the SimpleTest concurrency improvements will still be necessary for proper error reporting.

sun’s picture

carlos8f’s picture

Assigned: Unassigned » carlos8f
Status: Needs work » Needs review
StatusFileSize
new1.9 KB

OK, I think I have found a definitive solution in SimpleTest:

     // Iterate through all the methods in this class.
-    foreach (get_class_methods(get_class($this)) as $method) {
+    foreach (get_class_methods($class) as $method) {
       // If the current method starts with "test", run it - it's a test.
       if (strtolower(substr($method, 0, 4)) == 'test') {
+        // Insert a fail record. This will be deleted on completion to ensure
+        // that testing completed.
+        $method_info = new ReflectionMethod($class, $method);
+        $caller = array(
+          'file' => $method_info->getFileName(),
+          'line' => $method_info->getStartLine(),
+          'function' => $class . '->' . $method . '()',
+        );
+        DrupalTestCase::insertAssert($this->testId, $class, FALSE, t('The test did not complete due to a fatal error.'), 'Completion check', $caller);

My idea is to preemptively fail every test method. If execution does not complete, the fail record is left in the database!

I spent some quality time with PIFR last night (strictly professional of course) and verified boombatower's reports that fatal errors don't get read out of the logs reliably with concurrency > 1. Since my test machine had 2 cores, I set concurrency=2. A fatal error was reported about half the time. With the production test clients, the situation is different because of higher concurrency settings. It seems that the fatal error is almost never caught in production, where it is most vital to catch it.

The test results of this patch are very conclusive, namely that the fails are not caused by the patch itself, but by fatal errors in HEAD. The Comment tests are indeed currently broken as I described in http://drupal.org/node/653940#comment-2366180, and this is the first time it has been revealed!

I also benchmarked the effect of the extra queries, and determined that it doesn't noticeably increase the runtime of tests. So that's also good.

Status: Needs review » Needs work

The last submitted patch failed testing.

carlos8f’s picture

Status: Needs work » Needs review
StatusFileSize
new15.7 KB

Attached a screenshot of what the results look like.

carlos8f’s picture

Note that #28 will need a re-test after http://drupal.org/node/653940#comment-2369428 has landed.

sun’s picture

+1 from me, very nice patch

dries’s picture

Comment #24 sounds compelling -- care to elaborate boombatower?

chx’s picture

I emailed Jimmy because I had no queue access about something similar

       $this->setUp();
       db_insert('simpletest_fatals')->values(array('testId' => $this->testId, 'method' => $method))->execute();
       try {
         $this->$method();
         // Finish up.
       }
       catch (Exception $e) {
         $this->exceptionHandler($e);
       }
       db_delete('simpletest_fatals')->condition('testId', $this->testId)->condition('method', $method)->execute();
       $this->tearDown();

And then at the end iterate simpletest_fatals and there you have it. I am not sure I like the idea of inserting and deleting an assert like that. You are taking away any OOP advantages we had by breaking the encapsulation.

carlos8f’s picture

@chx, see my patch in #28 that implements this :) i've pinged boombatower and waiting to hear back.

I think what boombatower's overhaul entails is basically isolating the test runs as much as possible. That would mean generating the testId in a parent process, spawning a child process to run the test and nothing more, and then doing cleanup in the parent process which would not have crashed if a fatal error occurred in the test. This is perfectly logical since one of SimpleTest's traditional flaws is that it isn't properly isolated from PHP, thus a fatal error ruins everything unless there are multiple layers of PHP to provide a fallback.

I didn't understand the 'performance gain from re-using of schemes' part though. Elaboration, yes.

carlos8f’s picture

StatusFileSize
new2.88 KB

chx and I talked in IRC and decided to add a DrupalTestCase::deleteAssert() method since deleting it manually was kind of messy. DrupalTestCase::insertAssert() now returns a message ID to make it easier to delete.

Status: Needs review » Needs work

The last submitted patch failed testing.

carlos8f’s picture

Once again, the fail in #36 is caused by BROKEN TESTS ELSEWHERE:

The test did not complete due to a fatal error.	Completion check	options.test	60	OptionsWidgetsTestCase->testRadioButtons()

Running Option widget tests locally:

Fatal error: Call to undefined method OptionsWidgetsTestCase::assertFieldValues()

assertFieldValues() is in FieldTestCase, but OptionsWidgetsTestCase does NOT extend FieldTestCase. Problem!

This patch is worth it's weight in gold, as far as I'm concerned :)

carlos8f’s picture

Status: Needs work » Needs review

Latest development in the saga:

http://drupal.org/node/552436#comment-2371556

chx requested that failed test be re-tested.

chx’s picture

Status: Needs review » Reviewed & tested by the community

This is very pretty especially with a specific id being deleted.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

This is awesome-tastic awesome souflé smothered in awesomesauce.

Thanks for your great work on this, carlos! Committed to HEAD.

boombatower’s picture

So we don't actually catch the fatal error, hmm. This is a nice temporary patch, but it should record the actual error, otherwise the fact that qa.d.o displays assertions will be useless and I will get complaints that the bot fails and they don't know why.

Glad to see some initiative in providing a quick fix.

carlos8f’s picture

boombatower:

It now points them to the file, method, and start line, and tells them 'fatal error', although obviously the actual error would be helpful. In a perfect world, every developer that writes a patch would run simpletest locally (at least with the tests that relate to their patch) before they submit to d.o. That way they would see the actual error 100% of the time and save our bot queue the trouble :)

boombatower’s picture

That was the reason I didn't both including the assertion messages in the first place (and had issues with XML-RPC scaling). Yet, from my experience and complaints received this definitely isn't a perfect world. I agree with you, developers should debug tests locally.

chx’s picture

So yeah but we needed pifr v2 and its cool detailed report because sometimes the bot fails differently to local to begin with. But, it's fine to know at least which method fatal'd.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

chrisns’s picture

I did this patch which solved it for us
https://gist.github.com/896344/
simple but seems to work for now