Support from Acquia helps fund testing for Drupal Acquia logo

Comments

keith.smith’s picture

Title: Abstract SimpleTest browser in to it's own object » Abstract SimpleTest browser in to its own object

Status: Needs review » Needs work

The last submitted patch failed testing.

dmitrig01’s picture

should apply

dmitrig01’s picture

oops, forgot browser.inc

Crell’s picture

Subscribing. No time to really review right now. Glancing at the code quickly my first reaction is that most of the properties of DrupalBrowser should be protected, not public, and have proper accessors.

+1 on the concept, though. Potentially couldn't this even replace drupal_http_request?

boombatower’s picture

+1 on concept, postponing #330504: Create auto-install script until completed.

I'll review thoroughly later.

boombatower’s picture

Status: Needs work » Needs review

#3 & #4 seem to indicate you would like it reviewed.

boombatower’s picture

/cvs/drupal/drupal/ would appear to be an issue, not to mention do patches add to CHANGELOG.txt? I thought webchick or Dries did that, maybe I'm mistaken.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Patches do add to CHANGELOG.txt - although often at the last minute when requested. This seems like enough of a new feature it'd warrant a mention though.

Damien Tournoud’s picture

I like the idea of having a better HTTP client / browser available, but this one relies on cURL, so I don't think it can fly as-is.

boombatower’s picture

@catch: thanks for the clarification.

@Damien: I agree having a generic HTTP client would be nice (as I have uses for it outside of SimpleTest), but what do you mean by it relies on cURL so that is no go?

Damien Tournoud’s picture

@boombatower: I mean that Drupal 7 currently doesn't require the cURL extension to be available. Do we really want to make that a requirement? Because replacing drupal_http_request() by something cleaner, faster and more powerful would be really nice.

catch’s picture

cURL isn't available on all systems, and I don't think we should up requirements for Drupal as a whole just for the browser.

To me this means having the browser abstracted with the same requirements ought to be fine - maybe even aggregator module could require it with a pluggable browser in contrib, but not core core stuff.

boombatower’s picture

I really like the idea personally and cURL doesn't seem like an out of the question library does?

boombatower’s picture

Another idea given in IRC would be to make a core browser.module, but I personal think it should replace the drupal_http_request functionality.

Note: also requires simplexml.

boombatower’s picture

Modules that currently use drupal_http_request() in core:

  • xmlrpc
  • aggregator
  • openid
  • simpletest
  • system
  • update

They all have their own parsing code and such which could be much simplified/reduced with DrupalBrowser and would keep all the code in one place.

Summary (requirements): curl, simplexml for PHP.

Question: Should core use DrupalBrowser?

Definitely refactor the browser so that it is not only usable by SimpleTest, but by other developers. Perhaps browser.module. Allow it to function with drupal_http_request so that the parsing/traversing capabilities can be used.

Dave Reid’s picture

I had a plan for a #64866: Pluggable architecture for drupal_http_request(). This issue and that one could probably work together. The abstract browser could be used as one of the 'pluggable' methods for drupal_http_request.

boombatower’s picture

So looks like we need pluggable backends for drupal_http_request: one for curl and one with fsockets (then a setting to use one or the other depending on the host). The DrupalBrowser will always use curl and will do more, but core will only use it to parse with simplexml stuff.

chx’s picture

This is another of those pluggable frameworks we struggle with. If this is an interface (and I think it is) then the methods we need are get and post, we need a property which shows whether this browser is capable of keeping a session (curl can as it knows about cookies). We should ditch the socket based implementation and instead use the HTTP stream wrapper. Basic iimplementation is file_get_contents($url) and the meta data is going to be in $GLOBALS['http_response_header'] which we can copy into our object. curl implementation, in simpletest directory, uses curl as we have it.

chx’s picture

Oh and parsing is totally different from this browser.

boombatower’s picture

Alright, I'll start working on this.

boombatower’s picture

One thing we need to make sure of is that the browser works fine with url's other than drupal ones, currently it only does partially.

boombatower’s picture

Local code is coming along, but I'll hold of with patch for a bit.

boombatower’s picture

FileSize
8.08 KB

Here is a very early patch, just to get an idea of my direction and see what other think.

DamZ noted that I should look at: http://pear.php.net/package/HTTP_Request2. Any opinions: should it be abstracted as much as http_request2 lib or more like what I have?

boombatower’s picture

Assigned: dmitrig01 » boombatower
FileSize
10.51 KB

More functional browser (minimal POST supported), API refactoring.

Direction looked at by chx.

boombatower’s picture

FileSize
22.35 KB

Needs:
1) A bit more documentation and filling out.
2) Support for uploads
3) Socket backend
4) Setting in admin page
5) Some of the items in the TODO list, included in patch

This will due a post request correct now. The browser is designed to correctly interpret urls and actions for non-drupal paths as well (even interprets base HTML element).

Bed time, 4:17 am here.

boombatower’s picture

FileSize
25.9 KB

This took a bunch of tinkering to get POST to work with wrapper as I tried to use http_build_query() to encode fields, but later found out when I tried to replace the code in the curl backend that it failed there to. Issues was it uses & by default and I need an actual &. Just changed to http_build_query($fields, NULL, '&')

It would be possible to support cookies in stream backend, but we can worry about that later. All other items still apply.

boombatower’s picture

FileSize
27.49 KB

Supports proper loading of wrappers and multiple (different) wrapper browsers can be open at the same time.

Not sure where to put admin setting, but code runs off browser_wrapper currently.

Turned the wrapper backend info() method into a hook and implemented in system module.

boombatower’s picture

FileSize
39.13 KB

This patch starts the integration with DrupalWebTestCase, which will prove that code works and tell me what else I need to add.

The included, temporary, browser.test proves that it works with the basis DrupalWebTestCase.

I have started cleaning things out, very slowly, so don't review the DrupalWebTestCase changes for code style and such, just bugs. The block test currently gets 24 passes, 18 fails, and 0 exceptions. It does several posts before the db craps and I get:

Fatal error: Uncaught exception 'PDOException' with message 'SELECT 1 FROM {blocked_ips} WHERE ip = :ip - Array ( [:ip] => 127.0.0.1 ) SQLSTATE[42S02]: Base table or view not found: 1146 Table 'drupal-7.simpletest719844blocked_ips' doesn't exist' in /home/jimmy/software/php/drupal-7/includes/database/database.inc:504 Stack trace: #0 /home/jimmy/software/php/drupal-7/includes/database/database.inc(1552): DatabaseConnection->query('SELECT 1 FROM {...', Array, Array) #1 /home/jimmy/software/php/drupal-7/includes/bootstrap.inc(1012): db_query('SELECT 1 FROM {...', Array) #2 /home/jimmy/software/php/drupal-7/includes/bootstrap.inc(1114): drupal_is_denied('127.0.0.1') #3 /home/jimmy/software/php/drupal-7/includes/bootstrap.inc(1059): _drupal_bootstrap(3) #4 /home/jimmy/software/php/drupal-7/index.php(21): drupal_bootstrap(9) #5 {main} thrown in /home/jimmy/software/php/drupal-7/includes/database/database.inc on line 504 

Of course being SimpleTest it found a number of bugs in my code during the implementation. :)

boombatower’s picture

Status: Needs work » Needs review
FileSize
55.26 KB

This still needs a lot of work, but I'd like to see how many tests fail with this in place so far.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

Status: Needs work » Needs review
FileSize
31.03 KB

In order to keep the patch size down I have remove the SimpleTest integration and will just focus on browser and tests. Once is at a good point then we can commit and then work on a followup patch to replace SimpleTest browser.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

Status: Needs work » Needs review
FileSize
31.11 KB

Interesting reasons for why it was passing locally...anyhow. The reason it failed on test box is that the browser was accessing the host drupal site and not the prefixed database since it never set the prefix in the test (as DrupalWebTestCase does that).

So added and walah.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

Status: Needs work » Needs review
FileSize
31.1 KB

I get the feeling something related to curl craps.

Removed curl form test to see.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

Status: Needs work » Needs review
FileSize
42.46 KB

I expect this to still fail on test client (I need to investigate that), but it passes locally. Added a TON of documentation.

I still need to add a bit more documentation, clean up some of the API, and provide hooks, but it is very far along.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

This is much cleaner then the browser available through DrupalWebTestCase. It still needs some additional functionality before it can replace it, but I would like to get some reviews so we can move this closer to getting in.

My ideal plan would be to commit this very close to as it is now. Make a few more clean up patches and get it up to speed with DrupalWebTestCase, then work on integrating with DrupalWebTestCase and replacing the current code.

Then I want to either replace drupal_http_request or make it a wrapper that calls this browser.

After that we can start using the parsing/printing capabilities to clean up areas of core per #17.

boombatower’s picture

FileSize
46.33 KB

Additional documentation, proper browser implementation for generic request($method) [and docs], support for HTTP header redirect and meta refresh tag [and tests].

boombatower’s picture

Outstanding todos:
* Hooks for get, post events. (optional)
* HTTP authentication (easy followup patch)
* Deal with Drupal-Assertion callback stuff. (decide on api, followup)
* Error handling. (decide how it should work)
* Deal a from containing one button without a name, even no value? (followup patch)
* Page support proxy behavior. (cool, followup patch)

Error handling within the Browser class is the main thing left.

boombatower’s picture

FileSize
45.78 KB

Error handling should be fine.

Cleaned up documentation and code a bit more. This is looking very clean for an initial patch. I don't want to add any more so that the patch stays at a reasonable size.

All tests pass locally, I'll need to investigate what testing client doesn't have setup for this. Any additional tests will be obvious when merged with SimpleTest since all the tests in core will use the browser.

boombatower’s picture

Status: Needs work » Needs review

Curious what bot reports.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

Status: Needs work » Needs review
FileSize
45.78 KB

Spent some time debugging the issue and figure out it happens when clean_urls are disabled. Not an issue with test client configuration.

Code was not being triggered...and had a bug.

$base = substr($this->url, strpos($this->url, '?'));

to

$base = substr($this->url, 0, strpos($this->url, '?'));

and browser_test.module was flawed

$url = url('browser_test/refresh/meta', array('absolute' => TRUE)) . '?refresh=true';

to

$url = url('browser_test/refresh/meta', array('absolute' => TRUE, 'query' => 'refresh=true'));

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

Status: Needs work » Needs review
FileSize
45.78 KB

I was messin with the server before to figure out issue and had a left over.

New patch to ensure dry run.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

Somehow I completely lost all my recent changes....several hours worth.....arrrrrrrrrrrrrg. Apparently I never rolled the patch.

Had added a bunch of documentation....cleaned a number of things.............very very annoying.

boombatower’s picture

Status: Needs work » Needs review
FileSize
47.16 KB

I don't think this copy will every be as perfect as the other...but it will have to suffice.

Dries’s picture

Is this meant to deprecate drupal_http_request()? That is, should the xml-rpc backend, the aggregator module, etc all leverage this new browser system? If we're going through the effort of abstracting things out, it is good to know what we hope to accomplish with it (in core).

tstoeckler’s picture

from #41 by boombatower:

Then I want to either replace drupal_http_request or make it a wrapper that calls this browser.
Dries’s picture

I missed that. That's good news. It took several years to get drupal_http_request() to work properly -- lots of edge cases with headers, timeout handling, redirect loops, etc. Fortunately, we have some tests for it. Either way, the sooner we can do that mapping, the more confident I'd be in the re-usability and robustness of this new "browser" component.

boombatower’s picture

Also from #17:

Modules that currently use drupal_http_request() in core:

* xmlrpc
* aggregator
* openid
* simpletest
* system
* update

Some systems do XML parsing and writing which could be handled via the browser's BrowserPage component that uses SimpleXML. Thus writing XML is as easy as $element->asXML();

AS explained through the comments...I would like to get the browser in without replacing anything so that the patch stays small (get big fast). Then I will go through and start replacing all areas that either do XML related operators, drupal_http_request(), and SimpleTest's DrupalWebTestCase itself. There are a couple of additions needed to replace DrupalWebTestCase, but this code is very close and should provide a much cleaner base.

Dries’s picture

OK with me, I'm sure we can make the mapping work.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

Status: Needs work » Needs review

Test slave crapped.

Owen Barton’s picture

This looks good to me. Is there anything in particular we need reviewed, other than basic code style and layout? It would be great if chx could review and check that his architecture thoughts have been sufficiently covered.

So the follow up patches would be to:
(a) Replace the simpletest browser and appropriate XML calls with calls to this
(b) switch drupal_http_request to use this as the backend (a simple wrapper for simple use cases), and test the various modules that use this function
(c) anything else?

I was also thinking that there are probably some useful libraries of http test cases that we could draw from, to make sure we have all the obscure edge cases covered. For example, I assume perhaps the simpletest project has some self tests we could refactor to point at the new browser. Also the wget, curl and lftp projects would likely have some nice tests we could use for inspiration.

It would also be great to make sure the hooks are capable of allowing a contrib to provide transparent optional caching of http requests (assuming the browser isn't doing this already).

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

It doesn't currently provide caching hooks, but we can add that in followup patch.

My plan in terms of tests was to use the SimpleTest tests and write more as we integrate it into SimpleTest itself. The basic tests included prove that it functions and most of the features work.

boombatower’s picture

Status: Needs work » Needs review
FileSize
47.65 KB

Updated for system.module not applying.

Also, in terms of review...yea just basic code review and it would be nice to get an "ok" from chx.

chx’s picture

Status: Needs review » Needs work

There are a number of things I dislike, mostly the separation of http wrapper and browser. There should be a browser class that the various implementations can extend. Keep some methods abstract as necessary. Then have a factory (which i called get in cached and system queue they might need renaming) which returns a browser object. Use the preferred-class-stored-in-variable way as seen in cache or system queue. And then you can use DBTNG or cache style one line wrappers around the code if it gets too ugly.

boombatower’s picture

Wrapper architecture
I am not sure I agree about the http wrapper and browser architecture both logically and in terms of implementation.

The browser is the highest level, it controls everything and runs the show. As such it doesn't make sense that it should be extended by lower level functionality.

class BrowserCURL extends Browser

Now when I go to use the browser I end up with a variety of classes that all extend Browser. That doesn't seem to make sense logically.

Even to look at DBTNG:

class DatabaseConnection_mysql extends DatabaseConnection {

Mysql is a type of connection and thus extends a DatabaseConnection (generic). In the same way:

class HttpWrapper_stream implements HttpWrapperInterface {

It is an interface since no methods or handling are required.

HttpWrapper is not a type of browser and thus shouldn't extend it.

Browser as class
There are two reasons the browser is a class: 1) it stores a lot of context that would require a bunch of ugly statics or globals if written as functions, 2) it followed the DrupalWebTestCase standard which was easy to use and clean.

After thinking about it the reasons for making the browser a set of functions vs a class is somewhat 50/50 it could be done each way...although it would be ugly if done as functions due to all the context that would need to be retrieved at the start of each function. Since both implementations of the browser may be in use at the same time, core using default, contrib forcing other/custom, then the functions would need to also have to specify the type of backend to use in each call...very ugly vs as an object where you just use the factory to get the type you want and use the object reference over and over without having to specify type over and over.

browser_get('stream', 'url1');
browser_get('stream', 'url2');
browser_get('stream', 'url3');

vs

$browser = Browser::getInstance('stream');
$browser->get('url1');
$browser->get('url2');
$browser->get('url3');

Also since the BrowserPage class only makes sense as a class it seems to fit that the browser is a well for the reasons above and consistency.

Not to rush this, but this patch is holding up a few others...and if to be fully implemented before code freeze...needs to get moving.

boombatower’s picture

Status: Needs work » Needs review
FileSize
47.65 KB

After a discussion on skype earlier today with chx we decided on an architecture which requires a bit of refactoring.

Since it is extremely unlikely that anyway will ever want a wrapper backend other than curl or PHP stream we decided to remove the abstraction layer and simply merge it into the base browser code.

The browser will be turned into a set of functions with static variables instead of a class and will not longer accept a wrapper backend parameter. Instead the browser will default to curl if available, otherwise it will use the PHP stream wrapper.

The BrowserPage class will remain since it is a good candidate for a class. It is build on SimpleXML which is an object oriented architecture and since we would like to be able to have multiple pages loaded into memory we need an easy way to keep track of state information with the convenience methods it provides.

For the majority of use cases, a simple GET request, the following code will be required.

$state = browser_get('http://example.com');

I have included a very early version of the refactoring with a function browser_get() and basic architecture.

boombatower’s picture

FileSize
9.03 KB

Woops.

Dries’s picture

- Documentation at the top of the file still mentions 'class'. It needs to be updated.

- _browser_header_parse_all doesn't seem to be used.

- browser_header_name() wasn't a self-explanatory name.

- In browser_get(), $state isn't used.

- Why is browser_header_name() not a private function?

- Why is browser_init() not a private function?

- This doesn't look right:

+    // Detect the availability of curl.
+    $curl = &drupal_static('browser_curl', FALSE);
+    $curl = function_exists('curl_init');

- I don't understand browser_state_set(). It's not self-explanatory.

Dries’s picture

Other than the small code comments, I wanted to add that the design is a bit odd. The use of drupal_static() felt abusive and really begs for a simple class-based design.

boombatower’s picture

FileSize
37.11 KB

This is a good example of the new power the browser provides. The one used by DrupalWebTestCase cannot work on sites other than the local drupal install (at least not well, one reason why it only works if site is in root or up one..and clean urls are on.) due to its weak base URL handling (relies on url()). The new browser interprets the base URL according to spec.

The api.drupal.org site provides a great example of the browser working through its paces.

browser_load();

$fields = array(
  'search' => 'drupal_http_request',
);
$state = browser_post('http://api.drupal.org/', $fields, 'Search');
print_r($state);

With some debug code added we can see what the browser did to get the end result page (same as you see in your browser).

_browser_execute_curl(Array
(
    [80] => 1
    [10002] => http://api.drupal.org/
    [44] => 
)
)
_browser_execute_curl(Array
(
    [47] => 1
    [10002] => http://api.drupal.org/
    [10015] => search=drupal_http_request&=http%3A%2F%2Fapi.drupal.org%2Fapi%2Fautocomplete%2F6&op=Search&form_build_id=form-7275cf0ac557bbb4fe29e7f849880aef&form_id=api_search_form
)
)
_browser_execute_curl(Array
(
    [80] => 1
    [10002] => http://api.drupal.org/api/search/6/drupal_http_request
    [44] => 
)
)
_browser_execute_curl(Array
(
    [80] => 1
    [10002] => http://api.drupal.org/api/function/drupal_http_request/6
    [44] => 
)
)

1) The first request is the initial request to view form definition (as you would hit home page). (GET)
2) The form submission with default values and the search parameter. (POST)
3) The server redirects to a search URL. (GET)
4) After finding the function it redirects to final location. (GET)

I'll update the remaining documentation after dinner.

Damien Tournoud’s picture

I agree with Dries on the need for a "simple class-based design" here.

This is nothing more then an HTTP client, so the "Browser" name sounds far-fetched.

API-wise, here is what I would love to see:

$http_client = drupal_http_client();

// ->get() returns TRUE for 200-like codes.
if ($http_client->get("http://api.drupal.org")) {
  $headers = $http_client->getHeaders();
  $code = $http_client->getCode();
  $content = $http_client->getContent();
}

// ->post() returns TRUE for 200-like codes.
if ($http_client->post("http://api.drupal.org", array(
  'search' => 'drupal_http_request',
  'op' => 'Search'
))) {
  // etc.
}

// ->head() ->delete() and friends are supported too.
$http_client->head("...");
$http_client->delete("http://twitter.com/statuses/destroy/12345.json");
boombatower’s picture

As for #71...that is exactly what it did before.

Considering all the form handling stuff I'd say its a bit more than an HTTP client.

Either way...arg on the change in architecture again.

To clarify #67 was for chx to look at and was not done. The main documentation block has, still, not yet been updated. (dinner)

Crell’s picture

I'm with Dries and Damien here. Whether or not it's technically pluggable or swappable, this use case begs to be a robust class/object. I don't know why there was a consideration of it not being one. It's a tight coupling between complex functionality and relevant stateful data. That's an object.

boombatower’s picture

I think that removing the plugabble architecture is fine...and if we decide to go back to class (I am fine with) then won't be complete loss since I already refactored the code to remove the interface.

chx’s picture

I do not want to hold this issue further. We are fighting on every corner whenever OOP gets involved. We are using drupal_static instead of writing a set/get pair of function. In my view, OOP is evil -- while sometimes necessary there should be a reason to use it. There are two reasons AFAIK, one is building on an already OOP system like PDO or SimpleXML the other is pluggability where an interface, a base class and subsequent classes help. The browser itself has neither. The page is SimpleXML based. Also, using this is WAY simpler than OOP could be. But do what you want. I am tired of fighting this for two years now. I know that sun was complaining not even knowing what a method or an interface is but sure go ahead and undo Drupal. You guys, who were raised on Java, learned Java, work with Java will never ever understand that people do not get this OOP crap. Having DBTNG is bad enough. Let's use OOP when necessary and never otherwise.

But again: I am so out of this issue. I find it funny that since December noone poke in this issue but once I tried to make something more Drupalish, immediately there is the usual for-OOP, loud crowd and the against-OOP much huger crowd is nowhere and I am so tired of trying to speak for them. But do whatever you want, just get this in.

kbahey’s picture

@Damien & Crell

I agree that it is an HTTP Client, and that drupal_http_request() can benefit from that.

But I am trying to see what does OOPifying this buys us in real concrete terms (other than being OOP)? Why not just a keyed array as arguments, and get it over with?

chx’s picture

Well, let's pit codes against each other:

// Jimmy called this _load I like _init more.
browser_init();

$fields = array(
  'search' => 'drupal_http_request',
);
$state = browser_post('http://api.drupal.org/', $fields, 'Search');
print_r($state);

vs DamZ:

$http_client = drupal_http_client();
if ($http_client->get("http://api.drupal.org")) {
  $headers = $http_client->getHeaders();

we can stop here. It's absolutely a matter of taste whether you pass around $browser before or after the function name: $browser = browser_init(); browser_post($browser, ...) vs $browser = new browser; $browser->post(...). I have merely removed the need for the $browser variable to be held outside of the browser routines and moved it into a static -- once again, in the past we have used a get/set function pair to do the same, we do not need any more.

The OOP bloat, however, is shining immediately in Damien's example $headers = $http_client->getHeaders(); this is absolutely typical that even for the simplest cases you want to add a getter. Don't. Keep it simple.

So to sum up: being OOP or not being OOP makes little difference API-wise however OOP is alien to many people and also it leads to bloat. Always. I do not want to point out other issues where I needed to fight hard to get OOP-induced bloat thrown out but it happens.

Crell’s picture

chx, I'm not going to get into another debate with you about the finer points of OO vs. procedural. There's plenty of time for that in Paris where I have two sessions submitted on the subject. :-) You know that I have also gone on record repeatedly opposing OOP where it has no value or is harmful, so I hope you don't think I am pushing OO everywhere just for the sake of "being Java-ish".

I am only going to respond to one of your points for now: That there is some huge majority of PHPers in the world that are incapable of ever understanding "evil OOP" because it's just so hard for them. To be blunt, I call bull.

There are, in fact, hundreds of thousands of developers in the world well-versed in Classic-OOP (C++, Java, PHP, C#, etc., which is different than the Javascript or Perl or Ruby approaches). The professional and semi-professional PHP world has long since moved to mostly OO, even farther than I think is healthy for PHP code. There is a huge sea of developers who get shivers when they *don't* have classes everywhere, and to them Drupal's hooks and drupal_static() and such are completely incomprehensible. Perhaps if that great majority of OO-fearful developers doesn't speak up more, it's because there aren't as many of them as you imagine?

How about giving Drupal developers a little credit for being able to handle a method call? We all learned the Drupal-specific weirdness that is FAPI and the crazy indirection of hooks. A language construct used by hundreds of thousands of people around the world is not our mortal enemy. It's just a tool, like anything else.

chx’s picture

Well be it as I said. Do OOP just do not bloat it. But then you guys will review what's forthcoming 'cos I won't.

Owen Barton’s picture

I think the OOP/no-OOP discussion is not a big deal in this case. Developers just want to be able to submit http requests in an easy to use fashion. I honestly think most developers are happy with a "function params/access array", or with a "create object/access methods/parameters" pattern. If they don't know the former (perhaps they never used any randomly downloaded php lib before using Drupal?) they will pick it up pretty quick as soon as they do any DB work.

The situation is different, and I agree needs more care/thought when we are expecting developers to extend classes or implement interfaces themselves to be able to use a system. So far we have not come up with any use cases here though, and so the question is simply - which approach leads to a simpler and more elegant internal structure/code for the browser.

Dries’s picture

Decision: let's go with a _simple_ OO based design.

Damien Tournoud’s picture

Let's redo this. I'm not sure why I put getters in the example above. Probably to map with Simpletest drupalGetContent(), but it isn't really useful here.

$http_client = drupal_http_client();

// ->get() returns TRUE for 200-like codes.
if ($http_client->get("http://api.drupal.org")) {
  // you can use $http_client->headers, $http_client->code, $http_client->content
}

// ->post() returns TRUE for 200-like codes.
// Note: this is a straight post.
if ($http_client->post("http://api.drupal.org", array(
  'search' => 'drupal_http_request',
  'op' => 'Search'
))) {
  // etc.
}

// ->prepareFormValues() parses the HTML of the current page,
// uses the values from $edit and complete them with default
// values, and returned a keyed-array that is ready to post.
$values = $http_client->prepareFormValues($form_id, $edit);

// ->postForm() posts a HTML Form. It simply calls ->prepareFormValues()
// followed by ->post().
if ($http_client->postForm("http://api.drupal.org", array(
  'search' => 'drupal_http_request',
), 'Search')) {
  // etc.
}

// ->head() ->delete() and friends are supported too.
$http_client->head("...");
$http_client->delete("http://twitter.com/statuses/destroy/12345.json");
boombatower’s picture

I'll go ahead and map the function based code (with the merged backends) _back_ to _simple_ OOP style code. I will not make any further changes.

I have lots of things I want to add, and areas of core to clean up with this. Those are for followup patches just as any additional thoughts to changing the browser right now.

I still don't want to call this just an http_client. We have functions like clickLink() etc that will be added back...doesn't seem like just an http client. Not to mention that is actually understands what forms are in the sense that you can "click" a button instead of just submitting it as a post var.

I dono...thin line there.

Crell’s picture

boombatower: Actually that sounds like a perfect case for subclassing. The base class is just an http client that makes requests and holds the response data for you to do stuff with. A subclass of that provides browser emulation with clickLink(), form handling, and so forth. That way, people just doing JSON or ReST requests can use the simple object without the extra overhead of the fancier web-browser-in-a-box.

boombatower’s picture

@Crell: please look at #63...and see if that is more what you are looking for. I can return it to that, but remove curl/stream separation..or keep it.

Status: Needs review » Needs work

The last submitted patch failed testing.

boombatower’s picture

I am going to start from #63 and remove the wrapper interface stuff. No one has responded in a while...so please "forever hold your piece".

boombatower’s picture

Status: Needs work » Needs review
FileSize
38.6 KB

Nothing new, essentially turned the #63 OO implementation into the #70 backendless implementation. Based on the comment I am not sure anyone look at that code...since it is exactly what was being requested. I just merged the backend interfaces into one class since chx and I agreed that there really is not other use case.

There are still things that can be added and a few patches that have gotten into the SimpleTest browser that are not in this one, but those are great followups. This browser provides the essentials and does it well. Cleaned up version of DrupalWebTestCase browser, although I know it can be cleaned up further.

I would like to get some reviews and then gets this in if possible so I can start on followups.

boombatower’s picture

FileSize
38.52 KB

Quick, code clean.

EDIT: This is holding up a number of other patches so if we could prioritize this that would be great.

c960657’s picture

Status: Needs review » Needs work
+    $headers = $this->requestHeaders + $options['header'];

I suggest renaming the array key to headers for consistancy.

+   * @param $url
+   *   Absolute URL to request.

The code allows $url to be false - the documentation should reflect that.

+  public function post($url, array $fields, $submit) {

This method not only does a POST request but also does some forms handling. A raw post() method is required if Browser is to replace drupal_http_request(). I suggest renaming the method to submitForm(). It could possibly support <form method="GET"> too.

The name of the submit button is not necessarily sufficient for identifying which form on the page to submit. It is possible to have multiple forms containing submit buttons with the same name, or even two submit buttons in the same form with the same name.

I suggest mentioning in the documentation for getAbsoluteUrl() that it only has limited support for resolving relative URLs. I would be nice with a general Drupal function for doing this, possibly based on the Net_URL2 PEAR package (see Net_URL2::resolve()).

getDefaultFieldValue() and processField() misses a @return Doxygen comment. Would it be cleaner if these functions always returned an array instead of treating forms that generate only one value specially (i.e. a <input name="foo" value="bar"> would return array('foo' => 'bar'))?

Also, would it be cleaner to get rid of the $html_value and $type parameters for getDefaultFieldValue() and processField() and instead let them figure this out themselves?

Perhaps getDefaultFieldValue() and processField() could be combined into one function - i.e. like this:

$post += $this->getFieldValues($input, $name, isset($fields[$name]) ? $fields[$name] : null);

post() does not support if several fields have the same name (because the post variables are stored in an array with the name as key). If PHP receives a request, e.g. foo=1&foo=2, only the last value appear in $_REQUEST. But here it is the first value that is preserved AFAICT.

Some systems actually use multiple fields with the same name, e.g. Bugzilla (though only in GET requests, I think), but perhaps we can live with this limitation?

The wording of $input (a SimpleXMLElement representing an <input> or <select> element), $fields (representing key/value pairs used for filling out the form), and $post (key/value pairs to include in the request) is not very clear. I suggest e.g. $elements, $suppliedVariables, $postVariables, respectively (perhaps not the best suggestion - but the idea is to make it clear that the two latter are similar, and to avoid the use of the word “input”, because <input> elements are not the only form elements).

+    return $this->xpath('.//input|.//textarea|.//select');

This does not support <button> elements.

+      $options[(string) $element['value']] = asText($element);

I assume this should be $this->asText($element).

+    $elements = getSelectElements($select);

I assume this should be $this->getSelectElements($select).

+   * @return
+   *   An array of SimpleXMLElement objects representing option elements.
+   */
+  public function getSelectElements(SimpleXMLElement $element) {

The method name should probably include “option”.

+        $options = array_merge($options, $this->getAllOptions($group));

Where is the getAllOptions() method defined?

+  public function getBaseUrl() {
...
+      if ($pos = strpos($base, '?')) {
+        // Remove query string.
+        $base = substr($base, 0, $pos);
+      }

Why strip the query string? Please add a reference to the RFC describing what is going on in this method.

+  /**
+   * Ensure that header name is formatted with capital letters.
...
+  protected function headerName($name) {

What is the purpose of this? HTTP header names are case-insensitve. In the current Simpletest browser this is handled by lower-casing the name. See also #303838: drupal_http_request - case sensitive HTTP header field names.

+      // Remove blank headers.
+      if ($header) {

AFAIK the empty string is a valid header value.

+  public function getHeaders() {

Perhaps this should be called getResponseHeaders(), so they are not confused with request headers.

+   * Parse an idividual header into name and value.

idividual

The typo "peform" occurs several times.

+    // TODO

Do what?

It would be convenient with a setRequestHeader($name, $value) method so you do not have to set all headers in one method call, possibly overwriting a previous call to setUserAgent().

Perhaps the two implementations (cURL and PHP streams) should be implemented using the pattern described here: http://www.garfieldtech.com/blog/pluggable-systems-howto

chx’s picture

Thanks for the review. I fixed a missing &lt; / &gt; in the review. Perhaps the two implementations (cURL and PHP streams) <= not. We have ditched that because there is no sensible other implementation.

boombatower’s picture

FileSize
38.64 KB

I made some changes you suggested...but the majority are known and are followup patches....this is holding up too much as is already 300000X better then what we have.

+    $headers = $this->requestHeaders + $options['header'];
I suggest renaming the array key to headers for consistancy. 

That is a PHP attribute name, not mine.

This method not only does a POST request but also does some forms handling. A raw post() method is required if Browser is to replace drupal_http_request(). I suggest renaming the method to submitForm(). It could possibly support <form method="GET"> too. 

I am aware of the fact that we want to separate....follow up patch, as for replacing drupal_http_request()....it only does GET.

getDefaultFieldValue() and processField() misses a @return Doxygen comment. Would it be cleaner if these functions always returned an array instead of treating forms that generate only one value specially (i.e. a would return array('foo' => 'bar'))?

post() does not support if several fields have the same name (because the post variables are stored in an array with the name as key). If PHP receives a request, e.g. foo=1&foo=2, only the last value appear in $_REQUEST. But here it is the first value that is preserved AFAICT.

Howso? If I do $arr['foo'] = 'bar'; then $arr['foo'] = 'bar2'; ...bar2 wins. Also that is a horrible edge case...I don't see the use...you want to specify the last value and don't care about rest.

+    return $this->xpath('.//input|.//textarea|.//select');
This does not support <button> elements.

Limitation of the previous system, things like this are good followups.

+      $options[(string) $element['value']] = asText($element);
I assume this should be $this->asText($element).

+    $elements = getSelectElements($select);
I assume this should be $this->getSelectElements($select).

Those are left overs from all the conversion of stuff.

Why strip the query string? Please add a reference to the RFC describing what is going on in this method.

Followup patch...already 30x better then before.

+  /**
+   * Ensure that header name is formatted with capital letters.
...
+  protected function headerName($name) {
What is the purpose of this? HTTP header names are case-insensitve. In the current Simpletest browser this is handled by lower-casing the name. See also #303838: drupal_http_request - case sensitive HTTP header field names.

Just to ensure they are all using one standard to they are consistent and comparisons work well. Don't effect the functionality.

+      // Remove blank headers.
+      if ($header) {
AFAIK the empty string is a valid header value.

Was not aware...curious could you provide an example (I have changed the code)...it look something like?

header:

Seems odd

+    // TODO
Do what?

Make the function actually work...placeholder.

Perhaps the two implementations (cURL and PHP streams) should be implemented using the pattern described here:

If you look the initial patches we abstracted to be pluggable...we decided against that since we don't think it makes sense for anything other than PHP wrappers and curl.

boombatower’s picture

Status: Needs work » Reviewed & tested by the community

Based on review and my cleanup...any other changes..unless major bugs need to be done in followup.

Dries’s picture

@boombatower, I think I'm happy to commit this patch but it is somewhat difficult to see the delta between what c960657 suggested and what you changed. Could you create follow-up issues for the things that are left to be fixed, and link them from a comment in this post? Like that, we have a better overview, and a better way to track progress. Thanks! Once the follow-up issues are created and linked from, I'll know better whether to commit this patch or not.

I'll do another review of this patch shortly.

Thanks for your hard work!

c960657’s picture

Status: Reviewed & tested by the community » Needs work

Howso? If I do $arr['foo'] = 'bar'; then $arr['foo'] = 'bar2'; ...bar2 wins.

Yes, but if you post to a form that has the following two input fields, in a regular browser the latter wins, but in this browser the former is posted.

<input type="text" name="foo" value="123">
<input type="text" name="foo" value="456">

This pattern is actually useful: If the former is a type=hidden field and the second is a checkbox with the second name, you can on the server side detect whether the checkbox was not checked or whether it was not included in the form at all.

we decided against that since we don't think it makes sense for anything other than PHP wrappers and curl.

That's good enough for me. I just want to note that some frameworks (e.g. PEAR) provide a number of different HTTP libraries, including the http PECL module.
But even if we decide on only having these two, it might make the code clearer if the code for the two different implementations were split into a base class and two subclasses, instead of being mixed in one class using if's. Well - just a suggestion.

    if (!isset($options['header'])) {
      $options['header'] = array();
    }

    // Merge default request headers with the passed headers and generate
    // header string to be sent in http request.
    $headers = $this->requestHeaders + $options['header'];
    $options['header'] = $this->headerString($headers);

In these lines, $options['header'] change type from an array to a string. This is confusing. I suggest using two different variables for the two different representations.

as for replacing drupal_http_request()....it only does GET.

Huh? drupal_http_request() also supports POST, PUT etc.

Anyway, if you know that you are going to rename post() to submitForm(), it may be easier to do it now rather than in a followup.

[HTTP header case]
Just to ensure they are all using one standard to they are consistent and comparisons work well. Don't effect the functionality.

I think using lower-case is much better - it is simpler and easier to explain. The suggested implementation does not use the case used in the RFC, e.g. ETag or WWW-Authenticate, so this is just as “inaccurate” as the plain lower-case version. The header names are not visible to the end user, so I don't think its worth trying to make them look nice.

The suggested implementation doesn't properly change e.g. LOCATION into Location - it only ever changes the first letter AFAICT.

Was not aware...curious could you provide an example (I have changed the code)...it look something like?
header:

E.g. TE:, see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.39

getDefaultFieldValue() and processField() misses a @return Doxygen comment.

The still do.

What do you think about my suggestion about combining getDefaultFieldValue() and processField() into one function?

boombatower’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
38.47 KB

That's good enough for me. I just want to note that some frameworks (e.g. PEAR) provide a number of different HTTP libraries, including the http PECL module.
But even if we decide on only having these two, it might make the code clearer if the code for the two different implementations were split into a base class and two subclasses, instead of being mixed in one class using if's. Well - just a suggestion.

Which is how it was in the earlier ...patches and it was decided to go this route.


This pattern is actually useful: If the former is a type=hidden field and the second is a checkbox with the second name, you can on the server side detect whether the checkbox was not checked or whether it was not included in the form at all.

That still seems like an edge case and was not supported by the previous version.

Anyway, if you know that you are going to rename post() to submitForm(), it may be easier to do it now rather than in a followup.

This is still following the pattern of DWTC...with some cleaned up code a number of additions. So we need to keep looking at it like that...once this is in...feel free to pump out issues to add/correct functionality.

The purpose of this issue is to "abstract" it and move it out of DWTC..and a bit of cleanup on the side. This is not intended to be a perfect browser as its predecessor was not.

Changed to all lowercase...

What do you think about my suggestion about combining getDefaultFieldValue() and processField() into one function?

My thought it to rework the form handling in a separate issue. I would really like to get this in so all the other things waiting on thsi can get started..then we can continue to clean as well.

Also changed the header handling to allow blank ones...very odd.

Once this gets in...i'll go through and make a number of issues and link to them from here.

webchick’s picture

Dries said he wanted to re-review this, leaving this issue to him.

Dries’s picture

Status: Reviewed & tested by the community » Needs work

Committed to CVS HEAD. Marking as 'code needs work' until the follow-up issues are created. Thanks boombatower.

boombatower’s picture

Component: simpletest.module » browser system

Moving to new component.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

c960657’s picture

I would really like to get this in so all the other things waiting on thsi can get started.

What other things are you referring to?

So far, no code in core uses the new browser. While trying to convert drupal_http_request() to a wrapper around the browser (#553280: Integrate browser.inc and drupal_http_request()), I noticed that it lacked several methods in order to provide the same functionality as drupal_http_request(). I think this would also be the case when replacing the Simpletest browser in #553278: Replace DrupalWebTestCase browser with browser.inc. In order to fix these, modifying/extending the browser API is necessary.

There is just one week until the API freeze on October 15, but it seems like there is still a lot to do. If we are not able to get at least #553278: Replace DrupalWebTestCase browser with browser.inc committed in time for the freeze (not that fixing that bug itself would constitute an API change, but it will give us an understanding of the necessary changes to the browser API), would it be better to remove browser.inc for D7?

Don't get me wrong — I like the idea about a separate browser class and I appreciate your work on this, but I'd rather not see D7 ship with three different HTTP implementations (drupal_http_request(), DWTC and browser.inc) because there is no time to consolidate the code.

c960657’s picture

I have opened a separate issue for the discussion of my last comment:
#600032: Back out browser.inc