Status: Closed (fixed)
Project: Butler
Version: 7.x-1.x-dev
Component: Code
Priority: Normal
Category: Feature request
Assigned:
Reporter:
Created: 21 Mar 2011 at 14:38 UTC
Updated: 12 Aug 2011 at 04:21 UTC
Context objects should be clonable (we probably need a different name for it to avoid confusion with clone() though). A cloned object should
1) Have a reference to its parent object, prototype style.
2) Pass through unknown requests to a parent.
3) Allow for context information and handlers to be explicitly set.
4) Be added to a global context stack for clean retrieval.
5) Be lockable, with a tracking variable.
6) Pop off the stack when the tracking variable goes out of scope.
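A minimal sketch of requirements 1-5 above (class and method names here are hypothetical, not Butler's actual API; the scope-based stack popping in point 6 is omitted):

```php
<?php

// Hypothetical sketch of the layered context described above; the class
// name, method names, and context keys are illustrative only.
class LayeredContext
{
  private $parent;          // (1) reference to the parent, prototype style
  private $values = array();
  private $locked = FALSE;  // (5) lock flag, set via a tracking step

  public function __construct(LayeredContext $parent = NULL) {
    $this->parent = $parent;
  }

  // (3) Context information can be explicitly set on a layer.
  public function set($key, $value) {
    if ($this->locked) {
      throw new LogicException('This context layer is locked.');
    }
    $this->values[$key] = $value;
  }

  // (2) Unknown keys pass through to the parent layer.
  public function get($key) {
    if (array_key_exists($key, $this->values)) {
      return $this->values[$key];
    }
    return $this->parent ? $this->parent->get($key) : NULL;
  }

  public function lock() {
    $this->locked = TRUE;
  }
}

// (4) A global stack makes the innermost layer cheaply retrievable;
// (6) popping when a tracking variable goes out of scope is not shown.
$stack = array();
$base = new LayeredContext();
$base->set('node:nid', 1);
$stack[] = $base;

$child = new LayeredContext($base);
$child->set('node:nid', 2);   // explicit override shadows the parent
$stack[] = $child;
```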
| Comment | File | Size | Author |
|---|---|---|---|
| #21 | butler_bench2.png | 141.33 KB | msonnabaum |
| #20 | compose.png | 83.59 KB | msonnabaum |
| #20 | clone.png | 81.33 KB | msonnabaum |
Comments
Comment #1
eaton commented

It sounds like 'wrapping' might be a better explanation of what's going on than 'cloning'... The process would involve making a "dummy" context object that has some explicit, hard-coded responses already set (for example, 'The primary node entity for this request is node 2'), has a pointer to the request context that it is 'wrapping,' and passes on any questions to that wrapped context if it doesn't have a hard-coded override.
Pseudo-code for using one of these wrapped objects might look like, say...
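Based on the description above, such wrapped-object usage might look roughly like this (all function and key names are hypothetical):

```
// Get the global request context.
context = get_global_context()

// Wrap it in a "dummy" context with an explicit override.
wrapper = context.wrap()
wrapper.set('node:entity', load_node(2))

// Overridden keys answer from the wrapper; anything else is
// passed on to the wrapped request context.
wrapper.get('node:entity')   // the explicitly set node 2
wrapper.get('language')      // falls through to the wrapped context
```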
In the above example, I'm assuming that I'm getting the global context, overriding the primary node with an explicitly loaded node object (or just setting it directly, if none existed on the current request context). The block can ask all sorts of contextual questions if it wants to, because it's getting a full context object.
Is that accurate?
Comment #2
Crell commented

Yep. This is basically what you and I designed back in San Francisco. I did some dummy code for it at the time that surprisingly worked. We just need to implement it properly.
Actually, the syntax we're using now makes overriding easier, and separating the lock step (which I discussed with DamZ in Copenhagen over a fancy dinner while ignoring our host, Microsoft) from the clone/wrap step gave us a control variable we could use in place of a discard() method. It actually worked quite well in my testing. I just need to write it for reals. :-)
Comment #3
Crell commented

I've created a new branch called mock-clone that implements this sort of stacking, but actually using PHP clone(). It may or may not work in the long run, but I think it's worth a shot. The upside is that, due to the way PHP works, we don't actually duplicate much if any memory to make the new object, thanks to copy-on-write. The downside is that once we start editing the values in the new object, we may end up copying more data than we intend to. Not sure. Input welcome. :-)
Comment #4
dave reid commented

Without knowing all the internals of Butler, I would grasp the concept of cloning better than wrapping. So I'm +1 for that, although it looks odd, since it's usually written as

$object = clone $parent;

because clone is not a function, it's a keyword.

Comment #5

Crell commented

Well, the alternative syntax would be something like:
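Such an alternative might read roughly as follows; mock() here is a hypothetical wrapper name (borrowed from the later comments), shown simply hiding the clone keyword behind a method:

```php
<?php

// Illustrative only: a method form that hides the clone keyword.
class Context
{
  public function mock() {
    // Internally this could just as well build a wrapping object;
    // the caller doesn't need to know.
    return clone $this;
  }
}

$context = new Context();

// Keyword form: clone is a language construct, not a function.
$a = clone $context;

// Method form: the alternative syntax under discussion.
$b = $context->mock();
```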
So it's not that far off. The main difference is in the implementation of how we handle stacking the data internally. Do we actually call up the chain, or do we just rely on PHP's variable copying semantics and hope for the best?
We have less control in the latter approach, but PHP handles most of the difficult parts for us so the implementation is super simple. I don't know if that's a good trade off or not right now. That's what I need feedback on. :-)
Comment #6
catch commented

I thought I replied to this ages ago.
In eaton's example, the full node object is loaded directly into the context object. That might also be fine with copy-on-write, but I'd feel better if it were just a node ID, with the node_load() (or equivalent for whichever context handler) only happening when you need to access the specific object. Then if there is a full copy at some point, it's an int, not a big node object. entity_load() and whatever else can handle their own static caching. This isn't directly answering the clone syntax question, I know.
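A minimal sketch of catch's suggestion, assuming a hypothetical lazy wrapper (none of these names are Butler's; the closure stands in for node_load()):

```php
<?php

// Store only the nid in context; defer the expensive load until the
// object is actually requested. All names here are illustrative.
class LazyNodeContext
{
  private $nid;
  private $node;
  private $loader;

  public function __construct($nid, $loader) {
    // Copying this layer only ever copies an int, not a full node.
    $this->nid = $nid;
    $this->loader = $loader;
  }

  public function nid() {
    return $this->nid;
  }

  public function node() {
    if ($this->node === NULL) {
      // Deferred: the loader (stand-in for node_load()) runs on first
      // access only; entity_load() keeps its own static cache.
      $this->node = call_user_func($this->loader, $this->nid);
    }
    return $this->node;
  }
}

$loads = 0;
$context = new LazyNodeContext(2, function ($nid) use (&$loads) {
  $loads++;  // count how many times the "expensive" load runs
  return (object) array('nid' => $nid, 'title' => 'Example');
});

$context->node();  // triggers the single load
$context->node();  // already loaded; no second load
```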
Comment #7
eaton commented

Benchmarks in a variety of circumstances -- focusing on the memory footprint and the execution time -- really seem like the only way to resolve that question. I don't have a horse in the race, philosophically speaking.
Yeah, this is something that's come up a couple of times. Should we allow context to be actual things, or should the code making use of context do its own loading and rely on other caching mechanisms? There's a serious danger in assuming that everything will be an entity -- just because we have "cheap" entity caching doesn't mean that will solve all the problems.
The initial discussions around context assumed that the context system would "lazy load" expensive context information once it's requested, but hold onto it after that. I'm not sure how that interacts with the entity cache, but we should probably try to achieve at least some level of consistency for devs using the context wrapper. (I.e., 'do I need to load this myself, or will it be loaded for me?')
Comment #8
voxpelli commented

I like the concept of wrapping better: then you get a child context to a parent context. If you clone, don't you get a sibling context rather than a child context? I don't think we want sibling contexts; we want child contexts that inherit non-overridden data from their parents. So the interface shouldn't be talking about cloning, I think; whether we use it internally is another matter. (I might be totally off here; if so, please ignore me.)
I liked the syntax Hugo used for the wrapping/mocking in his code from DrupalCon Cph: https://gist.github.com/549641 It's similar to the one mentioned in #5. Instead of mock() - might createChild() or something be more appropriate? (Again - I might be off)
+1 on #6 - not saving an entire node object but only an id and perhaps doing some kind of lazy loading seems like a good idea.
Comment #9
Crell commented

catch, eaton: Remember that entities are objects, and in D8 are likely to be classed objects. Objects don't duplicate unless you explicitly tell them to, so with the approach in this branch a node object, e.g., would never be copied; only its handle would. Which, now that I think about it, is a hole in the immutability, since modifying the node object would modify its single universal copy (it is likely to be the same actual object as the one in the entity cache). Hm. That may actually be a hint that we should be storing only the lookup data (nid, uuid, etc.) rather than the actual object. Thoughts?
voxpelli: Well, there's no native "child" behavior either way. I was originally going to implement something similar to what Hugo has there (we discussed it back in CPH during a BoF, which is where the colon-delimited syntax came from), but realized that we had the potential to use clone(), so I ran with it to see what would happen. :-)
With a manual approach, each context object would have a parent context object that it references explicitly. If we try to access a non-overridden (literally or its handler) property, then we'd call that parent object directly and let it handle whatever caching is going to happen, and just return what it gives us. Pro: We have very specific control over what gets cached where and referenced where. Con: More PHP-space implementation, and therefore a performance hit for more function calls.
With the automated approach (what's in this branch right now), the stack is implicit. Each object's list of properties is replicated by PHP, and therefore we get access to the parent's saved data at virtually no cost (because by default it is both our data and its data until we start messing with it). Once we set a property, however, then that array key and the parent variable change, but the others do not, within the engine. So the memory usage is lower, and the runtime is faster because there's no PHP-space function calls. (At least I think that's correct.) Pro: Much simpler implementation, probably(?) faster. Con: We don't get to control the logic of when things get copied. If we derive a new value in a child object, then that doesn't propagate back up to the parent. So if, for instance, we don't end up looking for current-nid until a child context object, and then later on in a higher context object we ask for it, we have to re-derive it. That may or may not be expensive.
Either way there's still the global context stack, and you can only access the top most at any given time. The transaction-like tracking variable is still there either way, context objects still have a mutable and immutable mode, etc. The key difference is, I think, where we move the potential extra cost. Hmmm...
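As a concrete sketch of the clone-based ("automated") approach and its trade-off (class and method names here are illustrative, not the branch's actual code):

```php
<?php

// Each layer is a PHP clone, so the values array is shared
// copy-on-write until one side modifies it.
class CloneContext
{
  private $values = array();

  public function set($key, $value) {
    // Writing triggers copy-on-write: only now does this clone's
    // values array diverge from the object it was cloned from.
    $this->values[$key] = $value;
  }

  public function get($key) {
    // No call up a parent chain: the cloned array already carries the
    // parent's saved data, so lookups are flat array hits.
    return isset($this->values[$key]) ? $this->values[$key] : NULL;
  }

  public function addLayer() {
    return clone $this;
  }
}

$base = new CloneContext();
$base->set('node:nid', 1);

$layer = $base->addLayer();  // shares $values until the first write
$layer->set('node:nid', 2);  // diverges here; $base is untouched

// The con noted above: a value derived only in $layer never
// propagates back up to $base, so $base would have to re-derive it.
$layer->set('language', 'en');
```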
I wonder if we're over-thinking it. :-)
Comment #10
eaton commented

Yeah, that's definitely thorny. In a "true" complex system we could easily get into weird side effects, although I'm less concerned about that than about the developer load of always having to figure out what to do with a particular piece of information. ("Perform a lookup? Address it directly?" etc.)
I'd definitely lean towards storing the identifiers whenever possible. There may be certain situations where that's not possible, for example if a View were part of the context, but for entities at least it makes sense.
Honestly, I think we might be. Only time will tell which approach is the smoothest/fastest, and changing between the two approaches doesn't seem like it would be punishingly bad if we realized one or the other was giving us a bad performance hit.
Comment #11
voxpelli commented

Whether we use clone or create a new object, I think the syntax should be the same, like in #5. The mock() there could just as well do a clone as create an entirely new object. Using the same syntax hides the exact implementation details from the outside, which is probably good, and it also allows for easier experimentation.
Comment #12
Crell commented

Good point in #11. As soon as I get the chance I'll add a simple wrapper method so that the externally facing API won't change either way.
Comment #13
Crell commented

I've added an addLayer() method that just wraps clone($this).
Comment #14
catch commented

I'm not sure there's a straight choice between storing and returning objects (possibly arrays and objects might get returned too), and storing pointers and returning pointers.
Does anything stop us storing pointers and returning objects? That is roughly what the menu router system does for context: store some metadata in {menu_router}, but you can access the current loaded object via menu_get_object(). We are likely going to add a bit of CPU overhead in terms of magic methods or similar, but it feels like that'd be minimal. If people are already discussing lazy-loading the context data into the context object, then the knowledge of how to load it is going to be needed inside that system anyway. I'm not sure how this differs for views specifically; there is views_get_view() etc., although views does have issues with build expense and static caching, and I'm not sure that will be helped if the full view is copied in.
Another reason to store metadata instead of, or as well as, objects is if we want to generate cache keys etc. via context (node/nid + langcode); otherwise we'd have to have a way to build those from the objects, when we only really need the data to load them.
Entities aren't the only thing that has static caching or doesn't need to be built more than once per request. $_REQUEST is available all the time, global $language is available all the time (although global $language will probably cease to exist if this takes over completely), and global $user via $_SESSION is available all the time (and is not a real entity).
One other question here, back to entities again: say there's a CRUD operation on a node during the page request. Currently this will refresh caches etc.; if there's a cloned copy of the $node in a context object somewhere, what should happen to that?
Comment #15
Crell commented

global $language, global $user, and pretty much anything along those lines will/should cease to exist, being replaced by the context system. That's the idea. The $_* superglobals you should never, ever touch directly, because then you cannot mock them; those should also be accessed via the context object.
At least at present, the idea is that you ask for 'foo:bar', the context object maps that to the handler "baz", which returns "some value". The context object then caches "some value" and also returns it to the caller. It then never calls baz again for the remainder of the request. So at that point, no, there's not really a way to get "both" the nid and the node object.
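The lookup-then-cache flow described here can be sketched like this (the class name and key names are illustrative only):

```php
<?php

// 'foo:bar' maps to a handler whose result is cached, so the handler
// runs at most once per request. Names are illustrative, not Butler's.
class HandlerContext
{
  private $handlers = array();
  private $cache = array();

  public function registerHandler($key, $handler) {
    $this->handlers[$key] = $handler;
  }

  public function get($key) {
    if (!array_key_exists($key, $this->cache)) {
      // First request for this key: ask the handler, remember its answer.
      $this->cache[$key] = call_user_func($this->handlers[$key]);
    }
    // Every later request is answered from the cache; the handler is
    // never called again for the remainder of the request.
    return $this->cache[$key];
  }
}

$calls = 0;
$context = new HandlerContext();
$context->registerHandler('foo:bar', function () use (&$calls) {
  $calls++;  // count handler invocations
  return 'some value';
});

$context->get('foo:bar');  // invokes the handler
$context->get('foo:bar');  // served from the cache
```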
Thinking about it further, if we need to be able to generate a varnish cache ID to regenerate a block just from context information, that means we have to have lookup IDs, not objects, in that URL. That means we have to be dealing with the lookup keys and push the object caching elsewhere. The alternative is that we have to build a dual-mode retrieval into the context system, which seems overly complex.
Which then leads me to suggest that we do want to have a service locator after all, and one of the services (plugins) that we offer is an entity controller. That way, you still get full dependency injection but can do something like:
$node = $this->services->get('NodeController')->load($this->context['nid']);
(Exact syntax TBD, of course.)
So inside a plugin (which I hope will become the norm, not the exception) that's how you'd do node_load(), and then the only place that node objects get cached is in NodeController, where they are now.
Follow?
Comment #16
pounard commented

@#14 Any object in PHP is passed by reference, meaning that by loading any entity using the actual API you are supposed to fetch a reference to the lazily loaded instance. Each modification somewhere would actually modify the object for everyone. The only way to break this is to use the clone keyword to copy the object, and cloning a node does not make any sense except if you really want to create a new one. (That's not totally true, because if you serialize then unserialize an object you lose references, but that's another problem and would probably not happen during runtime.) PHP clone is not a deep clone: if you clone an object whose properties point to other objects, you only copy the references; the internal objects won't be cloned.
I think using the clone keyword is probably better; by sticking to the language specifics you will maximize comprehension for external developers. Plus you can use the __clone() magic method if you need to do a deep clone or specific operations on clone. I hope the CRUD operations are well developed. If you load an entity that has already been loaded, you will fetch a reference, and when you save an entity the caches will eventually be refreshed, but hopefully not that particular static cache; or at least, if caches are being refreshed, the saved object should be put back into it before doing anything else to avoid desynchronization.
PS: I like points 5 & 6; the implementation you did on the mock branch seems really elegant! Are you confident enough in the PHP GC to be sure your implementation will work flawlessly? And if I call the lock method twice on a context object, would it destroy it even before the first lock goes out of scope?
Also, you are manually calling the __destruct() magic method, which is not advised. You should probably have a static method such as unstack(ContextInterface $object) that just does unset(self::$contextStack[spl_object_hash($object)]). You could call that same method from the context object's destructor. The unset() call is silent if the value does not exist, and it's a language construct, so it will be a lot faster than the array_search() and array_splice() calls. By the way, if you remove the object from the stack when your observer goes unset(), it's no longer referenced anywhere, so it will be implicitly destructed by the PHP GC; you don't have to do it yourself. Having the observer remove it explicitly from the stack seems much better than giving the object the responsibility to remove itself on destruct.

Comment #17
Crell commented

I've split the object-vs-key question off to its own issue: #1177246: Context values: Keys or objects?. Let's continue that there and focus on the clone/override mechanism itself in this thread.
Comment #18
Crell commented

OK, finally, an update. Sorry for the delay. :-(
There are now two branches in the repository: 1100188-mock-clone and 1100188-mock-compose. Both have an identical interface, and both pass the same unit tests. The only difference is the internal implementation.
I'm not entirely sure how to benchmark these, but we could use another pair of eyes on them for both performance and for what weird bugs we'd get with each approach. I'm sure there are some. :-) Feedback welcome.
Comment #19
catch commented

Crell pinged me in IRC.
I haven't looked at any code. For a performance comparison, though, I would set up a test case that follows what you'd expect to be a common usage pattern (probably a combination of mocking and not mocking); i.e., if using clone you may want to do things like clone clones, rather than just iterating and cloning the same object over and over again. Mocking objects should actually change something on the object, if that's what we expect people to do, etc. The size of the context object is also going to matter.
It'd be handy to have two scripts: one that does 50/100/500 iterations, or whatever the medium-to-worst case should be in a Drupal request, and another that does 10,000-100,000 iterations of the same thing. No single page should do this, but a daemon might, and cumulative performance is as important as per-request. Variations on usage would be good as well, but anything is better than nothing.
ab or any other load-testing tool is useless for stuff like this. You might get something from microtime, but xhprof is going to give you actual CPU and memory numbers (the memory is a bit wonky, but it gives you an idea and is no worse than any other way of doing it), and it skews the results a bit less than xdebug does. I'm happy to talk people through setting that up and reviewing the results in IRC.
Comment #20
msonnabaum commented

I threw together a benchmark for this, found here:
http://drupalcode.org/sandbox/msonnabaum/1214752.git/blob_plain/refs/hea...
It's a mix of context handlers and explicitly set contexts, with context overrides up to 6 layers deep. XHProf info is attached. It looks like clone is slightly faster and uses less memory, but it's still only about 300ms across 17k+ calls.
Feedback on how to improve the test welcome.
Comment #21
msonnabaum commented

I made a few alterations to the benchmarking script to ensure consistency and make it resemble a real-world request a bit more.
The test script adds each k/v from $conf as an explicitly set context on the original context object, in addition to the http contexts that have registered context handlers.
It then iterates 100 times, randomly overriding contexts up to 6 layers deep, picking from explicitly set contexts vs. contexts with registered handlers.
So I ran that request 1000 times for each branch, and here are the averaged results.
4e264399d3ec0 = compose
4e263717dae1d = clone
http://drupal.org/files/issues/butler_bench2.png
So looking at CPU time, clone is about 50% faster than compose (-9ms CPU, -30ms wall), while compose's peak memory is 100% better (-2.4k).
Between the two, the speed difference is more significant, but this is still on the very high side of what we'd actually see in a request. I don't know that the difference is significant enough to make a decision between the two based on performance, but if there were no other compelling reason to go with one or the other, clone looks better.
Comment #22
Crell commented

Given that it's not a landslide in practice, I'm inclined to go with the composition method rather than cloning, as I think it would offer us more flexibility later in terms of caching and mocking and such. If we end up not using any of that flexibility, we can revisit this question later. If we do our jobs right, it shouldn't be an API change to change our minds later.
If no one objects in a day or two I will merge the compose version into the mainline and close this issue.
Comment #23
eaton commented

I concur. Assuming that we're going to be wrapping objects rather than duplicating them means that we'll have the underlying process for assorted testing tricks in place. It might make sense to keep an eye out for future implementation details that would make swapping the underlying clone/compose implementation more difficult, but for the moment it sounds like it's just a question of speed vs. memory.
Comment #24
Crell commented

I have merged the compose version into the 7.x-1.x branch. Thanks, Mark! On to the next task.