Currently EFQ v2 does the following in CoreBundle:

    $container->register('entity.query', 'Drupal\Core\Entity\Query\QueryFactory')
      ->addArgument(new Reference('service_container'));

and this in FieldSqlStorageBundle:

    $container
      ->register('entity.query.field_sql_storage', 'Drupal\field_sql_storage\Entity\QueryFactory')
      ->addArgument(new Reference('database'));

then, the majority of the QueryFactory is just figuring out which $module to work with, it's not important:

class QueryFactory
  function get($entity_type, $conjuction) {
    $module = $this->figureOutModule($entity_type);
    return $this->container->get("entity.query.$module")->get($entity_type, $conjunction);
  }
}

Queries do:

drupal_container()->get('entity.query')->get($entity_type, $conjunction)

Now, while that the service name for the specific implementation factory is module bound is an EFQ speciality, the rest isn't. We could generalize this for key-value, php storage and whatnot:

  1. There is a service registered in CoreBundle. This is the user facing API, so let's call it userFacingFactory. This factory gets the container (!) injected. This doesn't need an interface, it's a fixed class.
  2. This service figures out based on some configuration what is the service name for the actual implementation.
  3. The implementationSpecificFactory implements an interface. This is registered in the DIC and returns an instance of the implementation. The interface for this must not contain a constructor.
  4. The actual implementation I didn't register into the DIC, the factory will inject things into it. The interface for this must not contain a constructor.

This is a very flexible and dependency injectable system. It allows for practically everything that currently is a problem

  1. Specific implementations really need to be in the DIC so that we can inject things into them
  2. We need a factory for the specific implementations because usually we have some constructor argument that requires a different object. Collection for key-value for example.
  3. There might be multiple implementation based on the constructor argument as mentioned in the previous point.
  4. Then, we need to somehow pick between them.

There's only one wart: the injection of the DIC into userFacingFactory.

Comments

chx’s picture

Issue summary: View changes

moar

Crell’s picture

I've run into the same conceptual challenge in a couple of places. The patch for upcasting controller paramters (#1798214: Upcast request arguments/attributes to full objects) has the same limitation (the factory must be ContainerAware), and if we want to pre-compile the list of access checks for a given route (per #1793520: Add access control mechanism for new router system), we'll likely run into the same issue there.

Which, I realized, is very similar to ContainerAwareEventDispatcher: The subscribers are non-ContainerAware services, but the event dispatcher itself must be ContainerAware.

I *think* this is OK, because I don't have a better idea, but I want to check with some Symfonians first to see if there is something we're missing.

Crell’s picture

OK, I just had a lengthy conversation with lsmith, ircmaxell, and katbailey in IRC. I'll try to do a more complete summary elsewhere soon, but for now, here's the skinny:

- ContainerAware factories are in general OK. Not ideal, but OK.

- ContainerAware other things are in general not OK.

- There are a couple of possible ways to avoid ContainerAware factories, but most seem to be too complex or not-yet-written to be of use here.

- In simple cases where the factory logic boils down to just a 1:1 mapping, it's probably easier to avoid the factory entirely and just encode that into the service IDs, just as we're discussing for the cache system: #1764474: Make Cache interface and backends use the DIC.

In this case, I think we could probably get away with registering entity.query.$entity_type (because each entity type will be using only a single storage backend, and the EFQ implementation is bound to the storage backend, or at least was when I last looked at it). Then you simply $container->get('entity.query.' . $entity->getType()); and you're good to go. (I may be wrong here if there's complexity I'm missing, but it seems like we should be able to do that.)

chx’s picture

Re EFQ, the complexity you missed is a) there's an optional conjunction argument for top level conditions b) you need a new query object every time otherwise your conditions for example just keep piling up. I will stay with the two-factory method :)

Crell’s picture

Ah, then I think what you're putting in the DIC is a factory for that backend. So $container->get('entity.query.' . $entity->getType())->gimmieANewQuery($conjunction);

Which nicely mirrors how you get a database connection object and call ->insert() on it to get a new insert object.

So basically, almost the same as code snippet 3 above except $entity_type moves into the service ID.

chx’s picture

That's pointless, more or less. If you are running a $container->get->get type chain anyways, then it is absolutely not worth doing that. $container->get('entity.query')->get($entity_type, $conjunction) works totally well and does not need a looping construct for every entity type -- instead you register one per storage backend. And, now the figureOutModule code part is just 1 LoC.

chx’s picture

Another advantage is that when you are running a lot of queries you can do

$factory = $container->get('entity.query');
foreach() {
  $query = $factory->get($entity_type, $conjunction);
}

Making it slightly faster.

Crell’s picture

When I'm running "a lot" of queries, they're usually on the same entity type. So

$factory = $container->get('entity.query.node'); 
foreach() {
  $query = $factory->get($conjunction);
}

is just as fast. Really, it's a wash performance-wise. My point is that when figureOutWhichToUse() is one line, there's no need to even make a factory object out of it. If it were more involved, sure, then it makes total sense. But when it's simple, I just don't see the point.

chx’s picture

When you are running a lot of queries you are in field_has_data which definitely iterates over every entity type.

Sylvain Lecoy’s picture

Hi there, I had a thought recently about what is good and what is bad. In substitution patterns, in order to isolate tests and their depended-on objects, we can use several mechanisms.

Here we are covering Dependency Injection and Dependency Lookup, in a testing context:

  • Dependency Injection injects the test double via the API of the object under test.
  • Dependency lookup retrieves the test double via another object. The other "locator" object should then provide some mechanism to configure it during unit tests with a test double.

As long as the two above sentences remains true, I personally think its good practice to continue. To connect this issue, ContainerAware are perfectly acceptable in that the container will be the object locator.

However, I recently saw in a EntityListController a commit which break object encapsulation (@see #1781372-65: Add an API for listing (configuration) entities, in the constructor, the dependency lookup is done via entity_get_controller() and entity_get_info(). These functions allows a hacky mechanism to configure them, e.g. by using drupal_static_reset() BUT it is not obvious and hacky.

For the simplicity, we should either use: Dependency Injection, and refactor the controller signature by changing it from public function __construct($entity_type); to public function __construct($entity_type, array $entity_info, EntityStorageControllerInterface $storage);. Or use Dependency lookup and do it correctly by providing a mechanism to configure the object locator, e.g. entity_set_controller.

Do you see what I mean ? Should I open a [meta] best practices issue for that ?

Crell’s picture

Sylvain: So you're saying you agree with the statement from #1 in general, but we're not applying DI consistently in Drupal yet? I would agree entirely with the second point. A lot of the code slush work is going to have to be continued refactoring to leverage DI better. Sadly a lot of newer work has gone in without regard for that, which means we're going to have to rewrite it now that the functionality itself is in. :-(

Sylvain Lecoy’s picture

Crell: Yeah, I actually agree with the statement from #1 in general, ContainerAware is very fine. I have set a handbook page for DI in the unit tests point of view if you want to have a look: http://drupal.org/node/1814344. This page is mainly taken from an excellent paper on Cairngorm a Flex framework, and you may want to jump directly on the Substituting - Dependency Injection and Dependency Lookup section.

Maybe we should redirect people to this to help them understand best-practices.

catch’s picture

Hmm in terms of new things going in and not taking care of DI, I've been working on the following basis:

- if the not-injected-thing is in the container or should be, then it's CNW automatically or at least a follow-up to fix it.

- if the not-injected-thing is not in the container and won't necessarily be for a long time, then let it go in, at least assuming there's an issue like #1763974: Convert entity type info into plugins around.

In the case of entity info, unless what gets injected is exactly the same as the current array, then we'll need to go back and change things anyway. If we think we should just inject all the things even if the thing being injected is far from where it likely ought to be, then I'd not necessarily be against that but I can see a lot of pushback from patch authors so it could use an issue to discuss it (which I'm not sure this one necessarily is).

Crell’s picture

Entity info is a complex space, because it's still changing rapidly. As I said, we're going to have a lot of cleanup to do in Slush around this space. Fortunately that's what slush is for. :-)

katbailey’s picture

Related to this but not so far discussed here is the issue of how we actually *do* pluggability, i.e. how does a module provide an alternative to one of our pluggable systems and how does that get picked as the one that gets used. This is how I'm imagining it will work:

  • $conf variables specify the service name to use for any given pluggable system, and the factory for that system returns the correct service based on the value of the $conf variable
  • We use a particular tag for each pluggable system, so that services can be checked at compile time to make sure they implement the correct interface for that system
  • The module provides a class implementing the interface for the pluggable system in question, e.g. CacheBackendInterface
  • The module provides a bundle which registers this class as a service to the DI container (tagged with the correct tag) along with all its dependencies (whether they are existing services provide by core, such as the 'database' service, or other new services it registers).

There was discussion over at #1809206-30: KeyValueFactory hard-codes DatabaseStorage about whether we could use service parameters to achieve configurable services, but I don't think that is feasible without Symfony's Config component. Once we remove the giant hack that is the bootsrap container, management of DIC services will happen entirely via the kernel. So I really do think we're stuck with using $conf vars.

Thoughts?

sun’s picture

Once we remove the giant hack that is the bootsrap container, management of DIC services will happen entirely via the kernel.

Sorry, but this is too tempting. Someone still needs to answer the question of "42": #1706064: Compile the Injection Container to disk changed the bootstrap and kernel invocation to pass a system/module list into the kernel's constructor. Accessing that list requires at minimum two to four services already. Which, on top, we want to be swappable already. How can we remove the bootstrap container without introducing a critically conflicting race condition?

katbailey’s picture

That one change bothered me a lot actually and happened at the very end - I kept my mouth shut about it because I really just wanted that patch to land and figured I would deal with it in the ExtensionHandler patch which is the main issue that unborks the whole bootstrap mess: #1331486: Move module_invoke_*() and friends to an Extensions class.

Berdir’s picture

@katbailey The database DIC issue currently also uses %database.info% instead of globals, we'll have to rewrite that too if we can't do that.

There are two reason why I really dislike to use global $conf for this:

a) It's used for way too many different things. We use it for variable overrides (will go away), config() overrides and we also use it for settings.php-only overrides that happen before the variables are loaded, but still used variable_get() (at least in 7.x code) because it's so convenient.

b) "global $conf". So we're rewritting so much, are avoiding to use functions in classes, use dependency injection, all to make it possible to write unit tests and have clear separations and then have no better idea than using globals in there, all over the place? Srsly? :)

We can still use setParameter() on a compiled DIC, right? What about something like this?

settings.php
$parameters['my.parameter'] = ....;

DrupalKernel.php, after we have the container, either a built or a compiled one.

global $parameters;
foreach ($parameters as $name => $parameter) {
  $container->setParameter($name, $parameter);
}

This still uses a global, but it would be centralized in one place and the implementations wouldn't have to know, it could also be set explicitly in a test container, for example. There might even be a way to avoid that global, I'm not sure, the function that reads settings.php could place it somewhere where DrupalKernel could get it from?

katbailey’s picture

@Berdir um, sure :-) That sounds great. My main point was really that we use settings.php to specify the service names to be used - certainly it would be good to find as elegant a way as possible to do that.

sun’s picture

We might need to move the kernel/container building into drupal_settings_initialize(), passing the in-build $container directly into settings.php, so settings.php can do whatever it wants with it.

Swapping out and overriding services will get considerably harder with that, since you need to know the internals of $container, and worse, Drupal's custom service IDs and parameter names inside-out, but to me it almost looks like the only viable solution that would successfully combat the race condition problem.

Also, to clarify early, settings.php violates each and every test paradigm, but that happens by design. Its sole and entire purpose is to manipulate how the system would work under normal circumstances. Just put some heavy override into settings.php and witness how the entire Drupal core test suite will fail.

Moving the kernel/container building even further up (lower) in the bootstrap is guaranteed to cause some other funky race conditions though. We'd essentially replace most (but not all) parts of _drupal_bootstrap_configuration() (i.e., DRUPAL_BOOTSTRAP_CONFIGURATION).

The most obvious part is that the kernel/container wants to get compiled, but that relies on the filesystem, and thus stream wrappers, and in turn, system.module to register the built-in stream wrappers, which in turn are configurable, so that requires configuration (which by default requires cache, which in turn may require the database [at least by default]), ...

I still have my doubts whether it is architecturally right to aim for eliminating the bootstrap container in the first place. Doing so will eliminate most special-casing around Drupal's early bootstrap phases. But at the same time, it factually eliminates the entirety of performance optimizations that currently exist. A kernel/container without bootstrap container inherently means that Drupal is a fully-blown memory beast as soon as it gets instantiated. And since the bootstrap container is replaced with the kernel/container, that only can and has to mean that we have to get from zero to hero in all cases, regardless of whether it makes any sense from a performance perspective.

In total, I really appreciate and love that we're touching code that hasn't really been touched for years. But seriously, I'm missing a clearly laid out architectural plan that goes beyond pure DI idealism and takes real world performance and other issues into account. :-/ Every single change on this front thus far caused Drupal core to become measurably slower and there are close to zero proposals in the queue for how to re-establish former numbers (which by all means, were horribly bad already in the first place). IMHO, we really need solid answers for these problem spaces as soon as possible. I don't think anyone of us wants to release a D8 that is twice or thrice as slow as D7 (which on its own was at least two times slower than D6).

Crell’s picture

A couple of points:

1) I still believe we need to offer a SiteBundle of some sort to allow site-specific manipulation of the container, in a fashion that would get compiled. that's a much better approach for most swap-outs than building up the container and then trying to futz with it.

2) sun is correct that the phased bootstrap that gradually gets bigger is on its way out. That's by design, as it's super-fragile. Yes, portions of that are intended for optimization purposes, but really... in what circumstances are those useful?

- The database phase already does nothing but load helper functions.
- The page cache phase should get replaced with HttpCache (#1597696: Consider whether HttpCache offers any significant benefit over the existing page cache), which would happen before the Kernel does anything other than get instantiated. So it would actually be earlier than it is now.
- Configuration needs to happen before anything, so that we have CMI. Agreed.
- Session will hopefully be lazy based on #335411: Switch to Symfony2-based session handling, so it won't need an ordered step.
- Page header invokes hook_boot, which catch has already agreed needs to die in favor of a request listener. It also runs ob_start(), which with the Request/Response model serves no purpose anymore.
- Bootstrap code includes a crapload of procedural code, plus modules. The module loading should move to be lazy by #1331486: Move module_invoke_*() and friends to an Extensions class or a follow up therein. The procedural code loading we can't really get rid of, but at least if you're on APC it's not a big deal.
- bootstrap full sets up the language (should become lazy), the path system (will become lazy via #1269742: Make path lookup code into a pluggable class), initializes the theme system (who knows what happens to that after Twig gets in), and runs hook_init (which should also get replaced by a kernel request listener).

So looking at the bootstrap pipeline... I see a lot of things that aren't needed, or won't be needed. There's nearly no point in the current pipline where you can intercept the process and do "less than full" and do anything useful other than page caching, which is planned to move earlier than it is now anyway.

So really, it's entirely true that we'd have to go "zero to hero" without a bootstrap container, but that's not a regression. That's already effectively the case. Hero is just really expensive. We should be focusing on making Hero cheaper, not on avoiding Hero. Avoiding Hero hasn't worked out so well.

My general answer to making Hero cheaper is "lazy instantiate things via the DIC, and make partial page caching via WSCCI/Scotch work". We're working on that. :-)

And now that I think about it, comment #16 on is off topic for this thread, including this comment. :-)

sun’s picture

in what circumstances are those useful?

Erm. High-availability, large-scale sites? Skipping a full-blown bootstrap is known to have an extraordinary effect on performance; e.g., see js.php - the numbers presented there also apply to +APC.

Early page cache, and even more so, page cache without hooks, essentially follows that paradigm and achieves the same: Avoiding each and every CPU + memory consumption that could possibly slow-down the request. There's no way you can deny or ignore the success of doing so.

Thanks, however, for taking the time to create the overview of tasks that are currently being worked on.

In light of these, it pretty much sounds like we're introducing a god object with the kernel/container, and in the end, we're replacing drupal_bootstrap() with the kernel.

On that ground, the logical conclusion would then be that all of the _drupal_bootstrap_*() phases are moved into the kernel. Potentially as events (we have to make them faster anyway). Some phases, as you already outlined, will cease to exist anyway.

Off-hand, I don't really see how that would resolve any of the race conditions we're facing, but at least from my perspective, those race conditions are exactly what makes this discussion totally on-topic for this issue. Pluggability is not only defined by extensions participating in the system, but also by "settings.php." Both sources are potential root causes for race conditions that need to be addressed. settings.php is arguably a bigger offender in our architecture, but nevertheless, we're talking about one and only one topic.

Crell’s picture

I've not done super-HA sites, but just from reading the code early page cache is the only step I see that you CAN jump in on. Well, I suppose you could print-exit in hook_boot, but that's Wrong(tm) already. I don't see the big list of steps as being performance-useful now. Therefore, eliminating it doesn't really hurt anything.

IMO, the main HA optimization should be pre-kernel partial page caching via HTTP, be that Varnish or HttpCache.

Sylvain Lecoy’s picture

Also, seems it is the place to discuss two problem spaces:

  1. I can see we are mainly using container as a service locator, we should not use the term Dependency Injection but then Dependency Look-up.
  2. What I mean by services is actually factories, registering factories into a DI is an approach I've never seen before, and could lead to a potential drupalism. EDIT: they seems using this approach in Symfony, stills, that's not common.

Proper dependency injection is actually different than dependency look-up. Both are fine but they are different. When you do dependency injection, you configure your container so when you ask for a new object it returns the object configured with its dependencies. Basically you define a "configuration" of objects against interfaces and let the DIC instantiates and resolves dependencies.

It is worth to remember that inversion of control is a pattern that we are using every days in Object-Oriented Programming. When you pass by reference a "depended-on" object, you've really just done injection of dependence - also named Inversion of Control. The process of letting an object manage the entire life-cycle of objects and their dependencies has led to DIC. With this mechanics, the programmer don't have to type "new" keyword anymore. That's how it came in Java, Flash and other strong typed languages.

Dependency Injection is the process of transforming a constructor which creates its own dependencies to a constructor who get its dependencies through an external mechanism.

  public function __construct() {
    $this->service = new MyService();
    $this->storage = new MyStorage();
  }

Manual configuration of depended-on object

to:

  public function __construct(ServiceInterface $service, StorageInterface $storage) {
    $this->service = $service;
    $this->storage= $storage;
  }

Dependency Injection style configuration of depended-on object

And this is how we do in Drupal:

  public function __construct() {
    $this->service = drupal_container->get('service');
    $this->storage= drupal_container->get('storage');
  }

Dependency Look-up style configuration of depended-on object

More advanced Dependency Injectors uses the language internals such as Reflection on private properties, that way you let the DI configure objects without degrades encapsulation and security mechanism that the former example has compared to the two later. In the second example, you open possibilities for foreign services and storage object that you have no control about. But in PHP we have no such a mechanism so we'll skip speaking about security and possible breaking of encapsulation, that's another subject.

Secondly, we are getting factories everywhere, but is this really necessary ? The purpose should be to register an interface, then the DIC returns the product of the factory related to the DIC configuration. Avoiding runtime operations such as factories creations and so on.

If we need a service locator, then the DI component introduces overhead, but as soon as it gets lazy instanciated, it should not matter. A more adapted pattern for dependency look-up can't really be found, Symfony is doing it good. If we need dependency injection, then the entry point could be more in a DI fashion than in a DL (dependency look-up) fashion. But I don't know if the DIC of Symfony can handle that, and want to handle that. Java keeps the state running, so DI perfectly fit. In PHP however, the bootstrap of such a configuration takes time, and its better not to manage registered objects life-cycle. We should go as straight as possible, that's why IMO there is some factories that can be avoided.

What I eventually want to bring is that it is not common to adds factories in a dependency injection container as the IoC patterns emerged to avoid using factories. When using a factory your code is still actually responsible for creating objects, by DI you outsource that responsibility to a framework which is separate from your code. That could be a potential drupalism to learn: the way Drupal uses DI.

Sylvain Lecoy’s picture

For instance we could hook the configuration of the container, so that a module could configure it.

Example, I have an OAuth module. Depending on the server configuration it can either use the native PECL lib, either use the embedded library.

The OAuth service get registered as a OAuthConsumerInterface, on hook invocation, the oauth module (if enabled) configure the container with the according implementation. That way, all client API using OAuthConsumerInterface gets a native PECL wrapper, or a PHP lib depending on the setup, and the beauty of this is that it does not requires a factory at runtime, everything is configured before compiling - at configuration time. Do you get the idea ?

Of course this is a perfect example to illustrate, maybe problem is not as simple that's why we need factories. But again, if we inject factories in the container that's a good indicator that there is something not done correctly: either in the design, either in the implementation. Good Dependency Injection/Look-up is the example I gave, and this is my opinion, but I think Symfony is also doing it wrong on their Factories documentation.

The dream plan for example like this would be to: allow the client API (here modules) to configure the container, then compiles it. At configuration, the heavy lifting on configuring which object to put in the container, at runtime, the compiled objects are as fast as possible. The challenges are to think about not using factories, but try to resolve the dependency at configuration time, and of course also keep in mind that DIC have a cost, too. A side effect of doing this is that the classes well designed are also more unit-testable because you can inject a mock/stub object instance into the container, during tests.

<?php
  protected function setUp() {
    $this->container->set('oauth_consumer', new MyMockedOAuthConsumer()); // Implements OAuthConsumerInterface.
  }
?>

I will not rewrite this article, but will refer to it: http://tutorials.jenkov.com/dependency-injection/dependency-injection-re....

catch’s picture

@Crell I think what sun means is custom .php files that do a minimal bootstrap then something else. For example http://api.drupal.org/api/drupal/core%21modules%21statistics%21statistic... (which is in core). Not for example that hook_init() in Drupal 7 and 8 loops through the module info for CSS and JS files to add them to the page, regardless of the request type #1029460: Find a better hook than hook_init() for drupal_add_js() and drupal_add_css().

Presumably the answer to custom PHP files like these is that they'll eventually be a route, and that bootstrap becomes 'light enough' to handle that request efficiently. We're a very, very long way from that at the moment but I'm tempted to hold the release up on it if that becomes necessary.

A new global in settings.php for configuring pluggability seems fine to me, or some kind of site bundle.php - I'm fine with anything that's 1. in a PHP file 2. provides a centralized place to set up stuff for a site 3. isn't a measurable performance regression from $conf

I'm very concerned about the various dependencies for having a compiled container, it feels like #534594: [meta] system info cache writing and registry rebuilding is broken without a full bootstrap all over again at the moment.

Crell’s picture

Sylvain: We know the difference between a DIC and an SL. We're using Symfony's DI component as a DIC as much as possible, but in many cases, especially legacy code, that's not feasible. Factories that serve little more purpose than mapping to other services are safe to make ContainerAware. Controllers are as well. Little else is, but it's an ongoing process to get there.

catch: With the Kernel and Container, separate .php files are already barely functional. Those really should switch to either normal routes or their own dedicated kernel, as the installer team is working on: #1530756: [meta] Use a proper kernel for the installer.

Presumably the answer to custom PHP files like these is that they'll eventually be a route, and that bootstrap becomes 'light enough' to handle that request efficiently. We're a very, very long way from that at the moment but I'm tempted to hold the release up on it if that becomes necessary.

Exactly the goal I'm trying to drive us toward. :-) It's just a long road. The dependency knot for the kernel/container/bootstrap is fugly, but is being slowly unraveled. We just need to make sure to support CLI rebuilds, as a canary if nothing else.

Crell’s picture

Issue summary: View changes

typofix

Crell’s picture

Status: Active » Fixed

This is pretty well settled by now.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.