Coming from #1839090: Call to undefined function drupal_get_path()

When a page is served from cache, only modules set bootstrap enabled in their system entries are loaded. If the module's Statistics API implementation happens to call functions that are defined in the module's .module file, they will not loaded when the page is served from cache.

Seems like there are a couple of possibilities:

  1. Make developers aware of this behavior in the API documentation and make it their responsibility to include the required files before running their code. This seems like the easiest way to go, but may not have the best DX
  2. If a Statistics field is active, load the declaring module. This isn't particularly performant (suppose a very heavy module has a very lightweight statistics.inc that doesn't make a call back to a mainline function), and this may have unintended consequences (some modules may improperly call drupal_get_path() at the top of the file to include random other incs). DX could be good in some ways, but bad in other ways.
  3. Another possibility is adding something to the API that somehow allows developers to reference multiple files that need inclusion. I don't know of any other Drupal system that allows this, so it's not standard. It also seems a little cumbersome and the API is already a bit of a mouthful.

Any thoughts?

Comments

mondrake’s picture

After some trials, I tend to think that this case requires dedicated focus while developing the integration plugin.

Out of experience with two modules I developed some integration for, 'Browscap' and 'IP Geolocation Views and Maps':

  • drupal_get_path() won't work (see #1839090: Call to undefined function drupal_get_path())
  • also, module_load_include() won't work
  • so, to include module files you will need to use require_once dirname(__FILE__) . '/path/to/yourmodule.module';, and ensure that any chained inclusion follow the same logic (i.e. you cannot have a module_load_include() in the file you include through require_once...)
  • file inclusion may not be enough - a hook_init implementation that would normally 'do something' and be triggered at module load, would need to be explicitly called by the plugin in case of cached page (this is the case for ip_geoloc)
  • any dependency should be addressed manually (i.e. if a module's function depends on a function in another module, the latter would have to be manually loaded as well)

With this said, I believe options 2 and 3 above would fall short anyway (#2 because if developer is not checking all the possible implications, dependencies, etc, the entire call would fail, #3 because you would miss the function calls of hook_init). It is really a matter of developing the statistics plugin callback in such a manner that it can cope with cached page requests.

My ideal scenario is as follows (I do not know if this is all possible, but let's dream :)):

  1. Make clear to developers they have to deal specifically with this case, if they want to (per #1 above).
  2. In the API, add a parameter in hook_statistics_api() to declare if the module is capable or not of collecting data for cached page requests. By default, NOT. Better Statistics should use this to decide whether or not to call the callbacks of the module at runtime.
  3. In the admin form, add a 'cached request' fieldset where an additional flag for each of the modules that declared themselves 'cache-ready' could be set/unset by an administrator. For performance reasons, you may want to disable data collection in this case even if the module is capable of. (example - geolocation adds a heavy overhead that overcomes the benefit of caching the page)
  4. In the admin form, again, and again for perfomance opportunities, have a general switch to switch on/off completely accesslog statistics for any request served from cache.

Cheers

mondrake’s picture

I just realised an alternative could be to call drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL); if we are serving a cached page. But that would slash any benefit of caching I believe.

EDIT:

However, calling that bootstrap phase during hook_exit can lead to nasty issues like

Warning: Cannot modify header information - headers already sent by (output started at /.../includes/bootstrap.inc:1364) in drupal_send_headers() (line 1216 of /.../includes/bootstrap.inc).

if the called module's hook is trying to write to the returned page (I guess that's because in a caching concern the request has been executed and returned already)...

iamEAP’s picture

Hmm... I think I'd like to avoid complicating the API much further.

I wonder if, rather than bootstrapping Drupal further, it would be sufficient to always include common.inc. I'd bet a majority of functions required would be found in there, though there may be some cases where a function in common.inc relies on a function elsewhere...

mondrake’s picture

I'd leave that to the plugin developer, in order not to load Better Statistics with unnecessary overload.

For instance, I ended up including that for the IP Geolocation plugin (it is working now), as one dependency requires also drupal_http_request()

    // If caching, the module will not be loaded. In such case,
    // ip_geoloc_get_visitor_location() will be undefined.
    if (function_exists('ip_geoloc_get_visitor_location')) {
      $geoloc = ip_geoloc_get_visitor_location(); // uses internal cache
    } 
    else {
      require_once DRUPAL_ROOT . '/includes/common.inc';
      $list = system_list('module_enabled');
      require_once DRUPAL_ROOT . '/' . $list['ip_geoloc']->filename;
      require_once DRUPAL_ROOT . '/' . $list['smart_ip']->filename;
      $geoloc = ip_geoloc_get_location_by_ip(ip_address(), FALSE, TRUE, FALSE, TRUE);
      if (!empty($geoloc) and (!isset($geoloc['provider']) or empty($geoloc['provider']))) {
        $geoloc['provider'] = 'ip_geoloc';
      }
    }
iamEAP’s picture

I have a feeling there won't really be a clean, friendly way to solve this problem. Someone's going to have the feel the burden eventually, and I think it's better to pass it to developers (to ensure their code works for both cached and uncached requests) than to site builders (who may not necessarily understand or want to understand caching).

With that being said, can we move forward with providing better documentation to developers? This might be a good opportunity to create some documentation pages on d.o.

mondrake’s picture

Title: API - Calling a module's function on a cached page request » API - Dealing with statistics data collection on cached page requests

I tend to agree :), I am also proposing to change the issue title here.

Just a few free thoughts for your consideration, iamEAP:

  1. I still believe that a general switch 'get statistics for requests served from cache yes/no' would make sense to admins, for general performance opportunity. Also it looks like in Drupal 8 cache/non-cache statistics collection may take fully separate paths (if hook_exit() is eventually dropped), so this 'main switch' could support activation (or not) of the cached functionality.
  2. Is there a way that the information whether a page is served or not by cache is 'gathered' by Better Statistics and passed on to the implementing module? Just to avoid that each module get that info its own way (e.g. checking function_exists, headers, drupal_bootstrap), not to mention that bootstrap phase may be raised by a module in the process of data collection.
  3. Last, currently in better_statistics_exit() the very first thing the routine is doing is raising bootstrap phase to DRUPAL_BOOTSTRAP_SESSION. I believe this is so to match the core function. But - is it really necessary for cached pages?
iamEAP’s picture

I think #1 would be a good idea, but probably rolled in as part of #1879726: Should statistics be collected on every page? (in addition to path-based and role-based inclusion). Though it might make sense to include this on the performance configuration page rather than the statistics configuration page.

I think #2 is also probably a good idea. I don't know the best way to hand off the info. We'd have to break the API to include the info as part of the call to the respective field data callback. Suppose we could provide a function that modules can call like better_statistics_cache_status() or something like that.

I believe the only reason it's necessary is because accesslog stores a session ID and DRUPAL_BOOTSTRAP_SESSION is required to gain access to that information.

iamEAP’s picture

Status: Active » Needs review

Here's a patch that:

  • Adds a function better_statistics_served_from_cache() that returns TRUE for cache hit, FALSE for cache miss, and NULL for uncacheable,
  • Makes use of the function for the implementation of the "cache" field provided by the module,
  • Adds documentation to statistics.api.php about callback functions being used on cached pages.

For your purposes, the function returning FALSE vs. NULL doesn't matter much, since in both of those cases, Drupal's bootstrap phase should be full.

iamEAP’s picture

mondrake’s picture

Maybe I am paranoid.... but theoretically better_statistics_served_from_cache() could be invoked before headers are fully sent, so freezing the value of the static variable in the function. I changed that to include a switch to force scanning the headers, and included a call at the beginning of better_statistics_exit() - so to make sure cache status returned is the one relevant at that point.

Maybe with that the following code could become conditional on $cache_status === TRUE??

    drupal_bootstrap(DRUPAL_BOOTSTRAP_SESSION);
    include_once DRUPAL_ROOT . '/includes/unicode.inc';
iamEAP’s picture

That's actually a good point. We may as well just get rid of the static caching entirely.

iamEAP’s picture

Ah, actually, nevermind.

The headers_list() function will return all headers regardless of whether they've been sent or not. Static caching is probably a good idea, still. Disregard #11.

Some confusion might come from the fact that there isn't necessarily any correlation between cache status and bootstrap phase. If what you want to know is the current phase, calling drupal_bootstrap() should return the phase. The call to bootstrap to the session phase in hook_exit() incurs very little overhead on uncached pages because it would just be a no-op (drupal_bootstrap would see that Drupal's already bootstrapped to a higher phase and just return).

mondrake’s picture

Status: Needs review » Reviewed & tested by the community

I prefer to have static as module plugins will likely call the function in the callbacks to determine how to return the field value.

So either patch in #9 or in #10 are ok with me. The #10 will just make sure that even if better_statistics_served_from_cache() is called before the headers are built, the status of headers at hook_exit is checked.

iamEAP’s picture

Status: Reviewed & tested by the community » Fixed

Thanks, mondrake. Committed #10 with small changes to comments.

http://drupalcode.org/project/better_statistics.git/commit/a74ae36

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.