Support from Acquia helps fund testing for Drupal Acquia logo

Comments

markus_petrux’s picture

hmm... let's say modules that need a library need to implement hook_libraries_info(), maybe something like this:

function wysiwyg_libraries_info() {
  return array(
    'fckeditor',
    'tinymce',
  );
}

Now, libraries API knows what it will be requested to find at one time or another. For example, Wysiwyg API when working to set up the node edit form, or whatever else form, could ask for the path to 'tinymce' like this:

$path = libraries_get_path('tinymce');

Here, Libraries API first would check the cached repository. If a match is not found, then a look up against the file system is triggered (this look up would search for the library in sites/all, sites/domain, etc.). If that scan fails, libraries_get_path() returns FALSE. If that scan succeeds, it caches the result and returns the path.

If a match is found in the libraries cache, it could try a simple file_exists() to make sure the library is still there. If not, then the cached information would be removed, and FALSE would be returned. If a match is found in the cache and confirmed by file_exists(), then the path would be returned.

With this method, every invocation to libraries_get_path() would mean a minimum request to the file system (file_exists()), a scan against sites/all/thing, sites/domain/thing, at a maximum.

libraries_get_path() could also have a second argument so that the caller can bypass file_exists() confirmation. Or the other way around, by default file_exists() confirmation is not performed when a match is found in the cache. The file system look up could only be forced by the calling module on the admin side or similar.

Finally, this needs some kind of garbage collector to clear cached stuff that will never be requested again, and that is not present on the file system. Maybe here the libraries API could detect when a module is disabled, and if it was implementing hook_libraries_info(), then try to clean the related libraries. Or maybe using a cron task that executed once a day or so.

Not sure if I'm missing something. :-|

sun’s picture

- We probably want to cache the currently available libraries permanently until an administration page is visited. In Wysiwyg API's case, that would be admin/settings/wysiwyg/*. I'd do something like drupal_flush_libraries() [ideal example] there.

- We want to prepare this cache, so a minimum of additional processing is done at (regular) runtime.

- We probably want to allow partial cache flushes, i.e. drupal_flush_libraries('wysiwyg'), which invokes wysiwyg_libraries_info(), removes only those entries from the cache, and cache_set()'s back the remaining result (via hook_exit() ?).

- libraries_get_library('foo') gets the current cache of all libraries; if no cache exists, uses library_get_path() to scan for that library, determines its version, performs dependency checks, executes an optional 'load callback' (or similar; to allow modules like Wysiwyg to attach further properties) and stores back the cache.

- libraries_get_path() probably does not have to be cached, because it's a simple file system operation, and combined with a cached registry of libraries, it's probably invoked occassionally only. (That patch for Wysiwyg API removed all instances but those in editor library definitions.)

- Debatable: Whether to cache this registry in one BLOB, or cache each library separately.

- Debatable: Whether library info should be an object of class DrupalLibrary.

Relevant code from wysiwyg.module:

  $editors = wysiwyg_load_includes('editors', 'editor');
  foreach ($editors as $editor => $properties) {
    // Fill in required properties.
    $editors[$editor] += array(
      'title' => '',
      'vendor url' => '',
      'download url' => '',
      'editor path' => wysiwyg_get_path($editors[$editor]['name']),
      'library path' => wysiwyg_get_path($editors[$editor]['name']),
      'libraries' => array(),
      'version callback' => NULL,
      'themes callback' => NULL,
      'settings callback' => NULL,
      'plugin callback' => NULL,
      'plugin settings callback' => NULL,
      'versions' => array(),
      'js path' => $editors[$editor]['path'] . '/js',
      'css path' => $editors[$editor]['path'] . '/css',
    );
    // Check whether library is present.
    if (!($editors[$editor]['installed'] = file_exists($editors[$editor]['library path']))) {
      continue;
    }
    // Detect library version.
    if (function_exists($editors[$editor]['version callback'])) {
      $editors[$editor]['installed version'] = $editors[$editor]['version callback']($editors[$editor]);
    }
    if (empty($editors[$editor]['installed version'])) {
      $editors[$editor]['error'] = t('The version of %editor could not be detected.', array('%editor' => $properties['title']));
      $editors[$editor]['installed'] = FALSE;
      continue;
    }
    // Determine to which supported version the installed version maps.
    ksort($editors[$editor]['versions']);
    $version = 0;
    foreach ($editors[$editor]['versions'] as $supported_version => $version_properties) {
      if (version_compare($editors[$editor]['installed version'], $supported_version, '>=')) {
        $version = $supported_version;
      }
    }
    if (!$version) {
      $editors[$editor]['error'] = t('The installed version %version of %editor is not supported.', array('%version' => $editors[$editor]['installed version'], '%editor' => $editors[$editor]['title']));
      $editors[$editor]['installed'] = FALSE;
      continue;
    }
    // Apply library version specific definitions and overrides.
    $editors[$editor] = array_merge($editors[$editor], $editors[$editor]['versions'][$version]);
    unset($editors[$editor]['versions']);
  }
  return $editors;
markus_petrux’s picture

hmm... it seems to me that this approach complicates things too much.

The common problem that needs to be solved, I think, is that modules need to know the location of certain libraries. Users need to know where they can install libraries required by certain modules in a directory that's independent from the module location.

For example, module_a needs library_a:

1) User can install library_a at sites/all/libraries/library_a, sites/domain.1/libraries/library_a, sites/domain.2/libraries/library_a, and so on...

That's easy to explain, easy to manage for the user. And it's similar in concept on how module/theme locations work.

2) All module_a needs to do is libraries_get_path('library_a'), and if it will get the location of library_a the user wants to have enabled for the current site. ie. if running on domain.1, it will get "sites/domain.1/libraries/library_a". if running on domain.2, it will get "sites/domain.2/libraries/library_a". if running on domain.3, it will get "sites/all/libraries/library_a".

3) Libraries don't need .info files. Libraries cannot be nested in subdirectories. Only at "sites/(all|domain)/libraries/library_a".

Isn't it simple and nice?

If that's not enough, then I think further information (and methods) attached to each library will vary too much from one library to another, so that could be implemented by each separate module, independently.

If time and experience is able to tell more common requirements related to library management, that could be added at any time. But now, it seems to me quite complex to accomplish, because we don't know what other libraries/modules need.

sun’s picture

Version: » 7.x-1.x-dev
Issue tags: +Libraries

Tagging.

markus_petrux’s picture

For the record: Here's a good reason to keep 3rd party libraries off the modules directories: #546584: Modules or themes with too many files kill drupal_system_listing performances.

tstoeckler’s picture

Status: Active » Needs review

This one is based off of #719896: Add a hook_libraries_info().
It puts a cache_set() in the detection and a cache_get() in the loading. That's all. I didn't roll a patch, because that would have included all of #719896, so here's the relevant code.

/**
 * Detect libraries and library versions.
 */
function libraries_detect($libraries) {
  foreach ($libraries as $name => $library) {
    libraries_detect_library($libraries[$name]);
    cache_set('libraries_' . $name, $library);
  }
  return $libraries;
}

/**
 * Loads a library.
 */
function libraries_load($library, $variant = NULL) {
  $library = cache_get('libraries_' . $library);
  if (empty($library)) {
    $library = libraries_info($library);
    libraries_detect_library($library);
  }
  libraries_load_files($library, $variant);
}
wizonesolutions’s picture

I've been reading this thread. Is it going to be module developers' jobs to provide ways to flush the library caches? Just checking because this code provides caching and getting from the cache, but not a way to clear it.

tstoeckler’s picture

First and foremost, I noticed while testing, that there are some bugs in the code in #6. I didn't roll a (new) patch yet, because I'm juggling quite some Libraries API patches currently and didn't have the time to sort it all out.

Re #7:
Well, the cache is (currently) stored in the normal cache table, so every time you hit the 'Clear caches' button on the performance page, your libraries cache will be deleted. Also once there is a UI for Libraries, the plan is to delete the libraries cache there and do a rescan.

wizonesolutions’s picture

Makes sense.

Yeah, there are quite a few patches floating around huh.

What needs to be changed in #6? If I have time before you maybe I can give it a shot.

Kevin

tstoeckler’s picture

I think it was a serialization issue. I didn't dig too deep but for some reason with the above code, in libraries_load() the cache_get would return an object instead of an array. If I remember correctly, I somehow got it to work, though.
I bet sun will have some expert advise on this, as well as the general approach.

tstoeckler’s picture

This one works. And passes tests.

Details:
- A cache_set in libraries_detect() [not in libraries_detect_library(), which is what we call in libraries_load()]
- A cache_get in libraries_load. On a miss, we call libraries_detect_library (see above).

Notes:
I think it doesn't make much sense, that in order to detect all libraries you currently have to do:

$libraries = libraries_info();
libraries_detect($libraries);

I think we should call libraries_info() inside of libraries_detect() (just like we do libraries_info($library) in libraries_load()), so that the above would just be

libraries_detect();
tstoeckler’s picture

Great, I forgot to mention the most important thing :)

Currently we don't call libraries_detect() anywhere, so that the libraries don't get cached currently until you call libraries_detect() currently.
Thinking about this, that doesn't make much sense. Alternative patch for consideration, which does the cache_set in libraries_detect_library().

sun’s picture

Status: Needs review » Needs work

It's rather unlikely that you want to detect all supported libraries under normal operation. The most common case probably is that one module supports one library, and its functionality is only invoked on certain requests/pages (aside from admin pages, perhaps). Wysiwyg definitely showcases a more extensive use-case already, where one module supports multiple libraries. But even for that use-case, only a single admin page will try to detect all of the supported libraries, but of course only a sub-set of "client-side editor libraries", not "all libraries".

Therefore, caching each library individually probably makes sense. However, following core's practice, I think that only the load operation should try to retrieve from cache, or otherwise detect + cache - i.e., let's not squeeze caching functionality into an otherwise clean API function to detect a library.

+++ libraries.module	25 Jul 2010 20:01:38 -0000
@@ -187,7 +187,7 @@ function libraries_info($library = NULL)
 function libraries_detect($libraries) {
...
     libraries_detect_library($library);

@@ -278,19 +278,28 @@ function libraries_detect_library(&$libr
+function libraries_load($name, $variant = NULL) {
...
+    libraries_detect_library($library);

With this change... why would I ever want to call libraries_detect() ?

+++ libraries.module	25 Jul 2010 20:01:38 -0000
@@ -278,19 +278,28 @@ function libraries_detect_library(&$libr
+  // Update the cache.
+  cache_set('libraries_' . $name, $library);
 }

This will only set or update the cache in case the library is installed. All other early-out returns in this function won't cache the known results.

+++ libraries.module	25 Jul 2010 20:01:38 -0000
@@ -278,19 +278,28 @@ function libraries_detect_library(&$libr
+  $library = cache_get('libraries_' . $name);

{cache} may not be the best bin, as it's flushed on every content update, no? I'm actually not sure, but we should ensure that our items basically live on until we ourselves renew them (drupal_flush_all_caches() being the only other exception).

Introducing libraries_cache_get/set() helper functions may be worth to consider.

Powered by Dreditor.

tstoeckler’s picture

Thanks for the quick review.

I think that only the load operation should try to retrieve from cache, or otherwise detect + cache - i.e., let's not squeeze caching functionality into an otherwise clean API function to detect a library.

That's what the patch in #11 does. But that leaves us with:

why would I ever want to call libraries_detect() ?

Note that this is not changed in this patch. We need to think of a code-flow that makes more sense here. Maybe, in libraries_load, if we miss the cache, just do a libraries_detect()? It's really not THAT expensive (compared to other caches), esp. because the number of libraries will be <100 for 99.99% of sites (made-up number), and that would keep the API function libraries_detect_library() clean. It doesn't really feel right, though...

This will only set or update the cache in case the library is installed. All other early-out returns in this function won't cache the known results.

I thought this was a good thing. Early-out returns means, that the library could not be properly detected, so either libraries_load() will fail, or it will load the wrong files. This is of course related to #864376: Loading of external libraries, where we (currently) rely on loading of 'error-libraries', but that's a different thing to sort out.

sun’s picture

Live performance numbers readily available in #841794: Cache wysiwyg_load_includes() ;)

1) In libraries_load(), if we miss the cache, then of course we want to rebuild the cache, so it exists next time.

2) The only downside of cache handling directly in libraries_load() would be that <100 libraries on a site potentially also means <100 separate database cache lookups for <100 loaded libraries. I guess it'll work for now. We can think about mass/multiple cache/pre/post-read-ahead/loading in the future, if it becomes an issue at all.

3) Yes, early out returns are a good thing -- unless the function writes the collective result of what it learned at the end of the function. :) Not an issue anymore if we move the cache handling into libraries_load() though.

tstoeckler’s picture

Status: Needs work » Needs review
FileSize
1.36 KB

New patch. We update the cache both when calling libraries_detect() (for instance on /admin/reports/libraries (soon...) or maybe on module enable?) and on a missed cache on libraries_load(). libraries_detect() being more or less useless for now, I think, is something we'll have to fix or think about, when we've got a couple more issues sorted out (UI for one). We can always remove it or re-utilize it later.
Also multiple cache get and set functions are definitely useful, but again, I think for now this is all we can do, without overcomplicating the current system.

pounard’s picture

A single cache entry for all libraries would be better than the latest patch. You store tiny cache entries and you will create SQL requests for each one of them.

In order to provide the right cache clear, you should clear it a flush_all_caches() time (which will happens quite often, module enable, disable, etc).

Libraries has to be scaned only when developing, but as soon as Drupal enters in production mode (all caches activated) then no scan should ever happen, and as many cache fetch as you have libraries is really too much (it's I/O each time, it probably be even worst than scanning the FS if you have a fast harddrive).

When you don't find a library, you can mark it in the current cached structure as "non existing" and update the cache. It'll avoid later rescan, you could keep only one cache entry for all the libraries. This cached entry will be updated on new library scan or at cache clear time, which seems quite easy to implement (it doesn't imply any complex algorithm).

If users upload new libraries, you have to rescan for them, but it needs a user action that is the download itself, so why not letting him press the "cache clear" button (or another "library cache clear" more specific button)?

sun’s picture

A single cache entry for all libraries would be better than the latest patch. You store tiny cache entries and you will create SQL requests for each one of them.

That's not an issue at all, and actually in line with Drupal core - we're moving away from using extremely big and bloated cache data, of which only a marginal part is used in the end. The same principle applies to libraries - in a single request, modules might be interested in 1-3 libraries, but not necessarily all of them. Thus, we'd be loading lots of needless data for no good reason. Lastly, cache lookups are quick.

I agree on the necessity of caching requested but unavailable libraries to avoid subsequent filesystem scans though.

pounard’s picture

Whatever happens, if you install 3 or 4 libraries, it's in order to use them. Their metadata has a neglictable impact on memory, on the opposite to something like, menu cached entries or theme registry. I really think that making one cache entry per library is an error. If you have ten library are you really willing to make 10 SQL request, will the latency it produces (even worst if in a distant memcache or such)? It's all about finding a balance between latency and memory usage, and in this case, I think latency wins, since you cache almost nothing except some pathes.
That's probably why element_info() is not cached (except static cache), and such. It's used tons of time in the same page, but there isn't any bin for it, because the time needed to build the info array is probably smaller than causing latency using a distant storage. Same thing with libraries, not exactly the same problem, but as the data you store represent almost nothing in term of memory usage, it may worth the shot of doing only one cache entry.

It's only an opinion and I let you do your choices, it's not my decision to make, nevertheless when core gives you a direction, it doesn't mean you cant discuss it for such edge cases.

sun’s picture

Bear in mind that Drupal's overall performance suffers a lot from needless PHP memory consumption.

It depends on which kind of library you're looking at. For example, wysiwyg editor + tons of corresponding plugin libraries wouldn't have to be loaded at all on any page that doesn't contain a text format-enabled form element. However, if you're replacing the core jQuery with an alternative version, then yeah, that library would be loaded on each and every page.

In turn, we might need a way to denote the caching method for each library. Probably easy to implement. However, I'd leave that to a separate follow-up issue for now.

pounard’s picture

I now, I do deal with everyday. But it also suffers from too many SQL queries. Sometimes core alone just to display a content listing can go up to 200 requests, which is way too much. You have two problems, but dealing with one by making the second worst is not really an option. All is balance, and in this case memory consuption is almost nothing.

But I kind like the fact that some libraries could have a different caching policy, this is a good idea, but libraries such as WYSIWYG remains edge cases.

EDIT: Putting in a place a solid caching policy is engeneering, but caching almost anything everywhere is often over-engeneering. I would advise to you to use a compromise here, meaning cache only the existence or not of libraries in the cache array (with their original path), rebuild the rest at runtime, it would ensure only one really small cache entry (and avoid more than one I/O for fetching back data from a distant storage), and it event may be faster than caching anything.

It could lead to something like ((average_path_length + average_name_length) * sizeof(char)) memory consuption, which means even with 50 libraries (arbitrary number, a lot), let's say with 90 char average path length (written "/var/www/my_maginificient_site/www/sites/all/libraries/flash/superb_library/real_library" to compute this, as example) and 16 average name length would cost only 5Kb (some Views from the Views module often consume up to 10M which is something like 2000 times more. On my box with a simple site, core and only 5 modules (sample is my blog) Drupal consumes something like 25M per page, which is 5000 times my worst case scenario here. But, making 50 SQL requests over 300 shows here a ratio of 1/6 which is non neglictable.

Re-Edit: 5kb is nothing, and it's more likely that MySQL buffers, PDO buffers, considering the known memory leaks in PDO in some PHP versions, objects spawn in order to retrieve cache, database query objects from the database layer, and other environmental stuff will eat may be 100 times more than this.

pounard’s picture

I apologize I was comparing the code with the wrong branch. 2.x seems indeed to store a lot more of data than legacy 1.x

tstoeckler’s picture

Version: 7.x-1.x-dev » 7.x-2.x-dev
FileSize
1.54 KB

Reroll.

I think, if this passes tests, we should commit this as is, because the master branch is currently unusable for any serious site because it has to reprocess all information on every page load. I agree that there might be better caching strategies and we can advance on those in the future, but this really seems like a good first step.

sun’s picture

Status: Needs review » Needs work
+++ b/libraries.install
@@ -0,0 +1,15 @@
+ * @file
+ *   Install, uninstall, and update functions for libraries.module.

No indentation after @file.

+++ b/libraries.install
@@ -0,0 +1,15 @@
+libraries_schema() {

errrrrr?

+++ b/libraries.install
@@ -0,0 +1,15 @@
+  $schema['cache_libraries'] = system_schema_cache_7054();

What's 7054? :) Let's use drupal_get_schema_unprocessed('system', 'cache') here ;)

+++ b/libraries.module
@@ -359,8 +360,15 @@ function libraries_load($name, $variant = NULL) {
+    if (empty($library)) {

Can the cached $library ever be an empty array, NULL, or FALSE? isset() is commonly safer for cache patterns.

Might also be a good idea to flip the if/else logic in the code. Requires to think a bit more about the negated condition.

Powered by Dreditor.

tstoeckler’s picture

FileSize
1.56 KB

Yes, it's late :) ...

Maybe something more along these lines...

tstoeckler’s picture

Status: Needs work » Needs review
sun’s picture

Status: Needs review » Needs work
+++ b/libraries.module
@@ -359,8 +360,14 @@ function libraries_load($name, $variant = NULL) {
+    $library = cache_get($name, 'cache_libraries');
+    if (isset($library)) {

cache_get() returns FALSE in case of a non-existing cache though.

I'm quite sure you'll be able to fix that, so please go ahead :)

EDIT: ...and commit after fixing it, I mean ;) oh yes, it's late.

Powered by Dreditor.

tstoeckler’s picture

Status: Needs work » Needs review
FileSize
1.55 KB

Uploading fixed patch.
Will commit this as soon as the previous one comes back green.

tstoeckler’s picture

FileSize
1.56 KB

Now with less PHP errors. Guess I better wait for this one to come back...

tstoeckler’s picture

FileSize
1.56 KB

Oh boy.... Sorry for the spam. Now for this one I actually went through the trouble of setting up a dev site, so I'm pretty sure there's no harm in this one.
I'll wait for it to come back green and then commit this one.

tstoeckler’s picture

Status: Needs review » Fixed

Marking fixed. http://drupal.org/commitlog/commit/10030/0793cd8444a0d0bf5cfac93cc85c25d...
If you want to leave this issue open for more advanced caching strategies, help yourself.

tstoeckler’s picture

Status: Fixed » Needs work

http://api.drupal.org/api/drupal/modules--system--system.api.php/functio...

Hmmm.... I don't think we need to provide dedicated cache flushing functionality, but at least drush cc should work:

/**
 * Implements hook_flush_caches.
 */
function libraries_flush_caches() {
  return array('cache_libraries');
}

Marking needs work.

tstoeckler’s picture

For clarity, I meant:
I don't think we need to provide dedicated cache flushing functionality yet...

tstoeckler’s picture

FileSize
331 bytes

I tested this patch (Drush ftw) and it works. If it passes tests, I'll commit this. I like to place hook invokations (this is our first one...) at the top of the module file, but I also don't really care, so if you feel differently please reopen.

tstoeckler’s picture

Status: Needs work » Needs review
tstoeckler’s picture

sun’s picture

Thank you! Both commits look excellent.

Status: Fixed » Closed (fixed)
Issue tags: -Libraries

Automatically closed -- issue fixed for 2 weeks with no activity.