i've always found it annoying that there's no practical place to store small bits of module data without making a new db table. the default solution seems to be "throw it into the variable table" and manage it that way, which core does as well. the problem with this is that $conf can grow to be quite giant (i've seen sites w/ over 1000 vars loaded into $conf), and many of these vars aren't really needed all that often (the user registration emails are a good example). this seems like wasteful memory usage to me.

attached patch is a very lightweight solution to address this problem. it makes the caching/loading of variables optional, by adding a third arg to variable get/set. since the default behavior is to assume caching, it won't break anything in any current code, and if a dev determines that they need to store something there that will rarely be loaded, they specify in their set/get calls that the variable isn't in the cache, and it's loaded straight from the db as needed.

at this point, i haven't coded the db update to support this change, or gone through core to determine which vars we would want to remove from caching--i'd like to get some feedback on the idea before proceeding any further. :)

CommentFileSizeAuthor
#37 bootstrap.inc.patch836 bytesAnonymous (not verified)
#6 drupal_head.patch3.17 KBfgm
variable_cache.patch1.97 KBhunmonk
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Dries’s picture

This needs to be benchmarked before it can be considered. In general, it is difficult to hand-tune this kind of stuff.

hunmonk’s picture

@dries: yes, understood. my hope was to start out with something simple like this, and use it where it would obviously provide benefit (ex. the user registration emails i mentioned). maybe later we could find some way to automate it, but for now i was just testing the water with the idea, which i would finish coding if it seemed like a reasonable advancement, and passed some benchmarking tests.

moshe weitzman’s picture

Version: x.y.z » 6.x-dev

i have seen the same slowdowns on my sites too. i think this is a fine direction ... api.module is an example of a module which puts a huge variable into variables table and thus is uneedingly slowing down every request.

hunmonk’s picture

Assigned: hunmonk » Unassigned

unassigning myself. hopefully someone else will pick up the torch... :)

RobRoy’s picture

Status: Needs review » Needs work

If we're goint to have variable_get() set $conf[$name] = unserialize($value); when getting an uncached variable, then we need to check if (isset($conf[$name])) when getting that variable since adding it to the global variables is only useful if we're going to access it again later on that same page request.

Otherwise, I think we should instead just be doing a return from inside that if statement no? Or, am I way off?

fgm’s picture

Version: 6.x-dev » 5.x-dev
Status: Needs work » Needs review
FileSize
3.17 KB

I've redone the patch a bit differently, because these observations actually cover two distinct needs: prefetching (performed in variable_init) and caching (currently done in variable_set, and that could otherwise be performed in variable_get).

In order to make the change as invisible as possible and not add complexity to the API, I only changed the parameters to variable_setby adding the optional prefetch parameter, which maintains the current default behaviour and does not imply any change elsewhere.

Caching does not seem useful to place under programmer control, even for large variables, since once the variable has been loaded, it costs basically nothing to maintain hit: the performance hit has already been sustained.

With this patch applied, only variables with a prefetch status set to 1 will be fetched and cached by variable_init. Other variables will be fetched and cached only when requested from a variable_get or set using variable_set. Deletion using variable_del is not affected.

The patch adds one column to the variable table.

hunmonk’s picture

didn't do a thorough code review, but a cursory look at the code indicates that the approach is solid. i see no sense in storing huge variables in $conf if they're not needed regularly -- at a minimum there's a performance loss having to unserialize them when it's pulled from the cache, and this happens for every page hit IIRC.

i've had numerous ocassions where a module i was designing needed persistent storage of some piece of information, which was too minor to warrant a database table, but not used nearly enough to be loading it every single page, either. having a sensible storage mechanism for this case seems very reasonable to me.

Dries’s picture

I'm not a fan of this patch. These things should work without a bazillion of parameters.

hunmonk’s picture

@Dries: can you suggest an alternative approach that you believe would be cleaner?

fgm’s picture

Dries: your qualms are is why I did it that way, minimizing impact and need for tweaking.

It takes one parameter less than the original approach, and maintains identical API and semantics for all code not explicitly adding the third parameter to disable prefetch, which means that exactly no existing code is affected.

Only modules with known "not always used" behaviour should use this. As Hunmonk wrote, the welcome emails from core are a good example, just like any variable storage for a module that doesn't need it everytime : think variables for cron-triggered code, used only in this statistically rare case, for instance.

Crell’s picture

I agree that this is an important feature to have for performance reasons. If not done as a part of the existing variable_* functions, then it would need a separate set of functions to handle it. I think extending the existing API is the cleaner method.

Steven’s picture

Version: 5.x-dev » 6.x-dev

I'm postponing this for 6.0, as it is really a problem caused by naughty modules. We should avoid abusing variables, but perhaps we should look to a more flexible variable solution for 6.0.

hunmonk’s picture

We should avoid abusing variables

this happens because there's no good place to put the kind of information that's being discussed in this thread.

but perhaps we should look to a more flexible variable solution for 6.0.

yes!

joshk’s picture

+1 for some kind of variable overhaul in 6.0, as well as some better caching options for high-volume sites. In particular, cached values that are really large (big pages, complex variables) can become problematic when the DB and Webserver are not on the same system. It would be nice to implement some filesystem based caching: e.g. a "cache" directory in files with hashed filenames, and a cache table in the db that just keeps the normal data but stores the file hash rather than the data.

I have no idea if that would actually be more effective, but in one site where I have cache results that are 400k+ in size I think it would help a lot.

Dries’s picture

I'm likely to mark this "won't fix". You should file issues against modules abusing the variable cache. This looks like cruft to me. We can't let the variable table evolve into a random storage mechanism -- that is not what it has been designed for.

fgm’s picture

Dries, you say "We can't let the variable table evolve into a random storage mechanism -- that is not what it has been designed for.".

Is there a document somewhere explaining what this table has [not] been designed for, to identify where abuse actually occurs ?

If there is no such document dictating proper use of that table forever, is there a reason not to evolve for better application support ?

A patch along the lines suggested would typically maintain the current default behaviour and speed for all existing code, but allows applications (i.e. contrib modules) to maintain an efficient centralized "variable" storage instead of multiplying module-specific tables or overloading the global variable cache. IMHO, this can reduce cruft, notably at the DB level, not increase it.

Another option would be to only allow privileged (core ?) modules to have their variables preloaded and cached, and only cache (not preload) variables for other modules, or even not cache them at all. It feels less elegant to me, though.

moshe weitzman’s picture

well, if we are going to declare this API for small variables only then lets decrease the size of the value column so modules *can't* misuse the table. the only sticky point is how to handle existing large variables. i suggest creating a variables_legacy table and putting them there so the admin can deal with the sitiation. comments?

hunmonk’s picture

We can't let the variable table evolve into a random storage mechanism -- that is not what it has been designed for.

i think this is almost beside the point -- there is obviously a need for a random storage mechanism, whether it's a part of the variable table or not. i can say this with confidence because the variable table is being used for that purpose today, even though that's not what it's designed for. :)

and how do we decide what's worthy of the variable table or not? size? frequency of use in page calls??

asking contrib modules to keep their own misc data tidbits in their own table is not a solution -- they'll end up right where they're going now, in the variables table. so i see two reasonable solutions:

  1. the currently proposed patch, or some improved variation of it
  2. a separate long-term or occasional use variable table that modules can make use of

otherwise you're going to have a bloated variable table like you do today...

dww’s picture

Status: Needs review » Needs work

i think this would be a help. lots of modules need persistent storage of some settings (hence the old hook_settings() and the current system_settings_form()). if we don't want modules to be able to use this as a generic storage mechanism, why are we providing such a handy way for them to use this mechanism in the first place? i don't see how contrib modules handling settings, even large ones, as settings/variables, is inherently "abuse", worthy of issues to "fix" them.

that said, i've certainly held back my own tendency to make elements of project*.module other settings, for fear of adding yet more crap to every page load on d.o. :( yet, many of these things really should be settings, to help this set of modules be useful in other contexts, not just d.o. i've held back on making my own project-specific variable table for this stuff, since that just seemed like evil duplication of effort. if anything, we should make it easier for modules to use settings instead of a) hard-coding something that has to be changed only via constants in the code or b) encouraging a free-for-all of incompatible home-brew solutions in each module with their own tables.

so, a huge +1 for giving responsible, performance-conscious module writers a mechanism to efficiently store their own settings, no matter how big or infrequently used, using the hooks and API we already provide for them. a single, optional argument to variable_set() (which really needs a corresponding attribute in each FAPI array element for the system_settings_form() code -- hence the "needs work"), seems like a very clear, intuative, and reasonable move, not "a bazillion parameters" or a hack/bloat to get around a "problem caused by naughty modules"...

thanks,
-derek

dww’s picture

Title: make variable caching optional » make variable prefetching optional

new title to better capture the heart of the matter: people calling variable_set() might have some idea (and therefore should have some control) over if they think this variable is worth prefetching on every page load. fgm is totally right -- once we fetch, we should cache regardless, the question is what should happen in variable_init().

a handful of infrequent pages that do a few extra queries (if people guess wrong and don't prefetch something agressively enough) is almost certainly going to be better than the increased cost on *every* page load of always prefetching everything, unserializing() all those rows, taking up that much RAM, etc, etc.

catch’s picture

Bumping this to drupal 7. Same issue, different approach over at: http://drupal.org/node/145164

I have a feeling there may be a third issue where this was fixed, but not closing these until I find it.

sun’s picture

+1, this is badly needed. Enhancing the existent variable system makes it easy for contrib module developers to implement this and migrate their variable bloat in one hook_update_N query.

Pancho’s picture

Version: 6.x-dev » 7.x-dev

Moving feature requests to D7 queue. Sorry...

fgm’s picture

Side note: it would probably be useful to also consider http://drupal.org/node/138544 (variable_* with array) when updating the variable system with caching/prefetching.

R.Muilwijk’s picture

I've also been working on a solution for this at:

http://drupal.org/node/318159

I do not know what is the better solution... doing this on every variable_get which is easily forgotten or in a registry costing memory..

fgm’s picture

I've thought of it again these last few days, especialy after Crell tackled the "multiple variables at once" issue #138544: Allow multiple variable deletes in variable_del() in the variable_set case #317930: Allow multiple values in a single variable_set() call.

The problem of doing this efficiently is actually a bit more complicated than it seems. Let's have :

V
The number of variables
P
The number of prefetched variables (typically a large fraction of V)
N
The number of non-prefetched variables actually needing to be loaded (hopefully a small integer, such as N + P is still "significantly" lower than N).
Q
The cost of a variable SELECT query
F
The cost of a row fetch

With these notations:

  • the current non-patched behaviour costs the full memory cost for all V variables, and 1*Q+ V*F
  • the current patch costs (1 + N)*Q + (P + N)*F
  • for N = 0, which should be the case for short loads like the partial boots for cached pages and the like, the benefit if obvious: we are are Q + P*F instead of Q + V*F, and gain the whole RAM benefit
  • for N = 1, which is the typical case for normal pages which this patch tries to address, cost is 2*Q + (P+1)*F, which is expected to be lower than the current Q + V*F
  • but from then on, the situation degrades rapidly, as Q, although dependent on the DB engine and setup, is typically much higher than F, and we don't want to see it increase, which will be caused with this patch.

So in this scheme, the next improvement should lie in finding a way to group all the non-prefetched (let's call them deferred) loads in just one query. Probably still not with the prefetched ones anyway, to maintain the fast loading reduced boot phases advantage.

Since the earliest point where it is possible to know which function will be invoked is after path resolution, a possible scheme would be to have modules declare their deferred variables along with their menu callbacks. That way, after path resolution is effected, the list of deferred variables would be available and they could all be fetched in a single query. It could look like:

$items['some/path'] = array(
  'title' => 'Rarely used feature',
  'page callback' => 'some_function',
  'page arguments' => array(),
  'page variables' => array('some-var1', 'some-var2'),
  'access arguments' => array('some permission'),
  );

The cost would then be only 2*Q + (P+N)*F, and still get us the memory benefit.

Now, keeping the current patch in addition to this optimization,

  • modules not taking advantage of this system by predeclaring the deferred variables would still work
  • modules creating deferred variables and not properly declaring them would still work, and only sustain an additional Q for each "unplanned" variable_get
  • modules creating such declared deferred variables would gain the full optimization benefit

In short, existing code would still work unchanged, while adapted code would run faster with less RAM.

R.Muilwijk’s picture

@fgm, I would suggest you check out my patch at http://drupal.org/node/318159 because it has some of the stuff you said in...

Crell’s picture

Conditional loading goes over well with me, surprise surprise. :-) However, I don't like the explicit declaration.

Crazy idea:

The new Registry system automatically caches per-router item what variables it's going to need. It's self-learning.

chx is working on an even smarter self-learning system for patch caching: #223075: Adaptive path caching

What if we tried caching variables per-router item like the registry? Prefetch just those that were used last time and lazy-fetch (and mark for later) any others? That way it will rebuild itself as we go, we can always clear the cache, and the worst case scenario is that we fetch variables individually for one page.

fgm’s picture

Funny, I was thinking along the same lines a moment ago:

In the case of these persistent variables, it is even simple in principle (not sure how it would fit in the boot sequence, though): since all these variables are loaded into $GLOBALS['conf'], a simple array difference between the values of that global before- and after-callback would give the variables actually used by the callback.

The problem is that this means two copies of the global, which is the main problem we are trying to avoid, precisely in order to limit RAM requirements. Self-learning along these lines would have to only fire infrequently for the solution not to be worse than the problem.

Crell’s picture

Hrm, that is true. We need the variables available before the menu callback is determined. I still think that's solvable, though.

Anonymous’s picture

Issue tags: +Performance

adding performance tag

CorniI’s picture

subscribe, definitly interesting

moshe weitzman’s picture

Priority: Normal » Critical

I am running into this performance problem again on a live site. Please re-consider this.

cha0s’s picture

In order to be useful, the solution will have to address system_settings_form() somehow, agreed? I think you'll find that most variables are stored automatically by this mechanism.

catch’s picture

I think we'd be better off tackling the current abuse of the {variable} table - if it had less crap in it that doesn't belong there, then the prefetching would be less of an issue.

#517044: Improve text handling, texts in text files (editable, overridable, translatable, etc...) and some of its predecessors tries to get big variables (e-mail texts and similar) out of this table into dedicated text storage.

We should also try to discourage dynamic variables - like all the content types ones, when we have a dedicated content types table with some settings per-content type in it already. Those are unnecessary and hard to clean up.

catch’s picture

Version: 7.x-dev » 8.x-dev
Priority: Critical » Normal

This is probably won't fix given there are less tricky issues solving the same problem around, but moving to D8 since the discussion is good at least.

Anonymous’s picture

FileSize
836 bytes

Hi !

As waiting for a complete rethinking of the variable system can take some time, I thought of a patch that could be applied in the current D6 without introducing any change in the API nor in the database.

It is based on the asumption that not every variable will be read for each generated page. It is therefore useless to unserialize all values extracted from the database. This operation should be done once in variable_get.

In order to achieve this, it uses a little trick that is to name the variable extracted from the database with a '--' before. If the variable does not exist with its standard name, variable_get will look for the same variable name starting with '--' and unserialize it if found. (not sure if I'm clear… look at the patch, it should be simpler ;-) )

peterx’s picture

Now that this issue is D8, I agree with rethinking. I propose moving some of the variable caching to some session style caching modules. For instance, the variable_role add-on module would cache values by role and load the values for users that have those roles. When a module stores variables, they could be cached based on the use. There could be variable_theme and other variable service modules that are loaded only if needed.

I wrote something similar for a number of sites where the variable is cached by module. Module example calls variable_get_module('example', array()); and gets an array of variables for that module. There is just one database operation because everything for that module is in one row. The performance improvement was useful for modules that were rarely loaded. These were modules with very small .module files that did little other than decide to occasionally load a .inc file. Some sites did something similar for roles and themes where the user has a big choice and typically loads only one or a few. By making this service a separate module and adding it to D8, modules could make it an automatically loaded prerequisite.

Changing this to a role related variable example, the variables would be cached in a table keyed by role. When you login with role X, you get entry X deserialized into memory. If you have roles X and Y, both are loaded. You would have something like $GLOBALS['variable_role']['variable name']['role'] = value.

I suggest a separate discussion for variable_role, variable_theme, variable_user, and some other common uses.

ogi’s picture

subscribe

Fabianx’s picture

Why not allow memcache style usage? Or use another store/retrieve backend?

I think the variable table is slow, because:

* variable_set needs to lock the rows of the table (MyISAM - of course only)
* variable prefetching can get really large

What we want is:

* A persistent storage, which is a key, value store ( that can be emulated by the DB )
* The persistent key, value store should be able to utilize write-through caching for performance and persistency reasons

Why not use an approach with a write-through cache per variable?

Directly set / delete the variable in memcache / APC / whatever, but queue up requests, then write all update, insert, deletes in one "session" or transaction at the end of the script or at error_handler.

This would allow for great performance, because variables can be gotten from the key, value store, which can fallback to pre-fetching, but does not need to.

For shared hosting, nothing would change.

For someone having memcache and a key,val backend, it would just satisfy the requests from memcache and if the cache is not set, retrieve it from the backend (which can even still be a SQL table).

The point why this is not possible today is that there exists no (or it is not used, yet) abstract Key, Val Store, which can use either a real Key,Val store or SQL as backend.

Correct me if I am wrong here, as I am not yet so familiar with the code in 7.x/8.x.

Implementation Proposal:

* Implement abstract KeyValueStore class
* Convert variable_set to use this KeyValueStore class
* Add MySQLKeyValueStore concrete class
* Add MemcacheKeyValueStore using decorator pattern like:

KeyValueStore = new MemcacheKeyValueStore(MySQLKeyValueStore("variables"));

As long as memcache (for example) can satisfy the request, the MySQLKeyValueStore->get is never executed and such the prefetching never occurs.

Even for variable_set, which would be just passed through would nothing change, both would just store the variable.

The more I think about it, the more I see that this could be already implemented now (if we could overwrite variable_set / variable_get functions).

It would be as fast as now, but it would be faster and more lightweight for Memcache / APC / etc.

It would even allow for the original posters request by plugging:

MySQLSingleKeyValueStore(MySQLPrefetchingKeyValueStore("variables"))

and doing:

variable_unregister_prefetch('varname'), which is one DB hit at the beginning for sure, but would allow a module to specify at install that this variable should not be prefetched.

Again would here the first class lookup the variable name in its internal cache and retrieve it single form DB, while the second class would not prefetch variables with prefetch = 0 in the table.

Slightly different, but the idea keeps to be the same.

As variable_get is performance critical, so a module_invoke_all would probably be not a good idea, but I can imagine the class approach to work well and allow endless possibilitites for integrating write-through caching, like on a processor.

Anyone has a good test to benchmark variable_get / variable_set? I am curious what the savings in memory, cache_times, etc. would be ...

Best Wishes,

Fabian

PS: I do agree that modules misuse variable_*, but why not offer another solution to easily set/get configuration options, etc. in a Key/Value Store.

Crell’s picture

Issue tags: +Configuration system

I should note that the Configuration Management Initiative is now looking into this question, with the goal of ripping out variable_get() entirely and replacing it with something that doesn't date from 2001... :-)

Fabianx’s picture

While I appreciate the work of the Configuration Management Initiative, I've gone ahead and found a way to workaround the hardcoded ways of the variable system.

It is a contrib module using hook_boot for 6.x, but should be easily portable to 7.x. It is not compatible with strongarm.

With a trick we are able to add a write-through cache to variable_get/variable_set by replacing $conf with an ArrayObject. This means only necessary variables are prefetched from the variables table. All the others are gotten from the cache, or with a query from the DB (in case of a cache_miss, but knowing that the variable _should_ be there according to the keys loaded).

Please only use this module with memcache or APC as cache backend as everything else will severely degrade performance.

Currently the module utilizes the 'cache' cache_bin.

Have fun and enjoy less memory intensive and faster pages.

The module can be found in my sandbox here (6.x-1.x branch):

http://drupal.org/sandbox/Fabianx/1172558

@catch: It would be great if you could test it as you had been one of the strong supporters of this issue.

PS: Of course now that I finished this, I think a $conf['variable_inc'] should go into core (8.x,7.x,6.x) like with cache_inc, session_inc and lock_inc. Then a module could choose to replace the variable handling with its own doing the same like the contrib module above, but without the ArrayObject hack.

Vacilando’s picture

Subscribing.

Berdir’s picture

Status: Needs work » Closed (won't fix)

Given that the new config system is now in and variable_get() *will* go, we can now close this issue, can't we?