I like the idea of separating module data from Drupal data. I see so much junk in the system and variables tables from modules. Even modules that are no longer installed still have lingering code.

Comments

kevinquillen’s picture

"I see so much junk in the system and variables tables from modules. Even modules that are no longer installed still have lingering code."

I agree, but I think part of the problem is that modules are approved without a complete uninstall routine, even when they need one (because they create tables or variables).

crystaldawn’s picture

BTW, there is no "module approval" check. Anyone with a CVS account can create a new module at any time. The only "approval" needed is when you first get a CVS account, and it's actually quite difficult to get one. Instead of letting anyone create a module and trying to police what goes into them all, they police the people creating the modules and only allow those they deem worthy. The problem is that the people giving out these CVS accounts are not very good developers themselves, so many novice developers end up with CVS accounts anyway.

When that happens, lazy coding happens more often, because the novice developer doesn't realize that leaving a variable in the variables table DOES decrease performance because of the nature of that table. In theory, keeping it all in memory like it does will speed things up. To a point. But if you cram everything including the kitchen sink into it, it starts to hurt performance rather than help it. So IMHO, the variables table is a problem not because of how it's designed, but because of the sheer number of novice developers with CVS accounts. Since that is how things are done, a system like this makes sense: it lets people be lazy without any performance impact if a module leaves things behind once it's gone.

Here's another undocumented feature that I haven't mentioned. This module can "stamp" a variable when it is saved, updated, or created. It can also tell where the call came from, that is, it knows which module did the saving, updating, or creating. So it's possible to save that data along with the timestamp and use it later to clean out variables from modules that have been disabled for XXX amount of time.

Let's say someone enables a module, decides they don't like it, and disables it. What you end up with is variables left in the variables table, right? If they used this system instead, they would have variables left over in this table too. But here's why that's not a problem. It would be very easy to add a feature called "clean old module data" or something along those lines that looks at variables older than xxxx days and checks the module that saved them against the modules that are currently installed. If it sees lingering data from a module that's been turned off for a long time, it could delete those variables. There ya go. Problem solved :) Of course this feature doesn't actually exist yet, but the framework for it is laid out in this module. Getting people to switch to this rather than the built-in variables is probably just as hard as getting people to use an install/uninstall file, though. So who knows if it's even worth doing or not.
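
A minimal sketch of that "clean old module data" idea, written in Python rather than Drupal PHP. The feature does not exist in the module, so everything here is an assumption: the row layout (`name`, `module`, `updated`), the function name `find_stale_variables`, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta


def find_stale_variables(rows, enabled_modules, now, max_age_days):
    """Return names of variables whose owning module is not enabled and
    whose last save is older than max_age_days (hypothetical row layout)."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        row["name"]
        for row in rows
        if row["module"] not in enabled_modules and row["updated"] < cutoff
    ]


# Hypothetical stamped rows: variable name, module that saved it, last save time.
rows = [
    {"name": "foo_limit", "module": "foo", "updated": datetime(2010, 1, 5)},
    {"name": "bar_color", "module": "bar", "updated": datetime(2010, 1, 5)},
    {"name": "foo_recent", "module": "foo", "updated": datetime(2010, 6, 1)},
]

stale = find_stale_variables(
    rows,
    enabled_modules={"bar"},
    now=datetime(2010, 6, 10),
    max_age_days=90,
)
print(stale)
```

Only `foo_limit` qualifies here: `bar` is still enabled, and `foo_recent` was saved too recently. A real version would delete the returned names instead of printing them.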

Flying Drupalist’s picture

How big of a variable table is too big?

I don't have 1000 rows, but I have one really large row.

My table is 1.95mb, is that big or small?

I have a memory limit of 64 mb.

Thanks.

crystaldawn’s picture

The easiest way to tell if your table is too large is to multiply its table size by 10. It takes more RAM to hold the data because it's not just the data being held, it's the PHP structures as well. For example, a variable might contain the word 'hi', which is only 2 bytes. But holding that in RAM would take at least 12 bytes, because PHP needs to know that it's a string of 2 characters: "string:2:'hi'". So the more small variables you have, the faster it grows (this is one of the downsides of a high-level language like PHP: it uses more memory than a lower-level language such as C/C++ would). If you have 1 big variable, you don't have as much "overhead" as you would with 1000 smaller ones, but it's still a bad idea to store a mass amount of anything in the variables table, because it wasn't meant for that type of usage :) The variables table was meant to be small, lean, and fast.
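
The multiply-by-10 rule can be written down as a quick check. This is a Python sketch under assumptions: the factor of 10 is the rule of thumb from the comment above, not a measured value, and the function names are made up for illustration. The last helper only mimics how PHP's `serialize()` frames a plain string (`s:<length>:"<value>";`), which is the stored form in Drupal's variable table. It says nothing about PHP's in-memory overhead.

```python
def estimated_memory_mb(table_size_mb, overhead_factor=10):
    """Rule-of-thumb estimate: table size times an assumed overhead factor."""
    return table_size_mb * overhead_factor


def fits_in_limit(table_size_mb, memory_limit_mb, overhead_factor=10):
    """True if the estimated footprint stays within the PHP memory limit."""
    return estimated_memory_mb(table_size_mb, overhead_factor) <= memory_limit_mb


def php_serialized_string(value):
    """Mimic PHP serialize() output for a plain string value."""
    return 's:%d:"%s";' % (len(value.encode("utf-8")), value)


# A 1.95 MB table against a 64 MB limit, then against a 16 MB limit.
print(estimated_memory_mb(1.95))
print(fits_in_limit(1.95, 64))
print(fits_in_limit(1.95, 16))

# Storage framing for a 2-byte string.
framed = php_serialized_string("hi")
print(framed, len(framed))
```

The framing alone turns 2 bytes of data into a 9-byte stored value, which is why many tiny variables cost proportionally more than one large one.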

Flying Drupalist’s picture

My mistake, I did an SQL dump and assumed 1 line = 1 row, stupid stupid.

I actually have 4110 rows.

But since you mentioned 150,000 rows on the front page for 32 MB, at 4,000 I should be well within reason, right?

One module, Display Suite, is using 1.4 MB of the 1.9 MB I mentioned earlier. Is that excessive? By your multiply-by-10 rule that comes to about 19 MB, which should still be within range for me.

Thanks a lot for educating.

crystaldawn’s picture

For your current usage, you seem to be within reason memory-wise. Most people, however, don't have the ability to change the PHP memory limit and are often stuck with a low default like 2MB. That would cause all kinds of problems if you tried to put your website on such a host with a larger variables table, and it would be difficult to trace the cause back to the variables table. Most people don't even bother: they just increase the memory limit, then find themselves increasing it again months later because they never found out what caused it to hit the limit in the first place. As a comparison, the largest variables table I have is 1,200 rows, on a website with about 15 modules installed beyond core. So yours does seem to be unusually high.

Flying Drupalist’s picture

Gotcha, understood, thanks!

andy inman’s picture

I've just updated my comment in the API docs to point here. Question: is there any reason not to patch variable_set (etc.) to use this module, with the addition of a little routine to get all variables and store them in QV as a migration exercise? Well, I suppose the issue would be that some Drupal internal stuff *would* be better statically cached.

Also, I noticed this module: Variables API - some duplication of effort? Are you guys talking to each other?

crystaldawn’s picture

I had not heard of this API until you mentioned it. At first glance it looks more like a Drupal 7 module than a Drupal 6 module. I'm not quite clear what it is attempting to accomplish other than creating a cache for the variables system. I guess the result would be that saving/deleting content all the time would be more accurate? It doesn't seem to actually replace the variables system with anything. I'll have to look at how to implement and use it to see what it actually does. They don't have any docs either, which makes it harder to understand what the module is trying to accomplish.

As for adding this to core, it would be quite possible, but I wouldn't want to propose it myself since my opinion on adding it to core would be biased. If a neutral party proposed it, then I'd be for it, yes. It would be great to be able to use all the functionality I have here with the default variable functions.

I don't know if anyone has noticed my other module or not, but I do have a function override module that could completely override the existing Drupal variable functions and replace them with code that uses QVs. The problem is that it requires a really obscure PHP extension called APD to be installed before it will work. I wish APD were part of the core PHP functions. It allows you to completely redefine a PHP function. For example, you could redefine print_r() to do whatever the heck you wanted it to do. It's kind of like putting ALL PHP functions into a "class" so that they can be "extended".

I would NEVER propose using my function override module on a core module, but it's certainly plausible for developers who host their own clients' sites to use it to completely customize Drupal on their own webservers (which is what I do myself). On most of my websites I have a module called "Variables Override" which replaces Drupal's variable_get/set/del functions with QVs, without losing the ability to upgrade Drupal. The end result is that all modules that use those functions still work, and my own custom modules can use them as well, along with the added functionality of QVs. This also means my sites don't suffer from the variable system quirks mentioned in that thread you linked, because all calls to variable_get/set/del actually run through QVs :) I did this because the current variable system is prone to many inconsistency problems on high-traffic websites where a lot of variables change frequently.

andy inman’s picture

Thanks for the pointer on APD - never heard of it, but sounds like something well worth checking out.

Likewise I'd never heard of the Variables API module until the other day. It seems to be a back-port of some D7 functionality to D6, I'm not sure what it's trying to achieve, and it seems to me that you're further ahead.

Some "marketing" may be needed to get QV accepted by module developers. Do you have performance data? Obviously variable_get() is very fast because data is statically cached. I've seen variable_get used inside loops, which seems like bad practice to me (better to assign the value to a PHP variable before the loop). Presumably with QV the query cache will have a big impact? What about other caching, APC for example, to speed up reads? (Probably best done via cache_router, which seems to work well.)
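
To illustrate the point about lookups inside loops, here is a small Python sketch. `CountingSettings` is a hypothetical stand-in for any settings lookup (it is not Drupal's variable_get and not QV). It only counts how often it is called, so the difference between looking up inside the loop and hoisting the value out of it is visible.

```python
class CountingSettings:
    """Hypothetical settings store that counts how many lookups it serves."""

    def __init__(self, values):
        self.values = values
        self.calls = 0

    def get(self, name, default=None):
        self.calls += 1
        return self.values.get(name, default)


def label_in_loop(settings, items):
    # Looks the setting up once per item.
    return [settings.get("prefix", "") + item for item in items]


def label_hoisted(settings, items):
    # Looks the setting up once, before the loop.
    prefix = settings.get("prefix", "")
    return [prefix + item for item in items]


items = ["1", "2", "3"]
slow = CountingSettings({"prefix": "node-"})
fast = CountingSettings({"prefix": "node-"})

print(label_in_loop(slow, items), slow.calls)
print(label_hoisted(fast, items), fast.calls)
```

Both versions produce the same labels. The hoisted one performs a single lookup regardless of how many items there are, which matters more when each lookup is a query rather than a read from a static cache.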

It does seem to me that Drupal core should provide a simple and reliable method of storing ad-hoc values needed by modules (typically configuration data.) variable_set/get/del is simple but as you've pointed out, can lead to instability. As others have pointed out, you cannot update values on a frequent basis without potential whole-site performance issues. Ok, that's fine for configuration data, but not for code-generated values that need to be stored.

Final question - I don't claim to understand the MySQL techniques you've referred to in your docs, but could Drupal's cache interface be improved by using similar techniques? I am referring to the standard cache method which stores data in database tables.

crystaldawn’s picture

I have never done any marketing at all, simply because I don't want to push a biased opinion. I let others do that for me. The best marketing is word of mouth, IMHO.

I do have performance data that compares QVs with Drupal's variable system under different server settings, etc. I haven't turned it into a graph or chart yet, however.

Using any type of query inside a loop, cached or not, is bad practice, yes. I rarely use anything other than PHP variables in a while/foreach, etc. The less overhead there is, the faster things can run. But sometimes it's not practical to always adhere to that. You can do this with QVs without any stability issues. It might slow things down a little, but you'll never end up with improper results. You can't say the same for Drupal's variable system: it is possible to get skewed results with it in while/for loops, while that is not possible with QVs.

It is also possible to speed up QV reads using a multitude of different "caching" systems for PHP, MySQL, and Apache. You can't really increase the speed of Drupal's variable system with any of the methods available to QVs.

You can't get much faster than the current Drupal variable_get/set/del functions, simply because everything is loaded into memory. But doing it this way also opens up instability issues, as has been noted in the past. So they chose speed over stability. The solution I use is just the opposite: it suffers slightly in speed because it's more scalable than it is fast. But the greatest difference is that you'll see a decrease in stability with the built-in Drupal variable system before you see a decrease in speed with QVs. I do have some test data somewhere that shows both of these. I never did get to a point where there was a performance decrease in QVs, though. In the tests I have run (up to 5 million rows) I haven't hit any limit where there's a noticeable decrease in performance with QVs. The tests were meant to show that Drupal's variable system will cause problems before QVs would. And any performance issues with QVs will not affect an entire website if it's coded correctly, whereas performance issues with Drupal's variable system affect the site even when the system isn't being used, no matter how well the site is coded, which is really bad.

The technique I use is called 4th normal form. It's an accepted method and is listed in Wikipedia here: http://en.wikipedia.org/wiki/Fourth_normal_form It's been around for years and years. The variables table doesn't really use a "normalization" scheme at all, and thus it has no pros or cons in that respect. It simply exists. I think the only app for Drupal that does use any type of normalization is CCK.

EvanDonovan’s picture

@crystaldawn: Have you considered making a patch for core? Of course, it wouldn't make it into 7.x at this point, only 8.x, but it seems like it could be a very good improvement.

crystaldawn’s picture

I haven't made a patch, but it would be as simple as commenting out the existing get/set/del functions in the include file, putting the QV functions in their place, and renaming them to Drupal's get/set/del function names. That would get everything running off QVs. The only other thing I'd really need is a conversion function that goes through whatever is in the variable table and makes sure each entry also has an entry in the QV table, which would be quite easy to do. But I think if I were ever to do that, I'd probably want to lobby to have parsing_api put in as a core module too lol. I use that module, along with QVs, on almost every client project I get. It's light-years easier than using regex lol.
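
A rough sketch of what such a conversion step could look like, using Python and an in-memory SQLite database purely for illustration. Drupal 6 does keep its variables in a table named `variable` with `name` and `value` columns, but the `qv` table and its two-column layout here are assumptions, since the real QV schema is not described in this thread. The function name `migrate_variables` is also hypothetical.

```python
import sqlite3


def migrate_variables(conn):
    """Copy rows from `variable` into the (assumed) `qv` table, skipping
    names that already exist there. Returns the number of rows added."""
    before = conn.execute("SELECT COUNT(*) FROM qv").fetchone()[0]
    conn.execute(
        "INSERT INTO qv (name, value) "
        "SELECT v.name, v.value FROM variable v "
        "WHERE NOT EXISTS (SELECT 1 FROM qv q WHERE q.name = v.name)"
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM qv").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (name TEXT PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE qv (name TEXT PRIMARY KEY, value TEXT)")  # assumed schema
conn.executemany(
    "INSERT INTO variable (name, value) VALUES (?, ?)",
    [("site_name", 's:4:"Demo";'), ("cron_last", "i:1270000000;")],
)
# One entry was already migrated earlier.
conn.execute("INSERT INTO qv (name, value) VALUES (?, ?)", ("site_name", 's:4:"Demo";'))

copied = migrate_variables(conn)
print(copied)
print(migrate_variables(conn))  # running it again copies nothing
```

Skipping names that already exist makes the step safe to run more than once, which is useful if modules keep writing to the old table while the switch is in progress.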

vacilando’s picture

I've also run into the memory problem due to the unreasonable caching of the whole table in variable_init() in bootstrap.inc.

Your module is a solution to that.

However, I cannot install it because, if I understand correctly, it does not convert the existing variable table values to your table. That would mean loads of settings would be lost immediately. What do you recommend?

rantnrave’s picture

@#14 There is a solution to your problem, but it requires modifying core to use QVs rather than its internal system. The fix is to use the other module crystaldawn wrote, called drupal override function, which allows you to completely override a Drupal core function with a new function. This lets you keep updating Drupal without problems and never lose the changes you've made to core. The only downside is that it requires you to install PECL APD or PECL Runkit, neither of which is easy to do. Some distros ship one or both of these as packages, in which case installing them may be extremely easy, but if yours doesn't, it's a real pain to get them compiled and working.

Once you have overrides working, you could re-write any function you need (such as variable_get, set, del, and init). Doing so would automagically switch all variable calls over to QVs rather than Drupal's internal variable functions. You would need to do this because many modules use the variable functions, and either those modules would all have to be changed to use QVs or the core variable functions would have to be re-written (or in this case overridden) before it would be of any value to you. It should be noted that function overriding is usually considered "black hat", but in some extreme instances it may be required.

andy inman’s picture

Not directly a solution, but it occurred to me to write something to track which variables are actually read, then leave that running for a while. After a reasonable time, anything which hasn't been read could probably be deleted.
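
A minimal Python sketch of that tracking idea, under stated assumptions: `ReadTracker` is a hypothetical wrapper around a plain dictionary, not an existing Drupal module, and a real version would have to persist the set of read names across requests.

```python
class ReadTracker:
    """Wrap a variable store and remember which names were ever read."""

    def __init__(self, store):
        self._store = store
        self.read_names = set()

    def get(self, name, default=None):
        self.read_names.add(name)
        return self._store.get(name, default)

    def never_read(self):
        """Names present in the store that no caller has asked for yet."""
        return sorted(set(self._store) - self.read_names)


tracker = ReadTracker({"site_name": "Demo", "old_module_flag": 1, "cron_last": 0})
tracker.get("site_name")
tracker.get("cron_last")
tracker.get("missing_setting", "fallback")  # unknown names are harmless

print(tracker.never_read())
```

After the tracker has run for long enough, the names it reports are only candidates for deletion. A variable that is read once a month, or only on an admin page nobody visited, would still show up in the list.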

vacilando’s picture

Have a look at http://drupal.org/project/variable_clean -- it can tidy up your existing variable table by removing those entries that are not used by enabled modules.

rantnrave’s picture

variable_clean sounds nice, but it is a very, very new module. I wonder how well it actually works. It seems to me that deleting things it thinks don't belong could be seriously dangerous. I'd rather have unneeded variables kicking around and deal with memory limits than have missing variables that break unknown things. I would probably only consider using such a monster on a small site. With a large variable table, I think it would wreak havoc and cause more problems than the memory problem it's attempting to fix. I don't see how it can tell which modules use which variables, since there is no way to track that sort of thing. You couldn't even track it by looking at the PHP source itself. One example would be theme settings. Also, some modules load/set settings from other modules, the Location module being one such example. It seems to me an impossible claim to make, so I'd be interested to hear how that tracking is actually calculated.