Using freetagging, we're having some scale issues with tagadelic. Simply put, the volume of taxonomy terms overwhelms the following query in tagadelic_get_weighted_tags():
$result = db_query_range('SELECT COUNT(*) AS count, d.tid, d.name, d.vid FROM {term_data} d INNER JOIN {term_node} n ON d.tid = n.tid WHERE d.vid IN ('. substr(str_repeat('%d,', count($vids)), 0, -1) .') GROUP BY d.tid, d.name ORDER BY count DESC', $vids, 0, $size);
In my install, with 300+ terms, this query takes 2000+ ms to complete.
The attached patch solves this by creating a tagadelic table in the database, which stores the data needed to generate the tag clouds.
This data is updated via nodeapi on 'insert', 'update', and 'delete.' Term data is updated on hook_taxonomy 'delete' or 'update.'
One issue not fully tested: Since taxonomy_node_save() needs to execute before tagadelic updates the term count, I'm not sure this approach gets us the most accurate counts. There would be at least one additional way to do so.
On node save, store a varaible (tagadelic_node = $node->nid). Then put a non-cached menu check againast that variable. If present, reset it to zero and run the mnew tagadelic_index_node() function.
Comment | File | Size | Author |
---|---|---|---|
#41 | tagadelic_cache_backport5.patch | 1.89 KB | David Stosik |
#40 | tagadelic_cache_backport5.patch | 1.91 KB | David Stosik |
#30 | cache_new1.patch | 1.86 KB | arthurf |
#28 | cache.new_.patch | 1.92 KB | arthurf |
#26 | cache_new.patch | 2.75 KB | arthurf |
Comments
Comment #1
agentrickardUpdate to the patch that creates a tagadelic_update() function that converts term_node data to the new table structure.
Comment #2
agentrickardAccidentally left a db_queryd in the last patch.
Comment #3
agentrickardAfter more testing, added a default ORDER BY to the get_weighted tags query.
MySQL times for the query is now sub 1 ms.
See http://rubybaboon.com/tagadelic
Comment #4
agentrickardMissed a condition in taxonomy_node_save where the term might be passed as an array of objects.
This patch corrects for that.
Comment #5
Bèr Kessels CreditAttribution: Bèr Kessels commentedI am just wondering why you chose for the route of a private table and not simply caching the retreived array with weighted tags in the cache table? It would make the code a lot cleaner afaik.
I might miss an obvious point though :)
Comment #6
agentrickardCaching the array didn't occur to me until later ;-).
In our implementation, we are running auto-generated content that updates via cron, and our tag cloud changes frequently. We wanted the tag clouds to update without lag.
That said, using cache may be a better route.
In either case, a patch is needed IMO, because of scale issues.
Comment #7
Bèr Kessels CreditAttribution: Bèr Kessels commentedYes a patch is certainly needed. But i prefer this module to remain tableless, and since we have a proper solution (cache) we can solve this much easier.
agentrickard, do you have the resources to rewrite this whole patch so it uses Drupals caching instead? If not, please let me know too, so I can have a look at the current patch more closely.
Comment #8
agentrickardPatch vs. 4.7 or vs. HEAD?
I can do a cache patch vs. 4.7 easily.
Comment #9
agentrickardI think this should work against 4.7 (// $Id: tagadelic.module,v 1.19.2.3 2006/06/13 19:35:35 ber Exp $).
The change is in tagadelic_get_weighted_tags().
Before running the db_query, I check for a cache object. Tagadelic cache objects are named 'tagadelic:' followed by a list of $vids. They are set to expire every half hour.
Tested on localhost only.
Comment #10
Bèr Kessels CreditAttribution: Bèr Kessels commentedIf possible against CVS. I am very reluctant to introduce new features on a stable branch.
If not possible, please let me know, so I can try to apply (by hand) to a CVS version. By now, I did not even try that, because you mention it explicitly for 4.7.
The code looks good. I love the fact that with only five (six?) changed lines (disgregarding the change in identation) you introduced this caching. Nice job!
Bèr
Comment #11
agentrickardCVS changes would take a little while, since Dries just committed the cache table patch and my cvs version is out of date. (http://drupal.org/node/72290).
I'll see what I can do over the long (US) holiday weekend.
Comment #12
Bèr Kessels CreditAttribution: Bèr Kessels commentedSorry! My mistake! no, I meant CVS as in "the cvs version of tagadelic". tagadelic HEAD is still completely 4.7-compatible. I will only start to upgrade tagadelic to core-head it that head becomes frozen (and maybe even then i'll wait a little).
So: I meant: use HEAD tagadlic, on a 4.7 Drupal, and make it work there. You will find that HEAD tagadelic has loads of features that 4.7 tagadelic has not. Yet both run fine on 4.7
(why not release it for 4.7 I hear you think! well: I prefer stability, incubation times and predictability. Example: on my host I needed to upgrade image module 4.7 to the new image module 4.7 (security issue) and by doing so, I broke 30 sites! Because image module 4.7 (new) was completely different from 4.7 (old). I think that is BAD. I prefer not to go there... (though I love the new split up image modules)
Comment #13
agentrickardThe whole CVS version thing confuses my little mind.
Attached is a Drupal 4.7.3, Tagadelic 4.7.cvs patch for caching the $tags generated by tagadelic_get_weighted_tags().
Comment #14
agentrickardLooking at the patch, it may be necessary to add $cid = 'tagadelic: nullset' before the else on line 203.
This would force a null return on line 207 [$data = cache_get($cid);].
This is only needed if sending a blank $cid might return data from the cache table.
Setting the $cid to 'tagadelic: nullset' would ensure that no data match is found in the cache table.
Comment #15
Bèr Kessels CreditAttribution: Bèr Kessels commentedFrom your last comment I assume that this patch needs some work.
Comment #16
Bèr Kessels CreditAttribution: Bèr Kessels commentedhttp://drupal.org/node/64022 was marked duplicate of this.
Comment #17
agentrickardNo, I don't think that modification is necessary.
If $cid is not set, then cache_get should fail to return a result.
This would only be an issue if the cache table allowed cid to be NULL, which it does not.
That comment was simply the only flaw I could see in the logic.
I think the patch is ready for testing.
Comment #18
Bèr Kessels CreditAttribution: Bèr Kessels commentedI have no experience with benchmarks, so if someone strolls along who feels like doing some, please give me some stats on the performance before and after this patch?
Comment #19
Bèr Kessels CreditAttribution: Bèr Kessels commentedThis renewed patch is untested, but should work on HEAD. please try it and report if it worked for you. Thanks
Comment #20
mudanoman CreditAttribution: mudanoman commentedBer,
It doesn't look like the cache patch file was uploaded properly.
Thanks,
Ivan
Comment #21
Bèr Kessels CreditAttribution: Bèr Kessels commentedretry
Comment #22
Bèr Kessels CreditAttribution: Bèr Kessels commentedHmm, somehow the file itself is empty, this means that I lost my work.
Comment #23
m3avrck CreditAttribution: m3avrck commentedI ran some benchmarks on just caching the entire tagadelic block output in general and it's significantly faster up to 14% or so.
Depends on the number of terms, etc... but with 100 terms and showing 50 in 3 different tagadelic blocks, it was great.
I have not yet benchmarked this patch in particular but it seems like you'd see equal results. I'll have a more detailed review tomorrow.
Comment #24
alliax CreditAttribution: alliax commentedWhat is the current state of this? Since you lost all your work on caching the results in the cache table, nothing has been done?
I am using tagadelic happily but as one site I have a freetagging and the catgory got bigger, with under 1000 visitors a day, 1and1 blocked my database and I had to change hosting company before I realised having 3 tagadelic blocks on every page view without block caching might be why the mysql consuming was too high. I removed the tagadelic blocks for now.
Comment #25
arthurf CreditAttribution: arthurf commentedThe last two patches don't seem to exist any more, not sure what's happened. Anyway, I did a quick and dirty version of a caching scheme that has admin configurable time limits.
Comment #26
arthurf CreditAttribution: arthurf commentedIgnore my last patch. This one should work a bit better.
Comment #27
Bèr Kessels CreditAttribution: Bèr Kessels commentedI don't like the time limits. Drupal has all sorts of comlpex logic to determine if a cache needs wiping. We should not add more stuff to that.
Without the option the patch looks fine.
Comment #28
arthurf CreditAttribution: arthurf commentedPoint well taken. Here's a patch minus the time stuff.
Comment #29
Bèr Kessels CreditAttribution: Bèr Kessels commentedGood! But some teenyweeny thingsies. Not that I want to sound like an *ss, but I'd like to keep stuff clean, or else not make it messier then it is =)
* please remove the comment about caching, that is a hint/todo, wich would be done after this patch =)
* mind your spacing its if ($omething) { and not if( $something) . same for teh else followed by a wite line.
For the rest: good one and ready to go.
Comment #30
arthurf CreditAttribution: arthurf commentedPoints well taken about style. I wasn't even paying attention. Here's the cleaned up version.
Comment #31
Bèr Kessels CreditAttribution: Bèr Kessels commentedComment #32
Bèr Kessels CreditAttribution: Bèr Kessels commentedCommitted to HEAD. Thanks a lot.
If anyone feels like backporting this to a stable branch, please open a new issue thread.
Comment #33
Bèr Kessels CreditAttribution: Bèr Kessels commentedComment #34
(not verified) CreditAttribution: commentedComment #35
jayjaydluffy CreditAttribution: jayjaydluffy commentedI am about to apply this patch on my module in my site since I am having the same problem with my robust taxonomy. Should anyone of you here sum up what this patch does? This has been a long thread and me as a newbie could hardly follow the discussion.
I'm looking forward to your response to my request, and it's highly appreciated.
Thanks
Comment #36
alliax CreditAttribution: alliax commentedYou just have to update your module to the dev version, no need to apply a patch. Then, whenever the module maintainer will decide, you will be able to switch to the next release version.
That's what I understood, and when I upgraded to the latest dev version, the problem of the tagadelic bloc not updating when site cache was activated, suddenly disappeared.
Comment #37
jayjaydluffy CreditAttribution: jayjaydluffy commentedThanks though I already have successfully patched my module with the latest patch above. Well, it goes perfect. Thanks again!
Comment #38
alliax CreditAttribution: alliax commentedExcuse me, my last answer was completely out of context, I've mistaken two different issues and I was replying like I was talking about the other one. So please just ignore my last comment :-)
Comment #39
jayjaydluffy CreditAttribution: jayjaydluffy commentedI found something weird after applying this patch. After some deep observation on its performance, I found out that everytime a tag is inserted in a node, the cache is cleared and new data is set to cache. I want to disable this stuff since I've added admin control to automatically clear the cache after n seconds, where n depends on configured number of minutes. I've been examining it in the module but cannot locate where the stuff occurred. Please anyone can help me find it. Thanks!
Comment #40
David Stosik CreditAttribution: David Stosik commentedHello,
Here is a patch that backports caching system on DRUPAL-5 branch.
Hope this can be applied.
By the way, if #391930: Tagadelic block cache displays wrong tags gets applied, I will need to backport this change (using cache_page instead of cache table).
Regards,
David
PS: don't use this patch, there is a forgotten dpr. uploadin new patch in next post.
Comment #41
David Stosik CreditAttribution: David Stosik commentedClean patch here.
Comment #42
Bèr Kessels CreditAttribution: Bèr Kessels commentedLooks good and solid. Tested on my blog (http://bler.webschuur.com/tags) and found no (degrading) issues.
If others can test this on large(r) databases, that would be great. Else it will go in to the next release this weekend.
Comment #43
Bèr Kessels CreditAttribution: Bèr Kessels commentedber@yasmine:~/Documenten/TAG_tagadelic/unfuddle/tagadelic-5.x$ patch -p0 < tagadelic_cache_backport5_0.patch
(Stripping trailing CRs from patch.)
patching file tagadelic.module
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 204.
1 out of 1 hunk FAILED -- saving rejects to file tagadelic.module.rej
Comment #44
David Stosik CreditAttribution: David Stosik commentedCould you explain more, please?
Comment #45
Bèr Kessels CreditAttribution: Bèr Kessels commentedWell, the patch fails on a DRUPAL-5 checkout. patch says that it sees changes already in there.
What version did you patch against?
Comment #46
David Stosik CreditAttribution: David Stosik commentedWell, if I can remeber, it was the HEAD at the time I posted...
Comment #47
MacMannn CreditAttribution: MacMannn commentedI'm pretty new to Drupal, but I've been playing around with the Tagadelic module (// $Id: tagadelic.module,v 1.36.2.12 2009/03/29 12:52:40 ber Exp $). On my testsite I've created a menu item that leads to a page where the tagcloud for all the content on the site is being shown. At first this seemed to work OK, but after a while I noticed that the cloud was not updating as more tags were added. Weird. Upon further examination it turned out that the cloud was generated once, CACHED, and loaded from the cache ever after...
Having read through the discussion above I can understand the need for caching. When you load a cloud on every node as a block, that could indeed produce some serious performance issues. In my case there's very little chance of that, since the cloud will only be generated when the user visits the specific cloud page by clicking on a menu item. So, I tried to work around the issue by changing the code of the module. By commenting out the following lines, I've been able to take out the caching and made the module behave as I wanted to.
211 $cache_name = 'tagadelic_cache_'. $options;
212 $cache = cache_get($cache_name);
215 if (isset($cache->data)) {
216 $tags = unserialize($cache->data);
217 }
218 else {
227 cache_set($cache_name, 'cache', serialize($tags), CACHE_TEMPORARY);
228 }
I'm not sure if the fact that the cloud was generated only once and loaded from the cache ever after on my testsite is caused by my setup of if it's a flaw in the module. But if it's a bug, then this comment might be helpful to others running into the same trouble.
Comment #48
David Stosik CreditAttribution: David Stosik commentedThis thread is only about backporting a feature of the Drupal 6 version. If you think caching the tagcloud is a bug, please open your own issue, or find another issue dealing with this problem. ;)
Comment #49
David Stosik CreditAttribution: David Stosik commentedBy the way the issue #391930: Tagadelic block cache displays wrong tags already deals with the problem you raise.
Bèr Kessels, should it be backported to D5?
Comment #50
Bèr Kessels CreditAttribution: Bèr Kessels commented