Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
taxonomy_term_count_nodes() (used by directory.module and some other contrib modules) return the wrong count when nodes are assigned several terms (multi-select vocabulary). The node is counted as many times at it has terms.
Comment | File | Size | Author |
---|---|---|---|
#68 | taxonomy_count_nodes_1054616-68-D6.patch | 1.93 KB | JeremyFrench |
#67 | messages-error.png | 727 bytes | anawillem |
#62 | taxonomy_count_nodes_D6_2.patch | 2.69 KB | JeremyFrench |
#61 | taxonomy_count_nodes_20100920.patch | 1.08 KB | PEF-dupe |
#59 | taxonomy_count_nodes-1.414.2.16.patch | 2.79 KB | perarnet |
Comments
Comment #1
beginner CreditAttribution: beginner commentedThis is an almost complete rewrite of taxonomy_term_count_nodes().
Surprisingly, this module is not used by core, but it is by some contrib module.
The rewrite amounts to a change of API, fixing the bug and adding some functionality in the process.
As such, the patch will most certainly NOT be accepted for Drupal-5.
However the patch is against Drupal-5 and to be tested with Drupal-5, because the contrib modules to test with are not ported to D6 yet.
To reproduce the bug:
enable the contrib directory.module.
Create a node which has many terms within the same vocab (multi-select) (the terms should be descendants of the same parent term.
In the settings, enable 'show node counts'.
See the directory main page at ?q=directory: the nodes are counted twice at the common parent term.
The way to correct the bug was to count separately the term's own nodes and those of child terms, making sure there are duplicate.
Since we are doing so, we are now able to return more information upon return: instead of returning an integer, we return an array of different values, as noted at the top of the patch.
If this way of solving the problem is properly tested and deemed acceptable, then I can find a way to roll a proper patch for Drupal-6.
Btw, project.module does "not use taxonomy_term_count_nodes() because it includes child terms' counts" (quote from project.module code). Instead, it has created its own implementation With this patch, it could use this function again by looking at the proper value within the returned array.
Comment #2
beginner CreditAttribution: beginner commentedHere is a patch for D6/HEAD.
It is a complete rewrite of the function.
Here it is, for easier review:
Comment #3
stella CreditAttribution: stella commentedHi,
I tested the drupal 6.x patch version and it gives the error below. The patch seems to call a function from the directory module. Surely any patch to the taxonomy module should be independent of the directory module?
Cheers,
Stella
Comment #4
Pasqualleneed a verdict here
issue summary:
1. the function is incorrect
2. the function is not used by core
3. the patch is an API change
4. the patch still not work
the possibilities:
1. drupal 7.x
2. fix the patch quickly
3. remove the function
feel free to change back the priority after final decision
Comment #5
moshe weitzman CreditAttribution: moshe weitzman commentedwe are accepting bug fixes before release and afterwards. so i think 2 is reasonable. the patch here looks more complicated than i expected.
Comment #6
beginner CreditAttribution: beginner commentedmy verdict:
1) the function is not used by core -> minor
2) but it's useful and used by some contrib modules -> so we keep it in core (unless someone objects).
3) a bugfix can happen any time, so keep it in the D6 queue, even if the fix appears after the D6 release.
About the proposed fix:
a) forget the patch for a while. See the general idea in #2.
b) Yes, it's complicated. That's why I haven't bothered to update the patch: I need a proper review on the concept first.
c) In order to get that review, I set CNR. The patch doesn't apply but I need feedback on the idea.
d) the process is quite cpu intensive (??), so I cache the results, but I am far from sure it's the right approach.
So, please:
http://drupal.org/patch/review
Comment #7
mlncn CreditAttribution: mlncn commentedI have an alternate, simpler patch that applies against Drupal 5.
I think the priority is normal, it's not good for Drupal 6 to ship with bugs. The bug in this function in Drupal 5 certainly burnt me-- it would have been better if it weren't there.
The difference in my approach is that it doesn't change the API or take the step of caching this result as a Drupal variable.
If I add in beginner's check for not double-counting (or worse) nodes belonging to both a term and one or more children terms –
– I think I'd prefer my patch.
benjamin, Agaric Design Collective
Comment #8
beginner CreditAttribution: beginner commentedMerci beaucoup. I'll test your patch.
I'll make time for that next year (!!) ;)
Comment #9
brenda003I tested both patches on D5. Firstly, returning an array was unexpected results (yes, even after reading that's what happens!). There must be a better way to do this. I also received several errors on line 1105 about Invalid argument supplied for foreach.
This line:
Still the issue of having the parent terms tallying the TOTAL number of nodes both within the parent term and the children term doesn't seem to be working. The array includes both count_own and count_children - so I'm guessing we'll need to add these up. This feels dirty.
Agaric's approach is cleaner and a good starting point, perhaps another placeholder for DISTINCT should be added to the function to complete it.
Comment #10
Wim Leersbeginner's patch is way too complex. But I've seen it working in the directory module.
Benjamin Melançon's patch doesn't work at all. All it does, is exclude the children completely from being counted. I'd even call that a new bug.
A simple illustration of the problem:
3 nodes are tagged with "sound", 2 nodes are tagged with "video". 1 of the 3 nodes that is tagged with "sound" is *also* tagged with "video". So the affected number of nodes is 4, *not* 5.
So when we retrieve the node count of the "home cinema tag", we will get 0 for the actual tag, 3 for "sound", 2 for "video". But we're actually counting the same node *twice*! So we're getting instead of 4.
Obviously, Benjamin Melançon's patch, by simply excluding all children, doesn't fix this.
Comment #11
Wim LeersI've fixed this for D5. I ported it to D6/D7 (which use versioned taxonomy terms), but didn't test it there, it's a trivial port though, so I expect it to work.
I know I need to add more comments, but I'd like to get some feedback on my approach first. It's fairly trivial: I keep the current taxonomy_count_nodes() function, but in order to get the correct count (i.e. only count child nodes if no ancestor nodes are assigned to the same node), I exclude nodes from the count that match ancestor terms.
Comment #12
Wim LeersI just realized this doesn't work 100% correctly yet. It doesn't take one case into account yet: when a node is assigned 2 terms of the same level (i.e. sound and video in my example of #10).
I guess we will have to resort to collecting an array of nids (vids in D7) and then keeping only the uniques. That's all I can think of ATM at least.
Comment #13
beginner CreditAttribution: beginner commentedKeeping an array of nids and keeping only the uniques is precisely what I am doing.
I am sorry that Benjamin's patch does not work out. I know my approach is very complex, but I thought a long time about it, and so far, it is the only one that works.
Please read again comment #6, then review #2.
Comment #14
Wim LeersWhile thinking about this, I've found a way to do it without keeping an array of nids (which is *very* bad for scalability). I'll update my patch tonight.
Comment #15
Wim LeersUpdated patch.
Pros/cons:
+ *less* code than what's currently in core
+ no recursion is needed.
+ less PHP memory usage (thanks to lack of recursion: we only cache the counts for the terms that were actually requested, instead of for all children too)
- more stress on the DB server
? scalability
This solution is also much more scalable than beginner's patch: we don't have to collect all nids on the server side, we simply let the DB do all the work (as it should be).
However, if you ask for the node count of a term with a lot of descendants (children, grandchildren, and so on), the query time will increase. We'd have to do benchmarks to make definitive conclusions.
On the other hand, this *fixes* the bug, with *simpler* code. It's not used by Drupal core itself, so minimal benchmarking should be sufficient.
Comment #16
catchThis needs re-rolling for dbtng (and I'm not sure how that works for the IN() here). Will definitely need benchmarks too.
Comment #17
catchPatch no longer applied. Started initial dbtng conversion but then realised this needs refactoring to use the SELECT query builder (and I'm not sure how that plays with COUNT() and DISTINCT() yet). So posting untested initial re-roll so there's an applicable patch to work against later. Also this will need a test.
taxonomy_get_tree() needs fixing anyway for performance, not the job of this patch. I also agree that we only need minimal benches as the function is pretty far outside the critical path.
I grepped contrib yesterday to see if it this got used before seeing this issue - looking for dead code in taxonomy module. It does seem to be a useful utility function, and it's something which would be handy for term administration too - finding 'unassigned' terms for bulk deletion for example. So simplifying it and having it actually work seems like the best approach to take.
We might be able to simplify things even further by fetching the results into an array keyed by nid using fetchAllAssoc('nid'), which would remove duplicates without further array munging - then return a count on the array. Then it'd just be a simple select and count(). No idea how this would compare for scaling with a large result set to the distinct and count, but will look into that as well.
Comment #18
catchHere it is with dbtng conversion, addExpression() makes the DISTINCT(COUNT(nid)) easy to do :)
Tests forthcoming, hopefully later today.
Comment #19
catchFound a place to use the function in an existing test - this doesn't deal with the original bug, but it gives us some coverage, and also allows us to use an API function instead of a direct query in one place.
On my system I'm getting array to string conversion notices and two fails, whilst testing the function via devel's execute PHP works so far. So not quite sure what's happening yet. Will leave at CNR so it runs via the test bot.
Comment #20
catchRe-rolled for whitespace and to nudge the test bot.
Comment #21
Dries CreditAttribution: Dries commentedcatch, your last patch does not seem to have the tests that the second-to-last patch had?
Comment #22
catchDries, you're right - and even though the penultimate patch comes up 4k, I only see the taxonomy.module hunks, hence the bad re-roll. If I don't have it locally, it should be easy to recreate anyway.
CNW for now (and there were test failures with the actual tests - which I wanted the testbot to double check for me, which can't happen if they aren't in the patch, doh!).
Comment #23
catchSome initial tests, while it seems to work when running in devel, it's not happening in the test. Specifically, despite saving a parent/child relationship between terms, taxonomy_get_tree for the parent doesn't return this. Not done much debugging yet, posting for reference.
Comment #24
catchHere it is with working tests. I had to add a $reset argument to both taxonomy_get_tree() and taxonomy_term_count_nodes() to get the tests not failing dismally. We need this anyway until standardise static caching goes in. The tests handle individual terms, terms with no nodes of their own but with children, the case above when two children are assigned to the same node, and the node type argument.
As noted above, it's not used in core (which means the tests are the only coverage we get) - although it might come in handy later if we want to add removing 'orphaned' taxonomy terms in taxonomy administration (which I have plans to).
Comment #25
catchExtra code removal (_taxonomy_term_children() -which is redundant now).
Comment #26
catchAnd the patch...
Comment #28
catchRe-rolled for changes to taxonomy_get_tree() parameters.
Comment #29
theabacus CreditAttribution: theabacus commentedHmmm, both patches (#11 & #15) fail on Druapl 6.6. Hunk fail on lke 896 and 910 respectively.
Comment #30
catch@theabacus - any chance you could test it on HEAD? Once it's in we could potentially do a backport, but that won't happen until it's committed to HEAD.
Comment #31
theabacus CreditAttribution: theabacus commentedThe patch applied successfully [Hunk #1 succeeded at 915 (offset 5 lines)] and I ran it through some testing. I was unable to do extensive testing due to the 2nd error. Please let me know if any clarity is needed. I will be happy to test the next version when it is released.
Taxonomy Tree used
> Edibles (Vocab)
> Fruit
> Apple
> Golden Delicious
> Granny Smith
> Orange
Error occurs when node has a single term.
(Error: notice: Undefined index: 2 in .../drupal-6.x-dev/modules/taxonomy/taxonomy.module on line 966.)
Error occurs when creating a term over 1 level deep (i.e. creation of Granny Smith).
Fatal error: Call to undefined function hs_taxonomy_term_count_nodes_exclude_ancestors() in .../drupal-6.x-dev/modules/taxonomy/taxonomy.module on line 963
Term counts still returns incorrect count in certain circumstances:
Node terms: Apple, Orange
Fruit term count: 2 (Incorrect. Should be 1)
Node terms: Fruit, Apple
Fruit term count: 1 (Correct)
Node Terms: Fruit, Apple, Orange
Fruit term count: 1 (Correct)
Comment #33
catchre-testing.
Comment #35
catchWorks for me, uploading again.
Comment #36
CitizenKane CreditAttribution: CitizenKane commentedPatch applies cleanly to D7 HEAD and all tests pass. Works with hierarchical terms. However, this does not seem to work with synonyms. Using the follow procedure:
1.) Make terms ny and New York.
2.) Make New York a synonym of ny.
3.) Make two nodes, tag one with New York and the other with ny.
4.) do taxonomy_term_count_nodes with tid of ny.
In my testing this returns 1. I would instead expect this to return 2. Any thoughts? I'm not sure if that's the desired behavior for synonyms.
Comment #37
catchSynonyms are just a lexical synonyms for individual terms stored as text strings, they don't actually involve any kind of relationships between terms. I think based on this we're probably RTBC here, but I won't change the status since the last patch has my name on it.
Comment #38
CitizenKane CreditAttribution: CitizenKane commentedBased on #36 and #37 this seems RTBC.
Comment #39
catchRe-rolled for changes to taxonomy module tables.
Comment #41
catchLet's try that again.
Comment #43
catchretesting.
Comment #44
drewish CreditAttribution: drewish commentedI don't like the extra lines between @params. We don't do that elsewhere in core. I'm not saying you should strip them out of the entire module but you should drop them where you're already touching the docs.
I really hate 0 as a default for $type. What's wrong with NULL or ""? Also it makes the tests really confusing seeing "page" some places and 0 others.
Comment #45
catchRe-roll per drewish's comments, also moved some more stuff inside the static check in taxonomy_term_count_nodes().
Comment #46
catchComment #47
catchOne local fix didn't make it into the patch, this should be it.
Comment #48
webchickComments from IRC:
I found it odd that $reset was not the last parameter in taxonomy_get_tree(), but catch pointed out that the final parameter was for internal use only, so this makes sense.
Should be "array" not "aray"
In the tests, there's lots of stuff like:
That second param was changed to NULL per drewish's feedback, but the test calls are still 0.
Also, the first time you call the reset param, could you explain in a comment why you're doing so? Perhaps do the same under:
Wim, if you're out there, I'd love to see you review this one more time. Otherwise I'll probably commit it once these minor issues are fixed.
Comment #49
catchRe-rolled with those changes.
Comment #50
webchickAwesome. :) Thanks a lot!
Committed to HEAD.
Comment #52
catchStill a bug in Drupal 6.
Comment #53
catch#372722: Calling _taxonomy_term_children() results in "PHP Fatal error: Allowed memory size of 268435456 bytes exhausted" describes major memory issues with the current D6 code, bumping priority.
Comment #54
catchBackport based on Wim's patch in #15. I've left out the API changes since while they're necessary for the simpletests to work, we might not need them just to make this less broken.
This patch is completely untested - I don't have any sites using taxonomy_term_count_nodes() at the moment (or not that I can think of).
Comment #55
JeremyFrench CreditAttribution: JeremyFrench commentedSubscribe, I've just spent several hours trying to work out why my code wasn't adding items to the taxonomy in a simpletest.
Comment #56
ibis CreditAttribution: ibis commentedsubscribe
Comment #57
perarnet CreditAttribution: perarnet commentedRerolled patch in #54 with latest Drupal release. Initial testing ok.
Comment #59
perarnet CreditAttribution: perarnet commentedTrying again - with cvs patch
Comment #61
PEF-dupe CreditAttribution: PEF-dupe commentedFix just with add DISTINCT in the two SQL requests:
- $result = db_query(db_rewrite_sql('SELECT t.tid, COUNT(n.nid) AS c FROM {term_node} t INNER JOIN {node} n ON t.vid = n.vid WHERE n.status = 1 GROUP BY t.tid'));
+ $result = db_query(db_rewrite_sql('SELECT t.tid, COUNT( DISTINCT n.nid ) AS c FROM {term_node} t INNER JOIN {node} n ON t.vid = n.vid WHERE n.status = 1 GROUP BY t.tid'));
and
- $result = db_query(db_rewrite_sql("SELECT t.tid, COUNT(n.nid) AS c FROM {term_node} t INNER JOIN {node} n ON t.vid = n.vid WHERE n.status = 1 AND n.type = '%s' GROUP BY t.tid"), $type);
+ $result = db_query(db_rewrite_sql("SELECT t.tid, COUNT( DISTINCT n.nid ) AS c FROM {term_node} t INNER JOIN {node} n ON t.vid = n.vid WHERE n.status = 1 AND n.type = '%s' GROUP BY t.tid"), $type);
It's works for me when I use the block of Catalog module in Ubercart.
Comment #62
JeremyFrench CreditAttribution: JeremyFrench commentedRename patch from #59 and remove strange first line. Still completely untested.
Comment #63
Andrew Gorokhovets CreditAttribution: Andrew Gorokhovets commented+1
Comment #64
kenorb CreditAttribution: kenorb commentedPatch #62 is working for me. My terms were doubled, after patch #62 - it fixes the issue.
Comment #66
dpearcefl CreditAttribution: dpearcefl commentedPlease repost the patch in #62 with a proper filename (so the system knows it is a patch) and mark the issue "needs review".
http://drupal.org/node/1054616
[description]-[issue-number]-[comment-number].patch
Comment #67
anawillem CreditAttribution: anawillem commentedPLEASE.
We also need this urgently and have a policy against hacking the sacred core of drupal. Knowhatimean?
pretty please.
Comment #68
JeremyFrench CreditAttribution: JeremyFrench commentedPatch attached, first I have done on core with git. Hopefully it is all correct.
[description]-[issue-number]-[comment-number].patch + D6 so testbot ignores it.
Refactored a little and tested in a limited sense. Could do with testing in a more general sense.
Comment #70
JeremyFrench CreditAttribution: JeremyFrench commentedNot sure why test bot tested, and why it tested the wrong patch. Trying to set to needs review again.
Comment #71
dmention_eblack CreditAttribution: dmention_eblack commentedThe fix that PEF posted in comment #61 fixed the issue for me.
Has any progress been made on getting this into core?