When devel generate is asked to attach terms to an entity, it picks a "random" term by finding the highest id from that vocabulary in the taxonomy_term_data table, and generating a random number from 1 to that maximum. But in a multi vocabulary site, terms from a vocab are unlikely to start at id 1, making this range not very helpful. In fact, if the target vocabulary contains only small number of the total set of terms, a single term from that vocab will almost always be picked.
The attached patch uses the D7 ->orderRandom query option to pick the term randomly from only the right vocab. A D6 backport would have to select BOTH a min and a max from taxonomy_term_data to allow the previous logic to work reliably.
Comments
Comment #1
duellj commentedWorks great. Before it would generate content mostly within one term in a vocabulary with a small (~10 terms) set. With the patch, content is generated more evenly over the available terms.
Comment #2
henrikakselsen commentedWorks great for me too!
Comment #3
salvisThis is pretty inefficient. If we have 1000 terms in a vocabulary, then every call needs to randomize the 1000 terms. If we allow 10 terms and we generate 100 nodes, we'll randomize the 1000 terms 500 times.
I wonder how
orderRandom()compares to counting the terms (this could be cached) and then usinglimit(mt_rand(0, $max - 1), 1)on the unsorted (!) terms...Comment #4
salvisI do think this should be fixed, but orderRandom() is probably not the right way...
Comment #5
cobian.h commentedthe patch gives me "Hunk #1 FAILED at 15."
:(
Comment #6
frank ralf commentedJFTR
This issue might be a duplicate of #1433412: Taxonomy terms aren't randomised when generating nodes with devel_generate.
Comment #7
frank ralf commentedPatch applies and works as promised (with 7.x-1.3).
Will try the suggestions by salvis (#3) and report back as soon as I find the time.
Some resources for others who might want to give this a try:
* http://jan.kneschke.de/projects/mysql/order-by-rand/
Comment #8
pcambraAgreed that orderRandom() is not the best solution performance-wise, but the way it's done now causes the same terms to be selected.
Also, we shouldn't depend on MySQL tools.
Comment #9
richard.thomas commentedHere's my take on this, we do a count query first, if there are any taxonomy terms, we use mt_rand to select a random row between 0 and count-1, giving us a much better random distribution.
Comment #10
salvisThis looks reasonable — should we cache the count (per vocabulary)?
Comment #11
pcambraLooks good, also a comment on performance, we might want to replace db_select by db_query.
Comment #12
salvisYes, but db_query() does not lend itself to rewriting anymore, and because of the missing translation layer it can cause problems for the non-MySQL databases. I'm eliminating all db_query()s from my modules.
Comment #13
richard.thomas commentedSounds good, here is a new version with static caching of the count by vid.
Comment #14
salvisLooks good — let's see whether it applies to D8...
A trailing space crept in.
Since you cater to the fact that the count could be 0, you should use !isset() rather than empty().
Comment #15
salvis#13: devel_generate-betterrandomtaxonomyterms-1328380.patch queued for re-testing.
Comment #17
richard.thomas commentedGood point, fixed to use !isset() and removed trailing space, here's a version that applies to 8.x-dev, but I have no drupal 8 environment to test in at the moment.
Comment #18
pcambraWe're returning an array keyed by 'tid', I guess this is never going to hit cache this way, is it?
Comment #19
nterbogt commentedIt should hit the cache as it's keyed by $vocabulary->vid.
Comment #20
salvisLooks good. Anyone wants to test this?
Comment #21
pcambraOk, tried the patch and the logic works great, had to change several things to make it work with D8 HEAD.
Comment #22
salvisThanks, pcambra! Are you OK with #21, richard.thomas?
Comment #23
richard.thomas commentedYep, looks good to me.
Comment #24
13rac1 commentedNo longer applies to current 8.x-1.x-dev.
Tagging for backport.
Comment #25
13rac1 commentedMy goal is fixing this in 7.x, but here is a patch for 8.x. Requires the patch in #2103393: field_behaviors_widget() is removed. Won't pass tests until that patch is committed.
Comment #26
pcambraDevel generation has changed to be pluggable in #2154141: DevelGenerate as a D8 plugin type so this needs a re roll.
Comment #27
moshe weitzman commentedI think I fixed the bug, but in an inefficient way. Lets just call this fixed and discuss efficiency when needed.