When devel generate is asked to attach terms to an entity, it picks a "random" term by finding the highest id from that vocabulary in the taxonomy_term_data table, and generating a random number from 1 to that maximum. But in a multi vocabulary site, terms from a vocab are unlikely to start at id 1, making this range not very helpful. In fact, if the target vocabulary contains only small number of the total set of terms, a single term from that vocab will almost always be picked.

The attached patch uses the D7 ->orderRandom query option to pick the term randomly from only the right vocab. A D6 backport would have to select BOTH a min and a max from taxonomy_term_data to allow the previous logic to work reliably.

Comments

duellj’s picture

Status: Needs review » Reviewed & tested by the community

Works great. Before it would generate content mostly within one term in a vocabulary with a small (~10 terms) set. With the patch, content is generated more evenly over the available terms.

henrikakselsen’s picture

Works great for me too!

salvis’s picture

Status: Reviewed & tested by the community » Needs review

This is pretty inefficient. If we have 1000 terms in a vocabulary, then every call needs to randomize the 1000 terms. If we allow 10 terms and we generate 100 nodes, we'll randomize the 1000 terms 500 times.

I wonder how orderRandom() compares to counting the terms (this could be cached) and then using limit(mt_rand(0, $max - 1), 1) on the unsorted (!) terms...

salvis’s picture

Status: Needs review » Needs work

I do think this should be fixed, but orderRandom() is probably not the right way...

cobian.h’s picture

the patch gives me "Hunk #1 FAILED at 15."
:(

frank ralf’s picture

frank ralf’s picture

Patch applies and works as promised (with 7.x-1.3).

Will try the suggestions by salvis (#3) and report back as soon as I find the time.

Some resources for others who might want to give this a try:
* http://jan.kneschke.de/projects/mysql/order-by-rand/

pcambra’s picture

Agreed that orderRandom() is not the best solution performance-wise, but the way it's done now causes the same terms to be selected.

Also, we shouldn't depend on MySQL tools.

richard.thomas’s picture

Status: Needs work » Needs review
StatusFileSize
new1.67 KB

Here's my take on this, we do a count query first, if there are any taxonomy terms, we use mt_rand to select a random row between 0 and count-1, giving us a much better random distribution.

salvis’s picture

This looks reasonable — should we cache the count (per vocabulary)?

pcambra’s picture

Looks good, also a comment on performance, we might want to replace db_select by db_query.

salvis’s picture

Yes, but db_query() does not lend itself to rewriting anymore, and because of the missing translation layer it can cause problems for the non-MySQL databases. I'm eliminating all db_query()s from my modules.

richard.thomas’s picture

Sounds good, here is a new version with static caching of the count by vid.

salvis’s picture

Version: 7.x-1.x-dev » 8.x-1.x-dev

Looks good — let's see whether it applies to D8...

+++ devel/devel_generate/taxonomy.devel_generate.inc	(working copy)
@@ -15,22 +15,23 @@
+  // Count the available terms for this vocabulary, keeping a static cache to ¶

A trailing space crept in.

+++ devel/devel_generate/taxonomy.devel_generate.inc	(working copy)
@@ -15,22 +15,23 @@
+  if (empty($term_counts[$vocabulary->vid])) {
+    $term_counts[$vocabulary->vid] = (int) $query->countQuery()->execute()->fetchField();
+  }
+  if ($term_counts[$vocabulary->vid] > 0) {

Since you cater to the fact that the count could be 0, you should use !isset() rather than empty().

salvis’s picture

Status: Needs review » Needs work

The last submitted patch, devel_generate-betterrandomtaxonomyterms-1328380.patch, failed testing.

richard.thomas’s picture

Status: Needs work » Needs review
StatusFileSize
new1.89 KB

Good point, fixed to use !isset() and removed trailing space, here's a version that applies to 8.x-dev, but I have no drupal 8 environment to test in at the moment.

pcambra’s picture

+++ devel/devel_generate/taxonomy.devel_generate.inc	(working copy)
@@ -15,22 +15,23 @@
+  $term_counts = &drupal_static(__FUNCTION__, array());
+  if (!isset($term_counts[$vocabulary->vid])) {

We're returning an array keyed by 'tid', I guess this is never going to hit cache this way, is it?

nterbogt’s picture

It should hit the cache as it's keyed by $vocabulary->vid.

salvis’s picture

Looks good. Anyone wants to test this?

pcambra’s picture

StatusFileSize
new1.95 KB

Ok, tried the patch and the logic works great, had to change several things to make it work with D8 HEAD.

salvis’s picture

Thanks, pcambra! Are you OK with #21, richard.thomas?

richard.thomas’s picture

Yep, looks good to me.

13rac1’s picture

Status: Needs review » Needs work
Issue tags: +needs backport to 7.x-1.x

No longer applies to current 8.x-1.x-dev.

Tagging for backport.

13rac1’s picture

Status: Needs work » Needs review
StatusFileSize
new2.2 KB

My goal is fixing this in 7.x, but here is a patch for 8.x. Requires the patch in #2103393: field_behaviors_widget() is removed. Won't pass tests until that patch is committed.

pcambra’s picture

Issue summary: View changes
Status: Needs review » Needs work

Devel generation has changed to be pluggable in #2154141: DevelGenerate as a D8 plugin type so this needs a re roll.

moshe weitzman’s picture

Status: Needs work » Fixed

I think I fixed the bug, but in an inefficient way. Lets just call this fixed and discuss efficiency when needed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.