Problem/Motivation
If two different vocabularies contain terms of the same name, the API function taxonomy_get_term_by_name() returns the first match. This can make the function unstable, as it may select a term from an unexpected vocabulary.
Proposed resolution
Add the vocabulary machine name as an optional argument for taxonomy_get_term_by_name(). Patch in #105 implements this solution.
Remaining tasks
Taxonomy maintainer has signed off on the change. The patch has been updated to avoid unnecessary queries and always return an empty array when there are no valid results. The latest patch in #105 needs to be reviewed and marked RTBC if appropriate.
#920600: Document the deprecated $conditions parameter in entity_load() added an @todo to taxonomy_term_load_multiple() indicating that the $conditions parameter will be
deprecated in favor of first calling EntityFieldQuery::execute() to retrieve the list of entity IDs matching a given set of conditions.
We need clarification on whether this will alter the signature of taxonomy_term_load_multiple() and whether the patch would need any changes to be forward-compatible.
User interface changes
None.
API changes
Minor, backwards-compatible addition only. No existing code will be affected.
An additional, optional argument is added to taxonomy_get_term_by_name(). The signature becomes:
taxonomy_get_term_by_name($name, $vocabulary = NULL)
Original report by @dutchslab
Am I mistaken, but will taxonomy_get_term_by_name() return the first occurrence of the term name, with no regard to the vocabulary it's associated with? If you have multiple vocabularies with same terms, this function isn't very useful anymore . . .
| Comment | File | Size | Author |
|---|---|---|---|
| #126 | taxonomy-vocabulary-336697-126.patch | 3 KB | oriol_e9g |
| #123 | taxonomy-vocabulary-336697-123.patch | 3 KB | oriol_e9g |
| #120 | taxonomy-vocabulary-336697-120.patch | 3.04 KB | oriol_e9g |
| #114 | taxonomy_vid_arg-336697-POST-APOCALYPSE.patch | 3.05 KB | xjm |
| #105 | 336697-105.patch | 3.01 KB | xjm |
Comments
Comment #1
catchThis is right, and should be an easy enough fix. Moving to Drupal 7 since bugs get fixed there first, then backported.
Comment #3
arthurf commentedI use a modified version of taxonomy_get_term_by_name() to select by vocabulary ids to limit my number of returns- this maybe helpful for you. I also think it might be a nice thing for other modules to have access to.
Note for sub D7, the table name has to change from taxonomy_term_data to term_data
Comment #4
karschsp commentedtagging for the novice queue.
Comment #5
jbomb commentedComment #6
jbomb commentedComment #7
jbomb commentedhere's a slightly modified version of arthurf's fix from #3. I modified it slightly so that $vocabularies ($vid in the patch) is passed to the db driver as (an) argument(s).
Comment #8
jbomb commentederr... noticed something's wrong with the patch, re-rolling.
Comment #9
jbomb commentedthat's better
Comment #10
jbomb commentedComment #12
catchThis ought to be handled when #343788: Taxonomy module doesn't use it's own APIs gets in.
You'd be able to do
It'd then be easy to add the extra argument to taxonomy_get_term_by_name().
The only thing missing at the moment is the case insensitive condition on name.
Comment #13
jbomb commentedShould this be marked duplicate?
Comment #14
catchYours does multiple vids, which isn't and won't be covered by the other patch, so probably worth keeping open for that. Also the other patch is more general, and isn't committed yet, whereas this is very specific. Could use some reviews over there though :)
Comment #15
davyvdb commentedComment #16
yesct commentedI did a coding standards review (http://drupal.org/coding-standards): looks OK there,
and I downloaded the D7 head from cvs and when I applied the patch I got:
ctheys% patch -p0 < ../336697-tax-name-vid.patch
patching file modules/taxonomy/taxonomy.module
Hunk #1 succeeded at 888 (offset 1 line).
Hunk #2 succeeded at 1209 (offset 1 line).
patching file modules/taxonomy/taxonomy.test
Which is probably OK.
It installed OK then... and my eyes kept closing, so I'm going to stop there and sleep. :)
Comment #17
yesct commentedOK. I'm not sure what I should try in order to "exercise" the patch.
I thought I would try something, so I made two vocabularies, and put the same terms in each. Then I made new content and selected the term for each vocabulary, and I tried putting only one term on content, and some more variations and combinations. And it seems to be working as I expect (listing the right content under the right term in the right vocab).
I'd appreciate leads on what exactly to try testing for this one.
Comment #19
coltraneI've wanted this feature since Drupal 4.7. Let's hope it can get into 7.
Comment #20
coltraneMaking the title more specific.
Comment #21
davyvdb commentedPatch looks nice. Does what it needs to do and we have a test for it.
Comment #22
webchickSorry, too late for Drupal 7 now. :\ Unless it fixes a critical bug of some kind, no more API changes.
Comment #23
pasquallein this form this function should be removed, as it is not used at all in core (simply because it totally useless)..
Searching for a term in all vocabularies? I do not know any use case for that..
Comment #24
davyvdb commentedSome contributed module developer might already be depending on this. You never know.
Comment #25
pasquallethen it must be a bug in that module
Comment #26
lyricnz commentedReroll for Drupal 8, and improve tests to add multiple terms with the same name, then check the $vid filter is working.
> 171 passes, 0 fails, 0 exceptions, and 51 debug messages
Comment #27
mr.baileysPatch looks good to me, and I agree that being able to restrict on vid makes sense.
Some minor style issues with the current patch:
vid should be marked as "(Optional)" in the comment (see http://drupal.org/node/1354#general)
Trailing whitespace.
Assertions should end with a period (2 occurrences).
These two properties are optional, I'd consider dropping them since they play no role in this test.
Powered by Dreditor.
Comment #28
oriol_e9g1, 2, 3 & 4, not 5.
Comment #29
oriol_e9gComment #30
oriol_e9gFile rename for testbot?
Comment #32
oriol_e9gInvalid unix-style line endings, trying again.
Comment #33
oriol_e9gComment #35
oriol_e9g#26: taxonomy-get-name-vid-336697.patch queued for re-testing.
Comment #36
oriol_e9gOk, fail again xD. Tomorrow a good patch & 5.
Comment #37
oriol_e9g1, 2, 3 & 4
Comment #38
oriol_e9gAnd 5
Comment #39
lyricnz commented- consistent capitalisation of Vocabulary ID in the first two PHP doc comments
- wrap PHP doc and comments at 80 characters
- typo "diferent"
- typo/english in last comment:
that not exists ==> that doesn't exist
but existst in ==> but exists in
Comment #40
oriol_e9gNew patch. Thanks!
Comment #41
oriol_e9gComment #42
oriol_e9gSorry. One more 80 characters comment fix
Comment #43
lyricnz commentedArguably "Load single terms when restricted to one vocabulary." should say "single term"
And "exists in other vocabulary" should either say "exists in another vocabulary" or "exists in the other vocabulary".
Comment #44
oriol_e9gThanks!
Comment #45
oriol_e9gComment #46
lyricnz commentedYay.
Comment #47
mr.baileysLooks good, thanks for all your work on this oriol_e9g!
One more issue with the current patch:
Should use a descriptive message about the assertion rather than the default "Value @value is FALSE.".
Comment #48
oriol_e9gSome comments added and punctuation in affirmative sentences.
Comment #50
oriol_e9gMMmmmm...
Comment #51
oriol_e9gComment #52
oriol_e9gBetter wording and duplicated assert removed (patch line 118).
Comment #53
lyricnz commentedPlease restrict this patch to just that required to implement+test "Optional vid argument for taxonomy_get_term_by_name()". This makes it easier to review, and more obvious what we're doing.
Raise another issues to do all the cleanups in the tests.
Comment #54
oriol_e9gOk. I will create a new issue to cleanup the tests.
Comment #55
oriol_e9gTests cleanups split to: http://drupal.org/node/1195254
Comment #56
lyricnz commentedLooks good to me. 56 messages on a three line change? Hahaha :)
Comment #57
catchSorry folks, this should be machine name, not $vid, for portability between servers/sites.
Rather than join on the taxonomy_vocabulary() table (which is unlikely to be well indexed although it'd be worth an EXPLAIN), I'd use taxonomy_vocabulary_get_names() or similar, look up the correct vid from there, then do the same query - but the argument should be machine name.
Comment #58
oriol_e9gComment #59
oriol_e9gGo again: optional filter by vocabulary machine name.
Comment #60
oriol_e9gOps! spaces mistake.
Comment #61
catchLooks good, it would be nice for this to fail a bit harder:
I would consider
+ if (isset($vocabularies[$vocabulary])) {- remove this and just throw the notice if the vocabulary doesn't exist - then it'll be logged to watchdog etc.Comment #62
oriol_e9gComment #63
oriol_e9gFor some strange reason the link above doesn't work: http://drupal.org/files/issues/taxonomy-get-name-336697-62.patch
Comment #64
lyricnz commentedYeah, looks good to me.
Comment #65
catchThis is only an API addition, so could potentially be backported.
Comment #66
xjm#62 includes a comment cleanup not relevant to the patch, and the wording is a little unclear. I'll reroll.
Comment #66.0
xjmUpdated issue summary.
Comment #67
xjmAttached is like #62 but without the extraneous comment change.
Comment #67.0
xjmClearer.
Comment #68
xjmAdded an issue summary. Still RTBC.
Comment #68.0
xjmUpdated issue summary.
Comment #69
oriol_e9g@xjm with your patch we lose the tests, #62 is better because include tests.
Comment #70
xjmAh, that is a mistake.
Comment #71
xjm#67 plus the test from #62. Please review.
Comment #72
oriol_e9gComment #72.0
oriol_e9gUpdated summary to review patch.
Comment #73
lyricnz commentedLooks good
Comment #74
oriol_e9gIf we need a patch for D7 backporting I can roll one.
Comment #75
oriol_e9g#71: term-by-name-336697-71.patch queued for re-testing.
Comment #76
oriol_e9g#71: term-by-name-336697-71.patch queued for re-testing.
Comment #77
sunIt's a bit odd to pass a vocabulary name here. The function name suggests to pass a name of a term, but it's very unusual to continue that for additional, optional arguments.
IMO, the second argument should be $vid.
Second, minor, while being there, let's also add the missing "$" on the @param lines in the phpDoc.
Since the value can be NULL, we need to test with isset().
20 days to next Drupal core point release.
Comment #78
xjm#77: I believe the use of machine name was recommended by catch in #57:
I'll reroll with the other fixes for now.
Comment #79
xjmI added the
$beforenameas well because it drove me crazy for one to be wrong and not the other. Sorry kittens.Comment #80
oriol_e9gI think that we are walking in circles xD
The firsts patches works with $vid argument #54, then introduce the vocabulary machine name for portability between servers/sites in #60, If it's preferred we can return to #54 versions.
Another option is accept both arguments:
Comment #81
oriol_e9gComment #82
catchI'd still go for machine name. Vocabularies as entities wasn't a great idea in retrospect and this discrepancy is part of that. But machine_name is more consistent with node type etc.
Comment #83
sunok, good point - I almost forgot that we have machine names for vocabs now.
However, any reason for attempting to load vocabularies? Why don't we simply pass the 'machine_name' => $vocabulary in $conditions?
Comment #84
catchThat's because machine_name isn't in the taxonomy_term base table.
Comment #85
xjmSo is #79 RTBC then?
Comment #86
catchYep.
Comment #87
lars toomre commentedThis new function is unlikely to be called frequently in any particular page load. Hence, I believe we should do some minimal checking before assignment.
As currently written, there is an implicit assumption that whatever value that was passed as $vocabulary will exist as a key within $vocabularies. An isset($vocabularies[$vocabulary]) check will avoid throwing a PHP error and also prevent an invalid condition from being set. An else statement could added to the new isset condition as well to handle $vid values. I imagine the resulting code would look something like:
From the documentation, it appears that one can only limit the term to that it one vocabulary. I have a use case for getting a term from vocA, vocB or vocC. Does it make sense to expand the definition of $vocabulary to allow for array() of vocabularies to look in?
Comment #88
xjmThis is a good point, but we could simplify this to:
since
taxonomy_vocabulary_get_names()is an API function and always returns an array. A null vocabulary name will not be set and so the check is skipped.I'm not sure if the magic int-or-string bit is a good idea. Callers can do this lookup if needed.
There's another issue related to this somewhere.
Comment #89
lars toomre commentedThanks for your thoughts @xjm.
I suggest using two distinct isset() checks so that there is no data base hit in the event that $vocabulary is not even set. I think this is more performant since the added time form an additional isset() call is minor relative to even having to access the data base.
In checking taxonomy_vocabulary_get_names() just now, I see that there is no static storage of the $vocabularies array. Hence,
looping through vocA or vocB or vocC condition currently will result in three calls to get the array values from the data base. That argues that either taxonomy_vocabulary_get_names() be modified for static storage, or this function should handle the multiple vocabularies constraint.
Comment #90
sunIndeed, we also need specify the expected behavior if you pass a $vocabulary name but that vocabulary does not exist.
In that case, it would be utterly wrong to silently skip/ignore the $vocabulary limitation. Instead, I think I'd expect an empty result set (and obviously it would be superfluous to even attempt to load terms for a non-existing vocabulary).
Comment #91
xjmAh, yes. If someone passes a term name and it is not found in the vocabulary, it returns an empty array--so obviously we need to do the same thing for non-existent vocabularies as well.
Regarding #89, sounds like maybe we should add static caching to
taxonomy_vocabulary_get_names(). That could be opened as a separate novice issue.Comment #92
lars toomre commented@sun What are your thoughts about the elseif clause suggested in #87?
I think it makes sense to overload the $vocabulary variable as 'mixed', but am open to following other patterns implemented in core.
Comment #93
lars toomre commentedAs per @xjm's suggestion in #91, I created the related issue #1274674: Add static caching to taxonomy_vocabulary_get_names().
Comment #94
lars toomre commentedAnother issue came up from some local testing. This thought concerns how does one deal with capitalization and/or whitespace issues in the passed $vocabulary variable.
I am presuming that the vocabulary machine_name array is expected to be lower case. Hence, if one wants to constrain the search to say the 'tags' vocabulary, one must define the constraint to be exactly 'tags'. Based upon @sun's comment in #90 above, one should expect the following vocabulary names to have no match: 'TAGS', 'Tags', 'Tags ', ' tags', 'tAgs ', etc. Is this assumption really reasonable?
Shouldn't we both trim whitespaces and convert $vocabulary to lower case before doing any comparisons?
Comment #95
lars toomre commentedLet me finally relay a question by a work associate. She observed that in D7 a $vocabulary became an entity.
Hence, why then is it not possible to call this function in the form
Comment #96
oriol_e9gIn #60 we return a NULL value if vocabulary doesn't exists but this was removed for a comment in #62.
Comment #97
oriol_e9gI can work with this but I need some guidance... where are we going? It would be helpful a version of #79 patch with #90 solved?
Comment #98
oriol_e9gI will try with this.
Same approach as #79 plus:
1. When filtering by a non-existing vocabulary return and empty array() and avoid data hit for loading terms. This is consistent because if you filter by a non-existing term you receive and empty array, not an error, and you can expect the same for a non-existing vocabulary filtering.
2. I added a new test to ensure that we return an empty array when filter by a non-existing vocabulary.
Comment #99
xjm#98 looks good to me.
Comment #100
xjm#98: term-by-name-336697-98.patch queued for re-testing.
Comment #102
xjmRerolled with a couple minor comment cleanups.
Comment #103
xjmSorry, one other doxygen fix.
Comment #103.0
xjmUpdated tasks.
Comment #104
xjmUpdated summary.
Comment #105
xjmAnd, killing fewer kittens, since that first line of the first hunk would undoubtedly collide with the docs cleanup sprint.
Comment #105.0
xjmUpdated issue summary.
Comment #106
lambic commentedLooks good to me, except there's a todo on taxonomy_term_load_multiple() to remove $conditions, which would break this change.
Comment #107
lyricnz commentedCall me pedantic, but the doc comment that says:
The "Defaults to NULL" is irrelevant - the comment should describe what the default *behaviour* is, not the value that happens to represent that. In fact, the rest of the comment "(optional) Vocabulary machine name to limit the search." describes that sufficiently by the use of the word "optional"
Comment #108
xjmExplicitly listing the default value for optional parameters is part of our doxygen standards. On http://drupal.org/node/1354, use your browser to search for the text "Defaults".
Hmm, is there an issue for this? Can someone clarify what the status or intent of that is? Edit: the commit on the
@todoreferences #920600: Document the deprecated $conditions parameter in entity_load().So what we need to know is if switching
taxonomy_term_load_multiple()to EFQ will alter its signature, I guess? Can someone explain what we'd need to change to make this forward-compatible?Comment #108.0
xjmUpdated issue summary.
Comment #108.1
xjmUpdated issue summary.
Comment #108.2
xjmUpdated issue summary.
Comment #109
xjmAdded note to summary about the
@todo.Comment #110
lyricnz commentedI disagree with your interpretation of that page. The value "NULL" has no meaning in this case, and represents the *lack of* an explicit machine id to be used for restricting the result set. In fact the page you refer to includes:
I would suggest that this falls into "The description should clarify the default value if omitted"
Comment #111
xjmHrm, I don't quite understand #110. From my perspective the comment in the patch uses exactly the pattern in the example snippet, and it does exactly "The description should clarify the default value if omitted". That is, it documents the default supplied in the function signature:
Perhaps the doxygen standard should be clearer in indicating that the documentation for optional parameters should end in the sentence, "Defaults to foo," where foo is the default value supplied in the signature.
Is this better in your opinion?
Comment #112
lyricnz commentedHi xjm, and thanks for listening to my concern. My objection is that "NULL" has no semantic meaning - it's simply a "magic value" used to denote the non-existence of this optional parameter. I would suggest that the only comment required is:
This seems to be meeting the objective "The description should clarify the default value if omitted" by clarifying via "optional" rather than explicitly including the value.
i.e. if you don't pass the machine name, you don't care what the default value was, it might as well be "DONOTUSE YES YES FROB" :)
But, this is quibble.
Comment #113
xjmThis pattern is already used in core; see, for example, field_has_translation_handler().
Comment #114
xjmRerolled for
core/.Comment #115
xjmCross-referencing #143817: API: Get just term ID by name.
Comment #116
xjmMarked #1161248: Add vid argument to taxonomy_get_term_by_name as a duplicate of this issue.
Comment #117
oriol_e9g#114: taxonomy_vid_arg-336697-POST-APOCALYPSE.patch queued for re-testing.
Comment #118
oriol_e9gIt's an api addition, we have an issue summary, we have discused, we have tests, testbot it's happy, so... anybody thinks that this is RTBC?
Comment #119
Niklas Fiekas commentedLooks good to me. All I have are a few comment style issues:
This should fit on one line, I think.
Same here, probably.
One more word fits on the first line.
Comment #120
oriol_e9gDocs fixes from #119
Comment #121
xjmAlright, I think this is RTBC (and long since). :)
Comment #122
catchWhitespace error here:
But I fixed that on commit. Committed/pushed to 8.x, not sure webchick will want to backport this, but that's up to her :)
Comment #123
oriol_e9gOk, API addition backported.
Comment #124
oriol_e9gConsider this API addition for D7.
Summary for webchick
taxonomy_get_term_by_name()is not useful because you cannot filter by vocabulary.Comment #125
xjmLet's fix the first part here too (space after if).
Comment #126
oriol_e9gComment #128
Niklas Fiekas commented#126: taxonomy-vocabulary-336697-126.patch queued for re-testing.
That failing test case passes on my local machine, so one more try before more investigation follows.
Edit: Yay, random test failure no longer!
Comment #129
oriol_e9gMysterious temporal error that works locally many times and works in a second testbot attempt.
Comment #130
xjmTechnically you shouldn't RTBC your own backport, but it looks fine to me now. :)
Comment #131
webchickThis is a feature request, so I'll have to have a look at it after we're below thresholds.
Comment #132
webchickWe are finally below thresholds, so features are back on the table. Re-testing the latest patch, since it's been awhile.
Comment #133
webchick#126: taxonomy-vocabulary-336697-126.patch queued for re-testing.
Comment #134
webchickWhew! Thanks for all the hard work on this folks. :) Nice to get this helper in!
Committed and pushed to 7.x. Thanks! Marking as something to mention in the 7.13 release notes.
Comment #135.0
(not verified) commentedUpdated issue summary.