Problem/Motivation
Entity reference autocomplete with tagging silently discards invalid input.
The taxonomy autocomplete widget's input validation silently discards invalid input.
Invalid meaning according to the RFC 4180 (http://tools.ietf.org/html/rfc4180) CSV format.
Description | User input (including quote characters) | Actual tags created | Expected tags created | Input discarded? |
---|---|---|---|---|
Full quotes, with comma | "Gary, IN" | "Gary, IN" | "Gary, IN" | None |
Missing trailing quote | "Gary | None | ? | Yes |
Missing leading quote | Gary Indiana" | "Gary Indiana" | ? | None |
Quotes within quotes | "Jimmy "The Boss" Smith, Mr" | "Jimmy", "Mr" | ? | Yes |
Double quotes (escaping) within quotes | "Jimmy ""The Boss"" Smith, Mr" | "Jimmy "The Boss" Smith, Mr" (one tag) | "Jimmy "The Boss" Smith, Mr" (one tag) | None |
Steps to reproduce
To reproduce:
- Create a node type.
- Add entity reference field, enable tagging mode.
- Use "Autocomplete (tags style) widget.
- Create new node, see User input column in table above.
This saves the node but not the (invalid) tag and does not show any error message.
Proposed resolution
Determine what input format to adopt for input values. It is suggested in several comments to use the RFC 4180 CSV format.
Determine how to handle discarded input
Remaining tasks
Confirm the use of RFC 4180 CSV format
Why not deprecate explode?
Review
Commit
User interface changes
Error message on invalid input
API changes
Data model changes
Release notes snippet
Original report by hefox
- Go to a node type with free tagging
- Input
"Blah blah blah
(notice no trailing quote) - Save node
- Notice that the term was not saved and no error was produced
Comment | File | Size | Author |
---|---|---|---|
#69 | 1329742-69.patch | 11.35 KB | quietone |
| |||
#69 | interdiff-58-69.txt | 13.99 KB | quietone |
#58 | 1329742-autocomplete-tag-explode-58.patch | 11.12 KB | dpi |
#58 | interdiff-1329742-autocomplete-tag-explode-55-58.txt | 12.34 KB | dpi |
#55 | 1329742-uisamplex.png | 185.39 KB | dpi |
Comments
Comment #1
d1b1 CreditAttribution: d1b1 commentedPlease provide more details on how to test. I was able to setup a basic drupal site, but could not reproduct the specific bug: 'term was not saved'. When I tested this feature, I found that double quotes were not saved, but the actual tag value was. "Foo Foo results in saved tag Foo Foo.
The bug might need to be that double quotes are not saved for free tagged taxonomy values.
Feedback?
Comment #2
d1b1 CreditAttribution: d1b1 commentedIt appears that the placement of the double quotes in the tag string determine what is saved.
Example:
My Free Tag" is here => Saved as 'Me Free Tag'
My Free Tag is " => Saved as 'My Free Tag is '
"My Free Tag is => Saved as ''
To reproduce this bug:
1. Install a basic d7 install.
2. Add terms to a taxonomy vocab.
3. Add a taxonomy term field to a node type.
4. Set the number of terms to unlimited.
5. Set the widget to AutoComplete.
6. Create a node.
7. Enter all three of the examples from above.
Comment #3
hefox CreditAttribution: hefox commentedIssue title is the actual issue title, not a comment title. Needs work/needs review are for patches. Postponed was likely the status you were looking for. See http://drupal.org/node/317
Third example in comment 2 is the bug, thanks for providing more thorough debug steps
Comment #4
hefox CreditAttribution: hefox commentedComment #5
aaron.r.carlton CreditAttribution: aaron.r.carlton commentedI think the issue is with the regexp in function drupal_explode_tags(), ln #6983 of common.inc. I'm not awesome at regex, but I've been staring at it a while...
Comment #6
Garrett Albright CreditAttribution: Garrett Albright commentedTo clarify #2, it appears that when a tag starts with a double quote character, the tag is not saved at all (not that an empty tag is saved).
Comment #7
hefox CreditAttribution: hefox commentedCorrect, it doesn't any existence of that tag.
Comment #8
Garrett Albright CreditAttribution: Garrett Albright commentedThe code being used to split tags strikes me as being overly-complex and therefore fault-prone.
(Sorry for using the regular <code> tag, but the ?> in the first pattern trips up the filter if I do it PHP style.)
I'm thinking a single pattern could be used, without having to do the str_replace() and preg_replace(). However, after a good deal of trying, I couldn't quite work it out. Maybe I'll give it another try later, but for now I'm afraid I have other priorities.
Comment #9
emclaughlin CreditAttribution: emclaughlin commentedI was looking at this bug in hopes of patching it, but ran into an issue before I could get started. How could you make tags starting with " workable in a way that would differentiate between, say, "Gary, IN", "Gary, "Gary Indiana", "Indiana where each individual tag is:
Gary, IN
Gary
Gary Indiana
Indiana
Because commas are sometimes parts of tags, you can't just say if there's no quote before the next comma, " is part of the tag. The only fix that I can see would be adding a warning that lets the user know that they can't use " within a tag if they try to do so. Is there a way around this that I'm not seeing, or would just adding the error be the fix?
Comment #11
hbergin CreditAttribution: hbergin commentedIs it safe to say that the number of instances of " must be even, and that each pair must be separated by a comma? If so, error checking could try to ensure that pattern before any tag exploding is done. If the tag string complies with that pattern, then the rest of the explode tags code could be left as is. If not, set error message and do not save.
"Gary, IN", "Gary, "Gary Indiana", "Indiana
The error checking would fail for the above example given by emclaughlin.
Comment #12
BrockBoland CreditAttribution: BrockBoland commentedThis is still an issue in D8.
Comment #13
hefox CreditAttribution: hefox commentedRemoving novice tag cause it appears the regular expression is very not novice-y
Comment #14
BrockBoland CreditAttribution: BrockBoland commented#11 is correct: it will split on the commas, and any tag without a closing " will be dropped. So, in the example given there:
"Gary, IN", "Gary, "Gary Indiana", "Indiana
You would get just two tags:
Comment #15
drudrew CreditAttribution: drudrew commentedThe real problem is that an invalid input value is silently accepted.
This issue cannot be fundamentally solved by changing the behavior of drupal_explode_tags() alone.
What I think should happen is an error on the form validation level. Any other thoughts ?
Comment #16
Feng-Shui CreditAttribution: Feng-Shui commented+1 for a form validation error, unless the user adheres to some level of "syntax correctness" there's no way to process their input and guarantee the desired outcome.
From what I can gather all tags must be separated with a comma regardless of weather or not they are enclosed within quotes, the closing of a quote should not indicate a new tag. If you want to use a comma, encase the entire tags in quotes.
Does that mean the following should be able to be entered?
Input: "Jimmy "The Boss" Smith, Mr"
Output: Jimmy "The Boss" Smith, Mr
Regex confuses me, my head hurts.
Comment #17
BrockBoland CreditAttribution: BrockBoland commented(accidentally re-added the Novice tag last night)
Comment #18
BrockBoland CreditAttribution: BrockBoland commentedAgreed with #15: this should trigger a validation error.
For #16: "Jimmy "The Boss" Smith, Mr" should result in an error, because it would see the first tag as Jimmy (with trailing space), not know what to do with The Boss" Smith, and the second tag would be Mr.
It does currently support double-double-quotes to include a quote in a tag, so "Jimmy ""The Boss"" Smith, Mr" would result in a single tag: "Jimmy "The Boss" Smith, Mr"
Comment #19
droplet CreditAttribution: droplet commentedWhat does Drupal actually supported ?
Custom format or CSV standard ( http://en.wikipedia.org/wiki/Comma-separated_values )
EDIT:
super long history. can't find out any idea:
http://drupalcode.org/project/drupal.git/commit/2348e7de6f9714496316ee42...
Comment #21
drudrew CreditAttribution: drudrew commentedTwo things need to happen to solve this issue:
First, we need to define what is valid input. It's time for a formal specification. As suggested by #19, a CSV specification would be a good candidate. With the specification, we can create or adopt a validation routine.
Second, we need to find a way to raise a validation error in either:
taxonomy_autocomplete_validate() is where invalid input can be detected. It parses the input using drupal_explode_tags(). Currently, only the "valid" tags are deposited to the form using form_set_value(). Unfortunately, this is not where the validation error is triggered.
taxonomy_field_validate() can trigger a validation error. The invalid parts, however, are discarded before reaching this.
My preference is to somehow make the raw input to reach taxonomy_field_validate(). Then, it can trigger a validation error.
Any thoughts ?
EDITS: The functions above can be found in DRUPAL_8_ROOT/core/modules/taxonomy/taxonomy.module
Comment #22
emclaughlin CreditAttribution: emclaughlin commentedWhy does the autocomplete function discard input without raising an error? That seems like bad form all around.
Comment #23
drudrew CreditAttribution: drudrew commentedThis is *not* an issue of drupal_explode_tags().
It is rather an issue in the core taxonomy module. It lacks of validation of the tag field.
If it were to be fixed, it should be raised as a separate issue. This issue is closed because the problem lies elsewhere.
Comment #24
droplet CreditAttribution: droplet commented@#21 drudrew,
whatever your input, it should split into multiple tags or single tag. Nothing taxonomy_*_validate can do or should do.
I created a new issue #1680022: Import default "Tags" field's help text (No one know they can use "quote, mark" in tags) for import help text and formal specification.
Comment #25
StuartJNCC CreditAttribution: StuartJNCC commentedRFC 4180 (http://tools.ietf.org/html/rfc4180) defines the CSV format. Some of it is not relevant here because it refers to multiple-line files, but the following paragraphs, apart from the CRLF bits, appear to cover us:
So, using examples from the above:
"Gary, IN"
--> valid (and currently (i.e. D8) accepted correctly)"Gary
--> invalid under 5 (currently fails silently)Gary Indiana"
--> invalid under 5 (although currently accepted correctly)"Jimmy "The Boss" Smith, Mr"
--> invalid under 7 - embedded quotes must be doubled (saves 2 tags currently: "Jimmy" and "Mr")"Jimmy ""The Boss"" Smith, Mr"
--> valid (currently accepted correctly)Comment #26
no_angel CreditAttribution: no_angel commentedadd "needs issue summary update" tag
Comment #27
no_angel CreditAttribution: no_angel commentedadd "prague 2013" tag
Comment #28
flyrs CreditAttribution: flyrs commentedComment #28.0
flyrs CreditAttribution: flyrs commented[flyrs] changed issue summary to reflect recent comments and suggested solutions. This change made during prague2013 sprint.
Comment #29
droplet CreditAttribution: droplet commentedLet's use the PHP standard function.
Comment #31
droplet CreditAttribution: droplet commentedRemove spaces
Comment #32
droplet CreditAttribution: droplet commentedComment #34
droplet CreditAttribution: droplet commentedAbout the test cases fixes:
In D7 and D6,
"term with, a comma and / a
->
a comma and / a
(dropped
"term with
)In new Patch:
"term with, a comma and / a
->
term with, a comma and / a
(CSV RFC 4180 thinks above string is single tag)
Comment #35
droplet CreditAttribution: droplet commentedUsing "CSV RFC 4180" may give unexpected result to average users:
Example:
INPUT:
aaa"aaa", "bbb"bbb
Current Result:
PHP str_getcsv / "CSV RFC 4180" :
But I think MOST users expected:
aaa"aaa"
"bbb"bbb
Comment #36
andypostComment #37
tkoleary CreditAttribution: tkoleary commentedHas anyone tested how this would work using Select2?
The issue to replace jquery autocomplete is here: #1271622 and Select2 looks like the leading candidate as there is a patch to add it to language selection on install.
Comment #38
jhedstromPatch no longer applies. Also, #1847596: Remove Taxonomy term reference field in favor of Entity reference might make fix this (unless it's also broken in entity reference autocomplete).
Comment #39
jibran#1847596: Remove Taxonomy term reference field in favor of Entity reference is in now can someone please reproduce it? If it's still a bug move back to NW else move it to D7.
Comment #40
droplet CreditAttribution: droplet commented#1847596: Remove Taxonomy term reference field in favor of Entity reference is a code refactoring that isn't address the problems.
Comment #41
jibranAre you able to reproduce it?
Anyways moving it to ER queue because auto complete is belongs to ER now.
Comment #42
jibranComment #43
droplet CreditAttribution: droplet commentedYes, it can.
The testcase doesn't cover the edge cases listed in Issue Summary
(http://cgit.drupalcode.org/drupal/tree/core/modules/system/src/Tests/Ent...)
Comment #44
badrange CreditAttribution: badrange at Wunder commentedComment #50
dpiCleaned up summary.
Comment #51
dpiComment #52
dpiComment #53
dpiEarly times
Comment #55
dpiSome implementation notes:
If a tag starts with quote, then its always going to be quoted. A tag can never start with an escaped quote, unless it is in quotes itself. Understanding this will help with reviewing. For example, if you use the tags
"Hello
, the imploder will create tags['"""Hello"']
. The first and fourth quotes are to denote the tag is quoted. The second and third quotes are escaped.Messages are passed to the caller via an array reference, as this results in no API breaks. Exceptions would break compatibility, and they are not translatable. Changing return value is preferred, but would break API.
Drupal Utilities cannot use translatable strings (
t()
orTranslateableMarkup
) because theUtility/
directory is a separate project without a dependency to core. (See fail in [#12496886-53]). So error messages are arrays containing 'message' and 'arguments' compatible witht()
or simple search and replace.The strings with discarded input outlined in summary have equivalent unit tests in
TagsTest
:[]
[]
['Hello']
With the new implementation, the following inputs have different results, for better or worse:
['hello']
[]
['Hello', 'baz']
['Hello ']
['Hello', 'baz']
[]
['baz']
[]
['Hello Foo bar World', 'baz']
['Hello Foo bar World', 'baz"']
['Hello']
['Hello "Foo bar" World']
['Hello', 'baz']
['Hello "Foo bar" World', 'baz']
['"', 'hello']
[]
['hello']
['hello "world"']
['hello']
[]
Attached sample of how error appears
Comment #56
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedWow, this is an impressive patch. Here is my review.
Since this is in the Utility namespace, I do think BC is a concern here. If this was purely some internal implementation detail of the tags widget, I'd say we can change it, but since others may rely on the buggy or broken (by autocomplete standards) behavior, we should move this off into a new method and deprecate the old. Perhaps
explodeWithErrors
,csvExplode
,explodeAdvancedWithErrors
etc.I wonder if we should build some constants up that are named appropriately, only for readability sake. I'd much prefer to read
preg_match(WHITESPACE_THEN_QUOTE,...)
.Local vars are snake case. Strict compares also seem to be convention where possible.
Named capture groups will make this much easier to read and maintain.
The comment states "first single quote", but it seems like we end up finding the "end_quote_position". One or the other is unclear.
Maybe instead of strings we can deal with @api class constants, so the error could be static::EXPLODE_NO_END_QUOTE, then the consumer of the API can decide the best message. Also resolve our translation woes.
Snake case again.
Some comment describing "this is the code path where a tag is not in a quoted string, therefore quotes inside the tag are invalid" might be useful. You sort of lose the context of what is going on by the time you've reached this part of the code. Just a style/suggestion more than anything else.
Sample might be the wrong word here. Maybe $tag_error_range_start/end or something.
Hm, I do get why we're doing this. I wonder if there is any precedent to pass in the translator into Drupal unaware components. FWIW, we make a similar exception for constraints/constraint validator @see \Drupal\Core\TypedData\Validation\ExecutionContext::addViolation.
We should assert the whole error message in each test case.
I think it's convention to refer to other methods as ::testExplode in comments.
Probs not really required on a data provider.
Wow, nice test cases.
Purely style, but I think it's slightly more readable to go:
Comment #57
borisson_Needs work based on #56
Comment #58
dpiThanks for the review Sam!
This issue makes API back compat tricky thanks to error messages.
Agreed, old utility should be deprecated since we shouldn't guarantee every input will form tags. Ive put back the old explode utility and renamed the new.
Ive changed these random camelcase occurrences as the file was already using snake.
Great suggestion! 💯
As we discussed externally. I love the idea of using constants. Though it makes it difficult for consumers, as they will have to map constants to translation strings and their appropriate argument/contexts.
Swapped out hasError with messages themselves, if any.
Core is very inconsistent about this.
Sample would be correct since its giving 10 characters, not the whole string. Though I think I can pass the whole tag and let the consumer deal with an excessively long string. Ive changed this aspect.
Some keys get quote long, So ive kept things consistent by being on different lines. This style is justified moreso since adding error message asserts above.
Comment #59
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedNice! Do you think it makes sense to deprecate the other method in this issue or somewhere else?
What's cool is, all the test cases we have for "explode" also pass for "safeExplode":
While, we don't have any negative test cases that assert what
explode
doesn't deal with, it's a good endorsement that if you were usingexplode
with the intention of liberally accepting input,safeExplode
is going to be right up your alley too.What about explodeWithErrors or something? Is this really more "safe" and was it "dangerous" before? Naming things is hard :-/
If the comment is for the whole else block, probably best to swap these two lines?
Comment #69
quietone CreditAttribution: quietone at PreviousNext commentedThis was a bugsmash triage target earlier in the week. This needs an evaluation of the use of the RFC as the standard to follow. It is not clear to me why a new method is being added instead of deprecating the existing one. I have added both of those to the issue summary.
I decided to update the patch to 10.1.x.
Since this has not been worked on in 5 years I am changing this to 'un-assigned'.
Comment #70
smustgrave CreditAttribution: smustgrave at Mobomo commentedCan confirm the issue on Drupal 10.1
Using a standard install I used the Articles tags field using the values in the issue summar
Patch #69 does solve the proble.
safeExplode
1. Should be typehinted
// Find end single quote.
2. Think this could be written better.
// End of string. Finish.
3. Same
4. empty space
5. Can we get a test only patch to see they fail.
Thanks!
Comment #71
smustgrave CreditAttribution: smustgrave at Mobomo commentedAlso IS has a remaining task
Why not deprecate explode?
Which should be answered.