Updated: Comment #27

Problem/Motivation

The taxonomy autocomplete widget's input validation silently discards invalid input.
Invalid meaning according to the RFC 4180 (http://tools.ietf.org/html/rfc4180) CSV format.

Examples using the above mentioned validity criterias (see also comment #25):
"Gary, IN" --> valid (and currently (i.e. D8) accepted correctly)
"Gary --> invalid under 5 (currently fails silently)
Gary Indiana" --> invalid under 5 (although currently accepted correctly)
"Jimmy "The Boss" Smith, Mr" --> invalid under 7 - embedded quotes must be doubled (saves 2 tags currently: "Jimmy" and "Mr")
"Jimmy ""The Boss"" Smith, Mr" --> valid (currently accepted correctly)

To reproduce this bug:
1. Do a basic d8 install.
2. Create an article node (which contains a taxonomy_term field, with the autocomplete widget)
3. Enter one of the examples from above
4. Click save
5. Repeat with a new example from 3

This saves the node but not the (invalid) tag and does not show any error message.

In the core taxonomy module, method taxonomy_autocomplete_validate() lacks validation of the tag field.

Proposed resolution

This issue has two parts.
1) Determine what input format to adopt for taxonomy input values.
It is suggested in several comments to use the RFC 4180 CSV format.
2) Fix the taxonomy modules validation of invalid values according to 1)

[drudrew] suggests in comment #21 propogating the raw input to taxonomy_field_validate() so a validation error can be triggered

[droplet] suggsests in comment #24 to close this issue and create a new one.

Remaining tasks

(reviews needed, tests to be written or run, documentation to be written, etc.)
1) Make formal input specification
2) Fix validation accordingly

#1680022: Import default "Tags" field's help text (No one know they can use "quote, mark" in tags)

Original report by [hefox]

1) Go to a node type with free tagging
2) Input
"Blah blah blah
3) click save
4) Notice that the term was not saved and no error was produced

(not sure which component, just guessing)

Happening in d6, looked at d7 and looks the same.

Files: 
CommentFileSizeAuthor
#34 1329742.tag-33.patch1.99 KBdroplet
PASSED: [[SimpleTest]]: [MySQL] 63,688 pass(es).
[ View ]

Comments

d1b1’s picture

Title:drupal_explode_tags will drop tags that starts with a quote mark but don't end it» Unable to confirm the noted behavior
Status:Active» Needs work

Please provide more details on how to test. I was able to setup a basic drupal site, but could not reproduct the specific bug: 'term was not saved'. When I tested this feature, I found that double quotes were not saved, but the actual tag value was. "Foo Foo results in saved tag Foo Foo.

The bug might need to be that double quotes are not saved for free tagged taxonomy values.

Feedback?

d1b1’s picture

Title:Unable to confirm the noted behavior» Additional testing found an issue

It appears that the placement of the double quotes in the tag string determine what is saved.

Example:
My Free Tag" is here => Saved as 'Me Free Tag'
My Free Tag is " => Saved as 'My Free Tag is '
"My Free Tag is => Saved as ''

To reproduce this bug:
1. Install a basic d7 install.
2. Add terms to a taxonomy vocab.
3. Add a taxonomy term field to a node type.
4. Set the number of terms to unlimited.
5. Set the widget to AutoComplete.
6. Create a node.
7. Enter all three of the examples from above.

hefox’s picture

Title:Additional testing found an issue» drupal_explode_tags will drop tags that starts with a quote mark but don't end it
Status:Needs work» Active

Issue title is the actual issue title, not a comment title. Needs work/needs review are for patches. Postponed was likely the status you were looking for. See http://drupal.org/node/317

Third example in comment 2 is the bug, thanks for providing more thorough debug steps

hefox’s picture

Issue tags:+Novice
aaron.r.carlton’s picture

I think the issue is with the regexp in function drupal_explode_tags(), ln #6983 of common.inc. I'm not awesome at regex, but I've been staring at it a while...

Garrett Albright’s picture

To clarify #2, it appears that when a tag starts with a double quote character, the tag is not saved at all (not that an empty tag is saved).

hefox’s picture

Correct, it doesn't any existence of that tag.

Garrett Albright’s picture

The code being used to split tags strikes me as being overly-complex and therefore fault-prone.

function drupal_explode_tags($tags) {
  // This regexp allows the following types of user input:
  // this, "somecompany, llc", "and ""this"" w,o.rks", foo bar
  $regexp = '%(?:^|,\ *)("(?>[^"]*)(?>""[^"]* )*"|(?: [^",]*))%x';
  preg_match_all($regexp, $tags, $matches);
  $typed_tags = array_unique($matches[1]);

  $tags = array();
  foreach ($typed_tags as $tag) {
    // If a user has escaped a term (to demonstrate that it is a group,
    // or includes a comma or quote character), we remove the escape
    // formatting so to save the term into the database as the user intends.
    $tag = trim(str_replace('""', '"', preg_replace('/^"(.*)"$/', '\1', $tag)));
    if ($tag != "") {
      $tags[] = $tag;
    }
  }

  return $tags;
}

(Sorry for using the regular <code> tag, but the ?> in the first pattern trips up the filter if I do it PHP style.)

I'm thinking a single pattern could be used, without having to do the str_replace() and preg_replace(). However, after a good deal of trying, I couldn't quite work it out. Maybe I'll give it another try later, but for now I'm afraid I have other priorities.

emclaughlin’s picture

I was looking at this bug in hopes of patching it, but ran into an issue before I could get started. How could you make tags starting with " workable in a way that would differentiate between, say, "Gary, IN", "Gary, "Gary Indiana", "Indiana where each individual tag is:
Gary, IN
Gary
Gary Indiana
Indiana

Because commas are sometimes parts of tags, you can't just say if there's no quote before the next comma, " is part of the tag. The only fix that I can see would be adding a warning that lets the user know that they can't use " within a tag if they try to do so. Is there a way around this that I'm not seeing, or would just adding the error be the fix?

hbergin’s picture

Is it safe to say that the number of instances of " must be even, and that each pair must be separated by a comma? If so, error checking could try to ensure that pattern before any tag exploding is done. If the tag string complies with that pattern, then the rest of the explode tags code could be left as is. If not, set error message and do not save.

"Gary, IN", "Gary, "Gary Indiana", "Indiana
The error checking would fail for the above example given by emclaughlin.

BrockBoland’s picture

Version:7.x-dev» 8.x-dev

This is still an issue in D8.

hefox’s picture

Issue tags:-Novice

Removing novice tag cause it appears the regular expression is very not novice-y

BrockBoland’s picture

Issue tags:+Novice

#11 is correct: it will split on the commas, and any tag without a closing " will be dropped. So, in the example given there:

"Gary, IN", "Gary, "Gary Indiana", "Indiana

You would get just two tags:

  • Gary, IN
  • Gary,
drudrew’s picture

The real problem is that an invalid input value is silently accepted.
This issue cannot be fundamentally solved by changing the behavior of drupal_explode_tags() alone.

What I think should happen is an error on the form validation level. Any other thoughts ?

Feng-Shui’s picture

+1 for a form validation error, unless the user adheres to some level of "syntax correctness" there's no way to process their input and guarantee the desired outcome.

From what I can gather all tags must be separated with a comma regardless of weather or not they are enclosed within quotes, the closing of a quote should not indicate a new tag. If you want to use a comma, encase the entire tags in quotes.

Does that mean the following should be able to be entered?

Input: "Jimmy "The Boss" Smith, Mr"
Output: Jimmy "The Boss" Smith, Mr

Regex confuses me, my head hurts.

BrockBoland’s picture

Issue tags:-Novice

(accidentally re-added the Novice tag last night)

BrockBoland’s picture

Agreed with #15: this should trigger a validation error.

For #16: "Jimmy "The Boss" Smith, Mr" should result in an error, because it would see the first tag as Jimmy (with trailing space), not know what to do with The Boss" Smith, and the second tag would be Mr.

It does currently support double-double-quotes to include a quote in a tag, so "Jimmy ""The Boss"" Smith, Mr" would result in a single tag: "Jimmy "The Boss" Smith, Mr"

droplet’s picture

What does Drupal actually supported ?

Custom format or CSV standard ( http://en.wikipedia.org/wiki/Comma-separated_values )

EDIT:
super long history. can't find out any idea:
http://drupalcode.org/project/drupal.git/commit/2348e7de6f9714496316ee42...

drudrew’s picture

Two things need to happen to solve this issue:

First, we need to define what is valid input. It's time for a formal specification. As suggested by #19, a CSV specification would be a good candidate. With the specification, we can create or adopt a validation routine.

Second, we need to find a way to raise a validation error in either:

  • taxonomy_autocomplete_validate(), or
  • taxonomy_field_validate()

taxonomy_autocomplete_validate() is where invalid input can be detected. It parses the input using drupal_explode_tags(). Currently, only the "valid" tags are deposited to the form using form_set_value(). Unfortunately, this is not where the validation error is triggered.

taxonomy_field_validate() can trigger a validation error. The invalid parts, however, are discarded before reaching this.

My preference is to somehow make the raw input to reach taxonomy_field_validate(). Then, it can trigger a validation error.

Any thoughts ?

EDITS: The functions above can be found in DRUPAL_8_ROOT/core/modules/taxonomy/taxonomy.module

emclaughlin’s picture

Why does the autocomplete function discard input without raising an error? That seems like bad form all around.

drudrew’s picture

Status:Active» Closed (won't fix)

This is *not* an issue of drupal_explode_tags().
It is rather an issue in the core taxonomy module. It lacks of validation of the tag field.
If it were to be fixed, it should be raised as a separate issue. This issue is closed because the problem lies elsewhere.

droplet’s picture

Status:Closed (won't fix)» Active

@#21 drudrew,
whatever your input, it should split into multiple tags or single tag. Nothing taxonomy_*_validate can do or should do.

I created a new issue #1680022: Import default "Tags" field's help text (No one know they can use "quote, mark" in tags) for import help text and formal specification.

StuartJNCC’s picture

RFC 4180 (http://tools.ietf.org/html/rfc4180) defines the CSV format. Some of it is not relevant here because it refers to multiple-line files, but the following paragraphs, apart from the CRLF bits, appear to cover us:

5. Each field may or may not be enclosed in double quotes.
If fields are not enclosed with double quotes, then
double quotes may not appear inside the fields. For example:

"aaa","bbb","ccc" CRLF
zzz,yyy,xxx

6. Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:

"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx

7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:

"aaa","b""bb","ccc"

So, using examples from the above:

"Gary, IN" --> valid (and currently (i.e. D8) accepted correctly)
"Gary --> invalid under 5 (currently fails silently)
Gary Indiana" --> invalid under 5 (although currently accepted correctly)
"Jimmy "The Boss" Smith, Mr" --> invalid under 7 - embedded quotes must be doubled (saves 2 tags currently: "Jimmy" and "Mr")
"Jimmy ""The Boss"" Smith, Mr" --> valid (currently accepted correctly)

no_angel’s picture

add "needs issue summary update" tag

no_angel’s picture

Issue tags:+prague 2013

add "prague 2013" tag

flyrs’s picture

Title:drupal_explode_tags will drop tags that starts with a quote mark but don't end it» Taxonomy autocomplete widget's input validation silently discards invalid input
Component:base system» taxonomy.module
Issue tags:-Needs issue summary update
flyrs’s picture

Issue summary:View changes

[flyrs] changed issue summary to reflect recent comments and suggested solutions. This change made during prague2013 sprint.

droplet’s picture

Priority:Minor» Normal
Status:Active» Needs review
StatusFileSize
new1.15 KB
FAILED: [[SimpleTest]]: [MySQL] 63,352 pass(es), 16 fail(s), and 1 exception(s).
[ View ]

Let's use the PHP standard function.

Status:Needs review» Needs work

The last submitted patch, 29: 1329742.tag_.patch, failed testing.

droplet’s picture

StatusFileSize
new1.2 KB
FAILED: [[SimpleTest]]: [MySQL] 63,645 pass(es), 1 fail(s), and 0 exception(s).
[ View ]

Remove spaces

droplet’s picture

Status:Needs work» Needs review

Status:Needs review» Needs work

The last submitted patch, 31: 1329742.tag-31.patch, failed testing.

droplet’s picture

Issue summary:View changes
Status:Needs work» Needs review
StatusFileSize
new1.99 KB
PASSED: [[SimpleTest]]: [MySQL] 63,688 pass(es).
[ View ]

About the test cases fixes:

In D7 and D6,
"term with, a comma and / a
->
a comma and / a
(dropped "term with)

In new Patch:
"term with, a comma and / a
->
term with, a comma and / a
(CSV RFC 4180 thinks above string is single tag)

droplet’s picture

Using "CSV RFC 4180" may give unexpected result to average users:

Example:

INPUT:
aaa"aaa", "bbb"bbb

Current Result:

Array
(
    [0] => aaa
    [1] => bbb
)

PHP str_getcsv / "CSV RFC 4180" :

Array
(
    [0] => aaa"aaa"
    [1] => bbbbbb
)

But I think MOST users expected:
aaa"aaa"
"bbb"bbb

andypost’s picture

tkoleary’s picture

Has anyone tested how this would work using Select2?

The issue to replace jquery autocomplete is here: #1271622 and Select2 looks like the leading candidate as there is a patch to add it to language selection on install.

jhedstrom’s picture

Status:Needs review» Needs work
Issue tags:+Needs reroll

Patch no longer applies. Also, #1847596: Remove Taxonomy term reference field in favor of Entity reference might make fix this (unless it's also broken in entity reference autocomplete).