Problem/Motivation

Entity reference autocomplete with tagging silently discards invalid input.

The taxonomy autocomplete widget's input validation silently discards invalid input.
Invalid meaning according to the RFC 4180 (http://tools.ietf.org/html/rfc4180) CSV format.

Description User input (including quote characters) Actual tags created Expected tags created Input discarded?
Full quotes, with comma "Gary, IN" "Gary, IN" "Gary, IN" None
Missing trailing quote "Gary None ? Yes
Missing leading quote Gary Indiana" "Gary Indiana" ? None
Quotes within quotes "Jimmy "The Boss" Smith, Mr" "Jimmy", "Mr" ? Yes
Double quotes (escaping) within quotes "Jimmy ""The Boss"" Smith, Mr" "Jimmy "The Boss" Smith, Mr" (one tag) "Jimmy "The Boss" Smith, Mr" (one tag) None

Steps to reproduce

To reproduce:

  1. Create a node type.
  2. Add entity reference field, enable tagging mode.
  3. Use "Autocomplete (tags style) widget.
  4. Create new node, see User input column in table above.

This saves the node but not the (invalid) tag and does not show any error message.

Proposed resolution

Determine what input format to adopt for input values. It is suggested in several comments to use the RFC 4180 CSV format.
Determine how to handle discarded input

Remaining tasks

Confirm the use of RFC 4180 CSV format
Why not deprecate explode?

Review
Commit

User interface changes

Error message on invalid input

API changes

Data model changes

Release notes snippet

Original report by hefox

  1. Go to a node type with free tagging
  2. Input "Blah blah blah (notice no trailing quote)
  3. Save node
  4. Notice that the term was not saved and no error was produced
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

d1b1’s picture

Title: drupal_explode_tags will drop tags that starts with a quote mark but don't end it » Unable to confirm the noted behavior
Status: Active » Needs work

Please provide more details on how to test. I was able to setup a basic drupal site, but could not reproduct the specific bug: 'term was not saved'. When I tested this feature, I found that double quotes were not saved, but the actual tag value was. "Foo Foo results in saved tag Foo Foo.

The bug might need to be that double quotes are not saved for free tagged taxonomy values.

Feedback?

d1b1’s picture

Title: Unable to confirm the noted behavior » Additional testing found an issue

It appears that the placement of the double quotes in the tag string determine what is saved.

Example:
My Free Tag" is here => Saved as 'Me Free Tag'
My Free Tag is " => Saved as 'My Free Tag is '
"My Free Tag is => Saved as ''

To reproduce this bug:
1. Install a basic d7 install.
2. Add terms to a taxonomy vocab.
3. Add a taxonomy term field to a node type.
4. Set the number of terms to unlimited.
5. Set the widget to AutoComplete.
6. Create a node.
7. Enter all three of the examples from above.

hefox’s picture

Title: Additional testing found an issue » drupal_explode_tags will drop tags that starts with a quote mark but don't end it
Status: Needs work » Active

Issue title is the actual issue title, not a comment title. Needs work/needs review are for patches. Postponed was likely the status you were looking for. See http://drupal.org/node/317

Third example in comment 2 is the bug, thanks for providing more thorough debug steps

hefox’s picture

Issue tags: +Novice
aaron.r.carlton’s picture

I think the issue is with the regexp in function drupal_explode_tags(), ln #6983 of common.inc. I'm not awesome at regex, but I've been staring at it a while...

Garrett Albright’s picture

To clarify #2, it appears that when a tag starts with a double quote character, the tag is not saved at all (not that an empty tag is saved).

hefox’s picture

Correct, it doesn't any existence of that tag.

Garrett Albright’s picture

The code being used to split tags strikes me as being overly-complex and therefore fault-prone.

function drupal_explode_tags($tags) {
  // This regexp allows the following types of user input:
  // this, "somecompany, llc", "and ""this"" w,o.rks", foo bar
  $regexp = '%(?:^|,\ *)("(?>[^"]*)(?>""[^"]* )*"|(?: [^",]*))%x';
  preg_match_all($regexp, $tags, $matches);
  $typed_tags = array_unique($matches[1]);

  $tags = array();
  foreach ($typed_tags as $tag) {
    // If a user has escaped a term (to demonstrate that it is a group,
    // or includes a comma or quote character), we remove the escape
    // formatting so to save the term into the database as the user intends.
    $tag = trim(str_replace('""', '"', preg_replace('/^"(.*)"$/', '\1', $tag)));
    if ($tag != "") {
      $tags[] = $tag;
    }
  }

  return $tags;
}

(Sorry for using the regular <code> tag, but the ?> in the first pattern trips up the filter if I do it PHP style.)

I'm thinking a single pattern could be used, without having to do the str_replace() and preg_replace(). However, after a good deal of trying, I couldn't quite work it out. Maybe I'll give it another try later, but for now I'm afraid I have other priorities.

emclaughlin’s picture

I was looking at this bug in hopes of patching it, but ran into an issue before I could get started. How could you make tags starting with " workable in a way that would differentiate between, say, "Gary, IN", "Gary, "Gary Indiana", "Indiana where each individual tag is:
Gary, IN
Gary
Gary Indiana
Indiana

Because commas are sometimes parts of tags, you can't just say if there's no quote before the next comma, " is part of the tag. The only fix that I can see would be adding a warning that lets the user know that they can't use " within a tag if they try to do so. Is there a way around this that I'm not seeing, or would just adding the error be the fix?

hbergin’s picture

Is it safe to say that the number of instances of " must be even, and that each pair must be separated by a comma? If so, error checking could try to ensure that pattern before any tag exploding is done. If the tag string complies with that pattern, then the rest of the explode tags code could be left as is. If not, set error message and do not save.

"Gary, IN", "Gary, "Gary Indiana", "Indiana
The error checking would fail for the above example given by emclaughlin.

BrockBoland’s picture

Version: 7.x-dev » 8.x-dev

This is still an issue in D8.

hefox’s picture

Issue tags: -Novice

Removing novice tag cause it appears the regular expression is very not novice-y

BrockBoland’s picture

Issue tags: +Novice

#11 is correct: it will split on the commas, and any tag without a closing " will be dropped. So, in the example given there:

"Gary, IN", "Gary, "Gary Indiana", "Indiana

You would get just two tags:

  • Gary, IN
  • Gary,
drudrew’s picture

The real problem is that an invalid input value is silently accepted.
This issue cannot be fundamentally solved by changing the behavior of drupal_explode_tags() alone.

What I think should happen is an error on the form validation level. Any other thoughts ?

Feng-Shui’s picture

+1 for a form validation error, unless the user adheres to some level of "syntax correctness" there's no way to process their input and guarantee the desired outcome.

From what I can gather all tags must be separated with a comma regardless of weather or not they are enclosed within quotes, the closing of a quote should not indicate a new tag. If you want to use a comma, encase the entire tags in quotes.

Does that mean the following should be able to be entered?

Input: "Jimmy "The Boss" Smith, Mr"
Output: Jimmy "The Boss" Smith, Mr

Regex confuses me, my head hurts.

BrockBoland’s picture

Issue tags: -Novice

(accidentally re-added the Novice tag last night)

BrockBoland’s picture

Agreed with #15: this should trigger a validation error.

For #16: "Jimmy "The Boss" Smith, Mr" should result in an error, because it would see the first tag as Jimmy (with trailing space), not know what to do with The Boss" Smith, and the second tag would be Mr.

It does currently support double-double-quotes to include a quote in a tag, so "Jimmy ""The Boss"" Smith, Mr" would result in a single tag: "Jimmy "The Boss" Smith, Mr"

droplet’s picture

What does Drupal actually supported ?

Custom format or CSV standard ( http://en.wikipedia.org/wiki/Comma-separated_values )

EDIT:
super long history. can't find out any idea:
http://drupalcode.org/project/drupal.git/commit/2348e7de6f9714496316ee42...

drudrew’s picture

Two things need to happen to solve this issue:

First, we need to define what is valid input. It's time for a formal specification. As suggested by #19, a CSV specification would be a good candidate. With the specification, we can create or adopt a validation routine.

Second, we need to find a way to raise a validation error in either:

  • taxonomy_autocomplete_validate(), or
  • taxonomy_field_validate()

taxonomy_autocomplete_validate() is where invalid input can be detected. It parses the input using drupal_explode_tags(). Currently, only the "valid" tags are deposited to the form using form_set_value(). Unfortunately, this is not where the validation error is triggered.

taxonomy_field_validate() can trigger a validation error. The invalid parts, however, are discarded before reaching this.

My preference is to somehow make the raw input to reach taxonomy_field_validate(). Then, it can trigger a validation error.

Any thoughts ?

EDITS: The functions above can be found in DRUPAL_8_ROOT/core/modules/taxonomy/taxonomy.module

emclaughlin’s picture

Why does the autocomplete function discard input without raising an error? That seems like bad form all around.

drudrew’s picture

Status: Active » Closed (won't fix)

This is *not* an issue of drupal_explode_tags().
It is rather an issue in the core taxonomy module. It lacks of validation of the tag field.
If it were to be fixed, it should be raised as a separate issue. This issue is closed because the problem lies elsewhere.

droplet’s picture

Status: Closed (won't fix) » Active

@#21 drudrew,
whatever your input, it should split into multiple tags or single tag. Nothing taxonomy_*_validate can do or should do.

I created a new issue #1680022: Import default "Tags" field's help text (No one know they can use "quote, mark" in tags) for import help text and formal specification.

StuartJNCC’s picture

RFC 4180 (http://tools.ietf.org/html/rfc4180) defines the CSV format. Some of it is not relevant here because it refers to multiple-line files, but the following paragraphs, apart from the CRLF bits, appear to cover us:

5. Each field may or may not be enclosed in double quotes.
If fields are not enclosed with double quotes, then
double quotes may not appear inside the fields. For example:

"aaa","bbb","ccc" CRLF
zzz,yyy,xxx

6. Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes. For example:

"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx

7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:

"aaa","b""bb","ccc"

So, using examples from the above:

"Gary, IN" --> valid (and currently (i.e. D8) accepted correctly)
"Gary --> invalid under 5 (currently fails silently)
Gary Indiana" --> invalid under 5 (although currently accepted correctly)
"Jimmy "The Boss" Smith, Mr" --> invalid under 7 - embedded quotes must be doubled (saves 2 tags currently: "Jimmy" and "Mr")
"Jimmy ""The Boss"" Smith, Mr" --> valid (currently accepted correctly)

no_angel’s picture

add "needs issue summary update" tag

no_angel’s picture

Issue tags: +prague 2013

add "prague 2013" tag

flyrs’s picture

Title: drupal_explode_tags will drop tags that starts with a quote mark but don't end it » Taxonomy autocomplete widget's input validation silently discards invalid input
Component: base system » taxonomy.module
Issue tags: -Needs issue summary update
flyrs’s picture

Issue summary: View changes

[flyrs] changed issue summary to reflect recent comments and suggested solutions. This change made during prague2013 sprint.

droplet’s picture

Priority: Minor » Normal
Status: Active » Needs review
FileSize
1.15 KB

Let's use the PHP standard function.

Status: Needs review » Needs work

The last submitted patch, 29: 1329742.tag_.patch, failed testing.

droplet’s picture

FileSize
1.2 KB

Remove spaces

droplet’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 31: 1329742.tag-31.patch, failed testing.

droplet’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
1.99 KB

About the test cases fixes:

In D7 and D6,
"term with, a comma and / a
->
a comma and / a
(dropped "term with)

In new Patch:
"term with, a comma and / a
->
term with, a comma and / a
(CSV RFC 4180 thinks above string is single tag)

droplet’s picture

Using "CSV RFC 4180" may give unexpected result to average users:

Example:

INPUT:
aaa"aaa", "bbb"bbb

Current Result:

Array
(
    [0] => aaa
    [1] => bbb
)

PHP str_getcsv / "CSV RFC 4180" :

Array
(
    [0] => aaa"aaa"
    [1] => bbbbbb
)

But I think MOST users expected:
aaa"aaa"
"bbb"bbb

andypost’s picture

tkoleary’s picture

Has anyone tested how this would work using Select2?

The issue to replace jquery autocomplete is here: #1271622 and Select2 looks like the leading candidate as there is a patch to add it to language selection on install.

jhedstrom’s picture

Status: Needs review » Needs work
Issue tags: +Needs reroll

Patch no longer applies. Also, #1847596: Remove Taxonomy term reference field in favor of Entity reference might make fix this (unless it's also broken in entity reference autocomplete).

jibran’s picture

Status: Needs work » Postponed (maintainer needs more info)

#1847596: Remove Taxonomy term reference field in favor of Entity reference is in now can someone please reproduce it? If it's still a bug move back to NW else move it to D7.

droplet’s picture

Priority: Normal » Major
Status: Postponed (maintainer needs more info) » Needs work

#1847596: Remove Taxonomy term reference field in favor of Entity reference is a code refactoring that isn't address the problems.

jibran’s picture

Title: Taxonomy autocomplete widget's input validation silently discards invalid input » Autocomplete widget's input validation silently discards invalid input
Component: taxonomy.module » entity_reference.module
Issue tags: -Needs reroll

Are you able to reproduce it?
Anyways moving it to ER queue because auto complete is belongs to ER now.

jibran’s picture

Title: Autocomplete widget's input validation silently discards invalid input » Autocomplete silently discards invalid input
droplet’s picture

Yes, it can.

The testcase doesn't cover the edge cases listed in Issue Summary
(http://cgit.drupalcode.org/drupal/tree/core/modules/system/src/Tests/Ent...)

badrange’s picture

Issue summary: View changes
Issue tags: +Barcelona2015

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

dpi’s picture

Title: Autocomplete silently discards invalid input » Autocomplete with tagging silently discards invalid input
Component: entity_reference.module » field system
Issue summary: View changes
Issue tags: +entity reference field

Cleaned up summary.

dpi’s picture

Issue summary: View changes
dpi’s picture

Version: 8.5.x-dev » 8.6.x-dev
Assigned: Unassigned » dpi
Status: Needs work » Active
dpi’s picture

Status: Active » Needs review
FileSize
10.68 KB

Early times

Status: Needs review » Needs work

The last submitted patch, 53: 1329742-autocomplete-tag-explode.patch, failed testing. View results

dpi’s picture

Status: Needs work » Needs review
FileSize
11.13 KB
185.39 KB

Some implementation notes:

If a tag starts with quote, then its always going to be quoted. A tag can never start with an escaped quote, unless it is in quotes itself. Understanding this will help with reviewing. For example, if you use the tags "Hello, the imploder will create tags ['"""Hello"']. The first and fourth quotes are to denote the tag is quoted. The second and third quotes are escaped.

Messages are passed to the caller via an array reference, as this results in no API breaks. Exceptions would break compatibility, and they are not translatable. Changing return value is preferred, but would break API.

Drupal Utilities cannot use translatable strings (t() or TranslateableMarkup) because the Utility/ directory is a separate project without a dependency to core. (See fail in [#12496886-53]). So error messages are arrays containing 'message' and 'arguments' compatible with t() or simple search and replace.

The strings with discarded input outlined in summary have equivalent unit tests in TagsTest:

Issue summary input string Unit test key Unit test input string Returns tags Errors created
"Gary quoted, missing trailing quote "Hello [] 'No ending quote character found.'
Gary Indiana" unquoted, unexpected quote Hello" [] Unexpected quote character found after "Hello"
"Jimmy "The Boss" Smith, Mr" quoted, unexpected quote "Hello "Foo bar" World, baz" ['Hello'] Unexpected text after "Hello ". Expected comma or end of text. Found "Foo bar"

With the new implementation, the following inputs have different results, for better or worse:

String Old New Errors?
Hello" ['hello'] [] 'Unexpected quote character found'
"Hello "Foo bar" World, baz" ['Hello', 'baz'] ['Hello '] 'Unexpected text after "@tag". Expected comma or end of text. Found @unexpected.'
Hello "Foo bar" World, baz ['Hello', 'baz'] [] 'Unexpected quote character found'
""Hello "Foo bar" World, baz" ['baz'] [] 'Unexpected text after "@tag". Expected comma or end of text. Found @unexpected.'
Hello Foo bar World, baz"" ['Hello Foo bar World', 'baz'] ['Hello Foo bar World', 'baz"'] None
Hello ""Foo bar"" World, ""baz"" ['Hello'] ['Hello "Foo bar" World'] 'Unexpected text after "@tag". Expected comma or end of text. Found @unexpected.'
Hello ""Foo bar"" World, baz ['Hello', 'baz'] ['Hello "Foo bar" World', 'baz'] None
""""",hello ['"', 'hello'] [] 'No ending quote character found.'
hello ""world"" ['hello'] ['hello "world"'] None
hello ""world" ['hello'] [] 'Unexpected quote character found'

Attached sample of how error appears

sample

Sam152’s picture

Wow, this is an impressive patch. Here is my review.

  1. --- a/core/lib/Drupal/Component/Utility/Tags.php
    +++ b/core/lib/Drupal/Component/Utility/Tags.php
    
    @@ -12,30 +12,108 @@ class Tags {
    +  public static function explode($string, &$errors = NULL) {
    

    Since this is in the Utility namespace, I do think BC is a concern here. If this was purely some internal implementation detail of the tags widget, I'd say we can change it, but since others may rely on the buggy or broken (by autocomplete standards) behavior, we should move this off into a new method and deprecate the old. Perhaps explodeWithErrors, csvExplode, explodeAdvancedWithErrors etc.

  2. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -12,30 +12,108 @@ class Tags {
    +      preg_match('/^\s*(")/', $string, $matches, PREG_OFFSET_CAPTURE);
    

    I wonder if we should build some constants up that are named appropriately, only for readability sake. I'd much prefer to read preg_match(WHITESPACE_THEN_QUOTE,...).

  3. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -12,30 +12,108 @@ class Tags {
    +      $inQuotes = count($matches) != 0;
    

    Local vars are snake case. Strict compares also seem to be convention where possible.

  4. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -12,30 +12,108 @@ class Tags {
    +        $first_quote_position = $matches[1][1];
    +        $string = substr($string, $first_quote_position + 1);
    

    Named capture groups will make this much easier to read and maintain.

  5. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -12,30 +12,108 @@ class Tags {
    +        // Find first single quote.
    +        preg_match('/(?:^|[^"])(?:"")*(")(?:[^"]|$)/', $string, $matches, PREG_OFFSET_CAPTURE);
    

    The comment states "first single quote", but it seems like we end up finding the "end_quote_position". One or the other is unclear.

  6. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -12,30 +12,108 @@ class Tags {
    +            'message' => 'No ending quote character found.',
    

    Maybe instead of strings we can deal with @api class constants, so the error could be static::EXPLODE_NO_END_QUOTE, then the consumer of the API can decide the best message. Also resolve our translation woes.

  7. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -12,30 +12,108 @@ class Tags {
    +        $nextChar = isset($matches[1][0]) ? $matches[1][0] : NULL;
    

    Snake case again.

  8. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -12,30 +12,108 @@ class Tags {
    +      else {
    +        $end_position = strpos($string, ',');
    +        $end_position = $end_position !== FALSE ? $end_position : strlen($string);
    +        $tag = substr($string, 0, $end_position);
    +        $string = substr($string, $end_position + 1);
    

    Some comment describing "this is the code path where a tag is not in a quoted string, therefore quotes inside the tag are invalid" might be useful. You sort of lose the context of what is going on by the time you've reached this part of the code. Just a style/suggestion more than anything else.

  9. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -12,30 +12,108 @@ class Tags {
    +          $sample_end = $matches[1][1];
    +          $sample_start = max($sample_end - 10, 0);
    

    Sample might be the wrong word here. Maybe $tag_error_range_start/end or something.

  10. +++ b/core/lib/Drupal/Core/Entity/Element/EntityAutocomplete.php
    @@ -166,7 +166,11 @@ class EntityAutocomplete extends Textfield {
    +          $form_state->setError($element, t($error['message'], $error['arguments']));
    

    Hm, I do get why we're doing this. I wonder if there is any precedent to pass in the translator into Drupal unaware components. FWIW, we make a similar exception for constraints/constraint validator @see \Drupal\Core\TypedData\Validation\ExecutionContext::addViolation.

  11. +++ b/core/tests/Drupal/Tests/Core/Common/TagsTest.php
    @@ -56,4 +56,233 @@ class TagsTest extends UnitTestCase {
    +   * @param bool $hasError
    +   *   Whether errors are logged.
    

    We should assert the whole error message in each test case.

  12. +++ b/core/tests/Drupal/Tests/Core/Common/TagsTest.php
    @@ -56,4 +56,233 @@ class TagsTest extends UnitTestCase {
    +   * Provides test data for testExplode().
    

    I think it's convention to refer to other methods as ::testExplode in comments.

  13. +++ b/core/tests/Drupal/Tests/Core/Common/TagsTest.php
    @@ -56,4 +56,233 @@ class TagsTest extends UnitTestCase {
    +   * @return array
    

    Probs not really required on a data provider.

  14. +++ b/core/tests/Drupal/Tests/Core/Common/TagsTest.php
    @@ -56,4 +56,233 @@ class TagsTest extends UnitTestCase {
    +  public function providerTestExplode() {
    

    Wow, nice test cases.

  15. +++ b/core/tests/Drupal/Tests/Core/Common/TagsTest.php
    @@ -56,4 +56,233 @@ class TagsTest extends UnitTestCase {
    +    $tests['unquoted'] = [
    +      'hello',
    +      ['hello'],
    +      FALSE,
    +    ];
    

    Purely style, but I think it's slightly more readable to go:

    return [
      'unquoted' => [],
      ...
    ];
    
borisson_’s picture

Status: Needs review » Needs work

Needs work based on #56

dpi’s picture

Thanks for the review Sam!

Since this is in the Utility namespace, I do think BC is a concern here. If this was purely some internal implementation detail of the tags widget, I'd say we can change it, but since others may rely on the buggy or broken (by autocomplete standards) behavior, we should move this off into a new method and deprecate the old. Perhaps explodeWithErrors, csvExplode, explodeAdvancedWithErrors etc.

This issue makes API back compat tricky thanks to error messages.

Agreed, old utility should be deprecated since we shouldn't guarantee every input will form tags. Ive put back the old explode utility and renamed the new.

Local vars are snake case. Strict compares also seem to be convention where possible.

Ive changed these random camelcase occurrences as the file was already using snake.

Named capture groups will make this much easier to read and maintain.

Great suggestion! 💯

Maybe instead of strings we can deal with @api class constants

As we discussed externally. I love the idea of using constants. Though it makes it difficult for consumers, as they will have to map constants to translation strings and their appropriate argument/contexts.

We should assert the whole error message in each test case.

Swapped out hasError with messages themselves, if any.

@return array [....] not really required on a data provider.

Core is very inconsistent about this.

Sample might be the wrong word here.

Sample would be correct since its giving 10 characters, not the whole string. Though I think I can pass the whole tag and let the consumer deal with an excessively long string. Ive changed this aspect.

Purely style, but I think it's slightly more readable to go:

Some keys get quote long, So ive kept things consistent by being on different lines. This style is justified moreso since adding error message asserts above.

Sam152’s picture

Issue tags: -prague 2013, -Barcelona2015

Nice! Do you think it makes sense to deprecate the other method in this issue or somewhere else?

What's cool is, all the test cases we have for "explode" also pass for "safeExplode":

diff --git a/core/tests/Drupal/Tests/Core/Common/TagsTest.php b/core/tests/Drupal/Tests/Core/Common/TagsTest.php
index b668a337c0..75e65e9387 100644
--- a/core/tests/Drupal/Tests/Core/Common/TagsTest.php
+++ b/core/tests/Drupal/Tests/Core/Common/TagsTest.php
@@ -56,6 +56,12 @@ protected function assertTags($tags) {
     }
   }

+  public function testSafeExplodeWithLegacyTestCases() {
+    $string = implode(', ', array_keys($this->validTags));
+    $tags = Tags::safeExplode($string);
+    $this->assertTags($tags);
+  }
+
   /**
    * Test converting a string to tags.
    *

While, we don't have any negative test cases that assert what explode doesn't deal with, it's a good endorsement that if you were using explode with the intention of liberally accepting input, safeExplode is going to be right up your alley too.

  1. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -39,6 +39,118 @@ class Tags {
    +  public static function safeExplode($string, &$errors = NULL) {
    

    What about explodeWithErrors or something? Is this really more "safe" and was it "dangerous" before? Naming things is hard :-/

  2. +++ b/core/lib/Drupal/Component/Utility/Tags.php
    @@ -39,6 +39,118 @@ class Tags {
    +      else {
    +        // Handle tags not in quotes.
    

    If the comment is for the whole else block, probably best to swap these two lines?

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

quietone’s picture

Assigned: dpi » Unassigned
Issue summary: View changes
Issue tags: +Bug Smash Initiative
FileSize
13.99 KB
11.35 KB

This was a bugsmash triage target earlier in the week. This needs an evaluation of the use of the RFC as the standard to follow. It is not clear to me why a new method is being added instead of deprecating the existing one. I have added both of those to the issue summary.

I decided to update the patch to 10.1.x.

Since this has not been worked on in 5 years I am changing this to 'un-assigned'.

smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +Needs Review Queue Initiative

Can confirm the issue on Drupal 10.1
Using a standard install I used the Articles tags field using the values in the issue summar
Patch #69 does solve the proble.

safeExplode
1. Should be typehinted

// Find end single quote.
2. Think this could be written better.

// End of string. Finish.
3. Same

+        // Handle tags not in quotes.
+
+        // Determine where the tag ends.

4. empty space

5. Can we get a test only patch to see they fail.

Thanks!

smustgrave’s picture

Also IS has a remaining task

Why not deprecate explode?

Which should be answered.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.