Patterns cannot support query strings nor fragments, and there have been several reports filed that could have been prevented if we had extra validation to check for query string and fragment characters.

We should add an additional validation for any characters that are set to be removed or replaced with a separator in the Pathauto settings.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Dave Reid created an issue. See original summary.

Dave Reid’s picture

Dave Reid’s picture

Issue summary: View changes
peschmae’s picture

Assigned: Unassigned » peschmae

I'll take a shot at this.

peschmae’s picture

Assigned: peschmae » Unassigned
Status: Active » Needs review
FileSize
2.93 KB
49.19 KB

I've attached a patch file, which implements a basic pattern validation, against the mentioned characters (?#&).

The validation can easily be extended to check for further characters which are replaced in URLs anyways (e.g. !), but I can also add a second validation to check against those characters.

I've also added a assertion to PathautoUiTest::testPatternsValidation(), but currently only check for #, since I just wanted to test the basic functionality. I could also expand it to check against all invalid characters, but since this is my first contribution to Drupal I was unsure if that is needed/wanted.

I basically tried to mimic the behaviour of "token_element_validate", but was unsure if I should add the newly created "pathauto_pattern_validate" to "after_build" as well.

Berdir’s picture

Status: Needs review » Needs work
+++ b/pathauto.module
@@ -153,3 +154,29 @@ function pathauto_entity_base_field_info_alter(&$fields, EntityTypeInterface $en
+    $invalid_characters = ['#', '?', '&'];

I think the idea is to not hardcode those settings but get them from the characters that we replace.

You can get the list from \Drupal\pathauto\AliasCleaner::getPunctuationCharacters

peschmae’s picture

I agree, that all the characters listed in \Drupal\pathauto\AliasCleaner::getPunctuationCharacters don't belong into an entity-alias and I would prefer a more generic list than the one I've hardcoded as well, but in my opinion the list in getPunctuationCharacters is too broad to apply them to the whole pattern, since it contains things like '/','.' and '-' which are quite common in URLs.

The following example is taken from a comment in AliasCleaner:cleanAlias

[token1]/[token2]-[token3]/[token4]

should we really disallow the hypen there?
I think for archives/dates this is a quite common use-case.

Additionally, right now the list from getPunctuationCharacters is currently only applied to the token-values and not the whole pattern (cleanTokenValues vs cleanAlias), accordingly validating against the whole list of characters that get replaced, would change the behaviour of the module quite substantially.

Berdir’s picture

you are right, that list alone isn't enough. If you check how it is used in \Drupal\pathauto\AliasCleaner::cleanString, we only want those that are either set to REMOVE or REPLACE. So instead of the function, you could actually just get the list directly from the configuration, if it's not in there then we also don't do anything.

Berdir’s picture

Assigned: Unassigned » Dave Reid
Status: Needs work » Needs review

And of course, the default configuration is actually incomplete right now because the only one in there is hyphen it seems?

So yeah, maybe its easier to just hardcode the view that we really care about.

Lets see what @davereid things about this.

  • Berdir committed 4376750 on 8.x-1.x authored by peschmae
    Issue #2681299 by peschmae: Add validation to reject invalid characters...
Berdir’s picture

Version: 8.x-1.x-dev » 7.x-1.x-dev
Status: Needs review » Patch (to be ported)

Lets do this. Committed.

peschmae’s picture

Assigned: Dave Reid » peschmae

I'll take a shot, at porting this to Drupal7

peschmae’s picture

Assigned: peschmae » Unassigned
Status: Patch (to be ported) » Needs review
FileSize
3.02 KB

I've provided a backport to Drupal 7 for this issue.

Since there was already a validation function in pathauto.admin.inc (_pathauto_validate_numeric_element() ), I've based the implementation on that instead of mimicking the behaviour of 'token_element_validate' which I did for Drupal 8.
I'm not sure about the function naming, especially when to prefix it with an underscore, but copied that from _pathauto_validate_numeric_element

I've also added an assertion to PathautoFunctionalTestCase::testPatternsValidation() but again only checked for '#' and not all three characters.

Status: Needs review » Needs work

The last submitted patch, 13: add_validation_to-2681299-13.patch, failed testing.

peschmae’s picture

I've used the php5.4 array short syntax, which resulted in the failure of the test on PHP 5.3, this patch fixes it.

dermario’s picture

Issue summary: View changes
Status: Needs review » Reviewed & tested by the community
FileSize
39.99 KB

This patch does exactly what it should do - its an exact backport from the d8 variant. I reviewed the code, and tested it on my local machine:

The test are fine too.

Dave Reid’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/pathauto.admin.inc
@@ -224,6 +224,33 @@ function _pathauto_validate_numeric_element($element, &$form_state) {
+function _pathauto_validate_invalid_characters($element, &$form_state) {

Let's keep this close to the D8 version, so name this "pathauto_pattern_validate"


<code>
+++ b/pathauto.admin.inc
@@ -37,7 +37,7 @@ function pathauto_patterns_form($form, $form_state) {
-      '#element_validate' => array('token_element_validate'),
+      '#element_validate' => array('token_element_validate', '_pathauto_validate_invalid_characters'),
       '#after_build' => array('token_element_validate'),

@@ -55,7 +55,7 @@ function pathauto_patterns_form($form, $form_state) {
-          '#element_validate' => array('token_element_validate'),
+          '#element_validate' => array('token_element_validate', '_pathauto_validate_invalid_characters'),
           '#after_build' => array('token_element_validate'),

It looks like the original patch also missed adding this validation to when the form loaded. We'll need to do this as well.

dermario’s picture

Ok, the function naming of @peschmae was consistent to the existing _pathauto_validate_numeric_element(), thats why i didn't mentioned that. I renamed that function to pathauto_pattern_validate() and added it (with 1 small modification) to #after_build. Now the form gets validated when the form is loaded.

peschmae’s picture

This patch adds the validation to after_build in the D8 version as well.

Since the previous patch is already merged, I didn't provide an interdiff (and it would just contain the same change as the attached patch)