Fail to validate regex with special chars in Clientside validation regex ajax callback.

The issue is caused by processing input value with check_plain() function.

There is no reason to escape input through the check_plain() filter before validating input with preg_match() condition.
Please recheck.

Patch attached

Comments

itsekhmistro created an issue. See original summary.

joelpittet’s picture

Status: Needs review » Reviewed & tested by the community

Yup there doesn't seem to be a reason to check_plain that post variable before checking it. If it needs checkplain for the json output then it should be done after.

Git blame shows it wasn't added in a particular issue only a port for ctools stuff.
Git commit: 11b06940e8cf43087f22acdae63568e2a88cdd5b

  • Jelle_S committed 0d187bb on 7.x-2.x authored by itsekhmistro
    Issue #2645114 by itsekhmistro: Fail to validate regex with special...
jelle_s’s picture

Fixed in latest dev. Thanks for the patch!

jelle_s’s picture

Status: Reviewed & tested by the community » Fixed
kaidawai’s picture

Hi,
sorry, didn't see this earlier. This is a regression. Actually you just introduced an inconsistency. This check_plain is there for consistency with serverside check.

Thing is:
you are only looking isolated at the callback for clientside evaluation. In this evaluation the input is not further processed(this happens serverside). Because this processing is not done here check_plain seems not necessary. But it is to reflect the serverside behavior.

But when you hit the submit button same data gets submitted, and now the value /is/ processed further which makes check_plain necessary. So there is a mandatory check_plain in the the serverside validation.

What you created now is a clientside check that differs from the serverside check. That means values that are accepted clientside are not accepted serverside and vice versa. This renders the clientside validation disfunctional. Please revert. Thanks.

kaidawai’s picture

Status: Fixed » Needs review
itsekhmistro’s picture

Hi Kai,

>This is a regression. Actually you just introduced an inconsistency. This check_plain is there for consistency with serverside check.

This is not a regression and it works correctly.
Without the patch you won't be able to correctly handle regex patterns with special chars like quotes (", ').
And normally you need to apply check_plain to sanitise an output, not an input. This is generic Drupal approach.

kaidawai’s picture

1) yes, I introduced this check_plain, not because i liked to have one more line of code. I remember it even though it is now 2+ years ago. So we spent more than one thought on it. The architechture of the serverside check is simple right: first you normalize the input and than you use whatever check on the normalized input.
2)yes, check_plain substitutes dangerous chars. by that sanitizing the input.
3) check your regex. if you used a correct regex, that assumes a sanitized/normalized input it works. your check, that passes on quotes without escaping them is potentially dangerous.

Testcase from top of my head: use pure serverside formapi. Without clientside validation. Two textareas. Use your check. Put a lot of specialchars in the areas. Give a "bad"(according to your rule which should handle input with Ä'~&;"'....) input in just one of the fields. Submit. Correct the bad input. Leave the other field alone. Resubmit. IIRC the problem should show somewhre in the process.
If not i'll create a thorough testcase. I am pretty sure we introduced the check_plain with a reason.

Another remark: a couple of years ago, MySQL didn't support charsets as it does nowadays. Remember MySQL 4. Browsers didn't understand utf-8. Every browser handled them differently. by then all special characters where commonly stored as html entities. There was not other way to handle them at that time. Nowadays you would construct things differently and in my experience htmlspecialchars(and so is check_plain) is more a problem than a solution. But couple of years ago you had little choice and it does give you a normalized representation: "Ä" is not consistent "Ä" always stays Ä but this again makes it necessary to handle & as & if it is mixed you get into problems.

itsekhmistro’s picture

Hi Kai,

Thank you for detailed feedback.

However to make it short, here is a test scenario:
- There is a regex validation rule to check input for valid French name (First or Last name)

array('rule' => "regexp[/^[a-zàâçéèêëîïôûùüÿñæœA-ZÀÂÇÉÈÊËÎÏÔÛÙÜŸÑÆŒ .\\-’']*$/]", 'error' => 'Merci de renseigner un nom valide'),

- Input text (name) is "De'wayne" or "Daniel-O'connor"

Without patch - validation fails.
With patch - works correctly.

The code behind the issue is straight forward and in the patch it is visible how the input is used for in validation callback.
Please provide a use case when "check_plain" is required here.

Thank you in advance.

kaidawai’s picture

First: put clientside validation off.
Why:
- everything that is clientside can be manipulated by a user(simples turn off js or change the script in the browser, do even nastier things with perl....)
- The real and important check is the serverside one
- clientside has to resemble the serverside behavior.
- Clientside check is only a convinience for the user

Second: your test doesn't test.
Why:
You are checking if one string is accepted by your regex in one scenario.
The important thing to test however is: to deny bad input. And to always deny it. Without a way for a user to accidentally or maliciously inject a value you don't want.

Why: because if there is a way tho bypass the check you have a security breach.

I gave a simple case already. But again:
Turn off clientside validation. Use fapi textareas. Use a regex that denies "&" for example. See it fail the first time. See it pass the second time you hit submit without changing anything.
Think of of a way to avoid that behavior serverside with your regex. Use that regex with clientside and expect the same behavior...

Or: use an old browser and see how usefull it renders you check. Bypass a check that simple?

yes, i give you that: you are right that if i'd design a site from scratch, i wouldn't do it that way. Form api, panels, CC, multiforms and so on are inconsistent like hell. There are even still function stubs in the code that were never implemented because: you simply can't without rewriting everything. Routing is somewhat confusing. Solution: Drupal 8. At least the routing should be better.

nikunjkotecha’s picture

Status: Needs review » Reviewed & tested by the community

I suggest we use this patch, in fapi_validation there is no check_plain for this rule so we are good to use it even in client side.

function fapi_validation_rule_regexp($value, $params) {
  return (bool) preg_match($params[0], (string) $value);
}
nikunjkotecha’s picture

Just to add another scenario where this is required - Drupal is multi-lingual and French/German names contain single quotes and even special alpha characters like è, we can't ask them to change name just because we need to use check_plain which is not even used for regex rule in fapi.

itsekhmistro’s picture

Status: Reviewed & tested by the community » Fixed

I'm closing this issue,
The patch is already applied and works well.

Imho, there were no good explanation given why we shouldn't use it.
I admit from security point of view it is safe and matches to Drupal best practices.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.