We had the case that a text was rejected as spam because the German words "gewinnen" and "Gewinnermittlung" were used in the text and in protected form configuration "win" was set as a reject pattern. As "win" is part of "gewinnen" the text was rejected.

I think for these cases it would be helpful to be able to configure patterns and words seperately so that an admin can define patterns which should be checked as parts of a string and words which should only be checked as is.

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

tobiberlin created an issue. See original summary.

tobiberlin’s picture

Issue summary: View changes
altagrade’s picture

Status: Active » Needs work

This had been resolved for Drupal 7 on #3283252: Regex matches against partial words causing a rejection when it shouldn't by introducing a new Allowed patterns option. Because of time constrains we have to postpone this for later (about a month) or will wait until someone ports the feature to Drupal 9/10.

pminf’s picture

Issue tags: +Prague2022

I'm working on this during DrupalCon Prague 2022.

xpersonas’s picture

Awesome. Glad to hear you are working on this. I was about to start looking into this issue as well. I didn't realize this module flagged partial words. I have submissions getting blocked for things like "win" or "hell" because of words like "window" and "hello". So not exactly ideal.

I'm hoping we can flag a stand alone word like "hell" but still allow "hello".

DrupalUser77’s picture

Has there been any update on this? Thanks!

  • 5180c234 committed on 2.0.x
    Issue #3306579: Possibility to set words as reject pattern
    
altagrade’s picture

xpersonas’s picture

I was still struggling to get this to work. It looked like the code matching whole words OR string in string. But string in string matches are the problem. We ban the word "ass" but want to accept users that submit "compass" or other variations.

This patch seems to be working for me if anyone could test or add input.

xpersonas’s picture

Status: Fixed » Needs review
altagrade’s picture

Status: Needs review » Postponed (maintainer needs more info)

Have just tested this by banning "ass", allowing "compass" and submitting a test comment containing "compass" and it passed successfully.

However, it will fail if you enter "compasses" instead, because the module is not aware of other variations, so each variation must be allowed separately. We'd like to keep the module as simple as possible avoiding implementing external libraries like, for example, https://www.phpclasses.org/package/8907-PHP-Get-the-different-forms-of-l...

Could you please run the above test again and confirm it's working for you?

xpersonas’s picture

However, it will fail if you enter "compasses" instead, because the module is not aware of other variations, so each variation must be allowed separately. We'd like to keep the module as simple as possible avoiding implementing external libraries like, for example, https://www.phpclasses.org/package/8907-PHP-Get-the-different-forms-of-l...

I need to clarify that a bit. Are you referring to adding "compass" to the allowed patterns? Right now, I was simply testing the "reject patterns." I have an extensive list of rejected patterns that include both "ass" and "asses". In my test, with this patch, both "compass" and "compasses" pass as expected. Therefore, I feel like that portion of the code is working as I expect.

Allowed Patterns

The "allowed patterns" feature confuses me. I get where that could have been helpful if we were doing string within string checks. However, now that we're checking whole words, I'm struggling to see why we need to allow any specific pattern.

But maybe I'm missing something here?

Email Addresses w/ @

Looking at the other issues, I see there was some effort to prevent rejected patterns in email addresses. Which is true, with this implementation, something@asses.com would not be rejected.

Personally, I could live with bypassing checking email addresses. But I get that some may want to. Would we want to run another loop extracting email addresses and checking them differently? That does start to take the simplicity out of things. I agree that this should be as simple as possible.

So I'm not entirely sure what the best route forward here is. Should we have a config option - "check whole words" or "search within strings"?

Thoughts?

altagrade’s picture

Initially, the module was using just if (preg_match("/\b$clean_pattern\b/i", $user_input)) {, however because of https://www.drupal.org/project/protected_submissions/issues/3118715 and https://www.drupal.org/project/protected_submissions/issues/3205659 it was changed to deal with e-mail addresses, so accepting your patch would mean regression.

At the same time, despite our intention to keep things simple, whatever has to change to address new issues must be implemented even if another loop is required, so in this respect be assured the module is open for any kind of changes. But before introducing another layer of complexity, help me to understand your point better.

So if you read the initially reported issue:

We had the case that a text was rejected as spam because the German words "gewinnen" and "Gewinnermittlung" were used in the text and in protected form configuration "win" was set as a reject pattern. As "win" is part of "gewinnen" the text was rejected.

Now, if both "gewinnen" and "Gewinnermittlung" were listed as "Allowed Patterns", then they would overcome the "Reject Pattern" win and pass successfully.

Also, regarding your own comment:

We ban the word "ass" but want to accept users that submit "compass" or other variations.

and then later:

In my test, with this patch, both "compass" and "compasses" pass as expected. Therefore, I feel like that portion of the code is working as I expect.

So the new "Allowed Patterns" does resolve the initially reported problem, doesn't it? If you want the module to do more/better then could you please give example of real use cases when separating "patterns" and "words" would be useful?

xpersonas’s picture

So the new "Allowed Patterns" does resolve the initially reported problem, doesn't it? If you want the module to do more/better then could you please give example of real use cases when separating "patterns" and "words" would be useful?

Well, for this part at least, we don't sell compasses. Our site has nothing do with with compasses. Just happened that the term was included in a form submission and because of the "ass" match within "compass" it was rejected. I could include some that I know we need to white list, but there are so many words that could include those 3 letters with no foul intent, I could never cover all of them.

Makes me wonder if config options should be:

Radio buttons because it has to be one or the other:

  • Match whole words
  • Match within strings (choosing this option brings up the "allowed patterns" textarea.

Checkbox:

  • Check email addresses.

And then the same chosen method from above would be used for checking emails too - match whole words or match within strings. For instance, I would allow "@bassfishing.com" and "@assessor.com" but not "@ass.com".

There are just so many instances where 3 or 4 letter profane words can be part of larger non-profane words.
https://raw.githubusercontent.com/coffee-and-fun/google-profanity-words/...

altagrade’s picture

I agree with that many 3 or 4 letter profane words can be part of larger non-profane words, but since the module was initially created for a specific customer use case and has been working great so far, feel reluctant to open a box of Pandora that would bring plethora of changes.

However, since seeing stats (1,052 sites using the https://www.drupal.org/project/protected_submissions are yet to change to https://www.drupal.org/project/protected_forms with 888 sites currently using it) slowly growing I understand we can not keep it for just needs of one specific project and have to meet needs of all other use cases.

The only concern is time deficiency. Do you mind joining efforts and co-maintain the module? Or offer PR at least for this issue?

xpersonas’s picture

Yeah, I would be happy to help co-maintain or what ever you think is best. I did update my merge request for your review. I wanted to get some feedback. Wasn't sure the best way to do that. I'm used to Github, but still new to Gitlab with this Drupal workflow. That merge request needs tested and discussed.

I did this rather quickly, but here is the idea I had in mind...

Config Form

xpersonas’s picture

StatusFileSize
new236.22 KB
xpersonas’s picture

altagrade’s picture

Status: Postponed (maintainer needs more info) » Needs review

Thanks for the work done. However, I'd love to hear from other users of the module on this. Let's keep this open for now.

anybody’s picture

Title: Possibility to set words as reject pattern » Allow setting words as reject pattern
Assigned: Unassigned » grevil
Issue tags: +Needs tests
Parent issue: » #3340780: Needs tests for all functionality

We should review and add tests for this in the context of #3340780: Needs tests for all functionality

anybody’s picture

Assigned: grevil » Unassigned
Status: Needs review » Needs work
anybody’s picture

Version: 2.0.1 » 2.0.x-dev

Conflicts need to be resolved.

dhruv.mittal’s picture

Assigned: Unassigned » dhruv.mittal

Working on it

dhruv.mittal’s picture

Assigned: dhruv.mittal » Unassigned
Status: Needs work » Needs review
anybody’s picture

Status: Needs review » Needs work

Tests are failing

xpersonas’s picture

Issue tags: -Needs tests
xpersonas’s picture

Hello. I did some cleanup on this fork. Took a lot of the previous commits, but also got cli tests passing for eslint, phpcs, and took some of the phpunit tests we had and cleaned that up so that we have some basic functionality, login, and install/uninstall tests. Even though I had a fight with phpstan, I think it's in a decent spot.

I had a few site accidentally updated to the release version of this module. My clients had included "spac" as a reject pattern. (Not really sure why or what that means, but they added it) And I had a lot of forms rejected for using the word "spacing". I've found this module tremendously helpful, but it's hard for me to imagine using it without the option to check whole words.

There must be use cases for checking within strings instead of whole words, but I haven't encountered that in my experience.

This fork also logs rejections to it's own table rather than the main watchdog table. (that was added a while back) I've had way too many instances where I have a rejected form submission but it was already removed from the watchdog list.

So whole word checking and logging to a separate table are essential to my workflow. I'd love to see this become part of the primary module if possible. And I'm happy to help in any way I can. Let me know if anyone has any thoughts or problems.

xpersonas’s picture

Status: Needs work » Needs review
tirupati_singh’s picture

Status: Needs review » Needs work

Hi all, I replicated the issue and the issue persists. Upon adding any pattern in the Reject patterns field, the user is unable to submit the form due the any word containing the pattern added in the reject patterns field. I also applied the provided MR as a patch and it applied cleanly without any errors. However, after applying the patch getting below issue:

The website encountered an unexpected error. Try again later.

TypeError: Drupal\protected_forms\Form\ProtectedFormsForm::__construct(): Argument #1 ($module_handler) must be of type Drupal\Core\Extension\ModuleHandlerInterface, Drupal\Core\Config\ConfigFactory given, called in /app/web/core/lib/Drupal/Core/Form/ConfigFormBase.php on line 57 in Drupal\protected_forms\Form\ProtectedFormsForm->__construct() (line 28 of modules/custom/protected_forms/src/Form/ProtectedFormsForm.php).

Now on visiting the module configuration page getting the issue and also on submitting the form on which the protected form is enabled getting the issue shown in the screenshot. The provided MR still need works hence moving it to Needs work.

Thanks!

anybody’s picture

Looks like the MR reverts previous expected changes. Instead it should fix the issues without reverting the improvements. Especially I don't like the schema to be reverted.

anybody’s picture

xpersonas’s picture

Apologies. I did fix the constructor error. After another fight with phpstan, I have the changes committed with all tests passing to Drupal standards. Green checks everywhere. Though certainly more tests could be added (and welcomed).

But again. Apologies if I have taken this in the wrong direction, that was not my intention. I was only hoping to help and contribute because that is the "Drupal" thing to do.

For me, pattern matching within strings was never going to be a solution to any problem I had. For all my sites, and I presumed others, it needed to check whole word instances. However, I've felt a bit of push back on this the entire time. I did like how this module was set up overall, so it just made sense to add that feature to it while keeping your initial string check. Which is what I tried to do. I wanted to contribute.

My last merge request was an attempt to:

  • maintain whole word instance checking as an option
  • add cli tests for eslint, phpcs, and phpstan
  • add/refine phpunit tests based on what existed, what was suggested, and what I thought was logical
  • log rejections in it's own table rather than watchdog (which could perhaps be a setting) because on high traffic sites, I was often asked what was rejected only to find it was too late

Looks like the MR reverts previous expected changes. Instead it should fix the issues without reverting the improvements. Especially I don't like the schema to be reverted.

My intention was not to revert any improvements. I tried to carry over most of the previous commits. If "improvements" were reverted, I'm open to discuss.

It's been a few months, but I believe my changes to the schema were to address code quality and test issues that arose. I certainly wouldn't change the schema for a fun time. I have this MR running on many high profile, high traffic sites. I saw no issues with my changes on those sites and in my tests. Please let me know if they caused any other issues though.

I have this module in a really good place for my needs, and I realize that's "my" needs. This issue/thread began what is going on 3 years now. Perhaps I should take this version of the module I have now and make it a private custom module from this point forward. I'm honestly not trying to step on any toes or cause any issues for anyone. So again, apologies if it feels that way.

tirupati_singh’s picture

Hi @anybody @xpersonas, I've applied the latest changes in the MR as a patch and it applied cleanly without any error. On applying the patch, the issue mentioned in the comment #31 has been resolved. However, when the module is installed and then the patch has been applied, then on visiting the module configuration page i.e., /admin/config/content/protected_forms getting the below error:

The website encountered an unexpected error. Try again later.

InvalidArgumentException: "0" is an invalid render array key. Value should be an array but got a string. in Drupal\Core\Render\Element::children() (line 97 of core/lib/Drupal/Core/Render/Element.php).

But after reinstalling the module with the applied patch then the above issue has not been encountered.

After applying the patch, a new field Pattern matching has been added with options Check whole word instances and Check within strings. The newly added field provides the option to match the added pattern to match the whole or match only the pattern within the words. Both the provided field options are working fine with the forms and blocking the form submission as per the selected option in the field. I've attached the video clip for your reference and screenshots of before and after fixes as well.

As getting the mentioned error after applying the patch, hence keeping the issue to Need works.

Thanks!

xpersonas’s picture

@tirupati_singh: Thank you for testing things out so thoroughly and adding a video. I see the issue now. I pushed up a fix. If you try to run a new patch with the current diff, that error should go away. At least it did for me. So let me know if you have any issue with it.

grevil’s picture

@xpersonas I think @anybody accidentally looked at a different MR. This:

Especially I don't like the schema to be reverted.

Doesn't make much sense, as you were the only one working on the current MR AND there were no schema changes in the original commit in https://git.drupalcode.org/project/protected_forms/-/commit/5180c2349098....

anybody’s picture

@grevil sadly not.
https://git.drupalcode.org/project/protected_forms/-/commit/40f8328270ff... has been reverted here and that's not what we want.

xpersonas’s picture

@anybody, do you not want me contributing to this module? I'm not getting the Drupal community love vibes here. Lol

Regarding that schema change. I agree. Sadly, that schema change did not make sense to me either. That's why I worked to revert that change already.

https://git.drupalcode.org/project/protected_forms/-/merge_requests/8/diffs

That change to the schema was added to the this MR at some point, but didn't really make sense to me.

https://git.drupalcode.org/project/protected_forms/-/merge_requests/8/di...

That's why I reverted it. There were quite a few changes, some that made sense, some that did not, and some that were going in a good direction. Which is why I consolidated all that into the current state of the MR.

I would ask that you review the current state of this MR vs only looking at the commit history.

If this MR stands no chance of ever being approved, that's ok too. Not a big deal. Just let me know so I can take the appropriate action with all my client sites.