Condition : "Text contains any word in a list" [#1343442]

I've coded a custom condition for a project and was wondering whether it could be useful to everybody. Basically, it adds a new condition in the "Data" group that compares a string against a list of strings (a list<text>, i.e. an array of strings). If any word in the list matches, it returns TRUE. Here's the code I wrote. It could be improved: the comparison loop could be a do-while for better performance, there's no need to declare a $pattern variable for a single use, and the final test could drop the == 0 since in PHP 0 == FALSE. But it works very well :)

For context, I used it to automatically flag comments that contain offensive words when a comment is submitted or edited. That's why I used a regexp with \b, to be sure to match whole words only.

/**
 * Implements hook_rules_condition_info().
 *
 * Declare a new Rules condition : "Text contains any word in a list"
 * This condition is gonna be used as a way to match comment content and the list of suspect words
 *
 * @return array the condition description
 */
function mymodule_rules_condition_info() {
  return array(
    'mymodule_text_contains' => array(
      'label' => t('Text contains any word in a list'),
      'group' => t('Data'),
      'parameter' => array(
        'text' => array(
          'label' => t('Text to search into'),
          'type' => 'text'
        ),
        'list' => array(
          'label' => t('List of words to match'),
          'type' => 'list<text>',
          'description' => t('The comparison will be case insensitive'),
        ),
      ),
    ),
  );
}

/**
 * Rules condition callback. Matches a text against a list of words.
 *
 * @param $text string Text to search into
 * @param $list array List of words to match
 * @return boolean TRUE if there's a match, FALSE otherwise
 */
function mymodule_text_contains($text, $list) {
  $match = 0;
  foreach ($list as $word) {
    // This pattern takes care of word boundaries, and is case insensitive
    $pattern = "/\b$word\b/i";
    $match += preg_match($pattern, $text);
  }
  return $match == 0 ? FALSE : TRUE;
}

Comments

Comment #1

xandeadx commented 17 January 2012 at 06:55

Component:	Rules Core	» Rules Engine

Comment #2

DjebbZ commented 19 January 2012 at 23:06

I hope it can be useful to you !

Comment #3

webchick commented 17 February 2012 at 11:25

This looks pretty handy. I am struggling hard with trying to envision a simple use case for Rules out of the box to explain it to people, and if this condition were there, a "spam filter" would be an obvious one.

Comment #4

fago commented 24 February 2012 at 11:12

Status:	Active	» Needs work

Yep, sounds handy. Let's polish and include it.

$pattern = "/\b$word\b/i";

This would treat $word as regex. Maybe we should just use stripos() instead?
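A minimal sketch of the concern above, assuming the word list may contain regex metacharacters: escaping each word with preg_quote() keeps the whole-word match while treating the word literally. The function name mymodule_text_contains_literal is hypothetical and is not part of Rules.

```php
<?php
// Hypothetical variant of the condition callback; not part of Rules.
function mymodule_text_contains_literal($text, array $list) {
  foreach ($list as $word) {
    // preg_quote() escapes regex metacharacters (and the "/" delimiter),
    // so a word such as "a.b" only matches a literal "a.b".
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    if (preg_match($pattern, $text) === 1) {
      return TRUE;
    }
  }
  return FALSE;
}

// Unescaped, the "." in "a.b" matches any character, so "axb" matches too.
var_dump(preg_match("/\ba.b\b/i", 'an axb here') === 1); // bool(true)
// Escaped, only the literal word matches.
var_dump(mymodule_text_contains_literal('an axb here', array('a.b'))); // bool(false)
var_dump(mymodule_text_contains_literal('an a.b here', array('a.b'))); // bool(true)
```

Returning on the first hit also avoids counting matches that are never used.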

Comment #5

DjebbZ commented 6 March 2012 at 06:33

The problem with stripos() is that it detects words inside longer words, which is specifically not my use case. The wording of this condition is "text contains any word in a list". stripos() would make it more like "text contains this sequence of characters from a list". I think both are completely OK and, even if close in meaning and code, each can have its own existence.

For someone who's searching for whole words, the stripos() would bring false positives. For someone searching for a pattern, the stripos() is better. We may even make the case sensitivity an option for both cases.
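To illustrate the difference between the two modes, here is a small sketch. The helper example_contains() and its $whole_word and $case_sensitive parameters are hypothetical, made up for this illustration only.

```php
<?php
// Hypothetical helper illustrating the two matching modes; not part of Rules.
function example_contains($text, $word, $whole_word = TRUE, $case_sensitive = FALSE) {
  if ($whole_word) {
    // Whole-word mode: \b anchors the match at word boundaries.
    $flags = $case_sensitive ? '' : 'i';
    return preg_match('/\b' . preg_quote($word, '/') . '\b/' . $flags, $text) === 1;
  }
  // Substring mode: strpos() is case sensitive, stripos() is not.
  $pos = $case_sensitive ? strpos($text, $word) : stripos($text, $word);
  return $pos !== FALSE;
}

$text = 'Greetings from Birmingham';
var_dump(example_contains($text, 'ham', FALSE)); // bool(true): "ham" occurs inside "Birmingham"
var_dump(example_contains($text, 'ham', TRUE));  // bool(false): no standalone word "ham"
```

The substring result is the kind of false positive a whole-word spam filter wants to avoid.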

Comment #6

fago commented 7 March 2012 at 14:43

I see. I'd agree that it should check for words only.

Comment #7

mitchell commented 28 March 2012 at 19:17

Title:	New condition : "Text contains any word in a list"	» Condition : "Text contains any word in a list"
Issue tags:		+data transforms

Comment #8

ressa commented 28 August 2013 at 22:18

Component:	Rules Engine	» Rules Core
Status:	Needs work	» Active

Thanks for sharing @DjebbZ, it was just what I was looking for. Any chance this might make it into the official version? I have tested it, and it seems to work just fine.

Comment #9

Jarviss commented 17 November 2013 at 19:10

Wolfgang, can this be added to a Rules release? In addition, we could change the code to use a Vocabulary as the spam filter list!

Comment #10

Jarviss commented 17 November 2013 at 20:26

Wolfgang, can you help change this code so that the Rules condition compares the text against the terms of a Spam Vocabulary, based on a Vocabulary ID provided to the condition?

Comment #11

Jarviss commented 18 November 2013 at 17:55

Here is the code for Rules to use a Vocabulary ID, where the Vocabulary holds the spam filter words.

<?php
/**
 * Implements hook_rules_condition_info().
 *
 * Declare a new Rules condition : "Text contains words from Vocabulary"
 * This condition matches comment content against the term names of a spam vocabulary
 *
 * @return array the condition description
 */
function rules_spam_rules_condition_info() {
  return array(
    'rules_spam_text_contains' => array(
      'label' => t('Text contains words from Vocabulary'),
      'group' => t('Data'),
      'parameter' => array(
        'text' => array(
          'label' => t('Text to search into'),
          'type' => 'text'
        ),
        'list' => array(
          'label' => t('Vocabulary ID'),
          'type' => 'text',
          'description' => t('The comparison will be case insensitive'),
        ),
      ),
    ),
  );
}

/**
 * Rules condition callback. Matches a text against the term names of a vocabulary.
 *
 * @param $text string Text to search into
 * @param $list string Vocabulary ID whose term names are the words to match
 * @return boolean TRUE if there's a match, FALSE otherwise
 */
function rules_spam_text_contains($text, $list) {
  // The parameter arrives as text, so cast it to an integer vocabulary ID.
  $terms = taxonomy_get_tree((int) $list);
  foreach ($terms as $term) {
    // Escape the term name so it is matched literally, not as a regex.
    // The pattern respects word boundaries, is case insensitive and UTF-8 aware.
    $pattern = '/\b' . preg_quote($term->name, '/') . '\b/ui';
    if (preg_match($pattern, $text)) {
      return TRUE;
    }
  }
  return FALSE;
}

Comment #12

arruk commented 9 December 2014 at 16:35

This looks great! Has anyone used it with Reservation Conflict?

Comment #13

mpotter commented 21 April 2015 at 19:27

Would really like to see a generic "contains" string data comparison. Tried something like the OP but I don't see anything in the "Comparison Operator" field for data.

The use case: I want to compare the email address of a new user to see if it contains certain text in order to automatically assign it to an OG Group.

Even better would be a regex comparison operator. Does this already exist somewhere in contrib?

Comment #14

mpotter commented 21 April 2015 at 19:30

Never mind, I'm an idiot. I was looking at operators within Data comparison and missed the entire Text comparison conditions.

Comment #15

Québec commented 6 April 2016 at 12:22

Hi.

I'm trying to make a simple antispam to prevent humans from manually creating unwanted content. But I just cannot find a way to make Rules check against a list. It seems to work if I make one condition per word. But if I make a list (one word per line, or comma separated), it does not work.

So I saw all that PHP code (#0 and #11). Does this mean that it is not possible to make Rules match content against a list of words? Do I need this code? Where do I put it? How do I make the list: comma separated?

Thanks.

Comment #16

TR commented 7 March 2019 at 19:56

Issue tags:	-data transforms	+Needs tests, +Novice

Still in need of an actual patch here.
Also needs a test case.

Comment #17

TR commented 30 November 2019 at 00:01

Version:	7.x-2.x-dev	» 8.x-3.x-dev

Moving to 8.x-3.x. There doesn't seem to be much interest in adding this, but if it is added it should go into the current version of Rules first. It can then be backported to 7.x-2.x if there is community interest.