As identified in #686136-16: Implement hook_mollom_form_info for submission protection by Mollom, we have a major problem in scaling.

I didn't think of this possibility, because I haven't used Webform that way.

  • Mollom's hook was basically designed to support "a couple of forms" only (node type forms, comments, etc). We (mainly me) were not aware of the possibility that a module may expose thousands of (different) forms that could be protected.
  • Currently, the information gathered for hook_mollom_form_info() is entirely uncached, which means that all implementations are invoked on every page containing a form. However, even when caching the info, we'd still have to load thousands of nodes to build the cache in Webform's case, so "just throw some caching at it" is no solution.
  • Nathan and me discussed in IRC whether we could pass $form_id to the hook implementations and only make them return something in case the $form_id is relevant for them. Not sure whether that's a valid way to go, but we need to investigate this direction in general.
  • Further discussion led to the idea that we may need two separate hooks, of which only one returns a list of $form_ids + titles, and the other is only invoked if the particular form is indeed altered/processed currently.
  • Nate raised the idea of whether we could optionally pass $form_id to the hook implementations, to keep only one hook, and conditionally return less (or more, if $form_id was given).

Food for thought. I have to sleep over.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

quicksketch’s picture

Following. Everything sun reports above is exactly the concerns I have with hook_mollom_form_info().

My preferred solution would probably look something like the following:

- A hook that provides a list of all forms that may be mollom enabled (probably just like hook_mollom_form_info() is currently, only without element information or mappings). This would be used ONLY when needing a list of all forms, such as on the mollom administration screens. It would NOT be used in hook_form_alter() for performance reasons.

- A hook similar to hook_form_alter(), that mollom invokes to ask modules, "Do you want to have mollom protect this form?". Then the invoked modules can use simple pattern matching on the $form_id and return element and mapping information if mollom protection is desired. This has the advantage of doing no SQL requests for the check, which might return a huge data set in the case of Webform. However it has the downside that this hook would need to be called on every form, meaning multiple calls per page. Avoiding a single SQL query would probably offset this performance hit though, and it would work consistently no matter how many webforms were on a site.

sun’s picture

Almost :) The tricky part is that the hook implementations do not decide whether a form is protected - they just "register" + describe a form.

I didn't sleep yet, but based on your further thoughts...

<BRAINDUMP>

  1. We need a list of forms that may be protected. Only for administrative purposes, to protect a (yet unprotected) form.
  2. Whether a form needs to be protected or not, is the responsibility of mollom.module. After all, it's the only module that knows whether the user configured/protected a form (or not).
  3. To figure out whether a $form_id is protected, we only need a list of $form_ids that are configured for/protected by Mollom. Since this information is only known to Mollom module, it could be cached separately (only statically or not) - in the end it's the result of SELECT form_id FROM {mollom_form}; - only form ids that are contained in the result are protected + vice-versa.
  4. Effectively, we don't really need to invoke hook_mollom_form_info() when investigating whether we need to alter a form. We can build + prepare that info within Mollom module.
  5. We only need to invoke hook_mollom_form_info() to gather the most current form definition from the implementing module (form elements can vanish + stuff).
  6. We already store the 'module' a form belongs to. That may be helpful in only invoking the actually needed hook implementation - since we know the $form_id already, there's no point in invoking the hook for modules that don't own the form.
  7. However, invoking the hook would still mean that we'd load ALL forms of the module. By passing an additional $form_id we could limit the loading to the requested $form_id.
  8. This, however, would not work for the administrative list of forms that can be protected. For this list, we'd likely need a completely separate thing.

    TBH, the administrative select list scenario is what scares me most... Even if we are able to quickly load 10,000 webforms, this selection will be unusable. ;) Ultimately, that's probably more what http://drupal.org/project/form is about (in the long run), since it allows to configure forms "in context", i.e. without the complex list + selection problem. But that's D7 material only.

</BRAINDUMP>

quicksketch’s picture

Thanks sun. I realized I had been misunderstanding the way mollom works. Since I applied the Webform patch, then installed mollom for the first time, the mollom_form table was created with all of the Webform forms on my site pre-loaded. Mollom seems to be enabled on all supported forms and on all fields within those forms by default. This made me assume that Mollom uses an EXCLUSION list to determine which forms mollom should be disabled on, rather than being an enabled list as it is currently (what happens when I add a new field to an existing Webform?)

My strategy had been based on Mollom not keeping any information about a form unless its options have been changed. If a form is explicitly disabled though the Mollom admin interface, then of course we wouldn't need to ask any other modules for additional information on that form. However if Mollom didn't store any information about forms unless it'd been uniquely configured/disabled, then it would make sense to ask all modules, "Do you want Mollom on this form?". Basically my approach was based on the idea that Mollom doesn't store any information about a form unless it's been explicitly configured.

sun’s picture

Quickly answering the open questions before going to sleep:

what happens when I add a new field to an existing Webform?

If the form is configured for textual analysis (i.e. individual form elements were configured) then the new webform component will not be part of the textual analysis unless Mollom's protection for the form is updated.

If a webform component is removed from the form without updating Mollom's configuration, then Mollom will silently fail and ignore that particular component.

All enabled form elements are stored in {mollom_form}.enabled_fields for each $form_id.

Mollom doesn't store any information about a form unless it's been explicitly configured.

This is actually the case (perhaps my last follow-up might have been confusing). Mollom only stores a configuration for a form, if the form is protected in any way. Not sure whether that re-validates some of your thoughts?

--

Based on what we analyzed until now and currently know, this would mean to do the following:

function mollom_form_alter(&$form, &$form_state, $form_id) {
  static $protected_forms;
  if (!isset($protected_forms)) {
    // Mixing some D7 in here, shorter :P
    $protected_forms = db_select('mollom_form')->fields(NULL, array('form_id', 'module'))->fetchAllKeyed();
  }
  // If this form is protected at all...
  if (isset($protected_forms[$form_id])) {
    // ...ask the exposing module for its current definition.
    $function = $protected_forms[$form_id] . '_mollom_form_info';
    if (function_exists($function) && ($form_info = $function($form_id))) {
      // Merge form definition with stored form configuration.
      // @todo Perhaps, or perhaps not, mollom_form_load() could take over the merging (or not). ;)
      $mollom_form = mollom_form_load($form_id);
      // Start processing...
    }
  }
}

function webform_mollom_form_info($form_id = NULL) {
  // If no specific $form_id was requested, return all forms that can be protected.
  if (!isset($form_id)) {
    $result = db_query('SELECT nid, title FROM {node} WHERE type = :type', array(':type' => 'webform'))->fetchAllKeyed();
    $forms = array();
    foreach ($result as $nid => $title) {
      $forms['webform_client_form_' . $nid'] = t('Webform: @title', array('@title' => $title));
    }
    return $forms;
  }
  // Otherwise, return form definition for the requested $form_id only.
  ...
}
quicksketch’s picture

I think it's extremely odd that Mollom would auto-enable itself on every form on the site automatically, but then when that form is updated the new fields would not be checked for spam. As part of the update, I think it would make sense to do one of the following:

- Have mollom disabled on all forms by default, requiring a manual enabling.
- Have mollom enabled on all forms by default, but maintain an exclusion list of ones to disable. All supported fields are checked for spam unless manually disabled.

Right now having mollom enabled by default on install, but then not work with a form after an update doesn't make any sense. Keep in mind that all Webforms start with no fields at all, meaning that all initial definitions of new webforms would get wouldn't contain any fields in mollom.

- Create a new Webform.
- Mollom pulls in the definition of Webform's forms (including the new webform that doesn't have any fields)
- User goes through and adds the fields to the new Webform
- Mollom doesn't work on the new fields because the definition has already been saved in the mollom_forms table.

sun’s picture

hook_mollom_form_info() implementations can decide on their own whether it makes sense to auto-protect forms by default or not - which really depends on the module. To do this, the implementation currently declares:

  'mode' => MOLLOM_MODE_*,

If the "mode" property is not set, then a form will not be protected upon installing Mollom module.

However. With respect to what we are discussing here, I'm not really sure whether we can keep on auto-enabling protection for forms upon installation of modules any longer. It requires us to do what we currently do: Load each and every thing and return a full form information array. On top of that, this happens in mollom_install() and mollom_modules_installed(), i.e. in a request that's very slow already...

Dries’s picture

I think it's extremely odd that Mollom would auto-enable itself on every form on the site automatically, but then when that form is updated the new fields would not be checked for spam.

I basically agree. I think the best approach is to have mollom disabled on all forms by default, requiring a manual enabling.

sun’s picture

Status: Active » Needs review
FileSize
8.03 KB

ok, so we have the first agreement in removing the default protection upon installation. We can remove a lot of code then and tests need to be adjusted. Let's see what the testbot thinks about this.

Status: Needs review » Needs work

The last submitted patch, mollom-DRUPAL-6--1.scale_.9.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
7.92 KB

heh, I guess it helps if I don't remove drupal_install_schema() :)

Status: Needs review » Needs work

The last submitted patch, mollom-DRUPAL-6--1.scale_.11.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
9.19 KB

ok, this one passes. Now on to the other changes we discussed.

Dries’s picture

+++ tests/mollom.test	3 Mar 2010 19:34:01 -0000
@@ -1450,9 +1443,10 @@ class MollomFormConfigurationTestCase ex
-    $this->drupalGet('admin/settings/mollom/add');
-    $this->assertText(t('All available forms are protected already.'));
-    $this->assertText(t('Operations'));
+    // @todo
+//    $this->drupalGet('admin/settings/mollom/add');
+//    $this->assertText(t('All available forms are protected already.'));
+//    $this->assertText(t('Operations'));
   }

The patch looks good -- it is, in fact, a nice clean-up. The only thing that confuses me is the snippet above. It is not clear why that code would have stopped working.

sun’s picture

Here comes the real patch. :)

A couple of weird changes. Still some @todos to resolve, but at least all tests except Reporting functionality are passing already.

Implements what we discussed above: hook_mollom_form_info() only returns a quick list of $form_ids and their titles if no $form_id was passed. If a $form_id was passed, only the information about that form is returned.

Some feedback topics:

1) Do we want to split into 2 hooks: hook_mollom_form_list() and hook_mollom_form_info($form_id) ?

2) Any ideas what we could do with the reporting functionality? We don't even have a $form_id there - only a $entity and $session_id. The only way to solve this currently would be to have modules also return the 'entity' property in its form list. Most probably that's even the only solution.

3) I realized that we can keep the 'mode' property in the registered form information. It's still valuable to allow forms to define the intended default protection mode when protecting the form.

quicksketch’s picture

1) Do we want to split into 2 hooks: hook_mollom_form_list() and hook_mollom_form_info($form_id) ?

I would be in favor of two hooks, though it might be extra work for just about every module except Webform. Two completely different behaviors depending on whether or not $form_id is passed in smells like an $op argument to me. ;-)

sun’s picture

$op? oh noes! /me runs ;)

So, yes, let it be two separate hooks then. I basically prepared that with mollom_form_list() and mollom_form_info() already.

re: 2) Note that this only applies to the current reporting functionality... the revamped reporting through existing delete confirmation forms in #717212: Remove "report to Mollom" links and integrate with entity delete confirmation forms instead makes this question obsolete. However, that's mostly D7 material only (although I'll try to backport the enhancements, so that we can use it for Webform already)

On it.

Status: Needs review » Needs work

The last submitted patch, mollom-DRUPAL-6--1.scale_.15.patch, failed testing.

Dave Reid’s picture

I think the idea of the two hooks makes a lot of sense. First one with an array of form_id => form name so we can populate the select list, second one for all the form's nitty gritty.

Dries’s picture

+1 for having 2 hooks. :)

sun’s picture

Status: Needs work » Needs review
+++ mollom.module	3 Mar 2010 21:57:39 -0000
@@ -477,53 +488,50 @@ function mollom_get_mode($form_id) {
+function mollom_form_info($form_id, $module) {
+  $form_info = module_invoke($module, 'mollom_form_info', $form_id);
...
+  $form_info += array(

$form_info can get NULL here in case a module gets disabled.

+++ mollom.module	3 Mar 2010 21:57:39 -0000
@@ -1736,31 +1763,46 @@ function comment_mollom_report_delete($e
+          'author_name' => 'name',
+          'author_mail' => 'name',

huh? I thought this was fixed already?

Powered by Dreditor.

sun’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
31.54 KB

Here we go.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, mollom-DRUPAL-6--1.scale_.22.patch, failed testing.

Dries’s picture

+++ mollom.admin.inc	3 Mar 2010 21:33:23 -0000
@@ -181,7 +181,9 @@ function mollom_admin_configure_form_nex
+  $mollom_form = mollom_form_info($form_id, $form_list[$form_id]['module']);

This patch looks good. The only thing that I'm not clear about is why we care about passing around the module name? The mapping could be stored somewhere and looked up. It is not something we need to change though -- it is OK as is.

I haven't tried the patch yet but the approach taken in this patch looks solid.

sun’s picture

Status: Needs work » Needs review
FileSize
31.54 KB

hum? Testbot is unable to apply patch? Strange. Let's try again.

re: passing $module/storing lookup mappings: We need to figure out which lookup mappings we need prior to implementing a look-up cache. From what I can tell, we need

1) a $form_id => $module lookup,
2) an "entity" => $form_id + $module lookup,

and with #717212: Remove "report to Mollom" links and integrate with entity delete confirmation forms instead, we additionally need

3) a "delete $form_id" => $form_id + $module lookup.

Note that 2) and 3) are basically substitutes of each other. When integrating with delete confirmation forms, we only need the 3rd lookup, since the 2nd is only required for the custom/generic "report to Mollom" form, which will then only be used for reporting plain session ids (i.e. only used for e-mails currently).

However, I'm not sure whether the integration with delete confirmation forms will be fully backported to D6, since it removes all the "report to Mollom" links that users currently expect. But then again, it makes Mollom's UI much more logical/usable/grokable. Further discussion about this belongs into #717212: Remove "report to Mollom" links and integrate with entity delete confirmation forms instead, of course.

I'd recommend to tackle the lookup cache in a follow-up patch or even separate issue.

Status: Needs review » Needs work

The last submitted patch, mollom-DRUPAL-6--1.scale_.25.patch, failed testing.

sun’s picture

Version: 6.x-1.12 » 6.x-1.x-dev
Status: Needs work » Needs review

errr, WTF?! It's trying to checkout and apply this patch against 6.x-1.12 ?!?

sun’s picture

#25: mollom-DRUPAL-6--1.scale_.25.patch queued for re-testing.

Dries’s picture

Agreed that we should tackle the look-up stuff in a follow-up issue. Let's see what the testbot has to say.

Dries’s picture

#22: mollom-DRUPAL-6--1.scale_.22.patch queued for re-testing.

Dries’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Needs review » Patch (to be ported)

Tested and reviewed. Committed to DRUPAL-6--1 HEAD. Needs to be ported to D7.

sun’s picture

Status: Patch (to be ported) » Reviewed & tested by the community
FileSize
32.49 KB
2.72 KB

Same patch for HEAD.

Plus a small follow-up for D6... visual synchronization is important, again and again :)

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed both patches. Great job, sun!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

  • Commit 9bc32fa on master, fai6, 8.x-2.x, fbajs, actions by Dries:
    - Patch #728718 by sun: hook_mollom_form_info() does not scale.
    
    

  • Commit 9bc32fa on master, fai6, 8.x-2.x, fbajs, actions by Dries:
    - Patch #728718 by sun: hook_mollom_form_info() does not scale.