Enabling drupal installs to post data on their installed components would be good for:

1. improving the rating/ranking of modules/themes on drupal.org, based on actual use, and
2. enabling drupal installs to query drupal.org for data on available updates, bugfixes, security issues

See this previous issue on the project.module: http://drupal.org/node/34108

The attached patch is designed as a first step in this direction.

Rather than a 'directory', the module would enable a listing of 'client' sites (which, of course, can still be the basis of a directory). Each client is identified by an id, assigned when the client first connects and returned to the client so it can be referenced in future communication. If a site chooses, it can post data on its installed modules/themes/theme engines, and also some summary stats (number of users and nodes), which will enhance the modules etc. filtering/ranking on drupal.org. Data on client sites' systems is stored in a new table, client_system.

Update info from drupal.org is not implemented directly here, but a window is left open for the XML-RPC server (e.g., drupal.org) to return messages to the client, which would be set via a new hook, hook_drupal_client_messages(). Hooks could be implemented in e.g. project.module or cvslog.module to fetch update info based on a client's system data in the client_system table. On the client site, any messages received from the server are output via the _help hook. A return array might look like:


$messages = array(
  'admin' => '<p>2 modules have updates available; see <a href="/admin/modules">modules page</a> for details.</p>',
  'admin/modules' => '<p>Priority updates available</p><ul><li><strong>product.module (part of the <a  href="/link">ecommerce module</a>)</strong><br /><a href="/downloadlink">Download updates</a></li><li>etc.</li></ul>'
);

with the results being displayed on the 'admin' and 'admin/modules' pages.

Details of the patch:

  • changes table 'directory' to 'client', added new fields as needed
  • creates 'client_system' table to store data on client systems (modules, etc.)
  • makes login and directory server functionality configurable, so that these don't have to be on for sites to use the module
  • adds ability to specify user name to associate client record on XML-RPC server
  • adds options for exporting (a) system info, (b) extended info to settings page
  • when options selected, system info and extended info (number of nodes, number of users) are sent to XML-RPC server
  • reworks validation a bit on settings page to remove redundant code, test for all required data at once

To test the patch:

  • patch drupal HEAD and install two fresh sites, one for a client and one for a server
  • config the client at admin/settings/drupal to point to the server's XML-RPC file
  • config the server at admin/settings/drupal to accept client connections
  • run cron.php on the client site repeatedly, testing the various options (system info disabled vs. enabled, ditto for extended info, etc.)
  • between crons, look at the 'client' and 'client_system' tables on the server site to see if they are filled.

For testing, you’ll need to temporarily comment out two lines in drupal_cron() on the site serving as the XML-RPC server, as follows


/**
 * Implementation of hook_cron(); handles pings to and from the site.
 */
function drupal_cron() {
  // if (time() - variable_get('drupal_cron_last', 0) > 21600) {
    variable_set('drupal_cron_last', time());

    // If this site acts as a Drupal XML-RPC server, delete the sites that
    // stopped sending "ping" messages.
    if (variable_get('drupal_client_service', 0)) {
      db_query("DELETE FROM {client_system} s INNER JOIN {client} ON s.cid = c.cid WHERE c.changed < '" . (time() - 259200) . "'");
      db_query("DELETE FROM {client} WHERE changed < '" . (time() - 259200) . "'");
    }

    // If this site acts as a Drupal XML-RPC client, send a message to the
    // Drupal XML-RPC server.
    if (variable_get('drupal_register', 0) && variable_get('drupal_server', 0)) {
      drupal_notify(variable_get('drupal_server', ''));
    }
  // }
}

Otherwise the repeated cron runs won’t work, as the system is waiting until a set interval has passed.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

nedjo’s picture

One weakness of the approach I've used here is that, to try to provide a basis for a later query "what are the updated versions available of this module/theme/engine?", the description sent to drupal.org includes the filemtime() of each system file on the client system. But that's very limited.

In http://drupal.org/node/37927 we're considering using hashes of each project release file. Could/should we use the same here--only of each .module, .theme. and .engine file? We could generate these on drupal.org and store them in the cvs_files table. Then we could post hashes as part of the system module data. These would then be compared with the data on drupal.org to find the exact match.

Does this sound right?

Dries’s picture

Status: Needs review » Needs work

Quick review:

  1. I don't see why we need an additional $cid. The URL is already unique, or should be. For security, the (URL, mail)-pair is slightly better. Clearly, a $cid is handy for processing queries, but I'm not sure we want to send it back to the client. It's easy enough to change your $cid and to impersonate someone else. Also, people flush their variables table. I don't trust a client side $cid, unless maybe, it's a random string of data but even then I would end up flusing my $cid.
  2. Please drop the drupal_client_messages stuff for now; I'm not convinced this the best approach.
  3. Why do we want to link the username to the drupal.org users table? I'd be very careful about this kind of stuff; it's easy enough to enter a fake user ID. I suggest we drop that the user_name/user database integration.
  4. Glueing partial strings to each other (eg. $missing[] = t('your site mission');) is problematic for translations. It is not something we can do. The error handling code needs to be rewritten.
  5. In user output, we don't abbreviate words; things like info should be information.
  6. In user output, it is Drupal not drupal.
  7. +      db_query("DELETE FROM {client_system} s INNER JOIN {client} ON s.cid = c.cid WHERE c.changed < '" . (time() - 259200) . "'");
    +      db_query("DELETE FROM {client} WHERE changed < '" . (time() - 259200) . "'");

    Always use %d. Also, c is not defined?

  8. Use single quotes only when it shortens the code. In other cases, it actually degrades readability. 'http://'. $server .'/xmlrpc.php' should be "http://$server/xmlrpc.php".
  9. Why do we send the updated time of the files? I suggest we drop that for now. We'll reconsider this when we have proper version support.
  10. Why do we send themes and modules that are not enabled? I suggest we just send the active themes/modules.
  11. The documentation mentions that the information will be listed. That is no longer true; the 'Drupal sites' page is offline. We might or might not list the information in future. I suggest we remove all references to the 'Drupal sites' page.
nedjo’s picture

Thanks for the thoughtful review Dries. It was a busy week and I didn't get a chance to focus on this. I'll work on a revised patch incorporating your suggestions.

What I've been hoping we might be able to do is to leave a door open to returning useful information to user sites on available updates etc. But, I agree, these things need to be carefully thought out and the window (another architectural metaphor) for doing so for 4.7 is closing. Perhaps developing some of this in a contrib module e.g. for CivicSpace over the 4.7 cycle would be useful, with a mind to refining and then moving features to core for 4.8.

Cvbge’s picture

+ db_query("DELETE FROM {client_system} s INNER JOIN {client} ON s.cid = c.cid WHERE c.changed < '" . (time() - 259200) . "'");
+ db_query("DELETE FROM {client} WHERE changed < '" . (time() - 259200) . "'");

Always use %d.

Are there any reasons to use %d? I imagine it's possibile to redefine time() functions, but if someone can run code to do that, he can do much worse things... It might look a bit nicer though...

nedjo’s picture

FileSize
20.3 KB

Here's a revised patch following most of the suggestions.

* $cid is retained, but not returned to client.
* no messages returned to client
* no user id used
* data on client system limited to enabled components, changed time not sent
* other minor fixing up.

nedjo’s picture

FileSize
23.85 KB

The last patch is my best attempt at a "do the minimum" approach, which is probably what's needed.

But for the sake of argument - or perhaps to begin sketching in what will probably be future improvements - here is another take on the "do the minimum, while still leaving a door open to sending client sites updates" approach.

Added to the previous patch is:

* configuration option to designate server as offering update information (by email). this information is by means of subscription.
* if this option is set on the server, a token is returned to the client, which then displays a message on the settings page. either (a) there is already an account for the site email on the XML-RPC server, in which case the client is prompted to log in, or (b) there is no existing account for the site email, so the client is prompted to register.
* once registered and logged in on the XML-RPC server, the client owner sees a new section on his/her user account edit page, 'Client site updates', where he/she can opt in.

So, this leaves a door open, without committing us to anything. If/when we're ready, we flip a switch on drupal.org and client sites can begin subscribing.

But - while perhaps better than my previous suggestion - this is only one of a number of possible approaches and I accept that we'll probably have to wait for 4.8 to do anything along these lines.

Cvbge’s picture

Hi,

If you change database.mysql, please provide changes to database.pgsql and update.inc

Does table client_system really need to have primary key (cid, name)? You seem to use cid only

Cvbge’s picture

Furthermore:

I don't think DELETE FROM {client_system} s INNER JOIN {client} c ON s.cid = c.cid is standard sql

nedjo’s picture

Here's an updated version of the patch at comment #5. I've added pgsql and update versions. For the update, I've just dropped the old directory table and added client and client system. This assumes there's no needed information in the old tables (that won't be replaced when clients reconnect). Valid assumption?

Also, I realize I haven't considered what will happen when old drupal.module versions try to connect. They'll be sending data in a way that the new drupal.module won't understand.

I don't think DELETE FROM {client_system} s INNER JOIN {client} c ON s.cid = c.cid is standard sql

Yep, thanks, it was wrong, now reworked.

Does table client_system really need to have primary key (cid, name)? You seem to use cid only

cid by itself won't be unique. The unique combination will be cid, name. (Unless we can have a module with the same name as a theme. Is this possible? If so, the primary key should be a combination of the three fields.)

nedjo’s picture

Forgot to save my last changes before generating the patch, here's the right version.

Dries’s picture

I'd like to go with the minimal version of this patch.

Just wondering, what will happen when someone sends an "old style ping"? This shouldn't generate errors, or render the database inconsistent. Looks like the patch needs another version to make this work right.

What happens when I change the name of my site? A new entry is inserted, and the old entry will expire, or will we end up with two entries?

nedjo’s picture

FileSize
23.52 KB

Here's an (minimal version) updated patch. The only change (besides moving and renaming the update function) is an added test at the beginning of function drupal_client_ping():


  // Pre 4.7 versions sent data in a different format, no longer supported.
  if (!is_array($client)) {
    return FALSE;
  }

Just wondering, what will happen when someone sends an "old style ping"? This shouldn't generate errors, or render the database inconsistent. Looks like the patch needs another version to make this work right.

What I've done (change noted above) is return FALSE when past client versions attempt to register.

We could, alternately, add in support for past versions, but it would be a bit complicated. There were more arguments passed in previously, and of different data types. Since XMLRPC wants us to define the data types, I'm not sure how this would work. Maybe we would need two separate methods, one now deprecated but still supported for backward compatibility.

What happens when I change the name of my site? A new entry is inserted, and the old entry will expire, or will we end up with two entries?

Changing the name won't lead to two entries--the old name will be replaced with the new. Changing the base url will lead to two entries--until the old one expires due to lack of pings.

jvandyk’s picture

The patch is a bit broken. Two updates in updates.inc have the same function name. Also, the update_sql functions requre closing parentheses.

At a higher level, I must ask: why are we tieing all these functions together in one module? Why is "let users from other drupal sites log into my drupal site" here along with "send information about my drupal site to drupal.org?"

It seems to me the remote authentication functions should be in a separate module, where you can configure which sites you want to allow to log in to your site.

The sending of configuration information to drupal.org (and possibly future updating functionality) should be a totally separate module. It has nothing to do with remote authentication.

Under "Enable Services" - "Client Connections" we have an enable/disable checkbox with the descriptoin "If enabled, your Drupal site will allow other sites to register as clients" means nothing to me. Clients of what?

puregin’s picture

I agree with jvandyk that the drupal.org module seems to be a bit of a mishmash of functionality at this point. Perhaps this should be refactored into a couple of single-purposed modules.

Djun

jvandyk’s picture

FileSize
24.17 KB

Here is an updated patch that fixes the above issues and improves the help and descriptions. It does not yet move the remote authentication code into a separate module.

I propose moving the remote authentication stuff into remoteauth.module and making the enable/disable remote authentication setting available on a tab when viewing user settings (when remoteauth.module is enabled).

nedjo’s picture

FileSize
26.69 KB

Thanks John for your insightful comments and detailed suggestions. It really helps to have someone go over a patch with such a careful eye to detail. Your suggestions certainly improve the "plain languageness" of the help text and field labels.

This revised patch incorporates John's suggestions and fixes (some I've adapted). I've also gone through the code again and made a number of small additional changes, mainly for clarity and consistency. Of note:

  • Introduced a new theme function for the client listing. I also converted an inline CSS instruction to a class name ('indented'), adding the class name to an existing item in drupal.css (not sure if I chose appropriately). Yes, this listing functionality isn't currently in use on drupal.org, but it could be useful on other sites using the Drupal module and, if it's there, it should be consistent with other Drupal code.
  • Changed the module description to emphasize its use in ranking modules and themes. New text: "Lets you register your site with a central server and improve ranking of Drupal projects by posting information on your installed modules and themes; also enables users to log in using a Drupal ID." (But is this too long/wordy?)

Why is "let users from other drupal sites log into my drupal site" here along with "send information about my drupal site to drupal.org?"

I assume what originally tied these together was that they are both implementations of inter-site communications.

I propose moving the remote authentication stuff into remoteauth.module and making the enable/disable remote authentication setting available on a tab when viewing user settings (when remoteauth.module is enabled).

Moving remote authentication might indeed make sense. It would obviously be a different (probably subsequent) patch.

Dries’s picture

I too had problems with the words (and variable name) 'client service'. It's pretty meaningless. I suggest we call it 'ping' or 'ping service'.

$result = db_query("SELECT cid FROM {client} WHERE changed < '%s'", time() - 259200); should use %d.

$result = db_query('SELECT * FROM {client} ORDER BY '. $sort); looks prone to security problems. I know, it is part of the original code, but if possible, I'd use %d or %s.

As for theme_client_item(), I think it's better to remove the CSS and to use simple definition lists.

I'd use a define for the version:

define(VERSION, '4.7.0');
'version' => VERSION

rather than:
'version' => variable_get('drupal_version', '4.7')

Otherwise the code looks good! If you can provide one more revision, we can commit this to HEAD. Great work, Nedjo.

nedjo’s picture

FileSize
26.33 KB

Here is a new patch. I've:

  • altered the fieldset labels in the settings page, dropping 'client' and 'server' references (now closer still to what John had suggested)
  • used a definition list for the theme function.
    (thought: should we instead consider extending theme_item_list() so it could output dl lists as well as ol and ul ones?)
  • used a define for the version.
    (thought: should this be in common.inc or possibly boodstrap.inc instead, so that it's available to other modules?

Maybe we're there, but if not I'm happy to make further tweaks.

Dries’s picture

Status: Needs work » Fixed

Thanks Nedjo. You missed several suggestions so I fixed these myself and committed the patch. This new feature will help us improve the drupal.org experience.

Cvbge’s picture

Category: feature » bug
Status: Fixed » Reviewed & tested by the community
FileSize
2.18 KB

Fixed:
1. db_next_id('client') -> '{client}_cid'
2. docs: there is no more drupal_directory_page(), it's named drupal_client_page()

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Once again; good catch. Committed. Rock on.

Cvbge’s picture

Priority: Normal » Minor
Status: Fixed » Reviewed & tested by the community
FileSize
1.72 KB

Last one, I hope.
Added missing $ret[]= to update_sql()

markus_petrux’s picture

FileSize
1.39 KB

The attached patch fixes the updates.inc so the new table names are enclosed between curly brakets {}.

markus_petrux’s picture

FileSize
1.43 KB

Sorry, this one includes the $ret = fix posted above (since it was not in cvs I missed it).

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD.

Anonymous’s picture

Status: Fixed » Closed (fixed)