background
The sf_import module has been in dev for some time, but until now it has only offered one-time imports of Salesforce data. While this feature has proved invaluable for me, I have wanted to be able to set up such imports to run regularly and break my dependence on Salesforce Outbound Messages (sf_notifications) - which have many problems of their own.

solution
The attached patch extends sf_import so that administrators can designate import configurations to be performed as part of regular cron runs. After the initial batch API import, sf_import will process one import configuration per cron run on an ongoing basis. In this way, Drupal data are kept in sync with Salesforce data, without having to rely on incoming data from Salesforce and the perils that come with it.

In short, this patch adds a polling feature for imports.
Given an import configuration, on each cron run, Drupal asks Salesforce "do you have any updates for a specific record type since my last cron run?"
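
To make that question concrete, here is a minimal sketch of how such a polling query might be built. It is not the code from the patch: the function name sf_import_example_poll_query() and the field list are hypothetical, and it assumes the last run time is stored as a Unix timestamp.

```php
<?php
/**
 * Hypothetical helper (not part of sf_import): build a SOQL query that asks
 * for records of one type modified after the last cron run.
 */
function sf_import_example_poll_query($type, array $fields, $last) {
  // SOQL dateTime literals are unquoted ISO 8601 values, here in UTC.
  $since = gmdate('Y-m-d\TH:i:s\Z', $last);
  return sprintf(
    'SELECT %s FROM %s WHERE LastModifiedDate > %s ORDER BY LastModifiedDate ASC',
    implode(', ', $fields),
    $type,
    $since
  );
}

// Example: a first poll, where no previous run has been recorded (timestamp 0).
$query = sf_import_example_poll_query('Contact', array('Id', 'LastName'), 0);
print $query . "\n";
```

Ordering by LastModifiedDate means the stored "last run" value can be advanced record by record as results are processed.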

known limitations
The most notable limitation that comes to mind is around dependent record types.
For example, what happens when importing a Salesforce lookup field (mapped to a nodereference field) for which a Drupal object doesn't yet exist?
At its core, this is a limitation within sf_import, and applies to one-off imports as well as ongoing imports.

Various TODOs are documented in the body of the patch.

Patch posted here to hopefully get some reviews. This will be going into dev in a week or two either way.

Comments

joeybaker’s picture

This sounds great!

However, I'm not able to apply the update. I get an error message on update.php:

warning: array_merge() [function.array-merge]: Argument #2 is not an array in /update.php on line 174.

aaronbauman’s picture

Attached file (status: new, size: 21.41 KB)

this warning will not affect application of the patch - it's simply that the hook_update_N implementation didn't return anything.
you should see a new database table {salesforce_import}.
here's an updated patch anyway.

joeybaker’s picture

Hey Aaron–

Thanks for the update! I am now either being a klutz or there's another bug. I am able to apply the update, but I get the result:

The following queries were executed
sf_import module
Update #6000
No queries

…and the table is not added to the database.

dpearcefl’s picture

I don't know if this idea and my work on sf_import recently may help you, but you may want to look at these other issues:
#1137796: Update Drupal users/nodes only if Salesforce is providing new values for fields - Speeds things up by stopping unneeded node_save() calls
#1138154: sf_import_batchjob performance improvement - Massive speed up and reduction in API calls

kostajh’s picture

We are also looking at ditching Outbound messages. I will spend some time testing and working on this, thanks for writing the code! Is it possible to push a branch for this issue to the git repo so that we can work collaboratively on a branch instead of via patches that may not contain everyone's work?

EDIT: Regarding git workflow, I found this instead: Advanced patch contributor guide

kostajh’s picture

Attached file (status: new, size: 21.89 KB)

I also had problems installing the db schema. In hook_schema, $schema wasn't being returned. Attached patch contains this fix. Also, wraps "Submit" and "Delete" in t().

kostajh’s picture

Attached file (status: new, size: 22.21 KB)

Here's another updated version that also includes patch from #1138964: On import: No failures but no results displays a single dot. In this patch the number for hook_update_N is set consistently with the other modules; drupal_write_record is used in place of db_query; the wording on "linking" is adjusted to reflect that any entities can be linked, not just nodes.

I'm going to work on the UI - you should be able to create an import configuration without running the batch processing right away, and should also be able to run the import from the index page without waiting for cron. Also, error handling could be improved - the batch process fails if there is an exception returned from SF, but I don't think this should stop the import.

kostajh’s picture

Status: Needs review » Needs work

I have to move on to other things for the rest of the day, but just a quick note to say that this isn't working for users. I haven't tested with nodes. After updating 94 records, the batch process returns an error with no detailed information. I started adding some debug statements but didn't see anything obvious as to why this was occurring.

For those testing with the patches in this issue, be sure to run git rebase origin/6.x-2.x to get the latest changes from sf_import-6.x-2.x-dev.

kostajh’s picture

One more thing: in the Salesforce PHP Toolkit (SforceBaseClient.php), we have a "getUpdated" function that might be helpful in the implementation of hook_cron().

	public function getUpdated($type, $startDate, $endDate) {
		$this->setHeaders("getUpdated");
		$arg = new stdClass;
		$arg->sObjectType = new SoapVar($type, XSD_STRING, 'string', 'http://www.w3.org/2001/XMLSchema');
		$arg->startDate = $startDate;
		$arg->endDate = $endDate;
		return $this->sforce->getUpdated($arg)->result;
	}

kostajh’s picture

Priority: Normal » Major
Status: Needs work » Needs review
Attached file (status: new, size: 825 bytes)

Here's an updated patch with the following changes:

  • Removes the "extra-linked" option since this isn't used
  • Add fieldmap description to the page where you create a new import configuration
  • Display the import results even if there were failures on import

I've been doing some testing with users and so far it works great if the import configuration is invoked via cron.

However, on creating a new import configuration or editing an existing one, and starting the batch process, Drupal will start updating 100 or so users before failing with errors. I haven't been able to figure out why that's occurring.

kostajh’s picture

Attached file (status: new, size: 22.04 KB)

Sorry, wrong patch attached, try this one.

kostajh’s picture

Another issue with this. Using the Apex Data Loader I made updates to 250 Contacts. Then I ran cron, which triggered an import. After processing 145 records, I got the error "Cron run exceeded the time limit and was aborted." The expected behavior would be for the remaining 105 Contacts to be imported on the next run. Instead, sf_import starts all over again.

I added a debug statement just below this bit in sf_import_cron():

  while ($context['finished'] < 1 && time() < $request_time + $limit) {
    sf_import_batchjob($object->name, $conf, $context);
  }

But it is never called if cron exceeds the time limit. So I think the db_query to update the LastModified time needs to happen in that while loop.

I tried the following and it works but I'm sure there is a better way.

  while ($context['finished'] < 1 && time() < $request_time + $limit) {
    sf_import_batchjob($object->name, $conf, $context);
    // Save progress after every batch so that a cron timeout does not
    // restart the import from the beginning.
    $query = $context['sandbox']['salesforce']['query'];
    $pos = $context['sandbox']['position'];
    // The last record processed in this batch.
    $record = $query->records[$pos - 1];
    $object->last = strtotime($record->LastModifiedDate);
    $object->conf['cron-remaining'] = $query->size - $pos;
    db_query("UPDATE {salesforce_import} SET conf = '%s', last = %d WHERE id = %d", serialize($object->conf), $object->last, $object->id);
  }

Also, further down in sf_import_cron there is a reference to $query_array, but that is never defined in sf_import_cron - it's in sf_import_batchjob.
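
For anyone following along, the checkpointing idea can be sketched outside Drupal. This is only an illustration under stated assumptions: plain arrays stand in for the Salesforce query result and for the {salesforce_import} row, a per-run record cap stands in for the cron time limit, and example_import_run() is a made-up name. It also assumes records arrive sorted by modification time with distinct timestamps; records sharing a timestamp would need extra care.

```php
<?php
/**
 * Hypothetical stand-in for one cron run: process records until the limit is
 * reached, saving a checkpoint after every record.
 */
function example_import_run(array $records, array &$state, $max_per_run) {
  $processed = 0;
  foreach ($records as $record) {
    // Skip records already imported in an earlier run.
    if ($record['modified'] <= $state['last']) {
      continue;
    }
    // Simulates cron hitting its time limit.
    if ($processed >= $max_per_run) {
      break;
    }
    // ... the real import of $record would happen here ...
    // Checkpoint immediately, so an aborted run resumes from this point.
    $state['last'] = $record['modified'];
    $processed++;
  }
  return $processed;
}

// 250 updated records, with the timestamp simplified to a counter.
$records = array();
for ($i = 1; $i <= 250; $i++) {
  $records[] = array('id' => $i, 'modified' => $i);
}

$state = array('last' => 0);
$first = example_import_run($records, $state, 145);
$second = example_import_run($records, $state, 145);
print "$first then $second\n";
```

With the checkpoint saved inside the loop, the second run picks up the remaining 105 records instead of starting over.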

kostajh’s picture

Follow up to #9 above: we should also look at implementing getDeleted which will return Salesforce records deleted within a given time frame.

kostajh’s picture

Attached file (status: new, size: 21.99 KB)

This patch includes previous work, and fixes a SQL error when inserting a new import configuration.

kostajh’s picture

Attached file (status: new, size: 18.97 KB)

This patch applies cleanly to 6.x-2.x-dev after the latest changes.

kostajh’s picture

Attached file (status: new, size: 2.98 KB)

I ended up rewriting this completely. I needed to get something functional and stable put together quickly, and what is attached works, whereas I was having a lot of difficulty with the approach and patches from earlier in this issue.

amariotti’s picture

Wow. I'm a total newb to this whole git stuff. Can someone patch the file for me and upload it or point me to somewhere that helps me out with this? I have git installed on my server and tinkered with it but couldn't get it to apply without this error:

1135752-cron-import-16.patch:29: trailing whitespace.
  
1135752-cron-import-16.patch:42: trailing whitespace.
    
1135752-cron-import-16.patch:47: trailing whitespace.
    
1135752-cron-import-16.patch:62: trailing whitespace.
      } 
1135752-cron-import-16.patch:71: trailing whitespace.
    
Checking patch salesforce_api/salesforce_api.module...
error: while searching for:
  }
  $sf = salesforce_api_connect();
  if ($sf) {
    $response = $sf->client->getUpdated($type, $start, $end);
    if ($response->ids) {
      return $response;
    } else {

error: patch failed: salesforce_api/salesforce_api.module:1313
error: salesforce_api/salesforce_api.module: patch does not apply
Checking patch sf_import/sf_import.module...

kostajh’s picture

Attached file (status: new, size: 11.07 KB)

Here's a new patch - I had the order reversed for salesforce_api_get_id_with_sfid.

@amariotti: download the patch file to the root of your Salesforce directory, then run "patch -p1 < name-of-patch.patch". There are docs on patching here

amariotti’s picture

Thanks. Not sure why I missed that. :)

Here's the error I got this time, but everything appears to be OK on the Drupal side.

patching file salesforce_api/salesforce_api.module
Hunk #1 FAILED at 1313.
1 out of 1 hunk FAILED -- saving rejects to file salesforce_api/salesforce_api.module.rej
patching file sf_import/sf_import.module
Hunk #1 succeeded at 219 (offset -2 lines).
patching file sf_node/sf_node.module

kostajh’s picture

Hold on, that last patch was no good.

kostajh’s picture

Attached file (status: new, size: 4.32 KB)

Try this one please. You'll have to disable sf_import, uninstall it, then re-enable the module for the database schema to be installed.

kostajh’s picture

Attached file (status: new, size: 4.31 KB)

Another minor update: should correctly process all fieldmaps with automatic create/update settings in place.

amariotti’s picture

I've set this up, but it appears that I can't really take advantage of this given the way I went about getting my data into Drupal. Do I need to start over and start from this point? I just struggle since I went to so much trouble getting the 9k records into Drupal in the first place. Recommendations?

kostajh’s picture

Attached file (status: new, size: 7.38 KB)

Here's an updated patch. I refactored the code in hook_cron for greater flexibility, and added a drush command, "sf-get-updated".

kostajh’s picture

@amariotti: You should be able to work with this patch if the salesforce_object_map table contains correct mappings between Drupal object IDs and SFIDs. What are the problems you are having?

kostajh’s picture

Attached file (status: new, size: 7.29 KB)

Updated: removed an extraneous watchdog call.

amariotti’s picture

@kostajh, can this be applied on top of the previous patches?

kostajh’s picture

Nope, please revert the previous patch applied (https://drupal.org/patch/reverse) then apply this one.

amariotti’s picture

Reversed and then applied the new patch...

Here's the error that I get back. Is it possible to run the sf-get-updated on specific field maps? I tried to run it using one fieldmap (using the long unique string) and it looks like it checked everything.

Here was my output after running the following command:
drush sf-get-updated 25bec3bec36bf0133ec0ace85cad22c1

Checking for updated records...
Returned 3 record(s)
 SFID                Fieldmap                          Time Imported 
 001G000000f4LKvIAM  c4ca4238a0b923820dcc509a6f75849b  1305220561    
 001G000000f4LL9IAM  c4ca4238a0b923820dcc509a6f75849b  1305220561    
 001A000000co7rkIAA  c4ca4238a0b923820dcc509a6f75849b  1305220561    

Processing records...
WD php: Duplicate entry '69466' for key 2                                                                                                                                                                    [error]
query: INSERT INTO users (mail, name, created) VALUES ('email@hisemail.net', '69466', 1305220561) in /home/user/public_html/modules/user/user.module on line 329.
WD salesforce: Salesforce returned an unsuccessful response: stdClass Object                                                                                                                                 [error]
(
    [created] => 1
    [errors] => stdClass Object
        (
            [fields] => Array
                (
                    [0] => LastName
                    [1] => Join_Date__c
                    [2] => Member_Expiration_Date__c
                )

            [message] => Required fields are missing: [LastName, Join_Date__c, Member_Expiration_Date__c]
            [statusCode] => REQUIRED_FIELD_MISSING
        )

    [id] => 
    [success] => 
)

WD salesforce: Salesforce returned an unsuccessful response: stdClass Object                                                                                                                                 [error]
(
    [created] => 1
    [errors] => stdClass Object
        (
            [fields] => Array
                (
                    [0] => LastName
                    [1] => Join_Date__c
                    [2] => Member_Expiration_Date__c
                )

            [message] => Required fields are missing: [LastName, Join_Date__c, Member_Expiration_Date__c]
            [statusCode] => REQUIRED_FIELD_MISSING
        )

    [id] => 
    [success] => 
)

Processed 3 record(s)
 SFID                OID    Fieldmap                         
 001A000000co7rkIAA         c4ca4238a0b923820dcc509a6f75849b 
 001G000000f4LL9IAM  25439  c4ca4238a0b923820dcc509a6f75849b 
 001G000000f4LKvIAM  25440  c4ca4238a0b923820dcc509a6f75849b 

Duplicate entry &#039;69466&#039; for key 2                                                                                                                                                                  [error]
query: INSERT INTO users (mail, name, created) VALUES (&#039;email@hisemail.net&#039;, &#039;69466&#039;, 1305220561) in /home/user/public_html/modules/user/user.module on line 329.
amariotti’s picture

Now that I have drush running should I wipe out my users that are syncing from SF and do a fresh import using the drush sf-import command? I just need this to start working and syncing properly. Each time I run this it pulls a new set of 3 users and says that they can't be synced because of duplicates. I'm assuming that it's because the UIDs are matching other ones. Any ideas how to troubleshoot this or is starting over the best solution? I don't have a problem with going that route, especially since I know that SF still has all of the data that it needs. It's not depending on Drupal for anything. Thoughts?

dpearcefl’s picture

FYI: I have a project waiting to be approved that implements "Advanced Actions" for Salesforce Imports and Exports. This will allow CRON to be a trigger for the action you define.

amariotti’s picture

@dpearceMN: Is that response for me?

dpearcefl’s picture

It was more a general post of another solution possibility with working code.

joeybaker’s picture

Hey all–

I'm a bit confused about the current status of this patch. I've got it applied, but I see no GUI to change settings. Did I gloss over the part where drush is now required, or am I just blind and foolish?

kostajh’s picture

The patch in #26 implements hook_cron, so that every time cron runs, sf_import will take fieldmaps with "Sync on create" or "Sync on update" checkboxes ticked and look for updates in Salesforce, and import them. There is no UI aside from that.
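
As a rough illustration of that selection step (not the actual code in the patch), the sketch below filters fieldmaps by their automatic sync flags. The constant names and values, the 'automatic' key, and example_cron_fieldmaps() are all assumptions made for the example; the module's real constants and data structure may differ.

```php
<?php
// Hypothetical flag values; the module's real constants may differ.
define('EXAMPLE_SYNC_CREATE', 1);
define('EXAMPLE_SYNC_UPDATE', 2);

/**
 * Hypothetical helper: return the names of fieldmaps that have "Sync on
 * create" or "Sync on update" enabled, i.e. the ones cron should poll.
 */
function example_cron_fieldmaps(array $fieldmaps) {
  $selected = array();
  foreach ($fieldmaps as $name => $map) {
    // A bitwise AND tests whether either flag is set.
    if ($map['automatic'] & (EXAMPLE_SYNC_CREATE | EXAMPLE_SYNC_UPDATE)) {
      $selected[] = $name;
    }
  }
  return $selected;
}

$fieldmaps = array(
  'contacts' => array('automatic' => EXAMPLE_SYNC_CREATE | EXAMPLE_SYNC_UPDATE),
  'accounts' => array('automatic' => 0),
  'leads' => array('automatic' => EXAMPLE_SYNC_UPDATE),
);
$selected = example_cron_fieldmaps($fieldmaps);
print implode(', ', $selected) . "\n";
```

Only the fieldmaps returned here would be checked for Salesforce updates on a given cron run.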

The original patch posted in this issue (all the way up to comment #15) takes a different approach, and has a UI for creating automatic import tasks, but I was never able to get it working well for users.

At some point we should decide which approach we'd like to take and then move forward with development so we can get this into -dev.

joeybaker’s picture

gotcha! Thanks!

I'll have to test that to see if it works, but I'd certainly say it's not the most intuitive UI. Might be a good idea to at least change the wording on those checkboxes to make it clear that they will also operate on cron.

kostajh’s picture

Status: Fixed » Needs review
Attached file (status: new, size: 15.51 KB)

Ok, here is a patch that addresses concerns from above. I'm committing this to dev and will tag an alpha release later today or tomorrow.

After applying the patch, make sure to disable then re-enable sf_import as the menu structure has changed.

In this patch:

  • New settings page. There is now an "Import Settings" tab under Import, along with the "Batch Import" page from before
  • On the Import Settings page (admin/settings/salesforce/import), you can select which fieldmaps you want to use for regular imports from Salesforce
  • Once you have selected the fieldmaps you want to use, you can see which records have been updated since the last update was run.
  • If updates are pending, you are also given the option to manually trigger an import.
  • There is a status message at the top of the Import page which tells you the time of last import, along with how many records were imported, and how many were processed
  • Basic documentation via hook_help()

kostajh’s picture

Status: Needs review » Fixed

Status: Needs review » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.