Harvest records

Drupal-opac provides drush commands to harvest, update and delete records (OPAC nodes). This page covers the harvest command only. It assumes that you have already installed drush (version 5.0 is required) and know how to use it. Otherwise, visit the Drush project page to learn more.

Drush commands overview

The main OPAC module provides 4 harvest-related drush commands:

  • opac-harvest (alias harvest): Imports records into Drupal.
  • opac-update: Does exactly the same as the harvest command, but lets you specify parameters to sort and filter the nodes you want to update.
  • opac-servers-list: Shows the list of existing OPAC servers. Useful before launching a harvest if you don't remember the machine name of a server.
  • opac-purge: Removes OPAC nodes. Be careful: this command does not care about the origin of nodes. It only checks whether they are of the content type used by the OPAC module.

The OPAC authority submodule also provides a drush command:

  • opac-auth-harvest (alias auth-harvest): Like opac-harvest but for authorities.

Example of use

Here are a few examples of using the drush commands. For more information about parameters, type "drush help" followed by the name of the command. For example:

drush help opac-harvest

Should return:

Import a set of records into OPAC module.

Options:
 --batch-size                              Number of biblios harvested in each batch process. Default value is harvest_batch_size variable or 1000 
 --file                                    You must give the absolute path of a file. Allows to specify a file containing biblio ids to harvest.   
 --from                                    The first biblio number to harvest.                                                                     
 --print-id                                Print each biblio id                                                                                    
 --servers                                 A comma-separated list of servers for which to harvset.                                                 
 --to                                      The last biblio number to harvest.                                                                      




Harvest from all OPAC servers.
The --batch-size, --from and --to options are not defined here, so drush uses the values specified in the module.

drush harvest -v --print-id=1

-v enables drush's verbose mode, and --print-id prints each identifier during the harvesting process.


drush harvest -v --print-id=1 --create-only

With the --create-only parameter, drush harvest creates new nodes only. Existing nodes are skipped.




Harvest biblios 1 to 8000 from "my_server", asking for 1000 biblios per request:

drush harvest --servers=my_server --batch-size=1000 --from=1 --to=8000 -v --print-id=1

You can use the 'drush opac-servers-list' command to get all server identifiers.
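
If a large range has to be harvested in several separate runs, the commands can be generated ahead of time. The following is a sketch only: harvest_ranges is a hypothetical helper, not a drush or OPAC command, and it assumes that --from and --to are both inclusive. It only prints the commands, so they can be reviewed before being run:

```shell
# Hypothetical helper: prints one "drush harvest" command per id range.
# Usage: harvest_ranges SERVER FROM TO STEP
harvest_ranges() {
  server="$1"; from="$2"; to="$3"; step="$4"
  # Refuse a step below 1, which would never advance.
  [ "$step" -ge 1 ] || return 1
  start="$from"
  while [ "$start" -le "$to" ]; do
    end=$((start + step - 1))
    # Clamp the last range to the requested upper bound.
    if [ "$end" -gt "$to" ]; then
      end="$to"
    fi
    # Assumption: --from and --to are inclusive bounds.
    echo "drush harvest --servers=$server --batch-size=1000 --from=$start --to=$end"
    start=$((end + 1))
  done
}
```

For example, 'harvest_ranges my_server 1 8000 4000' prints two commands, one for biblios 1 to 4000 and one for biblios 4001 to 8000.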




Harvest biblios from a file.
The file must contain only biblio ids, one id per line, like this:

5
7
12
456
...
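
As a minimal sketch (not part of the OPAC module), a file can be checked against this format before harvesting. The helper name check_id_file is hypothetical, and it assumes that ids are plain integers, as in the sample above:

```shell
# Hypothetical helper: succeeds only if FILE holds one numeric id per line.
# Usage: check_id_file FILE
check_id_file() {
  file="$1"
  # Fail if the file is missing or empty.
  if [ ! -s "$file" ]; then
    echo "missing or empty file: $file" >&2
    return 1
  fi
  # grep -v selects the lines that are NOT a plain integer;
  # if any such line exists, report it (with its line number) and fail.
  if grep -n -v -E '^[0-9]+$' "$file" >&2; then
    return 1
  fi
  return 0
}
```

Blank lines, stray spaces and Windows line endings are all reported as invalid, which is the kind of problem that is easier to catch before a long harvest starts.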



Now you can type:

drush harvest --servers=my_server --file=/path/to/the/file -v --print-id=1

You can combine --batch-size with the --file parameter.

OPAC also provides one drush command for data checking:

drush help opac-check

Options:
 --duplicate-node                          Only possible value: keep-last-id. What to do when duplicate nodes are found. 'keep-last-id': Keep the node with 
                                           the greatest id, delete the others. Default is to do nothing and print a report.                                 
 --duplicate-record                        Only possible value: ask. What to do when duplicate records are found. 'ask': Ask what to do. Default is to do   
                                           nothing and print a report.                                                                                      
 --missing-node                            Only possible value: delete. What to do when a missing node is found. 'delete': Delete the opac_records entry.   
                                           Default is to do nothing and print a report.                                                                     
 --missing-record                          Only possible value: delete. What to do when a missing opac_records entry is found. 'delete': Delete the node.   
                                           Default is to do nothing and print a report.                                                                     
 --verbose                                 Be verbose. Print one line per inconsistency found.
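
Since every option defaults to printing a report without changing anything, a cautious workflow is to read the report first and apply a fix afterwards. The sketch below uses only options listed in the help output above. The function names and the DRUSH variable are hypothetical conveniences, and the --option=value syntax is assumed from the "Only possible value" wording:

```shell
# DRUSH is a hypothetical override: set DRUSH=echo to preview the
# commands instead of running them.
DRUSH="${DRUSH:-drush}"

# Report every inconsistency, one line each, without changing anything.
opac_check_report() {
  $DRUSH opac-check --verbose
}

# Delete the opac_records entries whose node is missing
# (the 'delete' behaviour described for --missing-node).
opac_check_delete_orphan_records() {
  $DRUSH opac-check --missing-node=delete
}
```

Running opac_check_report first shows what opac_check_delete_orphan_records would act on.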

Harvest plugins

By default, during the harvesting process, data coming from the connector is copied directly into node fields, without any particular processing.

Guide maintainers

Claire Hernandez