Reliability

To ensure our synchronization process does not break, we use the DataSync framework. This module lets us run our batch in a separate system process, which frees us from the PHP max execution time and lets us configure a higher memory limit. The framework also wraps its SQL in transactions (we patched the DataSync module to support PostgreSQL, and released the patches), which is useful when errors occur: we can roll back to a clean state instead of leaving a mess behind.
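As a rough illustration, a job run under those conditions could look like the sketch below. DataSyncJob, its methods, and the connection details are hypothetical, not the actual DataSync API:

    <?php
    // Illustrative sketch only: DataSyncJob and its methods are
    // hypothetical, not the real DataSync API.
    class DataSyncJob {
      public function run(): void {
        // Running in a separate system process, we can lift the limits
        // that would apply to a normal web request.
        set_time_limit(0);                // no max execution time
        ini_set('memory_limit', '512M');  // higher memory ceiling

        // Placeholder connection details.
        $db = new PDO('pgsql:host=localhost;dbname=sync', 'user', 'pass');
        $db->beginTransaction();
        try {
          $this->synchronize($db);        // the actual batch work
          $db->commit();
        } catch (Exception $e) {
          // Transactional SQL lets us roll back to a clean state.
          $db->rollBack();
          throw $e;
        }
      }

      private function synchronize(PDO $db): void {
        // ... batch synchronization logic ...
      }
    }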

Network overview

We use an asynchronous push/pull mechanism. The server pushes a synchronization order to its clients, and each client then queues a new DataSync job.
On the client side, when the job runs, it pulls the initial list of content to synchronize. While browsing the object tree, it pulls object dependencies from the server as needed.
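A minimal sketch of that client-side flow, with purely hypothetical endpoint paths and class names:

    <?php
    // Sketch of the client-side pull; endpoints and names are assumptions.
    class SyncClient {
      public function __construct(private string $server) {}

      public function runJob(): void {
        // Pull the initial list of content to synchronize.
        $list = $this->get('/sync/content-list');
        foreach ($list as $ref) {
          $this->pullObject($ref['type'], $ref['id']);
        }
      }

      private function pullObject(string $type, string $id): void {
        $object = $this->get("/sync/object/$type/$id");
        // While browsing the object tree, pull dependencies on demand.
        foreach ($object['dependencies'] ?? [] as $dep) {
          $this->pullObject($dep['type'], $dep['id']);
        }
        // ... save $object locally ...
      }

      private function get(string $path): array {
        return json_decode(file_get_contents($this->server . $path), true) ?? [];
      }
    }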

Data handling

Data handling is fully object oriented. Each type of data is represented by a class subclassing what we call an Entity. The Entity class is a simple transport object that provides internal UUID generation and validation methods.
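For illustration, such a transport object could look like the sketch below. The method names are assumptions, but the V4 UUID generation itself is standard:

    <?php
    // Minimal sketch of an Entity-style transport object; the real
    // class and method names may differ.
    abstract class Entity {
      protected ?string $uuid = null;

      // Generate a fresh identifier if the entity has none yet.
      public function ensureUuid(): string {
        if ($this->uuid === null) {
          $this->uuid = self::generateUuid();
        }
        return $this->uuid;
      }

      // Validate that a string is a well-formed version 4 UUID.
      public static function isValidUuid(string $uuid): bool {
        return (bool) preg_match(
          '/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i',
          $uuid);
      }

      protected static function generateUuid(): string {
        $bytes = random_bytes(16);
        $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
        $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant
        return vsprintf('%s%s-%s-%s-%s-%s%s%s',
          str_split(bin2hex($bytes), 4));
      }
    }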

Each object has a type (e.g. node) and an identifier (for a node, the nid); the synchronization link between sites is ensured by a V4 UUID generated at push time. UUID handling is hidden from custom entity developers and fully handled by the core framework.
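Building on the Entity sketch above, a custom entity developer would then only declare a type and an identifier; the framework assigns the UUID at push time. Names here are hypothetical:

    <?php
    // Hypothetical node entity: only a type and an identifier are
    // declared; the framework handles the UUID itself.
    class NodeEntity extends Entity {
      public function __construct(private int $nid) {}

      public function getType(): string { return 'node'; }
      public function getIdentifier(): string { return (string) $this->nid; }
    }

    // At push time the core framework does something like:
    $entity = new NodeEntity(42);
    $uuid = $entity->ensureUuid(); // generated once, shared by both sites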

Each specialized entity provides, at invocation time, its own list of dependencies (each dependency is also an entity, although only its type and identifier are exposed to the entity developer). It can also use a metadata registry to store external attributes (a configuration option, for example) that the client reads back at save time.
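A hypothetical specialized entity along those lines (the dependency and metadata method names are illustrative, not the real API):

    <?php
    // Illustrative specialized entity, reusing the Entity sketch above.
    class ArticleEntity extends Entity {
      public function __construct(
        private int $nid,
        private array $taxonomyTermIds,
      ) {}

      public function getType(): string { return 'node'; }
      public function getIdentifier(): string { return (string) $this->nid; }

      // Each dependency is itself an entity, but the developer only
      // exposes a (type, identifier) pair for each one.
      public function getDependencies(): array {
        $deps = [];
        foreach ($this->taxonomyTermIds as $tid) {
          $deps[] = ['type' => 'taxonomy_term', 'identifier' => (string) $tid];
        }
        return $deps;
      }

      // External attributes the client reads back at save time.
      public function getMetadata(): array {
        return ['promote_to_front_page' => true];
      }
    }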

On the client side, a class called EntityParser handles saving the Entity cluster, pulling information from the server when needed. It is optimized: it internally registers every object it saves, which avoids pulling some objects twice and saves bandwidth. It also performs circular dependency checks, ensuring failsafe cluster browsing at save time.
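The registry and circular dependency logic could be sketched roughly as follows; EntityParser's real internals surely differ, and the method names here are assumptions:

    <?php
    // Sketch of an EntityParser-style saver; names are assumptions.
    class EntityParser {
      private array $saved = [];      // "type:id" => true, avoids double pulls
      private array $inProgress = []; // entities currently being saved

      public function save(array $entity): void {
        $key = $entity['type'] . ':' . $entity['identifier'];
        if (isset($this->saved[$key])) {
          return; // already saved, skip the pull
        }
        if (isset($this->inProgress[$key])) {
          return; // circular dependency: stop recursing
        }
        $this->inProgress[$key] = true;

        foreach ($entity['dependencies'] ?? [] as $dep) {
          // Pull the dependency from the server only when needed.
          $this->save($this->pull($dep['type'], $dep['identifier']));
        }

        // ... persist $entity locally ...
        unset($this->inProgress[$key]);
        $this->saved[$key] = true;
      }

      private function pull(string $type, string $id): array {
        // ... fetch the entity from the server ...
        return ['type' => $type, 'identifier' => $id, 'dependencies' => []];
      }
    }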