Managing the whole chain, Dev -> Integration -> Staging -> Production, is a major issue for Drupal. It was one of the conclusions of DrupalCon 2007. As we discussed during the "Building High Traffic and Scalable Businesses with Drupal" session, it is imperative for us to have migration methods and tools.

At France24, we are switching massively to Drupal. At the end of the year, the france24.com website and several satellite sites will be running Drupal 5.2. We need a simple architecture that allows our team of five developers:

- to work on the same code repository and database

- to test new features on an integration environment

- to publish content and edit configuration on a staging environment

- to easily migrate to production, or "synchronise" with the production environment.

Today, most of the Drupal developers I have asked this question answer: "we work directly on the production server, manually carrying over the configuration changes we've made on a staging environment..."

At France24, we spent two months finding a simple solution to this problem. We have set up a complete dev -> integration -> production environment that meets our needs. This platform has been in use by our team for a few days now. It seems to work correctly, but we are still testing it. If you want to join the experiment and share, you are welcome.
Synthesis

So here is our solution.

We have based the synchronisation mechanism on:

- MySQL Replication

- Even & Odd Database Identifiers

Each developer has his own PHP / DB environment for development and testing purposes. All the PHP code is committed by the different developers to a central SVN repository and continuously updated on an integration platform. Every modification of the configuration (blocks, menus, views, ...) is made on the integration platform (in the integration DB) using the Drupal GUI. We have created a special module that hacks the DB access: every record created in the integration DB gets an odd id. At the same time, the production website uses its own DB. The DB access module we have created generates even ids in the production DB. Using this method, the production website can permanently feed the integration platform.
For this, we use MySQL replication from production to integration. The other direction is blocked. By using even and odd ids, we are sure to avoid record conflicts.
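To illustrate (the numbers are hypothetical, but the {sequences} table and the 'node_nid' row are Drupal 5's own): with a step of 10 and a one-time offset, the two id streams interleave and can never collide, so rows replicated from production always slot into the integration DB safely.

-- integration DB: ids keep an odd last digit
SELECT * FROM sequences WHERE name = 'node_nid';  -- ('node_nid', 31): next nodes get 41, 51, 61, ...

-- production DB: ids keep an even last digit
SELECT * FROM sequences WHERE name = 'node_nid';  -- ('node_nid', 32): next nodes get 42, 52, 62, ...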

Each time we make a new release of a website, we dump the integration db, make a new folder with the new PHP code, create a new settings.php that references the new db, test it for a short while on a specific virtualhost on the production platform, and, if everything is ok, change the www alias of our webserver so that it points to the new version of the website.
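A minimal sketch of such a release, assuming the www alias is a symlink and using hypothetical database names, paths and SVN URL:

# 1. Clone the integration DB into a version-specific database
mysqldump -u root -p integration_db > /tmp/release-42.sql
mysqladmin -u root -p create site_v42
mysql -u root -p site_v42 < /tmp/release-42.sql

# 2. Check out the new code next to the previous releases, point its
#    settings.php at site_v42, and test it on a dedicated virtualhost
svn export http://svn.example.com/site/tags/v42 /var/www/releases/v42

# 3. Go live; rolling back is just pointing the symlink at the old release
ln -sfn /var/www/releases/v42 /var/www/www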

Using this method:

- we have a staging platform that is tested permanently (the integration platform)

- the switching period is quite short (changing an alias)

- we can roll back very easily by changing the www alias of our website

- we make a backup of the integration db twice a day, so that we can roll back in case of an unresolved blank page bug... :) (see the crontab sketch below)
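For what it's worth, the twice-daily dump can be a simple cron job (user, password and paths below are hypothetical):

# /etc/crontab: dump the integration DB at 06:00 and 18:00
# (note that % must be escaped in crontab files)
0 6,18 * * * root mysqldump -u backup -pSECRET integration_db | gzip > /var/backups/integration-$(date +\%Y\%m\%d-\%H).sql.gz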

Authors:
Michel Lévy-Provencal: Internet Technical Manager (mlevy-provencal@france24.com)
Corentin Delorme: Drupal Developer
Elbou Elbechir: Drupal Developer
Luc Mombrand: Drupal Developer

If you need more technical information on the solution or want to share ideas, you can contact us by email.

Comments

ronan’s picture

The even/odd id idea is a really creative solution to a problem that affects everybody who works on drupal professionally (as well as responsible amateurs). I'm glad to see that the Drupal community has started to commit brainpower to solving this. Thanks for sharing your ideas.
------------------------------------
Ronan - Gorton Studios - http://www.gortonstudios.com/

jwalling’s picture

In 1990, I configured two Data General minicomputers (A & B) to journal transactions, which were communicated bi-directionally. We used computer A for production and computer B for backup and batch reporting. Computer A indexed tables with even numbers and computer B indexed tables with odd numbers. If computer A crashed, users were moved to computer B with a digital switch. When computer A was running again and all the B transactions had been applied to A, the users were moved back to computer A.

The journals were linear based on time stamps. We could journal several days which provided adequate time to restore the crashed computer.

If Drupal had (has?) a way to journal transactions, the above technique could be equally useful. Perhaps a little tweaking would make it possible.

John

ryan_courtnage’s picture

Is your module going to be released to the public domain? It'd be great to get the community rallying around it!!

Thank you!!

mikiane’s picture

We are going to publish a little more information about the method next week.
The code will also be published as a contrib.

We'll add the links in this thread.

Michel Lévy-Provençal

kasperjames’s picture

I assume a list including Bazaar and other SQL database installations will be attached to that as well? As a newcomer, I have learned that I have a great knack for this kind of resource utilization.

I would be interested in contributing.

thanks,
kj

Thanks for your help and support:
www.kasperjames.com/

pescobar’s picture

hi there,

I'm very intrigued by this process and appreciate you posting about your methods... it's something that I've been struggling with for a while now.

I've read about using the even/odd trick to provide master-master MySQL replication in an HA environment.

You can set the row increment in the MySQL config; that might be a possibility for not having to hack a db access module.
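For reference, that pure-MySQL variant would look something like this in each server's my.cnf (a sketch; note that Drupal 5 hands out node and revision ids from its own {sequences} table via db_next_id(), which auto_increment settings do not affect — presumably why France24 patched the db layer as well):

# integration server: auto_increment columns get 1, 3, 5, ...
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 1

# production server: auto_increment columns get 2, 4, 6, ...
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 2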

esco

Jose Reyero’s picture

Hey, this is a really cool solution for a problem all of us have!

I'm thinking of just getting down to coding such a thing, maybe writing some replacement db layer or something like that...?

If you guys could share some more details or some code, that would be great!

Dave Cohen’s picture

I remember having that conversation more than once at DrupalCon myself. When it came up, I mentioned an old post where I described my own approach. Seeing your post here inspired me to check my latest code into my sandbox, in case it's of use to anyone.

My approach is similar to yours. Where you use even and odd ids, I use ids less than 1000 for integration, and greater than 1000 for production and development servers. I have a limit of 1000 nodes, terms, menu items, etc. for integration, which I find is enough, but your approach would be superior, as you never have to worry about that. I don't try to bring data from production back to integration, although I can see how that would be useful.

So, how to handle database entries that have no numerical id?

And do you handle files that may have been uploaded to the production server?

Thanks for sharing. I look forward to seeing more.

mikiane’s picture

So, how to handle database entries that have no numerical id?
You mean translations, for example... good question. No answer yet. I'm going to investigate it.

And do you handle files that may have been uploaded to the production server?
By using rsync to synchronize the prod / integration files directories.
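Something along these lines (host and paths are hypothetical):

# Pull the uploaded files from production into the integration files directory
rsync -az production.france24.com:/var/www/html/files/ /var/www/html/files/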

Dave Cohen’s picture

So, how to handle database entries that have no numerical id?
You mean translations, for example... good question. No answer yet. I'm going to investigate it.

For my purposes, I modify some tables which have no unique index so that they have one. (Although this unique index is not necessarily numeric in my scheme.) Then, as part of my update.php form_alter, I add some checkboxes. If checked, the code will delete rows from the site being updated, and those rows then get replaced with content from the integration server. It uses INSERT IGNOREs and is relatively complicated. That's one reason I contributed my module to my sandbox rather than the regular contrib directory. This stuff is for experienced users only.

And do you handle files that may have been uploaded to the production server?
By using rsync to synchronize the prod / integration files directories.

Do you set up a different files directory for each of your site's released versions? You'd have to, to really create an accurate snapshot of the site at a given moment in time.

mikiane’s picture

Do you set up a different files directory for each of your site's released versions? You'd have to, to really create an accurate snapshot of the site at a given moment in time.

That's exactly what we do. We also generate a new DB for each version, associated with a new settings.php... so that we can roll back easily if we have a problem.
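In Drupal 5 terms, that just means each release directory carries its own settings.php pointing at its own copy of the database (the database name below is hypothetical):

<?php
// settings.php of one released version: it references that version's own
// database copy, so rolling back is only a matter of switching the www
// alias back to the previous release directory.
$db_url = 'mysql://drupal:SECRET@localhost/france24_v42';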

mikiane’s picture

Here's some information on how we implemented the DB Hack.

Firstly, we use the runkit PECL package: http://pecl.php.net/package/runkit

MySQL configuration:

### MASTER & SLAVE ###
# Checking MySQL configuration
vi /etc/mysql/my.cnf

"skip-networking" must be desactivated.

Note: In new versions of MySQL, this parameter is renamed "bind-address".
"server-id" is diffrent on each server

Example:
MASTER: "server-id = 1"
SLAVE: "server-id = 2"

"auto_increment_increment" set to 10. Example: "auto_increment_increment = 10"

"auto_increment_offset" set to the server-id.

Example:
MASTER: "auto_increment_offset = 1"
SLAVE: "auto_increment_offset = 2"

# Restart MySQL servers
/etc/init.d/mysql restart

Dump the production DB and inject it into the integration DB

### MASTER ###

mysqldump -u root -p database_name > /root/backup.sql

scp /root/backup.sql root@integration.france24.com:/root/

### SLAVE ###

mysql -u root -p database_name < /root/backup.sql

# Two rows of the 'sequences' table must not be replicated as-is: offset them
# once so that each server hands out ids with its own last digit
# ('offset' below is a placeholder for the server's auto_increment_offset).

mysql -u root -p -e "SELECT * FROM sequences WHERE name = 'node_nid' OR name = 'node_revisions_vid'" database_name
mysql -u root -p -e "UPDATE sequences SET id = id + offset WHERE name = 'node_nid' OR name = 'node_revisions_vid'" database_name
mysql -u root -p -e "SELECT * FROM sequences WHERE name = 'node_nid' OR name = 'node_revisions_vid'" database_name

Replication configuration
### MASTER ###
mysql -u root -p
> GRANT REPLICATION SLAVE ON *.* TO 'slave_user'@'integration.france24.com' IDENTIFIED BY 'slave_password';
> FLUSH PRIVILEGES;
> SHOW MASTER STATUS;

### SLAVE ###
mysql -u root -p
> GRANT REPLICATION SLAVE ON *.* TO 'master_user'@'production.france24.com' IDENTIFIED BY 'master_password';
> FLUSH PRIVILEGES;

> CHANGE MASTER TO MASTER_HOST='production.france24.com', MASTER_USER='slave_user', MASTER_PASSWORD='slave_password', MASTER_LOG_FILE='File_master', MASTER_LOG_POS=Position_master;
> START SLAVE;
> SHOW SLAVE STATUS\G
"Slave_IO_Running" and "Slave_SQL_Running" must both be "Yes".

> SHOW MASTER STATUS;

### MASTER ###
mysql -u root -p
> CHANGE MASTER TO MASTER_HOST='integration.france24.com', MASTER_USER='master_user', MASTER_PASSWORD='master_password', MASTER_LOG_FILE='File_slave', MASTER_LOG_POS=Position_slave;
> START SLAVE;
> SHOW SLAVE STATUS\G

"Slave_IO_Running" and "Slave_SQL_Running" must both be "Yes".

### MASTER ###
# To stop the replication and remove the replication accounts later:
> STOP SLAVE;
> REVOKE ALL PRIVILEGES ON *.* FROM 'slave_user'@'integration.france24.com';
> DELETE FROM mysql.user WHERE User='slave_user';

### SLAVE ###
> STOP SLAVE;
> REVOKE ALL PRIVILEGES ON *.* FROM 'master_user'@'production.france24.com';
> DELETE FROM mysql.user WHERE User='master_user';

The DB Hack Module:

content of the france24_hack.info file:
; $Id: france24_hack.info $
name = France24_hack
description = Allows dynamic replacement of function db_next_id.

content of the france24_hack.module file:

 
<?php
// Redefine Drupal 5's db_next_id() at runtime (via the runkit PECL
// extension) so that sequence ids step by 10 instead of 1. Combined with
// the one-time offset applied to the {sequences} rows above, each platform
// keeps generating ids with its own distinct last digit.
$newCode = '$name = db_prefix_tables($name);
  db_query(\'LOCK TABLES {sequences} WRITE\');
  $id = db_result(db_query("SELECT id FROM {sequences} WHERE name = \'%s\'", $name)) + 10;
  db_query("REPLACE INTO {sequences} VALUES (\'%s\', %d)", $name, $id);
  db_query(\'UNLOCK TABLES\');

  return $id;';

runkit_function_redefine('db_next_id', '$name', $newCode);
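Once the module is enabled on both platforms, a quick sanity check (database name is a placeholder, as above) is to create a node on each side and watch its sequence row advance by 10 while keeping its own last digit:

mysql -u root -p -e "SELECT * FROM sequences WHERE name = 'node_nid'" database_name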
 
mikiane’s picture

We will package a module ASAP

incaic’s picture

Any word on how it's going?

mikiane’s picture

We have made our first migrations using this method and it's going quite well.
We had a problem with the cache and history tables. We have created a patch that disables the synchronisation of those tables.
Another very important point: this method also depends on the modules you are using. You must test the solution carefully before using it....

yfreeman’s picture

While on a similar subject:

I need to have two Drupal installs, one internal (intranet) and one external (internet).
The installs will be synchronising content from one to the other.

The internal install will remain available when the connection to the internet drops (which it often does), and will then resync with the server (the external install) once the connection is back up.

The problem with this is that the ids won't be unique anymore.

The same problem would occur if you were clustering a bunch of servers running Drupal behind a load balancer and had multiple databases for redundancy: the uniqueness of the ids would fail.
I remember an article from way back on the structure of a high-traffic website [it was Flickr] that had redundant databases; the article mentioned that they couldn't use auto-increment on the database because the data is scattered across many servers.

One solution would be to put an intermediate table in front of any uniquely identified records, with an id that is generated, say, based on time and random numbers: [year-month-day-hour-minute-second-millisecond-locationID-random_number1-random_number2-random_number3]
e.g. 20080118014144011534
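A rough sketch of such a generator (the function name and field widths are made up for illustration):

<?php
// Build a globally unique id from the current time, a per-site location
// id and a random tail, instead of relying on auto-increment.
function generate_global_id($location_id) {
  list($usec, $sec) = explode(' ', microtime());
  $ms = sprintf('%03d', floor($usec * 1000));      // milliseconds
  return date('YmdHis', $sec) . $ms
    . sprintf('%02d', $location_id)                // which site created the record
    . sprintf('%03d', mt_rand(0, 999));            // random digits
}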

All links would be rewritten, similarly to pathauto, so that every node or other uniquely identified record truly has a unique identifier.

There is probably a simpler solution. I'm thinking the number crunching and table matching would cause some lag; testing needs to be done.

Writing a synchronizing module would require deep integration with Drupal, especially at the database layer: flagging new, modified, deleted, etc. records that have to be synched, with timestamps, and keeping the clocks synchronized.

For me this would be a long-term module; I plan on developing such a module in the coming weeks.

Dave Cohen’s picture

I've suggested a BOF for DrupalCon related to this topic. If anyone's interested in that please vote or comment on the session proposal. Thanks.

http://boston2008.drupalcon.org/session/updating-and-upgrading-live-sites

mpaler’s picture

I am wondering if the community has any updates on setting up a system like this?

Mikiane,

Do you have any new updates on how things are going?
Did you ever release a module?
Any tips/hints on which modules are problematic with this setup?

A couple of additional questions while I'm at it. >;-)

1. You say:
"And do you handle files that may have been uploaded to the production server?
By using rsync to synchronize the prod / integration files directories."

Are you committing your files dir to version control?

2. Why do we need to use PECL runkit?

Dave, how did the meeting on this at Drupalcon go? Any documentation/ideas/etc come out of that?

Any updates are GREATLY appreciated.

Mike

Mark Theunissen’s picture

Trackback: http://codebaboon.com/method-syncronising-database-changes

I have a solution that I want to experiment with, using Toad for MySQL and odd/even ids. It's too long to put here. Comments appreciated!

__________________________________________________________

Mark Theunissen

Code Baboon
Drupal based services

mpaler’s picture

Has anybody evaluated this?

http://drupal.org/project/replication

JBI’s picture

This Saturday, a session about staging will be held. The France24.com team will explain their odd and even numbers solution there.

That way we will catch up with the BOF session at the last Drupalcamp, when we were not able to engage in a dialogue across the ocean because the sound was not working.

More information at https://barcamp.pbwiki.com/DrupalCampParisStaging
Feel free to edit the page and tag the video.

We are considering starting around 3 PM French time.

As there are time differences, could you please tell me in the comments the earliest you would be available?

ericfaucon’s picture

Hello everybody!

I have been looking for the best solution over the last few months, but have not found the perfect one yet.
I succeeded in setting up the France24 solution, excluding some tables (accesslog, history, cache*, etc.) from the replication using the "replicate-ignore-table" and "replicate-wild-ignore-table" statements in the my.cnf file.
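For example, the exclusions could look like this on the slave side (the database name is a placeholder):

# my.cnf on the replication slave: never replicate volatile tables
replicate-ignore-table      = database_name.accesslog
replicate-ignore-table      = database_name.history
replicate-wild-ignore-table = database_name.cache%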

The problem I'm facing now is caused by CCK fields:
a CCK field used by a single content type is stored as a MySQL column in that content type's table.
If you then reuse the same CCK field for another content type, a content_field_myfield table is created, and the column is deleted from the original table.
From then on, the queries coming from the prod server to update the field are no longer valid, as the column is not in the table anymore.
Replication is therefore broken.

Meanwhile, testing may take a while, and since I'm not receiving the prod data, I can't switch my www alias to the staging environment.

Does anybody have that kind of problem?

What about the France24 team? How do you handle database structure changes?

Thank you!