Does anyone know how Drupal.org (D6) splits their MySQL read and write queries to a slave and master database? I am trying to get all "read" queries go to a slave database and all "write" queries go to a master database.

I was told that there isn't a safe way to split all SELECT queries to go to a slave (read-only) because not all SELECT queries are slave-safe. But, Drupal.org does it successfully so I would greatly appreciate if someone could give me information on this.

I tried this patch by SwampCritter, but I couldn't get it to work: http://msrwd.svn.sourceforge.net/viewvc/msrwd/D6/
Whenever I try to create new content, I get an error message: "Post could not be saved" and the post is not saved.

I have also tried to modify /includes/database.mysql-common.inc on my own in order to split the read/write queries to go to a slave/master db:

function db_query($query) {
  $args = func_get_args();
  array_shift($args);
  $query = db_prefix_tables($query);
  if (isset($args[0]) and is_array($args[0])) { // 'All arguments in one array' syntax
    $args = $args[0];
  }
  _db_query_callback($args, TRUE);
  $query = preg_replace_callback(DB_QUERY_REGEXP, '_db_query_callback', $query);

  /* Added  by apersaud
   Integrating multiple read/write databases
  */
  if(strpos(strtolower($query),"select") === 0){
    db_set_active('readonly');  //this will not contain any data from the master (write) database untill replication happens
  }
  else {
    db_set_active('default');
  }

  return _db_query($query);
  db_set_active('default');
}

And here's how I set up the read/write datbases in settings.php:

$db_url['default'] = 'mysql://username:password@localhost/databasename';
$db_url['readonly'] = 'mysql://username:password@localhost/databasename';

However, this also didn't work. Again, when I try to create new content, I get an error message "Post could not be saved" and also the only row of data in the node_access table is deleted upon this action (which causes access permissions problems for users). Also, in the log entries I get these errors:

Duplicate entry '0' for key 2 query: INSERT INTO node (vid, type, language, title, uid, status, created, changed, comment, promote, moderate, sticky, tnid, translate) VALUES (0, 'story', '', 'Test 9', 6, 0, 1242671928, 1242671928, 0, 0, 0, 0, 0, 0) in .../includes/common.inc on line 3422.

Does anyone know what patch Drupal.org applied in order to achieve this functionality successfully in D6?

Comments

apersaud’s picture

I found out drupal.org uses a patch developed by "Four Kitchens" in order to achieve this functionality, so I don't know how they did it.

However, I was able to hack something together that seems to be working so far on Drupal 6. I modified one file '/includes/database.mysql-common.inc':

function db_query($query) {
  $args = func_get_args();
  array_shift($args);
  $query = db_prefix_tables($query);
  if (isset($args[0]) and is_array($args[0])) { // 'All arguments in one array' syntax
    $args = $args[0];
  }
  _db_query_callback($args, TRUE);
  $query = preg_replace_callback(DB_QUERY_REGEXP, '_db_query_callback', $query);
	
  /* Added on 5/21/2009 by apersaud
     Integrating multiple read/write databases
 */
    if(strpos(strtolower($query),"select") === 0){
        $commits = array('alter', 'insert', 'update', 'delete', 'flush', 'lock','create');
	$is_commit = false;
	foreach($commits as $type) {
	  if((strpos(strtolower($query),$type))){
          $is_commit = true;
         }
       }
	if($is_commit){
	  db_set_active('default');  	
	} else{
         db_set_active('readonly');  
	}	
    }
    else {
      db_set_active('default');
    }	

  return _db_query($query);	
}

It's not the prettiest solution, but it works. I did it this way because not all SELECT read queries are slave-safe, so I tried to catch those non-slave-safe queries and direct them to the master database. I might catch some SELECT queries that should probably go to the slave but I don't think there's any harm in it going to the master instead.

manjeet’s picture

Directing queries containing LAST_INSERT_ID() function to the slave returns incorrect results.
Adding one more conidtion:
if(strpos(strtolower($query),"LAST_INSERT_ID()") === FALSE){
....
....
}
else{
db_set_active('default');
}

zhaogaoxin’s picture

I combine the both, so the final function is like this

function db_query($query) {
  $args = func_get_args();
  array_shift($args);
  $query = db_prefix_tables($query);
  if (isset($args[0]) and is_array($args[0])) { // 'All arguments in one array' syntax
    $args = $args[0];
  }
  _db_query_callback($args, TRUE);
  $query = preg_replace_callback(DB_QUERY_REGEXP, '_db_query_callback', $query);

  /* 
   * Integrating multiple read/write databases
   */
  if(strpos(strtolower($query), "select") === 0 && strpos(strtolower($query), "last_insert_id") === FALSE){
    $commits   = array('alter', 'insert', 'update', 'delete', 'flush', 'lock','create');
    $is_commit = false;
    foreach($commits as $type) {
      if((strpos(strtolower($query), $type))){
        $is_commit = true;
      }
    }
    
    if($is_commit){
      db_set_active('default'); 
      //drupal_set_message('default');
    } else{
      db_set_active('readonly');
      //drupal_set_message('readonly'); 
    }
  }
  else {
    db_set_active('default');
  }
  /*
   * End read/write router
   */

  return _db_query($query);
}

It works well, thank you all.

arhak’s picture

subscribing

ronnbot’s picture

We're using Pressflow since it has support for slave databases. But, the slave databases are severely under utilized since db_query_slave/db_query_range_slave/etc have to be used to send the query to the slaves. Unless you want to patch every module to use db_query_slave instead of db_query and so on, no more than 10% of query will ever go to the slaves.

So, what we did is patch the include file, 'database.mysqli.inc', specifically the function '_db_query()'.

The patch below basically overwrites the parameter $slave (if false) by checking the query if it is slave save by doing some logic:

  • it is a select
  • does not contain key terms such as 'last_insert_id' and etc.
  • querying tables we deemed slave safe.

The list of slave save tables is defined in the settings.php. For example:

$conf['slavable_tables'] = array(
  "blocks*", "boxes", "cache*", "contact*",
  "content_group*", "content_node*", "date*", 
  "imagecache*", "languages", "locales*", "menu*", 
  "permission", "role", "system", "term_data", 
  "url_alias", "variables", "views*", "vocabulary*"); 

Basically, we chose tables that are fairly 'static' in nature. Also, "*" is the wild card, so that blocks and blocks_roles tables are included.

Here is the patch:

--- database.mysqli.inc	2010-11-18 10:42:13.000000000 -0800
+++ database.mysqli.inc	2010-11-19 16:39:07.000000000 -0800
@@ -91,6 +91,18 @@ function db_connect($url) {
 function _db_query($query, $debug = 0, $slave = FALSE) {
   global $active_db, $active_slave_db, $queries, $user;
 
+  $slave = $slave ? $slave : _db_query_is_slave_safe($query); // send to to slave if 'safe'
+
+//  static $counter = 0, $a, $b;
+//  if (!$slave) {
+//    $a++;
+//    drupal_set_message("noslave $a: ".$query);
+//  } else {
+//    $b++;
+//    drupal_set_message("toslave $b: ".$query);
+//  }
+//  $counter++;
+  
   if (variable_get('dev_query', 0)) {
     list($usec, $sec) = explode(' ', microtime());
     $timer = (float)$usec + (float)$sec;
@@ -414,3 +426,90 @@ function db_column_exists($table, $colum
  * @} End of "ingroup database".
  */
 
+function _db_query_is_slave_safe($query) {
+  static $slave_safe_tables_pattern = false;
+
+
+  if ($slave_safe_tables_pattern == false) {
+    $slave_safe_tables = variable_get("slavable_tables", array());
+
+    if (count($slave_safe_tables)) {
+      $slave_safe_tables_pattern = '/^('. strtr(implode("|", $slave_safe_tables), array("*" => ".*")) .')$/';;
+//      drupal_set_message('$slave_safe_tables_pattern '.$slave_safe_tables_pattern);
+    }
+  }
+
+  if ($slave_safe_tables_pattern == false) { // still nothing, assume slave are not desirable
+    return false;
+  }
+
+  $query = strtolower($query);
+  if (strpos($query,"select ") !== false && !preg_match("/alter|insert|update|drop|delete|flush|lock|create|last_insert_id|found_rows|sql_calc_found_rows|row_count/", $query)) {
+    // a select and contains none of the unwanted key words
+
+    $tokenize_query = _db_query_tokenize($query);
+    $tables = array();
+    $table_count = 0;
+    for ($i = 0; $i < count($tokenize_query); $i++) {
+      $token = $tokenize_query[$i];
+      if (isset($tokenize_query[$i+1]) && in_array($token, array("join", "from"))) {
+        $table = $tokenize_query[$i+1];
+        $tables[$table] = $table;
+        $table_count++;
+      }
+    }
+
+    if ($table_count == 0) { // couldn't detect a table, must be a reeealy complicated query
+      return false;
+    }
+
+    foreach ($tables as $table) {
+      if (!preg_match($slave_safe_tables_pattern, $table)) { // table is not in the list
+        return false;
+      }
+    }
+
+    return true; // if we get here, it's all good to go to slave
+  }
+  return false;
+}
+
+function _db_query_tokenize($query, $cleanWhitespace = true) {
+
+  /**
+   * Strip extra whitespace from the query
+   */
+  if($cleanWhitespace) {
+   $query = ltrim(preg_replace('/[\\s]{2,}/',' ',$query));
+  }
+
+  /**
+   * Regular expression based on SQL::Tokenizer's Tokenizer.pm by Igor Sutton Lopes
+   **/
+  $regex = '('; # begin group
+  $regex .= '(?:--|\\#)[\\ \\t\\S]*'; # inline comments
+  $regex .= '|(?:<>|<=>|>=|<=|==|=|!=|!|<<|>>|<|>|\\|\\||\\||&&|&|-|\\+|\\*(?!\/)|\/(?!\\*)|\\%|~|\\^|\\?)'; # logical operators
+  $regex .= '|[\\[\\]\\(\\),;`]|\\\'\\\'(?!\\\')|\\"\\"(?!\\"")'; # empty single/double quotes
+  $regex .= '|".*?(?:(?:""){1,}"|(?<!["\\\\])"(?!")|\\\\"{2})|\'.*?(?:(?:\'\'){1,}\'|(?<![\'\\\\])\'(?!\')|\\\\\'{2})'; # quoted strings
+  $regex .= '|\/\\*[\\ \\t\\n\\S]*?\\*\/'; # c style comments
+  $regex .= '|(?:[\\w:@]+(?:\\.(?:\\w+|\\*)?)*)'; # words, placeholders, database.table.column strings
+  $regex .= '|[\t\ ]+';
+  $regex .= '|[\.]'; #period
+  $regex .= '|[\s]'; #whitespace
+
+  $regex .= ')'; # end group
+
+  // get global match
+  preg_match_all( '/' . $regex . '/smx', $query, $result );
+
+  // return tokens
+  $arr = $result[0];
+  $new_arr = array();
+  foreach ($arr as $i => $val) {
+    $val = trim($val);
+    if ($val) {
+      $new_arr[] = $val;
+    }
+  }
+  return $new_arr;
+}
jamesliu78’s picture

I found a problem on
if (strpos($query,"select ") !== false && !preg_match("/alter|insert|update|drop|delete|flush|lock|create|last_insert_id|found_rows|sql_calc_found_rows|row_count/", $query)) {

If I send query llike this
SELECT node.title, node.created FROM node
It will return false, because the syntax have "create" on "node.created".

So I changed the code to

$query = trim(strtolower($query));
  if (strpos($query,"select ") === 0 && !preg_match("/alter |insert |update |drop |delete |flush |lock |create |last_insert_id|found_rows|sql_calc_found_rows|row_count/", $query)) {

But I don't know it safe? or there have better solution?

zhuzhuojiang’s picture

marked

nilanjan.kundu’s picture

Got couple of issues with this patch listed below:
1) When the "Devel" Module is on it is not working as expected.
> Devel add username and source function name before each query like
$query = '/* '. $name .' : '. $bt[2]['function'] .' */ '. $query;
As a result function having the keywords (i.e *lock*) will go in Master DB instead of Read replica.
> If there is any issue in MySQL query, problem in mysql_error and trigger_error.

2) When you have more read only tables (in my case having more than 100 read tables), there might be performance issue. As a solution check master must tables and rest of the query sent to Read only DB.

3) For heavy transaction site MySQL replication process itself would takes around 2-3 sec. That is why in immediate read ((any data retrieval attempt before replication of data from Master to Slave DB) page SQLs will need to be executed from Master DB. For example profile edit page.
Taking care of all these issue, updated patches are listed below:

diff --git a/includes/database.mysql.inc b/includes/database.mysql.inc
index 45c03cf..9c61499 100644
--- a/includes/database.mysql.inc
+++ b/includes/database.mysql.inc	
@@ -89,8 +89,23 @@ function db_connect($url) {
  * Helper function for db_query().
  */
 function _db_query($query, $debug = 0) {
-  global $active_db, $queries, $user;
-
+  global $active_db, $queries, $user, $master_db_sql, $active_db_slave;
+  static $master_must_tables;
+  $query_temp = trim(strtolower($query));
+  
+  if(!is_array($master_must_tables)){
+    $master_must_tables = array("cron*", "variable", "sessions");
+    $current_url = $_GET['q'];
+    if(preg_match( '/user\/[0-9]+\/edit/', $current_url)){
+      $master_must_tables[] = 'users';
+	   $master_must_tables[] = 'profile_values';
+    }
+  }
+   
+  if (count($master_must_tables)) {
+    $master_must_tables_pattern = '/(`?)(\w.*)('. strtr(implode("|", $master_must_tables), array("*" => ".*")) .')(`?)\s/i';
+  } 
+  
   if (variable_get('dev_query', 0)) {
     list($usec, $sec) = explode(' ', microtime());
     $timer = (float)$usec + (float)$sec;
@@ -104,8 +119,16 @@ function _db_query($query, $debug = 0) {
     $name = str_replace(array('*', '/'), '', $name);
     $query = '/* '. $name .' : '. $bt[2]['function'] .' */ '. $query;
   }
-
-  $result = mysql_query($query, $active_db);
+    $common_db = '';
+   
+   if(preg_match("/truncate |alter |insert |update |drop |delete |flush |lock |create |last_insert_id|found_rows|sql_calc_found_rows|row_count/", $query_temp) || preg_match( $master_must_tables_pattern, $query)){
+       $result = mysql_query($query, $active_db);
+	   $common_db = 'master';
+	}
+   else{
+       $result = mysql_query($query, $active_db_slave);
+	   $common_db = 'slave';
+	}  
 
   if (variable_get('dev_query', 0)) {
     $query = $bt[2]['function'] ."\n". $query;
@@ -116,16 +139,16 @@ function _db_query($query, $debug = 0) {
   }
 
   if ($debug) {
-    print '<p>query: '. $query .'<br />error:'. mysql_error($active_db) .'</p>';
+    print '<p>query: '. $query .'<br />error:'. mysql_error($common_db == 'slave' ? $active_db_slave : $active_db) .'</p>';
   }
 
-  if (!mysql_errno($active_db)) {
+  if (!mysql_errno($common_db == 'slave' ? $active_db_slave : $active_db)) {
     return $result;
   }
   else {
     // Indicate to drupal_error_handler that this is a database error.
     ${DB_ERROR} = TRUE;
-    trigger_error(check_plain(mysql_error($active_db) ."\nquery: ". $query), E_USER_WARNING);
+    trigger_error(check_plain(mysql_error($common_db == 'slave' ? $active_db_slave : $active_db) ."\nquery: ". $query), E_USER_WARNING);
     return FALSE;
   }
 }
diff --git a/includes/database.inc b/includes/database.inc
index 0a68008..7a85477 100644
--- a/includes/database.inc
+++ b/includes/database.inc	
@@ -122,8 +122,8 @@ function db_prefix_tables($sql) {
  * @return the name of the previously active database or FALSE if non was found.
  */
 function db_set_active($name = 'default') {
-  global $db_url, $db_type, $active_db;
-  static $db_conns, $active_name = FALSE;
+  global $db_url, $db_type, $active_db, $active_db_slave;
+  static $db_conns, $db_conns_slave, $active_name = FALSE;
 
   if (empty($db_url)) {
     include_once 'includes/install.inc';
@@ -134,6 +134,16 @@ function db_set_active($name = 'default') {
     // Initiate a new connection, using the named DB URL specified.
     if (is_array($db_url)) {
       $connect_url = array_key_exists($name, $db_url) ? $db_url[$name] : $db_url['default'];
+       if (is_array($connect_url)) {
+         $connect_url_slave = $connect_url['slave'];
+         $connect_url = $connect_url['master'];
+         if (is_array($connect_url)) {
+           $connect_url = $connect_url[mt_rand(0, count($connect_url))];
+         }
+         if (is_array($connect_url_slave)) {
+           $connect_url_slave = $connect_url_slave[mt_rand(0, count($connect_url_slave))];
+         }
+       }
     }
     else {
       $connect_url = $db_url;
@@ -150,14 +160,22 @@ function db_set_active($name = 'default') {
     }
 
     $db_conns[$name] = db_connect($connect_url);
+     if (isset($connect_url_slave)) {
+       $db_conns_slave[$name] = db_connect($connect_url_slave);
+     }
+     else {
+       $db_conns_slave[$name] = $db_conns[$name];
+     }
   }
 
   $previous_name = $active_name;
   // Set the active connection.
   $active_name = $name;
   $active_db = $db_conns[$name];
+  $active_db_slave = $db_conns_slave[$name];
 
-  return $previous_name;
+  //return $previous_name;
+  return array_search($previous_name, $db_conns);
 }