When Solr runs in a replicated environment, writes (indexing) need to go to a master server, whereas searches (reads) can go to any of the slaves, which are generally placed behind a load balancer or round-robin DNS.

So, there should be the option to use different instance information for indexing and for searching.

I'm going to take this forward.

Comments

robertDouglass’s picture

Start looking in the apachesolr/SolrPhpClient/Apache/Solr/Service/Balancer.php file. I am also in the process of updating this whole package to the next version which you can find here: https://issues.apache.org/jira/browse/SOLR-341

JacobSingh’s picture

Status: Active » Needs review
FileSize
14.66 KB

D'oh, I might have done this differently...

Anyway, here is a patch which allows for multiple instances of slave / master pairs. Please take a look. I know it needs some work, but if you are already working on this, it would be good to know.

robertDouglass’s picture

Not sure of this for 1.0. Jacob, can you make the argument that we absolutely need this for 1.0?

anarchivist’s picture

Version: 5.x-1.x-dev » 6.x-1.x-dev

This seems somewhat abandoned, but there seems to be relatively recent interest, given #434314: Load balance search queries. I may look into rerolling JacobSingh's patch for 6.x-1.x-dev this weekend.

robertDouglass’s picture

@anarchivist I think there'd be interest in a revived patch.

Scott Reynolds’s picture

I second that comment. After talking at length about how to scale out our Solr instance, I would like to have one master Solr instance just for the indexing.

robertDouglass’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev

I'd love it if you could run with this, Scott.

anarchivist’s picture

Wow, this issue has sat for a while! :) We ended up using a load balanced setup, with a proxy in front of the load balancer to point update requests to a separate Solr server. I'm inclined to think that this would be the best way to do this, and we might be better served with documentation about how to get this set up...

robertDouglass’s picture

@anarchivist - any tips from your configuration can be included in the documentation. Would love to hear about how you did it.

anarchivist’s picture

Sure thing. I'll do my best to work on it, as I'm supposed to be documenting it at work, too. :)

robertDouglass’s picture

Status: Needs review » Needs work
claudiu.cristea’s picture

Status: Needs review » Needs work
FileSize
62.85 KB
65.48 KB
24.11 KB
21.15 KB

I need this functionality too...

Here's a first attempt to implement multi Solr servers against DRUPAL-5--2. Right now this patch provides only the ability to allow different servers for querying and indexing. The "load balancing" feature is implemented only in terms of defining members in the load balancer.

A new tab (Solr Servers) is added to ApacheSolr settings page. See the first image. That page provides tools for adding/editing/deleting Solr servers. Also it provides the ability to configure which server is the indexer and which is part of the load balancer.

Any feedback is welcome.

Screenshots:

claudiu.cristea’s picture

Version: 6.x-2.x-dev » 5.x-2.x-dev
Status: Needs work » Needs review
Scott Reynolds’s picture

Assigned: JacobSingh » Unassigned

So very cool. I haven't reviewed all the Form API stuff, and that's a big part of this patch. Unfortunately, I don't have a D5 site I maintain, so my review is just a read-through of the code.

So as I understand this patch, I can add load balancers, query-only servers, and indexers. But as I understand the changes to apachesolr_get_solr(), if I ask for 'query' I will only get back the first load balancer. It doesn't matter how many load balancers I have, I always get the first one. The same is true for 'indexer'.

So I think we need a round-robin system for this. And the only way to really move variables across multiple sessions is variable_get()/variable_set(). And those variables are cached, so I'm not sure how effective that will be.
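
A minimal sketch of such a rotation, assuming the position is kept in a persisted variable. The sketch_* helpers are hypothetical stand-ins for variable_get()/variable_set(), backed by an in-memory array so the snippet runs outside Drupal, and the variable name is made up:

```php
<?php
// Hypothetical stand-ins for Drupal's variable_get()/variable_set(), backed
// by an in-memory array so the sketch runs outside Drupal.
$GLOBALS['sketch_vars'] = array();

function sketch_variable_get($name, $default) {
  return isset($GLOBALS['sketch_vars'][$name]) ? $GLOBALS['sketch_vars'][$name] : $default;
}

function sketch_variable_set($name, $value) {
  $GLOBALS['sketch_vars'][$name] = $value;
}

// Returns the next server in the rotation and stores the new position.
// $servers must be a non-empty, zero-indexed list.
function sketch_next_query_server(array $servers) {
  $position = sketch_variable_get('sketch_query_position', 0) % count($servers);
  sketch_variable_set('sketch_query_position', $position + 1);
  return $servers[$position];
}
```

In Drupal 6 each variable_set() call also clears the cached variables, so one write per search is not free; that is the cost in question here.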

Also, why bother using "balancer" as the key for the load balancer and "indexer" for the indexing server? It seems to me the only place those variables are used is apachesolr_get_solr(), so they might as well line up with the $service_type. That makes the code easier to read, and a dump of the variable table will make sense ('balancer' == 'query'? That's confusing).

claudiu.cristea’s picture

Thanks for your review...

As I stated in my comment, the "load balancer" feature is not implemented. I only want to open the door to load balancing by defining multiple servers that will take part in the load balancer. To select the "query" server (of which there is only one right now), I pick the first server that is a "load balancer member". This is a temporary solution until we learn to "balance"...

For the index server, things are different. Only one server can be the "index server", so the first one will always be the only one...

Yes, renaming 'balancer' to 'query' can add more clarity to the code

claudiu.cristea’s picture

Assigned: Unassigned » JacobSingh
FileSize
20.97 KB

Changed:

  • "balancer" => "query"
  • "indexer" => "index"

It seems less confusing to me too...

claudiu.cristea’s picture

Assigned: JacobSingh » Unassigned
claudiu.cristea’s picture

Status: Needs work » Needs review
claudiu.cristea’s picture

@Scott Reynolds:

So I think we need a round-robin system for this. And the only way to really move variables across multiple sessions is variable_get()/variable_set(). And those variables are cached, so I'm not sure how effective that will be.

There is a file in the Solr PHP client, SolrPhpClient/Apache/Solr/Service/Balancer.php, that seems to do the job... The problem is that this file defines a class Apache_Solr_Service_Balancer which uses Solr services of type Apache_Solr_Service (defined in SolrPhpClient/Apache/Solr/Service.php), while we are using our own class Drupal_Apache_Solr_Service, which extends Apache_Solr_Service.

I don't have a clear picture right now about the reasons for that extension...

Looking at the code, I found real load balancing there, based on ping timeouts rather than a simple rotation. The class tries to find out whether a server is heavily loaded before deciding whether to use it...

I'm not an "expert" in the Solr PHP client, but one way to deal with this is to create a class that extends the Balancer class and overrides only the methods that refer to Apache_Solr_Service, replacing it with its subclass Drupal_Apache_Solr_Service.

Any thoughts?

Scott Reynolds’s picture

Right so Apache_Solr_Service_Balancer wraps the two arrays of writable and readable Solr objects. Those Solr objects can be the Drupal Solr objects just fine. I seem to remember though that changing the code to just use this Class wasn't quite equivalent to what we are doing. But looking at it now, it doesn't stand out. Would be interested to see what happens when we try to replace the Service implementation with the Balancer implementation.

Looks like the "interface" for interacting with the object is equivalent.

claudiu.cristea’s picture

Yes. The "interface" that is applicable to a Balancer is the same. For example the public methods: add(), addDocument(), addDocuments(), commit(), delete(), deleteById(), deleteByQuery(), optimize(), search() are called in the same way, with the same argument lists... So, replacing the Solr object with a Solr Balancer object should work.

There are also other places that need recoding. Just one example: in apachesolr_requirements() we ping the server to see if it is up.

      $solr = apachesolr_get_solr();
      $ping = @$solr->ping(variable_get('apachesolr_ping_timeout', 4));

I cannot see any method or variable to access a specific server through the Balancer object... In this case we will have to build the object as Apache_Solr_Service and use the ping() method...

You're right, I think, we should:

  • Build the lists of writables & readables as Drupal_Apache_Solr_Service objects
  • Cache them for later use in the request
  • Create the Balancer object using the previous lists
  • Use Balancer or Service, by case
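
The steps above can be sketched roughly as follows. Sketch_Service and Sketch_Balancer are hypothetical stand-ins for Drupal_Apache_Solr_Service and the Balancer class, and the shape of the $servers array (a 'url' and a 'role' per server ID) is an assumption made for illustration:

```php
<?php
// Hypothetical stand-in for Drupal_Apache_Solr_Service.
class Sketch_Service {
  public $url;

  public function __construct($url) {
    $this->url = $url;
  }
}

// Hypothetical stand-in for the Balancer; it only holds the two lists.
class Sketch_Balancer {
  public $readable;
  public $writeable;

  public function __construct(array $readable, array $writeable) {
    $this->readable = $readable;
    $this->writeable = $writeable;
  }
}

// Builds the service lists once per request and wraps them in a balancer.
function sketch_get_balancer(array $servers) {
  // Static cache: later calls in the same request reuse the same objects.
  static $balancer = NULL;
  if ($balancer === NULL) {
    $readable = array();
    $writeable = array();
    foreach ($servers as $id => $server) {
      $service = new Sketch_Service($server['url']);
      if ($server['role'] === 'index') {
        $writeable[$id] = $service;
      }
      else {
        $readable[$id] = $service;
      }
    }
    $balancer = new Sketch_Balancer($readable, $writeable);
  }
  return $balancer;
}
```

A single Service stays reachable through the lists, for example $balancer->readable[2], which covers the "by case" step.
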
Scott Reynolds’s picture

Well a majority of the 'pinging' happens just prior to executing a command. I believe the Balancer.php handles that as well in its code. But to your point, hook_requirements would have to be rewritten for this anyway, as we would like to loop through all servers.

So I propose we extend the Balancer class and add our 'ping' method (I might call it something else, like ping_all_servers). That check is missing from the existing patch, by the way: checking the indexing and query servers to make sure they are up.
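
A rough sketch of such a check, written as a plain function rather than a Balancer method so it runs on its own. Sketch_Pingable is a hypothetical stand-in for a service object whose ping() returns a response time or FALSE; here it returns a canned value instead of contacting a server:

```php
<?php
// Hypothetical stand-in for a Solr service object. This one returns a canned
// value instead of contacting a server.
class Sketch_Pingable {
  private $result;

  public function __construct($result) {
    $this->result = $result;
  }

  // Returns the response time in seconds, or FALSE if unreachable.
  public function ping($timeout = 2) {
    return $this->result;
  }
}

// Pings every configured server and reports each result, keyed by server ID.
function sketch_ping_all_servers(array $services, $timeout = 2) {
  $results = array();
  foreach ($services as $id => $service) {
    $results[$id] = $service->ping($timeout);
  }
  return $results;
}

$status = sketch_ping_all_servers(array(
  'index' => new Sketch_Pingable(0.012),
  'query1' => new Sketch_Pingable(FALSE),
));
```

hook_requirements could then report one row per server from the returned array.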

claudiu.cristea’s picture

Well, it's not only ping(). We have also: clearCache(), getLuke(), getStatsSummary(), getFields(), deleteMultipleById()... These are on a first look...

claudiu.cristea’s picture

I'm OK with extending the Balancer object.

To allow control of the balancer but also of a specific server, I'm thinking of refactoring the apachesolr_get_solr() function in this way:

No arguments

// Returns the Apache_Solr_Service_Balancer object
$solr = apachesolr_get_solr();

Numeric argument

// Returns the Drupal_Apache_Solr_Service object corresponding to that server ID
$solr = apachesolr_get_solr(2);

Keyed array argument with connection info.

// Returns the Drupal_Apache_Solr_Service object corresponding to the server with that connection info.
$solr = apachesolr_get_solr(array('host' => 'localhost', 'port' => '8983', 'path' => '/solr'));

OR

String containing the connection URL. BTW: In my patch I forgot to do a validation for connection URL duplicates (you cannot add the same server twice!).

// Returns the Drupal_Apache_Solr_Service object corresponding to the server with this connection string.
$solr = apachesolr_get_solr('localhost:8983/solr');

This, together with extending Balancer (for multi-pings, etc.), gives access to both the Balancer and a single Service. Of course, we will micro-cache all those objects inside the function using static variables.

Any thoughts?

Scott Reynolds’s picture

I'm not a fan of a function that accepts multiple different argument types. So I would propose:

$solr = apachesolr_get_balancer();
$solr = apachesolr_get_server($host, $port, $path);

I think that will make the code clear. But really, other than my little obsessive compulsion about function names, the plan makes sense.

claudiu.cristea’s picture

OK... No complaints about this. But... apachesolr_get_server() should also take the server ID (delta) as an argument... I don't have an example right now, but I feel that we need that kind of flexibility....

I will try to create a patch based on the last comments... Then I will try to port it to 6.x-1.x-dev so that you can test it (the porting will be in "blind" mode, as I don't have 6.x-1.x-dev installed!)

claudiu.cristea’s picture

Title: Allow Solr to configure different hosts for indexing and searching » Load balancing implementation
FileSize
26.4 KB
60.99 KB
58.54 KB
30.55 KB

Voila! Here's a functional load balancing Apache Solr implementation based on SolrPhpClient/Apache/Solr/Service/Balancer.php. And you know what? It's working!

The patch is against DRUPAL-5--2. It would be nice if someone tried to port it to a 6.x branch... I can do that, but I don't have a 6.x installation, so I would be working blind. And I don't want to do it unless someone really wants to test it.

Screenshots:

TODO: I'm a little bit confused about how functionality like ping(), getStatsSummary(), getFields(), and getLuke() should work on a load balancer. I've implemented this on a balancer "first come, first served" basis. I think this needs some discussion.

claudiu.cristea’s picture

This improves performance when a single server is used (no balancer). We don't need to load the whole Balancer API/object if we don't need it.

For good abstraction I switched back to apachesolr_get_solr() (sorry, @Scott Reynolds). Now apachesolr_get_solr() returns a Drupal_Apache_Solr_Service_Balancer object if more than one server is configured and a Drupal_Apache_Solr_Service object if only one server is configured. All methods applicable to Drupal_Apache_Solr_Service should also work with Drupal_Apache_Solr_Service_Balancer in an abstract way. New or missing methods can be added to the class in the new file Drupal_Apache_Solr_Service_Balancer.php.

claudiu.cristea’s picture

Version: 5.x-2.x-dev » 6.x-1.x-dev
FileSize
26.45 KB

Here's a NOT tested patch against DRUPAL-6--1. It may contain errors.

@Scott Reynolds, Can you test it?

Scott Reynolds’s picture

Status: Needs review » Needs work

I will soon, I hope. It depends on how today goes and on my motivation.

But on first read:

return $service->deleteByMultipleIds($ids, $fromPending = true, $fromCommitted = true, $timeout = 3600);

Eek! get rid of the default values.

And what is this pattern? What is the Exception class when code = 0 so we can use multiple catch statements. And comments should be above, start with a capital and end with a period.

catch (Exception $e) {
  if ($e->getCode() != 0) { //IF NOT COMMUNICATION ERROR
    throw $e;
  }
}

t('Server !host:!port/!path was saved as %name

Could be turned into one thing

t('Server !host_path was saved as %name

So those are my notes on first read. I will go through it and fix those.

claudiu.cristea’s picture

@Scott Reynolds

And what is this pattern? What is the Exception class when code = 0 so we can use multiple catch statements. And comments should be above, start with a capital and end with a period.

catch (Exception $e) {
  if ($e->getCode() != 0) { //IF NOT COMMUNICATION ERROR
    throw $e;
  }
}

This piece of code was inspired by (in fact copied from) the Balancer.php code... This is how it "balances" between hosts. I left the comment there as it was in Balancer.php. The exception code check should be correct.
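
A minimal sketch of how that check can drive failover, assuming (as the quoted comment says) that code 0 marks a communication error. The function name and the callback-based shape are made up for illustration and are not the Balancer.php code itself:

```php
<?php
// Tries each service in turn. An exception with code 0 is treated as a
// communication error and triggers failover to the next service; any other
// exception is rethrown to the caller.
function sketch_call_with_failover(array $services, $operation) {
  foreach ($services as $service) {
    try {
      return call_user_func($operation, $service);
    }
    catch (Exception $e) {
      if ($e->getCode() != 0) {
        throw $e;
      }
    }
  }
  // Every service failed with a communication error.
  return FALSE;
}
```

Catching a dedicated exception class for communication errors would allow multiple catch statements instead of the code comparison, if the client library provided one.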

I agree with the rest of the comments....

pounard’s picture

It might be a newbie question, but each time you call the _selectWriteService() method, it iterates through the list of write services and pings one of them (or, if you're unlucky, more than one).
This means that every end user who runs a search on your site will trigger this. Isn't that quite heavy, and won't it make the whole site slower?

Edit: the same goes for the _selectReadService() method. It means that every request made to the PHP site will ping a list of servers until one answers within the right ping time, then make another request, which is the search request.
Wouldn't it be safer and faster to rely on low-level load balancing (like Apache httpd does) instead of trying to ping all these servers on each client HTTP request to your site?

Re-edit: it was just something that bothered me. I'll read what the ping() method really does before passing any judgement on this.

Re-re-edit: I don't think removing the first parameter of the ping() method is a good idea. It means that if you pass nothing, it waits indefinitely or until the request fails, which might take a long time! Having a default value here protects against bad code making PHP wait too long (which would make the HTTPd wait too long, which on badly configured sites could lead to denial of service because of extremely long-running HTTPd threads/forks).

claudiu.cristea’s picture

@pounard

It might be a newbie question, but each time you call the _selectWriteService() method, it iterates through the list of write services and pings one of them (or, if you're unlucky, more than one).
This means that every end user who runs a search on your site will trigger this. Isn't that quite heavy, and won't it make the whole site slower?

Edit: the same goes for the _selectReadService() method. It means that every request made to the PHP site will ping a list of servers until one answers within the right ping time, then make another request, which is the search request.
Wouldn't it be safer and faster to rely on low-level load balancing (like Apache httpd does) instead of trying to ping all these servers on each client HTTP request to your site?

This is how the Solr PHP client has implemented balancing... I simply want to use the API that they provide. Anyway, I think that this approach is better than a simple "blind" rotation... A discussion about how balancing works must take place here http://code.google.com/p/solr-php-client/issues/list

Re-re-edit: I don't think removing the first parameter of the ping() method is a good idea. It means that if you pass nothing, it waits indefinitely or until the request fails, which might take a long time! Having a default value here protects against bad code making PHP wait too long (which would make the HTTPd wait too long, which on badly configured sites could lead to denial of service because of extremely long-running HTTPd threads/forks).

I removed the first parameter in order to abstract ping() so that I can pass 2 arguments. The balancer ping() may take 2 arguments.

pounard’s picture

@claudiu.cristea

This is how the Solr PHP client has implemented balancing... I simply want to use the API that they provide. Anyway, I think that this approach is better than a simple "blind" rotation...

I totally agree with this approach, but as the PHP execution context is not persistent, the more HTTP requests you make, the more latency you add to your script execution. This is not just one request; it can be two or three, depending on the state of the Solr instances. That is really huge if each client hit on your site triggers it.
Server availability could be tested on one request in 10 (for example) and the result cached. That would avoid each client request creating numerous useless and expensive HTTP requests.
Edit: server availability checks could be time based. While your site is under really high load, these checks would trigger rarely (compared to the number of hits), so in that situation you would avoid expensive extra HTTP requests (you could even do blind round-robin among the best ping responses you had before). With a very low load, each client hit would trigger the availability checks (which would not be a good idea, but could be a nice trick for keeping Solr instance availability statistics).
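
A sketch of the time-based variant, under stated assumptions: $state stands for whatever persistent storage is used between requests (a Drupal variable or cache entry, for instance), $ping is any callback that returns a response time or FALSE, and the 30-second lifetime is an arbitrary example value:

```php
<?php
// Returns the cached server while the last check is younger than $ttl
// seconds; otherwise pings every server again and caches the fastest one.
// $state is passed by reference and must be persisted by the caller.
function sketch_pick_server(array &$state, array $servers, $ping, $now, $ttl = 30) {
  if (isset($state['server']) && $now - $state['checked'] < $ttl) {
    return $state['server'];
  }
  $best = NULL;
  $best_time = NULL;
  foreach ($servers as $server) {
    $time = call_user_func($ping, $server);
    if ($time !== FALSE && ($best_time === NULL || $time < $best_time)) {
      $best = $server;
      $best_time = $time;
    }
  }
  $state = array('server' => $best, 'checked' => $now);
  return $best;
}
```

With this shape, a busy site pings at most once per $ttl seconds however many searches arrive in between.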

A discussion about how balancing works must take place here http://code.google.com/p/solr-php-client/issues/list

Could not find the right entry, do you have the exact link?

I removed the first parameter in order to abstract ping() so that I can pass 2 arguments. The balancer ping() may take 2 arguments.

OK, I see there is a real goal, so I won't argue! But why not keep named arguments?

pounard’s picture

This is kind of outside my skills, but I'd say that in HA environments, I would describe Drupal more as a frontend than as the real application. Why? Because in an HA environment all the real business stuff, which is the data, is kept in a replicated database (whichever backend it is, MySQL or PostgreSQL, is not important for Drupal itself).

Why a frontend? Because it would itself be replicated across many HTTPd instances, and clients would reach Drupal instances through a round-robin or load-balancing proxy, hitting the real PHP script only if some reverse proxy judged that the client needed fresh information.

Because, at this point, I see Drupal as a frontend and the database backend as the real business application, and because Drupal itself does not know how the load is balanced, it should not try to balance the load of another piece of the HA environment itself. It should rely on a lower layer of the whole environment.

If there is a real need for Drupal to do the balancing itself, then it should be at the lowest possible cost, and making a lot of HTTP requests is really a bottleneck (it's a lot of I/O on the Drupal side, the HTTPd side, and even the Solr side). So I would rather adopt "cron style" balancing, meaning that availability checks should be made in parallel with client requests (a time-based cron implementation could be a nice solution, because it can run on the CLI side of the environment, which leaves the lowest possible footprint on the HTTPd PHP side).

So, I'm not a fan at all of doing on-the-fly availability checks on client hits, because it's client hits that create the load, and it's a really big overhead because PHP is not persistent. "Cron style" does not mean these checks must absolutely live in a cron implementation; they can be time based and happen on only one in X client hits (reducing the HTTP request overhead by a factor of X).

That's an opinion which can be discussed. I think the PHP balancer is a good start, but these checks have to be deferred (maybe they can still run on each client hit, but it MUST be a configurable behavior).

There is also a need to check whether there is only one writer or only one reader; the load balancing process must be implicitly skipped in that case. No ping should be made if there is only one instance: just hit it whatever happens, even under high load. A throttle mechanism could be implemented through regular cron in this case to shut down the module if the Solr instance does not respond.

pounard’s picture

Try this patch.
I tested it by making some requests and it works, but the real test would be to collect Solr instance usage statistics over time.

If I stick to my point of view, it lacks:
- Cron implementation (optional, configurable)
- Current reader and current writer should be a pool
- These should be set every X requests (not set at all until now; my implementation currently works exactly like yours, since I do not store the current server that should be used).
- All apachesolr_get_solr() calls that need a writer must be rewritten as apachesolr_get_solr(TRUE) to force the load balancer to give a writer.

What do you think?

pounard’s picture

Oops, forgot the attached file.

pounard’s picture

Wrong patch attached (sorry, I did the diff on the wrong -dev copy). Here is the right patch against the latest -dev version.
(Sorry for polluting the queue.)

So, so sorry, still not the right patch. I'm working with too many copies of the apachesolr module. I'll clean up my environment and redo it. Sorry again.

pounard’s picture

Downloaded current -dev version.
Copied it to apachesolr.orig
Applied apachesolr-balancer-267831-D6.patch to the original folder.
Had some errors; had to remove the CHANGELOG.txt and all the .orig patch backup files (that's the reason why my patches were wrong)!
Rewrote my code using old patches.
Did the patch over the latest -dev version.
Tested it manually running the drupal instance on which I made the changes.

It finally works (good patch). Please test it.

Edit: Really, really sorry for polluting the list, but Drupal does not allow deleting or changing files attached to a comment (so, so sorry I made the same mistake twice).

pounard’s picture

Added load balancing mode configuration:
- hitcount based servers availability check rotation
- manual server selection (means no balancing) mode
Missing:
- random server selection implementation
- real round robin implementation
- better mode check algorithm (current implementation works but is quite ugly)

The Apache Solr module API should also have these functions:
- apachesolr_server_get_readers()
- apachesolr_server_get_writers()
- apachesolr_server_get_all()
These should hide the variable_get('apachesolr_servers') stuff. Because my implementation iterates over the servers many times, these functions would allow some static caching and cleaner code.
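
A rough sketch of what such accessors might look like. The sketch_ names, the 'role' key, and the example.com hosts are all assumptions for illustration; the real structure of the 'apachesolr_servers' variable is not shown here, and in Drupal the hard-coded list would come from variable_get():

```php
<?php
// Returns all configured servers, loaded once per request. In Drupal the
// data would come from variable_get('apachesolr_servers', array()); it is
// hard-coded here so the sketch runs on its own.
function sketch_server_get_all() {
  static $servers = NULL;
  if ($servers === NULL) {
    $servers = array(
      1 => array('url' => 'solr-master.example.com:8983/solr', 'role' => 'writer'),
      2 => array('url' => 'solr-slave1.example.com:8983/solr', 'role' => 'reader'),
      3 => array('url' => 'solr-slave2.example.com:8983/solr', 'role' => 'reader'),
    );
  }
  return $servers;
}

// Returns the servers with the given role, keyed by server ID.
function sketch_server_get_by_role($role) {
  $matches = array();
  foreach (sketch_server_get_all() as $id => $server) {
    if ($server['role'] === $role) {
      $matches[$id] = $server;
    }
  }
  return $matches;
}

function sketch_server_get_readers() {
  return sketch_server_get_by_role('reader');
}

function sketch_server_get_writers() {
  return sketch_server_get_by_role('writer');
}
```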

Each balancing method should be a function, for code readability (maybe implemented in the balancer object as static methods?).

claudiu.cristea’s picture

+++ apachesolr/apachesolr.module	2010-02-07 15:51:08.000000000 +0100
@@ -1311,25 +1349,116 @@
+    else if ($readonly && $delta = variable_get('apachesolr_current_reader', FALSE)) {

I cannot see any variable_set('apachesolr_current_reader', ...) in your code. I don't know what it stands for. It's confusing. The same is true for 'apachesolr_current_writer'.

+++ apachesolr/apachesolr.module	2010-02-07 15:51:08.000000000 +0100
@@ -1311,25 +1349,116 @@
+function apachesolr_get_solr($readonly = TRUE) {

I cannot see this function called anywhere in your code in this way: apachesolr_get_solr(TRUE|FALSE). That's also confusing....

If I stick to my point of view, it lacks:
- Cron implementation (optional, configurable)

Where is the cron implementation that you've requested? Do we need a cron implementation? I don't think so.

- Current reader and current writer should be a pool

Current reader, writer? Let Apache_Solr_Service_Balancer class decide this.

- These should be set every X requests (not set at all until now; my implementation currently works exactly like yours, since I do not store the current server that should be used).

Please read @Scott Reynolds comment in #14 about using a rotation based on variable_get(set). I agree with him. I think that writing/reading Drupal vars is more expensive than pinging Solr servers.

- All apachesolr_get_solr() calls that need a writer must be rewritten as apachesolr_get_solr(TRUE) to force the load balancer to give a writer.

This is handled automatically by the Apache_Solr_Service_Balancer class. When you perform a $solr->search(), it will use readers. When you perform a $solr->commit(), it will use a writer... Don't bother with this...

In general, I think that a simple "blind" rotation is not a good idea and is not a real balancer. I think that we should rely on the SolrPhpClient API/classes. If we have doubts or want to improve the way the balancer selects the writer/reader, my advice is to ask for bug fixes/improvements/features there, not on the Drupal side.

I stand by the #29 solution, of course with the fixes and other improvements that should be made there.

claudiu.cristea’s picture

Cross posted :-) My review referred to #39. Anyway, I underline: Keep it simple! Rely on SolrPhpClient.

pounard’s picture

I cannot see any variable_set('apachesolr_current_reader', ...) in your code. I don't know what it stands for. It's confusing. The same is true for 'apachesolr_current_writer'.

I said that it was incomplete code.

Where is the cron implementation that you've requested? Do we need a cron implementation? I don't think so.

Cron implementation is simple and can be written later.

Current reader, writer? Let Apache_Solr_Service_Balancer class decide this.

It does decide: if you leave the balancing mode set to selection mode, it won't use these variables.

Please read @Scott Reynolds comment in #14 about using a rotation based on variable_get(set). I agree with him. I think that writing/reading Drupal vars is more expensive than pinging Solr servers.

I don't think so. variable_get() does not cost anything, because all variables are statically cached at Drupal bootstrap. And I think making a full HTTP request will cost a lot more than a single SQL query over an already open connection. EDIT: It depends on the environment. Plus, he said he was not sure, so this is not a fact.

EDIT: I think the whole point here is to make this configurable. The site admin must really have the choice between:

  • Manually set servers (can always be useful)
  • Random selection (costs no pings, and the algorithm is really simple and costs almost no PHP execution time, but gives random results)
  • Real round-robin (custom implementation, quite effective in most cases, simple algorithm, no pings)
  • Intelligent selection (really heavy algorithm; this shifts the load onto PHP instead of onto the Solr instances)
  • Intelligent selection with delayed throttle (the heavy algorithm is triggered regularly but not on every client hit, which spreads the load across client requests)
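
A minimal sketch of how the ping-free modes could sit behind one configurable switch. The mode names, the function name, and its parameters are hypothetical; the two "intelligent" modes are left out because they depend on the ping logic:

```php
<?php
// Picks a server according to a configured mode. $servers must be a
// non-empty, zero-indexed list; $position is a counter the caller keeps
// between requests; $manual is the index chosen by the site admin.
function sketch_select_server($mode, array $servers, $position = 0, $manual = 0) {
  switch ($mode) {
    case 'manual':
      return $servers[$manual];

    case 'random':
      return $servers[array_rand($servers)];

    case 'round_robin':
      return $servers[$position % count($servers)];
  }
  throw new InvalidArgumentException('Unknown balancing mode: ' . $mode);
}
```

Keeping each mode behind the same function signature means an algorithm can be replaced later without touching the callers.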

Maybe more can be added! But this means that a clean overall design must be adopted ahead of any performance discussion, because a nice design does not cost performance; its implementation does.
This is because every environment is different. In any case, I prefer a clean design with wrong algorithms over the alternative, because each algorithm can easily be updated inside a function, but a design can't be refactored every Monday morning.

Re-EDIT:
If you choose only one possible implementation, and decide it will be this one and only this one, this won't be HA at all. It will only be a solution for a subset of existing environments, and I think site admins will end up choosing system-side round-robin or load balancing because they won't be able to configure the behavior.

pounard’s picture

@claudiu.cristea #42 I think that the SolrPhpClient implementation is not that good. It is really good, in fact, but it supports only one balancing mode. HA environments require flexibility in configuration because they are all different.

pounard’s picture

BTW, with 4 CNAME records pointing to the same Solr instance, on the same host that runs the Drupal instance (my dev box), I experienced 1 to 2 seconds of extra PHP execution time compared to the original apachesolr module in its -dev version (without the load balancing patch), using the #29 algorithm.

And all of this is on the very same machine, so you can consider TCP data transfer to be immediate because HTTPd and Solr are on localhost. I'm pretty sure that this ping request is the bottleneck here.

claudiu.cristea’s picture

@pounard

BTW, with 4 CNAME records pointing to the same Solr instance, on the same host that runs the Drupal instance (my dev box), I experienced 1 to 2 seconds of extra PHP execution time compared to the original apachesolr module in its -dev version (without the load balancing patch), using the #29 algorithm.

It's obvious... In fact you are using the same server, but you are loading additional code and performing additional tasks. The balancer is effective under heavy load.

pounard’s picture

@claudiu.cristea #40
This is the second issue queue where I'm actually fighting with you. I don't want this. What I'm saying is not nonsense; it's actually true (in some environments at least). EDIT: Removed some not-nice writing. Sorry, I should not say that kind of thing, but I really think you are not reading what I'm saying.

It's obvious... In fact you are using the same server, but you are loading additional code and performing additional tasks. The balancer is effective under heavy load.

The various CNAMEs are there only to configure the apachesolr module with more than one instance. What you are saying here is total nonsense because, on a dev box, I actually make only one hit at a time.
So, what really happens here is that my PHP execution loads the CPU by maybe 10% at most (not that much) on a dual-core CPU, which means that even under heavy load, Solr would get its own CPU (only Apache httpd and Solr are running here).
So, what really happens is that Solr gets only two requests: the first is a ping, the second is the actual query. There is no load at all here, and there isn't even any parallelism for the pings. As the PHP execution flow is 100% sequential, Solr can't be under heavy load.
I could do some Xdebug call map/time profiling to illustrate this.

In my case, with a Solr which IS NOT under heavy load, the real bottleneck is the SolrPhpClient load balancer implementation. It behaves as it should in a persistent environment (these guys seem to be really good Java developers), but in a PHP script execution, one full Drupal bootstrap makes only one Solr query. Making multiple pings in order to run only one Solr query really is nonsense, because you artificially load the Solr instance with multiple HTTP ping requests for the sake of a single query.

For the cron indexing implementation, which will potentially make multiple writer requests, this kind of load balancing is good, but for actual client hits, it's bad.

pounard’s picture

Let's flatten the problem and start on a sane basis. The fact is this module needs to handle balancing.

On the one hand, you have the SolrPhpClient library, which is indeed a good start. Reading the code, it's clean and efficient. Internally, it does a series of pings (intelligent load balancing) to find the best server at request time, then keeps an internal server cache so that the same unloaded server is used for a series of regular requests. This algorithm is really good.
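To make the description above concrete, here is a minimal sketch of a ping-then-cache balancer. It is written in Python for brevity and is not the SolrPhpClient code: the class name `PingBalancer`, the server names and the fake latency table are all assumptions made for illustration.

```python
from typing import Callable, List, Optional


class PingBalancer:
    """Ping every server once, keep the fastest, reuse it afterwards."""

    def __init__(self, servers: List[str], ping: Callable[[str], Optional[float]]):
        self.servers = servers
        self.ping = ping              # returns latency in seconds, None if unreachable
        self.cached: Optional[str] = None
        self.pings_sent = 0

    def select(self) -> str:
        # A server was already chosen during this process: no ping needed.
        if self.cached is not None:
            return self.cached
        best, best_latency = None, None
        for server in self.servers:
            self.pings_sent += 1
            latency = self.ping(server)
            if latency is not None and (best_latency is None or latency < best_latency):
                best, best_latency = server, latency
        if best is None:
            raise RuntimeError("no Solr server answered the ping")
        self.cached = best
        return best


# Hypothetical latencies standing in for real HTTP pings.
FAKE_LATENCY = {"solr-a": 0.030, "solr-b": 0.005, "solr-c": None}

balancer = PingBalancer(list(FAKE_LATENCY), FAKE_LATENCY.get)
first = balancer.select()    # pings all three servers
second = balancer.select()   # served from the in-process cache
```

In a long-lived process the three pings are paid once and then amortised over every later query, which is why the approach suits a persistent runtime.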

On the other hand, you have the PHP execution environment. You have to take into account that this environment is not persistent, which means that every client hit on your site does a full Drupal bootstrap.

The fact is, if you do a full Drupal bootstrap for each Solr query, the internal caching of the SolrPhpClient load balancing algorithm is, unfortunately, useless.

This internal caching (which should be time-based) exists because, with multiple requests, it shares one available Solr instance across multiple hits, so the cost of the pings would be insignificant among all those queries. This would work if the PHP software were persistent, because multiple client hits would go to the same available Solr instance (which is what we want here) while the others are working under heavy load.

We don't have this persistent environment, so let's make it persistent. There are two ways to achieve this. The first is a persistent PHP daemon, which regularly checks the servers' load and, when it finds an available one, redirects all queries to it. The second is to emulate a persistent environment inside a non-persistent one.

The first one is the best, I think, but with Drupal it can't work in every environment because you don't always have access to the system side. The second one costs a bit more, but you have tools to do it: variables and sessions.
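The second method could look roughly like the sketch below: the chosen server is written to a store that outlives the request, with an expiry time, so that only the first hit in each window pays for the pings. This is an illustration in Python, not module code. `CachedChoice`, the `solr_server` key, the 30-second TTL and the dict standing in for the persistent store are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class CachedChoice:
    """Remember the selected server in a shared store for `ttl` seconds."""

    def __init__(self, store: Dict[str, Tuple[str, float]], ttl: float,
                 clock: Callable[[], float] = time.time):
        self.store = store    # stand-in for a store that survives requests
        self.ttl = ttl
        self.clock = clock

    def get(self, pick_server: Callable[[], str]) -> str:
        entry = self.store.get("solr_server")
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]              # still fresh: no pings at all
        server = pick_server()           # expensive path: pings the servers
        self.store["solr_server"] = (server, now + self.ttl)
        return server


shared_store: Dict[str, Tuple[str, float]] = {}
fake_now = [1000.0]      # controllable clock for the demonstration
picks = []               # one entry per expensive selection


def pick_server() -> str:
    picks.append(1)
    return "solr-b"


def simulated_request() -> str:
    # A fresh object per request, as in PHP: only the store survives.
    cache = CachedChoice(shared_store, ttl=30.0, clock=lambda: fake_now[0])
    return cache.get(pick_server)


simulated_request()      # empty store: selection runs
simulated_request()      # within the TTL: reuses the stored server
fake_now[0] += 31.0
simulated_request()      # TTL expired: selection runs again
```

The trade-off is that every request now reads the store, and a server that dies inside the window keeps receiving queries until the entry expires.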

Why should we avoid the multiple HTTP requests? Because TCP I/O is quite slow. Some time ago I read a paper from Yahoo developers who were in the process of eliminating all unnecessary TCP I/O because it slowed down their web apps too much. They are right, and I trust those people since they have been building HA environments for many years.

I put this question to some developers and sysadmins around me: "Does a TCP I/O cost more than an update query?" They answered: "It depends. It depends on whether you have heavy database business logic, meaning a lot of constraint checks and a lot of stored procedures or triggers." The fact is Drupal has none of this.

What really costs when you do TCP I/O is opening and closing the socket. Going further, you have to take into account that if even one server is under heavy load, doing a ping also adds its response latency. Response latency is not that important because you can set a really tiny timeout. But the smaller your timeout, the more pings you will send, and the more TCP I/O you will do.

Testing on an unloaded site (really, not loaded at all) with only one client connected, doing these pings really did double the PHP script execution time. This is really bad, because HA environments should be able to handle a lot of simultaneous client hits. The longer your PHP execution time, the fewer simultaneous client hits you can handle. By stretching the PHP execution time, you expose yourself to a real involuntary DDoS. It means that if too many clients hit your site and each one of them causes this kind of latency, your HTTPd threads will "sleep" for the duration of the TCP requests, and the server won't handle as many client connections as it should.
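The relation between execution time and capacity can be put in numbers. The sketch below assumes a fixed pool of workers that each handle one request at a time; the figures (50 workers, 0.25 s per request) are made up for illustration and are not measurements from this issue.

```python
def max_requests_per_second(workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when each worker serves one request at a time."""
    return workers / seconds_per_request


# Hypothetical figures: 50 HTTPd workers, 0.25 s per page without pings.
baseline = max_requests_per_second(workers=50, seconds_per_request=0.25)

# Same pool, execution time doubled by the extra pings.
with_pings = max_requests_per_second(workers=50, seconds_per_request=0.5)
```

Under these assumptions, doubling the execution time halves the number of hits the same pool can absorb per second.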

These are the facts, and the discussion should begin here.

skwashd’s picture

An alternative to this is to run haproxy on each web head. I have documented how to do it on my blog - http://davehall.com.au/blog/dave/2010/03/13/solr-replication-load-balanc... Comments / feedback welcome there.

I hope this isn't considered blog spam; the solution was a result of the limitations explained in this issue.

pounard’s picture

@skwashd #43 This is a good solution, and I think it makes more sense than trying to do it on the PHP side.

jpmckinney’s picture

Status: Needs work » Closed (won't fix)

I don't think this is a problem Drupal or apachesolr should solve. Put a proxy in front of Solr, and have the proxy decide where to send the /update, /select, /etc. request.
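As a rough illustration of the routing decision such a proxy makes, the sketch below sends anything under /update to the master and spreads everything else across the slaves in round-robin order. It is plain Python rather than a proxy configuration, and the host names and ports are hypothetical.

```python
import itertools
from urllib.parse import urlsplit

# Hypothetical back ends; replace with the real master and slave URLs.
MASTER = "http://solr-master.example.com:8983"
SLAVES = [
    "http://solr-slave1.example.com:8983",
    "http://solr-slave2.example.com:8983",
]
_next_slave = itertools.cycle(SLAVES)


def choose_backend(request_url: str) -> str:
    """Send writes to the master, spread reads across the slaves."""
    segments = urlsplit(request_url).path.strip("/").split("/")
    if "update" in segments:
        return MASTER
    return next(_next_slave)   # simple round robin for /select and the rest


write_target = choose_backend("/solr/update?commit=true")
read_target = choose_backend("/solr/select?q=drupal")
```

A real proxy such as HAProxy would add health checks on top of this rule, so a dead slave is skipped without any PHP code being involved.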

pounard’s picture

Agreed, I think this issue should be closed. This is none of the business of Drupal to attempt any kind of load balancing, it will take more effort than it will gain in the end.

akshar’s picture

Hi,
I have used the last patch, "apachesolr-balacing-delayed-better.patch", and have the latest version of apachesolr. There are two functions with the name "apachesolr_update_6006()": the module already has one, and the patch adds a function with the same name, so I get an error. Which function should I keep? Also, the modify and delete links do not work. Is there something that I am doing wrong? Any help would be appreciated.

akshar’s picture

Status: Closed (won't fix) » Needs review

Hello,

Since the modify and delete links were not working, I debugged the code and found that the server id was not being passed to the function "apachesolr_settings_servers($server_id = NULL, $delete = NULL)". I added the line $server_id = arg(4), and modify and delete are now working fine. Please let me know whether this is the correct approach. Thanks in advance.

jpmckinney’s picture

Status: Needs review » Closed (won't fix)

Sorry, this patch is not being considered for inclusion into the module. If you are reporting an error in the unpatched module, please open a new issue. If you are just reporting an error in the patch, as I said, we are not considering this patch anymore.

akshar’s picture

Status: Closed (won't fix) » Needs review

OK, could you then please tell me which patch I should use? I am in urgent need of the functionality that this patch provides. Thanks in advance.

jpmckinney’s picture

Status: Needs review » Closed (won't fix)

There is no patch for you to use. None of the patches here work. If you want load balancing, as written in comments #51, #52, "put a proxy in front of Solr, and have the proxy decide where to send the /update, /select, /etc. requests." "This is none of the business of Drupal to attempt any kind of load balancing, it will take more effort than it will gain in the end." Please take the time to learn about load balancing solutions such as HAProxy.

akshar’s picture

Thanks a lot for your reply, jpmckinney. I'll check that.