Hello Everybody,

We recently launched our Web 2.0 platform, which is based on Drupal 4.7. We adapted most of the Drupal modules to fit our needs. It is really impressive what Drupal offers out of the box and how flexible it is.
Now it is time to decide whether Drupal will be the future base for our community. Our serious worry right now is scalability. Our growth plans call for 20000 concurrent users and more. In simple words: we want to be able to grow with our user count and their behaviour.
From my point of view the central issue is the database architecture. Now my question: who has experience in scaling MySQL (Drupal's MySQL)? Someone who runs a high-load mysql db-server told me that replication wouldn't help in scaling as the slave runs its update process in one single thread.

I am looking forward to your comments.

thanks in advance, regards,

Henning

Comments

Operations-1’s picture

Hi Henning, there is a difference between 20.000 CONCURRENT users and 20.000 logged-in users. 20.000 concurrent users means 20.000 CONCURRENT DATABASE QUERIES. A user reading a post on your site should not be counted as concurrent: he is logged in, but he is not requesting anything from the database. When he clicks a link, the database performs a query, and only then should he be counted. You only have 20.000 concurrent users if 20.000 users click somewhere on your site AT THE SAME TIME. I think that is too much even for the most scaled system. Are you trying to compete with Google? :) To achieve that kind of concurrent database access you need a grid or something similar. But 20.000 logged-in users I think Drupal can handle, if you have a nice dedicated server.

Souvent22’s picture

Even "a" nice dedicated server wouldn't do. If you need to handle 20,000 concurrent users, you are most definitely going to employ a tiered system with table partitioning, caching, etc. It's a function of your entire stack, from the internet connection and load balancer on up to the application. It's not really fair or correct to ask if Drupal can handle 20,000 concurrent connections. The question is more whether your DB can handle that many connections, and whether PHP and the application can process 20,000 pages fast enough.

So short answer: that's a complicated question :).

mcmaus’s picture

Hi!
We defined CCU the way you did, or pretty similarly. As our business case has to state a possible CCU number (= logged-in users), we estimated how many queries a user causes. It is defined like this: one CCU causes 30 DB queries per minute (30% single-table SELECT / 30% multi-table SELECT / 30% UPDATE / 10% INSERT).

Would you agree? Especially in a Drupal environment?

regards, Henning
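
To put the estimate above in absolute numbers, here is a minimal Python sketch of the arithmetic. The 20000 users, the 30 queries per minute and the 30/30/30/10 mix come from this thread. The 5 ms average query time is purely an assumed figure for illustration, and the function names are made up for this example.

```python
# Share of each query type per CCU in percent, as estimated in the post above.
QUERY_MIX = {
    "single_table_select": 30,
    "multi_table_select": 30,
    "update": 30,
    "insert": 10,
}


def queries_per_second(users, queries_per_user_per_minute=30):
    """Total query rate the database sees for a given number of CCU."""
    return users * queries_per_user_per_minute / 60


def breakdown(total_qps, mix=QUERY_MIX):
    """Split the total query rate by query type."""
    return {kind: total_qps * percent / 100 for kind, percent in mix.items()}


def queries_in_flight(total_qps, avg_query_seconds):
    """Little's law: average number executing = arrival rate * average duration."""
    return total_qps * avg_query_seconds


total = queries_per_second(20000)
print("total queries/s:", total)  # 10000.0
for kind, qps in breakdown(total).items():
    print(kind, qps)
# Assumption: 5 ms per query on average; measure your own value.
print("in flight:", queries_in_flight(total, 0.005))  # roughly 50
```

Under these assumptions the database would see about 10000 queries per second, 4000 of them writes, while only around 50 would actually be executing at any instant. That gap is the difference between logged-in users and truly concurrent queries discussed above.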

kbahey’s picture

I can't say we run a site with that many concurrent users, but we have experience with bottlenecks for large sites, and how to get over them.

It depends on what these concurrent users are doing. If they just log in and sit pretty, then it is easy. If they load pages or invoke AJAX stuff that does lots of queries, then it is a different ball game.

"Someone who runs a high-load mysql db-server told me that replication wouldn't help in scaling as the slave runs its update process in one single thread."

That is right. Replication here is for data protection/hot backup, not for scalability.

What you can do is benchmark it. Set up the server(s) in a lab setting, and bombard them with requests. This is not directly doable with ab2 or similar tools, since you are using AJAX, but with some front-end code you can call those AJAX components.
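
As a rough illustration of that kind of lab test, here is a minimal Python sketch that fires a list of requests at a test server from a thread pool and summarises the response times. It uses only the standard library. The function names and the URLs in the usage comment are hypothetical, and a real benchmark would need the actual AJAX endpoints and realistic session handling.

```python
import http.client
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10.0):
    """Request one URL and return (ok, seconds taken)."""
    start = time.perf_counter()
    try:
        with urlopen(url, timeout=timeout) as response:
            response.read()
        ok = True
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts and HTTP error statuses.
        ok = False
    return ok, time.perf_counter() - start


def run_load_test(urls, concurrency, fetch_fn=fetch):
    """Fetch all URLs with `concurrency` worker threads and summarise timings."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch_fn, urls))
    timings = sorted(seconds for ok, seconds in results if ok)
    return {
        "requests": len(results),
        "failures": sum(1 for ok, _ in results if not ok),
        "median_s": timings[len(timings) // 2] if timings else None,
        "max_s": timings[-1] if timings else None,
    }


# Usage (hypothetical lab URLs, one page plus one AJAX callback):
#   urls = ["http://lab-server.example/node/1",
#           "http://lab-server.example/ajax/callback"] * 500
#   print(run_load_test(urls, concurrency=50))
```

Passing the fetch function as a parameter keeps the timing logic separate from the HTTP call, so it can be swapped for one that sends cookies or POST data. Run it only against a lab setup, never against the live site.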

You may find some of the articles here useful: On drupal performance tuning and optimization for large web sites.
--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com

Lazlo’s picture

Useful.