Problem/Motivation
Azure's managed db with MariaDB/MySQL is being reported as 3x slower than a non managed container db.
I've come up with a document to challenge this issue.
Steps to reproduce
Compare Drupal performance on Azure managed db with Drupal performance on a normal container db.
In a fairly simple comparison we've noticed the Azure managed db to be 3x slower performance on average.
Proposed resolution
Let's find a winning recipe to maximize performance.
Remaining tasks
Review and discuss.
ContentsChoosing the right version of MySQL/MariaDB 1
DB Monitoring and Profiling: 3
AZURE conclusion to challenge 3
Drupal Application Performance 3
Upgrade Drupal to 10.1.5, skip 10.0.x 3
Latest new core includes some performance enhancements 3
Application Performance Patches 3
Performance profiling to identify pain points 4
MySQL Performance considerations within an AWS managed db (RDS)
-
To get more iops we had to go to a better class of storage. This vastly increased cost but also performance.
MySQL Performance considerations within an Azure managed db system.
Choosing the right version of MySQL/MariaDB-
**Upgrade MySQL/MariaDB Versions**: Ensure that you are running the latest version of MySQL/MariaDB supported by Azure for potential performance improvements - (upgrade from 10.3 MariaDB to a newer release)
Increase IOPS (Input / Output Per Second)
-
**Choose the Right Service Tier**: Azure offers different service tiers with varying levels of performance. Consider upgrading to a higher tier if you are experiencing performance issues. Higher tiers offer more resources like CPU and memory. This is often the simplest way to improve performance. You can upgrade to a higher service tier with more CPU, memory, and better storage performance.
-
**Faster and costlier** selecting a higher service tier can increase IOPS, it may also come with a higher cost.
-
**Read Replicas**: For read-heavy workloads, you can consider using read replicas to offload read traffic from the primary database, which can help improve performance.
-
**Monitoring and Tuning**: Continuously monitor your database's performance and make adjustments as needed based on actual usage patterns.
innodb_buffer_pool_size:
The `innodb_buffer_pool_size` determines the size of the InnoDB buffer pool, which is a critical component of MySQL's InnoDB storage engine.
-
Increasing `innodb_buffer_pool_size` allocates more memory for caching data and index pages. This means that more of your frequently accessed data can be stored in memory, reducing the need to read from disk. This can lead to significantly faster SELECT queries, as data retrieval from memory is much faster than disk I/O.
-
Reduced Disk I/O - A larger buffer pool reduces the frequency of disk reads
-
Improved Concurrency - When data is readily available in memory, there is less contention for disk resources.
some considerations and potential drawbacks
-
**Memory Usage**: A larger buffer pool consumes more memory.
-
**Diminishing Returns**: Increasing the buffer pool size beyond a certain point may yield diminishing returns. The optimal size depends on the workload and the available system resources.
-
**InnoDB Locking**: A large buffer pool doesn't directly affect write operations, which are influenced more by factors like the InnoDB log and transaction configuration. So, for write-heavy workloads, optimizing other parameters is necessary.
-
**Configuration Impact**: Modifying the `innodb_buffer_pool_size` requires a server restart to take effect. It's essential to carefully plan and test any changes
To determine the optimal `innodb_buffer_pool_size` for your specific workload, you should monitor your database's performance, analyze its behavior, and consider factors like the available system memory, the size of your dataset, and the nature of your queries. It's a good practice to start with a conservative setting, monitor performance, and gradually increase it as needed, all while keeping an eye on system memory usage. Consulting with a database administrator or MySQL performance expert can also be helpful in making these decisions.
-
Use database monitoring tools to keep track of query performance and system resource utilization. Regularly review performance metrics and adjust configurations accordingly.
-
To get the best cost , performance, do not use Azure managed db
-
Consider switching to Amazon AWS or continue without managed db.
-
Azure managed db currently 3x slower than container based db
-
Table indexes could be added in some tables that may benefit. This can significantly speed up query performance.
-
Review core and contrib module performance tags on drupal.org for performance patches and related information. Example: https://www.drupal.org/project/issues/search?issue_tags=Performance
-
Use XHProf for Application performance and logic profiling to target our efforts
-
Clean up and optimize existing content.
-
This will improve db dump times, possibly other performance benefits
User interface changes
N/A
API changes
TBD
Data model changes
TBD
Comments
Comment #2
sylus commentedJust a bit of background that I have more information in this regard in relation to Azure. :)
The problem is with the difference btw Single Server and Flexible Server.
Single Server
Single Server is on a retirement path w/Azure and all future installs are asked to use Flexible Server for both MySQL and PostgreSQL.
This is where most of the performance issues arise due to the fact that Single Server is on a Shared pool of backend windows node behind a gateway. Performance can be slightly improved by using redirection which reduces network latency between client applications and MySQL servers by allowing applications to connect directly to backend server nodes. Please note no functionality like this was added for the PostgreSQL Single Server.
https://learn.microsoft.com/en-us/azure/mysql/single-server/how-to-redir...
Flexible Server
With Flexible Server you get pretty much a managed virtual machine of which you get complete control over the IOPS and it is actually running under Linux rather then Windows and best of all no invisible gateway. With this setup I have found performance to be acceptable. I would say with all the proper configuration you can probably in most cases get similar to 75-80% of the performance you would get with a local containerized db that would obviously have less latency then a remote db even if in the same availability zone. People can do further improvements to this which I have been exploring using ProxySQL to account for query caching and other stuff that only MariaDB has over stock MySQL.
https://github.com/canada-ca-terraform-modules/terraform-azurerm-flex-mysql
https://github.com/canada-ca-terraform-modules/terraform-azurerm-flex-po...
Comment #3
joseph.olstad@sylus, thanks so much for the explanation, this is a huge leap forward for us!
I know at least one skilled devops resource who hit a wall on this for quite some time, this is key information that will bring redemption!
This information was not easy to find. It's good to know that Azure has fixed the problem with the Flexible server option.
Comment #4
joseph.olstadI host my own physical server that absolutely blows away anything I've seen from AWS and Azure.
I run two RAID arrays.
The first RAID array is for the the system drive is the fastest currently available nand devices with two M2.1 interfaces on the motherboard.
The second RAID array is a conventional SATA mounted to /data and is a conventional array. I knock off one of the main system devices out of the first array, so that it's unmounted and make an image and store the image on the /data partition.
After making an image , re-add the device back to the array. This way easy to make a fresh clone to secondary warm spare live without down time.
From there, a weekly or daily rsync.
For high availability situations, two warm spares where the second warm spare is a couple weeks cold.
I have yet to see a proper high performance Azure server or AWS server.
I am currently doing an upgrade, the first array will be a 2T highest possible performance
second array is Western Digital Gold conventional 8T drives.
Very disappointing performance from both Azure and Amazon AWS and extremely high cost. With that said, the Flexible server option sounds like it should be a huge improvement over the previous option we were stuck with.
Comment #5
joseph.olstad@sylus,
We're continuing ongoing evaluation of Azure. The flexible server option is in use.
I'm going to paraphrase a colleague on this.
He says this:
"IOPS is not even the issue here, I still believe. The issue is whatever Azure did to secure connections etc. such as logging, auditing, intrution detection overhead. It is not a Drupal issue, it is an azure design issue.
Profiling with AWS, only 9 seconds vs 71 seconds with Azure"
7.653 seconds with my personal laptop running the wxt-docker-scaffold from the application docker container to the mariadb container.
However run the same test in an Azure environment and we're seeing between 32seconds and 47 seconds for the exact same test.
It's between 6 and 8 times slower than a 15 watt low power laptop running a full stack of 6 docker containers.
AWS at least is able to deliver decent performance but Azure is not, and I threw in my own personal laptop score results just for fun.
My personal laptop is using a consumer grade m2.1 device that delivers 3500 mb/s . On my servers I run 7000 mb/s devices, so I find anything cloud to be extremely slow compared to my physical servers that run raid1 arrays of 7000 mb/s m2.1 nand devices.
Comment #6
joseph.olstadAn update on this, I ran another benchmark directly on a server grade db host
Using a socket connection (localhost)
we found a real result of 3.8 seconds
Compared to the loop back 127.0.0.1 came in at over 7.8 seconds (very surprising result)
Compare this to the real result of between 32 and 48 seconds for the azure db from the application side
So, it appears that there's up to 44 seconds of network lag with an azure managed db on 1000 simple select calls.
Surprising to see that a socket is over twice as fast as the loopback 127.0.0.1 when testing directly from the db server host.
Comment #7
joseph.olstadok, ran some more tests, @sylus, you're correct with saying the MariaDB is significantly better, for some reason MySQL has much more TCP lag than MariaDB.
a side note:
if using a socket connection (localhost instead of 127.0.0.1) for example, MySQL 8 compares to MariaDB in the above mentioned test metric. MariaDB 10.6.1 when using an IP address such as 127.0.0.1 however appears to handle TCP much more efficiently than MySQL 8.
Still evaluating.
Comment #8
joseph.olstad@sylus, you mentioned MariaDB in the last paragraph of comment #2, how was this provisioned in the context of azure managed db? Is MariaDB a special option , which option did you choose to provision the azure managed db which enabled you to get MariaDB?
I'm asking because TCP latency is by far improved when using MariaDB 10.6.1 default configuration compared to MySQL 8 default configuration.
Comment #9
joseph.olstadYesterday I ran some additional tests:
installed MariaDB 10.6.15
ran tests
MARIADB 10.6.15 TCP (127.0.0.1) test
MARIADB 10.6.15 SOCKET test
then on the same server -
purged MariaDB then:
MySQL 8 installed
MySQL v8 TCP (127.0.0.1) test
MySQL v8 SOCKET test
Conclusion: MariaDB 10.6.15 handles TCP much more efficiently than MySQL v8 given a default configuration with only one option set
The only option set was:
max_allowed_packets=64MSame option set for both installations and test runs.
MySQL taking 15.260 seconds to complete the TCP test
MariaDB taking 6.084 seconds to complete the TCP test. <-- MariaDB 10.6.15 wins by a huge margin.
Comment #10
joseph.olstadHi @sylus, I've been told that MariaDB is no longer available as a flexible server option.
With my own benchmarks I've noticed IP lag issues with MySQL compared to what MariaDB is delivering.
https://azure.microsoft.com/en-us/updates/azure-database-for-mariadb-wil...
Wondering if you had any idea how this will affect those being forced into the managed db and what might be our options to mitigate this?
It seems very strange that MySQL would have a marked IP lag issue compared to MariaDB, wondering if it is doing something or have a configuration option enabled by default that possibly could be disabled or ajusted to mitigate the performance issue?
Comment #11
joseph.olstadongoing performance work trying to get MySQL 8 to perform fast in Azure has not yet been fruitful. Have cranked up IOPS as high as they go, cranked up the cost of the instance to 16000$ per month or more , still slow, cranked up memory, cpu cores, filesystem options, still slow as molasses.
Yet when testing a db instance on a 200$ option with MariaDB in azure, it's nearly 3x faster than anything we've gotten out of Azure+MySQL 8.
Comment #12
joseph.olstadAdditional analysis comparing MariaDB to MySQL:
Measuring the /var/lib/mysql folders respectively from a fresh install and loading the same wxt based drupal dump file into both MySQL and MariaDB.
drush crdu -h /var/lib/mysql/schemanameordu -h /var/lib/mysql(can't recall which one I compared, but both same comparison between MySQL/MariaDB)Conclusion:
MySQL spreads it's filesystem out 5x larger than MariaDB with the exact same number of data rows and tables.
MariaDB performance is vastly superior in comparison. MariaDB is vastly more efficient.
** EDIT ** MariaDB has built-in ProxySQL service which vastly improves TCP throughput and drastically reduces TCP latency . MySQL does NOT have this and therefore suffers with very poor tcp performance. ** END EDIT **
Implications:
Comment #13
joseph.olstadAnother note on the TCP performance.
MariaDB has ProxySQL built-in which vastly improves tcp performance as seen from the socket vs localhost test results comparing MariaDB and MySQL.
MySQL does not have ProxySQL - TCP performance is very poor in comparison to MariaDB.
Exploring a new option now, PostgreSQL with Azure.
Comment #14
joseph.olstad#3478097: MySQL no longer recommended and remove Percona
Comment #15
smulvih2