
MongoDB tools from the community that complement ClusterControl


Since MongoDB is the favored database for many developers, it comes as no surprise that community support is excellent. You can quickly find answers to most of your problems on knowledge sites like Stack Overflow, but the community also creates many tools, scripts and frameworks around MongoDB.

ClusterControl is part of the community tools that allow you to deploy, monitor, manage and scale any MongoDB topology. ClusterControl is designed around the database lifecycle, but naturally it can’t cover all aspects of a development cycle. This blog post will cover a selection of community tools that can be used to complement ClusterControl in managing a development cycle.

Schema management

The pain of schema changes in conventional RDBMS was one of the drivers behind the creation of MongoDB: we all suffered from painfully slow or failed schema migrations. Therefore MongoDB has been developed with a schemaless document design. This allows you to change your schema whenever you like, without the database holding you back.

Schema changes are generally made during application development. Adding new features to existing modules, or creating new modules, may involve creating another version of your schema. Schema and performance optimizations may also produce new versions of your schema.

Even though many people will say it’s brilliant not being held back by the database, it also brings a couple of issues: since old data is not migrated to the new schema design, your application should be able to cope with every schema version present in your database. Alternatively, you could update all (old) data to the newer schema right after you have deployed the application.

The tools discussed in this section will all be very helpful in solving these schema issues.

Meteor Collection2

The Meteor Collection2 module ensures that the schema is validated on both the client and the server side, so that all data gets written according to the defined schema. The module is only reactive, though: whenever data is not written according to the schema, a warning is returned.

Mongoose

Mongoose is Node.js middleware for schema modelling and validation. The schema definition is placed inside your Node.js application, and this will allow Mongoose to act as an ORM. Mongoose will not migrate existing data into the new schema definition.

MongoDB Schema

So far we have only spoken about schema changes, so it is time to introduce MongoDB Schema. MongoDB Schema is a schema analyzer that takes a (random) sample of your data and outputs the schema for the sampled data. This means its schema estimate is not necessarily 100% accurate.

With this tool you could regularly check your data against your schema and detect important or unintentional changes in your schema.

Backups

ClusterControl supports two implementations for backing up MongoDB: mongodump and Percona Consistent Backup. Still, some less regularly used functionality, like partial/incremental backups and streaming backups to other clusters, is not available out of the box.
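If you just need a plain logical dump outside of ClusterControl, a minimal mongodump invocation could look like the sketch below (host, database, collection and output paths are placeholders; --gzip requires a reasonably recent version of the MongoDB tools):

# full-instance dump, including the oplog for point-in-time consistency
mongodump --host 10.0.0.10 --port 27017 --oplog --gzip --out /backups/mongodb-$(date +%F)

# single-collection dump (--oplog cannot be combined with --db/--collection)
mongodump --host 10.0.0.10 --port 27017 --db shop --collection orders --gzip --out /backups/shop-orders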

MongoDB Backup

MongoDB Backup is a Node.js logical backup solution that offers similar functionality to mongodump. In addition to this, it can also stream backups over the network, making it useful for transporting a collection from one MongoDB instance to another.

Another useful aspect is that it has been written in Node.js. This means it is very easy to integrate into a Hubot chatbot and automate collection transfers. Don’t worry if your company isn’t using Hubot as a chatbot: it can also function as a webhook or be controlled via the CLI.

Mongob

Mongob is another logical backup solution, but in this case it has been written in Python and is only available as a CLI tool. Just like MongoDB Backup, it is able to transfer databases and collections between MongoDB instances, but in addition to that, it can also limit the transfer rate.

Another useful feature of Mongob is that it can create incremental backups. This is good if you wish to have more compact backups, but also if you need to perform a point-in-time recovery.

MongoRocks Strata

MongoRocks Strata is the backup tool for the MongoRocks storage engine. Percona Server for MongoDB includes the MongoRocks storage engine; however, it lacks the Strata backup tool for making file-level backups. In principle, mongodump and Percona Consistent Backup are able to make reliable backups, but as they are logical dumps, recovery time will be long.

MongoRocks is a storage engine that relies on an LSM tree architecture. This basically means it is an append-only store. To be able to do this, it operates with buckets of data: older data is stored in larger (archive) buckets, recent data is stored in smaller (recent) buckets, and all new incoming data is written into a special memory bucket. Every time a compaction is done, data trickles down from the memory bucket to the recent buckets, and from the recent buckets further down into the archive buckets.

To make a backup of all buckets, Strata instructs MongoDB to flush the memory bucket to disk, and then it copies all data buckets at file level. This creates a consistent backup of all available data. It is also possible to instruct Strata to only copy the recent buckets and effectively take an incremental backup.

Another good point of Strata is that it provides the mongoq binary, which allows you to query the backups directly. This means there is no need to restore the backup to a MongoDB instance before being able to query it. You could leverage this functionality to ship your production data offline to your analytics system!

MongoDB GUIs

Within ClusterControl we allow querying the MongoDB databases and collections via advisors. These advisors can be developed in the ClusterControl Developer Studio interface. We don’t feature a direct interface with the databases, so to make changes to your data you will either need to log into the MongoDB shell, or have a tool that allows you to make these changes.

PHPMoAdmin

PHPMoAdmin is the MongoDB equivalent of PHPMyAdmin, offering similar functionality: data and admin management. The tool allows you to perform CRUD operations in both JSON and PHP syntax on all databases and collections. On top of that, it also features import/export functionality for your current data selection.

Mongo-Express

If you seek a versatile data browser, Mongo-Express is a tool you definitely need to check out. Not only does it allow similar operations to PHPMoAdmin, it can also display images and videos inline. It even supports fetching large objects from GridFS buckets.

Robomongo

The tool that goes one step further is Robomongo. Being a crowd-funded tool, its feature list is huge. It can perform all the same operations as Mongo-Express, but in addition it also allows user, role and collection management. For connectivity it supports direct MongoDB connections, as well as replica set topologies and MongoDB Atlas instances.

Conclusion

With this selection of free community tools, we hope we have given you a good overview of how to manage MongoDB data alongside ClusterControl.

Happy clustering!


MySQL in the Cloud - Online Migration from Amazon RDS to EC2 instance (part 1)


In our previous blog, we saw how easy it is to get started with RDS for MySQL. It is a convenient way to deploy and use MySQL, without worrying about operational overhead. The tradeoff though is reduced control, as users are entirely reliant on Amazon staff in case of poor performance or operational anomalies. No access to the data directory or physical backups makes it hard to move data out of RDS. This can be a major problem if your database outgrows RDS, and you decide to migrate to another platform. This two-part blog shows you how to do an online migration from RDS to your own MySQL server.

We’ll be using EC2 to run our own MySQL server. It can be a first step for more complex migrations to your own private datacenter. EC2 gives you access to your data, so xtrabackup can be used. EC2 also allows you to set up SSH tunnels, and it removes the requirement of setting up hardware VPN connections between your on-premises infrastructure and the VPC.

Assumptions

Before we start, we need to make a couple of assumptions - especially around security. First and foremost, we assume that the RDS instance is not accessible from outside of AWS. We also assume that you have an application in EC2. This implies that either the RDS instance and the rest of your infrastructure share a VPC, or that access is configured between them one way or the other. In short, we assume that you can create a new EC2 instance and that it will have access (or can be configured to have access) to your MySQL RDS instance.

We have configured ClusterControl on the application host. We’ll use it to manage our EC2 MySQL instance.

Initial setup

In our case, the RDS instance shares the same VPC with our “application” (an EC2 instance with IP 172.30.4.228) and the host which will be the target for the migration process (an EC2 instance with IP 172.30.4.238). As the application, we are going to use the tpcc-mysql benchmark, executed in the following way:

./tpcc_start -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com -d tpcc1000 -u tpcc -p tpccpass -w 20 -r 60 -l 600 -i 10 -c 4

Initial plan

We are going to perform a migration using the following steps:

  1. Set up our target environment using ClusterControl - install MySQL on 172.30.4.238
  2. Then, install ProxySQL, which we will use to manage our traffic at the time of failover
  3. Dump the data from the RDS instance
  4. Load the data into our target host
  5. Set up replication between the RDS instance and the target host
  6. Switch over traffic from RDS to the target host

Prepare environment using ClusterControl

Assuming we have ClusterControl installed (if you don’t, you can grab it from https://severalnines.com/download-clustercontrol-database-management-system), we need to set up our target host. We will use the deployment wizard from ClusterControl for that:

Deploying a Database Cluster in ClusterControl

Once this is done, you will see a new cluster (in this case, just your single server) in the cluster list:

Database Cluster in ClusterControl

The next step is to install ProxySQL - starting from ClusterControl 1.4 you can do it easily from the UI. We covered this process in detail in this blog post. When installing it, we picked our application host (172.30.4.228) as the host to install ProxySQL on. You also have to pick a host to route your traffic to. As we only have our “destination” host in the cluster, you can include it, but then a couple of changes are needed to redirect traffic to the RDS instance.

If you have chosen to include the destination host (in our case 172.30.4.238) in the ProxySQL setup, you’ll see the following entries in the mysql_servers table:

mysql> select * from mysql_servers\G
*************************** 1. row ***************************
       hostgroup_id: 20
           hostname: 172.30.4.238
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 100
max_replication_lag: 10
            use_ssl: 0
     max_latency_ms: 0
            comment: read server
*************************** 2. row ***************************
       hostgroup_id: 10
           hostname: 172.30.4.238
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 100
max_replication_lag: 10
            use_ssl: 0
     max_latency_ms: 0
            comment: read and write server
2 rows in set (0.00 sec)

ClusterControl configured ProxySQL to use hostgroups 10 and 20 to route writes and reads to the backend servers. We will have to remove the currently configured host from those hostgroups and add the RDS instance there. First, though, we have to ensure that ProxySQL’s monitor user can access the RDS instance.

mysql> SHOW VARIABLES LIKE 'mysql-monitor_username';
+------------------------+------------------+
| Variable_name          | Value            |
+------------------------+------------------+
| mysql-monitor_username | proxysql-monitor |
+------------------------+------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'mysql-monitor_password';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| mysql-monitor_password | monpass |
+------------------------+---------+
1 row in set (0.00 sec)

We need to grant this user access to RDS. If we needed it to track replication lag, the user would have to have the ‘REPLICATION CLIENT’ privilege. In our case it is not needed as we don’t have a slave RDS instance - ‘USAGE’ will be enough.

root@ip-172-30-4-228:~# mysql -ppassword -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 210
Server version: 5.7.16-log MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CREATE USER 'proxysql-monitor'@172.30.4.228 IDENTIFIED BY 'monpass';
Query OK, 0 rows affected (0.06 sec)

Now it’s time to reconfigure ProxySQL. We are going to add the RDS instance to both the writer (10) and reader (20) hostgroups. We will also remove 172.30.4.238 from those hostgroups - we’ll simply move it aside by adding 100 to each of its hostgroup_id values (10 becomes 110, 20 becomes 120).

mysql> INSERT INTO mysql_servers (hostgroup_id, hostname, max_connections, max_replication_lag) VALUES (10, 'rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com', 100, 10);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO mysql_servers (hostgroup_id, hostname, max_connections, max_replication_lag) VALUES (20, 'rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com', 100, 10);
Query OK, 1 row affected (0.00 sec)
mysql> UPDATE mysql_servers SET hostgroup_id=110 WHERE hostname='172.30.4.238' AND hostgroup_id=10;
Query OK, 1 row affected (0.00 sec)
mysql> UPDATE mysql_servers SET hostgroup_id=120 WHERE hostname='172.30.4.238' AND hostgroup_id=20;
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)
mysql> SAVE MYSQL SERVERS TO DISK;
Query OK, 0 rows affected (0.07 sec)

The last step required before we can use ProxySQL to redirect our traffic is to add our application user to ProxySQL.

mysql> INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('tpcc', 'tpccpass', 1, 10);
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK; SAVE MYSQL USERS TO MEMORY;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.05 sec)

Query OK, 0 rows affected (0.00 sec)
mysql> SELECT username, password FROM mysql_users WHERE username='tpcc';
+----------+-------------------------------------------+
| username | password                                  |
+----------+-------------------------------------------+
| tpcc     | *8C446904FFE784865DF49B29DABEF3B2A6D232FC |
+----------+-------------------------------------------+
1 row in set (0.00 sec)

Quick note - we executed “SAVE MYSQL USERS TO MEMORY;” only to have the password hashed not only in RUNTIME but also in the working memory buffer. You can find more details about ProxySQL’s password hashing mechanism in their documentation.

We can now redirect our traffic to ProxySQL. How to do it depends on your setup; we just restarted tpcc and pointed it at ProxySQL.

Redirecting Traffic with ProxySQL

At this point, we have built a target environment to which we will migrate. We also prepared ProxySQL and configured it for our application to use. We now have a good foundation for the next step, which is the actual data migration. In the next post, we will show you how to copy the data out of RDS into our own MySQL instance (running on EC2). We will also show you how to switch traffic to your own instance while applications continue to serve users, without downtime.

MySQL in the Cloud - Online Migration from Amazon RDS to your own server (part 2)


As we saw earlier, it might be challenging for companies to move their data out of RDS for MySQL. In the first part of this blog, we showed you how to set up your target environment on EC2 and insert a proxy layer (ProxySQL) between your applications and RDS. In this second part, we will show you how to do the actual migration of data to your own server, and then redirect your applications to the new database instance without downtime.

Copying data out of RDS

Once we have our database traffic running through ProxySQL, we can start preparations to copy our data out of RDS. We need to do this in order to set up replication between RDS and our MySQL instance running on EC2. Once this is done, we will configure ProxySQL to redirect traffic from RDS to our MySQL/EC2.

As we discussed in the first blog post in this series, the only way you can get data out of RDS is via a logical dump. Without access to the instance, we cannot use any hot, physical backup tools like xtrabackup. We cannot use snapshots either, as there is no way to build anything other than a new RDS instance from a snapshot.

We are limited to logical dump tools, therefore the logical option would be to use mydumper/myloader to process the data. Luckily, mydumper can create consistent backups, so we can rely on it to provide binlog coordinates for our new slave to connect to. The main issue when building RDS replicas is the binlog rotation policy - a logical dump and load may take days on larger (hundreds of gigabytes) datasets, and you need to keep binlogs on the RDS instance for the duration of this whole process. Sure, you can increase the binlog retention on RDS (call mysql.rds_set_configuration('binlog retention hours', 24); - you can keep them for up to 7 days), but it’s much safer to do it differently.
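As a quick reference, and assuming the standard RDS stored procedures, raising and verifying the retention from the mysql client would look roughly like this (168 hours is the 7-day maximum mentioned above):

mysql> CALL mysql.rds_set_configuration('binlog retention hours', 168);
mysql> CALL mysql.rds_show_configuration;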

Before we proceed with taking a dump, we will add a replica to our RDS instance.

Amazon RDS Dashboard
Create Replica DB in RDS

Once we click on the “Create Read Replica” button, a snapshot of the “master” RDS instance will be started. It will be used to provision the new slave. The process may take hours - it all depends on the volume size, when the last snapshot was taken, and the performance of the volume (io1/gp2? Magnetic? How many provisioned IOPS does the volume have?).
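If you prefer the CLI over the console, the same read replica can be created with the AWS CLI; the instance identifiers below are hypothetical and should be replaced with your own:

aws rds create-db-instance-read-replica \
    --db-instance-identifier rds2-replica \
    --source-db-instance-identifier rds2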

Master RDS Replica

When the slave is ready (its status has changed to “available”), we can log into it using its RDS endpoint.

RDS Slave

Once logged in, we will stop replication on our slave - this ensures the RDS master won’t purge binary logs, so they will still be available for our EC2 slave once we complete our dump/reload process.

mysql> CALL mysql.rds_stop_replication;
+---------------------------+
| Message                   |
+---------------------------+
| Slave is down or disabled |
+---------------------------+
1 row in set (1.02 sec)

Query OK, 0 rows affected (1.02 sec)

Now, it’s finally time to copy data to EC2. First, we need to install mydumper. You can get it from GitHub: https://github.com/maxbube/mydumper. The installation process is fairly simple and nicely described in the readme file, so we won’t cover it here. Most likely you will have to install a couple of packages (listed in the readme); the harder part is to identify which package contains mysql_config - it depends on the MySQL flavor (and sometimes also the MySQL version).
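As a rough sketch only (package names are for Debian/Ubuntu with Oracle MySQL and may differ for your distribution and MySQL flavor - the readme remains the authoritative reference), the build usually boils down to:

apt-get install -y cmake g++ libglib2.0-dev zlib1g-dev libpcre3-dev libmysqlclient-dev
cmake .
make
./mydumper --version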

Once you have mydumper compiled and ready to go, you can execute it:

root@ip-172-30-4-228:~/mydumper# mkdir /tmp/rdsdump
root@ip-172-30-4-228:~/mydumper# ./mydumper -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com -p tpccpass -u tpcc  -o /tmp/rdsdump  --lock-all-tables --chunk-filesize 100 --events --routines --triggers

Please note the --lock-all-tables option, which ensures that the snapshot of the data will be consistent and that it will be possible to use it to create a slave. Now we have to wait until mydumper completes its task.

One more step is required - we don’t want to restore the mysql schema but we need to copy users and their grants. We can use pt-show-grants for that:

root@ip-172-30-4-228:~# wget http://percona.com/get/pt-show-grants
root@ip-172-30-4-228:~# chmod u+x ./pt-show-grants
root@ip-172-30-4-228:~# ./pt-show-grants -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com -u tpcc -p tpccpass > grants.sql

Sample output of pt-show-grants may look like this:

-- Grants for 'sbtest'@'%'
CREATE USER IF NOT EXISTS 'sbtest'@'%';
ALTER USER 'sbtest'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*2AFD99E79E4AA23DE141540F4179F64FFB3AC521' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE USER, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, PROCESS, REFERENCES, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, SELECT, SHOW DATABASES, SHOW VIEW, TRIGGER, UPDATE ON *.* TO 'sbtest'@'%';

It is up to you to pick which users need to be copied onto your MySQL/EC2 instance. It doesn’t make sense to do it for all of them. For example, root users don’t have the ‘SUPER’ privilege on RDS, so it’s better to recreate them from scratch. What you need to copy are the grants for your application user. We also need to copy the users used by ProxySQL (proxysql-monitor in our case).


Inserting data into your MySQL/EC2 instance

As stated above, we don’t want to restore system schemas. Therefore we will move files related to those schemas out of our mydumper directory:

root@ip-172-30-4-228:~# mkdir /tmp/rdsdump_sys/
root@ip-172-30-4-228:~# mv /tmp/rdsdump/mysql* /tmp/rdsdump_sys/
root@ip-172-30-4-228:~# mv /tmp/rdsdump/sys* /tmp/rdsdump_sys/

When we are done with that, it’s time to start loading data into the MySQL/EC2 instance:

root@ip-172-30-4-228:~/mydumper# ./myloader -d /tmp/rdsdump/ -u tpcc -p tpccpass -t 4 --overwrite-tables -h 172.30.4.238

Please note that we used four threads (-t 4) - make sure you set this to whatever makes sense in your environment. It’s all about saturating the target MySQL instance - either CPU or I/O, depending on the bottleneck. We want to squeeze as much out of it as possible to ensure we use all available resources for loading the data.

After the main data is loaded, there are two more steps to take; both are related to RDS internals and both may break our replication. First, RDS contains a couple of rds_* tables in the mysql schema. We want to load them in case some of them are used by RDS - replication will break if our slave doesn’t have them. We can do it in the following way:

root@ip-172-30-4-228:~/mydumper# for i in $(ls -alh /tmp/rdsdump_sys/ | grep rds | awk '{print $9}') ; do echo $i ;  mysql -ppass -uroot  mysql < /tmp/rdsdump_sys/$i ; done
mysql.rds_configuration-schema.sql
mysql.rds_configuration.sql
mysql.rds_global_status_history_old-schema.sql
mysql.rds_global_status_history-schema.sql
mysql.rds_heartbeat2-schema.sql
mysql.rds_heartbeat2.sql
mysql.rds_history-schema.sql
mysql.rds_history.sql
mysql.rds_replication_status-schema.sql
mysql.rds_replication_status.sql
mysql.rds_sysinfo-schema.sql

A similar problem exists with the timezone tables; we need to load them using data from the RDS instance:

root@ip-172-30-4-228:~/mydumper# for i in $(ls -alh /tmp/rdsdump_sys/ | grep time_zone | grep -v schema | awk '{print $9}') ; do echo $i ;  mysql -ppass -uroot  mysql < /tmp/rdsdump_sys/$i ; done
mysql.time_zone_name.sql
mysql.time_zone.sql
mysql.time_zone_transition.sql
mysql.time_zone_transition_type.sql

When all this is ready, we can set up replication between RDS (master) and our MySQL/EC2 instance (slave).

Setting up replication

Mydumper, when performing a consistent dump, writes down the binary log position. We can find this data in a file called metadata in the dump directory. Let’s take a look at it; we will then use the position to set up replication.

root@ip-172-30-4-228:~/mydumper# cat /tmp/rdsdump/metadata
Started dump at: 2017-02-03 16:17:29
SHOW SLAVE STATUS:
    Host: 10.1.4.180
    Log: mysql-bin-changelog.007079
    Pos: 10537102
    GTID:

Finished dump at: 2017-02-03 16:44:46

One last thing we lack is a user that we can use to set up our slave. Let’s create one on the RDS instance:

root@ip-172-30-4-228:~# mysql -ppassword -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com
mysql> CREATE USER IF NOT EXISTS 'rds_rpl'@'%' IDENTIFIED BY 'rds_rpl_pass';
Query OK, 0 rows affected (0.04 sec)
mysql> GRANT REPLICATION SLAVE ON *.* TO 'rds_rpl'@'%';
Query OK, 0 rows affected (0.01 sec)

Now it’s time to slave our MySQL/EC2 server off the RDS instance:

mysql> CHANGE MASTER TO MASTER_HOST='rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com', MASTER_USER='rds_rpl', MASTER_PASSWORD='rds_rpl_pass', MASTER_LOG_FILE='mysql-bin-changelog.007079', MASTER_LOG_POS=10537102;
Query OK, 0 rows affected, 2 warnings (0.03 sec)
mysql> START SLAVE;
Query OK, 0 rows affected (0.02 sec)
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
                  Master_Host: rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com
                  Master_User: rds_rpl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin-changelog.007080
          Read_Master_Log_Pos: 13842678
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 20448
        Relay_Master_Log_File: mysql-bin-changelog.007079
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 10557220
              Relay_Log_Space: 29071382
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 258726
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1237547456
                  Master_UUID: b5337d20-d815-11e6-abf1-120217bb3ac2
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.01 sec)

The last step will be to switch our traffic from the RDS instance to MySQL/EC2, but we need to let it catch up first.
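A simple way to watch the catch-up progress is to poll Seconds_Behind_Master on the EC2 slave until it reaches 0 (host and credentials below are the placeholders used throughout this example; adjust them to your setup):

watch -n 10 "mysql -h 172.30.4.238 -uroot -ppass -e 'SHOW SLAVE STATUS\G' | grep -E 'Seconds_Behind_Master|Slave_IO_Running|Slave_SQL_Running'"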

When the slave has caught up, we need to perform a cutover. To automate it, we decided to prepare a short bash script which will connect to ProxySQL and do what needs to be done.

# At first, we define old and new masters
OldMaster=rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com
NewMaster=172.30.4.238

(
# We remove entries from mysql_replication_hostgroup so ProxySQL logic won’t interfere
# with our script

echo "DELETE FROM mysql_replication_hostgroups;"

# Then we set current master to OFFLINE_SOFT - this will allow current transactions to
# complete while not accepting any more transactions - they will wait (by default for
# 10 seconds) for a master to become available again.

echo "UPDATE mysql_servers SET STATUS='OFFLINE_SOFT' WHERE hostname=\"$OldMaster\";"
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032


# Here we are going to check for connections in the pool which are still used by
# transactions which haven’t closed so far. If we see that neither hostgroup 10 nor
# hostgroup 20 has open transactions, we can perform a switchover.

CONNUSED=`mysql -h 127.0.0.1 -P6032 -uadmin -padmin -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND (hostgroup=10 OR hostgroup=20)' -B -N 2> /dev/null`
TRIES=0
while [ $CONNUSED -ne 0 -a $TRIES -ne 20 ]
do
  CONNUSED=`mysql -h 127.0.0.1 -P6032 -uadmin -padmin -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND (hostgroup=10 OR hostgroup=20)' -B -N 2> /dev/null`
  TRIES=$(($TRIES+1))
  if [ $CONNUSED -ne "0" ]; then
    sleep 0.05
  fi
done

# Here is our switchover logic - we basically exchange hostgroups for RDS and EC2
# instance. We also configure back mysql_replication_hostgroups table.

(
echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=110 WHERE hostname=\"$OldMaster\" AND hostgroup_id=10;"
echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=120 WHERE hostname=\"$OldMaster\" AND hostgroup_id=20;"
echo "UPDATE mysql_servers SET hostgroup_id=10 WHERE hostname=\"$NewMaster\" AND hostgroup_id=110;"
echo "UPDATE mysql_servers SET hostgroup_id=20 WHERE hostname=\"$NewMaster\" AND hostgroup_id=120;"
echo "INSERT INTO mysql_replication_hostgroups VALUES (10, 20, 'hostgroups');"
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032

When all is done, you should see the following contents in the mysql_servers table:

mysql> select * from mysql_servers;
+--------------+-----------------------------------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+-------------+
| hostgroup_id | hostname                                      | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment     |
+--------------+-----------------------------------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+-------------+
| 20           | 172.30.4.238                                  | 3306 | ONLINE | 1      | 0           | 100             | 10                  | 0       | 0              | read server |
| 10           | 172.30.4.238                                  | 3306 | ONLINE | 1      | 0           | 100             | 10                  | 0       | 0              | read server |
| 120          | rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com | 3306 | ONLINE | 1      | 0           | 100             | 10                  | 0       | 0              |             |
| 110          | rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com | 3306 | ONLINE | 1      | 0           | 100             | 10                  | 0       | 0              |             |
+--------------+-----------------------------------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+-------------+

On the application side, you should not see much of an impact, thanks to the ability of ProxySQL to queue queries for some time.

With this, we have completed the process of moving your database from RDS to EC2. The last step is to remove our RDS slave - it did its job and can be deleted.

In our next blog post, we will build upon that. We will walk through a scenario in which we will move our database out of AWS/EC2 into a separate hosting provider.

Video: The Difference Between MongoDB Sharding and a MongoDB ReplicaSet


In this video we will demonstrate the differences between deploying a MongoDB ReplicaSet versus deploying a MongoDB Sharded Cluster in ClusterControl.  

With the new version of ClusterControl 1.4, you can automatically and securely deploy sharded MongoDB clusters or Replica Sets with ClusterControl’s free community version; as well as automatically convert a Replica Set into a sharded cluster.

The video also demonstrates the different overview sections of each type of deployment within ClusterControl as well as how to add nodes and convert the ReplicaSet into a Sharded Cluster.


What is a MongoDB ReplicaSet?

A replica set is a group of MongoDB instances that host the same data set. In a replica set, one node is the primary, which receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set.

What is a MongoDB Sharded Cluster?

MongoDB sharding is the process of storing data records across multiple machines; it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data, nor provide acceptable read and write throughput.

Video: ClusterControl Developer Studio Introduction Video


The free ClusterControl Developer Studio provides you with a set of monitoring and performance advisors to use, and lets you create custom advisors to add security and stability to your MySQL, Galera, and MongoDB infrastructures.

ClusterControl’s library of Advisors allows you to extend the features of ClusterControl to add even more database management functionality.

Advisors in ClusterControl are powerful constructs; they provide specific advice on how to address issues in areas such as performance, security, log management, configuration, storage space, etc. They can be anything from simple configuration advice, warning on thresholds or more complex rules for predictions, or even cluster-wide automation tasks based on the state of your servers or databases.


Developer Studio Resources

Want to learn more about the Developer Studio in ClusterControl? Check out the information below!

Advisor Highlights

Here is some information on particular advisors that can help you with your instances:

High Availability in ProxySQL: new webinar with René Cannaò


Following the interest we saw in this topic during our recent introduction webinar to ProxySQL, we’re pleased to invite you to join this new webinar on high availability in ProxySQL.

As you may know, the proxy layer is crucial when building a highly available MySQL infrastructure. It is therefore imperative not to let it become a single point of failure on its own. And building a highly available proxy layer creates additional challenges, such as how to manage multiple proxy instances, how to ensure that their configuration is in sync, and how to handle virtual IPs and failover.

In this new webinar with ProxySQL’s creator, René Cannaò, we’ll discuss building a solid, scalable and manageable proxy layer using ProxySQL. And we will demonstrate how you can make your ProxySQL highly available when deploying it from ClusterControl.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, April 4th at 09:00 BST (UK) / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, April 4th at 9:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • Introduction
  • High Availability in ProxySQL
    • Layered approach
    • Virtual IP
    • Keepalived
  • Configuration management in distributed ProxySQL clusters
  • Demo: ProxySQL + keepalived in ClusterControl
    • Deployment
    • Failover
  • Q&A

Speakers

René Cannaò, Creator & Founder, ProxySQL. René has 10 years of working experience as a System, Network and Database Administrator, mainly on Linux/Unix platforms. In the last 4-5 years his experience has focused mainly on MySQL, working as a Senior MySQL Support Engineer at Sun/Oracle and then as a Senior Operational DBA at Blackbird (formerly PalominoDB). In this period he built an analytic and problem-solving mindset, and he is always eager to take on new challenges, especially if they are related to high performance. And then he created ProxySQL …

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

We look forward to “seeing” you there and to insightful discussions!

If you have any questions or would like a personalised live demo, please do contact us.

Video: MySQL Replication & ClusterControl Product Demonstration


The video below details the features and functions that are available in ClusterControl for MySQL Replication.  Included in the video are…

  • How to Deploy Master-Slave Replication
  • How to Deploy Multi-Master Replication
  • MySQL Replication overview including metrics
  • Individual Node overview & management
  • Backup management from Slaves or Masters
  • Adding Nodes
  • Adding Load Balancers

ClusterControl for MySQL Replication

ClusterControl provides advanced deployment, management, monitoring, and scaling functionality to get your MySQL replication instances up and running, using proven methodologies that you can depend on. It makes MySQL Replication easy and secure with point-and-click interfaces, and with no need for specialized knowledge about the technology or multiple tools. It covers all aspects one might expect of a production-ready replication setup.

ClusterControl delivers on an array of features to help deploy, manage, monitor, and scale your MySQL Replication environments.

  • Point-and-Click Deployment:  Point-and-click, automatic deployment for MySQL replication is available in both community and enterprise versions of ClusterControl.
  • Management & Monitoring: ClusterControl provides management features to repair and recover broken nodes, as well as test and automate MySQL upgrades. It also provides a unified view of all MySQL nodes across your data centers and lets you drill down into individual nodes for more detailed statistics.
  • Automatic Failure Detection and Handling: ClusterControl takes care of your replication cluster’s health. If a master failure is detected, ClusterControl automatically promotes one of the available slaves to ensure your cluster is always up.
  • Proxy Integration: ClusterControl makes it easy to build a proxy layer over your replication setup; it shields applications from replication topology changes, server failures and changed writable masters. With just a couple of clicks you can improve the availability of your stack.

Learn more about how ClusterControl can simplify deployment and enhance performance here.

MySQL Replication and GTID-based failover - A Deep Dive into Errant Transactions


For years, MySQL replication used to be based on binary log events - all a slave knew was the exact event and the exact position it had just read from the master. Any single transaction from a master may have ended up in different binary logs, and at different positions in these logs. It was a simple solution that came with limitations - more complex topology changes could require an admin to stop replication on the hosts involved. Or these changes could cause other issues, e.g., a slave couldn’t be moved down the replication chain without a time-consuming rebuild process (we couldn’t easily change replication from A -> B -> C to A -> C -> B without stopping replication on both B and C). We’ve all had to work around these limitations while dreaming about a global transaction identifier.

GTID was introduced along with MySQL 5.6, and brought some major changes in the way MySQL operates. First of all, every transaction has a unique identifier which identifies it in the same way on every server. It no longer matters in which binary log position a transaction was recorded; all you need to know is the GTID: ‘966073f3-b6a4-11e4-af2c-080027880ca6:4’. A GTID is built from two parts - the unique identifier of the server where a transaction was first executed, and a sequence number. In the above example, we can see that the transaction was executed by the server with server_uuid ‘966073f3-b6a4-11e4-af2c-080027880ca6’ and it is the 4th transaction executed there. This information is enough to perform complex topology changes - MySQL knows which transactions have been executed and therefore knows which transactions need to be executed next. Forget about binary logs, it’s all in the GTID.
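To quickly confirm that a given server actually runs with GTIDs enabled, and to see its own identifier, you can check the related variables (a minimal sketch):

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_mode';
mysql> SHOW GLOBAL VARIABLES LIKE 'enforce_gtid_consistency';
mysql> SHOW GLOBAL VARIABLES LIKE 'server_uuid';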

So, where can you find GTIDs? You’ll find them in two places. On a slave, in ‘show slave status;’ you’ll find two columns: Retrieved_Gtid_Set and Executed_Gtid_Set. The first one covers GTIDs which were retrieved from the master via replication, the second covers all transactions which were executed on the given host - whether via replication or executed locally.


Setting up a Replication Cluster the easy way

Deployment of a MySQL replication cluster is very easy in ClusterControl (you can try it for free). The only prerequisite is that all hosts which you will use to deploy MySQL nodes to can be accessed from the ClusterControl instance using a passwordless SSH connection.

When connectivity is in place, you can deploy a cluster by using the “Deploy” option. When the wizard window opens, you need to make a couple of decisions - what do you want to do? Deploy a new cluster, deploy a PostgreSQL node, or import an existing cluster?

We want to deploy a new cluster. We will then be presented with the following screen, in which we need to decide what type of cluster we want to deploy. Let’s pick replication and then pass the required details about SSH connectivity.

When ready, click on Continue. This time we need to decide which MySQL vendor we’d like to use, which version, and a couple of configuration settings including, among others, the password for the root account in MySQL.

Finally, we need to decide on the replication topology - you can either use a typical master - slave setup or create a more complex, active - standby master - master pair (plus slaves, should you want to add them). Once ready, just click on “Deploy” and in a couple of minutes you should have your cluster deployed.

Once this is done, you will see your cluster in the cluster list of ClusterControl’s UI.

With replication up and running, we can take a closer look at how GTID works.

Errant transactions - what is the issue?

As we mentioned at the beginning of this post, GTIDs brought a significant change in the way people should think about MySQL replication. It’s all about habits. Let’s say that, for some reason, an application performed a write on one of the slaves. It shouldn’t have happened, but surprisingly, it happens all the time. As a result, replication stops with a duplicate key error. There are a couple of ways to deal with such a problem. One of them would be to delete the offending row and restart replication. The other would be to skip the binary log event and then restart replication.

STOP SLAVE SQL_THREAD; SET GLOBAL sql_slave_skip_counter = 1; START SLAVE SQL_THREAD;

Both ways should bring replication back to work, but they may introduce data drift, so it is necessary to remember that slave consistency should be checked after such an event (pt-table-checksum and pt-table-sync work well here).
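As a hedged sketch of such a consistency check (host, user and password are placeholders; both tools are part of Percona Toolkit and their defaults may need tuning for your environment):

# run on the master; checksum results are replicated to the percona.checksums table on every slave
pt-table-checksum --replicate=percona.checksums h=master_host,u=checksum_user,p=checksum_pass

# print the statements that would bring the slaves back in sync with the master
pt-table-sync --replicate=percona.checksums --print h=master_host,u=checksum_user,p=checksum_pass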

If a similar problem happens while using GTID, you’ll notice some differences. Deleting the offending row may seem to fix the issue and replication should be able to commence. The other method, using sql_slave_skip_counter, won’t work at all - it’ll return an error. Remember, it’s not about binlog events anymore, it’s all about whether a GTID has been executed or not.

Why does deleting the row only ‘seem’ to fix the issue? One of the most important things to keep in mind regarding GTID is that a slave, when connecting to its master, checks if it is missing any transactions which were executed on that master, and if so, it retrieves and executes them. Transactions that were executed locally on a slave, and therefore do not exist on the master, are called errant transactions - and they become a problem the moment that slave is promoted to master. Let’s assume we ran the following SQL to clear an offending row:

DELETE FROM mytable WHERE id=100;

Let’s check show slave status:

                  Master_UUID: 966073f3-b6a4-11e4-af2c-080027880ca6
           Retrieved_Gtid_Set: 966073f3-b6a4-11e4-af2c-080027880ca6:1-29
            Executed_Gtid_Set: 84d15910-b6a4-11e4-af2c-080027880ca6:1,
966073f3-b6a4-11e4-af2c-080027880ca6:1-29,

And see where the 84d15910-b6a4-11e4-af2c-080027880ca6:1 comes from:

mysql> SHOW VARIABLES LIKE 'server_uuid'\G
*************************** 1. row ***************************
Variable_name: server_uuid
        Value: 84d15910-b6a4-11e4-af2c-080027880ca6
1 row in set (0.00 sec)

As you can see, we have 29 transactions that came from the master with UUID 966073f3-b6a4-11e4-af2c-080027880ca6, and one that was executed locally. Let’s say that at some point we fail over and the master (966073f3-b6a4-11e4-af2c-080027880ca6) becomes a slave. It will check its list of executed GTIDs and will not find this one: 84d15910-b6a4-11e4-af2c-080027880ca6:1. As a result, the related SQL will be executed:

DELETE FROM mytable WHERE id=100;

This is not something we expected… If, in the meantime, the binlog containing this transaction had been purged on the old slave, then the new slave will complain after failover:

                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

How to detect errant transactions?

MySQL provides two functions which come in very handy when you want to compare GTID sets on different hosts.

GTID_SUBSET() takes two GTID sets and checks if the first set is a subset of the second one.

Let’s say we have the following state.

Master:

mysql> show master status\G
*************************** 1. row ***************************
             File: binlog.000002
         Position: 160205927
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 8a6962d2-b907-11e4-bebc-080027880ca6:1-153,
9b09b44a-b907-11e4-bebd-080027880ca6:1,
ab8f5793-b907-11e4-bebd-080027880ca6:1-2
1 row in set (0.00 sec)

Slave:

mysql> show slave status\G
[...]
           Retrieved_Gtid_Set: 8a6962d2-b907-11e4-bebc-080027880ca6:1-153,
9b09b44a-b907-11e4-bebd-080027880ca6:1
            Executed_Gtid_Set: 8a6962d2-b907-11e4-bebc-080027880ca6:1-153,
9b09b44a-b907-11e4-bebd-080027880ca6:1,
ab8f5793-b907-11e4-bebd-080027880ca6:1-4

We can check if the slave has any errant transactions by executing the following SQL:

mysql> SELECT GTID_SUBSET('8a6962d2-b907-11e4-bebc-080027880ca6:1-153,ab8f5793-b907-11e4-bebd-080027880ca6:1-4', '8a6962d2-b907-11e4-bebc-080027880ca6:1-153, 9b09b44a-b907-11e4-bebd-080027880ca6:1, ab8f5793-b907-11e4-bebd-080027880ca6:1-2') as is_subset\G
*************************** 1. row ***************************
is_subset: 0
1 row in set (0.00 sec)

Looks like there are errant transactions. How do we identify them? We can use another function, GTID_SUBTRACT()

mysql> SELECT GTID_SUBTRACT('8a6962d2-b907-11e4-bebc-080027880ca6:1-153,ab8f5793-b907-11e4-bebd-080027880ca6:1-4', '8a6962d2-b907-11e4-bebc-080027880ca6:1-153, 9b09b44a-b907-11e4-bebd-080027880ca6:1, ab8f5793-b907-11e4-bebd-080027880ca6:1-2') as mising\G
*************************** 1. row ***************************
mising: ab8f5793-b907-11e4-bebd-080027880ca6:3-4
1 row in set (0.01 sec)

Our missing GTIDs are ab8f5793-b907-11e4-bebd-080027880ca6:3-4 - those transactions were executed on the slave but not on the master.

How to solve issues caused by errant transactions?

There are two ways - inject empty transactions or exclude transactions from GTID history.

To inject empty transactions we can use the following SQL:

mysql> SET gtid_next='ab8f5793-b907-11e4-bebd-080027880ca6:3';
Query OK, 0 rows affected (0.01 sec)
mysql> begin ; commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)
mysql> SET gtid_next='ab8f5793-b907-11e4-bebd-080027880ca6:4';
Query OK, 0 rows affected (0.00 sec)
mysql> begin ; commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)
mysql> SET gtid_next=automatic;
Query OK, 0 rows affected (0.00 sec)

This has to be executed on every host in the replication topology that does not have those GTIDs executed. If the master is available, you can inject those transactions there and let them replicate down the chain. If the master is not available (for example, it crashed), those empty transactions have to be executed on every slave. Oracle developed a tool called mysqlslavetrx which is designed to automate this process.
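A hedged example of mysqlslavetrx usage follows (the tool ships with MySQL Utilities; the option names and DSN format shown here are from memory, so double-check them against the tool's --help before relying on them):

mysqlslavetrx --gtid-set=ab8f5793-b907-11e4-bebd-080027880ca6:3-4 \
              --slaves=rpl_user:rpl_pass@10.0.0.11:3306,rpl_user:rpl_pass@10.0.0.12:3306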

Another approach is to remove the GTIDs from history:

Stop slave:

mysql> STOP SLAVE;

Print Executed_Gtid_Set on the slave:

mysql> SHOW MASTER STATUS\G

Reset GTID info:

RESET MASTER;

Set GTID_PURGED to the correct GTID set, based on data from SHOW MASTER STATUS. You should exclude errant transactions from the set.

SET GLOBAL GTID_PURGED='8a6962d2-b907-11e4-bebc-080027880ca6:1-153, 9b09b44a-b907-11e4-bebd-080027880ca6:1, ab8f5793-b907-11e4-bebd-080027880ca6:1-2';

Start slave.

mysql> START SLAVE\G

In every case, you should verify the consistency of your slaves using pt-table-checksum and pt-table-sync (if needed) - errant transactions may result in data drift.

Failover in ClusterControl

Starting from version 1.4, ClusterControl enhanced its failover handling processes for MySQL Replication. You can still perform a manual master switch by promoting one of the slaves to master; the rest of the slaves will then fail over to the new master. From version 1.4, ClusterControl also has the ability to perform a fully automated failover should the master fail. We covered it in depth in a blog post describing ClusterControl and automated failover. We’d still like to mention one feature directly related to the topic of this post.

By default, ClusterControl performs failover in a “safe way” - at the time of failover (or switchover, if it’s the user who executed a master switch), ClusterControl picks a master candidate and then verifies that this node does not have any errant transactions which would impact replication once it is promoted to master. If an errant transaction is detected, ClusterControl will stop the failover process and the master candidate will not be promoted to become a new master.

If you want to be 100% certain that ClusterControl will promote a new master even if some issues (like errant transactions) are detected, you can do that using the replication_stop_on_error=0 setting in the cmon configuration. Of course, as we discussed, it may lead to problems with replication - slaves may start asking for a binary log event which is not available anymore.

To handle such cases, we added experimental support for slave rebuilding. If you set replication_auto_rebuild_slave=1 in the cmon configuration and your slave is marked as down with the following error in MySQL, ClusterControl will attempt to rebuild the slave using data from the master:

Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

Such a setting may not always be appropriate, as the rebuilding process will put additional load on the master. It may also be that your dataset is very large and a regular rebuild is not an option - that’s why this behavior is disabled by default.
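As a sketch only (the file path, cluster id and the restart step are assumptions - consult the ClusterControl documentation for your version), the two cmon settings mentioned above would typically be placed in the per-cluster cmon configuration file:

# /etc/cmon.d/cmon_1.cnf  (here "1" is the cluster id - adjust accordingly)
replication_stop_on_error=0
replication_auto_rebuild_slave=1

# restart the controller so the changes are picked up
service cmon restart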


Webinar Replay and Q&A: Load balancing MySQL & MariaDB with ProxySQL & ClusterControl


Thanks to everyone who participated in our recent webinar on how to load balance MySQL and MariaDB with ClusterControl and ProxySQL!

This joint webinar with ProxySQL creator René Cannaò generated a lot of interest … and a lot of questions!

We covered topics such as ProxySQL concepts (with hostgroups, query rules, connection multiplexing and configuration management), went through a live demo of a ProxySQL setup in ClusterControl (try it free) and discussed upcoming ClusterControl features for ProxySQL.

These topics triggered a lot of related questions, to which you can find our answers below.

If you missed the webinar, would like to watch it again or browse through the slides, it is available for viewing online.

Watch the webinar replay

You can also join us for our follow-up webinar next week on Tuesday, April 4th 2017. We’re again joined by René and will be discussing High Availability in ProxySQL.

Sign up for the webinar on HA in ProxySQL

Webinar Questions & Answers

Q. Thank you for your presentation. I have a question about connection multiplexing: does ProxySQL ensure that all statements from start transaction to commit are sent through the same backend connection?

A. This is configurable.

A small preface first: at any time, each client session can have one or more backend connections associated with it. A backend connection is associated with a client when a query needs to be executed, and normally it returns to the connection pool immediately afterwards. “Normally” means that there are circumstances when this doesn’t happen. For example, when a transaction starts, the connection is no longer returned to the connection pool until the transaction completes (either commits or rolls back). This means that all the queries that should be routed to the same hostgroup where the transaction is running are guaranteed to run in the same connection.

Nonetheless, by default, a transaction doesn’t disable query routing. That means that while a transaction is running on one connection to a specific hostgroup, and this connection is associated with only that client, if the client sends a query destined for another hostgroup, that query could be sent to a different connection.

Whether or not the query can be sent to a different connection, based on query rules, is configurable via the value of mysql_users.transaction_persistent:

  • 0 = queries for different hostgroups can be routed to different connections while a transaction is running;
  • 1 = query routing will be disabled while the transaction is running.

The behaviour is configurable because it depends on the application. Some applications require that all the queries are part of the same transaction, others don’t.
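For example, to force all queries of a given application user to stick to the connection that opened the transaction, you could set the flag on that user in ProxySQL’s admin interface (the username below is the one used elsewhere on this blog; treat it as a placeholder):

mysql> UPDATE mysql_users SET transaction_persistent=1 WHERE username='tpcc';
mysql> LOAD MYSQL USERS TO RUNTIME;
mysql> SAVE MYSQL USERS TO DISK;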

Q. What is the best way to set up a ProxySQL cluster? The main concern here is configuration of the ProxySQL cascading throughout the cluster.

A. ProxySQL can be deployed in numerous ways.

One typical deployment pattern is to deploy a ProxySQL instance on every application host. The application would then connect to the proxy using a very low latency connection via a Unix socket. If the number of application hosts increases, you can deploy a middle layer of 3-5 ProxySQL instances and configure all ProxySQL instances on the application servers to connect via this middle layer. Configuration management would typically be handled using infrastructure orchestration tools like Puppet, Chef or Ansible. You can also easily use home-grown scripts, as ProxySQL’s admin interface is accessible via the MySQL command line and ProxySQL reconfiguration can be done by issuing a couple of SQL statements.

Q. How would you recommend to make the ProxySQL layer itself highly available?

A. There are numerous methods to achieve this.

One common method is to deploy a ProxySQL instance on every application host. The application would then connect to the proxy using a very low latency connection via a Unix socket. In such a deployment there is no single point of failure, as every application host connects to the ProxySQL instance installed locally.

When you implement a middle-layer, you will also maintain HA as 3-5 ProxySQL nodes would be enough to make sure that at least some of them are available for local proxies from application hosts.

Another common method of deploying a highly available ProxySQL setup is to use tools like keepalived along with virtual IP. The application will connect to VIP and this IP will be moved from one ProxySQL instance to another if keepalived detects that something happened to the “main” ProxySQL.

Q. How can ProxySQL use the right hostgroup for each query?

A. ProxySQL routes queries to hostgroups based on query rules - it is up to the user to build a set of rules which make sense in their environment.
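
As a hedged example, a typical read/write split could look like the sketch below; the hostgroup numbers and rule IDs are assumptions:

-- Send SELECT ... FOR UPDATE to the writer hostgroup (10), all other SELECTs to the readers (20);
-- anything that matches no rule goes to the user's default hostgroup.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (100, 1, '^SELECT.*FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (200, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;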

Q. Can you tell us more about query mirroring?

A. In general, the implementation of query mirroring in ProxySQL allows you to send traffic to two hostgroups.

Traffic sent to the “main” hostgroup is ensured to reach it (unless there are no hosts in that hostgroup); the mirror hostgroup, on the other hand, will receive traffic on a “best effort” basis - it should, but it is not guaranteed that the query will indeed reach the mirrored hostgroup.

This limits the usefulness of mirroring as a method to replicate data. It is still an amazing way to do load testing of new hardware or a redesigned schema. Of course, mirroring reduces the maximum throughput of the proxy - queries have to be executed twice, so the load is also twice as high. The load is not split between the two, but duplicated.
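
A minimal sketch of such a rule, assuming hostgroup 20 serves production reads and hostgroup 30 contains the hosts under test:

-- Mirror all SELECTs hitting hostgroup 20 to hostgroup 30 on a best-effort basis:
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, mirror_hostgroup, apply)
VALUES (300, 1, '^SELECT', 20, 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;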

Q. And what about query caching?

A. The query cache in ProxySQL is implemented as a simple key->value memory store with a Time To Live for every entry. What will be cached and for how long is decided on the query rules level. The user can define a query rule matching a particular query or a wider spectrum of them. To identify a query result set in the cache, ProxySQL uses the query hash along with information about the user and schema.

How should you set the TTL for a query? The simplest answer: to the maximum replication lag which is acceptable for this query. If you are ok reading stale data from a slave which is lagging 10 seconds behind, you should be fine reading stale data from the cache with a TTL of 10000 milliseconds.
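
For example, a hedged sketch of a caching rule - the query pattern, hostgroup and rule ID are assumptions:

-- Cache matching SELECTs for 10 seconds; cache_ttl is expressed in milliseconds:
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, cache_ttl, apply)
VALUES (400, 1, '^SELECT .* FROM sbtest1', 20, 10000, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;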

Q. Connection limit to backends?

A. ProxySQL indeed implements a connection limit to backend servers. The maximum number of connections to any backend instance is defined in the mysql_servers table.

Because the same backend server can be present in multiple hostgroups, it is possible to define the maximum number of connections per server per hostgroup.

This is useful, for example, to confine specific long running queries to a small set of connections, where they are queued without affecting the rest of the traffic destined to the same server.
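
A minimal sketch of such a limit, with hypothetical hostgroup and host values:

-- Allow ProxySQL at most 20 backend connections to this server within hostgroup 30;
-- the same server can keep a different limit in another hostgroup.
INSERT INTO mysql_servers (hostgroup_id, hostname, port, max_connections)
VALUES (30, '10.0.0.101', 3306, 20);
LOAD MYSQL SERVERS TO RUNTIME;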

Q. Regarding the connection limit from the APP: are connections QUEUED?

A. If you reach the mysql-max_connections, further connections will be rejected with the error “Too many connections”.

It is important to remember that there is not a one-to-one mapping between application connections and backend connections.

That means that:

  • Access to the backends can be queued, but connections from the application are either accepted or rejected.
  • A large number of application connections can use a small number of backend connections.

Q. I haven’t heard of SHUN before: what does it mean?

A. SHUN means that the backend is temporarily marked as non-available, but ProxySQL will attempt to connect to it again after mysql-shun_recovery_time_sec seconds.
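
The recovery interval is a global variable, so a hedged sketch of adjusting it via the admin interface could look like this (the value is just an example):

-- Retry SHUNned backends after 30 seconds:
UPDATE global_variables SET variable_value = '30'
 WHERE variable_name = 'mysql-shun_recovery_time_sec';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;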

Q. Is query sharding available across slaves?

A. Depending on what you mean by sharding, ProxySQL can be used to shard queries across slaves. For example, it is possible to send all traffic for a specific set of tables to a set of slaves (in a hostgroup). Splitting the slaves into multiple hostgroups and sharding queries accordingly can improve performance, as each slave won't have to read from disk data for tables it never queries.

Q. How do you sync the configuration of ProxySQL when you have many instances for HA?

A. Configuration management, typically, would be handled using Puppet/Chef/Ansible infrastructure orchestration tools. You can also easily use home-grown scripts as ProxySQL’s admin interface is accessible via MySQL command line and ProxySQL reconfiguration can be done by issuing a couple of SQL statements.

Q. How flexible or feasible is it to change the ProxySQL config online, e.g. if one database slave is down, how is that handled in such a scenario?

A. ProxySQL configuration can be changed at any time; it has been designed with that level of flexibility in mind.

A ‘database down’ event can be handled in different ways, depending on how ProxySQL is configured. If you rely on replication hostgroups to define writer and reader hostgroups (this is how ClusterControl deploys ProxySQL), ProxySQL will monitor the state of the read_only variable on hosts in both the reader and writer hostgroups, and it will move hosts between them as needed.
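
A hedged sketch of defining such replication hostgroups (the hostgroup numbers are assumptions):

-- Writer hostgroup 10, reader hostgroup 20; ProxySQL moves hosts between them
-- based on the value of their read_only variable.
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 20, 'mysql replication cluster');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;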

If a master is promoted by external tools (like ClusterControl, for example), the read_only values will change, ProxySQL will detect the topology change and it will act accordingly. For a standard “slave down” scenario, no action is required from the management system - without any change in the read_only value, ProxySQL will just detect that the host is not available, stop sending queries to it, and re-execute on other members of the hostgroup those queries which didn't complete on the dead slave.

If we are talking about a setup not using replication hostgroups, then it is up to the user and their scripts/tools to implement some sort of logic and reconfigure ProxySQL at runtime using the admin interface. A slave going down, though, most likely wouldn't require any changes.

Q. Is it somehow possible to SELECT data from one host group into another host group?

A. No, at this point it is not possible to execute cross-hostgroup queries.

Q. What would be RAM/Disk requirements for logs , etc?

A. It basically depends on the number of log entries and on how verbose the ProxySQL log is in your environment. Typically, it is negligible.

Q. Instead of installing ProxySQL on all application servers, could you put a ProxySQL cluster behind a standard load balancer?

A. We see no reason why not. You can put whatever you like in front of ProxySQL - an F5, another layer of software proxies - it is up to you. Please keep in mind, though, that every layer of proxies or load balancers adds latency to your network and, as a result, to your queries.

Q. Can you please comment on Reverse Proxy, whether it can be used in SQL or not?

A. ProxySQL is a Reverse Proxy. Contrary to a Forward Proxy (which acts as an intermediary that simply forwards requests), a Reverse Proxy processes clients' requests and retrieves data from servers on their behalf: clients send requests to ProxySQL, which will understand each request, analyze it, and decide what to do - rewrite, cache, block, re-execute on failure, etc.

Q. Does the user authentication layer work with non-local database accounts, e.g. with the pam modules available for proxying LDAP users to local users?

A. There is no direct support for LDAP integration but, as configuration management in ProxySQL is child's play, it is really simple to put together a script which will pull the user details from LDAP and load them into ProxySQL. You can use cron to sync it often. All ProxySQL needs is a username and a password hash in MySQL format - this is enough to add a user to ProxySQL.
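
A minimal sketch of what such a script would ultimately execute against the admin interface; the username, hash and hostgroup are purely illustrative:

-- The password column stores a MySQL-style password hash, not the plain-text password:
INSERT INTO mysql_users (username, password, default_hostgroup)
VALUES ('app_user', '*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19', 10);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;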

Q. It seems like the prescribed production deployment includes many proxies - are there any suggestions or upcoming work to address how to make configuration changes across all proxies in a consistent manner?

A. At this point it is recommended to leverage configuration management tools like Chef/Ansible/Puppet to manage ProxySQL’s configuration.

Watch the webinar replay

You can also join us for our follow-up webinar next week on Tuesday, April 4th 2017. We’re again joined by René and will be discussing High Availability in ProxySQL.

Sign up for the webinar on HA in ProxySQL

Top mistakes to avoid in MySQL replication


Setting up replication in MySQL is easy, but managing it in production has never been an easy task. Even with the newer GTID auto-positioning, it still can go wrong if you don’t know what you are doing. After setting up replication, all sorts of things can go wrong. Mistakes can easily be made and can have a disastrous ending for your data.

This post will highlight some of the most common mistakes made with MySQL replication, and how you can prevent them.

Setting up replication

When setting up MySQL replication, you need to prime the slave nodes with the dataset from the master. With solutions like Galera cluster, this is automatically handled for you with the method of your choice. For MySQL replication, you need to do this yourself, so naturally you take your standard backup tool.

For MySQL there is a huge variety of backup tools available, but the most commonly used one is mysqldump. Mysqldump outputs a logical backup of the dataset of your master. This means the copy of the data is not going to be a binary copy, but a big file containing queries to recreate your dataset. In most cases this should provide you with a (near) identical copy of your data, but there are cases where it will not - due to the dump being on a per object basis. This means that even before you start replicating data, your dataset is not the same as the one on the master.

There are a couple of tweaks you can apply to make mysqldump more reliable, like dumping in a single transaction, and don't forget to include routines and triggers:

mysqldump -uuser -ppass --single-transaction --routines --triggers --all-databases > dumpfile.sql

A good practice to check whether your slave node is 100% the same is to run pt-table-checksum after setting up the replication:

pt-table-checksum --replicate=test.checksums --ignore-databases mysql h=localhost,u=user,p=pass

This tool will calculate a checksum for each table on the master, replicate the command to the slave and then the slave node will perform the same checksum operation. If any of the tables are not the same, this should be clearly visible in the checksum table.
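
To see which tables, if any, differ, you can run a query along the lines of the sketch below on each slave; it assumes the results were written to test.checksums as in the command above and follows the query suggested in the pt-table-checksum documentation:

SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
  FROM test.checksums
 WHERE master_cnt <> this_cnt
    OR master_crc <> this_crc
    OR ISNULL(master_crc) <> ISNULL(this_crc)
 GROUP BY db, tbl;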

Using the wrong replication method

The default replication method of MySQL was the so-called statement-based replication. This method is exactly what its name implies: a replication stream of every statement run on the master that will be replayed on the slave node. Since MySQL itself is multi-threaded but its (traditional) replication isn't, the order of statements in the replication stream may not be 100% the same. Also, replaying a statement may give different results when it is not executed at the exact same time.

This may result in different datasets between the master and slave, due to data drift. This wasn’t an issue for many years, as not many ran MySQL with many simultaneous threads, but with modern multi-CPU architectures, this actually has become highly probable on a normal day-to-day workload.

The answer from MySQL was the so-called row-based replication. Row-based replication will replicate the data whenever possible, but in some exceptional cases it still uses statements. A good example is a DDL change of a table: replicating it row by row would mean copying every row in the table through replication. Since this is inefficient, such a statement will be replicated in the traditional way. When row-based replication detects data drift, it will stop the slave thread to prevent making things worse.

Then there is a method in between these two: mixed mode replication. This type of replication will always replicate statements, except when the query uses the UUID() function, triggers, stored procedures, UDFs or a few other exceptions. Mixed mode will not solve the issue of data drift and, together with statement-based replication, should be avoided.
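
If you want to switch an existing master to row-based replication, a minimal sketch would be:

-- Also set binlog_format = ROW in my.cnf so the setting survives a restart;
-- sessions opened before the change keep using the old format until they reconnect.
SET GLOBAL binlog_format = 'ROW';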

Circular replication

Running MySQL replication with multi-master is often necessary if you have a multi-datacenter environment. Since the application can’t wait for the master in the other datacenter to acknowledge your write, a local master is preferred. Normally the auto increment offset is used to prevent data clashes between the masters. Having two masters perform writes to each other in this way is a broadly accepted solution.

MySQL Master-Master replication

However if you need to write in multiple datacenters into the same database, you end up with multiple masters that need to write their data to each other. Before MySQL 5.7.6 there was no method to do a mesh type of replication, so the alternative would be to use a circular ring replication instead.

MySQL ring replication topology

Ring replication in MySQL is problematic for the following reasons: latency, high availability and data drift. Writing some data to server A, it would take three hops to end up on server D (via servers B and C). Since (traditional) MySQL replication is single threaded, any long running query in the replication may stall the whole ring. Also, if any of the servers went down, the ring would be broken and currently there is no failover software that can repair ring structures. Finally, data drift may occur when data is written to server A and altered at the same time on server C or D.

Broken ring replication

In general circular replication is not a good fit with MySQL and it should be avoided at all costs. Galera would be a good alternative for multi-datacenter writes, as it has been designed with that in mind.

Stalling your replication with large updates

Often various housekeeping batch jobs will perform various tasks, ranging from cleaning up old data to calculating averages of ‘likes’ fetched from another source. This means that at set intervals, a job will create a lot of database activity and, most likely, write a lot of data back to the database. Naturally this means the activity within the replication stream will increase equally.

Statement-based replication will replicate the exact queries used in the batch jobs, so if the query took half an hour to process on the master, the slave thread will be stalled for at least the same amount of time. This means no other data can replicate and the slave nodes will start lagging behind the master. If this exceeds the threshold of your failover tool or proxy, it may drop these slave nodes from the available nodes in the cluster. If you are using statement-based replication, you can prevent this by crunching the data for your job in smaller batches.
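
A minimal sketch of such batching for a hypothetical purge job - repeat the statement until it affects zero rows instead of deleting everything in one huge transaction:

-- Table name, id column and retention period are made up for this example; each small,
-- ordered DELETE replicates quickly, so the slave SQL thread is never blocked for long.
DELETE FROM app_db.event_log
 WHERE created_at < NOW() - INTERVAL 90 DAY
 ORDER BY id
 LIMIT 10000;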

Now you may think row-based replication isn’t affected by this, as it will replicate the row information instead of the query. This is partly true, as for DDL changes, the replication reverts back to statement-based format. Also large numbers of CRUD operations will affect the replication stream: in most cases this is still a single threaded operation and thus every transaction will wait for the previous one to be replayed via replication. This means that if you have high concurrency on the master, the slave may stall on the overload of transactions during replication.

To get around this, both MariaDB and MySQL offer parallel replication. The implementation may differ per vendor and version. MySQL 5.6 offers parallel replication as long as the queries are separated by schema. MariaDB 10.0 and MySQL 5.7 can both handle parallel replication across schemas, but have other limitations. Executing queries via parallel slave threads may speed up your replication stream if you are write heavy. However, if you aren't, it would be best to stick to the traditional single threaded replication.
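
As an illustration, on MySQL 5.7 enabling parallel appliers could look like the sketch below; MariaDB uses slave_parallel_threads and a slightly different set of options, so check the documentation for your version:

-- LOGICAL_CLOCK lets transactions that committed together on the master be applied in parallel.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;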

Schema changes

Performing schema changes on a running production setup is always a pain. This has to do with the fact that a DDL change will most of the time lock a table and only release this lock once the DDL change has been applied. It even gets worse once you start replicating these DDL changes through MySQL replication, where it will in addition stall the replication stream.

A frequently used workaround is to apply the schema change to the slave nodes first. For statement-based replication this works fine, but for row-based replication this only works up to a certain degree. Row-based replication allows extra columns to exist at the end of the table, so as long as it is able to write the first columns, it will be fine. First apply the change to all slaves, then failover to one of the slaves and then apply the change to the master and attach that as a slave. If your change involves inserting a column in the middle or removing a column, this will not work with row-based replication.

There are tools around that can perform online schema changes more reliably. Percona Online Schema Change (also known as pt-osc) will create a shadow table with the new table structure, insert new data via triggers and backfill data in the background. Once it is done creating the new table, it will simply swap the old for the new table inside a transaction. This doesn't work in all cases, especially if your existing table already has triggers.

An alternative is the gh-ost tool by GitHub. This online schema change tool will first make a copy of your existing table layout, alter the table to the new layout and then hook up the process as a MySQL replica. It will make use of the replication stream to find new rows that have been inserted into the original table, and at the same time it backfills the table. Once it is done backfilling, the original and new tables will be switched. Naturally all operations on the new table will end up in the replication stream as well, thus on each replica the migration happens at the same time.

Memory tables and replication

While we are on the subject of DDLs, a common issue is the creation of memory tables. Memory tables are non-persistent: their table structure remains, but they lose their data after a restart of MySQL. When creating a new memory table on both a master and a slave, both will have an empty table and this will work perfectly fine. Once either one gets restarted, the table will be emptied and replication errors will occur.

Row-based replication will break once the data in the slave node returns different results, and statement-based replication will break once it attempts to insert data that already exists. For memory tables this is a frequent replication-breaker. The fix is easy: simply make a fresh copy of the data, change the engine to InnoDB and it should now be replication safe.
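
A hedged sketch of that fix for a hypothetical table - build an InnoDB copy, load the current contents and swap it in:

CREATE TABLE app_db.sessions_new LIKE app_db.sessions;
ALTER TABLE app_db.sessions_new ENGINE=InnoDB;
INSERT INTO app_db.sessions_new SELECT * FROM app_db.sessions;
RENAME TABLE app_db.sessions TO app_db.sessions_old,
             app_db.sessions_new TO app_db.sessions;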

Setting the read_only variable to true

As we described earlier, not having the same data in the slave nodes can break replication. Often this has been caused by something (or someone) altering the data on the slave node, but not on the master node. Once the master node’s data gets altered, this will be replicated to the slave where it can’t apply the change and this causes the replication to break.

There is an easy prevention for this: setting the read_only variable to true. This disallows anyone from making changes to the data, except for the replication and root users. Most failover managers set this flag automatically to prevent users from writing to the old master during a failover. Some of them even retain this setting after the failover.

This still leaves the root user to execute an errant CRUD query on the slave node. To prevent this from happening, there is a super_read_only variable since MySQL 5.7.8 that even locks out the root user from updating data.
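
A minimal sketch of locking down a slave; replication threads are not affected by either variable:

-- read_only blocks writes from normal users; super_read_only (MySQL 5.7.8+) also
-- blocks users with the SUPER privilege, such as root.
SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;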

Enabling GTID

In MySQL replication, it is essential to start the slave from the correct position in the binary logs. Obtaining this position can be done when making a backup (xtrabackup and mysqldump support this) or when you have stopped slaving on a node that you are making a copy of. Starting replication with the CHANGE MASTER TO command would look like this:

mysql> CHANGE MASTER TO MASTER_HOST='x.x.x.x', MASTER_USER='replication_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='master-bin.0001', MASTER_LOG_POS=4;

Starting replication at the wrong spot can have disastrous consequences: data may be double written or not updated. This causes data drift between the master and the slave node.

Also, failing over a master to a slave involves finding the correct position and changing the master to the appropriate host. MySQL doesn't retain the binary logs and positions from its master, but rather creates its own binary logs and positions. For re-aligning a slave node to the new master this could become a serious problem: the exact position of the master at the moment of failover has to be found on the new master, and only then can all slaves be re-aligned.

To solve this issue, the Global Transaction Identifier (GTID) has been implemented by both Oracle and MariaDB. GTIDs allow auto-aligning of slaves, and in both MySQL and MariaDB the server figures out by itself what the correct position is. However, the two have implemented GTID in different ways and are therefore incompatible. If you need to set up replication from one to the other, the replication should be set up with traditional binary log positioning. Also, your failover software should be made aware not to make use of GTIDs.
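
For comparison with the CHANGE MASTER TO example above, a GTID-based slave no longer needs an explicit file and position - a minimal sketch using Oracle MySQL syntax (MariaDB uses MASTER_USE_GTID=slave_pos instead):

mysql> CHANGE MASTER TO MASTER_HOST='x.x.x.x', MASTER_USER='replication_user', MASTER_PASSWORD='password', MASTER_AUTO_POSITION=1;
mysql> START SLAVE;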

Conclusion

We hope to have given you enough tips to stay out of trouble. These are all common practices used by the experts in MySQL. They had to learn them the hard way; with these tips, you won't have to.

We have some additional white papers that might be useful if you’d like to read more about MySQL replication.

MySQL High Availability tools - Comparing MHA, MRM and ClusterControl


We previously compared two high availability solutions for MySQL - MHA and MariaDB Replication Manager and looked into how they performed fail-over. In this blog post, we’ll see how ClusterControl stacks up against these solutions. Since MariaDB Replication Manager is under active development, we decided to take a look at the not yet released version 1.1.

Flapping

All solutions provide flapping detection. MHA, by default, executes a failover once. Even after you restart masterha_manager, it will still check whether the last failover happened too recently. If it did (by default, within the last 8 hours), no new failover will happen. You need to explicitly change the timeout or set the --ignore_last_failover flag.

MariaDB Replication Manager has less strict defaults - it will allow up to three failovers as long as each of them happens more than 10 seconds after the previous one. In our opinion this is a bit too flexible - if the first failover didn't solve the problem, it is unlikely that another attempt will give better results. Still, default settings are there to be changed, so you can configure MRM however you like.

ClusterControl uses a similar approach to MHA - only one failover is attempted. The next one can happen only after the master has been successfully detected as online (for example, ClusterControl recovery or manual intervention by the admin managed to promote one of the slaves to a master) or after a restart of the cmon process.

Lost transactions

MHA can work in two modes - GTID or non-GTID. Those modes differ in how missing transactions are handled. Traditional replication, actually, is handled in a better way - as long as the old master is reachable, MHA connects to it and attempts to recover missing transactions from its binary logs. If you use GTID mode, this does not happen, which may lead to more significant data loss if your slaves didn't manage to receive all relay logs - another very good reason to use semi-synchronous replication, which has you covered in this scenario.

MRM does not connect to the old master to get the logs. By default, it elects the most advanced slave and promotes it to master. The remaining slaves are slaved off this new master, making them as up to date as the new master. There is a potential for data loss, on par with MHA's GTID mode.

ClusterControl behaves similarly to MRM - it picks the most advanced slave as a master candidate and then, as long as it is safe (for example, there are no errant transactions), promotes it to become the new master. The remaining slaves get slaved off this new master. If ClusterControl detects errant transactions, it will stop the failover and alert the administrator that manual intervention is needed. It is also possible to configure ClusterControl to skip the errant transaction check and force the failover.

Network partitioning

For MHA, this has been taken care of by adding a second MHA Manager node, preferably in another section of your network. You can query it using secondary_check_script. It can be used to connect to another MHA node and execute masterha_check_repl to see how the cluster is seen from that node. This gives MHA a better view of the situation and topology, so it might not failover when it is unnecessary.

MRM implements another approach. It can be configured to use slaves, an external MaxScale proxy or scripts executed through the HTTP protocol on a custom port (like the scripts which govern HAProxy behavior) to build a full view of the topology and then make an informed decision based on this.

ClusterControl, at this moment, does not perform any advanced checks regarding availability of the master - it uses only its own view of the system, therefore it can take an action if there are network issues between the master and the ClusterControl host. Having said that, we are aware this can be a serious limitation and work is in progress to improve how ClusterControl detects a failed master - using slaves and proxies like MaxScale or ProxySQL to get a broader picture of the topology.

Roles

Within MHA you are able to apply roles to a specific host, so for instance ‘candidate_master’ and ‘no_master’ will help you determine which hosts are preferred to become master. A good example could be the data center topology: spread the candidate master nodes over multiple racks to ensure HA. Or perhaps you have a delayed slave that may never become the new master even if it is the last node remaining.

This last scenario is likely to happen with MariaDB Replication Manager as it can’t see the other nodes anymore and thus can’t determine that this node is actually, for instance, 24 hours behind. MariaDB does not support the Delayed Slave command but it is possible to use pt-slave-delay instead. There is a way to set the maximum slave delay allowed for MRM, however MRM reads the Seconds_Behind_Master from the slave status output. Since MRM is executed after the master is dead, this value will obviously be null.

At the beginning of the failover procedure, ClusterControl builds a list of slaves which can be promoted to master. Most of the time, it will contain all slaves in the topology but the user has some additional control over it. There are two variables you can set in the cmon configuration:

replication_failover_whitelist

and

replication_failover_blacklist

The whitelist contains a list of IP’s or hostnames of slaves which should be used as potential master candidates. If this variable is set, only those hosts will be considered. The second variable may contain a list of hosts which will never be considered as master candidate. You can use it to list slaves that are used for backups or analytical queries. If the hardware varies between slaves, you may want to put here the slaves which use slower hardware.

Replication_failover_whitelist takes precedence, meaning that replication_failover_blacklist is ignored if replication_failover_whitelist is set.

Integration

MHA is a standalone tool; it doesn't integrate well with other external software. It does however provide hooks (pre/post failover scripts) which can be used to do some integration - for instance, execute scripts to make changes in the configuration of an external tool. MHA also uses the read_only value to differentiate between master and slaves - this can also be used by external tools to drive topology changes. One example would be ProxySQL - MHA can work with this proxy using both pre/post failover scripts and read_only values, depending on the ProxySQL configuration. It's worth mentioning that, in GTID mode, MHA doesn't support MariaDB GTID - it only supports Oracle MySQL or Percona Server.

MRM integrates nicely with MaxScale - it can be used along MaxScale in a couple of ways. It could be set so MaxScale will do the work to monitor the health of the nodes and execute MRM as needed, to perform failovers. Another option is that MRM drives MaxScale - monitoring is done on MRM’s side and MaxScale’s configuration is updated as needed. MRM also sets read_only variables so it makes it compatible with other tools which understand those settings (like ProxySQL, for example). A direct integration with HAProxy is also available - MRM, if collocated, may modify the HAProxy configuration whenever the topology changes. On the cons side, MRM works only with MariaDB installations - it is not possible to use it with Oracle MySQL’s version of GTID.

ClusterControl uses read_only variables to differentiate between master and slave nodes. This is enough to integrate with every kind of proxy which can be deployed from ClusterControl: ProxySQL, MaxScale and HAProxy. A failover executed by ClusterControl will be detected and handled by any of those proxies. ClusterControl also integrates with external tools regarding management. It provides access to the management console for MaxScale and, to some extent, for HAProxy. Advanced support for ProxySQL will be added shortly. Metrics are provided for HAProxy and ProxySQL. ClusterControl supports both Oracle GTID and MariaDB GTID.

Conclusion

If you are interested in details how MHA, MRM or ClusterControl handle failover, we’d like to encourage you to take a look at the blog posts listed below:

Below is a summary of the differences between the different HA solutions:

Replication support
  • MHA: non-GTID, Oracle GTID
  • MRM: MariaDB GTID
  • ClusterControl: Oracle GTID and MariaDB GTID

Flapping
  • MHA: One failover allowed
  • MRM: Defaults are less restrictive but can be modified
  • ClusterControl: One failover allowed unless it brings the master online

Lost transactions
  • MHA: Very good handling for non-GTID, no checking for transactions on master for GTID setups
  • MRM: No checking for transactions on master
  • ClusterControl: No checking for transactions on master

Network Partitioning
  • MHA: No support built in, can be added through user-created scripts
  • MRM: Very good false positive detection using slaves, proxy or external scripts
  • ClusterControl: No support at this moment, work in progress to build false positive detection using proxy and slaves

Roles
  • MHA: Support for whitelist and blacklist of hosts to promote to master
  • MRM: No support
  • ClusterControl: Support for whitelist and blacklist of hosts to promote to master

Integration
  • MHA: Can be integrated with external tools using hooks. Uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern.
  • MRM: Close integration with MaxScale, integration with HAProxy is also available. Uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern.
  • ClusterControl: Can be integrated with external tools using hooks. Uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern.

If we are talking about handling master failure, each of the solutions does its job well and feature-wise they are mostly on par. There are some differences in almost every aspect that we compared but, in the end, each of them should handle most master failures pretty well. ClusterControl lacks more advanced network partitioning detection, but this will change soon. What is important to keep in mind is that those tools support different replication methods, and this alone can limit your options. If you use non-GTID replication, MHA is the only option for you. If you use GTID, MHA and MRM are restricted to, respectively, Oracle MySQL and MariaDB GTID setups. Only ClusterControl (you can test it for free) is flexible enough to handle both types of GTID under one tool - this could be very useful if you have a mixed environment while you still would like to use one single tool to ensure high availability of your replication setup.

Database TCO - Calculating the Total Cost of Ownership for MySQL Management


Cost analysis and cost effectiveness for databases are hardly ever performed, while it actually makes a lot of sense to perform such calculations. The easiest method of gaining insights into these is by performing a total cost of ownership (TCO) calculation. You might have theories on what your greatest cost factor is, but do you really know for sure?

Why would you perform such a TCO analysis? As with most research: prove your theory of the highest cost factor wrong. The TCO is a great tool to give a precise cost analysis and would give you some surprising insights!

Cost factors for databases

The cost factors for databases can be divided into two separate groups: capital expenses (CAPEX) and operational expenses (OPEX). Both cost factors are part of the infrastructure lifecycle.

TCO lifetime cycle

Capital expenses are the costs you pay upfront during the acquire phase: hardware purchases, (non-recurring) licensing cost and any other one time cost factors like replacement parts. These expenses are a constant factor in the TCO and are spread out over the lifetime of your database servers. Most of these costs will happen in the acquire phase, however the replacement parts will obviously take place in the maintenance phase.

Operational expenses are the costs for running the database servers. As these costs are recurring, you pay them on a regular (e.g., yearly) interval and mostly during the maintenance phase. These costs include datacenter/rack rental, power consumption, network usage and operational costs like (remote) hands and personnel. The last one includes sysops, DBAs and all costs made to facilitate them, like desks, office space and training. Since these expenses are recurring, they will continue to grow during the lifetime of your database servers. The longer you operate these servers, the higher the operational expenses (OPEX) will be.

This means that the longer you use your database servers, the more the balance between CAPEX and OPEX will shift towards a higher share of OPEX. The one time purchase of hardware may be considered a high cost upfront, but given that you will probably use the hardware for more than three years, it justifies the upfront cost.

For cloud hosting, the calculation will be similar. However, since you don’t have hardware to purchase upfront, the CAPEX will be a lot lower. As cloud hosting has a recurring monthly cost, the OPEX will be higher. In some cases, your cloud provider may calculate some (setup) cost upfront and this should be treated as CAPEX.

Example calculation for hardware

In this example we will make a calculation of a small company (under 100 employees) that hosts on hardware in their own racks in a data center. This company has two dedicated sysops and one experienced DBA (1-4 years), where the DBA is managing around 20 databases and the sysops around 200 hosts. The average DBA salary for this is $65,000, so the annual cost per database would be $3,250. The sysops average around $50,000 for the same experience, and cost $250 per host per year. The sysops are also the people who manage the datacenter. We will not factor in the facilitation costs as this would get over complicated.

For our example cluster, we will make use of a three node MySQL replication setup: one master and two slave nodes. Hardware is based upon the Dell R730 with 64GB of memory and six 400GB SSDs, as this is a very popular model for this purpose. The price of a R730 with this configuration is currently $7655.

Rental cost of a full rack is nowadays around $350[1], so the colocation cost per U is roughly $8 per month. Since the R730 is a 2U unit, the total cost for our databases would be $48 per month.

Modern colocation costs factor out the power consumption, as the power is a variable factor. Prices for power with colocation can vary a lot, but it currently averages around $0.20 per kWh. The average database server consumes around 200 watts, which results in a 144kWh consumption per month per server. For our three database servers this would result in $86 per month.

This results in the following TCO:

Cost item                           | CAPEX   | OPEX (per year) | TCO (3 years)
Purchase: hardware                  | $22,965 |                 |
Professional support (DBA / Sysop)  |         | $10,500         |
Colocation cost                     |         | $576            |
Power cost (200W)                   |         | $1,032          |
Replacement parts                   | $1,500  |                 |
Total                               | $24,465 | $12,108         | $60,789
TCO for database servers (hardware)

There are a couple of conclusions we can draw from this calculation. The costs for colocation, power and replacement parts are negligible compared to the other cost factors. Also, during the lifetime of a database server, the support costs make up more than half of the total costs and are far higher than the original purchase price of the servers.

Example calculation cloud hosting

In this example we will make a calculation for a company that hosts in the cloud. To compare fairly, we will again make use of a three node MySQL replication setup on EC2. Amazon provides a nice TCO calculator for these purposes, so we made use of this as input for the calculations below.

To make the database servers comparable, we chose the i3.2xlarge, which (currently) has 8 vCPUs, 61GB of RAM and 1900GB of SSD storage. This currently costs $0.624 per hour, which is slightly below $15 per day and $5466 per year.

In the cloud the upfront investments (CAPEX) are not necessary. This is true in many cases, except if you make use of reserved instances like in AWS. With reserved instances, you make a claim on Amazon to reserve (performant) capacity for you, that you can use at will. In our calculation, we will not make use of reserved instances. Next to the lower CAPEX, our OPEX should be lower since our sysops don’t have to go to the data center or install these servers.

This results in the following TCO:

Cost item                       | CAPEX | OPEX (per year) | TCO (3 years)
Professional support (DBA only) |       | $9,750          |
AWS 3x i3.2xlarge               |       | $16,399         |
Total                           | $0    | $27,149         | $78,447
TCO for databases (cloud hosted)

Even though we have eliminated our upfront costs and capital investments (CAPEX), the OPEX is really high due to the premium we have to pay for high performance instances in AWS. Over a three year period, your TCO will be higher than with your own hardware.

OPEX has a large influence

As you can see from these calculations, the influence of the operational costs (OPEX) during the lifetime of the servers is far greater than the initial large investment of the CAPEX. This is mostly due to running (and owning) these servers for multiple years.

In the case of owning your own hardware, we have shown that the operational costs even outweigh the initial costs for purchasing these servers. For the AWS example, the total costs of “owning” these servers is even higher than for the hardware example. This is the premium paid for flexibility, as with a cloud environment you are free to upgrade to a newer instance every year.

For both examples it is clear that the cost of professional support for running these databases is relatively high. The sysops clearly become far more efficient when they are managing more than 200 hosts, but they don't have to bother with the additional tasks that the DBA is supposed to do. If only you could make the DBA more efficient.

Making the DBA more efficient

Luckily, there are a few methods to make your DBA more efficient and able to handle more database servers. Either relieve the DBA of various tasks (others will perform them) or have the DBA perform fewer tasks through automation. The low hanging fruit would be to automate the most time consuming tasks or the most error prone ones.

The most time consuming DBA tasks are provisioning, deployments, performance tuning, troubleshooting, backups and scaling clusters. Provisioning and installation of software is a repetitive task that can easily be automated, just like copying data and setting up replication when scaling out with read slaves. Similarly, backups can be automated and, with the help of a backup manager, the restore process as well.

Among the most error prone actions we could identify setting up replication, failover and schema management. In these tasks a single typo can have disastrous consequences, where the only way to recover is to restore from an earlier made backup.

Automation of repetitive tasks and chores is a tedious, but useful task. It takes time to automate each and every one of these tasks, and your DBA will most certainly be more busy with automation than with the other (daily) tasks. This automation may actually interfere with the normal day to day jobs, especially if the DBA isn’t a developer type and is struggling with the automation jobs. Wouldn’t it be more productive to offer the DBA a readily available toolset to work with instead? Or perhaps provide the sysops with a way that allows them to perform the tasks instead?

Do you even need a DBA?

Not all companies have full-time DBAs, at least not if you have a small number of databases. A DBA is a very specialized role, where a single person is dedicated to perform all database related tasks. This requires specialized knowledge of specific hardware, specific software, operating systems and in depth knowledge of SQL. Placing such a specialized person on a small number of databases means the person will not have a lot to do and will only cost money.

A sysop (or system administrator) is more of a generalist, and has to do a bit of everything. They generally manage hardware, operating systems, network, security, applications, databases and storage. They may have specific knowledge on one or more of these systems, but they can’t be specialized in all of them. The knowledge gap becomes more apparent when it comes to distributed setups (replication or clusters) and need for high availability.

As shown in the example calculations, the difference in salary expresses this picture as well. A DBA will cost more than a sysop and is more difficult to find, as there are not many of them around. That means that you probably have to do without a DBA. The challenge for the sysop is to have enough time to keep the databases performing well, troubleshoot any issues, monitor for any anomalies, maintain high availability, ensure data integrity and that data is backed up (and backups are verified to be ok).

ClusterControl saving costs

In ClusterControl the full database lifetime cycle has already been automated for the most popular open source databases. This means the DBA doesn’t necessarily have to automate his/her job anymore, as most of the tasks already have been automated in ClusterControl. Reliability will increase as the automation done in ClusterControl has been tested through and through, while what the DBA produced will only be tested directly in production.

Implementing a complete lifetime cycle management tool like ClusterControl means the DBA can now spend more time on useful things and manage more database servers. Or it could also mean people with less knowledge on databases can perform the same tasks.

Online Schema Upgrade in MySQL Galera Cluster using RSU Method


This post is a continuation of our previous post on Online Schema Upgrade in Galera using TOI method. We will now show you how to perform a schema upgrade using the Rolling Schema Upgrade (RSU) method.

RSU and TOI

As we discussed, when using TOI, a change happens at the same time on all of the nodes. This can become a serious limitation, as this way of executing schema changes implies that no other queries can be executed while the change runs. For long ALTER statements, the cluster may not be available for hours. Obviously, this is not something you can accept in production. The RSU method addresses this weakness - changes happen on one node at a time while other nodes are not affected and can serve traffic. Once the ALTER completes on one node, it will rejoin the cluster and you can proceed with executing the schema change on the next node.

Such behavior comes with its own set of limitations. The main one is that the scheduled schema change has to be compatible. What does that mean? Let's think about it for a while. First of all, we need to keep in mind that the cluster is up and running all the time - the altered node has to be able to accept all of the traffic which hits the remaining nodes. In short, a DML executed on the old schema has to work also on the new schema (and vice-versa if you use some sort of round-robin-like connection distribution in your Galera Cluster). We will focus on MySQL compatibility, but you also have to remember that your application has to work with both altered and non-altered nodes - make sure that your ALTER won't break the application logic. One good practice is to explicitly pass column names to queries - don't rely on "SELECT *", because you never know how many columns you'll get in return.

Galera and Row-based binary log format

Ok, so DML has to work on the old and new schemas. How are DMLs transferred between Galera nodes? Does it affect which changes are compatible and which are not? Yes, indeed - it does. Galera does not use regular MySQL replication, but it still relies on it to transfer events between the nodes. To be precise, Galera uses the ROW format for events. An event in row format (after decoding) may look like this:

### INSERT INTO `schema`.`table`
### SET
###   @1=1
###   @2=1
###   @3='88764053989'
###   @4='14700597838'

Or:

### UPDATE `schema`.`table`
### WHERE
###   @1=1
###   @2=1
###   @3='88764053989'
###   @4='14700597838'
### SET
###   @1=2
###   @2=2
###   @3='88764053989'
###   @4='81084251066'

As you can see, there is a visible pattern: a row is identified by its content. There are no column names, just their order. This alone should turn on some warning lights: "what would happen if I removed one of the columns?" Well, if it is the last column, this is acceptable. If you remove a column in the middle, it will mess up the column order and, as a result, replication will break. A similar thing will happen if you add a column in the middle instead of at the end. There are more constraints, though. Changing a column definition will work as long as it is the same data type - you can alter an INT column to become BIGINT, but you cannot change an INT column into VARCHAR - this will break replication. You can find a detailed description of which changes are compatible and which are not in the MySQL documentation. No matter what the documentation says, to stay on the safe side it's better to run some tests on a separate development/staging cluster. Make sure it will work not only according to the documentation, but that it also works fine in your particular setup.

All in all, as you can clearly see, performing RSU in a safe way is much more complex than just running a couple of commands. Still, as the commands are important, let's take a look at an example of how you can perform the RSU and what can go wrong in the process.

RSU example

Initial setup

Let's imagine a rather simple example of an application. We will use a benchmark tool, Sysbench, to generate content and traffic, but the flow will be the same for almost every application - Wordpress, Joomla, Drupal, you name it. We will use HAProxy collocated with our application to split reads and writes among Galera nodes in a round-robin fashion. You can check below how HAProxy sees the Galera cluster.

The whole topology looks like below:

Traffic is generated using the following command:

while true ; do sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --max-requests=0 --time=3600 --mysql-host=10.0.0.100 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=3307 --tables=32 --report-interval=1 --skip-trx=on --table-size=100000 --db-ps-mode=disable run ; done

Schema looks like below:

mysql> SHOW CREATE TABLE sbtest1.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=29986632 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

First, let’s see how we can add an index to this table. Adding an index is a compatible change which can be easily done using RSU.

mysql> SET SESSION wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.00 sec)
mysql> ALTER TABLE sbtest1.sbtest1 ADD INDEX idx_new (k, c);
Query OK, 0 rows affected (5 min 19.59 sec)

As you can see in the Node tab, the host on which we executed the change has automatically switched to the Donor/Desynced state, which ensures that this host will not impact the rest of the cluster if it gets slowed down by the ALTER.
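
You can also verify this directly on the node itself; while the ALTER runs, the wsrep_local_state_comment status variable should report Donor/Desynced (a hedged check, output not shown):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';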

Let’s check how our schema looks now:

mysql> SHOW CREATE TABLE sbtest1.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`),
  KEY `idx_new` (`k`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=29986632 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

As you can see, the index has been added. Please keep in mind, though, this happened only on that particular node. To accomplish a full schema change, you have to follow this process on the remaining nodes of the Galera Cluster. To finish with the first node, we can switch wsrep_OSU_method back to TOI:

SET SESSION wsrep_OSU_method=TOI;
Query OK, 0 rows affected (0.00 sec)

We are not going to show the remainder of the process, because it's the same - enable RSU at the session level, run the ALTER, switch back to TOI. What's more interesting is what would happen if the change is incompatible. Let's take another quick look at the schema:

mysql> SHOW CREATE TABLE sbtest1.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`),
  KEY `idx_new` (`k`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=29986632 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Let’s say we want to change the type of column ‘k’ from INT to VARCHAR(30) on one node.

mysql> SET SESSION wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.00 sec)
mysql> ALTER TABLE sbtest1.sbtest1 MODIFY COLUMN k VARCHAR(30) NOT NULL DEFAULT '';
Query OK, 10004785 rows affected (1 hour 14 min 51.89 sec)
Records: 10004785  Duplicates: 0  Warnings: 0

Now, let's take a look at the schema:

mysql> SHOW CREATE TABLE sbtest1.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` varchar(30) NOT NULL DEFAULT '',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`),
  KEY `idx_new` (`k`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=29986632 DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

Everything is as we expect - the ‘k’ column has been changed to VARCHAR. Now we can check whether this change is acceptable for the Galera Cluster. To test it, we will use one of the remaining, unaltered nodes to execute the following query:

mysql> INSERT INTO sbtest1.sbtest1 (k, c, pad) VALUES (123, 'test', 'test');
Query OK, 1 row affected (0.19 sec)

Let’s see what happened.

It definitely doesn’t look good - our node is down. Logs will give you more details:

2017-04-07T10:51:14.873524Z 5 [ERROR] Slave SQL: Column 1 of table 'sbtest1.sbtest1' cannot be converted from type 'int' to type 'varchar(30)', Error_code: 1677
2017-04-07T10:51:14.873560Z 5 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 982675
2017-04-07T10:51:14.879120Z 5 [Warning] WSREP: Failed to apply app buffer: seqno: 982675, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2017-04-07T10:51:14.879272Z 5 [ERROR] Slave SQL: Column 1 of table 'sbtest1.sbtest1' cannot be converted from type 'int' to type 'varchar(30)', Error_code: 1677
2017-04-07T10:51:14.879287Z 5 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 982675
2017-04-07T10:51:14.879399Z 5 [Warning] WSREP: Failed to apply app buffer: seqno: 982675, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2017-04-07T10:51:14.879618Z 5 [ERROR] Slave SQL: Column 1 of table 'sbtest1.sbtest1' cannot be converted from type 'int' to type 'varchar(30)', Error_code: 1677
2017-04-07T10:51:14.879633Z 5 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 982675
2017-04-07T10:51:14.879730Z 5 [Warning] WSREP: Failed to apply app buffer: seqno: 982675, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2017-04-07T10:51:14.879911Z 5 [ERROR] Slave SQL: Column 1 of table 'sbtest1.sbtest1' cannot be converted from type 'int' to type 'varchar(30)', Error_code: 1677
2017-04-07T10:51:14.879924Z 5 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 982675
2017-04-07T10:51:14.885255Z 5 [ERROR] WSREP: Failed to apply trx: source: 938415a6-1aab-11e7-ac29-0a69a4a1dafe version: 3 local: 0 state: APPLYING flags: 1 conn_id: 125559 trx_id: 2856843 seqnos (l: 392283, g: 9
82675, s: 982674, d: 982563, ts: 146831275805149)
2017-04-07T10:51:14.885271Z 5 [ERROR] WSREP: Failed to apply trx 982675 4 times
2017-04-07T10:51:14.885281Z 5 [ERROR] WSREP: Node consistency compromized, aborting…

As can be seen, Galera complained about the fact that the column cannot be converted from INT to VARCHAR(30). It attempted to re-execute the writeset four times but failed, unsurprisingly. As such, Galera determined that the node's consistency was compromised, and the node was kicked out of the cluster. The remaining content of the logs shows this process:

2017-04-07T10:51:14.885560Z 5 [Note] WSREP: Closing send monitor...
2017-04-07T10:51:14.885630Z 5 [Note] WSREP: Closed send monitor.
2017-04-07T10:51:14.885644Z 5 [Note] WSREP: gcomm: terminating thread
2017-04-07T10:51:14.885828Z 5 [Note] WSREP: gcomm: joining thread
2017-04-07T10:51:14.885842Z 5 [Note] WSREP: gcomm: closing backend
2017-04-07T10:51:14.896654Z 5 [Note] WSREP: view(view_id(NON_PRIM,6fcd492a,37) memb {
        b13499a8,0
} joined {
} left {
} partitioned {
        6fcd492a,0
        938415a6,0
})
2017-04-07T10:51:14.896746Z 5 [Note] WSREP: view((empty))
2017-04-07T10:51:14.901477Z 5 [Note] WSREP: gcomm: closed
2017-04-07T10:51:14.901512Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-04-07T10:51:14.901531Z 0 [Note] WSREP: Flow-control interval: [16, 16]
2017-04-07T10:51:14.901541Z 0 [Note] WSREP: Received NON-PRIMARY.
2017-04-07T10:51:14.901550Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 982675)
2017-04-07T10:51:14.901563Z 0 [Note] WSREP: Received self-leave message.
2017-04-07T10:51:14.901573Z 0 [Note] WSREP: Flow-control interval: [0, 0]
2017-04-07T10:51:14.901581Z 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2017-04-07T10:51:14.901589Z 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 982675)
2017-04-07T10:51:14.901602Z 0 [Note] WSREP: RECV thread exiting 0: Success
2017-04-07T10:51:14.902701Z 5 [Note] WSREP: recv_thread() joined.
2017-04-07T10:51:14.902720Z 5 [Note] WSREP: Closing replication queue.
2017-04-07T10:51:14.902730Z 5 [Note] WSREP: Closing slave action queue.
2017-04-07T10:51:14.902742Z 5 [Note] WSREP: /usr/sbin/mysqld: Terminated.

Of course, ClusterControl will attempt to recover such a node - recovery involves running SST, so the incompatible schema change will be removed, but we will be back at square one - our schema change will be reversed.

As you can see, while running RSU is a very simple process, underneath it can be rather complex. It requires some tests and preparations to make sure that you won’t lose a node just because the schema change was not compatible.

ClusterControl Product Video for Galera Clusters


In this video Art van Scheppingen shows you how easy it is to add your Galera setups to ClusterControl. ClusterControl provides you with a single console to deploy, manage, monitor, and scale your Galera setups and mixed environment. In addition to advanced monitoring, ClusterControl affords feature benefits like comprehensive security, automation, reporting, and point-and-click deployments.

Watch the video and learn more about the following topics:

  • Deploying a new Galera cluster in ClusterControl
  • Importing an existing Galera Cluster into ClusterControl
  • Navigating the Cluster Overview section in ClusterControl
  • Accessing your Galera monitoring dashboards in ClusterControl
  • Accessing your Galera node data tables in ClusterControl
  • Accessing server statistics in ClusterControl
  • Navigating the nodes section in ClusterControl
  • Enabling binary logging from your Galera nodes in ClusterControl
  • Using advisors in your Galera setup
  • Performing backups
  • Accessing log files
  • Adding new nodes to your Galera Clusters
  • Adding Asynchronous slaves
  • Cloning Clusters in ClusterControl
  • Load balancing with HAProxy, ProxySQL and MaxScale
  • Galera arbitrator in ClusterControl

Video: ClusterControl for Galera Cluster


This video walks you through the features that ClusterControl offers for Galera Cluster and how you can use them to deploy, manage, monitor and scale your open source database environments.

ClusterControl and Galera Cluster

ClusterControl provides advanced deployment, management, monitoring, and scaling functionality to get your Galera clusters up-and-running using proven methodologies that you can depend on to work.

At the core of ClusterControl is its automation functionality that lets you automate many of the database tasks you have to perform regularly, like deploying new clusters, adding and scaling new nodes, running backups and upgrades, and more.

ClusterControl is cluster-aware, topology-aware and able to provision all members of the Galera Cluster, including the replication chain connected to it.

To learn more check out the following resources…


How to Set Up Asynchronous Replication from Galera Cluster to Standalone MySQL server with GTID


Hybrid replication, i.e. combining Galera and asynchronous MySQL replication in the same setup, became much easier since GTID got introduced in MySQL 5.6. Although it was fairly straightforward to replicate from a standalone MySQL server to a Galera Cluster, doing it the other way round (Galera → standalone MySQL) was a bit more challenging. At least until the arrival of GTID.

There are a few good reasons to attach an asynchronous slave to a Galera Cluster. For one, long-running reporting/OLAP type queries on a Galera node might slow down an entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. So reporting queries can be sent to a standalone server, effectively isolating Galera from the reporting load. In a belts and suspenders approach, an asynchronous slave can also serve as a remote live backup.

In this blog post, we will show you how to replicate a Galera Cluster to a MySQL server with GTID, and how to failover the replication in case the master node fails.

Hybrid Replication in MySQL 5.5

In MySQL 5.5, resuming broken replication requires you to determine the last binary log file and position, which are different on each Galera node if binary logging is enabled. We can illustrate this situation with the following figure:

Galera cluster asynchronous slave topology without GTID

If the MySQL master fails, replication breaks and the slave will need to switch over to another master. You will need to pick a new Galera node, and manually determine the binary log file and position of the last transaction executed by the slave. Another option is to dump the data from the new master node, restore it on the slave and start replication from the new master node. These options are of course doable, but not very practical in production.

How GTID Solves the Problem

GTID (Global Transaction Identifier) provides better transaction mapping across nodes, and is supported in MySQL 5.6. In Galera Cluster, all nodes generate different binlog files. The binlog events are the same and in the same order, but the binlog file names and offsets may vary. With GTID, slaves can see a unique transaction coming in from several masters, and this can easily be mapped into the slave execution list if it needs to restart or resume replication.

Galera cluster asynchronous slave topology with GTID failover

All necessary information for synchronizing with the master is obtained directly from the replication stream. This means that when you are using GTIDs for replication, you do not need to include the MASTER_LOG_FILE or MASTER_LOG_POS options in the CHANGE MASTER TO statement. Instead, it is only necessary to enable the MASTER_AUTO_POSITION option. You can find more details about GTID on the MySQL documentation page.
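
Since every Galera node applies the same replicated transactions, the executed GTID set should be identical across the nodes. A quick sanity check before pointing a slave at any node is to run the following on two Galera nodes and compare the output:

mysql> SELECT @@GLOBAL.gtid_mode, @@GLOBAL.gtid_executed\G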

Setting Up Hybrid Replication by hand

Make sure the Galera nodes (masters) and slave(s) are running MySQL 5.6 before proceeding with this setup. We have a database called sbtest in Galera, which we will replicate to the slave node.

1. Enable required replication options by specifying the following lines inside each DB node’s my.cnf (including the slave node):

For master (Galera) nodes:

gtid_mode=ON
log_bin=binlog
log_slave_updates=1
enforce_gtid_consistency
expire_logs_days=7
server_id=1         # 1 for master1, 2 for master2, 3 for master3
binlog_format=ROW

For slave node:

gtid_mode=ON
log_bin=binlog
log_slave_updates=1
enforce_gtid_consistency
expire_logs_days=7
server_id=101         # 101 for slave
binlog_format=ROW
replicate_do_db=sbtest
slave_net_timeout=60

2. Perform a cluster rolling restart of the Galera Cluster (from ClusterControl UI > Manage > Upgrade > Rolling Restart). This will reload each node with the new configurations, and enable GTID. Restart the slave as well.

3. Create a slave replication user by running the following statement on one of the Galera nodes:

mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'%' IDENTIFIED BY 'slavepassword';

4. Log into the slave and dump database sbtest from one of the Galera nodes:

$ mysqldump -uroot -p -h192.168.0.201 --single-transaction --skip-add-locks --triggers --routines --events sbtest > sbtest.sql

5. Restore the dump file onto the slave server:

$ mysql -uroot -p < sbtest.sql

6. Start replication on the slave node:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.201', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

To verify that replication is running correctly, examine the output of slave status:

mysql> SHOW SLAVE STATUS\G
       ...
       Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
       ...

Setting up Hybrid Replication using ClusterControl

In the previous section we described all the necessary steps to enable the binary logs, restart the cluster node by node, copy the data and then set up replication. The procedure is tedious and you can easily make errors in one of these steps. In ClusterControl, we have automated all the necessary steps.

1. For ClusterControl users, you can go to the Nodes page and enable binary logging.

Enable binary logging on Galera cluster using ClusterControl

This will open a dialogue that allows you to set the binary log expiration, enable GTID and auto restart.

Enable binary logging with GTID enabled

This initiates a job that will write these changes to the configuration, create replication users with the proper grants, and safely restart the node.


Repeat this process for each Galera node in the cluster, until all nodes indicate they are master.

All Galera Cluster nodes are now master

2. Add the asynchronous replication slave to the cluster

Adding an asynchronous replication slave to Galera Cluster using ClusterControl

And this is all you have to do. The entire process described in the previous paragraph has been automated by ClusterControl.

Changing Master

If the designated master goes down, the slave will retry the connection after slave_net_timeout (60 seconds in our setup - the default is 1 hour). You should see the following error in the slave status:

       Last_IO_Errno: 2003
       Last_IO_Error: error reconnecting to master 'slave@192.168.0.201:3306' - retry-time: 60  retries: 1

Since we are using Galera with GTID enabled, master failover is supported via ClusterControl when Cluster and Node Auto Recovery has been enabled. Whether the master would fail due to network connectivity or any other reason, ClusterControl will automatically fail over to the most suitable other master node in the cluster.

If you wish to perform the failover manually, simply change the master node as follows:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

In some cases, you might encounter a “Duplicate entry .. for key” error after the master node changed:

       Last_Errno: 1062
       Last_Error: Could not execute Write_rows event on table sbtest.sbtest; Duplicate entry '1089775' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysqld-bin.000009, end_log_pos 85789000

In older versions of MySQL, you could just use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = n to skip statements, but this does not work with GTID. Miguel from Percona wrote a great blog post on how to repair this by injecting empty transactions.
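
The idea is to "consume" the offending GTID on the slave with an empty transaction, so the duplicate event is skipped. A minimal sketch - the UUID:sequence below is a placeholder for the GTID reported as failing in the slave status:

mysql> STOP SLAVE;
mysql> SET GTID_NEXT='aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:12345';
mysql> BEGIN; COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';
mysql> START SLAVE;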

Another approach, for smaller databases, is to simply take a fresh dump from any of the available Galera nodes, restore it and use the RESET MASTER statement:

mysql> STOP SLAVE;
mysql> RESET MASTER;
mysql> DROP SCHEMA sbtest; CREATE SCHEMA sbtest; USE sbtest;
mysql> SOURCE /root/sbtest_from_galera2.sql; -- repeat step #4 above to get this dump
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

You may also use pt-table-checksum to verify the replication integrity, more information in this blog post.

Note: Since the slave applier in MySQL replication is by default still single-threaded, do not expect the asynchronous replication performance to be the same as Galera's parallel replication. MySQL 5.6 and 5.7 offer options to execute the asynchronous replication in parallel on the slave nodes, but in principle this replication still depends on the correct ordering of transactions within the same schema. If the replication load is intensive and continuous, the slave lag will just keep growing. We have seen cases where the slave could never catch up with the master.
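
For reference, a hedged sketch of the my.cnf options that enable a parallel applier on the slave; the option names below are those of MySQL 5.7 (MySQL 5.6 only offers per-schema parallelism via slave_parallel_workers), and slave_preserve_commit_order additionally requires log_bin and log_slave_updates:

# slave node - parallel applier (MySQL 5.7)
slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=4
slave_preserve_commit_order=1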

How to Deploy Asynchronous Replication Slave to MariaDB Galera Cluster 10.x using ClusterControl


Combining Galera and asynchronous replication in the same MariaDB setup, aka Hybrid Replication, can be useful - e.g. as a live backup node in a remote datacenter or reporting/analytics server. We already blogged about this setup for Codership/Galera or Percona XtraDB Cluster users, but a master failover as described in that post does not work for MariaDB because of its different GTID approach. In this post, we will show you how to deploy an asynchronous replication slave to MariaDB Galera Cluster 10.x (with master failover!), using GTID with ClusterControl.

Preparing the Master

First and foremost, you must ensure that the master and slave nodes are running MariaDB Galera 10.0.2 or later. A MariaDB replication slave requires at least one master with GTID among the Galera nodes. However, we would recommend configuring all the MariaDB Galera nodes as masters. GTID, which is automatically enabled in MariaDB, will be used for master failover. The following must be true for the masters:

  • At least one master among the Galera nodes
  • All masters must be configured with the same domain ID
  • log_slave_updates must be enabled
  • All masters’ MariaDB port is accessible by ClusterControl and slaves
  • Must be running MariaDB version 10.0.2 or later

From ClusterControl this is easily done by selecting Enable Binary Logging in the drop down for each node.

Enabling binary logging through ClusterControl

And then enable GTID in the dialogue:

Once Proceed has been clicked, a job will automatically configure the Galera node according to the settings described earlier.

If you wish to perform this action by hand, you can configure a Galera node as a master by changing the MariaDB configuration file for that node as per below:

gtid_domain_id=<must be same across all mariadb servers participating in replication>
server_id=<must be unique>
binlog_format=ROW
log_slave_updates=1
log_bin=binlog

After making these changes, restart the nodes one by one or use a rolling restart (ClusterControl > Manage > Upgrades > Rolling Restart).


Preparing the Slave

For the slave, you would need a separate host or VM, with or without MariaDB installed. If you do not have MariaDB installed, you need to perform the following tasks: configure the root password (based on monitored_mysql_root_password), create the slave user (based on repl_user, repl_password), configure MariaDB, start the server and finally start replication.
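
For reference, the final replication step done by hand would look roughly like the sketch below; the host, credentials and GTID position are placeholders, and MariaDB uses MASTER_USE_GTID instead of MySQL's MASTER_AUTO_POSITION:

mysql> SET GLOBAL gtid_slave_pos = '0-101-2345';
mysql> CHANGE MASTER TO MASTER_HOST='192.168.0.201', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replpassword', MASTER_USE_GTID=slave_pos;
mysql> START SLAVE;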

When adding the slave using ClusterControl, all these steps are automated in the Add Replication Slave job, as described below.

Add replication slave to MariaDB Cluster

After adding our slave node, our deployment will look like this:

MariaDB Galera asynchronous slave topology

Master Failover and Recovery

Since we are using MariaDB with GTID enabled, master failover is supported via ClusterControl when Cluster and Node Auto Recovery has been enabled. Whether the master would fail due to network connectivity or any other reason, ClusterControl will automatically fail over to the most suitable other master node in the cluster.

Automatic slave failover to another master in Galera cluster

This way ClusterControl will add a robust asynchronous slave capability to your MariaDB Cluster!

MySQL on Docker: ClusterControl and Galera Cluster on Docker Swarm


Our journey in adopting MySQL and MariaDB in containerized environments continues, with ClusterControl coming into the picture to facilitate deployment and management. We already have our ClusterControl image hosted on Docker Hub, where it can deploy different replication/cluster topologies on multiple containers. With the introduction of Docker Swarm, a native orchestration tool embedded inside Docker Engine, scaling and provisioning containers has become much easier. It also covers high availability by running services on multiple Docker hosts.

In this blog post, we’ll be experimenting with automatic provisioning of Galera Cluster on Docker Swarm with ClusterControl. ClusterControl usually deploys database clusters on bare-metal, virtual machines and cloud instances. ClusterControl relies on SSH (through libssh) as its core communication module to connect to the managed hosts, so no agents are required on them. The same rule applies to containers, and that’s what we are going to show in this blog post.

ClusterControl as Docker Swarm Service

We have built a Docker image with extended logic to handle deployment in container environments in a semi-automatic way. The image is now available on Docker Hub and the code is hosted in our Github repository. Please note that only this image is capable of deploying on containers; this logic is not available in the standard ClusterControl installation packages.

The extended logic is inside deploy-container.sh, a script that monitors a custom table inside the CMON database called “cmon.containers”. Each newly created database container reports and registers itself into this table, and the script looks for new entries and performs the necessary actions using the ClusterControl CLI. The deployment is automatic, and you can monitor the progress directly from the ClusterControl UI or using the “docker logs” command.
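
If you are curious about what the script sees, you can peek at this registration table from inside the ClusterControl container; a hedged example, assuming you know the cmon database credentials of your installation:

$ docker exec -it $(docker ps | grep clustercontrol | awk {'print $1'}) \
    mysql -ucmon -p cmon -e 'SELECT * FROM containers'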

Before we go further, take note of some prerequisites for running ClusterControl and Galera Cluster on Docker Swarm:

  • Docker Engine version 1.12 and later.
  • Docker Swarm Mode is initialized.
  • ClusterControl must be connected to the same overlay network as the database containers.

To run ClusterControl as a service using “docker stack”, the following definition should be enough:

 clustercontrol:
    deploy:
      replicas: 1
    image: severalnines/clustercontrol
    ports:
      - 5000:80
    networks:
      - galera_cc

Or, you can use the “docker service” command as per below:

$ docker service create --name cc_clustercontrol -p 5000:80 --replicas 1 severalnines/clustercontrol

Or, you can combine the ClusterControl service together with the database container service and form a “stack” in a compose file as shown in the next section.

Base Containers as Docker Swarm Service

The base container image, called “centos-ssh”, is based on the CentOS 6 image. It comes with a couple of basic packages like an SSH server, SSH clients, curl and the mysql client. The entrypoint script downloads ClusterControl’s public key during startup to allow passwordless SSH. It also registers itself into ClusterControl’s CMON database for automatic deployment.

Running this container requires a couple of environment variables to be set:

  • CC_HOST - Mandatory. By default it will try to connect to “cc_clustercontrol” service name. Otherwise, define its value in IP address, hostname or service name format. This container will download the SSH public key from ClusterControl node automatically for passwordless SSH.
  • CLUSTER_TYPE - Mandatory. Default to “galera”.
  • CLUSTER_NAME - Mandatory. This name distinguishes the cluster with others from ClusterControl perspective. No space allowed and it must be unique.
  • VENDOR - Default is “percona”. Other supported values are “mariadb”, “codership”.
  • DB_ROOT_PASSWORD - Mandatory. The database root password for the database server. In this case, it should be MySQL root password.
  • PROVIDER_VERSION - Default is 5.6. The database version by the chosen vendor.
  • INITIAL_CLUSTER_SIZE - Default is 3. This indicates how ClusterControl should treat newly registered containers, whether they are for new deployments or for scaling out. For example, if the value is 3, ClusterControl will wait for 3 containers to be running and registered into the CMON database before starting the cluster deployment job. Otherwise, it waits 30 seconds for the next cycle and retries. The next containers (4th, 5th and Nth) will fall under the “Add Node” job instead.

To run the container, simply use the following stack definition in a compose file:

  galera:
    deploy:
      replicas: 3
    image: severalnines/centos-ssh
    ports:
      - 3306:3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "PXC_Docker"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - galera_cc
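
The same service can also be created without a compose file; a hedged “docker service” equivalent of the stack definition above (this assumes the galera_cc overlay network already exists):

$ docker service create --name cc_galera --replicas 3 \
    --network galera_cc -p 3306:3306 \
    -e CLUSTER_TYPE="galera" \
    -e CLUSTER_NAME="PXC_Docker" \
    -e INITIAL_CLUSTER_SIZE=3 \
    -e DB_ROOT_PASSWORD="mypassword123" \
    severalnines/centos-ssh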

By combining them both (ClusterControl and database base containers), we can just deploy them under a single stack as per below:

version: '3'

services:

  galera:
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 10s
    image: severalnines/centos-ssh
    ports:
      - 3306:3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "Galera_Docker"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - galera_cc

  clustercontrol:
    deploy:
      replicas: 1
    image: severalnines/clustercontrol
    ports:
      - 5000:80
    networks:
      - galera_cc

networks:
  galera_cc:
    driver: overlay

Save the above lines into a file, for example docker-compose.yml in the current directory. Then, start the deployment:

$ docker stack deploy --compose-file=docker-compose.yml cc
Creating network cc_galera_cc
Creating service cc_clustercontrol
Creating service cc_galera

Docker Swarm will deploy one container for ClusterControl (replicas:1) and another 3 containers for the database cluster containers (replicas:3). The database container will then register itself into the CMON database for deployment.

Wait for a Galera Cluster to be ready

The deployment will be automatically picked up by the ClusterControl CLI. So you basically don’t have to do anything but wait. The deployment usually takes around 10 to 20 minutes depending on the network connection.

Open the ClusterControl UI at http://{any_Docker_host}:5000/clustercontrol, fill in the default administrator user details and log in. Monitor the deployment progress under Activity -> Jobs, as shown in the following screenshot:

Or, you can look at the progress directly from the docker logs command of the ClusterControl container:

$ docker logs -f $(docker ps | grep clustercontrol | awk {'print $1'})
>> Found the following cluster(s) is yet to deploy:
Galera_Docker
>> Number of containers for Galera_Docker is lower than its initial size (3).
>> Nothing to do. Will check again on the next loop.
>> Found the following cluster(s) is yet to deploy:
Galera_Docker
>> Found a new set of containers awaiting for deployment. Sending deployment command to CMON.
>> Cluster name         : Galera_Docker
>> Cluster type         : galera
>> Vendor               : percona
>> Provider version     : 5.7
>> Nodes discovered     : 10.0.0.6 10.0.0.7 10.0.0.5
>> Initial cluster size : 3
>> Nodes to deploy      : 10.0.0.6;10.0.0.7;10.0.0.5
>> Deploying Galera_Docker.. It's gonna take some time..
>> You shall see a progress bar in a moment. You can also monitor
>> the progress under Activity (top menu) on ClusterControl UI.
Create Galera Cluster
- Job  1 RUNNING    [██▊       ]  26% Installing MySQL on 10.0.0.6

That’s it. Wait until the deployment completes and you will then be all set with a three-node Galera Cluster running on Docker Swarm, as shown in the following screenshot:

In ClusterControl, it has the same look and feel as what you have seen with Galera running on standard hosts (non-container) environment.


Management

Managing database containers is a bit different with Docker Swarm. This section provides an overview of how the database containers should be managed through ClusterControl.

Connecting to the Cluster

To verify the status of the replicas and service name, run the following command:

$ docker service ls
ID            NAME               MODE        REPLICAS  IMAGE
eb1izph5stt5  cc_clustercontrol  replicated  1/1       severalnines/clustercontrol:latest
ref1gbgne6my  cc_galera          replicated  3/3       severalnines/centos-ssh:latest

If the application/client is running on the same Swarm network space, you can connect to it directly via the service name endpoint. If not, use the routing mesh by connecting to the published port (3306) on any of the Docker Swarm nodes. The connection to these endpoints will be load balanced automatically by Docker Swarm in a round-robin fashion.
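
For example, an application container attached to the same overlay network can use the service name directly, while an external client connects to the published port on any Swarm node. The hostname below is a placeholder and the password comes from the earlier compose file:

# from a container attached to the galera_cc overlay network
$ mysql -h cc_galera -P 3306 -uroot -pmypassword123

# from outside the Swarm, via the routing mesh on any Docker host
$ mysql -h docker-host1.example.com -P 3306 -uroot -pmypassword123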

Scale up/down

Typically, when adding a new database node, we need to prepare a new host with a base operating system and passwordless SSH. In Docker Swarm, you just need to scale out the service to the number of replicas that you desire:

$ docker service scale cc_galera=5
cc_galera scaled to 5

ClusterControl will then pick up the new containers registered inside cmon.containers table and trigger add node jobs, for one container at a time. You can look at the progress under Activity -> Jobs:

Scaling down is similar, by using the “service scale” command. However, ClusterControl doesn’t know whether the containers that have been removed by Docker Swarm were part of the auto-scheduling or just a scale down (which indicates that we deliberately wanted the containers to be removed). Thus, to scale down from 5 nodes to 3 nodes, one would:

$ docker service scale cc_galera=3
cc_galera scaled to 3

Then, remove the stopped hosts from the ClusterControl UI by going to Nodes -> rollover the removed container -> click on the ‘X’ icon on the top right -> Confirm & Remove Node:

ClusterControl will then execute a remove node job and bring back the cluster to the expected size.

Failover

In case of a container failure, Docker Swarm’s automatic rescheduling will kick in and there will be a new replacement container with the same IP address as the old one (but with a different container ID). ClusterControl will then start to provision this node from scratch, by performing the installation process, configuring it and getting it to rejoin the cluster. The old container will be removed automatically from ClusterControl before the deployment starts.

Go ahead and try to kill one of the database containers:

$ docker kill [container ID]

You’ll see that the new containers created by Swarm will be provisioned automatically by ClusterControl.

Creating a new cluster

To create a new cluster, just create another service or stack with a different CLUSTER_NAME and service name. The following is an example where we create another Galera Cluster running MariaDB 10.1 (some extra environment variables are required for MariaDB 10.1):

version: '3'
services:
  galera2:
    deploy:
      replicas: 3
    image: severalnines/centos-ssh
    ports:
      - 3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "MariaDB_Galera"
      VENDOR: "mariadb"
      PROVIDER_VERSION: "10.1"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - cc_galera_cc

networks:
  cc_galera_cc:
    external: true

Then, create the service:

$ docker stack deploy --compose-file=docker-compose.yml db2

Go back to ClusterControl UI -> Activity -> Jobs and you should see a new deployment has started. After a couple of minutes, you will see the new cluster will be listed inside ClusterControl dashboard:

Destroying everything

To remove everything (including the ClusterControl container), you just need to remove the stack created by Docker Swarm:

$ docker stack rm cc
Removing service cc_clustercontrol
Removing service cc_galera
Removing network cc_galera_cc

That’s it, the whole stack has been removed. Pretty neat huh? You can start all over again by running the “docker stack deploy” command and everything will be ready after a couple of minutes.

Summary

The flexibility you get by running a single command to deploy or destroy a whole environment can be useful in different types of use cases such as backup verification, DDL procedure testing, query performance tweaking, experimenting for proof-of-concepts and also for staging temporary data. These use cases are closer to developer environment. With this approach, you can now treat a stateful service “statelessly”.

Would you like to see ClusterControl manage the whole database container stack through the UI via point and click? Let us know your thoughts in the comments section below. In the next blog post, we are going to look at how to perform automatic backup verification on Galera Cluster using containers.

How Galera Cluster Enables High Availability for High Traffic Websites


In today’s competitive technology environment, high availability is a must. There is no way around it - if your website or service is not available, then most probably you are losing money. It could relate directly to money loss - your customers cannot access your e-commerce service and they cannot spend their money (in addition, they are likely to use your competitors instead). Or it can be less directly linked - your sales reps cannot reach your web-based CRM system and that seriously limits their productivity. Either way, a website which cannot be reached is a serious problem for any organization. The question is - how do we ensure that your website stays available? Assuming you are using MySQL or MariaDB (not unlikely, if it is a website), one of the technologies that can be utilized is Galera Cluster. In this blog post, we’ll show you how to leverage a highly available database to improve the availability of your site.

High availability is hard with databases

Every website is different, but in general, we’ll see some frontend webservers, a database backend, load balancers, file system storage and additional components like caching systems. To make a website highly available, we would need each component to be highly available so we do not have any single point of failure (SPOF).

The webserver tier is usually relatively easy to scale, as it is possible to deploy multiple instances behind a load balancer. If you want to preserve session state across all webservers, you would probably store it in a shared datastore (Memcached, or even MySQL Cluster for a pretty robust alternative). Load balancers can be made highly available (e.g. HAProxy/Keepalived/VIP). There are a number of clustered file systems that can be used for the storage layer; we have previously covered solutions like csync2 with lsyncd, GlusterFS and OCFS2. The database service can be made highly available with e.g. DRBD, so all storage is replicated to a standby server. This means the service can be started on another host that has access to the database files.

This might not work very well for a high traffic website though, as all webservers will still be hitting the primary database instance. Failover can also take a while, since you are failing over to a cold standby server and MySQL has to perform crash recovery when starting up.

MySQL master-slave replication is another option, and there are different ways to make it highly available. There are inconveniences though, as not all nodes are the same - you need to make sure you write to only one master and avoid diverging datasets across the nodes.

Galera Cluster for MySQL/MariaDB

Let’s start with discussing what Galera Cluster is and what it is not. It is a virtually synchronous, multi-master cluster. You can access any of the nodes and issue reads and writes - this is a significant improvement compared to replication setups - no need for failovers and master promotions; if one node is down, you can usually just connect to another node and execute queries. Galera provides a self-healing mechanism through state transfers - State Snapshot Transfer (SST) and Incremental State Transfer (IST). This means that when a node joins a cluster, Galera will attempt to bring it back in sync. It may just copy the missing data from another node’s gcache (IST) or, if none of the nodes contain the data in their gcache, it will copy the entire contents of one of the nodes to the joining node (via SST). This also makes it very easy for a Galera Cluster to recover even from serious failure conditions. Galera Cluster is also a way to scale reads - you can increase the size of the cluster and read from all of the nodes.

On the other hand, you have to keep in mind what Galera is not. Galera is not a solution which implements sharding (like NDB Cluster does) - it does not shard the data automatically, each and every Galera node contains the same full data set, just like a standalone MySQL node. Therefore, Galera is not a solution which can help you scale writes. It can help you squeeze more writes than standard, single-threaded replication as it can utilize multiple writers at once, but as of MySQL 5.7, you can use multithreaded replication for every workload so it’s not the same advantage it used to be. Galera is not a solution which can be left alone - even though it has some auto healing features, it still requires user supervision and it happens pretty often that Galera cannot recover on its own.

Having said that, Galera Cluster is still a great piece of software, which can be utilized to build highly available clusters. In the next section we’ll show you different deployment patterns for Galera. Before we get there, there is one more very important bit of information required to understand why we want to deploy Galera clusters the way we are about to describe - quorum calculations. Galera has a mechanism which prevents split brain scenarios from happening. It detects the number of available nodes and checks if there is a quorum - Galera has to see (50% + 1) of the nodes to accept traffic. Otherwise it assumes that a split brain has happened and that it is part of a minority segment which neither contains current data nor can accept writes. Ok, now let’s talk about the different ways in which you can deploy a Galera cluster.
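
A quick way to see how a node perceives the cluster - and whether it is part of the Primary Component - is to check a few wsrep status counters, for example:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';

On a healthy node, wsrep_cluster_status reports “Primary” and wsrep_local_state_comment reports “Synced”.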

Deploying Galera cluster

Basic deployment - three node cluster

The most common way to deploy a Galera cluster is to use an odd number of nodes. The most basic setup for Galera is a three node cluster. This setup is enough to survive the loss of one node - the cluster can still operate just fine, even though its read capacity decreases and it cannot tolerate any more failures. Such a setup is very common as an entry point - you can always add more nodes in the future, keeping in mind that you should use an odd number of nodes. We have seen Galera clusters as big as 11 - 13 nodes.

Minimalistic approach - two nodes + garbd

If you want to do some testing with Galera, you can reduce the cluster size to two nodes. A 2-node cluster does not provide any fault tolerance, but we can improve this by leveraging Galera Arbitrator (garbd) - a daemon which can be started on a third node. It will receive all of the Galera traffic and, for the purpose of detecting failures and forming a quorum, it acts as a Galera node. Given that garbd doesn’t need to apply writesets (it just accepts them; no further action is taken), it doesn’t require beefy hardware like the database nodes would. This reduces the cost of the hardware needed to build the cluster.
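
Starting garbd on the third host is a one-liner; a minimal sketch, assuming the two database nodes are 192.168.0.101 and 192.168.0.102 and wsrep_cluster_name is set to "my_galera_cluster" (all of these are placeholders):

$ garbd --address "gcomm://192.168.0.101:4567,192.168.0.102:4567" \
        --group "my_galera_cluster" --daemon

The --group value has to match wsrep_cluster_name on the database nodes, otherwise garbd will not be able to join the cluster.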

One step further - add asynchronous slave

At any point you can extend your Galera cluster by adding one or more asynchronous slaves. Ideally, your Galera cluster has GTID enabled - it makes adding and reslaving slaves so much easier. Even if not, it is still possible to create a slave, although replication is much less likely to survive a crash of the “master” Galera node. Such an asynchronous slave can be used for a variety of reasons. It can be used as a backup host - run your backups on it to minimize the impact on the Galera cluster. It can also be used for heavier, OLAP queries - again, to remove load from the Galera cluster. Another reason why you may want to use an asynchronous slave is to build a Disaster Recovery environment - set it up in a separate datacenter and, in case the location of your Galera Cluster would go up in flames, you will still have a copy of your data in a safe location.

Multi-datacenter Galera clusters

If you really care about the availability of your data, you can improve it by spanning your Galera cluster across multiple datacenters. Galera can be used across the WAN - some reconfiguration may be needed to better adapt it to the higher latency of WAN connections, but it is perfectly suitable for such an environment. The only blocker would be if your application frequently modifies a very small subset of rows - this could significantly reduce the number of queries per second it is able to execute.

When talking about WAN-spanning Galera clusters it is important to mention segments. Segments are used in Galera to identify the nodes of the cluster which are collocated in the same DC. For example, you may want to configure all nodes in datacenter “A” to use segment 1 and all nodes in datacenter “B” to use segment 2. We won’t go into details here (we covered this bit extensively in one of our posts) but in short, using segments reduces inter-segment communication - something you definitely want if segments are connected over a WAN.
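
The segment a node belongs to is set through the Galera provider options; a minimal sketch for a node located in datacenter “B” (the segment number is arbitrary as long as all nodes in the same DC share it, and if you already pass other provider options you would append gmcast.segment to that list):

# my.cnf on every node in datacenter "B"
wsrep_provider_options="gmcast.segment=2"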

Another important aspect of building a highly available Galera Cluster across multiple datacenters is to use an odd number of datacenters. Two are not enough to build a setup which would automatically tolerate the loss of a datacenter. Let’s analyze the following setup.

As we mentioned earlier, Galera requires a quorum to operate. Let’s see what would happen if one datacenter is not available:

As you can clearly see, we ended up with only 3 nodes up - exactly 50% - which is not enough to form a quorum. Manual action is required to assess the situation and promote the remaining part of the cluster to a “Primary Component” - this takes time and prolongs downtime.

Let’s take a look at another option:

In this case we added one more node to the datacenter “B” to make sure it will take over the traffic when DC “A” is down. But what would happen if DC “B” goes down?

We have three nodes out of seven - less than 50%. The only way out is to use a third datacenter. You can, of course, use one more segment of three nodes (or more; it is important to have the same number of nodes in each datacenter), but you can minimize costs by utilizing Galera Arbitrator:

In this case, no matter which datacenter stops operating, as long as it is only one of them, with garbd we have a quorum:

In our example it is: seven nodes in total, three down, four (3 + garbd) up - enough to form a quorum.

Multiple Galera clusters connected using asynchronous replication

We mentioned that you can deploy an asynchronous slave to a Galera cluster and use it, for example, as a DR host. You also have to keep in mind that a Galera cluster, to some extent, can be treated as a single MySQL instance. This means there’s no reason why you couldn’t connect two separate Galera clusters using asynchronous replication. Such a setup is pretty common. Again, as with regular slaves, it’s better to have GTID because it allows you to quickly reslave your “standby” Galera cluster to another Galera node in the “active” cluster. Without GTID this is also possible, but it is much more time-consuming and error-prone. Of course, such a setup does not behave the same way as a multi-DC Galera cluster would - there’s no automated recovery of the replication link between datacenters, and there’s no automated reslaving if a “master” Galera node goes down. You may need to build your own tools to automate this process.

Proxy layer

There is no high availability without a proxy layer (unless you have built-in HA in your application). Static connections to a single host do not scale. What you need is a middleman - something that sits between your application and your database tier and masks the complexity of your database setup. Ideally, your application will have just a single point of access to your databases - connect to a given host and port, and that’s it.

You can choose between different proxies - HAProxy, MaxScale, ProxySQL - all of them can work with Galera Cluster. Some of them, though, may require additional configuration - you need to know what needs to be done and how. Additionally, it’s extremely important that you don’t end up with the proxy itself as a single point of failure. Each of these proxies can be deployed in a highly available fashion, and you will need a few components to work together.

How can ClusterControl help you build highly available Galera Clusters?

ClusterControl can help you deploy Galera from all of the available vendors: Codership, Percona XtraDB Cluster and MariaDB Cluster.

Once you have deployed a cluster, you can easily scale it up. If needed, you can define different segments for a WAN-spanning cluster.

You can also, if you want, deploy an asynchronous slave to the Galera Cluster.

You can use ClusterControl to deploy Galera Arbitrator, which can be very helpful (as we showed previously) in multi-datacenter deployments.

Proxy layer

ClusterControl gives you the ability to deploy different proxies with your Galera Cluster. It supports deployments of HAProxy, MaxScale and ProxySQL. For HAProxy and ProxySQL, there are additional options to deploy redundant instances with Keepalived and Virtual IP.

For HAProxy, ClusterControl has a statistics page. You can also put a node into maintenance state:

For MaxScale, using ClusterControl you have access to MaxScale’s CLI and can perform any actions that are possible from it. Please note that we deploy MaxScale version 1.4.3 - more recent versions introduced licensing limitations.

ClusterControl provides quite a bit of functionality for managing ProxySQL, including creating query rules and query caching. If you are interested in more details, we encourage you to watch the replay of one of our webinars in which we demoed this UI.

As mentioned previously, ClusterControl can also deploy HAProxy and ProxySQL in a highly available setup. We use virtual IP and Keepalived to track the state of services and perform a failover if needed.

As we have seen in this blog post, Galera Cluster can be used in a number of ways to provide high availability for your database, which is a key part of the web infrastructure. You are welcome to download ClusterControl and try out the different topologies we discussed above.

ClusterControl for Galera Cluster for MySQL


ClusterControl allows you to easily manage your database infrastructure on premise or in the cloud. With in-depth support for technologies like Galera Cluster for MySQL and MariaDB setups, you can truly automate mixed environments for next-level applications.

Since the launch of ClusterControl in 2012, we’ve experienced growth in new industries with customers who are benefiting from the advancements ClusterControl has to offer - in particular when it comes to Galera Cluster for MySQL.

In addition to reaching new highs in ClusterControl demand, this past year we’ve doubled the size of our team allowing us to continue to provide even more improvements to ClusterControl.

Take a look at this infographic for our top Galera Cluster for MySQL resources and information about how ClusterControl works with Galera Cluster.

 
