
Become a ClusterControl DBA - Deploying your Databases and Clusters


Many of our users speak highly of our product ClusterControl, especially how easy it is to install the software package. Installing new software is one thing, but using it properly is another.

We are all impatient to test new software and would rather toy around with an exciting new application than read the documentation up front. That is a bit unfortunate, as you may miss the most important features, or end up figuring things out on your own instead of reading how to do them the easy way.

This new blog series will cover all the basic operations of ClusterControl for MySQL, MongoDB & PostgreSQL, with examples explaining how to perform them and how to make the most of your setup, and will provide a deep dive per subject to save you time.

These are the topics we'll cover in this series:

  • Deploying the first clusters
  • Adding your existing infrastructure
  • Performance and health monitoring
  • Making your components highly available
  • Workflow management
  • Safeguarding your data
  • Protecting your data
  • In-depth use case

In today’s post we cover installing ClusterControl and deploying your first clusters. 

Preparations

In this series we will make use of a set of Vagrant boxes but you can use your own infrastructure if you like. In case you do want to test it with Vagrant, we made an example setup available from the following Github repository:
https://github.com/severalnines/vagrant

Clone the repo to your own machine:

git clone git@github.com:severalnines/vagrant.git

The topology of the Vagrant nodes is as follows:

  • vm1: clustercontrol
  • vm2: database node1
  • vm3: database node2
  • vm4: database node3

Obviously you can easily add additional nodes if you like by changing the following line:

4.times do |n|

The Vagrantfile is configured to automatically install ClusterControl on the first node and forward the ClusterControl user interface to port 8080 on the host that runs Vagrant. So if your host’s IP address is 192.168.1.10, you will find the ClusterControl UI at: http://192.168.1.10:8080/clustercontrol/
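For example, to run a five-node setup instead, a minimal change (assuming the loop above is the one in the repository’s Vagrantfile) is to bump the counter and bring the environment up from the directory containing that Vagrantfile:

# in the Vagrantfile, change the loop to: 5.times do |n|
$ vagrant up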

Installing ClusterControl

You can skip this section if you chose to use the Vagrantfile and got the automatic installation for free. But installing ClusterControl is easy and will take less than five minutes of your time.

With the package installation all you have to do is issue the following three commands on the ClusterControl node to get it installed:

$ wget http://www.severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc
$ ./install-cc   # as root or sudo user

That’s it: it can’t get easier than this. If the installation script did not encounter any issues, ClusterControl has been installed and is up and running. You can now log into ClusterControl at the following URL:
http://192.168.1.210/clustercontrol

After creating an administrator account and logging in, you will be prompted to add your first cluster.

Deploy a Galera cluster

If you have installed ClusterControl via the package installation, or when there are no clusters defined in ClusterControl yet, you will be prompted to create a new database server/cluster or add an existing (i.e., already deployed) server or cluster:

severalnines-blogpost1-add-new-cluster-or-node.png

In this case we are going to deploy a Galera cluster and this only requires one screen to fill in:

several-nines-blogpost-add-galera-cluster.png

To allow ClusterControl to install the Galera nodes, we use the root user that was granted SSH access by the Vagrant bootstrap scripts. If you chose to use your own infrastructure, you must enter a user here that is allowed to do passwordless SSH to the nodes that ClusterControl is going to control.
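If you still need to set up passwordless SSH on your own infrastructure, a typical way to do it from the ClusterControl node looks like the sketch below (the user name and host address are placeholders for your own environment):

$ ssh-keygen -t rsa                 # accept the defaults, empty passphrase
$ ssh-copy-id root@10.0.0.11        # repeat for every database node
$ ssh root@10.0.0.11 "hostname"     # should log in without asking for a password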

Also make sure you disable AppArmor/SELinux. See here why.
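For example, on RHEL/CentOS based nodes you could put SELinux in permissive mode as follows (on Debian/Ubuntu the AppArmor profiles need similar treatment); this is a generic sketch, not a ClusterControl-specific command:

$ setenforce 0                                                            # permissive for the running system
$ sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config   # persist across reboots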

After you have filled in all the details and clicked Deploy, a job will be spawned to build the new cluster. The nice thing is that you can keep track of the progress of this job by clicking on the spinning circle between the messages and settings icons in the top menu bar:

severalnines-blogpost1-progress-indicator.png

Clicking on this icon will open a popup that keeps you updated on the progress of your job.

severalnines-blogpost-add-galera-progress1.png

severalnines-blogpost-add-galera-progress2.png

Once the job has finished, you have just created your first cluster. Opening the cluster overview should look like this:

severalnines-blogpost1-cluster-overview.png

In the Nodes tab, you can do about any operation you would normally want to perform on a cluster, and more. The Query Monitor gives you a good overview of both running and top queries. The Performance tab helps you keep a close eye on the performance of your cluster and also features the advisors that help you act proactively on trends in the data. The Backups tab enables you to easily schedule backups that are stored either on the DB nodes or on the controller host, and the Manage tab enables you to expand your cluster or make it highly available for your applications through a load balancer.

All this functionality will be covered in later blog posts in this series.

Deploy a MySQL replication set

A new feature in ClusterControl 1.2.11 is that you can not only add slaves to existing clusters/nodes but you can also create new masters. In order to create a new replication set, the first step would be creating a new MySQL master:

severalnines-blogpost-add-mysql-master.png

After the master has been created you can now deploy a MySQL slave via the “Add Node” option in the cluster list:

severalnines-blogpost-add-mysql-slave-dialogue.png

Keep in mind that adding a slave to a master requires the master’s configuration to be stored in the ClusterControl repository. This happens automatically, but it will take a minute for the configuration to be imported and stored. After adding the slave node, ClusterControl will provision the slave with a copy of the data from its master using Xtrabackup. Depending on the size of your data, this may take a while.

Deploy a PostgreSQL replication set

Creating a PostgreSQL cluster requires one extra step compared to creating a Galera cluster, as it is divided into adding a standalone PostgreSQL server first and then adding a slave. This two-step approach lets you decide which server will become the master and which one becomes the slave.

A side note: the supported PostgreSQL version is 9.x and higher. Make sure the correct version gets installed by adding the correct PostgreSQL repositories: http://www.postgresql.org/download/linux/

First we create a master by deploying a standalone PostgreSQL server:

severalnines-blogpost-postgresql-master.png

After deploying, the first node will become available in the cluster list as a single node instance. 

You can open the cluster overview and then add the slave, but the cluster list also gives you the option to immediately add a replication slave to this cluster:

severalnines-blogpost1-adding-postgresql-slave1.png

And adding a slave is as simple as selecting the master and filling in the FQDN for the new slave:

severalnines-blogpost-postgresql-slave.png

The PostgreSQL cluster overview gives you good insight into your cluster:

severalnines-blogpost-postgresql-overview.png

Just like with the Galera and MySQL cluster overviews, you can find all the necessary tabs and functions here: the Query Monitor, Performance and Backups tabs enable you to do the necessary operations.

Deploy a MongoDB replicaSet

Deploying a new MongoDB replicaSet is similar to PostgreSQL. First we create a new master node:

severalnines-blogpost-mongodb-master.png

After installing the master we can add a slave to the replicaSet using the same dropdown from the cluster overview:

severalnines-blogpost-mongodb-add-node-dropdown.png

Keep in mind that you need to select the saved Mongo template here to start replicating from a replicaSet; in this case, select the mongod.conf.shardsvr configuration.

severalnines-blogpost-mongodb-slave.png

After adding the slave to the MongoDB replicaSet, a job will be spawned. Once this job has finished, it will take a short while before MongoDB adds the new node to the cluster and it becomes visible in the cluster overview.

severalnines-blogpost-mongodb-cluster-overview.png

Similar to the PostgreSQL, Galera and MySQL cluster overviews, you can find all the necessary tabs and functions here: the Query Monitor, Performance and Backups tabs enable you to do the necessary operations.

Final thoughts

With these examples, we have shown you how easy it is to set up new clusters for MySQL, MongoDB and PostgreSQL from scratch in only a couple of minutes. The beauty of this Vagrant setup is that taking the environment down and spawning it again is just as easy as creating it in the first place. Impress your colleagues with how easily you can set up a working environment and convince them to use it as their own test or devops environment.

Of course it would be equally interesting to add existing hosts and clusters into ClusterControl and that’s what we'll cover next time.



Become a ClusterControl DBA: Adding Existing Databases and clusters


In our previous blog post we covered the deployment of four types of clustering/replication: MySQL Galera, MySQL master-slave replication, PostgreSQL replication set and MongoDB replication set. This should enable you to create new clusters with great ease, but what if you already have 20 replication setups deployed and wish to manage them with ClusterControl?

This blog post will cover adding existing infrastructure components for these four types of clustering/replication to ClusterControl and how to have ClusterControl manage them.

Adding an existing Galera cluster to ClusterControl

Adding an existing Galera cluster to ClusterControl requires a MySQL user with the proper grants and an SSH user that is able to log in (without a password) from the ClusterControl node to your existing databases and clusters.
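As a rough illustration, creating such a MySQL user on one of the Galera nodes could look like the following (the user name, password, the ClusterControl IP 10.0.0.10 and the very broad privileges are placeholders; grant only what you actually need):

$ mysql -u root -p -e "GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'10.0.0.10' IDENTIFIED BY 'cmonP4ss' WITH GRANT OPTION;"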
 
Install ClusterControl on a separate VM. Once it is up, open the dialogue for adding an existing cluster. All you have to do is to add one of the Galera nodes and ClusterControl will figure out the rest:

severalnines-blogpost-add-existing-galera-cluster.png

Behind the scenes, ClusterControl will then connect to this host, detect all the necessary details of the full cluster and register the cluster in the overview.

Adding an existing MySQL master-slave to ClusterControl

Adding an existing MySQL master-slave topology requires a bit more work than adding a Galera cluster. Whereas ClusterControl is able to extract the necessary information by itself for Galera, in the case of master-slave you need to specify every host within the replication setup.

severalnines-blogpost-add-existing-mysql-master-slave.png

After this, ClusterControl will connect to every host, see if they are part of the same topology and register them as part of one cluster (or server group) in the GUI.

Adding an existing PostgreSQL replication set to ClusterControl

Similar to adding the MySQL master-slave above, the PostgreSQL replication set also requires you to fill in all hosts within the same replication set.

severalnines-blogpost-add-existing-postgresql-replication-set.png

After this, ClusterControl will connect to every host, see if they are part of the same topology and register them as part of the same group. 

Adding an existing MongoDB replica set to ClusterControl

Adding an existing MongoDB replica set is just as easy as Galera: just one of the hosts in the replica set needs to be specified with its credentials and ClusterControl will automatically discover the other nodes in the replica set.

severalnines-blogpost-add-existing-mongodb-replica-set.png

Expanding your existing infrastructure

After adding the existing databases and clusters, they have now become manageable via ClusterControl, and thus we can scale out our clusters.

For MySQL, MongoDB and PostgreSQL replication sets, this can easily be achieved in the same way we showed in our previous blog post: simply add a node and ClusterControl will take care of the rest.

severalnines-blogpost1-adding-postgresql-slave1.png

For Galera, there is a bit more choice. The most obvious option is to add a (Galera) node to the cluster by simply choosing “add node” in the cluster list or cluster overview. Expanding your Galera cluster this way should happen in increments of two to ensure your cluster can always retain a majority during a split-brain situation.

Alternatively you could add a replication slave, and thus create an asynchronous slave in your synchronous cluster, which looks like this:

magtid_arch_full.png

Adding a slave node blindly under one of the Galera nodes can be dangerous, since if this node goes down the slave won’t receive updates from its master anymore. We blogged about this paradigm earlier, and you can read how to solve it in this blog post.

Final thoughts

We showed you how easy it is to add existing databases and clusters to ClusterControl; you can literally add clusters within minutes. So nothing should hold you back from using ClusterControl to manage your existing infrastructure. If you have a large infrastructure, the addition of ClusterControl will give you more overview and save time in troubleshooting and maintaining your clusters.

Now the challenge is how to leverage ClusterControl to keep track of key performance indicators, show the global health of your clusters and proactively alert you in time when something is predicted to happen. And that’s the subject we'll cover next time.



ClusterControl Tips & Tricks: wtmp Log Rotation Settings for Sudo User


Requires ClusterControl. Applies to all supported database clusters. Applies to all supported operating systems (RHEL/CentOS/Debian/Ubuntu).

ClusterControl requires a super-privileged SSH user to provision database nodes. If you are running as a non-root user, the corresponding user must be able to execute sudo commands, with or without a sudo password. Unfortunately, this can generate another issue where performing a remote command with “sudo” requires an interactive session (tty). We will explain this in detail in the next sections.

What’s up with sudo?

By default, most of the RHEL flavors have the following configured under /etc/sudoers:

Defaults requiretty

When an interactive session (tty) is required, each time the sudo user SSHes into the box with the -t flag (force pseudo-tty allocation), entries will be created in /var/log/wtmp for the creation and destruction of terminals, or the assignment and release of terminals. These logs only record interactive sessions. If you didn’t specify -t, you would see the following error:

sudo: sorry, you must have a tty to run sudo

The root user does not require an interactive session when running remote SSH commands; its entries only appear in /var/log/secure or /var/log/auth.log, depending on the system configuration. Different distributions have different defaults in this regard. SSH does not write to wtmp for a non-interactive session.
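To see the difference for yourself on a host with requiretty enabled, compare a plain remote sudo call with one that forces a pseudo-tty (the user and host are placeholders):

$ ssh ec2-user@10.0.0.11 "sudo whoami"
sudo: sorry, you must have a tty to run sudo
$ ssh -t ec2-user@10.0.0.11 "sudo whoami"
root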

To check the content of wtmp, we use the following command:

$ last -f /var/log/wtmp
ec2-user pts/0        ip-10-0-0-79.ap- Wed Oct 28 11:16 - 11:16  (00:00)
ec2-user pts/0        ip-10-0-0-79.ap- Wed Oct 28 11:16 - 11:16  (00:00)
ec2-user pts/0        ip-10-0-0-79.ap- Wed Oct 28 11:16 - 11:16  (00:00)
...

On Debian/Ubuntu systems, the sudo user does not need to acquire a tty, as “requiretty” is not configured by default. However, ClusterControl appends the -t flag by default if it detects that the SSH user is a non-root user. Since ClusterControl performs all the monitoring and management tasks as this user, you may notice that /var/log/wtmp will grow rapidly, as shown in the following section.

Log rotation for wtmp

Example: Take note of the following default configuration of wtmp in RHEL 7.1 inside /etc/logrotate.conf:

/var/log/wtmp {
    monthly
    create 0664 root utmp
    minsize 1M
    rotate 1
}

By running the following commands on one of the database nodes managed by ClusterControl, we can see how fast /var/log/wtmp grows every minute:

[user@server ~]$ a=$(du -b /var/log/wtmp | cut -f1) && sleep 60 && b=$(du -b /var/log/wtmp | cut -f1) && c=$(expr $b - $a ) && echo $c
89088

From the above result, ClusterControl causes the log file to grow by about 89 KB per minute, which equals roughly 128 MB per day. If the mentioned logrotate configuration is used (monthly rotation), /var/log/wtmp alone may consume 3.97 GB of disk space! If the partition where this file resides (usually the “/” partition) is small, which is common especially on cloud instances, there is a real risk that you fill up the disk space on that partition in less than one month.

Workaround

The workaround is to adjust the log rotation of wtmp. This is applicable to all operating systems mentioned at the beginning of this post. If you are affected by this, change the log rotation behaviour so the file does not grow more than expected. The following is what we recommend:

/var/log/wtmp {
     size 100M
     create 0664 root utmp
     rotate 3
     compress
}

The above settings specify that the maximum size of wtmp should be 100 MB, and that we should keep the 3 most recent (compressed) files and remove older ones.

Logrotate runs from cron (via /etc/cron.daily/logrotate). It is not a daemon, so there is no need to reload its configuration; the next time cron executes logrotate, it will pick up the new configuration automatically.
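You do not have to wait for the daily cron run to verify the new settings; logrotate can do a dry run or force a rotation right away:

$ logrotate -d /etc/logrotate.conf    # debug/dry run: shows what would be rotated without changing anything
$ logrotate -f /etc/logrotate.conf    # force a rotation using the new settings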

Happy Clustering!

PS.: To get started with ClusterControl, click here!


Become a ClusterControl DBA: performance monitoring and health


The blog series for MySQL, MongoDB & PostgreSQL administrators

In the previous two blog posts we covered both deploying the four types of clustering/replication (MySQL/Galera, MySQL Replication, MongoDB & PostgreSQL) and managing/monitoring your existing databases and clusters. So, after reading these first two blog posts, you were able to add your 20 existing replication setups to ClusterControl, expand them, and additionally deploy two new Galera clusters while doing a ton of other things. Or maybe you deployed MongoDB and/or PostgreSQL systems. So now, how do you keep them healthy?

That’s exactly what this blog post is about: how to leverage ClusterControl’s performance monitoring and advisors functionality to keep your MySQL, MongoDB and/or PostgreSQL databases and clusters healthy. So how is this done in ClusterControl?

The cluster list

The most important information can already be found in the cluster list: as long as there are no alarms and no hosts are shown to be down, everything is functioning fine. An alarm is raised if a certain condition is met, e.g. the host is swapping, and brings the issue you should investigate to your attention. That means alarms are not only raised during an outage, but also to allow you to proactively manage your databases.

If you log into ClusterControl and see a cluster listing like this, you will definitely have something to investigate: one node is down in the Galera cluster, for example, and every cluster has various alarms.

severalnines-blogpost-cluster-list-node-down-alarms.png

Once you click on one of the alarms, you will go to a detailed page on all alarms of the cluster. The alarm details will explain the issue and in most cases also advise the action to resolve the issue.

You can set up your own alarms by creating custom expressions, but that has been deprecated in favor of our new Developer Studio, which allows you to write custom JavaScript checks and execute them as Advisors. We will get back to this topic later in this post.

The cluster overview - Dashboards

When opening up the cluster overview, we can immediately see the most important performance metrics for the cluster in the tabs. This overview may differ per cluster type as, for instance, Galera has different performance metrics to watch than traditional MySQL, Postgres or MongoDB.

severalnines-blogpost-cluster-overview-performance.png

Both the default overview and the pre-selected tabs are customizable. By clicking on Overview > Dash Settings you are given a dialogue that allows you to define the dashboard.

severalnines-blogpost-cluster-overview-add-dashboard.png

By pressing the plus sign you can add and define your own metrics to graph on the dashboard. In our case we will define a new dashboard featuring the Galera-specific receive and send queues:

severalnines-blogpost-cluster-overview-add-dashboard2.png

This new dashboard should give us good insight into the average queue length of our Galera cluster.

Once you have pressed save, the new dashboard will become available for this cluster:

severalnines-blogpost-cluster-overview-new-dashboard-added.png

Similarly you can do this for PostgreSQL as well by combining the checkpoints with the number of commits:

severalnines-blogpost-performance-overview-pgsql-add-metric.png

severalnines-blogpost-performance-overview-pgsql-add-metric2.png

severalnines-blogpost-performance-overview-pgsql2.png

So as you can see, it is relatively easy to customize your own (default) dashboard.

Cluster overview - Query Monitor

The Query Monitor tab is available for both MySQL and PostgreSQL based setups and consists of three dashboards: Top Queries, Running Queries and Query Histogram.

In the Running Queries dashboard, you will find all current queries that are running. This is basically the equivalent of SHOW PROCESSLIST in ClusterControl.

Top Queries and Query Histogram both rely on the input of the slow query log. To prevent ClusterControl from being too intrusive, and the slow query log from growing too large, ClusterControl samples the slow query log by turning it on and off. By default this loop captures for 1 second and long_query_time is set to 0.5 seconds. If you wish to change these settings for your cluster, you can do so via Settings -> Query Monitor.
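If you want to double-check what is currently configured on a given database node, you can query the relevant variables directly; this is just a manual verification, not something ClusterControl requires you to do:

$ mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('slow_query_log', 'long_query_time');"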

Top Queries will, like the name says, show the top queries that were sampled. You can sort them on various columns: for instance the frequency, average execution time or the total execution time.

severalnines-blogpost-top-queries-overview.png

You can get more details about the query by selecting it and this will present the query execution plan (if available) and optimization hints/advisories. If necessary you can also select the query and have the details emailed to you by clicking on the “email query” button.

The Query Histogram is similar to Top Queries, but additionally allows you to filter the queries per host and compare them over time.

Cluster overview - Operations

Similar to the Running Queries view of the PostgreSQL and MySQL systems, the MongoDB clusters have an Operations overview. This overview is the equivalent of issuing the db.currentOp() command within MongoDB.

severalnines-blogpost-mongodb-current-ops.png

Cluster overview - Performance

MySQL / Galera

The performance tab is probably the best place to find the overall performance and health of your clusters. For MySQL and Galera it consists of an Overview page, the Advisors, status/variables overviews, the Schema Analyzer and the Transaction log.

The Overview page will give you a graph overview of the most important metrics in your cluster. This is, obviously, different per cluster type. Eight metrics have been set by default, but you can easily set your own - up to 20 graphs if needed.

severalnines-blogpost-define-graphs.png

The Advisors are one of the key features of ClusterControl: they are scripted checks that can be run on demand. An advisor can evaluate almost any fact known about the host and/or cluster, give its opinion on the health of the host and/or cluster, and even give advice on how to resolve issues or improve your hosts!

severalnines-blogpost-mysql-advisors.png

The best part is yet to come: you can create your own checks in the Developer Studio (Cluster -> Manage -> Developer Studio), run them on a regular interval and use them again in the Advisors section. We blogged about this new feature earlier this year.

We will skip the status/variables overview of MySQL and Galera, as it is useful for reference but not for this blog post: it is good enough that you know it is there. It is also good to mention that the Status Time Machine can help you track specific status variables and see how they change over time.

Now suppose your database is growing but you want to know how fast it grew in the past week. You can actually keep track of the growth of both data and index sizes from right within ClusterControl:

Next to the total growth on disk, it can also report back the top 25 largest schemas.

Another important feature is the Schema Analyzer within ClusterControl.

ClusterControl will analyze your schemas and look for redundant indexes, MyISAM tables and tables without a primary key. Of course it is entirely up to you to keep a table without a primary key because some application might have created it this way, but at least it is great to get the advice here for free. The Schema Analyzer even constructs the necessary ALTER statement to fix the problem.
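ClusterControl does this analysis for you, but if you want to cross-check the primary key findings by hand, a query against information_schema along these lines will list tables without a primary key (this is just a manual equivalent, not how ClusterControl implements it):

$ mysql -e "
    SELECT t.table_schema, t.table_name
    FROM information_schema.tables t
    LEFT JOIN information_schema.table_constraints c
           ON c.table_schema = t.table_schema
          AND c.table_name   = t.table_name
          AND c.constraint_type = 'PRIMARY KEY'
    WHERE c.constraint_name IS NULL
      AND t.table_type = 'BASE TABLE'
      AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');"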

PostgreSQL

For PostgreSQL the Advisors, DB Status and DB Variables can be found here.

severalnines-blogpost-postgresql-advisors.png

MongoDB

For MongoDB, the Mongo Stats and performance overview can be found under the Performance tab. The Mongo Stats is an overview of the output of mongostat, and the Performance overview gives a good graphical overview of the Mongo opcounters:

severalnines-blogpost-mongodb-performance.png

Final thoughts

We showed you how to keep an eye on the most important monitoring and health checking features of ClusterControl. Obviously this is only the beginning of the journey, as we will soon start another blog series about the Developer Studio capabilities and how you can make the most of your own checks. Also keep in mind that our support for MongoDB and PostgreSQL is not as extensive as our MySQL toolset, but we are continuously improving on this.

You may ask yourself why we have skipped over the performance monitoring and health checks of HAProxy and MaxScale. We did that deliberately, as the blog series has covered only deployments of clusters up till now and not the deployment of HA components. So that’s the subject we'll cover next time.

Become a ClusterControl DBA: Making your DB components HA via Load Balancers


Choosing your HA topology

There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, resource managers like Zookeeper and Etcd to (re)configure your applications, or load balancers/proxies to distribute the workload over all available hosts.

The Virtual IPs need either an application to manage them (MHA, Orchestrator), some scripting (Keepalived, Pacemaker/Corosync), or an engineer to fail over manually, and the decision making in the process can become complex. The Virtual IP failover itself is a straightforward and simple process: remove the IP address from one host, assign it to another and use arping to send a gratuitous ARP response. In theory a Virtual IP can be moved in a second, but it will take a few seconds before the failover management application is sure the host has failed and acts accordingly. In reality this should be somewhere between 10 and 30 seconds. Another limitation of Virtual IPs is that some cloud providers do not allow you to manage your own Virtual IPs or assign them at all. E.g., Google does not allow you to do that on their compute nodes.
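To make the failover steps concrete, moving a Virtual IP by hand boils down to something like the sketch below (the interface name and addresses are placeholders, and arping flags differ slightly between implementations):

# on the node giving up the virtual IP (if it is still reachable)
$ ip addr del 192.168.1.100/24 dev eth0
# on the node taking over the virtual IP
$ ip addr add 192.168.1.100/24 dev eth0
$ arping -c 3 -U -I eth0 192.168.1.100    # gratuitous ARP so peers update their ARP caches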

Resource managers like Zookeeper and Etcd can monitor your databases and (re)configure your applications once a host fails or a slave gets promoted to master. In general this is a good idea but implementing your checks with Zookeeper and Etcd is a complex task.

A load balancer or proxy will sit in between the application and the database host and work transparently, as if the client were connecting to the database host directly. Just like with the Virtual IP and resource managers, the load balancers and proxies also need to monitor the hosts and redirect the traffic if one host is down. ClusterControl supports two proxies, HAProxy and MaxScale, and both are supported for MySQL master-slave replication and Galera clusters. HAProxy and MaxScale both have their own use cases; we will describe them in this post as well.

Why do you need a load balancer?

In theory you don’t need a load balancer but in practice you will prefer one. We’ll explain why. 

If you have virtual IPs set up, all you have to do is point your application to the correct (virtual) IP address and everything should be fine connection-wise. But suppose you have scaled out the number of read replicas; you might want to provide virtual IPs for each of those read replicas as well, for maintenance or availability reasons. This might become a very large pool of virtual IPs that you have to manage. If one of those read replicas has a failure, you need to re-assign the virtual IP to another host, or else your application will connect to either a host that is down or, in the worst case, a lagging server with stale data. Keeping track of the replication state in the application managing the virtual IPs is therefore necessary.

There is a similar challenge for Galera: you can in theory add as many hosts as you’d like to your application config and pick one at random. The same problem arises when this host is down: you might end up connecting to an unavailable host. Using all hosts for both reads and writes might also cause rollbacks due to the optimistic locking in Galera: if two connections try to write to the same row at the same time, one of them will receive a rollback. In case your workload has such concurrent updates, it is advised to use only one Galera node to write to. Therefore you want a manager that keeps track of the internal state of your database cluster.

Both HAProxy and MaxScale will offer you the functionality to monitor the database hosts and keep state of your cluster and its topology. For replication setups, in case a slave replica is down, both HAProxy and MaxScale can redistribute the connections to another host. But if a replication master is down, HAProxy will deny the connection and MaxScale will give back a proper error to the client. For Galera setups, both load balancers can elect a master node from the Galera cluster and only send the write operations to that specific node.

On the surface HAProxy and MaxScale may seem to be similar solutions, but they differ a lot in features and in the way they distribute connections and queries. Both HAProxy and MaxScale can distribute connections using round-robin. You can also use round-robin to split reads from writes by designating a specific port for sending reads to the slaves and another port for sending writes to the master; your application will then have to decide whether to use the read or the write port. Since MaxScale is an intelligent proxy, it is database aware and is also able to analyze your queries. MaxScale can do read/write splitting on a single port by detecting whether you are performing a read or a write operation and connecting to the designated slaves or master in your cluster. MaxScale includes additional functionality like binlog routing, audit logging and query rewriting, but we will have to cover these in a separate article.
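
To illustrate the port-based read/write split described above, a minimal HAProxy configuration could look roughly like the sketch below (the ports, server names and addresses are assumptions, not the configuration ClusterControl generates):

listen mysql_writes
    bind *:3307
    mode tcp
    server master 10.10.11.11:3306 check

listen mysql_reads
    bind *:3308
    mode tcp
    balance roundrobin
    server slave1 10.10.11.12:3306 check
    server slave2 10.10.11.13:3306 check

Your application would then send writes to port 3307 and reads to port 3308.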

That should be enough background information on this topic, so let’s see how you can deploy both load balancers for MySQL replication and Galera topologies.

Deploying HAProxy

Using ClusterControl to deploy HAProxy on a Galera cluster is easy: go to the relevant cluster and select “Add Load Balancer”:

severalnines-blogpost-add-galera-haproxy.png

And you will be able to deploy an HAProxy instance by adding the host address and selecting the server instances you wish to include in the configuration:

severalnines-blogpost-add-galera-haproxy-2.png

By default the HAProxy instance will be configured to send connections to the server instances receiving the least number of connections, but you can change that policy to either round robin or source. 

Under advanced settings you can set timeouts, the maximum number of connections and even secure the proxy by whitelisting an IP range for the connections.

Under the nodes tab of that cluster, the HAProxy node will appear:

severalnines-blogpost-add-galera-haproxy-3.png

Now your Galera cluster is also available via the newly deployed HAProxy node on port 3307. Don’t forget to GRANT your application access from the HAProxy IP, as the traffic will now come in from the proxy instead of the application hosts. Also, remember to point your application connections to the HAProxy node.
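
For example, if your HAProxy node runs on 10.10.11.20, the extra grant could look like this (the user, password, schema and address are assumptions; the syntax below applies to MySQL 5.x):

GRANT ALL PRIVILEGES ON myapp.* TO 'appuser'@'10.10.11.20' IDENTIFIED BY 'secret';
FLUSH PRIVILEGES;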

Now suppose one server instance goes down: HAProxy will notice this within a few seconds and stop sending traffic to that instance:

severalnines-blogpost-add-galera-haproxy-node-down.png

The two other nodes are still fine and will keep receiving traffic. This keeps the cluster highly available without the client even noticing the difference.

Deploying a secondary HAProxy node

Now that we have moved the responsibility of retaining high availability over the database connections from the client to HAProxy, what if the proxy node dies? The answer is to create another HAProxy instance and use a virtual IP controlled by Keepalived as shown in this diagram:

The benefit compared to using virtual IPs on the database nodes is that the logic for MySQL is at the proxy level and the failover for the proxies is simple.
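
Conceptually, the Keepalived configuration on the two HAProxy hosts resembles the sketch below (the interface, router id, priorities and virtual IP are assumptions; ClusterControl generates the actual configuration for you):

vrrp_instance VI_HAPROXY {
    state MASTER              # BACKUP on the secondary HAProxy host
    interface eth0
    virtual_router_id 51
    priority 101              # use a lower priority (e.g. 100) on the secondary
    virtual_ipaddress {
        192.168.1.100
    }
}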

So let’s deploy a secondary HAProxy node:

severalnines-blogpost-add-galera-second-haproxy-1.png

After we have deployed a secondary HAProxy node, we need to add Keepalived:

severalnines-blogpost-add-keepalived.png

And after Keepalived has been added, your nodes overview will look like this:

severalnines-blogpost-keepalived.png

So now you have to point your application connections to the virtual IP instead of directly to the HAProxy node.

In the example here, we used separate hosts to run HAProxy on, but you could easily add them to existing server instances as well. HAProxy does not bring much overhead, although you should keep in mind that in case of a server failure, you will lose both the database node and the proxy.

Deploying MaxScale

Deploying MaxScale to your cluster is done in a similar way to HAProxy: ‘Add Load Balancer’ in the cluster list.

severalnines-blogpost-add-maxscale.png

ClusterControl will deploy MaxScale with both the round-robin router and the read/write splitter. The CLI port is used to administer MaxScale from ClusterControl.

After MaxScale has been deployed, it will be available under the Nodes tab:

severalnines-blogpost-maxscale-admin2.png

Opening the MaxScale node overview will present you the interface that grants you access to the CLI interface, so there is no reason to log into MaxScale on the node anymore. 

For MaxScale, the grants are slightly different: as you are proxying, you need to allow connections from the proxy - just like with HAProxy. But since MaxScale is also performing local authentication and authorization, you need to grant access to your application hosts as well.

Deploying Garbd

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes (50% + 1 node), so in a two node system there would be no majority, resulting in split brain. Fortunately, it is possible to add garbd (Galera Arbitrator Daemon), which is a lightweight stateless daemon that can act as the odd node. The added benefit of the Galera Arbitrator is that you can now do with only two data nodes in your cluster.

If ClusterControl detects that your Galera cluster consists of an even number of nodes, it will warn/advise you to extend the cluster to an odd number of nodes:

severalnines-blogpost-even-nodes-galera.png

Choose the host to deploy garbd on wisely, as it will receive all replicated data. Make sure the network can handle the traffic and is secure enough. You could choose one of the HAProxy or MaxScale hosts to deploy garbd on, like in the example below:

severalnines-blogpost-add-garbd.png

Alternatively you could install garbd on the ClusterControl host.
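
For illustration, running garbd by hand would look something like the sketch below (the cluster name and member addresses are assumptions; ClusterControl takes care of this for you):

garbd --group my_galera_cluster \
      --address "gcomm://10.10.11.11:4567,10.10.11.12:4567" \
      --daemon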

After installing garbd, you will see it appear next to your two Galera nodes:

severalnines-blogpost-garbd-cluster-list.png

Final thoughts

We showed you how to make your MySQL master-slave and Galera cluster setups more robust and keep them highly available using HAProxy and MaxScale. Also, garbd is a nice daemon that can save you the extra third data node in your Galera cluster.

This finalizes the deployment side of ClusterControl. In our next blog, we will show you how to integrate ClusterControl within your organization by using groups and assigning certain roles to users.


Become a ClusterControl DBA: Safeguarding your Data


In the past four posts of the blog series, we covered deployment of clustering/replication (MySQL/Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health and in the last post, how to make your setup highly available through HAProxy and MaxScale.

So now that you have your databases up and running and highly available, how do you ensure that you have backups of your data?

You can use backups for multiple things: disaster recovery, providing production data to test or development environments, or even provisioning a slave node. The last case is already covered by ClusterControl: when you add a new (replica) node to your replication setup, ClusterControl will make a backup/snapshot of the master node and use it to build the replica. After the backup has been extracted and prepared and the database is up and running, ClusterControl will automatically set up replication.

Creating an instant backup

In essence creating a backup is the same for Galera, MySQL replication, Postgres and MongoDB. You can find the backup section under ClusterControl > Backup and by default it should open the scheduling overview. From here you can also press the “Backup” button to make an instant backup.

severalnines-blogpost-schedule-backup.png

As all these databases have different backup tools, there is obviously some difference in the options you can choose. For instance, with MySQL you can choose between mysqldump and xtrabackup. If in doubt which one to choose (for MySQL), check out this blog about the differences and use cases for mysqldump and xtrabackup.
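
To give a rough idea of what the two MySQL methods boil down to, the equivalent manual commands would look something like this (credentials and paths are assumptions, not the exact commands ClusterControl runs):

# logical backup of all schemas with mysqldump
mysqldump --single-transaction --routines --triggers --all-databases > /backups/full_dump.sql

# physical full backup with xtrabackup
innobackupex --user=backupuser --password=secret /backups/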

On this very same screen, you can also create a backup schedule that allows you to run the backup at a set interval, for instance, during off-peak hours.

severalnines-blogpost-current-schedule.png

Backing up MySQL and Galera

As mentioned in the previous paragraph, you can make MySQL backups using either mysqldump or xtrabackup. Using mysqldump you can make backups of individual schemas or a selected set of schemas while xtrabackup will always make a full backup of your database.

In the Backup Wizard, you can choose which host to run the backup on, where to store the backup files and in which directory, and which specific schemas to include.

severalnines-blogpost-instant-backup.png

If the node you are backing up is receiving (production) traffic, and you are afraid the extra disk writes will become intrusive, it is advised to send the backups to the ClusterControl host. This will cause the backup to stream the files over the network to the ClusterControl host and you have to make sure there is enough space available on this node.

If you choose xtrabackup as the backup method, extra options open up: desync, compression and xtrabackup parallel threads/gzip. The desync option is only applicable for desyncing a node from a Galera cluster.

severalnines-blogpost-backup-xtrabackup.png

After scheduling an instant backup you can keep track of the progress of the backup job in the Settings > Cluster Jobs. After it has finished, you should be able to see the backup file in the configured location.

severalnines-blogpost-cluster-jobs-backup.png

Backing up PostgreSQL

Similar to the instant backups of MySQL, you can run a backup on your Postgres database. With Postgres backups there are fewer options to fill in, as there is only one backup method: pg_dump.

severalnines-blogpost-backup-postgresql.png
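
Behind the scenes this comes down to a pg_dump invocation along these lines (the database name, user and output path are assumptions, not the exact command ClusterControl runs):

pg_dump -U postgres -Fc mydb > /backups/mydb.dump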

Backing up MongoDB

Similar to PostgreSQL, there is only one backup method: mongodump. In contrast to PostgreSQL, the node that we take the backup from can be desynced in the case of MongoDB.

severalnines-blogpost-mongodb-backup.png
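
The equivalent manual command would be roughly the following (host, port and output directory are assumptions, not the exact command ClusterControl runs):

mongodump --host 10.10.11.11 --port 27017 --out /backups/mongodump-$(date +%F)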

Scheduling backups

Now that we have played around with creating instant backups, we can extend that by scheduling the backups.
Scheduling is very easy to do: you can select on which days the backup has to be made and at what time it needs to run.

For xtrabackup there is an additional feature: incremental backups. An incremental backup will only back up the data that changed since the last backup. Of course, incremental backups are useless without a full backup as a starting point. Between two full backups you can have as many incremental backups as you like, but restoring them will take longer.
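
Conceptually, an xtrabackup incremental chain looks like the sketch below (the paths and the timestamped directory name are assumptions; ClusterControl keeps track of the base backup for you):

# full backup as the starting point
innobackupex /backups/full/

# incremental backup containing only the changes since the full backup
innobackupex --incremental /backups/inc1/ --incremental-basedir=/backups/full/2015-11-01_03-00-00/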

Once scheduled the job(s) should become visible under the “Current Backup Schedule” and you can edit them by double clicking on them. Like with the instant backups, these jobs will schedule the creation of a backup and you can keep track of the progress via the Cluster Jobs overview if necessary.

Backup reports

You can find the Backup Reports under ClusterControl > Backup and this will give you a cluster level overview of all backups made. Also from this interface you can directly restore a backup to a host in the master-slave setup or an entire Galera cluster. 

severalnines-blogpost-backup-reports.png

A nice feature of ClusterControl is that it is able to restore a node/cluster using the full+incremental backups: it keeps track of the last (full) backup made and starts the incremental backups from there. It then groups a full backup together with all incremental backups up to the next full backup. This allows you to restore starting from the full backup and apply the incremental backups on top of it.

Offsite backup in Amazon S3 or Glacier

Since we now have a lot of backups stored on either the database hosts or the ClusterControl host, we also want to ensure they don’t get lost in case we face a total infrastructure outage (e.g. a data center on fire or flooded). Therefore ClusterControl allows you to copy your backups offsite to Amazon S3 or Glacier.

To enable offsite backups with Amazon, you need to add your AWS credentials and keypair in the Service Providers dialogue (Settings > Service Providers).

several-nines-blogpost-aws-credentials.png

Once set up, you are able to copy your backups offsite:

severalnines-blogpost-upload-backups-aws-s3-glacier.png

This process will take some time, as the backup will be sent encrypted and the Glacier service, in contrast to S3, is not a fast storage solution.

After copying your backup to Amazon S3 or Glacier you can get them back easily by selecting the backup in the S3/Glacier tab and click on retrieve. You can also remove existing backups from Amazon S3 and Glacier here.

An alternative to Amazon S3 or Glacier would be to send your backups to another data center (if available). You can do this with a sync tool like BitTorrent Sync. We wrote a blog article on how to set up BitTorrent Sync for backups within ClusterControl.

Final thoughts

We showed you how to get your data backed up and how to store the backups safely offsite. Recovery is always a different matter. ClusterControl can automatically recover your databases from the backups made in the past that are stored on premises or copied back from S3 or Glacier. Recovering from backups that have been moved to any other offsite storage will involve manual intervention though.
 
Obviously there is more to securing your data, especially on the side of securing your connections. We will cover this in the next blog post!


Become a ClusterControl DBA: Managing your Database Configurations


In the past five posts of the blog series, we covered deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale and in the last post, how to prepare yourself for disasters by scheduling backups.

With ClusterControl 1.2.11, we made major enhancements to the database configuration manager. The new version allows changing of parameters on multiple database hosts at the same time and, if possible, changing their values at runtime.

We featured the new MySQL Configuration Management in a Tips & Tricks blog post, but this blog post will go more in depth and cover Configuration Management within ClusterControl for MySQL, PostgreSQL and MongoDB.

ClusterControl Configuration Management

The configuration management interface can be found under Manage > Configurations. From here, you can view or change the configurations of your database nodes and other tools that ClusterControl manages. ClusterControl will import the latest configuration from all nodes and overwrite previous copies made. Currently there is no historical data kept.

If you’d rather edit the config files manually, directly on the nodes, you can re-import the altered configuration by pressing the Import button.

And last but not least: you can create or edit configuration templates. These templates are used whenever you deploy new nodes in your cluster. Of course, any changes made to the templates will not be retroactively applied to the nodes that were already deployed using these templates.

MySQL Configuration Management

As previously mentioned, the MySQL configuration management got a complete overhaul in ClusterControl 1.2.11. The interface is now more intuitive. When changing a parameter, ClusterControl checks whether the parameter actually exists. This ensures MySQL will not refuse to start because of parameters that don’t exist in your configuration.

From Manage -> Configurations, you will find an overview of all config files used within the selected cluster, including MaxScale nodes.

We use a tree structure to easily view hosts and their respective configuration files. At the bottom of the tree, you will find the configuration templates available for this cluster.

Changing parameters

Suppose we need to change a simple parameter like the maximum number of allowed connections (max_connections), we can simply change this parameter at runtime.

First select the hosts to apply this change to.

Then select the section you want to change. In most cases, you will want to change the MYSQLD section. If you would like to change the default character set for MySQL, you will have to change that in both MYSQLD and client sections.

If necessary you can also create a new section by simply typing the new section name. This will create a new section in the my.cnf.

Once we change a parameter and set its new value by pressing “proceed”, ClusterControl will check whether the parameter exists for this version of MySQL. This is to prevent non-existent parameters from blocking the initialization of MySQL on the next restart.

When we press “proceed” for the max_connections change, we will receive a confirmation that it has been applied to the configuration and set at runtime using SET GLOBAL. A restart is not required as max_connections is a parameter we can change at runtime.
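
What happens under the hood is essentially the following (the value of 500 is just an example); besides the runtime change, the new value is also persisted in the configuration file so it survives a restart:

SET GLOBAL max_connections = 500;
SHOW GLOBAL VARIABLES LIKE 'max_connections';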

Now suppose we want to change the buffer pool size; this would require a restart of MySQL before it takes effect:

And as expected the value has been changed in the configuration file, but a restart is required. You can do this by logging into the host manually and restarting the MySQL process. Another way to do this from ClusterControl is by using the Nodes dashboard.

Restarting nodes in a Galera cluster

You can perform a restart per node by selecting “Shutdown Node” and pressing the “Execute” button.

This will stop MySQL on the host, but depending on your workload and buffer pool size this could take a while, as MySQL will start flushing the dirty pages from the InnoDB buffer pool to disk. These are the pages that have been modified in memory but not yet on disk.

Once the host has stopped MySQL the “Start Node” button should become available:

Make sure you leave the “initial” checkbox unchecked in the confirmation:

When you select “initial start” on a Galera node, ClusterControl will empty the MySQL data directory and force a full copy this way. This is, obviously, unnecessary for a configuration change.

Restarting nodes in MySQL master-slave topologies

For MySQL master-slave topologies you can’t just restart node by node. Unless downtime of the master is acceptable, you will have to apply the configuration changes to the slaves first and then promote a slave to become the new master.

You can go through the slaves one by one and execute a “Shutdown node” on them and once MySQL has stopped execute the “Start node” again. Again make sure you leave the “initial” checkbox unchecked in the confirmation:

Just like the “Start Node” with Galera clusters, “initial start” will delete the MySQL data directory and copy the data from the master.

After applying the changes to all slaves, promote a slave to become the new master:

After the slave has become the new master, you can shut down and start the old master node to apply the change.

Importing configurations

Now that we have applied the change directly on the database, as well as the configuration file, it will take until the next configuration import to see the change reflected in the configuration stored in ClusterControl. If you are less patient, you can schedule an immediate configuration import by pressing the “Import” button.

PostgreSQL Configuration Management

For PostgreSQL, the Configuration Management works a bit differently from the MySQL Configuration Management. In general, you have the same functionality here: change the configuration, import configurations for all nodes and define/alter templates.

The difference here is that you can immediately change the whole configuration file and write this configuration back to the database node.

If the changes made require a restart, a “Restart” button will appear that allows you to restart the node to apply the changes.

MongoDB Configuration Management

The MongoDB Configuration Management works similar to the PostgreSQL Configuration Management: you can change the configuration, import configurations for all nodes and alter templates.

Changing the configuration is, just like PostgreSQL, altering the whole configuration:

The biggest difference for MongoDB is that there are four configuration templates predefined:

The reason for this is that we support different types of MongoDB clusters, and this gets reflected in the cluster configurations.

Final thoughts

In this blog post we learned how to manage, alter and template your configurations in ClusterControl. Changing the templates can save you a lot of time when you have deployed only one node in your topology so far: the template will be used for the new nodes, saving you from altering all configurations afterwards. However, for MySQL-based nodes, changing the configuration on all nodes has become trivial thanks to the new Configuration Management interface.

As a reminder, we recently covered in the same series deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale and in the last post, how to prepare yourself for disasters by scheduling backups.


Become a ClusterControl DBA: Managing your logfiles


Earlier in the blog series, we touched upon deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale, how to prepare yourself for disasters by scheduling backups and in the last post how to manage your database configuration files where we described the new configuration management interface that got introduced in ClusterControl 1.2.11.

Another enhancement in ClusterControl 1.2.11 is the addition of system log files. Instead of having to log into each and every node in a cluster, you can now conveniently browse and read the mysqld and mongod log files of every node from within ClusterControl.

Today’s blog post will cover the ClusterControl log section with all the tools available in ClusterControl and how to use them to your benefit. We will also cover how to grab all the necessary log files when troubleshooting issues together with the Severalnines support team.

Cluster Jobs

The Cluster Jobs section contains the output of the various jobs that are run on a cluster. You can find the cluster-specific jobs under Cluster > Logs > Cluster Jobs. The output of a job is, in a certain sense, just a log file detailing the steps executed in that job. Normally you would have no need to look at the output of these jobs, but should a certain job not succeed, then this is the first place to look for clues.

In this overview you can immediately see all jobs and their status. For instance here you can see that a backup is currently running on 10.10.11.11.

We can also spot a failed job. If we want to know why it failed, we can click on the entry and get the job output in the view below.

In the job details, we can look at the exit code of each step to trace back to the beginning of the problem. In this case, the first entry with an exit code of 1 is the ssh command to the new host. Apparently the CMON controller is unable to establish an ssh session to the new host and this is something we can resolve.

CMON Log files

The next place to look is the CMON log files. You can find them under Cluster > Logs > CMON Logs. Here you will find the log entries of all scheduled jobs CMON is running, like crons and reports. Any failure of nodes or cluster degradation can also be found here. So, for instance, if a node in your cluster is down, this is the place to look for hints.

The example above shows error entries about one node in the cluster that cannot be reached, as well as informative lines telling you that the cluster has 1 dead node and 2 nodes that are alive.

You can sort and filter the log entries as well.

MySQL log files

As mentioned earlier, we added the collection of the MySQL log files in ClusterControl 1.2.11. The files included are the MySQL error log and the innobackup backup and restore log files. You can find them under Cluster > Logs > System Logs.

All log files are being collected by ClusterControl every 30 minutes and you can check the “Last Updated” time at the bottom of the overview. If you are in immediate need of the log files you can push the “Refresh Logs” button to trigger a job in ClusterControl to collect the latest lines from the log files.

Also if you wish to have the log files collected more (or less) often, you can change this in Cluster > Settings > General Settings or change this in the cluster configuration file directly and reload the CMON service.

The MySQL error log can be very helpful to find and resolve issues within your cluster. We published a blog post about the ins and outs of the MySQL error log a few weeks ago.

Next to the MySQL error log, we also provide the innobackup backup and restore logs. These log files are created by the process that provides a node with the data from its master (or SST from another node in Galera’s case). If anything goes wrong during loading the data, these log files will give you a good clue about what went wrong.

To give an example, suppose we are forcing an SST in Galera and this fails. Firstly we can find the failed SST error in the MySQL error log:

As you can see, first 10.10.11.12 gets selected as a donor, the MySQL data directory gets emptied and then the data is transferred. So the next step would be to check the innobackup backup log on the donor:

We can see that innobackupex made an attempt to make a backup but failed to connect to MySQL. It used the root account and password in this case, so this indicates the stored credentials for the SST (wsrep_sst_auth) are invalid. In this case, it is quite obvious why it failed. But in less obvious cases, these log files are a great help in resolving an issue.

MongoDB log files

Just as described above, the MongoDB log files are collected by ClusterControl. You can find them under Cluster > Logs > System Logs.

All log files are being collected by ClusterControl every 30 minutes and you can check the “Last Updated” time at the bottom of the overview. If you are in immediate need of the log files you can push the “Refresh Logs” button to trigger a job in ClusterControl to collect the latest lines from the log files.

Error reports

If you are not able to resolve your issues using the log files as described above and would like us to have a look, it is always handy to include an error report. You can find this under Cluster > Logs > Error reports. The error report is basically a tarball that contains a collection of log files, job lists and job details from the cluster.

You can create a job that will generate an error report by clicking on the “Create Error Report” button in the interface. This will give you a dialogue that asks whether you want to store the report on the web server or not. If you store the reports on the web server, you can download the report once the job has succeeded. Otherwise you can specify the location on the ClusterControl node where you want the report to be stored.

You can attach this report to the support ticket you are creating, so we have all the information at hand.

Final thoughts

With the combined insights you can retrieve from the cluster jobs, CMON logs and system log files, you should be able to narrow down issues more easily. Combine that insight with the knowledge from our blog post on the MySQL error log, and this should help you not only identify the issue but also resolve it yourself.


ClusterControl Developer Studio: write your first database advisor


Did you ever wonder what triggers the advice in ClusterControl that your disk is filling up? Or the advice to create primary keys on InnoDB tables if they don’t exist? These advisors are mini scripts written in the ClusterControl Domain Specific Language (DSL) that is a Javascript-like language. These scripts can be written, compiled, saved, executed and scheduled in ClusterControl. That is what the ClusterControl Developer Studio blog series will be about.

Today we will cover the Developer Studio basics and show you how to create your very first advisor where we will pick two status variables and give advice about their outcome.

The advisors

Advisors are mini scripts that are executed by ClusterControl, either on-demand or after a schedule. They can be anything from simple configuration advice, warning on thresholds or more complex rules for predictions or cluster-wide automation tasks based on the state of your servers or databases. In general, advisors perform more detailed analysis, and produce more comprehensive recommendations than alerts.

The advisors are stored inside the ClusterControl database and you can add new or alter/modify existing advisors. We also have an advisor Github repository where you can share your advisors with us and other ClusterControl users.

The language used for the advisors is the so-called ClusterControl DSL and is an easy-to-comprehend language. The semantics of the language can best be compared to Javascript, with a couple of differences; the most important ones are:

  • Semicolons are mandatory.
  • There are various numeric data types, like integers and unsigned long long integers.
  • Arrays are two-dimensional, and single-dimensional arrays are lists.

You can find the full list of differences in the ClusterControl DSL reference.

The Developer Studio interface

The Developer Studio interface can be found under Cluster > Manage > Developer Studio. This will open an interface like this:

Advisors

The advisors button will generate an overview of all advisors with their output since the last time they ran:

You can also see the schedule of each advisor in crontab format and the date/time of the last update. Some advisors are scheduled to run only once a day, so their advice may no longer reflect reality, for instance if you already resolved the issue you were warned about. You can manually re-run an advisor by selecting it and running it; go to the “compile and run” section to read how to do this.

Importing advisors

The Import button will allow you to import a tarball with new advisors in them. The tarball has to be created relative to the main path of the advisors, so if you wish to upload a new version of the MySQL query cache size script (s9s/mysql/query_cache/qc_size.js) you will have to make the tarball starting from the s9s directory.

By default the import will create all (sub)folders of the import but not overwrite any of the existing advisors. If you wish to overwrite them you have to select the “Overwrite existing files” checkbox.

Exporting advisors

You can export the advisors or a part of them by selecting a node in the tree and pressing the Export button. This will create a tarball with the files in the full path of the structure presented. Suppose we wish to make a backup of the s9s/mysql advisors prior to making a change, we simply select the s9s/mysql node in the tree and press Export:

Note: make sure the s9s directory is present in /home/myuser/.

This will create a tarball called /home/myuser/s9s/mysql.tar.gz with an internal directory structure s9s/mysql/*

Creating a new advisor

Since we have covered exports and imports, we can now start experimenting. So let’s create a new advisor! Click on the New button to get the following dialogue:

In this dialogue, you can create your new advisor with either an empty file or pre fill it with the Galera or MySQL specific template. Both templates will add the necessary includes (common/mysql_helper.js) and the basics to retrieve the Galera or MySQL nodes and loop over them.

Creating a new advisor with the Galera template looks like this:

#include "common/mysql_helper.js"

Here you can see that the mysql_helper.js gets included to provide the basis for connecting and querying MySQL nodes.

var WARNING_THRESHOLD=0;
…
if(threshold > WARNING_THRESHOLD)

The warning threshold is currently set to 0, meaning if the measured threshold is greater than the warning threshold, the advisor should warn the user. Note that the variable threshold is not set/used in the template yet as it is a kickstart for your own advisor.

var hosts     = cluster::Hosts();
var hosts     = cluster::mySqlNodes();
var hosts     = cluster::galeraNodes();

The statements above will fetch the hosts in the cluster and you can use them to loop over the hosts. The difference between them is that the first statement also includes non-MySQL hosts (such as the CMON host), the second includes all MySQL hosts and the last one only the Galera hosts. So if your Galera cluster has MySQL asynchronous read slaves attached, those hosts will not be included by the last statement.

Other than that these objects will all behave the same and feature the ability to read their variables, status and query against them.

Advisor buttons

Now that we have created a new advisor, there are six new buttons available for this advisor:

Save will save your latest modifications to the advisor (stored in the CMON database), Move will move the advisor to a new path and Remove will obviously remove the advisor.

More interesting is the second row of buttons. Compiling the advisor will compile the code of the advisor. If the code compiles fine, you will see this message in the Messages dialogue below the code of the advisor:

If the compilation fails, the compiler will give you a hint about where it failed:

In this case the compiler indicates a syntax error was found on line 24.

The compile and run button will not only compile the script but also execute it and its output will be shown in the Messages, Graph or Raw dialogue. If we compile and run the table cache script from the auto_tuners, we will get output similar to this:

Last button is the schedule button. This allows you to schedule (or unschedule) your advisors and add tags to it. We will cover this at the end of this post when we have created our very own advisor and want to schedule it.

My first advisor

Now that we have covered the basics of the ClusterControl Developer Studio, we can finally start to create a new advisor. As an example, we will create an advisor that looks at the temporary table ratio. Create a new advisor as follows:

The theory behind the advisor we are going to create is simple: we will compare the number of temporary tables created on disk against the total number of temporary tables created:

tmp_disk_table_ratio = Created_tmp_disk_tables / (Created_tmp_tables + Created_tmp_disk_tables) * 100;

First we need to set some basics in the head of the script, like the thresholds and the warning and ok messages. All changes and additions have been marked in bold:

var WARNING_THRESHOLD=20;
var TITLE="Temporary tables on disk ratio";
var ADVICE_WARNING="More than 20% of temporary tables are written to disk. It is advised to review your queries, for example, via the Query Monitor.";
var ADVICE_OK="Temporary tables on disk are not excessive." ;

We set the threshold here to 20 percent which is considered to be pretty bad already. But more on that topic once we have finalised our advisor.

Next we need to get these status variables from MySQL. Before we jump to conclusions and execute some “SHOW GLOBAL STATUS LIKE ‘Created_tmp_%’” query, there is already a function to retrieve the status variable of a MySQL instance:

statusVar = readStatusVariable(host, <statusvariablename>);

We can use this function in our advisor to fetch the Created_tmp_disk_tables and Created_tmp_tables.

for (idx = 0; idx < hosts.size(); ++idx)
{
   host        = hosts[idx];
   map         = host.toMap();
   connected     = map["connected"];
   var advice = new CmonAdvice();
   var tmp_tables = readStatusVariable(host, 'Created_tmp_tables');
   var tmp_disk_tables = readStatusVariable(host, 'Created_tmp_disk_tables');

And now we can calculate the temporary disk tables ratio:

var tmp_disk_table_ratio = tmp_disk_tables / (tmp_tables + tmp_disk_tables) * 100;

And alert if this ratio is greater than the threshold we set in the beginning:

if(checkPrecond(host))
{
   if(tmp_disk_table_ratio > WARNING_THRESHOLD) {
      advice.setJustification("Temporary tables written to disk is excessive");
      msg = ADVICE_WARNING;
   }
   else {
      advice.setJustification("Temporary tables written to disk not excessive");
      msg = ADVICE_OK;
   }
}

It is important to assign the Advice to the msg variable here as this will be added later on into the advice object with the setAdvice function. The full script for completeness:

#include "common/mysql_helper.js"

/**
* Checks the ratio of temporary tables written to disk
*
*/
var WARNING_THRESHOLD=20;
var TITLE="Temporary tables on disk ratio";
var ADVICE_WARNING="More than 20% of temporary tables are written to disk. It is advised to review your queries, for example, via the Query Monitor.";
var ADVICE_OK="Temporary tables on disk are not excessive.";

function main()
{
   var hosts     = cluster::mySqlNodes();
   var advisorMap = {};

   for (idx = 0; idx < hosts.size(); ++idx)
   {
       host        = hosts[idx];
       map         = host.toMap();
       connected     = map["connected"];
       var advice = new CmonAdvice();
       var tmp_tables = readStatusVariable(host, 'Created_tmp_tables');
       var tmp_disk_tables = readStatusVariable(host, 'Created_tmp_disk_tables');
       var tmp_disk_table_ratio = tmp_disk_tables / (tmp_tables + tmp_disk_tables) * 100;
       
       if(!connected)
           continue;

       if(checkPrecond(host))
       {
          if(tmp_disk_table_ratio > WARNING_THRESHOLD) {
              advice.setJustification("Temporary tables written to disk is excessive");
              msg = ADVICE_WARNING;
              advice.setSeverity(0);
          }
          else {
              advice.setJustification("Temporary tables written to disk not excessive");
              msg = ADVICE_OK;
          }
       }
       else
       {
           msg = "Not enough data to calculate";
           advice.setJustification("there is not enough load on the server or the uptime is too little.");
           advice.setSeverity(0);
       }

       advice.setHost(host);
       advice.setTitle(TITLE);
       advice.setAdvice(msg);
       advisorMap[idx]= advice;
   }

   return advisorMap;
}

Now you can play around with the threshold of 20, try to lower it to 1 or 2 for instance and then you probably can see how this advisor will actually give you advice on the matter.

As you can see, with a simple script you can check two variables against each other and report/advice based upon their outcome. But is that all? There are still a couple of things we can improve!

Improvements on my first advisor

The first thing we can improve is that this advisor, in its current form, doesn’t make a lot of sense. What the metric actually reflects is the total number of temporary tables on disk since the last FLUSH STATUS or startup of MySQL. What it doesn’t say is at what rate temporary tables are actually created on disk. So we can convert Created_tmp_disk_tables to a rate using the uptime of the host:

var tmp_disk_table_rate = tmp_disk_tables / uptime;

This should give us the number of temporary tables created on disk per second, and combined with the tmp_disk_table_ratio, this will give us a more accurate view on things. Again, we don’t want to immediately send out an alert/advice the moment we cross the threshold of two temporary tables per second; the ratio needs to be too high as well.

Another thing we can improve is to not use the readStatusVariable function from the mysql_helper.js library. This function executes a query to the MySQL host every time we read a status variable, while CMON already retrieves most of them every second and we don’t need a real-time status anyway. It’s not like two or three queries will kill the hosts in the cluster, but if many of these advisors are run in a similar fashion, this could create heaps of extra queries.

In this case we can optimize by retrieving everything at once as a map using the host.sqlInfo() function. This function returns the most important information of the host, but it does not contain everything. For instance, the uptime variable that we need for the rate is not available in the host.sqlInfo() map and has to be retrieved with the readStatusVariable function.

This is what our advisor will look like now, with the changes/additions marked in bold:

#include "common/mysql_helper.js"

/**
* Checks the ratio and rate of temporary tables written to disk
*
*/
var RATIO_WARNING_THRESHOLD=20;
var RATE_WARNING_THRESHOLD=2;
var TITLE="Temporary tables on disk ratio";
var ADVICE_WARNING="More than 20% of temporary tables are written to disk and current rate is more than 2 temporary tables per second. It is advised to review your queries, for example, via the Query Monitor.";
var ADVICE_OK="Temporary tables on disk are not excessive.";

function main()
{
   var hosts     = cluster::mySqlNodes();
   var advisorMap = {};

   for (idx = 0; idx < hosts.size(); ++idx)
   {
       host        = hosts[idx];
       map         = host.toMap();
       connected     = map["connected"];
       var advice = new CmonAdvice();
       var hostStatus = host.sqlInfo();
       var tmp_tables = hostStatus['CREATED_TMP_TABLES'];
       var tmp_disk_tables = hostStatus['CREATED_TMP_DISK_TABLES'];
       var uptime = readStatusVariable(host, 'uptime');
       var tmp_disk_table_ratio = tmp_disk_tables / (tmp_tables + tmp_disk_tables) * 100;
       var tmp_disk_table_rate = tmp_disk_tables / uptime;

       if(!connected)
           continue;

       if(checkPrecond(host))
       {
          if(tmp_disk_table_rate > RATE_WARNING_THRESHOLD && tmp_disk_table_ratio > RATIO_WARNING_THRESHOLD) {
              advice.setJustification("Temporary tables written to disk is excessive: " + tmp_disk_table_rate + " tables per second and overall ratio of " + tmp_disk_table_ratio);
              msg = ADVICE_WARNING;
              advice.setSeverity(0);
          }
          else {
              advice.setJustification("Temporary tables written to disk not excessive");
              msg = ADVICE_OK;
          }
       }
       else
       {
           msg = "Not enough data to calculate";
           advice.setJustification("there is not enough load on the server or the uptime is too little.");
           advice.setSeverity(0);
       }

       advice.setHost(host);
       advice.setTitle(TITLE);
       advice.setAdvice(msg);
       advisorMap[idx]= advice;
   }

   return advisorMap;
}

Scheduling my first advisor

After we have saved this new advisor, compiled it and run it, we can schedule it. Since we don’t have an excessive workload, we will probably run this advisor once per day.

The base scheduling mode has presets for every minute, 5 minutes, hour, day and month, and this is exactly what we need. Changing this to advanced will unlock the other, greyed out input fields. These input fields work exactly the same as a crontab, so you can even schedule for a particular day, day of the month or even set it to run on weekdays only.
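
For example, a schedule that runs an advisor at 03:00 on weekdays only would, in crontab notation, look like this (the time is an arbitrary choice):

# minute hour day-of-month month day-of-week
0 3 * * 1-5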


ClusterControl Developer Studio: creating advisors using the MySQL performance schema


In the previous blog post, we gave a brief introduction to ClusterControl Developer Studio and the ClusterControl Domain Specific Language that you can use to write your own scripts, advisors and alerts. We also showed how, with just 12 lines of code, you can create an advisor by using one of the ready made templates combined with existing database metrics collected by ClusterControl.

Today we are going one step further: we are going to retrieve data from the MySQL Performance Schema and use this in our advisor.

MySQL Performance Schema

The performance schema is basically a schema that exists in your MySQL instance. It is a special type of schema, as it uses the so-called PERFORMANCE_SCHEMA storage engine that interfaces with the performance data of your MySQL instance. This allows you to query for specific performance metrics and/or events that happen inside MySQL without locking up your database (e.g. SHOW ENGINE INNODB STATUS locks for a fraction of a second). The performance schema does have a drawback: it comes at the cost of the performance of your database, with an average penalty of 5% to 10%. Given the extra insights you get from the database engine, and the fact that you no longer need to lock the InnoDB engine every few seconds, this should actually pay off in the long run.

One of the prerequisites for our advisor is that we need to have the MySQL performance schema enabled, otherwise our queries will return empty results as the schema is empty. You can check if the performance schema is enabled on your host by executing the following command:

SHOW VARIABLES LIKE 'performance_schema';

If your host did not have the performance schema enabled you can do this by adding or modifying the following line in the [mysqld] section of your MySQL configuration:

performance_schema = 1

This obviously requires a restart of MySQL to become effective.

Available Performance Schema advisors in ClusterControl

ClusterControl ships, by default, a couple of advisors using the Performance Schema:

  • Processlist
  • Top accessed DB files
  • Top queries
  • Top tables by IOWait
  • Top tables by Lockwait

You can run and schedule these advisors if you have the Performance Schema enabled and they will give you great insight in what is happening inside your cluster. The source code can be found on our github page.

Adding custom advisors

Once the performance schema has been enabled, you should see data inside the performance_schema starting to accumulate. One of the tables we are interested in for this article is the one that keeps information about the indexes per table, per schema: table_io_waits_summary_by_index_usage

The information in this table is pretty interesting if you can keep track of the previous state, take the delta between the two and see the increase/decrease in efficiency. Most interesting metrics in this table are:

  • COUNT_STAR: total number of all IO operations (read + write)
  • COUNT_READ: total number of all read IO operations
  • COUNT_WRITE: total number of all write IO operations
  • AVG_TIMER_*: average time taken

However, as the advisors only run infrequently and do not keep previous state, this is not possible at the moment of writing. Therefore we will cover two advisors: unused indexes and queries that are not using indexes.

Unused indexes

We can extract different information from this table than just timers: we can find indexes that have not been touched as they did not receive any read, write or delete operations at all. The query to extract this information would be:

SELECT object_schema AS schema, object_name AS table, index_name AS index FROM performance_schema.table_io_waits_summary_by_index_usage WHERE index_name IS NOT NULL AND count_star = 0 AND object_schema != 'mysql' AND index_name != 'PRIMARY'

With this information we can run a daily advisor that extracts this information and advises you to look at this index and drop it if necessary. So in principle we simply have to loop over all hosts in the cluster and run this query.

Our new advisor will be based on the Generic MySQL template. So if you create the advisor please select the following:

Querying

At the top of our new advisor, we simply define our query since we don’t need any host specific information:

var query="SELECT `object_schema` AS `schema`, `object_name` AS `table`, `index_name` AS `index`""FROM `performance_schema`.`table_io_waits_summary_by_index_usage` ""WHERE `index_name` IS NOT NULL ""AND `count_star` = 0 ""AND `object_schema` != 'mysql'""AND `index_name` != 'PRIMARY'";

Now inside the host loop we first have to check if the Performance Schema is enabled, otherwise the query will fail and the advisor will output an error:

if (!readVariable(host, "performance_schema").toBoolean())
{
   print(host, ": performance_schema is not enabled.");
   continue;
}

Also inside the host loop, we run our query on each host:

result = getValueMap(host, query);

The function getValueMap is a predefined function in the mysql_helper.js that is included by default if you create a new MySQL or Galera advisor. This function will accept host and query as parameters and returns a map (array) per row with its columns. So, for example, if we wish to print the second column of the first row, we should do the following:

print(result[0][1]);

The getValueMap function returns false if there are no rows found, so we have to cover the case of all indexes being used.

if (result == false)
{
  msg = concatenate(msg, "No unused indexes found on this host.");
  advice.setJustification(ADVICE_OK);
  advice.setSeverity(Ok);
}

The message in the msg variable will be printed later as we may have to include additional information. As the message will be used in the advisors page, we have included HTML formatting here. We also have to set the right severity and justification per host whether there is an unused index present.

for (i=0; i<result.size(); ++i)
{
    msg = concatenate(msg, "Unused index found on table ", result[i][0], ".", result[i][1], ": index ", result[i][2], 
" can be dropped.<br/><br/>");
}
advice.setJustification(ADVICE_WARNING);
advice.setSeverity(Warning);

In the case of an unused index, it will output a message like this:
Unused index found on table sbtest.sbtest1: index k_1 can be dropped.

Scheduling

Now, if we schedule it to run once per day, this advisor will run daily and become available on the Advisors page:

And after it has run successfully, we can see that it does work as expected:

Unused indexes - The complete advisor script

#include "common/mysql_helper.js"

/**
 * Checks the index usage and warns if there are unused indexes present
 * 
 */ 
var TITLE="Unused indexes";
var ADVICE_WARNING="Unused indexes have been found in your cluster. It is advised to drop them.";
var ADVICE_OK="No unused indexes found.";

var query="SELECT `object_schema` AS `schema`, `object_name` AS `table`, `index_name` AS `index`""FROM `performance_schema`.`table_io_waits_summary_by_index_usage` ""WHERE `index_name` IS NOT NULL ""AND `count_star` = 0 ""AND `object_schema` != 'mysql'""AND `index_name` != 'PRIMARY'";

function main()
{
    var hosts     = cluster::mySqlNodes();
    var advisorMap = {};

    for (idx = 0; idx < hosts.size(); ++idx)
    {
        host        = hosts[idx];
        map         = host.toMap();
        connected     = map["connected"];
        var advice = new CmonAdvice();

        if(!connected)
            continue;
        if (!readVariable(host, "performance_schema").toBoolean())
        {
            print(host, ": performance_schema is not enabled.");
            continue;
        }
        result = getValueMap(host, query);
        msg = concatenate("Server: ", host, "<br/>");
        msg = concatenate(msg, "------------------------<br/>");
        if (result == false)
        {
            msg = concatenate(msg, "No unused indexes found on this host.");
            advice.setJustification(ADVICE_OK);
            advice.setSeverity(Ok);
        }
        else
        {
            for (i=0; i<result.size(); ++i)
            {
                msg = concatenate(msg, "Unused index found on table ", result[i][0], ".", result[i][1], ": index ", result[i][2], " can be dropped.<br/><br/>");
            }
            advice.setJustification(ADVICE_WARNING);
            advice.setSeverity(Warning);
        }
        
        print(msg);
        advice.setHost(host);
        advice.setTitle(TITLE);
        advice.setAdvice(msg);
        advisorMap[idx]= advice;
    }
    return advisorMap;
}

Tables with no indexes used

Just like extracting the unused indexes from the Performance Schema, we can also find which tables have been accessed without using indexes.

SELECT `object_schema`, `object_name`, `count_star`, `count_read`, `count_write`, `count_delete` FROM performance_schema.table_io_waits_summary_by_index_usage WHERE index_name IS NULL AND count_star > 0 AND object_schema != 'mysql'

With this, we can run a daily advisor that extracts this information and advises you to look at the tables that have been queried without using an index. There could be various reasons why an index has not been used, for instance because the update query uses columns that have not been covered by any index.

Our new advisor will be based on the Generic MySQL template, but you could as well copy the previous advisor and adapt it.

Querying

At the top of our new advisor, we define again our query since we don’t need any host specific information:

var query = "SELECT `object_schema`, `object_name`, `count_star`, `count_read`, `count_write`, `count_delete` FROM performance_schema.table_io_waits_summary_by_index_usage WHERE index_name IS NULL AND count_star > 0 AND object_schema != 'mysql'";

The code for this advisor will be almost entirely similar to the unused indexes example as we will yet again loop over all hosts in the cluster and check on the indexes. The major difference is the way we interpret and show the data:

for (i=0; i<result.size(); ++i)
{
    msg = concatenate(msg, "Table has been queried without using indexes: ", result[i][0], ".", result[i][1], " with a total of ", result[i][2], " IO operations (", result[i][3], " Read / ", result[i][4]," Write / ", result[i][5], " Delete)<br/><br/>");
}
advice.setJustification(ADVICE_WARNING);
advice.setSeverity(Warning);

Keep in mind that the information returned by this query does not tell us what the exact cause is; it is just an indicator that there is an inefficient table with a given number of I/O operations, broken down into read, write and delete operations. Further investigation is necessary.

Scheduling

Similarly to the Unused Indexes advisor, we would schedule this advisor once a day and make it available on the Advisors page:

And after it has run successfully, we can see that it does work as expected:

Complete advisor script - Tables with no indexes used

#include "common/mysql_helper.js"

/**
 * Checks table access and warns if tables are queried without using an index
 * 
 */ 
var TITLE="Table access without using index";
var ADVICE_WARNING="There has been access to tables without using an index. Please investigate queries using these tables using a query profiler.";
var ADVICE_OK="All tables have been accessed using indexes.";

var query = "SELECT `object_schema`, `object_name`, `count_star`, `count_read`, ""`count_write`, `count_delete` ""FROM performance_schema.table_io_waits_summary_by_index_usage ""WHERE index_name IS NULL ""AND count_star > 0 ""AND object_schema != 'mysql'";

function main()
{
    var hosts     = cluster::mySqlNodes();
    var advisorMap = {};

    for (idx = 0; idx < hosts.size(); ++idx)
    {
        host        = hosts[idx];
        map         = host.toMap();
        connected     = map["connected"];
        var advice = new CmonAdvice();

        if(!connected)
            continue;
        if (!readVariable(host, "performance_schema").toBoolean())
        {
            print(host, ": performance_schema is not enabled.");
            continue;
        }
        result = getValueMap(host, query);
        msg = concatenate("Server: ", host, "<br/>");
        msg = concatenate(msg, "------------------------<br/>");
        if (result == false)
        {
            msg = concatenate(msg, "No tables have been queried without indexes.");
            advice.setJustification(ADVICE_OK);
            advice.setSeverity(Ok);
        }
        else
        {
            for (i=0; i<result.size(); ++i)
            {
                msg = concatenate(msg, "Table has been queried without using indexes: ", result[i][0], ".", result[i][1], " with a total of ", result[i][2], " IO operations (", result[i][3], " Read / ", result[i][4]," Write / ", result[i][5], " Delete)<br/><br/>");
            }
            advice.setJustification(ADVICE_WARNING);
            advice.setSeverity(Warning);
        }
        
        print(msg);
        advice.setHost(host);
        advice.setTitle(TITLE);
        advice.setAdvice(msg);
        advisorMap[idx]= advice;
    }
    return advisorMap;
}

Advisor repository

All our advisors are freely available through the Advisors Github repository, and the two custom advisors from this post are available there as well. We encourage you to share the advisors you have written yourself, as they may benefit other ClusterControl users too.

If you have a Github account, you can contribute back by forking our repository, committing your changes and creating a pull request for us. If you don’t have a Github account, you can paste a link to your advisor in the comments section of this blog or email it to us.

Syslog Plugin for ClusterControl

Logging is a fundamental tool for the system administrator. It helps identify whether systems are running as configured, and track any unusual activity when trying to diagnose and isolate problems. Without even basic logging, when something unexpected happens, you could experience excessive downtime due to the lack of diagnostic data.

Syslog is a widely used standard for message logging. A centralized syslog server, where you can analyze logs of your infrastructure in one place, can greatly improve your visibility into what is going on. It is possible to have ClusterControl alarms sent to syslog via a syslog plugin. In this blog post, we will show you how to install this plugin. More plugins and integrations can be found on our Tools page.

What is syslog?

In a Linux operating system, the kernel and other internal components generate alerts and messages. These messages are typically stored in a file system or relayed to another device in the form of syslog messages. An internal daemon, called Syslogd, handles the syslog process. Syslog is widely used, and provides a central point for collecting and processing system logs. These system logs are useful later for troubleshooting and auditing.

The Syslog protocol is supported by a wide range of devices and can be used to log different types of events. Syslog is able to:

  • accept inputs from a wide variety of sources
  • transform them
  • and output the results to diverse destinations

Syslog consists of three basic components:

  1. Device: This is the equipment that generates log messages; it could be a server, router, firewall, or any other device that can generate syslog event data.
  2. Collector: This equipment is configured to receive messages generated by a log device and stores these externally generated messages locally.
  3. Relay: Like a collector, it is configured to receive messages, but instead of storing them it forwards them to another device, usually a collector but possibly another relay.

Nowadays, most Linux boxes come with syslog configured for local logging only and will not accept or process messages from outside sources. In a local logging configuration, the host performs both the device and collector roles: messages generated by the system are directed to the syslog process, which routes them to a local destination on the host machine based on the rules set in /etc/syslog.conf.

ClusterControl Syslog Plugin

This plugin writes new alarms to syslog as soon as they are raised; by default this is /var/log/syslog on Debian-based systems or /var/log/messages on RedHat-based systems on the local host. If you have remote syslog configured for central log management, you can leverage this capability to capture ClusterControl alarms and process them. You can then use the information for things like system health checks, alerting, intrusion detection or security audits.
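
If you run rsyslog on the ClusterControl node and already have a central syslog server, a property-based filter can ship just these plugin messages to the collector. Below is a minimal sketch, assuming a hypothetical collector at 192.168.1.100 and that the plugin messages keep the cmon-syslog-plugin program name shown in the examples further down; adjust the address and port to your environment:

# /etc/rsyslog.conf on the ClusterControl node (collector address is hypothetical)
# forward only the ClusterControl plugin messages over TCP (@@), keep everything else local
:programname, isequal, "cmon-syslog-plugin" @@192.168.1.100:514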

To install the plugin, simply perform the following 4 steps on the ClusterControl node:

  1. Create the ClusterControl plugin directory. We will use the default location which is /var/cmon/plugins:

    $ mkdir -p /var/cmon/plugins
  2. Grab the plugin from our s9s-admin Github repository:

    $ git clone https://github.com/severalnines/s9s-admin
  3. Copy it over to the plugin directory:

    $ cp s9s-admin/plugins/syslog/syslog.py /var/cmon/plugins/
  4. Restart CMON to load the plugin:

    $ service cmon restart

That’s it. Now you will get a copy of alarms triggered by ClusterControl in the local syslog. You can verify this with the following command on Debian:

$ tail -f /var/log/syslog
Jan 14 04:01:42 cc-server cmon-syslog-plugin: cluster-1, new alarm: MySQL server disconnected (id=573205676966301289)
Jan 14 04:20:31 cc-server cmon-syslog-plugin: cluster-1, new alarm: SSH failed (id=6970644932146548621)

For RHEL based systems, you can find them at /var/log/messages:

$ tail -f /var/log/messages
Jan 14 04:01:42 cc-server cmon-syslog-plugin: cluster-1, new alarm: MySQL server disconnected (id=573205676966301289)
Jan 14 04:20:31 cc-server cmon-syslog-plugin: cluster-1, new alarm: SSH failed (id=6970644932146548621)

To learn more about how ClusterControl plugins are executed, please refer to this blog post, under the ‘How does the plugin execute’ section. That’s all folks!

Integrating ClusterControl Alarms with Splunk

Splunk is a popular log management tool, and it can serve as a collector for our syslog facility. In this blog, we will show you how to integrate ClusterControl alarms with Splunk.

We are going to forward the alarm event through the ClusterControl syslog plugin, where the rsyslog client will forward it over to Splunk via UDP port 514. We will then create an alert in Splunk based on the specific search term.

**For the purpose of this blog post, we are using UDP as the transport protocol. Note that UDP is a connectionless, unreliable protocol that does not guarantee message delivery. If a message gets lost, neither the log device nor the collector is going to know or care. You can use TCP as an alternative.
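
Should you opt for TCP instead, the rsyslog forwarding line in step 2 below would use a double ‘@@’ prefix rather than a single ‘@’, and the Splunk data input would have to be created as TCP rather than UDP. A hedged example, reusing the same Splunk address and port:

*.* @@192.168.55.119:514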

Here is our simple architecture:

On the ClusterControl node, install rsyslog (if not installed) so we can forward the syslog to Splunk.

  1. Install rsyslog on ClusterControl node:

    $ yum install rsyslog # RHEL/CentOS
    $ apt-get install rsyslog # Debian/Ubuntu
  2. Then append the following line into /etc/rsyslog.conf under “catch-all” log files section (line 94):

    *.* @192.168.55.119:514
  3. Restart syslog to load the new configuration:

    $ systemctl restart syslog
  4. Configure Splunk to listen to incoming syslog on UDP port 514. Go to Splunk > Settings > Data inputs > UDP > New and specify as per below:

    Click ‘Next’. In the ‘Input Settings’, choose syslog as the ‘Source type’ and leave everything else at its default. Then, click ‘Review’ and ‘Submit’. Splunk is now capturing the syslog data sent by the ClusterControl server. If there are new alarms raised by ClusterControl, you can retrieve them directly from Splunk > App > Search & Reporting by searching for “cmon-syslog-plugin”, and you should see something like below:

Creating a Simple Alert with Splunk

From the search result shown above, we can ask Splunk to send an alert if it finds a specific pattern in the incoming syslog. For example, to get an alert after a new alarm is raised when a MySQL server is down, use the following search term:
“cmon-syslog-plugin AND new alarm AND MySQL server disconnected”

You should get the following result (if the alarm is raised by ClusterControl):

Click on Save As > Alert and configure the settings with alert type ‘Real-time’:

Click Save and you will get an alert if the same alarm is raised again.
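
To narrow the alert down further, for example to a single cluster, you can include the cluster identifier that the plugin prepends to every alarm. A hedged example based on the log format shown earlier:

cmon-syslog-plugin AND cluster-1 AND "new alarm" AND "MySQL server disconnected"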

Splunk can also handle various alert actions like email, running a script, webhooks and third-party app integrations like SMS alerting via Twilio, Slack notifications, Twitter and more. The integration steps above are similar if you are running other log management tools like the Logstash stack (with Kibana and ElasticSearch), SumoLogic or Loggly.

That’s it. Happy alerting!

Automate your Database with CCBot: ClusterControl Hubot integration

With our new ClusterControl 1.2.12 release we have added many new features like operational reports, enhanced backup options, SSL Encryption for Galera replication links and improved the support for external tools. One of these tools is CCBot, the ClusterControl chatbot.

CCBot is based on the popular Hubot framework originally created by Github. Github uses Hubot as its DevOps tool of choice, allowing it to do Continuous Integration and Continuous Delivery on its entire infrastructure. So what does Hubot allow you to do?

Hubot

Hubot is a chatbot modelled after Github’s internal bot of the same name. The Hubot framework allows you to quickly create your own bot, extend it with various pre-made scripts and integrate it with many popular chat and messaging services.

Hubot is meant as a tool to help your team automate and operate at scale, like for instance when you are part of a DevOps team. You can give it a command and it will execute that command for you. For instance you could do code deployments, kick off continuous integrations, upgrade schemas using schema control, schedule backups and even scale infrastructure.

Not only is Hubot capable of executing tasks, it can also monitor systems for you. For instance if you create a post-commit hook that interfaces with Hubot you can alert all team members that someone just committed code. Take that one step further and you could even monitor your database servers.

Automating your Database with CCBot

CCBot is the Severalnines integration of ClusterControl in the Hubot framework and therefore supports most of the major chat services like Slack, Flowdock, Hipchat, Campfire, any XMPP based chat service and also IRC. We have tested and verified CCBot to work with Slack, Flowdock, Hipchat and Campfire. CCBot follows the philosophy of Severalnines by implementing the four pillars of ClusterControl: Deploy, Manage, Monitor and Scale.

Monitor and manage

The first release of CCBot covers the Manage and Monitor parts of ClusterControl, meaning CCBot will be able to keep your team up to date on the status of your clusters, jobs and backups. At the same time you can also create impromptu backups, read the last log lines of the MySQL error logs, schedule and create the daily reports.

Installing CCBot

There are two ways to integrate CCBot with Hubot: either as a standalone chatbot that operates from your ClusterControl host, or, if you already have a Hubot-based chatbot in your company, integrated into your existing Hubot framework. The latter may need a few adjustments to your startup script.

Installing CCBot is only a few minutes of work. You can find our repository here:
https://github.com/severalnines/ccbot

Integrate CCBot on an existing Hubot framework

In principle this should be relatively easy as you already have a working Hubot chatbot: copying the source files to your chatbot and adding the CCBot parameters should be sufficient to make it work.

Installing CCBot scripts

Copy the following files from our ccbot repository to your existing Hubot instance in the respective directories:

git clone https://github.com/severalnines/ccbot
cd ccbot
cp -R src/config <hubot rootdir>/
cp -R src/scripts <hubot rootdir>/
cp -R src/utils <hubot rootdir>/

Then add the following parameters in your Hubot startup script if necessary:

export HUBOT_CMONRPC_TOKENS='TOKEN0,TOKEN1,TOKEN2,TOKEN3'
export HUBOT_CMONRPC_HOST='<your clustercontrol host>'
export HUBOT_CMONRPC_PORT=9500
export HUBOT_CMONRPC_MSGROOM='General'

These variables will be picked up by the config.coffee file and used inside the cmonrpc calls.

The HUBOT_CMONRPC_TOKENS variable should contain the RPC tokens set under the rpc_key parameter in the /etc/cmon.cnf and /etc/cmon.d/cmon_<cluster>.cnf configuration files. These tokens are used to secure the CMON RPC API and hence have to be filled in when used.
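
As a hedged illustration (the token values are placeholders), the tokens come from lines like these, with the controller-wide token first and one per-cluster token for each cluster:

# /etc/cmon.cnf
rpc_key=TOKEN0

# /etc/cmon.d/cmon_1.cnf
rpc_key=TOKEN1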

NOTE: Currently, as of 1.2.12, the ClusterControl web application does not support having an RPC token in the cmon.cnf file. If you want to run both CCBot and access the web application at the same time, comment out the RPC token in the cmon.cnf file.

For configuration of the HUBOT_CMONRPC_MSGROOM variable, see below in the standalone installation.

Bind ClusterControl to an external IP address

As of ClusterControl version 1.2.12 there is a change in the binding address of the CMON RPC: by default it binds to localhost (127.0.0.1), so if your existing Hubot chatbot lives on a different host, you need to configure CMON to bind to another IP address as well. You can change this in the cmon default file (/etc/default/cmon):

RPC_PORT=9500
RPC_BIND_ADDRESSES="127.0.0.1,<your ip address>"
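
After changing this, restart CMON so it picks up the new bind addresses, and check that it is listening as expected; a quick sketch, assuming netstat is available on the host:

$ service cmon restart
$ netstat -tulpn | grep 9500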

Install CCBot as a standalone chatbot

Prerequisites

Firstly we need to have the Node.js framework installed. This is best done by installing npm, which should install the necessary Node.js packages as well and allow you to install additional modules via npm.

Installing Hubot framework

For security we create a separate hubot user, to ensure Hubot itself can’t do anything other than run Hubot, and we create the directory to run Hubot from.

sudo useradd -m hubot
sudo mkdir /var/lib/hubot
sudo chown hubot /var/lib/hubot

To install the Hubot framework from scratch, follow the procedure below, where the adapter is the chat service you are using (e.g. slack, hipchat, flowdock):

sudo npm install -g yo generator-hubot
sudo su - hubot
cd /var/lib/hubot
yo hubot --name CCBot --adapter <adapter>

So if you are using, for instance, Slack as your chat provider you would need to provide “slack” as your adapter. A complete list of all the Hubot adapters can be found here:
https://hubot.github.com/docs/adapters/
Don’t forget to configure your adapter accordingly in the hubot startup script.
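
For example, with the Slack adapter the startup script also typically needs the adapter’s own settings. A minimal hedged sketch (the exact variables depend on the adapter you picked, and the token value is a placeholder from your own Slack integration):

export HUBOT_SLACK_TOKEN='xoxb-your-slack-bot-token'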

Also, if you choose to change CCBot’s name, keep in mind not to name the bot Hubot: the Hubot framework attempts to create a module named exactly the same as the name you give to the bot. Since the framework is already named Hubot, this will cause a non-descriptive error.

Installing CCBot scripts

Copy the following files from the ccbot repository to the Hubot directory:

cd ~/
git clone https://github.com/severalnines/ccbot
cd ccbot
cp -R src/config /var/lib/hubot/
cp -R src/scripts /var/lib/hubot/
cp -R src/utils /var/lib/hubot/

Installing Hubot startup scripts

Obviously you can run Hubot in the background or in a screen session, but it would be much better to daemonize Hubot using proper startup scripts. We supply three startup scripts for CCBot: a traditional Linux Standard Base init script (start, stop, status), a systemd wrapper for this init script and a supervisord script.

Linux Standard Base init script:

For Redhat/Centos 6.x (and lower):

cp scripts/hubot.initd /etc/init.d/hubot
cp scripts/hubot.env /var/lib/hubot
chkconfig hubot on

For Debian/Ubuntu:

cp scripts/hubot.initd /etc/init.d/hubot
cp scripts/hubot.env /var/lib/hubot
ln -s /etc/init.d/hubot /etc/rc3.d/S70hubot

Systemd:

For systemd based systems:

sudo cp scripts/hubot.initd /sbin/hubot
cp scripts/hubot.env /var/lib/hubot
sudo cp scripts/hubot.systemd.conf /etc/systemd/hubot.conf
sudo systemctl daemon-reload
sudo systemctl enable hubot

Supervisord

For this step it is necessary to have supervisord installed on your system.

For Redhat/Centos:

sudo yum install supervisor
sudo cp scripts/hubot.initd /sbin/hubot
sudo cp scripts/hubot.supervisord.conf /etc/supervisor/conf.d/hubot.conf
sudo supervisorctl update

For Debian/Ubuntu:

sudo apt-get install supervisor
sudo cp scripts/hubot.initd /sbin/hubot
sudo cp scripts/hubot.supervisord.conf /etc/supervisor/conf.d/hubot.conf
sudo supervisorctl update

Hubot parameters

Then modify the following parameters in the Hubot environment script (/var/lib/hubot/hubot.env) or supervisord config if necessary:

export HUBOT_CMONRPC_TOKENS='TOKEN0,TOKEN1,TOKEN2,TOKEN3'
export HUBOT_CMONRPC_HOST='localhost'
export HUBOT_CMONRPC_PORT=9500
export HUBOT_CMONRPC_MSGROOM='General'

The HUBOT_CMONRPC_TOKENS variable should contain the RPC tokens set in the /etc/cmon.cnf and /etc/cmon.d/cmon_<cluster>.cnf configuration files. These tokens are used to secure the CMON RPC API and hence have to be filled in when used. If you have no tokens in your configuration, you can leave this variable empty.

The HUBOT_CMONRPC_MSGROOM variable contains the team room the chatbot has to send its messages to. For the chat services we tested this with, it should be something like this:

  • Slack: use the textual ‘General’ chatroom or a custom textual one.
  • Hipchat: similar to “17723_yourchat@conf.hipchat.com”. You can find your own room via “Room Settings”
  • Flowdock: needs a room identifier similar to “a0ef5f5f-9d97-42aa-b6a3-c1a6bb87510e”. You can find your own identifier via Integrations -> Github -> popup url
  • Campfire: a numeric room, which is in the url of the room

Hubot commands

You can operate Hubot by giving it commands in the chatroom. In principle it does not matter whether you issue the command in a general chatroom where Hubot is present or in a private chat with the bot itself. Sending a command looks as follows:

botname command

Where botname is the name of your Hubot bot. So if, in our example, the bot is called "ccbot" and the command is "status", you would send the command as follows:

@ccbot status

Note: when you are in a private chat with the chatbot, you must omit addressing the bot.

Command list

Status

Syntax:

status

Lists the clusters in ClusterControl and shows their status.

Example:

@ccbot status

Full backup

Syntax:

backup cluster <clusterid> host <hostname>

Schedules a full backup for an entire cluster using xtrabackup. Host is an optional parameter; if not provided, CCBot will pick the first host from the cluster.

Example:

@ccbot backup cluster 1 host 10.10.12.23

Schema backup

Syntax:

backup cluster <clusterid> schema <schema> host <hostname>

Schedules a backup for a single schema using mysqldump. Host is an optional parameter; if not provided, CCBot will pick the first host from the cluster.

Example:

@ccbot backup cluster 1 schema important_schema

Create operational report

Syntax:

createreport cluster <clusterid>

Creates an operational report for the given cluster

Example:

@ccbot createreport cluster 1

List operational reports

Syntax:

listreports cluster <clusterid>

Lists all available reports for the given cluster

Example:

@ccbot listreports cluster 1

Last loglines

Syntax:

lastlog cluster <cluster> host <host> filename <filename> limit <limit>

Returns the last log lines of the given cluster/host/filename.

Example:

@ccbot lastlog cluster 1 host 10.10.12.23 filename /var/log/mysqld.log limit 5

CCBot roadmap

As CCBot is meant to complement the ClusterControl UI, we will review what makes sense to put into a chatbot and what does not. So far we have identified that adding schemas and users, scaling clusters and running (custom) advisors make the most sense, and we will continue to extend CCBot with that functionality in the upcoming months. Obviously, if you have the urge and need to automate additional ClusterControl functions, we are all ears.

Become a ClusterControl DBA: User Management

In the previous posts of this blog series, we covered deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale, how to prepare yourself against disasters by scheduling backups, how to manage your database configurations and in the last post how to manage your log files.

One of the most important aspects of becoming a ClusterControl DBA is being able to delegate tasks to team members and to control access to ClusterControl functionality. This can be achieved by utilizing the User Management functionality, which allows you to control who can do what. You can even go a step further by adding teams or organizations to ClusterControl and mapping them to your DevOps roles.

Organizations

Organizations can be seen either as full organizations or as groups of users. Clusters can be assigned to organizations, and in this way a cluster is only visible to the users in the organization it has been assigned to. This allows you to run multiple organizations within one ClusterControl environment. Obviously the ClusterControl admin account will still be able to see and manage all clusters.

You can create a new Organization via Settings > User Management and clicking on the plus sign on the left side under Organizations:

After adding a new Organization, you can assign users to the organization.

Users

After selecting the newly created organization, you can add new users to it by pressing the plus sign in the dialogue on the right:

By selecting the role, you can limit the functionality of the user to either Super Admin, Admin or User. You can extend these default roles in the Access Control section.

Access Control

Standard Roles

Within ClusterControl the default roles are: Super Admin, Admin and User. The Super Admin is the only account that can administrate organizations, users and roles. The Super Admin is also able to migrate clusters across organizations. The admin role belongs to a specific organization and is able to see all clusters in this organization. The user role is only able to see the clusters he/she created.

User Roles

You can add new roles within the role-based access control screen. You can define the privileges per functionality: whether the role is allowed (read-only), denied (deny), can manage (allow change) or can modify (extended manage).

If we create a role with limited access:

As you can see, we can create a user with limited access rights (mostly read-only) and ensure this user does not break anything. This also means we could add non-technical roles like Manager here.

Notice that the Super Admin role is not listed here as it is a default role with the highest level of privileges within ClusterControl and thus can’t be changed.

LDAP Access

ClusterControl supports Active Directory, FreeIPA and LDAP authentication. This allows you to integrate ClusterControl within your organization without having to recreate the users. In earlier blog posts we described how to set up ClusterControl to authenticate against OpenLDAP, FreeIPA and Active Directory.

Once this has been set up, authentication against ClusterControl will follow the chart below:

Basically the most important part here is to map the LDAP group to the ClusterControl role. This can be done fairly easily on the LDAP Access page under User Management.

The dialog above would map the DevopsTeam to the Limited User role in ClusterControl. Then repeat this for any other group you wish to map.

After this any user authenticating against ClusterControl will be authenticated and authorized via the LDAP integration.

Final thoughts

Combining all the above allows you to integrate ClusterControl better into your existing organization, create specific roles with limited or full access and connect users to these roles. The beauty of this is that you are now much more flexible in how you organize around your database infrastructure: who is allowed to do what? You could for instance offload the task of backup checking to a site reliability engineer instead of having the DBA check them daily. Allow your developers to check the MySQL, Postgres and MongoDB log files to correlate them with their monitoring. You could also allow a senior developer to scale the database by adding more nodes/shards or have a seasoned DevOps engineer write advisors.

As you can see, the possibilities here are endless; it is only a question of how to unlock them. In the Developer Studio blog series, we dive deeper into automation with ClusterControl, and for DevOps integration we recently released CCBot.

ClusterControl Developer Studio: automatically scale your clusters

In the previous blog posts, we gave a brief introduction to ClusterControl Developer Studio and the ClusterControl Domain Specific Language and how to extract information from the Performance Schema. ClusterControl’s Developer Studio allows you to write your own scripts, advisors and alerts. With just a few lines of code, you can already automate your clusters!

In this blog post we will dive deeper into Developer Studio and show you how you can keep an eye on performance and at the same time scale out the number of read slaves in your replication topology whenever it is necessary.

CMON RPC

The key element in our advisor will be talking to the CMON RPC: ClusterControl’s API that enables you to automate tasks. Many of the components of ClusterControl make use of this API as well and a great deal of functionality is accessible via the API.

To be able to talk to the CMON RPC we will need to install/import the cmonrpc.js helper file from the Severalnines Github Developer Studio repository into your own Developer Studio. We described this process briefly in our introductory blog post. Alternatively you could create a new file named common/cmonrpc.js and paste the contents in there.

This helper file has only one usable function that interacts with the CMON RPC at the moment: addNode. All the other functions in this helper are supporting this process, like for instance the setCmonrpcToken function that adds the RPC token in the JSON body if RPC tokens are in use.

The cmonrpc helper expects the following variables to be present:

var CMONRPC_HOST = 'localhost';
var CMONRPC_PORT = '9500';
var CMONRPC_TOKEN = ["token0", "token1", "token2"];
var FREE_HOSTS = ["10.10.10.12", "10.10.10.13", "10.10.10.14"];

The FREE_HOSTS variable contains the IP addresses of the hosts we want to use as read slaves. This variable will be used by the findUnusedHost function, which compares it against the hosts already present in the cluster and returns an unused host, or false in case there is no unused host available.

The CMONRPC_TOKEN variable contains the RPC tokens when used. The first token will be the token found in the main cmon.cnf. If you are not using RPC tokens in your configuration, you can leave them empty.

NOTE: Currently, as of 1.2.12, the ClusterControl web application does not support having an RPC token in the cmon.cnf file. If you want to run both this advisor and access the web application at the same time, then comment out the RPC token in the cmon.cnf file and leave the CMONRPC_TOKEN variable empty.

Auto Scale

Our auto scaling advisor is a very basic one: we simply look at the number of connections on our master and slaves. If we find the number of connections excessive on the slaves, we need to scale out our reads and we can do this by adding fresh servers.

We will look at long(er) term connections to prevent our advisor from scaling unnecessarily. Therefore we use the SQL statistics functionality from Developer Studio and determine the standard deviation of each node in the cluster. You could customize this to either the nth-percentile, average or maximum connections if you like, but that last one could cause unnecessary scaling.

var endTime   = CmonDateTime::currentDateTime();
var startTime = endTime - 3600;
var stats     = host.sqlStats(startTime, endTime);
var config      = host.config();
var max_connections    = config.variable("max_connections")[0]['value'];

We retrieve the SQL statistics using the host.sqlStats function, passing it a start and end time, and we retrieve the configured maximum number of connections as well. The sqlStats function returns an array of maps containing all statistics collected during the period we selected. Since the statistical functions of Developer Studio expect arrays containing only values, the array of maps isn’t usable in this form. So we need to create a new array and copy all the values for the number of connections.

var connections = [];
for(stx = 0; stx < stats.size(); ++stx) {
    connections[stx] = stats[stx]['connections'];
}

Then we can calculate the connections used during our selected period of time and express that as a percentage:

stdev_connections_pct = (stdev(connections) / max_connections) * 100;
if(stdev_connections_pct > WARNING_THRESHOLD) {
    THRESHOLD_MET = true;
}

Once our threshold is met, we add a new node to our cluster, and this is when we call the cmonrpc helper functions. However, we only want to do this once during our run, hence we set the variable THRESHOLD_MET. At the very end, we also add an extra line of advice to show we are scaling out our cluster:

if (THRESHOLD_MET == true)
{
    /* find unused node */
    node = findUnusedHost();
    addNode(node);

    advice = new CmonAdvice();
    advice.setTitle(TITLE);
    advice.setAdvice("Scaling out cluster with new node:"+ node);
    advice.setJustification("Scaling slave nodes is necessary");
    advisorMap[idx+1]= advice;
}

Conclusion

Obviously, there are still a few shortcomings with this advisor: it should not run more frequently than the period used for the SQL statistics selection. In our example we set it to 1 hour of statistics, so do not run the advisor more frequently than once per hour.

Also, the advisor will put extra stress on the master by copying its dataset to the new slave, so you had better keep an eye on the master node in your master-slave topology as well. The advisors are currently limited to a runtime of 30 seconds, so if the curl calls respond slowly, the advisor could exceed that runtime if you use the cmonrpc library for other purposes.

On the good side, this advisor shows how easily you can use advisors beyond what they were designed for and use them to trigger actions. Examples of such actions could be the scheduling of backups or setting hints in your configuration management tool (Zookeeper/Consul). The possibilities with Developer Studio are almost only limited by your imagination!

The complete advisor:

#include "common/mysql_helper.js"
#include "common/cmonrpc.js"

var CMONRPC_HOST = 'localhost';
var CMONRPC_PORT = '9500';
var CMONRPC_TOKEN = ["test12345", "someothertoken"];
var FREE_HOSTS = ["10.10.19.12", "10.10.19.13", "10.10.19.14"];

/**
 * Checks the percentage of used connections and scales accordingly
 * 
 */ 
var WARNING_THRESHOLD=85;
var TITLE="Auto scaling read slaves";
var THRESHOLD_MET = false;
var msg = '';

function main()
{
    var hosts     = cluster::mySqlNodes();
    var advisorMap = {};

    for (idx = 0; idx < hosts.size(); ++idx)
    {
        host        = hosts[idx];
        map         = host.toMap();
        connected     = map["connected"];
        var advice = new CmonAdvice();
        var endTime   = CmonDateTime::currentDateTime();
        var startTime = endTime - 10 * 60;
        var stats     = host.sqlStats(startTime, endTime);
        var config      = host.config();
        var max_connections    = config.variable("max_connections")[0]['value'];
        var connections = [];

        if(!connected)
            continue;
        if(checkPrecond(host) && host.role() != 'master')
        {
            /* Fetch the stats on connections over our selection period */
            for(stx = 0; stx < stats.size(); ++stx)
                connections[stx] = stats[stx]['connections'];
            stdev_connections_pct = (stdev(connections) / max_connections) * 100;
            if(stdev_connections_pct > WARNING_THRESHOLD)
            {
                THRESHOLD_MET = true;
                msg = "Slave node";
                advice.setJustification("Percentage of connections used (" + stdev_connections_pct + ") above " + WARNING_THRESHOLD + " so we need to scale out slaves.");
                advice.setSeverity(Warning); 
            }
            else
            {
                msg = "Slave node";
                advice.setJustification("Connections used ok.");
                advice.setSeverity(Ok);
            }
        }
        else
        {
            if (host.role() == 'master')
            {
                msg = "Master node";
                advice.setJustification("Master node will not be taken into consideration");
                advice.setSeverity(Ok);  
            }
            else
            {
                msg = "Cluster is not okay and there is no data";
                advice.setJustification("there is not enough load on the server or the uptime is too little.");
                advice.setSeverity(Ok);
            }
        }

        advice.setHost(host);
        advice.setTitle(TITLE);
        advice.setAdvice(msg);
        advisorMap[idx]= advice;
    }

    if (THRESHOLD_MET == true)
    {
        /* find unused node */
        var node = findUnusedHost();
        addNode(node);

        advice = new CmonAdvice();
        advice.setTitle(TITLE);
        advice.setAdvice("Scaling out cluster with new node:"+ node);
        advice.setJustification("Scaling slave nodes is necessary");
        advisorMap[idx+1]= advice;
    }


    return advisorMap;
}

High Availability Log Processing with Graylog, MongoDB and ElasticSearch

Graylog is an open-source log management tool. Similar to Splunk and LogStash, Graylog helps centralize and aggregate all your log files for full visibility. It also provides a query language to search through log data. For large volumes of log data in a big production setup, you might want to deploy a Graylog Cluster.

Graylog Cluster consists of several components:

  • Graylog server - Log processor
  • Graylog web UI - Graylog web user interface
  • MongoDB - stores configuration and the dead letter messages
  • ElasticSearch - stores messages (if you lose your ElasticSearch data, the messages are gone)

In this blog post, we are going to deploy a Graylog cluster, with a MongoDB Replica Set deployed using ClusterControl. We will configure the Graylog cluster to collect syslog from several devices through a load balanced syslog TCP service running on HAProxy. This allows highly available, single-endpoint access with automatic failover in case any of the Graylog servers goes down.

Our Graylog cluster consists of 4 nodes:

  • web.local - ClusterControl server + Graylog web UI + HAProxy
  • graylog1.local - Graylog server + MongoDB Replica Set + ElasticSearch
  • graylog2.local - Graylog server + MongoDB Replica Set + ElasticSearch
  • graylog3.local - Graylog server + MongoDB Replica Set + ElasticSearch

The architecture diagram looks like this:

Prerequisites

All hosts are running CentOS 7.1 64 bit with SELinux and iptables disabled. The following is the host definition inside /etc/hosts:

192.168.55.200     web.local clustercontrol.local clustercontrol web
192.168.55.201     graylog1.local graylog1
192.168.55.202     graylog2.local graylog2
192.168.55.203      graylog3.local graylog3

Ensure NTP is installed and enabled:

$ yum install ntp -y
$ systemctl enable ntpd
$ systemctl start ntpd

Deploying MongoDB Replica Set

The following steps should be performed on the ClusterControl server.

  1. Install ClusterControl on web.local:

    $ wget http://severalnines.com/downloads/cmon/install-cc
    $ chmod 755 install-cc
    $ ./install-cc
  2. Follow the installation wizard up until it finishes. Open ClusterControl UI at http://web.local/clustercontrol and create a default admin user.

  3. Set up passwordless SSH from the ClusterControl server to all MongoDB nodes (including the ClusterControl server itself):

    ssh-keygen -t rsa
    ssh-copy-id 192.168.55.200
    ssh-copy-id 192.168.55.201
    ssh-copy-id 192.168.55.202
    ssh-copy-id 192.168.55.203
  4. From the ClusterControl UI, go to Create Database Node. We are going to deploy a MongoDB Replica Set by creating one MongoDB node, then use the Add Node function to expand it to a three-node Replica Set.

  5. Click on the Cluster Action icon, go to ‘Add Node to Replica Set’ and add the other two nodes, similar to the screenshot below:

    Repeat the above steps for graylog3.local (192.168.55.203). Once done, you should have a three-node MongoDB Replica Set:

ClusterControl v1.2.12 defaults to installing the latest version of MongoDB 3.x.

Setting Up MongoDB User

Once deployed, we need to create a database user for Graylog. Log in to the MongoDB console on the PRIMARY MongoDB Replica Set node (you can determine the role on the ClusterControl Overview page). In this example, it was graylog1.local:

$ mongo

And paste the following lines:

my_mongodb_0:PRIMARY> use graylog2
my_mongodb_0:PRIMARY> db.createUser(
    {
      user: "grayloguser",
      pwd: "password",
      roles: [
         { role: "readWrite", db: "graylog2" }
      ]
    }
);

Verify that the user is able to access the graylog2 schema on another replica set member (e.g. 192.168.55.202 was in SECONDARY state):

$ mongo -u grayloguser -p password 192.168.55.202/graylog2
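
Once connected to the SECONDARY member, reads have to be explicitly allowed in the MongoDB 3.x shell before you can list anything; a quick hedged check (the prompt will reflect your own replica set name):

my_mongodb_0:SECONDARY> rs.slaveOk()
my_mongodb_0:SECONDARY> db.getCollectionNames()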

Deploying ElasticSearch Cluster

The following steps should be performed on graylog1, graylog2 and graylog3.

  1. Graylog only supports ElasticSearch v1.7.x. Download the package from the ElasticSearch website:

    $ wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.5.noarch.rpm
  2. Install Java OpenJDK:

    $ yum install java
  3. Install ElasticSearch package:

    $ yum localinstall elasticsearch-1.7.5.noarch.rpm
  4. Specify the following configuration inside /etc/elasticsearch/elasticsearch.yml:

    cluster.name: graylog-elasticsearch
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["graylog1.local", "graylog2.local", "graylog3.local"]
    discovery.zen.minimum_master_nodes: 2
    network.host: 192.168.55.203

    ** Change the value of network.host according to the host that you are configuring.

  5. Start the ElasticSearch daemon:

    $ systemctl enable elasticsearch
    $ systemctl start elasticsearch
  6. Verify that ElasticSearch is loaded correctly:

    $ systemctl status elasticsearch -l

    And ensure it listens on the correct ports (defaults are 9200 for HTTP and 9300 for inter-node transport):

    [root@graylog3 ~]# netstat -tulpn | grep -E '9200|9300'
    tcp6       0      0 192.168.55.203:9200     :::*                    LISTEN      97541/java
    tcp6       0      0 192.168.55.203:9300     :::*                    LISTEN      97541/java

    Use curl to obtain the ElasticSearch cluster state:

    [root@graylog1 ~]# curl -XGET 'http://192.168.55.203:9200/_cluster/state?human&pretty'
    {
      "cluster_name" : "graylog-elasticsearch",
      "version" : 7,
      "master_node" : "BwQd98BnTBWADDjCvLQ1Jw",
      "blocks" : { },
      "nodes" : {
        "BwQd98BnTBWADDjCvLQ1Jw" : {
          "name" : "Misfit",
          "transport_address" : "inet[/192.168.55.203:9300]",
          "attributes" : { }
        },
        "7djnRL3iR-GJ5ARI8eIwGQ" : {
          "name" : "American Eagle",
          "transport_address" : "inet[/192.168.55.201:9300]",
          "attributes" : { }
        },
        "_WSvA3gbQK2A4v17BUWPug" : {
          "name" : "Scimitar",
          "transport_address" : "inet[/192.168.55.202:9300]",
          "attributes" : { }
        }
      },
      "metadata" : {
        "templates" : { },
        "indices" : { }
      },
      "routing_table" : {
        "indices" : { }
      },
      "routing_nodes" : {
        "unassigned" : [ ],
        "nodes" : {
          "_WSvA3gbQK2A4v17BUWPug" : [ ],
          "BwQd98BnTBWADDjCvLQ1Jw" : [ ],
          "7djnRL3iR-GJ5ARI8eIwGQ" : [ ]
        }
      },
      "allocations" : [ ]
    }

Configuring the ElasticSearch cluster is now complete.
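
As an extra sanity check, the cluster health endpoint on any of the nodes should report number_of_nodes: 3 once all members have joined; a hedged example against graylog1:

$ curl -XGET 'http://192.168.55.201:9200/_cluster/health?pretty'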

Deploying Graylog Cluster

The following steps should be performed on graylog1, graylog2 and graylog3.

  1. Download and install Graylog repository for CentOS 7:

    $ rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-1.3-repository-el7_latest.rpm
  2. Install Graylog server and Java OpenJDK:

    $ yum install java graylog-server
  3. Generate a SHA-256 sum for our Graylog admin password using the following command:

    $ echo -n password | sha256sum | awk {'print $1'}
    5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8

    **Copy the generated value to be used as the root_password_sha2 value in the Graylog configuration file.

  4. Configure Graylog server configuration file at /etc/graylog/server/server.conf, and ensure following options are set accordingly:

    password_secret = password
    root_password_sha2 = 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
    rest_listen_uri = http://0.0.0.0:12900/
    elasticsearch_cluster_name = graylog-elasticsearch
    elasticsearch_discovery_zen_ping_multicast_enabled = false
    elasticsearch_discovery_zen_ping_unicast_hosts = graylog1.local:9300,graylog2.local:9300,graylog3.local:9300
    mongodb_uri = mongodb://grayloguser:password@192.168.55.201:27017,192.168.55.202:27017,192.168.55.203:27019/graylog2
  5. After the configurations are saved, Graylog can be started with the following command:

    $ systemctl enable graylog-server
    $ systemctl start graylog-server

    Ensure all components are up and running inside Graylog log:

    $ tail /var/log/graylog-server/server.log
    2016-03-03T14:17:42.655+08:00 INFO  [ServerBootstrap] Services started, startup times in ms: {InputSetupService [RUNNING]=2, MetricsReporterService [RUNNING]=7, KafkaJournal [RUNNING]=7, OutputSetupService [RUNNING]=13, BufferSynchronizerService [RUNNING]=14, DashboardRegistryService [RUNNING]=21, JournalReader [RUNNING]=100, PeriodicalsService [RUNNING]=142, IndexerSetupService [RUNNING]=3322, RestApiService [RUNNING]=3835}
    2016-03-03T14:17:42.658+08:00 INFO  [ServerBootstrap] Graylog server up and running.

    **Repeat the same steps for the remaining nodes.

Deploying Graylog Web UI

The following steps should be performed on web.local.

  1. Download and install Graylog repository for CentOS 7:

    $ rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-1.3-repository-el7_latest.rpm
  2. Install Graylog web UI and Java OpenJDK:

    $ yum install java graylog-web
  3. Generate a secret key. The secret key is used to secure cryptographic functions. Set this to a long and randomly generated string. You can use a simple md5sum command to generate it:

    $ date | md5sum | awk {'print $1'}
    eb6aebdeedfb2fa05742d8ca733b5a2c
  4. Configure the Graylog server URIs and the application secret (generated above) inside /etc/graylog/web/web.conf:

    graylog2-server.uris="http://192.168.55.201:12900/,http://192.168.55.202:12900/,http://192.168.55.203:12900"
    application.secret="eb6aebdeedfb2fa05742d8ca733b5a2c"

    ** If you deploy your application to several instances be sure to use the same application secret.

  5. After the configurations are saved, Graylog Web UI can be started with the following command:

    $ systemctl enable graylog-web
    $ systemctl start graylog-web

    Now, log in to the Graylog Web UI at http://web.local:9000/ with username “admin” and password “password”. You should see something like below:

Our Graylog suite is ready. Let’s configure some inputs so it can start capturing log streams and messages.

Configuring Inputs

To start capturing syslog data, we have to configure Inputs. Go to Graylog UI > System / Overview > Inputs. Since we are going to load balance the inputs via HAProxy, we need to configure the syslog input listeners to be running on TCP (HAProxy does not support UDP).

On the dropdown menu, choose “Syslog TCP” and click “Launch New Input”. In the input dialog, configure as follows:

  • Global input (started on all nodes)
  • Title: Syslog TCP 51400
  • Port: 51400

Leave the rest of the options at their defaults and click “Launch”. We have to configure the syslog port to be higher than 1024 because the Graylog server is running as user “java”. You need to be root to bind sockets on ports 1024 and below on most *NIX systems. You could also try to give the local user that runs graylog2-server permission to bind to those restricted ports, but usually just choosing a higher port is the easiest solution.

Once configured, you should notice the Global Input is running as shown in the following screenshot:

At this point, each Graylog server is now listening on TCP port 51400 for incoming syslog data. You can start configuring the devices to forward their syslog streams to the Graylog servers. The following lines show an example rsyslog.conf configuration to start forwarding syslog messages to the Graylog servers via TCP:

*.* @@192.168.55.201:51400
*.* @@192.168.55.202:51400
*.* @@192.168.55.203:51400

In the above example, rsyslog forwards every message to all three servers. To make rsyslog send to a secondary server only when the first one fails, additional failover directives are needed (see the sketch below). A neater way to provide a highly available single endpoint with automatic failover is to use a load balancer. The load balancer performs health checks on the Graylog servers to see whether the syslog service is alive, and it takes dead nodes out of the load balancing set.
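
For completeness, a hedged sketch of what an rsyslog failover configuration would look like using the legacy $ActionExecOnlyWhenPreviousIsSuspended directive, so that graylog2 and graylog3 only receive messages when graylog1 is unreachable:

*.* @@192.168.55.201:51400
$ActionExecOnlyWhenPreviousIsSuspended on
& @@192.168.55.202:51400
& @@192.168.55.203:51400
$ActionExecOnlyWhenPreviousIsSuspended off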

In the next section, we deploy HAProxy to load balance this service.

Setting up a Load Balanced Syslog Service

The following steps should be performed on web.local.

  1. Install HAProxy via package manager:

    $ yum install -y haproxy
  2. Clear the existing HAProxy configuration:

    $ cat /dev/null > /etc/haproxy/haproxy.cfg

    And add following lines into /etc/haproxy/haproxy.cfg:

    global
        log         127.0.0.1 local2
        chroot      /var/lib/haproxy
        pidfile     /var/run/haproxy.pid
        maxconn     4000
        user        haproxy
        group       haproxy
        daemon
        stats socket /var/lib/haproxy/stats
    
    defaults
        mode                    http
        log                     global
        option                  dontlognull
        option                  redispatch
        retries                 3
        timeout http-request    10s
        timeout queue           1m
        timeout connect         10s
        timeout client          1m
        timeout server          1m
        timeout http-keep-alive 10s
        timeout check           10s
        maxconn                 3000
    
    userlist STATSUSERS
             group admin users admin
             user admin insecure-password password
             user stats insecure-password PASSWORD
    
    listen admin_page 0.0.0.0:9600
           mode http
           stats enable
           stats refresh 60s
           stats uri /
           acl AuthOkay_ReadOnly http_auth(STATSUSERS)
           acl AuthOkay_Admin http_auth_group(STATSUSERS) admin
           stats http-request auth realm admin_page unless AuthOkay_ReadOnly
           #stats admin if AuthOkay_Admin
    
    listen syslog_tcp_514
           bind *:514
           mode tcp
           timeout client  120s
           timeout server  120s
           default-server inter 2s downinter 5s rise 3 fall 2 maxconn 64 maxqueue 128 weight 100
           server graylog1 192.168.55.201:51400 check
           server graylog2 192.168.55.202:51400 check
           server graylog3 192.168.55.203:51400 check
  3. Enable HAProxy daemon on boot and start it up:

    $ systemctl enable haproxy
    $ systemctl start haproxy
  4. Verify that HAProxy listener turns green, indicating the backend services are healthy:
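
    Alternatively, you can query the stats socket defined in the configuration above from the command line; a hedged sketch, assuming socat is installed on the host:

    $ echo "show stat" | socat unix-connect:/var/lib/haproxy/stats stdio | grep graylog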

Our syslog service is now load balanced between three Graylog servers on TCP port 514. Next we configure our devices to start sending out syslog messages over TCP to the HAProxy instance.

Configuring Syslog TCP Clients

In this example, we are going to use rsyslog on a standard Linux box to forward syslog messages to the load balanced syslog servers.

  1. Install rsyslog on the client box:

    $ yum install rsyslog # RHEL/CentOS
    $ apt-get install rsyslog # Debian/Ubuntu
  2. Then append the following line into /etc/rsyslog.conf under “catch-all” log files section (line 94):

    *.* @@192.168.55.200:514

    **Take note that ‘@@’ means we are forwarding syslog messages through TCP, while single ‘@’ is for UDP.

  3. Restart syslog to load the new configuration:

    $ systemctl restart syslog
  4. Now we can see the log message stream pouring in under the Global inputs section. You can verify this from the “Network IO” section as highlighted by the red arrows in the screenshot below:

    Verify the incoming log messages by clicking on ‘Show received messages’:

We now have a highly available log processing cluster with Graylog, MongoDB Replica Set, HAProxy and ElasticSearch cluster.

Notes

  • This setup does not cover high availability for the Graylog web UI, HAProxy and ClusterControl. In order to achieve a fully resilient setup, we would need another node to serve as a secondary HAProxy and Graylog web UI, with a virtual IP address managed by Keepalived.
  • For ClusterControl redundancy, you have to set up a standby ClusterControl server to get higher availability.