Software Secret Weapons™
|
How To Monitor MYSQL Replication In Real Time With SNMP by Pavel Simakov on 2007-05-07 16:12:53 under Linguine Watch, view comments |
|||
|
Introduction One big issue remained. In order to get a truly mission-critical database system one has to continuously monitor the state of both the master and the slave servers. In his writing Jeremy has hinted at the simple algorithm for monitoring replication status. Several simple scripts were written to explore his ideas. However, it was quite difficult for me to adapt these various scripts to my Java projects. Finally I wrote MySQL replication monitor in Java. What is really cool is that I connected it with my Linguine Watch open-source monitoring project to allow monitoring of replication status with SNMP. This article describes how I have done it. Now, you can setup your MySQL master-slave replication monitoring with SNMP in five minutes!
General algorithm for monitoring
I created simple table _MASTERSTATUS on the master to store the timestamp and some other things. Here is the DDL for it:
CREATE TABLE `_masterstatus` ( `ID` int(11) NOT NULL default '0', `SEQUENCE` int(11) default NULL, `UPDATED_ON` datetime default NULL, `MASTER_UID` int(11) default NULL, PRIMARY KEY (`ID`), KEY `ID_idx` (`ID`) ); There are several fields present in this table in addition to the timestamp. The absolute time can be very tricky to synchronize in the clustered environment and it should not be trusted. That is why I rely on the SEQUENCE number as an alternative. Instead of writing new timestamp to the master we first read the SEQUENCE number from the master. Then we increment its value by one and then write new value back into the same row on the master. Now we can wait for the replication and later read the SEQUENCE number off the slave and compare both values. We still put the timestamp into the UPDATED_ON field on the master, but for the convenience and not for the actual monitoring. To figure our how much the slave is behind the master we multiply the difference in the SEQUENCE numbers by the frequency of the updates to the master. If the master has SEQUENCE number 10, the slave has 3 and master is updated once every 5 seconds, the slave is behind master by 7 * 5 = 35 seconds. When the master-slave replication is configured a master is given a unique id. However, it is not possible to figure out the master's id while talking to the slave. To overcome this problem we need the MASTER_UID field. This field will help us to defend against the situation when one tries to accidentally monitor the master and the slave that actually do not work with each other. Now, every time the monitoring application starts it chooses a random number. This random number is written to the MASTER_UID field on all consecutive updates to the master. When reading the SEQUENCE number from the slave, we also check that the value of the MASTER_UID field matches with the master.
Purging binary logs After the slave status has been confirmed by checking SEQUENCE number and MASTER_UID, we determine the name of the current binary log file that is being processed by the slave. It can be obtained by issuing SHOW SLAVE STATUS statement on the slave (Appendix A). All prior binary logs have been already processed by the slave and can be purged from the master using PURGE MASTER LOGS TO statement.
Deployment diagram The MySQL monitor has built-in Java SNMP agent for reporting data to an SNMP monitoring station. A very good product OpManager from Adventnet can be used as a monitoring station. It has a free version that handles all aspects of SNMP monitoring (with alarms) for a limited number of hosts. Any other SNMP capable monitoring application will work as well.
Almost no programming skills are needed
The MySQL monitor application can be started from the command line using the following batch file
See MySqlReplStatusMonitor in action
The real-time chart for the sample run is presented below. The same data can be seen on a periodic HTML report The red line shows how master SEQUENCE number is incremented every 5 seconds. The green line shows SEQUENCE number on the slave. Both follow each other closely. The blue line shows how much the slave is behind the master. It always stays at 0, confirming that the replication is always done within 5 seconds (in my test at least). The yellow line shows number of errors. At around 120 seconds into the test I issue STOP SLAVE command on the slave server. The green line stops growing; new values for SEQUENCE number do not arrive to the slave. The blue line, the lag, increases correspondingly. As soon as I issue START SLAVE command at around 140 seconds slave immediately recovers and everything goes back to normal! There are no errors while replication is stopped; we are able to connect to both the slave and the master to check their statuses. At around 180 seconds, I terminate the slave process and the number of errors and the lag start growing. The errors increase because we are not able to connect to slave anymore. Finally, at around 210 seconds, I terminate the master. The red line, that represents the master SEQUENCE number, stops growing. The number of errors goes up twice as fast. Now we are ready to eliminate the last bit of manual work in the replication monitoring. Set up an email alarm inside the OpManager to trigger if:
Implementation of the MySQL monitor
MySqlReplContext
Following the ideas of the Linguine Watch framework for adding performance counters to large software applications, all the MySQL monitoring information is concentrated in one class MySqlReplStatusMonitor This class contains five individual performance counters including:
All performance counters are continuously updated by the MySqlReplThread as it communicates with the master and the slave. At the same time the performance counter values can be accessed directly in real-time by registering MySqlReplStatusMonitor with Linguine Watch built-in SNMP agent.
Implementation of SNMP agent
The SNMP defines a special MIB file format for listing counter definitions. The Linguine Watch generates MIB files automatically thus saving you from the need to learn MIB file format and writing the file by hand. The generated MIB file for the MySqlReplStatusMonitor is here
Putting it all together The issues here are similar to the issues of any Java or non-Java package where multiple dependent services must be started and stopped in a coordinated way. I do not want to pollute this article with the deep discussion of this and I will discuss this topic elsewhere.
To make long story short, the following adapted excerpt from MySqlReplTest
public static void main(String args){
Conslusion
Appendix A mysql> SHOW MASTER STATUS; +------------------+----------+--------------+------------------+ | File | Position | Binlog_do_db | Binlog_ignore_db | +------------------+----------+--------------+------------------+ | my-bin-log.1666 | 678 | | | +------------------+----------+--------------+------------------+ 1 row in set (0.00 sec) mysql> PURGE MASTER LOGS TO 'my-bin-log.1580'; Query OK, 0 rows affected (0.01 sec) mysql> SHOW SLAVE STATUS; +-------------+-------------+-------------+---------------+------------------+... | Master_Host | Master_User | Master_Port | Connect_retry | Master_Log_File |... +-------------+-------------+-------------+---------------+------------------+... | 127.0.0.1 | admin | 3011 | 60 | my-bin-log.1666 |... +-------------+-------------+-------------+---------------+------------------+... 1 row in set (0.00 sec) mysql> SHOW MASTER LOGS; +------------------+ | Log_name | +------------------+ | my-bin-log.1663 | | my-bin-log.1664 | | my-bin-log.1665 | | my-bin-log.1666 | +------------------+ 4 rows in set (0.00 sec)
Appendix B Following configuration parameters can be set for MySqlReplContext:
// the name of the housekeeping database that will store timestamp and sequence // numbers used to track replication status private String dbName = "MASTER"; // the name of the housekeeping database private String tableName = "_MASTERSTATUS"; // the table type for the CREATE TABLE command private String tableCreatePostfix = "type = InnoDB"; // change master timestamp and sequence number and check slave status // every this number of milliseconds private int checkDelayMillis = 5000; // purge master logs after confirming that slave is up to date // every this number of milliseconds private int purgeDelayMillis = 5 * 60 + 1000; // if set to true will purge binary logs once confirmed with slave // that they have been processed; is set to false will only report replication // status, increment master status, but will not purge bin logs private boolean canPurgeBinLogs = true; Following configuration parameters can be set for MySqlReplContext.MySqlReplConnectionPool:
// connection string to the master database private String masterConnStr; // connection string to the slave database private String slaveConnStr;
References
Comments (12) Leave a comment |
|
|||
|
Copyright © 2004-2010 by Pavel Simakov any conclusions, recommendations, ideas, thoughts or the source code presented on this site are my own and do not reflect a official opinion of my current or past employers, partners or clients |
|
Comment by Salvatore — July 30, 2007 @ 4:31 am
Hi
First of all congratulations for your very useful tutorial. I’m writing because I have an idea that I would discuss with you. I’m working for an italian radio company and we are building an IT infrastructure totally based on open source technology. So, I’m building a MySQL monitor starting from your tool.
My work goals are:
1) Configurable monitor for a generic MySQL replication topology;
2) “Special agent” that, in case of master node fault, switch one of the slave in master and reorganize all the topology;
3) Integration of some advising system (email,sms,etc) to alert if something happens.
Are you interested in this kind of tool?
I would to publish it under GPL licence, and I just would know your opinion about the project.
Have you tried something of similar?
Do you think that this kind of tools could be useful?
I appreciate any suggestions.
Thank you for your kindly attention.
Salvatore
Comment by Pavel Simakov — August 1, 2007 @ 11:17 am
Hi Salvatore,
Thank for reading Software Secret Weapons!
I have reviewed and had time to think about your project as you described it below. Several points can be noted. The project, given its superficial simplicity, might turn into very complex project and might require advanced software engineering skills and ongoing system administration. Do you have these already? Do you have budget for if? Can you hire skilled people?
Replication topology and especially restructuring the topology is the issue of huge complexity. The best example I know of how it was done is described in Google patches to MySQL in the section Mysql4Patches here http://code.google.com/p/google-mysql-tools/. Google supports advanced dynamic topology restructuring, but it is Google… You can try to use Google version of MySQL, but I personally found documentation to be spotty and you have to be ready to “dig in” if something goes wrong. Good luck with that!
So my advise is not even try for dynamic MySQL topology. MySQL is quite stable and you can improve reliability by using C-JDBC. Monitoring and advising systems don’t need to be built as they already exist. I personally had a very positive experience with free edition of AdventNet’s ManageEngine OpManager http://manageengine.adventnet.com/products/opmanager/index.html. It supports alarms and things like that. The LinguineWatch was specifically designed with SNMP support to allow using tools like OpManager for monitoring and advising.
Good luck,
- Pavel Simakov
Comment by Xaprb — October 1, 2007 @ 10:13 am
The MySQL Heartbeat script, which Jeremy Cole and Six Apart contributed to the MySQL Toolkit (http://mysqltoolkit.sourceforge.net/) is the same idea, elegantly designed as a command-line tool.
Comment by yingkuan — February 21, 2008 @ 7:15 pm
I wonder what’s the issue with using Seconds_Behind_Master in show slave status to check heartbeat ?
For binlog rotation, from Mysql 5.0 you could set expire_logs_days
instead of manually do it.
Comment by Alex F — February 26, 2008 @ 2:39 am
I’m not a java programmer, all I need is to monitor MySQL replication. Is there any simple manual to install your scripts properly? I spent day trying to setup them with no success.
Comment by Agostino Curti — March 11, 2008 @ 12:09 pm
Hello Pavel,
I downloaded/installed Linguine Watch 1.2 because I think this is a very powerful tool. I configured the props file and I checked the software, but during the startup tasks I see the error in attach.. It seems the master connection works fine, but the slave is unreachable… Do you have some idea ? My master/slave works fine and they are configured normally….
The only thing is: my 2 hosts are BOTH master and slave (Server1 <—> Server2). Is it a problem for Linguine ?
The error message is: “The slave reports to a different master 1422432106.”
Thanks for your help and my compliments for the software.
Agostino Curti
Comment by Pavel Simakov — March 11, 2008 @ 12:17 pm
Hi,
Thank you for using Linguine Watch!
This example of MySQL replication monitor assumes one master and one slave one way replication (master -> slave). The monitoring of two way replication needs slightly different approach.
You need to run two replication monitors one for server1 -> server2 and another one for server2 -> server1. Each master must have different master status table name, for example: _MASTERSTATUS_1 and _MASTERSTATUS_2. These table names can be setup in MySqlReplTest.props.
Everything should run well after that.
- Pavel Simakov
Comment by maplebed — March 18, 2008 @ 1:18 pm
re: yingkuan – seconds behind master has a few characteristics that make it less than ideal in more complicated situations.
* it only has a value when the slave is running, so if there is an error that stops slaving you no longer know how far the slave is behind the master
* if you are using chained replication (A -> B -> C), seconds behind on C reports the difference between C and B, not between C and A (which is often what you are actually interested in). If B falls behind, C may report that it is up to date, when in reality it is as far behind as B.
Comment by Bryon B — April 7, 2009 @ 8:49 pm
Has anyone tried running this agent as a service – either in Windows or Linux? If so, I’d like to find how you set it up, so you can start/stop it gracefully, etc.
Comment by Bryon B — April 8, 2009 @ 8:21 pm
Thanks to a tip from Pavel, I was able to get the agent running as a Windows service without too much difficulty, using the Community version of the Java Service Wrapper ( http://wrapper.tanukisoftware.org ).
I used the WrapperSimpleApp Integration method, and that seems to be working. It doesn’t do a clean shutdown, as the service wrapper forcibly terminates the JVM to shut it down, but it seems to work. Now I can start/stop the monitoring agent through the Windows Services Manager. Thanks for the tip Pavel!
Comment by Omry — August 12, 2009 @ 4:31 pm
Hi,
I created a really flexible and easy to use replication monitor called MySQL Monk.
take a look at it:
http://projects.firefang.net/wiki/MySQLMonk
Comment by lecteur de cartes — December 14, 2009 @ 7:18 am
MySQL replication allows you to have an exact copy of a database from a master server on another server (slave), and all updates to the database on the master server are immediately replicated to the database on the slave server so that both databases are in sync.