MySQL Master Switch¶
Archived (pre-2022)
Preserved for reference only -- likely outdated. View original | Last updated: March 2017
Master switch is done by running the master_switch script. See below for a functionality overview.
Performing a master switch¶
First run this on the current master to pause archiving. (Archiving will be re-started when cheffing with the new roles, see below)
Then simply run this command to eg switch the current master to dbs002:
Note:* Always use the fqdn for the new host!
Note2: Always run the script as a regular user*, never as root! The user must have full sudo rights on all servers, however.
Tip:* Run mytop in a separate screen on the master to keep an eye for possible deadlocks (see Troubleshooting*).
The script will run a sanity check, then ask for confirmation. After confirming it will perform the necessary steps to make dbs002 master (see below).
It doesn't actually matter on which host the script is run. I () have made it a habit to always run the script on the soon-to-become-master.
Troubleshooting / Fixing deadlocks¶
Due to some unfortunate code paths in our applications (some parts still call external services within database transactions) a deadlock situation might occur while trying to set the master read only.
A clear indication of this the script apparently getting stuck and queries piling up in mytop.
Solution: use pt-kill on the master to remove offenders. Repeat if necessary:
sudo pt-kill --host 127.0.0.1 --port 3306 --run-time 1 --busy-time 5 --interval 1 --kill --victims all --ignore-user root
Finalizing the switch in chef¶
After the switch both servers (old and new master) have to be cheffed to update their local configuration:
- Edit
roles/dbs00X.jsonfor the new master, replacerole[db_slave]withrole[db_master] - Edit
roles/dbs00X.jsonfor the old master, replacerole[db_master]withrole[db_slave] Chef the old masterChef the new masterChef mon003 (to make monitoring aware of the change).
How the master_switch script works¶
Running the script without parameters simply displays the current configuration:
rwunderer@dbs004:~$ master_switch
Checking configuration...
dbs003.prd.fyber.com dbs002.prd.fyber.com dbs004.prd.fyber.com dbs005.prd.fyber.com
WARN: No mha_manager host found!
#######################
Current configuration:
MASTER is dbs002.prd.fyber.com
STATSDB is dbs004.prd.fyber.com
STATSDB2 is dbs003.prd.fyber.com
MHA_MANAGER is
Current layout as reported by mha_manager:
+--dbs003.prd.fyber.com(10.99.36.33:3306)
+--dbs004.prd.fyber.com(10.99.36.34:3306)
+--dbs005.prd.fyber.com(10.99.36.35:3306)
(ignore the warning about the missing mha_manager, it would only be needed for automated failover)
Using --help gives an overview of the possible parameters:
rwunderer@dbs004:~$ master_switch --help
Usage:
/usr/local/bin/master_switch [OPTIONS] --new-master=name
-h, --help Display this help and exit
-q, --quiet Do not display information about actions performed
-f, --force Continue even though sanity check of environment fails
--newmaster Name of new master host
Steps performed¶
- Sanity check. If any of the following preconditions are not met, the script will abort.
--forcewill override this and continue in any case - master, statsdb and statsdb2 can be identified correctly
- statsdb and statsdb2 are not the same as master
- the new master is actually different from current master
- the master will not be moved to current statsdb or statsdb2
SET READ_ONLY=1on the old master (see Troubleshooting above)- Take down the VIP on the old master (
ip addr del) - Re-configure replication to have slaves on all servers point to the new master
SET READ_ONLY=0on the new master- Take up the VIP on the new master (
ip addr add) and usearping -Uto clear arp caches on neighboring nodes/switches.