MySQL Master Switch¶

Archived (pre-2022)

Preserved for reference only -- likely outdated. View original | Last updated: March 2017

Master switch is done by running the master_switch script. See below for a functionality overview.

Performing a master switch¶

First run this on the current master to pause archiving. (Archiving will be re-started when cheffing with the new roles, see below)

touch /tmp/pt-archiver-sentinel /tmp/pt-cleaner-sentinel

Then simply run this command to eg switch the current master to dbs002:

master_switch --newmaster=dbs002.prd.fyber.com

Note:* Always use the fqdn for the new host!
Note2: Always run the script as a regular user*, never as root! The user must have full sudo rights on all servers, however.

Tip:* Run mytop in a separate screen on the master to keep an eye for possible deadlocks (see Troubleshooting*).

The script will run a sanity check, then ask for confirmation. After confirming it will perform the necessary steps to make dbs002 master (see below).

It doesn't actually matter on which host the script is run. I () have made it a habit to always run the script on the soon-to-become-master.

Troubleshooting / Fixing deadlocks¶

Due to some unfortunate code paths in our applications (some parts still call external services within database transactions) a deadlock situation might occur while trying to set the master read only.

A clear indication of this the script apparently getting stuck and queries piling up in mytop.

Solution: use pt-kill on the master to remove offenders. Repeat if necessary:

sudo pt-kill --host 127.0.0.1 --port 3306 --run-time 1 --busy-time 5 --interval 1 --kill --victims all --ignore-user root

Finalizing the switch in chef¶

After the switch both servers (old and new master) have to be cheffed to update their local configuration:

Edit roles/dbs00X.json for the new master, replace role[db_slave] with role[db_master]
Edit roles/dbs00X.json for the old master, replace role[db_master] with role[db_slave]
Chef the old master
Chef the new master
Chef mon003 (to make monitoring aware of the change).

How the master_switch script works¶

Running the script without parameters simply displays the current configuration:

rwunderer@dbs004:~$ master_switch

Checking configuration... 

dbs003.prd.fyber.com dbs002.prd.fyber.com dbs004.prd.fyber.com dbs005.prd.fyber.com  
WARN: No mha_manager host found!


####################### 
Current configuration: 

MASTER is dbs002.prd.fyber.com 
STATSDB is dbs004.prd.fyber.com 
STATSDB2 is dbs003.prd.fyber.com 
MHA_MANAGER is  


Current layout as reported by mha_manager: 

 +--dbs003.prd.fyber.com(10.99.36.33:3306)
 +--dbs004.prd.fyber.com(10.99.36.34:3306)
 +--dbs005.prd.fyber.com(10.99.36.35:3306)

(ignore the warning about the missing mha_manager, it would only be needed for automated failover)

Using --help gives an overview of the possible parameters:

rwunderer@dbs004:~$ master_switch --help

Usage:
    /usr/local/bin/master_switch [OPTIONS] --new-master=name

    -h, --help              Display this help and exit
    -q, --quiet             Do not display information about actions performed
    -f, --force             Continue even though sanity check of environment fails
    --newmaster             Name of new master host

Steps performed¶

Sanity check. If any of the following preconditions are not met, the script will abort. --force will override this and continue in any case
master, statsdb and statsdb2 can be identified correctly
statsdb and statsdb2 are not the same as master
the new master is actually different from current master
the master will not be moved to current statsdb or statsdb2
SET READ_ONLY=1 on the old master (see Troubleshooting above)
Take down the VIP on the old master ( ip addr del )
Re-configure replication to have slaves on all servers point to the new master
SET READ_ONLY=0 on the new master
Take up the VIP on the new master ( ip addr add ) and use arping -U to clear arp caches on neighboring nodes/switches.