
MySQL in AWS

Imported from Confluence

Content may be outdated. Verify before following any procedures. Last updated: March 2022

General Setup

Server roles

At all times each server is assigned a distinct role, as shown in the following diagram. All non-master servers replicate directly from the master.

See the top of the MON-MySQL dashboard in AWS Grafana to see which server currently holds which role.

[Diagram: MySQL_in_AWS-2.png — server roles and replication topology]

Role VIP (virtual IP address) Description
master db.prd-aws.fyber.com Main MySQL server. All writes and most reads from all applications go to this server.
statsdb statsdb.prd-aws.fyber.com "Statistics DB". All lengthy background jobs (e.g. ams, kafka_connect, sqoop) run their queries against this server. Since late 2016 some heavy queries from frontend applications also use statsdb. Queries against recent_engagements, which in total use most of the database CPU, are split in a configurable ratio between master and statsdb.
standby - Standby slave. In emergency situations (e.g. hardware failure on master or statsdb) this server is assigned the respective role. Standby is also used for schema changes on big tables: the schema change (which can run for hours up to days, depending on the table) is executed on standby. Then roles are switched, making another server the standby so the schema change can be applied there, and so on. Only when the schema change has been executed on all 5 servers can the additional column actually be used by the application.
archive - This role is permanently assigned to mysql4. The server holds both a live copy of the main database and an archive database (controlled by a separate MySQL instance on port 3316). See MySQL data archiving. Nightly backups of the main database are also run on this server. See MySQL Backup.

Server <-> Role assignment

Due to instance type differences (see below) not all servers are suitable for all roles. See also DB Role Switch.

Server AZ master statsdb/statsdb2 archive
mysql1 eu-west-1a Y Y -
mysql2 eu-west-1b Y Y -
mysql3 eu-west-1c -*) Y -
mysql4 eu-west-1a - - Y

*) Note: In an emergency, should neither mysql1 nor mysql2 be usable for some reason, mysql3 could temporarily be made master. It will suffer, though, due to having less RAM available.

Instance types

We use the following instance types for these servers:

Server Instance type vCPU RAM EBS (Database) attached to EBS (Logs) attached to EBS (Archive) attached to Local Data Storage Local Log Storage Additional notes
mysql1 x1.16xlarge 64 976GB /storage /dblogs - (not used) (not used)
mysql2 x1.16xlarge 64 976GB /storage /dblogs - (not used) (not used)
mysql3 r4.16xlarge 64 488GB /storage /dblogs - n.a. n.a.
mysql4 r4.16xlarge 64 488GB /storage /dblogs /archive n.a. n.a. Daily EBS snapshots of both database and archive volumes

Master/Role switching

How to switch roles

For step-by-step instructions see: Master Switch Cookbook.

The procedure to switch master or statsdb roles from one server to the other is basically the same as in the old DC, described in these articles:

MySQL Master Switch

MySQL Statsdb Switch

  • The only difference is that for the virtual IP (VIP) we use a technique called "Overlay IP Addresses" (see below).
  • The basic configuration for Overlay IPs (i.e. disabling source/destination checks and giving all mysql instances the necessary permissions to manipulate the routing tables) is set up by Terraform in the mysql module.
  • Route table manipulation is not done by Terraform, however; it is handled by the switch_lib library script used by the master_switch script.

How Overlay IP Addresses work

The following description originally comes from 616 and is included here in case that page becomes unavailable:

AWS networking allows creating routing entries for routing tables, which direct all traffic for an IP address to an EC2 instance. This concept allows directing the traffic to any instance in a Virtual Private Network (VPC) no matter which subnet it is in and no matter which availability zone (AZ) it is in. Changing this routing entry for the subnets in a given VPC allows redirecting traffic when needed. This concept is known as “IP Overlay” routing in AWS. It is normally being used in a static way for routers and Network Address Translation (NAT) instances. Overlay IP routing can however be used in a dynamic fashion.

The diagram in Figure X shows a network topology in which this concept can be used. Two instances named node1 (EC2 instance i-a) and node2 (EC2 instance i-b) are connected to two different subnets. The two subnets belong to the same VPC in two different Availability Zones (AZ). It is not mandatory that the nodes are located in different availability zones and subnets, but it is desirable in many cases: failover nodes in high availability architectures should not share common failure root causes. Both subnets share the same routing table, named rtb_A.

The idea is to route traffic from on-premises consumers, or consumers within the VPC, to the IP address 10.2.0.1 in this case. It is important that this IP address lies outside the Classless Inter-Domain Routing (CIDR) block of the VPC.

It takes four steps to route traffic through an Overlay IP address to EC2 node1 or node2:

  1. Create a routing entry in the routing table which sends the traffic to the EC2 instance in question.
  2. Disable the source/destination check for network traffic to the two instances in the EC2 network management. By default, the AWS network does not deliver packets to instances which don't match the normal routing entries.
  3. Enable the operating system of the EC2 instances to accept these packets.
  4. The two EC2 instances typically monitor each other and initiate the routing change when needed. The EC2 instances require policies in their IAM roles which authorize them to make these changes in the routing table.
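The steps above can be sketched as a shell script. All IDs here are placeholders (assumptions, not our real resource IDs), and with DRY_RUN left at its default of 1 the script only prints the AWS CLI commands instead of executing them:

```shell
# Placeholder IDs -- substitute real values before running with DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

RTB="rtb-0123456789abcdef0"   # routing table shared by both subnets (assumed)
VIP_CIDR="10.2.0.1/32"        # overlay IP, outside the VPC CIDR block
NODE1="i-0aaaaaaaaaaaaaaaa"
NODE2="i-0bbbbbbbbbbbbbbbb"

PLAN="$(
  # Step 1: create the route, initially pointing at node1
  run aws ec2 create-route --route-table-id "$RTB" \
      --destination-cidr-block "$VIP_CIDR" --instance-id "$NODE1"

  # Step 2: disable the source/destination check on both instances
  for i in "$NODE1" "$NODE2"; do
    run aws ec2 modify-instance-attribute --instance-id "$i" --no-source-dest-check
  done

  # Later, to fail the VIP over to node2 (the switch performed in step 4):
  run aws ec2 replace-route --route-table-id "$RTB" \
      --destination-cidr-block "$VIP_CIDR" --instance-id "$NODE2"
)"
echo "$PLAN"
```

Step 3 (accepting the packets on the instance's OS) is covered further below under "Configure the Network Interfaces".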

Creating and managing the routing Entries

The AWS command line interface (AWS-CLI) allows creating such a route with the command:

aws ec2 create-route --route-table-id ROUTE_TABLE --destination-cidr-block CIDR --instance-id INSTANCE

Here ROUTE_TABLE is the identifier of the routing table which needs to be modified, CIDR is the IP address together with its prefix (for a single host, a /32), and INSTANCE is the node to which the traffic gets directed.

Once the route exists it can be changed whenever traffic is supposed to be routed to a different node with the command:

aws ec2 replace-route --route-table-id ROUTE_TABLE --destination-cidr-block CIDR --instance-id INSTANCE

Such a route entry may also need to be deleted. This happens with the command:

aws ec2 delete-route --route-table-id ROUTE_TABLE --destination-cidr-block CIDR

It may also be important to check the current status of the routing table. This can be done with the command:

aws ec2 describe-route-tables --route-table-ids ROUTE_TABLE

The output lists all routing entries; filter out the line with the CIDR in question.
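Rather than eyeballing the output, the current target instance can be extracted with standard text tools. A self-contained sketch on an abbreviated, made-up JSON sample (in this sample only the overlay route carries an InstanceId):

```shell
# Abbreviated, invented sample of `aws ec2 describe-route-tables` JSON output.
json='{"RouteTables":[{"Routes":[
 {"DestinationCidrBlock":"10.0.0.0/16","GatewayId":"local"},
 {"DestinationCidrBlock":"10.2.0.1/32","InstanceId":"i-0aaaaaaaaaaaaaaaa"}]}]}'

# Crude extraction, good enough for this shape of output
target=$(printf '%s\n' "$json" | grep -o '"InstanceId":"[^"]*"' | cut -d'"' -f4)
echo "overlay IP currently routed to: $target"

# On a real system the filtering can also be done server-side with JMESPath:
#   aws ec2 describe-route-tables --route-table-ids ROUTE_TABLE \
#     --query "RouteTables[].Routes[?DestinationCidrBlock=='10.2.0.1/32'].InstanceId" \
#     --output text
```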

Disable the Source/Destination Check for the Failover Instances

The source/destination check can be disabled through the EC2 console via the instance's pull-down menu; this has to be done for both EC2 instances. The same operation can be performed through scripts using the AWS command line interface (AWS-CLI). The following command needs to be executed once for each instance which is supposed to receive traffic from the Overlay IP address:

aws ec2 modify-instance-attribute --instance-id EC2-INSTANCE --no-source-dest-check

The system on which this command gets executed temporarily needs a role with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1424870324000",
      "Effect": "Allow",
      "Action": [ "ec2:ModifyInstanceAttribute"],
      "Resource": [
       "arn:aws:ec2:REGION:ACCOUNT-ID:instance/INSTANCE-A",
       "arn:aws:ec2:REGION:ACCOUNT-ID:instance/INSTANCE-B"
      ]
    }
  ]
}

Replace the placeholders REGION, ACCOUNT-ID, INSTANCE-A and INSTANCE-B with the actual region, account identifier and the identifiers of the two EC2 instances.

Configure the Network Interfaces to receive the Network Traffic of the Overlay IP Address

Linux systems need the overlay IP address to be configured as a secondary IP address on their standard interface eth0. This can be achieved with the command:

ip address add OVERLAY-IP/CIDR dev eth0 label eth0:1

The tools for making the secondary IP address permanent vary across Linux distributions. Please consult the respective distribution's documentation for the commands.
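As one illustration only (RHEL/CentOS-style network scripts; other distributions differ), the alias can be persisted with an ifcfg file. In this sketch CFG_DIR defaults to the current directory so it can be tried without root; on a real host it would be /etc/sysconfig/network-scripts:

```shell
# Hypothetical persistence sketch for RHEL/CentOS-style distributions.
# CFG_DIR defaults to "." so the sketch runs without root; on a real host
# it would be /etc/sysconfig/network-scripts.
CFG_DIR="${CFG_DIR:-.}"
OVERLAY_IP="10.2.0.1"   # placeholder overlay IP

cat > "$CFG_DIR/ifcfg-eth0:1" <<EOF
DEVICE=eth0:1
IPADDR=$OVERLAY_IP
PREFIX=32
ONBOOT=yes
EOF

echo "wrote $CFG_DIR/ifcfg-eth0:1"
```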

Enable the Instances to change the Routes

Switching routes from node to node typically happens in a failover cluster. Failover clusters with two nodes monitor each other and take action when the other node no longer appears to be alive. The following policy has to be applied to the EC2 instances which are supposed to monitor each other and be allowed to switch the route when needed:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1424860166260",
      "Action": [
        "ec2:CreateRoute",
        "ec2:DeleteRoute",
        "ec2:DescribeRouteTables",
        "ec2:ReplaceRoute"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
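The monitor-and-switch logic these permissions enable can be sketched as follows. All names are placeholders and this is not our actual switch_lib logic; PING_CMD defaults to `false` here so the failover branch is demonstrated without a live peer, while on a real node it would be something like "ping -c1 -W2":

```shell
# Placeholder IDs -- illustrative only, not the production switch script.
RTB="rtb-0123456789abcdef0"
VIP_CIDR="10.2.0.1/32"
SELF_ID="i-0bbbbbbbbbbbbbbbb"
PEER_IP="10.0.1.10"

peer_alive() {
  # In production: PING_CMD="ping -c1 -W2". Defaults to `false` in this sketch.
  ${PING_CMD:-false} "$PEER_IP" >/dev/null 2>&1
}

if peer_alive; then
  action="peer alive, nothing to do"
else
  # Pull the overlay IP over to this node (printed here instead of executed)
  action="aws ec2 replace-route --route-table-id $RTB --destination-cidr-block $VIP_CIDR --instance-id $SELF_ID"
fi
echo "$action"
```

In a real cluster this check would run in a loop (or from a cluster manager) and only fire after several consecutive failures, to avoid flapping on a single lost ping.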

PMM

General

PMM (Index) is part of the mysql module, as we use it for MySQL only.

Basically PMM is a pre-configured Prometheus/Grafana instance with MySQL/Innodb specific metrics and dashboards. Team-ssh keys are added to the instance with cloud-init as usual.

PMM-specific instance and security group configuration is included in a separate file pmm.tf. Relevant variables (currently 2) are prefixed with pmm_.

After the Terraform installation one has to connect to the instance at least once on port 80 to set a username/password for linking pmm-client. (Note: set the username to pmm, as currently that's what the client uses.)

For the same reason mysql servers currently have to be added manually to PMM using these commands:

sudo yum install pmm-client
sudo pmm-admin config --server-password=${password} --server=${pmm_instance_ip}
sudo pmm-admin add mysql

Hint: if some metrics are missing and http://../prometheus/targets shows context deadline exceeded for some scrapers, it might help to disable table stats:

sudo pmm-admin remove mysql:metrics
sudo pmm-admin add mysql:metrics --disable-tablestats

Grafana datasource

All data from PMM is available in our regular Grafana via the PMM prometheus datasource.

Storage expansion

Actual data is stored on a separate EBS volume using thin-provisioned LVM volumes. If the volume runs out of space, follow the steps described here:
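As a generic reminder only (placeholder volume group and LV names, not the lost original procedure), a typical expansion on a thin-provisioned LVM setup looks roughly like the following, assuming the underlying EBS volume was already enlarged in AWS first. With DRY_RUN at its default of 1 the commands are only printed:

```shell
# Generic LVM-on-EBS growth sketch; names like vg_data/thinpool are invented.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

PLAN="$(
  run pvresize /dev/xvdf                  # let LVM see the grown EBS volume
  run lvextend -L +100G vg_data/thinpool  # grow the thin pool
  run lvextend -L +100G vg_data/data      # grow the thin volume itself
  run resize2fs /dev/vg_data/data         # finally grow the filesystem
)"
echo "$PLAN"
```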

Ami