
Beehive Infrastructure

Archived (pre-2022)

Preserved for reference only -- likely outdated. Last updated: September 2018

General Overview

The Beehive project aims to gather a set of tools and integrations to consolidate ad serving across all Fyber entities. The primary goal of the Beehive project is to facilitate the integration of the different Fyber platforms and provide one common, unified platform for serving ads.

Architecture Overview

The main Beehive module is the "Bidding Agent", which determines the winning bid and dispatches it to a third-party provider named BeesWax.

Beehive relies on the BeesWax platform to handle most of the campaigns. BeesWax additionally exposes features such as targeting, frequency capping, and budgeting.

BeesWax Architecture

BeesWax runs fully in AWS (mainly in 3 regions: US-EAST-1, US-WEST-2 and EU-WEST-1).

It essentially uses streaming events managed by the Amazon Kinesis Streams service, which collects large numbers of data streams in real time.

Read more: Data Feeds

Beehive Integration Architecture

The Bidding Agent relies on capping information generated by different components, as follows:

LICP (Line Item Capper Preparator) is part of the Beehive project but runs in a different environment (the AWS RTB environment). It essentially:

  • Provides capping information by generating a CDB file.
  • Uploads the CDB files to S3 periodically; the Bidding Agent consumes them once available.
  • Runs on a schedule managed by Airflow jobs.
  • Uses data generated by Druid (running in the Fyber RTB environment).

LIDP (Line Item Data Preparator) is part of the Beehive project but runs in a different environment (the AWS RTB environment). It essentially:

  • Provides line item data by generating a CDB file.
  • Uploads the CDB files to S3 periodically; the Bidding Agent consumes them once available.
  • Runs on a schedule managed by Airflow jobs.

ASP (Ad Scorer Preparator) is part of the Beehive project but runs in a different environment (the AWS RTB environment). It essentially:

  • Provides score information (per ad) by generating a CDB file.
  • Uploads the CDB files to S3 periodically; the Bidding Agent consumes them once available.
  • Runs on a schedule managed by Airflow jobs.
  • Uses data generated by Druid (running in the Fyber RTB environment).
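The three preparators share the same pattern: periodically build a key-value snapshot, write it to a CDB file, upload it to S3 and let the Bidding Agent consume it once available. A minimal local sketch of that pattern (using JSON in place of the real CDB format, and omitting the S3 upload, to keep it self-contained; field names are illustrative assumptions):

```python
import json
import tempfile
from pathlib import Path

def prepare_snapshot(rows):
    """Build the key -> value map a preparator (LICP/LIDP/ASP) would emit.
    `rows` stands in for data pulled from Druid; field names are assumptions."""
    return {row["line_item_id"]: row["value"] for row in rows}

def write_snapshot(snapshot, path):
    """Write the snapshot to disk. The real preparators emit CDB files
    (and upload them to S3); JSON is used here only for the sketch."""
    Path(path).write_text(json.dumps(snapshot))

def load_snapshot(path):
    """What the Bidding Agent side does once the file is available."""
    return json.loads(Path(path).read_text())

rows = [{"line_item_id": "li-1", "value": 100},
        {"line_item_id": "li-2", "value": 250}]
out = Path(tempfile.mkdtemp()) / "line_item_caps.json"
write_snapshot(prepare_snapshot(rows), out)
assert load_snapshot(out) == {"li-1": 100, "li-2": 250}
```

The consumer side simply polls for the newest snapshot and swaps it in, which is why the bullet points above stress "periodically" and "once available".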

Beehive Infrastructure

As mentioned in the previous section, the Beehive project comprises different environments running several components, which can be summarised as follows:

  • BeesWax: Third-party provider acting as an RTB-as-a-Service platform (managed by BeesWax in AWS)
  • Bidding Agent: Running in Fyber Core Environment (AWS)
  • LIDP: Running in Fyber RTB Environment (AWS)
  • LICP: Running in Fyber RTB Environment (AWS)
  • ASP: Running in Fyber RTB Environment (AWS)
  • Druid: Running in Fyber RTB Environment (AWS)

The described items and workflow can be summarised in the following diagram:

Screen Shot 2018-08-31 at 14.54.41.png

The following sections examine each component in more detail from an infrastructure perspective.

Bidding Agent

Environments

AWS Environment: Fyber Core
AWS Staging: 399797994004
AWS Production: 767648288756
AWS Active Regions: EU-WEST-1; US-EAST-1; US-WEST-2
SSH Access: SAML (Active Directory)

Read More about Fyber AWS accounts: Here →

Cluster Setup

The Bidding Agent runs as the 'aws-production-bidding-agent' service in an ECS cluster. The cluster service runs 4 different containers:

  • bidding-agent: docker container running the bidding agent application.
  • consul-client: docker container running consul for service discovery.
  • haproxy-consul-template: docker container running HAProxy with dynamic config through consul-template.
  • statsite: docker container running statsite as a metrics aggregator (pushes container metrics to graphite).

Task Definition and configuration values can be found → here.
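For orientation, the shape of that four-container task definition can be sketched as follows. Container names are taken from the list above; everything else (family name layout, the `essential` flags) is an illustrative assumption, not the actual task definition:

```python
# Sketch of the BA ECS task definition structure.
# Container names come from the documented list; flags are assumptions.
task_definition = {
    "family": "aws-production-bidding-agent",
    "containerDefinitions": [
        {"name": "bidding-agent", "essential": True},
        {"name": "consul-client", "essential": True},
        {"name": "haproxy-consul-template", "essential": True},
        {"name": "statsite", "essential": False},
    ],
}

names = [c["name"] for c in task_definition["containerDefinitions"]]
assert names == ["bidding-agent", "consul-client",
                 "haproxy-consul-template", "statsite"]
```

In ECS, a failure of any `essential` container stops the whole task, which is relevant to troubleshooting point 1 below.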

Cluster Deployment

The Bidding Agent infrastructure deployment is managed by Terraform across the staging and production environments, in 3 regions each:

  • Production: eu-west-1, us-east-1, us-west-2
  • Staging: eu-west-1, us-east-1, us-west-2

Code State can be found here Terraform →

Code Module can be found here Terraform →

To deploy/update resources for the BA:

  • Point to the infrastructure as code repository.
  • Choose the target region and environment.
  • Use the built-in plan and apply terraform wrapper.

E.g., to deploy/update the BA in the production environment targeting the eu-west-1 region, execute the following command line:

$ bundle exec rake "terraform:plan_and_apply[bst,production-eu-west-1]"

The format of the plan_and_apply arguments for the BST state is: [bst,ENVIRONMENT-REGION]

where:

  • ENVIRONMENT: production or staging
  • REGION: eu-west-1, us-east-1 or us-west-2
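A small helper that composes and validates that rake argument (a sketch; the allowed environments and regions are taken from the lists above):

```python
VALID_ENVIRONMENTS = {"production", "staging"}
VALID_REGIONS = {"eu-west-1", "us-east-1", "us-west-2"}

def plan_and_apply_task(environment, region, state="bst"):
    """Compose the rake task string for the terraform wrapper,
    validating the environment and region first."""
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    if region not in VALID_REGIONS:
        raise ValueError(f"unknown region: {region}")
    return f"terraform:plan_and_apply[{state},{environment}-{region}]"

# Matches the example command above:
print(plan_and_apply_task("production", "eu-west-1"))
# terraform:plan_and_apply[bst,production-eu-west-1]
```

Validating before running `bundle exec rake` avoids a failed terraform run against a non-existent state.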

Resources definition

Each deployed BA per region provides at least the following resources (as stated in the state file):

  • S3 Buckets
  • Kinesis Streams (Data Streams)
  • ECS (backed by SpotInst Scaling Groups)

It is important to note that each state might contain different resources (for example, streams left in commented-out lines), depending on the requirements of the Beehive project.

Common resources defined in the main.tf file might also be commented out when deploying/updating the BA in a target region and environment, depending on the project requirements.

Bidding Agent Access

To access a Bidding Agent instance, use the terraform rake wrapper command, pointing to the desired region and environment.

E.g., to list the active and online bidding agent instances in the production environment targeting the eu-west-1 region, run the following rake command:

$ bundle exec rake "ssh[production,eu-west-1,aws,bid]"
Using AWS profile 'saml'...
Using environment 'production'...
Using region 'eu-west-1'...
Using namespace 'aws'...
Using name 'bid'...
aws-ops-bastion-host -> aws-bidding-agent (c) | ssh -A -t  ubuntu@52.209.168.254 ssh -A -t  ubuntu@10.37.17.29 ssh -A -t  ec2-user@10.37.229.15
aws-ops-bastion-host -> aws-bidding-agent (c) | ssh -A -t  ubuntu@52.209.168.254 ssh -A -t  ubuntu@10.37.17.29 ssh -A -t  ec2-user@10.37.238.103
aws-ops-bastion-host -> aws-bidding-agent (a) | ssh -A -t  ubuntu@52.209.168.254 ssh -A -t  ubuntu@10.37.17.29 ssh -A -t  ec2-user@10.37.191.61
aws-ops-bastion-host -> aws-bidding-agent (b) | ssh -A -t  ubuntu@52.209.168.254 ssh -A -t  ubuntu@10.37.17.29 ssh -A -t  ec2-user@10.37.217.41
aws-ops-bastion-host -> aws-bidding-agent (c) | ssh -A -t  ubuntu@52.209.168.254 ssh -A -t  ubuntu@10.37.17.29 ssh -A -t  ec2-user@10.37.235.242
aws-ops-bastion-host -> aws-bidding-agent (a) | ssh -A -t  ubuntu@52.209.168.254 ssh -A -t  ubuntu@10.37.17.29 ssh -A -t  ec2-user@10.37.185.116

To access one of the BA instances, simply copy and paste one of the listed ssh commands for a specific instance:

$ ssh -A -t  ubuntu@52.209.168.254 ssh -A -t  ubuntu@10.37.17.29 ssh -A -t  ec2-user@10.37.229.15
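The printed commands simply chain ssh hops (bastion host → gateway → instance), each with agent forwarding (-A) and a forced TTY (-t). A sketch of how such a chained command is assembled from a hop list (hosts taken from the output above; the hop roles in the comments are assumptions):

```python
def chained_ssh(hops):
    """Build a chained ssh command through intermediate hops.
    Each hop is (user, host); -A forwards the agent, -t forces a TTY
    so the next ssh in the chain gets an interactive terminal."""
    return " ".join(f"ssh -A -t {user}@{host}" for user, host in hops)

cmd = chained_ssh([
    ("ubuntu", "52.209.168.254"),   # bastion host (assumed role)
    ("ubuntu", "10.37.17.29"),      # intermediate gateway (assumed role)
    ("ec2-user", "10.37.229.15"),   # bidding agent instance
])
print(cmd)
```

Agent forwarding is what lets each inner hop authenticate without copying private keys onto the bastion.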

As noted in the previous section, the BA runs in a docker container, and the container cluster is managed by the ECS service. As described in the Cluster Setup section, the ECS cluster running the BA also defines the statsite, haproxy-consul-template and consul-client containers. These can be listed on the accessed instance using the following command:

$ docker ps -a
CONTAINER ID        IMAGE                                                                                    COMMAND                  CREATED             STATUS              PORTS                                                                                                                                NAMES
cbe5118b032d        767648288756.dkr.ecr.eu-west-1.amazonaws.com/production.bidding-agent:latest             "bin/docker-entryp..."   2 days ago          Up 2 days           0.0.0.0:8080->8080/tcp, 0.0.0.0:9010->9010/tcp, 8888/tcp                                                                             ecs-aws-production-bidding-agent-22-aws-production-bidding-agent-98e2e8b7c1e4e8ad3d00
18122c226966        767648288756.dkr.ecr.eu-west-1.amazonaws.com/production.statsite:latest                  "/docker-entrypoin..."   2 days ago          Up 2 days           0.0.0.0:8125->8125/tcp, 0.0.0.0:8125->8125/udp                                                                                       ecs-aws-production-bidding-agent-22-aws-production-statsite-8ca68be58d8cb8c92c00
3c21dc45d2ac        767648288756.dkr.ecr.eu-west-1.amazonaws.com/production.haproxy-consul-template:latest   "entrypoint.sh"          2 days ago          Up 2 days           0.0.0.0:1936->1936/tcp, 0.0.0.0:2003->2003/tcp, 0.0.0.0:3000->3000/tcp, 0.0.0.0:8081->8081/tcp, 0.0.0.0:9092->9092/tcp               ecs-aws-production-bidding-agent-22-aws-production-haproxy-consul-template-eae8aa9dd9feddfa8401
0c72995f7001        767648288756.dkr.ecr.eu-west-1.amazonaws.com/production.consul-client:latest             "docker-entrypoint..."   2 days ago          Up 2 days           8300/tcp, 8302/tcp, 0.0.0.0:8301->8301/tcp, 8400/tcp, 8302/udp, 8600/tcp, 0.0.0.0:8500->8500/tcp, 0.0.0.0:8301->8301/udp, 8600/udp   ecs-aws-production-bidding-agent-22-aws-production-consul-client-9489bfa8c9b7b9ef4700
0f5d4f3bf521        amazon/amazon-ecs-agent:latest                                                           "/agent"                 2 days ago          Up 2 days                                                                                                                                                ecs-agent

To access a given container, use the following command on the running instance:

$ docker exec -it cbe5118b032d bash

daemon@bidding-agent:/opt/docker$

Command format:

docker exec -it CONTAINER_ID bash, where CONTAINER_ID is the container ID listed by the `docker ps -a` command.

Bidding Agent - SpotInst:

As bidding agent instances are backed by spot instances, a third-party provider, SpotInst, manages instance bidding in AWS and can be accessed here.

The BA spot instance groups are defined as Elastigroups and exist in both the staging and production environments (defined in SpotInst terminology as Organisations):

screenshot.png

Editing the configuration of any EG can be done simply by selecting the EG in question and editing the required settings from the Actions button.

E.g., updating the BA scaling group capacity can be performed from the SpotInst dashboard via the Actions drop-down menu → Edit Capacity.

Important Note: Scaling groups for instances backed by SpotInst, including the BA, can be controlled only via the SpotInst dashboard and not directly via the AWS console.

screenshot (2).png

Bidding Agent - Monitoring and Logging

There are different ways to monitor the BA clusters in AWS; check the following options:

Grafana/Graphite

As mentioned previously, each BA task definition includes a statsite container for metrics collection. Each operational region has a central Graphite instance where collected metrics are pulled and aggregated. For the Beehive project, Grafana is used for graph visualisation, covering the bidding agent and related components.

The main production dashboard is located here:

screenshot (6).png

Most of the published metrics are aggregated by a per-region Graphite instance. To access any Graphite instance in AWS, run the terraform wrapper SSH command line with an additional port-forwarding argument (the default Graphite web interface port) as follows:

$ bundle exec rake "ssh[production,eu-west-1,aws,graphite,80]"
Using AWS profile 'saml'...
Using environment 'production'...
Using region 'eu-west-1'...
Using namespace 'aws'...
Using name 'graphite'...
Forwarding hostports '["80"]'...
aws-ops-bastion-host -> aws-graphite (a) | ssh -A -t -L 80:localhost:4366 ubuntu@52.209.168.254 ssh -A -t -L 4366:localhost:4366 ubuntu@10.37.17.29 ssh -A -t -L 4366:localhost:80 ec2-user@10.37.20.16

Running the following command and browsing to 'localhost:80' exposes the Graphite dashboard with detailed metrics per aggregation instance:

$ ssh -A -t -L 80:localhost:4366 ubuntu@52.209.168.254 ssh -A -t -L 4366:localhost:4366 ubuntu@10.37.17.29 ssh -A -t -L 4366:localhost:80 ec2-user@10.37.20.16

→ localhost:80

Screen Shot 2018-09-03 at 16.54.29.png

CloudWatch

Both the production and staging environments expose the same setup for monitoring and logging the ECS cluster running the bidding agent using CloudWatch.

For monitoring with CW, metrics are gathered for all cluster instances and can be explored per instance running the BA application, from either the EC2 or ECS dashboards:

screenshot (3).png

screenshot (4).png

Additionally, CW is mainly used for logging: each cluster container's logs are shipped to a CW log group. These can be accessed from the CW dashboard → Logs, by selecting the Log Stream name.

The following example exposes the BA Log streams in CW:

screenshot (5).png

Another important monitoring spot in CW is the load balancer behind which the BA instances run:

screenshot (7).png

SpotInst:

The SpotInst dashboard also provides useful monitoring and logging views for each Elastigroup:

  • Monitoring: General monitoring information for the whole cluster group: number of requests, latency and number of errors (5XX and 4XX):

screenshot (8).png

  • Logging: Instance state information when joining and leaving the cluster (ELB/ALB registration and de-registration):

Screen Shot 2018-09-03 at 17.11.30.png

Bidding Agent - Troubleshooting

  1. Instance failure to join the Load Balancer - Container Service Failure:

One of the common issues with the bidding agent is a failure to join the BA ELB. This can be due to a failure to start the bidding agent ECS service's task definition on one of the instances.

If one of the docker containers fails to start (e.g., the consul-client container fails to start and join the consul server), the task will fail and the instance won't join the load balancer.

For fast troubleshooting, make sure to check the following points:

  • Check in the ECS service which task is failing and on which instances.
  • Check in the Logs tab of the ECS cluster service which containers are running and which have stopped.
  • If no obvious information is provided, go to CloudWatch → Logs and check the latest log lines for the BA service. Note that each Log Stream line provides logs per container as stated in the task definition:
      • bidding-agent container (followed by created ID)
      • consul-client container (followed by created ID)
      • haproxy-consul-template container (followed by created ID)
      • statsite container (followed by created ID)
      • instance messages
      • ecs-agent container

Screen Shot 2018-09-04 at 13.57.54.png

  2. Instance failure to join the Load Balancer - SpotInst instance type not available:

Another issue can arise when the instance count for the BA instance types runs low. This can be due to a lack of the required BA instance types in the Spot Market, which depends on regional distribution:

Screen Shot 2018-09-04 at 14.13.48.png

Note: Instance type pools for SpotInst can be updated and deployed on request in the terraform code here →

For fast troubleshooting of such an issue, update the spot instance type list from the SpotInst dashboard as follows:

Screen Shot 2018-09-04 at 14.18.11.png

Then add instance type alternatives from the drop-down list in the configuration menu:

Screen Shot 2018-09-04 at 14.18.26.png

Click Update; the cluster will then deploy and join newly available spot instances. It takes a few minutes for the new cluster configuration to deploy and take effect.

Screen Shot 2018-09-04 at 14.18.37.png

  3. Increased error requests hitting the BA endpoint:

The BA cluster is adjusted to X instances per region (handling between 7.5-10K requests in production, backed by 6 instances). If 4xx/5xx rates increase, check the following points:

  • Healthy Hosts: If the number of healthy hosts decreases, check the ECS cluster status as described in points 1 and 2 of this troubleshooting section.
  • Backend errors (including 4xx and 5xx): The ELB could start dropping requests (with a fixed number of X healthy hosts); increase the ASG capacity in the SpotInst dashboard (Manage capacity of the cluster) as follows:
    Screen Shot 2018-09-05 at 12.08.01.png
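The decision logic described in these two bullet points can be sketched as a simple capacity helper (the error-rate threshold and scale-out step are illustrative assumptions, not production values):

```python
def target_capacity(healthy_hosts, desired_hosts, error_rate,
                    error_threshold=0.05, step=2):
    """Suggest a new desired capacity for the BA Elastigroup.

    - If healthy hosts dropped below the desired count, inspect the
      ECS cluster first (troubleshooting points 1 and 2).
    - If hosts are healthy but the 4xx/5xx rate climbs past the
      threshold, scale out via the SpotInst dashboard.
    Threshold and step values here are assumptions for illustration.
    """
    if healthy_hosts < desired_hosts:
        return desired_hosts, "check ECS cluster status first"
    if error_rate > error_threshold:
        return desired_hosts + step, "scale out via SpotInst dashboard"
    return desired_hosts, "no change"

assert target_capacity(6, 6, 0.10) == (8, "scale out via SpotInst dashboard")
assert target_capacity(4, 6, 0.01) == (6, "check ECS cluster status first")
```

The ordering matters: scaling out while hosts are failing ECS health checks only adds more unhealthy instances, so the cluster status check comes first.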