Beehive Infrastructure¶
Archived (pre-2022)
Preserved for reference only and likely outdated. Last updated: September 2018
General Overview¶
The Beehive project gathers a set of tools and integrations to consolidate ad serving across all Fyber entities. Its primary goal is to facilitate the integration of the different Fyber platforms and provide one common, unified platform to serve ads.
Architecture Overview¶
The main Beehive module, named the "Bidding Agent", determines and dispatches the winning bid to a third-party provider named BeesWax.
Beehive relies on the BeesWax platform to handle most of the campaigns; BeesWax additionally exposes features such as targeting, frequency capping and budgeting.
BeesWax Architecture¶
BeesWax runs entirely in AWS (mainly in 3 regions: US-EAST-1, US-WEST-2 and EU-WEST-1).
It essentially uses event streaming managed by the Amazon Kinesis Streams service, which collects large numbers of data streams in real time.
Read more: Data Feeds
Beehive Integration Architecture¶
The Bidding Agent relies on capping, data and scoring information generated by the following components:
LICP (Line Item Capper Preparator) is part of the Beehive project but runs in a different environment (the AWS RTB environment). It essentially:
- Provides capping information by generating a CDB file.
- Uploads the CDB files to S3 periodically; the Bidding Agent consumes them once available.
- Runs on a schedule managed by Airflow jobs.
- Uses data generated by Druid (running in the Fyber RTB environment).
LIDP (Line Item Data Preparator) is part of the Beehive project but runs in a different environment (the AWS RTB environment). It essentially:
- Provides data information by generating a CDB file.
- Uploads the CDB files to S3 periodically; the Bidding Agent consumes them once available.
- Runs on a schedule managed by Airflow jobs.
ASP (Ad Scorer Preparator) is part of the Beehive project but runs in a different environment (the AWS RTB environment). It essentially:
- Provides score information (per ad) by generating a CDB file.
- Uploads the CDB files to S3 periodically; the Bidding Agent consumes them once available.
- Runs on a schedule managed by Airflow jobs.
- Uses data generated by Druid (running in the Fyber RTB environment).
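All three preparators share the same pattern: generate a CDB file, upload it to S3 periodically, and let the Bidding Agent pick it up once available. The sketch below illustrates that upload step; the bucket name, key layout and file names are assumptions, as the real values live in the Airflow job definitions.

```shell
# Sketch only: bucket name and key layout are assumptions -- the real
# paths are defined in the Airflow jobs for each preparator.
build_cdb_key() {
  local component="$1"          # licp, lidp or asp
  local ts
  ts=$(date -u +%Y%m%d%H%M)     # periodic runs keyed by UTC timestamp
  echo "${component}/cdb/${component}-${ts}.cdb"
}

# Example upload step a preparator job might run (left commented out):
# aws s3 cp line_items.cdb "s3://<beehive-bucket>/$(build_cdb_key licp)"
build_cdb_key licp
```

The Bidding Agent side then only needs to poll the prefix for the newest key, which is why the timestamp is encoded in the file name.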
Beehive Infrastructure¶
As mentioned in the previous section, the Beehive project comprises different environments running several components, summarised as follows:
- BeesWax: Third party provider acting as RTB As A Service platform (Managed by BeesWax in AWS)
- Bidding Agent: Running in Fyber Core Environment (AWS)
- LIDP: Running in Fyber RTB Environment (AWS)
- LICP: Running in Fyber RTB Environment (AWS)
- ASP: Running in Fyber RTB Environment (AWS)
- Druid: Running in Fyber RTB Environment (AWS)
The described items and workflow can be summarised in the following diagram:

The following sections examine each component from an infrastructure perspective in more detail.
Bidding Agent¶
Environments¶
| Setting | Value |
|---|---|
| AWS Environment | Fyber Core |
| AWS Staging | 399797994004 |
| AWS Production | 767648288756 |
| AWS Active Regions | EU-WEST-1 ; US-EAST-1; US-WEST-2 |
| SSH Access | SAML (Active Directory) |
Read More about Fyber AWS accounts: Here →
Cluster Setup¶
The Bidding Agent runs as the 'aws-production-bidding-agent' service in an ECS cluster. The cluster service runs 4 different containers:
- bidding-agent: docker container running the bidding agent application.
- consul-client: docker container running consul for service discovery.
- haproxy-consul-template: docker container running HAProxy with dynamic configuration via consul-template.
- statsite: docker container running statsite as a metrics aggregator (pushes container metrics to Graphite).
Task Definition and configuration values can be found → here.
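The registered task definition can also be inspected from the CLI. The family name below is an assumption derived from the service name; verify it first as shown in the comment.

```shell
# Family name is an assumption -- confirm the real one with:
#   aws ecs list-task-definition-families --region eu-west-1
aws ecs describe-task-definition \
  --region eu-west-1 \
  --task-definition aws-production-bidding-agent \
  --query 'taskDefinition.containerDefinitions[].name'
```

The query should list the four container names described above, which is a quick way to confirm which task definition revision is active.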
Cluster Deployment¶
The Bidding Agent infrastructure deployment is managed by Terraform in the staging and production environments, across 3 regions each:
- Production: eu-west-1, us-east-1, us-west-2
- Staging: eu-west-1, us-east-1, us-west-2
Code State can be found here Terraform →
Code Module can be found here Terraform →
To deploy/update resources for the BA:
- Point to the infrastructure as code repository.
- Choose the target region and environment.
- Use the built-in plan and apply terraform wrapper.
For example, to deploy/update the BA in the production environment targeting the eu-west-1 region, run the wrapper with the BST state arguments.
The format of the plan_and_apply arguments for the BST state is: [bst,ENVIRONMENT-REGION]
where:
- ENVIRONMENT: production or staging
- REGION: eu-west-1, us-east-1 or us-west-2
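Assuming the plan-and-apply wrapper follows the same rake pattern as the ssh command shown later in this page, the production/eu-west-1 example would look like the following; verify the exact task name against the repository's Rakefile.

```shell
# Hypothetical invocation -- BST state argument format: [bst,ENVIRONMENT-REGION]
$ bundle exec rake "plan_and_apply[bst,production-eu-west-1]"
```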
Resources definition¶
Each deployed BA per region provides at least the following resources (as stated in the state file):
- S3 Buckets
- Kinesis Streams (Data Streams)
- ECS (backed by SpotInst Scaling Groups)
Note that each state might define a different set of resources (for example, Kinesis Streams may be commented out), depending on the requirements of the Beehive project. Common resources defined in the main.tf file might likewise be commented out when deploying/updating the BA in a given region and environment.
Bidding Agent Access¶
To access a Bidding Agent instance, use the terraform rake wrapper command, pointing to the desired region and environment.
For example, to list the active and online bidding agent instances in the production environment targeting the eu-west-1 region, run the following rake command:
$ bundle exec rake "ssh[production,eu-west-1,aws,bid]"
Using AWS profile 'saml'...
Using environment 'production'...
Using region 'eu-west-1'...
Using namespace 'aws'...
Using name 'bid'...
aws-ops-bastion-host -> aws-bidding-agent (c) | ssh -A -t ubuntu@52.209.168.254 ssh -A -t ubuntu@10.37.17.29 ssh -A -t ec2-user@10.37.229.15
aws-ops-bastion-host -> aws-bidding-agent (c) | ssh -A -t ubuntu@52.209.168.254 ssh -A -t ubuntu@10.37.17.29 ssh -A -t ec2-user@10.37.238.103
aws-ops-bastion-host -> aws-bidding-agent (a) | ssh -A -t ubuntu@52.209.168.254 ssh -A -t ubuntu@10.37.17.29 ssh -A -t ec2-user@10.37.191.61
aws-ops-bastion-host -> aws-bidding-agent (b) | ssh -A -t ubuntu@52.209.168.254 ssh -A -t ubuntu@10.37.17.29 ssh -A -t ec2-user@10.37.217.41
aws-ops-bastion-host -> aws-bidding-agent (c) | ssh -A -t ubuntu@52.209.168.254 ssh -A -t ubuntu@10.37.17.29 ssh -A -t ec2-user@10.37.235.242
aws-ops-bastion-host -> aws-bidding-agent (a) | ssh -A -t ubuntu@52.209.168.254 ssh -A -t ubuntu@10.37.17.29 ssh -A -t ec2-user@10.37.185.116
To access one of the BA instances, copy and paste the designated ssh command for the desired instance.
As noted in the previous section, the BA runs in a docker container, and the container cluster is managed by the ECS service. As described in the Cluster Setup section, the ECS cluster running the BA also defines the statsite, haproxy-consul-template and consul-client containers. These can be listed on the accessed instance using the following command:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cbe5118b032d 767648288756.dkr.ecr.eu-west-1.amazonaws.com/production.bidding-agent:latest "bin/docker-entryp..." 2 days ago Up 2 days 0.0.0.0:8080->8080/tcp, 0.0.0.0:9010->9010/tcp, 8888/tcp ecs-aws-production-bidding-agent-22-aws-production-bidding-agent-98e2e8b7c1e4e8ad3d00
18122c226966 767648288756.dkr.ecr.eu-west-1.amazonaws.com/production.statsite:latest "/docker-entrypoin..." 2 days ago Up 2 days 0.0.0.0:8125->8125/tcp, 0.0.0.0:8125->8125/udp ecs-aws-production-bidding-agent-22-aws-production-statsite-8ca68be58d8cb8c92c00
3c21dc45d2ac 767648288756.dkr.ecr.eu-west-1.amazonaws.com/production.haproxy-consul-template:latest "entrypoint.sh" 2 days ago Up 2 days 0.0.0.0:1936->1936/tcp, 0.0.0.0:2003->2003/tcp, 0.0.0.0:3000->3000/tcp, 0.0.0.0:8081->8081/tcp, 0.0.0.0:9092->9092/tcp ecs-aws-production-bidding-agent-22-aws-production-haproxy-consul-template-eae8aa9dd9feddfa8401
0c72995f7001 767648288756.dkr.ecr.eu-west-1.amazonaws.com/production.consul-client:latest "docker-entrypoint..." 2 days ago Up 2 days 8300/tcp, 8302/tcp, 0.0.0.0:8301->8301/tcp, 8400/tcp, 8302/udp, 8600/tcp, 0.0.0.0:8500->8500/tcp, 0.0.0.0:8301->8301/udp, 8600/udp ecs-aws-production-bidding-agent-22-aws-production-consul-client-9489bfa8c9b7b9ef4700
0f5d4f3bf521 amazon/amazon-ecs-agent:latest "/agent" 2 days ago Up 2 days ecs-agent
To access a given container, use the following command on the running instance:
$ docker exec -it CONTAINER_ID bash
where CONTAINER_ID is the container id listed by the `docker ps -a` command.
Bidding Agent - SpotInst:
As bidding agent instances are backed by spot instances, the third-party provider SpotInst manages instance bidding in AWS; its dashboard can be accessed here.
The BA spot instance groups are defined as Elastigroups and exist in both the staging and production environments (each environment is defined as an Organisation in SpotInst terminology):

To edit the configuration of any EG, select the EG in question and change the required settings via the Actions button.
E.g., updating the BA scaling group capacity can be performed from the SpotInst dashboard via the Actions drop-down menu → Edit Capacity.
Important note: scaling groups for instances backed by SpotInst, including the BA, can be controlled only via the SpotInst dashboard, not directly via the AWS console.

Bidding Agent - Monitoring and Logging¶
There are several ways to monitor the BA clusters in AWS, described in the following options:
Grafana/Graphite¶
As mentioned previously, each BA task definition includes a statsite container for metrics collection. Each operational region has a central Graphite instance where collected metrics are pulled and aggregated. For the Beehive project, Grafana is used for graph visualisation, covering the bidding agent and related components.
The main production dashboard is located here:

Most of the published metrics are aggregated by a per-region Graphite instance. To access any Graphite instance in AWS, run the terraform wrapper SSH command line with an additional port forwarding argument (the default Graphite web interface port), as follows:
$ bundle exec rake "ssh[production,eu-west-1,aws,graphite,80]"
Using AWS profile 'saml'...
Using environment 'production'...
Using region 'eu-west-1'...
Using namespace 'aws'...
Using name 'graphite'...
Forwarding hostports '["80"]'...
aws-ops-bastion-host -> aws-graphite (a) | ssh -A -t -L 80:localhost:4366 ubuntu@52.209.168.254 ssh -A -t -L 4366:localhost:4366 ubuntu@10.37.17.29 ssh -A -t -L 4366:localhost:80 ec2-user@10.37.20.16
Running the printed ssh command and browsing to 'localhost:80' exposes the Graphite dashboard with detailed metrics per aggregation instance:
$ ssh -A -t -L 80:localhost:4366 ubuntu@52.209.168.254 ssh -A -t -L 4366:localhost:4366 ubuntu@10.37.17.29 ssh -A -t -L 4366:localhost:80 ec2-user@10.37.20.16
→ localhost:80
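With the tunnel in place, Graphite's render API can also be queried directly from the command line. The target path below is illustrative only; browse the metrics tree in the web UI (or `/metrics/index.json`) for the real metric names.

```shell
# Target path is an assumption -- list real metric names via:
#   curl "http://localhost:80/metrics/index.json"
curl "http://localhost:80/render?target=stats.bidding-agent.requests&from=-1h&format=json"
```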

CloudWatch¶
Both the production and staging environments expose the same setup for monitoring and logging the ECS cluster running the bidding agent using CloudWatch.
For monitoring, CW gathers metrics for all cluster instances; they can be explored per instance running the BA application, from either the EC2 or ECS dashboards:


Additionally, CW is used mainly for logging: each cluster container's logs are shipped to a CW log group. This can be accessed from the CW dashboard → Logs, then selecting the Log Stream name.
The following example exposes the BA Log streams in CW:

Another important CW monitoring spot is the load balancer behind which the BA instances run:

SpotInst:
The SpotInst dashboard also provides useful monitoring and logging views for each Elastigroup:
- Monitoring: general monitoring information regarding the number of requests for the whole cluster group, latency, and the number of errors (5XX and 4XX):

- Logging: instance state information when joining and leaving the cluster (ELB/ALB registration and de-registration):

Bidding Agent - Troubleshooting¶
- Instance failure to join the Load Balancer - Container Service Failure:
One of the common issues with the bidding agent is an instance's failure to join the BA ELB. This can be caused by a failure to start the bidding agent ECS service task definition on one of the instances.
If one of the docker containers fails to start (e.g. the consul-client container fails to start and join the consul server), the task will fail and the instance won't join the load balancer.
For fast troubleshooting, make sure to check the following points:
- Check in the ECS service which task is failing and on which instances.
- Check in the Logs tab of the ECS cluster service which containers are running and which have stopped.
- If no obvious information is provided, go to CloudWatch → Logs and check the latest log lines for the BA service. Note that each Log Stream provides logs per container, as stated in the task definition:
- bidding-agent container (followed by created ID)
- consul-client container (followed by created ID)
- haproxy-consul-template container (followed by created ID)
- statsite container (followed by created ID)
- instance messages
- ecs-agent container
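The same log check can be scripted with the AWS CLI. The log group name below is a placeholder assumption; take the exact name from the CloudWatch → Logs dashboard.

```shell
# Log group name is a placeholder -- list the real groups first:
#   aws logs describe-log-groups --region eu-west-1
aws logs filter-log-events \
  --region eu-west-1 \
  --log-group-name "production-bidding-agent" \
  --filter-pattern "ERROR" \
  --max-items 20
```

Filtering on "ERROR" across all streams in the group surfaces failing containers without having to open each stream individually.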

- Instance failure to join the Load Balancer - SpotInst Type not available:
Another cause can arise when the BA instance count drops due to a lack of the required BA instance types in the Spot Market, which depends on the region distribution:

Note: Instance type pools for SpotInst can be updated and deployed per request in terraform code here →
For fast troubleshooting of this issue, update the spot instance type list from the SpotInst dashboard as follows:

Then add instance type alternatives to the drop down list in the configuration menu:

Click Update; the cluster will then deploy and join newly available spot instances. It takes a few minutes for the new configuration to deploy and take effect.

- Increased error requests hitting the BA endpoint:
The BA cluster is sized at X instances per region (in production, 6 instances handle between 7.5K and 10K requests). If 4xx/5xx errors increase, check the following points:
- Healthy hosts: if the number of healthy hosts decreases, check the ECS cluster status as described in points 1 and 2 of this troubleshooting section.
- Backend errors (including 4xx and 5xx): the ELB could start dropping requests (with a fixed number of X healthy hosts); increase the ASG capacity in the SpotInst dashboard (Manage capacity of the cluster) as follows:
