I wrote a suite of Ansible playbooks to provision an ECS (Elastic Container Service) cluster on AWS, running a webapp deployed in Docker containers on the cluster and load balanced by an ALB (Application Load Balancer), with the Docker image for the app pulled from an ECR (Elastic Container Registry) repository.
This is a follow-up to my project/article “How to use Ansible to provision an EC2 instance with an app running in a Docker container”, which explains how to get a containerised Docker app running on a regular EC2 instance, using Docker Hub as the image repo. That can work well as a simple Staging environment, but for Production it’s desirable to be able to cluster and scale the containers easily behind a load balancer, so I came up with this solution for provisioning/deploying on ECS, which is well suited to this kind of flexibility. (To quote AWS: “Amazon ECS is a fully managed container orchestration service that makes it easy for you to deploy, manage, and scale containerized applications”.) This solution also uses Amazon’s own ECR for Docker images, rather than Docker Hub.
Overview
First, a Docker image is built locally and pushed to a private ECR repository, then the EC2 SSH key and Security Groups are created. Next, a Target Group and corresponding ALB (the Application Load Balancer type of ELB) are provisioned, and an ECS container instance is launched on EC2 for the ECS cluster. The ECS cluster itself is then provisioned, an ECS task definition is created to pull and launch containers from the Docker image in ECR, and finally an ECS Service is provisioned to run the webapp task on the cluster as per the Service definition.
This is an Ansible framework to serve as a basis for building Docker images for your webapp and deploying them as containers on Amazon ECS. It can be expanded in multiple ways, the most obvious being to increase the number of running containers and ECS instances, either with manual scaling or ideally by adding auto-scaling. (Have a look at my article “How to use Ansible for automated AWS provisioning” to see how to auto-scale EC2 instances with Ansible. To scale the containers, it’s simply a case of increasing the desired container count in the ECS Service definition; the rest is handled automatically via dynamic port mappings. See the provision_production.yml playbook below to learn more.)
CentOS 7 is used for the Docker container, but this can be changed to a different Linux distro if desired. Amazon Linux 2 is used for the ECS cluster instances on EC2.
I created a very basic Python webapp to use as an example for the deployment here, but you can replace that with your own webapp should you so wish.
N.B. Until you’ve tested this and honed it to your needs, run it in a completely separate environment for safety reasons, otherwise there is potential here for accidental destruction of parts of existing environments. Create a separate VPC specifically for this, or even use an entirely separate AWS account.
GitHub files
The playbooks and supporting files can be found in this repository on my GitHub.
Installation/setup
- You’ll need an AWS account with a VPC set up, and with a DNS domain set up in Route 53.
- Install and configure the latest version of the AWS CLI. The settings in the AWS CLI configuration files are needed by the Ansible modules in these playbooks. Also, the Ansible AWS modules aren’t perfect, so there are a few tasks which need to run the AWS CLI as a local external command. If you’re using a Mac, I’d recommend Homebrew as the simplest way of installing and managing the AWS CLI.
- If you don’t already have it, you’ll need Python 3. You’ll also need the boto and boto3 Python modules (for Ansible modules and dynamic inventory) which can be installed via pip.
- Ansible needs to be installed and configured. Again, if you’re on a Mac, using Homebrew for this is probably best.
- Docker needs to be installed and running. For this it’s probably best to refer to the instructions on the Docker website.
- ECR Docker Credential Helper needs to be installed so that the local Docker daemon can authenticate with Elastic Container Registry in order to push images to a repository there. Follow the link for installation instructions (on a Mac, as usual, I’d recommend the Homebrew method).
- Copy etc/variables_template.yml to etc/variables.yml and update the static variables at the top for your own environment setup.
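For reference, once configured, the static variables at the top of etc/variables.yml will look something like this. All values here are illustrative placeholders; the variable names are the ones used by the playbooks below, and the dynamic variables at the bottom are filled in automatically as the playbooks run:
app_name: simple-webapp
vpc_id: vpc-0123456789abcdef0
vpc_subnet_id_1: subnet-0123456789abcdef0
vpc_subnet_id_2: subnet-0fedcba9876543210
my_ip: 203.0.113.10
route53_zone: yourdomain.com
ec2_ecs_image_id: ami-0123456789abcdef0   # an ECS-optimised Amazon Linux 2 AMI for your region
# Dynamic variables set by the playbooks:
ecr_repo:
ec2_sg_lb_id:
ec2_sg_app_id:
elb_dns:
elb_zone_id: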
Configuring ECR Docker Credential Helper
The method which worked best for me was to add a suitable “credHelpers” section to my ~/.docker/config.json file:
"credHelpers": {
"000000000000.dkr.ecr.eu-west-2.amazonaws.com": "ecr-login"
}
(I’ve replaced my AWS account ID with zeros, but otherwise this is correct.)
So, for me, the whole ~/.docker/config.json ended up looking like this. Yours may not be quite the same but hopefully it clarifies how to add the “credHelpers” section near the end:
{
"auths": {
"000000000000.dkr.ecr.eu-west-2.amazonaws.com": {},
"https://index.docker.io/v1/": {
"auth": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
},
"credHelpers": {
"000000000000.dkr.ecr.eu-west-2.amazonaws.com": "ecr-login"
}
}
If your AWS credentials are also set correctly, you should now have no trouble pushing Docker images to ECR repositories.
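If you want a quick sanity check before running the playbooks, something like this will confirm that the AWS CLI is talking to the right account and can obtain an ECR auth token (the credential helper does this for you automatically whenever Docker talks to ECR; adjust the region to match your own setup):
aws sts get-caller-identity
aws ecr get-login-password --region eu-west-2 > /dev/null && echo "ECR auth OK"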
Usage
These playbooks are run in the standard way, i.e.:
ansible-playbook PLAYBOOK_NAME.yml
To deploy your own webapp instead of my basic Python app, you’ll need to modify build_push.yml so it pulls your own app from your repo, then you can edit the variables as needed in etc/variables.yml.
Playbooks for build/provisioning/deployment
There are comments at key points in the playbooks to help further explain certain aspects of what is going on.
1. build_push.yml
Pulls the webapp from GitHub, builds a Docker image using docker/Dockerfile which runs the webapp, and pushes the image to a private ECR repository:
---
- name: Build Docker image and push to ECR repository
hosts: localhost
connection: local
tasks:
- name: Import variables
include_vars: etc/variables.yml
- name: Get app from GitHub
git:
repo: "https://github.com/mattbrock/simple-webapp.git"
dest: "docker/{{ app_name }}"
force: yes
- name: Create Amazon ECR repository
ecs_ecr:
name: "{{ app_name }}"
register: ecr_repo
- name: Update variables file with repo URI
lineinfile:
path: etc/variables.yml
regex: '^ecr_repo:'
line: "ecr_repo: {{ ecr_repo.repository.repositoryUri }}"
- name: Build Docker image and push to AWS ECR repository
docker_image:
build:
path: ./docker
name: "{{ ecr_repo.repository.repositoryUri }}:latest"
push: yes
source: build
force_source: yes
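The actual docker/Dockerfile is in the GitHub repository. As a rough sketch of its shape (not the real file, and the app entry point shown here is hypothetical), a CentOS 7 image running the Python webapp on port 8080 might look something like this:
FROM centos:7
# Install Python 3 to run the webapp
RUN yum -y install python3 && yum clean all
# build_push.yml clones the app into docker/simple-webapp, i.e. into the build context
COPY simple-webapp /opt/simple-webapp
EXPOSE 8080
# Hypothetical entry point; the real Dockerfile defines its own
CMD ["python3", "/opt/simple-webapp/app.py"]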
2. provision_key_sg.yml
Provisions an EC2 SSH key, and Security Groups for ECS container instances and ELB:
---
- name: Provision SSH key, Security Groups and Application Load Balancer
hosts: localhost
connection: local
tasks:
- name: Import variables
include_vars: etc/variables.yml
- name: Create EC2 SSH key
ec2_key:
name: "{{ app_name }}"
register: ec2_key
- name: Save EC2 SSH key to file
copy:
content: "{{ ec2_key.key.private_key }}"
dest: etc/ec2_key.pem
mode: 0600
when: ec2_key.changed
- name: Create Security Group for Application Load Balancer
ec2_group:
name: Application Load Balancer
description: EC2 VPC Security Group for Application Load Balancer
vpc_id: "{{ vpc_id }}"
rules:
- proto: tcp
ports: 80
cidr_ip: 0.0.0.0/0
rule_desc: Allow app access from everywhere
register: ec2_sg_lb
- name: Update variables file with Security Group ID
lineinfile:
path: etc/variables.yml
regex: '^ec2_sg_lb_id:'
line: "ec2_sg_lb_id: {{ ec2_sg_lb.group_id }}"
when: ec2_sg_lb.changed
- name: Create Security Group for ECS container instances
ec2_group:
name: ECS Container Instances
description: EC2 VPC Security Group for ECS container instances
vpc_id: "{{ vpc_id }}"
rules:
- proto: tcp
ports: 0-65535
group_id: "{{ ec2_sg_lb.group_id }}"
rule_desc: Allow ELB access to containers
- proto: tcp
ports: 8080
cidr_ip: "{{ my_ip }}/32"
rule_desc: Allow direct app access from my IP
- proto: tcp
ports: 22
cidr_ip: "{{ my_ip }}/32"
rule_desc: Allow SSH from my IP
register: ec2_sg_app
- name: Update variables file with Security Group ID
lineinfile:
path: etc/variables.yml
regex: '^ec2_sg_app_id:'
line: "ec2_sg_app_id: {{ ec2_sg_app.group_id }}"
when: ec2_sg_app.changed
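After running this playbook, you can confirm with the AWS CLI that the key and Security Groups now exist (assuming app_name is simple-webapp):
aws ec2 describe-key-pairs --key-names simple-webapp
aws ec2 describe-security-groups --filters "Name=group-name,Values=Application Load Balancer,ECS Container Instances" --query "SecurityGroups[*].[GroupName,GroupId]" --output text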
3. provision_production.yml
Provisions a Target Group and associated ALB (the Application Load Balancer type of ELB) for load balancing the containers, sets up the IAM role for ECS instances, launches an ECS container instance on EC2, provisions the ECS cluster, and creates the ECS task definition and Service so that the webapp containers are deployed on the cluster using the Docker image in ECR:
---
- name: Provision ECS cluster, task definition and service with Docker container, including Target Group + ALB, and ECS container instances
hosts: localhost
connection: local
tasks:
- name: Import variables
include_vars: etc/variables.yml
- name: Create Target Group
elb_target_group:
name: "{{ app_name }}"
protocol: http
port: 80
vpc_id: "{{ vpc_id }}"
state: present
modify_targets: no
register: target_group
- name: Create Application Load Balancer
elb_application_lb:
name: "{{ app_name }}"
security_groups: "{{ ec2_sg_lb_id }}"
subnets:
- "{{ vpc_subnet_id_1 }}"
- "{{ vpc_subnet_id_2 }}"
listeners:
- Protocol: HTTP
Port: 80
DefaultActions:
- Type: forward
TargetGroupName: "{{ app_name }}"
Rules:
- Conditions:
- Field: host-header
Values:
- "{{ route53_zone }}"
Priority: '1'
Actions:
- Type: redirect
RedirectConfig:
Host: "www.{{ route53_zone }}"
Protocol: "#{protocol}"
Port: "#{port}"
Path: "/#{path}"
Query: "#{query}"
StatusCode: "HTTP_301"
register: load_balancer
- name: Update variables file with ELB DNS
lineinfile:
path: etc/variables.yml
regex: '^elb_dns:'
line: "elb_dns: {{ load_balancer.dns_name }}"
- name: Update variables file with ELB hosted zone ID
lineinfile:
path: etc/variables.yml
regex: '^elb_zone_id:'
line: "elb_zone_id: {{ load_balancer.canonical_hosted_zone_id }}"
- name: Create ECS Instance Role for EC2 Production instances
iam_role:
name: ecsInstanceRole
assume_role_policy_document: "{{ lookup('file','etc/ecs_instance_role_policy.json') }}"
managed_policies:
- arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
create_instance_profile: yes
state: present
register: ecs_instance_role
# Couldn't find any way with Ansible plugins to link the role to the instance profile
# so we do it like this. It's messy but it works
- name: Link ECS Instance Role to Instance Profile
command: aws iam add-role-to-instance-profile --role-name ecsInstanceRole --instance-profile-name ecsInstanceRole
ignore_errors: yes
# Specify the ECS Instance Role and add the User Data so ECS knows
# to use this instance for the ECS cluster
- name: Launch an ECS container instance on EC2 for the cluster to run tasks on
ec2_instance:
name: ECS
key_name: "{{ app_name }}"
vpc_subnet_id: "{{ vpc_subnet_id_1 }}"
instance_type: t2.micro
instance_role: "{{ ecs_instance_role.role_name }}"
security_group: "{{ ec2_sg_app_id }}"
network:
assign_public_ip: true
image_id: "{{ ec2_ecs_image_id }}"
tags:
Environment: Production
user_data: |
#!/bin/bash
echo ECS_CLUSTER={{ app_name }} >> /etc/ecs/ecs.config
wait: yes
- name: Provision ECS cluster
ecs_cluster:
name: "{{ app_name }}"
state: present
# Set hostPort to 0 to enable dynamic port mappings from load balancer
#
# force_create ensures new revision when app has changed in repo
# and causes service to redeploy as rolling deployment with new task revision
- name: Create ECS task definition with dynamic port mappings from load balancer (setting hostPort to 0 to enable this)
ecs_taskdefinition:
family: "{{ app_name }}"
containers:
- name: "{{ app_name }}"
image: "{{ ecr_repo }}:latest"
memory: 128
portMappings:
- containerPort: 8080
hostPort: 0
launch_type: EC2
network_mode: default
state: present
force_create: yes
- name: Pause is necessary before provisioning service, possibly for AWS to finish creating service-linked IAM role for ECS
pause:
seconds: 30
- name: Provision ECS service
ecs_service:
name: "{{ app_name }}"
cluster: "{{ app_name }}"
task_definition: "{{ app_name }}"
desired_count: 1
launch_type: EC2
scheduling_strategy: REPLICA
load_balancers:
- targetGroupArn: "{{ target_group.target_group_arn }}"
containerName: "{{ app_name }}"
containerPort: "{{ 8080 }}"
state: present
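The etc/ecs_instance_role_policy.json file referenced above is the assume-role (trust) policy which allows EC2 instances to assume the ecsInstanceRole. A standard trust policy of this kind looks like the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}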
4. provision_dns.yml
Provisions the DNS in Route 53 for the ALB; note that it may take a few minutes for the DNS to propagate before it becomes usable:
---
- name: Provision DNS
hosts: localhost
connection: local
tasks:
- name: Import variables
include_vars: etc/variables.yml
- name: Add an alias record for root
route53:
state: present
zone: "{{ route53_zone }}"
record: "{{ route53_zone }}"
type: A
value: "{{ elb_dns }}"
alias: yes
alias_hosted_zone_id: "{{ elb_zone_id }}"
alias_evaluate_target_health: yes
overwrite: yes
- name: Add an alias record for www.domain
route53:
state: present
zone: "{{ route53_zone }}"
record: "www.{{ route53_zone }}"
type: A
value: "{{ elb_dns }}"
alias: yes
alias_hosted_zone_id: "{{ elb_zone_id }}"
alias_evaluate_target_health: yes
overwrite: yes
Running order and outcome
Running the later playbooks without having run the earlier ones will fail due to missing components, variables and so on. Running all four playbooks in succession will set up the entire infrastructure from start to finish.
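In other words, for a first full build:
ansible-playbook build_push.yml
ansible-playbook provision_key_sg.yml
ansible-playbook provision_production.yml
ansible-playbook provision_dns.yml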
Once everything is built successfully, the ECS service will attempt to run a task to deploy the webapp containers in the cluster. Below are instructions for how to check the service event log to see task deployment progress.
Redeployment
Once the environment is up and running, any changes to the app can be rebuilt and redeployed by running Steps 1 and 3 again. This makes use of the rolling deployment mechanism within ECS for a smooth automated transition to the new version of the app.
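In practice that just means running:
ansible-playbook build_push.yml
ansible-playbook provision_production.yml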
Playbooks for deprovisioning
1. destroy_all.yml
Destroys the entire AWS infrastructure:
---
- name: Destroy entire infrastructure
hosts: localhost
connection: local
tasks:
- name: Import variables
include_vars: etc/variables.yml
- name: Delete DNS record for root
route53:
state: absent
zone: "{{ route53_zone }}"
record: "{{ route53_zone }}"
type: A
value: "{{ elb_dns }}"
alias: yes
alias_hosted_zone_id: "{{ elb_zone_id }}"
alias_evaluate_target_health: yes
- name: Delete DNS record for www.
route53:
state: absent
zone: "{{ route53_zone }}"
record: "www.{{ route53_zone }}"
type: A
value: "{{ elb_dns }}"
alias: yes
alias_hosted_zone_id: "{{ elb_zone_id }}"
alias_evaluate_target_health: yes
- name: Delete ECS service
ecs_service:
name: "{{ app_name }}"
cluster: "{{ app_name }}"
state: absent
force_deletion: yes
# Ansible AWS plugins didn't seem to offer a way of removing all revisions of a task definition
# so we have to do it like this
- name: Deregister all ECS task definitions
shell: for taskdef in $(aws ecs list-task-definitions --query 'taskDefinitionArns[*]' --output text | grep {{ app_name }}) ; do aws ecs deregister-task-definition --task-definition $taskdef ; done
- name: Delete Application Load Balancer
elb_application_lb:
name: "{{ app_name }}"
state: absent
- name: Delete Target Group
elb_target_group:
name: "{{ app_name }}"
state: absent
- name: Terminate all EC2 instances
ec2_instance:
state: absent
filters:
instance-state-name: running
tag:Name: ECS
wait: yes
- name: Delete ECS cluster
ecs_cluster:
name: "{{ app_name }}"
state: absent
# Ansible AWS plugins apparently can't force-remove a repository, i.e.
# remove a repository containing images, so we have to do it like this
- name: Delete ECR repository
shell: aws ecr delete-repository --repository-name {{ app_name }} --force
ignore_errors: yes
- name: Delete Security Group for ECS container instances
ec2_group:
group_id: "{{ ec2_sg_app_id }}"
state: absent
- name: Delete Security Group for load balancer
ec2_group:
group_id: "{{ ec2_sg_lb_id }}"
state: absent
- name: Delete EC2 SSH key
ec2_key:
name: "{{ app_name }}"
state: absent
- name: Delete ecsInstanceRole
iam_role:
name: ecsInstanceRole
state: absent
- name: Delete service-linked IAM role for ECS
command: aws iam delete-service-linked-role --role-name AWSServiceRoleForECS
ignore_errors: yes
- name: Delete service-linked IAM role for ELB
command: aws iam delete-service-linked-role --role-name AWSServiceRoleForElasticLoadBalancing
ignore_errors: yes
2. delete_all.yml
Clears all dynamic variables in the etc/variables.yml file, deletes the EC2 SSH key, removes the local Docker image, and deletes the local webapp repo in the docker directory:
---
- name: Delete dynamic variables, SSH key file, local Docker image and local app repo
hosts: localhost
connection: local
tasks:
- name: Import variables
include_vars: etc/variables.yml
- name: Remove ELB DNS from variables file
lineinfile:
path: etc/variables.yml
regex: '^elb_dns:'
line: "elb_dns:"
- name: Remove ELB Zone ID from variables file
lineinfile:
path: etc/variables.yml
regex: '^elb_zone_id:'
line: "elb_zone_id:"
- name: Remove app Security Group from variables file
lineinfile:
path: etc/variables.yml
regex: '^ec2_sg_app_id:'
line: "ec2_sg_app_id:"
- name: Remove LB Security Group from variables file
lineinfile:
path: etc/variables.yml
regex: '^ec2_sg_lb_id:'
line: "ec2_sg_lb_id:"
- name: Remove ECR repo from variables file
lineinfile:
path: etc/variables.yml
regex: '^ecr_repo:'
line: "ecr_repo:"
- name: Delete SSH key file
file:
path: etc/ec2_key.pem
state: absent
- name: Remove local Docker image
docker_image:
name: "{{ ecr_repo }}"
state: absent
force_absent: yes
- name: Delete local app repo folder
file:
path: "./docker/{{ app_name }}"
state: absent
Destruction/deletion notes
USE destroy_all.yml WITH EXTREME CAUTION! If you’re not operating in a completely separate environment, or if your shell is configured for the wrong AWS account, you could potentially cause serious damage with this. Always check before running that you are working in the correct isolated environment and that you are absolutely 100 percent sure you want to do this. Don’t say I didn’t warn you!
Once everything has been fully destroyed, it’s safe to run the delete_all.yml playbook to clear out the variables file. Do not run this until you are sure everything has been fully destroyed, because the SSH key file can never be recovered again after it has been deleted.
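Before running delete_all.yml, you can double-check that the main components really have gone, with something along these lines:
aws ecs list-clusters
aws elbv2 describe-load-balancers
aws ec2 describe-instances --filters "Name=tag:Environment,Values=Production" "Name=instance-state-name,Values=running" --query "Reservations[*].Instances[*].InstanceId"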
Checking the Docker image in a local container
After building the Docker image in Step 1, if you want to run a local container from the image for initial testing purposes, you can use standard Docker commands for this:
docker run -d --name simple-webapp -p 8080:8080 $(grep ecr_repo etc/variables.yml | cut -d" " -f2):latest
You should then be able to make a request to the local container at:
http://localhost:8080/
To check the logs:
docker logs simple-webapp
To stop the container:
docker stop simple-webapp
To remove it:
docker rm simple-webapp
Checking deployment status, logs, etc.
To check the state of the deployment and see events in the service log (change “simple-webapp” to the name of your app, if different):
aws ecs describe-services --cluster simple-webapp --services simple-webapp --output text
This should show what’s happening on the cluster in terms of task deployment, and hopefully you’ll eventually see that the process successfully starts, registers on the load balancer, and completes deployment, at which point it should reach a “steady state”:
EVENTS 2022-02-23T13:04:39.900000+00:00 3a087c70-aaa3-47d5-ae31-040db688155a (service simple-webapp) has reached a steady state.
EVENTS 2022-02-23T13:04:39.899000+00:00 c0785dae-154d-440b-b315-f948901d48fb (service simple-webapp) (deployment ecs-svc/4617274246689568181) deployment completed.
EVENTS 2022-02-23T13:04:20.239000+00:00 c60ce4fa-e7a6-4776-907b-b931a166109a (service simple-webapp) registered 1 targets in (target-group arn:aws:elasticloadbalancing:eu-west-2:000000000000:targetgroup/simple-webapp/2ec4fbc39edca3aa)
EVENTS 2022-02-23T13:03:50.185000+00:00 2e2c4570-2bb3-45f3-83e6-84b61b9c63bb (service simple-webapp) has started 1 tasks: (task 8b8f8d2258a74885b58e610fbf19a2cc).
Check the webapp via the ALB (ELB):
curl http://$(grep elb_dns etc/variables.yml | cut -d" " -f2)
Check the webapp using DNS (once the DNS has propagated, replacing yourdomain.com with the domain you are using):
curl http://www.yourdomain.com/
Get the container logs from running instances:
ansible -i etc/inventory.aws_ec2.yml -u ec2-user --private-key etc/ec2_key.pem tag_Environment_Production -m shell -a "docker ps | grep simple-webapp | cut -d\" \" -f1 | xargs docker logs"
You can also use that method to run ad hoc Ansible commands on the instances, e.g. uptime:
ansible -i etc/inventory.aws_ec2.yml -u ec2-user --private-key etc/ec2_key.pem tag_Environment_Production -m shell -a "uptime"
If you need to SSH to the instance, and there’s only one instance:
ssh -i etc/ec2_key.pem ec2-user@$(aws ec2 describe-instances --filters "Name=tag:Environment,Values=Production" --query "Reservations[*].Instances[*].PublicDnsName" --output text)
For multiple instances, list the public DNS names as follows, then SSH to each individually as needed:
aws ec2 describe-instances --filters "Name=tag:Environment,Values=Production" --query "Reservations[*].Instances[*].PublicDnsName"
Final thoughts
I hope this is a helpful guide for building and running containerised Docker apps on ECS using Ansible. If you need help with any of the issues raised in this article, or with any other infrastructure, automation, DevOps or SysAdmin projects or tasks, don’t hesitate to get in touch regarding the freelance services I offer.