This blog now has a deployment pipeline!
18 Dec 2021

This site was created mostly as a learning experience. Setting it up was a complete point-and-click adventure following detailed instructions laid down by a friend of mine (more about it here). That didn't feel like enough practice. Plus, every time I write a new post, I need to scp it to the AWS instance and ssh there to move stuff around. So I decided to take my learning process up a notch and write a GitHub Action to deploy the site. Moreover, this time I tried to figure things out myself instead of asking for instructions beforehand.
My first idea was to recreate exactly what I was doing manually in a GitHub workflow:
- Set up Ruby
- Build the site from the repository with Jekyll
- scp the files to the running instance
This was the right approach in principle - do in your script what you would do manually - but apparently what I was doing manually wasn't really the proper way. With AWS autoscaling you have an autoscaling group, responsible for making sure you always have a specified number of instances running. The autoscaling group is based on a launch template, which in turn is based on an Amazon Machine Image (AMI) created from an instance that has the correct configuration (in my case an nginx server plus the contents of my static website).
So when I update only the instance, the change is not propagated to the other parts of the setup. If/when the instance crashes, a new instance is launched from the old AMI, with the old version of the website.
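You can actually see that whole chain with the AWS CLI. A quick inspection sketch (the group name and template ID below are placeholders standing in for my redacted ones):

# Which launch template (and version) is the autoscaling group using?
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names my-asg \
  | jq '.AutoScalingGroups[0].LaunchTemplate'

# Which AMI does the default version of that launch template point to?
aws ec2 describe-launch-template-versions \
  --launch-template-id lt-0123456789abcdef0 \
  --versions '$Default' \
  | jq -r '.LaunchTemplateVersions[0].LaunchTemplateData.ImageId'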
And that meant I actually needed to go through almost the full setup of the website that I did manually in the first place:
- Launch a new EC2 instance
- Configure nginx on it correctly
- Create a new AMI based on that instance
- Assign that new AMI to the launch template as a new template version
- Make that version the default, so it is used by the autoscaling group the next time it tries to do something.
- Trigger instance refresh on the autoscaling group.
The last part is meant to be a safe way to roll out a new version of the system. But since I'm a cheapskate and have only one instance running, I still had a few minutes of downtime every time I experimented with this.
I wanted to do this in the simplest way possible, without relying too much on other GitHub Actions. So I used:
- The AWS CLI (+ an action to set up credentials)
- jq (available on GitHub Actions runners by default) - to slice and dice the AWS responses (there's a small jq sketch after this list)
- Bash scripts - the GitHub workflow file was getting quite big, so I created two scripts to run from it: one for setting up nginx on the newly launched instance, and one for doing all the needed preparation and starting an instance refresh.
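Here is the kind of slicing jq does, run against a canned response (a toy sketch, not the real call):

RESPONSE='{"Instances": [{"InstanceId": "i-0123456789abcdef0"}]}'
# pick one field out of the nested JSON structure
echo "$RESPONSE" | jq -r '.Instances[0].InstanceId'
# prints: i-0123456789abcdef0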
All of this wasn't easy, as I was not familiar with the tooling. I struggled quite a lot with simple things:
GitHub Actions
Using ssh-agent. My idea was to have nice little steps in my workflow like:
- Setup ssh credentials
- Scp files
But apparently, if you add your key to the ssh agent in one workflow step, that information is completely lost in the next one. Fortunately, you can hack around that by binding the agent to a fixed socket path.
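Roughly, the workaround looks like this (two separate shells standing in for two workflow steps; SSH_PRIVATE_KEY is a stand-in for the actual secret):

# "step one": start an agent bound to a well-known socket path and add the key
export SSH_AUTH_SOCK=/tmp/ssh_agent.sock
ssh-agent -a "$SSH_AUTH_SOCK" > /dev/null
ssh-add - <<< "$SSH_PRIVATE_KEY"

# "step two" runs in a fresh shell that knows nothing about the agent,
# unless it points SSH_AUTH_SOCK at the same socket:
SSH_AUTH_SOCK=/tmp/ssh_agent.sock ssh-add -l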
Testing the action. Apparently this is not a simple thing to do, as workflow files are only picked up from the main branch. I ended up using this hack, which requires creating a dummy action on the main branch.
Bash language
Using variables. The preparation for an instance refresh requires getting and passing a lot of IDs around. You need to create an instance and pass its ID for AMI creation. Then you need to create a new launch template version with the AMI ID and use the new version number to set it as the default. And you need to know some values like the autoscaling group name and launch template ID to be able to access all that. So naturally I tried to put a lot of things into variables, to make things more readable. But I struggled with such a simple task quite a lot:
- In shell you can specify env variables right before a command, like FOO=bar printenv FOO. This also means you can't ever have spaces between the equality sign and the value you are assigning to a variable: INSTANCE_ID= $1 is interpreted as "assign an empty value to INSTANCE_ID, then execute whatever is in $1", which results in a "not found" error, since the value is not a command.
- Another thing I struggled a little with was passing the actual value of a variable to code. I started by defining a variable for a workflow step like AUTOSCALING_GROUP_NAME: 'Autoscaling group name'. But apparently YAML strips the quotes, and when the shell command is executed it sees three separate arguments: Autoscaling, group and name.

Quotes seem to be the remedy for shell scripts behaving in unexpected ways (a tiny demo follows below):
- Single quotes mean: treat this text as is, do not perform any substitutions.
- Double quotes mean: perform environment variable substitutions, but treat the result as a single string.
As a friend of mine put it mildly: 'Welcome to the weirdness of a scripting language invented before usability was a thing'.
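A tiny demo of both pitfalls, safe to paste into any Bash shell:

INSTANCE_ID=i-123        # fine: assigns the value
INSTANCE_ID= i-123       # "i-123: command not found": assigns empty, runs i-123

NAME='Autoscaling group name'
printf '%s\n' $NAME      # unquoted: three separate arguments, three lines
printf '%s\n' "$NAME"    # double quotes: one argument, variable substituted
printf '%s\n' '$NAME'    # single quotes: the literal text $NAME, no substitution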
Specifying the interpreter. When I started writing the script I just added #!/bin/sh to the top, as a friend had shown me previously, and forgot about it. It worked fine until I needed an until loop to poll for the state of my instance refresh. I used Bash syntax for that and started getting errors like "[[: not found". It turns out /bin/sh on Linux is not necessarily Bash (on Ubuntu, for example, it points to dash). For the script to use Bash I needed to say so more explicitly: #!/usr/bin/env bash
Parsing JSON outputs correctly
So my variables are set and working, and the until loop is running. Unfortunately, it is running forever. It was set to stop when the instance refresh status changes to 'Successful'. I knew for sure the status was correct by then, but the loop just didn't seem to care. The culprit was a nuance of how I retrieved the status from the AWS CLI response. I used jq to get the value of the status field from the JSON response, and apparently that value still included the quotes. So I was comparing "Successful" against Successful. To get raw output from jq you need to ask for it with the -r option.
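Here is the difference in one toy example:

echo '{"Status": "Successful"}' | jq '.Status'     # prints "Successful" (with quotes)
echo '{"Status": "Successful"}' | jq -r '.Status'  # prints Successful (raw string)

STATUS=$(echo '{"Status": "Successful"}' | jq '.Status')
[[ "$STATUS" == "Successful" ]] && echo match      # no match: $STATUS still contains the quotes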
As you can see, sometimes very little things can get in your way, especially when you are working with unfamiliar tooling. It took me three months, 77 commits and 22 AWS launch template versions to finally get things right! But I guess the time invested (and the fact that I didn't give up) just makes it feel like a bigger accomplishment. I feel like I deserve some blogger gin now. Too bad they don't seem to sell it anymore.
P.S. If you want to see what the end result looks like, it is something like this:
Workflow
name: CI/CD
on:
  push:
    branches: [ main ]
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - name: Build Site
        run: bundle exec jekyll build
        env:
          JEKYLL_ENV: production
      - name: Setup AWS CLI
        uses: unfor19/install-aws-cli-action@v1
        with:
          version: 2 # default
          verbose: false # default
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          # the secret names here are assumptions; use whatever your repository defines
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-west-1
      - name: Create new AWS instance
        run: |
          INSTANCE_ID=$(aws ec2 run-instances --image-id ***** --count 1 --instance-type t4g.micro --key-name '*****' --security-group-ids ***** | jq -r '.Instances[0].InstanceId')
          if test -z "${INSTANCE_ID}"; then exit 1; fi
          echo "INSTANCE_ID=${INSTANCE_ID}" >> $GITHUB_ENV
      - name: Get public DNS of the new instance
        run: |
          NEW_INSTANCE_PUBLIC_DNS=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --query "Reservations[].Instances[].PublicDnsName[]" --output text)
          if test -z "${NEW_INSTANCE_PUBLIC_DNS}"; then exit 1; fi
          echo "NEW_INSTANCE_PUBLIC_DNS=${NEW_INSTANCE_PUBLIC_DNS}" >> $GITHUB_ENV
      - name: Wait for all running instances to be in ok state
        run: aws ec2 wait instance-status-ok --instance-ids $INSTANCE_ID
      - name: Setup ssh credentials
        env:
          SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: |
          ssh-agent -a $SSH_AUTH_SOCK > /dev/null
          # secret name is an assumption
          ssh-add - <<< "${{ secrets.SSH_PRIVATE_KEY }}"
      - name: SCP files
        env:
          SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: scp -o StrictHostKeyChecking=no -r ./_site "ec2-user@${NEW_INSTANCE_PUBLIC_DNS}:"
      - name: Setup instance
        env:
          SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: |
          # secret name is an assumption: a public key to authorize on the baked image
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} "echo \"${{ secrets.SSH_PUBLIC_KEY }}\" >> .ssh/authorized_keys"
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} 'bash -s' < ./scripts/fresh_instance_setup.sh
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} "sudo mv /home/ec2-user/_site/* /data/www && rm -rf ./_site"
      - name: Trigger instance refresh
        run: ./scripts/instance_refresh.sh $INSTANCE_ID ${{ github.run_id }} # second argument assumed: a unique run identifier used in AMI names
New instance setup script
#!/usr/bin/env bash
sudo yum -y update
sudo yum -y install yum-utils

# Add the official nginx repositories (stable enabled, mainline available but off)
sudo touch /etc/yum.repos.d/nginx.repo
sudo tee -a /etc/yum.repos.d/nginx.repo > /dev/null <<EOT
[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/amzn2/\$releasever/\$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
[nginx-mainline]
name=nginx mainline repo
baseurl=http://nginx.org/packages/mainline/amzn2/\$releasever/\$basearch/
gpgcheck=1
enabled=0
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
EOT

# Install nginx and replace the default site config with one
# that serves the static site from /data/www
sudo yum -y install nginx
sudo rm /etc/nginx/conf.d/default.conf
sudo touch /etc/nginx/conf.d/my-website.conf
sudo tee -a /etc/nginx/conf.d/my-website.conf > /dev/null <<EOT
server {
    location / {
        root /data/www;
    }
}
EOT
sudo mkdir -p /data/www
AWS instance refresh
#!/usr/bin/env bash
INSTANCE_ID=$1   # the freshly configured instance to bake into a new AMI
RUN_IN=$2        # a unique identifier used to name the image and template version
AMI_OWNER_ID="*****"
TEMPLATE_ID="*****"
IMAGE_NAME="ieva.dev image ${RUN_IN}"
VERSION_DESCRIPTION="version ${RUN_IN}"
GROUP_NAME="*****"

# Remember the current AMI so it can be cleaned up at the end
# (assumes the owner has exactly one non-deprecated image)
OLD_IMAGE_ID=$(aws ec2 describe-images --owners "$AMI_OWNER_ID" --no-include-deprecated | jq -r '.Images[].ImageId')
if test -z "${OLD_IMAGE_ID}"; then exit 1; fi

# Create a new AMI from the instance and point a new default launch template version at it
IMAGE_ID=$(aws ec2 create-image --instance-id "$INSTANCE_ID" --name "$IMAGE_NAME" | jq -r '.ImageId')
if test -z "${IMAGE_ID}"; then exit 1; fi
LATEST_TEMPLATE_VERSION=$(aws ec2 describe-launch-templates --launch-template-ids "$TEMPLATE_ID" | jq -r '.LaunchTemplates[].LatestVersionNumber')
if test -z "${LATEST_TEMPLATE_VERSION}"; then exit 1; fi
NEW_TEMPLATE_VERSION=$(aws ec2 create-launch-template-version --launch-template-id "$TEMPLATE_ID" --version-description "$VERSION_DESCRIPTION" --source-version "$LATEST_TEMPLATE_VERSION" --launch-template-data "ImageId=${IMAGE_ID}" | jq -r '.LaunchTemplateVersion.VersionNumber')
if test -z "${NEW_TEMPLATE_VERSION}"; then exit 1; fi
aws ec2 modify-launch-template --launch-template-id "$TEMPLATE_ID" --default-version "$NEW_TEMPLATE_VERSION"

# Start the instance refresh and poll until it reports success
INSTANCE_REFRESH_ID=$(aws autoscaling start-instance-refresh --auto-scaling-group-name "$GROUP_NAME" | jq -r '.InstanceRefreshId')
if test -z "${INSTANCE_REFRESH_ID}"; then exit 1; fi
until [[ $(aws autoscaling describe-instance-refreshes --auto-scaling-group-name "$GROUP_NAME" --instance-refresh-ids "$INSTANCE_REFRESH_ID" | jq -r '.InstanceRefreshes[0].Status') == 'Successful' ]]
do
  sleep 15
  echo "Waiting for instance refresh"
done

# Clean up: the temporary instance, the previous AMI and the previous template version
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
aws ec2 deregister-image --image-id "$OLD_IMAGE_ID"
aws ec2 delete-launch-template-versions --launch-template-id "$TEMPLATE_ID" --versions "$LATEST_TEMPLATE_VERSION"