This blog now has a deployment pipeline!
18 Dec 2021

This site was created mostly as a learning experience. Setting it up was a complete point-and-click adventure following detailed instructions laid down by a friend of mine (more about it here). That didn't feel like enough practice. Plus, every time I write a new post, I need to scp it to the AWS instance and ssh there to move stuff around. So I decided to take my learning process up a notch and write a GitHub Action to deploy the site. Moreover, this time I tried to figure things out myself instead of asking for instructions beforehand.
My first idea was to recreate exactly what I was doing manually in a GitHub workflow:
- Set up Ruby
- Build the site from the repository with Jekyll
- scp the files to the running instance
This was the right approach in principle - do in your script what you would do manually - but apparently what I was doing manually wasn't really the proper way. With AWS autoscaling you have an autoscaling group, responsible for making sure you always have a specified number of instances running. The autoscaling group is based on a launch template, which in turn is based on an Amazon Machine Image (AMI) created from an instance that has the correct configuration (in my case an nginx server plus the contents of my static website).
So when I update only the instance, the change is not propagated to the other parts of the setup. If/when the instance crashes, a new instance is launched from the old AMI, with the old version of the website.
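You can actually see that whole chain with the AWS CLI. A quick inspection sketch (the group name and template ID below are placeholders standing in for my redacted ones):

# Which launch template (and version) is the autoscaling group using?
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names my-asg \
  | jq '.AutoScalingGroups[0].LaunchTemplate'

# Which AMI does the default version of that launch template point to?
aws ec2 describe-launch-template-versions \
  --launch-template-id lt-0123456789abcdef0 \
  --versions '$Default' \
  | jq -r '.LaunchTemplateVersions[0].LaunchTemplateData.ImageId'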
And that meant I actually needed to go through almost the full setup of the website that I did manually in the first place:
- Launch a new EC2 instance
- Configure nginx on it correctly
- Create a new AMI based on that instance
- Assign that new AMI to the launch template as a new template version
- Make that version the default, so it is used by the autoscaling group the next time it tries to do something.
- Trigger instance refresh on the autoscaling group.
The last part is meant to be a safe way to roll out a new version of the system. But since I'm a cheapskate and have only one instance running, I still had a few minutes of downtime every time I experimented with this.
I wanted to do this in the simplest way possible, without relying too much on other GitHub Actions. So I used:
- The AWS CLI (+ an action to set up credentials)
- jq (available on GitHub Actions runners by default) - to slice and dice the AWS responses (there's a small jq sketch after this list)
- Bash scripts - the GitHub workflow file was getting quite big, so I created two scripts to run from it: one for setting up nginx on the newly launched instance, and one for doing all the needed preparation and starting an instance refresh.
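Here is the kind of slicing jq does, run against a canned response (a toy sketch, not the real call):

RESPONSE='{"Instances": [{"InstanceId": "i-0123456789abcdef0"}]}'
# pick one field out of the nested JSON structure
echo "$RESPONSE" | jq -r '.Instances[0].InstanceId'
# prints: i-0123456789abcdef0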
All of this wasn't easy, as I was not familiar with the tooling. I struggled quite a lot with simple things:
GitHub Actions
Using ssh-agent. My idea was to have nice little steps in my workflow like:
- Setup ssh credentials
- Scp files
But apparently, if you add your key to the ssh agent in one workflow step, that information is completely lost in the next one. Fortunately, you can hack around that by binding the agent to a fixed socket path.
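Roughly, the workaround looks like this (two separate shells standing in for two workflow steps; SSH_PRIVATE_KEY is a stand-in for the actual secret):

# "step one": start an agent bound to a well-known socket path and add the key
export SSH_AUTH_SOCK=/tmp/ssh_agent.sock
ssh-agent -a "$SSH_AUTH_SOCK" > /dev/null
ssh-add - <<< "$SSH_PRIVATE_KEY"

# "step two" runs in a fresh shell that knows nothing about the agent,
# unless it points SSH_AUTH_SOCK at the same socket:
SSH_AUTH_SOCK=/tmp/ssh_agent.sock ssh-add -l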
Testing the action. Apparently this is not a simple thing to do, as workflow files are only picked up from the main branch. I ended up using this hack, which requires creating a dummy action on the main branch.
Bash language
Using variables. The preparation for an instance refresh requires getting and passing a lot of IDs around. You need to create an instance and pass its ID for AMI creation. Then you need to create a new launch template version with the AMI ID and use the new version number to set it as the default. And you need to know some values like the autoscaling group name and launch template ID to be able to access all that. So naturally I tried to put a lot of things into variables, to make things more readable. But I struggled with such a simple task quite a lot:
- In shell you can specify env variables right before a command, like FOO=bar printenv FOO. This also means you can't ever have spaces between the equality sign and the value you are assigning to a variable: INSTANCE_ID= $1 is interpreted as "assign an empty value to INSTANCE_ID, then execute whatever is in $1", which results in a "not found" error, since the value is not a command.
- Another thing I struggled a little with was passing the actual value of a variable to code. I started by defining a variable for a workflow step like AUTOSCALING_GROUP_NAME: 'Autoscaling group name'. But apparently YAML strips the quotes, and when the shell command is executed it sees three separate arguments: Autoscaling, group and name.

Quotes seem to be the remedy for shell scripts behaving in unexpected ways (a tiny demo follows below):
- Single quotes mean: treat this text as is, do not perform any substitutions.
- Double quotes mean: perform environment variable substitutions, but treat the result as a single string.
As a friend of mine put it mildly: 'Welcome to the weirdness of a scripting language invented before usability was a thing'.
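A tiny demo of both pitfalls, safe to paste into any Bash shell:

INSTANCE_ID=i-123        # fine: assigns the value
INSTANCE_ID= i-123       # "i-123: command not found": assigns empty, runs i-123

NAME='Autoscaling group name'
printf '%s\n' $NAME      # unquoted: three separate arguments, three lines
printf '%s\n' "$NAME"    # double quotes: one argument, variable substituted
printf '%s\n' '$NAME'    # single quotes: the literal text $NAME, no substitution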
Specifying the interpreter. When I started writing the script I just added #!/bin/sh to the top, as a friend had shown me previously, and forgot about it. It worked fine until I needed an until loop to poll for the state of my instance refresh. I used Bash syntax for that and started getting errors like "[[: not found". It turns out /bin/sh on Linux is not necessarily Bash (on Ubuntu, for example, it points to dash). For the script to use Bash I needed to say so more explicitly: #!/usr/bin/env bash
Parsing JSON outputs correctly
So my variables are set and working, and the until loop is running. Unfortunately, it is running forever. It was set to stop when the instance refresh status changes to 'Successful'. I knew for sure the status was correct by then, but the loop just didn't seem to care. The culprit was a nuance of how I retrieved the status from the AWS CLI response. I used jq to get the value of the status field from the JSON response, and apparently that value still included the quotes. So I was comparing "Successful" against Successful. To get raw output from jq you need to ask for it with the -r option.
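Here is the difference in one toy example:

echo '{"Status": "Successful"}' | jq '.Status'     # prints "Successful" (with quotes)
echo '{"Status": "Successful"}' | jq -r '.Status'  # prints Successful (raw string)

STATUS=$(echo '{"Status": "Successful"}' | jq '.Status')
[[ "$STATUS" == "Successful" ]] && echo match      # no match: $STATUS still contains the quotes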
As you can see, sometimes very little things can get in your way, especially when you are working with unfamiliar tooling. It took me three months, 77 commits and 22 AWS launch template versions to finally get things right! But I guess the time invested (and the fact that I didn't give up) just makes it feel like a bigger accomplishment. I feel like I deserve some blogger gin now. Too bad they don't seem to sell it anymore.
P.S. If you want to see what the end result looks like, it is something like this:
Workflow
name: CI/CD
on:
  push:
    branches: [ main ]
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - name: Build Site
        run: bundle exec jekyll build
        env:
          JEKYLL_ENV: production
      - name: Setup AWS CLI
        uses: unfor19/install-aws-cli-action@v1
        with:
          version: 2 # default
          verbose: false # default
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          # the secret names here are assumptions; use whatever your repository defines
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-west-1
      - name: Create new AWS instance
        run: |
          INSTANCE_ID=$(aws ec2 run-instances --image-id ***** --count 1 --instance-type t4g.micro --key-name '*****' --security-group-ids ***** | jq -r '.Instances[0].InstanceId')
          if test -z "${INSTANCE_ID}"; then exit 1; fi
          echo "INSTANCE_ID=${INSTANCE_ID}" >> $GITHUB_ENV
      - name: Get public DNS of the new instance
        run: |
          NEW_INSTANCE_PUBLIC_DNS=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --query "Reservations[].Instances[].PublicDnsName[]" --output text)
          if test -z "${NEW_INSTANCE_PUBLIC_DNS}"; then exit 1; fi
          echo "NEW_INSTANCE_PUBLIC_DNS=${NEW_INSTANCE_PUBLIC_DNS}" >> $GITHUB_ENV
      - name: Wait for all running instances to be in ok state
        run: aws ec2 wait instance-status-ok --instance-ids $INSTANCE_ID
      - name: Setup ssh credentials
        env:
          SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: |
          ssh-agent -a $SSH_AUTH_SOCK > /dev/null
          # secret name is an assumption
          ssh-add - <<< "${{ secrets.SSH_PRIVATE_KEY }}"
      - name: SCP files
        env:
          SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: scp -o StrictHostKeyChecking=no -r ./_site "ec2-user@${NEW_INSTANCE_PUBLIC_DNS}:"
      - name: Setup instance
        env:
          SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: |
          # secret name is an assumption: a public key to authorize on the baked image
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} "echo \"${{ secrets.SSH_PUBLIC_KEY }}\" >> .ssh/authorized_keys"
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} 'bash -s' < ./scripts/fresh_instance_setup.sh
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} "sudo mv /home/ec2-user/_site/* /data/www && rm -rf ./_site"
      - name: Trigger instance refresh
        run: ./scripts/instance_refresh.sh $INSTANCE_ID ${{ github.run_id }} # second argument assumed: a unique run identifier used in AMI names
New instance setup script
#!/usr/bin/env bash
sudo yum -y update
sudo yum -y install yum-utils

# Add the official nginx repositories (stable enabled, mainline available but off)
sudo touch /etc/yum.repos.d/nginx.repo
sudo tee -a /etc/yum.repos.d/nginx.repo > /dev/null <<EOT
[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/amzn2/\$releasever/\$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
[nginx-mainline]
name=nginx mainline repo
baseurl=http://nginx.org/packages/mainline/amzn2/\$releasever/\$basearch/
gpgcheck=1
enabled=0
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
EOT

# Install nginx and replace the default site config with one
# that serves the static site from /data/www
sudo yum -y install nginx
sudo rm /etc/nginx/conf.d/default.conf
sudo touch /etc/nginx/conf.d/my-website.conf
sudo tee -a /etc/nginx/conf.d/my-website.conf > /dev/null <<EOT
server {
    location / {
        root /data/www;
    }
}
EOT
sudo mkdir -p /data/www
AWS instance refresh
#!/usr/bin/env bash
INSTANCE_ID=$1   # the freshly configured instance to bake into a new AMI
RUN_IN=$2        # a unique identifier used to name the image and template version
AMI_OWNER_ID="*****"
TEMPLATE_ID="*****"
IMAGE_NAME="ieva.dev image ${RUN_IN}"
VERSION_DESCRIPTION="version ${RUN_IN}"
GROUP_NAME="*****"

# Remember the current AMI so it can be cleaned up at the end
# (assumes the owner has exactly one non-deprecated image)
OLD_IMAGE_ID=$(aws ec2 describe-images --owners "$AMI_OWNER_ID" --no-include-deprecated | jq -r '.Images[].ImageId')
if test -z "${OLD_IMAGE_ID}"; then exit 1; fi

# Create a new AMI from the instance and point a new default launch template version at it
IMAGE_ID=$(aws ec2 create-image --instance-id "$INSTANCE_ID" --name "$IMAGE_NAME" | jq -r '.ImageId')
if test -z "${IMAGE_ID}"; then exit 1; fi
LATEST_TEMPLATE_VERSION=$(aws ec2 describe-launch-templates --launch-template-ids "$TEMPLATE_ID" | jq -r '.LaunchTemplates[].LatestVersionNumber')
if test -z "${LATEST_TEMPLATE_VERSION}"; then exit 1; fi
NEW_TEMPLATE_VERSION=$(aws ec2 create-launch-template-version --launch-template-id "$TEMPLATE_ID" --version-description "$VERSION_DESCRIPTION" --source-version "$LATEST_TEMPLATE_VERSION" --launch-template-data "ImageId=${IMAGE_ID}" | jq -r '.LaunchTemplateVersion.VersionNumber')
if test -z "${NEW_TEMPLATE_VERSION}"; then exit 1; fi
aws ec2 modify-launch-template --launch-template-id "$TEMPLATE_ID" --default-version "$NEW_TEMPLATE_VERSION"

# Start the instance refresh and poll until it reports success
INSTANCE_REFRESH_ID=$(aws autoscaling start-instance-refresh --auto-scaling-group-name "$GROUP_NAME" | jq -r '.InstanceRefreshId')
if test -z "${INSTANCE_REFRESH_ID}"; then exit 1; fi
until [[ $(aws autoscaling describe-instance-refreshes --auto-scaling-group-name "$GROUP_NAME" --instance-refresh-ids "$INSTANCE_REFRESH_ID" | jq -r '.InstanceRefreshes[0].Status') == 'Successful' ]]
do
  sleep 15
  echo "Waiting for instance refresh"
done

# Clean up: the temporary instance, the previous AMI and the previous template version
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
aws ec2 deregister-image --image-id "$OLD_IMAGE_ID"
aws ec2 delete-launch-template-versions --launch-template-id "$TEMPLATE_ID" --versions "$LATEST_TEMPLATE_VERSION"