
How to: Set up a CI/CD pipeline with Github Actions

…A.K.A. navigating the gotchas and pitfalls of a new good-but-not-perfect tool


Today we’ll be building a CI/CD pipeline using
Github Actions (GHA) 🚀
This is a simplified version of what I’ve been doing at work at Agrando over the past few weeks. Because GHA is such a new tool (generally available as of 13.11.2019), most of the online resources are either “here’s how you do this super simple task” or “here’s a rewording of the documentation, which is already confusing”. I found it really difficult to find articles on setting up a production system and battling some of the issues that non-hobbyists will encounter.

The overall structure is this:
We have a containerized repository for our application code (backend, frontend, or both). We have a separate repository for DevOps (deployment scripts, etc.). We’re using Google Cloud Platform (most of this is applicable if you’re using AWS or something similar). When we make a change in the application repository, we want that change to build an image, be tested, and be deployed to the appropriate server. For the sake of example, we’ll pretend we only have a dev and a production server.

Github Actions uses YAML files for workflow configuration (similar to CircleCI, TravisCI, etc.). Instead of restating the documentation, I highly recommend you skim this.
Okay, now that you know how to fly this thing, some sidenotes about today’s lesson:

  • We’ll have two workflow files: build+test.yml and deploy.yml

  • Inside of build+test.yml we have two jobs. Unsurprisingly, build and test

  • Inside of deploy.yml we have one job: deploy

  • Inside of the jobs, we have multiple steps

  • Contrary to the name Github Actions, we’re not actually building any actions (They should have named it Github Workflows, but I guess marketing got involved 🙄)

Let’s get to it. To help, I’ve made this low resolution diagram of how everything fits together:

Github Actions CI/CD Pipeline


The Build+Test Workflow

This workflow combines building an image from your code and using that image again to run tests. All of this goes with the assumption you’ve got a containerized environment. If you don’t, I suggest you do, but otherwise, you can just omit the Build Job and go straight to the testing. Why combine the two into a single workflow? That’s discussed in the next sub-section “The Test Job”.

The Build Job

The first thing we need to do is build our image. This will standardize our environments so we don’t need to replicate environments by hand when testing or on our servers. If you haven’t yet dockerized your environments, go to your VP of Engineering and push on making that a priority, then come back here when you’re done.

Assuming you’re ready to go, the first thing we’ll need to do is create a workflow file in your repository under .github/workflows/build+test.yml:

name: Build + Test
on: push
jobs:
  build:
    name: Build
    runs-on: ubuntu-latest
    steps:

So we’ve set up the workflow; now we need to add steps so we can get something done. The first thing I like to do is create a step for defining environment variables. For our example, the env variables will differ based on whether the branch we’re building is master or something else (develop, feature branches, etc., assuming “everything else” uses a common dev environment).

It’s worth noting there are a few ways to set environment variables in GHA. Even Github says “You can define environment variables for a step, job, or entire workflow using the jobs.<job_id>.steps.env, jobs.<job_id>.env, and env keywords.” How I would use them:

  • env declaration (workflow-wide): Nice, but you can’t use bash, so it’s usually best suited for constants, or values that can be manipulated using Github’s expression syntax

  • jobs.<job_id>.env: Same as above, but on a job level

  • jobs.<job_id>.steps.env: Same as above, but on a step level

  • set-env: Allows you to set environment variables within the job via shell scripting. This is my preferred method, I just wish it were possible to set them on a workflow level. 😕 (See the sketch after this list for how the first three scopes nest.)
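
To make the first three concrete, here’s a minimal sketch (the names are hypothetical) showing how the three env scopes nest:

name: Env Scope Example
on: push
env:
  WORKFLOW_WIDE: "visible to every job"  # expressions are OK here, bash is not
jobs:
  example:
    runs-on: ubuntu-latest
    env:
      JOB_WIDE: "visible to every step in this job"
    steps:
      - name: Step-level env
        env:
          STEP_ONLY: "visible to this step only"
        run: echo "$WORKFLOW_WIDE / $JOB_WIDE / $STEP_ONLY"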

Below, we calculate values using the shell and use set-env to make them available for the rest of the job. Note that values exported with set-env only become visible in subsequent steps, so within this step we work with plain shell variables. We also export REPO_NAME here, since the build and test steps below rely on it.

- name: Set Env Vars
  # TODO: Good case for refactoring into an action or script
  run: |
    # Docker doesn't accept uppercase characters for image names
    BRANCH_NAME_LOWERCASE=$(tr '[:upper:]' '[:lower:]' <<<"${GITHUB_REF#refs/heads/}")
    echo ::set-env name=BRANCH_NAME_LOWERCASE::$BRANCH_NAME_LOWERCASE
    # e.g. "my-repo" from "my-org/my-repo"
    echo ::set-env name=REPO_NAME::${GITHUB_REPOSITORY#*/}
    if [ "$BRANCH_NAME_LOWERCASE" = "master" ]; then
      ENV_FILE_NAME=".env.production"
      GCP_SA_EMAIL=${{ secrets.GCP_SA_EMAIL_PRODUCTION }}
      GCP_SA_KEY_BASE64_ENCODED=${{ secrets.GCP_SA_KEY_BASE_64_ENCODED_PRODUCTION }}
      GCP_PROJECT="project-id-production"
    else
      ENV_FILE_NAME=".env.dev"
      GCP_SA_EMAIL=${{ secrets.GCP_SA_EMAIL_DEV }}
      GCP_SA_KEY_BASE64_ENCODED=${{ secrets.GCP_SA_KEY_BASE_64_ENCODED_DEV }}
      GCP_PROJECT="project-id-dev"
    fi
    echo ::set-env name=ENV_FILE_NAME::$ENV_FILE_NAME
    echo ::set-env name=GCP_SA_EMAIL::$GCP_SA_EMAIL
    echo ::set-env name=GCP_SA_KEY_BASE64_ENCODED::$GCP_SA_KEY_BASE64_ENCODED
    echo ::set-env name=GCP_PROJECT::$GCP_PROJECT

You’ll notice the calculation of BRANCH_NAME_LOWERCASE. This is important, especially if you use JIRA and git flow, because your branches will be named something like feature/PROJECT-1 (which becomes feature/project-1 after lowercasing), and this could get ugly if you start building/testing branches with these names. So I’ve adopted the strategy of always lowercasing the branch name and using it throughout the job.

Now, if we’re going to use Google Cloud Build later, we need to activate gcloud. Shown below is the official action sponsored by Google. Additionally, there is a gcloud action on the Github Actions Hub (if you’re confused, it’s OK), but I prefer the former because it allows you to use gcloud across steps.

- name: Install gcloud CLI
  uses: GoogleCloudPlatform/github-actions/setup-gcloud@master
  with:
   # Please occasionally update this version
   version: '280.0.0'
   project_id: ${{ env.GCP_PROJECT }}
   service_account_email: ${{ env.GCP_SA_EMAIL }}
   service_account_key: ${{ env.GCP_SA_KEY_BASE64_ENCODED }}
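
As an aside, the base64-encoded service account key in those secrets is something you generate once by hand. A sketch of that one-time setup (the service account name here is hypothetical):

gcloud iam service-accounts keys create key.json --iam-account=ci-cd@project-id-dev.iam.gserviceaccount.com
# Paste the output into the GCP_SA_KEY_BASE_64_ENCODED_DEV secret
base64 key.json   # use base64 -w 0 on Linux to avoid line wrapping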

Now we need to actually check out the repository (pull down your code). This is going to be one of the shortest steps in your workflow. Note that there are a lot of other options available for this action, and you can even check out other repositories, which are then mounted as a child or sibling in the file system of this workflow (see the sketch below).

- name: Checkout Ref
  uses: actions/checkout@v2
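
Checking out a second repository might look like this (a sketch; the repository, token, and path here are illustrative):

- name: Checkout devops repo
  uses: actions/checkout@v2
  with:
    repository: my-org/devops                       # another repo to check out
    token: ${{ secrets.GITHUB_MACHINE_USER_PAT }}   # a PAT is required for private repos
    path: devops                                    # mounted as a child directory of the workspace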

And now the moment we’ve been waiting for is here. The actual build.

- name: GCP Cloud Build
  run: gcloud builds submit . --substitutions=REPO_NAME=${{ env.REPO_NAME }},BRANCH_NAME=${{ env.BRANCH_NAME_LOWERCASE }},TAG_NAME="latest",_ENV_FILE_NAME=${{ env.ENV_FILE_NAME }}

This is a pretty simple build, but I would still make some recommendations:

  • Right now we’re building in the GHA workflow, but what if for some reason we want to build from our local machine? Then we’d need to copy and maintain this command on our local machine too. Instead, I’d recommend a separate build script which calls this command. This would standardize the build invocation, ensure the necessary parameters are always there, and maybe do some setup work as well. Then invoking the build from your local machine would be the same as invoking from GHA!

  • The TAG_NAME is hardcoded right now, but I’d recommend a better tagging strategy. Definitely one that includes the SHA of the commit, but ideally one with some sort of automatic versioning.

  • Google Cloud Build automatically looks for a cloudbuild.yaml file, which we’re assuming exists here. If for some reason you have multiple cloudbuild files, you’ll need to specify the right one with the --config flag. (A sketch of what ours might look like follows this list.)

  • We’re passing ENV_FILE_NAME so our cloudbuild.yml can pull the correct env file when we’re building, because we’re assuming our build requires an env file.
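
For reference, a minimal cloudbuild.yaml matching the submit command above might look something like this (a sketch only; your real build probably has more steps):

steps:
  # Build the image, passing the env file name through as a build argument.
  # REPO_NAME, BRANCH_NAME, and TAG_NAME are built-in substitutions we bind on
  # the command line; _ENV_FILE_NAME is our user-defined substitution.
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'gcr.io/$PROJECT_ID/$REPO_NAME/$BRANCH_NAME:$TAG_NAME'
      - '--build-arg'
      - 'ENV_FILE_NAME=${_ENV_FILE_NAME}'
      - '.'
# Push the resulting image to Google Container Registry
images:
  - 'gcr.io/$PROJECT_ID/$REPO_NAME/$BRANCH_NAME:$TAG_NAME'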

Now we can easily deploy, right!? Well, kind of. You should think of deployment as a categorically different monster. Ideally you have a separate DevOps repository which handles deployments independently of application context. Here’s where I think GHA is really lacking: triggering workflows across repositories is a pain in the ass. To make it happen, you have to trigger what’s called a “Repository Dispatch”, which is basically just an API call. To do so, I recommend peter-evans/repository-dispatch, but this is a third-party solution and I think Github should offer an official action themselves. The huge gotcha here: repository dispatches are only received by workflows on the master branch! This means if you’ve got an orderly codebase with a release cadence and you don’t want to slop up your master branch with a bunch of test commits, you’re gonna have a bad time. What I do: change the on trigger at the top to push so you can test on a feature branch, and once you’re sure everything works, change it to:

on: 
  repository_dispatch:
    types: [trigger-deploy]

But as you can imagine, testing this is a pain in the ass.
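
One consolation: since a repository dispatch is just an API call, you can at least fire one by hand while testing instead of pushing commits. A sketch (the token and payload are placeholders; the Accept header is the preview header required as of this writing):

curl -X POST \
  -H "Accept: application/vnd.github.everest-preview+json" \
  -H "Authorization: token $YOUR_PERSONAL_ACCESS_TOKEN" \
  https://api.github.com/repos/my-org/devops/dispatches \
  -d '{"event_type": "trigger-deploy", "client_payload": {"ref": "refs/heads/develop"}}'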

The below step triggers a deploy on the my-org/devops repository and sends a client payload containing data from this workflow. Without it, your deployment workflow would only receive “it’s time to deploy!” with no information about where the message came from. Again, Github needs to improve this; IMO it should be native.

- name: Trigger Deploy (if develop or master branch)
  # https://github.community/t5/GitHub-Actions/Trigger-a-workflow-from-another-workflow/td-p/43681
  # Can't use $GITHUB_REF here
  if: env.BRANCH_NAME_LOWERCASE == 'develop' || env.BRANCH_NAME_LOWERCASE == 'master'
  uses: peter-evans/repository-dispatch@v1
  with:
    token: ${{ secrets.GITHUB_MACHINE_USER_PAT }}
    repository: my-org/devops
    event-type: trigger-deploy
    client-payload: '{"ref": "${{ github.ref }}", "sha": "${{ github.sha }}", "repo": "${{ github.repository }}"}'

Additional notes:

  • The comment # Can't use $GITHUB_REF here: We can conveniently reuse the BRANCH_NAME_LOWERCASE from earlier, but it’s worth noting you can access the ref of your commit in two ways in GHA: $GITHUB_REF and github.ref. Both exist in the documentation, the latter a little more so. The former cannot be used within expressions (such as a GHA if statement), so I’d just stick with github.ref and forget about the other.

  • GITHUB_MACHINE_USER_PAT: You need to call this trigger with a Github Personal Access Token (you can create it in settings; it needs the `repo` permission and maybe some more, since Github permissions are confusing). I suggest creating a “Machine User” for your organization: a user the DevOps team can access, whose credentials are independent of any specific team member.

  • Did you notice we triggered the deployment before running tests!? 😱
    Well, that’s a decision you can make. You can easily move this trigger to the end of the test job (discussed next), but because our tests take a long time, we’ve found it better for our QA process to deploy first and test later 🤠.
    Really important note: we’re not automatically deploying to production. Throughout most of this article, I’ve renamed our real “staging” environment to “production” to keep the example simple. We would never deploy to actual production without ensuring tests are passing.

That’s it! Assuming your build is configured correctly, you now have an image in Google Container Registry (or somewhere else) that we can use in the following jobs / workflows.


The Test Job

Now we need to run the tests. The strategy of when to test is up to you; for this example, we’ll assume we want to test every commit. Historically, you’d set up a Jenkins server for testing and try your best to replicate the “real” environment, occasionally updating the internal company wiki with setup instructions and never updating packages, because if it ain’t broke, don’t fix it! Well, that’s all over now. Since we have a built image, we don’t need Mr. Jenkins no more (thank God)! We’re now working under the mantra if it ain’t broke, make it better.

Here’s where things get weird, and you can skim unless you’re stuck:
You can either create a separate workflow file for testing, which would be more reusable, or do the testing in a separate job of the same workflow as the build. While the first seems like the better option, it’s much trickier with GHA. We only want our tests to run if our build passes (because otherwise we don’t have an image to test!), but when would we trigger the test workflow? We’d have to set up a repository dispatch at the end of the Build workflow which triggers the Test workflow. That approach has several problems:

  • It’s not exactly “perfect”: something could happen near the end of the Build workflow and prevent the repo dispatch from happening.

  • As mentioned earlier, repo dispatches are only received on the master branch.

  • Repo dispatches don’t show much information in their description, only “repository dispatch”, so unless you create a step to output the SHA of the commit which triggered the test, back-tracing a test run to its commit is a pain.

  • Repo dispatches lose the connection to pull requests, so if you want a PR to trigger a test and display it as a PR check step, you’ll need to configure that manually.

That’s a headache to manage. So we’ll avoid it and instead proceed with Test as job #2 in our build+test.yml workflow.

We start again by defining a job, and indicate that it only runs when the build works:

test:
  name: Test
  needs: build
  runs-on: ubuntu-latest

Then we need to again load the correct env vars and files based on the branch. Why? Well, if you have EXAMPLE_ENV_VAR and it affects your testing, and you change it for the develop branch but not yet for the master branch, you’ll need test environments per branch. You could reuse your regular environment files, but what happens if they do something like enable email sending? So my recommendation is to have environment-specific files, along with corresponding test environment files for each environment. Bonus points if you find a way to use inheritance and keep it DRY. Here’s what it will look like. Notice that it’s nearly the same code as in the build job; only the env file names differ.

- name: Set Env Vars
  # TODO: Good case for refactoring into an action or script
  run: |
    # Docker doesn't accept uppercase characters for image names
    BRANCH_NAME_LOWERCASE=$(tr '[:upper:]' '[:lower:]' <<<"${GITHUB_REF#refs/heads/}")
    echo ::set-env name=BRANCH_NAME_LOWERCASE::$BRANCH_NAME_LOWERCASE
    # e.g. "my-repo" from "my-org/my-repo"
    echo ::set-env name=REPO_NAME::${GITHUB_REPOSITORY#*/}
    if [ "$BRANCH_NAME_LOWERCASE" = "master" ]; then
      ENV_FILE_NAME=".env.production.test"
      GCP_SA_EMAIL=${{ secrets.GCP_SA_EMAIL_PRODUCTION }}
      GCP_SA_KEY_BASE64_ENCODED=${{ secrets.GCP_SA_KEY_BASE_64_ENCODED_PRODUCTION }}
      GCP_PROJECT="project-id-production"
    else
      ENV_FILE_NAME=".env.dev.test"
      GCP_SA_EMAIL=${{ secrets.GCP_SA_EMAIL_DEV }}
      GCP_SA_KEY_BASE64_ENCODED=${{ secrets.GCP_SA_KEY_BASE_64_ENCODED_DEV }}
      GCP_PROJECT="project-id-dev"
    fi
    echo ::set-env name=ENV_FILE_NAME::$ENV_FILE_NAME
    echo ::set-env name=GCP_SA_EMAIL::$GCP_SA_EMAIL
    echo ::set-env name=GCP_SA_KEY_BASE64_ENCODED::$GCP_SA_KEY_BASE64_ENCODED
    echo ::set-env name=GCP_PROJECT::$GCP_PROJECT

Why did I copy/paste the code from above? To illustrate another improvement that could be made to GHA: template inheritance and/or reusability. Technically I think you could do this with an action, which would reduce the lines of code, but you’d end up copy/pasting that setup action across your workflows, so the problem still isn’t really solved. I’d like to see the ability to run scripts above the jobs, inside the workflow, that apply to the entire workflow.

Again, we need to call checkout. More code repetition. I understand the benefit of “starting from scratch” with every job, but it could be beneficial to make this optional.

- name: Checkout Ref
  uses: actions/checkout@v2

And again, we need to set up gcloud (so we can pull the image). More code repetition!

- name: Install gcloud CLI
  uses: GoogleCloudPlatform/github-actions/setup-gcloud@master
  with:
   # Please occasionally update this version
   version: '280.0.0'
   project_id: ${{ env.GCP_PROJECT }}
   service_account_email: ${{ env.GCP_SA_EMAIL }}
   service_account_key: ${{ env.GCP_SA_KEY_BASE64_ENCODED }}

In our case, we need to configure gcloud to act as a credential helper for Docker so we can pull from GCR.

- name: Configure gcloud docker
  run: gcloud auth configure-docker

Now we create a container from our image. Note below we’re using the “latest” tag, but it would be more complete if we tagged our images with the SHA of the commit for this workflow (discussed earlier).

- name: Create container
  run: docker create --name container-name gcr.io/${{ env.GCP_PROJECT }}/${{ env.REPO_NAME }}/${{ env.BRANCH_NAME_LOWERCASE }}:latest

Caution: Fun Ahead

Assuming you’re doing the right thing and being careful with your environment variables, you won’t have them packaged in your builds, which means that in order to run tests, we’ll need to load in the necessary env file(s). But since we’re working from a pre-built image, we can’t bake files into it, which means we need to copy the file into a container, which means we need docker cp; but since our image probably has a different CMD (or none at all) in the Dockerfile it was built from, we need to change the container start command, which means we need docker run; but docker run takes an image, not a container, which means we need docker commit to snapshot our modified container into a derivative image we can run.

- name: Import secrets
  run: gcloud --quiet secrets versions access latest --secret="my-secret-env-file" > .env

- name: Transfer secrets to container
  run: docker cp .env container-name:/usr/src/app/

- name: Create new local derivative image
  run: docker commit container-name container-name-derivative

Note with the above: we’re using GCP Secret Manager, but this is just a choice, and secret management is a separate topic you’ll need to figure out yourself. Just please don’t store them unencrypted in version control. 🙏
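
For completeness, the secret above would have been created once, out-of-band, with something like this (assuming Secret Manager is enabled on the project):

# One-time setup: store the env file in GCP Secret Manager
gcloud secrets create my-secret-env-file --replication-policy="automatic" --data-file=.env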

Now we can finally run our tests!

- name: Run tests with coverage
  # --network=host used to connect to the postgres service (not shown above, but kept here for bonus help)
  run: docker run --rm --network=host container-name-derivative command-to-run-your-tests

That was like, super easy, right!? 😅
Are you ready to follow our process to the deployment?


The Deploy Workflow

This workflow has one job: deploy our image using a deployment script. As a tip, I recommend keeping the focus on reusability and only using GHA for conditional job running; all other scripts for your deployment process should exist as files or processes checked into the repository.

The Deploy Job

This will be the only job in this example workflow. Depending on your buildout, you might want to separate it into more. Regardless, I suggest merely using GHA to call your deploy scripts, leaving the meat of the work to better-suited tools.

Once again, we set up a workflow file:

name: Deploy to Dev/Production

on: 
  # Trigger when devops repo is updated
  push:
    branches: [master]
  # Trigger from Repo Dispatches. Note, they are only received on the master branch.
  repository_dispatch:
    types: [trigger-deploy]

jobs:
  deploy:
    name: Deploy
    runs-on: ubuntu-latest
    steps:

You can see in the on section that we’re triggering this in two ways. The first is if the devops repo itself is updated, we want to run a deployment. This assumes you’re following GitOps / Infrastructure as Code best practices, so that updating your devops repository will automatically update your servers. The second is what receives the repository dispatch, discussed in the Build job of our Build+Test workflow. You can read more about this in the GHA documentation.

Once again, we check out the repository. Keep in mind this checks out the devops repository.

- name: Checkout Ref
  uses: actions/checkout@v2

Our deployment script uses SSH to connect to our servers. Again, we use a “Machine User” account which contains an SSH key that is authorized to connect to our servers. Below, we move this key from Github Secrets to the .ssh directory of the workflow workspace.

- name: Set up SSH
  run: |
    mkdir -p ~/.ssh
    echo "${{ secrets.ID_RSA_GITHUB }}" > ~/.ssh/id_rsa
    chmod 600 ~/.ssh/id_rsa
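
Depending on your SSH configuration, you may also need to add the target servers to known_hosts so the connection doesn’t stall on the host-key prompt (a sketch; the hostname is hypothetical):

- name: Add server to known hosts
  run: ssh-keyscan -H dev.example.com >> ~/.ssh/known_hosts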

Our deploy scripts use python (Fabric), so we need to install pip packages. Normally I’d exclude this because it’s specific to our process, but it’s also a good insight into caching.

- name: Set up Python
  uses: actions/setup-python@v1
  with:
    python-version: '3.7'

- name: Cache pip
  id: cache
  uses: actions/cache@v1
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}

- name: Install python packages
  # Don't need to test for cache-hit like seen in some documentation, pip automatically recognizes it
  run: pip install -r requirements.txt

Most of this you can find in the GHA Python documentation, but the comment about “cache-hit” is important: as of 04/20, the documentation for the cache action makes it appear as though you need to check for a cache hit before installing, but when it comes to running pip install, you do not.

Finally, we call our deployment scripts. Again, a lot of repeated code:

- name: Deploy to Dev if devops repo or repo dispatch for 'develop' branch
  # Empty client_payload signifies devops repo was updated and not triggered from BE/FE
  if: github.event.client_payload == '' || github.event.client_payload.ref == 'refs/heads/develop'
  run: cd fab && fab deploy --server=dev --noinput

- name: Deploy to Production if devops repo or repo dispatch for 'master' branch
  # Empty client_payload signifies devops repo was updated and not triggered from BE/FE
  if: github.event.client_payload == '' || github.event.client_payload.ref == 'refs/heads/master'
  run: cd fab && fab deploy --server=production --noinput

Why two code blocks that look similar? Because if the devops repo is updated, we want to run both, and if a repo dispatch is for a specific branch, we only want to run one. You could instead use some Bash scripting to consolidate this, but this maintains clarity as my team gets used to this new tooling. If we weren’t doing the if checks, we could use the build matrix feature (which we use in a few other workflows and it’s pretty handy); there’s a sketch of that below.
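
For the curious, the matrix version might look something like this (a sketch only; note it drops the conditional logic we actually need here):

jobs:
  deploy:
    name: Deploy
    runs-on: ubuntu-latest
    strategy:
      matrix:
        server: [dev, production]   # one job instance per server
    steps:
      # ...checkout, SSH, and Python setup as above...
      - name: Deploy
        run: cd fab && fab deploy --server=${{ matrix.server }} --noinput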

Special note regarding running scripts inside of GHA
Recently I ran into a problem where a long-running Python script (via Fabric) was making an API call using the gcloud CLI tool, finishing (the output indicated so), but never being picked up by the system, resulting in a socket timeout. I did a lot of debugging, and this was not reproducible anywhere except via GHA. After a couple of days of being stuck, I changed from the Ubuntu runner to the OSX runner and the problem was resolved. I think this has something to do with the socket configuration of the Ubuntu runner on GHA. If anyone figures this out, I would love to know!


Conclusion

That was it! Do you LOVE GHA now?! 😅
We took our code from build, to test, to deploy, without leaving Github! If you’re doing something similar, I hope this helps! If you’re considering switching to GHA and want to know what I think, well then…

I have mixed feelings about Github Actions. It’s a logical step for the company, but the current state of the project leaves a lot to be desired. This is evident both in the official support channel and with hands-on experience. But, I don’t see it going away, so I assume the product team is hard at work with constant improvements. Here’s a straightforward breakdown of my thoughts:

Pros

  • Tight coupling with where your repositories are stored results in much less back-and-forth communication compared to using a 3rd party

  • Workflow files are logically organized into the applicable repository

  • Simplifies interaction with incoming / outgoing triggers, such as on push or even interacting with Pull Requests, releases, other Github objects, etc.

Cons

  • Often found myself confused/lost in documentation

  • Lots of code repetition in workflow files

  • Not optimized for organizations with multiple repositories

  • Conditionals and dynamic scripting need maturity

  • Workflow workspace machines need maturity (as of 04/20, Debian OS not available, maximum of 2 vCPUs available)

  • Choosing Github (which is now owned by Microsoft) over smaller players (read: small businesses) results in further Big Tech consolidation, which is a topic our industry should be taking more seriously

If I did this all over again, I’d probably wait 1-2 years before making the switch, but getting everything up and running was a really nice feeling.

Please feel free to leave comments below if you found any inconsistencies, mistakes, or just want to say thank you! 🙂