Why and how we built a Slack bot for code deploys

March 20, 2019

I’d like to introduce you to waldo’s newest buddy, Walbot. Walbot lives in the great unknown between Slack, AWS, and GitHub. There, he works the magic that gives our team an extra layer of security and smoother collaboration.

What Walbot does

Walbot is our deployment bot. We send him commands from our #deployment Slack channel, and he carries them out in our infrastructure. Whenever we update our code, we use Walbot to deploy (or launch) it.

For context, we have three environments. Code moves through them one at a time, and we test it at every stage to make sure it works properly. They are:

Staging – an isolated environment where we can create new code, change existing code, and test it without breaking anything in production.
Beta – uses the same data as Production, but access is limited to the people we choose.
Production – the stable version of the product, available to all end users.

Everything in our infrastructure is hosted on Amazon Web Services (AWS). We prefer cloud infrastructure so we don’t have to manage our own servers or hardware. It lets us scale our product and systems more easily and quickly.

Why we built Walbot

To deploy a new version of a service or application, people typically either use their hosting service’s graphical interface directly or run a command locally that deploys a specific branch or commit. That means the only way for your teammates to know which version is live is to add specific metadata to your application or to check directly in your hosting service’s portal.

That’s fine when only a few people are working together, but as a team grows, it leads to critical problems and more room for error. You need to confirm that what’s being deployed has passed all the tests, that your command is correct, and that you’re deploying the right version to the right service in the right environment.

Multiple people could be testing their code at the same time, and unless you ask all your teammates, it’s hard to know whether a service is free or currently in use. The worst case is when one person has tasks running and another deploys something, canceling those tasks. That’s a lot of wasted engineering time.

We created Walbot to add a layer of security and better collaboration on our deployments.

It centralizes and simplifies all the commands we need for any deployment, directly in Slack. It can only deploy builds that passed our CI (continuous integration), so we know all our tests passed.

We can also see a history of all deployments, as well as which deployments are taking place in real time. That gives us full transparency, and we get notified if anything goes wrong.

If one of us is deploying new code and we need full, uninterrupted control, we pin the Slack post to the channel asking the team to not touch it until we unpin it. If there’s a bug, we can see exactly which version of the product it was on and investigate what caused it quickly and easily.

How Walbot works

At waldo, all the engineering work and collaboration is done on GitHub (see how we collaborate here). All our code is hosted in different repositories. That means we can track everything we do every day by taking a look at our GitHub team account. Every discussion, commit, pull request, and review happens on the platform. This is the starting point of our CI/CD (continuous integration and continuous deployment).

Because of the nature of our business, our architecture relies on multiple services that communicate with each other. Most of our services are hosted on AWS Fargate, which allows us to scale our infrastructure easily without managing or configuring clusters or servers.

We use AWS Fargate in parallel with Docker. Docker provides a way to package your application into an image you can then run in a container so your application has everything it needs to run including libraries, system tools, code, and runtime. Docker virtualizes an operating system for containers the same way a virtual machine virtualizes server hardware.

AWS Fargate can be configured to pull any image from any AWS ECR repository and run it in a container. All you need to do is create a task definition specifying which image from AWS ECR to use, then update the given service with the new task definition (here’s how to do that).
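As a rough illustration of those two steps (not Walbot’s actual code), here is a minimal Python sketch. It assumes `ecs` behaves like a boto3 ECS client, and the function names, the `container_name` argument, and the list of read-only keys are our own assumptions for the example; check the key list against the current boto3 documentation before relying on it.

```python
import copy

# Keys that describe_task_definition returns but register_task_definition
# does not accept as input (an assumption to verify against the boto3 docs).
READ_ONLY_KEYS = {
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
}


def with_new_image(task_definition, container_name, image_uri):
    """Build register_task_definition arguments with one container's image swapped."""
    params = {
        key: copy.deepcopy(value)
        for key, value in task_definition.items()
        if key not in READ_ONLY_KEYS
    }
    matched = False
    for container in params["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = image_uri
            matched = True
    if not matched:
        raise ValueError(f"no container named {container_name!r}")
    return params


def deploy(ecs, cluster, service, family, container_name, image_uri):
    """Register a new task definition revision and point the service at it.

    `ecs` is expected to behave like boto3.client("ecs").
    """
    current = ecs.describe_task_definition(taskDefinition=family)["taskDefinition"]
    registered = ecs.register_task_definition(
        **with_new_image(current, container_name, image_uri)
    )
    new_arn = registered["taskDefinition"]["taskDefinitionArn"]
    ecs.update_service(cluster=cluster, service=service, taskDefinition=new_arn)
    return new_arn
```

Passing the client in as an argument keeps the sketch easy to exercise with a stand-in object before pointing it at a real account.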

What does all this have to do with CI/CD?

This is where it gets interesting. AWS also provides a service called CodeBuild. It allows us to run commands and build Docker images and, more importantly, it connects directly with GitHub.

Every time an engineer at waldo commits some code, it triggers a build in CodeBuild. CodeBuild pulls the code from GitHub and runs the tests. If those are successful, the Docker image is built, tagged with the commit hash, and then pushed to the right repository in AWS ECR. Once this is done, the status is updated on GitHub and a little mark appears next to the commit/PR. This is probably our favorite part of the process.
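To make the tagging convention concrete, here is a small Python sketch of how an image reference tagged with a commit hash could be composed. The function name is hypothetical, and requiring a full 40-character SHA-1 is our assumption; the process described above only says the image is tagged with the commit hash.

```python
import re

# Assumption for this sketch: tags are full 40-character SHA-1 commit hashes.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def image_uri(account_id, region, repository, commit_hash):
    """Compose an ECR image reference whose tag is the commit hash."""
    if not _FULL_SHA.fullmatch(commit_hash):
        raise ValueError(f"not a full commit hash: {commit_hash!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{commit_hash}"
```

Inside a CodeBuild build, the commit hash would typically come from the `CODEBUILD_RESOLVED_SOURCE_VERSION` environment variable. Because the tag is derived from the commit alone, anything that knows the commit can later work out which image to look for.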

Walbot in action

Once all this is set up, it becomes pretty easy to update the AWS Fargate service to deploy a safe version of our application. This is where Walbot steps up to the plate.

To deploy a service, we open the #deployment Slack channel and simply type “@walbot deploy [deploy instructions]”, and Walbot carries out the command. Walbot is based on Hubot (GitHub’s open-source framework for building chat bots), so it’s triggered by specific message patterns in the channel, but there are plenty of other libraries that work as well.

Walbot recognizes the repository, service, branch, or commit and the environment directly from the command.
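Walbot itself is built on Hubot, so the following is only a Python sketch of the idea, not its real implementation. It parses commands of the shape “deploy service[:branch or commit] to environment”; the regular expression, the `DeployCommand` type, and the choice to return `None` for anything unrecognized are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

ENVIRONMENTS = ("staging", "beta", "production")

# Optional "@walbot" mention, then: deploy <service>[:<ref>] to <environment>
_PATTERN = re.compile(
    r"^(?:@walbot\s+)?deploy\s+(?P<service>[\w-]+)"
    r"(?::(?P<ref>\S+))?\s+to\s+(?P<environment>\w+)$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class DeployCommand:
    service: str
    ref: Optional[str]  # branch name or commit; None means the default branch
    environment: str


def parse_deploy(text):
    """Return a DeployCommand, or None if the message is not a valid deploy."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    environment = match["environment"].lower()
    if environment not in ENVIRONMENTS:
        return None
    return DeployCommand(match["service"], match["ref"], environment)
```

Rejecting unknown environments at parse time means a typo in the channel produces no deployment at all rather than a deployment to the wrong place.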

For instance, when we want to deploy the updated version of our frontend service (called webapp), we type “@walbot deploy webapp to production”. Walbot then pulls the repository from GitHub, retrieves the Git commit hash, and checks the image tags on AWS ECR to make sure the corresponding image exists.
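The existence check could look something like the Python sketch below. It assumes `ecr` behaves like a boto3 ECR client and that the repository name and tag are already known; the function name is ours, and this is not necessarily how Walbot does it.

```python
def image_exists(ecr, repository, tag):
    """Return True if `repository` holds an image tagged `tag`.

    `ecr` is expected to behave like boto3.client("ecr").
    """
    try:
        response = ecr.describe_images(
            repositoryName=repository,
            imageIds=[{"imageTag": tag}],
        )
    except ecr.exceptions.ImageNotFoundException:
        # No image carries this tag, for example because the build failed
        # and nothing was pushed for that commit.
        return False
    return bool(response.get("imageDetails"))
```

Since images only reach the registry after the tests pass, a missing tag is a reasonable signal that the commit should not be deployed.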

We also allow it to deploy a specific branch or commit, such as “@walbot deploy webapp:feature/signup-autocomplete to staging”. In this instance, the bot creates the new task definition with a link to the specified image, updates the service with the new task definition, and waits until the image is successfully deployed before notifying the team on Slack.

The beauty of this is that Walbot itself is a service and can redeploy itself (Walbotception).

Lessons learned in building our Slack bot

Building Walbot was pretty simple, but we did run into a couple of snags.

For one, building your Docker image and running tests in CodeBuild is not always easy. To run your tests directly in CodeBuild, you need to have your application’s dependencies installed (your database, tools, third-party libraries, etc.). But given the environment CodeBuild provides, it gets tricky to install some of them during the build. For example, installing Node.js or Flow can quickly become cumbersome.

One solution is to provide CodeBuild with your own Docker image that contains all the dependencies (for us, this includes Python, git, GitVersion, and git-crypt, to list just a few). However, this is not a complete solution. Some dependencies need to be updated as your service evolves, and you don’t want to have to update your CodeBuild image every two weeks.

The best solution we found is to build a test image that’s then run in CodeBuild. If any of the tests fail, the whole build fails. Only then do we build the actual service image that’s pushed to AWS ECR.
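That ordering can be sketched in Python as a list of Docker commands that stops at the first failure. The `Dockerfile.test` file name, the `-test` tag suffix, and both function names are hypothetical; the only idea taken from the description above is that the test image is built and run first, and the service image is built and pushed only if that succeeds.

```python
import subprocess


def build_steps(service, image_uri):
    """Return the Docker commands in the order they should run."""
    test_tag = f"{service}-test"
    return [
        # Build an image containing the code plus its test dependencies.
        ["docker", "build", "-f", "Dockerfile.test", "-t", test_tag, "."],
        # Run the tests; a non-zero exit code fails the whole build.
        ["docker", "run", "--rm", test_tag],
        # Tests passed: build and push the actual service image.
        ["docker", "build", "-t", image_uri, "."],
        ["docker", "push", image_uri],
    ]


def run_steps(steps, run=subprocess.run):
    """Run each command in order, raising on the first failure."""
    for command in steps:
        # check=True makes subprocess.run raise CalledProcessError
        # when the command exits with a non-zero status.
        run(command, check=True)
```

With this shape, the dependencies live in the test image’s own Dockerfile and are updated alongside the service, rather than in a separately maintained CodeBuild image.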

The other issue we encountered was that the more you add to your infrastructure, the harder it becomes to manage the permissions, policies, roles, and services. AWS’s user interface is helpful at the beginning but becomes hard to work with as you grow. The solution we found is to use Terraform, which allows you to plan, update, and change your infrastructure directly as code. Keep your eyes peeled, as we’ll break down how we use Terraform in a future article.