When we think about git and git repos, we don’t often think about separating the .git repo itself from the working directory. But we can actually have a lot of fun with bare repos. They give you a lot of flexibility, and when you’re doing things like deploying code or running builds, that’s useful.
Searching the web, it’s actually not super easy to find info on how to do this. I figured that writing up a post on it would be helpful both for me and for anyone who finds this.
Creating a --bare Clone
Cloning a repo bare is easy enough. When you run git clone, you simply include the --bare flag. It’ll create a directory that is identical to the .git directory inside your normal old git checkout. The convention is to name this directory <whatever>.git, but that’s optional. The only difference between this checkout and your normal repo’s .git directory is that the config file will have bare = true. So to wrap up, your whole clone command will look like this: git clone --bare email@example.com:<org|user>/<repo-name>.git <repo-name>.git.
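As a quick sanity check, here is a minimal sketch that builds a throwaway "origin" repo in a temporary directory (a stand-in for your real remote, purely an assumption for the demo), clones it bare, and asks git what it thinks of the result:

```shell
tmp=$(mktemp -d)

# Stand-in for the real remote: a tiny repo with one commit on master.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/master
echo "hello" > "$tmp/origin/README"
git -C "$tmp/origin" add README
git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"

# The bare clone: the same contents as a normal clone's .git directory.
git clone -q --bare "$tmp/origin" "$tmp/origin-bare.git"

# Ask git whether the clone is bare.
is_bare=$(git --git-dir="$tmp/origin-bare.git" rev-parse --is-bare-repository)
echo "bare: $is_bare"
```

Listing the new directory shows HEAD, config, objects, refs and so on sitting at the top level, with no checked-out files anywhere.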
Now, because you have a bare repo, a few things are probably different from the repos that you’re accustomed to working with:
There’s no ‘working directory’
Nothing is ‘checked out’
You aren’t ‘on’ a branch
The cool thing is that using a bare repo actually lets you work with a few working directories, if you want. Each working directory will be free of a .git directory, so they’ll be smaller and not contain the entire history of your project.
Updating a Bare Repo
To update your repo, you’re going to use a fetch command, but you’re going to specify an environment variable beforehand. You’ll want to point GIT_DIR to your bare checkout:
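A minimal sketch of that fetch. The repos under the temporary directory are throwaway stand-ins (assumptions for the demo) for your real origin and bare clone; the line that matters is the GIT_DIR=... git fetch near the end:

```shell
tmp=$(mktemp -d)
# Demo helper: make an empty commit in the throwaway origin.
commit() { git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }

# Throwaway origin with one commit on master, plus a bare clone of it.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/master
commit "first"
git clone -q --bare "$tmp/origin" "$tmp/repo.git"

# Origin moves ahead, so the bare clone is now one commit behind.
commit "second"

# The actual update: point GIT_DIR at the bare repo and fetch master into master.
GIT_DIR="$tmp/repo.git" git fetch -q origin master:master

local_sha=$(GIT_DIR="$tmp/repo.git" git rev-parse master)
origin_sha=$(git -C "$tmp/origin" rev-parse master)
echo "bare clone at $local_sha, origin at $origin_sha"
```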
The master:master at the end of the command tells git to get the changes from your origin’s master branch and update your local master branch to match. If you want to update some other branch or fetch from some other remote, you can adjust your command accordingly. If you’re looking to update all the branches in your repo, swap the master:master for a wildcard refspec such as '+refs/heads/*:refs/heads/*'. Note that --all won’t do this on its own, since it fetches from all remotes rather than all branches.
Checking Out from a Bare Repo
Checking out from your bare repo is going to be almost identical to checking out anything in a normal repo, but you’ll need two environment variables specified: GIT_DIR and GIT_WORK_TREE. Your command will look a lot like this:
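A minimal sketch of the checkout, using git’s GIT_WORK_TREE variable for the target directory. As before, the origin, bare clone, and deploy directory under the temp dir are throwaway stand-ins for the demo:

```shell
tmp=$(mktemp -d)

# Throwaway origin with one file committed on master, and a bare clone of it.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/master
echo "v1" > "$tmp/origin/app.txt"
git -C "$tmp/origin" add app.txt
git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
git clone -q --bare "$tmp/origin" "$tmp/repo.git"

# The working directory has to exist before checking out into it.
mkdir -p "$tmp/deploy"

# The actual checkout: GIT_DIR is the bare repo, GIT_WORK_TREE is where the files land.
GIT_DIR="$tmp/repo.git" GIT_WORK_TREE="$tmp/deploy" git checkout -q -f master

# The deploy directory holds the files, but no .git directory.
ls -A "$tmp/deploy"
```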
The -f will discard any changes that have been made in the working directory. In most cases where you’ll be using this, that’s preferable to a failure just because something has changed in the directory.
This command will be the same whether you’re checking it out for the first time or updating it to the latest.
Hopefully that helps you (and me)! If you’ve got any questions or comments, or if I’ve made any errors, let me know in the comments below!
A while back, when wildcard certs first became available from Let’s Encrypt, I wrote a post about using Google Cloud DNS to create wildcard certificates. Since then, however, it’s come to my attention that Cloudflare offers free DNS that you can manage through an API. So I figured, why not move over to Cloudflare’s DNS instead? This post explains how to set up wildcard certs using Cloudflare’s DNS.
Setting up Cloudflare
Before you do anything else, you’ll need an account with Cloudflare. If you already have one, that’s great! You’ll need to import whatever domain you want to set up wildcard certs for – just follow the steps that Cloudflare gives you. The awesome thing is that Cloudflare will automatically detect your existing records (or at least try to) and import them for you. It might miss some, so just be aware and manually add any that it’s missing.
Finally, you’ll need to retrieve your Cloudflare API key, so that certbot can add the records that Let’s Encrypt needs to verify your ownership of the domain. To do that, you’ll need to click the ‘profile’ dropdown in the top right, then click ‘My Profile’:
Then, scroll down to the bottom of the page, where you’ll see links to get your API keys:
Click ‘View’ next to your Global API Key to show it. Naturally, make note of this – you’ll need it later on.
Like we did in our previous post, we’re going to use Docker to run certbot so that we can get our certificates without installing certbot and its dependencies. I’m doing this for the sake of simplicity, but if you’d rather avoid Docker, you’re free to install everything.
To use our API key, we need to have it wherever we’re running our Docker container from. In my case, I’m running it on my web server, but you can run it from any machine. Following the Cloudflare docs from Certbot, I used the following format for my credentials:
# Cloudflare API credentials used by Certbot
dns_cloudflare_email = firstname.lastname@example.org
dns_cloudflare_api_key = 0123456789abcdef0123456789abcdef01234567
I placed the file in my ~/.secrets/certbot directory, called cloudflare.ini. I’ll be able to mount this directory to the Docker container later, so it’ll be available to certbot running inside the container.
We’ll need to mount a few things so that our Docker container has access to them – first off, we need the credentials to be accessible. Second, we need to mount the location where the certificates will be placed, so that they persist when we shut down our container. And finally, we’ll mount the location where certbot places its backups. In the end, our Docker volume will look something like this:
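A minimal sketch of those mounts, wrapped in a shell function so that any certbot arguments can be passed straight through. Several things here are assumptions rather than anything confirmed above: the function name, the certbot/dns-cloudflare image, the /secrets mount point inside the container, and the host paths /etc/letsencrypt and /var/lib/letsencrypt (certbot’s usual locations for certificates and backups):

```shell
# Runs certbot in Docker with the credentials, certificate, and backup
# directories mounted. Each -v flag is host_path:container_path.
certbot_cloudflare() {
  docker run -it --rm --name certbot \
    -v "$HOME/.secrets/certbot:/secrets" \
    -v "/etc/letsencrypt:/etc/letsencrypt" \
    -v "/var/lib/letsencrypt:/var/lib/letsencrypt" \
    certbot/dns-cloudflare "$@"
}

# Example call (not run here); swap in your own domain:
# certbot_cloudflare certonly --dns-cloudflare \
#   --dns-cloudflare-credentials /secrets/cloudflare.ini -d '*.example.com'
```

Because ~/.secrets/certbot is mounted at /secrets, the cloudflare.ini file from earlier shows up inside the container as /secrets/cloudflare.ini.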
I’ve recently spent some time reworking my dotfiles repo. Up to this point, I’ve mostly just taken what someone else has made available, changed it to work just enough for me, and left it at that. Finally, I’ve put in some time to update them so that they’ll work better for me.
As part of this transition, I’ve made the move from Antigen over to Zgen. It’s not really a big change, but I like the fact that with Zgen, you only run the update check when you want to, and not every single time that a new shell loads. Of course, this opens you up to the possibility of updating everything on a cron as well (which I’d highly recommend).
My dotfiles were originally taken from Holman‘s dotfiles repo. As you do with dotfiles repos, I’ve modified them quite a bit since I first copied his repo, and I need to do some updating to get some of the more recent stuff that he’s added, but for now they’re working for me.
Next up, you’ll need to add zgen (and install plugins) in your .zshrc file, like this:
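# Load zgen itself (this assumes it was cloned to the default ~/.zgen location)
source "${HOME}/.zgen/zgen.zsh"

if ! zgen saved; then
  echo "Creating a zgen save"

  # Load the oh-my-zsh base library before any of its plugins
  zgen oh-my-zsh
  zgen oh-my-zsh plugins/git
  zgen oh-my-zsh plugins/sudo
  zgen oh-my-zsh plugins/command-not-found
  zgen load zsh-users/zsh-syntax-highlighting
  zgen load zsh-users/zsh-history-substring-search
  zgen load bhilburn/powerlevel9k powerlevel9k
  zgen load junegunn/fzf
  zgen load zsh-users/zsh-completions src
  zgen oh-my-zsh themes/arrow

  # save all to init script
  zgen save
fi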
Those are the plugins that I’m currently using, though I’m looking for more that might be useful. Now, you get all of these awesome things without having to install them all separately, plus whatever else you add. And because you’re using Zgen, not Antigen, they’ll only update (& check for updates) when you want them to, rather than every single time that you open your shell.
To update your plugins (which you should definitely do periodically), all you have to do is run zgen update. It really couldn’t be simpler!
Once I get more done with my dotfiles, I’ll throw more of it up here so you can check it out. Until then, I hope this is helpful!
At work recently, we had a need to generate diffs between two different directory trees. This is so that we can handle deploys, but it’s after we’ve already generated assets, so we can’t just use git for the diff creation, since git diff doesn’t handle files that aren’t tracked by git itself. We looked into using GNU’s diffutils, but it doesn’t handle binary files.
We tried investigating other methods for deploying our code, but thought it would still be simplest if there was some way to generate just a ‘patch’ of what had changed.
Luckily, one of the Staff Engineers at Etsy happened to know that rsync had just such an option hiding in its very long man page. Because rsync handles transferring files from one place to another, whether it’s local or remote, it has to figure out the diffs between files anyway. It’s really nice that they’ve exposed it so that you can use the diffs themselves. The option that does this is called ‘Batch Mode’, because you can use it to ‘apply’ a diff on many machines after you’ve distributed the diff file.
Creating the Diff
To create the diff itself, you’ll need to first have two directories containing your folder structure – one with the ‘previous’ version and one with the ‘current’ version. In our case, after we run each deploy, we create a copy of the current directory so that we can use that as our previous version to build our next diff.
Running that command will give you two files, diff and diff.sh. You can just use the .sh file to apply your diff, but you don’t have to. As long as you remember to use the same flags when applying your diff, you’ll be fine. You can also use any filename that you want after the =.
Also, it’s important to note that running this command will update /deploy/previous to the contents of /deploy/current. If you want to keep /deploy/previous as-is so that you can update it later, use --only-write-batch instead of just --write-batch.
Applying the Diff
Next up, you’ll want to distribute your diff to whatever hosts are going to receive it. In our case, we’re uploading it to Google Cloud Storage, where all the hosts can just grab it as necessary.
On each host that’s applying the diff, you’ll want to just run something like the following:
rsync --read-batch=/path/to/diff /deploy/directory
Remember, you need to use the same flags when applying your diff as you did when you created your diff.
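To see the whole round trip in one place, here is a sketch using throwaway directories, where "host" plays the part of a remote machine that is still on the previous version. The directory names and the -a flag are assumptions for the demo:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/previous" "$tmp/current"
echo "v1" > "$tmp/previous/app.txt"
echo "version two" > "$tmp/current/app.txt"
echo "new file" > "$tmp/current/extra.txt"

# A host's deploy directory, still identical to 'previous'.
cp -pR "$tmp/previous" "$tmp/host"

# Build the batch without touching 'previous' itself.
rsync -a --only-write-batch="$tmp/diff" "$tmp/current/" "$tmp/previous/"

# On the host: apply the batch with the same flags (-a) used to create it.
rsync -a --read-batch="$tmp/diff" "$tmp/host/"

cat "$tmp/host/app.txt"
```

After the second rsync, the host directory matches 'current', while 'previous' is left alone because of --only-write-batch.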
In our testing, this worked well for applying a diff to many hosts – updating around 400 hosts in just about 1 minute (including downloading the ~30MB diff file to each host).
This will fail if the diff doesn’t apply cleanly. So, essentially, if one of your hosts is a deploy behind, you should make absolutely sure that you know that, and don’t try to update it to the latest version. If you try to anyway, you’ll probably end up with errors in the best case, or a corrupt copy of your code in the worst case. We’re still working on making our scripts handle the potential error cases so that we don’t end up in a corrupt state.
I hope this is helpful to you! If you’ve got any thoughts, questions, or corrections, drop them in the comments below. I’d love to hear them!
In Ruby on Rails, it’s easy to build custom functions to calculate something and then display the result in your views. While this simplicity is nice, it doesn’t come without its drawbacks.
Recently, when working on a simple app, I came across a situation where loading a page was taking 0.5 seconds. This may not sound like a lot (and wouldn’t be for most sites), but in an app as simple as mine, it’s a sign that something is taking way longer than it should. Luckily, it wasn’t too difficult to determine what it was.
Let’s start with an example: say you’re building an application that will contain purchases from a grocery store. You probably want to link the items sold in a purchase with the record from that purchase, right? Well, somewhere you’re going to have to calculate the total. Of course, I’m assuming that you don’t want the customer to calculate the total.
You could calculate the total every time that you need to load the record of the purchase, but first let’s walk through what would be happening when you calculated the total. If there are, say, 30 items in that purchase, you’ll need to load every single one of those items so that you can grab the price (we’re assuming prices don’t change for this example) and add them all together.
As you might imagine, this isn’t a very efficient way to go about things. We’d rather offload some of that computation (that would be happening an awful lot) to the disk, instead. After all, it’s generally easier to store a few bytes than spend valuable CPU time recalculating it every time you need it.
In my case, that’s exactly the sort of thing that was happening. I was working to calculate a field that wouldn’t change often but that involved loading lots of links to other records. On top of that, it was going to be loaded pretty often. It’s much more efficient for me to just store that value than to calculate it for every request.
You’ll need to add a new field to your database, which means you’ll need to add a database migration, something like this:
After you run your migration (rails db:migrate), you’ll have your new field. Now, if you generated all your scaffolding, that’d be showing up in your user interface. That’s not what you want to do, though, since we’re trying to make this easier for your users and calculate it on their behalf.
Thus, we’re going to add something like the following to our model:
before_save :calculate_total # the callback name is just illustrative
def calculate_total
  self.total = x + y # Whatever you need to do here to calculate
end
Now that method will run automatically before the record is saved, and place our calculated value into the total value, which means it’ll end up there in the database, as well.
Like I said, just how much benefit (if any) you’ll get out of this depends on your exact circumstances, but in my case it reduced a 500 ms page load to around 100 ms, which is clearly a substantial improvement.
If you’ve got any questions, drop them in the comments, and I’ll do my best to answer them!
I know I’m really late to the party, but I’m just finally creating something from scratch in Ruby on Rails. I’ve dealt a tiny bit with Rails before, but it was mostly just in helping my brother with CSS stuff, which obviously isn’t working on the Rails backend.
Anyway, now that I’ve started working with Rails (to build a simple app for my wife), I’ve found myself needing to learn how to do things in Rails 5. The problem is, a lot of things have changed in Rails, but most Google search results for ‘Rails …’ or ‘Ruby on Rails …’ end up with articles that are at least several years old. It’s hard to figure out what actually applies to Rails 5, vs. any other version of Rails. Thus, I’ve decided to write up some of my findings so that they’re hopefully helpful for someone else.
I’ll assume that you already know how to create your initial models, views, and controllers. If you don’t, check out this guide to get started.
Creating Many:Many Associations
Creating links between your records is pretty straightforward. I’m going to be using hypothetical ‘products’ and ‘purchases’ tables, which aren’t necessarily a perfect use-case for this, but they’re good enough. We can use Rails’s simple generator to make a link between our tables by running rails generate migration CreateJoinTableProductsPurchases products purchases.
All that’s going to do is generate a migration file (if you want those indexes, uncomment the two t.index lines):
class CreateJoinTableProductsPurchases < ActiveRecord::Migration[5.2]
  def change
    create_join_table :products, :purchases do |t|
      # t.index [:product_id, :purchase_id]
      # t.index [:purchase_id, :product_id]
    end
  end
end
After updating our database with the migration (rails db:migrate), we just need to add a has_and_belongs_to_many to each of our models:
class Product < ApplicationRecord
  has_and_belongs_to_many :purchases
end
class Purchase < ApplicationRecord
  has_and_belongs_to_many :products
end
I know that using has_and_belongs_to_many may be going out of favor, but I haven’t had time to look at the alternative just yet, so I’m sticking with HABTM for now.
Creating Many:Many Links in the UI
Now you’ve got a link between your two models, but no convenient way to create any links! That’s what we’ll be focusing on next. Luckily, it’s pretty easy; you just need to know what to do.
Showing Many:Many Forms
First off, in the form for the ‘purchases’ side of our relationship, we’ll simply add the following:
If you’re using Rails’s scaffold generator, you’ll add it to app/views/<model_name>/_form.html.erb. This creates a list of all the ‘products’ (in the example code) in the form for a purchase, and allows you to select multiple products.
If you’d like, you can add similar code to the other side of your relationship. In my case, I’m just adding it on one side. The form will end up looking something like this:
You can use ‘Ctrl/Cmd+click’ to select more than one item in the list, but that’s really all there is to it. It’s a very basic form, but it’ll do the trick.
Of course, you can do something more advanced, but we’re just covering the basics here. Ideally, I’d like a multi-select autocompleting textbox. Hopefully some day in the near future I’ll be able to put a guide up here on how to do that.
Saving Many:Many Relationships
Now, to make our selections actually save, we’ve got to add something to the controller. Since I’m only allowing the creation of links from the ‘Purchase’ UI, I put this code in my app/controllers/purchases_controller.rb file, in both the create and the update methods:
params[:purchase][:product_ids].each do |product_id|
  product = Product.find(product_id)
  @purchase.products << product
end
And that’s all there is to it! When you either create or update a ‘purchase’, your selection for its ‘products’ will be saved.
Viewing Many:Many Links in the UI
So now that we’re able to easily create links, it would be nice if we could view those links. So let’s make it possible to see the list of ‘products’ in a ‘purchase’. Using the Rails scaffolding, we’re going to want to expose these in both the ‘list’ view and the ‘single’ view for each side of our link. You can pick if the same applies to you.
In our app/views/purchases/index.html.erb, we’re going to add the following to a new <td> that will display the ‘products’ in each ‘purchase’. Don’t forget to add an associated <th> in the header, as well.
<%= purchase.products.map(&:name).join(', ') %>
Then, upon loading /purchases, you’ll see something a lot like this:
Next up, we need to add some very similar code to our app/views/purchases/show.html.erb file (but note the @ in the name of the purchase variable):
<%= @purchase.products.map(&:name).join(', ') %>
And then you’ll have something like this when you view your ‘purchase’ (at a URL like /purchases/2):
That’s it! As I said, this isn’t a perfect use-case for a many:many relationship, but it’s at least an overview of how to create the UI elements that will allow you to save and view such relationships.
If you want to take a look at the code in its entirety, it’s available on GitHub. And, if you’ve got any tips or comments, drop them in the ‘comments’ section below! That’s what it’s there for!
This is going to be a quick post – but I wanted to put it here for my own reference, since it’s something I have to look up pretty often. I might as well make my notes about it public so that others can benefit, too.
What are ‘Contexts’?
In Kubernetes, a Context is essentially the configuration that you use to access a particular cluster & namespace with a user account. In most cases, this will be your user account, but it could also be a service account.
In my particular case, there are at least a few Kubernetes clusters that I need to access pretty regularly. We have one in our data center and two or three different clusters (depending on the day) configured in GCP to work on our migration there. When I need to work in one cluster, I need to remember how to activate the context that grants me access to that cluster.
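A sketch of one command that fits the description here, wrapped in a small function for reuse. The jsonpath query and the function name are assumptions, not necessarily the exact command these notes were written around:

```shell
# Prints one context name per line: kubectl emits the names separated by
# spaces, and tr swaps each space for a newline.
list_contexts() {
  kubectl config view -o jsonpath='{.contexts[*].name}' | tr ' ' '\n'
}
```

Calling list_contexts on a machine with kubectl configured prints each context name on its own line.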
This will show all your configured contexts in Kubernetes. I included the | tr ... to replace the spaces with newlines so that it’s easier to parse the results. This way, you can easily see the exact names of your contexts, so that you can easily switch between them.
Show your Current Context
kubectl config current-context
This just shows your current context. It’s pretty self-explanatory, but I often forget the exact syntax that lists my context.
Set your Context
kubectl config use-context <context_name>
And this, not surprisingly, sets your context. So if you need to switch from your minikube context to your gcp-project-cluster-context, you just use this nifty command, and suddenly your commands are pointing at an entirely different cluster.
As of last month (March 8th-ish, 2018), Let’s Encrypt supports wildcard certificates! This is great news, because it means that those of us who like using tons of subdomains can now get one cert for all our subdomains, rather than having to get a cert for every single subdomain.
As you may know, Certbot is the tool provided by the EFF that you use to interact with and issue certs from Let’s Encrypt. It used to be called letsencrypt-auto, but when the EFF took it over, it switched names to Certbot.
Now, it’s not quite as easy to get wildcard certs as it is to get normal certs – mainly because there are some prerequisites. The nice thing is, some of these prerequisites make it easier to issue and renew certificates without temporarily disabling your web server.
The certbot docs aren’t super clear about a lot of this, so you have to do some digging, but essentially it boils down to this:
Wildcard certificates are only available if you use the ‘DNS’ method of verification.
DNS verification requires that you use a DNS provider with a supported API. This is because you need to be able to quickly create TXT records so that Let’s Encrypt can verify them. You can do this manually, but it seems like a pain. The supported DNS providers for automatic updates are:
I’ve decided to just go with Google Cloud DNS, because I’m already doing some other stuff in Google Cloud Platform, and it’s really quite affordable for what I need. Sure, it’s not as cheap as just using the DNS that your registrar provides, but I know mine doesn’t provide an API, especially not one supported by certbot.
Setting up Google Cloud DNS
To get Google Cloud DNS set up, you’ll obviously need a Google account. If you don’t have one, well, I’ll leave it to you to get one. Then you’ll need to set up Google Cloud Platform – once again, I’ll leave that to you. You’ll also need to set up a project for your DNS records. If you already have a project, you can feel free to use that.
Set up a Service Account
Next up, you’ll need to set up a ‘Service Account’ that will let you access the GCP DNS API with restricted permissions, so that you can safely put the credentials on a box that’ll handle your renewals without having to fully authenticate with GCP yourself. To start out, click the ‘hamburger’ menu on the left, then find ‘IAM & admin’, and finally ‘Service Accounts’:
Once there, you’re going to want to click the ‘Create Service Account’ button at the top. Currently, the Cloud DNS permissions and whatnot are in beta, so that means that while you could create a custom role that would have exactly the permissions that you need, those are subject to change and there’s a decent chance that you’d need to recreate your service account later. To avoid this, I just made my new service account into a ‘DNS Administrator’. It’s got more permissions than I strictly need, but I’m not super worried about that.
After you give your service account a name, you’ll want to check the ‘Furnish a new private key’ box so that you can download the credentials file that you’ll need to access the API later:
When you create your account, it should automatically download the JSON file with the credentials. I copied that key to the server I’m using to issue certificates (my good ‘ole Linode server!) so that I can use it later.
Creating your DNS Zone
Naturally, to do anything with DNS, you need to have a domain to do something with. You’ll have to point your registrar to Google/your DNS provider of choice before you can actually issue a wildcard certificate.
I set up my domain in Google Cloud DNS before I switched anything at my domain registrar, so that I wouldn’t have to worry about any downtime where my site was unreachable. To do this, you’ll first want to access the Cloud DNS control panel in GCP:
Next up, click the ‘Create Zone’ button at the top of the console. Then, you’ll enter the information for your domain:
Then you’ll create some records for your domain. At the very least, you’ll probably want one A record pointing to your server, but you can also create subdomains or whatever else you want. If you’re copying your config from somewhere else, put in all the records that you had on your previous provider. I had quite a few records, but even so it didn’t take very long.
Finally, you’ll update the nameservers to point to your DNS provider. I use a few different registrars, but they all make this part pretty easy. In this particular case, I was using Hover – all I had to do was click their ‘Edit Nameservers’ link near the domain I wanted to adjust, and put in Google’s nameserver addresses:
You’ll need to put in whatever your DNS provider wants, but if you’re using Google, the above should work. Once you’ve done that, you might have to wait a few minutes for your DNS to switch over to your new provider. It only took a couple of minutes in my case. I used the DNS lookup tool at MxToolbox to figure out when it switched. It shows your nameserver at the bottom, like this:
Once that’s updated to your new provider, you’re ready to get issuing certificates!
Using Docker to Issue the Certificates
Now, we can do something really nifty here to renew our certificates. Instead of installing certbot-auto on our server, we can just always use the latest up-to-date version in a preconfigured, lightweight Docker container. Sure, this requires you having Docker installed, but who doesn’t these days? If you’d rather, you can do this all manually with your own certbot-auto installation, but I chose to go the Docker route, for simplicity’s sake.
Now it’s actually pretty simple to just run our Docker container. We just need to get the correct arguments, and we’ll be good to go. Note, this does require that you have locations set up where you want to put your certificates, otherwise they’ll just float off into the ether when your container shuts down.
Configuring our Volumes
So first up, let’s create the place we want our certificates to be placed. In my case, I’m going to stick with the default, since my server has already been storing Let’s Encrypt certificates there anyway: /etc/letsencrypt. This is where Let’s Encrypt stores all its configuration and certificates by default. If you’re putting it somewhere else, go ahead and create that directory. Keep note of it for later, naturally. Later, we’re going to map our Docker container to use that as a volume.
Next up, we need to create the place where Let’s Encrypt will store backups. I guess this isn’t strictly necessary (mine is empty anyway), but I figure it can’t hurt. By default it’s in /var/lib/letsencrypt. So we’ll also be mounting that.
Finally, we need to mount the directory where you put your GCP service account’s credentials. I’ve put mine in ~/.secrets/certbot and changed the name to google.json, but you can put it wherever you want and call it whatever you want, really.
So these are going to end up being arguments like this when we run our docker command:
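A sketch of those three mounts as docker -v flags, using the paths described above. The /secrets mount point inside the container matches the path that shows up in certbot’s log output; the function name is made up for illustration:

```shell
# Prints one "-v host_path:container_path" flag per directory to mount.
certbot_mounts() {
  printf '%s %s:%s\n' \
    -v /etc/letsencrypt /etc/letsencrypt \
    -v /var/lib/letsencrypt /var/lib/letsencrypt \
    -v "$HOME/.secrets/certbot" /secrets
}

certbot_mounts
```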
The image that we’re going to be using (assuming you’re sticking with Google DNS) is certbot/dns-google. If you’re using another DNS provider, you can probably figure out pretty easily which image you’ll need.
In the command itself, we’ll also need to use the certonly command, signifying to certbot that we don’t want to have it try to actually install the certificates for us, we only want it to issue them. I personally prefer installing them myself anyway, and that would be very difficult and/or impossible to do from within a Docker container anyway.
We’re also telling certbot to use Google’s DNS with --dns-google, and we’re giving it the path to the credentials file with --dns-google-credentials <file-path>. The last thing we have to do is manually specify the Let’s Encrypt server that we’re using, because right now, wildcard certs are only supported by one server: --server https://acme-v02.api.letsencrypt.org/directory. That should do it for our arguments.
All this means that our full docker command will look like this:
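Assembled from the pieces above, here is a sketch of the full command, wrapped in a function so it can be rerun easily. The function name and the -it, --rm, and --name flags are assumptions; the mounts, image, and certbot arguments are the ones already described:

```shell
# Issues the wildcard certificate using the certbot/dns-google image.
issue_wildcard_cert() {
  docker run -it --rm --name certbot \
    -v "/etc/letsencrypt:/etc/letsencrypt" \
    -v "/var/lib/letsencrypt:/var/lib/letsencrypt" \
    -v "$HOME/.secrets/certbot:/secrets" \
    certbot/dns-google \
    certonly \
    --dns-google \
    --dns-google-credentials /secrets/google.json \
    --server https://acme-v02.api.letsencrypt.org/directory \
    -d '*.russt.me'
}
```

Defining the function does nothing on its own; running issue_wildcard_cert is what starts the container.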
Of course, if you’ve made any adjustments in the way you’re doing this, you’ll need to adjust the command itself. The final line is the important one: -d '*.russt.me' tells certbot to issue a new wildcard certificate for *.russt.me. You’ll want to make sure you change it, because, well, I’d rather you not issue certificates for my domain. Plus, unless you’ve hacked my Google DNS account, you probably don’t have access.
With any luck, you’ll see some output that looks a lot like this:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator dns-google, Installer None
Obtaining a new certificate
/usr/local/lib/python2.7/site-packages/josepy/jwa.py:107: CryptographyDeprecationWarning: signer and verifier have been deprecated. Please use sign and verify instead.
signer = key.signer(self.padding, self.hash)
Performing the following challenges:
dns-01 challenge for russt.me
Unsafe permissions on credentials configuration file: /secrets/google.json
Waiting 60 seconds for DNS changes to propagate
Waiting for verification...
Cleaning up challenges
- Congratulations! Your certificate and chain have been saved at:
Your key file has been saved at:
Your cert will expire on <date>. To obtain a new or tweaked
version of this certificate in the future, simply run certbot
again. To non-interactively renew *all* of your certificates, run
I’ve cleaned it up a bit, but that’s the gist. I chose to ignore the ‘Unsafe permissions’ warning, since I’m running this in a Docker container anyway and the file on my system actually does have the correct permissions.
I’ll leave the configuration and use of these certificates up to you. Hopefully, you have some idea of how to use the certificates on your own server. If not, there are guides all over that should help you out.
I’m extremely thankful to both the EFF and to Let’s Encrypt for enabling us all to issue free SSL certificates, and it’s even better now that they’re letting us issue wildcard certificates. If you’re also feeling thankful, head on over to the EFF’s donation page or Let’s Encrypt’s donation page and drop a donation for them. The web will thank you for it.
If you’ve got any questions or comments, feel free to drop them in the comments below! I’ll do my best to get back to you. Extra props if you correct an error or tell me a better way to do this.
Sometimes, you've just got a big codebase. Maybe you want to put that codebase into a Docker image. Sure, you could mount it, but sometimes that's either not an option or would be too much trouble. In those cases, it's nice to be able to cram your big 'ole codebase into images quickly.
To accomplish this, we're going to have 3 main layers in our image. Of course, you're welcome to add as many layers as you want, or even take one away. We'll just use 3 for demo purposes – and because that's what's worked in my situation.
The first image will be what we call the 'base' image. It's the one that we're going to copy our code into. On top of that, we'll build an image with our dependencies – it's our 'dependency' image. Finally, we've got our 'incremental' image, which just has our code updates. And that's it. So three images: base, dependency, and incremental.
Luckily, the first two images don't have to build quickly. They're the images that we use to prepare to build the incremental image, so that the incremental build can be quick when we need it.
The Base Image
This image is actually pretty simple to set up. Here's an example of a Dockerfile:
RUN yum -y install patch && yum -y clean all
COPY repo /var/repo
You'll notice that I'm installing the patch utility. That's because we're going to use this in the incremental image later on to apply a diff to our code. If you have any binary files in your image, you might want to install git instead, because git handles binary diffs, where patch doesn't.
Now, at the end there, we're doing something that you don't necessarily see in every Dockerfile. When we build this image, there are a few more things we should do. We're including the sha argument so that later on we can generate the right diff that we need to apply to get the latest code. We need to pass this in to the docker build command as a --build-arg, and this last bit of the Dockerfile will add that as a label to the image itself. You can see an example on Stack Overflow.
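To make the label round trip concrete, here's a rough sketch of both halves: passing the sha in at build time and reading it back later. The function names are hypothetical, and it assumes the label key is sha, matching the build argument described above. It reads the label with docker inspect's --format option, which is one alternative to parsing the JSON output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Build an image and hand the commit sha to the Dockerfile as a build arg.
#   $1 = image tag, $2 = commit sha, $3 = build context (defaults to .)
build_with_sha() {
    docker build --build-arg "sha=$2" -t "$1" "${3:-.}"
}

# Print the value of the "sha" label stored on an image.
#   $1 = image name, e.g. base-image:latest
read_sha_label() {
    docker inspect --format '{{ index .Config.Labels "sha" }}' "$1"
}
```

Calling read_sha_label base-image:latest after a build should print whatever sha was passed to build_with_sha for that image.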
We also should avoid copying in parts of the codebase that we don't need in the image. For example, we probably don't need our .git/ folder in our Docker image. There are a couple of ways that we can accomplish this – we can either do a --bare checkout of our repo, or we can just delete the .git folder before we copy it in. I took the first approach, because it allows me to just update my repo and check things out again.
I used a bash script to handle all this, so that I don't have to remember everything and I don't have to worry about accidentally skipping things. Here's what my base image building script looks like, approximately:
# This allows you to pass in an environment variable, but sets a default.
# (The default path below is a placeholder; use whatever location suits you.)
CODE_DIR=${CODE_DIR:-./repo.git}
# If the repo already exists, just update it. Otherwise, clone it.
if [ -d "$CODE_DIR" ]; then
    echo "Found an existing git clone... Fetching now to update it..."
    GIT_DIR=$CODE_DIR git fetch -q origin master:master
else
    echo "No clone found. Cloning the entire repo."
    git clone --mirror email@example.com:my/repo.git "$CODE_DIR"
fi
# This grabs the sha we'll be building
BUILD_VERSION=$(GIT_DIR=$CODE_DIR git rev-parse master)
mkdir -p ./repo
# We clean the old directory to make sure it's a 'clean' checkout
rm -rf ./repo/*
# Check out the code
GIT_DIR=$CODE_DIR GIT_WORK_TREE=./repo git checkout -f "$BUILD_VERSION"
# Build the image (the trailing . is the build context)
docker build --rm -t base-image:latest --build-arg sha="$BUILD_VERSION" .
docker push base-image:latest
So, it's not super simple (if you want to skip the .git/ folder), but I think you get the idea. Next, we'll move on to the dependencies image.
The Dependencies Image
This image is really only necessary if you have more dependencies to install. In my particular case, I needed to install things like php, sqlite, etc. Next up, I installed the composer dependencies for my project. If you're using something other than PHP, you can install the dependencies through whatever package manager you're using – like bundler or npm.
My Dockerfile for this image looks a lot like this (the particular incantations you use will depend on the flavor of linux you're using, of course):
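FROM base-image:latest
# php and sqlite stand in for the full package list, which isn't shown here
RUN yum -y install php sqlite \
    && yum -y clean all
WORKDIR /var/repo
RUN composer update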
This image doesn't necessarily need to have a bash script to build it, but it all depends on what your dependencies look like. If you need to copy other information in, then you might just want one. In my case, I have one, but it's pretty similar to the previous script, so I won't bother to put it here.
The Incremental Image
Now for the fun part. Our incremental image is what needs to build quickly – and we're set up to do just that. This takes a bit of scripting, but it's not super complicated.
Here's what we're going to do when we build this image:
Update/clone our git repo
Figure out the latest sha for our repo
Generate a diff from our original sha (the one the base image has baked-in) and the new sha
When building the image, copy in and apply the diff to our code
To handle all of this, I highly recommend using a shell script. Here's a sample of the script that I'm using to handle the build (with some repetition from the script above):
# This allows you to pass in an environment variable, but sets a default.
# (These defaults are placeholders; the image name matches the base image script.)
CODE_DIR=${CODE_DIR:-./repo.git}
BASE_IMAGE_NAME=${BASE_IMAGE_NAME:-base-image}
BASE_IMAGE_VERSION=${BASE_IMAGE_VERSION:-latest}
# If the repo already exists, just update it. Otherwise, clone it.
if [ -d "$CODE_DIR" ]; then
    echo "Found an existing git clone... Fetching now to update it..."
    GIT_DIR=$CODE_DIR git fetch -q origin master:master
else
    echo "No clone found. Cloning the entire repo."
    git clone --mirror firstname.lastname@example.org:my/repo.git "$CODE_DIR"
fi
# Get the latest commit sha from GitHub, use jq to parse it
echo "Fetching the current commit sha from GitHub..."
BUILD_VERSION=$(curl -s "https://github.com/api/v3/repos/my/repo/commits" | jq -r '.[0].sha')
# Generate a diff from the base image
docker pull "$BASE_IMAGE_NAME:$BASE_IMAGE_VERSION"
BASE_VERSION=$(docker inspect "$BASE_IMAGE_NAME:$BASE_IMAGE_VERSION" | jq -r '.[0].Config.Labels.sha')
GIT_DIR=$CODE_DIR git diff "$BASE_VERSION..$BUILD_VERSION" > patch.diff
# Build the image (the trailing . is the build context)
docker build --rm -t incremental-image:latest -t "incremental-image:$BUILD_VERSION" --build-arg sha="$BUILD_VERSION" .
# And push both tags!
docker push incremental-image:latest
docker push "incremental-image:$BUILD_VERSION"
There are a few things of note here: I'm using jq on my machine to parse out JSON results. I'm fetching the latest sha directly from GitHub, but I could just as easily use a local git command to get it. I'm also passing in the --build-arg, just like we did for our base image, so that we can use it in the Dockerfile as an environment variable and to set a new label for the image.
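For the local alternative mentioned above, a minimal sketch might look like the following. Both function names are hypothetical. The second one is an extra guard that the original script doesn't have: it assumes ordinary 40-character SHA-1 commit ids, and it catches cases like an empty value or the literal string null, which is what jq prints when a label is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Print the sha of a branch (default: master) straight from the bare clone.
#   $1 = path to the bare repo, $2 = optional branch name
latest_local_sha() {
    GIT_DIR="$1" git rev-parse "${2:-master}"
}

# Succeed only when the argument looks like a full 40-character hex sha.
is_full_sha() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Example use before generating the diff:
#   BUILD_VERSION=$(latest_local_sha "$CODE_DIR")
#   is_full_sha "$BASE_VERSION" || { echo "No usable sha label on the base image" >&2; exit 1; }
```

Checking both shas before running git diff means a missing label fails the build early instead of producing a confusing diff error.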
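On that note, here's a sample Dockerfile (the FROM line, which should point at your dependency image, is left out):
WORKDIR /var/repo
COPY patch.diff /var/repo/patch.diff
RUN patch -p1 < patch.diff
RUN composer update
ARG sha
LABEL sha=$sha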
And that's it! In my experience, this is pretty quick to run – it takes me about a minute, which is a lot faster than the 6+ minute build times I was seeing when I built the entire image every time.
I'm making some definite assumptions here. First, I'm assuming that you have a builder where you have git, jq, and docker installed. I'm also assuming that you can build your base and dependency images about once a day without time constraints. I have them build on a cron at midnight. Throughout the day, as people make commits, I build the incremental image.
This is a fairly straightforward method to build up-to-date images with code baked in quickly. Well, quickly relative to copying in the entire codebase every time.
I don't recommend this method if building your images quickly isn't a priority. In my case, we're trying to build these images and run our tests in under 5 minutes – which meant that a 5 minute image build time obviously wasn't acceptable.
Last week, I found myself needing to filter things alphabetically, using regex. Basically, this is because PHPUnit lets you filter what tests you run with regex, and we (we being Etsy) have enough tests that we have to split them into many parts to get them to run in a reasonable amount of time.
We've already got some logical splits, like separating unit tests from db-related integration tests and all that jazz. But at some point, you just need to split a test suite into, say, 6 pieces. When the option you have to do this is regex, well, then you just have to split it out by name.
Splitting Tests by Name? That's not Smart.
No, no it's not. But until we have a better solution (which I'll talk about in a later post), this is what we're stuck with. Originally, we just split the alphabet into the number of pieces we required and ran the tests that way. Of course, this doesn't result in even remotely similar runtimes on your test suites.
Anyway, since we're splitting things up by runtime but we're stuck with using regex, we might as well use alphabetic sorting. That'll result in relatively short regular expressions compared to just creating a list of tests.
To figure out where in the alphabet to make the splits for our tests, I downloaded all of our test results for a specific test suite and ran them through a parser that could handle JUnit-style output (XML files with the test results). I converted them into CSVs, and then loaded them into a Google Spreadsheet.
This made it trivial to figure out where to split the tests alphabetically to even out the runtimes. The problem was, the places where it made sense to split our tests weren't the easiest places to create an alphabetic split. While it would've been nice if the ranges had been, say, A-Cd or Ce-Fa, instead they decided to be things like A-Api_User_Account_Test or Shop_Listings_O-Transaction_User_B.
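The spreadsheet step can also be scripted. The sketch below is not what was actually used: it assumes a hypothetical input file with one test per line, already sorted by name, in the form "name seconds" with whole seconds only. The function name and file format are made up for illustration. It prints the test name at which each new chunk should start.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find alphabetical split points that even out runtimes.
#   $1 = file with "name seconds" lines, sorted by name
#   $2 = number of chunks wanted
split_points() {
    local file chunks total running boundary name secs
    file=$1 chunks=$2

    # First pass: total runtime of the whole suite.
    total=0
    while read -r name secs; do
        total=$((total + secs))
    done < "$file"

    # Second pass: start a new chunk once the tests seen so far cover
    # boundary/chunks of the total runtime.
    running=0 boundary=1
    while read -r name secs; do
        if [ "$boundary" -lt "$chunks" ] && [ $((running * chunks)) -ge $((total * boundary)) ]; then
            printf '%s\n' "$name"
            boundary=$((boundary + 1))
        fi
        running=$((running + secs))
    done < "$file"
}
```

The names it prints are the awkward boundaries described above, which is where the regex generation comes in.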
It's not easy to turn that into regex, but there is at least a pattern to it. I originally tried creating the regex myself – but quickly got in over my head. After about 100 characters in my regex, my brain was fried.
I decided that it'd be easier, quicker, and less error-prone to write a script that could handle it for me.
Identifying the Pattern
It's really quite simple once you break it down. To find if a String is between A and Bob (not including Bob itself), you need a String that meets the following conditions:
Starts with A or a, OR
Starts with B or b, AND:
The second character is A-N or a-n OR
The second character is O or o, AND:
The third character is A or a
In a normal ol' regular expression, this looks like the following (ignoring all special characters):
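What follows is a sketch reconstructed from the conditions listed above rather than the original snippet. It uses A-N for the second character so that names like Bn are covered, and the variable and function names are made up for illustration. It is wrapped in a small grep check so it can be tried out:

```shell
#!/usr/bin/env bash
# Reconstructed from the listed conditions; not the original expression.
#   [Aa]                  starts with A or a, OR
#   [Bb]( ... )           starts with B or b, AND:
#     [A-Na-n]              second character is A-N or a-n, OR
#     [Oo][Aa]              second character is O or o AND third is A or a
bob_regex='^([Aa]|[Bb]([A-Na-n]|[Oo][Aa]))'

# Succeeds when the given name falls in the range A up to (not including) Bob.
sorts_before_bob() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq "$bob_regex"
}

# Example:
#   sorts_before_bob "Boa" && echo "in range"
#   sorts_before_bob "Bob" || echo "out of range"
```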
Now, if we've got something that complicated just to find something up to Bob, you can likely figure out that the rule would get much longer if you have many characters, like Beta_Uploader_Test_Runner.
There's a recognizable pattern, but once again, it's pretty complex and hard for my weak human brain to grok when it gets long. Luckily, this is what computers are very, very good at.
Formulating the Regex
To get a range between two alphabetic options, you generally need 3 different regex rules. Let's say we're looking for the range Super-Whale. First, you need the range from the starting point to the last alphabetic option that still starts with the same letter. So, essentially, you need Super-Sz. The second thing you need is anything that starts with a letter between the first letter of the starting point and the first letter of the end point. So our middle range would be T-V. The last part needs to be W-Whale.
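By way of an example, here's a simpler version of the first part. In this case, it's Hey-Hz: the name has to start with H, and then either continue with e followed by y or z, or continue with any letter from f to z.
Next up, we're using the same word as if it were the third part. In this case, H-Hey: the name has to start with H, and then either end right there, continue with any letter from a to d, or continue with e followed by either the end of the name or any letter from a to x.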
Since the middle part is super simple, I won't bother detailing that. With those three elements, we've got our regex range. Of course, there are some more details around edge cases and whatnot, but we'll ignore those for now. It's much simpler for the purposes of blog posts.
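To show how the three parts fit together, here's a rough sketch in shell. It is not the library's actual code. It makes some loud assumptions: words are lowercase a-z only, the two words start with different letters, the start is included and the end is excluded, and every function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: lowercase a-z words whose first letters differ.
ALPHABET=abcdefghijklmnopqrstuvwxyz

# a -> 0, b -> 1, ... z -> 25
letter_index() {
    local prefix
    prefix=${ALPHABET%%"$1"*}
    printf '%s' "${#prefix}"
}
next_letter() { printf '%s' "$1" | tr 'a-y' 'b-z'; }
prev_letter() { printf '%s' "$1" | tr 'b-z' 'a-y'; }

# Regex for "continues with something at or after $1" (e.g. ey -> hey..hz).
regex_at_or_after() {
    local word first rest
    word=$1
    [ -n "$word" ] || return 0
    rest=${word#?}
    first=${word%"$rest"}
    if [ -z "$rest" ]; then
        printf '[%s-z]' "$first"
    elif [ "$first" = z ]; then
        printf 'z%s' "$(regex_at_or_after "$rest")"
    else
        printf '(%s%s|[%s-z])' "$first" "$(regex_at_or_after "$rest")" "$(next_letter "$first")"
    fi
}

# Regex for "continues with something strictly before $1" (e.g. ey -> h..hex).
regex_before() {
    local word first rest alts
    word=$1
    rest=${word#?}
    first=${word%"$rest"}
    alts='$'    # the name ends here, so it sorts before the longer word
    if [ "$first" != a ]; then
        alts="$alts|[a-$(prev_letter "$first")]"
    fi
    if [ -n "$rest" ]; then
        alts="$alts|$first$(regex_before "$rest")"
    fi
    printf '(%s)' "$alts"
}

# Combine the three parts: start..Sz, the middle letters, and W..end.
range_regex() {
    local start end s0 e0 parts
    start=$1 end=$2
    s0=${start%"${start#?}"}
    e0=${end%"${end#?}"}
    parts="$s0$(regex_at_or_after "${start#?}")"
    if [ $(( $(letter_index "$e0") - $(letter_index "$s0") )) -gt 1 ]; then
        parts="$parts|[$(next_letter "$s0")-$(prev_letter "$e0")]"
    fi
    if [ -n "${end#?}" ]; then
        parts="$parts|$e0$(regex_before "${end#?}")"
    fi
    printf '^(%s)\n' "$parts"
}

# Example:
#   printf '%s\n' tiger | grep -Eq "$(range_regex super whale)" && echo "in range"
```

The recursion mirrors the description above: each letter either matches exactly and defers to the rest of the word, or falls into a character range that settles the comparison on the spot. Underscores, digits, and mixed case are the edge cases this sketch leaves out.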
Doing some Test-Driven Development
I decided that the best way to make this, you know, actually work, was to write a bunch of tests that would cover many of the edge cases that I could hit. I needed to make sure that these would work, and writing a bunch of tests is a good way to do so.
This helped me know exactly what was going wrong, and I wrote more tests as I kept writing the code. For every method that I wrote, I wrote tests to go along with it. If I realized that I had missed any necessary tests, I added them in, too.
Overall, I'd say this significantly increased my development speed, and it definitely helped me be more confident in my code. Tests are good. Don't let anyone tell you otherwise.
The Code on GitHub
Of course, it doesn't make sense to restrict this code to just me. I couldn't find any good libraries to handle this for me, so I wrote it myself. But really, it only makes sense to make this available to a wider audience.