Category: Automation

  • Creating and Applying Diffs with Rsync

    At work recently, we needed to generate diffs between two different directory trees. We use these to handle deploys, but since the diff has to be built after we’ve already generated assets, we can’t just use git for the diff creation – git diff doesn’t handle files that aren’t tracked by git itself. We looked into using GNU’s diffutils, but it doesn’t handle binary files.

    We tried investigating other methods for deploying our code, but thought it would still be simplest if there was some way to generate just a ‘patch’ of what had changed.

    Luckily, one of the Staff Engineers at Etsy happened to know that rsync had just such an option hiding in its very long man page. Because rsync handles transferring files from one place to another, whether it’s local or remote, it has to figure out the diffs between files anyway. It’s really nice that they’ve exposed it so that you can use the diffs themselves. The option that does this is called ‘Batch Mode’, because you can use it to ‘apply’ a diff on many machines after you’ve distributed the diff file.

    Creating the Diff

    To create the diff itself, you’ll need to first have two directories containing your folder structure – one with the ‘previous’ version and one with the ‘current’ version. In our case, after we run each deploy, we create a copy of the current directory so that we can use that as our previous version to build our next diff.

    Your rsync command will look a lot like this:

    rsync -a --write-batch=diff /deploy/current/ /deploy/previous/

    Running that command will give you two files, diff and diff.sh. You can just use the .sh file to apply your diff, but you don’t have to. As long as you remember to use the same flags when applying your diff, you’ll be fine. You can also use any filename that you want after the =.

    Also, it’s important to note that running this command will update /deploy/previous to the contents of /deploy/current. If you want to keep /deploy/previous as-is so that you can update it later, use --only-write-batch instead of just --write-batch.
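
    Putting that together, a sketch of the diff-creation step might look something like this (the paths and the batch filename are just examples, not our actual layout):

    #!/bin/bash
    set -e

    # Write the batch file without touching /deploy/previous.
    rsync -a --only-write-batch=/tmp/deploy.diff /deploy/current/ /deploy/previous/

    # After the diff has been shipped, snapshot 'current' as the new 'previous'
    # so the next diff is built against it.
    rsync -a --delete /deploy/current/ /deploy/previous/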

    Applying the Diff

    Next up, you’ll want to distribute your diff to whatever hosts are going to receive it. In our case, we’re uploading it to Google Cloud Storage, where all the hosts can just grab it as necessary.

    On each host that’s applying the diff, you’ll want to just run something like the following:

    rsync -a --read-batch=/path/to/diff /deploy/directory

    Remember, you need to use the same flags when applying your diff as you did when you created your diff.

    In our testing, this worked well for applying a diff to many hosts – updating around 400 hosts in just about 1 minute (including downloading the ~30MB diff file to each host).
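
    For reference, the per-host apply step can be wrapped up in a small script like this (the bucket name and paths are just an example of how we pull the diff down from Google Cloud Storage):

    #!/bin/bash
    set -e

    # Grab the batch file that the deploy host uploaded.
    gsutil cp gs://my-deploy-bucket/deploy.diff /tmp/deploy.diff

    # Apply it with the same flags that were used to create it.
    rsync -a --read-batch=/tmp/deploy.diff /deploy/directory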

    Caveats

    This will fail if the diff doesn’t apply cleanly. So, essentially, if one of your hosts is a deploy behind, you should make absolutely sure that you know that, and don’t try to update it to the latest version. If you try to anyway, you’ll probably end up with errors in the best case, or a corrupt copy of your code in the worst case. We’re still working on making our scripts handle the potential error cases so that we don’t end up in a corrupt state.

    I hope this is helpful to you! If you’ve got any thoughts, questions, or corrections, drop them in the comments below. I’d love to hear them!

  • Building Large Docker Images, Quickly

    Sometimes, you've just got a big codebase. Maybe you want to put that codebase into a Docker image. Sure, you could mount it, but sometimes that's either not an option or would be too much trouble. In those cases, it's nice to be able to cram your big 'ole codebase into images quickly.

    Overview

    To accomplish this, we're going to build 3 main images. Of course, you're welcome to add as many as you want, or even take one away. We'll just use 3 for demo purposes – and because that's what's worked in my situation.

    The first image will be what we call the 'base' image. It's the one that we're going to copy our code into. On top of that, we'll build an image with our dependencies – it's our 'dependency' image. Finally, we've got our 'incremental' image, which just has our code updates. And that's it. So three images: base, dependency, and incremental.

    Luckily, the first two images don't have to build quickly. They're the images that we use to prepare to build the incremental image, so that the incremental build can be quick when we need it.

    The Base Image

    So the first image, the base image, has our codebase and any universal dependencies. The reason we don't want to put other dependencies in here is because we want to be able to use this 'base' image for any type of task that will require our code. For example, if we have JavaScript tests and, say, PHP tests, they'll probably require different dependencies to run. While we may have a huge codebase, we're still trying to stick to the idea that Docker images should be as small as possible.

    This image is actually pretty simple to set up. Here's an example of a Dockerfile:

    FROM centos:latest
    
    RUN yum -y install patch && yum -y clean all
    
    COPY repo /var/repo
    
    ARG sha=unknown
    LABEL sha=$sha

    You'll notice that I'm installing the patch utility. That's because we're going to use it in the incremental image later on to apply a diff to our code. If you have any binary files in your image, you might want to install git instead, because git handles binary diffs, whereas patch doesn't.

    Now, at the end there, we're doing something that you don't necessarily see in every Dockerfile. We're including the sha argument so that later on we can generate the right diff to apply to get the latest code. We need to pass it in to the docker build command as a --build-arg, and this last bit of the Dockerfile adds it as a label on the image itself. You can see an example on Stack Overflow.
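
    If you haven't used build args and labels together before, the mechanics look roughly like this (the image name is just a placeholder):

    # Pass the current sha in as a build arg; the Dockerfile's LABEL records it on the image.
    docker build --build-arg sha=$(git rev-parse HEAD) -t base-image:latest .

    # Later, read the label back off the image.
    docker inspect --format '{{ index .Config.Labels "sha" }}' base-image:latest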

    We also should avoid copying in parts of the codebase that we don't need in the image. For example, we probably don't need our .git/ folder in our Docker image. There are a couple of ways we can accomplish this – we can either keep a bare clone of our repo and check the code out from it into a separate directory, or we can just delete the .git folder before we copy it in. I took the first approach, because it allows me to just update my repo and check things out again.

    I used a bash script to handle all this, so that I don't have to remember everything and I don't have to worry about accidentally skipping things. Here's what my base image building script looks like, approximately:

    #!/bin/bash
    
    # This allows you to pass in an environment variable, but sets a default.
    CODE_DIR=${CODE_DIR:-'/var/tmp/repo.git'}
    
    # If the repo already exists, just update it. Otherwise, clone it.
    if [ -d "$CODE_DIR" ]; then
        echo "Found an existing git clone... Fetching now to update it..."
        GIT_DIR=$CODE_DIR git fetch -q origin master:master
    else
        echo "No clone found. Cloning the entire repo."
        git clone --mirror git@github.com:my/repo.git $CODE_DIR
    fi
    
    # This grabs the sha we'll be building
    BUILD_VERSION=$(GIT_DIR=$CODE_DIR git rev-parse master)
    
    mkdir -p ./repo
    
    # We clean the old directory to make sure it's a 'clean' checkout
    rm -rf ./repo/*
    
    # Check out the code
    GIT_DIR=$CODE_DIR GIT_WORK_TREE=./repo git checkout $BUILD_VERSION -f
    
    # Build the image
    docker build --rm -t base-image:latest --build-arg sha=$BUILD_VERSION .
    
    docker push base-image:latest

    So, it's not super simple (if you want to skip the .git/ folder), but I think you get the idea. Next, we'll move on to the dependencies image.

    The Dependencies Image

    This image is really only necessary if you have more dependencies to install. In my particular case, I needed to install things like php, sqlite, etc. Next up, I installed the composer dependencies for my project. If you're using something other than PHP, you can install the dependencies through whatever package manager you're using – like bundler or npm.

    My Dockerfile for this image looks a lot like this (the particular incantations you use will depend on the flavor of linux you're using, of course):

    FROM base-image:latest
    
    RUN yum -y install \
        php7 \
        sqlite \
        sqlite-devel \
        && yum -y clean all
    
    WORKDIR /var/repo
    
    RUN composer update

    You'll probably notice in this image, we don't need to include the whole ARG sha=unknown thingy. That's because labels applied to a parent image are automatically passed to child images.

    This image doesn't necessarily need to have a bash script to build it, but it all depends on what your dependencies look like. If you need to copy other information in, then you might just want one. In my case, I have one, but it's pretty similar to the previous script, so I won't bother to put it here.

    The Incremental Image

    Now for the fun part. Our incremental image is what needs to build quickly – and we're set up to do just that. This takes a bit of scripting, but it's not super complicated.

    Here's what we're going to do when we build this image:

    1. Update/clone our git repo
    2. Figure out the latest sha for our repo
    3. Generate a diff from our original sha (the one the base image has baked-in) and the new sha
    4. When building the image, copy in and apply the diff to our code

    To handle all of this, I highly recommend using a shell script. Here's a sample of the script that I'm using to handle the build (with some repetition from the script above):

    #!/bin/bash
    
    BASE_IMAGE_NAME=dependency-image
    BASE_IMAGE_TAG=latest
    
    # This allows you to pass in an environment variable, but sets a default.
    CODE_DIR=${CODE_DIR:-'/var/tmp/repo.git'}
    
    # If the repo already exists, just update it. Otherwise, clone it.
    if [ -d "$CODE_DIR" ]; then
        echo "Found an existing git clone... Fetching now to update it..."
        GIT_DIR=$CODE_DIR git fetch -q origin master:master
    else
        echo "No clone found. Cloning the entire repo."
        git clone --mirror git@github.com:my/repo.git $CODE_DIR
    fi
    
    # Get the latest commit sha from Github, use jq to parse it
    echo "Fetching the current commit sha from GitHub..."
    BUILD_VERSION=$(curl -s "https://github.com/api/v3/repos/my/repo/commits" | jq '.[0].sha' | tr -d '"')
    
    # Generate a diff from the base image
    docker pull $BASE_IMAGE_NAME:$BASE_IMAGE_TAG
    BASE_VERSION=$(docker inspect $BASE_IMAGE_NAME:$BASE_IMAGE_TAG | jq '.[0].Config.Labels.sha' | tr -d '"')
    GIT_DIR=$CODE_DIR git diff $BASE_VERSION..$BUILD_VERSION > patch.diff
    
    # Build the image
    docker build --rm -t incremental-image:latest -t incremental-image:$BUILD_VERSION --build-arg sha=$BUILD_VERSION .
    
    # And push both tags!
    docker push incremental-image:latest
    docker push incremental-image:$BUILD_VERSION

    There are a few things of note here: I'm using jq on my machine to parse out JSON results. I'm fetching the latest sha directly from GitHub, but I could just as easily use a local git command to get it. I'm also passing in the --build-arg, just like we did for our base image, so that we can use it in the Dockerfile as an environment variable and to set a new label for the image.

    On that note, here's a sample Dockerfile:

    FROM dependency-image:latest
    
    ARG sha=unknown
    ENV sha=$sha
    LABEL sha=$sha
    
    COPY patch.diff /var/repo/patch.diff
    RUN patch -p1 < patch.diff
    
    RUN composer update
    
    CMD ["run-my-tests"]

    And that's it! In my experience, this is pretty quick to run – it takes me about a minute, which is a lot faster than the 6+ minute build times I was seeing when I built the entire image every time.

    Assumptions

    I'm making some definite assumptions here. First, I'm assuming that you have a build machine with git, jq, and docker installed. I'm also assuming that you can build your base and dependency images about once a day without time constraints. I have them build on a cron job at midnight. Throughout the day, as people make commits, I build the incremental image.
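
    For what it's worth, those nightly builds can be as simple as a couple of cron entries (the script names and paths here are hypothetical):

    # m h dom mon dow  command
    0 0 * * * /opt/builder/build-base-image.sh && /opt/builder/build-dependency-image.sh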

    Conclusion

    This is a fairly straightforward method to build up-to-date images with code baked in quickly. Well, quickly relative to copying in the entire codebase every time.

    I don't recommend this method if building your images quickly isn't a priority. In my case, we're trying to build these images and run our tests in under 5 minutes – which meant that a 5 minute image build time obviously wasn't acceptable.

  • Alphabetic Filtering with Regex

    Last week, I found myself needing to filter things alphabetically, using regex. Basically, this is because PHPUnit lets you filter what tests you run with regex, and we (we being Etsy) have enough tests that we have to split them into many parts to get them to run in a reasonable amount of time.

    We've already got some logical splits, like separating unit tests from db-related integration tests and all that jazz. But at some point, you just need to split a test suite into, say, 6 pieces. When the option you have to do this is regex, well, then you just have to split it out by name.

    Splitting Tests by Name? That's not Smart.

    No, no it's not. But until we have a better solution (which I'll talk about in a later post), this is what we're stuck with. Originally, we just split the alphabet into the number of pieces we required and ran the tests that way. Of course, this doesn't result in even remotely similar runtimes on your test suites.

    Anyway, since we want to split things up by runtime but we're stuck with regex, we might as well split on alphabetic ranges. That results in relatively short regular expressions compared to just listing out every test.

    To figure out where in the alphabet to make the splits for our tests, I downloaded all of our test results for a specific test suite and ran them through a parser that could handle JUnit-style output (XML files with the test results). I converted them into CSVs, and then loaded them into a Google Spreadsheet:

    Tests in Google Sheets

    This made it trivial to figure out where to split the tests alphabetically to even out the runtimes. The problem was, the places where it made sense to split our tests weren't the easiest places to create an alphabetic split. While it would've been nice if the ranges had been, say, A-Cd or Ce-Fa, instead they decided to be things like A-Api_User_Account_Test or Shop_Listings_O-Transaction_User_B.

    It's not easy to turn that into regex, but there is at least a pattern to it. I originally tried creating the regex myself – but quickly got in over my head. After about 100 characters in my regex, my brain was fried.

    I decided that it'd be easier, quicker, and less error-prone to write a script that could handle it for me.

    Identifying the Pattern

    It's really quite simple once you break it down. To find whether a string falls between A and Bob (not including Bob itself), you need a string that meets the following conditions:

    • Starts with A or a, OR
    • Starts with B or b, AND:
      • The second character is A-M or a-m OR
      • The second character is O or o, AND:
        • The third character is A or a

    In a normal 'ole regular expression, this looks like the following (ignoring all special characters):

    ^[Aa].*$|^[Bb]([A-Ma-m].*|[Oo]([Aa].*|$)|$)$

    Now, if we've got something that complicated just to find everything up to Bob, you can likely imagine how much longer the rule gets when the boundary has many characters, like Beta_Uploader_Test_Runner.

    There's a recognizable pattern, but once again, it's pretty complex and hard for my weak human brain to grok when it gets long. Luckily, this is what computers are very, very good at.
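
    As a quick sanity check, you can throw a few sample names at that expression with grep (the test names below are made up):

    printf '%s\n' Api_User_Test Boa_Test Bob_Test Cart_Test \
        | grep -E '^[Aa].*$|^[Bb]([A-Ma-m].*|[Oo]([Aa].*|$)|$)$'
    # Prints Api_User_Test and Boa_Test; Bob_Test and Cart_Test fall outside the A-Bob range.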

    Formulating the Regex

    To get a range between two alphabetic options, you generally need 3 different regex rules. Let's say we're looking for the range Super-Whale. First, you need the range from the starting point to the last alphabetic option that still starts with the same letter. So, essentially, you need Super-Sz. The second thing you need is anything that starts with a letter between the first letter of the starting point and the first letter of the end point. So our middle range would be T-V. The last part needs to be W-Whale.

    By way of an example, here's a simpler version of the first part – in this case, it's Hey-Hz, color-coded so that you can see what letter applies to which part of the regular expression:

    Next up, we're using the same word as if it were the second part. In this case, H-Hey:

    Since the middle part is super simple, I won't bother detailing that. With those three elements, we've got our regex range. Of course, there are some more details around edge cases and whatnot, but we'll ignore those for now – it keeps things much simpler for the purposes of a blog post.
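
    To make that concrete, here's roughly what an assembled range can look like for D-Fred (a range I made up – real test-suite boundaries are longer), again checked with grep:

    range='^[Dd].*$|^[Ee].*$|^[Ff]($|[A-Qa-q].*|[Rr]($|[A-Da-d].*|[Ee]($|[A-Ca-c].*)))$'

    printf '%s\n' Db_Test Editor_Test Feed_Test Frame_Test Fred_Test Gallery_Test \
        | grep -E "$range"
    # Prints everything except Fred_Test and Gallery_Test, i.e. only the names that sort before 'Fred'.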

    Doing some Test-Driven Development

    I decided that the best way to make this, you know, actually work, was to write a bunch of tests that would cover many of the edge cases that I could hit. I needed to make sure that these would work, and writing a bunch of tests is a good way to do so.

    This helped me know exactly what was going wrong, and I wrote more tests as I kept writing the code. For every method that I wrote, I wrote tests to go along with it. If I realized that I had missed any necessary tests, I added them in, too.

    Overall, I'd say this significantly increased my development speed, and it definitely helped me be more confident in my code. Tests are good. Don't let anyone tell you otherwise.

    The Code on GitHub

    Of course, it doesn't make sense to restrict this code to just me. I couldn't find any good libraries to handle this for me, so I wrote it myself. But really, it only makes sense to make this available to a wider audience.

    I've still got some issues to work out, and I need to make a Ruby gem out of it, but in the meantime, feel free to play around with the code: https://github.com/russtaylor/alphabetic-regex

    I'm really hoping that someone else will find this code to be useful. If anyone has any questions, comments, or suggestions, feel free to let me know!

  • HTTPS for free with Let’s Encrypt

    If you’re anything like me, you’ve probably wanted to add HTTPS to your personal sites or apps, without having to shell out the money to get a certificate from a certificate authority. Of course, self-signed certificates were always an option, but it really kind of sucked to have to always either bypass warnings or install the certificate everywhere. Oh, that and the fact that my Android phone would warn me every time I booted that someone could be eavesdropping on me.

    For that reason, I haven’t ever used an SSL certificate on my sites, except for the occasional self-signed cert. Luckily, Let’s Encrypt has finally come along to save the day (well, they’ve just entered public beta). They’re an automated and free certificate authority, and they make getting a certificate a breeze. Heck, if you have a matching configuration, the whole process is already automated.

    Installing Let’s Encrypt

    Installation is super easy – the official (and almost assuredly up-to-date) instructions can be found on the Let’s Encrypt website. In most cases, the easiest method is to clone the letsencrypt repository from GitHub:

    git clone https://github.com/letsencrypt/letsencrypt
    cd letsencrypt
    

    You can see if it’s available via your distro’s package manager and install it that way, too. I haven’t found it in either yum or apt-get yet, so your mileage may vary.

    Getting a certificate

    If you happen to be running a supported configuration (as of right now, just Apache running on Debian or Ubuntu), then you can let it take care of all the dirty work for you: ./letsencrypt-auto --apache (though probably with sudo). That should take care of everything for you. I haven’t tried it personally, but I’d imagine it’s pretty sweet. If you try it and it works, then you can feel free to skip the rest of this article, because it probably won’t help you much.

    Otherwise, you’ll want to go the certonly route to obtain your certificates. It’s still super easy, and the configuration itself isn’t difficult.

    To obtain your certificate, first ensure that your DNS A record is pointing to the correct server. If it isn’t, well, then this certainly isn’t going to work.

    Webserver Considerations

    In order for Let’s Encrypt to verify that the DNS record in fact points to the server that you’re using, it needs to temporarily place a file in your webroot. This allows it to prove that you really do have control over the DNS records for that domain. You can do this one of two ways:

    1. By specifying the --webroot flag when you run letsencrypt
    2. By temporarily stopping your web server so that letsencrypt can spin up its own

    Depending on your site’s configuration, one may be easier (or less disruptive) than the other. In my case, the servers I’ve configured thus far haven’t had an easily accessible webroot, so I just shut down my webserver (sudo systemctl stop nginx.service in my case, on CentOS 7) while I obtained the certificate.

    Get Your Certificate

    Once you’ve taken care of that, run ./letsencrypt-auto certonly -d example.com to obtain a certificate for your domain. Or, if you’re using the webroot method, execute ./letsencrypt-auto certonly --webroot -w /var/www/example -d example.com without shutting down your webserver. For more details, see the ‘How it works’ page on Let’s Encrypt’s site.

    Your certificate files will be placed in /etc/letsencrypt/live/example.com/ (naturally, replacing example.com with your address).

    Configuring your webserver

    I’ve gradually been configuring a new server with Ansible (which is a whole story in itself), and in the process, I’ve switched over to using Nginx as the primary web server with a reverse proxy setup to direct other requests where necessary. As a result, my direct experience with Let’s Encrypt is limited to Nginx, but I know it’s similar with Apache or whatever else you might be using.

    For me, setting up https for my sites just involved the following:

    1. Adding a redirect from the insecure HTTP site to the HTTPS site
    2. Adding a second directive in the server configuration for port 443
    3. Enabling SSL on that configuration
    4. Specifying the location of the certificate files

    In Nginx, the redirect looks like this:

    server {
        listen 0.0.0.0:80;
        server_name example.com;
        server_tokens off;
        return 301 https://$server_name$request_uri;
    }
    

    This tells Nginx to listen on port 80 for requests to example.com, then return a 301 code saying that the requested resource has moved permanently to the same address, but with https:// instead of http://.

    In similar fashion, configuring the SSL portion of the site is quite simple, and goes something like this:

    server {
        listen 0.0.0.0:443 ssl;
    
        server_name example.com;
        server_tokens off;
    
        ssl on;
        ssl_certificate /etc/letsencrypt/live/example.com/cert.pem;
        ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    
        root /var/www/example;
    }
    

    If you’re using Apache, you’ll probably need to use another one of the certificate files, but I’m not entirely sure which at this point. However, if you need more info about that, I’d recommend checking out Apache’s documentation on SSL.

    Final Notes

    There are a few more considerations with these certificates. Though they’re supported by all major browsers, which is awesome, there are a few places that are lacking – for example, I’ve noticed that GitHub’s web hooks don’t realize that the certificates are valid. It seems like this might be something related to OpenSSL, but I haven’t had time to investigate it yet.

    Also, these certificates expire after 90 days, so they need to be refreshed fairly often. However, since obtaining the certificates is so easy to script, renewal is something that you could easily run as a cron job. It’s what I’m planning to do in the near future.
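
    In case it’s useful, a renewal entry in root’s crontab could look something like this. The schedule, paths, and the stop/start of nginx are just an example for the standalone method – with --webroot you wouldn’t need to stop anything, and depending on the client version you may need an extra flag to force non-interactive renewal:

    # Re-issue the certificate every other month, well inside the 90-day window.
    0 3 1 */2 * systemctl stop nginx.service; /opt/letsencrypt/letsencrypt-auto certonly -d example.com; systemctl start nginx.service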

    Hopefully this was helpful. Feel free to chime in if you’ve got any questions or comments!

  • User Lists in Twitch’s IRC with ZNC

    Twitch’s IRC is, well, not like most IRC. They’ve modified it in so many ways, it’s difficult to still even consider it IRC. But hey, at least we still can connect to their chat services using IRC. I guess I shouldn’t complain.

    Anyway, one of their most recent changes was removing join/part messages from chat entirely unless you specifically request them. That means that your IRC client won’t see a members list at all for a channel. Luckily, a simple command still gets you the join/parts, so you can at least still see the members list.

    All you need to do is run this IRC command before joining channels to request the capability:

    CAP REQ :twitch.tv/membership
    

    Unfortunately for me, I’m using ZNC, which makes it more difficult to run this. However, once again, a little bit of googling found a solution. All you need to do is enable the perform module and have it execute the CAP REQ command above when you join the server. To enable it on ZNC, just run the following commands in any IRC client connected to your ZNC’s Twitch instance:

    /msg *status loadmod perform
    /msg *perform add CAP REQ :twitch.tv/membership
    /msg *status disconnect
    /msg *status connect
    

    After ZNC reconnects to Twitch, you should start getting membership lists for your Twitch channels!

  • Automatic Deployment with Gitolite

    About Gitolite

    About a year and a half ago, I came across a great open-source git repository management tool called Gitolite. It’s a great tool for hosting and managing git repositories. It worked especially well for me because I run my own web server where I could set it up. If you’d like to give it a try or read up on it, I suggest you visit the Gitolite documentation.

    Why Automatic Deployment?

    Now, having worked in web development for at least a few years, I wanted a simpler way to automatically deploy my sites. Ideally, this should use Git. I’ve become quite fond of Git, so I’ve been using it for all my projects lately. Before I even open a text editor to start a new project, I’ve usually already typed git init (or, as it is with Gitolite, git clone).

    There’s something to be said for entering git push and having your commits reflected live on the web. It’s not something you want for every site, but it can certainly be useful when you want it.

    Getting it Set Up

    If you’ve managed to get Gitolite set up, you probably won’t have much trouble with getting the rest figured out. If you do happen to have some questions, I’ll do my best to answer them.

    In order to set up your automatic deployment, you’ll need direct access to the gitolite account on your server. As a matter of fact, having root access would probably be helpful. That’s because, unfortunately, auto-deployment isn’t something you can set up through the gitolite-admin repository alone (for some very good security reasons, I might add). With that in mind, follow along with the steps below.

    1. Add your web server user and your gitolite user to the same group. While this probably isn’t strictly necessary, it’s what I decided to do to make it work. Mainly, you just need your web server to be able to properly access the files that your gitolite user will be checking out.

      In my case, I simply created a new group and added both users to that group using usermod (check out usermod’s man page for more info). However, as I said, you can handle this however you’d like to, especially if your UNIX knowledge surpasses mine (which certainly wouldn’t surprise me). There’s a sketch of the commands I used just after this list.

    2. Create your repository and deployment directory.

    3. Change your deployment directory to allow the gitolite user access. This will depend on exactly how you handled things in step 1, but if you followed my pattern, I’d suggest changing the group of the directory to the group you added in step 1. In case you aren’t completely familiar with how to do this, you can try chown -R user:group directory on your target directory.

    4. Add the following to your /home/{gitolite_user}/.gitolite/hooks/common/post-receive script:

      if [ "$GL_REPO" == "gitolite/path/to/repo" ]; then
          git --work-tree=/path/to/webroot --git-dir=. checkout -f
          find /path/to/webroot -type f -print | xargs chmod 664
          find /path/to/webroot -type d -print | xargs chmod 775
      fi
      
    5. Modify the script (from above) as needed. Basically, this script will run any time a repo is pushed to the server. If the repo matches the path you put in, it’ll execute the script within the if statement. That simply checks the repo out to the directory you specify, then adjusts the permissions on the files and subdirectories. You can modify the script as needed, because your specific case may need some special treatment.

    6. Push to your repo!
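
    For reference, here’s roughly what steps 1 and 3 looked like on my server, run as root (the group name, user names, and paths are all examples – your web server user in particular will depend on your distro):

    # Step 1: put the web server user and the gitolite user in a shared group.
    groupadd webdeploy
    usermod -a -G webdeploy gitolite
    usermod -a -G webdeploy nginx

    # Step 3: hand the deployment directory to that group and let the group write to it.
    chown -R gitolite:webdeploy /path/to/webroot
    chmod -R g+w /path/to/webroot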

    Hopefully I’ve covered everything. If you try this tutorial and run into problems, let me know in the comments and I’ll do what I can to get you sorted out.

  • Monitoring a Web Page for Changes

    Bash Script

    Today, I found myself needing a way to monitor a page. I didn’t need anything fancy, just something that would alert me if a page changed in any way. So, I set up a simple bash script and cron job to monitor the page. For me, this was a perfect solution. Since I’ve got a server running 24/7, it’s always able to monitor the page. This wouldn’t work quite as well from, say, a laptop, but a server or an always-on desktop works perfectly. In truth, though, all you really need is a system capable of running cron jobs. So, without further ado, whip open your favorite text editor and plug this in there:

    #!/bin/bash
    pageAddress="http://example.com/index.html"
    pageHashFile=/path/to/pageHash.txt
    
    newhash=$(curl -s "${pageAddress}" | md5sum | awk '{ print $1 }')
    oldhash=$(cat "$pageHashFile" 2>/dev/null)

    # Check the hashes, send an email if it's changed
    if [ "$newhash" != "$oldhash" ]; then
        echo "${pageAddress}" | mail -s "Page changed!" you@example.com

        # Only update the hash if the email was successfully sent.
        returnCode=$?
        if [[ $returnCode == 0 ]] ; then
            echo "${newhash}" > "$pageHashFile"
        fi
    fi
    

    Of course, you’ll need to change the page address, the path where you want the hash stored, and the email address to match your situation. Finally, just add the script to your crontab, and you’re good to go! I’ve got mine set to run every 10 minutes. To put it in your crontab, run crontab -e, and insert the following (adapted as needed):

    */10 * * * * bash /path/to/script.sh
    

    It could be adapted to be more versatile and enable monitoring multiple pages, but since I just needed one (at least for now), this does the trick nicely.
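
    If you do end up wanting to watch several pages, one way to extend it is to loop over a list of URLs and keep one hash file per page. Everything below – the URLs, the hash directory, and the email address – is just an example:

    #!/bin/bash
    pages="http://example.com/index.html http://example.com/about.html"
    hashDir=/path/to/hashes
    mkdir -p "$hashDir"

    for page in $pages; do
        # One hash file per page, named after the hash of the URL itself.
        hashFile="$hashDir/$(echo "$page" | md5sum | awk '{ print $1 }').txt"
        newhash=$(curl -s "$page" | md5sum | awk '{ print $1 }')
        oldhash=$(cat "$hashFile" 2>/dev/null)

        if [ "$newhash" != "$oldhash" ]; then
            # Only record the new hash if the notification email goes out successfully.
            echo "$page" | mail -s "Page changed!" you@example.com && echo "$newhash" > "$hashFile"
        fi
    done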