Larry Myers

Deploying Containers using Nomad and Traefik

About a year ago I decided to stop managing deployment of my projects with custom bash scripts, cron jobs, systemd, and nginx. While it was simple to automate the process of copying tarballs to a remote server and restarting a systemd unit, setting up something new always required too much time spent in the terminal via ssh. I wanted to migrate to a solution that would take manual work out of deploying and operating software.

I did not want to run my own Kubernetes cluster, nor did I want to pay the rather exorbitant costs of AWS Fargate or Google Cloud Run. I’m also hesitant about “serverless” options, as I’m not convinced the space is mature enough yet, and it guides you down a path of vendor lock-in.

The benefit of running your own Linux VMs and using open source tools is not being restricted by someone else’s platform. I had heard about HashiCorp Nomad previously, and put in the effort to learn how to run and manage a small set of infrastructure services to make operations easier for myself.

My first iterations used the full HashiCorp stack (Consul, Vault, and Nomad) and Traefik. The version 1.4 release of Nomad removed the need to use Consul for service discovery and Vault for secrets management. The combination of Nomad and Traefik provides a very compelling way to deploy and operate applications with minimal resources and overhead.

These two pieces of software get you an easy way to deploy Docker containers with automatic ingress routing. Having TLS automatically managed via Let’s Encrypt is an added bonus.

The end result of my efforts was a well-documented and stable production infrastructure, with significant increases in reliability and uptime. As I transcribe my personal playbooks into this post, I’m happy with the low mental overhead required to run this technology stack.

Prerequisites

These instructions assume the following:

  1. You have the ability to run a Debian flavor of Linux on a VM.
  2. You’ve chosen a hosting provider that supports an external firewall configuration and private networking.
  3. You’re comfortable with the Linux command line.
  4. You have a domain and DNS setup already (I’m partial to using AWS Route53).

Most major hosting providers (Linode, DigitalOcean, AWS, Google, Azure, etc.) support these requirements.

For these instructions I’m opting to use two distinct VMs to host the deployment infrastructure, both running Debian 11.

The Ingress VM runs public facing infrastructure:

  • Nomad (server)
  • Traefik
  • Docker registry (optional)

The Compute VM runs applications in docker containers:

  • Docker engine
  • Nomad (client)

The Ingress VM should allow inbound http traffic, with Traefik being the only service bound to the public network interface. All other services should bind to the private IP. The Compute VM should not be publicly addressable, and will only receive inbound traffic from the Ingress VM over the private network.

While you can run everything on a single VM, keeping your compute resources separate means you can treat them as ephemeral and scale up easily. It’s a bit overkill for running smaller projects, but it drastically simplifies considerations of security, backups, and logging.

Finally, the setup below is for small projects. Please read the docs where necessary to configure each piece for the scale you need to achieve.

Note: All the commands below make no assumptions of user privilege. Prepend sudo as needed. The configuration files here reference bot, an unprivileged user account with sudo permission. The Ingress VM uses the private IP 192.0.2.1, and the Compute VM uses the private IP 192.0.2.2 on the configured VLAN.

Note: For all config files below the name includes the absolute path where they should be placed.

Ingress VM

Setup

I would recommend configuring your firewall to only allow inbound traffic on ports 80 (http), 443 (https), and 22 (ssh). Allowing all outbound traffic should be fine.

Creating an unprivileged user account so things don’t have to run as root is strongly recommended.

Linode provides a good tutorial on the basics for securing your server.

Docker Registry

You’ll need a Docker registry to use Nomad’s docker driver for deployment. Feel free to use a hosted registry, but I’ve found the distribution registry to be a straightforward and low-cost way to self-host a private Docker registry. The caveat is that you’re responsible for your own resource management and authentication (i.e. make sure you have enough disk space and network bandwidth).

If you don’t want to host your own registry I’ve found the Google Container Registry to have the most competitive pricing for storage and network egress, as well as having good integration with the docker engine and popular CI/CD solutions.

Self Hosting Setup

First download and extract the latest release. I’ve chosen to extract the registry binary to /srv/registry/.

mkdir -p /srv/registry/data/
wget <url to distribution tarball>
tar -C /srv/registry -xzf <path to distribution tarball>
# The systemd unit below runs as bot, which needs write access to data/.
chown -R bot:bot /srv/registry

Next you’ll need a config file and systemd unit.

/srv/registry/config.yml

version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /srv/registry/data
http:
  addr: 192.0.2.1:5000
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3

/etc/systemd/system/docker-registry.service

[Unit]
Description=Docker Registry v2
After=network.target

[Service]
Type=simple
User=bot
ExecStart=/srv/registry/registry serve /srv/registry/config.yml

[Install]
WantedBy=multi-user.target

Now start your registry:

systemctl enable docker-registry
systemctl start docker-registry

Your registry will be running at http://192.0.2.1:5000. We’ll delegate the ingress routing and authentication to Traefik at a later step.
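Before moving on, a quick smoke test can confirm the registry is answering on the private interface. This is a minimal sketch that assumes curl is installed; `registry_ok` is a hypothetical helper name. It relies on the registry's `/v2/` API base endpoint, which should return HTTP 200 while no authentication is configured on the registry itself.

```shell
#!/bin/sh
# registry_ok BASE_URL
# Succeeds when the registry's API base endpoint (/v2/) answers with HTTP 200.
registry_ok() {
  # -w prints only the HTTP status code; a failed connection yields 000.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1/v2/")
  [ "$code" = "200" ]
}

# Usage (on the Ingress VM, or from the Compute VM over the private network):
#   registry_ok http://192.0.2.1:5000 && echo "registry is up"
```

Once Traefik sits in front of the registry with basic auth, the same check against the public hostname would be expected to fail with a 401 unless credentials are supplied.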

Nomad Server

Since we just have two VMs, Nomad will be running with a single server and client. This abandons all the provided guarantees of high availability and recovery, but it allows using low resource / low cost VMs. Make sure you take regular snapshots of the server to prevent data loss.

First add the HashiCorp Debian repository to the VM.

Then install Nomad:

sudo apt-get update && sudo apt-get install -y nomad

/etc/nomad.d/nomad.hcl

data_dir   = "/opt/nomad/data"
bind_addr  = "192.0.2.1"
datacenter = "main"

addresses {
  http = "192.0.2.1"
  rpc  = "192.0.2.1"
  serf = "192.0.2.1"
}

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled = false
}

acl {
  enabled = true
}

Next we’ll configure and start Nomad:

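chown nomad:nomad /etc/nomad.d/nomad.hcl
systemctl enable nomad
systemctl start nomad
# The HTTP API listens on the private IP only, so point the CLI at it.
export NOMAD_ADDR=http://192.0.2.1:4646
nomad acl bootstrap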

Make sure you save the bootstrap management ACL token somewhere secure.

Then you can verify the server is up and running.

export NOMAD_TOKEN=<management token>
nomad server members

You’ll also need an ACL token to give your CI system if you do automated deploys.

deploy.policy.hcl

namespace "default" {
  capabilities = ["read-job", "submit-job"]
}

Then create the ACL token from your deploy policy:

nomad acl policy apply -description "Deploy Job Policy" deploy deploy.policy.hcl
nomad acl token create -name="Deploy Token" -policy=deploy -type=client

The secret ID returned by the Nomad CLI is what you can provide to your CI server as the NOMAD_TOKEN environment variable.
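As a rough idea of what the CI side could look like with that token, here is a sketch of a deploy step. `deploy_job` is a hypothetical helper name, and it assumes the CI system exports NOMAD_TOKEN along with a NOMAD_ADDR that reaches the Nomad API (for example the nomad.your-domain.tld route configured in Traefik later on). The deploy policy above covers all three commands: `validate` needs read-job, while `plan` and `run` need submit-job.

```shell
#!/bin/sh
# deploy_job JOB_FILE
# Validates, plans, and submits a Nomad job.
# Expects NOMAD_ADDR and NOMAD_TOKEN to be set by the CI system.
deploy_job() {
  job_file=$1

  # Catch syntax and schema errors before touching the cluster.
  nomad job validate "$job_file" || return 1

  # `nomad job plan` exits 0 when nothing would change, 1 when allocations
  # would be created or destroyed, and 255 on error. Only 255 is a failure.
  plan_rc=0
  nomad job plan "$job_file" || plan_rc=$?
  if [ "$plan_rc" -eq 255 ]; then
    return 1
  fi

  # Submit the job. The CLI stays in monitor mode unless -detach is passed.
  nomad job run "$job_file"
}

# Usage:
#   deploy_job hello.nomad
```

Treating exit code 1 from the plan step as success is deliberate, since a deploy that changes nothing is the unusual case in CI.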

I’d highly recommend working through the ACL Tutorial for Nomad. While you can use the management token for most tasks, it’s not the best idea to have a superuser token floating around. Creating less privileged ACL tokens will reduce your surface area for security issues.

Traefik

Traefik serves three main purposes for our setup:

  1. Ingress server for our infrastructure services.
  2. Dynamic routing and ingress for our Nomad deployed applications, using Nomad service discovery.
  3. Automatic provisioning of TLS certificates using Let’s Encrypt.

Download the latest linux_amd64 release: https://github.com/traefik/traefik/releases

tar xzf <release>.tar.gz
mv traefik /usr/local/bin/.
mkdir /etc/traefik
mkdir /var/log/traefik

/etc/traefik/traefik.yml

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"

providers:
  file:
    filename: /etc/traefik/traefik-routes.yml
  nomad:
    endpoint:
      address: http://192.0.2.1:4646
      token: <replace with Nomad ACL token>
    exposedByDefault: false

certificatesResolvers:
  letsencrypt:
    acme:
      email: <your email here>
      storage: /etc/traefik/acme.json
      tlsChallenge: {}

api:
  dashboard: true

accessLog:
  filePath: /var/log/traefik/access.log
  fields:
    headers:
      names:
        User-Agent: keep

log:
  filePath: /var/log/traefik/traefik.log
  level: INFO

Note: Remember to create and add the Nomad ACL token to the traefik.yml.
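Rather than reusing the management token here, a dedicated read-only token is probably the safer choice. Traefik's documentation names read-job as the capability its Nomad provider needs, so a sketch of the policy and token could look like the following. The policy name `traefik`, the file name, and both helper function names are hypothetical choices for illustration.

```shell
#!/bin/sh
# write_traefik_policy PATH
# Writes a read-only ACL policy for the "default" namespace to PATH.
write_traefik_policy() {
  cat > "$1" <<'EOF'
namespace "default" {
  capabilities = ["read-job"]
}
EOF
}

# create_traefik_token POLICY_FILE
# Registers the policy and creates a client token bound to it.
# Requires NOMAD_ADDR and a management NOMAD_TOKEN in the environment.
create_traefik_token() {
  nomad acl policy apply -description "Traefik service discovery" traefik "$1" &&
    nomad acl token create -name="Traefik" -policy=traefik -type=client
}

# Usage:
#   write_traefik_policy traefik.policy.hcl
#   create_traefik_token traefik.policy.hcl
```

The secret ID printed by the second command is the value that goes into the token field of traefik.yml.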

/etc/traefik/traefik-routes.yml

http:
  routers:
    dashboard:
      rule: Host(`traefik.your-domain.tld`) && (PathPrefix(`/api`) || PathPrefix(`/dashboard`))
      tls:
        certResolver: letsencrypt
      service: api@internal
      middlewares:
        - dashboard-auth
    docker-registry:
      rule: Host(`docker.your-domain.tld`)
      tls:
        certResolver: letsencrypt
      service: docker-registry
      middlewares:
        - docker-registry-auth
    nomad:
      rule: Host(`nomad.your-domain.tld`)
      tls:
        certResolver: letsencrypt
      service: nomad
  services:
    nomad:
      loadBalancer:
        servers:
          - url: "http://192.0.2.1:4646/"
    docker-registry:
      loadBalancer:
        servers:
          - url: "http://192.0.2.1:5000/"
  middlewares:
    dashboard-auth:
      basicAuth:
        usersFile: /etc/traefik/dashboard-users
    docker-registry-auth:
      basicAuth:
        usersFile: /etc/traefik/docker-registry-users
    response-compress:
      compress: {}

Note: Remove the Docker Registry configuration if you’re not self-hosting your own registry.

/etc/systemd/system/traefik.service

[Unit]
Description="Traefik Proxy"
Documentation=https://doc.traefik.io/traefik/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/traefik/traefik.yml

[Service]
User=traefik
Group=traefik
ExecStart=/usr/local/bin/traefik --configFile=/etc/traefik/traefik.yml
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
LimitNOFILE=65536
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

Next we’ll create the basic auth credentials for both the docker registry and traefik dashboard.

apt install -y apache2-utils

touch /etc/traefik/dashboard-users
chmod 600 /etc/traefik/dashboard-users
htpasswd -bB /etc/traefik/dashboard-users <user> <password>

touch /etc/traefik/docker-registry-users
chmod 600 /etc/traefik/docker-registry-users
htpasswd -bB /etc/traefik/docker-registry-users <user> <password>

Allow traefik to run as an unprivileged user and still bind to ports 80 and 443.

sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/traefik

Create the traefik user the systemd unit runs as, and hand it ownership of the config and log directories:

groupadd -g 500 traefik
useradd \
  -g traefik \
  --no-user-group \
  --no-create-home \
  --shell /bin/false \
  --system \
  --uid 500 \
  traefik

chmod 600 /etc/traefik/traefik.yml
chown -R traefik:traefik /etc/traefik
chown traefik:traefik /var/log/traefik

Finally we’ll start Traefik:

sudo systemctl enable traefik.service
sudo systemctl start traefik.service

Verify the following works: https://traefik.your-domain.tld/dashboard/

Note: The trailing slash for the Traefik dashboard page is important. If you get a 404 you’ve likely left it off the URL.

Compute VM

Setup

Follow the same basic setup as your Ingress VM, but block all inbound traffic with the firewall configuration. Your Compute VM only needs to allow outbound traffic. This should help reduce the attack surface of the Compute VM.

Docker Engine

I generally think Docker is a good way to package and deploy software, with a few caveats:

  1. The Docker engine updates iptables, and can silently open up ports you don’t want open on your hosts. It can be a rude surprise if you’re trying to use iptables as a firewall. This is one of the main reasons to deny all inbound traffic to the Compute VM with a cloud firewall.
  2. I have a healthy distrust of docker networking. I’d highly recommend running all your containers with host networking enabled. Nomad assigns random ports and prevents port collisions, so there’s less reason to use docker networking.

Follow the Docker install instructions for Debian 11. This will allow you to manage the docker engine via apt.

Then follow the post-install steps.

If you can run docker ps then you should be set to move on to the next step.

Nomad Client

First add the HashiCorp Debian repository to the VM.

Then install Nomad:

sudo apt-get update && sudo apt-get install -y nomad

/etc/nomad.d/nomad.hcl

data_dir   = "/opt/nomad/data"
bind_addr  = "192.0.2.2"
datacenter = "main"

client {
  enabled = true
  servers = ["192.0.2.1"]

  host_network "private" {
    cidr = "192.0.2.0/24"
  }
}

plugin "docker" {
  config {
    auth {
      config = "/etc/docker-auth.json"
    }
  }
}

Note: Defining the private host_network here lets us reference it by name in our Nomad jobs. This allows us to use host networking for Docker, and tells Nomad to allocate ports and advertise services on the private IP.
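To make the host_network note concrete, here is a sketch of a job specification that ties the pieces together: a dynamic port on the private network, a Nomad-native service registration, and the tags Traefik reads for routing. `render_job` is a hypothetical helper, and the job, group, and task names are illustrative; the linked sample project may be structured differently. It also assumes the application inside the container listens on the port Nomad passes in the NOMAD_PORT_http environment variable.

```shell
#!/bin/sh
# render_job NAME DOMAIN IMAGE
# Prints a Nomad job specification to stdout.
render_job() {
  cat <<EOF
job "$1" {
  datacenters = ["main"]
  type        = "service"

  group "web" {
    network {
      # Dynamic port allocated on the "private" host_network from the client config.
      port "http" {
        host_network = "private"
      }
    }

    service {
      name     = "$1"
      provider = "nomad"
      port     = "http"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.$1.rule=Host(\`$2\`)",
        "traefik.http.routers.$1.entrypoints=websecure",
        "traefik.http.routers.$1.tls.certresolver=letsencrypt",
      ]
    }

    task "app" {
      driver = "docker"

      config {
        image        = "$3"
        network_mode = "host"
      }
    }
  }
}
EOF
}

# Usage:
#   render_job hello hello.your-domain.tld docker.your-domain.tld/hello:latest > hello.nomad
```

The traefik.enable=true tag matters because exposedByDefault is set to false in traefik.yml, so services without it are ignored.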

Set up Docker auth so the Nomad client can pull images from our registry:

docker login <docker.your-domain.tld> -u <user>
cp ~/.docker/config.json /etc/docker-auth.json
chown root:root /etc/docker-auth.json
chmod 400 /etc/docker-auth.json

Note: If you’re using another registry that requires authentication, follow its setup instructions. It likely involves a credentials helper.
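One thing worth checking is whether the copied file actually holds credentials. When Docker is configured with a credentials store, config.json only records the registry host and the secret lives elsewhere, so copying the file would not help Nomad. The sketch below is a rough text match rather than a JSON parser, and `has_inline_auth` is a hypothetical helper name.

```shell
#!/bin/sh
# has_inline_auth CONFIG_JSON
# Succeeds when the Docker config file contains an inline "auth" entry,
# i.e. credentials that survive being copied to another path.
has_inline_auth() {
  # Matches the "auth" key but not the enclosing "auths" key.
  grep -q '"auth"[[:space:]]*:' "$1"
}

# Usage:
#   has_inline_auth /etc/docker-auth.json || echo "no inline credentials found"
```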

This configuration for the docker plugin should work for most use cases. Refer to the documentation if you have more specific requirements.

Finally start Nomad:

chown nomad:nomad /etc/nomad.d/nomad.hcl
sudo systemctl enable nomad
sudo systemctl start nomad
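To confirm the client has registered with the server, `nomad node status` should list it as ready. A small sketch for scripting that check follows; `count_ready_nodes` is a hypothetical helper name, and it assumes NOMAD_ADDR and NOMAD_TOKEN are exported as in the server setup. It reads the last column of each row, so it should tolerate column differences between Nomad versions.

```shell
#!/bin/sh
# count_ready_nodes
# Prints how many client nodes report the "ready" status.
count_ready_nodes() {
  # Skip the header row, then count rows whose final column is "ready".
  nomad node status | awk 'NR > 1 && $NF == "ready" { n++ } END { print n + 0 }'
}

# Usage:
#   [ "$(count_ready_nodes)" -ge 1 ] && echo "compute node has joined"
```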

Sample Project

I’ve put together a small sample project that demonstrates how to create a Nomad job specification that uses the infrastructure described above.

https://github.com/larrymyers/nomad-hello-world

It should require minimal modification to test out deployment with Nomad and automatic ingress routing with Traefik. Once it’s working, the following should be possible using curl:

> curl -i https://hello.larrymyers.com

HTTP/2 200
content-type: text/plain; charset=utf-8
date: Sat, 15 Oct 2022 18:07:13 GMT
vary: Accept-Encoding
content-length: 12

Hello World!

Disaster Recovery

Take a moment to read the docs on how to take Nomad snapshots. It’s worthwhile to save snapshots to a secure location regularly in the event your Ingress VM has an unrecoverable issue.

You don’t need to do anything for the Compute VM, as it’s intentionally designed to be ephemeral.
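For the Ingress VM, a scheduled job along these lines could take the snapshots. This is a sketch under a few assumptions: the helper names and the nomad-<timestamp>.snap naming scheme are hypothetical, a management token is available in NOMAD_TOKEN (snapshot operations require one when ACLs are enabled), and copying the files off the VM to a secure location is left to whatever tooling you already use.

```shell
#!/bin/sh
# snapshot_path DIR
# Prints a timestamped snapshot path such as DIR/nomad-20240101T000000Z.snap.
snapshot_path() {
  printf '%s/nomad-%s.snap\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# prune_snapshots DIR KEEP
# Deletes all but the KEEP newest nomad-*.snap files in DIR.
# Timestamped names sort chronologically, so a reverse sort lists newest first.
prune_snapshots() {
  for f in "$1"/nomad-*.snap; do
    if [ -e "$f" ]; then printf '%s\n' "$f"; fi
  done | sort -r | tail -n +"$(($2 + 1))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
}

# backup_nomad DIR KEEP
# Saves a new snapshot, then prunes old ones.
# Requires NOMAD_ADDR and a management NOMAD_TOKEN in the environment.
backup_nomad() {
  mkdir -p "$1" &&
    nomad operator snapshot save "$(snapshot_path "$1")" &&
    prune_snapshots "$1" "$2"
}

# Usage (for example from a daily cron job):
#   backup_nomad /var/backups/nomad 7
```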

Troubleshooting

To see current port bindings:

netstat -ntl

To see what is bound to a port:

sudo lsof -i :<port>

View current iptables:

sudo iptables -t nat -L -v -n --line-numbers
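Since the goal is for everything except Traefik to bind to the private IP, it can also help to filter the listener list down to sockets that are reachable on other interfaces. The sketch below assumes the `ss` tool from iproute2 is available (its -H flag suppresses the header row); `unexpected_listeners` is a hypothetical helper name.

```shell
#!/bin/sh
# unexpected_listeners PRIVATE_IP
# Reads `ss -H -ntl` output on stdin and prints every local address that is
# bound to neither loopback nor PRIVATE_IP.
unexpected_listeners() {
  awk -v ip="$1" '
    {
      addr = $4                                  # Local Address:Port column
      if (index(addr, ip ":") == 1) next         # bound to the private IP
      if (addr ~ /^127\./ || addr ~ /^\[::1\]/) next   # loopback only
      print addr
    }'
}

# Usage on the Ingress VM, where only Traefik (:80, :443) and sshd (:22)
# would be expected in the output:
#   ss -H -ntl | unexpected_listeners 192.0.2.1
```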