Geo Steered Load Balancing on a Budget

Or, how I manage and keep my jugaad infrastructure in sync at Project Segfault :)

$(whoami)

Your average free-software enthusiast, who just so happens to run a hobbyist sysadmin project called Project Segfault.

What is Project Segfault

  • A few friends online who run useful free software services on a few VPSes and physical "servers" we have
  • We have a Pubnix, where people get access to a useful Linux shell and a place to host their own services
  • Our main niche is Privacy Frontends: web services that let people access content from privacy-invasive platforms like YouTube and Reddit through a cleaner, ad-free UI, with the added benefit of no tracking.
  • You can find more information about the project over at https://psf.lt

What is geo-steered load balancing?

  • Directing requests to different servers, all running the same software with only minor differences between them, based on the sender's geographic location (usually retrieved from their IP address)
  • Allows re-routing to another server when the closest server isn't accessible
  • Can be done over DNS, or a web server
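As a toy sketch of the core idea (node names and the country-to-node mapping here are made up for illustration, not our actual setup), the routing decision boils down to a lookup from the client's country to the nearest node, with a fallback:

```shell
#!/bin/sh
# Toy sketch of geo-steering: map the client's country code (as derived
# from their IP address) to the nearest node, with the EU node as fallback.
# Node names are placeholders.
nearest_node() {
  case "$1" in
    US|CA|MX|BR) echo "us.example.net" ;;  # Americas -> US node
    *)           echo "eu.example.net" ;;  # everyone else -> EU node
  esac
}

nearest_node US
nearest_node DE
```

In practice this decision is made inside the DNS server itself (see the Knot DNS section later), not in a script.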

Why would you want to load balance in the first place?

  • Better performance since load is distributed between multiple servers
  • Faster load times for users since they use the servers closer to them
  • Makes sure a single server's downtime doesn't cause infra-wide downtime (when load balancing is done correctly, that is)
  • Gives you the motivation to actually automate your infrastructure :)
  • Great way to learn things

Why would you geo-steer load balancing?

  • Compared to normal systems that use round-robin and co., this gives users better performance, since they always hit a nearby server
  • More important regions can be upgraded separately, and/or given more servers, rather than routing the same traffic through smaller servers in other regions

Why not use an external service for load balancing?

a) Privacy - I would rather not MitM my infra :)
b) We are poor

Should you put the load balancing in your webserver or DNS?

Project Segfault does DNS-based load balancing for a variety of reasons:

  • We don't have a system powerful enough to route all of our nodes' traffic through
  • With a web-server-based balancer, every request has to pass through an extra system, and the final IP to use cannot be cached the way a DNS resolver caches answers
  • The above point also means lower latency for the user
  • It's more resilient, since DNS requests will automatically be redirected to our slave node if the master isn't working
  • We do miss some extra perks however, like more complex routing/TLS tricks and better accuracy of user location (especially when the user's DNS resolver disables ECS, the EDNS Client Subnet extension)

What about SSL?

  • We use Let's Encrypt's DNS-01 challenge, and for extra security, a CAA record only allows issuance via DNS-01 from Let's Encrypt alone
  • Since we use caddy, fetching certificates is extremely simple to configure
  • And we use RFC2136 (Dynamic DNS Updates) on our knot server, so the required DNS records can be added by caddy automatically.
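For illustration, a minimal Caddyfile global block wiring this up could look roughly like the following. This is a sketch, not our actual config: it assumes the community caddy-dns/rfc2136 plugin is compiled in, and the key name, server address, and secret are placeholders.

```
{
	# Sketch: have caddy solve DNS-01 via RFC2136 dynamic updates.
	# Placeholder key name, address, and secret; check the plugin's
	# README for the exact option names.
	acme_dns rfc2136 {
		key_name "caddy."
		key_alg "hmac-sha256"
		key "{env.RFC2136_TSIG_SECRET}"
		server "198.51.100.1:53"
	}
}
```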

What about services which have user data?

  • We currently don't geo-steer services which store user data, since most of our data-storing services don't really need geo-steering, and are also not built to work properly with it
  • However, this isn't that hard to achieve with something like CockroachDB or even Postgres replication, if the service is written in a "modern" manner
  • You should also consider the risk of GDPR violations when syncing European users' data to servers outside the EU

An overview of our infrastructure

About our DNS setup

  • We chose Knot DNS, the authoritative DNS server made by CZ.NIC, because it's easy to set up and works with standard zonefiles.
  • We have DNS running on our normal servers, with the master in the EU and a slave in the US.
  • We keep zonefiles in sync using IXFR, with an HMAC (TSIG) key and IP restrictions to keep everything secure.
  • We have RFC2136 set up as well, which allows caddy to automatically fetch certificates using Let's Encrypt's DNS-01 challenge.
    • By default, changes made via RFC2136 are synced automatically through the same IXFR
  • By using knot-module-geoip, we also have GeoDNS, which is kept in sync between master and slave using a hacky bash script.
  • You can learn more about our setup with Knot DNS at https://aryak.me/blog/01-knot
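To make this concrete, here is a rough knot.conf fragment showing the pieces above together: a TSIG key, an ACL allowing the slave to transfer, and the geoip module attached to a zone. Addresses, key material, and file paths are placeholders; option names follow the Knot DNS documentation but this is a sketch of our setup, not a copy of it.

```
# Illustrative knot.conf fragment (placeholder addresses/keys)
key:
  - id: xfr-key
    algorithm: hmac-sha256
    secret: "<base64-secret>"

remote:
  - id: slave-us
    address: 198.51.100.2@53
    key: xfr-key

acl:
  - id: acl-slave-xfr
    address: 198.51.100.2
    key: xfr-key
    action: transfer

mod-geoip:
  - id: geo
    config-file: /etc/knot/geo.conf       # per-record geo answers live here
    ttl: 300
    mode: geodb
    geodb-file: /usr/share/GeoIP/GeoLite2-Country.mmdb
    geodb-key: country/iso_code

zone:
  - domain: psf.lt
    notify: slave-us
    acl: acl-slave-xfr
    module: mod-geoip/geo
```

You can observe the effect from the outside by querying a geo-steered name (e.g. lr.psf.lt) with dig from vantage points in different regions and comparing the answers.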

How we keep our servers in sync

  • We primarily use docker for most of our services, which makes it easy to deploy on multiple nodes
  • To manage our compose stacks, install required packages, and set the needed configuration/sysctls, we use ansible
  • To manage and set up our docker stacks, we use our own ansible role, gi-yt/ansible-docker-compose (https://katb.in/docker-role), which also lets us declaratively write the compose file in a variables file and deploy it as an actual compose file on the server
  • We have the main parts of setting up our servers divided into multiple playbooks, to make it easier to manage
  • You can find more information at ProjectSegfault/ansible (https://katb.in/psf-ansible)
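Roughly, "declaratively writing the compose file in a variables file" looks something like this. The variable names below are invented for illustration; see the role's README for its real interface.

```yaml
# Hypothetical vars.yml sketch -- variable names are illustrative,
# not the role's actual interface.
docker_compose_projects:
  - name: libreddit
    definition:
      services:
        libreddit:
          image: libreddit/libreddit:latest
          restart: unless-stopped
          ports:
            - "8080:8080"
```

The role then templates this structure out to a compose file on the target host and brings the stack up.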

Our playbooks

  • all: Applies to all our servers, and performs basic setup
    • Sets up the sysadmin users, and adds the correct shell configuration and ssh keys/config for the same
    • Sets up all required repositories, like debian backports, docker, goaccess etc.
    • Installs useful packages and enables their systemd services if necessary
    • Configures sysctls so the server works correctly for our usecases (like net.ipv4.ip_forward)
    • Sets up firewall configuration for select hosts
    • Sets up backups and adds all hosts to our self-hosted tailscale network (using external ansible roles)
  • node-specific playbooks (like pizza1): sets up services that are only run on that node (for example: tor, VPN, email)
  • privfrontends: Applies to our geo-steered servers, sets up the services
    • Uses our ansible docker compose role to setup all of our Privacy Frontends
    • Installs and sets up caddy with all required configuration from the template
    • Sets up fail2ban for privacy frontends commonly attacked by bots (like Libreddit)
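As a small example of the kind of task the all playbook runs (a sketch, not our actual playbook), here is the sysctl step using the standard ansible.posix collection:

```yaml
# Sketch of a sysctl task like the one in our "all" playbook
- name: Enable IPv4 forwarding
  ansible.posix.sysctl:
    name: net.ipv4.ip_forward
    value: "1"
    state: present
    reload: true
```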

Ansible Semaphore

  • We use semaphore, which is a WebUI for ansible, in order to run the playbooks
  • Makes sure we don't need to keep all the credentials on our own systems
  • We also have most of our crons, for things like service restarts and docker pruning, done via ansible jobs, run as semaphore crons
  • All secrets are encrypted, so someone with access to the system can't really breach other systems

Tailscale

  • We have all our nodes and personal computers connected to an internal tailscale network, self-hosted on one of our servers using Headscale
  • For better security (and to prevent millions of lines of failed logins in auth.log), SSH and other administrative UIs are restricted to tailscale
  • As mentioned before, tailscale login is auto-provisioned on all servers using the tailscale ansible role

Thanks for Listening!

Any Questions?

Explain what is load balancing first, and then say what geo-steering it is

Mention lack of healthchecks, but say it can be fixed using le shell scripts. Also mention webserver-based balancing is better for normal load balancing, since it can use more complex balancing algos which a simple DNS server can't

Show configuration

show example by digging lr.psf.lt from different regions

Show the vars.yml and how our docker role thing works

open semaphore and show