Fun times with DNS #1

Understanding DNS resolution in a Linux environment

Feb 20, 2024

I am personally not a big fan of the theory-only style of learning; I like to build and test things after understanding the core concepts. Enter Coding Challenges, the wonderful newsletter with detailed task descriptions and project ideas on various topics, such as writing your own load balancer or, in our current case, writing your own DNS forwarder. In this series of posts I’ll walk you through my process of writing a DNS forwarder in Go.

In the first entry I’ll explain the fundamentals of the Domain Name System (DNS) and what DNS resolution looks like in a Linux environment, so that we have a better idea of what exactly we’re trying to build.

If you’re familiar with DNS resolution on Linux you can safely skip this part, but if you want to understand the specifics or just want to brush up on the basics, then let’s get started!

Help me, I don’t know what DNS is

This section only serves to establish common ground. If you’re familiar with the bare minimum of how DNS works, skip right ahead.

DNS, at its core, is a fairly simple concept: IP addresses change frequently and are difficult to deal with from an operator’s perspective. The idea of DNS is that you give servers human-readable names (domains) and map those names to the servers’ IP addresses, which are meant to be read by machines. To make lookups possible this mapping has to be stored somewhere, and that is what DNS records are. Once the records are stored, you need a way to fetch them, and this is the reason DNS servers exist. That’s it.
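To make the idea of "stored records plus a way to fetch them" concrete, here is a minimal sketch in Go. The Record type and the lookup function are hypothetical names made up for this illustration (real DNS records carry more fields), and the addresses come from the 192.0.2.0/24 documentation range.

```go
package main

import "fmt"

// Record is a simplified, hypothetical DNS record: a name, a record
// type (for example "A" for an IPv4 address) and the value it maps to.
type Record struct {
	Name  string
	Type  string
	Value string
}

// lookup returns every stored record matching the given name and type.
func lookup(store []Record, name, rtype string) []Record {
	var matches []Record
	for _, r := range store {
		if r.Name == name && r.Type == rtype {
			matches = append(matches, r)
		}
	}
	return matches
}

func main() {
	// The "DNS server" here is just a slice of records held in memory.
	store := []Record{
		{Name: "example.com", Type: "A", Value: "192.0.2.10"},
		{Name: "example.com", Type: "A", Value: "192.0.2.11"},
		{Name: "internal.example.com", Type: "A", Value: "192.0.2.20"},
	}
	for _, r := range lookup(store, "example.com", "A") {
		fmt.Println(r.Name, r.Type, r.Value)
	}
}
```

One name can map to several addresses, which is why lookup returns a slice rather than a single value.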

At this point, you might be wondering: this doesn’t seem too difficult, right? Well, not exactly. DNS is unfortunately a relatively opaque area of networking: rather than one single standard, there are many RFCs governing the field, with various implementations running all over the place. To get an idea of how fragmented this space is, take a look at the list of RFCs supported by a popular Go DNS package. The further we get into the implementation of the DNS forwarder, the more of this we’ll see in action, but for now understanding the core problem is enough.

Clarifying some of the naming ambiguity

When reading through the literature, the terms resolver, nameserver and cache seem to be used in a rather murky fashion.

Resolver
The resolver is a set of dynamic library routines used by applications that need to know machine names. The resolver's function is to resolve users' queries. The resolver queries a name server, which then returns either the requested information or a referral to another server. In its purest sense, this is the glibc stub (see next section) in a Linux environment.

Nameserver
The actual logic that responds to DNS questions, in both the local and the remote sense. An example of a local nameserver is systemd-resolved; an example of a remote nameserver is your ISP’s nameserver or Google’s 8.8.8.8.

There is a bit of nuance though when we’re talking about local nameservers, since they also provide resolver services. The most accurate name for one would be “stub-resolver”, since it acts both as a local DNS server and as a forwarder.
You will sometimes see ISP recursive nameservers called recursive resolvers.

Cache or resolver-cache
To simplify things, this will refer to the caching mechanism of your local stub-resolver.

Understanding DNS query resolution at a local level

Asking the question: where’s the IP?

There is a core component in the Linux architecture called the GNU C Library (or glibc, for short), which is basically a consistent interface to system resources, such as files, networks, and memory, or in simplified terms, a stub (not a DNS stub though; that’s a different element in our chain). The DNS resolver library is part of glibc (we’re not going to go into how exactly it works or which configuration files it uses, but if you’re interested, the TLDP resolver lib section is highly recommended reading).

It’s important not to think about the resolver as a standalone component, program or app in the DNS sense. It is a set of functions in glibc (like getaddrinfo(3)) that applications call, and together they are responsible for the system-wide name resolution process. Essentially, almost everything that needs name resolution for a domain, from the CLI tool ping to your browser or the most advanced app you create, will interact with this library in some shape or form.
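Since this series ends up in Go, here is a small sketch of what "asking the system to resolve a name" looks like from a Go program. One caveat from the net package documentation: on Unix systems Go can either call the C library routines such as getaddrinfo (the cgo resolver) or use its own pure Go resolver that reads /etc/resolv.conf directly, and which one is used depends on the build and the environment (it can be forced with GODEBUG=netdns=cgo or GODEBUG=netdns=go). The resolve helper and the 3 second timeout are my own choices for the illustration.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolve asks the default resolver for the addresses of host.
// A literal IP address is returned as-is without any lookup.
func resolve(host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	return net.DefaultResolver.LookupHost(ctx, host)
}

func main() {
	addrs, err := resolve("example.com")
	if err != nil {
		// No network, no configured nameserver, unknown name, ...
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

Running the program once with GODEBUG=netdns=cgo and once with GODEBUG=netdns=go is an easy way to compare the two resolution paths on your own machine.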

Determining the course of action

When you’re trying to query DNS records, the resolver lib’s functions don’t actually do the heavy lifting; they just ask questions. How a question is answered, and from what source, depends on a few things. Without getting way too deep into Linux fundamentals: there’s a part of glibc called the Name Service Switch (NSS for short, see the docs here) which reads the file /etc/nsswitch.conf. It looks something like this (yours might differ based on your configuration; other fields omitted for brevity):

....
hosts:          files mdns4_minimal [NOTFOUND=return] dns myhostname
...

That hosts field tells you the exact order in which sources are consulted to resolve a name. It has a few intricacies, but to keep it short and simple: the resolver functions of glibc consult the NSS and follow the host resolution path described in nsswitch.conf. In most scenarios it will check the /etc/hosts file for the given domain first, then the local OS-level DNS cache (which can be glibc internal or the resolver cache), and if no records are found for the given domain, then in our simplified example the DNS request is forwarded to one or potentially several nameservers.
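To see the lookup order as data rather than as a config line, here is a rough Go sketch that extracts the ordered list of sources from the hosts entry. The hostsSources function is a hypothetical helper written for this post, not part of any library. It assumes that action items such as [NOTFOUND=return] are written without inner spaces and simply skips them, so it shows the order of sources but not the early-return rules.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hostsSources returns the lookup sources listed on the "hosts:" line of
// nsswitch.conf-style content, in the order they appear. Bracketed action
// items such as [NOTFOUND=return] are skipped to keep the sketch short.
func hostsSources(conf string) []string {
	scanner := bufio.NewScanner(strings.NewReader(conf))
	for scanner.Scan() {
		line := scanner.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // drop comments
		}
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != "hosts:" {
			continue
		}
		var sources []string
		for _, f := range fields[1:] {
			if strings.HasPrefix(f, "[") {
				continue // an action item, not a source
			}
			sources = append(sources, f)
		}
		return sources
	}
	return nil
}

func main() {
	conf := "passwd:         files systemd\n" +
		"hosts:          files mdns4_minimal [NOTFOUND=return] dns myhostname\n"
	for i, s := range hostsSources(conf) {
		fmt.Printf("%d. %s\n", i+1, s)
	}
}
```

To try it against your own machine, read /etc/nsswitch.conf with os.ReadFile and pass the content in as a string.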

Know your nameservers

A crucial component in this resolution process is the resolv.conf file (not to be mixed up with resolved.conf, which is systemd-resolved’s configuration file), located at /etc/resolv.conf. On systems running systemd-resolved it is usually a symlink to /run/systemd/resolve/stub-resolv.conf, so if you edit the content your changes will be discarded on reboot.

There are a couple of key points in this:

You can have multiple DNS servers configured for your setup. An example would be

nameserver 1.2.3.4
nameserver 8.8.8.8
nameserver 127.0.0.53

These nameservers can either be remote (like 8.8.8.8 for Google’s DNS server) or local (127.0.0.53, which is the address of our stub-resolver, in this case systemd-resolved).

This means that when a DNS record is requested, the resolver lib tries 1.2.3.4 first. If it returns a response, all is good and the response is passed back to the client (your Go app, for example). If there’s a timeout and no response, however, it will continue down the chain of nameservers; in our current example the last entry is the systemd-resolved address (which defaults to 127.0.0.53). It’s very important to keep in mind that the first response without a timeout breaks the chain, which is part of the core design: if 1.2.3.4 returns an answer, no calls will be made to 8.8.8.8 or 127.0.0.53.
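The "first answer breaks the chain" behaviour can be sketched in a few lines of Go. This is a simplified model, not how glibc is implemented: nameservers, queryFunc, resolveInOrder and errTimeout are hypothetical names for this illustration, and the network call is replaced by a fake function so the example runs anywhere.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errTimeout simulates a nameserver that never answers.
var errTimeout = errors.New("timeout")

// nameservers extracts the addresses from the "nameserver" lines of
// resolv.conf-style content, preserving their order.
func nameservers(conf string) []string {
	var servers []string
	for _, line := range strings.Split(conf, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, fields[1])
		}
	}
	return servers
}

// queryFunc is a stand-in for sending a DNS question to one server.
// It returns an error when the server does not answer in time.
type queryFunc func(server, name string) (string, error)

// resolveInOrder tries each server in turn and stops at the first one
// that answers, so later servers are only contacted after a failure.
func resolveInOrder(servers []string, name string, query queryFunc) (string, error) {
	for _, s := range servers {
		answer, err := query(s, name)
		if err == nil {
			return answer, nil
		}
	}
	return "", errors.New("no nameserver answered")
}

func main() {
	conf := "nameserver 1.2.3.4\nnameserver 8.8.8.8\nnameserver 127.0.0.53\n"
	// Simulated network: 1.2.3.4 times out, every other server answers.
	fake := func(server, name string) (string, error) {
		if server == "1.2.3.4" {
			return "", errTimeout
		}
		return "192.0.2.10 (answered by " + server + ")", nil
	}
	answer, err := resolveInOrder(nameservers(conf), "example.com", fake)
	fmt.Println(answer, err)
}
```

In this run 1.2.3.4 times out, 8.8.8.8 answers, and 127.0.0.53 is never contacted.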

The DNS server

We have verified using our local resources (cache and hosts file) that we don’t know the IP belonging to the domain, so now we actually have to use a nameserver.

As we’ve discussed previously, these nameservers can be either remote or local. You might be wondering: “why can’t I just always query the Google DNS server and be done with it?”

There are a few reasons why running a local nameserver is beneficial:

Caching (more on that soon; it leads to better performance),
Reduced dependency on a single upstream service like Google’s DNS server,
More flexibility in terms of configuration (conditional forwarding etc.),
Most importantly: in some setups you will need access to internal DNS resources that (hopefully) no one outside your organization has access to.

There are several options for local nameservers (stub-resolvers); consult this post for an overview with configuration details. For our example we’ll stick with systemd-resolved.

Our very own local dns-server

We have finally arrived at our local resolver, which is part of the greater systemd suite. It does a few things, but the most important ones for us are:

Network name resolution:
Resolves domain names, IPv4 and IPv6 addresses, DNS resources, records, and services. This resolution can happen locally or by forwarding the DNS request message to a remote nameserver.
DNS caching:
Keeps the responses cached for the duration of the record’s TTL (which is carried as a field of each resource record in the response; more on that in a later section when we look at the actual DNS record itself).
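The caching idea is simple enough to sketch in Go already. This is only an illustration of TTL-based expiry, not how systemd-resolved implements its cache; Cache, NewCache, Set and Get are hypothetical names, and a real DNS cache would key on name, type and class rather than on the name alone.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry pairs a cached answer with the moment it stops being valid.
type entry struct {
	value   string
	expires time.Time
}

// Cache is a minimal in-memory cache keyed by domain name.
type Cache struct {
	mu      sync.Mutex
	entries map[string]entry
}

// NewCache returns an empty cache that is ready to use.
func NewCache() *Cache {
	return &Cache{entries: make(map[string]entry)}
}

// Set stores value for name until ttl has elapsed.
func (c *Cache) Set(name, value string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[name] = entry{value: value, expires: time.Now().Add(ttl)}
}

// Get returns the cached value, or false if it is missing or expired.
// Expired entries are removed on the way out.
func (c *Cache) Get(name string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[name]
	if !ok {
		return "", false
	}
	if !time.Now().Before(e.expires) {
		delete(c.entries, name)
		return "", false
	}
	return e.value, true
}

func main() {
	c := NewCache()
	c.Set("askubuntu.com", "104.18.37.100", 300*time.Second)
	value, ok := c.Get("askubuntu.com")
	fmt.Println(value, ok)
}
```

The mutex is there because a forwarder will typically serve several clients at once, so reads and writes to the map can overlap.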

At this point it is worth looking at what the internal cache of systemd-resolved looks like. After fetching it (see the first note at the end of this post) the output can be quite long, but here’s an example of a cached record:

CACHE:
      askubuntu.com IN A 104.18.37.100

Remember our first resolution step about checking the OS-level cache? This is where it comes from if you have systemd-resolved as your stub-resolver.

This is relevant for us further down the line when we’ll be doing our own caching in Go (and it is also a really nifty trick to have up your sleeve if you’re ever forced to resolve DNS issues on a Linux distro).

Actually getting those pesky records

systemd-resolved forwards the query to a recursive server (typically your ISP’s), which follows the standard DNS resolution chain from the root servers down to the authoritative DNS servers (if you want to see this in action try dig google.com +trace; we’re not really interested in anything after the recursive DNS call though).

The addresses of these recursive servers are configured by another component called NetworkManager (see the second note at the end of this post) if it’s installed on the machine where systemd-resolved is running; otherwise they are configured either manually by an operator or in an automated fashion.

At the end of this long and complex call chain is the answer to our original question: we finally have our IP.

So where did we end up after all this?

In essence we’ll be setting up our own little UDP server (more on that later) which forwards DNS requests to the configured remote DNS server (in our case Google’s DNS server) and caches the responses. Cached records will be returned to the client while their TTL is valid; in every other case we will forward the request and return the response.
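As a preview, the forwarding half of that plan can be sketched as follows. This is a rough outline under a few assumptions, not the final implementation from this series: forward, serveOne and demo are hypothetical names, messages are treated as opaque bytes (no DNS parsing yet), caching is left out, and the 512 byte buffer reflects the classic size limit for DNS messages over UDP. To keep the example runnable without internet access, demo replaces the remote DNS server with a fake upstream on localhost that simply echoes whatever it receives.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// forward sends a raw message to upstream over UDP and returns the reply.
func forward(query []byte, upstream string, timeout time.Duration) ([]byte, error) {
	conn, err := net.Dial("udp", upstream)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return nil, err
	}
	if _, err := conn.Write(query); err != nil {
		return nil, err
	}
	buf := make([]byte, 512)
	n, err := conn.Read(buf)
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}

// serveOne relays a single packet from pc to upstream and sends the
// upstream reply back to whoever asked.
func serveOne(pc net.PacketConn, upstream string) error {
	buf := make([]byte, 512)
	n, client, err := pc.ReadFrom(buf)
	if err != nil {
		return err
	}
	reply, err := forward(buf[:n], upstream, 2*time.Second)
	if err != nil {
		return err
	}
	_, err = pc.WriteTo(reply, client)
	return err
}

// demo wires a fake echoing upstream and the forwarder together on
// localhost and returns what a client receives for payload.
func demo(payload string) (string, error) {
	fake, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer fake.Close()
	go func() {
		buf := make([]byte, 512)
		if n, addr, err := fake.ReadFrom(buf); err == nil {
			fake.WriteTo(buf[:n], addr) // echo the message back
		}
	}()

	fwd, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer fwd.Close()
	go serveOne(fwd, fake.LocalAddr().String())

	reply, err := forward([]byte(payload), fwd.LocalAddr().String(), 2*time.Second)
	return string(reply), err
}

func main() {
	reply, err := demo("pretend this is a DNS query")
	fmt.Println(reply, err)
}
```

A real forwarder would listen on a fixed port, call serveOne in a loop, and use an address such as 8.8.8.8:53 as the upstream.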

Thank you for bearing with me on this part. In the next chapter we’ll examine what a DNS request and response look like in detail so we can implement them in Go.

1. To fetch the cache, we’ll dump the current cache entries to the system log (this doesn’t terminate the process) with

sudo pkill -USR1 systemd-resolve

and then export the log messages to a file with

sudo journalctl -u systemd-resolved > ~/resolved.txt

2. You can adjust this configuration by editing

/etc/systemd/resolved.conf

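but that is out of scope for this article. If you want to see how your local systemd-resolved is configured, run the following (on newer systemd versions, where the tool was renamed, use resolvectl status instead)

systemd-resolve --status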

InfraSpection
