Anycast is a technology that allows multiple hosts to act as a single device, despite physical separation. It works at the routing level, using BGP in a way it was not designed for. Anycast works well for DNS and other single-payload protocols. It may not work well for TCP-based applications.
This primer is intended to be a general introduction to how anycast works, rather than a full technical description. In fact, it may be possible to tune BGP in such a way that TCP-based and other multi-payload protocols do work as intended, but doing so would require technical expertise beyond the scope of this document.
What is BGP?
Acronym expansion: Border Gateway Protocol
Suppose you have a physical network that has multiple connections out to the Internet (border interconnects), without using BGP. Each border interconnect is associated with a logical subnet.
If one connection fails, the logical subnet is unreachable, even though the machines connected to the physical network may be reachable through the other border interconnect(s).
Enter BGP. Now, instead of having one subnet per border interconnect, there is one logical subnet for the physical network. The subnet is reachable through multiple routes. This requires that the border routers of the network, as well as the border routers at the other ends of those connections, all cooperate with each other using BGP.
A packet destined for the subnet can be routed through any of the available border interconnects. Metrics can be applied to choose between available routes; rules can be based on speed, an arbitrary preference, failure states of some of the normally available routes, source address, etc.
The most important of those criteria is failure state. If one of the connections fails, packets will usually be routed around it to one of the other connections. This is all done seamlessly, with very little delay, so that as far as the application is concerned, no failure has occurred.
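The selection logic described above can be sketched as follows. This is only an illustration, not BGP's actual best-path algorithm, which weighs many more attributes. The Route class, the interconnect names, and the preference values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str         # hypothetical label for a border interconnect
    preference: int   # arbitrary operator-assigned value; higher wins
    up: bool = True   # failure state of the interconnect


def choose_route(routes):
    """Return the most-preferred route that is still up, or None if all failed."""
    live = [r for r in routes if r.up]
    if not live:
        return None
    return max(live, key=lambda r: r.preference)


routes = [Route("interconnect-a", 200), Route("interconnect-b", 100)]
before_failure = choose_route(routes)   # the preferred interconnect
routes[0].up = False                    # simulate a failed connection
after_failure = choose_route(routes)    # traffic moves to the survivor
```

The subnet stays reachable after the failure because the choice is made per route, not per subnet.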
How does Anycast use BGP?
BGP assumes that all routes into the subnet lead to the same physical network. Anycast deliberately breaks this assumption: the same subnet is announced from several physically separate networks. If the application protocol only expects one packet in each direction (a test which the TCP transport layer automatically fails), then the application can still function. In this way, multiple DNS servers can all use the same IP address, in different locations, and answer queries as if the others did not exist. Queries sent to the shared address are routed to any of the available DNS servers to be answered.
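A toy model may make this concrete. In the sketch below, two sites answer on the same address and each client is delivered to whichever site is cheapest to reach. The site names, client names, and cost figures are invented for illustration; the address comes from the range reserved for documentation.

```python
ANYCAST_ADDR = "192.0.2.53"  # documentation-range address, used as a stand-in

# Hypothetical routing costs from each client network to each server site.
COSTS = {
    "client-eu": {"site-london": 1, "site-tokyo": 9},
    "client-asia": {"site-london": 8, "site-tokyo": 2},
}


def route(client, sites_up):
    """Pick the lowest-cost site that is currently announcing the address."""
    candidates = {s: c for s, c in COSTS[client].items() if s in sites_up}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def query(client, name, sites_up):
    """One packet in, one packet out: any site can answer on its own."""
    site = route(client, sites_up)
    if site is None:
        return None
    return f"{name} answered by {site} at {ANYCAST_ADDR}"
```

With both sites up, each client reaches its nearer site. If one site stops announcing the address, its clients are routed to the other site without any change on the client side.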
Why doesn't Anycast work for TCP?
Consider a TCP transaction: The initiator sends a SYN packet, which is answered by a SYN/ACK; the initiator then sends an ACK back to the destination. With anycast, it is quite conceivable that the SYN and ACK packets could go to different instances of the anycast-routed shared IP address of the destination.
This situation is generally considered uncommon; most TCP connections to an anycast system are expected to work fine. However, unless great care is taken, the number of failed connections will not be zero. Worse, for any client whose TCP connection attempt fails, further attempts will likely fail as well. This is generally considered unacceptable.
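The failure can be modelled in a few lines. This is a simplified sketch, assuming each instance keeps its own handshake state and shares nothing with the others; the Instance class and the client name are hypothetical, and details of real TCP stacks such as sequence numbers and SYN cookies are ignored.

```python
class Instance:
    """One server behind the shared anycast address."""

    def __init__(self, name):
        self.name = name
        self.half_open = set()  # clients whose SYN this instance has seen

    def receive(self, client, flag):
        if flag == "SYN":
            self.half_open.add(client)
            return "SYN/ACK"
        if flag == "ACK":
            # Only the instance that saw the SYN can complete the handshake.
            if client in self.half_open:
                self.half_open.discard(client)
                return "ESTABLISHED"
            return "RST"  # no matching state: the connection is refused
        raise ValueError(f"unexpected flag: {flag}")


def handshake(client, syn_instance, ack_instance):
    """Deliver the SYN to one instance and the final ACK to another (or the same)."""
    syn_instance.receive(client, "SYN")
    return ack_instance.receive(client, "ACK")


site_a, site_b = Instance("site-a"), Instance("site-b")
same_path = handshake("client-1", site_a, site_a)    # both packets reach site-a
split_path = handshake("client-2", site_a, site_b)   # the ACK lands on site-b
```

When both packets reach the same instance the handshake completes; when the route changes between them, the second instance has never heard of the connection and rejects it.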
What happens when an authoritative name server fails, but its border interconnect is still up?
The answer is that anycast cannot easily and reliably correct for this. Instead, the operator should also make use of the redundancy in the DNS, installing multiple DNS servers at each node. Or multiple parallel anycast systems can be used, with each anycast system owning a different IP address (and a different DNS server hostname). This latter solution is employed by the root servers.
Zone transfers are sent over TCP. How does that work with anycast?
It does not. Instead, in addition to the anycast subnet, each name server should also have its own unique, non-anycast address that can be used for zone transfers (both inbound and outbound).
Resolving name servers with anycast addresses
If an anycast address is used as the source of a query, the response might be routed to a different node in the anycast system. Therefore, a resolving name server listening for inbound queries on an anycast address should also have its own unique, non-anycast address that can be used for outbound queries. (see previous diagram)
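As a minimal sketch of the outbound side, a resolver written in Python could bind its query socket to the unique address before sending. The function name is hypothetical, and the address passed in must be one that is actually configured on the host.

```python
import socket


def make_query_socket(source_addr):
    """Create a UDP socket whose outbound packets carry source_addr as their source."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Port 0 lets the operating system pick an ephemeral source port.
    sock.bind((source_addr, 0))
    return sock
```

Responses are then addressed to the unique address and return to the node that sent the query, regardless of where the anycast address happens to be routed.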
What about truncated responses that are retried over TCP?
This mostly affects authoritative name servers with very large RRSets, or DNSSEC-enabled zones. Since client machines are not known to retry queries in response to truncated answers, a resolving name server is only likely to see this problem if other name servers are forwarding queries to it.
Firstly, care should be taken to minimize the size of responses so that truncation is avoided. Secondly, all possible steps should be taken to enable EDNS0, which enables use of UDP packets larger than 512 bytes (typically up to 4096 bytes). Thirdly, if the preceding measures are insufficient, a non-anycast name server should be made available via an additional NS record; this name server will generally be slower, and therefore will not be preferred to the anycast-enabled name server clusters. However, if TCP connections are attempted to the anycast servers, and if those TCP connections fail (which should be rare), a resolving name server should fail over to the slow, non-anycast name server.
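To show what EDNS0 and truncation look like on the wire, the sketch below builds a query that carries an OPT pseudo-record advertising a 4096-byte UDP payload size, and checks the TC (truncated) bit in a message header. The function names, the query ID, and the default values are chosen for illustration; a production resolver would use an established DNS library instead.

```python
import struct

TC_BIT = 0x0200    # "truncated" flag in the DNS header flags field
RD_BIT = 0x0100    # "recursion desired" flag
TYPE_OPT = 41      # EDNS0 OPT pseudo-record type


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=1, query_id=0x1234, udp_size=4096):
    """Build a query for name with one OPT record in the additional section."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT).
    header = struct.pack("!HHHHHH", query_id, RD_BIT, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT: root name, type 41, CLASS field reused as the UDP payload size,
    # TTL field (extended RCODE, version, flags) all zero, empty RDATA.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size, 0, 0)
    return header + question + opt


def is_truncated(message):
    """Return True if the TC bit is set in the header of a DNS message."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)
```

A resolver would send the bytes from build_query over UDP and retry over TCP only if is_truncated returns True for the response, which is the case the measures above try to make rare.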