[TriLUG] DNS forwarder on BIND9

Aaron S. Joyner aaron at joyner.ws
Mon Mar 12 00:59:39 EDT 2007


Kevin Otte wrote:
> In BIND9, given a zone definition:
> 
> zone "someother.lan" {
>         type forward;
>         forwarders { 192.168.10.1; };
> };
> 
> Every once in awhile, due to a network blip or what have you, the
> forward request fails:
> 
> kjotte@mystic:~$ host router.someother.lan
> Host router.someother.lan not found: 3(NXDOMAIN)
> 
> The problem is that BIND9 is caching that negative result rather than
> attempting a new lookup when asked.  As such, I get the NXDOMAIN error
> long after the network issues have subsided.
> 
> How do I get it to not cache negative results?
> 
> -- Kevin

BIND doesn't get an NXDOMAIN when it can't talk to the remote host.  At
most it gets an ICMP host/net unreachable, so it doesn't have a lot to
go on in terms of how long to negatively cache.  RFC 2308, section 7.2,
defines the behavior in that case.  Take a moment to read that section:
http://www.faqs.org/rfcs/rfc2308.html

Still, you probably do want it to negatively cache, so that it doesn't
go berserk and choke and die if it gets flooded with queries for a host
that doesn't exist.  This also has the nice side benefit of allowing
most apps to fail quickly and gracefully in the face of no upstream
connectivity to that host.  It's more a question of how quickly you want
it to recognize that the upstream has returned, versus the load of
repeatedly querying it.  As per the RFC, that convergence should take no
more than 5 minutes after the connection returns.  Is that in line with
what you're seeing?  If so, is 300 seconds too long to wait?  I'm sure
that depends on the application.  Sadly, I don't know of a way off-hand
to tweak that value.  There might be one, but some quick searching
didn't turn up anything obvious.

Two potential ways to reduce the convergence time:
- You can clamp the lifetime of a negatively cached result globally, or
per view, by setting max-ncache-ttl in options {} (or in a view {}):
http://www.zytrax.com/books/dns/ch7/hkpng.html#max-ncache-ttl
You cannot change this on a per-zone basis, so it is probably broader
than you want.

Unfortunately, max-ncache-ttl may not actually clamp the type of
NXDOMAIN you're seeing.  (I don't know how that's implemented
internally.  I'd have to surf the code, which I don't have time for
tonight; I've got a sore throat and want to sleep.)

- You *might* be able to tweak how long something is negatively cached
from the other end, via the SOA record's 'minimum' field, which BIND
normally interprets as the NXDOMAIN cache time.  The question is whether
BIND treats this failure as something that generates NXDOMAINs, or, as
the RFC implies, marks the query tuple as flat-out down rather than
making a cache entry of NXDOMAIN.  I really suspect it's the latter,
given the language of the RFC.
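
For the max-ncache-ttl option, a minimal sketch of the named.conf change
might look like the following.  The 60-second value is purely an
illustrative assumption, not a recommendation, and it is unverified
whether this cap applies to the failure described here:

options {
        // Cap how long any negative answer stays cached, in seconds.
        // 60 is an example value only; BIND's default is 10800 (3 hours).
        max-ncache-ttl 60;
};

Because the setting applies to every cached negative answer in that
scope, not just those for someother.lan, a lower value means more
repeat queries to all upstream servers.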

If you dig into this further and discover how it's implemented
internally (either by reading the source or by empirical testing), I'd
be quite interested to hear what you find.

Aaron S. Joyner


