comp.protocols.tcp-ip.domains FAQ - Section 6
PROBLEMS
- Q6.1. No address for root server
- Q6.2. Error - No Root Nameservers for Class XX
- Q6.3. Bind 4.9.x and MX querying?
- Q6.4. Do I need to define an A record for localhost ?
- Q6.5. MX records, CNAMES and A records for MX targets
- Q6.6. Can an NS record point to a CNAME ?
- Q6.7. Nameserver forgets own A record
- Q6.8. General problems (core dumps !)
- Q6.9. malloc and DECstations
- Q6.10. Can't resolve names without a "."
- Q6.11. Why does swapping kill BIND ?
- Q6.12. Resource limits warning in system
- Q6.13. ERROR:ns_forw: query...learnt
- Q6.14. ERROR:zone has trailing dot
- Q6.15. ERROR:Zone declared more then once
- Q6.16. ERROR:response from unexpected source
- Q6.17. ERROR:record too short from [zone name]
- Q6.18. ERROR:sysquery: findns error (3)
- Q6.19. ERROR:Err/TO getting serial# for XXX
- Q6.20. ERROR:zonename IN NS points to a CNAME
- Q6.21. ERROR:Masters for secondary zone [XX] unreachable
- Q6.22. ERROR:secondary zone [XX] expired
- Q6.23. ERROR:bad response to SOA query from [address]
- Q6.24. ERROR:premature EOF, fetching [zone]
- Q6.25. ERROR:Zone [XX] SOA serial# rcvd from [Y] is < ours
- Q6.26. ERROR:connect(IP/address) for zone [XX] failed
- Q6.27. ERROR:sysquery: no addrs found for NS
- Q6.28. ERROR:zone [name] rejected due to errors
Question 6.1. No address for root server
Date: Wed Jan 14 12:15:54 EST 1998

Q: I've been getting the following messages lately from bind-4.9.2:

    ns_req: no address for root server

We are behind a firewall and have the following for our named.cache file:

    ; list of servers
    .             99999999   IN   NS   POBOX.FOOBAR.COM.
                  99999999   IN   NS   FOOHOST.FOOBAR.COM.
    foobar.com.   99999999   IN   NS   pobox.foobar.com.

A: You can't do that. Your nameserver contacts POBOX.FOOBAR.COM, gets the correct list of root servers from it, then tries again and fails because of your firewall.

You will need a 'forwarder' definition, to ensure that all requests are forwarded to a host which can penetrate the firewall. And it is unwise to put phony data into 'named.cache'.

Q: We are getting logging information in the form:

    Apr 8 08:05:22 gute named[107]: sysquery: no addrs found for root NS (A.ROOT-SERVERS.NET)
    Apr 8 08:05:22 gute named[107]: sysquery: no addrs found for root NS (B.ROOT-SERVERS.NET)
    Apr 8 08:05:22 gute named[107]: sysquery: no addrs found for root NS (C.ROOT-SERVERS.NET)
    ...

We are running bind 4.9.5PL1. Our system IS NOT behind a firewall. Any ideas ?

A: This was discussed on the mailing list in November of 1996. The short answer was to ignore it as it was not a problem. That being said, you should upgrade to a newer version at this time if you are running a non-current version :-)

Question 6.2. Error - No Root Nameservers for Class XX
Date: Sun Nov 27 23:32:41 EST 1994

Q: I've received errors before about "No root nameservers for class XX" but they've been because of network connectivity problems. I believe that Class 1 is Internet Class data. And I think I heard someone say that Class 4 is Hesiod?? Does anyone know what the various Class numbers are?

A: From RFC 1700:

    DOMAIN NAME SYSTEM PARAMETERS

    The Internet Domain Naming System (DOMAIN) includes several
    parameters. These are documented in [RFC1034] and [RFC1035]. The
    CLASS parameter is listed here. The per CLASS parameters are
    defined in separate RFCs as indicated.

    Domain System Parameters:

    Decimal    Name             References
    --------   ----             ----------
    0          Reserved         [PM1]
    1          Internet (IN)    [RFC1034,PM1]
    2          Unassigned       [PM1]
    3          Chaos (CH)       [PM1]
    4          Hesiod (HS)      [PM1]
    5-65534    Unassigned       [PM1]
    65535      Reserved         [PM1]

DNS information for RFC 1700 was taken from ftp.isi.edu:/in-notes/iana/assignments/dns-parameters

Hesiod is class 4, and there are no official root nameservers for class 4, so you can safely declare yourself one if you like. You might want to put up a packet filter so that no one outside your network is capable of making Hesiod queries of your machines, if you define yourself to be a root nameserver for class 4.
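
The class table above can be turned into a small lookup, which is handy when a log message only gives the number. This is a minimal sketch in Python; the helper name class_name is made up for illustration and is not part of BIND.

```python
# DNS CLASS values as listed in the RFC 1700 table quoted above.
DNS_CLASSES = {
    1: "IN",  # Internet
    3: "CH",  # Chaos
    4: "HS",  # Hesiod
}


def class_name(number):
    """Return a label for a DNS class number (hypothetical helper)."""
    if number in (0, 65535):
        return "Reserved"
    if number in DNS_CLASSES:
        return DNS_CLASSES[number]
    if 0 < number < 65535:
        return "Unassigned"
    raise ValueError("DNS class must fit in 16 bits: %r" % (number,))


# "No root nameservers for class 4" therefore refers to Hesiod.
print(class_name(4))
```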
Question 6.3. Bind 4.9.x and MX querying?
Date: Sun Nov 27 23:32:41 EST 1994

If you query a 4.9.x DNS server for MX records, a list of the MX records as well as a list of the authoritative nameservers is returned. This happens because bind 4.9.2 returns the list of nameservers that are authoritative for a domain in the response packet, along with their IP addresses in the additional section.
Question 6.4. Do I need to define an A record for localhost ?
Date: Sat Sep 9 00:36:01 EDT 1995

Somewhere deep in the BOG (BIND Operations Guide) that came with 4.9.3 (section 5.4.3), it says that you define this yourself (if need be) in the same zone files as your "real" IP addresses for your domain. Quoting the BOG:

    ... As implied by this PTR record, there should be a
    ``localhost.my.dom.ain'' A record (with address 127.0.0.1) in
    every domain that contains hosts. ``localhost.'' will lose its
    trailing dot when 1.0.0.127.in-addr.arpa is queried for;...

The sample files in the BIND distribution show you what needs to be done (see the BOG).

Some HP boxen (especially those running HP OpenView) will also need "loopback" defined with this IP address. You may set it as a CNAME record pointing to the "localhost." record.
Question 6.5. MX records, CNAMES and A records for MX targets
Date: Wed Jun 16 22:09:03 EDT 1999

The O'Reilly "DNS and Bind" book warns against using non-canonical names in MX records; however, this warning is given in the context of mail hubs that MX to each other for backup purposes. How does this apply to mail spokes ? RFC 974 has a similar warning, but where is it specifically prohibited to use an alias in an MX record ?
Without the restrictions in the RFC, an MTA must request the A records for every MX listed to determine whether it is itself in the MX list, and then reduce the list. This introduces many more lookups than would otherwise be required. If you are behind a 1200 bps link YOU DON'T WANT TO DO THIS. The addresses associated with CNAMEs are not passed as additional data, so you will force additional traffic even if you are running a caching server locally.
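
The list reduction mentioned above is the RFC 974 rule that a host which finds itself in the MX list discards every MX with a preference greater than or equal to its own. The Python sketch below illustrates why an alias defeats it; prune_mx and the example.* host names are hypothetical, and the comparison is by name only, as a mailer would do without extra lookups.

```python
def prune_mx(mx_records, local_name):
    """Apply the RFC 974 reduction to a list of (preference, host) pairs.

    If local_name is one of the MX targets, drop every record whose
    preference is greater than or equal to local_name's own preference.
    Names are compared as written; no CNAME or A lookups are done.
    """
    local = local_name.lower().rstrip(".")
    own = [pref for pref, host in mx_records
           if host.lower().rstrip(".") == local]
    if not own:
        # Local host not recognised in the list: nothing is discarded.
        return sorted(mx_records)
    limit = min(own)
    return sorted((pref, host) for pref, host in mx_records if pref < limit)


# Canonical names: the backup host recognises itself and keeps only the hub.
canonical = [(10, "hub.example."), (20, "backup.example.")]
print(prune_mx(canonical, "backup.example."))

# Alias as MX target: "mail2.example." is assumed to be a CNAME for
# "backup.example.", so the backup host does not find itself in the list,
# discards nothing, and may try to deliver the mail to itself.
aliased = [(10, "hub.example."), (20, "mail2.example.")]
print(prune_mx(aliased, "backup.example."))
```

To recognise itself behind the alias, the mailer would have to resolve every MX target first, which is exactly the extra lookup traffic described above.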
There is also the problem of how the MTA finds all of its IP addresses. This is not straightforward. You have to be able to do this if you allow CNAMEs (or extra A's) as MX targets.
The letter of the law is that an MX record should point to an A record.
There is no "real" reason to use CNAMEs for MX targets or separate As for nameservers any more. CNAMEs for services other than mail should be used because there is no specified method for locating the desired server yet.
People don't care what the names of MX targets are. They're invisible to the process anyway. Whether you have mail for "mary" redirected to "sue" is totally irrelevant. Having CNAMEs as the targets of MX's just needlessly complicates things, and is more work for the resolver.
Having separate A's for nameservers like "ns.your.domain" is pointless too, since again nobody cares what the name of your nameserver is, since that too is invisible to the process. If you move your nameserver from "mary.your.domain" to "sue.your.domain" nobody need care except you and your parent domain administrator (and the InterNIC). Even less so for mail servers, since only you are affected.

Q: Given the example -

    hello    in   cname   realname
    mailx    in   mx      0 hello

Now, while reading the operating manual of bind it clearly states that this is *not* valid. These two statements clearly contradict each other. Is there some later RFC than 974 that overrides what is said in there with respect to MX and CNAMEs? Anyone have the reference handy?

A: This isn't what the BOG says at all. See below. You can have a CNAME that points to some other RR type; in fact, all CNAMEs have to point to other names (Canonical ones, hence the C in CNAME). What you can't have is an MX that points to a CNAME. MX RR's that point to names which have only CNAME RR's will not work in many cases, and RFC 974 intimates that it's a bad idea:

    Note that the algorithm to delete irrelevant RRs breaks if LOCAL
    has a alias and the alias is listed in the MX records for REMOTE.
    (E.g. REMOTE has an MX of ALIAS, where ALIAS has a CNAME of
    LOCAL). This can be avoided if aliases are never used in the data
    section of MX RRs.

Here's the relevant BOG snippet:

    aliases    {ttl}   addr-class   CNAME   Canonical name
    ucbmonet           IN           CNAME   monet

    The Canonical Name resource record, CNAME, specifies an alias or
    nickname for the official, or canonical, host name. This record
    should be the only one associated with the alias name. All other
    resource records should be associated with the canonical name,
    not with the nickname. Any resource records that include a domain
    name as their value (e.g., NS or MX) must list the canonical
    name, not the nickname.

This issue seems to go on and on and is discussed from time to time and does not seem to be listed as a clear-cut rule in an RFC. John Navas contributed the following section related to various RFCs on the point:

    > it is a bug in their setup.. MX or domains cannot point to cnames but to
    > the real name..

Are you sure? RFC 974 states ("Issuing a Query"):

    There is one other special case. If the response contains an
    answer which is a CNAME RR, it indicates that REMOTE is actually
    an alias for some other domain name. The query should be repeated
    with the canonical domain name.

That seems to clearly indicate that MX records can point to CNAME records. RFC 1034 (3.6.2) suggests avoiding MX indirection:

    Domain names in RRs which point at another name should always
    point at the primary name and not the alias. This avoids extra
    indirections in accessing information.

but does not prohibit it:

    Of course, by the robustness principle, domain software should
    not fail when presented with CNAME chains or loops; CNAME chains
    should be followed ...

Then there's RFC 1912, which states (2.4):

    Don't use CNAMEs in combination with RRs which point to other
    names like MX, CNAME, PTR and NS. (PTR is an exception if you
    want to implement classless in-addr delegation.) For example,
    this is strongly discouraged:

        podunk.xx.   IN   MX      mailhost
        mailhost     IN   CNAME   mary
        mary         IN   A       1.2.3.4

    [RFC 1034] in section 3.6.2 says this should not be done, and

That's "should not" not "cannot".

    [RFC 974] explicitly states that MX records shall not point to an
    alias defined by a CNAME.

But it doesn't, as noted above; in fact, just the opposite. Finally, there's RFC 2181 (10.3):

    The domain name used as the value of a NS resource record, or
    part of the value of a MX resource record must not be an alias.
    Not only is the specification clear on this point, but using an
    alias in either of these positions neither works as well as might
    be hoped, nor well fulfills the ambition that may have led to
    this approach. This domain name must have as its value one or
    more address records. Currently those will be A records, however
    in the future other record types giving addressing information
    may be acceptable. It can also have other RRs, but never a CNAME
    RR.

The problem is that the "specification" is NOT "clear on this point" as noted above. RFC 2181 goes on to state:

    Searching for either NS or MX records causes "additional section
    processing" in which address records associated with the value of
    the record sought are appended to the answer. This helps avoid
    needless extra queries that are easily anticipated when the first
    was made.

    Additional section processing does not include CNAME records, let
    alone the address records that may be associated with the
    canonical name derived from the alias. Thus, if an alias is used
    as the value of an NS or MX record, no address will be returned
    with the NS or MX value. This can cause extra queries, and extra
    network burden, on every query. It is trivial for the DNS
    administrator to avoid this by resolving the alias and placing
    the canonical name directly in the affected record just once when
    it is updated or installed.

This suggests that this is an issue of "goodness" (avoiding the extra lookup) rather than a real error. Continuing:

    In some particular hard cases the lack of the additional section
    address records in the results of a NS lookup can cause the
    request to fail.

That would not seem to be the case here. To be clear, even though this does not appear to be an error per se, I would still recommend not using CNAME in an MX record given that many (most?) people will probably take RFC 2181 at face value with the result that some implementations may fail to properly resolve MX queries that return a CNAME.

Question 6.6. Can an NS record point to a CNAME ?
Date: Wed Mar 1 11:14:10 EST 1995

Can I do this ? Is it legal ?

    @       SOA     (.........)
            NS      ns.host.this.domain.
            NS      second.host.another.domain.
    ns      CNAME   third
    third   IN A    xxx.xxx.xxx.xxx

No. Only one RR type is allowed to refer, in its data field, to a CNAME, and that's CNAME itself. So CNAMEs can refer to CNAMEs but NSs and MXs cannot.

BIND 4.9.3 (Beta11 and later) explicitly syslogs this case rather than simply failing as pre-4.9 servers did. Here's a current example:

    Dec 7 00:52:18 gw named[17561]: "foobar.com IN NS" \
        points to a CNAME (foobar.foobar.com)

Here is the reason why:

Nameservers are not required to include CNAME records in the Additional Info section returned after a query. It's partly an implementation decision and partly a part of the spec. The algorithm described in RFC 1034 (pp24,25; info also in RFC 1035, section 3.3.11, p 18) says 'Put whatever addresses are available into the additional section, using glue RRs [if necessary]'. Since NS records are specified to contain only primary names of hosts, not CNAMEs, there's no reason for the algorithm to mention them. If, on the other hand, it's decided to allow CNAMEs in NS records (and indeed in other records) then there's no reason that CNAME records might not be included along with A records. The Additional Info section is intended for any information that might be useful but which isn't strictly the answer to the DNS query processed. It's an implementation decision in as much as some servers used to follow CNAMEs in NS references.
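
The rule above (NS and MX data must not name an alias) is easy to check mechanically before loading a zone. The following Python sketch assumes the zone has already been parsed into (owner, type, target) tuples with fully qualified names; find_alias_targets and the example.com names are hypothetical and are not part of BIND.

```python
def find_alias_targets(records):
    """Return the NS and MX records whose target name owns a CNAME.

    records is a list of (owner, rtype, target) tuples using fully
    qualified names; for MX records the target is the exchange host.
    """
    def norm(name):
        return name.lower().rstrip(".")

    aliases = {norm(owner) for owner, rtype, _ in records
               if rtype.upper() == "CNAME"}
    return [(owner, rtype, target)
            for owner, rtype, target in records
            if rtype.upper() in ("NS", "MX") and norm(target) in aliases]


# Same shape as the zone in the question: the NS target is only an alias.
zone = [
    ("example.com.", "NS", "ns.example.com."),
    ("ns.example.com.", "CNAME", "third.example.com."),
    ("third.example.com.", "A", "192.0.2.1"),
]
for owner, rtype, target in find_alias_targets(zone):
    print("%s IN %s points to a CNAME (%s)" % (owner, rtype, target))
```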
Question 6.7. Nameserver forgets own A record
Date: Fri Dec 2 16:17:31 EST 1994

Q: Lately, I've been having trouble with named 4.9.2 and 4.9.3. Periodically, the nameserver will seem to "forget" its own A record, although the other information stays intact. One theory I had was that somehow a site that the nameserver was secondary for was "corrupting" the A record somehow.

A: This is invariably due to not removing ALL of the cached zones when you moved to 4.9.X. Remove ALL cached zones and restart your nameservers. You get "ignoreds" because the primaries for the relevant zones are running old versions of BIND which pass out more glue than is required. named-xfer trims off this extra glue.

Question 6.8. General problems (core dumps !)
Date: Sun Dec 4 22:21:22 EST 1994

Paul Vixie says:

    I'm always interested in hearing about cases where BIND dumps
    core. However, I need a stack trace. Compile with -g and not -O
    (unless you are using gcc and know what you are doing) and then
    when it dumps core, get into dbx or gdb using the executable and
    the core file and use "bt" to get a stack trace. Send it to me
    <paul@vix.com> along with specific circumstances leading to or
    surrounding the crash (test data, tail of the debug log, tail of
    the syslog... whatever matters) and ideally you should save your
    core dump for a day or so in case I have questions you can answer
    via gdb/dbx.

Question 6.9. malloc and DECstations
Date: Mon Jan 2 14:19:22 EST 1995

We have replaced malloc on our DECstations with a malloc that is more compact in memory usage, and this helped the operation of bind a lot. The source is now available for anonymous ftp from ftp.cs.wisc.edu:/pub/misc/malloc.tar.gz
Question 6.10. Can't resolve names without a "."
(Answer written by Mark Andrews)

You are not using an RFC 1535 aware resolver. Depending upon the age of your resolver you could try adding a search directive to resolv.conf, e.g.

    domain <domain>
    search <domain> [<domain2> ...]

If that doesn't work you can configure your server to serve the parent and grandparent domains, as this is the default search list.

"domain langley.af.mil" has an implicit "search langley.af.mil af.mil mil" in the old resolvers, and you are timing out trying to resolve the address with one of these domains tacked on.
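
The implicit search list described above, and the order in which candidate names are then tried, can be sketched in Python as follows. The function names are made up for illustration, and the ordering follows this answer's description of old and RFC 1535 aware resolvers rather than any particular resolver's source code.

```python
def implicit_search_list(domain):
    """Old-resolver default: the domain followed by each parent domain."""
    labels = domain.strip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def candidates(name, domain, rfc1535_aware=False):
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search list applied
    searched = [name + "." + d for d in implicit_search_list(domain)]
    as_is = name + "."
    if rfc1535_aware and "." in name:
        # A name that already contains a dot is tried as given first.
        return [as_is] + searched
    return searched + [as_is]


print(implicit_search_list("langley.af.mil"))
print(candidates("internic.net", "langley.af.mil"))
print(candidates("internic.net", "langley.af.mil", rfc1535_aware=True))
```

With an old resolver the real name is only tried last, after every search-list lookup has failed or timed out, which is why a trailing "." appears to fix the problem.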
When resolving internic.net the following will be tried in order:

    internic.net.langley.af.mil
    internic.net.af.mil
    internic.net.mil
    internic.net.

RFC 1535 aware resolvers try the qualified address first:

    internic.net.
    internic.net.langley.af.mil
    internic.net.af.mil
    internic.net.mil

RFC 1535 documents the problems associated with the old search algorithm, including security issues, and how to alleviate some of the problems.

Question 6.11. Why does swapping kill BIND ?
Date: Thu Jul 4 23:20:20 EDT 1996

The question was:

    I've been diagnosing a problem with BIND 4.9.x (where x is
    usually 3BETA9 or 3REL) for several months now. I finally tracked
    it down to swap space utilization on the unix boxes. This happens
    under (at least) Linux 1.2.9 & 1.2.13, SunOS 4.1.3U1, 4.1.1, and
    Solaris 2.5. The symptom is that if these machines get into swap
    at all bind quits resolving most, if not all queries. Mind you
    that these machines are not "swapping hard", but rather we're
    talking about a several hundred K TEMPORARY deficiency. I have
    noticed while digging through various archives that there is some
    referral to "bind thrashing itself to death". Is this what is
    happening ?

And the answer is:

Yes it is. Bind can't tolerate having even a few pages swapped out. The time required to send responses climbs to several seconds/request, and the request queue fills and overflows.

It's possible to shrink memory consumption a lot by undefining STATS and XSTATS, and recompiling. You could nuke DEBUG too, which will cut the code size down some, but probably not the data size. If that doesn't do the job then it sounds like you'll need to move DNS onto a separate box.

BIND tends to touch all of its resident pages all of the time with normal activity... if you look at the RSS versus the total process size, you will always see the RSS within, usually, 90% of the total size of the process. This means that *any* paging of named-owned pages will stall named. Thus, a machine running a heavily accessed named process cannot afford to swap *at all*.

(Paul Vixie continues on this subject):

    I plan to try to get BIND to exhibit slightly better locality of
    reference in some future release. Of course, I can only do this
    if the query names also exhibit some kind of hot spots. If
    someone queries all your names often, BIND will have to touch all
    of its VM pool that often. (Right now, BIND touches everything
    pretty often even if you're just hammering on some hot spots --
    that's the part I'd like to fix. Malloc isn't cooperating.)

Question 6.12. Resource limits warning in system
Date: Sun Feb 15 22:04:43 EST 1998

When bind-8.1.1 is started the following informational message appears in the syslog...

    Feb 13 14:19:35 ns1 named[1986]: "cannot set resource limits on this system"

What does this mean ?

A: It means that BIND doesn't know how to implement the "coresize", "datasize", "stacksize", or "files" process limits on your OS.
If you're not using these options, you may ignore the message.
Question 6.13. ERROR:ns_forw: query...learnt
Date: Sun Feb 15 23:08:06 EST 1998

The following message appears in syslog:

    Jan 22 21:59:55 server1 named[21386]: ns_forw: query(testval) contains our address (dns1.foobar.org:1.2.3.4) learnt (A=:NS=)

What does it mean ?

A: This means that when it was looking up the NS records for the domain containing "testval" (i.e. the root domain), it found an NS record pointing to dns1.foobar.org, and the A record for this is 1.2.3.4. This is server1's own IP address, but it's not authoritative for the root domain. The (A=:NS=) part of the message means that it didn't learn these NS records from any other machine. You may have listed dns1.foobar.org in your root server cache file, even though it's not configured as a root server.

Question: ERROR:recvfrom: Connection refused
Date: Wed Jul 9 21:57:40 EDT 1997

DNS on my linux system is reporting the error

    Mar 26 12:11:20 idg named[45]: recvfrom: Connection refused

When I start or restart the named program I get no errors. What could be causing this ?

A: Are you running the BETA9 version of bind 4.9.3 ? It is a bug that does no harm and the error reporting was corrected in later releases. You should upgrade to a newer version of bind.
Question 6.14. ERROR:zone has trailing dot
Date: Wed Jul 9 22:11:51 EDT 1997

If syslog reports "zone has trailing dot", the zone information contains a trailing dot in the named.boot file where it does not belong.

example:

    secondary   domain.com.   xxx.xxx.xxx.xxx   S-domain.com
                          ^

Question 6.15. ERROR:Zone declared more then once
Date: Wed Jul 9 22:12:45 EDT 1997

If syslog reports "Zone declared more then once", a zone is specified multiple times in the named.boot file.

example:

    secondary   domain.com   198.247.225.251   S-domain.com
    secondary   zone.com     198.247.225.251   S-zone.com
    primary     domain.com   P-domain.com

domain.com is declared twice, once as primary, and once as secondary.

Question 6.16. ERROR:response from unexpected source
Date: Wed Jul 9 22:12:45 EDT 1997

If syslog reports "response from unexpected source", note that BIND (pre 4.9.3) has a bug when implemented on a multi-homed server. This error indicates that the response to a query came from an address other than the one the query was sent to. So, if a server (here called "ace") gets a response from an unexpected source, it will ignore the response.
Question 6.17. ERROR:record too short from [zone name]
Date: Mon Jun 15 21:34:49 EDT 1998

If syslog reports "record too short from [zone name]", the secondary server is trying to pull a zone from the primary server. For some reason, the primary sent an incomplete zone. This usually is a problem at the primary server.

To troubleshoot, try this:

    dig [zonename] axfr @[primary IP address]

Often, this is caused by a line broken in the middle.

When the primary server's "named.boot" file contains "xfrnets" entries for other servers and the secondary is not listed, this error can occur. Creating an "xfrnets" entry for the secondary will solve the error.

Question 6.18. ERROR:sysquery: findns error (3)
Date: Wed Jul 9 22:17:09 EDT 1997

If syslog reports "sysquery: findns error (3)" or "qserial_query(zonename): sysquery FAILED", there is no NS record for the zone, or the NS record is not defined correctly.
Question 6.19. ERROR:Err/TO getting serial# for XXX
Date: Wed Jul 9 22:18:41 EDT 1997

If syslog reports "Err/TO getting serial# for XXX", there could be a number of possible errors:

- An incorrect IP address in named.boot,
- A network reachability problem,
- The primary is lame for the zone.

An external check to see if you can retrieve the SOA is the best way to work out which it is.

Question 6.20. ERROR:zonename IN NS points to a CNAME
Date: Wed Jul 9 22:20:29 EDT 1997

If syslog reports "zonename IN NS points to a CNAME" or "zonename IN MX points to a CNAME", named is 'reminding' you that due to various RFCs, an NS or MX record cannot point to a CNAME.

    EXAMPLE 1
    ---------
    domain.com   IN SOA     (...stuff...)
                 IN NS      ns.domain.com.
    ns           IN CNAME   machine.domain.com.
    machine      IN A       1.2.3.4

The IN NS record points to ns, which is a CNAME for machine. This is what results in the above error.

    EXAMPLE 2
    ---------
    domain.com   IN SOA     (...stuff...)
                 IN MX      mail.domain.com.
    mail         IN CNAME   machine.domain.com.
    machine      IN A       1.2.3.4

This would cause the MX variety of the error.

The fix is to point MX and NS records to a machine that is defined explicitly by an IN A record.

Question 6.21. ERROR:Masters for secondary zone [XX] unreachable
Date: Wed Jul 9 22:24:27 EDT 1997

If syslog reports "Masters for secondary zone [XX] unreachable", the initial attempts to load a zone failed, and the name server is still trying. If this occurs multiple times, a problem exists, likely on the primary server. This is a fairly generic error, and could indicate a vast number of problems. It might be that named is not running on the primary server, or that the primary does not have the correct zone file. If this keeps up long enough, the zone might expire.
Question 6.22. ERROR:secondary zone [XX] expired
Date: Wed Jul 9 22:25:53 EDT 1997

If syslog reports "secondary zone [XX] expired", a secondary zone on this server has expired.
An expired zone is one in which a transfer hasn't successfully been completed in the amount of time specified before a zone expires.
This problem could be anything which prevents a zone transfer: The primary server is down, named isn't running on the primary, named.boot has the wrong IP address, etc.
Question 6.23. ERROR:bad response to SOA query from [address]
Date: Wed Jan 14 12:15:11 EST 1998

If syslog reports "bad response to SOA query from [address], zone [name]", a syntax error may exist in the SOA record of the zone your server is attempting to pull.
It may also indicate that the primary server is lame, possibly due to a syntax error somewhere in the zone file.
Question 6.24. ERROR:premature EOF, fetching [zone]
Date: Wed Jul 9 22:28:26 EDT 1997

If syslog reports "premature EOF, fetching [zone]", a syntax error exists in the zone file on the primary, likely towards the End of File (EOF).
Question 6.25. ERROR:Zone [XX] SOA serial# rcvd from [Y] is < ours
Date: Wed Jul 9 22:30:03 EDT 1997

If syslog reports "Zone [name] SOA serial# rcvd from [address] is < ours", the zone transfer failed because the primary machine has a lower serial number in the SOA record than the one on file on this server.
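
The check behind this message can be illustrated with a few lines of Python. This is only a sketch: transfer_decision is a hypothetical name, the serial numbers are invented, and the plain integer comparison is a simplifying assumption (a real server has to allow for the 32-bit serial field wrapping around).

```python
def transfer_decision(primary_serial, local_serial):
    """Decide what a secondary does with the serial seen on the primary.

    Simplified: serials are compared as ordinary integers.
    """
    if primary_serial > local_serial:
        return "transfer"
    if primary_serial == local_serial:
        return "up to date"
    return "rejected: serial from primary is < ours"


# A date-style serial that was accidentally lowered on the primary:
print(transfer_decision(1997070901, 1997070902))
# After the primary's serial is raised above the secondary's copy:
print(transfer_decision(1997070903, 1997070902))
```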
Question 6.26. ERROR:connect(IP/address) for zone [XX] failed
Date: Wed Jan 14 12:21:40 EST 1998

If syslog reports "connect(address) for zone [name] failed: No route to host" or "connect(address) for zone [name] failed: Connection timed out", it could be that there is no route to the specified host, or that the primary system is slow. Try a traceroute to the address specified to isolate the problem. The problem may be a mistyped IP address in named.boot.

Other possible causes are a very slow primary machine, or a connection that was initiated and then lost connectivity for some reason. Try network troubleshooting tools like ping and traceroute, then try connecting to port 53 using nslookup or dig.

If syslog reports "connect(address) for zone [name] failed: Connection refused", the destination address is not allowing the connection. Either the destination is not running DNS (port 53), or it is possibly filtering the connection from you. It is also possible that named.boot is pointing to the wrong address.
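
Zone transfers use TCP, so the three outcomes above can be told apart by attempting a TCP connection to port 53 on the primary. The Python sketch below uses only the standard socket module; probe_tcp is a hypothetical helper and the address in the usage comment is a placeholder, not a real server.

```python
import socket


def probe_tcp(host, port=53, timeout=5.0):
    """Try a TCP connection and classify the result as a short string."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening on the port, or a filter rejects us.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: slow primary or lost packets.
        return "timed out"
    except OSError as exc:
        # Includes "No route to host" and name resolution failures.
        return "failed: %s" % exc
    sock.close()
    return "ok"


# Usage (placeholder address, replace with the primary from named.boot):
#     print(probe_tcp("192.0.2.53"))
```

A result of "ok" only shows that something accepts connections on port 53; a dig or nslookup query is still needed to confirm that it answers for the zone.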
Question 6.27. ERROR:sysquery: no addrs found for NS
Date: Wed Jul 9 22:37:01 EDT 1997

If syslog reports "sysquery: no addrs found for NS", the IN NS record may be pointing to a host with no IN A record.
Question 6.28. ERROR:zone [name] rejected due to errors
Date: Wed Jul 9 22:37:51 EDT 1997

If syslog reports "primary zone [name] rejected due to errors", there will likely be another more descriptive error along with this, like "zonefile: line 17: database format error". That zone file should be investigated for errors.
Chris Peckham - 16 June 1999
Extracted from comp.protocols.tcp-ip.domains Frequently Asked Questions, Copyright 1999.