Please read the end of this ... the new updates are where most of the
details are.


------------------------------------------------------------------------

So I have a theory on what it is that Dan Kaminsky may have discovered
that is broken with DNS (it was already _so_ broken, what else could be 
wrong?!)

Basically it has to do with ICMP packets (spoofed ICMP unreachables sent 
in response to DNS packets the attacker can't see, but can guess - thanks 
to non-random port selection).

The biggest problem with spoofing DNS at the moment is that you need
to silence the real nameservers in order to get your fake replies in.

For an ICMP response to be valid, it must contain the IP header of the
packet it is a response to, and it must also contain 64 bits of the data
payload. The reason for requiring 64 bits of the payload is to prevent
people from spoofing ICMP replies to packets they have not received. In
the case of a DNS packet, that payload is the UDP header, which is
exactly 64 bits long.

What is in those 64 bits? The source and destination ports of the DNS
exchange, followed by the UDP length and checksum. If the ports are easily
predictable then you can spoof an ICMP unreachable response to a DNS query
or reply without actually receiving it.
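
To make it concrete what sits in those quoted bytes, here is a minimal
Python sketch (my own illustration, not taken from any real tool) that
unpacks the 8 bytes of UDP header an ICMP error quotes. The sample values
(port 53 on both ends, length 40, checksum 0) are made up for the example.

```python
import struct

def parse_quoted_udp_header(quoted: bytes) -> dict:
    """Unpack the 8 bytes of UDP header quoted inside an ICMP error."""
    if len(quoted) < 8:
        raise ValueError("need at least 8 bytes of quoted payload")
    # Four big-endian 16-bit fields: source port, destination port,
    # UDP length (header + data) and UDP checksum.
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", quoted[:8])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
    }

# Hypothetical sample: a query sent from fixed source port 53 to port 53.
sample = struct.pack("!HHHH", 53, 53, 40, 0)
fields = parse_quoted_udp_header(sample)
print(fields)
```

The point is just that the two port fields come first, so a fixed or
predictable source port leaves very little to guess.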

If you can spoof ICMP, you can prevent the recursor from communicating
with the real nameserver. This makes it very easy to spoof DNS, as it
removes the biggest hurdle: silencing the real nameservers. It only takes
about 2min on a 10mbit/s connection to run through all 65536 possible
transaction IDs, so if you can prevent the recursor from talking to the
real nameservers it really is easy as pie.

Comments?

robert@mckay.com


Mon Jul 14 04:02:36 BST 2008 -- Update: Was thinking about this some
more.. what if you use ICMP unreachables to cut off the master/slave zone
transfers? Eventually the slave will drop the zone (depending on the
expire timeout in the SOA record). Is it then possible to poison the
slave(s)?

Tue Jul 15 02:21:21 BST 2008 -- Update: Another thing has occurred to me 
about how to use the ICMP unreachables: trigger a lookup for the target 
name, immediately reply to it with a small number of replies using 
incrementing transaction IDs. Follow these packets with the ICMP 
unreachable. This way if your attempt to spoof the XID doesn't work the 
legitimate request has still been blocked, so you can try the same thing 
again. Internet hot/cold routes can give you a 50-100ms advantage over the 
legitimate authoritative server if you 'pre warm' your path. Obviously 
you'll need to wait a short time for the real server's path to cool down 
again.

Fri Jul 18 01:24:39 BST 2008 -- Update: I've finally got around to writing
some actual code and good news! I've managed to get an attack that works
using ICMP redirects (not unreachables, yet anyway). There are a few
restrictions to this attack that may limit its scope for actual damage
(but not its being really evil/cool). It seems (on Linux at least) to
only be possible to force the victim nameserver to learn broken routes to
other things on its own local subnet. The broken routes need to be
created to route via something else that ARPs on the local subnet (ie: not
the nameserver or its default gw). This could be used to annoy the admin
by preventing them from logging into the nameserver from another box on
the same subnet, and probably for lots of other circumstance-specific
nastiness. Basically you can prevent anything on the nameserver's
local subnet from being able to communicate with it. It doesn't appear to
be possible to create broken routes to hosts that are behind the default
(or any other) gateway though, so it's not directly useful to help with a
cache poisoning attack. Just to be clear, the spoofed ICMP redirects can be
sent over the internet so long as the border gateway does not drop them
(it won't unless someone has specifically configured the firewall to do
so).

Sun Jul 20 00:18:11 BST 2008 -- Update: It's just occurred to me that the
ICMP unreachable attack I mentioned in update #2, which I still haven't
managed to get working, would probably work a lot better if there was a
firewall (or possibly even just a NAT) between the internet and the
nameserver. If there is a state tracking firewall, then it may process the
spoofed ICMP unreachables for you and cut off any further replies.. this
would get around the problem I have with sendmsg/recvmsg on Linux
unconnected sockets not seeming to care about any error conditions.

Sun Jul 20 00:56:51 BST 2008 -- Update: Ok. After some googling I've found 
the following link. Assuming most NAT and firewall devices actually 
implement ICMP state tracking then this seems pretty clear cut - This is 
the last missing piece of the puzzle.

 http://www.faqs.org/docs/iptables/icmpconnections.html

How does it work? You just send as many fake UDP DNS replies to the victim
nameserver as you can in 50ms (your "hot path" advantage) and then send
an ICMP unreachable. If you guessed the right transactionID, the victim
nameserver learns the wrong (faked) answer and is poisoned. If you didn't
guess the right transactionID, the ICMP you sent after your replies has
still cut off the firewall/NAT path. The legitimate answer from the real
nameserver that comes along a bit later is dropped by the firewall/NAT and
never reaches the nameserver. This means you can try again and again and
again until eventually you get the right transactionID and the nameserver
is poisoned.
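
A back-of-envelope sketch of how the odds add up over retries. This
assumes each retry faces a fresh, uniformly random 16-bit transaction ID
and that the fake replies within one try all carry distinct IDs. The
function name and the 100-replies-per-try figure are made up for
illustration.

```python
def success_probability(replies_per_try: int, tries: int,
                        id_space: int = 65536) -> float:
    """Chance that at least one try contains the right transaction ID."""
    # Within a single try, distinct guesses cover replies_per_try IDs.
    per_try = min(replies_per_try, id_space) / id_space
    # Tries are treated as independent because each lookup gets a new ID.
    return 1.0 - (1.0 - per_try) ** tries

for tries in (1, 100, 450, 2000):
    print(tries, round(success_probability(100, tries), 3))
```

Under those assumptions, 100 fake replies per try gets you to roughly
even odds after about 450 tries.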

Mon Jul 21 21:27:28 BST 2008 -- OK. Yesterday's update was a nice idea 
however it turns out that most firewall/NAT implementations don't work 
that way at all (don't close off UDP sessions on receipt of an ICMP 
unreachable) and I haven't been able to get it to work. However -- I have 
just thought up a much more devious and evil attack:

It has to do with IP fragmentation.

The first step is you use spoofed ICMP unreachable fragmentation required
messages to reduce the MTU of the path between the authoritative
nameserver and the recursor. The goal here is to make the authoritative
nameserver fragment its replies to the recursor.

You query the recursor to make it talk to the auth server. You then spoof
ICMP unreachable Fragmentation Needed messages to the auth server, as if
they came from the recursor in reply to the auth server's UDP DNS reply,
setting its path MTU as low as possible (to 576 bytes). (This part is
actually possible, I've done it - the authoritative server successfully
reduced its path MTU upon receipt of forged ICMP unreachable
fragmentation needed messages).

Next you make another query to the recursor for the name you want to
poison, but also a bunch of other names to make your request longer (I'm
pretty sure DNS allows sending multiple questions in one packet). Your aim
here is to make the reply longer than the smallest MTU you can set which
is 576 bytes.

The auth nameserver's IP stack will fragment its reply to the recursor
into two packets.
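
A rough sketch of the fragmentation arithmetic, assuming a 20 byte IP
header with no options and the usual rule that every fragment except the
last carries a multiple of 8 bytes. The function name and the 600 byte
DNS message are just illustrative.

```python
def fragment_sizes(ip_payload_len: int, mtu: int = 576,
                   ip_header_len: int = 20) -> list:
    """Return (offset, length) of each fragment's slice of the IP payload."""
    # Fragment offsets are counted in 8-byte units, so round down.
    per_fragment = (mtu - ip_header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < ip_payload_len:
        length = min(per_fragment, ip_payload_len - offset)
        fragments.append((offset, length))
        offset += length
    return fragments

UDP_HEADER = 8
dns_message = 600  # hypothetical reply size in bytes
print(fragment_sizes(UDP_HEADER + dns_message))  # [(0, 552), (552, 56)]
```

With a 576 byte MTU that reply splits into a 552 byte piece and a 56 byte
piece. The UDP header and the 12 byte DNS header both sit entirely in the
first piece.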

You immediately start sending your own version of the #2 packet fragment 
containing your own answers.

Your fragment #2 packet will arrive at the recursor even before the auth 
nameserver's first fragmented packet does.

When the auth nameserver's first fragmented packet arrives at the
recursor, it will reassemble it using your #2 fragment that it already
has.

What's in the first fragment from the auth server? The real transaction
ID! No need to try and guess it, it's just there! You've managed to attach
your own answers to the end of the real reply from the auth server that
has the right transaction ID. This is really really cool :-)

There are two problems with this attack that I see at the moment. The 
first one is the IP ID# which is used for fragment reassembly. Your 
guessed IP packet fragment #2 needs to have the same 16bit IP header ID# 
as the one generated by the auth nameserver. You could possibly get around 
this by sending a whole lot of versions of IP packet fragment #2 with 
different ID#s and just hope that one of them hits. Essentially this seems 
like a way of reopening the birthday attack. I'm not sure how many partial 
IP fragments a typical IP stack will hold for you but it may well be a 
lot.

The second problem is the UDP checksum. Once the packet has been 
successfully reassembled, in order for it to be delivered to the recursive 
nameserver's UDP socket, the packet needs to pass the UDP checksum. Since 
you've modified the content of the UDP packet, it won't... unless you are 
very careful.

A simple attack I can see is that you will have multiple answers (to your
multiple questions) in the original DNS packet.

name1.whatever.com IN A 127.0.0.1
name2.whatever.com IN A 127.0.0.2

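Now in your fake packet, you should be able to re-arrange the data without 
changing the checksum, since the UDP checksum is a one's-complement sum of 
16-bit words and does not depend on the order of those words: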

name2.whatever.com IN A 127.0.0.2
name1.whatever.com IN A 127.0.0.1

It should also be possible to just append some other random garbage in
order to make any reply you want match the original answer's checksum.
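
Here is a small Python sketch of the re-ordering property, using the
standard Internet checksum (one's-complement sum of 16-bit words). The two
"records" are simplified stand-ins I made up, not real DNS wire format,
and the property only holds as written when each swapped block has an even
length and starts on a 16-bit boundary.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Simplified 14-byte stand-ins for two answer records.
record1 = b"name1." + b"\x00\x01\x00\x01" + bytes([127, 0, 0, 1])
record2 = b"name2." + b"\x00\x01\x00\x01" + bytes([127, 0, 0, 2])

original = internet_checksum(record1 + record2)
swapped = internet_checksum(record2 + record1)
print(hex(original), hex(swapped), original == swapped)
```

Real DNS records use name compression and can have odd lengths, so an
actual re-ordering would need more care than this.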

I think this is a solvable problem.. 

I'm not so sure about the ID# one..  as noted above, it seems like you
should be able to get around it by sending a whole lot of different
versions of fragment #2 with different IDs so that one of them will match
when fragment #1 arrives and there are probably other platforms that do
not use random ID#s at all, making the attack very easy indeed.

I would be very interested in any comments people may have about this 
idea..

Tue Jul 22 01:12:39 BST 2008 -- Update: Ok. I've experimented with sending
a DNS request to BIND with two queries. It doesn't like it (returns a
malformed request error). However - this may not actually be necessary.
Many ANY queries for a domain will probably come back with enough response
data to fragment the packet if the MTU is at 576 bytes.

Tue Jul 22 02:04:12 BST 2008 -- Update: Thinking about this further, if
you look up com. or . you get quite a large response back. Currently these
replies are just under 576 bytes (queries for .net and .com clock in at
534 bytes). I've noticed something potentially interesting. Not all the
nameservers that have IPv6 (AAAA) records are having those AAAA records
returned when you query for just .com or just .net. If you look up the
nameserver's name, say f.root-servers.net, then the AAAA record is there,
but it is NOT being returned when you attempt to look up just .com or .net
or ".". This may be an attempt at mitigating this kind of attack (or it
may just be that it's always been like this).

Compare these two queries:
dig @a.root-servers.net. com. any
dig @a.root-servers.net. b.root-servers.net. any

In the first one there is no mention of b.root-servers.net's AAAA record. 
Why is this? Is it possibly to keep the reply size low enough that it 
can't be fragmented?

Tue Jul 22 19:52:08 BST 2008 -- Update: Ok. so everyone's seen the Halvar
theory by now. 

http://it.slashdot.org/article.pl?sid=08/07/21/2212227&from=rssw

I've started trying to implement my own version of this attack.. code 
is here:

http://wari.mckay.com/~rm/sd3.c.txt

I haven't actually got this one working yet either.. but its getting 
close. Will update again soon.

Btw, I am still very interested in investigating the other attacks on this
page. I'm not totally convinced about this Halvar thing.. sure it probably
works, but people have been able to poison DNS for a very long time now
anyway (it's not about getting something that works, it's about getting
something that works really well) - so, I'm not sure if this is all there
is to Dan's discoveries. I think there may be more.


OK. Update again.

I've just found what I think may be an important optimisation of this DNS 
poisoning scheme.

Basically it is the discovery that if you make your QNAME long enough the 
root-servers will start dropping GLUE records in order to fit their reply 
into the maximum UDP msgsize (512 bytes without EDNS0):

dig @b.root-servers.net. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.mckay.com any

Consider the above command. Normally the root-servers would return GLUE 
for all of the .COM nameservers, however because the QNAME is so long, 
there isn't enough room in the reply for all of them.
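
A quick sketch of why a long QNAME squeezes the rest of the reply,
assuming the classic 512 byte UDP limit, a 12 byte DNS header, and a
question section made of the encoded name plus 4 bytes of QTYPE/QCLASS.
The function names and the test names below are my own illustration.

```python
def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("each label must be 1..63 bytes")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 bytes")
    return bytes(out)

def room_after_question(name: str, limit: int = 512) -> int:
    """Bytes left for answer/authority/additional records in a UDP reply."""
    header = 12
    qtype_qclass = 4
    return limit - header - len(encode_qname(name)) - qtype_qclass

short_name = "mckay.com"
long_name = ".".join(["a" * 60] * 3) + ".mckay.com"
print(room_after_question(short_name))  # 485
print(room_after_question(long_name))   # 302
```

Every byte of QNAME is echoed back in the question section, so it comes
straight out of the space that would otherwise hold GLUE.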


rm@wari:~$ dig @c.root-servers.net. 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.mckay.com 
a

; <<>> DiG 9.3.1 <<>> @c.root-servers.net. 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.mckay.com 
a
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2735
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 1

;; QUESTION SECTION:
;aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.mckay.com. 
IN A

;; AUTHORITY SECTION:
com.                    172800  IN      NS      G.GTLD-SERVERS.NET.
com.                    172800  IN      NS      E.GTLD-SERVERS.NET.
com.                    172800  IN      NS      M.GTLD-SERVERS.NET.
com.                    172800  IN      NS      A.GTLD-SERVERS.NET.
com.                    172800  IN      NS      F.GTLD-SERVERS.NET.
com.                    172800  IN      NS      H.GTLD-SERVERS.NET.
com.                    172800  IN      NS      L.GTLD-SERVERS.NET.
com.                    172800  IN      NS      K.GTLD-SERVERS.NET.
com.                    172800  IN      NS      D.GTLD-SERVERS.NET.
com.                    172800  IN      NS      C.GTLD-SERVERS.NET.
com.                    172800  IN      NS      I.GTLD-SERVERS.NET.
com.                    172800  IN      NS      J.GTLD-SERVERS.NET.
com.                    172800  IN      NS      B.GTLD-SERVERS.NET.

;; ADDITIONAL SECTION:
A.GTLD-SERVERS.NET.     172800  IN      A       192.5.6.30

;; Query time: 192 msec
;; SERVER: 192.33.4.12#53(192.33.4.12)
;; WHEN: Wed Jul 23 14:49:54 2008
;; MSG SIZE  rcvd: 511


Notice that it has only returned one GLUE record for .COM? That means you 
only need to spoof replies for .COM as if they came from one nameserver 
(192.5.6.30). This should make the attack about 13 times easier :-)


Wed Jul 23 22:13:17 BST 2008 -- Update: Okay.. the above doesn't actually
give such a huge advantage (possibly no advantage at all even). The
problem is that the recursor will still try and use the other nameservers
it doesn't have GLUE for by chasing them down by name (it re-queries the
root servers using the NS name it did get to find its IP). It takes a
while longer but it will still attempt to use the other nameservers.

I've also found a serious problem with the fragment replacement attack I 
proposed earlier. Basically the problem is that you can only fragment 
packets that are larger than 572 bytes and all such packets you can get a 
nameserver to generate have the truncated flag set, so even if you were 
able to replace part of a fragmented answer the recursor would end up 
retrying the request with TCP anyway. The only way it would work is if the 
recursor's OS allowed overlapping fragments that overwrite earlier data 
(you could rewrite the truncated flag and turn it back to non-truncated 
again) - but this is starting to get a bit crazy. The attack might still 
hold some use as a DoS though, to prevent answers getting back to the 
recursor at all. You could fill up the recursor's fragmented packet buffer 
and then the replies would be dropped but you could still answer with your 
fake reply using non-fragmented packets.
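
For reference, a small sketch of where the truncated (TC) flag lives: it
is bit 0x0200 of the 16-bit flags word in the 12 byte DNS header. The
sample header reuses the id, flags (qr rd) and section counts from the dig
output further up; the second sample with TC set is hypothetical.

```python
import struct

QR_BIT = 0x8000  # message is a response
TC_BIT = 0x0200  # message was truncated
RD_BIT = 0x0100  # recursion desired

def parse_dns_header(message: bytes) -> dict:
    """Unpack the fixed 12-byte DNS header and report the TC flag."""
    txid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": txid,
        "response": bool(flags & QR_BIT),
        "truncated": bool(flags & TC_BIT),
        "counts": (qd, an, ns, ar),
    }

normal = struct.pack("!HHHHHH", 2735, QR_BIT | RD_BIT, 1, 0, 13, 1)
cut_off = struct.pack("!HHHHHH", 2735, QR_BIT | RD_BIT | TC_BIT, 1, 0, 13, 1)
print(parse_dns_header(normal)["truncated"])   # False
print(parse_dns_header(cut_off)["truncated"])  # True
```

The header is at the very start of the message, so it always lands in the
first fragment.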


Wed Jul 23 23:16:55 BST 2008 -- Update: Some other exploits have been 
released for this issue:

http://www.caughq.org/exploits/CAU-EX-2008-0002.txt
http://metasploit.com/dev/trac/browser/framework3/trunk/modules/auxiliary/spoof/dns/baliwicked_host.rb

Thu Jul 24 01:12:31 BST 2008 -- Update: Well.. I managed to get my
implementation of the exploit working now too:

http://wari.mckay.com/~rm/sd4.c.txt