Please read the end of this ... the new updates are where most of the details are.

------------------------------------------------------------------------

So I have a theory on what it is that Dan Kaminsky may have discovered that is broken with DNS (it was already _so_ broken, what else could be wrong?!)

Basically it has to do with ICMP packets (spoofed ICMP unreachables sent in response to DNS packets the attacker can't see, but can guess, thanks to non-random port selection).

The biggest problem with spoofing DNS at the moment is that you need to silence the real nameservers in order to get your fake replies in.

For an ICMP response to be valid, it must contain the IP header of the packet it is a response to, and it must also contain the first 64 bits of that packet's data payload. The reason for requiring 64 bits of the payload is to prevent people from spoofing ICMP replies to packets they have not received. In the case of a DNS packet, that payload is the UDP header, which is exactly 64 bits long. What is in the UDP header? The source and destination ports of the DNS servers (followed by the UDP length and checksum). If the ports are easily predictable then you can spoof an ICMP unreachable response to a DNS query or reply without actually receiving it.

If you can spoof ICMP, you can prevent the recursor from communicating with the real nameserver. This will make it very, very easy to spoof DNS, as it removes the biggest hurdle: that of silencing the real nameservers.

It only takes about 2min on a 10mbit/s connection to run through all 65536 possible transaction IDs, so if you can prevent the recursor from talking to the real nameservers it really is easy as pie.

Comments? robert@mckay.com

Mon Jul 14 04:02:36 BST 2008 -- Update: Was thinking about this some more.. what if you use ICMP unreachables to cut off the master/slave zone transfers? Eventually the slave will drop the zone (depending on the expire timeout in the SOA record). Is it then possible to poison the slave(s)?
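To make the "64 bits of payload" point from the original theory concrete, here is a small Python sketch of what an ICMP error quotes from a UDP datagram. The port numbers and the 40-byte message length are made-up values for illustration, and the helper names (udp_header, icmp_quoted_fields) are hypothetical, not taken from any real tool. It only packs and unpacks bytes; nothing is sent on the wire.

```python
import struct

def udp_header(src_port: int, dst_port: int, payload_len: int, checksum: int = 0) -> bytes:
    """Pack the 8-byte UDP header: source port, destination port, length, checksum."""
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, checksum)

def icmp_quoted_fields(quoted: bytes) -> dict:
    """Decode the first 64 bits of datagram data that an ICMP error message quotes."""
    src, dst, length, checksum = struct.unpack("!HHHH", quoted[:8])
    return {"src_port": src, "dst_port": dst, "length": length, "checksum": checksum}

# Assumed example: a recursor that always queries from source port 53,
# sending a 40-byte DNS message to a nameserver on port 53.
datagram = udp_header(53, 53, 40) + b"\x00" * 40

# The quoted 64 bits are exactly the UDP header; no DNS data
# (and so no transaction ID) is part of it.
fields = icmp_quoted_fields(datagram[:8])
print(fields)
```

The point of the sketch is that the quoted bytes stop before the DNS message starts, so the port pair is the only part of them that has anything to do with DNS.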
Tue Jul 15 02:21:21 BST 2008 -- Update: Another thing has occurred to me about how to use the ICMP unreachables: trigger a lookup for the target name, then immediately answer it with a small number of replies using incrementing transaction IDs. Follow these packets with the ICMP unreachable. This way, if your attempt to spoof the XID doesn't work, the legitimate request has still been blocked, so you can try the same thing again. Internet hot/cold routes can give you a 50-100ms advantage over the legitimate authoritative server if you 'pre warm' your path. Obviously you'll need to wait a short time for the real server's path to cool down again.

Fri Jul 18 01:24:39 BST 2008 -- Update: I've finally got around to writing some actual code and good news! I've managed to get an attack that works using ICMP redirects (not unreachables, yet anyway).

There are a few restrictions to this attack that may limit its scope for actual damage (but not its being really evil/cool). It seems (on Linux at least) to only be possible to force the victim nameserver to learn broken routes to other things on its own local subnet. The broken routes need to be created to route via something else that ARPs on the local subnet (ie: not the nameserver or its default gw). This could be used to annoy the admin by preventing them from logging into the nameserver from another box on the same subnet as the nameserver, and probably for lots of other circumstance-specific nastiness. Basically you can prevent anything on the nameserver's local subnet from being able to communicate with it.

It doesn't appear to be possible to create broken routes to hosts that are behind the default (or any other) gateway though, so it's not directly useful to help with a cache poisoning attack.

Just to be clear, the spoofed ICMP redirects can be sent over the internet so long as the border gateway does not drop them (it won't unless someone has specifically configured the firewall to do so).
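The "try the same thing again" idea from the 15 July update is easy to put numbers on. The sketch below is only arithmetic, under two assumptions that are mine rather than measured: the victim picks a fresh, uniformly random 16-bit transaction ID for every lookup, and each round's guesses are all distinct. The function names are hypothetical.

```python
import math

ID_SPACE = 65536  # number of possible 16-bit DNS transaction IDs

def success_probability(guesses_per_round: int, rounds: int, id_space: int = ID_SPACE) -> float:
    """Chance that at least one of `rounds` independent rounds hits the right ID."""
    miss = 1.0 - guesses_per_round / id_space
    return 1.0 - miss ** rounds

def rounds_for(target: float, guesses_per_round: int, id_space: int = ID_SPACE) -> int:
    """Smallest number of rounds whose success probability reaches `target` (0 < target < 1)."""
    miss = 1.0 - guesses_per_round / id_space
    if miss <= 0.0:
        return 1  # every ID is covered in a single round
    return math.ceil(math.log(1.0 - target) / math.log(miss))

# Rounds needed for an even chance, for a few assumed burst sizes.
for burst in (10, 50, 200):
    print(burst, "guesses per round ->", rounds_for(0.5, burst), "rounds for 50%")
```

The number of rounds only matters if each failed round really does leave the lookup unanswered, which is exactly what the ICMP unreachable is supposed to achieve.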
Sun Jul 20 00:18:11 BST 2008 -- Update: It's just occurred to me that the ICMP unreachable attack I mentioned in update #2, which I still haven't managed to get working, would probably work a lot better if there was a firewall (or possibly even just a NAT) between the internet and the nameserver. If there is a state tracking firewall, then it may process the spoofed ICMP unreachables for you and cut off any further replies.. this would get around the problem I have with sendmsg/recvmsg on Linux unconnected sockets not seeming to care about any error conditions.

Sun Jul 20 00:56:51 BST 2008 -- Update: Ok. After some googling I've found the following link. Assuming most NAT and firewall devices actually implement ICMP state tracking then this seems pretty clear cut. This is the last missing piece of the puzzle.

http://www.faqs.org/docs/iptables/icmpconnections.html

How does it work? You just send as many fake UDP DNS replies to the victim nameserver as you can in 50ms (your "hot path" advantage) and then send an ICMP unreachable. If you guessed the right transaction ID, the victim nameserver learns the wrong (faked) answer and is poisoned. If you didn't guess the right transaction ID, the ICMP you sent after your replies has still cut off the firewall/NAT path. The legitimate answer from the real nameserver that comes along a bit later is dropped by the firewall/NAT and never reaches the nameserver. This means you can try again and again and again until eventually you get the right transaction ID and the nameserver is poisoned.

Mon Jul 21 21:27:28 BST 2008 -- OK. Yesterday's update was a nice idea, however it turns out that most firewall/NAT implementations don't work that way at all (they don't close off UDP sessions on receipt of an ICMP unreachable) and I haven't been able to get it to work.

However, I have just thought up a much more devious and evil attack: it has to do with IP fragmentation.
The first step is to use spoofed ICMP unreachable "fragmentation needed" messages to reduce the MTU of the path between the authoritative nameserver and the recursor. The goal here is to make the authoritative nameserver fragment its replies to the recursor.

You query the recursor to make it talk to the auth server. You then spoof ICMP unreachable Fragmentation Needed messages to the auth server, as if they came from the recursor in response to the auth server's UDP DNS reply, setting the path MTU as low as possible (576 bytes). (This part is actually possible, I've done it: the authoritative server successfully reduced its path MTU upon receipt of forged ICMP unreachable fragmentation needed messages.)

Next you make another query to the recursor for the name you want to poison, but also for a bunch of other names to make your request longer (I'm pretty sure DNS allows sending multiple questions in one packet). Your aim here is to make the reply longer than the smallest MTU you can set, which is 576 bytes. The auth nameserver's IP stack will fragment its reply to the recursor into two packets.

You immediately start sending your own version of the #2 packet fragment containing your own answers. Your fragment #2 packet will arrive at the recursor even before the auth nameserver's first fragment does. When the auth nameserver's first fragment arrives at the recursor, the recursor will reassemble it using your #2 fragment that it already has.

What's in the first fragment from the auth server? The real transaction ID! No need to try and guess it, it's just there! You've managed to attach your own answers to the end of the real reply from the auth server that has the right transaction ID.

This is really really cool :-)

There are two problems with this attack that I see at the moment. The first one is the IP ID#, which is used for fragment reassembly.
Your guessed IP packet fragment #2 needs to have the same 16bit IP header ID# as the one generated by the auth nameserver. You could possibly get around this by sending a whole lot of versions of IP packet fragment #2 with different ID#s and just hope that one of them hits. Essentially this seems like a way of reopening the birthday attack. I'm not sure how many partial IP fragments a typical IP stack will hold for you but it may well be a lot.

The second problem is the UDP checksum. Once the packet has been successfully reassembled, in order for it to be delivered to the recursive nameserver's UDP socket, the packet needs to pass the UDP checksum. Since you've modified the content of the UDP packet, it won't... unless you are very careful. A simple attack I can see is that you will have multiple answers (to your multiple questions) in the original DNS packet:

  name1.whatever.com IN A 127.0.0.1
  name2.whatever.com IN A 127.0.0.2

Now in your fake packet, you should be able to re-arrange the data without changing the checksum:

  name2.whatever.com IN A 127.0.0.2
  name1.whatever.com IN A 127.0.0.1

It should also be possible to just append some other random garbage in order to make any reply you want match the original answer's checksum.

I think this is a solvable problem.. I'm not so sure about the ID# one.. as noted above, it seems like you should be able to get around it by sending a whole lot of different versions of fragment #2 with different IDs so that one of them will match when fragment #1 arrives, and there are probably other platforms that do not use random ID#s at all, making the attack very easy indeed.

I would be very interested in any comments people may have about this idea..

Tue Jul 22 01:12:39 BST 2008 -- Update: Ok. I've experimented with sending a DNS request to bind with two queries. It doesn't like it (returns a malformed request error). However, this may not actually be necessary.
Many ANY queries for a domain will probably come back with enough response data to fragment the packet if the MTU is at 576 bytes.

Tue Jul 22 02:04:12 BST 2008 -- Update: Thinking about this further, if you look up com. or . you get quite a large response back. Currently these replies are just under 576 bytes (queries for .net and .com clock in at 534 bytes).

I've noticed something potentially interesting. Not all the nameservers that have IPv6 (AAAA) records are having those AAAA records returned when you query for just .com or just .net. If you look up the nameserver's name, say f.root-servers.net, then the AAAA record is there, but it is NOT being returned when you attempt to look up just .com or .net or ".". This may be an attempt at mitigating this kind of attack (or it may just be that it's always been like this).

Compare these two queries:

  dig @a.root-servers.net. com. any
  dig @a.root-servers.net. b.root-servers.net. any

In the first one there is no mention of b.root-servers.net's AAAA record. Why is this? Is it possibly to keep the reply size low enough that it can't be fragmented?

Tue Jul 22 19:52:08 BST 2008 -- Update: Ok. so everyone's seen the Halvar theory by now.

http://it.slashdot.org/article.pl?sid=08/07/21/2212227&from=rssw

I've started trying to implement my own version of this attack.. code is here:

http://wari.mckay.com/~rm/sd3.c.txt

I haven't actually got this one working yet either.. but it's getting close. Will update again soon.

Btw, I am still very interested in investigating the other attacks on this page. I'm not totally convinced about this Halvar thing.. sure it probably works, but people have been able to poison DNS for a very long time now anyway (it's not about getting something that works, it's about getting something that works really well), so I'm not sure if this is all there is to Dan's discoveries. I think there may be more.

OK. Update again. I've just found what I think may be an important optimisation of this DNS poisoning scheme.
Basically it is the discovery that if you make your QNAME long enough, the root-servers will start dropping GLUE records in order to fit their reply into the maximum UDP message size (512 bytes for plain DNS over UDP):

  dig @b.root-servers.net. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.mckay.com any

Consider the above command. Normally the root-servers would return GLUE for all of the .COM nameservers, however because the QNAME is so long, there isn't enough room in the reply for all of them.

  rm@wari:~$ dig @c.root-servers.net. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.mckay.com a

  ; <<>> DiG 9.3.1 <<>> @c.root-servers.net. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.mckay.com a
  ; (1 server found)
  ;; global options: printcmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2735
  ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 1

  ;; QUESTION SECTION:
  ;aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.mckay.com. IN A

  ;; AUTHORITY SECTION:
  com. 172800 IN NS G.GTLD-SERVERS.NET.
  com. 172800 IN NS E.GTLD-SERVERS.NET.
  com. 172800 IN NS M.GTLD-SERVERS.NET.
  com. 172800 IN NS A.GTLD-SERVERS.NET.
  com. 172800 IN NS F.GTLD-SERVERS.NET.
  com. 172800 IN NS H.GTLD-SERVERS.NET.
  com. 172800 IN NS L.GTLD-SERVERS.NET.
  com. 172800 IN NS K.GTLD-SERVERS.NET.
  com. 172800 IN NS D.GTLD-SERVERS.NET.
  com. 172800 IN NS C.GTLD-SERVERS.NET.
  com. 172800 IN NS I.GTLD-SERVERS.NET.
  com. 172800 IN NS J.GTLD-SERVERS.NET.
  com. 172800 IN NS B.GTLD-SERVERS.NET.

  ;; ADDITIONAL SECTION:
  A.GTLD-SERVERS.NET. 172800 IN A 192.5.6.30

  ;; Query time: 192 msec
  ;; SERVER: 192.33.4.12#53(192.33.4.12)
  ;; WHEN: Wed Jul 23 14:49:54 2008
  ;; MSG SIZE rcvd: 511

Notice that it has only returned one GLUE record for .COM? That means you only need to spoof replies for .COM as if they came from one nameserver (192.5.6.30). This should make the attack about 13 times easier :-)

Wed Jul 23 22:13:17 BST 2008 -- Update: Okay.. the above doesn't actually give such a huge advantage (possibly no advantage at all even). The problem is that the recursor will still try to use the other nameservers it doesn't have GLUE for by chasing them down by name (it re-queries the root servers using the NS name it did get to find its IP). It takes a while longer but it will still attempt to use the other nameservers.

I've also found a serious problem with the fragment replacement attack I proposed earlier. Basically the problem is that you can only fragment packets that are larger than 572 bytes, and all such packets you can get a nameserver to generate have the truncated flag set, so even if you were able to replace part of a fragmented answer the recursor would end up retrying the request with TCP anyway. The only way it would work is if the recursor's OS allowed overlapping fragments that overwrite earlier data (you could rewrite the truncated flag and turn it back to non-truncated again), but this is starting to get a bit crazy.

The attack might still hold some use as a DoS though, to prevent answers getting back to the recursor at all. You could fill up the recursor's fragmented packet buffer and then the replies would be dropped, but you could still answer with your fake reply using non-fragmented packets.
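For anyone wondering where the truncated flag mentioned above actually lives: it is a single bit (TC) in the flags word of the 12-byte DNS header, so it travels in the first fragment along with the transaction ID. The Python sketch below decodes that header. The sample bytes are rebuilt by hand from the header line of the dig output above (id 2735, flags qr rd, counts 1/0/13/1) rather than captured from the wire, and the function names are hypothetical.

```python
import struct

QR_BIT = 0x8000  # message is a response
TC_BIT = 0x0200  # message was truncated; the client should retry over TCP
RD_BIT = 0x0100  # recursion desired

def parse_dns_header(message: bytes) -> dict:
    """Decode the fixed 12-byte DNS header at the start of a message."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "response": bool(flags & QR_BIT),
        "truncated": bool(flags & TC_BIT),
        "counts": (qd, an, ns, ar),
    }

def needs_tcp_retry(message: bytes) -> bool:
    """True when a resolver would be expected to repeat the query over TCP."""
    return parse_dns_header(message)["truncated"]

# Header matching the dig output: id 2735, flags qr rd, 1/0/13/1 records.
normal = struct.pack("!HHHHHH", 2735, QR_BIT | RD_BIT, 1, 0, 13, 1)
# The same header with the TC bit set, as an oversized answer would carry.
truncated = struct.pack("!HHHHHH", 2735, QR_BIT | RD_BIT | TC_BIT, 1, 0, 13, 1)

print(parse_dns_header(normal))
print(needs_tcp_retry(normal), needs_tcp_retry(truncated))
```

Since the flags word sits at byte offset 2 of the DNS message, a replaced second fragment cannot touch it, which is why the overlapping-fragment trick would be needed.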
Wed Jul 23 23:16:55 BST 2008 -- Update: Some other exploits have been released for this issue:

http://www.caughq.org/exploits/CAU-EX-2008-0002.txt
http://metasploit.com/dev/trac/browser/framework3/trunk/modules/auxiliary/spoof/dns/baliwicked_host.rb

Thu Jul 24 01:12:31 BST 2008 -- Update: Well.. I managed to get my implementation of the exploit working now too:

http://wari.mckay.com/~rm/sd4.c.txt
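One loose end from the 21 July fragment idea is the claim that two answers can be swapped without changing the UDP checksum. That follows from the checksum being a 16-bit one's complement sum, which does not care about the order of the 16-bit words. The Python sketch below checks this on stand-in byte strings. They are not real DNS wire format (real records use name compression and may have odd lengths), and the swap only preserves the sum when both records have an even length and start on an even offset, which is an assumption of this example.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of `data`, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def internet_checksum(data: bytes) -> int:
    """Checksum as used by UDP: the complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF

# Stand-in records: a label plus a 4-byte address, each 10 bytes (even) long.
record1 = b"name1\x00" + bytes([127, 0, 0, 1])
record2 = b"name2\x00" + bytes([127, 0, 0, 2])
prefix = b"\x0a\xaf\x81\x00"  # stand-in for whatever precedes the records

original = prefix + record1 + record2
swapped = prefix + record2 + record1

print(hex(internet_checksum(original)), hex(internet_checksum(swapped)))
```

The real UDP checksum also covers a pseudo-header with the IP addresses and length, but those are the same for both orderings, so they do not change the comparison.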