The security lab I've been developing is located on my university's campus network. If I want to work from home (which I did this week), I need to VPN onto campus. If I then want to work within the security lab, I need to VPN into the lab. I call this dual VPNing! For both VPN connections I opt out of using the VPN as my default gateway. So essentially I use the campus VPN for a secured connection to my security lab VPN server, then the security lab VPN to access the lab NAT. This leaves my innocent laptop with quite a large routing table, but she's a trooper and doesn't complain.
As I was configuring web access for the lab's NAT and renaming the AD domain (not as hard as it sounds, see here and then here), I noticed a problem launching the VMware MKS display console. This was the very problem that prompted the need for a VPN to the security lab in the first place! Now it was back, and the only change I had made was driving 120 miles from campus. My first assumption was that the routing table was wrong on the ESXi hosts, and that data was flowing to them but not back to my VPN client. For some unknown reason my Windows 7 laptop refuses to give Wireshark access to the VPN tunneling adapter (so no troubleshooting from that end). Motivated by frustration, after a few reinstalls of winpcap, I configured the two VPNs within my Ubuntu VM (yeah, one was L2TP and the other PPTP, fun). I confirmed, both from the Windows 2008 VPN server and from the Ubuntu VPN client, that the ESXi host was responding. It was actually the VPN client that stopped ACKing data soon after the SSL handshake. Interesting...
While not on good terms with my Windows 7, I blamed her again and tried launching the display console from the web interface and from a Windows XP VM (the vCenter client runs on Windows). Both exhibited the same erroneous behavior. Thankfully, I had finished my TCP/IP final exam not one week prior, so my networking knowledge had me collecting and examining the offending packets (in this case TCP segments). All had one thing in common: 1360 bytes of TCP data. Bingo! Looking back at the TCP handshake, I saw that the MSS advertised by the VPN client was 1360, and the VPN server's was also 1360. After running some tests I saw that the campus VPN was reserving 100 bytes, which explained the MSS of 1360 from a 1514-byte Ethernet frame: 14 for the Ethernet header, 20 for the IP header, 20 for the TCP header, and 100 for the VPN encapsulation. That's fine for communication within the campus, but as soon as data moves into the second VPN the MSS should decrease, since there is a secondary VPN reservation.
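That header arithmetic can be sketched in a few lines (the sizes are the ones from the story above; the `mss` helper is just my illustration, not anything the VPN stack actually computes this way):

```python
# MSS left over after link/IP/TCP headers and each VPN's encapsulation
# reservation. Sizes are the ones worked out in the post.
ETHERNET_FRAME = 1514  # maximum frame on the wire, 14-byte Ethernet header included
ETHERNET_HDR = 14
IP_HDR = 20
TCP_HDR = 20

def mss(*vpn_overheads):
    """Hypothetical helper: MSS remaining after headers and VPN reservations."""
    return ETHERNET_FRAME - ETHERNET_HDR - IP_HDR - TCP_HDR - sum(vpn_overheads)

print(mss(100))      # campus VPN alone: 1360, the MSS both ends advertised
print(mss(100, 25))  # campus VPN + PPTP lab VPN: 1335, the MSS that actually fits
```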
I used ping with the do-not-fragment flag and variable data sizes to find a working MSS into the secondary VPN. I had known the campus blocked ICMP traffic, but I did not realize that PMTU discovery would fail so horribly. (Maybe I should have paid more attention in class.) Fortunately ping helped me out: I would receive no feedback if the size was greater than the PMTU, and an echo reply if it was less than or equal. Ping worked because I was probing from the VPN client toward the secondary VPN (the security lab network did not block ICMP), whereas PMTU discovery relied on ICMP messages traveling from the campus network back to the security lab VPN server (and those were blocked).
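The probe looks something like this (the host address is a placeholder, and the 1347-byte payload is worked backward from the MSS of 1335 the probing eventually settled on):

```shell
# Manual PMTU probe with don't-fragment set; shrink the payload until a reply
# comes back. Commands for reference (host IP is hypothetical):
#   Linux:   ping -c 1 -M do -s <payload> 10.0.0.10
#   Windows: ping -f -l <payload> 10.0.0.10
# The largest payload that answers pins down the PMTU and the usable MSS:
payload=1347
pmtu=$((payload + 8 + 20))  # + ICMP header + IP header
mss=$((pmtu - 20 - 20))     # - IP header - TCP header
echo "PMTU=$pmtu MSS=$mss"  # prints PMTU=1375 MSS=1335
```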
Remedying the situation was fairly easy; it turned out the secondary VPN (PPTP) needed an additional 25 bytes. So the Windows 2008 VPN (RRAS) server was configured for a tunnel MTU of 1375 (1514 minus 14 for Ethernet, 100 for VPN1, and 25 for VPN2), which brings the MSS down to the 1335 discovered via ping. This was done in the registry by adding:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NdisWan\Parameters\Protocols\0\PPPProtocolType = DWORD(0x21)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NdisWan\Parameters\Protocols\0\ProtocolType = DWORD(0x800)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NdisWan\Parameters\Protocols\0\TunnelMTU = DWORD(1375)
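The same three values in .reg form, for convenience (my transcription, not an export from the server; note that DWORDs are hexadecimal here, so 1375 becomes 0x55f):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NdisWan\Parameters\Protocols\0]
"PPPProtocolType"=dword:00000021
"ProtocolType"=dword:00000800
"TunnelMTU"=dword:0000055f
```

The change typically needs a restart of the RRAS service (or a reboot) before NdisWan picks it up.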
Dual VPNing success!
Now some notes about ICMP traffic and securing networks. Yes, blocking ICMP echo traffic will help prevent smurf attacks. However, I can just as well run a SYN flood attack; does that mean we should block TCP SYN? Yes, blocking port-unreachable, host-unreachable, and network-unreachable messages will help hinder reconnaissance. There are ways around this, but there seems to be little harm in blocking those types of messages. (For port-scan blocking it might be best to leave that decision up to the client; same for echo messages.) Sure, traceroute is fun for evil, but it's also used in introductory networking classes. Having to use a web-based traceroute just to overcome your campus firewall is less fun. :( Fragmentation Required? Self-explanatory, given the story above.
Long story short, it seems like allowing ICMP isn't so bad. Yes, there are attacks based on ICMP messages, including ones abusing the PMTU discovery procedure, source quench, hard errors, echo replies, etc. For the campus network, other security policies are in place which help mitigate these attacks, such as port authentication and packets-per-second rate-limiting. Also, IPv6 depends on ICMP (ICMPv6) messages to operate properly: there is no ARP in IPv6; instead, ICMPv6 messages implement the Neighbor Discovery protocol. Of course dual VPNing is a rare case where PMTU discovery is needed, but when a problem caused by ICMP blockage does occur, it often takes a long time to figure out why. I'd suggest a more granular approach to blocking ICMP, if it must be blocked at all.
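As a sketch of what "granular" might look like on a Linux firewall (the rule set and rate limit are my illustrative assumptions, not the campus policy):

```shell
# Let the ICMP messages that protocols depend on through, rate-limit pings,
# and drop the rest, instead of a blanket ICMP block.
iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT  # PMTU discovery works
iptables -A INPUT -p icmp --icmp-type time-exceeded -j ACCEPT         # traceroute stays usable
iptables -A INPUT -p icmp --icmp-type echo-request \
         -m limit --limit 5/second -j ACCEPT                          # pings, smurf-resistant
iptables -A INPUT -p icmp -j DROP
```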
As an aside, VMware's MKS display console works quite well over the dual VPN (and 120 miles away).