Cisco ZBF and ICMP Inspect Drops

I stumbled into a interesting issue the other day with icmp inspect breaking MTR. After cutting over traffic to an Cisco ASR1001HX running IOX-XE Zone Bases Firewall, mtr running from behind the ZBF was showing 99.9% packet-loss for all the hops between the ZBF and the Last.

There was no real packet loss and the last hop was always showing 0% loss as expected, however Monitoring Systems that relies on MTR to monitor hop-by-hop went completely crazy, thinking that we were having real packet loss.

Screen Shot 2018-02-12 at 12.07.44

We noticed messages similar to the following on the Logs:

%IOSXE-6-PLATFORM: R0/0: cpp_cp: QFP:0.0 Thread:122 TS:00002775229457146791 %FW-6-DROP_PKT: Dropping icmp pkt from internal0/0/recycle:0 X.X.X.X:11 => due to ICMP ERR Pkt:exceed burst lmt with ip ident 0
%IOSXE-6-PLATFORM: R0/0: cpp_cp: QFP:0.0 Thread:025 TS:00002773939123637161 %FW-6-DROP_PKT: Dropping icmp pkt from Port-channel1 X.X.X.X:11 => due to ICMP ERR Pkt:exceed burst lmt with ip ident 4427

First thought seeing ICMP ERR Pkt:exceed burst lmt is that it was some sort of icmp rate-limit on the box, so we disabled ip icmp rate-limit unreachable that’s enabled by default and is used to limit the amount of unreachable icmp packets in a X amount of time.

Well, that didn’t work. Paying more attention to the logs we noticed that they were all icmp type 11 code 0, that is ttl-exceeded in transit, ZBF was dropping the icmp 11/0 responses, and because of that the source running MTR would never receive them and would that packet loss was happening.

Solution? Add an ICMP Pass rule on your Policy-Map, making ICMP Stateless for the Firewall.

class-map type inspect match-any CM-UNTRUST-TO-TRUST-ICMP-ALLOW
 match protocol icmp

policy-map type inspect PM-UNTRUST-TO-TRUST
 class type inspect CM-UNTRUST-TO-TRUST-ICMP-ALLOW
 class type inspect CM-UNTRUST-TO-TRUST-ALLOW
 class class-default

Pass means that the ZBF will allow the traffic according to your class-map, and won’t keep any state of the connection, making it stateless for icmp traffic, you can apply it to your Untrust to Trust zone-pair, and MTR should work fine again.

You can and probably should be more specific and match on ttl-exceeded instead of all icmp messages, but this should be a good enough example to illustrate the issue.

I hope this can help someone having the same issue, I honestly couldn’t find much about this with my Google-Fu, and I still haven’t heard from Cisco with an official answer, if its a bug or expected for some reason.. If you know more about it, please write a comment 🙂


Cisco IOS-XE Guest Shell

Cisco IOS-XE now comes with a neat feature called Guest Shellit give us the power of spinning up a Linux Container on the router, giving us many new Network Programmability options, the main one being the option of running custom Python Scripts. With that we can write things like custom Metrics, Integrations, Features and tools not natively available.

Enabling the Guest Shell

I will be using a Cisco CSR1000v on AWS as an example (its easier to spin up and lab), but the feature is also available on ASR and ISR, and the way to enable is is fairly similar.
When you bring the CSR up, you should notice that it has a VirtualPortGroup Interface enabled, this Interface will be responsible for the Interface between the GuestShell Container and the router IOS-XE/External World. On platforms with managements ports, the mgmt interface/vrf would be used for the Interface with the GuestShell.

app-hosting appid guestshell
 vnic gateway1 virtualportgroup 0 guest-interface 0 guest-ipaddress netmask gateway
 resource profile custom cpu 1500 memory 512

guestshell enable VirtualPortGroup 0 guest-ip

That’s it, we can run show app-hosting list to confirm that GuestShell is enabled:

#show app-hosting list
App id                           State
guestshell                       RUNNING

To gain shell access, run  guestshell run bash:

#guestshell run bash
[guestshell@guestshell ~]$ uname -a
Linux guestshell 4.4.51 #1 SMP Sun Apr 23 01:42:33 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[guestshell@guestshell ~]$

We are now able to start playing with Python. Your main Python package will be the cli package, this let you run IOS commands from the shell:

[guestshell@guestshell ~]$ python
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from cli import cli
>>> from cli import configure
>>> configure('hostname IT-WORKS')
[ConfigResult(success=True, command='hostname IT-WORKS', line=1, output='', notes=None)]
>>> cli('show running-config | in hostname')
'\nhostname IT-WORKS\n'

It also comes with pip installed, so you are able to install pretty much any package that you need to write your code.

I’ve been implementing a bunch of Cisco routers with ZBF lately, its a simple and nice Firewall solution for those that don’t need all that fancy features that vendors as CheckPoint or Palo Alto offer you, and one thing that happened a couple of times is ACL out o Sync between the devices, before we wrote a NAPALM script to update both devices at same time (more of this on a future post), I would sometimes update the ACL in one Router and forget to update the pair device, I would then test a failover and notice that traffic was dropped on the secondary router. (Oops!)

Verify ACL Sync with IOS-XE GuestShell and Python

This gave the Idea to write this example using GuestShell and Python for this blog post, the following Script runs after X specified seconds, compare the Local ACLs with the Remote ACLs, and generate a Syslog Message in case they are out of sync. Simple and helpful. Here’s the code, which is also available on my github:

Copy this to a .py on the GuestShell, make it an executable with chmod +x, and execute it! If the ACLs on both devices are not the same, you will see this log msg on the Router:

*Feb  8 11:57:16.924: %SYS-4-USERLOG_WARNING: Message from tty3(user id: ): ATTENTION. ACLs are not in Sync, local ACL has more entries than Peer.

If you have a syslog Collector that Integrates with Slack or some other Chat Platform that your team uses, it should be easy to see when the ACLs are not in-sync, and now anyone can go to their favourite Version Control system where they manage their ACLs and re-apply the ACL to the device with missing entries. I am sure you store you ACLs in a repo, right? 🙂

With imagination and some available time this Script could be much improved, and many other cool tools/features could be written to level-up your Cisco IO-XE Routers!

I hope this helps someone, thanks for reading.