IP SLA with Syslog Alerting

Most network outages are not detected by the router that experiences them — they are detected by an end user who calls the help desk, or by a NOC engineer refreshing a dashboard. IP SLA (IP Service Level Agreements) changes this by turning the router itself into an active probe: it continuously sends synthetic test traffic to a target, measures reachability and performance, and declares the target reachable or unreachable based on those measurements. Object Tracking watches the IP SLA result and translates probe outcomes into a binary state — Up or Down. EEM applets subscribe to that state and fire the moment a transition occurs, generating syslog alerts, capturing diagnostics, and sending email notifications — all before any human has opened a dashboard.

This lab assembles that complete monitoring pipeline on a single router. It covers four progressively more capable probe types: ICMP echo for basic reachability, HTTP for application-layer connectivity, UDP jitter for voice-quality monitoring, and DNS for resolver availability. Each probe is wired to a tracking object and an EEM applet that generates an alert on failure and a recovery notification when the target returns. The result is a lightweight, self-contained WAN monitoring system that requires no external NMS, no SNMP collector, and no subscription.

Before starting, ensure you understand IP SLA probe types and tracking at IP SLA Configuration & Tracking. For the EEM applet architecture used in this lab, see EEM — Embedded Event Manager Scripting. For forwarding alerts to a central syslog server, see Syslog Configuration and Syslog Server Configuration. For understanding syslog severity levels referenced in the EEM actions, see Syslog Severity Levels.

1. IP SLA + Track + EEM — The Full Pipeline

How the Three Components Work Together

  ┌─────────────────────────────────────────────────────────────┐
  │  LAYER 1: IP SLA PROBE                                      │
  │  Sends synthetic test traffic to the target on a schedule   │
  │  Records: RTT, packet loss, jitter, return code             │
  │  Declares: operation success or failure                     │
  │                                                             │
  │  ip sla 1                                                   │
  │   icmp-echo 203.0.113.1 source-interface GigabitEthernet0/0 │
  │   frequency 30                                              │
  │  ip sla schedule 1 life forever start-time now             │
  └───────────────────────┬─────────────────────────────────────┘
                          │ passes result (success/fail)
                          ▼
  ┌─────────────────────────────────────────────────────────────┐
  │  LAYER 2: OBJECT TRACKING                                   │
  │  Watches the IP SLA operation result                        │
  │  Maintains binary state: Up (probe succeeding) or           │
  │                          Down (probe failing)               │
  │  Applies reachability or threshold criteria                 │
  │                                                             │
  │  track 1 ip sla 1 reachability                             │
  └───────────────────────┬─────────────────────────────────────┘
                          │ notifies subscribers on state change
                          ▼
  ┌─────────────────────────────────────────────────────────────┐
  │  LAYER 3: EEM APPLET — event track 1 state down            │
  │  Fires the instant track 1 transitions to Down              │
  │  Actions:                                                   │
  │   1. Send syslog CRITICAL alert to log buffer + server      │
  │   2. Capture show ip sla statistics to flash                │
  │   3. Capture show ip route to flash                         │
  │   4. Send email to NOC team                                 │
  │  Companion applet on state up:                              │
  │   1. Send syslog NOTICE — target recovered                  │
  └─────────────────────────────────────────────────────────────┘
                          │
                          ▼
              Syslog server / NOC inbox
              (alert arrives within seconds
               of probe declaring target down)
  
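
The three layers reduce to a simple pattern in ordinary code: a probe emits pass/fail results, a tracker collapses them into a binary state, and a handler fires only on state transitions. A minimal Python sketch of that pipeline (illustrative only — this is not router code, and all names are invented):

```python
# Minimal model of the probe -> track -> applet pipeline (illustrative only).

def run_pipeline(probe_results):
    """probe_results: iterable of booleans (True = probe succeeded).
    Returns the list of alerts an EEM-style handler would emit."""
    alerts = []
    state = "Up"                      # tracking object starts Up
    for ok in probe_results:
        new_state = "Up" if ok else "Down"
        if new_state != state:        # applet fires only on transitions
            state = new_state
            if state == "Down":
                alerts.append("CRITICAL: target UNREACHABLE")
            else:
                alerts.append("NOTICE: target recovered")
    return alerts

# Three consecutive failures generate ONE alert, not three:
print(run_pipeline([True, True, False, False, False, True]))
# -> ['CRITICAL: target UNREACHABLE', 'NOTICE: target recovered']
```

This is why the tracking layer exists at all: the probe produces a result every cycle, but the NOC only wants to hear about changes.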

IP SLA Probe Types Covered in this Lab

  ICMP Echo (icmp-echo)
    Measures:   round-trip time (RTT) and reachability to any IP-addressable target
    Responder:  not required — the target only needs to answer ICMP (ping)
    Use case:   WAN gateway reachability, ISP monitoring, basic link health

  UDP Jitter (udp-jitter)
    Measures:   RTT, jitter (delay variation), packet loss, and out-of-order delivery — full VoIP quality metrics
    Responder:  required — the Cisco IP SLA Responder must be enabled on the target router
    Use case:   voice quality monitoring, MPLS SLA verification, QoS validation

  HTTP (http get)
    Measures:   HTTP GET response time and HTTP return code — application-layer reachability
    Responder:  not required — any web server
    Use case:   web application availability, DNS + HTTP end-to-end testing

  DNS (dns)
    Measures:   DNS resolution time and success/failure for a specific hostname
    Responder:  not required — any standard DNS server
    Use case:   DNS resolver availability monitoring, split-horizon DNS validation

  TCP Connect (tcp-connect)
    Measures:   TCP three-way handshake completion time to a specific IP and port
    Responder:  not required — any TCP server (port 80, 443, 22, etc.)
    Use case:   application port availability — verify SSH, HTTPS, or custom services are accepting connections
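
A tcp-connect probe is easy to reproduce from any host for a quick spot check: time a TCP three-way handshake to a port and treat a timeout or refusal as failure. A Python stdlib sketch (the target address is an example, not from this lab):

```python
import socket
import time

def tcp_connect_probe(host, port, timeout_s=5.0):
    """Time a TCP handshake, mimicking an IP SLA tcp-connect operation.
    Returns (ok, rtt_ms); rtt_ms is None on failure."""
    start = time.monotonic()
    try:
        # create_connection completes the three-way handshake or raises
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, (time.monotonic() - start) * 1000.0
    except OSError:           # covers timeout, refused, unreachable
        return False, None

# 192.0.2.0/24 is TEST-NET-1 (blackholed): expect (False, None)
print(tcp_connect_probe("192.0.2.1", 80, timeout_s=1.0))
```

Like the router probe, this verifies the service is accepting connections on the port — strictly more information than a successful ping.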

Tracking Object Types

  Reachability (track N ip sla N reachability)
    Goes Down when: the IP SLA operation returns a failure — timeout, unreachable, or another non-OK return code
    Use case:       binary up/down monitoring — did the probe succeed or fail?

  State (track N ip sla N state)
    Goes Down when: the operation's return code is anything other than OK — typically OverThreshold, when the measured value exceeds the configured threshold
    Use case:       threshold-based monitoring — did RTT exceed the configured threshold?

2. Lab Topology & Monitoring Plan

NetsTuts_R1 is a dual-WAN edge router with a primary ISP (Gi0/0) and backup ISP (Gi0/1). Five monitoring probes will be deployed: a WAN gateway ICMP probe on each ISP, a UDP jitter probe to the branch office, an HTTP probe to a critical internal web server, and a DNS probe against the internal resolver. Each probe is wired to a dedicated tracking object; the four outage-critical targets also get a pair of EEM applets (down alert + up recovery).

  ┌─────────────────────────────────────────────────────────────────┐
  │                       NetsTuts_R1                               │
  │  Gi0/0 ── 203.0.113.2 ── ISP-A Gateway: 203.0.113.1           │
  │  Gi0/1 ── 198.51.100.2 ── ISP-B Gateway: 198.51.100.1         │
  │  Gi0/2 ── 10.0.0.1/24  ── LAN                                  │
  └─────────────────────────────────────────────────────────────────┘

  IP SLA Monitoring Plan:
  ┌──────┬──────────────┬────────────────────────────────┬─────────┐
  │ SLA# │ Probe Type   │ Target                         │ Track # │
  ├──────┼──────────────┼────────────────────────────────┼─────────┤
  │  1   │ ICMP echo    │ ISP-A Gateway  203.0.113.1     │    1    │
  │  2   │ ICMP echo    │ ISP-B Gateway  198.51.100.1    │    2    │
  │  3   │ UDP jitter   │ Branch Router  10.10.0.1       │    3    │
  │  4   │ HTTP get     │ Web Server     10.0.0.50       │    4    │
  │  5   │ DNS          │ DNS Server     10.0.0.53       │    5    │
  └──────┴──────────────┴────────────────────────────────┴─────────┘

  EEM Applet Pairs (down alert + up recovery):
  ┌─────────────────────────────┬────────────────────────────────┐
  │ ISPA-GW-DOWN / ISPA-GW-UP   │ Track 1 state transitions      │
  │ ISPB-GW-DOWN / ISPB-GW-UP   │ Track 2 state transitions      │
  │ BRANCH-JITTER-DOWN / -UP    │ Track 3 state transitions      │
  │ WEBSERVER-DOWN / -UP        │ Track 4 state transitions      │
  └─────────────────────────────┴────────────────────────────────┘
  

3. Step 1 — EEM Prerequisites and Environment Variables

Configure the global EEM prerequisites before writing any applets. These settings are shared across all four monitoring pairs.

NetsTuts_R1>en
NetsTuts_R1#conf t

! ══════════════════════════════════════════════════════════
! EEM CLI execution user — required for action cli command
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#username eem-user privilege 15 secret EEM$ecret99
NetsTuts_R1(config)#event manager session cli username eem-user

! ══════════════════════════════════════════════════════════
! EEM environment variables — centralised parameters
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#event manager environment _hostname     NetsTuts_R1
NetsTuts_R1(config)#event manager environment _email_server 10.0.0.25
NetsTuts_R1(config)#event manager environment _email_from   [email protected]
NetsTuts_R1(config)#event manager environment _email_to     [email protected]
NetsTuts_R1(config)#event manager environment _log_dir      flash:/sla-logs/

! ── Exit config mode, then create the log directory on flash
NetsTuts_R1(config)#end
NetsTuts_R1#mkdir flash:/sla-logs
Create directory filename [sla-logs]? [Enter]
Created dir flash:/sla-logs

! ── Verify NTP is synchronised — timestamps must be accurate
! ── for syslog correlation during outage post-mortems ─────
NetsTuts_R1#show ntp status | include Clock
Clock is synchronized, stratum 2, reference is 10.0.0.200
  
Accurate timestamps in syslog are essential for correlating IP SLA alerts with other events in the network — a syslog alert with the wrong time is worse than no alert because it actively misleads the investigation. Confirm NTP is synchronised with show ntp status before deploying probes. For NTP configuration, see NTP Synchronisation. For the full EEM prerequisites explanation, see EEM — Embedded Event Manager Scripting.

4. Step 2 — ICMP Echo Probes for WAN Gateway Monitoring

ICMP echo probes send periodic pings from a specific source interface to the ISP gateway. Using source-interface ensures the probe tests the actual WAN path and is sourced from the correct interface — not just the best-path reachability from the router's routing table.

! ══════════════════════════════════════════════════════════
! IP SLA 1 — ISP-A gateway reachability (primary WAN)
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#ip sla 1
NetsTuts_R1(config-ip-sla)# icmp-echo 203.0.113.1 source-interface GigabitEthernet0/0
NetsTuts_R1(config-ip-sla-echo)#  frequency 30
NetsTuts_R1(config-ip-sla-echo)#  timeout 5000
NetsTuts_R1(config-ip-sla-echo)#  threshold 2000
NetsTuts_R1(config-ip-sla-echo)#  tag ISP-A-GW-MONITOR
NetsTuts_R1(config-ip-sla-echo)#exit
NetsTuts_R1(config)#ip sla schedule 1 life forever start-time now

! ══════════════════════════════════════════════════════════
! IP SLA 2 — ISP-B gateway reachability (backup WAN)
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#ip sla 2
NetsTuts_R1(config-ip-sla)# icmp-echo 198.51.100.1 source-interface GigabitEthernet0/1
NetsTuts_R1(config-ip-sla-echo)#  frequency 30
NetsTuts_R1(config-ip-sla-echo)#  timeout 5000
NetsTuts_R1(config-ip-sla-echo)#  threshold 2000
NetsTuts_R1(config-ip-sla-echo)#  tag ISP-B-GW-MONITOR
NetsTuts_R1(config-ip-sla-echo)#exit
NetsTuts_R1(config)#ip sla schedule 2 life forever start-time now
  
The key parameters for a WAN gateway probe: frequency 30 sends one probe every 30 seconds — a good balance between detection speed and ICMP overhead on the WAN link. timeout 5000 declares the probe a failure if no response is received within 5,000 milliseconds (5 seconds). threshold 2000 marks the RTT as over-threshold when it exceeds 2,000 ms — this feeds the track N ip sla N state tracking type, which alerts on latency degradation even before complete packet loss occurs. source-interface is critical — without it, IOS sources the probe from the best available exit interface. If the WAN link fails but a LAN route to the gateway still exists (rare but possible), the probe would succeed via the LAN — giving a false healthy result for the WAN interface specifically.
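
The relationship between timeout and threshold is worth internalising: timeout decides pass/fail, threshold only decides over-threshold. A sketch of that classification logic (illustrative — a simplified model, not IOS internals):

```python
def classify_probe(rtt_ms, timeout_ms=5000, threshold_ms=2000):
    """Classify one probe result the way IP SLA return codes work:
    no reply within the timeout -> failure; a reply slower than the
    threshold -> success, but flagged over-threshold; otherwise OK.
    rtt_ms=None means no reply arrived at all."""
    if rtt_ms is None or rtt_ms > timeout_ms:
        return "Timeout"        # reachability tracking goes Down
    if rtt_ms > threshold_ms:
        return "OverThreshold"  # state tracking goes Down; reachability stays Up
    return "OK"

assert classify_probe(8) == "OK"             # healthy WAN
assert classify_probe(2500) == "OverThreshold"  # slow but answering
assert classify_probe(None) == "Timeout"     # no reply at all
```

This is the split that Step 6 exploits: one probe tracked on reachability for outages, a second tracked on state for degradation.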

Tracking Objects for ICMP Probes

! ── Track 1: reachability of ISP-A gateway ────────────────
NetsTuts_R1(config)#track 1 ip sla 1 reachability
NetsTuts_R1(config-track)# delay down 10 up 10
NetsTuts_R1(config-track)#exit

! ── Track 2: reachability of ISP-B gateway ────────────────
NetsTuts_R1(config)#track 2 ip sla 2 reachability
NetsTuts_R1(config-track)# delay down 10 up 10
NetsTuts_R1(config-track)#exit
  
The delay down 10 up 10 setting introduces a 10-second hold-down in both directions. Without it, a single dropped probe (possible during a momentary burst of congestion or a brief ICMP rate-limit on the ISP router) would immediately flip the tracking object to Down and fire the EEM alert. With delay down 10, the tracking object transitions to Down only if the failure condition still holds 10 seconds after the probe first reports it — enough to absorb a one-off timeout without materially delaying the alert for a real outage. The delay up 10 prevents a flapping link from generating rapid successive Down/Up alert pairs before it has truly stabilised. For detailed IP SLA and tracking configuration, see IP SLA Configuration & Tracking.
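
The damping effect of delay down/up can be simulated: a blip shorter than the hold-down never surfaces as a state change, while a sustained failure is reported exactly once. A Python sketch of the logic (timings and sampling are illustrative, not IOS internals):

```python
def damped_transitions(samples, delay_down_s, delay_up_s, period_s=1):
    """samples: probe pass/fail per period_s seconds, starting Up.
    A raw state change is reported only after persisting for the
    configured delay -- mirroring 'delay down X up Y' on a track."""
    reported, pending, pending_for = "Up", None, 0
    events = []
    for t, ok in enumerate(samples):
        raw = "Up" if ok else "Down"
        if raw == reported:                 # condition cleared: cancel hold-down
            pending, pending_for = None, 0
            continue
        if raw != pending:                  # new divergence: start the timer
            pending, pending_for = raw, 0
        pending_for += period_s
        delay = delay_down_s if raw == "Down" else delay_up_s
        if pending_for >= delay:            # held long enough: report it
            reported = raw
            events.append((t * period_s, raw))
            pending, pending_for = None, 0
    return events

# A 3-second blip is absorbed entirely:
blip = [True] * 5 + [False] * 3 + [True] * 20
assert damped_transitions(blip, delay_down_s=10, delay_up_s=10) == []

# A sustained outage yields one Down and one Up event:
outage = [True] * 5 + [False] * 30 + [True] * 30
print(damped_transitions(outage, delay_down_s=10, delay_up_s=10))
```

The trade-off is explicit in the model: a larger delay suppresses more noise but adds exactly that many seconds to detection time.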

5. Step 3 — UDP Jitter Probe for Voice Quality Monitoring

UDP jitter probes measure the metrics that matter for voice and video quality: jitter (delay variation), packet loss, and out-of-order delivery. Unlike ICMP echo, UDP jitter requires a Cisco IP SLA Responder on the target device. The responder timestamps probe packets on receipt and transmit, removing its own processing delay from the measurement; with both routers' clocks NTP-synchronised, this also yields meaningful one-way delay figures in each direction.

Configure the Responder on the Branch Router

! ── On Branch_Router — enable IP SLA Responder ────────────
Branch_Router>en
Branch_Router#conf t
Branch_Router(config)#ip sla responder
! ── No per-port binding needed: "ip sla responder" alone is
! ── sufficient for udp-jitter, because the control protocol
! ── negotiates the test port with the probe at run time ────
Branch_Router(config)#end
Branch_Router#wr
  

Configure the UDP Jitter Probe on NetsTuts_R1

! ══════════════════════════════════════════════════════════
! IP SLA 3 — UDP jitter to branch (voice quality)
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#ip sla 3
NetsTuts_R1(config-ip-sla)# udp-jitter 10.10.0.1 5000 source-ip 10.0.0.1 \
   source-port 5001 num-packets 20 interval 20
NetsTuts_R1(config-ip-sla-jitter)#  frequency 60
NetsTuts_R1(config-ip-sla-jitter)#  timeout 5000
NetsTuts_R1(config-ip-sla-jitter)#  threshold 100
NetsTuts_R1(config-ip-sla-jitter)#  tag BRANCH-VOIP-MONITOR
NetsTuts_R1(config-ip-sla-jitter)#exit
NetsTuts_R1(config)#ip sla schedule 3 life forever start-time now

! ── Track 3: state of jitter probe ───────────────────────
! ── Uses "state" not "reachability" — detects threshold
! ── violations even without complete packet loss ──────────
NetsTuts_R1(config)#track 3 ip sla 3 state
NetsTuts_R1(config-track)# delay down 15 up 30
NetsTuts_R1(config-track)#exit
  
The UDP jitter probe sends num-packets 20 test packets spaced interval 20 milliseconds apart — simulating a short burst of RTP voice packets. threshold 100 marks the operation over-threshold when the average RTT exceeds 100 ms; the ITU-T G.114 recommendation for one-way voice delay is 150 ms, so a 100 ms round-trip threshold provides early warning before voice quality audibly degrades. The probe also calculates a MOS (Mean Opinion Score), visible in show ip sla statistics details — a MOS below roughly 3.5 is generally considered unacceptable for business VoIP, and per-metric reactions (MOS, jitter, packet loss) can additionally be configured with ip sla reaction-configuration. For QoS configuration that protects voice traffic on the WAN link, see QoS Overview. delay up 30 is longer than delay down 15 because voice quality must hold steady for 30 seconds before recovery is declared — a brief improvement followed by another degradation would otherwise generate rapid Down/Up alert pairs.
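
Jitter in this context means inter-packet delay variation: the difference between consecutive one-way delays, so 20 packets yield 19 jitter samples. A Python sketch of the computation on an invented delay series (numbers are made up for illustration):

```python
def jitter_stats(one_way_delays_ms):
    """Per-direction jitter as IP SLA reports it: the absolute
    difference between consecutive packet delays
    (N packets -> N-1 jitter samples)."""
    samples = [abs(b - a)
               for a, b in zip(one_way_delays_ms, one_way_delays_ms[1:])]
    return {
        "samples": len(samples),
        "min": min(samples),
        "avg": round(sum(samples) / len(samples), 2),
        "max": max(samples),
    }

# 20 packets sent 20 ms apart, one-way delays wobbling around 11 ms:
delays = [11, 11, 12, 10, 11, 11, 13, 11, 11, 12,
          11, 11, 11, 14, 11, 11, 11, 12, 11, 11]
print(jitter_stats(delays))
```

A stable link shows a low average with the occasional small spike, exactly like the Min/Avg/Max jitter lines in the verification output later in this lab.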

6. Step 4 — HTTP and DNS Application Probes

ICMP and UDP probes test Layer 3/4 connectivity. HTTP and DNS probes test the application layer — a server can be pingable while its web service or DNS resolver is down. These probes catch application failures that ICMP monitoring misses entirely.

! ══════════════════════════════════════════════════════════
! IP SLA 4 — HTTP GET to internal web server
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#ip sla 4
NetsTuts_R1(config-ip-sla)# http get http://10.0.0.50/health source-ip 10.0.0.1
NetsTuts_R1(config-ip-sla-http)#  frequency 60
NetsTuts_R1(config-ip-sla-http)#  timeout 10000
NetsTuts_R1(config-ip-sla-http)#  threshold 5000
NetsTuts_R1(config-ip-sla-http)#  tag WEBSERVER-HTTP-MONITOR
NetsTuts_R1(config-ip-sla-http)#exit
NetsTuts_R1(config)#ip sla schedule 4 life forever start-time now

! ── Track 4: reachability of HTTP probe ───────────────────
NetsTuts_R1(config)#track 4 ip sla 4 reachability
NetsTuts_R1(config-track)# delay down 15 up 15
NetsTuts_R1(config-track)#exit

! ══════════════════════════════════════════════════════════
! IP SLA 5 — DNS resolution test
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#ip sla 5
NetsTuts_R1(config-ip-sla)# dns netstuts.com name-server 10.0.0.53 \
   source-ip 10.0.0.1
NetsTuts_R1(config-ip-sla-dns)#  frequency 60
NetsTuts_R1(config-ip-sla-dns)#  timeout 5000
NetsTuts_R1(config-ip-sla-dns)#  tag DNS-RESOLVER-MONITOR
NetsTuts_R1(config-ip-sla-dns)#exit
NetsTuts_R1(config)#ip sla schedule 5 life forever start-time now

! ── Track 5: reachability of DNS probe ───────────────────
NetsTuts_R1(config)#track 5 ip sla 5 reachability
NetsTuts_R1(config-track)# delay down 15 up 15
NetsTuts_R1(config-track)#exit
  
The HTTP probe fetches /health — a lightweight status endpoint that returns HTTP 200 if the application is running. A full page fetch would work but wastes bandwidth on every probe cycle. If the web server is responding but the application is down (returning HTTP 500), the IP SLA HTTP probe detects this because it checks for a successful HTTP response code — not just TCP connectivity. The DNS probe resolves netstuts.com against the specific internal DNS server 10.0.0.53 — it will fail if the DNS server is unreachable or if the resolver cannot resolve the name, but will succeed even if external DNS is down (as long as the internal resolver is healthy). Track 5 is left without a dedicated applet pair in this lab for brevity — a DNS-RESOLVER-DOWN / -UP pair would follow exactly the same pattern as the web server applets in Step 5. For static routing configuration that uses tracking objects for WAN failover, see Static Routing Configuration.
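
The same application-layer check can be reproduced from any monitoring host: fetch the endpoint and require a 2xx status, not merely a completed TCP connection. A Python stdlib sketch (the commented-out URL is this lab's example address):

```python
import time
import urllib.error
import urllib.request

def http_health_probe(url, timeout_s=10.0):
    """Return (ok, status, elapsed_ms). ok requires an HTTP 2xx --
    a reachable server answering 500 still counts as a failure,
    matching the HTTP probe's return-code check."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            elapsed = (time.monotonic() - start) * 1000
            return 200 <= resp.status < 300, resp.status, elapsed
    except urllib.error.HTTPError as e:        # server answered, but not 2xx
        return False, e.code, (time.monotonic() - start) * 1000
    except (urllib.error.URLError, OSError):   # no TCP/HTTP answer at all
        return False, None, (time.monotonic() - start) * 1000

# ok, status, ms = http_health_probe("http://10.0.0.50/health")
```

The three outcomes map directly onto troubleshooting tiers: 2xx (healthy), non-2xx (server up, application broken), no answer (server or network down).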

7. Step 5 — EEM Applets: Down Alert and Up Recovery Pairs

Each monitoring target needs two applets — one that fires when the tracking object goes Down (alert) and one that fires when it returns to Up (recovery notification). Without the recovery applet, the NOC team has no automated confirmation that an outage has ended and must manually verify resolution.

ISP-A Gateway — Down and Up Applets

! ══════════════════════════════════════════════════════════
! ISPA-GW-DOWN — fires when track 1 transitions to Down
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#event manager applet ISPA-GW-DOWN
NetsTuts_R1(config-applet)# description "Alert: ISP-A gateway unreachable"
NetsTuts_R1(config-applet)# event track 1 state down maxrun 90

! ── ACTION 1: Critical syslog alert ──────────────────────
NetsTuts_R1(config-applet)# action 1.0 syslog priority critical \
   msg "*** OUTAGE *** ISP-A gateway 203.0.113.1 UNREACHABLE on $_hostname — SLA probe failing"

! ── ACTION 2: Capture SLA statistics at moment of failure ─
NetsTuts_R1(config-applet)# action 2.0 cli command "enable"
NetsTuts_R1(config-applet)# action 2.1 cli command \
   "show ip sla statistics 1 | redirect flash:/sla-logs/ispa-failure.txt"

! ── ACTION 3: Capture interface state ────────────────────
NetsTuts_R1(config-applet)# action 3.0 cli command \
   "show interfaces GigabitEthernet0/0 | redirect flash:/sla-logs/ispa-intf.txt"

! ── ACTION 4: Capture routing table — confirm failover ────
NetsTuts_R1(config-applet)# action 4.0 cli command \
   "show ip route | redirect flash:/sla-logs/ispa-route.txt"

! ── ACTION 5: Email NOC team ──────────────────────────────
NetsTuts_R1(config-applet)# action 5.0 mail server "$_email_server" \
   to "$_email_to" \
   from "$_email_from" \
   subject "*** OUTAGE: ISP-A Gateway DOWN on $_hostname ***" \
   body "ALERT: IP SLA probe 1 reports ISP-A gateway 203.0.113.1 \
   is UNREACHABLE from $_hostname. \
   Diagnostics saved to flash:/sla-logs/. \
   Verify routing failover to ISP-B is active. \
   Check show ip route on the router."

NetsTuts_R1(config-applet)#exit

! ══════════════════════════════════════════════════════════
! ISPA-GW-UP — fires when track 1 transitions back to Up
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#event manager applet ISPA-GW-UP
NetsTuts_R1(config-applet)# description "Recovery: ISP-A gateway reachable again"
NetsTuts_R1(config-applet)# event track 1 state up maxrun 60

! ── ACTION 1: Informational syslog — recovery ─────────────
NetsTuts_R1(config-applet)# action 1.0 syslog priority notice \
   msg "*** RECOVERY *** ISP-A gateway 203.0.113.1 REACHABLE again on $_hostname"

! ── ACTION 2: Capture SLA statistics after recovery ───────
NetsTuts_R1(config-applet)# action 2.0 cli command "enable"
NetsTuts_R1(config-applet)# action 2.1 cli command \
   "show ip sla statistics 1 | redirect flash:/sla-logs/ispa-recovery.txt"

! ── ACTION 3: Email recovery notification ─────────────────
NetsTuts_R1(config-applet)# action 3.0 mail server "$_email_server" \
   to "$_email_to" \
   from "$_email_from" \
   subject "RECOVERY: ISP-A Gateway restored on $_hostname" \
   body "RECOVERY: IP SLA probe 1 reports ISP-A gateway 203.0.113.1 \
   is now REACHABLE from $_hostname. \
   Verify primary routing has been restored. \
   Check routing table to confirm ISP-A routes are active."

NetsTuts_R1(config-applet)#exit
  

ISP-B Gateway — Down and Up Applets

NetsTuts_R1(config)#event manager applet ISPB-GW-DOWN
NetsTuts_R1(config-applet)# description "Alert: ISP-B gateway unreachable"
NetsTuts_R1(config-applet)# event track 2 state down maxrun 90
NetsTuts_R1(config-applet)# action 1.0 syslog priority critical \
   msg "*** OUTAGE *** ISP-B gateway 198.51.100.1 UNREACHABLE on $_hostname"
NetsTuts_R1(config-applet)# action 2.0 cli command "enable"
NetsTuts_R1(config-applet)# action 2.1 cli command \
   "show ip sla statistics 2 | redirect flash:/sla-logs/ispb-failure.txt"
NetsTuts_R1(config-applet)# action 3.0 cli command \
   "show interfaces GigabitEthernet0/1 | redirect flash:/sla-logs/ispb-intf.txt"
NetsTuts_R1(config-applet)# action 4.0 mail server "$_email_server" \
   to "$_email_to" \
   from "$_email_from" \
   subject "OUTAGE: ISP-B Gateway DOWN on $_hostname" \
   body "IP SLA probe 2 reports ISP-B gateway 198.51.100.1 \
   is UNREACHABLE from $_hostname."
NetsTuts_R1(config-applet)#exit

NetsTuts_R1(config)#event manager applet ISPB-GW-UP
NetsTuts_R1(config-applet)# description "Recovery: ISP-B gateway reachable again"
NetsTuts_R1(config-applet)# event track 2 state up maxrun 30
NetsTuts_R1(config-applet)# action 1.0 syslog priority notice \
   msg "*** RECOVERY *** ISP-B gateway 198.51.100.1 REACHABLE again on $_hostname"
NetsTuts_R1(config-applet)# action 2.0 mail server "$_email_server" \
   to "$_email_to" \
   from "$_email_from" \
   subject "RECOVERY: ISP-B Gateway restored on $_hostname" \
   body "IP SLA probe 2 reports ISP-B gateway 198.51.100.1 \
   is REACHABLE from $_hostname."
NetsTuts_R1(config-applet)#exit
  

Branch Jitter — Down and Up Applets

NetsTuts_R1(config)#event manager applet BRANCH-JITTER-DOWN
NetsTuts_R1(config-applet)# description "Alert: Branch voice quality degraded"
NetsTuts_R1(config-applet)# event track 3 state down maxrun 90
NetsTuts_R1(config-applet)# action 1.0 syslog priority critical \
   msg "*** VOIP DEGRADED *** Branch UDP jitter probe failing on $_hostname — check WAN QoS"
NetsTuts_R1(config-applet)# action 2.0 cli command "enable"
NetsTuts_R1(config-applet)# action 2.1 cli command \
   "show ip sla statistics 3 details | redirect flash:/sla-logs/branch-jitter.txt"
NetsTuts_R1(config-applet)# action 3.0 cli command \
   "show policy-map interface GigabitEthernet0/0 | \
   redirect flash:/sla-logs/branch-qos.txt"
NetsTuts_R1(config-applet)# action 4.0 mail server "$_email_server" \
   to "$_email_to" \
   from "$_email_from" \
   subject "VOIP QUALITY ALERT: Branch jitter threshold exceeded on $_hostname" \
   body "UDP jitter probe 3 to branch (10.10.0.1) reports threshold \
   violation on $_hostname. VoIP quality may be degraded. \
   Check WAN QoS policy and interface utilisation."
NetsTuts_R1(config-applet)#exit

NetsTuts_R1(config)#event manager applet BRANCH-JITTER-UP
NetsTuts_R1(config-applet)# description "Recovery: Branch voice quality restored"
NetsTuts_R1(config-applet)# event track 3 state up maxrun 30
NetsTuts_R1(config-applet)# action 1.0 syslog priority notice \
   msg "*** VOIP RESTORED *** Branch jitter probe back in threshold on $_hostname"
NetsTuts_R1(config-applet)# action 2.0 cli command "enable"
NetsTuts_R1(config-applet)# action 2.1 cli command \
   "show ip sla statistics 3 | redirect flash:/sla-logs/branch-jitter-recovery.txt"
NetsTuts_R1(config-applet)#exit
  

Web Server — Down and Up Applets

NetsTuts_R1(config)#event manager applet WEBSERVER-DOWN
NetsTuts_R1(config-applet)# description "Alert: Internal web server HTTP probe failing"
NetsTuts_R1(config-applet)# event track 4 state down maxrun 60
NetsTuts_R1(config-applet)# action 1.0 syslog priority critical \
   msg "*** OUTAGE *** Web server HTTP probe FAILING on $_hostname — http://10.0.0.50"
NetsTuts_R1(config-applet)# action 2.0 cli command "enable"
NetsTuts_R1(config-applet)# action 2.1 cli command \
   "show ip sla statistics 4 | redirect flash:/sla-logs/webserver-failure.txt"
NetsTuts_R1(config-applet)# action 3.0 mail server "$_email_server" \
   to "$_email_to" \
   from "$_email_from" \
   subject "OUTAGE: Web server (10.0.0.50) DOWN on $_hostname" \
   body "IP SLA HTTP probe 4 cannot reach http://10.0.0.50/health. \
   Server may be down or the application is not responding. \
   Escalate to the application team."
NetsTuts_R1(config-applet)#exit

NetsTuts_R1(config)#event manager applet WEBSERVER-UP
NetsTuts_R1(config-applet)# description "Recovery: Web server HTTP responding again"
NetsTuts_R1(config-applet)# event track 4 state up maxrun 30
NetsTuts_R1(config-applet)# action 1.0 syslog priority notice \
   msg "*** RECOVERY *** Web server HTTP probe SUCCEEDING on $_hostname"
NetsTuts_R1(config-applet)# action 2.0 mail server "$_email_server" \
   to "$_email_to" \
   from "$_email_from" \
   subject "RECOVERY: Web server (10.0.0.50) restored on $_hostname" \
   body "IP SLA HTTP probe 4 reports http://10.0.0.50/health is \
   responding successfully. Application appears to be restored."
NetsTuts_R1(config-applet)#exit

NetsTuts_R1(config)#end
NetsTuts_R1#wr
  
The event track N state up recovery applet is as important as the down alert. Without it, the NOC team must either manually poll the router for track state or wait for the next monitoring cycle on their NMS to confirm resolution. An automated recovery notification closes the incident loop: the on-call engineer receives the down alert, works the issue, and receives the recovery alert — no manual verification step needed. For OSPF deployments where the tracking object also controls route injection or redistribution, recovery is especially critical to confirm the primary route has been re-advertised — see OSPF Single-Area Configuration. For HSRP/FHRP integration with tracking, see FHRP — HSRP, VRRP & GLBP and HSRP.
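
On the receiving side, the down/up message pairs make automated incident handling straightforward: a collector opens an incident on an OUTAGE line and closes it on the matching RECOVERY line. A minimal Python sketch (the message formats follow the applets above; the correlation logic itself is illustrative):

```python
import re

# Patterns keyed to the applet messages: "<subject> UNREACHABLE" opens
# an incident, "<subject> REACHABLE" closes it.
OUTAGE = re.compile(r"\*\*\* OUTAGE \*\*\* (?P<what>.+?) UNREACHABLE")
RECOVERY = re.compile(r"\*\*\* RECOVERY \*\*\* (?P<what>.+?) REACHABLE")

def correlate(lines):
    """Return the set of incidents still open after processing the log."""
    open_incidents = set()
    for line in lines:
        if m := OUTAGE.search(line):
            open_incidents.add(m["what"])
        elif m := RECOVERY.search(line):
            open_incidents.discard(m["what"])
    return open_incidents

log = [
    "*** OUTAGE *** ISP-A gateway 203.0.113.1 UNREACHABLE on NetsTuts_R1 — SLA probe failing",
    "*** OUTAGE *** ISP-B gateway 198.51.100.1 UNREACHABLE on NetsTuts_R1",
    "*** RECOVERY *** ISP-A gateway 203.0.113.1 REACHABLE again on NetsTuts_R1",
]
print(correlate(log))   # only the ISP-B incident remains open
```

This only works because the down and up applets use symmetric, machine-parseable message prefixes — a good reason to keep the *** OUTAGE *** / *** RECOVERY *** convention consistent across all applet pairs.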

8. Step 6 — Advanced: RTT Threshold Alerting (Degradation Before Failure)

Reachability probes alert only when a target becomes completely unreachable. RTT threshold alerting goes further — it generates an alert when the link is still up but latency has degraded to a level that impacts applications. This gives the NOC team early warning before users start complaining.

! ══════════════════════════════════════════════════════════
! IP SLA 6 — ISP-A with RTT threshold alerting
! Alerts if RTT exceeds 100ms even if probe still succeeds
! ══════════════════════════════════════════════════════════
NetsTuts_R1(config)#ip sla 6
NetsTuts_R1(config-ip-sla)# icmp-echo 203.0.113.1 source-interface GigabitEthernet0/0
NetsTuts_R1(config-ip-sla-echo)#  frequency 30
NetsTuts_R1(config-ip-sla-echo)#  timeout 5000
NetsTuts_R1(config-ip-sla-echo)#  threshold 100
NetsTuts_R1(config-ip-sla-echo)#  tag ISP-A-LATENCY-MONITOR
NetsTuts_R1(config-ip-sla-echo)#exit
NetsTuts_R1(config)#ip sla schedule 6 life forever start-time now

! ── Track 6 on STATE (not reachability)
! ── "state" fires when probe is over-threshold, even if
! ── the probe technically succeeds (not a timeout)
NetsTuts_R1(config)#track 6 ip sla 6 state
NetsTuts_R1(config-track)# delay down 20 up 30
NetsTuts_R1(config-track)#exit

! ── EEM applet for latency degradation alert ──────────────
NetsTuts_R1(config)#event manager applet ISPA-LATENCY-HIGH
NetsTuts_R1(config-applet)# description "Alert: ISP-A RTT above 100ms threshold"
NetsTuts_R1(config-applet)# event track 6 state down
NetsTuts_R1(config-applet)#  maxrun 60
NetsTuts_R1(config-applet)# action 1.0 syslog priority warning \
   msg "*** LATENCY WARNING *** ISP-A RTT exceeded 100ms threshold on $_hostname — link degraded"
NetsTuts_R1(config-applet)# action 2.0 cli command "enable"
NetsTuts_R1(config-applet)# action 2.1 cli command \
   "show ip sla statistics 6 details | redirect flash:/sla-logs/ispa-latency.txt"
NetsTuts_R1(config-applet)#exit

NetsTuts_R1(config)#event manager applet ISPA-LATENCY-NORMAL
NetsTuts_R1(config-applet)# description "Recovery: ISP-A RTT back below threshold"
NetsTuts_R1(config-applet)# event track 6 state up
NetsTuts_R1(config-applet)#  maxrun 30
NetsTuts_R1(config-applet)# action 1.0 syslog priority notice \
   msg "*** LATENCY NORMAL *** ISP-A RTT back below 100ms threshold on $_hostname"
NetsTuts_R1(config-applet)#exit
  
The distinction between track N ip sla N reachability and track N ip sla N state is subtle but important for latency alerting. Reachability is binary: the probe either gets a response within the timeout window or it does not. A probe that takes 4,900 ms to respond (still within the 5,000 ms timeout) is considered "reachable" — no alert fires even though the WAN is almost unusable. State incorporates the threshold value: the probe is considered over-threshold when the RTT exceeds the configured threshold, and the tracking object transitions to Down even though technically the probe is still receiving responses. This pattern — reachability probe for outage alerting, state probe for degradation alerting — gives two distinct alert tiers: Warning (high latency) and Critical (complete outage).
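
The two-tier design reduces to a small decision table: which tracking types go Down for a given RTT? A sketch covering the 4,900 ms case described above (illustrative model, not IOS internals):

```python
def tiers(rtt_ms, timeout_ms=5000, threshold_ms=100):
    """Which tracking types would go Down for a given RTT?
    Reachability reacts only to a timeout; state also reacts
    to an over-threshold RTT -- the two alert tiers."""
    if rtt_ms is None or rtt_ms > timeout_ms:
        return ["reachability", "state"]    # complete outage: Critical
    if rtt_ms > threshold_ms:
        return ["state"]                    # degradation only: Warning
    return []                               # healthy: both tracks stay Up

assert tiers(8) == []                               # healthy baseline
assert tiers(4900) == ["state"]                     # "reachable", yet degraded
assert tiers(None) == ["reachability", "state"]     # timeout: full outage
```

The 4,900 ms case is the whole argument for probe 6: reachability tracking alone would report this link as perfectly healthy.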

9. Verification

show ip sla statistics — Per-Probe Results

NetsTuts_R1#show ip sla statistics

IPSLAs Latest Operation Statistics

IPSLA operation id: 1
        Latest RTT: 8 milliseconds
Latest operation start time: 14:35:30 UTC Wed Oct 16 2024
Latest operation return code: OK
Number of successes: 142
Number of failures: 0
Operation time to live: Forever

IPSLA operation id: 2
        Latest RTT: 12 milliseconds
Latest operation start time: 14:35:33 UTC Wed Oct 16 2024
Latest operation return code: OK
Number of successes: 141
Number of failures: 0
Operation time to live: Forever

IPSLA operation id: 3
        Latest RTT: 24 milliseconds
Latest operation start time: 14:35:00 UTC Wed Oct 16 2024
Latest operation return code: OK
Number of successes: 71
Number of failures: 0
Operation time to live: Forever

IPSLA operation id: 4
        Latest RTT: 87 milliseconds
Latest operation start time: 14:35:00 UTC Wed Oct 16 2024
Latest operation return code: OK
Number of successes: 70
Number of failures: 0
Operation time to live: Forever
  
Every probe in this output reports Latest operation return code: OK with zero failures — all monitored targets are currently reachable and within thresholds (operations 5 and 6 are not shown here). The RTT values (8 ms ISP-A, 12 ms ISP-B, 24 ms branch jitter, 87 ms HTTP) establish the baseline for each target. When an outage occurs, the failing operation instead shows return code: Timeout and an incrementing failure count.

show ip sla statistics details — Rich Jitter Metrics

NetsTuts_R1#show ip sla statistics 3 details

IPSLAs Latest Operation Statistics

IPSLA operation id: 3
Type of operation: UDP Jitter
        Latest RTT: 24 ms
Latest operation start time: 14:35:00 UTC Wed Oct 16 2024
Latest operation return code: OK
RTT Values:
        Number Of RTT: 20        RTT Min/Avg/Max: 22/24/31 milliseconds
Latency one-way time:
        Number of Latency one-way Samples: 20
        Source to Destination Latency one way Min/Avg/Max: 9/11/14 ms
        Destination to Source Latency one way Min/Avg/Max: 12/13/17 ms
Jitter Time:
        Num of SD Jitter Samples: 19
        Num of DS Jitter Samples: 19
        Source to Destination Jitter Min/Avg/Max: 0/1/4 ms
        Destination to Source Jitter Min/Avg/Max: 0/1/3 ms
Packet Loss Values:
        Loss Source to Destination: 0      Loss Destination to Source: 0
        Out Of Sequence: 0                 Tail Drop: 0
        Skipped: 0                         Late Arrival: 0
Voice Score Values:
        Calculated Planning Impairment Factor (ICPIF): 0
        MOS score: 4.40
Number of successes: 71
Number of failures: 0
Operation time to live: Forever
  
The UDP jitter statistics output shows every voice-quality metric in detail: MOS score: 4.40 (above the 3.60 threshold — excellent quality), one-way latency split into source-to-destination and destination-to-source components, per-direction jitter, and packet loss. This level of detail is impossible to obtain from a simple ICMP echo probe. When the EEM alert fires for this probe, the diagnostic file saved to flash contains this full output at the exact moment of the degradation — invaluable for pinpointing whether the issue is asymmetric (one direction only), latency-only without packet loss, or complete loss.
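As a hedged sketch, a udp-jitter operation along the following lines produces the metrics shown above; the target address, port, and packet count are illustrative, and the MOS/ICPIF fields only appear when a codec keyword is configured:

ip sla 3
 udp-jitter 10.20.0.1 5000 codec g711alaw codec-numpackets 20
 frequency 60
ip sla schedule 3 life forever start-time now
!
! on the target router -- without this the probe returns Busy / No Connection
ip sla responder

The codec keyword is what tells IOS to compute ICPIF and MOS from the measured delay, jitter, and loss.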

show track — Tracking Object States

NetsTuts_R1#show track

Track 1
  IP SLA 1 Reachability
  Reachability is Up
    2 changes, last change 00:47:23
  Latest operation return code: OK
  Latest RTT (millisecs) 8
  Tracked by:
    ISPA-GW-DOWN (EEM)
    ISPA-GW-UP   (EEM)

Track 2
  IP SLA 2 Reachability
  Reachability is Up
    1 change, last change 02:15:44

Track 3
  IP SLA 3 State
  State is Up
    3 changes, last change 00:12:05

Track 4
  IP SLA 4 Reachability
  Reachability is Up
    1 change, last change 04:30:10

Track 5
  IP SLA 5 Reachability
  Reachability is Up
    1 change, last change 04:30:12

Track 6
  IP SLA 6 State
  State is Up
    4 changes, last change 00:05:32
  
show track is the single most important operational command for this monitoring system. It shows the current state of every tracking object, the number of state changes (a high change count on track 6 suggests recurring latency bursts), and the time since the last state change. Track 1 shows 2 changes — the link was down and has since recovered, which should correlate with the ISPA-GW-DOWN and ISPA-GW-UP alerts in the syslog. The Tracked by: EEM lines confirm the applets are registered against this tracking object.

show ip sla statistics aggregated — Historical Performance

NetsTuts_R1#show ip sla statistics aggregated 1

IPSLAs Aggregated Statistics

IPSLA operation id: 1
Start Time Index: 14:00:00 UTC Wed Oct 16 2024
        Aggregation interval: 900 seconds (15 minutes)

  Round-Trip-Time (RTT) Values
        Num of Measurements: 30     Min RTT: 7 ms
        Max RTT: 145 ms             Avg RTT: 9 ms
        Over thresholds: 2          (2 probes exceeded 2000ms threshold)

  Number of successes: 28
  Number of failures: 2
  Completion Time: 14:15:00 UTC Wed Oct 16 2024
  
show ip sla statistics aggregated shows the last 15-minute (configurable) window of probe results. This reveals intermittent problems that the instantaneous show ip sla statistics misses — in this example, 2 of 30 probes in the last 15 minutes failed (Number of failures: 2) and 2 exceeded the 2,000 ms threshold. The current probe shows OK, but the aggregated history shows the link is experiencing intermittent connectivity issues. This is the difference between point-in-time monitoring and trend analysis.
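The aggregation window is tunable per operation. A hedged sketch using the enhanced-history subcommand (the interval and bucket count are illustrative values, not this lab's configuration):

ip sla 1
 ! keep 100 aggregation buckets of 900 seconds (15 minutes) each
 history enhanced interval 900 buckets 100

Larger bucket counts extend how far back the aggregated trend data reaches at the cost of a little memory.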

show logging — Confirm Alert Flow

NetsTuts_R1#show logging | include SLA|OUTAGE|RECOVERY|HA_EM

Oct 16 14:32:01: %TRACK-6-STATE: 1 ip sla 1 reachability Up->Down
Oct 16 14:32:11: %HA_EM-2-LOG: ISPA-GW-DOWN: *** OUTAGE *** ISP-A gateway \
   203.0.113.1 UNREACHABLE on NetsTuts_R1 — SLA probe failing
Oct 16 14:32:12: %HA_EM-6-LOG: ISPA-GW-DOWN: diagnostic files saved to \
   flash:/sla-logs/
Oct 16 14:38:45: %TRACK-6-STATE: 1 ip sla 1 reachability Down->Up
Oct 16 14:38:55: %HA_EM-5-LOG: ISPA-GW-UP: *** RECOVERY *** ISP-A gateway \
   203.0.113.1 REACHABLE again on NetsTuts_R1
  
The syslog shows the complete event timeline: at 14:32:01 the tracking object transitions Up->Down (an IOS-generated message — by this point the delay down 10 hold-down has already elapsed, so the probe actually began failing roughly 10 seconds earlier). The EEM applet ISPA-GW-DOWN fires on that transition, and its CRITICAL alert appears at 14:32:11 once its diagnostic-collection actions have completed. At 14:38:45 the tracking object transitions Down->Up and the ISPA-GW-UP recovery applet fires. The outage lasted approximately 6 minutes and 44 seconds — a timeline now permanently recorded in the syslog, while the diagnostic files on flash allow post-mortem analysis of the router's state at the moment of failure. For forwarding these alerts to a central server, see Syslog Server Configuration.
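The applet pair that produces this alert/recovery sequence looks roughly like the following; the full lab versions also save diagnostics to flash and send email, so only the syslog actions are sketched here:

event manager applet ISPA-GW-DOWN
 event track 1 state down
 action 1.0 syslog priority critical msg "*** OUTAGE *** ISP-A gateway 203.0.113.1 UNREACHABLE on NetsTuts_R1 - SLA probe failing"
!
event manager applet ISPA-GW-UP
 event track 1 state up
 action 1.0 syslog priority notifications msg "*** RECOVERY *** ISP-A gateway 203.0.113.1 REACHABLE again on NetsTuts_R1"

The priority keywords map directly to the severities seen in the log: critical produces %HA_EM-2-LOG, notifications produces %HA_EM-5-LOG.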

Verification Command Summary

  • show ip sla statistics: all probes — latest RTT, return code (OK/Timeout/Error), success and failure counts since last reset. Use for an instant health check: confirm all probes return OK with zero or low failure counts.
  • show ip sla statistics [N] details: single probe in full detail — per-direction RTT, jitter, packet loss, MOS, threshold violations. Use for a deep-dive on a specific probe, especially UDP jitter, to verify voice-quality metrics.
  • show ip sla statistics aggregated [N]: 15-minute aggregated window — min/avg/max RTT, total successes/failures, over-threshold count. Use to identify intermittent issues the current probe misses; reveals patterns over time.
  • show ip sla configuration [N]: full probe configuration — target IP, source, frequency, timeout, threshold, tag, schedule. Use to verify the probe is configured correctly: source interface, frequency, and threshold values.
  • show track: all tracking objects — current state (Up/Down), change count, last change time, EEM subscribers. Use to confirm tracking objects are Up and EEM applets are registered; a high change count indicates a flapping probe.
  • show track [N]: single tracking object detail — linked IP SLA operation, reachability/state type, delay settings. Use to verify the delay down/up settings and confirm the linked IP SLA operation number.
  • show event manager policy registered: all EEM applets — name, event type (track), registered track number, registration time. Use to confirm all down and up applets are registered against the correct track object numbers.
  • show event manager history events: EEM execution history — applet name, event type, execution time. Use to verify applets fired when expected; cross-reference with show logging timestamps.
  • show logging | include TRACK|HA_EM: all tracking state changes and EEM syslog actions in the log buffer. Use to build the complete timeline, correlating track state transitions with EEM alert timestamps.
  • dir flash:/sla-logs/: diagnostic files written by EEM actions. Use to confirm files are being created at failure events; review captured output with more flash:/sla-logs/[file].txt.

10. Troubleshooting IP SLA + EEM Monitoring

  • Probe shows continuous failures but target is reachable
    Symptom: show ip sla statistics 1 shows return code: Timeout and an incrementing failure count even though manual pings to the target succeed.
    Cause: The probe is not using source-interface and is being sourced from a different interface than intended, or the target device rate-limits ICMP — the frequent periodic probes get dropped while occasional manual pings slip through. Alternatively, the ISP gateway blocks ICMP from some source IPs (such as the router's loopback) but not others.
    Fix: Add source-interface GigabitEthernet0/0 to the probe configuration — this forces the probe to use the WAN interface IP as its source, matching the exact path to the gateway. Verify the source IP actually in use with show ip sla configuration 1. If rate-limiting is the issue, increase the frequency to 60 seconds to reduce the ICMP rate, or switch to a tcp-connect probe on a port the gateway accepts.

  • EEM applet does not fire when track state changes
    Symptom: show track shows the track is Down, but show event manager history events shows no execution of the ISPA-GW-DOWN applet.
    Cause: The applet is registered against the wrong track object number — for example, the applet says event track 2 state down but the tracking object for ISP-A is track 1.
    Fix: Run show event manager policy registered and confirm the applet shows the correct track number. Run show track 1 and note the tracking type (Reachability vs State) — note that event track N state down works with both reachability and state tracking objects; "state down" simply means the object transitioned to Down, regardless of tracking type. Re-check that the track number in the applet matches the number shown in show track.

  • Track object flapping — repeated Down/Up transitions
    Symptom: show track 1 shows a very high change count (50+ changes in an hour); the EEM applet fires repeatedly, flooding syslog and the NOC inbox with alternating OUTAGE/RECOVERY emails.
    Cause: The track delay values are too short — a single dropped probe immediately triggers a Down transition, the next successful probe triggers Up, and so on. Or the WAN link is genuinely unstable (physical-layer issue, ISP congestion).
    Fix: Increase the track delay values: under track 1 ip sla 1 reachability, configure delay down 30 up 60. This requires 30 consecutive seconds of failure before declaring Down (approximately one missed probe at frequency 30) and 60 seconds of continuous success before declaring recovery. Add ratelimit 600 to the EEM event clause as additional protection. Investigate the underlying WAN stability separately with show ip sla statistics aggregated 1 to see the failure pattern.

  • UDP jitter probe shows return code: Busy or No Connection
    Symptom: show ip sla statistics 3 shows a return code other than OK — specifically Busy, No Connection, or Timeout.
    Cause: Busy means the responder is not enabled or not listening on the configured port on the target router. No Connection means IP connectivity exists but the responder is not accepting UDP on port 5000. Timeout means no response at all — possible if the probe can reach the router but the responder is not running.
    Fix: Verify the responder on the branch router: SSH to Branch_Router and run show ip sla responder — it should show the UDP responder listening on port 5000. If not, reconfigure: ip sla responder and ip sla responder udp-echo ipaddress 10.10.0.1 port 5000. Confirm the ACL on the branch router does not block UDP 5000, and verify the probe's source port does not conflict with other probes (each probe needs a unique source port).

  • IP SLA probe stops running after a reload
    Symptom: After a router reload, show ip sla statistics shows old data but no new measurements — the probe is not generating new results.
    Cause: The ip sla schedule was configured with a specific start time in the past (start-time 14:00:00) rather than start-time now or start-time after 0:0:5. After a reload, IOS sees the start time has already passed and does not restart the schedule. Alternatively, the schedule was configured with a life value that has expired.
    Fix: Reconfigure the schedule: ip sla schedule 1 life forever start-time now. The life forever ensures the probe never expires; start-time now restarts it immediately. Verify afterwards: show ip sla statistics 1 should show the Latest operation start time updating every frequency seconds.

  • Alert fires for a target in a planned maintenance window
    Symptom: A server is being patched and taken offline deliberately. The monitoring system fires OUTAGE alerts and emails the NOC every 30 seconds during the maintenance window — flooding the team with false positives they must manually suppress.
    Cause: No maintenance-mode mechanism is built into IP SLA + EEM monitoring by default. The tracking object transitions Down the moment the probe fails, regardless of whether the outage is planned or unplanned.
    Fix: For planned maintenance, temporarily suspend the IP SLA schedule: no ip sla schedule 4 before the maintenance window, then ip sla schedule 4 life forever start-time now after. Alternatively, define an EEM environment variable as a maintenance flag (event manager environment _maintenance 1) and add a conditional check at the top of the applet — action 0.5 if $_maintenance eq "1", then action 0.6 exit and action 0.7 end — to skip all further actions while maintenance mode is active.
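The maintenance-flag variant of an applet might be sketched as follows; the applet name, track number, and message text are illustrative:

! set to 1 before the maintenance window, back to 0 afterwards
event manager environment _maintenance 1
!
event manager applet SERVER-DOWN
 event track 4 state down
 ! skip all alerting actions while the maintenance flag is set
 action 0.5 if $_maintenance eq "1"
 action 0.6  exit
 action 0.7 end
 action 1.0 syslog priority critical msg "*** OUTAGE *** server 10.0.0.50 UNREACHABLE"

Remember that an EEM applet if clause must be closed with a matching end action, as shown.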

Key Points & Exam Tips

  • The complete IP SLA alerting pipeline has three layers: IP SLA probe (sends synthetic test traffic and measures results), Object Tracking (translates probe results into a binary Up/Down state with configurable delay), and EEM applet (fires on state transitions and executes alert actions). All three layers must be correctly configured for automated alerting to work.
  • Always configure source-interface on WAN gateway probes. Without it, IOS sources the probe from the best available exit interface — a failed WAN link that still has an alternate path may produce false-healthy results if the probe routes around the failure instead of through it.
  • The track delay down [seconds] up [seconds] command prevents false alerts from single dropped probes. delay down requires the probe to fail continuously for N seconds before declaring Down; delay up requires continuous success for N seconds before declaring recovery. Size these to be slightly longer than one probe cycle at the configured frequency.
  • There are two tracking types for IP SLA objects: reachability (Down when probe times out — complete failure only) and state (Down when probe result exceeds the configured threshold — fires on latency degradation even without packet loss). Use both together for two-tier alerting: Warning on degradation, Critical on outage.
  • Always deploy paired applets — one on event track N state down and one on event track N state up. The down applet alerts on outage; the up applet confirms recovery. Without the recovery applet, the NOC team must manually verify resolution — defeating the purpose of automated monitoring.
  • UDP jitter probes require the Cisco IP SLA Responder (ip sla responder) on the target device. The responder uses hardware timestamps for precise one-way delay measurement. Without the responder, UDP jitter probes return Busy or No Connection return codes.
  • show ip sla statistics shows the current probe result (latest RTT, return code, cumulative success/failure counts). show ip sla statistics aggregated shows the historical 15-minute window including min/avg/max RTT and over-threshold counts — essential for identifying intermittent problems that the instantaneous view misses.
  • show track is the primary operational command — it shows current state, change count, last change time, and which EEM applets are subscribed. A high change count on a tracking object indicates a flapping probe — investigate both the underlying path stability and the track delay values.
  • IP SLA schedules configured with a past start-time do not restart automatically after a reload. Always use start-time now or start-time after 0:0:5 with life forever to ensure probes survive router reloads.
  • On the exam: know the three IP SLA probe types and whether they require a responder (ICMP — no; UDP jitter — yes; HTTP/DNS — no), the two tracking types (reachability vs state), the delay down/up purpose, and the EEM event track N state down/up syntax. For traffic-volume monitoring alongside SLA alerting, see NetFlow Configuration.
Next Steps: For the EEM applet architecture that this lab depends on, see EEM — Embedded Event Manager Scripting. For IP SLA probe configuration, Object Tracking, and tracked static routes in detail, see IP SLA Configuration & Tracking. For forwarding EEM syslog alerts to a central server for long-term storage and correlation, see Syslog Configuration and Syslog Server Configuration. For SNMP-based monitoring that complements IP SLA with threshold traps, see SNMP v2c & v3 Configuration and SNMP Traps. For Python-based multi-device monitoring that can poll IP SLA statistics across an entire network, see Python Netmiko Show Commands. For traffic capture at the moment of a detected outage, see SPAN & RSPAN Port Mirroring.

TEST WHAT YOU LEARNED

1. Why is source-interface GigabitEthernet0/0 critical on a WAN gateway ICMP echo probe, and what false result does omitting it risk?

Correct answer is B. This is the most important practical detail in WAN gateway monitoring with IP SLA. The false-healthy scenario is not theoretical — it happens in production whenever a dual-WAN router has a default route that can reach the ISP-A gateway address via either WAN link. When ISP-A fails and the router's default route fails over to ISP-B, the probe without source-interface happily sends its ICMP packets out ISP-B toward the ISP-A gateway IP. If the ISP-A gateway is still reachable from ISP-B's network (which it often is since both ISPs typically peer with the same internet backbone), the probe returns OK. The monitoring system reports no outage, no EEM alert fires, and the NOC team has no idea ISP-A is down — until someone runs a traceroute and sees all traffic routing through ISP-B. The source-interface binding creates a hard dependency: the probe can only send packets out GigabitEthernet0/0 and can only receive the reply on GigabitEthernet0/0. If that interface is down, IOS cannot send the probe at all and the operation returns an immediate failure.

2. What is the difference between track N ip sla N reachability and track N ip sla N state, and when should each be used for alerting?

Correct answer is D. This distinction is fundamental to building a useful monitoring system. Consider a WAN link where the RTT is normally 8 ms. Due to ISP congestion the RTT rises to 300 ms — too high for VoIP or real-time applications. ICMP probes are still receiving responses (300 ms is within a 5,000 ms timeout), so reachability remains Up and no alert fires. Meanwhile, users on VoIP calls are experiencing choppy audio and the NOC team has no alert. state tracking solves this: with threshold 100 in the probe configuration, a 300 ms RTT puts the operation over-threshold, the state tracking object transitions Down, and the EEM applet fires a Warning alert. The network is not down, but it is degraded. Running both tracking objects (reachability for Critical alerting, state for Warning alerting) gives the two-tier alert system that enterprise NOCs need: Warning = investigate and prepare, Critical = immediate response required. The threshold value in the IP SLA probe configuration is what the state tracking type uses for its comparison — setting an appropriate threshold for the application being monitored is the key design decision.

3. A UDP jitter probe to the branch router returns return code: No Connection. The branch router is reachable via ping. What is the most likely cause?

Correct answer is C. The return code No Connection specifically means the TCP/UDP connection to the responder port was refused or not answered — the network path exists (the router is pingable, proving Layer 3 reachability) but the UDP port is not listening. This is exactly the behaviour you get when the IP SLA Responder is not configured on the target: UDP packets arrive at the branch router on port 5000, but no process is listening on that port, so the OS returns an ICMP Port Unreachable, which IP SLA interprets as No Connection. ICMP probes do not need a responder because ICMP echo is handled natively by every IP stack — any router or server will respond to ping by default. UDP jitter is different: it sends custom-formatted UDP packets to a specific port that only the Cisco IP SLA Responder knows how to process and timestamp. Without the responder, the probe cannot measure jitter or one-way delay because there is no endpoint to apply hardware timestamps. Diagnose by SSH'ing to the branch router and running show ip sla responder — if this shows no responders configured or not enabled, add the configuration: ip sla responder (global enable) and ip sla responder udp-echo ipaddress 10.10.0.1 port 5000.
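On the branch router, the responder configuration and its verification would look like this (addresses and port taken from the lab scenario):

Branch_Router(config)# ip sla responder
Branch_Router(config)# ip sla responder udp-echo ipaddress 10.10.0.1 port 5000
Branch_Router(config)# end
Branch_Router# show ip sla responder

The second line permanently opens UDP 5000 for the named source; with only the global ip sla responder, the port is negotiated dynamically via the responder control protocol.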

4. show track 1 shows 87 changes in the last hour for an IP SLA reachability probe with frequency 30. What does this indicate and how should it be addressed?

Correct answer is A. The change count in show track is one of the most useful diagnostic indicators in the IP SLA + tracking system. A tracking object on a healthy WAN link should show very few changes — typically 1–2 (one initial Up transition when the probe starts, possibly one Down/Up pair during a real outage). A high change count (especially 50+ per hour) is always a signal that something needs attention. The diagnosis path is: first determine whether the probe failures are real or artefactual by examining show ip sla statistics aggregated — this reveals the actual probe success/failure rate. If probe failures are rare (suggesting delay values are too short and single dropped probes flip the state), increase both delay down and delay up. If probe failures are frequent, the WAN path is genuinely unstable. In either case, adding ratelimit to the EEM applets is important to prevent a flapping WAN from generating hundreds of alternating OUTAGE/RECOVERY emails per hour. The ratelimit does not fix the underlying problem but protects the NOC from alert fatigue while the investigation proceeds.

5. After a router reload, show ip sla statistics shows the last measurement was taken before the reload and no new measurements are being generated. What is the cause and fix?

Correct answer is D. This is a common production gotcha. When ip sla schedule N life forever start-time 09:00:00 is saved in the configuration, IOS saved the absolute start time. After a reload at 14:00:00, IOS reads the startup-config, sees start-time 09:00:00, calculates that 09:00:00 was 5 hours ago, and does not restart the schedule — the start time has already passed. The probe is configured correctly, it is in the running-config, but it is not running. The symptom is exactly as described: show ip sla statistics shows old pre-reload data with no new measurements. The fix is immediate: ip sla schedule N life forever start-time now. The best practice that prevents this entirely is to always write schedules with start-time now — this means "start immediately when this config line is processed," which works correctly both when first configured and after every subsequent reload. start-time after 0:0:5 (start 5 seconds after the configuration is processed) is useful when configuring multiple probes in a block, ensuring all probes start cleanly after the configuration has been fully applied.
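The two reload-safe schedule forms side by side (operation numbers are illustrative):

! starts immediately, and again on every reload
ip sla schedule 1 life forever start-time now
!
! starts 5 seconds after the config is processed; useful when
! scheduling several probes in one configuration block
ip sla schedule 2 life forever start-time after 0:0:5

Either form avoids the stale-absolute-start-time trap because the start is always computed relative to when the line is processed.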

6. An ICMP echo probe has threshold 2000 and timeout 5000. An event track N state down EEM applet monitors this probe with track N ip sla N state. When does the tracking object go Down and when does it stay Up?

Correct answer is B. Understanding the interaction between threshold and timeout in the context of state vs reachability tracking is a core concept for this lab. The timeout defines the maximum wait time for a probe response — if no response arrives within this window, the probe returns Timeout and the operation is a failure regardless of tracking type. The threshold defines the RTT value above which the operation is considered "over threshold" — the probe received a response but it took longer than expected. With track N ip sla N state: the tracking object goes Down when either condition occurs — timeout (complete failure) OR over-threshold (response received but RTT > 2,000 ms). With track N ip sla N reachability: the tracking object only goes Down on complete timeout — a response received at 4,900 ms (below the 5,000 ms timeout but far above the 2,000 ms threshold) keeps reachability Up. This is why the two tracking types serve different monitoring purposes and why combining both (as done in Step 6 of this lab) provides the two-tier Warning + Critical alerting model.

7. Why does an IP SLA + EEM monitoring system require a paired applet (one for state down and one for state up), and what operational problem occurs without the recovery applet?

Correct answer is C. The paired applet pattern is about closing the incident loop — the complete lifecycle of a network event notification should include both the opening of the incident (Down alert) and the confirmation of resolution (Up recovery). Without the recovery applet, the NOC team receives an alert at 2 AM, the on-call engineer wakes up and investigates, the issue resolves itself at 2:15 AM (perhaps a brief ISP congestion spike), but the engineer has no automated indication that the problem is gone. They spend additional time manually verifying the link is stable before they can go back to sleep. With the recovery applet, the engineer's phone receives the recovery notification at 2:15 AM and the incident is confirmed closed without a manual check. At a NOC scale, this matters enormously: managing 50 simultaneous alerts without recovery notifications means 50 ongoing manual verification tasks. With recovery notifications, each alert is automatically paired with a resolution and the NOC dashboard accurately reflects the current state of the network. Options A, B, and D describe incorrect technical behaviour — the EEM scheduler and tracking system work correctly regardless of whether paired applets are configured; the gap is purely operational.

8. How does an HTTP probe detect an application failure that an ICMP echo probe would miss, and give a specific scenario where this distinction matters?

Correct answer is A. This is the fundamental reason for deploying application-layer probes alongside network-layer probes. The OSI model tells us that each layer can fail independently: a server can have a healthy physical link (Layer 1), healthy IP stack (Layer 3), and successful ICMP responses (Layer 3 ping), while its application (Layer 7) is completely broken. ICMP probes test exactly up to Layer 3 — they tell you whether packets can reach the IP address. HTTP probes test up to Layer 7 — they tell you whether the application is responding correctly. The specific scenario in option A is extremely common: web application crashes, database connection failures, Apache/Nginx process hangs, out-of-memory conditions that crash the application but not the OS — all of these cause application unavailability while the server continues to respond to ping. Monitoring only with ICMP would report these servers as healthy. The IP SLA HTTP probe to a lightweight /health endpoint (which the application itself must serve successfully) catches all of these. Option C describes the DNS probe, not a benefit of HTTP over ICMP specifically. The HTTP probe configured in this lab uses an IP address (http://10.0.0.50), not a hostname, so DNS resolution is not involved.
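A hedged sketch of an HTTP GET probe against a lightweight health endpoint as described above; the /health path and the timing values are illustrative (the lab's probe targets http://10.0.0.50):

ip sla 4
 http get http://10.0.0.50/health
 frequency 60
 timeout 10000
ip sla schedule 4 life forever start-time now

A Timeout or error return code here fires the alert even while the same server still answers ICMP echo.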

9. A production router is monitoring six targets with IP SLA probes. During a planned maintenance window, the primary WAN link is taken down for hardware replacement. What is the correct procedure to prevent the monitoring system from sending false-positive OUTAGE alerts during the maintenance window?

Correct answer is D. This is a real operational challenge — automated monitoring systems must have a mechanism for acknowledging planned downtime without permanently disabling monitoring. The EEM system has no built-in concept of "maintenance mode" (unlike dedicated NMS platforms like SolarWinds or Nagios which have scheduled downtime windows). The two practical approaches reflect a trade-off: suspending the IP SLA schedule (no ip sla schedule 1) is the simplest approach — with no active schedule, the probe does not send packets, no failures are recorded, and the tracking object state does not change (it stays in whatever state it was when the schedule was stopped). The maintenance flag approach is more sophisticated and preserves probe measurement continuity (statistics accumulate the entire time the probe is running), but requires the EEM applet to be written with conditional logic from the start. Neither deleting applets (option A, operationally risky and error-prone to recreate) nor changing timeouts (option B, doesn't actually prevent state transitions — just makes failures happen faster) is appropriate for a production environment. The key operational discipline is to always restore monitoring after maintenance: re-enable the schedule or clear the maintenance flag, and verify with show ip sla statistics and show track that probes are running and tracking objects are Up.

10. show ip sla statistics aggregated 1 shows 28 successes and 2 failures in the last 15 minutes, with max RTT of 145 ms. show ip sla statistics 1 shows the current probe as OK with RTT of 8 ms. What does this reveal and what action should be taken?

Correct answer is C. This scenario illustrates precisely why show ip sla statistics aggregated is more valuable than the instantaneous show ip sla statistics for operational monitoring. The current probe shows OK — if you looked only at the instantaneous statistics, you would conclude the link is perfectly healthy. But the aggregated window tells a completely different story: 2 of 30 probes failed (6.7% packet loss) and the max RTT spiked to 145 ms — 17 times higher than the normal 8 ms baseline. This pattern is characteristic of intermittent ISP congestion, a degrading physical connection (cable, SFP, or patch panel), or CEF/switching performance issues on either end. Because the current probe happens to be in a good moment, the instantaneous view is misleading. The max RTT of 145 ms is particularly telling — even if the EEM alert did not fire (the reachability probe stayed Up and the RTT didn't exceed the 2,000 ms threshold), this peak represents a period where VoIP calls on this link would have experienced significant degradation. Clearing statistics (option D) would destroy valuable baseline data — never clear probe statistics on a link that you suspect is having problems, as the historical data is evidence for the ISP trouble ticket.