The Zombie Problem Hiding in Your Routing Tables

Executive Summary

BGP stuck routes, also widely known as "BGP zombies", are routing table entries that persist in one or more routers' RIBs even though the originating AS has withdrawn the prefix. They represent one of the most insidious failure modes in Internet routing because affected routers have no indication that anything is wrong. Stuck routes carry no special flag, appear identical to valid entries, and silently cause traffic blackholing or suboptimal forwarding for hours to weeks.

Research by Fontugne et al. (PAM 2019) measured more than one zombie outbreak per day using RIPE RIS beacon prefixes, with the average peer missing 1.8% of IPv4 withdrawals and 2.7% of IPv6 withdrawals. A follow-up study (ANRW 2021) found over 6.5 million zombies across six years of BGP data, with 94% causing incoherent routing states and 468 confirmed routing loops. The most significant protocol-level root cause, a TCP zero-window condition that blocks withdrawal transmission while keeping sessions alive, was confirmed across every major BGP implementation and is now addressed by RFC 9687.


How BGP Zombies Manifest and Evade Detection

A stuck route's defining characteristic is invisibility. When an origin AS withdraws a prefix, the withdrawal should propagate through the AS graph until the prefix is removed from every downstream router's adj-RIB-in, loc-RIB, and FIB. In a zombie scenario, this propagation fails at some intermediate AS. Every router downstream of that point retains the route as a normal, valid, best-path entry, with the standard *> marker identifying it as a legitimate active route.

On Cisco IOS-XR, a zombie appears as *>i 10.0.0.0/24 192.168.1.1 0 100 0 65001 65002 i with no S (stale), r (RIB-failure) or any other anomalous flag. On Junos, it displays as an active [BGP/170] entry with validation state: unverified. The router forwarding packets toward a zombie prefix has no mechanism to detect the problem locally because from its perspective, the route was never withdrawn.

Figure 1: BGP Zombie Route Lifecycle. The withdrawal fails to propagate past a transit AS
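
The lifecycle can be illustrated with a toy model. The sketch below is not a BGP implementation: it treats the AS graph as a simple tree, floods an announcement and then a withdrawal from the origin, and reports which ASes keep the route when one link fails to carry the withdrawal. The AS numbers, the LINKS topology, and the function names are all made up for illustration.

```python
from collections import deque

def reached(links, origin, blocked=frozenset()):
    """Return the ASes reached by a message flooded from origin.

    links maps each AS to the neighbors it sends updates to.
    blocked holds (sender, receiver) pairs that fail to deliver the message.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        for receiver in links.get(sender, ()):
            if (sender, receiver) in blocked or receiver in seen:
                continue
            seen.add(receiver)
            queue.append(receiver)
    return seen

def zombie_ases(links, origin, blocked):
    """ASes that learned the route but never received its withdrawal."""
    announced = reached(links, origin)           # announcement reached everyone
    withdrawn = reached(links, origin, blocked)  # withdrawal stops at blocked links
    return announced - withdrawn

# Hypothetical topology: 65001 originates, 65002 is the transit AS.
LINKS = {65001: [65002], 65002: [65003, 65004], 65003: [65005]}

# The withdrawal is never delivered from 65002 to 65003.
print(sorted(zombie_ases(LINKS, 65001, {(65002, 65003)})))  # [65003, 65005]
```

In this model a single blocked link leaves both 65003 and everything behind it holding the route, which is the amplification effect the figure describes.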

This distinguishes stuck routes sharply from stale routes under Graceful Restart (RFC 4724). Stale routes are explicitly marked with an S flag, bounded by the stale path time, and automatically cleaned up when either the session re-establishes and the peer sends an End-of-RIB marker or the timer expires. Stuck routes have no automatic cleanup mechanism. They persist indefinitely until the BGP session is torn down, a route refresh is processed, or the route is explicitly re-announced and then withdrawn.

Figure 2: Stuck Routes vs. Stale Routes

Attribute     | Stuck Routes (Zombies)     | Stale Routes (Graceful Restart)
Cause         | Withdrawal not propagated  | BGP session reset; peer restarting
Awareness     | None; route appears valid  | Explicitly marked as stale
Status Flag   | None (*>)                  | S (Stale)
Duration      | Indefinite                 | Bounded by stale path timer
Cleanup       | Manual or session reset    | Automatic timer expiry or EoR
Protocol Spec | Addressed by RFC 9687      | Defined in RFC 4724 / RFC 9494

Table 1: Comparison of Stuck Routes vs. Stale Routes


The TCP Zero-Window Flaw and Other Root Causes

The Primary Protocol Level Vulnerability

The most significant documented root cause was identified by Job Snijders (Fastly/NTT).

BGP runs over TCP. If a remote peer's BGP process stops reading from its TCP receive buffer, whether due to CPU overload, a software bug, or deliberate action, the advertised TCP receive window shrinks to zero. This blocks the local system from sending any data, including KEEPALIVE, UPDATE (and therefore withdrawals), and NOTIFICATION messages. The critical flaw in RFC 4271 is that the Hold Timer is driven only by messages received. If the remote peer keeps sending KEEPALIVEs, the session stays Established indefinitely while the local system cannot transmit withdrawals.

Snijders confirmed that this vulnerability affected every major BGP implementation.

Figure 3: TCP Zero-Window. KEEPALIVEs flow in but WITHDRAWs cannot flow out

The fix is RFC 9687 (BGP SendHoldTimer), published in November 2024 as a Standards Track document. It adds a SendHoldTimer to the BGP finite state machine. Each time the local system successfully sends a BGP message, the timer restarts. When it expires, indicating that the local system has been unable to transmit any BGP message for the configured duration, the session is torn down with a new NOTIFICATION Error Code 8 (Send Hold Timer Expired). The RFC's suggested default is the greater of 8 minutes and 2× the negotiated HoldTime.
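
The difference between the two timers can be sketched with a small simulation. This is a toy model rather than a BGP state machine: the SessionTimers class, the simulate function, and the one-second tick are assumptions made for illustration. The peer keeps delivering KEEPALIVEs every 30 seconds, while the local side can never send because the peer's receive window is zero.

```python
class SessionTimers:
    """Toy model of the RFC 4271 Hold Timer and the RFC 9687 SendHoldTimer."""

    def __init__(self, hold_time, send_hold_time, now=0):
        self.hold_time = hold_time
        self.send_hold_time = send_hold_time
        self.last_received = now
        self.last_sent = now

    def on_message_received(self, now):
        self.last_received = now  # restarts the Hold Timer

    def on_message_sent(self, now):
        self.last_sent = now      # restarts the SendHoldTimer

    def check(self, now):
        if now - self.last_received > self.hold_time:
            return "hold_timer_expired"
        if now - self.last_sent > self.send_hold_time:
            return "send_hold_timer_expired"  # NOTIFICATION Error Code 8
        return "established"


def simulate(hold_time=90, send_hold_time=None, keepalive_interval=30, horizon=600):
    """Return (time, state) when the session leaves Established, or at horizon."""
    if send_hold_time is None:
        # 2x HoldTime, the multiplier discussed above; deployments may use a longer value.
        send_hold_time = 2 * hold_time
    timers = SessionTimers(hold_time, send_hold_time)
    for now in range(1, horizon + 1):
        if now % keepalive_interval == 0:
            timers.on_message_received(now)  # the peer's KEEPALIVEs still arrive
        # Nothing is ever sent: the peer's TCP receive window is zero.
        state = timers.check(now)
        if state != "established":
            return now, state
    return horizon, "established"


print(simulate())                             # (181, 'send_hold_timer_expired')
print(simulate(send_hold_time=float("inf")))  # (600, 'established')
```

The second call models a speaker without the SendHoldTimer: the Hold Timer never fires because KEEPALIVEs keep arriving, so the session stays Established for the whole run even though no withdrawal could have been sent.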

Software Implementation Bugs Across Vendors

Cisco IOS-XR has documented bugs that include stale BGP prefixes on ASR9K: when soft reconfiguration inbound is configured, withdrawals are not processed correctly. The CSCuv27777 bug leaves IOS-XR stuck in a "Not NSR-Ready" state, preventing proper BGP state synchronization between active and standby processors.

FRRouting's public issue tracker documents several cases. Issue #16641 (FRR 9.1.1) describes routes persisting in kernel tables after neighbor shutdown or daemon stop. Issue #11073 documents routes continuing to be announced after the underlying static route is deleted. Issue #7865 reports routes being rejected by the kernel with "Invalid argument" after 4 to 48 hours, requiring a full system reboot to resolve.

Juniper Junos has exhibited zombie behavior in which aggregate routes and their accompanying stale routes persisted across multiple Junos versions. Deactivating peers, clearing neighbors, and re-enabling sessions did not resolve the issue; only a full router reboot cleared the zombies.

Other Contributing Factors

Path MTU mismatches create a subtle variant. BGP KEEPALIVEs are small packets that fit through paths with a reduced MTU, but larger UPDATE messages, including those carrying withdrawals, are silently dropped. This produces sessions where both ends show connectivity but route exchange is blocked.
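
A rough arithmetic sketch of the MTU variant follows. It assumes IPv4 and TCP headers without options (40 bytes), an MSS of 1460 negotiated from a 1500-byte interface MTU, a hypothetical 1400-byte bottleneck on the path, and broken path MTU discovery so that oversized packets simply vanish. The 19-byte KEEPALIVE and 4096-byte maximum message size come from RFC 4271; the function names are illustrative, and real TCP may coalesce or split messages differently.

```python
IP_TCP_HEADERS = 40  # IPv4 (20 bytes) + TCP (20 bytes), assuming no options

def segment_sizes(bgp_message_len, mss):
    """IP packet sizes produced when one BGP message is sent as TCP segments."""
    sizes = []
    remaining = bgp_message_len
    while remaining > 0:
        payload = min(remaining, mss)
        sizes.append(payload + IP_TCP_HEADERS)
        remaining -= payload
    return sizes

def delivered(bgp_message_len, mss, path_mtu):
    """True if every packet fits the path MTU (no fragmentation, no ICMP feedback)."""
    return all(size <= path_mtu for size in segment_sizes(bgp_message_len, mss))

print(delivered(19, mss=1460, path_mtu=1400))    # KEEPALIVE: True
print(delivered(4096, mss=1460, path_mtu=1400))  # maximum-size UPDATE: False
```

Under these assumptions the KEEPALIVE travels as a 59-byte packet, while the large UPDATE produces 1500-byte packets that exceed the bottleneck, so the session looks healthy while updates stall.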

Graceful Restart edge cases contribute when timers are misconfigured. Long-Lived Graceful Restart (RFC 9494) allows timers of up to 16,777,215 seconds (~194 days), meaning routes can persist as "stale" for very long periods.

Route Reflector topology interactions amplify the problem. A zombie in a transit network propagates to all customer networks. The PAM 2019 study found that a zombie in Level(3)/AS3356 caused approximately 50% of RIPE RIS peers to show zombie routes.


Detecting Stuck Routes Across Your Infrastructure

Detection is fundamentally challenging because stuck routes masquerade as valid entries. No single technique is sufficient. Effective detection requires a layered approach that combines local consistency checking, protocol-level monitoring, and external vantage points.

Figure 4: Layered BGP Stuck Route Detection Architecture

Layer 1: Local Router RIB/FIB Consistency Checking

Cisco IOS-XR provides the most mature built-in tooling with its Route Consistency Checker (RCC), which compares RIB entries against the FIB on line cards. IOS-XR also supports update wait-install, which prevents BGP from advertising a route until RIB-to-FIB installation feedback is received. IOS-XE offers show ip bgp rib-failure for routes not installed in the RIB, and a BGP Consistency Checker with auto-repair capability. Junos exposes hidden routes via show route hidden detail. Arista EOS marks uninstalled routes with # and stale routes with S.

Layer 2: BMP for Real-Time Route Lifecycle Monitoring

The BGP Monitoring Protocol (RFC 7854) provides the most powerful framework for stuck route detection by streaming pre-policy adj-RIB-in, post-policy adj-RIB-in, loc-RIB (via RFC 9069), and adj-RIB-out (via RFC 8671) data to external collectors. The detection methodology compares routes in adj-RIB-in against known withdrawal history. If a prefix persists after a withdrawal should have propagated, it is a zombie candidate. Correlating adj-RIB-in (received) with adj-RIB-out (sent) reveals routes withdrawn inbound but still advertised outbound. OpenBMP is the primary open-source collector, backed by Kafka and PostgreSQL.
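
The comparison described above can be sketched as follows. This is a minimal illustration, not the OpenBMP schema or API: the dictionary layouts, the function names, and the 300-second grace period are assumptions, and the inputs are presumed to have been extracted from a BMP collector already. The second function also assumes the router originates no prefixes of its own, since locally originated routes would otherwise appear in adj-RIB-out without a matching adj-RIB-in entry.

```python
def zombie_candidates(adj_rib_in, withdrawals, now, grace_seconds=300):
    """Flag routes still held after their prefix is known to have been withdrawn.

    adj_rib_in:  {(peer, prefix): time the current route was last announced}
    withdrawals: {prefix: time the origin is known to have withdrawn it}
    Returns sorted (peer, prefix, seconds_since_withdrawal) tuples.
    """
    candidates = []
    for (peer, prefix), announced_at in adj_rib_in.items():
        withdrawn_at = withdrawals.get(prefix)
        if withdrawn_at is None:
            continue  # no known withdrawal for this prefix
        if announced_at > withdrawn_at:
            continue  # re-announced after the withdrawal, so legitimately present
        age = now - withdrawn_at
        if age >= grace_seconds:  # allow time for normal convergence
            candidates.append((peer, prefix, age))
    return sorted(candidates)


def withdrawn_but_advertised(adj_rib_in_prefixes, adj_rib_out_prefixes):
    """Prefixes still advertised outbound that are no longer received inbound."""
    return sorted(set(adj_rib_out_prefixes) - set(adj_rib_in_prefixes))


rib_in = {
    ("rtr1", "203.0.113.0/24"): 100,   # never refreshed after the withdrawal
    ("rtr2", "203.0.113.0/24"): 2000,  # re-announced later
    ("rtr1", "198.51.100.0/24"): 100,  # never withdrawn
}
print(zombie_candidates(rib_in, {"203.0.113.0/24": 1000}, now=1500))
print(withdrawn_but_advertised({"198.51.100.0/24"},
                               {"198.51.100.0/24", "203.0.113.0/24"}))
```

The grace period matters in practice: flagging a route seconds after its withdrawal would mostly catch ordinary convergence delay rather than genuine zombies.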

Layer 3: External Vantage Point Monitoring

The ThousandEyes BGP Stuck Route Observatory uses beacon prefix advertisements across hundreds of global BGP monitors to detect stuck routes per ASN. BGPalerter (developed by NTT) self-configures monitoring from RIPE RIS Live streams and detects route hijacks, visibility loss, and RPKI-invalid announcements. BGPStream/PyBGPStream (CAIDA) provides unified access to RouteViews, RIPE RIS, and BMP data for custom analysis.

Streaming Telemetry and Automation

Modern detection pipelines combine gNMI subscriptions with BMP data in a centralized analytics platform. The OpenConfig YANG model openconfig-rib-bgp exposes five logical RIBs per address family: loc-RIB, adj-RIB-in-pre, adj-RIB-in-post, adj-RIB-out-pre, and adj-RIB-out-post, each with INVALID_ROUTE_REASON identities. Tools like gNMIc (Nokia) and gnmi-gateway (Netflix) feed Prometheus or Kafka for downstream alerting.


Prevention Strategies: Protocol Level to Operational Practice

RFC 9687 SendHoldTimer: The Primary Protocol Fix

Every hyperscale operator should prioritize vendor adoption of RFC 9687. It directly addresses the TCP zero-window vulnerability by adding a SendHoldTimer that fires when no BGP message can be successfully transmitted for the configured duration. OpenBGPD and FRRouting have early implementations. Cisco, Juniper, and Arista have not yet published deployment timelines.

Enhanced Route Refresh (RFC 7313)

RFC 7313 transforms basic route refresh into a stuck route cleanup mechanism through Beginning-of-Route-Refresh (BoRR) and End-of-Route-Refresh (EoRR) markers. Upon receiving BoRR, the router marks all routes from that peer as stale. The peer then re-advertises its entire adj-RIB-out, and each re-advertised route replaces its stale copy. Upon receiving EoRR, the router immediately purges any routes still marked stale. This mark-and-sweep approach provides complete route table resynchronization without session teardown. For hyperscale datacenter fabrics, scheduling Enhanced Route Refresh weekly during low-traffic windows is recommended.
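
The mark-and-sweep sequence can be sketched in a few lines. This is an illustrative model of the receiver side only; the PeerRib class and its method names are hypothetical and do not correspond to any vendor or open-source implementation.

```python
class PeerRib:
    """Routes learned from one peer, with RFC 7313-style mark and sweep."""

    def __init__(self):
        self.routes = {}    # prefix -> path attributes
        self.stale = set()  # prefixes marked stale during a refresh

    def on_borr(self):
        # Mark phase: every route currently held from this peer becomes stale.
        self.stale = set(self.routes)

    def on_update(self, prefix, attrs):
        # A re-advertised route replaces the old copy and is no longer stale.
        self.routes[prefix] = attrs
        self.stale.discard(prefix)

    def on_withdraw(self, prefix):
        self.routes.pop(prefix, None)
        self.stale.discard(prefix)

    def on_eorr(self):
        # Sweep phase: anything the peer did not re-advertise is purged.
        purged = sorted(self.stale)
        for prefix in purged:
            del self.routes[prefix]
        self.stale.clear()
        return purged


rib = PeerRib()
for prefix in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"):
    rib.on_update(prefix, {"as_path": [65001]})

# The peer withdrew 203.0.113.0/24 earlier, but the withdrawal was lost.
rib.on_borr()
rib.on_update("192.0.2.0/24", {"as_path": [65001]})
rib.on_update("198.51.100.0/24", {"as_path": [65001]})
print(rib.on_eorr())        # ['203.0.113.0/24']
print(sorted(rib.routes))   # ['192.0.2.0/24', '198.51.100.0/24']
```

The zombie is removed without any knowledge of why the original withdrawal was lost, which is what makes the refresh useful as a periodic cleanup.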

Add-Path (RFC 7911) to Reduce Blast Radius

Traditional route reflectors reflect only the best path, meaning a single zombie at the RR affects all clients. With Add-Path, RR clients receive multiple paths and can make independent bestpath decisions. If one path is stuck, pre-installed alternates enable continued forwarding. Configure add-path send path-count 6 on RR-to-client sessions for meaningful path diversity.

Graceful Restart and BFD Interaction

GR and BFD fundamentally conflict. BFD triggers fast failover while GR prevents it. The safe combination requires BFD implemented in the forwarding plane with C-bit = 1, where BFD failure indicates data plane failure, while BFD staying up during control plane restart enables GR helper mode. For hyperscale datacenter fabrics with sub-second failover requirements and alternate paths, prefer BFD without GR.

Recommended Timer Configuration

Timer           | Recommendation           | Rationale
Hold Time       | 90s (eBGP), 180s (iBGP)  | Balance detection speed with false-positive risk
MRAI            | 0 to 1s in fabric        | Critical for sub-second convergence
GR Restart Time | 120 to 180s              | Match actual observed restart duration
GR Stale Routes | 360 to 600s              | Buffer above restart time
SendHoldTime    | 2× HoldTime              | Detect blocked TCP connections (RFC 9687)
Connect Retry   | 30s                      | Faster re-establishment after failures

Table 2: Recommended Timer Settings for Datacenter Environments
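
The relationships between these timers lend themselves to an automated sanity check in a configuration pipeline. The sketch below is one possible shape for such a check; the dictionary keys and the check_timers function are hypothetical names, not a vendor schema, and the thresholds simply restate the table.

```python
def check_timers(t):
    """Return a list of problems found in a dict of timer values (seconds)."""
    problems = []
    if t["send_hold_time_s"] < 2 * t["hold_time_s"]:
        problems.append("SendHoldTime is below 2x HoldTime")
    if not 0 <= t["mrai_s"] <= 1:
        problems.append("MRAI is outside the 0 to 1s fabric range")
    if t["gr_stale_time_s"] <= t["gr_restart_time_s"]:
        problems.append("GR stale time leaves no buffer above restart time")
    return problems


fabric = {
    "hold_time_s": 90,
    "send_hold_time_s": 180,
    "mrai_s": 0,
    "gr_restart_time_s": 120,
    "gr_stale_time_s": 360,
}
print(check_timers(fabric))  # []

misconfigured = dict(fabric, send_hold_time_s=90, gr_stale_time_s=100)
print(check_timers(misconfigured))
```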


Real-World Incidents That Demonstrate the Stakes

CenturyLink/Level3: The Definitive Stuck Route Catastrophe

The August 30, 2020 CenturyLink/Lumen outage remains the clearest documented case of BGP stuck routes causing catastrophic real-world impact. A faulty BGP Flowspec rule originating from CenturyLink's Mississauga data center blocked all internal BGP traffic across their Tier 1 backbone (AS 3356). The failure was self-reinforcing: the Flowspec rule killed BGP, BGP restarted and received the rule again, and then crashed again. This infinite loop prevented withdrawals from propagating, so external ASes continued routing toward CenturyLink for prefixes that were no longer internally reachable. In total, 3.5% of global Internet traffic was affected for approximately 5 hours. BGP update volume spiked from the normal 1.5 to 2 MB per 15-minute period to over 26 MB.

Measured Zombie Prevalence Across the Internet

The PAM 2019 study established ground truth using 27 RIPE RIS beacon prefixes cycled at 2- to 4-hour intervals. The ANRW 2021 follow-up extended detection to all Internet prefixes, discovering over 6.5 million zombies in six years, with 94% causing incoherent states and 468 confirmed routing loops. The ACM IMC 2025 study refined the methodology, confirming that zombies are a persistent, regular phenomenon rather than rare edge cases.

The June 2023 BGP Prefix-SID Attribute Incident

On June 2, 2023, a small Brazilian network re-announced a route with a corrupt BGP Prefix-SID attribute (type 40, all data 0x00) with the transitive bit set. Junos routers passed the corrupt attribute on unchanged, Arista EOS devices reset sessions upon receiving it, and BIRD-based IXP route servers propagated it to all clients across multi-terabit exchanges. This created a persistent stuck route: when sessions restarted, the bad route was retransmitted, causing repeated resets. Approximately 100 networks were affected, with message rates spiking from 30,000/s to over 150,000/s.

FRRouting: Open-Source Stuck Routes in Production

FRRouting's bug tracker provides unusually transparent documentation of stuck route behavior. Issue #16641 describes routes persisting in Linux kernel tables after session teardown. Issue #15626 documents orphan routes with bgp suppress-fib-pending enabled. Issue #7865 is the most alarming: after 4 to 48 hours, routes are rejected by the kernel, and only a full system reboot resolves the issue. These bugs are particularly relevant for hyperscale operators running FRR on white-box switches.


Conclusion: From Protocol Flaw to Operational Discipline

BGP stuck routes have evolved from an obscure curiosity first noted at RIPE 42 in 2002 to a quantified, daily phenomenon affecting the global routing system. The identification of the TCP zero window root cause and its codification in RFC 9687 represents the most important protocol-level advance, but it addresses only one vector among many.

Three operational priorities emerge for hyperscale environments:

First, deploy the RFC 9687 SendHoldTimer as vendors ship implementations. This eliminates the single largest known cause.

Second, implement continuous BMP monitoring with adj-RIB-in/out correlation and automated Enhanced Route Refresh (RFC 7313) remediation. This catches zombies from any cause rather than depending on protocol fixes alone.

Third, design for resilience through Add-Path on route reflectors, RPKI ROV on all external sessions, and Batfish-validated configuration pipelines. This reduces both the probability and the blast radius of stuck routes.

The NSDI 2025 proposal for Route Status Transparency (RoST) suggests the research community sees stuck routes as warranting fundamental architectural change. Until such mechanisms mature, a layered defense of protocol-level timers, continuous monitoring, and disciplined operational practice remains the most effective strategy against BGP's zombie problem.


Key References

RFC 9687 - Border Gateway Protocol 4 (BGP-4) Send Hold Timer (November 2024)

RFC 7313 - Enhanced Route Refresh Capability for BGP-4

RFC 7854 - BGP Monitoring Protocol (BMP)

RFC 7911 - Advertisement of Multiple Paths in BGP (Add-Path)

RFC 4724 - Graceful Restart Mechanism for BGP

RFC 9494 - Long-Lived Graceful Restart for BGP

Fontugne et al., "BGP Zombies: An Analysis of Beacons Stuck Routes" (PAM 2019)

Ongkanchana et al., "Hunting BGP Zombies in the Wild" (ANRW 2021)

Xygkou et al., "A First Look into Long-lived BGP Zombies" (ACM IMC 2025)

Anahory et al., "Suppressing BGP Zombies with Route Status Transparency" (NSDI 2025)

ThousandEyes BGP Stuck Route Observatory (thousandeyes.com)