
Cilium Deep Dive: What Replacing kube-proxy Actually Means

Cilium replaces kube-proxy with eBPF, but some iptables remain. Real Helm values, BPF map stats, and the gaps most guides skip.
[Hero image: isometric visualization of eBPF programs attached to the Linux kernel networking stack, bypassing traditional iptables chains]

Every Kubernetes cluster runs kube-proxy. It's the component you never think about until you have 200 services and your iptables chains take seconds to update. Cilium replaces kube-proxy entirely with eBPF programs that run inside the Linux kernel, turning O(n) packet processing into O(1) hash lookups. This isn't theoretical. My 3-node homelab has been running without kube-proxy for months, and the difference shows up in ways I didn't expect.

This is Part 3 of the Building a Production-Grade Homelab series. Part 1 covered why kubeadm. Part 2 covered HA with kube-vip. Now we're replacing the default networking data plane.

What kube-proxy Actually Does

kube-proxy watches the Kubernetes API for Service and Endpoints objects, then programs the host's networking rules so pods can reach services. In its default iptables mode, that means generating iptables rules for every Service/Endpoint pair.

The problem is structural. When a packet arrives at a node, the kernel evaluates iptables rules sequentially until it finds a match. Add 200 services with 3 backends each, and you're looking at 600+ rules evaluated per packet. Every time a Service or Endpoint changes, kube-proxy flushes and rewrites the entire chain while holding an atomic lock on the kernel tables.

IPVS mode improves lookup performance with hash tables, but it still relies on the kernel's conntrack subsystem for connection tracking. Under high throughput, conntrack table exhaustion leads to dropped packets. IPVS is a better implementation of the same architecture, not a different approach.

Neither mode provides any visibility into the traffic it routes. If a packet gets dropped, you're reaching for tcpdump.
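The scaling difference is easy to feel even in a toy sketch. This bash snippet is purely illustrative (fake "rules", not real iptables or BPF): it builds 600 entries, the scale of the 200-services-times-3-backends example above, and contrasts a sequential chain walk with a keyed hash lookup.

```shell
#!/usr/bin/env bash
# Toy illustration only: fake "rules", not real iptables or BPF.
# 200 services x 3 backends = 600 rules, the scale from the example above.
rules=()
declare -A svc_map
for i in $(seq 1 600); do
  rules+=("svc-$i")
  svc_map["svc-$i"]="backend-$i"
done

# What iptables does per packet: walk the chain until a rule matches.
lookup_sequential() {
  local target=$1 r
  for r in "${rules[@]}"; do
    if [[ $r == "$target" ]]; then
      echo "${svc_map[$r]}"
      return
    fi
  done
}

# What a BPF hash map does: one keyed probe, regardless of table size.
lookup_hash() {
  echo "${svc_map[$1]}"
}

lookup_sequential "svc-600"   # -> backend-600, after scanning all 600 entries
lookup_hash "svc-600"         # -> backend-600, single probe
```

The sequential version gets slower as services are added; the hash version does not. That is the whole argument, in miniature.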

How Cilium eBPF Changes the Model

Cilium doesn't optimize iptables. It bypasses iptables entirely by attaching eBPF programs directly to the kernel's networking hooks. Service lookups happen through BPF hash maps instead of sequential rule chains.

|                | kube-proxy (iptables)           | Cilium eBPF                     |
|----------------|---------------------------------|---------------------------------|
| Service lookup | Sequential rule matching (O(n)) | Hash map lookup (O(1))          |
| East-west LB   | Per-packet DNAT at prerouting   | Socket-level at connect()       |
| Rule updates   | Flush and rewrite all chains    | Atomic BPF map update           |
| Observability  | None                            | Hubble (L3/L4/L7 flows)         |
| Masquerading   | iptables MASQUERADE             | eBPF (optional, bpf.masquerade) |

Socket-Level Load Balancing

This is the part most articles skip. For east-west traffic (pod-to-service within the cluster), Cilium intercepts at the connect() system call, before a packet even exists. When Pod A connects to a ClusterIP, Cilium's eBPF program translates the destination to the backend pod's real IP at the socket layer. The kernel then builds the packet with the correct destination from the start.

The result: no per-packet DNAT, no SNAT, no conntrack entries for intra-cluster traffic. The kernel sees direct pod-to-pod connections. On the homelab, the cilium_lb4_reverse_sk BPF map holds over 71,000 entries tracking socket-level translations for established connections.

North-South: XDP and Direct Server Return

For traffic entering the cluster from outside, Cilium can attach to XDP (eXpress Data Path) hooks on the network driver itself, processing packets before the kernel allocates socket buffers. With Direct Server Return (DSR), the backend pod responds directly to the client without routing back through the load balancer node. Maglev consistent hashing ensures backend changes affect minimal active connections.
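The homelab doesn't run any of this: XDP acceleration and DSR generally require native routing rather than VXLAN tunneling, plus a NIC driver with XDP support. For reference, the relevant Helm values would look roughly like the following; treat the exact combination as a sketch against Cilium 1.18.x, and the CIDR as a placeholder to adjust for your cluster.

```yaml
# Sketch only: not the homelab's config (it runs VXLAN tunneling, which rules these out).
routingMode: native                  # DSR and XDP acceleration need native routing
autoDirectNodeRoutes: true           # install pod routes between nodes on the same L2
ipv4NativeRoutingCIDR: "10.0.0.0/8"  # placeholder pod CIDR, adjust to your cluster

loadBalancer:
  mode: dsr              # backends reply directly to the client
  algorithm: maglev      # consistent hashing for backend selection
  acceleration: native   # attach to XDP on the driver, if the NIC supports it
```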

The Homelab Configuration

Three Lenovo M80q nodes (i5-10400T, 16GB RAM) running Kubernetes v1.35.0 via kubeadm, with Cilium v1.18.6 as the CNI. The kernel is 6.8.0 (Ubuntu 24.04 HWE).

Step 1: Skip kube-proxy at Bootstrap

The cleanest way to replace kube-proxy is to never install it. During kubeadm init, the --skip-phases flag prevents the kube-proxy DaemonSet from being created:

kubeadm init \
  --config=kubeadm-config.yaml \
  --upload-certs \
  --skip-phases=addon/kube-proxy

This is handled by Ansible playbook 03-init-cluster.yml. No kube-proxy pods, no leftover iptables rules, no conflict with Cilium.

Step 2: Install Cilium via Helm

The Ansible playbook 04-cilium.yml installs Cilium through Helm rather than the cilium install CLI. Helm provides version-pinned, reproducible installs and integrates with GitOps workflows.

The full values.yaml:

cluster:
  name: homelab

operator:
  replicas: 1

routingMode: tunnel
tunnelProtocol: vxlan

gatewayAPI:
  enabled: true

kubeProxyReplacement: true

k8sServiceHost: "10.10.30.10"   # kube-vip VIP
k8sServicePort: "6443"

l7Proxy: true

l2announcements:
  enabled: true

externalIPs:
  enabled: true

devices: "eno1"
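For reference, a version-pinned install with that values file looks roughly like this; the chart repo is the official one, but the release name and exact flags are assumptions, since the Ansible task may wrap this differently:

```shell
# Assumed invocation; playbook 04-cilium.yml may differ in detail.
helm repo add cilium https://helm.cilium.io/
helm repo update
helm upgrade --install cilium cilium/cilium \
  --version 1.18.6 \
  --namespace kube-system \
  --values values.yaml
```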

A few values worth explaining:

k8sServiceHost: "10.10.30.10" points to the kube-vip floating VIP from Part 2. Without kube-proxy, Cilium can't discover the API server through the default in-cluster kubernetes Service. You have to tell it explicitly.

routingMode: tunnel uses VXLAN encapsulation for pod-to-pod traffic. This works on any flat L2 network without BGP or custom router configuration. Native routing would give better performance, but VXLAN is simpler to operate and debug.

devices: "eno1" tells Cilium which interface to use for L2 announcements; on the M80q nodes, that's the Intel I219-LM NIC.

What's Not in the Values File (and Why It Matters)

The homelab verification revealed two settings that are not configured but probably should be:

bpf.masquerade: true would move pod-to-external SNAT from iptables into eBPF. Right now, Cilium replaces kube-proxy's service routing chains but still uses iptables for masquerading. The cilium status output confirms this: Masquerading: IPTables [IPv4: Enabled]. "Replacing kube-proxy" doesn't mean "removing all iptables." Worth understanding the distinction.

bpf.hostLegacyRouting: false would enable native eBPF host routing, bypassing the veth pair and host network stack for pod-to-host traffic. The cluster runs in Legacy host routing mode (Host: Legacy in status output). The kernel (6.8.0) supports native mode. This is a performance tuning opportunity I haven't needed yet on a 3-node cluster, but it's the kind of detail a deep-dive post should call out honestly.
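If I wanted to close both gaps, the Helm delta would be small. This fragment is untested on the homelab and hedged accordingly; the 6.8 kernel satisfies the requirements for both settings:

```yaml
# Untested on this cluster: candidate values.yaml additions.
bpf:
  masquerade: true          # move pod-to-external SNAT from iptables into eBPF
  hostLegacyRouting: false  # native eBPF host routing, bypassing the host stack
```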

L2 Announcements: Replacing MetalLB

On bare-metal clusters, LoadBalancer Services need something to advertise their IPs on the local network. MetalLB is the standard answer. Cilium's L2 Announcements feature does the same thing without an additional component.

The feature is Beta in Cilium 1.18.x, but it's been stable enough for production homelab use. Four LoadBalancer services have been running on it for weeks with zero issues.

Two custom resources configure it:

CiliumLoadBalancerIPPool defines the available IP range:

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
    - start: "10.10.30.20"
      stop: "10.10.30.99"

CiliumL2AnnouncementPolicy controls which interfaces and node types respond to ARP requests:

apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: homelab-l2
spec:
  interfaces:
    - ^eno.*
    - ^eth.*
    - ^enp.*
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux
  loadBalancerIPs: true
  externalIPs: true

How It Works Under the Hood

Cilium uses Kubernetes Leases for per-service leader election. For each LoadBalancer IP, one node wins the lease and starts responding to ARP requests for that IP. If the node goes down, another node acquires the lease and takes over.

On the homelab, all four leases currently sit on the same node (k8s-cp2):

| Lease           | Holder  | Service IP  |
|-----------------|---------|-------------|
| homelab-gateway | k8s-cp2 | 10.10.30.20 |
| gitlab-shell-lb | k8s-cp2 | 10.10.30.21 |
| otel-collector  | k8s-cp2 | 10.10.30.22 |
| adguard-dns     | k8s-cp2 | 10.10.30.53 |

This is by design. L2 networking is inherently Active/Standby per IP; only one node can own a VIP at a time. The fact that all four landed on the same node means if k8s-cp2 goes down, all four services fail over simultaneously. The recovery is automatic (another node acquires each lease), but there's a brief disruption window. BGP-based load balancing can spread VIPs across nodes, but requires router support that most home networks don't have.

What Cilium Replaces

The consolidation argument is real. The homelab runs one component where most clusters need three to five:

| Function              | Traditional Stack    | Homelab (Cilium)        |
|-----------------------|----------------------|-------------------------|
| CNI                   | Calico or Flannel    | Cilium                  |
| Service proxy         | kube-proxy           | Cilium eBPF             |
| Network policies      | Calico               | Cilium                  |
| LoadBalancer IPs      | MetalLB              | Cilium L2 Announcements |
| Ingress controller    | NGINX or Traefik     | Cilium Gateway API      |
| Network observability | Prometheus exporters | Hubble                  |

Fewer components means fewer things to upgrade, fewer resource consumers, and fewer failure modes. The Gateway API integration (covered in detail in Part 4) serves 16 HTTPRoutes across 13 namespaces through a single gatewayClassName: cilium configuration.

Hubble: Observability Without Sidecars

Hubble reads flow data from the same eBPF programs that handle packet routing. No sidecar proxies, no packet capture, no additional CPU overhead. On the homelab, Hubble processes 212-303 flows per second from ~30 pods.

# See what's being dropped and why
hubble observe --verdict DROPPED

# Watch HTTP traffic to a specific service
hubble observe --pod grafana --protocol http

# All flows from a namespace
hubble observe --from-namespace monitoring

The practical value shows up during debugging. With kube-proxy, a dropped packet is silent. You don't know it happened unless the application reports a timeout. Hubble explicitly logs every drop with a reason: Policy denied, Stale endpoint, TCP flags invalid. That turns hours of tcpdump analysis into a single CLI command.

The default Hubble buffer holds 4,095 flows before older entries are evicted. At 300 flows/s, that's roughly 13 seconds of history. For deeper analysis, you'd increase hubble.bufferSize or enable Hubble metrics export to Prometheus.
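That 13-second window is just buffer size divided by flow rate; a one-line sanity check with this cluster's numbers:

```shell
# Hubble ring buffer retention at the homelab's peak flow rate.
buffer=4095   # default buffer size in flows
rate=300      # observed peak flows per second
echo "$((buffer / rate)) seconds of history"   # -> 13 seconds of history
```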

Verifying the Replacement

After installation, verify that kube-proxy is truly gone and Cilium owns the data plane:

# No kube-proxy pods or DaemonSet
kubectl get pods -n kube-system -l k8s-app=kube-proxy
# (empty)

# No iptables service chains
sudo iptables-save | grep KUBE-SVC
# (empty)

# Cilium confirms replacement
cilium status
# KubeProxyReplacement: True [eno1, Direct Routing]

The eBPF maps tell the rest of the story:

| BPF Map                | Entries | Purpose                                          |
|------------------------|---------|--------------------------------------------------|
| cilium_lb4_services_v2 | 200     | All services (ClusterIP + NodePort + LB)         |
| cilium_lb4_backends_v3 | 87      | Backend pod addresses                            |
| cilium_lb4_reverse_sk  | 71,312  | Socket reverse lookup for established connections |
| cilium_ipcache_v2      | 100     | IP-to-identity mapping                           |
| cilium_lxc             | 26      | Local endpoints on this node                     |

The 200 service entries in the cilium_lb4_services_v2 map are what kube-proxy would have rendered as 200+ iptables chains. In Cilium, they're hash table entries with O(1) lookup time. The cilium_lb4_reverse_sk map with 71,312 entries is particularly interesting: each entry represents a socket cookie for an established connection, enabling Cilium to bypass NAT entirely for ongoing traffic.

Resource Cost

Cilium is not free. Each node runs a Cilium agent pod:

| Node    | CPU | Memory | Uptime |
|---------|-----|--------|--------|
| k8s-cp1 | 60m | 183Mi  | 13d    |
| k8s-cp2 | 54m | 261Mi  | 13d    |
| k8s-cp3 | 66m | 284Mi  | 17d    |

That's roughly 5-6% of a single core and 200-300MB of RAM per node. The operator pod adds negligible overhead. For a 3-node cluster with 16GB RAM per node, this is comfortable. Zero restarts across all pods over extended uptime.

154 internal controllers are running healthy. 13 modules are stopped (disabled features: encryption, BIG TCP, bandwidth manager, SRv6, host firewall). One module reports degraded status, likely related to TLSRoute CRD expectations in the operator logs (Cilium 1.18.x looks for experimental Gateway API CRDs even when only standard CRDs are installed).

Version Compatibility

Cilium 1.18.x officially supports Kubernetes 1.30 through 1.33. The homelab runs K8s 1.35.0, which is outside the official matrix. It works. Zero restarts, 154/154 controllers healthy, all features operational.

Version matrices are conservative by necessity. Upstream Kubernetes maintains backward compatibility for API versions across several minor releases. The practical risk of running Cilium 1.18.x on K8s 1.35 is low, but it means you're responsible for catching compatibility issues yourself rather than relying on the vendor's CI matrix.

The operator logs do show "v1 Endpoints is deprecated in v1.33+" warnings. Cilium 1.18.x still uses v1 Endpoints alongside EndpointSlice. Not a functional problem, but expect the next major Cilium release to drop v1 Endpoints support.

The Bigger Picture

Kubernetes itself is moving in this direction. KEP 4004 deprecated status.nodeInfo.kubeProxyVersion in K8s 1.31, acknowledging that kube-proxy may not be present. An nftables mode for kube-proxy landed as Beta in K8s 1.31, providing a modernized backend for environments that can't run eBPF, but not changing the fundamental architecture.

GKE Dataplane V2, AWS EKS Anywhere, and Azure CNI all use Cilium under the hood. It's one of the fastest-growing CNCF graduated projects, with over 506,000 contributions and 4,400+ individual contributors.

For the homelab, the argument is simpler. Cilium replaces kube-proxy, MetalLB, and a separate ingress controller with one component. Hubble gives you network observability that would otherwise require a service mesh. The eBPF data plane handles 200 services with the same lookup time it would handle 20,000. And the whole thing runs on 60m CPU and 250MB RAM per node.

What's Next

This post is the third in the "Building a Production-Grade Homelab" series:

  1. Why kubeadm Over k3s, RKE2, and Talos
  2. HA Control Plane with kube-vip
  3. Cilium Deep Dive: eBPF Networking (you are here)
  4. Gateway API vs Ingress: No Ingress Controller Needed
  5. Distributed Storage with Longhorn: 2 Replicas Are Enough
  6. The Modern Logging Stack: Loki + Alloy
  7. Alerting That Actually Works: Discord, Email, and Dead Man's Switches
  8. Self-Hosted GitLab: CI/CD Without Cloud Vendor Lock-in

Part 4 covers the 16 HTTPRoutes the homelab serves through Cilium's native GatewayClass, and why the ingress-nginx controller's retirement makes Gateway API the clear successor.