Kubernetes Loop

I’ve been diving deep into systems architecture lately, specifically Kubernetes.

Strip away the UIs, the YAML, and the ceremony, and Kubernetes boils down to:

A very stubborn, event-driven collection of control loops

aka the reconciliation (control) loop. Everything I read calls this the “gold standard” for distributed control planes.

Why? Because it decomposes the control plane into many small, independent loops, each continuously correcting drift rather than trying to execute perfect one-shot workflows. These loops are triggered by events or state changes, but what they do is determined by the gap between the declared spec and the observed state (status).

Now we have both:

  • spec: desired state
  • status: observed state

Kubernetes lives in that gap.

When spec and status match, everything’s quiet. When they don’t, something wakes up to ensure current state matches the declared state.
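To make that gap concrete, here is a minimal client-go sketch, assuming a kubeconfig in the default location and a Deployment named nginx in the default namespace (both just placeholders): it reads one object back and compares its spec against its status.

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client from the local kubeconfig (assumed to be at ~/.kube/config).
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    // Read the object back from the API server.
    dep, err := client.AppsV1().Deployments("default").Get(context.TODO(), "nginx", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }

    desired := *dep.Spec.Replicas        // spec: what was asked for
    observed := dep.Status.ReadyReplicas // status: what the controllers report
    fmt.Printf("desired=%d observed=%d in sync=%v\n", desired, observed, desired == observed)
}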

The Architecture of Trust

In Kubernetes, components don’t coordinate via direct peer-to-peer orchestration; they coordinate by writing to and watching one shared “state.”

That state lives behind the API server, which validates it and persists it into etcd.

Role of the API server

The API server is the front door to the cluster’s shared truth: it’s the only place that can accept, validate, and persist declared intent as Kubernetes API objects (metadata/spec/status).

When you install a CRD, you’re extending the API itself with a new type (a new endpoint) and a schema the API server can validate against.

When we use kubectl apply (or any client) to submit YAML/JSON to the API server, the API server validates it (built-in rules, CRD OpenAPI v3 schema / CEL rules, and potentially admission webhooks) and rejects invalid objects before they’re stored.

If the request passes validation, the API server persists the object into etcd (the whole API object, not just the “intent”).
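Here is a hedged sketch of that round trip in Go with client-go (the Deployment name, namespace, and image are illustrative; kubectl apply ends up doing the same dance over HTTP): the API server either rejects the object or persists it, and only then do the loops have something to reconcile toward.

package main

import (
    "context"
    "fmt"
    "log"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        log.Fatal(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    replicas := int32(2)
    dep := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{Name: "nginx", Namespace: "default"},
        Spec: appsv1.DeploymentSpec{
            Replicas: &replicas,
            Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "nginx"}},
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "nginx"}},
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{Name: "nginx", Image: "nginx:1.25"}},
                },
            },
        },
    }

    created, err := client.AppsV1().Deployments("default").Create(context.TODO(), dep, metav1.CreateOptions{})
    if apierrors.IsInvalid(err) {
        // Failed schema/validation checks: nothing was written to etcd.
        log.Fatalf("rejected before storage: %v", err)
    }
    if err != nil {
        log.Fatal(err)
    }
    // Accepted: metadata + spec are now persisted; status starts out empty.
    fmt.Println("stored at resourceVersion", created.ResourceVersion)
}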

Once stored, controllers/operators (loops) watch those objects and run reconciliation to push the real world toward what’s declared.

In practice, most controllers don’t act directly on raw watch events; they consume changes through informer caches and queue work onto a rate-limited workqueue. They also often watch related/owned resources (secondary watches), not just the primary object, to stay convergent.
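A rough sketch of that wiring using client-go’s informer and workqueue packages (the resource and resync interval here are arbitrary choices, not gospel): event handlers only enqueue object keys, and the actual reconciling happens in separate workers.

package controller

import (
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/util/workqueue"
)

// buildQueue wires a Deployment informer to a rate-limited workqueue.
// Watch events are cached locally; only the object's namespace/name key is queued.
func buildQueue(client kubernetes.Interface, stop <-chan struct{}) workqueue.RateLimitingInterface {
    queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

    factory := informers.NewSharedInformerFactory(client, 10*time.Minute) // periodic resync as a safety net
    inf := factory.Apps().V1().Deployments().Informer()

    enqueue := func(obj interface{}) {
        if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
            queue.Add(key)
        }
    }
    inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    enqueue,
        UpdateFunc: func(_, newObj interface{}) { enqueue(newObj) },
        DeleteFunc: func(obj interface{}) {
            if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
                queue.Add(key)
            }
        },
    })

    factory.Start(stop)
    factory.WaitForCacheSync(stop)
    return queue
}

A real controller would also register handlers on the resources it owns (ReplicaSets, Pods, and so on) that map events back to the owner’s key; those are the secondary watches mentioned above.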

The spec is often user-authored, as discussed above, but it isn’t exclusively human-written: the scheduler and some controllers also update parts of it (e.g., scheduling decisions/bindings and defaulting).

Role of etcd cluster

etcd is the control plane’s durable record: the authoritative reference for what the cluster believes should exist and what it currently reports.

If an intent (an API object) isn’t in etcd, controllers can’t converge on it, because there’s nothing recorded to reconcile toward.

This makes the system inherently self-healing: the loops trust the declared state and keep trying to morph the world to match it until the two align.

One tidbit worth noting:

In production, nodes, runtimes, and cloud load balancers can drift independently. Controllers treat those systems as observed state and keep measuring reality against what the API says should exist.

How the Loop Actually Works

Kubernetes isn’t one loop. It’s a bunch of loops (controllers) that all behave the same way (a toy sketch follows the list):

  • read desired state (what the API says should exist)
  • observe actual state (what’s really happening)
  • calculate the diff
  • push reality toward the spec
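Here is a toy version of that loop in plain Go; nothing Kubernetes-specific, the replica counts just stand in for desired and observed state:

package main

import (
    "fmt"
    "time"
)

func main() {
    desired := 3  // read desired state (the "spec")
    observed := 0 // observe actual state (the "status")

    for observed != desired {
        diff := desired - observed // calculate the diff
        fmt.Printf("observed=%d desired=%d -> creating %d replica(s)\n", observed, desired, diff)
        observed += diff // push reality toward the spec
        time.Sleep(100 * time.Millisecond)
    }
    fmt.Println("spec and status match: nothing to do")
}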

 

As an example, let’s walk through a simple nginx workload deployment.

1) Intent (Desired State)

To deploy the nginx workload, you run:

kubectl apply -f nginx.yaml

 

The API server validates the object (against its schema, if it’s a CRD-backed type) and writes it into etcd.

At that point, Kubernetes has only recorded your intent. Nothing has “deployed” yet in the physical sense. The cluster has simply accepted:

“This is what the world should look like.”

2) Watch (The Trigger)

Controllers and schedulers aren’t polling the cluster like a bash script with a sleep 10.

They watch the API server.

When desired state changes, the loop responsible for it wakes up, runs through its logic, and acts:

“New desired state: someone wants an Nginx Pod.”

Watches aren’t gospel. Events can arrive twice, late, or never, and your controller still has to converge. Controllers use list+watch patterns with periodic resync as a safety net. The point isn’t perfect signals; it’s building a loop that stays correct under imperfect signals.

Controllers also don’t spin constantly; they queue work. Events enqueue object keys; workers dequeue and reconcile; failures requeue with backoff. This keeps one bad object from melting the control plane.
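A sketch of the worker side of that queue, using client-go’s workqueue package (the reconcile function itself is left abstract here):

package controller

import "k8s.io/client-go/util/workqueue"

// worker drains the queue: reconcile on success, requeue with backoff on failure.
func worker(queue workqueue.RateLimitingInterface, reconcile func(key string) error) {
    for {
        item, shutdown := queue.Get()
        if shutdown {
            return
        }
        key := item.(string)

        if err := reconcile(key); err != nil {
            queue.AddRateLimited(key) // failed: retry later with exponential backoff per key
        } else {
            queue.Forget(key) // succeeded: reset this key's backoff history
        }
        queue.Done(key)
    }
}

A failing key backs off on its own schedule; everything else in the queue keeps moving.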

3) Reconcile (Close the Gap)

Here’s the mental map that made sense to me:

Kubernetes is a set of level-triggered control loops. You declare desired state in the API, and independent loops keep working until the real world matches what you asked for.

  • Controllers (Deployment/ReplicaSet/etc.) watch the API for desired state and write more desired state.
    • Example: a Deployment creates/updates a ReplicaSet; a ReplicaSet creates/updates Pods.
  • The scheduler finds Pods with no node assigned and picks a node.
    • It considers resource requests, node capacity, taints/tolerations, node selectors, (anti)affinity, topology spread, and other constraints.
    • It records its decision by setting spec.nodeName on the Pod (via the pods/binding subresource; see the sketch after this list).
  • The kubelet on the chosen node notices “a Pod is assigned to me” and makes it real.
    • pulls images (if needed) via the container runtime (CRI)
    • sets up volumes/mounts (often via CSI)
    • triggers networking setup (CNI plugins do the actual wiring)
    • starts/monitors containers and reports status back to the API
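For the scheduler step, here is a hedged sketch of how a binding gets recorded through the API (the pod and node names are whatever the scheduler picked; this is a simplification of what the real scheduler does via the pods/binding subresource):

package scheduler

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// bind records a scheduling decision: "this Pod runs on that node".
func bind(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod, nodeName string) error {
    binding := &corev1.Binding{
        ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
        Target:     corev1.ObjectReference{Kind: "Node", Name: nodeName},
    }
    // The API server applies the binding and sets spec.nodeName on the Pod;
    // the kubelet on that node then sees the assignment and takes over.
    return client.CoreV1().Pods(pod.Namespace).Bind(ctx, binding, metav1.CreateOptions{})
}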

Each component writes its state back into the API, and the next loop uses that as input. No single component “runs the whole workflow.”

One property makes this survivable: reconcile must be safe to repeat (idempotent). The loop might run once or a hundred times (retries, resyncs, restarts, duplicate/missed watch events), and it should still converge to the same end result.

If the desired state is already satisfied, reconcile should do nothing; if something is missing, it should fill the gap, without creating duplicates or making things worse.
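A minimal sketch of what that looks like for a Deployment-shaped object (only the replica count is compared here, purely to keep the example short):

package controller

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// reconcileDeployment is safe to run any number of times: it only acts on the gap.
// It assumes desired.Spec.Replicas is set by the caller.
func reconcileDeployment(ctx context.Context, client kubernetes.Interface, desired *appsv1.Deployment) error {
    current, err := client.AppsV1().Deployments(desired.Namespace).Get(ctx, desired.Name, metav1.GetOptions{})
    if apierrors.IsNotFound(err) {
        // Missing: fill the gap by creating it.
        _, err = client.AppsV1().Deployments(desired.Namespace).Create(ctx, desired, metav1.CreateOptions{})
        return err
    }
    if err != nil {
        return err // transient error: let the workqueue retry with backoff
    }

    if *current.Spec.Replicas == *desired.Spec.Replicas {
        return nil // already satisfied: do nothing, no duplicates
    }

    current.Spec.Replicas = desired.Spec.Replicas
    _, err = client.AppsV1().Deployments(desired.Namespace).Update(ctx, current, metav1.UpdateOptions{})
    return err
}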

What about concurrent updates, when two controllers try to update the same object at the same time?

Kubernetes handles this with optimistic concurrency. Every object has a resourceVersion (“which version of this object did you read?”). If you try to write an update using an older version, the API server rejects it with a conflict.

The flow then is: re-fetch the latest object, apply your change again, and retry.
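client-go ships a helper for exactly that fetch-modify-retry dance; a short sketch, with the namespace and name as placeholders:

package controller

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/util/retry"
)

// scale re-reads the object on every attempt, so each Update carries the latest
// resourceVersion; on a 409 Conflict, RetryOnConflict simply tries again.
func scale(ctx context.Context, client kubernetes.Interface, ns, name string, replicas int32) error {
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        dep, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
        if err != nil {
            return err
        }
        dep.Spec.Replicas = &replicas
        _, err = client.AppsV1().Deployments(ns).Update(ctx, dep, metav1.UpdateOptions{})
        return err // a conflict here triggers another fetch-modify-update round
    })
}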

4) Status (Report Back)

Once the pod is actually running, status flows back into the API: the kubelet reports the Pod’s status, and the controllers above it roll that up into ReplicaSet and Deployment status.

The Loop Doesn’t Protect You From Yourself

What if the declared state says to delete something critical like kube-proxy or a CNI component? The loop doesn’t have opinions. It just does what the spec says.

A few things keep this from being a constant disaster:

  • Control plane components are special. The API server, etcd, the scheduler, and the controller-manager usually run as static pods managed directly by the kubelet, not through the API. The reconciliation loop can’t easily delete the thing running the reconciliation loop as long as its manifest exists on disk.
  • DaemonSets recreate pods. Delete a kube-proxy pod and the DaemonSet controller sees “desired: 1, actual: 0” and spins up a new one. You’d have to delete the DaemonSet itself.
  • RBAC limits who can do what. Most users can’t touch kube-system resources.
  • Admission controllers can reject bad changes before they hit etcd.

But in the end, if your source of truth says “delete this,” the system will try. The model assumes your declared state is correct. Garbage in, garbage out.

Why This Pattern Matters Outside Kubernetes

This pattern shows up anywhere you manage state over time.

Scripts are fine until they aren’t:

  • they assume the world didn’t change since last run
  • they fail halfway and leave junk behind
  • they encode “steps” instead of “truth”

A loop is simpler:

  • define the desired state
  • store it somewhere authoritative
  • continuously reconcile reality back to it

ASICs in Cisco Catalyst switches

Preface-

After I started working with open networking switches, I wanted to know more about the Cisco Catalyst range I work with every day.

Information on older ASICs is very hard to find, but recently Cisco has started to talk a lot about the newer chips like UADP 2.0 with the Catalyst 9K / Nexus launch. This is most likely due to the rise of disaggregated network operating systems (DNOS) such as Cumulus and PICA8, which are forcing customers to be more aware of what’s under the hood rather than simply believing shiny PDFs with a laundry list of features.

The information was there but scattered all over the web. I went through Cisco Live and Tech Field Day slides/videos, interviews, partner update PDFs, press releases, whitepapers, and even LinkedIn profiles to gather it.

If you notice a mistake, please let me know.

Scope –

We are going to focus on the ASICs used in the well-known 2960-S/X/XR, the 36xx/37xx/38xx, and the new Catalyst 9K series.

Timeline

 

Summary

Cisco bought a bunch of companies to acquire the switching technology it needed, which later bloomed into the switching platforms we know today:

  • Crescendo Communications (1993) – Catalyst 5K and 6K chassis
  • Kalpana (1994) – Catalyst 3K (fun fact: they invented VLANs, which were later standardized as 802.1Q)
  • Grand Junction (1995) – Catalyst 17xx, 19xx, 28xx, 29xx
  • Granite Systems (1996) – Catalyst 4K (K series ASIC)

After the original Cisco 3750/2950 switches, the Cisco 3xxx/2xxx-G series (G for Gigabit) was released.

Next, the Cisco 3xxx-E series with enterprise management functions was released

Later, Cisco developed the 3750-V series, an energy-saving version of the -E series, which was later replaced by the 3750 V2 series (possibly a die shrink).

The G and E series were later phased out and folded into the Cisco X series, which is still being sold and supported.

In 2017-2018, Cisco released the Catalyst 9K family to replace the 2K and 3K families.

Sasquatch ASIC

From what I could find, there are two variants of this ASIC.

The initial release in 2003

  • Fixed-pipeline ASIC
  • 180-nanometer process
  • 60 million transistors

Shipped with the 10/100 3750 and 2960

Die Shrink to 130nm in 2005

  • Fixed-pipeline ASIC
  • 130-nanometer process
  • 210 million transistors

Shipped in the 2960-G/3560-G/3750-G series

I couldn’t find much info about the chip design; I will update this section when I find more.

 Strider ASIC

Initially released in 2010

  • Fixed-pipeline ASIC
  • Built on a 65-nanometer process
  • 1.3 billion transistors

The Strider ASIC (circa 2010) was an improved design based on the 3750-E series and first shipped with the 2960-S family.

S88G ASIC

Later, in 2012-2013, with a die shrink to 45 nanometers, they managed to fit two ASICs on the same silicon. This shipped with the 2960-X/XR, which replaced the 2960-S:

  • higher stack speeds and more features
  • limited Layer 3 capabilities via the IP Lite feature set (2960-XR only)
  • better QoS and NetFlow Lite

Later down the line, they silently re-rolled the ASIC design onto a 32-nanometer process for better yields and cheaper manufacturing costs.

This switch is still being sold, with no EOL announced, as a cheaper access-layer switch.

On a side note: in 2017 Cisco released another version of the 2960 family, the 2960-L. This is a cheaper version built on a Marvell ASIC (the same as the SG55x), with a web UI and a fanless design. I personally think this is the next evolution of their SMB-oriented family, the popular Cisco SG-5xx series. For the first time, the 2960 family had a fairly usable and pleasant web interface for configuration and management, and the new 9K series seems to include a more polished version of the same web UI.

Unified Access Data Plane (UADP)

Due to the limitations of the fixed-pipeline architecture, and the cost of re-spinning silicon to fix bugs, they needed something new and had three options.

As a compromise between all three, Cisco started dreaming up a programmable ASIC design in 2007-2008. The idea was to build a chip with programmable stages that could be updated via firmware instead of writing the logic into the silicon permanently.

They released the programmable ASIC technology initially in their QFP (Quantum Flow Processor) ASIC for the ISR router family, to meet the needs of service providers and large enterprises.

This chip allowed them to support advanced routing technologies and new protocols without changing hardware, simply via firmware updates, improving the longevity of the investment and letting them make more money out of the chip’s extended life cycle.

Eventually, this technology trickled downstream and the Doppler 1.0 was born

Improvements and features in UADP 1.0/1.1/2.0

  • Programmable stages in the pipeline
  • Cisco intent-based networking support – DNA Center with ISE
  • Integrated stacking support with StackPower – the ASIC is built with pinouts for the stacking fabric, allowing faster stacking performance
  • Rapid Recirculation (Encapsulation such as MPLS, VXLAN)
  • TrustSec
  • Advanced on-chip QoS
  • Software-defined networking support – integrated NetFlow, SD-Access
  • Flex Parser – programmable packet parser, allowing them to introduce support for new protocols with firmware updates
  • On-chip micro-engines – highly specialized engines within the chip that perform repetitive tasks such as
    • Reassembly
    • Encryption/Decryption
    •  Fragmentation
  • CAPWAP – the switch can function as a wireless LAN controller
    • Mobility agent – offloads QoS and other functions from the WLC (IMO this works really nicely with multi-site wireless deployments)
    • Mobility controller – full-blown wireless LAN controller (WLC)
  • Extended life cycle allowed integration of Cisco security technologies such as Cisco DNA + ISE later down the line, even on first-generation switches
  • Multigig and 40GE speed support
  • Advanced malware detection capabilities via packet fingerprinting

Legacy Fixed pipeline architecture

Programmable pipeline architecture

Doppler/UADP 1.0 (2013)

With the Doppler 1.0 programmable ASIC handling the data plane, coupled with a Cavium CPU for the control plane, the first generation of switches to ship with this chip was the 3650 and 3850 Gigabit versions.

  • Built on a 65-nanometer process
  • 1.3 billion transistors

UADP 1.1 (2015)

  • Die shrink to 45 nanometers
  • 3 billion transistors

UADP 2.0 (2017)

  • Built on a 28 nm/16 nm process
  • Equipped with an Intel Xeon D (dual-core x86) CPU for the control plane
  • Open IOS-XE
  • 7.4 billion transistors

Flexible ASIC Templates –  

Allows Cisco to write templates that can optimize the chip resources for different use cases

The new Catalyst 9000 series will replace the following campus switching families, built on the older Strider and the more recent UADP 1.0 and 1.1 ASICs:

  • Catalyst 2K —–> Catalyst 9200
  • Catalyst 3K —–> Catalyst 9300
  • Catalyst 4K —–> Catalyst 9400
  • Catalyst 6K —–> Catalyst 9500

I will update/fix this post when I find more info about UADP 2.0 and the next evolution. Stay tuned for a few more articles on the silicon used in open networking x86 chassis.