2. Architecture & Control Plane
The Two Halves of Kubernetes
Every Kubernetes cluster splits into two layers: the control plane and the worker nodes.
The control plane is the brain. It stores desired state, makes scheduling decisions, and runs reconciliation loops. It never runs your application containers directly.
Worker nodes are the muscle. They run your actual workloads — your pods, your containers. Each node has an agent (kubelet) that takes orders from the control plane and reports back on status.
In managed services like GKE, EKS, or AKS, the cloud provider runs the control plane for you. You only manage the worker nodes. In self-managed clusters (kubeadm, k3s), you manage both.
# See this split in action:
kubectl get nodes
# NAME       STATUS   ROLES           AGE   VERSION
# master-1   Ready    control-plane   30d   v1.29.2
# worker-1   Ready    <none>          30d   v1.29.2
# worker-2   Ready    <none>          30d   v1.29.2

Let's walk through every component, starting with the control plane.
API Server: The Front Door
The API server (kube-apiserver) is the only component that talks to everything. Every interaction — from kubectl commands to internal component communication — goes through it.
It does three things:
1. Authenticates and authorizes every request (who are you, and are you allowed to do this?)
2. Validates and admits the objects in those requests (schema validation, admission control)
3. Persists the accepted state to etcd — it is the only component that writes there
# The API server runs as a pod in kube-system
kubectl get pods -n kube-system | grep apiserver
# kube-apiserver-master-1 1/1 Running 0 30d
# Check its config
kubectl describe pod kube-apiserver-master-1 \
  -n kube-system | grep -A5 "Command:"

Think of the API server as a bouncer + receptionist + filing clerk. Nothing happens in the cluster without it knowing.
etcd: The Source of Truth
etcd is a distributed key-value store that holds all cluster state. Every Deployment, Service, ConfigMap, Secret, and Pod definition lives here.
Only the API server reads from and writes to etcd directly. No other component touches it. This is by design — it centralizes access control and consistency.
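To make this concrete, objects live in etcd as flat keys under a /registry/... prefix, and queries are served as prefix scans. The sketch below is a plain dict standing in for etcd; the key layout mirrors the real convention, but the values are simplified:

```python
# etcd as a flat key-value store, keyed by /registry/<type>/<namespace>/<name>.
# Only the API server holds a connection to it.
etcd = {
    "/registry/deployments/default/api": {"replicas": 3},
    "/registry/services/default/api":    {"clusterIP": "10.96.0.12"},
    "/registry/pods/default/api-7f9c":   {"node": "worker-1"},
}

def list_by_prefix(store, prefix):
    """Prefix scan -- roughly how a 'list pods' request is served."""
    return {k: v for k, v in store.items() if k.startswith(prefix)}

print(list_by_prefix(etcd, "/registry/pods/"))
```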
# etcd runs as a pod (self-managed clusters)
kubectl get pods -n kube-system | grep etcd
# etcd-master-1 1/1 Running 0 30d
# Check etcd health
kubectl exec -it etcd-master-1 -n kube-system -- \
etcdctl endpoint health \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key

Production rule: Always back up etcd on a schedule. In managed K8s (GKE, EKS), the provider does this for you. In self-managed clusters, this is your responsibility. Test your restores.
Scheduler: Pod Placement
The scheduler (kube-scheduler) watches for newly created pods that don't have a node assigned yet, then picks the best node for each one.
It runs a two-phase algorithm:
1. Filtering: eliminate nodes that can't run the pod (insufficient resources, taints, node selectors)
2. Scoring: rank the surviving nodes and bind the pod to the highest-scoring one
# The scheduler uses resource requests for placement
spec:
  containers:
  - name: api
    resources:
      requests:        # ← scheduler looks at these
        memory: "512Mi"
        cpu: "250m"
      limits:          # ← kubelet enforces these
        memory: "1Gi"
        cpu: "500m"
# If no node has 512Mi free, the pod stays Pending

The scheduler doesn't move running pods. It only makes placement decisions for new ones. If a node dies, the controller manager creates new pods, and the scheduler places those.
Controller Manager: The Reconciliation Engine
The controller manager (kube-controller-manager) runs dozens of controllers in a single process. Each controller is a loop that watches the current state, compares it to the desired state, and takes action to close the gap.
ReplicaSet Controller
Desired: 5 replicas, actual: 3 running
Action: create 2 more pods
Node Controller
Watches node heartbeats (every 10s)
Node stops reporting → mark NotReady (40s)
Still gone → evict pods (5min default)
Deployment Controller
Manages ReplicaSets for rolling updates
Old RS: scale down; new RS: scale up
Job Controller
Runs pods to completion
Retries on failure up to backoffLimit
Endpoint Controller
Updates Service → Pod IP mappings
When pods come/go, endpoints update automatically

This is the magic of Kubernetes: you describe the desired state, and controllers continuously work to make it real. It's not a one-time action — it's a loop that never stops.
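A single reconciliation pass, ReplicaSet-style, is just "compare desired vs actual, close the gap." The sketch below shows one pass; real controllers run it forever, triggered by watch events (pod names here are illustrative):

```python
# One ReplicaSet-style reconciliation pass: create or delete pods
# until the actual count matches the desired count.
def reconcile(desired, actual_pods):
    diff = desired - len(actual_pods)
    if diff > 0:    # too few: create the missing pods
        actual_pods = actual_pods + [
            f"pod-{i}" for i in range(len(actual_pods), desired)
        ]
    elif diff < 0:  # too many: delete the surplus
        actual_pods = actual_pods[:desired]
    return actual_pods

pods = ["pod-0", "pod-1", "pod-2"]  # actual: 3 running
pods = reconcile(5, pods)           # desired: 5 -> controller creates 2 more
print(pods)
```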
Kubelet: The Node Agent
The kubelet runs on every worker node. It's the bridge between the control plane and the actual containers. It watches the API server for pods assigned to its node, then makes them happen.
Kubelet responsibilities:
1. Watches API server for pod assignments
"Is there a new pod for my node?"
2. Pulls container images via container runtime
"containerd, pull myregistry/api:v2"
3. Creates and starts containers
"Start this container with these env vars,
mounts, and resource limits"
4. Runs health checks (liveness, readiness, startup)
"GET /health every 10s, restart if 3 failures"
5. Reports pod status back to API server
"Pod nginx is Running, container ready"
6. Reports node status (capacity, allocatable)
"I have 4 CPU, 16Gi memory, 110 pods max"Key distinction: the kubelet is the only component that runs directly on the host, not as a pod. It's a systemd service. This makes sense — you need the kubelet running before any pods can exist on that node.
Kube-proxy: Service Networking
kube-proxy runs on every node and implements Kubernetes Service networking. When you create a Service, kube-proxy sets up rules so that traffic to the Service IP gets routed to the right pods.
# You create a Service:
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api             # ← matches pods with this label
  ports:
  - port: 80             # ← Service listens on :80
    targetPort: 3000     # ← forwards to pod :3000

# kube-proxy creates iptables/IPVS rules:
# Traffic to api:80 (ClusterIP) →
#   round-robin to pod IPs on port 3000
#
# Other pods just call http://api:80
# DNS resolves "api" → ClusterIP
# kube-proxy handles the rest

In newer clusters, you may see Cilium or Calico replacing kube-proxy entirely with eBPF-based networking. Same job, better performance.
Container Runtime: Where Containers Actually Run
The container runtime is what actually pulls images and starts containers. The kubelet talks to it via the Container Runtime Interface (CRI) — a standardized API so Kubernetes doesn't care which runtime you use.
containerd (most common)
- Default in GKE, EKS, AKS, k3s
- Lightweight, purpose-built for K8s
- Spun out of Docker in 2017
- Uses: containerd → runc → Linux namespaces
CRI-O
- Built specifically for Kubernetes
- Default in OpenShift (Red Hat)
- Minimal, follows OCI standards exactly
Docker (dockershim removed in K8s 1.24)
- dockerd → containerd → runc
- Extra layer of indirection; deprecated in 1.20, removed in 1.24
- Your Docker images still work everywhere
- Only the runtime shim was removed

Common misconception: "Kubernetes dropped Docker support." Not quite. K8s dropped the dockershim (the CRI adapter for Docker's runtime). Docker-built images are OCI-compliant and work on any runtime. You build with Docker, run with containerd.
How a Pod Gets Scheduled: Step by Step
Let's trace what happens when you run kubectl apply -f pod.yaml, from keystroke to running container.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: web
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"

1. kubectl sends the Pod manifest to the API server, which authenticates, authorizes, validates, and admits it
2. The API server writes the Pod object to etcd; the pod now exists, but has no node assigned
3. The scheduler spots the unassigned pod, runs filtering and scoring, and binds it to a node
4. The kubelet on that node sees the assignment and tells the container runtime to pull nginx:1.25 and start the container
5. The kubelet reports the pod's status (Running) back to the API server, which stores it in etcd

The entire flow takes seconds. No human intervention. Every component did exactly one job: API server validated, etcd stored, scheduler placed, kubelet executed. This separation of concerns is what makes Kubernetes reliable at scale.
Inspect Every Component Live
Theory is good. Running commands on a real cluster is better. Here are the commands that let you see each architectural component in action.
# Cluster info and health
kubectl cluster-info
# Kubernetes control plane is running at https://...
# CoreDNS is running at https://...
# All nodes and their roles
kubectl get nodes -o wide
# NAME       STATUS   ROLES           VERSION   INTERNAL-IP   OS-IMAGE
# master-1   Ready    control-plane   v1.29.2   10.0.0.1      Ubuntu 22.04
# worker-1   Ready    <none>          v1.29.2   10.0.0.2      Ubuntu 22.04
# Control plane components (running as pods)
kubectl get pods -n kube-system
# kube-apiserver-master-1            1/1   Running
# kube-controller-manager-master-1   1/1   Running
# kube-scheduler-master-1            1/1   Running
# etcd-master-1                      1/1   Running
# kube-proxy-xxxxx                   1/1   Running   (per node)
# coredns-xxxxx                      1/1   Running