Kubernetes Internals Guide

Control plane, scheduling, networking, storage, and the reconciliation loop — how Kubernetes actually works

┌──────────────────────────────────────────────────────────────────────┐
│                            Control Plane                             │
│                                                                      │
│  ┌─────────────────┐    ┌──────────┐    ┌────────────────────────┐   │
│  │ kube-apiserver  │◄──►│ etcd     │    │ kube-controller-mgr    │   │
│  │ (REST / watch)  │    │ (state)  │    │ (reconcile loops)      │   │
│  └────────┬────────┘    └──────────┘    └───────────┬────────────┘   │
│           │                                         │                │
│  ┌────────▼──────────────────────────────────────────────────────┐   │
│  │                        kube-scheduler                         │   │
│  │                     filter → score → bind                     │   │
│  └───────────────────────────────────────────────────────────────┘   │
└────────────────────────┬─────────────────────────────────────────────┘
                         │ (watches API server)
        ┌────────────────┼────────────────┐
        │                │                │
┌───────▼──────┐  ┌──────▼───────┐   ┌────▼──────────┐
│ Worker 1     │  │ Worker 2     │   │ Worker N      │
│              │  │              │   │               │
│ kubelet      │  │ kubelet      │   │ kubelet       │
│ kube-proxy   │  │ kube-proxy   │   │ kube-proxy    │
│ container    │  │ container    │   │ container     │
│ runtime      │  │ runtime      │   │ runtime       │
└──────────────┘  └──────────────┘   └───────────────┘

Architecture Overview

Kubernetes is a declarative control system. Users express desired state via the API; a set of reconciliation loops continuously drives actual state toward desired state. The cluster is split into a control plane (brain) and worker nodes (muscle).

Control Plane Components

Component	Role
kube-apiserver	Single entry point for all REST operations. Persists state to etcd. Implements authentication, authorization, and admission.
etcd	Distributed key-value store (Raft). The only stateful component. All cluster state lives here.
kube-scheduler	Watches for unscheduled pods, selects a node via filter + score, writes the binding back to the API server.
kube-controller-manager	Runs all built-in control loops (Deployment, ReplicaSet, Node, Endpoints, …) in a single binary.
cloud-controller-manager	Optional. Integrates with cloud provider APIs (LoadBalancer, Node, Route).

Node Components

Component	Role
kubelet	Runs on every node. Watches the API server for pods assigned to its node; drives the container runtime via CRI.
kube-proxy	Maintains network rules (iptables / ipvs) that implement Service VIPs.
container runtime	Implements CRI (containerd, CRI-O). Pulls images, creates/deletes containers.

Core design principles

Level-triggered, not edge-triggered. Controllers observe the full current state on every reconcile, not just the delta. This makes the system self-healing — a controller that misses an event will correct itself on the next sync.

Optimistic concurrency. Every API object carries a resourceVersion. Updates include this field; the API server rejects stale writes (HTTP 409), forcing clients to re-read and retry.
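
The read-modify-write cycle that optimistic concurrency forces on clients can be sketched with a toy in-memory store. This is only an illustration: ToyStore, Conflict, and update_with_retry are hypothetical names, not part of any Kubernetes client library, and the Conflict exception merely stands in for an HTTP 409 response.

```python
"""Toy model of resourceVersion-based optimistic concurrency (not the real API server)."""


class Conflict(Exception):
    """Stands in for an HTTP 409 Conflict response."""


class ToyStore:
    """In-memory object store that rejects writes carrying a stale resourceVersion."""

    def __init__(self):
        self._objects = {}      # name -> (resource_version, data)
        self._next_version = 1  # monotonically increasing, like an etcd revision

    def create(self, name, data):
        self._objects[name] = (self._next_version, dict(data))
        self._next_version += 1

    def get(self, name):
        version, data = self._objects[name]
        return version, dict(data)

    def update(self, name, resource_version, data):
        current_version, _ = self._objects[name]
        if resource_version != current_version:
            raise Conflict(f"{name}: sent {resource_version}, store is at {current_version}")
        self._objects[name] = (self._next_version, dict(data))
        self._next_version += 1
        return self._objects[name][0]


def update_with_retry(store, name, mutate, max_attempts=5):
    """Read-modify-write loop: on conflict, re-read the object and try again."""
    for _ in range(max_attempts):
        version, data = store.get(name)
        mutate(data)
        try:
            return store.update(name, version, data)
        except Conflict:
            continue
    raise Conflict(f"{name}: gave up after {max_attempts} attempts")
```

The important property is that the mutation is re-applied to a freshly read copy after every conflict, so a writer never overwrites a change it has not seen.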

kube-apiserver Control Plane

The API server is the hub of the cluster. It validates and persists objects, enforces policy, and serves a streaming watch API that every other component uses to react to state changes.

Request lifecycle

Flow

Client → Authentication → Authorization (RBAC) → Admission (Mutating webhooks)
       → Validation → Admission (Validating webhooks) → Persist to etcd → Response

Authentication methods

Method	How it works
X.509 client certs	CN = username, O = groups. Used by system components and kubeadm-generated kubeconfigs.
Bearer tokens	ServiceAccount tokens (JWT signed by API server), static token files, OIDC tokens.
Bootstrap tokens	Short-lived tokens for node bootstrapping (`kubeadm join`).
Webhook	API server calls an external service to validate the token.

Watch mechanism

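The watch API (?watch=true) keeps a streaming HTTP connection open (chunked HTTP/1.1 or HTTP/2). The API server pushes ADDED, MODIFIED, and DELETED events as objects change, plus optional BOOKMARK events that only advance the resourceVersion. Controllers use informers (client-go) to turn a single watch per resource type into an in-memory cache plus a work queue, so many controllers in one process share one watch instead of each opening its own stream against the API server.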

Shell

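# Watch pods as raw watch events (ADDED / MODIFIED / DELETED)
kubectl get pods --watch --output-watch-events -o json

# Inspect audit log (if enabled; the path is set by --audit-log-path)
jq . /var/log/kubernetes/audit.log

# Check API server health (/healthz is deprecated in favor of /livez and /readyz)
kubectl get --raw /healthz
kubectl get --raw '/readyz?verbose'
kubectl get --raw '/livez?verbose'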

Admission controllers

Admission controllers sit between authorization and persistence. Mutating webhooks run first (can modify the object), then validating webhooks (can only accept/reject). Important built-in controllers:

NamespaceLifecycle

Prevents creating objects in terminating namespaces and protects system namespaces from deletion.

LimitRanger

Applies default resource requests/limits when not set, enforces LimitRange constraints.

ServiceAccount

Automatically injects the default ServiceAccount and mounts its token into pods.

ResourceQuota

Rejects objects that would exceed namespace resource quotas.

etcd Control Plane

etcd is a strongly consistent, distributed key-value store based on the Raft consensus algorithm. It is the single source of truth for all cluster state. Losing etcd without a backup means losing the cluster.

Raft basics

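Raft requires a quorum of floor(n/2)+1 members to commit a write. A 3-member etcd cluster tolerates 1 failure; a 5-member cluster tolerates 2. An even member count adds no fault tolerance over the next lower odd count (4 members still tolerate only 1 failure). Quorum is needed for writes and leader election. Reads are linearizable by default, which also requires confirmation from a quorum; serializable reads (etcdctl get --consistency=s) are answered from a single member's local state without quorum and may be stale.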
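
The quorum arithmetic can be checked with a few lines of Python. This is a minimal sketch of the majority formula only; the function names are chosen for this example and it does not model Raft itself.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 7):
        print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Running it for 3 and 4 members gives the same tolerance of one failure, which is why etcd clusters are normally sized with an odd member count.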

etcd storage layout in Kubernetes

Shell

# Keys are prefixed by resource path, for example:
#   /registry/pods/default/my-pod
#   /registry/deployments/production/my-app
#   /registry/services/endpoints/kube-system/kube-dns

# Read a key directly (requires etcdctl and certs)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry/pods/default/my-pod | strings | head -50

# List all keys (pass the same --endpoints and TLS flags as above)
etcdctl get / --prefix --keys-only

Backup and restore

Shell

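# Snapshot backup (the file name is kept in a variable so later steps use the same file)
SNAPSHOT=/backup/etcd-snapshot-$(date +%F).db
etcdctl snapshot save "$SNAPSHOT"

# Verify snapshot (etcd ≥ 3.5 deprecates this in favor of `etcdutl snapshot status`)
etcdctl snapshot status "$SNAPSHOT" --write-out=table

# Restore into a fresh data dir (run before starting etcd;
# etcd ≥ 3.5 deprecates this in favor of `etcdutl snapshot restore`)
etcdctl snapshot restore "$SNAPSHOT" \
  --data-dir=/var/lib/etcd-restored \
  --name=etcd-node1 \
  --initial-cluster=etcd-node1=https://10.0.0.1:2380 \
  --initial-advertise-peer-urls=https://10.0.0.1:2380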

Compaction and defragmentation

Shell

# Get current revision
etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision'

# Compact to current revision (removes old revisions)
etcdctl compact $(etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision')

# Defragment (reclaims disk space after compaction)
etcdctl defrag

kube-scheduler Control Plane

The scheduler watches for pods with spec.nodeName == "" and assigns them to nodes. Scheduling is a two-phase process: filtering (eliminate ineligible nodes) then scoring (rank remaining nodes).

Scheduling pipeline

Flow

New pod (nodeName="") detected
  │
  ▼
Filter plugins — eliminate nodes that cannot run the pod
  ├─ NodeUnschedulable      (node has spec.unschedulable set, e.g. after kubectl cordon)
  ├─ NodeResourcesFit       (insufficient CPU/memory)
  ├─ NodeAffinity           (nodeSelector, nodeAffinity)
  ├─ TaintToleration        (pod must tolerate node taints)
  ├─ PodTopologySpread      (spread constraints)
  └─ VolumeBinding          (PVCs that need specific node topology)
  │
  ▼
Score plugins — rank feasible nodes (0-100 each, weighted sum)
  ├─ NodeResourcesBalancedAllocation   (prefer balanced CPU/mem usage)
  ├─ NodeResourcesFit: LeastAllocated  (default scoring strategy; prefer nodes with most free resources)
  ├─ ImageLocality                     (prefer nodes with image already pulled)
  └─ InterPodAffinity                  (prefer nodes satisfying pod affinity)
  │
  ▼
Select highest-score node → write Binding object → kubelet picks it up
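
The filter-then-score pipeline above can be sketched in a few dozen lines of Python. This is a simplified model, not the scheduler's actual code: the Node and Pod classes, the two scoring functions, and the equal weights are assumptions made for this example, and only the unschedulable flag and resource fit are filtered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_free: int       # millicores still unrequested
    mem_free: int       # MiB still unrequested
    cpu_capacity: int
    mem_capacity: int
    unschedulable: bool = False
    cached_images: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu_request: int
    mem_request: int
    image: str


def filter_nodes(pod, nodes):
    """Filter phase: drop nodes that cannot run the pod at all."""
    return [
        n for n in nodes
        if not n.unschedulable
        and n.cpu_free >= pod.cpu_request
        and n.mem_free >= pod.mem_request
    ]


def least_allocated(pod, node):
    """Score 0-100: higher when more of the node stays free after placement."""
    cpu = (node.cpu_free - pod.cpu_request) / node.cpu_capacity
    mem = (node.mem_free - pod.mem_request) / node.mem_capacity
    return 100 * (cpu + mem) / 2


def image_locality(pod, node):
    """Score 0-100: full marks if the image is already present on the node."""
    return 100 if pod.image in node.cached_images else 0


# Hypothetical weights, chosen only for this example.
SCORERS = [(least_allocated, 1), (image_locality, 1)]


def schedule(pod, nodes):
    """Return the name of the best feasible node, or None if none is feasible."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None

    def total(node):
        return sum(weight * scorer(pod, node) for scorer, weight in SCORERS)

    return max(feasible, key=total).name
```

A return value of None corresponds to the pod staying Pending with a FailedScheduling event; the real scheduler would then consider preemption.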

Taints and tolerations

Shell

# Add a taint to a node
kubectl taint nodes node1 dedicated=gpu:NoSchedule
kubectl taint nodes node1 dedicated=gpu:NoExecute

# Remove a taint
kubectl taint nodes node1 dedicated=gpu:NoSchedule-

# Pod toleration (YAML)
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
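
How a toleration is matched against a taint can be sketched as follows. This is a minimal model of the matching rules (key, operator, value, effect) using plain dicts; the function names are hypothetical and the sketch ignores tolerationSeconds.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False              # an empty effect matches every effect
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")
    if not key:
        # An empty key is only meaningful with Exists and matches every taint key.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True               # value is ignored
    return toleration.get("value", "") == taint.get("value", "")


def untolerated_taints(tolerations, taints, effects=("NoSchedule", "NoExecute")):
    """Taints with a blocking effect that none of the pod's tolerations match."""
    return [
        t for t in taints
        if t["effect"] in effects and not any(tolerates(tol, t) for tol in tolerations)
    ]


# The two taints and the single toleration from the commands above.
NODE_TAINTS = [
    {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"},
    {"key": "dedicated", "value": "gpu", "effect": "NoExecute"},
]
POD_TOLERATIONS = [
    {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"},
]
```

With these inputs, untolerated_taints still returns the NoExecute taint: the toleration shown above names only the NoSchedule effect, so a pod would need a second toleration (or one with no effect set) to run on a node carrying both taints.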

Pod affinity and anti-affinity

YAML

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: [us-east-1a, us-east-1b]
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: my-app
      topologyKey: kubernetes.io/hostname   # one pod per host

Preemption

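If a pod cannot be scheduled and its priority allows preemption, the scheduler may evict lower-priority pods to make room. A pod's priority comes from the PriorityClass named in spec.priorityClassName (pods without one get the globalDefault class, or priority 0 if none exists). The scheduler looks for a node where removing lower-priority pods would free enough resources, records that node in the pending pod's status.nominatedNodeName, and gracefully terminates the victims.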

YAML

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority   # or Never

kube-controller-manager Control Plane

A collection of reconciliation loops in one binary. Each controller watches specific resources, compares current state to desired state, and takes corrective action. Controllers use informers (shared cache + event handlers) and work queues to avoid overwhelming the API server.

Reconciliation loop pattern

Pseudocode

loop:
  desired = read desired state from API object
  actual  = observe current state (list pods, check nodes, …)
  diff    = desired - actual
  if diff != empty:
    apply changes (create/update/delete resources)
  sleep resyncPeriod (or wait for watch event)
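
A runnable version of one pass of this loop, for a ReplicaSet-like controller, might look like the sketch below. It assumes pods are tracked as a plain set of names; reconcile and apply are hypothetical helpers for this example, not Kubernetes APIs.

```python
import itertools

_ids = itertools.count()  # source of unique pod name suffixes for this example


def reconcile(desired: int, pods: set) -> list:
    """Compare desired and actual state; return the actions needed to converge."""
    diff = desired - len(pods)
    if diff > 0:
        return [("create", f"pod-{next(_ids)}") for _ in range(diff)]
    if diff < 0:
        return [("delete", name) for name in sorted(pods)[:-diff]]
    return []  # already converged: nothing to do


def apply(actions: list, pods: set) -> None:
    """Carry out the actions against the (simulated) cluster state."""
    for verb, name in actions:
        if verb == "create":
            pods.add(name)
        else:
            pods.discard(name)
```

Because reconcile looks at the full current state rather than at individual events, calling it again after a missed event or a crashed pod produces exactly the actions still needed, and calling it on a converged state produces none.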

Key built-in controllers

Controller	Watches	Action
ReplicaSet	ReplicaSet, Pod	Creates/deletes pods to match `spec.replicas`
Deployment	Deployment, ReplicaSet	Creates/scales ReplicaSets for rolling updates
StatefulSet	StatefulSet, Pod	Manages ordered pod creation with stable identities
DaemonSet	DaemonSet, Node, Pod	Ensures one pod per (selected) node
Job / CronJob	Job, Pod	Runs pods to completion; CronJob schedules Jobs
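Node	Node	Marks nodes NotReady/Unknown and adds `NoExecute` taints; pods are evicted once their `tolerationSeconds` expires (300s by default), which replaces the older `pod-eviction-timeout` flag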
Endpoints	Service, Pod	Keeps Endpoints objects in sync with ready pod IPs
Namespace	Namespace	Cleans up resources in terminating namespaces

Informer / work-queue internals

Flow

API Server watch stream
  │ ADDED/MODIFIED/DELETED events
  ▼
Informer (per resource type, shared across controllers)
  ├─ ThreadSafeStore (in-memory cache — avoids API calls for reads)
  └─ Event handlers → enqueue object key (namespace/name)
        │
        ▼
     Work Queue (rate-limited, deduplicating)
        │
        ▼
     Worker goroutine → Reconcile(key)
        └─ reads from cache (lister)
        └─ calls API server only for writes
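
The deduplicating behaviour of the work queue is what lets a burst of events for one object collapse into a single reconcile. The sketch below is loosely modelled on that idea; the WorkQueue class is written for this example, omits rate limiting and locking, and is not the client-go implementation.

```python
from collections import deque


class WorkQueue:
    """Single-threaded sketch of a deduplicating work queue keyed by namespace/name."""

    def __init__(self):
        self._queue = deque()
        self._dirty = set()       # keys that still need processing
        self._processing = set()  # keys a worker currently holds

    def add(self, key):
        if key in self._dirty:
            return                # already pending: coalesce with the earlier add
        self._dirty.add(key)
        if key not in self._processing:
            self._queue.append(key)

    def get(self):
        key = self._queue.popleft()
        self._dirty.discard(key)
        self._processing.add(key)
        return key

    def done(self, key):
        self._processing.discard(key)
        if key in self._dirty:    # changed while being processed: run it again
            self._queue.append(key)

    def __len__(self):
        return len(self._queue)
```

Two properties follow from this design: the same key is never handed to two workers at once, and an object that changes during a reconcile is queued exactly once more after the worker calls done.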

kubelet Node

The kubelet is the primary node agent. It watches for pods assigned to its node via the API server (and optionally static pod manifests), then drives the container runtime via CRI to converge actual pod state toward desired state.

Pod lifecycle (kubelet perspective)

Flow

Pod assigned to node (spec.nodeName set by scheduler)
  │
  ▼
Admit pod — check resources, enforce policies (CPU manager, topology)
  │
  ▼
Set up volumes (CSI NodeStage/NodePublish; ConfigMap, Secret, and projected volumes)
  │
  ▼
Create sandbox (pause container holds the network namespace; the runtime invokes CNI here)
  │
  ▼
Pull images (via CRI image service, before each container starts)
  │
  ▼
Run init containers sequentially (each must succeed before next)
  │
  ▼
Run app containers (in parallel)
  │
  ├─ Execute postStart hook (if defined)
  ├─ Start liveness / readiness / startup probes
  └─ Report status back to API server
  │
  ▼ (on termination)
Execute preStop hook (if defined) → send SIGTERM → wait for the rest of
terminationGracePeriodSeconds → SIGKILL (the grace period includes preStop time)

CRI (Container Runtime Interface)

The kubelet communicates with the container runtime via gRPC over a Unix socket. CRI separates the kubelet from runtime-specific code.

Shell

# Inspect containers via crictl (bypasses kubelet)
crictl ps                        # list running containers
crictl pods                      # list pods
crictl inspect <container-id>   # full container state
crictl logs <container-id>
crictl exec -it <container-id> sh

# Check which runtime is in use
kubectl get node -o wide         # CONTAINER-RUNTIME column
cat /var/lib/kubelet/config.yaml | grep containerRuntimeEndpoint

Node conditions and taints

Condition	Meaning	Auto-taint added
Ready=True	kubelet healthy, node can accept pods	—
MemoryPressure	Node is low on memory	node.kubernetes.io/memory-pressure:NoSchedule
DiskPressure	Node disk is nearly full	node.kubernetes.io/disk-pressure:NoSchedule
PIDPressure	Too many processes on node	node.kubernetes.io/pid-pressure:NoSchedule
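Ready=False/Unknown	kubelet unhealthy (False) or out of contact with the control plane (Unknown)	node.kubernetes.io/not-ready:NoExecute (False) or node.kubernetes.io/unreachable:NoExecute (Unknown)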

Static pods

Shell

# Static pod manifests (control plane components on kubeadm clusters)
ls /etc/kubernetes/manifests/
# kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml  etcd.yaml

# kubelet watches this directory and recreates a static pod when its manifest changes
# Static pods appear in the API server as read-only mirror pods

kube-proxy Node

kube-proxy implements the Service abstraction by programming kernel networking rules. It watches Services and EndpointSlices and translates ClusterIP VIPs into real pod IPs.

Modes

Mode	Mechanism	Notes
iptables (default)	DNAT rules in iptables PREROUTING/OUTPUT chains. Selects pod randomly per connection.	O(n) rule lookup; performance degrades with many Services.
ipvs	Linux IPVS (LVS) in kernel netfilter. Hash-table lookup, O(1). Supports LB algorithms: rr, lc, sh, sed, nq.	Requires ipvs kernel modules. Better at scale.
nftables	nftables sets and maps for lookup (alpha in Kubernetes 1.29, beta in 1.31, GA in 1.33).	Intended successor to iptables mode.

How iptables DNAT works for ClusterIP

Shell

# Inspect kube-proxy iptables rules
iptables -t nat -L KUBE-SERVICES -n --line-numbers
iptables -t nat -L KUBE-SVC-<hash> -n       # per-service chain
iptables -t nat -L KUBE-SEP-<hash> -n       # per-endpoint chain (DNAT)

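# Example flow for ClusterIP 10.96.0.10:80 with 3 endpoints
# (rules are tried in order, so the probabilities cascade and each endpoint gets 1/3)
# KUBE-SERVICES → KUBE-SVC-xxx → statistic --probability 0.33333 → KUBE-SEP-1 (DNAT pod1:8080)
#                              → statistic --probability 0.50000 → KUBE-SEP-2 (DNAT pod2:8080)
#                              → (unconditional)                 → KUBE-SEP-3 (DNAT pod3:8080)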

# Check kube-proxy mode
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode
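
Because the per-service chain is evaluated rule by rule, the statistic match on each rule is a conditional probability rather than a fixed share. The sketch below checks the arithmetic with exact fractions; it models only the probabilities, not iptables itself, and the function names are chosen for this example.

```python
from fractions import Fraction


def rule_probabilities(endpoints: int):
    """Match probability of each rule when n endpoint rules are tried in order.

    Rule i is reached only if rules 0..i-1 did not match, so it must match
    with probability 1/(n - i); the last rule always matches.
    """
    return [Fraction(1, endpoints - i) for i in range(endpoints)]


def effective_share(endpoints: int):
    """Overall fraction of new connections that end up at each endpoint."""
    shares = []
    remaining = Fraction(1)          # probability of reaching the current rule
    for p in rule_probabilities(endpoints):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


if __name__ == "__main__":
    print([float(p) for p in rule_probabilities(3)])
    print(effective_share(3))
```

For three endpoints the rule probabilities are 1/3, 1/2, and 1, and every endpoint ends up with an equal one-third share of new connections.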

Service types

Type	How it works
ClusterIP	Virtual IP reachable only within the cluster. kube-proxy programs DNAT rules on every node.
NodePort	Opens a port (30000–32767) on every node. Traffic → NodePort → ClusterIP → pod. External firewall rules needed.
LoadBalancer	Provisions cloud load balancer via cloud-controller-manager. Gets an external IP. Includes NodePort.
ExternalName	Returns a CNAME for an external DNS name. No proxying; DNS only.
Headless (clusterIP: None)	DNS returns individual pod IPs directly. Used by StatefulSets for stable DNS names per pod.

Networking

Kubernetes networking follows three flat-network rules: every pod gets a unique IP, pods can communicate with any other pod without NAT, and nodes can communicate with pods without NAT. CNI plugins implement the pod network.

Kubernetes networking model

Diagram

Node A (10.0.0.1)                     Node B (10.0.0.2)
├─ eth0: 10.0.0.1                     ├─ eth0: 10.0.0.2
├─ cni0 bridge: 10.244.0.1/24         ├─ cni0 bridge: 10.244.1.1/24
│   ├─ veth → Pod A1: 10.244.0.2      │   ├─ veth → Pod B1: 10.244.1.2
│   └─ veth → Pod A2: 10.244.0.3      │   └─ veth → Pod B2: 10.244.1.3
│                                     │
└─── Overlay / BGP routes ────────────┘
     (flannel VXLAN, Calico BGP, Cilium eBPF, …)

CNI plugin comparison

Plugin	Dataplane	NetworkPolicy	Notes
Flannel	VXLAN overlay	No (needs Calico or Kube-router)	Simple; good for learning and small clusters.
Calico	BGP routes (no overlay) or VXLAN	Yes + extended policies	Popular in production; works well with BIRD BGP.
Cilium	eBPF (no kube-proxy needed)	Yes + L7 HTTP/gRPC	eBPF dataplane can replace kube-proxy entirely; built-in flow observability (Hubble).
Weave	VXLAN + fast datapath	Yes	Easy setup; encryption is optional (off by default).

DNS (CoreDNS)

Shell

# DNS name formats
#   my-svc.my-ns.svc.cluster.local           ClusterIP service
#   10-244-0-2.my-ns.pod.cluster.local       pod record: the pod IP with dots replaced by dashes
#   pod-0.my-svc.my-ns.svc.cluster.local     StatefulSet pod via headless service

# Debug DNS from a pod (busybox is pinned to 1.28; nslookup in some later images is unreliable)
kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- sh
/ # nslookup kubernetes.default.svc.cluster.local
/ # cat /etc/resolv.conf

# Check CoreDNS config
kubectl -n kube-system get configmap coredns -o yaml
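
The name formats, and the way a short name is expanded by the search path usually found in a pod's /etc/resolv.conf, can be sketched as below. The helpers are hypothetical, assume IPv4 pod IPs and the default cluster.local domain, and assume the common search list (namespace.svc, svc, cluster domain) with ndots:5; node-level search domains that some clusters append are ignored.

```python
def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def pod_ip_fqdn(pod_ip, namespace, cluster_domain="cluster.local"):
    """DNS name of a pod record: the IPv4 address with dots replaced by dashes."""
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{cluster_domain}"


def statefulset_pod_fqdn(statefulset, ordinal, headless_service, namespace,
                         cluster_domain="cluster.local"):
    """DNS name of one StatefulSet pod behind its headless Service."""
    return f"{statefulset}-{ordinal}.{service_fqdn(headless_service, namespace, cluster_domain)}"


def search_candidates(name, namespace, cluster_domain="cluster.local", ndots=5):
    """Names a resolver would try, in order, for a lookup made from a pod."""
    if name.endswith("."):
        return [name]  # fully qualified: the search path is not applied
    search = [f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    expanded = [f"{name}.{suffix}" for suffix in search]
    if name.count(".") >= ndots:
        return [name] + expanded   # enough dots: try the name as given first
    return expanded + [name]
```

The search expansion explains why a bare name such as my-svc resolves inside its own namespace, and why external names with few dots can cost several failed lookups before the name is tried as given.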

NetworkPolicy

YAML

# Default deny all ingress in a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}        # select all pods
  policyTypes: [Ingress]
---
# Allow ingress only from app=frontend pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 8080
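
How the two policies above combine can be sketched with a small evaluator. It is a deliberately reduced model: only matchLabels selectors, ingress rules, and peers in the same namespace are handled, and the dict layout and function names are assumptions made for this example rather than the API schema.

```python
def selects(selector: dict, labels: dict) -> bool:
    """matchLabels-only selector; an empty selector matches every pod."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies, dst_labels, src_labels, port) -> bool:
    """Policies are additive: traffic passes if any applicable rule allows it."""
    applicable = [p for p in policies if selects(p["podSelector"], dst_labels)]
    if not applicable:
        return True  # no policy selects the destination pod: it is not isolated
    for policy in applicable:
        for rule in policy.get("ingress", []):
            from_ok = any(selects(peer, src_labels) for peer in rule["from"])
            if from_ok and port in rule["ports"]:
                return True
    return False     # isolated and no rule matched


# Simplified equivalents of the two manifests above.
POLICIES = [
    {"name": "default-deny-ingress", "podSelector": {}, "ingress": []},
    {
        "name": "allow-frontend",
        "podSelector": {"app": "backend"},
        "ingress": [{"from": [{"app": "frontend"}], "ports": [8080]}],
    },
]
```

The default-deny policy isolates every pod in the namespace, and allow-frontend then opens a single path: frontend pods to backend pods on port 8080. No policy can subtract what another one allows.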

Storage

Kubernetes storage is abstracted through PersistentVolumes (PVs), PersistentVolumeClaims (PVCs), and StorageClasses. The Container Storage Interface (CSI) is the standard plugin mechanism for storage drivers.

PV / PVC lifecycle

Flow

StorageClass (provisioner, parameters, reclaimPolicy)
  │
  │ (dynamic provisioning)
  ▼
PersistentVolumeClaim created by user
  │  storageClassName: fast-ssd
  │  storage: 10Gi
  │  accessModes: [ReadWriteOnce]
  │
  ▼  (external-provisioner sidecar watches PVCs)
CSI driver called: CreateVolume RPC
  │
  ▼
PersistentVolume created (Bound to PVC)
  │
  ▼
Pod references PVC → kubelet calls NodeStageVolume + NodePublishVolume
  │
  ▼
Volume mounted at pod path

On PVC delete: ReclaimPolicy
  Retain  → PV remains (manual cleanup)
  Delete  → CSI DeleteVolume called, PV deleted

Access modes

Mode	Abbreviation	Meaning
ReadWriteOnce	RWO	One node mounts read-write. Most block storage (EBS, PD, Azure Disk).
ReadOnlyMany	ROX	Many nodes mount read-only.
ReadWriteMany	RWX	Many nodes mount read-write. Requires NFS, CephFS, or similar.
ReadWriteOncePod	RWOP	Single pod mounts read-write (alpha in Kubernetes 1.22, GA in 1.29; CSI volumes only).

CSI architecture

Flow

kubelet                     CSI driver pod
  │                           │
  │  NodeStageVolume (gRPC)   │  ← format + mount to staging path
  │──────────────────────────►│
  │  NodePublishVolume        │  ← bind-mount staging → pod path
  │──────────────────────────►│

External controller sidecar (runs alongside driver):
  provisioner  → CreateVolume / DeleteVolume
  attacher     → ControllerPublishVolume / ControllerUnpublishVolume
  resizer      → ControllerExpandVolume

Volume types quick reference

emptyDir

Ephemeral directory in the node's scratch space (or in memory with medium: Memory). Shared between containers in a pod and survives container restarts. Deleted when the pod is removed from the node.

hostPath

Mounts a node path into the pod. Avoid in production — ties pod to a specific node and risks host access.

configMap / secret

Projected as files or env vars. ConfigMap for config data, Secret for sensitive data (stored base64-encoded but unencrypted in etcd by default; enable encryption at rest).

projected

Combines multiple volume sources (ServiceAccount token, ConfigMap, Secret, downwardAPI) into a single directory.

Workloads

Kubernetes workload controllers manage sets of pods. Understanding how each controller updates pods is critical for operating stateful and stateless services safely.

Deployment rolling update internals

Flow

kubectl set image deployment/my-app container=image:v2
  │
  ▼
Deployment controller creates new ReplicaSet (RS-v2, replicas=0)
  │
  ▼
Scale RS-v2 up by 1  (maxSurge=1 → can go 1 above desired)
  │
  ▼
Wait for new pod Ready
  │
  ▼
Scale RS-v1 down by 1  (maxUnavailable=1 → can be 1 below desired)
  │
  ▼
Repeat until RS-v2 = desired, RS-v1 = 0
  │
  ▼
Old RS kept (scale=0) for rollback history (revisionHistoryLimit)
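
The limits that maxSurge and maxUnavailable place on a rollout can be computed directly. The sketch below resolves integer or percentage values the way the Deployment controller is documented to (surge rounds up, unavailable rounds down); the helper names are assumptions for this example, and the fallback for both values resolving to zero reflects my understanding of the controller's behaviour rather than anything stated in this guide.

```python
def resolve(value, desired: int, round_up: bool) -> int:
    """Resolve an int or an 'N%' string against the desired replica count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = desired * int(value[:-1])
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(desired: int, max_surge="25%", max_unavailable="25%"):
    """Return (max total pods, min available pods) allowed during a rollout."""
    surge = resolve(max_surge, desired, round_up=True)
    unavailable = resolve(max_unavailable, desired, round_up=False)
    if surge == 0 and unavailable == 0:
        # Assumed fallback: with no headroom in either direction the rollout
        # could never progress, so one pod is allowed to be unavailable.
        unavailable = 1
    return desired + surge, desired - unavailable


if __name__ == "__main__":
    print(rollout_bounds(10))          # defaults of 25% / 25%
    print(rollout_bounds(4, 1, 1))     # the maxSurge=1, maxUnavailable=1 flow above
```

With 10 replicas and the 25% defaults, the rollout may run up to 13 pods in total and must keep at least 8 available, so the controller scales the new ReplicaSet up and the old one down in steps that stay inside that window.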

StatefulSet guarantees

Property	Behaviour
Stable network identity	Pod name is `<name>-<ordinal>`. DNS: `<pod>.<headless-svc>.<ns>.svc.cluster.local`. Survives rescheduling.
Stable storage	VolumeClaimTemplate creates a PVC per pod. PVC is not deleted when the pod is deleted.
Ordered creation	Pods created 0 → N−1. Each must be Running+Ready before the next is created.
Ordered deletion	Pods deleted N−1 → 0 (reverse order) by default (`OrderedReady` policy).
Parallel policy	`podManagementPolicy: Parallel` — create/delete all pods simultaneously (updates still ordered).

DaemonSet scheduling

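DaemonSet pods are created by the DaemonSet controller, one per eligible node. In older releases the controller bypassed the scheduler and wrote spec.nodeName directly; since Kubernetes 1.12 it instead sets a nodeAffinity term that matches the target node's name and lets the default scheduler bind the pod. The controller also adds tolerations automatically, including node.kubernetes.io/unschedulable:NoSchedule, so DaemonSet pods still land on cordoned nodes. Control plane nodes carry their own taint (node-role.kubernetes.io/control-plane:NoSchedule on kubeadm clusters), which must be tolerated explicitly in the pod template.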

HorizontalPodAutoscaler internals

Flow

HPA controller (in controller-manager) runs every --horizontal-pod-autoscaler-sync-period (15s)
  │
  ├─ Queries metrics-server (or custom/external metrics adapter)
  │    currentMetricValue = avg CPU across pods
  │
  ├─ desiredReplicas = ceil(currentReplicas × (currentValue / targetValue))
  │
  └─ Scales Deployment / ReplicaSet to desiredReplicas, clamped to [minReplicas, maxReplicas]
       Skipped when currentValue / targetValue is within the tolerance (10% default)
       Cooldown: --horizontal-pod-autoscaler-downscale-stabilization (5m default)
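
The replica calculation can be sketched as a single function. This is a minimal illustration of the formula above with the default 10% tolerance; it ignores the stabilization window, unready pods, and missing metrics, and the function name and signature are chosen for this example.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Apply the HPA ratio formula, then clamp to [min_replicas, max_replicas]."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 200m CPU against a 100m target: the ratio is 2.0.
    print(desired_replicas(4, 200, 100, min_replicas=1, max_replicas=10))
```

The tolerance check is what stops the autoscaler from flapping when the metric hovers around the target, and the clamp is what keeps a runaway metric from scaling past maxReplicas.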

Security

Kubernetes security involves cluster-level (RBAC, admission, network policies) and workload-level (pod security, secrets management) controls.

RBAC model

YAML

# Role (namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# RoleBinding: bind Role to a ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

RBAC: Role vs ClusterRole

Object	Scope	Use
Role	Namespace	Grant access to namespaced resources within one namespace.
ClusterRole	Cluster-wide	Grant access to cluster-scoped resources (Nodes, PVs) or to namespaced resources across all namespaces.
RoleBinding	Namespace	Binds a Role or ClusterRole to subjects within a namespace.
ClusterRoleBinding	Cluster-wide	Binds a ClusterRole to subjects cluster-wide.

ServiceAccount token projection

Since Kubernetes 1.21, pods use projected ServiceAccount tokens (bound tokens) by default instead of long-lived static tokens. These tokens are audience-bound, time-limited (1 hour default), and rotated automatically by the kubelet. They are mounted via the projected volume type, not via a Secret.

Secrets encryption at rest

YAML

# /etc/kubernetes/enc/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: [secrets]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}   # fallback: read unencrypted (for migration)

# Then restart kube-apiserver with:
# --encryption-provider-config=/etc/kubernetes/enc/encryption-config.yaml

# Rewrite all secrets to encrypt them
kubectl get secrets --all-namespaces -o json | kubectl replace -f -

Pod Security Admission (PSA)

Level	Restrictions
privileged	No restrictions. Same as no policy.
baseline	Blocks most known privilege escalations: no privileged containers, no hostPID/hostNetwork, restricted capabilities.
restricted	Heavily restricted: must run as non-root, drop ALL capabilities, no privilege escalation, seccomp required.

Shell

# Enable PSA on a namespace via labels
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted

Cheat Sheet Reference

Inspect cluster internals

Shell

# Control plane component health
kubectl get componentstatuses          # deprecated since v1.19; may report stale data
kubectl get --raw '/readyz?verbose'    # per-check API server readiness

# Node internals
kubectl describe node <node>          # conditions, capacity, allocatable, events
kubectl top node                       # requires metrics-server
kubectl get events --sort-by=.metadata.creationTimestamp

# Pod scheduling
kubectl describe pod <pod>            # Events: FailedScheduling, etc.
kubectl get pod -o wide               # which node, IP
kubectl get pod <pod> -o jsonpath='{.spec.nodeName}'

# Watch reconciliation
kubectl get rs --watch                # watch ReplicaSet convergence
kubectl rollout status deploy/my-app  # watch Deployment rollout
kubectl rollout history deploy/my-app
kubectl rollout undo deploy/my-app    # roll back

etcd

etcdctl endpoint health — check cluster
etcdctl snapshot save — backup
etcdctl get /registry/ --prefix --keys-only
Quorum: floor(n/2)+1 members needed for writes

API Server

kubectl get --raw /apis — list API groups
kubectl api-resources — all resource types
kubectl explain pod.spec — schema docs
kubectl auth can-i list pods --as user

Networking

iptables -t nat -L KUBE-SERVICES -n
kubectl -n kube-system logs -l k8s-app=kube-dns
kubectl exec -it pod -- curl svc:port
kubectl port-forward svc/my-svc 8080:80

Storage

kubectl get pv,pvc — PV/PVC status
kubectl describe pvc my-claim — binding events
kubectl get sc — StorageClasses
PVC stuck Pending: check StorageClass, CSI driver logs

Workloads

kubectl rollout status deploy/x
kubectl rollout undo deploy/x --to-revision=2
kubectl scale deploy/x --replicas=5
kubectl get hpa — autoscaler status

Security

kubectl auth can-i '*' '*' --all-namespaces
kubectl get rolebindings,clusterrolebindings -A
kubectl get secret <name> -o jsonpath='{.data.<key>}' | base64 -d
Check: pod-security.kubernetes.io/* labels on ns