Kubernetes Internals
Control plane, scheduling, networking, storage, and the reconciliation loop — how Kubernetes actually works
┌─────────────────────────────────────────────────────────────────────┐
│                            Control Plane                            │
│                                                                     │
│  ┌────────────────┐    ┌──────────┐    ┌─────────────────────┐      │
│  │ kube-apiserver │◄──►│   etcd   │    │ kube-controller-mgr │      │
│  │ (REST / watch) │    │ (state)  │    │ (reconcile loops)   │      │
│  └───────┬────────┘    └──────────┘    └─────────────────────┘      │
│          │                                                          │
│  ┌───────▼───────────────────────────────────────────────────┐      │
│  │                      kube-scheduler                       │      │
│  │                   filter → score → bind                   │      │
│  └───────────────────────────────────────────────────────────┘      │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │ (watches API server)
                 ┌─────────────────┼─────────────────┐
                 │                 │                 │
         ┌───────▼──────┐  ┌───────▼──────┐  ┌───────▼──────┐
         │   Worker 1   │  │   Worker 2   │  │   Worker N   │
         │              │  │              │  │              │
         │ kubelet      │  │ kubelet      │  │ kubelet      │
         │ kube-proxy   │  │ kube-proxy   │  │ kube-proxy   │
         │ container    │  │ container    │  │ container    │
         │ runtime      │  │ runtime      │  │ runtime      │
         └──────────────┘  └──────────────┘  └──────────────┘
Architecture Overview

Kubernetes is a declarative control system. Users express desired state via the API; a set of reconciliation loops continuously drives actual state toward desired state. The cluster is split into a control plane (brain) and worker nodes (muscle).

Control Plane Components

Component | Role
--- | ---
kube-apiserver | Single entry point for all REST operations. Persists state to etcd. Implements authentication, authorization, and admission.
etcd | Distributed key-value store (Raft). The only stateful component. All cluster state lives here.
kube-scheduler | Watches for unscheduled pods, selects a node via filter + score, writes the binding back to the API server.
kube-controller-manager | Runs all built-in control loops (Deployment, ReplicaSet, Node, Endpoints, …) in a single binary.
cloud-controller-manager | Optional. Integrates with cloud provider APIs (LoadBalancer, Node, Route).

Node Components

Component | Role
--- | ---
kubelet | Runs on every node. Watches the API server for pods assigned to its node; drives the container runtime via CRI.
kube-proxy | Maintains network rules (iptables / ipvs) that implement Service VIPs.
container runtime | Implements CRI (containerd, CRI-O). Pulls images, creates/deletes containers.

Core design principles

Level-triggered, not edge-triggered. Controllers observe the full current state on every reconcile, not just the delta. This makes the system self-healing — a controller that misses an event will correct itself on the next sync.
Optimistic concurrency. Every API object carries a resourceVersion. Updates include this field; the API server rejects stale writes (HTTP 409), forcing clients to re-read and retry.
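
The optimistic-concurrency rule fits in a few lines of Python. This is an illustration only: Store is a hypothetical in-memory stand-in for the API server, resourceVersion is modelled as an integer (in Kubernetes it is an opaque string that clients must not interpret), and the Conflict exception plays the role of the HTTP 409 response.

```python
class Conflict(Exception):
    """Stands in for the API server's HTTP 409 Conflict response."""


class Store:
    """Hypothetical in-memory stand-in for kube-apiserver backed by etcd."""

    def __init__(self):
        self.objects = {}   # key -> (resource_version, data)
        self.revision = 0   # global write counter, similar to etcd's revision

    def get(self, key):
        rv, data = self.objects[key]
        return rv, dict(data)

    def create(self, key, data):
        self.revision += 1
        self.objects[key] = (self.revision, dict(data))

    def update(self, key, resource_version, data):
        current_rv, _ = self.objects[key]
        if resource_version != current_rv:
            raise Conflict(f"{key}: resourceVersion {resource_version} is stale (current {current_rv})")
        self.revision += 1
        self.objects[key] = (self.revision, dict(data))
        return self.revision


def update_with_retry(store, key, mutate, attempts=5):
    """Client-side read-modify-write loop: on conflict, re-read and try again."""
    for _ in range(attempts):
        rv, data = store.get(key)
        try:
            return store.update(key, rv, mutate(data))
        except Conflict:
            continue   # another writer won the race; start over from fresh state
    raise Conflict(f"{key}: gave up after {attempts} attempts")


store = Store()
store.create("default/my-app", {"replicas": 1})
rv, _ = store.get("default/my-app")
store.update("default/my-app", rv, {"replicas": 2})        # another writer updates first
try:
    store.update("default/my-app", rv, {"replicas": 5})    # same (now stale) rv: rejected
except Conflict as err:
    print("rejected:", err)
update_with_retry(store, "default/my-app", lambda d: {**d, "replicas": d["replicas"] + 1})
print(store.get("default/my-app"))   # (3, {'replicas': 3})
```

The retry helper never blindly overwrites: each attempt applies the mutation to the latest state it read, which is the same pattern client-go's retry helpers encourage.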

kube-apiserver (Control Plane)

The API server is the hub of the cluster. It validates and persists objects, enforces policy, and serves a streaming watch mechanism that all other components use to react to state changes.

Request lifecycle

Flow
Client → Authentication → Authorization (RBAC) → Admission (Mutating webhooks)
       → Validation → Admission (Validating webhooks) → Persist to etcd → Response

Authentication methods

Method | How it works
--- | ---
X.509 client certs | CN = username, O = groups. Used by system components and kubeadm-generated kubeconfigs.
Bearer tokens | ServiceAccount tokens (JWT signed by API server), static token files, OIDC tokens.
Bootstrap tokens | Short-lived tokens for node bootstrapping (kubeadm join).
Webhook | API server calls an external service to validate the token.

Watch mechanism

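The watch API (?watch=true) keeps a long-lived HTTP response open (chunked HTTP/1.1 or an HTTP/2 stream), and the API server pushes ADDED, MODIFIED, and DELETED events (plus BOOKMARK events when requested) as objects change. Controllers use informers (client-go) so that one watch per resource type feeds a shared in-memory cache and per-controller work queues, instead of every controller opening its own watch or polling the API server.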

Shell
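# Watch pods; --output-watch-events wraps each object in its ADDED/MODIFIED/DELETED event
kubectl get pods --watch --output-watch-events -o json

# Inspect audit log (if enabled; the path is set by the API server's --audit-log-path flag)
jq . /var/log/kubernetes/audit.log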

# Check API server health
kubectl get --raw /healthz
kubectl get --raw /readyz
kubectl get --raw /livez
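
To make the event stream concrete, here is a minimal Python sketch of how an informer-style cache folds watch events into local state. It assumes the newline-delimited JSON framing used by the raw watch endpoint (for example GET /api/v1/namespaces/default/pods?watch=true); the sample events and the helper name apply_watch_events are made up for illustration, and a real informer also handles re-listing, resync, and "410 Gone" errors.

```python
import json

# Hand-written sample of a watch stream: one JSON event per line.
STREAM = """\
{"type": "ADDED", "object": {"metadata": {"namespace": "default", "name": "web-1", "resourceVersion": "101"}, "status": {"phase": "Pending"}}}
{"type": "MODIFIED", "object": {"metadata": {"namespace": "default", "name": "web-1", "resourceVersion": "105"}, "status": {"phase": "Running"}}}
{"type": "ADDED", "object": {"metadata": {"namespace": "default", "name": "web-2", "resourceVersion": "107"}, "status": {"phase": "Pending"}}}
{"type": "DELETED", "object": {"metadata": {"namespace": "default", "name": "web-2", "resourceVersion": "110"}, "status": {"phase": "Pending"}}}
"""


def apply_watch_events(lines, cache):
    """Fold watch events into a cache keyed by namespace/name.

    Returns the last resourceVersion seen; a client passes it back as
    ?resourceVersion=... to resume the watch after a disconnect.
    """
    last_rv = None
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        etype, obj = event["type"], event["object"]
        if etype == "ERROR":
            # e.g. 410 Gone (resourceVersion too old): a real informer re-lists
            raise RuntimeError(obj.get("message", "watch error"))
        meta = obj["metadata"]
        last_rv = meta["resourceVersion"]
        if etype == "BOOKMARK":
            continue   # progress marker only; no object changed
        key = f'{meta["namespace"]}/{meta["name"]}'
        if etype in ("ADDED", "MODIFIED"):
            cache[key] = obj
        elif etype == "DELETED":
            cache.pop(key, None)
    return last_rv


cache = {}
rv = apply_watch_events(STREAM.splitlines(), cache)
print(sorted(cache), cache["default/web-1"]["status"]["phase"], rv)   # ['default/web-1'] Running 110
```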

Admission controllers

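Admission controllers sit between authorization and persistence. Mutating admission plugins (including mutating webhooks) run first and can modify the object; after schema validation, validating plugins (including validating webhooks) run and can only accept or reject. Important built-in controllers: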

NamespaceLifecycle

Prevents creating objects in terminating namespaces and protects system namespaces from deletion.

LimitRanger

Applies default resource requests/limits when not set, enforces LimitRange constraints.

ServiceAccount

Automatically injects the default ServiceAccount and mounts its token into pods.

ResourceQuota

Rejects objects that would exceed namespace resource quotas.
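
The split between mutating and validating webhooks shows up directly in the AdmissionReview (admission.k8s.io/v1) response each webhook returns. The Python sketch below covers only the decision logic, not the HTTPS server the API server calls; both policies (defaulting a team label, rejecting privileged containers) are hypothetical examples. A mutating webhook answers with a base64-encoded JSON Patch, while a validating webhook can only set allowed.

```python
import base64
import json


def review_response(uid, allowed, message=None, patch=None):
    """Build an admission.k8s.io/v1 AdmissionReview response; uid must echo the request."""
    response = {"uid": uid, "allowed": allowed}
    if message:
        response["status"] = {"code": 403, "message": message}
    if patch:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


def mutate(review):
    """Mutating webhook (hypothetical policy): default the label team=unassigned."""
    req = review["request"]
    labels = req["object"]["metadata"].get("labels")
    if labels is None:
        patch = [{"op": "add", "path": "/metadata/labels", "value": {"team": "unassigned"}}]
    elif "team" not in labels:
        patch = [{"op": "add", "path": "/metadata/labels/team", "value": "unassigned"}]
    else:
        patch = None   # nothing to change
    return review_response(req["uid"], True, patch=patch)


def validate(review):
    """Validating webhook (hypothetical policy): reject privileged containers."""
    req = review["request"]
    for c in req["object"]["spec"].get("containers", []):
        if (c.get("securityContext") or {}).get("privileged"):
            return review_response(req["uid"], False, f"container {c['name']} must not be privileged")
    return review_response(req["uid"], True)


review = {"request": {"uid": "705ab4f5", "object": {
    "metadata": {"name": "web"},
    "spec": {"containers": [{"name": "app", "securityContext": {"privileged": True}}]}}}}
patch = json.loads(base64.b64decode(mutate(review)["response"]["patch"]))
print(patch)                                     # [{'op': 'add', 'path': '/metadata/labels', 'value': {'team': 'unassigned'}}]
print(validate(review)["response"]["allowed"])   # False
```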

etcd (Control Plane)

etcd is a strongly consistent, distributed key-value store based on the Raft consensus algorithm. It is the single source of truth for all cluster state. Losing etcd without a backup means losing the cluster.

Raft basics

Raft requires a quorum of floor(n/2)+1 members to commit a write. A 3-member etcd cluster tolerates 1 failure; a 5-member cluster tolerates 2. Quorum is needed for writes, leader election, and etcd's default linearizable reads; serializable reads (etcdctl get --consistency=s) are answered locally by any member and may be stale.
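
The quorum arithmetic is easy to tabulate. This short Python sketch computes floor(n/2)+1 and the resulting fault tolerance, and makes the usual sizing advice visible: an even-sized cluster needs a larger quorum without tolerating any extra failures.

```python
def quorum(members: int) -> int:
    """Votes needed to commit a write or elect a leader: floor(n/2) + 1."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a quorum is still reachable."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
# 3 members: quorum 2, tolerates 1; 4 members: quorum 3, still tolerates only 1;
# 5 members: quorum 3, tolerates 2
```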

etcd storage layout in Kubernetes

Shell
# Keys are prefixed by resource path, e.g.:
#   /registry/pods/default/my-pod
#   /registry/deployments/production/my-app
#   /registry/services/endpoints/kube-system/kube-dns

# Read a key directly (requires etcdctl and certs)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry/pods/default/my-pod | strings | head -50

# List all Kubernetes keys (same --endpoints/--cacert/--cert/--key flags as above)
etcdctl get /registry/ --prefix --keys-only

Backup and restore

Shell
# Snapshot backup
SNAPSHOT=/backup/etcd-snapshot-$(date +%F).db
etcdctl snapshot save "$SNAPSHOT"

# Verify snapshot (etcd 3.5+ also provides etcdutl snapshot status)
etcdctl snapshot status "$SNAPSHOT" --write-out=table

# Restore into a new data directory while etcd is stopped, then point etcd's --data-dir at it
etcdctl snapshot restore /backup/etcd-snapshot-<date>.db \
  --data-dir=/var/lib/etcd-restored \
  --name=etcd-node1 \
  --initial-cluster=etcd-node1=https://10.0.0.1:2380 \
  --initial-advertise-peer-urls=https://10.0.0.1:2380

Compaction and defragmentation

Shell
# Get current revision
etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision'

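# Compact to current revision (removes old revisions).
# kube-apiserver already compacts periodically (--etcd-compaction-interval, default 5m).
etcdctl compact $(etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision')

# Defragment (reclaims disk space after compaction). It blocks the member while it runs,
# so defragment one member at a time.
etcdctl defrag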

kube-scheduler (Control Plane)

The scheduler watches for pods with spec.nodeName == "" and assigns them to nodes. Scheduling is a two-phase process: filtering (eliminate ineligible nodes) then scoring (rank remaining nodes).

Scheduling pipeline

Flow
New pod (nodeName="") detected
  │
  ▼
Filter plugins — eliminate nodes that cannot run the pod
  ├─ NodeUnschedulable      (node.spec.unschedulable set, e.g. by kubectl cordon)
  ├─ NodeResourcesFit       (insufficient CPU/memory)
  ├─ NodeAffinity           (nodeSelector, nodeAffinity)
  ├─ TaintToleration        (pod must tolerate node taints)
  ├─ PodTopologySpread      (spread constraints)
  └─ VolumeBinding          (PVCs that need specific node topology)
  │
  ▼
Score plugins — rank feasible nodes (0-100 each, weighted sum)
  ├─ NodeResourcesBalancedAllocation   (prefer balanced CPU/mem usage)
  ├─ NodeResourcesFit/LeastAllocated   (default scoring strategy: prefer nodes with most free resources)
  ├─ ImageLocality                     (prefer nodes with image already pulled)
  └─ InterPodAffinity                  (prefer nodes satisfying pod affinity)
  │
  ▼
Select highest-score node → write Binding object → kubelet picks it up
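
The filter-then-score pipeline can be sketched as plain functions. This Python sketch is a deliberate simplification, not kube-scheduler's code: it models only CPU and memory requests, the unschedulable flag, and NoSchedule/NoExecute taints matched as exact "key=value:Effect" strings. The two scorers approximate LeastAllocated and BalancedAllocation with equal weights, and ties are broken by node name rather than randomly.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: int                 # allocatable millicores
    mem: int                 # allocatable MiB
    cpu_used: int = 0        # sum of requests of pods already bound here
    mem_used: int = 0
    unschedulable: bool = False
    taints: set = field(default_factory=set)        # e.g. {"dedicated=gpu:NoSchedule"}


@dataclass
class Pod:
    name: str
    cpu: int
    mem: int
    tolerations: set = field(default_factory=set)   # same string format as taints


# Filter plugins: return True if the node can still run the pod.
def node_unschedulable(pod, node):
    return not node.unschedulable


def resources_fit(pod, node):
    return node.cpu_used + pod.cpu <= node.cpu and node.mem_used + pod.mem <= node.mem


def taint_toleration(pod, node):
    hard = (t for t in node.taints if t.endswith((":NoSchedule", ":NoExecute")))
    return all(t in pod.tolerations for t in hard)


FILTERS = [node_unschedulable, resources_fit, taint_toleration]


# Score plugins: return 0..100; results are combined as a weighted sum.
def least_allocated(pod, node):
    cpu_free = (node.cpu - node.cpu_used - pod.cpu) / node.cpu
    mem_free = (node.mem - node.mem_used - pod.mem) / node.mem
    return 100 * (cpu_free + mem_free) / 2


def balanced_allocation(pod, node):
    cpu_frac = (node.cpu_used + pod.cpu) / node.cpu
    mem_frac = (node.mem_used + pod.mem) / node.mem
    return 100 * (1 - abs(cpu_frac - mem_frac))


SCORERS = [(least_allocated, 1), (balanced_allocation, 1)]


def schedule(pod, nodes):
    feasible = [n for n in nodes if all(f(pod, n) for f in FILTERS)]
    if not feasible:
        return None   # kube-scheduler would record FailedScheduling and may try preemption
    total = {n.name: sum(w * score(pod, n) for score, w in SCORERS) for n in feasible}
    return max(feasible, key=lambda n: (total[n.name], n.name)).name


nodes = [
    Node("node-a", 4000, 8192, cpu_used=3500, mem_used=1024),   # not enough CPU left
    Node("node-b", 4000, 8192, cpu_used=1000, mem_used=2048),
    Node("node-c", 8000, 16384, unschedulable=True),             # cordoned
    Node("node-d", 8000, 16384, taints={"dedicated=gpu:NoSchedule"}),
]
print(schedule(Pod("web", 1000, 1024), nodes))                                               # node-b
print(schedule(Pod("train", 1000, 1024, tolerations={"dedicated=gpu:NoSchedule"}), nodes))   # node-d
```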

Taints and tolerations

Shell
# Add a taint to a node
kubectl taint nodes node1 dedicated=gpu:NoSchedule
kubectl taint nodes node1 dedicated=gpu:NoExecute

# Remove a taint
kubectl taint nodes node1 dedicated=gpu:NoSchedule-

YAML
# Pod toleration (under the pod spec)
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"   # tolerates only the NoSchedule taint; omit effect to match NoExecute too
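
The matching rule behind a toleration is small enough to write out. This Python sketch follows the documented semantics (operator defaults to Equal, Exists ignores the value, an empty effect matches every effect, an empty key with Exists matches every key); the function names are illustrative. It shows why a toleration that names effect NoSchedule still leaves the pod exposed to the NoExecute taint added in the shell example.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False                      # empty effect matches all effects
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False                      # empty key (with Exists) matches all keys
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True                       # value is ignored
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def untolerated(tolerations, taints):
    """Taints no toleration matches (NoSchedule blocks scheduling, NoExecute evicts)."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


taints = [
    {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"},
    {"key": "dedicated", "value": "gpu", "effect": "NoExecute"},
]
only_noschedule = {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
any_effect = {"key": "dedicated", "operator": "Equal", "value": "gpu"}
print(untolerated([only_noschedule], taints))   # the NoExecute taint is still untolerated
print(untolerated([any_effect], taints))        # []
```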

Pod affinity and anti-affinity

YAML
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: [us-east-1a, us-east-1b]
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: my-app
      topologyKey: kubernetes.io/hostname   # one pod per host

Preemption

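If a high-priority pod cannot be scheduled, the scheduler may evict (preempt) lower-priority pods to make room. A pod's priority comes from its priorityClassName; the scheduler looks for a node where evicting lower-priority pods would let the pending pod fit, records that node in the pod's status.nominatedNodeName, and the victims are terminated gracefully before the pending pod is bound.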

YAML
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority   # or Never
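
Victim selection can be illustrated with a simplified Python sketch that tracks CPU only. It follows the general idea (remove every lower-priority pod, then reprieve as many as possible starting from the highest priority) and picks the node whose worst victim has the lowest priority, then the fewest victims. The real scheduler also weighs PodDisruptionBudgets, all resource types, and further tie-breakers, so RunningPod, victims_on_node, and pick_node are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class RunningPod:
    name: str
    priority: int
    cpu: int   # requested millicores


def victims_on_node(capacity, pods, pending_priority, pending_cpu):
    """Victim set found by reprieving, or None if preemption cannot help on this node."""
    keep = [p for p in pods if p.priority >= pending_priority]
    candidates = sorted((p for p in pods if p.priority < pending_priority),
                        key=lambda p: p.priority, reverse=True)
    used = sum(p.cpu for p in keep)
    if used + pending_cpu > capacity:
        return None                     # evicting every lower-priority pod is still not enough
    victims = []
    for p in candidates:                # highest priority first
        if used + p.cpu + pending_cpu <= capacity:
            used += p.cpu               # reprieved: this pod can stay
        else:
            victims.append(p)
    return victims


def pick_node(nodes, pending_priority, pending_cpu):
    """nodes: {name: (cpu_capacity, [RunningPod, ...])} -> (node, victim names) or None."""
    options = []
    for name, (capacity, pods) in nodes.items():
        victims = victims_on_node(capacity, pods, pending_priority, pending_cpu)
        if victims is not None:
            worst = max((v.priority for v in victims), default=-1)
            options.append((worst, len(victims), name, [v.name for v in victims]))
    if not options:
        return None
    _, _, name, victim_names = min(options)
    return name, victim_names


nodes = {
    "node-1": (4000, [RunningPod("batch-a", 100, 2000), RunningPod("batch-b", 100, 1500)]),
    "node-2": (4000, [RunningPod("api", 1000, 2000), RunningPod("cache", 500, 1500)]),
}
print(pick_node(nodes, pending_priority=1_000_000, pending_cpu=2000))   # ('node-1', ['batch-b'])
```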

kube-controller-manager (Control Plane)

A collection of reconciliation loops in one binary. Each controller watches specific resources, compares current state to desired state, and takes corrective action. Controllers use informers (shared cache + event handlers) and work queues to avoid overwhelming the API server.

Reconciliation loop pattern

Pseudocode
loop:
  desired = read desired state from API object
  actual  = observe current state (list pods, check nodes, …)
  diff    = desired - actual
  if diff != empty:
    apply changes (create/update/delete resources)
  sleep resyncPeriod (or wait for watch event)
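
Here is the same loop made concrete as a single ReplicaSet-style reconcile pass in Python. It is a sketch: pods are plain dicts, matching is by exact labels, and the deletion ranking (non-Running pods first) is a simplification of the real controller's ordering. Because each pass recomputes the diff from the full observed state, calling it again after it has converged is a no-op, which is the level-triggered property described earlier.

```python
import itertools

_next_id = itertools.count(1)   # hypothetical pod-name suffix generator


def reconcile_replicaset(rs, pods):
    """One reconcile pass. rs: {"name", "replicas", "selector"}; pods: mutable list of dicts.

    Returns the actions taken so the result is easy to inspect.
    """
    def selected(pod):
        return all(pod["labels"].get(k) == v for k, v in rs["selector"].items())

    # observe: active pods owned by this ReplicaSet
    owned = [p for p in pods if selected(p) and p["phase"] not in ("Succeeded", "Failed")]
    diff = rs["replicas"] - len(owned)
    actions = []
    for _ in range(diff):                                   # too few: create
        pod = {"name": f'{rs["name"]}-{next(_next_id)}', "labels": dict(rs["selector"]), "phase": "Pending"}
        pods.append(pod)
        actions.append(("create", pod["name"]))
    surplus = sorted(owned, key=lambda p: p["phase"] == "Running")[:max(0, -diff)]
    for pod in surplus:                                     # too many: delete non-Running first
        pods.remove(pod)
        actions.append(("delete", pod["name"]))
    return actions


rs = {"name": "web", "replicas": 3, "selector": {"app": "web"}}
pods = [{"name": "web-old", "labels": {"app": "web"}, "phase": "Running"},
        {"name": "db-0", "labels": {"app": "db"}, "phase": "Running"}]
first = reconcile_replicaset(rs, pods)    # [('create', 'web-1'), ('create', 'web-2')]
second = reconcile_replicaset(rs, pods)   # [] (already converged)
rs["replicas"] = 1
third = reconcile_replicaset(rs, pods)    # [('delete', 'web-1'), ('delete', 'web-2')]
```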

Key built-in controllers

Controller | Watches | Action
--- | --- | ---
ReplicaSet | ReplicaSet, Pod | Creates/deletes pods to match spec.replicas
Deployment | Deployment, ReplicaSet | Creates/scales ReplicaSets for rolling updates
StatefulSet | StatefulSet, Pod | Manages ordered pod creation with stable identities
DaemonSet | DaemonSet, Node, Pod | Ensures one pod per (selected) node
Job / CronJob | Job, Pod (CronJob: CronJob, Job) | Runs pods to completion; CronJob schedules Jobs
Node (lifecycle) | Node | Marks nodes Unknown when kubelet heartbeats stop; taints them NoExecute so pods are evicted after their tolerationSeconds (300s by default)
Endpoints / EndpointSlice | Service, Pod | Keeps Endpoints and EndpointSlice objects in sync with ready pod IPs
Namespace | Namespace | Cleans up resources in terminating namespaces

Informer / work-queue internals

Flow
API Server watch stream
  │ ADDED/MODIFIED/DELETED events
  ▼
Informer (per resource type, shared across controllers)
  ├─ ThreadSafeStore (in-memory cache — avoids API calls for reads)
  └─ Event handlers → enqueue object key (namespace/name)
        │
        ▼
     Work Queue (rate-limited, deduplicating)
        │
        ▼
     Worker goroutine → Reconcile(key)
        └─ reads from cache (lister)
        └─ calls API server only for writes
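
The deduplication rules are what keep a key from being processed by two workers at once. The Python class below sketches the semantics of client-go's basic workqueue (its queue, dirty, and processing sets) without rate limiting or locking; the method names loosely mirror Add/Get/Done but this is an illustration, not the Go API.

```python
from collections import deque


class WorkQueue:
    """Dedup semantics of client-go's basic workqueue (no rate limiting, no locking)."""

    def __init__(self):
        self.queue = deque()      # keys waiting for a worker, FIFO
        self.dirty = set()        # keys that need (re)processing
        self.processing = set()   # keys currently held by a worker

    def add(self, key):
        if key in self.dirty:
            return                # already pending: duplicate events collapse
        self.dirty.add(key)
        if key in self.processing:
            return                # being processed: done() will re-queue it
        self.queue.append(key)

    def get(self):
        key = self.queue.popleft()
        self.processing.add(key)
        self.dirty.discard(key)
        return key

    def done(self, key):
        self.processing.discard(key)
        if key in self.dirty:
            self.queue.append(key)   # changed while processing: reconcile again


q = WorkQueue()
for _ in range(3):
    q.add("default/web")          # three events for the same object
key = q.get()                     # a worker takes it; the queue is now empty
q.add("default/web")              # the object changes again while being processed
print(len(q.queue))               # 0: not handed to a second worker
q.done(key)                       # worker finishes
print(list(q.queue))              # ['default/web']: processed once more
```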

kubelet (Node)

The kubelet is the primary node agent. It watches for pods assigned to its node via the API server (and optionally static pod manifests), then drives the container runtime via CRI to converge actual pod state toward desired state.

Pod lifecycle (kubelet perspective)

Flow
Pod assigned to node (spec.nodeName set by scheduler)
  │
  ▼
Admit pod: check resources, enforce policies (CPU manager, topology manager)
  │
  ▼
Attach and mount volumes (CSI NodeStageVolume / NodePublishVolume, emptyDir, configMap, etc.)
  │
  ▼
Create sandbox via CRI (pause container holds the network namespace;
the runtime calls the CNI plugin to set up the pod network and IP)
  │
  ▼
Run init containers sequentially (pull image, create, start; each must succeed before the next)
  │
  ▼
Start app containers in spec order (pull image via CRI image service, create, start)
  │
  ├─ Execute postStart hook (if defined)
  ├─ Start startup / liveness / readiness probes (startup probe gates the other two)
  └─ Report status back to API server
  │
  ▼ (on termination)
Run preStop hook (if defined) → send SIGTERM → SIGKILL once terminationGracePeriodSeconds
(which includes the preStop time) has elapsed

CRI (Container Runtime Interface)

The kubelet communicates with the container runtime via gRPC over a Unix socket. CRI separates the kubelet from runtime-specific code.

Shell
# Inspect containers via crictl (bypasses kubelet)
crictl ps                        # list running containers
crictl pods                      # list pods
crictl inspect <container-id>   # full container state
crictl logs <container-id>
crictl exec -it <container-id> sh

# Check which runtime is in use
kubectl get node -o wide         # CONTAINER-RUNTIME column
grep containerRuntimeEndpoint /var/lib/kubelet/config.yaml   # older kubelets: --container-runtime-endpoint flag

Node conditions and taints

Condition | Meaning | Auto-taint added
--- | --- | ---
Ready=True | kubelet healthy, node can accept pods | (none)
MemoryPressure=True | Node is low on memory | node.kubernetes.io/memory-pressure:NoSchedule
DiskPressure=True | Node disk is nearly full | node.kubernetes.io/disk-pressure:NoSchedule
PIDPressure=True | Too many processes on node | node.kubernetes.io/pid-pressure:NoSchedule
Ready=False | kubelet reports the node as unhealthy | node.kubernetes.io/not-ready:NoExecute
Ready=Unknown | Node controller lost contact with the kubelet (heartbeat timeout) | node.kubernetes.io/unreachable:NoExecute

Static pods

Shell
# Static pod manifests (control plane components on kubeadm clusters)
ls /etc/kubernetes/manifests/
# kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml  etcd.yaml

# kubelet watches this directory (staticPodPath in the kubelet config); changes are picked up within seconds
# Static pods are mirrored as read-only objects in the API server

kube-proxy (Node)

kube-proxy implements the Service abstraction by programming kernel networking rules. It watches Services and EndpointSlices and translates ClusterIP VIPs into real pod IPs.

Modes

Mode | Mechanism | Notes
--- | --- | ---
iptables (default) | DNAT rules in iptables PREROUTING/OUTPUT chains. Selects a backend pod randomly per connection. | O(n) rule lookup; performance degrades with many Services.
ipvs | Linux IPVS (LVS) in kernel netfilter. Hash-table lookup, O(1). Supports LB algorithms: rr, lc, sh, sed, nq. | Requires ipvs kernel modules. Better at scale.
nftables | nftables sets for lookup (alpha in Kubernetes 1.29, beta in 1.31). | Modern replacement for iptables mode.

How iptables DNAT works for ClusterIP

Shell
# Inspect kube-proxy iptables rules
iptables -t nat -L KUBE-SERVICES -n --line-numbers
iptables -t nat -L KUBE-SVC-<hash> -n       # per-service chain
iptables -t nat -L KUBE-SEP-<hash> -n       # per-endpoint chain (DNAT)

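# Example flow for ClusterIP 10.96.0.10:80 with 3 endpoints. Rules are tried in order,
# so the match probability rises to keep the overall split even:
# KUBE-SERVICES → KUBE-SVC-xxx
#   ├─ statistic --probability 0.333 → KUBE-SEP-1 (DNAT pod1:8080)
#   ├─ statistic --probability 0.5   → KUBE-SEP-2 (DNAT pod2:8080)
#   └─ (no statistic match)          → KUBE-SEP-3 (DNAT pod3:8080)

# Check kube-proxy mode (kubeadm clusters; an empty mode means iptables)
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode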
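
The per-rule probabilities follow from the rules being evaluated in order: rule i of n (counting from 0) only sees the connections the earlier rules did not take, so it matches with probability 1/(n-i). A short Python check with exact fractions confirms that this yields an even split; the function names are illustrative.

```python
from fractions import Fraction


def kube_svc_probabilities(n):
    """Match probability of each rule in a KUBE-SVC chain with n endpoints (last rule: 1)."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(probs):
    """Overall fraction of new connections each endpoint receives."""
    shares, remaining = [], Fraction(1)
    for p in probs:
        shares.append(remaining * p)   # reaches this rule, then matches it
        remaining *= 1 - p             # falls through to the next rule
    return shares


probs = kube_svc_probabilities(3)
print([str(p) for p in probs])                    # ['1/3', '1/2', '1']
print([str(s) for s in effective_share(probs)])   # ['1/3', '1/3', '1/3']
```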

Service types

Type | How it works
--- | ---
ClusterIP | Virtual IP reachable only within the cluster. kube-proxy programs DNAT rules on every node.
NodePort | Opens a port (default range 30000–32767) on every node. Traffic → NodePort → ClusterIP → pod. External access may require firewall rules.
LoadBalancer | Provisions a cloud load balancer via cloud-controller-manager and gets an external IP. Also allocates a NodePort by default.
ExternalName | Returns a CNAME for an external DNS name. No proxying; DNS only.
Headless (clusterIP: None) | DNS returns individual pod IPs directly. Used by StatefulSets for stable DNS names per pod.

Networking

Kubernetes networking follows three flat-network rules: every pod gets a unique IP, pods can communicate with any other pod without NAT, and nodes can communicate with pods without NAT. CNI plugins implement the pod network.

Kubernetes networking model

Diagram
Node A (10.0.0.1)                     Node B (10.0.0.2)
├─ eth0: 10.0.0.1                     ├─ eth0: 10.0.0.2
├─ cni0 bridge: 10.244.0.1/24         ├─ cni0 bridge: 10.244.1.1/24
│   ├─ veth → Pod A1: 10.244.0.2      │   ├─ veth → Pod B1: 10.244.1.2
│   └─ veth → Pod A2: 10.244.0.3      │   └─ veth → Pod B2: 10.244.1.3
│                                     │
└─── Overlay / BGP routes ────────────┘
     (flannel VXLAN, Calico BGP, Cilium eBPF, …)

CNI plugin comparison

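Plugin | Dataplane | NetworkPolicy | Notes
--- | --- | --- | ---
Flannel | VXLAN overlay | No (pair with Calico, as in Canal, or kube-router) | Simple; good for learning and small clusters.
Calico | BGP routes (no overlay), VXLAN/IP-in-IP, or eBPF | Yes, plus extended Calico policies | Popular in production; uses BIRD to distribute BGP routes.
Cilium | eBPF (can run without kube-proxy) | Yes, plus L7 HTTP/gRPC | Strong performance and observability (Hubble); can replace kube-proxy.
Weave | VXLAN + fast datapath | Yes | Easy setup; encryption is optional (requires a password); project no longer actively maintained.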

DNS (CoreDNS)

Shell
# DNS name formats
#   my-svc.my-ns.svc.cluster.local          ClusterIP service
#   10-244-1-5.my-ns.pod.cluster.local      pod A record (pod IP with dots replaced by dashes)
#   pod-0.my-svc.my-ns.svc.cluster.local    StatefulSet pod via headless service

# Debug DNS from a pod
kubectl run -it --rm dnstest --image=busybox --restart=Never -- sh
/ # nslookup kubernetes.default.svc.cluster.local
/ # cat /etc/resolv.conf

# Check CoreDNS config
kubectl -n kube-system get configmap coredns -o yaml
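
Two details behind these names are easy to get wrong: how a pod's A record is derived from its IP, and which names the resolver actually queries. The Python sketch below assumes a typical pod /etc/resolv.conf (search my-ns.svc.cluster.local svc.cluster.local cluster.local, options ndots:5) and glibc-style search behaviour; real pods may also inherit extra search domains from the node, and the helper names are illustrative.

```python
def pod_a_record(ip: str, namespace: str, domain: str = "cluster.local") -> str:
    """Pod A record: the pod IP with dots replaced by dashes."""
    return f"{ip.replace('.', '-')}.{namespace}.pod.{domain}"


def query_order(name: str, namespace: str, domain: str = "cluster.local", ndots: int = 5) -> list:
    """Names tried, in order, for a lookup made from a pod in `namespace`."""
    if name.endswith("."):
        return [name]                                   # fully qualified: search list skipped
    search = [f"{namespace}.svc.{domain}", f"svc.{domain}", domain]
    expanded = [f"{name}.{suffix}." for suffix in search]
    if name.count(".") >= ndots:
        return [name + "."] + expanded                  # enough dots: try the name as-is first
    return expanded + [name + "."]


print(pod_a_record("10.244.1.5", "my-ns"))          # 10-244-1-5.my-ns.pod.cluster.local
print(query_order("my-svc", "my-ns")[0])            # my-svc.my-ns.svc.cluster.local.
print(len(query_order("example.com", "my-ns")))     # 4: three search-domain queries before example.com.
```

This is why external lookups from pods can generate several extra queries, and why a trailing dot (or a lower ndots) avoids them.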

NetworkPolicy

YAML
# Default deny all ingress in a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}        # select all pods
  policyTypes: [Ingress]
---
# Allow ingress only from app=frontend pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 8080
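
NetworkPolicies are additive: a pod that no policy selects accepts all ingress, and once any policy selects it for Ingress, only traffic matched by at least one rule is allowed. The Python sketch below evaluates the two policies above under that rule. It is simplified and hypothetical: only matchLabels pod selectors within one namespace and numeric ports are handled (no namespaceSelector, ipBlock, named ports, or protocols).

```python
def selects(selector: dict, labels: dict) -> bool:
    """matchLabels-only selector; an empty selector ({}) selects every pod."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def ingress_allowed(policies, dst_labels, src_labels, port):
    # policyTypes defaults to Ingress (plus Egress when egress rules are present)
    isolating = [p for p in policies
                 if selects(p["spec"]["podSelector"], dst_labels)
                 and "Ingress" in p["spec"].get("policyTypes", ["Ingress"])]
    if not isolating:
        return True                                   # no policy selects the pod: allow all
    for policy in isolating:                          # union of all matching rules
        for rule in policy["spec"].get("ingress", []):
            peers, ports = rule.get("from", []), rule.get("ports", [])
            peer_ok = not peers or any("podSelector" in peer and selects(peer["podSelector"], src_labels)
                                       for peer in peers)
            port_ok = not ports or any(p.get("port") == port for p in ports)
            if peer_ok and port_ok:
                return True
    return False


default_deny = {"spec": {"podSelector": {}, "policyTypes": ["Ingress"]}}
allow_frontend = {"spec": {
    "podSelector": {"matchLabels": {"app": "backend"}},
    "ingress": [{"from": [{"podSelector": {"matchLabels": {"app": "frontend"}}}],
                 "ports": [{"port": 8080}]}],
}}
policies = [default_deny, allow_frontend]
print(ingress_allowed(policies, {"app": "backend"}, {"app": "frontend"}, 8080))   # True
print(ingress_allowed(policies, {"app": "backend"}, {"app": "frontend"}, 9090))   # False
print(ingress_allowed(policies, {"app": "frontend"}, {"app": "backend"}, 80))     # False (default deny)
```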

Storage

Kubernetes storage is abstracted through PersistentVolumes (PVs), PersistentVolumeClaims (PVCs), and StorageClasses. The Container Storage Interface (CSI) is the standard plugin mechanism for storage drivers.

PV / PVC lifecycle

Flow
StorageClass (provisioner, parameters, reclaimPolicy)
  │
  │ (dynamic provisioning)
  ▼
PersistentVolumeClaim created by user
  │  storageClassName: fast-ssd
  │  storage: 10Gi
  │  accessModes: [ReadWriteOnce]
  │
  ▼  (external-provisioner sidecar watches PVCs)
CSI driver called: CreateVolume RPC
  │
  ▼
PersistentVolume created (Bound to PVC)
  │
  ▼
Pod using the PVC is scheduled → external-attacher calls ControllerPublishVolume (if the driver supports attach)
  │
  ▼
kubelet calls NodeStageVolume + NodePublishVolume
  │
  ▼
Volume mounted at pod path

On PVC delete: ReclaimPolicy
  Retain  → PV remains (manual cleanup)
  Delete  → CSI DeleteVolume called, PV deleted

Access modes

Mode | Abbreviation | Meaning
--- | --- | ---
ReadWriteOnce | RWO | One node mounts read-write (pods on that node can share it). Most block storage (EBS, PD, Azure Disk).
ReadOnlyMany | ROX | Many nodes mount read-only.
ReadWriteMany | RWX | Many nodes mount read-write. Requires NFS, CephFS, or similar.
ReadWriteOncePod | RWOP | A single pod mounts read-write (alpha in Kubernetes 1.22, GA in 1.29; CSI volumes only).

CSI architecture

Flow
kubelet                     CSI driver pod
  │                           │
  │  NodeStageVolume (gRPC)   │  ← format + mount to staging path
  │──────────────────────────►│
  │  NodePublishVolume        │  ← bind-mount staging → pod path
  │──────────────────────────►│

External controller sidecars (run alongside the driver's controller component):
  provisioner  → CreateVolume / DeleteVolume
  attacher     → ControllerPublishVolume / ControllerUnpublishVolume
  resizer      → ControllerExpandVolume

Volume types quick reference

emptyDir

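Ephemeral directory in the node's scratch space (or in memory with medium: Memory). Shared between containers in a pod; survives container restarts and is deleted when the pod is removed from the node.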

hostPath

Mounts a node path into the pod. Avoid in production — ties pod to a specific node and risks host access.

configMap / secret

Projected as files (volume) or env vars (env/envFrom). ConfigMap for config data, Secret for sensitive data. Secret values are only base64-encoded in the API and are stored unencrypted in etcd by default; enable encryption at rest.

projected

Combines multiple volume sources (ServiceAccount token, ConfigMap, Secret, downwardAPI) into a single directory.

Workloads

Kubernetes workload controllers manage sets of pods. Understanding how each controller updates pods is critical for operating stateful and stateless services safely.

Deployment rolling update internals

Flow
kubectl set image deployment/my-app container=image:v2
  │
  ▼
Deployment controller creates new ReplicaSet (RS-v2, replicas=0)
  │
  ▼
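Scale RS-v2 up by 1  (example maxSurge=1: total may go 1 above desired)
  │
  ▼
Wait for new pod Ready
  │
  ▼
Scale RS-v1 down by 1  (example maxUnavailable=0: available never drops below desired)
  │
  ▼
Repeat until RS-v2 = desired, RS-v1 = 0
  │
  ▼
Old RS kept (scale=0) for rollback history (revisionHistoryLimit, default 10)

Defaults are maxSurge=25% (rounded up) and maxUnavailable=25% (rounded down); with
maxUnavailable > 0 the controller may scale RS-v1 down before new pods are Ready.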
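
The interplay of maxSurge and maxUnavailable can be simulated in a few lines of Python. The sketch assumes new pods become Ready immediately (so it shows the bounds, not real timing) and encodes the documented rounding: a percentage maxSurge rounds up, a percentage maxUnavailable rounds down. rollout_steps and its output format are illustrative.

```python
import math


def resolve(value, replicas, round_up):
    """Convert an int or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_steps(replicas, max_surge="25%", max_unavailable="25%"):
    """List of (old_rs, new_rs) sizes, assuming every new pod is Ready at once."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # scale up: total pods may not exceed replicas + surge
        new = min(replicas, new + max(0, replicas + surge - (old + new)))
        # scale down: available pods may not fall below replicas - unavailable
        old = max(0, old - max(0, old + new - (replicas - unavailable)))
        steps.append((old, new))
    return steps


print(rollout_steps(4, max_surge=1, max_unavailable=0))   # [(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]
print(rollout_steps(10))                                  # [(10, 0), (5, 3), (0, 8), (0, 10)]
```

With maxUnavailable=0 the rollout proceeds one pod at a time, while the 25%/25% defaults on 10 replicas (surge 3, unavailable 2) move in much larger steps.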

StatefulSet guarantees

Property | Behaviour
--- | ---
Stable network identity | Pod name is <name>-<ordinal>. DNS: <pod>.<headless-svc>.<ns>.svc.cluster.local. Survives rescheduling.
Stable storage | volumeClaimTemplates create a PVC per pod. The PVC is not deleted when the pod is deleted.
Ordered creation | Pods created 0 → N−1. Each must be Running and Ready before the next is created.
Ordered deletion | Pods deleted N−1 → 0 (reverse order) by default (OrderedReady policy).
Parallel policy | podManagementPolicy: Parallel creates/deletes all pods simultaneously (updates still ordered).

DaemonSet scheduling

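Since Kubernetes 1.12 (GA in 1.17), DaemonSet pods are placed by the default scheduler: the DaemonSet controller creates each pod with a node affinity term matching the target node's name (matchFields on metadata.name), and kube-scheduler binds it. The controller also adds tolerations automatically (for example node.kubernetes.io/unschedulable:NoSchedule and node.kubernetes.io/not-ready:NoExecute), so DaemonSet pods still run on cordoned or not-ready nodes; control plane nodes additionally require a toleration for the node-role.kubernetes.io/control-plane taint.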

HorizontalPodAutoscaler internals

Flow
HPA controller (in controller-manager) runs every --horizontal-pod-autoscaler-sync-period (15s)
  │
  ├─ Queries metrics-server (or custom/external metrics adapter)
  │    currentMetricValue = average CPU utilization across pods (% of requests)
  │
  ├─ desiredReplicas = ceil(currentReplicas × (currentValue / targetValue))
  │    (skipped if the ratio is within --horizontal-pod-autoscaler-tolerance, default 0.1)
  │
  └─ Clamps desiredReplicas to [minReplicas, maxReplicas] and updates the target's scale subresource
       Downscale stabilization: --horizontal-pod-autoscaler-downscale-stabilization (5m default)
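
The core replica calculation can be written as a small Python function. This sketch covers a single metric with the default 0.1 tolerance and the min/max clamp only; the real controller also applies stabilization windows, scaling policies, and special handling for missing metrics and not-yet-ready pods. The function name is illustrative.

```python
import math


def hpa_desired_replicas(current_replicas, current_value, target_value,
                         min_replicas, max_replicas, tolerance=0.1):
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas                      # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)   # ceil(current × current/target)
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, 90, 60, 2, 10))    # 6  (ratio 1.5)
print(hpa_desired_replicas(4, 63, 60, 2, 10))    # 4  (ratio 1.05 is within tolerance)
print(hpa_desired_replicas(10, 10, 60, 3, 20))   # 3  (ceil(1.67) = 2, clamped to minReplicas)
print(hpa_desired_replicas(5, 200, 50, 1, 10))   # 10 (20 clamped to maxReplicas)
```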

Security

Kubernetes security involves cluster-level (RBAC, admission, network policies) and workload-level (pod security, secrets management) controls.

RBAC model

YAML
# Role (namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# RoleBinding: bind Role to a ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

RBAC: Role vs ClusterRole

Object | Scope | Use
--- | --- | ---
Role | Namespace | Grant access to namespaced resources within one namespace.
ClusterRole | Cluster-wide | Grant access to cluster-scoped resources (Nodes, PVs) or to namespaced resources across all namespaces.
RoleBinding | Namespace | Binds a Role or ClusterRole to subjects within a namespace.
ClusterRoleBinding | Cluster-wide | Binds a ClusterRole to subjects cluster-wide.
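
RBAC has no deny rules: a request is allowed if any rule reachable through the subject's bindings matches, and denied otherwise. The Python sketch below checks requests against the pod-reader rules from the Role example; it is simplified (no non-resource URLs, aggregated ClusterRoles, or escalation checks), and the helper names are hypothetical.

```python
def rule_allows(rule, verb, api_group, resource, name=None):
    def has(values, item):
        return "*" in values or item in values

    return (has(rule.get("verbs", []), verb)
            and has(rule.get("apiGroups", []), api_group)
            and has(rule.get("resources", []), resource)          # e.g. "pods" or "pods/log"
            and (not rule.get("resourceNames") or name in rule["resourceNames"]))


def authorize(rules, verb, api_group, resource, name=None):
    """Union of all rules bound to the subject; anything unmatched is denied."""
    return any(rule_allows(r, verb, api_group, resource, name) for r in rules)


pod_reader = [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}]
print(authorize(pod_reader, "list", "", "pods"))              # True
print(authorize(pod_reader, "delete", "", "pods"))            # False
print(authorize(pod_reader, "get", "", "pods/log"))           # False: subresources are listed separately
print(authorize(pod_reader, "list", "apps", "deployments"))   # False
```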

ServiceAccount token projection

Since Kubernetes 1.21, pods use projected ServiceAccount tokens (bound tokens) by default instead of long-lived Secret-based tokens. These tokens are audience-bound, bound to the pod (invalidated when the pod is deleted), time-limited (about 1 hour by default), and refreshed automatically by the kubelet. They are mounted via the projected volume type, not via a Secret.

Secrets encryption at rest

YAML
# /etc/kubernetes/enc/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: [secrets]
  providers:
  - aescbc:        # the first provider encrypts new writes; upstream docs now recommend KMS v2 over aescbc
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}   # fallback: read unencrypted (for migration)

Shell
# Then restart kube-apiserver with:
# --encryption-provider-config=/etc/kubernetes/enc/encryption-config.yaml

# Rewrite all secrets so they are stored encrypted
kubectl get secrets --all-namespaces -o json | kubectl replace -f -

Pod Security Admission (PSA)

Level | Restrictions
--- | ---
privileged | No restrictions. Same as no policy.
baseline | Blocks most known privilege escalations: no privileged containers, no hostPID/hostNetwork, restricted capabilities.
restricted | Heavily restricted: must run as non-root, drop ALL capabilities, no privilege escalation, seccomp required.

Shell
# Enable PSA on a namespace via labels
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted

Cheat Sheet

Inspect cluster internals

Shell
# Control plane component health
kubectl get componentstatuses          # deprecated since 1.19; prefer /livez and /readyz
kubectl get --raw /healthz

# Node internals
kubectl describe node <node>          # conditions, capacity, allocatable, events
kubectl top node                       # requires metrics-server
kubectl get events --sort-by=.metadata.creationTimestamp

# Pod scheduling
kubectl describe pod <pod>            # Events: FailedScheduling, etc.
kubectl get pod -o wide               # which node, IP
kubectl get pod <pod> -o jsonpath='{.spec.nodeName}'

# Watch reconciliation
kubectl get rs --watch                # watch ReplicaSet convergence
kubectl rollout status deploy/my-app  # watch Deployment rollout
kubectl rollout history deploy/my-app
kubectl rollout undo deploy/my-app    # roll back

etcd

etcdctl endpoint health — check cluster
etcdctl snapshot save — backup
etcdctl get /registry/ --prefix --keys-only
Quorum: floor(n/2)+1 members needed for writes

API Server

kubectl get --raw /apis — list API groups
kubectl api-resources — all resource types
kubectl explain pod.spec — schema docs
kubectl auth can-i list pods --as user

Networking

iptables -t nat -L KUBE-SERVICES -n
kubectl -n kube-system logs -l k8s-app=kube-dns
kubectl exec -it pod -- curl svc:port
kubectl port-forward svc/my-svc 8080:80

Storage

kubectl get pv,pvc — PV/PVC status
kubectl describe pvc my-claim — binding events
kubectl get sc — StorageClasses
PVC stuck Pending: check StorageClass, CSI driver logs

Workloads

kubectl rollout status deploy/x
kubectl rollout undo deploy/x --to-revision=2
kubectl scale deploy/x --replicas=5
kubectl get hpa — autoscaler status

Security

kubectl auth can-i '*' '*' --all-namespaces
kubectl get rolebindings,clusterrolebindings -A
kubectl get secret <name> -o jsonpath='{.data.<key>}' | base64 -d
Check: pod-security.kubernetes.io/* labels on ns