The sysctl command reads and writes kernel parameters at runtime. Every parameter maps to a file under /proc/sys/ — the dot-separated sysctl name corresponds to a slash-separated path.
Basic Usage
    # Read a single parameter
    sysctl net.ipv4.tcp_max_syn_backlog
    # Read all parameters matching a pattern
    sysctl -a | grep tcp_max
    # Write a parameter (takes effect immediately, lost on reboot)
    sysctl -w net.ipv4.tcp_max_syn_backlog=65535
    # Read all parameters
    sysctl -a
    # sysctl has no option that lists only changed (non-default) values.
    # To review what a config file sets, print its non-comment lines:
    grep -Ev '^[[:space:]]*([#;]|$)' /etc/sysctl.conf
sysctl -w changes take effect immediately but are lost on reboot. To persist, write to a file in /etc/sysctl.d/ (see the Persisting Changes section).
Name ↔ Path Mapping
Every sysctl parameter is a virtual file under /proc/sys. You can read and write parameters directly using standard file I/O — useful in scripts where you want to avoid the sysctl binary.
    # Read directly from /proc/sys
    cat /proc/sys/net/ipv4/tcp_max_syn_backlog
    # Write directly (same as sysctl -w)
    echo 65535 > /proc/sys/net/ipv4/tcp_max_syn_backlog
    # Explore the sysctl tree
    ls /proc/sys/net/ipv4/ | head -20
    ls /proc/sys/vm/
    # Check if a parameter is read-only
    ls -l /proc/sys/kernel/osrelease   # -r--r--r-- means read-only
Writing to /proc/sys requires root. Most parameters are world-readable but writable only by root. A few are read-only even for root (e.g. kernel.osrelease).
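The name-to-path mapping described above can be sketched as a small shell helper. The function name sysctl_path is hypothetical, and the sketch assumes that no component of the parameter name contains a literal dot (a VLAN interface such as eth0.100 would need the slash-separated form instead).

```shell
# sysctl_path NAME: print the /proc/sys file backing a dotted sysctl name.
# Assumption: no path component contains a literal dot.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Example: read a parameter through the file interface instead of the binary
# cat "$(sysctl_path net.ipv4.tcp_max_syn_backlog)"
```

This is handy in minimal containers or initramfs environments where the sysctl binary may not be installed.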
Persisting Changes
To survive reboots, write parameters to a .conf file under /etc/sysctl.d/. Files are loaded in lexicographic order, with later files overriding earlier ones.
File Format
    # Lines starting with # are comments
    # Format: parameter = value
    net.core.somaxconn = 65535
    net.ipv4.tcp_max_syn_backlog = 65535
    net.ipv4.ip_local_port_range = 1024 65535
    vm.swappiness = 10
    fs.file-max = 2097152
Apply Without Rebooting
    # Apply all files in /etc/sysctl.d/ and /etc/sysctl.conf
    sysctl --system
    # Apply a specific file
    sysctl -p /etc/sysctl.d/99-tuning.conf
    # Apply the legacy /etc/sysctl.conf
    sysctl -p
Load Order
Use a drop-in file such as /etc/sysctl.d/99-custom.conf rather than editing /etc/sysctl.conf directly. This survives package upgrades and is easier to track in version control or config management (Ansible, Chef, Puppet).
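To see how "later files override earlier ones" plays out, the following sketch resolves the value a parameter would end up with inside a single directory. The function name effective_sysctl is hypothetical, and the sketch is deliberately simplified: it looks at one directory only, whereas sysctl --system merges several.

```shell
# effective_sysctl KEY DIR: print the value KEY ends up with after every
# *.conf file in DIR is read in lexicographic order (the last assignment wins).
# The body runs in a subshell so the LC_ALL change does not leak out.
effective_sysctl() (
    LC_ALL=C    # make the glob below sort in plain byte order
    awk -v key="$1" '
        /^[ \t]*([#;]|$)/ { next }              # skip comments and blank lines
        {
            pos = index($0, "=")
            if (pos == 0) next                   # not a "name = value" line
            name  = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { result = value; found = 1 }
        }
        END { if (found) print result }
    ' "$2"/*.conf
)

# Example: which vm.swappiness wins among the drop-ins?
# effective_sysctl vm.swappiness /etc/sysctl.d
```

The numeric prefix on each filename (10-, 50-, 99-) is what decides the winner, which is why site-specific overrides conventionally use a high number.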
TCP parameters control connection queues, congestion control, retransmit behaviour, TIME_WAIT handling, and keepalives. These are among the most commonly tuned parameters on servers.
Connection Queues
| Parameter | Default | Description |
|---|---|---|
| net.core.somaxconn | 4096 | Max accept queue depth per socket. Also capped by the backlog arg passed to listen(). |
| net.ipv4.tcp_max_syn_backlog | 1024 | Max SYN queue depth (half-open connections) per socket. |
| net.ipv4.tcp_syncookies | 1 | Send SYN cookies when SYN queue is full — mitigates SYN flood without dropping connections. |
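| net.ipv4.tcp_fin_timeout | 60 | Seconds an orphaned socket stays in FIN_WAIT_2 waiting for the peer's FIN. It does not shorten TIME_WAIT, which the kernel fixes at 60 seconds. |
| net.ipv4.tcp_tw_reuse | 0 (2 on kernels 4.19+) | Allow reuse of TIME_WAIT sockets for new outbound connections. 2 enables it for loopback traffic only. Needs net.ipv4.tcp_timestamps=1 on both ends; generally safe for clients. |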
| net.ipv4.ip_local_port_range | 32768 60999 | Ephemeral port range for outbound connections. Widen for high-connection-rate clients. |
| net.ipv4.tcp_max_tw_buckets | 262144 | Max simultaneous TIME_WAIT sockets. Excess are immediately destroyed. |
Retransmit & Keepalive
| Parameter | Default | Description |
|---|---|---|
| net.ipv4.tcp_retries2 | 15 | Max retransmit attempts before giving up on an established connection (~13–30 min). |
| net.ipv4.tcp_syn_retries | 6 | Max SYN retransmit attempts for outbound connections (~127s total). |
| net.ipv4.tcp_keepalive_time | 7200 | Seconds of idle before sending first keepalive probe (2 hours). |
| net.ipv4.tcp_keepalive_intvl | 75 | Seconds between keepalive probes. |
| net.ipv4.tcp_keepalive_probes | 9 | Number of unanswered probes before declaring the connection dead. |
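The three keepalive parameters combine into a single worst-case detection time: the idle wait plus one interval per unanswered probe. A minimal sketch of that arithmetic (the function name is hypothetical, and the second call uses an illustrative tightened profile):

```shell
# keepalive_detect_secs TIME INTVL PROBES
# Seconds from the last received data until an idle, dead peer is declared
# gone: the idle wait plus every unanswered probe interval.
keepalive_detect_secs() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_detect_secs 7200 75 9   # kernel defaults: prints 7875 (about 2 h 11 min)
keepalive_detect_secs 60 10 6     # tightened profile: prints 120
```

Keepalives only run on sockets where the application has enabled SO_KEEPALIVE, so these values have no effect on sockets that never set it.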
Congestion Control
    # Show available algorithms
    sysctl net.ipv4.tcp_available_congestion_control
    # Show active algorithm
    sysctl net.ipv4.tcp_congestion_control
    # Enable BBR (better for high-BDP paths and streaming)
    sysctl -w net.ipv4.tcp_congestion_control=bbr
    sysctl -w net.core.default_qdisc=fq   # BBR works best with fq qdisc
Common Server Tuning
    # High-connection-rate server (web, API)
    sysctl -w net.core.somaxconn=65535
    sysctl -w net.ipv4.tcp_max_syn_backlog=65535
    sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    sysctl -w net.ipv4.tcp_tw_reuse=1
    sysctl -w net.ipv4.tcp_fin_timeout=15
    # Tighten keepalives for detecting dead connections faster
    sysctl -w net.ipv4.tcp_keepalive_time=60
    sysctl -w net.ipv4.tcp_keepalive_intvl=10
    sysctl -w net.ipv4.tcp_keepalive_probes=6
The effective accept queue depth is min(somaxconn, listen_backlog). Raising somaxconn alone isn't enough if your application passes a small value to listen(). Check your framework's default (Node.js: 511, many Java stacks: 50–100).
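The min() rule can be sketched directly. The helper name effective_backlog is hypothetical, and the numbers passed in are illustrative.

```shell
# effective_backlog SOMAXCONN LISTEN_BACKLOG: the smaller value wins.
effective_backlog() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

effective_backlog 4096 511    # prints 511: the application is the limit
effective_backlog 128 511     # prints 128: somaxconn is the limit

# On a live host, `ss -ltn` reports this limit in the Send-Q column of each
# listening socket, with the current accept-queue length in Recv-Q.
```

Comparing Recv-Q against Send-Q for a listening socket shows how close the accept queue is to overflowing before any tuning is attempted.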
TCP and UDP socket buffers control how much data the kernel holds in flight per connection. Too small = throughput limited by buffer, especially on high-latency links. Too large = wasted memory on idle connections.
Buffer Parameters
| Parameter | Default | Description |
|---|---|---|
| net.core.rmem_max | 212992 | Hard ceiling (bytes) on the receive buffer size an application can request via SO_RCVBUF. |
| net.core.wmem_max | 212992 | Hard ceiling for socket send buffer (bytes). |
| net.core.rmem_default | 212992 | Default receive buffer before the application adjusts it. |
| net.core.wmem_default | 212992 | Default send buffer. |
| net.ipv4.tcp_rmem | 4096 131072 6291456 | TCP receive buffer: min / default / max (bytes). Kernel auto-tunes within this range. |
| net.ipv4.tcp_wmem | 4096 16384 4194304 | TCP send buffer: min / default / max (bytes). |
| net.ipv4.udp_rmem_min | 4096 | Minimum UDP receive buffer per socket. |
| net.ipv4.tcp_mem | auto | System-wide TCP memory (pages): low / pressure / max. Kernel sets this from RAM at boot. |
High-Bandwidth Tuning (10 GbE / long-haul)
    # BDP rule: buffer ≥ bandwidth × RTT
    # Example: 10 Gbps × 10 ms RTT = 12.5 MB needed per connection
    sysctl -w net.core.rmem_max=134217728   # 128 MB ceiling
    sysctl -w net.core.wmem_max=134217728
    sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"
    # Enable auto-tuning (on by default, verify it's on)
    sysctl net.ipv4.tcp_moderate_rcvbuf   # should be 1
The kernel auto-tunes each TCP socket's buffer within the tcp_rmem / tcp_wmem range. You mostly need to raise the maximum, and the kernel grows the buffer as needed. The rmem_max / wmem_max values cap what the application can request via SO_RCVBUF / SO_SNDBUF.
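The bandwidth-delay product behind the buffer sizing can be worked out in integer shell arithmetic. This is a minimal sketch; the function name bdp_bytes is hypothetical, and it takes bandwidth in Mbit/s and RTT in milliseconds so that no fractions are needed.

```shell
# bdp_bytes MBIT_PER_SEC RTT_MS: bandwidth-delay product in bytes.
# (Mbit/s * 1e6 / 8) bytes/s * (RTT_MS / 1000) s simplifies to Mbit/s * RTT_MS * 125.
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

bdp_bytes 10000 10    # 10 Gbit/s at 10 ms RTT: prints 12500000 (12.5 MB)
bdp_bytes 1000 80     # 1 Gbit/s at 80 ms RTT: prints 10000000 (10 MB)
```

The result is the per-connection minimum; the third field of tcp_rmem / tcp_wmem should be at least this large for a single flow to fill the path.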
Netfilter connection tracking parameters. Critical on firewalls, NAT gateways, and load balancers where the tracked connection count can be very high.
| Parameter | Default | Description |
|---|---|---|
| net.netfilter.nf_conntrack_max | 131072 | Max number of tracked connections. When full, new connections are dropped with "table full" in dmesg. |
| net.netfilter.nf_conntrack_tcp_timeout_established | 432000 | Seconds to keep an established TCP connection in the table (5 days default — very long). |
| net.netfilter.nf_conntrack_tcp_timeout_time_wait | 120 | Seconds to keep TIME_WAIT entries. |
| net.netfilter.nf_conntrack_udp_timeout | 30 | Seconds to keep an unidirectional UDP flow. |
| net.netfilter.nf_conntrack_udp_timeout_stream | 120 | Seconds to keep a bidirectional UDP flow. |
| net.netfilter.nf_conntrack_buckets | auto | Hash table size. Set to nf_conntrack_max / 4 for good performance. |
    # Check current table usage vs max
    cat /proc/sys/net/netfilter/nf_conntrack_count
    cat /proc/sys/net/netfilter/nf_conntrack_max
    # Increase table size (e.g. for a busy NAT gateway)
    sysctl -w net.netfilter.nf_conntrack_max=2097152
    # Shorten TCP established timeout from 5 days to 1 hour
    sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600
    # Watch for table-full drops
    dmesg | grep "nf_conntrack: table full"
Look for "nf_conntrack: table full" in dmesg when the conntrack table is exhausted. New connections are dropped without any error reaching the application. Either raise nf_conntrack_max, shorten timeouts to expire stale entries faster, or add NOTRACK rules in the raw table for traffic that doesn't need stateful tracking.
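Because drops begin only once the table is full, it helps to watch the fill level before that point. The sketch below turns the two counters into a percentage. The function name and the optional directory argument are hypothetical conveniences, and any alert threshold built on top of it is a local policy choice.

```shell
# conntrack_usage_pct [DIR]: print the conntrack table fill level as a whole
# percent. DIR defaults to the live /proc tree; passing another directory is
# only a convenience for running against saved copies of the two files.
conntrack_usage_pct() (
    dir=${1:-/proc/sys/net/netfilter}
    count=$(cat "$dir/nf_conntrack_count") || exit 1
    max=$(cat "$dir/nf_conntrack_max") || exit 1
    echo $(( count * 100 / max ))
)

# Example check for a cron job or monitoring hook (80 is an arbitrary threshold):
# [ "$(conntrack_usage_pct)" -ge 80 ] && echo "conntrack table above 80%"
```

The function fails with a non-zero status if the files are absent, which is the case when the nf_conntrack module is not loaded.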
| Parameter | Default | Description |
|---|---|---|
| net.ipv4.ip_forward | 0 | Enable IPv4 packet forwarding. Must be 1 for routers, VPN gateways, and containers. |
| net.ipv6.conf.all.forwarding | 0 | Enable IPv6 packet forwarding. |
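| net.ipv4.conf.all.rp_filter | 1 | Reverse path filtering. 1 (strict) drops packets whose source address would not be routed back out the interface they arrived on; 2 (loose) only requires that some route to the source exists; 0 disables the check. The kernel applies the higher of the all and per-interface values, so use 2 for asymmetric routing. |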
| net.ipv4.conf.all.accept_redirects | 1 | Accept ICMP redirects. Disable on servers — routers should not be redirecting your traffic. |
| net.ipv4.conf.all.send_redirects | 1 | Send ICMP redirects. Disable if not acting as a router. |
| net.ipv4.route.max_size | n/a | Max entries in the IPv4 routing cache. The cache was removed in kernel 3.6, so the setting is obsolete for IPv4; the remaining per-CPU caches are not bounded by it. |
    # Enable forwarding (required for Docker, Kubernetes, VPNs)
    sysctl -w net.ipv4.ip_forward=1
    sysctl -w net.ipv6.conf.all.forwarding=1
    # Disable ICMP redirects on a server (not a router)
    sysctl -w net.ipv4.conf.all.accept_redirects=0
    sysctl -w net.ipv4.conf.all.send_redirects=0
    # Relax reverse path filtering for asymmetric routing.
    # The kernel uses the higher of the all and per-interface values, so
    # all=0 has no effect while an interface is still set to 1; 2 selects loose mode.
    sysctl -w net.ipv4.conf.all.rp_filter=2
| Parameter | Default | Description |
|---|---|---|
| net.core.netdev_max_backlog | 1000 | Per-CPU queue of packets received from the NIC and waiting to be processed by the kernel. Increase if the drop counter (second column) in /proc/net/softnet_stat keeps growing under load. |
| net.ipv4.udp_rmem_min | 4096 | Minimum per-socket UDP receive buffer. |
| net.ipv4.udp_wmem_min | 4096 | Minimum per-socket UDP send buffer. |
| net.ipv4.udp_mem | auto | System-wide UDP memory (pages): low / pressure / max. |
Check /proc/net/udp (drops column) or ss -uanpm (the d field inside skmem) for per-socket drop counters. System-wide UDP drops appear in netstat -su under "receive buffer errors."
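Drops caused by a full netdev_max_backlog queue are counted per CPU in the second column of /proc/net/softnet_stat, in hexadecimal. The sketch below sums that column. The function name and the optional file argument are hypothetical, the latter existing only so the parser can be run against a saved copy.

```shell
# softnet_drops [FILE]: total packets dropped because a per-CPU backlog
# queue was full. Each line of softnet_stat is one CPU; fields are hex and
# the second one is the drop counter.
softnet_drops() (
    total=0
    while read -r _processed dropped _rest; do
        total=$(( total + 0x$dropped ))
    done < "${1:-/proc/net/softnet_stat}"
    echo "$total"
)

# Example: sample twice and compare to see whether drops are still growing
# softnet_drops; sleep 10; softnet_drops
```

A counter that stays flat between samples means the backlog queue is not the bottleneck, and raising netdev_max_backlog is unlikely to help.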
vm.swappiness controls how aggressively the kernel reclaims anonymous memory (heap, stack) relative to file-backed page cache. It's a tendency knob, not a hard threshold.
| Workload | Recommended Value | Reason |
|---|---|---|
| General server | 10–30 | Reduces latency spikes from unexpected swapping |
| Database (MySQL, PostgreSQL) | 1–10 | Databases manage their own caches — kernel swap causes stalls |
| Redis / in-memory store | 0–1 | Any swap evicts data that must stay in RAM |
| Desktop / shared workstation | 60–100 | Users benefit from aggressive page cache retention |
| Container host (Kubernetes) | 0 | Pods expect predictable memory; swap causes OOM confusion |
    # Check current value
    sysctl vm.swappiness
    # Reduce for a database server
    sysctl -w vm.swappiness=10
    # Check current swap usage
    free -h
    vmstat -s | grep swap
Setting vm.swappiness=0 only discourages swapping. To disable swap entirely, run swapoff -a (and remove swap entries from /etc/fstab).
Dirty pages are modified memory pages that haven't been written to disk yet. The kernel periodically flushes them in the background (writeback). These parameters control when and how aggressively that happens.
| Parameter | Default | Description |
|---|---|---|
| vm.dirty_ratio | 20 | % of available memory (free plus reclaimable pages, not total RAM) that can be dirty before a writing process is blocked and forced to flush. High values = more throughput but bigger write latency spikes on flush. |
| vm.dirty_background_ratio | 10 | % of available memory at which the background flusher threads (kworker; pdflush on very old kernels) kick in. Less disruptive than a dirty_ratio flush. |
| vm.dirty_writeback_centisecs | 500 | How often (centiseconds) the background flusher wakes up. Default = every 5 seconds. |
| vm.dirty_expire_centisecs | 3000 | How old (centiseconds) a dirty page must be before it is eligible for writeback. Default = 30 seconds. |
Tuning for Different Workloads
Latency-sensitive (DB writes)
Lower dirty ratios mean smaller flush bursts and more predictable latency at the cost of slightly lower write throughput.
Throughput-oriented (bulk ingest)
Higher dirty ratios allow the kernel to batch more writes together, increasing disk throughput at the risk of a large stall when the threshold is hit.
    # Check current dirty and writeback page totals
    grep -E "Dirty|Writeback" /proc/meminfo
    # Reduce flush thresholds for lower write latency
    sysctl -w vm.dirty_ratio=5
    sysctl -w vm.dirty_background_ratio=2
    # Force immediate writeback of all dirty pages
    sync
    # Drop clean page cache, dentries, and inodes (writes nothing back; use with care!)
    echo 3 > /proc/sys/vm/drop_caches
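Percentages hide how much data is actually at stake on large-memory hosts. The sketch below converts a ratio into kilobytes. The function name is hypothetical, and using MemAvailable from /proc/meminfo as the base is only an approximation of the kernel's own "available memory" figure.

```shell
# dirty_limit_kb RATIO AVAILABLE_KB: memory (in kB) that may be dirty before
# the given vm.dirty_*ratio threshold is crossed.
dirty_limit_kb() {
    echo $(( $2 * $1 / 100 ))
}

# Illustrative host with 64 GiB (67108864 kB) of available memory:
dirty_limit_kb 20 67108864    # dirty_ratio=20: prints 13421772 (about 12.8 GiB)
dirty_limit_kb 10 67108864    # dirty_background_ratio=10: prints 6710886 (about 6.4 GiB)

# On a live system, an approximate base value can be read with:
# awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
```

A threshold of roughly 12.8 GiB of unwritten data explains why the default ratios can produce long stalls on slow disks, and why lower values suit latency-sensitive workloads.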
When the system runs out of memory and cannot reclaim any, the Out-Of-Memory killer selects a process to kill. The victim is chosen by an OOM score, which is derived mainly from the process's memory footprint (resident memory, swap, and page tables) and shifted by its oom_score_adj value.
| Parameter / File | Description |
|---|---|
| vm.panic_on_oom | 0 = invoke OOM killer (default). 1 = kernel panic instead. Useful on systems where partial failure is worse than reboot. |
| vm.oom_kill_allocating_task | 0 = OOM killer picks the "best" victim by score. 1 = kill the task that triggered OOM immediately — faster, less heuristic. |
| /proc/<pid>/oom_score | Current OOM score for a process (read-only). Higher = more likely to be killed. |
| /proc/<pid>/oom_score_adj | Adjustment from -1000 to +1000. -1000 = never kill. +1000 = always kill first. |
    # See which process was OOM-killed
    dmesg | grep -i "oom\|killed process"
    journalctl -k | grep -i oom
    # See OOM score for all processes (sorted, highest first)
    for p in /proc/[0-9]*; do
        # Skip processes that exited while the loop was running
        score=$(cat "$p/oom_score" 2>/dev/null) || continue
        adj=$(cat "$p/oom_score_adj" 2>/dev/null) || continue
        printf "%5d %5d %s\n" "$score" "$adj" "$(cat "$p/comm" 2>/dev/null)"
    done | sort -rn | head -20
    # Protect a critical process from OOM kill
    # (pgrep -o picks the oldest match so the path contains exactly one PID)
    echo -1000 > "/proc/$(pgrep -o myapp)/oom_score_adj"
    # Make a process a preferred OOM victim
    echo 500 > "/proc/$(pgrep -o lowpriority)/oom_score_adj"
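A value written to oom_score_adj lasts only as long as the process. For systemd services, set it in the unit file (OOMScoreAdjust=-500 in the [Service] section) to make it permanent.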
Linux overcommits virtual memory by default — malloc() succeeds even when there isn't enough physical RAM to back the allocation, betting that not all of it will be used at once.
| vm.overcommit_memory | Behaviour |
|---|---|
| 0 (default) | Heuristic: allow reasonable overcommit; kernel uses a formula to reject obviously excessive allocations. |
| 1 | Always allow: never fail malloc(). Used by some HPC and scientific workloads. |
| 2 | Strict: total committed memory cannot exceed swap + (overcommit_ratio % of RAM). malloc() can fail. |
    # Check current setting
    sysctl vm.overcommit_memory
    sysctl vm.overcommit_ratio   # used when overcommit_memory=2 (default 50%)
    # Check committed memory vs limit
    grep -E "CommitLimit|Committed_AS" /proc/meminfo
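The strict-mode limit from the table can be worked through with a short calculation. This is a sketch with a hypothetical function name, and it ignores memory reserved for hugepages, which the kernel subtracts from RAM before applying the ratio.

```shell
# commit_limit_kb RAM_KB SWAP_KB RATIO: approximate CommitLimit in kB for
# vm.overcommit_memory=2, ignoring hugepage reservations.
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# 16 GiB RAM, 2 GiB swap, default ratio of 50: prints 10485760 (10 GiB)
commit_limit_kb 16777216 2097152 50
```

With the default ratio, a swapless 16 GiB host in strict mode can commit only 8 GiB, which is why overcommit_ratio is usually raised alongside overcommit_memory=2.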
Hugepages (2 MB or 1 GB) reduce TLB pressure for large working sets by using fewer, larger page table entries. Critical for databases (PostgreSQL, Oracle, MySQL), Java heaps, and DPDK workloads.
| Parameter | Description |
|---|---|
| vm.nr_hugepages | Number of pre-allocated 2 MB hugepages. Memory is reserved immediately and cannot be used for anything else. |
| vm.nr_overcommit_hugepages | Additional hugepages that can be allocated on demand (not pre-reserved). |
| vm.transparent_hugepages | Not a sysctl — lives at /sys/kernel/mm/transparent_hugepage/enabled. Values: always, madvise, never. |
    # Show hugepage stats
    grep Huge /proc/meminfo
    # Pre-allocate 512 × 2 MB hugepages (= 1 GB reserved)
    sysctl -w vm.nr_hugepages=512
    # Check / change Transparent Hugepage setting
    cat /sys/kernel/mm/transparent_hugepage/enabled
    echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
    # Disable THP entirely (often better for databases)
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
For latency-sensitive databases, set THP to madvise or never.
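Sizing vm.nr_hugepages is a ceiling division of the memory to be covered by the page size. A minimal sketch, assuming the common 2 MB page size unless told otherwise (the function name is hypothetical):

```shell
# hugepages_needed SIZE_MB [PAGE_MB]: pages required to cover SIZE_MB,
# rounded up. PAGE_MB defaults to 2 (the usual x86-64 hugepage size).
hugepages_needed() (
    page_mb=${2:-2}
    echo $(( ($1 + page_mb - 1) / page_mb ))
)

hugepages_needed 1024         # 1 GB in 2 MB pages: prints 512
hugepages_needed 1025         # rounds up: prints 513
hugepages_needed 8192 1024    # 8 GB in 1 GB pages: prints 8

# Example: reserve enough pages for an 8 GB buffer pool
# sysctl -w vm.nr_hugepages="$(hugepages_needed 8192)"
```

The actual page size on a given host is shown in the Hugepagesize line of /proc/meminfo.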
| Parameter | Default | Description |
|---|---|---|
| kernel.pid_max | 32768 | Maximum PID value. When exhausted, fork() fails. Raise to 4194304 on busy systems running many containers. |
| kernel.threads-max | ~30000 | Max number of threads system-wide. Each thread consumes a PID. |
| fs.file-max | ~9M (varies with RAM) | System-wide open file descriptor limit. Per-process limit is set via ulimit -n or systemd's LimitNOFILE. |
| fs.inotify.max_user_watches | 8192 | Max inotify watches per user. IDEs, editors, and build tools exhaust this on large codebases. |
| fs.inotify.max_user_instances | 128 | Max inotify instances per user. |
| vm.max_map_count | 65530 | Max virtual memory map areas per process. Elasticsearch and JVM workloads commonly exhaust this. |
    # Raise PID limit for container-heavy hosts
    sysctl -w kernel.pid_max=4194304
    # Fix "too many open files" system-wide (only if the current value is lower)
    sysctl -w fs.file-max=2097152
    # Fix inotify limit (IDEs, webpack, etc.)
    sysctl -w fs.inotify.max_user_watches=524288
    # Fix Elasticsearch "max virtual memory areas" error
    sysctl -w vm.max_map_count=262144
    # Check current open file descriptor count
    cat /proc/sys/fs/file-nr   # allocated / unused / max
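The three fields of file-nr can be reduced to a single utilisation figure, which makes it easier to judge whether fs.file-max is the real limit. This is a sketch; the function name and the optional file argument are hypothetical.

```shell
# file_handle_usage_pct [FILE]: percentage of the system-wide file handle
# limit in use. file-nr holds three fields: allocated, unused, maximum.
file_handle_usage_pct() (
    read -r allocated unused max < "${1:-/proc/sys/fs/file-nr}" || exit 1
    echo $(( (allocated - unused) * 100 / max ))
)

# Example:
# echo "file handles in use: $(file_handle_usage_pct)%"
```

If this stays near zero while applications still report "too many open files", the per-process limit (ulimit -n or LimitNOFILE) is the one being hit.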
| Parameter | Default | Description |
|---|---|---|
| kernel.panic | 0 | Seconds to wait before automatic reboot after a kernel panic. 0 = hang forever. Set to e.g. 10 for auto-reboot on production servers. |
| kernel.panic_on_oops | 0 | Panic (and thus reboot if kernel.panic > 0) on kernel oops. Useful when you prefer a clean reboot over running with a possibly corrupted kernel state. |
| vm.panic_on_oom | 0 | Panic instead of invoking the OOM killer. |
| kernel.unknown_nmi_panic | 0 | Panic on an unknown NMI (Non-Maskable Interrupt). Useful for hardware fault isolation. |
    # Auto-reboot 10 seconds after a kernel panic
    sysctl -w kernel.panic=10
    sysctl -w kernel.panic_on_oops=1
    # Check current panic setting
    sysctl kernel.panic
| Parameter | Default | Description |
|---|---|---|
| kernel.core_pattern | core | Template for core dump filename. Supports %p (PID), %e (executable), %t (timestamp). Can pipe to a handler with |/usr/lib/.... |
| kernel.core_uses_pid | 0 | Append PID to the core filename. Equivalent to adding %p to core_pattern. |
| fs.suid_dumpable | 0 | 0 = no core for setuid processes. 1 = dump (insecure). 2 = dump readable only by root. |
    # Write core dumps to /tmp with PID and name in filename
    sysctl -w kernel.core_pattern=/tmp/core.%e.%p.%t
    # Check what systemd-coredump uses (modern distros)
    cat /proc/sys/kernel/core_pattern
    # usually: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
    # List core dumps captured by systemd-coredump
    coredumpctl list
| Parameter | Default | Description |
|---|---|---|
| kernel.sched_min_granularity_ns | 750000 | Minimum time a task runs before being preempted (nanoseconds). Lower = more responsive, higher context-switch overhead. Moved to /sys/kernel/debug/sched/ in kernel 5.13. |
| kernel.sched_latency_ns | 6000000 | Target scheduling latency: the time within which every runnable task should run at least once (nanoseconds). Moved to /sys/kernel/debug/sched/ in kernel 5.13. |
| kernel.sched_migration_cost_ns | 500000 | Cost of migrating a task between CPUs. Higher value = less migration = better cache locality at the cost of load imbalance. Moved to /sys/kernel/debug/sched/ in kernel 5.13. |
| kernel.sched_autogroup_enabled | 1 | Group tasks by session for desktop fairness. Disable on servers to let all processes compete equally. |
    # Disable autogroup for server workloads
    sysctl -w kernel.sched_autogroup_enabled=0
    # Increase migration cost for cache-sensitive workloads
    # (kernels before 5.13; later kernels expose it as /sys/kernel/debug/sched/migration_cost_ns)
    sysctl -w kernel.sched_migration_cost_ns=5000000
Address Space Layout Randomisation randomises the memory addresses of the stack, heap, and shared libraries, making it harder to exploit memory corruption vulnerabilities.
| kernel.randomize_va_space | Behaviour |
|---|---|
| 0 | Disabled: all addresses deterministic. Never use in production. |
| 1 | Randomise stack, VDSO, shared libraries. |
| 2 | Full ASLR: also randomise heap (default on most distros). |
    # Check ASLR setting
    sysctl kernel.randomize_va_space
    # Ensure full ASLR is enabled
    sysctl -w kernel.randomize_va_space=2
| Parameter | Default | Description |
|---|---|---|
| kernel.dmesg_restrict | 0 | 1 = only root can read dmesg. Prevents unprivileged users from reading kernel addresses and hardware info. |
| kernel.kptr_restrict | 0 | 1 = hide kernel symbol addresses from /proc/kallsyms for non-root. 2 = hide even from root. |
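| kernel.perf_event_paranoid | 2 | Restricts perf for unprivileged users; each level adds to the one before. -1 = no restrictions. 0 = no raw tracepoint access. 1 = also no system-wide CPU events. 2 = also no kernel profiling (user-space measurements only). 3 = no access at all (patched kernels on some distros). |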
    # Harden kernel info exposure
    sysctl -w kernel.dmesg_restrict=1
    sysctl -w kernel.kptr_restrict=1
    # Allow non-root perf for development machines
    sysctl -w kernel.perf_event_paranoid=1
| Parameter | Default | Description |
|---|---|---|
| kernel.unprivileged_bpf_disabled | 0 or 1 | 0 = unprivileged users can load BPF programs. 1 = root only (write-once: cannot be set back to 0 without reboot). 2 = root only but reversible. |
| net.core.bpf_jit_enable | 1 | Enable BPF JIT compiler. 0 = interpret. 1 = JIT. 2 = JIT with debugging output. |
| net.core.bpf_jit_harden | 0 | 1 = harden JIT for unprivileged users (constant blinding). 2 = harden for all users. |
    # Restrict BPF to root (production servers)
    sysctl -w kernel.unprivileged_bpf_disabled=1
    # Harden the JIT for all users (constant blinding, which targets JIT-spraying attacks)
    sysctl -w net.core.bpf_jit_harden=2
| Parameter | Default | Description |
|---|---|---|
| net.ipv4.tcp_syncookies | 1 | Send SYN cookies when the SYN queue overflows — allows legitimate connections to proceed without a full queue entry. Essential for SYN flood mitigation. |
| net.ipv4.icmp_echo_ignore_broadcasts | 1 | Ignore ICMP echo requests to broadcast addresses (prevents Smurf attack amplification). |
| net.ipv4.icmp_ignore_bogus_error_responses | 1 | Suppress logging of bogus ICMP error responses. |
| net.ipv4.conf.all.log_martians | 0 | Log packets with impossible source addresses (martians) — useful for detecting spoofed traffic. |
    # Recommended security baseline for a public-facing server
    sysctl -w net.ipv4.tcp_syncookies=1
    sysctl -w net.ipv4.icmp_echo_ignore_broadcasts=1
    sysctl -w net.ipv4.icmp_ignore_bogus_error_responses=1
    sysctl -w net.ipv4.conf.all.accept_redirects=0
    sysctl -w net.ipv4.conf.all.send_redirects=0
    sysctl -w net.ipv4.conf.all.rp_filter=1
    sysctl -w kernel.dmesg_restrict=1
    sysctl -w kernel.randomize_va_space=2
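A baseline like this is only useful if it stays applied, so a read-only audit is a natural companion. The sketch below compares a live value with the expected one and reports drift. The function name, the MISSING/DRIFT output format, and the optional root-directory argument are all hypothetical, and the comparison is a plain string match, so it suits single-valued parameters only.

```shell
# check_sysctl NAME EXPECTED [ROOT]: succeed if the parameter's current
# value equals EXPECTED, otherwise print a warning and fail.
# ROOT defaults to /proc/sys; overriding it is only meant for testing.
check_sysctl() (
    file="${3:-/proc/sys}/$(printf '%s' "$1" | tr '.' '/')"
    actual=$(cat "$file" 2>/dev/null) || { echo "MISSING $1"; exit 2; }
    if [ "$actual" = "$2" ]; then
        exit 0
    fi
    echo "DRIFT $1: expected $2, found $actual"
    exit 1
)

# Audit a few values from the baseline without changing anything:
# check_sysctl net.ipv4.tcp_syncookies 1
# check_sysctl net.ipv4.conf.all.rp_filter 1
# check_sysctl kernel.randomize_va_space 2
```

The exit status separates a wrong value (1) from an absent parameter (2), so a wrapper script can treat missing kernel features differently from misconfiguration.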
Commonly changed parameters, their defaults, and the typical direction of change.
| Parameter | Default | Typical Change | Reason |
|---|---|---|---|
| net.core.somaxconn | 4096 | ↑ 65535 | High-traffic servers / load balancers |
| net.ipv4.tcp_max_syn_backlog | 1024 | ↑ 65535 | Burst connection handling |
| net.ipv4.ip_local_port_range | 32768 60999 | 1024 65535 | More ephemeral ports for clients |
| net.ipv4.tcp_tw_reuse | 0 (2 on kernels 4.19+) | 1 | Reuse TIME_WAIT ports for outbound |
| net.ipv4.tcp_fin_timeout | 60 | ↓ 15 | Reclaim orphaned FIN_WAIT_2 sockets faster |
| net.core.rmem_max | 212992 | ↑ 134217728 | High-bandwidth / high-latency paths |
| net.core.wmem_max | 212992 | ↑ 134217728 | High-bandwidth / high-latency paths |
| net.ipv4.tcp_congestion_control | cubic | bbr | Better throughput on lossy or high-BDP links |
| net.netfilter.nf_conntrack_max | 131072 | ↑ 2097152 | Busy NAT gateways / firewalls |
| vm.swappiness | 60 | ↓ 10 | Reduce latency spikes on servers |
| vm.dirty_ratio | 20 | ↓ 5 | Reduce write latency spikes |
| vm.max_map_count | 65530 | ↑ 262144 | Elasticsearch, JVM applications |
| fs.file-max | ~9M (varies with RAM) | ↑ 2097152 if lower | Many open files (databases, proxies) |
| fs.inotify.max_user_watches | 8192 | ↑ 524288 | IDEs, build tools on large codebases |
| kernel.pid_max | 32768 | ↑ 4194304 | Container hosts with many processes |
| kernel.panic | 0 | 10 | Auto-reboot on production systems after panic |
| net.ipv4.ip_forward | 0 | ↑ 1 | Docker, Kubernetes, VPNs, routers |
Web / API Server
    # Connection capacity
    net.core.somaxconn = 65535
    net.ipv4.tcp_max_syn_backlog = 65535
    net.ipv4.ip_local_port_range = 1024 65535
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_fin_timeout = 15
    # Socket buffers
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    # Congestion control
    net.ipv4.tcp_congestion_control = bbr
    net.core.default_qdisc = fq
    # OS limits
    fs.file-max = 2097152
    vm.swappiness = 10
Database Server (PostgreSQL / MySQL)
    # Avoid swapping: databases manage their own cache
    vm.swappiness = 1
    # Reduce write latency spikes from page cache flush
    vm.dirty_ratio = 5
    vm.dirty_background_ratio = 2
    vm.dirty_writeback_centisecs = 100
    # Disable transparent hugepages (causes latency spikes)
    # Set in /etc/rc.local or a systemd unit:
    # echo never > /sys/kernel/mm/transparent_hugepage/enabled
    # More shared memory for PostgreSQL
    # shmmax is in bytes (64 GB here); shmall is counted in pages.
    # Keep comments on their own line: text after a value is read as part of the value.
    kernel.shmmax = 68719476736
    kernel.shmall = 4294967296
    # Faster keepalive to detect dead connections
    net.ipv4.tcp_keepalive_time = 60
    net.ipv4.tcp_keepalive_intvl = 10
    net.ipv4.tcp_keepalive_probes = 6
Kubernetes / Container Host
    # Required for pod networking
    net.ipv4.ip_forward = 1
    net.ipv6.conf.all.forwarding = 1
    # The net.bridge.* keys exist only once the br_netfilter module is loaded
    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    # More PIDs for containers
    kernel.pid_max = 4194304
    # Minimise swapping (Kubernetes recommends running without swap;
    # this setting alone does not turn swap off, use swapoff -a for that)
    vm.swappiness = 0
    # inotify for many file watchers across pods
    fs.inotify.max_user_watches = 524288
    fs.inotify.max_user_instances = 512
    # Connection tracking for services with many pods
    net.netfilter.nf_conntrack_max = 2097152