Linux sysctl Guide

Read, write, and persist Linux kernel parameters: networking, memory (VM), kernel, and security

sysctl namespace
│
├── net/
│   ├── core/         → socket buffers, netdev backlog
│   ├── ipv4/         → TCP tuning, forwarding, ICMP
│   ├── ipv6/         → IPv6 forwarding, router advertisements
│   └── netfilter/    → conntrack table size and timeouts
│
├── vm/
│   ├── swappiness    → how aggressively kernel uses swap
│   ├── dirty_*       → writeback thresholds for page cache
│   ├── overcommit_*  → memory allocation policy
│   └── nr_hugepages  → explicit (pre-allocated) hugepage pool
│
├── kernel/
│   ├── pid_max       → maximum PID value
│   ├── panic         → behaviour on kernel oops/panic
│   ├── core_pattern  → core dump filename template
│   └── sched_*       → CFS scheduler tunables
│
└── fs/
    ├── file-max      → system-wide open file limit
    ├── inotify/      → inotify watch limits
    └── pipe-max-size → max pipe buffer size

Basics

Basics Reading & Writing

The sysctl command reads and writes kernel parameters at runtime. Every parameter maps to a file under /proc/sys/ — the dot-separated sysctl name corresponds to a slash-separated path.

Basic Usage

Shell

# Read a single parameter
sysctl net.ipv4.tcp_max_syn_backlog

# Read all parameters matching a pattern
sysctl -a | grep tcp_max

# Write a parameter (takes effect immediately, lost on reboot)
sysctl -w net.ipv4.tcp_max_syn_backlog=65535

# Read all parameters
sysctl -a

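# sysctl has no "show non-default values" option; to see what changed,
# snapshot the full list first and diff against it later
sysctl -a 2>/dev/null | sort > /tmp/sysctl.before
sysctl -a 2>/dev/null | sort | diff /tmp/sysctl.before -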

Name ↔ Path Mapping

sysctl name            /proc/sys path
─────────────────────────────────────────────────────
net.ipv4.ip_forward  ↔ /proc/sys/net/ipv4/ip_forward
vm.swappiness        ↔ /proc/sys/vm/swappiness
kernel.pid_max       ↔ /proc/sys/kernel/pid_max
fs.file-max          ↔ /proc/sys/fs/file-max

Dots in name = directory separators in /proc/sys

Runtime vs persistent: sysctl -w changes take effect immediately but are lost on reboot. To persist, write to a file in /etc/sysctl.d/ (see the Persisting Changes section).

Basics /proc/sys

Every sysctl parameter is a virtual file under /proc/sys. You can read and write parameters directly using standard file I/O — useful in scripts where you want to avoid the sysctl binary.

Shell

# Read directly from /proc/sys
cat /proc/sys/net/ipv4/tcp_max_syn_backlog

# Write directly (same as sysctl -w)
echo 65535 > /proc/sys/net/ipv4/tcp_max_syn_backlog

# Explore the sysctl tree
ls /proc/sys/net/ipv4/ | head -20
ls /proc/sys/vm/

# Check if a parameter is read-only
ls -l /proc/sys/kernel/osrelease
# -r--r--r-- means read-only

Permissions: Writing to /proc/sys requires root. Most parameters are world-readable but root-writable. A few are read-only even as root (e.g. kernel.osrelease).
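
The name-to-path mapping is simple enough to sketch as a small helper for scripts that want to avoid the sysctl binary. The function names `sysctl_to_path` and `read_param` are hypothetical (they are not part of procps), and the plain dot-to-slash translation assumes no path component contains a dot, so it would mishandle VLAN interface names such as eth0.100.

```shell
# Convert a dotted sysctl name to its /proc/sys path.
# Assumption: no path component contains a literal dot.
sysctl_to_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Read a parameter through the filesystem; prints nothing if it is unreadable.
read_param() {
  path=$(sysctl_to_path "$1")
  if [ -r "$path" ]; then
    cat "$path"
  fi
}

sysctl_to_path net.ipv4.ip_forward   # /proc/sys/net/ipv4/ip_forward
sysctl_to_path fs.file-max           # /proc/sys/fs/file-max
```

On a Linux host, `read_param vm.swappiness` should print the same value as `sysctl -n vm.swappiness`.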

Basics Persisting Changes

To survive reboots, write parameters to a .conf file under /etc/sysctl.d/. Files are loaded in lexicographic order, with later files overriding earlier ones.

File Format

/etc/sysctl.d/99-tuning.conf

# Lines starting with # are comments
# Format: parameter = value

net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535

vm.swappiness = 10

fs.file-max = 2097152

Apply Without Rebooting

Shell

# Apply every configuration file (all sysctl.d directories plus /etc/sysctl.conf)
sysctl --system

# Apply a specific file
sysctl -p /etc/sysctl.d/99-tuning.conf

# Apply the legacy /etc/sysctl.conf
sysctl -p
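
After applying a file it can be useful to confirm that the live values actually match it. The following is a minimal sketch, not a replacement for sysctl: `check_conf` and `norm` are hypothetical helper names, the second argument (an alternative root instead of /proc/sys) exists only to make the function testable, and the parser assumes plain `key = value` lines without the optional `-` prefix.

```shell
# Collapse runs of whitespace so "1024<TAB>65535" compares equal to "1024 65535".
norm() {
  printf '%s\n' "$1" | awk '{ $1 = $1; print }'
}

# check_conf FILE [ROOT]: report keys whose live value differs from FILE.
# Returns 0 when everything matches, 1 otherwise.
check_conf() {
  conf=$1
  root=${2:-/proc/sys}
  rc=0
  while IFS='=' read -r key value || [ -n "$key" ]; do
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    case $key in
      ''|'#'*|';'*) continue ;;   # blank lines and comments
    esac
    path=$root/$(printf '%s' "$key" | tr . /)
    want=$(norm "$value")
    if [ ! -r "$path" ]; then
      echo "MISSING  $key"
      rc=1
    else
      have=$(norm "$(cat "$path")")
      if [ "$have" != "$want" ]; then
        echo "DIFFERS  $key: live=$have file=$want"
        rc=1
      fi
    fi
  done < "$conf"
  return $rc
}

# Usage: check_conf /etc/sysctl.d/99-tuning.conf
```

No output and an exit status of 0 mean every parameter in the file is in effect.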

Load Order

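Directories are searched in this order (later = higher priority):

/usr/lib/sysctl.d/*.conf        ← distro defaults (don't edit)
/usr/local/lib/sysctl.d/*.conf
/run/sysctl.d/*.conf            ← runtime / container overrides
/etc/sysctl.d/*.conf            ← your changes go here
/etc/sysctl.conf                ← legacy; still read by sysctl --system

Files from all sysctl.d directories are merged and applied in order of file name, not directory. Directory priority only decides which copy wins when two directories contain a file with the same name.

Naming tip: prefix with 99- so your file sorts last and overrides any distro defaults.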
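
To see which files would take part and in what order, the merge rule described in sysctl.d(5) can be approximated in a few lines. This is a sketch under that assumption: files are ordered by file name across all directories, and a file in a higher-priority directory replaces a same-named file in a lower one. `effective_sysctl_files` is a hypothetical helper name, and /etc/sysctl.conf is left out.

```shell
# effective_sysctl_files DIR... : pass directories in ascending priority.
# Prints the winning path for each file name, in the order they would be applied.
effective_sysctl_files() {
  for dir in "$@"; do
    for f in "$dir"/*.conf; do
      # Skip the unexpanded glob when a directory has no .conf files
      [ -e "$f" ] || continue
      printf '%s\t%s\n' "$(basename "$f")" "$f"
    done
  done |
    awk -F '\t' '{ winner[$1] = $2 } END { for (name in winner) print name "\t" winner[name] }' |
    LC_ALL=C sort |
    cut -f 2
}

effective_sysctl_files /usr/lib/sysctl.d /usr/local/lib/sysctl.d /run/sysctl.d /etc/sysctl.d
```

A later directory overwrites the entry for the same file name, so an empty /etc/sysctl.d/50-default.conf is enough to mask a distro file of that name.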

Best practice: Create /etc/sysctl.d/99-custom.conf rather than editing /etc/sysctl.conf directly. This survives package upgrades and is easier to track in version control or config management (Ansible, Chef, Puppet).

Networking TCP

TCP parameters control connection queues, congestion control, retransmit behaviour, TIME_WAIT handling, and keepalives. These are among the most commonly tuned parameters on servers.

Connection Queues

Client SYN arrives
      │
      ▼
SYN queue (half-open)             ← tcp_max_syn_backlog
  SYN received, SYN-ACK sent, awaiting client ACK
      │  ACK arrives
      ▼
Accept queue (fully established)  ← somaxconn (and listen backlog)
  waiting for app to call accept()
      │  accept()
      ▼
Application

Parameter	Default	Description
net.core.somaxconn	4096	Max accept queue depth per socket. Also capped by the `backlog` arg passed to `listen()`.
net.ipv4.tcp_max_syn_backlog	1024	Max SYN queue depth (half-open connections) per socket.
net.ipv4.tcp_syncookies	1	Send SYN cookies when SYN queue is full — mitigates SYN flood without dropping connections.
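net.ipv4.tcp_fin_timeout	60	Seconds an orphaned socket (already closed by the application) may stay in FIN_WAIT_2 waiting for the peer's FIN. It does not change the TIME_WAIT duration, which is fixed at 60 seconds in the kernel.
net.ipv4.tcp_tw_reuse	0	Allow reuse of TIME_WAIT sockets for new outbound connections. Relies on TCP timestamps (`net.ipv4.tcp_timestamps=1`) on both ends. Newer kernels default to `2` (loopback traffic only).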
net.ipv4.ip_local_port_range	32768 60999	Ephemeral port range for outbound connections. Widen for high-connection-rate clients.
net.ipv4.tcp_max_tw_buckets	262144	Max simultaneous TIME_WAIT sockets. Excess are immediately destroyed.

Retransmit & Keepalive

Parameter	Default	Description
net.ipv4.tcp_retries2	15	Max retransmit attempts before giving up on an established connection (~13–30 min).
net.ipv4.tcp_syn_retries	6	Max SYN retransmit attempts for outbound connections (~127s total).
net.ipv4.tcp_keepalive_time	7200	Seconds of idle before sending first keepalive probe (2 hours).
net.ipv4.tcp_keepalive_intvl	75	Seconds between keepalive probes.
net.ipv4.tcp_keepalive_probes	9	Number of unanswered probes before declaring the connection dead.
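
The three keepalive parameters combine into a worst-case detection time for a silently dead peer: the idle time plus one interval per unanswered probe. A small sketch of that arithmetic, where `keepalive_detect_secs` is a hypothetical helper name:

```shell
# Seconds until an idle connection to a dead peer is declared dead.
keepalive_detect_secs() {
  # $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
  echo $(( $1 + $2 * $3 ))
}

keepalive_detect_secs 7200 75 9   # kernel defaults: 7875 s, about 2 h 11 min
keepalive_detect_secs 60 10 6     # tightened values: 120 s
```

Keepalives only apply to sockets that set SO_KEEPALIVE, so these sysctls have no effect on applications that never enable it.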

Congestion Control

Shell

# Show available algorithms
sysctl net.ipv4.tcp_available_congestion_control

# Show active algorithm
sysctl net.ipv4.tcp_congestion_control

# Enable BBR (better for high-BDP paths and streaming)
# bbr must be listed in tcp_available_congestion_control; if not, try: modprobe tcp_bbr
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.default_qdisc=fq   # BBR works best with fq qdisc

Common Server Tuning

Shell

# High-connection-rate server (web, API)
sysctl -w net.core.somaxconn=65535
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_fin_timeout=15

# Tighten keepalives for detecting dead connections faster
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=6

somaxconn and listen() backlog are both caps: The effective accept queue size is min(somaxconn, listen_backlog). Raising somaxconn alone isn't enough if your application passes a small value to listen(). Check your framework's default (Node.js: 511, many Java stacks: 50–100).
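
The min() rule can be made concrete with a tiny helper. `effective_accept_queue` is a hypothetical name; the backlog figures are the framework defaults quoted above.

```shell
# Effective accept queue depth = min(somaxconn, backlog passed to listen()).
effective_accept_queue() {
  # $1 = net.core.somaxconn, $2 = listen() backlog
  if [ "$1" -lt "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

effective_accept_queue 4096 511      # Node.js default backlog wins: 511
effective_accept_queue 4096 65535    # somaxconn wins: 4096
effective_accept_queue 65535 65535   # both raised: 65535
```

On a running system, `ss -ltn` reports the configured limit of each listening socket in the Send-Q column and the current accept queue length in Recv-Q.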

Networking Socket Buffers

TCP and UDP socket buffers control how much data the kernel holds in flight per connection. Too small = throughput limited by buffer, especially on high-latency links. Too large = wasted memory on idle connections.

Buffer Parameters

Parameter	Default	Description
net.core.rmem_max	212992	Hard ceiling (bytes) on the receive buffer size an application can request with `SO_RCVBUF`.
net.core.wmem_max	212992	Hard ceiling for socket send buffer (bytes).
net.core.rmem_default	212992	Default receive buffer before the application adjusts it.
net.core.wmem_default	212992	Default send buffer.
net.ipv4.tcp_rmem	4096 131072 6291456	TCP receive buffer: min / default / max (bytes). Kernel auto-tunes within this range.
net.ipv4.tcp_wmem	4096 16384 4194304	TCP send buffer: min / default / max (bytes).
net.ipv4.udp_rmem_min	4096	Minimum UDP receive buffer per socket.
net.ipv4.tcp_mem	auto	System-wide TCP memory (pages): low / pressure / max. Kernel sets this from RAM at boot.

High-Bandwidth Tuning (10 GbE / long-haul)

Shell

# BDP rule: buffer ≥ bandwidth × RTT
# Example: 10 Gbps × 10ms RTT = 12.5 MB needed per connection

sysctl -w net.core.rmem_max=134217728       # 128 MB ceiling
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

# Verify receive-buffer auto-tuning is on (it is by default)
sysctl net.ipv4.tcp_moderate_rcvbuf           # should be 1

Auto-tuning: Modern kernels auto-tune TCP buffers within the tcp_rmem / tcp_wmem range, so you mostly need to raise the maximum and let the kernel grow the buffer as needed. The rmem_max / wmem_max values cap what the application can request via SO_RCVBUF / SO_SNDBUF. Note that a socket which sets SO_RCVBUF explicitly opts out of receive auto-tuning.
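
The bandwidth-delay product behind these numbers is easy to compute for your own links. A minimal sketch, with `bdp_bytes` as a hypothetical helper name and decimal units (1 Mbit = 1,000,000 bits):

```shell
# Bandwidth-delay product in bytes.
bdp_bytes() {
  # $1 = bandwidth in Mbit/s, $2 = round-trip time in ms
  # Mbit/s * 1,000,000 / 8 = bytes/s; times ms / 1000 = bytes in flight,
  # which simplifies to a factor of 125.
  echo $(( $1 * $2 * 125 ))
}

bdp_bytes 10000 10   # 10 Gbps, 10 ms RTT:  12500000 (12.5 MB)
bdp_bytes 1000 80    # 1 Gbps, 80 ms RTT:   10000000 (10 MB)
```

The third field of tcp_rmem / tcp_wmem should be at least this large for a single connection to be able to fill the path.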

Networking conntrack

Netfilter connection tracking parameters. Critical on firewalls, NAT gateways, and load balancers where the tracked connection count can be very high.

Parameter	Default	Description
net.netfilter.nf_conntrack_max	131072	Max number of tracked connections. When full, new connections are dropped with "table full" in dmesg.
net.netfilter.nf_conntrack_tcp_timeout_established	432000	Seconds to keep an established TCP connection in the table (5 days default — very long).
net.netfilter.nf_conntrack_tcp_timeout_time_wait	120	Seconds to keep TIME_WAIT entries.
net.netfilter.nf_conntrack_udp_timeout	30	Seconds to keep an unidirectional UDP flow.
net.netfilter.nf_conntrack_udp_timeout_stream	120	Seconds to keep a bidirectional UDP flow.
net.netfilter.nf_conntrack_buckets	auto	Hash table size. Set to `nf_conntrack_max / 4` for good performance.

Shell

# Check current table usage vs max
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Increase table size (e.g. for a busy NAT gateway)
sysctl -w net.netfilter.nf_conntrack_max=2097152

# Shorten TCP established timeout from 5 days to 1 hour
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600

# Watch for table-full drops
dmesg | grep "nf_conntrack: table full"
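
For monitoring, the two counters are more useful as a single fill percentage. The sketch below assumes the nf_conntrack module is loaded (otherwise the files do not exist and nothing is printed); `conntrack_usage_pct` is a hypothetical helper name and the 80% alert threshold is an arbitrary illustration.

```shell
# Integer fill percentage of the conntrack table.
conntrack_usage_pct() {
  # $1 = nf_conntrack_count, $2 = nf_conntrack_max
  echo $(( $1 * 100 / $2 ))
}

dir=/proc/sys/net/netfilter
if [ -r "$dir/nf_conntrack_count" ] && [ -r "$dir/nf_conntrack_max" ]; then
  pct=$(conntrack_usage_pct "$(cat "$dir/nf_conntrack_count")" "$(cat "$dir/nf_conntrack_max")")
  echo "conntrack table ${pct}% full"
  if [ "$pct" -ge 80 ]; then
    echo "warning: above 80% (illustrative threshold)"
  fi
fi
```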

"nf_conntrack: table full, dropping packet" This appears in dmesg when the conntrack table is exhausted. New connections are silently dropped. Either raise nf_conntrack_max, shorten timeouts to expire stale entries faster, or add NOTRACK rules in the raw table for traffic that doesn't need stateful tracking.

Networking Routing & Forwarding

Parameter	Default	Description
net.ipv4.ip_forward	0	Enable IPv4 packet forwarding. Must be 1 for routers, VPN gateways, and containers.
net.ipv6.conf.all.forwarding	0	Enable IPv6 packet forwarding.
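net.ipv4.conf.all.rp_filter	0	Reverse path filtering. `1` (strict) drops packets that arrive on an interface other than the one the kernel would use to reach their source address. `2` (loose) only requires that some route to the source exists. The kernel default is 0, but many distros ship 1 or 2. Use 2 or 0 for asymmetric routing.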
net.ipv4.conf.all.accept_redirects	1	Accept ICMP redirects. Disable on servers — routers should not be redirecting your traffic.
net.ipv4.conf.all.send_redirects	1	Send ICMP redirects. Disable if not acting as a router.
net.ipv4.route.max_size	n/a	Maximum number of routing cache entries. Deprecated for IPv4 since kernel 3.6, which removed the routing cache.

Shell

# Enable forwarding (required for Docker, Kubernetes, VPNs)
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1

# Disable ICMP redirects on a server (not a router)
sysctl -w net.ipv4.conf.all.accept_redirects=0
sysctl -w net.ipv4.conf.all.send_redirects=0

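# Disable reverse path filtering for asymmetric routing
# (the kernel applies the higher of conf.all and the per-interface value, so check both;
#  2 = loose mode is a less drastic alternative to 0)
sysctl -w net.ipv4.conf.all.rp_filter=0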

Networking UDP

Parameter	Default	Description
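net.core.netdev_max_backlog	1000	Per-CPU queue of received packets waiting to be processed by the kernel. Increase if the second column of `/proc/net/softnet_stat` (packets dropped because the backlog was full) keeps growing. A rising `rx_missed_errors` in `ethtool -S` usually points at the NIC ring buffer instead.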
net.ipv4.udp_rmem_min	4096	Minimum per-socket UDP receive buffer.
net.ipv4.udp_wmem_min	4096	Minimum per-socket UDP send buffer.
net.ipv4.udp_mem	auto	System-wide UDP memory (pages): low / pressure / max.

UDP receive drops: The last column (drops) of /proc/net/udp is a per-socket drop counter; ss -uanm shows the same counter as the d field of skmem. System-wide UDP drops appear in netstat -su under "receive buffer errors."

Memory Swappiness

vm.swappiness controls how aggressively the kernel reclaims anonymous memory (heap, stack) relative to file-backed page cache. It's a tendency knob, not a hard threshold.

vm.swappiness = 0   → prefer reclaiming page cache, avoid swapping anonymous memory until absolutely necessary
vm.swappiness = 10  → strong preference for reclaiming page cache over swapping (good for databases, latency-sensitive workloads)
vm.swappiness = 60  → kernel default, balanced
vm.swappiness = 100 → treat anonymous and file-backed memory equally, swap aggressively

Workload	Recommended Value	Reason
General server	10–30	Reduces latency spikes from unexpected swapping
Database (MySQL, PostgreSQL)	1–10	Databases manage their own caches — kernel swap causes stalls
Redis / in-memory store	0–1	Any swap evicts data that must stay in RAM
Desktop / shared workstation	60–100	Users benefit from aggressive page cache retention
Container host (Kubernetes)	0	Pods expect predictable memory; swap causes OOM confusion

Shell

# Check current value
sysctl vm.swappiness

# Reduce for a database server
sysctl -w vm.swappiness=10

# Check current swap usage
free -h
vmstat -s | grep swap

swappiness=0 does not disable swap: Setting it to 0 means "avoid swapping unless OOM is imminent", so the kernel will still swap if necessary. To fully disable swap: swapoff -a (and remove swap entries from /etc/fstab).

Memory Dirty Pages & Writeback

Dirty pages are modified memory pages that haven't been written to disk yet. The kernel periodically flushes them in the background (writeback). These parameters control when and how aggressively that happens.

Parameter	Default	Description
vm.dirty_ratio	20	% of available memory (free plus reclaimable pages, not total RAM) that can be dirty before a writing process is blocked and forced to flush. High values = more throughput but bigger write latency spikes on flush.
vm.dirty_background_ratio	10	% of available memory at which the background flusher threads (kworker; pdflush on very old kernels) start writing. Less disruptive than hitting dirty_ratio.
vm.dirty_writeback_centisecs	500	How often (centiseconds) the background flusher wakes up. Default = every 5 seconds.
vm.dirty_expire_centisecs	3000	How old (centiseconds) a dirty page must be before it is eligible for writeback. Default = 30 seconds.

Tuning for Different Workloads

Latency-sensitive (DB writes)

Lower dirty ratios mean smaller flush bursts and more predictable latency at the cost of slightly lower write throughput.

Throughput-oriented (bulk ingest)

Higher dirty ratios allow the kernel to batch more writes together, increasing disk throughput at the risk of a large stall when the threshold is hit.
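
Because the ratios are percentages, the absolute amount of dirty data they allow grows with memory size. A rough sketch of the arithmetic, with `dirty_threshold_mb` as a hypothetical helper name and the available-memory figure supplied by hand:

```shell
# Approximate dirty-page threshold in MB for a given ratio.
dirty_threshold_mb() {
  # $1 = ratio in percent, $2 = available memory in MB
  echo $(( $1 * $2 / 100 ))
}

dirty_threshold_mb 20 65536   # default dirty_ratio on ~64 GB available: 13107 MB
dirty_threshold_mb 5 65536    # lowered dirty_ratio:                      3276 MB
dirty_threshold_mb 2 65536    # lowered dirty_background_ratio:           1310 MB
```

On large-memory hosts even a small percentage is gigabytes of pending writes; vm.dirty_bytes and vm.dirty_background_bytes accept absolute byte values as an alternative to the ratios.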

Shell

# Check current dirty and in-flight writeback totals
grep -E "Dirty|Writeback" /proc/meminfo

# Reduce flush thresholds for lower write latency
sysctl -w vm.dirty_ratio=5
sysctl -w vm.dirty_background_ratio=2

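# Force immediate writeback of all dirty pages
sync

# Separately: drop clean page cache, dentries, and inodes.
# This writes nothing to disk and hurts cache hit rates (use with care!)
echo 3 > /proc/sys/vm/drop_caches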

Memory OOM Killer

When the system runs out of memory and cannot reclaim any, the Out-Of-Memory killer selects a process to kill. The victim is chosen by an OOM score, which on current kernels is derived from the process's memory footprint (resident memory, swap, and page tables) plus its oom_score_adj adjustment.

Parameter / File	Description
vm.panic_on_oom	`0` = invoke OOM killer (default). `1` = kernel panic instead. Useful on systems where partial failure is worse than reboot.
vm.oom_kill_allocating_task	`0` = OOM killer picks the "best" victim by score. `1` = kill the task that triggered OOM immediately — faster, less heuristic.
/proc/<pid>/oom_score	Current OOM score for a process (read-only). Higher = more likely to be killed.
/proc/<pid>/oom_score_adj	Adjustment from -1000 to +1000. `-1000` = never kill. `+1000` = always kill first.

Shell

# See which process was OOM-killed
dmesg | grep -i "oom\|killed process"
journalctl -k | grep -i oom

# See OOM score for all processes (sorted)
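for p in /proc/[0-9]*; do
  # Skip processes that exited or cannot be read
  score=$(cat "$p/oom_score" 2>/dev/null) || continue
  adj=$(cat "$p/oom_score_adj" 2>/dev/null) || continue
  comm=$(cat "$p/comm" 2>/dev/null) || continue
  printf "%5d %5d %s\n" "$score" "$adj" "$comm"
done | sort -rn | head -20

# Protect a critical process from OOM kill (loop: pgrep may return several PIDs)
for pid in $(pgrep myapp); do echo -1000 > "/proc/$pid/oom_score_adj"; done

# Make a process a preferred OOM victim
for pid in $(pgrep lowpriority); do echo 500 > "/proc/$pid/oom_score_adj"; done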

oom_score_adj persists only while the process runs: Set it from your application's init script or systemd unit (OOMScoreAdjust=-500 in the [Service] section) to make it permanent.

Memory Memory Overcommit

Linux overcommits virtual memory by default — malloc() succeeds even when there isn't enough physical RAM to back the allocation, betting that not all of it will be used at once.

vm.overcommit_memory	Behaviour
`0` (default)	Heuristic — allow reasonable overcommit; kernel uses a formula to reject obviously excessive allocations.
`1`	Always allow — never fail `malloc()`. Used by some HPC and scientific workloads.
`2`	Strict — total committed memory cannot exceed `swap + (overcommit_ratio % of RAM)`. `malloc()` can fail.

Shell

# Check current setting
sysctl vm.overcommit_memory
sysctl vm.overcommit_ratio     # used when overcommit_memory=2 (default 50%)

# Check committed memory vs limit
grep -E "CommitLimit|Committed_AS" /proc/meminfo
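
The strict-mode limit from the table can be checked by hand. This sketch uses the simplified formula given above and ignores the memory reserved for hugepages, which the kernel subtracts from RAM first; `commit_limit_kb` is a hypothetical helper name.

```shell
# Approximate CommitLimit (kB) for vm.overcommit_memory=2.
commit_limit_kb() {
  # $1 = RAM in kB, $2 = swap in kB, $3 = vm.overcommit_ratio (percent)
  echo $(( $2 + $1 * $3 / 100 ))
}

commit_limit_kb 16777216 4194304 50   # 16 GiB RAM, 4 GiB swap, ratio 50: 12582912 kB (12 GiB)
commit_limit_kb 16777216 0 50         # same host without swap:            8388608 kB (8 GiB)
```

With the default ratio of 50 and no swap, strict mode lets processes commit only half of RAM, so the ratio usually needs raising before switching to mode 2.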

Memory Hugepages

Hugepages (2 MB or 1 GB) reduce TLB pressure for large working sets by using fewer, larger page table entries. Critical for databases (PostgreSQL, Oracle, MySQL), Java heaps, and DPDK workloads.

Parameter	Description
vm.nr_hugepages	Number of pre-allocated hugepages of the default size (2 MB on x86-64). Memory is reserved immediately and cannot be used for anything else.
vm.nr_overcommit_hugepages	Additional hugepages that can be allocated on demand (not pre-reserved).
Transparent Hugepages (THP)	Not a sysctl. Controlled through `/sys/kernel/mm/transparent_hugepage/enabled`. Values: `always`, `madvise`, `never`.

Shell

# Show hugepage stats
grep Huge /proc/meminfo

# Pre-allocate 512 × 2MB hugepages (= 1 GB reserved)
sysctl -w vm.nr_hugepages=512

# Check / change Transparent Hugepage setting
cat /sys/kernel/mm/transparent_hugepage/enabled
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP entirely (often better for databases)
echo never > /sys/kernel/mm/transparent_hugepage/enabled

Transparent Hugepages and databases: THP can cause latency spikes in databases (PostgreSQL, MongoDB, Redis) due to khugepaged compaction and copy-on-write overhead. Most database vendors recommend setting THP to madvise or never.
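
Sizing vm.nr_hugepages is a rounding-up division of the memory you want backed by the hugepage size. A minimal sketch, where `hugepages_needed` is a hypothetical helper name and the 2048 kB default is an assumption that matches x86-64:

```shell
# Number of hugepages needed to back a given amount of memory.
hugepages_needed() {
  # $1 = memory to back in MB, $2 = hugepage size in kB (default 2048)
  size_kb=${2:-2048}
  echo $(( ($1 * 1024 + size_kb - 1) / size_kb ))
}

hugepages_needed 8192           # 8 GB with 2 MB pages: 4096
hugepages_needed 8193           # rounds up:            4097
hugepages_needed 4096 1048576   # 4 GB with 1 GB pages: 4
```

The page size of the running system is the Hugepagesize line of /proc/meminfo; applications typically need some headroom on top of their nominal buffer size.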

Kernel Limits

Parameter	Default	Description
kernel.pid_max	32768	Maximum PID value. When exhausted, `fork()` fails. Raise to 4194304 on busy systems running many containers.
kernel.threads-max	~30000	Max number of threads system-wide. Each thread consumes a PID.
fs.file-max	~9M (varies with RAM)	System-wide open file descriptor limit. Per-process limit is set via `ulimit -n` or systemd's `LimitNOFILE`.
fs.inotify.max_user_watches	8192	Max inotify watches per user. IDEs, editors, and build tools exhaust this on large codebases.
fs.inotify.max_user_instances	128	Max inotify instances per user.
vm.max_map_count	65530	Max virtual memory map areas per process. Elasticsearch and JVM workloads commonly exhaust this.

Shell

# Raise PID limit for container-heavy hosts
sysctl -w kernel.pid_max=4194304

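# Fix "Too many open files in system" (ENFILE); the per-process variant
# "Too many open files" (EMFILE) needs ulimit -n / LimitNOFILE instead
sysctl -w fs.file-max=2097152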

# Fix inotify limit (IDEs, webpack, etc.)
sysctl -w fs.inotify.max_user_watches=524288

# Fix Elasticsearch "max virtual memory areas" error
sysctl -w vm.max_map_count=262144

# Check current open file descriptor count
cat /proc/sys/fs/file-nr    # open / unused / max
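
The three fields of file-nr can be turned into a usage percentage to see how close the system is to fs.file-max. `file_nr_usage_pct` is a hypothetical helper name; it takes the file contents as one string so it can be tried with sample values.

```shell
# Percentage of fs.file-max currently in use.
file_nr_usage_pct() {
  # $1 = contents of /proc/sys/fs/file-nr: "allocated unused max"
  set -- $1
  echo $(( ($1 - $2) * 100 / $3 ))
}

file_nr_usage_pct "1048576 0 2097152"   # 50
file_nr_usage_pct "9632 0 2097152"      # 0 (well under 1%)

# Usage on a live system: file_nr_usage_pct "$(cat /proc/sys/fs/file-nr)"
```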

Kernel Panic & Crash Behaviour

Parameter	Default	Description
kernel.panic	0	Seconds to wait before automatic reboot after a kernel panic. `0` = hang forever. Set to e.g. `10` for auto-reboot on production servers.
kernel.panic_on_oops	0	Panic (and thus reboot if `kernel.panic > 0`) on kernel oops. Useful when you prefer a clean reboot over running with a possibly corrupted kernel state.
vm.panic_on_oom	0	Panic instead of invoking the OOM killer.
kernel.unknown_nmi_panic	0	Panic on an unknown NMI (Non-Maskable Interrupt). Useful for hardware fault isolation.

Shell

# Auto-reboot 10 seconds after a kernel panic
sysctl -w kernel.panic=10
sysctl -w kernel.panic_on_oops=1

# Check current panic setting
sysctl kernel.panic

Kernel Core Dumps

Parameter	Default	Description
kernel.core_pattern	core	Template for core dump filename. Supports `%p` (PID), `%e` (executable), `%t` (timestamp). Can pipe to a handler with `|/usr/lib/...`.
kernel.core_uses_pid	0	Append PID to the core filename. Equivalent to adding `%p` to `core_pattern`.
fs.suid_dumpable	0	`0` = no core for setuid processes. `1` = dump (insecure). `2` = dump readable only by root.

Shell

# Write core dumps to /tmp with PID and name in filename
sysctl -w kernel.core_pattern=/tmp/core.%e.%p.%t

# Check what systemd-coredump uses (modern distros)
cat /proc/sys/kernel/core_pattern
# usually: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

# List core dumps captured by systemd-coredump
coredumpctl list

Kernel Scheduler

Parameter	Default	Description
kernel.sched_min_granularity_ns	750000	Minimum time a task runs before being preempted (nanoseconds). Lower = more responsive, higher context-switch overhead.
kernel.sched_latency_ns	6000000	Target scheduling latency — time within which every runnable task should run at least once (nanoseconds).
kernel.sched_migration_cost_ns	500000	Cost of migrating a task between CPUs. Higher value = less migration = better cache locality at the cost of load imbalance.
kernel.sched_autogroup_enabled	1	Group tasks by session for desktop fairness. Disable on servers to let all processes compete equally.

Shell

# Disable autogroup for server workloads
sysctl -w kernel.sched_autogroup_enabled=0

# Increase migration cost for cache-sensitive workloads
# (kernels 5.13 and later moved this tunable to /sys/kernel/debug/sched/migration_cost_ns)
sysctl -w kernel.sched_migration_cost_ns=5000000

Security ASLR

Address Space Layout Randomisation randomises the memory addresses of the stack, heap, and shared libraries, making it harder to exploit memory corruption vulnerabilities.

kernel.randomize_va_space	Behaviour
`0`	Disabled — all addresses deterministic. Never use in production.
`1`	Randomise stack, VDSO, shared libraries.
`2`	Full ASLR — also randomise heap (default on most distros).

Shell

# Check ASLR setting
sysctl kernel.randomize_va_space

# Ensure full ASLR is enabled
sysctl -w kernel.randomize_va_space=2

Security dmesg & Kernel Pointers

Parameter	Default	Description
kernel.dmesg_restrict	0	`1` = only root can read dmesg. Prevents unprivileged users from reading kernel addresses and hardware info.
kernel.kptr_restrict	0	`1` = hide kernel symbol addresses from `/proc/kallsyms` for non-root. `2` = hide even from root.
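kernel.perf_event_paranoid	2	Controls what unprivileged users can do with perf. `-1` = no restrictions. `0` = no raw tracepoint access. `1` = additionally no CPU-wide event access. `2` = additionally no kernel profiling (user-space measurements only). `3` = no perf access at all (a patch carried by some distros).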

Shell

# Harden kernel info exposure
sysctl -w kernel.dmesg_restrict=1
sysctl -w kernel.kptr_restrict=1

# Allow non-root perf for development machines
sysctl -w kernel.perf_event_paranoid=1

Security Unprivileged BPF

Parameter	Default	Description
kernel.unprivileged_bpf_disabled	0 or 1	`0` = unprivileged users can load BPF programs. `1` = root only (write-once: cannot be set back to 0 without reboot). `2` = root only but reversible.
net.core.bpf_jit_enable	1	Enable BPF JIT compiler. `0` = interpret. `1` = JIT. `2` = JIT with debugging output.
net.core.bpf_jit_harden	0	`1` = harden JIT for unprivileged users (constant blinding). `2` = harden for all users.

Shell

# Restrict BPF to root (production servers)
sysctl -w kernel.unprivileged_bpf_disabled=1

# Harden JIT against JIT-spraying attacks (constant blinding)
sysctl -w net.core.bpf_jit_harden=2

Security SYN Cookies & ICMP

Parameter	Default	Description
net.ipv4.tcp_syncookies	1	Send SYN cookies when the SYN queue overflows — allows legitimate connections to proceed without a full queue entry. Essential for SYN flood mitigation.
net.ipv4.icmp_echo_ignore_broadcasts	1	Ignore ICMP echo requests to broadcast addresses (prevents Smurf attack amplification).
net.ipv4.icmp_ignore_bogus_error_responses	1	Suppress logging of bogus ICMP error responses.
net.ipv4.conf.all.log_martians	0	Log packets with impossible source addresses (martians) — useful for detecting spoofed traffic.

Shell

# Recommended security baseline for a public-facing server
sysctl -w net.ipv4.tcp_syncookies=1
sysctl -w net.ipv4.icmp_echo_ignore_broadcasts=1
sysctl -w net.ipv4.icmp_ignore_bogus_error_responses=1
sysctl -w net.ipv4.conf.all.accept_redirects=0
sysctl -w net.ipv4.conf.all.send_redirects=0
sysctl -w net.ipv4.conf.all.rp_filter=1
sysctl -w kernel.dmesg_restrict=1
sysctl -w kernel.randomize_va_space=2

Reference Quick-Reference Table

Commonly changed parameters, their defaults, and the typical direction of change.

Parameter	Default	Typical Change	Reason
net.core.somaxconn	4096	↑ 65535	High-traffic servers / load balancers
net.ipv4.tcp_max_syn_backlog	1024	↑ 65535	Burst connection handling
net.ipv4.ip_local_port_range	32768 60999	1024 65535	More ephemeral ports for clients
net.ipv4.tcp_tw_reuse	0	↑ 1	Reuse TIME_WAIT ports for outbound
net.ipv4.tcp_fin_timeout	60	↓ 15	Reap orphaned FIN_WAIT_2 sockets sooner
net.core.rmem_max	212992	↑ 134217728	High-bandwidth / high-latency paths
net.core.wmem_max	212992	↑ 134217728	High-bandwidth / high-latency paths
net.ipv4.tcp_congestion_control	cubic	bbr	Better throughput on lossy or high-BDP links
net.netfilter.nf_conntrack_max	131072	↑ 2097152	Busy NAT gateways / firewalls
vm.swappiness	60	↓ 10	Reduce latency spikes on servers
vm.dirty_ratio	20	↓ 5	Reduce write latency spikes
vm.max_map_count	65530	↑ 262144	Elasticsearch, JVM applications
fs.file-max	~9M (varies with RAM)	↑ 2097152 if lower	Many open files (databases, proxies)
fs.inotify.max_user_watches	8192	↑ 524288	IDEs, build tools on large codebases
kernel.pid_max	32768	↑ 4194304	Container hosts with many processes
kernel.panic	0	10	Auto-reboot on production systems after panic
net.ipv4.ip_forward	0	↑ 1	Docker, Kubernetes, VPNs, routers

Reference Tuning Profiles

Web / API Server

/etc/sysctl.d/99-web-server.conf

# Connection capacity
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15

# Socket buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Congestion control
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq

# OS limits
fs.file-max = 2097152
vm.swappiness = 10

Database Server (PostgreSQL / MySQL)

/etc/sysctl.d/99-database.conf

# Avoid swapping — databases manage their own cache
vm.swappiness = 1

# Reduce write latency spikes from page cache flush
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
vm.dirty_writeback_centisecs = 100

# Disable transparent hugepages (causes latency spikes)
# Set in /etc/rc.local or a systemd unit:
# echo never > /sys/kernel/mm/transparent_hugepage/enabled

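# More shared memory for PostgreSQL
# shmmax is in bytes (64 GB here); shmall is in pages (64 GB / 4 KB pages).
# Comments must be on their own line: sysctl.conf has no inline comments.
kernel.shmmax = 68719476736
kernel.shmall = 16777216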

# Faster keepalive to detect dead connections
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 6

Kubernetes / Container Host

/etc/sysctl.d/99-k8s.conf

# Required for pod networking
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
# The net.bridge.* keys exist only once the br_netfilter module is loaded
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1

# More PIDs for containers
kernel.pid_max = 4194304

# Minimise swapping (this does not disable swap; run swapoff -a for that)
vm.swappiness = 0

# inotify for many file watchers across pods
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512

# Connection tracking for services with many pods
net.netfilter.nf_conntrack_max = 2097152