Linux Security
Capabilities, seccomp, AppArmor, SELinux, namespaces, file ACLs, audit, and PAM
Linux Security Layers (defence in depth)
┌──────────────────────────────────────────────────────┐
│ PAM - Authentication & session controls              │
├──────────────────────────────────────────────────────┤
│ DAC - File permissions, ACLs (uid/gid based)         │
├──────────────────────────────────────────────────────┤
│ Capabilities - Fine-grained root privilege split     │
├──────────────────────────────────────────────────────┤
│ seccomp - Syscall filtering (allow/deny list)        │
├──────────────────────────────────────────────────────┤
│ LSM - AppArmor / SELinux (Mandatory Access Control)  │
├──────────────────────────────────────────────────────┤
│ Namespaces - Isolation (PID, net, mount, user, …)    │
├──────────────────────────────────────────────────────┤
│ Audit - Record security-relevant kernel events       │
└──────────────────────────────────────────────────────┘
Capabilities (Privileges)

Linux capabilities split the monolithic root privilege into about 40 distinct units. A process needs to hold only the capabilities its task requires, which reduces the blast radius of a compromise.

Key capabilities

Capability               Allows
CAP_NET_BIND_SERVICE     Bind to ports below 1024 without being root
CAP_NET_RAW              Use raw sockets (ping, tcpdump)
CAP_SYS_ADMIN            Broad: mount, namespaces, many kernel ops; avoid granting this
CAP_SYS_PTRACE           ptrace another process (debuggers, strace)
CAP_SETUID / CAP_SETGID  Change UID/GID arbitrarily
CAP_KILL                 Send signals to processes owned by other users
CAP_CHOWN                Change file ownership arbitrarily
CAP_DAC_OVERRIDE         Bypass DAC file permission checks
Shell
# Show capabilities of current process
capsh --print

# Show capabilities of a running process
grep Cap /proc/<pid>/status
capsh --decode=$(awk '/^CapEff/ {print $2}' /proc/<pid>/status)

# Set file capabilities (no setuid needed)
setcap cap_net_bind_service+ep /usr/bin/node
getcap /usr/bin/node

# Remove file capabilities
setcap -r /usr/bin/node

# Drop capabilities in systemd service
# [Service]
# CapabilityBoundingSet=CAP_NET_BIND_SERVICE
# AmbientCapabilities=CAP_NET_BIND_SERVICE
# NoNewPrivileges=yes
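
Where capsh is not installed, the hex mask from /proc/<pid>/status can be decoded by hand, because each bit position corresponds to one capability. The following is a minimal sketch, not a replacement for capsh --decode: the function name decode_caps is made up for illustration, and the name list assumes the bit order of the CAP_* constants in <linux/capability.h> up to CAP_CHECKPOINT_RESTORE (bit 40).

```shell
# Decode a capability bitmask (hex, as shown in /proc/<pid>/status) into names.
# Bit positions follow the order of the CAP_* constants in <linux/capability.h>.
decode_caps() {
  local mask bit out name
  mask=$(( 0x$1 ))
  bit=0
  out=""
  for name in chown dac_override dac_read_search fowner fsetid kill setgid \
              setuid setpcap linux_immutable net_bind_service net_broadcast \
              net_admin net_raw ipc_lock ipc_owner sys_module sys_rawio \
              sys_chroot sys_ptrace sys_pacct sys_admin sys_boot sys_nice \
              sys_resource sys_time sys_tty_config mknod lease audit_write \
              audit_control setfcap mac_override mac_admin syslog wake_alarm \
              block_suspend audit_read perfmon bpf checkpoint_restore; do
    # Append the name if this capability's bit is set in the mask.
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}cap_$name"
    fi
    bit=$(( bit + 1 ))
  done
  printf '%s\n' "${out:-none}"
}

decode_caps 0000000000000400   # cap_net_bind_service
```

A typical call would be decode_caps "$(awk '/^CapEff/ {print $2}' /proc/<pid>/status)".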
seccomp (Syscalls)

seccomp (secure computing mode) restricts which syscalls a process can make. seccomp-bpf allows writing BPF programs that inspect and filter each syscall. Docker, browsers, and systemd use this extensively.

seccomp modes

Mode                 Behaviour
SECCOMP_MODE_STRICT  Only read, write, exit, sigreturn allowed. Anything else → SIGKILL.
SECCOMP_MODE_FILTER  BPF filter inspects each syscall. Can allow, deny, trap, log, or return errno.
Shell
# Check if a process has a seccomp filter
grep Seccomp /proc/<pid>/status
# 0=disabled, 1=strict, 2=filter

# Docker applies a default seccomp profile (blocks roughly 44 of 300+ syscalls).
# Run with a custom profile instead:
docker run --security-opt seccomp=/path/to/profile.json myimage

# Disable seccomp (e.g. for debugging — not for production)
docker run --security-opt seccomp=unconfined myimage

# systemd service seccomp filter
# [Service]
# SystemCallFilter=@system-service    # predefined set
# SystemCallFilter=~@privileged       # deny privileged calls
# SystemCallErrorNumber=EPERM         # return EPERM instead of kill

# List systemd syscall groups
systemd-analyze syscall-filter
Use strace -c myapp to see which syscalls your app actually uses, then build a minimal allowlist.
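
The strace summary can be turned into a starting allowlist mechanically. This is a rough sketch under stated assumptions: the function name strace_to_filter is hypothetical, and it assumes the usual strace -c summary layout, where data rows begin with a numeric "% time" column and end with the syscall name. The result covers only the code paths exercised during the traced run, so treat it as a draft to review.

```shell
# Turn an `strace -c` summary (read from stdin) into a systemd allowlist line.
# Header, separator, and "total" rows are skipped; data rows start with a
# numeric "% time" value and end with the syscall name.
strace_to_filter() {
  awk '$1 ~ /^[0-9.]+$/ && $NF != "total" { print $NF }' |
    sort -u |
    paste -s -d ' ' - |
    sed 's/^/SystemCallFilter=/'
}
```

For example, strace -f -c -o summary.txt myapp writes the summary to a file, and strace_to_filter < summary.txt prints a single SystemCallFilter= line to paste into the unit's [Service] section.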
AppArmor (MAC)

AppArmor is a Mandatory Access Control (MAC) system that confines programs to a set of allowed files, capabilities, and network operations using per-program profiles. Default on Ubuntu/Debian.

Profile modes

Mode      Behaviour
enforce   Policy is enforced. Violations are denied and logged.
complain  Policy is not enforced. Violations are only logged. Use for profiling.
disabled  Profile is not loaded into the kernel, so the program runs unconfined.
Shell
# Check AppArmor status
aa-status
apparmor_status

# Set a profile to complain mode (for learning)
aa-complain /etc/apparmor.d/usr.sbin.nginx

# Set back to enforce
aa-enforce /etc/apparmor.d/usr.sbin.nginx

# Reload a profile
apparmor_parser -r /etc/apparmor.d/usr.sbin.nginx

# Generate a profile from scratch (interactive)
aa-genprof /usr/bin/myapp
# run the app in another terminal, then press S to scan logs

# Check violation logs
journalctl -k | grep apparmor
grep DENIED /var/log/syslog
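
Raw denial lines are noisy, so it can help to collapse them into counts per profile and path before editing a profile. The sketch below is illustrative only: apparmor_denials is a made-up helper name, and it assumes the key="value" layout of AppArmor kernel audit messages, with profile=, name= and denied_mask= appearing in that order. Denials without a name= field (capability checks, for instance) are skipped.

```shell
# Summarise AppArmor denials from log lines on stdin as:
#   COUNT PROFILE PATH DENIED_MASK
# Lines that are not DENIED messages, or that lack a name= field, are dropped.
apparmor_denials() {
  sed -n 's/.*apparmor="DENIED".* profile="\([^"]*\)".* name="\([^"]*\)".* denied_mask="\([^"]*\)".*/\1 \2 \3/p' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2, $3, $4 }'
}
```

Usage: journalctl -k | apparmor_denials. Paths containing spaces would be truncated by the final awk step, which is acceptable for a quick overview.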

Profile syntax

AppArmor
/usr/bin/myapp {
  #include <abstractions/base>

  /etc/myapp/**  r,          # read config
  /var/lib/myapp/** rw,      # read-write data dir
  /var/log/myapp/*.log w,    # write logs
  /tmp/myapp-*   rw,

  network tcp,               # allow TCP
  capability net_bind_service,

  deny /etc/shadow r,        # explicit deny
}
SELinux (MAC)

SELinux (Security-Enhanced Linux) is a MAC system using type enforcement. Every process and file has a security context (user:role:type:level). The policy defines which types can interact with which. Default on RHEL/CentOS/Fedora.

SELinux contexts

Shell
# Show SELinux status and mode
getenforce              # Enforcing | Permissive | Disabled
sestatus

# View process contexts
ps -eZ | grep nginx     # system_u:system_r:httpd_t:s0

# View file contexts
ls -Z /var/www/html/
stat -c '%C' /var/www/html/index.html

# Temporarily switch mode (not persistent)
setenforce 0            # Permissive (log but don't block)
setenforce 1            # Enforcing

# Persistent mode (requires reboot)
# Edit /etc/selinux/config: SELINUX=permissive
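
When reading contexts, the type field is usually the one that matters for type enforcement. As a small sketch (split_context is a hypothetical helper, not an SELinux tool), a context string can be split into its four parts. The level is taken as everything after the third colon, since MLS/MCS levels such as s0:c1,c2 contain colons themselves.

```shell
# Split an SELinux context (user:role:type:level) into labelled fields.
# With four variables, read assigns the whole remainder to the last one,
# so a level like s0:c1,c2 stays intact.
split_context() {
  printf '%s\n' "$1" | {
    IFS=: read -r user role type level
    printf 'user=%s role=%s type=%s level=%s\n' "$user" "$role" "$type" "$level"
  }
}

split_context system_u:system_r:httpd_t:s0
# user=system_u role=system_r type=httpd_t level=s0
```

It combines with the commands above, for example split_context "$(stat -c '%C' /var/www/html/index.html)".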

Common SELinux operations

Shell
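# Fix file context (relabel to match policy)
restorecon -Rv /var/www/html/

# Set a context by hand (temporary: a later restorecon or relabel reverts it)
chcon -t httpd_sys_content_t /srv/mysite/index.html

# Make a custom context persistent, then apply it
semanage fcontext -a -t httpd_sys_content_t '/srv/mysite(/.*)?'
restorecon -Rv /srv/mysite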

# Check what's being denied
ausearch -m avc -ts recent
journalctl | grep avc

# Generate policy module from denials (audit2allow)
ausearch -m avc -ts recent | audit2allow -M mymodule
semodule -i mymodule.pp

# List and toggle booleans
getsebool -a | grep httpd

# Allow nginx to connect to upstream (common fix)
setsebool -P httpd_can_network_connect 1   # -P = persistent; 1 and on are equivalent
When something fails silently on RHEL/CentOS, check ausearch -m avc -ts recent early on. SELinux denials are a commonly overlooked cause.
Namespaces (Isolation)

Linux namespaces isolate system resources so that a set of processes has its own view of the system. They are the foundation of containers.

Namespace types

Namespace  Isolates                                                 Flag
pid        Process IDs. PID 1 inside is different from host PID 1.  CLONE_NEWPID
net        Network interfaces, routes, iptables rules, sockets.     CLONE_NEWNET
mnt        Mount points; different filesystem view per namespace.   CLONE_NEWNS
uts        Hostname and domain name (uname).                        CLONE_NEWUTS
ipc        SysV IPC, POSIX message queues.                          CLONE_NEWIPC
user       UID/GID mappings; rootless containers.                   CLONE_NEWUSER
cgroup     cgroup root; hides host cgroup hierarchy.                CLONE_NEWCGROUP
time       System clocks (monotonic, boottime). Linux ≥ 5.6.        CLONE_NEWTIME
Shell
# List namespaces of a process
ls -la /proc/<pid>/ns/

# Enter another process's namespace
nsenter -t <pid> --net --pid -- bash
nsenter -t <pid> --all -- bash   # all namespaces

# Create a new network namespace
ip netns add myns
ip netns exec myns ip link list
ip netns del myns

# Run a process in new namespaces (like a mini-container)
unshare --pid --fork --mount-proc bash

# List all network namespaces
ip netns list
lsns --type net
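
Each entry under /proc/<pid>/ns/ is a symlink whose target looks like net:[4026531840], and two processes share a namespace exactly when those targets match. A minimal sketch of that comparison follows. The function name same_ns and the PROC_ROOT variable are assumptions made for this example; PROC_ROOT exists only so the function can be tried against a fake directory tree and defaults to /proc.

```shell
# Report whether two processes share a namespace of the given type.
# Usage: same_ns PID1 PID2 TYPE   (TYPE is net, pid, mnt, uts, ipc, user, ...)
# Returns 0 if shared, 1 if separate, 2 if a link cannot be read.
same_ns() {
  local a b
  a=$(readlink "${PROC_ROOT:-/proc}/$1/ns/$3") || return 2
  b=$(readlink "${PROC_ROOT:-/proc}/$2/ns/$3") || return 2
  if [ "$a" = "$b" ]; then
    echo "shared: $a"
  else
    echo "separate: $a vs $b"
    return 1
  fi
}
```

For instance, same_ns 1 <pid> net tells you whether a process uses the host network stack. Reading the links of processes owned by other users generally requires root.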
File Permissions (DAC)
Shell
# Permission bits: rwxrwxrwx = user|group|other
chmod 644 file          # rw-r--r--
chmod 755 dir           # rwxr-xr-x
chmod u+x,g-w file      # symbolic
chmod -R 750 /dir       # recursive

# Special bits
chmod u+s /usr/bin/passwd   # setuid: runs as file owner
chmod g+s /shared/dir       # setgid: new files inherit group
chmod +t /tmp               # sticky: only owner can delete

# Change ownership
chown user:group file
chown -R www-data:www-data /var/www/

# Default permissions (umask)
umask              # show current (e.g. 0022)
umask 027          # new files: 640, dirs: 750

# Find dangerous permissions
find / -perm -4000 2>/dev/null   # setuid files
find / -perm -2000 2>/dev/null   # setgid files
find /tmp -type d -perm -0002 -not -perm -1000  # world-writable dirs without sticky
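
The umask arithmetic is a bitwise clear: new files start from 666 and new directories from 777, and every bit set in the umask is removed. A small sketch of the calculation, with umask_result as a made-up helper name:

```shell
# Show the permissions new files and directories get under a given umask.
# The leading 0 makes the shell read the values as octal.
umask_result() {
  printf 'files=%03o dirs=%03o\n' $(( 0666 & ~0$1 )) $(( 0777 & ~0$1 ))
}

umask_result 027   # files=640 dirs=750
umask_result 022   # files=644 dirs=755
```

Programs can request narrower modes when creating a file, so these are upper bounds rather than guaranteed results.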
ACLs (DAC)

POSIX ACLs extend traditional permissions to grant access to specific users or groups beyond owner/group/other.

Shell
# View ACLs
getfacl file
getfacl -R /dir

# Grant user alice read+write
setfacl -m u:alice:rw file

# Grant group devs read access to a directory (X adds traverse permission on directories)
setfacl -m g:devs:rX /data

# Default ACL — inherited by new files in directory
setfacl -d -m u:alice:rw /shared/dir

# Remove an ACL entry
setfacl -x u:alice file

# Remove all ACLs
setfacl -b file

# Copy ACL from one file to another
getfacl source | setfacl --set-file=- dest
ACL support must be enabled on the filesystem. For ext4: mount with -o acl or add acl to /etc/fstab options. Most modern distros enable it by default.
Linux Audit (Audit)

The Linux audit subsystem records security-relevant kernel events — file access, syscall invocations, user logins, privilege escalation. Required for PCI-DSS, HIPAA, and similar compliance frameworks.

Shell
# Check auditd status
systemctl status auditd
auditctl -s

# Watch a file for any access
auditctl -w /etc/passwd -p warx -k passwd-changes
# -p: permissions to watch (r=read, w=write, a=attr, x=exec)
# -k: key tag for searching

# Audit syscalls (e.g., all execve calls by non-root)
auditctl -a always,exit -F arch=b64 -S execve -F uid!=0 -k user-execve

# List active rules
auditctl -l

# Delete all rules
auditctl -D

# Search audit log
ausearch -k passwd-changes
ausearch -m execve --start today
ausearch -ua 1000          # by UID

# Generate summary report
aureport --summary
aureport --login           # login events
aureport --failed          # failed events

Persistent rules (/etc/audit/rules.d/)

auditd rules
# /etc/audit/rules.d/hardening.rules
-w /etc/passwd -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
-w /var/log/auth.log -p wa -k authlog
-a always,exit -F arch=b64 -S setuid -k privilege-escalation
-a always,exit -F arch=b64 -S mount -k mounts
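
When many files need the same watch, the rule lines can be generated rather than typed. The following is only a sketch: watch_rules is a hypothetical helper, and it hard-codes the -p wa (write and attribute change) permissions used in the rules above.

```shell
# Emit one persistent watch rule per path, all tagged with the same key.
# Usage: watch_rules KEY PATH...
watch_rules() {
  local key target
  key=$1
  shift
  for target in "$@"; do
    printf '%s\n' "-w $target -p wa -k $key"
  done
}

watch_rules identity /etc/passwd /etc/shadow
# -w /etc/passwd -p wa -k identity
# -w /etc/shadow -p wa -k identity
```

The output can be redirected into a file under /etc/audit/rules.d/ and loaded with augenrules --load.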
PAM (Auth)

PAM (Pluggable Authentication Modules) provides a flexible authentication framework. Applications call PAM APIs; PAM runs a stack of modules defined in /etc/pam.d/ for each service.

PAM control flags

Flag        Behaviour
required    Failure is remembered but processing continues. Overall result is failure.
requisite   Failure immediately returns failure. Remaining modules not run.
sufficient  Success returns overall success (if no prior required failures). Rest skipped.
optional    Result only matters if it's the only module for the type.
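
The way these flags combine is easier to see when simulated. The sketch below is a deliberately simplified model, not how libpam is implemented: pam_eval is a made-up function, each argument is FLAG=RESULT, and optional modules and failed sufficient modules are simply ignored.

```shell
# Simplified model of how PAM combines module results within one stack.
# Usage: pam_eval FLAG=RESULT...   e.g. pam_eval required=ok sufficient=fail
# Prints "allow" or "deny"; returns 0 for allow and 1 for deny.
pam_eval() {
  local entry failed passed
  failed=0
  passed=0
  for entry in "$@"; do
    case "$entry" in
      required=fail)
        failed=1 ;;                 # remembered, processing continues
      requisite=fail)
        echo deny; return 1 ;;      # fail immediately, skip the rest
      sufficient=ok)
        # succeed immediately unless an earlier required module failed
        if [ "$failed" -eq 0 ]; then echo allow; return 0; fi ;;
      required=ok|requisite=ok)
        passed=1 ;;
    esac
  done
  if [ "$failed" -eq 0 ] && [ "$passed" -eq 1 ]; then
    echo allow
  else
    echo deny
    return 1
  fi
}

pam_eval required=fail sufficient=ok   # deny
pam_eval sufficient=ok required=fail   # allow
```

The two calls show why ordering matters: a sufficient success ends the stack only if no required module has already failed.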
PAM config
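# /etc/pam.d/sshd (example)
auth       required       pam_faillock.so preauth     # refuse if already locked out
auth       sufficient     pam_unix.so                 # correct password ends the auth stack
auth       [default=die]  pam_faillock.so authfail    # reached only on failure: record it
auth       required       pam_deny.so

account    required       pam_faillock.so             # reset the failure counter on success
account    required       pam_unix.so
account    required       pam_nologin.so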

session    required     pam_limits.so       # enforce ulimits
session    required     pam_unix.so
session    optional     pam_motd.so
Shell
# Lock out after 5 failed attempts (pam_faillock)
# /etc/security/faillock.conf:
# deny = 5
# unlock_time = 900

# Check faillock status
faillock --user alice

# Unlock a user
faillock --user alice --reset

# Enforce password complexity (pam_pwquality)
# /etc/security/pwquality.conf:
# minlen = 12
# dcredit = -1   # at least 1 digit
# ucredit = -1   # at least 1 uppercase
# lcredit = -1   # at least 1 lowercase
Cheat Sheet (Reference)

Capabilities

capsh --print
setcap cap_net_bind_service+ep /path/to/binary
getcap /path/to/binary
grep Cap /proc/PID/status

seccomp

grep Seccomp /proc/PID/status
strace -c cmd — find used syscalls
Docker: --security-opt seccomp=profile.json
systemd: SystemCallFilter=@system-service

AppArmor

aa-status
aa-complain /etc/apparmor.d/x
aa-enforce /etc/apparmor.d/x
journalctl -k | grep apparmor

SELinux

getenforce
ausearch -m avc -ts recent
restorecon -Rv /path
setsebool -P httpd_can_network_connect 1

Audit

auditctl -w /etc/passwd -p warx -k tag
ausearch -k tag
aureport --summary
Rules: /etc/audit/rules.d/

Hardening checklist

NoNewPrivileges=yes in services
Drop capabilities not needed
Enable seccomp filter
Use AppArmor/SELinux profiles
Audit /etc/passwd, /etc/sudoers