Linux capabilities split the monolithic root privilege into about 40 distinct units (41 as of Linux 5.9). A process needs to hold only the capabilities its work requires, which reduces the blast radius of a compromise.
Key capabilities
| Capability | Allows |
|---|---|
| CAP_NET_BIND_SERVICE | Bind to ports below 1024 without being root |
| CAP_NET_RAW | Use raw sockets (ping, tcpdump) |
| CAP_SYS_ADMIN | Broad: mount, namespaces, many kernel ops — avoid granting this |
| CAP_SYS_PTRACE | ptrace another process (debuggers, strace) |
| CAP_SETUID / CAP_SETGID | Change UID/GID arbitrarily |
| CAP_KILL | Send signals to processes owned by other users |
| CAP_CHOWN | Change file ownership arbitrarily |
| CAP_DAC_OVERRIDE | Bypass DAC file permission checks |
# Show capabilities of current process
capsh --print

# Show capabilities of a running process
cat /proc/<pid>/status | grep Cap
capsh --decode=$(cat /proc/<pid>/status | grep CapEff | awk '{print $2}')

# Set file capabilities (no setuid needed)
setcap cap_net_bind_service+ep /usr/bin/node
getcap /usr/bin/node

# Remove file capabilities
setcap -r /usr/bin/node

# Drop capabilities in systemd service
# [Service]
# CapabilityBoundingSet=CAP_NET_BIND_SERVICE
# AmbientCapabilities=CAP_NET_BIND_SERVICE
# NoNewPrivileges=yes
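When `capsh` is not installed, the `CapEff` bitmask can still be decoded by hand, because each bit position corresponds to one capability number from `linux/capability.h`. The following is a minimal POSIX shell sketch of that idea. The function name `decode_caps` and the `CAP_NAMES` list are introduced here for illustration and are not part of any tool.

```shell
# Capability names in bit order (bit 0 = chown ... bit 40 = checkpoint_restore).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEXMASK: print a comma-separated list of the capabilities
# whose bits are set in HEXMASK (hypothetical helper, runs in a subshell).
decode_caps() (
  mask=$((0x${1#0x}))          # accept the mask with or without a 0x prefix
  bit=0
  out=""
  for name in $CAP_NAMES; do
    if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}cap_$name"
    fi
    bit=$((bit + 1))
  done
  printf '%s\n' "$out"
)
```

A possible use is `decode_caps "$(awk '/^CapEff/ {print $2}' /proc/self/status)"`. Bits beyond the end of the list (capabilities added by newer kernels) are silently ignored by this sketch.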
seccomp (secure computing mode) restricts which syscalls a process can make. seccomp-bpf allows writing BPF programs that inspect and filter each syscall. Docker, browsers, and systemd use this extensively.
seccomp modes
| Mode | Behaviour |
|---|---|
| SECCOMP_MODE_STRICT | Only read, write, _exit, and sigreturn are allowed. Any other syscall kills the process with SIGKILL. |
| SECCOMP_MODE_FILTER | A BPF filter inspects each syscall. It can allow the call, kill the process, trap, log, or return an errno. |
# Check if a process has a seccomp filter
cat /proc/<pid>/status | grep Seccomp
# 0=disabled, 1=strict, 2=filter

# Docker applies its default seccomp profile automatically
# (it blocks around 44 of the 300+ syscalls). To use a custom profile:
docker run --security-opt seccomp=/path/to/profile.json myimage

# Disable seccomp (e.g. for debugging, not for production)
docker run --security-opt seccomp=unconfined myimage

# systemd service seccomp filter
# (unit files do not support trailing comments, so notes sit on their own lines)
# [Service]
# Predefined set:
# SystemCallFilter=@system-service
# Deny privileged calls:
# SystemCallFilter=~@privileged
# Return EPERM instead of killing the process:
# SystemCallErrorNumber=EPERM

# List systemd syscall groups
systemd-analyze syscall-filter
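One way to arrive at a `SystemCallFilter=` allowlist is to summarise the syscalls a program makes under `strace -c` and join the names into a single line. The sketch below assumes the usual `strace -c` table layout (a numeric `% time` first column, the syscall name in the last column, and a final `total` row). The helper names `syscalls_from_summary` and `filter_line` are hypothetical.

```shell
# syscalls_from_summary: read an `strace -c` summary on stdin and print the
# syscall names it lists, one per line, sorted and de-duplicated.
syscalls_from_summary() {
  awk '$1 ~ /^[0-9]+\.[0-9]+$/ && $NF != "total" { print $NF }' | sort -u
}

# filter_line: join syscall names from stdin into one SystemCallFilter= line.
filter_line() {
  printf 'SystemCallFilter=%s\n' "$(tr '\n' ' ' | sed 's/ *$//')"
}
```

For example, `strace -f -c -o summary.txt myapp` followed by `syscalls_from_summary < summary.txt | filter_line` prints a candidate line. The result only covers code paths exercised during that run, so it is a starting point rather than a finished policy.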
Run strace -c myapp to see which syscalls your app actually uses, then build a minimal allowlist.
AppArmor is a Mandatory Access Control (MAC) system that confines programs to a set of allowed files, capabilities, and network operations using per-program profiles. Default on Ubuntu/Debian.
Profile modes
| Mode | Behaviour |
|---|---|
| enforce | Policy is enforced. Violations are denied and logged. |
| complain | Policy is not enforced. Violations are only logged. Use for profiling. |
| disabled | Profile is not loaded into the kernel. The program runs unconfined. |
# Check AppArmor status
aa-status
apparmor_status

# Set a profile to complain mode (for learning)
aa-complain /etc/apparmor.d/usr.sbin.nginx

# Set back to enforce
aa-enforce /etc/apparmor.d/usr.sbin.nginx

# Reload a profile
apparmor_parser -r /etc/apparmor.d/usr.sbin.nginx

# Generate a profile from scratch (interactive)
aa-genprof /usr/bin/myapp
# run the app in another terminal, then press S to scan logs

# Check violation logs
journalctl -k | grep apparmor
grep DENIED /var/log/syslog
Profile syntax
/usr/bin/myapp {
  #include <abstractions/base>
  /etc/myapp/** r,              # read config
  /var/lib/myapp/** rw,         # read-write data dir
  /var/log/myapp/*.log w,       # write logs
  /tmp/myapp-* rw,
  network tcp,                  # allow TCP
  capability net_bind_service,
  deny /etc/shadow r,           # explicit deny
}
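While a profile is in complain mode, the logged denials show which file rules are still missing. The sketch below turns kernel log lines into candidate rules in the syntax shown above. It assumes the common `apparmor="DENIED" ... profile="..." name="..." denied_mask="..."` log layout and only handles file denials. The function name `denials_to_rules` is hypothetical, and `aa-logprof` remains the proper tool for this job.

```shell
# denials_to_rules: read log lines on stdin and print one
# "profile: path mask," line per distinct file denial.
denials_to_rules() {
  grep 'apparmor="DENIED"' |
    sed -n 's/.*profile="\([^"]*\)".* name="\([^"]*\)".* denied_mask="\([^"]*\)".*/\1: \2 \3,/p' |
    sort -u
}
```

A typical invocation would be `journalctl -k | denials_to_rules`. Each suggested rule should be reviewed before it goes into a profile, since a denial may be exactly what the policy is meant to block.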
SELinux (Security-Enhanced Linux) is a MAC system using type enforcement. Every process and file has a security context (user:role:type:level). The policy defines which types can interact with which. Default on RHEL/CentOS/Fedora.
SELinux contexts
# Show SELinux status and mode
getenforce   # Enforcing | Permissive | Disabled
sestatus

# View process contexts
ps -eZ | grep nginx
# system_u:system_r:httpd_t:s0

# View file contexts
ls -Z /var/www/html/
stat -c '%C' /var/www/html/index.html

# Temporarily switch mode (not persistent)
setenforce 0   # Permissive (log but don't block)
setenforce 1   # Enforcing

# Persistent mode (requires reboot)
# Edit /etc/selinux/config: SELINUX=permissive
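A context string can be split into its four fields with plain shell word splitting. The sketch below is illustrative only, and the function name `split_context` is hypothetical. Because the level field may itself contain colons (for example `s0-s0:c0.c1023`), it is read last so that it receives the remainder of the string.

```shell
# split_context CONTEXT: print the user, role, type and level of an
# SELinux context such as system_u:system_r:httpd_t:s0.
split_context() {
  printf '%s\n' "$1" | {
    IFS=: read -r se_user se_role se_type se_level
    printf 'user=%s role=%s type=%s level=%s\n' \
      "$se_user" "$se_role" "$se_type" "$se_level"
  }
}
```

Something like `split_context "$(stat -c '%C' /var/www/html/index.html)"` would show which type the policy sees for that file.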
Common SELinux operations
# Fix file context (relabel to match policy)
restorecon -Rv /var/www/html/

# Set a context by hand (temporary: a later restorecon or relabel reverts it)
chcon -t httpd_sys_content_t /srv/mysite/index.html

# Check what's being denied
ausearch -m avc -ts recent
journalctl | grep avc

# Generate policy module from denials (audit2allow)
ausearch -m avc -ts recent | audit2allow -M mymodule
semodule -i mymodule.pp

# List and toggle booleans
getsebool -a | grep httpd

# Allow nginx to connect to upstream (common fix)
# -P = persistent; "on" and 1 are equivalent
setsebool -P httpd_can_network_connect on
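To see what an AVC denial means in type-enforcement terms, it helps to rewrite it as the `allow` rule that would permit it, which is roughly what `audit2allow` does. The sketch below assumes the usual `avc: denied { perms } ... scontext=... tcontext=... tclass=...` record layout. The function name `avc_to_allow` is hypothetical, and the output is for reading, not for loading into policy unreviewed.

```shell
# avc_to_allow: read audit records on stdin and print, for each AVC denial,
# "allow <source type> <target type>:<class> { <permissions> };".
avc_to_allow() {
  sed -n 's/.*denied  *{ \([^}]*\) }.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.*tclass=\([a-z0-9_]*\).*/allow \2 \3:\4 { \1 };/p'
}
```

For instance, `ausearch -m avc -ts recent | avc_to_allow | sort -u` gives a compact list of the source type, target type, class, and permissions involved in each denial.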
When a service fails with unexplained permission errors, check ausearch -m avc -ts recent first. SELinux denials are the #1 overlooked cause.
Linux namespaces isolate system resources so that a set of processes has its own view of the system. They are the foundation of containers.
Namespace types
| Namespace | Isolates | Flag |
|---|---|---|
| pid | Process IDs. PID 1 inside is different from host PID 1. | CLONE_NEWPID |
| net | Network interfaces, routes, iptables rules, sockets. | CLONE_NEWNET |
| mnt | Mount points — different filesystem view per namespace. | CLONE_NEWNS |
| uts | Hostname and domain name (uname). | CLONE_NEWUTS |
| ipc | SysV IPC, POSIX message queues. | CLONE_NEWIPC |
| user | UID/GID mappings — rootless containers. | CLONE_NEWUSER |
| cgroup | cgroup root — hides host cgroup hierarchy. | CLONE_NEWCGROUP |
| time | System clocks (monotonic, boottime). Linux ≥ 5.6. | CLONE_NEWTIME |
# List namespaces of a process
ls -la /proc/<pid>/ns/

# Enter another process's namespace
nsenter -t <pid> --net --pid -- bash
nsenter -t <pid> --all -- bash   # all namespaces

# Create a new network namespace
ip netns add myns
ip netns exec myns ip link list
ip netns del myns

# Run a process in new namespaces (like a mini-container)
unshare --pid --fork --mount-proc bash

# List network namespaces
ip netns list      # only named namespaces created with ip netns add
lsns --type net    # all network namespaces visible to the caller
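Each entry under `/proc/<pid>/ns/` is a symbolic link whose target looks like `net:[4026531992]`, and two processes share a namespace exactly when those inode numbers match. The sketch below shows that comparison. The helper names `ns_id` and `same_ns` are hypothetical, and reading another user's `/proc/<pid>/ns/` entries usually requires root.

```shell
# ns_id LINK: extract the inode number from a namespace link target
# such as "net:[4026531992]".
ns_id() {
  printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9]*\)\]$/\1/p'
}

# same_ns PID1 PID2 TYPE: succeed if both processes are in the same
# namespace of the given type (net, pid, mnt, ...). Exits with status 2
# if either link cannot be read.
same_ns() (
  a=$(readlink "/proc/$1/ns/$3") || exit 2
  b=$(readlink "/proc/$2/ns/$3") || exit 2
  [ "$(ns_id "$a")" = "$(ns_id "$b")" ]
)
```

For example, `same_ns 1 "$$" net && echo "shares the host network namespace"` is one way to check whether the current shell runs in the same network namespace as PID 1.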
# Permission bits: rwxrwxrwx = user|group|other
chmod 644 file         # rw-r--r--
chmod 755 dir          # rwxr-xr-x
chmod u+x,g-w file     # symbolic
chmod -R 750 /dir      # recursive

# Special bits
chmod u+s /usr/bin/passwd   # setuid: runs as file owner
chmod g+s /shared/dir       # setgid: new files inherit group
chmod +t /tmp               # sticky: only owner can delete

# Change ownership
chown user:group file
chown -R www-data:www-data /var/www/

# Default permissions (umask)
umask        # show current (e.g. 0022)
umask 027    # new files: 640, dirs: 750

# Find dangerous permissions
find / -perm -4000 2>/dev/null                   # setuid files
find / -perm -2000 2>/dev/null                   # setgid files
find /tmp -type d -perm -0002 -not -perm -1000   # world-writable directories without sticky
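The umask is a set of bits to clear, not a mode to apply: most programs request 0666 for new files and 0777 for new directories, and the kernel removes the umask bits from that request. The short sketch below does the same arithmetic. The function name `umask_result` is hypothetical.

```shell
# umask_result MASK: print the modes a new file and a new directory end up
# with under the given octal umask, assuming the usual 0666/0777 requests.
umask_result() {
  printf 'file=%03o dir=%03o\n' "$((0666 & ~0$1))" "$((0777 & ~0$1))"
}
```

Calling `umask_result 027` reproduces the 640 and 750 figures from the comment above. A program that requests a stricter mode itself (for example 0600) still ends up with at most that mode.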
POSIX ACLs extend traditional permissions to grant access to specific users or groups beyond owner/group/other.
# View ACLs
getfacl file
getfacl -R /dir

# Grant user alice read+write
setfacl -m u:alice:rw file

# Grant group devs read access to directory (x is needed to enter it)
setfacl -m g:devs:rx /data

# Default ACL: inherited by new files in directory
setfacl -d -m u:alice:rw /shared/dir

# Remove an ACL entry
setfacl -x u:alice file

# Remove all ACLs
setfacl -b file

# Copy ACL from one file to another
getfacl source | setfacl --set-file=- dest
If ACLs are not available on a filesystem, mount it with -o acl or add acl to the /etc/fstab options. Most modern distros enable it by default.
The Linux audit subsystem records security-relevant kernel events such as file access, syscall invocations, user logins, and privilege escalation. It is commonly used to meet the logging requirements of PCI-DSS, HIPAA, and similar compliance frameworks.
# Check auditd status
systemctl status auditd
auditctl -s

# Watch a file for any access
auditctl -w /etc/passwd -p warx -k passwd-changes
# -p: permissions to watch (r=read, w=write, a=attr, x=exec)
# -k: key tag for searching

# Audit syscalls (e.g., all execve calls by non-root)
auditctl -a always,exit -F arch=b64 -S execve -F uid!=0 -k user-execve

# List active rules
auditctl -l

# Delete all rules
auditctl -D

# Search audit log
ausearch -k passwd-changes
ausearch -m EXECVE --start today
ausearch -ua 1000   # by UID

# Generate summary report
aureport --summary
aureport --login    # login events
aureport --failed   # failed events
Persistent rules (/etc/audit/rules.d/)
# /etc/audit/rules.d/hardening.rules
-w /etc/passwd -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
-w /var/log/auth.log -p wa -k authlog
-a always,exit -F arch=b64 -S setuid -k privilege-escalation
-a always,exit -F arch=b64 -S mount -k mounts
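Rule files like the one above tend to grow by copy and paste, so it can be convenient to generate them. The sketch below prints rules in the same syntax. The helper names `watch_rules` and `syscall_rules` are hypothetical. Emitting a `b32` rule next to each `b64` rule is an assumption based on common hardening guidance for 64-bit systems, where 32-bit compatibility syscalls would otherwise go unrecorded.

```shell
# watch_rules KEY PATH...: print one write/attribute watch per path.
watch_rules() (
  key=$1
  shift
  for path in "$@"; do
    printf '%s\n' "-w $path -p wa -k $key"
  done
)

# syscall_rules KEY SYSCALL...: print exit rules for both the 64-bit
# and the 32-bit syscall tables.
syscall_rules() (
  key=$1
  shift
  for arch in b64 b32; do
    for sc in "$@"; do
      printf '%s\n' "-a always,exit -F arch=$arch -S $sc -k $key"
    done
  done
)
```

For example, `watch_rules identity /etc/passwd /etc/shadow` prints the first two rules of the file above, and the output can be redirected into a file under /etc/audit/rules.d/ and loaded with `augenrules --load`.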
PAM (Pluggable Authentication Modules) provides a flexible authentication framework. Applications call PAM APIs; PAM runs a stack of modules defined in /etc/pam.d/ for each service.
PAM control flags
| Flag | Behaviour |
|---|---|
| required | Failure is remembered but processing continues. Overall result is failure. |
| requisite | Failure immediately returns failure. Remaining modules not run. |
| sufficient | Success returns overall success (if no prior required failures). Rest skipped. |
| optional | Result only matters if it's the only module for the type. |
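The interaction between the flags is easier to follow with a small simulation. The sketch below is a simplified model of the table above, not PAM's actual implementation: each argument is a hypothetical `flag:result` pair (result is `ok` or `fail`), optional modules are ignored, and the function name `pam_eval` is made up for this example.

```shell
# pam_eval FLAG:RESULT...: walk a module stack in order and print the
# overall outcome ("success" or "failure").
pam_eval() (
  failed=0
  for entry in "$@"; do
    flag=${entry%%:*}
    result=${entry#*:}
    case "$flag:$result" in
      required:fail)
        failed=1 ;;                    # remembered, the stack keeps running
      requisite:fail)
        echo failure; exit 1 ;;        # stop immediately
      sufficient:ok)
        if [ "$failed" -eq 0 ]; then   # no earlier required failure
          echo success; exit 0         # remaining modules are skipped
        fi ;;
    esac
  done
  if [ "$failed" -eq 0 ]; then
    echo success
  else
    echo failure
    exit 1
  fi
)
```

Comparing `pam_eval required:fail sufficient:ok` with `pam_eval sufficient:ok required:fail` shows why module order matters: the first stack fails, while the second succeeds before the failing module is ever reached.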
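# /etc/pam.d/sshd (example)
auth     required       pam_faillock.so preauth
auth     sufficient     pam_unix.so
# reached only when pam_unix.so failed: record the failed attempt
auth     [default=die]  pam_faillock.so authfail
auth     required       pam_deny.so
# resets the failure counter after a successful login
account  required       pam_faillock.so
account  required       pam_unix.so
account  required       pam_nologin.so
session  required       pam_limits.so    # enforce ulimits
session  required       pam_unix.so
session  optional       pam_motd.so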
# Lock out after 5 failed attempts (pam_faillock)
# /etc/security/faillock.conf:
# deny = 5
# unlock_time = 900

# Check faillock status
faillock --user alice

# Unlock a user
faillock --user alice --reset

# Enforce password complexity (pam_pwquality)
# /etc/security/pwquality.conf:
# minlen = 12
# dcredit = -1   # at least 1 digit
# ucredit = -1   # at least 1 uppercase
# lcredit = -1   # at least 1 lowercase
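To make the pwquality settings above concrete, the sketch below applies the same four rules (minimum length 12, at least one digit, one uppercase and one lowercase letter) to a candidate string. It is only an illustration of what those settings require: the real pam_pwquality module performs further checks such as dictionary lookups, and the function name `check_password` is hypothetical. Passing real passwords on a command line is not advisable.

```shell
# check_password CANDIDATE: print "ok" or the first rule that is violated.
check_password() (
  pw=$1
  if [ "${#pw}" -lt 12 ]; then
    echo "too short"; exit 1
  fi
  case $pw in
    *[[:digit:]]*) ;;
    *) echo "needs a digit"; exit 1 ;;
  esac
  case $pw in
    *[[:upper:]]*) ;;
    *) echo "needs an uppercase letter"; exit 1 ;;
  esac
  case $pw in
    *[[:lower:]]*) ;;
    *) echo "needs a lowercase letter"; exit 1 ;;
  esac
  echo ok
)
```

With negative credit values, as used above, each character class is a hard requirement and does not reduce the required length, which is why the length check here is a plain comparison against 12.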
Capabilities
capsh --print
setcap cap_net_bind_service+ep /bin
getcap /bin
cat /proc/PID/status | grep Cap
seccomp
cat /proc/PID/status | grep Seccomp
strace -c cmd (find used syscalls)
Docker: --security-opt seccomp=profile.json
systemd: SystemCallFilter=@system-service
AppArmor
aa-status
aa-complain /etc/apparmor.d/x
aa-enforce /etc/apparmor.d/x
journalctl -k | grep apparmor
SELinux
getenforce
ausearch -m avc -ts recent
restorecon -Rv /path
setsebool -P httpd_can_network_connect 1
Audit
auditctl -w /etc/passwd -p warx -k tag
ausearch -k tag
aureport --summary
Rules: /etc/audit/rules.d/
Hardening checklist
NoNewPrivileges=yes in services
Drop capabilities not needed
Enable seccomp filter
Use AppArmor/SELinux profiles
Audit /etc/passwd, /etc/sudoers