User:NetSysFire/systemd sandboxing
Sandboxing systemd service units
systemd enables users to harden and sandbox system service units. For technical and, ironically, security reasons, user units cannot be hardened or sandboxed properly, since this would make privilege escalation possible. This limitation does not affect system units which use the User= directive.
Because of the nature of other unit types, only service units can be hardened/sandboxed in the traditional sense. See systemd.exec(5) for more information.
General
Since hardening/sandboxing effectively restricts an application, not every sandboxing directive can be used with every application. A webserver, for example, should not use PrivateNetwork=true since it usually needs network access.
systemd-analyze security unit generates a score for the unit and shows which directives are in use, which can help determine what settings to try next. A hello world program can achieve a near perfect score, but no real application can use all the sandboxing settings.
Unfortunately, systemd's error messages for misconfigurations relating to sandboxing are sometimes vague or misleading. Temporarily setting the log level to debug may help to get relevant information.
# systemctl log-level debug
Common directives
Most of these directives can be applied to most applications without causing too many problems.
Without special configuration
Simple boolean settings which are either enabled or disabled. They take no further configuration.
Directive | Impact [1] | Breakage [2] | Notes |
---|---|---|---|
LockPersonality | Medium | Low | |
MemoryDenyWriteExecute | Medium [3] | Medium | |
NoNewPrivileges | High | Low | |
PrivateDevices | Medium | Low | /dev/null and similar will still be there |
PrivateNetwork | High | Very high | Disallows any network access. |
PrivateTmp | Medium | Low | |
PrivateUsers | High | High | |
ProtectClock [5] | Low | Medium [4] | |
ProtectControlGroups [5] | Medium | Low | Highly recommended since no service should write to that |
ProtectHostname [5] | Low | Low | |
ProtectKernelLogs [5][6] | Low | Low | |
ProtectKernelModules [5] | Medium | Low | |
ProtectKernelTunables [5] | Low | Low | |
RestrictRealtime | Low | Low | May prevent denial-of-service situations |
RestrictSUIDSGID | Medium | Low | Best used with NoNewPrivileges |
1. How effective the directive is
2. How likely the directive is to break something
3. Can be enhanced with SystemCallFilter
4. Some users reported that smartctl cannot run when this is set, but this should be relatively safe.
5. Even when running as another User=, systemd sets up seccomp filters, which can e.g. catch the application running sudo modprobe when ProtectKernelModules is set to true
6. All official kernels have set SECURITY_DMESG_RESTRICT to y, but this is still defense in depth.
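As a minimal sketch, the directives rated "Low" for breakage in the table above could be collected in a drop-in file. The unit name example.service and the drop-in file name are hypothetical placeholders, and whether all of these settings suit a given application is an assumption that has to be tested per service.

```ini
# /etc/systemd/system/example.service.d/hardening.conf
# Hypothetical drop-in; example.service is a placeholder unit name.
[Service]
# Boolean directives rated "Low" breakage in the table above
LockPersonality=true
NoNewPrivileges=true
PrivateDevices=true
PrivateTmp=true
ProtectControlGroups=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
RestrictRealtime=true
RestrictSUIDSGID=true
```

After systemctl daemon-reload and a restart of the unit, systemd-analyze security example.service should reflect the new settings. Enabling a few directives at a time makes it easier to find the one responsible if the service stops working.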
Configurable directives
Directive | Value | Impact [1] | Breakage [2] | Notes |
---|---|---|---|---|
ProtectSystem | strict | Very high | Very high | Usually used with ReadWritePaths= |
ProtectSystem | full | High | Medium | May break e.g. webservers using ACME to renew their own keys which may be in /etc |
ProtectSystem | true | High | Medium | There are in theory few applications which write to /boot and /usr |
ProtectHome | true | Medium | Medium | Some applications may need persistent data stored in XDG_CONFIG_HOME [3] |
ProtectHome | tmpfs | Low | Low | |
ProtectHome | read-only | Low | Low | Ideal for backup services |
1. How effective the directive is
2. How likely the directive is to break something
3. StateDirectory= can be used to mitigate some of the negative consequences
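The following sketch shows how the strictest values might be combined with the exceptions mentioned in the notes. The name example and the path /var/log/example are hypothetical; a real service needs its own list of writable locations.

```ini
# Hypothetical fragment for a system service called "example".
[Service]
# Mount the entire file system hierarchy read-only for this service
# (except /dev, /proc and /sys)
ProtectSystem=strict
# Re-allow writing to a specific path; it must already exist
# (assumed path, adjust to what the service actually writes)
ReadWritePaths=/var/log/example
# Make /home, /root and /run/user inaccessible
ProtectHome=true
# Let systemd create a writable /var/lib/example for persistent data
StateDirectory=example
```

StateDirectory= takes a name relative to /var/lib for system units, so the service has to be configured to store its data there instead of in a home directory.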
Advanced directives
Syscall filter etc
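Until this section is written, a common starting point can be sketched as follows. It assumes the service is an ordinary daemon for which the predefined @system-service group is sufficient; services with unusual needs will require additional system calls.

```ini
# Hypothetical fragment; whether this filter fits depends on the service.
[Service]
# Start from the allow-list systemd predefines for typical services
SystemCallFilter=@system-service
# Then remove privileged and resource-tuning system calls from it
SystemCallFilter=~@privileged @resources
# Return an error instead of killing the process on a filtered call
SystemCallErrorNumber=EPERM
# Only permit system calls of the native architecture
SystemCallArchitectures=native
```

The available groups and their contents can be listed with systemd-analyze syscall-filter.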
chroot jail
description how to use RootDirectory=, BindPaths=, BindReadOnlyPaths= and maybe also TemporaryFileSystem=/:ro
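As a narrower illustration of the same idea, the example given in systemd.exec(5) hides everything under /var except one directory. Applying this to / as suggested above needs considerably more bind mounts and is not covered by this sketch.

```ini
# Fragment following the example in systemd.exec(5).
[Service]
# Cover /var with an empty, read-only tmpfs inside the service's namespace
TemporaryFileSystem=/var:ro
# Make a single directory from the host visible again, read-only
BindReadOnlyPaths=/var/lib/systemd
```

With this fragment the service cannot see any files or directories under /var other than /var/lib/systemd and its contents. BindPaths= works the same way but keeps the bind-mounted path writable.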
system.conf
Changes to /etc/systemd/system.conf are global, so they will affect every unit. See systemd-system.conf(5).
Disabling non-native syscalls
Non-native binaries, in almost all cases 32-bit binaries, may partially compromise the security of the system because fewer hardening features are available to them. There have been some relatively minor vulnerabilities, like CVE-2009-0835, which affected non-native syscalls.
SystemCallArchitectures=native
This works well on most systems, but it needs to be at least partially disabled if e.g. multilib is in use. Gaming with Wine in particular may be impacted. Using systemd-run or modifying the session slice to override SystemCallArchitectures can be used to partially lift the restriction.
Enabling more unit statistics
systemd does not track all resource usage of a unit by default. Enable the Default*Accounting settings to get more statistics in the systemctl status output and the journal. This is not strictly a security setting, but it will certainly make debugging easier.
DefaultCPUAccounting=yes
DefaultIOAccounting=yes
DefaultIPAccounting=yes
DefaultBlockIOAccounting=yes
DefaultMemoryAccounting=yes
DefaultTasksAccounting=yes