Difference between revisions of "Sysctl"

From ArchWiki
m (Text "(default)" without comments cause sysctl service to fail)
(Virtual memory: make use of brandnew extension:) https://www.mediawiki.org/wiki/Help:Extension:ParserFunctions#Rounding)
 
(34 intermediate revisions by 15 users not shown)
Line 1: Line 1:
 
{{Lowercase title}}
 
{{Lowercase title}}
 
[[Category:Kernel]]
 
[[Category:Kernel]]
[[Wikipedia:sysctl|sysctl]] is a tool for examining and changing [[Kernel parameters|kernel parameters]] at runtime (package {{Pkg|procps-ng}} in [[official repositories]]). sysctl is implemented in procfs, the virtual process file system at {{ic|/proc/}}.
+
[[ja:Sysctl]]
 +
[[Wikipedia:sysctl|sysctl]] is a tool for examining and changing [[kernel parameters]] at runtime (package {{Pkg|procps-ng}} in [[official repositories]]). sysctl is implemented in procfs, the virtual process file system at {{ic|/proc/}}.
  
 
== Configuration ==
 
== Configuration ==
  
{{Note|From version 207, [[systemd]] only applies settings from {{Ic|/etc/sysctl.d/*}} and {{Ic|/usr/lib/sysctl.d/*}}. If you had customized {{Ic|/etc/sysctl.conf}}, you need to rename it as {{Ic|/etc/sysctl.d/99-sysctl.conf}}.}}
+
{{Note|From version 207 and 21x, [[systemd]] only applies settings from {{Ic|/etc/sysctl.d/*.conf}} and {{Ic|/usr/lib/sysctl.d/*.conf}}. If you had customized {{Ic|/etc/sysctl.conf}}, you need to rename it as {{Ic|/etc/sysctl.d/99-sysctl.conf}}. If you had e.g. {{Ic|/etc/sysctl.d/foo}}, you need to rename is to {{Ic|/etc/sysctl.d/foo.conf}}.}}
  
 
The '''sysctl''' preload/configuration file can be created at {{Ic|/etc/sysctl.d/99-sysctl.conf}}. For [[systemd]], {{Ic|/etc/sysctl.d/}} and {{Ic|/usr/lib/sysctl.d/}} are drop-in directories for kernel sysctl parameters. The naming and source directory decide the order of processing, which is important since the last parameter processed may override earlier ones. For example, parameters in a {{ic|/usr/lib/sysctl.d/50-default.conf}} will be overriden by equal parameters in {{ic|/etc/sysctl.d/50-default.conf}} and any configuration file processed later from both directories.  
 
The '''sysctl''' preload/configuration file can be created at {{Ic|/etc/sysctl.d/99-sysctl.conf}}. For [[systemd]], {{Ic|/etc/sysctl.d/}} and {{Ic|/usr/lib/sysctl.d/}} are drop-in directories for kernel sysctl parameters. The naming and source directory decide the order of processing, which is important since the last parameter processed may override earlier ones. For example, parameters in a {{ic|/usr/lib/sysctl.d/50-default.conf}} will be overriden by equal parameters in {{ic|/etc/sysctl.d/50-default.conf}} and any configuration file processed later from both directories.  
Line 18: Line 19:
 
The parameters available are those listed under {{Ic|/proc/sys/}}. For example, the {{Ic|kernel.sysrq}} parameter refers to the file {{Ic|/proc/sys/kernel/sysrq}} on the file system. The {{Ic|sysctl -a}} command can be used to display all currently available values.
 
The parameters available are those listed under {{Ic|/proc/sys/}}. For example, the {{Ic|kernel.sysrq}} parameter refers to the file {{Ic|/proc/sys/kernel/sysrq}} on the file system. The {{Ic|sysctl -a}} command can be used to display all currently available values.
  
{{Note|If you have the kernel documentation installed ({{Pkg|linux-docs}}), you can find detailed information about sysctl settings in {{Ic|/usr/src/linux-$(uname -r)/Documentation/sysctl/}}. It is highly recommended reading these before changing sysctl settings.}}
+
{{Note|If you have the kernel documentation installed ({{Pkg|linux-docs}}), you can find detailed information about sysctl settings in {{Ic|/usr/lib/modules/$(uname -r)/build/Documentation/sysctl/}}. It is highly recommended reading these before changing sysctl settings.}}
  
 
Settings can be changed through file manipulation or using the {{Ic|sysctl}} utility. For example, to temporarily enable the [[Wikipedia:Magic_SysRq_key|magic SysRq key]]:
 
Settings can be changed through file manipulation or using the {{Ic|sysctl}} utility. For example, to temporarily enable the [[Wikipedia:Magic_SysRq_key|magic SysRq key]]:
Line 29: Line 30:
  
 
To preserve changes between reboots, add or modify the appropriate lines in {{Ic|/etc/sysctl.d/99-sysctl.conf}} or another applicable parameter file in {{Ic|/etc/sysctl.d/}}.
 
To preserve changes between reboots, add or modify the appropriate lines in {{Ic|/etc/sysctl.d/99-sysctl.conf}} or another applicable parameter file in {{Ic|/etc/sysctl.d/}}.
 +
{{Tip|Some parameters that can be applied may depend on kernel modules which in turn might not be loaded. For example parameters in {{ic|/proc/sys/net/bridge/*}} depend on the {{ic|br_netfilter}} module. If it is not loaded at runtime (or after a reboot), those will ''silently'' not be applied. See [[Kernel modules]].}}
  
 
== Security ==
 
== Security ==
  
=== Preventing link [[Wikipedia:TOCTOU|TOCTOU]] vulnerabilities ===
+
See [[Security#Kernel hardening]].
 
+
See the [https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=800179c9b8a1e796e441674776d11cd4c05d61d7 commit message] for when this feature was added for the rationale.
+
 
+
fs.protected_hardlinks = 1
+
fs.protected_symlinks = 1
+
 
+
{{Note|Already enabled by default nowadays. Only left here as information.}}
+
 
+
=== Hide kernel symbol addresses ===
+
 
+
Enabling {{ic|kernel.kptr_restrict}} will hide kernel symbol addresses in {{ic|/proc/kallsyms}} from regular users, making it more difficult for kernel exploits to resolve addresses/symbols dynamically. This will not help that much on a precompiled Arch Linux kernel, since a determined attacker could just download the kernel package and get the symbols manually from there, but if you're compiling your own kernel, this can help mitigating local root exploits. This will break some {{Pkg|perf}} commands when used by non-root users (but main {{Pkg|perf}} features require root access anyway). See {{Bug|34323}} for more information.
+
 
+
kernel.kptr_restrict = 1
+
  
 
== Networking ==
 
== Networking ==
Line 58: Line 47:
  
 
=== TCP/IP stack hardening ===
 
=== TCP/IP stack hardening ===
 +
 +
The following specifies a parameter set to tighten network security options of the kernel for the IPv4 protocol and related IPv6 parameters where an equivalent exists.
 +
 +
For some usecases, for example using the system as a [[router]], other parameters may be useful or required as well.
  
 
{{bc|1=
 
{{bc|1=
#### ipv4 networking ####
+
#### ipv4 networking and equivalent ipv6 parameters ####
  
 
## TCP SYN cookie protection (default)
 
## TCP SYN cookie protection (default)
Line 71: Line 64:
 
## (not widely supported outside of linux, but conforms to RFC)
 
## (not widely supported outside of linux, but conforms to RFC)
 
net.ipv4.tcp_rfc1337 = 1
 
net.ipv4.tcp_rfc1337 = 1
 +
 +
## sets the kernels reverse path filtering mechanism to value 1(on)
 +
## will do source validation of the packet's recieved from all the interfaces on the machine
 +
## protects from attackers that are using ip spoofing methods to do harm
 +
net.ipv4.conf.all.rp_filter = 1
 +
net.ipv6.conf.all.rp_filter = 1
  
 
## tcp timestamps
 
## tcp timestamps
Line 79: Line 78:
 
net.ipv4.tcp_timestamps = 0
 
net.ipv4.tcp_timestamps = 0
 
#net.ipv4.tcp_timestamps = 1
 
#net.ipv4.tcp_timestamps = 1
 
## source address verification (sanity checking)
 
## helps protect against spoofing attacks
 
net.ipv4.conf.all.rp_filter = 1
 
 
## disable ALL packet forwarding (not a router, disable it) (default)
 
net.ipv4.ip_forward = 0
 
  
 
## log martian packets
 
## log martian packets
Line 92: Line 84:
 
## ignore echo broadcast requests to prevent being part of smurf attacks (default)
 
## ignore echo broadcast requests to prevent being part of smurf attacks (default)
 
net.ipv4.icmp_echo_ignore_broadcasts = 1
 
net.ipv4.icmp_echo_ignore_broadcasts = 1
 
## optionally, ignore all echo requests
 
## this is NOT recommended, as it ignores echo requests on localhost as well
 
#net.ipv4.icmp_echo_ignore_all = 1
 
  
 
## ignore bogus icmp errors (default)
 
## ignore bogus icmp errors (default)
 
net.ipv4.icmp_ignore_bogus_error_responses = 1
 
net.ipv4.icmp_ignore_bogus_error_responses = 1
 
## IP source routing (insecure, disable it) (default)
 
net.ipv4.conf.all.accept_source_route = 0
 
  
 
## send redirects (not a router, disable it)
 
## send redirects (not a router, disable it)
Line 107: Line 92:
  
 
## ICMP routing redirects (only secure)
 
## ICMP routing redirects (only secure)
net.ipv4.conf.all.accept_redirects = 0
 
 
#net.ipv4.conf.all.secure_redirects = 1 (default)
 
#net.ipv4.conf.all.secure_redirects = 1 (default)
 +
net.ipv4.conf.default.accept_redirects=0
 +
net.ipv4.conf.all.accept_redirects=0
 +
net.ipv6.conf.default.accept_redirects=0
 +
net.ipv6.conf.all.accept_redirects=0
 
}}
 
}}
 +
 +
== Virtual memory ==
 +
 +
There are several key parameters to tune the operation of the virtual memory (VM) subsystem of the Linux kernel and the writeout of dirty data to disk. See the [https://www.kernel.org/doc/Documentation/sysctl/vm.txt Linux kernel documentation] for more information. For example:
 +
 +
* {{ic|1=vm.dirty_ratio = 3}}
 +
: Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
 +
 +
* {{ic|1=vm.dirty_background_ratio = 2}}
 +
: Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which the background kernel flusher threads will start writing out dirty data.
 +
 +
As noted in the comments for the parameters, one needs to consider the total amount of RAM when setting these values. For example, simplifying by taking the installed system RAM instead of available memory:
 +
 +
* If {{ic|vm.dirty_ratio}} is set to 10 (percent of RAM). Consensus is that 10% of RAM when RAM is say half a GB (so 10% is {{#expr: 500/10 round 0}} MB) is a sane value on spinning disks. But if the machine has much more RAM, say 16 GB (10% is {{#expr: 16/10 round 1 }} GB), the percentage may be out of proportion as it becomes several seconds of writeback on spinning disks. A more sane value in this case is 3 (16*0.03 ~ 491 MB).
 +
 +
* A {{ic|vm.dirty_background_ratio}} setting of 5 (% of RAM) may, similarly, be just fine for small memory values, but again, consider and adjust accordingly for the amount of RAM on a particular system.
 +
 +
== MDADM ==
 +
 +
When the kernel performs a resync operation of a software raid device it tries not to create a high system load by restricting the speed of the operation. Using sysctl it is possible to change the lower and upper speed limit.
 +
 +
{{bc|1=
 +
# Set maximum and minimum speed of raid resyncing operations
 +
dev.raid.speed_limit_max = 10000
 +
dev.raid.speed_limit_min = 1000
 +
}}
 +
 +
If mdadm is compiled as a module {{ic|md_mod}}, the above settings are available only after the module has been loaded. If the settings shall be loaded on boot via {{ic|/etc/sysctl.d}}, the module {{ic|md_mod}} may be loaded beforehand through {{ic|/etc/modules-load.d}}.
  
 
== Troubleshooting ==
 
== Troubleshooting ==
Line 130: Line 146:
 
* The [http://linux.die.net/man/8/sysctl sysctl(8)] and [http://www.unixlore.net/cgi-bin/man/man2html?5+sysctl.conf sysctl.conf(5)] man pages
 
* The [http://linux.die.net/man/8/sysctl sysctl(8)] and [http://www.unixlore.net/cgi-bin/man/man2html?5+sysctl.conf sysctl.conf(5)] man pages
 
* Linux kernel documentation ({{Ic|<kernel source dir>/Documentation/sysctl/}})
 
* Linux kernel documentation ({{Ic|<kernel source dir>/Documentation/sysctl/}})
* [http://gotux.net/arch-linux/sysctl-config/ SysCtl Config Tutorial]
 
 
* Kernel Documentation: [https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt IP Sysctl]
 
* Kernel Documentation: [https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt IP Sysctl]
 +
* SysCtl.conf Tweaked for Security and Cable Speed [http://blog.gotux.net/code/config/sysctl/]{{Dead link|2015|12|14}}
 +
* Kernel network parameters for sysctl [http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.html]

Latest revision as of 07:53, 9 August 2016

sysctl is a tool for examining and changing kernel parameters at runtime (package procps-ng in official repositories). sysctl is implemented in procfs, the virtual process file system at /proc/.

Configuration

Note: From version 207 and 21x, systemd only applies settings from /etc/sysctl.d/*.conf and /usr/lib/sysctl.d/*.conf. If you had customized /etc/sysctl.conf, you need to rename it to /etc/sysctl.d/99-sysctl.conf. If you had e.g. /etc/sysctl.d/foo, you need to rename it to /etc/sysctl.d/foo.conf.

The sysctl preload/configuration file can be created at /etc/sysctl.d/99-sysctl.conf. For systemd, /etc/sysctl.d/ and /usr/lib/sysctl.d/ are drop-in directories for kernel sysctl parameters. The file name and source directory decide the order of processing, which is important since the last parameter processed may override earlier ones. For example, parameters in /usr/lib/sysctl.d/50-default.conf will be overridden by the same parameters in /etc/sysctl.d/50-default.conf and in any configuration file processed later from either directory.
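The ordering described above can be sketched in shell. This is only an illustration under simplifying assumptions: the function name effective_sysctl_files is hypothetical, file names are assumed to contain no whitespace, and only the directories passed as arguments are considered (highest priority first). The authoritative view of what gets applied is the output of sysctl --system.

```shell
#!/bin/sh
# effective_sysctl_files DIR...
# Hypothetical helper: print the *.conf files that would be processed, in order.
# Directories are passed from highest to lowest priority, e.g.
#   effective_sysctl_files /etc/sysctl.d /usr/lib/sysctl.d
# A file in an earlier directory masks a file of the same name in a later one;
# the surviving files are then ordered by file name, regardless of directory.
effective_sysctl_files() {
    for dir in "$@"; do
        for f in "$dir"/*.conf; do
            [ -e "$f" ] || continue          # skip the unexpanded glob
            printf '%s %s\n' "$(basename "$f")" "$f"
        done
    done |
        awk '!seen[$1]++' |                  # keep the first (highest-priority) copy
        LC_ALL=C sort -k1,1 |                # order by file name
        cut -d' ' -f2-                       # print only the full path
}
```

Because later files win, a parameter set in the last file printed overrides the same parameter in any file printed before it.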

To load all configuration files manually, execute

# sysctl --system 

which will also output the applied hierarchy. A single parameter file can also be loaded explicitly with

# sysctl -p filename.conf

See the new configuration files and more specifically systemd's sysctl.d man page for more information.

The parameters available are those listed under /proc/sys/. For example, the kernel.sysrq parameter refers to the file /proc/sys/kernel/sysrq on the file system. The sysctl -a command can be used to display all currently available values.
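As a small illustration of this mapping, the following sketch converts a parameter name into its path under /proc/sys/. The function name sysctl_key_to_path is hypothetical, and the sketch assumes that no component of the name contains a dot itself (as can happen with some network interface names).

```shell
#!/bin/sh
# sysctl_key_to_path KEY
# Hypothetical helper: translate a dotted sysctl name into its /proc/sys path.
sysctl_key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

sysctl_key_to_path kernel.sysrq   # prints /proc/sys/kernel/sysrq
```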

Note: If you have the kernel documentation installed (linux-docs), you can find detailed information about sysctl settings in /usr/lib/modules/$(uname -r)/build/Documentation/sysctl/. It is highly recommended to read these before changing sysctl settings.

Settings can be changed through file manipulation or using the sysctl utility. For example, to temporarily enable the magic SysRq key:

# sysctl kernel.sysrq=1

or:

# echo "1" > /proc/sys/kernel/sysrq

To preserve changes between reboots, add or modify the appropriate lines in /etc/sysctl.d/99-sysctl.conf or another applicable parameter file in /etc/sysctl.d/.

Tip: Some parameters depend on kernel modules which might not be loaded. For example, parameters in /proc/sys/net/bridge/* depend on the br_netfilter module. If it is not loaded at runtime (or after a reboot), those parameters will silently not be applied. See Kernel modules.
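One way to spot such parameters is to check whether each key in a configuration file currently has a matching entry under /proc/sys/. The following is a rough sketch, not a full parser: the function name missing_sysctl_keys is hypothetical, lines starting with # or ; are treated as comments, and dots in key names are assumed to be separators only.

```shell
#!/bin/sh
# missing_sysctl_keys CONF [ROOT]
# Hypothetical helper: print every key from CONF that has no matching file
# below ROOT (default: /proc/sys), e.g. because a module is not loaded.
missing_sysctl_keys() {
    conf=$1
    root=${2:-/proc/sys}
    # Drop comment lines and blank lines, then look at the part before "=".
    sed -e '/^[[:space:]]*[#;]/d' -e '/^[[:space:]]*$/d' "$conf" |
        while IFS='=' read -r key _; do
            key=$(printf '%s' "$key" | tr -d '[:space:]')
            [ -n "$key" ] || continue
            path=$root/$(printf '%s' "$key" | tr . /)
            [ -e "$path" ] || printf '%s\n' "$key"
        done
}
```

For example, missing_sysctl_keys /etc/sysctl.d/99-sysctl.conf lists the keys from that file which the running kernel does not currently expose.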

Security

See Security#Kernel hardening.

Networking

Improving performance

Warning: This may cause dropped frames with load balancing and NAT. Only use it on a server that communicates solely over your local network.
# reuse/recycle time-wait sockets
# (net.ipv4.tcp_tw_recycle was removed in Linux 4.12; omit that line on newer kernels)
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1

TCP/IP stack hardening

The following specifies a parameter set to tighten network security options of the kernel for the IPv4 protocol and related IPv6 parameters where an equivalent exists.

For some use cases, for example using the system as a router, other parameters may be useful or required as well.

#### ipv4 networking and equivalent ipv6 parameters ####

## TCP SYN cookie protection (default)
## helps protect against SYN flood attacks
## only kicks in when net.ipv4.tcp_max_syn_backlog is reached
net.ipv4.tcp_syncookies = 1

## protect against tcp time-wait assassination hazards
## drop RST packets for sockets in the time-wait state
## (not widely supported outside of linux, but conforms to RFC)
net.ipv4.tcp_rfc1337 = 1

## enable the kernel's reverse path filtering (value 1 = strict mode)
## validates the source address of packets received on all interfaces
## protects against attackers that use IP spoofing
net.ipv4.conf.all.rp_filter = 1
## the kernel provides no rp_filter sysctl for IPv6, so this key cannot be set
#net.ipv6.conf.all.rp_filter = 1

## tcp timestamps
## + protect against wrapping sequence numbers (at gigabit speeds)
## + round trip time calculation implemented in TCP
## - causes extra overhead and allows uptime detection by scanners like nmap
## enable @ gigabit speeds
net.ipv4.tcp_timestamps = 0
#net.ipv4.tcp_timestamps = 1

## log martian packets
net.ipv4.conf.all.log_martians = 1

## ignore echo broadcast requests to prevent being part of smurf attacks (default)
net.ipv4.icmp_echo_ignore_broadcasts = 1

## ignore bogus icmp errors (default)
net.ipv4.icmp_ignore_bogus_error_responses = 1

## send redirects (not a router, disable it)
net.ipv4.conf.all.send_redirects = 0

## ICMP routing redirects (only secure)
#net.ipv4.conf.all.secure_redirects = 1 (default)
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

Virtual memory

There are several key parameters to tune the operation of the virtual memory (VM) subsystem of the Linux kernel and the writeout of dirty data to disk. See the Linux kernel documentation for more information. For example:

  • vm.dirty_ratio = 3
Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
  • vm.dirty_background_ratio = 2
Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which the background kernel flusher threads will start writing out dirty data.

As noted in the comments for the parameters, one needs to consider the total amount of RAM when setting these values. For example, simplifying by taking the installed system RAM instead of available memory:

  • Consensus is that a vm.dirty_ratio of 10 (percent of RAM) is a sane value on spinning disks when RAM is, say, half a GB (so 10% is 50 MB). But if the machine has much more RAM, say 16 GB (10% is 1.6 GB), the percentage may be out of proportion, as it amounts to several seconds of writeback on spinning disks. A saner value in this case is 3 (3% of 16 GB is approximately 491 MB).
  • A vm.dirty_background_ratio setting of 5 (% of RAM) may, similarly, be just fine for small memory values, but again, consider and adjust accordingly for the amount of RAM on a particular system.
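The arithmetic in these examples can be reproduced with a short sketch. The function name dirty_limit_mib is hypothetical, and like the text above it simplifies by using installed RAM rather than available memory; results are rounded down to whole MiB.

```shell
#!/bin/sh
# dirty_limit_mib RAM_MIB PERCENT
# Hypothetical helper: amount of memory (in MiB) that PERCENT of RAM_MIB represents.
dirty_limit_mib() {
    echo $(( $1 * $2 / 100 ))
}

dirty_limit_mib 500 10      # half a GB at 10%  -> 50
dirty_limit_mib 16384 10    # 16 GiB at 10%     -> 1638
dirty_limit_mib 16384 3     # 16 GiB at 3%      -> 491
```

The installed RAM of a given machine can be read from the MemTotal line of /proc/meminfo (reported in kB).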

MDADM

When the kernel performs a resync operation of a software RAID device, it tries not to create a high system load by restricting the speed of the operation. Using sysctl, it is possible to change the lower and upper speed limits.

# Set maximum and minimum speed of raid resyncing operations
dev.raid.speed_limit_max = 10000
dev.raid.speed_limit_min = 1000

If mdadm is compiled as a module md_mod, the above settings are available only after the module has been loaded. If the settings are to be loaded at boot via /etc/sysctl.d, the module md_mod may be loaded beforehand through /etc/modules-load.d.

Troubleshooting

Small periodic system freezes

Set dirty bytes to a small enough value (for example 4M):

vm.dirty_background_bytes = 4194304
vm.dirty_bytes = 4194304
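
The byte values above are 4 MiB written out in full, since these parameters take a plain number of bytes. A minimal sketch for converting a size such as 4M, assuming binary (1024-based) units; the function name size_to_bytes is hypothetical:

```shell
#!/bin/sh
# size_to_bytes SIZE
# Hypothetical helper: convert a size with an optional K, M or G suffix
# (binary units) into a plain byte count.
size_to_bytes() {
    num=${1%[KMG]}
    case $1 in
        *K) echo $(( num * 1024 )) ;;
        *M) echo $(( num * 1024 * 1024 )) ;;
        *G) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *)  echo "$num" ;;
    esac
}

size_to_bytes 4M   # prints 4194304
```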

Try to change kernel.io_delay_type (x86 only):

  • 0 - IO_DELAY_TYPE_0X80
  • 1 - IO_DELAY_TYPE_0XED
  • 2 - IO_DELAY_TYPE_UDELAY
  • 3 - IO_DELAY_TYPE_NONE

See also

  • The sysctl(8) and sysctl.conf(5) man pages
  • Linux kernel documentation (<kernel source dir>/Documentation/sysctl/)
  • Kernel Documentation: IP Sysctl
  • SysCtl.conf Tweaked for Security and Cable Speed [1][dead link 2015-12-14]
  • Kernel network parameters for sysctl [2]