Maximizing performance
 
[[Category:Hardware]]
[[Category:System administration]]
[[ar:Maximizing performance]]
[[es:Maximizing performance]]
[[ja:パフォーマンスの最大化]]
[[ru:Maximizing performance]]
[[zh-cn:Maximizing performance]]
{{Related articles start}}
{{Related|ccache}}
{{Related|Improve pacman performance}}
{{Related|SSH#Speeding up SSH}}
{{Related|LibreOffice#Speed up LibreOffice}}
{{Related|Openoffice#Speed up OpenOffice}}
{{Related|Laptop}}
{{Related|Swap#Swappiness}}
{{Related|Preload}}
{{Related|Cpulimit}}
{{Related|Improve boot performance}}
{{Related articles end}}
This article provides information on basic system diagnostics relating to performance, as well as steps that may be taken to reduce resource consumption or to otherwise optimize the system, with the end goal being either perceived or documented improvements to a system's performance.
  
== The basics ==

=== Know your system ===

The best way to tune a system is to target bottlenecks, or subsystems which limit overall speed. The system specifications can help identify them.

* If the computer becomes slow when large applications (such as OpenOffice.org and Firefox) run at the same time, check if the amount of RAM is sufficient. Use the following command, and check the "available" column:

 $ free -h

* If boot time is slow, and applications take a long time to load at first launch (only), then the hard drive is likely to blame. The speed of a hard drive can be measured with the {{ic|hdparm}} command:

 # hdparm -t /dev/sdx

{{Pkg|hdparm}} indicates only the pure read speed of a hard drive, and is not a valid benchmark. A value higher than 40MB/s (while idle) is however acceptable on an average system.

* If CPU load is consistently high even with enough RAM available, then lowering CPU use should be a priority. This can be monitored in several ways, for example with {{Pkg|htop}}:

 $ htop

* If only applications using direct rendering are slow (i.e. those which use the GPU, such as video players and games), then improving GPU performance should help. The first step is to verify if direct rendering is actually enabled. This is indicated by the {{ic|glxinfo}} command:

 $ glxinfo | grep direct

{{ic|glxinfo}} is part of the {{Pkg|mesa-demos}} package.
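
If direct rendering is enabled, the output should contain the line:

 direct rendering: Yes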
  
 
=== The first thing to do ===

The simplest and most efficient way of improving overall performance is to run lightweight environments and applications.

* Use a [[window manager]] instead of a [[desktop environment]].
* Use {{ic|pstree}} or {{Pkg|htop}} to list running daemons and their resource use.

=== Benchmarking ===

The effects of optimization are often difficult to judge. They can however be measured by [[benchmarking]] tools.
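
For a quick, rough estimate of sequential write throughput (not a rigorous benchmark; this writes a 1 GiB test file to the current directory and reports the achieved throughput when it finishes):

 $ dd if=/dev/zero of=testfile bs=1M count=1024 conv=fdatasync

Remove {{ic|testfile}} afterwards; dedicated benchmarking tools give far more meaningful results.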
 
== Storage devices ==

{{Poor writing|Subjective writing}}

=== Swap files ===

Creating your swap files on a separate disk can also help quite a bit, especially if your machine swaps frequently. This happens when there is not enough RAM for your environment: using KDE with all the features and applications that come along may require several GiB of memory, whereas a tiny window manager with console applications will fit comfortably in less than 512 MiB.
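
If you do spread swap over several disks, giving the swap areas equal priority makes the kernel use them round-robin, somewhat like RAID 0. A sketch of the corresponding {{ic|/etc/fstab}} entries (the device names are placeholders):

{{hc|/etc/fstab|<nowiki>
/dev/sda2 none swap defaults,pri=10 0 0
/dev/sdb2 none swap defaults,pri=10 0 0
</nowiki>}}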
  
=== RAID ===

If you have multiple disks available, you can set them up as a software [[RAID]] for serious speed improvements.

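For instance, a two-disk RAID 0 array could be created with {{Pkg|mdadm}} along these lines (a sketch only; the device names are placeholders, and RAID 0 offers no redundancy, so a single drive failure loses the array):

 # mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1

See [[RAID]] for the full procedure and for levels that also offer data protection.
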
=== Multiple hardware paths ===

An internal hardware path is how the storage device is connected to your motherboard. There are different ways to connect to the motherboard, such as TCP/IP through a NIC, plugged in directly using PCIe/PCI, Firewire, a RAID card, USB, etc. By spreading your storage devices across these multiple connection points you maximize the capabilities of your motherboard; for example, 6 hard drives connected via USB would be much slower than 3 over USB and 3 over Firewire. The reason is that each entry path into the motherboard is like a pipe, and there is a set limit to how much can go through that pipe at any one time. The good news is that the motherboard usually has several pipes.

More examples:

# Directly to the motherboard using PCI/PCIe/ATA
# Using an external enclosure to house the disk over USB/Firewire
# Turn the device into a network storage device by connecting over TCP/IP

Note also that if you have 2 USB ports on the front of your machine and 4 USB ports on the back, and you have 4 disks, it would probably be fastest to put 2 on front/2 on back or 3 on back/1 on front. This is because internally the front ports are likely on a separate root hub from the back, meaning you can send twice as much data by using both rather than just one. Use the following commands to determine the various paths on your machine.
{{hc|USB Device Tree|$ lsusb -tv}}

{{hc|PCI Device Tree|$ lspci -tv}}
  
=== Partitioning ===

If using a traditional spinning HDD, your partition layout can influence the system's performance. Sectors at the beginning of the drive (closer to the outside of the disk) are faster than those at the end. Also, a smaller partition requires less movement of the drive's head, and thus speeds up disk operations. Therefore, it is advised to create a small partition (10GB, more or less depending on your needs) only for your system, as near to the beginning of the drive as possible. Other data (pictures, videos) should be kept on a separate partition, and this is usually achieved by separating the home directory ({{ic|/home/''user''}}) from the system ({{ic|/}}).

=== Choosing and tuning your filesystem ===

Choosing the best filesystem for a specific system is very important because each has its own strengths. The [[File systems]] article provides a short summary of the most popular ones. You can also find relevant articles [[:Category:File systems|here]].
==== Mount options ====

The [[fstab#atime options|noatime]] option is known to improve performance of the filesystem.

Other mount options are filesystem specific, therefore see the relevant articles for the filesystems:

* [[Ext3]]
* [[Ext4#Tips and tricks]]
* [[JFS Filesystem#Optimizations]]
* [[XFS]]
* [[Btrfs#Defragmentation]] and [[Btrfs#Compression]]
* [[ZFS#Tuning]]
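
As an example, enabling {{ic|noatime}} for an ext4 root filesystem in {{ic|/etc/fstab}} might look like this (a sketch; keep your own device identifier and filesystem type):

{{hc|/etc/fstab|<nowiki>
/dev/sda1  /  ext4  defaults,noatime  0 1
</nowiki>}}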
  
===== Reiserfs =====

The {{Ic|1=data=writeback}} mount option improves speed, but may corrupt data during power loss. The {{Ic|notail}} mount option increases the space used by the filesystem by about 5%, but also improves overall speed. You can also reduce disk load by putting the journal and data on separate drives. This is done when creating the filesystem:

 # mkreiserfs -j /dev/sd'''a1''' /dev/sd'''b1'''

Replace {{ic|/dev/sd'''a1'''}} with the partition reserved for the journal, and {{ic|/dev/sd'''b1'''}} with the partition for data. You can learn more about reiserfs with this [http://www.funtoo.org/Funtoo_Filesystem_Guide,_Part_2 article].
  
=== Tuning kernel parameters ===

There are several key tunables affecting the performance of block devices, see [[sysctl#Virtual memory]] for more information.

=== Tuning IO schedulers ===

{{Style|Theoretical background should be described at the top or the bottom, not in the middle. Use subsections to make the structure more clear.}}

The kernel officially supports the following schedulers for storage disk input/output (IO):

* [[wikipedia:CFQ|CFQ]] scheduler (Completely Fair Queuing)
* [[wikipedia:NOOP_scheduler|NOOP]]
* [[wikipedia:Deadline_scheduler|Deadline]]
  
Unofficial support is available through [[Linux-ck#How_to_enable_the_BFQ_I.2FO_Scheduler|BFQ]] (Budget Fair Queueing), which is compiled into the {{Pkg|linux-zen}} kernel as well as many kernels in the [[AUR]].

A more contemporary option (since kernel version 3.16) is the [https://www.thomas-krenn.com/en/wiki/Linux_Multi-Queue_Block_IO_Queueing_Mechanism_(blk-mq) Multi-Queue Block IO Queuing Mechanism], or blk-mq for short. Blk-mq leverages a CPU with multiple cores to map I/O queries to multiple queues. The tasks are distributed across multiple threads, and therefore to multiple CPU cores (per-core software queues), which can speed up read/write operations compared to traditional I/O schedulers.

blk-mq can be enabled by adding the following to the kernel's boot line:

 scsi_mod.use_blk_mq=1
An HDD has spinning disks and heads that move physically to the required location. Such a structure leads to the following characteristics:

* random latency is quite high; for modern HDDs it is ~10ms (ignoring the disk controller's write buffer).
* sequential access provides much higher throughput, because the head needs to travel a shorter distance.

If we have a lot of running processes that make IO requests to different parts of storage (i.e. random access), then we can expect a disk to handle ~100 IO requests per second. Because modern systems can easily generate load much higher than 100 requests per second, we get a queue of requests that have to wait for access to the storage. One way to improve throughput is to linearize access, i.e. order waiting requests by their logical address and always choose the closest request. Historically this was the first Linux IO scheduler, called the elevator scheduler.

One of the problems with the elevator algorithm is that it penalizes processes with sequential access. Such processes read a block of data, process it for several microseconds, then read the next block, and so on. The elevator scheduler does not know that the process is going to read another block nearby and thus moves to another request at some other location. To overcome this problem, the anticipatory IO scheduler was added. For synchronous requests, this algorithm waits for a short amount of time before moving to another request.

While these schedulers try to improve total throughput, they also might leave some unlucky requests waiting for a very long time. As an example, imagine the majority of processes make requests at the beginning of the storage space while an unlucky process makes a request at the other end. So developers tried to make the algorithm more fair, and the deadline scheduler was added. It has a queue ordered by address (the same as the elevator). If some request sits in this queue for a long time, it moves to an "expired" queue ordered by expiry time. The scheduler checks the expired queue first and processes requests from there, and only then moves to the elevator queue. It is important to understand that this algorithm sacrifices total throughput for fairness.

{{Accuracy|The CFQ scheduler now contains optimizations for SSDs.}}

CFQ (the default scheduler nowadays) aggregates all of the ideas above and adds {{ic|cgroup}} support, which allows reserving some amount of IO for a specific {{ic|cgroup}}. This is useful on shared (and cloud) hosting: users who paid for 20 IO/s want to get their share if needed.

{{Accuracy|The deadline scheduler can perform better than noop even for SSDs.}}

The characteristics of an SSD are different. It has no moving parts, so random access is as fast as sequential access, and an SSD can handle multiple requests at the same time. Modern devices can handle ~10K IO/s, which is higher than the workload on most systems; essentially, a user cannot generate enough requests to saturate an SSD, so the request queue is effectively always empty. In this case an IO scheduler does not provide any improvement. Thus, it is recommended to use the noop scheduler for an SSD.

{{Accuracy|1=Bits below were restored from [https://wiki.archlinux.org/index.php?title=Solid_State_Drives&diff=next&oldid=411764]. The blk-mq scheduler discussed above may be activated differently.}}

It is possible to change the scheduler at runtime, and even to use different schedulers for separate storage devices at the same time. Available schedulers can be queried by viewing the contents of {{ic|/sys/block/sd'''X'''/queue/scheduler}} (the active scheduler is denoted by brackets):

{{hc|$ cat /sys/block/sd'''X'''/queue/scheduler|
noop deadline [cfq]
}}

Users can change the active scheduler at runtime without the need to reboot, for example:

 # echo noop > /sys/block/sd'''X'''/queue/scheduler

This method is non-persistent and will be lost upon rebooting.
 
==== Kernel parameter (for a single device) ====

If the sole storage device in the system is an SSD, consider setting the I/O scheduler for the entire system via the {{ic|1=elevator=noop}} [[kernel parameter]].
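
For instance, when booting via [[GRUB]], a sketch of the change (assuming the stock {{ic|/etc/default/grub}} layout) would be:

{{hc|/etc/default/grub|2=
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=noop"
}}

followed by regenerating the configuration with {{ic|grub-mkconfig -o /boot/grub/grub.cfg}}.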
==== systemd-tmpfiles ====

If you have more than one storage device, or wish to avoid clutter on the kernel cmdline, you can set the I/O scheduler via {{ic|systemd-tmpfiles}}:

{{hc|/etc/tmpfiles.d/10_ioscheduler.conf|2=
w /sys/block/sdX/queue/scheduler - - - - noop
}}

For more detail on {{ic|systemd-tmpfiles}} see [[Systemd#Temporary files]].
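
The setting should also be applicable immediately, without rebooting, by processing the file manually:

 # systemd-tmpfiles --create /etc/tmpfiles.d/10_ioscheduler.conf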
 

==== Using udev for one device or HDD/SSD mixed environment ====

Though the above will undoubtedly work, it is probably best considered a workaround. Ergo, it would be preferable to use the system that is responsible for the devices in the first place to set the scheduler. In this case it is udev, and all one needs is a simple [[udev]] rule.

To do this, create the following:

{{hc|/etc/udev/rules.d/60-schedulers.rules|<nowiki>
# set deadline scheduler for non-rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"
</nowiki>}}

Of course, set deadline/cfq to the desired schedulers. Changes should occur upon the next boot. To check the success of the new rule:

 $ cat /sys/block/sd'''X'''/queue/scheduler  # where '''X''' is the device in question

{{Note|In the example, sixty is chosen because that is the number udev uses for its own persistent naming rules. Thus, it would seem that block devices are at this point able to be modified, making this a safe position for this particular rule. The rule can be named anything, so long as it ends in {{ic|.rules}}.}}
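
To apply the rule without rebooting, it should suffice to reload the udev rules and trigger a change event for block devices:

 # udevadm control --reload
 # udevadm trigger --subsystem-match=block --action=change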

=== Power management configuration ===

When dealing with traditional rotational disks (HDDs) you may want to [[Hdparm#Power_management_configuration|lower or disable power saving features]] completely.

=== RAM disks ===

See [[tmpfs]].

=== USB storage devices ===

{{Accuracy|Commands still unclear}}

If USB drives like pendrives are slow to copy files, add these three lines to a [[systemd]] tmpfile:

{{hc|/etc/tmpfiles.d/local.conf|
w /sys/kernel/mm/transparent_hugepage/enabled - - - - madvise
w /sys/kernel/mm/transparent_hugepage/defrag - - - - madvise
w /sys/kernel/mm/transparent_hugepage/khugepaged/defrag - - - - 0
}}

See also [[sysctl#Virtual memory]], [http://unix.stackexchange.com/questions/107703/why-is-my-pc-freezing-while-im-copying-a-file-to-a-pendrive] and [http://lwn.net/Articles/572911/].

== CPU ==

There are a few ways to get more performance:

* Overclock the CPU. Please note that most CPUs cannot be overclocked (they are locked by the vendor).
** Nehalem (Core i#) CPUs already use overclocking (Turbo technology). This heavily limits the possibilities of overclocking (unless you use liquid cooling).
* [[CPU frequency scaling|Change the CPU governor]], as shown in the example below.
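
For example, the governor can be switched with the {{Pkg|cpupower}} tool (a sketch; the available governors depend on the driver in use):

 # cpupower frequency-set -g performance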
 
=== Overclocking ===

The only way to directly improve CPU speed is overclocking. As it is a complicated and risky task, it is not recommended for anyone except experts. The best way to overclock is through the BIOS. When purchasing your system, keep in mind that most Intel motherboards are notorious for disabling the capability to overclock.

Many Intel i5 and i7 chips, even when overclocked properly through the BIOS or UEFI interface, will not report the correct clock frequency to acpi_cpufreq and most other utilities. This will result in excessive messages in dmesg about delays unless the module acpi_cpufreq is unloaded and blacklisted. The only tool known to correctly read the clock speed of these overclocked chips under Linux is i7z. The {{Pkg|i7z}} and {{AUR|i7z-git}} packages are available.
  
 
A way to modify performance ([http://lkml.org/lkml/2009/9/6/136 ref]) is to use Con Kolivas' desktop-centric kernel patchset, which, among other things, replaces the Completely Fair Scheduler (CFS) with the Brain Fuck Scheduler (BFS).

Kernel PKGBUILDs that include the BFS patch can be installed from the [[AUR]] or [[Unofficial user repositories]]. See the respective pages for {{AUR|linux-ck}} and the [[Linux-ck]] wiki page, {{AUR|linux-pf}} and the [[Linux-pf]] wiki page, or {{AUR|linux-bfs}}{{Broken package link|{{aur-mirror|linux-bfs}}}} for more information on their additional patches.

{{Note|BFS/CK are designed for desktop/laptop use and not servers. They provide low latency and work well for 16 CPUs or less. Also, Con Kolivas suggests setting HZ to 1000. For more information, see the [http://ck.kolivas.org/patches/bfs/bfs-faq.txt BFS FAQ] and the [http://users.on.net/~ckolivas/kernel/ kernel patch homepage of Con Kolivas].}}
  
=== Verynice ===

[[VeryNice]] is a daemon, available in the {{AUR|verynice}} package, for dynamically adjusting the nice levels of executables. The nice level represents the priority of the executable when allocating CPU resources. Simply define executables for which responsiveness is important, like X or multimedia applications, as ''goodexe'' in {{ic|/etc/verynice.conf}}. Similarly, CPU-hungry executables running in the background, like make, can be defined as ''badexe''. This prioritization greatly improves system responsiveness under heavy load.

=== Ananicy ===

[https://github.com/Nefelim4ag/Ananicy Ananicy] is a daemon, available in the {{AUR|ananicy-git}} package, for automatically adjusting the nice levels of executables. The nice level represents the priority of the executable when allocating CPU resources.

=== cgroups ===

See [[cgroups]].

=== irqbalance ===

The purpose of {{Pkg|irqbalance}} is to distribute hardware interrupts across processors on a multiprocessor system in order to increase performance. It can be [[systemd#Using units|controlled]] by the provided {{ic|irqbalance.service}}.
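
For example, to start the service immediately and enable it at boot:

 # systemctl enable --now irqbalance.service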
== Graphics ==

As with CPUs, overclocking can directly improve performance, but is generally recommended against. There are several packages in the [[AUR]], such as {{AUR|rovclock}}{{Broken package link|{{aur-mirror|rovclock}}}}, {{AUR|amdoverdrivectrl}} (ATI), and {{AUR|nvclock}} (NVIDIA).

=== Xorg.conf configuration ===

Graphics performance may depend on the settings in {{ic|/etc/X11/xorg.conf}}; see the [[NVIDIA]], [[ATI]] and [[Intel]] articles. Improper settings may stop Xorg from working, so caution is advised.

=== Driconf ===

{{Pkg|driconf}} is a small utility which allows changing direct rendering settings for open source drivers. Enabling ''HyperZ'' may improve performance.

== RAM and swap ==

=== Relocate files to tmpfs ===
Relocate files, such as your browser profile, to a [[Wikipedia:tmpfs|tmpfs]] file system, for improvements in application response as all the files are now stored in RAM:

* Refer to [[Profile-sync-daemon]] for syncing browser profiles.
* Refer to [[Anything-sync-daemon]] for syncing any specified folder.
* Refer to [[Makepkg#tmpfs]] for improving compile times when building packages.

=== Root on RAM overlay ===

If running off a slow writing medium (USB, spinning HDDs) and storage requirements are low, the root may be run on a RAM overlay on top of a read-only root (on disk). This can vastly improve performance at the cost of a limited writable space for root. See {{AUR|liveroot}}.

=== Zram or zswap ===
  
The [https://www.kernel.org/doc/Documentation/blockdev/zram.txt zram] kernel module (previously called '''compcache''') provides a compressed block device in RAM. If you use it as a swap device, the RAM can hold much more information, but at the cost of more CPU usage. Still, it is much quicker than swapping to a hard drive. If a system often falls back to swap, this could improve responsiveness. Using zram is also a good way to reduce disk read/write cycles due to swap on SSDs.

Similar benefits (at similar costs) can be achieved using [[zswap]] rather than zram. The two are generally similar in intent although not operation: zswap operates as a compressed RAM cache and neither requires (nor permits) extensive userspace configuration.

Example: to set up one lz4-compressed zram device with 32GiB capacity and a higher-than-normal priority (only for the current session):

 # modprobe zram
 # echo lz4 > /sys/block/zram0/comp_algorithm
 # echo 32G > /sys/block/zram0/disksize
 # mkswap --label zram0 /dev/zram0
 # swapon --priority 100 /dev/zram0

To disable it again, either reboot or run:

 # swapoff /dev/zram0
 # rmmod zram

A detailed explanation of all steps, options and potential problems is provided in the official documentation of the module [https://www.kernel.org/doc/Documentation/blockdev/zram.txt here].

The {{Pkg|systemd-swap}} package provides a {{ic|systemd-swap.service}} unit to automatically initialize zram devices. Configuration is possible in {{ic|/etc/systemd-swap.conf}}.

The package {{AUR|zramswap}} provides an automated script for setting up such swap devices with optimal settings for your system (such as RAM size and CPU core number). The script creates one zram device per CPU core with a total space equivalent to the RAM available, so you will have a compressed swap with higher priority than regular swap, which will utilize multiple CPU cores for compressing data. To do this automatically on every boot, [[enable]] {{ic|zramswap.service}}.

==== Swap on zRAM using a udev rule ====

The example below describes how to set up swap on zRAM automatically at boot with a single udev rule. No extra package should be needed to make this work.

First, enable the module:

{{hc|/etc/modules-load.d/zram.conf|<nowiki>
zram
</nowiki>}}

Configure the number of {{ic|/dev/zram}} nodes you need:

{{hc|/etc/modprobe.d/zram.conf|<nowiki>
options zram num_devices=2
</nowiki>}}

Create the udev rule as shown in the example:

{{hc|/etc/udev/rules.d/99-zram.rules|<nowiki>
KERNEL=="zram0", ATTR{disksize}="512M" RUN="/usr/bin/mkswap /dev/zram0", TAG+="systemd"
KERNEL=="zram1", ATTR{disksize}="512M" RUN="/usr/bin/mkswap /dev/zram1", TAG+="systemd"
</nowiki>}}

Add {{ic|/dev/zram}} to your fstab:

{{hc|/etc/fstab|<nowiki>
/dev/zram0 none swap defaults 0 0
/dev/zram1 none swap defaults 0 0
</nowiki>}}
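
After a reboot, you can verify that the zram devices are in use as swap; the output should list {{ic|/dev/zram0}} and {{ic|/dev/zram1}}:

 $ swapon --show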
  
=== Using the graphic card's RAM ===

In the unlikely case that you have very little RAM and a surplus of video RAM, you can use the latter as swap. See [[Swap on video ram]].

== Network ==

Every time a connection is made, the system must first resolve a fully qualified domain name to an IP address before the actual connection can be established. Response times of network requests can be improved by caching DNS queries locally. Common tools for this purpose include [[pdnsd]], [[dnsmasq]], [[unbound]] and {{AUR|rescached-git}}.
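
To get a rough idea of whether a local cache helps, you can time the same query twice with {{ic|drill}} from the {{Pkg|ldns}} package (assuming the local resolver listens on 127.0.0.1); on the second run the reported query time should drop to almost zero if the answer was cached:

 $ drill @127.0.0.1 archlinux.org | grep 'Query time'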
 

== Application-specific tips ==

=== Firefox ===

See [[Firefox tweaks#Performance]] and [[Firefox on RAM]].

Firefox in the official repositories is built with the profile-guided optimization flag enabled. You may want to use it in your custom build. To do this, append:

 ac_add_options --enable-profile-guided-optimization

to your {{ic|.mozconfig}} file.

Latest revision as of 15:18, 9 June 2016

This article provides information on basic system diagnostics relating to performance as well as steps that may be taken to reduce resource consumption or to otherwise optimize the system with the end-goal being either perceived or documented improvements to a system's performance.

The basics

Know your system

The best way to tune a system is to target bottlenecks, or subsystems which limit overall speed. The system specifications can help identify them.

  • If the computer becomes slow when large applications (such as OpenOffice.org and Firefox) run at the same time, check if the amount of RAM is sufficient. Use the following command, and check the "available" column:
$ free -h
  • If boot time is slow, and applications take a long time to load at first launch (only), then the hard drive is likely to blame. The speed of a hard drive can be measured with the hdparm command:
# hdparm -t /dev/sdx

hdparm indicates only the pure read speed of a hard drive, and is not a valid benchmark. A value higher than 40MB/s (while idle) is however acceptable on an average system.

  • If CPU load is consistently high even with enough RAM available, then lowering CPU use should be a priority. This can be monitored in several ways, for example with htop:
$ htop
  • If only applications using direct rendering are slow (i.e those which use the GPU, such as video players and games), then improving GPU performance should help. The first step is to verify if direct rendering is actually enabled. This is indicated by the glxinfo command:
$ glxinfo | grep direct

glxinfo is part of the mesa-demos package.

The first thing to do

The simplest and most efficient way of improving overall performance is to run lightweight environments and applications.

Benchmarking

The effects of optimization are often difficult to judge. They can however be measured by benchmarking tools.

Storage devices

Tango-edit-clear.pngThis article or section needs language, wiki syntax or style improvements.Tango-edit-clear.png

Reason: Subjective writing (Discuss in Talk:Maximizing performance#)

Swap files

Creating your swap files on a separate disk can also help quite a bit, especially if your machine swaps frequently. It happens if you do not have enough RAM for your environment. Using KDE with all the features and applications that come along may require several GiB of memory, whereas a tiny window manager with console applications will perfectly fit in less than 512 MiB of memory.

RAID

If you have multiple disks available, you can set them up as a software RAID for serious speed improvements.

Multiple hardware paths

An internal hardware path is how the storage device is connected to your motherboard. There are different ways to connect to the motherboard such as TCP/IP through a NIC, plugged in directly using PCIe/PCI, Firewire, Raid Card, USB, etc. By spreading your storage devices across these multiple connection points you maximize the capabilities of your motherboard, for example 6 hard-drives connected via USB would be much much slower than 3 over USB and 3 over Firewire. The reason is that each entry path into the motherboard is like a pipe, and there is a set limit to how much can go through that pipe at any one time. The good news is that the motherboard usually has several pipes.

More Examples

  1. Directly to the motherboard using pci/PCIe/ata
  2. Using an external enclosure to house the disk over USB/Firewire
  3. Turn the device into a network storage device by connecting over tcp/ip

Note also that if you have a 2 USB ports on the front of your machine, and 4 USB ports on the back, and you have 4 disks, it would probably be fastest to put 2 on front/2 on back or 3 on back/1 on front. This is because internally the front ports are likely a separate Root Hub than the back, meaning you can send twice as much data by using both than just 1. Use the following commands to determine the various paths on your machine.

USB Device Tree
$ lsusb -tv
PCI Device Tree
$ lspci -tv

Partitioning

If using a traditional spinning HDD, your partition layout can influence the system's performance. Sectors at the beginning of the drive (closer to the outside of the disk) are faster than those at the end. Also, a smaller partition requires less movements from the drive's head, and so speed up disk operations. Therefore, it is advised to create a small partition (10GB, more or less depending on your needs) only for your system, as near to the beginning of the drive as possible. Other data (pictures, videos) should be kept on a separate partition, and this is usually achieved by separating the home directory (/home/user) from the system (/).

Choosing and tuning your filesystem

Choosing the best filesystem for a specific system is very important because each has its own strengths. The File systems article provides a short summary of the most popular ones. You can also find relevant articles here.

Mount options

The noatime option is known to improve performance of the filesystem.

Other mount options are filesystem specific, therefore see the relevant articles for the filesystems:

Reiserfs

The data=writeback mount option improves speed, but may corrupt data during power loss. The notail mount option increases the space used by the filesystem by about 5%, but also improves overall speed. You can also reduce disk load by putting the journal and data on separate drives. This is done when creating the filesystem:

# mkreiserfs –j /dev/sda1 /dev/sdb1

Replace /dev/sda1 with the partition reserved for the journal, and /dev/sdb1 with the partition for data. You can learn more about reiserfs with this article.

Tuning kernel parameters

There are several key tunables affecting the performance of block devices, see sysctl#Virtual memory for more information.

Tuning IO schedulers

Tango-edit-clear.pngThis article or section needs language, wiki syntax or style improvements.Tango-edit-clear.png

Reason: Theoretical background should be described at the top or the bottom, not in the middle. Use subsections to make the structure more clear. (Discuss in Talk:Maximizing performance#)

The kernel officially supports the following schedulers for storage disk in-/output (IO):

Unofficial support is available through the BFQ (Budget Fair Queueing) which is compiled the linux-zen kernel as well as many kernels in the AUR.

A more contemporary option (since kernel version 3.16) is Multi-Queue Block IO Queuing Mechanism or blk-mq for short. Blk-mq leverages a CPU with multiple cores to map I/O queries to multiple queues. The tasks are distributed across multiple threads and therefore to multiple CPU cores (per-core software queues) and can speed up read/write operations vs. traditional I/O schedulers.

One can be enable blk-mq by adding the following to the kernel's boot line:

scsi_mod.use_blk_mq=1

A HDD has spinning disks and head that move physically to the required location. Such structure leads to following characteristics:

  • random latency is quite high, for modern HDDs it is ~10ms (ignoring a disk controller write buffer).
  • sequential access provides much higher throughput. In this case the head needs to move less distance.

If we have a lot of running processes that make IO requests to different parts of storage (i.e. random access) then we can expect that a disk handles ~100 IO requests per second. Because modern systems can easily generate load much higher than 100 requests per second we have a queue of requests that have to wait for access to the storage. One way to improve throughput is to linearize access, i.e. order waiting requests by its logical address and always choose the closest request. Historically this was the first Linux IO scheduler called elevator scheduler.

One of the problems with the elevator algorithm is that it makes suffer processes with sequential access. Such processes read a block of data then process it for several microseconds then read next block and so on. The elevator scheduler does not know that the process is going to read another block nearby and, thus, moves to another request at some other location. To overcome the problem anticipatory IO scheduler was added. For synchronous requests this algorithm waits for a short amount of time before moving to another request.

While these schedulers try to improve total throughput they also might leave some unlucky requests waiting for a very long time. As an example, imagine the majority of processes make requests at the beginning of storage space while an unlucky process makes a request at the other end of storage. So developers tried to make the algorithm more fair and the deadline scheduler was added. It has a queue ordered by address (the same as elevator). If some request sits in this queue for a long time then it moves to an "expired" queue ordered by expire time. The scheduler checks the expire queue first and processes requests from there and only then moves to elevator queue. It is important to understand that this algorithm sacrifices total throughput for fairness.

Tango-inaccurate.pngThe factual accuracy of this article or section is disputed.Tango-inaccurate.png

Reason: The CFQ scheduler now contains optimizations for SSDs. (Discuss in Talk:Maximizing performance#)

CFQ (the default scheduler nowadays) aggregates all ideas from above and adds cgroup support that allows to reserve some amount of IO to a specific cgroup. It is useful on shared (and cloud) hosting - users who paid for 20 IO/s want to get their share if needed.

Tango-inaccurate.pngThe factual accuracy of this article or section is disputed.Tango-inaccurate.png

Reason: The deadline scheduler can perform better than noop even for SSDs. (Discuss in Talk:Maximizing performance#)

The characteristics of a SSD are different. It does not have moving parts. Random access is as fast as sequential one. An SSD can handle multiple requests at the same time. Modern devices' throughput ~10K IO/s, which is higher than workload on most systems. Essentially a user cannot generate enough requests to saturate a SDD, the requests queue is effectively always empty. In this case IO scheduler does not provide any improvements. Thus, it is recommended to use the noop scheduler for an SSD.

It is possible to change the scheduler at runtime and even to use different schedulers for separate storage devices at the same time. Available schedulers can be queried by viewing the contents of /sys/block/sdX/queue/scheduler (the active scheduler is denoted by brackets):

$ cat /sys/block/sdX/queue/scheduler
noop deadline [cfq]

Users can change the active scheduler at runtime without the need to reboot, for example:

# echo noop > /sys/block/sdX/queue/scheduler

This method is non-persistent and will be lost upon rebooting.

Kernel parameter (for a single device)

If the sole storage device in the system is an SSD, consider setting the I/O scheduler for the entire system via the elevator=noop kernel parameter.
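
After rebooting, it can be verified that the parameter took effect: it should appear in the kernel command line, and noop should be shown in brackets as the active scheduler:

$ cat /proc/cmdline
$ cat /sys/block/sdX/queue/scheduler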

systemd-tmpfiles

If you have more than one storage device, or wish to avoid clutter on the kernel cmdline, you can set the I/O scheduler via systemd-tmpfiles:

/etc/tmpfiles.d/10_ioscheduler.conf
w /sys/block/sdX/queue/scheduler - - - - noop
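
The setting can also be applied immediately, without a reboot, since w entries are processed by systemd-tmpfiles --create:

# systemd-tmpfiles --create /etc/tmpfiles.d/10_ioscheduler.conf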

For more detail on systemd-tmpfiles see Systemd#Temporary files.

Using udev for one device or HDD/SSD mixed environment

Though the above will undoubtedly work, it should probably be considered a workaround. It is preferable to have the system that is responsible for the devices in the first place apply the scheduler. In this case that is udev, and all one needs is a simple udev rule.

To do this, create the following:

/etc/udev/rules.d/60-schedulers.rules
# set deadline scheduler for non-rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"

Set deadline/cfq to the desired schedulers as needed. Changes take effect upon the next boot. To check the success of the new rule:

$ cat /sys/block/sdX/queue/scheduler  # where X is the device in question
Note: In the example, 60 is chosen because that is the number udev uses for its own persistent naming rules. Thus, block devices should be modifiable at this point, which makes it a safe position for this particular rule. The rule file can be named anything, so long as it ends in .rules.
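
To apply the rule without rebooting, udev can be told to reload its rules and re-trigger the block devices; verify the result with the command above:

# udevadm control --reload
# udevadm trigger --subsystem-match=block --action=change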

Power management configuration

When dealing with traditional rotational disks (HDDs), you may want to lower or completely disable their power saving features.
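
As a hedged example, hdparm can query and set a drive's Advanced Power Management level; a value of 254 keeps power management enabled at its most performance-oriented level, while 255 disables it entirely (not all drives support this, and the setting may not persist across reboots):

# hdparm -B 254 /dev/sdX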

RAM disks

See tmpfs.

USB storage devices

If USB storage devices such as pendrives are slow when copying files, append the following three lines to a systemd tmpfile:

/etc/tmpfiles.d/local.conf
w /sys/kernel/mm/transparent_hugepage/enabled - - - - madvise
w /sys/kernel/mm/transparent_hugepage/defrag - - - - madvise
w /sys/kernel/mm/transparent_hugepage/khugepaged/defrag - - - - 0
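
Once the tmpfile has been processed (e.g. after a reboot), the active values, shown in brackets, can be inspected:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
$ cat /sys/kernel/mm/transparent_hugepage/defrag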

See also sysctl#Virtual memory, [2] and [3].

CPU

There are a few ways to get more performance:

  • Overclock the CPU. Please note that most CPUs cannot be overclocked (locked by the vendor).
    • Nehalem (Core i#) CPUs already overclock themselves automatically (Turbo Boost technology), which heavily limits the potential for manual overclocking (unless you use liquid cooling).
  • Change the CPU governor

Overclocking

The only way to directly improve CPU speed is overclocking. As it is a complicated and risky task, it is not recommended for anyone except experts. The best way to overclock is through the BIOS. When purchasing your system, keep in mind that most Intel motherboards are notorious for disabling the capability to overclock.

Many Intel i5 and i7 chips, even when overclocked properly through the BIOS or UEFI interface, will not report the correct clock frequency to acpi_cpufreq and most other utilities. This will result in excessive messages in dmesg about delays unless the module acpi_cpufreq is unloaded and blacklisted. The only tool known to correctly read the clock speed of these overclocked chips under Linux is i7z. The i7z and i7z-gitAUR packages are available.

A way to modify performance (ref) is to use Con Kolivas' desktop-centric kernel patchset, which, among other things, replaces the Completely Fair Scheduler (CFS) with the Brain Fuck Scheduler (BFS).

Kernel PKGBUILDs that include the BFS patch can be installed from the AUR or from unofficial user repositories. See the respective pages (linux-ckAUR and the Linux-ck wiki page, linux-pfAUR and the Linux-pf wiki page, or linux-bfsAUR[broken link: archived in aur-mirror]) for more information on their additional patches.

Note: BFS/CK are designed for desktop/laptop use and not servers. They provide low latency and work well for 16 CPUs or less. Also, Con Kolivas suggests setting HZ to 1000. For more information, see the BFS FAQ and Kernel patch homepage of Con Kolivas.

Verynice

VeryNice is a daemon, available in the veryniceAUR package, for dynamically adjusting the nice levels of executables. The nice level represents the priority of the executable when allocating CPU resources. Simply define executables for which responsiveness is important, like X or multimedia applications, as goodexe in /etc/verynice.conf. Similarly, CPU-hungry executables running in the background, like make, can be defined as badexe. This prioritization greatly improves system responsiveness under heavy load.
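
For illustration, a minimal /etc/verynice.conf sketch might look like the following; the listed paths are placeholders, so consult the configuration file shipped with the package for the exact syntax:

/etc/verynice.conf
# keep interactive programs responsive
goodexe /usr/bin/Xorg
# deprioritize CPU-hungry background jobs
badexe /usr/bin/make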

Ananicy

Ananicy is a daemon, available in the ananicy-gitAUR package, for auto adjusting the nice levels of executables. The nice level represents the priority of the executable when allocating CPU resources.

cgroups

See cgroups.

irqbalance

The purpose of irqbalance is to distribute hardware interrupts across processors on a multiprocessor system in order to increase performance. It can be controlled by the provided irqbalance.service.
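
To start the daemon and have it started at every boot:

# systemctl enable --now irqbalance.service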

Graphics

As with CPUs, overclocking can directly improve performance, but is generally recommended against. There are several packages in the AUR, such as rovclockAUR[broken link: archived in aur-mirror], amdoverdrivectrlAUR (ATI), and nvclockAUR (NVIDIA).

Xorg.conf configuration

Graphics performance may depend on the settings in /etc/X11/xorg.conf; see the NVIDIA, ATI and Intel articles. Improper settings may stop Xorg from working, so caution is advised.

Driconf

driconf is a small utility which allows changing direct rendering settings for open source drivers. Enabling HyperZ may improve performance.

RAM and swap

Relocate files to tmpfs

Relocate files, such as your browser profile, to a tmpfs file system for improvements in application response time, as all the files are then stored in RAM.
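
As a sketch, a user's cache directory (the path here is purely illustrative) could be placed on a tmpfs via fstab; note that its contents are then lost on every reboot:

/etc/fstab
tmpfs /home/user/.cache tmpfs rw,noatime,nodev,nosuid,size=1G 0 0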

Root on RAM overlay

If running off a slow writing medium (USB, spinning HDDs) and storage requirements are low, the root may be run on a RAM overlay on top of a read-only root (on disk). This can vastly improve performance at the cost of a limited writable space for root. See liverootAUR.

Zram or zswap

The zram kernel module (previously called compcache) provides a compressed block device in RAM. If you use it as a swap device, the RAM can hold much more information, at the cost of more CPU use. Still, it is much quicker than swapping to a hard drive. If a system often falls back to swap, this could improve responsiveness. Using zram is also a good way to reduce disk read/write cycles due to swap on SSDs.

Similar benefits (at similar costs) can be achieved using zswap rather than zram. The two are generally similar in intent although not operation: zswap operates as a compressed RAM cache and neither requires (nor permits) extensive userspace configuration.

Example: To set up one lz4 compressed zram device with 32GiB capacity and a higher-than-normal priority (only for the current session):

# modprobe zram
# echo lz4 > /sys/block/zram0/comp_algorithm
# echo 32G > /sys/block/zram0/disksize
# mkswap --label zram0 /dev/zram0
# swapon --priority 100 /dev/zram0

To disable it again, either reboot or run

# swapoff /dev/zram0
# rmmod zram

A detailed explanation of all steps, options and potential problems is provided in the official kernel documentation of the module.
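
The state of active zram devices, including the compression ratio achieved, can be inspected with the zramctl utility from util-linux:

$ zramctl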

The systemd-swap package provides a systemd-swap.service unit to automatically initialize zram devices. Configuration is possible in /etc/systemd-swap.conf.

The package zramswapAUR provides an automated script for setting up such swap devices with optimal settings for your system (such as RAM size and CPU core number). The script creates one zram device per CPU core with a total space equivalent to the RAM available, so you will have a compressed swap with higher priority than regular swap, which will utilize multiple CPU cores for compressing data. To do this automatically on every boot, enable zramswap.service.

Swap on zRAM using a udev rule

The example below describes how to set up swap on zRAM automatically at boot with a single udev rule. No extra package should be needed to make this work.

First, enable the module:

/etc/modules-load.d/zram.conf
zram

Configure the number of /dev/zram nodes you need.

/etc/modprobe.d/zram.conf
options zram num_devices=2

Create the udev rule as shown in the example.

/etc/udev/rules.d/99-zram.rules
KERNEL=="zram0", ATTR{disksize}="512M" RUN="/usr/bin/mkswap /dev/zram0", TAG+="systemd"
KERNEL=="zram1", ATTR{disksize}="512M" RUN="/usr/bin/mkswap /dev/zram1", TAG+="systemd"

Add /dev/zram to your fstab.

/etc/fstab
/dev/zram0 none swap defaults 0 0
/dev/zram1 none swap defaults 0 0
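
After the next boot, the devices should show up as active swap, which can be verified with:

$ swapon --show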

Using the graphic card's RAM

In the unlikely case that you have very little RAM and a surplus of video RAM, you can use the latter as swap. See Swap on video ram.

Network

Every time a connection is made, the system must first resolve a fully qualified domain name to an IP address before the actual connection can be established. Response times of network requests can be improved by caching DNS queries locally. Common tools for this purpose include pdnsd, dnsmasq, unbound and rescached-gitAUR.
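
As a minimal sketch using dnsmasq (assuming the package is installed and the service enabled), caching can be configured and the system resolver pointed at it; see the dnsmasq article for a complete setup:

/etc/dnsmasq.conf
listen-address=127.0.0.1
cache-size=1000

/etc/resolv.conf
nameserver 127.0.0.1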

Application-specific tips

Firefox

See Firefox tweaks#Performance and Firefox on RAM.

Firefox in the official repositories is built with the profile-guided optimization flag enabled. You may want to use it in your custom build as well. To do this, append:

ac_add_options --enable-profile-guided-optimization

to your .mozconfig file.