Difference between revisions of "Improving performance"
m (→GPU Overclocking: nvclock is in AUR) |
|||
(21 intermediate revisions by 8 users not shown) | |||
Line 6: | Line 6: | ||
[[ru:Maximizing Performance]] | [[ru:Maximizing Performance]] | ||
[[zh-CN:Maximizing Performance]] | [[zh-CN:Maximizing Performance]] | ||
− | This article | + | {{Article summary start}} |
+ | {{Article summary text|This article covers and links to various topics that might impact system performance. Basic metrics need to be taken, and then steps can be followed to potentially improve upon a system's performance.}} | ||
+ | {{Article summary heading|Related}} | ||
+ | {{Article summary wiki|Benchmarking}} | ||
+ | {{Article summary end}} | ||
+ | This article provides information on basic system diagnostics relating to performance as well as steps that may be taken to reduce resource consumption or to otherwise optimize the system with the end-goal being either perceived or documented improvements to a system's performance. | ||
==The basics== | ==The basics== | ||
Line 21: | Line 26: | ||
* If the only applications lagging are the ones using direct rendering, meaning they use the graphic card, like video players and games, then improving the graphic performance should help. First step would be to verify if direct rendering simply is not enabled. This is indicated by the glxinfo command: | * If the only applications lagging are the ones using direct rendering, meaning they use the graphic card, like video players and games, then improving the graphic performance should help. First step would be to verify if direct rendering simply is not enabled. This is indicated by the glxinfo command: | ||
$ glxinfo | grep direct | $ glxinfo | grep direct | ||
− | glxinfo is part of mesa-demos package. | + | {{ic|glxinfo}} is part of the {{Pkg|mesa-demos}} package. |
===The first thing to do=== | ===The first thing to do=== | ||
Line 27: | Line 32: | ||
* Use a [[Window Manager|window manager]] instead of a [[Desktop Environment]]. Choices include [[dwm]], [[wmii]], [[i3]], [[Awesome]], [[Openbox]], [[Fluxbox]] and [[JWM]]. | * Use a [[Window Manager|window manager]] instead of a [[Desktop Environment]]. Choices include [[dwm]], [[wmii]], [[i3]], [[Awesome]], [[Openbox]], [[Fluxbox]] and [[JWM]]. | ||
* Choose a minimal Desktop Environment over a heavier one like [[GNOME]] or [[KDE]]. Something like [[LXDE]] or [[Xfce]]. | * Choose a minimal Desktop Environment over a heavier one like [[GNOME]] or [[KDE]]. Something like [[LXDE]] or [[Xfce]]. | ||
− | * Using lightweight applications. Search [[ | + | * Use lightweight applications. Search [[List of Applications]] for console applications and the Light and Fast Applications Awards threads in the forum: [https://bbs.archlinux.org/viewtopic.php?id=41168 2007], [https://bbs.archlinux.org/viewtopic.php?id=67951 2008], [https://bbs.archlinux.org/viewtopic.php?id=78490 2009], [https://bbs.archlinux.org/viewtopic.php?id=88515 2010], [https://bbs.archlinux.org/viewtopic.php?id=111878 2011], and [https://bbs.archlinux.org/viewtopic.php?id=138281 2012].
* Remove unnecessary [[daemons]]. | * Remove unnecessary [[daemons]]. | ||
− | ===Compromise=== | + | === Compromise === |
+ | |||
Almost all tuning brings drawbacks. Lighter applications usually come with fewer features and some tweaks may make a system unstable, or simply require time to implement and maintain. This page tries to highlight those drawbacks, but the final judgment rests on the user. | Almost all tuning brings drawbacks. Lighter applications usually come with fewer features and some tweaks may make a system unstable, or simply require time to implement and maintain. This page tries to highlight those drawbacks, but the final judgment rests on the user. | ||
− | ===Benchmarking=== | + | === Benchmarking === |
− | The effects of optimization are often difficult to judge. They can however be measured by [[benchmarking]] tools | + | |
+ | The effects of optimization are often difficult to judge. They can however be measured by [[benchmarking]] tools. | ||
+ | |||
+ | == Storage devices == | ||
+ | |||
+ | === Device layout === | ||
− | |||
− | |||
One of the biggest performance gains comes from having multiple storage devices in a layout that spreads the operating system work around. Having {{ic|/}} {{ic|/home}} {{ic|/var}} and {{ic|/usr}} on separate disks is dramatically faster than a single disk layout where they are all on the same hard drive. | One of the biggest performance gains comes from having multiple storage devices in a layout that spreads the operating system work around. Having {{ic|/}} {{ic|/home}} {{ic|/var}} and {{ic|/usr}} on separate disks is dramatically faster than a single disk layout where they are all on the same hard drive. | ||
− | ====Swap | + | ==== Swap files ==== |
+ | |||
Creating your swap files on a separate disk can also help quite a bit, especially if your machine swaps frequently. It happens if you do not have enough RAM for your environment. Using KDE with all the features and applications that come along may require several GiB of memory, whereas a tiny window manager with console applications will perfectly fit in less than 512 MiB of memory. | Creating your swap files on a separate disk can also help quite a bit, especially if your machine swaps frequently. It happens if you do not have enough RAM for your environment. Using KDE with all the features and applications that come along may require several GiB of memory, whereas a tiny window manager with console applications will perfectly fit in less than 512 MiB of memory. | ||
− | ====RAID | + | ==== RAID benefits ==== |
+ | |||
If you have multiple disks (2 or more) available, you can set them up as a software [[RAID]] for serious speed improvements. In a RAID 0 array there is no redundancy in case of drive failure, but for each additional disk you add to the array, the speed of the disk becomes that much faster. The smart choice is to use RAID 5 which offers both speed and data protection. | If you have multiple disks (2 or more) available, you can set them up as a software [[RAID]] for serious speed improvements. In a RAID 0 array there is no redundancy in case of drive failure, but for each additional disk you add to the array, the speed of the disk becomes that much faster. The smart choice is to use RAID 5 which offers both speed and data protection. | ||
− | ====Multiple | + | ==== Multiple hardware paths ==== |
+ | |||
An internal hardware path is how the storage device is connected to your motherboard. There are different ways to connect to the motherboard such as TCP/IP through a NIC, plugged in directly using PCIe/PCI, Firewire, Raid Card, USB, etc. By spreading your storage devices across these multiple connection points you maximize the capabilities of your motherboard, for example 6 hard-drives connected via USB would be much much slower than 3 over USB and 3 over Firewire. The reason is that each entry path into the motherboard is like a pipe, and there is a set limit to how much can go through that pipe at any one time. The good news is that the motherboard usually has several pipes. | An internal hardware path is how the storage device is connected to your motherboard. There are different ways to connect to the motherboard such as TCP/IP through a NIC, plugged in directly using PCIe/PCI, Firewire, Raid Card, USB, etc. By spreading your storage devices across these multiple connection points you maximize the capabilities of your motherboard, for example 6 hard-drives connected via USB would be much much slower than 3 over USB and 3 over Firewire. The reason is that each entry path into the motherboard is like a pipe, and there is a set limit to how much can go through that pipe at any one time. The good news is that the motherboard usually has several pipes. | ||
Line 60: | Line 72: | ||
{{hc|PCI Device Tree|$ lspci -tv}} | {{hc|PCI Device Tree|$ lspci -tv}} | ||
− | ===Partitioning=== | + | === Partitioning === |
− | The partition layout can influence the system's performance. Sectors at the beginning of the drive (closer to the center of the disk) are faster than those at the end. Also, a smaller partition requires less movements from the drive's head, and so speed up disk operations. Therefore, it is advised to create a small partition ( | + | |
+ | The partition layout can influence the system's performance. On a hard disk drive, sectors at the beginning of the drive (closer to the outside of the disk) are faster than those at the end. Also, a smaller partition requires fewer movements from the drive's head, and so speeds up disk operations. Therefore, it is advised to create a small partition (10GB, more or less depending on your needs) only for your system, as near to the beginning of the drive as possible. Other data (pictures, videos) should be kept on a separate partition, and this is usually achieved by separating the home directory ({{ic|/home/''user''}}) from the system ({{ic|/}}). | ||
+ | |||
+ | === Choosing and tuning your filesystem === | ||
− | + | Choosing the best filesystem for a specific system is very important because each has its own strengths. The [[File Systems]] article provides a short summary of the most popular ones. You can also find relevant articles [[:Category:File systems|here]]. | |
− | Choosing the best filesystem for a specific system is very important because each has its own strengths. The [[ | ||
− | ==== | + | ==== Mount options ==== |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
Mount options offer an easy way to improve speed without reformatting. They can be set using the mount command: | Mount options offer an easy way to improve speed without reformatting. They can be set using the mount command: | ||
$ mount -o option1,option2 /dev/partition /mnt/partition | $ mount -o option1,option2 /dev/partition /mnt/partition | ||
Line 80: | Line 88: | ||
The mount options {{Ic|noatime,nodiratime}} are known for improving performance on almost all file-systems. The former is a superset of the latter (which applies to directories only -- {{Ic|noatime}} applies to both files and directories). In rare cases, for example if you use mutt, it can cause minor problems. You can instead use the {{Ic|relatime}} option (note that {{Ic|relatime}} has been the default since kernel 2.6.30). | The mount options {{Ic|noatime,nodiratime}} are known for improving performance on almost all file-systems. The former is a superset of the latter (which applies to directories only -- {{Ic|noatime}} applies to both files and directories). In rare cases, for example if you use mutt, it can cause minor problems. You can instead use the {{Ic|relatime}} option (note that {{Ic|relatime}} has been the default since kernel 2.6.30). | ||
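To see whether a file system is already mounted with one of these options, the mount table can be inspected. The following is only a minimal sketch: the helper name {{ic|has_mount_option}} is made up for illustration, and it assumes the usual {{ic|/proc/mounts}} layout (device, mount point, type, comma-separated options).

```shell
# Hypothetical helper: succeed if a mount point carries a given mount option.
# $1: mount point, $2: option, $3: mounts-style table (defaults to /proc/mounts)
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp { opts = $4 }                 # the last matching entry wins
        END {
            n = split(opts, a, ",")
            for (i = 1; i <= n; i++) if (a[i] == opt) found = 1
            exit found ? 0 : 1
        }
    ' "${3:-/proc/mounts}"
}
```

For example, {{ic|has_mount_option / noatime && echo "noatime is set on /"}}. Options are compared as whole words, so {{ic|noatime}} does not match {{ic|nodiratime}}.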
− | ====Ext3==== | + | ==== Ext3 ==== |
+ | |||
See [[Ext3]]. | See [[Ext3]]. | ||
− | ====Ext4==== | + | ==== Ext4 ==== |
+ | |||
See [[Ext4#Tips_and_tricks | Ext4]]. | See [[Ext4#Tips_and_tricks | Ext4]]. | ||
− | ====JFS==== | + | ==== JFS ==== |
− | See [[JFS Filesystem#Optimizations| JFS Filesystem]]. | + | |
+ | See [[JFS Filesystem#Optimizations|JFS Filesystem]]. | ||
+ | |||
+ | ==== XFS ==== | ||
− | + | {{Merge|XFS}} | |
For optimal speed, just create an XFS file-system with: | For optimal speed, just create an XFS file-system with: | ||
$ mkfs.xfs /dev/thetargetpartition | $ mkfs.xfs /dev/thetargetpartition | ||
Line 95: | Line 108: | ||
==== Reiserfs ==== | ==== Reiserfs ==== | ||
+ | |||
+ | {{Merge|Reiser4}} | ||
The {{Ic|<nowiki>data=writeback</nowiki>}} mount option improves speed, but may corrupt data during power loss. The {{Ic|notail}} mount option increases the space used by the filesystem by about 5%, but also improves overall speed. You can also reduce disk load by putting the journal and data on separate drives. This is done when creating the filesystem: | The {{Ic|<nowiki>data=writeback</nowiki>}} mount option improves speed, but may corrupt data during power loss. The {{Ic|notail}} mount option increases the space used by the filesystem by about 5%, but also improves overall speed. You can also reduce disk load by putting the journal and data on separate drives. This is done when creating the filesystem: | ||
Line 102: | Line 117: | ||
Replace /dev/hda1 with the partition reserved for the journal, and /dev/hdb1 with the partition for data. You can learn more about reiserfs with this [http://www.funtoo.org/en/articles/linux/ffg/2/ article]. | Replace /dev/hda1 with the partition reserved for the journal, and /dev/hdb1 with the partition for data. You can learn more about reiserfs with this [http://www.funtoo.org/en/articles/linux/ffg/2/ article]. | ||
− | ====Btrfs==== | + | ==== Btrfs ==== |
+ | |||
See [[Btrfs#Defragmentation|defragmentation]] and [[Btrfs#Compression|compression]]. | See [[Btrfs#Defragmentation|defragmentation]] and [[Btrfs#Compression|compression]]. | ||
− | ===Compressing /usr=== | + | === Tuning kernel parameters === |
+ | |||
+ | {{Merge|sysctl|It already contains a section on TCP/IP stack hardening, why should it not contain virtual memory settings? Similar information would be kept in one place.}} | ||
+ | |||
+ | There are several key tunables governing filesystems that users should consider adding to [[sysctl|/etc/sysctl.conf]] which is auto-loaded at boot by [[systemd]]: | ||
+ | |||
+ | # Contains, as a percentage of total system memory, the number of pages at which | ||
+ | # a process which is generating disk writes will start writing out dirty data. | ||
+ | vm.dirty_ratio = 3 | ||
+ | |||
+ | # Contains, as a percentage of total system memory, the number of pages at which | ||
+ | # the background kernel flusher threads will start writing out dirty data. | ||
+ | vm.dirty_background_ratio = 2 | ||
+ | |||
+ | As noted in the comments, one needs to consider the total amount of RAM when setting these values. | ||
+ | |||
+ | *'''vm.dirty_ratio''' defaults to 10 (percent of RAM). Consensus is that 10% of RAM is a sane value on spinning disks when RAM is, say, half a GB (so 10% is ~50 MB), but it can be MUCH worse when RAM is larger, say 16 GB (10% is ~1.6 GB), as that is several seconds of writeback on spinning disks. A saner value in this case is 3 (16 GB * 0.03 ~ 491 MB). | ||
+ | |||
+ | *'''vm.dirty_background_ratio''' defaults to 5 (percent of RAM). As above, this may be just fine for small amounts of memory, but consider the amount of RAM on a particular system and adjust accordingly. | ||
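The arithmetic behind these figures can be reproduced with a few lines of shell. This is a rough sketch only: the function name {{ic|dirty_limit_mib}} is made up for illustration, and it treats the ratio as a plain percentage of total RAM, as the text above does.

```shell
# Hypothetical helper: dirty-page threshold in MiB for a RAM size and ratio.
# $1: RAM in MiB, $2: ratio in percent (integer arithmetic, rounds down)
dirty_limit_mib() {
    echo $(( $1 * $2 / 100 ))
}
```

With 16 GiB of RAM (16384 MiB), {{ic|dirty_limit_mib 16384 10}} gives 1638 and {{ic|dirty_limit_mib 16384 3}} gives 491, matching the ~1.6 GB and ~491 MB figures quoted above.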
+ | |||
+ | === Compressing /usr === | ||
+ | |||
{{Note|As of version 3.0 of the Linux kernel, aufs2 is no longer supported.}} | {{Note|As of version 3.0 of the Linux kernel, aufs2 is no longer supported.}} | ||
{{out of date|aufs is no longer in the official repos. Also, read the Note box above.}} | {{out of date|aufs is no longer in the official repos. Also, read the Note box above.}} | ||
Line 123: | Line 160: | ||
A [https://bbs.archlinux.org/viewtopic.php?pid=714052 Bash script] has been created that will automate the process of re-compressing (read updating) the archive since the tutorial is meant for Gentoo and some options do not correlate to what they should be in Arch. | A [https://bbs.archlinux.org/viewtopic.php?pid=714052 Bash script] has been created that will automate the process of re-compressing (read updating) the archive since the tutorial is meant for Gentoo and some options do not correlate to what they should be in Arch. | ||
− | ===Tuning for an SSD=== | + | === Tuning for an SSD === |
+ | |||
[[SSD#Tips_for_Maximizing_SSD_Performance]] | [[SSD#Tips_for_Maximizing_SSD_Performance]] | ||
− | ===RAM disks / tuning for really slow disks=== | + | === RAM disks / tuning for really slow disks === |
+ | |||
* [http://cs.joensuu.fi/~mmeri/usbraid/ USB stick RAID] | * [http://cs.joensuu.fi/~mmeri/usbraid/ USB stick RAID] | ||
* [https://bbs.archlinux.org/viewtopic.php?pid=493773#p493773 Combine RAM disk with disk in RAID] | * [https://bbs.archlinux.org/viewtopic.php?pid=493773#p493773 Combine RAM disk with disk in RAID] | ||
− | ==CPU== | + | == CPU == |
− | |||
− | Many Intel i5 and i7 chips, even when overclocked properly through the BIOS or UEFI interface, will not report the correct clock frequency to acpi_cpufreq and most other utilities. This will result in excessive messages in dmesg about delays unless the module acpi_cpufreq is unloaded and blacklisted. The only tool known to correctly read the clock speed of these overclocked chips under Linux is i7z. The i7z package is available in the community repo and i7z- | + | The only way to directly improve CPU speed is overclocking. As it is a complicated and risky task, it is not recommended for anyone except experts. The best way to overclock is through the BIOS. When purchasing your system, keep in mind that most Intel motherboards are notorious for disabling the capability to overclock. |
+ | |||
+ | Many Intel i5 and i7 chips, even when overclocked properly through the BIOS or UEFI interface, will not report the correct clock frequency to acpi_cpufreq and most other utilities. This will result in excessive messages in dmesg about delays unless the module acpi_cpufreq is unloaded and blacklisted. The only tool known to correctly read the clock speed of these overclocked chips under Linux is i7z. The {{Pkg|i7z}} package is available in the community repo and {{AUR|i7z-git}} is available in the [[AUR]]. | ||
A way to modify performance ([http://lkml.org/lkml/2009/9/6/136 ref]) is to use Con Kolivas' desktop-centric kernel patchset, which, among other things, replaces the Completely Fair Scheduler (CFS) with the Brain Fuck Scheduler (BFS). | A way to modify performance ([http://lkml.org/lkml/2009/9/6/136 ref]) is to use Con Kolivas' desktop-centric kernel patchset, which, among other things, replaces the Completely Fair Scheduler (CFS) with the Brain Fuck Scheduler (BFS). | ||
Line 141: | Line 181: | ||
{{Note|BFS/CK are designed for desktop/laptop use and not servers. They provide low latency and work well for 16 CPUs or less. Also, Con Kolivas suggests setting HZ to 1000. For more information, see the [http://ck.kolivas.org/patches/bfs/bfs-faq.txt BFS FAQ] and [http://users.on.net/~ckolivas/kernel/ Kernel patch homepage of Con Kolivas].}} | {{Note|BFS/CK are designed for desktop/laptop use and not servers. They provide low latency and work well for 16 CPUs or less. Also, Con Kolivas suggests setting HZ to 1000. For more information, see the [http://ck.kolivas.org/patches/bfs/bfs-faq.txt BFS FAQ] and [http://users.on.net/~ckolivas/kernel/ Kernel patch homepage of Con Kolivas].}} | ||
− | ===Verynice=== | + | === Verynice === |
+ | |||
[[Verynice]] is a daemon, available in the [[AUR]] as {{AUR|verynice}}, for dynamically adjusting the nice levels of executables. The nice level represents the priority of the executable when allocating CPU resources. Simply define executables for which responsiveness is important, like X or multimedia applications, as ''goodexe'' in {{ic|/etc/verynice.conf}}. Similarly, CPU-hungry executables running in the background, like make, can be defined as ''badexe''. This prioritization greatly improves system responsiveness under heavy load. | [[Verynice]] is a daemon, available in the [[AUR]] as {{AUR|verynice}}, for dynamically adjusting the nice levels of executables. The nice level represents the priority of the executable when allocating CPU resources. Simply define executables for which responsiveness is important, like X or multimedia applications, as ''goodexe'' in {{ic|/etc/verynice.conf}}. Similarly, CPU-hungry executables running in the background, like make, can be defined as ''badexe''. This prioritization greatly improves system responsiveness under heavy load. | ||
− | ===Ulatencyd=== | + | === Ulatencyd === |
+ | |||
[[Ulatencyd]] is a daemon that controls how the Linux kernel will spend its resources on the running processes. It uses dynamic cgroups to give the kernel hints and limitations on processes. It supports prioritizing processes for disk I/O as well as CPU shares, and uses more clever heuristics than Verynice. In addition, it comes with a good set of configs out of the box. | [[Ulatencyd]] is a daemon that controls how the Linux kernel will spend its resources on the running processes. It uses dynamic cgroups to give the kernel hints and limitations on processes. It supports prioritizing processes for disk I/O as well as CPU shares, and uses more clever heuristics than Verynice. In addition, it comes with a good set of configs out of the box. | ||
One word of warning: by default it changes the scheduler of all block devices to cfq. To disable this behavior, see [[Ulatencyd]]. | One word of warning: by default it changes the scheduler of all block devices to cfq. To disable this behavior, see [[Ulatencyd]]. | ||
− | ==Graphics== | + | == Graphics == |
+ | |||
+ | === Xorg.conf configuration === | ||
− | |||
Graphic performance heavily depends on the settings in {{ic|/etc/X11/xorg.conf}}. There are tutorials for [[Nvidia]], [[ATI]] and [[Intel]] cards. Improper settings may stop Xorg from working, so caution is advised. | Graphic performance heavily depends on the settings in {{ic|/etc/X11/xorg.conf}}. There are tutorials for [[Nvidia]], [[ATI]] and [[Intel]] cards. Improper settings may stop Xorg from working, so caution is advised. | ||
− | ===Driconf=== | + | === Driconf === |
+ | |||
Driconf is a small utility that can be found in the [[official repositories]] that allows you to change the direct rendering settings for open source drivers. Enabling HyperZ can drastically improve performance. | Driconf is a small utility that can be found in the [[official repositories]] that allows you to change the direct rendering settings for open source drivers. Enabling HyperZ can drastically improve performance. | ||
− | ===GPU | + | === GPU overclocking === |
− | Overclocking a graphics card is typically more expedient than with a CPU, since there are readily accessible software packages which allow for on-the-fly GPU clock adjustments. For ATI users, get {{AUR|rovclock}} or {{AUR|amdoverdrivectrl}}, and | + | |
+ | Overclocking a graphics card is typically more expedient than with a CPU, since there are readily accessible software packages which allow for on-the-fly GPU clock adjustments. ATI users can get {{AUR|rovclock}} or {{AUR|amdoverdrivectrl}}, and NVIDIA users can get {{AUR|nvclock}} from the AUR. Intel chipset users can install [http://www.gmabooster.com/ GMABooster] with the {{AUR|gmabooster}} AUR package. | ||
The changes can be made permanent by running the appropriate command after X boots, for example by adding it to {{ic|~/.xinitrc}}. A safer approach would be to only apply the overclocked settings when needed. | The changes can be made permanent by running the appropriate command after X boots, for example by adding it to {{ic|~/.xinitrc}}. A safer approach would be to only apply the overclocked settings when needed. | ||
− | ==RAM and swap== | + | == RAM and swap == |
+ | |||
=== Relocate files to tmpfs === | === Relocate files to tmpfs === | ||
+ | |||
Relocate files, such as your browser profile, to a [[Wikipedia:tmpfs|tmpfs]] file system, including {{ic|/tmp}}, or {{ic|/dev/shm}} for improvements in application response as all the files are now stored in RAM. | Relocate files, such as your browser profile, to a [[Wikipedia:tmpfs|tmpfs]] file system, including {{ic|/tmp}}, or {{ic|/dev/shm}} for improvements in application response as all the files are now stored in RAM. | ||
Line 174: | Line 221: | ||
=== Swappiness === | === Swappiness === | ||
− | + | See [[Swap#Swappiness]]. | |
− | |||
− | |||
− | |||
− | + | === Compcache/Zram === | |
− | |||
[https://code.google.com/p/compcache/ Compcache], nowadays replaced by the '''zram''' kernel module, creates a device in RAM and compresses it. If you use it for swap, part of the RAM can hold much more information, at the cost of more CPU usage. Still, it is much quicker than swapping to a hard drive. If a system often falls back to swap, this could improve responsiveness. Zram is in mainline staging (therefore it is not considered stable yet, use with caution). | [https://code.google.com/p/compcache/ Compcache], nowadays replaced by the '''zram''' kernel module, creates a device in RAM and compresses it. If you use it for swap, part of the RAM can hold much more information, at the cost of more CPU usage. Still, it is much quicker than swapping to a hard drive. If a system often falls back to swap, this could improve responsiveness. Zram is in mainline staging (therefore it is not considered stable yet, use with caution). | ||
− | The AUR package {{ | + | The AUR package {{AUR|zramswap}} provides an automated script for setting up such swap devices with optimal settings for your system (such as RAM size and CPU core number). The script creates one zram device per CPU core with a total space equivalent to the RAM available. To do this automatically on every boot, enable {{ic|zramswap.service}} via [[systemd#Basic systemctl usage|systemctl]]. |
− | |||
− | |||
− | |||
You will have a compressed swap with higher priority than your regular swap which will utilize multiple CPU cores for compressing data. | You will have a compressed swap with higher priority than your regular swap which will utilize multiple CPU cores for compressing data. | ||
{{Tip|Using zram is also a good way to reduce disk read/write cycles due to swap on SSDs.}} | {{Tip|Using zram is also a good way to reduce disk read/write cycles due to swap on SSDs.}} | ||
− | |||
− | ===Using the graphic card's RAM=== | + | === Using the graphic card's RAM === |
+ | |||
In the unlikely case that you have very little RAM and a surplus of video RAM, you can use the latter as swap. See [[Swap on video ram]]. | In the unlikely case that you have very little RAM and a surplus of video RAM, you can use the latter as swap. See [[Swap on video ram]]. | ||
=== Preloading === | === Preloading === | ||
− | Preloading is the action of putting and keeping target files into the RAM. The benefit is that preloaded applications start more quickly because reading from the RAM is always quicker than from the hard drive. However, part of your RAM will be dedicated to this task, but no more than if you kept the application open. Therefore preloading is best used with large and often-used applications like Firefox and | + | |
+ | Preloading is the action of putting and keeping target files into the RAM. The benefit is that preloaded applications start more quickly because reading from the RAM is always quicker than from the hard drive. However, part of your RAM will be dedicated to this task, but no more than if you kept the application open. Therefore preloading is best used with large and often-used applications like Firefox and LibreOffice. | ||
+ | |||
==== Go-preload ==== | ==== Go-preload ==== | ||
− | + | ||
+ | {{Out of date|mentions {{ic|rc.conf}}, which is deprecated}} | ||
+ | {{AUR|gopreload}} is a small daemon created in the [http://forums.gentoo.org/viewtopic-t-789818-view-next.html?sid=5457cff93039fc7d4a3e445ef90f9821 Gentoo forum]. To use it, first run this command in a terminal for each program you want to preload at boot: | ||
# gopreload-prepare program | # gopreload-prepare program | ||
Then, as instructed, press Enter when the program is fully loaded. This will add a list of files needed by the program in {{ic|/usr/share/gopreload/enabled}}. To load all lists at boot, add {{ic|gopreload}} to your DAEMONS array in {{ic|/etc/rc.conf}}. To disable the loading of a program, remove the appropriate list in {{ic|/usr/share/gopreload/enabled}} or move it to {{ic|/usr/share/gopreload/disabled}}. | Then, as instructed, press Enter when the program is fully loaded. This will add a list of files needed by the program in {{ic|/usr/share/gopreload/enabled}}. To load all lists at boot, add {{ic|gopreload}} to your DAEMONS array in {{ic|/etc/rc.conf}}. To disable the loading of a program, remove the appropriate list in {{ic|/usr/share/gopreload/enabled}} or move it to {{ic|/usr/share/gopreload/disabled}}. | ||
− | ====Preload==== | + | ==== Preload ==== |
+ | |||
A more automated approach is used by [[Preload]]. All you have to do is enable it with this command: | A more automated approach is used by [[Preload]]. All you have to do is enable it with this command: | ||
# systemctl enable preload | # systemctl enable preload | ||
It will monitor the most used files on your system, and with time build its own list of files to preload at boot. | It will monitor the most used files on your system, and with time build its own list of files to preload at boot. | ||
− | ==== | + | == Boot time == |
− | |||
− | |||
You can find tutorials with good tips in the article [[Improve Boot Performance]]. | You can find tutorials with good tips in the article [[Improve Boot Performance]]. | ||
− | ===Suspend to RAM=== | + | === Suspend to RAM === |
− | The best way to reduce boot time is not booting at all. Consider [[Suspend to RAM|suspending your system to RAM]] instead. | + | |
+ | The best way to reduce boot time is not booting at all. Consider [[Suspend and Hibernate#Suspend to RAM|suspending your system to RAM]] instead. | ||
+ | |||
+ | == Application-specific tips == | ||
+ | |||
+ | === Firefox === | ||
− | + | See [[Firefox Tweaks#Performance]] and [[Firefox Ramdisk]]. | |
− | |||
− | |||
Firefox in the official repositories is built with the profile guided optimization flag enabled. You may want to use it in your custom build. | Firefox in the official repositories is built with the profile guided optimization flag enabled. You may want to use it in your custom build. | ||
Line 227: | Line 273: | ||
to your mozconfig. | to your mozconfig. | ||
− | ===Gcc/Makepkg=== | + | === Gcc/Makepkg === |
+ | |||
See [[Ccache]]. | See [[Ccache]]. | ||
− | === | + | === Office suites === |
− | See [[LibreOffice#Speed up LibreOffice|Speed up LibreOffice]]. | + | |
+ | See [[LibreOffice#Speed up LibreOffice|Speed up LibreOffice]] and [[Openoffice#Speed up OpenOffice|Speed up OpenOffice]]. | ||
+ | |||
+ | === Pacman === | ||
− | |||
See [[Improve Pacman Performance]]. | See [[Improve Pacman Performance]]. | ||
− | ===SSH=== | + | === SSH === |
+ | |||
See [[SSH#Speeding up SSH|Speed up SSH]]. | See [[SSH#Speeding up SSH|Speed up SSH]]. | ||
− | ==Laptops== | + | == Laptops == |
+ | |||
See [[Laptop]]. | See [[Laptop]]. |
Revision as of 17:54, 17 September 2013
This article provides information on basic system diagnostics relating to performance as well as steps that may be taken to reduce resource consumption or to otherwise optimize the system with the end-goal being either perceived or documented improvements to a system's performance.
Contents
- 1 The basics
- 2 Storage devices
- 3 CPU
- 4 Graphics
- 5 RAM and swap
- 6 Boot time
- 7 Application-specific tips
- 8 Laptops
The basics
Know your system
The best way to tune a system is to target the bottlenecks, that is the subsystems that limit the overall speed. They usually can be identified by knowing the specifications of the system, but there are some basic indications:
- If the computer becomes slow when big applications, like OpenOffice.org and Firefox, are running at the same time, then there is a good chance the amount of RAM is insufficient. To verify available RAM, use this command, and check for the line beginning with -/+buffers:
$ free -m
- If boot time is really slow, and if applications take a lot of time to load the first time they are launched, but run fine afterwards, then the hard drive is probably too slow. The speed of a hard drive can be measured using the hdparm command:
$ hdparm -t /dev/sdx
This is only the pure read speed of the hard drive, and is not a valid benchmark, but a value superior to 40MB/s (assuming drive tested while idle) can be considered decent on an average system. hdparm can be found in the Official Repositories.
- If the CPU load is consistently high even when RAM is available, then lowering CPU usage should be a priority. CPU load can be monitored in many ways, like using the top command:
$ top
- If the only applications lagging are the ones using direct rendering, meaning they use the graphic card, like video players and games, then improving the graphic performance should help. First step would be to verify if direct rendering simply is not enabled. This is indicated by the glxinfo command:
$ glxinfo | grep direct
glxinfo is part of the mesa-demos package.
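The RAM check from the first item can be scripted. The following is a minimal sketch, not part of any package: the function name available_mb is hypothetical, and it assumes input in the /proc/meminfo format. It adds MemFree, Buffers and Cached, which is roughly what the -/+ buffers line of free reports as free memory.

```shell
#!/bin/sh
# available_mb: hypothetical helper for illustration only.
# Reads /proc/meminfo-style text on stdin and prints
# MemFree + Buffers + Cached in MiB (integer, rounded down).
available_mb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
         END { print int(kb / 1024) }'
}

# Usage on a live system:
#   available_mb < /proc/meminfo
```

If the printed value stays close to zero while the usual applications are open, RAM is the likely bottleneck.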
The first thing to do
The simplest and most efficient way of improving overall performance is to run lightweight environments and applications.
- Use a window manager instead of a Desktop Environment. Choices include dwm, wmii, i3, Awesome, Openbox, Fluxbox and JWM.
- Choose a minimal Desktop Environment, such as LXDE or Xfce, over a heavier one like GNOME or KDE.
- Use lightweight applications. Search List of Applications for console applications and the Light and Fast Applications Awards threads in the forum: 2007, 2008, 2009, 2010, 2011, and 2012.
- Remove unnecessary daemons.
Compromise
Almost all tuning brings drawbacks. Lighter applications usually come with fewer features, and some tweaks may make a system unstable or simply require time to implement and maintain. This page tries to highlight those drawbacks, but the final judgment rests with the user.
Benchmarking
The effects of optimization are often difficult to judge. They can however be measured by benchmarking tools.
Storage devices
Device layout
One of the biggest performance gains comes from having multiple storage devices in a layout that spreads the operating system work around. Having /, /home, /var and /usr on separate disks is dramatically faster than a single disk layout where they are all on the same hard drive.
Swap files
Creating your swap files on a separate disk can also help quite a bit, especially if your machine swaps frequently. Swapping happens when you do not have enough RAM for your environment. Using KDE with all the features and applications that come along may require several GiB of memory, whereas a tiny window manager with console applications will fit comfortably in less than 512 MiB of memory.
RAID benefits
If you have multiple disks (2 or more) available, you can set them up as a software RAID for serious speed improvements. In a RAID 0 array there is no redundancy in case of drive failure, but each additional disk added to the array raises throughput, because reads and writes are striped across all members. RAID 5, which needs at least three disks, is usually the smarter choice because it offers both speed and data protection.
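The trade-off between the two levels can be made concrete with a little arithmetic. The sketch below uses a hypothetical helper, raid_usable_gb, and assumes equally sized disks: RAID 0 exposes the capacity of every disk, while RAID 5 gives up one disk's worth of space to parity.

```shell
#!/bin/sh
# raid_usable_gb LEVEL DISKS SIZE_GB: hypothetical helper for illustration.
# Prints the usable capacity in GB of an array of equally sized disks.
raid_usable_gb() {
    level=$1 disks=$2 size_gb=$3
    case "$level" in
        0)  # striping only, no redundancy
            echo $(( disks * size_gb )) ;;
        5)  # striping plus one disk's worth of parity; needs 3+ disks
            if [ "$disks" -lt 3 ]; then
                echo "RAID 5 needs at least 3 disks" >&2
                return 1
            fi
            echo $(( (disks - 1) * size_gb )) ;;
        *)
            echo "unsupported level: $level" >&2
            return 1 ;;
    esac
}

# Usage:
#   raid_usable_gb 0 3 500
#   raid_usable_gb 5 3 500
```

With three 500 GB disks this gives 1500 GB for RAID 0 and 1000 GB for RAID 5.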
Multiple hardware paths
An internal hardware path is how the storage device is connected to your motherboard. There are different ways to connect to the motherboard, such as TCP/IP through a NIC, directly through PCIe/PCI, FireWire, a RAID card, USB, etc. By spreading your storage devices across these multiple connection points you maximize the capabilities of your motherboard. For example, 6 hard drives connected via USB would be much slower than 3 over USB and 3 over FireWire. The reason is that each entry path into the motherboard is like a pipe, and there is a set limit to how much can go through that pipe at any one time. The good news is that the motherboard usually has several pipes.
More Examples
- Directly to the motherboard using PCI/PCIe/ATA
- Using an external enclosure to house the disk over USB/FireWire
- Turn the device into a network storage device by connecting over TCP/IP
Note also that if you have 2 USB ports on the front of your machine, 4 USB ports on the back, and 4 disks, it would probably be fastest to put 2 on the front and 2 on the back, or 3 on the back and 1 on the front. This is because internally the front ports are likely on a separate root hub from the back ports, meaning you can send twice as much data by using both rather than just one. Use the following commands to determine the various paths on your machine.
USB Device Tree
$ lsusb -tv
PCI Device Tree
$ lspci -tv
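To see how many independent USB "pipes" a machine has, the root hubs in the USB tree can be counted. This is a small sketch with a hypothetical helper name, count_root_hubs. It assumes the usual lsusb -t output, in which every root hub line contains Class=root_hub. Which physical ports belong to which hub is board-specific and still has to be found by plugging a device in and re-reading the tree.

```shell
#!/bin/sh
# count_root_hubs: hypothetical helper for illustration.
# Reads `lsusb -t` output on stdin and prints the number of root hubs.
count_root_hubs() {
    grep -c 'Class=root_hub'
}

# Usage on a live system:
#   lsusb -t | count_root_hubs
```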
Partitioning
The partition layout can influence the system's performance. Sectors at the beginning of the drive (closer to the outer edge of the platter) are faster than those at the end. Also, a smaller partition requires fewer movements from the drive's head, and so speeds up disk operations. Therefore, it is advised to create a small partition (10GB, more or less depending on your needs) only for your system, as near to the beginning of the drive as possible. Other data (pictures, videos) should be kept on a separate partition, and this is usually achieved by separating the home directory (/home/user) from the system (/).
Choosing and tuning your filesystem
Choosing the best filesystem for a specific system is very important because each has its own strengths. The File Systems article provides a short summary of the most popular ones. You can also find relevant articles here.
Mount options
Mount options offer an easy way to improve speed without reformatting. They can be set using the mount command:
$ mount -o option1,option2 /dev/partition /mnt/partition
To set them permanently, you can modify /etc/fstab to make the relevant line look like this:
/dev/partition /mnt/partition partitiontype option1,option2 0 0
The mount options noatime,nodiratime are known for improving performance on almost all file-systems. The former is a superset of the latter: nodiratime applies to directories only, whereas noatime applies to both files and directories. In rare cases, for example if you use mutt, noatime can cause minor problems. You can instead use the relatime option (note that relatime is the default since kernel 2.6.30).
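Before changing /etc/fstab it is worth checking which access-time policy a file system is currently mounted with. The sketch below is an illustration only: atime_mode is a hypothetical helper that classifies a comma-separated option string such as the one printed by findmnt -no OPTIONS.

```shell
#!/bin/sh
# atime_mode OPTIONS: hypothetical helper for illustration.
# Classifies a comma-separated mount option string by its atime policy.
atime_mode() {
    case ",$1," in
        *,noatime,*)  echo noatime ;;
        *,relatime,*) echo relatime ;;
        *)            echo other ;;
    esac
}

# Usage on a live system (the mount point is an example):
#   atime_mode "$(findmnt -no OPTIONS /home)"
```

A result of relatime means the kernel default is in effect, and noatime would remove the remaining access-time writes.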
Ext3
See Ext3.
Ext4
See Ext4.
JFS
See JFS Filesystem.
XFS
For optimal speed, just create an XFS file-system with:
$ mkfs.xfs /dev/thetargetpartition
That is all it takes, since all of the "boost knobs" are already "on" by default.
Reiserfs
The data=writeback mount option improves speed, but may corrupt data during power loss. The notail mount option increases the space used by the filesystem by about 5%, but also improves overall speed. You can also reduce disk load by putting the journal and data on separate drives. This is done when creating the filesystem:
$ mkreiserfs -j /dev/hda1 /dev/hdb1
Replace /dev/hda1 with the partition reserved for the journal, and /dev/hdb1 with the partition for data. You can learn more about reiserfs with this article.
Btrfs
See defragmentation and compression.
Tuning kernel parameters
There are several key tunables governing filesystems that users should consider adding to /etc/sysctl.conf which is auto-loaded at boot by systemd:
# Contains, as a percentage of total system memory, the number of pages at which
# a process which is generating disk writes will start writing out dirty data.
vm.dirty_ratio = 3

# Contains, as a percentage of total system memory, the number of pages at which
# the background kernel flusher threads will start writing out dirty data.
vm.dirty_background_ratio = 2
As noted in the comments, one needs to consider the total amount of RAM when setting these values.
- vm.dirty_ratio defaults to 10 (percent of RAM). The consensus is that 10% is a sane value on spinning disks when RAM is small, say half a GB (10% is ~50 MB), but it can be much worse when RAM is larger, say 16 GB (10% is ~1.6 GB), as that is several seconds of writeback on spinning disks. A saner value in this case is 3 (3% of 16 GB is ~491 MB).
- vm.dirty_background_ratio defaults to 5 (percent of RAM). Similarly, this may be just fine for small amounts of memory, but again, consider the amount of RAM on a particular system and adjust accordingly.
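The arithmetic behind these figures is easy to reproduce for any machine. The following sketch uses a hypothetical helper, dirty_limit_mb, and makes the same simplifying assumption as the text above, namely that the percentage applies to total RAM. The kernel actually applies it to a somewhat smaller amount of dirtyable memory, so treat the result as an upper bound.

```shell
#!/bin/sh
# dirty_limit_mb RAM_MB RATIO: hypothetical helper for illustration.
# Prints RATIO percent of RAM_MB in MiB (integer, rounded down).
dirty_limit_mb() {
    ram_mb=$1 ratio=$2
    echo $(( ram_mb * ratio / 100 ))
}

# Usage:
#   dirty_limit_mb 512 10      # half a GB of RAM at 10%
#   dirty_limit_mb 16384 10    # 16 GB of RAM at 10%
#   dirty_limit_mb 16384 3     # 16 GB of RAM at 3%
```

The three calls yield 51, 1638 and 491 MiB, matching the ~50 MB, ~1.6 GB and ~491 MB figures quoted above.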
Compressing /usr
A way to speed up reading from the hard drive is to compress the data, because there is less data to be read. It must however be decompressed, which means a greater CPU load. Some file systems support transparent compression, most notably Btrfs and reiserfs4, but their compression ratio is limited by the 4k block size. A good alternative is to compress /usr in a squashfs file, with a 64k (128k) block size, as instructed in this Gentoo forums thread. What this tutorial does is basically to compress the /usr folder into a compressed squashfs file-system, then mount it with aufs. A lot of space is saved, usually two thirds of the original size of /usr, and applications load faster. However, each time an application is installed or reinstalled, it is written uncompressed, so /usr must be re-compressed periodically. Squashfs is already in the kernel, and aufs2 is in the official repositories, so no kernel compilation is needed if using the stock kernel.
Since the linked guide is for Gentoo, the next commands outline the steps specifically for Arch. To get it working, install the packages aufs2 and squashfs-tools. These packages provide the aufs-modules and some userspace-tools for the squash-filesystem.
Now we need some extra directories: one where the archive of /usr is stored read-only, and another, writeable one for the data changed after the last compression:
# mkdir -p /squashed/usr/{ro,rw}
Now that the rough setup is in place, you should perform a complete system upgrade, since every change to the content of /usr after the compression will be excluded from this speedup. If you use prelink you should also perform a complete prelink before creating the archive. Now it is time to invoke the command to compress /usr:
# mksquashfs /usr /squashed/usr/usr.sfs -b 65536
These parameters/options are the ones suggested by the Gentoo link but there might be some room for improvement using some of the options described here.
Now to get the archive mounted together with the writeable folder it is necessary to edit /etc/fstab and add the following lines:
/squashed/usr/usr.sfs /squashed/usr/ro squashfs loop,ro 0 0
usr /usr aufs udba=reval,br:/squashed/usr/rw:/squashed/usr/ro 0 0
Now you should be done and able to reboot. The original author suggests deleting all the old content of /usr, but this might cause some problems if anything goes wrong during some later re-compression. It is safer to leave the old files in place.
A Bash script has been created that automates re-compressing (that is, updating) the archive, since the tutorial is meant for Gentoo and some of its options do not match what they should be in Arch.
Tuning for an SSD
See SSD#Tips for Maximizing SSD Performance.
RAM disks / tuning for really slow disks
CPU
The only way to directly improve CPU speed is overclocking. As it is a complicated and risky task, it is not recommended for anyone except experts. The best way to overclock is through the BIOS. When purchasing your system, keep in mind that most Intel motherboards are notorious for disabling the capability to overclock.
Many Intel i5 and i7 chips, even when overclocked properly through the BIOS or UEFI interface, will not report the correct clock frequency to acpi_cpufreq and most other utilities. This will result in excessive messages in dmesg about delays unless the module acpi_cpufreq is unloaded and blacklisted. The only tool known to correctly read the clock speed of these overclocked chips under Linux is i7z. The i7z package is available in the community repo and i7z-gitAUR is available in the AUR.
A way to modify performance (ref) is to use Con Kolivas' desktop-centric kernel patchset, which, among other things, replaces the Completely Fair Scheduler (CFS) with the Brain Fuck Scheduler (BFS).
Kernel PKGBUILDs that include the BFS patch can be installed from the AUR or Unofficial User Repositories. See the respective pages for linux-ckAUR and Linux-ck wiki page, linux-bfsAUR or linux-pfAUR for more information on their additional patches.
Verynice
Verynice is a daemon, available in the AUR as verynice, for dynamically adjusting the nice levels of executables. The nice level represents the priority of the executable when allocating CPU resources. Simply define executables for which responsiveness is important, like X or multimedia applications, as goodexe in /etc/verynice.conf. Similarly, CPU-hungry executables running in the background, like make, can be defined as badexe. This prioritization greatly improves system responsiveness under heavy load.
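The same idea can be tried by hand with the standard nice and renice tools before installing a daemon. This is only a sketch of the manual approach, not of how Verynice works internally. The wrapper name run_low_priority is hypothetical, and the niceness value 19 (the lowest priority) is an arbitrary choice.

```shell
#!/bin/sh
# run_low_priority COMMAND [ARGS...]: hypothetical wrapper for illustration.
# Starts COMMAND at the lowest CPU priority (niceness 19).
run_low_priority() {
    nice -n 19 "$@"
}

# Usage:
#   run_low_priority make
#
# A process that is already running can be adjusted with renice,
# where <PID> is its process ID:
#   renice -n 19 -p <PID>
```

Raising the niceness of a process never requires root. Lowering it again does.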
Ulatencyd
Ulatencyd is a daemon that controls how the Linux kernel will spend its resources on the running processes. It uses dynamic cgroups to give the kernel hints and limitations on processes. It supports prioritizing processes for disk I/O as well as CPU shares, and uses more clever heuristics than Verynice. In addition, it comes with a good set of configs out of the box.
One note of warning: by default it changes the scheduler of all block devices to cfq. To disable this behavior, see Ulatencyd.
Graphics
Xorg.conf configuration
Graphic performance heavily depends on the settings in /etc/X11/xorg.conf. There are tutorials for Nvidia, ATI and Intel cards. Improper settings may stop Xorg from working, so caution is advised.
Driconf
Driconf is a small utility that can be found in the official repositories that allows you to change the direct rendering settings for open source drivers. Enabling HyperZ can drastically improve performance.
GPU overclocking
Overclocking a graphics card is typically more expedient than overclocking a CPU, since there are readily accessible software packages which allow for on-the-fly GPU clock adjustments. ATI users can get rovclock or amdoverdrivectrl, and NVIDIA users can get nvclock, all from the AUR. Intel chipset users can install GMABooster with the gmabooster AUR package.
The changes can be made permanent by running the appropriate command after X boots, for example by adding it to ~/.xinitrc. A safer approach would be to only apply the overclocked settings when needed.
RAM and swap
Relocate files to tmpfs
Relocate files, such as your browser profile, to a tmpfs file system such as /tmp or /dev/shm to improve application response, since all the files are then stored in RAM.
Use an active management script for maximal reliability and ease of use.
Refer to the Profile-sync-daemon wiki article for more information on syncing browser profiles.
Refer to the Anything-sync-daemon wiki article for more information on syncing any specified folder.
Swappiness
See Swap#Swappiness.
Compcache/Zram
Compcache, nowadays replaced by the zram kernel module, creates a device in RAM and compresses it. If you use it for swap, part of the RAM can hold much more information, at the cost of more CPU usage. Still, it is much quicker than swapping to a hard drive. If a system often falls back to swap, this could improve responsiveness. Zram is in mainline staging (therefore it is not stable yet; use with caution).
The AUR package zramswap provides an automated script for setting up such swap devices with optimal settings for your system (such as RAM size and CPU core number). The script creates one zram device per CPU core with a total space equivalent to the RAM available. To do this automatically on every boot, enable zramswap.service via systemctl.
You will have a compressed swap with higher priority than your regular swap, which will utilize multiple CPU cores for compressing data.
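To make the layout described above concrete, the sketch below prints, without running them, the kind of commands such a setup involves: one zram device per core, each sized to an equal share of total RAM. It is not the zramswap script itself. The helper name zram_plan and the swap priority of 100 are assumptions chosen for illustration.

```shell
#!/bin/sh
# zram_plan TOTAL_KB CORES: hypothetical dry-run helper for illustration.
# Prints (does not execute) commands that would create one zram swap
# device per core, with the devices together matching TOTAL_KB of RAM.
zram_plan() {
    total_kb=$1 cores=$2
    per_dev_bytes=$(( total_kb / cores * 1024 ))
    echo "modprobe zram num_devices=$cores"
    i=0
    while [ "$i" -lt "$cores" ]; do
        echo "echo $per_dev_bytes > /sys/block/zram$i/disksize"
        echo "mkswap /dev/zram$i"
        # Priority 100 is an assumed value; it only has to be higher
        # than the priority of the regular swap.
        echo "swapon -p 100 /dev/zram$i"
        i=$(( i + 1 ))
    done
}

# Usage on a live system:
#   zram_plan "$(awk '$1 == "MemTotal:" { print $2 }' /proc/meminfo)" "$(nproc)"
```

Reviewing the printed commands before running any of them as root is the safer route, given that the module is still in staging.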
Using the graphic card's RAM
In the unlikely case that you have very little RAM and a surplus of video RAM, you can use the latter as swap. See Swap on video ram.
Preloading
Preloading is the action of putting and keeping target files into the RAM. The benefit is that preloaded applications start more quickly because reading from the RAM is always quicker than from the hard drive. However, part of your RAM will be dedicated to this task, but no more than if you kept the application open. Therefore preloading is best used with large and often-used applications like Firefox and LibreOffice.
Go-preload
gopreloadAUR is a small daemon created in the Gentoo forum. To use it, first run this command in a terminal for each program you want to preload at boot:
# gopreload-prepare program
Then, as instructed, press Enter when the program is fully loaded. This will add a list of files needed by the program in /usr/share/gopreload/enabled. To load all lists at boot, add gopreload to your DAEMONS array in /etc/rc.conf. To disable the loading of a program, remove the appropriate list in /usr/share/gopreload/enabled or move it to /usr/share/gopreload/disabled.
Preload
A more automated approach is used by Preload. All you have to do is enable it with this command:
# systemctl enable preload
It will monitor the most used files on your system, and with time build its own list of files to preload at boot.
Boot time
You can find tutorials with good tips in the article Improve Boot Performance.
Suspend to RAM
The best way to reduce boot time is not booting at all. Consider suspending your system to RAM instead.
Application-specific tips
Firefox
See Firefox Tweaks#Performance and Firefox Ramdisk.
Firefox in the official repositories is built with the profile guided optimization flag enabled. You may want to use it in your custom build. To do this append
ac_add_options --enable-profile-guided-optimization
to your mozconfig.
Gcc/Makepkg
See Ccache.
Office suites
See Speed up LibreOffice and Speed up OpenOffice.
Pacman
See Improve Pacman Performance.
SSH
See Speed up SSH.
Laptops
See Laptop.