Distcc

From ArchWiki
Revision as of 10:50, 16 December 2011 by Kynikos


distcc is a program that distributes source code among a number of distcc-servers allowing many machines to compile one program and thus speed up the compilation process. The cool part is you can use it together with pacman/srcpac.

Related
TORQUE

Distcc is a program to distribute builds of C, C++, Objective C or Objective C++ code across several machines on a network. distcc should always generate the same results as a local build, is simple to install and use, and is usually much faster than a local compile.

Terms

Note: The terminology used by the software can be a bit counterintuitive in that "the daemon" is the master and "the server(s)" are the slave PC(s) in a distcc cluster.
distcc daemon
The PC or server that's running distcc to distribute the source code. The daemon itself will compile parts of the source code but will also send other parts to the hosts defined in DISTCC_HOSTS.
distcc server
The PC or server compiling the source code it gets from the daemon. When it's done compiling, it sends back the object code (i.e. compiled source code) to the daemon, which in turn sends back some more source code (if there's any left to compile).

Getting started

Install the distcc package from [community] on all PCs in the cluster:

# pacman -S distcc

For other distributions, or other operating systems including Windows via Cygwin, refer to the distcc documentation.

Configuration

Both Daemon and Server(s)

Edit /etc/conf.d/distccd and change the --allow value on the only uncommented line to the IP address of your daemon or to the address range of your entire subnet:

DISTCC_ARGS="--user nobody --allow 192.168.0.0/24"

Daemon Only

Edit /etc/makepkg.conf in the following three sections:

  1. In BUILDENV, enable distcc by removing the exclamation point in front of it.
  2. Uncomment the DISTCC_HOSTS line and add the IP address of each server (slave), followed by a slash and the number of threads it is to use. Separate the entries with a single space, and order the list from the most powerful server to the least powerful.
  3. Set the MAKEFLAGS variable to the sum of the maximum thread counts specified for the individual servers. In the example below, this is 5+3+3=11. If users specify more than this sum, the extra theoretical thread(s) will be blocked by distcc and appear as such in monitoring utilities such as distccmon-text, described below.
Note: It is common practice to define the number of threads as the number of physical cores plus hyperthreaded cores (if they exist) plus 1. Do this on a per-server basis, NOT in the MAKEFLAGS!
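
The rule of thumb in the note can be computed on each server instead of counted by hand. This is a minimal sketch, assuming the nproc utility from GNU coreutils is installed; it reports the processing units available to the current process, which normally already includes hyperthreaded cores. The variable names are illustrative only.

```shell
# Processing units (physical plus hyperthreaded cores) available on this server.
cores=$(nproc)

# Rule of thumb from the note above: one thread more than the core count.
threads=$(( cores + 1 ))

# This is the number to put after the slash for this server in DISTCC_HOSTS.
echo "Suggested thread count for this server: ${threads}"
```

Run it once per server and use each result as that server's value in DISTCC_HOSTS on the daemon.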

Example using relevant lines:

BUILDENV=(distcc fakeroot color !ccache !check)
MAKEFLAGS="-j11"
DISTCC_HOSTS="192.168.0.2/5 192.168.0.3/3 192.168.0.4/3"
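
To keep MAKEFLAGS in step with DISTCC_HOSTS, the sum can be derived from the host list itself. The following is a sketch, not part of makepkg or distcc; sum_distcc_jobs is a hypothetical helper name, and entries without an explicit "/N" limit are simply skipped.

```shell
# Sum the per-host thread limits (the number after each "/") in a
# DISTCC_HOSTS string. Entries without an explicit limit are skipped.
sum_distcc_jobs() {
    local total=0 host limit
    for host in $1; do
        case $host in
            */*)
                limit=${host#*/}           # drop everything up to the first "/"
                limit=${limit%%[!0-9]*}    # keep only the leading digits
                [ -n "$limit" ] && total=$(( total + limit ))
                ;;
        esac
    done
    echo "$total"
}

DISTCC_HOSTS="192.168.0.2/5 192.168.0.3/3 192.168.0.4/3"
echo "MAKEFLAGS=\"-j$(sum_distcc_jobs "$DISTCC_HOSTS")\""   # prints MAKEFLAGS="-j11"
```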

If users wish to use distcc through SSH, add an "@" symbol in front of each IP address in DISTCC_HOSTS. If key-based authentication is not set up on the systems, set the DISTCC_SSH variable to ignore checking for authenticated hosts, i.e. DISTCC_SSH="ssh -i"
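
As an illustration of the "@" prefix, the host list from the example above can be converted mechanically. This is only a sketch; to_ssh_hosts is a hypothetical helper, and the resulting string still has to be placed in DISTCC_HOSTS in /etc/makepkg.conf by hand.

```shell
# Prefix every DISTCC_HOSTS entry with "@" so that distcc reaches it over SSH.
to_ssh_hosts() {
    local out="" host
    for host in $1; do
        out="${out:+$out }@${host}"    # join entries with single spaces
    done
    echo "$out"
}

to_ssh_hosts "192.168.0.2/5 192.168.0.3/3 192.168.0.4/3"
# prints: @192.168.0.2/5 @192.168.0.3/3 @192.168.0.4/3
```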

Compile

Start the distcc daemon on every participating machine:

# rc.d start distccd

To have distccd start at boot, add it to the DAEMONS array in /etc/rc.conf.
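
A DAEMONS line with distccd added might look like the excerpt below. Every entry other than distccd is a placeholder assumed for illustration; keep whatever daemons the machine already lists and append distccd after the network daemon.

```shell
# /etc/rc.conf (excerpt). All entries except distccd are placeholders.
DAEMONS=(syslog-ng network crond distccd)
```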

Compile via makepkg as normal.

Monitoring Progress

Progress can be monitored via several methods.

  1. distccmon-text
  2. tailing log file

Invoke distccmon-text to check on compilation status:

$ distccmon-text
29291 Preprocess  probe_64.c                                 192.168.0.2[0]
30954 Compile     apic_noop.c                                192.168.0.2[0]
30932 Preprocess  kfifo.c                                    192.168.0.2[0]
30919 Compile     blk-core.c                                 192.168.0.2[1]
30969 Compile     i915_gem_debug.c                           192.168.0.2[3]
30444 Compile     block_dev.c                                192.168.0.3[1]
30904 Compile     compat.c                                   192.168.0.3[2]
30891 Compile     hugetlb.c                                  192.168.0.3[3]
30458 Compile     catalog.c                                  192.168.0.4[0]
30496 Compile     ulpqueue.c                                 192.168.0.4[2]
30506 Compile     alloc.c                                    192.168.0.4[0]

To have this program run continuously, either use watch or append an integer to the command, which sets the number of seconds to wait between queries:

$ watch distccmon-text

or

$ distccmon-text 2
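
For a quick view of how evenly the work is spread, the distccmon-text output can be summarized per host. This is a sketch that assumes the output format shown above, where the last field is the host followed by a slot number in brackets; jobs_per_host is a hypothetical helper name.

```shell
# Count active jobs per host from distccmon-text output read on stdin.
# The last field looks like "192.168.0.2[0]"; the bracketed slot is dropped.
jobs_per_host() {
    awk 'NF >= 4 { host = $NF; sub(/\[[0-9]+\]$/, "", host); count[host]++ }
         END { for (h in count) print h, count[h] }' | sort
}

# Demonstration with three lines taken from the sample output above.
printf '%s\n' \
    '29291 Preprocess  probe_64.c    192.168.0.2[0]' \
    '30919 Compile     blk-core.c    192.168.0.2[1]' \
    '30444 Compile     block_dev.c   192.168.0.3[1]' | jobs_per_host
# prints:
# 192.168.0.2 2
# 192.168.0.3 1
```

On a live system the function would be fed directly, i.e. distccmon-text | jobs_per_host.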

One can also simply tail /var/log/messages.log on the daemon:

# tail -f /var/log/messages.log

"Cross Compiling" with Distcc

There are currently two methods for distributing distcc tasks over a cluster when building i686 packages from a native x86_64 environment. Neither is ideal, but to date these are the only two methods documented on the wiki.

An ideal setup would use the unmodified Arch packages, run distccd only once on each node regardless of whether the build starts from the native environment or from within a chroot, AND work with makepkg. Again, this Utopian setup is not currently known.

A discussion thread has been started on the topic; feel free to contribute.

Chroot Method (Preferred)

Note: This method works, but is not very elegant: it requires a second instance of distccd AND a 32-bit chroot on every node.

Assuming the user has a 32-bit chroot set up and configured on each node of the distcc cluster, the strategy is to have two separate instances of distccd running on different ports on each node: one runs in the native x86_64 environment and the other in the i686 chroot on a modified port. makepkg is then started inside the chroot via schroot.

Setup

Set up the chroot according to the aforementioned link. Be sure to install the distcc package!

Once distcc is installed in the chroot, make four minor modifications inside the chroot:

  1. Configure distccd to run on another port, which allows two instances (one outside the chroot and one inside) to co-exist.
  2. Create a link in /usr/bin named "distccd2".
  3. Modify two lines in /etc/rc.d/distccd to point the script at the symlink.
  4. Redefine DISTCC_HOSTS in /opt/arch32/etc/makepkg.conf to include the modified port number.

Configuration

Example /etc/conf.d/distccd in the chroot:

DISTCC_ARGS="--user nobody --allow 192.168.0.0/24 --port 3692 --log-level info --log-file /tmp/distccd-i686.log"

Symlink

The symlink serves as a trivial method to avoid having two of the same programs (and start paths) running in the process list. Omitting this step will render the second instance of distccd unable to start by design of the init script: it checks "pidof -o %PPID /usr/bin/distccd", and when that check already finds a running distccd, presumably the native instance, the start is skipped.

Execute the following inside the chroot:

# ln -s /usr/bin/distccd /usr/bin/distccd2

Modification to Daemon Script

#!/bin/bash

[ -f /etc/conf.d/distccd ] && . /etc/conf.d/distccd

. /etc/rc.conf
. /etc/rc.d/functions

PID=`pidof -o %PPID /usr/bin/distccd2`    # modified: distccd2 instead of distccd
case "$1" in
  start)
    stat_busy "Starting distcc Daemon"
    [ -z "$PID" ] && /usr/bin/distccd2 --daemon ${DISTCC_ARGS}    # modified: distccd2 instead of distccd
    if [ $? -gt 0 ]; then
      stat_fail
    else
      add_daemon distccd
      stat_done
    fi
    ;;
  ...

Streamline this by simply adding a line to /etc/rc.d/arch32 to invoke the modified distccd like so:

case $1 in
    start)
        stat_busy "Starting Arch32 chroot"
        for d in "${dirs[@]}"; do
            mount -o bind $d /opt/arch32/$d
        done
        mount -t proc none /opt/arch32/proc
        mount -t sysfs none /opt/arch32/sys
        linux32 chroot /opt/arch32 sh -c "/etc/rc.d/distccd start" || return 1    # added line
        add_daemon arch32
        stat_done
        ;;
    stop)
        stat_busy "Stopping Arch32 chroot"
        linux32 chroot /opt/arch32 sh -c "/etc/rc.d/distccd stop" || return 1    # added line
        ...

Add port numbers to DISTCC_HOSTS on the i686 chroot

Append the port number defined earlier (3692) to each of the hosts in /opt/arch32/etc/makepkg.conf as follows:

DISTCC_HOSTS="192.168.1.101/5:3692 192.168.1.102/5:3692 192.168.1.103/3:3692"
Note: This only needs to be set up on the "master" i686 chroot, where "master" is defined as the one from which the compilation will take place.
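
If the native DISTCC_HOSTS list already exists, the chroot variant can be generated from it rather than retyped. This is a sketch that produces the same host/threads:port form used in the example above; add_port is a hypothetical helper and is meant for plain TCP entries only, since in SSH ("@") entries a colon has a different meaning.

```shell
# Append ":PORT" to every DISTCC_HOSTS entry that does not already carry one.
add_port() {
    local port=$1 out="" host
    shift
    for host in $1; do
        case $host in
            *:*) ;;                            # already has a port, leave it alone
            *)   host="${host}:${port}" ;;
        esac
        out="${out:+$out }${host}"             # join entries with single spaces
    done
    echo "$out"
}

add_port 3692 "192.168.1.101/5 192.168.1.102/5 192.168.1.103/3"
# prints: 192.168.1.101/5:3692 192.168.1.102/5:3692 192.168.1.103/3:3692
```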

Invoke makepkg from the Native Environment

Set up schroot in the native x86_64 environment. To build an i686 package from the native x86_64 environment, invoke makepkg through it:

$ schroot -p -- makepkg -src

Multilib GCC Method (Not Recommended)

Warning: Errors have been reported when using this method to build the i686 linux package from a native x86_64 system! The chroot method is preferred and has been verified to work building the kernel packages.

Edit /etc/pacman.conf and uncomment the multilib repo:

[multilib]
Include = /etc/pacman.d/mirrorlist

Install gcc-multilib and its dependencies:

# pacman -Syy && pacman -S gcc-multilib binutils-multilib

Compiling packages on x86_64 for i686 is as easy as adding the following lines to $HOME/.makepkg.conf:

CARCH="i686"
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O2 -pipe -m32"
CXXFLAGS="${CFLAGS}"

and invoking makepkg via the following:

$ linux32 makepkg -src

Remember to remove or modify $HOME/.makepkg.conf when finished compiling i686 packages!

Tips/Tricks

Limit HDD/SSD usage

Relocate $HOME/.distcc

By default, distcc creates $HOME/.distcc, which stores transient state as it serves up work for nodes to compile. Move this directory to a location in RAM such as /tmp (assuming /tmp is mounted as tmpfs) and soft link to it from $HOME. This avoids needless HDD reads/writes and is particularly important for SSDs.

$ mv $HOME/.distcc /tmp
$ ln -s /tmp/.distcc $HOME/.distcc

One only needs to have /etc/rc.local re-create this directory on a reboot (the soft link will remain until it is manually removed like any other file):

su -c "mkdir /tmp/.distcc" USERNAME
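
The move, the link and the re-creation after a reboot can be folded into one repeatable step. The following is a sketch only; relocate_distcc is a hypothetical helper, and it assumes GNU cp and ln. It is safe to run more than once: an existing real directory is moved, an existing link is simply refreshed.

```shell
# Keep HOME_DIR/.distcc as a symlink to RAM_DIR/.distcc, creating the
# target directory if it is missing (for example after a reboot).
relocate_distcc() {
    local home_dir=$1 ram_dir=$2
    local src=$home_dir/.distcc dst=$ram_dir/.distcc

    mkdir -p "$dst"

    # A real directory in HOME_DIR: carry its contents over, then remove it.
    if [ -d "$src" ] && [ ! -L "$src" ]; then
        cp -a "$src/." "$dst/"
        rm -rf "$src"
    fi

    # Create or refresh the link HOME_DIR/.distcc -> RAM_DIR/.distcc.
    ln -sfn "$dst" "$src"
}

# Usage, run as the user who compiles:
#   relocate_distcc "$HOME" /tmp
```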

Adjust log level

By default, distcc will log to /var/log/messages.log as it goes along. One trick (actually recommended in the distccd manpage) is to log to an alternative file directly. Again, one can locate this in RAM via /tmp. Another trick is to raise the minimum severity a message must have to be included in the log file, which is useful if only wanting to see error messages rather than an entry for each connection. The LEVEL given to --log-level can be any of the standard syslog levels, and in particular critical, error, warning, notice, info, or debug.

Both of these options are to be appended to DISTCC_ARGS in /etc/conf.d/distccd:

DISTCC_ARGS="--user nobody --allow 192.168.0.0/24 --log-level error --log-file /tmp/distccd.log"