DeveloperWiki:Automated Package Build System

From ArchWiki

What is an Automated Build System

One of the best examples of an automated package build system is the Fedora Koji project. Koji is a continuous build system for all of the RPMs in the Fedora and RHEL projects. The main benefit of an automated build system is that every package must pass through a common gate: a single checkpoint for quality and consistency.

Although the Koji build server is referenced heavily here, the Arch equivalent will need to be very different in both its architecture and its general goals.


The proposed name for the Arch package build system is Quarters, the logic being that what really feeds pacman is quarters.

Alternative naming options will be considered.

Design Proposal

One of the main differences between Arch and distributions like Fedora is money. The Fedora project can go out and buy servers by the armful, but Arch will need to continue relying on volunteer equipment and working with whatever we can get our hands on. While these constraints will affect the nature of the package build system, they must not be allowed to compromise the quality of the Arch Linux distribution.

Proposed Features

  1. Simplicity
    1. Avoid requiring too many daemons
    2. No authentication systems
    3. No database dependencies
    4. Simple HTTPS communication
    5. Able to interact with distributed components over any reasonable link quality
  2. Distributed
    1. Need to be able to distribute the build load to systems all over the world
    2. All communication needs to be encrypted (duh)
    3. Builders make decisions by peer review
  3. Data Model
    1. Information on the available packages is gathered by parsing live pacman data (pacman is fast enough)
    2. Packages to build are presented to the builders in a serialized format (probably JSON)
  4. Communication
    1. All build communication is based on gathering presented data: all pulls, no puts
    2. All servers are state machines; the status of the distributed build environment is assessed by parsing the "global" state machine
    3. Command system to move packages manually from the build repos to the final repos
  5. Build Cleanliness
    1. Every package is built in its own clean chroot environment
  6. Packages can be pulled and polled from many sources
    1. SVN
    2. AUR (pluggable?)
    3. Git
    4. Other SCMs
    5. Web interface
  7. Building Requirements
    1. All packages need to be buildable, including the base toolchain
    2. Trigger mass rebuilds of packages, with version bumps
    3. Track packager data through the build process; don't allow the packager to be lost to the builder
  8. System Interaction Points
    1. Detached CLI sends signals to the master
    2. Master server reads signals and acts on them
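
As an illustration of the "mass rebuilds with version bumps" requirement, here is a minimal sketch of bumping pkgrel in PKGBUILD text. The helper name bump_pkgrel is hypothetical, and the sketch assumes that pkgrel is written as a bare integer on its own line; anything fancier would need real PKGBUILD evaluation.

```python
import re

# Matches a plain integer assignment such as "pkgrel=3" on its own line.
_PKGREL = re.compile(r"^pkgrel=(\d+)[ \t]*$", re.MULTILINE)


def bump_pkgrel(pkgbuild_text):
    """Return PKGBUILD text with pkgrel incremented by one.

    Hypothetical helper: assumes pkgrel is a bare integer literal.
    """
    match = _PKGREL.search(pkgbuild_text)
    if match is None:
        raise ValueError("no simple pkgrel= line found")
    new_line = "pkgrel={}".format(int(match.group(1)) + 1)
    # Splice the new line in place so the rest of the file is untouched.
    return pkgbuild_text[:match.start()] + new_line + pkgbuild_text[match.end():]
```

A mass rebuild could then apply this to every affected PKGBUILD before the packages are presented to the builders.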

Programming Language

It has been proposed that the application be developed in Python. Python is fast enough, has the libs we need, and who doesn't understand Python?

Sorry, but this project is too big for bash, and bash is too slow.

I thought about OCaml, but I figured that it should be something that everyone can hack on.

Try to target Python 3.

Builder Process

The process that the builders will use is based on state information. The master server presents the master state, which is the list of validated builders and the packages to be built. The master server will only present packages which are ready to be built. The builders then download the master server's builder information. Each builder picks a package to build, posts the package that it has claimed, and then queries the other builders to see whether any of them has claimed the same package. If the builder needs to change the package it will build, it simply posts a claim on a new package. Except for the first build on a builder, this peer process of deciding which package to build next takes place while a package is already being built.
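
The claim step could look roughly like the sketch below. The proposal does not say how two builders claiming the same package settle the conflict, so the tie-break used here (the alphabetically first builder name keeps the claim) is purely an assumption, as are the function and parameter names.

```python
def pick_package(ready, peer_claims, my_name):
    """Choose a package for builder my_name, or None if none is free.

    ready:       ordered list of package names the master presents as buildable
    peer_claims: dict mapping peer builder name -> package it has claimed
    """
    for pkg in ready:
        rivals = [name for name, claimed in peer_claims.items() if claimed == pkg]
        # Assumed tie-break: the alphabetically first builder name wins.
        if all(my_name < rival for rival in rivals):
            return pkg
    return None
```

Because every builder would apply the same rule to the same published claims, a builder that loses a conflict simply posts a claim on the next free package.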

The master will regularly poll the states of the builders, and the builders will post their states for the master and for their peers. The builder states will contain all the information that the master needs to retrieve the built packages or the details of a failed build.
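
To make the polling step concrete, here is a small sketch of how the master might sort the states it has pulled. The state layout is not defined in this proposal, so every field name below ("status", "package", "artifact", "log") is hypothetical.

```python
import json


def summarize_states(raw_states):
    """Split polled builder states into packages to fetch and failures to report.

    raw_states: dict mapping builder name -> JSON text that builder published.
    All field names ("status", "package", "artifact", "log") are hypothetical.
    """
    to_fetch, failed = [], []
    for builder, raw in sorted(raw_states.items()):
        state = json.loads(raw)
        if state["status"] == "done":
            to_fetch.append((builder, state["package"], state["artifact"]))
        elif state["status"] == "failed":
            failed.append((builder, state["package"], state["log"]))
        # Any other status (e.g. "building") needs no action from the master yet.
    return to_fetch, failed
```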

Component Build Out

Quarters will consist of a number of individual components. These components are all pieces of the distributed system, and they will interact via a common, language-independent medium and HTTPS.

Master Server Components

The components required for the operation of the master server.

Package Parsers

The package parsers will be pluggable modules that return predictable data about the state of source package data stored in either a source control manager or an interface such as the ABS or the AUR.

The idea here is that as long as the data queried from the source is uniform, there can be any number of usable interfaces. The primary system will need to be SVN, since this is where the main package tree is stored, followed by Git and then an AUR parser. More interfaces can be added later if we or anyone else wants them.

All parsers need to return a similar data structure and may need to do some package preparation. The data structure will need to be defined and static, but it should just be a Python dict, something JSON likes.
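
One possible shape for that structure is sketched below. The actual keys have not been decided, so the key set here (pkgname, pkgver, pkgrel, source) and the helper names are assumptions chosen only to show a fixed, JSON-friendly dict.

```python
import json

# Hypothetical fixed key set that every parser would agree on.
REQUIRED_KEYS = ("pkgname", "pkgver", "pkgrel", "source")


def make_record(pkgname, pkgver, pkgrel, source):
    """Build the uniform record a parser would return for one package."""
    return {"pkgname": pkgname, "pkgver": pkgver, "pkgrel": pkgrel, "source": source}


def is_valid_record(record):
    """Check that a record has exactly the agreed keys, all holding strings."""
    return (
        isinstance(record, dict)
        and set(record) == set(REQUIRED_KEYS)
        and all(isinstance(record[key], str) for key in REQUIRED_KEYS)
    )


record = make_record("zlib", "1.2.5", "3", "svn")
encoded = json.dumps(record, sort_keys=True)  # what would be presented to builders
```

Keeping every value a string means the record survives a JSON round trip unchanged.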

SVN Parser

This is the main package parser. The parser needs to be able to read the PKGBUILDs for specific info, particularly pkgver, pkgname and pkgrel, and then compare the values to the list of packages that are already built. Once the packages that need updating have been found, the parser needs to return a data structure containing the packages that need to be built. The main challenge here is figuring out the fastest way to parse an SVN repo, since they can be somewhat slow.
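
The read-and-compare step might be sketched as follows. This assumes plain one-line assignments in the PKGBUILD (real PKGBUILDs are Bash and can be more involved), and it treats any difference from the built version as "needs building" rather than doing a proper version comparison. The function names and the shape of the built-package list are hypothetical.

```python
import re

_FIELD = re.compile(r"^(pkgname|pkgver|pkgrel)=(.*)$", re.MULTILINE)


def read_pkgbuild_fields(text):
    """Extract pkgname, pkgver and pkgrel from PKGBUILD text.

    Assumes plain one-line assignments; surrounding quotes are stripped.
    """
    fields = {}
    for key, value in _FIELD.findall(text):
        fields[key] = value.strip().strip("'\"")
    return fields


def needs_build(fields, built):
    """Return True when the source version differs from the built one.

    built: dict mapping pkgname -> "pkgver-pkgrel" for packages already built.
    """
    wanted = "{}-{}".format(fields["pkgver"], fields["pkgrel"])
    return built.get(fields["pkgname"]) != wanted
```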

Package Build Order

The package build order is critical: only packages which have all deps met in the binary repos should be posted to the builders.
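
A minimal sketch of that filter, assuming the master already knows each pending package's dependency list and the set of package names in the binary repos (both data shapes are assumptions):

```python
def ready_to_build(pending, binary_repo):
    """Return the pending packages whose dependencies are all already built.

    pending:     dict mapping package name -> list of dependency names
    binary_repo: set of package names available in the binary repos
    """
    return sorted(
        name for name, deps in pending.items()
        if all(dep in binary_repo for dep in deps)
    )
```

Re-running the filter each time a finished package lands in the binary repos would release the next layer of packages without the master ever computing a full build order up front.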

Standalone Components

These components are used by multiple services.

HTTPS Server

Each component will need to be able to present files via an HTTPS server. The package will require a standalone HTTPS server written in Python.
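
With only the Python 3 standard library, such a server could be sketched as below. The function names are hypothetical, and the certificate and key files are assumed to exist already; how they would be issued and distributed is outside this sketch.

```python
import functools
import http.server
import ssl


def build_server(directory, host="127.0.0.1", port=0):
    """Create (but do not start) a server that presents files from directory."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Port 0 asks the operating system for any free port.
    return http.server.ThreadingHTTPServer((host, port), handler)


def enable_tls(server, certfile, keyfile):
    """Wrap the listening socket so that all traffic is served over TLS."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    server.socket = context.wrap_socket(server.socket, server_side=True)
    return server
```

A component would call build_server on its state directory, pass the result through enable_tls with its own certificate and key paths, and then call serve_forever() on it.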

Standard Utils Module

That annoying module that sets up logging and other more globally needed functions.
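
For the logging part, a minimal sketch might be the following. The function name, the default logger name "quarters" and the log format are all assumptions.

```python
import logging


def setup_logging(name="quarters", level=logging.INFO):
    """Return a logger that writes timestamped lines to stderr.

    Safe to call repeatedly: the handler is attached only once.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```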