
Package Proxy Cache

From ArchWiki


If you want to install the same Arch packages over and over, e.g. for testing purposes, it helps if you do not have to download them from the internet every time. This article shows how to share packages so that you can greatly decrease your download times. Keep in mind that you should not share a cache between different architectures (e.g. i686 and x86_64), or you will run into problems.

Read-only cache

Since version 6.1.0, pacman supports cache servers directly. Cache servers will be tried before any non-cache servers, will not be removed from the server pool because of HTTP 404 download errors, and will not be used for database files.

If you are looking for a quick solution, you can simply run a basic temporary web server which other computers can use as their cache server.

Start serving the package cache directory, /var/cache/pacman/pkg/. For example, with Python's http.server module:

$ python -m http.server -d /var/cache/pacman/pkg/
Tip: By default, Python's http.server listens on port 8000 on all interfaces. To use another port, or to bind only to a specific address, pass them as arguments (note that a server bound to 127.0.0.1, as below, is reachable only from the server itself):
$ python -m http.server -d /var/cache/pacman/pkg/ --bind 127.0.0.1 8080

Then edit /etc/pacman.d/mirrorlist on each client machine to add this server:

/etc/pacman.d/mirrorlist
CacheServer = http://server-ip:port
Warning: Do not append /repos/$repo/os/$arch to this custom server like for other entries, as this hierarchy does not exist and therefore queries will fail.
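When many clients need the same change, a small script can make it repeatable. The following is a minimal Bash sketch, assuming the mirrorlist is a plain text file you can write to (i.e. run as root); the function name add_cache_server is hypothetical. It prepends the CacheServer line unless that exact line is already present. Since pacman tries cache servers before any Server entries anyway, putting the line first is only for readability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a CacheServer line to a pacman mirrorlist,
# skipping the edit if exactly that line is already present.
# Usage: add_cache_server URL [MIRRORLIST]
add_cache_server() {
    local url=$1 file=${2:-/etc/pacman.d/mirrorlist}
    local line="CacheServer = $url"
    local tmp

    # Nothing to do if the exact line already exists.
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi

    # Build "new line + old contents" in a temp file, then copy it back.
    # Writing with cat > "$file" keeps the file's ownership and permissions.
    tmp=$(mktemp) || return 1
    { printf '%s\n' "$line"; cat -- "$file"; } > "$tmp" && cat -- "$tmp" > "$file"
    local status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

For example, running add_cache_server http://server-ip:port as root on each client adds the line once; running it again changes nothing.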

For a more standalone solution, darkhttpd offers a very minimal web server. Replace the previous Python command with, e.g.:

[http]$ darkhttpd /var/cache/pacman/pkg --no-server-id

You could also run darkhttpd as a systemd service for convenience: see Systemd#Writing unit files.

miniserve, a small web server written in Rust, can also be used:

$ miniserve /var/cache/pacman/pkg

Then edit /etc/pacman.d/mirrorlist as above, using the first URL that miniserve reports it is available at.

If you are already running a web server for some other purpose, you might wish to reuse it as your local repository server instead. For example, if you already serve a site with nginx, you can add an nginx server block listening on port 8080 inside the existing http block:

/etc/nginx/nginx.conf
server {
    listen 8080;
    root /var/cache/pacman/pkg;
    server_name myarchrepo.localdomain;
    try_files $uri $uri/ =404;
}

Remember to restart nginx.service after making this change.

Tip: Whichever web server you use, make sure the firewall configuration (if any) allows the configured port to be reached by the desired traffic, and disallows any undesired traffic. See Security#Network and firewalls.

Overlay mount of read-only cache

It is possible to use one machine on a local network as a read-only package cache by overlay mounting its /var/cache/pacman/pkg directory. This is advantageous if that server has a reasonably comprehensive selection of up-to-date packages installed which the other machines also use, for example when maintaining a number of machines behind a low-bandwidth upstream connection.

As an example, to use this method:

# mkdir /tmp/remote_pkg /mnt/workdir_pkg /tmp/pacman_pkg
# sshfs remote_username@remote_pkgcache_addr:/var/cache/pacman/pkg /tmp/remote_pkg -C
# mount -t overlay overlay -o lowerdir=/tmp/remote_pkg,upperdir=/var/cache/pacman/pkg,workdir=/mnt/workdir_pkg /tmp/pacman_pkg
Note: The working directory must be an empty directory on the same mounted device as the upper directory. See Overlay filesystem#Usage.
Tip: If listing the /tmp/pacman_pkg overlay directory gives errors, e.g., "Stale file handle", try overlay mounting with options -o redirect_dir=off -o index=off.

After this, run pacman using the option --cachedir /tmp/pacman_pkg, e.g.:

# pacman -Syu --cachedir /tmp/pacman_pkg
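The requirement in the note above is easy to violate when /var lives on a separate partition. The following minimal Bash sketch, assuming GNU stat and find, checks the working directory before the first overlay mount; the function name check_overlay_workdir is hypothetical, and comparing device numbers is only an approximation of "same mounted device". After a mount, overlayfs leaves its own files in the working directory, so this check is meant for the first mount only.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for the overlay mount: the work dir must be empty
# and on the same file system as the upper dir.
# Usage: check_overlay_workdir UPPERDIR WORKDIR
check_overlay_workdir() {
    local upper=$1 work=$2

    if [ ! -d "$upper" ] || [ ! -d "$work" ]; then
        echo "error: both $upper and $work must be existing directories" >&2
        return 1
    fi

    # stat -c %d prints the device number of the file system holding a path.
    if [ "$(stat -c %d -- "$upper")" != "$(stat -c %d -- "$work")" ]; then
        echo "error: $work is not on the same file system as $upper" >&2
        return 1
    fi

    # find -mindepth 1 prints nothing for an empty directory (hidden files count).
    if [ -n "$(find "$work" -mindepth 1 -print -quit)" ]; then
        echo "error: $work is not empty" >&2
        return 1
    fi
}
```

For example, run check_overlay_workdir /var/cache/pacman/pkg /mnt/workdir_pkg before the mount command and only mount if it succeeds.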

Distributed read-only cache

There are Arch-specific tools for automatically discovering other computers on your network that offer a package cache. Try pacredir, pacserve, pkgdistcache (AUR), or paclan (AUR). pkgdistcache uses Avahi instead of plain UDP, which may work better in home networks that route, rather than bridge, between Wi-Fi and Ethernet.

Historically, there were also PkgD and multipkg, but they are no longer maintained.

Read-write cache

In order to share packages between multiple computers, simply share /var/cache/pacman/ using any network-based mount protocol. This section shows how to use SSHFS to share a package cache plus the related library directories between multiple computers on the same local network. Keep in mind that a network-shared cache can be slow, depending on the file system choice, among other factors.

First, install the package for the network file system you want to use: sshfs, curlftpfs, samba or nfs-utils.

Tip:
  • To use sshfs, consider reading Using SSH Keys.
  • By default, smbfs does not serve filenames that contain colons, which results in the client downloading the offending package afresh. To prevent this, use the mapchars mount option on the client.

Then, to share the actual packages, mount /var/cache/pacman/pkg from the server to /var/cache/pacman/pkg on every client machine.

Warning: Do not make /var/cache/pacman/pkg or any of its ancestors (e.g., /var) a symlink. Pacman expects these to be directories. When pacman re-installs or upgrades itself, it will remove the symlinks and create empty directories instead. However, during the transaction pacman relies on some files residing there, so this breaks the update process. Refer to FS#50298 for further details.
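To catch the situation described in this warning before mounting a share, each client could run a check like the following. This is a minimal Bash sketch; the function name check_no_symlinks is hypothetical, and it only inspects the path components themselves, not what is mounted on them.

```shell
#!/usr/bin/env bash
# Hypothetical check: fail if PATH or any ancestor directory is a symlink.
# Usage: check_no_symlinks [PATH]
check_no_symlinks() {
    local path=${1:-/var/cache/pacman/pkg}

    # Strip one trailing slash; [ -L "dir/" ] would follow the link.
    path=${path%/}

    # Walk up to (but not including) the root directory.
    while [ -n "$path" ] && [ "$path" != "/" ] && [ "$path" != "." ]; do
        if [ -L "$path" ]; then
            echo "error: $path is a symlink; pacman expects a real directory" >&2
            return 1
        fi
        path=$(dirname -- "$path")
    done
}
```

For example, check_no_symlinks /var/cache/pacman/pkg returns success only if neither the cache directory nor any of its parents is a symlink.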

Two-way with rsync or FTP

Another approach in a local environment is rsync. Choose a server for caching and enable the rsync daemon on it. On the clients, synchronize two-way with this share via the rsync protocol. Filenames that contain colons are not a problem for the rsync protocol.

Draft example for a client; using uname -m in the share name ensures an architecture-dependent sync:

# rsync ... rsync://server/share_$(uname -m)/ /var/cache/pacman/pkg/ 
# pacman -Syu
# paccache --remove --keep 3
# rsync --delete ... /var/cache/pacman/pkg/ rsync://server/share_$(uname -m)/
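The draft above can be wrapped in a small Bash function. This is a sketch under stated assumptions: the function names are hypothetical, the server is assumed to export one rsync module per architecture named share_<arch> as in the draft, and RSYNC_OPTS is a hypothetical Bash array that must hold the options the draft leaves open, since this sketch does not choose them for you.

```shell
#!/usr/bin/env bash
# Sketch of the client-side two-way sync from the draft.
# Assumption: RSYNC_OPTS is a Bash array you fill in yourself with the
# rsync options the draft leaves unspecified.

# Print the architecture-specific module URL, e.g. rsync://server/share_x86_64/
cache_share_url() {
    printf 'rsync://%s/share_%s/\n' "$1" "$(uname -m)"
}

# Pull from the share, upgrade, prune old versions, then push back.
# Each step stops the sequence if it fails.
sync_pkg_cache() {
    local url
    url=$(cache_share_url "$1") || return
    rsync "${RSYNC_OPTS[@]}" "$url" /var/cache/pacman/pkg/ || return
    pacman -Syu || return
    paccache --remove --keep 3 || return
    rsync --delete "${RSYNC_OPTS[@]}" /var/cache/pacman/pkg/ "$url"
}
```

After defining RSYNC_OPTS, run sync_pkg_cache server as root. Because every step aborts on failure, a failed pull or upgrade does not push a half-updated cache back to the server.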

Instead of relying on an unencrypted rsync daemon, a more secure option is rsync over SSH; Rsync#Automated backup with SSH gives an overview.

If rsync is not available in your local environment, a simple FTP service is suitable for two-way sync as well: lftp's mirror command, with its --delete option, synchronizes a local directory with remote storage.

Dynamic reverse proxy cache using nginx

nginx can be used to proxy package requests to official upstream mirrors and cache the results to the local disk. All subsequent requests for that package will be served directly from the local cache, minimizing the amount of internet traffic needed to update a large number of computers.

In this example, the cache server will run at http://cache.domain.example:8080/ and store the packages in /srv/http/pacman-cache/.

Install nginx on the computer that is going to host the cache. Create the directory for the cache and adjust the permissions so nginx can write files to it:

# mkdir /srv/http/pacman-cache
# chown http:http /srv/http/pacman-cache

Use the nginx pacman cache config as a starting point for /etc/nginx/nginx.conf. Check that the resolver directive works for your needs. In the upstream server blocks, configure the proxy_pass directives with addresses of official mirrors; see the examples in the configuration file for the expected format. Once you are satisfied with the configuration file, start and enable nginx.

In order to use the cache, each Arch Linux computer (including the one hosting the cache) must have the following line at the top of the mirrorlist file:

/etc/pacman.d/mirrorlist
Server = http://cache.domain.example:8080/$repo/os/$arch
...
Note: You will need to create a method to clear old packages, as the cache directory will continue to grow over time. paccache (which is provided by pacman-contrib) can be used to automate this using retention criteria of your choosing. For example, find /srv/http/pacman-cache/ -type d -exec paccache -v -r -k 2 -c {} \; will keep the last 2 versions of packages in your cache directory.
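To run this cleanup regularly without surprises, the find command from the note can be wrapped with a dry-run mode. A minimal Bash sketch; prune_proxy_cache is a hypothetical name, and it relies on paccache's -d (dry run) and -r (remove) options.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup helper for the nginx proxy cache (needs pacman-contrib).
# Usage: prune_proxy_cache [CACHE_ROOT] [KEEP] [--dry-run]
prune_proxy_cache() {
    local root=${1:-/srv/http/pacman-cache} keep=${2:-2} mode=-r

    # paccache -d only lists candidate packages; -r removes them.
    if [ "${3:-}" = "--dry-run" ]; then
        mode=-d
    fi

    # Like the find -exec command: run paccache once per directory, since
    # cached packages end up in subdirectories of the cache root.
    # -print0 with read -d '' keeps directory names with spaces intact.
    find "$root" -type d -print0 |
        while IFS= read -r -d '' dir; do
            paccache -v "$mode" -k "$keep" -c "$dir"
        done
}
```

For example, prune_proxy_cache /srv/http/pacman-cache 2 --dry-run lists what would be removed; the same call without --dry-run removes it.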

Squid

Squid can be set up to cache only Arch packages, and can be used with AIF, pacman, wget, etc. with minimal configuration on the client system.

Install Squid

Install squid.

Configure Squid

This is the minimum configuration needed to get Squid to cache Arch packages.

Cache Rules

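Before defining these rules, remove or comment out all the default refresh_pattern lines, unless you need them. Squid uses the first refresh_pattern whose regular expression matches, so earlier rules would otherwise take precedence over these.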

/etc/squid/squid.conf 
refresh_pattern \.pkg\.tar\.   0       20%     4320      reload-into-ims
refresh_pattern .              0       0%      0

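With these rules, files matching *.pkg.tar.* (packages and their signatures) are cached, while everything else, such as the repository databases, is treated as immediately stale and so is in effect not cached.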
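To see which URLs the first rule applies to, its regular expression can be tried locally. This Bash sketch only approximates Squid, which matches refresh_pattern expressions against the request URL and uses the first match; squid_pkg_rule_matches is a hypothetical name.

```shell
#!/usr/bin/env bash
# Approximation: does a URL match the package refresh_pattern regex?
# Tests only the regex with grep -E, not Squid's full caching logic.
squid_pkg_rule_matches() {
    printf '%s\n' "$1" | grep -Eq '\.pkg\.tar\.'
}
```

With this pattern, signature files ending in .pkg.tar.zst.sig also match, while repository databases such as core.db fall through to the catch-all rule.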

Maximum Filesize

Objects larger than this size will NOT be saved on disk:

/etc/squid/squid.conf 
maximum_object_size 256 MB

Cache Directory

Set the cache directory, its maximum size in megabytes, and the number of first- and second-level subdirectories:

/etc/squid/squid.conf 
cache_dir aufs /var/cache/squid 10000 16 256

Shutdown Lifetime

Time to wait on shutdown before still-active client connections are closed:

/etc/squid/squid.conf 
shutdown_lifetime 1 seconds 
Note:

Every time you change the cache_dir path (and after a fresh install), you need to (re)create the cache directory structure:

# squid -z

It can also help to check the configuration file for errors before starting Squid:

# squid -k parse

Start Squid

Start squid.service, or restart it if Squid is already running.

Note:

After starting, squid -k check parses the configuration file and then checks that the running Squid process responds:

# squid -k check

Follow Squid access log

To follow requests to Squid as they arrive:

# tail -f /var/log/squid/access.log

You should see this for packages that are fetched from the original host:

...TCP_MISS/200...DIRECT...

and for packages that are delivered from the cache:

...TCP_HIT/200...NONE...
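Rather than watching the log by eye, a rough hit count can be computed from it. A minimal Bash/awk sketch, assuming Squid's default native log format, in which the result code appears as TCP_HIT/200, TCP_MISS/200 and similar; squid_hit_stats is a hypothetical name, and codes such as TCP_REFRESH_UNMODIFIED are not counted.

```shell
#!/usr/bin/env bash
# Count cache hits and misses in a Squid access log (native format assumed).
# Any TCP_*HIT/ code counts as a hit, any TCP_*MISS/ code as a miss.
# Usage: squid_hit_stats [LOGFILE]
squid_hit_stats() {
    local log=${1:-/var/log/squid/access.log}
    awk '
        /TCP_[A-Z_]*HIT\// { hits++ }
        /TCP_[A-Z_]*MISS\// { misses++ }
        END { printf "hits=%d misses=%d\n", hits, misses }
    ' "$log"
}
```

Running squid_hit_stats as root after a few client upgrades gives a quick idea of how much traffic the cache is saving.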

Manual Arch Install

This article or section is being considered for removal.

Reason: AIF (/arch/setup) does not exist anymore; configuring Squid while using the ram-based install image does not seem like a regular use case. (Discuss in Talk:Package Proxy Cache)

Before running /arch/setup, add variables for your proxy. To do so, run on the console:

# export http_proxy='http://your_squid_machine_ip:3128/'
# export ftp_proxy='http://your_squid_machine_ip:3128/'

Now use /arch/setup to install the system as usual; it should use your proxy. Watch the Squid logs to verify this.

Note: To use the proxy settings on the installed system, add the http_proxy and/or ftp_proxy variables in an appropriate place there, e.g. /etc/profile.d/proxy.sh.

Intercepting local requests

To make all HTTP requests on the local machine go through Squid automatically, first add an intercepting port to Squid:

/etc/squid/squid.conf 
http_port 3127 intercept

Then add iptables rules that redirect all outgoing port 80 requests to Squid, except those made by Squid itself (matched by the proxy user in the first rule):

# iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner proxy -j ACCEPT
# iptables -t nat -A OUTPUT -p tcp --dport 80 -j REDIRECT --to-ports 3127
Note: If you get randomly slow download speeds in Vagrant, Packer or VirtualBox, try using the virtio network device type.

Pacoloco proxy cache server

Pacoloco is an easy-to-use proxy cache server for pacman repositories. It also allows automatic prefetching of the cached packages.

Install the pacoloco package, then open its configuration file and add pacman mirrors:

/etc/pacoloco.yaml
port: 9129
repos:
  mycopy:
    urls:
      - http://mirror.lty.me/archlinux
      - http://mirrors.kernel.org/archlinux

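Restart pacoloco.service and the proxy repository will be available at http://myserver:9129/repo/mycopy. Clients can then use it by adding Server = http://myserver:9129/repo/mycopy/$repo/os/$arch to the top of /etc/pacman.d/mirrorlist.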

Flexo proxy cache server

Flexo is yet another proxy cache server for pacman repositories. Flexo is available as flexo-git (AUR). Once installed, start the flexo.service unit.

Flexo runs on port 7878 by default. Add Server = http://myserver:7878/$repo/os/$arch to the top of your /etc/pacman.d/mirrorlist so that pacman downloads packages via Flexo.

Synchronize pacman package cache using synchronization programs

Use Syncthing or Resilio Sync to synchronize the pacman cache directories (i.e. /var/cache/pacman/pkg).

Preventing unwanted cache purges

By default, pacman -Sc removes package tarballs from the cache that correspond to packages not installed on the machine where the command is run. Because pacman cannot know which packages are installed on all the machines that share the cache, it ends up deleting files that other machines still need.

To make pacman -Sc delete only outdated tarballs, i.e. those no longer present in the sync databases, set the following on every machine that shares the cache:

/etc/pacman.conf
[options]
CleanMethod = KeepCurrent
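Because any machine sharing the cache can run pacman -Sc, each one needs this setting. A minimal Bash/awk sketch to verify it on a client; has_keepcurrent is a hypothetical name, and it only looks for an uncommented CleanMethod line mentioning KeepCurrent inside the [options] section.

```shell
#!/usr/bin/env bash
# Hypothetical check: does pacman.conf set CleanMethod to include KeepCurrent
# in [options]? Commented-out lines (starting with #) are ignored.
# Usage: has_keepcurrent [PACMAN_CONF]
has_keepcurrent() {
    awk '
        # Track whether we are inside the [options] section.
        /^[[:space:]]*\[/ { in_opts = ($0 ~ /^[[:space:]]*\[options\]/) }
        in_opts && /^[[:space:]]*CleanMethod[[:space:]]*=/ && /KeepCurrent/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "${1:-/etc/pacman.conf}"
}
```

For example, has_keepcurrent || echo "CleanMethod = KeepCurrent is not set" can be run on each client after configuring the shared cache.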