Local Mirror

From ArchWiki
Revision as of 19:33, 26 July 2010 by Romashka (Talk | contribs) (added info about mirror size)

If you want to create an official mirror see this page.

NOTE: 95% of users will NEVER need this. Rsyncing every package from core and extra will give you a lot of stuff you will never need or use. Only follow these instructions if you are running a very large site of Arch machines. Perhaps using a Network Shared Pacman Cache would serve you better. You could also use pkgd to effectively create a shared cache across your LAN without the need to manage a central cache.

NOTE: Due to traffic issues, rsyncing from rsync.archlinux.org is allowed only for official mirrors. If you want to create an official mirror, write to the mailing list about it; entries for your mirror will then be added to /etc/pacman.d/* and your IP address will be allowed.

If you want to get a full mirror for personal use only, you may rsync from rsync://distro.ibiblio.org/distros/archlinux/

This document describes how to create a mirror on your local machine of all the packages and iso files on the Arch mirrors, how to update it using cron, how to serve the mirror with vsftpd, and how to set up pacman to use the local mirror.

Mirror size

To give you an impression of how much space a mirror needs, here are some numbers:

archive - 14GB (permanently frozen)
core + extra + community - 20GB as of July 2010
testing + community-testing - from a few MBs to a few GBs
temporary repositories (like gnome-unstable, kde-unstable, xorg18 etc.) - up to 1GB
iso - 11GB as of July 2010
other - 5.3GB as of July 2010
sources - 6.8GB as of July 2010

Most mirrors do not sync the archive, other and sources directories, but do sync everything else (including temporary repositories), so you will usually need to reserve about 35GB for an Arch Linux mirror.
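Before starting, it may be worth checking that the target filesystem really has that much room. The following is a minimal sketch, not part of the original guide: the function names are hypothetical, and the 35 GB threshold is simply the estimate given above.

```shell
# Hypothetical helper names; the 35 GB figure is the estimate from this section.

# gb_to_kb GB: convert gigabytes to the 1K blocks reported by "df -k"
gb_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# available_kb DIR: free space on the filesystem holding DIR, in 1K blocks
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space AVAILABLE_KB REQUIRED_GB: succeed if the space suffices
enough_space() {
    [ "$1" -ge "$(gb_to_kb "$2")" ]
}

# Example: check the filesystem that holds the current directory
if enough_space "$(available_kb .)" 35; then
    echo "enough room for a typical mirror"
else
    echo "less than 35 GB free"
fi
```

Replace the "." with the directory that will hold the mirror once it exists, and raise the threshold if you also plan to sync archive, other or sources.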

Initial Setup

First, let's update and install the necessary tools:

pacman -Syu
pacman -S rsync vsftpd

Now, we are going to create a new user (with no login privileges) that will be used for sync operations, and as the user to serve the files with FTP. The name of our user is "mirror" in this example, but you can use any name you want. Do not use root, or any account that has login access. In order to make this secure, we want to use a user with as few privileges as possible.

useradd -m -s /bin/false mirror

Now, let's get on with setting up the mirror.

Creating the local mirror directory

We will be using /home/mirror, the home directory of our unprivileged user, for storage of the scripts, logs, and packages.

The first thing we need to do is create several directories in /home/mirror:

cd /home/mirror
sudo -u mirror mkdir {scripts,files,logs}

The synchronization script

Now let's create the actual rsync script, scripts/mirrorsync.sh, using your favorite editor.

#!/bin/bash
#
# The script to sync a local mirror of the Arch Linux repositories and ISOs
#
# Copyright (C) 2007 Woody Gilk <woody@archlinux.org>
# Modifications by Dale Blount <dale@archlinux.org>
# and Roman Kyrylych <roman@archlinux.org>
# and Vadim Gamov <nickleiten@gmail.com>
# Licensed under the GNU GPL (version 2)

# Filesystem locations for the sync operations
SYNC_HOME="/home/mirror"
SYNC_LOGS="$SYNC_HOME/logs"
SYNC_FILES="$SYNC_HOME/files"
SYNC_LOCK="$SYNC_HOME/mirrorsync.lck"

# Select which repositories to sync
# Valid options are: core, extra, testing, community, iso
# Leave empty to sync a complete mirror
# SYNC_REPO=(core extra testing community iso)
SYNC_REPO=()

# Set the rsync server to use
# Only official public mirrors are allowed to use rsync.archlinux.org
# SYNC_SERVER=rsync.archlinux.org::ftp
SYNC_SERVER=distro.ibiblio.org::distros/archlinux

# Set the format of the log file name
# This example will output something like this: pkgsync_20070201-08.log
LOG_FILE="pkgsync_$(date +%Y%m%d-%H).log"

# Watchdog part (maximum time in seconds the script may run uninterrupted).
#        Needed for low-speed and/or unstable links to prevent
#       rsync from hanging.
#        A new instance of the script checks for the timeout; if it has
#       expired, it kills the previous instance, otherwise it exits without
#       doing any work.
WD_TIMEOUT=10800 

# Do not edit the following lines, they protect the sync from running more than
# one instance at a time
if [ ! -d "$SYNC_HOME" ]; then
  echo "$SYNC_HOME does not exist, please create it, then run this script again."
  exit 1
fi

if [ -f "$SYNC_LOCK" ]; then
    # Line 1 of the lock file is the PID, line 2 is the watchdog deadline
    OPID=$(head -n1 "$SYNC_LOCK")
    TIMEOUT=$(head -n2 "$SYNC_LOCK" | tail -n1)
    NOW=$(date +%s)
    # Kill the previous instance if it has run past its deadline
    if [ "$NOW" -ge "$TIMEOUT" ]; then
        kill -9 "$OPID"
    fi
    MYNAME=$(basename "$0")
    TESTPID=$(ps -p "$OPID" | grep "$OPID" | grep "$MYNAME")
    if [ "$TESTPID" != "" ]; then
        echo "Previous sync (PID $OPID) is still running, exiting."
        exit 1
    else
        rm "$SYNC_LOCK"
    fi
fi
echo $$ > "$SYNC_LOCK"
echo $(( $(date +%s) + WD_TIMEOUT )) >> "$SYNC_LOCK"
# End of non-editable lines

# Create the log file and insert a timestamp
touch "$SYNC_LOGS/$LOG_FILE"
echo "=============================================" >> "$SYNC_LOGS/$LOG_FILE"
echo ">> Starting sync on $(date --rfc-3339=seconds)" >> "$SYNC_LOGS/$LOG_FILE"
echo ">> ---" >> "$SYNC_LOGS/$LOG_FILE"

if [ ${#SYNC_REPO[@]} -eq 0 ]; then
  # Sync a complete mirror
  rsync -rtlvH --delete-after --delay-updates --safe-links --max-delete=1000 "$SYNC_SERVER" "$SYNC_FILES" >> "$SYNC_LOGS/$LOG_FILE"
  # Create a lastsync file with timestamp like "2007-05-02 03:41:08+03:00"
  # which may be useful for users to know when the mirror was last updated
  # ($repo is not set in this branch, so a fixed file name is used)
  date --rfc-3339=seconds > "$SYNC_FILES/lastsync"
else
  # Sync each of the repositories set in $SYNC_REPO
  for repo in "${SYNC_REPO[@]}"; do
    repo=$(echo "$repo" | tr '[:upper:]' '[:lower:]')
    echo ">> Syncing $repo to $SYNC_FILES/$repo" >> "$SYNC_LOGS/$LOG_FILE"

    # If you only want to mirror i686 packages, you can add
    # " --exclude=os/x86_64" after "--delete-after"
    # 
    # If you only want to mirror x86_64 packages, use "--exclude=os/i686"
    # If you want both i686 and x86_64, leave the following line as it is
    #
    rsync -rtlvH --delete-after --delay-updates --max-delete=1000 $SYNC_SERVER/$repo "$SYNC_FILES" >> "$SYNC_LOGS/$LOG_FILE"

    # Create $repo.lastsync file with timestamp like "2007-05-02 03:41:08+03:00"
    # which may be useful for users to know when the repository was last updated
    date --rfc-3339=seconds > "$SYNC_FILES/$repo.lastsync"

    # Sleep 5 seconds after each repository to avoid too many concurrent connections
    # to rsync server if the TCP connection does not close in a timely manner
    sleep 5 
  done
fi

# Insert another timestamp and close the log file
echo ">> ---" >> "$SYNC_LOGS/$LOG_FILE"
echo ">> Finished sync on $(date --rfc-3339=seconds)" >> "$SYNC_LOGS/$LOG_FILE"
echo "=============================================" >> "$SYNC_LOGS/$LOG_FILE"
echo "" >> "$SYNC_LOGS/$LOG_FILE"

# Remove the lock file and exit
rm -f "$SYNC_LOCK"
exit 0

Nothing terribly fancy here, just a slightly advanced bash script to do what we need. Let's make it executable.

chmod +x scripts/mirrorsync.sh
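The watchdog in the script relies on a two-line lock file: the first line holds the PID of the running sync, the second the time (in seconds since the epoch) after which that sync is considered hung. As a rough illustration of that format only, here is a small sketch; the function names are hypothetical and are not used by the script itself.

```shell
# write_lock LOCKFILE PID NOW TIMEOUT: line 1 = PID, line 2 = deadline (epoch seconds)
write_lock() {
    printf '%s\n%s\n' "$2" "$(( $3 + $4 ))" > "$1"
}

# lock_pid LOCKFILE: print the PID recorded in the lock
lock_pid() {
    sed -n '1p' "$1"
}

# lock_expired LOCKFILE NOW: succeed if NOW has reached the recorded deadline
lock_expired() {
    [ "$2" -ge "$(sed -n '2p' "$1")" ]
}

# Example with a throwaway lock file and the script's 10800 second timeout
demo_lock=$(mktemp)
write_lock "$demo_lock" "$$" "$(date +%s)" 10800
if lock_expired "$demo_lock" "$(date +%s)"; then
    echo "stale lock held by PID $(lock_pid "$demo_lock")"
else
    echo "lock held by PID $(lock_pid "$demo_lock") is still fresh"
fi
rm -f "$demo_lock"
```

With WD_TIMEOUT=10800 a sync gets three hours before a later run is allowed to kill it; on a slow link you may want a larger value.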

That's it, you now have an easily modifiable script. You probably don't want to have to run this manually though, so let's set up a cron job to run this for us.

One note before we move on to the next step: your logs directory is going to keep growing in size. Make sure that you check it regularly so that it doesn't start filling the server with garbage. It is highly recommended that you set up logrotate to manage this, or write some kind of cleanup script.
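If you go the cleanup-script route, something along these lines may be enough. This is only a sketch: prune_logs is a hypothetical name, the 30-day retention in the usage note is an arbitrary choice, and -maxdepth and -delete assume GNU find (as shipped on Arch).

```shell
# prune_logs DIR DAYS: delete pkgsync_*.log files directly inside DIR
# whose last modification is more than DAYS days old
prune_logs() {
    find "$1" -maxdepth 1 -type f -name 'pkgsync_*.log' -mtime +"$2" -delete
}
```

Calling prune_logs /home/mirror/logs 30 as the mirror user would keep roughly the last month of logs. Only files matching the script's pkgsync_*.log naming are touched, so anything else in the directory is left alone.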

Running a cron job

Let's make sure we have the necessary cron tools (most Arch installations will):

pacman -S dcron

We will be running our cron job with crontab. For more information, see man crontab. The benefit of running the sync from a crontab is a higher level of security, and not cluttering up /etc/cron.* with files. It also allows finer control over when the script is run.

Create scripts/mirror.cron with the following contents:

0 3 * * * /home/mirror/scripts/mirrorsync.sh

Now we need to activate our crontab:

sudo -u mirror crontab scripts/mirror.cron

Let's make sure that crontab picked up our job:

sudo -u mirror crontab -l

You should see the contents of scripts/mirror.cron printed out. If not, rerun the previous command and check again.

This cron setup will run our mirrorsync.sh script every night at 3 AM. You can adjust the schedule however you want; see http://www.adminschoice.com/docs/crontab.htm for more information on crontab syntax.
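The part of the crontab syntax that is easiest to get wrong is the field order: the minute comes first, then the hour. The little helper below (cron_line is a hypothetical name, purely for illustration) builds a daily entry from an hour and a minute so the order is explicit.

```shell
# cron_line HOUR MINUTE COMMAND: print a daily crontab entry
# crontab fields are: minute hour day-of-month month day-of-week command
cron_line() {
    printf '%s %s * * * %s\n' "$2" "$1" "$3"
}

cron_line 3 0 /home/mirror/scripts/mirrorsync.sh    # the entry used in this guide
cron_line 23 30 /home/mirror/scripts/mirrorsync.sh  # 23:30 every day instead
```

The first call prints the same line as scripts/mirror.cron; the second shows what a late-evening schedule would look like.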

Editing the cron job

If you ever need to edit mirror.cron, use the following command:

sudo -u mirror crontab -e

If you edit the file by hand, use the following command to update crontab:

sudo -u mirror crontab scripts/mirror.cron

Now let's set up pacman to use our local mirror.

Setting up pacman to use the local mirror

If you only want to access your mirror on one computer, you can use the following steps.

Single machine

NOTE: If you are following the above for a single machine, you are using a lot of bandwidth for no reason at all. Save it for the people that need it. This section only applies for those that will follow through with the below section as well.

You will not need vsftpd for this type of setup, because we are accessing the files via a file:// url, as opposed to a ftp:// url.

Add the following line at the top of the Server list in /etc/pacman.d/mirrorlist:

Server = file:///home/mirror/files/$repo/os/i686

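Also be sure to change i686 to x86_64 if you are using a 64-bit version of Arch. If you synced a complete mirror (SYNC_REPO left empty), rsync places the files in /home/mirror/files/archlinux, so use file:///home/mirror/files/archlinux/$repo/os/i686 instead.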
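The mirrorlist edit can of course be done in any editor. For those who prefer to script it, here is a minimal sketch; prepend_server is a hypothetical helper, and the example works on a scratch file with a placeholder mirror (mirror.example.org) rather than on the real /etc/pacman.d/mirrorlist.

```shell
# prepend_server MIRRORLIST URL: put "Server = URL" on the first line of MIRRORLIST
prepend_server() {
    ps_tmp=$(mktemp) || return 1
    { printf 'Server = %s\n' "$2"; cat "$1"; } > "$ps_tmp" &&
        cat "$ps_tmp" > "$1"    # overwrite in place to keep owner and permissions
    ps_status=$?
    rm -f "$ps_tmp"
    return "$ps_status"
}

# Example on a scratch copy; mirror.example.org is a placeholder, not a real mirror
demo_list=$(mktemp)
echo 'Server = ftp://mirror.example.org/$repo/os/i686' > "$demo_list"
prepend_server "$demo_list" 'file:///home/mirror/files/$repo/os/i686'
cat "$demo_list"
rm -f "$demo_list"
```

Note the single quotes around the URL: $repo must reach the file literally, because pacman fills it in itself.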

Multiple machines

Syncing this way will allow you to use FTP to access your local mirror from other machines. You can also use this method to sync to your local machine (more details on this later).

FTP server configuration

The first thing we need to do is configure vsftpd. Edit /etc/vsftpd.conf to look like this:

# vsftpd config file /etc/vsftpd.conf
#
# Setup for a secure anonymous FTP server
#
# Listen (non-xinetd) mode
listen=YES
# Use tcp_wrappers to control connections
tcp_wrappers=YES
# Use local time instead of GMT for file timestamps
use_localtime=YES
# Hide the true user/group ID of files
hide_ids=YES
# 
# Enable anonymous access (pacman requires this)
anonymous_enable=YES
# Use this user for anonymous logins
ftp_username=mirror
# Chroot directory for anonymous user
anon_root=/home/mirror/files/archlinux
# Don't require a password for anonymous access (pacman requires this)
no_anon_password=YES
#
# User to run vsftpd as (same as ftp_username)
nopriv_user=mirror
# Enable recursive "ls" listing
ls_recurse_enable=YES
#
# Forcefully destroy sessions after X seconds of inactivity 
# (It is highly recommended to not set this above 300)
idle_session_timeout=120
# Forcefully stop sending data after X seconds of inactivity during a transfer
# (It is highly recommended to not set this higher than idle_session_timeout)
data_connection_timeout=30

This setup will offer a very secure FTP server, tailored specifically for our needs. Note that this setup does not require a password, and should not be used in a publicly accessible network (unless that's what you want). Password protecting the FTP and still allowing it to work with pacman is beyond the scope of this document.

If you are going to connect to this machine from the outside, you will need to add the following line to /etc/hosts.allow:

vsftpd : ALL : ALL

Note that this will allow anyone to download from the mirror. If you want to control downloads more tightly, and don't know how to do so, see linux.about.com on the subject.

Let's make sure vsftpd starts:

sudo /etc/rc.d/vsftpd start

If vsftpd does not start, check that the options are set correctly in your /etc/vsftpd.conf file.
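Checking the options by eye is error-prone, so a quick scripted check may help. The sketch below only looks for a handful of the settings from the configuration shown earlier; conf_has and check_vsftpd_conf are hypothetical names, and the match is deliberately strict (whole line, no extra spaces), on the assumption that the options are written exactly as in this guide.

```shell
# conf_has FILE KEY VALUE: succeed if FILE has a line that is exactly "KEY=VALUE"
conf_has() {
    grep -qxF "$2=$3" "$1"
}

# check_vsftpd_conf FILE: report which of the settings from this guide are missing
check_vsftpd_conf() {
    cv_missing=0
    for cv_pair in listen=YES anonymous_enable=YES no_anon_password=YES \
                   ftp_username=mirror nopriv_user=mirror; do
        if ! conf_has "$1" "${cv_pair%%=*}" "${cv_pair#*=}"; then
            echo "missing or different: $cv_pair"
            cv_missing=1
        fi
    done
    return "$cv_missing"
}
```

Run check_vsftpd_conf /etc/vsftpd.conf; it prints nothing and returns success when all five settings are present. Commented-out options count as missing.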

Enabling the mirror for pacman

Now let's edit /etc/pacman.d/mirrorlist on each machine to use our shiny new mirror. Add the following line at the top of the Server list:

Server = ftp://192.168.1.21/$repo/os/i686

Note that 192.168.1.21 is the IP address of my test machine. Your address will most likely be different. (Remember that you can get the current IP of an Arch box with ifconfig -a or ifconfig eth0.)

If you want to use this same mirror on the local machine, use the following Server line:

Server = ftp://localhost/$repo/os/i686

Non-local machines will need to use an IP address to access the repository. Also make sure that the machine serving the mirror has a static IP address.
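When a Server line does not work, it helps to see the URL pacman ends up with once it has replaced $repo with a repository name. The following sketch does that substitution by hand; expand_server is a hypothetical helper, and 192.168.1.21 is just the example address used in this section.

```shell
# expand_server TEMPLATE REPO: replace the literal "$repo" placeholder with REPO
expand_server() {
    printf '%s\n' "$1" | sed 's|\$repo|'"$2"'|g'
}

# Print the directory URL for each repository served by the example mirror
for repo_name in core extra community; do
    expand_server 'ftp://192.168.1.21/$repo/os/i686' "$repo_name"
done
```

Each printed URL should be browsable with any FTP client from the other machines; if it is not, check the anon_root setting and the directory layout under /home/mirror/files.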

Synchronizing for the first time

Here comes the pain! Run the following command to start the sync:

sudo -u mirror ./scripts/mirrorsync.sh

This won't give you any output, but you probably want some. You can use something like this (substitute the name of your log file) to monitor the sync progress:

tail -f logs/pkgsync_20070203-09.log

This process will usually take a few hours, depending on the speed of your internet connection and how many repositories you are mirroring. After the first sync, only new packages will be sync'ed, so it will be much faster.
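Since the log file name changes with the date and hour, looking it up by hand gets tedious. A small helper can pick the newest log for you; this is a sketch with a hypothetical function name, assuming the pkgsync_*.log naming from the sync script.

```shell
# latest_log DIR: print the most recently modified pkgsync_*.log in DIR
# (prints nothing if no log exists yet)
latest_log() {
    ls -t "$1"/pkgsync_*.log 2>/dev/null | head -n 1
}
```

With it, tail -f "$(latest_log /home/mirror/logs)" follows whichever sync is currently running.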

Wait for the first sync to finish, then run pacman -Syy to make sure that your new mirrors are syncing properly.

That's it! You now run a local mirror which will offer you massively improved speeds when updating your packages.

Notes

First version of this guide was written by busfahrer. He can be reached at #archlinux on irc.freenode.net.

Second version of this guide was written by Shadowhand. He can be reached at #archlinux on irc.freenode.net.

Comments and suggestions are always appreciated.