Difference between revisions of "DeveloperWiki:NewMirrors"

From ArchWiki
(clarified the use of gen_rsyncd.conf.sh script)
(Things to keep in mind: Adapt for context)
 
(23 intermediate revisions by 5 users not shown)
Line 1: Line 1:
 +
[[Category:DeveloperWiki]]
 +
[[Category:Arch development]]
 
=== Adding a new mirror ===
 
=== Adding a new mirror ===
  
 
This text should outline the procedure for adding a new mirror for Arch packages.
 
This text should outline the procedure for adding a new mirror for Arch packages.
 +
 +
=== Notes about private mirrors ===
 +
 +
* Bandwidth is not free for the mirrors. They must pay for all the data they serve you
 +
** This still applies although you pay your ISP
 +
** A full mirror is over 50GB in size
 +
* There are many packages that will be downloaded that you will likely never use
 +
* Mirror operators will much prefer you to download only the packages you need
 +
* Really please look at the alternatives listed [https://wiki.archlinux.org/index.php/Pacman/Tips_and_tricks#Network_shared_pacman_cache here] before setting up a private mirror
  
 
==2-tier mirroring scheme==
 
==2-tier mirroring scheme==
Line 12: Line 23:
  
 
== For the mirror administrator ==
 
== For the mirror administrator ==
Please open a [http://bugs.archlinux.org feature request] with a request to become an authorized mirror.
+
==== Tier 2 requirements ====
 +
* Disk-space >= 60 GB
 +
* Sync off a tier 1 mirror (see https://archlinux.org/mirrors)
 +
* Sync all contents of the upstream mirror (i.e. do not sync only some repositories)
 +
* Do not sync more often than every hour, but you should sync at least once a day
 +
* Sync on a random minute so it is more likely the requests will be spaced out with other mirrors
 +
* Use the following [[rsync]] options: '''-rtlvH --delete-after --delay-updates --safe-links'''
 +
* If you ever wish to send downtime notifications to our users, please use the [https://mailman.archlinux.org/mailman/listinfo/arch-mirrors-announce arch-mirrors-announce] list. You do not need to subscribe to be able to post.
 +
* http support
 +
 
 +
==== Tier 1 requirements ====
 +
* Tier 2 requirements
 +
* Bandwidth >= 100Mbit/s
 +
* [[rsync]] support
 +
* Proven reliability (be a tier 2 mirror for a while and have reasonable uptime, response to out-of-sync notifications etc.)
 +
 
 +
You can use rsync directly or [https://git.server-speed.net/users/flo/bin/tree/syncrepo.sh this script] as a starting point. Please note that the script tries to minimize load and bandwidth used (about 5MiB as of 2014-01-21) in case there are no changes. Feel free to remove this check if you don't sync very often or your upstream mirror does not provide the lastupdate file.
 +
 
 +
=== Create a feature-request ===
 +
{{Note|We are not accepting new ftp mirrors.}}
  
Please provide the following:
+
Go to [https://bugs.archlinux.org/newtask/proj1 https://bugs.archlinux.org] and create a feature-request (category: mirrors) containing the following information:
 
* Mirror domain name
 
* Mirror domain name
* Geographical Location of the mirror
+
* Geographical location of the mirror (country)
* Supported access methods (http, ftp, rsync, ...)
+
* URLs for supported access methods (http(s), [[rsync]]) (no ftp)
* URLs for the above access methods
+
* Your mirror's available bandwidth
* The name of tier 1 mirror you are syncing from, which should be one of this:
 
** '''aarnet.edu.au''' (Australia) - ''rsync://mirror.aarnet.au/pub/archlinux/''
 
** '''gwdg.de''' (Germany) - ''rsync://ftp5.gwdg.de/pub/linux/archlinux/''
 
** '''uk2.net''' (Great Britain) - ''rsync://mirrors.uk2.net/archlinux/''
 
** '''gtlib.gatech.edu''' (USA) - ''rsync://rsync.gtlib.gatech.edu/archlinux/''
 
** '''rit.edu''' (USA) - ''rsync://mirror.rit.edu/archlinux/''
 
** '''tku.edu.tw''' (Taiwan) - ''rsync://ftp.tku.edu.tw/archlinux/''
 
 
* An administrative contact email
 
* An administrative contact email
 +
* An alternative administrative contact email (optional)
 +
* (tier 1 mirrors) Rsync IPs so your server(s) can be allowed to sync off tier 0 (rsync.archlinux.org)
 +
* (tier 2 mirrors) The name of tier 1 mirror you are syncing from. You can find available tier 1 mirrors [https://www.archlinux.org/mirrors/ here] (sort using the tier column)
 +
 +
=== Contact info and mailing lists ===
  
Please join the [http://mailman.archlinux.org/mailman/listinfo/arch-mirrors arch-mirrors mailing list].
+
Feel free to join the [https://mailman.archlinux.org/mailman/listinfo/arch-mirrors arch-mirrors mailing list] which can be used for general discussion about our mirrors. If you want to inform our users about downtime of your mirror please use the [https://lists.archlinux.org/listinfo/arch-mirrors-announce arch-mirrors-announce] mailing list. You do not need to subscribe to be able to post to arch-mirrors-announce.
  
Please ensure the following, to provide consistent mirroring and keep the upstream mirror's load low:
+
If you want to reach the Arch Linux staff for questions, you can either use the arch-mirrors list, you can open a bug report on our tracker or you can send a mail to [mailto:mirrors@archlinux.org mirrors@archlinux.org].
* Sync all contents of the upstream mirror (i.e. do not sync only some repositories)
 
* Use the following rsync options: '''-rtlvH --delete-after --delay-updates --safe-links --max-delete=1000'''
 
* Do not rsync more rapidly than every hour
 
* Sync on a random minute so it is more likely the requests will be spaced out with other mirrors
 
  
 
== The Arch Linux side ==
 
== The Arch Linux side ==
  
 
* Add the mirror info to the Django admin site
 
* Add the mirror info to the Django admin site
* Regenerate the rsync whitelist with the gen_rsyncd.conf.sh script - only for tier 1 mirrors, or when disabling access to a previously untiered mirror
+
* Regenerate the rsync whitelist with the gen_rsyncd.conf.pl script - only for tier 1 mirrors, or when disabling access to a previously untiered mirror (also done by an hourly cronjob)
 
* Regenerate the pacman-mirrorlist package
 
* Regenerate the pacman-mirrorlist package
 +
 +
== Mirror size ==
 +
 +
To give you an impression how much space will be needed for a mirror here are some numbers (as of 2016-10-25):
 +
 +
Mandatory:
 +
* pool (all packages) - 48GB
 +
* repositories (core, community, extra, testing, gnome-unstable, kde-unstable, multilib) - total ~200MB
 +
 +
Optional:
 +
* iso - 9GB (encouraged)
 +
* archive - 15GB (permanently frozen)
 +
* other - 13GB
 +
* sources - 50GB
 +
 +
Most mirrors do not sync archive, other and sources directories, but sync everything else (including temporary repositories),
 +
so usually you will need about 50GB reserved for Arch Linux mirror.
 +
 +
However, note that the required space may temporarily increase when a big rebuild happens and thus many packages exist twice in different versions. Please plan in a buffer of 30GB to 50GB on top of the above mentioned values.

Latest revision as of 19:36, 11 October 2017

Adding a new mirror

This text should outline the procedure for adding a new mirror for Arch packages.

Notes about private mirrors

  • Bandwidth is not free for the mirrors. They must pay for all the data they serve you
    • This still applies even though you pay your ISP
    • A full mirror is over 50GB in size
  • There are many packages that will be downloaded that you will likely never use
  • Mirror operators will much prefer you to download only the packages you need
  • Please look at the alternatives listed at https://wiki.archlinux.org/index.php/Pacman/Tips_and_tricks#Network_shared_pacman_cache before setting up a private mirror

2-tier mirroring scheme

Due to high load and bandwidth limits, Arch Linux uses a 2-tier mirroring scheme.

There are a few tier 1 mirrors that sync directly from archlinux.org every hour.

All other mirrors should sync from one of the tier 1 mirrors. Syncing from archlinux.org is not allowed.

For the mirror administrator

Tier 2 requirements

  • Disk space >= 60 GB
  • Sync off a tier 1 mirror (see https://archlinux.org/mirrors)
  • Sync all contents of the upstream mirror (i.e. do not sync only some repositories)
  • Sync no more than once an hour, but at least once a day
  • Sync on a random minute so it is more likely the requests will be spaced out with other mirrors
  • Use the following rsync options: -rtlvH --delete-after --delay-updates --safe-links
  • If you ever wish to send downtime notifications to our users, please use the arch-mirrors-announce list. You do not need to subscribe to be able to post.
  • HTTP support
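
As an illustration of the sync-related requirements above, here is a minimal Python sketch that assembles the rsync invocation with the listed options and picks a random minute for an hourly cron entry. The upstream URL, target directory and wrapper script path are placeholders, not real mirrors or paths; the function names are hypothetical and not part of any Arch Linux tooling.

```python
import random
import shlex

# Options taken verbatim from the tier 2 requirements above.
RSYNC_OPTIONS = ["-rtlvH", "--delete-after", "--delay-updates", "--safe-links"]


def build_rsync_command(upstream_url, target_dir):
    """Return the rsync argument list for a full sync of the upstream mirror."""
    return ["rsync", *RSYNC_OPTIONS, upstream_url, target_dir]


def hourly_cron_line(command, rng=random):
    """Return a crontab line that runs `command` once an hour on a random minute."""
    minute = rng.randrange(60)  # 0..59, chosen once when the entry is generated
    return f"{minute} * * * * {command}"


if __name__ == "__main__":
    # Placeholder upstream and target; substitute your tier 1 mirror and local path.
    cmd = build_rsync_command(
        "rsync://tier1.example.org/archlinux/", "/srv/mirror/archlinux/"
    )
    print(shlex.join(cmd))
    print(hourly_cron_line("/usr/local/bin/sync-arch-mirror"))
```

The minute is chosen once and written into the crontab, so the mirror keeps a fixed offset that is unlikely to coincide with other mirrors syncing from the same upstream.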

Tier 1 requirements

  • Tier 2 requirements
  • Bandwidth >= 100Mbit/s
  • rsync support
  • Proven reliability (be a tier 2 mirror for a while and have reasonable uptime, response to out-of-sync notifications etc.)

You can use rsync directly or the script at https://git.server-speed.net/users/flo/bin/tree/syncrepo.sh as a starting point. Please note that the script tries to minimize the load and bandwidth used (about 5MiB as of 2014-01-21) when there are no changes. Feel free to remove this check if you do not sync very often or if your upstream mirror does not provide the lastupdate file.
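
The idea behind such a check can be sketched as follows. This is not the logic of the linked script, only a minimal Python illustration under the assumption that the upstream lastupdate file changes whenever the mirror content changes; the function name and the way the two file contents are obtained are hypothetical.

```python
from typing import Optional


def needs_sync(local_lastupdate: Optional[str], upstream_lastupdate: str) -> bool:
    """Decide whether a full rsync run is worthwhile.

    local_lastupdate is the content of the local copy of the lastupdate file,
    or None if the mirror has never been synced. upstream_lastupdate is the
    content fetched from the upstream mirror.
    """
    if local_lastupdate is None:
        return True  # nothing synced yet, so a full run is needed
    # Surrounding whitespace is ignored; any other difference triggers a sync.
    return local_lastupdate.strip() != upstream_lastupdate.strip()


if __name__ == "__main__":
    print(needs_sync(None, "1507750000\n"))
    print(needs_sync("1507750000\n", "1507750000"))
    print(needs_sync("1507750000\n", "1507753600\n"))
```

Fetching one small file before each run is much cheaper than letting rsync walk the whole tree when nothing has changed.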

Create a feature-request

Note: We are not accepting new ftp mirrors.

Go to https://bugs.archlinux.org and create a feature-request (category: mirrors) containing the following information:

  • Mirror domain name
  • Geographical location of the mirror (country)
  • URLs for supported access methods (http(s), rsync) (no ftp)
  • Your mirror's available bandwidth
  • An administrative contact email
  • An alternative administrative contact email (optional)
  • (tier 1 mirrors) Rsync IPs so your server(s) can be allowed to sync off tier 0 (rsync.archlinux.org)
  • (tier 2 mirrors) The name of the tier 1 mirror you are syncing from. You can find the available tier 1 mirrors at https://www.archlinux.org/mirrors/ (sort using the tier column)

Contact info and mailing lists

Feel free to join the arch-mirrors mailing list, which can be used for general discussion about our mirrors. If you want to inform our users about downtime of your mirror, please use the arch-mirrors-announce mailing list. You do not need to subscribe to be able to post to arch-mirrors-announce.

If you want to reach the Arch Linux staff with questions, you can use the arch-mirrors list, open a bug report on our tracker, or send a mail to mirrors@archlinux.org.

The Arch Linux side

  • Add the mirror info to the Django admin site
  • Regenerate the rsync whitelist with the gen_rsyncd.conf.pl script - only for tier 1 mirrors, or when disabling access to a previously untiered mirror (also done by an hourly cronjob)
  • Regenerate the pacman-mirrorlist package

Mirror size

To give you an impression of how much space a mirror needs, here are some numbers (as of 2016-10-25):

Mandatory:

  • pool (all packages) - 48GB
  • repositories (core, community, extra, testing, gnome-unstable, kde-unstable, multilib) - total ~200MB

Optional:

  • iso - 9GB (encouraged)
  • archive - 15GB (permanently frozen)
  • other - 13GB
  • sources - 50GB

Most mirrors do not sync the archive, other and sources directories, but sync everything else (including temporary repositories), so usually you will need about 50GB reserved for an Arch Linux mirror.

However, note that the required space may temporarily increase when a big rebuild happens and many packages therefore exist twice in different versions. Please plan for a buffer of 30GB to 50GB on top of the values mentioned above.
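
To make the sizing arithmetic concrete, the following Python sketch adds up the figures listed above (as of 2016-10-25) together with a rebuild buffer. The directory sizes are copied from this page; the function name and its defaults (syncing iso, a 30GB buffer) are assumptions for illustration only, and current sizes will differ.

```python
# Sizes in GB as listed above (2016-10-25); "repositories" is the ~200MB total.
MANDATORY_GB = {"pool": 48.0, "repositories": 0.2}
OPTIONAL_GB = {"iso": 9.0, "archive": 15.0, "other": 13.0, "sources": 50.0}


def estimate_disk_gb(optional=("iso",), buffer_gb=30.0):
    """Sum the mandatory directories, the chosen optional ones and a rebuild buffer."""
    base = sum(MANDATORY_GB.values())
    extras = sum(OPTIONAL_GB[name] for name in optional)
    return base + extras + buffer_gb


if __name__ == "__main__":
    # Typical case from the text: everything except archive, other and sources.
    print(f"low buffer:  {estimate_disk_gb(buffer_gb=30.0):.1f} GB")
    print(f"high buffer: {estimate_disk_gb(buffer_gb=50.0):.1f} GB")
    # Full mirror including all optional directories.
    print(f"everything:  {estimate_disk_gb(optional=tuple(OPTIONAL_GB)):.1f} GB")
```

With these 2016 figures, a mirror that syncs pool, the repositories and iso needs roughly 57GB before the buffer is added.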