User:Zatricky/DeltaMirrorFromNonDelta

Overview

Make a mirror available with deltas when the source repository does *not* provide deltas.

Once basic testing is done, this will be turned into a bash/sh script.


Also see:

https://wiki.archlinux.org/index.php/Deltup
https://github.com/sabooky/archdelta
https://bbs.archlinux.org/viewtopic.php?id=92085

Summary of procedure

On first sync:

  • 1. configure a separate mirror path for the delta *version* of the mirror
  • 2. use the standard rsync method to an “offline” path
  • 3. create hard links in the delta mirror that point at the same files in the offline mirror, for package files only
  • 4. use repo-add to add all .pkg files to new $repo.db files. Open questions: how does the signing work (does it create a new .sig for each package, or does it only sign the .db)? What are the security implications of signing previously unsigned packages, and should invalid packages/signatures be rejected automatically?

Subsequent syncs:

  • 5. (As with 2 above). Complete normal sync to the offline path.
  • 6. (As with 3 above). Sync hard links from the delta mirror to the same locations in the main mirror only for package files
  • 7. use repo-add to add only the new files. Use the -d switch to automatically create deltas (as above: are these signed?)
  • 8. remove old .pkg files
  • 9. remove old deltas? Find an easy/KISS way to determine when old deltas are only using up disk space because pacman won’t bother using them. Can repo-* do this part automatically? If it does, would repo-add *only* remove the db entry, or would it also delete the .delta file?
  • 10. update lastsync

Questions still unanswered:

Creating lots of .delta files in /$repo/os/$arch/ might produce duplicates for multilib packages and waste mirror disk space. Should we follow the symlinks in /$repo/os/$arch/ to /pool/packages/ and mv && ln -s the deltas, as is done with the .pkg files?

What is repo-elephant?? lol.

Detail

1. configure a separate mirror path for the delta version of the mirror without the dbs:

mkdir -p /srv/ftp/pub/archlinux-delta

2. use standard rsync method with delete:

rsync -rtlvH --delete-after --delay-updates --safe-links --max-delete=1000 rsync://source-server/archlinux/ /srv/ftp/.hidden/archlinux/

3a. get lists of all new package files for each repo/os/arch:

# comm requires sorted input, so sort both listings first
comm -23 <(cd /srv/ftp/.hidden/archlinux/ && find . | sort) <(cd /srv/ftp/pub/archlinux-delta && find . | sort) | grep '\.pkg' > "$newlist"
comm -13 <(cd /srv/ftp/.hidden/archlinux/ && find . | sort) <(cd /srv/ftp/pub/archlinux-delta && find . | sort) | grep '\.pkg' > "$oldlist"
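
The list comparison in 3a can be sketched as a small function that avoids process substitution, so it also runs under plain sh. The function name list_pkg_diff and its argument order are made up for this illustration; the paths in the usage comment are the ones used elsewhere on this page.

```shell
#!/bin/sh
# list_pkg_diff SRC DST NEWLIST OLDLIST   (hypothetical helper, not part of pacman)
# Writes package paths found only under SRC to NEWLIST,
# and package paths found only under DST to OLDLIST.
list_pkg_diff() {
    src=$1 dst=$2 newlist=$3 oldlist=$4
    tmp=$(mktemp -d) || return 1
    # comm needs both inputs sorted the same way, so force the C locale.
    (cd "$src" && find . | LC_ALL=C sort) > "$tmp/src"
    (cd "$dst" && find . | LC_ALL=C sort) > "$tmp/dst"
    # -23 keeps lines unique to the first file, -13 lines unique to the second.
    LC_ALL=C comm -23 "$tmp/src" "$tmp/dst" | grep '\.pkg' > "$newlist"
    LC_ALL=C comm -13 "$tmp/src" "$tmp/dst" | grep '\.pkg' > "$oldlist"
    rm -rf "$tmp"
}

# Example call (commented out so the sketch has no side effects):
# list_pkg_diff /srv/ftp/.hidden/archlinux /srv/ftp/pub/archlinux-delta /tmp/newlist /tmp/oldlist
```

Paths in both lists are relative to the mirror root (they start with ./), so later steps need to run from the matching directory.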

3b. sync package files across via hard links only:

rsync -aplx --link-dest=/srv/ftp/.hidden/archlinux/ --exclude lastsync --exclude '*.db' --exclude '*.db.tar.gz' --exclude '*.abs.tar.gz' --exclude '*.files' --exclude '*.files.tar.gz' /srv/ftp/.hidden/archlinux/ /srv/ftp/pub/archlinux-delta/
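
Hard links cannot cross filesystems, so if the offline path and the delta mirror sit on different filesystems rsync will copy the files instead of linking them. A rough check, assuming a hypothetical helper named report_unlinked, is to list package files in the delta mirror that are not the same inode as their counterpart in the offline mirror:

```shell
#!/bin/sh
# report_unlinked SRC DST   (hypothetical helper for illustration)
# Prints package files under DST whose counterpart under SRC exists
# but is a separate copy rather than a hard link to the same inode.
report_unlinked() {
    src=$1 dst=$2
    (cd "$dst" && find . -type f -name '*.pkg.tar.*') |
    while IFS= read -r f; do
        # -ef is true when both paths refer to the same device and inode.
        if [ -e "$src/$f" ] && ! [ "$src/$f" -ef "$dst/$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example call (commented out so the sketch has no side effects):
# report_unlinked /srv/ftp/.hidden/archlinux /srv/ftp/pub/archlinux-delta
```

Empty output means every package file that exists in both trees is shared rather than duplicated on disk.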

4. Create new $repo.db as appropriate:

cd /srv/ftp/pub/archlinux-delta/$repo/os/$arch/ && repo-add -s -k <key> "$repo.db.tar.gz" *.pkg{,.tar{,.{bz2,gz,xz,Z}}}

5. As with 2 above

6. As with 3 above

7. For each applicable .pkg in newlist, run repo-add with “-d” to automatically create a delta:

cd /srv/ftp/pub/archlinux-delta/$repo/os/$arch/ && repo-add -s -k <key> -d "$repo.db.tar.gz" <package.pkg.ext>

The original $repo.db from the source mirror cannot simply be reused on each sync, because it does not contain any delta data.
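
A possible dry-run sketch for step 7: read newlist and print one repo-add command per new package instead of running it. The helper name plan_repo_add is hypothetical, and the sketch assumes newlist holds paths relative to the mirror root in the form ./<repo>/os/<arch>/<file>, as produced by step 3a. Entries under pool/ and .sig files are skipped.

```shell
#!/bin/sh
# plan_repo_add NEWLIST KEY   (hypothetical helper for illustration)
# Prints, but does not run, one repo-add command per new package found
# under <repo>/os/<arch>/. Paths are relative to the delta mirror root.
plan_repo_add() {
    newlist=$1 key=$2
    # Match ./<repo>/os/<arch>/<name>.pkg.tar.<ext>; the final [^./]+ rejects .sig files.
    grep -E '^\./[^/]+/os/[^/]+/[^/]+\.pkg\.tar\.[^./]+$' "$newlist" |
    while IFS= read -r path; do
        dir=${path%/*}       # e.g. ./core/os/x86_64
        file=${path##*/}     # e.g. foo-1.0-1-x86_64.pkg.tar.xz
        repo=${path#./}      # strip the leading ./
        repo=${repo%%/*}     # keep the first path component, e.g. core
        printf 'cd %s && repo-add -s -k %s -d %s.db.tar.gz %s\n' \
            "$dir" "$key" "$repo" "$file"
    done
}

# Example call (commented out so the sketch has no side effects):
# cd /srv/ftp/pub/archlinux-delta && plan_repo_add /tmp/newlist <key>
```

Printing the commands first makes it easy to review what would be signed and added before anything touches the database.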

8a. remove all old package files:

cd /srv/ftp/pub/archlinux-delta && grep 'pool/packages' "$oldlist" | while IFS= read -r i ; do rm -f -- "$i" ; done

AND/OR??

find /srv/ftp/pub/archlinux-delta/pool/packages/ -type f -links 1 -delete

Each method has edge cases where files might be “missed”, but the files missed by one method are caught by the other.
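
Because the link-count method deletes anything that is no longer hard-linked from the offline mirror, a failed or empty sync could wipe the whole pool. The following sketch adds a safety cap in the spirit of rsync's --max-delete from step 2; the helper name purge_orphans and the limit argument are made up for this illustration.

```shell
#!/bin/sh
# purge_orphans DIR MAX   (hypothetical helper for illustration)
# Deletes regular files under DIR that have a link count of 1, but refuses
# to do anything if more than MAX files would be removed.
purge_orphans() {
    dir=$1 max=$2
    # Count candidates first; tr strips padding that some wc versions add.
    count=$(find "$dir" -type f -links 1 | wc -l | tr -d ' ')
    if [ "$count" -gt "$max" ]; then
        echo "refusing to delete $count files (limit $max)" >&2
        return 1
    fi
    find "$dir" -type f -links 1 -exec rm -f -- {} +
}

# Example call (commented out so the sketch has no side effects):
# purge_orphans /srv/ftp/pub/archlinux-delta/pool/packages 1000
```

The count is line-based, so file names containing newlines would be miscounted; package file names should not contain any.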

8b. remove all broken symlinks:

find -L /srv/ftp/pub/archlinux-delta -type l -delete
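
Before deleting, it may be worth listing what would go. With -L, find follows symlinks, so only links whose target is missing still report as type l. A minimal sketch, using a hypothetical helper name:

```shell
#!/bin/sh
# list_broken_symlinks DIR   (hypothetical helper for illustration)
# Prints symlinks under DIR whose target no longer exists.
list_broken_symlinks() {
    # -L follows working symlinks, so -type l only matches dangling ones.
    find -L "$1" -type l
}

# Example call (commented out so the sketch has no side effects):
# list_broken_symlinks /srv/ftp/pub/archlinux-delta
```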

9. remove old irrelevant deltas:

cd /srv/ftp/pub/archlinux-delta/$repo/os/$arch/ && find . -maxdepth 1 -name '*.delta' -ctime +90 -exec repo-remove "$repo.db.tar.gz" {} \;

10. Update lastsync

rsync /srv/ftp/.hidden/archlinux/lastsync /srv/ftp/pub/archlinux-delta/
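
As a sanity check after step 10, the age of the mirror can be derived from lastsync. This sketch assumes the file holds a single Unix timestamp in seconds, which should be verified against the actual file; the helper name lastsync_age is hypothetical.

```shell
#!/bin/sh
# lastsync_age FILE   (hypothetical helper for illustration)
# Prints the number of seconds since the timestamp stored in FILE.
# Assumption: FILE contains one Unix timestamp (seconds since the epoch).
lastsync_age() {
    now=$(date +%s)
    stamp=$(cat "$1") || return 1
    echo $(( now - stamp ))
}

# Example call (commented out so the sketch has no side effects):
# lastsync_age /srv/ftp/pub/archlinux-delta/lastsync
```

A monitoring job could compare this value against a threshold to notice when syncs have stopped.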