Wget
GNU Wget is a free software package for retrieving files using HTTP, HTTPS, FTP and FTPS (FTPS since version 1.18). It is a non-interactive commandline tool, so it may easily be called from scripts.
Installation
wget 1.x
Install the wget package. The git version is present in the AUR by the name wget-gitAUR.
wget2
An alternative with near identical command line syntax is wget2, which: "is the successor of GNU Wget" and is "Designed and written from scratch". This rewrite "works multi-threaded and uses many features to allow fast operation. In many cases Wget2 downloads much faster than Wget1.x due to HTTP2, HTTP compression, parallel connections and use of If-Modified-Since HTTP header." [1]
In order to use wget2, install wget2AUR from the AUR. Note that this has a makedepend on pandoc, which is dependency heavy; you can alternatively install wget2-no-docsAUR also in the AUR, to avoid this makedepend.
Configuration
Configuration is performed in /etc/wgetrc
. Not only is the default configuration file well documented; altering it is seldom necessary. See wget(1) § OPTIONS for more intricate options.
FTP automation
Normally, SSH is used to securely transfer files among a network. However, FTP is lighter on resources compared to scp and rsyncing over SSH. FTP is not secure, but when transfering large amounts of data inside a firewall protected environment on CPU-bound systems, using FTP can prove beneficial.
wget ftp://root:somepassword@10.13.X.Y//ifs/home/test/big/"*.tar" 3,562,035,200 74.4M/s in 47s
In this case, Wget transfered a 3.3 GiB file at 74.4MB/second rate.
In short, this procedure is:
- scriptable
- faster than ssh
- easily used by languages than can substitute string variables
- globbing capable
Proxy
Wget uses the standard proxy environment variables. See Proxy settings.
To use the proxy authentication feature:
$ wget --proxy-user "DOMAIN\USER" --proxy-password "PASSWORD" URL
Proxies that use HTML authentication forms are not covered.
pacman integration
To have pacman automatically use Wget and a proxy with authentication, place the Wget command into /etc/pacman.conf
, in the [options]
section:
XferCommand = /usr/bin/wget --proxy-user "domain\user" --proxy-password="password" --passive-ftp --quiet --show-progress --continue --output-document=%o %u
chmod 600 /etc/pacman.conf
.Usage
This section explains some of the use case scenarios for Wget.
Basic usage
One of the most basic and common use cases for Wget is to download a file from the internet.
$ wget <url>
When you already know the URL of a file to download, this can be much faster than the usual routine downloading it on your browser and moving it to the correct directory manually. Needless to say, just from the simplest usage, you can probably see a few ways of utilising this for some automated downloading if that's what you want.
Archive a complete website
Wget can archive a complete website whilst preserving the correct link destinations by changing absolute links to relative links.
$ wget --recursive --no-parent --convert-links 'target-url-here'
In case of a dynamic website, some additional options for conversion into static HTML are available.
$ wget --recursive --no-parent --page-requisites --adjust-extension --convert-links --backup-converted 'target-url-here'
wget also provides options for bypassing download-prevention mechanisms.
$ wget --recursive --no-parent --convert-links --random-wait --execute robots=off --user-agent "Mozilla/5.0" 'target-url-here'
And if third-party content is to be included in the download, -H/--span-hosts
switch can be used alongside -r/--recursive
to recurse to linked hosts.