Web Mirroring
In many situations, you can recursively pull down the contents of a website using the wget utility.
To do a straightforward mirror of a site:
wget -m http://some.site.com
long version:
wget --mirror http://some.site.com
This is actually the same as specifying:
wget -r -l inf -nr -N http://some.site.com
long version:
wget --recursive --level=inf --dont-remove-listing --timestamping http://some.site.com
Convert for off-line reading, including giving all HTML files an .html extension:
wget --recursive --level=1 --dont-remove-listing --timestamping --convert-links --html-extension http://some.site.com
Other common options or variants
-l depth
--level=depth
Specifies the maximum recursion depth.
-L
--relative
Follows relative links only.
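For example, to retrieve a site two levels deep, following only links relative to each page (the URL and path here are placeholders):
wget --recursive --level=2 --relative http://some.site.com/docs/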
man wget will tell you more.
Proxy servers
Set the http_proxy environment variable.
E.g. in the bash shell:
export http_proxy=http://proxy.mysite.com:8888
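The variable can also be set for a single command, e.g. (using the same placeholder proxy address):
http_proxy=http://proxy.mysite.com:8888 wget http://some.web.site/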
See the manual for specifying a username and password if your proxy server requires them (options --proxy-user and --proxy-passwd).
Example
wget --verbose --timeout=30 --mirror --proxy-user=myuser --proxy-passwd=mypassword http://some.web.site
More info at http://www.gnu.org/manual/wget/
FTP
wget --mirror ftp://myusername:mypassword@ftp.targetsite.com/target/path
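As an alternative to embedding the password in the URL (where it is visible to other users in the process list), wget can read FTP credentials from a ~/.netrc file; a minimal entry for the placeholder host above might look like:
machine ftp.targetsite.com
login myusername
password mypassword
Then the same mirror can be run without credentials in the URL:
wget --mirror ftp://ftp.targetsite.com/target/path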
-- Frank Dean 6 Dec 2002