GNU Wget Cheat Sheet

Recursively download a part of a website (all files are downloaded, hierarchy is preserved and links are converted)

-r
--recursive

Turn on recursive retrieving. The default maximum depth is 5.

-k
--convert-links

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links
to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

-np
--no-parent

Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.  

-nH
--no-host-directories

Disable generation of host-prefixed directories. By default, invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.

--cut-dirs=number
Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
Take, for example, the directory at ftp://ftp.xemacs.org/pub/xemacs/. If you retrieve it with -r, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the -nH option can remove the
ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where --cut-dirs comes in handy; it makes Wget not "see" number remote directory components. Here are several examples of how
--cut-dirs option works.
No options -> ftp.xemacs.org/pub/xemacs/
-nH -> pub/xemacs/
-nH --cut-dirs=1 -> xemacs/
-nH --cut-dirs=2 -> .
--cut-dirs=1 -> ftp.xemacs.org/xemacs/
...
If you just want to get rid of the directory structure, this option is similar to a combination of -nd and -P. However, unlike -nd, --cut-dirs does not lose subdirectories: for instance,
with -nH --cut-dirs=1, a beta/ subdirectory will be placed in xemacs/beta, as one would expect.
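
The mapping in the table above can be sketched in plain shell. This is only an illustration of the rule as described here, not Wget's own code: local_dir is a hypothetical helper, and its yes/no flag stands in for -nH. It never touches the network.

```shell
# local_dir HOST REMOTE_PATH NO_HOST CUT
# Hypothetical helper: prints the directory the table above predicts.
# NO_HOST is "yes" when -nH is given, CUT is the --cut-dirs number.
local_dir() {
    host=$1 path=$2 no_host=$3 cut=$4
    # Strip the leading and trailing slash from the remote path.
    path=${path#/}
    path=${path%/}
    # Drop the first CUT directory components.
    i=0
    while [ "$i" -lt "$cut" ] && [ -n "$path" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   path= ;;
        esac
        i=$((i + 1))
    done
    # Keep the host directory unless -nH is in effect.
    if [ "$no_host" = no ]; then
        path=$host${path:+/$path}
    fi
    if [ -n "$path" ]; then
        printf '%s/\n' "$path"
    else
        printf '.\n'
    fi
}

local_dir ftp.xemacs.org /pub/xemacs/ yes 1   # prints xemacs/
local_dir ftp.xemacs.org /pub/xemacs/ no 1    # prints ftp.xemacs.org/xemacs/
```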

-P prefix
--directory-prefix=prefix

Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the
current directory).
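
Put together, the options in this section might be combined as in the sketch below. The function name mirror_subtree, the example URL and the ./mirror target are assumptions made for illustration, and --cut-dirs=1 should be adjusted to the number of leading path components you want dropped.

```shell
# mirror_subtree URL DEST
# Hypothetical wrapper (not part of Wget): fetch everything below URL,
# rewrite links for local viewing, and store the tree under DEST.
mirror_subtree() {
    wget --recursive --convert-links --no-parent \
         --no-host-directories --cut-dirs=1 \
         --directory-prefix="$2" "$1"
}

# Example call with a made-up URL, left commented out so nothing is fetched by accident:
# mirror_subtree "https://example.com/docs/manual/" ./mirror
```

The short form of the same command would be wget -r -k -np -nH --cut-dirs=1 -P ./mirror followed by the URL.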

Recursively download only the PDFs linked from a page

-A acclist --accept acclist
-R rejlist --reject rejlist

Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters, *, ?, [ or ], appear in an element of acclist or rejlist, it
will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern in quotes to prevent your shell from expanding it, as in -A "*.mp3" or -A '*.mp3'.
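
A possible way to fetch only the PDFs linked from one page is sketched below. The function name fetch_pdfs and the example URL are assumptions. Limiting the depth to 1 with --level=1 is one reasonable choice here, not the only one.

```shell
# fetch_pdfs URL
# Hypothetical wrapper: follow links one level deep from URL,
# stay below its directory, and keep only files ending in "pdf".
fetch_pdfs() {
    wget --recursive --level=1 --no-parent --accept pdf "$1"
}

# Example call with a made-up URL, left commented out:
# fetch_pdfs "https://example.com/papers/"
```

The pattern form would be --accept '*.pdf', quoted as described above so the shell leaves it alone.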

-c
--continue

Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program.
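
As a minimal sketch, resuming an interrupted download could look like this. The function name resume_download and the example URL are made up. Run it from the directory that holds the partial file.

```shell
# resume_download URL
# Hypothetical wrapper: pick up a partial file in the current directory
# where the earlier transfer stopped.
resume_download() {
    wget --continue "$1"
}

# Example call with a made-up URL, left commented out:
# resume_download "https://example.com/big-image.iso"
```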

-x
--force-directories

The opposite of -nd: create a hierarchy of directories, even if one would not have been created otherwise.

-l depth
--level=depth

Set the maximum recursion depth to depth. Without this option, -r stops at the default depth of 5.
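
For illustration, a depth-limited recursive fetch might be wrapped as follows. The name fetch_to_depth and the example URL are assumptions, not anything Wget provides.

```shell
# fetch_to_depth DEPTH URL
# Hypothetical wrapper: recurse from URL, following links at most DEPTH levels deep.
fetch_to_depth() {
    wget --recursive --level="$1" "$2"
}

# Example call with a made-up URL, left commented out:
# fetch_to_depth 2 "https://example.com/docs/"
```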
