wget \
--recursive \
--level=1 \
--convert-links \
--page-requisites \
--adjust-extension \
--no-clobber \
--random-wait \
--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36" \
--restrict-file-names=windows \
--no-parent \
--domains example.com \
http://example.com/path/to/my.html
-r
--recursive
Turn on recursive retrieving. The default maximum depth is 5.
-l depth
--level=depth
Set the maximum recursion depth to depth.
-k
--convert-links
After the download is complete, convert the links in the document to make them suitable for local
viewing. This affects not only the visible hyperlinks, but any part of the document that links
to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML
content, etc.
-p
--page-requisites
This option causes Wget to download all the files that are necessary to properly display a given
HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.
-E
--adjust-extension
If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with
the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the
local filename. This is useful, for instance, when you're mirroring a remote site that uses .asp
pages, but you want the mirrored pages to be viewable on your stock Apache server.
As of version 1.12, Wget will also ensure that any downloaded files of type text/css end in the
suffix .css.
As of version 1.19.2, Wget will also ensure that any downloaded files with a "Content-Encoding"
of br, compress, deflate, or gzip end in the suffixes .br, .Z, .zlib, and .gz, respectively.
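The URL-suffix test described for -E can be tried in isolation. This is a minimal sketch, assuming a hypothetical helper named needs_html_suffix (it is not part of Wget). It models only the regexp check on the URL, not the Content-Type check that Wget also performs.

```shell
# Hypothetical helper, not part of Wget: reports whether a URL fails the
# "ends with \.[Hh][Tt][Mm][Ll]?" test, i.e. whether -E would append .html
# (assuming the response type is text/html or application/xhtml+xml).
needs_html_suffix() {
  if printf '%s\n' "$1" | grep -Eq '\.[Hh][Tt][Mm][Ll]?$'; then
    echo no
  else
    echo yes
  fi
}

needs_html_suffix "http://example.com/page.asp"    # prints: yes
needs_html_suffix "http://example.com/index.HTM"   # prints: no
```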
-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few
options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon
repeated download. In other cases it will be preserved.
When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory
will result in the original copy of file being preserved and the second copy being named file.1.
If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is
also the behavior with -nd, even if -r or -p is in effect.) When -nc is specified, this
behavior is suppressed, and Wget will refuse to download newer copies of file.
When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result
in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead
causing the original version to be preserved and any newer copies on the server to be ignored.
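The numbered-suffix naming described for the plain case (no -N, -nc, -r, or -p) can be mimicked with a few lines of shell. This is only a sketch of the naming rule, assuming a hypothetical helper named next_name; it is not how Wget implements it.

```shell
# Hypothetical helper, not Wget code: prints the name a repeated download
# would get under the numbered-suffix scheme (file, file.1, file.2, ...).
next_name() {
  if [ ! -e "$1" ]; then
    echo "$1"
    return
  fi
  n=1
  while [ -e "$1.$n" ]; do
    n=$((n + 1))
  done
  echo "$1.$n"
}

# Demo in a scratch directory where file and file.1 already exist.
demo_dir=$(mktemp -d)
touch "$demo_dir/file" "$demo_dir/file.1"
next_name "$demo_dir/file"    # prints a path ending in file.2
rm -r "$demo_dir"
```

With -nc, by contrast, no new name is generated at all: the existing local copy is kept and the download is skipped.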
--random-wait
Some web sites may perform log analysis to identify retrieval programs such as Wget by looking
for statistically significant similarities in the time between requests. This option causes the
time between requests to vary between 0.5 * wait and 1.5 * wait seconds, where wait was specified using
the --wait option, in order to mask Wget's presence from such analysis.
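To make the range concrete, the sketch below computes the bounds for a given --wait value. The helper name random_wait_range is hypothetical. Note that the command at the top of these notes passes --random-wait without --wait; going by the formula alone, a wait of 0 yields a zero-width range, so pairing it with something like --wait=2 is probably what is intended.

```shell
# Hypothetical helper: prints the lower and upper bound (in seconds) of the
# delay range for a given --wait value, using the 0.5x to 1.5x rule.
random_wait_range() {
  awk -v w="$1" 'BEGIN { printf "%.1f %.1f\n", 0.5 * w, 1.5 * w }'
}

random_wait_range 2    # prints: 1.0 3.0
random_wait_range 0    # prints: 0.0 0.0
```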
-e command
--execute command
Execute command as if it were a part of .wgetrc. A command thus invoked will be executed after
the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one
wgetrc command, use multiple instances of -e.
-e robots=off
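Turn off robots.txt handling, so Wget ignores the site's robots.txt during recursive retrieval.
Here robots=off is a .wgetrc command passed through -e.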
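The command at the top of these notes does not itself pass -e robots=off. The sketch below shows where it would go, using a shortened subset of those flags. The function name mirror_page and the WGET override variable are assumptions made for illustration; the override exists only so the argument list can be inspected without touching the network.

```shell
# Sketch only. WGET is a hypothetical override so the command line can be
# inspected without network access (for example WGET=echo).
mirror_page() {
  "${WGET:-wget}" \
    --recursive \
    --level=1 \
    --convert-links \
    --page-requisites \
    --no-parent \
    --domains=example.com \
    -e robots=off \
    "$1"
}

# Dry run: print the arguments instead of fetching anything.
WGET=echo mirror_page http://example.com/path/to/my.html
```

Calling mirror_page without the WGET=echo prefix runs the real wget and performs network requests.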
-U agent-string
--user-agent=agent-string
Identify as agent-string to the HTTP server.
The HTTP protocol allows the clients to identify themselves using a "User-Agent" header field.
This enables distinguishing the WWW software, usually for statistical purposes or for tracing of
protocol violations. Wget normally identifies as Wget/version, version being the current version
number of Wget.
However, some sites have been known to impose the policy of tailoring the output according to the
"User-Agent"-supplied information. While this is not such a bad idea in theory, it has been
abused by servers denying information to clients other than (historically) Netscape or, more
frequently, Microsoft Internet Explorer. This option allows you to change the "User-Agent" line
issued by Wget. Use of this option is discouraged, unless you really know what you are doing.
Specifying an empty user agent with --user-agent="" instructs Wget not to send the "User-Agent"
header in HTTP requests.
--restrict-file-names=modes
Change which characters found in remote URLs must be escaped during generation of local
filenames.
The modes are a comma-separated set of text values. The acceptable values are unix, windows,
nocontrol, ascii, lowercase, and uppercase.
When "windows" is given, Wget escapes the characters \, |, /, :, ?, ", *, <, >, and the control
characters in the ranges 0-31 and 128-159. In addition to this, Wget in Windows mode uses +
instead of : to separate host and port in local file names, and uses @ instead of ? to separate
the query portion of the file name from the rest. Therefore, a URL that would be saved as
www.xemacs.org:4300/search.pl?input=blah in Unix mode would be saved as
www.xemacs.org+4300/search.pl@input=blah in Windows mode. This mode is the default on Windows.
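The two separator substitutions in that example can be reproduced with tr. This is an illustration of the mapping only, assuming a hypothetical helper named to_windows_name; it does not attempt the rest of the Windows-mode escaping.

```shell
# Illustration only: swaps the two separators named above (: -> +, ? -> @).
# Wget's real Windows mode also escapes other characters; this does not.
to_windows_name() {
  printf '%s\n' "$1" | tr ':?' '+@'
}

to_windows_name 'www.xemacs.org:4300/search.pl?input=blah'
# prints: www.xemacs.org+4300/search.pl@input=blah
```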
--no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option,
since it guarantees that only the files below a certain hierarchy will be downloaded.
-D domain-list
--domains=domain-list
Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does
not turn on -H.