Ubuntu - wget - Ignore robots.txt
By default, wget respects a site's robots.txt file during recursive downloads and therefore skips any files that the file marks as off-limits to robots.
The robots exclusion standard is purely advisory. The robots.txt file contains rules asking search engines and other robots not to access certain files, but a robot is free to ignore them.
Wget can be told to ignore these rules, in which case it downloads the disallowed files as well. Pass the -e option with robots=off, as shown next.
wget -e robots=off -r http://somesite.com
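The -e option executes its argument as if it were a line in a wget startup file, so the same setting can also be kept in such a file instead of being typed on every call. The following is a minimal sketch under that assumption; the file name wgetrc-norobots and its location are hypothetical and chosen only for illustration.

```shell
# Hypothetical location for a dedicated startup file; any writable path works.
rcfile="${TMPDIR:-/tmp}/wgetrc-norobots"

# Write the same command that "-e robots=off" passes on the command line.
printf 'robots = off\n' > "$rcfile"

# Show what was written.
cat "$rcfile"
```

To use the file for a single run, point the WGETRC environment variable at it, for example WGETRC="$rcfile" wget -r http://somesite.com. Keeping the setting in a separate file rather than in ~/.wgetrc means robots.txt is ignored only for the downloads where that is intended.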
ubuntu/wget/ignore_robots.txt.txt · Last modified: 2020/07/15 09:30 by 127.0.0.1