Ubuntu - wget - Ignore robots.txt

By default, wget respects the robots.txt file and therefore only downloads files that the site does not disallow.

The robots exclusion standard is purely advisory: the robots.txt file lists rules telling search engines and other robots which files they should not access, but a robot may choose to ignore them.
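For example, a site might publish a robots.txt like the following (a hypothetical example; the paths after Disallow are the ones well-behaved crawlers are asked to skip):

User-agent: *
Disallow: /private/
Disallow: /tmp/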

wget can be instructed to ignore those rules and download the disallowed files anyway. Set the -e option as shown next.

wget -e robots=off -r http://somesite.com
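
If you want this behaviour for every run, the same setting can be placed in your wgetrc file instead of passing -e on the command line (a minimal sketch; ~/.wgetrc is the per-user configuration file):

# ~/.wgetrc
robots = off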