====== Apache - Logs - Extract all user agents from Apache logs ======
awk -F\" '{print $6}' test.log | sort | uniq -c | sort -n
* where “test.log” is the access logfile to analyze.
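Why field 6? In Apache's Combined Log Format, the request, referrer, and user agent are each wrapped in double quotes, so splitting on ''"'' puts the user agent in the sixth field. A minimal sketch with a hypothetical log line:

```shell
# One hypothetical line in Combined Log Format (IP, identity, user, timestamp,
# "request", status, bytes, "referrer", "user agent"):
line='203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Twitterbot/1.0"'

# Split on double quotes: $2 is the request, $4 the referrer, $6 the user agent.
ua=$(printf '%s\n' "$line" | awk -F\" '{print $6}')
echo "$ua"   # prints: Twitterbot/1.0
```

Note that this assumes the Combined Log Format; if your ''LogFormat'' directive differs, the user agent may land in a different field.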
Returns output like:
51916 MetaURI API/2.0 +metauri.com
59899 Twitterbot/1.0
87819 Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
111261 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
187812 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)
189834 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
390477 facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
**NOTE:** The first number is the number of times that spider/crawler/user agent accessed the site.
* Beware: these are not all crawlers, as the counts include actual human user traffic and other legitimate traffic.
In the example above, notice that the “facebookexternalhit” user agent accessed the site 390,477 times in one month.
  * That is roughly 540 requests per hour (assuming a 30-day month). Excessive!
* On the kill list, you go!
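One hypothetical way to act on that kill list is to block the offending user agent in the Apache configuration. A sketch for Apache 2.4, assuming the agent string and the ''<Directory>'' context are adjusted to your setup:

```apache
# Hypothetical snippet for httpd.conf or a vhost.
# Tag requests whose User-Agent matches, then refuse them with 403.
SetEnvIfNoCase User-Agent "facebookexternalhit" bad_bot

<Directory "/var/www/html">
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Directory>
```

Returning a 403 is the blunt option; well-behaved crawlers can often be throttled more gently via ''robots.txt'' instead.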