Apache - Logs - Extract all user agents from Apache logs
awk -F\" '{print $6}' test.log | sort | uniq -c | sort -n
(where “test.log” is the access logfile you want to analyse).
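To see why field 6 is the user agent, it helps to split a single log line on double quotes. The following is a minimal sketch that assumes the Apache "combined" log format; the sample line is invented for illustration and is not taken from a real log.

```shell
# Hypothetical sample line in Apache "combined" log format (invented for illustration).
line='203.0.113.5 - - [17/Jul/2023:12:11:00 +0000] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/" "Twitterbot/1.0"'

# Splitting on double quotes gives: $2 = request, $4 = referer, $6 = user agent.
ua=$(printf '%s\n' "$line" | awk -F'"' '{print $6}')
printf '%s\n' "$ua"
```

If your logs use a different LogFormat, the user agent may sit in a different field, so check one line this way before running the full pipeline.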
Returns
 51916 MetaURI API/2.0 +metauri.com
 59899 Twitterbot/1.0
 87819 Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
111261 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
187812 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)
189834 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
390477 facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
The number in front of each user agent is the number of times that spider, crawler, or other user agent has accessed your site. Beware: not all of these are crawlers, as the data is intermixed with actual human user traffic and other useful traffic.
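One rough way to narrow the list down to likely crawlers is to filter on common keywords. This is only a sketch: the keyword list is an assumption, not an exhaustive rule, and it will miss agents that do not advertise themselves (note that the MetaURI row from the example is not matched).

```shell
# Three rows taken from the example output above.
counts='  51916 MetaURI API/2.0 +metauri.com
  59899 Twitterbot/1.0
 390477 facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)'

# Heuristic (assumed keyword list): keep rows whose user agent contains a
# typical crawler keyword, matched case-insensitively.
crawlers=$(printf '%s\n' "$counts" | grep -iE 'bot|crawl|spider|externalhit')
printf '%s\n' "$crawlers"
```

In practice the same grep can be appended to the end of the pipeline in place of the sample data.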
NOTE: In the example above, the “facebookexternalhit” user agent accessed the site 390,477 times in a month.
- That is roughly 541 times per hour. Excessive!
- On the kill list, you go!
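The hourly rate in the note can be checked with a quick calculation. This sketch assumes a 30-day month (720 hours), which is not stated in the original; under that assumption the result comes out at about 542, close to the rough figure quoted above.

```shell
# Monthly hit count for facebookexternalhit, from the example output.
hits=390477

# Assumption: the log covers a 30-day month, i.e. 30 * 24 = 720 hours.
hours=$((30 * 24))

# Use awk for the division so the result is rounded rather than truncated.
per_hour=$(awk -v h="$hits" -v n="$hours" 'BEGIN { printf "%.0f", h / n }')
echo "$per_hour hits per hour"
```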
apache/logs/extract_all_user_agents_from_apache_logs.1689595890.txt.gz · Last modified: 2023/07/17 12:11 by peter