nixCraft Linux Forum

Script to count unique ips in apache access log

#1 jaysunn, 18-09-2009, 02:13 AM

Thought this was cool. We needed a shell script to count the unique IPs that appear multiple times in an Apache access log. I came up with this script, pieced together from the web. It worked great:


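Code:
#!/bin/bash
# Print every client IP that appears more than 10 times in the access log.
FILE=/usr/local/apache/logs/access_log

for ip in $(cut -d ' ' -f 1 "$FILE" | sort -u); do
    # Escape the dots and anchor on the trailing space so 1.2.3.4 does not also match 1.2.3.45.
    COUNT=$(grep -c "^${ip//./\\.} " "$FILE")
    if [[ "$COUNT" -gt 10 ]]; then
        echo "$COUNT:   $ip"
    fi
done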
Here are the results on my test:

Code:
[root@forums1 bin]# ./ipcount.sh 
4416:   66.89.97.xxx
4415:   xx.72.16.18.xxx
56607:   16.187.xxx.xxx
55459:   xxx.xxx.xxx.195
Hope you have fun with this!!

PS: Please move this if it should be in Shell scripting.

jaysunn
Last edited by jaysunn; 23-11-2009 at 04:32 AM. Reason: Change the access log IP count
#2 cfajohnson, 28-12-2009, 10:45 AM

Code:
cut -d ' ' -f 1 "$FILE" | sort | uniq -c
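The pipeline above prints every address. A minimal sketch of adding back the first post's "more than 10" threshold, sorted by hit count, might look like this. The `count_ips` function name, its `MIN` argument and the sample log lines are assumptions for illustration, not something from the thread:

```shell
#!/bin/bash
# count_ips LOGFILE [MIN]: print "count ip" for addresses seen more than MIN times.
# (count_ips and MIN are hypothetical names, not from the thread.)
count_ips() {
    local file=$1 min=${2:-10}
    cut -d ' ' -f 1 "$file" | sort | uniq -c |
        awk -v min="$min" '$1 > min + 0 { print $1, $2 }' |
        sort -rn
}

# Build a tiny sample log (assumed data): 12 hits from one address, 3 from another.
sample=$(mktemp)
for i in {1..12}; do echo '192.0.2.1 - - "GET / HTTP/1.1" 200 512'; done > "$sample"
for i in {1..3}; do echo '192.0.2.20 - - "GET / HTTP/1.1" 200 512'; done >> "$sample"

count_ips "$sample" 10
```

On a real log you would call it as `count_ips "$FILE" 10`. The filtering happens after `uniq -c`, so the log is still read only once.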
#3 ghostdog74, 28-12-2009, 11:09 AM

Good effort, but here's a more efficient approach using just awk:

Code:
awk '{ a[$1]++ } END { for (i in a) if (a[i] > 10) print a[i], i }' file
Your script is inefficient because it reads the whole log file again for each unique IP address it finds. If there are 1000 unique IP addresses, grep scans the log file 1000 times... you get the idea.
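To make the difference concrete, here is a rough sketch that runs both styles on a small made-up log and prints the same answer twice. The function names `count_one_pass` and `count_per_ip` and the sample addresses are assumptions for illustration. The per-address version escapes the dots and anchors on the trailing space so that 192.0.2.1 is not also counted under 192.0.2.11:

```shell
#!/bin/bash
# Sample log (assumed data): 12, 11 and 2 hits from three documentation addresses.
log=$(mktemp)
for i in {1..12}; do echo '192.0.2.1 - - "GET / HTTP/1.1" 200 512'; done > "$log"
for i in {1..11}; do echo '192.0.2.11 - - "GET / HTTP/1.1" 200 512'; done >> "$log"
for i in {1..2}; do echo '192.0.2.2 - - "GET / HTTP/1.1" 200 512'; done >> "$log"

# One pass: awk keeps a counter per address and reads the log exactly once.
count_one_pass() {
    awk -v min="$2" '{ hits[$1]++ }
        END { for (ip in hits) if (hits[ip] > min + 0) print hits[ip], ip }' "$1" |
        sort -rn
}

# One grep per address: the log is read again for every unique address.
count_per_ip() {
    local ip n
    for ip in $(cut -d ' ' -f 1 "$1" | sort -u); do
        n=$(grep -c "^${ip//./\\.} " "$1")
        if [ "$n" -gt "$2" ]; then echo "$n $ip"; fi
    done | sort -rn
}

count_one_pass "$log" 10
count_per_ip "$log" 10
```

With three unique addresses the second function reads the log four times (once for `cut`, then once per address), and that number grows with every new address in the log.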
#4 cfajohnson, 28-12-2009, 12:30 PM

Quote:
Originally Posted by ghostdog74
Good effort, but here's a more efficient approach using just awk:

Code:
awk '{ a[$1]++ } END { for (i in a) if (a[i] > 10) print a[i], i }' file
The loop may be slower than calling separate commands that don't hold as much info in memory.

Last edited by cfajohnson; 28-12-2009 at 12:33 PM.
#5 ghostdog74, 28-12-2009, 01:03 PM

Quote:
Originally Posted by cfajohnson
The loop may be slower than calling separate commands that don't hold as much info in memory.

Sorry, what do you mean? Which loop are you referring to?
#6 cfajohnson, 28-12-2009, 05:15 PM

Quote:
Originally Posted by ghostdog74
Sorry, what do you mean? Which loop are you referring to?

The only loop there is:

Code:
for(i in a) ...
Explicit loops are always slower than implicit ones.
#7 ghostdog74, 28-12-2009, 07:00 PM

Quote:
Originally Posted by cfajohnson

The only loop there is:

Code:
for(i in a) ...
Explicit loops are always slower than implicit ones.

OK, so how do you propose to test this hypothesis that "explicit loops" may be slower than "implicit ones"?
#8 cfajohnson, 28-12-2009, 10:12 PM

Quote:
Originally Posted by ghostdog74
OK, so how do you propose to test this hypothesis that "explicit loops" may be slower than "implicit ones"?

If it isn't obvious (the code has to be interpreted every time through an explicit loop), then use the time command to test it.
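For anyone who wants to try that, a minimal sketch of such a test on a generated log could look like the following. The synthetic log contents and the function names `by_pipeline` and `by_awk` are assumptions, and the timings will differ from machine to machine:

```shell
#!/bin/bash
# Generate a synthetic 200,000-line log (assumed data, 350 distinct addresses).
log=$(mktemp)
awk 'BEGIN {
    for (i = 0; i < 200000; i++)
        printf "10.0.%d.%d - - \"GET / HTTP/1.1\" 200 1\n", i % 7, i % 50
}' > "$log"

# Approach from post #2: separate commands, grouping done by sort and uniq.
by_pipeline() {
    cut -d ' ' -f 1 "$1" | sort | uniq -c | awk '{ print $1, $2 }'
}

# Approach from post #3: one awk process with an explicit for-in loop at the end.
by_awk() {
    awk '{ n[$1]++ } END { for (ip in n) print n[ip], ip }' "$1"
}

# The bash time keyword reports real/user/sys on stderr for each call.
time by_pipeline "$log" > /dev/null
time by_awk "$log" > /dev/null
```

Note that the `for (ip in n)` loop runs once per unique address, not once per log line, so its cost depends on how many distinct addresses the log holds.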
#9 jaysunn, 28-12-2009, 10:28 PM

Hmm,


Code:
[root@forums2 ~]# time cut -d ' ' -f 1 "$FILE" | sort | uniq -c
   6585 10.4.20.236
      1 173.10.18.115
      1 187.61.17.37
      4 217.24.240.68
     14 41.223.30.22
      3 61.160.216.63
 159051 67.72.16.xxx
   6613 67.72.16.xxx
 159047 67.72.16.xxx
     10 75.148.211.109
      2 78.138.151.126

real	0m6.954s
user	0m6.952s
sys	0m0.055s
[root@forums2 ~]# time awk '{!a[$1]++}END{for(i in a) if ( a[i] >10 ) print a[i],i }' $FILE
159070 67.72.16.xxx
14 41.223.30.22
159074 67.72.16.xxx
6586 10.4.20.xxx
6614 67.72.16.xxx

real	0m0.214s
user	0m0.201s
sys	0m0.014s
[root@forums2 ~]#
Second one was pretty fast.


Jaysunn
Last edited by jaysunn; 28-12-2009 at 10:44 PM.
#10 nixcraft, 28-12-2009, 10:51 PM

Quote:
Originally Posted by jaysunn
Second one was pretty fast.
Jaysunn

I haven't tested this, but it is *possible* that the log file was already cached by the kernel when the second command ran. Can you run both commands on two different hosts with the same data file and post the results?
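One way to take caching out of the comparison on a single host is to read the file once, untimed, before each timed run, so that every command starts with the file likely already in the page cache. This is only a sketch. `warm_and_time` is a hypothetical helper name and the demo file is made-up data:

```shell
#!/bin/bash
# warm_and_time FILE CMD [ARGS...]: read FILE once, untimed, so it is likely
# in the page cache, then time CMD. (Hypothetical helper, not from the thread.)
warm_and_time() {
    local file=$1
    shift
    cat "$file" > /dev/null
    time "$@" > /dev/null
}

# Tiny demo file (assumed data) so the sketch runs on its own.
demo=$(mktemp)
printf '192.0.2.1 - -\n192.0.2.1 - -\n192.0.2.2 - -\n' > "$demo"

warm_and_time "$demo" awk '{ a[$1]++ } END { for (i in a) print a[i], i }' "$demo"
```

To time a whole pipeline this way, wrap it in a shell function and pass the function name as CMD. For a cold-cache comparison instead, root on Linux can run `sync; echo 3 > /proc/sys/vm/drop_caches` before each command. Also worth noting: in the transcript above the first command's `user` time is almost equal to its `real` time, which suggests it was busy on the CPU, not waiting on disk.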


All times are GMT +5.5.