Linux / UNIX Tech Support Forum
This is a discussion on "Script to count unique IPs in Apache access log" within the Getting started tutorials forum, part of the Linux Getting Started category.
The for loop runs only in awk's END block, after the last line has been processed. If that is not what you mean, then I still don't know what you are talking about.
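To make the point about END concrete, here is a minimal sketch. The three-line input and the function name `demo_end_block` are made up for illustration. The main block runs once per input line, while the END block runs a single time after the last line has been read.

```shell
#!/bin/sh
# demo_end_block: hypothetical helper that feeds three lines to awk.
demo_end_block() {
    printf 'a\nb\nc\n' | awk '
        { seen++; print "line", NR }                        # runs for every input line
        END { print "END ran once after", seen, "lines" }   # runs once, at the very end
    '
}

demo_end_block
```

So a `for (i in a)` loop placed in END walks the finished array once; it does not loop over the array for every line of the log.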
__________________
Python tutorial | PHP manual | Bash Ref | Perl documentation | Awk Examples | Gawk | File Renamer Last edited by ghostdog74; 12-29-2009 at 04:36 AM. |
It's pretty obvious even without using the time command, because of the extra pipes to sort and uniq. With the awk command, the count is done and stored in memory as it reads through the file. Now imagine the file is >100MB in size...
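A minimal sketch of the two approaches side by side, assuming the client IP is the first space-separated field. The sample log lines and the function names `sample_log`, `count_with_pipeline` and `count_with_awk` are made up for illustration and are not from the thread.

```shell
#!/bin/sh
# Made-up sample data standing in for a real access log.
sample_log() {
    printf '%s\n' \
        '10.0.0.1 - - [28/Dec/2009:19:14:00 +0000] "GET / HTTP/1.1" 200 512' \
        '192.168.1.5 - - [28/Dec/2009:19:14:01 +0000] "GET /a HTTP/1.1" 200 128' \
        '10.0.0.1 - - [28/Dec/2009:19:14:02 +0000] "GET /b HTTP/1.1" 200 256' \
        '10.0.0.2 - - [28/Dec/2009:19:14:03 +0000] "GET / HTTP/1.1" 404 64' \
        '192.168.1.5 - - [28/Dec/2009:19:14:04 +0000] "GET /c HTTP/1.1" 200 128' \
        '10.0.0.1 - - [28/Dec/2009:19:14:05 +0000] "GET /d HTTP/1.1" 200 256'
}

# Pipeline approach: cut, sort and uniq are three separate processes, and
# sort has to order every line before uniq can count adjacent duplicates.
# The trailing awk only strips the padding that uniq -c adds.
count_with_pipeline() {
    cut -d ' ' -f 1 | sort | uniq -c | awk '{ print $1, $2 }'
}

# awk approach: a single pass that keeps one counter per distinct IP in memory.
count_with_awk() {
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }'
}

# awk's "for (ip in count)" order is unspecified, so sort both outputs by IP
# before comparing them.
sample_log | count_with_pipeline | LC_ALL=C sort -k2
sample_log | count_with_awk | LC_ALL=C sort -k2
```

Both commands produce the same counts. The difference is that the pipeline sorts every line of the log, while awk only ever holds one entry per distinct IP.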
Last edited by ghostdog74; 12-29-2009 at 04:37 AM.
Just for the record.
GhostDog, thank you for paying attention to the OP (which is me) and to my requirement that any unique IP with fewer than 10 hits be disregarded. And as for the access log size: Code:
[root@forums1 logs]# ls -lah
total 296M
drwxr-xr-x  3 root root 4.0K Dec 27 11:10 .
drwxr-xr-x 12 root root 4.0K Nov  5 20:17 ..
-rw-r--r--  1 root root 245M Dec 28 19:14 access_log
Thanks again to you all,
Jaysunn
@ghostdog74
Just wondering: if I add more criteria to the cut command provided by Mr. Johnson, wouldn't it become slower? After all, I am learning. Please give me the commands you would like tested and I will be sure to run them.
Jaysunn
Criteria like what? For such questions, it's best to test it out (even if I can tell you straight away). After all, the shell is there waiting for you to learn about it. Run the time command on your big access log and see ....
I ran it over a 1GB log file (on two different hosts) and, as ghostdog74 said, it is clearly the extra pipes in the cut/sort/uniq version that take all the time. awk is the way to go for your problem.
In theory a C program could improve speed a bit, but I highly doubt even that, as awk is already optimized for this kind of work. @jay, use two different hosts (i.e. run cut/uniq on serverA and awk on serverB, with the same hardware, OS, and kernel) and the same data file. Otherwise the kernel will keep the recently read file in RAM (the page cache) and skip disk I/O the second time the file is read, which favors whichever command runs second. Separate hosts give you a fair comparison. As I said earlier, I did this and awk is way faster...
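If a second identical host is not available, one alternative (an assumption on my part, not something tested in this thread) is to read the file once before timing, so that both commands start from a warm page cache and neither pays the disk I/O alone. The function names `warm_cache` and `time_both` are hypothetical.

```shell
#!/bin/sh
# warm_cache FILE: read the file once and discard it, so that later reads
# are likely to be served from the page cache.
warm_cache() {
    cat "$1" > /dev/null
}

# time_both FILE: time the pipeline and the awk command on the same file.
# The pipeline is wrapped in "sh -c" so the whole pipeline is timed even
# in shells where time is an external command rather than a keyword.
time_both() {
    warm_cache "$1" || return 1
    echo "cut | sort | uniq -c:"
    time sh -c 'cut -d " " -f 1 "$1" | sort | uniq -c > /dev/null' sh "$1"
    echo "awk:"
    time awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$1" > /dev/null
}
```

Usage would be something like `time_both /path/to/access_log`. This measures CPU and pipe overhead only; it says nothing about cold-disk behaviour, which is what the two-host setup above captures.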
__________________
Vivek Gite
Linux Evangelist
Last edited by nixcraft; 12-29-2009 at 09:29 AM.
This is one of the deepest admin-work discussions I have come across since I joined nixCraft. It prompted me to keep two things in mind when writing scripts:
1) Code should be as small as possible while still achieving the goal, irrespective of the language used.
2) A script should take as little time as possible to run, so that it uses minimal system resources.
Thanks to all of you for your valuable suggestions and for sharing your knowledge.
__________________
Thanks, Surendra Kumar Anne Ubuntu: Simple, Stylish and Striking..! Linux: Fast, friendly, flexible and .... free! Support Open source. |
OK, and not to beat a dead horse, but I had to show the results of a cut command that produces the same output, under the same stipulations, as the awk command. This is a fresh host that has never run any of these commands, to rule out kernel caching.
And as everyone suggested, awk is the champ. Thanks to all for making this happen. I learned a lot. Code:
[root@server1 testing]# ls -lah
total 255M
drwxr-xr-x  2 root root 4.0K Dec 29 09:18 .
drwxr-x--- 15 root root 4.0K Dec 29 09:18 ..
-rw-r--r--  1 root root 255M Dec 29 09:15 access_log
Code:
[root@server1 testing]# FILE=/root/testing/access_log
Code:
[root@server1 testing]# time cut -d ' ' -f 1 "$FILE" | sort | uniq -c | grep '[0-9][0-9] '
598129 10.4.20.236
179838 67.72.16.xxx
215 67.72.16.xxx
7470 67.72.16.xxx
414332 67.72.16.xxx
884701 67.72.16.xxx
880528 67.72.16.xxx
379 67.86.131.xxx
476 68.195.209.xxx
166 68.195.209.xxx
38 76.19.14.47
real 2m0.744s
user 2m11.299s
sys 0m0.758s
Code:
[root@server1 testing]# time awk '{!a[$1]++}END{for(i in a) if ( a[i] >10 ) print a[i],i }' access_log
880528 67.72.16.xxx
414332 67.72.16.xxx
884701 67.72.16.xxx
215 67.72.16.xxx
476 68.195.209.xxx
379 67.86.131.xxx
179838 67.72.16.xxx
166 68.195.209.xxx
38 76.19.14.47
7470 67.72.16.xxx
598129 10.4.20.xxx
real 0m2.756s
user 0m2.489s
sys 0m0.277s
[root@server testing]#
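One small note on the two commands: `grep '[0-9][0-9] '` keeps counts of two or more digits (10 and up), while `a[i] >10` keeps 11 and up, so they would differ for an IP with exactly 10 hits. The `!` in `!a[$1]++` also has no effect on the counting. A hedged variant, assuming the intent is "at least 10 hits" and that busiest-first output is wanted; the function name `top_ips` and its `MIN` argument are illustrative only.

```shell
#!/bin/sh
# top_ips MIN [FILE...]: print "count ip" for every IP (first field) seen
# at least MIN times, busiest first. Reads stdin when no FILE is given.
top_ips() {
    min=$1
    shift
    awk -v min="$min" '
        { count[$1]++ }                         # one counter per distinct IP
        END {
            for (ip in count)
                if (count[ip] >= min + 0)       # keep IPs with at least min hits
                    print count[ip], ip
        }
    ' "$@" | LC_ALL=C sort -k1,1nr -k2,2        # highest count first, then by IP
}
```

Usage would be `top_ips 10 access_log`. The final sort only sees one line per reported IP, so it adds very little on top of the awk pass.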
Jaysunn
Last edited by jaysunn; 01-01-2010 at 09:07 PM.