If you are a sysadmin, logs can be both a bane and a boon to your existence. On a good day, logs show you every clue you need to track down any of a hundred strange system problems. On a bad day, a misbehaved program could dump gigabytes of errors into its log file, fill up the disk and light up your pager like a Christmas tree. Now, if you manage any Web servers, logs provide even more valuable information in terms of statistics. How many visitors did you get to your main index page today? What spider is hammering your site right now?

Any number of excellent log-analysis programs exist. Some provide really nifty real-time visualizations of Web traffic, and others run every night and generate manager-friendly reports for you to browse. All of these programs are great, and I suggest you use them, but sometimes you need specific statistics and you need them now. For these on-the-fly statistics, I've developed a common template for a shell one-liner that chops through logs like Paul Bunyan.

What I've found is that although the specific type of information I need might change a little, for the most part the algorithm remains mostly the same. For any log file, each line contains some bit of unique information I need. I then need to run through the log file, identify that information and keep a running tally that increments each time I see the particular pattern. Finally, I need to output that information along with its final tally and sort based on the tally.

There are many ways you can do this type of log parsing. Old-school command-line junkies might prefer a nice sed and awk approach. The whipper-snappers out there might pick a nicely formatted Python script. There's nothing at all wrong with those approaches, but I suppose I fall into the middle-child scripting category: I prefer Perl for this kind of text hacking. Maybe it's the power of Perl regular expressions, or maybe it's how easy it is to use Perl hashes, or maybe it's just what I'm most comfortable with, but I seem to be able to hack out this kind of script much faster in Perl.

Before I give a sample script, though, here's a more specific algorithm. The script parses each line of input and uses a regular expression to match a particular column or other pattern of data on the line. It then uses that pattern as a key in a hash table and increments the value of that key. When it's done accepting input, the script iterates through each key in the hash and outputs the tally for that key and the key itself.

For the test case, I use a general-purpose problem you can try yourself, as long as you have an Apache Web server: I want to find out how many unique IP addresses visited one of my sites on November 1, 2008, and the top ten IPs in terms of hits. Here's a sample entry from the log (the IP has been changed to protect the innocent; the middle of the entry is elided here):

    123.123.12.34 - "GET /talks/pxe/ui/default/iepngfix.htc HTTP/1.1" ... .NET CLR ... InfoPath.2)"

And here's the one-liner that can parse the file and provide sorted output:
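The listing below is a minimal sketch of such a one-liner, reconstructed from the description above rather than quoted verbatim; the log path /var/log/apache2/access.log and the hash name %count are assumptions to adjust for your own system:

    # Tally hits per IP for November 1, 2008, then sort by tally (ascending).
    perl -e 'while (<>) { $count{$1}++ if m{^(\d+\.\d+\.\d+\.\d+).*\[01/Nov/2008}; }
    foreach (keys %count) { print "$count{$_}\t$_\n"; }' \
        /var/log/apache2/access.log | sort -n

Each line of output is a tally, a tab and an IP address, so the number of output lines is the count of unique visitors, and sort -n ranks the IPs by number of hits.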
The power of a hash is that it forces each key to be unique. That means each time I come across a new IP, it gets its own key in the hash and has its value incremented. So, if it's the first time I see the IP, its value will be one. The second time I see the IP, it will increment to two, and so on. Once I'm done iterating through each line in the file, I then drop to a foreach loop like the one sketched below. Basically, all that loop does is step through every key in the hash and output its value (the number of times I matched that IP in the file) and the IP itself.

Notice that I didn't sort the output within Perl itself. I very well could have (Perl has powerful methods to sort output), but to make the code simpler and more flexible, I opted to pipe the output to the command-line sort command. That way, even if you don't know Perl too well but know the command line, you could tweak arguments in sort to reverse the output or even pipe it further to tail, so you could see only the top ten IPs.
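Here is a sketch of that closing loop, again assuming the tally hash is named %count:

    foreach my $ip (keys %count) {
        # print the tally first, then the IP itself, tab-separated
        print "$count{$ip}\t$ip\n";
    }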
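And a usage note on the pipeline: with the one-liner above standing in for the ... below, the standard sort and tail/head switches get you the top-ten list:

    ... | sort -n | tail -n 10    # ten biggest tallies, at the bottom
    ... | sort -rn | head -n 10   # the same ten, biggest first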