grep, cat, zgrep and zcat
More notes on log parsing: how to read log files and pull out the information I need. On Linux, these tools are ideal for the job: grep, cat, zgrep and zcat.
Extracting patterns with grep
We can extract information from a text file using grep. For example, we can extract the lines of a log file that contain a request such as GET /checkout/* with a status code of 500.
grep -E -e 'GET /checkout/.* HTTP/1\.(0|1)" 500' some-log-file.log
Depending on the Apache log format, the command above extracts the lines whose request path starts with /checkout/ and whose status code is 500, for either HTTP/1.0 or HTTP/1.1. It prints each matching line in full. To print only the part of the line that matches the pattern, add the -o option.
grep -o -E -e 'GET /checkout/.* HTTP/1\.(0|1)" 500' some-log-file.log
To save the matching lines to a file, redirect the output.
grep -E -e 'GET /checkout/.* HTTP/1\.(0|1)" 500' some-log-file.log > checkout-errors.txt
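A minimal sketch of the two variants wrapped in shell functions, assuming an Apache-style log where the quoted request is followed directly by the status code. The function names are made up for illustration. The path is matched with [^ ]* instead of .* so that a match cannot run past the request itself, and [01] is a shorter spelling of (0|1).

```shell
# Hypothetical helper: print only the matching part of each /checkout/
# request that returned status 500.
# $1: path to an uncompressed access log
checkout_500s() {
    grep -o -E -e 'GET /checkout/[^ ]* HTTP/1\.[01]" 500' -- "$1"
}

# Hypothetical helper: count the matching lines instead of printing them.
# $1: path to an uncompressed access log
count_checkout_500s() {
    grep -c -E -e 'GET /checkout/[^ ]* HTTP/1\.[01]" 500' -- "$1"
}

# Usage (file name is a placeholder):
#   checkout_500s some-log-file.log > checkout-errors.txt
#   count_checkout_500s some-log-file.log
```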
Using cat
cat is usually used to print the contents of a file. It is a small but very useful Linux utility. For example, we can combine multiple uncompressed log files into a single log file.
cat /path/to/log-files/*.log > /combined/log-file.log
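The same step as a small function, as a sketch. The function name and its two arguments are assumptions for illustration. The one thing to watch is that the output file lives outside the source directory (or does not end in .log), because on a second run the glob would otherwise match the combined file as well.

```shell
# Hypothetical helper: concatenate every *.log file in a directory into
# a single file. The shell expands the glob in sorted name order.
# $1: directory holding the uncompressed logs
# $2: output file (keep it outside $1)
combine_logs() {
    cat -- "$1"/*.log > "$2"
}

# Usage (paths are placeholders):
#   combine_logs /path/to/log-files /combined/log-file.log
```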
Compressed counterpart
grep and cat each have a counterpart that works on compressed files. For grep, there’s zgrep.
zgrep -E -e 'GET /checkout/.* HTTP/1\.(0|1)" 500' some-log-file.gz > checkout-errors.txt
For cat, there’s zcat.
zcat /path/to/log-files/*.gz > /combined/log-file.log
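Log directories often hold a mix of current (plain) and rotated (gzip-compressed) files. Here is a sketch of one way to search both in a single pass. The function name is made up, and it uses gzip -cd, which does the same job as zcat.

```shell
# Hypothetical helper: print lines matching an extended regex from a mix
# of plain and gzip-compressed log files.
# $1: extended regular expression; remaining arguments: log files
search_logs() {
    pattern=$1
    shift
    for f in "$@"; do
        case $f in
            *.gz) gzip -cd -- "$f" | grep -E -e "$pattern" ;;  # decompress to stdout, then filter
            *)    grep -E -e "$pattern" -- "$f" ;;             # plain file: grep it directly
        esac
    done
}

# Usage (paths are placeholders):
#   search_logs 'GET /checkout/.* HTTP/1\.(0|1)" 500' /path/to/log-files/* > checkout-errors.txt
```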
I ran so many combinations last week that I can’t remember them all, so I can’t include them in this post. Happy log parsing.
# List out successful ssh login attempts
cat secure | grep 'Accepted' | awk '{print $1 " " $2 " " $3 " User: " $9 " " }'
cat secure* | sort | grep 'Accepted' | awk '{print $1 " " $2 " " $3 " User: " $9 " IP:" $11 }'
# List out successful ssh login attempts from sudo users
cat /var/log/secure | grep 'session opened for user root' | awk '{print $1 " " $2 " " $3 " Sudo User: " $13 " " }'
# List out ssh login attempts from non-existing and unauthorized user accounts
cat /var/log/secure | grep 'Invalid user'
# List out ssh login attempts by authorized ssh accounts with failed password
cat /var/log/secure | grep -v invalid | grep 'Failed password'
Indeed, and even grep | awk can be shortened to awk /…/, so you could save a bit of space in the final script. For a typical log file (~200 kB), you might also save about 1 ms when processing it. To be exact: 1.8 ms by removing the cat and grep, and 0.3 ms by using only awk instead of grep | awk.
time for i in `seq 1000`; do cat secure | grep Accepted | awk '{print $1 " " $2 " " $3 " User: " $9 " " }'; done > /tmp/a
time for i in `seq 1000`; do grep Accepted secure | awk '{print $1 " " $2 " " $3 " User: " $9 " " }'; done > /tmp/b
time for i in `seq 1000`; do awk '/Accepted/ {print $1 " " $2 " " $3 " User: " $9 " " }' secure; done > /tmp/c
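For reference, a sketch of the awk-only form as a reusable function. The function name is made up, and the field positions ($9 for the user, $11 for the IP) assume the usual sshd line layout, "Mon DD HH:MM:SS host sshd[pid]: Accepted method for user from ip port ...". Check a sample line from your own log before relying on them.

```shell
# Hypothetical helper: awk-only replacement for "cat | grep | awk".
# awk filters on the pattern and prints the fields in one process.
# $1: path to the secure/auth log
accepted_logins() {
    awk '/Accepted/ { print $1 " " $2 " " $3 " User: " $9 " IP: " $11 }' "$1"
}

# Usage (path is a placeholder):
#   accepted_logins /var/log/secure
```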
More interestingly, when the log file grows to 200 MB, the cat | grep | awk chain turns out to be significantly faster, at 1.096 s over 100 runs. Each stage of the pipe runs as its own process, so the chain can keep several CPU cores busy at once, while the single awk command cannot.
for i in `seq 1000`; do cat secure >> s; done
time for i in `seq 100`; do cat s | grep Accepted | awk '{print $1 " " $2 " " $3 " User: " $9 " " }'; done > /tmp/a1
time for i in `seq 100`; do awk '/Accepted/ {print $1 " " $2 " " $3 " User: " $9 " " }' s; done > /tmp/c1
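Before comparing timings it is worth confirming that both variants print the same thing. A small sketch, with a made-up function name, that checksums the output of the pipe chain and of the single awk command instead of holding a 200 MB result in memory:

```shell
# Hypothetical helper: succeed only if the pipe chain and the awk-only
# command produce byte-identical output for the given log file.
# $1: path to the log file
outputs_match() {
    chain=$(cat "$1" | grep Accepted | awk '{print $1 " " $2 " " $3 " User: " $9 " " }' | cksum)
    single=$(awk '/Accepted/ {print $1 " " $2 " " $3 " User: " $9 " " }' "$1" | cksum)
    [ "$chain" = "$single" ]
}

# Usage (file name is a placeholder):
#   outputs_match s && echo "same output"
```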
My Perl script, which calls awk:
Code:
my $OldTime = 'Sep 10, 2012 5:20:41';
my $NewTime = 'Sep 10, 2012 5:49:40';
my $test2 = qx{ssh -o stricthostkeychecking=no $WLS "awk '/$OldTime/,/$NewTime/' $WLSP/logs/CDSServer.*"};
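For anyone unfamiliar with the awk range pattern used above, here is a local sketch of the same idea. The function name is made up, and it matches the two timestamps as literal substrings with index() rather than interpolating them into /.../. Note the limitation: printing only starts if the exact start string occurs in the log, and if the stop string never occurs, awk prints through to the end of the file.

```shell
# Hypothetical helper: print everything from the first line containing
# the start string through the next line containing the stop string.
# $1: start string, $2: stop string, $3: log file
lines_between() {
    awk -v start="$1" -v stop="$2" 'index($0, start), index($0, stop)' "$3"
}

# Usage (file name is a placeholder):
#   lines_between 'Sep 10, 2012 5:20:41' 'Sep 10, 2012 5:49:40' CDSServer.log
```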
My log file format:
Code:
####
####
####
#!/usr/bin/perl
use strict;
use warnings;
use Date::Parse;
# get command line arguments (3)
die "E.g.: $0 'Sep 10, 2012 5:20:41 PM' 'Sep 10, 2012 5:49:40 PM' CDSServer.log\n"
    unless($#ARGV == 2);
my $startTime = $ARGV[0];
my $stopTime = $ARGV[1];
my $log = $ARGV[2];
# make sure the log file exists
die "$log: No such file\n" unless(-f $log);
# convert date/time strings to seconds since epoch
my $start_sec = str2time($startTime);
my $stop_sec = str2time($stopTime);
print "Start time: $startTime ($start_sec)\n";
print "Stop time: $stopTime ($stop_sec)\n";
open(LOG,'<',$log) or die "can't read '$log': $!\n";
while(<LOG>){
    chomp;
    my $line = $_; # save original line
    s/[ \t]+/ /g; # replace contiguous white spaces w/single space
    if(/^####<([a-zA-Z]{3} [0-9]{1,2}, [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M) [A-Z]{3}>/){
        my $timedate = $1;
        # convert date/time string in log entry to epoch seconds
        my $seconds = str2time($timedate);
        # print line if it falls into the range
        print $line,"\n" if(($seconds >= $start_sec)&&($seconds <= $stop_sec));
    }
}
close(LOG);
09-19-2012 #3
charith
Hi atreyu,
Thank you very much for your great, clean reply.
Your script works fine and returns the lines that match the given times, but what I need is every line that falls within that time range. I'm sorry for my bad log file format; I have attached the correct log file below. [Errors do not begin with a time.]
Code:
####
###
#!/usr/bin/perl
use strict;
use warnings;
use Date::Parse;
# get command line arguments (3)
die "E.g.: $0 'Sep 10, 2012 5:20:41 PM' 'Sep 10, 2012 5:49:40 PM' CDSServer.log\n"
    unless($#ARGV == 2);
my $startTime = $ARGV[0];
my $stopTime = $ARGV[1];
my $log = $ARGV[2];
# make sure the log file exists
die "$log: No such file\n" unless(-f $log);
# convert date/time strings to seconds since epoch
my $start_sec = str2time($startTime);
my $stop_sec = str2time($stopTime);
# make sure both conversions succeeded (str2time returns undef on failure)
die "Failed to convert $startTime to seconds\n" unless(defined $start_sec);
die "Failed to convert $stopTime to seconds\n" unless(defined $stop_sec);
print "Start time: $startTime ($start_sec)\n";
print "Stop time: $stopTime ($stop_sec)\n";
my @lines;
my $stop;
open(LOG,'<',$log) or die "can't read '$log': $!\n";
while(<LOG>){
    chomp;
    my $line = $_; # save original line
    s/[ \t]+/ /g; # replace contiguous white spaces w/single space
    if(/^####<([a-zA-Z]{3} [0-9]{1,2}, [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M) [A-Z]{3}>/){
        my $timedate = $1;
        # convert date/time string in log entry to epoch seconds
        my $seconds = str2time($timedate);
        # save line w/time string to array if it falls into the range
        # (as written, this test has no upper bound, so the elsif is never reached)
        if($seconds >= $start_sec){
            push(@lines,$line) unless($stop);
        }elsif($seconds >= $stop_sec){
            push(@lines,$line) unless($stop);
            $stop = 1;
        }
    }else{
        # save line w/o time string to array if it falls into the range
        push(@lines,$line) if(($#lines>=0)&&!($stop));
    }
}
close(LOG);
# print the saved lines
print "$_\n" for(@lines);
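As a rough alternative for machines without Date::Parse, the same "timestamped line decides, continuation lines follow" idea can be sketched in awk. This is not the script above: the function name is made up, the timestamp is turned into a sortable YYYYMMDDhhmmss key by hand, and it assumes English month abbreviations, the ####<Mon D, YYYY H:MM:SS AM|PM ZONE> layout matched by the regex above, and that time zones can be ignored.

```shell
# Hypothetical helper: print every log entry (including continuation
# lines without a timestamp) whose timestamp lies in [start, stop].
# $1: start time, $2: stop time, $3: log file
lines_in_range() {
    awk -v start="$1" -v stop="$2" '
    # Turn "Sep 10, 2012 5:20:41 PM" into the sortable key 20120910172041.
    function key(ts,    p, t, h) {
        gsub(/,/, "", ts)
        split(ts, p, " ")      # p[1]=Mon p[2]=D p[3]=YYYY p[4]=H:MM:SS p[5]=AM|PM
        split(p[4], t, ":")
        h = t[1] % 12          # 12 AM becomes 0, 12 PM becomes 12 after the next line
        if (p[5] == "PM") h += 12
        return sprintf("%04d%02d%02d%02d%02d%02d", p[3],
            (index("JanFebMarAprMayJunJulAugSepOctNovDec", p[1]) + 2) / 3,
            p[2], h, t[2], t[3])
    }
    BEGIN { lo = key(start); hi = key(stop) }
    /^####</ {
        ts = substr($0, 6)           # drop the leading ####<
        sub(/ [A-Z]+>.*/, "", ts)    # drop the time zone and the rest of the line
        k = key(ts)
        keep = (k >= lo && k <= hi)  # decision holds until the next timestamped line
    }
    keep
    ' "$3"
}

# Usage (file name is a placeholder):
#   lines_in_range 'Sep 10, 2012 5:20:41 PM' 'Sep 10, 2012 5:49:40 PM' CDSServer.log
```

Unlike the awk range pattern, the start and stop times do not have to appear literally in the log.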
09-24-2012 #5
charith
Hi atreyu,
It’s working fine, thank you very much.
I made one small change:
Code:
# save line w/time string to array if it falls into the range
if($seconds >= $start_sec){
became
Code:
if (($seconds >= $start_sec)&&($seconds <= $stop_sec)){