Parsing – Logs

grep, cat, zgrep and zcat

More on log parsing: I'm taking notes on how to read log files and get the information I need. In a Linux environment, these tools are perfect: grep, cat, zgrep and zcat.

Extracting patterns with grep

We can extract information from a text file using grep. For example, we can extract the lines of a log file that contain a request like GET /checkout/* with status code 500.

grep -E -e 'GET /checkout/.* HTTP/1\.(0|1)" 500' some-log-file.log
Depending on the Apache log format, the command above extracts lines whose request is GET /checkout/* and whose status code is 500, for both HTTP/1.0 and HTTP/1.1. However, it prints the whole line. To extract only the matching pattern, use the -o option.

grep -o -E -e 'GET /checkout/.* HTTP/1\.(0|1)" 500' some-log-file.log
And to save the matching lines to a file, simply redirect the output to a file.

grep -E -e 'GET /checkout/.* HTTP/1\.(0|1)" 500' some-log-file.log > checkout-errors.txt
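Before pointing the pattern at a real log, you can try it on a few lines. The sketch below writes some invented Apache "combined"-format lines to a temporary file and runs the three commands from above against it. It makes one deliberate change that is my own assumption, not the post's: it matches the path with [^ ]* instead of .*, so the match cannot run on into the referer or user-agent fields. It also writes [01] instead of (0|1), which means the same thing.

```shell
#!/usr/bin/env bash
# Sketch: find HTTP 500 responses to GET /checkout/* in an Apache "combined" log.
# The sample lines are invented for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.0.2.1 - - [10/Apr/2024:10:00:00 +0000] "GET /checkout/cart HTTP/1.1" 500 512 "-" "curl/8.0"
192.0.2.2 - - [10/Apr/2024:10:00:01 +0000] "GET /checkout/pay HTTP/1.0" 200 128 "-" "curl/8.0"
192.0.2.3 - - [10/Apr/2024:10:00:02 +0000] "GET /home HTTP/1.1" 500 64 "-" "curl/8.0"
192.0.2.4 - - [10/Apr/2024:10:00:03 +0000] "GET /checkout/pay HTTP/1.0" 500 64 "-" "curl/8.0"
EOF

# [^ ]* stops at the first space, so the match stays inside the request line.
pattern='GET /checkout/[^ ]* HTTP/1\.[01]" 500'

grep -c -E -e "$pattern" "$sample"                   # prints 2 (matching lines)
grep -o -E -e "$pattern" "$sample"                   # only the matching part
grep -E -e "$pattern" "$sample" > "$sample.errors"   # whole lines, saved to a file
```

Adding -c is a quick way to check the pattern's hit count before saving anything.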
Using cat

cat is usually used to output the contents of a file. It is a small but very useful Linux utility. For example, we can combine multiple (uncompressed) log files into a single log file.

cat /path/to/log-files/*.log > /combined/log-file.log
Compressed counterpart

grep and cat each have a counterpart for compressed (gzip) files. For grep, there's zgrep.

zgrep -E -e 'GET /checkout/.* HTTP/1\.(0|1)" 500' some-log-file.gz > checkout-errors.txt
For cat, there’s zcat.

zcat /path/to/log-files/*.gz > /combined/log-file.log
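Rotated logs are often a mix of plain and compressed files. The sketch below assumes GNU gzip and uses invented file names (app.log, app.log.1, app.log.2.gz). zgrep searches both kinds of file. zcat -f passes files that are not gzip-compressed through unchanged, so a single command can combine all of them.

```shell
#!/usr/bin/env bash
# Sketch: search and merge a mix of plain and gzip-compressed rotated logs.
# Assumes GNU gzip; the directory and file names are made up.
dir=$(mktemp -d)
printf '%s\n' 'GET /checkout/a HTTP/1.1" 500' > "$dir/app.log"
printf '%s\n' 'GET /checkout/b HTTP/1.1" 200' > "$dir/app.log.1"
printf '%s\n' 'GET /checkout/c HTTP/1.0" 500' | gzip > "$dir/app.log.2.gz"

# zgrep decompresses .gz files on the fly and reads plain files as they are;
# with several files it prefixes each match with the file name.
zgrep -E -e 'HTTP/1\.[01]" 500' "$dir"/app.log*

# zcat -f (--force) copies non-gzip input unchanged, so plain and compressed
# files can be combined in one pass. The output is written outside "$dir"
# so that a later glob over the directory does not pick it up.
zcat -f "$dir"/app.log* > "$dir.combined.log"
```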
I tried so many combinations last week that I don't remember them all, so not all of them made it into this post. Happy log parsing!

# List out successful ssh login attempts
cat secure | grep 'Accepted' | awk '{print $1 " " $2 " " $3 " User: " $9 " " }'
cat secure* | sort | grep 'Accepted' | awk '{print $1 " " $2 " " $3 " User: " $9 " IP:" $11 }'

# List out root sessions opened by sudo users
cat /var/log/secure | grep 'session opened for user root' | awk '{print $1 " " $2 " " $3 " Sudo User: " $13 " " }'

# List out ssh login attempts from non-existing and unauthorized user accounts
cat /var/log/secure | grep 'Invalid user'

# List out ssh login attempts by existing accounts with a failed password
cat /var/log/secure | grep -v invalid | grep 'Failed password'
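The recipes above rely on fixed field numbers ($9 for the user, $11 for the IP). Those positions shift when sshd writes "Failed password for invalid user NAME from IP". The sketch below shows one alternative. It assumes RHEL-style /var/log/secure lines like the invented samples in the code, and it looks for the word "from": the field just before it is taken as the user and the field just after it as the client IP. The function names ssh_auth_events and root_sessions are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarise sshd and sudo events from a RHEL-style /var/log/secure.
# Sample lines are invented; real message wording can vary by OpenSSH/PAM version.
secure_sample=$(mktemp)
cat > "$secure_sample" <<'EOF'
Sep 10 05:20:41 web1 sshd[1001]: Accepted password for alice from 192.0.2.10 port 52311 ssh2
Sep 10 05:21:02 web1 sshd[1002]: Failed password for root from 203.0.113.5 port 4242 ssh2
Sep 10 05:21:09 web1 sshd[1003]: Invalid user bob from 203.0.113.5 port 4250
Sep 10 05:21:09 web1 sshd[1003]: Failed password for invalid user bob from 203.0.113.5 port 4250 ssh2
Sep 10 05:22:00 web1 sudo: pam_unix(sudo:session): session opened for user root by alice(uid=1000)
EOF

# Accepted and failed password logins: date, result, user and client IP.
# The user is the field just before "from" and the IP the field just after it.
ssh_auth_events() {
  awk '$5 ~ /^sshd/ && ($6 == "Accepted" || ($6 == "Failed" && $7 == "password")) {
    for (i = 7; i < NF; i++)
      if ($i == "from") {
        print $1, $2, $3, ($6 == "Accepted" ? "OK" : "FAIL"), "user=" $(i - 1), "ip=" $(i + 1)
        break
      }
  }' "$@"
}

# Root sessions opened through sudo/su: the last field looks like "alice(uid=1000)".
root_sessions() {
  awk '/session opened for user root/ { u = $NF; sub(/\(.*/, "", u); print $1, $2, $3, "by=" u }' "$@"
}

ssh_auth_events "$secure_sample"
# Failed attempts per client IP, most frequent first.
ssh_auth_events "$secure_sample" | awk '$4 == "FAIL" { print substr($6, 4) }' | sort | uniq -c | sort -rn
root_sessions "$secure_sample"
```

On a real system, replace "$secure_sample" with /var/log/secure (reading it usually requires root).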

Indeed, the cat is not needed, and even grep | awk can be shortened to awk '/…/', which makes the final script a bit shorter. For a typical log file (~200 KB), you might save about 1 ms per run: to be exact, 1.8 ms by removing both cat and grep, and 0.3 ms by using awk alone instead of grep | awk.

time for i in `seq 1000`; do cat secure | grep Accepted | awk '{print $1 " " $2 " " $3 " User: " $9 " " }'; done > /tmp/a
time for i in `seq 1000`; do grep Accepted secure | awk '{print $1 " " $2 " " $3 " User: " $9 " " }'; done > /tmp/b
time for i in `seq 1000`; do awk '/Accepted/ {print $1 " " $2 " " $3 " User: " $9 " " }' secure; done > /tmp/c

More interestingly, when the log file grows to 200 MB, the cat | grep | awk chain turns out to be significantly faster, at 1.096 s over 100 runs. A single awk command will not max out the CPUs, while the pipe chain does: each stage of a pipeline runs as a separate process, so the stages can work in parallel on different cores.

for i in `seq 1000`; do cat secure >> s; done
time for i in `seq 100`; do cat s | grep Accepted | awk '{print $1 " " $2 " " $3 " User: " $9 " " }'; done > /tmp/a1
time for i in `seq 100`; do awk '/Accepted/ {print $1 " " $2 " " $3 " User: " $9 " " }' s; done > /tmp/c1
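Before replacing one variant with another, it is worth confirming that they print the same thing. The sketch below uses invented sample lines. It runs the three forms timed above (cat | grep | awk, grep | awk, and awk alone) and compares their output.

```shell
#!/usr/bin/env bash
# Sketch: check that the three variants timed above produce identical output.
# The sample lines are invented.
secure_sample=$(mktemp)
cat > "$secure_sample" <<'EOF'
Sep 10 05:20:41 web1 sshd[1001]: Accepted password for alice from 192.0.2.10 port 52311 ssh2
Sep 10 05:21:02 web1 sshd[1002]: Failed password for root from 203.0.113.5 port 4242 ssh2
Sep 10 05:25:13 web1 sshd[1004]: Accepted publickey for bob from 198.51.100.7 port 40022 ssh2
EOF

# The awk action shared by all three variants (same output format as above).
action='{print $1 " " $2 " " $3 " User: " $9 " " }'

a=$(cat "$secure_sample" | grep Accepted | awk "$action")
b=$(grep Accepted "$secure_sample" | awk "$action")
c=$(awk "/Accepted/ $action" "$secure_sample")

if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
  echo "all three variants agree"
else
  echo "variants differ" >&2
fi
```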

My Perl script, using awk:
Code:
my $OldTime = 'Sep 10, 2012 5:20:41';
my $NewTime = 'Sep 10, 2012 5:49:40';

my $test2 = qx{ssh -o stricthostkeychecking=no $WLS "awk '/$OldTime/,/$NewTime/' $WLSP/logs/CDSServer.*"};
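The awk part of this command uses a range pattern. '/start/,/stop/' prints everything from the first line matching the start pattern through the next line matching the stop pattern. The sketch below uses invented entries and assumes the WebLogic header format "####<Mon D, YYYY H:MM:SS AM|PM TZ>". It shows both how the range pattern behaves and its main limitation: the times are matched as literal text.

```shell
#!/usr/bin/env bash
# Sketch: behaviour of awk's '/start/,/stop/' range pattern on a WebLogic-style log.
# The entries are invented; the header format is an assumption.
log=$(mktemp)
cat > "$log" <<'EOF'
####<Sep 10, 2012 5:10:00 PM IST> <Info> <Kernel> before the window
####<Sep 10, 2012 5:20:41 PM IST> <Error> <JDBC> first entry in the window
java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
####<Sep 10, 2012 5:49:40 PM IST> <Info> <Kernel> last entry in the window
####<Sep 10, 2012 5:55:00 PM IST> <Info> <Kernel> after the window
EOF

# Prints lines 2 to 4: from the start match through the stop match, inclusive.
# The stack-trace line has no timestamp but is printed because it lies between them.
awk '/Sep 10, 2012 5:20:41/,/Sep 10, 2012 5:49:40/' "$log"

# Both patterns are literal text: if no line carries exactly the start time,
# nothing is printed; if the stop time never appears, printing runs to the end.
awk '/Sep 10, 2012 5:20:42/,/Sep 10, 2012 5:49:40/' "$log"   # prints nothing
```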
My log file format:
Code:
#### Kernel>> <>
#### <[ACTIVE] ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)'>
#### <12d58f5205084394:4db5aec7:1393b6c3ce7:-8000-00000000
#### <[ACTIVE] ExecuteThread: '

atreyu (09-18-2012):

Hi,

I'd make two suggestions. The first is to use the Date::Parse Perl module. It is hopefully already packaged for your distro. This module lets you easily convert date/time strings to seconds since the epoch, which is an easy way to do date/time math. It gives the equivalent of this GNU date command:

Code:
date +%s -d "Sep 10, 2012 5:20:41 PM"

The second suggestion is to put the script on the server and pass it 3 arguments:

1. the start of the date/time range
2. the end of the date/time range
3. the log file to parse

Then you'd call it like this:

Code:
ssh server /tmp/parse-log.pl 'Sep 10, 2012 5:20:41 PM' 'Sep 10, 2012 5:44:42 PM' /path/to/CDSserver.log

And here is the parse-log.pl script:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use Date::Parse;

# get command line arguments (3)
die "Usage: $0 '<start time>' '<stop time>' <log file>\n"
  . "E.g.: $0 'Sep 10, 2012 5:20:41 PM' 'Sep 10, 2012 5:49:40 PM' CDSServer.log\n"
  unless($#ARGV == 2);

my $startTime = $ARGV[0];
my $stopTime = $ARGV[1];
my $log = $ARGV[2];

# make sure the log file exists
die "$log: No such file\n" unless(-f $log);

# convert date/time strings to seconds since epoch
my $start_sec = str2time($startTime);
my $stop_sec = str2time($stopTime);

print "Start time: $startTime ($start_sec)\n";
print "Stop time: $stopTime ($stop_sec)\n";

open(LOG, '<', $log) or die "can't read '$log': $!\n";
while(<LOG>){
  chomp;
  my $line = $_; # save original line
  s/[ \t]+/ /g; # replace contiguous white spaces w/single space
  if(/^####<([a-zA-Z]{3} [0-9]{1,2}, [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M) [A-Z]{3}>/){
    my $timedate = $1;

    # convert date/time string in log entry to epoch seconds
    my $seconds = str2time($timedate);

    # print line if it falls into the range
    print $line, "\n" if(($seconds >= $start_sec) && ($seconds <= $stop_sec));
  }
}
close(LOG);

charith (09-19-2012):

Hi atreyu,

Thank you very much for your great, clean reply. Your script works fine and gives the lines that match the given times, but what I need is every line within that time range. Sorry for my bad log file format; I attached the correct log file below. [The errors do not begin with a time.]

Code:
#### f-tuning)'> <> <> <> <1345916502340> <[ACTIVE] ExecuteThread: '20' for queue: '
java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:443)
at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:670oracle.jdbc.driver.T4CConnection.(T4CConnection.java:230
### <[ACTIVE] ExecuteThrea #### <[ACT

The java.sql.SQLRecoverableException: IO Error: part above should be retrieved.

atreyu (09-20-2012):

Ah, okay. Yeah, that changes things, but not by much. Basically, you can set a marker once the start time string is matched, then set a stop marker once the end time string is matched, and save everything in between to an array. Then print the array once you're done looping through the file.

Code:
#!/usr/bin/perl
use strict;
use warnings;
use Date::Parse;

# get command line arguments (3)
die "Usage: $0 '<start time>' '<stop time>' <log file>\n"
  . "E.g.: $0 'Sep 10, 2012 5:20:41 PM' 'Sep 10, 2012 5:49:40 PM' CDSServer.log\n"
  unless($#ARGV == 2);

my $startTime = $ARGV[0];
my $stopTime = $ARGV[1];
my $log = $ARGV[2];

# make sure the log file exists
die "$log: No such file\n" unless(-f $log);

# convert date/time strings to seconds since epoch
my $start_sec = str2time($startTime);
my $stop_sec = str2time($stopTime);

# make sure the conversions succeeded (str2time returns undef on failure)
die "Failed to convert $startTime to seconds\n" unless(defined($start_sec) && $start_sec =~ /^[0-9]+$/);
die "Failed to convert $stopTime to seconds\n" unless(defined($stop_sec) && $stop_sec =~ /^[0-9]+$/);

print "Start time: $startTime ($start_sec)\n";
print "Stop time: $stopTime ($stop_sec)\n";

my @lines;
my $stop;

open(LOG, '<', $log) or die "can't read '$log': $!\n";
while(<LOG>){
  chomp;
  my $line = $_; # save original line
  s/[ \t]+/ /g; # replace contiguous white spaces w/single space
  if(/^####<([a-zA-Z]{3} [0-9]{1,2}, [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M) [A-Z]{3}>/){
    my $timedate = $1;

    # convert date/time string in log entry to epoch seconds
    my $seconds = str2time($timedate);

    # save line w/time string to array if it falls into the range
    if($seconds >= $start_sec){
      push(@lines,$line) unless($stop);
    }elsif($seconds >= $stop_sec){
      push(@lines,$line) unless($stop);
      $stop = 1;
    }
  }else{
    # save line w/o time string to array if it falls into the range
    push(@lines,$line) if(($#lines>=0)&&!($stop));
  }
}
close(LOG);

# print the saved lines
print "$_\n" for(@lines);
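atreyu's scripts do their date math with str2time() from Date::Parse. You can try the same conversion from the shell with the GNU date command from atreyu's first reply. The sketch below assumes GNU coreutils date (-d is a GNU extension). TZ=UTC is my own assumption, set only so the result does not depend on the machine's time zone; the log's IST zone is ignored.

```shell
#!/usr/bin/env bash
# Sketch: convert the thread's start/stop times to seconds since the epoch
# with GNU date and compute the length of the window.
to_epoch() {
  TZ=UTC date -d "$1" +%s
}

start_sec=$(to_epoch 'Sep 10, 2012 5:20:41 PM')
stop_sec=$(to_epoch 'Sep 10, 2012 5:49:40 PM')
echo "start=$start_sec stop=$stop_sec window=$((stop_sec - start_sec))s"   # window=1739s
```

Once both ends are plain integers, "is this entry inside the range" is just two numeric comparisons, which is what the Perl scripts do with $start_sec and $stop_sec.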
charith (09-24-2012):

Hi atreyu,
It's working fine, thank you very much.

I made a small change:
Code:
# save line w/time string to array if it falls into the range
if($seconds >= $start_sec){
to
Code:
if(($seconds >= $start_sec)&&($seconds <= $stop_sec)){
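On systems without Date::Parse, the same idea, including the upper-bound check charith added, can be sketched in POSIX awk. This is a different technique, not the thread's script. Instead of epoch seconds, it turns each header timestamp into a fixed-width YYYYMMDDHHMMSS string, which compares correctly as text. It prints a whole entry (the header plus the lines after it, such as a stack trace) when the header falls inside the window. The sketch assumes the header format "####<Mon D, YYYY H:MM:SS AM|PM TZ>" and that the start time, stop time and log all use the same time zone. filter_log_window is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print every log entry whose header time lies in [start, stop].
# An entry is a "####<...>" header line plus the lines after it (e.g. a Java
# stack trace) up to the next header. Assumed header format:
#   ####<Mon D, YYYY H:MM:SS AM|PM TZ> ...
# The time zone is ignored, so start, stop and the log must use the same one.
filter_log_window() {
  local start=$1 stop=$2
  shift 2
  awk -v start="$start" -v stop="$stop" '
    # "Sep 10, 2012 5:20:41 PM [TZ]" -> "20120910172041" (sorts as text)
    function stamp(s,    f, t, i, h) {
      if (split(s, f, " ") < 5 || split(f[4], t, ":") != 3) return ""
      i = index("JanFebMarAprMayJunJulAugSepOctNovDec", f[1])
      if (i % 3 != 1) return ""            # 0 (not found) or a misaligned hit
      h = t[1] % 12                        # 12 AM -> 0, 12 PM -> 12
      if (f[5] == "PM") h += 12
      return sprintf("%04d%02d%02d%02d%02d%02d", f[3], (i + 2) / 3, f[2] + 0, h, t[2], t[3])
    }
    BEGIN {
      lo = stamp(start); hi = stamp(stop)
      if (lo == "" || hi == "") { print "cannot parse start/stop time" > "/dev/stderr"; exit 1 }
    }
    /^####</ {
      hdr = substr($0, 6)                       # text after "####<"
      ts = stamp(substr(hdr, 1, index(hdr, ">") - 1))
      keep = (ts != "" && ts >= lo && ts <= hi) # applies to the whole entry
    }
    keep
  ' "$@"
}

# Usage (the log path is a placeholder):
# filter_log_window 'Sep 10, 2012 5:20:41 PM' 'Sep 10, 2012 5:49:40 PM' CDSServer.log
```

Deciding per entry rather than per line avoids the start/stop marker bookkeeping: a continuation line simply inherits the decision made for the header above it.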
