{"id":3415,"date":"2014-08-06T00:17:00","date_gmt":"2014-08-05T16:17:00","guid":{"rendered":"http:\/\/rmohan.com\/?p=3415"},"modified":"2014-08-06T10:17:03","modified_gmt":"2014-08-06T02:17:03","slug":"parsing-logs","status":"publish","type":"post","link":"https:\/\/mohan.sg\/?p=3415","title":{"rendered":"Parsing &#8211; Logs"},"content":{"rendered":"<p>grep, cat, zgrep and zcat<\/p>\n<p>More on log parsing, I\u2019m taking notes on how to read log files and get the information that I need. On Linux environment, these tools are perfect: grep, cat, zgrep and zcat.<\/p>\n<p>Extracting patterns with grep<\/p>\n<p>We can extract information from a text file using grep. Example, we can extract lines of log file containing patterns like GET \/checkout\/* where status code is 500.<\/p>\n<p>1<br \/>\ngrep -E -e &#8216;GET \/checkout\/.* HTTP\/1\\.(0|1)&#8221; 500&#8217; some-log-file.log<br \/>\nDepending on the Apache log format, above will extract lines whose request is \/checkout\/* and status code is 500 where it may support HTTP\/1.0 or HTTP\/1.1. However, that would extract the whole line. To only extract the matching pattern, use the -o option.<\/p>\n<p>1<br \/>\ngrep -o -E -e &#8216;GET \/checkout\/.* HTTP\/1\\.(0|1)&#8221; 500&#8217; some-log-file.log<br \/>\nAnd to save the matching patterns to a file, simply redirect the output to file.<\/p>\n<p>1<br \/>\ngrep -E -e &#8216;GET \/checkout\/.* HTTP\/1\\.(0|1)&#8221; 500&#8217; some-log-file.log > checkout-errors.txt<br \/>\nUsing cat<\/p>\n<p>cat is usually used to output contents of a file. This is a small but very useful Linux utility. For example, we can combine multiple log files (uncompressed) into a single log file.<\/p>\n<p>1<br \/>\ncat \/path\/to\/log-files\/*.log > \/combined\/log-file.log<br \/>\nCompressed counterpart<\/p>\n<p>grep and cat have their compressed file counterpart. For grep, there\u2019s zgrep.<\/p>\n<p>1<br \/>\nzgrep -E -e &#8216;GET \/checkout\/.* HTTP\/1\\.(0|1)&#8221; 500&#8217; some-log-file.gz > checkout-errors.txt<br \/>\nFor cat, there\u2019s zcat.<\/p>\n<p>1<br \/>\nzcat \/path\/to\/log-files\/*.gz > \/combined\/log-file.log<br \/>\nI\u2019ve done so many combination last week that I don\u2019t remember them all and not able to include in this post. Happy log parsing.<\/p>\n<p># List out successful ssh login attempts<br \/>\ncat secure | grep &#8216;Accepted&#8217; | awk &#8216;{print $1 &#8221; &#8221; $2 &#8221; &#8221; $3 &#8221; User: &#8221; $9 &#8221; &#8221; }&#8217;<br \/>\ncat secure* | sort | grep &#8216;Accepted&#8217; | awk &#8216;{print $1 &#8221; &#8221; $2 &#8221; &#8221; $3 &#8221; User: &#8221; $9 &#8221; IP:&#8221; $11 }&#8217;<\/p>\n<p># List out successful ssh login attempts from sudo users<br \/>\ncat \/var\/log\/secure | grep &#8216;session opened for user root&#8217; | awk &#8216;{print $1 &#8221; &#8221; $2 &#8221; &#8221; $3 &#8221; Sudo User: &#8221; $13 &#8221; &#8221; }&#8217;<\/p>\n<p># List out ssh login attempts from non-existing and unauthorized user accounts<br \/>\ncat \/var\/log\/secure | grep &#8216;Invalid user&#8217;<\/p>\n<p># List out ssh login attempts by authorized ssh accounts with failed password<br \/>\ncat \/var\/log\/secure | grep -v invalid | grep &#8216;Failed password&#8217;<\/p>\n<p>Indeed, and even grep | awk can be shortened to awk \/\u2026\/. So you could save a bit of space in the final script. For a typical log file (~200 kb), you might save 1 ms processing it. Or to be exact, 1.8 ms removing the cat and grep, and 0.3 ms using only awk instead of grep | awk.<\/p>\n<p>time for i in `seq 1000`; do cat secure | grep Accepted | awk &#8216;{print $1 &#8221; &#8221; $2 &#8221; &#8221; $3 &#8221; User: &#8221; $9 &#8221; &#8221; }&#8217;; done > \/tmp\/a<br \/>\ntime for i in `seq 1000`; do grep Accepted secure | awk &#8216;{print $1 &#8221; &#8221; $2 &#8221; &#8221; $3 &#8221; User: &#8221; $9 &#8221; &#8221; }&#8217;; done > \/tmp\/b<br \/>\ntime for i in `seq 1000`; do awk &#8216;\/Accepted\/ {print $1 &#8221; &#8221; $2 &#8221; &#8221; $3 &#8221; User: &#8221; $9 &#8221; &#8221; }&#8217; secure; done > \/tmp\/c<\/p>\n<p>However, more interestingly, when the size of the log file is increased to 200 MB, it turns out that the cat | grep | awk chain is significantly faster, at 1.096 s over 100 runs. The single awk command will not max out the CPUs, while the pipe chain does.<\/p>\n<p>for i in `seq 1000`; do cat secure >> s; done<br \/>\ntime for i in `seq 100`; do cat s | grep Accepted | awk &#8216;{print $1 &#8221; &#8221; $2 &#8221; &#8221; $3 &#8221; User: &#8221; $9 &#8221; &#8221; }&#8217;; done > \/tmp\/a1<br \/>\ntime for i in `seq 100`; do awk &#8216;\/Accepted\/ {print $1 &#8221; &#8221; $2 &#8221; &#8221; $3 &#8221; User: &#8221; $9 &#8221; &#8221; }&#8217; s; done > \/tmp\/c1<\/p>\n<p>My perl script awk:<br \/>\nCode:<br \/>\nmy $OldTime= &#8216;Sep 10, 2012 5:20:41&#8242;;<br \/>\nmy $NewTime=&#8217;Sep 10, 2012 5:49:40&#8217;;<\/p>\n<p>my $test2 = qx{ssh -o stricthostkeychecking=no $WLS &#8220;awk &#8216;\/$OldTime\/,\/$NewTime\/&#8217; $WLSP\/logs\/CDSServer.* &#8220;};<br \/>\nMy log file format:<br \/>\nCode:<br \/>\n####<Sep 10, 2012 5:44:41 PM IST> <Info> Kernel>> <>  <Connection for pool  closed.><br \/>\n####<Sep 10, 2012 5:44:41 PM IST> <Info> <JDBC> <cdsserver> <CDSServer11> <[ACTIVE] ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)'><br \/>\n####<Sep 10, 2012 5:44:41 PM IST> <Info> <JDBC> <12d58f5205084394:4db5aec7:1393b6c3ce7:-8000-00000000\n####<Sep 10, 2012 5:44:41 PM IST> <Info> <JDBC> <cdsserver> <CDSServer11> <[ACTIVE] ExecuteThread: '\nReply With Quote Reply With Quote\n09-18-2012 #2\natreyu  atreyu is offline\nTrusted Penguin\nJoin Date\nMay 2011\nPosts\n4,353\nHi,\n\nI'd make two suggestions. The first one is to use the Date::Parse Perl module. It is hopefully already packaged for your distro. This module will allow you to easily convert date\/time strings to seconds since the epoch (which is an easy way to do date\/time math). It will give you the equivalent output to this GNU date command:\n\nCode:\ndate +%s -d \"Sep 10, 2012 5:20:41 PM\"\nThe second suggestion would be to put the script on the server and pass to it 3 arguments:\n\n1. the start date\/time range \n2. the end date\/time range \n3. the log file to parse\n\nthen you'd do something like this to call it:\n\nCode:\nssh server \/tmp\/parse-log.pl 'Sep 10, 2012 5:20:41 PM' 'Sep 10, 2012 5:44:42 PM' \/path\/to\/CDSserver.log and here is the parse-log.pl script:\n\nCode:\n#!\/usr\/bin\/perl\nuse strict;\nuse warnings;\nuse Date::Parse;\n\n# get command line arguments (3)\ndie \"\n  Usage: $0 '<START_TIME>&#8216; &#8216;<STOP_TIME>&#8216; <LOG_FILE><br \/>\n   E.g.: $0 &#8216;Sep 10, 2012 5:20:41 PM&#8217; &#8216;Sep 10, 2012 5:49:40 PM&#8217; CDSServer.log\\n&#8221;<br \/>\n  unless($#ARGV == 2);<\/p>\n<p>my $startTime = $ARGV[0];<br \/>\nmy $stopTime  = $ARGV[1];<br \/>\nmy $log = $ARGV[2];<\/p>\n<p># make sure the log file exists<br \/>\ndie &#8220;$log: No such file\\n&#8221; unless(-f$log);<\/p>\n<p># convert date\/time strings to seconds since epoch<br \/>\nmy $start_sec = str2time($startTime);<br \/>\nmy $stop_sec = str2time($stopTime);<\/p>\n<p>print &#8220;Start time: $startTime ($start_sec)\\n&#8221;;<br \/>\nprint &#8220;Stop time: $stopTime ($stop_sec)\\n&#8221;;<\/p>\n<p>open(LOG,&#8217;<',$log) or die \"can't read '$log': $!\\n\";\nwhile(<LOG>){<br \/>\n  chomp;<br \/>\n  my $line = $_; # save original line<br \/>\n  s\/[ \\t]+\/ \/;   # replace contiguous white spaces w\/single space<br \/>\n  if(\/^####<([a-zA-Z]{3} [0-9]{1,2}, [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M) [A-Z]{3}>\/){<br \/>\n    my $timedate = $1;<\/p>\n<p>    # convert date\/time string in log entro to epoch seconds<br \/>\n    my $seconds = str2time($timedate);<\/p>\n<p>    # print line if it falls into the range<br \/>\n    print $line,&#8221;\\n&#8221; if(($seconds >= $start_sec)&#038;&#038;($seconds <= $stop_sec));\n  }\n}\nclose(LOG);\nReply With Quote Reply With Quote\n09-19-2012 #3\ncharith  charith is offline\nJust Joined!\nJoin Date\nNov 2010\nPosts\n26\nHi atreyu,\n\nThank you very much for your great clean reply.\n\nYour script working fine and it gives lines those match given times but what i need is get all lines whatever between that time range. I'm sorry for my bad log file format i attached correct log file below.[Errors not begin with time] \n\nCode:\n####<Aug 25, 2012 11:11:42 PM IST> <Warning> <JDBC> <cdsserver> f-tuning)&#8217;> <<WLS Kernel>> <> <> <1345916502340> <B\n####<Aug 25, 2012 11:11:42 PM IST> <Info> <JDBC> <cdsserver> <CDSServer2> <[ACTIVE] ExecuteThread: '20' for queue: '\njava.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection\n\tat oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:443)\n\tat oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:670oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:230<br \/>\n###<Aug 25, 2012 11:11:47 PM IST> <Warning> <JDBC> <cdsserver> <CDSServer2> <[ACTIVE] ExecuteThrea\n####<Aug 25, 2012 11:11:47 PM IST> <Info> <JDBC> <cdsserver> <CDSServer2> <[ACT\nAbove java.sql.SQLRecoverableException: IO Error: part should be retrieve.\nReply With Quote Reply With Quote\n09-20-2012 #4\natreyu  atreyu is offline\nTrusted Penguin\nJoin Date\nMay 2011\nPosts\n4,353\nah, okay. yeah, that changes things, but not by too much. basically, you can just set a marker once the start time string is matched, then set a stop marker once the end time string is matched, and save everything in between to an array. then print the array once you're done looping thru the file.\n\nCode:\n#!\/usr\/bin\/perl\nuse strict;\nuse warnings;\nuse Date::Parse;\n\n# get command line arguments (3)\ndie \"\n  Usage: $0 '<START_TIME>&#8216; &#8216;<STOP_TIME>&#8216; <LOG_FILE><br \/>\n   E.g.: $0 &#8216;Sep 10, 2012 5:20:41 PM&#8217; &#8216;Sep 10, 2012 5:49:40 PM&#8217; CDSServer.log\\n&#8221;<br \/>\n  unless($#ARGV == 2);<\/p>\n<p>my $startTime = $ARGV[0];<br \/>\nmy $stopTime  = $ARGV[1];<br \/>\nmy $log = $ARGV[2];<\/p>\n<p># make sure the log file exists<br \/>\ndie &#8220;$log: No such file\\n&#8221; unless(-f$log);<\/p>\n<p># convert date\/time strings to seconds since epoch<br \/>\nmy $start_sec = str2time($startTime);<br \/>\nmy $stop_sec = str2time($stopTime);<\/p>\n<p># make sure we got nothing but digits in the variables<br \/>\ndie &#8220;Failed to convert $startTime to seconds\\n&#8221; unless($start_sec =~ \/^[0-9]*$\/);<br \/>\ndie &#8220;Failed to convert $stopTime to seconds\\n&#8221; unless($stop_sec =~ \/^[0-9]*$\/);<\/p>\n<p>print &#8220;Start time: $startTime ($start_sec)\\n&#8221;;<br \/>\nprint &#8220;Stop time: $stopTime ($stop_sec)\\n&#8221;;<\/p>\n<p>my @lines;<br \/>\nmy $stop;<\/p>\n<p>open(LOG,&#8217;<',$log) or die \"can't read '$log': $!\\n\";\nwhile(<LOG>){<br \/>\n  chomp;<br \/>\n  my $line = $_; # save original line<br \/>\n  s\/[ \\t]+\/ \/;   # replace contiguous white spaces w\/single space<br \/>\n  if(\/^####<([a-zA-Z]{3} [0-9]{1,2}, [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M) [A-Z]{3}>\/){<br \/>\n    my $timedate = $1;<\/p>\n<p>    # convert date\/time string in log entro to epoch seconds<br \/>\n    my $seconds = str2time($timedate);<\/p>\n<p>    # save line w\/time string to array if it falls into the range<br \/>\n    if($seconds >= $start_sec){<br \/>\n      push(@lines,$line) unless($stop);<br \/>\n    }elsif($seconds >= $stop_sec){<br \/>\n      push(@lines,$line) unless($stop);<br \/>\n      $stop = 1;<br \/>\n    }<br \/>\n  }else{<br \/>\n    # save line w\/o time string to array if it falls into the range<br \/>\n    push(@lines,$line) if(($#lines>=0)&#038;&#038;!($stop));<br \/>\n  }<br \/>\n}<br \/>\nclose(LOG);<\/p>\n<p># print the saved lines<br \/>\nprint &#8220;$_\\n&#8221; for(@lines);<br \/>\nReply With Quote Reply With Quote<br \/>\n09-24-2012 #5<br \/>\ncharith  charith is offline<br \/>\nJust Joined!<br \/>\nJoin Date<br \/>\nNov 2010<br \/>\nPosts<br \/>\n26<br \/>\nHi atreyu,<br \/>\nIt&#8217;s working fine thank you very much.<\/p>\n<p>Did small change:<br \/>\nCode:<br \/>\n# save line w\/time string to array if it falls into the range<br \/>\n    if($seconds >= $start_sec){<br \/>\nas<br \/>\nCode:<br \/>\nif (($seconds >= $start_sec)&#038;&#038;($seconds <= $stop_sec)){  }\n}\nclose(LOG);\n\n#!\/usr\/bin\/perl\nuse strict;\nuse warnings;\nuse Date::Parse;\n\n# get command line arguments (3)\ndie \"\n  Usage: $0 '<START_TIME>&#8216; &#8216;<STOP_TIME>&#8216; <LOG_FILE><br \/>\n   E.g.: $0 &#8216;Sep 10, 2012 5:20:41 PM&#8217; &#8216;Sep 10, 2012 5:49:40 PM&#8217; CDSServer.log\\n&#8221;<br \/>\n  unless($#ARGV == 2);<\/p>\n<p>my $startTime = $ARGV[0];<br \/>\nmy $stopTime  = $ARGV[1];<br \/>\nmy $log = $ARGV[2];<\/p>\n<p># make sure the log file exists<br \/>\ndie &#8220;$log: No such file\\n&#8221; unless(-f$log);<\/p>\n<p># convert date\/time strings to seconds since epoch<br \/>\nmy $start_sec = str2time($startTime);<br \/>\nmy $stop_sec = str2time($stopTime);<\/p>\n<p># make sure we got nothing but digits in the variables<br \/>\ndie &#8220;Failed to convert $startTime to seconds\\n&#8221; unless($start_sec =~ \/^[0-9]*$\/);<br \/>\ndie &#8220;Failed to convert $stopTime to seconds\\n&#8221; unless($stop_sec =~ \/^[0-9]*$\/);<\/p>\n<p>print &#8220;Start time: $startTime ($start_sec)\\n&#8221;;<br \/>\nprint &#8220;Stop time: $stopTime ($stop_sec)\\n&#8221;;<\/p>\n<p>my @lines;<br \/>\nmy $stop;<\/p>\n<p>open(LOG,&#8217;<',$log) or die \"can't read '$log': $!\\n\";\nwhile(<LOG>){<br \/>\n  chomp;<br \/>\n  my $line = $_; # save original line<br \/>\n  s\/[ \\t]+\/ \/;   # replace contiguous white spaces w\/single space<br \/>\n  if(\/^####<([a-zA-Z]{3} [0-9]{1,2}, [0-9]{4} [0-9]{1,2}:[0-9]{2}:[0-9]{2} [AP]M) [A-Z]{3}>\/){<br \/>\n    my $timedate = $1;<\/p>\n<p>    # convert date\/time string in log entro to epoch seconds<br \/>\n    my $seconds = str2time($timedate);<\/p>\n<p>    # save line w\/time string to array if it falls into the range<br \/>\n    if($seconds >= $start_sec){<br \/>\n      push(@lines,$line) unless($stop);<br \/>\n    }elsif($seconds >= $stop_sec){<br \/>\n      push(@lines,$line) unless($stop);<br \/>\n      $stop = 1;<br \/>\n    }<br \/>\n  }else{<br \/>\n    # save line w\/o time string to array if it falls into the range<br \/>\n    push(@lines,$line) if(($#lines>=0)&#038;&#038;!($stop));<br \/>\n  }<br \/>\n}<br \/>\nclose(LOG);<\/p>\n<p># print the saved lines<br \/>\nprint &#8220;$_\\n&#8221; for(@lines);<\/p>\n","protected":false},"excerpt":{"rendered":"<p>grep, cat, zgrep and zcat<\/p>\n<p>More on log parsing, I\u2019m taking notes on how to read log files and get the information that I need. On Linux environment, these tools are perfect: grep, cat, zgrep and zcat.<\/p>\n<p>Extracting patterns with grep<\/p>\n<p>We can extract information from a text file using grep. Example, we can extract [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[47],"tags":[],"_links":{"self":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/3415"}],"collection":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3415"}],"version-history":[{"count":5,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/3415\/revisions"}],"predecessor-version":[{"id":3432,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/3415\/revisions\/3432"}],"wp:attachment":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3415"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3415"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3415"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}