Principle of Script
Defining the Shell Type
To make a ksh script (which is a ksh program) create a new file with a starting line like:
#!/usr/bin/ksh
It is important that the path to ksh is proper and that the line does not have more than 32 characters. The shell from which you are starting the script will find this line and hand the whole script over to ksh. Without this line the script would be interpreted by the same type of shell as the one from which it was started. But since the syntax is different for all shells, it is necessary to define the shell with that line.
Four Types of Lines
A script has four types of lines: The shell defining line at the top, empty lines, commentary lines starting with a # and command lines. See the following top of a script as an example for these types of lines:
#!/usr/bin/ksh
# Commentary……
file=/path/file
if [[ $file = $1 ]];then
command
fi
Start and End of Script
The script starts at the first line and ends either when it encounters an "exit" or after the last line. All lines starting with "#" are ignored as comments.
Start and End of Command
A command starts with the first word on a line or, if it is the second command on a line, with the first word after a ";".
A command ends either at the end of the line or with a ";". So one can put several commands onto one line:
print -n "Name: "; read name; print ""
One can continue commands over more than one line with a "\" immediately followed by a newline, which is made by the return key:
grep filename | sort -u | awk '{print $4}' | \
uniq -c >> /longpath/file
Name and Permissions of Script File
The script must not have a name which is identical to a unix command: so the script must NOT be called "test"!
After saving the file, give it execute permissions with: chmod 700 filename.
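For example, assuming the script was saved under the hypothetical name myscript.ksh, one could make it executable and start it like this:
chmod 700 myscript.ksh
./myscript.ksh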
Variables
Filling in
When filling in a variable one uses just its name, with no blanks around the equal sign: state="US". There is no difference between strings and numbers: price=50.
Using
When using a variable one needs to put a $ sign in front of it: print $state $price.
Arrays
Set and use an array like:
arrname[1]=4 To fill in
print ${arrname[1]} To print out
${arrname[*]} Get all elements
${#arrname[*]} Get the number of elements
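Put together as a small sketch (arrname is just an example name, as above):
arrname[1]=4
arrname[2]=7
print ${arrname[2]}   # prints 7
print ${arrname[*]}   # prints 4 7
print ${#arrname[*]}  # prints 2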
Declaration
Happily, no declarations of variables are needed in ksh. Arithmetic works with integers only; one cannot have decimals.
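As a minimal sketch, integer arithmetic can be done with the (( )) construct (the variable name is arbitrary):
count=10
(( count = count * 3 + 1 ))
print $count   # prints 31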
Branching
if then fi
if [[ $value -eq 7 ]];then
print "$value is 7"
fi
or:
if [[ $value -eq 7 ]]
then
print "$value is 7"
fi
or:
if [[ $value -eq 7 ]];then print "$value is 7";fi
if then else fi
if [[ $name = "John" ]];then
print "You're welcome, ${name}."
else
print "Good bye, ${name}!"
fi
if then elif then else fi
if [[ $name = "John" ]];then
print "You're welcome, ${name}."
elif [[ $name = "Hanna" ]];then
print "Hello, ${name}, who are you?"
else
print "Good bye, ${name}!"
fi
case esac
case $var in
john|fred) print $invitation;;
martin) print $declination;;
*) print "Wrong name…";;
esac
Looping
while do done
while [[ $count -gt 0 ]];do
print "\$count is $count"
(( count -= 1 ))
done
until do done
until [[ $answer = "yes" ]];do
print -n "Please enter \"yes\": "
read answer
print ""
done
for var in list do done
for foo in $(ls);do
if [[ -d $foo ]];then
print "$foo is a directory"
else
print "$foo is not a directory"
fi
done
continue…break
One can skip the rest of a loop and directly go to the next iteration with: “continue”.
while read line
do
if [[ $line = *.gz ]];then
continue
else
print $line
fi
done
One can also prematurely leave a loop with: “break”.
while read line;do
if [[ $line = *!(.c) ]];then
break
else
print $line
fi
done
Command Line Arguments
(Officially they are called “positional parameters”)
The number of command line arguments is stored in $# so one can check
for arguments with:
if [[ $# -eq 0 ]];then
print "No Arguments"
exit
fi
The individual arguments are stored in $1 … $n and all of them together in $* as one string. The arguments cannot
be modified directly, but one can reset the whole command line for another part of the program.
If we need a first argument $first for the rest of the program we do:
if [[ $1 != $first ]];then
set $first $*
fi
One can iterate over the command line arguments with the help of the shift command. Shift discards the first argument and moves all remaining ones down by one position.
until [[ $# -eq 0 ]];do
# commands ….
shift
done
One can also iterate with the for loop; if no list is given, for defaults to the command line arguments ($*):
for arg;do
print $arg
done
The program name is stored in $0 but it contains the path also!
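If only the script name without the path is wanted, the # operator from the section on variable manipulations below can strip it off; a minimal sketch:
scriptname=${0##*/}
print "This script is called $scriptname"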
Comparisons
To compare strings one uses “=” for equal and “!=” for not equal.
To compare numbers one uses “-eq” for equal “-ne” for not equal as well as “-gt” for greater than
and “-lt” for less than.
if [[ $name = "John" ]];then
# commands….
fi
if [[ $size -eq 1000 ]];then
# commands….
fi
With “&&” for “AND” and “||” for “OR” one can combine statements:
if [[ $price -lt 1000 || $name = "Hanna" ]];then
# commands….
fi
if [[ $name = "Fred" && $city = "Denver" ]];then
# commands….
fi
Variable Manipulations
Removing something from a variable
Variables that contain a path can very easily be stripped of it: ${name##*/} gives you just the filename.
Or if one wants the path: ${name%/*}. The # operators remove a pattern from the beginning of the variable and the % operators from the end.
%% and ## remove the longest possible match while % and # remove only the shortest one.
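A minimal sketch, assuming a hypothetical variable name holding a full path:
name=/longpath/dir/file.c
print ${name##*/}   # gives file.c
print ${name%/*}    # gives /longpath/dir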
Replacing a variable if it does not yet exist
If we want $foo, or the value 4 if it is not set, we write: ${foo:-4}; however, $foo itself remains unset. To also assign the default value we use:
${foo:=4}
Exiting and stating something if variable is not set
This is very important if our program relies on a certain variable: ${foo:?"foo not set!"}
Just check for the variable
${foo:+1} gives 1 if $foo is set, otherwise nothing.
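The four forms next to each other, as a sketch with a hypothetical variable foo that starts out unset:
print ${foo:-4}               # prints 4, foo remains unset
print ${foo:=4}               # prints 4 and sets foo to 4
print ${foo:?"foo not set!"}  # would exit with the message if foo were still unset
print ${foo:+1}               # prints 1 because foo is set by now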
Ksh Regular Expressions
Ksh has its own regular expressions.
Use an * for any string. So to get all the files ending in .c use *.c.
A single character is represented by a ?. So all the files starting with any single character followed by 44.f can be fetched by: ?44.f.
Especially in ksh there are quantifiers for whole patterns:
?(pattern) matches the pattern zero or one time.
*(pattern) matches the pattern zero or more times.
+(pattern) matches the pattern one or more times.
@(pattern) matches the pattern exactly once.
!(pattern) matches anything except the pattern.
So one can test a string in a variable like: if [[ $var = fo@(?4*67).c ]];then …
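These patterns also work for file name matching, as in the following sketch (the file names are hypothetical):
ls !(*.c)                  # all files except those ending in .c
ls @(foo|bar)+([0-9]).log  # foo or bar followed by one or more digits and .log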
Functions
Description
A function (= procedure) must be defined before it is called, because ksh is interpreted at run time.
It knows all the variables from the calling shell except the command line arguments. But it has its
own command line arguments, so that one can call it with different values from different places in
the script. It has an exit status but cannot return a value like a C function can.
Making a Function
One can make one in either of the following two ways:
function foo {
# commands…
}
foo(){
# commands…
}
Calling the Function
To call it just put its name in the script: foo. To give it arguments do: foo arg1 arg2 …
The arguments are available inside the function as $1 … $n and as $* for all at once, like in the main code.
And the main $1 is not influenced by the $1 of a particular function.
Return
The return statement exits the function immediately with the specified return value as its exit status.
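A short sketch putting definition, arguments and return together (the function name and the tested file are only examples):
function checkfile {
if [[ -r $1 ]];then
return 0
else
return 1
fi
}
checkfile /etc/profile
if [[ $? -eq 0 ]];then
print "/etc/profile is readable"
fi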
Data Redirection
General
Data redirection is done with the following signs: "> >> < <<". Every program has at least a standard input, a standard output and a standard error output. All of these can be redirected.
Command Output to File
For writing into a new file or for overwriting a file do: command > file
For appending to a file do: command >> file
Standard Error Redirection
To redirect the error output of a command do: command 2> file
To discard the error altogether do: command 2>/dev/null
To put the error to the same location as the normal output do: command 2>&1
File into Command
If a program needs a file for input over standard input do: command < file
Combine Input and Output Redirection
command < infile > outfile
command < infile > outfile 2>/dev/null
Commands into Program ( Here Document )
Every unix command can take its commands from a text-like listing with:
command << EOF
input text ...
EOF
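As a concrete sketch (the variable addr and the text are made up), a mail could be sent from within a script like this:
mail $addr << EOF
Hello,
this text goes line by line into the mail command.
EOF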
BEGIN { }, { } and END { }
An awk script can have three types of blocks, and at least one of them must be present. The BEGIN{} block is processed before the file is read. The {} block runs for every line of input and the END{} block is processed after the final line of the input file.
awk '
BEGIN { myvalue = 1700 }
/debt/ { myvalue -= $4 }
/want/ { myvalue += $4 }
END { print myvalue }
' infile
Match in a particular field
Awk autosplits a line on whitespace by default. The fields are stored in $1 through $NF and the whole line is in $0. One can match or not match an individual field.
awk '
$1 ~ /fred/ && $4 !~ /ok/ {
print "Fred has not yet paid $3"
}
' infile
For, If, substr()
Awk can do for() loops like in C and has the usual if and while structures. NR holds the current line number and NF the number of fields on the current line.
awk '
BEGIN { count = 0 }
/myline/ {
for(i=1;i<=NF;i++){
if(substr($i,3,2) == "ae"){
bla = "Found it on line: "
print bla NR " in field: " i
count++
}
}
}
END { print "Found " count " instances of it" }
' infile
Turn around each word in a file:
awk '
{ for(i=1;i<=NF;i++){
len = length($i)
for(j=len;j>0;j--){
char = substr($i,j,1)
tmp = tmp char
}
$i = tmp
tmp = ""
}
print
}
' infile
Awk scripts within a shell script
Extract email addresses from incoming mail. The mail would be guided to the following script from within the ~/.forward file. This is not an efficient method, but only an example to show serial processing of text. The next example will do the same thing within awk only and will be efficient. The mail comes in over standard input into the script.
Between the commands there must be a pipe “|”. For continuing on the next line one needs a “\” behind the pipe to escape the invisible newline.
#!/usr/bin/ksh
{ while read line;do
print - "$line"
done } |\
tee -a /path/mymailfile |\
awk '
/^From/ || /^Reply/ {
for(i=1;i<=NF;i++){
if($i ~ /@/){
print $i
}
}
}
' |\
sed '
s/[<>]//g;
s/[()]//g;
s/"//g;
…more substitutions for really extracting the email only…
' |\
{ while read addr;do
if [[ $(grep -c $addr /path/antimailfile) -gt 0 ]];then
mail $addr < /path/badmail
else
mail $addr < /path/goodmail
fi
done }
With #!/usr/bin/nawk -f the whole script is interpreted entirely as an awk script and no shell escapes are needed anymore, but one can and has to do everything in awk itself. It is nawk because of the getline function.
While iterates until the expression becomes false or until a break is encountered.
Gsub() is for string substitution.
Getline reads in a line each time it is called.
System() executes a unix command.
“>>” appends to a file.
This script is an example only. For really extracting email addresses several special cases would have to be considered…
#!/usr/bin/nawk -f
# Lines from a mail are dropping in over stdin. Append every line to a
# file before checking anything.
{ print >> "/path/mymailfile" }
# Find lines with From: or Reply: at the beginning.
/^From:/ || /^Reply/ {
# Find fields with @. Iterate over the fields and check for @
for(i=1;i<=NF;i++){
if($i ~ /@/){
# Clean the email address with gsub()
gsub(/[<>()"]/,"",$i)
# Check whether the email address is in the antimailfile
while( getline antiaddr < "/path/antimailfile" ){
# Compare actual address in $i with loaded address
if($i == antiaddr){
# Send a negative mail
system("mail " $i " < /path/badmail")
# Now end the while loop
break
}else{
# Send a positive mail
system("mail " $i " < /path/goodmail")
}
}
}
}
}
Calculate on columns and print formatted output
If one has formatted input of number columns one can still split it on white space, but has to consider the format for the output with printf().
#!/usr/bin/nawk -f
# Reprint lines without foo or boo
! /(foo|boo)/ { print }
# Rearrange and calculate with columns, but only on lines with foo or boo
/(foo|boo)/ {
# Extract fields
mytype = $1
var1 = $2
var2 = $3
var3 = $4
# Calculate
if(mytype == "foo"){
var1 *= 10
var2 += 20
var3 = log(var3)
}
if(mytype == "boo"){
var1 *= 4
var2 += 10
var3 = cos(var3)
}
# Print formatted output in reverse order
printf("%-4s%10.3f%10.3f%10.3f\n",mytype,var3,var2,var1)
}
How to iterate over each word of a shell variable in awk
In this example a shell variable is filled in first and then handed to awk. Awk splits it into an array, iterates over the array and looks for each word on the current line of a file. If it finds one, it prints the whole line.
#!/usr/bin/ksh
var="term1 term2 term3 term4 term5"
awk '
BEGIN { split(myvar,myarr) }
{
for(val in myarr){
if($0 ~ myarr[val]){
print
}
}
}
' myvar="$var" file
Functions
This example substitutes the first three occurrences of "searchterm" with a different term in each case; from the fourth occurrence on it just prints the line as it is.
It should show where to place a function and how to call it.
#!/usr/bin/nawk -f
BEGIN{
mysub1 = "first_sub"
mysub2 = "second_sub"
mysub3 = "third_sub"
mycount = 1
find = "searchterm"
}
{
if($0 ~ find){
if(mycount == 1){ replace(mysub1); }
if(mycount == 2){ replace(mysub2); }
if(mycount == 3){ replace(mysub3); }
if(mycount > 3){ print; }
mycount++
}else{
print
}
}
function replace(mysub) {
sub(find,mysub)
print
return
}
CGI with gawk
As an example for a CGI script in awk I make one which presents the unix man pages in html.
man.cgi
String functions
sub(regexp,sub) Substitute sub for regexp in $0
sub(regexp,sub,var) Substitute sub for regexp in var
gsub(regexp,sub) Globally substitute sub for regexp in $0
gsub(regexp,sub,var) Globally substitute sub for regexp in var
split(var,arr) Split var on white space into arr
split(var,arr,sep) Split var into arr using sep as separator
index(bigvar,smallvar) Find index of smallvar in bigvar
match(bigvar,expr) Find index for regexp in bigvar
length(var) Number of characters in var
substr(var,num) Extract characters from position num to the end
substr(var,num1,num2) Extract num2 characters starting at position num1
sprintf(format,vars) Format vars to a string
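A small sketch showing a few of these functions together (the input text is made up):
print "hello world" | awk '{
print length($1)               # 5
print substr($2,1,3)           # wor
print index($0,"world")        # 7
print sprintf("%s-%s",$2,$1)   # world-hello
gsub(/o/,"0")                  # replace every o with 0 in $0
print                          # hell0 w0rld
}'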
When to use awk, when to use perl?
Perl can do 100 times more than awk can, but awk is present on any standard unix system, whereas perl often has to be installed first. And for short commands awk seems to be more practical. The autosplit mode of perl splits a line into pieces called $F[0] through $F[$#F], which is not as nice as $1 through $NF in awk, where the whole line is also retained in $0 at the same time.
To get the first column of any file in awk and in perl:
awk '{print $1}' infile
perl -nae 'print $F[0],"\n";' infile