How to raise and take care of shell scripts: check input
Every input is evil?
Yes, of course. At least if it comes from a source you don't control completely. Input may arrive on the command line, from a file, from a command's output, ... The input data can be empty, can contain unexpected values (a letter where you expect digits), can cause numeric overflows, or can have a format that gets misinterpreted (some programs treat a number with a leading zero as octal, in which case 09 is invalid). Input can also contain control characters, ...
Your main principle should be: don't use input data from untrusted sources without validation. The list below is a selection of tools you can use to validate input. It is far from complete; there are almost unlimited ways to do the job. Where can you get more information? Bingo: in the manual pages!
- test: A shell builtin to check whether variables are empty or non-empty, to compare numeric
and alphanumeric values, and to check pathnames. There are two equivalent syntax forms:
if test -f "$VAR"; then ...
if [ -f "$VAR" ]; then ...
Which one you use is a matter of preference, but use the same form consistently within a script.
- tr: A command to replace or delete characters or character classes. It's useful,
for example, to check whether a variable is an integer:
test -z "$VAR" -o -n "`echo \"$VAR\" | tr -d '0-9'`" && echo "is not an integer"
Here is what is going on:
- -z "$VAR": The variable is empty.
- -o: Or (now from the inside to the outside)
- echo \"$VAR\" | tr -d '0-9': Deletes all digits from the variable's value and writes the result to standard output. The range is written without brackets, because a POSIX tr would treat the brackets as two more literal characters to delete. To learn why the quotes are escaped here, see Quoting.
- "`...`": Replaces the command with its standard output (see the chapter "command substitution" in your shell manual).
- -n "`...`": The result is not empty. This means that the variable's value contains non-digit characters, so it is not a (non-negative) integer.
- ... && echo ...: The echo is executed only if the preceding command returned a zero exit value (that is, if the variable was empty or its value contained at least one non-digit character).
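If you'd rather not start external commands for this check, a case statement can do the same job in any POSIX shell. The following is only a sketch; is_uint is a name made up for this example, not a standard command:

```shell
#!/bin/sh
# is_uint is a hypothetical helper name, not a standard command.
# Returns 0 if $1 is a non-empty string made only of the digits 0-9.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
        *)           return 0 ;;
    esac
}

# Try a few sample values, including an empty one and one with brackets
for value in 42 007 '' 4x2 -5 '[12]'; do
    if is_uint "$value"; then
        echo "'$value' is a non-negative integer"
    else
        echo "'$value' is not a non-negative integer"
    fi
done
```

Only 42 and 007 pass. Note that 007 is accepted: the check says nothing about leading zeros, so you may still have to deal with those before doing arithmetic.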
With bash, parameter expansion does the same check without an external command:
test -z "$VAR" -o -n "${VAR//[0-9]/}" && echo "is not an integer"
- cut: As the name says, it cuts fields or characters from input lines. You can use it to
check parts of an input (in bash this is partly possible with
${VAR:start:length}). If you want to know whether the second field of a line of
semicolon-separated values is the character "A", do it like this:
test "`echo \"$VAR\" | cut -f2 -d';'`" = 'A' || echo "field2 is not an 'A'"
- expr: Normally a command to calculate values. You can "misuse" it to transform numeric
values into a more appropriate format (e.g. drop leading zeros):
echo `expr 09 + 0`
This gives you the value "9", which is a valid parameter for commands that interpret values with a leading zero as octal numbers. The following code, for example, produces an error:
VAR=09; echo $((VAR * 2))
The error occurs because the shell's arithmetic expansion tries to interpret 09 as an octal value.
- grep: Now we are diving into the regular expressions labyrinth.
Using grep you can search input values for patterns. The following example checks whether a
variable's value starts with the "H" character and continues with at least one lowercase letter. Any other character
is invalid:
For basic regular expressions:
echo "$VAR" | grep -q '^H[a-z][a-z]*$' && echo "input o.k."
Using extended regular expressions:
echo "$VAR" | grep -qE '^H[a-z]+$' && echo "input o.k."
Missing the test command here? Continue reading at True/False.
- sed: This command also works with regular expressions. Like grep, it can search for
patterns, but it can also (among many other things) do replacements. The next examples
return the first integer from a value (if there is one):
For basic regular expressions:
echo "$VAR" | sed -n '/[0-9]/s/[^0-9]*\([0-9][0-9]*\).*/\1/p'
Using extended regular expressions:
echo "$VAR" | sed -nr '/[0-9]/s/[^0-9]*([0-9]+).*/\1/p'
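The sed extraction and the leading-zero problem from the expr item can be combined. This is a minimal sketch, assuming single-line input; first_decimal is a made-up helper name, and it uses POSIX parameter expansion instead of expr to drop the zeros:

```shell
#!/bin/sh
# first_decimal is a hypothetical helper name.
# Prints the first run of digits found in $1 as a decimal number without
# leading zeros. Returns 1 if $1 contains no digit at all.
# Note: POSIX sh has no local variables, so digits and stripped are global.
first_decimal() {
    # Same basic-regex sed command as in the list: prints nothing if no digit
    digits=$(printf '%s\n' "$1" | sed -n '/[0-9]/s/[^0-9]*\([0-9][0-9]*\).*/\1/p')
    [ -n "$digits" ] || return 1
    # ${digits%%[!0]*} is the run of leading zeros; remove it from the front
    stripped=${digits#"${digits%%[!0]*}"}
    # A value made of zeros only would now be empty, so fall back to 0
    printf '%s\n' "${stripped:-0}"
}

# The result is safe for arithmetic expansion: no octal surprise
n=$(first_decimal "build-0042-rc9") && echo "$((n * 2))"   # prints 84
```

Very long digit strings can still overflow the shell's arithmetic, so a length or range check may be needed on top of this.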
I won't start a "regex tutorial" here; otherwise this page would never end. RTFM! ("Read The F... Manual" ;-)
To analyse input values, use your imagination! Want to check whether the input is a valid user or group name? getent. Want to know whether a given process id exists? ps. Is the file system you want to unmount actually mounted? mount or df. Is the input really a date? date. I could extend this list almost infinitely, but then you would never see the end of this page %-/
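A few of these ideas could look like the sketch below. All function names are invented for this example. I'm assuming a system that has getent and GNU date (the -d option is a GNU extension), and for the process check I swapped ps for the kill -0 idiom, which only succeeds for processes you are allowed to signal:

```shell
#!/bin/sh
# All function names below are hypothetical helpers.

# True if getent finds $1 in the passwd database.
# Careful: getent also accepts a numeric user id as key.
is_user() {
    [ -n "$1" ] && getent passwd "$1" >/dev/null 2>&1
}

# True if $1 is the id of a process we are allowed to signal.
# kill -0 sends no signal, it only performs the existence/permission check.
is_own_pid() {
    case $1 in
        ''|*[!0-9]*|0*) return 1 ;;   # empty, non-numeric, or leading zero
    esac
    kill -0 "$1" 2>/dev/null
}

# True if GNU date can parse $1. This also accepts relative
# expressions such as "tomorrow", so it is a loose check.
is_date() {
    [ -n "$1" ] && date -d "$1" >/dev/null 2>&1
}

is_own_pid "$$" && echo "process $$ is running"
is_date "not a date" || echo "'not a date' is not a date"
```

On systems without getent or GNU date, these functions simply report failure, so check which tools your target platform offers first.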

