Take your BASH scripting seriously

Posted by Jesse Portnoy on May 03, 2023 · 8 mins read

In 2007, the legendary Larry Wall, creator of Perl, wrote an article called Programming is Hard, Let’s Go Scripting.
I’d recommend giving it a good read but here are some especially important quotes I think are worth pondering:

.. scripting is not a technical term. When we call something a scripting language, we’re primarily making a linguistic and cultural judgment, not a technical judgment.

Suppose you went back to Ada Lovelace and asked her the difference between a script and a program. She’d probably look at you funny, then say something like: Well, a script is what you give the actors, but a program is what you give the audience. That Ada was one sharp lady…

Since her time, we seem to have gotten a bit more confused about what we mean when we say scripting. It confuses even me, and I’m supposed to be one of the experts.

I absolutely agree. Why am I bringing this up in this context? Because it seems to me that many people are under the impression that if it’s “just a script”, it needn’t be written with care and failing to do a good job with it is not a reflection of you as a “serious” programmer. Nothing could be further from the truth. If you’re a professional programmer, you should take your script writing very seriously. It’s not purely a matter of professional pride, either. Take a moment and think of what you often use shell scripts for…

And… time’s up! I’d venture that for at least 70% of readers, installation/deployment/initialisation came to mind. Now, let me ask you this: if your installation process is poorly written and error prone, how will people get to use your otherwise brilliantly written software? That’s right, they won’t. They’ll invoke the installation script (be it directly or via a package manager running through the different hooks), it will fail and, unless they absolutely MUST have it (which, let’s face it, is hardly ever the case as there are alternatives to almost everything out there), they’ll curse some and move on to the next plausible solution.

Okay, so, hopefully, I convinced you that it IS important to get that bit right. Now let’s discuss some ways to do so. I’ll specifically focus on BASH here and the first, crucial point we’ll discuss is this:

/bin/sh does not always mean /bin/bash

In the olden days, that was only true of non-Linux Unices. On Linux, /bin/sh was always (by default; of course, you were able to change that) a symlink pointing to /bin/bash. So, if your product only ran on Linux, you could save yourself the headache of thinking about compatibility with other shells.

At a certain point, however, some distros (I first encountered it on Ubuntu back in 2007 but it soon became the norm on Debian as well) started using DASH as the default /bin/sh. Why? Because while BASH is lovely and feature rich and wonderful as an interactive shell, it is also, due to all these pleasing traits, slower; since init scripts are traditionally written to be run by a Bourne compatible shell and people HATE waiting for things to boot, a transition was made.

Regardless of the move to DASH, I was never fond of ignoring all but BASH, myself, since I think that even if at a given moment, you’re only targeting Linux where you know BASH will likely be present, you don’t want shell compatibility issues (of all things!) to be a blocker to porting your project to other ENVs.

I’m not saying you have to write a version for every shell under the moon (in fact, I’d even go as far as to say you absolutely shouldn’t) but being aware of the fact that BASH has features that other Bourne compat shells (let alone shells that do not have this common base at all) may not and trying, when it’s not too big of a hassle, to stick to the common denominator, is good practice.
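To make the common denominator point a bit more concrete, here is a minimal sketch of a counting loop that sticks to POSIX constructs (a while loop and $((...)) arithmetic expansion), so it should behave the same under BASH, DASH and other Bourne compatible shells. The function name count_to is made up purely for illustration.

```shell
#!/bin/sh
# count_to is a hypothetical helper: print 0 .. N-1, one number per line.
# Only POSIX features are used: no (( )) loops and no echo -e.
count_to() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '%s\n' "$i"
        i=$((i + 1))
    done
}

# Prints 0, 1 and 2 on separate lines.
count_to 3
```

Using printf rather than echo is a deliberate choice here, since the handling of echo options and escapes is one of the things that varies between shells.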

When you absolutely do need BASH specific features, specifying #!/bin/bash or, better yet, #!/usr/bin/env bash will prevent users from trying to run your script using other shells. If BASH isn’t present, it will fail straight away (which may sound bad to novices but is actually a good thing), whereas, if you use /bin/sh and end up with your script running with, say, DASH, it may only fail later down the line and in a more confusing and frustrating way, after having done some actions that may have left the system in a half baked state.
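One caveat: a shebang only helps when the script is executed directly; someone can still run it as sh script.sh and bypass it. A possible extra guard is sketched below. It assumes that checking the BASH_VERSION variable (which BASH sets and other shells normally do not) is good enough for your purposes, and the function name require_bash is invented for this example.

```shell
#!/usr/bin/env bash
# require_bash is a hypothetical helper. It is deliberately written in plain
# POSIX syntax so that a non-BASH shell can parse it and reach the message.
require_bash() {
    if [ -z "${BASH_VERSION:-}" ]; then
        echo "This script needs BASH; please run it with bash." >&2
        return 1
    fi
}

# Typical use, at the very top of a script, before any BASH specific syntax:
# require_bash || exit 1
```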

Alright, at this point, you may be thinking: “Okay, you’ve convinced me. From now on, I’ll be more explicit and people will know they must have BASH present for the installation to succeed and run properly but, I’m not going to avoid my beloved BASH features.”

Indeed, in most cases, especially if your target audience consists mostly of Linux users and you already wrote loads of code, this is a reasonable approach. There’s one small fact I’d be remiss not to mention though: BASH is.. chubby:)

On my Debian machine:

jesse@jessex:~/kalcli_saas$ du -sxh /bin/bash
1.2M /bin/bash
jesse@jessex:~/kalcli_saas$ du -sxh /bin/dash
124K /bin/dash

This difference may feel negligible to you and, in most cases, it really is, but if you’re in the embedded business, it may not be. Just something to keep in mind.

Right. So, how can one check whether one’s scripts are compatible? Well, other than running them (which is easy enough but, depending on what they do, may take some time), the easiest way is the simplistic approach of invoking your shell with:

-n /path/to/script

For example, here’s a snippet that’s BASH specific:

#!/bin/bash

for ((i=0;$i<100;i++)){
    echo -e "$i"
}

If you run this with BASH, it will output what you’d expect it to.
If you run this with DASH, however, you’ll get:

for.sh: 3: Syntax error: Bad for loop variable

If you run bash -n for.sh, you will get no output and the RC will be zero (note to self: write a post some day about how elegant this method of denoting success or failure is). If you invoke dash -n for.sh, you will get the same error as when running the script without -n. In other words, when passing -n to your shell, a simple, built-in linter is called: the script is read and checked for syntax errors but no commands are executed.
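If you want to run that -n check against several shells in one go, something along these lines could work. It is only a sketch: check_syntax is a hypothetical helper name, and the list of shells is whatever you choose to pass in.

```shell
#!/bin/sh
# check_syntax is a hypothetical helper: run "SHELL -n SCRIPT" for each shell
# named after the script path. Shells that are not installed are skipped.
# Returns 0 if every available shell accepted the script, 1 otherwise.
check_syntax() {
    script=$1
    shift
    result=0
    for candidate in "$@"; do
        if ! command -v "$candidate" >/dev/null 2>&1; then
            echo "skip: $candidate (not installed)"
            continue
        fi
        if "$candidate" -n "$script" 2>/dev/null; then
            echo "ok:   $candidate"
        else
            echo "FAIL: $candidate"
            result=1
        fi
    done
    return "$result"
}

# Example: check_syntax for.sh bash dash
```

The shell's own error output is discarded here to keep the summary short; drop the 2>/dev/null if you want to see the actual syntax errors.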

This is super easy but may not always be enough. Luckily, the fine people at ShellCheck gave this some thought and created a nifty tool. It’s got a nice README, is available via most decent package managers and even has a web interface where you can test snippets (in case you’re bored and on the tube, I guess). The README also includes a Gallery of bad code to showcase some of the stuff shellcheck can pick up on.
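As a rough illustration of how shellcheck can be used for the portability question specifically, the sketch below wraps it in a small function. The wrapper name lint_as_sh is hypothetical; the --shell option is shellcheck's own way of choosing which dialect to check against.

```shell
#!/bin/sh
# lint_as_sh is a hypothetical wrapper around the shellcheck CLI.
# --shell=sh asks shellcheck to judge the script as plain POSIX sh,
# whatever its shebang says, so BASH only constructs get reported.
# Returns 127 if shellcheck is not installed.
lint_as_sh() {
    if ! command -v shellcheck >/dev/null 2>&1; then
        echo "shellcheck is not installed; skipping lint." >&2
        return 127
    fi
    shellcheck --shell=sh "$1"
}

# Example: lint_as_sh for.sh
```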

Okay, excellent. So far, we’ve covered why it’s important to make no assumptions as to the default shell and how to check whether our scripts depend on BASH specific features and why that may be a problem. Let’s move on to the next painful mistake people make when writing shell scripts.

Unbridled Optimism

The fun thing about shell scripting is that it’s mostly just glue. You use shell features and constructs to chain together different utilities with some logic.

Often, these scripts are written with the [overly naive] assumption that all ENVs will have the same utils.

Never assume. Always check. For instance, if your script needs ffmpeg, be sure to start it off with a simple which check:

# check that we have the needed binaries
BINARIES="ffmpeg avifenc"
for BINARY in $BINARIES; do
	# which prints nothing when the binary is missing, so the -x test fails
	if [ ! -x "$(which "$BINARY" 2>/dev/null)" ]; then
		echo "Need to install $BINARY."
		exit 2
	fi
done

In the above, we’re actually checking for two binaries; of course, you can add as many as you need.
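A variant of the same idea, sketched with the POSIX command -v builtin instead of which (which is itself an external utility that may not be installed everywhere). The function name require_binaries is made up for this example; note also that command -v matches shell builtins and functions as well as executables on the PATH, which may or may not be what you want.

```shell
#!/bin/sh
# require_binaries is a hypothetical helper: report every missing binary
# and return 2 (the same code the check above exits with) if any is absent.
require_binaries() {
    missing=0
    for binary in "$@"; do
        if ! command -v "$binary" >/dev/null 2>&1; then
            echo "Need to install $binary." >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 2
}

# Example: require_binaries ffmpeg avifenc || exit $?
```

Reporting all the missing binaries before giving up saves the user from fixing them one failed run at a time.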

Of course, correctly declaring your deps is made far easier if you use standard packaging formats (e.g. deb, RPM, etc.) and one day, I’ll finish my Docker is not a packaging tool series and solidify that point further:) Still, even when using proper packaging tools, it does not hurt to have these defences in place as you never know in which context your scripts may run (copy paste party anyone? No? Perhaps a porting party then?).

Some people reading this may say that checking for the needed dependencies is redundant since you can toggle BASH’s -e option to make it exit upon any failure. This is true and -e is a very important flag that I do intend to discuss at some length but, I’d argue that outputting an orderly message and exiting with a pre-expected RC is better whenever possible.

Next is a slightly more annoying and related problem.

Certain utils may have different flags across Unices

I am a Linux user. It’s not that I don’t respect my fellow .*BSD chaps or any of the other FOSS Unices out there, I do; and there are advantages and disadvantages to everything but I will say this: Linux distributions, generally speaking, come with the most pampering utils included:)

Let me choose just two favourite examples (I could give dozens off the top of my head without consulting man pages once, but that’s just me bragging; not a fetching trait, alas):

On Linux, I can invoke the standard netstat util (part of the net-tools package and yes, I know it’s obsoleted by the ss util, I don’t want to get into it right now, though), thusly:

$ netstat -plntu

And get this most useful output:

Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1181016/nginx: mast
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN 172070/python3
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 51144/sshd: /usr/sb
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 1581154/postgres
tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN 1589222/master
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 1181016/nginx: mast

So, I can see the local addr, the foreign one, the ports AND what process is listening and even its PID(!) if I’m a privileged enough user. Very handy, indeed. Go ahead, try that on Darwin (the real power behind Mac OS) or, if you’re feeling very adventurous and can find one, on AIX:)
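One way to cope with this in a script is to branch on the kernel name reported by uname -s. The sketch below is illustrative only: listening_sockets_cmd is a hypothetical name, and the non-Linux command lines (lsof on Darwin, a bare netstat -an elsewhere) are my assumptions about reasonable substitutes, which you should verify on your own targets.

```shell
#!/bin/sh
# listening_sockets_cmd is a hypothetical helper: given a kernel name as
# printed by `uname -s`, echo a command line for listing listening sockets.
# The per-OS choices below are assumptions, not tested recommendations.
listening_sockets_cmd() {
    case "$1" in
        Linux)  echo "netstat -plntu" ;;
        Darwin) echo "lsof -nP -iTCP -sTCP:LISTEN" ;;
        *)      echo "netstat -an" ;;
    esac
}

# Example: run whatever suits the current system (relies on word splitting).
# $(listening_sockets_cmd "$(uname -s)")
```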

Want another example? Let’s take our beloved awk. On Linux, the version you’ll find on most (if not all; I’m just super cautious by nature) distros is GNU AWK (gawk), where the default field separator is wisely set to whitespace, so if I wanted to use it to get the Local Address column from the above netstat output, I could simply do:

$ netstat -plntu | awk '{print $4}'

And get:

Local
0.0.0.0:80
127.0.0.1:8080
0.0.0.0:22
127.0.0.1:5432
0.0.0.0:25
0.0.0.0:443
:::80
:::22
:::3000
:::25
:::443
127.0.0.1:323
::1:323

On all the other Unices I ever worked with (and I’ve worked on many), you need to explicitly specify -F " " to get the same result, which is why I always do so, and so should you:)
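For completeness, here is what the explicit form might look like when wrapped in a small function. The name local_address is hypothetical, and the sample line is copied from the netstat output shown earlier.

```shell
#!/bin/sh
# local_address is a hypothetical helper: read netstat style lines on stdin
# and print the fourth field, with the field separator stated explicitly.
local_address() {
    awk -F ' ' '{ print $4 }'
}

# Prints 0.0.0.0:80
printf 'tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN\n' | local_address
```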

How can you catch these cases without trying? You can’t, really; you simply have to test your scripts on as many Unices as you can get your hands on and encourage your users to report issues by being kind and appreciative when they do.

Right, on to my next piece of advice.

Status codes exist for a reason

A common (and odd!) tendency people have when they do bother to add tests to their scripts is to [in case of failure] echo some message and invoke exit without specifying a status code (or, almost as bad, always use the same one).

Why is this so bad? Because it will not always be a human being interactively running the script. If you’re writing an init script, for example (and yes, on Linux most people have switched to systemd and friends, but there are still plenty of older distros that are supported by many projects that do not have systemd support, plus, Linux is not the only animal out there), or a test meant to be run by a CI/CD solution, having well defined status codes is imperative. By the way, this is also true when writing RESTful APIs (nothing more annoying than an API by people who feel all cases can be perfectly covered by returning either HTTP 200 or HTTP 404!).
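To illustrate, here is a minimal sketch of a script fragment with a distinct status code per failure. Everything in it is hypothetical: the function names die and show_file, and the choice of codes (64 and 66 are borrowed from the BSD sysexits.h convention, 2 matches the missing dependency check earlier in this post). Pick whatever scheme suits your project, as long as it is documented and consistent.

```shell
#!/bin/sh
# Hypothetical status codes for an imaginary script.
EX_NODEP=2      # a required binary is missing
EX_USAGE=64     # called with the wrong arguments
EX_NOINPUT=66   # input file missing or unreadable

# die CODE MESSAGE...: print the message to stderr and exit with CODE.
die() {
    code=$1
    shift
    echo "$*" >&2
    exit "$code"
}

# show_file FILE: print FILE, failing with a specific code for each problem.
show_file() {
    [ "$#" -eq 1 ] || die "$EX_USAGE" "Usage: show_file FILE"
    [ -r "$1" ] || die "$EX_NOINPUT" "Cannot read $1"
    command -v cat >/dev/null 2>&1 || die "$EX_NODEP" "Need to install cat."
    cat "$1"
}
```

A caller (an init system, a CI job or another script) can then tell a usage mistake from a missing file simply by inspecting $?, without having to parse the error text.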

This ends part one of this post. In the next instalment, we’ll cover other useful BASH flags/modifiers (-e, -o, -x), discuss trapping and handling errors and how to make our users’ (and our own) lives easier with proper argument parsing and usage messages.

Happy scripting,