Heise writes an introduction to Bash programming (in German):
Bash ist eine vollwertige Programmiersprache, mit der Sie alltägliche Aufgaben leicht automatisieren.
Bash is a fully featured programming language that you can use to automate everyday tasks.
Bash is not a fully featured programming language at all, and nothing in bash is ever easy. You are advised to use a proper programming language early on in development, and if possible never put bash commands into a file.
A few early warning signs to look out for:
- Bash is somewhat okay at handling files. If you find yourself handling lines, words or characters instead of entire files, you are using the wrong tool. The script you are working on should have been written in something else.
- Bash is really bad at math. If you are doing math, especially with anything other than small positive integers, you should have been using something else.
- Bash is really bad at handling any kind of UI. If you start thinking about curses, Tk or Qt, you should have been using something else.
Bash is also bad at safely handling filenames with weird characters in them, bad at handling Unicode, bad at handling errors and bad at many other elementary things.
Basically, it is better to start in something else right away once things move away from the interactive command line and end up in a file. Use whatever you like as an interactive command line, but do not write bash or shell scripts.
Shell is a thing you want to understand and then not use, because you learned to understand it.
For the rest of this discussion, we assume “Python 3” as an instance of “something else”, but if you are older than 50, feel free to use “Perl” instead.
If you are already doing Python, the rest of this is not for you. You already know these things.
Do not modify the system Python
Use the system Python, if possible, but do not try to modify the system Python installation. Use a virtual environment for packages, instead.
This will create a local (symlinked) copy of the system Python, and activate it as the interpreter environment to modify if you install dependencies. You will want to update wheel and then maintain a file named requirements.txt at the top level of the myscript directory. It will contain the names of the packages (optionally with version pins) you depend on. You can install the dependencies using pip install -r requirements.txt.
It deactivates the venv, throws away the installed venv, and then re-makes it from the requirements.
Yes, that is a shell script. There are ways to do the same things natively and better.
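As one possible way to do the same reset without a shell script, here is a minimal sketch that uses the standard library venv module. The function name rebuild_venv, the venv directory name and the POSIX bin/python layout are assumptions made for this illustration, not something taken from the original script.

```python
import os
import shutil
import subprocess
import venv
from pathlib import Path


def rebuild_venv(project_dir, with_pip=True):
    """Throw away project_dir/venv and re-create it from requirements.txt."""
    project = Path(project_dir)
    env_dir = project / "venv"                 # assumed directory name
    requirements = project / "requirements.txt"

    # Remove the old environment, if there is one.
    if env_dir.exists():
        shutil.rmtree(env_dir)

    # Create a fresh environment based on the running interpreter.
    # Symlinks are the usual choice on POSIX, copies on Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))

    # Install the dependencies with the interpreter inside the new venv.
    # Skipped when pip was not installed or no requirements file exists.
    if with_pip and requirements.exists():
        python = env_dir / "bin" / "python"    # assumed POSIX layout
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return env_dir
```

Calling rebuild_venv("myscript") would discard myscript/venv and rebuild it. No activation is needed, because the interpreter inside the venv is called by its path.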
Running a local Python package registry is as simple as exposing a directory with a specific file structure using a web server, see here, and a step-by-step walkthrough for a kind of minimal setup can be found here.
Do not try to write Bash in Python
In an online discussion, somebody remarked:
I do not think that subprocess.run(["ls", "-l", "/dev/null"], capture_output=True) is more intuitive or less error prone, but different people have different opinions.
That is correct.
The point here is that this is not useful at all in a Python program. That line will then produce output such as
crw-rw-rw- 1 root root 1, 3 Dec 7 12:15 /dev/null
and that needs parsing to be useful for anything. You’d not do that at all in Python, ever.
Now we can talk. In Bash, everything always is a string.
In a proper programming language, we have a wealth of basic data types, and can use them in containers to construct aggregate types or even objects, and we can make use of this.
With Path().stat() we get access to the same information in a useful form that combines nicely with any number of powerful language and library features.
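A minimal sketch of what that can look like: the stat result arrives as typed attributes (integers, timestamps) instead of a line of text that needs parsing. The helper name describe and the choice of fields are only for illustration.

```python
from datetime import datetime
from pathlib import Path


def describe(path):
    """Return a few stat fields of path as native Python values."""
    info = Path(path).stat()
    return {
        "size": info.st_size,                               # int, in bytes
        "modified": datetime.fromtimestamp(info.st_mtime),  # datetime object
        "uid": info.st_uid,                                 # numeric owner
        "gid": info.st_gid,                                 # numeric group
        "links": info.st_nlink,                             # hard link count
    }


if __name__ == "__main__":
    for key, value in describe("/dev/null").items():
        print(f"{key}: {value!r}")
```

Because the values are real integers and datetime objects, they can be compared, sorted and used in arithmetic directly.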
If you need to run commands, consume JSON
So what if you have to run an external command to do things?
Hopefully the external commands produce something structured such as JSON:
As an example, a script can use the command lvs --reportformat=json to list all LVM2 logical volumes in the system. The command is specified as an argument list, so no interim bash is spawned; instead the command is run directly from Python, using subprocess.run(), capturing the output. The output, being JSON formatted, is turned into native Python data structures using json.loads(). We then iterate over the data we collected, printing a system report from Python.
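The steps described above can be sketched roughly as follows. This is not the original script. The function names are made up for this example, and the field names (report, lv, vg_name, lv_name, lv_size) reflect the default lvs JSON report as I understand it, so check them against the output on your own system.

```python
import json
import subprocess


def list_logical_volumes():
    """Run lvs directly (no shell) and return its parsed JSON report.

    Needs LVM2 installed and usually root privileges.
    """
    result = subprocess.run(
        ["lvs", "--reportformat=json"],   # argument list, so no shell is spawned
        capture_output=True,
        text=True,
        check=True,                       # raise CalledProcessError on failure
    )
    return json.loads(result.stdout)


def format_report(data):
    """Turn the parsed lvs report into printable lines."""
    lines = []
    for report in data["report"]:
        for lv in report["lv"]:
            lines.append(f'{lv["vg_name"]}/{lv["lv_name"]}: {lv["lv_size"]}')
    return lines


if __name__ == "__main__":
    for line in format_report(list_logical_volumes()):
        print(line)
```

Keeping the parsing separate from the subprocess call makes it possible to test the report logic with a canned JSON document, without LVM or root.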
There are other ways to do the same: LVM2 offers liblvm2app and an API, and a binding of that API to Python exists (but seems to be rather unmaintained, so I’d rather use the JSON approach).
Useful modules that come with the system
sys, os, pathlib
sys holds the metadata about your Python environment. It offers you detailed introspection about the version, the base operating system platform, and many other things that relate to your runtime environment.
os is the access to the operating system, allowing you to manipulate files and many other base operating system abstractions.
pathlib is a higher-level convenience interface built on top of that, which overloads the / operator and allows you to manipulate operating system path names in a portable way. It offers functions to parse pathnames, is os.stat() aware and has dirname() functionality, plus globbing.
Path can completely replace os-like file access, and there is a handy table at the end of the pathlib documentation.
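A short sketch of the pathlib features mentioned here: joining with the / operator, taking a path apart, and recursive globbing. The path /var/log/myscript/run.log and the helper name python_sources are hypothetical.

```python
from pathlib import Path


def python_sources(root):
    """Return all *.py files below root, sorted, relative to root."""
    root = Path(root)
    return sorted(p.relative_to(root) for p in root.rglob("*.py"))


# Joining and parsing path names, no filesystem access involved.
target = Path("/var/log") / "myscript" / "run.log"   # hypothetical path
print(target.parent)    # the dirname() equivalent
print(target.name)      # the basename() equivalent
print(target.suffix)    # the file extension, here ".log"
```

rglob() walks the subtree recursively, while glob() stays in one directory unless the pattern itself contains "**".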
Basic shell file operations can be handled with shutil.
This module has a number of file copy operations available, which are aware of operating system specifics and modern metadata presence. There are also copytree and rmtree operations. Since Python 3.8 there are operating system specific high-efficiency implementations available which are network drive aware and are automatically used (macOS fcopyfile, os.sendfile(), and others).
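As a small illustration of copytree and rmtree, the sketch below replaces a destination directory with a fresh copy of a source directory. The function name snapshot is an assumption for this example.

```python
import shutil
from pathlib import Path


def snapshot(src_dir, dst_dir):
    """Replace dst_dir with a fresh recursive copy of src_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)

    # Remove a stale copy first, the equivalent of rm -rf.
    if dst_dir.exists():
        shutil.rmtree(dst_dir)

    # Recursive copy; the default copy function is copy2(), which
    # tries to preserve file metadata such as modification times.
    shutil.copytree(src_dir, dst_dir)
    return dst_dir
```

For single files, shutil.copy2(src, dst) gives the same metadata-preserving copy.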
The module also offers a set of functions that deal with common archive formats such as tar, and other compressors. There is a framework to register additional compressors and archive formats.
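A minimal sketch of the archive helpers, packing a directory into a gzip-compressed tar file and unpacking it again. The function name archive_roundtrip is made up for this example, and the "gztar" format assumes that zlib is available in your Python build.

```python
import shutil


def archive_roundtrip(src_dir, archive_base, extract_dir):
    """Pack src_dir into archive_base.tar.gz, then unpack it into extract_dir."""
    # make_archive() appends the format specific extension itself and
    # returns the name of the archive file it created.
    archive = shutil.make_archive(str(archive_base), "gztar", root_dir=str(src_dir))

    # unpack_archive() picks the unpacker based on the file extension.
    shutil.unpack_archive(archive, str(extract_dir))
    return archive


# The formats known to this installation, as (name, description) pairs.
print(shutil.get_archive_formats())
```

Keep the archive outside of the directory being packed, otherwise the archive may end up containing itself.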
Note that the walk() function is part of the os module, not the shutil module. It can be used to iterate over a filesystem subtree in a number of ways, offering find(1) like functionality.
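To make the find(1) comparison concrete, here is a sketch that roughly corresponds to a find with a size test and a pruned directory. The function name and the choice to skip .git directories are assumptions for this example.

```python
import os


def find_larger_than(root, min_bytes):
    """Yield regular files below root that are at least min_bytes in size."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place prunes the walk, like find's -prune:
        # os.walk() will not descend into the removed directories.
        dirnames[:] = [name for name in dirnames if name != ".git"]

        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() is False for broken symlinks, so getsize() is safe here.
            if os.path.isfile(path) and os.path.getsize(path) >= min_bytes:
                yield path
```

The results are plain strings, one per file, so filenames with spaces or newlines need no special quoting.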
argparse, click, docopt
There are a number of more refined options that allow for positional, keyword and more restricted optional arguments (the type we have been using here), with typechecking and choices.
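A minimal argparse sketch showing a positional argument, a typechecked option and an option restricted to a set of choices. The option names and the description are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical report tool")
    # Positional argument: one or more path names.
    parser.add_argument("paths", nargs="+", help="files to report on")
    # Option restricted to a fixed set of choices.
    parser.add_argument("--format", choices=["text", "json"], default="text")
    # Option with typechecking: non-integers are rejected with an error.
    parser.add_argument("--limit", type=int, default=10)
    # Simple boolean flag.
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


# Parse an explicit argument list; a real program would call
# parse_args() without arguments to read sys.argv.
args = build_parser().parse_args(["--format", "json", "--limit", "5", "a.txt"])
print(args.paths, args.format, args.limit, args.verbose)
```

The parser also generates the --help output and the usage errors, which is a large part of what a hand-written getopts loop in shell has to do manually.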
docopt is built around the concept of docstrings, so option parsers are configured from the program documentation at the start of the program.
click is built around the concept of Python decorators: options and arguments are declared by decorating the function that implements the command. Click is very complete, extensible and specifically the tool of choice for large commands that require the implementation of subcommands (git commit type interfaces).
Also useful in this context is fileinput, a helper that consumes pathnames from the command arguments and offers you the lines from the files named, in one single stream or separated. There are a number of support options for writing filters, in-place file changes and similar programs. It is possible to install decompressor/compressor hooks, as well as data encoders/decoders.
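A small sketch of fileinput reading several files as one stream while still knowing which file and line each record came from. The function name numbered_lines is an assumption for this example.

```python
import fileinput


def numbered_lines(paths):
    """Yield (filename, line number within that file, text) for all paths."""
    # Note: an empty list of paths makes fileinput read from standard input.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield stream.filename(), stream.filelineno(), line.rstrip("\n")
```

Called without the files argument, fileinput.input() takes the file names from sys.argv[1:], which is the typical setup for a filter program.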
The os.stat() example from the very beginning of this text is more easily expanded on with the stat module, which has a number of constants and helpers that make more sense out of the data delivered by the operating system.
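For illustration, a sketch that uses the stat helpers to turn the raw st_mode integer into a file type, an ls-style permission string and the octal permission bits. The function name classify and the set of types it distinguishes are choices made for this example.

```python
import os
import stat


def classify(path):
    """Return (file type, ls-style mode string, octal permissions) for path."""
    mode = os.lstat(path).st_mode    # lstat: do not follow symlinks

    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISCHR(mode):
        kind = "character device"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"

    # filemode() renders the same string ls -l shows in its first column,
    # S_IMODE() strips the file type and keeps only the permission bits.
    return kind, stat.filemode(mode), oct(stat.S_IMODE(mode))


if __name__ == "__main__":
    print(classify("/dev/null"))
```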
Temporary filenames, files and file handles can be made safely with tempfile, another standard library module.
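One common use, sketched here under the assumption that the target lives on a POSIX filesystem: write the new content to a temporary file in the same directory, then rename it over the target, so readers never see a half-written file. The function name write_atomically is made up for this example.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(target, text):
    """Write text to target via a temporary file and an atomic rename."""
    target = Path(target)

    # Create the temporary file next to the target so that the final
    # rename does not cross filesystem boundaries.
    with tempfile.NamedTemporaryFile(
        "w", dir=target.parent, prefix=target.name + ".", suffix=".tmp",
        delete=False,
    ) as tmp:
        tmp.write(text)

    # os.replace() overwrites an existing target in one step.
    os.replace(tmp.name, target)
    return target
```

For scratch space that should vanish afterwards, tempfile.TemporaryDirectory() used as a context manager removes the directory and its contents on exit.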
There are a number of modules that deal in comparisons of files: difflib computes diff-like output with a nice programming interface, and filecmp compares files and directory trees, finding files with different content or attributes.
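A brief sketch of both modules. The function names are hypothetical. Note that filecmp.dircmp by default compares files by their os.stat() signature (type, size, modification time) and only looks at the content when that is inconclusive.

```python
import difflib
import filecmp


def unified_diff_text(old_text, new_text):
    """Return a unified diff between two strings, like diff -u."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile="old",
            tofile="new",
        )
    )


def changed_files(dir_a, dir_b):
    """Return names of files present in both directories that differ."""
    return sorted(filecmp.dircmp(dir_a, dir_b).diff_files)


print(unified_diff_text("a\nb\n", "a\nc\n"))
```

The dircmp object also offers left_only and right_only for files that exist on one side only.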
ini files, yaml and json
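As a minimal sketch for two of these formats, the standard library modules configparser and json are enough (YAML needs an external package and is left out here). The section name, keys and values are invented for this example.

```python
import configparser
import json

# Hypothetical ini file content.
INI_TEXT = """
[server]
host = localhost
port = 8080
debug = no
"""


def load_settings(ini_text):
    """Parse ini text into a dict with properly typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["server"]
    return {
        "host": section["host"],                 # str
        "port": section.getint("port"),          # int
        "debug": section.getboolean("debug"),    # bool: yes/no, on/off, ...
    }


settings = load_settings(INI_TEXT)
print(json.dumps(settings, indent=2))
```

The typed getters are the point: in a shell script all three values would be strings, and "no" would be truthy.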
sched, daemon, pidfile, and pystemd
A simple cron-like timer facility, sched, comes with the standard library.
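A minimal sketch of sched: events are queued with a delay and a priority, and run() executes them in time order. The function name run_jobs and the delays are chosen for this example.

```python
import sched
import time


def run_jobs():
    """Schedule two events out of order and return the order they ran in."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    log = []

    # enter(delay in seconds, priority, action, argument tuple)
    scheduler.enter(0.02, 1, log.append, argument=("second",))
    scheduler.enter(0.01, 1, log.append, argument=("first",))

    scheduler.run()    # blocks until the queue is empty
    return log


print(run_jobs())
```

sched only orders events inside one running process. It does not survive a restart, so it complements cron and systemd timers rather than replacing them.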
The external dependency pystemd allows you to speak D-Bus to talk to systemd, but you would not notice that from the usage: you can deal with systemd units as native Python objects and query and control them.
And of course, we already mentioned subprocess.run(), the Swiss Army knife of bad old shell interfacing. Make sure you prefer commands that can produce JSON; that will hurt a lot less.