Docker is not a packaging tool — part 2

Posted by Jesse Portnoy on May 12, 2023 · 8 mins read

In the last segment of this series, we briefly discussed the importance of starting the build process from a clean ENV. We left off promising to say a word about pkg-config and then move on to how proper packaging helps matters, how utilising chroots made things far easier way back when, why Docker was the next evolutionary step and, lastly, why, while grand, it is not an all-encompassing, magic holy grail solution to all your build and deployment headaches. So, without further ado…

pkg-config

As I often recommend doing, let’s start by looking at the man page:

pkg-config(1) General Commands Manual pkg-config(1)
NAME
pkg-config - Return metainformation about installed libraries
DESCRIPTION
The pkg-config program is used to retrieve information about installed libraries
It is typically used to compile and link against one or more libraries.
Here is a typical usage scenario in a Makefile:
cc program.c `pkg-config --cflags --libs gnomeui`

Okay, that’s a clear and accurate description. The key bit here is this:

It is typically used to compile and link against one or more libraries.

So, pkg-config can help us ascertain that we have the needed deps to build our software.
Let’s look at a package on my Debian machine to understand what pkg-config actually does for us (I chose libgif7 completely at random: I literally typed dpkg -L libg, hit Tab and picked one).
Here we can begin to see the advantage of packaging formats like deb and RPM, to wit: they store the installed packages in a local DB and provide tools to look up metadata about them. In the case of APT/deb/dpkg, if we want to see what files a given package includes, we can run:

dpkg -L <package-name>

Sample output:

$ dpkg -L libgif7
/.
/usr
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/libgif.so.7.2.0
/usr/share
/usr/share/doc
/usr/share/doc/libgif7
/usr/share/doc/libgif7/NEWS.gz
/usr/share/doc/libgif7/TODO
/usr/share/doc/libgif7/changelog.Debian.gz
/usr/share/doc/libgif7/changelog.gz
/usr/share/doc/libgif7/copyright
/usr/lib/x86_64-linux-gnu/libgif.so.7

Very helpful, indeed:)

NOTE: in this post, I’ll be mentioning several deb/dpkg/APT commands (I’m a proud Debian GNU/Linux user). Of course, not all Linux distros and certainly not all UNIX flavours use this toolchain/stack/your term here. Since RPM/YUM/DNF is a very common stack, I’ll make a reasonable effort to provide the counterpart commands for these as well. In this case, the counterpart of dpkg -L is rpm -ql.

You’ll notice that the above output includes no reference to any pkg-config files at all. To understand why, allow me to provide some background as to how Linux distros based on pre-built/compiled packages segment things and why.

Let’s start with the why: it is commonly agreed (though not always practised) that a file system should not contain files of any kind (config, binaries, scripts, what have you) that the system does not need in order to function. In the early, more naive days, this was mainly a question of disk space, which was very limited. With today’s resources, this point is somewhat less important (though not always; consider embedded devices) but, with the advancement and general availability of tech and computer resources, another problem has emerged, to wit: SECURITY. Put simply:

The more unneeded rubbish you have on your FS, the more vulnerable you are.

Another difficulty that has become more pronounced is that of managing dependencies and, of course, the fewer packages you have installed, the easier it is to manage them.

Now, let’s go into the how. Again, I’ll be covering how it’s done in deb based distros, as well as RPM based ones. The principle in both is the same:

Packages are built from a spec file (or, in the case of deb, multiple spec files, each serving a different purpose).
The spec defines the package deps (separated into those needed to build the package and those needed to run it) and specifies the files to be included in the package (binaries, configuration files, documentation, etc) along with their locations (these are fixed when the package is built; as a rule, you do not get to choose where files land when deploying deb and RPM packages). It also includes some metadata: package name, description, source and so on. Some of the metadata is mandatory (name, for instance), other bits are optional (for example, not all packages declare the source/repo the package came from, which is a shame, because it’s useful data).

Now, to bring us back to the question of why the libgif7 package includes no pkg-config files: one spec can declare multiple packages and, in the case of libraries like libgif, typically will.

To better explain this, let’s obtain the spec files for this package from my Debian 11 repo. We can do that with:

$ apt-get source libgif7

If you’ve run the above command, you’ll find that it has placed a directory called giflib-$VERSION in your CWD.

Inside it, you’ll find many different files, including the source for libgif of the version in question and a directory called debian where the spec files reside. Here’s what’s in there in my case:

-rw-r--r-- 1 jesse jesse 14506 Dec 20 2020 changelog
-rw-r--r-- 1 jesse jesse 1328 Dec 20 2020 control
-rw-r--r-- 1 jesse jesse 2371 Dec 20 2020 copyright
-rw-r--r-- 1 jesse jesse 10 Dec 20 2020 giflib-dbg.docs
-rw-r--r-- 1 jesse jesse 303 Dec 20 2020 giflib-tools.doc-base
-rw-r--r-- 1 jesse jesse 10 Dec 20 2020 giflib-tools.docs
-rw-r--r-- 1 jesse jesse 9 Dec 20 2020 giflib-tools.install
-rw-r--r-- 1 jesse jesse 24 Dec 20 2020 giflib-tools.manpages
-rw-r--r-- 1 jesse jesse 10 Dec 20 2020 libgif7.docs
-rw-r--r-- 1 jesse jesse 18 Dec 20 2020 libgif7.install
-rw-r--r-- 1 jesse jesse 1701 Dec 20 2020 libgif7.symbols
-rw-r--r-- 1 jesse jesse 10 Dec 20 2020 libgif-dev.docs
-rw-r--r-- 1 jesse jesse 46 Dec 20 2020 libgif-dev.install
drwxr-xr-x 2 jesse jesse 4096 Dec 20 2020 patches
-rwxr-xr-x 1 jesse jesse 887 Dec 20 2020 rules
drwxr-xr-x 2 jesse jesse 4096 Dec 20 2020 source
drwxr-xr-x 2 jesse jesse 4096 Dec 20 2020 upstream
-rw-r--r-- 1 jesse jesse 211 Dec 20 2020 watch

Let’s focus our attention on some of these:

  • control: defines the metadata for the packages to be produced (name, description, section and, very importantly: the build and runtime dependencies)
  • rules: contains the build instructions (Debian Policy expects this to be an executable makefile, but its targets can invoke any build tool you want; just remember to declare that tool as a build dep)
  • patches: in some cases, the package maintainer will apply patches to the upstream source (what is often referred to as pristine sources in RPM terminology). These will be placed in this directory and processed when rules is executed
  • changelog: this is a very important file; amongst other things, it allows us to easily tell what an upgrade includes (security fixes, new features, bug fixes and also breaking changes)

Right, this is all very interesting but again: why doesn’t the libgif7 package include any pkg-config files? Because they are part of the libgif-dev package.

NOTE: in RPM/YUM/DNF based systems, the package spec consists of a single file (package-name.spec) rather than multiple files as described above. The general concepts are very similar, however, and the spec file is divided into sections, each serving as a counterpart to the above debian spec files. Patches will typically reside under ~/rpmbuild/SOURCES. The naming convention for development packages is package-name-devel.

As we said before, one spec can declare several different packages and this is the case for many packages, especially those providing libraries. Let us take a closer look at the control file; for the benefit of shortening this article, we’ll use grep to extract the different packages this file declares:

$ grep "Package:\|Description" debian/control
Package: giflib-tools
Description: library for GIF images (utilities)
Package: libgif7
Description: library for GIF images (library)
Package: libgif-dev
Description: library for GIF images (development)
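
For scripts that need the same list without grep, here is a minimal Python sketch of the lookup. It works on a heavily abbreviated stand-in for the control file: only the Package and Description values come from the output above, the Source stanza is an assumption inferred from the giflib-$VERSION directory name, and parse_stanzas and binary_packages are hypothetical helper names rather than part of any Debian tooling.

```python
# Abbreviated, illustrative stand-in for debian/control (assumption: the real
# file carries many more fields, such as the build and runtime dependencies).
SAMPLE_CONTROL = """\
Source: giflib

Package: giflib-tools
Description: library for GIF images (utilities)

Package: libgif7
Description: library for GIF images (library)

Package: libgif-dev
Description: library for GIF images (development)
"""


def parse_stanzas(text):
    """Split control-file text into a list of {field: value} dicts."""
    stanzas, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line.startswith("#"):
            continue  # comment line
        elif line[0] in " \t":
            # Continuation of the previous field (e.g. a long description).
            if last_field is not None:
                current[last_field] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def binary_packages(text):
    """Return (name, short description) for each stanza with a Package field."""
    return [
        (s["Package"], s.get("Description", "").split("\n")[0])
        for s in parse_stanzas(text)
        if "Package" in s
    ]


if __name__ == "__main__":
    for name, summary in binary_packages(SAMPLE_CONTROL):
        print(f"{name}: {summary}")
```

The stanza without a Package field (the source stanza) is skipped, which leaves exactly the binary packages the spec declares.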

Okay, so we can see that three packages are declared in this spec. We’ve already seen what files libgif7 contains; let’s now look at libgif-dev:

$ dpkg -L libgif-dev
/.
/usr
/usr/include
/usr/include/gif_lib.h
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/libgif.a
/usr/lib/x86_64-linux-gnu/pkgconfig
/usr/lib/x86_64-linux-gnu/pkgconfig/libgif7.pc
/usr/share
/usr/share/doc
/usr/share/doc/libgif-dev
/usr/share/doc/libgif-dev/NEWS.gz
/usr/share/doc/libgif-dev/TODO
/usr/share/doc/libgif-dev/changelog.Debian.gz
/usr/share/doc/libgif-dev/changelog.gz
/usr/share/doc/libgif-dev/copyright
/usr/lib/x86_64-linux-gnu/libgif.so
/usr/lib/x86_64-linux-gnu/pkgconfig/libgif.pc

As you can see, the dev package includes a directory called pkgconfig, which contains a file called libgif7.pc and a second entry, libgif.pc, which is a symlink pointing to the former.

To reiterate: the purpose of this package segmentation convention is to avoid the unnecessary deployment of files on our filesystem. dev packages include files that are only needed for developing with (or building against) a given package (headers, archive files, pkg-config files, etc).
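
To make the split concrete, here is a small Python sketch that labels the files from the two dpkg -L listings above. The suffix table is my own rough heuristic for illustration (it is not how dpkg or the Debian build helpers decide anything), and build_only_role is a hypothetical name.

```python
# Paths copied from the two dpkg -L listings in this post.
RUNTIME_FILES = [
    "/usr/lib/x86_64-linux-gnu/libgif.so.7.2.0",
    "/usr/lib/x86_64-linux-gnu/libgif.so.7",
]
DEV_FILES = [
    "/usr/include/gif_lib.h",
    "/usr/lib/x86_64-linux-gnu/libgif.a",
    "/usr/lib/x86_64-linux-gnu/pkgconfig/libgif7.pc",
    "/usr/lib/x86_64-linux-gnu/libgif.so",
    "/usr/lib/x86_64-linux-gnu/pkgconfig/libgif.pc",
]

# Assumed heuristic: file suffix -> why the file only matters at build time.
BUILD_ONLY_SUFFIXES = {
    ".h": "header",
    ".a": "static archive",
    ".pc": "pkg-config file",
    ".so": "unversioned linker symlink",
}


def build_only_role(path):
    """Return a label if the path looks build-time only, else None."""
    for suffix, role in BUILD_ONLY_SUFFIXES.items():
        if path.endswith(suffix):
            return role
    return None


if __name__ == "__main__":
    for path in RUNTIME_FILES + DEV_FILES:
        print(f"{build_only_role(path) or 'needed at runtime'}: {path}")
```

Note that libgif.so.7 and libgif.so.7.2.0 do not match the .so suffix, so they stay on the runtime side, which mirrors how the two packages divide the library files.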

Let’s have a look at the contents of pkgconfig/libgif7.pc:

prefix=/usr
exec_prefix=${prefix}
libdir=${prefix}/lib/x86_64-linux-gnu
includedir=${prefix}/include

Name: libgif
Description: Loads and saves GIF files
Version: 5.2.1
Cflags: -I${includedir}
Libs: -L${libdir} -lgif

As you can see, it provides information we’ll need when building against this library:

  • Its prefix
  • Where the library files (shared objects, archive files) reside
  • Its version (remember our description of the dependency hell?)
  • Cflags: in this case it only sets the include path (-I) but it could potentially specify other flags to be used by the compiler
  • Libs: points the linker to our $libdir (-L) and specifies that we should link against libgif (-lgif)
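
To see how the ${...} references in that file resolve, here is a minimal Python sketch that reads the same libgif7.pc content and expands the variables. It is a simplification under stated assumptions: variables are expanded at the point where they are defined, unknown variables become empty strings, and parse_pc is a hypothetical helper. The real pkg-config does considerably more than this.

```python
import re

# Content of libgif7.pc as shown above.
LIBGIF_PC = """\
prefix=/usr
exec_prefix=${prefix}
libdir=${prefix}/lib/x86_64-linux-gnu
includedir=${prefix}/include

Name: libgif
Description: Loads and saves GIF files
Version: 5.2.1
Cflags: -I${includedir}
Libs: -L${libdir} -lgif
"""

_LINE = re.compile(r"^([A-Za-z0-9_.]+)\s*([=:])\s*(.*)$")
_VAR = re.compile(r"\$\{([^}]+)\}")


def parse_pc(text):
    """Return (variables, fields) with every ${name} reference expanded."""
    variables, fields = {}, {}

    def expand(value):
        # Assumption: unknown variables expand to an empty string.
        return _VAR.sub(lambda m: variables.get(m.group(1), ""), value)

    for raw in text.splitlines():
        line = raw.strip()
        match = _LINE.match(line)
        if not line or line.startswith("#") or match is None:
            continue
        key, separator, value = match.groups()
        if separator == "=":
            variables[key] = expand(value)  # variable definition
        else:
            fields[key] = expand(value)  # keyword field such as Libs
    return variables, fields


if __name__ == "__main__":
    variables, fields = parse_pc(LIBGIF_PC)
    print("Cflags:", fields["Cflags"])
    print("Libs:  ", fields["Libs"])
```

With the variables substituted, Cflags and Libs turn into plain compiler and linker arguments, which is the form a build script actually needs.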

Let us look at a more elaborate example: the pkg-config file for gtk4:

$ cat /usr/lib/x86_64-linux-gnu/pkgconfig/gtk4.pc
prefix=/usr
includedir=${prefix}/include
libdir=${prefix}/lib/x86_64-linux-gnu

targets=broadway wayland x11
gtk_binary_version=4.0.0
gtk_host=x86_64-linux

Name: GTK
Description: GTK Graphical UI Library
Version: 4.8.2
Requires: pango >= 1.50.0, pangocairo >= 1.50.0, gdk-pixbuf-2.0 >= 2.30.0, cairo >= 1.14.0, cairo-gobject >= 1.14.0, graphene-gobject-1.0 >= 1.9.1, gio-2.0 >= 2.66.0
Libs: -L${libdir} -lgtk-4
Cflags: -I${includedir}/gtk-4.0

This one includes another field called Requires which specifies additional deps (similar to what the debian/control file does).
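
As a rough illustration of what a tool has to do with that field, the Python sketch below splits the Requires line from gtk4.pc into (name, operator, version) entries and checks a version constraint. It assumes comma-separated entries and purely numeric dotted versions, which holds for this sample but is a simplification; parse_requires, version_key and satisfies are hypothetical names.

```python
import operator

# The Requires line from the gtk4.pc file shown above.
GTK4_REQUIRES = (
    "pango >= 1.50.0, pangocairo >= 1.50.0, gdk-pixbuf-2.0 >= 2.30.0, "
    "cairo >= 1.14.0, cairo-gobject >= 1.14.0, "
    "graphene-gobject-1.0 >= 1.9.1, gio-2.0 >= 2.66.0"
)

_OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def parse_requires(value):
    """Turn 'a >= 1.0, b' into [('a', '>=', '1.0'), ('b', None, None)]."""
    deps = []
    for item in value.split(","):
        parts = item.split()
        if len(parts) == 1:
            deps.append((parts[0], None, None))
        elif len(parts) == 3 and parts[1] in _OPERATORS:
            deps.append((parts[0], parts[1], parts[2]))
        elif parts:
            raise ValueError(f"unsupported entry: {item.strip()!r}")
    return deps


def version_key(version):
    """Assumption: versions are dotted integers, e.g. '1.50.0'."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed, op, wanted):
    """Check an installed version against one parsed constraint."""
    if op is None:
        return True
    return _OPERATORS[op](version_key(installed), version_key(wanted))


if __name__ == "__main__":
    for name, op, wanted in parse_requires(GTK4_REQUIRES):
        print(name, op, wanted)
```

Comparing versions as tuples of integers rather than as strings matters: as plain text, "1.9.1" would sort after "1.50.0", which is the kind of subtle mistake that feeds dependency hell.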

So how is this metadata used? Generally speaking, the steps when building C/C++ code are:

  • Generate and run the configure script to specify the features you want to build the project with (e.g. libgif support) and ensure all project deps can be found
  • Build the code using a compiler (this can be done by invoking make with a Makefile, which will, in turn, invoke other tools including the compiler, but many other build frameworks/tools may be used, for example: CMake)
  • Link against needed deps using ld
  • Optionally, install the resulting files (binaries, configs, metadata, man pages, etc) onto the target paths

pkg-config will typically be involved in the configuration stage. For example, a configure script for a project that requires GTK4 may include this command:

$ pkg-config --cflags --libs gtk4

Which will return an output similar to this:

-mfpmath=sse -msse -msse2 -pthread -I/usr/local/include/freetype2 -I/usr/include/gtk-4.0 -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/pango-1.0 -I/usr/include/fribidi -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/x86_64-linux-gnu -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/uuid -I/usr/include/harfbuzz -I/usr/include/libpng16 -I/usr/include/graphene-1.0 -I/usr/lib/x86_64-linux-gnu/graphene-1.0/include -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -lgtk-4 -lpangocairo-1.0 -lpango-1.0 -lharfbuzz -lgdk_pixbuf-2.0 -lcairo-gobject -lcairo -lgraphene-1.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0
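
A configuration step then has to pick that output apart. The Python sketch below shows one way this might look: query_flags runs the same pkg-config command and split_flags sorts the resulting tokens into include directories, library directories, libraries and everything else. Both function names are hypothetical, and the sketch simply returns None when pkg-config is missing or the module is unknown.

```python
import shlex
import shutil
import subprocess


def split_flags(output):
    """Group pkg-config output tokens by the kind of flag they carry."""
    groups = {"include_dirs": [], "library_dirs": [], "libraries": [], "other": []}
    for token in shlex.split(output):
        if token.startswith("-I"):
            groups["include_dirs"].append(token[2:])
        elif token.startswith("-L"):
            groups["library_dirs"].append(token[2:])
        elif token.startswith("-l"):
            groups["libraries"].append(token[2:])
        else:
            groups["other"].append(token)
    return groups


def query_flags(module):
    """Run pkg-config for a module; None if the tool or the module is missing."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--cflags", "--libs", module],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return split_flags(result.stdout)


if __name__ == "__main__":
    flags = query_flags("gtk4")
    if flags is None:
        print("gtk4 not found (or pkg-config is not installed)")
    else:
        print("libraries to link against:", ", ".join(flags["libraries"]))
```

Because query_flags only talks to pkg-config, it does not care whether GTK4 came from a deb, an RPM or a manual build, as long as a matching .pc file can be found.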

At this point, you may be wondering why we need pkg-config at all if, as we’ve seen, deb/RPM based distros have tools that can provide the same information. The answer is rather simple but instead of phrasing my own version of it, let’s have a look at what Wikipedia has to say; from https://en.wikipedia.org/wiki/Pkg-config:

pkg-config is a computer program that defines and supports a unified interface for querying installed libraries for the purpose of compiling software that depends on them. It allows programmers and installation scripts to work without explicit knowledge of detailed library path information. pkg-config was originally designed for Linux, but it is now also available for BSD, Microsoft Windows, macOS, and Solaris.

The operative words here are: unified interface.

There are many Linux packaging formats but, while you can install the deb toolchain on an RPM based distro and vice versa, as well as on some other Unices, doing so can cause confusion (was this package installed via RPM, dpkg or something else? where does its metadata reside and what tool should I use to fetch it?) and when writing your packaging/deployment scripts, you cannot really rely on either of these toolchains being present.

You could, of course, cover both cases with conditional statements but that would make for very long, error prone and hard to maintain code. Further, these tools will only work with packages deployed through them. In other words, dpkg will not return data for files/packages that were installed by manually invoking make install, cmake --install or rpm -i.
Moreover, while distros like Debian and Red Hat (to name only two) do go to great lengths to package a multitude of popular (and some less popular) packages, no single distro can cover everything under the sun and chances are you’ll still have to build SOME of your needed dependencies yourself rather than rely on pre-built packages from the official distro repos.
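
To illustrate the conditional approach, here is a deliberately small Python sketch that picks the right "list the files of a package" command for the two toolchains covered in this post. list_files_command is a hypothetical helper, and the which parameter exists only so the lookup can be swapped out when experimenting.

```python
import shutil


def list_files_command(package, which=shutil.which):
    """Return the command that lists a package's files, or None if neither
    dpkg nor rpm is available on this system."""
    if which("dpkg"):
        return ["dpkg", "-L", package]
    if which("rpm"):
        return ["rpm", "-ql", package]
    return None


if __name__ == "__main__":
    command = list_files_command("libgif7")
    print(" ".join(command) if command else "no supported package manager found")
```

Every additional packaging format means another branch, package names differ between distros, and none of the branches can see software installed outside the package manager.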

pkg-config is useful because it is a unified interface. Declare that your build depends on it and you’re covered:) That is, as far as the building bit is concerned; it doesn’t cover SUCCESSFULLY INSTALLING pre-built binaries, which is why packaging formats like deb and RPM are so important.

At this point, it is worth mentioning that there are alternatives to pkg-config. One such prominent alternative is GNU Libtool.

Once more, rather than needlessly work on my own explanation, allow me to refer you to Wikipedia for a quick comparison: https://en.wikipedia.org/wiki/Pkg-config#Comparison_with_libtool

Right, let’s recap:

In this chapter, we’ve discussed the purpose of pkg-config and demonstrated some obvious advantages of the packaging approach deb and RPM share, as well as explained the reasoning behind package segmentation.

Join me for the next instalment of this series where I’ll return to the use of chroots, the added value provided by Docker and how we can combine proper packaging and Docker in our pursuit of the perfect build, packaging and deployment process:)

May the source be with you,