A GCC bug: tracking affected software
A while ago, I was browsing Hacker News and stumbled upon this article, which piqued my curiosity. It is about a bug in GCC 9 and 10 that, “under some circumstances […] will cause memcmp to return an incorrect value when one of the inputs is statically known array that contains NULL bytes”. If you haven’t clicked on the link yet, do so and have a read: it’s going to be important for the rest of this post!
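The bug is about memcmp's contract. memcmp must compare exactly n bytes, NUL bytes included, whereas strcmp stops at the first NUL byte. The small Python model below contrasts the two behaviours on a statically known array that contains NUL bytes. It is only an illustration: the "stop at the first NUL" comparison is an assumed example of a wrong result, not a description of the code GCC actually generates, and both function names are hypothetical.

```python
def memcmp_model(a: bytes, b: bytes, n: int) -> int:
    """Model of C memcmp: compare the first n bytes, NUL bytes included.

    Returns a negative, zero or positive value, comparing bytes as unsigned.
    """
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return x - y
    return 0


def nul_terminated_cmp_model(a: bytes, b: bytes, n: int) -> int:
    """Hypothetical wrong result (illustration only, not GCC's actual code):
    same as memcmp_model, but stops at the first NUL byte, like strcmp."""
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return x - y
        if x == 0:  # both inputs have a NUL byte here: stop comparing
            return 0
    return 0


# A "statically known" 4-byte array that contains NUL bytes.
MAGIC = b"\x00\x00\x00\x01"

if __name__ == "__main__":
    data = b"\x00\x00\x00\x02"
    print(memcmp_model(data, MAGIC, 4))              # 1: the arrays differ
    print(nul_terminated_cmp_model(data, MAGIC, 4))  # 0: wrongly reported equal
```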
With more hardware I could do a more thorough investigation of the consequences of this GCC bug.
This last line of the article got me thinking. What if I could use the processing power of the school computers to recompile a lot of software and find out what is actually affected by this bug? Clicking on the link in that last line, I ended up on a familiar website: the Hydra instance of the NixOS Foundation. Hydra is a Nix-based build farm that builds software from its Nix expressions. And indeed, with a big Hydra cluster, I would be able to recompile the whole of nixpkgs, the Nix package repository, and with it more than 60000 packages.
So I added the article’s proposed patch, for both GCC 9 and 10, to my fork of nixpkgs: https://github.com/rissson/nixpkgs/commit/253f5f809b319f7cd76719d758ed52c4f24634ef
Building a lot of packages
In the rest of this post, I will be mentioning “builds” and “build steps”. A build is the result of one or several build steps. Those steps include, among other things, retrieving the package sources, building one of its dependencies, and building the package itself.
As Hydra uses Nix to build the packages, the next step was to get Nix onto the school computers. Instead of installing it on the existing Arch Linux images, I decided to create my own NixOS image, which you can learn more about here. That is a topic for another post, so let’s move along. I then installed Hydra on one of our servers and got busy.
First, I built nixpkgs without the patch, both to have every package built once and to estimate how long the whole process would take. This first pass involved 151746 build steps, amounting to 2405854 seconds of cumulative build time (27 days, 20 hours, 17 minutes and 34 seconds). Spread over 99 machines, it took about 8 hours, 55 minutes and 42 seconds.
Then I rebuilt nixpkgs with the patch, to see which packages were affected by it. This involved 92684 build steps, far fewer than before since some packages didn’t need rebuilding, amounting to 2890923 seconds of cumulative build time (33 days, 11 hours, 2 minutes and 3 seconds). This second pass took about 50 hours, 30 minutes and 1 second from the first build to the last, with interruptions in between, on 172 machines.
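The day/hour/minute breakdowns of the cumulative build times are plain divmod arithmetic. A quick Python check of both passes, which also confirms that they add up to the 5296777-second total quoted in the stats:

```python
def split_duration(seconds: int) -> tuple[int, int, int, int]:
    """Split a number of seconds into (days, hours, minutes, seconds)."""
    days, rest = divmod(seconds, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, secs = divmod(rest, 60)
    return days, hours, minutes, secs


FIRST_PASS = 2405854   # cumulative build time without the patch, in seconds
SECOND_PASS = 2890923  # cumulative build time with the patch, in seconds

if __name__ == "__main__":
    print(split_duration(FIRST_PASS))                # (27, 20, 17, 34)
    print(split_duration(SECOND_PASS))               # (33, 11, 2, 3)
    print(split_duration(FIRST_PASS + SECOND_PASS))  # (61, 7, 19, 37)
```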
The patch doesn’t actually fix the problem with memcmp; it only prints a warning at compile time when the bug might occur. As Hydra stores the logs of the builds it runs, I could simply grep through those logs to find which programs might be affected by this bug. Here is the result of this grep. The first part of each line, before the colon (:), shows which derivation, and thus which package, is affected by the bug. The second part shows which source file is affected. As you can see, not much more software is affected than what the article originally reported. I reported my findings to the author of the article and promised him I would publish this post.
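Since each line of the grep result has the form "derivation, colon, rest", with the rest identifying the affected source file, listing the affected packages comes down to splitting each line on its first colon. Below is a small Python sketch that groups the matched lines by derivation. The sample lines and derivation names are made up for illustration; they are not taken from the actual results.

```python
from collections import defaultdict


def group_by_derivation(grep_lines):
    """Group grep output lines of the form "<derivation>:<rest>" by derivation.

    Only the first colon is used as a separator, so colons inside <rest>
    (e.g. "file.c:12:5: warning") are kept intact.
    """
    affected = defaultdict(list)
    for line in grep_lines:
        line = line.strip()
        derivation, sep, rest = line.partition(":")
        if not sep:  # empty line or not a "<derivation>:<rest>" line
            continue
        affected[derivation].append(rest)
    return dict(affected)


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "foo-1.0.drv:src/parse.c",
    "foo-1.0.drv:src/util.c",
    "bar-2.3.drv:lib/io.c",
]

if __name__ == "__main__":
    for drv, files in sorted(group_by_derivation(SAMPLE).items()):
        print(drv, "->", ", ".join(files))
```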
As I did this mostly for fun, here are some fun stats about the process. Unlike the previous figures, these stats include failed builds that were restarted.
From the first build to the last, the whole process took 4 days, 4 hours, 55 minutes and 57 seconds. Even though the builds themselves were fairly fast, few of them ran during the daytime, so as not to disturb the students working on the school’s infrastructure. This is why this period is longer than the two previous durations added together.
In total, 173 machines were used (server included), namely the computers of the Mid Lab and Lab SR rooms.
The total time needed to build all packages twice was 5296777 seconds, that is 61 days, 7 hours, 19 minutes and 37 seconds.
On average, a machine did 1362.39 build steps.
On average, a build step took 28.19 seconds. The longest took 12045 seconds (3 hours, 20 minutes and 45 seconds).
On average, a machine worked for 30523.5 seconds (8 hours, 28 minutes and 43.5 seconds).
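Averages like these can be computed from per-step records saying which machine ran a step and how long it took. The Python sketch below assumes such records as (machine, duration in seconds) pairs. The record format and sample values are hypothetical, not Hydra's actual data, and this is not necessarily how the figures above were produced.

```python
def build_stats(steps):
    """Summarise a list of (machine, duration_seconds) build-step records."""
    if not steps:
        raise ValueError("no build steps")
    machines = {machine for machine, _ in steps}
    durations = [duration for _, duration in steps]
    total = sum(durations)
    return {
        "machines": len(machines),
        "avg_steps_per_machine": len(steps) / len(machines),
        "avg_step_seconds": total / len(steps),
        "longest_step_seconds": max(durations),
        "avg_seconds_per_machine": total / len(machines),
    }


# Hypothetical records: (machine name, step duration in seconds).
SAMPLE_STEPS = [("lab-01", 30), ("lab-01", 10), ("lab-02", 50), ("lab-03", 10)]

if __name__ == "__main__":
    for key, value in build_stats(SAMPLE_STEPS).items():
        print(f"{key}: {value:g}")
```

Note that in this sketch, a machine only counts towards the per-machine averages if it ran at least one step.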
And last but not least, since we all love graphs, here is one showing the number of running builds over the course of this project.
To build this graph yourself, make sure gnuplot is installed, get the datafile and the gnuplot file, and then run the following command, and you’re done:
    gnuplot running-builds-over-time.plt > running-builds-over-time.png
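A series like "number of running builds over time" can be derived from build start and stop timestamps with a simple sweep: add 1 at each start, subtract 1 at each stop, and keep a running sum. Below is a Python sketch of that technique. The (start, stop) input format and the sample values are assumptions for illustration, and the linked datafile may well have been produced differently.

```python
def running_over_time(intervals):
    """Return (timestamp, running_count) points from (start, stop) intervals.

    Each build counts as running on the half-open interval [start, stop).
    """
    events = []
    for start, stop in intervals:
        if stop < start:
            raise ValueError(f"stop before start: {(start, stop)}")
        events.append((start, 1))
        events.append((stop, -1))
    events.sort()  # process events in timestamp order
    points = []
    running = 0
    for timestamp, delta in events:
        running += delta
        if points and points[-1][0] == timestamp:
            # Several events at the same timestamp: keep only the final count.
            points[-1] = (timestamp, running)
        else:
            points.append((timestamp, running))
    return points


if __name__ == "__main__":
    # Hypothetical builds, as (start, stop) timestamps in seconds.
    for timestamp, count in running_over_time([(0, 10), (5, 15), (10, 20)]):
        print(timestamp, count)
```

Printing each point as a "timestamp count" line yields a whitespace-separated two-column file, which is the data layout gnuplot reads by default.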