Thursday, December 14, 2023

MySQL 8.0.35 and 8.2.0 are out, here are my 15 compilation/test bug reports

I'm only a month and a half late to the party. That's, unfortunately, because I tried to build them and run their tests, on macOS, of all things. First the good news: they build, and do so with the maximum set of third-party libraries possible.

Next I tried running the testsuite. I am used to clean test results in Oracle releases, under good conditions at least (not too heavy a load on the system, not too high a --parallel setting), with only occasional issues. This time I saw dozens of failures under the debug, debug+sanitizers, and release configurations, and tried to convert them to bug reports on a best-effort basis.

First I identified a Homebrew-packaged Perl incompatibility with a test script: https://bugs.mysql.com/bug.php?id=113023.

Then I had a couple of test output differences where the difference was in floating point values: https://bugs.mysql.com/bug.php?id=113047 (MTR test main.derived_limit fails with small cost differences) and https://bugs.mysql.com/bug.php?id=113048 (MTR test gis.gis_bugs_crashes fails with a result difference). I am not a floating point programming expert, but somewhat luckily I remembered that there is a GCC option -ffp-contract=off, and that the MySQL CMake script checks whether to add it. On a hunch that the CMake check might be incomplete (it is Linux-only and I was on macOS), I tried adding the option as a workaround, and it worked!
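For readers wondering how a compiler flag can change query costs: -ffp-contract controls whether the compiler may contract an expression like a * b + c into a single fused multiply-add, which rounds once instead of twice. The bug reports do not confirm that this is the exact mechanism behind the two failures, so take the following as a minimal sketch of the general effect only. The function names and input values are made up for illustration, and the fused variant is emulated with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """a * b is rounded to a double first, then the sum is rounded again."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a * b + c exactly, then round to a double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Illustrative inputs chosen so that the exact product has a low-order bit
# that does not fit into a double: (1 + 2^-27)^2 = 1 + 2^-26 + 2^-54.
a = 1.0 + 2.0**-27
b = a
c = -(1.0 + 2.0**-26)

print(unfused_multiply_add(a, b, c))  # 0.0
print(fused_multiply_add(a, b, c))    # 5.551115123125783e-17
```

Both answers are defensible, but they differ, and a test that records floating point values in its expected output sees a result difference depending on which form the compiler emitted.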

The next set of bugs was nastier. A bunch of query optimizer tests were failing with incorrect query results (https://bugs.mysql.com/bug.php?id=113046), and so was a JSON array test (https://bugs.mysql.com/bug.php?id=113049). To find the triggering conditions I tried different compilers, and found that the tests pass if compiled with LLVM 14 and fail with LLVM 15, 16, 17, and Xcode 15. I had no idea whether this was a compiler bug, MySQL undefined behavior, or something else, but Tor Didriksen posted on #113049 that "Recent versions of Clang have changed their implementation of std::sort(), and our own 'varlen_sort()' function returns wrong results." One less mystery, then.

Checking those different compiler versions was not trivial, because Homebrew-packaged LLVM 14 to 17 fail to build MySQL: https://bugs.mysql.com/bug.php?id=113113. Something about some incompatibility between system ar and LLVM ranlib utilities, with a workaround to use the ar coming from LLVM, i.e. -DCMAKE_AR=/opt/homebrew/opt/llvm@16/bin/llvm-ar. My build script is at 700 lines now, and that's already with some parts factored out.

On top of the previous bug, LLVM 17, being new, had its regular and expected share of new warnings/errors: https://bugs.mysql.com/bug.php?id=113123.

Back from the build-with-different-compilers detour, there were still some test failures unaccounted for: a debug assertion in group replication (https://bugs.mysql.com/bug.php?id=113257), all the TLS 1.3-using tests failing (https://bugs.mysql.com/bug.php?id=113258), and spam in the replicating server error log (https://bugs.mysql.com/bug.php?id=113260).

At this point I stopped processing MTR tests, as I had already logged many bugs and it was becoming harder to avoid duplicates, so I thought I could look at the unit tests. Here I'll just give a list of partial findings:

That's why it took me ~six weeks (and fifteen bug reports) to celebrate the new MySQL releases. That's halfway to the next expected release date on the quarterly schedule, and I hope I will be able to write a much shorter blog post much sooner after that release, as usual!