Laurynas Biveinis: 2026

I used Claude Code to do a simple mass refactoring on MySQL 9.6 over two weeks, resulting in a 25K line diff that passes the testsuite. This is something I had wanted to experiment with for some time: making boring but useful large-scale changes to MySQL while spending as little human effort as possible. Claude Code with Opus 4.5 (and 4.6 for the last few days) was used for everything.

Why boring but useful large-scale?

Pushing LLMs to the limit (like building a browser or a compiler; for a comparable project I guess I'd be building a database from scratch) is fun, and it shows us what's possible and what needs one more frontier model release. But the resulting artifact is throwaway code, not usable in the general sense, and I have neither the time nor the money for tokens for this.
Making algorithmically challenging changes is still firmly in the domain of humans. No "Add LSM tree to InnoDB. Be thorough. Do not make any mistakes." prompting here. Although, as I'm writing this, I just saw Zongzhi Chen doing exactly that and patching PostgreSQL with an InnoDB-like doublewrite buffer. Things move quickly.
Making changes that are too simple wastes LLM capabilities when sed would do the same faster and cheaper.

For a boring but useful large-scale change, I picked replacing THD * with const THD & or THD & in method parameters and class fields. This requires some background. MySQL started in the mid-1990s, and its current source code, after 30 years, has three strata: modern C++ (C++11 to C++20), early standard C++ (C++98, C++03), and the even earlier "C with classes" style. One thing still surviving from the earliest days is that pointers are used instead of references, including in contexts where the pointer can never be nullptr. I'll try not to digress on why that's not the right thing; the C++ Core Guidelines cover it (F.16, F.17, F.18, F.19, and finally F.60 for when pointers are the right choice).
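To make the conversion concrete, here is a minimal sketch of the parameter part of the change. The THD below is a hypothetical, drastically reduced stand-in for MySQL's real per-session class, and the function names are made up for illustration; the point is only the signature change: read-only access becomes const THD &, mutation becomes THD &.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for MySQL's THD session class.
class THD {
 public:
  int row_count{0};
};

// Before: pointer parameters, although callers never pass nullptr.
// The read-only function does not even say it won't modify the THD.
int get_row_count_old(THD *thd) { return thd->row_count; }
void inc_row_count_old(THD *thd) { thd->row_count++; }

// After: a reference says "a THD is always required", and const
// distinguishes read-only access from mutation.
int get_row_count(const THD &thd) { return thd.row_count; }
void inc_row_count(THD &thd) { ++thd.row_count; }
```

At the call sites the only visible difference is that `&thd` becomes `thd`, which is exactly why this change is mechanical yet too context-dependent for sed: deciding whether a given pointer can ever be nullptr requires reading the callers.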

I settled on this after a few false starts, which made promising progress but eventually imploded. That was because of my mistake of not letting the agent make intermediate commits, which resulted in an unmanageable working directory diff every time. With intermediate commits, I now believe I could have completed most of them too:

"Convert THD * to THD &": yep, I tried it twice; the first time it broke the server and then couldn't debug its way out of the hole.
"Replace LEX_CSTRING with std::string_view" - the partial conversion was successful, but my mistake was not considering that lex strings can be both owning and non-owning, whereas string views are always strictly non-owning. The project failed while trying to replace the owning lex strings with something different altogether.
"Replace printf-style formatting with std::format" - this one imploded because of the missing intermediate commit rule, but not before scoring some strong wins, such as converting the MySQL error log and client error machinery and messages more or less correctly.
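The owning versus non-owning distinction is what sinks a naive LEX_CSTRING to std::string_view replacement. The sketch below uses a hypothetical struct shaped like a lex string (a character pointer plus a length); it is not MySQL's actual definition, and the helper functions are made up for illustration. The same struct type can either borrow someone else's buffer or be the sole handle to memory that must be freed later, and only the first case maps cleanly to std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical stand-in shaped like a lex string: pointer plus length.
// The type itself says nothing about who owns the characters.
struct Lex_cstring_sketch {
  const char *str;
  std::size_t length;
};

// Non-owning use: the lex string just points into a buffer owned
// elsewhere, so std::string_view is a drop-in replacement.
std::string_view to_view(const Lex_cstring_sketch &s) {
  return {s.str, s.length};
}

// Owning use: the returned struct is the only handle to heap memory that
// some other code must free later. A std::string_view cannot carry that
// responsibility, because it never frees anything.
Lex_cstring_sketch make_owned_copy(std::string_view src) {
  auto *const buf = static_cast<char *>(std::malloc(src.size() + 1));
  if (buf == nullptr) return {nullptr, 0};
  src.copy(buf, src.size());
  buf[src.size()] = '\0';
  return {buf, src.size()};
}

// For the owning case, an owning type such as std::string is the natural
// target instead of std::string_view.
std::string to_owned(const Lex_cstring_sketch &s) {
  return std::string{s.str, s.length};
}
```

In a real codebase the hard part is that both kinds of use share one type, so every lex string has to be classified before it can be converted.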

It was also my goal to spend as little of my time as possible on this, consciously ignoring some of the current best practices for this type of work: I did not set up CLAUDE.md, did not run it in a dedicated container where the agent has full permissions, and did not implement any kind of Ralph Wiggum loop. All of this saved me some advance planning time, but then I had to go to a terminal and type "continue" whenever it stopped mid-task or inexplicably started editing with some shell trickery instead of its standard tools. Thus by trying to save time I wasted some time, but I definitely saved some thinking and could rubber-stamp agent questions while doing other, more useful, work.

I launched Claude Code and prompted

Convert pointers to THD to (optionally const) references to THD throughout the
codebase. The safe instances to convert are the ones where the pointer is dereferenced
unconditionally, there is an assert(thd != nullptr) check, or any thd == nullptr
branches are dead code (because e.g. the function is never called with pointer ==
nullptr). Not safe instances are the ones where null pointers are passed and C API.
In the touched code (and only in the touched code!) follow Almost Always Auto while
keeping references and pointers pulled out and const as much as possible (unless it
forces other code changes), except for trivial function parameters. Examples of AAA
and const: "const auto bar = ... ", "const auto *bar = ... ", "auto &baz = ...",
"const auto &x = ...", "const auto *const y = ...", "auto *const z = ..."

and let it plan until we were both happy with the plan, then let it go. As I said above, every few tens of minutes to a few hours it would just stop and ask permission to continue, directly against the original instructions. At those points I cleared its context and pointed it to the plan file again.
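The safety criteria and the Almost Always Auto style in the prompt can be illustrated with a small sketch. Again, THD here is a hypothetical one-field stand-in, and every function name is invented; the "Before" comments show the kind of pointer code each rule allows converting, and the last function shows the const/auto shapes listed in the prompt.

```cpp
#include <cassert>

// Hypothetical stand-in for MySQL's THD, reduced to one counter.
struct THD {
  int warn_count{0};
};

// Safe case 1: the pointer was dereferenced unconditionally.
//   Before: void push_warning(THD *thd) { thd->warn_count++; }
void push_warning(THD &thd) { ++thd.warn_count; }

// Safe case 2: an assert(thd != nullptr) already stated the contract.
//   Before: int warnings(const THD *thd) {
//             assert(thd != nullptr); return thd->warn_count; }
int warnings(const THD &thd) { return thd.warn_count; }

// Safe case 3: the nullptr branch was dead code because no caller ever
// passed nullptr, so the branch goes away together with the pointer.
//   Before: void clear(THD *thd) {
//             if (thd == nullptr) return; thd->warn_count = 0; }
void clear(THD &thd) { thd.warn_count = 0; }

// Not safe: nullptr is a meaningful "no session" value here, so the
// pointer stays (compare C++ Core Guidelines F.60).
int warnings_or_zero(const THD *thd) {
  return thd == nullptr ? 0 : thd->warn_count;
}

// Almost Always Auto, with references and pointers pulled out and const
// wherever possible, following the shapes listed in the prompt.
int bump_and_count(THD &thd) {
  auto &session = thd;                    // "auto &baz = ..."
  const auto before = warnings(session);  // "const auto bar = ..."
  const auto *const view = &session;      // "const auto *const y = ..."
  push_warning(session);
  return before + warnings_or_zero(view);
}
```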

Finally, some two weeks later (I did not orchestrate any parallelization; it sometimes parallelized a bit by itself with subagents, and I also needed room in the usage limits for other work on my account), it declared it was done. Then I ran the debug + sanitizers testsuite and pasted any failures to the agent, asking it to fix them but, very importantly, to always git bisect to the failing commit first. There were over a thousand commits. It debugged everything rather easily once it had a relatively small regressing commit identified. Nice.

I managed to stay below the usage limits of my Claude Max plan the whole time. In the failed attempts above, the agent parallelized much more aggressively and burned through the week's allowance in three days. To preserve my sanity, I did not go the route of stacked Max accounts.

With a clean testsuite run, I asked it to create a second branch with all commits squashed. This produced a +25304, -25402 line diff. I spot-checked random points in the diff and was satisfied with the result everywhere. The straightforward conversion bits went fine, but it also removed some provably redundant nullptr checks, declared non-movable classes as such, and simplified constructors. The resulting release build binary is some 12K smaller on macOS, BTW. If this were real work, I'd still have to read the whole diff for any obvious issues, but even so this is significantly faster than doing the work myself.
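The kinds of incidental cleanups mentioned above (a redundant nullptr check removed, a non-movable class declared as such, a simpler constructor) tend to follow naturally from turning a THD * field into a THD & field. The sketch below is a hypothetical illustration of that pattern, not code from the actual diff; Query_logger and its members are invented names.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for MySQL's THD.
struct THD {
  int query_id{0};
};

// Before (hypothetical):
//   class Query_logger {
//    public:
//     Query_logger() : m_thd(nullptr) {}
//     void set_thd(THD *thd) { m_thd = thd; }
//     int id() const { return m_thd != nullptr ? m_thd->query_id : -1; }
//    private:
//     THD *m_thd;
//   };
//
// After: the reference member is bound once in the constructor, so the
// separate setter and the never-taken nullptr check disappear. Copy and
// move assignment are already implicitly deleted because of the reference
// member; deleting the constructors too makes the intent explicit.
class Query_logger {
 public:
  explicit Query_logger(THD &thd) : m_thd{thd} {}
  Query_logger(const Query_logger &) = delete;
  Query_logger &operator=(const Query_logger &) = delete;
  Query_logger(Query_logger &&) = delete;
  Query_logger &operator=(Query_logger &&) = delete;
  ~Query_logger() = default;

  [[nodiscard]] int id() const { return m_thd.query_id; }

 private:
  THD &m_thd;
};

static_assert(!std::is_copy_constructible_v<Query_logger>);
static_assert(!std::is_move_constructible_v<Query_logger>);
```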

"If this were real work" - or is it? The nature of this patch is such that no tracking fork could take it, only Oracle MySQL itself. I have signed the contributor agreement with Oracle and sent them some patches, which after years of waiting are still not merged, so I will not submit this one unless asked to.

I pushed the code to GitHub:

To conclude: with LLMs, everyone has an opinion. A vocal part of the community does things like OpenClaw and Gas Town. I am way, way more conservative than that: so far I have successfully avoided buying a Mac Mini for OpenClaw and giving it my email and bank login details, and I also quickly get uncomfortable from the cognitive load of running too many coding agents in parallel. But we live in exciting times, software engineering is changing, and I am very optimistic.

Oracle has just released MySQL 8.0.45, 8.4.8, and 9.6.0, and here are the results of my usual testing: building them and running the testsuite on macOS on Apple Silicon hardware.

Build

All three versions compile OK with the current Xcode (26.2). With Homebrew-packaged LLVM, versions 14 to 17 inclusive fail with a strange CMake error that might have more to do with macOS than with MySQL. LLVM 19 still fails to build 8.0.45 (bug #119238) but builds 8.4.8, and LLVM 20 behaves the same way (bug #119239). Both of these issues are pre-existing. LLVM 21 still fails to build 9.6.0 (pre-existing bug #119246), and 8.4.8 has regressed: it is now affected by bug #119246 too.

Test

Bad news:

No news aka old bad news:

Good news:

None! Every single bug I am tracking is still present.

Conclusion

With 4 new bugs, 22 unchanged ones, and not a single fix, the testsuite quality continues to slowly decay. I hope this will change for the better.

Addendum

A note for my future self. Issues I couldn't reliably reproduce:

main.mysqltest failing a testcase check, 8.0.45 release and debug builds
innodb_zip.16k, main.mysqlpump_bugs failing with a result difference, 8.0.45 debug+sanitizers build
main.subquery_sj_firstmatch failing with a result difference, 8.0.45 release build
main.lowercase_table4 timeout, 8.0.45 debug+sanitizers build
x.mysqlxtest_mode_ssl test command failure, 8.4.8 release build
connection_control.performance_schema_processlist test result difference, 8.4.8 release build
merge_innodb_tests-t failed once, 8.4.8 debug+sanitizers build
routertest_harness_net_ts_timer failed once, 8.4.8 debug build
rpl_nogtid.rpl_semi_sync_optimize_for_static_plugin_config failing with a global-buffer-overflow under the 9.6.0 debug+sanitizers build; serious, but I couldn't reproduce it.
routertest_integration_routing_direct, routertest_integration_routing_router_require, routertest_integration_routing_sharing, and routertest_integration_routing_sharing_constrained_pools failed once under 9.6.0 debug+sanitizers build and once under 9.6.0 debug build
perfschema.system_events_plugin, test_service_sql_api.test_sql_stmt, main.group_by, main.index_merge_innodb, perfschema.idx_compare_metadata_locks failed once, 9.6.0 debug build
router.authentication_mysql_accounts, router.app_specific_metadata_v_latest, and main.ps failed once, 9.6.0 release build

Laurynas Biveinis

Tuesday, February 10, 2026

Boring but useful: making large-scale changes on MySQL codebase with LLMs

Friday, January 23, 2026

Building and testing MySQL 8.0.45, 8.4.8, and 9.6.0 on macOS
