Tuesday, February 10, 2026

Boring but useful: making large-scale changes on MySQL codebase with LLMs

I used Claude Code to do a simple mass refactoring on MySQL 9.6 over two weeks, resulting in a 25K line diff that passes the testsuite. That was something I wanted to experiment with for some time now: making boring but useful large-scale changes to MySQL while spending as little human effort as possible. Claude Code with Opus 4.5 (and 4.6 for the last few days) was used for everything.

Why boring but useful large-scale?

  • Pushing LLMs to the limit (like building a browser or a compiler; for a comparable project I guess I'd be building a database from scratch) is fun. It shows us what's possible and what needs one more frontier model release, but the resulting artifact is throwaway code that is not usable in the general sense, and I have neither the time nor the money for tokens for this.
  • Making algorithmically challenging changes is still firmly in the domain of humans. No "Add LSM tree to InnoDB. Be thorough. Do not make any mistakes." prompting here. Although, as I'm writing this, I just saw Zongzhi Chen doing exactly that and patching PostgreSQL with an InnoDB-like doublewrite buffer. Things move quickly.
  • Making changes that are too simple wastes LLM capabilities: sed would do the same faster and cheaper.

For a boring but useful large-scale change I picked replacing THD * with const THD & or THD & in method parameters and class fields. This might require some background. MySQL started in the mid-90s, and its source code, 30 years later, shows three strata: modern C++ (C++11 to C++20), early standard C++ (C++98, C++03), and the even earlier "C with classes" style. One thing that survives from the earliest days is the use of pointers instead of references, including in contexts where the pointer can never be nullptr. I'll try not to digress on why that is not the right thing; the C++ Core Guidelines cover it (F.16, F.17, F.18, F.19, and finally F.60 for when pointers are the right choice).
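To make the shape of the change concrete, here is a minimal sketch of the pointer-to-reference conversion. Session is a hypothetical stand-in for THD (the real class is far larger), and all the function names are made up for illustration; none of this is taken from the MySQL source.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for THD; not the real MySQL class.
struct Session {
  std::string current_db;
  unsigned long long query_id = 0;
};

// Before: "C with classes" style. The pointer is dereferenced
// unconditionally, so passing nullptr is already a bug at the call site.
inline unsigned long long next_query_id_ptr(Session *session) {
  assert(session != nullptr);
  return ++session->query_id;
}

// After: the reference states the non-null contract in the type itself.
inline unsigned long long next_query_id(Session &session) {
  return ++session.query_id;
}

// Read-only access takes a const reference.
inline bool has_default_db(const Session &session) {
  return !session.current_db.empty();
}

// A pointer stays where "no session" is a legitimate argument (the F.60 case).
inline std::string db_or_placeholder(const Session *session) {
  if (session == nullptr) return "<none>";
  return session->current_db;
}
```

The first two functions do the same work; only the second makes it impossible to pass "no session" by accident, which is why the assert can go away after the conversion.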

I settled on this after a few false starts, which made promising progress but eventually imploded. That was because of my mistake of not letting the agent make intermediate commits, which resulted in an unmanageable working directory diff every time. I now believe I could have completed most of them too:

  • "Convert THD * to THD &": yep, I tried it twice. The first time the agent broke the server and then couldn't debug its way out of the hole.
  • "Replace LEX_CSTRING with std::string_view" - the partial conversion was successful, but my mistake was not considering that lex strings are both owning and non-owning whereas string views are always strictly non-owning. The project failed in trying to replace the owning lex strings with something different altogether.
  • "Replace printf-style formatting with std::format" - this one imploded because of the no-intermediate-commits rule, but not before scoring some strong wins, such as converting the MySQL error log and client error machinery and messages more or less correctly.
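The owning versus non-owning distinction that sank the LEX_CSTRING attempt can be sketched as follows. LexString is a hypothetical stand-in for a pointer-plus-length struct, and both helper names are made up for illustration; this is not the conversion the agent actually produced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a pointer-plus-length string struct.
struct LexString {
  const char *str;
  std::size_t length;
};

// Non-owning use: the bytes live elsewhere and outlive the view,
// so std::string_view is a drop-in replacement.
inline std::string_view as_view(const LexString &s) {
  assert(s.str != nullptr || s.length == 0);
  return {s.str, s.length};
}

// Owning use: whoever holds the value is responsible for the buffer.
// A string_view cannot express that; an owning type such as std::string
// can, at the cost of a copy.
inline std::string as_owned(const LexString &s) {
  assert(s.str != nullptr || s.length == 0);
  return std::string(s.str, s.length);
}
```

The view keeps pointing at the original buffer, while the owned copy has its own storage. A mass conversion therefore has to decide, instance by instance, which of the two roles each string plays.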

It was also my goal to spend as little of my time as possible on this, so I consciously ignored some of the current best practices for this type of work: I did not set up CLAUDE.md, did not run the agent in a dedicated container where it has full permissions, and did not implement any kind of Ralph Wiggum loop. All of this saved me some advance planning time, but then I had to go to a terminal and tell it to "continue" when it stopped mid-task or inexplicably started doing edits with some shell trickery instead of its standard tools. Thus by trying to save time I wasted some time, but I definitely saved some thinking and could rubber-stamp agent questions while doing other, more useful, work.

I launched Claude Code and prompted it:

Convert pointers to THD to (optionally const) references to THD throughout the
codebase. The safe instances to convert are the ones where the pointer is dereferenced
unconditionally, there is an assert(thd != nullptr) check, or any thd == nullptr
branches are dead code (because e.g. the function is never called with pointer ==
nullptr). Not safe instances are the ones where null pointers are passed and C API.
In the touched code (and only in the touched code!) follow Almost Always Auto while
keeping references and pointers pulled out and const as much as possible (unless it
forces other code changes), except for trivial function parameters. Examples of AAA
and const: "const auto bar = ... ", "const auto *bar = ... ", "auto &baz = ...",
"const auto &x = ...", "const auto *const y = ...", "auto *const z = ..."

I then let it plan until we were both happy with the plan, and told it to go. As I said above, every few tens of minutes to a few hours it just stopped and asked for permission to continue, directly against the original instructions. At those points I cleared its context and pointed it to the plan file again.
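For readers unfamiliar with the Almost Always Auto style named in the prompt, here is a small sketch that uses the declaration forms it lists. Row, Summary, and bump_first_and_summarize are hypothetical names made up for illustration; they do not come from the MySQL source or from the agent's output.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration only.
struct Row {
  int id;
  int value;
};

struct Summary {
  int id_span;
  int total;
};

inline Summary bump_first_and_summarize(std::vector<Row> &rows) {
  assert(!rows.empty());
  const auto count = rows.size();          // const value
  auto &first = rows.front();              // mutable reference, & pulled out
  const auto &last = rows.back();          // read-only reference
  auto *const slot = &first;               // const pointer to a mutable Row
  slot->value += 1;
  const auto *cursor = rows.data();        // reseatable pointer to const Row
  const auto *const end = cursor + count;  // const pointer to const Row
  auto total = 0;
  for (; cursor != end; ++cursor) total += cursor->value;
  return {last.id - first.id, total};
}
```

Keeping & and * visible next to auto means a reader can still tell references and pointers apart at a glance, even though the type name itself is deduced.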

Finally, some two weeks later, it declared it was done. (I did not orchestrate any parallelization, although it sometimes parallelized itself a bit with subagents, and I also needed room in the usage limits for other work on my account.) Then I ran the debug + sanitizers testsuite and pasted any failures to the agent, asking it to fix them but, very importantly, always to git bisect to the failing commit first. There were over a thousand commits. It debugged everything rather easily as long as it had a relatively small regressing commit identified. Nice.

I succeeded in staying below usage limits of my Claude Max plan all the time. In the failed attempts above the agent parallelized much more aggressively and burned through the week's allowance in three days. I tried to preserve my sanity and did not go the route of stacked Max accounts.

With a clean testsuite run I asked it to create a second branch with all commits squashed. This produced a +25304, -25402 line diff. I looked at random points in the diff and was satisfied with the result everywhere. The straightforward conversion bits went fine, but it also removed some provably redundant nullptr checks, declared non-moveable classes as such, and simplified constructors. The resulting release build binary is some 12K smaller on macOS, BTW. If this were real work, I'd still have to read the whole diff for any obvious issues, but in any case this is significantly faster than doing the work myself.

"If this were real work" - or is it? The nature of this patch is such that no tracking fork could take it, only Oracle MySQL. I have signed the contributor's agreement with them, then sent them some patches and have been waiting for years but they are still not merged, thus I will not submit this one unless asked to.

I pushed the code to GitHub:

To conclude, with LLMs, everyone has an opinion. A vocal part of the community does things like OpenClaw and Gas Town. I am way, way more conservative than that: so far I have successfully avoided buying a Mac Mini for OpenClaw and giving it my email and bank login details, and I also get uncomfortable quickly due to cognitive load when running too many coding agents in parallel. But we live in exciting times, software engineering is changing, and I am very optimistic.

Friday, January 23, 2026

Building and testing MySQL 8.0.45, 8.4.8, and 9.6.0 on macOS

Oracle has just released MySQL 8.0.45, 8.4.8, and 9.6.0 and here are the results of my usual testing of building them and running the testsuite on macOS, Apple Silicon hardware.

Build

All three versions compile OK with the current Xcode (26.2). With Homebrew-packaged LLVM, versions 14 to 17 inclusive get a strange CMake error that might have more to do with macOS than MySQL. LLVM 19 still fails to build 8.0.45 (bug #119238), but not 8.4.8, and so does LLVM 20 (bug #119239). Both of these issues are pre-existing. However, 8.4.8 has regressed: it is now affected by bug #119246 too. LLVM 21 still fails to build 9.6.0 (pre-existing bug #119246).

Test

Bad news:

  1. Bug #119735 Unhandled error log warning in main.initialize-sha256
  2. Bug #119738 global-buffer-overflow on INSTALL PLUGIN
  3. Bug #119739 Test rpl_gtid.rpl_gtids_table_disable_binlog_on_slave result difference
  4. Bug #119746 innodb.table_encrypt_4 failing with a result difference

No news aka old bad news:

  1. Bug #113189 ColumnStatisticsTest.StoreAndRestoreAttributesEquiHeight unit test fails
  2. Bug #113190 Several tests in routertest_integration_routing_sharing_constrained_pools fail
  3. Bug #113260 Client error in error log: MY-004031 - The client was disconnected …
  4. Bug #113665 perfschema.relaylog fails with a result difference
  5. Bug #113709 StrXfrmTest.ChineseUTF8MB4 failing with an AddressSanitizer error
  6. Bug #113722 Test main.index_merge_innodb failing with a result difference
  7. Bug #114892 Test XComControlTest.SuspectMemberFailedRemovalDueToMajorityLoss fails
  8. Bug #115480 Test innodb.log_first_rec_group failing
  9. Bug #116369 rpl.rpl_semi_sync_alias crashes under AddressSanitizer
  10. Bug #116373 auth_sec.acl_tables_row_locking failing with result diff
  11. Bug #116378 routertest_integration_routing_splitting crashing under AddressSanitizer
  12. Bug #116385 Test main.mysql_upgrade_grant timing out on debug build
  13. Bug #116394 Test binlog_gtid.binlog_gtid_binlog_recovery_errors crashes with an assert
  14. Bug #118171 Test main.mysqldump-tablespace-escape failing
  15. Bug #118185 Non-specific ASan error on TestLoaderGood/LoaderReadTest.load_wrong_version/0
  16. Bug #118213 Test perfschema.idx_compare_mutex_instances fails under debug + sanitizers
  17. Bug #119247 Test main.log_buffered-big failure
  18. Bug #119249 Router unit test setup failures
  19. Bug #119251 Test rpl.rpl_seconds_behind_master failure
  20. Bug #119252 Test router.response_cache failure
  21. Bug #119253 innodb.tablespace_encrypt_9 crashing the server with assertion failure
  22. Bug #119258 router integration tests failing to login

Good news:

  1. none! Every single bug I am tracking is present

Conclusion

With 4 new bugs, 22 unchanged ones, and not a single fix, the testsuite quality continues to slowly decay. I hope this will change for the better.

Addendum

A note for my future self. Issues I couldn't reliably reproduce:

  • main.mysqltest failing testcase check on 8.0.45, 8.0.45 debug build
  • innodb_zip.16k, main.mysqlpump_bugs failing with a result difference, 8.0.45 debug+sanitizers build
  • main.subquery_sj_firstmatch failing with a result difference, 8.0.45 release build
  • main.lowercase_table4 timeout, 8.0.45 debug+sanitizers build
  • x.mysqlxtest_mode_ssl test command failure, 8.4.8 release build
  • connection_control.performance_schema_processlist test result difference, 8.4.8 release build
  • merge_innodb_tests-t failed once, 8.4.8 debug+sanitizers build
  • routertest_harness_net_ts_timer failed once, 8.4.8 debug build
  • rpl_nogtid.rpl_semi_sync_optimize_for_static_plugin_config failing with global-buffer-overflow under 9.6.0 debug+sanitizers build, serious, but I couldn't reproduce.
  • routertest_integration_routing_direct, routertest_integration_routing_router_require, routertest_integration_routing_sharing, and routertest_integration_routing_sharing_constrained_pools failed once under 9.6.0 debug+sanitizers build and once under 9.6.0 debug build
  • perfschema.system_events_plugin, test_service_sql_api.test_sql_stmt, main.group_by, main.index_merge_innodb, perfschema.idx_compare_metadata_locks failed once, 9.6.0 debug build
  • router.authentication_mysql_accounts, router.app_specific_metadata_v_latest, and main.ps failed once, 9.6.0 release build