Friday, November 08, 2024

Storing the MySQL Data Dictionary in a 3rd-Party Storage Engine

MySQL 8.0 replaced the legacy FRM files with a transactional data dictionary, stored in InnoDB. Here I'll describe how any other storage engine could become the DDSE, that is, serve as the backend for the data dictionary, replacing InnoDB. This is purely an internals post: it will not look into the user-facing motivations and implications of the change (which can be found here), nor will it discuss storing any internal SE-specific data-dictionary-like information in the server DD. Nor should the DD tables be confused with system tables, a distinct category.

This is based on the experience of implementing this feature in MyRocks.

The needed changes broadly fall into three areas:

  • Changes to the SE to make it able to work as the DDSE
  • Changes to the core server layer, which can no longer assume that InnoDB is the DDSE
  • Changes to InnoDB so it can keep on working while not being the DDSE

Changes to the SE to make it the DDSE

The DDSE must:

  • be transactional
  • be linked statically into the server
  • know how to assign any internal IDs to DD tables
  • implement the handlerton API for DDSEs
  • make DD transactions durable
  • be able to do non-locking reads
  • support attachable transactions

The first two items are straightforward in motivation, if not in implementation. The storage engine must be transactional to maintain the transactional data dictionary. It must also be linked statically into the server, because dynamic plugins are registered in the mysql.plugin table, which is hard to access without a data dictionary. Static linking is arranged by adding MANDATORY as an argument to MYSQL_ADD_PLUGIN in the SE's CMakeLists.txt.
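For illustration, a minimal sketch of that change, with myengine as a placeholder engine name:

# MANDATORY makes the plugin built-in and statically linked, so it is
# available before the data dictionary is readable.
MYSQL_ADD_PLUGIN(myengine ${MYENGINE_SOURCES} STORAGE_ENGINE MANDATORY)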

Internal ID assignment

The main task for the DDSE is to know how to access the DD tables without needing to refer to the DD itself. That is achieved by the DDSE hardcoding its internal IDs and any other needed data for the DD schema metadata. To assign the IDs, handler::get_se_private_data is called for each DD table, allowing the DDSE to update the passed dd::Table object to set the private IDs. For example:

// Both the table and its indexes draw their IDs from a hardcoded counter:
dd_table->set_se_private_id(next_dd_index_id++);
for (auto *index : *dd_table->indexes()) {
  auto &properties = index->se_private_data();
  properties.set("index_id", next_dd_index_id++);
}

The CREATE TABLE implementation can then access this metadata to use the allocated IDs, as sketched below.
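A minimal sketch of that consuming side, assuming the same "index_id" key as in the snippet above (error handling elided):

// Read back the index ID stored by get_se_private_data; the surrounding
// create-table code is engine-specific and omitted.
uint64_t index_id = 0;
const auto &properties = dd_index->se_private_data();
if (properties.get("index_id", &index_id)) return HA_ERR_GENERIC;
// ... use index_id when creating the engine's internal index object ...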

One question may arise: why bother going through get_se_private_data at all? Can't the CREATE TABLE code recognize that it's being called for a DD table, allocate the ID itself, and store it in the passed dd::Table object right there? In principle, yes, it could, but: 1) there are asserts in the server layer that the DD tables have their SE private data set; 2) modifying the passed dd::Table objects in handler CREATE TABLE is only allowed for HTON_SUPPORTS_ATOMIC_DDL engines, while there is no requirement for a DDSE to be one.

Handlerton API for DDSEs

There are several handlerton APIs that a DDSE must implement:

  • dict_get_server_version, dict_set_server_version: read and write the server version of the DD schema in an SE-specific way. Called by the server layer during upgrades.
  • ddse_dict_init: the name suggests that it performs any DD initialization needed in the DDSE. But it is also tasked with returning any extra SE-specific DD tables and tablespaces (in the InnoDB sense) back to the server DD layer. It also has a mode argument with four possible values. While having to implement DDSE initialization in four different ways may appear daunting, in MyRocks this function does nothing. Not having SE-specific tables and tablespaces helps.
  • dict_recover: also performs any DD initialization in the DDSE, governed by the mode argument. The name and the docs suggest that these are recovery-related actions. In MyRocks this function does nothing.
  • dict_register_dd_table_id: allows the DDSE to note which DD object IDs belong to DD tables. It is up to the engine to decide how to use this information. For instance, MyRocks checks that DROP TABLE and similar operations are not executed against a DD table, for defense in depth.
  • is_dict_readonly: if the SE has a read-only mode (InnoDB does), this allows telling the server layer when the DD can be read but not updated. Otherwise, return false; is a perfectly good implementation.
  • dict_cache_reset_tables_and_tablespaces: tells the DDSE to invalidate any cached metadata about tables and tablespaces, because it is going to be reloaded anew. This call happens in the server startup sequence right after binlog crash recovery.
  • dict_cache_reset: remove metadata for a single table from any internal SE caches. Called in upgrades. No-op in MyRocks, because there is no need to do any DDSE upgrades in MyRocks yet.
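To make the shape of this API concrete, here is a sketch of wiring these methods up during engine initialization. The myengine_* implementations are hypothetical, and the exact handlerton member names should be double-checked against sql/handler.h in your tree:

static int myengine_init(void *p) {
  handlerton *hton = static_cast<handlerton *>(p);
  // The DDSE-specific handlerton methods discussed above:
  hton->ddse_dict_init = myengine_ddse_dict_init;
  hton->dict_recover = myengine_dict_recover;
  hton->dict_register_dd_table_id = myengine_dict_register_dd_table_id;
  hton->is_dict_readonly = myengine_is_dict_readonly;
  hton->dict_cache_reset = myengine_dict_cache_reset;
  hton->dict_cache_reset_tables_and_tablespaces =
      myengine_dict_cache_reset_tables_and_tablespaces;
  hton->dict_get_server_version = myengine_dict_get_server_version;
  hton->dict_set_server_version = myengine_dict_set_server_version;
  return 0;
}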

Durability for DD transactions

I have previously discussed SE durability implementation, and it touches upon DD transactions: if a transaction involves the DD, it should be persisted regardless of any reduced durability settings (MyRocks implementation). While reduced durability for user transactions may result in lost data for the last transactions before a crash, in the case of the DD it may result in a desynced DD, making the whole instance nonviable, which I strongly suspect was the reason for this requirement.
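A minimal sketch of the idea, assuming hypothetical engine internals: a per-transaction flag set whenever a DD table is written, and a relaxed engine-level durability setting:

// Commit path: transactions that touched the DD are always synced,
// regardless of the reduced-durability setting.
void myengine_commit_low(MyEngineTrx *trx) {
  const bool sync_on_commit =
      trx->modified_dd_table || myengine_sync_on_commit;
  myengine_log_write(trx, /*sync=*/sync_on_commit);
}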

Non-locking reads for user transactions for the DD tables

INFORMATION_SCHEMA views provide read-only access to the underlying DD tables for users. This access must now be non-blocking both ways: a DDL statement in progress, such as ALTER TABLE, must not block any INFORMATION_SCHEMA queries, and conversely, an INFORMATION_SCHEMA query, if done in a session with a high transaction isolation level, must not block any DDL statements from proceeding.

That is implemented by a new parameter for the handler::extra method: HA_EXTRA_NO_READ_LOCKING. Having received it, the SE should skip acquiring read locks, if any, for the handler's table. MyRocks implementation: setting, reading. For the InnoDB implementation, take a look at WL#7446 (InnoDB: provide a way to do non-locking reads); note that the parameter was named differently there.
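A sketch of the receiving side, assuming a hypothetical m_no_read_locking handler member that the engine's read path would consult before taking row locks:

int ha_myengine::extra(enum ha_extra_function operation) {
  if (operation == HA_EXTRA_NO_READ_LOCKING) {
    // INFORMATION_SCHEMA reads of DD tables: skip row read locks.
    m_no_read_locking = true;
  }
  return 0;
}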

Attachable transactions

All the DD accesses are wrapped in DD transactions. These differ from regular user transactions in that they can and will happen in the middle of those regular transactions. The server layer takes care of starting and stopping them, and of swapping the transaction context in the thread handle. The storage engine must be prepared to handle this swapped context correctly: if it observes THD::ha_data == nullptr in the middle of a transaction, it should allocate a new object for this field and invalidate any cached assumptions about its contents. Once that is ensured, add HA_ATTACHABLE_TRX_COMPATIBLE to the handler table_flags return value bitset. The server implementation design is WL#8003 (Server support for attachable transactions) and the InnoDB implementation is WL#7828.
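A sketch of both pieces, assuming a hypothetical MyEngineTrx per-connection object:

// Re-create the per-connection engine data if the server has swapped in a
// fresh attachable-transaction context (the ha_data slot is nullptr then).
static MyEngineTrx *get_or_create_trx(THD *thd, handlerton *hton) {
  auto *trx = static_cast<MyEngineTrx *>(thd_get_ha_data(thd, hton));
  if (trx == nullptr) {
    trx = new MyEngineTrx(thd);
    thd_set_ha_data(thd, hton, trx);
  }
  return trx;
}

ulonglong ha_myengine::table_flags() const {
  return HA_REC_NOT_IN_SEQ | HA_ATTACHABLE_TRX_COMPATIBLE /* | ... */;
}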

Changes to the core server

The server assumes several things about the DDSE, and also that there is exactly one engine involved. The affected areas are:

  • What the DDSE is
  • Early initialization of static plugins
  • READ COMMITTED and READ UNCOMMITTED
  • Foreign keys
  • Implicit primary keys
  • TABLESPACE support
  • handler::delete_row bug-for-bug compatibility with the InnoDB implementation
  • Duplicate entry or lock wait timeout?
  • Supporting DD tables in multiple SEs
  • Changing the DDSE of an existing instance
  • Clone plugin compatibility

What the DDSE is

The DD layer is full of ha_resolve_by_legacy_type(thd, DB_TYPE_INNODB) calls everywhere it needs to get the DDSE handlerton. Naturally, you'll have to grep all of sql for DB_TYPE_INNODB and factor out a helper function that returns the right handlerton, as sketched below. I wish grepping only sql/dd were enough, but there is sql/resourcegroups too.
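A sketch of such a helper, where ddse_db_type would be DB_TYPE_INNODB or the new engine's legacy type, chosen at build or startup time:

// Replaces the hardcoded ha_resolve_by_legacy_type(thd, DB_TYPE_INNODB) calls.
static handlerton *ha_resolve_ddse_hton(THD *thd) {
  return ha_resolve_by_legacy_type(thd, ddse_db_type);
}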

That's not all: the SQL definitions assume the engine too. The default constructor of Object_table_impl adds a hardcoded "ENGINE=InnoDB" string, which will need patching.

Early initialization of static plugins

Linking the DDSE statically into the server binary is not enough to ensure it is initialized early enough. The initialization of early plugins is handled by plugin_register_builtin_and_init_core_se in sql/plugin.cc. It has a loop that iterates over all the plugins known at this stage and initializes some of them. You'll have to add logic to initialize your SE there as well.
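A sketch of the condition change inside that loop, with "MyEngine" as a placeholder name; the surrounding code is version-specific, so treat this as illustrative only:

// Only the DDSE candidates (and a few other built-ins) are initialized here;
// everything else waits for the regular plugin initialization.
const bool init_early =
    !my_strcasecmp(&my_charset_latin1, plugin_ptr->name.str, "InnoDB") ||
    !my_strcasecmp(&my_charset_latin1, plugin_ptr->name.str, "MyEngine");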

Transaction isolation level requirements

The main isolation level used in DD operations is READ COMMITTED. Luckily, this is a sort of "default" isolation level for transactional engines, and even if the SE implements something else, like snapshot isolation, it still has to map that to the levels MySQL understands, likely to READ COMMITTED again.

But the server data dictionary also expects the DDSE to support READ UNCOMMITTED. Even though I spent the better part of a year working on this, I still don't fully understand the reason behind it (any transaction can read its own uncommitted changes at any isolation level in every storage engine I know about, and reading uncommitted changes from other transactions for correctness looks suspicious). I tried an experiment where I patched the server to always use READ COMMITTED, with InnoDB still being the DDSE, to see what breaks. What broke was foreign key operations, but only when there was more than one of them in a single ALTER TABLE. You have to evaluate how much of a showstopper that is for you, and whether to go back to the SE and implement READ UNCOMMITTED, or to patch the server to always use READ COMMITTED with the DDSE.

A further complication here is that an SE may or may not support atomic DDL (handlerton flag HTON_SUPPORTS_ATOMIC_DDL), and the DDSE could go either way. If it does not, then there are interim DD transaction commits in the middle of a DDL statement, complicating the picture. I only worked on the case where the DDSE does not support atomic DDL.

Last but not least, if the unsupported isolation levels are strictly rejected, users may find it surprising that their DDL statements fail when their session uses one of the unsupported levels, for instance, if there is another engine for user tables that does support them. Consider accepting an unsupported level for the scope of any DDL statement. I am not fully certain why this issue happens at all – the attachable transactions should run at one of the accepted levels in the DDSE – but maybe the DDSE gets to see the DDL statement in an outer transaction first.

Foreign keys

The DD table definitions in the .cc files under sql/dd/impl/tables feature many foreign keys. If your SE does not support them, you'll be glad to learn that the DD schema has no referential actions, so the keys effectively serve as defense-in-depth against incorrect operations. Their presence is optional, and omitting their definitions works just fine. This commit also handles the hardcoded "ENGINE=InnoDB" discussed above and the next two items.

Implicit primary keys

Some of the DD tables, for example mysql.index_column_usage, have no explicit primary keys declared. But they do have several unique non-NULL indexes, one of which is promoted to a primary key. As an implementation detail, it might be hard (and brittle) to track this promotion logic to repeat the decision correctly in the SE. If that's the case, patch the SQL definitions to make one of those unique keys an explicit primary key.

TABLESPACE support

All the DD tables are created in the mysql tablespace. If your SE has no notion of tablespaces, patch the SQL definitions to remove this.

delete_row bug-for-bug compatibility with the DD-InnoDB interface

This was a fun one. MySQL deletes rows through handler::delete_row(const uchar *buf), where buf points to either record[0] or record[1]. But the reference implementation, InnoDB, does not even look at what buf is and relies on a pre-positioned cursor to delete the row. And the DD layer added a new call to delete_row that always passes buf == record[1], pointing to a memory block of zeros. Benign with InnoDB; anything can happen with another engine. Check sql/dd/impl/raw/raw_record.cc, so that it passes m_table->record[0] instead of m_table->record[1].
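A sketch of the change; the exact call site in raw_record.cc may differ between versions:

// Before: the handler received the zero-filled record[1] buffer:
//   rc = m_table->file->ha_delete_row(m_table->record[1]);
// After: pass the current, positioned row buffer instead:
rc = m_table->file->ha_delete_row(m_table->record[0]);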

Duplicate entry or lock wait timeout

The data dictionary holds the statistics tables, which usually have a higher volume of writes than the rest of the DD tables. They are also atypical DD tables in that it is possible, and legal, to write statistics data for the same object from two different threads concurrently. One of them will win the race and the other may get an ER_DUP_ENTRY error, which is then silently discarded.

The above is true if InnoDB is the DDSE. Other engines may handle duplicate index entries by returning a different error: for example, MyRocks returns ER_LOCK_WAIT_TIMEOUT instead and cannot easily be made to return ER_DUP_ENTRY. Thus the server has to accommodate by silently discarding either error.
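A sketch of that accommodation in the statistics-write path; the actual error handling in the server is more involved:

// Either error means a concurrent writer won the race; discard it silently.
const uint err = thd->get_stmt_da()->mysql_errno();
if (err == ER_DUP_ENTRY || err == ER_LOCK_WAIT_TIMEOUT) {
  thd->clear_error();
}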

Supporting DD tables in multiple SEs

While only one storage engine can host the data dictionary, it still makes sense for other storage engines to be able to have their own additional private tables in the DD, as discussed below in the context of InnoDB. This requires some changes in the server layer:

  • Handlerton method dict_register_dd_table_id needs to be called for the correct SE. The DD layer can no longer assume the engine of a DD table, and must get the handlerton for the dict_register_dd_table_id call from the table object (see the sketch after this list).
  • dict_init, dict_recover, and dict_cache_reset_tables_and_tablespaces need to be called for all participating SEs. Their implementations, especially of the first two, must handle both the case of the SE being the DDSE and of it not being one. But note that this code will have to be adjusted further if the DDSE change needs to be supported on existing instances.
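A sketch of the first item, resolving the handlerton from the DD table object (error handling elided):

handlerton *hton = nullptr;
if (dd::table_storage_engine(thd, table_def, &hton)) return true;
// Let the owning engine, not necessarily the DDSE, note the DD table ID.
if (hton->dict_register_dd_table_id != nullptr)
  hton->dict_register_dd_table_id(table_def->id());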

Changing the DDSE of an existing instance

If you want to change the DDSE on existing instances, you'll have to design and write the code to perform this change. I was not the main author of this part of the project, thus I'll skip discussing it.

Clone

Technically not the core server, but a plugin. Clone is an InnoDB-centric plugin, but if your storage engine already supports clone, almost everything will work. One thing that will not, without a larger patch for clone than I thought was worth developing, is clone rollbacks on startup. These do not happen on successful clones, so a failed clone instance with a non-InnoDB DDSE will just fail in a different way.

Changes to InnoDB

It is hard to blame InnoDB for not working straight out of the box when it does not store the data dictionary. Several changes need to be made there:

  • Rewrite InnoDB-specific INFORMATION_SCHEMA DD view implementations
  • Allow InnoDB to register its DD tables even when it's not the DDSE
  • Mark DDL transactions as read-write
  • Register DDL transactions with server 2PC coordinator
  • GTID persistence with binlog disabled
  • Allow UNDO tablespace truncation to run without an RW workload

Reimplementing InnoDB-specific INFORMATION_SCHEMA DD views

There are several INFORMATION_SCHEMA views that are specific to InnoDB, that is, they are not server DD-layer views over DD tables: INNODB_TABLES, INNODB_TABLESPACES, INNODB_DATAFILES, INNODB_COLUMNS, INNODB_VIRTUAL. InnoDB implements those views by opening the underlying DD tables using low-level InnoDB APIs. Since those tables are now in a different engine, the views have to be reimplemented (main commit, fix 1, fix 2) using server-layer APIs. Luckily, they are rather straightforward to use:

template <typename O, typename F>
int for_all_innodb_objects_in_dd(THD *thd, F f) {
  auto *const dc = dd::get_dd_client(thd);
  // Release any DD cache objects acquired below when leaving the scope.
  const dd::cache::Dictionary_client::Auto_releaser releaser(dc);
  std::vector<const O *> dd_objs;
  if (dc->fetch_global_components(&dd_objs)) {
    return 1;
  }

  for (const auto *obj : dd_objs) {
    // Skip the objects belonging to other storage engines.
    if (obj->engine() != innobase_hton_name) continue;

    const auto ret = f(thd, *obj);
    if (ret) return ret;
  }

  return 0;
}

where the first template argument is dd::Table or dd::Tablespace, and f reads the needed information and fills out the result row, possibly also looking into partitions if needed:

[&tables](THD *thd, const dd::Table &table) {
  const auto table_id = table.se_private_id();
  if (dict_sys_t::is_dd_table_id(table_id)) return 0;

  if (table_id != dd::INVALID_OBJECT_ID) {
    fill_i_s_innodb_tables_row(thd, *tables, table_id);
  } else {
    assert(!table.leaf_partitions().empty());
    for (const auto *partition : table.leaf_partitions()) {
      assert(partition->engine() == innobase_hton_name);
      const auto table_id = partition->se_private_id();
      if (table_id != dd::INVALID_OBJECT_ID)
        fill_i_s_innodb_tables_row(thd, *tables, table_id);
    }
  }
  return 0;
}

InnoDB-specific DD tables

A storage engine should be allowed to have its private DD tables even if it is not the DDSE of the instance. InnoDB in particular needs its own tables in the mysql schema: innodb_dynamic_metadata, innodb_table_stats, innodb_index_stats, and innodb_ddl_log. They get created and registered in the ddse_dict_init handlerton method implementation, and the handler method get_se_private_data will be called for them too.

There's more to it if DDSE changes need to be considered. The private data for DD tables in InnoDB consists of the table ID, index ID, B-tree root page ID, etc. The first two are assigned sequentially. If InnoDB is not the DDSE, a naive implementation would give out IDs that belong to different tables than when InnoDB is the DDSE. To avoid this can of worms, make sure that the same IDs are assigned to the private InnoDB DD tables in both cases. Last but not least, one of the InnoDB private data pieces assigned in advance is the B-tree root page ID, which must be in sync with what the underlying code will actually do at a later point. And that varies with innodb_page_size, so make sure to test and handle all possible settings of this system variable. Everything discussed in this paragraph is handled here and also in dict_table_build_def, with a btr_create debugging counterpart.

Mark DDL transactions as read-write

For some DDL operations, InnoDB did not bother marking their transactions as read-write in the DDL implementation proper, because the operation involved a write to a DD table in InnoDB anyway, taking care of that RW bit. With the DD tables in another SE, the DDL transaction incorrectly becomes RO in InnoDB, which had the effect of skipping the prepare call in the 2PC, breaking crash recovery. The existing handler::mark_trx_read_write does not work here, being a private method. I ended up copy-paste-shortening it into a new helper mark_trx_rw in ha_innodb.cc, which is called from the various InnoDB DDL entry points (commit):

/** Mark transaction as read-write in the server layer.
Normally this is handled by handler::mark_trx_read_write, but when InnoDB is not
the DDSE (and transactions do not call that method many times), the few
remaining calls happen too early, when the transaction is not started yet, or
with an incompatible table_share value.
@param  thd     thread handle */
void mark_trx_rw(THD &thd) {
  auto &ha_info = thd.get_ha_data(innodb_hton_ptr->slot)->ha_info[0];
  if (ha_info.is_started())
    ha_info.set_trx_read_write();
}
// ...
if (!innobase_is_ddse()) {
  mark_trx_rw(*thd);
}

Register DDL transactions with server 2PC coordinator

With the binlog disabled, when InnoDB is the DDSE and is, for example, creating an InnoDB table, it correctly assumes that it is the sole engine in the transaction and does not bother to register itself with the server 2PC coordinator. This assumption no longer holds when InnoDB is asked to create its own table while not being the DDSE.

Fixing this consists of adding numerous instances of

if (!innobase_is_ddse()) {
  innobase_register_trx(hton, thd, trx);
}

to the InnoDB DDL entry points (same commit as for the previous item).

GTID persistence with binlog disabled

If the binlog is disabled, InnoDB persists the transaction GTID information itself. This logic needed some tweaking.

Allowing UNDO tablespace truncation to run without RW workload

With the DDSE being another engine, InnoDB purge may get stuck and leave undo tablespaces untruncated. After I recovered from the initial "huh? Why is there a relationship between purge and the DD?", I found that undo tablespace truncation runs only if there is something to purge (makes sense, doesn't it?). When executing DROP UNDO TABLESPACE, InnoDB as the DDSE would execute some read-write DD transactions, potentially creating something to purge. If it's not the DDSE, no actual writes to InnoDB happen, so it has nothing to purge, and consequently it never gets to the undo tablespace truncation and the final drop.

One way to address this is to call trx_purge_truncate_undo_spaces from srv_do_purge. Doing that unconditionally if InnoDB is not the DDSE works, but might be less than optimal.
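A sketch of that change, using the function names discussed above:

/* In srv_do_purge: with a different DDSE there may never be purge work from
   DD transactions, so attempt the truncation explicitly. */
if (!innobase_is_ddse()) {
  trx_purge_truncate_undo_spaces();
}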

Testing

If your branch of MySQL needs to support both InnoDB and your SE as DDSEs, one way to do MTR testing for DDSE regressions is to have a server variable to choose the DDSE (which you'll likely already have if you need to support both DDSEs), and some MTR include files to skip some tests with one or the other choice. Then adding --mysqld=--ddse-var=non-default-one to MTR invocations will do the trick. Any failing tests can then be saved separately with --initialize --mysqld=--ddse-var in their option files and added as regression tests to the main testsuite runs.
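For example, a sketch assuming a hypothetical --ddse-var server variable that defaults to InnoDB:

# Run the main suite with the non-default DDSE:
./mtr --mysqld=--ddse-var=myengine --suite=main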

Conclusion

A checklist for everything discussed above:

  • Changes to the SE to make it able to work as the DDSE
    • be transactional
    • be linked statically into the server
    • know how to assign any internal IDs to DD tables
    • implement the handlerton API for DDSEs
    • make DD transactions durable
    • be able to do non-locking reads
    • support attachable transactions
  • Changes to the core server layer, which can no longer assume that InnoDB is the DDSE
    • What the DDSE is
    • Early initialization of static plugins
    • READ COMMITTED and READ UNCOMMITTED
    • Foreign keys
    • Implicit primary keys
    • TABLESPACE support
    • handler::delete_row bug-for-bug compatibility with the InnoDB implementation
    • Duplicate entry or lock wait timeout?
    • Supporting DD tables in multiple SEs
    • Changing the DDSE of an existing instance
    • Clone plugin compatibility
  • Changes to InnoDB so it can keep on working while not being the DDSE
    • Rewrite InnoDB-specific INFORMATION_SCHEMA DD view implementations
    • Allow InnoDB to register its DD tables even when it's not the DDSE
    • Mark DDL transactions as read-write
    • Register DDL transactions with server 2PC coordinator
    • GTID persistence with binlog disabled
    • Allow UNDO tablespace truncation to run without an RW workload
  • Testing

Reference

Oracle design documents:

  • WL#6380 (Formulate framework for API for DD)
  • WL#6394 (Bootstrap code for new DD)
  • WL#6599 (New Data Dictionary and I_S integration)
  • WL#7446 (InnoDB: provide a way to do non-locking reads)
  • WL#7828 (InnoDB: attachable transaction support)
  • WL#8003 (Server support for attachable transactions)

Tuesday, October 22, 2024

A MySQL 9.1.0 branch with patches

Following up on the previous MySQL 9.0.0 branch with patches, I have rebased them on 9.1.0: patched-mysql-9.1.0.

The set of patches is the same as for 9.0.0: it improves AddressSanitizer support and fixes/improves the clone plugin to support a second transactional storage engine:

  1. Fix for Bug 115120: Provide memory debugging macro implementations for AddressSanitizer.
  2. Fix for Bug 109922: SEs do not coordinate clone rollbacks on instance startup.
  3. Fix for Bug 109920: SEs cannot add to clone data size estimates.
  4. Fix for Bug 109919: Clone plugin does not support arbitrary length O_DIRECT files.
  5. Fix for Bug 107715: Multiple SE clone not calling clone_end after mixed clone_begin failure/success.
  6. Fix for Bug 106750: Some clone tests needlessly marked as Linux-only.
  7. Fix for Bug 106737: Wrong SE called to ack clone application if >1 SE.

Friday, October 18, 2024

Building and testing MySQL 8.0.40 / 8.4.3 / 9.1.0 on macOS

Congrats to Oracle friends on the MySQL 8.0.40 / 8.4.3 / 9.1.0 releases! Let's continue my blog post series on builds and tests on macOS (the previous post, for MySQL 8.0.38 / 8.4.1 / 9.0.0, is here; I skipped the emergency hotfixes 8.0.39 / 8.4.2 / 9.0.1).

Build

Mostly good news here. XCode 16 works (8.4.3 & 9.1.0 need -Wno-unused-const-variable), and so does LLVM 18. The very recent LLVM 19 release will need fixes for the new errors (and not only warnings, which could be disabled), but that's expected. Homebrew LLVM still needs the workaround to avoid mixing system and Homebrew LLVM static linker utilities (Bug #113113: Build failure with Homebrew LLVM 14-19 on macOS). All releases build with maximum system libraries, good.

Test

Many changes here. Several existing bugs have been fixed, but there are also new regressions. As usual, I tested Release, Debug, and Debug+ASan+UBSan configurations.

New bugs

Little or no changes

Fixed and no longer reproducing bugs:

Totals

To sum up the existing build & test bugs, we have 10 new, 11 unchanged, and 8 fixed. A lot of churn, and still a net plus of two regressions.

Newer tooling

I have also logged two bugs which seem to be new classes of errors from newer tooling rather than MySQL regressions, i.e. they are very likely present in older versions too. I don't count these in the totals above, but will start doing so from the next releases:

Valgrind

I have been complaining that running Valgrind is excessively slow, and that there was an error present in every test (Bug #115229: pwrite(buf) points to uninitialised byte(s) in os_fusionio_get_sector_size()). The good news is that this bug has been fixed, making the Valgrind runs actually usable now. I have started one for 9.1.0; we'll see how far it goes.

One-off issues that I couldn't handle better

Stuff I couldn't repeat reliably or at all, just dumping it here in case I have to reference it next time.

  • rpl_nogtid.rpl_semi_sync_non_group_commit_deadlock 8.0.40 Debug+San reports odr-violation on ReplSemiSyncBase::kPacketMagicNum, does not reproduce running individually
  • binlog.binlog_mysqlbinlog_raw failed once on 8.0.40 Debug+San (L77 copy_file error)
  • rpl_gtid.rpl_gtids_table_disable_binlog_on_slave failed once on 8.4.3 Debug+San build
  • main.index_merge_innodb_hypergraph failed once on 8.4.3 Debug+San build
  • innodb.trigger_function_lock_compare timed out once there
  • auth_sec.grant_as_ddl failed once 8.4.3 Release build
  • rpl_gtid.rpl_gtids_table_disable_binlog_on_slave failed once on 9.1.0 Debug+San build
  • merge_innodb_tests-t failed once on 9.1.0 Debug+San
  • main.lock_backup failed once 9.1.0 with MTR thread id masking clobbering error number
  • innodb.trigger_function_lock_compare failed once on 9.1.0 Debug

Conclusion

I am happy to see that some of the bugs are getting fixed, but the trend is still going the wrong way, and I miss the fully-clean mid-8.0 releases.

Thursday, October 10, 2024

Emacs Projectile Tweaks for MySQL Development

I am using Emacs with Projectile (besides other packages) for MySQL development. Here's a couple of little tweaks to save a few keystrokes here and there:

  • Visit (which in Emacs parlance means open) the MTR test file, by its MTR name instead of file path:
;; This is not a MySQL-specific implementation of `projectile-find-test-file'.
;; That function is meant for jumping from a source file to its test file.
;; TODO(laurynas): this is not an exhaustive implementation for all locations
;; test files can reside. Add them as the need arises.
(defun my-visit-mtr-test ()
  "Input MySQL MTR test name and visit it."
  (interactive)
  (let* ((input (read-string "MySQL MTR test name: "))
         ;; TODO(laurynas): error checking
         (parts (split-string input "\\."))
         (suite-name (car parts))
         (test-in-suite (concat "t/" (cadr parts) ".test"))
         (test-root (concat (projectile-project-root) "mysql-test/"))
         (file-path (if (string= suite-name "main")
                        (concat test-root test-in-suite)
                      (concat test-root "suite/" suite-name "/"
                              test-in-suite))))
    (find-file file-path)))

(define-key projectile-command-map "M" #'my-visit-mtr-test)

The last line binds this command to M in the Projectile command map, so if your Projectile keybinding prefix is C-c p (C is Ctrl in case non-Emacs users are still reading), you can do C-c p M to invoke it.

  • Register .test and .result files as "other file types" for each other. This way projectile-find-other-file (at e.g. C-c p a) can be used for quick jumping between tests and their results:
(add-to-list 'projectile-other-file-alist '("test" "result"))
(add-to-list 'projectile-other-file-alist '("result" "test"))

If I think of more utilities, I'll try not to forget to update this post.

Friday, September 06, 2024

Running Perl-using MTR tests for fb-mysql under macOS

The headline is slightly misleading. Oracle MySQL has several Perl-using MTR tests, the whole MTR framework is Perl-based, and those tests run fine out of the box (with an occasional bug). However, fb-mysql has additional tests that call the Random Query Generator under the hood, and these tests don't work out of the box under macOS.

Here's what I had to do to get them working:

# mysql-client default version is 9.0 now, and you'll get the wrong auth plugin error
brew install perl mysql-client@8.0

# TODO: does not survive upgrades, do what the Perl Homebrew formula advises with
# local::lib

/opt/homebrew/opt/perl/bin/cpan DBI
/opt/homebrew/opt/perl/bin/cpan JSON
/opt/homebrew/opt/perl/bin/cpan Expect

# Adjust path only for compilation below. For now I don't need it globally
export PATH="/opt/homebrew/opt/mysql-client/bin:$PATH"

# Run the following command and expect it to fail with the error as in
# https://stackoverflow.com/questions/53277884/installation-of-dbdmysql-fails-with-symbol-not-found-on-osx-mojave
/opt/homebrew/opt/perl/bin/cpan DBD::mysql

# Now go to ~/.cpan/build/DBD-mysql-<latest> and do:
perl Makefile.PL \
     --libs="-L/opt/homebrew/opt/mysql-client/lib -L/opt/homebrew/lib -lmysqlclient -lz -lzstd -lssl -lcrypto -lresolv"
sudo make install

And now:

[  5%] rocksdb.rqg_runtime 'rocksdb_ddse'       w4  [ pass ]  41533

Success! Stay tuned for getting Python tests running too.

Friday, August 23, 2024

A README for my MySQL development tooling

Over the years I have written and borrowed various bits of scripts and configuration files to make MySQL development easier by configuring and automating things. Having written patch2testlist a few months ago, I figured that now I have at least one thing that should work not only on my machine. Thus I put together a README for MySQL-related stuff in my dotfiles, also included below.

My MySQL Development Tools

Tool configuration

  • clangd: I am so glad C++ finally got universal IDE tooling and clangd is very useful for MySQL development.
  • Emacs: my primary Emacs config is elsewhere, while this is a directory-local config that takes effect in the source trees. It prevents creating spurious .ispell.dict files, sets the C++ style to Google, and makes Magit (magit-todos actually) exclude 3rd party TODO comments.
  • cpplint: cpplint is still marginally useful with MySQL sources.

I symlink these into the parent directory of all the source trees.

Could be useful to you

  • patch2testlist: for a given patch, print a list of MTR tests to run to cover the test changes in the patch. Introduced in its own blog post. Unlike the scripts below, patch2testlist is standalone and works without the zsh utility functions.

Useful to me

  • mysql-work.sh: a shell initialization script that sets up environment variables and helper functions to configure, build, & test various MySQL versions on macOS and Linux. For example, $MY8037D will expand to CMake options for a debug build of MySQL 8.0.37. As for the functions, among others, mysql_build will configure and build a given tree, and rmtr will incrementally build and run MTR tests with libeatmydata. This script adds various warning disable flags so that older unpatched trees can be built with newer compilers within reason. It also absorbs bug workarounds and complications related to build environments, for example, different MySQL versions need both OpenSSL 1.1 and 3 installed, which then in turn results in build issues if both versions are present simultaneously. This script is not usable without its zsh utility functions.
  • fetchworksrc: fetch all remotes for the Git repos in the WORK_REPOS_TO_PULL environment variable; pull all the worktrees in WORK_TREES_TO_UPDATE one by one and do incremental builds. This script runs at midnight on my work machine. There are some quality-of-life embellishments, such as skipping the build if the machine is a laptop on battery, and managing Homebrew OpenSSL version incompatibilities in builds. Again, this script is not usable without its zsh utility functions.

zsh function library

This directory. One file per function as zsh requires. Currently they fall into two sets of responsibilities: querying the source tree for version information and managing OpenSSL 3 build workarounds.

No longer maintained (patches welcome)

  • gca, the workhorse script from my time at Percona. Used when the same patch has to be developed for more than one major MySQL series (e.g. the patch is needed for 8.0, 8.4, & 9.0, to use an example that is current at the time of writing). There is a constraint that the lower-series version of the patch has to be git-merged to the next higher series, but the latest commits of these series are not fully merged to each other due to how Oracle pushes their source trees. So, a lower-series feature branch has to be created not from the tip but from some older commit which had been fully merged. This script manages finding such commits and creating the worktrees. Luckily I only deal with a single series these days.
  • reupmerge, used in that same multiple-series workflow. It handles the scenario where, for example, a code review change needs to be applied to a lower-series commit in a feature branch. Changing that commit requires re-merging it to the higher series. For the case where the higher-series commit needs not change, this script automates saving its diff and re-merging the lower series with that exact saved diff.
  • mysql-work.el: mostly outdated Emacs configuration bits, mostly copied from colleagues at Percona. They set up MySQL (and InnoDB) indentation styles that were used up to 5.7, and provide a bit of syntax support for MTR .test files.

Tuesday, July 09, 2024

A MySQL 9.0.0 branch with patches

Recently I made public a branch of MySQL 8.4.0 with patches. Now that newer upstream versions have been released, I have rebased the branch on 9.0.0: patched-mysql-9.0.0.

I did not add any new patches, and dropped the LLVM compilation fix patch. Thus the current patch set improves AddressSanitizer support and includes several fixes/improvements for the clone plugin to support a second transactional storage engine:

  1. Fix for Bug 115120: Provide memory debugging macro implementations for AddressSanitizer.
  2. Fix for Bug 109922: SEs do not coordinate clone rollbacks on instance startup.
  3. Fix for Bug 109920: SEs cannot add to clone data size estimates.
  4. Fix for Bug 109919: Clone plugin does not support arbitrary length O_DIRECT files.
  5. Fix for Bug 107715: Multiple SE clone not calling clone_end after mixed clone_begin failure/success.
  6. Fix for Bug 106750: Some clone tests needlessly marked as Linux-only.
  7. Fix for Bug 106737: Wrong SE called to ack clone application if >1 SE.

Thursday, July 04, 2024

Building and testing MySQL 8.0.38 / 8.4.1 / 9.0.0 on macOS

Oracle has just released MySQL 8.0.38/8.4.1/9.0.0, so let's see how the Valgrind testing of the previous set of releases is going:

[ 65%] innodb.bug33788578_rec_IV_set            w2  [ fail ]  Found warnings/errors in error log file!

It only managed two thirds of a run (a --big-test one) in a few weeks, which means that either I need a faster machine, or Valgrind testing takes a wholly unreasonable amount of time. Despite the run being incomplete, it resulted in a couple of bug reports, because every single test gives an error: #115229: pwrite(buf) points to uninitialised byte(s) in os_fusionio_get_sector_size(). I also noticed that one test always times out: #114998: Test rpl_gtid.rpl_gtid_parallel times out under Valgrind. I need to rethink Valgrind testing going forward, maybe by trying to drop --big-test first.

Back to the new releases. A nice surprise is that they build with XCode 15.3 even though the associated bug (#113123: Compilation fails with LLVM 17 and XCode 15.3) is still open. However, an LLVM 18 build fails in the bundled zlib, and this time I refrained from reporting a bug. GCC 11–14 also fail, but they are officially unsupported for macOS, so I won't be reporting that unless I encounter issues on Linux.

Fixed and no longer reproducing bugs:

New bugs:

Little to no changes

Finally, I did not retest #113113: Build failure with Homebrew LLVM 14-17 on macOS, where I keep using the workaround of setting CMAKE_AR to Homebrew ar.

To sum up: 5 bugs fixed or no longer reproducing, 7 new, 11 with no changes, 1 not tested, and 2 new Valgrind ones. I don't like this trend, and I miss the middle-8.0 releases that were fully clean under macOS.

Friday, June 14, 2024

Implementing a Table-Returning MySQL Function

Edit: links to Percona SEQUENCE_TABLE implementation added to the end

MySQL has many native functions and it's relatively straightforward to write new ones. While the official internals manual has a chapter on adding native functions, some aspects aren't immediately obvious, such as the possible return types for functions. The linked docs suggest that functions can return a double, an integer, or a string, but don't mention that they can also return JSON documents (i.e. JSON_ARRAY) and relational tables (i.e. JSON_TABLE). I have filed a bug for this omission; meanwhile, this post will be my attempt at documenting the implementation of table-returning functions.

We will implement a table-returning function called LAST_INSERT_IDS. We will focus on the server infrastructure to support it, not on the function's purpose or the actual data it returns. Let's do the simplest possible thing: return a table with one column and one row.

Exploring the Codebase

Grepping the source tree for JSON_TABLE points to Table_function class declared in sql/table_function.h. Source code history then points to this commit, which in turn mentions Worklog #8867 (Add JSON table functions). Luckily for us it's a relatively old worklog, thus available publicly! From the worklog we can learn that any result tables will start as in-memory temporary tables which will spill to disk as needed. Good. The worklog also touches upon a lot of stuff elsewhere in the server so that the query optimizer will know how to deal with table-returning functions. We won't need to deal with that while implementing a new function, but it may provide debugging starters should anything go wrong.

Function class

To start writing code, we have to derive a new concrete class from Table_function and let the compiler tell us what methods must be implemented:

class Table_function_last_insert_ids final : public Table_function {
 public:
  bool init() override;
  bool fill_result_table() override;
  const char *func_name() const override;
  bool print(const THD *thd, String *str,
             enum_query_type query_type) const override;
  bool walk(Item_processor processor, enum_walk walk, uchar *arg) override;

 private:
  List<Create_field> *get_field_list() override;
  bool do_init_args() override;
};

But let's not implement them just yet (stub them out with assert(0) bodies to learn what gets called when), and let's look into instantiating the class object first. The regular functions get registered with MySQL in the func_array variable in sql/item_create.cc, and the surrounding code knows how to create their objects. But JSON_TABLE is not there! Grepping shows that it is wired directly into the parser, probably because its arguments are non-trivial. Our function is much simpler in that regard; can we get away with putting it there instead of patching the parser? It turns out that no, we cannot, because the simpler functions derive from the Item_func class, which derives from Item, which is what this infrastructure expects. And our table-returning function derives from Table_function, which has no further ancestors.

Patching the parser

OK, off to the parser we go. To find a starting point, let's check what JSON_TABLE does. Grepping for Table_function_json in the parser (the worklog is not too specific in this area), we get a match for PT_table_factor_function class in sql/parse_tree_nodes.cc. Then we grep for that class in the parser grammar and we get a hint why the table-returning functions are harder to implement than other kinds of functions:

table_function:
        JSON_TABLE_SYM '(' expr ',' text_literal columns_clause ')'
        opt_table_alias
        {
          // Alias isn't optional, follow derived's behavior
          if ($8 == NULL_CSTR)
          {
            my_message(ER_TF_MUST_HAVE_ALIAS,
                       ER_THD(YYTHD, ER_TF_MUST_HAVE_ALIAS), MYF(0));
            MYSQL_YYABORT;
          }

          $$= NEW_PTN PT_table_factor_function($3, $5, $6, to_lex_string($8));
        }
      ;

We can see that the parser knows how to work with exactly one table-returning function that is named JSON_TABLE, thus we have to patch the parser. Let's keep everything under the non-terminal symbol table_function:

table_function:
          json_table_function
        | last_insert_ids_function
        ;

last_insert_ids_function:
          LAST_INSERT_IDS_SYM '(' ')'
          opt_table_alias
          {
            // Alias isn't optional, follow derived's behavior
            if ($4 == NULL_CSTR)
            {
                my_message(ER_TF_MUST_HAVE_ALIAS,
                           ER_THD(YYTHD, ER_TF_MUST_HAVE_ALIAS), MYF(0));
                MYSQL_YYABORT;
            }

            $$= NEW_PTN PT_last_insert_ids_function(to_lex_string($4));
          }
        ;

json_table_function:
        // As before

Unfortunately the opt_table_alias block must be duplicated. But, no increase in the parser shift/reduce conflicts! There is more not particularly interesting stuff to do in the lexer and the parser:

  • Add the function name to the lexer, {SYM("LAST_INSERT_IDS", LAST_INSERT_IDS_SYM)}, to sql/lex.h.
  • Declare this lexer symbol as a token for the parser. I did not make it a keyword because JSON_TABLE wasn't one either: %token LAST_INSERT_IDS_SYM 10024.
  • Finally, declare last_insert_ids_function symbol to be of table_reference type, same as table_function was before.

Parse Tree Node class

That's enough for the lexer and the parser, next we need to implement PT_last_insert_ids_function class:

class PT_last_insert_ids_function : public PT_table_reference {
 public:
  PT_last_insert_ids_function(const LEX_STRING &table_alias)
      : m_table_alias{table_alias} {}

 private:
  const LEX_STRING m_table_alias;
};

Without a PT_last_insert_ids_function::contextualize method, which would actually create the Table_function_last_insert_ids object, the above implementation fails rather non-obviously:

mysqltest: At line 24: Query 'SELECT * FROM LAST_INSERT_IDS() as ids' failed.
ERROR 1096 (HY000): No tables used

Let's add that method by copying, pasting, and deleting the non-applicable bits of PT_table_factor_function::contextualize. The job of this method is to create the function object and to register the result table, under its alias, with the query optimizer.

bool PT_last_insert_ids_function::contextualize(Parse_context *pc) {
  if (super::contextualize(pc)) return true;
  auto *const fn = new (pc->mem_root) Table_function_last_insert_ids{};
  if (unlikely(fn == nullptr)) return true;

  LEX_CSTRING alias;
  alias.length = strlen(fn->func_name());
  alias.str = sql_strmake(fn->func_name(), alias.length);
  if (unlikely(alias.str == nullptr)) return true;

  auto *const ti = new (pc->mem_root) Table_ident(alias, fn);
  if (ti == nullptr) return true;

  m_table_ref = pc->select->add_table_to_list(pc->thd, ti, m_table_alias.str, 0,
                                              TL_READ, MDL_SHARED_READ);
  if (m_table_ref == nullptr || pc->select->add_joined_table(m_table_ref))
    return true;

  return false;
}

Filling out the function class implementation

This implementation forces us to define the first method in Table_function_last_insert_ids, a very simple one:

const char *func_name() const override { return "last_insert_ids"; }

Let's run it!

SELECT * FROM LAST_INSERT_IDS() as ids;
...
Assertion failed: (0), function init, file table_function.h, line 440.

…which points to the stub Table_function_last_insert_ids::init(). Let's say we have no meaningful initialization at this point, so remove the assert from the stub:

Assertion failed: (false), function get_field_list, file table_function.h, line 464.

This is get_field_list method, and we have to implement it to describe the schema of the table we will be returning:

List<Create_field> *Table_function_last_insert_ids::get_field_list() {
  assert(fields.is_empty());

  auto *const field = new Create_field;
  field->init_for_tmp_table(MYSQL_TYPE_LONGLONG, MAX_BIGINT_WIDTH,
                            DECIMAL_NOT_SPECIFIED, /* is_nullable */ false,
                            /* is_unsigned */ true, 0, "insert_id");
  fields.push_back(field);

  return &fields;
}

This implementation assumes it is not a simple getter and thus is called exactly once. If the assert shows that assumption to be incorrect, I'll move the code to init and reduce this one to a simple getter. Also, something will have to free the memory we are allocating for the field; that's for later.

Run again…

Assertion failed: (false), function do_init_args, file table_function.h, line 467.

For JSON_TABLE, this method is documented to "check whether given default values can be saved to fields." Let's assume that we have no meaningful actions here, remove the assert, return success, and run again:

Assertion failed: (0), function fill_result_table, file table_function.h, line 442.

Finally we get to the second serious method that should provide the payload for the result table:

bool Table_function_last_insert_ids::fill_result_table() {
  assert(!fields.is_empty());
  assert(!table->materialized);
  assert(table->s->fields == 1);

  empty_table();

  auto *const field = get_field(0);
  assert(field->field_index() == 0);
  field->store(current_thd->first_successful_insert_id_in_prev_stmt);
  field->set_notnull();
  write_row();

  return false;
}

And…

SELECT * FROM LAST_INSERT_IDS() as ids;
insert_id
0

Success!

Implementing the remaining stubs

We still have two stub methods with asserts in them: print and walk. Let's try figuring out under what conditions they get called, again, by looking at JSON_TABLE implementation.

Let's start with print. It gets called by sql_lex.cc Table_ref::print. There are references to optimizer trace and something else around that. Let's see if that "something else" refers to the good old EXPLAIN:

EXPLAIN SELECT * FROM LAST_INSERT_IDS() as ids;
Assertion failed: (false), function print, file table_function.h, line 450.
7   mysqld                              0x000000010385ffb8 Table_function_last_insert_ids::walk(bool (Item::*)(unsigned char*), enum_walk, unsigned char*) + 0

Interesting! We tried to find what calls print, but we found what calls walk instead. Since our function is atomic with regard to the query optimizer and has no internal structure, there is nothing to walk, let's remove the assert:

EXPLAIN SELECT * FROM LAST_INSERT_IDS() as ids;
Assertion failed: (false), function print, file table_function.h, line 450.

Great, so the same EXPLAIN statement exercises both walk and print. Let's add a simple implementation for the latter:

bool print(const THD *, String *str, enum_query_type) const override {
  return !str->append(STRING_WITH_LEN("last_insert_ids()"));
}

And now the EXPLAIN!

EXPLAIN SELECT * FROM LAST_INSERT_IDS() as ids;
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	ids	NULL	ALL	NULL	NULL	NULL	NULL	2	100.00	Table function: last_insert_ids; Using temporary
Warnings:
Note	1003	/* select#1 */ select `ids`.`insert_id` AS `insert_id` from last_insert_ids() `ids`

Memory management

One last thing is the Create_field object which was allocated in Table_function_last_insert_ids::get_field_list and never freed. Let's see if the standard tooling catches this error. AddressSanitizer under Linux includes LeakSanitizer, let's try it:

==31384==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1440 byte(s) in 6 object(s) allocated from:
    #0 0xab9c0d1a99ac in operator new(unsigned long) (/home/laurynas/vilniusdb/last_insert_ids/_build-debug-llvm-14-san/runtime_output_directory/mysqld+0xb0599ac) (BuildId: 611e155cd8dc43455bd5d5e0ddbdfe53ae205fa4)
    #1 0xab9c0e6bb8bc in Table_function_last_insert_ids::get_field_list() /home/laurynas/vilniusdb/last_insert_ids/sql/table_function.cc:778:23
    #2 0xab9c0e6b5514 in Table_function::create_result_table(THD*, unsigned long long, char const*) /home/laurynas/vilniusdb/last_insert_ids/sql/table_function.cc:64:46
    #3 0xab9c0e0024fc in Table_ref::setup_table_function(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_derived.cc:933:23
    #4 0xab9c0e388bfc in Query_block::resolve_placeholder_tables(THD*, bool) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_resolver.cc:1310:15
    #5 0xab9c0e3854a4 in Query_block::prepare(THD*, mem_root_deque<Item*>*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_resolver.cc:247:7
    #6 0xab9c0e3e3da4 in Sql_cmd_select::prepare_inner(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_select.cc:484:17
    #7 0xab9c0e3e2d74 in Sql_cmd_dml::prepare(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_select.cc:399:11
    #8 0xab9c0e3e4360 in Sql_cmd_dml::execute(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_select.cc:539:9
    #9 0xab9c0e263934 in mysql_execute_command(THD*, bool, unsigned long long*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_parse.cc:5489:29
    #10 0xab9c0e25c8c8 in dispatch_sql_command(THD*, Parser_state*, unsigned long long*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_parse.cc:6276:21
    #11 0xab9c0e253894 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_parse.cc:2565:7
    #12 0xab9c0e259124 in do_command(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_parse.cc:1745:18
    #13 0xab9c0e7f6f10 in handle_connection(void*) /home/laurynas/vilniusdb/last_insert_ids/sql/conn_handler/connection_handler_per_thread.cc:307:13
    #14 0xab9c12076ee8 in pfs_spawn_thread(void*) /home/laurynas/vilniusdb/last_insert_ids/storage/perfschema/pfs.cc:3022:3
    #15 0xfbb124cc5978 in start_thread nptl/./nptl/pthread_create.c:447:8
    #16 0xfbb124d2ba48 in thread_start misc/../sysdeps/unix/sysv/linux/aarch64/clone3.S:76

Yep, a leak. Let's try freeing that memory in the destructor:

Table_function_last_insert_ids::~Table_function_last_insert_ids() {
  assert(fields.elements == 1);
  delete fields.head();
}

Run again, no leak or any other Sanitizer error.

Summary checklist

So, to summarize, the checklist for adding a new table-returning function to MySQL:

  • Add the function name as a lexer symbol in sql/lex.h
  • Add the function name symbol as a token in the parser (sql/sql_yacc.yy)
  • In the parser, rename table_function to json_table_function and add a new table_function rule with json_table_function and the new function as alternatives. All these symbols must be typed as table_reference
  • Create a new parse tree node class, for example:
class PT_last_insert_ids_function : public PT_table_reference {
  using super = PT_table_reference;

 public:
  PT_last_insert_ids_function(const LEX_STRING &table_alias)
      : m_table_alias{table_alias} {}

  bool contextualize(Parse_context *pc) override;

 private:
  const LEX_STRING m_table_alias;
};

bool PT_last_insert_ids_function::contextualize(Parse_context *pc) {
  if (super::contextualize(pc)) return true;
  auto *const fn = new (pc->mem_root) Table_function_last_insert_ids{};
  if (unlikely(fn == nullptr)) return true;

  LEX_CSTRING alias;
  alias.length = strlen(fn->func_name());
  alias.str = sql_strmake(fn->func_name(), alias.length);
  if (unlikely(alias.str == nullptr)) return true;

  auto *const ti = new (pc->mem_root) Table_ident(alias, fn);
  if (ti == nullptr) return true;

  m_table_ref = pc->select->add_table_to_list(pc->thd, ti, m_table_alias.str, 0,
                                              TL_READ, MDL_SHARED_READ);
  if (m_table_ref == nullptr || pc->select->add_joined_table(m_table_ref))
    return true;

  return false;
}
  • Create a new table function class:
class Table_function_last_insert_ids final : public Table_function {
 public:
  ~Table_function_last_insert_ids() override;

  bool init() override { return false; }

  bool fill_result_table() override;

  const char *func_name() const override { return "last_insert_ids"; }

  bool print(const THD *, String *str, enum_query_type) const override {
    return !str->append(STRING_WITH_LEN("last_insert_ids()"));
  }

  bool walk(Item_processor, enum_walk, uchar *) override {
    return false;
  }

 private:
  List<Create_field> *get_field_list() override;

  bool do_init_args() override { return false; }

  List<Create_field> fields;
};

Table_function_last_insert_ids::~Table_function_last_insert_ids() {
  assert(fields.elements == 1);
  delete fields.head();
}

List<Create_field> *Table_function_last_insert_ids::get_field_list() {
  assert(fields.is_empty());

  auto *const field = new Create_field;
  field->init_for_tmp_table(MYSQL_TYPE_LONGLONG, MAX_BIGINT_WIDTH,
                            DECIMAL_NOT_SPECIFIED, /* is_nullable */ false,
                            /* is_unsigned */ true, 0, "insert_id");
  fields.push_back(field);

  return &fields;
}

bool Table_function_last_insert_ids::fill_result_table() {
  assert(!fields.is_empty());
  assert(!table->materialized);
  assert(table->s->fields == 1);

  empty_table();

  auto *const field = get_field(0);
  assert(field->field_index() == 0);
  field->store(current_thd->first_successful_insert_id_in_prev_stmt);
  field->set_notnull();
  write_row();

  return false;
}

Percona SEQUENCE_TABLE

After I posted this, Percona's Yura Sorokin pointed out on LinkedIn their implementation of the SEQUENCE_TABLE function: blog post, the initial implementation source code. That implementation follows a framework similar to the toy one above.

Tuesday, May 28, 2024

A MySQL 8.4.0 branch with patches

While writing the previous post, I noticed that I didn't have a central location for the patches I have submitted to Oracle. Some were in local branches, some were .patch files lying around. So now I have pushed a tree that has all those patches applied in a single place: patched-mysql-8.4.0, and I even added a README. This tree will hopefully make it easier to rebase on future Oracle releases. I had maintained similar branches before while pushing some of the Percona patches to Oracle around early 8.0 times.

As for the patches themselves, the majority of them add various missing features for the clone plugin to be able to support more than one transactional storage engine. There is also a compilation fix and a slightly improved AddressSanitizer support patch.

All MySQL 8.4.0 users are advised to migrate ASAP! /s

Wednesday, May 08, 2024

Building and testing MySQL 8.0.37 & 8.4.0 on macOS

The first MySQL LTS release, 8.4.0, is out, together with 8.0.37! Which means it's time for me to build and test them in my main work environment, continuing the series (8.3.0/8.0.36, 8.2.0/8.0.35).

The first surprise is that both of these releases do not build with the current XCode (15.3 at the time of writing), because the LLVM 17 compilation failure I previously reported (#113123: Compilation fails with LLVM 17 and XCode 15.3) is not fixed yet and has started affecting the Apple toolchain too. I am using Homebrew-packaged LLVM 16 for all the builds of these versions. I didn't do any LLVM 18 or GCC testing either.

As for the good news, I no longer get a build-breaking warning in NDB, even though its bug (#113662) is still open. Finally, as for the build-related non-news, nothing has changed with regard to system vs. bundled libraries: both versions continue to build with all the system libraries, except for zlib.

On to the testsuite. It seems that the 8.4.0 release build testsuite takes some 20% less time than the 8.0.37 one. That must be the payoff from the deprecated feature removal taking the corresponding tests with it, although I did not try to confirm that.

Fixed bugs and no longer reproducing test failures on open bugs:

New test failures:

I wanted to call the following the list of bugs with no changes, but I can't. While some of the bugs indeed have no changes, there is also a small but definite creep: a test that previously failed in only one configuration now fails in two, and a second sibling test started failing with the same symptoms. That's definitely not "no changes", but not new independent bugs either. It's worrying.

I did not re-test two bugs:

Thus, the numbers are: 4 new bugs reported, 5 fixed or no longer reproducing, 10 bugs with no changes or with a bit of creep, 2 non-tested.

But wait, there's more! I have also sent some patches to Oracle under the OCA! I am only counting the recent patches I developed at VilniusDB, not the ones I submitted at Percona. So, how many have they applied since 8.3.0?

So, zero. Of course, two out of the four patches are for clone with a 2nd transactional storage engine, so the excuse that Oracle MySQL has no such engine sounds plausible.

That's about it. I am still running Valgrind tests, and they should complete any week now, unless I have to kill them to take their machine for something else. I will not be updating this post or writing a new one unless their results are very, very unexpected. Otherwise here's to the new MySQL releases and hopefully I will continue the series in three months.

Tuesday, January 30, 2024

Introducing patch2testlist for MySQL development

I wrote a small shell utility, patch2testlist, that might be useful for fellow MySQL developers. It reads a diff and outputs the list of tests touched by the diff, in a format suitable for mysql-test-run.pl consumption. Furthermore, when provided with the path to the source tree of the diff, it also handles included files.

There are two ways to invoke it.

  1. Quick-and-dirty mode that does not handle included files:

    $ ./mtr `git diff | patch2testlist` ...
    
  2. Thorough mode that considers included files, if the source tree path is given:

    $ ./mtr `git diff | patch2testlist ../..` ...
    

What does it do? Let's consider an example:

$ git diff | diffstat
 mysql-test/extra/rpl_tests/rpl_replica_start_after_clone.inc                                    |    2 
 mysql-test/include/keyring_tests/binlog/rpl_encryption_master_key_rotation_at_startup.inc       |    5 -
 mysql-test/include/keyring_tests/mats/rpl_encryption.inc                                        |    2 
 mysql-test/include/keyring_tests/mats/rpl_encryption_master_key_generation_recovery.inc         |    2 
 mysql-test/suite/auth_sec/include/acl_tables_row_locking_test.inc                               |    4 
 mysql-test/suite/binlog/t/binlog_restart_server_with_exhausted_index_value.test                 |    1 
 mysql-test/suite/component_keyring_file/inc/rpl_setup_component.inc                             |    1 
 mysql-test/suite/innodb/t/log_8_0_11_case1.test                                                 |    1 
 mysql-test/suite/rocksdb/r/sys_tables.result                                                    |    2 
 mysql-test/suite/rocksdb/r/sys_tables_acl_tables_row_locking.result                             |  384 +++++++++++++++++++---------------------------------------------------------------
 mysql-test/suite/rocksdb/r/sys_tables_is_statistics_mysql.result                                |    4 
 mysql-test/suite/rocksdb/r/sys_tables_mysqlcheck.result                                         |    8 -
 mysql-test/suite/rpl/t/rpl_cloned_slave_relay_log_info.test                                     |    4 
 mysql-test/suite/rpl/t/rpl_encryption.test                                                      |    3 
 mysql-test/suite/rpl/t/rpl_encryption_master_key_generation_recovery.test                       |    3 
 mysql-test/suite/rpl/t/rpl_encryption_master_key_rotation_at_startup.test                       |    5 -
 mysql-test/suite/rpl/t/rpl_gtid_innodb_sys_header.test                                          |    2 
 mysql-test/suite/rpl_gtid/t/rpl_gtid_xa_commit_failure_before_gtid_externalization.test         |    1 
 mysql-test/suite/rpl_gtid/t/rpl_gtid_xa_commit_one_phase_failure_before_prepare_in_engines.test |    1 
 mysql-test/suite/rpl_gtid/t/rpl_gtid_xa_prepare_failure_before_prepare_in_engines.test          |    1 
 mysql-test/suite/rpl_gtid/t/rpl_gtid_xa_rollback_failure_before_gtid_externalization.test       |    1 
 mysql-test/suite/rpl_nogtid/t/rpl_assign_gtids_to_anonymous_transactions_clone.test             |    4 
 mysql-test/suite/rpl_nogtid/t/rpl_gtid_mode.test                                                |    5 -
 mysql-test/suite/rpl_nogtid/t/rpl_nogtid_encryption_read.test                                   |    3 
 mysql-test/suite/test_services/t/test_host_application_signal_plugin.test                       |    3 
 mysql-test/t/basedir.test                                                                       |    5 -
 mysql-test/t/mysqld_daemon.test                                                                 |    3 
 mysql-test/t/mysqld_safe.test                                                                   |   27 ++---
 mysql-test/t/restart_server.test                                                                |    3 
 mysql-test/t/restart_server_no_acl.test                                                         |    3 
...
$ git diff | patch2testlist
binlog.binlog_restart_server_with_exhausted_index_value innodb.log_8_0_11_case1 main.basedir
main.mysqld_daemon main.mysqld_safe main.restart_server main.restart_server_no_acl
rocksdb.sys_tables rocksdb.sys_tables_acl_tables_row_locking
rocksdb.sys_tables_is_statistics_mysql rocksdb.sys_tables_mysqlcheck
rpl.rpl_cloned_slave_relay_log_info rpl.rpl_encryption
rpl.rpl_encryption_master_key_generation_recovery
rpl.rpl_encryption_master_key_rotation_at_startup rpl.rpl_gtid_innodb_sys_header
rpl_gtid.rpl_gtid_xa_commit_failure_before_gtid_externalization
rpl_gtid.rpl_gtid_xa_commit_one_phase_failure_before_prepare_in_engines
rpl_gtid.rpl_gtid_xa_prepare_failure_before_prepare_in_engines
rpl_gtid.rpl_gtid_xa_rollback_failure_before_gtid_externalization
rpl_nogtid.rpl_assign_gtids_to_anonymous_transactions_clone rpl_nogtid.rpl_gtid_mode
rpl_nogtid.rpl_nogtid_encryption_read test_services.test_host_application_signal_plugin

The quick-and-dirty mode above does not require a hundred-line script; a ten-line one would do. But notice that several of the changed files in the diffstat output are test include files (e.g. rpl_replica_start_after_clone.inc). Ideally we'd want to run any tests that include such files, directly or indirectly, and a ten-line script does not handle this case.

That's what the other ninety lines of the script do. If the optional source tree path argument is given, the script greps under mysql-test/ for files that reference the changed include files, then for files referencing the newly-found ones, and so on, until no new files are found:

$ git diff | patch2testlist ../..
auth_sec.acl_tables_row_locking binlog.binlog_restart_server_with_exhausted_index_value
component_keyring_file.rpl_binlog_cache_encryption
component_keyring_file.rpl_binlog_cache_temp_file_encryption
component_keyring_file.rpl_default_table_encryption component_keyring_file.rpl_encryption
component_keyring_file.rpl_encryption_master_key_generation_recovery
component_keyring_file.rpl_encryption_master_key_rotation_at_startup innodb.log_8_0_11_case1
main.basedir main.mysqld_daemon main.mysqld_safe main.restart_server main.restart_server_no_acl
rocksdb.sys_tables rocksdb.sys_tables_acl_tables_row_locking rocksdb.sys_tables_is_statistics_mysql
rocksdb.sys_tables_mysqlcheck rpl.rpl_cloned_slave_relay_log_info rpl.rpl_encryption
rpl.rpl_encryption_master_key_generation_recovery rpl.rpl_encryption_master_key_rotation_at_startup
rpl.rpl_gtid_innodb_sys_header rpl.rpl_slave_start_after_clone
rpl_gtid.rpl_gtid_only_start_replica_after_clone
rpl_gtid.rpl_gtid_xa_commit_failure_before_gtid_externalization rpl_gtid.rpl_gtid_xa_commit_one_phase_failure_before_prepare_in_engines
rpl_gtid.rpl_gtid_xa_prepare_failure_before_prepare_in_engines
rpl_gtid.rpl_gtid_xa_rollback_failure_before_gtid_externalization
rpl_nogtid.rpl_assign_gtids_to_anonymous_transactions_clone rpl_nogtid.rpl_gtid_mode
rpl_nogtid.rpl_nogtid_encryption_read test_services.test_host_application_signal_plugin

As you can see, the list is now significantly longer, indicating more thorough test coverage of the diff. All this extra grepping takes about 90 seconds on my machine if some popular include files are touched. I have no idea whether that's with a hot or a cold FS cache. I also don't know whether replacing grep with rg would make it faster.

To minimize false positives in the include file search, grep only considers lines that don't start with the MTR language comment character # and that look like ...source...basename-of-included-file. This still allows false positives in indented comments and inside string literals (the latter should be rare), and it cannot tell apart files with the same name in different directories. In theory it also allows false negatives, if an include file is referenced through a string variable holding its name. Any suggestions for better regexps are welcome.
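
For illustration, the fixpoint search could be sketched in a few lines of shell. This is not the actual patch2testlist code: the starting file name is a made-up example, and the real script does more filtering (for one, there is no point in grepping for files that source a .test file):

# Sketch of the fixpoint idea only; run from a tree containing mysql-test/.
seen="rpl_replica_start_after_clone.inc"   # hypothetical starting point
new="$seen"
while [ -n "$new" ]; do
  next=""
  for f in $new; do
    # Match lines mentioning "source" and the basename with no "#"
    # anywhere before them, i.e. not commented out.
    for path in $(grep -rlE "^[^#]*source.*$f" mysql-test); do
      m=$(basename "$path")
      case " $seen " in
        *" $m "*) ;;                       # already known
        *) seen="$seen $m"; next="$next $m" ;;
      esac
    done
  done
  new="$next"
done
echo "$seen"                               # map .test entries to suite.test for mtr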

It goes without saying that the script is best applied to test-only patches. If you touch the source code, then you should be looking at whole MTR runs or, if possible, MTR runs of selected suites. But if you are indeed working on a test-only patch, this script effectively reduces the required test time.

It should be portable, but it is currently tested on macOS only. Feedback is welcome!