Wednesday, December 21, 2022

The Zoo of MySQL Storage Engine Flags

MySQL has had pluggable storage engine support for decades now. Their capabilities vary wildly. Some of those storage engines are not even storage engine-like, for example BLACKHOLE does not store data at all. For OLTP, this herd thins very quickly if we only consider ACID-supporting ones: InnoDB, MyRocks, NDB, (up until recently) TokuDB, and (up until ten years ago) PBXT. I am not too familiar with OLAP, but I know there's Warp.

The core server layer has accumulated dozens of engine feature flags over those decades to accommodate those varying capabilities, let's take a look at what they can be. To make this post look less like a source code annotation dump, I'll try to sort them by feature area. If I classify something as an "implementation detail", it still could be more than that, there could be e.g. performance implications or something user-visible like forced closing of a table, but I'll be not figuring that out at this time.

Handlerton Flags

To recap, a "handlerton" is a "handler singleton", and a "handler" applies to a single opened table. So a handlerton is a singleton for all the tables in a particular storage engine, the capabilities for it as the whole, although later we'll see that some of the table flags are engine-level too. Perhaps that happened because handlerton concept was introduced later, in 5.0 series AFAIK, and it was all handler earlier.

The list in the source code starts with #define and ends with constexpr, showing its evolution as the coding standards evolve. Although it all should be somehow made into an enum class bitset, I think, with a healthy part of handler flags moved over here too.

Features

  • HTON_ALTER_NOT_SUPPORTED: cannot do ALTER TABLE, i.e. performance_schema.
  • HTON_TEMPORARY_NOT_SUPPORTED: cannot do CREATE TEMPORARY TABLE, i.e. performance_schema again.
  • HTON_SUPPORT_LOG_TABLES: can host mysql.general_log and slow_log tables. CSV and MyISAM!
  • HTON_NO_PARTITION: tables cannot be partitioned. performance_schema, Warp.
  • HTON_SUPPORTS_EXTENDED_KEYS: all the secondary keys include the primary key columns in the end. This is a feature for the query optimizer. InnoDB, MyRocks, TokuDB.
  • HTON_SUPPORTS_FOREIGN_KEYS: self-descriptive. InnoDB, NDB.
  • HTON_SUPPORTS_ATOMIC_DDL: DDL statements are atomic, nice. InnoDB, NDB.
  • HTON_SUPPORTS_PACKED_KEYS: MyISAM something something.
  • HTON_SUPPORTS_SECONDARY_ENGINE, HTON_SECONDARY_ENGINES_DDL: too early to tell, the secondary engine feature is under development, and even then it might not be public, i.e. something for HeatWave.
  • HTON_SUPPORTS_TABLE_ENCRYPTION: self-descriptive. InnoDB.
  • HTON_SUPPORTS_ENGINE_ATTRIBUTE: whether CREATE and ALTER can take ENGINE_ATTRIBUTE for engine-specific attributes provided by user.
  • HTON_SUPPORTS_GENERATED_INVISIBLE_PK: will create a hidden primary key if you don't provide one. InnoDB.

Hidden Storage Engines

  • HTON_HIDDEN: do not advertise as a storage engine at all, do not allow CREATE TABLE in it etc. Binary log is such storage engine!
  • HTON_NOT_USER_SELECTABLE: do not allow user access any tables in this engine. Binlog again. Not sure why it cannot be merged with HTON_HIDDEN.

Implementation Details

  • HTON_CLOSE_CURSORS_AT_COMMIT: last in-tree engine that used it was BDB in 5.0, which I am not familiar with. MariaDB docs (here goes Planet MySQL) say this was BerkeleyDB. But also TokuDB.
  • HTON_CAN_RECREATE: map TRUNCATE TABLE to DROP & CREATE internally. MyRocks, Warp.
  • HTON_FLUSH_AFTER_RENAME: introduced somewhere between 5.0 and 5.1, removed in 8.0. An implementation detail for the BDB storage engine.
  • HTON_NO_BINLOG_ROW_OPT: if using row-based binlog format, do not do anything about before/after row images for this engine, meaning all the columns will be always included. NDB.

Handlerton Flags for Foreign Keys

Foreign key capabilities are finer-grained. Archeologically we are above the #define layer, but below the constexpr layer. These are static const, so they are emitted in every single compilation unit including handler.h, unless the compilers manage to clean it up.

  • HTON_FKS_WITH_PREFIX_PARENT_KEYS: whether the foreign key can have arbitrary columns with resulting non-uniqueness etc. as long as the unique parent key columns are in its prefix. InnoDB.
  • HTON_FKS_WITH_SUPPORTING_HASH_KEYS: the foreign key can be a hash key. NDB.
  • HTON_FKS_NEED_DIFFERENT_PARENT_AND_SUPPORTING_KEY: can't have a self-referring foreign key. InnoDB.
  • HTON_FKS_WITH_EXTENDED_PARENT_KEYS: if the keys have primary key columns in its suffix, use that. InnoDB.

Optional Handlerton Function Pointers

We are done with handlerton flags, but we are not done with handlerton. Turns out, some of its function pointers are optional, and capabilities are inferred from them being non-null. A lot of them are transaction-related. These capabilities are:

Savepoints

  • savepoint_set: transaction savepoints. InnoDB, MyRocks, TokuDB.
  • savepoint_rollback_can_release_mdl: ability to rollback MDL locks taken after a savepoint on rollback there. InnoDB, MyRocks

XA Transactions

  • commit_by_xid, rollback_by_xid, set_prepared_in_tc, set_prepared_in_tc_by_xid: XA. InnoDB, MyRocks, TokuDB.
  • recover: can recover XA transactions on startup. InnoDB, MyRocks.
  • recover_prepared_in_tc: can recover prepared XA transactions on startup. InnoDB.

Other Features

  • prepare, set_prepared_in_tc: 2PC support. InnoDB, MyRocks.
  • start_consistent_snapshot: supports START TRANSACTION WITH CONSISTENT SNAPSHOT. InnoDB, MyRocks.
  • flush_logs: self-descriptive. InnoDB, MyRocks, TokuDB.
  • show_status: Likewise.
  • alter_tablespace, get_tablespace_type_by_name: supports tablespaces. InnoDB.
  • is_supported_system_table: can host at least some of the mysql schema tables. InnoDB.
  • get_table_statistics, get_index_column_cardinality, get_tablespace_statistics, prepare_secondary_engine, optimize_secondary_engine, compare_secondary_engine_cost: self-descriptive. InnoDB, MyRocks.
  • clone_interface: can clone. InnoDB. Watch this space for more exciting announcements!
  • lock_hton_log, unlock_hton_log, collect_hton_log_info: can participate in performance_schema.LOG_STATUS table. InnoDB, MyRocks

Implementation Details

  • replace_native_transaction_in_thd, ddse_dict_init, dict_...: native data dictionary support. InnoDB.
  • page_track: can be used with the page tracking service.
  • drop_database: wants to participate in DROP DATABASE implementation. NDB.
  • panic: can be shutdown, even if it's statically compiled in.
  • fill_is_schema: wants to fill in some of INFORMATION_SCHEMA tables by itself.
  • get_tablespace: unused.
  • post_ddl, post_recover, check_fk_column_compat, se_before_commit, se_after_commit, se_before_rollback: just let's say it's all implementation details.

Handler Table Flags

These flags are per-table, but before handlerton was a thing, a lot of per-engine flags were added too.

Features

  • HA_NO_TRANSACTIONS: this is not a real OLTP database storage engine. MyISAM.
  • HA_CAN_SQL_HANDLER: supports HANDLER interface. InnoDB.
  • HA_NO_AUTO_INCREMENT: does not support auto increment fields. performance_schema, Warp.
  • HA_HAS_CHECKSUM: CREATE TABLE supports CHECKSUM option. MyISAM.
  • HA_CAN_REPAIR: can REPAIR TABLE. MyISAM, Warp.
  • HA_CAN_EXPORT: supports FLUSH TABLE FOR EXPORT. InnoDB.
  • HA_SUPPORTS_DEFAULT_EXPRESSION: DEFAULT column clauses may be expressions. InnoDB, MyRocks, MyISAM.
  • HA_UPDATE_NOT_SUPPORTED: self-descriptive. ARCHIVE.
  • HA_DELETE_NOT_SUPPORTED: likewise.

Supported Column and Index Types

  • HA_NO_PREFIX_CHAR_KEYS: does not support indexing CHAR column prefixes. NDB.
  • HA_NO_VARCHAR: does not support VARCHAR type. No engine sets this.
  • HA_CAN_FULLTEXT: supports fulltext indexes. InnoDB.
  • HA_NO_BLOBS: does not support BLOBs.
  • HA_CAN_INDEX_BLOBS: can index BLOBs. InnoDB, MyRocks, TokuDB.
  • HA_BLOB_PARTIAL_UPDATE: BLOBs can be updated partially. InnoDB.
  • HA_CAN_GEOMETRY: can handle spatial data. InnoDB.
  • HA_CAN_RTREEKEYS: supports R-Tree indexes.
  • HA_SUPPORTS_GEOGRAPHIC_GEOMETRY_COLUMN: geometry supports can do not only coordinates on a flat surface, but geographic coordinates too. InnoDB.
  • HA_CAN_BIT_FIELD: supports BIT type. MyISAM.
  • HA_GENERATED_COLUMNS: supports generated columns. InnoDB, MyRocks, MyISAM.
  • HA_CAN_INDEX_VIRTUAL_GENERATED_COLUMN: can index them. InnoDB, MyRocks.
  • HA_ANY_INDEX_MAY_BE_UNIQUE: self-descriptive. MRG_MyISAM.
  • HA_DESCENDING_INDEX: supports descending indexes. InnoDB.
  • HA_MULTI_VALUED_KEY_SUPPORT: can index JSON arrays. InnoDB.

Query Optimizer Features

  • HA_PARTIAL_COLUMN_READ: can return a subset of row columns. InnoDB, MyRocks.
  • HA_TABLE_SCAN_ON_INDEX: a clustered index always exists and should be used for table scans. InnoDB, TokuDB.
  • HA_FAST_KEY_READ: random key order is as fast as sequential. Set for in-memory engines, but probably should be set for everything on SSD?
  • HA_NULL_IN_KEY: can index NULL. InnoDB, MyRocks, TokuDB.
  • HA_STATS_RECORDS_IS_EXACT: table record count in statistics is exact. InnoDB.
  • HA_PRIMARY_KEY_IN_READ_INDEX: when doing an index-only read on a secondary key, primary key columns will be returned too. InnoDB, MyRocks, TokuDB.
  • HA_PRIMARY_KEY_REQUIRED_FOR_POSITION: to position() a handler, primary key is needed. Otherwise, position works as a row counter. InnoDB, MyRocks, TokuDB.
  • HA_COUNT_ROWS_INSTANT: not sure what the difference from HA_STATS_RECORDS_IS_EXACT is. The name suggests that counting records is instant, and the comment suggests that it's exact. MyISAM.
  • HA_READ_BEFORE_WRITE_REMOVAL: supports blind writes, i.e. to write a row you don't have to read it first. NDB. But, this has no relation to a similar optimization in MyRocks and TokuDB.
  • HA_BLOCK_CONST_TABLE: disables const-table optimization. NDB.
  • HA_CAN_FULLTEXT_HINTS: can provide extra FULLTEXT-related info in EXPLAIN SELECT? InnoDB.

Implementation Details

  • HA_UNUSED3: used to be HA_REC_NOT_IN_SEQ before 8.0. Implementation detail.
  • HA_REQUIRES_KEY_COLUMNS_FOR_DELETE: need to read all the key columns before deleting a row by that key. Unused in MySQL, used by MariaDB MyISAM and InnoDB.
  • HA_DUPLICATE_POS: on duplicate key error, provide the position of the existing key.
  • HA_AUTO_PART_KEY: multi-part keys can include auto increment columns. TokuDB.
  • HA_REQUIRE_PRIMARY_KEY: a user-provided primary key must exist. No SE uses this.
  • HA_UNUSED14: until 5.7 it was HA_CAN_INSERT_DELAYED. MyISAM.
  • HA_PRIMARY_KEY_REQUIRED_FOR_DELETE: DELETE and UPDATE work by primary key. performance_schema.
  • HA_FILE_BASED: each table is stored in a separate file. MyISAM, TokuDB.
  • HA_NO_COPY_ON_ALTER: the name suggests that "ALTER TABLE does not copy data", but this is different from online ALTER. MRG_MyISAM.
  • HA_HAS_OWN_BINLOGGING: NDB.
  • HA_BINLOG_ROW_CAPABLE, HA_BINLOG_STMT_CAPABLE: what binary log formats are supported.
  • HA_DUPLICATE_KEY_NOT_IN_ORDER: multiple key conflicts in a multiple value-REPLACE statements are reported not in the ascending key name order. Unused in MySQL, used by CONNECT storage engine in MariaDB.
  • HA_CAN_FULLTEXT_EXT: supports "extended fulltext API", i.e. it is InnoDB and not MyISAM.
  • HA_READ_OUT_OF_SYNC: the storage engine is BLACKHOLE. I cannot make more sense out of the description: "what you read will not be what is expected to be in the table".
  • HA_NO_READ_LOCAL_LOCK: does not support LOCK TABLE READ LOCAL, and does not want them to be upgraded to LOCK TABLE READ by the server. InnoDB.
  • HA_ATTACHABLE_TRX_COMPATIBLE: the storage engine is "compatible" with attachable transactions. It is a superset of "supporting" them. InnoDB and MyISAM.
  • HA_NO_INDEX_ACCESS: the indexes are not really indexes, because they don't support access through them. MOCK (that's a storage engine name, not a suggestion).

Index Flags

These are capabilities for individual indexes.

  • HA_READ_NEXT: the index can go from one record to the next. Not used, always assumed.
  • HA_READ_PREV: the index can go to the previous record.
  • HA_READ_ORDER: the index is ordered. Usually might mean not a hash index.
  • HA_READ_RANGE: the index supports ranges, again not a hash index, usually.
  • HA_ONLY_WHOLE_INDEX: the index does not support key prefix search. Hash indexes again.
  • HA_TABLE_SCAN_ON_NULL: the index does not store NULLs even if the underlying column does, forcing a switch to table scan. NDB.
  • HA_KEYREAD_ONLY: index-only scans are supported. InnoDB, MyRocks, TokuDB.
  • HA_KEY_SCAN_NOT_ROR: Not sure. Will update if I figure it out. InnoDB.
  • HA_DO_INDEX_COND_PUSHDOWN: supports Index Condition Pushdown. InnoDB, MyRocks, TokuDB.

Handler Partitioning Flags

Storage engines can implement partitioning with different levels of capabilities.

  • HA_CAN_UPDATE_PARTITION_KEY: can do UPDATE involving the partitioning key. NDB.
  • HA_CAN_PARTITION_UNIQUE: can handle unique indexes across partitions correctly. NDB.
  • HA_USE_AUTO_PARTITION: not sure what "automatic partitioning" is, but this is something NDB supports.
  • HA_CAN_EXCHANGE_PARTITION: can ALTER TABLE EXCHANGE PARTITION with a non-partitioned table. InnoDB.
  • HA_CANNOT_PARTITION_FK: a foreign key cannot be the partition key. MyRocks, TokuDB.
  • HA_TRUNCATE_PARTITION_PRECLOSE: TRUNCATE PARTITION requires closing the involved tables. InnoDB.

Writing this post resulted in three bug reports asking to remove get_tablespace, HA_REQUIRES_KEY_COLUMNS_FOR_DELETE, and HA_DUPLICATE_KEY_NOT_IN_ORDER.

1 comment:

Mark Callaghan said...

"this is not a real OLTP database storage engine. MyISAM."

If only they had gone the Mongo/mmap route -- crash safe, single writer, multiple reader. That would have given MyISAM a few more years of happy users.