Edit: links to Percona SEQUENCE_TABLE implementation added to the end
MySQL has many native functions and it's relatively straightforward to write new ones. While the official internals manual has a chapter on adding native functions, some aspects aren't immediately obvious, such as the possible return types for functions. The linked docs suggest that functions can return a double, an integer, or a string, but doesn't mention that they can also return JSON documents (i.e. JSON_ARRAY) and relational tables (i.e. JSON_TABLE). I have filed a bug for this omission, meanwhile this post will be my attempt at documenting the implementation of table-returning functions.
We will implement a table-returning function called LAST_INSERT_IDS
, We will focus
on the server infrastructure to support that, not on the function's purpose or the
actual data it returns. Let's do the simplest possible thing: return a table with one
column and one row.
Exploring the Codebase
Grepping the source tree for JSON_TABLE
points to Table_function
class declared
in sql/table_function.h. Source code history then points to this commit, which in
turn mentions Worklog #8867 (Add JSON table functions). Luckily for us it's a
relatively old worklog, thus available publicly! From the worklog we can learn that
any result tables will start as in-memory temporary tables which will spill to disk
as needed. Good. The worklog also touches upon a lot of stuff elsewhere in the server
so that the query optimizer will know how to deal with table-returning functions. We
won't need to deal with that while implementing a new function, but it may provide
debugging starters should anything go wrong.
Function class
To start writing code, we have to derive a new concrete class from Table_function
and let the compiler tell us what methods must be implemented:
class Table_function_last_insert_ids final : public Table_function { public: bool init() override; bool fill_result_table() override; const char *func_name() const override; bool print(const THD *thd, String *str, enum_query_type query_type) const override; bool walk(Item_processor processor, enum_walk walk, uchar *arg) override; private: List<Create_field> *get_field_list() override; bool do_init_args() override; };
But let's not implement them just yet (stub them out with assert(0)
bodies to learn
what gets called when), and let's look into instantiating the class
object first. The regular functions get registered with MySQL in func_array
variable in sql/item_create.cc, and the surrounding code knows how to create their
objects. But JSON_TABLE
is not there! Grepping shows that it is wired directly to
the parser, probably because its arguments are non-trivial. Our function is much
simpler in that regard, can we get away with putting it there instead of patching
parser? It turns out that no, we cannot, because the simpler functions derive from
Item_func
class, which derives from Item
, which is what this infrastructure
expects. And our table-returning function derives from Table_function
, which has no
further ancestors.
Patching the parser
OK, off to the parser we go. To find a starting point, let's check what JSON_TABLE
does. Grepping for Table_function_json
in the parser (the worklog is not too
specific in this area), we get a match for PT_table_factor_function
class in
sql/parse_tree_nodes.cc. Then we grep for that class in the parser grammar and we get
a hint why the table-returning functions are harder to implement than other kinds of
functions:
table_function: JSON_TABLE_SYM '(' expr ',' text_literal columns_clause ')' opt_table_alias { // Alias isn't optional, follow derived's behavior if ($8 == NULL_CSTR) { my_message(ER_TF_MUST_HAVE_ALIAS, ER_THD(YYTHD, ER_TF_MUST_HAVE_ALIAS), MYF(0)); MYSQL_YYABORT; } $$= NEW_PTN PT_table_factor_function($3, $5, $6, to_lex_string($8)); } ;
We can see that the parser knows how to work with exactly one table-returning
function that is named JSON_TABLE
, thus we have to patch the parser. Let's
keep everything under the non-terminal symbol table_function
:
table_function: json_table_function | last_insert_ids_function ; last_insert_ids_function: LAST_INSERT_IDS_SYM '(' ')' opt_table_alias { // Alias isn't optional, follow derived's behavior if ($4 == NULL_CSTR) { my_message(ER_TF_MUST_HAVE_ALIAS, ER_THD(YYTHD, ER_TF_MUST_HAVE_ALIAS), MYF(0)); MYSQL_YYABORT; } $$= NEW_PTN PT_last_insert_ids_function(to_lex_string($4)); } ; json_table_function: // As before
Unfortunately the opt_table_alias
block must be duplicated. But, no increase in the
parser shift/reduce conflicts! There is more not particularly interesting stuff to do
in the lexer and the parser:
- Add the function name to the lexer,
{SYM("LAST_INSERT_IDS", LAST_INSERT_IDS_SYM)},
tosql/lex.h
. - Declare this lexer symbol as a token for the parser. I did not make it a keyword
because
JSON_TABLE
wasn't one neither:%token LAST_INSERT_IDS_SYM 10024
. - Finally, declare
last_insert_ids_function
symbol to be oftable_reference
type, same astable_function
was before.
Parse Tree Node class
That's enough for the lexer and the parser, next we need to implement
PT_last_insert_ids_function
class:
class PT_last_insert_ids_function : public PT_table_reference { public: PT_last_insert_ids_function(const LEX_STRING &table_alias) : m_table_alias{table_alias} {} private: const LEX_STRING m_table_alias; };
Without a PT_last_insert_ids_function::contextualize
method, which would actually
create the Table_function_last_insert_ids
object, the above implementation fails
rather non-obviously:
mysqltest: At line 24: Query 'SELECT * FROM LAST_INSERT_IDS() as ids' failed. ERROR 1096 (HY000): No tables used
Let's add that method by copying, pasting, and deleting the non-applicable bits of
PT_table_factor_function::contextualize
. The job of this method will be to create
the function object and to assign a name and a table for the query optimizer for the
result.
bool PT_last_insert_ids_function::contextualize(Parse_context *pc) { if (super::contextualize(pc)) return true; auto *const fn = new (pc->mem_root) Table_function_last_insert_ids{}; if (unlikely(fn == nullptr)) return true; LEX_CSTRING alias; alias.length = strlen(fn->func_name()); alias.str = sql_strmake(fn->func_name(), alias.length); if (unlikely(alias.str == nullptr)) return true; auto *const ti = new (pc->mem_root) Table_ident(alias, fn); if (ti == nullptr) return true; m_table_ref = pc->select->add_table_to_list(pc->thd, ti, m_table_alias.str, 0, TL_READ, MDL_SHARED_READ); if (m_table_ref == nullptr || pc->select->add_joined_table(m_table_ref)) return true; return false; }
Filling out the function class implementation
This implementation forces us to define the first method in
Table_function_last_insert_ids
, a very simple one:
const char *func_name() const override { return "last_insert_ids"; }
Let's run it!
SELECT * FROM LAST_INSERT_IDS() as ids; ... Assertion failed: (0), function init, file table_function.h, line 440.
…which points to the stub Table_function_last_insert_ids::init()
. Let's say we
have no meaningful initialization at this point, so remove the assert from the stub:
Assertion failed: (false), function get_field_list, file table_function.h, line 464.
This is get_field_list
method, and we have to implement it to describe the schema
of the table we will be returning:
List<Create_field> *Table_function_last_insert_ids::get_field_list() { assert(fields.is_empty()); auto *const field = new Create_field; field->init_for_tmp_table(MYSQL_TYPE_LONGLONG, MAX_BIGINT_WIDTH, DECIMAL_NOT_SPECIFIED, /* is_nullable */ false, /* is_unsigned */ true, 0, "insert_id"); fields.push_back(field); return &fields; }
This implementation assumes it is not a simple getter and thus is called exactly
once. If the assumption will shown to be incorrect by the assert
, I'll move the
code to init
and will reduce this one to a simple getter. Also, something will have
to free the memory we are allocating for the field, that's for later.
Run again…
Assertion failed: (false), function do_init_args, file table_function.h, line 467.
For JSON_TABLE
, this method is documented to "check whether given default values
can be saved to fields." Let's assume that we have no meaningful actions here, remove
the assert, return success, run again:
Assertion failed: (0), function fill_result_table, file table_function.h, line 442.
Finally we get to the second serious method that should provide the payload for the result table:
bool Table_function_last_insert_ids::fill_result_table() { assert(!fields.is_empty()); assert(!table->materialized); assert(table->s->fields == 1); empty_table(); auto *const field = get_field(0); assert(field->field_index() == 0); field->store(current_thd->first_successful_insert_id_in_prev_stmt); field->set_notnull(); write_row(); return false; }
And…
SELECT * FROM LAST_INSERT_IDS() as ids; insert_id 0
Success!
Implementing the remaining stubs
We still have two stub methods with asserts in them: print
and walk
.
Let's try figuring out under what conditions they get called, again, by looking at
JSON_TABLE
implementation.
Let's start with print
. It gets called by sql_lex.cc
Table_ref::print
. There
are references to optimizer trace and something else around that. Let's see if that
"something else" refers to the good old EXPLAIN
:
EXPLAIN SELECT * FROM LAST_INSERT_IDS() as ids; Assertion failed: (false), function print, file table_function.h, line 450. 7 mysqld 0x000000010385ffb8 Table_function_last_insert_ids::walk(bool (Item::*)(unsigned char*), enum_walk, unsigned char*) + 0
Interesting! We tried to find what calls print
, but we found what calls walk
instead. Since our function is atomic with regard to the query optimizer and has no
internal structure, there is nothing to walk, let's remove the assert
:
EXPLAIN SELECT * FROM LAST_INSERT_IDS() as ids; Assertion failed: (false), function print, file table_function.h, line 450.
Great, so the same EXPLAIN
statement exercises both walk
and print
. Let's add
a simple implementation for the latter:
bool print(const THD *, String *str, enum_query_type) const override { return !str->append(STRING_WITH_LEN("last_insert_ids()")); }
And now the EXPLAIN
!
EXPLAIN SELECT * FROM LAST_INSERT_IDS() as ids; id select_type table partitions type possible_keys key key_len ref rows filtered Extra 1 SIMPLE ids NULL ALL NULL NULL NULL NULL 2 100.00 Table function: last_insert_ids; Using temporary Warnings: Note 1003 /* select#1 */ select `ids`.`insert_id` AS `insert_id` from last_insert_ids() `ids`
Memory management
One last thing is that Field
object which was allocated in
Table_function_last_insert_ids::get_field_list
and never freed. Let's see if the
standard tooling catches this error. AddressSanitizer under Linux includes
LeakSanitizer, let's try it:
==31384==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1440 byte(s) in 6 object(s) allocated from: #0 0xab9c0d1a99ac in operator new(unsigned long) (/home/laurynas/vilniusdb/last_insert_ids/_build-debug-llvm-14-san/runtime_output_directory/mysqld+0xb0599ac) (BuildId: 611e155cd8dc43455bd5d5e0ddbdfe53ae205fa4) #1 0xab9c0e6bb8bc in Table_function_last_insert_ids::get_field_list() /home/laurynas/vilniusdb/last_insert_ids/sql/table_function.cc:778:23 #2 0xab9c0e6b5514 in Table_function::create_result_table(THD*, unsigned long long, char const*) /home/laurynas/vilniusdb/last_insert_ids/sql/table_function.cc:64:46 #3 0xab9c0e0024fc in Table_ref::setup_table_function(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_derived.cc:933:23 #4 0xab9c0e388bfc in Query_block::resolve_placeholder_tables(THD*, bool) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_resolver.cc:1310:15 #5 0xab9c0e3854a4 in Query_block::prepare(THD*, mem_root_deque<Item*>*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_resolver.cc:247:7 #6 0xab9c0e3e3da4 in Sql_cmd_select::prepare_inner(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_select.cc:484:17 #7 0xab9c0e3e2d74 in Sql_cmd_dml::prepare(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_select.cc:399:11 #8 0xab9c0e3e4360 in Sql_cmd_dml::execute(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_select.cc:539:9 #9 0xab9c0e263934 in mysql_execute_command(THD*, bool, unsigned long long*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_parse.cc:5489:29 #10 0xab9c0e25c8c8 in dispatch_sql_command(THD*, Parser_state*, unsigned long long*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_parse.cc:6276:21 #11 0xab9c0e253894 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_parse.cc:2565:7 #12 0xab9c0e259124 in do_command(THD*) /home/laurynas/vilniusdb/last_insert_ids/sql/sql_parse.cc:1745:18 #13 0xab9c0e7f6f10 in handle_connection(void*) /home/laurynas/vilniusdb/last_insert_ids/sql/conn_handler/connection_handler_per_thread.cc:307:13 #14 0xab9c12076ee8 in pfs_spawn_thread(void*) /home/laurynas/vilniusdb/last_insert_ids/storage/perfschema/pfs.cc:3022:3 #15 0xfbb124cc5978 in start_thread nptl/./nptl/pthread_create.c:447:8 #16 0xfbb124d2ba48 in thread_start misc/../sysdeps/unix/sysv/linux/aarch64/clone3.S:76
Yep, a leak. Let's try freeing that memory in the destructor:
Table_function_last_insert_ids::~Table_function_last_insert_ids() { assert(fields.elements == 1); delete fields.head(); }
Run again, no leak or any other Sanitizer error.
Summary checklist
So, to summarize, the checklist for adding a new table-returning function to MySQL:
- Add the function name as a lexer symbol in
sql/lex.h
- Add the function name symbol as a token in the parser (
sql/sql_yacc.yy
) - In the parser, rename
table_function
tojson_table_function
and add newtable_function
withjson_table_function
and new function alternatives. All these symbols must be typed astable_reference
- Create a new parse tree node class, for example:
class PT_last_insert_ids_function : public PT_table_reference { using super = PT_table_reference; public: PT_last_insert_ids_function(const LEX_STRING &table_alias) : m_table_alias{table_alias} {} bool contextualize(Parse_context *pc) override; private: const LEX_STRING m_table_alias; }; bool PT_last_insert_ids_function::contextualize(Parse_context *pc) { if (super::contextualize(pc)) return true; auto *const fn = new (pc->mem_root) Table_function_last_insert_ids{}; if (unlikely(fn == nullptr)) return true; LEX_CSTRING alias; alias.length = strlen(fn->func_name()); alias.str = sql_strmake(fn->func_name(), alias.length); if (unlikely(alias.str == nullptr)) return true; auto *const ti = new (pc->mem_root) Table_ident(alias, fn); if (ti == nullptr) return true; m_table_ref = pc->select->add_table_to_list(pc->thd, ti, m_table_alias.str, 0, TL_READ, MDL_SHARED_READ); if (m_table_ref == nullptr || pc->select->add_joined_table(m_table_ref)) return true; return false; }
- Create a new table function class:
class Table_function_last_insert_ids final : public Table_function { public: ~Table_function_last_insert_ids() override; bool init() override { return false; } bool fill_result_table() override; const char *func_name() const override { return "last_insert_ids"; } bool print(const THD *, String *str, enum_query_type) const override { return !str->append(STRING_WITH_LEN("last_insert_ids()")); } bool walk(Item_processor, enum_walk, uchar *) override { return false; } private: List<Create_field> *get_field_list() override; bool do_init_args() override { return false; } List<Create_field> fields; }; Table_function_last_insert_ids::~Table_function_last_insert_ids() { assert(fields.elements == 1); delete fields.head(); } List<Create_field> *Table_function_last_insert_ids::get_field_list() { assert(fields.is_empty()); auto *const field = new Create_field; field->init_for_tmp_table(MYSQL_TYPE_LONGLONG, MAX_BIGINT_WIDTH, DECIMAL_NOT_SPECIFIED, /* is_nullable */ false, /* is_unsigned */ true, 0, "insert_id"); fields.push_back(field); return &fields; } bool Table_function_last_insert_ids::fill_result_table() { assert(!fields.is_empty()); assert(!table->materialized); assert(table->s->fields == 1); empty_table(); auto *const field = get_field(0); assert(field->field_index() == 0); field->store(current_thd->first_successful_insert_id_in_prev_stmt); field->set_notnull(); write_row(); return false; }