Wednesday, January 18, 2023

On Writing to the MySQL Error Log

If you develop a 3rd party MySQL plugin, it might take a while for you to figure out how to make your plugin write to the server error log in an official supported way. Let's take a look.

First things first: you don't want to patch the core server and ideally you'd develop and distribute the plugin only. I don't think there exists a single plugin that pulls this off today, but that's the ideal.

Then, you want to be a good citizen and not write to stderr directly. By doing that, you'd bypass the whole error logging machinery and the plugin messages could not be filtered, redirected to i.e. JSON or system error log sinks, and would not have proper severity and timestamps set. As the plugin developer you also want to be productive and avoid avoidable bugs at least, so even if you wanted to write to stderr directly, both printf and std::cerr are terrible APIs. For the former, fixing -Wformat-security warnings for all the platforms results in unreadable format strings "Foo: %" PRIu64 ", bar: %" PRIu64, the latter is just universally terrible with the sole good property of being type-safe. You'd want std::format (or {fmt} in C++17), so you could be ergonomic and type safe at the same time: "Foo: {}, bar: {}".

OK, so you have to use official MySQL APIs for plugins to write to the error logs. Which ones? Well, the internal documentation has a nice chapter "MySQL Services for Plugins" with an example:

my_plugin_log_message(plugin_ptr, MY_ERROR_LEVEL, "Cannot initialize plugin");

That's not too bad, you think, except that this is a printf-like API and not std::format. Then you grep the source tree for this function, and whoa:

/**
For use by legacy sql_print_*(), legacy my_plugin_log_message();
also available via the log_builtins service as message().
Wherever possible, use the fluent C++ wrapper LogErr()
(see log_builtins.h) instead.
*/

Oh, so my_plugin_log_message is "legacy" and you should be using fluent C++ LogErr from log_builtins.h. Let's go there:

#define LogErr(severity, ecode, ...) \
  LogEvent()                         \
      .prio(severity)                \
      .errcode(ecode)                \
      .subsys(LOG_SUBSYSTEM_TAG)     \
      .component(LOG_COMPONENT_TAG)  \
      .source_line(__LINE__)         \
      .source_file(MY_BASENAME)      \
      .function(__FUNCTION__)        \
      .lookup(ecode, ##__VA_ARGS__)

The fluency here applies to the macro definition itself, and not to the macro calls. There is no usage comment for it and there are a lot of things going on that need figuring out, maybe let's try grepping for examples? Grep shows that this macro is indeed heavily used, and also that it is not specific for plugins, because the core server uses it too. So you focus the grep to the plugin/ directory and pick a (not really) random sample:

    mysql_error_service_emit_printf(svc_error, ER_GRP_RPL_UDF_ERROR, MYF(0),
                                    action_name, error_message);
    if (log_error)
      LogErr(ERROR_LEVEL, ER_GRP_RPL_SERVER_UDF_ERROR, action_name,
             error_message);
    return false;
  }
}
// Log the error in case we can't do much
LogPluginErr(ERROR_LEVEL, ER_GRP_RPL_SERVER_UDF_ERROR, action_name,
             error_message);

Many, many things going on here. Why there are LogErr and LogPluginErr both at the same time? Is it the latter you should be using? What's that mysql_error_service_emit_printf thing? It looks error-log'ish too. Finally, why there is both ER_... something and error_message?

Let's start with LogErr vs LogPluginErr. Remember that my_plugin_log_message was documented in the internals manual? These two are not there. Googling gives skeleton Doxygen docs for log_builtins.h, so might as well go to the source code directly:

#define LogPluginErr(severity, ecode, ...)                           \
  LogEvent()                                                         \
      .prio(severity)                                                \
      .errcode(ecode)                                                \
      .subsys(LOG_SUBSYSTEM_TAG)                                     \
      .component("plugin:" LOG_COMPONENT_TAG)                        \
      .source_line(__LINE__)                                         \
      .source_file(MY_BASENAME)                                      \
      .function(__FUNCTION__)                                        \
      .lookup_quoted(ecode, "Plugin " LOG_COMPONENT_TAG " reported", \
                     ##__VA_ARGS__)

That's not too different from the LogErr definition above and seems interchangeable, with the difference being all those extra "plugin:" strings. If you have to choose between LogErr and LogErrPlugin, the latter looks like the right choice. Grepping shows heavy usage in plugin/ directory (and none in the core server), which seems correct. Let's figure out LOG_SUBSYSTEM_TAG and LOG_COMPONENT_TAG (and maybe MY_BASENAME) later.

Next question, should you be concerned about mysql_error_service_emit_printf? The Doxygen docs say "This function is substitute api service for my_error function." And my_error Doxygen docs say "Fill in and print a previously registered error message.". Print where? To an error log? You grep grep grep and half an hour later find yourself in my_message_sql and convince yourself that this error is, in fact, printed to the client and not to the error log. OK, so that was a digression.

At this point you are very certain you should be using LogPluginErr, but it was taking those ER_... error codes and extra parameters. How do you allocate an error code for your 3rd party, out-of-tree plugin? Maybe let's see how the in-tree plugins do that as for the starting point. You grep to find this in share/messages_to_error_log.txt:

ER_GRP_RPL_SERVER_UDF_ERROR
  eng "The function '%s' failed. %s"

You learn that these are printf-style format strings determining the tail of arguments for LogPluginErr. But remember, you don't want to patch the core server. So maybe the build process for this file could pick up some extra files too? You look at comp_err source and how it is invoked by CMake currently, and arrive at the conclusion, that no, you cannot do that.

And there is a fundamental reason: what error code numbers should be assigned to your plugin, so that there are no conflicts with the core server and all the other 3rd party plugins in the world? Ideally without a centralized number assignment authority? The closest I have to the answer is that the plugins could use the letter prefixes, i.e. the core server already prefixes all error codes with MY-. But the prefixes only reduce the conflict probability, not eliminate it.

So at this point you know you are supposed to use LogPluginErr but you can only use the existing error codes for it. How to find some suitable ones? Well a loophole happens to exist:

ER_LOG_PRINTF_MSG
  eng "%s"

An error code that says "this is a printf-formatted message", perfect.

You are almost there, what's remaining are the LOG_SUBSYSTEM_TAG, LOG_COMPONENT_TAG, and MY_BASENAME macros. By grepping for the first two you find something like this, in the group replication plugin:

ADD_DEFINITIONS(-DLOG_SUBSYSTEM_TAG="Repl")
ADD_DEFINITIONS(-DLOG_COMPONENT_TAG="group_replication")

OK, so LOG_COMPONENT_TAG should be defined to your plugin name, and LOG_SUBSYSTEM_TAG to something broader if applicable. And MY_BASENAME takes care of itself: log_builtins.h includes log_shared.h, which includes my_basename.h, which defines it.

Looks like you got all the pieces! Let's try it!

#define LOG_SUBSYSTEM_TAG "ThePlugin"  // This is proof-of-concept only:
#define LOG_COMPONENT_TAG "ThePlugin"  // should be done in CMake
// ...
#include "mysql/components/services/log_builtins.h"
// ...
LogPluginErr(ERROR_LEVEL, ER_LOG_PRINTF_MSG, "Hello, world!");

And the error log:

2023-01-18T06:38:01Z UTC - mysqld got signal 11 ;
...
0   mysqld                              0x0000000102610b8c my_print_stacktrace(unsigned char const*, unsigned long) + 72
1   mysqld                              0x0000000100f47250 print_fatal_signal(int) + 764
2   mysqld                              0x0000000100f4751c handle_fatal_signal + 120
3   libsystem_platform.dylib            0x00000001962fb2a4 _sigtramp + 56
4   ha_theplugin.so                     0x0000000124724c8c LogEvent::LogEvent() + 28
5   ha_theplugin.so                     0x0000000124724c8c LogEvent::LogEvent() + 28
....

Well, this is the error log all right, just not what you asked it to log.

 * thread #41, name = 'connection', stop reason = EXC_BAD_ACCESS (code=1, address=0x80)
    frame #0: 0x0000000151405980 ha_theplugin.so`LogEvent::LogEvent(this=0x00000001717e0308) at log_builtins.h:968:15
   965 	  */
   966 	  LogEvent() {
   967 	    have_msg = false;
-> 968 	    if ((ll = log_line_init()) != nullptr) {
   969 	      if ((msg = (char *)log_malloc(LOG_BUFF_MAX)) == nullptr) {
   970 	        log_line_exit(ll);
   971 	        ll = nullptr;

log_line_init is a nullptr. It looks like a function/method call but there is a macro to expand first:

#define log_line_init log_bi->line_init

And then

(lldb) p log_bi
(const mysql_service_log_builtins_t *) $0 = nullptr

More grepping, this time for log_bi. It is also helpful if you read the internals manual and recall something vague that everything is dynamic now. So let's say you find the following in the rewriter plugin:

static SERVICE_TYPE(registry) *reg_srv = nullptr;
SERVICE_TYPE(log_builtins) *log_bi = nullptr;
SERVICE_TYPE(log_builtins_string) *log_bs = nullptr;
// ...
static int rewriter_plugin_init(MYSQL_PLUGIN plugin_ref) {
  // ...
  // Initialize error logging service.
  if (init_logging_service_for_plugin(&reg_srv, &log_bi, &log_bs)) return 1;
  // ...
}
static int rewriter_plugin_deinit(void *) {
  // ...
  deinit_logging_service_for_plugin(&reg_srv, &log_bi, &log_bs);
  // ...
}

You add the corresponding code to your plugin, and finally:

2023-01-18T06:56:40.735486Z 9 [Note] [MY-011071] [Server] Plugin ThePlugin reported: 'Hello, world!'

Success!

Can you do better? What if you need to log something else than a single static string? Yes, you can, and I am advocating for introducing std::format or {fmt} into the core server here too:

LogPluginErr(ERROR_LEVEL, ER_LOG_PRINTF_MSG,
             fmt::format("An error has occurred while reading {}: errno = {}",
                         path, my_errno()).c_str());

So, here we are. Of course, if you already happen to know that you should be using LogPluginErr(ER_LOG_PRINTF_MSG) or even just my_plugin_log_message then the majority of this post might seem to be trivial. But I believe that documentation and lowering the entry barrier for new MySQL plugin writers is important too.