Laurynas Biveinis: October 2021

UnoDB has two Adaptive Radix Tree implementations: a regular one, which is not safe for concurrent use, and an optimistic lock coupling one (OLC ART), which enables concurrency and, hopefully, read scalability by using per-node seqlocks. When I set out to implement them both, one interesting question for me was how much overhead OLC adds in single-threaded workloads. In other words, can OLC ART be used instead of ART even if concurrency is not needed?

To try to make the answer more meaningful, I did a round of optimizations for OLC ART, the biggest one being a rewrite of the try_read_lock implementation to do a single comparison in the fast path instead of the two that the pseudocode in the OLC ART paper does. I have also logged my first missed-optimization compiler bug. The changes did not result in dramatic improvements (the try_read_lock one reduced branch mispredictions by 25%), but every little bit helps.
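To illustrate the one-comparison idea, here is a minimal sketch. It is not UnoDB's actual code: the class name, the method names other than try_read_lock, and the bit layout (bit 0 for "obsolete", bit 1 for "write-locked", the rest a version counter) are assumptions made for the example, and unlike the paper's pseudocode this read lock returns a failure instead of spinning while the node is locked.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical per-node version lock. Assumed layout of the version word:
// bit 0 = obsolete, bit 1 = write-locked, bits 2 and up = version counter.
class optimistic_lock {
 public:
  using version_type = std::uint64_t;

  // Paper-style fast path: "locked" and "obsolete" are tested separately.
  std::optional<version_type> try_read_lock_two_checks() const noexcept {
    const auto v = word_.load(std::memory_order_acquire);
    if ((v & locked_bit) != 0) return std::nullopt;    // first comparison
    if ((v & obsolete_bit) != 0) return std::nullopt;  // second comparison
    return v;
  }

  // One comparison: both flags live in the low two bits, so a single masked
  // test covers "locked or obsolete".
  std::optional<version_type> try_read_lock() const noexcept {
    const auto v = word_.load(std::memory_order_acquire);
    if ((v & (locked_bit | obsolete_bit)) != 0) return std::nullopt;
    return v;
  }

  // Returns true if the word is unchanged since try_read_lock returned v.
  bool check(version_type v) const noexcept {
    std::atomic_thread_fence(std::memory_order_acquire);
    return word_.load(std::memory_order_relaxed) == v;
  }

  // Upgrade a read lock taken at version v to a write lock.
  bool try_write_lock(version_type v) noexcept {
    return word_.compare_exchange_strong(v, v + locked_bit,
                                         std::memory_order_acquire);
  }

  // Adding locked_bit again clears the lock bit and carries into the counter.
  void write_unlock() noexcept {
    assert((word_.load(std::memory_order_relaxed) & locked_bit) != 0);
    word_.fetch_add(locked_bit, std::memory_order_release);
  }

  // Same as write_unlock, but also sets the obsolete bit.
  void write_unlock_and_obsolete() noexcept {
    assert((word_.load(std::memory_order_relaxed) & locked_bit) != 0);
    word_.fetch_add(locked_bit | obsolete_bit, std::memory_order_release);
  }

 private:
  static constexpr version_type obsolete_bit = 1U;
  static constexpr version_type locked_bit = 2U;
  std::atomic<version_type> word_{0};
};
```

Both read-lock variants accept and reject exactly the same version words; the second simply gives the branch predictor one branch to handle instead of two.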

So the current answer is that, in my implementation, OLC makes things 2x-3x slower compared to the regular ART. This is a rough average over various benchmarks, whose overhead ranges from ~10% on random gets to 200%-400% (even 600% once) for everything else. Drilling down into these numbers a bit, the relatively good numbers for random gets are somewhat easy to explain: the execution is memory-bound, and the get algorithm has the fewest changes in OLC compared to the baseline. The update algorithms, especially the delete one, are relatively heavy in locking operations. A delete might lock three nodes at once, with restart logic if any one of the three write locks fails. This, however, does not explain why scans, which share the same algorithm as random gets, have as much overhead as updates.
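The "lock three nodes or restart" pattern mentioned for deletes can be sketched roughly as follows. This is an illustration only: version_lock and try_lock_three are hypothetical names, the word layout (bit 1 = write-locked) is an assumption, and in a real tree a failure would mean restarting the traversal rather than just returning false.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical minimal version lock: bit 1 of the word means "write-locked".
struct version_lock {
  std::atomic<std::uint64_t> word{0};

  std::uint64_t read_version() const noexcept {
    return word.load(std::memory_order_acquire);
  }

  // Take the write lock only if the word still equals the version seen earlier.
  bool try_upgrade(std::uint64_t seen) noexcept {
    if ((seen & 2U) != 0) return false;  // was already locked when read
    return word.compare_exchange_strong(seen, seen + 2U,
                                        std::memory_order_acquire);
  }

  // Clears the lock bit and bumps the version counter.
  void unlock() noexcept { word.fetch_add(2U, std::memory_order_release); }
};

// Try to write-lock all three nodes at the versions observed during descent.
// If any upgrade fails, release the locks already taken and return false so
// that the caller can restart the whole operation.
inline bool try_lock_three(const std::array<version_lock*, 3>& nodes,
                           const std::array<std::uint64_t, 3>& seen) noexcept {
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (!nodes[i]->try_upgrade(seen[i])) {
      for (std::size_t j = 0; j < i; ++j) nodes[j]->unlock();
      return false;
    }
  }
  return true;
}
```

Even with a single thread, where no upgrade ever fails, each delete through this path still pays for three compare-and-swap operations and three unlocking read-modify-writes that the regular tree does not need.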

Another elephant in the room in this comparison is direct heap allocation/deallocation for the regular case vs. Quiescent State-Based Reclamation (QSBR) for OLC. I see a lot of QSBR in the single-thread profiles, and I cannot really optimize it at this point: the implementation needs a full rewrite because it does not scale due to a Big QSBR Lock.
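To make the contrast with a plain delete concrete, here is a minimal sketch of what a QSBR with one big lock can look like. It is a toy under stated assumptions, not UnoDB's implementation: big_lock_qsbr and its methods are hypothetical names, thread unregistration is left out, and the two-interval scheme (memory retired in one interval is freed at the end of the next) is just one common way to do it.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy QSBR: every operation takes the same mutex (the "big lock").
class big_lock_qsbr {
 public:
  using deleter = std::function<void()>;

  void register_thread() {
    const std::lock_guard<std::mutex> guard{mutex_};
    ++thread_count_;
  }

  // Instead of freeing right away, queue the deallocation.
  void defer(deleter d) {
    const std::lock_guard<std::mutex> guard{mutex_};
    current_.push_back(std::move(d));
  }

  // Called by a thread when it holds no pointers into the shared structure.
  void quiescent() {
    std::vector<deleter> to_run;
    {
      const std::lock_guard<std::mutex> guard{mutex_};
      seen_.insert(std::this_thread::get_id());
      if (seen_.size() < thread_count_) return;  // others not quiescent yet
      // Every registered thread has been quiescent: the interval ends.
      seen_.clear();
      to_run = std::move(previous_);    // retired one full interval ago
      previous_ = std::move(current_);  // must survive one more interval
      current_.clear();
    }
    for (auto& d : to_run) d();  // run deallocations outside the lock
  }

 private:
  std::mutex mutex_;
  std::size_t thread_count_{0};
  std::unordered_set<std::thread::id> seen_;
  std::vector<deleter> previous_;
  std::vector<deleter> current_;
};
```

In a single-threaded run every call to quiescent() ends an interval, yet each deallocation still goes through a mutex, a std::function, and two vector moves before the memory is actually released, which is the kind of overhead a direct delete avoids.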

And that is the next fun thing to do. I did not find any lock-free QSBR implementation on the internet, so I guess I'll see about writing one.

I love C++ static analysis. I also like linters, and so will happily try out and integrate any reasonable tool out there into CI/CD for UnoDB. So, in the last three years I got to play with quite a few of them.

Let's start with the simplest of them all, the linters. While the simplest, they still vary in complexity, from regex check collections (cpplint) to LLVM-infrastructure-based ones (clang-tidy).

cpplint. Good for having unified header guard style, "// namespace" comments at their ends, wrapping lines at 80 chars. It also tries to enforce Google C++ guidelines of 2012 (?) vintage, with less than impressive results: '<mutex> is an unapproved C++11 header'. OTOH, it integrates with Emacs through flycheck-google-cpplint, making it easy to keep the code compliant while it's being written.
include-what-you-use. Cleans up #include directives. A great tool, a must-have for C++ until modules take over, but some of its diagnostics suggest including an internal header for a C++ standard symbol. As a result, it's somewhat labor-intensive to go through the results and apply fixes.
clang-tidy. A source-code linter with local code analysis. Enforced the Rule of 0-3-5 for me many times, helped with Almost Always Auto declarations (sorry Justinai), replaced v.size() == 0 with v.empty() etc. Overall, it does not catch bugs, but helps to write more idiomatic C++. Integrates with Emacs through clangd for instant feedback.
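The clang-tidy rewrites mentioned above look roughly like this. The functions and the class are made up for illustration; the corresponding checks are probably readability-container-size-empty, modernize-use-auto, and cppcoreguidelines-special-member-functions, but which checks were actually enabled for UnoDB is an assumption.

```cpp
#include <type_traits>
#include <vector>

// Before: comparing size() with zero.
inline bool is_empty_before(const std::vector<int>& v) { return v.size() == 0; }

// After: empty() states the intent directly.
inline bool is_empty_after(const std::vector<int>& v) { return v.empty(); }

// Almost Always Auto: no need to spell out std::vector<int>::const_iterator.
inline int first_or(const std::vector<int>& v, int fallback) {
  const auto it = v.begin();
  return it != v.end() ? *it : fallback;
}

// Rule of five: a class with a user-provided destructor declares the other
// four special member functions too (here by deleting them).
class scoped_increment {
 public:
  explicit scoped_increment(int& counter) noexcept : counter_{counter} {
    ++counter_;
  }
  ~scoped_increment() { --counter_; }
  scoped_increment(const scoped_increment&) = delete;
  scoped_increment(scoped_increment&&) = delete;
  scoped_increment& operator=(const scoped_increment&) = delete;
  scoped_increment& operator=(scoped_increment&&) = delete;

 private:
  int& counter_;
};

static_assert(!std::is_copy_constructible_v<scoped_increment>);
static_assert(!std::is_move_assignable_v<scoped_increment>);
```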

Next up, compiler warnings, GCC & clang. I went through the docs of both and enabled every reasonable non-default warning, but did not use clang's -Weverything, which is apparently not very reasonable. The warnings made me do several non-default-for-me things:

All the small constants have U suffixes if used with unsigned variables. You don't do unsigned y = 1U, x = y << 3; you do y = 1U, x = y << 3U;
GCC function attributes cold, pure, etc. were suggested and applied. But also there are in-line warning suppressions for false positives.
GCC made me write a class template deduction guide once. I have looked into them before and am a happy user of them when somebody else writes them, but as for actually writing one, I had thought hell would freeze over first. Yet, here we are.
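A small sketch of what these three changes look like in code. All names here are invented for the example, the attributes are written in the [[gnu::...]] spelling, and the aggregate with a deduction guide is only one plausible situation in which a compiler asks for a guide; the actual UnoDB case may have been different.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Unsigned literal for the shift count of an unsigned operand.
constexpr unsigned shifted(unsigned y) noexcept { return y << 3U; }

// pure: the result depends only on the arguments and on memory that is read,
// and the function has no side effects.
[[gnu::pure]] inline unsigned sum_words(const unsigned* words,
                                        std::size_t n) noexcept {
  unsigned sum = 0U;
  for (std::size_t i = 0; i < n; ++i) sum += words[i];
  return sum;
}

// cold: a rarely executed path, kept out of the way of the hot code.
[[gnu::cold]] inline std::string_view describe_error(
    bool out_of_memory) noexcept {
  return out_of_memory ? "out of memory" : "unknown error";
}

// An aggregate class template: C++17 cannot deduce T for it without help.
template <class T>
struct key_view {
  T data;
  std::size_t length;
};

// The deduction guide: key_view{x, n} deduces key_view<decltype(x)>.
template <class T>
key_view(T, std::size_t) -> key_view<T>;

static_assert(std::is_same_v<decltype(key_view{1, std::size_t{2}}),
                             key_view<int>>);
```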

Honorable omission: MSVC /W4.

Finally, the actual static analysis tools:

Coverity. I couldn't get the damn thing ("Coverity Build Tool") to run, and believe me, I tried. I guess I will check again in six months.
cppcheck. (Hey, an actual project still hosted on SourceForge!) This one punches above its weight: it implements its own C++ parser, yet has better diagnostics than one would expect from that. Sadly, "better" is not always "good enough." I added a tweak here & there in my code, but most of the time I add suppressions. I also managed to crash it once, which unfortunately means I found more bugs in cppcheck than it did in my code. I tried to report this bug, but the procedure is unfriendly for new reporters, to put it mildly. On the bright side, it is being actively developed and it recently got an LLVM-based parser, so I expect more good things out of it in the future.
Compiler static analysis, GCC & clang. The GCC static analyzer is very new, and the clang one is a bit older but still not very complete, so at this point it was more a matter of integrating them into the pipeline, suppressing false positives, and waiting for new versions to come out. Honorable omission: MSVC /ANALYZE.
Sonatype Lift, called muse.dev originally and then acquired. Its developers were extremely helpful, popping up in my project to comment on false positives with "oh yeah we will fix this ASAP" remarks before I could even review my own run results. As for the actual checks, for C++ code the main backend is FBInfer. It produced several diagnostics that were not obvious at first but were not false positives either; they required thinking and refactoring to address, resulting in better code structure.
Sonarcloud. This one tries to become the JIRA of everything by including test results, coverage results, and converting diagnostics to tasks. For the actual diagnostics, it has an opinion about everything ("never use std::unique_lock, always use std::lock_guard"), all that done in a web portal. I applied a lot of minor fixes to make this one happy - adding missing 'const', removing redundant template args, tightening class access specifiers, etc, etc, etc.
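As an illustration of the lock_guard vs. unique_lock opinion, here is a minimal sketch with a made-up event_log class. std::lock_guard is enough when the mutex is held for the whole scope, while std::unique_lock is the one to use when the lock has to be released before the scope ends.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

class event_log {
 public:
  // The mutex is held until the end of the scope: lock_guard suffices.
  void add(int event) {
    const std::lock_guard<std::mutex> guard{mutex_};
    events_.push_back(event);
  }

  std::size_t size() const {
    const std::lock_guard<std::mutex> guard{mutex_};
    return events_.size();
  }

  // The mutex is released early, before the summation: this needs
  // unique_lock, because lock_guard has no unlock().
  int sum() const {
    std::unique_lock<std::mutex> lock{mutex_};
    const std::vector<int> snapshot = events_;
    lock.unlock();
    int total = 0;
    for (const int e : snapshot) total += e;
    return total;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<int> events_;
};
```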

Omissions: PVS-Studio. Their licence for OSS used to be a rather strange one, requiring source code comments such as "This project is checked by PVS-Studio for free!", but I see it has changed since, and so I might try it next.

Laurynas Biveinis

Wednesday, October 27, 2021

Optimistic lock coupling overhead for single thread Adaptive Radix Tree

Friday, October 01, 2021

C++ linters and static analysis tools I tried for UnoDB