Monday, November 28, 2022

Rust for a C++ Engineer

Collected from reading https://doc.rust-lang.org/book/. The notes are organized by topic rather than by the book order.

Primitive types

  • Integer types are explicitly sized, except for isize and usize, whose width is architecture-specific. These correspond roughly to C++ std::ptrdiff_t/std::intptr_t and std::size_t/std::uintptr_t respectively.
  • char != u8 != i8, oh thank god
  • String literals "foo" are UTF-8. They are slices (see below).
  • Tuples are built into the language, use parentheses syntax, roughly correspond to std::tuple and std::pair. Can also do .i syntax to access the i-th element. The unit tuple is ().
  • Fixed-size arrays are built into the language, use bracket syntax, roughly correspond to std::array with mandatory bounds-checking.
  • Simple types implement the Copy trait (e.g. scalars; roughly the primitive types in Java) and their assignment copies. Tuples are Copy if all their elements are, and this may include arbitrarily large tuples. Otherwise, assignment moves, and a copy has to be requested by an explicit .clone() call.
  • String slices have type &str, can be created with &var[x..y] syntax, x and/or y may be omitted if they are zero and length respectively. Slice consists of a pointer and length.
  • General slices are very similar. They have &[element_type] type.
  • ! is an empty type with no values. Used as the return type of functions that never return. It can be coerced into any other type, which is used e.g. for match arms that continue or panic!.
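
A minimal sketch of slices, Copy vs. move, tuples and arrays from the list above. The function names (first_word, copy_and_move, tuple_and_array) are made up for illustration.

```rust
/// Returns the first word of `s` as a slice borrowing from `s`.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i], // start omitted, same as &s[0..i]
        None => s,
    }
}

/// Copy vs. move: integers copy on assignment, String moves.
fn copy_and_move() -> (i32, String, String) {
    let a = 5;
    let b = a; // i32 is Copy, `a` stays usable
    let s1 = String::from("hello");
    let s2 = s1.clone(); // explicit deep copy, `s1` stays usable
    let s3 = s1; // move, using `s1` after this line would not compile
    (a + b, s2, s3)
}

/// Tuple field access, a fixed-size array, and a slice of it.
fn tuple_and_array() -> (i32, char, usize) {
    let t = (1, 'x', 2.5);
    let arr = [10, 20, 30];
    let tail: &[i32] = &arr[1..]; // end omitted, same as &arr[1..3]
    (t.0, t.1, tail.len())
}
```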

Variables & type inference

  • Variables (let) are immutable by default, but their names can be shadowed, allowing sequences of changes in what a given variable name means. The type can be changed too. Mutable variables can be declared with let mut.
  • Even immutable variables may be declared without initialization, as long as they are assigned exactly once before the first use.
  • Variable types are inferred, but can be annotated when/if needed.
  • Constants are declared with const, and require type annotations.
  • Function parameter and return types always must be provided.
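
A small sketch of shadowing, deferred initialization and constants as described above. All names here (MAX_POINTS, shadowing, deferred_init, mutable_counter) are hypothetical.

```rust
// Constants need a type annotation.
const MAX_POINTS: u32 = 100_000;

/// Shadowing: the same name is rebound, here even with a different type.
fn shadowing() -> usize {
    let spaces = "   "; // &str
    let spaces = spaces.len(); // shadowed, now usize
    spaces
}

/// An immutable variable declared first and assigned exactly once later.
fn deferred_init(flag: bool) -> i32 {
    let x; // no initializer, and not `mut`
    if flag {
        x = 1;
    } else {
        x = 2;
    }
    x
}

/// Mutation requires `let mut`.
fn mutable_counter() -> u32 {
    let mut n = 0; // type inferred as u32 from the addition below
    n += 1;
    n + MAX_POINTS
}
```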

Lifetimes, references & borrow checker

  • All values are owned by a variable in some enclosing scope. The ownership may be passed around. When the last owner goes out of scope, the value is destroyed (drop is executed if the Drop trait is implemented, and the memory is released).
  • Passing variable into a function moves it (or copies), transferring its ownership into the function. Likewise returning it moves it (or copies), transferring the return value ownership to the caller.
  • Creating references is called borrowing. Immutable references do not allow modifying the pointed-to variable and are borrowed with &, can be passed around without transferring ownership. Several of them may be active at the same time. Mutable references are borrowed with &mut, and no other mutable or immutable references may exist at the same time.
  • Each reference has a lifetime. Most cases are handled by implicit (elided) lifetimes. Explicit lifetimes are named like 'a, usually with very short names, and placed after the &.
  • For function signatures, generic lifetime parameters use angle brackets:
fn foo<'a>(x: &'a str, y: &'a str) -> &'a str
  • Lifetime annotations in struct definitions mean that a struct instance cannot outlive the references stored in its fields.
  • If there are multiple input lifetime parameters, but one of them is &self or &mut self, its lifetime is assigned to all output lifetime parameters.
  • Deref coercion converts a reference to a Deref-implementing type to a reference to a different type, e.g. &String to &str. Happens automatically on parameter-argument type mismatch: from &T to &U when T: Deref<Target=U>. Internally as many .deref() calls are inserted as needed.
  • For mutable references, implement DerefMut trait. Two extra deref coercion rules: from &mut T to &mut U when T: DerefMut<Target=U> and from &mut T to &U when T: Deref<Target=U>.
  • The Drop trait is the closest thing to a C++ destructor, adds a drop method that takes a mutable reference to self.
  • For structs, field lifetimes are part of the struct type, and should be specified where the struct type is specified.
  • 'static lifetime is the lifetime of the whole program.
  • Lifetimes are a type of generics, so a function with both lifetimes and generic type parameters lists both together in angle brackets.
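
A sketch of the lifetime and borrowing rules above, loosely following the examples in the book. Excerpt, part_or, append_exclamation and borrow_demo are illustrative names, not anything standard.

```rust
/// Both inputs and the output share one lifetime 'a.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

/// A struct holding a reference: an instance cannot outlive `part`.
struct Excerpt<'a> {
    part: &'a str,
}

impl<'a> Excerpt<'a> {
    /// Elision: with &self present, the returned reference gets its lifetime.
    fn part_or(&self, _fallback: &str) -> &str {
        self.part
    }
}

fn append_exclamation(s: &mut String) {
    s.push('!');
}

/// Several shared borrows may coexist; a mutable borrow must be exclusive.
fn borrow_demo() -> String {
    let mut s = String::from("hi");
    let r1 = &s;
    let r2 = &s;
    let len = r1.len() + r2.len(); // last use of r1 and r2
    append_exclamation(&mut s); // fine, the shared borrows have ended
    format!("{}{}", s, len)
}
```

Moving the append_exclamation call above the line that computes len should make the borrow checker reject the function, since r1 and r2 would still be live.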

On paper Rust lifetimes appear to be a genius idea. Besides manual resource management and GC, this is a viable third option that combines the advantages of the two while avoiding their disadvantages, although only partially so.

Google Chrome developers tried this for C++ and did not succeed enough for it to be viable: Borrowing Trouble: The Difficulties of a C++ Borrow-Checker.

Statements & expressions

  • The last item in a block may be an expression (lacking the final semicolon), in which case the whole block is an expression with that value. This makes a trailing return expression; in functions replaceable with just expression. This also merges the if statement and the ternary operator into a single if expression.
  • loop starts an infinite loop. break may return an expression, making the loop an expression. Nested loops may have labels 'label: loop {, then possible to do break 'label;.
  • for is a range loop. When possible, for loops seem to be more idiomatic than while loops.
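
An illustrative sketch of blocks, if and loop used as expressions, and of a labeled break. The function names are invented for the example.

```rust
/// `if` is an expression, so no ternary operator and no `return` needed.
fn sign(n: i32) -> &'static str {
    if n < 0 {
        "negative"
    } else if n == 0 {
        "zero"
    } else {
        "positive"
    }
}

/// `break value` makes the whole `loop` an expression.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut p = 1;
    loop {
        if p > limit {
            break p;
        }
        p *= 2;
    }
}

/// A labeled break leaves the outer loop from inside the inner one.
fn find_pair(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for a in 1..10 {
        for b in 1..10 {
            if a * b == target {
                found = Some((a, b));
                break 'outer;
            }
        }
    }
    found
}

/// A plain block is an expression too.
fn block_value() -> i32 {
    let y = {
        let x = 3;
        x + 1 // no semicolon: this is the value of the block
    };
    y
}
```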

Type aliases, structs & enums

  • Rust type alias type Foo = ExistingType is like C++ using Foo = ExistingType. Can be generic: type Foo<T> = std::result::Result<T, std::io::Error>.
  • Structures use struct keyword, contain only type-annotated fields.
  • If during a struct variable construction a field and a var it is initialized from have the same name, one of them can be omitted (field init shorthand).
  • ..var in the struct variable construction takes all the unspecified fields from var of the same struct type.
  • Tuple structs struct Foo(i64, i64, i64) are structs that are very similar to tuples. Fields are unnamed.
  • Unit structs struct Foo;
  • If structs need to store references, then lifetimes have to be used, that's for later.
  • Attribute #[derive(Debug)] for struct allows doing {:?} in println! to dump the fields.
  • dbg!(value) macro may be inserted as an expression to dump value to stderr; it returns the value, so it can wrap a subexpression.
  • struct may have methods attached to them, in separate impl StructName blocks.
  • The first arg of a method may be one of
    • &self, corresponds to a C++ const method;
    • &mut self, corresponds to a regular C++ method;
    • self, consumes the object;
  • A function in an impl StructName not taking a self is an associated function, not a method, corresponding to a C++ static method. Called through StructName:: syntax.
  • There may be multiple impl blocks.
  • The simplest Rust enum roughly matches C++ enum class. But then each enum variant may have different associated data with it, making it similar to std::variant
  • Standard library Option enum handles the use cases for nullptr (a null reference does not exist in Rust). Similar to C++ std::optional.
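
A sketch tying the struct and enum points together. Rect, Shape and the helper functions are hypothetical names chosen for the example.

```rust
#[derive(Debug, PartialEq)]
struct Rect {
    width: u32,
    height: u32,
}

impl Rect {
    /// Associated function (like a C++ static method): Rect::square(2).
    fn square(size: u32) -> Self {
        Rect { width: size, height: size }
    }
    /// &self: like a C++ const method.
    fn area(&self) -> u32 {
        self.width * self.height
    }
    /// &mut self: like a regular C++ method.
    fn widen(&mut self, by: u32) {
        self.width += by;
    }
    /// self: consumes the object.
    fn into_tuple(self) -> (u32, u32) {
        (self.width, self.height)
    }
}

/// Enum variants may carry different data, similar to std::variant.
enum Shape {
    Circle { radius: f64 },
    Rectangle(Rect),
    Empty,
}

fn describe(shape: &Shape) -> String {
    match shape {
        Shape::Circle { radius } => format!("circle r={}", radius),
        Shape::Rectangle(r) => format!("rect {:?}", r), // needs derive(Debug)
        Shape::Empty => String::from("empty"),
    }
}

fn build() -> Rect {
    let width = 3;
    let base = Rect { width, height: 4 }; // field init shorthand
    Rect { height: 10, ..base } // remaining fields taken from `base`
}

/// Option instead of a null pointer: None for an empty slice.
fn largest_area(rects: &[Rect]) -> Option<u32> {
    rects.iter().map(|r| r.area()).max()
}
```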

Closures & function pointers

  • Closures: parameters between two | characters, followed by an expression (optionally wrapped in braces). Parameter and return types are not annotated usually.
  • Once closure parameter types are inferred, they don't change: the same closure cannot be called with arguments of different types.
  • Variables are captured by different types of borrowing / ownership taking implicitly depending on what the code does. move keyword before || forces taking ownership, when the body does not need it implicitly. One use case is passing data to a new thread.
  • All closures implement FnOnce trait, meaning they can be called at least once.
  • Closures that mutate captured values but don't move them out implement FnOnce and FnMut.
  • Closures that don't mutate captured values and don't move them out implement FnOnce, FnMut, and Fn.
  • All functions coerce to the fn type, which is the function pointer type. It implements all of FnOnce, FnMut, Fn.
  • To return a closure, use a trait object, e.g. -> Box<dyn Fn ... >, or -> impl Fn ... when only a single closure type is ever returned.
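
A short sketch of the closure traits and function pointers above. All function names are made up for the example.

```rust
/// Returns a closure by `impl Trait`; `move` takes ownership of `n`.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

/// Returns a closure as a trait object.
fn make_multiplier(n: i32) -> Box<dyn Fn(i32) -> i32> {
    Box::new(move |x| x * n)
}

/// Accepts anything callable more than once that may mutate its captures.
fn apply_twice<F: FnMut()>(mut f: F) {
    f();
    f();
}

/// Takes a plain function pointer.
fn apply_fn_pointer(f: fn(i32) -> i32, v: i32) -> i32 {
    f(v)
}

fn double(x: i32) -> i32 {
    x * 2
}

fn closure_demo() -> (i32, i32, i32, i32) {
    let mut count = 0;
    apply_twice(|| count += 1); // FnMut: borrows `count` mutably
    let add5 = make_adder(5);
    let times3 = make_multiplier(3);
    (count, add5(1), times3(2), apply_fn_pointer(double, 4))
}
```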

Generics & traits

  • Generics (types) and traits (behavior) resemble C++ templates. Traits also resemble interfaces in other languages.
  • Generic type uses must be constrained by traits, similar to C++ concepts; there is no SFINAE.
  • impl Foo<f32> {...} adds implementation for a specific type, similar to C++ template specialization.
  • Separate traits are implemented for structs in separate blocks: impl Trait for Struct { ... }
  • Traits need to be brought into scope too, pulling in an implementing type is not sufficient to call trait methods.
  • Can implement a local trait on an external type or an external trait on a local type, but not an external trait on an external type (so no C++ std::hash specialization for a std:: type). This is to avoid allowing multiple trait implementations for the same type.
  • Trait methods may have default implementations, which may call other, possibly unimplemented methods in the same trait. The default implementation may not be called from an overriding implementation.
  • Trait-type parameters without generics syntax: fn foo(bar: &impl Trait).
  • Trait bounds, using generics syntax: fn foo<T: Trait>(bar: &T). Same as above.
  • Multiple trait bounds: fn foo(bar: &(impl Trait1 + Trait2)) and fn foo<T: Trait1 + Trait2>(bar: &T)
  • In the case trait bounds become long, where clauses pull them aside:
fn foo<T, U>(t: &T, u: &U) -> Result
where
    T: Trait1 + Trait2,
    U: Trait1 + Trait3,
{
    ...
}
  • Can use fn ... -> impl Trait to return a trait-implementing type, as long as it's a single type.
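  • Can conditionally implement methods for generic structs by adding trait bounds to their implementation: impl<T: Trait> Type<T> { ... }. Relatedly, implementing a trait for every type that satisfies some bounds, impl<T: Display> ToString for T { ... }, is called a blanket implementation.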
  • Rust does not have OOP inheritance. Some form is available through default trait method implementations. Dynamic dispatch (C++ virtual methods) is through trait objects.
  • A trait object consists of a pointer to an instance of a type and a pointer to a vtable.
  • Struct and enum vars in Rust are not objects in the OOP sense. Trait objects come close, but a trait cannot add data fields.
  • A trait object must sit behind a reference or a smart pointer to a dyn Trait type, e.g. &dyn Trait or Box<dyn Trait>.
  • Associated types. type Name allows to use Name as a type in a trait before its declaration is given by the trait implementors. In C++ one would use template argument dependent typenames.
  • Default generic type parameters: <T=DefaultType>.
  • foo.bar() can be replaced by Trait::bar(&foo) to disambiguate when bar is implemented by more than one trait. If even more disambiguation is needed, e.g. for associated functions without a self parameter, <Type as Trait>::bar calls the function from Trait as implemented for Type.
  • If a trait depends on another trait, the latter is called a supertrait: trait Foo: SuperTraitBar.
  • Newtype pattern. One use case: implement external traits on external types, declare a new thin wrapper tuple struct. There are other use cases.
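
A sketch of default methods, static vs. dynamic dispatch, fully qualified calls and the newtype pattern. Shape, Square, Rect and Wrapper are hypothetical types for illustration (Wrapper follows the book's example).

```rust
use std::fmt;

trait Shape {
    fn area(&self) -> f64;
    // Default implementation, calls the required method above.
    fn describe(&self) -> String {
        format!("shape with area {:.1}", self.area())
    }
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
    // Overrides the default implementation.
    fn describe(&self) -> String {
        String::from("rectangle")
    }
}

/// Static dispatch through a trait bound (monomorphized like a template).
fn doubled_area<T: Shape>(shape: &T) -> f64 {
    2.0 * shape.area()
}

/// Dynamic dispatch through trait objects (like C++ virtual calls).
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Fully qualified syntax, naming both the type and the trait.
fn explicit_call(sq: &Square) -> f64 {
    <Square as Shape>::area(sq)
}

/// Newtype: a local wrapper allows implementing Display for Vec<String>.
struct Wrapper(Vec<String>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}
```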

Error handling

  • panic! macro unwinds the stack and exits (or aborts immediately, depending on config) on unrecoverable error.
  • Errors are handled using Result enum, which can be Ok or Err.
  • unwrap returns the success variant of Result or panics.
  • unwrap_or_else executes given code instead of panicking.
  • expect is like unwrap with a given error message for panicking.
  • ? operator after a call, e.g. let foo = bar()?; unwraps returned Result, or returns from the caller with the error. If the error types do not match, From trait converts.
  • ? operator works with Option return types too.
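
A sketch of Result, the ? operator and the From conversion it performs. ConfigError, parse_port and first_char_upper are invented for the example.

```rust
use std::num::ParseIntError;

#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing,
    BadNumber(String),
}

// Lets `?` convert a ParseIntError into a ConfigError automatically.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e.to_string())
    }
}

fn parse_port(input: Option<&str>) -> Result<u16, ConfigError> {
    let text = input.ok_or(ConfigError::Missing)?; // early return on None
    let port = text.trim().parse::<u16>()?; // error converted through From
    Ok(port)
}

/// `?` on an Option: returns None early if there is no first char.
fn first_char_upper(s: &str) -> Option<char> {
    let c = s.chars().next()?;
    Some(c.to_ascii_uppercase())
}

/// unwrap_or_else supplies a fallback instead of panicking.
fn port_or_default(input: Option<&str>) -> u16 {
    parse_port(input).unwrap_or_else(|_| 80)
}
```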

Iterators

  • Rust iterators correspond to C++ ranges (or iterator pairs).
  • Calling .iter() on a collection roughly corresponds to a C++ .cbegin(), except that the latter is not a range. Other options are .into_iter() to take ownership of values–not sure what a direct C++ mapping would be–and .iter_mut() over mutable references (C++ .begin()).
  • Iterators implement the Iterator trait.
  • Iterator .collect() method returns a new collection from iterating.
  • Code using iterator adapters might be faster than equivalent loop-based code. An example of Rust zero-cost abstractions, which of course is found in C++ as well.
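
A sketch of the three ways to iterate a collection, of adapters with collect, and of a hand-written iterator. Countdown and the function names are hypothetical.

```rust
/// .iter() yields shared references; adapters are lazy until collect().
fn squares_of_evens(v: &[i32]) -> Vec<i32> {
    v.iter().filter(|x| **x % 2 == 0).map(|x| x * x).collect()
}

/// .iter_mut() yields mutable references.
fn increment_all(v: &mut Vec<i32>) {
    for x in v.iter_mut() {
        *x += 1;
    }
}

/// .into_iter() takes ownership of the vector and its elements.
fn total_len(words: Vec<String>) -> usize {
    words.into_iter().map(|w| w.len()).sum()
}

/// Implementing the Iterator trait only requires `next`.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}
```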

Pattern matching & related control flow

  • Pattern matching can decompose a tuple to local vars, and do many other things.
  • match is a generalized switch with pattern matching, variable binding, and more.
  • _ is a catch-all non-binding pattern, like the default in switch. Ignores the entire value.
  • if let behaves like a single match arm, combining if with variable binding in the case of true condition.
  • while let loop repeats as long as its pattern keeps matching.
  • The value after for keyword in a for loop is a pattern.
  • let keyword takes a pattern, not a variable id.
  • Function parameters are patterns.
  • Patterns are refutable and irrefutable, the latter ones matching any possible passed value. Function parameters, let, and for take irrefutable patterns. if let and while let take both kinds, with a compiler warning if irrefutable (as that creates always-true if or an infinite loop while). match arms must be refutable except for the last one, which should be irrefutable (if the possibilities were not exhausted until then).
  • Multiple patterns can be combined with |.
  • An inclusive range of values can be matched with ..=. The range cannot be empty.
  • Struct destructuring: Foo { x: a, y: b } = v gets a and b. If field and var names match, then Foo { x, y } = v. Literals can be used too.
  • Enum destructuring Foo::Variant { x, y }, Foo::VariantWithNoData.
  • Can destructure arbitrarily deep nested structs and enums.
  • Nested _ ignores just that part.
  • Starting a variable name with an _ suppresses unused variable warnings for it.
  • .. is a greedy sequence of _, i.e.
let numbers = (1, 5, 7, 20, 30);
match numbers {
    (first, .., last) => ...
}
  • match arms may have match guards which are extra if conditions that can use the bound vars, e.g. Some(x) if x > 5. The compiler does not take guards into account when checking exhaustiveness.
  • @ bindings allow to create a var holding the tested value at the match time, e.g. Message::Hello { id: id_var @ 3..=7 }.
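
A sketch combining several of the pattern forms above. The Message enum is modeled on the one in the book; classify, ends, drain_stack and bucket are made-up names.

```rust
enum Message {
    Hello { id: i32 },
    Move { x: i32, y: i32 },
    Quit,
}

fn classify(msg: Message) -> String {
    match msg {
        // @ binding: test against a range and keep the value.
        Message::Hello { id: id_var @ 3..=7 } => format!("id in range: {}", id_var),
        // Match guard using the bound variable.
        Message::Hello { id } if id < 0 => String::from("negative id"),
        // `..` ignores the remaining fields.
        Message::Hello { .. } => String::from("other id"),
        // A literal inside a destructuring pattern.
        Message::Move { x: 0, y } => format!("vertical move to {}", y),
        Message::Move { x, .. } => format!("move with x={}", x),
        Message::Quit => String::from("quit"),
    }
}

/// `let` takes an irrefutable pattern; `..` skips the middle elements.
fn ends(numbers: (i32, i32, i32, i32, i32)) -> (i32, i32) {
    let (first, .., last) = numbers;
    (first, last)
}

/// `while let` loops as long as pop() keeps returning Some.
fn drain_stack(mut stack: Vec<i32>) -> i32 {
    let mut sum = 0;
    while let Some(top) = stack.pop() {
        sum += top;
    }
    sum
}

/// Alternatives with |, inclusive ranges with ..=, catch-all with _.
fn bucket(n: u8) -> &'static str {
    match n {
        0 => "zero",
        1 | 2 => "small",
        3..=9 => "medium",
        _ => "large",
    }
}
```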

Standard library types

  • Dynamic strings: String, would correspond to std::string type, but UTF-8. Display trait adds to_string method. Not indexable to avoid byte/UTF-8 encoding mixup. Slicing is allowed but runtime-checked to fall on char boundary. To disambiguate byte/char interpretation, use .chars() or .bytes().
  • Standard library vectors match std::vector. A macro vec![1, 2, 3] to create a vector with given contents. Ownership/borrowing rules apply to whole vector, i.e. if a mutable reference to the first element is taken, a new one cannot be pushed to the back.
  • Standard library hash maps correspond to std::unordered_map.
  • std::thread::spawn(closure) corresponds to constructing a C++ std::thread from a callable.
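
A sketch of String, Vec and HashMap basics. The function names are hypothetical, and the sample string uses an escaped é so the byte count does not depend on how the source file is encoded.

```rust
use std::collections::HashMap;

/// Bytes vs. chars of a UTF-8 String, plus a slice on a char boundary.
fn string_demo() -> (usize, usize, String) {
    let s = String::from("h\u{e9}llo"); // U+00E9 takes two bytes in UTF-8
    (s.bytes().count(), s.chars().count(), s[0..1].to_string())
}

/// vec! macro and push, like std::vector with push_back.
fn vec_demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = v[0]; // copies the i32 out, no borrow is kept alive
    v.push(first + 3);
    v
}

/// HashMap with the entry API, like std::unordered_map.
fn word_counts(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for w in text.split_whitespace() {
        *counts.entry(w.to_string()).or_insert(0) += 1;
    }
    counts
}
```

Slicing the sample string as s[0..2] would panic at runtime, because byte 2 falls in the middle of the two-byte character.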

Smart pointers & dynamically-sized (unsized) types

  • Smart pointers own data. They implement Deref and Drop traits. String and Vec<T> are smart pointers.
  • Box<T> is like std::unique_ptr<T> in C++, except that Rust is more likely to use plain references and lifetimes, so there is no 1:1 mapping, e.g. in a rewrite. Box::new is std::make_unique.
  • Implementing Deref trait enables dereferencing with the * operator, like overloading C++ * and -> operators does.
  • Under the hood *x is transformed to *(x.deref()) exactly once.
  • std::mem::drop corresponds to C++ std::unique_ptr::reset or other early destruction.
  • Rc<T> matches std::shared_ptr, except that its reference count is not atomic, so it is limited to a single thread. Rc::clone method matches std::shared_ptr copy constructor.
  • Interior mutability pattern: unsafe code wrapped in a safe API to mutate data inside an immutable value even with immutable references present.
  • RefCell<T> does borrow checking at runtime instead of compile time. borrow and borrow_mut methods.
  • Rc<RefCell<T>> pattern implements multiple owners to potentially-mutable data.
  • Weak<T> matches std::weak_ptr. Constructed by Rc::downgrade. Upgraded by upgrade method, corresponding to std::weak_ptr::lock.
  • Dynamically sized types (DST) or unsized types, whose sizes are only known at the runtime. Cannot create variables of such types directly, naturally always hidden in some pointer + size structure. Rust automatically implements Sized trait for every non-DST, and implicitly bounds by it for every generic function. To relax the latter, fn foo<T: ?Sized>.... The ?Trait syntax is only available for Sized trait.
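
A sketch of Deref coercion and of the Rc<RefCell<T>> and Weak<T> patterns. MyBox follows the book's example; the other names are hypothetical.

```rust
use std::cell::RefCell;
use std::ops::Deref;
use std::rc::{Rc, Weak};

/// A minimal smart pointer that only implements Deref.
struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

fn name_len(name: &str) -> usize {
    name.len()
}

/// Deref coercion chain: &MyBox<String> -> &String -> &str.
fn deref_demo() -> usize {
    let boxed = MyBox(String::from("Rust"));
    name_len(&boxed)
}

/// Two owners of mutable data, checked at runtime by RefCell.
fn shared_demo() -> (usize, Vec<i32>, bool) {
    let shared = Rc::new(RefCell::new(vec![1, 2]));
    let other = Rc::clone(&shared); // bumps the count, no deep copy
    other.borrow_mut().push(3);
    let weak: Weak<RefCell<Vec<i32>>> = Rc::downgrade(&shared);
    let count = Rc::strong_count(&shared);
    let contents = shared.borrow().clone();
    drop(other); // early destruction, like unique_ptr::reset
    drop(shared);
    // Both strong owners are gone, so the weak pointer cannot upgrade.
    (count, contents, weak.upgrade().is_none())
}
```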

Operator overloading

  • Operator overloading by implementing the desired traits in std::ops.
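
A minimal sketch of overloading + by implementing std::ops::Add, with a hypothetical Point type:

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Implementing Add enables `p1 + p2`, like operator+ in C++.
impl Add for Point {
    type Output = Point;
    fn add(self, other: Point) -> Point {
        Point {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
```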

Concurrency

  • thread::spawn returns a JoinHandle, which has a join method, similar to C++ std::thread::join.
  • Message passing for inter-thread communication, like in Go. Channels, std::sync::mpsc::channel(). The endpoints have send, recv, try_recv methods. The receiver implements Iterator too. The channels may have multiple transmitters (it's MPSC), which can be created by .clone.
  • Messages must implement Send trait. If a type is composed of Send types only, it becomes Send automatically.
  • Types whose variables are safe to be referenced from multiple threads implement Sync trait. If &T is Send, then T is Sync. A type made of Sync types only is Sync automatically.
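  • Mutex<T> is a mutex-guarded variable of T. .lock() returns a LockResult wrapping a MutexGuard, which dereferences (mutably, if needed) to the guarded data. The guard unlocks when it goes out of scope. Mutex implements interior mutability.
  • Arc<T> is the thread-safe, atomically reference-counted version of Rc<T>. It is the closer match to C++ std::shared_ptr, whose reference count is also updated atomically (and not to std::atomic<std::shared_ptr>, which makes the pointer itself atomic).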
  • To actually share mutexes between threads, wrap them: Arc<Mutex<T>>.
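
A sketch of the two approaches above: message passing over an MPSC channel and shared state in Arc<Mutex<T>>. Function names and thread counts are arbitrary choices for the example.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Three producer threads send into one channel; the receiver sums up.
fn sum_via_channel() -> i32 {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for id in 1..=3 {
        let tx = tx.clone(); // one transmitter per thread
        handles.push(thread::spawn(move || {
            tx.send(id * 10).unwrap();
        }));
    }
    drop(tx); // drop the original sender so the receiving iterator ends
    let total: i32 = rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}

/// Four threads increment one counter guarded by a mutex.
fn count_with_mutex() -> i32 {
    let counter = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                let mut guard = counter.lock().unwrap();
                *guard += 1;
                // the guard unlocks here, when it goes out of scope
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let result = *counter.lock().unwrap();
    result
}
```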

Assorted standard library functionality

  • std::env::args is for int argc, char *argv[]. It's Unicode-checking, if that hurts then std::env::args_os.
  • std::process::exit is for exit
  • std::env::var is for getenv
  • println! prints to stdout, eprintln! to stderr.
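
A tiny sketch of the argv and getenv equivalents. The helper names and fallback values are assumptions for illustration.

```rust
use std::env;

/// First command-line argument (usually the program path), like argv[0].
fn program_name() -> String {
    env::args().next().unwrap_or_else(|| String::from("unknown"))
}

/// getenv with a fallback; env::var returns Err if unset or not Unicode.
fn env_or(name: &str, default: &str) -> String {
    env::var(name).unwrap_or_else(|_| default.to_string())
}

/// Normal output goes to stdout, diagnostics to stderr.
fn report(name: &str) {
    println!("{} = {}", name, env_or(name, "<unset>"));
    eprintln!("running as {}", program_name());
}
```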

Build and dependency management

  • cargo seems to be a much better story than CMake hell or its alternatives.
  • A crate is the smallest compilation unit, either a library crate, or a binary crate. Usually means the former. A crate root is the starting source file in it. A package is a bundle of crates with at most one library crate. Standard paths inside a package: src/main.rs, src/lib.rs, src/bin.
  • Release profiles correspond to a mixture of CMake build configurations, NDEBUG define, etc. in C++. The dev profile corresponds to Debug, and release to Release (or RelWithDebInfo?).
  • Can customize profiles in Cargo.toml, e.g. optimization levels.
  • Workspaces organize related packages together in large projects, to share directory root, Cargo.toml, and Cargo.lock.

Modules

Not familiar enough with C++ modules to compare.

  • Modules (and submodules) inside a crate do namespacing and public/private visibility. src/modulename.rs, src/modulename/submodulename.rs. Modules are private by default (mod) and are made public with pub mod.
  • super:: as a part of name path goes one level up.
  • use keyword imports. Idiomatically functions are imported through their parent module, everything else directly.
  • use ... as ... creates name alias.
  • pub use re-exports. Used to organize and collect public API from several potentially nested submodules.
  • Nested path syntax: use foo::{bar, baz, self};, globs use foo::*;
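
A sketch of the module features above using inline modules, so that it fits in one file; in a real crate the modules would live in their own files. The restaurant/kitchen names are made up.

```rust
mod restaurant {
    pub mod kitchen {
        pub fn cook(dish: &str) -> String {
            // super:: goes one level up, to `restaurant`.
            format!("{} ({})", dish, super::house_style())
        }
    }

    // Private, but still visible to child modules such as `kitchen`.
    fn house_style() -> &'static str {
        "spicy"
    }

    // Re-export, so callers can write restaurant::cook directly.
    pub use self::kitchen::cook;
}

// Functions are idiomatically imported through their parent module...
use restaurant::kitchen;
// ...and `as` creates an alias.
use restaurant::cook as quick_cook;

fn order() -> (String, String) {
    (kitchen::cook("soup"), quick_cook("tea"))
}
```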

Tooling

  • rustfmt formats code, like clang-format does for C++.
  • Clippy the linter.
  • rust-analyzer for LSP support.

Documentation

  • Documentation header comments start with /// and support Markdown. Built by cargo doc [--open].
  • Typical API doc sections: Examples, Panics, Errors, Safety.
  • Contained documentation comments start with //!, typically used for crates and modules.

Testing & benchmarking

  • #[test] annotates a function to be a test, like the Google Test TEST macro in C++. Tests run in parallel by default.
  • assert_eq! is like gtest ASSERT_EQ, except that the args are 'left' and 'right' instead of 'expected' and 'actual' or similar. Equality asserts may be applied on types implementing PartialEq and Debug traits.
  • assert! may take 2nd and subsequent args for a message in the case of failure.
  • Tests annotated with #[should_panic] test that the annotated function panics, similar but not identical to gtest death tests. Best to add expected parameter to the attribute to specify the reason for panic.
  • Tests may also be implemented by returning a Result<T, E>.
  • Benchmark tests correspond to Google Benchmark, but unstable ATM.
  • Documentation tests can compile API examples automatically.
  • Unit tests go with the code they test, mod tests annotated with #[cfg(test)]
  • Visibility rules happen to allow the testing of private functions.
  • Integration tests go to a top-level tests directory, no configuration annotation. Each file there is a separate (test) crate–if that's not what's needed, i.e. for setup code, use foo/mod.rs naming convention for non-tests.
  • cargo test runs in sequence: unit, integration, doc. It does not go to the next category if one fails.
  • Binary crates cannot have integration tests directly. The usual thing to do is to always have a library crate with a binary crate as minimal as possible.
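
A sketch of a documented function with its unit tests in the same file, as they would be run by cargo test. add_two and checked_div are hypothetical functions.

```rust
/// Adds two to the given number.
pub fn add_two(a: i32) -> i32 {
    a + 2
}

/// Divides `a` by `b`.
///
/// # Panics
///
/// Panics if `b` is zero.
pub fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

// Compiled only for test builds; lives next to the code it tests.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn adds_two() {
        // The extra argument is the failure message.
        assert_eq!(add_two(2), 4, "add_two(2) should be 4");
    }

    #[test]
    #[should_panic(expected = "division by zero")]
    fn div_by_zero_panics() {
        checked_div(1, 0);
    }

    // A test may also return a Result instead of asserting.
    #[test]
    fn result_style() -> Result<(), String> {
        if add_two(0) == 2 {
            Ok(())
        } else {
            Err(String::from("add_two(0) was not 2"))
        }
    }
}
```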

Macros

  • There are macros, names trailing with exclamation mark (println!).
  • Macros can take Rust code and expand to a different Rust code. A difference from the C++ preprocessor is that they work on parsed tokens and syntax fragments, not on raw text. While powerful, how well does this work with tooling? Do they run macros? Can they refactor macros?
  • Declarative macros (macro_rules!) pattern-match given code to produce code.
  • #[macro_export] annotation for public macros.
  • Procedural macros take token stream input and produce token stream output.
  • One kind is custom derive macros that add code to a struct implementation.
  • Attribute-like macros allow creating new attributes.
  • Function-like macros are close to C preprocessor function-like macros, except that they also operate on TokenStream and not on arguments directly. Can take variable number of arguments.
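
A sketch of a declarative macro taking a variable number of arguments. max_of is a made-up macro, not something from the standard library.

```rust
// Pattern-matches its arguments and recurses on the tail.
macro_rules! max_of {
    // Base case: a single expression.
    ($x:expr) => {
        $x
    };
    // Recursive case: first expression, then one or more others.
    ($x:expr, $($rest:expr),+) => {{
        let a = $x;
        let b = max_of!($($rest),+);
        if a > b { a } else { b }
    }};
}

fn largest() -> i32 {
    max_of!(3, 9, 4)
}
```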

Unsafe Rust & FFI

  • unsafe { ... }: allows some, well, unsafe features
  • unsafe can dereference raw pointers *const T, *mut T. Raw pointers are just like C raw pointers.
  • unsafe fn foo() {} declares an unsafe function, which can only be called from unsafe code.
  • extern "C" { fn putenv ... } for FFI, may only be called from unsafe code.
  • To make Rust function callable from external code, add #[no_mangle] annotation and pub extern "C" before the fn.
  • Static variables may be declared with static FOO_BAR: type = value; Immutable static vars have an address in memory; constants don't. All mutable static vars are unsafe.
  • unsafe trait Foo, unsafe impl Foo for Bar.
  • union types exist, mainly used for interfacing with C unions, accessing fields is unsafe.
  • Raw identifier syntax r#while allows using e.g. a keyword for an identifier. Useful for FFI and different Rust edition interfacing.
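
A sketch limited to raw pointers, an unsafe fn and a union. FFI declarations are left out because their exact syntax depends on the Rust edition. All names here are hypothetical.

```rust
/// Creating a raw pointer is safe; dereferencing it needs `unsafe`.
fn raw_pointer_demo() -> i32 {
    let mut value = 5;
    let p = &mut value as *mut i32;
    unsafe {
        *p += 1;
        *p
    }
}

/// Safety: `p` must be valid for reads and point to an initialized i32.
unsafe fn read_raw(p: *const i32) -> i32 {
    unsafe { *p }
}

/// An unsafe fn may only be called from inside unsafe code.
fn call_unsafe_fn() -> i32 {
    let x = 42;
    unsafe { read_raw(&x as *const i32) }
}

/// Same memory viewed as either type, like a C union.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

/// Reading a union field is unsafe: the compiler cannot know which is valid.
fn float_bits(f: f32) -> u32 {
    let u = IntOrFloat { f };
    unsafe { u.i }
}
```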
