The basic idea is the same as micrograd: each number is a `Value` node in a computation graph, ops connect nodes, and the `backward` function topologically sorts the graph before applying the chain rule in reverse. The C-specific parts are memory management and two simple data structures I needed to implement backprop: sets and vectors.
The source code is about 1,350 lines, MIT licensed, and well documented. Dependencies are just the standard library and libm. In addition, the repo contains two examples to showcase how the engine works: a toy regression and an MNIST task.
What this is not: a framework to build and train neural networks in production. Being scalar-valued makes it slow, and it wasn't built for numerical robustness or large datasets. There's no commercial aim here; it's a learning project.
If you read through it, I'd like to hear thoughts, both on the ML engineering aspect and on anything that reads as un-idiomatic C.
> Reference counting buys correctness and composability, but at a cost.
> Disadvantage #1: you must balance every reference. Each value_create, value_retain, and each operation's implicit retain of its operands has to be matched by a value_release. Forget one and you leak; do one too many and you free memory that is still in use. The training examples in examples/ are verbose precisely because they are scrupulous about this in their error paths; that verbosity is the price of leak-free C.