For example, a matrix product MN, `a b, b c -> a c` is just two nodes with two edges each: `-a- M -b- N -c-`. Their `b` edges are connected, so the resulting graph has only two "free" edges `a` and `c`. That's how we know the result is another matrix.
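To see the same thing in code, here is a minimal NumPy sketch (nothing beyond `np.einsum`, shapes chosen arbitrarily):

```python
import numpy as np

# Two nodes M and N with edge labels (a, b) and (b, c); contracting the
# shared edge b leaves the free edges a and c, i.e. an ordinary matrix.
M = np.random.rand(3, 4)   # edges a, b
N = np.random.rand(4, 5)   # edges b, c

result = np.einsum('ab,bc->ac', M, N)
assert np.allclose(result, M @ N)
```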
Once you look at tensors this way, a number of things that are normally tricky with standard matrix notation become trivial, such as the higher-order derivatives used in neural networks.
I wrote https://tensorcookbook.com/ to give a simple reference for all of this.
My background is in quantum physics, where this notation is a useful tool for understanding some aspects of entanglement. Here, the idea was to show its use in machine learning in general and deep learning in particular.
I focus on a common case of a single product without any other parts. If there is a summation, it is managed via indices.
If you are interested in arbitrary drawings, you are probably aware of TikZ (if you like coding) or Excalidraw.
Addition does feel a bit hacky, but it may just be necessary for any associative notation. At least it means that rules such as distributivity work the same as in classical notation.
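As a quick illustrative check of the distributivity claim (plain NumPy, not part of any library):

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
C = np.random.rand(4, 5)

# Contraction distributes over addition, just as in classical notation:
# A(B + C) = AB + AC
lhs = np.einsum('ab,bc->ac', A, B + C)
rhs = np.einsum('ab,bc->ac', A, B) + np.einsum('ab,bc->ac', A, C)
assert np.allclose(lhs, rhs)
```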
I also wrote the tensorgrad library to do symbolic calculations using tensor networks: https://github.com/thomasahle/tensorgrad. It keeps track of what you are taking derivatives with respect to, but I agree it would be nice to show that more explicitly.
Back in 2020 I wanted to write a library based on manim to animate the contractions, but I never got around to it (and manim back then was less well-documented than it is now).
Do you have some recommendations? I'm currently using TikZ's graphdrawing library because it supports subgraphs, which I need for handling addition, functions and derivatives. However, the force-based layout doesn't work with subgraphs, which causes a lot of trouble.
> Back in 2020 I wanted to write a library based on manim to animate the contractions, but I never got around to it (and manim back then was less well-documented than it is now).
Manim is on the TODO list :-) https://github.com/thomasahle/tensorgrad/issues/7 If you feel like writing some code, I think it could work very well with tensorgrad's step-by-step derivations.
To write a little more: there are two things people mean by ‘tensor’. One is a kind of geometric object that corresponds to a multilinear map between vector spaces (or modules, I suppose), and another is an array indexed by k-tuples (i.e. what would be called a multidimensional array in C or Fortran programming). For a given choice of basis, one can represent a degree-k tensor with a k-dimensional array of scalars.
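As a small illustration of the two views (plain NumPy): the same 2-dimensional array of components can be read as a bilinear map once a basis is fixed.

```python
import numpy as np

# A degree-2 tensor stored as a 2-dimensional array of scalars
# (its components in some chosen basis).
T = np.random.rand(3, 3)

# Read as a bilinear map, it eats two vectors and returns a scalar:
u = np.random.rand(3)
v = np.random.rand(3)
scalar = np.einsum('i,ij,j->', u, T, v)   # same as u @ T @ v
assert np.isclose(scalar, u @ T @ v)
```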
Only certain operations make sense geometrically on tensors (in the sense that they do not depend on the choice of basis) and these can be broken down into:
- tensor products, which take degree n and m tensors and output a degree (n + m) tensor
- contractions, which take a degree n tensor and output a degree (n - 2) tensor
- generalized transpositions, which take a degree n tensor and output a degree n tensor
A matrix multiplication can be seen as a composition of a tensor product and a contraction; a matrix trace is just a contraction.
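A NumPy sketch of that decomposition (only illustrative):

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

# Tensor product: degree 2 and degree 2 give degree 4.
outer = np.einsum('ij,kl->ijkl', A, B)
# Contracting the middle pair of indices brings it back to degree 2.
matmul = np.einsum('ijjl->il', outer)
assert np.allclose(matmul, A @ B)

# A matrix trace is a single contraction of a degree-2 tensor.
M = np.random.rand(4, 4)
assert np.isclose(np.einsum('ii->', M), np.trace(M))
```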
The Einstein summation convention is a notation which succinctly expresses these geometric operations by describing what one does with the ‘grid of numbers’ representation. The convention is that if an index is repeated in a term twice, one implicitly sums the expression over each basis vector for that index (an odd number of repetitions bigger than 1 is meaningless; an even number is equivalent to reapplying the ‘twice’ rule several times). You get: tensor products by juxtaposition, contractions by repeated indexes, and transpositions by reordering indexes.
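In `np.einsum` syntax the three kinds of operation look like this (a sketch, using small random arrays):

```python
import numpy as np

u = np.random.rand(3)
v = np.random.rand(4)
T = np.random.rand(2, 3, 4)

# Tensor product by juxtaposition of indices:
uv = np.einsum('i,j->ij', u, v)     # degree 1 and 1 -> degree 2

# Contraction by a repeated index (summed because it is absent from the output):
dot = np.einsum('i,i->', u, u)      # degree 1 and 1 -> degree 0

# Generalized transposition by reordering indices:
perm = np.einsum('ijk->kij', T)     # degree 3 -> degree 3
assert perm.shape == (4, 2, 3)
```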
In numpy, einsum is for general computation rather than for expressing something geometric, so one doesn't need the restrictions on the number of times an index occurs. Instead I guess the rule is something like:
- if an index appears only on the lhs, sum over it
- if an index appears on both the lhs and the rhs, don't sum
- if an index appears only on the rhs, or is repeated on the rhs, it's an error
And computationally I guess it’s something like (1) figure out output shape and (2):
    for output_index of output:
        s = 0
        for summed_index of summed_indexes:   # indices that appear only on the lhs
            p = 1
            for (input, input_indexes) of (inputs, lhs):
                p = p * input[input_indexes(output_index, summed_index)]
            s = s + p
        output[output_index] = s
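Here is a runnable version of that loop nest in plain Python/NumPy (just a toy to pin down the semantics; the name `naive_einsum` is mine, and real implementations are far cleverer about contraction order):

```python
import itertools
import numpy as np

def naive_einsum(spec, *operands):
    """Toy version of the loops above: for every output index, sum the
    product of the matching input entries over the lhs-only indices."""
    lhs, rhs = spec.replace(' ', '').split('->')
    terms = lhs.split(',')

    # Map every index letter to its axis length.
    sizes = {}
    for term, op in zip(terms, operands):
        for letter, dim in zip(term, op.shape):
            sizes[letter] = dim

    out_letters = list(rhs)
    sum_letters = sorted(set(''.join(terms)) - set(rhs))

    out = np.zeros([sizes[l] for l in out_letters])
    for out_idx in itertools.product(*(range(sizes[l]) for l in out_letters)):
        env = dict(zip(out_letters, out_idx))
        total = 0.0
        for sum_idx in itertools.product(*(range(sizes[l]) for l in sum_letters)):
            env.update(zip(sum_letters, sum_idx))
            p = 1.0
            for term, op in zip(terms, operands):
                p *= op[tuple(env[l] for l in term)]
            total += p
        out[out_idx] = total
    return out

A, B = np.random.rand(2, 3), np.random.rand(3, 4)
assert np.allclose(naive_einsum('ij,jk->ik', A, B),
                   np.einsum('ij,jk->ik', A, B))
```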
> an even number is equivalent to reapplying the ‘twice’ rule many times
I had never heard of that extension, and in my opinion it is a bad idea. One nice property of the Einstein summation notation is that you can reorder the tensors (with their indices) and the result does not change. If you allow 4x repetitions, this no longer holds.
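The reordering property is easy to check in NumPy:

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

# Reordering the operands (together with their index strings) does not
# change the result:
assert np.allclose(np.einsum('ij,jk->ik', A, B),
                   np.einsum('jk,ij->ik', B, A))
```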
---
Also, a nice property of tensors written on paper is that you can write the indices as subscripts or superscripts to remember how they change under a change of basis. You can only collapse a subscript with a superscript, so the result of the operation does not depend on the choice of basis. (It would be very nice if Python could remember and check this too, to avoid silly errors.)
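A toy sketch of what such a check could look like in Python (entirely hypothetical, not an existing library): tag each axis as a subscript or superscript and refuse to contract two of the same kind.

```python
import numpy as np

# Hypothetical sketch: 'lower'/'upper' tags per axis; contraction is only
# allowed between one lower and one upper index.
class Tensor:
    def __init__(self, data, variances):
        self.data = np.asarray(data)
        self.variances = variances          # e.g. ('upper', 'lower')

def contract(a, axis_a, b, axis_b):
    if a.variances[axis_a] == b.variances[axis_b]:
        raise ValueError("can only collapse a subscript with a superscript")
    data = np.tensordot(a.data, b.data, axes=(axis_a, axis_b))
    variances = (tuple(v for i, v in enumerate(a.variances) if i != axis_a)
                 + tuple(v for i, v in enumerate(b.variances) if i != axis_b))
    return Tensor(data, variances)

A = Tensor(np.random.rand(3, 3), ('upper', 'lower'))   # a linear map
v = Tensor(np.random.rand(3), ('upper',))              # a vector
Av = contract(A, 1, v, 0)       # fine: lower meets upper
# contract(A, 0, v, 0)          # would raise: upper meets upper
```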
So a vector isn't a vector, a tensor isn't a tensor, einsum isn't actually Einstein summation, automatic differentiation often gives derivatives for things that aren't differentiable, a neural network has nothing to do with neurons, etc etc etc
CS people think it's a crime that humanities people make up words in order to publish but they're really living in an extremely glassy glass house.
This also drives me mad reading many a computer science paper now: the disrespect for the ordinary meaning of words is matched only by the most radical kinds of poetry. It feels wholly out of place in anything called a 'science', and leads to endless confusion.
There are none in all of academia so brazen in a poetic use of language and at the same time so incurious as to its meaning.
In some ways CS is more correct here, using the modern (abstract) algebra definition of a vector rather than the historical, didactic 'arrow' concept.
'Tensor' has been abused; it is a property of bilinear maps, and since matrix multiplication is a bilinear operation, there is some conceptual overlap there.
Vectors are really just elements of a vector space, not necessarily arrows.
Among other things, that would make a lot of code very convenient -- including graph algorithms. This is the idea behind GraphBLAS: https://graphblas.org/
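For a flavour of the idea, one step of BFS is a vector-matrix product over a boolean-ish semiring (a NumPy approximation of the concept, not the actual GraphBLAS API):

```python
import numpy as np

# Adjacency matrix of a small directed graph: A[i, j] = True if i -> j.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=bool)

frontier = np.array([True, False, False, False])   # start BFS at node 0
visited = frontier.copy()

while frontier.any():
    # One BFS step: which nodes are reachable from the current frontier,
    # computed as a vector-matrix product, minus the already-visited ones.
    frontier = ((frontier.astype(int) @ A.astype(int)) > 0) & ~visited
    visited |= frontier

print(np.flatnonzero(visited))   # reachable nodes: [0 1 2 3]
```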