We finally wanted to encrypt the L2 links between our DCs and got quotes from a number of providers for hardware appliances. I was like, "no WAY this ought to cost that much!", and went off to try to build something myself that hauled Ethernet frames over a wireguard overlay network at 10Gbps using COTS hardware. I did pull it off after a tenday of work or so, undercutting the cheapest offer by about 70% (and the most expensive one by about 95% or so...), but there was a lot of intricate reading and experimentation involved.
I am looking forward to validating my understanding against the content of this article - it looks very promising and comprehensive at first and second glance! Thanks for creating and posting it.
https://github.com/m13253/VxWireguard-Generator
https://gitlab.com/NickCao/RAIT
Both build a set of Wireguard configurations so you can set up an L2 mesh, and then run whatever routing protocol you want on them (Babel, BGP, etc.)
(not the OP, but I use the first one in my own multi-site network mesh between DO, AWS, 2x physical DC, and our office.)
At the "outside", there are two NICs with SFP+ ports that are connected via single-mode optical fiber that runs through the city - let's call these NICs eth0 on each of their nodes. The eth0 interfaces have RFC1918 IP addresses assigned and can talk IP with each other. Between those nodes, a wireguard instance encrypts traffic in an inner RFC1918 network of its own - that is wg0 on each node. (Initially, I had IPv6 ULA networks prepared for these two purposes, but afaict there's some important offload support missing for IPv6 in Linux still, and performance was quite severely hampered by that.) Then, each of the nodes defines a GRETAP netdev that has, as its endpoint, the peer's wireguard interface address - that interface is grt0.
Finally, on each side, another NIC SFP+ port (let's assume eth1) plugs into the local switch uplink port using a DAC. eth1 is configured in promiscuous mode, and some `tc-mirred(8)` magic makes sure every frame it receives gets replayed over grt0, and every frame that is received via grt0 gets replayed over eth1.
So it kinda looks like this in a (badly "designed") ad-hoc ASCII graph:
[SWITCH]-<dac>-[ETH1]-<tc>-[GRT0]-[WG0]-[ETH0]-<fiber>-...
... with the whole shebang replicated once more, but in reverse, on the right-hand side of the <fiber> cable/element.
An earlier iteration I (briefly ;)) had in operation featured a Linux bridge instead of tc, but it quickly turned out that this won't work with a few L2 protocols that we unfortunately need in operation across these links (and group_fwd_mask won't cut it for them either, so patching the kernel would have been necessary), while tc-mirred can actually replay L2 traffic without any restrictions.
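For anyone who wants to try the layering described above, here is a rough sketch of what the per-node wiring could look like. It only prints the `ip` and `tc` commands (a dry run, nothing is executed). This is not the poster's actual configuration: the addresses 10.99.0.1 and 10.99.0.2, the helper name `l2_tunnel_commands`, and the choice of a `matchall` filter with a `mirred egress redirect` action are all assumptions, and wg0 is assumed to be up already.

```python
from typing import List


def l2_tunnel_commands(lan_if: str, tun_if: str,
                       local_wg_ip: str, remote_wg_ip: str) -> List[str]:
    """Return the shell commands for one node (dry run, nothing is executed).

    Hypothetical helper: interface names and addresses are illustrative.
    """
    cmds = [
        # GRETAP device whose endpoints are the WireGuard interface addresses.
        f"ip link add {tun_if} type gretap local {local_wg_ip} remote {remote_wg_ip}",
        f"ip link set {tun_if} up",
        # The switch-facing port must accept frames for any MAC address.
        f"ip link set {lan_if} promisc on",
        f"ip link set {lan_if} up",
    ]
    # Cross-connect both directions: whatever arrives on one device is
    # sent out of the other one, without a bridge in between.
    for src, dst in ((lan_if, tun_if), (tun_if, lan_if)):
        cmds.append(f"tc qdisc add dev {src} handle ffff: ingress")
        cmds.append(
            f"tc filter add dev {src} parent ffff: protocol all matchall "
            f"action mirred egress redirect dev {dst}"
        )
    return cmds


if __name__ == "__main__":
    for cmd in l2_tunnel_commands("eth1", "grt0", "10.99.0.1", "10.99.0.2"):
        print(cmd)
```

On the peer node the same commands would apply with the two addresses swapped. MTU handling for the GRE and wireguard overhead is left out of this sketch.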
So the only chance of running any of the commands in the article is when playing around with my own systems. I guess they would be useful too if I were working as a platform engineer.
This is one of the things I don’t like much about kubernetes: the networking model assumes you only have one NIC (like 99.99999% of cloud instances from cloud providers) and that your application is dumb enough not to need knowledge of anything beneath.
The whole networking model could really get a 2020-era overhaul for simplification and improvement.
and then we have `net.ipv4.tcp_wmem`, which brings up two questions: 1. why is there no IPv6 equivalent, and 2. what's the difference from `net.core.wmem_max`?
net.ipv4.tcp_wmem is a triple value, with minimum, default and maximum values. The maximum given here cannot exceed the previous value.
TCP is a protocol that should be the same regardless of whether it is transported by IPv4 or by IPv6.
See e.g.
https://docs.redhat.com/en/documentation/red_hat_data_grid/7...
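To look at the two settings side by side on a Linux box, a small sketch like the following could help. It parses the min/default/max triple of `net.ipv4.tcp_wmem` and prints it next to `net.core.wmem_max`. The helper names `parse_tcp_wmem` and `read_sysctl` are made up for this example, and the values are read from `/proc/sys`, which may be incomplete inside containers.

```python
from pathlib import Path
from typing import NamedTuple


class TcpWmem(NamedTuple):
    minimum: int
    default: int
    maximum: int


def parse_tcp_wmem(text: str) -> TcpWmem:
    """Parse the whitespace-separated 'min default max' triple (bytes)."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected three values, got {len(fields)}")
    return TcpWmem(*(int(f) for f in fields))


def read_sysctl(name: str) -> str:
    """Read a sysctl such as 'net.core.wmem_max' via /proc/sys."""
    return Path("/proc/sys", *name.split(".")).read_text().strip()


if __name__ == "__main__":
    try:
        wmem = parse_tcp_wmem(read_sysctl("net.ipv4.tcp_wmem"))
        core_max = int(read_sysctl("net.core.wmem_max"))
    except OSError as exc:
        # Not Linux, or the sysctl is not exposed in this namespace.
        print(f"could not read sysctls: {exc}")
    else:
        print(f"net.ipv4.tcp_wmem: min={wmem.minimum} "
              f"default={wmem.default} max={wmem.maximum}")
        print(f"net.core.wmem_max: {core_max}")
```

As far as I know there is only the one `tcp_wmem` setting under `net.ipv4`, and it is used for TCP sockets over IPv6 as well, which fits the point that TCP itself does not change with the IP version underneath.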
0. Cache invalidation
1. Naming things
2. Off by one errors