Or, to put it another way, you can make anything sound good if you consider only the positives and anything sound bad if you consider only the negatives. Analog computing sounds amazing when you read the brochure and consider only the positives. But when you bring the negatives back in, it makes sense why it is not frequently used. It is not a case of the mainstream keeping some great idea down because, uh, Big Digital or something; it's a case where digital computing turns out to be a stonking good idea, it's hard for the analog world to compete, and it's virtually impossible for analog to ever be anything but a niche.
Neural networks are an interesting possibility for a future successful niche, although even so, it would be neural networks specifically that may grow in importance and not analog computing in general. And I still wouldn't guarantee it'll be a good idea... we may have a lot of trouble keeping what would be very deeply nested analog circuitry stable in the real world and digital may still win out, e.g., an analog neural net that has a noticeable personality shift when it gets warmer may not be the best engineering solution. That's a question for 20 or 30 years from now.
But if you want an even better exercise, sure, work out how to send, say, precise stock market transactions where a "$5.04" becoming a "$5.05" is a very big deal to some people who have lots of money, and work out a mechanism for verifying the integrity of a lot of such data efficiently.
Bear in mind "efficiently" in this case includes the idea that "$5.04" and "$5.05" are actually close together, not separated by quite a lot of signal bandwidth. It should be of a similar size to the current digital world where that is a single bit; if you're throwing more bandwidth at your representation you've already lost to digital. Or to put it in analog terms, that needs to be pretty close to the noise barrier already; it's not a solution to make it so that you end up with "$5.04 +/- 0.000001" and "$5.05 +/- 0.000001" as the two signals you send. That is, after all, what digital is in the first place: All signals are analog in the end, and we send 0s and 1s with enough separation that the receiver can then re-amplify them into a 0 or a 1 without loss. It's not really analog if you're not hard up against the noise floor, it's just digital wearing an analog wig.
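To make the "re-amplify into a 0 or a 1 without loss" point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the hop count, the Gaussian noise model, the noise level and the function names are hypothetical, not measurements of any real link.

```python
import random

def relay_analog(value, hops, noise, rng):
    """Pass a real-valued signal through `hops` noisy copies with no cleanup."""
    for _ in range(hops):
        value += rng.gauss(0.0, noise)  # each copy adds a little noise that is never removed
    return value

def relay_digital(bit, hops, noise, rng):
    """Send a bit as level 0.0 or 1.0 and re-threshold it at every hop."""
    level = float(bit)
    for _ in range(hops):
        received = level + rng.gauss(0.0, noise)
        level = 1.0 if received > 0.5 else 0.0  # regeneration: snap back to a clean level
    return int(level)

rng = random.Random(0)   # fixed seed so runs are repeatable
HOPS, NOISE = 36, 0.05   # hypothetical: 36 copies, noise sigma = 5% of the 0-to-1 swing

analog_out = relay_analog(5.04, HOPS, NOISE, rng)
bits_in = [rng.randint(0, 1) for _ in range(1000)]
bits_out = [relay_digital(b, HOPS, NOISE, rng) for b in bits_in]
bit_errors = sum(a != b for a, b in zip(bits_in, bits_out))

print(f"analog 5.04 arrived as {analog_out:.4f}")
print(f"digital bit errors: {bit_errors} of {len(bits_in)}")
```

The wide separation between the two levels is what makes the thresholding step reliable; shrink that separation toward the noise level and the digital path starts making errors too, which is the bandwidth trade-off described above.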
If analog is supposedly "better" than digital... at least, for the sake of argument, I recognize you did not make that claim... that would include being able to do some of these things that we do in the digital world quite comfortably. If it's just a niche... well, that's exactly where we are now in the world anyhow.
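One of the things the digital world does "quite comfortably" is the integrity check from the exercise above. A rough sketch, assuming SHA-256 as the digest and a made-up record format (the `digest` helper, the order strings and the ticker are all hypothetical):

```python
import hashlib

def digest(records):
    """Return one SHA-256 hex digest covering every record, in order."""
    h = hashlib.sha256()
    for record in records:
        h.update(record.encode("utf-8"))
        h.update(b"\n")  # separator so record boundaries are part of the hash
    return h.hexdigest()

# Hypothetical batch of 10,000 transactions.
trades = [f"ORDER {i} BUY 100 XYZ @ $5.04" for i in range(10_000)]

# Same batch with a single price nudged by one cent.
tampered = list(trades)
tampered[7_321] = tampered[7_321].replace("$5.04", "$5.05")

print(digest(trades) == digest(list(trades)))  # an exact copy yields the same digest
print(digest(trades) == digest(tampered))      # the one-cent change yields a different digest
```

The check costs 32 bytes no matter how large the batch is, and it only works because the copy is expected to be bit-for-bit identical.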
Quite a lot of real-world data is quite digital in nature. This message I'm posting is intrinsically digital. Even if I were to write something by hand, we all know, and knew even before computers, that the essence of that message is captured by a stream of letters. When we read the Gettysburg Address it doesn't even occur to us to worry about the theoretically vast amount of information we lose by not having the handwritten original. While those originals can be of historical interest, we all know the payload is in the digital stream of letters and words. You have probably never worried about whether the nuances of Hacker News posts are lost because they're typed in a fixed-width font but displayed in a proportional one, because the digital text carries the vast majority of the content. Even in an "analog world" there is no escape from quite a lot of digital-characteristic data. And there is no escape from the many, many issues with truly analog solutions, such as the inability to copy data without loss. This text has undergone literally dozens of copies by the time it gets from my keyboard to your eyes, and that would drive the analog world insane. You either accept much more degradation than any of us are used to, or dedicate much, much larger amounts of bandwidth to everything in a way we would find horribly inefficient in our real world.
Those problems you mention are important in music synthesis, where people could live with limited reconfigurability but reliability is at a premium: synth players in early touring bands (e.g. Yes) had to be electronics technicians, and instruments had to survive being packed in boxes and transported everywhere. The Yamaha DX7 made FM synthesis mainstream because digital FM synthesis was absolutely reliable.
20 B200 hours for CIFAR-10 seems like a lot...
The question of which physical or electronic phenomenon provides the most efficient yet sufficiently large function space for inference is a really good one to think about. I have no suggestions.
You’ve got to wonder: when you have an image generation demo, why would you possibly choose 64 x 64 pixel output?
If I’m understanding this properly, to generate a 4K image you need something like 5 trillion point-to-point connections on the chip. Even if power use from the oscillators is zero, that’s going to be an issue.
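For a feel of how the wiring count grows, here is a back-of-envelope sketch. It assumes one oscillator per pixel and all-to-all coupling, neither of which is confirmed by the paper, and the helper names are made up. The exact figure moves a lot with those assumptions; under them, a 4K frame lands in the tens of trillions of links.

```python
def pairwise_links(n):
    """Number of distinct point-to-point links among n fully connected nodes."""
    return n * (n - 1) // 2

def fixed_degree_links(n, degree):
    """Number of links if every node is limited to `degree` neighbours."""
    return n * degree // 2

demo_pixels = 64 * 64        # the demo resolution: 4,096 pixels
uhd_pixels = 3840 * 2160     # a 4K UHD frame: 8,294,400 pixels

print(f"64x64, all-to-all : {pairwise_links(demo_pixels):,}")
print(f"4K,    all-to-all : {pairwise_links(uhd_pixels):,}")
print(f"4K,    degree 100 : {fixed_degree_links(uhd_pixels, 100):,}")  # hypothetical degree cap
```

The quadratic term is the whole problem: capping the degree turns it into linear growth, at the cost of whatever the dropped couplings were contributing.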
These are cool results, but I was disappointed not to find any discussion of where oscillator array technology stands today or what the manufacturing challenges and opportunities might be. It seems like it would be prohibitively expensive for anything beyond minimal networks of a few hundred nodes that could be used in sensors. Even if you have perfectly consistent oscillators that synchronize to each other within very fine tolerances, wiring them up to each other is still a massive headache.
But specifically what they’ve simulated here? I don’t see how that would ever work in real life scaled up to any kind of real size.
I’m not criticizing them for starting out small. Lots of things can be proven with small models. I’m saying in principle, I don’t see how this will work unless there’s some fundamentally new technique that is currently not known about. Maybe they have some secret idea but they haven’t shown it here.
Are they allowing all oscillators to influence all others, or are they picking modalities where the influences can be limited to some maximal fixed degree?
One would imagine that there'd be a variety of different topologies available to explore. Even if during training the treatment was fully connected, one could imagine the training itself biasing towards a maximal fixed degree per oscillator, and then inference later operating on a quantized version of that that drops the low-weight influences to zero.
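The "drop the low-weight influences to zero" idea can be sketched as a top-k prune of a coupling matrix. This is only an illustration of the suggestion, not the paper's method: the matrix values, the function name and the choice to prune each row independently (so the result need not stay symmetric) are all assumptions.

```python
def prune_to_max_degree(weights, k):
    """Keep only the k largest-magnitude couplings in each row; zero the rest."""
    pruned = []
    for i, row in enumerate(weights):
        # Candidate neighbours are every other oscillator (no self-coupling).
        candidates = [j for j in range(len(row)) if j != i]
        strongest = sorted(candidates, key=lambda j: abs(row[j]), reverse=True)[:k]
        keep = set(strongest)
        pruned.append([w if j in keep else 0.0 for j, w in enumerate(row)])
    return pruned

# Hypothetical trained couplings for four oscillators (row i = influences on oscillator i).
couplings = [
    [0.00, 0.90, -0.05, 0.02],
    [0.80, 0.00, 0.01, -0.70],
    [0.03, -0.60, 0.00, 0.50],
    [0.04, 0.02, 0.75, 0.00],
]

sparse = prune_to_max_degree(couplings, k=2)
for row in sparse:
    print(row)
```

Whether accuracy survives this kind of pruning is exactly the open question; the sketch only shows that the wiring budget per oscillator becomes a fixed number.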
Do you mean that they may get away with fewer oscillators because of the decoder layer? Well, there’s the rub, isn’t it: the more work you have done by a software layer, the less power you’ve proportionally saved by having it done by physical computing.
But let’s spitball here: what would you estimate would be needed, in number of oscillators and interconnects, for a 4K image?
One thing I'm unclear on is that their total parameter count scales similarly to conventional models but many of those conventional models incorporate convolutions. I wonder how interconnect count (as opposed to unique parameters) compares to performance?
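As a rough illustration of the gap between unique parameters and interconnects, here is the arithmetic for a single convolution versus a dense layer. The layer shapes are invented for the example (stride 1, "same" padding, padded positions counted as connections) and say nothing about the models in the paper.

```python
def conv_counts(height, width, c_in, c_out, kernel):
    """Return (unique parameters, interconnects) for one stride-1, same-padded conv layer."""
    unique_params = kernel * kernel * c_in * c_out
    # Every output position reuses the same weights, but each reuse is its own physical link.
    interconnects = height * width * unique_params
    return unique_params, interconnects

def dense_counts(n_in, n_out):
    """Return (unique parameters, interconnects) for a fully connected layer (no bias)."""
    links = n_in * n_out
    return links, links  # no weight sharing, so the two counts coincide

conv_params, conv_links = conv_counts(64, 64, c_in=3, c_out=16, kernel=3)
dense_params, dense_links = dense_counts(64 * 64 * 3, 64 * 64 * 3)

print(f"conv : {conv_params:,} parameters, {conv_links:,} interconnects")
print(f"dense: {dense_params:,} parameters, {dense_links:,} interconnects")
```

In software, weight sharing makes the convolution cheap to store; in a physical network each reused weight would presumably still need its own connection, which is why the two counts can tell different stories.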
As to 4K images, I'm not clear how much farther their current architecture would be expected to scale. Single-layer networks aren't parameter efficient compared to deep networks; I'd naively assume that applies here too. That said, given their results so far with what amounts to a single layer, the naive assumption begins to seem questionable.
We can implement coupled oscillators in hardware but are the couplings and frequencies programmable? If they're being streamed in I guess you'd still have a memory bandwidth bottleneck and associated energy usage. If not then the fair comparison is to a conventional model hardcoded in an ASIC which AFAIU is actually quite energy efficient.