Super impressive-looking demo; it works well on my older iPhone.
As an only-dabbling hobbyist game developer who lacks a lot of 3D programming knowledge, the only feedback I can offer is that you might define what "Gaussian Splatting" is somewhere on the GitHub repo or the website. Just the one-liner from Wikipedia helps me get more excited about the project and its potential uses: Gaussian splatting is a volume rendering technique that deals with the direct rendering of volume data without converting the data into surface or line primitives.
Super high performance clouds and fire and smoke and such? Awesome!
The food scans demo (in the "Interactivity" examples section) is incredible, especially looking into the holes in the bread on Mel's Steak Sandwich.
The performance seems amazingly good for the apparent level of detail, even on my integrated graphics laptop. Where is this technique most commonly used today?
There's a community of people passionate about scanning all sorts of stuff with handheld devices, drones... Tipatat generously let us use his food scans for the demo. I also enjoy kotohibi's flower scans: https://superspl.at/user?id=kotohibi
I'm sure it's not cutting edge, but the app Scaniverse generates some very nice splats just from waving your phone around an object for a minute or so.
BabylonJS and the OP's own A-Frame [1] seem to have similar licenses and similar numbers of GitHub stars and forks, although A-Frame seems newer and more game / VR focused.
How do Babylon, A-Frame, Three.js, and PlayCanvas [2] compare, from those who have used them?
IIUC, PlayCanvas is the most mature, featureful, and performant, but it's commercial. Babylon is the featureful 3D engine, whereas Three.js is fairly raw: though it has some nice stuff for animation, textures, etc., you're really building your own kit.
Any good experiences (or bad) with any of these?
OP, your demo is rock solid! What's the pitch for A-Frame?
How do you see the Gaussian splat future panning out? Will these be useful for more than visualizations and "digital twins" (in the industrial setting)? Will we be editing and animating them at any point in the near future? Or to rephrase: when (or will) they be useful for the creative and gaming fields?
A-Frame is an entity-component system on top of THREE.js that uses the DOM as a declarative layer for the scene graph. It can be manipulated using the standard APIs and tools that Web developers are used to. The initial target was onboarding Web devs into 3D, but it found success beyond that. The super low barrier to entry (hello world below) without sacrificing functionality made it very popular both with people learning programming / 3D (it's part of the curriculum in many schools and universities) and in advanced scenarios (moonrider.xyz, with ~100k MAUs (300k at peak) the most popular WebXR content to date, is made with A-Frame).
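The canonical A-Frame hello world, for reference (pin whatever release version you like):

    <html>
      <head>
        <script src="https://aframe.io/releases/1.7.0/aframe.min.js"></script>
      </head>
      <body>
        <a-scene>
          <a-box position="-1 0.5 -3" rotation="0 45 0" color="#4CC3D2"></a-box>
          <a-sphere position="0 1.25 -5" radius="1.25" color="#EF2D5E"></a-sphere>
          <a-plane position="0 0 -4" rotation="-90 0 0" width="4" height="4" color="#7BC8A4"></a-plane>
          <a-sky color="#ECECEC"></a-sky>
        </a-scene>
      </body>
    </html>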
One of the Spark goals is exploring applications of 3D Gaussian splatting. I don't have all the answers yet, but compelling use cases are already developing quickly, e.g. photogrammetry / scanning, where splats represent high-frequency detail in an appealing and relatively compact way, as you can see in one of the demos (https://sparkjs.dev/examples/interactivity/index.html). There are great examples of video capture already (https://www.4dv.ai/). Looking forward to seeing new applications as we figure out better compression, streaming, relighting, generative models, LOD...
When you say that PlayCanvas is commercial, that's a little misleading. The PlayCanvas Engine (analogous to Three.js and Babylon.js) is free and open source (MIT). The PlayCanvas Engine is where you'll find all the cool 3DGS tech. There are two further frameworks that wrap the Engine (for those that prefer to use a declarative interface): PlayCanvas Web Components and PlayCanvas React. Again, both of these are free and open source (MIT). Only the PlayCanvas Editor (analogous to a browser-based Unity) has optional payment plans (for those that want to create private projects).
I did a test study in BabylonJS, and generally the subset of compatible features is browser-specific.
The good:
1. The Blender plugin for exporting baked mesh animations as streamable assets is cool.
2. Procedural texture tricks combined with displacement maps make reasonable-looking in-game ocean/water possible with some tweaking.
3. Adding 2D sprite swap-out for distant objects is trivial (think Paper Mario style); see the sketch after this list.
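A hedged sketch of that swap-out (mesh and texture names are illustrative, not from my study):

    // Swap a detailed mesh for a billboard sprite past a distance threshold.
    const impostors = new BABYLON.SpriteManager("impostors", "tree.png", 100, 256, scene);
    const treeSprite = new BABYLON.Sprite("treeImpostor", impostors);
    treeSprite.position = treeMesh.position;

    scene.onBeforeRenderObservable.add(() => {
      const far = BABYLON.Vector3.Distance(
        scene.activeCamera.position, treeMesh.position) > 50;
      treeMesh.setEnabled(!far);   // hide the real mesh when far away
      treeSprite.isVisible = far;  // show the 2D stand-in instead
    });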
The bad:
1. Burns GPU VRAM far faster than normal engines (dynamic paint bloats up fast when duplicating aliases, etc.).
2. JS burns CPU cycles, but the WASM support is reasonable for physics/collision.
3. All resources are exposed to end users (expect unsophisticated cheaters/cloners).
The ugly:
1. Mobile GPU support on 90% of devices is patchwork.
2. Baked lighting YMMV (we tinted the GPU smoke VFX to cheat volumetric scattering).
3. In-browser games essentially combine the worst aspects of browser memory waste and security-sandbox issues (audio sync is always bad in browser games).
Anecdotally, I would only recommend the engine for server-hosted transactional games (e.g. cards or board games could be a good fit).
Otherwise, if people want something that is performant and doesn't look awful... then just use the Unreal Engine and hire someone who has mastered efficient shader tricks. =3
Personally, I have been using Babylon.js for five years, and I just love it. For me it's so easy to program (cleanest API I have ever seen), and my 3D runtime is so light that my demos work fine even on my Android phone.
Web browsers add a lot of unnecessary overhead, and require dancing with quarterly changes in policies.
In general, iOS devices are forced to use/link Apple's proprietary JS VM implementation. While Babylon makes it easier, it has often had features nerfed by both Apple's iOS and Alphabet's Android. In the former case this is driven by the App Store walled garden, and in the latter by device-design fragmentation.
I like Babylon in many ways too, but we have to acknowledge the deployment limitations that impact end users. People often end up patching after every update Mozilla/Apple/Microsoft pushes.
Thus, it's difficult to deploy something unaffected by platform-specific codecs, media syncing, and interface-hook shenanigans.
This coverage issue is trivial to handle in Unity, Godot, and Unreal.
The App Store people always want their cut, and will find convenient excuses to nudge that policy. It is the price of admission on mobile... YMMV =3
One component of my hobby web app project is a wavetable. Below are two examples of wavetables. I want it to not tax the browser, so that other, latency-sensitive components do not suffer.
Would you have any suggestions on what JS/TS package to use? I built a quick prototype in three.js but I am neither a 3D person nor a web dev, so I would appreciate your advice.
* Recall that fixed-rate producers/consumers should lock relative phase when the garbage collector decides to ruin your day; things like software FIR filters are also fine, and a single-threaded, pre-mixed output stream will eventually buffer through whatever abstraction the local users have set up (i.e. while the GC does its thing... playback sounds continuous).
Inside a VM we are unfortunately at the mercy of the garbage collector and of whatever assumptions JIT-compiled languages make. Yet WASM should be able to push I/O transfers fast enough for software mixers on modern CPUs.
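As a sketch of that pre-mixed buffering idea, using the standard AudioWorklet API (the ring size and message protocol here are illustrative choices, not a recommendation):

    // mixer-processor.js -- runs on the audio rendering thread.
    // A producer on the main thread posts pre-mixed Float32Array chunks;
    // GC pauses there won't glitch playback while the ring stays ahead.
    class MixerProcessor extends AudioWorkletProcessor {
      constructor() {
        super();
        this.ring = new Float32Array(8192);
        this.readIdx = 0;
        this.writeIdx = 0;
        this.port.onmessage = (e) => {
          for (const s of e.data) {
            this.ring[this.writeIdx++ % this.ring.length] = s;
          }
        };
      }
      process(inputs, outputs) {
        const out = outputs[0][0];
        for (let i = 0; i < out.length; i++) {
          // On underrun, emit silence rather than stalling the graph.
          out[i] = this.readIdx < this.writeIdx
            ? this.ring[this.readIdx++ % this.ring.length]
            : 0;
        }
        return true;
      }
    }
    registerProcessor("mixer-processor", MixerProcessor);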
Cool work, but I have to say the performance is pretty bad in Firefox on my laptop with an Nvidia RTX A3000 GPU. There are enough shader cores here to cause first-degree burns.
Do you have any insights into the current performance bottlenecks? Especially around dynamic scenes. That particle simulation one seems to struggle but then improves dramatically when the camera is rotated, implying the static background is much heavier than it appears.
And as a counterpoint to the bottlenecks: that procedurally generated Sierpinski pyramid is brilliant.
The number of splats in the scene and their distribution have an impact on performance. Probably in your case you turned the camera in a direction with fewer splats. There's definitely work to do to deliver consistent performance. We'll probably look into an LOD system next.
I'm still highly skeptical of Gaussian splatting as anything more than a demo. The files are too large. The steak sandwich is 12 MB (as just one example).
There was a Gaussian-splat-based Matterport clone at last year's SIGGRAPH. Viewing a two-bedroom apartment required streaming 1.5 GB.
Thanks! Notice the 12 MB steak sandwich is the biggest of them all. The rest are < 10 MB, and several of the very compelling ones are in the 1-3 MB range (e.g. Iberico Sandwich 1 MB, Clams and Caviar 1.8 MB).
Fancier compression methods are coming (e.g. SOGS). This is 30 MB!
How much of the huge file size is because you need tons of splats to simulate a hard surface? Conceptually, splats seem flawed because Gaussians don't have hard edges: they literally go to infinity in all directions, just at vanishingly small densities. So practically everybody cuts them off at 3 sigma or something, which covers 99.7% of the volume. But real-world objects have hard edges, and splats don't.
Would the format work better if you made that cut-off at something like 1 sigma instead? Then instead of these blurry blobs you'd effectively be rendering ovals with hard edges. I speculate out loud that maybe you could get a better render with fewer hard-edged ovals than tons of blurry blobs.
It's an interesting idea, and with Spark you could test this by adjusting the maxStdDev parameter to control how far out each splat is drawn.
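If you want to try it, a sketch (assuming maxStdDev is exposed as a SparkRenderer option, per my reading of the docs; the package name may differ):

    import { SparkRenderer } from "@sparkjsdev/spark";

    // Clip each Gaussian at 1 standard deviation instead of the usual ~3,
    // turning soft blobs into hard-edged ellipses.
    const spark = new SparkRenderer({ renderer, maxStdDev: 1.0 });
    scene.add(spark);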
I agree with you that in general 3DGS is a worse representation for hard, flat, synthetic things with hard edges. But on the flip side, I would argue it's a better representation for many organic, real-world things; imagine fur or hair or leaves on a tree... These are things that can render beautifully and photorealistically in a way that would otherwise require much, much more complex polygon geometry, texturing, and careful sorting and blending of semi-transparent texels. This is one reason why 3DGS has become so popular in scanning and 3D reconstruction: you just get much better results with smaller file sizes. When 3DGS first appeared, everyone was shocked by how photorealistically you could render things in real time on a mobile device!
But one final thought I want to add: with Spark it's not an either/or. You can have BOTH in the same Three.js scene and they will blend together perfectly via the Z-buffer. So you can scan the world around you and render it with 3DGS, and then insert your hard-edged robot character polygon meshes right into that world, and get the best of both!
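A minimal sketch of that mixing (SplatMesh and the package name are my reading of the Spark docs; treat the details as assumptions):

    import * as THREE from "three";
    import { SplatMesh } from "@sparkjsdev/spark";

    // A scanned 3DGS environment plus an ordinary polygon mesh in one scene;
    // the Z-buffer composites the two correctly.
    scene.add(new SplatMesh({ url: "scanned-room.spz" }));

    const robot = new THREE.Mesh(
      new THREE.BoxGeometry(1, 2, 1),
      new THREE.MeshStandardMaterial({ color: 0x888888 })
    );
    scene.add(robot);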
Cool - thanks for explaining that. I totally see how each has its place.
I imagine it's pretty complex to take the raw scan data and generate 3DGS. Are these algorithms simple and standard, or do they take a fair amount of tuning and tweaking to do a good job? Adapting them to work well with hard-edged ovals seems like it would take some work, and a lot more work to get them to output a mix of ovals and fuzzy blobs. But if you could do that, I agree the combination would be amazingly expressive.
There are a lot of tools to do this easily today, for free! Take a look at Postshot or Brush. You can literally take a video with your mobile phone, toss it into Postshot, and a few minutes later you have a photorealistic 3DGS model you can use in Spark!
3DGS is still a rapidly evolving research field, but the "baseline" is pretty much standard these days.
The SOGS compression technique works well. You can get 1M Gaussians with full spherical harmonics in about 14 MB. There's a good article about it on the PlayCanvas blog:
Wish I could see this! My iPhone 16 blocked viewing because of, I think, an expired certificate. At least, that's the error I think I got initially; now it just says the page belongs to a category that is blocked. :(
How do you do the rendering? Is it sorted (radix?) instances? Do you amortize the sorting over a couple frames? Or use some bin sorting? Are you happy with the performance?
Yes, Spark does instanced rendering of quads, one covering each Gaussian splat. The sorting is done by 1) calculating sort distance for every splat on the GPU, 2) reading it back to the CPU as float16s, 3) doing a 1-pass bucket sort to get an ordering of all the splats from back to front.
On most newer devices the sorting can happen pretty much every frame with approx 1 frame latency, and runs in parallel on a Web Worker. So the sorting itself has minimal performance impact, and because of that Spark can do fully dynamic 3DGS where every splat can move independently each frame!
On some older Android devices it can be a few frames worth of latency, and in that case you could say it's amortized over a few frames. But since it all happens in parallel there's no real impact to the overall rendering performance. I expect for most devices the sorting in Spark is mostly a solved problem, especially with increasing memory bandwidth and shared CPU-GPU memory.
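For intuition, a sketch of what such a 1-pass bucket (counting) sort can look like on the CPU (my illustration, not Spark's actual code); it exploits the fact that positive float16 values compare the same as their raw uint16 bit patterns:

    // keys: Uint16Array of per-splat float16 distance bits (positive values)
    // Returns a Uint32Array of splat indices ordered back to front.
    function bucketSortSplats(keys) {
      const counts = new Uint32Array(1 << 16);
      for (let i = 0; i < keys.length; i++) counts[keys[i]]++;
      // Prefix-sum from the largest bucket down, so bigger distances come first.
      let offset = 0;
      for (let b = counts.length - 1; b >= 0; b--) {
        const c = counts[b];
        counts[b] = offset;
        offset += c;
      }
      const order = new Uint32Array(keys.length);
      for (let i = 0; i < keys.length; i++) order[counts[keys[i]]++] = i;
      return order;
    }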
If you say 1-pass bucket sorting... I assume you do sort the buckets as well?
I've implemented a radix sort on the GPU to sort the splats (every frame)... and I'm not quite happy with the performance yet. A radix sort (+ prefix scan) is quite involved, with lots of dedicated hierarchical compute shaders... I might have to get back to tuning it.
I might switch to float16s as well, but I'm a bit hesitant, as 1 million+ splats may exceed the precision of halfs.
We are purposefully trading off some sorting precision for speed with float16, and for scenes with large Z extents you'd probably get more Z-fighting, so I'm not sure if I'd recommend it for you if your goal is max reconstruction accuracy! But we'll likely add a 2-pass sort (i.e. radix sort with a large base / #buckets) in the future for higher precision (user selectable so you can decide what's more important for you). But I will say that implementing a sort on the CPU is much simpler than on the GPU, so it opens up possibilities if you're willing to do a readback from GPU to CPU and tolerate at least 1 frame of latency (usually not perceivable).
You might want to consider using words (16-bit integers) instead of halfs? Then you can use all 65k values of precision in a range you choose (by remapping 32-bit floats to words), potentially adjusting it every frame, or with a delay.
Yeah, you're right: using float16 gets us only 0x7C00 buckets of resolution. We could explicitly turn it into a log encoding, spread it over 2^16 buckets, and get 2x the range there! Other renderers do this dynamic per-frame range adjustment; we could do that too.
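A sketch of that remapping (zMin/zMax would come from the scene's depth range that frame; the log variant is the encoding mentioned above):

    // Remap a float32 view-space depth into a full-range 16-bit sort key.
    function depthToKey(z, zMin, zMax) {
      const t = (z - zMin) / (zMax - zMin);                    // linear 0..1
      // const t = Math.log(z / zMin) / Math.log(zMax / zMin); // log-spaced alternative
      return Math.min(65535, Math.max(0, Math.round(t * 65535)));
    }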
OBJ is traditional geometry (vertices, triangles). Gaussian splats are a different way to represent 3D information (simplifying: it's a point cloud where each point is an ellipsoid with view-dependent color).
The WebGL API is based on the OpenGL ES standard, which jettisoned a lot of the procedural pipeline calls that made it easy to write CPU-bound 3D logic.
The tradeoff is initial complexity (your "hello world" for WebGL showing one object will include a shader and priming data arrays for that shader), but as a consequence of this design the API forces more computation into the GPU layer, so the fact that JavaScript is driving it matters very little.
THREE.js adds a nice layer of abstraction atop that metal.
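The classic Three.js starter shows the contrast; one spinning object with no hand-written shaders or vertex arrays in sight:

    import * as THREE from "three";

    const scene = new THREE.Scene();
    const camera = new THREE.PerspectiveCamera(75, innerWidth / innerHeight, 0.1, 100);
    camera.position.z = 3;

    const renderer = new THREE.WebGLRenderer();
    renderer.setSize(innerWidth, innerHeight);
    document.body.appendChild(renderer.domElement);

    const cube = new THREE.Mesh(new THREE.BoxGeometry(), new THREE.MeshNormalMaterial());
    scene.add(cube);

    renderer.setAnimationLoop(() => {
      cube.rotation.y += 0.01;       // the GPU does the heavy lifting per frame
      renderer.render(scene, camera);
    });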
Spark allows you to construct compute graphs at runtime in JavaScript and have them compiled and run on the GPU, not bound by the CPU: https://sparkjs.dev/docs/dyno-overview/
WebGL2 isn't the best graphics API, but it allows anyone to write Javascript code to harness the GPU for compute and rendering, and run on pretty much any device via the web browser. That's pretty amazing IMO!
Yes. We have demos working already. Those 3D Gaussian videos (or 4D, as some people call them) are really big, so we're figuring out the best way to distribute them and make it a great experience.
Most 4DGS reconstruction methods right now are exactly that: setting up many cameras and recording them simultaneously so you can reconstruct each instant in time as a 3DGS. In the future it might be possible to use a single camera and have an AI/ML method figure out how all the 3D gaussians move over time, including parts that are occluded from the single camera!
We seem to have two poles: extreme realism, and extremely minimalistic pixel art. I prefer the second camp, but your project looks really important in the first camp.
Thanks! It works for both! An under-explored area is converting assets created in "traditional" ways (e.g. Blender) into splats, which gives better visual results in some scenarios (high-frequency detail). See the furry logo in the homepage carousel.
Cool stuff. Do you have examples with semi-transparent surfaces? Something like a toy Christmas tree inside a glass sphere, with basic reflections and refractions calculated?
Wait, you renamed Forge (https://forge.dev), released last week by World Labs, a startup that raised $230M.
Is this "I worked with some friends and I hope you find useful" or is it "So proud of the World Labs team that made this happen, and we are making this open source for everyone" (CEO, World Labs)?
Yes. I collaborated with one of the devs at World Labs on this. The goal is to explore new rendering techniques and popularize adoption of 3D Gaussian splatting. There's no product associated with it.