Hacker News | Cloudly's favorites

Makes sense: humans have evolved a lot of wetware dedicated to 3D processing from stereo 2D.

I've made some progress on a PoC in 3D reconstruction (detecting planes, edges and pipes in pointclouds from lidar scans, e.g. https://youtu.be/-o58qe8egS4 ) and am bootstrapping with in-house gigs as I build out the product.

Essentially it breaks down to a ton of matmuls, and I use a lot of tricks from pre-LLM ML. This is a domain that perfectly fits RL.

The investors I've talked to seem to understand that scan-to-CAD is a real problem with a viable market: automating 5Bn / yr of manual click-labor. But they want to see traction in the form of early sales of the MVP, which is understandable, especially in the current regime of high interest rates.

I've not been able to get across to potential investors the vast implications for robotics, AI, AR, VR and VFX that better / faster / realtime 3D reconstruction will bring. It's great that someone of the caliber of Fei-Fei Li is talking about it.

Robots that interact in the real world will need to make a 3D model in realtime and likely share it efficiently with comrades.

While a Gaussian splat model is more efficient than a pointcloud, a model which recognizes a wall as a quad plane is much more efficient still, and that is what realtime communication needs. There is the old idea that compression is equivalent to AI.
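To make the wall-as-a-quad point concrete, here is a minimal sketch of plane detection with plain RANSAC on a synthetic scan. This is not the pipeline described above, just the textbook technique; the function names, the noise levels and the 2 cm threshold are all assumptions picked for illustration.

```python
import math
import random

def plane_from_points(p, q, r):
    """Return (unit normal, d) for the plane through three points, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    # Cross product u x v gives the plane normal.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length < 1e-9:
        return None
    n = [c / length for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d

def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Keep the candidate plane with the most points within `threshold` of it."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [pt for pt in points
                   if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

def make_wall_scan(n_wall=500, n_clutter=50, seed=1):
    """Hypothetical scan: a noisy 4 m x 2.5 m wall at z = 0 plus clutter in front of it."""
    rng = random.Random(seed)
    wall = [(rng.uniform(0, 4), rng.uniform(0, 2.5), rng.gauss(0, 0.005))
            for _ in range(n_wall)]
    clutter = [(rng.uniform(0, 4), rng.uniform(0, 2.5), rng.uniform(0.2, 2))
               for _ in range(n_clutter)]
    return wall + clutter

points = make_wall_scan()
(normal, d), inliers = ransac_plane(points)

# Storage comparison: every inlier point costs 3 floats, a quad costs 4 corners x 3 floats.
raw_floats = 3 * len(inliers)
quad_floats = 3 * 4
print(f"{len(inliers)} wall points ({raw_floats} floats) vs one quad ({quad_floats} floats)")
```

The quad size is constant no matter how densely the wall was scanned, which is where the compression comes from.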

What is stopping us from having a Google Street View v3.0 in which I can zoom right into and walk around a shopping mall, train station or public building? Our browsers can do this now, essentially rendering Quake-like 3D environments. The problem is turning a scan into a lightweight 3D model.

Photogrammetry, where you have hundreds of photos and reconstruct the 3D scene, uses a lot of compute, and the COLMAP / Structure-from-Motion algorithm predates newer ML approaches and is ripe for a better RL algorithm imo. I've done experiments where you manually model a 3D scene from well-positioned 360 panorama photos of a building, picking corners, following the outline of walls to make a floorplan etc. This should be amenable to an RL algorithm. Most 360 panorama photo tours have enough overlap to reconstruct the scene reasonably well.

I have no doubt that we are on the brink of a massive improvement in 3D processing. It's clearly solvable with the ML/RL approaches we currently have; we don't need AGI. My problem is getting funding to work on it full-time, or equivalently talking an investor into taking that bet :)


This is really impressive; we're getting close to a dream of mine: the ability to generate proper audiobooks from EPUBs. Not just a robotic single voice for everything, but different, consistent voices for each protagonist, with the LLM analyzing the text to guess which voice to use and add an appropriate tone, much like a voice actor would do.

I've tried "EPUB to audiobook" tools, but they are really miles behind what a real narrator accomplishes, and they make the audiobook impossible to engage with.


MIT OCW has a free textbook "The Art of Insight in Science and Engineering: Mastering Complexity", which I usually point folks to on the subject of estimating physical systems. I think this is a real gem. Another work of the author is referenced at the bottom of the article. I'll also vouch for the referenced Guesstimation (though I hate the title).

https://ocw.mit.edu/courses/res-6-011-the-art-of-insight-in-...


On the topic of DPO - I have a Colab notebook to finetune with Unsloth 2x faster and use 50% less memory for DPO if it helps anyone! https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-h...

https://bauble.studio/ is a programmatic 3D art playground that I've been working on for a while now, and I'm pretty excited about it! It's based around signed distance functions, which are a way to represent 3D shapes as, well, functions, and you can do a lot of like weird mathematical distortions and operations that give you cool new shapes. Like average two shapes together, or take the modulo of space to infinitely repeat something... it's a really fun and powerful way to make certain kinds of shapes.

SDFs are very cool in general, and widely used in the generative art communities, but kinda hard to wrangle when you're writing shader code directly. They really are functions, but GLSL doesn't support first-class functions, so if you want to compose shapes you have to manually plumb a bunch of arguments around. So Bauble is essentially a high-level GLSL compiler that lets you model SDFs as first-class values, and as a result you can make a pretty cool 3D shape in just a few lines of code. And then 3D print them!
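For anyone who hasn't played with SDFs, here is a rough sketch in Python of what "shapes as first-class values" buys you. This is not Bauble's API or its generated GLSL; the names `sphere`, `box`, `union`, `morph` and `repeat` are made up for illustration, and points are plain (x, y, z) tuples.

```python
import math

def sphere(radius):
    """SDF of a sphere centered at the origin: negative inside, positive outside."""
    return lambda p: math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def box(half):
    """SDF of an axis-aligned box with half-extents `half`, centered at the origin."""
    def sdf(p):
        q = [abs(p[i]) - half[i] for i in range(3)]
        outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
        inside = min(max(q[0], q[1], q[2]), 0.0)
        return outside + inside
    return sdf

def union(a, b):
    """Union of two shapes: the nearer surface wins."""
    return lambda p: min(a(p), b(p))

def morph(a, b, t=0.5):
    """Blend two shapes by averaging their fields (the result is only approximately a distance)."""
    return lambda p: (1.0 - t) * a(p) + t * b(p)

def repeat(shape, period):
    """Repeat a shape infinitely by taking the modulo of space on every axis."""
    def sdf(p):
        q = tuple(((c + period / 2.0) % period) - period / 2.0 for c in p)
        return shape(q)
    return sdf

# Shapes are ordinary values, so operators compose without plumbing arguments around.
blob = morph(sphere(1.0), box((1.0, 1.0, 1.0)))
lattice = repeat(blob, 4.0)

print(blob((2.0, 0.0, 0.0)))     # distance estimate one unit outside the blob
print(lattice((4.0, 0.0, 0.0)))  # center of a neighboring repeated cell
```

Because each operator takes shapes and returns a shape, the composition is a couple of lines. In raw GLSL you would have to thread the point and every parameter through each call by hand.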

I need to do some actual work to promote and publicize it once I've finished the documentation and implemented a few more primitives, but it's very close!

The docs have lots of examples of the sorts of things you can do with SDFs: https://bauble.studio/help/

And for examples of some "art" that I've made with it recently:

https://x.com/ianthehenry/status/1839061056301445451
https://x.com/ianthehenry/status/1839649510597013592
https://x.com/ianthehenry/status/1827461714524434883


Their inference server is written in Rust using Hugging Face's Candle crate. One of the Moshi authors is also the primary author of Candle.

We've also been building our inference stack on top of Candle, and I'm really happy with it.


Brave new world, where our machines are sometimes wrong but by gum they are quick about it.
