Jonathan Stephens (@jonstephens85)

2026-01-02 | ❤️ 315 | 🔁 36


You can’t 3D reconstruct glass from images…

…WRONG! Thanks to video diffusion, now just about anything is possible!

Introducing…Diffusion Knows Transparency (DKT)

Transparent and reflective objects usually break robot vision and photogrammetry pipelines because they don’t follow the “solid object” rules standard cameras expect. DKT is a new AI model that repurposes the “internal physics engine” found in video generation models to solve this problem.

Researchers took a massive video diffusion model (WAN) and fine-tuned it using a custom-built synthetic dataset to turn it into a high-precision depth sensor.

To train the AI, they built the first massive synthetic video library of transparent objects: 1.32 million frames of perfectly labeled glass and metal objects in motion.

Without ever seeing a “real” labeled video of glass during training, the model (DKT) outperformed all previous specialized systems on real-world benchmarks (ClearPose, DREDS).

They created a “lightweight” 1.3B parameter version that runs fast enough (0.17s per frame) to be used on actual robot hardware.
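For a rough sense of what 0.17s per frame means for a robot's perception loop, the latency can be converted into a throughput. This is only back-of-the-envelope arithmetic; the function name is illustrative and not part of the DKT code.

```python
def frames_per_second(seconds_per_frame: float) -> float:
    """Convert a per-frame latency (seconds) into throughput (frames per second)."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return 1.0 / seconds_per_frame


LATENCY_S = 0.17  # per-frame inference time quoted in the post
fps = frames_per_second(LATENCY_S)
print(f"{fps:.1f} depth frames per second")
```

That works out to roughly 6 depth maps per second.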

Two reasons I find this project important:

  1. It further proves that synthetic data will be essential for training next-generation vision models.

  2. In real-world robotic tests, using DKT’s depth maps nearly doubled the success rate of robot arms trying to pick up objects on tricky reflective or translucent surfaces. At-home robots will need to interact with these types of objects on a daily basis.
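To illustrate why a better depth map matters for grasping, here is a minimal sketch of how one depth pixel can be lifted to a 3D point with the standard pinhole camera model. This is generic geometry, not DKT's pipeline; the function name, camera intrinsics, and pixel values are all made-up assumptions.

```python
def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> tuple:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics and a hypothetical pixel on a glass object.
point = backproject(u=400, v=300, depth_m=0.5,
                    fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(point)
```

Because x and y scale linearly with depth, a depth error on a transparent surface shifts the whole 3D grasp target, not just its distance from the camera.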

Check out the project page here: https://daniellli.github.io/projects/DKT/

Code is LIVE!

Computervision Robotics AI

🔗 Original link

Media

image



Tags

3D Rendering Robotics AI-ML GenAI Dev-Tools Simulation