Pablo Vela (@pablovelagomez1)
2025-05-09 | ❤️ 485 | 🔁 66
MVP of Multiview Video → Camera parameters + 3D keypoints. Visualized with @rerundotio
The basic pipeline as of right now looks like this:
- Capture: 4 iPhones and an Insta360 Go. The iPhones provide the exocentric views and are captured via Final Cut Pro Multicam for easy sync; the Insta360 Go provides the egocentric view.
- Sync: Custom Gradio app with two @rerundotio viewers and callbacks for aligning frame timestamps between the ego and exo views.
- Calibrate: Use VGGT from @jianyuan_wang and @AIatMeta to get intrinsics/extrinsics for sparse cameras.
- Estimate 3D: Run the RTMLib whole-body keypoint estimator on each frame, then triangulate in 3D.
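
For readers unfamiliar with the last step, here is a minimal sketch of linear (DLT) triangulation of a single keypoint from several calibrated views. It is an illustration, not the code used in this pipeline: the function name `triangulate_point` and the input layout (a list of 3x4 projection matrices plus one pixel observation per camera) are assumptions, and a real pipeline would likely also weight or reject observations by detector confidence.

```python
import numpy as np


def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    projections: sequence of 3x4 camera matrices P = K [R | t]
                 (hypothetical layout; e.g. built from estimated
                 intrinsics/extrinsics)
    points_2d:   sequence of (u, v) pixel observations, one per camera
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear constraints on the
        # homogeneous point X: (u * P[2] - P[0]) @ X = 0, same for v.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)

    # The least-squares solution is the right singular vector that
    # corresponds to the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize
```

Running this once per keypoint and per frame gives independent 3D estimates, which is consistent with the frame-to-frame jitter described below.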
What's missing?
- No temporal coherence: I'm estimating keypoints one frame at a time and one camera at a time. This leads to a lot of jittering. For now, I plan on adding a One Euro Filter to help with jittering. Long term, I'd want to train a multiview keypoint estimator.
- Kinematic fitting is still missing; this is my next goal. The output will be joint angles, as explored in my previous posts.
- Missing dense point cloud: VGGT seems to fail for me here. I'm looking to explore using MP-SFM as a method for generating dense multiview depth maps + normals (plus it has a friendlier license compared to VGGT).
- Eventually, creation of 4D Gaussian splatting using something akin to DN-splatter. My long-term goal is a data engine that provides poses/depths/splats/keypoints/etc.
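
As a rough idea of what the planned One Euro Filter would look like, here is a minimal scalar sketch following the published algorithm (an exponential low-pass filter whose cutoff frequency rises with the estimated speed of the signal). The class name, the default parameter values, and the choice to run one filter per keypoint coordinate are assumptions for illustration, not the author's implementation.

```python
import math


class OneEuroFilter:
    """Scalar One Euro Filter; use one instance per keypoint coordinate."""

    def __init__(self, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # Hz; lower means smoother at rest
        self.beta = beta              # higher means less lag on fast motion
        self.d_cutoff = d_cutoff      # Hz; cutoff for the derivative filter
        self.t_prev = None
        self.x_prev = None
        self.dx_prev = 0.0

    @staticmethod
    def _alpha(cutoff, dt):
        # Smoothing factor of a first-order low-pass filter.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, t, x):
        if self.t_prev is None:
            # First sample: nothing to smooth against yet.
            self.t_prev, self.x_prev = t, x
            return x
        dt = t - self.t_prev
        if dt <= 0:
            # Non-increasing timestamp: return the last filtered value.
            return self.x_prev

        # Low-pass filter the speed estimate.
        dx = (x - self.x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev

        # Adapt the cutoff to the speed, then filter the value itself.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.x_prev

        self.t_prev, self.x_prev, self.dx_prev = t, x_hat, dx_hat
        return x_hat
```

For a triangulated skeleton, this would mean three filters per joint (x, y, z), each fed the frame timestamp in seconds and the raw coordinate. `min_cutoff` and `beta` need tuning per capture setup: a low `min_cutoff` removes jitter when the subject is still, and a larger `beta` reduces lag during fast motion.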
Quoted tweet
Pablo Vela (@pablovelagomez1)
Synchronization, then calibration with VGGT. Next up is hand tracking + kinematic fitting
Almost there 😮‍💨 https://t.co/de7fy0DYLr