Pablo Vela (@pablovelagomez1)
2025-10-02 | ❤️ 84 | 🔁 9
Got some new out-of-distribution data to test with, tracker is looking 🔥. Had to disable the MANO optimization for now as I've found some wonky bugs, but will have those solved soon https://x.com/pablovelagomez1/status/1973857488283054470/video/1
Quoted Tweet
Pablo Vela (@pablovelagomez1)
It's finally done: I've finished ripping out my full-body pipeline and replaced it with a hands-only version. Critical to make it work in a lot more scenarios! I've visualized the final predictions with @rerundotio!
I want to emphasize that these are not the ground-truth values provided by the wonderful HOCap dataset, but rather predictions from my pipeline, which was written from the ground up!
For context, it consists of 4 parts:
- Exo/ego camera estimation
- Hand shape calibration
- Per-view 2D keypoint estimation
- Hand pose optimization
At the end of it all, I have a pipeline where you input synchronized videos and it outputs fully tracked per-view 2D keypoints, bounding boxes, 3D keypoints, and MANO joint angles + hand shape!
Really happy with how it looks so far, but this is far from ideal.
- Not even close to real time: this 30-second, 8-view sequence took nearly 5 minutes to process on my 5090 GPU
- 8 views is WAY too many and unscalable; I'm convinced this can be done with far fewer (2 exo + 1 stereo ego)
- Interacting hands cause lots of issues, and the pipeline is very fragile when there's no clear delineation between hands
Still, I'm quite happy with how it's going so far. Currently, I have a reasonable set of datasets to validate against, a performant baseline, and an annotation app to correct inaccurate predictions.
From here, the focus will be more on the egocentric side!
🎬 Video