Philippe Hansen-Estruch (@tokenpilled65B)
2026-01-20 | ❤️ 413 | 🔁 57 | 💬 15
Releasing ViTok-v2: open-source ViT auto-encoder codebase + pretrained weights
Train your own ViT auto-encoder on any streamed (hf://) or local webdataset. The NaFlex pipeline handles any resolution and aspect ratio.
Includes reproduced 350M and 4.5B model weights: competitive at 256p, SOTA at high-res (512p+).

🔗 Related
- what-if-we-could-model-vision-like-a-wave-moving-through → Topics: AI-ML, Dev-Tools
- video-models-serve-as-a-good-pretrained-backbone-for-robot → Topics: AI-ML, Dev-Tools
- introducing-shaper-a-method-for-robust-conditional-3d-shape → Topics: AI-ML, Dev-Tools
- what-if-we-could-train-ai-robots-in-a-perfect-physics → Topics: AI-ML, Dev-Tools
- if-youve-ever-tried-to-create-3dgs-scenes-from-photos-taken → Topics: AI-ML, Web/Graphics