Michael Black (@Michael_J_Black)
2025-12-15 | ❤️ 292 | 🔁 46
Video diffusion models have strong implicit representations of 3D shape, material, and lighting, but controlling them with language is cumbersome, and control is critical for artists and animators.
GenLit connects these implicit representations with a continuous 5D control signal describing the direction and intensity of a point light source.
This enables near-field relighting from a single image using a video diffusion model. We use a ControlNet-like approach and show that, with a small amount of synthetic data, GenLit generalizes to complex real-world images.
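To make the idea of a continuous 5D control signal concrete, here is a minimal, illustrative sketch (not the authors' code) of how per-frame point-light parameters could be packed into a 5D vector and fed through a ControlNet-style, zero-initialized conditioning branch on top of a frozen video diffusion backbone. The specific parameterization (azimuth, elevation, distance, intensity, on/off flag) and all names below are assumptions for illustration; the paper defines the actual 5D signal and architecture.

```python
# Illustrative sketch only: assumed parameterization and module names,
# not GenLit's actual implementation.
import math
import torch
import torch.nn as nn


def light_signal(azimuth, elevation, distance, intensity, enabled=1.0):
    """Pack one frame's point-light parameters into a 5D control vector (assumed layout)."""
    return torch.tensor([azimuth, elevation, distance, intensity, enabled])


def orbit_trajectory(num_frames, distance=1.5, intensity=1.0):
    """Hypothetical control sequence: the light orbits the scene over the clip."""
    frames = []
    for t in range(num_frames):
        az = 2.0 * math.pi * t / num_frames                    # full circle around the scene
        el = 0.3 * math.sin(2.0 * math.pi * t / num_frames)    # gentle vertical bob
        frames.append(light_signal(az, el, distance, intensity))
    return torch.stack(frames)                                 # (num_frames, 5)


class LightControlBranch(nn.Module):
    """ControlNet-like adapter: maps the 5D signal to residual features added to the
    frozen backbone. The output projection is zero-initialized so training starts
    as a no-op, following the ControlNet recipe."""

    def __init__(self, feat_dim=320):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(5, 128), nn.SiLU(),
            nn.Linear(128, feat_dim),
        )
        self.zero_proj = nn.Linear(feat_dim, feat_dim)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, control):                    # control: (num_frames, 5)
        return self.zero_proj(self.mlp(control))   # (num_frames, feat_dim)


if __name__ == "__main__":
    controls = orbit_trajectory(num_frames=16)
    residual = LightControlBranch()(controls)
    print(controls.shape, residual.shape)          # [16, 5], [16, 320]
```

Because the conditioning branch starts as a zero residual, the pretrained video model's behavior is preserved at initialization and only a small amount of (synthetic) paired data is needed to learn the lighting control.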
Given a single image and the 5D lighting signal, GenLit generates a video of a point light moving inside the scene, around and behind objects, producing shading, cast shadows, specularities, and interreflections with a realism that is hard to obtain with traditional inverse-rendering methods.
GenLit shows that it is possible to get continuous control over implicit physical processes within a video model. I think this is just the beginning and promises to make such models much more practical for creators.
@sbharadwaj__ will present today at SIGGRAPH Asia, Room S423/S424, Level 4, at 13:50 on Dec 15.