Research
Personalized AI Systems
Building AI systems that learn personal visual concepts and maintain long-term memory for tailored, user-specific experiences.
- CamRoll: A hierarchical-memory agent for personal camera rolls.
- VisualMem: A hybrid visual–text memory architecture for personal visual memory.
- PEARL: A personalized streaming video understanding model with persistent user memory.
- Yo'Chameleon / Yo'LLaVA: Embedding user-defined visual concepts into multimodal LLMs for personalized recognition and generation.
Controllable Generative Models
Building controllable generative models with spatially grounded image synthesis and seamless multimodal integration.
- GLIGEN: Grounded text-to-image generation conditioned on open-set spatial inputs including boxes, keypoints, and reference images.
- X-Fusion: Grafting visual understanding and generation onto frozen LLMs, preserving language capabilities while enabling multimodal reasoning.
- UniTemp: Autoregressive distillation for video generation in any temporal order, enabling flexible, non-sequential synthesis.