Yuheng Li

Research

Personalized AI Systems

Building AI systems that learn personal visual concepts and maintain long-term memory for tailored, user-specific experiences.

CamRoll: A hierarchical-memory agent for personal camera rolls.
VisualMem: A hybrid visual–text memory architecture for personal visual memory.
PEARL: A personalized streaming video understanding model with persistent user memory.
Yo'Chameleon / Yo'LLaVA: Embedding user-defined visual concepts into multimodal LLMs for personalized recognition and generation.

Controllable Generative Models

Building controllable generative models with spatially grounded image synthesis and seamless multimodal integration.

GLIGEN: Grounded text-to-image generation conditioned on open-set spatial inputs including boxes, keypoints, and reference images.
X-Fusion: Grafting visual understanding and generation onto frozen LLMs, preserving language capabilities while enabling multimodal reasoning.
UniTemp: Autoregressive distillation for video generation in any temporal order, enabling flexible non-sequential video synthesis.

Publications

CamRoll: Personal AI Agent for Camera Roll VQA

Thao Nguyen, Krishna Kumar Singh, Donghyun Kim, Yong Jae Lee*, Yuheng Li* (*equal advising)

arXiv, 2026

Project Demo Code Paper

VisualMem: Personal Visual Memory from Explicit and Implicit Evidence

Viet Nguyen, Thao Nguyen, Vishal M. Patel*, Yuheng Li* (*equal advising)

arXiv, 2026

Project Code Paper

PEARL: Personalized Streaming Video Understanding Model

Yuanhong Zheng, Ruichuan An, Xiaopeng Lin, Yuxing Liu, Sihan Yang, Huanyu Zhang, Haodong Li, Qintong Zhang, Renrui Zhang, Guopeng Li, Yifan Zhang, Yuheng Li*, Wentao Zhang* (*equal advising)

arXiv, 2026

Paper Code

Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration

Sicheng Mo, Thao Nguyen, Richard Zhang, Nick Kolkin, Siddharth Srinivasan Iyer, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Project Paper

Relational Visual Similarity

Thao Nguyen, Sicheng Mo, Krishna Kumar Singh, Yilin Wang, Jing Shi, Nicholas Kolkin, Eli Shechtman, Yong Jae Lee*, Yuheng Li* (*equal advising)

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Project Paper Code

Learning an Image Editing Model without Image Editing Pairs

Nupur Kumari, Sheng-Yu Wang, Nanxuan Zhao, Yotam Nitzan, Yuheng Li, Krishna Kumar Singh, Richard Zhang, Eli Shechtman, Jun-Yan Zhu, Xun Huang

International Conference on Learning Representations (ICLR), 2026

Project Paper

Yo'Chameleon: Personalized Vision and Language Generation

Thao Nguyen, Krishna Kumar Singh, Jing Shi, Trung Bui, Yong Jae Lee*, Yuheng Li* (*equal advising)

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

Project Code Paper

X-Fusion: Introducing New Modality to Frozen Large Language Models

Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li

IEEE International Conference on Computer Vision (ICCV), 2025

Project Code Paper

Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks

Bhishma Dedhia, David Bourgin, Krishna Kumar Singh, Yuheng Li, Yan Kang, Zhan Xu, Niraj K. Jha, Yuchen Liu

IEEE International Conference on Computer Vision (ICCV), 2025

Project Paper

Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing

Chun-Hsiao Yeh, Yilin Wang, Nanxuan Zhao, Richard Zhang, Yuheng Li, Yi Ma, Krishna Kumar Singh

AAAI Conference on Artificial Intelligence (AAAI), 2025

Project Paper

Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee

IEEE Winter Conference on Applications of Computer Vision (WACV), 2025

Yo'LLaVA: Your Personalized Language and Vision Assistant

Thao Nguyen, Haotian Liu, Yuheng Li, Mu Cai, Utkarsh Ojha, Yong Jae Lee

Neural Information Processing Systems (NeurIPS), 2024

Project Code Paper

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Yuheng Li, Haotian Liu, Mu Cai, Yijun Li, Eli Shechtman, Zhe Lin, Yong Jae Lee, Krishna Kumar Singh

European Conference on Computer Vision (ECCV), 2024

Project Paper

Edit One for All: Interactive Batch Image Editing

Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Project Code Paper

Improved Baselines with Visual Instruction Tuning (LLaVA-1.5)

Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Project Code Paper

GLIGEN: Open-Set Grounded Text-to-Image Generation

Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023

arXiv Code Project Demo

Visual Instruction Inversion: Image Editing via Visual Prompting

Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee

Neural Information Processing Systems (NeurIPS), 2023

Project Code Paper

What Knowledge Gets Distilled in Knowledge Distillation?

Utkarsh Ojha*, Yuheng Li*, Anirudh Sundara Rajan*, Yingyu Liang, Yong Jae Lee (*equal contribution)

Neural Information Processing Systems (NeurIPS), 2023

Towards Universal Fake Image Detectors that Generalize Across Generative Models

Utkarsh Ojha*, Yuheng Li*, Yong Jae Lee (*equal contribution)

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Generate Anything Anywhere in Any Scene

Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee

arXiv, 2023

Contrastive Learning for Diverse Disentangled Foreground Generation

Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh

European Conference on Computer Vision (ECCV), 2022

GIRAFFE HD: A High-Resolution 3D-aware Generative Model

Yang Xue, Yuheng Li, Krishna Kumar Singh, Yong Jae Lee

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

arXiv Code

Delving Deeper into Anti-aliasing in ConvNets

Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yuheng Li, Yong Jae Lee

International Journal of Computer Vision (IJCV), 2022

Collaging Class-specific GANs for Semantic Image Synthesis

Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh

IEEE International Conference on Computer Vision (ICCV), 2021

arXiv Project

PartGAN: Unsupervised Part Decomposition for Image Generation and Segmentation

Yuheng Li, Krishna Kumar Singh, Yong Jae Lee

British Machine Vision Conference (BMVC), 2021

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

arXiv Code