Wenhao Chai

Profile

Wenhao Chai is currently a graduate student at the University of Washington, in the Information Processing Lab, advised by Prof. Jenq-Neng Hwang. Previously, he was an undergraduate student at Zhejiang University, in the CVNext Lab, advised by Prof. Gaoang Wang. He is fortunate to work with Prof. Christopher D. Manning at Stanford University, and has worked with Prof. Saining Xie and Prof. Yilun Du. He has interned at Pika Labs and Microsoft Research Asia. His research is primarily in large multimodal models (LMMs) for video understanding, embodied agents, and generative models. He has published related papers in top-tier conferences and journals such as CVPR, ICCV, ECCV, and AAAI. He has also organized workshops and tutorials at CVPR and AAAI, and served as a reviewer for NeurIPS, ICLR, ICML, CVPR, ECCV, and AISTATS.

My current research focuses on developing embodied AI agents, inspired by cognitive science principles, that interact with the physical world, building upon video understanding as a core perceptual mechanism. I propose a long-short term memory framework modeled after the human memory system, enabling pre-trained video LMMs to comprehend multi-hour video content without additional fine-tuning. To enhance efficiency, I introduce token merging, which significantly reduces the number of visual tokens with minimal performance degradation. I also demonstrate step-by-step agent system development in Minecraft, showcasing cognitive-inspired agent capabilities in virtual environments.

Publications

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang

LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound

Xuechen Guo, Wenhao Chai, Shiyan Li, Gaoang Wang

ACM International Conference on Multimedia (ACM MM), 2024

Ego3DT: Tracking Every 3D Object in Ego-centric Videos

Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang

ACM International Conference on Multimedia (ACM MM), 2024

PAD: Personalized Alignment of LLMs at Decoding-Time

Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning

arXiv.org 2024

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement

Yunlong Lin, Tian Ye, Sixiang Chen, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, Xinghao Ding

Association for the Advancement of Artificial Intelligence (AAAI), 2025

RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark

Yuan-Hao Ho, Jen-Hao Cheng, Sheng-Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Chih-Lung Lin, Jenq-Neng Hwang

European Conference on Computer Vision (ECCV), 2024

NTIRE 2024 Image Shadow Removal Challenge Report

Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Zongwei Wu, Cailian Chen, R. Timofte, Wei Dong, Han Zhou, Yuqiong Tian, Jun Chen, Xueyang Fu, Xin Lu, Yurui Zhu, Xi Wang, Dong Li, Jie Xiao, Yunpeng Zhang, Zheng Zha, Zhao Zhang, Suiyi Zhao, Bomin Wang, Yan Luo, Yanyan Wei, Zhihao Zhao, Long Sun, Tingting Yang, Jin-Mei Pan, Jian-Ping Dong, Jinhui Tang, Bilel Benjdira, Mohammed Nassif, A. Koubâa, Ahmed Elhayek, Anas M. Ali, Kyotaro Tokoro, Kento Kawai, Kaname Yokoyama, Takuya Seno, Yuki Kondo, N. Ukita, LI, Bo Yang, Zhiqi Wu Gao, Chen Yihan, Sixiang Yu, Chen Kai Zhang, Tian Ye, Wenbin Zou, Yunlong Lin, Zhaohu Xing, Jinbin Bai, Wenhao Chai, Lei Zhu, Ritik Maheshwari, Rakshank Verma, Rahul Tekchandani Praful, Hambarde Satya, Narayan Tazi, Santosh Kumar, Vipparthi Subrahmanyam, Murala Jaeho, Lee Seongwan, Kim Sharif, Nodirkhuja Khujaev, Roman A. Tsoy, Fan Gao, Weidan Yan, Wenze Shao, Dengyin Zhang Bin, Chen Siqi, Yanxin Zhang, Yu Qian, Yuanbo Chen, Zhou Tong, Rongfeng Tong, Wei Ruiqi, Sun Yue Liu, Nikhil Akalwadi, Amogh Joshi, Sampada Malagi, Chaitra Desai, R. Tabib, U. Mudenagudi, Ali Murtaza, U. Khairuddin, Ahmad Athif, Mohd Faudzi, Adinath Dukre, Vivek Deshmukh, Shruti S. Phutke, Ashutosh Kulkarni, Vipparthi Anil Gonde, Subrahmanyam Murala, Arun karthik K Manasa, N. Shri, Hari Priya, Wei Hao, X. Yan, Minghan Fu, LVGroup Hfut, Ustc ShadowTitan, Zhi-Ming Wu, Gao Chen, Yi-Kang Yu, Sixiang Chen, Kai Zhang, Rahul Tekchandani, Praful Hambarde, S. Tazi, Jae-Hyeon Lee, Seongwan Kim, Finetuned MambaIR, Dengyin Zhang, Bin Chen, Siqi Zhang, Yanxin Qian, Yuanbin Chen, Yuanbo Zhou, Tong Tong, Rongfeng Wei, Ruiqi Sun, Yue Liu

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024

STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft

Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tianbo Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang

arXiv.org 2024

Learning Diffusion Texture Priors for Image Restoration

Tian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Jing Qin, Ge Lin, Lei Zhu

Computer Vision and Pattern Recognition 2024

CityCraft: A Real Crafter for 3D City Generation

Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

arXiv.org 2024

Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check

Sheng-Yao Kuan, Jen-Hao Cheng, Hsiang-Wei Huang, Wenhao Chai, Cheng-Yen Yang, Hugo Latapie, Gaowen Liu, Bing-Fei Wu, Jenq-Neng Hwang

2024 IEEE Intelligent Vehicles Symposium (IV) 2024

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

Enxin Song, Wenhao Chai, Tianbo Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang

arXiv.org 2024

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng

arXiv.org 2024

Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model

Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang

arXiv.org 2024

Random bridge generator as a platform for developing computer vision-based structural inspection algorithms

Haojia Cheng, Wenhao Chai, Jiabao Hu, Wenhao Ruan, Mingyu Shi, Hyunjun Kim, Yifan Cao, Y. Narazaki

Journal of Infrastructure Intelligence and Resilience 2024

VersaT2I: Improving Text-to-Image Models with Versatile Reward

Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tianbo Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

arXiv.org 2024

Exploring Learning-based Motion Models in Multi-Object Tracking

Hsiang-Wei Huang, Cheng-Yen Yang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang

arXiv.org 2024

Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tianbo Ye, Yanting Zhang, Gaoang Wang

arXiv.org 2024

Unsupervised Domain Adaptation Approach for Vision-Based Semantic Understanding of Bridge Inspection Scenes without Manual Annotations

Y. Narazaki, Wendong Pang, Gaoang Wang, Wenhao Chai

Journal of Bridge Engineering 2024

User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning

Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, Gaoang Wang

Chinese Conference on Pattern Recognition and Computer Vision 2023

CityGen: Infinite and Controllable 3D City Layout Generation

Jie Deng, Wenhao Chai, Jianshu Guo, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang

arXiv.org 2023

See and Think: Embodied Agent in Virtual Environment

Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tianbo Ye, Jenq-Neng Hwang, Gaoang Wang

European Conference on Computer Vision 2023

UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

Zhongyu Jiang, Wenhao Chai, Lei Li, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang

arXiv.org 2023

Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation

Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei Li, Jenq-Neng Hwang

2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) 2023

Sequential Affinity Learning for Video Restoration

Tian Ye, Sixiang Chen, Yun Liu, Wenhao Chai, Jinbin Bai, Wenbin Zou, Yunchen Zhang, Mingchao Jiang, Erkang Chen, Chenghao Xue

ACM Multimedia 2023

Devil in the Number: Towards Robust Multi-modality Data Filter

Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang

arXiv.org 2023

Chasing Consistency in Text-to-3D Generation from a Single Image

Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang

arXiv.org 2023

UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning

Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang

AAAI Conference on Artificial Intelligence 2023

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

Wenhao Chai, Xun Guo, Gaoang Wang, Yang Lu

IEEE International Conference on Computer Vision 2023

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

Han-Wen Liu, Ju He, Zhi-Qi Cheng, Wangmeng Xiang, Q. Yang, Wenhao Chai, Gaoang Wang, Xueting Bao, Bin Luo, Yifeng Geng, Xuansong Xie

ACM Multimedia 2023

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Xun Guo, Tianbo Ye, Yang Lu, Jenq-Neng Hwang, Gaoang Wang

Computer Vision and Pattern Recognition 2023

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Min-Gyoo Song, Jenq-Neng Hwang, Gaoang Wang

arXiv.org 2023

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

Zhongyu Jiang, Zhu Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang

IEEE Workshop/Winter Conference on Applications of Computer Vision 2023

MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling

Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tianbo Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

Chinese Conference on Pattern Recognition and Computer Vision 2023

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models

Shidong Cao, Wenhao Chai, Shengyu Hao, Gaoang Wang

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023

Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement

J. Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Shi Jun, Yun Liu, Erkang Chen

British Machine Vision Conference 2023

Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang

IEEE International Conference on Computer Vision 2023

Blind Inpainting with Object-Aware Discrimination for Artificial Marker Removal

Xue-jun Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang

IEEE International Conference on Acoustics, Speech, and Signal Processing 2023

Deep Learning Methods for Small Molecule Drug Discovery: A Survey

Wenhao Hu, Yingying Liu, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang

IEEE Transactions on Artificial Intelligence 2023

DiffFashion: Reference-Based Fashion Design With Structure-Aware Transfer by Diffusion Models

Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang

IEEE Transactions on Multimedia 2023

Automatic Spinal Ultrasound Image Segmentation and Deployment for Real-time Spine Volumetric Reconstruction

Yifan Cao, C. Tan, Wenzhuo Qian, Wenhao Chai, Luhang Cui, Wen-xiong Yang, Xinben Hu, Yongjian Zhu, Wenhui Zhou, Xingfa Shen

2022 IEEE International Conference on Unmanned Systems (ICUS) 2022

Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model

Zhenting Qi, Ruike Zhu, Zheyu Fu, Wenhao Chai, V. Kindratenko

IEEE International Conference on Tools with Artificial Intelligence 2022

Deep Vision Multimodal Learning: Methodology, Benchmark, and Trend

Wenhao Chai, Gaoang Wang

Applied Sciences 2022

Diffusion-based Zero-Shot 3D Human Pose Estimation

Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang

Supplement Material: Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang

PAD: Personalized Alignment at Decoding-Time

Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

arXiv.org 2024