Zirui "Colin" Wang

I am a second-year master's student in Computer Science at Princeton University. My research is currently advised by Prof. Danqi Chen, and I serve as a Teaching Assistant in the department. I am generally interested in enhancing large language models with multimodal capabilities.

Previously, I obtained a B.S. in Data Science from the Halicioglu Data Science Institute (HDSI) and a B.A. in Cognitive Science from the Department of Cognitive Science at the University of California, San Diego (UCSD). During my undergraduate years, I was advised by Prof. Zhuowen Tu and Prof. Zhiting Hu, working on generative models in computer vision.

Email  /  Resume  /  CV  /  GitHub  /  Twitter  /  Google Scholar  /  LinkedIn

Updates

  • 11/04/2024: Named a Top Reviewer for NeurIPS 2024
  • 10/02/2024: Invited Talk at University of Michigan, Ann Arbor on "Evaluations of Multimodal Large Language Models"
  • 09/26/2024: CharXiv accepted to NeurIPS 2024 (D&B Track) & ECCV FoMo-Eval Workshop
  • 09/20/2024: Nominated for the Siebel Scholars Class of 2025
  • 08/05/2024: Invited Talk at Google on "CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs"

Schedule

My calendar. Note that all scheduled events are tentative and subject to change.

Research

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs


Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen
Neural Information Processing Systems (NeurIPS), 2024
NeurIPS Workshop on Multimodal Algorithmic Reasoning (Spotlight), 2024
ECCV Workshop on Emergent Visual Abilities and Limits of Foundation Models, 2024
website / arxiv / code

CharXiv reveals significant shortcomings in MLLMs’ chart understanding, showing a large performance gap between models and humans.

Improving Language Understanding from Screenshots


Tianyu Gao, Zirui Wang, Adithya Bhaskar, Danqi Chen
Preprint, 2024
website / arxiv / code

We close the performance gap between screenshot language models and text-only language models on language understanding tasks with our PTP objective.

Language Models as Science Tutors


Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Simon Machado, Arturo Rodriguez Fanlo, Simon Frieder, Zirui Wang, Akshara Prabhakar, Jiachen T. Wang, Xindi Wu, Mengzhou Xia, Wenhan Xia, Jiatong Yu, Ellie Thieu, Max Aragon, Zhiyong Ren, Junjie Zhu, Toni Annala, Sanjeev Arora, Danqi Chen
International Conference on Machine Learning (ICML), 2024
website / arxiv / code

We propose TutorChat, a dataset of long synthetic dialogues about textbooks, and TutorEval, a question-answering benchmark consisting of expert-written questions about long chapters from STEM textbooks.

TokenCompose: Grounding Diffusion with Token-level Supervision


Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu
Computer Vision and Pattern Recognition (CVPR), 2024
website / arxiv / code

We introduce token-wise consistency terms between image content and object segmentation maps when training text-to-image models, enhancing multi-category instance composition and photorealism.

OmniControlNet: Dual-stage Integration for Conditional Image Generation


Yilin Wang*, Haiyang Xu*, Xiang Zhang, Zeyuan Chen, Zhizhou Sha, Zirui Wang, Zhuowen Tu
Computer Vision and Pattern Recognition (CVPR), Workshop in Generative Models for Computer Vision, 2024

We provide a two-way integration for the widely adopted ControlNet method: we integrate four external condition generation algorithms into a single dense image labeling method, and its individually trained image generation processes into a single model.

Language Models Meet World Models: Embodied Experiences Enhance Language Models


Jiannan Xiang*, Tianhua Tao*, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu
Neural Information Processing Systems (NeurIPS), 2023
website / arxiv / code

We establish a framework that effectively and efficiently finetunes a language model with embodied experience while retaining its language modeling abilities.

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning


Yifan Xu*, Nicklas Hansen*, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
International Conference on Learning Representations (ICLR), 2023
website / arxiv / code

We investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster.



Services

  • ICML: Reviewer (2024)
  • ICLR: Reviewer (2023, 2024)
  • NeurIPS: Reviewer (2024)


Teaching

A full list of, and details about, the classes in which I have served as a teaching assistant. Instructor names are listed based on the time I worked with them. Staff names are listed in alphabetical order by first name. Instructor names and staff names are separated by a semicolon. Instructor evaluations are attached where available.

NLP


Karthik Narasimhan; Anika Maskara, Ben Shi, Evan Wang, Howard Yen, Yash Parikh, Yihan Wang, Zachary Siegel, Zirui Wang
Princeton COS 484 SP24
website

An introductory class on Natural Language Processing. Topics covered include language modeling, representation learning, text classification, sequence tagging, machine translation, and Transformers, among others.

Human-Computer Interaction


Andrés Monroy-Hernández, Parastoo Abtahi; Beza Desta, Yuhan Liu, Zirui Wang
Princeton COS 436 FA23
website

Project-based class where students are introduced to the basics of HCI and conduct either a study-based project (e.g., surveys + interviews) or a system-based project (e.g., implementations). I led the HCI + AI domain.

Deep Learning


Gary Cottrell; Eric Yu, Martha Gahl, Rohin Garg, Shubham Kulkarni, Weitang Liu, Zirui Wang
UCSD CSE 151B WI22
evaluation / website

This course covers the fundamentals of neural networks. We introduce linear regression, logistic regression, perceptrons, multilayer networks and back-propagation, convolutional neural networks, recurrent networks, and deep networks trained by reinforcement learning.

Introduction to Machine Learning


Jingbo Shang; Dheeraj Mekala, Weijian Xu, Xinghan Wang, Yilun Hao, Zhaoyi Hou, Zhenyu Bi, Zirui Wang
UCSD CSE 151A SP21
evaluation / website

Topics include supervised learning methods such as k-nearest neighbor classifiers, decision trees, boosting, and perceptrons, and unsupervised learning methods such as k-means and hierarchical clustering. Beyond the algorithms themselves, the course focuses on the principles behind them.

Practical Data Science in R


Shannon Ellis; Sean Trott, Shubham Kulkarni, Zirui Wang
UCSD COGS 137 FA21
evaluation / website

An introduction to coding for data analysis using the R programming language. The course focuses on practical and applied skills in asking data-informed questions, data wrangling, data visualization, building statistical learning models, and communication.

The Practice and Application of Data Science


Justin Eldridge; Amy Nguyen, Jiaqi Feng, Murali Dandu, Nicole Brye, Ruojia Tao, Shubham Kaushal, Vineet Tallavajhala, Winston Yu, Zirui Wang
UCSD DSC 80 FA21
website

Students master the data science life-cycle and learn many of the fundamental principles and techniques of data science spanning algorithms, statistics, machine learning, visualization, and data systems.

Data Structures and Algorithms for Data Science (x4)


Marina Langlois, Aaron Fraenkel, Soohyun Liao; Amy Nguyen, Brian Wang, Huaning Liu, Jeffrey Feng, Kevin Chin, Kunyang Sun, Madeline Tjoa, Sally Poon, Sharmi Mathur, Shasank Bonthala, Shubham Kaushal, Travis Tran, Trinity Pham, Viswesh Uppalapati, Yu-Chieh Chen, Yuanjia Yang, Yung-Chieh Chan, Yuri Bukhradze, Yuru Zhou, Yuxiao Ran, Yuxuan Fan, Zirui Wang
UCSD DSC 30 WI21/SP21/S221/FA21
evaluation / website

Programming techniques including encapsulation, abstract data types, interfaces, algorithms and complexity, and data structures such as stacks, queues, priority queues, heaps, linked lists, binary trees, binary search trees, and hash tables, taught in Java.

Programming and Basic Data Structures for Data Science (x2)


Marina Langlois; Aaron Chan, Amy Nguyen, Darren Liu, Haihao Sun, Huaning Liu, Huy Trinh, Jacqueline Lee, James Yu, Jeffrey Chu, Jianming Geng, Madeline Tjoa, Ruixuan Zhang, Sharmi Mathur, Shubham Kaushal, Siddharth Saha, Xiangyi Kong, Yijun Liu, Yu-Chieh Chen, Yung-Chieh Chan, Yuri Bukhradze, Yuru Zhou, Yuxiao Ran, Yuxuan Fan, Zirui Wang
UCSD DSC 20 FA20/WI21
website

Programming techniques including recursion, higher-order functions, function composition, object-oriented programming, interpreters, classes, and simple data structures such as arrays, lists, and linked lists.

Principles of Data Science


Justin Eldridge; Anna Liu, Anqi Wang, Dylan Lee, Jeffrey Chu, Jessica Guzman, Meiwen Liu, Ruojia Tao, Shubham Kaushal, Teresa Lee, Xiaowang Huang, Xuzhe Zhi, Yuanjia Yang, Yi Li, Zirui Wang
UCSD DSC 10 S121
website

This introductory course develops the computational thinking and tools necessary to answer questions arising from large-scale datasets. It emphasizes an end-to-end approach to data science and introduces programming techniques in Jupyter Notebook covering data processing, modeling, and analysis.




Other Projects

These include coursework, projects, and other research-related work not intended for publication. Contents to be updated (10/11/2022). To remind myself what to add: DSC 180 Capstone, DSC 190 Data Mining, COGS 108, MATH 189, DSC 102, DSC 106, COGS 189, DataHacks Adv, DataHacks Bus, Tencent, SS (CMU).

On the Domain Robustness with Prompt & Prefix Tuning


Zirui Wang*, Lechuan Wang*, Yutong Luo
Data Science Undergraduate Capstone, 2022
paper / slides / code

We analyze the robustness of language models tuned with prompt tuning and prefix tuning under domain shift (i.e., learning a task from data in a specific domain and evaluating the model on the same task with out-of-domain data).

EEG Transformer


Zirui Wang, Xing Hong, Luning Yang, Annie Fan, Yunyi Huang, Zixin Ma
COGS 189: Brain Computer Interfaces, UCSD, 2022
slides / code

We implement a naive EEG Transformer that explores the possibility of using a ViT-based transformer to infer 3-class motor imagery from multichannel time-series EEG data recorded at 1000 Hz for 8 seconds (of which 4 seconds are used), and we propose future directions.


Design and source code from Jon Barron's website. Forked from the Jekyll variant by Leonid Keselman.