Zirui "Colin" Wang

I am a second-year master's student in Computer Science at Princeton University. My research is currently advised by Prof. Danqi Chen, and I serve as a Teaching Assistant in the department. I am generally interested in enhancing large language models with multimodal capabilities.

Previously, I obtained a B.S. in Data Science from the Halicioglu Data Science Institute (HDSI) and a B.A. in Cognitive Science from the Department of Cognitive Science at the University of California, San Diego (UCSD). During my undergraduate years, I was advised by Prof. Zhuowen Tu and Prof. Zhiting Hu, working on generative models in computer vision.

Email  /  Resume  /  CV  /  GitHub  /  Twitter  /  Google Scholar  /  LinkedIn

Updates

  • 11/04/2024: Named a Top Reviewer for NeurIPS 2024
  • 10/02/2024: Invited Talk at University of Michigan, Ann Arbor on "Evaluations of Multimodal Large Language Models"
  • 09/26/2024: CharXiv accepted to NeurIPS 2024 (D&B Track) & ECCV FoMo-Eval Workshop
  • 09/20/2024: Nominated for the Siebel Scholars Class of 2025
  • 08/05/2024: Invited Talk at Google on "CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs"

Schedule

My calendar. Note that all scheduled events are tentative and subject to change.

Research

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs


Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen
Neural Information Processing Systems (NeurIPS), 2024
NeurIPS Workshop on Multimodal Algorithmic Reasoning (Spotlight), 2024
ECCV Workshop on Emergent Visual Abilities and Limits of Foundation Models, 2024
website / arxiv / code

CharXiv reveals significant shortcomings in MLLMs’ chart understanding, showing a large performance gap between models and humans.

Improving Language Understanding from Screenshots


Tianyu Gao, Zirui Wang, Adithya Bhaskar, Danqi Chen
Preprint, 2024
website / arxiv / code

We close the performance gap between screenshot language models and text-only language models on language understanding tasks with our PTP objective.

Language Models as Science Tutors


Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Simon Machado, Arturo Rodriguez Fanlo, Simon Frieder, Zirui Wang, Akshara Prabhakar, Jiachen T. Wang, Xindi Wu, Mengzhou Xia, Wenhan Xia, Jiatong Yu, Ellie Thieu, Max Aragon, Zhiyong Ren, Junjie Zhu, Toni Annala, Sanjeev Arora, Danqi Chen
International Conference on Machine Learning (ICML), 2024
website / arxiv / code

We propose TutorChat, a dataset of long synthetic dialogues about textbooks, and TutorEval, a question-answering benchmark consisting of expert-written questions about long chapters from STEM textbooks.

TokenCompose: Grounding Diffusion with Token-level Supervision


Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu
Computer Vision and Pattern Recognition (CVPR), 2024
website / arxiv / code

We introduce token-wise consistency terms between image content and object segmentation maps when training text-to-image models, enhancing multi-category instance composition and photorealism.

OmniControlNet: Dual-stage Integration for Conditional Image Generation


Yilin Wang*, Haiyang Xu*, Xiang Zhang, Zeyuan Chen, Zhizhou Sha, Zirui Wang, Zhuowen Tu
Computer Vision and Pattern Recognition (CVPR), Workshop in Generative Models for Computer Vision, 2024

We provide a two-way integration for the widely adopted ControlNet method: we integrate four external condition generation algorithms into a single dense image labeling method, and its individually trained image generation processes into a single model.

Language Models Meet World Models: Embodied Experiences Enhance Language Models


Jiannan Xiang*, Tianhua Tao*, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu
Neural Information Processing Systems (NeurIPS), 2023
website / arxiv / code

We establish a framework that effectively and efficiently finetunes a language model with embodied experience while retaining its language modeling abilities.

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning


Yifan Xu*, Nicklas Hansen*, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
International Conference on Learning Representations (ICLR), 2023
website / arxiv / code

We investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster.



Services

  • ICML: Reviewer (2024)
  • ICLR: Reviewer (2023, 2024)
  • NeurIPS: Reviewer (2024)


Teaching

A full list of, and details about, the classes in which I have served as a teaching assistant. Instructor names are listed based on the time I worked with them. Staff names are listed in alphabetical order by first name. Instructor names and staff names are separated by a semicolon. Instructor evaluations are attached where available.

NLP


Karthik Narasimhan; Anika Maskara, Ben Shi, Evan Wang, Howard Yen, Yash Parikh, Yihan Wang, Zachary Siegel, Zirui Wang
Princeton COS 484 SP24
website

An introductory class on Natural Language Processing. Topics covered include language modeling, representation learning, text classification, sequence tagging, machine translation, and Transformers, among others.

Human-Computer Interaction


Andrés Monroy-Hernández, Parastoo Abtahi; Beza Desta, Yuhan Liu, Zirui Wang
Princeton COS 436 FA23
website

Project-based class where students are introduced to the basics of HCI and conduct either a study-based project (e.g., surveys + interviews) or a system-based project (e.g., implementations). I led the HCI + AI domain.

Deep Learning


Gary Cottrell; Eric Yu, Martha Gahl, Rohin Garg, Shubham Kulkarni, Weitang Liu, Zirui Wang
UCSD CSE 151B WI22
evaluation / website

This course covers the fundamentals of neural networks. We introduce linear regression, logistic regression, perceptrons, multilayer networks and back-propagation, convolutional neural networks, recurrent networks, and deep networks trained by reinforcement learning.

Introduction to Machine Learning


Jingbo Shang; Dheeraj Mekala, Weijian Xu, Xinghan Wang, Yilun Hao, Zhaoyi Hou, Zhenyu Bi, Zirui Wang
UCSD CSE 151A SP21
evaluation / website

Topics include supervised learning methods such as k-nearest neighbor classifiers, decision trees, boosting, and perceptrons, and unsupervised learning methods such as k-means and hierarchical clustering. Beyond the algorithms themselves, the course focuses on the principles behind them.

Practical Data Science in R


Shannon Ellis; Sean Trott, Shubham Kulkarni, Zirui Wang
UCSD COGS 137 FA21
evaluation / website

An introduction to coding for data analysis using the R programming language. The course focuses on practical and applied skills in asking data-informed questions, data wrangling, data visualization, building statistical learning models, and communication.

The Practice and Application of Data Science


Justin Eldridge; Amy Nguyen, Jiaqi Feng, Murali Dandu, Nicole Brye, Ruojia Tao, Shubham Kaushal, Vineet Tallavajhala, Winston Yu, Zirui Wang
UCSD DSC 80 FA21
website

Students master the data science life-cycle and learn many of the fundamental principles and techniques of data science spanning algorithms, statistics, machine learning, visualization, and data systems.

Data Structures and Algorithms for Data Science (x4)


Marina Langlois, Aaron Fraenkel, Soohyun Liao; Amy Nguyen, Brian Wang, Huaning Liu, Jeffrey Feng, Kevin Chin, Kunyang Sun, Madeline Tjoa, Sally Poon, Sharmi Mathur, Shasank Bonthala, Shubham Kaushal, Travis Tran, Trinity Pham, Viswesh Uppalapati, Yu-Chieh Chen, Yuanjia Yang, Yung-Chieh Chan, Yuri Bukhradze, Yuru Zhou, Yuxiao Ran, Yuxuan Fan, Zirui Wang
UCSD DSC 30 WI21/SP21/S221/FA21
evaluation / website

Programming techniques including encapsulation, abstract data types, interfaces, algorithms and complexity, and data structures such as stacks, queues, priority queues, heaps, linked lists, binary trees, binary search trees, and hash tables, taught in Java.

Programming and Basic Data Structures for Data Science (x2)


Marina Langlois; Aaron Chan, Amy Nguyen, Darren Liu, Haihao Sun, Huaning Liu, Huy Trinh, Jacqueline Lee, James Yu, Jeffrey Chu, Jianming Geng, Madeline Tjoa, Ruixuan Zhang, Sharmi Mathur, Shubham Kaushal, Siddharth Saha, Xiangyi Kong, Yijun Liu, Yu-Chieh Chen, Yung-Chieh Chan, Yuri Bukhradze, Yuru Zhou, Yuxiao Ran, Yuxuan Fan, Zirui Wang
UCSD DSC 20 FA20/WI21
website

Programming techniques including recursion, higher-order functions, function composition, object-oriented programming, interpreters, classes, and simple data structures such as arrays, lists, and linked lists.

Principles of Data Science


Justin Eldridge; Anna Liu, Anqi Wang, Dylan Lee, Jeffrey Chu, Jessica Guzman, Meiwen Liu, Ruojia Tao, Shubham Kaushal, Teresa Lee, Xiaowang Huang, Xuzhe Zhi, Yuanjia Yang, Yi Li, Zirui Wang
UCSD DSC 10 S121
website

This introductory course develops the computational thinking and tools necessary to answer questions arising from large-scale datasets. It emphasizes an end-to-end approach to data science and introduces programming techniques in Jupyter Notebook covering data processing, modeling, and analysis.




Other Projects

These include coursework, projects, and other research-related work not intended for publication. Contents to be updated (10/11/2022). To remind myself what to add: DSC 180 Capstone, DSC 190 Data Mining, COGS 108, MATH 189, DSC 102, DSC 106, COGS 189, DataHacks Adv, DataHacks Bus, Tencent, SS (CMU).

On the Domain Robustness with Prompt & Prefix Tuning


Zirui Wang*, Lechuan Wang*, Yutong Luo
Data Science Undergraduate Capstone, 2022
paper / slides / code

We analyze the robustness of language models tuned with prompt tuning and prefix tuning under domain shift (i.e., learning a task from data in a specific domain and evaluating the model on the same task with out-of-domain data).

EEG Transformer


Zirui Wang, Xing Hong, Luning Yang, Annie Fan, Yunyi Huang, Zixin Ma
COGS 189: Brain Computer Interfaces, UCSD, 2022
slides / code

We implement a naive EEG Transformer that explores the possibility of using a ViT-based transformer to infer 3-class motor imagery from multichannel time-series EEG data recorded at 1000 Hz for 8 seconds (of which 4 seconds are used), and we propose future directions.


Design and source code from Jon Barron's website. Forked from the Jekyll variant by Leonid Keselman.