I am a Member of Technical Staff at Anthropic. Previously, I worked as a Machine Learning Engineer at Apple, focusing on efficient LLMs, efficient NeRFs, and 3D Hand-Object Interaction.
In 2022, I obtained my Master's degree in Robotics (MSR) from the Robotics Institute at
Carnegie Mellon University. At CMU, I worked with Prof. Kris Kitani on computer vision research,
focusing specifically on hand-object interaction.
I hold dual Bachelor's degrees: a B.S. in Computer Science from the University of
Michigan - Ann Arbor, and a B.Eng. in Electrical & Computer Engineering from Shanghai
Jiao Tong University. At UM, I was advised by Prof. David Fouhey, working on object
articulation detection, cloud geographical location prediction, and 3D hand pose
forecasting. I was also advised by Prof. Jeffrey A. Fessler on medical image
reconstruction with deep learning.
My research interests are computer vision, generative models, and multimodal machine learning. Recently, I have been interested in understanding human
activity, reconstructing 3D objects and scenes, and learning to interact with the world. I am also particularly interested in self-supervised and unsupervised learning that exploits prior knowledge such as
temporal information, geometry, multimodal consistency, and physical constraints.
Publications
Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers