I study computer science and statistics at Harvard University. I'm interested in understanding why AI systems behave unexpectedly, especially how unintended capabilities and failure modes emerge with interaction and scale. I aim to better understand AI systems in order to develop more reliable methods for aligning them with human intent. I'm also broadly interested in multi-agent systems, drawing from reinforcement learning and evolutionary computation to study emergent behavior in open-ended environments. I love music and language, too. I also like to design and build things for people.

I'm currently an undergraduate researcher at the Kempner Institute working on problems in technical AI alignment and multi-agent systems with Professor Kianté Brantley and Research Fellow Aaron Walsman. Previously, I worked with Professor Yilun Du on multi-agent reasoning with language models.

I interned as a software engineer in Institutional Securities Technology at Morgan Stanley. Before that, I was a machine learning engineering intern at FADEL and a generative AI research intern at The Slade Lab.

Recent Posts

What I’ve learned doing RL with JAX

8 minute read

Some of what I learned while building mechagogue, a reinforcement learning repository with from-scratch JAX implementations of classic RL algorithms.

Research

Evolutionary Alignment

We show that Evolution Strategies, a zero-order optimization algorithm, discovers solutions during LLM fine-tuning that differ qualitatively and geometrically from those found by GRPO, and that these differences can matter in safety-relevant alignment tasks.

Programmatic Interpretability for Reward Model Debugging

Are Reward Models (RMs) used in RLHF actually rewarding what we want them to? We extend learned programmatic representations to interpret 'helpful' RMs, distilling their opaque internal heuristics into human-readable Python functions. Through SHAP analysis of the learned programmatic features, we identify exploitable, non-semantic biases, including verbosity and list-formatting biases that cause the RM to assign higher rewards to unhelpful responses.

When Honest Work Becomes Impossible: Coding Agents Under Pressure

Experiments and talk for Professor Boaz Barak's graduate seminar, Topics in Foundations of ML: AI Alignment and Safety. Demonstrated how impossible tasks and threats to an agent's autonomy and capabilities can elicit evaluation hacking by coding agents. Highlighted the challenges of measuring misaligned behavior, with models' growing situational awareness as a particular concern.

Engineering

DIRT: The Distributed Intelligent Replicator Toolkit

We introduce DIRT, a GPU-accelerated simulation platform built on JAX for studying large-scale multi-agent populations in simulated ecosystems. DIRT is designed to explore how the intelligence of artificial agents shapes the emergent population dynamics of complex environments at very large scales. To support analysis, DIRT includes integrated measurement tools and an interactive 3D viewer for fine-grained agent inspection and tracking.

Mechagogue

'Teacher of Machines,' a JAX-based machine learning framework for reinforcement learning, supervised learning, and evolutionary algorithms.

The Golden Arm

The official web app for Harvard's student-run movie theater, featuring a custom content management system, seat booking, archives, a merch shop, and more.