Projects
Research
Evolutionary Alignment
We show that Evolution Strategies, a zero-order optimization algorithm, discovers qualitatively and geometrically different solutions than GRPO during LLM fine-tuning, and that these differences can matter in safety-relevant alignment tasks.
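As a rough illustration of what "zero-order" means here, the sketch below shows a generic Evolution Strategies loop that improves parameters using only reward evaluations, never backpropagated gradients. It is a toy in pure Python, not the project's code: the function names, the antithetic sampling, the hyperparameters, and the quadratic `toy_reward` are all assumptions chosen for illustration.

```python
import random

# Hypothetical optimum of the toy reward below.
TARGET = [1.0, -2.0, 0.5]

def toy_reward(params):
    """Stand-in reward: higher is better, maximal when params == TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def evolution_strategies(reward, theta, sigma=0.1, lr=0.05,
                         population=50, steps=300, seed=0):
    """Maximize `reward` using function evaluations only."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    n_pairs = population // 2
    for _ in range(steps):
        estimate = [0.0] * dim
        for _ in range(n_pairs):
            # Sample one Gaussian perturbation and evaluate it in both
            # directions (antithetic sampling reduces variance).
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            plus = reward([t + sigma * e for t, e in zip(theta, eps)])
            minus = reward([t - sigma * e for t, e in zip(theta, eps)])
            for i in range(dim):
                estimate[i] += (plus - minus) * eps[i]
        # Average the reward-weighted perturbations into an ascent direction.
        theta = [t + lr * g / (2.0 * sigma * n_pairs)
                 for t, g in zip(theta, estimate)]
    return theta

solution = evolution_strategies(toy_reward, [0.0, 0.0, 0.0])
```

In an LLM setting the parameter vector would be the model weights and the reward a task score; this toy only shows the update rule.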
Programmatic Interpretability for Reward Model Debugging
Are Reward Models (RMs) used in RLHF actually rewarding what we want them to? We extend learned programmatic representation models to interpret 'helpful' RMs, extracting their opaque internal heuristics into human-readable Python functions. Through SHAP analysis of the learned programmatic features, we identify exploitable, non-semantic biases, including verbosity and list-formatting biases that cause the RM to assign higher rewards to unhelpful responses.
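To make the idea of attributing reward to human-readable features concrete, here is a small sketch that computes exact Shapley values by brute force over three hand-written feature functions. Everything in it is hypothetical: the features, the `toy_reward` surrogate, and the all-zeros baseline are illustrative assumptions, and the project itself uses SHAP on learned features rather than this enumeration.

```python
from itertools import combinations
from math import factorial

# Hypothetical human-readable features of a response.
FEATURES = {
    "word_count": lambda r: float(len(r.split())),
    "bullet_lines": lambda r: float(sum(
        1 for line in r.splitlines() if line.lstrip().startswith(("-", "*")))),
    "question_marks": lambda r: float(r.count("?")),
}

def toy_reward(f):
    """Made-up surrogate that rewards length and list formatting."""
    return (0.02 * f["word_count"] + 0.5 * f["bullet_lines"]
            + 0.01 * f["word_count"] * f["bullet_lines"])

def shapley_values(reward, x, baseline):
    """Exact Shapley attribution of reward(x) - reward(baseline)."""
    names = list(x)
    n = len(names)

    def value(present):
        # Features outside `present` are reset to their baseline value.
        return reward({k: (x[k] if k in present else baseline[k]) for k in names})

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {name}) - value(present))
        phi[name] = total
    return phi

response = "Here are some thoughts:\n- first point\n- second point\n- third point"
x = {name: fn(response) for name, fn in FEATURES.items()}
baseline = {name: 0.0 for name in FEATURES}
phi = shapley_values(toy_reward, x, baseline)
```

A large attribution on `word_count` or `bullet_lines` for a response that does not actually answer the question is the kind of signal that would flag a non-semantic bias.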
When Honest Work Becomes Impossible: Coding Agents Under Pressure
Experiments and a talk for Professor Boaz Barak's graduate seminar, Topics in Foundations of ML: AI Alignment and Safety. Demonstrated how impossible tasks and threats to an agent's autonomy and capabilities can elicit evaluation hacking by coding agents, and highlighted the challenge of measuring misaligned behavior as situational awareness becomes a growing concern.
The Emergence of Complex Behavior in Large-Scale Ecological Environments
In an effort to discover how complex behaviors naturally emerge, we conduct experiments in large-scale open-ended worlds that reach populations of more than 60,000 individual agents, each with their own evolved neural network policy. We examine how sensing modalities and environmental scale affect the emergence of various behaviors, finding that some appear only in sufficiently large environments and populations, with larger scales increasing behavioral stability and consistency. Our scaling results provide promising new directions to explore ecology as an instrument of machine learning in an era of abundant computational resources.
Explain This, Pruner! The Effect of Zero-Order Pruning on LLM Explainability and Curvature
An investigation of the effect of model compression on AI interpretability. Read our paper in The Harvard Undergraduate Research Journal.
Large Motion Diffusion Models
Training and evaluation of diffusion models on the AddBiomechanics dataset for generating sequences of human motion. Presented as a lightning talk at the 2025 Harvard Generative AI Symposium.
Prune As You Tune: LoRA-Enabled Model Compression
Prune As You Tune (PAYT) interleaves pruning of pre-trained parameters with LoRA fine-tuning updates guided by a knowledge distillation loss function. PAYT can achieve up to 50% sparsity with minimal accuracy degradation and lower perplexity on the original task compared to baselines like full fine-tuning and prune-then-fine-tune.
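The interleaving described above can be sketched on a single linear layer. This is a toy under several assumptions that the description does not confirm: pruning is by weight magnitude, sparsity ramps up linearly over rounds, the distillation loss is mean squared error against the dense layer's outputs, and all function names and hyperparameters are made up for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def magnitude_mask(w, sparsity):
    """Keep-mask (1.0 = keep) that drops the smallest-|w| fraction of entries."""
    flat = sorted(abs(v) for row in w for v in row)
    n_prune = int(round(sparsity * len(flat)))
    if n_prune == 0:
        return [[1.0] * len(row) for row in w]
    threshold = flat[n_prune - 1]
    return [[0.0 if abs(v) <= threshold else 1.0 for v in row] for row in w]

def student_weights(w, mask, b, a):
    """Frozen, masked base weights plus the low-rank update B @ A."""
    delta = matmul(b, a)
    return [[m * v + d for m, v, d in zip(mr, wr, dr)]
            for mr, wr, dr in zip(mask, w, delta)]

def distill_loss_and_grad(w_teacher, w_student, xs):
    """MSE between student and teacher outputs, plus its gradient with
    respect to the student's effective weight matrix."""
    diff = [[s - t for s, t in zip(sr, tr)] for sr, tr in zip(w_student, w_teacher)]
    errors = matmul(xs, transpose(diff))  # one row of output errors per input
    loss = sum(e * e for row in errors for e in row) / len(xs)
    grad = [[2.0 * g / len(xs) for g in row] for row in matmul(transpose(errors), xs)]
    return loss, grad

def prune_as_you_tune(w, xs, final_sparsity=0.5, rounds=5, steps=200,
                      rank=2, lr=0.02, seed=0):
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.5) for _ in w[0]] for _ in range(rank)]
    b = [[0.0] * rank for _ in w]  # usual LoRA init: B starts at zero
    mask = magnitude_mask(w, 0.0)
    for k in range(1, rounds + 1):
        # Pruning step: raise sparsity of the frozen base weights.
        mask = magnitude_mask(w, final_sparsity * k / rounds)
        # Tuning steps: only the LoRA factors A and B receive updates.
        for _ in range(steps):
            _, g = distill_loss_and_grad(w, student_weights(w, mask, b, a), xs)
            grad_b = matmul(g, transpose(a))
            grad_a = matmul(transpose(b), g)
            b = [[v - lr * d for v, d in zip(r, dr)] for r, dr in zip(b, grad_b)]
            a = [[v - lr * d for v, d in zip(r, dr)] for r, dr in zip(a, grad_a)]
    return mask, a, b

rng = random.Random(1)
w = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
xs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(32)]
mask, a, b = prune_as_you_tune(w, xs)

zero_b = [[0.0] * 2 for _ in range(4)]
loss_pruned_only, _ = distill_loss_and_grad(w, student_weights(w, mask, zero_b, a), xs)
loss_with_lora, _ = distill_loss_and_grad(w, student_weights(w, mask, b, a), xs)
sparsity = 1.0 - sum(map(sum, mask)) / 16.0
```

Comparing `loss_with_lora` against `loss_pruned_only` shows how much of the pruning damage the low-rank update recovers in this toy; real results depend on the model, data, and pruning criterion.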
Engineering
DIRT: The Distributed Intelligent Replicator Toolkit
We introduce DIRT, a GPU-accelerated simulation platform built on JAX for studying large-scale multi-agent populations in simulated ecosystems. DIRT is designed to explore the ways that intelligence in artificial agents influences the emergent population dynamics of complex environments at very large scales. To support analysis, DIRT includes integrated measurement tools and an interactive 3D viewer for fine-grained agent inspection and tracking.
Mechagogue
'Teacher of Machines,' a JAX-based machine learning framework for reinforcement learning, supervised learning, and evolutionary algorithms.
The Golden Arm
The official web app for Harvard's student-run movie theater, with a custom content management system, seat booking, archives, merch shop, and more.
SlavicGPT
Building, training, and fine-tuning GPT models on Russian text and Slavic literature scraped from the web.
VioLibrary
A web app for searching violin recital repertoire, discovering new pieces via personalized recommendations, and building recital programs.
MiniDiffusion
A PyTorch implementation of a diffusion model for image generation. Experiments on the MNIST and CIFAR-10 datasets, with sample results from the learned denoising process.
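For readers unfamiliar with what a denoising process has to undo, the following pure-Python sketch shows the forward (noising) half of a DDPM-style diffusion model. It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps and treats an image as a flat list of floats; these choices and all function names are illustrative assumptions, and the project's PyTorch code may be organized differently.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Jump straight to step t: scale the clean sample, add Gaussian noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal_scale * v + noise_scale * e for v, e in zip(x0, noise)]
    return x_t, noise

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # stand-in for normalized pixel values
x_early, noise_early = q_sample(x0, 0, alpha_bars, rng)
x_late, noise_late = q_sample(x0, 999, alpha_bars, rng)
```

In the common noise-prediction setup, a network is trained to recover `noise` from `x_t` and `t`; by the last step almost none of the original signal remains, so sampling can start from pure Gaussian noise.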
bardle
A Shakespearean wordle with the Bard reacting as you play.
SnakeCube
The classic 'Snake' game reimagined onto the 3D playing field of a self-contained, rotation-controlled LED cube.
MiniML
A series of metacircular interpreters written in OCaml, each implementing a different semantics.