Projects
Research
When Honest Work Becomes Impossible: Coding Agents Under Pressure
Experiments and talk for Professor Boaz Barak's graduate seminar, Topics in Foundations of ML: AI Alignment and Safety. Demonstrated how impossible tasks and threats to an agent's autonomy and capabilities lead to evaluation hacking by coding agents. Highlighted the challenges of measuring misaligned behavior, with models' growing situational awareness an emerging concern.
The Emergence of Complex Behavior in Large-Scale Ecological Environments
In an effort to discover how complex behaviors naturally emerge, we conduct experiments in large-scale open-ended worlds that reach populations of more than 60,000 individual agents, each with its own evolved neural network policy. We examine how sensing modalities and environmental scale affect the emergence of various behaviors, finding that some appear only in sufficiently large environments and populations, with larger scales increasing behavioral stability and consistency. Our scaling results provide promising new directions for exploring ecology as an instrument of machine learning in an era of abundant computational resources.
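To give a flavor of the setup, here is a minimal sketch of a population of per-agent policies stored as one batched weight tensor and updated by mutation and truncation selection. All names, shapes, and the explicit generational loop are simplifying assumptions for illustration; the actual experiments evolve policies through in-world reproduction rather than a generational update.

```python
import jax
import jax.numpy as jnp

POP, OBS_DIM, ACT_DIM = 1024, 8, 2   # toy sizes; the experiments reach 60,000+ agents

def init_population(key):
    # One small linear policy per agent, stored as a single batched weight tensor.
    return 0.1 * jax.random.normal(key, (POP, OBS_DIM, ACT_DIM))

@jax.jit
def evolve(key, weights, fitness, mutation_std=0.02):
    # Truncation selection: the fitter half of the population survives and reproduces
    # with Gaussian mutation, keeping the population size constant.
    order = jnp.argsort(-fitness)                 # best first
    parents = weights[order[: POP // 2]]
    children = parents + mutation_std * jax.random.normal(key, parents.shape)
    return jnp.concatenate([parents, children], axis=0)

key = jax.random.PRNGKey(0)
weights = init_population(key)
fitness = jnp.zeros(POP)   # in practice, derived from survival and reproduction in the world
key, sub = jax.random.split(key)
weights = evolve(sub, weights, fitness)
```

Keeping the entire population in a single array is what makes tens of thousands of concurrently evolving agents practical on one GPU.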
Explain This, Pruner! The Effect of Zero-Order Pruning on LLM Explainability and Curvature
An investigation of the effect of model compression on AI interpretability. Read our paper in The Harvard Undergraduate Research Journal.
Large Motion Diffusion Models
Training and evaluation of diffusion models on the AddBiomechanics dataset for generating sequences of human motion. Find our lightning talk at the 2025 Harvard Generative AI Symposium here.
Prune As You Tune: LoRA-Enabled Model Compression
Prune As You Tune (PAYT) interleaves pruning of pre-trained parameters with LoRA fine-tuning updates guided by a knowledge distillation loss. PAYT can achieve up to 50% sparsity with minimal accuracy degradation, and it reaches lower perplexity on the original task than baselines such as full fine-tuning and prune-then-fine-tune.
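A minimal PyTorch-style sketch of the interleaving idea, assuming a Hugging-Face-style model whose base weights are frozen and whose LoRA adapters are the only trainable parameters; the function names, schedule, and model interface below are illustrative, not the released implementation.

```python
import torch
import torch.nn.functional as F

def magnitude_prune_(weight, sparsity):
    # Zero the smallest-magnitude fraction of a frozen pre-trained weight matrix in place.
    k = int(sparsity * weight.numel())
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.masked_fill_(weight.abs() <= threshold, 0.0)

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Soft-label distillation: the pruned student matches the unpruned teacher's distribution.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature**2

def payt_step(student, teacher, lora_opt, input_ids, step, total_steps,
              final_sparsity=0.5, prune_every=100):
    # One interleaved step: a LoRA update guided by distillation, then (periodically)
    # a little more magnitude pruning of the frozen base weights.
    student_logits = student(input_ids).logits
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits
    loss = kd_loss(student_logits, teacher_logits)
    loss.backward()
    lora_opt.step()
    lora_opt.zero_grad()
    if step % prune_every == 0:
        sparsity = final_sparsity * min(1.0, step / total_steps)  # ramp toward 50%
        with torch.no_grad():
            for name, p in student.named_parameters():
                if "lora" not in name and p.dim() == 2:
                    magnitude_prune_(p, sparsity)
    return loss.item()
```

Because the pre-trained weights are frozen, entries zeroed by pruning stay zero without extra mask bookkeeping; only the LoRA adapters continue to adapt to the distillation signal.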
Engineering
DIRT: The Distributed Intelligent Replicator Toolkit
We introduce DIRT, a GPU-accelerated simulation platform built on JAX for studying large-scale multi-agent populations in simulated ecosystems. DIRT is designed to explore the ways that intelligence in artificial agents influences the emergent population dynamics of complex environments at very large scales. To support analysis, DIRT includes integrated measurement tools and an interactive 3D viewer for fine-grained agent inspection and tracking.
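The core pattern can be sketched as follows, with a toy state and policy rather than DIRT's actual API (the real platform adds terrain, resources, reproduction, and the measurement tools mentioned above): the whole world lives in batched JAX arrays, so one jit-compiled step advances every agent at once on the GPU.

```python
import jax
import jax.numpy as jnp

NUM_AGENTS = 4096     # scaled to tens of thousands in practice
WORLD_SIZE = 100.0

def init(key):
    # Positions, per-agent energy, and per-agent policy weights, all as batched arrays.
    k1, k2 = jax.random.split(key)
    return {
        "pos": jax.random.uniform(k1, (NUM_AGENTS, 2), minval=0.0, maxval=WORLD_SIZE),
        "energy": jnp.ones(NUM_AGENTS),
        "weights": 0.1 * jax.random.normal(k2, (NUM_AGENTS, 2, 2)),
    }

def policy(w, obs):
    # Placeholder per-agent policy; each agent's own evolved network plugs in here.
    return jnp.tanh(obs @ w)

@jax.jit
def step(state):
    obs = state["pos"] / WORLD_SIZE                     # toy observation
    actions = jax.vmap(policy)(state["weights"], obs)   # every agent acts in parallel
    pos = jnp.clip(state["pos"] + actions, 0.0, WORLD_SIZE)
    return {**state, "pos": pos, "energy": state["energy"] - 0.01}
```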
Mechagogue
'Teacher of Machines,' a JAX-based machine learning framework for reinforcement learning, supervised learning, and evolutionary algorithms.
The Golden Arm
The official web app for Harvard's student-run movie theater, with a custom content management system, seat booking, archives, merch shop, and more.
SlavicGPT
Building, training, and fine-tuning of GPTs on Russian text and Slavic literature scraped from the web.
VioLibrary
A web app for searching violin recital repertoire, discovering new pieces via personalized recommendations, and building recital programs.
MiniDiffusion
A PyTorch implementation of a diffusion model for image generation. Experiments on the MNIST and CIFAR-10 datasets, with sample images generated by the learned denoising process.
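For reference, a sketch of the training objective such an implementation typically centers on: a standard DDPM-style noise-prediction loss. The schedule, hyperparameters, and model interface below are assumptions for illustration, not necessarily the repo's exact choices.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, x0):
    # DDPM training objective: noise each image at a random timestep, then have the
    # model (e.g. a small U-Net taking the noisy image and the timestep) predict that noise.
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)
```

Sampling then runs the learned denoiser backwards from pure noise, one timestep at a time, to produce the generated images.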
bardle
A Shakespearean Wordle with the Bard reacting as you play.
SnakeCube
The classic 'Snake' game reimagined on the 3D playing field of a self-contained, rotation-controlled LED cube.
MiniML
A series of metacircular interpreters, written in OCaml, implementing varying evaluation semantics.