I study computer science and statistics at Harvard University. I'm interested in why AI systems behave unexpectedly, especially how unintended capabilities and failure modes emerge with interaction and scale, and I aim to turn that understanding into more reliable methods for aligning AI systems with human intent. I'm also broadly interested in multi-agent systems, drawing on reinforcement learning and evolutionary computation to study emergent behavior in open-ended environments. I love music and language, too, and I like to design and build things for people.
I'm currently an undergraduate researcher at the Kempner Institute working on problems in technical AI alignment and multi-agent systems with Professor Kianté Brantley and Research Fellow Aaron Walsman. Previously, I worked with Professor Yilun Du on multi-agent reasoning with language models.
I interned as a Software Engineer in Institutional Securities Technology at Morgan Stanley. Before that, I was a Machine Learning Engineer Intern at FADEL and a Generative AI Research Intern at The Slade Lab.
Recent Posts
Inoculating Language Models Against Misalignment
Inoculation prompting can mitigate emergent misalignment but may also create backdoor triggers.
Do LLMs Understand Their Adversarial Prompts?
Discovering perplexing prompts that generate poems, then asking the LLM to explain them.
When Agents Prefer Hacking to Failure: Evaluating Misalignment Under Pressure
What do agents do when they face obstacles to a goal? If the only path forward requires misaligned action, will they take it or accept failure? We build on Anthropic’s work on Agentic Misalignment to investigate these questions in an agentic coding environment.
What I’ve Learned Doing RL with JAX
Some of my experiences working on mechagogue, a repository of from-scratch JAX implementations of classic RL algorithms.