┌────────────────────────────────────────┐
│ research -- thinking, reasoning models │
└────────────────────────────────────────┘
I study how large language models perform multi-step reasoning and how training and post-training methods can improve their reliability, efficiency, and scalability.
My work focuses on the post-training stack for LLMs: supervised fine-tuning (SFT), preference optimization, reinforcement learning methods such as RLVR, and inference-time compute strategies that improve reasoning without requiring larger models.
I'm also interested in the interpretability of reasoning models: understanding the internal mechanisms that support multi-step reasoning and diagnosing failures such as shortcut reasoning, reward hacking, and unfaithful chain-of-thought.
I'm currently building and open-sourcing implementations of reasoning-focused training pipelines and contributing to LLM infrastructure and post-training frameworks.
* I love SpaceX rockets *



