shaheennabi/README.md

Thanks for tuning in 👋






Who I am

╔════════════════════════════════════════╗
║ research -- thinking, reasoning models ║
╚════════════════════════════════════════╝


I study how large language models perform multi-step reasoning and how training and post-training methods can improve their reliability, efficiency, and scalability.

My work focuses on the post-training stack for LLMs: supervised fine-tuning (SFT), preference optimization, reinforcement learning methods such as RLVR, and inference-time compute strategies that improve reasoning without requiring larger models.

I'm also interested in the interpretability of reasoning models: understanding the internal mechanisms that support multi-step reasoning and diagnosing failures such as shortcut reasoning, reward hacking, and unfaithful chain-of-thought.

I am currently building and open-sourcing implementations of reasoning-focused training pipelines, and contributing to LLM infrastructure and post-training frameworks.


* I love SpaceX rockets *

Pinned

  1. open-posttraining-system Public

    Open-source research engineering project for building the end-to-end post-training stack for reasoning language models, including SFT, preference learning, RLHF/RLVR, evaluation, inference-time sca…

    Jupyter Notebook 4 2

  2. Olmo3-from-scratch Public

    “A clean, from-scratch implementation of the OLMo architecture with KV caching, RoPE, and an efficient autoregressive inference pipeline. Designed as a minimal yet extensible foundation for post-tr…

    Jupyter Notebook 5

  3. Production-Ready-Instruction-Finetuning-of-Meta-Llama-3.2-3B-Instruct-Project Public

    Instruction Fine-Tuning of Meta Llama 3.2-3B Instruct on Kannada Conversations. Tailoring the model to follow specific instructions in Kannada, enhancing its ability to generate relevant, context-a…

    Jupyter Notebook 27 6

  4. Production-Ready-TripPlanner-Multi-AI-Agents-Project Public

    ✈️🌍 Production-Ready TripPlanner Multi-AI Agent Project: Transform your travel planning with AI-driven assistance! From discovering dream destinations, creating custom itineraries, exploring avenue…

    Jupyter Notebook 74 16

  5. transformers Public

    Forked from huggingface/transformers

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    Python 1

  6. rlvr_grpo-experiment-with-math500 Public

    A small experiment repository comparing a base reasoning model against RLVR-GRPO checkpoints on the Math500 dataset. It includes evaluation results, short-form observations, and a local temp_clone …

    Jupyter Notebook 1