A collection of notebooks and guides for deploying, fine-tuning, and using NVIDIA Nemotron-3-Ultra.
Nemotron-3-Ultra is a 550B total / 55B active-parameter hybrid Mamba-Transformer MoE model for long-running agentic workflows across coding, research, and enterprise tasks. The usage cookbooks cover hosted agent harness configuration, multi-GPU deployment, LoRA fine-tuning, and RL post-training.
- vllm_cookbook.ipynb - Deploy Nemotron-3-Ultra with vLLM.
- sglang_cookbook.ipynb - Deploy Nemotron-3-Ultra with SGLang.
- trtllm_cookbook.ipynb - Deploy Nemotron-3-Ultra with TensorRT-LLM.
- SparkDeploymentGuide - Deploy Nemotron-3-Ultra across a 4x DGX Spark cluster with vLLM, then benchmark it with NVIDIA AIPerf.
- RL - Full-weight RL training with DAPO/GRPO, including direct NeMo RL and NeMo Gym variants.
- lora-text2sql/nemo-automodel - LoRA fine-tuning recipe for Text2SQL using NeMo AutoModel.
- lora-text2sql/nemo-megatron-bridge - LoRA fine-tuning recipe for Text2SQL using NeMo Megatron-Bridge.
- OpenScaffoldingResources - Config-based guides for using Nemotron-3-Ultra with agentic coding tools via OpenRouter and build.nvidia.com.
- build.nvidia.com: nvidia/nemotron-3-ultra-550b-a55b
- Hugging Face: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
- Technical report: NVIDIA Nemotron 3 Ultra Technical Report