Add LoRA fork weight loading (pre-transformers-v5 base) by arcticfly · Pull Request #654 · OpenPipe/ART

arcticfly · 2026-04-16T22:36:05Z

Summary

Adds the pieces needed for `backend._experimental_fork_checkpoint` to actually
load the forked LoRA weights into the trainer (rather than just copying the
checkpoint directory and letting `from_pretrained` initialize a fresh LoRA).

- `UnslothState.load_lora_adapter(path)`: reads `adapter_model.safetensors` and applies it to the live peft model via `set_peft_model_state_dict`.
- `UnslothService._forked_checkpoint_dir`: records the forked path so the first `_train_dedicated` / `_train_shared` call applies it.
- `LocalBackend._experimental_fork_checkpoint`: invalidates the `_state` cache after `shutil.copytree` and records `_forked_checkpoint_dir` on the service.
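
For reviewers who want the control flow in one place, here is a minimal sketch of how the three pieces fit together. It is not the code in this diff: `SketchService`, `fork_checkpoint`, `build_state`, `train_step`, the injectable `loader` argument, and the `checkpoints/<step>` directory layout are all hypothetical stand-ins, and the sketch assumes `_state` is a `functools.cached_property`. Only `adapter_model.safetensors`, `set_peft_model_state_dict`, `shutil.copytree`, and `_forked_checkpoint_dir` come from the description above.

```python
import shutil
from functools import cached_property
from pathlib import Path


def load_lora_adapter(peft_model, checkpoint_dir):
    """Overwrite the live adapter weights with those saved in checkpoint_dir."""
    weights = Path(checkpoint_dir) / "adapter_model.safetensors"
    if not weights.is_file():
        raise FileNotFoundError(weights)
    # Imported lazily so the bookkeeping below can be exercised without
    # peft/safetensors installed.
    from peft import set_peft_model_state_dict
    from safetensors.torch import load_file

    state_dict = load_file(str(weights), device="cpu")
    # Replaces the freshly initialized LoRA tensors on the existing model.
    return set_peft_model_state_dict(peft_model, state_dict)


class SketchService:
    """Hypothetical stand-in for the service that owns the trainer state."""

    def __init__(self, output_dir, build_state, loader=load_lora_adapter):
        self.output_dir = Path(output_dir)
        self._build_state = build_state  # assumed factory for the peft model
        self._loader = loader
        self._forked_checkpoint_dir = None

    @cached_property
    def _state(self):
        # Built once, then cached in the instance __dict__.
        return self._build_state()

    def train_step(self):
        # Apply the forked weights on the first call only, then clear the marker.
        pending = self._forked_checkpoint_dir
        if pending is not None:
            self._loader(self._state, pending)
            self._forked_checkpoint_dir = None
        return self._state


def fork_checkpoint(service, source_dir, step):
    """Copy a source checkpoint, drop any cached state, and mark the fork."""
    dest = service.output_dir / "checkpoints" / f"{step:04d}"  # assumed layout
    shutil.copytree(source_dir, dest)
    # Removing the cached_property value forces a rebuild on next access.
    service.__dict__.pop("_state", None)
    service._forked_checkpoint_dir = dest
    return dest
```

Clearing `_forked_checkpoint_dir` after the first application matters: later training steps should continue from the weights being trained, not reload the forked adapter each time.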

Why the unusual base

This branch is based on commit `621e82b2` (the last commit before the transformers-v5 upgrade in #629), not current main. On H200 with `load_in_4bit=True`, transformers v5 + Unsloth 2026.3.3 crashes with a `Half and BFloat16` dtype-mismatch error in Unsloth's fused LoRA kernels on the first forward pass, before any rollouts. The v4 base avoids that.

Not expected to merge as-is — posting as a reference for the fork-weight-loading mechanics. Maintainers would likely want to:

1. Resolve the v5 dtype mismatch upstream (possibly via Unsloth), then
2. Cherry-pick the three pieces above onto main.

Test plan

- End-to-end 20-step training on a forked `kl-000-1` checkpoint: checkpoint reloaded correctly across every step, `val/reward` started at ~0.86 (source-checkpoint quality, not raw-base-model quality).
- End-to-end training without forking: unchanged behavior.
- Maintainer review of whether this approach is the right shape for a forward-port.

🤖 Generated with Claude Code

Adds three pieces needed for `LocalBackend._experimental_fork_checkpoint` to actually load the forked LoRA weights into the trainer:

1. `UnslothState.load_lora_adapter`: loads `adapter_model.safetensors` into the live peft model via `set_peft_model_state_dict`, replacing the freshly-initialized LoRA layers from `from_pretrained`.
2. `UnslothService._forked_checkpoint_dir`: stores the forked path so the first `_train_dedicated` / `_train_shared` call can apply it.
3. `backend._experimental_fork_checkpoint`: invalidates the `_state` cache after `copytree`, then records `_forked_checkpoint_dir` on the service.

Built on 621e82b (pre-transformers-v5) because v5 introduces a bf16/fp16 mismatch in Unsloth's fused LoRA kernels that crashes every forward pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

arcticfly wants to merge 1 commit into `main` from `fix/fork-on-pre-v5`
