WattGPU is a framework for predicting the energy and latency characteristics of Large Language Model (LLM) inference on GPUs without requiring profiling or hardware access.
The repository contains the code, data processing pipeline, and models from the paper "WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs", presented at the 1st Workshop on Sustainability and Resource-Efficiency of Artificial Intelligence @ IJCAI 2026.
WattGPU predicts:
- Mean GPU power draw during inference
- Inter-Token Latency (ITL)
using only:
- Public GPU specifications
- Public LLM metadata
The models generalize to unseen GPUs and unseen LLMs, enabling energy-aware deployment decisions before running experiments.
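As a rough illustration of how the two predicted quantities could feed a deployment decision, the sketch below combines mean power and ITL into an energy-per-token estimate (energy = power × time) and ranks candidate GPUs by it. The function names, GPU labels, and numbers are hypothetical placeholders, not part of the WattGPU code or its results, and the estimate assumes one sequence is decoded at a time; with batching, the energy would be shared across concurrent sequences.

```python
# Illustrative only: the GPU labels and numbers below are made-up placeholders,
# not WattGPU predictions or measurements.

def energy_per_token_joules(mean_power_w: float, itl_s: float) -> float:
    """Energy per generated token (J) = mean power (W) * inter-token latency (s)."""
    if mean_power_w < 0 or itl_s < 0:
        raise ValueError("power and latency must be non-negative")
    return mean_power_w * itl_s


def rank_by_energy(predictions: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort candidates by estimated energy per token, lowest first."""
    scored = [
        (name, energy_per_token_joules(power_w, itl_s))
        for name, (power_w, itl_s) in predictions.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical (mean power in W, ITL in s) pairs for two candidate GPUs.
candidates = {
    "gpu_a": (300.0, 0.020),
    "gpu_b": (150.0, 0.050),
}

ranking = rank_by_energy(candidates)
for name, joules in ranking:
    print(f"{name}: {joules:.2f} J/token")
```

In this made-up example the GPU with the higher power draw still uses less energy per token because its latency is much lower, which is why both predictions are needed together.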
If you use our work, please cite it.
- WattGPU.ipynb: Main notebook containing:
  - Data preprocessing
  - Feature engineering
  - Model training
  - Evaluation
  - Reproduction of the paper results
- requirements.txt: Python dependencies
- Data used in the experiments, including the subset of Watt Counts used for training and evaluation.
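One common way to test generalization to unseen hardware is to hold out every row belonging to one GPU, train on the rest, and repeat for each GPU. The sketch below shows that split in plain Python. It is an assumption about how such an evaluation could be set up, not a description of the protocol in WattGPU.ipynb; the helper name, row layout, and labels are hypothetical.

```python
from collections import defaultdict


def leave_one_group_out(rows, group_key):
    """Yield (held_out_group, train_rows, test_rows), holding out one group at a time."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    for held_out in sorted(groups):
        test_rows = groups[held_out]
        train_rows = [
            row
            for group, members in groups.items()
            if group != held_out
            for row in members
        ]
        yield held_out, train_rows, test_rows


# Placeholder rows; real rows would also carry features and targets.
rows = [
    {"gpu": "gpu_a", "llm": "llm_x"},
    {"gpu": "gpu_a", "llm": "llm_y"},
    {"gpu": "gpu_b", "llm": "llm_x"},
    {"gpu": "gpu_c", "llm": "llm_y"},
]

splits = list(leave_one_group_out(rows, "gpu"))
for held_out, train_rows, test_rows in splits:
    # The held-out GPU must never appear in the training rows.
    assert all(row["gpu"] != held_out for row in train_rows)
    print(held_out, len(train_rows), len(test_rows))
```

Passing `"llm"` as `group_key` gives the analogous split for unseen LLMs.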
    git clone <repository-url>
    cd wattgpu

We recommend Python 3.12.

    uv venv --python 3.12

Activate the environment on Linux/macOS:

    source .venv/bin/activate

Or on Windows:

    .venv\Scripts\activate

Install the dependencies:

    uv pip install -r requirements.txt

Start Jupyter Lab:

    jupyter lab

Then open:

    WattGPU.ipynb
The notebook contains the complete pipeline used in the paper.
License: Apache 2.0.