lvogel04/aiperf

AIPerf


AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution. It reports detailed metrics in a command-line display and in extensive benchmark performance reports.

AIPerf UI Dashboard

Quick Start

pip install aiperf

aiperf profile \
  --model Qwen/Qwen3-0.6B \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --concurrency 10 \
  --request-count 100 \
  --streaming
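
For scripted runs, the same invocation can be assembled from Python. The sketch below is a convenience wrapper and not part of AIPerf: `build_profile_command` is a hypothetical helper name, and it uses only the flags shown in the Quick Start command above.

```python
import shlex


def build_profile_command(model, url, concurrency, request_count,
                          endpoint_type="chat", streaming=True):
    """Return the Quick Start `aiperf profile` invocation as an argv list.

    Hypothetical helper: only flags that appear in the Quick Start are used.
    """
    argv = [
        "aiperf", "profile",
        "--model", model,
        "--url", url,
        "--endpoint-type", endpoint_type,
        "--concurrency", str(concurrency),
        "--request-count", str(request_count),
    ]
    if streaming:
        argv.append("--streaming")
    return argv


if __name__ == "__main__":
    cmd = build_profile_command("Qwen/Qwen3-0.6B", "http://localhost:8000", 10, 100)
    # Print a shell-quoted form; pass `cmd` to subprocess.run(cmd) to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when model names or URLs contain special characters.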

Features

  • Scalable multiprocess architecture with 9 services communicating via ZMQ
  • 3 UI modes: dashboard (real-time TUI), simple (progress bars), none (headless)
  • Multiple benchmarking modes: concurrency, request-rate, request-rate with max concurrency, trace replay
  • Extensible plugin system for endpoints, datasets, transports, and metrics
  • Public dataset support including ShareGPT and custom formats
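
To make the concurrency benchmarking mode concrete, here is a minimal single-process sketch of the idea: a fixed number of requests is issued while a cap limits how many are in flight at once. This is an illustration only and not how AIPerf is implemented (AIPerf uses multiple services communicating via ZMQ). The names `run_with_concurrency` and `fake_send` are hypothetical, and `fake_send` stands in for a real inference request.

```python
import asyncio


async def run_with_concurrency(num_requests, concurrency, send):
    """Call `send` num_requests times, keeping at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(i):
        nonlocal in_flight, peak
        async with semaphore:
            # Track how many requests are active to verify the cap holds.
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await send(i)
            finally:
                in_flight -= 1

    # gather() returns results in submission order.
    results = await asyncio.gather(*(worker(i) for i in range(num_requests)))
    return results, peak


async def fake_send(i):
    # Stand-in for a real inference request: sleeps briefly and echoes the index.
    await asyncio.sleep(0.001)
    return i


def demo(num_requests=100, concurrency=10):
    return asyncio.run(run_with_concurrency(num_requests, concurrency, fake_send))
```

In a request-rate mode, by contrast, the load generator would schedule requests at fixed or randomized intervals instead of waiting for a free slot; the optional maximum concurrency then acts as a cap like the semaphore here.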

Supported APIs

  • OpenAI chat completions, completions, embeddings, audio, images
  • NIM embeddings, rankings

Tutorials and Feature Guides

Getting Started

Load Control and Timing

Workloads and Data

Endpoint Types

Analysis and Monitoring

Documentation

| Document | Purpose |
|----------|---------|
| Architecture | Three-plane architecture, core components, credit system, data flow |
| CLI Options | Complete command and option reference |
| Metrics Reference | All metric definitions, formulas, and requirements |
| Environment Variables | All AIPERF_* configuration variables |
| Plugin System | Plugin architecture, 25+ categories, creation guide |
| Creating Plugins | Step-by-step plugin tutorial |
| Accuracy Benchmarks | Accuracy evaluation stubs and datasets |
| Benchmark Modes | Trace replay and timing modes |
| Server Metrics | Prometheus-compatible server metrics collection |
| Tokenizer Auto-Detection | Pre-flight tokenizer detection |
| Dataset Synthesis API | Synthesis module API reference |
| Code Patterns | Code examples for services, models, messages, plugins |
| Migrating from Genai-Perf | Migration guide and feature comparison |
| Design Proposals | Enhancement proposals and discussions |

Contributing

See CONTRIBUTING.md for development setup, coding conventions, and contribution guidelines.

Known Issues

  • Output sequence length constraints (--output-tokens-mean) cannot be guaranteed unless you pass ignore_eos and/or min_tokens via --extra-inputs to an inference server that supports them.
  • Very high concurrency settings (typically >15,000) may lead to port exhaustion on some systems. Adjust system limits or reduce concurrency if connection failures occur.
  • Startup errors caused by invalid configuration settings can cause AIPerf to hang indefinitely. Terminate the process and check configuration settings.
  • Copying selected text may not work reliably in the dashboard UI. Use the c key to copy all logs.
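
For unattended runs, one way to guard against the startup hang described above is to launch the benchmark with a wall-clock timeout. The sketch below uses only the Python standard library; `run_with_timeout` is a hypothetical helper, and the sleeping child process is a stand-in for an `aiperf profile` command.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run `argv`; return its exit code, or None if it was killed on timeout."""
    try:
        completed = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Demonstration with a stand-in child that sleeps longer than the timeout.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(sleeper, timeout_s=1))
```

This sketch only terminates the top-level process it started. Because AIPerf runs multiple services, any processes it spawned may need separate cleanup, and the timeout should be chosen well above the expected benchmark duration.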
