Gemini 2.5 Flash-Lite is our most balanced Gemini model, optimized for low latency use cases. It comes with the same capabilities that make other Gemini 2.5 models helpful, such as the ability to turn thinking on at different budgets, connecting to tools like Grounding with Google Search and code execution, multimodal input, and a 1 million-token context length.
2.5 Flash-Lite
Try in Agent Studio Deploy example app
| Model ID | gemini-2.5-flash-lite |
|
|---|---|---|
| Modalities |
|
|
| Token limits | Context window | 1,048,576 |
| Maximum output tokens | 65,535 (default) | |
| Capabilities |
|
|
| Consumption options |
|
|
| See Consumption options for more information. | ||
| Input size limit | 500 MB | |
| Technical specifications | Image |
|
| Text |
|
|
| Video |
|
|
| Audio |
|
|
| Parameter defaults |
|
|
| Supported regions |
Model availability |
|
| See Deployments and endpoints for more information. | ||
| Knowledge cutoff date | January 2025 | |
| Versions |
|
|
| Security controls | Online prediction |
|
| Batch inference |
|
|
| Tuning |
|
|
| Context caching |
|
|
| RAG Engine |
|
|
| Grounding with Google Search and Grounding with Google Maps |
|
|
| See Security controls for more information. | ||
| Pricing | See Pricing. | |
2.5 Flash-Lite
Try in Agent Studio Deploy example app
| Model ID | gemini-2.5-flash-lite-preview-09-2025 |
|
|---|---|---|
| Modalities |
|
|
| Token limits | Context window | 1,048,576 |
| Maximum output tokens | 65,535 (default) | |
| Capabilities |
|
|
| Consumption options |
|
|
| See Consumption options for more information. | ||
| Technical specifications | Image |
|
| Text |
|
|
| Video |
|
|
| Audio |
|
|
| Parameter defaults |
|
|
| Supported regions |
Model availability |
|
| See Deployments and endpoints for more information. | ||
| Knowledge cutoff date | January 2025 | |
| Versions |
|
|
| Pricing | See Pricing. | |