Skip to content

make get_current_servable_instance async#55457

Merged
zcin merged 1 commit into
masterfrom
abrar-fastapi
Aug 12, 2025
Merged

make get_current_servable_instance async#55457
zcin merged 1 commit into
masterfrom
abrar-fastapi

Conversation

@abrarsheikh

@abrarsheikh abrarsheikh commented Aug 10, 2025

Copy link
Copy Markdown
Contributor

This PR updates the injected dependency used in FastAPI routes to be asynchronous. Specifically, it modifies the implementation of get_current_servable_instance to be an async def so that FastAPI awaits it instead of offloading it to a threadpool. This change avoids unnecessary threadpool overhead and improves request performance.

Performance Comparison Table (With fix vs Master)

Metric With fix Master
# Requests 22,429 12,970
# Fails 0 0
Median (ms) 250 340
95%ile (ms) 310 430
99%ile (ms) 350 480
Average (ms) 248.04 338.4
Min (ms) 119 156
Max (ms) 392 662
Average size (bytes) 15 15
Current RPS 400.2 295.7
Current Failures/s 0 0

evidence that fastapi dependency injection is using threadpool

master

image

after fix

image

app used for benchmark

app = FastAPI()

@serve.deployment(max_ongoing_requests=10000)
@serve.ingress(app)
class MyDeployment:
    @app.get("/")
    async def request(self):
        return "Hello, world!"


app = MyDeployment.bind()

run using serve run app:app

device used: m6a.2xlarge
Concurrency: 100

Signed-off-by: abrar <abrar@anyscale.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @abrarsheikh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've updated the get_current_servable_instance dependency injection in FastAPI routes to be asynchronous. This change prevents FastAPI from offloading this function to a threadpool, which significantly reduces overhead and improves request performance, as demonstrated by the provided performance metrics.

Highlights

  • Asynchronous Dependency Injection: The get_current_servable_instance function, used as a dependency in FastAPI routes, has been converted to an async def function.
  • Performance Improvement: This modification avoids unnecessary threadpool usage by FastAPI for this dependency, leading to improved request performance, higher requests per second (RPS), and lower latency (median, 95th, and 99th percentile).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Aug 10, 2025

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly changes get_current_servable_instance to an async function. This is a solid performance optimization for FastAPI-based deployments, as it avoids the overhead of running the dependency in a threadpool. The change is well-explained in the description and backed by performance data. The implementation is simple, correct, and has no side effects. I approve this change.

@abrarsheikh abrarsheikh marked this pull request as ready for review August 11, 2025 16:50
@abrarsheikh abrarsheikh requested a review from a team as a code owner August 11, 2025 16:50

@zcin zcin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@zcin zcin merged commit 5d05513 into master Aug 12, 2025
5 checks passed
@zcin zcin deleted the abrar-fastapi branch August 12, 2025 18:38
dioptre pushed a commit to sourcetable/ray that referenced this pull request Aug 20, 2025
This PR updates the injected dependency used in FastAPI routes to be
asynchronous. Specifically, it modifies the implementation of
get_current_servable_instance to be an async def so that FastAPI awaits
it instead of offloading it to a threadpool. This change avoids
unnecessary threadpool overhead and improves request performance.

### Performance Comparison Table (With fix vs Master)

| Metric                   | With fix       | Master         |
|--------------------------|----------------|----------------|
| # Requests               | 22,429         | 12,970         |
| # Fails                  | 0              | 0              |
| Median (ms)              | 250            | 340            |
| 95%ile (ms)              | 310            | 430            |
| 99%ile (ms)              | 350            | 480            |
| Average (ms)             | 248.04         | 338.4          |
| Min (ms)                 | 119            | 156            |
| Max (ms)                 | 392            | 662            |
| Average size (bytes)     | 15             | 15             |
| Current RPS              | 400.2          | 295.7          |
| Current Failures/s       | 0              | 0              |

evidence that fastapi dependency injection is using threadpool

### master

<img width="2560" height="594" alt="image"
src="https://github.com/user-attachments/assets/90091fc7-c93d-49b1-bd0a-a2449122f548"
/>

### after fix

<img width="2050" height="466" alt="image"
src="https://github.com/user-attachments/assets/9ca04860-ec0b-4d4a-9e32-efe5afc77d30"
/>

### app used for benchmark

```python
app = FastAPI()

@serve.deployment(max_ongoing_requests=10000)
@serve.ingress(app)
class MyDeployment:
    @app.get("/")
    async def request(self):
        return "Hello, world!"

app = MyDeployment.bind()
```
run using `serve run app:app`

device used: `m6a.2xlarge`
Concurrency: `100`

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Andrew Grosser <dioptre@gmail.com>
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
This PR updates the injected dependency used in FastAPI routes to be
asynchronous. Specifically, it modifies the implementation of
get_current_servable_instance to be an async def so that FastAPI awaits
it instead of offloading it to a threadpool. This change avoids
unnecessary threadpool overhead and improves request performance.

### Performance Comparison Table (With fix vs Master)

| Metric                   | With fix       | Master         |
|--------------------------|----------------|----------------|
| # Requests               | 22,429         | 12,970         |
| # Fails                  | 0              | 0              |
| Median (ms)              | 250            | 340            |
| 95%ile (ms)              | 310            | 430            |
| 99%ile (ms)              | 350            | 480            |
| Average (ms)             | 248.04         | 338.4          |
| Min (ms)                 | 119            | 156            |
| Max (ms)                 | 392            | 662            |
| Average size (bytes)     | 15             | 15             |
| Current RPS              | 400.2          | 295.7          |
| Current Failures/s       | 0              | 0              |

evidence that fastapi dependency injection is using threadpool

### master

<img width="2560" height="594" alt="image"
src="https://github.com/user-attachments/assets/90091fc7-c93d-49b1-bd0a-a2449122f548"
/>

### after fix

<img width="2050" height="466" alt="image"
src="https://github.com/user-attachments/assets/9ca04860-ec0b-4d4a-9e32-efe5afc77d30"
/>

### app used for benchmark

```python
app = FastAPI()

@serve.deployment(max_ongoing_requests=10000)
@serve.ingress(app)
class MyDeployment:
    @app.get("/")
    async def request(self):
        return "Hello, world!"

app = MyDeployment.bind()
```
run using `serve run app:app`

device used: `m6a.2xlarge`
Concurrency: `100`

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
This PR updates the injected dependency used in FastAPI routes to be
asynchronous. Specifically, it modifies the implementation of
get_current_servable_instance to be an async def so that FastAPI awaits
it instead of offloading it to a threadpool. This change avoids
unnecessary threadpool overhead and improves request performance.

### Performance Comparison Table (With fix vs Master)

| Metric                   | With fix       | Master         |
|--------------------------|----------------|----------------|
| # Requests               | 22,429         | 12,970         |
| # Fails                  | 0              | 0              |
| Median (ms)              | 250            | 340            |
| 95%ile (ms)              | 310            | 430            |
| 99%ile (ms)              | 350            | 480            |
| Average (ms)             | 248.04         | 338.4          |
| Min (ms)                 | 119            | 156            |
| Max (ms)                 | 392            | 662            |
| Average size (bytes)     | 15             | 15             |
| Current RPS              | 400.2          | 295.7          |
| Current Failures/s       | 0              | 0              |

evidence that fastapi dependency injection is using threadpool

### master

<img width="2560" height="594" alt="image"
src="https://github.com/user-attachments/assets/90091fc7-c93d-49b1-bd0a-a2449122f548"
/>

### after fix

<img width="2050" height="466" alt="image"
src="https://github.com/user-attachments/assets/9ca04860-ec0b-4d4a-9e32-efe5afc77d30"
/>

### app used for benchmark

```python
app = FastAPI()

@serve.deployment(max_ongoing_requests=10000)
@serve.ingress(app)
class MyDeployment:
    @app.get("/")
    async def request(self):
        return "Hello, world!"

app = MyDeployment.bind()
```
run using `serve run app:app`

device used: `m6a.2xlarge`
Concurrency: `100`

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

3 participants