[llm] lazy-load batch stage and processor submodules #62861

Merged
kouroshHakha merged 2 commits into ray-project:master from kouroshHakha:lazy-batch-stages
Apr 27, 2026
@kouroshHakha kouroshHakha commented Apr 22, 2026

Why

ray.llm._internal.batch.stages and ray.llm._internal.batch.processor previously imported every stage / engine submodule eagerly from their __init__.py. Several of those submodules pull in heavy optional dependencies (transformers, vllm, sglang, mistral_common, huggingface_hub, ...).

As a result, even importing a lightweight piece like HttpRequestProcessorConfig — which only needs an aiohttp client and a few pydantic models — was loading the entire ML stack. On a typical machine that meant ~7s of import latency, a multi-hundred-MB process footprint, and a hard ImportError whenever any of those optional deps were not installed.

Closes #52632

Concretely, before this change:

import sys, time
t = time.perf_counter()
import ray.llm._internal.batch.processor.http_request_proc  # noqa
print(f"{time.perf_counter() - t:.2f}s")
print("transformers loaded:", "transformers" in sys.modules)
print("vllm.transformers_utils loaded:", "vllm.transformers_utils" in sys.modules)
print("sglang loaded:", "sglang" in sys.modules)
print("mistral_common loaded:", "mistral_common" in sys.modules)

prints something like:

7.46s
transformers loaded: True
vllm.transformers_utils loaded: True
sglang loaded: True
mistral_common loaded: True

After this change the same script prints ~1.1s and all four flags are False.

What

Convert both __init__.py files to PEP 562 __getattr__ lazy re-exports:

  • Cheap symbols (StatefulStage, Processor, ProcessorBuilder, ProcessorConfig, the various *StageConfig classes) stay eagerly imported because they have no heavy transitive deps and are used everywhere.
  • Engine-specific stage / processor classes are listed in a small _LAZY_ATTRS map and resolved on first attribute access via __getattr__. The result is then cached in globals() so subsequent lookups are free.
  • __dir__ is overridden to keep tab-completion / introspection working.
  • A TYPE_CHECKING block re-exports the lazy names statically so type checkers / IDEs continue to see them.
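The four points above can be sketched end to end with a throwaway package. This is a minimal illustration, not the actual Ray `__init__.py`: the package name `lazy_demo_pkg`, the submodules `base` and `heavy_proc`, and the class `HeavyProcessor` are all hypothetical stand-ins, written to a temporary directory so the example runs on its own.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical __init__.py following the PEP 562 pattern described above.
INIT_SRC = textwrap.dedent('''
    import importlib
    from typing import TYPE_CHECKING

    # Cheap symbol: imported eagerly.
    from .base import Processor

    # Lazy symbols: attribute name -> relative submodule that defines it.
    _LAZY_ATTRS = {"HeavyProcessor": ".heavy_proc"}

    if TYPE_CHECKING:
        # Static re-export so type checkers / IDEs still see the name.
        from .heavy_proc import HeavyProcessor

    def __getattr__(name):
        submodule = _LAZY_ATTRS.get(name)
        if submodule is None:
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
        value = getattr(importlib.import_module(submodule, __name__), name)
        globals()[name] = value  # cache: later lookups bypass __getattr__
        return value

    def __dir__():
        return sorted(set(globals()) | set(_LAZY_ATTRS))
''')

# Write the demo package to a temp dir and make it importable.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "lazy_demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(INIT_SRC)
(pkg_dir / "base.py").write_text("class Processor:\n    pass\n")
(pkg_dir / "heavy_proc.py").write_text("class HeavyProcessor:\n    pass\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

pkg = importlib.import_module("lazy_demo_pkg")
heavy_loaded_before = "lazy_demo_pkg.heavy_proc" in sys.modules
heavy_cls = pkg.HeavyProcessor  # first access triggers the submodule import
heavy_loaded_after = "lazy_demo_pkg.heavy_proc" in sys.modules
```

Importing the package alone leaves `heavy_proc` out of `sys.modules`; the first attribute access imports it and stores the class in the package namespace, so `__getattr__` is not consulted for that name again.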

Side-effect note

Each *_proc.py calls ProcessorBuilder.register(...) at import time. With this change the registration happens the first time the corresponding config is accessed via the package, which is exactly when a user constructs that config and then calls ProcessorBuilder.build, so the registry is populated in time for every realistic use. This is exercised in the tests — see test_lazy_imports.py and the existing ProcessorBuilder.build(HttpRequestProcessorConfig(...)) path.
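The timing of that registration can be illustrated with a toy registry. The sketch below is an assumption-laden stand-in, not Ray's `ProcessorBuilder`: the package `lazy_registry_demo`, the `REGISTRY` dict, and the `register` / `build` functions are hypothetical and only mimic the import-time side effect described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical three-file package: a registry, a lazy __init__, and one
# processor module that registers itself when it is imported.
FILES = {
    "base.py": (
        "REGISTRY = {}\n"
        "def register(config_cls, builder):\n"
        "    REGISTRY[config_cls] = builder\n"
        "def build(config):\n"
        "    return REGISTRY[type(config)](config)\n"
    ),
    "http_proc.py": (
        "from .base import register\n"
        "class HttpConfig:\n"
        "    pass\n"
        "# Import-time side effect, analogous to ProcessorBuilder.register(...)\n"
        "register(HttpConfig, lambda cfg: 'http-processor')\n"
    ),
    "__init__.py": (
        "import importlib\n"
        "from .base import REGISTRY, build\n"
        "_LAZY_ATTRS = {'HttpConfig': '.http_proc'}\n"
        "def __getattr__(name):\n"
        "    if name not in _LAZY_ATTRS:\n"
        "        raise AttributeError(name)\n"
        "    mod = importlib.import_module(_LAZY_ATTRS[name], __name__)\n"
        "    return getattr(mod, name)\n"
    ),
}

root = Path(tempfile.mkdtemp())
pkg_dir = root / "lazy_registry_demo"
pkg_dir.mkdir()
for filename, source in FILES.items():
    (pkg_dir / filename).write_text(source)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

reg_pkg = importlib.import_module("lazy_registry_demo")
registered_before = len(reg_pkg.REGISTRY)  # nothing registered yet
cfg = reg_pkg.HttpConfig()                 # attribute access imports http_proc
registered_after = len(reg_pkg.REGISTRY)   # registration has now run
built = reg_pkg.build(cfg)
```

In this sketch the registry is keyed by the config class, and that class can only be obtained by importing the module that registers it, so `build` always finds an entry. A lookup that did not go through the config class (for example, by string name) would not trigger the import here.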

Test plan

  • New regression test file python/ray/llm/tests/batch/cpu/processor/test_lazy_imports.py (17 cases, all green) pinning the new behavior:
    • importing HttpRequestProcessorConfig must not pull transformers, vllm, sglang, mistral_common, tokenizers, huggingface_hub or any non-HTTP stage / processor submodule into sys.modules (verified in a clean subprocess).
    • importing HttpRequestStage from the stages package must only load http_request_stage.py and not any other stage submodule.
    • every lazy attr resolves to the right class from the right submodule.
    • unknown attributes raise AttributeError (so hasattr etc. behave correctly).
    • dir(pkg) lists all lazy attrs.
  • Pre-existing python/ray/llm/tests/batch/cpu/processor/ and python/ray/llm/tests/batch/cpu/stages/ tests pass with this change identically to baseline (73 passed, same 13 pre-existing env-skew failures unrelated to this change).
  • Sanity-checked the public API end-to-end: from ray.llm._internal.batch import HttpRequestProcessorConfig, then ProcessorBuilder.build(cfg) builds an HTTP processor with the expected stages.
  • pre-commit passes on all changed files (black, ruff, pydoclint, import order, etc.).
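The clean-subprocess check from the first bullet can be sketched as a small helper. This is an illustrative approximation, not the contents of `test_lazy_imports.py`: the function name `loaded_modules` is hypothetical, and the demonstration uses standard-library modules so it runs without Ray installed.

```python
import json
import subprocess
import sys
from typing import Dict, List


def loaded_modules(import_stmt: str, candidates: List[str]) -> Dict[str, bool]:
    """Run `import_stmt` in a fresh interpreter and report which of
    `candidates` ended up in that interpreter's sys.modules."""
    script = (
        "import json, sys\n"
        f"{import_stmt}\n"
        f"print(json.dumps({{m: m in sys.modules for m in {candidates!r}}}))\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", script],
        check=True,
        capture_output=True,
        text=True,
    )
    # Use the last line in case the import itself printed something.
    return json.loads(proc.stdout.strip().splitlines()[-1])


# Stand-in demonstration: importing csv should not drag in wave.
report = loaded_modules("import csv", ["csv", "wave"])
```

For the behavior pinned in this PR, the same helper would presumably be called with `from ray.llm._internal.batch import HttpRequestProcessorConfig` as the statement and the heavy dependencies (`transformers`, `vllm`, `sglang`, `mistral_common`, ...) as candidates, asserting that every value is `False`. Running in a subprocess matters because modules imported earlier in the test session would otherwise already be in `sys.modules`.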

Made with Cursor

Made-with: Cursor
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Made-with: Cursor
@kouroshHakha kouroshHakha requested a review from a team as a code owner April 22, 2026 21:50

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements lazy loading for batch processors and stages using a PEP 562 module-level __getattr__ to avoid unnecessary loading of heavy ML dependencies. It also includes regression tests to ensure that importing lightweight components does not trigger heavy imports and that attribute resolution works correctly. I have no feedback to provide.

Made-with: Cursor
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@ray-gardener ray-gardener Bot added the "serve" (Ray Serve Related Issue) label Apr 23, 2026
@kouroshHakha kouroshHakha added the "go" (add ONLY when ready to merge, run all tests) label Apr 23, 2026
@kouroshHakha kouroshHakha self-assigned this Apr 23, 2026
}


def __getattr__(name):

This is quite similar to the __getattr__ in stages/__init__.py. Consider refactoring both into a shared utility.
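One possible shape for such a shared utility is a factory that returns both module hooks. This is only a sketch of the suggestion: `make_lazy_hooks` is a hypothetical name that does not exist in Ray, and the demonstration attaches the hooks to an in-memory module with a standard-library target so it runs on its own.

```python
import importlib
import types
from typing import Any, Callable, Dict, List, Tuple


def make_lazy_hooks(
    package: str,
    module_globals: Dict[str, Any],
    lazy_attrs: Dict[str, str],
) -> Tuple[Callable[[str], Any], Callable[[], List[str]]]:
    """Build PEP 562 __getattr__ / __dir__ hooks for one package.

    `lazy_attrs` maps attribute name -> module path; relative paths
    (".foo") are resolved against `package`.
    """

    def __getattr__(name: str) -> Any:
        target = lazy_attrs.get(name)
        if target is None:
            raise AttributeError(f"module {package!r} has no attribute {name!r}")
        value = getattr(importlib.import_module(target, package), name)
        module_globals[name] = value  # cache for subsequent lookups
        return value

    def __dir__() -> List[str]:
        return sorted(set(module_globals) | set(lazy_attrs))

    return __getattr__, __dir__


# Demonstration on an in-memory module; "Fraction" is resolved lazily
# from the standard-library fractions module.
demo = types.ModuleType("demo_pkg")
demo.__getattr__, demo.__dir__ = make_lazy_hooks(
    "demo_pkg", vars(demo), {"Fraction": "fractions"}
)
listed_before_access = "Fraction" in dir(demo)
cached_before_access = "Fraction" in vars(demo)
fraction_cls = demo.Fraction
cached_after_access = "Fraction" in vars(demo)
```

Under this design each `__init__.py` would keep only its own `_LAZY_ATTRS` map plus one line such as `__getattr__, __dir__ = make_lazy_hooks(__name__, globals(), _LAZY_ATTRS)`.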

@kouroshHakha kouroshHakha merged commit b3eb203 into ray-project:master Apr 27, 2026
8 checks passed
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…ct#62861)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Labels

go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue

2 participants