[llm] lazy-load batch stage and processor submodules #62861
Merged
The ``ray.llm._internal.batch.stages`` and
``ray.llm._internal.batch.processor`` packages previously imported every
stage / engine submodule eagerly from their ``__init__.py``. Several of
those submodules pull in heavy optional dependencies (``transformers``,
``vllm``, ``sglang``, ``mistral_common``, ``huggingface_hub`` etc.).
As a result, even importing a lightweight piece like
``HttpRequestProcessorConfig`` -- which only needs an ``aiohttp`` client and
a few pydantic models -- was loading the entire ML stack. On a typical
machine that meant ~7s of import latency, a multi-hundred-MB process
footprint, and a hard ``ImportError`` whenever any of those optional deps
were not installed.
This change converts both ``__init__.py`` files to PEP 562
``__getattr__`` lazy re-exports:
* Cheap symbols (``StatefulStage``, ``Processor``, ``ProcessorBuilder``,
``ProcessorConfig``, the various ``*StageConfig`` classes) stay
eagerly imported because they have no heavy transitive deps and are
used everywhere.
* Engine-specific stage / processor classes are listed in a small
``_LAZY_ATTRS`` map and resolved on first attribute access via
``__getattr__``. The result is then cached in ``globals()`` so
subsequent lookups are free.
* ``__dir__`` is overridden to keep tab-completion working.
* A ``TYPE_CHECKING`` block re-exports the lazy names statically so
type checkers / IDEs continue to see them.
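The lazy re-export pattern in the bullets above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the PR's code: the package is built in memory with ``types.ModuleType``, the names ``build_lazy_package`` and ``demo_pkg`` are hypothetical, and two standard-library modules stand in for the heavy engine submodules. In a real ``__init__.py`` the two hooks would be plain module-level functions and the cache write would go to ``globals()``.

```python
import importlib
import types

# Hypothetical map: public name -> (module path, attribute name).
# Stdlib modules stand in for the heavy engine submodules.
_LAZY_ATTRS = {
    "JSONDecoder": ("json.decoder", "JSONDecoder"),
    "Fraction": ("fractions", "Fraction"),
}


def build_lazy_package(name):
    """Create an in-memory module that resolves _LAZY_ATTRS on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # Only called when normal lookup in the module dict fails (PEP 562).
        try:
            module_path, attr_name = _LAZY_ATTRS[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_path), attr_name)
        # Cache the result so later lookups never reach __getattr__ again.
        pkg.__dict__[attr] = value
        return value

    def __dir__():
        # Advertise lazy names too, so dir() and tab-completion see them.
        return sorted(set(pkg.__dict__) | set(_LAZY_ATTRS))

    # Assigning on the module instance stores the hooks in the module dict,
    # which is where PEP 562 looks for them.
    pkg.__getattr__ = __getattr__
    pkg.__dir__ = __dir__
    return pkg


demo_pkg = build_lazy_package("demo_pkg")
cached_before = "Fraction" in vars(demo_pkg)   # not resolved yet
half = demo_pkg.Fraction(1, 2)                 # first access triggers the import
cached_after = "Fraction" in vars(demo_pkg)    # now cached in the module dict
```

Raising ``AttributeError`` for unknown names matters: it is what keeps ``hasattr`` and ``from pkg import name`` fallbacks behaving normally.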
Side-effect note: each ``*_proc.py`` calls ``ProcessorBuilder.register``
at import time. With this change the registration happens the first time
the corresponding config is accessed via the package, which is exactly
when a user constructs that config and calls ``ProcessorBuilder.build``,
so the registry is populated in time for every realistic use.
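The timing of that side effect can be reproduced with a toy package. The sketch below is an assumption-laden stand-in, not Ray code: ``toy_batch``, ``registry.py``, ``http_proc.py``, ``HttpConfig`` and ``build_http`` are all hypothetical names, and ``register`` only plays the role that ``ProcessorBuilder.register`` plays in the PR. It shows that the registry stays empty until the config class is first read from the package.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical toy package written to a temp dir; none of these names exist in Ray.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "toy_batch"
pkg_dir.mkdir()

(pkg_dir / "registry.py").write_text(textwrap.dedent("""\
    BUILDERS = {}

    def register(config_cls, builder):
        BUILDERS[config_cls] = builder
"""))

(pkg_dir / "http_proc.py").write_text(textwrap.dedent("""\
    from .registry import register

    class HttpConfig:
        pass

    def build_http(config):
        return "http-processor"

    # Import-time side effect, standing in for ProcessorBuilder.register.
    register(HttpConfig, build_http)
"""))

(pkg_dir / "__init__.py").write_text(textwrap.dedent("""\
    import importlib

    _LAZY_ATTRS = {"HttpConfig": ".http_proc"}

    def __getattr__(name):
        if name not in _LAZY_ATTRS:
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
        module = importlib.import_module(_LAZY_ATTRS[name], __name__)
        value = getattr(module, name)
        globals()[name] = value
        return value
"""))

sys.path.insert(0, str(root))
import toy_batch
from toy_batch.registry import BUILDERS

registered_before = len(BUILDERS)   # importing the package registered nothing
config_cls = toy_batch.HttpConfig   # first access imports http_proc.py
registered_after = len(BUILDERS)    # register() ran as an import side effect
processor = BUILDERS[config_cls](config_cls())
```

Because a caller has to read the config class from the package before it can construct one, the registration has always run by the time a build is requested.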
Adds ``test_lazy_imports.py`` to pin the new behaviour: importing
``HttpRequestProcessorConfig`` must not pull ``transformers``, ``vllm``,
``sglang``, ``mistral_common`` or any non-HTTP stage / processor
submodule into ``sys.modules``.
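A check of this kind is only meaningful in a fresh interpreter, since the test process itself may already have imported the heavy modules. The helper below is a hedged sketch of that idea rather than the contents of ``test_lazy_imports.py``: ``modules_loaded_by`` is a hypothetical name, and the demonstration uses standard-library modules so that it runs without Ray installed.

```python
import ast
import subprocess
import sys


def modules_loaded_by(statement, candidates):
    """Run `statement` in a fresh interpreter; map each candidate to True if imported."""
    # The probe imports only `sys`, so it does not pollute the measurement.
    probe = (
        "import sys\n"
        f"{statement}\n"
        f"print({{name: name in sys.modules for name in {list(candidates)!r}}})\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # The probe prints a dict literal on its last line; parse it safely.
    return ast.literal_eval(result.stdout.strip().splitlines()[-1])


# Stand-in demonstration: `import json` loads json.decoder but not colorsys.
loaded = modules_loaded_by("import json", ["json.decoder", "colorsys"])
```

In a Ray environment the same helper could be called with the statement ``from ray.llm._internal.batch import HttpRequestProcessorConfig`` and the names ``transformers``, ``vllm``, ``sglang`` and ``mistral_common``, asserting that every value is ``False``; that call is not executed here.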
Made-with: Cursor
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Code Review
This pull request implements lazy loading for batch processors and stages using PEP 562 ``__getattr__`` to avoid unnecessary loading of heavy ML dependencies. It also includes regression tests to ensure that importing lightweight components does not trigger heavy imports and that attribute resolution works correctly. I have no feedback to provide.
jeffreywang88
approved these changes
Apr 27, 2026
}

def __getattr__(name):
This is quite similar to ``__getattr__`` in ``stages/__init__.py``. Consider refactoring both into a shared utility.
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…ct#62861) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Why
``ray.llm._internal.batch.stages`` and ``ray.llm._internal.batch.processor`` previously imported every stage / engine submodule eagerly from their ``__init__.py``. Several of those submodules pull in heavy optional dependencies (``transformers``, ``vllm``, ``sglang``, ``mistral_common``, ``huggingface_hub``, ...).
As a result, even importing a lightweight piece like ``HttpRequestProcessorConfig`` -- which only needs an ``aiohttp`` client and a few pydantic models -- was loading the entire ML stack. On a typical machine that meant ~7s of import latency, a multi-hundred-MB process footprint, and a hard ``ImportError`` whenever any of those optional deps were not installed.
Closes #52632
Concretely, before this change:
prints something like:
After this change the same script prints ``~1.1s`` and all four flags are ``False``.
What
Convert both ``__init__.py`` files to PEP 562 ``__getattr__`` lazy re-exports:
* Cheap symbols (``StatefulStage``, ``Processor``, ``ProcessorBuilder``, ``ProcessorConfig``, the various ``*StageConfig`` classes) stay eagerly imported because they have no heavy transitive deps and are used everywhere.
* Engine-specific stage / processor classes are listed in a small ``_LAZY_ATTRS`` map and resolved on first attribute access via ``__getattr__``. The result is then cached in ``globals()`` so subsequent lookups are free.
* ``__dir__`` is overridden to keep tab-completion / introspection working.
* A ``TYPE_CHECKING`` block re-exports the lazy names statically so type checkers / IDEs continue to see them.
Side-effect note
Each ``*_proc.py`` calls ``ProcessorBuilder.register(...)`` at import time. With this change the registration happens the first time the corresponding config is accessed via the package, which is exactly when a user constructs that config and then calls ``ProcessorBuilder.build``, so the registry is populated in time for every realistic use. This is exercised in the tests: see ``test_lazy_imports.py`` and the existing ``ProcessorBuilder.build(HttpRequestProcessorConfig(...))`` path.
Test plan
* ``python/ray/llm/tests/batch/cpu/processor/test_lazy_imports.py`` (17 cases, all green) pinning the new behavior:
  * Importing ``HttpRequestProcessorConfig`` must not pull ``transformers``, ``vllm``, ``sglang``, ``mistral_common``, ``tokenizers``, ``huggingface_hub`` or any non-HTTP stage / processor submodule into ``sys.modules`` (verified in a clean subprocess).
  * Importing ``HttpRequestStage`` from the ``stages`` package must only load ``http_request_stage.py`` and not any other stage submodule.
  * Unknown names raise ``AttributeError`` (so ``hasattr`` etc. behave correctly).
  * ``dir(pkg)`` lists all lazy attrs.
* The ``python/ray/llm/tests/batch/cpu/processor/`` and ``python/ray/llm/tests/batch/cpu/stages/`` tests pass with this change identically to baseline (73 passed, same 13 pre-existing env-skew failures unrelated to this change).
* ``from ray.llm._internal.batch import HttpRequestProcessorConfig``, then ``ProcessorBuilder.build(cfg)`` builds an HTTP processor with the expected stages.
* ``pre-commit`` passes on all changed files (black, ruff, pydoclint, import order, etc.).
Made with Cursor