Releases: ray-project/ray
Ray-2.56.0
Highlights
- Ray Data Stability: In this Ray release, we've added a variety of stability improvements, including running multiple datasets in a cluster, adding automatic batch size selection to CPU-based map-batches, and default logical memory configuration to prevent OOMs. We've also tightened iter_batches stability by reducing hidden buffering and shutting down the executor when consumers exit early (#63660, #63682, #62949). This reduces object-store spilling for common training workloads.
- Ray Serve: We re-architected Ray Serve LLM by decoupling request handling from the token streaming response path (#62667, #62680, #62668, #62669, #63167), resulting in significant LLM serving performance improvements. We've also introduced new routing policies such as session-sticky routing via consistent hashing with ConsistentHashRouter (#62905, #63096, #62906) and CapacityQueueRouter (#62323), which is beneficial for supply-constrained workloads.
- Ray Core: We've added GPU-domain-aware placement groups using label locality (#61442, #61614, #62487, #62533). This enables placement groups to pack bundles onto nodes that share a ray.io/gpu-domain label instead of only packing at the single-node level. We've also added initial Kubernetes in-place pod resizing support for Autoscaler v2 (#55961, #62369, #62215), enabling Ray clusters to resize CPU and memory on existing worker pods before scaling out new pods.
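To illustrate the idea behind session-sticky routing via consistent hashing, here is a minimal, self-contained sketch. It is not Ray Serve's ConsistentHashRouter; the HashRing class, the replica names, and the vnodes parameter are hypothetical and exist only for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring (hypothetical, not a Ray Serve class)."""

    def __init__(self, replicas, vnodes=64):
        # Each replica owns several virtual points on the ring.
        self._ring = sorted(
            (_hash(f"{replica}#{i}"), replica)
            for replica in replicas
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def route(self, session_id: str) -> str:
        # Pick the first virtual point clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(session_id)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["replica-a", "replica-b", "replica-c"])
first = ring.route("session-42")
second = ring.route("session-42")  # same session, same replica
```

The useful property is that removing a replica only remaps the sessions that were assigned to it; sessions on the surviving replicas keep their assignment.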
Ray Data
🎉 New Features
- Support multiple datasets per cluster via subcluster labels and resource partitioning (#63331, #63375, #63982)
- Add Dataset.mix() public API and MixOperator for weighted dataset mixing (#63168, #62450)
- New DataSourceV2 framework: ParquetDatasourceV2, chunked reader, predicate splitting, listing/scanner infra (#63113, #63454, #63163, #62975, #63027, #62182)
- Add batch_size='auto' to map_batches to derive batch row count from target row batch size (#62648)
- Implement distributed upsert for Iceberg using task-based merge algorithm, preventing performance bottleneck on driver (#63482)
- Add include_row_hash to read_parquet (#61408)
- Add JAX data iterator (#61630)
- Expose flag to run read tasks on isolated worker processes via isolate_read_workers (#63490)
- Expose flag to set default logical memory for map operators via default_map_logical_memory_enabled (#63814)
- Support predicate pushdown for Lance format (#61400)
- Support per-partition start_offset and end_offset for read_kafka (#61620)
- Add obstore async download backend for download operator (#61735)
- Support UDF retries on transient exceptions (#63023)
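As a rough picture of what weighted dataset mixing means, the sketch below interleaves plain Python iterables by sampling a source in proportion to its weight. It does not use or reproduce the Dataset.mix() API; the mix function, its seed argument, and its stop-on-exhaustion behavior are assumptions made for this example.

```python
import itertools
import random
from typing import Iterable, Iterator, Sequence


def mix(sources: Sequence[Iterable], weights: Sequence[float], seed: int = 0) -> Iterator:
    """Yield items, picking a source with probability proportional to its weight.

    Hypothetical helper: stops as soon as the chosen source is exhausted.
    """
    rng = random.Random(seed)  # seeded so the interleaving is reproducible
    iterators = [iter(source) for source in sources]
    while True:
        (i,) = rng.choices(range(len(iterators)), weights=weights)
        try:
            yield next(iterators[i])
        except StopIteration:
            return


# Draw 1000 items from two endless sources with a 9:1 weighting.
stream = mix([itertools.repeat("a"), itertools.repeat("b")], weights=[0.9, 0.1])
mixed = list(itertools.islice(stream, 1000))
```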
💫 Enhancements
- Fix iter_batches spilling by replacing make_async_gen with iter_threaded and reducing buffered batches (#63660, #63682)
- Gate restore_original_order in iter_batches behind preserve_order (#63792)
- Convert drop_columns to a Project logical operator when input schema is known (#63813)
- Make ConcatAggregation and TurbopufferDatasink use polars for sorting (#61904)
- Boost and vectorize hash_partition with sort_indices, zero-copy slices, and pandas (#63498, #62757, #63152, #62587)
- Enable GPU_SHUFFLE in grouped_data.py (#62410)
- Eager StarExpr expansion, schema inference for non-black-box UDFs, and Expressions struct support (#63776, #63387, #62560)
- Make logging configurable via RAY_DATA_LOG_LEVEL and log RAY_DATA env vars at execution start (#63487, #63380)
- Display and track logical memory in progress bar (#63379)
- Honor compute= in filter(expr=...) and deprecate concurrency= (#63576)
- Enable filter pushdown through StreamingRepartition and read stage column-rename removal (#62347, #63384, #63582)
- Cache deserialized Arrow schemas in BlockMetadataWithSchema (#63462)
- Track scheduling-loop step duration (p50/p90/max), peak USS/object-store memory, and task block locality (#63586, #63345, #63489, #63418, #62249)
- Replace TaskDurationStats and Timer with DistributionTracker (#63488, #63530, #63825)
- Introduce BlockEntry on RefBundle in place of (ref, metadata) tuples (#63654)
- Pre-resolve filesystem in threaded download to avoid IMDS herd (#62898)
- Convert logical operators to frozen dataclasses and consolidate operator base/repr (#62593, #62568, #62400, #63137, #63140, #63108)
- Non-blocking default autoscaling coordinator and resource-aware auto-downscaling (#62725, #62574)
- Release pinned blocks after dataset execution and shut down executor on early DataIterator exit (#62456, #62949)
- Optimize local shuffle with incremental index and configurable compaction threshold (#62539)
- Speed up checkpoint filter and reduce memory usage (#60294)
- Preserve Arrow types through pandas roundtrip and reorder block columns by name before schema ops (#63017, #63582)
- Block pickle object columns when reading untrusted Parquet and gate unsafe WebDataset deserialization (#63470, #63469)
- Move backpressure escape hatch across all policies (#63539)
- Update pandas, modin, and pyarrow minimum versions (#62899)
- Add utilization monitoring and correct logical resource usage for ActorPool (#61987, #61528)
- Deprecate ConcurrencyCapBackpressurePolicy, DataIterator.to_torch, and pandas UDF batches (#63392, #62540, #61733)
- Rank actors per node in a heap and avoid re-exporting actor class via .options (#62309, #62722)
- read_delta reads from preconfigured pyarrow dataset (#61721)
- Include column name and target type in ArrowConversionError; reduce arrow conversion warning verbosity (#62407, #61486, #62521)
- Show external consumer bytes in verbose operator progress log (#63728)
- Disable DataSourceV2 by default after earlier enabling (#63674, #63326)
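The p50/p90/max step-duration tracking listed above can be pictured with a small nearest-rank percentile tracker. This is only a sketch: DurationTracker is a hypothetical class, not Ray Data's DistributionTracker, and the nearest-rank method is an assumption about how percentiles might be computed.

```python
import math


class DurationTracker:
    """Collects durations and reports nearest-rank percentiles (hypothetical)."""

    def __init__(self):
        self._samples = []

    def record(self, seconds: float) -> None:
        self._samples.append(seconds)

    def percentile(self, p: int) -> float:
        # Nearest-rank: the smallest sample with at least p% of samples at or below it.
        if not self._samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    def summary(self) -> dict:
        return {
            "p50": self.percentile(50),
            "p90": self.percentile(90),
            "max": max(self._samples),
        }


tracker = DurationTracker()
for step in range(1, 11):
    tracker.record(float(step))  # pretend durations of 1.0 .. 10.0 seconds
stats = tracker.summary()
```

Sorting on every query keeps the sketch short; a production tracker would more likely use a bounded histogram or a streaming quantile estimate.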
🔨 Fixes
- Rename subcluster label key from __subcluster__ to ray-subcluster (#63982)
- Fix get_or_create_stats_actor crash in Ray Client mode (#63402)
- Fix datasource pushdown crashes for generic UDFExpr filter predicates (#63781)
- Fix hash-shuffle aggregator memory estimation: metadata propagation, node-size clamp, column pruning (#63809)
- Fix CheckpointConfig FileNotFoundError on Azure Blob Storage (#63606)
- Fix silent credential drop for fsspec-S3 in download expression (#62897)
- Fix missing f-string prefix in _concatenate_extension_column (#62939)
- Fix HashAggregate duplicate group rows for AggregateFnV2 (#63066)
- Fix JSONL read retry with advanced file cursor (#63233)
- Fix read_parquet ArrowNotImplementedError for nested column types exceeding ~2GB row group (#61824)
- Fix read_parquet nested-type fallback and parquet scanner memory accumulation (#63175, #62745)
- Fix memory leak in DataIterator.to_torch() by switching to PyArrow (#60966)
- Fix ZipOperator freeing shared blocks via _split_at_indices (#62665)
- Fix concurrent writes race condition in write_parquet (#62377)
- Fix GPU shuffle output ordering when using ShuffleStrategy.GPU_SHUFFLE (#62351)
- Fix incorrect DatasetStat uuid propagation (#62255)
- Fix none issue when DATA_ENABLE_OP_RESOURCE_RESERVATION=False (#61718)
- Fix filesystem compatibility check for fsspec-wrapped PyFileSystem (#61850)
- Forward try_create_dir to pyarrow.dataset.write_dataset (#58302)
- Fix autoscaler bug blocking timely release of leased resources (#62592)
- Ensure consistent nan_is_null/nans-as-nulls semantics in encoder (#62623, #62618)
- Skip unconditional null strip in find_partition_index (#62594)
- V1 _split_predicate_by_columns correctness fix (#63176)
- Avoid importing cudf in _is_cudf_dataframe when cudf not loaded (#62302)
- Revert raw-modulo hash partition fast path (#63097)
- Remove tfx-bsl support from read_tfrecords (#63245)
📖 Documentation
- Document isolate_read_workers for read_parquet (#63816)
- Remove docs recommending increased object store memory proportion (#63389)
- Update docs minimum version for build_processor and "auto" batch size (#61757, #62790)
- Remove outdated limitation of DefaultClusterAutoscalerV2 and stale object-store-memory warnings (#62385, #62387)
Ray Serve
🎉 New Features:
- Add custom ingress request router app interfaces and HAProxy ingress dispatch path (#62680, #62668, #62669, #62667)
- Expose choose_replica/dispatch on deployment handles and AsyncioRouter with replica-side slot reservation (#63255, #63254, #63252)
- Introduce experimental round robin router and ConsistentHashRouter for session-sticky routing (#63238, #62906, #63096, #62905)
- Central capacity queue for token-based request routing via CapacityQueueRouter (#62323)
- Add experimental ray-haproxy support behind RAY_SERVE_EXPERIMENTAL_PIP_HAPROXY (#62589)
- Add deployment actor context API and broadcast API for deployment handles (#62532, #61472)
- Add ControllerOptions for configurable controller runtime_env (#63352)
- Make rolling update percentage configurable (#62160)
- Support per-request timeout and disconnect in HTTP proxy path (#62867)
💫 Enhancements:
- HAProxy stability improvements: wait for old workers before drain, redirect stdout/stderr, redispatch+retry-on, coalesce broadcasts, quarantine released ports (#63620, #63621, #63622, #63623, #63628)
- Bind direct ingress ports to 0.0.0.0 for cross-node HAProxy routing (#62515)
- HAProxy ingress request router metrics, enable splice by default, TCP_NODELAY default 1, optional retry knobs, RAY_SERVE_HAPROXY_STATS_PORT (#63356, #63531, #63353, #63415, #62979)
- Resolve bundled ray-haproxy binary before RAY_SERVE_HAPROXY_BINARY_PATH; HAProxy abspath env var (#63829, #62610)
- Replace socat subprocess with Python socket for HAProxy admin communication; bump HAProxy to avoid CVE-2025-11230 (#61897, #62585)
- Expose controller health metrics via /api/serve/applications/ API; add max_replicas_per_node to response (#63556, #63234)
- Run health check on user ...
Ray-2.55.1
Ray-2.55.0
Ray Data
🎉 New Features
- Add DataSourceV2 API with scanner/reader framework, file listing, and file partitioning (#61220, #61615, #61997)
- Support GPU shuffle with rapidsmpf 26.2 (#61371, #62062)
- Add Kafka datasink, migrate to confluent-kafka, support datetime offsets (#60307, #61284, #60909)
- Add Turbopuffer datasink (#58910)
- Add 2-phase commit checkpointing with trie recovery and load method (#61821, #60951)
- Queue-based autoscaling policy integrated with task consumers (#59548, #60851)
- Enable autoscaling for GPU stages (#61130)
- Expressions: add random(), uuid(), cast, and map namespace support (#59656, #60695, #59879)
- Add support for Arrow native fixed-shape tensor type (#56284)
- Support writing tensors to tfrecords (#60859)
- Add pathlib.Path support to read_* functions (#61126)
- Add cudf as a batch_format (#61329)
- Allow ActorPoolStrategy for read_datasource() via compute parameter (#59633)
- Introduce ExecutionCache for streamlined caching (#60996)
- Support strict=False mode for StreamingRepartition (#60295)
- Port changes from lance-ray into Ray Data (#60497)
- Enable PyArrow compute-to-expression conversion for predicate pushdown (#61617)
- Add vLLM metrics export and Data LLM Grafana dashboard (#60385)
- Include logical memory in resource manager scheduling decisions (#60774)
- Add monotonically increasing ID support (#59290)
💫 Enhancements
- Performance: cache _map_task args, heap-based actor ranking, actor pool map improvements (#61996, #62114, #61591)
- Optimize concat tables and PyArrow schema hashing (#61315, #62108)
- Reduce default DownstreamCapacityBackpressurePolicy threshold to 50% (#61890)
- Improve reproducibility for random APIs (#59662)
- Clamp batch size to fall within C++ 32-bit int range (#62242)
- Account for external consumer object store usage in resource manager budget (#62117)
- Make get_parquet_dataset configurable in number of fragments to scan (#61670)
- Consolidate schema inference and make all preprocessors implement SerializablePreprocessorBase (#61213, #61341)
- Disable hanging issue detection by default (#62405)
- Make execution callback dataflow explicit to prevent state leakage (#61405)
- Log DataContext in JSON format at execution start for traceability (#61150, #61428)
- Autoscaler: configurable traceback, Prometheus gauges, relaxed constraints (#62210, #62209, #61917, #61385)
- Add metrics for task scheduling time, output backpressure, and logical memory (#61192, #61007, #61436)
- Prevent operators from dominating entire shared object store budget (#61605)
- Eliminate generators to avoid intermediate state pinning (#60598)
- Default log encoding to UTF-8 on Windows (#61143)
- Remove legacy BlockList, locality_with_output, old callback API, PyArrow 9.0 checks (#60575, #61044, #62055, #61483)
- Upgrade to pyiceberg 0.11.0; cap pandas to <3 (#61062, #60406)
- Refactor logical operators to frozen dataclasses (#61059, #61308, #61348, #61349, #61351, #61364, #61481)
- Prevent aggregator head node scheduling (#61288)
- Add error for local:// paths with a zero-resource head node (#60709)
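The batch-size clamp listed above guards against values that overflow a signed 32-bit integer on the C++ side. A minimal sketch of such a clamp follows; the clamp_batch_size function and the choice of 2**31 - 1 as the upper bound are assumptions for illustration, not the code from #62242.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def clamp_batch_size(batch_size: int) -> int:
    """Return batch_size limited to the signed 32-bit range (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return min(batch_size, INT32_MAX)


small = clamp_batch_size(4096)        # unchanged
huge = clamp_batch_size(10**12)       # clamped to INT32_MAX
```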
🔨 Fixes
- Fix RCE in Arrow extension type deserialization from Parquet (#62056)
- Fix StreamingSplitDataIterator.schema() (#62057)
- Fix ParquetDatasource handling of FileSystemFactory.inspect (#62065)
- Fix read_parquet file-extension filtering for versioned object-store URIs (#61376)
- Fix wide_schema_pipeline_tensors cloudpickle deserialization (#62149)
- Fix OpBufferQueue race condition (#60828)
- Fix scheduling metrics computation (#62031)
- Fix OneHotEncoder max_categories to use global top-k instead of per-partition (#60790)
- Fix ReservationOpResourceAllocator resource borrowing for ActorPoolMapOperator (#60882)
- Fix DatabricksUCDatasource schema() shadowing by schema string attribute (#61282)
- Fix AliasExpr structural equality to respect rename flag (#60711)
- Fix _align_struct_fields failure with unaligned scalar fields (#58364)
- Fix min_scheduling_resources fallback to incremental_resource_usage (#60997)
- Fix output backpressure unblocking sequence for terminal ops (#60798)
- Fix multi-input operator object store memory attribution (#61208)
- Fix reference cycle by moving to module scope (#61934)
- Fix autoscaler logging: reduce verbose output and move traceback to debug (#61989, #62126)
- Fix double counting ref_bundle + input_files (#61774)
- Replace on_exit hook with __ray_shutdown__ to fix UDF cleanup race (#61700)
- Prevent Limit from getting pushed past map_groups (#60881)
- Propagate schema in empty _shuffle_block to fix ColumnNotFound in chained left joins (#61507)
- Fix unclear metadata warning and incorrect operator name logging (#61380)
- Clamp rolling utilization averages to zero (#61543)
- Fix floating point errors in TimeWindowAverageCalculator (#61580)
- Remove default task-level timeout and clamp end_offset in Kafka datasource (#61476)
- Avoid redundant reads in train_test_split (#60274)
- Return None when no outputs have been produced (#62029)
- Replace bare raise with TypeError in string concatenation (#60795)
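The OneHotEncoder max_categories fix above is about computing the top-k categories over the whole dataset, not per partition. The sketch below shows why the two differ. It uses plain Python lists as stand-ins for partitions, and both helper functions are hypothetical.

```python
from collections import Counter


def global_top_k(partitions, k):
    """Merge counts across all partitions, then take the k most frequent values."""
    total = Counter()
    for partition in partitions:
        total.update(partition)
    return [value for value, _ in total.most_common(k)]


def per_partition_top_k(partitions, k):
    """Take the top-k inside each partition and union the results (the flawed approach)."""
    picked = set()
    for partition in partitions:
        picked.update(value for value, _ in Counter(partition).most_common(k))
    return sorted(picked)


partitions = [["a", "a", "b"], ["c", "c", "b"], ["b", "d", "d"]]
correct = global_top_k(partitions, k=1)       # "b" occurs 3 times overall
flawed = per_partition_top_k(partitions, k=1)  # misses "b" and returns 3 values
```

Here the globally most frequent value never wins inside any single partition, so the per-partition approach both drops it and returns more than k categories.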
📖 Documentation
- Add job-level checkpointing documentation (#60921)
- Update exclude_resources docs for Train autoscaling changes (#61990)
- Add locality_with_output migration instructions (#61151)
- Document max_tasks_in_flight_per_actor vs max_concurrent_batches (#60477)
- Add missing MOD operation docs; improve ray.data.Datasource docs (#60803, #59654)
- Add polars usage instructions (#60029)
Ray Serve
🎉 New Features:
- Added end-to-end gRPC client and bidirectional streaming support, including public APIs, proxy handling, proto updates, and developer docs, so Serve apps can handle streaming workloads natively instead of building custom transport layers. (#60767, #60768, #60769, #60770, #60771)
- Introduced HAProxy-based serving with fallback proxy support and load-balancer tunables, giving operators a higher-throughput ingress path and more control over traffic behavior in production. (#60586, #61180, #61271, #61468, #61988)
- Added queue-based autoscaling for async inference and Taskiq-backed workloads, so scaling decisions can account for both HTTP in-flight load and queued tasks. (#59548, #60851, #60977, #61008)
- Rolled out gang scheduling support across validation, core scheduling, fault tolerance, downscaling, autoscaling, rolling updates, and migration, enabling coordinated multi-replica placement for tightly coupled workloads. (#60944, #61205, #61206, #61207, #61215, #61467, #61216, #61659)
- Introduced deployment-scoped actors with config/schema, lifecycle management, public API, and controller health checks, making it easier to run durable per-deployment sidecar-like logic inside Serve. (#61639, #61648, #61664, #61833, #62161)
💫 Enhancements:
- Added first-class tracing support for Serve, including inter-deployment gRPC propagation and richer streaming-path attributes, improving end-to-end observability across distributed request flows. (#61230, #61089, #61451)
- Expanded operational metrics with replica utilization, richer error labeling, and client IP logging in access logs, helping teams diagnose bottlenecks and user-impacting issues faster. (#60758, #61092, #60967)
- Improved autoscaling extensibility with class-based policies and policy_kwargs, so advanced users can package reusable autoscaling logic without custom forks. (#60964)
- Reduced controller overhead with broad algorithmic improvements (indexing, cache reuse, and avoiding repeated per-tick work), which improves scalability as deployment and replica counts grow. (#60810, #60829, #60830, #60838, #60842, #60843, #60844, #60832, #60806)
- Improved throughput-oriented operation controls by adding environment-based tuning and explicit throughput optimization logging, making performance behavior easier to configure and audit. (#60757, #62146)
- Upgraded Serve internals to Pydantic v2 and refined time-series aggregation behavior for more predictable metric accuracy under high load. (#61061, #61403)
🔨 Fixes:
- Fixed a direct-ingress shutdown bug where replicas could hang indefinitely while draining stuck requests, ensuring bounded shutdown behavior in failure scenarios. (#60754)
- Fixed HAProxy reliability issues, including config race conditions, draining guards, and platform compatibility edge cases, improving stability in production rollouts. (#61120, #60955)
- Fixed autoscaling correctness issues that could cause runaway scaling or delayed reactions, including feedback-loop regressions, streaming scale-down behavior, and wall-clock delay handling. (#61731, #61920, #62331, #61844, #60613)
- Fixed high-percentile latency regression in request routing and queue-length accounting, reducing tail-latency spikes under load. (#61755)
- Fixed replica-state and health-state edge cases during migration and ingress transitions, preventing false errors and unhealthy/healthy misreporting. (#60365, #61818, #62213)
- Fixed chained upstream actor-failure handling so request failures are attributed correctly and no longer hang when upstream deployments die mid-chain. (#61758, #62147)
- Fixed HTTP status classification for client disconnects after successful responses, improving accuracy of error-rate monitoring and alerting. (#61396)
📖 Documentation:
- Added AsyncInferenceAutoscalingPolicy documentation and clarified Serve performance guidance for HAProxy and inter-deployment gRPC use cases. (#61086, #61386)
- Updated scheduling and configuration docs, including replica scheduling guidance and a catalog of Serve environment variables, so operators can tune deployments with less guesswork. (#60922, #60807)
- Clarified multiplexing and async behavior docs (including model pre-warming con...
Ray-2.54.1
Ray Data
🔨 Fixes
- Disable hanging issue detection (#61895) — The hanging issue detector was making blocking calls to the Ray State API, which could cause the scheduling loop to block and severely degrade pipeline performance. The detector is disabled in this patch release until the blocking calls are fixed.
Ray-2.54.0
Ray Data
🎉 New Features
- Add checkpointing support to Ray Data (#59409)
- Compute Expressions: list operations (#59346), fixed-size arrays (#58741), string padding (#59552), logarithmic (#59549), trigonometric (#59712), arithmetic (#59678), and rounding (#59295)
- Add sql_params support to read_sql (#60030)
- Add AsList aggregation (#59920)
- Support CountDistinct aggregate (#59030)
- Add credential provider abstraction for Databricks UC datasource (#60457)
- Support callable classes for UDFExpr (#56725)
- Add autoscaler metrics to Data Dashboard (#60472)
- Add optional filesystem parameter to download expression (#60677)
- Allow specifying partitioning style or flavor in write_parquet() (#59102)
- New cluster autoscaler enabled by default (#60474)
💫 Enhancements
- Improve numerical stability in scalers by handling near-zero values (#60488)
- Export dataset operator output schema to event logger (#60086)
- Iceberg: add retry policy for Storage + Catalog writes (#60620)
- Iceberg: remove calls to Catalog Table in write tasks (#60476)
- Expose logical operators and rules via package exports (#60297, #60296)
- Demote Sort from requiring preserve_order (#60555)
- Improve appearance of repr(dataset) (#59631)
- Allow configuring DefaultClusterAutoscalerV2 thresholds via env vars (#60133)
- Use Arrow IPC for Arrow Schema serialization/deserialization (#60195)
- Store _source_paths in object store to prevent excessive spilling during read task serialization (#59999)
- Add more shuffle fusion rules (#59985)
- Enable and tune DownstreamCapacityBackpressurePolicy (#59753)
- Enable concurrency cap backpressure with tuning (#59392)
- Set default actor pool scale up threshold to 1.75 (#59512)
- Don't downscale actors if the operator hasn't received any inputs (#59883)
- Don't reserve GPU budget for non-GPU tasks (#59789)
- Only return selected data columns in hive-partitioned Parquet files (#60236)
- Ordered + FIFO bundle queue (#60228)
- Add node_id, pid, attempt number for hanging tasks (#59793)
- Revise resource allocator task scheduling to factor in pending task outputs (#60639)
- Track block serialization time (#60574)
- Use metrics from OpRuntimeMetrics for progress (#60304)
- Tabular form for streaming executor op metrics (#59774)
- Info-log cluster scale-up decisions (#60357)
- Use plain mode instead of grid mode for OpMetrics logging (#59907)
- Progress reporting refactors (#59350, #59629, #59880)
- Remove deprecated TENSOR_COLUMN_NAME constant (#60573)
- Remove meta_provider parameter (#60379)
- Decouple Ray Train from Ray Data by removing top-level ray.data imports (#60292)
- Move extension types to ray.data (#59420)
- Skip upscaling validation warning for fixed-size actor pools (#60569)
- Make StatefulShuffleAggregation.finalize allow incremental streaming (#59972)
- Revisit OutputSplitter semantics to avoid unnecessary buffer accumulation (#60237)
- Update to PyArrow 23 (#60739, #59489)
- Add BackpressurePolicy to streaming executor progress bar (#59637)
- Support Arrow-based transformations for preprocessors (#59810)
- StandardScaler preprocessor with Arrow format (#59906)
- OneHotEncoder with Arrow format (#59890)
🔨 Fixes
- Fuse MapBatches even if they modify the row count (#60756)
- Don't push limit past map_batches by default (#60448)
- Fix wrong type hint of other dataset in zip and union (#60653)
- Fix ActorPoolMapOperator to guarantee dispatch of all given inputs (#60763)
- Fix ArrowInvalid error when backfilling missing fields from map tasks (#60643)
- Fix attribute error in UnionOperator.clear_internal_output_queue (#60538)
- Fix DefaultClusterAutoscalerV2 raising KeyError: 'CPU' (#60208)
- Fix ReorderingBundleQueue handling of empty output sequences (#60470)
- Fix task completion time without backpressure grafana panel metric name (#60481)
- Fix Union operator blocking when preserve_order is set (#59922)
- Fix autoscaler requesting empty resources instead of previous allocation when not scaling up (#60321)
- Fix autoscaler not respecting user-configured resource limits (#60283)
- Fix DefaultAutoscalerV2 not scaling nodes from zero (#59896)
- Fix Iceberg warning message (#60044)
- Fix Parquet datasource path column support (#60046)
- Fix ProgressBar with use_ray_tqdm (#59996)
- Fix stale stats on refit for preprocessors (#60031)
- Fix StreamingRepartition hang with empty upstream results (#59848)
- Fix operator fusion bug to preserve UDF modifying row count (#59513)
- Fix AutoscalingCoordinator double-allocating resources for multiple datasets (#59740)
- Fix DownstreamCapacityBackpressurePolicy issues (#59990)
- Fix AutoscalingCoordinator crash when requesting 0 GPUs on CPU-only cluster (#59514)
- Fix TensorArray to Arrow tensor conversion (#59449)
- Fix resource allocator not respecting max resource requirement (#59412)
- Fix GPU autoscaling when max_actors is set (#59632)
- Fix checkpoint filter PyArrow zero-copy conversion error (#59839)
- Restore class aliases to fix deserialization of existing datasets (#59828, #59818)
- Fix DataContext deserialization issue with StatsActor (#59471)
📖 Documentation
- Sort references in "Loading data and Saving data" pages (#60084)
- Fix inconsistent heading levels in "How to write tests" guide (#60706)
- Clarify resource_limits refers to logical resources (#60109)
- Update read_lance doc (#59673)
- Fix broken link in read_unity_catalog docstring (#59745)
- Fix bug in docs for enable_true_multi_threading (#60515)
- Add more education around transformations (#59415)
Ray Serve
🎉 New Features
- Queue-based autoscaling for TaskConsumer deployments (phase 1). Introduces a QueueMonitor actor that queries message brokers (Redis, RabbitMQ) for queue length, enabling TaskConsumer scaling based on pending tasks rather than HTTP load. (#59430)
- Default autoscaling parameters for custom policies. New apply_autoscaling_config decorator allows custom autoscaling policies to automatically benefit from Ray Serve's standard parameters (delays, scaling factors, bounds) without reimplementation. (#58857)
- label_selector and bundle_label_selector in Serve deployments. Deployments can now specify node label selectors for scheduling and bundle-level label selectors for placement groups, useful for targeting specific hardware (e.g., TPU topologies). (#57694)
- Deployment-level autoscaling observability. The controller now emits a structured JSON serve_autoscaling_snapshot log per autoscaling-enabled deployment each control-loop tick, with an event summarizer that reduces duplicate logs. (#56225)
- Batching with multiplexing support. Batching now guarantees each batch contains requests for the same multiplexed model, enabling correct multiplexed model serving with @serve.batch. (#59334)
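The guarantee in the last item above, that a batch never mixes requests for different multiplexed models, can be sketched as a grouping step before batching. This is a toy model of the idea, not Ray Serve's implementation; batch_by_model and the (model_id, payload) tuple shape are hypothetical.

```python
from collections import defaultdict


def batch_by_model(requests, max_batch_size):
    """Group (model_id, payload) pairs so each batch targets exactly one model."""
    by_model = defaultdict(list)
    for model_id, payload in requests:
        by_model[model_id].append(payload)

    batches = []
    for model_id, payloads in by_model.items():
        # Split each model's payloads into chunks of at most max_batch_size.
        for start in range(0, len(payloads), max_batch_size):
            batches.append((model_id, payloads[start:start + max_batch_size]))
    return batches


requests = [("m1", 1), ("m2", 2), ("m1", 3), ("m1", 4)]
batches = batch_by_model(requests, max_batch_size=2)
```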
💫 Enhancements
- Replica routing data structure optimizations. O(1) pending-request lookups, cached replica lists, lazy cleanup, optimized retry insertion, and metrics throttling yield significant routing performance improvements. (#60139)
- New operational metrics suite. Added long-poll metrics, replica lifecycle metrics, app/deployment status metrics, proxy health and request routing delay metrics, event loop utilization metrics, and controller health metrics — greatly improving monitoring and debugging capabilities. (#59246, #59235, #59244, #59238, #59535, #60473)
- Autoscaling config validation. lookback_period_s must now be greater than metrics_interval_s, preventing silent misconfigurations. (#59456)
- Cross-version root_path support for uvicorn. root_path now works correctly across all uvicorn versions, including >=0.26.0 which changed how root_path is processed. (#57555)
- Preserve user-set gRPC status codes. When deployments raise exceptions after setting a gRPC status code on the context, that code is now correctly propagated to the client instead of being overwritten with INTERNAL. Error messages are truncated to 4 KB to respect HTTP/2 trailer limits. (#60482)
- Replica ThreadPoolExecutor capped to num_cpus. The user-code event loop's default ThreadPoolExecutor is now limited to the deployment's num_cpus, preventing oversubscription when using asyncio.to_thread. (#60271)
- Generic actor registration API for shutdown cleanup. Deployments can register auxiliary actors (e.g., PrefixTreeActor) with the controller for automatic cleanup on serve.shutdown(), eliminating cross-library import dependencies. (#60067)
- Deployment config logging in controller. Deployment configurations are now logged in the controller for easier debugging and auditability. (#59222, #59501)
- Pydantic v1 deprecation warning. A FutureWarning is now emitted at ray.init() when Pydantic v1 is detected, as support will be removed in Ray 2.56. (#59703)
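The autoscaling config validation described above amounts to a single comparison between two fields. The sketch below shows the rule in isolation; validate_autoscaling_config is a hypothetical stand-in, and the error type and message Ray Serve actually uses may differ.

```python
def validate_autoscaling_config(lookback_period_s: float, metrics_interval_s: float) -> None:
    """Reject configs whose lookback window is not longer than the metrics interval."""
    if lookback_period_s <= metrics_interval_s:
        raise ValueError(
            f"lookback_period_s ({lookback_period_s}) must be greater than "
            f"metrics_interval_s ({metrics_interval_s})"
        )


validate_autoscaling_config(lookback_period_s=30.0, metrics_interval_s=10.0)  # accepted

try:
    validate_autoscaling_config(lookback_period_s=5.0, metrics_interval_s=10.0)
    rejected = False
except ValueError:
    rejected = True
```

The intuition is that a lookback window no longer than the reporting interval may contain no fresh metric sample, so the policy would be scaling on stale or missing data.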
🔨 Fixes
- Fixed tracing signature mismatch across processes. Resolved TypeError: got an unexpected keyword argument _ray_trace_ctx when calling actors from a different process than the one that created them (e.g., serve start + dashboard interaction). (#59634)
- Fixed ingress deployment name collision. Ingress deployment name was incorrectly modified when a child deployment shared the same name, causing routing failures. (#59577)
- Fixed downstream deployment over-provisioning. Downstream deployments no longer over-provision replicas when receiving DeploymentResponse objects. (#60747)
- Fixed replicas hanging forever during draining. Replicas no longer hang indefinitely when requests are stuck during the draining phase. (#60788)
- Fixed TaskProcessorAdapter shutdown during rolling updates. Removed shutdown() from __del__, which was broadcasting a kill signal to all Celery workers instead of just the local one, breaking rolling updates. (#59713)
- Fixed Windows test failures. Resolved tracing file handle cleanup on Window...
Ray-2.53.0
Highlights
- Ray plans to drop support for Pydantic V1 starting version 2.56.0. Please see this RFC for details.
- Ray Data now has support for bounded reading from Kafka and improved Iceberg support.
Ray Data
🎉 New Features
- Autoscaling: New utilization-based cluster autoscaler for Ray Data workloads (#59353, #59362, #59366). To use this new autoscaler set RAY_DATA_CLUSTER_AUTOSCALER=V2.
- Kafka Datasource: Add Kafka as a native datasource for data ingestion (#58592)
- Dataset summary API: Add Dataset.summary() API for quick dataset inspection (#58862)
- Iceberg support: Add Iceberg schema evolution, upsert, and overwrite support (#59210, #59335)
- Graceful error handling: Add should_continue_on_error for graceful error handling in batch inference (#59212)
- Datetime compute expressions: Add datetime compute expressions support (#58740)
- Grouped with_column expressions: Enable expressions for grouped with_column in Ray Data (#58231)
- Parallelized collation: Parallelize DefaultCollateFn, arrow_batch_to_tensors (#58821)
💫 Enhancements
- Optimized Autoscaler Step Size: Optimize autoscaler to support configurable step size for actor pool scaling (#58726)
- Improved Streaming Repartition: Improve streaming repartition performance (#58728)
- Actor init retry: Add actor retry if there's a failure in __init__ (#59105)
- Fused Repartition + MapBatches: Fuse StreamingRepartition with MapBatches operators to scale collate (#59108)
- Combined repartitions: Combine consecutive repartitions for efficiency (#59145)
- Prefetch buffering: Handle prefetch buffering in iter_batches (#58657)
- HashShuffle block breakdown: HashShuffleAggregator breaks down blocks on finalize (#58603)
- Backpressure tuning: Tune concurrency cap backpressure object store budget ratio (#58813)
- Non-string ApproximateTopK: Support non-string items for ApproximateTopK aggregator (#58659)
- Lance version support: Add version support to read_lance() (#58895)
- Dashboard metrics: Add time_to_first_batch and get_ref_bundles metrics to data dashboard (#58912)
- Iter prefetched bytes stats: Add iter_prefetched_bytes statistics tracking (#58900)
- Configurable batching for iter_batches: Add configurable batching for resolve_block_refs to speed up iter_batches (#58467)
- Improved dashboard metrics: Improve Ray Data dashboard metrics display (#58667)
- Histogram percentiles: Update Ray Data histograms to show percentiles in data dashboard (#58650)
- Deprecated API removal: Remove deprecated read_parquet_bulk API (#58970)
- Block shaping option: Add disable block shaping option to BlockOutputBuffer (#58757)
- Removed concurrency lock: Remove concurrency lock for better performance (#56798)
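The "Combined repartitions" item above describes a plan-level optimization. One way to picture it is a rewrite pass over a linear plan that collapses back-to-back repartition steps, on the assumption that only the last one determines the final partition count. The plan representation (a list of (op_name, arg) tuples) and combine_repartitions are hypothetical and unrelated to Ray Data's actual logical operators.

```python
def combine_repartitions(plan):
    """Collapse runs of consecutive ("repartition", n) steps, keeping the last one."""
    optimized = []
    for op in plan:
        if optimized and op[0] == "repartition" and optimized[-1][0] == "repartition":
            # The later repartition overrides the earlier one's partition count.
            optimized[-1] = op
        else:
            optimized.append(op)
    return optimized


plan = [
    ("read", None),
    ("repartition", 8),
    ("repartition", 4),
    ("map", None),
    ("repartition", 2),
]
optimized_plan = combine_repartitions(plan)
```

Only adjacent repartitions are merged; the one after the map step stays, because the map sits between it and the earlier repartitions.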
🔨 Fixes
- Fixes to Unique: Fix support of list types for Unique aggregator (#58916)
- Parquet NaN fix: Fix reading from written parquet for numpy with NaNs (#59172)
- Hash Shuffle empty block: Fix empty block sort in hash shuffle operator (#58836)
- Hive partitioning pushdown: Fix pushdown optimizations with Hive partitioning (#58723)
- Object Store usage reporting: Fix `obj_store_mem_max_pending_output_per_task` reporting (#58864)
- Pyarrow FileSystem serialization fix: Handle filesystem serialization issue in `get_parquet_dataset` (#57047)
- Azure UC SAS: Handle Azure UC user delegation SAS (#59393)
- Async UDF Thread Cleanup: Close threads from async UDF after actor died (#59261)
- Object Locality Default: Return 0s by default for object locality instead of -1s (#58754)
📖 Documentation
- Added contributing guide to Ray Data documentation (#58589)
- Added download expression to key user journeys in documentation (#59417)
- Added Kafka user guide (#58881)
- Added unstructured data templates from Ray Summit 2025 (#57063)
- Improved instructions for reading Hugging Face datasets (#58492, #58832)
- Refined batch-format guidance in docs (#58971)
- Exposed `vision_preprocess` and `vision_postprocess` in VLM docs (#59012)
- Added instructions for upgrading `huggingface_hub` (#59109)
- Added doc on scaling out expensive collation functions (#58993)
Ray Serve
🎉 New Features
- Deployment topology visibility. Exposes deployment dependency graphs in Serve REST API, allowing users to visualize and understand the DAG structure of their applications. (#58355)
- External autoscaler integration. Adds `external_scaler_enabled` flag to application config, enabling third-party autoscalers to control replica counts. (#57727, #57698)
- Node rank and local rank support. Extends replica rank system to track node-level and per-node local ranks, enabling better distributed serving coordination for multi-node deployments. (#58477, #58479)
- Custom batch size function. Allows users to define custom functions for computing logical batch sizes in `@serve.batch`, useful when batch items have varying weights (e.g., token counts in LLM inference). (#59059)
- Stateful application-level autoscaling. Adds policy state persistence for custom autoscaling policies, allowing policies to maintain state across control-loop iterations. (#59118)
- New autoscaling, batching, and routing metrics. Adds Prometheus metrics for autoscaling decisions (`ray_serve_deployment_target_replicas`, `ray_serve_autoscaling_decision_replicas`), batching statistics, and router queue latency for improved observability. (#59220, #59232, #59233)
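To illustrate the idea behind the custom batch size function entry, here is a minimal sketch of batching by logical size rather than item count. It does not use the `@serve.batch` API; `batch_by_logical_size`, `size_fn`, and the whitespace token count are hypothetical names and assumptions chosen for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def batch_by_logical_size(
    items: Iterable[T],
    max_batch_size: int,
    size_fn: Callable[[T], int],
) -> List[List[T]]:
    """Group items so each batch's summed logical size stays within the cap.

    An item larger than the cap is placed in a batch of its own.
    """
    batches: List[List[T]] = []
    current: List[T] = []
    current_size = 0
    for item in items:
        size = size_fn(item)
        # Close the current batch if adding this item would exceed the cap.
        if current and current_size + size > max_batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        batches.append(current)
    return batches


prompts = ["a b c", "d e", "f g h i", "j"]
# Assumed weight: whitespace-separated token count (3, 2, 4, 1).
batches = batch_by_logical_size(prompts, max_batch_size=5, size_fn=lambda p: len(p.split()))
```

With a cap of 5 tokens, the four prompts form two batches of two, even though a count-based cap of 4 would have put them all in one batch.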
💫 Enhancements
- Smarter downscaling behavior. Prioritizes stopping most recently scaled-up replicas during downscale, preserving long-lived replicas that are optimally placed and fully warmed up. (#52929)
- Autoscaling performance optimizations. Short-circuits metric aggregation for single time series cases (O(n log n) → O(1)) and lazily evaluates expensive autoscaling context fields to reduce controller CPU usage. (#58962, #58963)
- Route matching cleanup. Removes redundant route matching logic from replicas since correct route values are now included in RequestMetadata. Also allows multiple methods (`GET`, `PUT`) corresponding to a route. (#58927)
- Deployment wrapper metadata preservation. Wrapper classes from decorators like `@ingress` now preserve original class metadata (`__qualname__`, `__module__`, `__doc__`, `__annotations__`). (#58478)
- Improved type annotations. Enhances generic type annotations on `DeploymentHandle`, `DeploymentResponse`, and `DeploymentResponseGenerator` for better IDE support and type inference. Adds `.result()` stub to `DeploymentResponseGenerator` to fix static typing errors. (#59363, #58522)
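The "Smarter downscaling behavior" entry describes stopping the most recently scaled-up replicas first. A minimal sketch of that selection rule, assuming each replica carries a start timestamp; the `Replica` class and `pick_replicas_to_stop` function are hypothetical and not part of the Serve controller:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    """Hypothetical replica record with the time it started serving."""

    replica_id: str
    start_time_s: float


def pick_replicas_to_stop(replicas: List[Replica], num_to_stop: int) -> List[Replica]:
    """Return the newest replicas first, keeping long-lived ones running."""
    newest_first = sorted(replicas, key=lambda r: r.start_time_s, reverse=True)
    return newest_first[: max(0, num_to_stop)]


replicas = [Replica("a", 100.0), Replica("b", 300.0), Replica("c", 200.0)]
to_stop = pick_replicas_to_stop(replicas, num_to_stop=2)
```

Here replica `a`, the oldest and presumably fully warmed up, survives the downscale.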
🔨 Fixes
- YAML serialization for autoscaling enums. Fixes `RepresenterError` when using `serve build` with `AggregationFunction` enum values in autoscaling config. (#58509)
- Autoscaling context timestamp fix. Correctly sets `last_scale_up_time` and `last_scale_down_time` on autoscaling context. (#59057)
- Deadlock in chained deployment responses. Fixes hang when awaiting intermediate `DeploymentResponse` objects in a chain of deployment calls from different event loops. (#59385)
- FastAPI class-based view inheritance. Fixes `make_fastapi_class_based_view` to properly handle inherited methods. (#59410)
📖 Documentation
- Async I/O best practices guide. New documentation covering async programming patterns and best practices for Ray Serve deployments. (#58909)
- Replica scheduling guide. New documentation covering compact scheduling, placement groups, custom resources, and guidance on when to use each feature. (#59114)
Ray Train
🎉 New Features
- Worker Placement with Label Selectors: Added `label_selector` to `ScalingConfig`. This allows users to control worker placement by targeting specific labeled nodes in the cluster. (#58845, #59414)
- Multihost JaxTrainer on GPU: Introduced support for `JaxTrainer` running on GPU machines. (#58322)
- Checkpoint Consistency Modes: Added `CheckpointConsistencyMode` to `get_all_reported_checkpoints`, providing options for handling checkpoint retrieval consistency. (#58271)
- Per-Dataset Execution Options: `DataConfig` now supports setting `execution_options` on a per-dataset basis for finer-grained control over data loading. (#58717)
💫 Enhancements
- Nested Metrics Support: `Result.get_best_checkpoint` now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)
- Non-Blocking Checkpoint Retrieval: `get_all_reported_checkpoints` no longer blocks when only metrics are reported. (#58870)
- Improved Resource Cleanup: Implemented eager cleanup of data resources and placement groups upon training run failures or aborts, preventing resource leaks. (#58325, #58515)
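As a rough picture of what nested metric support means for checkpoint selection, the sketch below resolves a metric by path inside nested dictionaries and picks the best checkpoint. It does not call `Result.get_best_checkpoint`; the helper names are hypothetical, and the `/`-delimited path syntax is an assumption made for illustration rather than something these notes specify.

```python
from typing import Any, Dict, List, Tuple


def get_nested(metrics: Dict[str, Any], path: str, sep: str = "/") -> Any:
    """Walk a nested metrics dict using a delimited path such as 'eval/loss'."""
    value: Any = metrics
    for key in path.split(sep):
        value = value[key]
    return value


def best_checkpoint(
    reported: List[Tuple[str, Dict[str, Any]]],
    metric: str,
    mode: str = "min",
) -> str:
    """Return the checkpoint name whose nested metric is best under `mode`."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    chooser = min if mode == "min" else max
    name, _ = chooser(reported, key=lambda pair: get_nested(pair[1], metric))
    return name


# Hypothetical (checkpoint name, metrics) pairs.
reported = [
    ("ckpt_0", {"eval": {"loss": 0.9, "acc": 0.61}}),
    ("ckpt_1", {"eval": {"loss": 0.4, "acc": 0.78}}),
    ("ckpt_2", {"eval": {"loss": 0.5, "acc": 0.83}}),
]
lowest_loss = best_checkpoint(reported, "eval/loss", mode="min")
highest_acc = best_checkpoint(reported, "eval/acc", mode="max")
```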
🔨 Fixes
- MLflow Compatibility: Updated `setup_mlflow` API to ensure full compatibility with Ray Train V2. (#58705)
- Validation for Checkpoint Uploads: A `ValueError` is now raised if `checkpoint_upload_fn` fails to return a valid checkpoint. (#58863)
📖 Documentation
- New API Documentation: Added comprehensive documentation for the `ray.train.get_all_reported_checkpoints` method. (#58946)
Ray Tune
💫 Enhancements:
- Nested Metrics Support: `Result.get_best_checkpoint` now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)
Ray LLM
💫 Enhancements
- Cloud filesystem restructuring with provider-specific implementations (#58469)
- Bump `transformers` to 4.57.3 (#58980)
- Ray Data LLM config refactor (#58298)
- Update `vllm_engine.py` to check for `VLLM_USE_V1` attribute (#58820)
- Infer `VLLM_RAY_PER_WORKER_GPUS` from fractional placement-group bundles automatically (#5...
Ray-2.51.2
- Fix for CVE-2025-62593: reject Sec-Fetch-* and other browser-specific headers in dashboard browser rejection logic
Ray-2.52.1
- More robust handling for CVE-2025-62593: test for more browser-specific headers in dashboard browser rejection logic
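The two CVE-2025-62593 entries describe rejecting requests that carry browser-specific headers. A minimal sketch of that kind of check over a plain header mapping follows; the function name and prefix tuple are hypothetical, only the `Sec-Fetch-*` family is taken from the notes, and the dashboard's actual header list is not reproduced here.

```python
from typing import Mapping, Tuple

# Assumption: only the Sec-Fetch-* family named in the notes. The notes also
# mention "other browser-specific headers" without listing them.
BROWSER_HEADER_PREFIXES: Tuple[str, ...] = ("sec-fetch-",)


def looks_like_browser_request(headers: Mapping[str, str]) -> bool:
    """Return True if any header name matches a known browser-only prefix."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    return any(name.lower().startswith(BROWSER_HEADER_PREFIXES) for name in headers)


from_browser = looks_like_browser_request(
    {"Sec-Fetch-Site": "cross-site", "Accept": "*/*"}
)
from_client = looks_like_browser_request({"User-Agent": "python-requests/2.32"})
```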
Ray-2.52.0
Release Highlights
Ray Core:
- End of Life for Python 3.9 Support: Ray will no longer be releasing Python 3.9 wheels from now on.
- Token authentication: Ray now supports built-in token authentication across all components including the dashboard, CLI, API clients, and internal services. This provides an additional layer of security for production deployments to reduce the risk of unauthorized code execution. Token authentication is initially off by default. For more information, see: https://docs.ray.io/en/latest/ray-security/token-auth.html
Ray Data:
- We’ve added a number of improvements for Iceberg, including upserts, predicate and projection pushdown, and overwrite.
- We’ve added significant improvements to our expressions framework, including temporal, list, tensor, and struct datatype expressions.
Ray Libraries
Ray Data
🎉 New Features:
- Added predicate pushdown rule that pushes filter predicates past eligible operators (#58150, #58555)
- Iceberg support for upsert tables, schema updates, and overwrite operations (#58270)
- Iceberg support for predicate and projection pushdown (#58286)
- Iceberg write datafiles in write() then commit (#58601)
- Enhanced Unity Catalog integration (#57954)
- Namespaced expressions that expose PyArrow functions (#58465)
- Added version argument to read_delta_lake (#54976)
- Generator UDF support for map_groups (#58039)
- ApproximateTopK aggregator (#57950)
- Serialization framework for preprocessors (#58321)
- Support for temporal, list, tensor, and struct datatypes (#58225)
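The predicate pushdown entry in this list can be sketched as a rewrite over a linear plan: a filter moves ahead of any operator it can safely pass. This is an illustration only, not Ray Data's rule. The plan is modeled as a list of operator names, and the set of eligible operators is an assumption (operators that neither change row values nor depend on which rows are present).

```python
from typing import List

# Assumption: these operators only reorder or regroup rows, so filtering
# before or after them yields the same rows.
ELIGIBLE_TO_PASS = {"sort", "repartition", "random_shuffle"}


def push_filters_down(plan: List[str]) -> List[str]:
    """Move each 'filter' toward the source past eligible operators."""
    plan = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(plan)):
            if plan[i] == "filter" and plan[i - 1] in ELIGIBLE_TO_PASS:
                # Swap so the filter runs first and later operators see fewer rows.
                plan[i - 1], plan[i] = plan[i], plan[i - 1]
                changed = True
    return plan


optimized = push_filters_down(["read", "sort", "repartition", "filter"])
blocked = push_filters_down(["read", "map", "sort", "filter"])
```

In the second plan the filter stops behind `map`, since a map may change the values the predicate reads.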
💫 Enhancements:
- Use approximate quantile for RobustScaler preprocessor (#58371)
- Map batches support for limit pushdown (#57880)
- Make all map operations zero-copy by default (#58285)
- Use tqdm_ray for progress reporting from workers (#58277)
- Improved concurrency cap backpressure tuning (#58163, #58023, #57996)
- Sample finalized partitions randomly to avoid lens effect (#58456)
- Allow file extensions starting with '.' (#58339)
- Set default file_extensions for read_parquet (#56481)
- URL decode values in parse_hive_path (#57625)
- Streaming partition enforces row_num per block (#57984)
- Streaming repartition combines small blocks (#58020)
- Lower DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR to 2 (#58262)
- Set udf-modifying-row-count default to false (#58264)
- Cache PyArrow schema operations (#58583)
- Explain optimized plans (#58074)
- Ranker interface (#58513)
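The `parse_hive_path` entry above concerns URL-decoding partition values. The following sketch shows the general idea with the standard library; `parse_hive_partitions` is a hypothetical helper, not Ray Data's `parse_hive_path`, and the bucket path is made up.

```python
from typing import Dict
from urllib.parse import unquote


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Extract key=value directory segments and URL-decode the values."""
    partitions: Dict[str, str] = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            # Writers percent-encode characters such as spaces in values.
            partitions[key] = unquote(value)
    return partitions


parts = parse_hive_partitions(
    "s3://bucket/table/country=United%20States/date=2025-01-01/part-0.parquet"
)
```

Without the decoding step, the `country` value would come back as `United%20States` and would not match the value that was originally written.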
🔨 Fixes:
- Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
- Fixed handling of renames in projection pushdown (#58033, #58037)
- Fixed broken LogicalOperator abstraction barrier in predicate pushdown rule (#58683)
- Fixed file size ordering in download partitioning with multiple URI columns (#58517)
- Fixed HTTP streaming file download by using open_input_stream (#58542)
- Fixed expression mapping for Pandas (#57868)
- Fixed reading from zipped JSON (#58214)
- Fixed MCAP datasource import for better compatibility (#57964)
- Avoid slicing block when total_pending_rows < target (#58699)
- Clear queue for manually marked execution_finished operators (#58441)
- Add exception handling for invalid URIs in download operation (#58464)
- Fixed progress bar name display (#58451)
📖 Documentation:
- Documentation for Ray Data metrics (#58610)
- Simplify and add Ray Data LLM quickstart example (#58330)
- Convert rST-style to Google-style docstrings (#58523)
🏗 Architecture:
- Removed stats update thread (#57971)
- Refactor histogram metrics (#57851)
- Revisit OpResourceAllocator to make data flow explicit (#57788)
- Create unit test directory for fast, isolated tests (#58445)
- Dump verbose ResourceManager telemetry into ray-data.log (#58261)
Ray Train
🎉 New Features:
- Result::from_path implementation in v2 (#58216)
💫 Enhancements:
- Exit actor and log appropriately when poll_workers is in terminal state (#58287)
- Set JAX_PLATFORMS environment variable based on ScalingConfig (#57783)
- Default to disabling Ray Train collective util timeouts (#58229)
- Add SHUTTING_DOWN TrainControllerState and improve logging (#57882)
- Improved error message when calling training function utils outside Ray Train worker (#57863)
- FSDP2 template: Resume from previous epoch when checkpointing (#57938)
- Clean up checkpoint config and trainer param deprecations (#58022)
- Update failure policy log message (#58274)
📖 Documentation:
- Ray Train Metrics documentation page (#58235)
- Local mode user guide (#57751)
- Recommend tree_learner="data_parallel" in examples for distributed LightGBM training (#58709)
Ray Serve
🎉 New Features:
- Custom request routing with runtime environment support. Users can now define custom request router classes that are safely imported and serialized using the application's runtime environment, enabling advanced routing logic with custom dependencies. (#56855)
- Custom autoscaling policies with enhanced logging. Deployment-level and application-level autoscaling policies now display their custom policy names in logs, making it easier to debug and monitor autoscaling behavior. (#57878)
- Audio transcription support in vLLM backend. Ray Serve now supports transcription tasks through the vLLM engine, expanding multimodal capabilities. (#57194)
- Data parallel attention public API. Introduced a public API for data parallel attention, enabling efficient distributed attention mechanisms for large-scale inference workloads. (#58301)
- Route pattern tracking in proxy metrics. Proxy metrics now expose actual route patterns (e.g., `/api/users/{user_id}`) instead of just route prefixes, enabling granular endpoint monitoring without high cardinality issues. Performance impact is minimal (~1% RPS decrease). (#58180)
- Replica dependency graph construction. Added `list_outbound_deployments()` method to discover downstream deployment dependencies, enabling programmatic analysis of service topology for both stored and dynamically-obtained handles. (#58345, #58350)
- Multi-dimensional replica ranking. Introduced `ReplicaRank` schema with global, node-level, and local ranks to support advanced coordination scenarios like tensor parallelism and model sharding across nodes. (#58471, #58473)
- Proxy readiness verification. Added a check to ensure proxies are ready to serve traffic before `serve.run()` completes, improving deployment reliability. (#57723)
- IPv6 socket support. Ray Serve now supports IPv6 networking for socket communication. (#56147)
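The route pattern entry above explains why labeling metrics with the pattern keeps cardinality bounded: every concrete path maps back to one of a fixed set of patterns. A minimal sketch of that mapping, assuming `{name}` placeholders match a single path segment; both helper functions are hypothetical and not Serve's proxy code.

```python
import re
from typing import List, Optional


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '/api/users/{user_id}' into a regex matching one segment per placeholder."""
    parts = re.split(r"(\{[^{}/]+\})", pattern)
    regex = "".join(
        "[^/]+" if part.startswith("{") and part.endswith("}") else re.escape(part)
        for part in parts
    )
    return re.compile(regex)


def metric_route_label(path: str, patterns: List[str]) -> Optional[str]:
    """Return the first pattern that fully matches the request path, if any."""
    for pattern in patterns:
        if pattern_to_regex(pattern).fullmatch(path):
            return pattern
    return None


patterns = ["/api/users/{user_id}", "/health"]
label = metric_route_label("/api/users/42", patterns)
```

Requests for `/api/users/42` and `/api/users/43` share the label `/api/users/{user_id}`, so the metric gains one series per route rather than one per user.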
💫 Enhancements:
- Selective throughput optimization flag overrides. Users can now override individual flags set by `RAY_SERVE_THROUGHPUT_OPTIMIZED` without manually configuring all f...
Ray-2.51.1
- Reuse previous metadata if transferring the same tensor list with `nixl` (#58309)