Ray-2.56.0 Latest

sai-miduthuri released this 29 Jun 20:32

ray-2.56.0

637fd06

Highlights

Ray Data Stability: This release adds a variety of stability improvements, including support for running multiple datasets in a cluster, automatic batch size selection for CPU-based map_batches, and a default logical memory configuration to prevent OOMs. We've also tightened iter_batches stability by reducing hidden buffering and shutting down the executor when consumers exit early (#63660, #63682, #62949). This reduces object-store spilling for common training workloads.
Ray Serve: We re-architected Ray Serve LLM by decoupling request handling from the token-streaming response path (#62667, #62680, #62668, #62669, #63167), resulting in significant LLM serving performance improvements. We've also introduced new routing policies: session-sticky routing via consistent hashing with ConsistentHashRouter (#62905, #63096, #62906), and CapacityQueueRouter (#62323), which is beneficial for supply-constrained workloads.
Ray Core: We've added GPU-domain-aware placement groups using label locality (#61442, #61614, #62487, #62533). This enables placement groups to pack bundles onto nodes that share a ray.io/gpu-domain label instead of only packing at the single-node level. We've also added initial Kubernetes in-place pod resizing support for Autoscaler v2 (#55961, #62369, #62215), enabling Ray clusters to resize CPU and memory on existing worker pods before scaling out new pods.
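
The session-sticky routing mentioned above relies on consistent hashing. The following is a minimal, self-contained sketch of that general technique, not Ray Serve's ConsistentHashRouter; the HashRing class, the replica names, and the vnodes parameter are all hypothetical and exist only for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer that is stable across processes."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring (illustrative only, not Ray Serve code)."""

    def __init__(self, replicas, vnodes=64):
        # Each replica owns `vnodes` points on the ring to smooth the load.
        self._points = sorted(
            (_hash(f"{replica}#{i}"), replica)
            for replica in replicas
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._points]

    def route(self, session_id: str) -> str:
        # Pick the first ring point clockwise from the session's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._keys, _hash(session_id)) % len(self._keys)
        return self._points[idx][1]


ring = HashRing(["replica-a", "replica-b", "replica-c"])
smaller = HashRing(["replica-a", "replica-b"])  # replica-c removed
sessions = [f"session-{i}" for i in range(1000)]

# Count sessions that were NOT on the removed replica but still changed owner.
moved = sum(
    1
    for s in sessions
    if ring.route(s) != "replica-c" and ring.route(s) != smaller.route(s)
)
```

In this sketch, removing a replica only reassigns the sessions that replica owned, so `moved` stays at zero. That property is what makes a hash ring a natural fit for sticky sessions.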

Ray Data

🎉 New Features

Support multiple datasets per cluster via subcluster labels and resource partitioning (#63331, #63375, #63982)
Add Dataset.mix() public API and MixOperator for weighted dataset mixing (#63168, #62450)
New DataSourceV2 framework: ParquetDatasourceV2, chunked reader, predicate splitting, listing/scanner infra (#63113, #63454, #63163, #62975, #63027, #62182)
Add batch_size='auto' to map_batches to derive batch row count from target row batch size (#62648)
Implement distributed upsert for Iceberg using task-based merge algorithm, preventing performance bottleneck on driver (#63482)
Add include_row_hash to read_parquet (#61408)
Add JAX data iterator (#61630)
Expose flag to run read tasks on isolated worker processes via isolate_read_workers (#63490)
Expose flag to set default logical memory for map operators via default_map_logical_memory_enabled (#63814)
Support predicate pushdown for Lance format (#61400)
Support per-partition start_offset and end_offset for read_kafka (#61620)
Add obstore async download backend for download operator (#61735)
Support UDF retries on transient exceptions (#63023)
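
As a rough illustration of what weighted dataset mixing means, here is a plain-Python sketch that interleaves several iterables according to sampling weights. It does not use or reproduce the Dataset.mix() API; the mix function, its seed parameter, and the choice to drop exhausted sources are assumptions made for this example.

```python
import random


def mix(sources, weights, seed=0):
    """Yield items from `sources`, picking the next source with probability
    proportional to its weight. Exhausted sources are dropped and mixing
    continues with the remaining ones (a design choice of this sketch)."""
    rng = random.Random(seed)
    active = [(iter(source), weight) for source, weight in zip(sources, weights)]
    while active:
        current_weights = [weight for _, weight in active]
        idx = rng.choices(range(len(active)), weights=current_weights, k=1)[0]
        try:
            yield next(active[idx][0])
        except StopIteration:
            # This source has no more items; stop sampling from it.
            del active[idx]


# Roughly 90% of early draws come from the first source.
mixed = list(mix([range(0, 100), range(100, 200)], weights=[0.9, 0.1]))
```

Every item from every source appears exactly once in the output, and the relative order within each source is preserved; only the interleaving is random.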

💫 Enhancements

Fix iter_batches spilling by replacing make_async_gen with iter_threaded and reducing buffered batches (#63660, #63682)
Gate restore_original_order in iter_batches behind preserve_order (#63792)
Convert drop_columns to a Project logical operator when input schema is known (#63813)
Make ConcatAggregation and TurbopufferDatasink use polars for sorting (#61904)
Boost and vectorize hash_partition with sort_indices, zero-copy slices, and pandas (#63498, #62757, #63152, #62587)
Enable GPU_SHUFFLE in grouped_data.py (#62410)
Eager StarExpr expansion, schema inference for non-black-box UDFs, and Expressions struct support (#63776, #63387, #62560)
Make logging configurable via RAY_DATA_LOG_LEVEL and log RAY_DATA env vars at execution start (#63487, #63380)
Display and track logical memory in progress bar (#63379)
Honor compute= in filter(expr=...) and deprecate concurrency= (#63576)
Enable filter pushdown through StreamingRepartition and read stage column-rename removal (#62347, #63384, #63582)
Cache deserialized Arrow schemas in BlockMetadataWithSchema (#63462)
Track scheduling-loop step duration (p50/p90/max), peak USS/object-store memory, and task block locality (#63586, #63345, #63489, #63418, #62249)
Replace TaskDurationStats and Timer with DistributionTracker (#63488, #63530, #63825)
Introduce BlockEntry on RefBundle in place of (ref, metadata) tuples (#63654)
Pre-resolve filesystem in threaded download to avoid IMDS herd (#62898)
Convert logical operators to frozen dataclasses and consolidate operator base/repr (#62593, #62568, #62400, #63137, #63140, #63108)
Non-blocking default autoscaling coordinator and resource-aware auto-downscaling (#62725, #62574)
Release pinned blocks after dataset execution and shut down executor on early DataIterator exit (#62456, #62949)
Optimize local shuffle with incremental index and configurable compaction threshold (#62539)
Speed up checkpoint filter and reduce memory usage (#60294)
Preserve Arrow types through pandas roundtrip and reorder block columns by name before schema ops (#63017, #63582)
Block pickle object columns when reading untrusted Parquet and gate unsafe WebDataset deserialization (#63470, #63469)
Move backpressure escape hatch across all policies (#63539)
Update pandas, modin, and pyarrow minimum versions (#62899)
Add utilization monitoring and correct logical resource usage for ActorPool (#61987, #61528)
Deprecate ConcurrencyCapBackpressurePolicy, DataIterator.to_torch, and pandas UDF batches (#63392, #62540, #61733)
Rank actors per node in a heap and avoid re-exporting actor class via .options (#62309, #62722)
read_delta reads from preconfigured pyarrow dataset (#61721)
Include column name and target type in ArrowConversionError; reduce arrow conversion warning verbosity (#62407, #61486, #62521)
Show external consumer bytes in verbose operator progress log (#63728)
Disable DataSourceV2 by default after earlier enabling (#63674, #63326)
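
Several items above report p50/p90/max statistics. The sketch below shows one simple way such a summary can be computed, using nearest-rank percentiles over recorded samples. It is not the DistributionTracker implementation; the DistributionSketch class and its methods are hypothetical names.

```python
import math


class DistributionSketch:
    """Stores raw samples and reports nearest-rank percentiles (illustrative)."""

    def __init__(self):
        self._samples = []

    def record(self, value):
        self._samples.append(value)

    def percentile(self, p):
        """Return the nearest-rank percentile for p in (0, 100]."""
        if not self._samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    def summary(self):
        return {
            "p50": self.percentile(50),
            "p90": self.percentile(90),
            "max": max(self._samples),
        }


tracker = DistributionSketch()
for step_duration_ms in range(1, 11):
    tracker.record(step_duration_ms)
stats = tracker.summary()
```

Keeping every sample is fine for a sketch; a production tracker would more likely use a bounded structure such as a histogram or reservoir.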

🔨 Fixes

Rename subcluster label key from __subcluster__ to ray-subcluster (#63982)
Fix get_or_create_stats_actor crash in Ray Client mode (#63402)
Fix datasource pushdown crashes for generic UDFExpr filter predicates (#63781)
Fix hash-shuffle aggregator memory estimation: metadata propagation, node-size clamp, column pruning (#63809)
Fix CheckpointConfig FileNotFoundError on Azure Blob Storage (#63606)
Fix silent credential drop for fsspec-S3 in download expression (#62897)
Fix missing f-string prefix in _concatenate_extension_column (#62939)
Fix HashAggregate duplicate group rows for AggregateFnV2 (#63066)
Fix JSONL read retry with advanced file cursor (#63233)
Fix read_parquet ArrowNotImplementedError for nested column types exceeding ~2GB row group (#61824)
Fix read_parquet nested-type fallback and parquet scanner memory accumulation (#63175, #62745)
Fix memory leak in DataIterator.to_torch() by switching to PyArrow (#60966)
Fix ZipOperator freeing shared blocks via _split_at_indices (#62665)
Fix concurrent writes race condition in write_parquet (#62377)
Fix GPU shuffle output ordering when using ShuffleStrategy.GPU_SHUFFLE (#62351)
Fix incorrect DatasetStat uuid propagation (#62255)
Fix None issue when DATA_ENABLE_OP_RESOURCE_RESERVATION=False (#61718)
Fix filesystem compatibility check for fsspec-wrapped PyFileSystem (#61850)
Forward try_create_dir to pyarrow.dataset.write_dataset (#58302)
Fix autoscaler bug blocking timely release of leased resources (#62592)
Ensure consistent nan_is_null/nans-as-nulls semantics in encoder (#62623, #62618)
Skip unconditional null strip in find_partition_index (#62594)
V1 _split_predicate_by_columns correctness fix (#63176)
Avoid importing cudf in _is_cudf_dataframe when cudf not loaded (#62302)
Revert raw-modulo hash partition fast path (#63097)
Remove tfx-bsl support from read_tfrecords (#63245)
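
The missing f-string prefix fix listed above belongs to a common class of bug. The snippet below is a generic illustration of that bug class, not the actual code from _concatenate_extension_column; the message text and variable name are made up.

```python
column = "image"

# Without the f prefix, the placeholder is emitted literally.
broken = "Cannot concatenate column {column}"

# With the f prefix, the variable is interpolated into the message.
fixed = f"Cannot concatenate column {column}"
```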

📖 Documentation

Document isolate_read_workers for read_parquet (#63816)
Remove docs recommending increased object store memory proportion (#63389)
Update docs minimum version for build_processor and "auto" batch size (#61757, #62790)
Remove outdated limitation of DefaultClusterAutoscalerV2 and stale object-store-memory warnings (#62385, #62387)

Ray Serve

🎉 New Features:

Add custom ingress request router app interfaces and HAProxy ingress dispatch path (#62680, #62668, #62669, #62667)
Expose choose_replica/dispatch on deployment handles and AsyncioRouter with replica-side slot reservation (#63255, #63254, #63252)
Introduce experimental round robin router and ConsistentHashRouter for session-sticky routing (#63238, #62906, #63096, #62905)
Central capacity queue for token-based request routing via CapacityQueueRouter (#62323)
Add experimental ray-haproxy support behind RAY_SERVE_EXPERIMENTAL_PIP_HAPROXY (#62589)
Add deployment actor context API and broadcast API for deployment handles (#62532, #61472)
Add ControllerOptions for configurable controller runtime_env (#63352)
Make rolling update percentage configurable (#62160)
Support per-request timeout and disconnect in HTTP proxy path (#62867)
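
To make the per-request timeout idea concrete, here is a small asyncio sketch that bounds how long a handler may run. It is not the Serve HTTP proxy code; handle_with_timeout, the handlers, and the 408 status choice are assumptions for illustration only.

```python
import asyncio


async def handle_with_timeout(handler, request, timeout_s):
    """Run handler(request), giving up after timeout_s seconds."""
    try:
        body = await asyncio.wait_for(handler(request), timeout=timeout_s)
        return 200, body
    except asyncio.TimeoutError:
        # wait_for cancels the handler before raising, so no work leaks.
        return 408, "request timed out"


async def fast_handler(request):
    return f"done: {request}"


async def slow_handler(request):
    await asyncio.sleep(1.0)
    return f"done: {request}"


fast_result = asyncio.run(handle_with_timeout(fast_handler, "a", timeout_s=0.5))
slow_result = asyncio.run(handle_with_timeout(slow_handler, "b", timeout_s=0.05))
```

The same pattern extends to client disconnects: cancelling the awaiting task propagates cancellation into the handler.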

💫 Enhancements:

HAProxy stability improvements: wait for old workers before drain, redirect stdout/stderr, redispatch+retry-on, coalesce broadcasts, quarantine released ports (#63620, #63621, #63622, #63623, #63628)
Bind direct ingress ports to 0.0.0.0 for cross-node HAProxy routing (#62515)
HAProxy ingress request router metrics, enable splice by default, TCP_NODELAY default 1, optional retry knobs, RAY_SERVE_HAPROXY_STATS_PORT (#63356, #63531, #63353, #63415, #62979)
Resolve bundled ray-haproxy binary before RAY_SERVE_HAPROXY_BINARY_PATH; HAProxy abspath env var (#63829, #62610)
Replace socat subprocess with Python socket for HAProxy admin communication; bump HAProxy to avoid CVE-2025-11230 (#61897, #62585)
Expose controller health metrics via /api/serve/applications/ API; add max_replicas_per_node to response (#63556, #63234)
Run health check on user ...

Contributors

werkt, robertnishihara, and 167 other contributors

Ray-2.55.1

elliot-barn released this 22 Apr 20:24

ray-2.55.1

237c245

Fixes SSH connectivity issue in the ray-llm image (#62625 / #62718).
Upgrade apt packages in slim base (#62666 / #62717).

Ray-2.55.0

sai-miduthuri released this 15 Apr 20:34

ray-2.55.0

58af3fc

Ray Data

🎉 New Features

Add DataSourceV2 API with scanner/reader framework, file listing, and file partitioning (#61220, #61615, #61997)
Support GPU shuffle with rapidsmpf 26.2 (#61371, #62062)
Add Kafka datasink, migrate to confluent-kafka, support datetime offsets (#60307, #61284, #60909)
Add Turbopuffer datasink (#58910)
Add 2-phase commit checkpointing with trie recovery and load method (#61821, #60951)
Queue-based autoscaling policy integrated with task consumers (#59548, #60851)
Enable autoscaling for GPU stages (#61130)
Expressions: add random(), uuid(), cast, and map namespace support (#59656, #60695, #59879)
Add support for Arrow native fixed-shape tensor type (#56284)
Support writing tensors to tfrecords (#60859)
Add pathlib.Path support to read_* functions (#61126)
Add cudf as a batch_format (#61329)
Allow ActorPoolStrategy for read_datasource() via compute parameter (#59633)
Introduce ExecutionCache for streamlined caching (#60996)
Support strict=False mode for StreamingRepartition (#60295)
Port changes from lance-ray into Ray Data (#60497)
Enable PyArrow compute-to-expression conversion for predicate pushdown (#61617)
Add vLLM metrics export and Data LLM Grafana dashboard (#60385)
Include logical memory in resource manager scheduling decisions (#60774)
Add monotonically increasing ID support (#59290)
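
For readers unfamiliar with monotonically increasing IDs in a distributed setting, one well-known scheme (used by Spark's monotonically_increasing_id) places the partition index in the upper bits and the row position in the lower bits, so no coordination between partitions is needed. The sketch below shows that scheme; whether Ray Data uses the same layout is not stated in these notes, and the function name and ROW_BITS constant are hypothetical.

```python
ROW_BITS = 33  # assumed split: lower 33 bits hold the row index


def monotonic_ids(partition_index, num_rows):
    """Return unique IDs for one partition without cross-partition coordination."""
    base = partition_index << ROW_BITS
    return [base + row for row in range(num_rows)]


ids_p0 = monotonic_ids(0, 3)
ids_p1 = monotonic_ids(1, 2)
```

IDs produced this way are unique and increasing within and across partitions, but they are not consecutive.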

💫 Enhancements

Performance: cache _map_task args, heap-based actor ranking, actor pool map improvements (#61996, #62114, #61591)
Optimize concat tables and PyArrow schema hashing (#61315, #62108)
Reduce default DownstreamCapacityBackpressurePolicy threshold to 50% (#61890)
Improve reproducibility for random APIs (#59662)
Clamp batch size to fall within C++ 32-bit int range (#62242)
Account for external consumer object store usage in resource manager budget (#62117)
Make get_parquet_dataset configurable in number of fragments to scan (#61670)
Consolidate schema inference and make all preprocessors implement SerializablePreprocessorBase (#61213, #61341)
Disable hanging issue detection by default (#62405)
Make execution callback dataflow explicit to prevent state leakage (#61405)
Log DataContext in JSON format at execution start for traceability (#61150, #61428)
Autoscaler: configurable traceback, Prometheus gauges, relaxed constraints (#62210, #62209, #61917, #61385)
Add metrics for task scheduling time, output backpressure, and logical memory (#61192, #61007, #61436)
Prevent operators from dominating entire shared object store budget (#61605)
Eliminate generators to avoid intermediate state pinning (#60598)
Default log encoding to UTF-8 on Windows (#61143)
Remove legacy BlockList, locality_with_output, old callback API, PyArrow 9.0 checks (#60575, #61044, #62055, #61483)
Upgrade to pyiceberg 0.11.0; cap pandas to <3 (#61062, #60406)
Refactor logical operators to frozen dataclasses (#61059, #61308, #61348, #61349, #61351, #61364, #61481)
Prevent aggregator head node scheduling (#61288)
Add error for local:// paths with a zero-resource head node (#60709)
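
The heap-based actor ranking mentioned in the performance item can be pictured with a small min-heap keyed by load. This is a sketch of the general idea under simplifying assumptions (load is only an in-flight task count, and task completion is not modeled); ActorRanker is a hypothetical class, not Ray Data's actor pool.

```python
import heapq


class ActorRanker:
    """Keeps actors in a min-heap keyed by in-flight task count."""

    def __init__(self, actor_ids):
        self._heap = [(0, actor_id) for actor_id in actor_ids]
        heapq.heapify(self._heap)

    def pick(self):
        """Return the least-loaded actor and record one more in-flight task."""
        load, actor_id = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + 1, actor_id))
        return actor_id


ranker = ActorRanker(["a", "b", "c"])
picks = [ranker.pick() for _ in range(6)]
```

Each pick costs O(log n) rather than re-sorting or scanning all actors, which matters when the pool is large and tasks are dispatched frequently.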

🔨 Fixes

Fix RCE in Arrow extension type deserialization from Parquet (#62056)
Fix StreamingSplitDataIterator.schema() (#62057)
Fix ParquetDatasource handling of FileSystemFactory.inspect (#62065)
Fix read_parquet file-extension filtering for versioned object-store URIs (#61376)
Fix wide_schema_pipeline_tensors cloudpickle deserialization (#62149)
Fix OpBufferQueue race condition (#60828)
Fix scheduling metrics computation (#62031)
Fix OneHotEncoder max_categories to use global top-k instead of per-partition (#60790)
Fix ReservationOpResourceAllocator resource borrowing for ActorPoolMapOperator (#60882)
Fix DatabricksUCDatasource schema() shadowing by schema string attribute (#61282)
Fix AliasExpr structural equality to respect rename flag (#60711)
Fix _align_struct_fields failure with unaligned scalar fields (#58364)
Fix min_scheduling_resources fallback to incremental_resource_usage (#60997)
Fix output backpressure unblocking sequence for terminal ops (#60798)
Fix multi-input operator object store memory attribution (#61208)
Fix reference cycle by moving to module scope (#61934)
Fix autoscaler logging: reduce verbose output and move traceback to debug (#61989, #62126)
Fix double counting ref_bundle + input_files (#61774)
Replace on_exit hook with __ray_shutdown__ to fix UDF cleanup race (#61700)
Prevent Limit from getting pushed past map_groups (#60881)
Propagate schema in empty _shuffle_block to fix ColumnNotFound in chained left joins (#61507)
Fix unclear metadata warning and incorrect operator name logging (#61380)
Clamp rolling utilization averages to zero (#61543)
Fix floating point errors in TimeWindowAverageCalculator (#61580)
Remove default task-level timeout and clamp end_offset in Kafka datasource (#61476)
Avoid redundant reads in train_test_split (#60274)
Return None when no outputs have been produced (#62029)
Replace bare raise with TypeError in string concatenation (#60795)
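
The OneHotEncoder max_categories fix above hinges on the difference between per-partition and global top-k. The toy example below shows how the two can disagree; the data and helper names are invented for illustration and do not come from the Ray codebase.

```python
from collections import Counter

partitions = [
    ["a", "a", "a", "b", "b"],
    ["b", "b", "c", "c", "c"],
]


def per_partition_top_k(parts, k):
    """Take the top-k of each partition independently, then union them."""
    selected = set()
    for part in parts:
        selected.update(value for value, _ in Counter(part).most_common(k))
    return selected


def global_top_k(parts, k):
    """Merge counts across all partitions first, then take the top-k."""
    total = Counter()
    for part in parts:
        total.update(part)
    return {value for value, _ in total.most_common(k)}


local_result = per_partition_top_k(partitions, k=1)
global_result = global_top_k(partitions, k=1)
```

Here "b" is the most frequent value overall (4 occurrences) but is never the winner within a single partition, so the per-partition approach misses it entirely.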

📖 Documentation

Add job-level checkpointing documentation (#60921)
Update exclude_resources docs for Train autoscaling changes (#61990)
Add locality_with_output migration instructions (#61151)
Document max_tasks_in_flight_per_actor vs max_concurrent_batches (#60477)
Add missing MOD operation docs; improve ray.data.Datasource docs (#60803, #59654)
Add polars usage instructions (#60029)

Ray Serve

🎉 New Features:

Added end-to-end gRPC client and bidirectional streaming support, including public APIs, proxy handling, proto updates, and developer docs, so Serve apps can handle streaming workloads natively instead of building custom transport layers. (#60767, #60768, #60769, #60770, #60771)
Introduced HAProxy-based serving with fallback proxy support and load-balancer tunables, giving operators a higher-throughput ingress path and more control over traffic behavior in production. (#60586, #61180, #61271, #61468, #61988)
Added queue-based autoscaling for async inference and Taskiq-backed workloads, so scaling decisions can account for both HTTP in-flight load and queued tasks. (#59548, #60851, #60977, #61008)
Rolled out gang scheduling support across validation, core scheduling, fault tolerance, downscaling, autoscaling, rolling updates, and migration, enabling coordinated multi-replica placement for tightly coupled workloads. (#60944, #61205, #61206, #61207, #61215, #61467, #61216, #61659)
Introduced deployment-scoped actors with config/schema, lifecycle management, public API, and controller health checks, making it easier to run durable per-deployment sidecar-like logic inside Serve. (#61639, #61648, #61664, #61833, #62161)

💫 Enhancements:

Added first-class tracing support for Serve, including inter-deployment gRPC propagation and richer streaming-path attributes, improving end-to-end observability across distributed request flows. (#61230, #61089, #61451)
Expanded operational metrics with replica utilization, richer error labeling, and client IP logging in access logs, helping teams diagnose bottlenecks and user-impacting issues faster. (#60758, #61092, #60967)
Improved autoscaling extensibility with class-based policies and policy_kwargs, so advanced users can package reusable autoscaling logic without custom forks. (#60964)
Reduced controller overhead with broad algorithmic improvements (indexing, cache reuse, and avoiding repeated per-tick work), which improves scalability as deployment and replica counts grow. (#60810, #60829, #60830, #60838, #60842, #60843, #60844, #60832, #60806)
Improved throughput-oriented operation controls by adding environment-based tuning and explicit throughput optimization logging, making performance behavior easier to configure and audit. (#60757, #62146)
Upgraded Serve internals to Pydantic v2 and refined time-series aggregation behavior for more predictable metric accuracy under high load. (#61061, #61403)

🔨 Fixes:

Fixed a direct-ingress shutdown bug where replicas could hang indefinitely while draining stuck requests, ensuring bounded shutdown behavior in failure scenarios. (#60754)
Fixed HAProxy reliability issues, including config race conditions, draining guards, and platform compatibility edge cases, improving stability in production rollouts. (#61120, #60955)
Fixed autoscaling correctness issues that could cause runaway scaling or delayed reactions, including feedback-loop regressions, streaming scale-down behavior, and wall-clock delay handling. (#61731, #61920, #62331, #61844, #60613)
Fixed high-percentile latency regression in request routing and queue-length accounting, reducing tail-latency spikes under load. (#61755)
Fixed replica-state and health-state edge cases during migration and ingress transitions, preventing false errors and unhealthy/healthy misreporting. (#60365, #61818, #62213)
Fixed chained upstream actor-failure handling so request failures are attributed correctly and no longer hang when upstream deployments die mid-chain. (#61758, #62147)
Fixed HTTP status classification for client disconnects after successful responses, improving accuracy of error-rate monitoring and alerting. (#61396)

📖 Documentation:

Added AsyncInferenceAutoscalingPolicy documentation and clarified Serve performance guidance for HAProxy and inter-deployment gRPC use cases. (#61086, #61386)
Updated scheduling and configuration docs, including replica scheduling guidance and a catalog of Serve environment variables, so operators can tune deployments with less guesswork. (#60922, #60807)
Clarified multiplexing and async behavior docs (including model pre-warming con...

Contributors

justinrmiller, MrKWatkins, and 128 other contributors

Ray-2.54.1

elliot-barn released this 25 Mar 23:37

ray-2.54.1

8768a32

Ray Data

🔨 Fixes

Disable hanging issue detection (#61895) — The hanging issue detector was making blocking calls to the Ray State API, which could cause the scheduling loop to block and severely degrade pipeline performance. The detector is disabled in this patch release until the blocking calls are fixed.

Ray-2.54.0

aslonnie released this 18 Feb 23:44

ray-2.54.0

48bd1f8

Ray Data

🎉 New Features

Add checkpointing support to Ray Data (#59409)
Compute Expressions: list operations (#59346), fixed-size arrays (#58741), string padding (#59552), logarithmic (#59549), trigonometric (#59712), arithmetic (#59678), and rounding (#59295)
Add sql_params support to read_sql (#60030)
Add AsList aggregation (#59920)
Support CountDistinct aggregate (#59030)
Add credential provider abstraction for Databricks UC datasource (#60457)
Support callable classes for UDFExpr (#56725)
Add autoscaler metrics to Data Dashboard (#60472)
Add optional filesystem parameter to download expression (#60677)
Allow specifying partitioning style or flavor in write_parquet() (#59102)
New cluster autoscaler enabled by default (#60474)
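
The sql_params item above refers to parameterized queries. The sketch below shows the underlying DB-API pattern with the standard-library sqlite3 module: the query text uses placeholders and the values are passed separately. It assumes read_sql forwards sql_params to the driver in this way, which these notes do not spell out; the table and data are invented.

```python
import sqlite3


def create_connection():
    """Build a throwaway in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movie (title TEXT, year INT)")
    conn.executemany(
        "INSERT INTO movie VALUES (?, ?)",
        [("A", 1999), ("B", 2010), ("C", 2021)],
    )
    conn.commit()
    return conn


conn = create_connection()
sql = "SELECT title FROM movie WHERE year >= ? ORDER BY year"
sql_params = (2010,)  # bound by the driver, never interpolated into the string
rows = conn.execute(sql, sql_params).fetchall()
conn.close()
```

Binding values this way avoids string formatting in queries and the injection risks that come with it.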

💫 Enhancements

Improve numerical stability in scalers by handling near-zero values (#60488)
Export dataset operator output schema to event logger (#60086)
Iceberg: add retry policy for Storage + Catalog writes (#60620)
Iceberg: remove calls to Catalog Table in write tasks (#60476)
Expose logical operators and rules via package exports (#60297, #60296)
Demote Sort from requiring preserve_order (#60555)
Improve appearance of repr(dataset) (#59631)
Allow configuring DefaultClusterAutoscalerV2 thresholds via env vars (#60133)
Use Arrow IPC for Arrow Schema serialization/deserialization (#60195)
Store _source_paths in object store to prevent excessive spilling during read task serialization (#59999)
Add more shuffle fusion rules (#59985)
Enable and tune DownstreamCapacityBackpressurePolicy (#59753)
Enable concurrency cap backpressure with tuning (#59392)
Set default actor pool scale up threshold to 1.75 (#59512)
Don't downscale actors if the operator hasn't received any inputs (#59883)
Don't reserve GPU budget for non-GPU tasks (#59789)
Only return selected data columns in hive-partitioned Parquet files (#60236)
Ordered + FIFO bundle queue (#60228)
Add node_id, pid, attempt number for hanging tasks (#59793)
Revise resource allocator task scheduling to factor in pending task outputs (#60639)
Track block serialization time (#60574)
Use metrics from OpRuntimeMetrics for progress (#60304)
Tabular form for streaming executor op metrics (#59774)
Info-log cluster scale-up decisions (#60357)
Use plain mode instead of grid mode for OpMetrics logging (#59907)
Progress reporting refactors (#59350, #59629, #59880)
Remove deprecated TENSOR_COLUMN_NAME constant (#60573)
Remove meta_provider parameter (#60379)
Decouple Ray Train from Ray Data by removing top-level ray.data imports (#60292)
Move extension types to ray.data (#59420)
Skip upscaling validation warning for fixed-size actor pools (#60569)
Make StatefulShuffleAggregation.finalize allow incremental streaming (#59972)
Revisit OutputSplitter semantics to avoid unnecessary buffer accumulation (#60237)
Update to PyArrow 23 (#60739, #59489)
Add BackpressurePolicy to streaming executor progress bar (#59637)
Support Arrow-based transformations for preprocessors (#59810)
StandardScaler preprocessor with Arrow format (#59906)
OneHotEncoder with Arrow format (#59890)
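
The numerical stability item for scalers concerns columns whose standard deviation is zero or close to it. Below is a minimal sketch of one common guard; the eps threshold and the fallback to a divisor of 1.0 are assumptions of this example, not necessarily what Ray's scalers do.

```python
import math


def standard_scale(values, eps=1e-8):
    """Standardize values, guarding against a (near-)zero standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std < eps:
        # A constant column would otherwise divide by ~0 and amplify noise.
        std = 1.0
    return [(v - mean) / std for v in values]


constant_column = standard_scale([5.0, 5.0, 5.0])
varying_column = standard_scale([1.0, 3.0])
```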

🔨 Fixes

Fuse MapBatches even if they modify the row count (#60756)
Don't push limit past map_batches by default (#60448)
Fix wrong type hint of other dataset in zip and union (#60653)
Fix ActorPoolMapOperator to guarantee dispatch of all given inputs (#60763)
Fix ArrowInvalid error when backfilling missing fields from map tasks (#60643)
Fix attribute error in UnionOperator.clear_internal_output_queue (#60538)
Fix DefaultClusterAutoscalerV2 raising KeyError: 'CPU' (#60208)
Fix ReorderingBundleQueue handling of empty output sequences (#60470)
Fix metric name for the "task completion time without backpressure" Grafana panel (#60481)
Fix Union operator blocking when preserve_order is set (#59922)
Fix autoscaler requesting empty resources instead of previous allocation when not scaling up (#60321)
Fix autoscaler not respecting user-configured resource limits (#60283)
Fix DefaultAutoscalerV2 not scaling nodes from zero (#59896)
Fix Iceberg warning message (#60044)
Fix Parquet datasource path column support (#60046)
Fix ProgressBar with use_ray_tqdm (#59996)
Fix stale stats on refit for preprocessors (#60031)
Fix StreamingRepartition hang with empty upstream results (#59848)
Fix operator fusion bug to preserve UDF modifying row count (#59513)
Fix AutoscalingCoordinator double-allocating resources for multiple datasets (#59740)
Fix DownstreamCapacityBackpressurePolicy issues (#59990)
Fix AutoscalingCoordinator crash when requesting 0 GPUs on CPU-only cluster (#59514)
Fix TensorArray to Arrow tensor conversion (#59449)
Fix resource allocator not respecting max resource requirement (#59412)
Fix GPU autoscaling when max_actors is set (#59632)
Fix checkpoint filter PyArrow zero-copy conversion error (#59839)
Restore class aliases to fix deserialization of existing datasets (#59828, #59818)
Fix DataContext deserialization issue with StatsActor (#59471)
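
Two of the fixes above concern UDFs that change the row count. The plain-Python example below shows why a limit cannot be safely pushed past such a map step; explode and limit are stand-ins written for this illustration, not Ray Data operators.

```python
def explode(batch):
    """A batch UDF that changes the row count: each row is emitted twice."""
    return [value for row in batch for value in (row, row)]


def limit(rows, n):
    return rows[:n]


rows = [1, 2, 3, 4]

# Intended semantics: map first, then keep the first 3 output rows.
correct = limit(explode(rows), 3)

# If the optimizer pushes the limit below the map, the result changes.
pushed = explode(limit(rows, 3))
```

The pushed-down plan returns six rows instead of three, which is why the optimization has to be off unless the map step is known to preserve row counts.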

📖 Documentation

Sort references in "Loading data and Saving data" pages (#60084)
Fix inconsistent heading levels in "How to write tests" guide (#60706)
Clarify resource_limits refers to logical resources (#60109)
Update read_lance doc (#59673)
Fix broken link in read_unity_catalog docstring (#59745)
Fix bug in docs for enable_true_multi_threading (#60515)
Add more education around transformations (#59415)

Ray Serve

🎉 New Features

Queue-based autoscaling for TaskConsumer deployments (phase 1). Introduces a QueueMonitor actor that queries message brokers (Redis, RabbitMQ) for queue length, enabling TaskConsumer scaling based on pending tasks rather than HTTP load. (#59430)
Default autoscaling parameters for custom policies. New apply_autoscaling_config decorator allows custom autoscaling policies to automatically benefit from Ray Serve's standard parameters (delays, scaling factors, bounds) without reimplementation. (#58857)
label_selector and bundle_label_selector in Serve deployments. Deployments can now specify node label selectors for scheduling and bundle-level label selectors for placement groups, useful for targeting specific hardware (e.g., TPU topologies). (#57694)
Deployment-level autoscaling observability. The controller now emits a structured JSON serve_autoscaling_snapshot log per autoscaling-enabled deployment each control-loop tick, with an event summarizer that reduces duplicate logs. (#56225)
Batching with multiplexing support. Batching now guarantees each batch contains requests for the same multiplexed model, enabling correct multiplexed model serving with @serve.batch. (#59334)
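
The batching-with-multiplexing guarantee can be sketched as grouping pending requests by model ID before cutting batches. The code below illustrates that idea only; it is not how @serve.batch is implemented, and build_batches and the request tuples are hypothetical.

```python
from collections import defaultdict


def build_batches(requests, max_batch_size):
    """Group (model_id, payload) requests so each batch targets one model."""
    by_model = defaultdict(list)
    for model_id, payload in requests:
        by_model[model_id].append(payload)

    batches = []
    for model_id, payloads in by_model.items():
        # Split each model's queue into chunks of at most max_batch_size.
        for start in range(0, len(payloads), max_batch_size):
            batches.append((model_id, payloads[start:start + max_batch_size]))
    return batches


pending = [("m1", 1), ("m2", 2), ("m1", 3), ("m1", 4)]
batches = build_batches(pending, max_batch_size=2)
```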

💫 Enhancements

Replica routing data structure optimizations. O(1) pending-request lookups, cached replica lists, lazy cleanup, optimized retry insertion, and metrics throttling yield significant routing performance improvements. (#60139)
New operational metrics suite. Added long-poll metrics, replica lifecycle metrics, app/deployment status metrics, proxy health and request routing delay metrics, event loop utilization metrics, and controller health metrics — greatly improving monitoring and debugging capabilities. (#59246, #59235, #59244, #59238, #59535, #60473)
Autoscaling config validation. lookback_period_s must now be greater than metrics_interval_s, preventing silent misconfigurations. (#59456)
Cross-version root_path support for uvicorn. root_path now works correctly across all uvicorn versions, including >=0.26.0 which changed how root_path is processed. (#57555)
Preserve user-set gRPC status codes. When deployments raise exceptions after setting a gRPC status code on the context, that code is now correctly propagated to the client instead of being overwritten with INTERNAL. Error messages are truncated to 4 KB to respect HTTP/2 trailer limits. (#60482)
Replica ThreadPoolExecutor capped to num_cpus. The user-code event loop's default ThreadPoolExecutor is now limited to the deployment's num_cpus, preventing oversubscription when using asyncio.to_thread. (#60271)
Generic actor registration API for shutdown cleanup. Deployments can register auxiliary actors (e.g., PrefixTreeActor) with the controller for automatic cleanup on serve.shutdown(), eliminating cross-library import dependencies. (#60067)
Deployment config logging in controller. Deployment configurations are now logged in the controller for easier debugging and auditability. (#59222, #59501)
Pydantic v1 deprecation warning. A FutureWarning is now emitted at ray.init() when Pydantic v1 is detected, as support will be removed in Ray 2.56. (#59703)

🔨 Fixes

Fixed tracing signature mismatch across processes. Resolved TypeError: got an unexpected keyword argument _ray_trace_ctx when calling actors from a different process than the one that created them (e.g., serve start + dashboard interaction). (#59634)
Fixed ingress deployment name collision. Ingress deployment name was incorrectly modified when a child deployment shared the same name, causing routing failures. (#59577)
Fixed downstream deployment over-provisioning. Downstream deployments no longer over-provision replicas when receiving DeploymentResponse objects. (#60747)
Fixed replicas hanging forever during draining. Replicas no longer hang indefinitely when requests are stuck during the draining phase. (#60788)
Fixed TaskProcessorAdapter shutdown during rolling updates. Removed shutdown() from __del__, which was broadcasting a kill signal to all Celery workers instead of just the local one, breaking rolling updates. (#59713)
Fixed Windows test failures. Resolved tracing file handle cleanup on Window...

Contributors

dlwh, pcmoritz, and 129 other contributors

Ray-2.53.0

aslonnie released this 20 Dec 15:16

ray-2.53.0

0de2118

Highlights

Ray plans to drop support for Pydantic V1 starting version 2.56.0. Please see this RFC for details.
Ray Data now has support for bounded reading from Kafka and improved Iceberg support.

Ray Data

🎉 New Features

Autoscaling: New utilization-based cluster autoscaler for Ray Data workloads (#59353, #59362, #59366). To use this new autoscaler, set RAY_DATA_CLUSTER_AUTOSCALER=V2.
Kafka Datasource: Add Kafka as a native datasource for data ingestion (#58592)
Dataset summary API: Add Dataset.summary() API for quick dataset inspection (#58862)
Iceberg support: Add Iceberg schema evolution, upsert, and overwrite support (#59210, #59335)
Graceful error handling: Add should_continue_on_error for graceful error handling in batch inference (#59212)
Datetime compute expressions: Add datetime compute expressions support (#58740)
Grouped with_column expressions: Enable expressions for grouped with_column in Ray Data (#58231)
Parallelized collation: Parallelize DefaultCollateFn, arrow_batch_to_tensors (#58821)
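
One way to opt in to the new autoscaler from Python is to set the environment variable named above before the job starts. The snippet assumes the variable is read when Ray Data selects its autoscaler, so it should be set before any dataset executes; exporting it in the shell works equally well.

```python
import os

# Opt in to the utilization-based cluster autoscaler for this process.
os.environ["RAY_DATA_CLUSTER_AUTOSCALER"] = "V2"
```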

💫 Enhancements

Optimized Autoscaler Step Size: Optimize autoscaler to support configurable step size for actor pool scaling (#58726)
Improved Streaming Repartition: Improve streaming repartition performance (#58728)
Actor init retry: Add actor retry if there's a failure in __init__ (#59105)
Fused Repartition + MapBatches: Fuse StreamingRepartition with MapBatches operators to scale collate (#59108)
Combined repartitions: Combine consecutive repartitions for efficiency (#59145)
Prefetch buffering: Handle prefetch buffering in iter_batches (#58657)
HashShuffle block breakdown: HashShuffleAggregator breaks down blocks on finalize (#58603)
Backpressure tuning: Tune concurrency cap backpressure object store budget ratio (#58813)
Non-string ApproximateTopK: Support non-string items for ApproximateTopK aggregator (#58659)
Lance version support: Add version support to read_lance() (#58895)
Dashboard metrics: Add time_to_first_batch and get_ref_bundles metrics to data dashboard (#58912)
Iter prefetched bytes stats: Add iter_prefetched_bytes statistics tracking (#58900)
Configurable batching for iter_batches: Add configurable batching for resolve_block_refs to speed up iter_batches (#58467)
Improved dashboard metrics: Improve Ray Data dashboard metrics display (#58667)
Histogram percentiles: Update Ray Data histograms to show percentiles in data dashboard (#58650)
Deprecated API removal: Remove deprecated read_parquet_bulk API (#58970)
Block shaping option: Add disable block shaping option to BlockOutputBuffer (#58757)
Removed concurrency lock: Remove concurrency lock for better performance (#56798)
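
To make the "Combined repartitions" item (#59145) concrete, here is a minimal sketch of collapsing back-to-back repartition steps in a toy logical plan. Op and combine_repartitions are hypothetical names, and the sketch assumes that only the last repartition in a run determines the final block layout; Ray Data's actual optimizer rule may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    name: str
    num_blocks: int = 0


def combine_repartitions(plan: List[Op]) -> List[Op]:
    """Collapse runs of consecutive repartition ops into the last one."""
    out: List[Op] = []
    for op in plan:
        if op.name == "repartition" and out and out[-1].name == "repartition":
            # Assumption: the later repartition alone decides the final layout.
            out[-1] = op
        else:
            out.append(op)
    return out


plan = [
    Op("read"),
    Op("repartition", 8),
    Op("repartition", 4),
    Op("map"),
    Op("repartition", 2),
]
optimized = combine_repartitions(plan)
```

Only adjacent repartitions are merged; the one after "map" is kept because the intermediate operator may depend on the earlier layout.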

🔨 Fixes

Fixes to Unique: Fix support of list types for Unique aggregator (#58916)
Parquet NaN fix: Fix reading from written parquet for numpy with NaNs (#59172)
Hash Shuffle empty block: Fix empty block sort in hash shuffle operator (#58836)
Hive partitioning pushdown: Fix pushdown optimizations with Hive partitioning (#58723)
Object Store usage reporting: Fix obj_store_mem_max_pending_output_per_task reporting (#58864)
Pyarrow FileSystem serialization fix: Handle filesystem serialization issue in get_parquet_dataset (#57047)
Azure UC SAS: Handle Azure UC user delegation SAS (#59393)
Async UDF Thread Cleanup: Close threads from async UDF after actor died (#59261)
Object Locality Default: Return 0s by default for object locality instead of -1s (#58754)
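
The Unique fix for list types (#58916) touches a general Python pitfall: lists are unhashable, so they cannot be placed in a set directly. The sketch below is plain Python and is not how Ray Data implements the aggregator; unique_values and _hashable are hypothetical helpers that show one way to deduplicate list-valued items.

```python
from typing import Any, Iterable, List


def _hashable(value: Any) -> Any:
    """Convert (possibly nested) lists to tuples so they can be hashed."""
    if isinstance(value, list):
        return tuple(_hashable(v) for v in value)
    return value


def unique_values(values: Iterable[Any]) -> List[Any]:
    """Return distinct values in first-seen order, supporting list items."""
    seen = set()
    out: List[Any] = []
    for v in values:
        key = _hashable(v)
        if key not in seen:
            seen.add(key)
            out.append(v)
    return out


distinct = unique_values([[1, 2], [1, 2], [3], "a", "a"])
```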

📖 Documentation

Added contributing guide to Ray Data documentation (#58589)
Added download expression to key user journeys in documentation (#59417)
Added Kafka user guide (#58881)
Added unstructured data templates from Ray Summit 2025 (#57063)
Improved instructions for reading Hugging Face datasets (#58492, #58832)
Refined batch-format guidance in docs (#58971)
Exposed vision_preprocess and vision_postprocess in VLM docs (#59012)
Added instructions for upgrading huggingface_hub (#59109)
Added a guide on scaling out expensive collation functions (#58993)

Ray Serve

🎉 New Features

Deployment topology visibility. Exposes deployment dependency graphs in Serve REST API, allowing users to visualize and understand the DAG structure of their applications. (#58355)
External autoscaler integration. Adds external_scaler_enabled flag to application config, enabling third-party autoscalers to control replica counts. (#57727, #57698)
Node rank and local rank support. Extends replica rank system to track node-level and per-node local ranks, enabling better distributed serving coordination for multi-node deployments. (#58477, #58479)
Custom batch size function. Allows users to define custom functions for computing logical batch sizes in @serve.batch, useful when batch items have varying weights (e.g., token counts in LLM inference). (#59059)
Stateful application-level autoscaling. Adds policy state persistence for custom autoscaling policies, allowing policies to maintain state across control-loop iterations. (#59118)
New autoscaling, batching, and routing metrics. Adds Prometheus metrics for autoscaling decisions (ray_serve_deployment_target_replicas, ray_serve_autoscaling_decision_replicas), batching statistics, and router queue latency for improved observability. (#59220, #59232, #59233)
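
The custom batch size function (#59059) is easiest to picture with a small example. The sketch below is plain Python, not the @serve.batch API: batch_by_logical_size, size_fn, and token_count are hypothetical names, and whitespace splitting stands in for a real tokenizer. It groups prompts so that each batch stays within a token budget rather than a fixed item count.

```python
from typing import Callable, Iterable, List


def batch_by_logical_size(
    items: Iterable[str],
    max_batch_size: int,
    size_fn: Callable[[str], int],
) -> List[List[str]]:
    """Group items so the summed logical size per batch stays within the cap."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for item in items:
        size = size_fn(item)
        # Flush the current batch if adding this item would exceed the cap.
        if current and current_size + size > max_batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        batches.append(current)
    return batches


def token_count(prompt: str) -> int:
    # Stand-in for a tokenizer: count whitespace-separated words.
    return len(prompt.split())


batches = batch_by_logical_size(["a b c", "d e", "f", "g h i j"], 5, token_count)
```

An item larger than the cap still gets a batch of its own in this sketch, which avoids dropping oversized requests.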

💫 Enhancements

Smarter downscaling behavior. Prioritizes stopping most recently scaled-up replicas during downscale, preserving long-lived replicas that are optimally placed and fully warmed up. (#52929)
Autoscaling performance optimizations. Short-circuits metric aggregation for single time series cases (O(n log n) → O(1)) and lazily evaluates expensive autoscaling context fields to reduce controller CPU usage. (#58962, #58963)
Route matching cleanup. Removes redundant route matching logic from replicas since correct route values are now included in RequestMetadata. Also allows multiple HTTP methods (e.g., GET and PUT) to map to a single route. (#58927)
Deployment wrapper metadata preservation. Wrapper classes from decorators like @ingress now preserve original class metadata (__qualname__, __module__, __doc__, __annotations__). (#58478)
Improved type annotations. Enhances generic type annotations on DeploymentHandle, DeploymentResponse, and DeploymentResponseGenerator for better IDE support and type inference. Adds .result() stub to DeploymentResponseGenerator to fix static typing errors. (#59363, #58522)
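
The downscaling preference described above (#52929) can be sketched as a simple selection rule. Replica and pick_replicas_to_stop are hypothetical names used for illustration only, and the sketch assumes each replica carries a start timestamp; the actual Serve controller considers more state than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    replica_id: str
    start_time: float  # seconds since epoch; assumed to be available


def pick_replicas_to_stop(replicas: List[Replica], num_to_stop: int) -> List[Replica]:
    """Choose the most recently started replicas first when downscaling."""
    newest_first = sorted(replicas, key=lambda r: r.start_time, reverse=True)
    return newest_first[:num_to_stop]


replicas = [Replica("a", 100.0), Replica("b", 300.0), Replica("c", 200.0)]
to_stop = pick_replicas_to_stop(replicas, 2)
```

Stopping the newest replicas first leaves the long-lived replica "a" running, which matches the goal of preserving warmed-up replicas.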

🔨 Fixes

YAML serialization for autoscaling enums. Fixes RepresenterError when using serve build with AggregationFunction enum values in autoscaling config. (#58509)
Autoscaling context timestamp fix. Correctly sets last_scale_up_time and last_scale_down_time on autoscaling context. (#59057)
Deadlock in chained deployment responses. Fixes hang when awaiting intermediate DeploymentResponse objects in a chain of deployment calls from different event loops. (#59385)
FastAPI class-based view inheritance. Fixes make_fastapi_class_based_view to properly handle inherited methods. (#59410)

📖 Documentation

Async I/O best practices guide. New documentation covering async programming patterns and best practices for Ray Serve deployments. (#58909)
Replica scheduling guide. New documentation covering compact scheduling, placement groups, custom resources, and guidance on when to use each feature. (#59114)

Ray Train

🎉 New Features

Worker Placement with Label Selectors: Added label_selector to ScalingConfig. This allows users to control worker placement by targeting specific labeled nodes in the cluster. (#58845, #59414)
Multihost JaxTrainer on GPU: Introduced support for JaxTrainer running on GPU machines. (#58322)
Checkpoint Consistency Modes: Added CheckpointConsistencyMode to get_all_reported_checkpoints, providing options for handling checkpoint retrieval consistency. (#58271)
Per-Dataset Execution Options: DataConfig now supports setting execution_options on a per-dataset basis for finer-grained control over data loading. (#58717)

💫 Enhancements

Nested Metrics Support: Result.get_best_checkpoint now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)
Non-Blocking Checkpoint Retrieval: get_all_reported_checkpoints no longer blocks when only metrics are reported. (#58870)
Improved Resource Cleanup: Implemented eager cleanup of data resources and placement groups upon training run failures or aborts, preventing resource leaks. (#58325, #58515)
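
For readers unfamiliar with nested metrics (#58537), the sketch below shows the general idea of selecting a checkpoint by a metric stored inside a nested dictionary. It is plain Python, not the Result.get_best_checkpoint implementation; get_nested and best_checkpoint are hypothetical helpers, and the "/" path separator is an assumption made for this example.

```python
from typing import Any, Dict, List, Tuple


def get_nested(metrics: Dict[str, Any], path: str, sep: str = "/") -> Any:
    """Look up a value such as 'eval/loss' in a nested metrics dict."""
    value: Any = metrics
    for key in path.split(sep):
        value = value[key]
    return value


def best_checkpoint(
    reports: List[Tuple[str, Dict[str, Any]]],
    metric: str,
    mode: str = "max",
) -> str:
    """Return the checkpoint name whose nested metric is best for the mode."""
    chooser = max if mode == "max" else min
    return chooser(reports, key=lambda report: get_nested(report[1], metric))[0]


reports = [
    ("ckpt_0", {"eval": {"loss": 0.9, "acc": 0.6}}),
    ("ckpt_1", {"eval": {"loss": 0.4, "acc": 0.8}}),
]
lowest_loss = best_checkpoint(reports, "eval/loss", mode="min")
```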

🔨 Fixes

MLflow Compatibility: Updated setup_mlflow API to ensure full compatibility with Ray Train V2. (#58705)
Validation for Checkpoint Uploads: A ValueError is now raised if checkpoint_upload_fn fails to return a valid checkpoint. (#58863)

📖 Documentation

New API Documentation: Added comprehensive documentation for the ray.train.get_all_reported_checkpoints method. (#58946)

Ray Tune

💫 Enhancements

Nested Metrics Support: Result.get_best_checkpoint now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)

Ray LLM

💫 Enhancements

Cloud filesystem restructuring with provider-specific implementations (#58469)
Bump transformers to 4.57.3 (#58980)
Ray Data LLM config refactor (#58298)
Update vllm_engine.py to check for VLLM_USE_V1 attribute (#58820)
Infer VLLM_RAY_PER_WORKER_GPUS from fractional placement-group bundles automatically (#5...

Contributors

justinrmiller, robertnishihara, and 97 other contributors


Ray-2.51.2

rayci-bot released this 29 Nov 00:40

ray-2.51.2

9ac1e61

Fix for CVE-2025-62593: reject Sec-Fetch-* and other browser-specific headers in dashboard browser rejection logic
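
A minimal sketch of this kind of header check is shown below. It is not the dashboard's actual code: looks_like_browser_request is a hypothetical helper, and it covers only the Sec-Fetch-* family named above, while the real logic tests additional browser-specific headers.

```python
from typing import Mapping


def looks_like_browser_request(headers: Mapping[str, str]) -> bool:
    """Return True if any Sec-Fetch-* header is present (case-insensitive)."""
    return any(name.lower().startswith("sec-fetch-") for name in headers)


browser = looks_like_browser_request({"Sec-Fetch-Mode": "cors", "Accept": "*/*"})
cli = looks_like_browser_request({"User-Agent": "curl/8.0", "Accept": "*/*"})
```

Browsers attach Sec-Fetch-* headers automatically and page scripts cannot remove them, which is why their presence is a useful signal that a request originated from a browser.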


Ray-2.52.1

aslonnie released this 28 Nov 02:23

ray-2.52.1

4ebdc0a

More robust handling for CVE-2025-62593: test for more browser-specific headers in dashboard browser rejection logic


Ray-2.52.0

dayshah released this 21 Nov 19:10

ray-2.52.0

9527a55

Release Highlights

Ray Core:

End of Life for Python 3.9 Support: Starting with this release, Ray no longer publishes Python 3.9 wheels.
Token authentication: Ray now supports built-in token authentication across all components including the dashboard, CLI, API clients, and internal services. This provides an additional layer of security for production deployments to reduce the risk of unauthorized code execution. Token authentication is initially off by default. For more information, see: https://docs.ray.io/en/latest/ray-security/token-auth.html
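
For a general picture of what token checking involves, the sketch below generates a random token and validates a request header against it in constant time. This is generic Python, not Ray's implementation: generate_token and is_authorized are hypothetical names, and the "Authorization: Bearer" header format is an assumption made for this example. See the linked documentation for how Ray actually configures and transmits tokens.

```python
import hmac
import secrets
from typing import Mapping


def generate_token() -> str:
    """Create a random 256-bit token, hex-encoded."""
    return secrets.token_hex(32)


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Check an assumed 'Authorization: Bearer <token>' header."""
    value = headers.get("Authorization", "")
    scheme, _, presented = value.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


token = generate_token()
allowed = is_authorized({"Authorization": f"Bearer {token}"}, token)
denied = is_authorized({"Authorization": "Bearer wrong-token"}, token)
```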

Ray Data:

We’ve added a number of improvements for Iceberg, including upserts, predicate and projection pushdown, and overwrite.
We’ve added significant improvements to our expressions framework, including temporal, list, tensor, and struct datatype expressions.

Ray Libraries

Ray Data

🎉 New Features:

Added predicate pushdown rule that pushes filter predicates past eligible operators (#58150, #58555)
Iceberg support for upsert tables, schema updates, and overwrite operations (#58270)
Iceberg support for predicate and projection pushdown (#58286)
Iceberg writes now write data files in write() and then commit (#58601)
Enhanced Unity Catalog integration (#57954)
Namespaced expressions that expose PyArrow functions (#58465)
Added version argument to read_delta_lake (#54976)
Generator UDF support for map_groups (#58039)
ApproximateTopK aggregator (#57950)
Serialization framework for preprocessors (#58321)
Support for temporal, list, tensor, and struct datatypes (#58225)
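
The predicate pushdown rule (#58150, #58555) can be illustrated with a toy plan rewrite. In the sketch below, a plan is a list of operator names and push_filters_down is a hypothetical function; the set of "eligible" operators is an assumption chosen for the example and does not reflect the exact operators Ray Data treats as eligible.

```python
from typing import List

# Assumption: these operators neither drop nor rewrite the filtered columns,
# so a filter can safely run before them.
ELIGIBLE = {"sort", "repartition"}


def push_filters_down(plan: List[str]) -> List[str]:
    """Move each filter earlier in the plan, past eligible operators only."""
    plan = list(plan)
    for i in range(1, len(plan)):
        j = i
        while j > 0 and plan[j] == "filter" and plan[j - 1] in ELIGIBLE:
            plan[j - 1], plan[j] = plan[j], plan[j - 1]
            j -= 1
    return plan


optimized = push_filters_down(["read", "sort", "repartition", "filter", "map_batches"])
```

Running the filter before the sort and repartition means those operators process fewer rows, which is the point of the optimization.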

💫 Enhancements:

Use approximate quantile for RobustScaler preprocessor (#58371)
Map batches support for limit pushdown (#57880)
Make all map operations zero-copy by default (#58285)
Use tqdm_ray for progress reporting from workers (#58277)
Improved concurrency cap backpressure tuning (#58163, #58023, #57996)
Sample finalized partitions randomly to avoid lens effect (#58456)
Allow file extensions starting with '.' (#58339)
Set default file_extensions for read_parquet (#56481)
URL decode values in parse_hive_path (#57625)
Streaming partition enforces row_num per block (#57984)
Streaming repartition combines small blocks (#58020)
Lower DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR to 2 (#58262)
Set udf-modifying-row-count default to false (#58264)
Cache PyArrow schema operations (#58583)
Explain optimized plans (#58074)
Ranker interface (#58513)
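
To show why URL decoding matters for Hive-style paths (#57625), the sketch below extracts key=value partition segments and decodes percent-escapes with the standard library. parse_hive_partitions is a hypothetical stand-in written for this example; it is not the body of Ray Data's parse_hive_path.

```python
from typing import Dict
from urllib.parse import unquote


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Extract key=value partition segments and URL-decode the values."""
    partitions: Dict[str, str] = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = unquote(value)
    return partitions


parts = parse_hive_partitions(
    "s3://bucket/data/country=United%20States/date=2024-01-01/part-0.parquet"
)
```

Without decoding, the partition value would surface as "United%20States" and would not match a filter on "United States".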

🔨 Fixes:

Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
Fixed handling of renames in projection pushdown (#58033, #58037)
Fixed broken LogicalOperator abstraction barrier in predicate pushdown rule (#58683)
Fixed file size ordering in download partitioning with multiple URI columns (#58517)
Fixed HTTP streaming file download by using open_input_stream (#58542)
Fixed expression mapping for Pandas (#57868)
Fixed reading from zipped JSON (#58214)
Fixed MCAP datasource import for better compatibility (#57964)
Avoid slicing block when total_pending_rows < target (#58699)
Clear queue for manually marked execution_finished operators (#58441)
Add exception handling for invalid URIs in download operation (#58464)
Fixed progress bar name display (#58451)

📖 Documentation:

Documentation for Ray Data metrics (#58610)
Simplify and add Ray Data LLM quickstart example (#58330)
Convert rST-style to Google-style docstrings (#58523)

🏗 Architecture:

Removed stats update thread (#57971)
Refactor histogram metrics (#57851)
Revisit OpResourceAllocator to make data flow explicit (#57788)
Create unit test directory for fast, isolated tests (#58445)
Dump verbose ResourceManager telemetry into ray-data.log (#58261)

Ray Train

🎉 New Features:

Result.from_path implementation in v2 (#58216)

💫 Enhancements:

Exit actor and log appropriately when poll_workers is in terminal state (#58287)
Set JAX_PLATFORMS environment variable based on ScalingConfig (#57783)
Default to disabling Ray Train collective util timeouts (#58229)
Add SHUTTING_DOWN TrainControllerState and improve logging (#57882)
Improved error message when calling training function utils outside Ray Train worker (#57863)
FSDP2 template: Resume from previous epoch when checkpointing (#57938)
Clean up checkpoint config and trainer param deprecations (#58022)
Update failure policy log message (#58274)

📖 Documentation:

Ray Train Metrics documentation page (#58235)
Local mode user guide (#57751)
Recommend tree_learner="data_parallel" in examples for distributed LightGBM training (#58709)

Ray Serve

🎉 New Features:

Custom request routing with runtime environment support. Users can now define custom request router classes that are safely imported and serialized using the application's runtime environment, enabling advanced routing logic with custom dependencies. (#56855)
Custom autoscaling policies with enhanced logging. Deployment-level and application-level autoscaling policies now display their custom policy names in logs, making it easier to debug and monitor autoscaling behavior. (#57878)
Audio transcription support in vLLM backend. Ray Serve now supports transcription tasks through the vLLM engine, expanding multimodal capabilities. (#57194)
Data parallel attention public API. Introduced a public API for data parallel attention, enabling efficient distributed attention mechanisms for large-scale inference workloads. (#58301)
Route pattern tracking in proxy metrics. Proxy metrics now expose actual route patterns (e.g., /api/users/{user_id}) instead of just route prefixes, enabling granular endpoint monitoring without high cardinality issues. Performance impact is minimal (~1% RPS decrease). (#58180)
Replica dependency graph construction. Added list_outbound_deployments() method to discover downstream deployment dependencies, enabling programmatic analysis of service topology for both stored and dynamically-obtained handles. (#58345, #58350)
Multi-dimensional replica ranking. Introduced ReplicaRank schema with global, node-level, and local ranks to support advanced coordination scenarios like tensor parallelism and model sharding across nodes. (#58471, #58473)
Proxy readiness verification. Added a check to ensure proxies are ready to serve traffic before serve.run() completes, improving deployment reliability. (#57723)
IPv6 socket support. Ray Serve now supports IPv6 networking for socket communication. (#56147)
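
The cardinality benefit of route pattern tracking (#58180) can be seen in a small sketch that maps concrete request paths back to their declared patterns. pattern_to_regex and match_route are hypothetical helpers written for this example; they are not how the Serve proxy resolves routes.

```python
import re
from typing import List, Optional


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '/api/users/{user_id}' into a regex matching one path segment per placeholder."""
    parts = re.split(r"(\{[^/{}]+\})", pattern)
    regex = "".join("[^/]+" if p.startswith("{") else re.escape(p) for p in parts)
    return re.compile(f"^{regex}$")


def match_route(path: str, patterns: List[str]) -> Optional[str]:
    """Return the first declared pattern that matches the concrete path."""
    for pattern in patterns:
        if pattern_to_regex(pattern).match(path):
            return pattern
    return None


patterns = ["/api/users/{user_id}", "/health"]
label = match_route("/api/users/42", patterns)
```

Labeling metrics with the pattern instead of the raw path keeps one time series per endpoint, no matter how many distinct user IDs are requested.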

💫 Enhancements:

Selective throughput optimization flag overrides. Users can now override individual flags set by RAY_SERVE_THROUGHPUT_OPTIMIZED without manually configuring all f...