Merged
36 commits
adea3a9
Preliminary change to print data context
rayhhome Feb 17, 2026
800a0ff
Merge remote-tracking branch 'origin/master' into print-data-context
rayhhome Feb 18, 2026
fbab811
Prettier printed message
rayhhome Feb 18, 2026
6121e4c
Merge branch 'master' into print-data-context
rayhhome Feb 18, 2026
5c4bc77
Change message level from info to debug
rayhhome Feb 18, 2026
24d5479
Fix message level mistake
rayhhome Feb 18, 2026
08da0ee
Clearer and more efficient log message
rayhhome Feb 19, 2026
2b26e6e
Merge branch 'master' into print-data-context
rayhhome Feb 19, 2026
9ef9cc1
Remove redundant ExecutionOptions message + use dataContext repr
rayhhome Feb 19, 2026
9a0ceca
Merge remote-tracking branch 'refs/remotes/origin/print-data-context'…
rayhhome Feb 19, 2026
bb62e3b
Change log level
rayhhome Feb 19, 2026
c459741
Merge branch 'master' into print-data-context
rayhhome Feb 19, 2026
b17c1fc
Use log_once for Data Context
rayhhome Feb 20, 2026
dfc7202
Log once no longer trigger on each dataset
rayhhome Feb 20, 2026
a839653
Merge branch 'master' into print-data-context
rayhhome Feb 20, 2026
1cbdc7a
Merge branch 'master' into print-data-context
rayhhome Feb 23, 2026
a574e5c
Merge branch 'master' into print-data-context
rayhhome Feb 23, 2026
10ff713
Merge branch 'master' into print-data-context
rayhhome Feb 23, 2026
f12da12
Log once for each dataset based on id
rayhhome Feb 23, 2026
677b5d3
Merge branch 'master' into print-data-context
rayhhome Feb 23, 2026
ca832e8
Merge branch 'master' into print-data-context
rayhhome Feb 24, 2026
efa06cf
Merge branch 'master' into print-data-context
rayhhome Feb 24, 2026
c1e4300
Merge remote-tracking branch 'origin' into print-data-context
rayhhome Mar 2, 2026
3efde5f
Log DataContext in json format
rayhhome Mar 2, 2026
0e8ce54
Merge branch 'master' into print-data-context
rayhhome Mar 2, 2026
39bf7ca
Use sanitize_for_struct for formatting instead
rayhhome Mar 2, 2026
d861083
Merge branch 'master' into print-data-context
rayhhome Mar 2, 2026
6fb4c51
Increase truncate length to avoid truncation
rayhhome Mar 3, 2026
6566152
Merge branch 'print-data-context' of github.com:rayhhome/ray into pri…
rayhhome Mar 3, 2026
0c4194e
Merge branch 'master' into print-data-context
rayhhome Mar 3, 2026
b6481ac
Merge branch 'master' into print-data-context
rayhhome Mar 4, 2026
01c072c
Merge branch 'master' into print-data-context
rayhhome Mar 4, 2026
d08d1d5
Adding to the truncation length macro to ensure fully logged datacontext
rayhhome Mar 4, 2026
4a9c381
Merge branch 'master' into print-data-context
rayhhome Mar 4, 2026
3335991
Merge branch 'master' into print-data-context
rayhhome Mar 6, 2026
c794cd3
Merge branch 'master' into print-data-context
rayhhome Mar 7, 2026
15 changes: 12 additions & 3 deletions python/ray/data/_internal/execution/streaming_executor.py
@@ -1,6 +1,5 @@
import logging
import math
-import pprint
import threading
import time
import typing
@@ -45,7 +44,10 @@
register_dataset_logger,
unregister_dataset_logger,
)
-from ray.data._internal.metadata_exporter import Topology as TopologyMetadata
+from ray.data._internal.metadata_exporter import (
+Topology as TopologyMetadata,
+sanitize_for_struct,
+)
from ray.data._internal.operator_schema_exporter import (
OperatorSchema,
get_operator_schema_exporter,
Expand All @@ -65,6 +67,10 @@
# Interval for logging execution progress updates and operator metrics.
DEBUG_LOG_INTERVAL_SECONDS = 5

+# Maximum string/sequence length for DataContext logging. Set high to avoid truncation
+# while still protecting against pathological cases.
+DATA_CONTEXT_LOG_TRUNCATE_LENGTH = 10000
+
# Visible for testing.
_num_shutdown = 0

@@ -196,7 +202,10 @@ def execute(
):
logger.debug(
f"Data Context for dataset {self._dataset_id}:\n%s",
-pprint.pformat(self._data_context),
+sanitize_for_struct(

Contributor

Will this truncate nested values inside the DataContext, or does it truncate the whole context? I believe we want to log the full context and not lose keys.

Contributor Author

This might truncate nesting inside the datacontext, e.g.

'execution_options': 'ExecutionOptions(resource_limits=ExecutionResources(cpu=inf, gpu=inf, object_store_memory=inf, memor...'

I tried to mitigate this issue by setting truncate_length to DATA_CONTEXT_LOG_TRUNCATE_LENGTH. My testing shows that none of the fields would be truncated in the debug output with the current configuration (log.txt)
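
To make the truncation behaviour discussed here concrete, the following is a minimal sketch of a recursive truncating sanitizer. `truncate_nested` and `FakeOptions` are hypothetical names invented for illustration; this is not Ray's `sanitize_for_struct`, whose implementation is not shown on this page. The sketch only assumes the behaviour described in the thread: nested values are cut individually, while keys are kept.

```python
from dataclasses import dataclass


def truncate_nested(obj, truncate_length=100):
    """Hypothetical stand-in for a sanitizer; NOT Ray's sanitize_for_struct."""
    if isinstance(obj, dict):
        # Keys are always kept; only the values are sanitized.
        return {str(k): truncate_nested(v, truncate_length) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Keep at most `truncate_length` elements and mark the cut.
        items = [truncate_nested(v, truncate_length) for v in obj[:truncate_length]]
        if len(obj) > truncate_length:
            items.append("...")
        return items
    if obj is None or isinstance(obj, (bool, int, float)):
        return obj
    # Anything else is logged through its string form, cut if too long.
    text = obj if isinstance(obj, str) else repr(obj)
    if len(text) > truncate_length:
        return text[:truncate_length] + "..."
    return text


@dataclass
class FakeOptions:  # hypothetical nested config object
    cpu: float = float("inf")
    gpu: float = float("inf")
    object_store_memory: float = float("inf")


# Illustrative context; the keys and values are made up for this sketch.
context = {"execution_options": FakeOptions(), "max_block_size": 128 * 1024 * 1024}

short = truncate_nested(context, truncate_length=20)     # nested repr is cut
full = truncate_nested(context, truncate_length=10000)   # nothing is cut
```

With a small limit the nested `execution_options` value ends in `...` while both keys survive; with a limit of 10000 the value is logged in full, which matches the mitigation described above.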

Member

+1 @rayhhome I have the same question

Contributor Author

With the truncate_length currently passed into sanitize_for_struct (DATA_CONTEXT_LOG_TRUNCATE_LENGTH, i.e. 10000), the DataContext will not be truncated unless it contains a string longer than 10000 characters or a list with more than 10000 elements, which I believe is unlikely.

Currently, calling json.dumps directly on DataContext raises an exception because a few fields in DataContext are not JSON serializable (I've documented these fields in the PR description). I can use _json_default to get around this, though. Do we want to switch to json.dumps instead of sanitize_for_struct?
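
For reference, the `json.dumps` failure mode and the `default=` workaround can be sketched with the standard library alone. The `_json_default` helper mentioned above is not shown on this page, so `fallback_to_repr` below is a hypothetical substitute, and `FakeRetryPolicy` and the `context` dict are made-up stand-ins for a DataContext with a non-serializable field.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeRetryPolicy:  # hypothetical non-JSON-serializable field value
    max_attempts: int = 3


# Illustrative stand-in for a context with a field json cannot encode.
context = {"dataset_id": "dataset_1", "retry_policy": FakeRetryPolicy()}

try:
    json.dumps(context)
    raised = False
except TypeError:
    # json.dumps raises TypeError for objects it has no encoder for.
    raised = True


def fallback_to_repr(obj):
    """Hypothetical `default` hook: encode unknown objects as their repr."""
    return repr(obj)


# The `default` callable is invoked only for values json cannot encode itself.
encoded = json.dumps(context, default=fallback_to_repr, indent=2)
decoded = json.loads(encoded)
```

The trade-off is that a `default` hook never truncates, so an unusually large field would be logged in full, whereas a sanitizer with a length limit bounds the size of the log message.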

+self._data_context,
+truncate_length=DATA_CONTEXT_LOG_TRUNCATE_LENGTH,
+),
)

# Setup the streaming DAG topology and start the runner thread.