
[autoscaler] Add kill and get IP commands to CLI for testing#3731

Merged
richardliaw merged 4 commits into ray-project:master from stephanie-wang:autoscaler-kill-node on Jan 11, 2019


@stephanie-wang

Contributor

What do these changes do?

Adds 2 commands to the CLI that take in an autoscaler config:

  1. Kill a random Ray node in the cluster.
  2. Get the IP addresses of all worker nodes.

Both commands are intended for testing and are not recommended for normal use.

Related issue number

Closes #3685.

@stephanie-wang stephanie-wang requested review from ericl and robertnishihara and removed request for ericl January 9, 2019 22:35
@AmplabJenkins


Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10716/

Comment thread python/ray/scripts/scripts.py Outdated
    help="Override the configured cluster name.")
def get_worker_ips(cluster_config_file, cluster_name):
    click.echo('\n'.join(
        get_worker_node_ips(cluster_config_file, cluster_name)))

Contributor


nit: splitting this into two lines instead of inlining can help with debuggability
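A minimal sketch of what this nit suggests. To keep it runnable, `get_worker_node_ips` is a hypothetical stub returning fixed addresses (the real one queries the cluster), and the `--cluster-name`/`-n` flags are assumptions for illustration; only the help string comes from the diff above.

```python
import click


def get_worker_node_ips(config_file, override_cluster_name):
    # Hypothetical stub standing in for ray.autoscaler.commands.get_worker_node_ips.
    return ["10.0.0.2", "10.0.0.3"]


@click.command()
@click.argument("cluster_config_file", required=True, type=str)
@click.option(
    "--cluster-name", "-n", required=False, type=str,
    help="Override the configured cluster name.")
def get_worker_ips(cluster_config_file, cluster_name):
    # Bind the result to a local first, so a breakpoint or a temporary log
    # line can inspect the list before it is joined and printed.
    worker_ips = get_worker_node_ips(cluster_config_file, cluster_name)
    click.echo("\n".join(worker_ips))
```

With the intermediate variable, a failure inside `get_worker_node_ips` and a failure while formatting the output show up on different lines of the traceback.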

Comment thread python/ray/autoscaler/commands.py
    return provider.external_ip(head_node)


def get_worker_node_ips(config_file, override_cluster_name):

Contributor


Can you add a docstring?
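A sketch of one possible Google-style docstring. To keep it runnable without Ray, `FakeNodeProvider`, `load_provider`, and the `"ray-node-type"` tag value are hypothetical stand-ins. Only the function name and its parameters come from this PR; the worker tag filter mirrors the `kill_node` excerpt further down.

```python
# Assumed tag key, for illustration only.
TAG_RAY_NODE_TYPE = "ray-node-type"


class FakeNodeProvider:
    """Hypothetical in-memory provider: node_id -> (tags, external IP)."""

    def __init__(self, nodes):
        self._nodes = nodes

    def nodes(self, tag_filters):
        # Return IDs of nodes whose tags match every filter entry.
        return [node_id for node_id, (tags, _) in self._nodes.items()
                if all(tags.get(k) == v for k, v in tag_filters.items())]

    def external_ip(self, node_id):
        return self._nodes[node_id][1]


def load_provider(config_file, override_cluster_name):
    """Hypothetical stub; real code would read the config and build a provider."""
    return FakeNodeProvider({
        "head": ({TAG_RAY_NODE_TYPE: "head"}, "10.0.0.1"),
        "w1": ({TAG_RAY_NODE_TYPE: "worker"}, "10.0.0.2"),
        "w2": ({TAG_RAY_NODE_TYPE: "worker"}, "10.0.0.3"),
    })


def get_worker_node_ips(config_file, override_cluster_name):
    """Returns the external IP addresses of all worker nodes in the cluster.

    Intended for testing; not recommended for normal use.

    Args:
        config_file (str): Path to the autoscaler cluster config file.
        override_cluster_name (str): If not None, used instead of the
            cluster_name in the config file.

    Returns:
        list[str]: One IP address per worker node. The head node is excluded.
    """
    provider = load_provider(config_file, override_cluster_name)
    nodes = provider.nodes({TAG_RAY_NODE_TYPE: "worker"})
    return [provider.external_ip(node) for node in nodes]
```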

@AmplabJenkins


Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10750/

Comment thread python/ray/scripts/scripts.py Outdated
cli.add_command(submit)
cli.add_command(teardown)
cli.add_command(teardown, name="down")
cli.add_command(kill, name="kill_node")

Contributor


Also, we should guard this somehow so that users don't use it to kill clusters; maybe name=_kill_random_node?
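One way the suggested guard could look in click, sketched under assumptions: the `cli` group is a stand-in for Ray's top-level group, the command body is a placeholder, and the confirmation prompt with its `--yes` flag is a hypothetical extra safeguard, not part of this PR.

```python
import click


@click.group()
def cli():
    """Hypothetical stand-in for Ray's top-level command group."""


@click.command()
@click.argument("cluster_config_file", required=True, type=str)
@click.option(
    "--yes", "-y", is_flag=True, default=False,
    help="Don't ask for confirmation.")
def kill_random_node(cluster_config_file, yes):
    """Kills a random Ray node. For testing purposes only."""
    if not yes:
        # Hypothetical extra guard: abort unless the user confirms.
        click.confirm(
            "This will kill a node in your cluster. Continue?", abort=True)
    # Placeholder for the real work (pick a worker, run `ray stop` on it).
    click.echo("Killing a random node in " + cluster_config_file)


# The leading underscore signals an internal, testing-only subcommand.
cli.add_command(kill_random_node, name="_kill_random_node")
```

The underscore name keeps the command callable from test scripts while making it unlikely that anyone types it by accident.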

Comment thread python/ray/autoscaler/commands.py
Comment thread python/ray/autoscaler/commands.py Outdated

    provider = get_node_provider(config["provider"], config["cluster_name"])
    nodes = provider.nodes({TAG_RAY_NODE_TYPE: "worker"})
    node = nodes[random.randint(0, len(nodes))]

Contributor


nit: could also random.choice(nodes)
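For context on this nit: `random.randint(a, b)` includes both endpoints, so `random.randint(0, len(nodes))` in the excerpt above can return `len(nodes)`, one past the last valid index, and raise `IndexError`. `random.choice` only ever returns an existing element. A minimal sketch (the helper name is hypothetical):

```python
import random


def pick_random_node(nodes):
    """Picks one node uniformly at random.

    Raises IndexError if nodes is empty, e.g. a cluster with no workers.
    """
    # Equivalent to nodes[random.randint(0, len(nodes) - 1)], without the
    # risk of forgetting the "- 1".
    return random.choice(nodes)
```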


    _exec(updater, "ray stop", False, False)

    time.sleep(5)

Contributor


why this sleep?

Contributor Author


To give the Raylet process some time to exit. It's not strictly necessary, but I copied this from the cluster teardown code: https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/commands.py#L93
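If a fixed `time.sleep(5)` ever proves flaky, a bounded poll is a common alternative. A minimal sketch, assuming the caller can supply some check (hypothetical here) that reports whether the Raylet has exited; `wait_until` is not a Ray function.

```python
import time


def wait_until(predicate, timeout_s=5.0, interval_s=0.1):
    """Polls predicate() until it returns True or timeout_s elapses.

    Returns True as soon as the predicate holds; otherwise returns the
    result of one final check made after the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return predicate()
```

Compared with a fixed sleep, this returns as soon as the process is gone and still caps the total wait at `timeout_s`.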

Comment thread python/ray/scripts/scripts.py Outdated

@richardliaw richardliaw left a comment

Contributor


A few nits, but looks fine; will approve once my questions are answered.

@richardliaw richardliaw changed the title from Add kill and get IP commands to CLI for testing to [autoscaler] Add kill and get IP commands to CLI for testing Jan 10, 2019
Co-Authored-By: stephanie-wang <swang@cs.berkeley.edu>
@AmplabJenkins


Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10757/

@AmplabJenkins


Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10755/

@richardliaw richardliaw merged commit cc5ecd7 into ray-project:master Jan 11, 2019


4 participants