Skip to content

Support service network attachment changes#32062

Merged
thaJeztah merged 3 commits into
moby:masterfrom
aaronlehmann:change-network-attachments
Apr 10, 2017
Merged

Support service network attachment changes#32062
thaJeztah merged 3 commits into
moby:masterfrom
aaronlehmann:change-network-attachments

Conversation

@aaronlehmann

@aaronlehmann aaronlehmann commented Mar 24, 2017

Copy link
Copy Markdown
  • Vendor a new version of swarmkit that allows service network attachments to change
  • Add --network-add and --network-rm options to docker service update
  • Resolve network IDs on the client side (see Inconsistency between REST and gRPC APIs: Network attachments #32060). Resolution in the daemon is left in place for backward compatibility.
  • Avoid filling in deprecated Spec.Networks field
  • Sort networks in the TaskSpec for update stability
  • Add an integration test for changing service networks

Fixes #28247

cc @dhiltgen @mavenugo @aluzzardi @yongtang

@aaronlehmann aaronlehmann added area/swarm status/failing-ci Indicates that the PR in its current state fails the test suite status/needs-vendoring labels Mar 24, 2017
@aluzzardi

Copy link
Copy Markdown
Member

LGTM

Might want to try stack deploy

@aaronlehmann

Copy link
Copy Markdown
Author

Tested this with stack deploy, and it didn't work as expected because stack deploy is using the deprecated ServiceSpec.Networks field, instead of TaskTemplate.Networks.

I've added a commit that uses TaskTemplate.Networks instead when the API version is recent enough. This change can't be unconditional, because it causes the client to move the networks of stack-related services from ServiceSpec to TaskTemplate, and this is only allowed by engines that have the swarmkit change included in this PR.

cc @dnephin

Comment thread cli/command/service/opts.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronlehmann shouldn't we share the ctx context.Context here too ? 👼

Comment thread cli/compose/convert/service.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronlehmann I would rather avoid the boolean arg and just pass the ClientVersion here, and handle the condition inside the func.

@mavenugo

Copy link
Copy Markdown
Contributor

@aaronlehmann #31714 is merged now. pls rebase

@yongtang

Copy link
Copy Markdown
Member

Code LGTM.

Avoid filling deprecated Spec.Networks field helps a lot in the future. There were many confusions about this part from user as far as I could remember.

@aaronlehmann aaronlehmann force-pushed the change-network-attachments branch from f5886bd to ba14cfa Compare March 27, 2017 17:30
@aaronlehmann

Copy link
Copy Markdown
Author

Rebased and addressed comments.

@aaronlehmann aaronlehmann removed status/failing-ci Indicates that the PR in its current state fails the test suite status/needs-vendoring labels Mar 27, 2017

@vdemeester vdemeester left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM 🐸

Comment thread cli/command/service/opts.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm -1 on this change and on #32060 in general. The engine API handles this already so that every client doesn't need to handle it.

I also don't think that the convert code should be making API calls.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client is preparing a spec here, which serves as configuration for the service. It's quite unexpected for the engine to be rewriting the spec before storing it. I see it as a analagous to docker compose up rewriting the compose file on disk.

In theory this change could be split out to a different PR, but I feel strongly that we shouldn't be relying on the engine to magically rewrite the service's configuration, which is supposed to be completely determined by the client. We're effectively overloading a field that's defined to contain a network ID, and saying that it's okay to put a network name there in certain cases, because that field will be rewritten before it's acted upon.

More details in #32060 (comment)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an issue, can we go ahead and split it out?
Also we can probably optimize this by hitting NetworkList with filters for the network names.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it as a analagous to docker compose up rewriting the compose file on disk.

I don't really see it that way. The user controlled data is the command line they run. Either way "something" takes the names they provide and turns them into IDs. What does it matter to the user if that's done in the client or the daemon?

If it's done in the client then every single API user has to duplicate the lookup logic.

Is there any practical advantage to doing it in the engine client vs the daemon (which is acting the swarmkit client)?

Comment thread cli/command/service/opts.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input is already a slice, why do we need to sort them?

If the order isn't relevant then the server which processes this data should perform a sort before operating on it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably you have a point here. We sort these so that the manager won't see two tasks with the same set of networks as different. Right now the manager is doing a simple reflect.DeepEqual on the TaskSpec. Ideally, the manager would do this comparison in a smarter way. That said, an order-insensitive comparison might end up being inefficient. We might have to pay the overhead of sorting on every comparison. But reflect is very inefficient to begin with, so extra overhead from sorting may be comparatively small.

Since this would be a somewhat tangential change in swarmkit, I suggest that we leave in the sort for the moment, and consider removing the need for it through a separate change. Feel free to open an issue and assign it to me.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't also be less overhead for the server if the slice is pre-sorted, since it won't have to actually move anything around even if it does call sort?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d.Cmd() needs to be deprecated, please use d.Command() which provides better errors messages and less verbose test cases.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to d.Command

Comment thread cli/compose/convert/service.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we pass in just the version string instead of the entire client?

I believe that's what was meant by #32062 (comment)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to pass the version string

Comment thread cli/command/service/update.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context should be passed in (if it's going to be used), but I think we should be using names not ids here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary for this function to make API calls. Even if translation from network names to network IDs is done in the engine, as you prefer, it is done before the service is stored. When the service is retrieved from the manager NetworkAttachmentConfig.Target will contain an ID. If a user tries to remove a network by name, the name needs to be resolved to an ID so that the appropriate NetworkAttachmentConfig can be removed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threaded Context through

@yongtang

Copy link
Copy Markdown
Member

I tested manually on --network-add and --network-rm parts, and they work as expected.

@aaronlehmann aaronlehmann force-pushed the change-network-attachments branch from ba14cfa to 540217c Compare March 31, 2017 02:30
@vdemeester

Copy link
Copy Markdown
Member

@aaronlehmann needs a rebase 👼 🙇

@aaronlehmann aaronlehmann force-pushed the change-network-attachments branch from 540217c to ec010b4 Compare March 31, 2017 17:00
@aaronlehmann

Copy link
Copy Markdown
Author

Rebased

@aaronlehmann

Copy link
Copy Markdown
Author

ping @dnephin: Is the switch from submitting network names on the client side to using network IDs still a concern?

The issue about client side vs. server side sorting is being handled in moby/swarmkit#2065, but I don't expect that will make 17.05 at this point.

@thaJeztah

Copy link
Copy Markdown
Member

Started doing some testing

Init swarm, create two networks, the second having the first one's ID as name (heh,
always worth testing);

docker swarm init
docker pull nginx:alpine

docker network create -d overlay foo
jnb6r5655lgauwaxc2xk3bjkg

docker network create -d overlay jnb6r5655lgauwaxc2xk3bjkg
o53gxh9we29bzcwszsi53t4h0

docker network ls
NETWORK ID          NAME                        DRIVER              SCOPE
f23f49c32e1a        bridge                      bridge              local
b05cf8bbb91a        docker_gwbridge             bridge              local
jnb6r5655lga        foo                         overlay             swarm
1a4bcf679928        host                        host                local
7frgm3zxl00t        ingress                     overlay             swarm
o53gxh9we29b        jnb6r5655lgauwaxc2xk3bjkg   overlay             swarm
193c0931de9a        none                        null                local

Create a service, attaching to network by ID:

docker service create --name web --network=jnb6r5655lgauwaxc2xk3bjkg nginx:alpine
uyjck7mt1ee6r4au1ak5l6508

docker service inspect -f '{{json .Spec.TaskTemplate.Networks }}'  web | jq .
[
  {
    "Target": "jnb6r5655lgauwaxc2xk3bjkg"
  }
]

That looks right 👍

I did notice this (network name nor ID is no longer in .Spec.Networks);

docker service inspect -f '{{json .Spec.Networks }}'  web | jq .
null

Remove the network from the service (looks OK);

docker service update --network-rm=jnb6r5655lgauwaxc2xk3bjkg web
web

docker service inspect -f '{{json .Spec.TaskTemplate.Networks }}'  web | jq .
null

Doing again, but by "name";

docker service create --name web2 --network=foo nginx:alpine
uazr3rgs0anvfe8gilfjhlmb3

docker service inspect -f '{{json .Spec.TaskTemplate.Networks }}'  web2 | jq .
[
  {
    "Target": "jnb6r5655lgauwaxc2xk3bjkg"
  }
]

Here also, the network name as specified by the user is not to be found:

docker service inspect -f '{{json .Spec.Networks }}'  web2 | jq .
null

docker service inspect --pretty web2

ID:		uazr3rgs0anvfe8gilfjhlmb3
Name:		web2
Service Mode:	Replicated
 Replicas:	1
Placement:
UpdateConfig:
 Parallelism:	1
 On failure:	pause
 Max failure ratio: 0
RollbackConfig:
 Parallelism:	1
 On failure:	pause
 Max failure ratio: 0
ContainerSpec:
 Image:		nginx:alpine@sha256:b65b240c4d8418167731f6c82544c6e7d29655530e4ed26c9f5eb27efba269af
Resources:
Endpoint Mode:	vip
@thaJeztah

thaJeztah commented Apr 6, 2017

Copy link
Copy Markdown
Member

The original (as specified by the user) name of the network to not be included in the spec is a regression from 17.03/17.04. I think we should preserve both a name and an ID somewhere in the spec (ideally, in such a way that the relation is clear).

edit I see the comment here https://github.com/docker/docker/pull/32062/files#diff-dc2678e1e65c05f198d9b84656ed2f00R139 that .Spec.Networks was deprecated. Which lead to the question;

  • what happens when downgrading a daemon to the previous version? (perhaps resolve the name, and set both .Spec.Networks and .Spec.TaskTemplate.Networks, would that work?)
  • we still need a way for the user to show which network(s) a service is connected to, other than only seeing an "id" (see my comment above)
@thaJeztah

thaJeztah commented Apr 6, 2017

Copy link
Copy Markdown
Member

Combine add and remove in a single update;

docker service create --name web --network=foo nginx:alpine
9vlyasko7cspst2py2kjgzoy4

docker service inspect -f '{{json .Spec.TaskTemplate.Networks }}'  web | jq .
[
  {
    "Target": "jnb6r5655lgauwaxc2xk3bjkg"
  }
]

docker service update --network-add=foo --network-rm=foo web
web

docker service inspect -f '{{json .Spec.TaskTemplate.Networks }}'  web | jq .
null

That's not correct; we should do remove before add, so that removing/adding the network in the same update becomes a no-op. In this case, it's not a very useful thing to do, but if we add "advanced" options to networks, and want to update options, the network should not be removed but updated (--network-rm=foo --network-add=name=foo,option=bar)

@thaJeztah

Copy link
Copy Markdown
Member

FWIW, I did a similar change for "secrets"; #29802

@thaJeztah

Copy link
Copy Markdown
Member

I'm not sure I understand this one correctly. Is this a regression with this PR? In the old code, the network name was being sent to the REST API, but my understanding is that the daemon would rewrite it to an ID before storing the service in swarmkit's data store. Was it possible to see the network name before this PR?

Yes, if I run in docker 17.04, I see the network name in service inspect. I think that's useful information to have because as a used, I'd be interested in what networks my services are connected to

@aaronlehmann

Copy link
Copy Markdown
Author

Yes, if I run in docker 17.04, I see the network name in service inspect. I think that's useful information to have because as a used, I'd be interested in what networks my services are connected to

Oh, I think this is because the old CLI code wrote to both the deprecated field and the non-deprecated field. The code in the engine that translates an ID to a name would only translate the non-deprecated field if both were present. So you would still see the name in the deprecated field.

@aaronlehmann

aaronlehmann commented Apr 6, 2017

Copy link
Copy Markdown
Author

I don't think this PR changes that behaviour. You submit something from the CLI with a name, and you the inspect shows an ID. Is that not the case?

I'm talking about the API behavior, not the CLI behavior.

And the PR doesn't change the API's behavior (to preserve backward compatibility), but I think it's better not to rely on this behavior and be explicit about submitting IDs in places that really should have IDs.

@aaronlehmann

Copy link
Copy Markdown
Author

@thaJeztah: I've updated the CLI to apply network removals before network additions. However, I noticed that for everything except secrets, we seem to use the opposite approach. So if you feel this is wrong, we should have a separate PR that fixes these cases. Feel free to open an issue.

As for visibility of network names, the object model is defined so that tasks reference networks by IDs. The fact that a network name was appearing the spec before was an error caused by the daemon only rewriting one of the two fields the CLI inserted (another reason I'm not a fan of rewriting fields in the daemon). I propose that service inspect --pretty translate the IDs to names for display. I'll work on a separate PR to do this, since I feel it's out of scope for this one. Does that sound good?

@aaronlehmann aaronlehmann force-pushed the change-network-attachments branch from 4bcd9d3 to 061fcb8 Compare April 7, 2017 23:43
@aaronlehmann

Copy link
Copy Markdown
Author

As for visibility of network names, the object model is defined so that tasks reference networks by IDs. The fact that a network name was appearing the spec before was an error caused by the daemon only rewriting one of the two fields the CLI inserted (another reason I'm not a fan of rewriting fields in the daemon). I propose that service inspect --pretty translate the IDs to names for display. I'll work on a separate PR to do this, since I feel it's out of scope for this one. Does that sound good?

Actually, I decided to add it to this PR, since I realized docker service inspect --pretty isn't showing useful network info without this change. PTAL.

Aaron Lehmann added 3 commits April 7, 2017 16:46
Resolve networks IDs on the client side.

Avoid filling in deprecated Spec.Networks field.

Sort networks in the TaskSpec for update stability.

Add an integration test for changing service networks.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This is the non-deprecated field, and the one that can be changed in a
service update.

Since old daemon versions don't allow migrating from one field to the
other, make this conditional on the API version.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the change-network-attachments branch from 061fcb8 to 56f1da7 Compare April 7, 2017 23:46
@thaJeztah

Copy link
Copy Markdown
Member

thanks! that satisfies my concerns; not able to review code right now, but if it has enough LGTM's, don't let me stop this

@thaJeztah thaJeztah left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

As a follow up, we could consider adding the names to the "non-pretty" case as well, for people that want to use --format to print names 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment