Update restore network fix for checkpointed container to latest bouch… #19

Open
wants to merge 5 commits into
base: cr-combined

@huikang huikang commented Oct 18, 2015

…er/docker/cr-combined.

Reuse the endpoint of the checkpointed container on restore.
Pass the veth pair name to criu when restoring a checkpointed container.

TODO: Add libnetwork API to retrieve ethXXX in the container

Signed-off-by: Hui Kang [email protected]

Saied Kazemi and others added 4 commits October 1, 2015 12:55
Methods for checkpointing and restoring containers were added to the
native driver.  The LXC driver returns an error message that these
methods are not implemented yet.

Signed-off-by: Saied Kazemi <[email protected]>

Conflicts:
	daemon/execdriver/native/create.go
	daemon/execdriver/native/driver.go
	daemon/execdriver/native/init.go

Conflicts:
	daemon/execdriver/driver.go
	daemon/execdriver/native/create.go
Support was added to the daemon to use the Checkpoint and Restore methods
of the native exec driver for checkpointing and restoring containers.

Signed-off-by: Saied Kazemi <[email protected]>

Conflicts:
	api/server/server.go
	daemon/container.go
	daemon/daemon.go
	daemon/networkdriver/bridge/driver.go
	daemon/state.go
	vendor/src/github.com/docker/libnetwork/ipallocator/allocator.go

Conflicts:
	api/server/server.go
- C/R is now an EXPERIMENTAL level feature.
- Requires CRIU 1.6 (and builds it from source in the Dockerfile)
- Introduces checkpoint and restore as top level cli methods (will likely change)

Signed-off-by: Ross Boucher <[email protected]>
…er/docker/cr-combined.

Reuse the endpoint of the checkpointed container on restore.
Pass the veth pair name to criu when restoring a checkpointed container.

TODO: Add libnetwork API to retrieve ethXXX in the container

Signed-off-by: Hui Kang <[email protected]>
@boucher
Owner

boucher commented Oct 18, 2015

I'll try this out as soon as I get a chance.

     for _, i := range criuOpts.VethPairs {
         veth := new(criurpc.CriuVethPair)
    -    veth.IfOut = proto.String(i.HostInterfaceName)
    +    veth.IfOut = proto.String(i.HostInterfaceName + "@docker0")
Owner

You should add the @docker0 in docker rather than here.
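
One way to read this suggestion is that the caller builds the full host-side name, bridge suffix included, so the lower layer can pass `HostInterfaceName` through unchanged. The sketch below is only an illustration under that assumption: the `VethPair` type, its `ContainerInterfaceName` field, and the helpers `withBridge` and `newVethPair` are hypothetical names, not code from this pull request.

```go
package main

import (
	"fmt"
	"strings"
)

// VethPair is a hypothetical stand-in for the veth pair description that
// the daemon hands to the exec driver on restore.
type VethPair struct {
	ContainerInterfaceName string // e.g. "eth0"
	HostInterfaceName      string // e.g. "veth1a2b3c@docker0"
}

// withBridge appends "@<bridge>" to the host-side veth name unless no bridge
// is given or the name already carries a bridge suffix.
func withBridge(hostIface, bridge string) string {
	if bridge == "" || strings.Contains(hostIface, "@") {
		return hostIface
	}
	return hostIface + "@" + bridge
}

// newVethPair builds the pair on the caller (Docker) side, so the code that
// talks to CRIU does not need to know which bridge is in use.
func newVethPair(containerIface, hostIface, bridge string) VethPair {
	return VethPair{
		ContainerInterfaceName: containerIface,
		HostInterfaceName:      withBridge(hostIface, bridge),
	}
}

func main() {
	p := newVethPair("eth0", "veth1a2b3c", "docker0")
	fmt.Printf("%s=%s\n", p.ContainerInterfaceName, p.HostInterfaceName)
}
```

Keeping the suffix logic in one helper also makes it harmless to call twice, since a name that already contains `@` is returned as-is.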

rename veth name in runconfig/restore.go

Signed-off-by: Hui Kang <[email protected]>
@huikang
Author

huikang commented Oct 18, 2015

@boucher Updated.
Looks like we need to work with libnetwork regarding the interface.

@boucher
Owner

boucher commented Oct 22, 2015

Sorry for the delay. This seems to be working for me locally. Unfortunately, I can't really merge it with the libnetwork change. We need to figure out a way to do this that they'll accept.

@boucher
Owner

boucher commented Oct 29, 2015

Unfortunately, I've rebased and this no longer applies again. container.NetworkSettings.EndpointID no longer appears to exist, and the releaseNetwork logic has been moved around quite a bit.

@boucher
Owner

boucher commented Oct 29, 2015

I've pushed an attempted update here, but it has some flaws:
https://github.com/boucher/docker/tree/huikang-network-fix-rebased

@huikang
Author

huikang commented Oct 29, 2015

I will look at it soon. Thanks.

@boucher boucher force-pushed the cr-combined branch 2 times, most recently from e91c518 to 988a915 Compare November 3, 2015 17:17
@boucher boucher force-pushed the cr-combined branch 2 times, most recently from b584b5a to a6a4511 Compare November 12, 2015 17:00
@boucher boucher force-pushed the cr-combined branch 4 times, most recently from 7c96921 to 7fda470 Compare December 5, 2015 01:35
@amakumar

I am trying to checkpoint and restore a container with an active TCP connection. For this I took the latest code from boucher's cr-combined branch and compiled it with the experimental flag enabled.

I have compiled and installed CRIU version 1.8.

I have a Docker image (a TCP server) containing code that listens on a TCP socket. I run client code that sends messages to the server and waits for the server's responses. The client runs on the same host as the containers.

When I issue a checkpoint, the client sends its message to the server and keeps waiting for the response.

Once the server container is restored (the same container, not a new one), the client is unable to send its message and exits. I also found that the eth0 interface of the restored container is not in the running state (the Docker bridge cannot be pinged from inside the container).

The issue does not occur if I run Docker with the --net=host option; checkpoint and restore of the TCP connection then works seamlessly.

Is this a known issue, and is there a workaround for it?

@boucher boucher force-pushed the cr-combined branch 5 times, most recently from 06cd8b9 to 9bb9ce0 Compare December 14, 2015 17:01
@boucher boucher force-pushed the cr-combined branch 3 times, most recently from d80f2fb to 9272300 Compare December 17, 2015 21:01
@hixichen

hixichen commented Jun 15, 2016

@amakumar, many thanks for your post.

I faced the same issue.
--net=host can bypass this issue.
I spent several days looking into this.
I found that the fd files are restored properly.

The number of processes matters

Here is the clue I found:

If you do not use --net=host,

you will get (ps auxf):
root 14333 0.0 3.0 1162612 30932 ? Sl Jun14 1:04 docker daemon
root 23262 0.0 1.8 121968 18712 ? Sl 22:23 0:00 _ docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9095 -container-ip 1
root 23285 2.8 4.2 412488 42964 ? Ssl 22:23 0:01 _ ./moped_server -j 2

Here we get TWO processes, but if you run with '--net=host' we get only ONE process.

Here is the difference.

https://criu.org/Inheriting_FDs_on_restore
With two processes, the container relies on FIFOs to connect them.

lsof -p [container process id]

myapp 23285 root 1w FIFO 0,9 0t0 262144 pipe
myapp 23285 root 2w FIFO 0,9 0t0 262145 pipe
myapp 23285 root 3u sock 0,8 0t0 260837 can't identify protocol

Maybe Docker's native checkpoint/restore does not support the inherit-fd operation, or does not handle it very well.
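
The same information that lsof prints can be gathered by reading the symlinks under `/proc/<pid>/fd`, which makes it easy to list the pipes and sockets a process holds before a checkpoint. The sketch below is only a diagnostic aid, not part of Docker or CRIU; `fdKind` and `listFDs` are hypothetical helper names, and the example inspects its own process so that it runs anywhere with a Linux `/proc`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// fdKind classifies the target of a /proc/<pid>/fd symlink.
// Pipes appear as "pipe:[inode]" and sockets as "socket:[inode]".
func fdKind(target string) string {
	switch {
	case strings.HasPrefix(target, "pipe:["):
		return "pipe"
	case strings.HasPrefix(target, "socket:["):
		return "socket"
	default:
		return "file"
	}
}

// listFDs maps each open descriptor number of pid to its symlink target.
func listFDs(pid int) (map[string]string, error) {
	dir := fmt.Sprintf("/proc/%d/fd", pid)
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	fds := make(map[string]string, len(entries))
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			continue // descriptor was closed while we were listing
		}
		fds[e.Name()] = target
	}
	return fds, nil
}

func main() {
	fds, err := listFDs(os.Getpid())
	if err != nil {
		fmt.Println("cannot list descriptors:", err)
		return
	}
	for fd, target := range fds {
		fmt.Printf("fd %s -> %s (%s)\n", fd, target, fdKind(target))
	}
}
```

Descriptors reported as pipes, such as fds 1 and 2 in the lsof output above, are the ones whose other end lives outside the container and would therefore be candidates for the mechanism described on the linked CRIU page.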
