
Docker integration #13

Closed
francisco-polaco opened this issue Jul 30, 2018 · 33 comments

@francisco-polaco

Hi all,

I was trying to launch some Rapid nodes using Docker, but every node is throwing an exception saying that it cannot join.
Basically, I compiled the project with Maven and Java 8 and copied the JAR files into a Docker image. From what I could inspect, the JARs contain all the needed dependencies.

I've attached a zip file with my Dockerfile, the container's start script, and a log from a previous attempt.
files.zip
Do you have any idea of what is causing this issue?

Thank you

@lalithsuresh
Owner

lalithsuresh commented Jul 30, 2018

Thanks for submitting an issue.

I looked over your script and don't see anything wrong with the java command you're running to start Rapid nodes.

Can you double check if the seed node is brought up before the others (the seed node has listenAddress == seedAddress)?
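For example, a minimal sketch with the standalone agent (addresses are placeholders; the seed simply points seedAddress at its own listenAddress):

java -jar examples/target/standalone-agent.jar --listenAddress <seed-ip:port> --seedAddress <seed-ip:port>    # seed node
java -jar examples/target/standalone-agent.jar --listenAddress <other-ip:port> --seedAddress <seed-ip:port>   # every other node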

@francisco-polaco
Author

I checked again and the seed node is launched immediately, while the other nodes wait 60 seconds before launching.
Also, the seed node has listenAddress == seedAddress but the issue remains.
PS: The containers are using an overlay network, but I guess this shouldn't be the problem.

@lalithsuresh
Owner

It shouldn't be an issue; I've run it on overlay networks before.

To rule out the overlay network, can you try bringing up the nodes outside of Docker on a single machine?

@francisco-polaco
Author

francisco-polaco commented Aug 1, 2018

Running 3 nodes on a single machine, as described in the README, results in every node getting stuck.
Or at least it seems that way; I'm not sure if it's a log4j problem.

Every node prints the following:

log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

@lalithsuresh
Owner

Yes, it can't find a log4j configuration file, and therefore does not print any information.

Here's an example log4j.configuration file you can use: https://gist.github.com/lalithsuresh/5abe5990e94c5e999550a107d67aa062
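For reference, a minimal console-only log4j.properties in that spirit looks roughly like this (a sketch, not necessarily identical to the gist):

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%-5p %d{dd-MMM-yyyy-HH:mm:ss,SSS} [%t] (%F:%L) - %m%n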

Then do:

java -Dlog4j.configuration=file:<path-to-log4j.properties-file> -jar examples/target/standalone-agent.jar --listenAddress <ip:port> --seedAddress <ip:port>

@francisco-polaco
Author

Thank you. I can run the 3-node example on a single machine. However, I cannot do it using Docker.
I've attached the log file.
pure.log

@lalithsuresh
Owner

Have you verified the firewall settings on the overlay network? What about MTU sizes? I'm guessing here because the logs don't have enough info for me to go on.
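For instance, a quick place to look is the overlay network's configured options (MTU and the like), with the network name as a placeholder:

$: docker network inspect <your-overlay-network>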

@lalithsuresh
Owner

lalithsuresh commented Aug 6, 2018

@francisco-polaco thinking further about your issue, I wonder if your overlay network is remapping ports (is there NAT involved)?

@francisco-polaco
Author

@lalithsuresh I checked the network configuration and I don't believe it's anything related to NAT.
However, forcing Java to use IPv4 (via the JVM flag -Djava.net.preferIPv4Stack=true) changed the results.
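For reference, the full launch command with that flag added looks like this (paths and addresses are placeholders, as in the earlier example):

java -Djava.net.preferIPv4Stack=true -Dlog4j.configuration=file:<path-to-log4j.properties-file> -jar examples/target/standalone-agent.jar --listenAddress <ip:port> --seedAddress <ip:port>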

Could you check whether this log represents the correct functioning of Rapid? I see a lot of exceptions, and the cluster size went from 1 to 8.
I'm also attaching a network description in the hope that you can spot whatever is causing all these exceptions.
log.zip

@lalithsuresh
Owner

It's reporting the right cluster size, but I suspect the default messaging layer we use (gRPC) is experiencing issues, judging from the errors you see. I'll dig deeper and get back to you.

@francisco-polaco
Author

Hey @lalithsuresh! Any progress regarding this issue?

@lalithsuresh
Owner

My suspicion is that the default timeouts are too low, which causes some gRPC messages to time out (during the join phase, because those messages take the longest). This then causes a bunch of late responses to show up at nodes, which produces the unknown-stream errors.

I usually set messaging timeouts on my own when running Rapid experiments. Can you please try the version here: https://github.com/lalithsuresh/rapid/tree/issue-13?
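Something along these lines should get you the right build (a sketch, assuming a fresh clone):

$: git clone https://github.com/lalithsuresh/rapid.git
$: cd rapid
$: git checkout issue-13
$: mvn clean package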

@francisco-polaco
Author

As before, a lot of deadline-related exceptions were thrown, but between the exceptions some nodes were added to the membership. I bootstrapped 10 nodes, but the cluster size is 8.
pure.log

@lalithsuresh
Owner

Yeah, a join timing out after 20 seconds is definitely pathological; something is wrong with gRPC's messaging between nodes here. Any instructions on how I can reproduce your Docker setup?

@lalithsuresh
Owner

I set up something using the Dockerfile you'd shared earlier and I can reproduce the behavior you're seeing. I'll investigate and keep you posted!

@lalithsuresh
Owner

lalithsuresh commented Sep 14, 2018

I take that back. I'm still not able to reproduce it. :)

Can you share your docker workflow here so I can try it out too?

This is what I tried and it works for me:

$: sudo docker network create --driver bridge isolated_nw1 --subnet 172.28.0.0/16

# In different terminals
$: docker run --network isolated_nw1 --ip 172.28.0.6 rapid-docker:latest
$: docker run --network isolated_nw1 --ip 172.28.0.7 rapid-docker:latest
$: docker run --network isolated_nw1 --ip 172.28.0.8 rapid-docker:latest
$: docker run --network isolated_nw1 --ip 172.28.0.9 rapid-docker:latest

Every terminal reports something like:

 INFO 14-Sep-2018-20:58:53,570 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:58:54,572 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:58:55,574 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:58:56,575 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:58:57,554 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:58:58,555 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:58:59,556 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:00,557 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:01,558 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:02,560 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:03,562 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:04,563 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:05,564 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:06,566 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:07,567 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4
 INFO 14-Sep-2018-20:59:08,569 [main] (StandaloneAgent.java:84) - Node 172.28.0.6:5001 -- cluster size 4

@francisco-polaco
Author

The network driver I am using is Overlay, creating a "closed" cluster.
Here you can find a network inspection
eptonet2.txt

@lalithsuresh
Owner

@francisco-polaco : can you share the commands you used to create the network and deploy the swarm?

@francisco-polaco
Author

In this case, the swarm only has one node, since I'm trying to run it on my local machine.
I'm using Swarm because my Python script is already set up to deploy the system in the cloud. The following commands are the bash equivalents of what the Python SDK does.

Swarm creation:
docker swarm init

Network creation:
docker network create --driver "overlay" --scope "swarm" --attachable --internal eptonet

Service launching:
docker service create --replicas 10 --name rapid rapid

Then use docker network connect to attach the containers to the created network. Keep in mind that the seed IP in the container's script may change.
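For each container created by the service, that step looks roughly like this (container names/IDs as reported by docker ps):

docker network connect eptonet <container-id>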

@lalithsuresh
Owner

lalithsuresh commented Sep 18, 2018

Still seems to work for me when I create a swarm according to your instructions: https://gist.github.com/lalithsuresh/40f57f402c14e9716ac0bfbe04810069

Note that some of the cancellations reported by gRPC are benign (for example, during the join phase, the joiner makes progress as soon as it hears from even one of its temporary observers -- the remaining responses get dropped).

If, however, you see a lot of DEADLINE_EXCEEDED messages, that's bad.
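A quick way to gauge that from a captured log (a trivial sketch; rapid.log is just a placeholder for wherever you redirected the node's output):

$: grep -c DEADLINE_EXCEEDED rapid.log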

@lalithsuresh
Owner

If it helps:

$: docker --version
Docker version 18.06.1-ce, build e68fc7a

@francisco-polaco
Author

I am running the same Docker version as you, and this time I ran the commands by hand.
There are a lot of DEADLINE_EXCEEDED messages and the cluster size is not right.
STDOUT: https://gist.github.com/francisco-polaco/1b31078a8d7613b6255530d784f12450 STDERR: https://gist.github.com/francisco-polaco/724626e7faf6ee733c4b012262d11f85
Could it be something related to my compiled JAR? Could you send me yours?

@lalithsuresh
Owner

You are building the right JAR, judging from the timeouts being printed. It seems like even the simplest RPCs (like the first join request message) are timing out in your trace, whereas that isn't the case in mine. I find this odd, given that the timeouts are high. That would explain not just the DEADLINE_EXCEEDED messages but also the "Received DATA frame for an unknown stream" errors in your STDERR (because the caller receives a response after the timeout).

Let's try increasing the timeouts one more time. Can you pull the latest version from the issue-13 branch? I've bumped up some of the timeouts to 5 seconds.

By the way, please note that the StandaloneAgent code terminates after 400 seconds -- which causes docker swarm to keep recreating the containers.

@francisco-polaco
Author

Again, the same behavior: https://gist.github.com/francisco-polaco/64137ef4cf07a8074dc81ce31353eb6f
The last file contains the commands I am using to set up the experiments.
To compile, I run mvn clean package with Oracle JDK 8.

@lalithsuresh
Owner

Your compilation is fine.

The previous command you'd shared with me was not using the overlay network. I'll retry with the new commands you shared and get back to you.

One question: how are you forcing swarm to create a container with the first node's IP?

@francisco-polaco
Author

I just delete and recreate the network, and the IPs are reset.
Then, if you launch, for example, 10 replicas, you know the replicas will have IPs between 172.28.0.2 and 172.28.0.11. Docker behaves this way: even if you launch more replicas after the initial ones, their IPs will always be consecutive.

So in container-start-script.sh you replace the variable FIRST_NODE_IP with something like 172.28.0.10, and that node will be the seed node. If you replace it with that IP, I think it should work like a charm :)
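One way to do that replacement, assuming FIRST_NODE_IP appears literally in the script:

sed -i 's/FIRST_NODE_IP/172.28.0.10/' container-start-script.sh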

@lalithsuresh
Owner

Thanks! I'm able to reproduce the effect now.

I tried two runs:

It works fine with this:

$: docker service create --replicas 10 --name rapid-docker --network teste rapid-docker:latest

But not when I use the CPU-limiting flag you had:

$: docker service create --replicas 10 --name rapid-docker --limit-cpu 0.1 --network teste rapid-docker:latest

Can you double check?

@lalithsuresh
Owner

Note, the command you ran likely limits the CPU utilization per container to 10% of a single core.
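For comparison, a less restrictive limit (say, a full core per container) would look like:

$: docker service create --replicas 10 --name rapid-docker --limit-cpu 1 --network teste rapid-docker:latest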

@francisco-polaco
Author

Yup, the culprit was the CPU limit. My Python script was also limiting the CPU.
I would never have figured out the cause of the problem.
I also tried the master branch and everything worked as well.
Thanks!! 😄

@lalithsuresh
Owner

Perfect. I'll close the issue now. Thanks for your patience!

@lalithsuresh
Owner

@francisco-polaco : Thanks to what you found, I ended up profiling Rapid to see why limiting CPU resources caused it to react that badly. One culprit was the excessive hash computations in MembershipView, which I've cut down almost entirely with some caching. Rerunning Rapid in a Docker container with --limit-cpu 0.10 still throws some gRPC exceptions (even with the higher timeouts on the issue-13 branch), but the cluster does reach the right size of 10 every time. I've cherry-picked that commit to the master branch.

@francisco-polaco
Author

Hashes can be quite heavy when computed repeatedly. I had the same problem with some software I was implementing.
I'm glad you found the cause of the CPU usage :) Nevertheless, I am still experimenting with Rapid, and the same limits with more nodes still cause gRPC exceptions, without the cluster reaching the right size :(

@lalithsuresh
Owner

@francisco-polaco Yes, I still see them too. My patch doesn't fully eliminate that issue. The bigger problem is that there are multiple threads, mainly from gRPC (which is based on Netty and polls continuously for messages to send/receive). That's not good when CPU resources are limited. :)
