Exception while using the Rapid Library #37
Rapid is a consensus-based membership service. Once you bootstrap a cluster (size > 1), you need a majority of processes to agree to each membership change. In this case, you had size == 2 and then dropped to size == 1, so there is no majority anymore. Once you lose a majority, there is no safe way for Rapid to automatically recover. You will need some out-of-band mechanism (like an admin command) that invokes Cluster.start() and starts a new cluster. For your particular exercise, try creating 5 nodes and failing some of them.
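As a minimal sketch of such an out-of-band recovery path (assuming the Cluster.Builder join/start API visible in the stack traces below, plus Guava's HostAndPort; the fallback condition and exception handling here are illustrative, not part of Rapid):

import com.google.common.net.HostAndPort;
import com.vrg.rapid.Cluster;

public final class RecoverySketch {
    // Try to (re)join an existing cluster via the seed. If the join cannot complete
    // (for example because the remaining members no longer form a majority), fall back
    // to an operator-approved bootstrap of a brand new cluster via start().
    static Cluster joinOrBootstrap(final HostAndPort listenAddress,
                                   final HostAndPort seedAddress,
                                   final boolean operatorApprovedBootstrap) throws Exception {
        try {
            return new Cluster.Builder(listenAddress).join(seedAddress);
        } catch (final Exception e) {
            // Rapid cannot safely recover on its own once a majority is lost; starting a
            // fresh cluster must be an explicit, out-of-band decision (e.g. an admin command).
            if (operatorApprovedBootstrap) {
                return new Cluster.Builder(listenAddress).start();
            }
            throw e;
        }
    }
}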
Thank you Lalith.
Yes, you can (the membership structure is flat and independent of topology). You may want to consider plugging in your own messaging and failure detectors, though (see the examples/ folder and also look at IEdgeFailureDetectorFactory).
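For illustration only, the probe below is the kind of logic a custom edge failure detector might run; it is plain Java (a TCP reachability check with a small tolerance for transient blips), not the library's interface itself. Wiring something like it into Rapid goes through IEdgeFailureDetectorFactory and the Cluster.Builder; see the examples/ folder for the actual signatures.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe logic for a custom edge failure detector (names are illustrative).
final class TcpProbe implements Runnable {
    private final InetSocketAddress subject;  // the node this detector monitors
    private final Runnable onFailure;         // callback that reports the edge as faulty
    private int consecutiveFailures = 0;

    TcpProbe(final InetSocketAddress subject, final Runnable onFailure) {
        this.subject = subject;
        this.onFailure = onFailure;
    }

    @Override
    public void run() {                       // invoked periodically by a scheduler
        try (Socket socket = new Socket()) {
            socket.connect(subject, 1000);    // 1 second connect timeout
            consecutiveFailures = 0;          // reachable again: reset the counter
        } catch (final IOException e) {
            if (++consecutiveFailures >= 3) { // tolerate transient connection failures
                onFailure.run();              // report the failure to the membership layer
            }
        }
    }
}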
Hi Lalith,
I'm trying to use Rapid for membership information and am trying to execute the StandaloneAgent example provided in the examples directory. Below are the steps I'm executing.
Step 1: On Terminal 1, I executed the following command:
$>java -cp standalone-agent.jar:. com.test.StandaloneAgent --listenAddress 127.0.0.1:1234 --seedAddress 127.0.0.1:1234
The output I'm seeing is as follows:
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
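(For context, the seed-vs-joiner decision that produces this output looks roughly like the sketch below. This is only an approximation of what the StandaloneAgent example does; the addresses are the ones from the commands in this issue, and method names such as getMemberlist() should be checked against the actual Cluster API.)

import com.google.common.net.HostAndPort;
import com.vrg.rapid.Cluster;

public final class BootstrapSketch {
    public static void main(final String[] args) throws Exception {
        // A node whose listen address equals the seed address bootstraps the cluster;
        // every other node joins through the seed.
        final HostAndPort listenAddress = HostAndPort.fromString("127.0.0.1:1234");
        final HostAndPort seedAddress = HostAndPort.fromString("127.0.0.1:1234");

        final Cluster cluster = listenAddress.equals(seedAddress)
                ? new Cluster.Builder(listenAddress).start()            // seed: new cluster of size 1
                : new Cluster.Builder(listenAddress).join(seedAddress); // joiner: join via the seed

        // The periodic "cluster size" log lines above come from polling the membership view.
        System.out.println("Node " + listenAddress + " -- cluster size "
                + cluster.getMemberlist().size());
    }
}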
Step 2: I opened Terminal 2 and executed the following command:
$>java -cp standalone-agent.jar:. com.test.StandaloneAgent --listenAddress 127.0.0.1:1235 --seedAddress 127.0.0.1:1234
Output:
Terminal 2:
[main] INFO com.vrg.rapid.Cluster - 127.0.0.1:1235 is sending a join-p2 to 127.0.0.1:1234 for config 3713649891269931577
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1235 -- cluster size 2
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1235 -- cluster size 2
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1235 -- cluster size 2
and the output on Terminal 1 is as follows:
[protocol-127.0.0.1:1234-0] INFO com.vrg.rapid.MembershipService - Join at seed for {seed:127.0.0.1:1234, sender:127.0.0.1:1235, config:3713649891269931577, size:1}
[protocol-127.0.0.1:1234-0] INFO com.vrg.rapid.MembershipService - Proposing membership change of size 1
[protocol-127.0.0.1:1234-0] INFO com.test.StandaloneAgent - The condition detector has outputted a proposal: ClusterStatusChange{configurationId=3713649891269931577, membership=[hostname: "127.0.0.1"
port: 1234
], delta=[127.0.0.1:1235:UP:]}
[protocol-127.0.0.1:1234-0] INFO com.vrg.rapid.MembershipService - Decide view change called in current configuration 3713649891269931577 (1 nodes), for proposal [127.0.0.1:1235, ]
[protocol-127.0.0.1:1234-0] INFO com.test.StandaloneAgent - View change detected: ClusterStatusChange{configurationId=-4337195239393783641, membership=[hostname: "127.0.0.1"
port: 1235
, hostname: "127.0.0.1"
port: 1234
], delta=[127.0.0.1:1235:UP:]}
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 2
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 2
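(The "proposal" and "View change detected" lines above come from callbacks the agent registers for cluster events. Below is a minimal sketch of such subscriptions, assuming a registerSubscription-style API on Cluster as used in the examples; the exact event names and callback types may differ between Rapid versions.)

import com.vrg.rapid.Cluster;
import com.vrg.rapid.ClusterEvents;

final class Subscriptions {
    // Register callbacks that produce the "proposal" and "view change" log lines.
    static void register(final Cluster cluster) {
        cluster.registerSubscription(ClusterEvents.VIEW_CHANGE_PROPOSAL,
                change -> System.out.println("The condition detector has outputted a proposal: " + change));
        cluster.registerSubscription(ClusterEvents.VIEW_CHANGE,
                change -> System.out.println("View change detected: " + change));
    }
}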
Step 3:
I killed the process in Terminal 2 and moved back to Terminal 1. I'm getting the following exception in Terminal 1:
[bg-127.0.0.1:1234-0] ERROR com.vrg.rapid.messaging.impl.Retries - Retrying call to hostname: "127.0.0.1"
port: 1235
because of exception {}
io.grpc.StatusRuntimeException: UNAVAILABLE
at io.grpc.Status.asRuntimeException(Status.java:526)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:433)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:339)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:443)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:525)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:446)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:557)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:107)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:1235
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
[bg-127.0.0.1:1234-0] ERROR com.vrg.rapid.messaging.impl.Retries - Retrying call to hostname: "127.0.0.1"
port: 1235
because of exception {}
What I expected on Terminal 1 was a cluster size of 1. However, I'm getting the above exception.
Step 4:
On Terminal 2, I'm trying to restart the process that I killed in Step 3, and I'm getting the following exception:
[main] ERROR com.vrg.rapid.Cluster - Join message to seed 127.0.0.1:1234 returned an exception: {}
com.vrg.rapid.Cluster$JoinPhaseTwoException
at com.vrg.rapid.Cluster$Builder.joinAttempt(Cluster.java:400)
at com.vrg.rapid.Cluster$Builder.join(Cluster.java:315)
at com.vrg.rapid.Cluster$Builder.join(Cluster.java:294)
at com.test.StandaloneAgent.startCluster(StandaloneAgent.java:44)
at com.test.StandaloneAgent.main(StandaloneAgent.java:100)
[main] INFO com.vrg.rapid.Cluster - 127.0.0.1:1235 is sending a join-p2 to 127.0.0.1:1234 for config -1
What I expect here is that, after restarting the process, it will rejoin the cluster and the cluster size will be 2 again.
Is the behaviour in Step 3 and Step 4 expected? Am I using the membership library correctly?
Can you please help me here?
Thanks
Ravi