
Exception while using the Rapid Library #37

Closed
raviov opened this issue Jul 28, 2021 · 3 comments
@raviov
raviov commented Jul 28, 2021

Hi Lalith,

I'm trying to use Rapid for membership information. Trying to execute the StandaloneAgent example provided in the examples directory. Below are the steps I'm executing.

Step 1: On Terminal 1, I executed the following command:

$>java -cp standalone-agent.jar:. com.test.StandaloneAgent --listenAddress 127.0.0.1:1234 --seedAddress 127.0.0.1:1234

The output I'm seeing is as follows:

[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 1

Step 2: I opened Terminal 2 and executed the following command:

$>java -cp standalone-agent.jar:. com.test.StandaloneAgent --listenAddress 127.0.0.1:1235 --seedAddress 127.0.0.1:1234

Output:

The output on Terminal 2 (the logs below are from node 127.0.0.1:1235, which runs there):

[main] INFO com.vrg.rapid.Cluster - 127.0.0.1:1235 is sending a join-p2 to 127.0.0.1:1234 for config 3713649891269931577
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1235 -- cluster size 2
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1235 -- cluster size 2
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1235 -- cluster size 2

and the output on Terminal 1 (the seed, 127.0.0.1:1234) is as follows:

[protocol-127.0.0.1:1234-0] INFO com.vrg.rapid.MembershipService - Join at seed for {seed:127.0.0.1:1234, sender:127.0.0.1:1235, config:3713649891269931577, size:1}
[protocol-127.0.0.1:1234-0] INFO com.vrg.rapid.MembershipService - Proposing membership change of size 1
[protocol-127.0.0.1:1234-0] INFO com.test.StandaloneAgent - The condition detector has outputted a proposal: ClusterStatusChange{configurationId=3713649891269931577, membership=[hostname: "127.0.0.1"
port: 1234
], delta=[127.0.0.1:1235:UP:]}
[protocol-127.0.0.1:1234-0] INFO com.vrg.rapid.MembershipService - Decide view change called in current configuration 3713649891269931577 (1 nodes), for proposal [127.0.0.1:1235, ]
[protocol-127.0.0.1:1234-0] INFO com.test.StandaloneAgent - View change detected: ClusterStatusChange{configurationId=-4337195239393783641, membership=[hostname: "127.0.0.1"
port: 1235
, hostname: "127.0.0.1"
port: 1234
], delta=[127.0.0.1:1235:UP:]}
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 2
[main] INFO com.test.StandaloneAgent - Node 127.0.0.1:1234 -- cluster size 2

Step 3: I killed the process in Terminal 2 and moved back to Terminal 1, where I'm getting the following exception:

[bg-127.0.0.1:1234-0] ERROR com.vrg.rapid.messaging.impl.Retries - Retrying call to hostname: "127.0.0.1"
port: 1235
because of exception {}
io.grpc.StatusRuntimeException: UNAVAILABLE
at io.grpc.Status.asRuntimeException(Status.java:526)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:433)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:339)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:443)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:525)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:446)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:557)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:107)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:1235
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
[bg-127.0.0.1:1234-0] ERROR com.vrg.rapid.messaging.impl.Retries - Retrying call to hostname: "127.0.0.1"
port: 1235
because of exception {}

What I expected on Terminal 1 was a cluster size of 1. However, I'm getting the above exception.

Step 4: On Terminal 2, I tried to restart the process that I killed in Step 3 and got the following exception:

[main] ERROR com.vrg.rapid.Cluster - Join message to seed 127.0.0.1:1234 returned an exception: {}
com.vrg.rapid.Cluster$JoinPhaseTwoException
at com.vrg.rapid.Cluster$Builder.joinAttempt(Cluster.java:400)
at com.vrg.rapid.Cluster$Builder.join(Cluster.java:315)
at com.vrg.rapid.Cluster$Builder.join(Cluster.java:294)
at com.test.StandaloneAgent.startCluster(StandaloneAgent.java:44)
at com.test.StandaloneAgent.main(StandaloneAgent.java:100)
[main] INFO com.vrg.rapid.Cluster - 127.0.0.1:1235 is sending a join-p2 to 127.0.0.1:1234 for config -1

What I expected here is that, after restarting, the process would rejoin the cluster and the cluster size would be 2 again.

Is the behaviour in Steps 3 and 4 expected? Am I using the membership library correctly?
Can you please help me here?

Thanks
Ravi

@lalithsuresh
Owner

Rapid is a consensus-based membership service. Once you bootstrap a cluster (size > 1), you need a majority of processes to agree to each membership change. In this case, you had size == 2 and then dropped to size == 1, so there is no majority anymore.

Once you lose a majority, there is no safe way for Rapid to automatically recover. You will need some out-of-band mechanism (like an admin command) that invokes Cluster.start() and starts a new cluster.

For your particular exercise, try creating 5 nodes and then failing some of them.
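The majority rule above can be made concrete with some quick arithmetic, independent of Rapid's API: a strict majority of n nodes is floor(n/2) + 1, so a 2-node cluster cannot survive the loss of either node, while a 5-node cluster tolerates 2 failures. A minimal sketch in plain Java (the class and method names here are made up for illustration):

```java
public class QuorumMath {
    // Smallest strict majority of a cluster of size n: floor(n/2) + 1.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // Maximum number of simultaneous failures a cluster of size n can
    // tolerate while still retaining a majority: n - majority(n).
    static int tolerableFailures(int n) {
        return n - majority(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 5}) {
            System.out.println("size " + n
                    + ": majority = " + majority(n)
                    + ", tolerates " + tolerableFailures(n) + " failure(s)");
        }
    }
}
```

For size 2 the majority is 2 and zero failures are tolerable, which is exactly the situation in Step 3; with 5 nodes (majority 3), killing one or two nodes still lets the remaining members agree on the view change.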

@raviov
Author

raviov commented Jul 29, 2021

Thank you, Lalith.
One more question: can we use nodes/machines that belong to multiple data centres to form a cluster?

@lalithsuresh
Owner

lalithsuresh commented Jul 29, 2021

Yes, you can (the membership structure is flat and independent of topology).

You may want to consider plugging in your own messaging and failure detectors though (see the examples/ folder and also look at IEdgeFailureDetectorFactory).
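To give a feel for the kind of component such a pluggable failure detector is, here is a toy heartbeat-based edge detector in plain Java. It does not implement Rapid's actual IEdgeFailureDetectorFactory interface (the class and method names below are hypothetical); it only sketches the general idea of marking an edge faulty after K consecutive missed heartbeats, which is the sort of policy you might tune for cross-data-centre links.

```java
// Toy heartbeat-based edge failure detector. This does NOT implement
// Rapid's IEdgeFailureDetectorFactory; the names here are made up,
// and it only illustrates the general shape of such a detector.
public class MissedHeartbeatDetector {
    private final int threshold;  // consecutive misses before declaring failure
    private int misses = 0;
    private boolean faulty = false;

    public MissedHeartbeatDetector(int threshold) {
        this.threshold = threshold;
    }

    // Called once per probe interval with whether a heartbeat arrived.
    public void onProbe(boolean heartbeatReceived) {
        if (heartbeatReceived) {
            misses = 0;           // any heartbeat resets the counter
        } else if (++misses >= threshold) {
            faulty = true;        // too many consecutive misses: edge is down
        }
    }

    public boolean isFaulty() {
        return faulty;
    }
}
```

For WAN links you would typically raise the threshold (or the probe interval) relative to a LAN deployment, so that transient cross-data-centre latency spikes are not misreported as node failures.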
