This library enables two or more Azure IoT Edge Modules to communicate across the network and establish an Active / Passive relationship, in which only the Active Module processes tasks. It is modeled after the Microsoft Cluster Server (https://en.wikipedia.org/wiki/Microsoft_Cluster_Server) architecture to address resiliency and high availability. Once the library is initialized in a C# Azure IoT Edge Module, synchronous and asynchronous methods report whether the module is in the Active state and should perform work. With configurable millisecond ‘heartbeat’ intervals and retry counts, failover can be tuned to the sub-second level. Unlike Microsoft Cluster Server, this library does not require dedicated hardware (network or storage), matching hardware, matching operating systems or dedicated networks.
After including the library and creating a single IoTEdgeModuleHA object, simply call the ActiveAsync() or the Active() method to determine state and perform the workload. The simplified code sample adds just 3 lines (13, 55 and 60) to the default template; the steps are:
- Copy the IoTEdgeHA.dll file to your project.
- Add the following to your “.csproj” file:
<ItemGroup> <Reference Include="IoTEdgeModuleHA"> <HintPath>IoTEdgeHA.dll</HintPath> </Reference> </ItemGroup>
- Add a “using IoTEdgeModuleHA;” to your “.cs” file
- After the “ioTHubModuleClient.OpenAsync()” line add “IoTEdgeModuleHA IoTEdgeModuleHA = new IoTEdgeModuleHA(ioTHubModuleClient, udpPort:2000, broadcastSubnet:"192.168.15.0");” to your “.cs” file
- In your normal loop in IoT Edge, add “await IoTEdgeModuleHA.ActiveAsync();”, which pauses the loop while the module is not Active
- In your deployment template or via the Azure Portal, add "createOptions": "{\"ExposedPorts\":{\"2000/udp\":{}},\"HostConfig\":{\"PortBindings\":{\"2000/udp\":[{\"HostPort\":\"2000\"}]}}}" to expose UDP port 2000.
The IoTEdgeModuleHA object requires a ModuleClient (or DeviceClient) for initialization and can optionally be passed the following parameters to fine-tune CPU usage and recovery time. These parameters can either be provided when creating the IoTEdgeModuleHA object or passed as desired properties in the module TWIN.
Parameter | Type (Default) | Notes |
---|---|---|
udpPort | Integer (60000) | The UDP port on which IoTEdgeModuleHA sends and receives messages. NOTE: the IoT Edge Module must be configured, via createOptions, so that the host listens on this port on behalf of the module. |
broadcastSubnet | String (“192.168.15.0”) | Because the IoT Edge Module runs in a container on a different network, IoTEdgeModuleHA needs to know the external network that the other IoT Edge systems are running on. This network is assumed to be a /24 network (24-bit mask, 255.255.255.0). |
probeIntervalMS | Integer (200) | How often in milliseconds UDP packets are sent on the network. |
failedProbeCount | Integer (3) | The number of probes, each expected every probeIntervalMS, that may be missed before the Active host is considered down and an election is forced. |
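To make the /24 assumption for broadcastSubnet concrete, here is a minimal sketch that derives the broadcast address of such a network. The class and method names are hypothetical, and it is an assumption (not confirmed by the library source shown here) that probes are sent to this address.

```csharp
using System;
using System.Net;

public static class SubnetSketch
{
    // Hypothetical helper: returns the directed broadcast address of a /24
    // network, e.g. "192.168.15.0" -> 192.168.15.255.
    public static IPAddress BroadcastFor24(string subnet)
    {
        byte[] octets = IPAddress.Parse(subnet).GetAddressBytes();
        if (octets.Length != 4)
            throw new ArgumentException("Expected an IPv4 address.", nameof(subnet));

        // With a 255.255.255.0 mask only the last octet holds host bits;
        // setting them all to 1 yields the broadcast address.
        octets[3] = 255;
        return new IPAddress(octets);
    }
}
```

If the IoT Edge hosts sit on a network with a different mask, this calculation does not apply and the /24 assumption in the table is the limiting factor.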
Using the desired TWIN of the IoT Edge Module, the same parameters can be passed as shown below:
{
  "properties": {
    "desired": {
      "IoTEdgeModuleHA": {
        "probeIntervalMS": 1000,
        "failedProbeCount": 3,
        "udpPort": 2000,
        "broadcastSubnet": "192.168.15.0"
      }
    },
    "reported": {
    }
  },
  "tags": {}
}
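As an illustration of how desired properties can override the defaults, the following sketch reads the IoTEdgeModuleHA section from raw twin JSON and falls back to the table defaults for anything missing. It works on the JSON text with System.Text.Json rather than the SDK's twin types, and HaSettings and HaSettingsReader are hypothetical names, not part of the library.

```csharp
using System;
using System.Text.Json;

// Hypothetical settings holder; property names mirror the parameter table.
public sealed record HaSettings(int UdpPort, string BroadcastSubnet, int ProbeIntervalMs, int FailedProbeCount);

public static class HaSettingsReader
{
    // Defaults as documented in the parameter table.
    public static readonly HaSettings Defaults = new(60000, "192.168.15.0", 200, 3);

    public static HaSettings FromTwinJson(string twinJson)
    {
        using JsonDocument doc = JsonDocument.Parse(twinJson);

        // Walk properties -> desired -> IoTEdgeModuleHA; use defaults if any level is absent.
        if (!doc.RootElement.TryGetProperty("properties", out JsonElement props) ||
            !props.TryGetProperty("desired", out JsonElement desired) ||
            !desired.TryGetProperty("IoTEdgeModuleHA", out JsonElement ha))
        {
            return Defaults;
        }

        string subnet =
            ha.TryGetProperty("broadcastSubnet", out JsonElement s) && s.ValueKind == JsonValueKind.String
                ? s.GetString()!
                : Defaults.BroadcastSubnet;

        return new HaSettings(
            IntOr(ha, "udpPort", Defaults.UdpPort),
            subnet,
            IntOr(ha, "probeIntervalMS", Defaults.ProbeIntervalMs),
            IntOr(ha, "failedProbeCount", Defaults.FailedProbeCount));
    }

    // Returns the named integer property, or the fallback when it is missing or not a number.
    private static int IntOr(JsonElement parent, string name, int fallback) =>
        parent.TryGetProperty(name, out JsonElement v) &&
        v.ValueKind == JsonValueKind.Number &&
        v.TryGetInt32(out int n)
            ? n
            : fallback;
}
```

Calling HaSettingsReader.FromTwinJson with the twin above would yield a 1000 ms probe interval, 3 failed probes, port 2000 and the 192.168.15.0 subnet.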
The following reported TWINs show the state of the IoTEdgeModuleHA. In addition to the desired properties there are 4 additional properties:
Property | Value | Notes |
---|---|---|
isActive | true or false | Indicates if this IoTEdgeModuleHA is in active state |
bootTimeEPOCH | Integer (Unix epoch seconds) | When the module started, used in election criteria |
lastElection | Date | When the last election happened |
peers | Delimited string | Lists all peers as semicolon-separated entries of the form IoTEdgeGatewayId\|isActive\|bootTimeEPOCH\|lastSeen |
{
  "properties": {
    "desired": {
    },
    "reported": {
      "IoTEdgeModuleHA": {
        "isActive": false,
        "bootTimeEPOCH": 1592424685,
        "lastElection": "2020-06-17T19:07:27.1922825Z",
        "peers": "node2|True|1592419489|06/17/2020 20:11:25;node1|False|1592424685|06/17/2020 20:11:26;",
        "udpPort": 2000,
        "broadcastSubnet": "192.168.15.0",
        "probeIntervalMS": 1000,
        "failedProbeCount": 3
      }
    }
  },
  "tags": {}
}
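For tooling that monitors the reported TWIN, the peers string can be split into structured entries. The sketch below is based only on the sample value above: it assumes entries end with “;”, fields are separated by “|”, and lastSeen uses the month/day/year 24-hour layout seen in the sample. PeerInfo and PeerParser are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical record for one entry of the reported "peers" string.
public sealed record PeerInfo(string GatewayId, bool IsActive, long BootTimeEpoch, DateTime LastSeen);

public static class PeerParser
{
    public static List<PeerInfo> Parse(string peers)
    {
        var result = new List<PeerInfo>();

        // Entries are terminated by ';', so drop the empty trailing piece.
        foreach (string entry in peers.Split(';', StringSplitOptions.RemoveEmptyEntries))
        {
            string[] fields = entry.Split('|');
            if (fields.Length != 4)
                continue; // skip entries that do not match the assumed layout

            result.Add(new PeerInfo(
                fields[0],
                bool.Parse(fields[1]), // accepts "True" / "False"
                long.Parse(fields[2], CultureInfo.InvariantCulture),
                // Assumed format, taken from the sample value; time zone is not stated.
                DateTime.ParseExact(fields[3], "MM/dd/yyyy HH:mm:ss", CultureInfo.InvariantCulture)));
        }

        return result;
    }
}
```

Applied to the sample, this yields two peers, with node2 marked active and node1 passive.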
Adding the following createOptions will expose the UDP port when deploying:
"modules": {
  "hamodule": {
    "version": "1.0",
    "type": "docker",
    "status": "running",
    "restartPolicy": "always",
    "settings": {
      "image": "${MODULES.hamodule}",
      "createOptions": "{\"ExposedPorts\":{\"2000/udp\":{}},\"HostConfig\":{\"PortBindings\":{\"2000/udp\":[{\"HostPort\":\"2000\"}]}}}"
    }
  }
}
When a node does not receive probes from the active host for failedProbeCount probe intervals, it assumes the active role (isActive becomes true) and, if other nodes are online, goes into election mode.
When more than one node claims to be “Active”, an election is forced. The election is decided by 1) the highest bootTimeEPOCH and 2) if there is a tie in bootTimeEPOCH, the gatewayDeviceID that comes first alphabetically (“a0001” wins over “b0001”).
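The election rule can be sketched as an ordering over the nodes that claim to be Active. This is an illustration of the rule as described here, not the library's actual implementation; Candidate and ElectionSketch are hypothetical names, and ordinal string comparison is an assumption about how IDs are compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical view of a node taking part in an election.
public sealed record Candidate(string GatewayDeviceId, long BootTimeEpoch);

public static class ElectionSketch
{
    // Picks the winner: highest bootTimeEPOCH first; on a tie, the ID that
    // sorts first alphabetically ("a0001" beats "b0001").
    public static Candidate Elect(IEnumerable<Candidate> claimants) =>
        claimants
            .OrderByDescending(c => c.BootTimeEpoch)
            .ThenBy(c => c.GatewayDeviceId, StringComparer.Ordinal)
            .First();
}
```

Because every node can apply the same deterministic ordering to the same peer list, all nodes arrive at the same winner without further coordination.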
Configuring the failedProbeCount too low can cause failovers to happen under CPU load. Testing and tuning should be considered based on need.
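As a rough guide when tuning, the time needed to notice a failed Active host is approximately probeIntervalMS multiplied by failedProbeCount. The helper below is a hypothetical sketch of that arithmetic; real detection time also depends on scheduling and network delay, which it ignores.

```csharp
using System;

public static class FailoverTiming
{
    // Approximate time before a silent Active host is considered down:
    // failedProbeCount missed probes, each expected every probeIntervalMs.
    public static TimeSpan DetectionWindow(int probeIntervalMs, int failedProbeCount)
    {
        if (probeIntervalMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(probeIntervalMs));
        if (failedProbeCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(failedProbeCount));

        return TimeSpan.FromMilliseconds((double)probeIntervalMs * failedProbeCount);
    }
}
```

With the defaults (200 ms, 3 probes) the window is about 600 ms; with the TWIN example values (1000 ms, 3 probes) it is about 3 seconds.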
Stopping a host or module, or a failure of either, can force an election in less than a second.
This project contains all source code.