Skip to content

Commit

Permalink
gpexpand : Fix multi port issue when clusters are not in balanced sta…
Browse files Browse the repository at this point in the history
…te (#13822)

Why : In previous PR 13757 where the plan was to check the health of the cluster before starting the expansion in gpexpand. while backporting in 6x we realised that hard exit after the health check will create problem for the customer when due to some reason they will not able to make the cluster up in balance state. this will block customer to expand.

Issue : we found the root cause as when the cluster is not in balance state then the max/min port is returning wrong port values and as result system is expanding at the wrong port which is giving multi port error.

Fix: updated the function to work in imbalance cluster state scenario. we are collecting all the port for primary and mirror based on preferred role of the segment, and if the role is switched we are considering segmentPairs.mirrorDB for primary port and segmentPairs.primaryDB for mirror port. in this we are collecting the port which is based on the preferred role.

another change in the pr is, instead of error exit in imbalance scenario we are just logging the warning.

Added unit test cases for the positive/negative and exception scenario for the newly added functions. Updated feature test case where we are converting error to warning.
  • Loading branch information
shrakesh authored and reshke committed Feb 25, 2025
1 parent 3b8cbd8 commit e9e45d2
Show file tree
Hide file tree
Showing 5 changed files with 188 additions and 45 deletions.
18 changes: 9 additions & 9 deletions gpMgmt/bin/gpexpand
Original file line number Diff line number Diff line change
Expand Up @@ -261,12 +261,13 @@ def gpexpand_status_file_exists(coordinator_data_directory):


def is_cluster_up_and_balanced(dburl):
conn = dbconn.connect(dburl, encoding='UTF8')
count = -1
sql = "select count(*) from gp_segment_configuration where status <> 'u' or preferred_role <> role;"
try:
count = dbconn.querySingleton(conn,
"select count(*) from gp_segment_configuration where status <> 'u' or preferred_role <> role;")
finally:
conn.close()
with closing(dbconn.connect(dburl, encoding='UTF8')) as conn:
count = dbconn.querySingleton(conn, sql)
except Exception as e:
raise Exception("failed to query cluster role check: %s" % str(e))

return count == 0

Expand Down Expand Up @@ -2487,10 +2488,9 @@ def main(options, args, parser):
logger.error(e)
sys.exit(1)

# check if the cluster is in good health if not exit from gpexpand
if gpexpand_file_status is None and gpexpand_db_status is None and not is_cluster_up_and_balanced(dburl):
logger.error('One or more segments are either down or not in preferred role. Please fix the issue before running gpexpand.')
sys.exit(1)
# check if the cluster is in good health
if gpexpand_file_status is None and gpexpand_db_status is None and not is_cluster_up_and_balanced(dburl):
logger.warning('One or more segments are either down or not in preferred role.')

if gpexpand_db_status == 'SETUP DONE' or gpexpand_db_status == 'EXPANSION STOPPED':
if not _gp_expand.validate_max_connections():
Expand Down
74 changes: 43 additions & 31 deletions gpMgmt/bin/gppylib/gparray.py
Original file line number Diff line number Diff line change
Expand Up @@ -1290,54 +1290,66 @@ def get_segment_count(self):
return len(self.segmentPairs)

# --------------------------------------------------------------------
def get_min_primary_port(self):
"""Returns the minimum primary segment db port"""
min_primary_port = self.segmentPairs[0].primaryDB.port
def get_primary_port_list(self):
primary_ports = []
for segPair in self.segmentPairs:
if segPair.primaryDB.port < min_primary_port:
min_primary_port = segPair.primaryDB.port
primary = segPair.primaryDB
mirror = segPair.mirrorDB

if primary.preferred_role == primary.role:
primary_ports.append(primary.port)
else:
primary_ports.append(mirror.port)

return min_primary_port
if len(primary_ports) == 0:
raise Exception("No primary ports found in array.")

return primary_ports
# --------------------------------------------------------------------
def get_max_primary_port(self):
"""Returns the maximum primary segment db port"""
max_primary_port = self.segmentPairs[0].primaryDB.port
for segPair in self.segmentPairs:
if segPair.primaryDB.port > max_primary_port:
max_primary_port = segPair.primaryDB.port

return max_primary_port
def get_mirror_port_list(self):

# --------------------------------------------------------------------
def get_min_mirror_port(self):
"""Returns the minimum mirror segment db port"""
if self.get_mirroring_enabled() is False:
raise Exception('Mirroring is not enabled')

min_mirror_port = self.segmentPairs[0].mirrorDB.port

mirror_ports = []
for segPair in self.segmentPairs:
primary = segPair.primaryDB
mirror = segPair.mirrorDB
if mirror and mirror.port < min_mirror_port:
min_mirror_port = mirror.port

return min_mirror_port
if mirror.preferred_role == mirror.role:
mirror_ports.append(mirror.port)
else:
mirror_ports.append(primary.port)

if len(mirror_ports) == 0:
raise Exception("No mirror ports found in array.")

return mirror_ports

# --------------------------------------------------------------------
def get_max_mirror_port(self):
"""Returns the maximum mirror segment db port"""
if self.get_mirroring_enabled() is False:
raise Exception('Mirroring is not enabled')
def get_min_primary_port(self):
"""Returns the minimum primary segment db port"""
primary_ports = self.get_primary_port_list()
return min(primary_ports)

max_mirror_port = self.segmentPairs[0].mirrorDB.port
# --------------------------------------------------------------------
def get_max_primary_port(self):
"""Returns the maximum primary segment db port"""
primary_ports = self.get_primary_port_list()
return max(primary_ports)

for segPair in self.segmentPairs:
mirror = segPair.mirrorDB
if mirror and mirror.port > max_mirror_port:
max_mirror_port = mirror.port
# --------------------------------------------------------------------
def get_min_mirror_port(self):
"""Returns the minimum mirror segment db port"""
mirror_ports = self.get_mirror_port_list()
return min(mirror_ports)

return max_mirror_port
# --------------------------------------------------------------------
def get_max_mirror_port(self):
"""Returns the maximum mirror segment db port"""
mirror_ports = self.get_mirror_port_list()
return max(mirror_ports)

# --------------------------------------------------------------------
def get_interface_numbers(self):
Expand Down
121 changes: 121 additions & 0 deletions gpMgmt/bin/gppylib/test/unit/test_unit_gparray.py
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,127 @@ def test_initFromCatalog_mismatched_versions(self, mock_connect, mock_query):
with self.assertRaisesRegex(Exception, 'Cannot connect to GPDB version 5 from installed version 7'):
GpArray.initFromCatalog(None)

def test_get_min_primary_port_when_cluster_balanced(self):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
expected = 6000
actual = GpArray.get_min_primary_port(self.gpArray)
self.assertEqual(expected, actual)

def test_get_min_primary_port_when_cluster_not_balanced(self):
self.gpArray = self._createUnBalancedGpArrayWith2Primary2Mirrors()
expected = 6000
actual = GpArray.get_min_primary_port(self.gpArray)
self.assertEqual(expected, actual)

def test_get_max_primary_port_when_cluster_balanced(self):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
expected = 6001
actual = GpArray.get_max_primary_port(self.gpArray)
self.assertEqual(expected, actual)

def test_get_max_primary_port_when_cluster_not_balanced(self):
self.gpArray = self._createUnBalancedGpArrayWith2Primary2Mirrors()
expected = 6001
actual = GpArray.get_max_primary_port(self.gpArray)
self.assertEqual(expected, actual)

def test_get_min_mirror_port_when_cluster_balanced(self):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
expected = 7000
actual = GpArray.get_min_mirror_port(self.gpArray)
self.assertEqual(expected, actual)

def test_get_min_mirror_port_when_cluster_not_balanced(self):
self.gpArray = self._createUnBalancedGpArrayWith2Primary2Mirrors()
expected = 7000
actual = GpArray.get_min_mirror_port(self.gpArray)
self.assertEqual(expected, actual)

def test_get_max_mirror_port_when_cluster_balanced(self):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
expected = 7001
actual = GpArray.get_max_mirror_port(self.gpArray)
self.assertEqual(expected, actual)

def test_get_max_mirror_port_when_cluster_not_balanced(self):
self.gpArray = self._createUnBalancedGpArrayWith2Primary2Mirrors()
expected = 7001
actual = GpArray.get_max_mirror_port(self.gpArray)
self.assertEqual(expected, actual)

def test_get_primary_port_list_when_cluster_balanced(self):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
expected = [6000, 6001]
actual = GpArray.get_primary_port_list(self.gpArray)
self.assertEqual(expected, actual)

def test_get_primary_port_list_when_cluster_not_balanced(self):
self.gpArray = self._createUnBalancedGpArrayWith2Primary2Mirrors()
expected = [6000, 6001]
actual = GpArray.get_primary_port_list(self.gpArray)
self.assertEqual(expected, actual)

def test_get_primary_port_list_len_exception(self):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
self.gpArray.segmentPairs = []
with self.assertRaisesRegex(Exception, 'No primary ports found in array.'):
GpArray.get_primary_port_list(self.gpArray)

def test_get_mirror_port_list_when_cluster_balanced(self):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
expected = [7000, 7001]
actual = GpArray.get_mirror_port_list(self.gpArray)
self.assertEqual(expected, actual)

def test_get_mirror_port_list_when_cluster_not_balanced(self):
self.gpArray = self._createUnBalancedGpArrayWith2Primary2Mirrors()
expected = [7000, 7001]
actual = GpArray.get_mirror_port_list(self.gpArray)
self.assertEqual(expected, actual)

@patch('gppylib.gparray.GpArray.get_mirroring_enabled', return_value=False)
def test_get_mirror_port_list_no_mirror_exception(self,mock1):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
with self.assertRaisesRegex(Exception, 'Mirroring is not enabled'):
GpArray.get_mirror_port_list(self.gpArray)

@patch('gppylib.gparray.GpArray.get_mirroring_enabled', return_value=True)
def test_get_mirror_port_list_len_exception(self, mock1):
self.gpArray = self._createBalancedGpArrayWith2Primary2Mirrors()
self.gpArray.segmentPairs = []
with self.assertRaisesRegex(Exception, 'No mirror ports found in array.'):
GpArray.get_mirror_port_list(self.gpArray)

def _createBalancedGpArrayWith2Primary2Mirrors(self):
self.coordinator = Segment.initFromString(
"1|-1|p|p|s|u|cdw|cdw|6432|/data/coordinator")
self.primary0 = Segment.initFromString(
"2|0|p|p|s|u|sdw1|sdw1|6000|/data/primary0")
self.primary1 = Segment.initFromString(
"3|1|p|p|s|u|sdw2|sdw2|6001|/data/primary1")
self.mirror0 = Segment.initFromString(
"4|0|m|m|s|u|sdw2|sdw2|7000|/data/mirror0")
self.mirror1 = Segment.initFromString(
"5|1|m|m|s|u|sdw1|sdw1|7001|/data/mirror1")
self.standby = Segment.initFromString(
"6|-1|m|m|s|u|sdw3|sdw3|6432|/data/standby")
return GpArray([self.coordinator, self.standby, self.primary0, self.primary1, self.mirror0, self.mirror1])

def _createUnBalancedGpArrayWith2Primary2Mirrors(self):
self.coordinator = Segment.initFromString(
"1|-1|p|p|s|u|cdw|cdw|6432|/data/coordinator")
self.primary0 = Segment.initFromString(
"2|0|m|p|s|u|sdw1|sdw1|6000|/data/primary0")
self.primary1 = Segment.initFromString(
"3|1|m|p|s|u|sdw2|sdw2|6001|/data/primary1")
self.mirror0 = Segment.initFromString(
"4|0|p|m|s|u|sdw2|sdw2|7000|/data/mirror0")
self.mirror1 = Segment.initFromString(
"5|1|p|m|s|u|sdw1|sdw1|7001|/data/mirror1")
self.standby = Segment.initFromString(
"6|-1|m|m|s|u|sdw3|sdw3|6432|/data/standby")
return GpArray([self.coordinator, self.standby, self.primary0, self.primary1, self.mirror0, self.mirror1])

def convert_bool(val):
if type(val) is bool:
return val
Expand Down
10 changes: 9 additions & 1 deletion gpMgmt/bin/gppylib/test/unit/test_unit_gpexpand.py
Original file line number Diff line number Diff line change
Expand Up @@ -143,9 +143,17 @@ def test_unit_cluster_up_and_balanced_false(self, mock):

@patch('gppylib.db.dbconn.querySingleton', side_effect=Exception())
def test_unit_cluster_up_and_balanced_exception(self, mock1):
with self.assertRaises(Exception):
with self.assertRaises(Exception) as ex:
self.subject.is_cluster_up_and_balanced(dbconn.DbURL())

self.assertTrue('failed to query cluster role check' in str(ex.exception))

@patch('gppylib.db.dbconn.connect', side_effect=Exception())
def test_unit_cluster_up_and_balanced_conn_exception(self, mock1):
with self.assertRaises(Exception) as ex:
self.subject.is_cluster_up_and_balanced(dbconn.DbURL())

self.assertTrue('failed to query cluster role check' in str(ex.exception))

#
# end tests for interview_setup()
Expand Down
10 changes: 6 additions & 4 deletions gpMgmt/test/behave/mgmt_utils/gpexpand.feature
Original file line number Diff line number Diff line change
Expand Up @@ -508,6 +508,7 @@ Feature: expand the cluster by adding more segments
And a working directory of the test as '/data/gpdata/gpexpand'
And a temporary directory under "/data/gpdata/gpexpand/expandedData" to expand into
And a cluster is created with mirrors on "mdw" and "sdw1"
And the user runs gpinitstandby with options " "
And database "gptest" exists
And there are no gpexpand_inputfiles
And the cluster is setup for an expansion on hosts "mdw,sdw1"
Expand All @@ -516,14 +517,15 @@ Feature: expand the cluster by adding more segments
And an FTS probe is triggered
And the status of the primary on content 0 should be "d"
When the user runs gpexpand with a static inputfile for a single-node cluster with mirrors without ret code check
Then gpexpand should return a return code of 1
Then gpexpand should print "One or more segments are either down or not in preferred role. Please fix the issue before running gpexpand." to stdout
Then gpexpand should return a return code of 0
Then gpexpand should print "One or more segments are either down or not in preferred role." to stdout

Scenario: on expand check if one or more cluster is not in their preferred role
Given the database is not running
And a working directory of the test as '/data/gpdata/gpexpand'
And a temporary directory under "/data/gpdata/gpexpand/expandedData" to expand into
And a cluster is created with mirrors on "mdw" and "sdw1"
And the user runs gpinitstandby with options " "
And database "gptest" exists
And there are no gpexpand_inputfiles
And the cluster is setup for an expansion on hosts "mdw,sdw1"
Expand All @@ -536,5 +538,5 @@ Feature: expand the cluster by adding more segments
And all the segments are running
And the segments are synchronized
When the user runs gpexpand with a static inputfile for a single-node cluster with mirrors without ret code check
Then gpexpand should return a return code of 1
And gpexpand should print "One or more segments are either down or not in preferred role. Please fix the issue before running gpexpand." to stdout
Then gpexpand should return a return code of 0
And gpexpand should print "One or more segments are either down or not in preferred role." to stdout

0 comments on commit e9e45d2

Please sign in to comment.