Support multiple inventories and --limit #366
base: master
Conversation
Bearing in mind that Ansible allows us to have multiple inventory dirs, it makes more sense to use the playbook_dir as the default directory for the metadata used by the role.
…_files_path: This variable is being used only by the manager_agents task.
Force-pushed from 18e05d7 to b79335c
@igorribeiroduarte please check why CI is failing.
ansible-scylla-node/tasks/common.yml (outdated)

        method: GET
      register: _failure_detector_result
      until: _failure_detector_result.status == 200
      retries: 3
Let's introduce scylla_api_retries, set it to 5 by default, and use it everywhere instead of hard-coded values like this one.
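A minimal sketch of the suggestion, assuming the role keeps its defaults in defaults/main.yml and that the uri-based check above stays in place (the URL and the scylla_api_address/scylla_api_port variables are illustrative, not confirmed by this PR):

    # defaults/main.yml
    scylla_api_retries: 5

    # tasks/common.yml: reuse the shared default instead of a literal
    - name: Wait for the failure detector REST endpoint
      uri:
        url: "http://{{ scylla_api_address }}:{{ scylla_api_port }}/failure_detector/endpoints"
        method: GET
      register: _failure_detector_result
      until: _failure_detector_result.status == 200
      retries: "{{ scylla_api_retries }}"
      delay: 1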
This code block has already been removed, since I'm not using the API in this task anymore.
I agree that such a variable should be added, but I think it's out of the scope of this PR.
ansible-scylla-node/tasks/common.yml (outdated)

    - name: Set all_nodes_up as a fact
      set_fact:
        all_nodes_up: "{% if item.status is defined and item.status == 200 %}{{ True }}{% else %}{{ False }}{% endif %}"
What if all nodes are responding to REST but some are in UJ state or reshaping/resharding? This condition is not good. If you require all nodes to be UN, you need to check exactly that, not just require nodes to respond on the REST API port.
Fixed. I'm using nodetool status now.
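A sketch of the kind of check nodetool status enables, assuming nodetool is on the PATH and the [scylla] group lists the whole cluster; a node that merely answers on the REST port while still joining (UJ) or resharding would not pass:

    - name: Verify every node is UN (Up/Normal), not merely reachable over REST
      shell: nodetool status | grep -c '^UN'
      register: _un_count
      run_once: true
      # grep -c prints the number of UN lines; fail unless all nodes are UN
      failed_when: _un_count.stdout | int != groups['scylla'] | length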
ansible-scylla-node/tasks/common.yml (outdated)

      delay: 1
      ignore_errors: true
      delegate_to: "{{ item }}"
      loop: "{{ groups['scylla'] }}"
Why run this serially and not concurrently (without the loop)? Imagine that your cluster has hundreds of nodes - this will run forever!
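One concurrent alternative, as a sketch: drop delegate_to and the loop, and let every host in the play poll its own API in parallel (Ansible runs the task on all play hosts at once, up to the configured forks; the URL and variables are illustrative):

    - name: Wait for the REST API on every node concurrently
      uri:
        url: "http://{{ scylla_api_address }}:{{ scylla_api_port }}/"
        method: GET
      register: _api_result
      until: _api_result.status == 200
      retries: 5
      delay: 1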
Fixed
Force-pushed from 00562a2 to 08e41cd
Force-pushed from 0ef1ba8 to da57435
…ication: Before this patch, we were assuming that if we get to the adjust_keyspace_replication task and start_scylla_service is set to true, then all the nodes are up, since the 'start_scylla_service dependent tasks' block was already executed. However, the role is responsible for starting scylla only for the nodes in ansible_play_hosts, meaning that if the user has a cluster with 6 nodes but is running the role for 3 of them (by using --limit), the role should start only these 3 nodes, and the other 3 might or might not have already been started. With that in mind, this patch checks that ALL the nodes in the inventory (and not only the ones in ansible_play_hosts) are up before adjusting the replication.
…cation: Keyspace replication needs to be adjusted based on ALL nodes, and not only the ones for which we're currently executing the role.
…_sources to find cql credentials: A user might pass multiple inventories, and we don't know which of them contains the credentials, so we need to iterate through all of them to find out.
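A sketch of what such an iteration could look like; ansible_inventory_sources is Ansible's magic variable listing every inventory path passed on the command line, while the cql_credentials.yml filename is purely hypothetical (sources can also be single files rather than directories):

    - name: Look for cql credentials next to each inventory source
      set_fact:
        cql_credentials_file: "{{ item }}/cql_credentials.yml"
      loop: "{{ ansible_inventory_sources }}"
      # the 'is file' test runs on the controller; the last match wins
      when: (item ~ '/cql_credentials.yml') is file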
…the 'play_hosts' var: Currently we're starting the scylla service on all the nodes in the [scylla] section of the inventory, which does not respect the limits applied by the user with the '--limit' option. This patch fixes that by limiting the execution of the 'Start scylla non-seeds nodes serially' and 'Start seeders serially' tasks to the nodes in the 'play_hosts' var, and does the same for the 'Create a map from dc to its list of nodes' task.
In order to run the token_distributor script in the generate_tokens task, we need broadcast_address and rpc_address defined for all the already-bootstrapped nodes, including the ones that are not in the 'play_hosts' variable. So far we've been assuming that the variables scylla_broadcast_address and scylla_rpc_address would always be defined for all the nodes, but the user has no obligation to pass variables for nodes that are not in 'play_hosts', i.e. hosts that were excluded from the playbook execution by the --limit option. With that in mind, this patch defines these variables for bootstrapped nodes that are not in 'play_hosts' by using the values from their 'scylla.yaml' files. This patch assumes that any already-bootstrapped node has a scylla.yaml in /etc/scylla with rpc_address and broadcast_address defined.
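A sketch of how those values could be recovered from the excluded nodes, assuming /etc/scylla/scylla.yaml is readable by the Ansible user (task and register names are illustrative, not the PR's actual code):

    - name: Read scylla.yaml from bootstrapped nodes excluded by --limit
      slurp:
        src: /etc/scylla/scylla.yaml
      register: _scylla_yaml_raw
      run_once: true
      delegate_to: "{{ item }}"
      loop: "{{ groups['scylla'] | difference(play_hosts) }}"

    - name: Set broadcast/rpc addresses from each parsed scylla.yaml
      set_fact:
        scylla_broadcast_address: "{{ (item.content | b64decode | from_yaml).broadcast_address }}"
        scylla_rpc_address: "{{ (item.content | b64decode | from_yaml).rpc_address }}"
      run_once: true
      delegate_to: "{{ item.item }}"
      # delegate_facts stores the facts on the excluded node itself
      delegate_facts: true
      loop: "{{ _scylla_yaml_raw.results }}"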
Force-pushed from da57435 to 93135cb
    @@ -17,11 +7,14 @@
      until: _datacenter_out.status == 200
      retries: 5
      delay: 1
      delegate_to: "{{ item }}"
      loop: "{{ groups['scylla'] }}"
      run_once: true
Again: this is bad - this will work terribly for large clusters.
We need a solution that will execute concurrently on all hosts - not serially.
    @@ -79,7 +79,7 @@
      - name: Create a map from dc to its list of nodes
        set_fact:
          dc_to_node_list: "{{ dc_to_node_list | default({}) | combine( {hostvars[item]['dc']: (dc_to_node_list | default({}))[hostvars[item]['dc']] | default([]) + [item]} ) }}"
    -   loop: "{{ groups['scylla'] }}"
    +   loop: "{{ play_hosts }}"
This is bogus! This MUST include all scylla hosts.
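What the reviewer is asking for, as a sketch: the same task with the loop reverted to the full [scylla] group (the task body is unchanged from the diff above):

    - name: Create a map from dc to its list of nodes
      set_fact:
        dc_to_node_list: "{{ dc_to_node_list | default({}) | combine( {hostvars[item]['dc']: (dc_to_node_list | default({}))[hostvars[item]['dc']] | default([]) + [item]} ) }}"
      loop: "{{ groups['scylla'] }}"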
    @@ -328,13 +328,13 @@
      - name: Start seeders serially
        run_once: true
        include_tasks: start_one_node.yml
    -   loop: "{{ groups['scylla'] }}"
    +   loop: "{{ play_hosts }}"
Same here
        when: hostvars[item]['broadcast_address'] in scylla_seeds or item in scylla_seeds

      - name: Start scylla non-seeds nodes serially
        run_once: true
        include_tasks: start_one_node.yml
    -   loop: "{{ groups['scylla'] }}"
    +   loop: "{{ play_hosts }}"
And here
        register: node_count
    -   until: node_count.stdout|int == ansible_play_batch|length
    +   until: node_count.stdout|int == play_hosts|length
Waiting for the nodes of a specific DC to become UN is not enough - we need to wait until the other nodes are UN from these nodes' perspective too. So you should wait for all (!!) nodes in the cluster to become UN from every node's perspective.
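The same UN count as in the earlier sketch, but executed on every host of the play (no run_once) so that each node's own view of the ring is verified; again assuming nodetool is on the PATH and the [scylla] group lists the whole cluster:

    - name: Wait until this node sees every cluster member as UN
      shell: nodetool status | grep -c '^UN'
      register: _un_seen
      # retried on every host until its own view shows the full cluster UN
      until: _un_seen.stdout | int == groups['scylla'] | length
      retries: 60
      delay: 5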