Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix - scylla-node] - Ansible builtin + when condition start node #372

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

unkls
Copy link

@unkls unkls commented Apr 3, 2024

Hello Scylladb community,

First thanks for that amazing ansible role !!

I'm currently using your ansible-role to deploy a cluster from scratch and adding a node into a cluster. I'm using ansible 2.16 to match your requirement (2.8). Currently when I was using your examples I was facing some issues with 2 tasks : Start scylla non-seeds nodes serially | Start seeders serially.

This issue append when I'm trying to add a node into the cluster.
My error was due to the Resolve scylla_broadcast_address task that was setting the fact only for the current node that run the task.
Then when we enter into start_scylla_service dependent tasks task, it will iterate over all the servers inside our current inventory and try to catch the scylla_broadcast_address. This variable is empty when we call the role for one node.

Examples :

My inventory is like :

[scylla]
scylla-1
scylla-2
scylla-3

I want to add a new node scylla-4 from the group [new_node].
My playbook is like :

- name: Add Scylla node(s)
  hosts: "{{ ansible_limit }}"
  become: true
  gather_facts: true
  vars:
    full_inventory: false
  roles:
    - ansible-scylla-node
  tags: new_node

- name: Run cleanup on the old nodes
  hosts: "{{ target_host | default('scylla') }}"
  gather_facts: false
  serial: 1
  tasks:
    - name: Run cleanup
      script:  files/repair.sh
  tags: new_node

I call it like : ansible-playbook scylla.yaml -i inventories/inventory.yml -u devops --diff -l scylla-4 -t new_node -e "@group_vars/scylla_poc"
When I arrive at the task scylla_broadcast_address it will perform the set_fact only for my node scylla-4. Once I will run the task Start seeders serially ansible will iterate over loop: "{{ groups['scylla'] }}" but by design this should run only the new node.
That will iterate like :

skipping: [scylla-4] => (scylla-1) --> Empty value for scylla_broadcast_address
skipping: [scylla-4] => (scylla-2) --> Empty value for scylla_broadcast_address
skipping: [scylla-4] => (scylla-3) --> Empty value for scylla_broadcast_address
skipping: [scylla-4] => (scylla-4) --> IP from the NIC not into seeder list
skipping: [scylla-4]

And same pattern append with Start scylla non-seeds nodes serially task.

My fix proposal is to add a new check to avoid crash errors on set_fact empty for scylla_broadcast_address.

Let me know if you need more to debug/discuss !

Thanks a lot in advance !

@unkls unkls mentioned this pull request Apr 3, 2024
@tarzanek
Copy link
Collaborator

tarzanek commented Apr 5, 2024

well the idea looks OK to me, @vladzcloudius ?

@tarzanek
Copy link
Collaborator

tarzanek commented Apr 5, 2024

code wise however you should split the builtin into separate commit (since it's refactoring)
and leave functional change (the scylla_broadcast_address stuff) in a separate isolated commit

both patches/commits should have messages prefixed with
ansible-scylla-node:

and the functional change should describe the change and intended behaviour fix
(so instead "fix empty when condition" you should write something like "support running the role on limited number of nodes, skip tasks that miss information " )

of course thinking about it, there might be more corner cases that this change will unblock and that would require more fixes
original design is that you should always run role on top of whole inventory and it can detect which nodes have to be added as new and wait for them until they get added to cluster

cleanup task is then done afterwards (after all nodes are added)

but I see how your idea can speed up the role run for big clusters, so we should consider it

@unkls
Copy link
Author

unkls commented Apr 5, 2024

Hello @tarzanek,

Thanks for the feedback, I've split both commits let me know if it's ok for you. Thanks for the prefix tips I've include it.

We do use -l in order to add quickly one node into our clusters + we manage multiple inventory with multiple clusters that make our life easier to just limit in this way.

Let me know if you need anything else to allow this PR.

@unkls
Copy link
Author

unkls commented Apr 16, 2024

Hello @tarzanek ,

Any update about this PR ?

Thanks a lot in advance !

@igorribeiroduarte
Copy link
Collaborator

Please note that we already have another PR under review that will add support for the --limit option: #366.

Also note that this PR couldn't be merged before fixing the adjust_keyspace_replication.yml, which currently assumes that if start_scylla_service is set to true and we already executed the 'start_scylla_service dependent tasks block, it means that all the nodes are up and tries to adjust the replication based on that, as explained here -> 1326e4e

The generate_tokens.yml task also assumes that broadcast_address is defined for all nodes and would stop working if we merged this PR as it is right now.

In general, even though this serves as a workaround, relying on broadcast_address being defined or not to decide if we'll start scylla-service doesn't look like the best approach to me. We should explicitly limit the execution of the start_one_node task to the nodes in the play_hosts variable, which will be the nodes passed to --limit.

@unkls
Copy link
Author

unkls commented Apr 18, 2024

Hello @igorribeiroduarte ,

Thanks for your feedback.
About your first point I don't really get it sorry. Even if you perform --limit on one node the context can be global using groups into the tasks (as it's already the case). My point is to run faster the playbook on our clusters.
The PR you shared don't perform the same as your context is changed to only --limit which I don't need for now.
For the adjust_keyspace_replication it will still work as your node will be added and others node are already there and inside the inventory. Again the context stay global with groups.

For the broadcast_address adding a try statement to avoid crash sounds good for me even if you start a node you must need to check if the node itself but not the others as my using a --limit.

Let me know if it's not clear I can elaborate more.

@vladzcloudius
Copy link
Collaborator

vladzcloudius commented May 15, 2024

@unkls First of all it's awesome that you enjoy Scylla Role.
A few comments about the way you use it: Scylla Node Role is not ready to be executed using --limit and not only for reasons you are trying to fix in this PR.

The main problem is that Role has a few invariants that are being validated in regards to the cluster.
One of them is a validation of the cluster being healthy ("wait for the cluster to become healthy") that relies on the ansible_play_batch to include all scylla nodes.
And --limit is reducing ansible_play_batch to the nodes you are restricting the playbook to.

Another pieces that are probably going to break are the tasks that are replicating the node configuration from the first node in the DC ensuring that all nodes are configured the same way: "Collect IO settings for the first node of every DC".

And we there are a few other places that are going to break unless you execute a role on all nodes of the cluster (scylla group).

And the thing is - there is no much motivation to restrict the Role using --limit AFAICT.
Simply run Ansible with --fork as the number of your scylla nodes and make sure to cache ssh connections.

If you do that the execution of a Role from the beginning to the end on the cluster that has already been configured takes about 2 minutes - in your example the execution on the whole cluster and on a single node that bootstraps is going to take exactly the same time.

Instrumenting a Node Role to support --limit would make us reduce the Role's functionality in such a mode to levels quite significantly:

  • We would have to remove configuration files syncing.
  • We would have to remove cluster's health checking.

While this is not impossible but I find the motivation for this unclear while cost of such an instrumentation - quite significant.

@jonbuffington
Copy link

One of them is a validation of the cluster being healthy ("wait for the cluster to become healthy") that relies on the ansible_play_batch to include all scylla nodes.

@vladzcloudius I made a local change related to ansible_play_batch that may be useful for others. I changed:
- nodetool status|grep -E '^UN|^UJ|^DN'|wc -l
+ nodetool status|grep -E '^UJ|^DN'|wc -l
register: node_count
- until: node_count.stdout|int == ansible_play_batch|length
+ until: node_count.stdout|int == 0

The above change now waits for all "unhealthy" cluster nodes to be 0. This seems to be a useful check since this step executes after the check for the CQL port step.

I ran into an issue deploying a test cluster from scratch were the second node of three would never proceed past the "become healthy" step because only one of three nodes were deployed prior to the check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants