Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MLCOMPUTE-967 | changes for Spark drivers on k8s #3679

Merged
merged 93 commits into from
Mar 27, 2024

Conversation

CaptainSame
Copy link
Contributor

@CaptainSame CaptainSame commented Aug 17, 2023

  • Add newly created iam role to system paasta configs
  • Add methods to fetch the iam role
  • Use the iam role for creating the service account to start the Spark drivers with
  • Mount kubeconfig file to Spark driver pods
  • Point to the mounted kubeconfig using env variable
  • consolidate tron_tools, spark_run and service_configuration_lib methods

namespace=EXECUTOR_TYPE_TO_NAMESPACE[executor],
k8s_role=_spark_k8s_role(),
# k8s_role=_spark_k8s_role(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Point env variable to KUBECONFIG = /etc/kubernetes/spark.conf (being used in adhoc mode)
  • Puppet kubeconfig to k8s agents to be available in pods
  • Mount KUBECONFIG to pods (extra volumes)
  • command: env include unset env variables for unsetting the env variables set by /etc/boto_cfg
    e.g. --unset=AWS_ACCESS_KEY_ID --unset=AWS_SECRET_ACCESS_KEY

paasta_tools/tron_tools.py Outdated Show resolved Hide resolved
paasta_tools/tron_tools.py Outdated Show resolved Hide resolved
paasta_tools/tron_tools.py Outdated Show resolved Hide resolved
paasta_tools/tron_tools.py Outdated Show resolved Hide resolved
@88manpreet
Copy link
Contributor

Can you add description, testing, any impact on existing infra, rollout and rollback plan if needed?

Comment on lines 1039 to 1053
if "extra_volumes" in result:
result["extra_volumes"] = result["extra_volumes"] + format_volumes(action_config.get_extra_volumes())
else:
result["extra_volumes"] = format_volumes(action_config.get_extra_volumes())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are your thoughts on having TronActionConfig provide an override for get_extra_volumes() and doing the same if self.get_executor() == "spark": trick we do elsewhere to transparently add this mount? that we we can avoid doing the merge here which (without additional comments) could potentially confuse future yelpers reading this code

paasta_tools/tron_tools.py Show resolved Hide resolved
@edingroot edingroot force-pushed the MLCOMPUTE-967_spark_drivers_on_k8s branch from e266651 to 88c5320 Compare March 10, 2024 10:29
@88manpreet 88manpreet self-requested a review March 14, 2024 14:46
Copy link
Contributor

@88manpreet 88manpreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes lgtm once mypy tests pass.
Once someone from compute-infra reviews, should be good to go.

paasta_tools/cli/cmds/spark_run.py Outdated Show resolved Hide resolved
paasta_tools/cli/cmds/spark_run.py Outdated Show resolved Hide resolved
paasta_tools/cli/cmds/spark_run.py Show resolved Hide resolved
paasta_tools/cli/schemas/tron_schema.json Outdated Show resolved Hide resolved
paasta_tools/cli/schemas/tron_schema.json Outdated Show resolved Hide resolved
paasta_tools/tron_tools.py Outdated Show resolved Hide resolved
paasta_tools/tron_tools.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
paasta_tools/spark_tools.py Show resolved Hide resolved
paasta_tools/spark_tools.py Outdated Show resolved Hide resolved
paasta_tools/spark_tools.py Outdated Show resolved Hide resolved
paasta_tools/spark_tools.py Outdated Show resolved Hide resolved
paasta_tools/tron_tools.py Outdated Show resolved Hide resolved
paasta_tools/cli/schemas/tron_schema.json Outdated Show resolved Hide resolved
"pool", load_system_paasta_config().get_tron_default_pool_override()
)
if not self.get_executor() == "spark"
else spark_tools.SPARK_DRIVER_POOL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi: we'll probably need to make this configurable in soaconfigs in the near future to allow for gpu drivers

paasta_tools/utils.py Outdated Show resolved Hide resolved
Copy link
Member

@nemacysts nemacysts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once eman has also shipped, we can pair to pin paasta and release this :)

(might also be good to draft some comms for spark users at yelp in case they notice anything weird)

Comment on lines +4307 to +4309
realized_cluster = (
load_system_paasta_config().get_eks_cluster_aliases().get(cluster, cluster)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i look forward to the day this is no longer necessary :)

paasta_tools/cli/schemas/tron_schema.json Outdated Show resolved Hide resolved
paasta_tools/tron_tools.py Show resolved Hide resolved
paasta_tools/tron_tools.py Show resolved Hide resolved
@CaptainSame CaptainSame merged commit 7135849 into master Mar 27, 2024
9 checks passed
@nurdann nurdann mentioned this pull request Apr 12, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants