[SYSTEMDS-3655] Update Spark Version 3.5.0 #1960

Closed
wants to merge 1 commit into from

BACtaki
Contributor

@BACtaki BACtaki commented Dec 13, 2023

No description provided.

@BACtaki
Contributor Author

BACtaki commented Dec 13, 2023

[WIP] Fix build failures with new Spark version

@j143
Contributor

j143 commented Dec 17, 2023

Hi Badrul, some components seem to be missing from the Spark dependencies.

We could create either a simple checklist or a migration doc to help with this upgrade!


  • 1. Pre-Upgrade Checks:

    • Review the release notes: Carefully review the release notes for Spark 3.4 and 3.5.x, paying particular attention to breaking changes, deprecations, and new features.

    • Review previous upgrade commits: The history of earlier Spark upgrades in this repository helps gauge the extent of the changeset.

    • Compatibility checks: Ensure the application's dependencies (libraries, frameworks) are compatible with Spark 3.5. Some libraries may require upgrades or adjustments to work with the newer version. For example, the build reports:

      java/org/apache/sysds/runtime/compress/colgroup/APreAgg.java:[22,31] package org.apache.commons.lang does not exist

    • Build and test application locally: Rebuild and test the application locally with Spark 3.5 to identify any compilation errors or functionality regressions. Start with components you are familiar with.

    • Review logging and monitoring: Update logging and monitoring configurations to work with Spark 3.5. I believe the log4j template needs an update.

  • 2. Specific Considerations:

    • SQL, Datasets, and DataFrames: Check for potential breaking changes in SQL syntax, DataFrame behavior, and schema inference.
    • Spark API usage: Review your internal Spark API usage for any deprecations or new features available in 3.5.x.
  • 3. Design Steps for Upgrade: (optional)
    In case we want to support upgrading a running system.

    • Gradual rollout: Consider a phased rollout of the upgrade to minimize risk. We could start by deploying to a staging environment for testing, then gradually roll out to production in batches (if there is a production instance of SystemDS).
    • Monitor closely: Monitor the application after the upgrade for any performance regressions, errors, or unexpected behavior.
    • Rollback plan: Have a rollback plan in place in case you encounter unexpected issues after the upgrade. This might involve reverting to Spark 3.3 or rolling back specific changes you made.
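
For the compatibility check in step 1, a minimal sketch of how the remaining legacy imports could be located, assuming the `package org.apache.commons.lang does not exist` error comes from imports of the old commons-lang 2.x package, which is no longer on the classpath. `LegacyImportScan` and `findLegacyImports` are hypothetical names for illustration, not part of SystemDS. The sketch only flags lines; it does not rewrite them, because not every class may have a one-to-one counterpart under `org.apache.commons.lang3`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of SystemDS): flags imports of the legacy commons-lang 2.x package. */
public class LegacyImportScan {
    // Matches "import org.apache.commons.lang.Foo;" (also static and sub-package imports),
    // but not "import org.apache.commons.lang3.Foo;".
    private static final Pattern LEGACY_IMPORT = Pattern.compile(
        "^\\s*import\\s+(static\\s+)?org\\.apache\\.commons\\.lang\\.[A-Za-z_].*;.*$");

    /** Returns the 1-based line numbers of legacy imports in the given source text. */
    public static List<Integer> findLegacyImports(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            if (LEGACY_IMPORT.matcher(lines[i]).matches()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative input only; the class names are assumptions, not taken from APreAgg.java.
        String sample = String.join("\n",
            "package example;",
            "import org.apache.commons.lang.NotImplementedException;",
            "import org.apache.commons.lang3.StringUtils;",
            "public class Example {}");
        System.out.println("Legacy commons-lang imports on lines: " + findLegacyImports(sample));
    }
}
```

Each flagged line would then need a manual decision: switch to the `org.apache.commons.lang3` equivalent where one exists, or replace the usage otherwise.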

@Baunsgaard
Contributor

I have verified that the update works after I updated some other dependencies.
I also ran it on a cluster with the new Hadoop version.
So far, nothing is broken, but there are also no obvious performance improvements.

Projects
Status: Done