[SYSTEMDS-3656] Update Hadoop 3.3.6 #1961

Closed

BACtaki
Contributor

@BACtaki BACtaki commented Dec 13, 2023

No description provided.

@j143
Contributor

j143 commented Dec 17, 2023

For a Hadoop upgrade, we simply change the version in the pom.
But we can check whether any specific internal updates are required.

@j143 j143 added this to the systemds-3.2.0 milestone Dec 17, 2023
@BACtaki BACtaki requested review from Baunsgaard and j143 December 21, 2023 05:28
@Baunsgaard
Contributor

The change is fine, and most likely all good. But we need to test it with an actual Hadoop instance. The test is what makes this upgrade harder than it seems on the surface. It is the same for the Spark update.

@BACtaki
Contributor Author

BACtaki commented Dec 21, 2023

@Baunsgaard What would testing entail? Is it as simple as running all unit/integration tests on a Hadoop cluster?

@Baunsgaard
Contributor

Good question.

The unit tests we have do use HDFS by default, but when running them you would not see whether it works unless some of them suddenly crash.

The way I did it before was to have simple scripts that write matrices and frames to an HDFS cluster and read them back (our normal execution should default to HDFS if it is detected). This does not fully guarantee that everything works, but it sufficiently verifies support.
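
The round-trip idea can be sketched independently of SystemDS. The following is a minimal illustration, not the scripts referred to above: it writes a small matrix as CSV, reads it back, and compares the two. The function names (`write_matrix`, `read_matrix`, `roundtrip_ok`) are hypothetical, and a local temporary file stands in for an HDFS path; against a real cluster the write and read would go through SystemDS with a path on HDFS.

```python
import csv
import os
import tempfile


def write_matrix(path, matrix):
    """Write a 2-D list of floats to `path` as CSV (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(matrix)


def read_matrix(path):
    """Read a CSV file back into a 2-D list of floats (hypothetical helper)."""
    with open(path, newline="") as f:
        return [[float(v) for v in row] for row in csv.reader(f)]


def roundtrip_ok(path, matrix):
    """Return True if the matrix read back from `path` equals what was written."""
    write_matrix(path, matrix)
    return read_matrix(path) == matrix


if __name__ == "__main__":
    # Assumption: a local temp file stands in for a path on the HDFS cluster.
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip_ok(os.path.join(tmp, "m.csv"), [[1.0, 2.0], [3.0, 4.0]]))
```

As noted above, a check like this only shows that reading and writing work; it does not guarantee that everything else does.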

@Baunsgaard Baunsgaard removed their request for review December 28, 2023 21:26
@Baunsgaard
Contributor

I have tested Hadoop 3.3.6, and it works with just the update. Thanks for initiating this, @BACtaki.

The test I performed was a Spark mode job.

I had some issues with writing to HDFS in hybrid mode and need to look further into them.
Hybrid mode before the update also did not work the way I thought it should, so this might be a regression from some other changes.

For now we merge this update, since we also updated our cluster to the newer version and the update is (as far as I tested) backwards compatible with SystemDS.

Projects
Status: Done
3 participants