Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor noop transform to use dpk_ structures #951

Merged
merged 42 commits into from
Jan 23, 2025
Merged

refactor noop transform to use dpk_ structures #951

merged 42 commits into from
Jan 23, 2025

Conversation

daw3rd
Copy link
Member

@daw3rd daw3rd commented Jan 17, 2025

Why are these changes needed?

To adapt to the new transform project structure.

Related issue number (if any).

#774

@daw3rd daw3rd requested a review from touma-I January 17, 2025 15:56
daw3rd added 18 commits January 17, 2025 17:05
Signed-off-by: David Wood <[email protected]>
Signed-off-by: David Wood <[email protected]>
Signed-off-by: David Wood <[email protected]>
@shahrokhDaijavad
Copy link
Member

@daw3rd I have reviewed 2 things so far:

  1. Going over your README file
  2. Testing the code by running make run-cli-sample successfully

On the README, everything looks good, but I have a suggestion for improvement:
In the Usage section, both CLI and make run-cli-sample use the Python runtime. Since noop is a good learning transform for first-time users, it makes sense to add two small sections of code for using CLI to run Ray and Spark runtimes. An example is the README for fdedup here: https://github.com/IBM/data-prep-kit/blob/noop-refactor/transforms/universal/fdedup/README.md.
Since we have the generic: python -m dpk_$(TRANSFORM_NAME).runtime
an alternative, if you do not want to have new Ray and Spark sections in the README, is just changing the CLI part to:
python -m dpk_noop/transform_python --noop_sleep_sec 10 ...
=>
python -m dpk_noop/transform_runtime --noop_sleep_sec 10 ... and saying that runtime can be python, ray, or spark.

@daw3rd
Copy link
Member Author

daw3rd commented Jan 21, 2025

@daw3rd I have reviewed 2 things so far:

1. Going over your README file

2. Testing the code by running `make run-cli-sample` successfully

On the README, everything looks good, but I have a suggestion for improvement: In the Usage section, both CLI and make run-cli-sample use the Python runtime. Since noop is a good learning transform for first-time users, it makes sense to add two small sections of code for using CLI to run Ray and Spark runtimes. An example is the README for fdedup here: https://github.com/IBM/data-prep-kit/blob/noop-refactor/transforms/universal/fdedup/README.md. Since we have the generic: python -m dpk_$(TRANSFORM_NAME).runtime an alternative, if you do not want to have new Ray and Spark sections in the README, is just changing the CLI part to: python -m dpk_noop/transform_python --noop_sleep_sec 10 ... => python -m dpk_noop/transform_runtime --noop_sleep_sec 10 ... and saying that runtime can be python, ray, or spark.

I added an example of running the Ray runtime. Do we need to show how to run all the local_*.py?

@shahrokhDaijavad
Copy link
Member

@daw3rd I have reviewed 2 things so far:

1. Going over your README file

2. Testing the code by running `make run-cli-sample` successfully

On the README, everything looks good, but I have a suggestion for improvement: In the Usage section, both CLI and make run-cli-sample use the Python runtime. Since noop is a good learning transform for first-time users, it makes sense to add two small sections of code for using CLI to run Ray and Spark runtimes. An example is the README for fdedup here: https://github.com/IBM/data-prep-kit/blob/noop-refactor/transforms/universal/fdedup/README.md. Since we have the generic: python -m dpk_$(TRANSFORM_NAME).runtime an alternative, if you do not want to have new Ray and Spark sections in the README, is just changing the CLI part to: python -m dpk_noop/transform_python --noop_sleep_sec 10 ... => python -m dpk_noop/transform_runtime --noop_sleep_sec 10 ... and saying that runtime can be python, ray, or spark.

I added an example of running the Ray runtime. Do we need to show how to run all the local_*.py?

@daw3rd We don't need to show how to run all 3, but when you show the command line for Ray runtime, you want to add a sentence saying that the runtime could be python, ray or spark.

@daw3rd
Copy link
Member Author

daw3rd commented Jan 21, 2025

@daw3rd I have reviewed 2 things so far:

1. Going over your README file

2. Testing the code by running `make run-cli-sample` successfully

On the README, everything looks good, but I have a suggestion for improvement: In the Usage section, both CLI and make run-cli-sample use the Python runtime. Since noop is a good learning transform for first-time users, it makes sense to add two small sections of code for using CLI to run Ray and Spark runtimes. An example is the README for fdedup here: https://github.com/IBM/data-prep-kit/blob/noop-refactor/transforms/universal/fdedup/README.md. Since we have the generic: python -m dpk_$(TRANSFORM_NAME).runtime an alternative, if you do not want to have new Ray and Spark sections in the README, is just changing the CLI part to: python -m dpk_noop/transform_python --noop_sleep_sec 10 ... => python -m dpk_noop/transform_runtime --noop_sleep_sec 10 ... and saying that runtime can be python, ray, or spark.

I added an example of running the Ray runtime. Do we need to show how to run all the local_*.py?

@daw3rd We don't need to show how to run all 3, but when you show the command line for Ray runtime, you want to add a sentence saying that the runtime could be python, ray or spark.

@shahrokhDaijavad OK I added spark example and was more specific about running the various runtimes. If you want more, maybe get together to discuss.

@shahrokhDaijavad
Copy link
Member

Thanks, @daw3rd Looks good!

Copy link
Collaborator

@touma-I touma-I left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to go through this with a fine comb as the noop is a reference for everything we do. will have some more tomorrow during our call. thanks

data-processing-lib/doc/python-launcher-options.md Outdated Show resolved Hide resolved
data-processing-lib/doc/python-launcher-options.md Outdated Show resolved Hide resolved
data-processing-lib/doc/ray-launcher-options.md Outdated Show resolved Hide resolved
data-processing-lib/doc/simplified_transform_apis.md Outdated Show resolved Hide resolved
data-processing-lib/doc/spark-launcher-options.md Outdated Show resolved Hide resolved
data-processing-lib/doc/transform-standalone-testing.md Outdated Show resolved Hide resolved
doc/quick-start/contribute-your-own-transform.md Outdated Show resolved Hide resolved
doc/quick-start/run-transform-image.md Outdated Show resolved Hide resolved
doc/repo.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@touma-I touma-I left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments in preparation for our call. thanks

doc/repo.md Outdated Show resolved Hide resolved
transforms/universal/noop/Makefile Show resolved Hide resolved
transforms/universal/noop/dpk_noop/local_python.py Outdated Show resolved Hide resolved
transforms/universal/noop/dpk_noop/ray/local_ray.py Outdated Show resolved Hide resolved
transforms/universal/noop/dpk_noop/ray/s3_ray.py Outdated Show resolved Hide resolved
transforms/universal/noop/dpk_noop/spark/local_spark.py Outdated Show resolved Hide resolved
@daw3rd daw3rd requested a review from touma-I January 22, 2025 22:35
data-processing-lib/doc/overview.md Outdated Show resolved Hide resolved
@touma-I touma-I merged commit 3ca9926 into dev Jan 23, 2025
83 checks passed
@touma-I touma-I deleted the noop-refactor branch January 23, 2025 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants