refactor noop transform to use dpk_ structures #951

Merged
merged 42 commits into from
Jan 23, 2025
Changes from 40 commits
19947c7
refactor noop transform to use dpk_ structures
daw3rd Jan 17, 2025
69fdae2
readme and kfp Makefile for noop restructure
daw3rd Jan 17, 2025
d68cb94
add spark support to restructured noop
daw3rd Jan 17, 2025
f8f6e89
fix import in spark noop test
daw3rd Jan 17, 2025
c735092
fix comments and help text in noop
daw3rd Jan 17, 2025
fdebbcc
update doc references to noop to match restructuring
daw3rd Jan 17, 2025
65e1f00
Update transform readme to match new transform structure
daw3rd Jan 17, 2025
9e22f9c
more project structure documentation change
daw3rd Jan 17, 2025
ecc4d5d
add noop .dockerignore
daw3rd Jan 17, 2025
0fec98b
correct make conventions target for new project structure
daw3rd Jan 17, 2025
8b537b1
fix transform check-exists target to not print errors when dir does n…
daw3rd Jan 17, 2025
a595ee0
fix check-exists target and update .dockerignores
daw3rd Jan 17, 2025
2cbba15
remove duplicate convention check on spark file
daw3rd Jan 18, 2025
dc77340
readme typo on running transform
daw3rd Jan 21, 2025
ffe8aa9
rename all noop runtime files to runtime.py
daw3rd Jan 21, 2025
01394d6
update transform dir README with dir tree
daw3rd Jan 21, 2025
4bd2109
update Makefile to use runtime.py for noop
daw3rd Jan 21, 2025
ebb17f8
formatting in transform readme
daw3rd Jan 21, 2025
c01a201
remove notebook from transform readme
daw3rd Jan 21, 2025
a8232e2
typo in transforms readme
daw3rd Jan 21, 2025
67c7e7a
Merge branch 'dev' into noop-refactor
daw3rd Jan 21, 2025
3e39456
fix typo in quickstart
daw3rd Jan 21, 2025
6c90d95
set image runtime module for noop Makefile to runtime
daw3rd Jan 21, 2025
cb3ef12
transform readme changes
daw3rd Jan 21, 2025
9c4037c
change noop module to runtime in kfp_ray
daw3rd Jan 21, 2025
7baae68
adapt to name change of module to runtime in tests and readme for noop
daw3rd Jan 21, 2025
6a0193b
fix noop module references in docs to match new runtime module naming
daw3rd Jan 21, 2025
01ddba2
fix typo in noop readme
daw3rd Jan 21, 2025
8554217
more noop renamings in markdown
daw3rd Jan 21, 2025
2f946d5
add README.md.template for transforms
daw3rd Jan 21, 2025
86eb40c
transform readme updates
daw3rd Jan 21, 2025
7ba9317
noop readme and transform readme template
daw3rd Jan 21, 2025
d5b1ac8
address doc comments, fix local_*.py to reference correct package and…
daw3rd Jan 22, 2025
f651b39
fix noop s3 import
daw3rd Jan 22, 2025
78a0f91
edits to launcher md to only show runtime.py
daw3rd Jan 22, 2025
994bf03
remove details on execution from md
daw3rd Jan 22, 2025
0598b5f
make runtime.py the default SRC in the transform Makefile template
daw3rd Jan 22, 2025
a9bcc22
replace python/ray/spark launcher options docs with a single doc on l…
daw3rd Jan 22, 2025
e6f48dc
fix testing doc by removing setting of PYTHONPATH
daw3rd Jan 22, 2025
1e157b9
Modifications/Deletions of a few md files
shahrokhDaijavad Jan 22, 2025
fa383c5
Update data-processing-lib/doc/overview.md
touma-I Jan 23, 2025
cea4ea5
Update data-processing-lib/doc/overview.md
touma-I Jan 23, 2025
306 changes: 0 additions & 306 deletions data-processing-lib/doc/advanced-transform-tutorial.md

This file was deleted.

2 changes: 1 addition & 1 deletion data-processing-lib/doc/data-access-factory.md
@@ -9,7 +9,7 @@ the processing of input data files and the expected destination
of the processed files.
The `DataAccessFactory` is most often configured using command line arguments
to specify the type of `DataAccess` instance to create
(see `--data_*` options [here](python-launcher-options.md).
(see `--data_*` options [here](launcher-options.md).
Currently, it supports
[DataAccessLocal](../python/src/data_processing/data_access/data_access_local.py)
and
@@ -1,24 +1,16 @@
# Ray Launcher Command Line Options
A number of command line options are available when launching a transform.
# Runtime Command Line Options

The following is a current --help output (a work in progress) for
the `NOOPTransform` (note the --noop_sleep_sec and --noop_pwd options):
A number of command line options are available when launching a transform.
* Transform options defined by the specific transform
* Runtime/launcher independent options, primarily for identifying data sources and destinations.
* Runtime-specific options for controlling aspects of the individual runtime.

The runtime options are discussed below (see the specific transform or using -help
to determine transform options.)

## Runtime-independent Launcher CLI Arguments
The following are the set of command line launcher options available to all runtimes.
```
usage: noop_transform.py [-h] [--run_locally RUN_LOCALLY] [--noop_sleep_sec NOOP_SLEEP_SEC] [--noop_pwd NOOP_PWD] [--data_s3_cred DATA_S3_CRED] [--data_s3_config DATA_S3_CONFIG] [--data_local_config DATA_LOCAL_CONFIG]
[--data_max_files DATA_MAX_FILES] [--data_checkpointing DATA_CHECKPOINTING] [--data_data_sets DATA_DATA_SETS] [--data_files_to_use DATA_FILES_TO_USE] [--data_num_samples DATA_NUM_SAMPLES]
[--runtime_num_workers RUNTIME_NUM_WORKERS] [--runtime_worker_options RUNTIME_WORKER_OPTIONS] [--runtime_creation_delay RUNTIME_CREATION_DELAY] [--runtime_pipeline_id RUNTIME_PIPELINE_ID]
[--runtime_job_id RUNTIME_JOB_ID] [--runtime_code_location RUNTIME_CODE_LOCATION]

Driver for noop processing

options:
-h, --help show this help message and exit
--run_locally RUN_LOCALLY
running ray local flag
--noop_sleep_sec NOOP_SLEEP_SEC
Sleep actor for a number of seconds while processing the data frame, before writing the file to COS
--noop_pwd NOOP_PWD A dummy password which should be filtered out of the metadata
--data_s3_cred DATA_S3_CRED
AST string of options for s3 credentials. Only required for S3 data access.
access_key: access key help text
@@ -49,6 +41,29 @@ options:
list of file extensions to choose for input.
--data_num_samples DATA_NUM_SAMPLES
number of random input files to process
```

## Python Launcher CLI Arguments
The following are the set of command line launcher options available on for the python runtime.
```
--runtime_num_processors RUNTIME_NUM_PROCESSORS
size of multiprocessing pool
--runtime_pipeline_id RUNTIME_PIPELINE_ID
pipeline id
--runtime_job_id RUNTIME_JOB_ID
job id
--runtime_code_location RUNTIME_CODE_LOCATION
AST string containing code location
github: Github repository URL.
commit_hash: github commit hash
path: Path within the repository
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }

```
## Ray Launcher CLI Arguments
The following are the set of command line launcher options available on for the Ray runtime.
```
--runtime_num_workers RUNTIME_NUM_WORKERS
number of workers
--runtime_worker_options RUNTIME_WORKER_OPTIONS
@@ -77,3 +92,18 @@ options:
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }
```
## Spark Launcher CLI Arguments
The following are the set of command line launcher options available on for the Spark runtime.
```
--runtime_pipeline_id RUNTIME_PIPELINE_ID
pipeline id
--runtime_job_id RUNTIME_JOB_ID
job id
--runtime_code_location RUNTIME_CODE_LOCATION
AST string containing code location
github: Github repository URL.
commit_hash: github commit hash
path: Path within the repository
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }
```
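
The help text above describes `--runtime_code_location` as an AST string, that is, a Python dict literal passed on the command line. The following is a minimal sketch of how such a value could be parsed, assuming plain `argparse` and `ast.literal_eval`. `build_parser` and `parse_code_location` are hypothetical names used only for illustration and are not the library's actual launcher code; only the three option names and the example dict come from the help output.

```python
import argparse
import ast


def parse_code_location(text: str) -> dict:
    """Turn an AST string such as "{'github': '...'}" into a dict.

    Hypothetical helper for illustration; not part of the library.
    """
    try:
        value = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    except (ValueError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(f"not a valid Python literal: {exc}")
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a dict literal")
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors three option names from the help text.
    parser = argparse.ArgumentParser(description="sketch of runtime options")
    parser.add_argument("--runtime_pipeline_id", type=str, help="pipeline id")
    parser.add_argument("--runtime_job_id", type=str, help="job id")
    parser.add_argument(
        "--runtime_code_location",
        type=parse_code_location,
        help="AST string containing code location",
    )
    return parser


# Parse an explicit argument list that reuses the example from the help text.
args = build_parser().parse_args(
    [
        "--runtime_pipeline_id", "pipeline1",
        "--runtime_job_id", "job1",
        "--runtime_code_location",
        "{ 'github': 'https://github.com/somerepo', 'commit_hash': '1324', "
        "'path': 'transforms/universal/code' }",
    ]
)
print(args.runtime_code_location["commit_hash"])  # prints 1324
```

`ast.literal_eval` is used here rather than `eval` because it accepts only literal syntax, so a malformed or malicious argument is rejected instead of executed.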