diff --git a/docs/parallelism_design.md b/docs/parallelism_design.md
new file mode 100644
index 0000000..b359605
--- /dev/null
+++ b/docs/parallelism_design.md
@@ -0,0 +1,22 @@
+### our paralellism strategy:
+
+*subject to change*
+
+use nextflow processes (dataflow) to use multinode+multiprocessing parallelise concurrently across samples/files
+
+we work to enable multinode, but for now we just use 1 node
+
+for now, we avoid splitting files for more parallelism
+
+instead we use tool-level threading
+
+so we run multiple nf-processes
+
+each nf-process runs a single tool process
+
+each tool process runs the tool with the tools parallelism option
+
+
+with the current filesize to count ratio: 1GB to 5 GB per file and about 5-50 files per workflow on ~100 cores this works pretty well
+
+and most importantly it is quite simple to reason about
diff --git a/zen_of_nextflow.md b/docs/zen_of_nextflow.md
similarity index 100%
rename from zen_of_nextflow.md
rename to docs/zen_of_nextflow.md
diff --git a/readme.md b/readme.md
index d3c8172..f5c5bb2 100644
--- a/readme.md
+++ b/readme.md
@@ -63,7 +63,7 @@ cd $NEXTFLOW_CALLDIR && nextflow run $NEXTFLOW_MODULES/ukb_main_workflow/main.nf
 
 ## configuration
 
-see the environment variables and nextflow-configs like modules/ukb_main_workflow/user.config
+see the environment variables and nextflow-configs like `modules/ukb_main_workflow/user.config`
 
 ## development
 
@@ -146,26 +146,3 @@ export NXF_LOG_FILE='/path/nextflow_logs'
 export NXF_PLUGINS_DIR='/path/nextflow_plugins'
 export NEXTFLOW_MODULES="$(pwd)/modules"
 ```
-
-### our paralellism strategy:
-
-*subject to change*
-
-use nextflow processes (dataflow) to use multinode+multiprocessing parallelise concurrently across samples/files
-
-we work to enable multinode, but for now we just use 1 node
-
-for now, we avoid splitting files for more parallelism
-
-instead we use tool-level threading
-
-so we run multiple nf-processes
-
-each nf-process runs a single tool process
-
-each tool process runs the tool with the tools parallelism option
-
-
-with the current filesize to count ratio: 1GB to 5 GB per file and about 5-50 files per workflow on ~100 cores this works pretty well
-
-and most importantly it is quite simple to reason about