tweak documentation, create doc folder
feiloo committed Jan 24, 2025
1 parent c70f984 commit 325d40f
Showing 3 changed files with 23 additions and 24 deletions.
22 changes: 22 additions & 0 deletions docs/parallelism_design.md
@@ -0,0 +1,22 @@
### our parallelism strategy:

*subject to change*

use nextflow processes (dataflow) to parallelise across samples/files with multinode + multiprocessing

we are working to enable multinode, but for now we just use 1 node

for now, we avoid splitting files for more parallelism

instead we use tool-level threading

so we run multiple nf-processes

each nf-process runs a single tool process

each tool process runs the tool with the tool's own parallelism option
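
a minimal sketch of this layering in plain python, not nextflow: an outer pool stands in for the nf-processes (one per file), and each task starts a single child process that receives a thread count. the file names, the `run_tool` and `run_workflow` helpers, and the echo stand-in for a real tool are all assumptions made for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_tool(path: str, threads: int) -> str:
    """Run one tool process for one file; the tool parallelises internally."""
    # Stand-in for a real tool: a child Python process that echoes its arguments.
    # A real tool would receive the thread count through its own option (hypothetical here).
    cmd = [
        sys.executable,
        "-c",
        "import sys; print(sys.argv[1], sys.argv[2])",
        path,
        str(threads),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_workflow(paths: list[str], max_processes: int, threads_per_tool: int) -> list[str]:
    """Outer level: one task per file, similar in spirit to one nf-process per file."""
    # Python threads are enough here because each task only waits on its child process.
    with ThreadPoolExecutor(max_workers=max_processes) as pool:
        return list(pool.map(lambda p: run_tool(p, threads_per_tool), paths))


if __name__ == "__main__":
    # Hypothetical input files; results come back in input order.
    print(run_workflow(["a.dat", "b.dat", "c.dat"], max_processes=2, threads_per_tool=4))
```

files are never split: the only two knobs are how many tool processes run at once and how many threads each one gets.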


with the current file sizes and counts (1 GB to 5 GB per file, about 5-50 files per workflow, ~100 cores), this works pretty well

and most importantly it is quite simple to reason about
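
a rough worked example of the core budget, assuming all files run at once and the cores are divided evenly between them. the even split, the `threads_per_process` helper and the cap of 16 threads per tool are assumptions for illustration, not settings taken from the workflow.

```python
def threads_per_process(total_cores: int, n_files: int, max_threads: int = 16) -> int:
    """Split cores evenly across one tool process per file, capped at max_threads."""
    if total_cores < 1 or n_files < 1:
        raise ValueError("total_cores and n_files must be positive")
    # Integer division gives each process its share; never go below one thread.
    return max(1, min(max_threads, total_cores // n_files))


if __name__ == "__main__":
    # File counts spanning the 5-50 range, on a ~100 core node.
    for n_files in (5, 20, 50):
        threads = threads_per_process(100, n_files)
        print(f"{n_files} files -> {threads} threads per tool process")
```

under these assumptions, 5 files get 16 threads each (the cap applies, so some cores stay idle), 20 files get 5 threads each, and 50 files get 2 threads each.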
File renamed without changes.
25 changes: 1 addition & 24 deletions readme.md
@@ -63,7 +63,7 @@ cd $NEXTFLOW_CALLDIR && nextflow run $NEXTFLOW_MODULES/ukb_main_workflow/main.nf

## configuration

see the environment variables and nextflow-configs like modules/ukb_main_workflow/user.config
see the environment variables and nextflow-configs like `modules/ukb_main_workflow/user.config`


## development
@@ -146,26 +146,3 @@ export NXF_LOG_FILE='/path/nextflow_logs'
export NXF_PLUGINS_DIR='/path/nextflow_plugins'
export NEXTFLOW_MODULES="$(pwd)/modules"
```