Releases: molgenis/NGS_Automated
4.4.1
What's Changed
- Rename vip configuration directory by @bartcharbon in #340
Full Changelog: 4.4.0...4.4.1
4.4.0
What's Changed
- fixing new dirname nanopore run, fix line ends for samplesheet, added concordance notification on failures by @Gerbenvandervries in #328
- included notification for ConcordanceCheck by @Gerbenvandervries in #329
- BUGFIX by @RoanKanninga in #331
- feat:update nanopore pipeline to VIP v8.0.0 by @dennishendriksen in #332
- added bedfile to NP resultsdir, bugfix for checksumming fastqs by @Gerbenvandervries in #333
- feat:pass sequencing .bed to VIP by @dennishendriksen in #334
- chore:use VIP EasyBuild module instead of local install by @dennishendriksen in #335
- use vip-umcg-config-gd + do not load vip module for nanopore by @bartcharbon in #336
- Remove duplicate region column in VIP sample sheet by @bartcharbon in #337
- changed output of copyProjectDataToPrm.finished without possible previous error messages by @RoanKanninga in #338
New Contributors
- @bartcharbon made their first contribution in #336
Full Changelog: 4.3.0...4.4.0
4.3.1
Added atd and gd notification lines for concordance check.
Full Changelog: 4.3.0...4.3.1
4.3.0
- included RNA pipeline
- included nanopore pipeline
  - copyRawNanoporeDataToPrm.sh (copies fastQ and pod5 files to prm)
  - copyRawNanoporeDataToTmp.sh (copies raw data to tmp)
  - calculateMd5NanoporeFastQ.sh (calculates checksums on the nanopore machine)
  - startNanoporePipeline.sh (starts the VIP Nextflow pipeline)
- added constraint to prevent the inhouse pipeline running on the dragen node and vice versa
bugfixes:
4.2.0
New
- includes the new nf_ngs_dna WGS pipeline (nextflow)
- new script startNextflowDragenPipeline.sh
- new script that (re)moves the sequencing data from the sequencers_incoming to the sequencers folder
  - data will be removed 2 days after the .transferCompleted file is created (see the sketch after this list)
- added build column in the samplesheet when not present (in moveAndCheckSamplesheet)
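
A minimal sketch of what such a time-based cleanup could look like, assuming the sequencers folder layout and the .transferCompleted flag file mentioned above; the directory path and find options are illustrative, not the script's actual implementation:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical location of transferred sequencing runs; not taken from the repo.
sequencersDir='/groups/umcg-gst/scr01/sequencers'

# Remove run directories whose .transferCompleted flag is older than 2 days.
find "${sequencersDir}" -maxdepth 2 -type f -name '.transferCompleted' -mtime +2 \
  | while read -r flagFile
do
    runDir="$(dirname "${flagFile}")"
    echo "INFO: ${runDir} was transferred more than 2 days ago; removing it."
    rm -rf "${runDir}"
done
```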
Edits
- genomescan will now write UMCG_CSV samplesheet in the root of the batch (instead of in the Raw_data folder)
- the tar-gzipped jobs file no longer contains the complete directory structure, only the "basename"
Bugfixes
- when the piped head finishes faster than the reading of the entire file (which leads to an error), tail is now used instead (see the sketch after this list)
- when Genomescan uses sequencers with longer names
- when demultiplexing-only, a finished state was never reached (removed a wrongly placed continue command)
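
To illustrate the head/pipefail failure mode mentioned in the first bugfix: with set -e and set -o pipefail (as the NGS_Automated scripts typically use), a writer that is still producing output when head exits can receive SIGPIPE and fail the whole pipeline, whereas tail reads its input to the end. The file name and commands below are hypothetical:

```bash
#!/bin/bash
set -eo pipefail

samplesheet='example.csv'   # hypothetical input file

# Problematic: head exits after printing one line; if grep is still writing,
# it receives SIGPIPE and with pipefail the whole pipeline (and script) fails.
#firstLine="$(grep -v '^#' "${samplesheet}" | head -1)"

# Safer: tail reads its input to the end, so the writer never sees a closed pipe.
lastLine="$(grep -v '^#' "${samplesheet}" | tail -1)"
echo "${lastLine}"
```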
4.1.0
bugfixes:
- moveAndCheckSamplesheet converts from dos/mac to unix format with sed commands (instead of a non-existing mac2unix/dos2unix on the new machines); see the sketch after this list
- moveAndCheckSamplesheet: when the samplesheet is a genomescan samplesheet, certain columns (required for in-house processing) are not filled, so for GS samplesheets this check is skipped
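
A minimal sketch of a sed-based DOS/Mac to UNIX line-ending conversion as described in the first bugfix; it assumes GNU sed, and the exact expressions in moveAndCheckSamplesheets.sh may differ:

```bash
#!/bin/bash
set -euo pipefail

samplesheet="$1"   # path to the samplesheet to normalize

# DOS (CRLF) -> UNIX (LF): strip trailing carriage returns (GNU sed assumed).
sed -i 's/\r$//' "${samplesheet}"

# Old Mac (CR only) -> UNIX (LF): turn any remaining carriage returns into newlines.
sed -i 's/\r/\n/g' "${samplesheet}"
```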
added:
- capturing kit name to the project name in DRAGEN data (PullAndProcessAnalysis.sh)
- cleanup script that cleans up data on tmp
- cleaning up samplesheets on tmp (is being distributed over all clusters)
- check in copyProjectDataToPrm that verifies whether the rawdata has been copied to prm
- check in copyRawDataToPrm that skips the splitting of the samplesheet per project when it is demultiplexing only
- parsing of the metrics file for the trendanalysis
- new group config: umcg-pr.cfg
- .discarde in atd and gd config; a notification for a failed demultiplexing will be mailed
- cleaned up code in startPipeline
- updated some configs
4.0.0
Introduction of the bucket system.
The analysis column (for NGS_DNA and GAP) in the samplesheet determines which pipelines will be run. Running multiple pipelines can be selected with a + sign (e.g. NGS_Demultiplexing+NGS_DNA); a minimal parsing sketch follows below.
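
A minimal sketch of parsing such a '+' separated analysis value from a comma-separated samplesheet; the column name, file layout and variable names are assumptions for illustration, not taken from the NGS_Automated code:

```bash
#!/bin/bash
set -euo pipefail

samplesheet='ProjectXX.csv'   # hypothetical comma-separated samplesheet

# Find the (1-based) index of the "analysis" column in the header line.
header="$(head -1 "${samplesheet}")"
analysisCol="$(echo "${header}" | tr ',' '\n' | grep -i -n -x 'analysis' | cut -d: -f1)"

# Take the analysis value of the first data row and split it on '+'.
analysisValue="$(sed -n '2p' "${samplesheet}" | cut -d',' -f"${analysisCol}")"
IFS='+' read -r -a pipelines <<< "${analysisValue}"

for pipeline in "${pipelines[@]}"
do
    echo "INFO: samplesheet requests pipeline: ${pipeline}"
done
```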
- bucket system for pipelines (on tmp0X)
  - the folders tmp, Samplesheets, projects and runs have an additional subfolder with the pipeline name, e.g. projects/NGS_DNA/ProjectXX and tmp/GAP/GSA_v3-XXX
  - the logs folder and prm remain the same as before
- new script: moveAndCheckSamplesheets.sh (originated from MoveSamplesheets)
  - -d dat_dir, to overwrite the server.cfg/sharedConfig variable DAT_ROOT_DIR
  - parses the samplesheet to see what the first step of the pipeline is (and sends it to the correct bucket/samplesheets folder)*
    * the value in the samplesheet is hardcoded until the Darwin team makes this a variable (e.g. for the NGS_DNA pipeline it is by default NGS_Demultiplexing+NGS_DNA)
- new script: splitAndMoveSamplesheetPerProject.sh, which handles the splitting of the samplesheet into projects and moving it (if required) from the NGS_Demultiplexing to the NGS_DNA Samplesheets folder
- new script: copyRawDataToTmp.sh, which runs on the chaperone machines (where the prm storage is mounted) and handles the copying of the rawdata to tmp (since the introduction of the new diagnostic clusters it is no longer possible to pull data). This step is required when the rawdata is no longer available on the diagnostic cluster.
  - the script will scan the logs directory of tmp0X on the diagnostic clusters and search for ${project}.data.requested files. These files are created by the NGS_DNA/GAP pipeline and are already in the correct format to be used directly by the rsync command (see the sketch after this item).
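
A minimal sketch of the scan-and-rsync idea described above, assuming each ${project}.data.requested file lists one path per line that rsync can consume via --files-from; all directory paths are hypothetical:

```bash
#!/bin/bash
set -euo pipefail
shopt -s nullglob

# Hypothetical locations; the real script derives these from the group configs.
logsDir='/groups/umcg-gd/tmp01/logs'
rawDataSource='/groups/umcg-gd/prm06/rawdata/ngs'
rawDataDest='/groups/umcg-gd/tmp01/rawdata/ngs'

# For every ${project}.data.requested file, rsync the listed rawdata to tmp.
for requestFile in "${logsDir}"/*/*.data.requested
do
    project="$(basename "${requestFile}" .data.requested)"
    echo "INFO: copying requested rawdata for project ${project} ..."
    rsync -av --files-from="${requestFile}" "${rawDataSource}/" "${rawDataDest}/"
done
```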
- Dragen pipeline is also part of NGS_Automated
  - Merged PullRawDataFromDS.sh and processGsRawData.sh into one file: PullAndProcessGsRawData.sh
    - the raw data from Genomescan is now one directory level deeper (e.g. ${gsBatch}/Raw_data/123.fastq.gz instead of 104832-062/123.fastq.gz)
  - Created new file for pulling and processing Dragen data: PullAndProcessGsAnalysisData.sh
    - analysis data (such as bams, gvcf and vcf files) are in the ${gsBatch}/Analysis/ folder, with one folder per sample (e.g. 104832-062/Analysis/sample1/sample1.gvcf.gz)
    - the script will merge all the A, B, C etc. samplesheets (e.g. GS_118A, GS_118B) into one samplesheet without the suffix (e.g. GS_118.csv)
  - new script that runs the Dragen pipeline: startDragenPipeline.sh
    - it is executed in the umcg-genomescan group or in its test group (umcg-gst)
    - it will execute the NGS_DNA pipeline with the workflow_DRAGEN.csv workflow (also part of NGS_DNA)
- copyProjectDataToPrm.sh has extra arguments
  - -d dat_dir, to overwrite the server.cfg/sharedConfig variable DAT_ROOT_DIR
  - -p, in which samplesheets folder (which pipeline) the script should search (e.g. NGS_DNA, GAP)
  - rawdata will be processed on the same machine as where the NGS_DNA and GAP pipelines run; a check is built in to verify whether the rawdata has already been copied to prm, and the project data will not be copied until it has (see the sketch after this item)
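
A minimal sketch of such a copy guard, reusing the run01.rawDataCopiedToPrm.finished flag file mentioned below for copyRawDataToPrm.sh; the logs location is a placeholder and the real script's checks are more elaborate:

```bash
#!/bin/bash
set -euo pipefail

project="$1"
logsDir="/groups/umcg-gd/tmp01/logs/${project}"   # placeholder logs location

# Only copy the project data to prm once the rawdata has reached prm.
if [[ -e "${logsDir}/run01.rawDataCopiedToPrm.finished" ]]
then
    echo "INFO: rawdata for ${project} is on prm; copying project data ..."
    # rsync of the project data would happen here.
else
    echo "INFO: rawdata for ${project} is not yet on prm; skipping for now."
fi
```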
- copyRawDataToPrm.sh has extra arguments:
  - -p, in which samplesheets folder (which pipeline) the script should search (e.g. NGS_Demultiplexing, DRAGEN, AGCT)
  - -f, this argument can be used when the user that executes the script is pulling rawdata that does not come from the in-house demultiplexing but from a different group (in case of genomescan/dragen the option looks like this: -f run01.processGsRawData.finished). This will overwrite the RAWDATAPROCESSINGFINISHED parameter in the group.csv file. This argument will also set a mergedSamplesheet variable to true (see below for more info about the mergedSamplesheet).
  - if the data is copied, a message will be sent to the diagnostic cluster that the rawdata is finished (this is needed as explained above in the copyProjectDataToPrm part)
    - for regular in-house data it will create, per project, run01.rawDataCopiedToPrm.finished in the logs folder on the diagnostic cluster
    - in case of genomescan/dragen, when the mergedSamplesheet variable is true, run01.rawDataCopiedToPrm.finished will only be created in the merged projectfolder name (e.g. GS_118)
- new Trendanalysis scripts:
  - copyQcDataToTmp.sh (copies all the data from a chaperone to a diagnostic cluster)
  - trendanalyse.sh (runs the actual trendanalysis)
  - copyTrendAnalysisDataToPrm.sh (copies the reports back to prm)
- notification script now sends messages to MS Teams instead of mailing
  - when a message is sent to MS Teams, this is recorded in the .channelsnotified logfile (see the sketch below)
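
A minimal sketch of posting a notification to an MS Teams incoming webhook and recording it in a .channelsnotified logfile; the webhook URL, message and file path are placeholders, not the repository's actual configuration:

```bash
#!/bin/bash
set -euo pipefail

# Placeholder values; in NGS_Automated these would come from the group configs.
webhookUrl='https://example.webhook.office.com/webhookb2/your-channel-id'
message='ProjectXX: pipeline finished'
logFile='/groups/umcg-gd/tmp01/logs/ProjectXX/run01.pipeline.finished.channelsnotified'

# Post a simple text message to the Teams channel via its incoming webhook.
curl --silent --show-error --header 'Content-Type: application/json' \
     --data "{\"text\": \"${message}\"}" \
     "${webhookUrl}"

# Record that this channel was notified, so the same message is not sent twice.
touch "${logFile}"
```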
3.8.0
notifications.sh:
- added a debugging argument (-s phase:state); this option lets you run/generate output solely for that specific combination (see the sketch below)
- new method for timing scripts (the max duration can be set via the group cfg files)
copyProjectDataToPrm.sh:
- removing the samplesheet
array added:
- gendercheck
- missingsamples check
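
A minimal sketch of how a -s phase:state debug option could be parsed with getopts; the variable names and usage text are illustrative only:

```bash
#!/bin/bash
set -euo pipefail

debugPhase=''
debugState=''

# Parse options; -s expects a "phase:state" value, e.g. "pipeline:finished".
while getopts 's:' opt
do
    case "${opt}" in
        s)
            debugPhase="${OPTARG%%:*}"   # part before the colon
            debugState="${OPTARG##*:}"   # part after the colon
            ;;
        *)
            echo "Usage: $(basename "${0}") [-s phase:state]" >&2
            exit 1
            ;;
    esac
done

if [[ -n "${debugPhase}" ]]
then
    echo "DEBUG: only generating output for phase=${debugPhase} state=${debugState}."
fi
```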
NGS_Automated-3.7.1
Merge pull request #223 from RoanKanninga/master: essential bugfix for track and trace
NGS_Automated-3.7.0
- added trendAnalysis.sh to monitor lab, NGS and array results over time.
- minor changes to pipelineTiming.sh
- minor changes to copyProjectDataToPrm.sh