busco needs --offline flag #187

Open
AlcaArctica opened this issue Jan 31, 2025 · 1 comment
awaiting-feedback Waiting for input from user bug Something isn't working

Comments


AlcaArctica commented Jan 31, 2025

Description of the bug

Hi, I have been looking at your pipeline and it seems really interesting. I gave it a test run and it produced some results. However, I am having an issue with BUSCO. I am running the Nextflow pipeline with Singularity and would like to use a pre-installed BUSCO database provided by our system administrators. The database location is, of course, read-only. I specified the BUSCO database path in the Nextflow config:

    // BUSCO options
    busco_skip                          = false
    busco_mode                          = "genome"
    busco_lineage_datasets              = null
    busco_download_path                 = "/raven/ri/public_sequence_data/sanger-tol/blobtoolkit/2024_11/busco"

but I get the following error: OSError: [Errno 30] Read-only file system: 'busco/file_versions.tsv'.

I believe this issue is described here: https://gitlab.com/ezlab/busco/-/issues/560

So it could be solved by passing --offline to busco, but there is no place to put this flag in your Nextflow config file.
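
A quick way to confirm this diagnosis is to check whether the download directory is writable, since the traceback below shows BUSCO trying to write file_versions.tsv into it. The following is a minimal sketch, not part of the pipeline: the function name check_busco_db is hypothetical, and the path is the one from the config above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a BUSCO download directory
# is missing, writable, or read-only for the current user.
check_busco_db() {
    local db_path="$1"
    if [ ! -d "$db_path" ]; then
        echo "missing"
    elif [ -w "$db_path" ]; then
        echo "writable"
    else
        echo "read-only"
    fi
}

# Path taken from the config above; adjust it for your own system.
check_busco_db "/raven/ri/public_sequence_data/sanger-tol/blobtoolkit/2024_11/busco"
```

If this reports "read-only", BUSCO cannot refresh file_versions.tsv in that directory, which matches the OSError above.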

What would you suggest?

Here is the pipeline output for my command:

nextflow run plant-food-research-open/assemblyqc -revision 2.2.1 -profile mpcdf,raven --input assemblysheet.csv --outdir result --busco_lineage_datasets aves_odb10

...
executor >  slurm (6)
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:GUNZIP_FASTA                                                             -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:GUNZIP_GFF3                                                              -
[64/4ebaac] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTAVALIDATOR (bPacMac)                                                 [100%] 1 of 1 ✔
[34/995987] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:SEQKIT_RMDUP (bPacMac)                                                   [100%] 1 of 1 ✔
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:GFF3_GT_GFF3_GFF3VALIDATOR_STAT:GT_GFF3                                  -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:GFF3_GT_GFF3_GFF3VALIDATOR_STAT:GT_GFF3VALIDATOR                         -
[27/b6465f] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:GFF3_GT_GFF3_GFF3VALIDATOR_STAT:SAMTOOLS_FAIDX (yahs_scaffolds_final.fa) [100%] 1 of 1 ✔
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:GFF3_GT_GFF3_GFF3VALIDATOR_STAT:GT_STAT                                  -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FCS_FCSADAPTOR                                                           -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:NCBI_FCS_GX:NCBI_FCS_GX_SETUP_SAMPLE                                     -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:NCBI_FCS_GX:NCBI_FCS_GX_SCREEN_SAMPLES                                   -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:NCBI_FCS_GX:NCBI_FCS_GX_KRONA_PLOT                                       -
[75/bb01d8] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:TAG_ASSEMBLY (bPacMac)                                                   [100%] 1 of 1 ✔
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FETCHNGS:CUSTOM_SRATOOLSNCBISETTINGS                                     -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FETCHNGS:SRATOOLS_PREFETCH                                               -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FETCHNGS:SRATOOLS_FASTERQDUMP                                            -
[ad/6b0ce6] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:ASSEMBLATHON_STATS (bPacMac)                                             [100%] 1 of 1 ✔
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:GFASTATS                                                                 -
[66/d51db8] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_GXF_BUSCO_PLOT:BUSCO_ASSEMBLY (bPacMac)                            [100%] 1 of 1, failed: 1 ✘
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_GXF_BUSCO_PLOT:PLOT_ASSEMBLY                                       -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_GXF_BUSCO_PLOT:EXTRACT_PROTEINS                                    -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_GXF_BUSCO_PLOT:BUSCO_ANNOTATION                                    -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_GXF_BUSCO_PLOT:PLOT_ANNOTATION                                     -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_EXPLORE_SEARCH_PLOT_TIDK:FILTER_BY_LENGTH                          -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_EXPLORE_SEARCH_PLOT_TIDK:SORT_BY_LENGTH                            -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_EXPLORE                              -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_SEARCH_APRIORI                       -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_SEARCH_APOSTERIORI                   -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_PLOT_APRIORI                         -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_EXPLORE_SEARCH_PLOT_TIDK:TIDK_PLOT_APOSTERIORI                     -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_LTRRETRIEVER_LAI:UNMASK_IF_ANY                                     -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_LTRRETRIEVER_LAI:CUSTOM_SHORTENFASTAIDS                            -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_LTRRETRIEVER_LAI:LTRHARVEST                                        -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_LTRRETRIEVER_LAI:LTRFINDER                                         -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_LTRRETRIEVER_LAI:CAT_CAT                                           -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_LTRRETRIEVER_LAI:LTRRETRIEVER_LTRRETRIEVER                         -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_LTRRETRIEVER_LAI:LTRRETRIEVER_LAI                                  -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_LTRRETRIEVER_LAI:CUSTOM_RESTOREGFFIDS                              -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_KRAKEN2:UNTAR                                                      -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_KRAKEN2:KRAKEN2                                                    -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_KRAKEN2:KRAKEN2_KRONA_PLOT                                         -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:FASTQ_FASTQC_UMITOOLS_FASTP:FASTQC_RAW                            -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:FASTQ_FASTQC_UMITOOLS_FASTP:FASTP                                 -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:FASTQ_FASTQC_UMITOOLS_FASTP:FASTQC_TRIM                           -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:SEQKIT_SORT                                                       -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:FASTQ_BWA_MEM_SAMBLASTER:BWA_INDEX                                -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:FASTQ_BWA_MEM_SAMBLASTER:BWA_MEM                                  -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:FASTQ_BWA_MEM_SAMBLASTER:SAMBLASTER                               -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:HICQC                                                             -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:MAKEAGPFROMFASTA                                                  -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:AGP2ASSEMBLY                                                      -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:ASSEMBLY2BEDPE                                                    -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:MATLOCK_BAM2_JUICER                                               -
[-        ] process > PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FQ2HIC:JUICER_SORT                                                       -
Plus 30 more processes waiting for tasks…
Execution cancelled -- Finishing pending tasks before exit
-[plant-food-research-open/assemblyqc] Pipeline completed with errors-
ERROR ~ Error executing process > 'PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_GXF_BUSCO_PLOT:BUSCO_ASSEMBLY (bPacMac)'

Caused by:
  Process `PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_GXF_BUSCO_PLOT:BUSCO_ASSEMBLY (bPacMac)` terminated with an error exit status (1)


Command executed:

  # Nextflow changes the container --entrypoint to /bin/bash (container default entrypoint: /usr/local/env-execute)
  # Check for container variable initialisation script and source it.
  if [ -f "/usr/local/env-activate.sh" ]; then
      set +u  # Otherwise, errors out because of various unbound variables
      . "/usr/local/env-activate.sh"
      set -u
  fi
  
  # If the augustus config directory is not writable, then copy to writeable area
  if [ ! -w "${AUGUSTUS_CONFIG_PATH}" ]; then
      # Create writable tmp directory for augustus
      AUG_CONF_DIR=$( mktemp -d -p $PWD )
      cp -r $AUGUSTUS_CONFIG_PATH/* $AUG_CONF_DIR
      export AUGUSTUS_CONFIG_PATH=$AUG_CONF_DIR
      echo "New AUGUSTUS_CONFIG_PATH=${AUGUSTUS_CONFIG_PATH}"
  fi
  
  # Ensure the input is uncompressed
  INPUT_SEQS=input_seqs
  mkdir "$INPUT_SEQS"
  cd "$INPUT_SEQS"
  for FASTA in ../tmp_input/*; do
      if [ "${FASTA##*.}" == 'gz' ]; then
          gzip -cdf "$FASTA" > $( basename "$FASTA" .gz )
      else
          ln -s "$FASTA" .
      fi
  done
  cd ..
  
  busco \
      --cpu 6 \
      --in "$INPUT_SEQS" \
      --out bPacMac-aves_odb10-busco \
      --mode genome \
      --lineage_dataset aves_odb10 \
      --download_path busco \
       \
      --metaeuk
  
  # clean up
  rm -rf "$INPUT_SEQS"
  
  # Move files to avoid staging/publishing issues
  mv bPacMac-aves_odb10-busco/batch_summary.txt bPacMac-aves_odb10-busco.batch_summary.txt
  mv bPacMac-aves_odb10-busco/*/short_summary.*.{json,txt} . || echo "Short summaries were not available: No genes were found."
  
  cat <<-END_VERSIONS > versions.yml
  "PLANTFOODRESEARCHOPEN_ASSEMBLYQC:ASSEMBLYQC:FASTA_GXF_BUSCO_PLOT:BUSCO_ASSEMBLY":
      busco: $( busco --version 2>&1 | sed 's/^BUSCO //' )
  END_VERSIONS

Command exit status:
  1

Command output:
  2025-01-31 11:49:32 INFO:	***** Start a BUSCO v5.7.1 analysis, current time: 01/31/2025 11:49:32 *****
  2025-01-31 11:49:32 INFO:	Configuring BUSCO with local environment
  2025-01-31 11:49:32 INFO:	Running genome mode
  2025-01-31 11:49:32 INFO:	Downloading information on latest versions of BUSCO data...

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
  Traceback (most recent call last):
    File "/usr/local/bin/busco", line 54, in <module>
      run_BUSCO.main()
    File "/usr/local/lib/python3.7/site-packages/busco/run_BUSCO.py", line 502, in main
      busco_run.run()
    File "/usr/local/lib/python3.7/site-packages/busco/run_BUSCO.py", line 68, in run
      self.load_config()
    File "/usr/local/lib/python3.7/site-packages/busco/run_BUSCO.py", line 60, in load_config
      self.config_manager.load_busco_config_main()
    File "/usr/local/lib/python3.7/site-packages/busco/BuscoLogger.py", line 62, in wrapped_func
      self.retval = func(*args, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/busco/ConfigManager.py", line 63, in load_busco_config_main
      self.config_main.validate()
    File "/usr/local/lib/python3.7/site-packages/busco/BuscoConfig.py", line 640, in validate
      self._init_downloader()
    File "/usr/local/lib/python3.7/site-packages/busco/BuscoConfig.py", line 440, in _init_downloader
      self.downloader = BuscoDownloadManager(self)
    File "/usr/local/lib/python3.7/site-packages/busco/BuscoDownloadManager.py", line 53, in __init__
      self._obtain_versions_file()
    File "/usr/local/lib/python3.7/site-packages/busco/BuscoLogger.py", line 62, in wrapped_func
      self.retval = func(*args, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/busco/BuscoDownloadManager.py", line 77, in _obtain_versions_file
      urllib.request.urlretrieve(remote_filepath, local_filepath)
    File "/usr/local/lib/python3.7/urllib/request.py", line 257, in urlretrieve
      tfp = open(filename, 'wb')
  OSError: [Errno 30] Read-only file system: 'busco/file_versions.tsv'

Work dir:
  /raven/ptmp/luelze/nextflow/assemblyqc/work/66/d51db878fc06c903e20e457c9f0974

Container:
  /u/luelze/sw/nextflow/cache/depot.galaxyproject.org-singularity-busco-5.7.1--pyhdfd78af_0.img

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

Command used and terminal output

Relevant files

No response

System information

No response

@AlcaArctica AlcaArctica added the bug Something isn't working label Jan 31, 2025
Member

GallVp commented Jan 31, 2025

Thank you @AlcaArctica for the issue. Yes, BUSCO's offline mode is not supported through the pipeline parameters. Nonetheless, you can turn it on by creating a custom.config file with the following contents and passing it via the -c parameter.

custom.config

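process {
    // Keep --metaeuk, which the pipeline already passes (see the failing command above), and add --offline
    withName: 'BUSCO_BUSCO' {
        ext.args = '--metaeuk --offline'
    }
}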

Updated command

nextflow run plant-food-research-open/assemblyqc -revision 2.2.1 -profile mpcdf,raven -c /path/to/custom.config --input assemblysheet.csv --outdir result --busco_lineage_datasets aves_odb10
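
To check that the flag actually reached BUSCO, one option is to inspect the .command.sh script that Nextflow writes into the task's work directory (the "Work dir" path printed in the error report). The following is a minimal sketch: the function name has_offline_flag and the WORK_DIR variable are hypothetical and only serve as an illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the task script in a Nextflow
# work directory passes --offline to the command it runs.
has_offline_flag() {
    local work_dir="$1"
    grep -q -- '--offline' "$work_dir/.command.sh"
}

# WORK_DIR is a placeholder for the "Work dir" that Nextflow reports
# for the BUSCO task; nothing is checked unless it is set.
if [ -n "${WORK_DIR:-}" ]; then
    if has_offline_flag "$WORK_DIR"; then
        echo "--offline is present"
    else
        echo "--offline is missing"
    fi
fi
```

If the flag is missing, the selector in custom.config probably did not match the process, and the process name in the error message is the first thing to compare against.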

@GallVp GallVp self-assigned this Jan 31, 2025
@GallVp GallVp added the awaiting-feedback Waiting for input from user label Jan 31, 2025