run gets stuck in binding step (anansesnake) #14

Closed
janursa opened this issue Apr 1, 2024 · 4 comments
janursa commented Apr 1, 2024

I submitted a job with anansnake, and it doesn't seem to be progressing. This is the log file of the job after 120 hours on 6 cores. The same happened after 40 hours using 40 cores.

Config
rna_samples            : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/samplefile.tsv
rna_tpms               : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/TPM.tsv
rna_counts             : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/RNA_Counts.tsv
atac_samples           : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/samplefile.tsv
atac_counts            : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv
genome                 : hg38
result_dir             : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6
contrasts              : ['anansesnake_B-cells_average', 'anansesnake_agg-type_average', 'anansesnake_NK-cells_average', 'anansesnake_Myeloid-cells_average']
database               : gimme.vertebrate.v5.0
jaccard                : 0.1
edges                  : 500000
padj                   : 0.05
plot_type              : png
tmp_dir                : None

Resources
mem_mb                 : 60000
_cores                 : 6
deseq2                 : 1

Conditions
B-cells                :
  RNA-seq samples:  ['B-cells']
  ATAC-seq samples: ['B-cells']
average                :
  RNA-seq samples:  ['average']
  ATAC-seq samples: ['average']
agg-type               :
  RNA-seq samples:  ['agg-type']
  ATAC-seq samples: ['agg-type']
NK-cells               :
  RNA-seq samples:  ['NK-cells']
  ATAC-seq samples: ['NK-cells']
Myeloid-cells          :
  RNA-seq samples:  ['Myeloid-cells']
  ATAC-seq samples: ['Myeloid-cells']

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=60000, deseq2=1
Job stats:
job              count    min threads    max threads
-------------  -------  -------------  -------------
all                  1              1              1
binding              5              1              1
influence            4              1              1
maelstrom            1              6              6
motif2factors        1              6              6
network              5              1              1
pfmscorefile         1              6              6
plot                 4              1              1
total               22              1              6

Select jobs to execute...

[Mon Mar 25 18:18:31 2024]
rule motif2factors:
    input: /beegfs/desy/user/nourisaj/genomes/hg38
    output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_m2f.txt
    jobid: 5
    reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    threads: 6
    resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 18:18:44 2024]
Finished job 5.
1 of 22 steps (5%) done
Select jobs to execute...

[Mon Mar 25 18:18:45 2024]
rule maelstrom:
    input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/genomes/hg38
    output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38-maelstrom
    log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_maelstrom.txt
    jobid: 25
    reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38-maelstrom; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    threads: 6
    resources: tmpdir=/tmp, mem_mb=40000

Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 19:20:17 2024]
Finished job 25.
2 of 22 steps (9%) done
Select jobs to execute...

[Mon Mar 25 19:20:17 2024]
rule pfmscorefile:
    input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/genomes/hg38
    output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv
    log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_pfmscorefile.txt
    jobid: 6
    reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    threads: 6
    resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 19:56:53 2024]
Finished job 6.
3 of 22 steps (14%) done
Select jobs to execute...

[Mon Mar 25 19:56:53 2024]
rule binding:
    input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv, /beegfs/desy/user/nourisaj/genomes/hg38
    output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/Myeloid-cells.h5
    log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/log_Myeloid-cells.txt
    jobid: 23
    benchmark: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/benchmarks/binding_Myeloid-cells.txt
    reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/Myeloid-cells.h5; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    wildcards: condition=Myeloid-cells
    resources: tmpdir=/tmp, mem_mb=40000

Activating conda environment: .snakemake/conda/d744163a4690c04ba52f3bf00737fc7a_
slurmstepd: error: *** JOB 6311320 ON max-wn050 CANCELLED AT 2024-03-30T18:18:31 DUE TO TIME LIMIT ***
@Arts-of-coding

Hi @janursa,

Can you provide the output of this log file?
/beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/log_Myeloid-cells.txt

I noticed that the "tmpdir" is set to its default location. It might help to point tmpdir to a non-default location that certainly has enough space. You can run the following command directly before the anansnake step: export TMPDIR=/...../tmp/
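
As a rough way to verify the temporary directory before launching the run, here is a minimal Python sketch. It only reports the free space at the location that TMPDIR (or the system default) points to. The helper names and the `min_free_gib` threshold are made up for illustration and are not part of anansnake.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space at `path` in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def report_tmp_space(min_free_gib=50.0):
    """Report free space in the temp directory.

    `min_free_gib` is a hypothetical threshold; pick a value that fits
    your data. Returns (directory, free GiB, whether it meets the threshold).
    """
    # Prefer an explicitly exported TMPDIR, else fall back to the system default.
    tmp = os.environ.get("TMPDIR") or tempfile.gettempdir()
    free = free_gib(tmp)
    return tmp, free, free >= min_free_gib


if __name__ == "__main__":
    tmp, free, ok = report_tmp_space()
    print(f"{tmp}: {free:.1f} GiB free, sufficient: {ok}")
```

Running this inside the same job script, after the export, shows which directory the job would actually use.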

janursa commented Apr 3, 2024

Regarding the default /tmp/: I checked it, and it has enough space.

The log_Myeloid-cells.txt file has the following content:


Matplotlib is building the font cache; this may take a moment.
2024-03-25 19:58:24 | INFO | Loading specified motif file: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
2024-03-25 19:58:39 | INFO |   Using motifs for 828 factors
2024-03-25 19:58:45 | INFO | Loading pre-scanned motif scores
2024-03-25 19:59:40 | INFO |   Using 135358 regions
2024-03-25 19:59:40 | INFO | Loading ATAC data
2024-03-25 19:59:40 | INFO |   Using 1 columns
2024-03-25 19:59:41 | WARNING | Expected region width is 200, got 841.
2024-03-25 19:59:41 | DEBUG | Quantile normalization for ATAC
2024-03-25 19:59:45 | INFO |   Columns being used for model type: ['ATAC', 'motif']
2024-03-25 19:59:45 | INFO | Loading models
2024-03-25 19:59:47 | INFO |   Using 238 models
2024-03-25 19:59:49 | INFO | Predicting TF activity
2024-03-25 19:59:49 | INFO |     Motif activity prediction on ATAC data, run 1/3
2024-03-25 19:59:58,494 - INFO - motif scanning (scores)
2024-03-25 19:59:58,494 - INFO - reading table
2024-03-25 20:00:12,414 - INFO - creating score table (z-score, GC%)
2024-03-25 20:56:31,074 - INFO - done
2024-03-25 20:56:31,075 - INFO - creating dataframe
2024-03-25 20:56:54,848 - INFO - Fitting BayesianRidge
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:47<00:00, 47.39s/it]
100%|██████████| 1/1 [00:47<00:00, 47.39s/it]
2024-03-25 20:57:51,310 - INFO - Done
2024-03-25 20:57:51 | INFO |     Motif activity prediction on ATAC data, run 2/3
2024-03-25 20:58:00,260 - INFO - motif scanning (scores)
2024-03-25 20:58:00,261 - INFO - reading table
2024-03-25 20:58:05,182 - INFO - using 14000 sequences
2024-03-25 20:58:05 | INFO |     Motif activity prediction on ATAC data, run 3/3
2024-03-25 20:58:13,983 - INFO - motif scanning (scores)
2024-03-25 20:58:13,983 - INFO - reading table

@Arts-of-coding

Hi @janursa,

Good that you checked that. I noticed that you get a warning about the expected region width (841 bp instead of 200 bp), which implies that these regions are either not peaks or very broad peaks. Since "reading table" comes after "Fitting BayesianRidge", it might be related to a previous issue: ANANSE issue 90. Could this still be a problem, @siebrenf?

For now, I would suggest trying anansnake with the example data or the constructed data (with 200 bp peaks) from the anansescanpy package.

If this works, it means that your peaks need to be trimmed. I would suggest using snapatac2 to generate a cell x peak matrix. Alternatively, you can create this matrix with Seurat and Signac in R; see the vignette.
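
To illustrate what trimming to 200 bp could look like, here is a small Python sketch that recenters a region on its midpoint. It assumes region names of the form `chrom:start-end`; the function name is hypothetical, and this is not how snapatac2 or Signac do it. Renaming regions alone does not update the counts, so the counts would still need to be recomputed over the new regions.

```python
def resize_region(region, width=200):
    """Return a `width`-bp region centered on the midpoint of `region`.

    Assumes the format 'chrom:start-end' (an assumption about the
    peak table, not something confirmed in this thread).
    """
    chrom, coords = region.rsplit(":", 1)
    start, end = (int(x) for x in coords.split("-"))
    mid = (start + end) // 2
    # Clamp at 0 so regions near the chromosome start stay valid.
    new_start = max(0, mid - width // 2)
    return f"{chrom}:{new_start}-{new_start + width}"


if __name__ == "__main__":
    # An 841 bp region, like the width reported in the warning.
    print(resize_region("chr1:1000-1841"))
```

Nearby broad peaks can collapse onto overlapping or identical 200 bp windows, so it is worth checking for duplicates afterwards.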

janursa commented Apr 19, 2024

It turned out that something was wrong with the installation. I reinstalled the package and everything ran through.

janursa closed this as completed Apr 19, 2024