This repository has been archived by the owner on Mar 30, 2020. It is now read-only.

How to deal with output directories #73

Open
Gibbsdavidl opened this issue Aug 8, 2019 · 2 comments

Gibbsdavidl commented Aug 8, 2019

Hello,

When running a job with "gcloud alpha genomics pipelines run", my output is a couple of different directories:
/mnt/data/output/A
/mnt/data/output/B

Is there any way to copy directories A and B to my GCS bucket without naming every file?

It fails because the pipeline tries: gsutil cp /mnt/data/output/* gs://my_bucket
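As far as I can tell, what's actually needed is a recursive copy, something like:

gsutil -m cp -r /mnt/data/output gs://my_bucket/

but the pipelines output copying doesn't appear to do that for wildcard paths, and gsutil cp skips directories unless you pass -r.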

Similar to the samtools example yaml, I have:
outputParameters:
- name: outputPath
  description: Cloud Storage path for where bamtofastq writes
  localCopy:
    path: output/*
    disk: datadisk

And:
gcloud alpha genomics pipelines run \
  --pipeline-file my.yaml \
  --inputs bamfiles.bam \
  --outputs outputPath=gs://cgc_bam_bucket_007/output/

I was thinking that in the docker cmd: > section, the output directories could be tarred up, so that the output is just a single tarball. But that's not a great solution.
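For the record, something like this at the end of the docker cmd is what I had in mind (just a sketch; "results.tar.gz" is a made-up name):

# Bundle the output directories into a single archive so the
# single-file output copy can handle it.
tar -czf /mnt/data/output/results.tar.gz -C /mnt/data/output A B
# The localCopy path above would then change from "output/*" to
# "output/results.tar.gz" so only the archive is copied.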

Please help?

mbookman (Contributor) commented Aug 8, 2019

Hi @Gibbsdavidl!

I would recommend that you use dsub. I think it will provide a better experience than the gcloud command line, including support for wildcards and recursive inputs and outputs.

See https://github.com/DataBiosphere/dsub#working-with-input-and-output-files-and-folders.
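For example, a minimal sketch using dsub's --output-recursive flag (the project, zones, bucket, image, and run_my_tool command below are placeholders for your real values):

dsub \
  --provider google-v2 \
  --project my-project \
  --zones "us-central1-*" \
  --logging gs://my_bucket/logs \
  --image my-registry/my-image \
  --input BAM=gs://my_bucket/bamfiles.bam \
  --output-recursive OUTPUT_DIR=gs://cgc_bam_bucket_007/output \
  --command 'run_my_tool --in "${BAM}" --out "${OUTPUT_DIR}"'

dsub exposes each --input/--output name as an environment variable inside the container, and with --output-recursive it copies everything under ${OUTPUT_DIR} back to the bucket when the job completes, so there's no need to name individual files.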

Gibbsdavidl (Author) commented

Hey there!!

Good call. I'm already having a better time... so much easier for what I want.
It's really come a long way (in terms of development)!

-dave
