This repository has been archived by the owner on Mar 30, 2020. It is now read-only.

How to deal with output directories #73

Open
Gibbsdavidl opened this issue Aug 8, 2019 · 2 comments

Gibbsdavidl commented Aug 8, 2019

Hello,

When running a job with "gcloud alpha genomics pipelines run", my output is a couple of different directories:
/mnt/data/output/A
/mnt/data/output/B

Is there any way to copy directories A and B to my GCS bucket without naming every file?

It fails because the pipeline tries: gsutil cp /mnt/data/output/* gs://my_bucket
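As far as I can tell, what's actually needed is a recursive copy, something like:

gsutil -m cp -r /mnt/data/output gs://my_bucket/

but the pipelines output copying doesn't appear to do that for wildcard paths, and gsutil cp skips directories unless you pass -r.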

Similar to the samtools example yaml, I have:
outputParameters:
- name: outputPath
  description: Cloud Storage path for where bamtofastq writes
  localCopy:
    path: output/*
    disk: datadisk

And:
gcloud alpha genomics pipelines run \
  --pipeline-file my.yaml \
  --inputs bamfiles.bam \
  --outputs outputPath=gs://cgc_bam_bucket_007/output/

I was thinking that in the docker cmd: > section, the output directories could be tarred up, so that the output is just a single tarball. But that's not a great solution.
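For the record, something like this at the end of the docker cmd is what I had in mind (just a sketch; "results.tar.gz" is a made-up name):

# Bundle the output directories into a single archive so the
# single-file output copy can handle it.
tar -czf /mnt/data/output/results.tar.gz -C /mnt/data/output A B
# The localCopy path above would then change from "output/*" to
# "output/results.tar.gz" so only the archive is copied.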

Please help?

mbookman (Contributor) commented Aug 8, 2019

Hi @Gibbsdavidl!

I would recommend that you use dsub. I think it will provide a better experience than the gcloud command line, including support for wildcards and recursive inputs and outputs.

See https://github.com/DataBiosphere/dsub#working-with-input-and-output-files-and-folders.
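For example, a minimal sketch using dsub's --output-recursive flag (the project, zones, bucket, image, and run_my_tool command below are placeholders for your real values):

dsub \
  --provider google-v2 \
  --project my-project \
  --zones "us-central1-*" \
  --logging gs://my_bucket/logs \
  --image my-registry/my-image \
  --input BAM=gs://my_bucket/bamfiles.bam \
  --output-recursive OUTPUT_DIR=gs://cgc_bam_bucket_007/output \
  --command 'run_my_tool --in "${BAM}" --out "${OUTPUT_DIR}"'

dsub exposes each --input/--output name as an environment variable inside the container, and with --output-recursive it copies everything under ${OUTPUT_DIR} back to the bucket when the job completes, so there's no need to name individual files.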

Gibbsdavidl (Author) commented

Hey there!!

Good call. I'm already having a better time... so much easier for what I want.
It's really come a long way (in terms of development)!

-dave
