Bioboxes CLI does not work on OSX #186
What kind of temporary files does the CLI store? IMO the CLI should avoid copying or moving original output files, as these could be very large. A copy, or a move between different filesystems, can double the occupied disk space. |
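A minimal Python sketch of the distinction: a rename within one filesystem only rewrites directory entries, while a move across filesystems has to copy the bytes. The function name move_output and its return values are hypothetical, for illustration only, and comparing st_dev is a common heuristic for "same filesystem" rather than anything the CLI is known to do.

```python
import os
import shutil


def move_output(src: str, dst: str) -> str:
    """Move src to dst and report whether a cheap rename was possible.

    Returns "rename" when src and the destination directory share a
    filesystem (no data is copied, so no extra disk space is used) and
    "copy" when shutil.move has to copy the bytes and then delete src.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # st_dev identifies the device (filesystem) a path lives on.
    if os.stat(src).st_dev == os.stat(dst_dir).st_dev:
        os.replace(src, dst)  # atomic rename within one filesystem
        return "rename"
    shutil.move(src, dst)  # copy-then-delete across filesystems
    return "copy"
```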
@fungs I think the idea was to make it impossible for a biobox to remove data in a mounted output directory. |
@pbelmann The CLI could simply refuse to mount a non-empty output folder (with a force switch to override this). Also, I've had a look at the "Folder sharing" section of https://github.com/boot2docker/boot2docker, and according to their description, docker runs as a remote instance and they recommend transferring files over the (virtual) network. So I'm not really sure to what extent Linux mounts work. |
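The refuse-unless-forced check could look roughly like the following sketch. check_output_dir, OutputDirNotEmpty and the --force flag named in the error message are hypothetical, not part of the existing CLI.

```python
import os


class OutputDirNotEmpty(Exception):
    """Raised when the output directory already contains files."""


def check_output_dir(path: str, force: bool = False) -> None:
    """Refuse to use a non-empty output directory unless force is set.

    A missing directory is created, so without force the container always
    starts from an empty mount and cannot touch pre-existing user files.
    """
    os.makedirs(path, exist_ok=True)
    if os.listdir(path) and not force:
        raise OutputDirNotEmpty(
            f"{path} is not empty; rerun with --force to use it anyway"
        )
```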
Another point is that by mounting a temporary directory, we can skip the output biobox.yaml. At the moment:
So in my opinion, the solution to creating this temporary directory and eventually moving large files is the following:
This way the output is a directory containing an output file and the biobox.yaml, which is maybe not as nice as the previous solution. PR for this solution: bioboxes/command-line-interface#68 |
The output files, but also the temporary biobox.yaml files.
I agree that copying large files is undesirable.
My suggestion might be to create a temporary hidden directory in the current
And also to hide the docker mechanics from the user.
I use a Mac and the CLI does work with the workaround I suggested in the
I definitely prefer this solution as my opinion is that it is a more consistent |
I think the hidden dir is an acceptable approach; this is basically what downloaders do when they retrieve files (e.g. Firefox names them file.part and renames them when the download is finished). However, IMO a better solution would be to make the biobox write directly to the destination file, because that would allow the container to work in a streaming context, e.g. using a FIFO special file, or processing the contigs via standard input while they are being produced, e.g. for compression. Whether this makes sense in the assembly context I'm not sure, but the CLI should be universal. The most straightforward way to do this would be to mount the contig file (an auto-created empty file) into the biobox output directory. AFAIK we have no requirement anywhere that the output folder must be a host-mounted folder, and files other than the contigs file that are created there will not be used anyway. The same would apply to any input or output file or subfolder. |
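As a rough sketch of mounting a single pre-created file, the helper below builds the docker run volume arguments. The function name file_mount_args and the container path /bbx/output/contigs.fa are assumptions for illustration. The host file is created first because, with -v/--volume, docker creates a missing host path as a directory rather than a file.

```python
import os
from typing import List


def file_mount_args(host_file: str, container_path: str) -> List[str]:
    """Build docker run arguments that bind-mount one file into the container."""
    host_file = os.path.abspath(host_file)
    # Touch the file: create it empty if absent, leave existing contents alone.
    open(host_file, "a").close()
    return ["--volume", f"{host_file}:{container_path}"]
```

For example, ["docker", "run", "--rm", *file_mount_args("contigs.fa", "/bbx/output/contigs.fa"), image] could then be passed to subprocess.run, where image stands for the biobox image name. Only the one file is shared with the host, so anything else the container writes to its output directory stays inside the container.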
I agree that a hidden dir is not the ideal solution, but I think it might be the pragmatic one for solving this problem. I'm open to alternatives; I just can't think of one where we could make the container write to a specific file without changing the spec to somehow specify this ahead of running it. This is because the current spec identifies the files by tags in the output biobox.yaml rather than by specific file names. |
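To illustrate why the file names are only known after the container has run, here is a sketch that looks up outputs by their type tag in an already-parsed output biobox.yaml. The dictionary layout and its field names (id, value, type) are assumptions for illustration; the real schema is defined by the bioboxes specification.

```python
from typing import Any, Dict, List

# Hypothetical, already-parsed output biobox.yaml.
output_yaml: Dict[str, Any] = {
    "arguments": [
        {"fasta": [
            {"id": "contigs_1", "value": "contigs.fa", "type": "contigs"},
        ]},
    ],
}


def paths_with_type(doc: Dict[str, Any], wanted: str) -> List[str]:
    """Return the file values of every entry tagged with the wanted type."""
    found: List[str] = []
    for argument in doc.get("arguments", []):
        # Each argument is assumed to be a one-key mapping such as
        # {"fasta": [entry, ...]}.
        for entries in argument.values():
            for entry in entries:
                if entry.get("type") == wanted:
                    found.append(entry["value"])
    return found


print(paths_with_type(output_yaml, "contigs"))  # ['contigs.fa']
```

Because the container chooses the value ("contigs.fa" here), the CLI cannot pre-create and mount the destination file without the spec fixing that name in advance.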
I have found a bug in the biobox command line interface: the CLI will always
fail when run on OSX. The reason is how docker works on OSX, where
boot2docker is used to run docker because docker does not work natively.
Boot2Docker works by creating a Linux VM in memory and then running docker in
this VM. Any mounted volumes are effectively doubly mounted:
the VM.
The biobox CLI fails because it stores temporary files in ${TMPDIR}. However,
this is the ${TMPDIR} in the VM and not the one on the user's computer.
Therefore, when the CLI tries to copy the output files back to the current
directory after docker finishes, these files do not exist in the expected
location. I think this is a serious bug, as it breaks the bioboxes CLI on a
common platform.
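One way the CLI could detect this situation up front is to check whether a path lies inside a folder shared with the VM. This sketch assumes boot2docker's default of sharing /Users; visible_in_vm is a hypothetical helper, and the shared roots would need to match the actual VM configuration.

```python
import os
from typing import Iterable


def visible_in_vm(path: str, shared_roots: Iterable[str] = ("/Users",)) -> bool:
    """Return True if path lies under a host folder assumed to be shared with the VM."""
    path = os.path.abspath(path)
    for root in shared_roots:
        root = os.path.abspath(root)
        # commonpath compares whole path components, so /Usersfoo does not match /Users.
        if os.path.commonpath([path, root]) == root:
            return True
    return False
```

On OSX, where TMPDIR typically points under /var/folders, visible_in_vm(tempfile.gettempdir()) would return False, and the CLI could warn the user instead of failing later with missing output files.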
A temporary solution is to run TMPDIR=$(pwd) in the current shell and then use
the biobox CLI as usual. A longer-term solution would be to set the temporary
directory to a hidden directory within the directory in which the commands are
run.
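The longer-term fix might be sketched in Python as a context manager that creates the hidden directory inside the working directory and removes it afterwards. The name hidden_workdir and the .biobox- prefix are hypothetical. Passing dir explicitly avoids depending on TMPDIR, which Python's tempfile module resolves once and then caches.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def hidden_workdir(base: str = ".") -> Iterator[str]:
    """Create a hidden temporary directory inside base and delete it on exit.

    Keeping it next to the final outputs means moving results out of it can
    be a plain rename, and on OSX it is visible to the VM whenever base
    itself is (e.g. somewhere under a shared folder such as /Users).
    """
    path = tempfile.mkdtemp(prefix=".biobox-", dir=os.path.abspath(base))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)
```

Used as `with hidden_workdir() as tmp: ...`, the directory would be mounted into the container in place of a ${TMPDIR} path.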