Check pointing / in progress results #734
parcbioinfo asked this question in Q&A (unanswered)
-
I guess this was a long way of asking whether there are any non-blocking map-reduce functions, which has been discussed before over at futureverse/future.apply#44, so that answers that question. I'll leave this up for the time being in case anyone has general advice on best practice for checkpointing or saving in-progress results.
-
Hello all - first off thank you so much for the future framework, it has made sharing code and collaborating within my lab much easier.
I was curious if there was an easy way to save checkpoints or in-progress results, generally, using any of the map-reduce APIs?
My current use case is an operation which takes ~15 minutes per gene, across ~20,000 genes. This task is not quite important enough to throw onto the formal cluster, so I am using shared resources and the ability to restart/grab partial results would be highly useful.
I do understand I could do this manually with many smaller calls to the map-reduce APIs, but this would leave a lot of workers sitting idle at the end of each call, which will make a significant difference over the multiple days this is expected to take. If I assume I lose on average 7 minutes of runtime per worker at the end of each chunk, and we save progress 100 times (every 1%), that is something like 10 additional hours of runtime.
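To make that estimate concrete, here is the arithmetic as a small R snippet. The variable names are only illustrative, and the 7-minute figure is the assumption stated above (roughly half of one ~15-minute task), not a measurement.

```r
# Assumed average idle time per worker at the end of each chunk, in minutes
idle_min_per_chunk <- 7
# Number of checkpoints (one every 1% of the genes)
n_checkpoints <- 100

# Extra runtime caused by waiting at each checkpoint, in hours
extra_hours <- idle_min_per_chunk * n_checkpoints / 60
extra_hours
```

This comes out at a bit under 12 hours, the same order of magnitude as the rough figure quoted above.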
Right now I'm doing this:
This is working fine, but I can't help but feel like I am reinventing the wheel. No doubt this process is also maximizing the overhead costs by starting the absolute maximum number of futures, which feels bad. In some simulations, the above code runs faster than 100 manual map-reduce calls, but slower than a single map-reduce call.
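For readers looking for a starting point, one possible pattern (not necessarily the approach described above) is to have each task write its own result to disk as soon as it finishes, and to skip tasks whose result file already exists when the job is restarted. The sketch below is a minimal illustration under stated assumptions: `checkpointed_map()` and `slow_square()` are hypothetical names, the IDs are assumed to be unique strings that are safe to use as file names, and all workers are assumed to see the same checkpoint directory. The `map` argument defaults to base `lapply` so the example runs anywhere; passing `future.apply::future_lapply` instead should parallelize it, assuming that package is installed and a `plan()` has been set.

```r
# Hypothetical helper: apply `fun` to each element of `ids`, saving every
# result to its own .rds file so an interrupted run can be resumed.
checkpointed_map <- function(ids, fun, dir, map = lapply) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  path_for <- function(id) file.path(dir, paste0(id, ".rds"))

  # Only process IDs that do not have a saved result yet.
  todo <- ids[!file.exists(path_for(ids))]

  map(todo, function(id) {
    res <- fun(id)
    # Write to a temporary file first, then rename, so a killed worker
    # does not leave a half-written checkpoint under the final name.
    tmp <- paste0(path_for(id), ".tmp")
    saveRDS(res, tmp)
    file.rename(tmp, path_for(id))
    invisible(NULL)  # keep the value returned to the main process small
  })

  # Collect all results (old and new) from disk, in the order of `ids`.
  stats::setNames(lapply(ids, function(id) readRDS(path_for(id))), ids)
}

# Usage: `slow_square` is a stand-in for the ~15-minute per-gene task.
slow_square <- function(id) as.numeric(id)^2
out_dir <- file.path(tempdir(), "checkpoints")
results <- checkpointed_map(as.character(1:5), slow_square, out_dir)
```

Because everything goes through a single map call, there is no barrier between checkpoints, so workers only sit idle at the very end of the run. Partial results can be inspected at any time by reading the `.rds` files already present in the directory.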
Any advice would be welcome. Thank you!