Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ah_var_store] Retry with more memory VCF extract [VS-1351] #8812

Merged
merged 4 commits into from
May 7, 2024

Conversation

mcovarr
Copy link
Collaborator

@mcovarr mcovarr commented May 6, 2024

Currently in draft status, I still need to test this the way I tested the PGEN version. Tested, works, ready for review!

@mcovarr mcovarr force-pushed the vs_1351_retry_with_more_memory_vcf_extract branch from b076447 to d7bc9f2 Compare May 6, 2024 15:30
@@ -368,7 +368,22 @@ task ExtractTask {
--filter-set-name ~{filter_set_name}'
fi

gatk --java-options "-Xmx~{memory_gib - 3}g" \
# This tool may get invoked with "Retry with more memory" with a different amount of memory than specified in
# the input `memory_gib`, so use the memory-related environment variables rather than the `memory_gib` input.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a way to set the memory in the run-time block from the memory-related environment variables?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know of, and the way "retry with more memory" works I don't think it would matter. Cromwell basically remembers how much memory the task was last run with, and if it fails with looks like an out of memory error, Cromwell retries the task with an overridden, "multiplied" memory value: https://support.terra.bio/hc/en-us/articles/4403215299355-Out-of-Memory-Retry

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. But that would defeat the whole point. So it just allocates to the VM the same amount of memory (here memory_gib) each time it retries??
Line 1.2 Some commands/tools might be able to pick up and use the additional memory provided automatically without customization. seems to imply that the VM will come up with more memory and tools like python would then recognize that they're on a VM with more memory than before?.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a run where I added in some fake "out of memory" code that shows the task being retried with increasing amounts of memory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that you had it OOM before hitting the actual Java command, and from try 4: MEM_SIZE: 69.12, MEM_UNIT: GB, memory_mb: 66120, memory_gib: 40.

So, what I'm still confused about is how would cromwell up the memory needed for the VM for each retry? At this point the memory that Java will ask for is more than the VM has. The VM's memory is always set to memory_gib which is 40. How is that even supposed to work? Is it possible that if the runtime memory was hard-coded to 40, then cromwell might be smart enough to up that as it's upping MEM_SIZE?

Basically, I don't understand the use of retrying with more memory if there's no way that that additional memory isn't requested on the VM we are spinning up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically Cromwell doesn't honor the original memory specification on retry-with-more-memory retries, it uses the scaled values instead.

@mcovarr mcovarr force-pushed the vs_1351_retry_with_more_memory_vcf_extract branch from 788b0d1 to 733163a Compare May 6, 2024 21:28
@mcovarr
Copy link
Collaborator Author

mcovarr commented May 6, 2024

Tested and working, taking out of Draft status.

@mcovarr mcovarr marked this pull request as ready for review May 6, 2024 21:54
@mcovarr mcovarr changed the title Retry with more memory for VCF extract [VS-1351] [ah_var_store] Retry with more memory for VCF extract [VS-1351] May 6, 2024
@mcovarr mcovarr changed the title [ah_var_store] Retry with more memory for VCF extract [VS-1351] [ah_var_store] Retry with more memory VCF extract [VS-1351] May 6, 2024
Copy link

@rsasch rsasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Also wondering if we want to add in documentation on the quota increases that we need for extract so that those can be requested well before extract is kicked off?

@mcovarr
Copy link
Collaborator Author

mcovarr commented May 7, 2024

LGTM. Also wondering if we want to add in documentation on the quota increases that we need for extract so that those can be requested well before extract is kicked off?

I will be more than happy to document how to have quotas increased as soon as I figure out how to actually accomplish that, which so far I have not. 😞

@mcovarr mcovarr merged commit b447d7f into ah_var_store May 7, 2024
17 checks passed
@mcovarr mcovarr deleted the vs_1351_retry_with_more_memory_vcf_extract branch May 7, 2024 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants