Replies: 4 comments 4 replies
-
Adding @hfattahi, @maseca, @LucaCinquini because a missing piece of information we need first is the expected, precise number of HLS products we need to download for a typical query (which drives the total number of workers in the full PCM system). @hfattahi - do you have any guidance on the expected set of collection(s) on LP.DAAC we'd need to ingest per day?
-
And related to that - Heresh, do you know the exact name of the HLS collection we need to use to query LP.DAAC?
-
PCM is using HLSL30 and HLSS30. Are we using the correct dataset names?
-
Wanted to wrap up this discussion and converge on a good set of numbers. Could you all help with an equation to estimate the items below based on a variable called $num-ingest-products, which represents the number of products the PCM will pull from LP.DAAC for R1?
As mentioned above, we already have an equation to estimate the number of EC2 instances needed:
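For reference, that equation (stated in full in the reply below) is, roughly:

$$\text{EC2 instances per day} \approx \frac{\text{number\_of\_jobs} \times \text{number\_of\_hours\_execution\_time}}{24\ \text{hours}}$$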
-
Hi @hhlee445 @chrisjrd - we have a general formula for predicting the number of EC2 instances required to execute a PGE on the PCM system in any given day:

number_of_jobs * number_of_hours_execution_time / 24 hours

This equation is useful as an initial model of the rough number of workers needed, but it has an assumption built in: it only accounts for processing jobs, not data download / validation jobs. It'd be good to arrive at an updated and precise estimate that accounts for those as well.
Let's get that conversation started here; can you all provide some updated numbers? Feel free to post a diagram of your latest job workflow if that's helpful. A rough sketch of the estimate is below.
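To make the request concrete, here is a minimal sketch of that estimate in Python, extended with a placeholder term for download/validation jobs driven by num_ingest_products. The parameter names and sample values are assumptions for illustration only, not measured PCM numbers:

```python
def estimate_ec2_instances(
    num_processing_jobs: int,
    hours_per_processing_job: float,
    num_ingest_products: int = 0,
    download_jobs_per_product: float = 1.0,  # assumption: one download/validate job per product
    hours_per_download_job: float = 0.1,     # assumption: placeholder runtime, not measured
) -> float:
    """Rough daily worker estimate: (jobs * hours per job) / 24 hours.

    The first term is the existing formula for processing jobs; the second
    term is a hypothetical extension for ingest (download/validate) jobs
    driven by num_ingest_products.
    """
    processing_instance_hours = num_processing_jobs * hours_per_processing_job
    ingest_instance_hours = (
        num_ingest_products * download_jobs_per_product * hours_per_download_job
    )
    return (processing_instance_hours + ingest_instance_hours) / 24.0


# Purely illustrative example: 500 processing jobs at 0.5 h each,
# plus 10,000 ingested HLS products per day.
print(estimate_ec2_instances(500, 0.5, num_ingest_products=10_000))
```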