Replies: 4 comments 4 replies
-
Adding @hfattahi, @maseca, @LucaCinquini because a missing piece of information we need first is the expected, precise number of HLS products we need to download for a typical query (which drives the total number of workers in the full PCM system). @hfattahi - do you have any guidance on the expected set of collection(s) on LP.DAAC we'd need to ingest per day?
-
And related to that - Heresh, do you know the exact name of the HLS collection we need to use to query LP.DAAC?
-
PCM is using HLSL30 and HLSS30. Are we using the correct dataset names?
-
Wanted to wrap up this discussion and converge on a good set of numbers. Could you all help with an equation to estimate the items below based on a variable called $num-ingest-products, which represents the number of products the PCM will pull from LP.DAAC for R1?
As mentioned above, we already have an equation to estimate the number of EC2 instances needed:
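For reference, that equation (stated in full in the reply below) is, roughly:

$$\text{EC2 instances per day} \approx \frac{\text{number\_of\_jobs} \times \text{number\_of\_hours\_execution\_time}}{24\ \text{hours}}$$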
-
Hi @hhlee445 @chrisjrd - we have a general formula for predicting the number of EC2 instances required to execute a PGE on the PCM system in any given day:

number_of_jobs * number_of_hours_execution_time / 24 hours

This equation is useful as an initial model of the rough number of workers needed, but it has an assumption built in: it only accounts for processing jobs, not data download / validation jobs. It'd be good to arrive at an updated and precise estimate that accounts for those as well.
Let's get that conversation started here; can you all provide some updated numbers? Feel free to post a diagram of your latest job workflow if that's helpful. A rough sketch of the estimate is below.
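To make the request concrete, here is a minimal sketch of that estimate in Python, extended with a placeholder term for download/validation jobs driven by num_ingest_products. The parameter names and sample values are assumptions for illustration only, not measured PCM numbers:

```python
def estimate_ec2_instances(
    num_processing_jobs: int,
    hours_per_processing_job: float,
    num_ingest_products: int = 0,
    download_jobs_per_product: float = 1.0,  # assumption: one download/validate job per product
    hours_per_download_job: float = 0.1,     # assumption: placeholder runtime, not measured
) -> float:
    """Rough daily worker estimate: (jobs * hours per job) / 24 hours.

    The first term is the existing formula for processing jobs; the second
    term is a hypothetical extension for ingest (download/validate) jobs
    driven by num_ingest_products.
    """
    processing_instance_hours = num_processing_jobs * hours_per_processing_job
    ingest_instance_hours = (
        num_ingest_products * download_jobs_per_product * hours_per_download_job
    )
    return (processing_instance_hours + ingest_instance_hours) / 24.0


# Purely illustrative example: 500 processing jobs at 0.5 h each,
# plus 10,000 ingested HLS products per day.
print(estimate_ec2_instances(500, 0.5, num_ingest_products=10_000))
```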