Metrics utilization #25

samanvp · 2019-07-18T20:33:09Z

We define a decorator that uses the metrics module to collect anonymous usage metrics from deepvariant_runner.

This PR contains the implementation of a decorator to collect metrics from DeepVariant Runner. A second PR will utilize the newly defined decorator to collect usage metrics.

samanvp · 2019-07-18T20:35:42Z

Please note that this PR is a continuation of #19, meaning that metrics.py and metrics_test.py are branched from that PR.

allieychen

Thanks Saman, it looks great!

allieychen · 2019-07-19T14:28:11Z

gcp_deepvariant_runner.py

@@ -1040,10 +1102,57 @@ def run(argv=None):
            'jobs. By default, the pipeline runs all 3 jobs (make_examples, '
            'call_variants, postprocess_variants) in sequence. '
            'This option may be used to run parts of the pipeline.'))
+  parser.add_argument(
+      '--stop_collecting_anonymous_usage_metrics',


I think adding the argument as collecting_anonymous_usage_metrics would be easier to read, especially since it is used as "if not pipeline_args.stop_collecting_anonymous_usage_metrics:". Also, the action and help do not seem relevant?

samanvp · 2019-07-23T13:40:17Z

I agree the if statement would be more readable if this flag were not 'negative'. However, I would really like this flag to explicitly indicate that it stops metrics collection.

Please let me know if this does not make sense.
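
For reference, a minimal sketch of the flag under discussion. Only the flag name comes from the diff; the action='store_true' setting, the help text, and the should_collect_metrics helper are assumptions made for illustration, not the PR's actual code. It shows one way to keep the explicit "stop" flag while confining the negation to a single place.

```python
import argparse


def build_parser():
    """Builds a parser with the opt-out flag named in the diff."""
    parser = argparse.ArgumentParser()
    # The flag name is from the PR; the action and help text are assumed.
    parser.add_argument(
        '--stop_collecting_anonymous_usage_metrics',
        action='store_true',
        help='If set, no anonymous usage metrics are collected for this run.')
    return parser


def should_collect_metrics(pipeline_args):
    """Hypothetical helper that hides the double negative from call sites."""
    return not pipeline_args.stop_collecting_anonymous_usage_metrics
```

With a helper like this, call sites could read "if should_collect_metrics(pipeline_args):" while the command line still exposes an explicit opt-out.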

allieychen · 2019-07-19T14:45:25Z

gcp_deepvariant_runner.py

+
+  if not pipeline_args.stop_collecting_anonymous_usage_metrics:
+    metrics.add(
+        _get_project_number(pipeline_args.project),


I am not sure whether it still makes sense to collect the metrics when getting the project number failed (-1).

samanvp · 2019-07-23T13:43:28Z

Yes, it does. Ideally we would like to know how many unique users we have, and getting their project_id is needed for that. But even if we cannot get the project_id, it will still be informative to learn other details about the run (how many workers, failure or success, ...).

allieychen

Do you know in which cases it may fail to get the project_id?

samanvp

I can imagine there might be some GCP IAM settings that prevent a service account from reading the project's metadata; I am guessing that in those cases it will also raise an error.
It is also possible that we successfully fetch the project_id in all runs and never return -1. But in any case, in _get_project_number() we have to run the gcloud projects describe command in a try/except block.

According to gcloud projects describe --help:

This command can fail for the following reasons:

The project specified does not exist.

The active account does not have permission to access the given project.

The first case does not apply to us (otherwise the gcloud alpha genomics pipelines run wouldn't go through). So we will fail to log the right project_id only when the service account does not have enough permission.

kemp-google

In general we aren't allowed to log the project ID (it's PII). I think we got approval to log a SHA1 hash of the project ID.

Note that the permissions required to get the project ID are non-trivial, so it may well be that the SA doesn't have them.

allieychen

Thank you, Saman, for the detailed explanation! I think the second case should not occur for us either, since the user is running the pipeline in that project, which they must have access to. So I agree that it will probably never return -1 :).

kemp-google

Sorry, I misread this as ID, not number, for some reason. Project number is fine :)
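
The failure handling described in this thread could look roughly like the sketch below. It assumes the project number is read with gcloud projects describe and the --format=value(projectNumber) option, and that any failure maps to -1 as discussed above. The function body and the run_command parameter (added here so the command can be stubbed out) are illustrative and not taken from the PR.

```python
import subprocess


def get_project_number(project, run_command=subprocess.check_output):
    """Returns the numeric project number, or -1 if it cannot be fetched.

    run_command is a hypothetical injection point; by default the real
    gcloud binary is invoked through subprocess.check_output.
    """
    command = ['gcloud', 'projects', 'describe', project,
               '--format=value(projectNumber)']
    try:
        output = run_command(command)
        if isinstance(output, bytes):
            output = output.decode('utf-8')
        return int(output.strip())
    except (subprocess.CalledProcessError, OSError, ValueError):
        # Covers a non-zero exit (e.g. missing permission), a missing
        # gcloud binary, and output that is not a number.
        return -1
```

Returning a sentinel instead of raising keeps the metrics path from ever failing the pipeline itself, which matches the intent stated above of still recording the other run details.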

allieychen · 2019-07-19T14:56:51Z

gcp_deepvariant_runner.py

+      if pipeline_args.stop_collecting_anonymous_usage_metrics:
+        func(pipeline_args, *args, **kwargs)
+      else:
+        success = False


super nit: How about setting status = '_Failure' up front and then setting status = '_Success' after func succeeds, so you can avoid the if/else below?
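
A minimal sketch of the suggested pattern, under stated assumptions: the decorator name capture_metrics and the record callable are hypothetical stand-ins (the real code calls metrics.add, whose signature is not shown in this thread), and the metric name is simply the wrapped function's name plus the status suffix.

```python
import functools


def capture_metrics(record):
    """Hypothetical decorator factory; record receives one metric name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(pipeline_args, *args, **kwargs):
            if pipeline_args.stop_collecting_anonymous_usage_metrics:
                # Opted out: run the function without recording anything.
                return func(pipeline_args, *args, **kwargs)
            # Assume failure until the call returns normally.
            status = '_Failure'
            try:
                result = func(pipeline_args, *args, **kwargs)
                status = '_Success'
                return result
            finally:
                # Runs on both paths, so no if/else on a success flag.
                record(func.__name__ + status)
        return wrapper
    return decorator
```

Because the status is recorded in the finally clause, an exception raised by func still propagates to the caller after the failure metric is recorded.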

samanvp

Thanks, Allie, for your comments.

samanvp added 6 commits July 18, 2019 14:23

Add metrics module to DeepVariant Runner

ec4c557

This PR contains the implementation of a decorator to collect metrics from DeepVariant Runner. A second PR will utilize the newly defined decorator to collect usage metrics.

First round of comments (Metrics)

ae775d0

Second round of comments (Metrics)

2aaddc9

Third round of comments (Metrics)

fcc3426

modify console_type to CLOUD_HCLS_OSS

9c6c110

Using metrics module collect DV runner metrics

3978915

We define a decorator to utilize metrics module and collect usage metrics from deepvariant_runner.

samanvp requested review from kemp-google and allieychen July 18, 2019 20:33

allieychen reviewed Jul 19, 2019

First round of comments (metrics utilization)

f05ca90

samanvp force-pushed the metrics_utilization branch from 0a677df to f05ca90 Compare July 23, 2019 14:33

samanvp commented Jul 23, 2019
