## Part 5: Model Serving
#
# This notebook explains how to create and deploy Models in CML which function as a
# REST API to serve predictions. This feature makes it very easy for a data scientist
# to make trained models available and usable to other developers and data scientists
# in your organization.
#
# In the last part of the series, you learned:
# - the requirements for running an Experiment
# - how to set up a new Experiment
# - how to monitor the results of an Experiment
# - limitations of the feature
#
# In this part, you will learn:
# - the requirements for creating and deploying a Model
# - how to deploy a Model
# - how to test and use a Model
# - limitations of the feature
#
# If you haven't yet, run through the initialization steps in the README file and Part 1.
# In Part 1, the data is imported into the `default.telco_churn` table in Hive.
# All data accesses fetch from Hive.
#
### Requirements
# Models have the same requirements as Experiments:
# - model code in a `.py` script, not a notebook
# - a `requirements.txt` file listing package dependencies
# - a `cdsw-build.sh` script containing code to install all dependencies
#
# > In addition, Models *must* be designed with one main function that takes a dictionary as its sole argument
# > and returns a single dictionary.
# > CML handles the JSON serialization and deserialization.
# In this file, there is minimal code since calculating predictions is much simpler
# than training a machine learning model.
# Once again, we use the `ExplainedModel` helper class in `churnexplainer.py`.
# When a Model API is called, CML will translate the input and returned JSON blobs to and from Python dictionaries.
# Thus, the script simply loads the model we saved at the end of the last notebook,
# passes the input dictionary into the model, and returns the results as a dictionary with the following format:
#
# {
# 'data': dict(data),
# 'probability': probability,
# 'explanation': explanation
# }
#
# The Model API will return this dictionary serialized as JSON.
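#
# As a minimal sketch of this contract: a valid model function takes a single
# dictionary and returns a single dictionary. The function name and fields
# below are illustrative only, not part of the CML API.

```python
def predict(args):
    # `args` arrives as a Python dict (CML deserializes the request JSON).
    name = args.get("name", "world")
    # The return value must be a single dict; CML serializes it back to JSON.
    return {"greeting": "Hello, " + name}
```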
#
### Model Operations
#
# This model is deployed using the model operations feature of CML which consists of
# [Model Metrics](https://docs.cloudera.com/machine-learning/cloud/model-metrics/topics/ml-enabling-model-metrics.html)
# and [Model Governance](https://docs.cloudera.com/machine-learning/cloud/model-governance/topics/ml-enabling-model-governance.html)
#
# The first requirement is to enable the model metrics feature by adding the
# `@cdsw.model_metrics` [Python Decorator](https://wiki.python.org/moin/PythonDecorators)
# before the function.
#
# Then you can use the *`cdsw.track_metric`* function to add additional
# data to the underlying database for each call made to the model.
# **Note:** `cdsw.track_metric` has different functionality depending on whether it is
# being used in an *Experiment* or a *Model*.
#
# More detail is available from the `help(cdsw.track_metric)` function:
#```
# help(cdsw.track_metric)
# Help on function track_metric in module cdsw:
#
# track_metric(key, value)
# Description
# -----------
#
# Tracks a metric for an experiment or model deployment
# Example:
# model deployment usage:
# >>>@cdsw.model_metrics
# >>>predict_func(args):
# >>> cdsw.track_metric("input_args", args)
# >>> return {"result": "prediction"}
#
# experiment usage:
# >>>cdsw.track_metric("input_args", args)
#
# Parameters
# ----------
# key: string
# The metric key to track
# value: string, boolean, numeric
# The metric value to track
#```
#
#
### Creating and deploying a Model
# To create a Model using our `5_model_serve_explainer.py` script, use the following settings:
# * **Name**: Explainer
# * **Description**: Explain customer churn prediction
# * **File**: `5_model_serve_explainer.py`
# * **Function**: explain
# * **Input**:
# ```
# {
# "StreamingTV": "No",
# "MonthlyCharges": 70.35,
# "PhoneService": "No",
# "PaperlessBilling": "No",
# "Partner": "No",
# "OnlineBackup": "No",
# "gender": "Female",
# "Contract": "Month-to-month",
# "TotalCharges": 1397.475,
# "StreamingMovies": "No",
# "DeviceProtection": "No",
# "PaymentMethod": "Bank transfer (automatic)",
# "tenure": 29,
# "Dependents": "No",
# "OnlineSecurity": "No",
# "MultipleLines": "No",
# "InternetService": "DSL",
# "SeniorCitizen": "No",
# "TechSupport": "No"
# }
# ```
# * **Kernel**: Python 3
# * **Engine Profile**: 1 vCPU / 2 GiB Memory
#
# The rest can be left as is.
#
# After accepting the dialog, CML will *build* a new Docker image using `cdsw-build.sh`,
# then *assign an endpoint* for sending requests to the new Model.
### Testing the Model
# > To verify it's returning the right results in the format you expect, you can
# > test any Model from its *Overview* page.
#
# If you entered an *Example Input* before, it will be the default input here,
# though you can enter your own.
### Using the Model
#
# > The *Overview* page also provides sample `curl` or Python commands for calling your Model API.
# > You can adapt these samples for other code that will call this API.
#
# This is also where you can find the full endpoint to share with other developers
# and data scientists.
#
# **Note:** for security, you can specify
# [Model API Keys](https://docs.cloudera.com/machine-learning/cloud/models/topics/ml-model-api-key-for-models.html)
# to add authentication.
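#
# The sample commands on the *Overview* page can be adapted into a small helper
# like the sketch below. The `accessKey`/`request` payload shape and the
# `response` wrapper field follow the samples CML generates; the endpoint URL
# and access key are placeholders you replace with values from your own
# Model's *Overview* page.

```python
import json
from urllib import request as urlrequest

def build_payload(access_key, record):
    # CML model endpoints expect the access key and the model input wrapped
    # together in one JSON object with these two fields.
    return {"accessKey": access_key, "request": record}

def call_explainer(endpoint, access_key, record):
    # `endpoint` and `access_key` are placeholders; copy the real values
    # from the Model's Overview page in CML.
    data = json.dumps(build_payload(access_key, record)).encode("utf-8")
    req = urlrequest.Request(
        endpoint, data=data, headers={"Content-Type": "application/json"}
    )
    with urlrequest.urlopen(req) as resp:
        # The dict returned by the model function is wrapped in a
        # "response" field of the JSON reply.
        return json.loads(resp.read())["response"]
```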
### Limitations
#
# Models do have a few limitations that are important to know:
# - re-deploying or re-building Models results in Model downtime (usually brief)
# - re-starting CML does not automatically restart active Models
# - Model logs and statistics are only preserved so long as the individual replica is active
#
# A current list of known limitations is
# [documented here](https://docs.cloudera.com/machine-learning/cloud/models/topics/ml-models-known-issues-and-limitations.html).
from collections import ChainMap
import cdsw, numpy
from churnexplainer import ExplainedModel
# Load the model saved earlier.
em = ExplainedModel(model_name='telco_linear',data_dir='/home/cdsw')
# *Note:* If you want to test this in a session, comment out the
# `@cdsw.model_metrics` line below. Don't forget to uncomment it when you
# deploy, or the metrics won't be written to the database.
@cdsw.model_metrics
# This is the main function used for serving the model. It takes the JSON-formatted
# arguments, calculates the probability of churn, generates a LIME explanation for
# the instance, and returns the result as JSON.
def explain(args):
data = dict(ChainMap(args, em.default_data))
data = em.cast_dct(data)
probability, explanation = em.explain_dct(data)
# Track inputs
cdsw.track_metric('input_data', data)
# Track our prediction
cdsw.track_metric('probability', probability)
# Track explanation
cdsw.track_metric('explanation', explanation)
return {
'data': dict(data),
'probability': probability,
'explanation': explanation
}
# To test this in a session, comment out the `@cdsw.model_metrics` line above,
# then uncomment and run the two lines below.
#x={"StreamingTV":"No","MonthlyCharges":70.35,"PhoneService":"No","PaperlessBilling":"No","Partner":"No","OnlineBackup":"No","gender":"Female","Contract":"Month-to-month","TotalCharges":1397.475,"StreamingMovies":"No","DeviceProtection":"No","PaymentMethod":"Bank transfer (automatic)","tenure":29,"Dependents":"No","OnlineSecurity":"No","MultipleLines":"No","InternetService":"DSL","SeniorCitizen":"No","TechSupport":"No"}
#explain(x)
### Wrap up
#
# We've now covered all the steps to **deploying and serving Models**, including the
# requirements, limitations, and how to set up, test, and use them.
# This is a powerful way to get data scientists' work in use by other people quickly.
#
# In the next part of the project we will explore how to launch a **web application**
# served through CML.
# Your team is busy building models to solve problems.
# CML-hosted Applications are a simple way to get these solutions in front of
# stakeholders quickly.