-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add callback function as parameter in the feature usage functions #396
base: main
Are you sure you want to change the base?
Add callback function as parameter in the feature usage functions #396
Conversation
Thanks @chinmaytredence for the contribution! The code looks good to me. @hangfei can also chime in. A few comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing this work!
- Please add a user guide so we know how the design will look in the POV of users
- asyncio is a new lib so please follow the contribution guide to add its license and add to requirements/setup.py.
Thanks @xiaoyongzhu and @hangfei for your valuable comments. I am adding the tests and updating this PR |
added documentation for using callback function in client.py
@xiaoyongzhu @hangfei we have added the how-to-doc as well as the test case. Please review once. |
Could you please explain more about the purpose of these callback functions? What's their use cases? |
@windoze the primary aim to add this callback as parameter was to register the details about which notebook is using these features. This helps in generating end to end lineage of a feature. the callback function could be used to call one rest end point. The param is used as payload to the rest end point which would hold the metadata of the feature usage. In this way we will be able to extend the feature consumption layer. This will solve the issue #344 |
Took a look at the issue, if my understanding is correct, the final usage should look likes this:
My point is, what's the different between above snippet to the following one?
Why do we want to "callback" a "call-after" function? |
@windoze what if someone needs to store the feature usage in a separate DB to monitor it, or some other events we want to trigger based on the feature usage. Please suggest. |
Thanks, @chinmaytredence, I understand the motivation now. From my POV this callback is too generic and doesn't look very clear how it should be used, my suggestion is to add a more specific function like following:
So the user needs to register the consuming callback only once in the FeathrClient instead of adding them ad-hoc, and the consumption can be recorded automatically. |
@windoze thanks for suggesting the class level callback approach. Wouldn't it be too specific and what if someone wants to call different callbacks in different apis of the Fethrclient. Suppose i would be sending an email to the feature creator as well as call another api to store the feature in a DB |
We can add something like a For ad-hoc usage, I still think they can just call another function right after the |
Then I'll suggest to thoroughly review the requirement and the use case before this PR. I cannot imagine a case that requires totally different parameter sets in different places but for the same purpose, or you can add multiple callbacks if you do have multiple different purposes. If the variable is environmental, you can always wrap them into a callable Class and get needed info from the environment. It's easy to add a function, but it's hard to change or remove it later even if we find that the function is not what we expected. |
@windoze Please see the below code snippet
and
There are various features getting pulled in two different methods. If we have to register the features w.r.t online or offline feature store, then we need to send variable parameters. Please suggest. |
We could pass the consumption input/output information to the callback as the callback context. Given the main use cases, I think it's reasonable. Also it can be less error-prone, e.g., if you add the callback in an ad-hoc fashion:
You need to provide feature list twice and they could be inconsistent then you record everything wrong. I strongly recommend an accurate design for the clear scenarios, over of a overgeneric solution that can do anything. The latter will always be abused in an unexpected way and you can do nothing about it. |
I understand that the generic solution could do anything. But, the intention of this callback was to make it generic enough so that the capability of the handlers could be enhanced by calling other rest end points without any specific usecase. i am not able to visualize what could be breaking with this callback function in the Feathr implementations. Would you like to help me with some scenarios on this please. |
So far nothing will be breaking by this change because we're talking about adding instead of changing or removing functions, I was talking about the design solely. As to generic solution, I think nothing can be more generic than following:
And it doesn't require any change. With this PR, the code doesn't look so different:
Pretty much like we merge 2 lines into 1. So I'm really confused by the purpose of this PR. |
When we write some other piece of code apart from the client handler as below: |
I'm even more confused, isn't the code in the notebook written by the user? Even if the callback is added, how can you prevent user from calling the callback function directly? |
Good point. I think we could use the
and record_cosumption will take the features from the third parameter itself instead of we passing the features in the params. But what if we would like to send the path of notebook or the url of the notebook ? That's where the use case coming up where we can package all the details in one place instead of writing the callback function as a separate line as below:
|
Before the PR, we write:
After the PR, we write:
Whatever the info you need to provided in the former, you'll need the same in the latter, the only difference is "one line" or "two lines", right? If you want to use implicit parameters, you won't need to set callback in every |
Not necessarily on two lines. It is about the origin of your call stack. who is calling it. That makes it feathr client compliant. |
Could you please make a working sample? I'm not sure if I fully understand your point. |
Moved callback to constructor
The philosophy is, we are using Feathr client for all our operations. Be it register feature, or build feature, or get features. Instead user could directly use the lower level functions by passing the environment variables. Adhering to this, when we are going for any callbacks, I think feathr client is the best place to attach the callback and params. And I am confused where is the design issue in this ? |
The design issue is we don't need to add this callback, because we can always simply call whatever right after the Feathr call. This PR simply chains 2 function calls into 1 without any additional logic, so I think it's pointless. If you're talking about missing consumption statistics, this PR doesn't help as user may still forget to add the callback into the If you're talking about consumption forgery, this PR also doesn't help as user can still call the function to record fake consumptions directly. I don't see any issue resolved by this PR, that's why I put all these comments. |
Don't you think this gives us the capability to extend the capability of client by enabling it with rest api handlers ? We have separate set of code which does the registration of features in the purview. But this PR is mostly for extending the capability to call a rest api with params. |
it doesn't extend any capability, because logically
|
We can similarly say that instead of client.register_features(...) we can just call the purview . My whole point is atleast some how we make sure that the usage of Feathrclient is helping us to regsiter the consumption layer as well till we tightly couple the consumption layer code. |
Theoretically yes, as long as you're willing to reimplement all the logic to make sure your output is still compatible with Feathr. I can understand your point, but this PR doesn't do any help to your point, it just saved 1 keystroke by replacing a pair of braces with a comma. |
How this PR helps me ? I have one Fast API layer which has used feather as library and implemented the consumer_file url registration to purview registry with a new typedef. When the FeathrClient gets the ability to register a callback function, my notebook users would be able to use client.get_onlinefeatures(...,cb,params) which will enable me to call the fast api layer, which will do mulitple stuff in the background. Be it registering to purview, be it putting who is using it in my local db as well as sending one email etc. One more thing, while we are provisioning the cluster we will configure feathr client with the callback so the users will not be going through setting up feathr. Rather they will be more focused on the feature usage and model evaluations |
How about you try to use your Fast API layer and call it by using this?
Does it still do the same help to you? |
yes it does the same thing. But, the philosophy is to use FeathrClient always whenever user wants to connect for feathr related stuff. This makes the usage easier and user will not be going to forget as well |
That's why I used the word 'pointless' because logically this PR does nothing. My suggestions is, either we define a clear interface to systematically collect all consumptions, or, we let our users to record consumption by themselves, I put the suggestions about the former in previous comments, we can talk more if you're interested, but adding a half-baked function may not be a good idea. |
Then could you help me understand how the Feathrclient would be extendable with a rest api handler. Currently it is not able to give us any handler. Just think in the direction of Webhooks. |
Are you going to extend a REST API service or a client with Feathr? Feathr itself has many REST clients inside and there shouldn't be any trouble to add another one. As to the REST service, it's totally another topic, and I can hardly understand why do we want to do it. |
May be think in the direction of Webhook concept. |
Background: The current functions in feathr client which are used for getting the online or offline features, do not give the capability to extend the function to register the cosumption of the features by any notebook.
This PR includes changes for following functions:
In these above functions, two optional parameters: a callback function, params for the callback function, are added.