ONNX conversion #181
I went through most of the sklearn-onnx docs (which I often found confusing) and have to say this looks more difficult than I initially expected. There seem to be many missing pieces, pitfalls, and model- or pipeline-specific adjustments, many of which are not trivially automated. As an example, if I understand correctly, the ONNX runtime doesn't support sparse matrices yet (https://onnx.ai/sklearn-onnx/auto_tutorial/plot_usparse_xgboost.html#tfidf-and-sparse-matrices), so most text classification/regression will only work with small data, since converting to dense would be too expensive otherwise. And as soon as users have custom estimators, automatic conversion is basically impossible. Maybe we can start by supporting inference of ONNX models but require users to do the conversion themselves? What would be the main benefits of ONNX anyway? Is it for efficiency or for security?
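To make the sparse-matrix concern above concrete, here is a rough back-of-the-envelope sketch (the corpus sizes are made-up illustrative numbers, not measurements) comparing CSR-style sparse storage against the dense float64 array that densification would produce:

```python
def memory_estimate(n_docs, n_vocab, nnz_per_doc):
    """Rough CSR (sparse) vs dense float64 memory, in bytes.

    CSR stores one float64 value (8 bytes) and one int32 column index
    (4 bytes) per nonzero, plus one int32 row pointer per row.
    """
    sparse_bytes = n_docs * nnz_per_doc * (8 + 4) + (n_docs + 1) * 4
    dense_bytes = n_docs * n_vocab * 8
    return sparse_bytes, dense_bytes

# Hypothetical TF-IDF corpus: 100k documents, 50k vocabulary, ~100 nonzeros/doc.
sparse_b, dense_b = memory_estimate(100_000, 50_000, 100)
print(f"sparse ~ {sparse_b / 1e9:.2f} GB, dense ~ {dense_b / 1e9:.1f} GB")
```

For this made-up corpus the dense representation is hundreds of times larger than the sparse one, which is why a runtime without sparse-tensor support effectively limits TF-IDF pipelines to small data.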
We can start with a utility which tries to convert models; it might fail on any complex case, and we can improve it over time. I suspect a good part of that might later end up in […]. I don't think we have to worry about using ONNX at inference time much yet; the value here is more about people not having to load pickles than about us not doing that. We're also not really focusing on the performance of the inference API at the moment, so that's not an issue yet. People might like it for efficiency or for security.
Regardless of whether […], let's imagine the following use-case: […]
Here it's a question to my understanding of […]. Also, introducing the model Intermediate Representation (IR) from scratch in […]
Your app would most likely not break, because sklearn rarely makes backwards-incompatible changes. However, sklearn doesn't go so far as to guarantee it will never do that, which is why you generally get a warning if you load an sklearn model using a different sklearn version. To give a hypothetical example, in version 1.2.0, sklearn could add a new attribute […]. Coming to the second part of your question: the main objective of the skops persistence format is to present a secure alternative to pickle for the sklearn ecosystem. If this is not clear from our docs, please let us know and we can clarify. Regarding the specific comparison to ONNX, the goals are quite different. For skops, we have:
For ONNX, we have:
There are more differences of course; this is just off the top of my head.
Many thanks for clarifying the boundaries! Indeed, users can then use both […]
We should have convenience methods to convert scikit-learn fitted models to ONNX, and have easy ways to check if a model can be converted to ONNX.
The https://github.com/onnx/sklearn-onnx and https://github.com/sdpython/mlprodict/ projects are where we can start to see how to implement these features.
Once this is in place, we can think of ways to automatically do the conversion, if possible, on the hub side.
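A convenience method along these lines could be a thin wrapper around sklearn-onnx's `convert_sklearn`. The following is a minimal sketch, not a proposed skops API: the name `try_convert_to_onnx` is hypothetical, it assumes `skl2onnx` as an optional dependency, and it treats any conversion exception as "not convertible":

```python
def try_convert_to_onnx(model, n_features):
    """Attempt to convert a fitted sklearn model to ONNX.

    Returns (onnx_model, "ok") on success, or (None, reason) when skl2onnx
    is missing or the model cannot be converted.
    """
    try:
        from skl2onnx import convert_sklearn
        from skl2onnx.common.data_types import FloatTensorType
    except ImportError:
        return None, "skl2onnx is not installed"
    # Declare the expected input: a float tensor with a dynamic batch axis.
    initial_types = [("X", FloatTensorType([None, n_features]))]
    try:
        onnx_model = convert_sklearn(model, initial_types=initial_types)
    except Exception as exc:  # unsupported estimator, custom transformer, etc.
        return None, f"conversion failed: {exc}"
    return onnx_model, "ok"
```

Usage would look like `onnx_model, msg = try_convert_to_onnx(fitted_pipeline, n_features=X.shape[1])`; the `(result, reason)` pair doubles as the "check if a model can be converted" utility, since a `None` result with a reason string is exactly the diagnostic a hub-side check would need.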