Using baikal steps for applying transformations without Model #23
Comments
Hi there. baikal can handle not only transformations but also predictions, so you can make non-linear pipelines combining both (the example in the README shows a pipeline that does that). By default, baikal will detect and use either the wrapped class's predict or transform method, but you can also specify explicitly which function a step should apply:

```python
from baikal import Input, Model, make_step

# Assume you have a class _MyClass that implements
# some_method, which does some interesting computation
class _MyClass:
    def __init__(self, **kwargs):
        ...

    def some_method(self, X):
        # calculate y from X
        return y

# Make the step from _MyClass
MyClass = make_step(_MyClass)

x = Input()
y = MyClass(function="some_method", name="myclass")(x)
model = Model(x, y)
# When doing model.predict, the myclass step will apply some_method on x
```

I wrote the example above based on the API of 0.2.0. The upcoming 0.3.0 version that I'm planning to release soon, however, will introduce a backwards-incompatible API, but it will allow you to reuse steps on different inputs and specify a different function in each case. This is useful, for example, for applying down in the pipeline transformations that were learned up in the pipeline (see the …).
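For concreteness, here is a self-contained variant of the sketch above with some_method filled in. The doubling logic and the input data are hypothetical, just to make the example executable; the pattern (make_step, function=..., Model, predict) is the one from the comment above:

```python
import numpy as np
from baikal import Input, Model, make_step

class _Doubler:
    def some_method(self, X):
        # a stand-in for "some interesting computation"
        return 2 * X

Doubler = make_step(_Doubler)

x = Input()
y = Doubler(function="some_method", name="doubler")(x)
model = Model(x, y)

X_data = np.arange(6).reshape(3, 2)
# predict runs the graph, i.e. applies some_method to X_data
print(model.predict(X_data))
```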
Thanks for the example! That probably does do what I want then, but the call to model.predict seems a bit contrived if I'm only using baikal to do transformations. Maybe a transform method on Model would read more naturally?
Yes, that's a valid point. It is weird to call predict when all you want is to apply transformations (though note that scikit-learn's Pipeline doesn't expose a transform method either). I guess that you want to compose several transformers in models that are further composed into bigger transformer models, so having a transform method could make sense. In the meantime, you could do something like this:

```python
import baikal

# Written on 0.2.0. In 0.3.0 this would be written slightly differently.
class TransformerModel(baikal.Model):
    def transform(self, X, outputs=None):
        # Or you could also override `Model._build` and add this check there
        if not all(step.function == step.transform for step in self.graph):
            raise RuntimeError("All steps must be transformers")
        return self.predict(X, outputs=outputs)
```
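As a hedged usage sketch of the subclass above, reusing the make_step pattern from the earlier comment: the _Scale step and the input data are illustrative assumptions, and whether the step.function guard passes as written depends on baikal 0.2.0 internals not shown in this thread.

```python
import numpy as np
from baikal import Input, make_step

# Hypothetical transformer-only step (not from the thread).
class _Scale:
    def transform(self, X):
        return X / X.max()

Scale = make_step(_Scale)

x = Input()
z = Scale(function="transform", name="scale")(x)

# TransformerModel is the subclass sketched in the comment above.
model = TransformerModel(x, z)

X_data = np.arange(6.0).reshape(3, 2)
# transform reads more naturally for a transformer-only pipeline;
# under the hood it delegates to predict.
Z = model.transform(X_data)
```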
Hmm, I didn't realise that the sklearn Pipeline also doesn't have a transform, good point. It does have a fit_transform though.
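Following that point, a fit_transform could be bolted onto the TransformerModel sketch along these lines. This is purely a hedged sketch: it assumes baikal's Model.fit accepts (X, y), which the thread doesn't show.

```python
import baikal

class TransformerModel(baikal.Model):
    def transform(self, X, outputs=None):
        # same guard as in the comment above
        if not all(step.function == step.transform for step in self.graph):
            raise RuntimeError("All steps must be transformers")
        return self.predict(X, outputs=outputs)

    def fit_transform(self, X, y=None, outputs=None):
        # mirrors scikit-learn's fit_transform contract; assumes
        # Model.fit accepts (X, y), which is not shown in this thread
        self.fit(X, y)
        return self.transform(X, outputs=outputs)
```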
Closing due to inactivity. If you have any other questions feel free to re-open. |
In some of our projects, we have ETL/preprocessing pipelines that take multiple inputs and produce a single output dataset. In some current implementations we've been using the scikit-learn transformer/pipeline API to transform individual datasets before then combining them with a join/merge and applying some (optional) postprocessing on the merged dataset using another sklearn pipeline.
A drawback of this approach is that we have to intersperse our transformer steps with merges, which don't fit in the sklearn pipeline API. Baikal would seem like a nice approach for defining (non-linear) transformer pipelines that take multiple inputs, but it doesn't seem as if you can use baikal for only performing transformations (e.g. .transform(...) in the sklearn API).
Am I missing something in the API, or would this be something that might be interesting to include for future development?
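For what it's worth, here is a rough sketch of how that multi-input, merge-in-the-middle use case might look with the make_step pattern from the comments above. Everything in it (the _FillNA and _Join classes, the data, and the assumption that a step called on a list of inputs receives their data as separate positional arguments) is illustrative, not part of baikal's shipped API:

```python
import pandas as pd
from baikal import Input, Model, make_step

# Hypothetical per-dataset preprocessing step.
class _FillNA:
    def transform(self, X):
        return X.fillna(0)

# Hypothetical merge step; joins the already-transformed frames on their index.
class _Join:
    def transform(self, *dfs):
        merged, *rest = dfs
        for df in rest:
            merged = merged.join(df)
        return merged

FillNA = make_step(_FillNA)
Join = make_step(_Join)

x1 = Input()
x2 = Input()
z1 = FillNA(function="transform", name="fill_orders")(x1)
z2 = FillNA(function="transform", name="fill_customers")(x2)
y = Join(function="transform", name="join")([z1, z2])

model = Model([x1, x2], y)

orders = pd.DataFrame({"amount": [10.0, None]}, index=["a", "b"])
customers = pd.DataFrame({"segment": [1, 2]}, index=["a", "b"])

# As in the earlier example, predict runs the graph: each input is filled,
# then the two frames are joined into the single output dataset.
merged = model.predict([orders, customers])
```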