
Using baikal steps for applying transformations without Model #23

Closed

jrderuiter opened this issue Jan 15, 2020 · 5 comments

Comments

@jrderuiter

In some of our projects we have ETL/preprocessing pipelines that take multiple inputs and produce a single output dataset. In our current implementations we use the scikit-learn transformer/pipeline API to transform the individual datasets, then combine them with a join/merge and apply some (optional) postprocessing to the merged dataset using another sklearn pipeline.

A drawback of this approach is that we have to intersperse our transformer steps with merges, which don't fit into the sklearn pipeline API. Baikal seems like a nice way to define (non-linear) transformer pipelines that take multiple inputs, but it doesn't look like you can use baikal for only performing transformations (e.g. .transform(..) in the sklearn API).
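
To make that concrete, here is a simplified sketch of what we currently do (the dataframes, transformers and merge are made up for illustration):

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy input datasets standing in for our real inputs
df_a = pd.DataFrame({"a1": [1.0, 2.0, 3.0]})
df_b = pd.DataFrame({"b1": [4.0, 5.0, 6.0]})

# Per-dataset preprocessing with plain sklearn pipelines
pipeline_a = Pipeline([("scale", StandardScaler())])
pipeline_b = Pipeline([("scale", StandardScaler())])

df_a_t = pd.DataFrame(pipeline_a.fit_transform(df_a), index=df_a.index, columns=df_a.columns)
df_b_t = pd.DataFrame(pipeline_b.fit_transform(df_b), index=df_b.index, columns=df_b.columns)

# The merge has to happen outside the sklearn pipeline API
merged = df_a_t.join(df_b_t, how="inner")

# Optional postprocessing on the merged dataset with yet another pipeline
post_pipeline = Pipeline([("scale", StandardScaler())])
result = post_pipeline.fit_transform(merged)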

Am I missing something in the API, or would this be something that might be interesting to include for future development?

@alegonz
Owner

alegonz commented Jan 16, 2020

Hi there.

baikal can handle not only transformations but also predictions, so you can build non-linear pipelines that combine both (the example in the README shows a pipeline that does exactly that). By default, baikal will detect and use either predict or transform (whichever the class implements), but you can specify any function you like via the function argument when instantiating the step. For example:

from baikal import Input, Model, make_step

# Assume you have a class _MyClass that implements
# some_method, which does some interesting computation

class _MyClass:
    def __init__(self, some_param=1):
        self.some_param = some_param

    def fit(self, X, y=None):
        # no-op fit so the step can participate in model.fit
        return self

    def some_method(self, X):
        # calculate y from X
        return X * self.some_param

# Make the step class from _MyClass
MyClass = make_step(_MyClass)

x = Input()
y = MyClass(function="some_method", name="myclass")(x)
model = Model(x, y)

# When calling model.predict, the myclass step will apply some_method to x
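
Used on some data, that looks like this (X_data below is just a placeholder array):

import numpy as np

X_data = np.array([[1.0], [2.0], [3.0]])
model.fit(X_data)               # fit is a no-op for this step
y_pred = model.predict(X_data)  # applies some_method to X_data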

I wrote the example above against the 0.2.0 API. The upcoming 0.3.0 version, which I'm planning to release soon, introduces a backwards-incompatible API, but it will let you reuse steps on different inputs and specify a different function in each case. This is useful, for example, for applying further down the pipeline a transformation that was learned earlier in the pipeline (see the transformed_target example in the master branch); a rough sketch of that pattern is below. I give more details about 0.3.0 in Issue #16.
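
Roughly, the pattern looks like this (a sketch only; the call-time compute_func and trainable arguments are how the 0.3.0 API is shaping up, so the exact names may still change):

from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import QuantileTransformer
from baikal import Input, Model, make_step

QuantileTransformerStep = make_step(QuantileTransformer)
RidgeCVStep = make_step(RidgeCV)

x = Input()
y_t = Input()

transformer = QuantileTransformerStep(n_quantiles=10, output_distribution="normal")

# Learn the target transformation and train the regressor on the transformed target
y_t_trans = transformer(y_t)
y_p_trans = RidgeCVStep()(x, y_t_trans)

# Reuse the same transformer step further down the pipeline,
# this time computing the inverse transform and without refitting it
y_p = transformer(y_p_trans, compute_func="inverse_transform", trainable=False)

model = Model(x, y_p, y_t)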

@jrderuiter
Author

Thanks for the example! That probably does what I want, but calling model.predict feels a bit contrived if I'm only using baikal for transformations. Maybe a pipeline.transform method would be more natural?

@alegonz
Owner

alegonz commented Jan 19, 2020

Yes, that's a valid point. It is weird to call predict on a model that is composed entirely of transformer steps. But if transform were implemented, you would have the opposite problem: how should that method behave for models that contain both transformers and predictors? When I defined the API I picked predict because 1) it seemed the least weird, and 2) it is similar to sklearn's Pipeline (which does not have a pipeline.transform either) and to Keras' Model.predict, so people would be more familiar with it.

I guess that you want to compose several transformers into models that are further composed into bigger transformer models, so having Model.transform would be convenient and more readable. In that case you could subclass Model to add the behavior specific to your application:

# Written against 0.2.0. In 0.3.0 this would be written slightly differently.
import baikal


class TransformerModel(baikal.Model):
    def transform(self, X, outputs=None):
        # Alternatively, override `Model._build` and add this check there
        if not all(step.function == step.transform for step in self.graph):
            raise RuntimeError("All steps must be transformers")

        return self.predict(X, outputs=outputs)
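
Usage would then look roughly like this. The scaler steps and the Concatenate merge are just stand-ins for whatever you build with make_step, and whether the transformer check above passes as written for the Input and merge steps is something you would have to verify, so treat this purely as a sketch:

import numpy as np
from sklearn.preprocessing import StandardScaler
from baikal import Input, make_step
from baikal.steps import Concatenate

Scaler = make_step(StandardScaler)

x1 = Input()
x2 = Input()

x1_t = Scaler(name="scale1")(x1)
x2_t = Scaler(name="scale2")(x2)
merged = Concatenate(name="merge")([x1_t, x2_t])  # stand-in for however you merge the datasets

model = TransformerModel([x1, x2], merged)

X1_data = np.array([[1.0], [2.0], [3.0]])
X2_data = np.array([[4.0], [5.0], [6.0]])

model.fit([X1_data, X2_data])
out = model.transform([X1_data, X2_data])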

@jrderuiter
Author

Hmm, I didn't realise that the Sklearn pipeline also doesn't have a transform, good point. It does have a fit_transform though.

@alegonz
Owner

alegonz commented Nov 15, 2020

Closing due to inactivity. If you have any other questions feel free to re-open.

@alegonz alegonz closed this as completed Nov 15, 2020