Avoid passing `deterministic` to complicated nested modules by using the variable dict
#2928
I have recently been working with modules that have complicated nested structures, and I find it tedious to have to pass parameters like `deterministic` down through every level, as in this WMT example. I think we can do something similar here in Flax by making use of the variable dict and passing global default values for these parameters:

```python
class Dropout(Module):
  rate: float
  broadcast_dims: Sequence[int] = ()
  deterministic: Optional[bool] = None

  @compact
  def __call__(self, inputs, deterministic: Optional[bool] = None):
    # use the variable `training` to indicate deterministic
    eval_mode = (not self.variable(col='properties', name='training').value
                 if self.has_variable(col='properties', name='training')
                 else None)
    deterministic = merge_param('deterministic',
                                self.deterministic,
                                deterministic,
                                eval_mode)
    ...
    # apply dropout if not deterministic
```
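For clarity, here is a minimal self-contained sketch of the precedence logic the proposal relies on. This is a hypothetical generalisation for illustration only, not Flax's actual `merge_param` (which takes exactly two candidate values): earlier candidates (the constructor attribute, then the call argument) take precedence over later ones (the variable-dict default).

```python
from typing import Optional


def merge_param(name: str, *candidates: Optional[bool]) -> bool:
    """Return the first non-None candidate, in priority order.

    Hypothetical helper: earlier candidates (constructor attribute,
    call argument) override later ones (the variable-dict default).
    """
    for value in candidates:
        if value is not None:
            return value
    raise ValueError(f"Parameter {name!r} must be set somewhere.")


# constructor attribute and call argument unset: the variable-dict
# default (eval_mode) decides
assert merge_param('deterministic', None, None, False) is False
# an explicit call argument overrides the variable-dict default
assert merge_param('deterministic', None, True, False) is True
```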
```python
class StochasticDenseBlock(Module):

  @compact
  def __call__(self, inputs):
    # no need to pass deterministic to Dropout
    x = Dense(5)(inputs)
    x = Dropout(.5)(x)
    return relu(x)


class Model(Module):
  out_layer: Module

  @compact
  def __call__(self, inputs):
    # no need to pass deterministic to out_layer or any StochasticDenseBlock
    x = StochasticDenseBlock()(inputs)
    x = StochasticDenseBlock()(x)
    x = StochasticDenseBlock()(x)
    x = self.out_layer(x)
    return x
```
```python
mdl = Model(StochasticDenseBlock())
rng = random.PRNGKey(0)
x = np.ones(10)
params = mdl.init(
    rngs=dict(params=rng),
    inputs=x,
    # pass global default values for variables absent from the variable dict;
    # this does not need to mirror the nested module structure
    default={'properties': {'training': False}},
)['params']


def mdl_apply(inputs, training, rng=None):
  return mdl.apply(dict(params=params),
                   inputs=inputs,
                   rngs=dict(dropout=rng),
                   default={'properties': {'training': training}})


mdl_apply_train = jit(partial(mdl_apply, training=True))
mdl_apply_eval = jit(partial(mdl_apply, training=False))
y1 = mdl_apply_train(x, rng=rng)
y2 = mdl_apply_eval(x)
```

Following this style, we only need to handle `deterministic` once, inside `Dropout` itself. This change maintains backward compatibility with code that passes `deterministic` explicitly. I have not taken a deep look at the Flax internals, but we can probably add a `default` argument to `init` and `apply` to support this. A similar question has been asked here: #1561
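For contrast, the explicit threading that this proposal removes can be sketched in plain Python. These are toy classes, not Flax modules, but they show how a flag like `deterministic` must currently be accepted and forwarded at every level of the module tree:

```python
import random as pyrandom


class Dropout:
    def __init__(self, rate):
        self.rate = rate

    def __call__(self, xs, deterministic):
        if deterministic:
            return xs
        keep = 1.0 - self.rate
        return [0.0 if pyrandom.random() < self.rate else x / keep
                for x in xs]


class DenseBlock:
    def __call__(self, xs, deterministic):      # flag accepted here...
        return Dropout(0.5)(xs, deterministic)  # ...only to forward it


class Model:
    def __call__(self, xs, deterministic):      # ...and threaded here too
        for _ in range(3):
            xs = DenseBlock()(xs, deterministic)
        return xs


# in eval mode the inputs pass through unchanged
assert Model()([1.0] * 10, deterministic=True) == [1.0] * 10
```

Every intermediate class takes a `deterministic` argument only to pass it along; with the variable-dict default, only the leaf `Dropout` and the top-level `init`/`apply` calls would mention it.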
Reply:

First thing to note is that Flax is very explicit about everything; it doesn't try to do anything for you, to give you maximum control. That said, I share the sentiment, and in the past created the Scope Flags FLIP (#2131) to try to minimize passing down these parameters; take a look at some of the comments there.

The current situation is that flags are indeed implemented (`Module.scope.flags` exists), but currently we only use them to power the `Module.is_initializing` method; we don't expose them, and our layers don't use them (apart from their use of `is_initializing`). In theory we could have something like this:

```python
y = module.apply({'params': params}, x, flags={'deterministic': False})
```

I'll try to bring up this idea again with the team, pointing to this use case, to see if there is renewed interest.
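To make the flags idea concrete, here is a toy sketch of how a flag set at the root scope could become visible in arbitrarily nested child scopes. This is purely illustrative pseudocode of the propagation idea, with invented names; it does not reflect Flax's actual `Scope` internals:

```python
class Scope:
    """Toy scope: flags set at the root are visible in every child scope."""

    def __init__(self, flags=None, parent=None):
        self._flags = flags or {}
        self._parent = parent

    def child(self):
        # child scopes carry no flags of their own; lookups walk upward
        return Scope(parent=self)

    def flag(self, name, default=None):
        if name in self._flags:
            return self._flags[name]
        if self._parent is not None:
            return self._parent.flag(name, default)
        return default


root = Scope(flags={'deterministic': False})
inner = root.child().child()  # arbitrarily deep nesting
assert inner.flag('deterministic') is False
assert inner.flag('unset', default=True) is True
```

A leaf layer could then consult `self.scope.flag('deterministic')` as one more candidate in its parameter-merging logic, without any intermediate module mentioning the flag.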