Add padding_value attribute to features #1020
base: master
Conversation
That's smart! As a side note, I was also considering whether we should support more nuanced padding strategies, e.g. padding with the mean example/batch value (so that it doesn't influence things such as layer norm), but I'm not sure there's much benefit to it.
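As a rough sketch of that mean-padding idea (the helper name and shapes here are hypothetical, not part of this PR):

```python
import numpy as np

def pad_with_mean(feats: np.ndarray, num_pad_frames: int) -> np.ndarray:
    """Hypothetical strategy: pad with the per-dimension mean of the example,
    so statistics used by e.g. layer norm are perturbed less than with a
    constant floor value."""
    pad = np.tile(feats.mean(axis=0, keepdims=True), (num_pad_frames, 1))
    return np.concatenate([feats, pad], axis=0)
```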
lhotse/features/mixer.py (Outdated)

```diff
-        self.padding_value = padding_value
+        self.padding_value = (
+            feature_extractor.padding_value
+            if hasattr(feature_extractor, "padding_value")
```
I think since this behavior is undocumented in the PR's current shape, it may lead to some surprises and be hard for implementers of other feature extractors to discover. What if we implemented this property on the base feature extractor class, made it return None there, and checked for None here instead? Then we can add a proper docstring in the feature extractor / documentation.
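A minimal sketch of that proposal, reusing the names visible in this diff (not the final code):

```python
from typing import Optional

class FeatureExtractor:
    @property
    def padding_value(self) -> Optional[float]:
        # None signals "this extractor declares no padding default";
        # a proper docstring would live here in the base class.
        return None

class FeatureMixer:
    def __init__(self, feature_extractor: FeatureExtractor, padding_value: float):
        # Check for None instead of hasattr(), so the behavior is part of the
        # documented base-class contract rather than an implicit convention.
        self.padding_value = (
            feature_extractor.padding_value
            if feature_extractor.padding_value is not None
            else padding_value
        )
```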
Yeah, that makes sense. I will update it.
```diff
@@ -193,6 +193,10 @@ def __init__(self, config: Optional[KaldifeatFbankConfig] = None) -> None:
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.mel_opts.num_bins
 
+    @property
+    def padding_value(self) -> float:
```
@csukuangfj could you check if this looks correct?
lhotse/features/base.py (Outdated)

```diff
@@ -90,6 +93,10 @@ def frame_shift(self) -> Seconds:
     def feature_dim(self, sampling_rate: int) -> int:
         ...
 
+    @property
+    def padding_value(self) -> float:
```
The type hint for the return value does not match the actual return value.
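Presumably this refers to the base-class property returning None while being annotated as `float`; a minimal fix, assuming the None default discussed above:

```python
from typing import Optional

class FeatureExtractor:
    @property
    def padding_value(self) -> Optional[float]:
        # Optional[float] matches the actual None default returned here.
        return None
```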
```diff
@@ -193,6 +193,10 @@ def __init__(self, config: Optional[KaldifeatFbankConfig] = None) -> None:
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.mel_opts.num_bins
 
+    @property
+    def padding_value(self) -> float:
+        return -1000.0 if self.config.use_log_fbank else EPSILON
```
Should -1000 be replaced with `LOG_EPSILON`?

Line 46 in 14b51f6:

```python
LOG_EPSILON = math.log(EPSILON)
```
Yes, ideally it should, but I was basing this on the default value used in the `FeatureMixer`.
lhotse/features/mixer.py (Outdated)

```diff
@@ -41,7 +41,8 @@ def __init__(
     :param frame_shift: Required to correctly compute offset and padding during the mix.
     :param padding_value: The value used to pad the shorter features during the mix.
         This value is adequate only for log space features. For non-log space features,
-        e.g. energies, use either 0 or a small positive value like 1e-5.
+        e.g. energies, use either 0 or a small positive value like 1e-5. This value will be
+        ignore if the ``feature_extractor`` has a ``padding_value`` attribute.
```
Suggested change:

```diff
-        ignore if the ``feature_extractor`` has a ``padding_value`` attribute.
+        ignored if the ``feature_extractor`` has a ``padding_value`` attribute.
```
Should this user-provided argument have a higher priority?
Perhaps the default value for this option can be set to None. If it is None, we use the value from the feature extractor; otherwise we use the user-provided value. But I'm not sure what should be done if they are both None. Raise an error? @pzelasko what do you think?
I like your solution. Raising an error in this case is acceptable if you can add the default padding value attributes to the remaining feature extractors.
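A minimal sketch of the agreed precedence (the helper name `resolve_padding_value` is hypothetical, not the merged code):

```python
from typing import Optional

def resolve_padding_value(
    user_value: Optional[float], extractor_value: Optional[float]
) -> float:
    # The user-provided value takes priority; fall back to the extractor's
    # default; raise if neither is available.
    if user_value is not None:
        return user_value
    if extractor_value is not None:
        return extractor_value
    raise ValueError(
        "No padding value provided and the feature extractor defines no default."
    )
```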
I have propagated the padding-value-related changes to all the affected parts of the codebase. There are a few subtle things which are causing some of the test cases to fail. Basically, when we add a padding cut to a mixed cut such that the resulting duration increases, padding gets added twice: the original cut is first extended to the new duration with the padding value, and then the feats of the mixed-in cut are added on top. One way to avoid this is to pass an additional …
This would still cause inconsistencies though. Consider the following case:

```python
# c is a 10s MonoCut, for example
m1 = c.mix(c)
m2 = m1.mix(c)
m1_padded = m1.pad(duration=15)
m2_padded = m2.pad(duration=15)
```

If we keep adding tracks to the mixture, the feature value of the padding region will keep increasing, whereas ideally we only want padding to happen once.
This was not caught in the test case before because the default padding value used was -1000, and …
I suppose this means that the easy solution is to use a very large negative number (like -1000) instead of LOG_EPSILON (which is about -23) for the padding value of log-scale features.
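A quick numeric check of that drift, assuming the mixer sums log features in the linear (energy) domain before taking the log again:

```python
import math

EPSILON = 1e-10
LOG_EPSILON = math.log(EPSILON)  # about -23.03

# Each mixed-in track adds another energy floor to the padded region:
for k in (1, 2, 3):
    print(k, math.log(k * EPSILON))  # -23.03, -22.33, -21.93

# With a floor of -1000, exp(-1000) underflows to 0.0 in float64, so
# repeated mixing leaves the padded region numerically unchanged:
print(math.exp(-1000.0))  # 0.0
```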
Hmm, now I remember that's why I hardcoded -1000 in the feature mixer in the first place. I am a bit afraid that changing -23 (which is log(energy_floor) for the default energy floor in our feature extractors) to -1000 could somehow negatively affect the models as a "too out of distribution" value, especially if somebody uses layer norm or instance MVN. I agree the most elegant solution would be to pad once and have the feature mixer resolve that in a "smart" way to avoid the addition of energy floors, but maybe this is not really an issue? WDYT?
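One possible reading of that "smart" mixing idea, as a sketch only (the function and its semantics are assumptions, not this PR's implementation): padded frames are treated as truly silent so energy floors don't accumulate.

```python
import numpy as np

def mix_log_feats(a: np.ndarray, b: np.ndarray, padding_value: float) -> np.ndarray:
    # Treat frames at or below the padding value as zero energy, so that
    # repeated mixing does not keep adding energy floors in padded regions.
    ea = np.where(a <= padding_value, 0.0, np.exp(a))
    eb = np.where(b <= padding_value, 0.0, np.exp(b))
    mixed = ea + eb
    out = np.full_like(mixed, padding_value)
    nonzero = mixed > 0.0
    out[nonzero] = np.log(mixed[nonzero])
    return out
```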
```diff
@@ -54,6 +54,10 @@ def _feature_fn(self, *args, **kwargs):
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.num_mel_bins
 
+    @property
+    def padding_value(self) -> float:
+        return -1000.0
```
Is there a reason to choose -1000.0 instead of `LOG_EPSILON` or just `0`?
Check the discussion in the main thread.
At the moment, when we are mixing features, we expect the user to specify a padding value. It would be better if the feature extractor itself came with a default value that should be used for padding.