Add padding_value attribute to features #1020
base: master
Conversation
That's smart! As a side note, I was also considering whether we should support more nuanced padding strategies, e.g. padding with the mean example/batch value (so that it doesn't influence things such as layer norm), but I'm not sure there's much benefit to it.
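As a rough sketch of that mean-padding idea (the helper name and shapes here are hypothetical, not part of this PR):

```python
import numpy as np

def pad_with_mean(feats: np.ndarray, num_pad_frames: int) -> np.ndarray:
    """Hypothetical strategy: pad with the per-dimension mean of the example,
    so statistics used by e.g. layer norm are perturbed less than with a
    constant floor value."""
    pad = np.tile(feats.mean(axis=0, keepdims=True), (num_pad_frames, 1))
    return np.concatenate([feats, pad], axis=0)
```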
lhotse/features/mixer.py (Outdated)

```diff
-        self.padding_value = padding_value
+        self.padding_value = (
+            feature_extractor.padding_value
+            if hasattr(feature_extractor, "padding_value")
```
I think since this behavior is undocumented in the PR's current shape, it may lead to some surprises and be hard for implementers of other feature extractors to discover. What if we implemented this property on the base feature extractor class, made it return None there, and checked for None here instead? Then we can add a proper docstring in the feature extractor / documentation.
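A minimal sketch of that proposal, reusing the names visible in this diff (not the final code):

```python
from typing import Optional

class FeatureExtractor:
    @property
    def padding_value(self) -> Optional[float]:
        # None signals "this extractor declares no padding default";
        # a proper docstring would live here in the base class.
        return None

class FeatureMixer:
    def __init__(self, feature_extractor: FeatureExtractor, padding_value: float):
        # Check for None instead of hasattr(), so the behavior is part of the
        # documented base-class contract rather than an implicit convention.
        self.padding_value = (
            feature_extractor.padding_value
            if feature_extractor.padding_value is not None
            else padding_value
        )
```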
Yeah, that makes sense. I will update it.
```diff
@@ -193,6 +193,10 @@ def __init__(self, config: Optional[KaldifeatFbankConfig] = None) -> None:
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.mel_opts.num_bins
 
+    @property
+    def padding_value(self) -> float:
```
@csukuangfj could you check if this looks correct?
lhotse/features/base.py (Outdated)

```diff
@@ -90,6 +93,10 @@ def frame_shift(self) -> Seconds:
     def feature_dim(self, sampling_rate: int) -> int:
         ...
 
+    @property
+    def padding_value(self) -> float:
```
The type hint for the return value does not match the actual return value.
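Presumably this refers to the base-class property returning None while being annotated as `float`; a minimal fix, assuming the None default discussed above:

```python
from typing import Optional

class FeatureExtractor:
    @property
    def padding_value(self) -> Optional[float]:
        # Optional[float] matches the actual None default returned here.
        return None
```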
```diff
@@ -193,6 +193,10 @@ def __init__(self, config: Optional[KaldifeatFbankConfig] = None) -> None:
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.mel_opts.num_bins
 
+    @property
+    def padding_value(self) -> float:
+        return -1000.0 if self.config.use_log_fbank else EPSILON
```
Should -1000 be replaced with `LOG_EPSILON`?

Line 46 in 14b51f6:

```python
LOG_EPSILON = math.log(EPSILON)
```
Yes, ideally it should, but I was basing this on the default value used in the `FeatureMixer`.
lhotse/features/mixer.py (Outdated)

```diff
@@ -41,7 +41,8 @@ def __init__(
     :param frame_shift: Required to correctly compute offset and padding during the mix.
     :param padding_value: The value used to pad the shorter features during the mix.
         This value is adequate only for log space features. For non-log space features,
-        e.g. energies, use either 0 or a small positive value like 1e-5.
+        e.g. energies, use either 0 or a small positive value like 1e-5. This value will be
+        ignore if the ``feature_extractor`` has a ``padding_value`` attribute.
```
Suggested change:

```diff
-        ignore if the ``feature_extractor`` has a ``padding_value`` attribute.
+        ignored if the ``feature_extractor`` has a ``padding_value`` attribute.
```
Should this user-provided argument have a higher priority?
Perhaps the default value for this option can be set to None. If it is None, we use the value from the feature extractor; otherwise we use the user-provided value. But I'm not sure what should be done if they are both None. Raise an error? @pzelasko what do you think?
I like your solution. Raising an error in this case is acceptable if you can add the default padding value attributes to the remaining feature extractors.
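A minimal sketch of the agreed precedence (the helper name `resolve_padding_value` is hypothetical, not the merged code):

```python
from typing import Optional

def resolve_padding_value(
    user_value: Optional[float], extractor_value: Optional[float]
) -> float:
    # The user-provided value takes priority; fall back to the extractor's
    # default; raise if neither is available.
    if user_value is not None:
        return user_value
    if extractor_value is not None:
        return extractor_value
    raise ValueError(
        "No padding value provided and the feature extractor defines no default."
    )
```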
I have propagated the padding-value-related changes to all the affected parts of the codebase. There are a few subtle things which are causing some of the test cases to fail. Basically, when we add a padding cut to a mixed cut such that the resulting duration increases, padding gets added twice: the original cut is first extended to the new duration with the padding value, and then the feats of the mixed-in cut are added on top. One way to avoid this is to pass an additional …
This would still cause inconsistencies though. Consider the following case:

```python
# c is a 10s MonoCut, for example
m1 = c.mix(c)
m2 = m1.mix(c)
m1_padded = m1.pad(duration=15)
m2_padded = m2.pad(duration=15)
```

If we keep adding tracks to the mixture, the feature value of the padding region will keep increasing, whereas ideally we only want padding to happen once.
This was not caught in the test case before because the default padding value used was -1000, and …
I suppose this means that the easy solution is to use a very large negative number (like -1000) instead of LOG_EPSILON (which is about -23) for the padding value of log-scale features.
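A quick numeric check of that drift, assuming the mixer sums log features in the linear (energy) domain before taking the log again:

```python
import math

EPSILON = 1e-10
LOG_EPSILON = math.log(EPSILON)  # about -23.03

# Each mixed-in track adds another energy floor to the padded region:
for k in (1, 2, 3):
    print(k, math.log(k * EPSILON))  # -23.03, -22.33, -21.93

# With a floor of -1000, exp(-1000) underflows to 0.0 in float64, so
# repeated mixing leaves the padded region numerically unchanged:
print(math.exp(-1000.0))  # 0.0
```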
Hmm, now I remember that's why I hardcoded -1000 in the feature mixer in the first place. I am a bit afraid that changing -23 (which is log(energy_floor) for the default energy floor in our feature extractors) to -1000 could somehow negatively affect the models as a "too out of distribution" value, especially if somebody uses layer norm or instance MVN. I agree the most elegant solution would be to pad once and have the feature mixer resolve that in a "smart" way to avoid the addition of energy floors, but maybe this is not really an issue? WDYT?
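One possible reading of that "smart" mixing idea, as a sketch only (the function and its semantics are assumptions, not this PR's implementation): padded frames are treated as truly silent so energy floors don't accumulate.

```python
import numpy as np

def mix_log_feats(a: np.ndarray, b: np.ndarray, padding_value: float) -> np.ndarray:
    # Treat frames at or below the padding value as zero energy, so that
    # repeated mixing does not keep adding energy floors in padded regions.
    ea = np.where(a <= padding_value, 0.0, np.exp(a))
    eb = np.where(b <= padding_value, 0.0, np.exp(b))
    mixed = ea + eb
    out = np.full_like(mixed, padding_value)
    nonzero = mixed > 0.0
    out[nonzero] = np.log(mixed[nonzero])
    return out
```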
```diff
@@ -54,6 +54,10 @@ def _feature_fn(self, *args, **kwargs):
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.num_mel_bins
 
+    @property
+    def padding_value(self) -> float:
+        return -1000.0
```
Is there a reason to choose -1000.0 instead of `LOG_EPSILON` or just `0`?
Check the discussion in the main thread.
At the moment, when we are mixing features, we expect the user to specify a padding value. It would be better if the feature extractor itself came with a default value that should be used for padding.