In the code, there is a 5-element vector holding the logits of each state for each pitch. First of all, what is the label of each state? Creating an enum to represent the states would improve code readability. BTW, I think the order of states is:
0: off
1: offset
2: on
3: onset
4: re-onset.
Is this correct?
And, why do you double the logits of the two last states?
Yes, the order of states is as you listed.
The reason I added language_out[0,0,:,3:5] *= 2 was to give additional weight to the onset (and re-onset) states, so that the model achieves higher recall. Of course this degrades precision, but given the acoustic environment in which I had to demonstrate the system, I found that this compensation gives a preferable result. You can change or delete it depending on your use case.
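For readers following along, here is a minimal sketch of how the suggested enum and the onset weighting could fit together. It is not the repository's code: the names NoteState and boost_onsets are hypothetical, NumPy stands in for whatever tensor library the project uses, and the (batch, time, pitch, state) layout is only inferred from the language_out[0,0,:,3:5] indexing quoted above.

```python
from enum import IntEnum

import numpy as np


class NoteState(IntEnum):
    """Hypothetical labels for the 5 per-pitch states, in the order confirmed above."""
    OFF = 0
    OFFSET = 1
    ON = 2
    ONSET = 3
    REONSET = 4


def boost_onsets(logits, factor=2.0):
    """Return a copy of `logits` with the ONSET and REONSET entries scaled by `factor`.

    The last axis is assumed to index the five states.
    """
    boosted = np.array(logits, dtype=float)  # np.array copies, so the input is untouched
    boosted[..., NoteState.ONSET:NoteState.REONSET + 1] *= factor
    return boosted


# Made-up logits with an assumed (batch, time, pitch, state) layout, two pitches.
language_out = np.zeros((1, 1, 2, 5))
language_out[0, 0, 0] = [1.0, 0.0, 0.5, 0.9, 0.1]  # onset is a close second to off
language_out[0, 0, 1] = [2.0, 0.0, 0.5, 0.3, 0.1]  # off clearly dominates

boosted = boost_onsets(language_out)
before = language_out.argmax(axis=-1)  # most likely state per pitch, unweighted
after = boosted.argmax(axis=-1)        # most likely state per pitch, onsets weighted

print([NoteState(int(s)).name for s in before[0, 0]])
print([NoteState(int(s)).name for s in after[0, 0]])
```

In this toy input the first pitch flips from OFF to ONSET once the weight is applied, while the second stays OFF, which illustrates the recall/precision trade-off described above. One thing to keep in mind: multiplying a logit by 2 raises it only when the logit is positive; a negative onset logit is pushed further down instead.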
online_amt/transcribe.py
Line 111 in cbcc906