Questions about the code #6

AliKarimi95 · 2022-07-03T09:43:00Z

In the code, each pitch has a 5-element vector holding the logits for its possible states. First of all, what is the label of each state? Creating an enum to represent the states would make the code more readable. BTW, I think the order of states is

0: off
1: offset
2: on
3: onset
4: re-onset.

Is this correct?
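For illustration, the enum could look something like the sketch below. It assumes the order listed above is right; the name NoteState and the example scores are hypothetical and not taken from the repository.

```python
from enum import IntEnum


class NoteState(IntEnum):
    """Index of each state in the 5-element per-pitch vector (assumed order)."""
    OFF = 0
    OFFSET = 1
    ON = 2
    ONSET = 3
    REONSET = 4


# Hypothetical scores for a single pitch, one value per state.
scores = [0.1, 0.0, 0.2, 1.5, 0.3]

# Pick the highest-scoring state and read it back by name instead of by index.
best = NoteState(scores.index(max(scores)))
print(best.name)
```

With names like these, a slice such as 3:5 could be written as NoteState.ONSET : NoteState.REONSET + 1, which makes it clearer which states are being touched.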

Also, why do you double the logits of the last two states?

online_amt/transcribe.py

Line 111 in cbcc906

language_out[0,0,:,3:5] *= 2

jdasam · 2022-07-03T13:00:44Z

Hello,
sorry for the bad documentation.

Yes, the order of states is as you listed.
I added language_out[0,0,:,3:5] *= 2 to give additional weight to onset (and re-onset), so that the model achieves higher recall. This of course degrades precision, but given the acoustic environment in which I had to demonstrate this system, I found that this compensation gives a preferable result. You can change or delete it depending on your use case.
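A toy sketch of the effect, in plain Python rather than on the actual tensor: scaling the entries at indices 3 and 4 (onset and re-onset) can flip the winning state from "on" to "onset", which is how recall for onsets goes up at the cost of precision. The helper names and score values here are made up for illustration, and the scores are assumed to be non-negative; for a negative raw logit, multiplying by 2 would lower it instead.

```python
def boost_onsets(scores, factor=2.0):
    """Return a copy of `scores` with indices 3 and 4 (onset, re-onset) scaled."""
    boosted = list(scores)
    for i in range(3, 5):
        boosted[i] *= factor
    return boosted


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical non-negative scores for one pitch:
# [off, offset, on, onset, re-onset]
scores = [0.10, 0.05, 0.45, 0.30, 0.10]
boosted = boost_onsets(scores)

print(argmax(scores))   # winning state before scaling
print(argmax(boosted))  # winning state after scaling
```

Setting factor to 1.0 in this sketch corresponds to deleting the line, and values between 1 and 2 would give a milder trade-off between recall and precision.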
