Questions about the code #6

Open
AliKarimi95 opened this issue Jul 3, 2022 · 1 comment

Comments

@AliKarimi95

In the code, each pitch is represented by a 5-element vector holding the logits of its possible states. First of all, what is the label of each state? Creating an enum to represent the states would improve the code's readability. BTW, I think the order of the states is:

  • 0: off
  • 1: offset
  • 2: on
  • 3: onset
  • 4: re-onset.

Is this correct?
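
To illustrate the enum suggestion, here is a minimal sketch that assumes the order listed above. The names `NoteState`, `OFF`, `OFFSET`, `ON`, `ONSET`, and `REONSET` are hypothetical and do not come from the repository.

```python
from enum import IntEnum


class NoteState(IntEnum):
    """Per-pitch note states; the index order is the one assumed in this issue."""
    OFF = 0
    OFFSET = 1
    ON = 2
    ONSET = 3
    REONSET = 4


# IntEnum members behave like ints, so they can be used directly as indices
# into a 5-element state vector.
example_scores = [0.1, 0.0, 0.2, 0.6, 0.1]  # made-up values for illustration
onset_score = example_scores[NoteState.ONSET]
```

With something like this, a slice such as `3:5` could be written as `NoteState.ONSET:NoteState.REONSET + 1`, which makes it clearer which states are being touched.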

Also, why do you double the logits of the last two states?

language_out[0,0,:,3:5] *= 2

@jdasam
Owner

jdasam commented Jul 3, 2022

Hello,
sorry for the poor documentation.

Yes, the order of the states is as you listed.
I added language_out[0,0,:,3:5] *= 2 to give extra weight to the onset and re-onset states, so that the model achieves higher recall. This does lower the precision, but given the acoustic environment in which I had to demonstrate the system, I found that this compensation gave a preferable result. You can change or remove it depending on your use case.
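
If you want to experiment with this trade-off, a rough sketch of making the weight adjustable is below. It is not code from the repository: `weight_onsets`, `predicted_state`, and `onset_weight` are hypothetical names, plain Python lists stand in for one frame of the tensor, and the scores are made up. It also assumes the scores are non-negative; multiplying a negative logit by 2 would push that state down rather than up.

```python
ONSET, REONSET = 3, 4  # state indices, in the order confirmed in this thread


def weight_onsets(frame, onset_weight=2.0):
    """Return a copy of `frame` with the onset and re-onset scores scaled.

    `frame` is a list with one 5-element score list per pitch.
    """
    weighted = []
    for scores in frame:
        scores = list(scores)  # copy so the input is left untouched
        for state in (ONSET, REONSET):
            scores[state] *= onset_weight
        weighted.append(scores)
    return weighted


def predicted_state(scores):
    """Index of the highest-scoring state for one pitch."""
    return max(range(len(scores)), key=scores.__getitem__)


# Two pitches with made-up, non-negative scores.
frame = [
    [0.1, 0.0, 0.5, 0.3, 0.1],    # "on" narrowly beats "onset"
    [0.9, 0.0, 0.05, 0.05, 0.0],  # clearly "off"
]

before = [predicted_state(s) for s in frame]
after = [predicted_state(s) for s in weight_onsets(frame)]
```

In this toy example the first pitch flips from "on" to "onset" once the weight is applied, while the second stays "off". Setting `onset_weight=1.0` leaves the scores unchanged, which corresponds to deleting the line.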
