1.0.0-beta.1 version has low accuracy #40

Open
haozes opened this issue Jul 27, 2017 · 5 comments

Comments

@haozes

haozes commented Jul 27, 2017

I wrote a demo with version 1.0.0-beta.1, and the recognition accuracy is low.
Could this be a sample rate configuration problem?

Decoder.swift sets the input to 32-bit, 44.1 kHz and the output to 16-bit, 44.1 kHz:

    let formatIn = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 44100, channels: 1, interleaved: false)

    let formatOut = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 44100, channels: 1, interleaved: false)

Also, in your unit test file basic.swift, the testSpeechFromFile() method uses goforward.raw, which is 8 kHz, 32-bit, yet the unit test passes. Why?

@BrunoBerisso
Contributor

Hi.

If you are using decodeSpeechAtPath in your demo, you need to make sure the input format matches the format you pass in to the Config. The input and output formats you pasted here are not the ones used to decode speech from a file.

If you tell me more about your test, maybe I can help you improve your results :)
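
To illustrate why the declared format has to match the file, here is a minimal sketch of how the same raw bytes are interpreted under two different formats. The `RawPCMFormat` type and the numbers are hypothetical and only for illustration; they are not part of TLSphinx or pocketsphinx.

```swift
import Foundation

/// Describes headerless (raw) PCM audio.
/// Hypothetical helper for illustration, not a TLSphinx type.
struct RawPCMFormat {
    let sampleRate: Double
    let bytesPerSample: Int
    let channels: Int

    /// Duration in seconds that `byteCount` bytes represent under this format.
    func duration(ofByteCount byteCount: Int) -> Double {
        let bytesPerFrame = bytesPerSample * channels
        return Double(byteCount) / Double(bytesPerFrame) / sampleRate
    }
}

// Assumed example: one second of 16 kHz, 16-bit mono audio is 32,000 bytes.
let recorded = RawPCMFormat(sampleRate: 16_000, bytesPerSample: 2, channels: 1)
// The same bytes read as if they were 44.1 kHz audio.
let assumed = RawPCMFormat(sampleRate: 44_100, bytesPerSample: 2, channels: 1)

let byteCount = 32_000
print(recorded.duration(ofByteCount: byteCount)) // 1.0
print(assumed.duration(ofByteCount: byteCount))  // noticeably shorter than one second
```

A raw file carries no header, so nothing stops the decoder from reading it at the wrong rate. The audio is then effectively sped up or slowed down, which is one plausible way accuracy can drop.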

@haozes
Author

haozes commented Jul 28, 2017

I think I found the problem.
pocketsphinx needs 16 kHz samples, but the default recording format on the iPhone 7 with iOS 10 is 44.1 kHz with 2 channels. Here is my code:

    let formatIn =  input.inputFormat(forBus: 0)

    let formatOut = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 16000, channels: 1, interleaved: false)
    let bufferMapper = AVAudioConverter(from: formatIn, to: formatOut)
    //let ratio = (formatIn.sampleRate * Double(formatIn.channelCount)) / (formatOut.sampleRate * Double(formatOut.channelCount))
    mixer.installTap(onBus: 0, bufferSize: 4096, format: formatIn, block: {
        [unowned self] (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) in

        //let capacity = UInt32(Double(buffer.frameCapacity)/ratio)
        let sphinxBuffer = AVAudioPCMBuffer(pcmFormat: formatOut, frameCapacity: buffer.frameCapacity)
        let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
            outStatus.pointee = AVAudioConverterInputStatus.haveData
            return buffer
        }
        
        var error: NSError? = nil
        let status:AVAudioConverterOutputStatus = bufferMapper.convert(to: sphinxBuffer, error: &error, withInputFrom: inputBlock)
        if let e = error {
            print(e)
            return
        }
        if(status == AVAudioConverterOutputStatus.haveData){
            let audioData = sphinxBuffer.toData()
            self.process_raw(audioData)
        }
        
        print("speechState:\(self.speechState)")

        if self.speechState == .utterance {

            self.end_utt()
            let hypothesis = self.get_hyp()

            DispatchQueue.main.async {
                utteranceComplete(hypothesis)
            }

            self.start_utt()
        }
    })

It works now, but the accuracy is still not very high.
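
As a side note on the commented-out `ratio` lines above, the following is a rough sketch of how the output buffer could be sized for a 44.1 kHz to 16 kHz conversion. The function name `outputFrameCapacity` is an assumption made for this example and is not an AVFoundation or TLSphinx API.

```swift
import Foundation

/// Frames needed in the output buffer when resampling `inputFrames`
/// from `inputRate` to `outputRate`, rounded up so nothing is truncated.
/// Hypothetical helper for illustration only.
func outputFrameCapacity(inputFrames: Int, inputRate: Double, outputRate: Double) -> Int {
    precondition(inputRate > 0 && outputRate > 0, "sample rates must be positive")
    return Int((Double(inputFrames) * outputRate / inputRate).rounded(.up))
}

// A 4096-frame tap buffer at 44.1 kHz, resampled to 16 kHz.
let capacity = outputFrameCapacity(inputFrames: 4096, inputRate: 44_100, outputRate: 16_000)
print(capacity) // 1487
```

Frame counts are per channel, so the channel count does not enter this calculation. The posted code already works because it allocates the output buffer with the larger input `frameCapacity`.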

@BrunoBerisso
Contributor

Awesome, did you test on a real device?

The accuracy can be related to the model you are using. Keep in mind that if you want to use CMUSphinx for something like dictation, you need to narrow the LM to a specific domain to get near-acceptable results. Using a general LM will not work as expected.

@haozes
Author

haozes commented Jul 29, 2017

Yes, I tested it on a real device. I use it for keyword spotting (Chinese language); the model was downloaded from the pocketsphinx package.

PS: TLSphinx only outputs a result after speech stops. Outputting keywords immediately when a keyword is recognized would be useful.

@BrunoBerisso
Contributor

Do you mind opening a PR with your changes? I think you may have fixed #24, which is a long-standing issue 😄 I also added #41 to try to get a hypothesis before the end of speech. I can't get to this now, but maybe someone else can.

About the low accuracy, how are you performing the test? Did you try the same test with pocketsphinx_continuous? (more on this here)
