TTS Hallucinations in shorter phrases #1695

adem-rguez · 2025-01-08T07:54:55Z

I am running sherpa-onnx TTS in Unity (C#). With shorter sentences, the generated audio tends to have extra gibberish appended at the end.

example long sentence (works fine) : "Bonjour monsieur, comment allez-vous aujourd’hui ? J’espère que vous passez une excellente journée !"
audio file: long sentence example

example short sentence (adds gibberish at the end): bonjour monsieur
audio file: short sentence example

In these examples I used the upmc voice for French, but the same issue exists with other models.
For example, with the libritts_r model, generating "hello sir" works, but generating "hello" immediately afterwards sometimes appends the previous text or part of it ("hello sir" or "hello si").

csukuangfj · 2025-01-08T07:57:01Z

but when you generate "hello" immediately after it

Could you describe in detail how you tried it?

Do you first generate

hello sir

and then you invoke a second call to generate

hello

or

hello sir hello

?

adem-rguez · 2025-01-08T08:04:36Z

in the english example:
first generated:

hello sir

then tried 3 times the text:

hello

I noticed that 2 out of 3 times it adds "sir" or "si" after the "hello" ("hello sir" or "hello si").
But if I then generate a longer phrase, "hello how are you?", it doesn't hallucinate!
try this in my apk:
download unity tts apk

Meanwhile, in the French example it adds extra audio on the very first generation!

csukuangfj · 2025-01-08T08:06:35Z

Can you reproduce it with our APK?
https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

I think there is a bug in your apk if what you described can be reproduced with your APK.

adem-rguez · 2025-01-08T08:14:57Z

You are right, it doesn't happen with your APK. For me the problem isn't limited to the APK; it also happens inside Unity, using the code I shared earlier in the other thread. I need French models in particular, and the audio they add at the end is not normal. There are also models that don't work at all on short sentences (they generate only distorted audio), like fr-FR_mls_medium.onnx.
For the French example I sent, I used fr-FR-upmc-medium with the espeak-ng-data folder, the model path, and the tokens.txt path. Was I missing something?

csukuangfj · 2025-01-08T08:15:08Z

I just tried with your apk and I think there is a bug in your code.

Please make sure you have overwritten the buffer for the previous call.

Don't overwrite the buffer partially.

csukuangfj · 2025-01-08T08:15:54Z

fr-FR_mls_medium.onnx

Please don't use models containing mls in the filename.

I think I have deleted all models containing mls in their names.

csukuangfj · 2025-01-08T08:18:11Z

Or make sure you have cleared the buffer containing samples of the previous call before you play the samples of the current text.
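
To illustrate what a partially overwritten buffer would look like, here is a minimal sketch. It assumes a hypothetical reusable playback buffer; `PlaybackBufferSketch` and its methods are made up for illustration and are not part of sherpa-onnx or Unity. If a shorter clip is copied over a longer one without clearing, the tail of the old clip is still there and would be played after the new one.

```csharp
using System;

// Hypothetical helper for illustration only; not a sherpa-onnx or Unity API.
public static class PlaybackBufferSketch
{
    // Copies the new samples over the start of the buffer and leaves the rest
    // untouched, so samples from a longer previous clip survive at the end.
    public static void OverwritePartially(float[] buffer, float[] samples)
    {
        Array.Copy(samples, buffer, Math.Min(samples.Length, buffer.Length));
    }

    // Clears the whole buffer first, so nothing from the previous call remains.
    public static void OverwriteFully(float[] buffer, float[] samples)
    {
        Array.Clear(buffer, 0, buffer.Length);
        Array.Copy(samples, buffer, Math.Min(samples.Length, buffer.Length));
    }
}
```

With a 6-sample buffer that previously held six samples of "hello sir", calling `OverwritePartially` with three samples of "hello" leaves indices 3 to 5 holding the old audio, while `OverwriteFully` leaves them at zero (silence).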

adem-rguez · 2025-01-08T08:34:21Z

the buffer is cleared already:

/// <summary>
    /// 1) Splits the text into sentences using multiple delimiters,
    /// 2) For each sentence, spawns a background thread to generate TTS,
    /// 3) Waits for generation to finish (without freezing the main thread),
    /// 4) Plays the resulting clip in order.
    /// </summary>
    private IEnumerator CoPlayTextBySentenceAsync(string text)
    {
        // More delimiters: period, question mark, exclamation, semicolon, colon
        // We also handle multiple punctuation in a row, etc.
        // This uses Regex to split on punctuation [.!?;:]+ 
        // Then trim the results and remove empties.
        // Split the text while keeping the punctuation with the preceding text
        string[] sentences = Regex.Matches(text, @"[^\.!\?;:]+[\.!\?;:]*")
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .ToArray();


        if (sentences.Length == 0)
        {
            Debug.LogWarning("No valid sentences found in input text.");
            yield break;
        }

        Debug.Log("sentences #" + sentences.Length);

        foreach (string sentence in sentences)
        {
            Debug.Log("[Background TTS] Generating:"+ sentence );
            
            // Prepare a place to store the generated float[] 
            float[] generatedSamples = null;
            bool generationDone = false;

            // Run .Generate(...) on a background thread
            Thread t = new Thread(() =>
            {
                // Generate the audio for this sentence
                OfflineTtsGeneratedAudio generated = offlineTts.Generate(sentence, speed, speakerId);
                generatedSamples = generated.Samples;
                generationDone = true;
            });
            t.Start();

            // Wait until the thread signals it's done
            yield return new WaitUntil(() => generationDone);

            // Back on the main thread, we create the AudioClip and play it
            if (generatedSamples == null || generatedSamples.Length == 0)
            {
                Debug.LogWarning("Generated empty audio for a sentence. Skipping...");
                continue;
            }

            AudioClip clip = AudioClip.Create(
                "SherpaOnnxTTS-SentenceAsync",
                generatedSamples.Length,
                1,
                offlineTts.SampleRate,
                false
            );
            clip.SetData(generatedSamples, 0);

            sentenceAudioSource.clip = clip;
            sentenceAudioSource.Play();
            Debug.Log($"Playing sentence: \"{sentence}\"  length = {clip.length:F2}s");

            // Wait until playback finishes
            while (sentenceAudioSource.isPlaying)
                yield return null;
        }

        Debug.Log("All sentences have been generated (background) and played sequentially.");
    }

Also, this applies to the English APK; the French case is different. Would you like me to provide an APK for French as well?

csukuangfj · 2025-01-08T08:37:54Z

but in the french version it's different,

Could you describe the differences? Does the APK for French use a different set of code from the APK for English?

adem-rguez · 2025-01-08T08:38:31Z

No, it's the same code, just a different model with a different tokens file. What I mean by "different" is the issue itself.

adem-rguez · 2025-01-08T08:41:18Z

In French, it hallucinates extra audio at the end of the text from the very first generation, so it's not a buffer issue for French. I only mentioned the English APK because I thought it was related.

csukuangfj · 2025-01-08T08:57:44Z

I don't see any issues from your posted code.

csukuangfj · 2025-01-08T08:58:20Z

foreach (string sentence in sentences)

Is each sentence processed sequentially, not in parallel?

adem-rguez · 2025-01-08T10:02:51Z

Yes, sequentially. Since the TTS functions don't support streaming right now, it was the only option to make the generation faster.

csukuangfj · 2025-01-08T10:26:32Z

it was the only option to make the generation faster

No, we support passing a callback to C++.

Inside C++, it processes the text sentence by sentence. After processing a sentence, the callback is invoked with the generated samples for this sentence.

Please try our Android APK first. You will find it plays almost immediately no matter how long the given text is.

Remember to use the TTS APK, not the TTS Engine APK.
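
The per-sentence callback pattern described above can be sketched as follows. This is a stand-in, not the sherpa-onnx API: `SentenceCallbackSketch`, `ChunkCallback`, and the fake "audio" (one sample per character) are all assumptions made for illustration. The return-value convention used here (nonzero to continue, 0 to stop) is also just this sketch's choice; check `c-api.h` for the contract the library actually uses.

```csharp
using System;

// Stand-in for a TTS engine with a per-sentence callback; NOT the sherpa-onnx API.
public static class SentenceCallbackSketch
{
    // Receives the samples of one sentence. In this sketch, returning 0 asks
    // the generator to stop and any other value lets it continue.
    public delegate int ChunkCallback(float[] samples);

    // Splits the text into sentences, "synthesizes" each one, and hands the
    // result to the callback right away. Returns how many times the callback ran.
    public static int GenerateWithCallback(string text, ChunkCallback callback)
    {
        int invoked = 0;
        string[] sentences = text.Split(
            new[] { '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string sentence in sentences)
        {
            string trimmed = sentence.Trim();
            if (trimmed.Length == 0)
                continue;

            // Fake audio: one silent sample per character of the sentence.
            float[] fakeSamples = new float[trimmed.Length];

            invoked++;
            if (callback(fakeSamples) == 0)
                break; // the consumer asked to stop early
        }

        return invoked;
    }
}
```

With the input "Hello sir. How are you? Fine!" and a callback that always returns 1, the callback runs three times, once per sentence, so playback of the first sentence can start before the rest is generated.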

adem-rguez · 2025-01-08T10:34:05Z

The script I provided in the other thread has a function that uses that:

/// <summary>
    /// Attempted "streaming" approach. The callback is called only once in practice
    /// for the entire waveform, so it doesn't truly stream partial chunks.
    /// </summary>
    private void PlayTextStreamed(string text)
    {
        Debug.Log($"[Streaming] Generating TTS for text: '{text}'");

        int sampleRate = offlineTts.SampleRate;
        int maxAudioLengthInSamples = sampleRate * 300; // 5 min

        streamingClip = AudioClip.Create(
            "SherpaOnnxTTS-Streamed",
            maxAudioLengthInSamples,
            1,
            sampleRate,
            true,
            OnAudioRead,
            OnAudioSetPosition
        );

        if (streamingAudioSource == null)
            streamingAudioSource = gameObject.AddComponent<AudioSource>();

        streamingAudioSource.playOnAwake = false;
        streamingAudioSource.clip = streamingClip;
        streamingAudioSource.loop = false;

        streamingBuffer = new ConcurrentQueue<float>();
        samplesRead = 0;

        streamingAudioSource.Play();

        // This calls your callback, but typically only once for the entire wave
        offlineTts.GenerateWithCallback(text, speed, speakerId, MyTtsChunkCallback);

        Debug.Log("[Streaming] Playback started; awaiting streamed samples...");
    }

    private int MyTtsChunkCallback(System.IntPtr samplesPtr, int numSamples)
    {
        Debug.Log("chunk callback");
        if (numSamples <= 0)
            return 0;

        float[] chunk = new float[numSamples];
        System.Runtime.InteropServices.Marshal.Copy(samplesPtr, chunk, 0, numSamples);

        foreach (float sample in chunk)
            streamingBuffer.Enqueue(sample);

        return 0; 
    }

    private void OnAudioRead(float[] data)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (streamingBuffer.TryDequeue(out float sample))
            {
                data[i] = sample;
                samplesRead++;
            }
            else
            {
                data[i] = 0f; // fill silence
            }
        }
    }

    private void OnAudioSetPosition(int newPosition)
    {
        Debug.Log($"[Streaming] OnAudioSetPosition => {newPosition}");
    }

As you can see, it's implemented with the GenerateWithCallback function, but when I use it the callback is only called once, at the end.

here is an example:

Also, I don't think it's related to the hallucination problem I mentioned, sadly :(

csukuangfj · 2025-01-08T10:56:54Z

Could you enable the debug in tts model config and post the logs when you generate samples?

sherpa-onnx/sherpa-onnx/c-api/c-api.h

Line 916 in 0cb2db3

int32_t debug;

adem-rguez · 2025-01-08T11:13:13Z

I don't get any logs, that's the weird part. Unity is not showing me any logs except the ones I wrote myself! Am I doing something wrong?

// 1. Prepare the VITS model config
        var vitsConfig = new OfflineTtsVitsModelConfig
        {
            Model = BuildPath(modelPath),
            Lexicon = BuildPath(lexiconPath),
            Tokens = BuildPath(tokensPath),
            DataDir = BuildPath(espeakDir),
            DictDir = BuildPath(dictDirPath),

            NoiseScale = noiseScale,
            NoiseScaleW = noiseScaleW,
            LengthScale = lengthScale
        };

        // 2. Wrap it inside the ModelConfig
        var modelConfig = new OfflineTtsModelConfig
        {
            Vits = vitsConfig,
            NumThreads = numThreads,
            Debug = 1,
            Provider = provider
        };

        // 3. Create the top-level OfflineTtsConfig
        var ttsConfig = new OfflineTtsConfig
        {
            Model = modelConfig,
            RuleFsts = "",
            MaxNumSentences = maxNumSentences,
            RuleFars = ""
        };

        // 4. Instantiate the OfflineTts object
        Debug.Log("will create offline tts now!");
        offlineTts = new OfflineTts(ttsConfig);
        Debug.Log($"OfflineTts created! SampleRate: {offlineTts.SampleRate}, NumSpeakers: {offlineTts.NumSpeakers}");

csukuangfj · 2025-01-08T11:37:05Z

IIRC, you posted some error logs in your first issue in the other session. How did you get them?

adem-rguez · 2025-01-08T12:02:55Z

From logcat; that was in an APK. For some reason Unity doesn't show the errors directly. Hold tight, I will use logcat again.

adem-rguez · 2025-01-08T12:21:59Z

so this is from logcat:

this part isn't supposed to be there:

The raw text has random characters appended to it.

this example might be easier to understand:

It had a "u" added to it. This was produced using the Generate function:

adem-rguez · 2025-01-08T12:24:58Z

It's sherpa that's logging that yellow raw-text warning, but I am unable to get its stack trace.

csukuangfj · 2025-01-09T02:22:22Z

            OfflineTtsGeneratedAudio generated = offlineTts.Generate(sentence, speed, speakerId);

Please show the code for offlineTts.Generate

adem-rguez · 2025-01-09T06:00:20Z

Good morning, thank you for your reply!
It's read-only for me because I am using it straight from the NuGet package. To modify it, I would need to make a copy and use the copy:

#region Assembly sherpa-onnx, Version=1.10.38.0, Culture=neutral, PublicKeyToken=null
// D:\Unity Projects 2\Sherpa-onnx-Unity-main\Assets\Packages\org.k2fsa.sherpa.onnx.1.10.38\lib\netstandard2.0\sherpa-onnx.dll
// Decompiled with ICSharpCode.Decompiler 8.1.1.7464
#endregion

using System;
using System.Runtime.InteropServices;
using System.Text;

namespace SherpaOnnx;

public class OfflineTts : IDisposable
{
    private HandleRef _handle;

    public int SampleRate => SherpaOnnxOfflineTtsSampleRate(_handle.Handle);

    public int NumSpeakers => SherpaOnnxOfflineTtsNumSpeakers(_handle.Handle);

    public OfflineTts(OfflineTtsConfig config)
    {
        IntPtr handle = SherpaOnnxCreateOfflineTts(ref config);
        _handle = new HandleRef(this, handle);
    }

    public OfflineTtsGeneratedAudio Generate(string text, float speed, int speakerId)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return new OfflineTtsGeneratedAudio(SherpaOnnxOfflineTtsGenerate(_handle.Handle, bytes, speakerId, speed));
    }

    public OfflineTtsGeneratedAudio GenerateWithCallback(string text, float speed, int speakerId, OfflineTtsCallback callback)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return new OfflineTtsGeneratedAudio(SherpaOnnxOfflineTtsGenerateWithCallback(_handle.Handle, bytes, speakerId, speed, callback));
    }

    public void Dispose()
    {
        Cleanup();
        GC.SuppressFinalize(this);
    }

    ~OfflineTts()
    {
        Cleanup();
    }

    private void Cleanup()
    {
        SherpaOnnxDestroyOfflineTts(_handle.Handle);
        _handle = new HandleRef(this, IntPtr.Zero);
    }

    [DllImport("sherpa-onnx-c-api")]
    private static extern IntPtr SherpaOnnxCreateOfflineTts(ref OfflineTtsConfig config);

    [DllImport("sherpa-onnx-c-api")]
    private static extern void SherpaOnnxDestroyOfflineTts(IntPtr handle);

    [DllImport("sherpa-onnx-c-api")]
    private static extern int SherpaOnnxOfflineTtsSampleRate(IntPtr handle);

    [DllImport("sherpa-onnx-c-api")]
    private static extern int SherpaOnnxOfflineTtsNumSpeakers(IntPtr handle);

    [DllImport("sherpa-onnx-c-api")]
    private static extern IntPtr SherpaOnnxOfflineTtsGenerate(IntPtr handle, [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1)] byte[] utf8Text, int sid, float speed);

    [DllImport("sherpa-onnx-c-api", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr SherpaOnnxOfflineTtsGenerateWithCallback(IntPtr handle, [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1)] byte[] utf8Text, int sid, float speed, OfflineTtsCallback callback);
}
#if false // Decompilation log
'238' items in cache
------------------
Resolve: 'netstandard, Version=2.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
Found single assembly: 'netstandard, Version=2.1.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
WARN: Version mismatch. Expected: '2.0.0.0', Got: '2.1.0.0'
Load from: 'D:\Unity Installs\2022.3.55f1\Editor\Data\NetStandard\ref\2.1.0\netstandard.dll'
------------------
Resolve: 'System.Runtime.InteropServices, Version=2.0.0.0, Culture=neutral, PublicKeyToken=null'
Found single assembly: 'System.Runtime.InteropServices, Version=4.1.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
WARN: Version mismatch. Expected: '2.0.0.0', Got: '4.1.2.0'
Load from: 'D:\Unity Installs\2022.3.55f1\Editor\Data\NetStandard\compat\2.1.0\shims\netstandard\System.Runtime.InteropServices.dll'
------------------
Resolve: 'System.Runtime.CompilerServices.Unsafe, Version=2.0.0.0, Culture=neutral, PublicKeyToken=null'
Could not find by name: 'System.Runtime.CompilerServices.Unsafe, Version=2.0.0.0, Culture=neutral, PublicKeyToken=null'
------------------
Resolve: 'netstandard, Version=2.1.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
Found single assembly: 'netstandard, Version=2.1.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
Load from: 'D:\Unity Installs\2022.3.55f1\Editor\Data\NetStandard\ref\2.1.0\netstandard.dll'
#endif

adem-rguez · 2025-01-10T10:25:48Z

hello @csukuangfj, any solution yet?

csukuangfj · 2025-01-10T11:25:22Z

The code looks correct.

Can you reproduce it with our example code in the dotnet-examples folder?

adem-rguez · 2025-01-10T11:58:03Z

Sorry, but I wasn't able to do that; I lack experience coding outside of Unity.

adem-rguez · 2025-01-12T13:00:57Z

Hello @csukuangfj, I managed to fix the error by modifying the Generate function as follows:

public OfflineTtsGeneratedAudio Generate(String text, float speed, int speakerId)
{
    byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
    byte[] utf8BytesWithNull = new byte[utf8Bytes.Length + 1]; // +1 for null terminator
    Array.Copy(utf8Bytes, utf8BytesWithNull, utf8Bytes.Length);
    utf8BytesWithNull[utf8Bytes.Length] = 0; // Null terminator

    IntPtr p = SherpaOnnxOfflineTtsGenerate(_handle.Handle, utf8BytesWithNull, speakerId, speed);
    return new OfflineTtsGeneratedAudio(p);
}

I wasn't able to modify the NuGet package in Unity, so I had to replicate the OfflineTts class in Unity and then modify it. I hope you can change the Generate function in the NuGet package the same way, to avoid this error for other developers (C# .NET example).

thank you for all your help and responsiveness!


csukuangfj · 2025-01-13T01:51:53Z


Great to hear you fixed it!

Please see #1701
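
For readers wondering why the missing terminator produced extra speech: the native side receives only a pointer to the first byte of the `byte[]`, with no length, and C code reading a `const char*` keeps going until it finds a 0 byte. The sketch below simulates that with a plain managed array standing in for native memory. `NullTerminatorSketch` and the idea that the same memory region gets reused between calls are assumptions for illustration; in the real program, whatever bytes happen to follow the array in memory are read instead.

```csharp
using System;
using System.Text;

// Simulation for illustration only; a managed array stands in for native memory.
public static class NullTerminatorSketch
{
    // Mimics how C code reads a `const char*`: scan forward from `start`
    // until the first 0 byte (or the end of the simulated memory).
    public static string ReadCString(byte[] memory, int start)
    {
        int end = start;
        while (end < memory.Length && memory[end] != 0)
            end++;
        return Encoding.UTF8.GetString(memory, start, end - start);
    }

    // Writes the UTF-8 bytes of `text` at the start of `memory`. The terminating
    // 0 byte is written only when `addTerminator` is true. `memory` must have
    // room for the text plus one extra byte.
    public static void Write(byte[] memory, string text, bool addTerminator)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        Array.Copy(bytes, memory, bytes.Length);
        if (addTerminator)
            memory[bytes.Length] = 0;
    }
}
```

Writing "hello sir" and then "hello" without a terminator into the same zero-filled region makes `ReadCString` return "hello sir", which matches the symptom reported earlier in this thread. Writing "hello" with the terminator makes it return "hello".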


AdrianPress · 2025-01-31T11:40:48Z

Hello @adem-rguez @csukuangfj ! First of all, sorry for being a newbie, I'm just getting started.

I've seen this thread and I can't resist asking if you can help me because, even with the documentation and how detailed this thread is, I still can't get Unity to load the library.

Although it's able to find it, I always get this error:

DllNotFoundException: sherpa-onnx-c-api assembly:<unknown assembly> type:<unknown type> member:(null)
SherpaOnnx.SherpaTest.InitializeSherpaOnnx () (at Assets/Scripts/SherpaTest.cs:130)
SherpaOnnx.SherpaTest.Start () (at Assets/Scripts/SherpaTest.cs:53)

I have downloaded the necessary .so files for Android and placed them in the corresponding folders. I have a folder espeak-ng-data that I downloaded and placed in StreamingAssets, and a models folder where I have the model, tokens, etc.

I have declared the functions of the library.

`using SherpaOnnx;
using System;
using System.Runtime.InteropServices;

public static class Sherpita
{
#if UNITY_ANDROID && !UNITY_EDITOR
const string dll = "__Internal";
#else
const string dll = "sherpa-onnx-c-api"; // DLL name without the .dll extension
#endif

[DllImport(dll)]
public static extern IntPtr SherpaOnnxCreateOfflineTts(ref OfflineTtsConfig config);

[DllImport(dll)]
public static extern void SherpaOnnxDestroyOfflineTts(IntPtr handle);

[DllImport(dll)]
public static extern int SherpaOnnxOfflineTtsSampleRate(IntPtr handle);

[DllImport(dll)]
public static extern IntPtr SherpaOnnxOfflineTtsGenerate(IntPtr handle, byte[] utf8Text, int sid, float speed);

}
`

I have a script to test the functionality in which I call a function.

`using System;
using System.IO;
using System.Text;
using UnityEngine;
using UnityEngine.UI;
using TMPro;
using System.Runtime.InteropServices;
using System.Diagnostics; // Only used for Stopwatch

namespace SherpaOnnx
{
public class SherpaTest : MonoBehaviour
{
[Header("UI Elements")]
[SerializeField] private TMP_InputField inputField;
[SerializeField] private Button runButton;
[SerializeField] private AudioSource audioSource;

    [Header("Model Paths (from StreamingAssets)")]
    [SerializeField] private string modelPath = "vits_generator.onnx";
    [SerializeField] private string tokensPath = "tokens.txt";
    [SerializeField] private string lexiconPath = "lexicon.txt";
    [SerializeField] private string dictDirPath = "dict";
    [SerializeField] private string espeakDataPath = "espeak-ng-data";

    [Header("TTS Settings")]
    [Range(0f, 1f)] public float noiseScale = 0.667f;
    [Range(0f, 1f)] public float noiseScaleW = 0.8f;
    [Range(0.5f, 2f)] public float lengthScale = 1.0f;
    public int speakerId = 0;
    public int numThreads = 1;
    public bool debugMode = false;
    public string provider = "cpu";
    public int maxNumSentences = 1;

    private IntPtr ttsHandle = IntPtr.Zero;

    private void Start()
    {
        UnityEngine.Debug.Log("🔹 Start() running...");

        if (runButton != null)
        {
            runButton.onClick.AddListener(RunTTS);
            UnityEngine.Debug.Log("✅ RunTTS button assigned correctly.");
        }
        else
        {
            UnityEngine.Debug.LogError("❌ ERROR: The button is not assigned in the inspector.");
        }

        CheckLibrary();
        InitializeSherpaOnnx();
    }

    private void CheckLibrary()
    {
        UnityEngine.Debug.Log("🔹 Checking libraries...");

#if UNITY_STANDALONE_WIN || UNITY_EDITOR
        string libPath = Path.Combine(Application.dataPath, "Plugins/x86_64/sherpa-onnx-c-api.dll");
        UnityEngine.Debug.Log($"📂 Looking for DLL at: {libPath}");

        if (!File.Exists(libPath))
        {
            UnityEngine.Debug.LogError($"❌ ERROR: DLL not found at {libPath}");
        }
        else
        {
            UnityEngine.Debug.Log("✅ DLL found.");
        }

#endif
}

    public void TestTTSInitialization()
    {
        UnityEngine.Debug.Log("TestTTSInitialization() invoked from the Editor.");
        InitializeSherpaOnnx();
    }

    private void InitializeSherpaOnnx()
    {
        UnityEngine.Debug.Log("Trying to initialize Sherpa ONNX...");

        string streamingAssetsPath = Application.streamingAssetsPath;
        string fullModelPath = Path.Combine(streamingAssetsPath, modelPath);
        string fullTokensPath = Path.Combine(streamingAssetsPath, tokensPath);

        if (!File.Exists(fullModelPath))
        {
            UnityEngine.Debug.LogError($"ERROR: ONNX model not found at {fullModelPath}");
            return;
        }
        if (!File.Exists(fullTokensPath))
        {
            UnityEngine.Debug.LogError($"ERROR: Tokens file not found at {fullTokensPath}");
            return;
        }

        UnityEngine.Debug.Log("All required files are available.");

        var ttsConfig = new OfflineTtsConfig
        {
            Model = new OfflineTtsModelConfig
            {
                Vits = new OfflineTtsVitsModelConfig
                {
                    Model = fullModelPath,
                    Tokens = fullTokensPath,
                    Lexicon = Path.Combine(streamingAssetsPath, lexiconPath),
                    DictDir = Path.Combine(streamingAssetsPath, dictDirPath),
                    DataDir = Path.Combine(streamingAssetsPath, espeakDataPath),
                    NoiseScale = noiseScale,
                    NoiseScaleW = noiseScaleW,
                    LengthScale = lengthScale
                },
                NumThreads = numThreads,
                Debug = debugMode ? 1 : 0,
                Provider = provider
            },
            RuleFsts = "",
            MaxNumSentences = maxNumSentences,
            RuleFars = ""
        };

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        UnityEngine.Debug.Log("Calling Sherpita.SherpaOnnxCreateOfflineTts...");
        ttsHandle = Sherpita.SherpaOnnxCreateOfflineTts(ref ttsConfig);

        stopwatch.Stop();
        UnityEngine.Debug.Log($"Sherpa ONNX initialization time: {stopwatch.ElapsedMilliseconds} ms");

        if (ttsHandle == IntPtr.Zero)
        {
            UnityEngine.Debug.LogError("ERROR: Sherpa ONNX initialization failed.");
        }
        else
        {
            UnityEngine.Debug.Log("Sherpa ONNX initialized successfully.");
        }
    }

    private void RunTTS()
    {
        UnityEngine.Debug.Log("Running RunTTS()...");

        if (ttsHandle == IntPtr.Zero)
        {
            UnityEngine.Debug.LogError("ERROR: Sherpa ONNX has not been initialized.");
            return;
        }

        if (string.IsNullOrWhiteSpace(inputField.text))
        {
            UnityEngine.Debug.LogWarning("No text to synthesize.");
            return;
        }

        string text = inputField.text;
        UnityEngine.Debug.Log($"Text to synthesize: {text}");

        byte[] utf8Text = Encoding.UTF8.GetBytes(text);
        IntPtr audioPtr = Sherpita.SherpaOnnxOfflineTtsGenerate(ttsHandle, utf8Text, speakerId, 1.0f);

        if (audioPtr == IntPtr.Zero)
        {
            UnityEngine.Debug.LogError("ERROR: Audio generation returned a null pointer.");
            return;
        }

        float[] pcmSamples = new float[16000];
        Marshal.Copy(audioPtr, pcmSamples, 0, pcmSamples.Length);
        Marshal.FreeHGlobal(audioPtr);

        AudioClip clip = AudioClip.Create("SherpaOnnxTTS", pcmSamples.Length, 1, Sherpita.SherpaOnnxOfflineTtsSampleRate(ttsHandle), false);
        clip.SetData(pcmSamples, 0);
        audioSource.clip = clip;
        audioSource.loop = false;
        audioSource.Play();

        UnityEngine.Debug.Log("Audio generated and played successfully.");
    }

    private void OnDestroy()
    {
        UnityEngine.Debug.Log("OnDestroy() called.");

        if (ttsHandle != IntPtr.Zero)
        {
            Sherpita.SherpaOnnxDestroyOfflineTts(ttsHandle);
            ttsHandle = IntPtr.Zero;
            UnityEngine.Debug.Log("Resources released.");
        }
    }
}

}
`
And the script to configure the model.

`using System;
using System.Runtime.InteropServices;

namespace SherpaOnnx
{
[StructLayout(LayoutKind.Sequential)]
public struct OfflineTtsConfig
{
public OfflineTtsModelConfig Model;
public string RuleFsts;
public int MaxNumSentences;
public string RuleFars;
}

[StructLayout(LayoutKind.Sequential)]
public struct OfflineTtsModelConfig
{
    public OfflineTtsVitsModelConfig Vits;
    public int NumThreads;
    public int Debug;
    public string Provider;
}

[StructLayout(LayoutKind.Sequential)]
public struct OfflineTtsVitsModelConfig
{
    public string Model;
    public string Lexicon;
    public string Tokens;
    public string DataDir;
    public string DictDir;
    public float NoiseScale;
    public float NoiseScaleW;
    public float LengthScale;
}

public class OfflineTtsGeneratedAudio
{
    public float[] Samples;

    public OfflineTtsGeneratedAudio(IntPtr audioPtr)
    {
        if (audioPtr == IntPtr.Zero)
        {
            Samples = new float[0];  // If the pointer is null, return an empty array.
            return;
        }

        // We assume the maximum audio size is 16000 samples (1 second of audio at 16 kHz).
        int length = 16000;  // CHANGE THIS TO MATCH YOUR ACTUAL NEEDS.
        Samples = new float[length];

        // Copy the data from the C++ pointer into a C# array.
        Marshal.Copy(audioPtr, Samples, 0, length);
    }
}

public delegate int OfflineTtsCallback(IntPtr samplesPtr, int numSamples);

}
`

In summary, I only have the .so files and those three scripts. Is there anything else I might be missing?

Is there something wrong with the scripts that is preventing the library from loading correctly when I call a function?

Thanks!
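
Separately from the loading error, the hard-coded `16000` in the scripts above may be worth revisiting. The sketch below shows one way to read exactly as many samples as the native side reports. It assumes the returned pointer refers to a small struct holding a sample pointer, a sample count, and a sample rate; that layout, the `NativeAudio` name, and the `AllocFake`/`FreeFake` helpers are assumptions for illustration and should be confirmed against `c-api.h`. Note also that `Marshal.FreeHGlobal` is only valid for memory obtained from `Marshal.AllocHGlobal`; memory returned by a native library is normally released through that library's own destroy function.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustration only; the struct layout is an ASSUMPTION, not taken from this thread.
public static class GeneratedAudioSketch
{
    [StructLayout(LayoutKind.Sequential)]
    public struct NativeAudio
    {
        public IntPtr Samples;   // pointer to float samples
        public int NumSamples;   // number of floats at Samples
        public int SampleRate;   // in Hz
    }

    // Reads exactly NumSamples floats instead of a fixed 16000.
    public static float[] ReadSamples(IntPtr audioPtr)
    {
        if (audioPtr == IntPtr.Zero)
            return new float[0];

        NativeAudio audio = Marshal.PtrToStructure<NativeAudio>(audioPtr);
        if (audio.Samples == IntPtr.Zero || audio.NumSamples <= 0)
            return new float[0];

        float[] samples = new float[audio.NumSamples];
        Marshal.Copy(audio.Samples, samples, 0, audio.NumSamples);
        return samples;
    }

    // Hypothetical helper: builds a fake "native" struct in unmanaged memory
    // so the reader can be exercised without the real library.
    public static IntPtr AllocFake(float[] data, int sampleRate)
    {
        IntPtr samplesPtr = Marshal.AllocHGlobal(sizeof(float) * data.Length);
        Marshal.Copy(data, 0, samplesPtr, data.Length);

        var audio = new NativeAudio
        {
            Samples = samplesPtr,
            NumSamples = data.Length,
            SampleRate = sampleRate
        };
        IntPtr structPtr = Marshal.AllocHGlobal(Marshal.SizeOf<NativeAudio>());
        Marshal.StructureToPtr(audio, structPtr, false);
        return structPtr;
    }

    // Frees memory created by AllocFake (and only that memory).
    public static void FreeFake(IntPtr structPtr)
    {
        NativeAudio audio = Marshal.PtrToStructure<NativeAudio>(structPtr);
        Marshal.FreeHGlobal(audio.Samples);
        Marshal.FreeHGlobal(structPtr);
    }
}
```

In a Unity script, the array returned by `ReadSamples` could then be passed to `AudioClip.Create` and `SetData` with its real length, in place of the fixed one-second buffer.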

adem-rguez closed this as completed Jan 12, 2025

csukuangfj added a commit to csukuangfj/sherpa-onnx that referenced this issue Jan 13, 2025

Fix passing strings from C# to C.

0a7cb9b

See also k2-fsa#1695 (comment) We need to place a 0 at the end of the buffer.

csukuangfj mentioned this issue Jan 13, 2025

Fix passing strings from C# to C. #1701

Merged

csukuangfj added a commit that referenced this issue Jan 13, 2025

Fix passing strings from C# to C. (#1701)

0d20558

See also #1695 (comment) We need to place a 0 at the end of the buffer.

csukuangfj mentioned this issue Jan 29, 2025

Possible to integrate this as a Unity plugin and build to Android? #796

Closed
