TTS Hallucinations in shorter phrases #1695

Closed
adem-rguez opened this issue Jan 8, 2025 · 30 comments
Comments

@adem-rguez

adem-rguez commented Jan 8, 2025

I am running sherpa-onnx TTS in Unity (C#). With shorter sentences, the generated audio tends to have extra audio containing gibberish appended at the end.

example long sentence (works fine) : "Bonjour monsieur, comment allez-vous aujourd’hui ? J’espère que vous passez une excellente journée !"
audio file: long sentence example

example short sentence (adds gibberish at the end): bonjour monsieur
audio file: short sentence example

In these examples I used the upmc voice for French, but the same issue exists with other models.
For example, on the libritts_r model, when you generate "hello sir" it works, but when you generate "hello" immediately after it, it sometimes adds the previous text or part of it ("hello sir" or "hello si").

@csukuangfj
Collaborator

> but when you generate "hello" immediately after it

Could you describe in detail how you tried it?

Do you first generate

hello sir

and then you invoke a second call to generate

hello

or

hello sir hello

?

@adem-rguez
Author

adem-rguez commented Jan 8, 2025

In the English example, I first generated:

hello sir

then tried this text 3 times:

hello

I noticed that 2 out of 3 times it adds "sir" or "si" after the "hello" ("hello sir" or "hello si"),
but if I then generate a longer phrase, "hello how are you?", it doesn't hallucinate!
Try this in my APK:
download unity tts apk

Meanwhile, in the French example it adds stuff the first time!

@csukuangfj
Collaborator

Can you reproduce it with our APK?
https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

I think there is a bug in your apk if what you described can be reproduced with your APK.

@adem-rguez
Author

You are right, it doesn't happen with your APK. For me the problem isn't just in the APK but also inside Unity, using the code I shared earlier in the other thread. I need French models in particular. The stuff they add at the end is not normal, and there are models that don't work at all on short sentences (they generate only distorted audio), like fr-FR_mls_medium.onnx.
For the French example I sent, I used fr-FR-upmc-medium, and I provided the espeak-ng-data folder, the model path, and the tokens.txt path. Was I missing something?

@csukuangfj
Collaborator

I just tried with your apk and I think there is a bug in your code.

Please make sure you have overwritten the buffer for the previous call.

Don't overwrite the buffer partially.

@csukuangfj
Collaborator

> fr-FR_mls_medium.onnx

Please don't use models containing mls in the filename.

I think I have deleted all models containing mls in their names.

@csukuangfj
Collaborator

Or make sure you have cleared the buffer containing samples of the previous call before you play the samples of the current text.
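
To make the failure mode described here concrete, the following is a minimal sketch (not code from either APK) of what a partial overwrite looks like: a sample buffer is reused across calls, only its first part is replaced, and the tail of the earlier, longer utterance survives. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;

public static class StaleBufferDemo
{
    // Buggy pattern: copies the new samples over the start of a reused buffer
    // and returns the whole buffer, so samples past newSamples.Length are stale.
    public static float[] OverwritePartially(float[] reused, float[] newSamples)
    {
        Array.Copy(newSamples, reused, Math.Min(newSamples.Length, reused.Length));
        return reused;
    }

    // Safe pattern: returns a fresh array sized exactly to the new samples.
    public static float[] CopyExact(float[] newSamples)
    {
        return (float[])newSamples.Clone();
    }

    public static void Main()
    {
        float[] buffer = { 1f, 1f, 1f, 1f, 1f }; // samples left from a longer, earlier utterance
        float[] shorter = { 2f, 2f };            // samples of a shorter, later utterance

        Console.WriteLine(string.Join(",", OverwritePartially(buffer, shorter))); // prints 2,2,1,1,1
        Console.WriteLine(string.Join(",", CopyExact(shorter)));                  // prints 2,2
    }
}
```

If playback uses the full length of the reused buffer, the trailing `1` samples are the "extra audio" from the previous call.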

@adem-rguez
Author

the buffer is cleared already:

/// <summary>
    /// 1) Splits the text into sentences using multiple delimiters,
    /// 2) For each sentence, spawns a background thread to generate TTS,
    /// 3) Waits for generation to finish (without freezing the main thread),
    /// 4) Plays the resulting clip in order.
    /// </summary>
    private IEnumerator CoPlayTextBySentenceAsync(string text)
    {
        // Sentence delimiters: period, question mark, exclamation mark, semicolon, colon.
        // Regex.Matches captures each run of non-delimiter characters together with
        // any punctuation that follows it (including several marks in a row),
        // so the punctuation stays with the preceding text.
        // The results are then trimmed and empty entries are dropped.
        string[] sentences = Regex.Matches(text, @"[^\.!\?;:]+[\.!\?;:]*")
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .ToArray();


        if (sentences.Length == 0)
        {
            Debug.LogWarning("No valid sentences found in input text.");
            yield break;
        }

        Debug.Log("sentences #" + sentences.Length.ToString());

        foreach (string sentence in sentences)
        {
            Debug.Log("[Background TTS] Generating:"+ sentence );
            
            // Prepare a place to store the generated float[] 
            float[] generatedSamples = null;
            bool generationDone = false;

            // Run .Generate(...) on a background thread
            Thread t = new Thread(() =>
            {
                // Generate the audio for this sentence
                OfflineTtsGeneratedAudio generated = offlineTts.Generate(sentence, speed, speakerId);
                generatedSamples = generated.Samples;
                generationDone = true;
            });
            t.Start();

            // Wait until the thread signals it's done
            yield return new WaitUntil(() => generationDone);

            // Back on the main thread, we create the AudioClip and play it
            if (generatedSamples == null || generatedSamples.Length == 0)
            {
                Debug.LogWarning("Generated empty audio for a sentence. Skipping...");
                continue;
            }

            AudioClip clip = AudioClip.Create(
                "SherpaOnnxTTS-SentenceAsync",
                generatedSamples.Length,
                1,
                offlineTts.SampleRate,
                false
            );
            clip.SetData(generatedSamples, 0);

            sentenceAudioSource.clip = clip;
            sentenceAudioSource.Play();
            Debug.Log($"Playing sentence: \"{sentence}\"  length = {clip.length:F2}s");

            // Wait until playback finishes
            while (sentenceAudioSource.isPlaying)
                yield return null;
        }

        Debug.Log("All sentences have been generated (background) and played sequentially.");
    }
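
One thing the coroutine above does not cover: if `Generate` throws on the background thread, `generationDone` is never set and `WaitUntil` waits forever. The sketch below shows one way to guard against that in plain C#, under the assumption that a small helper class is acceptable; `BackgroundJob<T>` is a hypothetical name, not part of sherpa-onnx or Unity.

```csharp
using System;
using System.Threading;

// Runs a function on a background thread and always signals completion,
// whether the function returned a value or threw.
public sealed class BackgroundJob<T>
{
    private volatile bool _done;   // volatile so the polling thread sees the update
    private T _result;
    private Exception _error;

    public bool Done => _done;
    public T Result => _result;
    public Exception Error => _error;

    public BackgroundJob(Func<T> work)
    {
        var thread = new Thread(() =>
        {
            try { _result = work(); }
            catch (Exception e) { _error = e; }
            finally { _done = true; } // set last, after result or error is stored
        });
        thread.IsBackground = true;
        thread.Start();
    }
}

public static class BackgroundJobDemo
{
    public static void Main()
    {
        var ok = new BackgroundJob<float[]>(() => new float[] { 0.1f, 0.2f });
        var failed = new BackgroundJob<float[]>(() => throw new InvalidOperationException("generation failed"));

        // Stand-in for polling from a coroutine.
        SpinWait.SpinUntil(() => ok.Done && failed.Done);

        Console.WriteLine(ok.Result.Length);     // prints 2
        Console.WriteLine(failed.Error.Message); // prints generation failed
    }
}
```

In a Unity coroutine the polling line would become `yield return new WaitUntil(() => job.Done);`, followed by a check of `job.Error` before using `job.Result`.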

Also, this is if we are talking about the APK, but in the french version it's different, would you like me to provide an APK for French as well?

@csukuangfj
Collaborator

> but in the french version it's different,

Could you describe the differences? Does the APK for French use a different set of code from the APK for English?

@adem-rguez
Author

adem-rguez commented Jan 8, 2025

No, it's the same code, just a different model with a different tokens file. What I mean by "different" is the issue itself.

@adem-rguez
Author

From the very first time I generate audio in French, it hallucinates other stuff at the end of the text, so it's not a buffer issue for French. I only mentioned the English APK because I thought it was related.

@csukuangfj
Collaborator

I don't see any issues in your posted code.

@csukuangfj
Collaborator

> foreach (string sentence in sentences)

Is each sentence processed sequentially, not in parallel?

@adem-rguez
Author

Yes, sequentially. Since the TTS functions don't support streaming right now, it was the only option to make the generation faster.

@csukuangfj
Collaborator

> it was the only option to make the generation faster

No, we support passing a callback to C++.

Inside C++, it processes the text sentence by sentence. After processing a sentence, the callback is invoked with the generated samples for this sentence.

Please try our Android APK first. You will find it plays almost immediately no matter how long the given text is.

Remember to use the TTS APK, not the TTS Engine APK.

@adem-rguez
Author

adem-rguez commented Jan 8, 2025

In the script I provided in the other thread, there is a function that uses that:

/// <summary>
    /// Attempted "streaming" approach. The callback is called only once in practice
    /// for the entire waveform, so it doesn't truly stream partial chunks.
    /// </summary>
    private void PlayTextStreamed(string text)
    {
        Debug.Log($"[Streaming] Generating TTS for text: '{text}'");

        int sampleRate = offlineTts.SampleRate;
        int maxAudioLengthInSamples = sampleRate * 300; // 5 min

        streamingClip = AudioClip.Create(
            "SherpaOnnxTTS-Streamed",
            maxAudioLengthInSamples,
            1,
            sampleRate,
            true,
            OnAudioRead,
            OnAudioSetPosition
        );

        if (streamingAudioSource == null)
            streamingAudioSource = gameObject.AddComponent<AudioSource>();

        streamingAudioSource.playOnAwake = false;
        streamingAudioSource.clip = streamingClip;
        streamingAudioSource.loop = false;

        streamingBuffer = new ConcurrentQueue<float>();
        samplesRead = 0;

        streamingAudioSource.Play();

        // This calls your callback, but typically only once for the entire wave
        offlineTts.GenerateWithCallback(text, speed, speakerId, MyTtsChunkCallback);

        Debug.Log("[Streaming] Playback started; awaiting streamed samples...");
    }

    private int MyTtsChunkCallback(System.IntPtr samplesPtr, int numSamples)
    {
        Debug.Log("chunk callback");
        if (numSamples <= 0)
            return 0;

        float[] chunk = new float[numSamples];
        System.Runtime.InteropServices.Marshal.Copy(samplesPtr, chunk, 0, numSamples);

        foreach (float sample in chunk)
            streamingBuffer.Enqueue(sample);

        return 0; 
    }

    private void OnAudioRead(float[] data)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (streamingBuffer.TryDequeue(out float sample))
            {
                data[i] = sample;
                samplesRead++;
            }
            else
            {
                data[i] = 0f; // fill silence
            }
        }
    }

    private void OnAudioSetPosition(int newPosition)
    {
        Debug.Log($"[Streaming] OnAudioSetPosition => {newPosition}");
    }

As you can see, it's implemented with the GenerateWithCallback function, but when I use it the callback is only called once, at the end.

Here is an example:
image
Also, I don't think it's related to the hallucination problem I mentioned, sadly :(
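
One detail in the callback above may be worth checking. As far as I recall from the sherpa-onnx C API header, the callback's return value tells the engine whether to keep generating (1) or stop (0). If that convention applies to the version in use, returning 0 as `MyTtsChunkCallback` does would end generation after the first chunk. Treat this as an assumption to verify against `c-api.h`, not as a confirmed diagnosis. The sketch below only simulates the convention in managed code. `FakeGenerate` and `ChunkCallback` are hypothetical stand-ins, not the real API.

```csharp
using System;
using System.Collections.Generic;

public static class CallbackReturnDemo
{
    // Managed stand-in for a per-chunk callback: receives the samples of one chunk.
    public delegate int ChunkCallback(float[] samples);

    // Hypothetical generator: emits one chunk per sentence and stops as soon as
    // the callback returns 0 (ASSUMED convention: 1 = continue, 0 = stop).
    // Returns how many times the callback was invoked.
    public static int FakeGenerate(string[] sentences, ChunkCallback callback)
    {
        int calls = 0;
        foreach (string sentence in sentences)
        {
            calls++;
            float[] chunk = new float[sentence.Length]; // placeholder samples
            if (callback(chunk) == 0)
                break;
        }
        return calls;
    }

    public static void Main()
    {
        string[] sentences = { "First sentence.", "Second sentence.", "Third sentence." };
        var queue = new Queue<float>();

        int stopEarly = FakeGenerate(sentences, chunk =>
        {
            foreach (float s in chunk) queue.Enqueue(s);
            return 0; // under the assumed convention this asks the generator to stop
        });

        int keepGoing = FakeGenerate(sentences, chunk =>
        {
            foreach (float s in chunk) queue.Enqueue(s);
            return 1; // under the assumed convention this asks for the next chunk
        });

        Console.WriteLine($"return 0 -> {stopEarly} callback(s)"); // prints return 0 -> 1 callback(s)
        Console.WriteLine($"return 1 -> {keepGoing} callback(s)"); // prints return 1 -> 3 callback(s)
    }
}
```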

@csukuangfj
Collaborator

Could you enable debug in the TTS model config and post the logs you get when you generate samples?

int32_t debug;

@adem-rguez
Author

adem-rguez commented Jan 8, 2025

I don't get any logs, that's the weird part. Unity is not showing me any logs except the ones I wrote myself! Am I doing something wrong?

// 1. Prepare the VITS model config
        var vitsConfig = new OfflineTtsVitsModelConfig
        {
            Model = BuildPath(modelPath),
            Lexicon = BuildPath(lexiconPath),
            Tokens = BuildPath(tokensPath),
            DataDir = BuildPath(espeakDir),
            DictDir = BuildPath(dictDirPath),

            NoiseScale = noiseScale,
            NoiseScaleW = noiseScaleW,
            LengthScale = lengthScale
        };

        // 2. Wrap it inside the ModelConfig
        var modelConfig = new OfflineTtsModelConfig
        {
            Vits = vitsConfig,
            NumThreads = numThreads,
            Debug = 1,
            Provider = provider
        };

        // 3. Create the top-level OfflineTtsConfig
        var ttsConfig = new OfflineTtsConfig
        {
            Model = modelConfig,
            RuleFsts = "",
            MaxNumSentences = maxNumSentences,
            RuleFars = ""
        };

        // 4. Instantiate the OfflineTts object
        Debug.Log("will create offline tts now!");
        offlineTts = new OfflineTts(ttsConfig);
        Debug.Log($"OfflineTts created! SampleRate: {offlineTts.SampleRate}, NumSpeakers: {offlineTts.NumSpeakers}");

@csukuangfj
Collaborator

IIRC, you posted some error logs in your first issue in the other session. How did you get them?

@adem-rguez
Author

Those came from an APK, captured with logcat. For some reason Unity doesn't show the errors directly. Hold tight, I will use logcat again.

@adem-rguez
Author

so this is from logcat:
image
this part isn't supposed to be there:
image
The raw text is having random stuff added to it.

this example might be easier to understand:
image

It had a "u" added to it. This was made using the Generate function:
image

@adem-rguez
Author

It's sherpa-onnx that logs that yellow raw-text warning, but I am unable to get its stack trace.

@csukuangfj
Collaborator

> OfflineTtsGeneratedAudio generated = offlineTts.Generate(sentence, speed, speakerId);

Please show the code for offlineTts.Generate

@adem-rguez
Author

adem-rguez commented Jan 9, 2025

Good morning, thank you for your reply!
It's read-only for me, since I am using it straight from the NuGet package. To modify it, I would need to make a copy of it and use the copy:

#region Assembly sherpa-onnx, Version=1.10.38.0, Culture=neutral, PublicKeyToken=null
// D:\Unity Projects 2\Sherpa-onnx-Unity-main\Assets\Packages\org.k2fsa.sherpa.onnx.1.10.38\lib\netstandard2.0\sherpa-onnx.dll
// Decompiled with ICSharpCode.Decompiler 8.1.1.7464
#endregion

using System;
using System.Runtime.InteropServices;
using System.Text;

namespace SherpaOnnx;

public class OfflineTts : IDisposable
{
    private HandleRef _handle;

    public int SampleRate => SherpaOnnxOfflineTtsSampleRate(_handle.Handle);

    public int NumSpeakers => SherpaOnnxOfflineTtsNumSpeakers(_handle.Handle);

    public OfflineTts(OfflineTtsConfig config)
    {
        IntPtr handle = SherpaOnnxCreateOfflineTts(ref config);
        _handle = new HandleRef(this, handle);
    }

    public OfflineTtsGeneratedAudio Generate(string text, float speed, int speakerId)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return new OfflineTtsGeneratedAudio(SherpaOnnxOfflineTtsGenerate(_handle.Handle, bytes, speakerId, speed));
    }

    public OfflineTtsGeneratedAudio GenerateWithCallback(string text, float speed, int speakerId, OfflineTtsCallback callback)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return new OfflineTtsGeneratedAudio(SherpaOnnxOfflineTtsGenerateWithCallback(_handle.Handle, bytes, speakerId, speed, callback));
    }

    public void Dispose()
    {
        Cleanup();
        GC.SuppressFinalize(this);
    }

    ~OfflineTts()
    {
        Cleanup();
    }

    private void Cleanup()
    {
        SherpaOnnxDestroyOfflineTts(_handle.Handle);
        _handle = new HandleRef(this, IntPtr.Zero);
    }

    [DllImport("sherpa-onnx-c-api")]
    private static extern IntPtr SherpaOnnxCreateOfflineTts(ref OfflineTtsConfig config);

    [DllImport("sherpa-onnx-c-api")]
    private static extern void SherpaOnnxDestroyOfflineTts(IntPtr handle);

    [DllImport("sherpa-onnx-c-api")]
    private static extern int SherpaOnnxOfflineTtsSampleRate(IntPtr handle);

    [DllImport("sherpa-onnx-c-api")]
    private static extern int SherpaOnnxOfflineTtsNumSpeakers(IntPtr handle);

    [DllImport("sherpa-onnx-c-api")]
    private static extern IntPtr SherpaOnnxOfflineTtsGenerate(IntPtr handle, [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1)] byte[] utf8Text, int sid, float speed);

    [DllImport("sherpa-onnx-c-api", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr SherpaOnnxOfflineTtsGenerateWithCallback(IntPtr handle, [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1)] byte[] utf8Text, int sid, float speed, OfflineTtsCallback callback);
}

@adem-rguez
Author

hello @csukuangfj, any solution yet?

@csukuangfj
Collaborator

The code looks correct.

Can you reproduce it with our example code in the dotnet-examples folder?

@adem-rguez
Author

Sorry, but I wasn't able to do that. I lack experience with coding outside of Unity.

@adem-rguez
Author

adem-rguez commented Jan 12, 2025

Hello @csukuangfj, I managed to fix the error by modifying the Generate function as follows:

public OfflineTtsGeneratedAudio Generate(String text, float speed, int speakerId)
{
    byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
    byte[] utf8BytesWithNull = new byte[utf8Bytes.Length + 1]; // +1 for null terminator
    Array.Copy(utf8Bytes, utf8BytesWithNull, utf8Bytes.Length);
    utf8BytesWithNull[utf8Bytes.Length] = 0; // Null terminator

    IntPtr p = SherpaOnnxOfflineTtsGenerate(_handle.Handle, utf8BytesWithNull, speakerId, speed);
    return new OfflineTtsGeneratedAudio(p);
}

I wasn't able to modify the NuGet package in Unity, so I had to replicate the OfflineTts class in Unity and then modify it. I hope you apply the same change to the Generate function in the NuGet package, to avoid this error for other developers (C# .NET example).
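
Why the extra byte matters: `Encoding.UTF8.GetBytes` returns only the encoded characters, with no trailing zero, while a C function that takes the text as a `char*` reads until it finds a zero byte. Without the terminator it reads whatever bytes happen to follow. The sketch below emulates that in managed code only. It is an illustration of the mechanism, not a reproduction of the actual memory layout inside Unity, and the helper names (`ToNullTerminatedUtf8`, `ReadLikeC`) are hypothetical.

```csharp
using System;
using System.Text;

public static class Utf8Terminator
{
    // Returns the UTF-8 bytes of text followed by a single 0 byte.
    public static byte[] ToNullTerminatedUtf8(string text)
    {
        int n = Encoding.UTF8.GetByteCount(text);
        byte[] buffer = new byte[n + 1]; // new arrays are zero-filled, so buffer[n] is already 0
        Encoding.UTF8.GetBytes(text, 0, text.Length, buffer, 0);
        return buffer;
    }

    // Emulates how C code reads a char*: scan forward until the first 0 byte.
    public static string ReadLikeC(byte[] memory)
    {
        int end = Array.IndexOf(memory, (byte)0);
        if (end < 0) end = memory.Length; // real C code would keep reading past the array
        return Encoding.UTF8.GetString(memory, 0, end);
    }

    public static void Main()
    {
        // Simulated memory that still holds an earlier, longer string.
        byte[] memory = ToNullTerminatedUtf8("hello sir");

        // Unterminated write: only the 5 bytes of "hello" are copied in.
        byte[] raw = Encoding.UTF8.GetBytes("hello");
        Array.Copy(raw, memory, raw.Length);
        Console.WriteLine(ReadLikeC(memory)); // prints hello sir

        // Terminated write: the trailing 0 cuts off the leftovers.
        byte[] terminated = ToNullTerminatedUtf8("hello");
        Array.Copy(terminated, memory, terminated.Length);
        Console.WriteLine(ReadLikeC(memory)); // prints hello
    }
}
```

This matches the symptom reported earlier in the thread, where "hello" generated after "hello sir" sometimes came out as "hello sir" or "hello si".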

thank you for all your help and responsiveness!

csukuangfj added a commit to csukuangfj/sherpa-onnx that referenced this issue Jan 13, 2025
See also
k2-fsa#1695 (comment)

We need to place a 0 at the end of the buffer.
@csukuangfj
Collaborator

Great to hear you fixed it!

Please see #1701

csukuangfj added a commit that referenced this issue Jan 13, 2025
See also
#1695 (comment)

We need to place a 0 at the end of the buffer.
@AdrianPress

Hello @adem-rguez @csukuangfj ! First of all, sorry for being a newbie, I'm just getting started.

I've seen this thread and I can't resist asking if you can help me because, even with the documentation and how detailed this thread is, I still can't get Unity to load the library.

Although it's able to find it, I always get this error:

DllNotFoundException: sherpa-onnx-c-api assembly:<unknown assembly> type:<unknown type> member:(null)
SherpaOnnx.SherpaTest.InitializeSherpaOnnx () (at Assets/Scripts/SherpaTest.cs:130)
SherpaOnnx.SherpaTest.Start () (at Assets/Scripts/SherpaTest.cs:53)

I have downloaded the necessary .so files for Android and placed them in the corresponding folders. I have a folder espeak-ng-data that I downloaded and placed in StreamingAssets, and a models folder where I have the model, tokens, etc.

Image

Image

Image

I have declared the functions of the library.

`using SherpaOnnx;
using System;
using System.Runtime.InteropServices;

public static class Sherpita
{
#if UNITY_ANDROID && !UNITY_EDITOR
const string dll = "__Internal";
#else
const string dll = "sherpa-onnx-c-api"; // Library name without the .dll extension
#endif

[DllImport(dll)]
public static extern IntPtr SherpaOnnxCreateOfflineTts(ref OfflineTtsConfig config);

[DllImport(dll)]
public static extern void SherpaOnnxDestroyOfflineTts(IntPtr handle);

[DllImport(dll)]
public static extern int SherpaOnnxOfflineTtsSampleRate(IntPtr handle);

[DllImport(dll)]
public static extern IntPtr SherpaOnnxOfflineTtsGenerate(IntPtr handle, byte[] utf8Text, int sid, float speed);

}
`

I have a script to test the functionality in which I call a function.

`using System;
using System.IO;
using System.Text;
using UnityEngine;
using UnityEngine.UI;
using TMPro;
using System.Runtime.InteropServices;
using System.Diagnostics; // Only used for Stopwatch

namespace SherpaOnnx
{
public class SherpaTest : MonoBehaviour
{
[Header("UI Elements")]
[SerializeField] private TMP_InputField inputField;
[SerializeField] private Button runButton;
[SerializeField] private AudioSource audioSource;

    [Header("Model Paths (desde StreamingAssets)")]
    [SerializeField] private string modelPath = "vits_generator.onnx";
    [SerializeField] private string tokensPath = "tokens.txt";
    [SerializeField] private string lexiconPath = "lexicon.txt";
    [SerializeField] private string dictDirPath = "dict";
    [SerializeField] private string espeakDataPath = "espeak-ng-data";

    [Header("TTS Settings")]
    [Range(0f, 1f)] public float noiseScale = 0.667f;
    [Range(0f, 1f)] public float noiseScaleW = 0.8f;
    [Range(0.5f, 2f)] public float lengthScale = 1.0f;
    public int speakerId = 0;
    public int numThreads = 1;
    public bool debugMode = false;
    public string provider = "cpu";
    public int maxNumSentences = 1;

    private IntPtr ttsHandle = IntPtr.Zero;

    private void Start()
    {
        UnityEngine.Debug.Log("🔹 Start() ejecutándose...");

        if (runButton != null)
        {
            runButton.onClick.AddListener(RunTTS);
            UnityEngine.Debug.Log("✅ Botón RunTTS asignado correctamente.");
        }
        else
        {
            UnityEngine.Debug.LogError("❌ ERROR: El botón no está asignado en el inspector.");
        }

        CheckLibrary();
        InitializeSherpaOnnx();
    }

    private void CheckLibrary()
    {
        UnityEngine.Debug.Log("🔹 Verificando librerías...");

#if UNITY_STANDALONE_WIN || UNITY_EDITOR
string libPath = Path.Combine(Application.dataPath, "Plugins/x86_64/sherpa-onnx-c-api.dll");
UnityEngine.Debug.Log($"📂 Buscando DLL en: {libPath}");

        if (!File.Exists(libPath))
        {
            UnityEngine.Debug.LogError($"❌ ERROR: No se encontró la DLL en {libPath}");
        }
        else
        {
            UnityEngine.Debug.Log("✅ DLL encontrada correctamente.");
        }

#endif
}

    public void TestTTSInitialization()
    {
        UnityEngine.Debug.Log(" TestTTSInitialization() ejecutado desde el Editor.");
        InitializeSherpaOnnx();
    }

    private void InitializeSherpaOnnx()
    {
        UnityEngine.Debug.Log(" Intentando inicializar Sherpa ONNX...");

        string streamingAssetsPath = Application.streamingAssetsPath;
        string fullModelPath = Path.Combine(streamingAssetsPath, modelPath);
        string fullTokensPath = Path.Combine(streamingAssetsPath, tokensPath);

        if (!File.Exists(fullModelPath))
        {
            UnityEngine.Debug.LogError($" ERROR: No se encontró el modelo ONNX en {fullModelPath}");
            return;
        }
        if (!File.Exists(fullTokensPath))
        {
            UnityEngine.Debug.LogError($" ERROR: No se encontró el archivo de tokens en {fullTokensPath}");
            return;
        }

        UnityEngine.Debug.Log(" Todos los archivos requeridos están disponibles.");

        var ttsConfig = new OfflineTtsConfig
        {
            Model = new OfflineTtsModelConfig
            {
                Vits = new OfflineTtsVitsModelConfig
                {
                    Model = fullModelPath,
                    Tokens = fullTokensPath,
                    Lexicon = Path.Combine(streamingAssetsPath, lexiconPath),
                    DictDir = Path.Combine(streamingAssetsPath, dictDirPath),
                    DataDir = Path.Combine(streamingAssetsPath, espeakDataPath),
                    NoiseScale = noiseScale,
                    NoiseScaleW = noiseScaleW,
                    LengthScale = lengthScale
                },
                NumThreads = numThreads,
                Debug = debugMode ? 1 : 0,
                Provider = provider
            },
            RuleFsts = "",
            MaxNumSentences = maxNumSentences,
            RuleFars = ""
        };

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        UnityEngine.Debug.Log(" Llamando a Sherpita.SherpaOnnxCreateOfflineTts...");
        ttsHandle = Sherpita.SherpaOnnxCreateOfflineTts(ref ttsConfig);

        stopwatch.Stop();
        UnityEngine.Debug.Log($" Tiempo de inicialización de Sherpa ONNX: {stopwatch.ElapsedMilliseconds} ms");

        if (ttsHandle == IntPtr.Zero)
        {
            UnityEngine.Debug.LogError(" ERROR: Falló la inicialización de Sherpa ONNX.");
        }
        else
        {
            UnityEngine.Debug.Log(" Sherpa ONNX inicializado correctamente.");
        }
    }

    private void RunTTS()
    {
        UnityEngine.Debug.Log(" Ejecutando RunTTS()...");

        if (ttsHandle == IntPtr.Zero)
        {
            UnityEngine.Debug.LogError(" ERROR: Sherpa ONNX no ha sido inicializado.");
            return;
        }

        if (string.IsNullOrWhiteSpace(inputField.text))
        {
            UnityEngine.Debug.LogWarning(" No hay texto para sintetizar.");
            return;
        }

        string text = inputField.text;
        UnityEngine.Debug.Log($" Texto a sintetizar: {text}");

        byte[] utf8Text = Encoding.UTF8.GetBytes(text);
        IntPtr audioPtr = Sherpita.SherpaOnnxOfflineTtsGenerate(ttsHandle, utf8Text, speakerId, 1.0f);

        if (audioPtr == IntPtr.Zero)
        {
            UnityEngine.Debug.LogError(" ERROR: La generación de audio devolvió un puntero nulo.");
            return;
        }

        float[] pcmSamples = new float[16000];
        Marshal.Copy(audioPtr, pcmSamples, 0, pcmSamples.Length);
        Marshal.FreeHGlobal(audioPtr);

        AudioClip clip = AudioClip.Create("SherpaOnnxTTS", pcmSamples.Length, 1, Sherpita.SherpaOnnxOfflineTtsSampleRate(ttsHandle), false);
        clip.SetData(pcmSamples, 0);
        audioSource.clip = clip;
        audioSource.loop = false;
        audioSource.Play();

        UnityEngine.Debug.Log(" Audio generado y reproducido correctamente.");
    }

    private void OnDestroy()
    {
        UnityEngine.Debug.Log(" OnDestroy() ejecutado.");

        if (ttsHandle != IntPtr.Zero)
        {
            Sherpita.SherpaOnnxDestroyOfflineTts(ttsHandle);
            ttsHandle = IntPtr.Zero;
            UnityEngine.Debug.Log(" Recursos liberados correctamente.");
        }
    }
}

}
`
And the script to configure the model.

`using System;
using System.Runtime.InteropServices;

namespace SherpaOnnx
{
[StructLayout(LayoutKind.Sequential)]
public struct OfflineTtsConfig
{
public OfflineTtsModelConfig Model;
public string RuleFsts;
public int MaxNumSentences;
public string RuleFars;
}

[StructLayout(LayoutKind.Sequential)]
public struct OfflineTtsModelConfig
{
    public OfflineTtsVitsModelConfig Vits;
    public int NumThreads;
    public int Debug;
    public string Provider;
}

[StructLayout(LayoutKind.Sequential)]
public struct OfflineTtsVitsModelConfig
{
    public string Model;
    public string Lexicon;
    public string Tokens;
    public string DataDir;
    public string DictDir;
    public float NoiseScale;
    public float NoiseScaleW;
    public float LengthScale;
}

public class OfflineTtsGeneratedAudio
{
    public float[] Samples;

    public OfflineTtsGeneratedAudio(IntPtr audioPtr)
    {
        if (audioPtr == IntPtr.Zero)
        {
            Samples = new float[0];  // If the pointer is null, return an empty array.
            return;
        }

        // We assume the audio holds at most 16000 samples (1 second of audio at 16 kHz).
        int length = 16000;  // CHANGE THIS TO MATCH YOUR ACTUAL NEEDS.
        Samples = new float[length];

        // Copy the data from the C++ pointer into a C# array.
        Marshal.Copy(audioPtr, Samples, 0, length);
    }
}

public delegate int OfflineTtsCallback(IntPtr samplesPtr, int numSamples);

}
`

In summary, I only have the .so files and those three scripts. Is there anything else I might be missing?

Is there something wrong with the scripts that is preventing the library from loading correctly when I call a function?

Thanks!
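
A note on the sample-copying code in the scripts above. As far as I know, the C API's generate function returns a pointer to a small result struct (a pointer to the float samples, the sample count, and the sample rate) rather than a pointer to the floats themselves, and the result is meant to be released through the library's own destroy function rather than `Marshal.FreeHGlobal`. Assuming that layout, copying a fixed 16000 floats from the returned pointer would read the wrong memory. The sketch below shows the reading pattern against a fake result built in unmanaged memory, so it runs without the native library. The `NativeAudio` struct is an assumed mirror of the native layout and should be checked against the header.

```csharp
using System;
using System.Runtime.InteropServices;

public static class GeneratedAudioReader
{
    // ASSUMED layout of the native result: pointer to floats, sample count, sample rate.
    [StructLayout(LayoutKind.Sequential)]
    public struct NativeAudio
    {
        public IntPtr Samples;
        public int NumSamples;
        public int SampleRate;
    }

    // Reads the struct behind audioPtr and copies exactly NumSamples floats.
    public static float[] ReadSamples(IntPtr audioPtr, out int sampleRate)
    {
        sampleRate = 0;
        if (audioPtr == IntPtr.Zero) return Array.Empty<float>();

        NativeAudio audio = Marshal.PtrToStructure<NativeAudio>(audioPtr);
        sampleRate = audio.SampleRate;
        if (audio.Samples == IntPtr.Zero || audio.NumSamples <= 0) return Array.Empty<float>();

        float[] samples = new float[audio.NumSamples];
        Marshal.Copy(audio.Samples, samples, 0, audio.NumSamples);
        return samples;
    }

    public static void Main()
    {
        // Build a fake native result so the sketch runs without the library.
        float[] source = { 0.25f, -0.5f, 0.75f };
        IntPtr data = Marshal.AllocHGlobal(source.Length * sizeof(float));
        IntPtr header = Marshal.AllocHGlobal(Marshal.SizeOf<NativeAudio>());
        try
        {
            Marshal.Copy(source, 0, data, source.Length);
            var fake = new NativeAudio { Samples = data, NumSamples = source.Length, SampleRate = 22050 };
            Marshal.StructureToPtr(fake, header, false);

            float[] samples = ReadSamples(header, out int rate);
            Console.WriteLine($"{samples.Length} samples at {rate} Hz"); // prints 3 samples at 22050 Hz
        }
        finally
        {
            // FreeHGlobal is correct here only because this memory came from AllocHGlobal.
            Marshal.FreeHGlobal(header);
            Marshal.FreeHGlobal(data);
        }
    }
}
```

Separately, the `DllNotFoundException` names `sherpa-onnx-c-api`, which is the non-Android branch of the `#if` in `Sherpita`. That suggests the call may be running in the Editor, where Android `.so` files cannot be loaded, but this is a guess from the message alone.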
