Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cache mechanism to sherpa tts #1732

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Add cache mechanism to sherpa tts #1732

wants to merge 12 commits into from

Conversation

mah92
Copy link

@mah92 mah92 commented Jan 17, 2025

Delay is a major drawback of sherpa tts on android phones, which makes it unusable for average blind people using average phones. By caching the most frequent tts requests, sherpa is made as fast as lightning on most of the time, using much less cpu and battery. Some blind individials have tested this upgrade and have been happy :)

Considered guide-lines:

  • App does not cache user texts due to security and privacy concerns. Just a hash is saved from the texts.
  • Usage statistics of texts(hashes) are saved to a list, and stored periodically(every 10 minutes). If cache space is filled, wav files that are less used are removed.
  • Setting amount of cache and the ability to remove cached wav files are added in the MainActivity
  • Present jni functions are tried to remain intact due to possible compatibility considerations.
  • When changing speed, cache could be cleared automatically to update the speed of cached media as well, not implemented yet as there is a clear button on MainActivity.kt.
  • Code is scanned for possible bugs with deepseek AI.

@mah92
Copy link
Author

mah92 commented Jan 17, 2025

Screenshot_20250117-200932_TTS Engine

@csukuangfj
Copy link
Collaborator

Thank you for your contribution!

Will review it.

sherpa-onnx/csrc/offline-tts-cache-mechanism.h Outdated Show resolved Hide resolved
sherpa-onnx/csrc/offline-tts-cache-mechanism.h Outdated Show resolved Hide resolved
sherpa-onnx/csrc/offline-tts-cache-mechanism.h Outdated Show resolved Hide resolved
sherpa-onnx/csrc/offline-tts-cache-mechanism.h Outdated Show resolved Hide resolved
sherpa-onnx/csrc/CMakeLists.txt Outdated Show resolved Hide resolved
sherpa-onnx/csrc/offline-tts.h Outdated Show resolved Hide resolved
@@ -32,6 +33,9 @@ struct OfflineTtsConfig {
// If you set it to -1, then we process all sentences in a single batch.
int32_t max_num_sentences = 1;

// Path to cache_directory
std::string cache_dir;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a new config for cache.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean exactly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove cache_dir by calling the constructor from within GetOfflineTtsConfig. Is it acceptable?
If not, please give more details.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, OfflineTts has a config called OfflineTtsConfig.

You can create a struct called CacheMechanismConfig and put cache_dir in it. In this way, you don't need to add a new field to OfflineTtsConfig. You also don't need to add new methods to OfflineTts. (Note you need to wrap CacheMechanism and its config to Kotlin via jni)

Remember that sherpa-onnx currently supports 12 programming languages. If we change the config in C++, we also need to update APIs of other programming languages.

You can create a smart pointer of CacheMechanism and pass it to the constructor of OfflineTts.

If later we need to extend the CacheMechanism, we can leave OfflineTts and OfflineTtsConfig untouched. We only need to change CacheMechanism or/and CacheMechanismConfig.

By the way, since the CacheMechanism is related to tts, can you rename it to OfflineTtsCacheMechanism?

In general, the filename of a class contains the class name, though there are dashes in the filename but not in the class name

Methods of CacheMechanism don't need to be repeated in OfflineTts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I add OfflineTtsCacheMechanismConfig to OfflineTtsConfig or keep it separate?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep it separate.

Comment on lines 94 to 104
// Return the maximum number of cached audio files size
int32_t CacheSize() const;

// Set the maximum number of cached audio files size
void SetCacheSize(const int32_t cache_size);

// Remove all cache data
void ClearCache();

// To get total used cache size(for wav files) in bytes
int64_t GetTotalUsedCacheSize();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these methods from OfflineTtts.

You can pass a smart pointer of CacheMechanism to the constructor of OffineTts.
(You can add an overload constructor for OfflineTts)

Copy link
Author

@mah92 mah92 Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not get how.
You mean I directly call functions in CacheMechanism from jni functions themselves?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. CacheMechanism is constructed outside of OfflineTts.

OfflineTts is only for converting text to speech. Anything related to cache should be put in a separate class, i.e., please don't repeat the methods of CacheMechanism in OfflineTts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the pimpl implementation here.

Comment on lines 111 to 113
OfflineTtsConfig config_;
std::unique_ptr<OfflineTtsImpl> impl_;
std::unique_ptr<CacheMechanism> cache_mechanism_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OfflineTtsConfig config_;
std::unique_ptr<OfflineTtsImpl> impl_;
std::unique_ptr<CacheMechanism> cache_mechanism_;
std::unique_ptr<OfflineTtsImpl> impl_;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not get it. If CacheMechanism is to be added to the constructor of OffineTts, aren't these lines still needed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update OfflineTtsImpl.

You don't need to add any new fields or any new methods to OfflineTts.

You can add a new field

CacheMechanism* cache_; // not owned here

to OfflineTtsImpl.

@csukuangfj
Copy link
Collaborator

Please use
https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/check_style_cpplint.sh
to check the style of your C++ code.

You can refer to

# (1) To check files of the last commit
# ./scripts/check_style_cpplint.sh
#
# (2) To check changed files not committed yet
# ./scripts/check_style_cpplint.sh 1
#
# (3) To check all files in the project
# ./scripts/check_style_cpplint.sh 2


You can use

pip install clang-format==12.0.1

to install clang-format and use it to auto-format your code.

@mah92
Copy link
Author

mah92 commented Jan 22, 2025

Thank you, I will correct tomorrow.
I wonder if it is better to remove the clear button and remove the cache automatically when the speed is changed. By the way the clear button seems to be displaced after merging to the newer commit.

@csukuangfj
Copy link
Collaborator

I wonder if it is better to remove the clear button and remove the cache automatically when the speed is changed

Yes, I agree.


By the way, you can even set a default cache size. Users don't need to set the cache size through the UI.

@mah92
Copy link
Author

mah92 commented Jan 22, 2025

Another proposal: whatever we do in the model, there still remains some words that are not spelled correctly(like keyboard letters). Good to add them permanently in the cache so that they be read fron the cache instead of the model?

@mah92
Copy link
Author

mah92 commented Jan 22, 2025

About the cache slider, I think it is good to keep it for a while. Then rethink based on user feedback. I know it is good between 20 and 200 MB but need feedback to fix it(maybe not a single optimal value based on phone). I also think the user may be happier when knowing about the cach size and would accept the extra memory usage easier. Also better for privacy concerns to see the slider.

@csukuangfj
Copy link
Collaborator

the cache instead of the model?

Ok, it is fine with me.

@csukuangfj
Copy link
Collaborator

Another proposal: whatever we do in the model, there still remains some words that are not spelled correctly(like keyboard letters). Good to add them permanently in the cache so that they be read fron the cache instead of the model?

Please make the cache do only 1 thing.

You can create a separate PR to add it.

@mah92
Copy link
Author

mah92 commented Jan 23, 2025

Please use https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/check_style_cpplint.sh to check the style of your C++ code.
...

done.

@mah92
Copy link
Author

mah92 commented Jan 24, 2025

Shall I add OfflineTtsCacheMechanismConfig to OfflineTtsConfig or keep it separate?

@csukuangfj
Copy link
Collaborator

Shall I add OfflineTtsCacheMechanismConfig to OfflineTtsConfig or keep it separate?

Please keep it separate.

@mah92
Copy link
Author

mah92 commented Jan 24, 2025

impl_ is private in csrc/offline-tts.h. Unable to use it in jni/offline-tts.cc
private:
std::unique_ptr impl_;

Instead I can put OfflineTtsCacheMechanism* cache_; inside OfflineTts itself.

@csukuangfj
Copy link
Collaborator

You don't need to wrap.impl_ to.jni.

You need to.wrap. CacheMechanism to jni.

Not sure why you.need to wrap impl_ to jni.

@mah92
Copy link
Author

mah92 commented Jan 24, 2025

You told me ealier to update OfflineTtsImpl and add CacheMechansim pointer there. It was impossible to use it from there due to it's private nature. So should I cache the pointer in the jni/offline-tts.cc?

Note that if I cache it in jni/offline-tts.cc, I should move calling all it's methods like GetWavFile there. It means the cache mechansim would only be added to kotin/android.

If I cache the pointer as public in class in csrc/offline-tts.h, it would remain compatible(other projects just won't have cache) but might be easier for them to use cache later(using the second constructor and passing OfflineTtsCaxheMechanismConfig). Still it might not be neat.

@csukuangfj
Copy link
Collaborator

Yes, I said that.

Let me give.you the pseudo code later.

…erpaOnnxEngine

Removing cache_dir from OfflineTts
Removing function implementations from OfflineTts
Passing config pointer  to OfflineTts
Added cache_mechanism_ to OfflineTts (as public, might not be so neat)
Changing all cache units in kotlin to Bytes from MB to reduce confusion. Showing is still in MBs
@mah92
Copy link
Author

mah92 commented Jan 25, 2025

This is what I have done in the last commit, looking forward to your pseudo code...
offline-tts-cahce-mechanism-config in both SherpaOnnxTts and SherpaOnnxEngine
Removing cache_dir from OfflineTts
Removing function implementations from OfflineTts
Passing config pointer to OfflineTts
Added cache_mechanism_ to OfflineTts (as public, might not be so neat)
Changing all cache units in kotlin to Bytes from MB to reduce confusion. Showing is still in MBs

P.N: Due to problems in my side, I was not able to test SherpaOnnxTts directly. I just test on SherpaOnnxEngine

Also solved bug by accidentally pasted extra code in previous commit
…counts from uint32_t to size_t for better alignment in the file, Changed repeat_counts save time to every 1minute as destructor not working
@csukuangfj
Copy link
Collaborator

csukuangfj commented Jan 25, 2025

So here is the pseudo code.

  1. Create a cache mechanism outside of OfflineTts.
OfflineTtsCacheMechanismConfig config;
// set config

auto cache = new OfflineTtsCacheMechanism(config)
  1. Now pass the pointer to cache to the OfflineTts
OfflineTtsConfig config;

auto tts = new OfflineTts(config, cache);
  1. Now Change OfflineTtsImpl
class OfflineTtsImpl {
public:
  OfflineTtsImpl(const OfflineTtsConfig& config, OfflineTtsCacheMechanism* cache)
   : ...
     cache_(cache) {
   ...
  }

private:
  OfflineTtsCacheMechanism *cache_ = nullptr; // not owned here
};

Note again in the above code, you don't need to add any new methods (except the constructor) of OfflineTts.

You also don't need to care about that impl_ is private in OfflineTts.


You just need to wrap OfflineTtsCacheMechanism and OfflineTtsCacheMechanismConfig. to JNI.


Please don't add any method related to OfflineTtsCacheMechanism to OfflineTts. You don't need to do that.

@@ -91,6 +99,8 @@ class OfflineTts {
// If it supports only a single speaker, then it return 0 or 1.
int32_t NumSpeakers() const;

std::unique_ptr<OfflineTtsCacheMechanism> cache_mechanism_; // not owned here
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unique_ptr<OfflineTtsCacheMechanism> cache_mechanism_; // not owned here

Please don't add any new fields to OfflineTts.

As said before, we use pimpl here.

If you don't know what pimpl is, please google it first.

The only member OfflineTts has is impl_.

Everything else is put inOfflineTtsImpl.

@mah92
Copy link
Author

mah92 commented Jan 31, 2025

I will catch up soon. By the way, What files needs change in the 1. and 2. part of pseudo code change comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants