Oh yeah, sorry, I didn't even think of searching the existing issues, given how rare it seems to be for "loudness-based" software to have this kind of feature. That's cool.
Reading through the issue you linked, I guess my post doesn't bring much to the table. Feel free to close it if you think it just duplicates that discussion.
Thanks a lot for the links! I guess I'll close this as a duplicate, and let's continue the discussion there.
FYI the extension is modular enough in this regard, so if you have an algorithm, it shouldn't be hard to integrate it into the extension. Here's the responsible part.
I know this is no small feat, but a feature that would be insanely good, in my opinion, is for the extension to be able to jump/cut based on VAD (Voice Activity Detection) or speech recognition.
I did some research on the matter, mainly to find an editor that can cut the parts of a video that don't contain speech. For example, there's the new paid jumpcutter from carykh (jumpcutter.com; GUI, in beta, with a trial) that can now jump/cut using VAD, but it's a bit slow and lacking, and it can't be used from the CLI, which is mainly what I'm looking for. There are also cloud-based services like wisecut.video, but they aren't suitable for my use case because they're priced/limited by video time, size, etc.
It was while doing this research that I found this extension, which turned out to be pretty useful for a different use case than the one I was looking into (I have a lot of media files from which I'd like to trim the non-speech parts, but I also consume quite a bit of content online, and I'm glad to have found this extension for that).
So, having not found anything that does what I want, I'm now looking into maybe coding a script myself.
I thought I could share the resources I've found so far, to help if this ever gets implemented in this extension, which I think would be a huge and useful feature.
Sadly, everything I've found is mainly in Python, so I'm not sure how well it applies to this project.
But here's what I've got so far:
https://archive.is/20220527092223/https://towardsdatascience.com/automatic-video-editing-using-python-324e5efd7eba
https://archive.is/S6a4V
https://wandb.ai/yvrjsharma/posts/reports/Video-Editing-Using-Automatic-Speech-Recognition---VmlldzoyMTY4OTQy
https://realpython.com/python-speech-recognition/
https://thegradient.pub/one-voice-detector-to-rule-them-all/
https://github.com/openai/whisper
https://github.com/snakers4/silero-vad
https://github.com/wiseman/py-webrtcvad
https://github.com/Picovoice/cobra
https://github.com/alibaba-damo-academy/FunASR
Edit:
Adding some links which seem more suitable for this extension:
https://github.com/ccoreilly/vosk-browser
https://github.com/wiseman/py-webrtcvad
(Silero VAD seems to be the best model to use)
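For what it's worth, here's a minimal sketch of the cutting step itself, assuming a VAD model (any of the ones linked above) has already produced one speech/non-speech decision per fixed-length audio frame. It doesn't call any of those libraries; the function name `speech_segments` and the parameters `frame_ms`, `pad_ms` and `min_gap_ms` are made up for illustration, and the default values are guesses rather than tuned numbers.

```python
from typing import List, Tuple


def speech_segments(
    flags: List[bool],
    frame_ms: int = 30,
    pad_ms: int = 90,
    min_gap_ms: int = 300,
) -> List[Tuple[float, float]]:
    """Turn per-frame speech flags into (start, end) segments, in seconds, to keep.

    flags[i] is True when frame i was classified as speech (hypothetical VAD output).
    """
    pad = pad_ms // frame_ms          # padding around speech, in frames
    min_gap = min_gap_ms // frame_ms  # shorter silences are not cut out
    n = len(flags)

    # 1. Collect runs of consecutive speech frames as [start, end) frame indices.
    runs = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, n])

    # 2. Pad each run, then merge it into the previous one when the
    #    remaining gap between them is shorter than min_gap.
    merged = []
    for s, e in runs:
        s, e = max(0, s - pad), min(n, e + pad)
        if merged and s - merged[-1][1] < min_gap:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])

    # 3. Convert frame indices to seconds.
    return [(s * frame_ms / 1000, e * frame_ms / 1000) for s, e in merged]
```

Everything outside the returned segments is what would get skipped or sped up. The padding keeps word onsets and endings from being clipped, and the minimum gap avoids cutting on every short pause between words.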