Issue with preprocess_text #271

DrJPK · 2024-11-18T07:12:56Z

Hi,

This is a fantastic project, but I appear to be having an issue with the call to preprocess_text on line 172 of diarize.py.

The initial call is python diarize.py -a audio.MP3, and I have the whole script installed in its own venv with Python 3.12.
Separating the audio tracks works fine, and they are stored on the server as expected.

The subsequent call to preprocess_text() then fails, with ctc_forced_aligner throwing the following error:

[NeMo W 2024-11-18 17:38:04 nemo_logging:393] /srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
      warnings.warn(

Traceback (most recent call last):
  File "/srv/whisperAI/whisper-diarization/diarize.py", line 172, in <module>
    tokens_starred, text_starred = preprocess_text(
                                   ^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/ctc_forced_aligner/text_utils.py", line 220, in preprocess_text
    tokens = get_uroman_tokens(norm_text, language)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/ctc_forced_aligner/text_utils.py", line 164, in get_uroman_tokens
    result = subprocess.run(
             ^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['perl', '/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/ctc_forced_aligner/uroman/bin/uroman.pl', '-l', 'eng']' returned non-zero exit status 2.

I am likely missing something really obvious, but looking into that perl script I can't even see an exit 2; line, so I am struggling to see how the error is being generated, never mind beginning to understand how to fix it.

Any ideas as to what might be causing this?
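A minimal sketch for narrowing this down, assuming the failing command can be re-run by hand and that uroman.pl reads its text from stdin (an assumption, not confirmed in this thread). The traceback only reports the exit status; whatever perl wrote to stderr is not shown. Perl's die generally exits with the current errno when it is non-zero, so a status of 2 does not require a literal exit 2 in the script. The helper name run_and_capture is hypothetical.

```python
import subprocess


def run_and_capture(cmd, text=""):
    """Run cmd with `text` on stdin; return (exit_code, stdout, stderr)."""
    result = subprocess.run(
        cmd,
        input=text,           # sent to the child's stdin
        capture_output=True,  # keep stdout and stderr for inspection
        text=True,            # decode bytes to str
        check=False,          # do not raise, so the failure can be examined
    )
    return result.returncode, result.stdout, result.stderr
```

For example, calling run_and_capture(["perl", "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/ctc_forced_aligner/uroman/bin/uroman.pl", "-l", "eng"], "hello world\n") and printing the third element of the result should show the message perl emitted before exiting.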


homelab-00 · 2024-11-18T07:30:54Z

Try this: install the language 'perl' on your system and also install the uroman package via pip.

DrJPK · 2024-11-18T07:40:09Z

Unfortunately the error is still the same

perl is v5.32.1

and pip list shows uroman as 1.3.1.1

ctc-forced-aligner is also version 0.2 in case that makes any difference

homelab-00 · 2024-11-18T08:08:40Z

It looks a lot like an issue I had while trying to install and run this project, which was resolved by installing 'perl'. Did you restart your system after installing 'perl' (or did you already have it installed)?

You can also take a look at my full Windows installation instructions here and see if they help.

homelab-00 · 2024-11-18T10:21:05Z

Also make sure 'perl' has been correctly added to PATH. I've installed 'perl' via Strawberry Perl and my PATH looks like this:

C:\Strawberry\c\bin
C:\Strawberry\perl\site\bin
C:\Strawberry\perl\bin
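A quick way to confirm which perl, if any, Python will find on PATH, as a minimal sketch. It should be run with the same interpreter (and venv) that runs diarize.py. The helper name tool_info is hypothetical.

```python
import shutil
import subprocess


def tool_info(name, version_args=("-v",)):
    """Return (path, version_output) for `name`, or None if it is not on PATH."""
    path = shutil.which(name)  # resolves the executable the way a shell would
    if path is None:
        return None
    result = subprocess.run(
        [path, *version_args], capture_output=True, text=True, check=False
    )
    # Some tools print their version banner to stderr instead of stdout.
    return path, (result.stdout or result.stderr).strip()


print(tool_info("perl"))
```

If this prints None, the subprocess call inside ctc_forced_aligner would not be able to start perl at all from that environment.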

DrJPK · 2024-11-18T12:30:39Z

I've tried a few things, assuming that it is a Perl/Python binding issue, but I'll add this comment here in case anyone has any further insight.

Server is Linux (RHEL 9) running Python 3.12 in a venv.
Hardware is:
32 cores on a Xeon Silver 4309Y CPU,
64GB memory,
Nvidia A30 24GB and CUDA 12.7

MahmoudAshraf97 · 2024-11-18T12:35:37Z

@homelab-00 Thanks for your activity in solving installation issues here. A quick note: you shouldn't need the uroman pip installation, as it is not used anywhere; perl alone should be enough.

homelab-00 · 2024-11-18T20:15:57Z

Nvidia A30 24GB and CUDA 12.7

The latest version of PyTorch, 2.5.1, supports CUDA up to 12.4 (see here). You might want to uninstall your newer version and install 12.4 to see if it helps.

Edit: This probably isn't related to your original issue, but it'll help the script run better once you manage to install it. It would be best, however, to replace your CUDA version before installing PyTorch, and to install PyTorch with the matching CUDA build (e.g. pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 for CUDA 12.4) before installing whisper-diarization.
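To see which CUDA build the installed PyTorch actually uses, a minimal sketch follows. It relies on the assumption that the pip wheels bundle their own CUDA runtime, in which case the version torch was built against may matter more than the system-wide CUDA version. The helper name torch_cuda_info is hypothetical.

```python
def torch_cuda_info():
    """Return PyTorch's CUDA build details, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_cuda_info())
```

If cuda_available is reported as False on a machine with a working GPU driver, a mismatched or CPU-only PyTorch build is one likely explanation.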

homelab-00 · 2024-11-18T21:08:55Z

perl is v5.32.1

Your 'perl' version could also potentially be causing some issues. In my setup I'm using v.5.40.0.1 without problems; you could try upgrading to that version to see if it helps.
