jp_sub_extract

Extracts Japanese words from an .srt subtitle file, filters them against JMdict, and writes the result to an output file. Not available as a pip package.

Prerequisites: Python 3 and the following pip packages. You can choose between the small, core, and full editions of sudachidict-* (see https://github.com/WorksApplications/SudachiPy#dictionary-edition for details); the commands below install core:

pip install lxml
pip install srt
pip install sudachipy
pip install sudachidict-core

You also need a JMdict edition, which you can download from https://www.edrdg.org/wiki/index.php/JMdict-EDICT_Dictionary_Project

Example usage:

python3 subs_filter.py \
-i subs.srt \
-o out.txt \
--jmdict JMdict_e \
--sudachi_dict_type core \
--whitelist_cache whitelist.pickle

Where: -i is the input .srt file; -o is the output file; --jmdict is the JMdict file; --sudachi_dict_type is the sudachidict edition you installed (small, core, or full); --whitelist_cache is optional and gives the path where the whitelist is pickled and later unpickled, if you want to cache it.
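To illustrate what the whitelist cache is for, here is a minimal sketch of a load-or-build pattern using pickle. It is not the code from subs_filter.py: the function name load_or_build_whitelist and the build callable are hypothetical, and the real script may handle the cache differently.

```python
import pickle
from pathlib import Path


def load_or_build_whitelist(cache_path, build):
    """Return the cached whitelist if it exists; otherwise build and cache it.

    `build` is a hypothetical zero-argument callable that produces the
    whitelist (for example, by parsing the JMdict file).
    """
    path = Path(cache_path)
    if path.exists():
        # Reuse the previously pickled whitelist instead of rebuilding it.
        with path.open("rb") as f:
            return pickle.load(f)
    whitelist = build()
    # Store the freshly built whitelist for the next run.
    with path.open("wb") as f:
        pickle.dump(whitelist, f)
    return whitelist
```

With a pattern like this, the first run pays the cost of building the whitelist and later runs only read the pickle file. Since unpickling can execute arbitrary code, only point the cache path at files you created yourself.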

If you want to change the filter code, it is located in lib/jmdict_whitelist.py.
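As a rough idea of what a JMdict-based filter can look like, the sketch below collects the kanji (keb) and reading (reb) forms from JMdict XML and keeps only candidate words found among them. It is an assumption-laden illustration, not the contents of lib/jmdict_whitelist.py: the function names build_whitelist and filter_words and the two-entry sample are hypothetical, it uses the standard library's xml.etree.ElementTree rather than lxml, and the project's actual filter may apply different criteria.

```python
import io
import xml.etree.ElementTree as ET


def build_whitelist(jmdict_file):
    """Collect every kanji (keb) and reading (reb) form found in JMdict XML."""
    words = set()
    # iterparse streams the file entry by entry, which keeps memory use low.
    for _event, elem in ET.iterparse(jmdict_file, events=("end",)):
        if elem.tag == "entry":
            for form in elem.iter():
                if form.tag in ("keb", "reb") and form.text:
                    words.add(form.text)
            elem.clear()  # drop the entry's children once processed
    return words


def filter_words(candidates, whitelist):
    """Keep whitelisted candidates, preserving order and dropping repeats."""
    seen = set()
    kept = []
    for word in candidates:
        if word in whitelist and word not in seen:
            seen.add(word)
            kept.append(word)
    return kept


# Hypothetical two-entry sample in the JMdict element layout.
SAMPLE = """<JMdict>
<entry><ent_seq>1</ent_seq><k_ele><keb>猫</keb></k_ele><r_ele><reb>ねこ</reb></r_ele></entry>
<entry><ent_seq>2</ent_seq><k_ele><keb>食べる</keb></k_ele><r_ele><reb>たべる</reb></r_ele></entry>
</JMdict>"""

whitelist = build_whitelist(io.BytesIO(SAMPLE.encode("utf-8")))
print(filter_words(["猫", "が", "食べる", "猫"], whitelist))  # ['猫', '食べる']
```

In this sample the particle が is dropped only because the toy whitelist does not contain it; with a full JMdict file the result depends on which entries the filter chooses to admit.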
