Merge pull request #78 from pettarin/nextmajor
aeneas v1.5.0
Showing 389 changed files with 20,014 additions and 9,675 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,16 +4,18 @@ aeneas | |
**aeneas** is a Python/C library and a set of tools to automagically | ||
synchronize audio and text (aka forced alignment). | ||
|
||
- Version: 1.4.1 | ||
- Date: 2016-02-13 | ||
- Version: 1.5.0 | ||
- Date: 2016-04-02 | ||
- Developed by: `ReadBeyond <http://www.readbeyond.it/>`__ | ||
- Lead Developer: `Alberto Pettarin <http://www.albertopettarin.it/>`__ | ||
- License: the GNU Affero General Public License Version 3 (AGPL v3) | ||
- Contact: [email protected] | ||
- Quick Links: `Home <http://www.readbeyond.it/aeneas/>`__ - | ||
`GitHub <https://github.com/readbeyond/aeneas/>`__ - | ||
`PyPI <https://pypi.python.org/pypi/aeneas/>`__ - `API | ||
Docs <http://www.readbeyond.it/aeneas/docs/>`__ - `Mailing | ||
`PyPI <https://pypi.python.org/pypi/aeneas/>`__ - | ||
`Docs <http://www.readbeyond.it/aeneas/docs/>`__ - | ||
`Tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__ | ||
- `Mailing | ||
List <https://groups.google.com/d/forum/aeneas-forced-alignment>`__ - | ||
`Web App <http://aeneasweb.org>`__ | ||
|
||
|
@@ -34,25 +36,31 @@ interval in the audio file: | |
|
||
:: | ||
|
||
1 => [00:00:00.000, 00:00:02.680] | ||
From fairest creatures we desire increase, => [00:00:02.680, 00:00:05.480] | ||
That thereby beauty's rose might never die, => [00:00:05.480, 00:00:08.640] | ||
But as the riper should by time decease, => [00:00:08.640, 00:00:11.960] | ||
His tender heir might bear his memory: => [00:00:11.960, 00:00:15.280] | ||
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.520] | ||
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.520, 00:00:22.760] | ||
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.720] | ||
Thy self thy foe, to thy sweet self too cruel: => [00:00:25.720, 00:00:31.240] | ||
Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.280] | ||
And only herald to the gaudy spring, => [00:00:34.280, 00:00:36.960] | ||
Within thine own bud buriest thy content, => [00:00:36.960, 00:00:40.640] | ||
And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.600] | ||
Pity the world, or else this glutton be, => [00:00:43.600, 00:00:48.000] | ||
To eat the world's due, by the grave and thee. => [00:00:48.000, 00:00:53.280] | ||
|
||
This synchronization map can be output to file in several formats: SMIL | ||
for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed captioning, JSON/RBSE for | ||
Web usage, or raw CSV/SSV/TSV/TXT/XML for further processing. | ||
1 => [00:00:00.000, 00:00:02.640] | ||
From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880] | ||
That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240] | ||
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920] | ||
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280] | ||
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800] | ||
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760] | ||
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680] | ||
Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240] | ||
Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400] | ||
And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920] | ||
Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640] | ||
And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640] | ||
Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080] | ||
To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240] | ||
|
||
.. figure:: wiki/align.png | ||
:alt: Waveform with aligned labels, detail | ||
|
||
Waveform with aligned labels, detail | ||
|
||
This synchronization map can be output to file in several formats: EAF | ||
for research purposes, SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed | ||
captioning, JSON for Web usage, or raw AUD/CSV/SSV/TSV/TXT/XML for | ||
further processing. | ||
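The listing above pairs each text fragment with a ``[begin, end]`` interval. As a minimal sketch, assuming lines shaped exactly like that listing (``text => [HH:MM:SS.mmm, HH:MM:SS.mmm]``), the intervals can be read back into seconds as follows. The helper names ``to_seconds`` and ``parse_line`` are hypothetical and not part of **aeneas**:

```python
import re

# Matches lines such as:
#   1 => [00:00:00.000, 00:00:02.640]
# The regex is an assumption based on the listing shown in the README.
LINE_RE = re.compile(
    r"^(?P<text>.*?)\s*=>\s*\[(?P<begin>[0-9:.]+),\s*(?P<end>[0-9:.]+)\]"
)


def to_seconds(stamp):
    """Convert an HH:MM:SS.mmm time stamp into seconds (float)."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_line(line):
    """Return (text, begin_seconds, end_seconds) for one listing line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a sync map listing line: %r" % line)
    return (
        match.group("text"),
        to_seconds(match.group("begin")),
        to_seconds(match.group("end")),
    )


fragment = parse_line("1 => [00:00:00.000, 00:00:02.640]")
text, begin, end = fragment
print("%s: %.3f s long" % (text, end - begin))
```

For real processing, one of the machine-readable output formats listed above (for example JSON or CSV) is likely a better starting point than this human-readable listing.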
|
||
System Requirements, Supported Platforms and Installation | ||
--------------------------------------------------------- | ||
|
@@ -66,20 +74,17 @@ System Requirements | |
3. `FFmpeg <https://www.ffmpeg.org/>`__ | ||
4. `eSpeak <http://espeak.sourceforge.net/>`__ | ||
5. Python modules ``BeautifulSoup4``, ``lxml``, and ``numpy`` | ||
6. Python C headers to compile the Python C extensions (Optional but | ||
6. Python C headers to compile the Python C extensions (optional but | ||
strongly recommended) | ||
7. A shell supporting UTF-8 (Optional but strongly recommended) | ||
8. Python module ``pafy`` (Optional, only required if you want to | ||
download audio from YouTube) | ||
7. A shell supporting UTF-8 (optional but strongly recommended) | ||
|
||
Supported Platforms | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
**aeneas** has been developed and tested on **Debian 64bit**, which is | ||
the **only supported OS** at the moment. | ||
|
||
However, **aeneas** has been confirmed to work on other Linux | ||
distributions, OS X, and Windows. See the `PLATFORMS | ||
the **only supported OS** at the moment. Nevertheless, **aeneas** has | ||
been confirmed to work on other Linux distributions, OS X, and Windows. | ||
See the `PLATFORMS | ||
file <https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md>`__ | ||
for the details. | ||
|
||
|
@@ -115,37 +120,45 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows. | |
Usage | ||
----- | ||
|
||
1. To check that you installed ``aeneas`` correctly, run: | ||
1. To **check** whether you installed **aeneas** correctly, run: | ||
|
||
``python -m aeneas.diagnostics`` | ||
|
||
2. Run ``execute_task`` or ``execute_job`` with ``-h`` (resp., | ||
``--help``) to get a short (resp., long) usage message: | ||
2. Run without arguments to get the **usage message**: | ||
|
||
.. code:: bash | ||
python -m aeneas.tools.execute_task -h | ||
python -m aeneas.tools.execute_job -h | ||
python -m aeneas.tools.execute_task | ||
python -m aeneas.tools.execute_job | ||
You can also get a list of **live examples** that you can immediately | ||
run on your machine thanks to the included files: | ||
|
||
The above commands also print a list of live usage examples that you | ||
can immediately run on your machine, thanks to the included example | ||
files. | ||
.. code:: bash | ||
3. To compute a synchronization map ``map.json`` for a pair | ||
python -m aeneas.tools.execute_task --examples | ||
python -m aeneas.tools.execute_task --examples-all | ||
3. To **compute a synchronization map** ``map.json`` for a pair | ||
(``audio.mp3``, ``text.txt`` in | ||
```plain`` <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__ | ||
`plain <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__ | ||
text format), you can run: | ||
|
||
.. code:: bash | ||
python -m aeneas.tools.execute_task \ | ||
audio.mp3 \ | ||
text.txt \ | ||
"task_language=en|os_task_file_format=json|is_text_type=plain" \ | ||
"task_language=eng|os_task_file_format=json|is_text_type=plain" \ | ||
map.json | ||
To compute a synchronization map ``map.smil`` for a pair (``audio.mp3``, | ||
```page.xhtml`` <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED>`__ | ||
(The command has been split into lines with ``\`` for visual clarity; in | ||
production you can have the entire command on a single line and/or you | ||
can use shell variables.) | ||
|
||
To **compute a synchronization map** ``map.smil`` for a pair | ||
(``audio.mp3``, | ||
`page.xhtml <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED>`__ | ||
containing fragments marked by ``id`` attributes like ``f001``), you can | ||
run: | ||
|
||
|
@@ -155,80 +168,89 @@ run: | |
python -m aeneas.tools.execute_task \ | ||
audio.mp3 \ | ||
page.xhtml \ | ||
"task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \ | ||
"task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \ | ||
map.smil | ||
``` | ||
|
||
The third parameter (the *configuration string*) can specify several | ||
other parameters/options. See the | ||
As you can see, the third argument (the *configuration string*) | ||
specifies the parameters controlling the I/O formats and the processing | ||
options for the task. Consult the | ||
`documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details. | ||
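The configuration string in the commands above is a list of ``key=value`` pairs joined by ``|``. A minimal sketch of assembling it, and the full ``execute_task`` argument list, from Python follows. The helper names ``build_config_string`` and ``build_execute_task_command`` are hypothetical; only the module name, the keys, and the values are taken from the commands shown above:

```python
import sys


def build_config_string(options):
    """Join (key, value) pairs into a 'key=value|key=value' string."""
    return "|".join("%s=%s" % (key, value) for key, value in options)


def build_execute_task_command(audio_path, text_path, options, output_path):
    """Return the argument list for the execute_task command line tool.

    The argument order (audio, text, configuration string, output)
    mirrors the shell commands shown in the README.
    """
    return [
        sys.executable,
        "-m",
        "aeneas.tools.execute_task",
        audio_path,
        text_path,
        build_config_string(options),
        output_path,
    ]


options = [
    ("task_language", "eng"),
    ("os_task_file_format", "json"),
    ("is_text_type", "plain"),
]
command = build_execute_task_command("audio.mp3", "text.txt", options, "map.json")
print(" ".join(command[1:]))
```

Passing ``command`` as a list to ``subprocess.run(command, check=True)`` avoids having to quote the ``|`` characters for the shell. This assumes **aeneas** is installed for the same interpreter as ``sys.executable``.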
|
||
4. If you have several tasks to process, you can create a job container | ||
and a configuration file, to process them all at once: | ||
4. If you have several tasks to process, you can create a **job | ||
container** to batch process them: | ||
|
||
.. code:: bash | ||
python -m aeneas.tools.execute_job job.zip output_directory | ||
File ``job.zip`` should contain a ``config.txt`` or ``config.xml`` | ||
configuration file, providing **aeneas** with all the information needed | ||
to parse the input assets and format the output sync map files. See the | ||
`documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details. | ||
to parse the input assets and format the output sync map files. Consult | ||
the `documentation <http://www.readbeyond.it/aeneas/docs/>`__ for | ||
details. | ||
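Since ``job.zip`` should contain a ``config.txt`` or ``config.xml`` file, a quick check before running ``execute_job`` can catch a malformed container early. The following is a minimal sketch using only the Python standard library. The function ``find_job_config`` is hypothetical, and accepting the configuration file at any directory depth is an assumption, not documented **aeneas** behavior:

```python
import io
import zipfile

CONFIG_NAMES = ("config.txt", "config.xml")


def find_job_config(zip_source):
    """Return the archive path of the first config file found, or None.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Compare only the last path component of each entry.
            if name.rsplit("/", 1)[-1] in CONFIG_NAMES:
                return name
    return None


def make_demo_zip(entry_names):
    """Build an in-memory ZIP with placeholder entries (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in entry_names:
            # Placeholder bytes, not a real aeneas job configuration.
            archive.writestr(name, b"placeholder")
    buffer.seek(0)
    return buffer


found = find_job_config(make_demo_zip(["config.txt", "assets/audio.mp3"]))
missing = find_job_config(make_demo_zip(["assets/audio.mp3"]))
print(found, missing)
```

The sketch only checks that a configuration file is present; it does not validate its contents, which are described in the documentation.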
|
||
The `documentation <http://www.readbeyond.it/aeneas/docs/>`__ provides | ||
an introduction to the concepts of | ||
```task`` <http://www.readbeyond.it/aeneas/docs/#tasks>`__ and | ||
```job`` <http://www.readbeyond.it/aeneas/docs/#job>`__, and it lists of | ||
all the options and tools available in the library. | ||
The `documentation <http://www.readbeyond.it/aeneas/docs/>`__ contains a | ||
highly suggested | ||
`tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__ | ||
which explains how to use the built-in command line tools. | ||
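The XHTML example above passes ``is_text_unparsed_id_regex=f[0-9]+`` and ``is_text_unparsed_id_sort=numeric``. The sketch below illustrates what such options express: select the ``id`` values matching the regex, then order them by their numeric part rather than lexicographically. It is an illustration only, not the **aeneas** implementation. Matching the whole ``id`` and ignoring non-digit characters when sorting are both assumptions:

```python
import re


def select_fragment_ids(ids, id_regex):
    """Keep only the ids that match id_regex in full (assumption)."""
    pattern = re.compile(id_regex)
    return [identifier for identifier in ids if pattern.fullmatch(identifier)]


def numeric_key(identifier):
    """Sort key: the integer formed by the digits of the id."""
    digits = re.sub(r"[^0-9]", "", identifier)
    return int(digits or 0)


ids = ["f10", "f2", "title", "f001", "note3"]
selected = select_fragment_ids(ids, r"f[0-9]+")
numeric_order = sorted(selected, key=numeric_key)
lexicographic_order = sorted(selected)
print(numeric_order)
print(lexicographic_order)
```

The difference between the two orderings shows why a numeric sort matters when ``id`` values are not zero-padded to the same width.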
|
||
Documentation and Support | ||
------------------------- | ||
|
||
Documentation: http://www.readbeyond.it/aeneas/docs/ | ||
|
||
High level description of how aeneas works: | ||
`HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__ | ||
|
||
Tutorial: `A Practical Introduction To The aeneas | ||
Package <http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html>`__ | ||
|
||
Mailing list: https://groups.google.com/d/forum/aeneas-forced-alignment | ||
|
||
Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html | ||
|
||
Development history: | ||
`HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__ | ||
- Documentation: http://www.readbeyond.it/aeneas/docs/ | ||
- Command line tools tutorial: | ||
http://www.readbeyond.it/aeneas/docs/clitutorial.html | ||
- Library tutorial: | ||
http://www.readbeyond.it/aeneas/docs/libtutorial.html | ||
- Old, verbose tutorial: `A Practical Introduction To The aeneas | ||
Package <http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html>`__ | ||
- Mailing list: | ||
https://groups.google.com/d/forum/aeneas-forced-alignment | ||
- Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html | ||
- High level description of how **aeneas** works: | ||
`HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__ | ||
- Development history: | ||
`HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__ | ||
|
||
Supported Features | ||
------------------ | ||
|
||
- Input text files in plain, parsed, subtitles, or unparsed format | ||
- Input text files in ``parsed``, ``plain``, ``subtitles``, or | ||
``unparsed`` (XML) format | ||
- Multilevel input text files in ``mplain`` and ``munparsed`` (XML) | ||
format | ||
- Text extraction from XML (e.g., XHTML) files using ``id`` and | ||
``class`` attributes | ||
- Arbitrary text fragment granularity (single word, subphrase, phrase, | ||
paragraph, etc.) | ||
- Input audio file formats: all those supported by ``ffmpeg`` | ||
- Possibility of downloading the audio file from a YouTube video | ||
- Batch processing | ||
- Output sync map formats: CSV, JSON, RBSE, SMIL, SSV, TSV, TTML, TXT, | ||
VTT, XML | ||
- Tested languages: BG, CA, CY, CS, DA, DE, EL, EN, EO, ES, ET, FA, FI, | ||
FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, | ||
SR, SV, SW, TR, UK | ||
- Input audio file formats: all those readable by ``ffmpeg`` | ||
- Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, | ||
TSV, TTML, TXT, VTT, XML | ||
- Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, | ||
EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD, | ||
NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR | ||
- MFCC and DTW computed via Python C extensions to reduce the | ||
processing time | ||
- On Linux, eSpeak called via a Python C extension for faster audio | ||
synthesis | ||
- Batch processing of multiple audio/text pairs | ||
- Several built-in TTS engine wrappers: eSpeak (default, FLOSS), | ||
Festival (FLOSS), Nuance TTS API (commercial) | ||
- Use custom TTS engine wrappers besides the built-in ones | ||
- Download audio from a YouTube video | ||
- In multilevel mode, recursive alignment from paragraph to sentence to | ||
word level | ||
- Robust against misspelled/mispronounced words, local rearrangements | ||
of words, background noise/sporadic spikes | ||
- Code suitable for a Web app deployment (e.g., on-demand AWS | ||
instances) | ||
- Adjustable splitting times, including a max character/second | ||
constraint for CC applications | ||
- Automated detection of audio head/tail | ||
- MFCC and DTW computed via Python C extensions to reduce the | ||
processing time | ||
- On Linux, ``espeak`` called via a Python C extension for faster audio | ||
synthesis | ||
- Output an HTML file (from ``finetuneas`` project) for fine tuning the | ||
sync map manually | ||
- Output an HTML file for fine tuning the sync map manually | ||
(``finetuneas`` project) | ||
- Execution parameters tunable at runtime | ||
- Code suitable for Web app deployment (e.g., on-demand cloud | ||
computing) | ||
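The feature list mentions a max character/second constraint for closed captioning. As a rough sketch of what that quantity measures, the following computes the reading rate of each fragment and flags those above a limit. The function names and the threshold of 15 characters per second are hypothetical, and this only checks a finished sync map; it does not show how **aeneas** adjusts splitting times:

```python
def characters_per_second(text, begin, end):
    """Return the number of characters shown per second of audio."""
    duration = end - begin
    if duration <= 0:
        raise ValueError("fragment must have a positive duration")
    return len(text) / duration


def fragments_over_limit(fragments, max_cps):
    """Return the (text, begin, end) fragments exceeding max_cps."""
    return [
        fragment
        for fragment in fragments
        if characters_per_second(*fragment) > max_cps
    ]


fragments = [
    ("short", 0.0, 5.0),
    ("a much longer caption line", 5.0, 6.0),
]
flagged = fragments_over_limit(fragments, max_cps=15)
for text, begin, end in flagged:
    print("too fast: %r (%.1f chars/s)" % (text, characters_per_second(text, begin, end)))
```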
|
||
Limitations and Missing Features | ||
-------------------------------- | ||
|
@@ -238,8 +260,6 @@ Limitations and Missing Features | |
- Audio is assumed to be spoken: not suitable/YMMV for song captioning | ||
- No protection against memory thrashing if you feed extremely long | ||
audio files | ||
- On Mac OS X and Windows, audio synthesis might be slow if you have | ||
thousands of text fragments | ||
- `Open issues <https://github.com/readbeyond/aeneas/issues>`__ | ||
|
||
License | ||
|
@@ -252,7 +272,7 @@ details. | |
|
||
Licenses for third party code and files included in **aeneas** can be | ||
found in the | ||
`licenses/ <https://github.com/readbeyond/aeneas/blob/master/licenses/README.md>`__ | ||
`licenses <https://github.com/readbeyond/aeneas/blob/master/licenses/README.md>`__ | ||
directory. | ||
|
||
No copy rights were harmed in the making of this project. | ||
|
@@ -278,6 +298,9 @@ Sponsors | |
- **October 2015**: an anonymous donation sponsored the development of | ||
the "YouTube downloader" option (v1.3.0) | ||
|
||
- **April 2016**: the Fruch Foundation kindly sponsored the development | ||
and documentation of v1.5.0 | ||
|
||
Supporting | ||
~~~~~~~~~~ | ||
|
||
|
@@ -337,6 +360,9 @@ asynchronous usage. | |
**Chris Hubbard** prepared the files for packaging aeneas as a | ||
Debian/Ubuntu ``.deb``. | ||
|
||
**Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine | ||
tuning sync maps in the browser. | ||
|
||
All the mighty `GitHub | ||
contributors <https://github.com/readbeyond/aeneas/graphs/contributors>`__, | ||
and the members of the `Google | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
1.4.1 | ||
1.5.0 |