Merge pull request #78 from pettarin/nextmajor
aeneas v1.5.0
readbeyond committed Apr 1, 2016
2 parents 5554379 + a50fce0 commit 2d7c18d
Showing 389 changed files with 20,014 additions and 9,675 deletions.
MANIFEST.in: 7 changes (7 additions, 0 deletions)
@@ -1,3 +1,10 @@
recursive-include aeneas/cdtw *
recursive-include aeneas/cew *
recursive-include aeneas/cint *
recursive-include aeneas/cmfcc *
recursive-include aeneas/cwave *
recursive-include aeneas/extra *
prune aeneas/extra/ctw_speect
recursive-include aeneas/res *
recursive-include aeneas/tools/res *
include aeneas_check_setup.py
README.md: 216 changes (131 additions, 85 deletions)

Large diffs are not rendered by default.

README.rst: 206 changes (116 additions, 90 deletions)
@@ -4,16 +4,18 @@ aeneas
**aeneas** is a Python/C library and a set of tools to automagically
synchronize audio and text (aka forced alignment).

- Version: 1.4.1
- Date: 2016-02-13
- Version: 1.5.0
- Date: 2016-04-02
- Developed by: `ReadBeyond <http://www.readbeyond.it/>`__
- Lead Developer: `Alberto Pettarin <http://www.albertopettarin.it/>`__
- License: the GNU Affero General Public License Version 3 (AGPL v3)
- Contact: [email protected]
- Quick Links: `Home <http://www.readbeyond.it/aeneas/>`__ -
`GitHub <https://github.com/readbeyond/aeneas/>`__ -
`PyPI <https://pypi.python.org/pypi/aeneas/>`__ - `API
Docs <http://www.readbeyond.it/aeneas/docs/>`__ - `Mailing
`PyPI <https://pypi.python.org/pypi/aeneas/>`__ -
`Docs <http://www.readbeyond.it/aeneas/docs/>`__ -
`Tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__
- `Mailing
List <https://groups.google.com/d/forum/aeneas-forced-alignment>`__ -
`Web App <http://aeneasweb.org>`__

@@ -34,25 +36,31 @@ interval in the audio file:

::

1 => [00:00:00.000, 00:00:02.680]
From fairest creatures we desire increase, => [00:00:02.680, 00:00:05.480]
That thereby beauty's rose might never die, => [00:00:05.480, 00:00:08.640]
But as the riper should by time decease, => [00:00:08.640, 00:00:11.960]
His tender heir might bear his memory: => [00:00:11.960, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.520]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.520, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.720]
Thy self thy foe, to thy sweet self too cruel: => [00:00:25.720, 00:00:31.240]
Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.280]
And only herald to the gaudy spring, => [00:00:34.280, 00:00:36.960]
Within thine own bud buriest thy content, => [00:00:36.960, 00:00:40.640]
And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.600]
Pity the world, or else this glutton be, => [00:00:43.600, 00:00:48.000]
To eat the world's due, by the grave and thee. => [00:00:48.000, 00:00:53.280]

This synchronization map can be output to file in several formats: SMIL
for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed captioning, JSON/RBSE for
Web usage, or raw CSV/SSV/TSV/TXT/XML for further processing.
1 => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]

.. figure:: wiki/align.png
:alt: Waveform with aligned labels, detail

Waveform with aligned labels, detail

This synchronization map can be output to file in several formats: EAF
for research purposes, SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed
captioning, JSON for Web usage, or raw AUD/CSV/SSV/TSV/TXT/XML for
further processing.

System Requirements, Supported Platforms and Installation
---------------------------------------------------------
@@ -66,20 +74,17 @@ System Requirements
3. `FFmpeg <https://www.ffmpeg.org/>`__
4. `eSpeak <http://espeak.sourceforge.net/>`__
5. Python modules ``BeautifulSoup4``, ``lxml``, and ``numpy``
6. Python C headers to compile the Python C extensions (Optional but
6. Python C headers to compile the Python C extensions (optional but
strongly recommended)
7. A shell supporting UTF-8 (Optional but strongly recommended)
8. Python module ``pafy`` (Optional, only required if you want to
download audio from YouTube)
7. A shell supporting UTF-8 (optional but strongly recommended)

Supported Platforms
~~~~~~~~~~~~~~~~~~~

**aeneas** has been developed and tested on **Debian 64bit**, which is
the **only supported OS** at the moment.

However, **aeneas** has been confirmed to work on other Linux
distributions, OS X, and Windows. See the `PLATFORMS
the **only supported OS** at the moment. Nevertheless, **aeneas** has
been confirmed to work on other Linux distributions, OS X, and Windows.
See the `PLATFORMS
file <https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md>`__
for the details.

@@ -115,37 +120,45 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows.
Usage
-----

1. To check that you installed ``aeneas`` correctly, run:
1. To **check** whether you installed **aeneas** correctly, run:

``bash python -m aeneas.diagnostics``

2. Run ``execute_task`` or ``execute_job`` with ``-h`` (resp.,
``--help``) to get a short (resp., long) usage message:
2. Run without arguments to get the **usage message**:

.. code:: bash
python -m aeneas.tools.execute_task -h
python -m aeneas.tools.execute_job -h
python -m aeneas.tools.execute_task
python -m aeneas.tools.execute_job
You can also get a list of **live examples** that you can immediately
run on your machine thanks to the included files:

The above commands also print a list of live usage examples that you
can immediately run on your machine, thanks to the included example
files.
.. code:: bash
3. To compute a synchronization map ``map.json`` for a pair
python -m aeneas.tools.execute_task --examples
python -m aeneas.tools.execute_task --examples-all
3. To **compute a synchronization map** ``map.json`` for a pair
(``audio.mp3``, ``text.txt`` in
```plain`` <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__
`plain <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__
text format), you can run:

.. code:: bash
python -m aeneas.tools.execute_task \
audio.mp3 \
text.txt \
"task_language=en|os_task_file_format=json|is_text_type=plain" \
"task_language=eng|os_task_file_format=json|is_text_type=plain" \
map.json
To compute a synchronization map ``map.smil`` for a pair (``audio.mp3``,
```page.xhtml`` <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED>`__
(The command has been split into lines with ``\`` for visual clarity; in
production you can have the entire command on a single line and/or you
can use shell variables.)

To **compute a synchronization map** ``map.smil`` for a pair
(``audio.mp3``,
`page.xhtml <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED>`__
containing fragments marked by ``id`` attributes like ``f001``), you can
run:

@@ -155,80 +168,89 @@ run:
python -m aeneas.tools.execute_task \
audio.mp3 \
page.xhtml \
"task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
"task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
map.smil
```

The third parameter (the *configuration string*) can specify several
other parameters/options. See the
As you can see, the third argument (the *configuration string*)
specifies the parameters controlling the I/O formats and the processing
options for the task. Consult the
`documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details.

4. If you have several tasks to process, you can create a job container
and a configuration file, to process them all at once:
4. If you have several tasks to process, you can create a **job
container** to batch process them:

.. code:: bash
python -m aeneas.tools.execute_job job.zip output_directory
File ``job.zip`` should contain a ``config.txt`` or ``config.xml``
configuration file, providing **aeneas** with all the information needed
to parse the input assets and format the output sync map files. See the
`documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details.
to parse the input assets and format the output sync map files. Consult
the `documentation <http://www.readbeyond.it/aeneas/docs/>`__ for
details.

The `documentation <http://www.readbeyond.it/aeneas/docs/>`__ provides
an introduction to the concepts of
```task`` <http://www.readbeyond.it/aeneas/docs/#tasks>`__ and
```job`` <http://www.readbeyond.it/aeneas/docs/#job>`__, and it lists of
all the options and tools available in the library.
The `documentation <http://www.readbeyond.it/aeneas/docs/>`__ contains a
highly suggested
`tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__
which explains how to use the built-in command line tools.

Documentation and Support
-------------------------

Documentation: http://www.readbeyond.it/aeneas/docs/

High level description of how aeneas works:
`HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__

Tutorial: `A Practical Introduction To The aeneas
Package <http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html>`__

Mailing list: https://groups.google.com/d/forum/aeneas-forced-alignment

Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html

Development history:
`HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__
- Documentation: http://www.readbeyond.it/aeneas/docs/
- Command line tools tutorial:
http://www.readbeyond.it/aeneas/docs/clitutorial.html
- Library tutorial:
http://www.readbeyond.it/aeneas/docs/libtutorial.html
- Old, verbose tutorial: `A Practical Introduction To The aeneas
Package <http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html>`__
- Mailing list:
https://groups.google.com/d/forum/aeneas-forced-alignment
- Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
- High level description of how **aeneas** works:
`HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__
- Development history:
`HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__

Supported Features
------------------

- Input text files in plain, parsed, subtitles, or unparsed format
- Input text files in ``parsed``, ``plain``, ``subtitles``, or
``unparsed`` (XML) format
- Multilevel input text files in ``mplain`` and ``munparsed`` (XML)
format
- Text extraction from XML (e.g., XHTML) files using ``id`` and
``class`` attributes
- Arbitrary text fragment granularity (single word, subphrase, phrase,
paragraph, etc.)
- Input audio file formats: all those supported by ``ffmpeg``
- Possibility of downloading the audio file from a YouTube video
- Batch processing
- Output sync map formats: CSV, JSON, RBSE, SMIL, SSV, TSV, TTML, TXT,
VTT, XML
- Tested languages: BG, CA, CY, CS, DA, DE, EL, EN, EO, ES, ET, FA, FI,
FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK,
SR, SV, SW, TR, UK
- Input audio file formats: all those readable by ``ffmpeg``
- Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB,
TSV, TTML, TXT, VTT, XML
- Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO,
EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD,
NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
- MFCC and DTW computed via Python C extensions to reduce the
processing time
- On Linux, eSpeak called via a Python C extension for faster audio
synthesis
- Batch processing of multiple audio/text pairs
- Several built-in TTS engine wrappers: eSpeak (default, FLOSS),
Festival (FLOSS), Nuance TTS API (commercial)
- Use custom TTS engine wrappers besides the built-in ones
- Download audio from a YouTube video
- In multilevel mode, recursive alignment from paragraph to sentence to
word level
- Robust against misspelled/mispronounced words, local rearrangements
of words, background noise/sporadic spikes
- Code suitable for a Web app deployment (e.g., on-demand AWS
instances)
- Adjustable splitting times, including a max character/second
constraint for CC applications
- Automated detection of audio head/tail
- MFCC and DTW computed via Python C extensions to reduce the
processing time
- On Linux, ``espeak`` called via a Python C extension for faster audio
synthesis
- Output an HTML file (from ``finetuneas`` project) for fine tuning the
sync map manually
- Output an HTML file for fine tuning the sync map manually
(``finetuneas`` project)
- Execution parameters tunable at runtime
- Code suitable for Web app deployment (e.g., on-demand cloud
computing)

Limitations and Missing Features
--------------------------------
@@ -238,8 +260,6 @@ Limitations and Missing Features
- Audio is assumed to be spoken: not suitable/YMMV for song captioning
- No protection against memory trashing if you feed extremely long
audio files
- On Mac OS X and Windows, audio synthesis might be slow if you have
thousands of text fragments
- `Open issues <https://github.com/readbeyond/aeneas/issues>`__

License
@@ -252,7 +272,7 @@ details.

Licenses for third party code and files included in **aeneas** can be
found in the
`licenses/ <https://github.com/readbeyond/aeneas/blob/master/licenses/README.md>`__
`licenses <https://github.com/readbeyond/aeneas/blob/master/licenses/README.md>`__
directory.

No copy rights were harmed in the making of this project.
@@ -278,6 +298,9 @@ Sponsors
- **October 2015**: an anonymous donation sponsored the development of
the "YouTube downloader" option (v1.3.0)

- **April 2016**: the Fruch Foundation kindly sponsored the development
and documentation of v1.5.0

Supporting
~~~~~~~~~~

@@ -337,6 +360,9 @@ asynchronous usage.
**Chris Hubbard** prepared the files for packaging aeneas as a
Debian/Ubuntu ``.deb``.

**Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine
tuning sync maps in the browser.

All the mighty `GitHub
contributors <https://github.com/readbeyond/aeneas/graphs/contributors>`__,
and the members of the `Google
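The `execute_task` examples in the README.rst diff pass a single *configuration string*. Its parameters are `key=value` pairs joined by `|`, for example `task_language=eng|os_task_file_format=json|is_text_type=plain`. As a reading aid only, here is a minimal Python sketch that splits such a string into a dictionary. `parse_config_string` is a hypothetical helper and is not part of the aeneas API. It assumes that values never contain `|`. Each pair is split on its first `=`, so a value may itself contain `=`.

```python
def parse_config_string(config):
    """Split an aeneas-style configuration string into a dict.

    Hypothetical helper for illustration only (not the aeneas API).
    Assumes pairs are separated by "|" and each pair is "key=value";
    only the first "=" separates key from value.
    """
    params = {}
    for pair in config.split("|"):
        pair = pair.strip()
        if not pair:
            # Tolerate empty segments, e.g. a trailing "|"
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("malformed pair (no '='): %r" % pair)
        params[key.strip()] = value.strip()
    return params


if __name__ == "__main__":
    cfg = "task_language=eng|os_task_file_format=json|is_text_type=plain"
    # On Python 3.7+ prints:
    # {'task_language': 'eng', 'os_task_file_format': 'json', 'is_text_type': 'plain'}
    print(parse_config_string(cfg))
```

aeneas does its own parsing of these strings. The documentation linked in the README remains the authoritative list of valid keys and values.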
VERSION: 2 changes (1 addition, 1 deletion)
@@ -1 +1 @@
1.4.1
1.5.0
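The updated README lists SRT among the supported output sync map formats. Its Sonnet example maps each text fragment to a `[begin, end]` interval. Here is a minimal Python sketch showing how such an interval list could be rendered as SRT cues. The names `to_srt` and `format_srt_time` and the `(begin_seconds, end_seconds, text)` tuple layout are hypothetical choices for illustration. They do not reflect how aeneas stores or writes sync maps.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    # Round (not truncate) to whole milliseconds first
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3600 * 1000)
    minutes, millis = divmod(millis, 60 * 1000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def to_srt(fragments):
    """Render (begin, end, text) tuples, times in seconds, as SRT text."""
    cues = []
    for index, (begin, end, text) in enumerate(fragments, start=1):
        # Each cue: 1-based index, time range line, text line
        cues.append("%d\n%s --> %s\n%s\n" % (
            index, format_srt_time(begin), format_srt_time(end), text))
    # A blank line separates consecutive cues
    return "\n".join(cues)


if __name__ == "__main__":
    # First two fragments of the v1.5.0 README example, in seconds
    fragments = [
        (0.000, 2.640, "1"),
        (2.640, 5.880, "From fairest creatures we desire increase,"),
    ]
    print(to_srt(fragments))
```

SRT puts a comma, not a period, before the milliseconds. Times are rounded to whole milliseconds before being split into fields, so an input such as `2.6399999` still maps to `00:00:02,640`.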