diff --git a/MANIFEST.in b/MANIFEST.in
index ef99fc9b..94d3d07c 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,3 +1,10 @@
+recursive-include aeneas/cdtw *
+recursive-include aeneas/cew *
+recursive-include aeneas/cint *
+recursive-include aeneas/cmfcc *
+recursive-include aeneas/cwave *
+recursive-include aeneas/extra *
+prune aeneas/extra/ctw_speect
recursive-include aeneas/res *
recursive-include aeneas/tools/res *
include aeneas_check_setup.py
diff --git a/README.md b/README.md
index 39fc7756..5e78ff43 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,13 @@
**aeneas** is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).
-* Version: 1.4.1
-* Date: 2016-02-13
+* Version: 1.5.0
+* Date: 2016-04-02
* Developed by: [ReadBeyond](http://www.readbeyond.it/)
* Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/)
* License: the GNU Affero General Public License Version 3 (AGPL v3)
* Contact: [aeneas@readbeyond.it](mailto:aeneas@readbeyond.it)
-* Quick Links: [Home](http://www.readbeyond.it/aeneas/) - [GitHub](https://github.com/readbeyond/aeneas/) - [PyPI](https://pypi.python.org/pypi/aeneas/) - [API Docs](http://www.readbeyond.it/aeneas/docs/) - [Mailing List](https://groups.google.com/d/forum/aeneas-forced-alignment) - [Web App](http://aeneasweb.org)
+* Quick Links: [Home](http://www.readbeyond.it/aeneas/) - [GitHub](https://github.com/readbeyond/aeneas/) - [PyPI](https://pypi.python.org/pypi/aeneas/) - [Docs](http://www.readbeyond.it/aeneas/docs/) - [Tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html) - [Mailing List](https://groups.google.com/d/forum/aeneas-forced-alignment) - [Web App](http://aeneasweb.org)
## Goal
@@ -19,32 +19,38 @@ and an audio file containing the narration of the text.
In computer science this task is known as
(automatically computing a) **forced alignment**.
-For example, given [this text file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml)
-and [this audio file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3),
+For example, given
+[this text file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml)
+and
+[this audio file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3),
**aeneas** determines, for each fragment, the corresponding time interval in the audio file:
```
-1 => [00:00:00.000, 00:00:02.680]
-From fairest creatures we desire increase, => [00:00:02.680, 00:00:05.480]
-That thereby beauty's rose might never die, => [00:00:05.480, 00:00:08.640]
-But as the riper should by time decease, => [00:00:08.640, 00:00:11.960]
-His tender heir might bear his memory: => [00:00:11.960, 00:00:15.280]
-But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.520]
-Feed'st thy light's flame with self-substantial fuel, => [00:00:18.520, 00:00:22.760]
-Making a famine where abundance lies, => [00:00:22.760, 00:00:25.720]
-Thy self thy foe, to thy sweet self too cruel: => [00:00:25.720, 00:00:31.240]
-Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.280]
-And only herald to the gaudy spring, => [00:00:34.280, 00:00:36.960]
-Within thine own bud buriest thy content, => [00:00:36.960, 00:00:40.640]
-And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.600]
-Pity the world, or else this glutton be, => [00:00:43.600, 00:00:48.000]
-To eat the world's due, by the grave and thee. => [00:00:48.000, 00:00:53.280]
+1 => [00:00:00.000, 00:00:02.640]
+From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
+That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
+But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
+His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
+But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
+Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
+Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
+Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240]
+Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400]
+And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920]
+Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640]
+And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640]
+Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080]
+To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]
```
+![Waveform with aligned labels, detail](wiki/align.png)
+
This synchronization map can be output to file in several formats:
-SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed captioning,
-JSON/RBSE for Web usage,
-or raw CSV/SSV/TSV/TXT/XML for further processing.
+EAF for research purposes,
+SMIL for EPUB 3,
+SBV/SRT/SUB/TTML/VTT for closed captioning,
+JSON for Web usage,
+or raw AUD/CSV/SSV/TSV/TXT/XML for further processing.
## System Requirements, Supported Platforms and Installation
@@ -56,30 +62,33 @@ or raw CSV/SSV/TSV/TXT/XML for further processing.
3. [FFmpeg](https://www.ffmpeg.org/)
4. [eSpeak](http://espeak.sourceforge.net/)
5. Python modules `BeautifulSoup4`, `lxml`, and `numpy`
-6. Python C headers to compile the Python C extensions (Optional but strongly recommended)
-7. A shell supporting UTF-8 (Optional but strongly recommended)
-8. Python module `pafy` (Optional, only required if you want to download audio from YouTube)
+6. Python C headers to compile the Python C extensions (optional but strongly recommended)
+7. A shell supporting UTF-8 (optional but strongly recommended)
### Supported Platforms
**aeneas** has been developed and tested on **Debian 64bit**,
which is the **only supported OS** at the moment.
-
-However, **aeneas** has been confirmed to work on
+Nevertheless, **aeneas** has been confirmed to work on
other Linux distributions, OS X, and Windows.
-See the [PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md) for the details.
+See the
+[PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md)
+for the details.
If installing **aeneas** natively on your OS proves difficult,
you are strongly encouraged to use
[aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant),
which provides **aeneas** inside a virtualized Debian image
-running under [VirtualBox](https://www.virtualbox.org/)
-and [Vagrant](http://www.vagrantup.com/), which can be installed
-on any modern OS (Linux, Mac OS X, Windows).
+running under
+[VirtualBox](https://www.virtualbox.org/)
+and
+[Vagrant](http://www.vagrantup.com/),
+which can be installed on any modern OS (Linux, Mac OS X, Windows).
### Installation
-1. Install [Python](https://python.org/) (2.7.x preferred),
+1. Install
+ [Python](https://python.org/) (2.7.x preferred),
[FFmpeg](https://www.ffmpeg.org/), and
[eSpeak](http://espeak.sourceforge.net/)
@@ -93,59 +102,76 @@ on any modern OS (Linux, Mac OS X, Windows).
pip install aeneas
```
-See the [INSTALL file](https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md)
+See the
+[INSTALL file](https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md)
for detailed, step-by-step procedures for Linux, OS X, and Windows.
## Usage
-1. To check that you installed `aeneas` correctly, run:
+1. To **check** whether you installed **aeneas** correctly, run:
```bash
python -m aeneas.diagnostics
```
-2. Run `execute_task` or `execute_job`
- with `-h` (resp., `--help`) to get a short (resp., long) usage message:
+2. Run without arguments to get the **usage message**:
```bash
- python -m aeneas.tools.execute_task -h
- python -m aeneas.tools.execute_job -h
+ python -m aeneas.tools.execute_task
+ python -m aeneas.tools.execute_job
```
- The above commands also print a list of live usage examples
- that you can immediately run on your machine,
- thanks to the included example files.
+ You can also get a list of **live examples**
+ that you can immediately run on your machine
+ thanks to the included files:
+
+ ```bash
+ python -m aeneas.tools.execute_task --examples
+ python -m aeneas.tools.execute_task --examples-all
+ ```
-3. To compute a synchronization map `map.json` for a pair
- (`audio.mp3`, `text.txt` in [`plain`](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN) text format), you can run:
+3. To **compute a synchronization map** `map.json` for a pair
+ (`audio.mp3`, `text.txt` in
+ [plain](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN)
+ text format), you can run:
```bash
python -m aeneas.tools.execute_task \
audio.mp3 \
text.txt \
- "task_language=en|os_task_file_format=json|is_text_type=plain" \
+ "task_language=eng|os_task_file_format=json|is_text_type=plain" \
map.json
```
- To compute a synchronization map `map.smil` for a pair
- (`audio.mp3`, [`page.xhtml`](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED) containing fragments marked by `id` attributes like `f001`),
+ (The command has been split into lines with `\` for visual clarity;
+ in production you can have the entire command on a single line
+ and/or you can use shell variables.)
+
+ To **compute a synchronization map** `map.smil` for a pair
+ (`audio.mp3`,
+ [page.xhtml](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED)
+ containing fragments marked by `id` attributes like `f001`),
you can run:
```bash
python -m aeneas.tools.execute_task \
audio.mp3 \
page.xhtml \
- "task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
+ "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
map.smil
```
- The third parameter (the _configuration string_) can specify several other parameters/options.
- See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.
+ As you can see, the third argument (the _configuration string_)
+ specifies the parameters controlling the I/O formats
+ and the processing options for the task.
+ Consult the
+ [documentation](http://www.readbeyond.it/aeneas/docs/)
+ for details.
4. If you have several tasks to process,
- you can create a job container and a configuration file,
- to process them all at once:
+ you can create a **job container**
+ to batch process them:
```bash
python -m aeneas.tools.execute_job job.zip output_directory
@@ -155,48 +181,59 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows.
configuration file, providing **aeneas**
with all the information needed to parse the input assets
and format the output sync map files.
- See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.
+ Consult the
+ [documentation](http://www.readbeyond.it/aeneas/docs/)
+ for details.
-The [documentation](http://www.readbeyond.it/aeneas/docs/)
-provides an introduction to the concepts of
-[`task`](http://www.readbeyond.it/aeneas/docs/#tasks) and
-[`job`](http://www.readbeyond.it/aeneas/docs/#job),
-and it lists of all the options and tools available in the library.
+The
+[documentation](http://www.readbeyond.it/aeneas/docs/)
+contains a highly recommended
+[tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html)
+which explains how to use the built-in command line tools.
## Documentation and Support
-Documentation: [http://www.readbeyond.it/aeneas/docs/](http://www.readbeyond.it/aeneas/docs/)
-
-High level description of how aeneas works: [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md)
-
-Tutorial: [A Practical Introduction To The aeneas Package](http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html)
-
-Mailing list: [https://groups.google.com/d/forum/aeneas-forced-alignment](https://groups.google.com/d/forum/aeneas-forced-alignment)
-
-Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.readbeyond.it/aeneas/docs/changelog.html)
-
-Development history: [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md)
+* Documentation:
+ [http://www.readbeyond.it/aeneas/docs/](http://www.readbeyond.it/aeneas/docs/)
+* Command line tools tutorial:
+ [http://www.readbeyond.it/aeneas/docs/clitutorial.html](http://www.readbeyond.it/aeneas/docs/clitutorial.html)
+* Library tutorial:
+ [http://www.readbeyond.it/aeneas/docs/libtutorial.html](http://www.readbeyond.it/aeneas/docs/libtutorial.html)
+* Old, verbose tutorial:
+ [A Practical Introduction To The aeneas Package](http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html)
+* Mailing list:
+ [https://groups.google.com/d/forum/aeneas-forced-alignment](https://groups.google.com/d/forum/aeneas-forced-alignment)
+* Changelog:
+ [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.readbeyond.it/aeneas/docs/changelog.html)
+* High level description of how **aeneas** works:
+ [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md)
+* Development history:
+ [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md)
## Supported Features
-* Input text files in plain, parsed, subtitles, or unparsed format
+* Input text files in `parsed`, `plain`, `subtitles`, or `unparsed` (XML) format
+* Multilevel input text files in `mplain` and `munparsed` (XML) format
* Text extraction from XML (e.g., XHTML) files using `id` and `class` attributes
* Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
-* Input audio file formats: all those supported by `ffmpeg`
-* Possibility of downloading the audio file from a YouTube video
-* Batch processing
-* Output sync map formats: CSV, JSON, RBSE, SMIL, SSV, TSV, TTML, TXT, VTT, XML
-* Tested languages: BG, CA, CY, CS, DA, DE, EL, EN, EO, ES, ET, FA, FI, FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, SR, SV, SW, TR, UK
+* Input audio file formats: all those readable by `ffmpeg`
+* Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TSV, TTML, TXT, VTT, XML
+* Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
+* MFCC and DTW computed via Python C extensions to reduce the processing time
+* On Linux, eSpeak called via a Python C extension for faster audio synthesis
+* Batch processing of multiple audio/text pairs
+* Several built-in TTS engine wrappers: eSpeak (default, FLOSS), Festival (FLOSS), Nuance TTS API (commercial)
+* Use custom TTS engine wrappers besides the built-in ones
+* Download audio from a YouTube video
+* In multilevel mode, recursive alignment from paragraph to sentence to word level
* Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
-* Code suitable for a Web app deployment (e.g., on-demand AWS instances)
* Adjustable splitting times, including a max character/second constraint for CC applications
* Automated detection of audio head/tail
-* MFCC and DTW computed via Python C extensions to reduce the processing time
-* On Linux, `espeak` called via a Python C extension for faster audio synthesis
-* Output an HTML file (from `finetuneas` project) for fine tuning the sync map manually
+* Output an HTML file for fine tuning the sync map manually (`finetuneas` project)
* Execution parameters tunable at runtime
+* Code suitable for Web app deployment (e.g., on-demand cloud computing)
## Limitations and Missing Features
@@ -204,7 +241,6 @@ Development history: [HISTORY](https://github.com/readbeyond/aeneas/blob/master/
* Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
* Audio is assumed to be spoken: not suitable/YMMV for song captioning
* No protection against memory trashing if you feed extremely long audio files
-* On Mac OS X and Windows, audio synthesis might be slow if you have thousands of text fragments
* [Open issues](https://github.com/readbeyond/aeneas/issues)
@@ -212,10 +248,12 @@ Development history: [HISTORY](https://github.com/readbeyond/aeneas/blob/master/
**aeneas** is released under the terms of the
GNU Affero General Public License Version 3.
-See the [LICENSE file](https://github.com/readbeyond/aeneas/blob/master/LICENSE) for details.
+See the
+[LICENSE file](https://github.com/readbeyond/aeneas/blob/master/LICENSE) for details.
Licenses for third party code and files included in **aeneas**
-can be found in the [licenses/](https://github.com/readbeyond/aeneas/blob/master/licenses/README.md) directory.
+can be found in the
+[licenses](https://github.com/readbeyond/aeneas/blob/master/licenses/README.md) directory.
No copy rights were harmed in the making of this project.
@@ -232,6 +270,8 @@ No copy rights were harmed in the making of this project.
* **October 2015**: an anonymous donation sponsored the development of the "YouTube downloader" option (v1.3.0)
+* **April 2016**: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0
+
### Supporting
Would you like supporting the development of **aeneas**?
@@ -245,7 +285,8 @@ I accept sponsorships to
* support of third party installations, and
* improve the documentation.
-Feel free to [get in touch](mailto:aeneas@readbeyond.it).
+Feel free to
+[get in touch](mailto:aeneas@readbeyond.it).
### Contributing
@@ -297,8 +338,13 @@ for its asynchronous usage.
**Chris Hubbard** prepared the files for
packaging aeneas as a Debian/Ubuntu `.deb`.
-All the mighty [GitHub contributors](https://github.com/readbeyond/aeneas/graphs/contributors),
-and the members of the [Google Group](https://groups.google.com/d/forum/aeneas-forced-alignment).
+**Firat Ozdemir** contributed the `finetuneas`
+HTML/JS code for fine tuning sync maps in the browser.
+
+All the mighty
+[GitHub contributors](https://github.com/readbeyond/aeneas/graphs/contributors),
+and the members of the
+[Google Group](https://groups.google.com/d/forum/aeneas-forced-alignment).
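The updated README describes the third `execute_task` argument as a configuration string of `|`-separated `key=value` pairs. In the SMIL example, `is_text_unparsed_id_regex=f[0-9]+` selects fragment ids and `is_text_unparsed_id_sort=numeric` orders them. The sketch below illustrates both ideas. It is not aeneas code: `parse_config_string`, `select_ids` and `_numeric_key` are hypothetical helpers. The full-match semantics of the regex and the `lexicographic` mode name are assumptions. The point it shows is why numeric ordering matters: `f2` should come before `f10`.

```python
import re


def parse_config_string(config_string):
    """Split a 'key=value|key=value' string into a dict (hypothetical helper)."""
    params = {}
    for pair in config_string.split("|"):
        if not pair:
            continue  # tolerate stray separators such as "a=1||b=2"
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("malformed pair: %r" % pair)
        params[key] = value
    return params


def _numeric_key(fragment_id):
    # Use the first run of digits, so "f2" sorts before "f10".
    match = re.search(r"[0-9]+", fragment_id)
    return int(match.group()) if match else -1


def select_ids(ids, id_regex, sort_mode):
    """Keep ids that fully match id_regex, then order them per sort_mode."""
    pattern = re.compile(id_regex)
    matching = [i for i in ids if pattern.fullmatch(i)]
    if sort_mode == "numeric":
        return sorted(matching, key=_numeric_key)
    if sort_mode == "lexicographic":
        return sorted(matching)
    return matching  # any other mode: keep document order


config = parse_config_string(
    "task_language=eng|os_task_file_format=smil"
    "|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml"
    "|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+"
    "|is_text_unparsed_id_sort=numeric"
)
ids_in_page_order = ["f10", "f2", "title", "f1"]
print(select_ids(ids_in_page_order,
                 config["is_text_unparsed_id_regex"],
                 config["is_text_unparsed_id_sort"]))  # ['f1', 'f2', 'f10']
```

With plain lexicographic sorting the same ids would come out as `f1`, `f10`, `f2`, which would scramble the reading order of the page.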
diff --git a/README.rst b/README.rst
index 629ab29c..dec6c93f 100644
--- a/README.rst
+++ b/README.rst
@@ -4,16 +4,18 @@ aeneas
**aeneas** is a Python/C library and a set of tools to automagically
synchronize audio and text (aka forced alignment).
-- Version: 1.4.1
-- Date: 2016-02-13
+- Version: 1.5.0
+- Date: 2016-04-02
- Developed by: `ReadBeyond `__
- Lead Developer: `Alberto Pettarin `__
- License: the GNU Affero General Public License Version 3 (AGPL v3)
- Contact: aeneas@readbeyond.it
- Quick Links: `Home `__ -
`GitHub `__ -
- `PyPI `__ - `API
- Docs `__ - `Mailing
+ `PyPI `__ -
+ `Docs `__ -
+ `Tutorial `__
+ - `Mailing
List `__ -
`Web App `__
@@ -34,25 +36,31 @@ interval in the audio file:
::
- 1 => [00:00:00.000, 00:00:02.680]
- From fairest creatures we desire increase, => [00:00:02.680, 00:00:05.480]
- That thereby beauty's rose might never die, => [00:00:05.480, 00:00:08.640]
- But as the riper should by time decease, => [00:00:08.640, 00:00:11.960]
- His tender heir might bear his memory: => [00:00:11.960, 00:00:15.280]
- But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.520]
- Feed'st thy light's flame with self-substantial fuel, => [00:00:18.520, 00:00:22.760]
- Making a famine where abundance lies, => [00:00:22.760, 00:00:25.720]
- Thy self thy foe, to thy sweet self too cruel: => [00:00:25.720, 00:00:31.240]
- Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.280]
- And only herald to the gaudy spring, => [00:00:34.280, 00:00:36.960]
- Within thine own bud buriest thy content, => [00:00:36.960, 00:00:40.640]
- And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.600]
- Pity the world, or else this glutton be, => [00:00:43.600, 00:00:48.000]
- To eat the world's due, by the grave and thee. => [00:00:48.000, 00:00:53.280]
-
-This synchronization map can be output to file in several formats: SMIL
-for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed captioning, JSON/RBSE for
-Web usage, or raw CSV/SSV/TSV/TXT/XML for further processing.
+ 1 => [00:00:00.000, 00:00:02.640]
+ From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
+ That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
+ But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
+ His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
+ But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
+ Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
+ Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
+ Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240]
+ Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400]
+ And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920]
+ Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640]
+ And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640]
+ Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080]
+ To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]
+
+.. figure:: wiki/align.png
+ :alt: Waveform with aligned labels, detail
+
+ Waveform with aligned labels, detail
+
+This synchronization map can be output to file in several formats: EAF
+for research purposes, SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed
+captioning, JSON for Web usage, or raw AUD/CSV/SSV/TSV/TXT/XML for
+further processing.
System Requirements, Supported Platforms and Installation
---------------------------------------------------------
@@ -66,20 +74,17 @@ System Requirements
3. `FFmpeg `__
4. `eSpeak `__
5. Python modules ``BeautifulSoup4``, ``lxml``, and ``numpy``
-6. Python C headers to compile the Python C extensions (Optional but
+6. Python C headers to compile the Python C extensions (optional but
strongly recommended)
-7. A shell supporting UTF-8 (Optional but strongly recommended)
-8. Python module ``pafy`` (Optional, only required if you want to
- download audio from YouTube)
+7. A shell supporting UTF-8 (optional but strongly recommended)
Supported Platforms
~~~~~~~~~~~~~~~~~~~
**aeneas** has been developed and tested on **Debian 64bit**, which is
-the **only supported OS** at the moment.
-
-However, **aeneas** has been confirmed to work on other Linux
-distributions, OS X, and Windows. See the `PLATFORMS
+the **only supported OS** at the moment. Nevertheless, **aeneas** has
+been confirmed to work on other Linux distributions, OS X, and Windows.
+See the `PLATFORMS
file `__
for the details.
@@ -115,25 +120,28 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows.
Usage
-----
-1. To check that you installed ``aeneas`` correctly, run:
+1. To **check** whether you installed **aeneas** correctly, run:
``bash python -m aeneas.diagnostics``
-2. Run ``execute_task`` or ``execute_job`` with ``-h`` (resp.,
- ``--help``) to get a short (resp., long) usage message:
+2. Run without arguments to get the **usage message**:
.. code:: bash
- python -m aeneas.tools.execute_task -h
- python -m aeneas.tools.execute_job -h
+ python -m aeneas.tools.execute_task
+ python -m aeneas.tools.execute_job
+
+ You can also get a list of **live examples** that you can immediately
+ run on your machine thanks to the included files:
- The above commands also print a list of live usage examples that you
- can immediately run on your machine, thanks to the included example
- files.
+ .. code:: bash
-3. To compute a synchronization map ``map.json`` for a pair
+ python -m aeneas.tools.execute_task --examples
+ python -m aeneas.tools.execute_task --examples-all
+
+3. To **compute a synchronization map** ``map.json`` for a pair
(``audio.mp3``, ``text.txt`` in
- ```plain`` `__
+ `plain `__
text format), you can run:
.. code:: bash
@@ -141,11 +149,16 @@ Usage
python -m aeneas.tools.execute_task \
audio.mp3 \
text.txt \
- "task_language=en|os_task_file_format=json|is_text_type=plain" \
+ "task_language=eng|os_task_file_format=json|is_text_type=plain" \
map.json
-To compute a synchronization map ``map.smil`` for a pair (``audio.mp3``,
-```page.xhtml`` `__
+(The command has been split into lines with ``\`` for visual clarity; in
+production you can have the entire command on a single line and/or you
+can use shell variables.)
+
+To **compute a synchronization map** ``map.smil`` for a pair
+(``audio.mp3``,
+`page.xhtml `__
containing fragments marked by ``id`` attributes like ``f001``), you can
run:
@@ -155,16 +168,17 @@ run:
python -m aeneas.tools.execute_task \
audio.mp3 \
page.xhtml \
- "task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
+ "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
map.smil
```
-The third parameter (the *configuration string*) can specify several
-other parameters/options. See the
+As you can see, the third argument (the *configuration string*)
+specifies the parameters controlling the I/O formats and the processing
+options for the task. Consult the
`documentation `__ for details.
-4. If you have several tasks to process, you can create a job container
- and a configuration file, to process them all at once:
+4. If you have several tasks to process, you can create a **job
+ container** to batch process them:
.. code:: bash
@@ -172,63 +186,71 @@ other parameters/options. See the
File ``job.zip`` should contain a ``config.txt`` or ``config.xml``
configuration file, providing **aeneas** with all the information needed
-to parse the input assets and format the output sync map files. See the
-`documentation `__ for details.
+to parse the input assets and format the output sync map files. Consult
+the `documentation `__ for
+details.
-The `documentation `__ provides
-an introduction to the concepts of
-```task`` `__ and
-```job`` `__, and it lists of
-all the options and tools available in the library.
+The `documentation `__ contains a
+highly recommended
+`tutorial `__
+which explains how to use the built-in command line tools.
Documentation and Support
-------------------------
-Documentation: http://www.readbeyond.it/aeneas/docs/
-
-High level description of how aeneas works:
-`HOWITWORKS `__
-
-Tutorial: `A Practical Introduction To The aeneas
-Package `__
-
-Mailing list: https://groups.google.com/d/forum/aeneas-forced-alignment
-
-Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
-
-Development history:
-`HISTORY `__
+- Documentation: http://www.readbeyond.it/aeneas/docs/
+- Command line tools tutorial:
+ http://www.readbeyond.it/aeneas/docs/clitutorial.html
+- Library tutorial:
+ http://www.readbeyond.it/aeneas/docs/libtutorial.html
+- Old, verbose tutorial: `A Practical Introduction To The aeneas
+ Package `__
+- Mailing list:
+ https://groups.google.com/d/forum/aeneas-forced-alignment
+- Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
+- High level description of how **aeneas** works:
+ `HOWITWORKS `__
+- Development history:
+ `HISTORY `__
Supported Features
------------------
-- Input text files in plain, parsed, subtitles, or unparsed format
+- Input text files in ``parsed``, ``plain``, ``subtitles``, or
+ ``unparsed`` (XML) format
+- Multilevel input text files in ``mplain`` and ``munparsed`` (XML)
+ format
- Text extraction from XML (e.g., XHTML) files using ``id`` and
``class`` attributes
- Arbitrary text fragment granularity (single word, subphrase, phrase,
paragraph, etc.)
-- Input audio file formats: all those supported by ``ffmpeg``
-- Possibility of downloading the audio file from a YouTube video
-- Batch processing
-- Output sync map formats: CSV, JSON, RBSE, SMIL, SSV, TSV, TTML, TXT,
- VTT, XML
-- Tested languages: BG, CA, CY, CS, DA, DE, EL, EN, EO, ES, ET, FA, FI,
- FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK,
- SR, SV, SW, TR, UK
+- Input audio file formats: all those readable by ``ffmpeg``
+- Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB,
+ TSV, TTML, TXT, VTT, XML
+- Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO,
+ EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD,
+ NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
+- MFCC and DTW computed via Python C extensions to reduce the
+ processing time
+- On Linux, eSpeak called via a Python C extension for faster audio
+ synthesis
+- Batch processing of multiple audio/text pairs
+- Several built-in TTS engine wrappers: eSpeak (default, FLOSS),
+ Festival (FLOSS), Nuance TTS API (commercial)
+- Use custom TTS engine wrappers besides the built-in ones
+- Download audio from a YouTube video
+- In multilevel mode, recursive alignment from paragraph to sentence to
+ word level
- Robust against misspelled/mispronounced words, local rearrangements
of words, background noise/sporadic spikes
-- Code suitable for a Web app deployment (e.g., on-demand AWS
- instances)
- Adjustable splitting times, including a max character/second
constraint for CC applications
- Automated detection of audio head/tail
-- MFCC and DTW computed via Python C extensions to reduce the
- processing time
-- On Linux, ``espeak`` called via a Python C extension for faster audio
- synthesis
-- Output an HTML file (from ``finetuneas`` project) for fine tuning the
- sync map manually
+- Output an HTML file for fine tuning the sync map manually
+ (``finetuneas`` project)
- Execution parameters tunable at runtime
+- Code suitable for Web app deployment (e.g., on-demand cloud
+ computing)
Limitations and Missing Features
--------------------------------
@@ -238,8 +260,6 @@ Limitations and Missing Features
- Audio is assumed to be spoken: not suitable/YMMV for song captioning
- No protection against memory trashing if you feed extremely long
audio files
-- On Mac OS X and Windows, audio synthesis might be slow if you have
- thousands of text fragments
- `Open issues `__
License
@@ -252,7 +272,7 @@ details.
Licenses for third party code and files included in **aeneas** can be
found in the
-`licenses/ `__
+`licenses `__
directory.
No copy rights were harmed in the making of this project.
@@ -278,6 +298,9 @@ Sponsors
- **October 2015**: an anonymous donation sponsored the development of
the "YouTube downloader" option (v1.3.0)
+- **April 2016**: the Fruch Foundation kindly sponsored the development
+ and documentation of v1.5.0
+
Supporting
~~~~~~~~~~
@@ -337,6 +360,9 @@ asynchronous usage.
**Chris Hubbard** prepared the files for packaging aeneas as a
Debian/Ubuntu ``.deb``.
+**Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine
+tuning sync maps in the browser.
+
All the mighty `GitHub
contributors `__,
and the members of the `Google
diff --git a/VERSION b/VERSION
index 347f5833..bc80560f 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.4.1
+1.5.0
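The version is bumped to 1.5.0, and the README examples in this diff change `task_language=en` to `task_language=eng`. The tested-language list also moves from two-letter to three-letter codes (e.g., `DE` becomes `DEU`). This diff does not say whether 1.5.0 still accepts the two-letter codes. The sketch below rewrites the language value in an older configuration string. It assumes a small hand-written mapping: `TWO_TO_THREE` and `upgrade_config_string` are hypothetical and cover only a few languages.

```python
# Hypothetical, partial mapping from ISO 639-1 (two-letter) to ISO 639-3
# (three-letter) codes; extend it for the languages you actually use.
TWO_TO_THREE = {"de": "deu", "en": "eng", "es": "spa", "fr": "fra", "it": "ita"}


def upgrade_config_string(config_string):
    """Rewrite a two-letter task_language value; leave other pairs untouched."""
    pairs = []
    for pair in config_string.split("|"):
        key, sep, value = pair.partition("=")
        if key == "task_language":
            # Unknown or already three-letter codes pass through unchanged.
            value = TWO_TO_THREE.get(value, value)
        pairs.append(key + sep + value)
    return "|".join(pairs)


print(upgrade_config_string(
    "task_language=en|os_task_file_format=json|is_text_type=plain"
))  # task_language=eng|os_task_file_format=json|is_text_type=plain
```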
diff --git a/aeneas/__init__.py b/aeneas/__init__.py
index 18a457ef..3bb2b30b 100644
--- a/aeneas/__init__.py
+++ b/aeneas/__init__.py
@@ -6,51 +6,6 @@
to automagically synchronize audio and text (aka forced alignment).
"""
-from __future__ import absolute_import
-from __future__ import print_function
-from aeneas.adjustboundaryalgorithm import AdjustBoundaryAlgorithm
-from aeneas.analyzecontainer import AnalyzeContainer
-from aeneas.audiofile import AudioFile
-from aeneas.audiofile import AudioFileMonoWAVE
-from aeneas.audiofile import AudioFileUnsupportedFormatError
-from aeneas.container import Container
-from aeneas.container import ContainerFormat
-from aeneas.downloader import Downloader
-from aeneas.dtw import DTWAlgorithm
-from aeneas.dtw import DTWAligner
-from aeneas.espeakwrapper import ESPEAKWrapper
-from aeneas.executejob import ExecuteJob
-from aeneas.executetask import ExecuteTask
-from aeneas.executetask import ExecuteTaskExecutionError
-from aeneas.executetask import ExecuteTaskInputError
-from aeneas.ffmpegwrapper import FFMPEGWrapper
-from aeneas.ffprobewrapper import FFPROBEParsingError
-from aeneas.ffprobewrapper import FFPROBEUnsupportedFormatError
-from aeneas.ffprobewrapper import FFPROBEWrapper
-from aeneas.hierarchytype import HierarchyType
-from aeneas.idsortingalgorithm import IDSortingAlgorithm
-from aeneas.job import Job
-from aeneas.job import JobConfiguration
-from aeneas.language import Language
-from aeneas.logger import Logger
-from aeneas.sd import SD
-from aeneas.sd import SDMetric
-from aeneas.syncmap import SyncMap
-from aeneas.syncmap import SyncMapFormat
-from aeneas.syncmap import SyncMapFragment
-from aeneas.syncmap import SyncMapHeadTailFormat
-from aeneas.syncmap import SyncMapMissingParameterError
-from aeneas.synthesizer import Synthesizer
-from aeneas.task import Task
-from aeneas.task import TaskConfiguration
-from aeneas.textfile import TextFile
-from aeneas.textfile import TextFileFormat
-from aeneas.textfile import TextFragment
-from aeneas.vad import VAD
-from aeneas.validator import Validator
-import aeneas.globalconstants as gc
-import aeneas.globalfunctions as gf
-
__author__ = "Alberto Pettarin"
__copyright__ = """
Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
@@ -58,7 +13,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
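The hunk above removes all package-level re-exports from `aeneas/__init__.py`. From 1.5.0 on, code such as `from aeneas import Task` or `aeneas.Task` is therefore expected to fail. `from aeneas.task import Task` should keep working, assuming the submodules themselves are unchanged. For callers that read class names from configuration as strings, the following is a minimal sketch of a hypothetical helper, `resolve()`, which is not part of aeneas. It imports an object by its fully qualified dotted path:

```python
import importlib


def resolve(dotted_path):
    """Return the object named by a dotted path such as "aeneas.task.Task".

    Hypothetical helper, not part of aeneas: it imports the module part
    ("aeneas.task") and returns the attribute named by the last component
    ("Task"), i.e. the equivalent of ``from aeneas.task import Task``.
    """
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path or not attr_name:
        raise ValueError("expected a path like 'package.module.Name', got %r" % dotted_path)
    # import_module raises ImportError if the module path does not exist
    module = importlib.import_module(module_path)
    # getattr raises AttributeError if the module lacks that name
    return getattr(module, attr_name)
```

For example, `resolve("aeneas.task.Task")` would stand in for the former `aeneas.Task`, and `resolve("aeneas.executetask.ExecuteTask")` for the former `aeneas.ExecuteTask`. The submodule paths are taken from the import lines removed above.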
diff --git a/aeneas/adjustboundaryalgorithm.py b/aeneas/adjustboundaryalgorithm.py
index 2d110c06..4651c746 100644
--- a/aeneas/adjustboundaryalgorithm.py
+++ b/aeneas/adjustboundaryalgorithm.py
@@ -2,18 +2,26 @@
# coding=utf-8
"""
-Enumeration of the available algorithms to adjust
-the boundary point between two fragments.
+This module contains the following classes:
-.. versionadded:: 1.0.4
+* :class:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm`
+ implementing functions to adjust
+ the boundary point between two consecutive fragments.
+
+.. warning:: This module is likely to be refactored in a future version
"""
from __future__ import absolute_import
+from __future__ import division
from __future__ import print_function
-import copy
+import numpy
-from aeneas.logger import Logger
+from aeneas.audiofilemfcc import AudioFileMFCC
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
+from aeneas.textfile import TextFile
+from aeneas.timevalue import Decimal
+from aeneas.timevalue import TimeValue
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -22,67 +30,183 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class AdjustBoundaryAlgorithm(object):
+class AdjustBoundaryAlgorithm(Loggable):
"""
- Enumeration of the available algorithms to adjust
- the boundary point between two consecutive fragments.
-
- :param algorithm: the boundary adjustment algorithm to be used
- :type algorithm: :class:`aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm` enum
- :param text_map: a text map list [[start, end, id, text], ..., []]
- :type text_map: list
- :param speech: a list of time intervals [[s_1, e_1,], ..., [s_k, e_k]]
- containing speech
- :type speech: list
- :param nonspeech: a list of time intervals [[s_1, e_1,], ..., [s_j, e_j]]
- not containing speech
- :type nonspeech: list
- :param value: an optional parameter to be passed
- to the boundary adjustment algorithm,
- it will be converted (to int, to float) as needed,
- depending on the selected algorithm
- :type value: string
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ Enumeration and implementation of the available algorithms
+ to adjust the boundary point between two consecutive fragments.
+
+ :param algorithm: the algorithm to be used
+ :type algorithm: :class:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm`
+ :param list parameters: a list of additional parameters to be passed to the algorithm
+ :param boundary_indices: the current boundary indices,
+ with respect to the audio file full MFCCs
+ :type boundary_indices: :class:`numpy.ndarray` (1D)
+ :param real_wave_mfcc: the audio file MFCCs
+ :type real_wave_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC`
+ :param text_file: the text file containing the associated text fragments
+ :type text_file: :class:`~aeneas.textfile.TextFile`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
-
- :raises ValueError: if one of `text_map`, `speech` or `nonspeech` is `None` or `algorithm` value is not allowed
+ :type logger: :class:`~aeneas.logger.Logger`
+ :raises: ValueError: if the value of ``algorithm`` is not allowed
+ :raises: TypeError: if one of ``boundary_indices``, ``real_wave_mfcc``,
+ or ``text_file`` is ``None`` or it has a wrong type
"""
AFTERCURRENT = "aftercurrent"
- """ Set the boundary at ``value`` seconds
- after the end of the current fragment """
+ """
+ Set the boundary at ``value`` seconds
+ after the end of the current fragment,
+ if the current boundary falls inside
+ a nonspeech interval.
+ If not, no adjustment is made.
+
+ Example (value ``0.200`` seconds):
+
+ .. image:: _static/aftercurrent.200.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and AFTERCURRENT labels with 0.200 seconds offset
+ """
AUTO = "auto"
- """ Auto (no adjustment) """
+ """
+ Auto (no adjustment).
+
+ Example:
+
+ .. image:: _static/auto.png
+ :scale: 100%
+ :align: center
+ :alt: The AUTO method does not change the time intervals
+ """
BEFORENEXT = "beforenext"
- """ Set the boundary at ``value`` seconds
- before the beginning of the next fragment """
+ """
+ Set the boundary at ``value`` seconds
+ before the beginning of the next fragment,
+ if the current boundary falls inside
+ a nonspeech interval.
+ If not, no adjustment is made.
+
+ Example (value ``0.200`` seconds):
+
+ .. image:: _static/beforenext.200.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and BEFORENEXT labels with 0.200 seconds offset
+ """
OFFSET = "offset"
- """ Offset the current boundaries by ``value`` seconds
+ """
+ Offset the current boundaries by ``value`` seconds.
+ The ``value`` can be negative or positive.
+
+ Example (value ``-0.200`` seconds):
+
+ .. image:: _static/offset.m200.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and OFFSET labels with value -0.200
+
+ Example (value ``0.200`` seconds):
+
+ .. image:: _static/offset.200.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and OFFSET labels with value 0.200
.. versionadded:: 1.1.0
"""
PERCENT = "percent"
- """ Set the boundary at ``value`` percent of
- the nonspeech interval between the current and the next fragment """
+ """
+ Set the boundary at ``value`` percent of
+ the nonspeech interval between the current and the next fragment,
+ if the current boundary falls inside
+ a nonspeech interval.
+ If not, no adjustment is made.
+ The ``value`` must be an integer in ``[0, 100]``.
+
+ Example (value ``25`` %):
+
+ .. image:: _static/percent.25.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and PERCENT labels with value 25 %
+
+ Example (value ``50`` %):
+
+ .. image:: _static/percent.50.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and PERCENT labels with value 50 %
+
+ Example (value ``75`` %):
+
+ .. image:: _static/percent.75.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and PERCENT labels with value 75 %
+
+ """
RATE = "rate"
- """ Adjust boundaries trying to respect the
- ``value`` characters/second constraint """
+ """
+ Adjust boundaries trying to respect the
+ ``value`` characters/second constraint.
+ The ``value`` must be positive.
+ First, the rates of all fragments are computed,
+ using the current boundaries.
+ For those fragments exceeding ``value`` characters/second,
+ the algorithm will try to move the begin boundary back,
+ so that its time interval increases (and hence its rate decreases).
+ Clearly, it is possible that not all fragments
+ can be adjusted this way: for example,
+ if you have three consecutive fragments exceeding ``value``,
+ the middle one cannot be stretched.
+
+ Example (value ``13.0``, note how ``f000003`` is modified):
+
+ .. image:: _static/rate.13.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and RATE labels with value 13.0
+
+ """
RATEAGGRESSIVE = "rateaggressive"
- """ Adjust boundaries trying to respect the
- ``value`` characters/second constraint (aggressive mode)
+ """
+ Adjust boundaries trying to respect the
+ ``value`` characters/second constraint, in aggressive mode.
+ The ``value`` must be positive.
+ First, the rates of all fragments are computed,
+ using the current boundaries.
+ For those fragments exceeding ``value`` characters/second,
+ the algorithm will try to move the begin boundary back,
+ so that its time interval increases (and hence its rate decreases).
+ If moving the begin boundary is not possible,
+ or it is not enough to keep the rate below ``value``,
+ the algorithm will try to move the end boundary forward;
+ this is what distinguishes it from the less aggressive
+ :data:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm.RATE`
+ algorithm.
+ Clearly, it is possible that not all fragments
+ can be adjusted this way: for example,
+ if you have three consecutive fragments exceeding ``value``,
+ the middle one cannot be stretched.
+
+ Example (value ``13.0``, note how ``f000003`` is modified):
+
+ .. image:: _static/rateaggressive.13.png
+ :scale: 100%
+ :align: center
+ :alt: Comparison between AUTO labels and RATEAGGRESSIVE labels with value 13.0
.. versionadded:: 1.1.0
"""
@@ -98,515 +222,334 @@ class AdjustBoundaryAlgorithm(object):
]
""" List of all the allowed values """
- DEFAULT_MAX_RATE = 21.0
- """ Default max rate (used only when ``RATE`` or ``RATEAGGRESSIVE``
- algorithms are used) """
-
- DEFAULT_PERCENT = 50
- """ Default percent value (used only when ``PERCENT`` algorithm is used) """
-
- TOLERANCE = 0.001
- """ Tolerance when comparing floats """
-
TAG = u"AdjustBoundaryAlgorithm"
def __init__(
self,
algorithm,
- text_map,
- speech,
- nonspeech,
- value=None,
+ parameters,
+ boundary_indices,
+ real_wave_mfcc,
+ text_file,
rconf=None,
logger=None
):
if algorithm not in self.ALLOWED_VALUES:
- raise ValueError("Algorithm value not allowed")
- if text_map is None:
- raise ValueError("Text map is None")
- if speech is None:
- raise ValueError("Speech list is None")
- if nonspeech is None:
- raise ValueError("Nonspeech list is None")
+ raise ValueError(u"Algorithm value not allowed")
+ if boundary_indices is None:
+ raise TypeError(u"boundary_indices is None")
+ if (real_wave_mfcc is None) or (not isinstance(real_wave_mfcc, AudioFileMFCC)):
+ raise TypeError(u"real_wave_mfcc is None or not an AudioFileMFCC object")
+ if (text_file is None) or (not isinstance(text_file, TextFile)):
+ raise TypeError(u"text_file is None or not a TextFile object")
+ super(AdjustBoundaryAlgorithm, self).__init__(rconf=rconf, logger=logger)
self.algorithm = algorithm
- self.text_map = copy.deepcopy(text_map)
- self.speech = speech
- self.nonspeech = nonspeech
- self.value = value
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
- self._parse_value()
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
- def _parse_value(self):
- """
- Parse the self.value value
- """
- if self.algorithm == self.AUTO:
- return
- elif self.algorithm == self.PERCENT:
- try:
- self.value = int(self.value)
- except ValueError:
- self.value = self.DEFAULT_PERCENT
- self.value = max(min(self.value, 100), 0)
- else:
- try:
- self.value = float(self.value)
- except ValueError:
- self.value = 0.0
- if (
- (self.value <= 0) and
- (self.algorithm in [self.RATE, self.RATEAGGRESSIVE])
- ):
- self.value = self.DEFAULT_MAX_RATE
+ self.parameters = parameters
+ self.real_wave_mfcc = real_wave_mfcc
+ self.boundary_indices = boundary_indices
+ self.text_file = text_file
+ self.intervals = []
- def adjust(self):
+ def to_time_map(self):
"""
- Adjust the boundaries of the text map.
+ Adjust the boundaries between consecutive fragments
+ using the algorithm and parameters
+ specified in the constructor,
+ and return a list of time intervals.
:rtype: list of intervals
"""
if self.algorithm == self.AUTO:
- return self._adjust_auto()
+ self._adjust_auto()
elif self.algorithm == self.AFTERCURRENT:
- return self._adjust_aftercurrent()
+ self._adjust_aftercurrent()
elif self.algorithm == self.BEFORENEXT:
- return self._adjust_beforenext()
+ self._adjust_beforenext()
elif self.algorithm == self.OFFSET:
- return self._adjust_offset()
+ self._adjust_offset()
elif self.algorithm == self.PERCENT:
- return self._adjust_percent()
+ self._adjust_percent()
elif self.algorithm == self.RATE:
- return self._adjust_rate(False)
+ self._adjust_rate(False)
elif self.algorithm == self.RATEAGGRESSIVE:
- return self._adjust_rate(True)
- return self.text_map
+ self._adjust_rate(True)
+ else:
+ self._adjust_auto()
+ return self.intervals
def _adjust_auto(self):
- self._log(u"Called _adjust_auto: returning text_map unchanged")
- return self.text_map
-
- def _adjust_offset(self):
- self._log(u"Called _adjust_offset")
- try:
- for index in range(1, len(self.text_map)):
- current = self.text_map[index]
- previous = self.text_map[index - 1]
- if self.value >= 0:
- offset = min(self.value, current[1] - current[0])
- else:
- offset = -min(-self.value, previous[1] - previous[0])
- previous[1] += offset
- current[0] += offset
- except:
- self._log(u"Exception in _adjust_offset: returning text_map unchanged")
- return self.text_map
-
- def _adjust_percent(self):
- def new_time(current_boundary, nsi):
- duration = nsi[1] - nsi[0]
- percent = self.value / 100.0
- return nsi[0] + duration * percent
- return self._adjust_on_nsi(new_time)
-
- def _adjust_aftercurrent(self):
- def new_time(current_boundary, nsi):
- duration = nsi[1] - nsi[0]
- try:
- delay = max(min(self.value, duration), 0)
- if delay == 0:
- return current_boundary
- return nsi[0] + delay
- except:
- return current_boundary
- return self._adjust_on_nsi(new_time)
-
- def _adjust_beforenext(self):
- def new_time(current_boundary, nsi):
- duration = nsi[1] - nsi[0]
- try:
- delay = max(min(self.value, duration), 0)
- if delay == 0:
- return current_boundary
- return nsi[1] - delay
- except:
- return current_boundary
- return self._adjust_on_nsi(new_time)
-
- def _adjust_on_nsi(self, new_time_function):
- nsi_index = 0
- # TODO numpy-fy this loop?
- for index in range(len(self.text_map) - 1):
- current_boundary = self.text_map[index][1]
- self._log([u"current_boundary: %.3f", current_boundary])
- # the tolerance comparison seems necessary
- while (
- (nsi_index < len(self.nonspeech)) and
- (self.nonspeech[nsi_index][1] + self.TOLERANCE <= current_boundary)
- ):
- nsi_index += 1
- nsi = None
- if (
- (nsi_index < len(self.nonspeech)) and
- (current_boundary >= self.nonspeech[nsi_index][0] - self.TOLERANCE)
- ):
- nsi = self.nonspeech[nsi_index]
- nsi_index += 1
- if nsi:
- self._log([u" in interval %.3f %.3f", nsi[0], nsi[1]])
- new_time = new_time_function(current_boundary, nsi)
- self._log([u" new_time: %.3f", new_time])
- new_start = self.text_map[index][0]
- new_end = self.text_map[index + 1][1]
- if self._time_in_interval(new_time, new_start, new_end):
- self._log([u" updating %.3f => %.3f", current_boundary, new_time])
- self.text_map[index][1] = new_time
- self.text_map[index + 1][0] = new_time
- else:
- self._log(u" new_time outside: no adjustment performed")
- else:
- self._log(u" no nonspeech interval found: no adjustment performed")
- return self.text_map
-
- def _len(self, string):
- """
- Return the length of the given string.
- If it is greater than 2 times the self.value (= user max rate),
- one space will become a newline,
- and hence we do not count it
- (e.g., value = 21 => max 42 chars per line).
-
- :param string: the string to be counted
- :type string: string
- :rtype: int
"""
- # TODO this should depend on the number of lines
- # in the text fragment; current code assumes
- # at most 2 lines of at most value characters each
- # (the effect of this finesse is negligible in practice)
- if string is None:
- return 0
- length = len(string)
- if length > 2 * self.value:
- length -= 1
- return length
-
- def _time_in_interval(self, time, start, end):
+ AUTO (do not modify)
"""
- Decides whether the given time is within the given interval.
-
- :param time: a time value
- :type time: float
- :param start: the start of the interval
- :type start: float
- :param end: the end of the interval
- :type end: float
- :rtype: bool
- """
- return (time >= start) and (time <= end)
+ self.log(u"Called _adjust_auto")
+ self._apply_offset(TimeValue("0.000"))
- # TODO a more efficient search (e.g., binary) is possible
- # the tolerance comparison seems necessary
- def _find_interval_containing(self, intervals, time):
- """
- Return the interval containing the given time,
- or None if no such interval exists.
-
- :param intervals: a list of time intervals
- [[s_1, e_1], ..., [s_k, e_k]]
- :type intervals: list of lists
- :param time: a time value
- :type time: float
- :rtype: a time interval ``[s, e]`` or ``None``
- """
- for interval in intervals:
- start = interval[0] - self.TOLERANCE
- end = interval[1] + self.TOLERANCE
- if self._time_in_interval(time, start, end):
- return interval
- return None
-
- def _compute_rate_raw(self, start, end, length):
+ def _adjust_offset(self):
"""
- Compute the rate of a fragment, that is,
- the number of characters per second.
-
- :param start: the start time
- :type start: float
- :param end: the end time
- :type end: float
- :param length: the number of character (possibly adjusted) of the text
- :type length: int
- :rtype: float
+ OFFSET
"""
- duration = end - start
- if duration > 0:
- return length / duration
- return 0
+ self.log(u"Called _adjust_offset")
+ # NOTE self.parameters[0] is TimeValue
+ self._apply_offset(self.parameters[0])
- def _compute_rate(self, index):
+ def _adjust_percent(self):
+ """
+ PERCENT
"""
- Compute the rate of a fragment, that is,
- the number of characters per second.
+ def new_time(begin, end, current):
+ """ Compute new time """
+ # NOTE self.parameters[0] is an int
+ percent = max(min(Decimal(self.parameters[0]), 100), 0) / 100
+ return (begin + (end + 1 - begin) * percent) * self.rconf.mws
+ self.log(u"Called _adjust_percent")
+ self._adjust_on_nonspeech(new_time)
- :param index: the index of the fragment in the text map
- :type index: int
- :rtype: float
+ def _adjust_aftercurrent(self):
"""
- if (index < 0) or (index >= len(self.text_map)):
- return 0
- fragment = self.text_map[index]
- start = fragment[0]
- end = fragment[1]
- length = self._len(fragment[3])
- return self._compute_rate_raw(start, end, length)
-
- def _compute_slack(self, index):
+ AFTERCURRENT
"""
- Return the slack of a fragment, that is,
- the difference between the current duration
- of the fragment and the duration it should have
- if its rate was exactly self.value (= max rate)
-
- If the slack is positive, the fragment
- can be shrinken; if the slack is negative,
- the fragment should be stretched.
-
- The returned value can be None,
- in case the index is out of self.text_map bounds.
+ def new_time(begin, end, current):
+ """ Compute new time """
+ mws = self.rconf.mws
+ # NOTE self.parameters[0] is TimeValue
+ delay = max(self.parameters[0], TimeValue("0.000"))
+ tentative = begin * mws + delay
+ if tentative > (end + 1) * mws:
+ return current * mws
+ return tentative
+ self.log(u"Called _adjust_aftercurrent")
+ self._adjust_on_nonspeech(new_time)
- :param index: the index of the fragment in the text map
- :type index: int
- :rtype: float
+ def _adjust_beforenext(self):
+ """
+ BEFORENEXT
"""
- if (index < 0) or (index >= len(self.text_map)):
- return None
- fragment = self.text_map[index]
- start = fragment[0]
- end = fragment[1]
- length = self._len(fragment[3])
- duration = end - start
- return duration - (length / self.value)
+ def new_time(begin, end, current):
+ """ Compute new time """
+ mws = self.rconf.mws
+ # NOTE self.parameters[0] is TimeValue
+ delay = max(self.parameters[0], TimeValue("0.000"))
+ tentative = (end + 1) * mws - delay
+ if tentative < begin * mws:
+ return current * mws
+ return tentative
+ self.log(u"Called _adjust_beforenext")
+ self._adjust_on_nonspeech(new_time)
def _adjust_rate(self, aggressive=False):
- faster = []
-
- # TODO numpy-fy this loop?
- for index in range(len(self.text_map)):
- fragment = self.text_map[index]
- self._log([u"Fragment %d", index])
- rate = self._compute_rate(index)
- self._log([u" %.3f %.3f => %.3f", fragment[0], fragment[1], rate])
- if rate > self.value:
- self._log(u" too fast")
- faster.append(index)
-
- if len(self.text_map) == 1:
- self._log(u"Only one fragment, and it is too fast")
- return self.text_map
+ self.log(u"Called _adjust_rate")
+ # if only one fragment, return unchanged
+ if len(self.text_file) <= 1:
+ self.log(u"Only one fragment, returning")
+ self._apply_offset(TimeValue("0.000"))
+ return
+ # compute fragments too fast
+ mws = self.rconf.mws
+ # NOTE self.parameters[0] is Decimal
+ max_rate = self.parameters[0]
+ times = self.boundary_indices * mws
+ durations = numpy.diff(times)
+ lengths = numpy.array([f.chars for f in self.text_file.fragments])
+ # compute rates, dealing with division by zero
+ with numpy.errstate(divide="ignore", invalid="ignore"):
+ rates = numpy.divide(lengths, durations)
+ rates[rates == numpy.inf] = 0
+ rates = numpy.nan_to_num(rates)
+ faster = numpy.where(rates > max_rate)[0]
+
+ # if no fragment is faster, return unchanged
if len(faster) == 0:
- self._log([u"No fragment faster than max rate %.3f", self.value])
- return self.text_map
+ self.log([u"No fragment faster than max rate %.3f", max_rate])
+ self._apply_offset(TimeValue("0.000"))
+ return
- # TODO numpy-fy this loop?
# try fixing faster fragments
- self._log(u"Fixing faster fragments...")
for index in faster:
- self._log([u"Fixing faster fragment %d ...", index])
- if aggressive:
- try:
- self._rateaggressive_fix_fragment(index)
- except:
- self._log(u"Exception in _rateaggressive_fix_fragment")
- else:
- try:
- self._rate_fix_fragment(index)
- except:
- self._log(u"Exception in _rate_fix_fragment")
- self._log([u"Fixing faster fragment %d ... done", index])
- self._log(u"Fixing faster fragments... done")
- return self.text_map
-
- def _rate_fix_fragment(self, index):
- """
- Fix index-th fragment using the rate algorithm (standard variant).
- """
- succeeded = False
- current = self.text_map[index]
- current_start = current[0]
- current_end = current[1]
- current_rate = self._compute_rate(index)
- previous_slack = self._compute_slack(index - 1)
- current_slack = self._compute_slack(index)
- next_slack = self._compute_slack(index + 1)
- if previous_slack is not None:
- previous = self.text_map[index - 1]
- self._log([u" previous: %.3f %.3f => %.3f", previous[0], previous[1], self._compute_rate(index - 1)])
- self._log([u" previous slack: %.3f", previous_slack])
- if current_slack is not None:
- self._log([u" current: %.3f %.3f => %.3f", current_start, current_end, current_rate])
- self._log([u" current slack: %.3f", current_slack])
- if next_slack is not None:
- nextf = self.text_map[index]
- self._log([u" next: %.3f %.3f => %.3f", nextf[0], nextf[1], self._compute_rate(index + 1)])
- self._log([u" next slack: %.3f", next_slack])
-
- # try expanding into the previous fragment
- new_start = current_start
- new_end = current_end
- if (previous_slack is not None) and (previous_slack > 0):
- self._log(u" can expand into previous")
- nsi = self._find_interval_containing(self.nonspeech, current[0])
- previous = self.text_map[index - 1]
- if nsi is not None:
- if nsi[0] > previous[0]:
- self._log([u" found suitable nsi: %.3f %.3f", nsi[0], nsi[1]])
- previous_slack = min(current[0] - nsi[0], previous_slack)
- self._log([u" previous slack after min: %.3f", previous_slack])
- if previous_slack + current_slack >= 0:
- self._log(u" enough slack to completely fix")
- steal_from_previous = -current_slack
- succeeded = True
- else:
- self._log(u" not enough slack to completely fix")
- steal_from_previous = previous_slack
- new_start = current_start - steal_from_previous
- self.text_map[index - 1][1] = new_start
- self.text_map[index][0] = new_start
- new_rate = self._compute_rate(index)
- self._log([u" old: %.3f %.3f => %.3f", current_start, current_end, current_rate])
- self._log([u" new: %.3f %.3f => %.3f", new_start, new_end, new_rate])
+ self.log([u"Fragment %d has rate %.3f", index, rates[index]])
+ fixed = False
+
+ # first, try moving begin time back
+ if index > 0:
+ self.log(u" Trying to move begin time back...")
+ lacking = lengths[index] / max_rate - durations[index]
+ self.log([u" Overflow current fragment: %.3f", lacking])
+ slack = durations[index - 1] - lengths[index - 1] / max_rate
+ self.log([u" Slack previous fragment: %.3f", slack])
+ if slack >= lacking:
+ self.log([u" Moving begin time: %.3f => %.3f", times[index], times[index] - lacking])
+ self.log(u" Complete fix (slack >= lacking)")
+ times[index] -= lacking
+ durations[index - 1] -= lacking
+ durations[index] += lacking
+ rates[index - 1] = lengths[index - 1] / durations[index - 1]
+ rates[index] = lengths[index] / durations[index]
+ fixed = True
+ elif slack > 0:
+ self.log([u" Moving begin time: %.3f => %.3f", times[index], times[index] - slack])
+ self.log(u" Partial fix (slack < lacking but slack > 0)")
+ times[index] -= slack
+ durations[index - 1] -= slack
+ durations[index] += slack
+ rates[index - 1] = lengths[index - 1] / durations[index - 1]
+ rates[index] = lengths[index] / durations[index]
else:
- self._log(u" nsi found is not suitable")
- else:
- self._log(u" no nsi found")
- else:
- self._log(u" cannot expand into previous")
+ self.log(u" Cannot move begin time back (slack <= 0)")
+
+ # if aggressive and not completely fixed, try moving end time forward
+ if (aggressive) and (not fixed) and (index < len(self.text_file) - 1):
+ self.log(u" Trying to move end time forward...")
+ lacking = lengths[index] / max_rate - durations[index]
+ self.log([u" Overflow current fragment: %.3f", lacking])
+ slack = durations[index + 1] - lengths[index + 1] / max_rate
+ self.log([u" Slack next fragment: %.3f", slack])
+ if slack >= lacking:
+ self.log([u" Moving end time: %.3f => %.3f", times[index + 1], times[index + 1] + lacking])
+ self.log(u" Complete fix (slack >= lacking)")
+ times[index + 1] += lacking
+ durations[index] += lacking
+ durations[index + 1] -= lacking
+ rates[index] = lengths[index] / durations[index]
+ rates[index + 1] = lengths[index + 1] / durations[index + 1]
+ fixed = True
+ elif slack > 0:
+ self.log([u" Moving end time: %.3f => %.3f", times[index + 1], times[index + 1] + slack])
+ self.log(u" Partial fix (slack < lacking but slack > 0)")
+ times[index + 1] += slack
+ durations[index] += slack
+ durations[index + 1] -= slack
+ rates[index] = lengths[index] / durations[index]
+ rates[index + 1] = lengths[index + 1] / durations[index + 1]
+ else:
+ self.log(u" Cannot move end time forward (slack <= 0)")
- if succeeded:
- self._log(u" succeeded: returning")
- return
+ # if not completely fixed, log warning
+ if not fixed:
+ self.log_warn([u"Fragment %d is faster and could not be fixed", index])
- # recompute current fragment
- current_rate = self._compute_rate(index)
- current_slack = self._compute_slack(index)
- current_rate = self._compute_rate(index)
-
- # try expanding into the next fragment
- new_start = current_start
- new_end = current_end
- if (next_slack is not None) and (next_slack > 0):
- self._log(u" can expand into next")
- nsi = self._find_interval_containing(self.nonspeech, current[1])
- previous = self.text_map[index - 1]
- if nsi is not None:
- if nsi[0] > previous[0]:
- self._log([u" found suitable nsi: %.3f %.3f", nsi[0], nsi[1]])
- next_slack = min(nsi[1] - current[1], next_slack)
- self._log([u" next slack after min: %.3f", next_slack])
- if next_slack + current_slack >= 0:
- self._log(u" enough slack to completely fix")
- steal_from_next = -current_slack
- succeeded = True
- else:
- self._log(u" not enough slack to completely fix")
- steal_from_next = next_slack
- new_end = current_end + steal_from_next
- self.text_map[index][1] = new_end
- self.text_map[index + 1][0] = new_end
- new_rate = self._compute_rate(index)
- self._log([u" old: %.3f %.3f => %.3f", current_start, current_end, current_rate])
- self._log([u" new: %.3f %.3f => %.3f", new_start, new_end, new_rate])
- else:
- self._log(u" nsi found is not suitable")
- else:
- self._log(u" no nsi found")
- else:
- self._log(u" cannot expand into next")
+ # create intervals and return
+ self._times_to_intervals(times)
- if succeeded:
- self._log(u" succeeded: returning")
- return
+ def _times_to_intervals(self, times):
+ """
+ Transform a list of time values into a list of intervals.
- self._log(u" not succeeded, returning")
+ For example: [0,1,2,3,4] => [[0,1], [1,2], [2,3], [3,4]], then a head [0,0] and a tail [4,audio_length] are added.
- def _rateaggressive_fix_fragment(self, index):
+ :param times: the time values
+ :type times: list of :class:`~aeneas.timevalue.TimeValue`
"""
- Fix index-th fragment using the rate algorithm (aggressive variant).
+ self.log(u"Converting times to intervals...")
+ intervals = [[times[i], times[i+1]] for i in range(len(times) - 1)]
+ self.log(u"Converting times to intervals... done")
+ self.log(u"Adding head and tail...")
+ self.intervals = [[TimeValue("0.000"), intervals[0][0]]] + intervals + [[intervals[-1][1], self.real_wave_mfcc.audio_length]]
+ self.log(u"Adding head and tail... done")
+
+ def _apply_offset(self, offset):
"""
- current = self.text_map[index]
- current_start = current[0]
- current_end = current[1]
- current_rate = self._compute_rate(index)
- previous_slack = self._compute_slack(index - 1)
- current_slack = self._compute_slack(index)
- next_slack = self._compute_slack(index + 1)
- if previous_slack is not None:
- self._log([u" previous slack: %.3f", previous_slack])
- if current_slack is not None:
- self._log([u" current slack: %.3f", current_slack])
- if next_slack is not None:
- self._log([u" next slack: %.3f", next_slack])
- steal_from_previous = 0
- steal_from_next = 0
- if (
- (previous_slack is not None) and
- (next_slack is not None) and
- (previous_slack > 0) and
- (next_slack > 0)
- ):
- self._log(u" can expand into both previous and next")
- total_slack = previous_slack + next_slack
- self._log([u" total slack: %.3f", total_slack])
- if total_slack + current_slack >= 0:
- self._log(u" enough total slack to completely fix")
- # partition the needed slack proportionally
- previous_percentage = previous_slack / total_slack
- self._log([u" previous percentage: %.3f", previous_percentage])
- steal_from_previous = -current_slack * previous_percentage
- steal_from_next = -current_slack - steal_from_previous
- else:
- self._log(u" not enough total slack to completely fix")
- # consume all the available slack
- steal_from_previous = previous_slack
- steal_from_next = next_slack
- elif (previous_slack is not None) and (previous_slack > 0):
- self._log(u" can expand into previous only")
- if previous_slack + current_slack >= 0:
- self._log(u" enough previous slack to completely fix")
- steal_from_previous = -current_slack
- else:
- self._log(u" not enough previous slack to completely fix")
- steal_from_previous = previous_slack
- elif (next_slack is not None) and (next_slack > 0):
- self._log(u" can expand into next only")
- if next_slack + current_slack >= 0:
- self._log(u" enough next slack to completely fix")
- steal_from_next = -current_slack
+ Apply the given offset (negative, zero, or positive)
+ to all times.
+
+ :param offset: the offset, in seconds
+ :type offset: :class:`~aeneas.timevalue.TimeValue`
+ """
+ times = (self.boundary_indices * self.rconf.mws) + offset
+ if numpy.min(times) < TimeValue("0.000"):
+ self.log_warn(u"After applying offset some boundary times are negative")
+ if numpy.max(times) > self.real_wave_mfcc.audio_length:
+ self.log_warn(u"After applying offset some boundary times are beyond audio file duration")
+ times = numpy.clip(times, TimeValue("0.000"), self.real_wave_mfcc.audio_length)
+ self._times_to_intervals(times)
+
+ def _adjust_on_nonspeech(self, adjust_function):
+ """
+ Apply the adjust function to each boundary point
+ falling inside a nonspeech interval (extrema included).
+
+ The adjust function is not applied to a boundary index
+ if there are two or more boundary indices falling
+ inside the same nonspeech interval.
+
+ The adjust function is not applied to the last boundary index
+ to avoid anticipating the end of the audio file.
+
+ The adjust function takes three arguments: the begin and end
+ indices of the nonspeech interval, and the current boundary index,
+ and returns the new time value (in seconds) for that boundary.
+ """
+ self.log(u"Called _adjust_on_nonspeech")
+ mws = self.rconf.mws
+ nonspeech_intervals = self.real_wave_mfcc.intervals(speech=False, time=False)
+ #
+ # first iteration
+ # nonspeech_counter[i] is the number of boundary indices
+ # falling in the i-th nonspeech interval
+ #
+ self.log(u" First iteration...")
+ nonspeech_counter = numpy.zeros(len(nonspeech_intervals), dtype=int)
+ i = 0 # index of current boundary_index
+ j = 0 # index of current nonspeech_interval
+ while i < len(self.boundary_indices):
+ # current boundary index
+ cbi = self.boundary_indices[i]
+ # current nonspeech interval
+ # with the property that it ends at an index >= cbi - 1
+ while (j < len(nonspeech_intervals)) and (nonspeech_intervals[j][1] < cbi - 1):
+ j += 1
+ if j >= len(nonspeech_intervals):
+ break
+ cni = nonspeech_intervals[j]
+ self.log([u"FI Current boundary index: %d %.3f", cbi, cbi * mws])
+ self.log([u"FI Current nonspeech interval: %d %d", cni[0], cni[1]])
+ if (cbi - 1 >= cni[0]) and (cbi - 1 <= cni[1]):
+ self.log(u"FI Current boundary index is inside nonspeech")
+ nonspeech_counter[j] += 1
+ i += 1
+ self.log(u" First iteration... done")
+ #
+ # second iteration
+ # we adjust the time value only for those boundary indices that
+ # 1. fall within a nonspeech interval, and
+ # 2. are the only boundary index in that nonspeech interval and are not the last one
+ # all the other boundary indices are returned unchanged
+ #
+ self.log(u" Second iteration...")
+ times = numpy.zeros(len(self.boundary_indices), dtype=TimeValue)
+ i = 0
+ j = 0
+ while i < len(self.boundary_indices):
+ # current boundary index
+ cbi = self.boundary_indices[i]
+ # current nonspeech interval
+ # with the property that it ends at an index >= cbi - 1
+ while (j < len(nonspeech_intervals)) and (nonspeech_intervals[j][1] < cbi - 1):
+ j += 1
+ if j >= len(nonspeech_intervals):
+ break
+ cni = nonspeech_intervals[j]
+ self.log([u"SI Current boundary index: %d %.3f", cbi, cbi * mws])
+ self.log([u"SI Current nonspeech interval: %d %d", cni[0], cni[1]])
+ if (
+ (cbi - 1 >= cni[0]) and
+ (cbi - 1 <= cni[1]) and
+ (nonspeech_counter[j] == 1) and (i < len(self.boundary_indices) - 1)
+ ):
+ # falling inside and unique and not last => adjust
+ times[i] = adjust_function(cni[0], cni[1], cbi)
+ self.log([u"SI Adjusted cbi %d : %.3f => %.3f", cbi, cbi * mws, times[i]])
else:
- self._log(u" not enough next slack to completely fix")
- steal_from_next = next_slack
- else:
- self._log([u" fragment %d cannot be fixed", index])
-
- self._log([u" steal from previous: %.3f", steal_from_previous])
- self._log([u" steal from next: %.3f", steal_from_next])
- new_start = current_start - steal_from_previous
- new_end = current_end + steal_from_next
- if index - 1 >= 0:
- self.text_map[index - 1][1] = new_start
- self.text_map[index][0] = new_start
- self.text_map[index][1] = new_end
- if index + 1 < len(self.text_map):
- self.text_map[index + 1][0] = new_end
- new_rate = self._compute_rate(index)
- self._log([u" old: %.3f %.3f => %.3f", current_start, current_end, current_rate])
- self._log([u" new: %.3f %.3f => %.3f", new_start, new_end, new_rate])
+ # not falling inside or not unique or last => do not adjust
+ times[i] = cbi * mws
+ self.log([u"SI Not adjusted cbi %d : %.3f => %.3f", cbi, times[i], times[i]])
+ i += 1
+ while i < len(self.boundary_indices):
+ # complete with remaining indices
+ cbi = self.boundary_indices[i]
+ times[i] = cbi * mws
+ self.log([u"Not adjusting %d %.3f", cbi, times[i]])
+ i += 1
+ self.log(u" Second iteration... done")
+ self._times_to_intervals(times)
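The boundary post-processing added in this hunk (`_times_to_intervals`, `_apply_offset`, `_adjust_on_nonspeech`) can be approximated by the standalone sketch below. It is not the aeneas implementation: it uses plain floats instead of `TimeValue`, plain lists instead of NumPy arrays, and takes `mws`, `audio_length` and the nonspeech intervals as explicit arguments instead of reading them from `self.rconf` and `self.real_wave_mfcc`. The containment lookup is a simple linear scan rather than the two-pointer walk used in the diff; for sorted, disjoint nonspeech intervals both select the same interval. The `middle` adjust function is hypothetical, chosen only for illustration.

```python
from collections import Counter


def times_to_intervals(times, audio_length):
    """Pair consecutive times, then add a head [0, first] and a tail [last, audio_length]."""
    # Using times[0] and times[-1] (instead of intervals[0][0] and intervals[-1][1])
    # also handles a single boundary time without raising IndexError.
    intervals = [[times[i], times[i + 1]] for i in range(len(times) - 1)]
    return [[0.0, times[0]]] + intervals + [[times[-1], audio_length]]


def apply_offset(boundary_indices, mws, offset, audio_length):
    """Convert frame indices to seconds, shift by offset, clip to [0, audio_length]."""
    times = [index * mws + offset for index in boundary_indices]
    return [min(max(t, 0.0), audio_length) for t in times]


def adjust_on_nonspeech(boundary_indices, nonspeech_intervals, mws, adjust_function):
    """Move each boundary that is the only one inside its nonspeech interval
    (and is not the last boundary) to the time returned by adjust_function."""

    def containing_interval(cbi):
        # As in the diff, frame cbi - 1 is tested against [begin, end], extrema included.
        for j, (begin, end) in enumerate(nonspeech_intervals):
            if begin <= cbi - 1 <= end:
                return j
        return None

    # First pass: the nonspeech interval (if any) of each boundary,
    # and how many boundaries fall in each interval.
    hits = [containing_interval(cbi) for cbi in boundary_indices]
    counter = Counter(j for j in hits if j is not None)

    # Second pass: adjust only unique, non-last boundaries; keep the others as cbi * mws.
    last = len(boundary_indices) - 1
    times = []
    for i, (cbi, j) in enumerate(zip(boundary_indices, hits)):
        if j is not None and counter[j] == 1 and i < last:
            begin, end = nonspeech_intervals[j]
            times.append(adjust_function(begin, end, cbi))
        else:
            times.append(cbi * mws)
    return times


mws = 0.5


def middle(begin, end, cbi):
    # hypothetical adjust function: midpoint of the nonspeech interval, in seconds
    return (begin + end) / 2.0 * mws


times = adjust_on_nonspeech([2, 6, 7, 12], [[0, 2], [5, 7], [10, 11]], mws, middle)
print(times)                           # [0.5, 3.0, 3.5, 6.0]
print(times_to_intervals(times, 7.0))  # [[0.0, 0.5], [0.5, 3.0], [3.0, 3.5], [3.5, 6.0], [6.0, 7.0]]
```

In the example, only the first boundary is moved: indices 6 and 7 share the same nonspeech interval, and index 12 is the last boundary, so both rules from the docstring leave them at `cbi * mws`.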
diff --git a/aeneas/analyzecontainer.py b/aeneas/analyzecontainer.py
index 2bfea32f..9fcf6868 100644
--- a/aeneas/analyzecontainer.py
+++ b/aeneas/analyzecontainer.py
@@ -2,7 +2,13 @@
# coding=utf-8
"""
-Analyze a given container and build the corresponding job.
+This module contains the following classes:
+
+* :class:`~aeneas.analyzecontainer.AnalyzeContainer`
+ implementing functions to analyze a given container
+ and build the corresponding job object.
+
+.. warning:: This module might be refactored in a future version.
"""
from __future__ import absolute_import
@@ -13,7 +19,7 @@
from aeneas.container import Container
from aeneas.hierarchytype import HierarchyType
from aeneas.job import Job
-from aeneas.logger import Logger
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
from aeneas.task import Task
import aeneas.globalconstants as gc
@@ -26,40 +32,33 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class AnalyzeContainer(object):
+class AnalyzeContainer(Loggable):
"""
Analyze a given container and build the corresponding job.
:param container: the container to be analyzed
- :type container: :class:`aeneas.container.Container`
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :type container: :class:`~aeneas.container.Container`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
-
- :raise TypeError: if ``container`` is ``None`` or not an instance of ``Container``
+ :type logger: :class:`~aeneas.logger.Logger`
+ :raises: TypeError: if ``container`` is ``None`` or not an instance of :class:`~aeneas.container.Container`
"""
TAG = u"AnalyzeContainer"
def __init__(self, container, rconf=None, logger=None):
if container is None:
- raise TypeError("container is None")
+ raise TypeError(u"container is None")
if not isinstance(container, Container):
- raise TypeError("container is not an instance of Container")
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
+ raise TypeError(u"container is not an instance of Container")
+ super(AnalyzeContainer, self).__init__(rconf=rconf, logger=logger)
self.container = container
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
def analyze(self, config_string=None):
"""
Analyze the given container and
@@ -67,25 +66,23 @@ def analyze(self, config_string=None):
On error, it will return ``None``.
- :param config_string: the configuration string generated by wizard
- :type config_string: string
- :rtype: :class:`aeneas.job.Job`
+ :param string config_string: the configuration string generated by wizard
+ :rtype: :class:`~aeneas.job.Job` or ``None``
"""
try:
if config_string is not None:
- self._log(u"Analyzing container with the given config string")
+ self.log(u"Analyzing container with the given config string")
return self._analyze_txt_config(config_string=config_string)
elif self.container.has_config_xml:
- self._log(u"Analyzing container with XML config file")
+ self.log(u"Analyzing container with XML config file")
return self._analyze_xml_config(config_contents=None)
elif self.container.has_config_txt:
- self._log(u"Analyzing container with TXT config file")
+ self.log(u"Analyzing container with TXT config file")
return self._analyze_txt_config(config_string=None)
else:
- self._log(u"No configuration file in this container, returning None")
+ self.log(u"No configuration file in this container, returning None")
except (OSError, KeyError, TypeError) as exc:
- self._log(u"Error in analyze", Logger.CRITICAL)
- self._log([u"Message: %s", exc], Logger.CRITICAL)
+ self.log_exc(u"An unexpected error occurred while analyzing", exc, True, None)
return None
def _analyze_txt_config(self, config_string=None):
@@ -95,96 +92,95 @@ def _analyze_txt_config(self, config_string=None):
If ``config_string`` is ``None``,
try reading it from the TXT config file inside the container.
- :param config_string: the configuration string
- :type config_string: string
- :rtype: :class:`aeneas.job.Job`
+ :param string config_string: the configuration string
+ :rtype: :class:`~aeneas.job.Job`
"""
- self._log(u"Analyzing container with TXT config string")
+ self.log(u"Analyzing container with TXT config string")
if config_string is None:
- self._log(u"Analyzing container with TXT config file")
+ self.log(u"Analyzing container with TXT config file")
config_entry = self.container.entry_config_txt
- self._log([u"Found TXT config entry '%s'", config_entry])
+ self.log([u"Found TXT config entry '%s'", config_entry])
config_dir = os.path.dirname(config_entry)
- self._log([u"Directory of TXT config entry: '%s'", config_dir])
- self._log([u"Reading TXT config entry: '%s'", config_entry])
+ self.log([u"Directory of TXT config entry: '%s'", config_dir])
+ self.log([u"Reading TXT config entry: '%s'", config_entry])
config_contents = self.container.read_entry(config_entry)
- self._log(u"Converting config contents to config string")
+ self.log(u"Converting config contents to config string")
config_contents = gf.safe_unicode(config_contents)
config_string = gf.config_txt_to_string(config_contents)
else:
- self._log([u"Analyzing container with TXT config string '%s'", config_string])
+ self.log([u"Analyzing container with TXT config string '%s'", config_string])
config_dir = ""
- self._log(u"Creating the Job object")
+ self.log(u"Creating the Job object")
job = Job(config_string)
- self._log(u"Getting entries")
- entries = self.container.entries()
+ self.log(u"Getting entries")
+ entries = self.container.entries
- self._log(u"Converting config string into config dict")
+ self.log(u"Converting config string into config dict")
parameters = gf.config_string_to_dict(config_string)
- self._log(u"Calculating the path of the tasks root directory")
+ self.log(u"Calculating the path of the tasks root directory")
tasks_root_directory = gf.norm_join(
config_dir,
parameters[gc.PPN_JOB_IS_HIERARCHY_PREFIX]
)
- self._log([u"Path of the tasks root directory: '%s'", tasks_root_directory])
+ self.log([u"Path of the tasks root directory: '%s'", tasks_root_directory])
- self._log(u"Calculating the path of the sync map root directory")
+ self.log(u"Calculating the path of the sync map root directory")
sync_map_root_directory = gf.norm_join(
config_dir,
parameters[gc.PPN_JOB_OS_HIERARCHY_PREFIX]
)
job_os_hierarchy_type = parameters[gc.PPN_JOB_OS_HIERARCHY_TYPE]
- self._log([u"Path of the sync map root directory: '%s'", sync_map_root_directory])
+ self.log([u"Path of the sync map root directory: '%s'", sync_map_root_directory])
text_file_relative_path = parameters[gc.PPN_JOB_IS_TEXT_FILE_RELATIVE_PATH]
- self._log([u"Relative path for text file: '%s'", text_file_relative_path])
+ self.log([u"Relative path for text file: '%s'", text_file_relative_path])
text_file_name_regex = re.compile(r"" + parameters[gc.PPN_JOB_IS_TEXT_FILE_NAME_REGEX])
- self._log([u"Regex for text file: '%s'", parameters[gc.PPN_JOB_IS_TEXT_FILE_NAME_REGEX]])
+ self.log([u"Regex for text file: '%s'", parameters[gc.PPN_JOB_IS_TEXT_FILE_NAME_REGEX]])
audio_file_relative_path = parameters[gc.PPN_JOB_IS_AUDIO_FILE_RELATIVE_PATH]
- self._log([u"Relative path for audio file: '%s'", audio_file_relative_path])
+ self.log([u"Relative path for audio file: '%s'", audio_file_relative_path])
audio_file_name_regex = re.compile(r"" + parameters[gc.PPN_JOB_IS_AUDIO_FILE_NAME_REGEX])
- self._log([u"Regex for audio file: '%s'", parameters[gc.PPN_JOB_IS_AUDIO_FILE_NAME_REGEX]])
+ self.log([u"Regex for audio file: '%s'", parameters[gc.PPN_JOB_IS_AUDIO_FILE_NAME_REGEX]])
if parameters[gc.PPN_JOB_IS_HIERARCHY_TYPE] == HierarchyType.FLAT:
- self._log(u"Looking for text/audio pairs in flat hierarchy")
+ self.log(u"Looking for text/audio pairs in flat hierarchy")
text_files = self._find_files(
entries,
tasks_root_directory,
text_file_relative_path,
text_file_name_regex
)
- self._log([u"Found text files: '%s'", text_files])
+ self.log([u"Found text files: '%s'", text_files])
audio_files = self._find_files(
entries,
tasks_root_directory,
audio_file_relative_path,
audio_file_name_regex
)
- self._log([u"Found audio files: '%s'", audio_files])
+ self.log([u"Found audio files: '%s'", audio_files])
- self._log(u"Matching files in flat hierarchy...")
+ self.log(u"Matching files in flat hierarchy...")
matched_tasks = self._match_files_flat_hierarchy(
text_files,
audio_files
)
- self._log(u"Matching files in flat hierarchy... done")
+ self.log(u"Matching files in flat hierarchy... done")
for task_info in matched_tasks:
- self._log([u"Creating task: '%s'", str(task_info)])
+ self.log([u"Creating task: '%s'", str(task_info)])
task = self._create_task(
task_info,
config_string,
sync_map_root_directory,
job_os_hierarchy_type
)
- job.append_task(task)
+ job.add_task(task)
if parameters[gc.PPN_JOB_IS_HIERARCHY_TYPE] == HierarchyType.PAGED:
- self._log(u"Looking for text/audio pairs in paged hierarchy")
+ self.log(u"Looking for text/audio pairs in paged hierarchy")
# find all subdirectories of tasks_root_directory
# that match gc.PPN_JOB_IS_TASK_DIRECTORY_NAME_REGEX
matched_directories = self._match_directories(
@@ -198,7 +194,7 @@ def _analyze_txt_config(self, config_string=None):
tasks_root_directory,
matched_directory
)
- self._log([u"Looking for text/audio pairs in directory '%s'", matched_directory_full_path])
+ self.log([u"Looking for text/audio pairs in directory '%s'", matched_directory_full_path])
# look for text and audio files there
text_files = self._find_files(
@@ -207,38 +203,38 @@ def _analyze_txt_config(self, config_string=None):
text_file_relative_path,
text_file_name_regex
)
- self._log([u"Found text files: '%s'", text_files])
+ self.log([u"Found text files: '%s'", text_files])
audio_files = self._find_files(
entries,
matched_directory_full_path,
audio_file_relative_path,
audio_file_name_regex
)
- self._log([u"Found audio files: '%s'", audio_files])
+ self.log([u"Found audio files: '%s'", audio_files])
# if we have found exactly one text and one audio file,
# create a Task
if (len(text_files) == 1) and (len(audio_files) == 1):
- self._log([u"Exactly one text file and one audio file in '%s'", matched_directory])
+ self.log([u"Exactly one text file and one audio file in '%s'", matched_directory])
task_info = [
matched_directory,
text_files[0],
audio_files[0]
]
- self._log([u"Creating task: '%s'", str(task_info)])
+ self.log([u"Creating task: '%s'", str(task_info)])
task = self._create_task(
task_info,
config_string,
sync_map_root_directory,
job_os_hierarchy_type
)
- job.append_task(task)
+ job.add_task(task)
elif len(text_files) > 1:
- self._log([u"More than one text file in '%s'", matched_directory])
+ self.log([u"More than one text file in '%s'", matched_directory])
elif len(audio_files) > 1:
- self._log([u"More than one audio file in '%s'", matched_directory])
+ self.log([u"More than one audio file in '%s'", matched_directory])
else:
- self._log([u"No text nor audio file in '%s'", matched_directory])
+ self.log([u"No text nor audio file in '%s'", matched_directory])
return job
@@ -249,53 +245,52 @@ def _analyze_xml_config(self, config_contents=None):
If ``config_contents`` is ``None``,
try reading it from the XML config file inside the container.
- :param config_contents: the contents of the XML config file
- :type config_contents: string
- :rtype: :class:`aeneas.job.Job`
+ :param string config_contents: the contents of the XML config file
+ :rtype: :class:`~aeneas.job.Job`
"""
- self._log(u"Analyzing container with XML config string")
+ self.log(u"Analyzing container with XML config string")
if config_contents is None:
- self._log(u"Analyzing container with XML config file")
+ self.log(u"Analyzing container with XML config file")
config_entry = self.container.entry_config_xml
- self._log([u"Found XML config entry '%s'", config_entry])
+ self.log([u"Found XML config entry '%s'", config_entry])
config_dir = os.path.dirname(config_entry)
- self._log([u"Directory of XML config entry: '%s'", config_dir])
- self._log([u"Reading XML config entry: '%s'", config_entry])
+ self.log([u"Directory of XML config entry: '%s'", config_dir])
+ self.log([u"Reading XML config entry: '%s'", config_entry])
config_contents = self.container.read_entry(config_entry)
else:
- self._log(u"Analyzing container with XML config contents")
+ self.log(u"Analyzing container with XML config contents")
config_dir = ""
- self._log(u"Converting config contents into job config dict")
+ self.log(u"Converting config contents into job config dict")
job_parameters = gf.config_xml_to_dict(
config_contents,
result=None,
parse_job=True
)
- self._log(u"Converting config contents into tasks config dict")
+ self.log(u"Converting config contents into tasks config dict")
tasks_parameters = gf.config_xml_to_dict(
config_contents,
result=None,
parse_job=False
)
- self._log(u"Calculating the path of the sync map root directory")
+ self.log(u"Calculating the path of the sync map root directory")
sync_map_root_directory = gf.norm_join(
config_dir,
job_parameters[gc.PPN_JOB_OS_HIERARCHY_PREFIX]
)
job_os_hierarchy_type = job_parameters[gc.PPN_JOB_OS_HIERARCHY_TYPE]
- self._log([u"Path of the sync map root directory: '%s'", sync_map_root_directory])
+ self.log([u"Path of the sync map root directory: '%s'", sync_map_root_directory])
- self._log(u"Converting job config dict into job config string")
+ self.log(u"Converting job config dict into job config string")
config_string = gf.config_dict_to_string(job_parameters)
job = Job(config_string)
for task_parameters in tasks_parameters:
- self._log(u"Converting task config dict into task config string")
+ self.log(u"Converting task config dict into task config string")
config_string = gf.config_dict_to_string(task_parameters)
- self._log([u"Creating task with config string '%s'", config_string])
+ self.log([u"Creating task with config string '%s'", config_string])
try:
custom_id = task_parameters[gc.PPN_TASK_CUSTOM_ID]
except KeyError:
@@ -311,14 +306,14 @@ def _analyze_xml_config(self, config_contents=None):
task_parameters[gc.PPN_TASK_IS_AUDIO_FILE_XML]
)
]
- self._log([u"Creating task: '%s'", str(task_info)])
+ self.log([u"Creating task: '%s'", str(task_info)])
task = self._create_task(
task_info,
config_string,
sync_map_root_directory,
job_os_hierarchy_type
)
- job.append_task(task)
+ job.add_task(task)
return job
@@ -335,67 +330,64 @@ def _create_task(
1. the ``task_info`` found analyzing the container entries, and
2. the given ``config_string``.
- :param task_info: the task information: ``[prefix, text_path, audio_path]``
- :type task_info: list of strings
- :param config_string: the configuration string
- :type config_string: string
- :param sync_map_root_directory: the root directory for the sync map files
- :type sync_map_root_directory: string (path)
+ :param list task_info: the task information: ``[prefix, text_path, audio_path]``
+ :param string config_string: the configuration string
+ :param string sync_map_root_directory: the root directory for the sync map files
:param job_os_hierarchy_type: type of job output hierarchy
- :type job_os_hierarchy_type: :class:`aeneas.hierarchytype.HierarchyType`
- :rtype: :class:`aeneas.task.Task`
+ :type job_os_hierarchy_type: :class:`~aeneas.hierarchytype.HierarchyType`
+ :rtype: :class:`~aeneas.task.Task`
"""
- self._log(u"Converting config string to config dict")
+ self.log(u"Converting config string to config dict")
parameters = gf.config_string_to_dict(config_string)
- self._log(u"Creating task")
+ self.log(u"Creating task")
task = Task(config_string, logger=self.logger)
task.configuration["description"] = "Task %s" % task_info[0]
- self._log([u"Task description: %s", task.configuration["description"]])
+ self.log([u"Task description: %s", task.configuration["description"]])
try:
task.configuration["language"] = parameters[gc.PPN_TASK_LANGUAGE]
- self._log([u"Set language from task: '%s'", task.configuration["language"]])
+ self.log([u"Set language from task: '%s'", task.configuration["language"]])
except KeyError:
task.configuration["language"] = parameters[gc.PPN_JOB_LANGUAGE]
- self._log([u"Set language from job: '%s'", task.configuration["language"]])
+ self.log([u"Set language from job: '%s'", task.configuration["language"]])
custom_id = task_info[0]
task.configuration["custom_id"] = custom_id
- self._log([u"Task custom_id: %s", task.configuration["custom_id"]])
+ self.log([u"Task custom_id: %s", task.configuration["custom_id"]])
task.text_file_path = task_info[1]
- self._log([u"Task text file path: %s", task.text_file_path])
+ self.log([u"Task text file path: %s", task.text_file_path])
task.audio_file_path = task_info[2]
- self._log([u"Task audio file path: %s", task.audio_file_path])
+ self.log([u"Task audio file path: %s", task.audio_file_path])
task.sync_map_file_path = self._compute_sync_map_file_path(
sync_map_root_directory,
job_os_hierarchy_type,
custom_id,
task.configuration["o_name"]
)
- self._log([u"Task sync map file path: %s", task.sync_map_file_path])
+ self.log([u"Task sync map file path: %s", task.sync_map_file_path])
- self._log(u"Replacing placeholder in os_file_smil_audio_ref")
+ self.log(u"Replacing placeholder in os_file_smil_audio_ref")
task.configuration["o_smil_audio_ref"] = self._replace_placeholder(
task.configuration["o_smil_audio_ref"],
custom_id
)
- self._log(u"Replacing placeholder in os_file_smil_page_ref")
+ self.log(u"Replacing placeholder in os_file_smil_page_ref")
task.configuration["o_smil_page_ref"] = self._replace_placeholder(
task.configuration["o_smil_page_ref"],
custom_id
)
- self._log(u"Returning task")
+ self.log(u"Returning task")
return task
def _replace_placeholder(self, string, custom_id):
"""
Replace the prefix placeholder
- :class:`aeneas.globalconstants.PPV_OS_TASK_PREFIX`
+ :class:`~aeneas.globalconstants.PPV_OS_TASK_PREFIX`
with ``custom_id`` and return the resulting string.
:rtype: string
"""
if string is None:
return None
- self._log([u"Replacing '%s' with '%s' in '%s'", gc.PPV_OS_TASK_PREFIX, custom_id, string])
+ self.log([u"Replacing '%s' with '%s' in '%s'", gc.PPV_OS_TASK_PREFIX, custom_id, string])
return string.replace(gc.PPV_OS_TASK_PREFIX, custom_id)
def _compute_sync_map_file_path(
@@ -408,16 +400,13 @@ def _compute_sync_map_file_path(
"""
Compute the sync map file path inside the output container.
- :param root: the root of the sync map files inside the container
- :type root: string (path)
+ :param string root: the root of the sync map files inside the container
:param job_os_hierarchy_type: type of job output hierarchy
- :type job_os_hierarchy_type: :class:`aeneas.hierarchytype.HierarchyType`
- :param custom_id: the task custom id (flat) or
- page directory name (paged)
- :type custom_id: string
- :param file_name: the output file name for the sync map
- :type file_name: string
- :rtype: string (path)
+ :type job_os_hierarchy_type: :class:`~aeneas.hierarchytype.HierarchyType`
+ :param string custom_id: the task custom id (flat) or
+ page directory name (paged)
+ :param string file_name: the output file name for the sync map
+ :rtype: string
"""
prefix = root
if hierarchy_type == HierarchyType.PAGED:
@@ -432,34 +421,30 @@ def _find_files(self, entries, root, relative_path, file_name_regex):
1. are in ``root/relative_path``, and
2. match ``file_name_regex``.
- :param entries: the list of entries (file paths) in the container
- :type entries: list of strings (path)
- :param root: the root directory of the container
- :type root: string (path)
- :param relative_path: the relative path in which we must search
- :type relative_path: string (path)
- :param file_name_regex: the regex matching the desired file names
- :type file_name_regex: regex
+ :param list entries: the list of entries (file paths) in the container
+ :param string root: the root directory of the container
+ :param string relative_path: the relative path in which we must search
+ :param regex file_name_regex: the regex matching the desired file names
:rtype: list of strings (path)
"""
- self._log([u"Finding files within root: '%s'", root])
+ self.log([u"Finding files within root: '%s'", root])
target = root
if relative_path is not None:
- self._log([u"Joining relative path: '%s'", relative_path])
+ self.log([u"Joining relative path: '%s'", relative_path])
target = gf.norm_join(root, relative_path)
- self._log([u"Finding files within target: '%s'", target])
+ self.log([u"Finding files within target: '%s'", target])
files = []
target_len = len(target)
for entry in entries:
if entry.startswith(target):
- self._log([u"Examining entry: '%s'", entry])
+ self.log([u"Examining entry: '%s'", entry])
entry_suffix = entry[target_len + 1:]
- self._log([u"Examining entry suffix: '%s'", entry_suffix])
+ self.log([u"Examining entry suffix: '%s'", entry_suffix])
if re.search(file_name_regex, entry_suffix) is not None:
- self._log([u"Match: '%s'", entry])
+ self.log([u"Match: '%s'", entry])
files.append(entry)
else:
- self._log([u"No match: '%s'", entry])
+ self.log([u"No match: '%s'", entry])
return sorted(files)
def _match_files_flat_hierarchy(self, text_files, audio_files):
@@ -477,32 +462,30 @@ def _match_files_flat_hierarchy(self, text_files, audio_files):
foo/res/c.txt foo/res/c.mp3 => match: ["c", "foo/res/c.txt", "foo/res/c.mp3"]
foo/res/d.txt foo/res/e.mp3 => no match
- :param text_files: the entries corresponding to text files
- :type text_files: list of strings (path)
- :param audio_files: the entries corresponding to audio files
- :type audio_files: list of strings (path)
+ :param list text_files: the entries corresponding to text files
+ :param list audio_files: the entries corresponding to audio files
:rtype: list of lists (see above)
"""
- self._log(u"Matching files in flat hierarchy")
- self._log([u"Text files: '%s'", text_files])
- self._log([u"Audio files: '%s'", audio_files])
+ self.log(u"Matching files in flat hierarchy")
+ self.log([u"Text files: '%s'", text_files])
+ self.log([u"Audio files: '%s'", audio_files])
d_text = {}
d_audio = {}
for text_file in text_files:
text_file_no_ext = gf.file_name_without_extension(text_file)
d_text[text_file_no_ext] = text_file
- self._log([u"Added text file '%s' to key '%s'", text_file, text_file_no_ext])
+ self.log([u"Added text file '%s' to key '%s'", text_file, text_file_no_ext])
for audio_file in audio_files:
audio_file_no_ext = gf.file_name_without_extension(audio_file)
d_audio[audio_file_no_ext] = audio_file
- self._log([u"Added audio file '%s' to key '%s'", audio_file, audio_file_no_ext])
+ self.log([u"Added audio file '%s' to key '%s'", audio_file, audio_file_no_ext])
tasks = []
for key in d_text.keys():
- self._log([u"Examining text key '%s'", key])
+ self.log([u"Examining text key '%s'", key])
if key in d_audio:
- self._log([u"Key '%s' is also in audio", key])
+ self.log([u"Key '%s' is also in audio", key])
tasks.append([key, d_text[key], d_audio[key]])
- self._log([u"Added pair ('%s', '%s')", d_text[key], d_audio[key]])
+ self.log([u"Added pair ('%s', '%s')", d_text[key], d_audio[key]])
return tasks
def _match_directories(self, entries, root, regex_string):
@@ -525,24 +508,21 @@ def _match_directories(self, entries, root, regex_string):
=> ["/foo/bar/1", "/foo/bar/2", "/foo/bar/3"]
- :param entries: the list of entries (paths) of a container
- :type entries: list of strings (paths)
- :param root: the root directory to search within
- :type root: string (path)
- :param regex_string: regex string to match directory names
- :type regex_string: string
+ :param list entries: the list of entries (paths) of a container
+ :param string root: the root directory to search within
+ :param string regex_string: regex string to match directory names
:rtype: list of matched directories
"""
- self._log(u"Matching directory names in paged hierarchy")
- self._log([u"Matching within '%s'", root])
- self._log([u"Matching regex '%s'", regex_string])
+ self.log(u"Matching directory names in paged hierarchy")
+ self.log([u"Matching within '%s'", root])
+ self.log([u"Matching regex '%s'", regex_string])
regex = re.compile(r"" + regex_string)
directories = set()
root_len = len(root)
for entry in entries:
# look only inside root dir
if entry.startswith(root):
- self._log([u"Examining '%s'", entry])
+ self.log([u"Examining '%s'", entry])
# remove common prefix root/
entry = entry[root_len + 1:]
# split path
@@ -551,9 +531,9 @@ def _match_directories(self, entries, root, regex_string):
if ((len(entry_splitted) >= 2) and
(re.match(regex, entry_splitted[0]) is not None)):
directories.add(entry_splitted[0])
- self._log([u"Match: '%s'", entry_splitted[0]])
+ self.log([u"Match: '%s'", entry_splitted[0]])
else:
- self._log([u"No match: '%s'", entry])
+ self.log([u"No match: '%s'", entry])
return sorted(directories)
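In the flat hierarchy branch of `_analyze_txt_config`, `_find_files` keeps the entries under `root/relative_path` whose suffix matches a regex, and `_match_files_flat_hierarchy` pairs text and audio files by base name. The sketch below reproduces that flow on plain lists. It assumes that `gf.file_name_without_extension` strips both the directory and the extension (consistent with the `foo/res/c.txt` => `"c"` example in the docstring) and replaces it with a local `os.path` helper; the target directory is passed already joined instead of going through `gf.norm_join`.

```python
import os
import re


def file_name_without_extension(path):
    # Local stand-in for gf.file_name_without_extension (assumed behavior).
    return os.path.splitext(os.path.basename(path))[0]


def find_files(entries, target, file_name_regex):
    """Return the sorted entries under target whose suffix matches the regex."""
    files = []
    for entry in entries:
        if entry.startswith(target):
            # drop "target/" and test the remaining suffix, as _find_files does
            suffix = entry[len(target) + 1:]
            if re.search(file_name_regex, suffix) is not None:
                files.append(entry)
    return sorted(files)


def match_files_flat_hierarchy(text_files, audio_files):
    """Pair text and audio files sharing the same base name: [key, text, audio]."""
    d_text = {file_name_without_extension(f): f for f in text_files}
    d_audio = {file_name_without_extension(f): f for f in audio_files}
    # sorted() makes the task order deterministic; the diff iterates d_text.keys()
    return [[key, d_text[key], d_audio[key]] for key in sorted(d_text) if key in d_audio]


entries = [
    "job/res/c.txt", "job/res/c.mp3",
    "job/res/d.txt", "job/res/e.mp3",
    "other/x.txt",
]
text_files = find_files(entries, "job/res", r"\.txt$")
audio_files = find_files(entries, "job/res", r"\.mp3$")
print(match_files_flat_hierarchy(text_files, audio_files))
# [['c', 'job/res/c.txt', 'job/res/c.mp3']]
```

As in the docstring example, `d.txt` and `e.mp3` have no counterpart with the same base name, so only one task is produced.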
diff --git a/aeneas/audiofile.py b/aeneas/audiofile.py
index 3c4f8dfd..25638fcc 100644
--- a/aeneas/audiofile.py
+++ b/aeneas/audiofile.py
@@ -2,7 +2,14 @@
# coding=utf-8
"""
-A class representing an audio file.
+This module contains the following classes:
+
+* :class:`~aeneas.audiofile.AudioFile`, representing an audio file;
+* :class:`~aeneas.audiofile.AudioFileConverterError`,
+* :class:`~aeneas.audiofile.AudioFileNotInitializedError`,
+* :class:`~aeneas.audiofile.AudioFileProbeError`, and
+* :class:`~aeneas.audiofile.AudioFileUnsupportedFormatError`,
+ representing errors generated by audio files.
"""
from __future__ import absolute_import
@@ -14,9 +21,11 @@
from aeneas.ffprobewrapper import FFPROBEPathError
from aeneas.ffprobewrapper import FFPROBEUnsupportedFormatError
from aeneas.ffprobewrapper import FFPROBEWrapper
-from aeneas.logger import Logger
-from aeneas.mfcc import MFCC
+from aeneas.ffmpegwrapper import FFMPEGPathError
+from aeneas.ffmpegwrapper import FFMPEGWrapper
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
+from aeneas.timevalue import TimeValue
from aeneas.wavfile import read as scipywavread
from aeneas.wavfile import write as scipywavwrite
import aeneas.globalfunctions as gf
@@ -28,13 +37,31 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
+class AudioFileConverterError(Exception):
+ """
+ Error raised when the audio converter executable cannot be executed.
+ """
+ pass
+
+
+
+class AudioFileNotInitializedError(Exception):
+ """
+ Error raised when trying to access audio samples from
+ an :class:`~aeneas.audiofile.AudioFile` object which
+ has not been initialized yet.
+ """
+ pass
+
+
+
class AudioFileProbeError(Exception):
"""
- Error raised when the probe executable cannot be executed.
+ Error raised when the audio probe executable cannot be executed.
"""
pass
@@ -48,39 +75,61 @@ class AudioFileUnsupportedFormatError(Exception):
-class AudioFile(object):
+class AudioFile(Loggable):
"""
A class representing an audio file.
+ This class can be used either to extract properties
+ from an audio file on disk,
+ or to load/edit/save a monaural (single-channel) audio file,
+ represented as an array of audio samples.
+
The properties of the audio file (length, format, etc.)
- are set by invoking the ``read_properties()`` function,
+ can be set by invoking the :func:`~aeneas.audiofile.AudioFile.read_properties` function,
which calls an audio file probe.
- (Currently, the probe is :class:`aeneas.ffprobewrapper.FFPROBEWrapper`)
-
- :param file_path: the path of the audio file
- :type file_path: Unicode string (path)
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ (Currently, the probe is :class:`~aeneas.ffprobewrapper.FFPROBEWrapper`)
+
+ Moreover, this class can read the audio data,
+ by converting the original file format
+ into a temporary PCM16 Mono WAVE (RIFF) file,
+ which is deleted as soon as the audio data has been read into memory.
+ (Currently, the converter is :class:`~aeneas.ffmpegwrapper.FFMPEGWrapper`)
+
+ The internal representation of the wave is
+ a NumPy 1D array of ``float64`` values in ``[-1.0, 1.0]``.
+ It supports append, reverse, and trim operations.
+ Audio samples can be written to file.
+ Memory can be pre-allocated to speed up append operations.
+ When an append operation requires more memory than is available,
+ the buffer is reallocated to twice the required size;
+ this makes the total cost of a sequence of append operations
+ linear in the number of appended audio samples
+ (amortized constant time per sample).
+
+ .. note:: Support for stereo WAVE files might be implemented in a future version
+
+ :param string file_path: the path of the audio file
+ :param bool is_mono_wave: set to ``True`` if the audio file is a PCM16 mono WAVE file
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
+ :type logger: :class:`~aeneas.logger.Logger`
"""
TAG = u"AudioFile"
- def __init__(self, file_path=None, rconf=None, logger=None):
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
+ def __init__(self, file_path=None, is_mono_wave=False, rconf=None, logger=None):
+ super(AudioFile, self).__init__(rconf=rconf, logger=logger)
self.file_path = file_path
self.file_size = None
+ self.is_mono_wave = is_mono_wave
self.audio_length = None
self.audio_format = None
self.audio_sample_rate = None
self.audio_channels = None
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
+ self.__samples_capacity = 0
+ self.__samples_length = 0
+ self.__samples = None
def __unicode__(self):
msg = [
@@ -90,6 +139,8 @@ def __unicode__(self):
u"Audio format: %s" % self.audio_format,
u"Audio sample rate: %s" % gf.safe_int(self.audio_sample_rate),
u"Audio channels: %s" % gf.safe_int(self.audio_channels),
+ u"Samples capacity: %s" % gf.safe_int(self.__samples_capacity),
+ u"Samples length: %s" % gf.safe_int(self.__samples_length),
]
return u"\n".join(msg)
@@ -101,7 +152,7 @@ def file_path(self):
"""
The path of the audio file.
- :rtype: Unicode string
+ :rtype: string
"""
return self.__file_path
@file_path.setter
@@ -125,7 +176,7 @@ def audio_length(self):
"""
The length of the audio file, in seconds.
- :rtype: float
+ :rtype: :class:`~aeneas.timevalue.TimeValue`
"""
return self.__audio_length
@audio_length.setter
@@ -137,7 +188,7 @@ def audio_format(self):
"""
The format of the audio file.
- :rtype: Unicode string
+ :rtype: string
"""
return self.__audio_format
@audio_format.setter
@@ -168,225 +219,245 @@ def audio_channels(self):
def audio_channels(self, audio_channels):
self.__audio_channels = audio_channels
+ @property
+ def audio_samples(self):
+ """
+ The audio samples, that is, an array of ``float64`` values,
+ each representing an audio sample in ``[-1.0, 1.0]``.
+
+ Note that this property returns a view into the
+ first ``self.__samples_length`` elements of ``self.__samples``.
+ If you want to clone the values,
+ you must use e.g. ``numpy.array(audiofile.audio_samples)``.
+
+ :rtype: :class:`numpy.ndarray` (1D, view)
+ :raises: :class:`~aeneas.audiofile.AudioFileNotInitializedError`: if the audio file is not initialized yet
+ """
+ if self.__samples is None:
+ if self.file_path is None:
+ self.log_exc(u"AudioFile object not initialized", None, True, AudioFileNotInitializedError)
+ else:
+ self.read_samples_from_file()
+ return self.__samples[0:self.__samples_length]
+
def read_properties(self):
"""
Populate this object by reading
the audio properties of the file at the given path.
Currently this function uses
- :class:`aeneas.ffprobewrapper.FFPROBEWrapper`
+ :class:`~aeneas.ffprobewrapper.FFPROBEWrapper`
to get the audio file properties.
- :raises AudioFileProbeError: if the path to the ``ffprobe`` executable cannot be called
- :raises AudioFileUnsupportedFormatError: if the audio file has a format not supported
- :raises OSError: if the audio file cannot be read
+ :raises: :class:`~aeneas.audiofile.AudioFileProbeError`: if the ``ffprobe`` executable cannot be called
+ :raises: :class:`~aeneas.audiofile.AudioFileUnsupportedFormatError`: if the audio file format is not supported
+ :raises: OSError: if the audio file cannot be read
"""
-
- self._log(u"Reading properties...")
+ self.log(u"Reading properties...")
# check the file can be read
if not gf.file_can_be_read(self.file_path):
- self._log([u"File '%s' cannot be read", self.file_path], Logger.CRITICAL)
- raise OSError(u"File '%s' cannot be read" % self.file_path)
+ self.log_exc(u"File '%s' cannot be read" % (self.file_path), None, True, OSError)
# get the file size
- self._log([u"Getting file size for '%s'", self.file_path])
+ self.log([u"Getting file size for '%s'", self.file_path])
self.file_size = gf.file_size(self.file_path)
- self._log([u"File size for '%s' is '%d'", self.file_path, self.file_size])
+ self.log([u"File size for '%s' is '%d'", self.file_path, self.file_size])
# get the audio properties using FFPROBEWrapper
try:
- self._log(u"Reading properties with FFPROBEWrapper...")
- properties = FFPROBEWrapper(rconf=self.rconf, logger=self.logger).read_properties(self.file_path)
- self._log(u"Reading properties with FFPROBEWrapper... done")
+ self.log(u"Reading properties with FFPROBEWrapper...")
+ properties = FFPROBEWrapper(
+ rconf=self.rconf,
+ logger=self.logger
+ ).read_properties(self.file_path)
+ self.log(u"Reading properties with FFPROBEWrapper... done")
except FFPROBEPathError:
- self._log(u"Reading properties with FFPROBEWrapper... failed", Logger.CRITICAL)
- self._log(u"Unable to call ffprobe executable", Logger.CRITICAL)
- raise AudioFileProbeError("Unable to call the audio probe executable")
- except FFPROBEUnsupportedFormatError:
- self._log(u"Reading properties with FFPROBEWrapper... failed", Logger.CRITICAL)
- self._log(u"Unsupported audio file format", Logger.CRITICAL)
- raise AudioFileUnsupportedFormatError("Unsupported audio file format")
- except FFPROBEParsingError:
- self._log(u"Reading properties with FFPROBEWrapper... failed", Logger.CRITICAL)
- self._log(u"Failed while parsing the ffprobe output", Logger.CRITICAL)
- raise AudioFileUnsupportedFormatError("Unsupported audio file format")
+ self.log_exc(u"Unable to call ffprobe executable", None, True, AudioFileProbeError)
+ except (FFPROBEUnsupportedFormatError, FFPROBEParsingError):
+ self.log_exc(u"Audio file format not supported by ffprobe", None, True, AudioFileUnsupportedFormatError)
# save relevant properties in results inside the audiofile object
- self.audio_length = gf.safe_float(properties[FFPROBEWrapper.STDOUT_DURATION])
+ self.audio_length = TimeValue(properties[FFPROBEWrapper.STDOUT_DURATION])
self.audio_format = properties[FFPROBEWrapper.STDOUT_CODEC_NAME]
self.audio_sample_rate = gf.safe_int(properties[FFPROBEWrapper.STDOUT_SAMPLE_RATE])
self.audio_channels = gf.safe_int(properties[FFPROBEWrapper.STDOUT_CHANNELS])
- self._log([u"Stored audio_length: '%s'", self.audio_length])
- self._log([u"Stored audio_format: '%s'", self.audio_format])
- self._log([u"Stored audio_sample_rate: '%s'", self.audio_sample_rate])
- self._log([u"Stored audio_channels: '%s'", self.audio_channels])
- self._log(u"Reading properties... done")
-
-
-
-class AudioFileMonoWAVE(AudioFile):
- """
- A monoaural (single-channel) WAVE audio file.
-
- Its data can be read from and write to file, set from a ``numpy`` 1D array.
-
- It supports append, prepend, reverse, and trim operations.
-
- It can also extract MFCCs and store them internally,
- also after the audio data has been discarded.
-
- NOTE
- At the moment, the state of this object might be inconsistent
- (e.g., setting a new path after loading audio data will not flush the audio data).
- Use this class with care.
-
- :param file_path: the path of the audio file
- :type file_path: Unicode string (path)
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
- :param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
- """
-
- TAG = u"AudioFileMonoWAVE"
-
- def __init__(self, file_path=None, rconf=None, logger=None):
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
- self.audio_data = None
- self.audio_mfcc = None
- AudioFile.__init__(self, file_path=file_path, rconf=rconf, logger=logger)
-
- @property
- def audio_data(self):
- """
- The audio data.
-
- :rtype: numpy 1D array
- """
- return self.__audio_data
- @audio_data.setter
- def audio_data(self, audio_data):
- self.__audio_data = audio_data
-
- @property
- def audio_mfcc(self):
- """
- The MFCCs of the audio file.
+ self.log([u"Stored audio_length: '%s'", self.audio_length])
+ self.log([u"Stored audio_format: '%s'", self.audio_format])
+ self.log([u"Stored audio_sample_rate: '%s'", self.audio_sample_rate])
+ self.log([u"Stored audio_channels: '%s'", self.audio_channels])
+ self.log(u"Reading properties... done")
- :rtype: numpy 2D array
+ def read_samples_from_file(self):
"""
- return self.__audio_mfcc
+ Load the audio samples from file into memory.
- @audio_mfcc.setter
- def audio_mfcc(self, audio_mfcc):
- self.__audio_mfcc = audio_mfcc
+ If ``self.is_mono_wave`` is ``False``,
+ the file will be first converted
+ to a temporary PCM16 mono WAVE file.
+ Audio data will be read from this temporary file,
+ which will then be deleted from disk immediately.
- def load_data(self):
- """
- Load the audio file data.
+ If ``self.is_mono_wave`` is ``True``,
+ the audio data will be read directly
+ from the given file,
+ which will not be deleted from disk.
- :raises AudioFileUnsupportedFormatError: if the audio file is not a mono WAVE file
- :raises OSError: if the audio file cannot be read
+ :raises: :class:`~aeneas.audiofile.AudioFileConverterError`: if the ``ffmpeg`` executable cannot be called
+ :raises: :class:`~aeneas.audiofile.AudioFileUnsupportedFormatError`: if the audio file format is not supported
+ :raises: OSError: if the audio file cannot be read
"""
- self._log(u"Loading audio data...")
+ self.log(u"Loading audio data...")
# check the file can be read
if not gf.file_can_be_read(self.file_path):
- self._log([u"File '%s' cannot be read", self.file_path], Logger.CRITICAL)
- raise OSError("File '%s' cannot be read" % self.file_path)
+ self.log_exc(u"File '%s' cannot be read" % (self.file_path), None, True, OSError)
+ # convert file to PCM16 mono WAVE
+ if self.is_mono_wave:
+ self.log(u"is_mono_wave=True => reading self.file_path directly")
+ tmp_handler = None
+ tmp_file_path = self.file_path
+ else:
+ self.log(u"is_mono_wave=False => converting self.file_path")
+ tmp_handler, tmp_file_path = gf.tmp_file(suffix=u".wav", root=self.rconf[RuntimeConfiguration.TMP_PATH])
+ self.log([u"Temporary PCM16 mono WAVE file: '%s'", tmp_file_path])
+ try:
+ self.log(u"Converting audio file to mono...")
+ converter = FFMPEGWrapper(rconf=self.rconf, logger=self.logger)
+ converter.convert(self.file_path, tmp_file_path)
+ self.log(u"Converting audio file to mono... done")
+ except FFMPEGPathError:
+ gf.delete_file(tmp_handler, tmp_file_path)
+ self.log_exc(u"Unable to call ffmpeg executable", None, True, AudioFileConverterError)
+ except OSError:
+ gf.delete_file(tmp_handler, tmp_file_path)
+ self.log_exc(u"Audio file format not supported by ffmpeg", None, True, AudioFileUnsupportedFormatError)
+
+ # TODO allow calling C extension cwave to read samples faster
try:
self.audio_format = "pcm16"
- self.audio_sample_rate, self.audio_data = scipywavread(self.file_path)
+ self.audio_channels = 1
+ self.audio_sample_rate, self.__samples = scipywavread(tmp_file_path)
# scipy reads a sample as an int16_t, that is, a number in [-32768, 32767]
# so we convert it to a float64 in [-1, 1]
- self.audio_data = self.audio_data.astype("float64") / 32768
+ self.__samples = self.__samples.astype("float64") / 32768
+ self.__samples_capacity = len(self.__samples)
+ self.__samples_length = self.__samples_capacity
+ self._update_length()
except ValueError:
- self._log(u"Unsupported audio file format", Logger.CRITICAL)
- raise AudioFileUnsupportedFormatError("Unsupported audio file format")
+ self.log_exc(u"Audio format not supported by scipywavread", None, True, AudioFileUnsupportedFormatError)
+
+ if not self.is_mono_wave:
+ gf.delete_file(tmp_handler, tmp_file_path)
+ self.log([u"Deleted temporary PCM16 mono WAVE file: '%s'", tmp_file_path])
self._update_length()
- self._log([u"Sample length: %f", self.audio_length])
- self._log([u"Sample rate: %f", self.audio_sample_rate])
- self._log([u"Audio format: %s", self.audio_format])
- self._log(u"Loading audio data... done")
+ self.log([u"Sample length: %.3f", self.audio_length])
+ self.log([u"Sample rate: %d", self.audio_sample_rate])
+ self.log([u"Audio format: %s", self.audio_format])
+ self.log([u"Audio channels: %d", self.audio_channels])
+ self.log(u"Loading audio data... done")
- def append_data(self, new_data):
+ def preallocate_memory(self, capacity):
"""
- Append the given new data to the current audio data.
+ Preallocate memory to store audio samples,
+ to avoid repeated new allocations and copies
+ while performing several consecutive append operations.
- If audio data is not loaded, create an empty data structure
- and then append to it.
+ If ``self.__samples`` is not initialized,
+ it will become an array of ``capacity`` zeros.
- :param new_data: the new data to be appended
- :type new_data: numpy 1D array
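+ If ``capacity`` is larger than the current capacity,
+ the current ``self.__samples`` will be enlarged.
+ (``numpy.resize`` fills the new slots by repeating the existing values,
+ not with zeros; these slots lie past the current samples length
+ and stay unused until later append operations overwrite them.)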
- .. versionadded:: 1.2.1
+ If ``capacity`` is smaller than the current capacity,
+ the first ``capacity`` values of ``self.__samples``
+ will be retained.
+
+ :param int capacity: the new capacity, in number of samples
+ :raises: ValueError: if ``capacity`` is negative
+
+ .. versionadded:: 1.5.0
"""
- self._log(u"Appending audio data...")
- self._audio_data_is_initialized(load=False)
- self.audio_data = numpy.append(self.audio_data, new_data)
- self._update_length()
- self._log(u"Appending audio data... done")
+ if capacity < 0:
+ raise ValueError(u"The capacity value cannot be negative")
+ if self.__samples is None:
+ self.log(u"Not initialized")
+ self.__samples = numpy.zeros(capacity)
+ self.__samples_length = 0
+ else:
+ self.log([u"Previous sample length was (samples): %d", self.__samples_length])
+ self.log([u"Previous sample capacity was (samples): %d", self.__samples_capacity])
+ self.__samples = numpy.resize(self.__samples, capacity)
+ self.__samples_length = min(self.__samples_length, capacity)
+ self.__samples_capacity = capacity
+ self.log([u"Current sample capacity is (samples): %d", self.__samples_capacity])
+
+ def minimize_memory(self):
+ """
+ Reduce the allocated memory to the minimum
+ required to store the current audio samples.
+
+ This function is meant to be called
+ when building a wave incrementally,
+ after the last append operation.
- def prepend_data(self, new_data):
+ .. versionadded:: 1.5.0
"""
- Prepend the given new data to the current audio data.
+ if self.__samples is None:
+ self.log(u"Not initialized, returning")
+ else:
+ self.log(u"Initialized, minimizing memory...")
+ self.preallocate_memory(self.__samples_length)
+ self.log(u"Initialized, minimizing memory... done")
+
+ def add_samples(self, samples, reverse=False):
+ """
+ Append the given new samples to the current audio data.
- If audio data is not loaded, create an empty data structure
- and then preppend to it.
+ This function initializes the memory if no audio data
+ is present already.
- :param new_data: the new data to be prepended
- :type new_data: numpy 1D array
+ If ``reverse`` is ``True``, the new samples
+ will be reversed and then concatenated.
+
+ :param samples: the new samples to be concatenated
+ :type samples: :class:`numpy.ndarray` (1D)
+ :param bool reverse: if ``True``, concatenate new samples after reversing them
.. versionadded:: 1.2.1
"""
- self._log(u"Prepending audio data...")
- self._audio_data_is_initialized(load=False)
- self.audio_data = numpy.append(new_data, self.audio_data)
- self._update_length()
- self._log(u"Prepending audio data... done")
-
- def extract_mfcc(self):
- """
- Extract MFCCs from the given audio file.
-
- If audio data is not loaded, load it, extract MFCCs,
- store them internally, and discard the audio data immediately.
-
- :raise RuntimeError: if both the C extension and
- the pure Python code did not succeed.
- """
- had_audio_data = self._audio_data_is_initialized(load=True)
- gf.run_c_extension_with_fallback(
- self._log,
- "cmfcc",
- self._compute_mfcc_c_extension,
- self._compute_mfcc_pure_python,
- (),
- c_extension=self.rconf["c_ext"]
- )
- if not had_audio_data:
- self._log(u"Audio data was not loaded, clearing it")
- self.clear_data()
+ self.log(u"Adding samples...")
+ samples_length = len(samples)
+ current_length = self.__samples_length
+ future_length = current_length + samples_length
+ if (self.__samples is None) or (self.__samples_capacity < future_length):
+ self.preallocate_memory(2 * future_length)
+ if reverse:
+ self.__samples[current_length:future_length] = samples[::-1]
else:
- self._log(u"Audio data was loaded, not clearing it")
+ self.__samples[current_length:future_length] = samples[:]
+ self.__samples_length = future_length
+ self._update_length()
+ self.log(u"Adding samples... done")
def reverse(self):
"""
Reverse the audio data.
- If audio data is not loaded, load it and then reverse it.
+ :raises: :class:`~aeneas.audiofile.AudioFileNotInitializedError`: if the audio file is not initialized yet
.. versionadded:: 1.2.0
"""
- self._log(u"Reversing...")
- self._audio_data_is_initialized(load=True)
- self.audio_data = self.audio_data[::-1]
- self._log(u"Reversing... done")
+ if self.__samples is None:
+ if self.file_path is None:
+ self.log_exc(u"AudioFile object not initialized", None, True, AudioFileNotInitializedError)
+ else:
+ self.read_samples_from_file()
+ self.log(u"Reversing...")
+ self.__samples[0:self.__samples_length] = numpy.flipud(self.__samples[0:self.__samples_length])
+ self.log(u"Reversing... done")
def trim(self, begin=None, length=None):
"""
@@ -396,62 +467,72 @@ def trim(self, begin=None, length=None):
If audio data is not loaded, load it and then slice it.
:param begin: the start position, in seconds
- :type begin: float
+ :type begin: :class:`~aeneas.timevalue.TimeValue`
:param length: the position, in seconds
- :type length: float
+ :type length: :class:`~aeneas.timevalue.TimeValue`
+ :raises: TypeError: if one of the arguments is not ``None``
+ or :class:`~aeneas.timevalue.TimeValue`
.. versionadded:: 1.2.0
"""
- self._log(u"Trimming...")
+ for variable, name in [(begin, "begin"), (length, "length")]:
+ if (variable is not None) and (not isinstance(variable, TimeValue)):
+ raise TypeError(u"%s is not None or TimeValue" % name)
+ self.log(u"Trimming...")
if (begin is None) and (length is None):
- self._log(u"begin and length are both None: nothing to do")
+ self.log(u"begin and length are both None: nothing to do")
else:
- self._audio_data_is_initialized(load=True)
- self._log([u"audio_length is %.3f", self.audio_length])
if begin is None:
- begin = 0
- self._log([u"begin was None, now set to %.3f", begin])
- begin = min(max(0, begin), self.audio_length)
- self._log([u"begin is %.3f", begin])
+ begin = TimeValue("0.000")
+ self.log([u"begin was None, now set to %.3f", begin])
+ begin = min(max(TimeValue("0.000"), begin), self.audio_length)
+ self.log([u"begin is %.3f", begin])
if length is None:
length = self.audio_length - begin
- self._log([u"length was None, now set to %.3f", length])
- length = min(max(0, length), self.audio_length - begin)
- self._log([u"length is %.3f", length])
+ self.log([u"length was None, now set to %.3f", length])
+ length = min(max(TimeValue("0.000"), length), self.audio_length - begin)
+ self.log([u"length is %.3f", length])
begin_index = int(begin * self.audio_sample_rate)
end_index = int((begin + length) * self.audio_sample_rate)
- self.audio_data = self.audio_data[begin_index:end_index]
+ new_idx = end_index - begin_index
+ self.__samples[0:new_idx] = self.__samples[begin_index:end_index]
+ self.__samples_length = new_idx
self._update_length()
- self._log(u"Trimming... done")
+ self.log(u"Trimming... done")
def write(self, file_path):
"""
Write the audio data to file.
Return ``True`` on success, or ``False`` otherwise.
- :param file_path: the path of the output file to be written
- :type file_path: Unicode string (path)
+ :param string file_path: the path of the output file to be written
+ :raises: :class:`~aeneas.audiofile.AudioFileNotInitializedError`: if the audio file is not initialized yet
.. versionadded:: 1.2.0
"""
- self._log([u"Writing audio file '%s'...", file_path])
- self._audio_data_is_initialized(load=False)
+ if self.__samples is None:
+ if self.file_path is None:
+ self.log_exc(u"AudioFile object not initialized", None, True, AudioFileNotInitializedError)
+ else:
+ self.read_samples_from_file()
+ self.log([u"Writing audio file '%s'...", file_path])
try:
# our value is a float64 in [-1, 1]
# scipy writes the sample as an int16_t, that is, a number in [-32768, 32767]
- data = (self.audio_data * 32768).astype("int16")
+ data = (self.audio_samples * 32768).astype("int16")
scipywavwrite(file_path, self.audio_sample_rate, data)
- except:
- self._log(u"Error writing audio file", severity=Logger.CRITICAL)
- raise OSError("Error writing audio file to '%s'" % file_path)
- self._log([u"Writing audio file '%s'... done", file_path])
+ except Exception as exc:
+ self.log_exc(u"Error writing audio file to '%s'" % (file_path), exc, True, OSError)
+ self.log([u"Writing audio file '%s'... done", file_path])
def clear_data(self):
"""
Clear the audio data, freeing memory.
"""
- self._log(u"Clear audio_data")
- self.audio_data = None
+ self.log(u"Clear audio_data")
+ self.__samples_capacity = 0
+ self.__samples_length = 0
+ self.__samples = None
def _update_length(self):
"""
@@ -459,81 +540,10 @@ def _update_length(self):
according to the length of the current audio data
and audio sample rate.
- This function fails silently if one of the two is None.
- """
- if (self.audio_sample_rate is not None) and (self.audio_data is not None):
- self.audio_length = len(self.audio_data) / self.audio_sample_rate
-
- def _audio_data_is_initialized(self, load=True):
- """
- Check if audio data is loaded:
- if so, return True.
-
- Otherwise, either load or initialize the audio data
- and return False.
-
- :param load: if True, load from file; if False, initialize to empty
- :type load: bool
- :rtype: bool
- """
- if self.audio_data is not None:
- self._log(u"audio data is not None: returning True")
- return True
- if load:
- self._log(u"No audio data: loading it from file")
- self.load_data()
- else:
- self._log(u"No audio data: initializing it to an empty data structure")
- self.audio_data = numpy.array([])
- self._log(u"audio data was None: returning False")
- return False
-
- def _compute_mfcc_c_extension(self):
+ This function fails silently if one of the two is ``None``.
"""
- Compute MFCCs using the Python C extension cmfcc.
- """
- self._log(u"Computing MFCCs using C extension...")
- try:
- self._log(u"Importing cmfcc...")
- import aeneas.cmfcc.cmfcc
- self._log(u"Importing cmfcc... done")
- self.audio_mfcc = (aeneas.cmfcc.cmfcc.compute_from_data(
- self.audio_data,
- self.audio_sample_rate,
- self.rconf["mfcc_filters"],
- self.rconf["mfcc_size"],
- self.rconf["mfcc_order"],
- self.rconf["mfcc_lower_freq"],
- self.rconf["mfcc_upper_freq"],
- self.rconf["mfcc_emph"],
- self.rconf["mfcc_win_len"],
- self.rconf["mfcc_win_shift"]
- )[0]).transpose()
- self._log(u"Computing MFCCs using C extension... done")
- return (True, None)
- except Exception as exc:
- self._log(u"Computing MFCCs using C extension... failed")
- self._log(u"An unexpected exception occurred while running cmfcc:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
- return (False, None)
-
- def _compute_mfcc_pure_python(self):
- """
- Compute MFCCs using the pure Python code.
- """
- self._log(u"Computing MFCCs using pure Python code...")
- try:
- self.audio_mfcc = MFCC(
- rconf=self.rconf,
- logger=self.logger
- ).compute_from_data(self.audio_data, self.audio_sample_rate).transpose()
- self._log(u"Computing MFCCs using pure Python code... done")
- return (True, None)
- except Exception as exc:
- self._log(u"Computing MFCCs using pure Python code... failed")
- self._log(u"An unexpected exception occurred while running pure Python code:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
- return (False, None)
+ if (self.audio_sample_rate is not None) and (self.__samples is not None):
+ self.audio_length = TimeValue(self.__samples_length / self.audio_sample_rate)
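The append strategy in `AudioFile` works in three steps. `preallocate_memory` reserves space, `add_samples` grows the buffer to twice the required size when it runs out of room, and `minimize_memory` shrinks it back afterwards. This strategy can be illustrated without NumPy. This is a minimal sketch under stated assumptions:

* the class `GrowableSamples` and its methods are hypothetical;
* a plain Python list stands in for the NumPy array;
* growth copies the used samples into a fresh zero-filled list.

By contrast, the `numpy.resize` call in `preallocate_memory` fills new slots by repeating the existing data.

```python
class GrowableSamples(object):
    """
    Hypothetical stand-in for the preallocate/append/minimize logic
    of AudioFile, using a plain list instead of a NumPy array.
    """

    def __init__(self):
        self.samples = None  # backing store, None until first allocation
        self.capacity = 0    # number of allocated slots
        self.length = 0      # number of slots holding real samples

    def preallocate(self, capacity):
        if capacity < 0:
            raise ValueError("The capacity value cannot be negative")
        # fresh zero-filled store, keeping at most `capacity` used samples
        new_samples = [0.0] * capacity
        if self.samples is not None:
            self.length = min(self.length, capacity)
            new_samples[0:self.length] = self.samples[0:self.length]
        else:
            self.length = 0
        self.samples = new_samples
        self.capacity = capacity

    def add(self, new_samples, reverse=False):
        future_length = self.length + len(new_samples)
        if (self.samples is None) or (self.capacity < future_length):
            # grow to twice the required size: amortized O(1) per sample
            self.preallocate(2 * future_length)
        chunk = list(new_samples)
        if reverse:
            chunk.reverse()
        self.samples[self.length:future_length] = chunk
        self.length = future_length

    def minimize(self):
        # shrink the store to exactly the used length
        if self.samples is not None:
            self.preallocate(self.length)

    def view(self):
        # a copy here; AudioFile.audio_samples returns a NumPy view instead
        return self.samples[0:self.length]


buf = GrowableSamples()
buf.add([1.0, 2.0])
buf.add([3.0])
buf.add([4.0, 5.0], reverse=True)
print(buf.capacity, buf.view())
buf.minimize()
print(buf.capacity, buf.view())
```

In this usage, the buffer grows only twice, first to capacity 4 and then to capacity 10. The second append fits into the existing slots, so no reallocation happens there. After `minimize()`, the capacity matches the 5 stored samples. Each reallocation copies at most as many samples as will be appended before the next one, which is why the total cost stays linear.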
diff --git a/aeneas/audiofilemfcc.py b/aeneas/audiofilemfcc.py
new file mode 100644
index 00000000..2dd5efb4
--- /dev/null
+++ b/aeneas/audiofilemfcc.py
@@ -0,0 +1,637 @@
+#!/usr/bin/env python
+# coding=utf-8
+
+"""
+This module contains the following classes:
+
+* :class:`~aeneas.audiofilemfcc.AudioFileMFCC`,
+ representing a mono WAVE audio file as a matrix of
+ Mel-frequency cepstral coefficients (MFCC).
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import numpy
+
+from aeneas.audiofile import AudioFile
+from aeneas.logger import Loggable
+from aeneas.mfcc import MFCC
+from aeneas.runtimeconfiguration import RuntimeConfiguration
+from aeneas.timevalue import TimeValue
+from aeneas.vad import VAD
+import aeneas.globalfunctions as gf
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL v3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+class AudioFileMFCC(Loggable):
+ """
+ A monaural (single-channel) WAVE audio file,
+ represented as a NumPy 2D matrix of
+ Mel-frequency cepstral coefficients (MFCC).
+
+ The matrix is "fat", that is,
+ its number of rows is equal to the number of MFCC coefficients
+ and its number of columns is equal to the number of window shifts
+ in the audio file.
+ The number of MFCC coefficients and the MFCC window shift can
+ be modified via the
+ :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_SIZE`
+ and
+ :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_WINDOW_SHIFT`
+ keys in the ``rconf`` object.
+
+ If ``mfcc_matrix`` is not ``None``,
+ it will be used as the MFCC matrix.
+
+ If ``file_path`` or ``audio_file`` is not ``None``,
+ the MFCCs will be computed upon creation of the object,
+ possibly converting to PCM16 Mono WAVE and/or
+ loading audio data into memory.
+
+ The MFCCs for the entire wave
+ are divided into three
+ contiguous intervals (possibly zero-length)::
+
+ HEAD = [:middle_begin[
+ MIDDLE = [middle_begin:middle_end[
+ TAIL = [middle_end:[
+
+ The usual NumPy convention of including the left/start index
+ and excluding the right/end index is adopted.
+
+ For alignment purposes, only the ``MIDDLE`` portion of the wave
+ is taken into account; the ``HEAD`` and ``TAIL`` intervals are ignored.
+
+ This class heavily uses NumPy views and in-place operations
+ to avoid creating temporary data or copying data around.
+
+ :param string file_path: the path of the audio file, or ``None``
+ :param bool file_path_is_mono_wave: set to ``True`` if the audio file at ``file_path`` is a PCM16 mono WAVE file
+ :param mfcc_matrix: the MFCC matrix to be set, or ``None``
+ :type mfcc_matrix: :class:`numpy.ndarray`
+ :param audio_file: an audio file, or ``None``
+ :type audio_file: :class:`~aeneas.audiofile.AudioFile`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param logger: the logger object
+ :type logger: :class:`~aeneas.logger.Logger`
+ :raises: ValueError: if ``file_path``, ``audio_file``, and ``mfcc_matrix`` are all ``None``
+
+ .. versionadded:: 1.5.0
+ """
+
+ TAG = u"AudioFileMFCC"
+
+ def __init__(
+ self,
+ file_path=None,
+ file_path_is_mono_wave=False,
+ mfcc_matrix=None,
+ audio_file=None,
+ rconf=None,
+ logger=None
+ ):
+ if (file_path is None) and (audio_file is None) and (mfcc_matrix is None):
+ raise ValueError(u"You must initialize with at least one of: file_path, audio_file, or mfcc_matrix")
+ super(AudioFileMFCC, self).__init__(rconf=rconf, logger=logger)
+ self.file_path = file_path
+ self.audio_file = audio_file
+ self.is_reversed = False
+ self.__mfcc = None
+ self.__mfcc_mask = None
+ self.__mfcc_mask_map = None
+ self.__speech_intervals = None
+ self.__nonspeech_intervals = None
+ self.log(u"Initializing MFCCs...")
+ if mfcc_matrix is not None:
+ self.__mfcc = mfcc_matrix
+ self.audio_length = self.all_length * self.rconf.mws
+ elif (self.file_path is not None) or (self.audio_file is not None):
+ audio_file_was_none = False
+ if self.audio_file is None:
+ audio_file_was_none = True
+ self.audio_file = AudioFile(
+ self.file_path,
+ is_mono_wave=file_path_is_mono_wave,
+ rconf=self.rconf,
+ logger=self.logger
+ )
+ # NOTE load audio samples into memory, if not present already
+ self.audio_file.audio_samples
+ gf.run_c_extension_with_fallback(
+ self.log,
+ "cmfcc",
+ self._compute_mfcc_c_extension,
+ self._compute_mfcc_pure_python,
+ (),
+ c_extension=self.rconf[RuntimeConfiguration.C_EXTENSIONS]
+ )
+ self.audio_length = self.audio_file.audio_length
+ if audio_file_was_none:
+ self.log(u"Clearing the audio data...")
+ self.audio_file.clear_data()
+ self.audio_file = None
+ self.log(u"Clearing the audio data... done")
+ self.__middle_begin = 0
+ self.__middle_end = self.__mfcc.shape[1]
+ self.log(u"Initializing MFCCs... done")
+
+ def __unicode__(self):
+ msg = [
+ u"File path: %s" % self.file_path,
+ u"Audio length (s): %s" % gf.safe_float(self.audio_length),
+ ]
+ return u"\n".join(msg)
+
+ def __str__(self):
+ return gf.safe_str(self.__unicode__())
+
+ @property
+ def all_mfcc(self):
+ """
+ The MFCCs of the entire audio file,
+ that is, HEAD + MIDDLE + TAIL.
+
+ :rtype: :class:`numpy.ndarray` (2D)
+ """
+ return self.__mfcc
+
+ @property
+ def all_length(self):
+ """
+ The length, in MFCC frames,
+ of the entire audio file,
+ that is, HEAD + MIDDLE + TAIL.
+
+ :rtype: int
+ """
+ return self.__mfcc.shape[1]
+
+ @property
+ def middle_mfcc(self):
+ """
+ The MFCCs of the middle part of the audio file,
+ that is, without HEAD and TAIL.
+
+ :rtype: :class:`numpy.ndarray` (2D)
+ """
+ return self.__mfcc[:, self.__middle_begin:self.__middle_end]
+
+ @property
+ def middle_length(self):
+ """
+ The length, in MFCC frames,
+ of the middle part of the audio file,
+ that is, without HEAD and TAIL.
+
+ :rtype: int
+ """
+ return self.__middle_end - self.__middle_begin
+
+ @property
+ def middle_map(self):
+ """
+ Return the map
+ from the MFCC frame indices
+ in the MIDDLE portion of the wave
+ to the MFCC FULL frame indices,
+ that is, a ``numpy.arange(self.middle_begin, self.middle_end)``.
+
+ NOTE: to translate indices of MIDDLE,
+ instead of using fancy indexing with the
+ result of this function, you might want to simply
+ add ``self.head_length``.
+ This function is provided mostly for consistency
+ with the MASKED case.
+
+ :rtype: :class:`numpy.ndarray` (1D)
+ """
+ return numpy.arange(self.__middle_begin, self.__middle_end)
+
+ @property
+ def head_length(self):
+ """
+ The length, in MFCC frames,
+ of the HEAD of the audio file.
+
+ :rtype: int
+ """
+ return self.__middle_begin
+
+ @property
+ def tail_length(self):
+ """
+ The length, in MFCC frames,
+ of the TAIL of the audio file.
+
+ :rtype: int
+ """
+ return self.all_length - self.__middle_end
+
+ @property
+ def tail_begin(self):
+ """
+ The index, in MFCC frames,
+ where the TAIL of the audio file starts.
+
+ :rtype: int
+ """
+ return self.__middle_end
+
+ @property
+ def audio_length(self):
+ """
+ The length, in seconds, of the audio file.
+
+ This value is the actual length of the audio file,
+ computed as ``number of samples / sample_rate``,
+ hence it might differ from ``self.all_length * mfcc_window_shift``.
+
+ :rtype: :class:`~aeneas.timevalue.TimeValue`
+ """
+ return self.__audio_length
+ @audio_length.setter
+ def audio_length(self, audio_length):
+ self.__audio_length = audio_length
+
+ @property
+ def is_reversed(self):
+ """
+ Return ``True`` if currently reversed.
+
+ :rtype: bool
+ """
+ return self.__is_reversed
+ @is_reversed.setter
+ def is_reversed(self, is_reversed):
+ self.__is_reversed = is_reversed
+
+ @property
+ def masked_mfcc(self):
+ """
+ Return the MFCC speech frames
+ in the FULL wave.
+
+ :rtype: :class:`numpy.ndarray` (2D)
+ """
+ self._ensure_mfcc_mask()
+ return self.__mfcc[:, self.__mfcc_mask]
+
+ @property
+ def masked_length(self):
+ """
+ Return the number of MFCC speech frames
+ in the FULL wave.
+
+ :rtype: int
+ """
+ self._ensure_mfcc_mask()
+ return len(self.__mfcc_mask_map)
+
+ @property
+ def masked_map(self):
+ """
+ Return the map
+ from the MFCC speech frame indices
+ to the MFCC FULL frame indices.
+
+ :rtype: :class:`numpy.ndarray` (1D)
+ """
+ self._ensure_mfcc_mask()
+ return self.__mfcc_mask_map
+
+ @property
+ def masked_middle_mfcc(self):
+ """
+ Return the MFCC speech frames
+ in the MIDDLE portion of the wave.
+
+ :rtype: :class:`numpy.ndarray` (2D)
+ """
+ begin, end = self._masked_middle_begin_end()
+ return (self.masked_mfcc)[:, begin:end]
+
+ @property
+ def masked_middle_length(self):
+ """
+ Return the number of MFCC speech frames
+ in the MIDDLE portion of the wave.
+
+ :rtype: int
+ """
+ begin, end = self._masked_middle_begin_end()
+ return end - begin
+
+ @property
+ def masked_middle_map(self):
+ """
+ Return the map
+ from the MFCC speech frame indices
+ in the MIDDLE portion of the wave
+ to the MFCC FULL frame indices.
+
+ :rtype: :class:`numpy.ndarray` (1D)
+ """
+ begin, end = self._masked_middle_begin_end()
+ return self.__mfcc_mask_map[begin:end]
+
+ def _masked_middle_begin_end(self):
+ """
+ Return the begin and end indices w.r.t. ``self.__mfcc_mask_map``,
+ corresponding to indices in the MIDDLE portion of the wave,
+ that is, which fall between ``self.__middle_begin`` and
+ ``self.__middle_end`` in ``self.__mfcc``.
+
+ :rtype: (int, int)
+ """
+ self._ensure_mfcc_mask()
+ begin = numpy.searchsorted(self.__mfcc_mask_map, self.__middle_begin, side="left")
+ end = numpy.searchsorted(self.__mfcc_mask_map, self.__middle_end, side="left")
+ return (begin, end)
+
+ def intervals(self, speech=True, time=True):
+ """
+ Return a list of intervals::
+
+ [(b_1, e_1), (b_2, e_2), ..., (b_k, e_k)]
+
+ where ``b_i`` is the time when the ``i``-th interval begins,
+ and ``e_i`` is the time when it ends.
+
+ :param bool speech: if ``True``, return speech intervals,
+ otherwise return nonspeech intervals
+ :param bool time: if ``True``, return values in seconds (:class:`~aeneas.timevalue.TimeValue`),
+ otherwise in indices (int)
+ :rtype: list of pairs (see above)
+ """
+ self._ensure_mfcc_mask()
+ if speech:
+ self.log(u"Converting speech runs to intervals")
+ intervals = self.__speech_intervals
+ else:
+ self.log(u"Converting nonspeech runs to intervals")
+ intervals = self.__nonspeech_intervals
+ if time:
+ mws = self.rconf.mws
+ return [(i[0] * mws, (i[1] + 1) * mws) for i in intervals]
+ return intervals
+
+ def inside_nonspeech(self, index):
+ """
+ If ``index`` is contained in a nonspeech interval,
+ return a pair ``(interval_begin, interval_end)``
+ such that ``interval_begin <= index < interval_end``,
+ i.e., ``interval_end`` is assumed not to be included.
+
+ Otherwise, return ``None``.
+
+ :rtype: ``None`` or tuple
+ """
+ self._ensure_mfcc_mask()
+ if (index < 0) or (index >= self.all_length) or (self.__mfcc_mask[index]):
+ return None
+ return self._binary_search_intervals(self.__nonspeech_intervals, index)
+
+ @classmethod
+ def _binary_search_intervals(cls, intervals, index):
+ """
+ Binary search for the interval containing index,
+ assuming there is such an interval.
+ This function should never return ``None``.
+ """
+ start = 0
+ end = len(intervals) - 1
+ while start <= end:
+ middle_index = start + ((end - start) // 2)
+ middle = intervals[middle_index]
+ if (middle[0] <= index) and (index < middle[1]):
+ return middle
+ elif middle[0] > index:
+ end = middle_index - 1
+ else:
+ start = middle_index + 1
+ return None
+
+ @property
+ def middle_begin(self):
+ """
+ Return the index where MIDDLE starts.
+
+ :rtype: int
+ """
+ return self.__middle_begin
+
+ @middle_begin.setter
+ def middle_begin(self, index):
+ """
+ Set the index where MIDDLE starts.
+
+ :param int index: the new index for MIDDLE begin
+ """
+ if (index < 0) or (index > self.all_length):
+ raise ValueError(u"The given index is not valid")
+ self.__middle_begin = index
+
+ @property
+ def middle_begin_seconds(self):
+ """
+ Return the time instant, in seconds, where MIDDLE starts.
+
+ :rtype: :class:`~aeneas.timevalue.TimeValue`
+ """
+ return TimeValue(self.__middle_begin) * self.rconf.mws
+
+ @property
+ def middle_end(self):
+ """
+ Return the index (+1) where MIDDLE ends.
+
+ :rtype: int
+ """
+ return self.__middle_end
+
+ @middle_end.setter
+ def middle_end(self, index):
+ """
+ Set the index (+1) where MIDDLE ends.
+
+ :param int index: the new index for MIDDLE end
+ """
+ if (index < 0) or (index > self.all_length):
+ raise ValueError(u"The given index is not valid")
+ self.__middle_end = index
+
+ @property
+ def middle_end_seconds(self):
+ """
+ Return the time instant, in seconds, where MIDDLE ends.
+
+ :rtype: :class:`~aeneas.timevalue.TimeValue`
+ """
+ return TimeValue(self.__middle_end) * self.rconf.mws
+
+ def _ensure_mfcc_mask(self):
+ """
+ Ensure that ``run_vad()`` has already been called,
+ and hence ``self.__mfcc_mask`` has a meaningful value.
+ """
+ if self.__mfcc_mask is None:
+ self.log(u"VAD was not run: running it now")
+ self.run_vad()
+
+ def _compute_mfcc_c_extension(self):
+ """
+ Compute MFCCs using the Python C extension cmfcc.
+ """
+ self.log(u"Computing MFCCs using C extension...")
+ try:
+ self.log(u"Importing cmfcc...")
+ import aeneas.cmfcc.cmfcc
+ self.log(u"Importing cmfcc... done")
+ self.__mfcc = (aeneas.cmfcc.cmfcc.compute_from_data(
+ self.audio_file.audio_samples,
+ self.audio_file.audio_sample_rate,
+ self.rconf[RuntimeConfiguration.MFCC_FILTERS],
+ self.rconf[RuntimeConfiguration.MFCC_SIZE],
+ self.rconf[RuntimeConfiguration.MFCC_FFT_ORDER],
+ self.rconf[RuntimeConfiguration.MFCC_LOWER_FREQUENCY],
+ self.rconf[RuntimeConfiguration.MFCC_UPPER_FREQUENCY],
+ self.rconf[RuntimeConfiguration.MFCC_EMPHASIS_FACTOR],
+ self.rconf[RuntimeConfiguration.MFCC_WINDOW_LENGTH],
+ self.rconf[RuntimeConfiguration.MFCC_WINDOW_SHIFT]
+ )[0]).transpose()
+ self.log(u"Computing MFCCs using C extension... done")
+ return (True, None)
+ except Exception as exc:
+ self.log_exc(u"An unexpected error occurred while running cmfcc", exc, False, None)
+ return (False, None)
+
+ def _compute_mfcc_pure_python(self):
+ """
+ Compute MFCCs using the pure Python code.
+ """
+ self.log(u"Computing MFCCs using pure Python code...")
+ try:
+ self.__mfcc = MFCC(
+ rconf=self.rconf,
+ logger=self.logger
+ ).compute_from_data(
+ self.audio_file.audio_samples,
+ self.audio_file.audio_sample_rate
+ ).transpose()
+ self.log(u"Computing MFCCs using pure Python code... done")
+ return (True, None)
+ except Exception as exc:
+ self.log_exc(u"An unexpected error occurred while running pure Python code", exc, False, None)
+ return (False, None)
+
+ def reverse(self):
+ """
+ Reverse the audio file.
+
+ The reversing is done efficiently using NumPy views in place,
+ instead of swapping values.
+
+ Only speech and nonspeech intervals are actually recomputed
+ as Python lists.
+ """
+ self.log(u"Reversing...")
+ all_length = self.all_length
+ self.__mfcc = self.__mfcc[:, ::-1]
+ tmp = self.__middle_end
+ self.__middle_end = all_length - self.__middle_begin
+ self.__middle_begin = all_length - tmp
+ if self.__mfcc_mask is not None:
+ self.__mfcc_mask = self.__mfcc_mask[::-1]
+ # equivalent to
+ # self.__mfcc_mask_map = ((all_length - 1) - self.__mfcc_mask_map)[::-1]
+ # but done in place using NumPy view
+ self.__mfcc_mask_map *= -1
+ self.__mfcc_mask_map += all_length - 1
+ self.__mfcc_mask_map = self.__mfcc_mask_map[::-1]
+ self.__speech_intervals = [(all_length - i[1], all_length - i[0]) for i in self.__speech_intervals[::-1]]
+ self.__nonspeech_intervals = [(all_length - i[1], all_length - i[0]) for i in self.__nonspeech_intervals[::-1]]
+ self.is_reversed = not self.is_reversed
+ self.log(u"Reversing...done")
+
+ def run_vad(self):
+ """
+ Determine which frames contain speech and nonspeech,
+ and store the resulting boolean mask internally.
+ """
+ def _compute_runs(array):
+ """
+ Compute runs as a list of arrays,
+ each containing the indices of a contiguous run.
+
+ :param array: the data array
+ :type array: :class:`numpy.ndarray` (1D)
+ :rtype: list of :class:`numpy.ndarray` (1D)
+ """
+ if len(array) < 1:
+ return []
+ return numpy.split(array, numpy.where(numpy.diff(array) != 1)[0] + 1)
+ self.log(u"Creating VAD object")
+ vad = VAD(rconf=self.rconf, logger=self.logger)
+ self.log(u"Running VAD...")
+ self.__mfcc_mask = vad.run_vad(self.__mfcc[0])
+ self.__mfcc_mask_map = (numpy.where(self.__mfcc_mask))[0]
+ self.log(u"Running VAD... done")
+ self.log(u"Storing speech and nonspeech intervals...")
+ # where( == True) already computed, reusing
+ #runs = _compute_runs((numpy.where(self.__mfcc_mask))[0])
+ runs = _compute_runs(self.__mfcc_mask_map)
+ self.__speech_intervals = [(r[0], r[-1]) for r in runs]
+ # where( == False) not already computed, computing now
+ runs = _compute_runs((numpy.where(~self.__mfcc_mask))[0])
+ self.__nonspeech_intervals = [(r[0], r[-1]) for r in runs]
+ self.log(u"Storing speech and nonspeech intervals... done")
+
+ def set_head_middle_tail(self, head_length=None, middle_length=None, tail_length=None):
+ """
+ Set the HEAD, MIDDLE, TAIL explicitly.
+
+ If a parameter is ``None``, it will be ignored.
+ If both ``middle_length`` and ``tail_length`` are specified,
+ only ``middle_length`` will be applied.
+
+ :param head_length: the length of HEAD, in seconds
+ :type head_length: :class:`~aeneas.timevalue.TimeValue`
+ :param middle_length: the length of MIDDLE, in seconds
+ :type middle_length: :class:`~aeneas.timevalue.TimeValue`
+ :param tail_length: the length of TAIL, in seconds
+ :type tail_length: :class:`~aeneas.timevalue.TimeValue`
+ :raises: TypeError: if one of the arguments is not ``None``
+ or :class:`~aeneas.timevalue.TimeValue`
+ """
+ for variable, name in [
+ (head_length, "head_length"),
+ (middle_length, "middle_length"),
+ (tail_length, "tail_length")
+ ]:
+ if (variable is not None) and (not isinstance(variable, TimeValue)):
+ raise TypeError(u"%s is not None or TimeValue" % name)
+ self.log(u"Setting head middle tail...")
+ mws = self.rconf.mws
+ self.log([u"Before: 0 %d %d %d", self.middle_begin, self.middle_end, self.all_length])
+ if head_length is not None:
+ self.middle_begin = int(head_length / mws)
+ if middle_length is not None:
+ self.middle_end = self.middle_begin + int(middle_length / mws)
+ elif tail_length is not None:
+ self.middle_end = self.all_length - int(tail_length / mws)
+ self.log([u"After: 0 %d %d %d", self.middle_begin, self.middle_end, self.all_length])
+ self.log(u"Setting head middle tail... done")
+
+
+
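A minimal, standalone NumPy sketch of the interval bookkeeping performed by `run_vad()`, `intervals()` and `_masked_middle_begin_end()` in `AudioFileMFCC` follows. It illustrates the idea under stated assumptions and is not the aeneas code. The function names are hypothetical. The boolean VAD mask is passed in directly instead of being produced by `VAD.run_vad()`. The window shift `mws` is a plain float rather than a `TimeValue`. Intervals are `(first, last)` pairs with an inclusive last frame, as built by `run_vad()`. Following the `middle_end` docstring, `middle_end` is treated as exclusive.

```python
import numpy


def compute_runs(indices):
    """Split a sorted 1D array of frame indices into runs of consecutive values."""
    if len(indices) < 1:
        return []
    # a new run starts right after every position where the gap is not exactly 1
    return numpy.split(indices, numpy.where(numpy.diff(indices) != 1)[0] + 1)


def mask_to_intervals(mask):
    """Return (speech, nonspeech) lists of inclusive (first, last) frame pairs."""
    speech_runs = compute_runs(numpy.where(mask)[0])
    nonspeech_runs = compute_runs(numpy.where(~mask)[0])
    speech = [(int(r[0]), int(r[-1])) for r in speech_runs]
    nonspeech = [(int(r[0]), int(r[-1])) for r in nonspeech_runs]
    return speech, nonspeech


def intervals_to_seconds(intervals, mws):
    """Convert inclusive frame intervals to (begin, end) times in seconds."""
    # the last frame is inclusive, so the interval ends one window shift after it
    return [(b * mws, (e + 1) * mws) for (b, e) in intervals]


def masked_middle_begin_end(mask_map, middle_begin, middle_end):
    """Positions in the sorted mask_map whose frames lie in [middle_begin, middle_end)."""
    begin = numpy.searchsorted(mask_map, middle_begin, side="left")
    end = numpy.searchsorted(mask_map, middle_end, side="left")
    return int(begin), int(end)


# usage: 8 frames, speech at frames 1, 2, 5, 7
mask = numpy.array([False, True, True, False, False, True, False, True])
speech, nonspeech = mask_to_intervals(mask)
print(speech)                                  # [(1, 2), (5, 5), (7, 7)]
print(nonspeech)                               # [(0, 0), (3, 4), (6, 6)]
mask_map = numpy.where(mask)[0]                # array([1, 2, 5, 7])
print(masked_middle_begin_end(mask_map, 2, 7))  # (1, 3): speech frames 2 and 5
print(intervals_to_seconds(speech, 0.25))      # [(0.25, 0.75), (1.25, 1.5), (1.75, 2.0)]
```

Using `side="left"` for both bounds keeps a speech frame located exactly at `middle_end` out of the MIDDLE slice. This is consistent with the half-open `[middle_begin, middle_end)` convention.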
diff --git a/aeneas/cdtw/000_compile_driver.sh b/aeneas/cdtw/000_compile_driver.sh
new file mode 100644
index 00000000..4cb2f2b7
--- /dev/null
+++ b/aeneas/cdtw/000_compile_driver.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+gcc cdtw_driver.c cdtw_func.c cint.c -o cdtw_driver -lm -Wall -pedantic -std=c99
+
+
+
diff --git a/aeneas/cdtw/100_run_driver.sh b/aeneas/cdtw/100_run_driver.sh
new file mode 100644
index 00000000..a68ab4f8
--- /dev/null
+++ b/aeneas/cdtw/100_run_driver.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+
+if [ ! -e cdtw_driver ]
+then
+ bash 000_compile_driver.sh
+fi
+
+echo "Run 1"
+./cdtw_driver
+echo ""
+
+echo "Run 2 (no stdout)"
+./cdtw_driver 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 cm > /dev/null
+echo ""
+
+echo "Run 3 (no stdout)"
+./cdtw_driver 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 acm > /dev/null
+echo ""
+
+echo "Run 4 (no stdout)"
+./cdtw_driver 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 path > /dev/null
+echo ""
+
+
+
diff --git a/aeneas/cdtw/800_compile_py.sh b/aeneas/cdtw/800_compile_py.sh
new file mode 100644
index 00000000..46ad13b6
--- /dev/null
+++ b/aeneas/cdtw/800_compile_py.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+rm -rf build *.so
+python cdtw_setup.py build_ext --inplace
+
diff --git a/aeneas/cdtw/README.md b/aeneas/cdtw/README.md
new file mode 100644
index 00000000..f1b6e035
--- /dev/null
+++ b/aeneas/cdtw/README.md
@@ -0,0 +1,22 @@
+# aeneas.cdtw
+
+**aeneas.cdtw** is a Python C extension for computing the DTW.
+
+## API
+
+See the [__init__.py](__init__.py) file.
+
+## Compiling the Python C extension locally
+
+```bash
+$ python cdtw_setup.py build_ext --inplace
+```
+
+## Compiling the pure C driver program
+
+```bash
+$ bash 000_compile_driver.sh
+```
+
+
+
diff --git a/aeneas/cdtw/__init__.py b/aeneas/cdtw/__init__.py
index 9afa8c04..a8215303 100644
--- a/aeneas/cdtw/__init__.py
+++ b/aeneas/cdtw/__init__.py
@@ -3,6 +3,108 @@
"""
aeneas.cdtw is a Python C extension for computing the DTW.
+
+.. function:: cdtw.compute_best_path(mfcc1, mfcc2, delta)
+
+ Compute the DTW (approximated) best path
+ for the two audio waves, represented by their MFCCs.
+
+ This function implements the Sakoe-Chiba heuristic,
+ that is, it explores only a band of width ``delta``
+ around the main diagonal of the cost matrix.
+
+ The computation is done in-memory, and it might fail
+ if there is not enough memory to allocate the cost matrix
+ or the list to be returned.
+
+ The returned list contains tuples ``(i, j)``,
+ representing the best path from ``(0, 0)`` to ``(n-1, m-1)``,
+ where ``n`` is the number of frames of ``mfcc1``, and
+ ``m`` is the number of frames of ``mfcc2``.
+ The returned list has length between ``max(n, m)`` and ``n + m - 1``
+ (it is shorter than ``n + m - 1`` if diagonal steps
+ are selected in the best path).
+
+ :param mfcc1: the MFCCs of the first wave ``(mfcc_size, n)``
+ :type mfcc1: :class:`numpy.ndarray`
+ :param mfcc2: the MFCCs of the second wave ``(mfcc_size, m)``
+ :type mfcc2: :class:`numpy.ndarray`
+ :param int delta: the margin parameter
+ :rtype: list of tuples
+
+.. function:: cdtw.compute_cost_matrix_step(mfcc1, mfcc2, delta)
+
+ Compute the DTW (approximated) cost matrix
+ for the two audio waves, represented by their MFCCs.
+
+ This function implements the Sakoe-Chiba heuristic,
+ that is, it explores only a band of width ``delta``
+ around the main diagonal of the cost matrix.
+
+ The computation is done in-memory, and it might fail
+ if there is not enough memory to allocate the cost matrix.
+
+ The returned tuple ``(cost_matrix, centers)``
+ contains the cost matrix (NumPy 2D array of shape ``(n, delta)``)
+ and the row centers (NumPy 1D array of shape ``(n,)``).
+
+ :param mfcc1: the MFCCs of the first wave ``(mfcc_size, n)``
+ :type mfcc1: :class:`numpy.ndarray`
+ :param mfcc2: the MFCCs of the second wave ``(mfcc_size, m)``
+ :type mfcc2: :class:`numpy.ndarray`
+ :param int delta: the margin parameter
+ :rtype: tuple
+
+.. function:: cdtw.compute_accumulated_cost_matrix_step(cost_matrix, centers)
+
+ Compute the DTW (approximated) accumulated cost matrix
+ from the cost matrix and the row centers.
+
+ This function implements the Sakoe-Chiba heuristic,
+ that is, it explores only a band of width ``delta``
+ around the main diagonal of the cost matrix.
+
+ The computation is done in-memory,
+ and the accumulated cost matrix is computed in place,
+ that is, the original cost matrix is destroyed
+ and its allocated memory used to store
+ the accumulated cost matrix.
+ Hence, this call is unlikely to fail for memory reasons, as it only allocates a single row buffer.
+
+ The returned NumPy 2D array of shape ``(n, delta)``
+ contains the accumulated cost matrix.
+
+ :param cost_matrix: the cost matrix ``(n, delta)``
+ :type cost_matrix: :class:`numpy.ndarray`
+ :param centers: the row centers ``(n,)``
+ :type centers: :class:`numpy.ndarray`
+ :rtype: :class:`numpy.ndarray`
+
+.. function:: cdtw.compute_best_path_step(accumulated_cost_matrix, centers)
+
+ Compute the DTW (approximated) best path
+ from the accumulated cost matrix and the row centers.
+
+ This function implements the Sakoe-Chiba heuristic,
+ that is, it explores only a band of width ``delta``
+ around the main diagonal of the cost matrix.
+
+ The computation is done in-memory, and it might fail
+ if there is not enough memory to allocate the list to be returned.
+
+ The returned list contains tuples ``(i, j)``,
+ representing the best path from ``(0, 0)`` to ``(n-1, m-1)``,
+ where ``n`` and ``m`` are the numbers of frames
+ of the first and the second wave, respectively.
+ The returned list has length between ``max(n, m)`` and ``n + m - 1``
+ (it is shorter than ``n + m - 1`` if diagonal steps
+ are selected in the best path).
+
+ :param accumulated_cost_matrix: the accumulated cost matrix ``(n, delta)``
+ :type accumulated_cost_matrix: :class:`numpy.ndarray`
+ :param centers: the row centers ``(n,)``
+ :type centers: :class:`numpy.ndarray`
+ :rtype: list of tuples
"""
__author__ = "Alberto Pettarin"
@@ -12,7 +114,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL 3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
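The banded computation described by the `cdtw` docstrings above can be mirrored in pure NumPy. This may help when checking the C extension on small inputs. The sketch below is hypothetical and unoptimized, and it rests on assumptions that are not visible in this excerpt:

- The local cost is the cosine distance `1 - <x, y> / (|x| |y|)`.
- Row `i` of the `(n, delta)` matrices stores the `delta` consecutive columns starting at `starts[i]`. This array stands in for `centers`, whose exact content is not shown here.
- The band is clipped at column 0 and shifted left near column `m - 1`.
- Backtracking breaks ties in the order up, left, diagonal, following the `MOVE0`/`MOVE1`/`MOVE2` constants.

```python
import numpy

INF = numpy.inf


def band_value(acc, starts, i, j):
    """Accumulated cost of global cell (i, j), or INF if (i, j) is outside the band."""
    if i < 0 or j < 0:
        return INF
    k = j - starts[i]
    if 0 <= k < acc.shape[1]:
        return acc[i, k]
    return INF


def banded_cost_matrix(mfcc1, mfcc2, delta):
    """Banded cost matrix of shape (n, delta) plus the first column of each row's band.

    mfcc1 has shape (mfcc_size, n) and mfcc2 has shape (mfcc_size, m);
    cost[i, k] is the local cost of the global cell (i, starts[i] + k).
    """
    n, m = mfcc1.shape[1], mfcc2.shape[1]
    delta = min(delta, m)
    norm1 = numpy.linalg.norm(mfcc1, axis=0)
    norm2 = numpy.linalg.norm(mfcc2, axis=0)
    cost = numpy.empty((n, delta))
    starts = numpy.zeros(n, dtype=int)
    for i in range(n):
        center = (m * i) // n                # follow the main diagonal
        start = max(0, center - delta // 2)  # never start before column 0
        start = min(start, m - delta)        # shift the band left near the right edge
        starts[i] = start
        cols = numpy.arange(start, start + delta)
        # local cost: cosine distance (an assumption of this sketch)
        cost[i, :] = 1.0 - (mfcc1[:, i] @ mfcc2[:, cols]) / (norm1[i] * norm2[cols])
    return cost, starts


def banded_accumulated_cost(cost, starts):
    """Accumulated cost matrix with the same (n, delta) layout as cost."""
    n, delta = cost.shape
    acc = numpy.full((n, delta), INF)
    for i in range(n):
        for k in range(delta):
            j = starts[i] + k
            if i == 0 and j == 0:
                acc[i, k] = cost[i, k]
                continue
            acc[i, k] = cost[i, k] + min(
                band_value(acc, starts, i - 1, j),      # up
                band_value(acc, starts, i, j - 1),      # left
                band_value(acc, starts, i - 1, j - 1),  # up and left
            )
    return acc


def banded_best_path(acc, starts, m):
    """Backtrack from (n - 1, m - 1) to (0, 0); ties prefer up, then left, then diagonal."""
    n = acc.shape[0]
    i, j = n - 1, m - 1
    if band_value(acc, starts, i, j) == INF:
        raise ValueError("cell (n - 1, m - 1) is outside the band: increase delta")
    path = [(i, j)]
    while (i, j) != (0, 0):
        moves = [
            (band_value(acc, starts, i - 1, j), i - 1, j),
            (band_value(acc, starts, i, j - 1), i, j - 1),
            (band_value(acc, starts, i - 1, j - 1), i - 1, j - 1),
        ]
        # min() returns the first minimal move, which gives the tie-break order
        _, i, j = min(moves, key=lambda move: move[0])
        path.append((i, j))
    path.reverse()
    return path
```

With `delta >= m` the band covers the whole matrix, so the result coincides with plain (unconstrained) DTW. A smaller `delta` reduces time and memory from `O(n * m)` to `O(n * delta)` cells, at the price of possibly missing the true optimum.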
diff --git a/aeneas/cdtw/cdtw_driver.c b/aeneas/cdtw/cdtw_driver.c
index 09c3fb7d..6d511227 100644
--- a/aeneas/cdtw/cdtw_driver.c
+++ b/aeneas/cdtw/cdtw_driver.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for computing the DTW
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -20,11 +20,30 @@ __status__ = "Production"
#include
#include "cdtw_func.h"
-void _read_matrix(const char *file_name, double *matrix, unsigned int n, unsigned int m) {
- unsigned int i, j;
+#define DRIVER_SUCCESS 0
+#define DRIVER_FAILURE 1
+
+// print usage
+void _usage(const char *prog) {
+ printf("\n");
+ printf("Usage: %s MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN [cm|acm|path]\n", prog);
+ printf("\n");
+ printf("Example: %s 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 cm\n", prog);
+ printf(" %s 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 acm\n", prog);
+ printf(" %s 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 path\n", prog);
+ printf("\n");
+}
+
+// read matrix from file
+int _read_matrix(const char *file_name, double *matrix, uint32_t n, uint32_t m) {
+ uint32_t i, j;
FILE *file_ptr;
file_ptr = fopen(file_name, "r");
+ if (file_ptr == NULL) {
+ return DRIVER_FAILURE;
+ }
+
for (i = 0; i < n; ++i) {
for (j = 0; j < m; ++j) {
if (!fscanf(file_ptr, "%lf", matrix + i * m + j)) {
@@ -35,10 +54,12 @@ void _read_matrix(const char *file_name, double *matrix, unsigned int n, unsigne
}
fclose(file_ptr);
file_ptr = NULL;
+ return DRIVER_SUCCESS;
}
-void _print_matrix(double *matrix, unsigned int n, unsigned int m) {
- unsigned int i, j;
+// print matrix to stdout
+void _print_matrix(double *matrix, uint32_t n, uint32_t m) {
+ uint32_t i, j;
for (i = 0; i < n; ++i) {
for (j = 0; j < m; ++j) {
@@ -48,33 +69,18 @@ void _print_matrix(double *matrix, unsigned int n, unsigned int m) {
}
}
-//
-// this is a simple driver to test on the command line
-//
-// compile it with:
-//
-// $ gcc cdtw_driver.c cdtw_func.c -o cdtw_driver -lm
-//
-// use it as follows:
-//
-// ./cdtw_driver MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN cm => compute and print cost matrix
-// ./cdtw_driver MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN acm => compute and print accumulated cost matrix
-// ./cdtw_driver MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN path => compute and print best path
-//
-// example:
-// ./cdtw_driver 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 path
-//
int main(int argc, char **argv) {
double *mfcc1_ptr, *mfcc2_ptr, *cost_matrix_ptr;
char *mfcc1_file_name, *mfcc2_file_name, *mode;
- unsigned int *centers_ptr;
- unsigned int mfcc_size, delta, mfcc1_len, mfcc2_len, best_path_length, k;
+ uint32_t *centers_ptr;
+ uint32_t mfcc_size, delta, mfcc1_len, mfcc2_len;
+ uint32_t best_path_length, k;
struct PATH_CELL *best_path;
if (argc < 8) {
- printf("\nUsage: %s MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN [cm|acm|path]\n\n", argv[0]);
- return 1;
+ _usage(argv[0]);
+ return DRIVER_FAILURE;
}
mfcc_size = atoi(argv[1]);
delta = atoi(argv[2]);
@@ -88,29 +94,62 @@ int main(int argc, char **argv) {
delta = mfcc2_len;
}
+ // allocate space for the MFCCs and read the input files
mfcc1_ptr = (double *)calloc(mfcc1_len * mfcc_size, sizeof(double));
- _read_matrix(mfcc1_file_name, mfcc1_ptr, mfcc1_len, mfcc_size);
mfcc2_ptr = (double *)calloc(mfcc2_len * mfcc_size, sizeof(double));
- _read_matrix(mfcc2_file_name, mfcc2_ptr, mfcc2_len, mfcc_size);
+ if ((mfcc1_ptr == NULL) || (mfcc2_ptr == NULL)) {
+ printf("Error: unable to allocate space for the input MFCCs.\n");
+ return DRIVER_FAILURE;
+ }
+ if (_read_matrix(mfcc1_file_name, mfcc1_ptr, mfcc1_len, mfcc_size) != DRIVER_SUCCESS) {
+ printf("Error: unable to read MFCC1.\n");
+ return DRIVER_FAILURE;
+ }
+ if (_read_matrix(mfcc2_file_name, mfcc2_ptr, mfcc2_len, mfcc_size) != DRIVER_SUCCESS) {
+ printf("Error: unable to read MFCC2.\n");
+ return DRIVER_FAILURE;
+ }
- // allocate space
+ // allocate space for the cost matrix
cost_matrix_ptr = (double *)calloc(mfcc1_len * delta, sizeof(double));
- centers_ptr = (unsigned int *)calloc(mfcc1_len, sizeof(unsigned int));
+ centers_ptr = (uint32_t *)calloc(mfcc1_len, sizeof(uint32_t));
+ if ((cost_matrix_ptr == NULL) || (centers_ptr == NULL)) {
+ printf("Error: unable to allocate space for the cost matrix and the centers.\n");
+ return DRIVER_FAILURE;
+ }
// compute cost matrix
- _compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, mfcc1_len, mfcc2_len, mfcc_size);
+ if (_compute_cost_matrix(
+ mfcc1_ptr,
+ mfcc2_ptr,
+ delta,
+ cost_matrix_ptr,
+ centers_ptr,
+ mfcc1_len,
+ mfcc2_len,
+ mfcc_size) != CDTW_SUCCESS) {
+ printf("Error: unable to compute cost matrix.\n");
+ return DRIVER_FAILURE;
+ }
+
if (strcmp(mode, "cm") == 0) {
// print cost matrix
_print_matrix(cost_matrix_ptr, mfcc1_len, delta);
} else if ((strcmp(mode, "acm") == 0) || (strcmp(mode, "path") == 0)) {
// compute accumulated cost matrix
- _compute_accumulated_cost_matrix_in_place(cost_matrix_ptr, centers_ptr, mfcc1_len, delta);
+ if (_compute_accumulated_cost_matrix_in_place(cost_matrix_ptr, centers_ptr, mfcc1_len, delta) != CDTW_SUCCESS) {
+ printf("Error: unable to compute accumulated cost matrix.\n");
+ return DRIVER_FAILURE;
+ }
if (strcmp(mode, "acm") == 0) {
// print accumulated cost matrix
_print_matrix(cost_matrix_ptr, mfcc1_len, delta);
} else {
// print best path
- _compute_best_path(cost_matrix_ptr, centers_ptr, mfcc1_len, delta, &best_path, &best_path_length);
+ if (_compute_best_path(cost_matrix_ptr, centers_ptr, mfcc1_len, delta, &best_path, &best_path_length) != CDTW_SUCCESS) {
+ printf("Error: unable to compute best path.\n");
+ return DRIVER_FAILURE;
+ }
for (k = 0; k < best_path_length; ++k) {
printf("%u %u\n", best_path[k].i, best_path[k].j);
}
@@ -124,7 +163,7 @@ int main(int argc, char **argv) {
free((void *)mfcc2_ptr);
free((void *)mfcc1_ptr);
- return 0;
+ return DRIVER_SUCCESS;
}
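A small Python helper like the following may be handy for two tasks: preparing input files for `cdtw_driver` from NumPy arrays, and reading back the `i j` pairs the driver prints in `path` mode. It is a hypothetical sketch and not part of aeneas. It assumes a particular file order. `_read_matrix()` reads values one after another with `fscanf("%lf")`, and `_compute_cost_matrix()` expects an `l x n` (MFCC size by number of frames) row-major layout. The file should therefore list all frames of coefficient 0, then all frames of coefficient 1, and so on. Line breaks are irrelevant to `fscanf`.

```python
import numpy


def write_driver_input(file_path, mfcc):
    """Write mfcc, shaped (mfcc_size, n_frames), as whitespace-separated text.

    Values are written row-major: all frames of coefficient 0, then coefficient 1, ...
    Returns (MFCC_SIZE, MFCC_LEN), the values to pass on the driver's command line.
    """
    mfcc = numpy.asarray(mfcc, dtype=float)
    numpy.savetxt(file_path, mfcc, fmt="%.12f")
    return mfcc.shape[0], mfcc.shape[1]


def parse_driver_path(stdout_text):
    """Parse the 'i j' lines printed in 'path' mode into a list of (i, j) tuples."""
    pairs = []
    for line in stdout_text.splitlines():
        fields = line.split()
        # skip blank or unrelated lines
        if len(fields) == 2:
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs
```

Consider a hypothetical case where `write_driver_input("mfcc1.txt", m1)` returns `(12, 1332)` and `write_driver_input("mfcc2.txt", m2)` returns `(12, 868)`. The matching invocation would then be `./cdtw_driver 12 3000 mfcc1.txt 1332 mfcc2.txt 868 path`, and its standard output can be passed to `parse_driver_path()`.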
diff --git a/aeneas/cdtw/cdtw_func.c b/aeneas/cdtw/cdtw_func.c
index fd55df51..6bed35a3 100644
--- a/aeneas/cdtw/cdtw_func.c
+++ b/aeneas/cdtw/cdtw_func.c
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -27,23 +27,27 @@ __status__ = "Production"
#define NPY_INFINITY DBL_MAX
#endif
-// return the max of the given arguments
-unsigned int _max(const int a, const int b) {
- if (a > b) {
- return a;
+#define MOVE0 0 // up
+#define MOVE1 1 // left
+#define MOVE2 2 // up and left
+
+// return the max(0, center_j - half_delta)
+uint32_t _nonnegative_difference(uint32_t center_j, uint32_t half_delta) {
+ if (half_delta > center_j) {
+ return 0;
}
- return b;
+ return center_j - half_delta;
}
// return the argmin of the three arguments
unsigned int _three_way_argmin(const double cost0, const double cost1, const double cost2) {
if ((cost0 <= cost1) && (cost0 <= cost2)) {
- return 0;
+ return MOVE0;
}
if (cost1 <= cost2) {
- return 1;
+ return MOVE1;
}
- return 2;
+ return MOVE2;
}
// return the min of three arguments
@@ -58,20 +62,19 @@ double _three_way_min(const double cost0, const double cost1, const double cost2
}
// copy the row-th row of cost_matrix into buffer
-void _copy_cost_matrix_row(const double *cost_matrix_ptr, const unsigned int row, const unsigned int width, double *buffer_ptr) {
+void _copy_cost_matrix_row(const double *cost_matrix_ptr, const uint32_t row, const uint32_t width, double *buffer_ptr) {
memcpy(buffer_ptr, cost_matrix_ptr + row * width, width * sizeof(double));
}
// appen the given (i, j) cell to the k-th position of the best path
-void _append(struct PATH_CELL *best_path_ptr, const unsigned int k, const unsigned int i, const unsigned int j) {
+void _append(struct PATH_CELL *best_path_ptr, const uint32_t k, const uint32_t i, const uint32_t j) {
best_path_ptr[k].i = i;
best_path_ptr[k].j = j;
}
// reverse the best path
-void _reverse(struct PATH_CELL *best_path_ptr, const unsigned int best_path_len) {
- unsigned int tmp_i, tmp_j;
- unsigned int a, b;
+void _reverse(struct PATH_CELL *best_path_ptr, const uint32_t best_path_len) {
+ uint32_t a, b, tmp_i, tmp_j;
// reverse the min path
for (a = 0; a < best_path_len / 2; ++a) {
@@ -86,13 +89,13 @@ void _reverse(struct PATH_CELL *best_path_ptr, const unsigned int best_path_len)
}
// compute the norm2 of the given MFCCs vector
-void _compute_norm2(double *mfcc_ptr, const unsigned int mfcc_len, const unsigned int mfcc_coeffs, double *norm2_ptr) {
- unsigned int i, k;
+void _compute_norm2(double *mfcc_ptr, const uint32_t mfcc_len, const uint32_t mfcc_size, double *norm2_ptr) {
+ uint32_t i, k;
double v, sum;
for (i = 0; i < mfcc_len; ++i) {
sum = 0.0;
- for (k = 0; k < mfcc_coeffs; ++k) {
+ for (k = 0; k < mfcc_size; ++k) {
v = mfcc_ptr[k * mfcc_len + i];
sum += v * v;
}
@@ -101,31 +104,34 @@ void _compute_norm2(double *mfcc_ptr, const unsigned int mfcc_len, const unsigne
}
// compute cost matrix from mfcc?
-void _compute_cost_matrix(
+int _compute_cost_matrix(
double *mfcc1_ptr, // pointer to the MFCCs of the first wave (2D, l x n)
double *mfcc2_ptr, // pointer to the MFCCs of the second wave (2D, l x m)
- const unsigned int delta, // margin parameter
+ const uint32_t delta, // margin parameter
double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
- unsigned int *centers_ptr, // pointer to the centers (1D, n); centers[i] = center for the i-th row; delta/2 <= centers[i] < m - delta/2
- const unsigned int n, // number of frames of the first wave
- const unsigned int m, // number of frames of the second wave
- const unsigned int l // number of MFCCs
+ uint32_t *centers_ptr, // pointer to the centers (1D, n); centers[i] = center for the i-th row
+ const uint32_t n, // number of frames (MFCC vectors) of the first wave
+ const uint32_t m, // number of frames (MFCC vectors) of the second wave
+ const uint32_t l // MFCC size
) {
double *norm2_1_ptr, *norm2_2_ptr;
double sum;
- unsigned int center_j, range_start, range_end;
- unsigned int i, j, k;
+ uint32_t center_j, range_start, range_end;
+ uint32_t i, j, k;
// compute norm2 vectors
norm2_1_ptr = (double *)calloc(n, sizeof(double));
norm2_2_ptr = (double *)calloc(m, sizeof(double));
+ if ((norm2_1_ptr == NULL) || (norm2_2_ptr == NULL)) {
+ free((void *)norm2_1_ptr); free((void *)norm2_2_ptr); return CDTW_FAILURE;
+ }
_compute_norm2(mfcc1_ptr, n, l, norm2_1_ptr);
_compute_norm2(mfcc2_ptr, m, l, norm2_2_ptr);
for (i = 0; i < n; ++i) {
center_j = (int)floor(m * (1.0 * i / n));
- range_start = _max(0, center_j - (delta / 2));
+ range_start = _nonnegative_difference(center_j, delta / 2);
range_end = range_start + delta;
if (range_end > m) {
range_end = m;
@@ -144,20 +150,21 @@ void _compute_cost_matrix(
// deallocate norm2 vectors as they are no longer needed
free((void *)norm2_1_ptr);
free((void *)norm2_2_ptr);
+ return CDTW_SUCCESS;
}
// compute accumulated cost matrix, not in-place
-void _compute_accumulated_cost_matrix(
- double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
- unsigned int *centers_ptr, // pointer to the centers (1D, n)
- unsigned int n, // number of frames of the first wave
- unsigned int delta, // margin parameter
- double *accumulated_cost_matrix_ptr // pointer to the accumulated cost matrix (2D, n x delta)
+int _compute_accumulated_cost_matrix(
+ const double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
+ const uint32_t *centers_ptr, // pointer to the centers (1D, n)
+ const uint32_t n, // number of frames of the first wave
+ const uint32_t delta, // margin parameter
+ double *accumulated_cost_matrix_ptr // pointer to the accumulated cost matrix (2D, n x delta)
) {
double cost0, cost1, cost2;
- unsigned int current_idx, offset;
- unsigned int i, j;
+ uint32_t current_idx, offset;
+ uint32_t i, j;
accumulated_cost_matrix_ptr[0] = cost_matrix_ptr[0];
for (j = 1; j < delta; ++j) {
@@ -182,29 +189,33 @@ void _compute_accumulated_cost_matrix(
accumulated_cost_matrix_ptr[current_idx] = cost_matrix_ptr[current_idx] + _three_way_min(cost0, cost1, cost2);
}
}
+ return CDTW_SUCCESS;
}
// compute accumulated cost matrix, in-place
// (i.e., this function overwrites cost_matrix with the accumulated cost values)
-void _compute_accumulated_cost_matrix_in_place(
- double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
- unsigned int *centers_ptr, // pointer to the centers (1D, n)
- const unsigned int n, // number of frames of the first wave
- const unsigned int delta // margin parameter
+int _compute_accumulated_cost_matrix_in_place(
+ double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
+ const uint32_t *centers_ptr, // pointer to the centers (1D, n)
+ const uint32_t n, // number of frames of the first wave
+ const uint32_t delta // margin parameter
) {
double *current_row_ptr;
double cost0, cost1, cost2;
- unsigned int current_idx, offset;
- unsigned int i, j;
+ uint32_t current_idx, offset;
+ uint32_t i, j;
// to compute the i-th row of the accumulated cost matrix
// we only need the i-th row of the cost matrix
- current_row_ptr = (double *)malloc(delta * sizeof(double));
+ current_row_ptr = (double *)calloc(delta, sizeof(double));
+ if (current_row_ptr == NULL) {
+ return CDTW_FAILURE;
+ }
// copy the first row of cost_matrix_ptr to current row buffer
_copy_cost_matrix_row(cost_matrix_ptr, 0, delta, current_row_ptr);
- //cost_matrix_ptr[0] = current_row_ptr[0];
+ //cost_matrix_ptr[0] = current_row_ptr[0]; // not needed!
for (j = 1; j < delta; ++j) {
cost_matrix_ptr[j] = current_row_ptr[j] + cost_matrix_ptr[j-1];
}
@@ -230,21 +241,22 @@ void _compute_accumulated_cost_matrix_in_place(
}
}
free((void *)current_row_ptr);
+ return CDTW_SUCCESS;
}
// compute best path and return it as a list of (i, j) tuples, from (0,0) to (n-1, delta-1)
-void _compute_best_path(
- double *accumulated_cost_matrix_ptr, // pointer to the accumulated cost matrix (2D, n x delta)
- unsigned int *centers_ptr, // pointer to the centers (1D, n)
- const unsigned int n, // number of frames of the first wave
- const unsigned int delta, // margin parameter
- struct PATH_CELL **best_path_ptr, // pointer to the list of cells making the best path
- unsigned int *best_path_len // length of the best path
+int _compute_best_path(
+ const double *accumulated_cost_matrix_ptr, // pointer to the accumulated cost matrix (2D, n x delta)
+ const uint32_t *centers_ptr, // pointer to the centers (1D, n)
+ const uint32_t n, // number of frames of the first wave
+ const uint32_t delta, // margin parameter
+ struct PATH_CELL **best_path_ptr, // pointer to the list of cells making the best path
+ uint32_t *best_path_len // length of the best path
) {
double cost0, cost1, cost2;
- unsigned int argmin, offset;
- unsigned int i, j, k, r_j, max_path_len;
+ uint32_t argmin, r_j, offset;
+ uint32_t i, j, k, max_path_len;
// allocate space for keeping the best path
//
@@ -256,6 +268,9 @@ void _compute_best_path(
//
max_path_len = n + centers_ptr[n-1] + delta;
*best_path_ptr = (struct PATH_CELL *)calloc(max_path_len, sizeof(struct PATH_CELL));
+ if ((*best_path_ptr) == NULL) {
+ return CDTW_FAILURE;
+ }
i = n - 1;
j = centers_ptr[i] + delta - 1;
@@ -283,9 +298,9 @@ void _compute_best_path(
cost2 = accumulated_cost_matrix_ptr[(i-1) * delta + (r_j+offset-1)];
}
argmin = _three_way_argmin(cost0, cost1, cost2);
- if (argmin == 0) {
+ if (argmin == MOVE0) {
_append(*best_path_ptr, k++, --i, j);
- } else if (argmin == 1) {
+ } else if (argmin == MOVE1) {
_append(*best_path_ptr, k++, i, --j);
} else {
_append(*best_path_ptr, k++, --i, --j);
@@ -298,6 +313,7 @@ void _compute_best_path(
// reverse the path
_reverse(*best_path_ptr, k);
+ return CDTW_SUCCESS;
}
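The switch from `_max(0, center_j - (delta / 2))` to `_nonnegative_difference(center_j, delta / 2)` fixes an unsigned underflow. Because both operands are unsigned, `center_j - delta / 2` wraps around to a huge value for the first rows instead of going negative. Assuming `_max` returns the larger of its arguments, that wrapped value would win over `0`. The sketch below illustrates the clamped computation. `nonnegative_difference` and `stripe_range` are hypothetical helpers: the real `_nonnegative_difference()` lives in `cint.h`, which this diff does not show, and only the visible lines of `_compute_cost_matrix()` are mirrored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for _nonnegative_difference() from cint.h
   (its actual definition is not part of this diff): returns a - b,
   clamped at zero instead of wrapping around. */
static uint32_t nonnegative_difference(uint32_t a, uint32_t b) {
    return (a > b) ? (a - b) : 0;
}

/* Column range [*start, *end) of the stripe for row i, mirroring the
   visible lines of _compute_cost_matrix(). The center is computed as an
   exact integer floor(m * i / n) rather than with floor() on doubles. */
static void stripe_range(uint32_t i, uint32_t n, uint32_t m, uint32_t delta,
                         uint32_t *start, uint32_t *end) {
    uint32_t center_j;

    assert(n > 0 && i < n);
    center_j = (uint32_t)(((uint64_t)m * i) / n);
    *start = nonnegative_difference(center_j, delta / 2);
    *end = *start + delta;
    if (*end > m) {
        *end = m; /* clamp to the last frame of the second wave */
    }
}
```

For example, with `n = 100`, `m = 200`, and `delta = 10`, row 0 gets the range `[0, 10)` rather than a start value near `UINT32_MAX`, and row 99 is clamped to `[193, 200)`.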
diff --git a/aeneas/cdtw/cdtw_func.h b/aeneas/cdtw/cdtw_func.h
index a7f3f16c..abef0212 100644
--- a/aeneas/cdtw/cdtw_func.h
+++ b/aeneas/cdtw/cdtw_func.h
@@ -9,55 +9,60 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
*/
+#include "cint.h"
+
+#define CDTW_SUCCESS 0
+#define CDTW_FAILURE 1
+
struct PATH_CELL {
- unsigned int i;
- unsigned int j;
+ uint32_t i; // row index in the virtual full matrix (n x m)
+ uint32_t j; // column index in the virtual full matrix (n x m)
};
// compute cost matrix from mfcc?
-void _compute_cost_matrix(
- double *mfcc1_ptr, // pointer to the MFCCs of the first wave (2D, l x n)
- double *mfcc2_ptr, // pointer to the MFCCs of the second wave (2D, l x m)
- const unsigned int delta, // margin parameter
- double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
- unsigned int *centers_ptr, // pointer to the centers (1D, n); centers[i] = center for the i-th row; delta/2 <= centers[i] < m - delta/2
- const unsigned int n, // number of frames of the first wave
- const unsigned int m, // number of frames of the second wave
- const unsigned int l // number of MFCCs
+int _compute_cost_matrix(
+ double *mfcc1_ptr, // pointer to the MFCCs of the first wave (2D, l x n)
+ double *mfcc2_ptr, // pointer to the MFCCs of the second wave (2D, l x m)
+ const uint32_t delta, // margin parameter
+ double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
+ uint32_t *centers_ptr, // pointer to the centers (1D, n); centers[i] = center for the i-th row
+ const uint32_t n, // number of frames (MFCC vectors) of the first wave
+ const uint32_t m, // number of frames (MFCC vectors) of the second wave
+ const uint32_t l // MFCC size
);
// compute accumulated cost matrix, not in-place
-void _compute_accumulated_cost_matrix(
- double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
- unsigned int *centers_ptr, // pointer to the centers (1D, n)
- const unsigned int n, // number of frames of the first wave
- const unsigned int delta, // margin parameter
- double *accumulated_cost_matrix_ptr // pointer to the accumulated cost matrix (2D, n x delta)
+int _compute_accumulated_cost_matrix(
+ const double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
+ const uint32_t *centers_ptr, // pointer to the centers (1D, n)
+ const uint32_t n, // number of frames of the first wave
+ const uint32_t delta, // margin parameter
+ double *accumulated_cost_matrix_ptr // pointer to the accumulated cost matrix (2D, n x delta)
);
// compute accumulated cost matrix, in-place
// (i.e., this function overwrites cost_matrix with the accumulated cost values)
-void _compute_accumulated_cost_matrix_in_place(
- double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
- unsigned int *centers_ptr, // pointer to the centers (1D, n)
- const unsigned int n, // number of frames of the first wave
- const unsigned int delta // margin parameter
+int _compute_accumulated_cost_matrix_in_place(
+ double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta)
+ const uint32_t *centers_ptr, // pointer to the centers (1D, n)
+ const uint32_t n, // number of frames of the first wave
+ const uint32_t delta // margin parameter
);
-// compute best path and return it as a list of (i, j) tuples, from (0,0) to (n-1, delta-1)
-void _compute_best_path(
- double *accumulated_cost_matrix_ptr, // pointer to the accumulated cost matrix (2D, n x delta)
- unsigned int *centers_ptr, // pointer to the centers (1D, n)
- const unsigned int n, // number of frames of the first wave
- const unsigned int delta, // margin parameter
- struct PATH_CELL **best_path_ptr, // pointer to the list of cells making the best path
- unsigned int *best_path_len // length of the best path
+// compute best path and return it as a list of (i, j) tuples, from (0,0) to (n-1, m-1)
+int _compute_best_path(
+ const double *accumulated_cost_matrix_ptr, // pointer to the accumulated cost matrix (2D, n x delta)
+ const uint32_t *centers_ptr, // pointer to the centers (1D, n)
+ const uint32_t n, // number of frames of the first wave
+ const uint32_t delta, // margin parameter
+ struct PATH_CELL **best_path_ptr, // pointer to the list of cells making the best path
+ uint32_t *best_path_len // length of the best path
);
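The header now documents `PATH_CELL.i` and `PATH_CELL.j` as indices into the virtual full `n x m` matrix, while the cost matrices are stored as compact `n x delta` stripes. In `_compute_best_path()` the walk starts at `j = centers_ptr[i] + delta - 1`, which suggests that `centers[i]` acts as the first full-matrix column covered by row `i`. The sketch below assumes that reading and shows how the two coordinate systems would map onto each other. `full_column` and `stripe_offset` are hypothetical helpers, not functions from this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full-matrix column of stripe column r_j in row i.
   Assumption: centers[i] is the first full column covered by row i. */
static uint32_t full_column(const uint32_t *centers, uint32_t i, uint32_t r_j) {
    return centers[i] + r_j;
}

/* Offset of full-matrix cell (i, j) in a row-major n x delta stripe buffer
   (such as cost_matrix_ptr); (i, j) must lie inside row i's stripe. */
static size_t stripe_offset(const uint32_t *centers, uint32_t delta,
                            uint32_t i, uint32_t j) {
    uint32_t r_j;

    assert(j >= centers[i]);
    r_j = j - centers[i];
    assert(r_j < delta);
    return (size_t)i * delta + r_j;
}
```

Under this assumption, the starting cell of the backtrack, `(n - 1, centers[n - 1] + delta - 1)`, maps to offset `n * delta - 1`, which is the last element of the stripe buffer.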
diff --git a/aeneas/cdtw/cdtw_py.c b/aeneas/cdtw/cdtw_py.c
index fc004679..14e516ea 100644
--- a/aeneas/cdtw/cdtw_py.c
+++ b/aeneas/cdtw/cdtw_py.c
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -25,7 +25,7 @@ __status__ = "Production"
#include "cdtw_func.h"
// append a new tuple (i, j) to the given list
-static void _append(PyObject *list, unsigned int i, unsigned int j) {
+static void _append(PyObject *list, uint32_t i, uint32_t j) {
PyObject *tuple;
tuple = PyTuple_New(2);
@@ -36,15 +36,10 @@ static void _append(PyObject *list, unsigned int i, unsigned int j) {
}
// convert array of struct to list of tuples
-static void _array_to_list(struct PATH_CELL *best_path, unsigned int best_path_length, PyObject *list) {
- //unsigned int i, j;
- unsigned int k;
+static void _array_to_list(struct PATH_CELL *best_path, uint32_t best_path_length, PyObject *list) {
+ uint32_t k;
for (k = 0; k < best_path_length; ++k) {
- //i = (*best_path).i;
- //j = (*best_path).j;
- //printf("k = %d : i = %d, j = %d\n", k, (int)i, (int)j);
- //printf("k = %d : i = %d, j = %d\n", k, best_path[k].i, best_path[k].j);
_append(list, best_path[k].i, best_path[k].j);
}
}
@@ -53,22 +48,22 @@ static void _array_to_list(struct PATH_CELL *best_path, unsigned int best_path_l
// take the PyObject containing the following arguments:
// - mfcc1: 2D array (l x n) of double, MFCCs of the first wave
// - mfcc2: 2D array (l x m) of double, MFCCs of the second wave
-// - delta: int, the number of frames of margin
-// and return the best path as a list of (i, j) tuples, from (0,0) to (n-1, delta-1)
+// - delta: uint, the number of frames of margin
+// and return the best path as a list of (i, j) tuples, from (0,0) to (n-1, m-1)
static PyObject *compute_best_path(PyObject *self, PyObject *args) {
PyObject *mfcc1_raw;
PyObject *mfcc2_raw;
- unsigned int delta;
+ uint32_t delta;
PyArrayObject *mfcc1, *mfcc2, *cost_matrix, *centers;
PyObject *best_path_ptr;
npy_intp cost_matrix_dimensions[2];
npy_intp centers_dimensions[1];
double *mfcc1_ptr, *mfcc2_ptr, *cost_matrix_ptr;
- unsigned int *centers_ptr;
- unsigned int l1, l2, n, m;
+ uint32_t *centers_ptr;
+ uint32_t l1, l2, n, m;
struct PATH_CELL *best_path;
- unsigned int best_path_length;
+ uint32_t best_path_length;
// O = object (do not convert or check for errors)
// I = unsigned int
@@ -87,13 +82,11 @@ static PyObject *compute_best_path(PyObject *self, PyObject *args) {
return NULL;
}
- // NOTE: if arrived here, the mfcc? have the correct number of dimensions (2)
-
// get the dimensions of the input arguments
l1 = PyArray_DIMS(mfcc1)[0]; // number of MFCCs in the first wave
l2 = PyArray_DIMS(mfcc2)[0]; // number of MFCCs in the second wave
- n = PyArray_DIMS(mfcc1)[1]; // number of frames in the first wave
- m = PyArray_DIMS(mfcc2)[1];; // number of frames in the second wave
+ n = PyArray_DIMS(mfcc1)[1]; // number of frames in the first wave
+ m = PyArray_DIMS(mfcc2)[1]; // number of frames in the second wave
// check that the number of MFCCs is the same for both waves
if (l1 != l2) {
@@ -107,8 +100,8 @@ static PyObject *compute_best_path(PyObject *self, PyObject *args) {
}
// pointer to cost matrix data
- mfcc1_ptr = (double *)PyArray_DATA(mfcc1);
- mfcc2_ptr = (double *)PyArray_DATA(mfcc2);
+ mfcc1_ptr = (double *)PyArray_DATA(mfcc1);
+ mfcc2_ptr = (double *)PyArray_DATA(mfcc2);
// create cost matrix object
cost_matrix_dimensions[0] = n;
@@ -118,13 +111,36 @@ static PyObject *compute_best_path(PyObject *self, PyObject *args) {
// create centers object
centers_dimensions[0] = n;
- centers = (PyArrayObject *)PyArray_SimpleNew(1, centers_dimensions, NPY_INT32);
- centers_ptr = (unsigned int *)PyArray_DATA(centers);
+ centers = (PyArrayObject *)PyArray_SimpleNew(1, centers_dimensions, NPY_UINT32);
+ centers_ptr = (uint32_t *)PyArray_DATA(centers);
// actual computation
- _compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, n, m, l1);
- _compute_accumulated_cost_matrix_in_place(cost_matrix_ptr, centers_ptr, n, delta);
- _compute_best_path(cost_matrix_ptr, centers_ptr, n, delta, &best_path, &best_path_length);
+ if (_compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, n, m, l1) != CDTW_SUCCESS) {
+ Py_XDECREF(mfcc1);
+ Py_XDECREF(mfcc2);
+ Py_XDECREF(cost_matrix);
+ Py_XDECREF(centers);
+ PyErr_SetString(PyExc_ValueError, "Error while computing cost matrix");
+ return NULL;
+ }
+
+ if (_compute_accumulated_cost_matrix_in_place(cost_matrix_ptr, centers_ptr, n, delta) != CDTW_SUCCESS) {
+ Py_XDECREF(mfcc1);
+ Py_XDECREF(mfcc2);
+ Py_XDECREF(cost_matrix);
+ Py_XDECREF(centers);
+ PyErr_SetString(PyExc_ValueError, "Error while computing accumulated cost matrix");
+ return NULL;
+ }
+
+ if (_compute_best_path(cost_matrix_ptr, centers_ptr, n, delta, &best_path, &best_path_length) != CDTW_SUCCESS) {
+ Py_XDECREF(mfcc1);
+ Py_XDECREF(mfcc2);
+ Py_XDECREF(cost_matrix);
+ Py_XDECREF(centers);
+ PyErr_SetString(PyExc_ValueError, "Error while computing best path");
+ return NULL;
+ }
// convert array of struct to list of tuples
best_path_ptr = PyList_New(0);
@@ -145,22 +161,22 @@ static PyObject *compute_best_path(PyObject *self, PyObject *args) {
// take the PyObject containing the following arguments:
// - mfcc1: 2D array (l x n) of double, MFCCs of the first wave
// - mfcc2: 2D array (l x m) of double, MFCCs of the second wave
-// - delta: int, the number of frames of margin
+// - delta: uint, the number of frames of margin
// and return a tuple (cost_matrix, centers), where
// - cost_matrix: 2D array (n x delta) of double
-// - centers: 1D array (n x 1) of int, centers[i] is the 0 <= center < m of the stripe at row i
+// - centers: 1D array (n x 1) of uint, centers[i] is the center (0 <= centers[i] < m) of the stripe at row i
static PyObject *compute_cost_matrix_step(PyObject *self, PyObject *args) {
PyObject *mfcc1_raw;
PyObject *mfcc2_raw;
- unsigned int delta;
+ uint32_t delta;
PyArrayObject *mfcc1, *mfcc2, *cost_matrix, *centers;
PyObject *tuple;
npy_intp cost_matrix_dimensions[2];
npy_intp centers_dimensions[1];
double *mfcc1_ptr, *mfcc2_ptr, *cost_matrix_ptr;
- unsigned int *centers_ptr;
- unsigned int l1, l2, n, m;
+ uint32_t *centers_ptr;
+ uint32_t l1, l2, n, m;
// O = object (do not convert or check for errors)
// I = unsigned int
@@ -179,13 +195,11 @@ static PyObject *compute_cost_matrix_step(PyObject *self, PyObject *args) {
return NULL;
}
- // NOTE: if arrived here, the mfcc? have the correct number of dimensions (2)
-
// get the dimensions of the input arguments
l1 = PyArray_DIMS(mfcc1)[0]; // number of MFCCs in the first wave
l2 = PyArray_DIMS(mfcc2)[0]; // number of MFCCs in the second wave
- n = PyArray_DIMS(mfcc1)[1]; // number of frames in the first wave
- m = PyArray_DIMS(mfcc2)[1];; // number of frames in the second wave
+ n = PyArray_DIMS(mfcc1)[1]; // number of frames in the first wave
+ m = PyArray_DIMS(mfcc2)[1]; // number of frames in the second wave
// check that the number of MFCCs is the same for both waves
if (l1 != l2) {
@@ -199,8 +213,8 @@ static PyObject *compute_cost_matrix_step(PyObject *self, PyObject *args) {
}
// pointer to cost matrix data
- mfcc1_ptr = (double *)PyArray_DATA(mfcc1);
- mfcc2_ptr = (double *)PyArray_DATA(mfcc2);
+ mfcc1_ptr = (double *)PyArray_DATA(mfcc1);
+ mfcc2_ptr = (double *)PyArray_DATA(mfcc2);
// create cost matrix object
cost_matrix_dimensions[0] = n;
@@ -210,11 +224,18 @@ static PyObject *compute_cost_matrix_step(PyObject *self, PyObject *args) {
// create centers object
centers_dimensions[0] = n;
- centers = (PyArrayObject *)PyArray_SimpleNew(1, centers_dimensions, NPY_INT32);
- centers_ptr = (unsigned int *)PyArray_DATA(centers);
+ centers = (PyArrayObject *)PyArray_SimpleNew(1, centers_dimensions, NPY_UINT32);
+ centers_ptr = (uint32_t *)PyArray_DATA(centers);
// compute cost matrix
- _compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, n, m, l1);
+ if (_compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, n, m, l1) != CDTW_SUCCESS) {
+ Py_XDECREF(mfcc1);
+ Py_XDECREF(mfcc2);
+ Py_XDECREF(cost_matrix);
+ Py_XDECREF(centers);
+ PyErr_SetString(PyExc_ValueError, "Error while computing cost matrix");
+ return NULL;
+ }
// decrement reference to local object no longer needed
Py_DECREF(mfcc1);
@@ -242,8 +263,8 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject *
PyArrayObject *cost_matrix, *centers, *accumulated_cost_matrix;
npy_intp accumulated_cost_matrix_dimensions[2];
double *cost_matrix_ptr, *accumulated_cost_matrix_ptr;
- unsigned int *centers_ptr;
- unsigned int n, delta;
+ uint32_t *centers_ptr;
+ uint32_t n, delta;
// O = object (do not convert or check for errors)
if (!PyArg_ParseTuple(args, "OO", &cost_matrix_raw, &centers_raw)) {
@@ -253,7 +274,7 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject *
// convert to C contiguous array
cost_matrix = (PyArrayObject *) PyArray_ContiguousFromAny(cost_matrix_raw, NPY_DOUBLE, 2, 2);
- centers = (PyArrayObject *) PyArray_ContiguousFromAny(centers_raw, NPY_INT32, 1, 1);
+ centers = (PyArrayObject *) PyArray_ContiguousFromAny(centers_raw, NPY_UINT32, 1, 1);
// pointer to cost matrix data
cost_matrix_ptr = (double *)PyArray_DATA(cost_matrix);
@@ -269,7 +290,7 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject *
}
// pointer to centers data
- centers_ptr = (unsigned int *)PyArray_DATA(centers);
+ centers_ptr = (uint32_t *)PyArray_DATA(centers);
// create accumulated cost matrix object
accumulated_cost_matrix_dimensions[0] = n;
@@ -280,7 +301,12 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject *
accumulated_cost_matrix_ptr = (double *)PyArray_DATA(accumulated_cost_matrix);
// compute accumulated cost matrix
- _compute_accumulated_cost_matrix(cost_matrix_ptr, centers_ptr, n, delta, accumulated_cost_matrix_ptr);
+ if (_compute_accumulated_cost_matrix(cost_matrix_ptr, centers_ptr, n, delta, accumulated_cost_matrix_ptr) != CDTW_SUCCESS) {
+ Py_XDECREF(cost_matrix);
+ Py_XDECREF(centers);
+ PyErr_SetString(PyExc_ValueError, "Error while computing accumulated cost matrix");
+ return NULL;
+ }
// decrement reference to local object no longer needed
Py_DECREF(cost_matrix);
@@ -294,7 +320,7 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject *
// take the PyObject containing the following arguments:
// - accumulated_cost_matrix: 2D array (n x delta) of double
// - centers: 1D array (n x 1) of int, centers[i] is the 0 <= center < m of the stripe at row i
-// and return the best path as a list of (i, j) tuples, from (0,0) to (n-1, delta-1)
+// and return the best path as a list of (i, j) tuples, from (0,0) to (n-1, m-1)
static PyObject *compute_best_path_step(PyObject *self, PyObject *args) {
PyObject *accumulated_cost_matrix_raw;
PyObject *centers_raw;
@@ -302,10 +328,10 @@ static PyObject *compute_best_path_step(PyObject *self, PyObject *args) {
PyArrayObject *accumulated_cost_matrix, *centers;
PyObject *best_path_ptr;
double *accumulated_cost_matrix_ptr;
- unsigned int *centers_ptr;
- unsigned int n, delta;
+ uint32_t *centers_ptr;
+ uint32_t n, delta;
struct PATH_CELL *best_path;
- unsigned int best_path_length;
+ uint32_t best_path_length;
// O = object (do not convert or check for errors)
if (!PyArg_ParseTuple(args, "OO", &accumulated_cost_matrix_raw, &centers_raw)) {
@@ -315,7 +341,7 @@ static PyObject *compute_best_path_step(PyObject *self, PyObject *args) {
// convert to C contiguous array
accumulated_cost_matrix = (PyArrayObject *) PyArray_ContiguousFromAny(accumulated_cost_matrix_raw, NPY_DOUBLE, 2, 2);
- centers = (PyArrayObject *) PyArray_ContiguousFromAny(centers_raw, NPY_INT32, 1, 1);
+ centers = (PyArrayObject *) PyArray_ContiguousFromAny(centers_raw, NPY_UINT32, 1, 1);
// pointer to cost matrix data
accumulated_cost_matrix_ptr = (double *)PyArray_DATA(accumulated_cost_matrix);
@@ -331,13 +357,18 @@ static PyObject *compute_best_path_step(PyObject *self, PyObject *args) {
}
// pointer to centers data
- centers_ptr = (unsigned int *)PyArray_DATA(centers);
+ centers_ptr = (uint32_t *)PyArray_DATA(centers);
// create best path array of integers
best_path_ptr = PyList_New(0);
// compute best path
- _compute_best_path(accumulated_cost_matrix_ptr, centers_ptr, n, delta, &best_path, &best_path_length);
+ if (_compute_best_path(accumulated_cost_matrix_ptr, centers_ptr, n, delta, &best_path, &best_path_length) != CDTW_SUCCESS) {
+ Py_XDECREF(accumulated_cost_matrix);
+ Py_XDECREF(centers);
+ Py_XDECREF(best_path_ptr);
+ PyErr_SetString(PyExc_ValueError, "Error while computing best path");
+ return NULL;
+ }
// convert array of struct to list of tuples
_array_to_list(best_path, best_path_length, best_path_ptr);
@@ -355,31 +386,43 @@ static PyObject *compute_best_path_step(PyObject *self, PyObject *args) {
static PyMethodDef cdtw_methods[] = {
- // compute best path at once
{
"compute_best_path",
compute_best_path,
METH_VARARGS,
- "Given the MFCCs of the two waves, compute and return the DTW best path at once"
+ "Given the MFCCs of the two waves, compute and return the DTW best path at once\n"
+ ":param object mfcc1: numpy 2D matrix (mfcc_size, n) of MFCCs of the first wave\n"
+ ":param object mfcc2: numpy 2D matrix (mfcc_size, m) of MFCCs of the second wave\n"
+ ":param uint delta: the margin, in number of frames\n"
+ ":rtype: a list of tuples (i, j), from (0, 0) to (n-1, m-1) representing the best path"
},
- // compute in separate steps
{
"compute_cost_matrix_step",
compute_cost_matrix_step,
METH_VARARGS,
- "Given the MFCCs of the two waves, compute and return the DTW cost matrix"
+ "Given the MFCCs of the two waves, compute and return the DTW cost matrix\n"
+ ":param object mfcc1: numpy 2D matrix (mfcc_size, n) of MFCCs of the first wave\n"
+ ":param object mfcc2: numpy 2D matrix (mfcc_size, m) of MFCCs of the second wave\n"
+ ":param uint delta: the margin, in number of frames\n"
+ ":rtype: tuple (cost_matrix, centers)"
},
{
"compute_accumulated_cost_matrix_step",
compute_accumulated_cost_matrix_step,
METH_VARARGS,
- "Given the DTW cost matrix, compute and return the DTW accumulated cost matrix"
+ "Given the DTW cost matrix, compute and return the DTW accumulated cost matrix\n"
+ ":param object cost_matrix: the cost matrix (n, delta)\n"
+ ":param object centers: the centers (n)\n"
+ ":rtype: the accumulated cost matrix"
},
{
"compute_best_path_step",
compute_best_path_step,
METH_VARARGS,
- "Given the DTW accumulated cost matrix, compute and return the DTW best path"
+ "Given the DTW accumulated cost matrix, compute and return the DTW best path\n"
+ ":param object accumulated_cost_matrix: the accumulated cost matrix (n, delta)\n"
+ ":param object centers: the centers (n)\n"
+ ":rtype: a list of tuples (i, j), from (0, 0) to (n-1, m-1) representing the best path"
},
{
NULL,
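In `compute_best_path()` the same four-line `Py_XDECREF` cleanup block is repeated after each step that can fail. A common C idiom that avoids this duplication is a single cleanup label reached with `goto`, so every exit path releases the same resources exactly once. The plain-C sketch below shows the pattern with `calloc`/`free` standing in for Python references (`free(NULL)` is a no-op, just as `Py_XDECREF(NULL)` is). `step`, `run_pipeline`, and the `SKETCH_*` codes are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_SUCCESS 0
#define SKETCH_FAILURE 1

/* Hypothetical stand-in for the three _compute_* calls:
   step number fail_at fails, every other step succeeds (0 = none fails). */
static int step(int id, int fail_at) {
    return (id == fail_at) ? SKETCH_FAILURE : SKETCH_SUCCESS;
}

static int run_pipeline(int fail_at, int *failed_step) {
    int ret = SKETCH_FAILURE;
    double *cost_matrix = NULL;
    unsigned int *centers = NULL;

    *failed_step = 0;
    cost_matrix = calloc(16, sizeof(double));
    centers = calloc(4, sizeof(unsigned int));
    if ((cost_matrix == NULL) || (centers == NULL)) {
        goto cleanup;
    }
    if (step(1, fail_at) != SKETCH_SUCCESS) { *failed_step = 1; goto cleanup; }
    if (step(2, fail_at) != SKETCH_SUCCESS) { *failed_step = 2; goto cleanup; }
    if (step(3, fail_at) != SKETCH_SUCCESS) { *failed_step = 3; goto cleanup; }
    ret = SKETCH_SUCCESS;

cleanup:
    /* single exit point: release everything, whether or not it was allocated */
    free(cost_matrix);
    free(centers);
    return ret;
}
```

In the extension itself, each failing branch would still set its own `PyErr_SetString` message before jumping to the shared cleanup.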
diff --git a/aeneas/cdtw/cdtw_setup.py b/aeneas/cdtw/cdtw_setup.py
index a844b8fe..aab0c739 100644
--- a/aeneas/cdtw/cdtw_setup.py
+++ b/aeneas/cdtw/cdtw_setup.py
@@ -23,15 +23,15 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-CMODULE = Extension("cdtw", sources=["cdtw_py.c", "cdtw_func.c"], include_dirs=[get_include()])
+CMODULE = Extension("cdtw", sources=["cdtw_py.c", "cdtw_func.c", "cint.c"], include_dirs=[get_include()])
setup(
name="cdtw",
- version="1.4.1",
+ version="1.5.0",
description="""
Python C Extension for computing the DTW as fast as your bare metal allows.
""",
diff --git a/aeneas/cdtw/cint.c b/aeneas/cdtw/cint.c
new file mode 120000
index 00000000..8e1c9dae
--- /dev/null
+++ b/aeneas/cdtw/cint.c
@@ -0,0 +1 @@
+../cint/cint.c
\ No newline at end of file
diff --git a/aeneas/cdtw/cint.h b/aeneas/cdtw/cint.h
new file mode 120000
index 00000000..27a6bb39
--- /dev/null
+++ b/aeneas/cdtw/cint.h
@@ -0,0 +1 @@
+../cint/cint.h
\ No newline at end of file
diff --git a/aeneas/cew/000_compile_driver.sh b/aeneas/cew/000_compile_driver.sh
new file mode 100644
index 00000000..e26727c6
--- /dev/null
+++ b/aeneas/cew/000_compile_driver.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+gcc cew_driver.c cew_func.c -lespeak -o cew_driver -Wall -pedantic -std=c99
+
+
+
diff --git a/aeneas/cew/100_run_driver.sh b/aeneas/cew/100_run_driver.sh
new file mode 100644
index 00000000..f11e4eef
--- /dev/null
+++ b/aeneas/cew/100_run_driver.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+
+if [ ! -e cew_driver ]
+then
+ bash 000_compile_driver.sh
+fi
+
+echo "Run 1"
+./cew_driver
+echo ""
+
+echo "Run 2"
+./cew_driver en "Hello World" /tmp/out.wav single
+echo ""
+
+echo "Run 3"
+./cew_driver en "Hello|World|My|Dear|Friend" /tmp/out.wav multi 0.0 0
+echo ""
+
+echo "Run 4"
+./cew_driver en "Hello|World|My|Dear|Friend" /tmp/out.wav multi 0.0 1
+echo ""
+
+echo "Run 5"
+./cew_driver en "Hello|World|My|Dear|Friend" /tmp/out.wav multi 2.0 1
+echo ""
+
+
+
+
diff --git a/aeneas/cew/800_compile_py.sh b/aeneas/cew/800_compile_py.sh
new file mode 100644
index 00000000..390950af
--- /dev/null
+++ b/aeneas/cew/800_compile_py.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+rm -rf build *.so
+python cew_setup.py build_ext --inplace
+
diff --git a/aeneas/cew/README.md b/aeneas/cew/README.md
new file mode 100644
index 00000000..8925eb4f
--- /dev/null
+++ b/aeneas/cew/README.md
@@ -0,0 +1,22 @@
+# aeneas.cew
+
+**aeneas.cew** is a Python C extension to synthesize text with eSpeak.
+
+## API
+
+See the [__init__.py](__init__.py) file.
+
+## Compiling the Python C extension locally
+
+```bash
+$ python cew_setup.py build_ext --inplace
+```
+
+## Compiling the pure C driver program
+
+```bash
+$ bash 000_compile_driver.sh
+```
+
+
+
diff --git a/aeneas/cew/__init__.py b/aeneas/cew/__init__.py
index 96f3cb04..eea4eea4 100644
--- a/aeneas/cew/__init__.py
+++ b/aeneas/cew/__init__.py
@@ -2,7 +2,56 @@
# coding=utf-8
"""
-aeneas.cew is a Python C extension to synthesize text with eSpeak
+aeneas.cew is a Python C extension to synthesize text with eSpeak.
+
+The functions provided by this module are:
+
+.. function:: cew.synthesize_single(output_file_path, voice_code, text)
+
+ Synthesize a single text fragment into a single WAVE file.
+
+ The returned tuple ``(sr, begin, end)`` contains
+ the sample rate and the begin and end time values
+ of the output WAVE file.
+
+ Note that ``begin`` is always ``0.0``, while ``end`` is equal to the
+ duration of the synthesized WAVE file, in seconds.
+
+ :param string output_file_path: the path of the WAVE file to be created, UTF-8 encoded
+ :param string voice_code: the eSpeak voice code (e.g., ``en``, ``en-gb``, ``it``, etc.)
+ :param string text: the text to be synthesized, UTF-8 encoded
+ :rtype: tuple
+
+
+.. function:: cew.synthesize_multiple(output_file_path, quit_after, backwards, text)
+
+ Synthesize several text fragments into a single WAVE file.
+
+ The returned tuple ``(sr, synt, anchors)`` contains
+ the sample rate of the output WAVE file,
+ the number of fragments actually synthesized,
+ and a list of time values, each representing
+ the begin time in the output WAVE file
+ of the corresponding text fragment.
+
+ Note that if ``quit_after`` is specified,
+ the number ``synt`` of fragments actually synthesized
+ might be less than the number of fragments in ``text``.
+
+ :param string output_file_path: the path of the WAVE file to be created, UTF-8 encoded
+ :param float quit_after: stop synthesizing after reaching the given duration (in seconds)
+ :param int backwards: if nonzero, synthesize backwards, that is,
+ starting from the last fragment.
+ In any case, the fragments in the output WAVE file
+ will be in natural order.
+ This option is meaningful only if ``quit_after > 0``.
+ :param list text: a list of ``(voice_code, fragment_text)`` tuples
+ with the text to be synthesized.
+ The ``voice_code`` is the eSpeak voice code
+ (e.g., ``en``, ``en-gb``, ``it``, etc.).
+ The ``fragment_text`` must be UTF-8 encoded.
+ :rtype: tuple
+
"""
__author__ = "Alberto Pettarin"
@@ -12,7 +61,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL 3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
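The `quit_after` and `backwards` parameters of `synthesize_multiple` interact in a way that is easier to see with numbers. The sketch below models one plausible reading of the docstring. It is not the `cew` implementation. It assumes that fragments are taken from the start (or, with `backwards`, from the end) until their accumulated duration reaches `quit_after`, and that the returned anchors are begin times in natural order. `plan_synthesis` is a hypothetical helper that works on known per-fragment durations instead of calling eSpeak.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of quit_after/backwards: returns the number of
   fragments that would be synthesized and fills anchors[0..synt-1] with
   their begin times (in seconds) in natural order. */
static size_t plan_synthesis(const double *durations, size_t n,
                             double quit_after, int backwards,
                             double *anchors) {
    size_t synt = 0, k, first;
    double total = 0.0;

    assert(durations != NULL && anchors != NULL);
    /* take fragments until the accumulated duration reaches quit_after */
    while (synt < n) {
        size_t idx = backwards ? (n - 1 - synt) : synt;
        total += durations[idx];
        ++synt;
        if ((quit_after > 0.0) && (total >= quit_after)) {
            break;
        }
    }
    /* the selected fragments are a prefix (forwards) or a suffix
       (backwards) of the input, laid out in natural order */
    first = backwards ? (n - synt) : 0;
    total = 0.0;
    for (k = 0; k < synt; ++k) {
        anchors[k] = total; /* begin time of fragment first + k */
        total += durations[first + k];
    }
    return synt;
}
```

Under this model, with durations `{0.5, 0.5, 1.0, 1.0, 0.5}`, `quit_after = 2.0`, and `backwards = 1`, the last three fragments are selected (`synt == 3`) and their anchors are `0.0`, `1.0`, and `2.0`. This matches the docstring's note that `synt` can be smaller than the number of input fragments.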
diff --git a/aeneas/cew/cew_driver.c b/aeneas/cew/cew_driver.c
index 3f6dbd71..65354bc9 100644
--- a/aeneas/cew/cew_driver.c
+++ b/aeneas/cew/cew_driver.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for synthesizing text with eSpeak
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -21,17 +21,26 @@ __status__ = "Production"
#include "cew_func.h"
+#define DRIVER_SUCCESS 0
+#define DRIVER_FAILURE 1
+
// print usage
-void usage(const char *prog) {
+void _usage(const char *prog) {
+ printf("\n");
+ printf("Usage: %s VOICE_CODE TEXT AUDIO_FILE.wav single\n", prog);
+ printf(" %s VOICE_CODE TEXT AUDIO_FILE.wav multi QUIT_AFTER BACKWARDS\n", prog);
+ printf("\n");
+ printf("Example: %s en \"Hello World\" /tmp/out.wav single\n", prog);
+ printf(" %s en \"Hello|World|My|Dear|Friend\" /tmp/out.wav multi 0.0 0\n", prog);
+ printf(" %s en \"Hello|World|My|Dear|Friend\" /tmp/out.wav multi 0.0 1\n", prog);
+ printf(" %s en \"Hello|World|My|Dear|Friend\" /tmp/out.wav multi 2.0 1\n", prog);
printf("\n");
- printf("Usage: %s VOICE_CODE TEXT AUDIO_FILE.wav single\n", prog);
- printf(" %s VOICE_CODE TEXT AUDIO_FILE.wav multi QUIT_AFTER BACKWARDS\n\n", prog);
}
// split a given string using a delimiter character
-// adapted from http://stackoverflow.com/questions/9210528/split-string-with-delimiters-in-c
-char** str_split(char* a_str, const char a_delim, int *count)
-{
+// adapted from
+// http://stackoverflow.com/questions/9210528/split-string-with-delimiters-in-c
+char **_str_split(char* a_str, const char a_delim, int *count) {
char** result = 0;
char* tmp = a_str;
char* last_delim = 0;
@@ -51,8 +60,8 @@ char** str_split(char* a_str, const char a_delim, int *count)
// add space for trailing token
(*count) += last_delim < (a_str + strlen(a_str) - 1);
- result = malloc(sizeof(char*) * (*count));
-
+ // tokenize
+ result = calloc((*count), sizeof(char*));
if (result) {
size_t idx = 0;
char* token = strtok(a_str, delim);
@@ -66,19 +75,6 @@ char** str_split(char* a_str, const char a_delim, int *count)
return result;
}
-//
-// this is a simple driver to test on the command line
-// compile with:
-//
-// gcc cew_driver.c cew_func.c -lespeak -o cew_driver
-//
-// and use it as:
-//
-// ./cew_driver en "Hello World" out.wav single => synth single
-// ./cew_driver en "Hello|World|My|Dear|Friend" out.wav multi 0.0 0 => synth multi normal
-// ./cew_driver en "Hello|World|My|Dear|Friend" out.wav multi 0.0 1 => synth multi normal, quit after reaching 2.0 seconds
-// ./cew_driver en "Hello|World|My|Dear|Friend" out.wav multi 2.0 1 => synth multi backwards, quit after reaching 2.0 seconds
-//
int main(int argc, char **argv) {
const char *voice_code, *text, *output_file_name, *mode;
@@ -88,11 +84,11 @@ int main(int argc, char **argv) {
struct FRAGMENT_INFO *fragments;
char **texts;
int i, n;
- unsigned int synthesized_ret;
+ size_t synthesized_ret;
if (argc < 5) {
- usage(argv[0]);
- return 1;
+ _usage(argv[0]);
+ return DRIVER_FAILURE;
}
voice_code = argv[1];
text = argv[2];
@@ -101,15 +97,15 @@ int main(int argc, char **argv) {
if (strcmp(mode, "multi") == 0) {
if (argc < 7) {
- usage(argv[0]);
- return 1;
+ _usage(argv[0]);
+ return DRIVER_FAILURE;
}
quit_after = (float)atof(argv[5]);
backwards = atoi(argv[6]);
// split text into fragments
n = 0;
- texts = str_split((char *)text, '|', &n);
+ texts = _str_split((char *)text, '|', &n);
// create fragments
fragments = (struct FRAGMENT_INFO *)calloc(sizeof(fragment), n);
@@ -127,12 +123,12 @@ int main(int argc, char **argv) {
backwards,
&sample_rate_ret,
&synthesized_ret
- ) != 0) {
+ ) != CEW_SUCCESS) {
printf("Error while calling _synthesize_single()\n");
- return 1;
+ return DRIVER_FAILURE;
}
printf("Sample rate: %d\n", sample_rate_ret);
- printf("Synthesized: %u\n", synthesized_ret);
+ printf("Synthesized: %lu\n", (unsigned long)synthesized_ret);
for (i = 0; i < synthesized_ret; ++i) {
printf("%d %.3f %.3f\n", i, fragments[i].begin, fragments[i].end);
}
@@ -148,15 +144,15 @@ int main(int argc, char **argv) {
} else {
fragment.voice_code = voice_code;
fragment.text = text;
- if (_synthesize_single(output_file_name, &sample_rate_ret, &fragment) != 0) {
+ if (_synthesize_single(output_file_name, &sample_rate_ret, &fragment) != CEW_SUCCESS) {
printf("Error while calling _synthesize_single()\n");
- return 1;
+ return DRIVER_FAILURE;
}
printf("Sample rate: %d\n", sample_rate_ret);
printf("Begin: %.3f\n", fragment.begin);
printf("End: %.3f\n", fragment.end);
}
- return 0;
+ return DRIVER_SUCCESS;
}
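`_str_split()` first counts the delimiters and then tokenizes the string in place with `strtok()`. Below is a minimal standalone sketch of the same two-pass idea. `split_fragments()` is a hypothetical name, and unlike the driver it tokenizes a private copy so the caller's string stays untouched. `strtok()` treats a run of delimiters as a single separator, so the sketch counts non-empty tokens rather than delimiters. That way the count always matches what `strtok()` returns: for example, `"a||b"` yields two tokens.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a copy of `text` on `delim`.
   Returns a malloc'ed array of pointers into one malloc'ed buffer,
   which is stored in *buffer_out; the caller frees both.
   Empty tokens are skipped, exactly as strtok() does. */
char **split_fragments(const char *text, char delim, size_t *count_out, char **buffer_out) {
    char delims[2] = { delim, '\0' };
    char *copy, *token, **result;
    size_t count = 0, idx = 0, len = strlen(text), i;
    int in_token = 0;

    /* first pass: count the non-empty tokens */
    for (i = 0; i < len; ++i) {
        if (text[i] == delim) {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            ++count;
        }
    }
    copy = malloc(len + 1);
    result = calloc(count > 0 ? count : 1, sizeof(char *));
    if (copy == NULL || result == NULL) {
        free(copy);
        free(result);
        return NULL;
    }
    memcpy(copy, text, len + 1);
    /* second pass: tokenize the private copy in place */
    for (token = strtok(copy, delims); token != NULL; token = strtok(NULL, delims)) {
        result[idx++] = token;
    }
    *count_out = count;
    *buffer_out = copy;
    return result;
}
```

The caller frees both the returned array and `*buffer_out`. Since every token points into the copy, a single buffer avoids one allocation per fragment.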
diff --git a/aeneas/cew/cew_func.c b/aeneas/cew/cew_func.c
index a4481fcb..2cfee22b 100644
--- a/aeneas/cew/cew_func.c
+++ b/aeneas/cew/cew_func.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for synthesizing text with espeak
+Python C Extension for synthesizing text with eSpeak
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -30,8 +30,19 @@ static int sample_rate;
static FILE *wave_file = NULL;
-// write an uint32_t as a big endian int to file
-// that is, least significant first
+/*
+00000000 52 49 46 46 XX XX XX XX 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt |
+00000010 10 00 00 00 01 00 01 00 22 56 00 00 44 ac 00 00 |........"V..D...|
+00000020 02 00 10 00 64 61 74 61 XX XX XX XX |....data.... |
+*/
+static const unsigned char wave_hdr[44] = {
+ 'R' , 'I', 'F' , 'F', 0x2c , 0 , 0 , 0 , 'W' , 'A' , 'V' , 'E' , 'f' , 'm' , 't', ' ',
+ 0x10, 0 , 0 , 0 , 1 , 0 , 1 , 0 , 9 , 0x3d, 0 , 0 , 0x12, 0x7a, 0 , 0 ,
+ 2 , 0 , 0x10 , 0 , 'd' , 'a' , 't' , 'a' , 0 , 0 , 0 , 0
+};
+
+// write a uint32_t as a little-endian int to file
+// that is, least significant byte first
void _write_uint32_t(FILE *f, int value) {
int ix;
for (ix = 0; ix < 4; ix++) {
@@ -45,19 +56,8 @@ void _write_uint32_t(FILE *f, int value) {
// will be set by _close_wave_file()
// once all audio samples are generated
int _open_wave_file(char const *path, int rate) {
- /*
- 00000000 52 49 46 46 XX XX XX XX 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt |
- 00000010 10 00 00 00 01 00 01 00 22 56 00 00 44 ac 00 00 |........"V..D...|
- 00000020 02 00 10 00 64 61 74 61 XX XX XX XX |....data.... |
- */
- static unsigned char wave_hdr[44] = {
- 'R' , 'I', 'F' , 'F', 0x2c , 0 , 0 , 0 , 'W' , 'A' , 'V' , 'E' , 'f' , 'm' , 't', ' ',
- 0x10, 0 , 0 , 0 , 1 , 0 , 1 , 0 , 9 , 0x3d, 0 , 0 , 0x12, 0x7a, 0 , 0 ,
- 2 , 0 , 0x10 , 0 , 'd' , 'a' , 't' , 'a' , 0 , 0 , 0 , 0
- };
-
if (path == NULL) {
- return 2;
+ return CEW_FAILURE;
}
while (isspace(*path)) {
@@ -71,22 +71,22 @@ int _open_wave_file(char const *path, int rate) {
}
if (wave_file == NULL) {
- return 1;
+ return CEW_FAILURE;
}
fwrite(wave_hdr, 1, 24, wave_file);
_write_uint32_t(wave_file, rate);
_write_uint32_t(wave_file, rate * 2);
fwrite(&wave_hdr[32], 1, 12, wave_file);
- return 0;
+ return CEW_SUCCESS;
}
// close wave file
int _close_wave_file(void) {
- unsigned int pos;
+ long pos;
if (wave_file == NULL) {
- return 1;
+ return CEW_FAILURE;
}
// flush and get the current position,
@@ -106,13 +106,13 @@ int _close_wave_file(void) {
fclose(wave_file);
wave_file = NULL;
- return 0;
+ return CEW_SUCCESS;
}
// callback for synth events
int _synth_callback(short *wav, int numsamples, espeak_EVENT *events) {
if (wav == NULL) {
- return 1;
+ return CEW_FAILURE;
}
while (events->type != 0) {
if (events->type == espeakEVENT_SAMPLERATE) {
@@ -128,7 +128,7 @@ int _synth_callback(short *wav, int numsamples, espeak_EVENT *events) {
if (numsamples > 0) {
fwrite(wav, numsamples * 2, 1, wave_file);
}
- return 0;
+ return CEW_SUCCESS;
}
// terminate synthesis and close file
@@ -145,10 +145,10 @@ int _synthesize_string(char const *text) {
espeak_Synth(text, size + 1, 0, POS_CHARACTER, 0, synth_flags, NULL, NULL);
}
if (espeak_Synchronize() != EE_OK) {
- return 1;
+ return CEW_FAILURE;
}
current_time += last_end_time;
- return 0;
+ return CEW_SUCCESS;
}
// set the current language
@@ -159,9 +159,9 @@ int _set_voice_code(char const *voice_code) {
memset(&voice, 0, sizeof(voice));
voice.languages = voice_code;
if (espeak_SetVoiceByProperties(&voice) != EE_OK) {
- return 1;
+ return CEW_FAILURE;
}
- return 0;
+ return CEW_SUCCESS;
}
// initialize the synthesizer
@@ -176,7 +176,8 @@ int _initialize_synthesizer(char const *output_file_path) {
sample_rate = 0;
// synthesizer flags
- synth_flags = espeakCHARS_UTF8 | espeakPHONEMES | espeakENDPAUSE;
+ // TODO let the user control espeakENDPAUSE
+ synth_flags = espeakCHARS_UTF8 | espeakENDPAUSE;
// writing to a file (or no output), we can use synchronous mode
sample_rate = espeak_Initialize(AUDIO_OUTPUT_SYNCHRONOUS, 0, data_path, 0);
@@ -217,8 +218,8 @@ int _initialize_synthesizer(char const *output_file_path) {
// open wave file
if (wave_file == NULL) {
- if(_open_wave_file(output_file_path, sample_rate) != 0) {
- return 1;
+ if(_open_wave_file(output_file_path, sample_rate) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
}
@@ -226,52 +227,52 @@ int _initialize_synthesizer(char const *output_file_path) {
current_time = 0.0;
last_end_time = 0.0;
- return 0;
+ return CEW_SUCCESS;
}
// synthesize a single text fragment
int _synthesize_single(
const char *output_file_path,
int *sample_rate_ret,
- struct FRAGMENT_INFO *fragment
+ struct FRAGMENT_INFO *fragment_ret
) {
// open output wave file
- if (_initialize_synthesizer(output_file_path) != 0) {
- return 1;
+ if (_initialize_synthesizer(output_file_path) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
// set voice code
- if (_set_voice_code((*fragment).voice_code) != 0) {
- return 1;
+ if (_set_voice_code((*fragment_ret).voice_code) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
// synthesize text
*sample_rate_ret = sample_rate;
- (*fragment).begin = current_time;
- if (_synthesize_string((*fragment).text) != 0) {
- return 1;
+ (*fragment_ret).begin = current_time;
+ if (_synthesize_string((*fragment_ret).text) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
- (*fragment).end = current_time;
+ (*fragment_ret).end = current_time;
// close output wave file
_terminate_synthesis();
- return 0;
+ return CEW_SUCCESS;
}
// synthesize multiple fragments
int _synthesize_multiple(
const char *output_file_path,
- struct FRAGMENT_INFO **ret,
- const int number_of_fragments,
+ struct FRAGMENT_INFO **fragments_ret,
+ const size_t number_of_fragments,
const float quit_after,
const int backwards,
int *sample_rate_ret,
- unsigned int *synthesized_ret
+ size_t *synthesized_ret
) {
- int i, synthesized, start;
+ size_t i, synthesized, start;
start = 0;
@@ -280,23 +281,27 @@ int _synthesize_multiple(
// from the back we need to reach quit_after seconds of audio
// open output wave file
- if (_initialize_synthesizer(output_file_path) != 0) {
- return 1;
+ if (_initialize_synthesizer(output_file_path) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
// synthesize from the back
- for (i = number_of_fragments - 1; i >= 0 ; --i) {
- if (_set_voice_code((*ret)[i].voice_code) != 0) {
- return 1;
+ for (i = number_of_fragments - 1; number_of_fragments > 0; --i) {
+ if (_set_voice_code((*fragments_ret)[i].voice_code) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
- if (_synthesize_string((*ret)[i].text) != 0) {
- return 1;
+ if (_synthesize_string((*fragments_ret)[i].text) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
start = i;
// check if we generated >= quit_after seconds of audio
if (current_time >= quit_after) {
break;
}
+ // stop after fragment 0: i is size_t (unsigned), so an "i >= 0" test would always be true
+ if (i == 0) {
+ break;
+ }
}
// close output wave file
@@ -304,8 +309,8 @@ int _synthesize_multiple(
}
// open output wave file
- if (_initialize_synthesizer(output_file_path) != 0) {
- return 1;
+ if (_initialize_synthesizer(output_file_path) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
// number of synthesized fragments
@@ -313,8 +318,8 @@ int _synthesize_multiple(
// loop over all input fragments
for (i = start; i < number_of_fragments; ++i) {
- if (_set_voice_code((*ret)[i].voice_code) != 0) {
- return 1;
+ if (_set_voice_code((*fragments_ret)[i].voice_code) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
// NOTE: if backwards, we move the anchor times to the first fragments,
@@ -322,11 +327,11 @@ int _synthesize_multiple(
// despite the fact that they will not be saved with the "correct" text
// this trick avoids copying data around
// if backwards, the user is not expected to use the time anchors anyway
- (*ret)[i-start].begin = current_time;
- if (_synthesize_string((*ret)[i].text) != 0) {
- return 1;
+ (*fragments_ret)[i-start].begin = current_time;
+ if (_synthesize_string((*fragments_ret)[i].text) != CEW_SUCCESS) {
+ return CEW_FAILURE;
}
- (*ret)[i-start].end = current_time;
+ (*fragments_ret)[i-start].end = current_time;
synthesized += 1;
// check if we generated >= quit_after seconds of audio
@@ -342,7 +347,7 @@ int _synthesize_multiple(
*sample_rate_ret = sample_rate;
*synthesized_ret = synthesized;
- return 0;
+ return CEW_SUCCESS;
}
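`_open_wave_file()` writes the 44-byte `wave_hdr` template with the sample rate and byte rate filled in. The two size fields, at offsets 4 and 40, can only be known once all samples are written. The sketch below builds a complete header in memory. It assumes that `_close_wave_file()` patches those two fields with the file length minus 8 and minus 44, following the standard RIFF layout. `build_wave_header()` and `put_le32()` are hypothetical names. At 22050 Hz the rate and byte-rate fields come out as `22 56 00 00` and `44 ac 00 00`, which matches the hex dump in the comment above `wave_hdr`.

```c
#include <stdint.h>
#include <string.h>

/* Same 44-byte template as wave_hdr in cew_func.c (mono, 16-bit PCM). */
static const unsigned char WAVE_TEMPLATE[44] = {
    'R', 'I', 'F', 'F', 0x2c, 0, 0, 0, 'W', 'A', 'V', 'E', 'f', 'm', 't', ' ',
    0x10, 0, 0, 0, 1, 0, 1, 0, 9, 0x3d, 0, 0, 0x12, 0x7a, 0, 0,
    2, 0, 0x10, 0, 'd', 'a', 't', 'a', 0, 0, 0, 0
};

/* Store a 32-bit value least significant byte first. */
static void put_le32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Hypothetical helper: header for `data_bytes` bytes of mono 16-bit
   samples at `rate` Hz. Offsets 24 and 28 are what _open_wave_file()
   fills in; offsets 4 and 40 are assumed to be what _close_wave_file()
   patches once the final file length is known. */
void build_wave_header(unsigned char out[44], uint32_t rate, uint32_t data_bytes) {
    memcpy(out, WAVE_TEMPLATE, 44);
    put_le32(out + 24, rate);            /* sample rate */
    put_le32(out + 28, rate * 2);        /* byte rate: 1 channel x 2 bytes */
    put_le32(out + 4, 36 + data_bytes);  /* RIFF chunk size = file size - 8 */
    put_le32(out + 40, data_bytes);      /* data chunk size = file size - 44 */
}
```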
diff --git a/aeneas/cew/cew_func.h b/aeneas/cew/cew_func.h
index 58749679..7010d19c 100644
--- a/aeneas/cew/cew_func.h
+++ b/aeneas/cew/cew_func.h
@@ -1,6 +1,6 @@
/*
-Python C Extension for synthesizing text with espeak
+Python C Extension for synthesizing text with eSpeak
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,12 +9,15 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
*/
+#define CEW_SUCCESS 0
+#define CEW_FAILURE 1
+
struct FRAGMENT_INFO {
float begin;
float end;
@@ -22,22 +25,49 @@ struct FRAGMENT_INFO {
const char *text;
};
-// synthesize a single text fragment
+/*
+ Synthesize a single text fragment,
+ described by the FRAGMENT_INFO fragment_ret,
+ creating a WAVE file at output_file_path.
+
+ The sample rate of the output WAVE file is stored
+ in sample_rate_ret, and the begin and end times
+ are stored in the begin and end attributes of
+ fragment_ret.
+*/
int _synthesize_single(
const char *output_file_path,
- int *sample_rate_ret,
- struct FRAGMENT_INFO *ret
+ int *sample_rate_ret, // int because the espeak lib returns it as such
+ struct FRAGMENT_INFO *fragment_ret
);
-// synthesize multiple fragments
+/*
+ Synthesize multiple text fragments,
+ described by the FRAGMENT_INFO fragments_ret array,
+ creating a WAVE file at output_file_path.
+
+ If quit_after > 0, the synthesis stops as soon as
+ the total duration reaches quit_after seconds.
+
+ If backwards != 0, the synthesis is done
+ backwards, from the end of the fragments array.
+ This option is meaningful only if quit_after > 0,
+ otherwise it has no effect.
+
+ The sample rate of the output WAVE file is stored
+ in sample_rate_ret, the number of synthesized fragments
+ in synthesized_ret, and the begin and end times
+ in the begin and end attributes of the first
+ synthesized_ret elements of fragments_ret.
+*/
int _synthesize_multiple(
const char *output_file_path,
- struct FRAGMENT_INFO **ret,
- const int number_of_fragments,
+ struct FRAGMENT_INFO **fragments_ret,
+ const size_t number_of_fragments,
const float quit_after,
const int backwards,
- int *sample_rate_ret,
- unsigned int *synthesized_ret
+ int *sample_rate_ret, // int because the espeak lib returns it as such
+ size_t *synthesized_ret
);
diff --git a/aeneas/cew/cew_py.c b/aeneas/cew/cew_py.c
index d41d3c43..77661b60 100644
--- a/aeneas/cew/cew_py.c
+++ b/aeneas/cew/cew_py.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for synthesizing text with espeak
+Python C Extension for synthesizing text with eSpeak
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -28,7 +28,7 @@ static PyObject *synthesize_single(PyObject *self, PyObject *args) {
PyObject *tuple;
char const *output_file_path;
struct FRAGMENT_INFO ret;
- int sample_rate;
+ int sample_rate; // int because espeak lib returns it as such
// s = string
if (!PyArg_ParseTuple(args, "sss", &output_file_path, &ret.voice_code, &ret.text)) {
@@ -42,7 +42,6 @@ static PyObject *synthesize_single(PyObject *self, PyObject *args) {
}
// build the tuple to be returned
- // NOTE: returning sample_rate as an int, as the espeak lib does
tuple = PyTuple_New(3);
PyTuple_SetItem(tuple, 0, Py_BuildValue("i", sample_rate));
PyTuple_SetItem(tuple, 1, Py_BuildValue("f", ret.begin));
@@ -60,8 +59,8 @@ static PyObject *synthesize_multiple(PyObject *self, PyObject *args) {
int const backwards;
struct FRAGMENT_INFO *fragments_synt;
- int sample_rate;
- unsigned int number_of_fragments, i, synthesized;
+ int sample_rate; // int because espeak lib returns it as such
+ size_t number_of_fragments, i, synthesized;
// s = string
// f = float
@@ -107,7 +106,8 @@ static PyObject *synthesize_multiple(PyObject *self, PyObject *args) {
number_of_fragments,
quit_after,
backwards,
- &sample_rate, &synthesized
+ &sample_rate,
+ &synthesized
) != 0) {
PyErr_SetString(PyExc_ValueError, "Error while synthesizing multiple fragments");
free((void*)fragments_synt);
@@ -150,13 +150,22 @@ static PyMethodDef cew_methods[] = {
"synthesize_single",
synthesize_single,
METH_VARARGS,
- "Synthesize a single text fragment with espeak"
+ "Synthesize a single text fragment with eSpeak\n"
+ ":param string output_file_path: the path of the WAVE file to be created\n"
+ ":param string voice_code: the voice code of the language to be used\n"
+ ":param string text: the text to be synthesized\n"
+ ":rtype: tuple (sample_rate, begin, end)"
},
{
"synthesize_multiple",
synthesize_multiple,
METH_VARARGS,
- "Synthesize multiple text fragments with espeak"
+ "Synthesize multiple text fragments with eSpeak\n"
+ ":param string output_file_path: the path of the WAVE file to be created\n"
+ ":param float quit_after: if > 0, stop synthesizing when reaching quit_after seconds\n"
+ ":param int backwards: if 1, synthesize backwards, from the last fragment to the first\n"
+ ":param list fragments: list of (voice_code, text) tuples of text fragments to be synthesized\n"
+ ":rtype: tuple (sample_rate, synthesized, list) where list is a list of (begin, end) time values"
},
{
NULL,
diff --git a/aeneas/cew/cew_setup.py b/aeneas/cew/cew_setup.py
index d0f52ba1..41de70a1 100644
--- a/aeneas/cew/cew_setup.py
+++ b/aeneas/cew/cew_setup.py
@@ -2,7 +2,7 @@
# coding=utf-8
"""
-Compile the Python C Extension for synthesizing text with espeak.
+Compile the Python C Extension for synthesizing text with eSpeak.
.. versionadded:: 1.3.0
"""
@@ -21,7 +21,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -29,9 +29,9 @@
setup(
name="cew",
- version="1.4.1",
+ version="1.5.0",
description="""
- Python C Extension for synthesizing text with espeak.
+ Python C Extension for synthesizing text with eSpeak.
""",
ext_modules=[CMODULE]
)
diff --git a/aeneas/cewsubprocess.py b/aeneas/cewsubprocess.py
new file mode 100644
index 00000000..1b69db80
--- /dev/null
+++ b/aeneas/cewsubprocess.py
@@ -0,0 +1,208 @@
+#!/usr/bin/env python
+# coding=utf-8
+
+"""
+This module contains the following classes:
+
+* :class:`aeneas.cewsubprocess.CEWSubprocess`, a helper class
+ that executes the :mod:`aeneas.cew` C extension
+ in a separate process via ``subprocess``.
+
+This module works around a problem with the ``eSpeak`` library,
+which seems to generate different audio data for the same
+input parameters/text, when run multiple times in the same process.
+See the following discussions for details:
+
+#. https://groups.google.com/d/msg/aeneas-forced-alignment/NLbtSRf2_vg/mMHuTQiFEgAJ
+#. https://sourceforge.net/p/espeak/mailman/message/34861696/
+
+.. warning:: This module might be removed in a future version.
+
+.. versionadded:: 1.5.0
+"""
+
+from __future__ import absolute_import
+from __future__ import print_function
+import io
+import subprocess
+import sys
+
+from aeneas.logger import Loggable
+from aeneas.runtimeconfiguration import RuntimeConfiguration
+from aeneas.timevalue import TimeValue
+import aeneas.globalfunctions as gf
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL v3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+class CEWSubprocess(Loggable):
+ """
+ This helper class executes the ``aeneas.cew`` C extension
+ in a separate process by running
+ the :func:`aeneas.cewsubprocess.main` function
+ via ``subprocess``.
+
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param logger: the logger object
+ :type logger: :class:`~aeneas.logger.Logger`
+ """
+
+ TAG = u"CEWSubprocess"
+
+ def synthesize_single(self, audio_file_path, voice_code, text):
+ """
+ Create a ``wav`` audio file containing the synthesized text.
+
+ The ``text`` must be a unicode string encodable with UTF-8,
+ otherwise ``espeak`` might fail.
+
+ Return the duration of the synthesized audio, in seconds, or ``None`` if no audio was synthesized.
+
+ :param string audio_file_path: the path of the output audio file
+ :param string voice_code: the code of the voice to use
+ :param string text: the text to synthesize
+ :rtype: :class:`~aeneas.timevalue.TimeValue`
+ """
+ u_text = [(voice_code, text)]
+ sr, sf, intervals = self.synthesize_multiple(audio_file_path, 0, 0, u_text)
+ if len(intervals) > 0:
+ return intervals[0][1]
+ return None
+
+ def synthesize_multiple(self, audio_file_path, c_quit_after, c_backwards, u_text):
+ """
+ Synthesize the text contained in the given fragment list
+ into a ``wav`` file.
+
+ :param string audio_file_path: the path to the output audio file
+ :param float c_quit_after: if > 0, stop synthesizing as soon as
+ this many seconds of audio have been generated
+ :param bool c_backwards: if true, synthesize starting from the end of ``u_text``
+ :param object u_text: a list of ``(voice_code, text)`` tuples
+ :rtype: tuple ``(sample_rate, synthesized, intervals)``
+ """
+ self.log([u"Audio file path: '%s'", audio_file_path])
+ self.log([u"c_quit_after: '%.3f'", c_quit_after])
+ self.log([u"c_backwards: '%d'", c_backwards])
+
+ text_file_handler, text_file_path = gf.tmp_file()
+ data_file_handler, data_file_path = gf.tmp_file()
+ self.log([u"Temporary text file path: '%s'", text_file_path])
+ self.log([u"Temporary data file path: '%s'", data_file_path])
+
+ self.log(u"Populating the text file...")
+ with io.open(text_file_path, "w", encoding="utf-8") as tmp_text_file:
+ for f_voice_code, f_text in u_text:
+ tmp_text_file.write(u"%s %s\n" % (f_voice_code, f_text))
+ self.log(u"Populating the text file... done")
+
+ arguments = [
+ self.rconf[RuntimeConfiguration.CEW_SUBPROCESS_PATH],
+ "-m",
+ "aeneas.cewsubprocess",
+ "%.3f" % c_quit_after,
+ "%d" % c_backwards,
+ text_file_path,
+ audio_file_path,
+ data_file_path
+ ]
+ self.log([u"Calling with arguments '%s'", u" ".join(arguments)])
+ proc = subprocess.Popen(
+ arguments,
+ stdout=subprocess.PIPE,
+ stdin=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ universal_newlines=True)
+ proc.communicate()
+
+ self.log(u"Reading output data...")
+ with io.open(data_file_path, "r", encoding="utf-8") as data_file:
+ lines = data_file.read().splitlines()
+ sr = int(lines[0])
+ sf = int(lines[1])
+ intervals = []
+ for line in lines[2:]:
+ values = line.split(u" ")
+ if len(values) == 2:
+ intervals.append((TimeValue(values[0]), TimeValue(values[1])))
+ self.log(u"Reading output data... done")
+
+ self.log(u"Deleting text and data files...")
+ gf.delete_file(text_file_handler, text_file_path)
+ gf.delete_file(data_file_handler, data_file_path)
+ self.log(u"Deleting text and data files... done")
+
+ return (sr, sf, intervals)
+
+
+
+def main():
+ """
+ Run ``aeneas.cew``, reading input text from file and writing audio and interval data to file.
+ """
+
+ # make sure we have enough parameters
+ if len(sys.argv) < 6:
+ print("You must pass five arguments: QUIT_AFTER BACKWARDS TEXT_FILE_PATH AUDIO_FILE_PATH DATA_FILE_PATH")
+ return 1
+
+ # read parameters
+ c_quit_after = float(sys.argv[1]) # NOTE: cew needs float, not TimeValue
+ c_backwards = int(sys.argv[2])
+ text_file_path = sys.argv[3]
+ audio_file_path = sys.argv[4]
+ data_file_path = sys.argv[5]
+
+ # read (voice_code, text) from file
+ s_text = []
+ with io.open(text_file_path, "r", encoding="utf-8") as text:
+ for line in text.readlines():
+ # NOTE: not using strip() to avoid removing trailing blank characters
+ line = line.replace(u"\n", u"").replace(u"\r", u"")
+ idx = line.find(" ")
+ if idx > 0:
+ f_voice_code = line[:idx]
+ f_text = line[idx+1:]
+ #print("%s => '%s' and '%s'" % (line, f_voice_code, f_text))
+ s_text.append((f_voice_code, f_text))
+
+ # convert to bytes (Python 2) or unicode (Python 3), as expected by the cew C extension
+ c_text = []
+ if gf.PY2:
+ for f_voice_code, f_text in s_text:
+ c_text.append((gf.safe_bytes(f_voice_code), gf.safe_bytes(f_text)))
+ else:
+ for f_voice_code, f_text in s_text:
+ c_text.append((gf.safe_unicode(f_voice_code), gf.safe_unicode(f_text)))
+
+ try:
+ import aeneas.cew.cew
+ sr, sf, intervals = aeneas.cew.cew.synthesize_multiple(
+ audio_file_path,
+ c_quit_after,
+ c_backwards,
+ c_text
+ )
+ with io.open(data_file_path, "w", encoding="utf-8") as data:
+ data.write(u"%d\n" % (sr))
+ data.write(u"%d\n" % (sf))
+ data.write(u"\n".join([u"%.3f %.3f" % (i[0], i[1]) for i in intervals]))
+ except Exception as exc:
+ print(u"Unexpected error: %s" % str(exc))
+
+
+
+if __name__ == "__main__":
+ main()
+
+
+
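`CEWSubprocess` communicates with the child process through two plain-text files. The text file holds one `VOICE_CODE TEXT` pair per line. The data file holds the sample rate, the number of synthesized fragments, and one `begin end` pair per line. Because the format is line-based, it is easy to read from C as well. The sketch below mirrors the parsing in `main()`: it splits each line at the first space, rejects lines where the space is missing or comes first, and keeps trailing blanks. `parse_fragment_line()` is a hypothetical helper. The Python code removes every CR/LF character, whereas this sketch strips them only at the end of the line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C counterpart of the line parsing in cewsubprocess.main():
   split "VOICE_CODE TEXT" at the first space, in place.
   Returns 1 on success, 0 if there is no space or the line starts with one. */
int parse_fragment_line(char *line, const char **voice_code, const char **text) {
    char *space;
    size_t len = strlen(line);

    /* drop trailing CR/LF only, keeping any other trailing blanks */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        line[--len] = '\0';
    }
    space = strchr(line, ' ');
    if (space == NULL || space == line) {
        return 0;  /* mirrors the "idx > 0" check */
    }
    *space = '\0';
    *voice_code = line;
    *text = space + 1;
    return 1;
}
```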
diff --git a/aeneas/cint/README.md b/aeneas/cint/README.md
new file mode 100644
index 00000000..f08a006f
--- /dev/null
+++ b/aeneas/cint/README.md
@@ -0,0 +1,6 @@
+# aeneas.cint
+
+This directory contains portable
+fixed-size int type definitions and functions
+for the other Python C extensions.
+
diff --git a/aeneas/cint/__init__.py b/aeneas/cint/__init__.py
new file mode 100644
index 00000000..7087cd97
--- /dev/null
+++ b/aeneas/cint/__init__.py
@@ -0,0 +1,23 @@
+#!/usr/bin/env python
+# coding=utf-8
+
+"""
+aeneas.cint contains portable
+fixed-size int type definitions and functions
+for the other Python C extensions.
+"""
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL v3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+
+
+
diff --git a/aeneas/cint/cint.c b/aeneas/cint/cint.c
new file mode 100644
index 00000000..99541bfc
--- /dev/null
+++ b/aeneas/cint/cint.c
@@ -0,0 +1,111 @@
+/*
+
+Portable fixed-size int definitions for the other Python C extensions.
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL v3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+*/
+
+#include "cint.h"
+
+uint8_t le_u8_to_cpu(const unsigned char *buf) {
+ return (uint8_t)buf[0];
+}
+uint8_t be_u8_to_cpu(const unsigned char *buf) {
+ return (uint8_t)buf[0];
+}
+uint16_t le_u16_to_cpu(const unsigned char *buf) {
+ return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8);
+}
+uint16_t be_u16_to_cpu(const unsigned char *buf) {
+ return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8);
+}
+uint32_t le_u32_to_cpu(const unsigned char *buf) {
+ return ((uint32_t)buf[0]) | (((uint32_t)buf[1]) << 8) | (((uint32_t)buf[2]) << 16) | (((uint32_t)buf[3]) << 24);
+}
+uint32_t be_u32_to_cpu(const unsigned char *buf) {
+ return ((uint32_t)buf[3]) | (((uint32_t)buf[2]) << 8) | (((uint32_t)buf[1]) << 16) | (((uint32_t)buf[0]) << 24);
+}
+
+int8_t le_s8_to_cpu(const unsigned char *buf) {
+ return (uint8_t)buf[0];
+}
+int8_t be_s8_to_cpu(const unsigned char *buf) {
+ return (uint8_t)buf[0];
+}
+int16_t le_s16_to_cpu(const unsigned char *buf) {
+ return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8);
+}
+int16_t be_s16_to_cpu(const unsigned char *buf) {
+ return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8);
+}
+int32_t le_s32_to_cpu(const unsigned char *buf) {
+ return ((uint32_t)buf[0]) | (((uint32_t)buf[1]) << 8) | (((uint32_t)buf[2]) << 16) | (((uint32_t)buf[3]) << 24);
+}
+int32_t be_s32_to_cpu(const unsigned char *buf) {
+ return ((uint32_t)buf[3]) | (((uint32_t)buf[2]) << 8) | (((uint32_t)buf[1]) << 16) | (((uint32_t)buf[0]) << 24);
+}
+
+void cpu_to_le_u8(unsigned char *buf, uint8_t val) {
+ buf[0] = (val & 0xFF);
+}
+void cpu_to_be_u8(unsigned char *buf, uint8_t val) {
+ buf[0] = (val & 0xFF);
+}
+void cpu_to_le_u16(unsigned char *buf, uint16_t val) {
+ buf[0] = (val & 0x00FF);
+ buf[1] = (val & 0xFF00) >> 8;
+}
+void cpu_to_be_u16(unsigned char *buf, uint16_t val) {
+ buf[0] = (val & 0xFF00) >> 8;
+ buf[1] = (val & 0x00FF);
+}
+void cpu_to_le_u32(unsigned char *buf, uint32_t val) {
+ buf[0] = (val & 0x000000FF);
+ buf[1] = (val & 0x0000FF00) >> 8;
+ buf[2] = (val & 0x00FF0000) >> 16;
+ buf[3] = (val & 0xFF000000) >> 24;
+}
+void cpu_to_be_u32(unsigned char *buf, uint32_t val) {
+ buf[0] = (val & 0xFF000000) >> 24;
+ buf[1] = (val & 0x00FF0000) >> 16;
+ buf[2] = (val & 0x0000FF00) >> 8;
+ buf[3] = (val & 0x000000FF);
+}
+
+void cpu_to_le_s8(unsigned char *buf, int8_t val) {
+ buf[0] = (val & 0xFF);
+}
+void cpu_to_be_s8(unsigned char *buf, int8_t val) {
+ buf[0] = (val & 0xFF);
+}
+void cpu_to_le_s16(unsigned char *buf, int16_t val) {
+ buf[0] = (val & 0x00FF);
+ buf[1] = (val & 0xFF00) >> 8;
+}
+void cpu_to_be_s16(unsigned char *buf, int16_t val) {
+ buf[0] = (val & 0xFF00) >> 8;
+ buf[1] = (val & 0x00FF);
+}
+void cpu_to_le_s32(unsigned char *buf, int32_t val) {
+ buf[0] = (val & 0x000000FF);
+ buf[1] = (val & 0x0000FF00) >> 8;
+ buf[2] = (val & 0x00FF0000) >> 16;
+ buf[3] = (val & 0xFF000000) >> 24;
+}
+void cpu_to_be_s32(unsigned char *buf, int32_t val) {
+ buf[0] = (val & 0xFF000000) >> 24;
+ buf[1] = (val & 0x00FF0000) >> 16;
+ buf[2] = (val & 0x0000FF00) >> 8;
+ buf[3] = (val & 0x000000FF);
+}
+
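The `cint.c` helpers build and read multi-byte integers with shifts and masks instead of casting the buffer to a wider pointer type. As a result, they produce the same bytes on little-endian and big-endian hosts and never perform an unaligned load. The sketch below copies the 32-bit pair under hypothetical `sketch_` names, so it compiles without `cint.c`, and shows the two byte orders for `0x11223344`.

```c
#include <stdint.h>

/* Copies of cint.c's 32-bit helpers, renamed so this compiles on its own. */
void sketch_cpu_to_le_u32(unsigned char *buf, uint32_t val) {
    buf[0] = (unsigned char)(val & 0x000000FF);
    buf[1] = (unsigned char)((val & 0x0000FF00) >> 8);
    buf[2] = (unsigned char)((val & 0x00FF0000) >> 16);
    buf[3] = (unsigned char)((val & 0xFF000000) >> 24);
}
uint32_t sketch_le_u32_to_cpu(const unsigned char *buf) {
    return ((uint32_t)buf[0]) | (((uint32_t)buf[1]) << 8) |
           (((uint32_t)buf[2]) << 16) | (((uint32_t)buf[3]) << 24);
}
void sketch_cpu_to_be_u32(unsigned char *buf, uint32_t val) {
    buf[0] = (unsigned char)((val & 0xFF000000) >> 24);
    buf[1] = (unsigned char)((val & 0x00FF0000) >> 16);
    buf[2] = (unsigned char)((val & 0x0000FF00) >> 8);
    buf[3] = (unsigned char)(val & 0x000000FF);
}
uint32_t sketch_be_u32_to_cpu(const unsigned char *buf) {
    return ((uint32_t)buf[3]) | (((uint32_t)buf[2]) << 8) |
           (((uint32_t)buf[1]) << 16) | (((uint32_t)buf[0]) << 24);
}
```

Reading a big-endian buffer with the little-endian decoder yields the byte-swapped value. This mismatch is the usual symptom of using the wrong helper for a file format.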
diff --git a/aeneas/cint/cint.h b/aeneas/cint/cint.h
new file mode 100644
index 00000000..a8abed0e
--- /dev/null
+++ b/aeneas/cint/cint.h
@@ -0,0 +1,58 @@
+/*
+
+Portable fixed-size int definitions for the other Python C extensions.
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL v3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+*/
+
+#ifdef _MSC_VER
+typedef __int8 int8_t;
+typedef __int16 int16_t;
+typedef __int32 int32_t;
+typedef __int64 int64_t;
+typedef unsigned __int8 uint8_t;
+typedef unsigned __int16 uint16_t;
+typedef unsigned __int32 uint32_t;
+typedef unsigned __int64 uint64_t;
+#else
+#include <stdint.h>
+#endif
+
+uint8_t le_u8_to_cpu(const unsigned char *buf);
+uint8_t be_u8_to_cpu(const unsigned char *buf);
+uint16_t le_u16_to_cpu(const unsigned char *buf);
+uint16_t be_u16_to_cpu(const unsigned char *buf);
+uint32_t le_u32_to_cpu(const unsigned char *buf);
+uint32_t be_u32_to_cpu(const unsigned char *buf);
+
+int8_t le_s8_to_cpu(const unsigned char *buf);
+int8_t be_s8_to_cpu(const unsigned char *buf);
+int16_t le_s16_to_cpu(const unsigned char *buf);
+int16_t be_s16_to_cpu(const unsigned char *buf);
+int32_t le_s32_to_cpu(const unsigned char *buf);
+int32_t be_s32_to_cpu(const unsigned char *buf);
+
+void cpu_to_le_u8(unsigned char *buf, uint8_t val);
+void cpu_to_be_u8(unsigned char *buf, uint8_t val);
+void cpu_to_le_u16(unsigned char *buf, uint16_t val);
+void cpu_to_be_u16(unsigned char *buf, uint16_t val);
+void cpu_to_le_u32(unsigned char *buf, uint32_t val);
+void cpu_to_be_u32(unsigned char *buf, uint32_t val);
+
+void cpu_to_le_s8(unsigned char *buf, int8_t val);
+void cpu_to_be_s8(unsigned char *buf, int8_t val);
+void cpu_to_le_s16(unsigned char *buf, int16_t val);
+void cpu_to_be_s16(unsigned char *buf, int16_t val);
+void cpu_to_le_s32(unsigned char *buf, int32_t val);
+void cpu_to_be_s32(unsigned char *buf, int32_t val);
+
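A typical use of the signed little-endian decoders is reading 16-bit PCM samples from a WAVE file. The sketch below decodes each sample and scales it by 1/32768 into [-1.0, 1.0). The helper names are hypothetical, and the scaling convention is an assumption, because this diff does not show how the other extensions normalize samples. `le_s16_to_cpu()` relies on an implementation-defined unsigned-to-signed conversion. By contrast, this decoder explicitly maps values of 0x8000 and above to negative numbers.

```c
#include <stddef.h>

/* Hypothetical decoder: little-endian two's complement 16-bit value,
   computed without relying on an implementation-defined conversion. */
static long sketch_le_s16(const unsigned char *buf) {
    unsigned long u = (unsigned long)buf[0] | ((unsigned long)buf[1] << 8);
    return (u >= 0x8000UL) ? (long)u - 65536L : (long)u;
}

/* Hypothetical helper: convert n mono 16-bit samples to doubles in [-1.0, 1.0). */
void pcm16le_to_double(const unsigned char *bytes, size_t n, double *out) {
    size_t i;
    for (i = 0; i < n; ++i) {
        out[i] = (double)sketch_le_s16(bytes + 2 * i) / 32768.0;
    }
}
```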
diff --git a/aeneas/cmfcc/000_compile_driver.sh b/aeneas/cmfcc/000_compile_driver.sh
new file mode 100644
index 00000000..63fc926d
--- /dev/null
+++ b/aeneas/cmfcc/000_compile_driver.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+
+gcc cmfcc_driver.c cmfcc_func.c cwave_func.c cint.c -o cmfcc_driver_wo_fo -Wall -pedantic -std=c99 -lm
+gcc cmfcc_driver.c cmfcc_func.c cwave_func.c cint.c -o cmfcc_driver_wo_ff -Wall -pedantic -std=c99 -lm -lrfftw -lfftw -DUSE_FFTW
+gcc cmfcc_driver.c cmfcc_func.c cwave_func.c cint.c -o cmfcc_driver_ws_fo -Wall -pedantic -std=c99 -lm -lsndfile -DUSE_SNDFILE
+gcc cmfcc_driver.c cmfcc_func.c cwave_func.c cint.c -o cmfcc_driver_ws_ff -Wall -pedantic -std=c99 -lm -lsndfile -lrfftw -lfftw -DUSE_SNDFILE -DUSE_FFTW
+
+
diff --git a/aeneas/cmfcc/100_run_driver.sh b/aeneas/cmfcc/100_run_driver.sh
new file mode 100644
index 00000000..6c3fdb8b
--- /dev/null
+++ b/aeneas/cmfcc/100_run_driver.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+
+if [ ! -e cmfcc_driver_wo_fo ]
+then
+ bash 000_compile_driver.sh
+fi
+
+echo "Run 1"
+./cmfcc_driver_wo_fo
+echo ""
+
+echo "Run 2"
+./cmfcc_driver_wo_fo ../tools/res/audio.wav /tmp/out.dt.bin data text
+echo ""
+
+echo "Run 3"
+./cmfcc_driver_wo_fo ../tools/res/audio.wav /tmp/out.db.bin data binary
+echo ""
+
+echo "Run 4"
+./cmfcc_driver_wo_fo ../tools/res/audio.wav /tmp/out.ft.bin file text
+echo ""
+
+echo "Run 5"
+./cmfcc_driver_wo_fo ../tools/res/audio.wav /tmp/out.fb.bin file binary
+echo ""
+
+
+
diff --git a/aeneas/cmfcc/800_compile_py.sh b/aeneas/cmfcc/800_compile_py.sh
new file mode 100644
index 00000000..94186b22
--- /dev/null
+++ b/aeneas/cmfcc/800_compile_py.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+rm -rf build *.so
+python cmfcc_setup.py build_ext --inplace
+
diff --git a/aeneas/cmfcc/README.md b/aeneas/cmfcc/README.md
new file mode 100644
index 00000000..63244e2d
--- /dev/null
+++ b/aeneas/cmfcc/README.md
@@ -0,0 +1,22 @@
+# aeneas.cmfcc
+
+**aeneas.cmfcc** is a Python C extension to extract MFCCs from a WAVE mono file.
+
+## API
+
+See the [__init__.py](__init__.py) file.
+
+## Compiling the Python C extension locally
+
+```bash
+$ python cmfcc_setup.py build_ext --inplace
+```
+
+## Compiling the pure C driver program
+
+```bash
+$ bash 000_compile_driver.sh
+```
+
+
+
diff --git a/aeneas/cmfcc/__init__.py b/aeneas/cmfcc/__init__.py
index 4cddbc42..16ea6cb7 100644
--- a/aeneas/cmfcc/__init__.py
+++ b/aeneas/cmfcc/__init__.py
@@ -2,7 +2,53 @@
# coding=utf-8
"""
-aeneas.cmfcc is a Python C extension to extract MFCCs from a wave file
+aeneas.cmfcc is a Python C Extension for computing the MFCCs from a WAVE mono file.
+
+.. function:: cmfcc.compute_from_data(data, sample_rate, filter_bank_size, mfcc_size, fft_order, lower_frequency, upper_frequency, emphasis_factor, window_length, window_shift)
+
+ Compute MFCCs for a given WAVE mono file,
+ passed as a NumPy 1D array of ``float64`` values in ``[-1.0, 1.0]``.
+
+ The returned tuple ``(mfcc, length, sr)`` contains
+ the MFCCs as a NumPy 2D matrix of shape ``(n, mfcc_size)`` (``n`` being the number of frames),
+ followed by the number of samples and the sample rate of the WAVE file.
+
+ The last two elements ``length`` and ``sr``
+ are returned to make the signature of this function
+ consistent with that of function :func:`cmfcc.compute_from_file`.
+
+ :param data: the audio data
+ :type data: :class:`numpy.ndarray` (1D)
+ :param int sample_rate: the audio sample rate
+ :param int filter_bank_size: the number of Mel filters
+ :param int mfcc_size: the number of MFCC coefficients
+ :param int fft_order: the order of the FFT
+ :param float lower_frequency: the lower frequency to cut, in Hz
+ :param float upper_frequency: the upper frequency to cut, in Hz
+ :param float emphasis_factor: the pre-emphasis factor
+ :param float window_length: the length of the MFCC window, in seconds
+ :param float window_shift: the shift of the MFCC window, in seconds
+ :rtype: tuple
+
+.. function:: cmfcc.compute_from_file(audio_file_path, filter_bank_size, mfcc_size, fft_order, lower_frequency, upper_frequency, emphasis_factor, window_length, window_shift)
+
+ Compute MFCCs for a given WAVE mono file,
+ passed as a file path on disk.
+
+ The returned tuple ``(mfcc, length, sr)`` contains
+ the MFCCs as a NumPy 2D matrix of shape ``(n, mfcc_size)`` (``n`` being the number of frames),
+ followed by the number of samples and the sample rate of the WAVE file.
+
+ :param string audio_file_path: the path of the WAVE file to be read, UTF-8 encoded
+ :param int filter_bank_size: the number of Mel filters
+ :param int mfcc_size: the number of MFCC coefficients
+ :param int fft_order: the order of the FFT
+ :param float lower_frequency: the lower frequency to cut, in Hz
+ :param float upper_frequency: the upper frequency to cut, in Hz
+ :param float emphasis_factor: the pre-emphasis factor
+ :param float window_length: the length of the MFCC window, in seconds
+ :param float window_shift: the shift of the MFCC window, in seconds
+ :rtype: tuple
"""
__author__ = "Alberto Pettarin"
@@ -12,7 +58,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL 3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
diff --git a/aeneas/cmfcc/cint.c b/aeneas/cmfcc/cint.c
new file mode 120000
index 00000000..8e1c9dae
--- /dev/null
+++ b/aeneas/cmfcc/cint.c
@@ -0,0 +1 @@
+../cint/cint.c
\ No newline at end of file
diff --git a/aeneas/cmfcc/cint.h b/aeneas/cmfcc/cint.h
new file mode 120000
index 00000000..27a6bb39
--- /dev/null
+++ b/aeneas/cmfcc/cint.h
@@ -0,0 +1 @@
+../cint/cint.h
\ No newline at end of file
diff --git a/aeneas/cmfcc/cmfcc_driver.c b/aeneas/cmfcc/cmfcc_driver.c
index 4ff40ea7..cd87a2f5 100644
--- a/aeneas/cmfcc/cmfcc_driver.c
+++ b/aeneas/cmfcc/cmfcc_driver.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for computing the MFCCs from a WAVE mono file.
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -26,26 +26,21 @@ __status__ = "Production"
#include "cwave_func.h"
#endif
-//
-// this is a simple driver to test on the command line
-//
-// you can compile it with: sndfile (s) or cwave_func (o) for reading WAVE info and data,
-// and with: fftw (f) or cmfcc_func (o) for computing the RFFT
-//
-// $ gcc cmfcc_driver.c cmfcc_func.c cwave_func.c -o cmfcc_driver_wo_fo -lm
-// $ gcc cmfcc_driver.c cmfcc_func.c cwave_func.c -o cmfcc_driver_wo_ff -lm -lrfftw -lfftw -DUSE_FFTW
-// $ gcc cmfcc_driver.c cmfcc_func.c cwave_func.c -o cmfcc_driver_ws_fo -lm -lsndfile -DUSE_SNDFILE
-// $ gcc cmfcc_driver.c cmfcc_func.c cwave_func.c -o cmfcc_driver_ws_ff -lm -lsndfile -lrfftw -lfftw -DUSE_SNDFILE -DUSE_FFTW
-//
-// use it as follows:
-//
-// ./cmfcc_driver_X audio.wav out.mfcc data text => load file in RAM, then compute MFCC, output text
-// ./cmfcc_driver_X audio.wav out.mfcc file text => compute MFCC directly from file, output text
-// ./cmfcc_driver_X audio.wav out.mfcc data binary => load file in RAM, then compute MFCC, output binary
-// ./cmfcc_driver_X audio.wav out.mfcc file binary => compute MFCC directly from file, output binary
-//
-// where X is wo_fo|wo_ff|ws_fo|ws_ff as described above
-//
+#define DRIVER_SUCCESS 0
+#define DRIVER_FAILURE 1
+
+// print usage
+void _usage(const char *prog) {
+ printf("\n");
+ printf("Usage: %s AUDIO_FILE.wav OUTPUT.bin [data|file] [text|binary]\n", prog);
+ printf("\n");
+ printf("Example: %s ../tools/res/audio.wav /tmp/out.dt.bin data text\n", prog);
+ printf(" %s ../tools/res/audio.wav /tmp/out.db.bin data binary\n", prog);
+ printf(" %s ../tools/res/audio.wav /tmp/out.ft.bin file text\n", prog);
+ printf(" %s ../tools/res/audio.wav /tmp/out.fb.bin file binary\n", prog);
+ printf("\n");
+}
+
int main(int argc, char **argv) {
#if USE_SNDFILE
@@ -58,22 +53,23 @@ int main(int argc, char **argv) {
char *audio_file_name, *output_file_name, *mode, *output_format;
double *data_ptr, *mfcc_ptr;
- unsigned int data_length, sample_rate, mfcc_length;
FILE *output_file;
- unsigned int i, j;
+ uint32_t sample_rate;
+ uint32_t data_length, mfcc_length;
+ uint32_t i, j;
- const unsigned int filter_bank_size = 40;
- const unsigned int mfcc_size = 13;
- const unsigned int fft_order = 512;
+ const uint32_t filter_bank_size = 40;
+ const uint32_t mfcc_size = 13;
+ const uint32_t fft_order = 512;
const double lower_frequency = 133.3333;
const double upper_frequency = 6855.4976;
const double emphasis_factor = 0.97;
- const double window_length = 0.025;
- const double window_shift = 0.010;
+ const double window_length = 0.100;
+ const double window_shift = 0.040;
if (argc < 5) {
- printf("\nUsage: %s AUDIO_FILE.wav OUTPUT.bin [data|file] [text|binary]\n\n", argv[0]);
- return 1;
+ _usage(argv[0]);
+ return DRIVER_FAILURE;
}
audio_file_name = argv[1];
output_file_name = argv[2];
@@ -102,7 +98,7 @@ int main(int argc, char **argv) {
if (!(audio_file = sf_open(audio_file_name, SFM_READ, &audio_info))) {
printf("Error: unable to open input file %s.\n", audio_file_name);
puts(sf_strerror(NULL));
- return 1;
+ return DRIVER_FAILURE;
}
data_length = audio_info.frames;
sample_rate = audio_info.samplerate;
@@ -110,10 +106,9 @@ int main(int argc, char **argv) {
sf_read_double(audio_file, data_ptr, audio_info.frames);
sf_close(audio_file);
#else
- memset(&audio_info, 0, sizeof(audio_info));
if (!(audio_file = wave_open(audio_file_name, &audio_info))) {
printf("Error: unable to open input file %s.\n", audio_file_name);
- return 1;
+ return DRIVER_FAILURE;
}
data_length = audio_info.coNumSamples;
sample_rate = audio_info.leSampleRate;
@@ -193,6 +188,6 @@ int main(int argc, char **argv) {
free((void *)mfcc_ptr);
mfcc_ptr = NULL;
- return 0;
+ return DRIVER_SUCCESS;
}
diff --git a/aeneas/cmfcc/cmfcc_func.c b/aeneas/cmfcc/cmfcc_func.c
index 2f11741c..7a9c1b7f 100644
--- a/aeneas/cmfcc/cmfcc_func.c
+++ b/aeneas/cmfcc/cmfcc_func.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for computing the MFCCs from a WAVE mono file.
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -34,7 +34,7 @@ __status__ = "Production"
#endif
// return the min of the given arguments
-unsigned int _min(unsigned int a, unsigned int b) {
+uint32_t _min(uint32_t a, uint32_t b) {
if (a < b) {
return a;
}
@@ -42,7 +42,7 @@ unsigned int _min(unsigned int a, unsigned int b) {
}
// return the max of the given arguments
-unsigned int _max(unsigned int a, unsigned int b) {
+uint32_t _max(uint32_t a, uint32_t b) {
if (a > b) {
return a;
}
@@ -50,24 +50,26 @@ unsigned int _max(unsigned int a, unsigned int b) {
}
// round the given number to the nearest integer
-// or return 0 if the argument is negative
+// or return zero if the argument is negative
// e.g.: 1.1 => 1; 1.6 => 2
-unsigned int _round(double x) {
- if (x <= 0) {
- //printf("Error: _round argument is negative!!!\n");
- return 0;
+uint32_t _round(double x) {
+ if (x < 0.0) {
+ return 0; //printf("Error: _round argument is negative!!!\n");
}
- return (unsigned int)floor(x + 0.5);
+ return (uint32_t)floor(x + 0.5);
}
// precompute the sin table for the FFT/RFFT
-double *_precompute_sin_table(unsigned int m) {
+double *_precompute_sin_table(uint32_t m) {
const double arg = PI / m * 2;
- const unsigned int size = m - m / 4 + 1;
+ const uint32_t size = m - m / 4 + 1;
double *table;
int k;
table = (double *)calloc(size, sizeof(double));
+ if (table == NULL) {
+ return NULL;
+ }
table[0] = 0;
for (k = 1; k < size; ++k) {
table[k] = sin(arg * k);
@@ -75,7 +77,7 @@ double *_precompute_sin_table(unsigned int m) {
table[m / 2] = 0;
return table;
}
-int fft(double *x, double *y, const unsigned int m, double *sin_table) {
+int fft(double *x, double *y, const uint32_t m, double *sin_table) {
// code adapted from the fft function of SPTK
double t1, t2;
double *cosp, *sinp, *xp, *yp;
@@ -146,12 +148,12 @@ int fft(double *x, double *y, const unsigned int m, double *sin_table) {
yp = y + j;
}
- return 0;
+ return CMFCC_SUCCESS;
}
int rfft(
double *x,
double *y,
- const unsigned int m,
+ const uint32_t m,
double *sin_table_full,
double *sin_table_half
) {
@@ -169,7 +171,7 @@ int rfft(
}
if (fft(x, y, mv2, sin_table_half) == -1) {
- return -1;
+ return CMFCC_FAILURE;
}
sinp = sin_table_full;
@@ -202,7 +204,7 @@ int rfft(
*yp++ = -(*(--yq));
}
- return 0;
+ return CMFCC_SUCCESS;
}
// convert Hz frequency to Mel frequency
@@ -217,18 +219,21 @@ double _mel2hz(const double m) {
// pre emphasis of the given frame
// returns the prior to be used for the next frame
-void _apply_emphasis(
+int _apply_emphasis(
double *frame,
- const unsigned int length,
+ const uint32_t length,
const double emphasis_factor,
double *prior
) {
double prior_orig;
double *frame_orig;
- unsigned int i;
+ uint32_t i;
prior_orig = frame[length - 1];
frame_orig = (double *)calloc(length, sizeof(double));
+ if (frame_orig == NULL) {
+ return CMFCC_FAILURE;
+ }
memcpy(frame_orig, frame, length * sizeof(double));
frame[0] = frame_orig[0] - emphasis_factor * (*prior);
for (i = 1; i < length; ++i) {
@@ -237,23 +242,27 @@ void _apply_emphasis(
free((void *)frame_orig);
frame_orig = NULL;
*prior = prior_orig;
+ return CMFCC_SUCCESS;
}
// own code
// compute the power of the given frame
-void _compute_power(
+int _compute_power(
double *frame, // it has length == fft_order
double *power, // power has length == (fft_order / 2) + 1
- const unsigned int fft_order,
+ const uint32_t fft_order,
double *sin_table_full,
double *sin_table_half
) {
double *tmp;
- unsigned int k;
- const unsigned int n = fft_order; // length of the I/O vectors
- const unsigned int m = (fft_order / 2) + 1; // length of power
+ uint32_t k;
+ const uint32_t n = fft_order; // length of the I/O vectors
+ const uint32_t m = (fft_order / 2) + 1; // length of power
tmp = (double *)calloc(n + m, sizeof(double));
+ if (tmp == NULL) {
+ return CMFCC_FAILURE;
+ }
rfft(frame, tmp, fft_order, sin_table_full, sin_table_half);
power[0] = frame[0] * frame[0];
for (k = 1; k < m; ++k) {
@@ -261,24 +270,28 @@ void _compute_power(
}
free((void *)tmp);
tmp = NULL;
+ return CMFCC_SUCCESS;
}
#ifdef USE_FFTW
// fftw code
// compute the power of the given frame
-void _compute_power_fftw(
+int _compute_power_fftw(
double *frame, // it has length == fft_order
double *power, // power has length == (fft_order / 2) + 1
- const unsigned int fft_order,
+ const uint32_t fft_order,
rfftw_plan plan
) {
- unsigned int k;
+ uint32_t k;
double *out;
- const unsigned int n = fft_order; // length of the I/O vectors
- //const unsigned int m = (fft_order / 2) + 1; // length of power
+ const uint32_t n = fft_order; // length of the I/O vectors
+ //const uint32_t m = (fft_order / 2) + 1; // length of power
out = (double *)calloc(n, sizeof(double));
+ if (out == NULL) {
+ return CMFCC_FAILURE;
+ }
rfftw_one(plan, frame, out);
power[0] = out[0] * out[0];
for (k = 1; k < (n+1)/2; ++k) {
@@ -289,27 +302,32 @@ void _compute_power_fftw(
}
free((void *)out);
out = NULL;
+ return CMFCC_SUCCESS;
}
#endif
// transform the frame using the Hamming window
-void _apply_hamming(
+int _apply_hamming(
double *frame,
- const unsigned int frame_length,
+ const uint32_t frame_length,
double *coefficients
) {
- unsigned int k;
+ uint32_t k;
for (k = 0; k < frame_length; ++k) {
frame[k] *= coefficients[k];
}
+ return CMFCC_SUCCESS;
}
-double *_precompute_hamming(const unsigned int frame_length) {
+double *_precompute_hamming(const uint32_t frame_length) {
const double arg = PI_2 / (frame_length - 1);
double *coefficients;
- unsigned int k;
+ uint32_t k;
coefficients = (double *)calloc(frame_length, sizeof(double));
+ if (coefficients == NULL) {
+ return NULL;
+ }
for (k = 0; k < frame_length; ++k) {
coefficients[k] = (0.54 - 0.46 * cos(k * arg));
}
@@ -319,9 +337,9 @@ double *_precompute_hamming(const unsigned int frame_length) {
// create Mel filter bank
// return a pointer to a 2D matrix (filters_n x filter_bank_size)
double *_create_mel_filter_bank(
- unsigned int fft_order,
- unsigned int filter_bank_size,
- unsigned int sample_rate,
+ uint32_t fft_order,
+ uint32_t filter_bank_size,
+ uint32_t sample_rate,
double upper_frequency,
double lower_frequency
) {
@@ -329,28 +347,34 @@ double *_create_mel_filter_bank(
const double melmax = _hz2mel(upper_frequency);
const double melmin = _hz2mel(lower_frequency);
const double melstep = (melmax - melmin) / (filter_bank_size + 1);
- const unsigned int filter_edge_length = filter_bank_size + 2;
- const unsigned int filters_n = (fft_order / 2) + 1;
+ const uint32_t filter_edge_length = filter_bank_size + 2;
+ const uint32_t filters_n = (fft_order / 2) + 1;
double *filter_edges, *filters;
- unsigned int k;
+ uint32_t k;
// filter bank
filters = (double *)calloc(filters_n * filter_bank_size, sizeof(double));
+ if (filters == NULL) {
+ return NULL;
+ }
// filter edges
filter_edges = (double *)calloc(filter_edge_length, sizeof(double));
+ if (filter_edges == NULL) {
+ return NULL;
+ }
for (k = 0; k < filter_edge_length; ++k) {
filter_edges[k] = _mel2hz(melmin + melstep * k);
}
for (k = 0; k < filter_bank_size; ++k) {
- const unsigned int left_frequency = _round(filter_edges[k] / step_frequency);
- const unsigned int center_frequency = _round(filter_edges[k + 1] / step_frequency);
- const unsigned int right_frequency = _round(filter_edges[k + 2] / step_frequency);
+ const uint32_t left_frequency = _round(filter_edges[k] / step_frequency);
+ const uint32_t center_frequency = _round(filter_edges[k + 1] / step_frequency);
+ const uint32_t right_frequency = _round(filter_edges[k + 2] / step_frequency);
const double width_frequency = (right_frequency - left_frequency) * step_frequency;
const double height_frequency = 2.0 / width_frequency;
double left_slope, right_slope;
- unsigned int current_frequency;
+ uint32_t current_frequency;
left_slope = 0.0;
if (center_frequency != left_frequency) {
@@ -381,11 +405,14 @@ double *_create_mel_filter_bank(
// create the DCT matrix
// return a pointer to a 2D matrix (mfcc_size x filter_bank_size)
-double *_create_dct_matrix(unsigned int mfcc_size, unsigned int filter_bank_size) {
+double *_create_dct_matrix(uint32_t mfcc_size, uint32_t filter_bank_size) {
double *s2dct;
- unsigned int i, j;
+ uint32_t i, j;
s2dct = (double *)calloc(mfcc_size * filter_bank_size, sizeof(double));
+ if (s2dct == NULL) {
+ return NULL;
+ }
for (i = 0; i < mfcc_size; ++i) {
const double frequency = PI * i / filter_bank_size;
for (j = 0; j < filter_bank_size; ++j) {
@@ -404,26 +431,26 @@ int _compute_mfcc(
double *data_ptr,
FILE *audio_file_ptr,
struct WAVE_INFO header,
- const unsigned int data_length,
- const unsigned int sample_rate,
- const unsigned int filter_bank_size,
- const unsigned int mfcc_size,
- const unsigned int fft_order,
+ const uint32_t data_length,
+ const uint32_t sample_rate,
+ const uint32_t filter_bank_size,
+ const uint32_t mfcc_size,
+ const uint32_t fft_order,
const double lower_frequency,
const double upper_frequency,
const double emphasis_factor,
const double window_length,
const double window_shift,
double **mfcc_ptr,
- unsigned int *mfcc_length
+ uint32_t *mfcc_length
) {
double *filters, *s2dct, *sin_table_full, *sin_table_half, *hamming_coefficients;
double *frame, *power, *logsp;
double prior, acc;
- unsigned int i, j, filters_n;
- unsigned int frame_length, frame_shift, frame_length_padded;
- unsigned int number_of_frames, frame_index, frame_start, frame_end;
+ uint32_t filters_n, frame_length, frame_shift, frame_length_padded;
+ uint32_t number_of_frames, frame_index, frame_start, frame_end;
+ uint32_t i, j;
#if USE_FFTW
rfftw_plan plan;
@@ -431,7 +458,7 @@ int _compute_mfcc(
if (upper_frequency > (sample_rate / 2.0)) {
// upper frequency exceeds Nyquist
- return 0;
+ return CMFCC_FAILURE;
}
#if USE_FFTW
@@ -447,34 +474,43 @@ int _compute_mfcc(
sample_rate,
upper_frequency,
lower_frequency);
+ if (filters == NULL) {
+ return CMFCC_FAILURE;
+ }
// compute DCT matrix
s2dct = _create_dct_matrix(mfcc_size, filter_bank_size);
+ if (s2dct == NULL) {
+ return CMFCC_FAILURE;
+ }
// length of a frame, in samples
- frame_length = (unsigned int)floor(window_length * sample_rate);
+ frame_length = (uint32_t)floor(window_length * sample_rate);
frame_length_padded = _max(frame_length, fft_order);
// shift of a frame, in samples
- frame_shift = (unsigned int)floor(window_shift * sample_rate);
+ frame_shift = (uint32_t)floor(window_shift * sample_rate);
// value of the last sample in the previous frame
prior = 0.0;
// number of frames
- number_of_frames = (unsigned int)floor(1.0 * data_length / frame_shift);
+ number_of_frames = (uint32_t)floor(1.0 * data_length / frame_shift);
*mfcc_length = number_of_frames;
// allocate the mfcc matrix
*mfcc_ptr = (double *)calloc(number_of_frames * mfcc_size, sizeof(double));
+ if ((*mfcc_ptr) == NULL) {
+ return CMFCC_FAILURE;
+ }
- // precompute sin tables
+ // precompute sin tables and hamming coefficients
sin_table_full = _precompute_sin_table(fft_order);
sin_table_half = _precompute_sin_table(fft_order / 2);
-
- // precompute hamming coefficients
hamming_coefficients = _precompute_hamming(frame_length);
-
+ if ((sin_table_full == NULL) || (sin_table_half == NULL) || (hamming_coefficients == NULL)) {
+ return CMFCC_FAILURE;
+ }
//printf("Frame length: %d\n", frame_length);
//printf("Frame shift: %d\n", frame_shift);
//printf("Frame length padded: %d\n", frame_length_padded);
@@ -489,6 +525,9 @@ int _compute_mfcc(
frame = (double *)calloc(frame_length_padded, sizeof(double));
power = (double *)calloc(filters_n, sizeof(double));
logsp = (double *)calloc(filter_bank_size, sizeof(double));
+ if ((frame == NULL) || (power == NULL) || (logsp == NULL)) {
+ return CMFCC_FAILURE;
+ }
// process frames
for (frame_index = 0; frame_index < number_of_frames; ++frame_index) {
@@ -507,7 +546,9 @@ int _compute_mfcc(
frame_start = frame_index * frame_shift;
frame_end = _min(frame_start + frame_length, data_length);
if (data_ptr == NULL) {
- wave_read_double(audio_file_ptr, &header, frame, frame_start, (frame_end - frame_start));
+ if (wave_read_double(audio_file_ptr, &header, frame, frame_start, (frame_end - frame_start)) != CWAVE_SUCCESS) {
+ return CMFCC_FAILURE;
+ }
} else {
memcpy(frame, data_ptr + frame_start, (frame_end - frame_start) * sizeof(double));
}
@@ -515,15 +556,23 @@ int _compute_mfcc(
//printf("Frame %d : %d -> %d\n", frame_index, frame_start, frame_end);
// emphasis + hamming + compute power
- _apply_emphasis(frame, frame_length, emphasis_factor, &prior);
- _apply_hamming(frame, frame_length, hamming_coefficients);
+ if (_apply_emphasis(frame, frame_length, emphasis_factor, &prior) != CMFCC_SUCCESS) {
+ return CMFCC_FAILURE;
+ }
+ if (_apply_hamming(frame, frame_length, hamming_coefficients) != CMFCC_SUCCESS) {
+ return CMFCC_FAILURE;
+ }
#ifdef USE_FFTW
// fftw code
- _compute_power_fftw(frame, power, fft_order, plan);
+ if (_compute_power_fftw(frame, power, fft_order, plan) != CMFCC_SUCCESS) {
+ return CMFCC_FAILURE;
+ }
#else
// own code
- _compute_power(frame, power, fft_order, sin_table_full, sin_table_half);
+ if (_compute_power(frame, power, fft_order, sin_table_full, sin_table_half) != CMFCC_SUCCESS) {
+ return CMFCC_FAILURE;
+ }
#endif
// apply Mel filter bank
@@ -568,25 +617,24 @@ int _compute_mfcc(
sin_table_full = NULL;
s2dct = NULL;
filters = NULL;
-
- return 1;
+ return CMFCC_SUCCESS;
}
// compute MFCC from data loaded in RAM
int compute_mfcc_from_data(
double *data_ptr,
- const unsigned int data_length,
- const unsigned int sample_rate,
- const unsigned int filter_bank_size,
- const unsigned int mfcc_size,
- const unsigned int fft_order,
+ const uint32_t data_length,
+ const uint32_t sample_rate,
+ const uint32_t filter_bank_size,
+ const uint32_t mfcc_size,
+ const uint32_t fft_order,
const double lower_frequency,
const double upper_frequency,
const double emphasis_factor,
const double window_length,
const double window_shift,
double **mfcc_ptr,
- unsigned int *mfcc_length
+ uint32_t *mfcc_length
) {
// to keep the compile happy, it will never be used
@@ -614,41 +662,42 @@ int compute_mfcc_from_data(
// compute MFCC from file on disk
int compute_mfcc_from_file(
char *audio_file_path,
- const unsigned int filter_bank_size,
- const unsigned int mfcc_size,
- const unsigned int fft_order,
+ const uint32_t filter_bank_size,
+ const uint32_t mfcc_size,
+ const uint32_t fft_order,
const double lower_frequency,
const double upper_frequency,
const double emphasis_factor,
const double window_length,
const double window_shift,
- unsigned int *data_length_ret,
- unsigned int *sample_rate_ret,
+ uint32_t *data_length,
+ uint32_t *sample_rate,
double **mfcc_ptr,
- unsigned int *mfcc_length
+ uint32_t *mfcc_length
) {
FILE *audio_file_ptr;
struct WAVE_INFO header;
- unsigned int data_length, sample_rate;
+ uint32_t sample_rate_loc;
+ uint32_t data_length_loc;
int ret;
// open file
- memset(&header, 0, sizeof(header));
- if (! (audio_file_ptr = wave_open(audio_file_path, &header))) {
+ audio_file_ptr = wave_open(audio_file_path, &header);
+ if (audio_file_ptr == NULL) {
//printf("Error: cannot open file\n");
- return 0;
+ return CMFCC_FAILURE;
}
- data_length = header.coNumSamples;
- sample_rate = header.leSampleRate;
+ data_length_loc = header.coNumSamples;
+ sample_rate_loc = header.leSampleRate;
// compute mfcc
ret = _compute_mfcc(
NULL,
audio_file_ptr,
header,
- data_length,
- sample_rate,
+ data_length_loc,
+ sample_rate_loc,
filter_bank_size,
mfcc_size,
fft_order,
@@ -663,11 +712,11 @@ int compute_mfcc_from_file(
// close file
wave_close(audio_file_ptr);
- *data_length_ret = data_length;
- *sample_rate_ret = sample_rate;
+ *data_length = data_length_loc;
+ *sample_rate = sample_rate_loc;
return ret;
-};
+}
diff --git a/aeneas/cmfcc/cmfcc_func.h b/aeneas/cmfcc/cmfcc_func.h
index ba80bfc6..e6aa89b2 100644
--- a/aeneas/cmfcc/cmfcc_func.h
+++ b/aeneas/cmfcc/cmfcc_func.h
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for computing the MFCCs from a WAVE mono file.
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,47 +9,48 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
*/
-// NOTE: using unsigned int as it is 32-bit wide on all modern architectures
-// not using uint32_t because the MS C compiler does not have
-// or, at least, it is not easy to use it
+#include "cint.h"
+
+#define CMFCC_SUCCESS 0
+#define CMFCC_FAILURE 1
// compute MFCC from data loaded in RAM
int compute_mfcc_from_data(
double *data_ptr,
- const unsigned int data_length,
- const unsigned int sample_rate,
- const unsigned int filter_bank_size,
- const unsigned int mfcc_size,
- const unsigned int fft_order,
+ const uint32_t data_length,
+ const uint32_t sample_rate,
+ const uint32_t filter_bank_size,
+ const uint32_t mfcc_size,
+ const uint32_t fft_order,
const double lower_frequency,
const double upper_frequency,
const double emphasis_factor,
const double window_length,
const double window_shift,
double **mfcc_ptr,
- unsigned int *mfcc_length
+ uint32_t *mfcc_length
);
// compute MFCC from file on disk
int compute_mfcc_from_file(
char *audio_file_path,
- const unsigned int filter_bank_size,
- const unsigned int mfcc_size,
- const unsigned int fft_order,
+ const uint32_t filter_bank_size,
+ const uint32_t mfcc_size,
+ const uint32_t fft_order,
const double lower_frequency,
const double upper_frequency,
const double emphasis_factor,
const double window_length,
const double window_shift,
- unsigned int *data_length_ret,
- unsigned int *sample_rate_ret,
+ uint32_t *data_length,
+ uint32_t *sample_rate,
double **mfcc_ptr,
- unsigned int *mfcc_length
+ uint32_t *mfcc_length
);
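`CMFCC_SUCCESS` is now 0, so callers have to test `!= CMFCC_SUCCESS` instead of using `!`. `cmfcc_py.c` makes exactly this change in this patch. Some early `return CMFCC_FAILURE;` paths in `_compute_mfcc` also skip freeing buffers that were already allocated. One example is a failed `_create_dct_matrix` after the filter bank was created. A common C idiom for this is a single cleanup label. The sketch below illustrates that pattern with hypothetical names. It is not the project's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_SUCCESS 0
#define SKETCH_FAILURE 1

/* Hypothetical: allocate three work buffers, do some work, and release
   all of them on every exit path through one cleanup label. */
int compute_with_cleanup(uint32_t n, double *result) {
    int ret = SKETCH_FAILURE;   /* pessimistic default */
    double *a = NULL, *b = NULL, *c = NULL;
    uint32_t i;

    assert(result != NULL);
    if (n == 0) {
        goto cleanup;
    }
    a = (double *)calloc(n, sizeof(double));
    b = (double *)calloc(n, sizeof(double));
    c = (double *)calloc(n, sizeof(double));
    if ((a == NULL) || (b == NULL) || (c == NULL)) {
        goto cleanup;           /* any subset may have been allocated */
    }
    for (i = 0; i < n; ++i) {
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = a[i] + b[i];
    }
    *result = c[n - 1];
    ret = SKETCH_SUCCESS;

cleanup:
    free(a);                    /* free(NULL) is a no-op */
    free(b);
    free(c);
    return ret;
}
```

Since `free(NULL)` is defined to do nothing, the cleanup block does not need to check each pointer.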
diff --git a/aeneas/cmfcc/cmfcc_py.c b/aeneas/cmfcc/cmfcc_py.c
index 59319b97..b65df710 100644
--- a/aeneas/cmfcc/cmfcc_py.c
+++ b/aeneas/cmfcc/cmfcc_py.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for computing the MFCCs from a WAVE mono file.
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -25,30 +25,26 @@ __status__ = "Production"
#include "cmfcc_func.h"
// compute the MFCCs of the given audio data (mono)
-// take the PyObject containing the following arguments (see below)
-// and return the MFCCs as a n x mfcc_size 2D array of double, where
-// - n is the number of frames
-// - mfcc_size is the number of ceptral coefficients (including the 0-th)
static PyObject *compute_from_data(PyObject *self, PyObject *args) {
- PyObject *data_raw; // 1D array of double, holding the data
- unsigned int sample_rate; // sample rate (default: 16000)
- unsigned int filter_bank_size; // number of filters in the filter bank (default: 40)
- unsigned int mfcc_size; // number of ceptral coefficients (default: 13)
- unsigned int fft_order; // FFT order; must be a power of 2 (default: 512)
- double lower_frequency; // lower frequency (default: 133.3333)
- double upper_frequency; // upper frequency; must be <= sample_rate/2 = Nyquist frequency (default: 6855.4976)
- double emphasis_factor; // pre-emphasis factor (default: 0.97)
- double window_length; // window length (default: 0.0250)
- double window_shift; // window shift (default: 0.010)
+ PyObject *data_raw; // 1D array of double, holding the data
+ uint32_t sample_rate; // sample rate (default: 16000)
+ uint32_t filter_bank_size; // number of filters in the filter bank (default: 40)
+ uint32_t mfcc_size; // number of cepstral coefficients (default: 13)
+ uint32_t fft_order; // FFT order; must be a power of 2 (default: 512)
+ double lower_frequency; // lower frequency (default: 133.3333)
+ double upper_frequency; // upper frequency; must be <= sample_rate/2 = Nyquist frequency (default: 6855.4976)
+ double emphasis_factor; // pre-emphasis factor (default: 0.97)
+ double window_length; // window length (default: 0.0250)
+ double window_shift; // window shift (default: 0.010)
PyObject *tuple;
PyArrayObject *data, *mfcc;
npy_intp mfcc_dimensions[2];
double *data_ptr, *mfcc_ptr;
- unsigned int data_length, mfcc_length;
+ uint32_t data_length, mfcc_length;
// O = object (do not convert or check for errors)
- // I = unsigned integer
+ // I = unsigned int (no overflow checking)
// d = double
if (!PyArg_ParseTuple(
args,
@@ -75,10 +71,10 @@ static PyObject *compute_from_data(PyObject *self, PyObject *args) {
data_ptr = (double *)PyArray_DATA(data);
// number of audio samples in data (= duration in seconds * sample_rate)
- data_length = (unsigned int)PyArray_DIMS(data)[0];
+ data_length = (uint32_t)PyArray_DIMS(data)[0];
// compute MFCC matrix
- if (!compute_mfcc_from_data(
+ if (compute_mfcc_from_data(
data_ptr,
data_length,
sample_rate,
@@ -91,7 +87,7 @@ static PyObject *compute_from_data(PyObject *self, PyObject *args) {
window_length,
window_shift,
&mfcc_ptr,
- &mfcc_length)
+ &mfcc_length) != CMFCC_SUCCESS
) {
// failed
PyErr_SetString(PyExc_ValueError, "Error while calling compute_mfcc_from_data()");
@@ -115,30 +111,27 @@ static PyObject *compute_from_data(PyObject *self, PyObject *args) {
return tuple;
}
-// compute the MFCCs of the given data
-// take the PyObject containing the following arguments (see below)
-// and return the MFCCs as a n x mfcc_size 2D array of double, where
-// - n is the number of frames
-// - mfcc_size is the number of ceptral coefficients (including the 0-th)
+// compute the MFCCs of the given audio file
static PyObject *compute_from_file(PyObject *self, PyObject *args) {
- char *audio_file_path; // path of the WAVE file
- unsigned int filter_bank_size; // number of filters in the filter bank (default: 40)
- unsigned int mfcc_size; // number of ceptral coefficients (default: 13)
- unsigned int fft_order; // FFT order; must be a power of 2 (default: 512)
- double lower_frequency; // lower frequency (default: 133.3333)
- double upper_frequency; // upper frequency; must be <= sample_rate/2 = Nyquist frequency (default: 6855.4976)
- double emphasis_factor; // pre-emphasis factor (default: 0.97)
- double window_length; // window length (default: 0.0250)
- double window_shift; // window shift (default: 0.010)
+ char *audio_file_path; // path of the WAVE file
+ uint32_t filter_bank_size; // number of filters in the filter bank (default: 40)
+ uint32_t mfcc_size; // number of cepstral coefficients (default: 13)
+ uint32_t fft_order; // FFT order; must be a power of 2 (default: 512)
+ double lower_frequency; // lower frequency (default: 133.3333)
+ double upper_frequency; // upper frequency; must be <= sample_rate/2 = Nyquist frequency (default: 6855.4976)
+ double emphasis_factor; // pre-emphasis factor (default: 0.97)
+ double window_length; // window length (default: 0.0250)
+ double window_shift; // window shift (default: 0.010)
PyObject *tuple;
PyArrayObject *mfcc;
npy_intp mfcc_dimensions[2];
double *mfcc_ptr;
- unsigned int data_length, sample_rate, mfcc_length;
+ uint32_t sample_rate;
+ uint32_t data_length, mfcc_length;
// s = string
- // I = unsigned integer
+ // I = unsigned int (no overflow checking)
// d = double
if (!PyArg_ParseTuple(
args,
@@ -158,7 +151,7 @@ static PyObject *compute_from_file(PyObject *self, PyObject *args) {
}
// compute MFCC matrix
- if (!compute_mfcc_from_file(
+ if (compute_mfcc_from_file(
audio_file_path,
filter_bank_size,
mfcc_size,
@@ -171,7 +164,7 @@ static PyObject *compute_from_file(PyObject *self, PyObject *args) {
&data_length,
&sample_rate,
&mfcc_ptr,
- &mfcc_length)
+ &mfcc_length) != CMFCC_SUCCESS
) {
// failed
PyErr_SetString(PyExc_ValueError, "Error while calling compute_mfcc_from_file()");
@@ -197,13 +190,34 @@ static PyMethodDef cmfcc_methods[] = {
"compute_from_data",
compute_from_data,
METH_VARARGS,
- "Given the data from a mono PCM16 WAVE file, compute and return the MFCCs"
+ "Given the data from a mono PCM16 WAVE file, compute and return the MFCCs\n"
+ ":param object data_raw: numpy 1D array of float values, one per sample\n"
+ ":param uint sample_rate: the sample rate of the WAVE file\n"
+ ":param uint filter_bank_size: the number of MFCC filters\n"
+ ":param uint mfcc_size: the number of MFCCs\n"
+ ":param uint fft_order: the order of the FFT\n"
+ ":param float lower_frequency: cut below this frequency, in Hz\n"
+ ":param float upper_frequency: cut above this frequency, in Hz\n"
+ ":param float emphasis_factor: the pre-emphasis factor\n"
+ ":param float window_length: MFCC window length, in s\n"
+ ":param float window_shift: MFCC window shift, in s\n"
+ ":rtype: tuple (mfccs, data_length, sample_rate)"
},
{
"compute_from_file",
compute_from_file,
METH_VARARGS,
- "Given the path of the mono PCM16 WAVE file, compute and return the MFCCs"
+ "Given the path of the mono PCM16 WAVE file, compute and return the MFCCs\n"
+ ":param string audio_file_path: the path of the audio file\n"
+ ":param uint filter_bank_size: the number of MFCC filters\n"
+ ":param uint mfcc_size: the number of MFCCs\n"
+ ":param uint fft_order: the order of the FFT\n"
+ ":param float lower_frequency: cut below this frequency, in Hz\n"
+ ":param float upper_frequency: cut above this frequency, in Hz\n"
+ ":param float emphasis_factor: the pre-emphasis factor\n"
+ ":param float window_length: MFCC window length, in s\n"
+ ":param float window_shift: MFCC window shift, in s\n"
+ ":rtype: tuple (mfccs, data_length, sample_rate)"
},
{
NULL,
diff --git a/aeneas/cmfcc/cmfcc_setup.py b/aeneas/cmfcc/cmfcc_setup.py
index b0c18122..dbc1fa1e 100644
--- a/aeneas/cmfcc/cmfcc_setup.py
+++ b/aeneas/cmfcc/cmfcc_setup.py
@@ -2,7 +2,7 @@
# coding=utf-8
"""
-Compile the Python C Extension for computing the MFCCs.
+Compile the Python C Extension for computing the MFCCs from a WAVE mono file.
.. versionadded:: 1.1.0
"""
@@ -23,15 +23,15 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-CMODULE = Extension("cmfcc", sources=["cmfcc_py.c", "cmfcc_func.c", "cwave_func.c"], include_dirs=[get_include()])
+CMODULE = Extension("cmfcc", sources=["cmfcc_py.c", "cmfcc_func.c", "cwave_func.c", "cint.c"], include_dirs=[get_include()])
setup(
name="cmfcc",
- version="1.4.1",
+ version="1.5.0",
description="""
Python C Extension for computing the MFCCs as fast as your bare metal allows.
""",
diff --git a/aeneas/configurationobject.py b/aeneas/configuration.py
similarity index 62%
rename from aeneas/configurationobject.py
rename to aeneas/configuration.py
index 44dbb5d9..9f45193a 100644
--- a/aeneas/configurationobject.py
+++ b/aeneas/configuration.py
@@ -2,8 +2,13 @@
# coding=utf-8
"""
-Basically a dictionary with a fixed set of keys,
-with default values and aliases.
+This module contains the following classes:
+
+* :class:`~aeneas.configuration.Configuration`
+ which is a dictionary with a fixed set of keys,
+ possibly with default values and key aliases.
+
+.. versionadded:: 1.4.1
"""
from __future__ import absolute_import
@@ -19,28 +24,35 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class ConfigurationObject(object):
+class Configuration(object):
"""
- A structure representing a generic configuration object, that is,
+ A generic configuration object, that is,
a dictionary with a fixed set of keys,
- each with a default value, type, and possibly aliases.
+ each with a type, a default value, and possibly aliases.
+
+ Keys are (unique) Unicode strings.
- Values are stored as Unicode strings, and casted to int or float
+ Values are stored as Unicode strings (or ``None``), and cast
+ to the type of the field (``int``, ``float``,
+ ``bool``, :class:`~aeneas.timevalue.TimeValue`, etc.)
when accessed.
- :param config_string: the job configuration string
- :type config_string: Unicode string
+ For ``bool`` keys, values listed in
+ :data:`~aeneas.configuration.Configuration.TRUE_ALIASES`
+ are considered equivalent to a ``True`` value.
- :raises TypeError: if ``config_string`` is not ``None`` and
- it is not a Unicode string
- :raises KeyError: if trying to access a key not listed above
- """
+ If ``config_string`` is not ``None``, the given string will be parsed
+ and ``key=value`` pairs will be stored in the object,
+ provided that ``key`` is listed in :data:`~aeneas.configuration.Configuration.FIELDS`.
- TAG = u"ConfigurationObject"
+ :param string config_string: the configuration string to be parsed
+ :raises: TypeError: if ``config_string`` is not ``None`` and it is not a Unicode string
+ :raises: KeyError: if trying to access a key not listed above
+ """
FIELDS = [
#
@@ -49,13 +61,26 @@ class ConfigurationObject(object):
#
# examples:
# (gc.FOO, (None, None, ["foo"]))
- # (gc.BAR, (0.0, float, ["bar", "baz"]))
+ # (gc.BAR, (0.0, float, ["bar", "barrr"]))
+ # (gc.BAZ, (None, TimeValue, ["baz"]))
#
]
+ """
+ The fields of this object, that is, the key names,
+ each with its default value, type,
+ and possibly aliases.
+ """
+
+ TRUE_ALIASES = [True, u"TRUE", u"True", u"true", u"YES", u"Yes", u"yes", u"1", 1]
+ """
+ Aliases for a ``True`` value for ``bool`` fields
+ """
+
+ TAG = u"Configuration"
def __init__(self, config_string=None):
if (config_string is not None) and (not gf.is_unicode(config_string)):
- raise TypeError("config_string is not a Unicode string")
+ raise TypeError(u"config_string is not a Unicode string")
# set dictionaries up to keep the config data
self.data = {}
@@ -70,7 +95,7 @@ def __init__(self, config_string=None):
if config_string is not None:
# strip leading/trailing " or ' characters
- if (config_string[0] == config_string[-1]) and (config_string[0] in [u"\"", u"'"]):
+ if (len(config_string) > 0) and (config_string[0] == config_string[-1]) and (config_string[0] in [u"\"", u"'"]):
config_string = config_string[1:-1]
# populate values from config_string,
# ignoring keys not present in FIELDS
@@ -106,7 +131,7 @@ def __str__(self):
def _cast(self, key, value):
if (value is not None) and (self.types[key] is not None):
if self.types[key] is bool:
- return value in [True, u"True", u"true", u"Yes", u"yes", u"1", 1]
+ return value in self.TRUE_ALIASES
else:
return self.types[key](value)
return value
@@ -114,7 +139,7 @@ def _cast(self, key, value):
def config_string(self):
"""
Build the storable string corresponding
- to this job configuration object.
+ to this configuration object.
:rtype: string
"""
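The two behavioral changes in `Configuration` above (the empty-string guard when stripping quotes, and the `TRUE_ALIASES` lookup for `bool` fields) can be illustrated with a minimal standalone sketch. The helper names `strip_quotes` and `cast_value` are hypothetical: they mirror the logic of `__init__` and `_cast` shown in the diff, but they are not part of the aeneas API.

```python
# Hypothetical standalone helpers: they mirror, but are not, the aeneas methods
TRUE_ALIASES = [True, u"TRUE", u"True", u"true", u"YES", u"Yes", u"yes", u"1", 1]


def strip_quotes(config_string):
    """Strip one pair of matching leading/trailing " or ' characters."""
    # The len() guard avoids an IndexError when config_string is empty
    if (len(config_string) > 0) and (config_string[0] == config_string[-1]) and (config_string[0] in [u"\"", u"'"]):
        return config_string[1:-1]
    return config_string


def cast_value(value, field_type):
    """Cast a stored value to field_type, following the rule in Configuration._cast."""
    if (value is not None) and (field_type is not None):
        if field_type is bool:
            # bool fields: True only if the value is one of the aliases
            return value in TRUE_ALIASES
        return field_type(value)
    # None values and untyped fields are returned unchanged
    return value


print(strip_quotes(u"'task_language=eng'"))  # task_language=eng
print(cast_value(u"yes", bool))              # True
print(cast_value(u"no", bool))               # False
print(cast_value(u"0.5", float))             # 0.5
```

Any value not listed in the aliases (for example `u"no"` or `u"0"`) is therefore read as `False`, rather than raising an error.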
diff --git a/aeneas/container.py b/aeneas/container.py
index 04f330c1..c4aaeba2 100644
--- a/aeneas/container.py
+++ b/aeneas/container.py
@@ -2,18 +2,15 @@
# coding=utf-8
"""
-A container is an abstraction for a group of files (entries)
-compressed into an archive file (e.g., ZIP or TAR)
-or uncompressed inside a directory.
-
-This module contains two main classes.
-
-1. :class:`aeneas.container.Container`
- is the main class, exposing functions
- like extracting all or just one entry,
- listing the entries in the container, etc.
-2. :class:`aeneas.container.ContainerFormat`
- is an enumeration of the supported container formats.
+This module contains the following classes:
+
+* :class:`~aeneas.container.Container`
+ is the main class, exposing functions
+ like extracting all entries,
+ extracting just one entry,
+ listing the entries in the container, etc.;
+* :class:`~aeneas.container.ContainerFormat`
+ is an enumeration of the supported container formats.
"""
from __future__ import absolute_import
@@ -23,7 +20,7 @@
import tarfile
import zipfile
-from aeneas.logger import Logger
+from aeneas.logger import Loggable
import aeneas.globalconstants as gc
import aeneas.globalfunctions as gf
@@ -34,7 +31,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -67,52 +64,49 @@ class ContainerFormat(object):
ALLOWED_VALUES = [EPUB, TAR, TAR_GZ, TAR_BZ2, UNPACKED, ZIP]
""" List of all the allowed values """
-class Container(object):
+
+
+class Container(Loggable):
"""
An abstraction for different archive formats like ZIP or TAR,
- exposing common functions like extracting all files or
- a single file, listing the files, etc.
+ exposing common functions like extracting all entries or
+ just a single entry, listing the entries, etc.
An (uncompressed) directory can be used in lieu of a compressed file.
- :param file_path: the path to the container file (or directory)
- :type file_path: string (path)
+ :param string file_path: the path to the container file (or directory)
:param container_format: the format of the container
- :type container_format: :class:`aeneas.container.ContainerFormat`
+ :type container_format: :class:`~aeneas.container.ContainerFormat`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
-
- :raise TypeError: if ``file_path`` is None
- :raise ValueError: if ``container_format`` is not None and is not an allowed value
+ :type logger: :class:`~aeneas.logger.Logger`
+ :raises: TypeError: if ``file_path`` is ``None``
+ :raises: ValueError: if ``container_format`` is not ``None`` and is not an allowed value
"""
TAG = u"Container"
- def __init__(self, file_path, container_format=None, logger=None):
+ def __init__(self, file_path, container_format=None, rconf=None, logger=None):
if file_path is None:
- raise TypeError("File path is None")
+ raise TypeError(u"File path is None")
if (
(container_format is not None) and
(container_format not in ContainerFormat.ALLOWED_VALUES)
):
- raise ValueError("Container format not allowed")
+ raise ValueError(u"Container format not allowed")
+ super(Container, self).__init__(rconf=rconf, logger=logger)
self.file_path = file_path
self.container_format = container_format
self.actual_container = None
- self.logger = logger or Logger()
- self._log(u"Setting actual Container object")
self._set_actual_container()
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
@property
def file_path(self):
"""
The path of this container.
- :rtype: string (path)
+ :rtype: string
"""
return self.__file_path
@file_path.setter
@@ -124,7 +118,7 @@ def container_format(self):
"""
The format of this container.
- :rtype: :class:`aeneas.container.ContainerFormat`
+ :rtype: :class:`~aeneas.container.ContainerFormat`
"""
return self.__container_format
@container_format.setter
@@ -138,8 +132,7 @@ def has_config_xml(self):
``False`` otherwise.
:rtype: bool
-
- :raise: see ``entries()``
+ :raises: same as :func:`~aeneas.container.Container.entries`
"""
return self.entry_config_xml is not None
@@ -150,9 +143,8 @@ def entry_config_xml(self):
of the XML config file in this container,
or ``None`` if not present.
- :rtype: string (path)
-
- :raise: see ``entries()``
+ :rtype: string
+ :raises: same as :func:`~aeneas.container.Container.entries`
"""
return self.find_entry(gc.CONFIG_XML_FILE_NAME, exact=False)
@@ -163,8 +155,7 @@ def has_config_txt(self):
``False`` otherwise.
:rtype: bool
-
- :raise: see ``entries()``
+ :raises: same as :func:`~aeneas.container.Container.entries`
"""
return self.entry_config_txt is not None
@@ -175,9 +166,8 @@ def entry_config_txt(self):
of the TXT config file in this container,
or ``None`` if not present.
- :rtype: string (path)
-
- :raise: see ``entries()``
+ :rtype: string
+ :raises: same as :func:`~aeneas.container.Container.entries`
"""
return self.find_entry(gc.CONFIG_TXT_FILE_NAME, exact=False)
@@ -188,16 +178,14 @@ def is_safe(self):
that is, if all its entries are safe, ``False`` otherwise.
:rtype: bool
-
- :raise: see ``entries()``
+ :raises: same as :func:`~aeneas.container.Container.entries`
"""
- self._log(u"Checking if this container is safe")
- entries = self.entries()
- for entry in entries:
+ self.log(u"Checking if this container is safe")
+ for entry in self.entries:
if not self.is_entry_safe(entry):
- self._log([u"This container is not safe: found unsafe entry '%s'", entry])
+ self.log([u"This container is not safe: found unsafe entry '%s'", entry])
return False
- self._log(u"This container is safe")
+ self.log(u"This container is safe")
return True
def is_entry_safe(self, entry):
@@ -210,27 +198,28 @@ def is_entry_safe(self, entry):
"""
normalized = os.path.normpath(entry)
if normalized.startswith(os.sep) or normalized.startswith(".." + os.sep):
- self._log([u"Entry '%s' is not safe", entry])
+ self.log([u"Entry '%s' is not safe", entry])
return False
- self._log([u"Entry '%s' is safe", entry])
+ self.log([u"Entry '%s' is safe", entry])
return True
+ @property
def entries(self):
"""
Return the sorted list of entries in this container,
each represented by its full path inside the container.
:rtype: list of strings (path)
-
- :raise TypeError: if this container does not exist
- :raise OSError: if an error occurred reading the given container (e.g., empty file, damaged file, etc.)
+ :raises: TypeError: if this container does not exist
+ :raises: OSError: if an error occurred reading the given container
+ (e.g., empty file, damaged file, etc.)
"""
- self._log(u"Getting entries")
+ self.log(u"Getting entries")
if not self.exists():
- raise TypeError("This container does not exist (wrong path?)")
+ self.log_exc(u"This container does not exist. Wrong path?", None, True, TypeError)
if self.actual_container is None:
- raise TypeError("The actual container object has not been set")
- return self.actual_container.entries()
+ self.log_exc(u"The actual container object has not been set", None, True, TypeError)
+ return self.actual_container.entries
def find_entry(self, entry, exact=True):
"""
@@ -246,32 +235,29 @@ def find_entry(self, entry, exact=True):
entry = "config.txt"
- might match: ::
-
- config.txt
- foo/config.txt (if exact = False)
- foo/bar/config.txt (if exact = False)
+ matches: ::
- :param entry: the entry name to be searched for
- :type entry: string (path)
- :param exact: look for the exact entry path
- :type exact: bool
- :rtype: string (path)
+ config.txt (if exact == True or exact == False)
+ foo/config.txt (if exact == False)
+ foo/bar/config.txt (if exact == False)
- :raise: see ``entries()``
+ :param string entry: the entry name to be searched for
+ :param bool exact: look for the exact entry path
+ :rtype: string
+ :raises: same as :func:`~aeneas.container.Container.entries`
"""
if exact:
- self._log([u"Finding entry '%s' with exact=True", entry])
- if entry in self.entries():
- self._log([u"Found entry '%s'", entry])
+ self.log([u"Finding entry '%s' with exact=True", entry])
+ if entry in self.entries:
+ self.log([u"Found entry '%s'", entry])
return entry
else:
- self._log([u"Finding entry '%s' with exact=False", entry])
- for ent in self.entries():
+ self.log([u"Finding entry '%s' with exact=False", entry])
+ for ent in self.entries:
if os.path.basename(ent) == entry:
- self._log([u"Found entry '%s'", ent])
+ self.log([u"Found entry '%s'", ent])
return ent
- self._log([u"Entry '%s' not found", entry])
+ self.log([u"Entry '%s' not found", entry])
return None
def read_entry(self, entry):
@@ -283,68 +269,63 @@ def read_entry(self, entry):
or it cannot be found.
:rtype: byte string
-
- :raise: see ``entries()``
+ :raises: same as :func:`~aeneas.container.Container.entries`
"""
if not self.is_entry_safe(entry):
- self._log([u"Accessing entry '%s' is not safe", entry])
+ self.log([u"Accessing entry '%s' is not safe", entry])
return None
- if entry not in self.entries():
- self._log([u"Entry '%s' not found in this container", entry])
+ if entry not in self.entries:
+ self.log([u"Entry '%s' not found in this container", entry])
return None
- self._log([u"Reading contents of entry '%s'", entry])
+ self.log([u"Reading contents of entry '%s'", entry])
try:
return self.actual_container.read_entry(entry)
except:
- self._log([u"An error occurred while reading the contents of '%s'", entry])
+ self.log([u"An error occurred while reading the contents of '%s'", entry])
return None
def decompress(self, output_path):
"""
Decompress the entire container into the given directory.
- :param output_path: path of the destination directory
- :type output_path: string (path)
-
- :raise TypeError: if this container does not exist
- :raise ValueError: if this container contains unsafe entries,
- or ``output_path`` is not an existing directory
- :raise OSError: if an error occurred decompressing the given container
- (e.g., empty file, damaged file, etc.)
+ :param string output_path: path of the destination directory
+ :raises: TypeError: if this container does not exist
+ :raises: ValueError: if this container contains unsafe entries,
+ or ``output_path`` is not an existing directory
+ :raises: OSError: if an error occurred decompressing the given container
+ (e.g., empty file, damaged file, etc.)
"""
- self._log([u"Decompressing the container into '%s'", output_path])
+ self.log([u"Decompressing the container into '%s'", output_path])
if not self.exists():
- raise TypeError("This container does not exist (wrong path?)")
+ self.log_exc(u"This container does not exist. Wrong path?", None, True, TypeError)
if self.actual_container is None:
- raise TypeError("The actual container object has not been set")
+ self.log_exc(u"The actual container object has not been set", None, True, TypeError)
if not gf.directory_exists(output_path):
- raise ValueError("The output_path is not an existing directory")
+ self.log_exc(u"The output path is not an existing directory", None, True, ValueError)
if not self.is_safe:
- raise ValueError("This container contains unsafe entries")
+ self.log_exc(u"This container contains unsafe entries", None, True, ValueError)
self.actual_container.decompress(output_path)
def compress(self, input_path):
"""
Compress the contents of the given directory.
- :param input_path: path of the input directory
- :type input_path: string (path)
-
- :raise TypeError: if the container path has not been set
- :raise ValueError: if ``input_path`` is not an existing directory
- :raise OSError: if an error occurred compressing the given container
- (e.g., empty file, damaged file, etc.)
+ :param string input_path: path of the input directory
+ :raises: TypeError: if the container path has not been set
+ :raises: ValueError: if ``input_path`` is not an existing directory
+ :raises: OSError: if an error occurred compressing the given container
+ (e.g., empty file, damaged file, etc.)
"""
- self._log([u"Compressing '%s' into this container", input_path])
+ self.log([u"Compressing '%s' into this container", input_path])
if self.file_path is None:
- raise TypeError("The container path has not been set")
+ self.log_exc(u"The container path has not been set", None, True, TypeError)
if self.actual_container is None:
- raise TypeError("The actual container object has not been set")
+ self.log_exc(u"The actual container object has not been set", None, True, TypeError)
if not gf.directory_exists(input_path):
- raise ValueError("The input_path is not an existing directory")
+ self.log_exc(u"The input path is not an existing directory", None, True, ValueError)
gf.ensure_parent_directory(input_path)
self.actual_container.compress(input_path)
@@ -364,62 +345,63 @@ def _set_actual_container(self):
If the container format is not specified,
infer it from the (lowercased) extension of the file path.
If the format cannot be inferred, it is assumed to be
- of type :class:`aeneas.container.ContainerFormat.UNPACKED`
+ of type :class:`~aeneas.container.ContainerFormat.UNPACKED`
(unpacked directory).
"""
- self._log(u"Setting actual container")
-
# infer container format
if self.container_format is None:
- self._log(u"Inferring actual container format")
+ self.log(u"Inferring actual container format...")
path_lowercased = self.file_path.lower()
- self._log([u"Lowercased file path: '%s'", path_lowercased])
+ self.log([u"Lowercased file path: '%s'", path_lowercased])
self.container_format = ContainerFormat.UNPACKED
for fmt in ContainerFormat.ALLOWED_FILE_VALUES:
if path_lowercased.endswith(fmt):
self.container_format = fmt
break
- self._log([u"Inferred format: '%s'", self.container_format])
+ self.log(u"Inferring actual container format... done")
+ self.log([u"Inferred format: '%s'", self.container_format])
# set the actual container
- self._log(u"Setting actual container")
+ self.log(u"Setting actual container...")
+ # TODO map this
if self.container_format == ContainerFormat.ZIP:
- self.actual_container = _ContainerZIP(self.file_path)
+ self.actual_container = _ContainerZIP(self.file_path, rconf=self.rconf, logger=self.logger)
elif self.container_format == ContainerFormat.EPUB:
- self.actual_container = _ContainerZIP(self.file_path)
+ self.actual_container = _ContainerZIP(self.file_path, rconf=self.rconf, logger=self.logger)
elif self.container_format == ContainerFormat.TAR:
- self.actual_container = _ContainerTAR(self.file_path, "")
+ self.actual_container = _ContainerTAR(self.file_path, "", rconf=self.rconf, logger=self.logger)
elif self.container_format == ContainerFormat.TAR_GZ:
- self.actual_container = _ContainerTAR(self.file_path, ":gz")
+ self.actual_container = _ContainerTAR(self.file_path, ":gz", rconf=self.rconf, logger=self.logger)
elif self.container_format == ContainerFormat.TAR_BZ2:
- self.actual_container = _ContainerTAR(self.file_path, ":bz2")
+ self.actual_container = _ContainerTAR(self.file_path, ":bz2", rconf=self.rconf, logger=self.logger)
elif self.container_format == ContainerFormat.UNPACKED:
- self.actual_container = _ContainerUnpacked(self.file_path)
- self._log([u"Actual container format: '%s'", self.container_format])
- self._log(u"Actual container set")
+ self.actual_container = _ContainerUnpacked(self.file_path, rconf=self.rconf, logger=self.logger)
+ self.log([u"Actual container format: '%s'", self.container_format])
+ self.log(u"Setting actual container... done")
-class _ContainerTAR(object):
+class _ContainerTAR(Loggable):
"""
A TAR container.
"""
TAG = u"ContainerTAR"
- def __init__(self, file_path, variant, logger=None):
+ def __init__(self, file_path, variant, rconf=None, logger=None):
+ super(_ContainerTAR, self).__init__(rconf=rconf, logger=logger)
self.file_path = file_path
self.variant = variant
- self.logger = logger or Logger()
+ @property
def entries(self):
try:
argument = "r" + self.variant
with tarfile.open(self.file_path, argument) as tar_file:
result = [e.name for e in tar_file.getmembers() if e.isfile()]
return sorted(result)
- except:
- raise OSError("Cannot read entries from TAR file")
+ except Exception as exc:
+ self.log_exc(u"Cannot read entries from TAR file", exc, True, OSError)
def read_entry(self, entry):
try:
@@ -429,16 +411,16 @@ def read_entry(self, entry):
result = tar_entry.read()
tar_entry.close()
return result
- except:
- raise OSError("Cannot read entry from TAR file")
+ except Exception as exc:
+ self.log_exc(u"Cannot read entry from TAR file", exc, True, OSError)
def decompress(self, output_path):
try:
argument = "r" + self.variant
with tarfile.open(self.file_path, argument) as tar_file:
tar_file.extractall(output_path)
- except:
- raise OSError("Cannot decompress TAR file")
+ except Exception as exc:
+ self.log_exc(u"Cannot decompress TAR file", exc, True, OSError)
def compress(self, input_path):
try:
@@ -451,27 +433,30 @@ def compress(self, input_path):
fullpath = os.path.join(root, f)
archive_name = os.path.join(archive_root, f)
tar_file.add(name=fullpath, arcname=archive_name)
- except:
- raise OSError("Cannot compress TAR File")
+ except Exception as exc:
+ self.log_exc(u"Cannot compress TAR file", exc, True, OSError)
+
+
-class _ContainerZIP(object):
+class _ContainerZIP(Loggable):
"""
A ZIP container.
"""
TAG = u"ContainerZIP"
- def __init__(self, file_path, logger=None):
+ def __init__(self, file_path, rconf=None, logger=None):
+ super(_ContainerZIP, self).__init__(rconf=rconf, logger=logger)
self.file_path = file_path
- self.logger = logger or Logger()
+ @property
def entries(self):
try:
with zipfile.ZipFile(self.file_path) as zip_file:
result = [e for e in zip_file.namelist() if not e.endswith("/")]
return sorted(result)
- except:
- raise OSError("Cannot read entries from ZIP file")
+ except Exception as exc:
+ self.log_exc(u"Cannot read entries from ZIP file", exc, True, OSError)
def read_entry(self, entry):
try:
@@ -480,15 +465,15 @@ def read_entry(self, entry):
result = zip_entry.read()
zip_entry.close()
return result
- except:
- raise OSError("Cannot read entry from ZIP file")
+ except Exception as exc:
+ self.log_exc(u"Cannot read entry from ZIP file", exc, True, OSError)
def decompress(self, output_path):
try:
with zipfile.ZipFile(self.file_path) as zip_file:
zip_file.extractall(output_path)
- except:
- raise OSError("Cannot decompress ZIP file")
+ except Exception as exc:
+ self.log_exc(u"Cannot decompress ZIP file", exc, True, OSError)
def compress(self, input_path):
try:
@@ -500,20 +485,23 @@ def compress(self, input_path):
fullpath = os.path.join(root, f)
archive_name = os.path.join(archive_root, f)
zip_file.write(fullpath, archive_name)
- except:
- raise OSError("Cannot compress ZIP file")
+ except Exception as exc:
+ self.log_exc(u"Cannot compress ZIP file", exc, True, OSError)
+
-class _ContainerUnpacked(object):
+
+class _ContainerUnpacked(Loggable):
"""
An unpacked container.
"""
TAG = u"ContainerUnpacked"
- def __init__(self, file_path, logger=None):
+ def __init__(self, file_path, rconf=None, logger=None):
+ super(_ContainerUnpacked, self).__init__(rconf=rconf, logger=logger)
self.file_path = file_path
- self.logger = logger or Logger()
+ @property
def entries(self):
try:
result = []
@@ -524,32 +512,32 @@ def entries(self):
relative_path = os.path.join(current_dir_abs, f)[root_len+1:]
result.append(relative_path)
return sorted(result)
- except:
- raise OSError("Cannot read entries from unpacked")
+ except Exception as exc:
+ self.log_exc(u"Cannot read entries from unpacked directory", exc, True, OSError)
def read_entry(self, entry):
try:
with io.open(os.path.join(self.file_path, entry), "rb") as unpacked_entry:
result = unpacked_entry.read()
return result
- except:
- raise OSError("Cannot read entry from unpacked")
+ except Exception as exc:
+ self.log_exc(u"Cannot read entry from unpacked directory", exc, True, OSError)
def decompress(self, output_path):
try:
if os.path.abspath(output_path) == os.path.abspath(self.file_path):
return
gf.copytree(self.file_path, output_path)
- except:
- raise OSError("Cannot decompress unpacked")
+ except Exception as exc:
+ self.log_exc(u"Cannot decompress unpacked directory", exc, True, OSError)
def compress(self, input_path):
try:
if os.path.abspath(input_path) == os.path.abspath(self.file_path):
return
gf.copytree(input_path, self.file_path)
- except:
- raise OSError("Cannot compress unpacked")
+ except Exception as exc:
+ self.log_exc(u"Cannot compress unpacked directory", exc, True, OSError)
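The entry checks performed by `Container` (`is_entry_safe` and `find_entry`, which now reads `entries` as a property) can be tried out without an archive. The following is a minimal sketch that reimplements the same logic over a plain list of entry names. The standalone functions are hypothetical stand-ins, not the aeneas methods, and the path behavior assumes a POSIX `os.sep`.

```python
import os


def is_entry_safe(entry):
    # Reject entries that are absolute or that escape the container root
    normalized = os.path.normpath(entry)
    return not (normalized.startswith(os.sep) or normalized.startswith(".." + os.sep))


def find_entry(entries, entry, exact=True):
    # exact=True: the full path must match; exact=False: match on the basename
    if exact:
        return entry if entry in entries else None
    for ent in entries:
        if os.path.basename(ent) == entry:
            return ent
    return None


entries = sorted([u"foo/bar/config.txt", u"foo/audio.mp3"])
print(find_entry(entries, u"config.txt"))               # None
print(find_entry(entries, u"config.txt", exact=False))  # foo/bar/config.txt
print(is_entry_safe(u"foo/../../etc/passwd"))           # False
```

Because the basename search walks the sorted entry list, the first match in lexicographic order wins when several entries share the same file name.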
diff --git a/aeneas/cwave/000_compile_driver.sh b/aeneas/cwave/000_compile_driver.sh
new file mode 100644
index 00000000..4b24ca6d
--- /dev/null
+++ b/aeneas/cwave/000_compile_driver.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+gcc cwave_driver.c cwave_func.c cint.c -o cwave_driver -Wall -pedantic -std=c99
+
+
+
diff --git a/aeneas/cwave/100_run_driver.sh b/aeneas/cwave/100_run_driver.sh
new file mode 100644
index 00000000..37812bdd
--- /dev/null
+++ b/aeneas/cwave/100_run_driver.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+
+if [ ! -e cwave_driver ]
+then
+ bash 000_compile_driver.sh
+fi
+
+echo "Run 1"
+./cwave_driver
+echo ""
+
+echo "Run 2"
+./cwave_driver ../tools/res/audio.wav
+echo ""
+
+echo "Run 3"
+./cwave_driver ../tools/res/audio.wav 0 10
+echo ""
+
+echo "Run 4"
+./cwave_driver ../tools/res/audio.wav 5 5
+echo ""
+
+echo "Run 5"
+./cwave_driver ../tests/res/audioformats/mono.empty.wav
+./cwave_driver ../tests/res/audioformats/mono.invalid.wav
+./cwave_driver ../tests/res/audioformats/mono.zero.wav
+./cwave_driver ../tests/res/audioformats/mono.16000.wav
+./cwave_driver ../tests/res/audioformats/mono.22050.wav
+./cwave_driver ../tests/res/audioformats/mono.44100.wav
+./cwave_driver ../tests/res/audioformats/mono.48000.wav
+echo ""
+
diff --git a/aeneas/cwave/800_compile_py.sh b/aeneas/cwave/800_compile_py.sh
new file mode 100644
index 00000000..62de2877
--- /dev/null
+++ b/aeneas/cwave/800_compile_py.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+rm -rf build *.so
+python cwave_setup.py build_ext --inplace
+
diff --git a/aeneas/cwave/README.md b/aeneas/cwave/README.md
new file mode 100644
index 00000000..72daaca6
--- /dev/null
+++ b/aeneas/cwave/README.md
@@ -0,0 +1,22 @@
+# aeneas.cwave
+
+**aeneas.cwave** is a Python C extension to read mono WAVE files.
+
+## API
+
+See the [__init__.py](__init__.py) file.
+
+## Compiling the Python C extension locally
+
+```bash
+$ python cwave_setup.py build_ext --inplace
+```
+
+## Compiling the pure C driver program
+
+```bash
+$ bash 000_compile_driver.sh
+```
+
+
+
diff --git a/aeneas/cwave/__init__.py b/aeneas/cwave/__init__.py
index d2dca9bf..a435100a 100644
--- a/aeneas/cwave/__init__.py
+++ b/aeneas/cwave/__init__.py
@@ -2,7 +2,32 @@
# coding=utf-8
"""
-aeneas.cwave is a Python C extension to read WAVE files.
+aeneas.cwave is a Python C extension to read WAVE mono files.
+
+.. function:: cwave.get_audio_info(audio_file_path)
+
+ Read the sample rate and length of the given WAVE mono file.
+
+ The returned tuple ``(sr, length)`` contains
+ the sample rate and the number of samples
+ of the WAVE file.
+
+ :param string audio_file_path: the path of the WAVE file to be read, UTF-8 encoded
+ :rtype: tuple
+
+.. function:: cwave.read_audio_data(audio_file_path, from_sample, num_samples)
+
+ Read audio samples from the given WAVE mono file.
+
+ The returned tuple ``(sr, data)`` contains
+ the sample rate of the WAVE file,
+ and the samples read as a NumPy 1D array
+ of ``float64`` values in ``[-1.0, 1.0]``.
+
+ :param string audio_file_path: the path of the WAVE file to be read, UTF-8 encoded
+ :param int from_sample: index of the first sample to be read
+ :param int num_samples: number of samples to be read
+ :rtype: tuple
"""
__author__ = "Alberto Pettarin"
@@ -12,7 +37,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL 3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
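The semantics documented for `cwave.get_audio_info` and `cwave.read_audio_data` can be approximated in pure Python with the standard library `wave` module. The sketch below is only an analogue for illustration, not the C extension. It assumes a mono PCM16 file, returns a plain list instead of a NumPy array, and scales samples by 1/32768, so values fall in `[-1.0, 1.0)`.

```python
import os
import struct
import tempfile
import wave


def get_audio_info(audio_file_path):
    """Return (sample_rate, number_of_samples) of a mono WAVE file."""
    with wave.open(audio_file_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("not a mono WAVE file")
        return wav.getframerate(), wav.getnframes()


def read_audio_data(audio_file_path, from_sample, num_samples):
    """Return (sample_rate, samples), with PCM16 samples scaled to [-1.0, 1.0)."""
    with wave.open(audio_file_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a mono PCM16 WAVE file")
        wav.setpos(from_sample)              # seek to the first requested sample
        raw = wav.readframes(num_samples)    # may return fewer frames near the end
        count = len(raw) // 2
        ints = struct.unpack("<%dh" % count, raw)  # little-endian signed 16-bit
        return wav.getframerate(), [s / 32768.0 for s in ints]


# Write a tiny 4-sample mono PCM16 file to try the functions on
path = os.path.join(tempfile.mkdtemp(), "mono.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(struct.pack("<4h", 0, 16384, -16384, -32768))

print(get_audio_info(path))         # (16000, 4)
print(read_audio_data(path, 1, 2))  # (16000, [0.5, -0.5])
```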
diff --git a/aeneas/cwave/cint.c b/aeneas/cwave/cint.c
new file mode 120000
index 00000000..8e1c9dae
--- /dev/null
+++ b/aeneas/cwave/cint.c
@@ -0,0 +1 @@
+../cint/cint.c
\ No newline at end of file
diff --git a/aeneas/cwave/cint.h b/aeneas/cwave/cint.h
new file mode 120000
index 00000000..27a6bb39
--- /dev/null
+++ b/aeneas/cwave/cint.h
@@ -0,0 +1 @@
+../cint/cint.h
\ No newline at end of file
diff --git a/aeneas/cwave/cwave_driver.c b/aeneas/cwave/cwave_driver.c
index 5164cdcc..b1790e9f 100644
--- a/aeneas/cwave/cwave_driver.c
+++ b/aeneas/cwave/cwave_driver.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for reading WAVE mono files.
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,43 +9,44 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
*/
-//
-// this is a simple driver to test on the command line
-//
-// you can compile it with:
-//
-// $ gcc cwave_driver.c cwave_func.c -o cwave_driver
-//
-// use it as follows:
-//
-// ./cwave_driver audio.wav => print info about the WAVE file
-// ./cwave_driver audio.wav 0 100 => print the value of the first 100 samples, as (signed) double
-// ./cwave_driver audio.wav 25 75 => print the value of the samples with index (starting at 0) 25-99, as (signed) double
-//
-
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cwave_func.h"
+#define DRIVER_SUCCESS 0
+#define DRIVER_FAILURE 1
+
+// print usage
+void _usage(const char *prog) {
+ printf("\n");
+ printf("Usage: $ %s AUDIO.wav [FROM_SAMPLE] [NUM_SAMPLES]\n", prog);
+ printf("\n");
+ printf("Example: %s ../tools/res/audio.wav\n", prog);
+ printf(" %s ../tools/res/audio.wav 0 100\n", prog);
+ printf(" %s ../tools/res/audio.wav 25 75\n", prog);
+ printf("\n");
+}
+
int main(int argc, char **argv) {
FILE *audio_file_ptr;
struct WAVE_INFO audio_info;
char *filename;
double *buffer;
double duration;
- unsigned int i, from_sample, num_samples;
+ uint32_t i, from_sample, num_samples; // a WAVE file cannot have more than 2^32 samples
+ // parse arguments
if (argc < 2) {
- printf("\nUsage: $ %s AUDIO.wav [FROM_SAMPLE] [NUM_SAMPLES]\n\n", argv[0]);
- return 1;
+ _usage(argv[0]);
+ return DRIVER_FAILURE;
}
filename = argv[1];
from_sample = 0;
@@ -55,20 +56,24 @@ int main(int argc, char **argv) {
num_samples = atol(argv[3]);
}
- memset(&audio_info, 0, sizeof(audio_info));
- if (!(audio_file_ptr = wave_open(filename, &audio_info))) {
+ audio_file_ptr = wave_open(filename, &audio_info);
+ if (audio_file_ptr == NULL) {
printf("Error: cannot open file %s\n", filename);
- return 1;
+ return DRIVER_FAILURE;
}
duration = 1.0 * audio_info.coNumSamples / audio_info.leSampleRate;
if (num_samples > 0) {
buffer = (double *)calloc(num_samples, sizeof(double));
- if (!wave_read_double(audio_file_ptr, &audio_info, buffer, from_sample, num_samples)) {
+ if (buffer == NULL) {
+ printf("Error: cannot allocate buffer\n");
+ return DRIVER_FAILURE;
+ }
+ if (wave_read_double(audio_file_ptr, &audio_info, buffer, from_sample, num_samples) != CWAVE_SUCCESS) {
printf("Error: cannot read the specified range: %u %u\n", from_sample, num_samples);
free((void *)buffer);
buffer = NULL;
- return 1;
+ return DRIVER_FAILURE;
}
for (i = 0; i < num_samples; ++i) {
printf("%.12f\n", buffer[i]);
@@ -86,5 +91,5 @@ int main(int argc, char **argv) {
}
wave_close(audio_file_ptr);
- return 0;
+ return DRIVER_SUCCESS;
}
diff --git a/aeneas/cwave/cwave_func.c b/aeneas/cwave/cwave_func.c
index 4aa97ff2..5e717e99 100644
--- a/aeneas/cwave/cwave_func.c
+++ b/aeneas/cwave/cwave_func.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for reading WAVE mono files.
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -23,207 +23,183 @@ __status__ = "Production"
static const int CWAVE_BUFFER_SIZE = 4096;
-// TODO make me faster and more portable
-// convert a little-endian buffer to big-endian as unsigned int
-// return the number of bytes read
-unsigned int _be_to_le_uint(unsigned char *buffer, const int length) {
- unsigned int ret;
-
- ret = 0;
+// convert a little-endian buffer to signed double
+static double _le_to_double(unsigned char *buffer, const uint32_t length) {
if (length == 1) {
- ret = buffer[0];
+ return (((double)buffer[0]) - 128.0) / 128; // 8-bit PCM samples are unsigned, 128 = zero level
}
if (length == 2) {
- ret = buffer[0];
- ret |= ((buffer[1]) << 8);
+ return ((double)le_s16_to_cpu(buffer)) / 32768;
}
if (length == 4) {
- ret = buffer[0];
- ret |= ((buffer[1]) << 8);
- ret |= ((buffer[2]) << 16);
- ret |= ((buffer[3]) << 24);
+ return ((double)le_s32_to_cpu(buffer)) / 2147483648;
}
- return ret;
+ return 0.0;
}
-// TODO make me faster and more portable
-// convert a little-endian buffer to big-endian as unsigned int
-// return the number of bytes read
-int _be_to_le_int(unsigned char *buffer, const int length) {
- int ret;
-
- ret = 0;
- if (length == 1) {
- ret = buffer[0];
- ret = (ret << 24) >> 24;
- }
- if (length == 2) {
- ret = buffer[0];
- ret |= ((buffer[1]) << 8);
- ret = (ret << 16) >> 16;
- }
- if (length == 4) {
- ret = buffer[0];
- ret |= ((buffer[1]) << 8);
- ret |= ((buffer[2]) << 16);
- ret |= ((buffer[3]) << 24);
- }
- return ret;
-}
+// read a little-endian u16 field
+static int _read_le_u16_field(FILE *ptr, uint16_t *dest) {
+ unsigned char buffer[2];
-// TODO make me faster and more portable
-// convert a little-endian buffer to big-endian as signed double
-// return the number of bytes read
-double _be_to_le_double(unsigned char *buffer, const int length) {
- if (length == 1) {
- return ((double)_be_to_le_int(buffer, length)) / 128;
+ if (fread(buffer, 2, 1, ptr) != 1) {
+ return CWAVE_FAILURE;
}
- if (length == 2) {
- return ((double)_be_to_le_int(buffer, length)) / 32768;
- }
- if (length == 4) {
- return ((double)_be_to_le_int(buffer, length)) / 2147483648;
- }
- return 0;
+ *dest = le_u16_to_cpu(buffer);
+ return CWAVE_SUCCESS;
}
-// TODO make me faster and more portable
-// read a little-endian field and convert it to big-endian into an int
-// return the number of bytes read
-int _read_le_field(FILE *ptr, unsigned int *dest, const int length) {
- unsigned char buffer1[1];
- unsigned char buffer2[2];
- unsigned char buffer4[4];
- unsigned char *buffer;
- int read;
+// read a little-endian u32 field
+static int _read_le_u32_field(FILE *ptr, uint32_t *dest) {
+ unsigned char buffer[4];
- if (length == 1) {
- buffer = buffer1;
- } else if (length == 2) {
- buffer = buffer2;
- } else if (length == 4) {
- buffer = buffer4;
- } else {
- return 0;
+ if (fread(buffer, 4, 1, ptr) != 1) {
+ return CWAVE_FAILURE;
}
- read = fread(buffer, length, 1, ptr);
- *dest = _be_to_le_uint(buffer, length);
- return read;
+ *dest = le_u32_to_cpu(buffer);
+ return CWAVE_SUCCESS;
}
-// TODO make me faster and more portable
// read a big-endian field
-// return the number of bytes read
-int _read_be_field(FILE *ptr, char *dest, const int length) {
- return fread(dest, length, 1, ptr);
+static int _read_be_field(FILE *ptr, char *dest, const int length) {
+ if (fread(dest, length, 1, ptr) != 1) {
+ return CWAVE_FAILURE;
+ }
+ return CWAVE_SUCCESS;
}
-// find the "match" chunk, and store its size in size
-// return 1 on success or 0 on failure
-int _seek_to_chunk(FILE *ptr, struct WAVE_INFO *header, const char *match, unsigned int *size) {
+// find the "match" chunk, and store its size in "size"
+static int _seek_to_chunk(FILE *ptr, struct WAVE_INFO *header, const char *match, uint32_t *size) {
char buffer4[4];
- unsigned int chunk_size;
- const unsigned int max_pos = (*header).leChunkSize + 8; // max pos in file
+ uint32_t chunk_size;
+ const uint32_t max_pos = (*header).leChunkSize + 8; // max pos in file
rewind(ptr);
chunk_size = 12; // skip first 12 bytes
- while(ftell(ptr) + chunk_size + 8 < max_pos) {
+ while((ftell(ptr) >= 0) && (ftell(ptr) + chunk_size + 8 < max_pos)) {
+ // seek to the next chunk
if (fseek(ptr, chunk_size, SEEK_CUR) != 0) {
- return 0;
+ return CWAVE_FAILURE;
}
- if (_read_be_field(ptr, buffer4, 4) != 1) {
- return 0;
+ // read the chunk description
+ if (_read_be_field(ptr, buffer4, 4) != CWAVE_SUCCESS) {
+ return CWAVE_FAILURE;
}
- if (_read_le_field(ptr, &chunk_size, 4) != 1) {
- return 0;
+ // read the chunk size
+ if (_read_le_u32_field(ptr, &chunk_size) != CWAVE_SUCCESS) {
+ return CWAVE_FAILURE;
}
+ // compare the chunk description with the desired string
if (memcmp(buffer4, match, 4) == 0) {
*size = chunk_size;
- return 1;
+ return CWAVE_SUCCESS;
}
}
- return 0;
+ return CWAVE_FAILURE;
}
-// parse the header
-// it assumes the given file is a RIFF WAVE file
+// open a WAVE mono file and read header info
+// the header is always initialized to zero
FILE *wave_open(const char *path, struct WAVE_INFO *header) {
FILE *ptr;
char buffer4[4];
struct WAVE_INFO h;
+ // initialize header
+ memset(header, 0, sizeof(*header));
+
// open file
if (path == NULL) {
- printf("Error: path is NULL\n");
+ //printf("Error: path is NULL\n");
return NULL;
}
ptr = fopen(path, "rb");
if (ptr == NULL) {
- printf("Error: unable to open input file %s\n", path);
+ //printf("Error: unable to open input file %s\n", path);
return NULL;
}
// read first 12 bytes: RIFF header.leChunkSize WAVE
rewind(ptr);
- if (_read_be_field(ptr, buffer4, 4) != 1) {
- printf("Error: cannot read beChunkID\n");
+ if (_read_be_field(ptr, buffer4, 4) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read beChunkID\n");
return NULL;
}
if (memcmp(buffer4, "RIFF", 4) != 0) {
- printf("Error: beChunkID is not RIFF\n");
+ //printf("Error: beChunkID is not RIFF\n");
return NULL;
}
- if (_read_le_field(ptr, &h.leChunkSize, 4) != 1) {
- printf("Error: cannot read leChunkSize\n");
+ if (_read_le_u32_field(ptr, &h.leChunkSize) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read leChunkSize\n");
return NULL;
}
- //printf("leChunkSize: %d\n", header.leChunkSize);
- if (_read_be_field(ptr, buffer4, 4) != 1) {
- printf("Error: cannot read beFormat\n");
- return 0;
+ if (_read_be_field(ptr, buffer4, 4) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read beFormat\n");
+ return NULL;
}
if (memcmp(buffer4, "WAVE", 4) != 0) {
- printf("Error: beFormat is not WAVE\n");
+ //printf("Error: beFormat is not WAVE\n");
return NULL;
}
// locate the fmt chunk
- if (! _seek_to_chunk(ptr, &h, "fmt ", &h.leSubchunkFmtSize)) {
- printf("Error: cannot locate fmt chunk\n");
+ if (_seek_to_chunk(ptr, &h, "fmt ", &h.leSubchunkFmtSize) != CWAVE_SUCCESS) {
+ //printf("Error: cannot locate fmt chunk\n");
return NULL;
}
if (h.leSubchunkFmtSize < 16) {
- printf("Error: fmt chunk has length < 16\n");
+ //printf("Error: fmt chunk has length < 16\n");
return NULL;
}
- _read_le_field(ptr, &h.leAudioFormat, 2);
- _read_le_field(ptr, &h.leNumChannels, 2);
- _read_le_field(ptr, &h.leSampleRate, 4);
- _read_le_field(ptr, &h.leByteRate, 4);
- _read_le_field(ptr, &h.leBlockAlign, 2);
- _read_le_field(ptr, &h.leBitsPerSample, 2);
+
+ // read fields
+ if (_read_le_u16_field(ptr, &h.leAudioFormat) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read leAudioFormat\n");
+ return NULL;
+ }
+ // NOTE we fail here because we are only interested in PCM files!
if (h.leAudioFormat != WAVE_FORMAT_PCM) {
- printf("Error: leAudioFormat is not PCM\n");
+ //printf("Error: leAudioFormat is not PCM\n");
+ return NULL;
+ }
+ if (_read_le_u16_field(ptr, &h.leNumChannels) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read leNumChannels\n");
return NULL;
}
+ // NOTE we fail here because we are only interested in mono files!
if (h.leNumChannels != WAVE_CHANNELS_MONO) {
- printf("Error: leNumChannels is not 1\n");
+ //printf("Error: leNumChannels is not 1\n");
+ return NULL;
+ }
+ if (_read_le_u32_field(ptr, &h.leSampleRate) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read leSampleRate\n");
+ return NULL;
+ }
+ if (_read_le_u32_field(ptr, &h.leByteRate) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read leByteRate\n");
+ return NULL;
+ }
+ if (_read_le_u16_field(ptr, &h.leBlockAlign) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read leBlockAlign\n");
+ return NULL;
+ }
+ if (_read_le_u16_field(ptr, &h.leBitsPerSample) != CWAVE_SUCCESS) {
+ //printf("Error: cannot read leBitsPerSample\n");
return NULL;
}
// locate the data chunk
- if (! _seek_to_chunk(ptr, &h, "data", &h.leSubchunkDataSize)) {
- printf("Error: cannot locate data chunk\n");
+ if (_seek_to_chunk(ptr, &h, "data", &h.leSubchunkDataSize) != CWAVE_SUCCESS) {
+ //printf("Error: cannot locate data chunk\n");
return NULL;
}
if (h.leSubchunkDataSize == 0) {
- printf("Error: data chunk has length zero\n");
+ //printf("Error: data chunk has length zero\n");
return NULL;
}
-// here ptr is at the beginnig of the data info
+// here ptr is at the beginning of the data info
- h.coSubchunkDataStart = ftell(ptr);
+ h.coSubchunkDataStart = (uint32_t)ftell(ptr);
// compute number of samples
h.coNumSamples = (h.leSubchunkDataSize / (h.leNumChannels * h.leBitsPerSample / 8));
// compute number of bytes/sample (single channel)
@@ -231,12 +207,12 @@ FILE *wave_open(const char *path, struct WAVE_INFO *header) {
// max byte position
h.coMaxDataPosition = h.coSubchunkDataStart + h.leSubchunkDataSize;
- // copy h into header and return success
+ // copy h into header and return the pointer to the audio file
*header = h;
return ptr;
}
-// close file
+// close a WAVE mono file previously opened
int wave_close(FILE *ptr) {
int ret;
@@ -245,23 +221,22 @@ int wave_close(FILE *ptr) {
return ret;
}
-// read number_samples samples, starting from sample with index from_sample
-// and save them as doubles into dest
+// read samples from an open WAVE mono file
int wave_read_double(
FILE *ptr,
struct WAVE_INFO *header,
double *dest,
- const unsigned int from_sample,
- const unsigned int number_samples
+ const uint32_t from_sample,
+ const uint32_t number_samples
) {
unsigned char *buffer;
- unsigned int target_pos;
- unsigned int i, j, read, remaining;
- const unsigned int bytes_per_sample = (*header).coBytesPerSample;
+ uint32_t target_pos;
+ const uint32_t bytes_per_sample = (*header).coBytesPerSample;
+ uint32_t i, j, read, remaining;
if (from_sample + number_samples > (*header).coNumSamples) {
- printf("Error: attempted reading outside data\n");
- return 0;
+ //printf("Error: attempted reading outside data\n");
+ return CWAVE_FAILURE;
}
target_pos = (*header).coSubchunkDataStart + bytes_per_sample * from_sample;
@@ -279,14 +254,14 @@ int wave_read_double(
read = fread(buffer, bytes_per_sample, remaining, ptr);
}
for (i = 0; i < read; ++i) {
- dest[j++] = _be_to_le_double(buffer + i * bytes_per_sample, bytes_per_sample);
+ dest[j++] = _le_to_double(buffer + i * bytes_per_sample, bytes_per_sample);
}
remaining -= read;
}
free((void *)buffer);
buffer = NULL;
- return 1;
+ return CWAVE_SUCCESS;
}
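The read path of `wave_read_double` boils down to offset arithmetic plus a per-sample conversion: check the requested range against `coNumSamples`, seek to `coSubchunkDataStart + bytes_per_sample * from_sample`, then decode the samples in blocks. The Python sketch below restates that loop under stated assumptions: the file object, data-chunk offset and sample width are taken as already parsed, the default block of 4096 samples is borrowed from `CWAVE_BUFFER_SIZE` (its exact use in the elided lines is not shown), and `le_to_double` / `read_samples` are hypothetical names. 8-bit samples are decoded as unsigned with 128 as the zero level, following the WAVE PCM convention.

```python
def le_to_double(sample, width):
    """Convert one little-endian PCM sample (bytes) to a float in [-1.0, 1.0]."""
    if width == 1:
        # 8-bit PCM WAVE samples are unsigned, with 128 as the zero level
        return (sample[0] - 128) / 128.0
    if width in (2, 4):
        # 16- and 32-bit PCM WAVE samples are signed two's complement
        value = int.from_bytes(sample, "little", signed=True)
        return value / float(1 << (8 * width - 1))
    raise ValueError("unsupported sample width: %d bytes" % width)


def read_samples(fileobj, data_start, num_total, width,
                 from_sample, number_samples, block=4096):
    """Read number_samples samples starting at index from_sample, block by block."""
    # same bounds check as wave_read_double
    if from_sample + number_samples > num_total:
        raise ValueError("attempted reading outside data")
    # byte offset of the first requested sample inside the file
    fileobj.seek(data_start + width * from_sample)
    dest = []
    remaining = number_samples
    while remaining > 0:
        count = min(remaining, block)
        raw = fileobj.read(count * width)
        if len(raw) != count * width:
            raise IOError("unexpected end of file")
        for i in range(count):
            dest.append(le_to_double(raw[i * width:(i + 1) * width], width))
        remaining -= count
    return dest
```

Reading in fixed-size blocks keeps memory bounded for long files, which appears to be the purpose of the buffer in the C code as well.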
diff --git a/aeneas/cwave/cwave_func.h b/aeneas/cwave/cwave_func.h
index adfc14bd..ae429924 100644
--- a/aeneas/cwave/cwave_func.h
+++ b/aeneas/cwave/cwave_func.h
@@ -1,6 +1,6 @@
/*
-Python C Extension for computing the MFCC
+Python C Extension for reading WAVE mono files.
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,15 +9,16 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
*/
-// NOTE: using unsigned int as it is 32-bit wide on all modern architectures
-// not using uint32_t because the MS C compiler does not have
-// or, at least, it is not easy to use it
+#include "cint.h"
+
+#define CWAVE_SUCCESS 0
+#define CWAVE_FAILURE 1
enum {
WAVE_FORMAT_PCM = 0x0001, // PCM
@@ -31,35 +32,46 @@ enum {
};
struct WAVE_INFO {
- // be = big endian
- // le = little endian
-
- // read
- unsigned int leChunkSize; // (size of the whole file in bytes - 8)
- unsigned int leSubchunkFmtSize; // (size of the subchunk 1 in bytes - 4)
- unsigned int leAudioFormat; // one of the WAVE_FORMAT_* values
- unsigned int leNumChannels; // number of channels (1 = mono, 2 = stereo)
- unsigned int leSampleRate; // samples per second (e.g. 48000, 44100, 22050, 16000, 8000)
- unsigned int leByteRate; // leSampleRate * leNumChannels * leBitsPerSample/8 => data bytes/s
- unsigned int leBlockAlign; // beNumChannels * beBitsPerSample/8 => bytes/sample, including all channels
- unsigned int leBitsPerSample; // number of bits per sample (e.g., 8, 16, 32)
- unsigned int leSubchunkDataSize; // leNumSamples * leNumChannels * leBitsPerSample/8 => data bytes
+ // be = big endian in file => converted into cpu endianness
+ // le = little endian in file => converted into cpu endianness
+ // co = computed, always in cpu endianness
+
+ // first 12 bytes
+ //uint32_t beChunkID; // string 'RIFF'
+ uint32_t leChunkSize; // (size of the whole file in bytes - 8)
+ //uint32_t beFormat; // string 'WAVE'
+
+ // then, we have at least the SubchunkFmt and SubchunkData
+ // in any order, and other kinds of Subchunk can be present as well
+ uint32_t leSubchunkFmtSize; // (size of the subchunk 1 in bytes - 4)
+ uint16_t leAudioFormat; // one of the WAVE_FORMAT_* values
+ uint16_t leNumChannels; // number of channels (1 = mono, 2 = stereo)
+ uint32_t leSampleRate; // samples per second (e.g. 48000, 44100, 22050, 16000, 8000)
+ uint32_t leByteRate; // leSampleRate * leNumChannels * leBitsPerSample/8 => data bytes/s
+ uint16_t leBlockAlign; // leNumChannels * leBitsPerSample/8 => bytes/sample, including all channels
+ uint16_t leBitsPerSample; // number of bits per sample (e.g., 8, 16, 32)
+ uint32_t leSubchunkDataSize; // leNumSamples * leNumChannels * leBitsPerSample/8 => data bytes
// computed
- unsigned int coNumSamples; // number of samples
- unsigned int coSubchunkDataStart; // byte at which the data chunk starts
- unsigned int coBytesPerSample; // leBitsPerSample / 8 => bytes/sample (single channel)
- unsigned int coMaxDataPosition; // coSubchunkDataStart + leSubchunkDataSize => max byte position of data
+ uint32_t coNumSamples; // number of samples
+ uint32_t coSubchunkDataStart; // byte at which the data chunk starts
+ uint32_t coBytesPerSample; // leBitsPerSample / 8 => bytes/sample (single channel)
+ uint32_t coMaxDataPosition; // coSubchunkDataStart + leSubchunkDataSize => max byte position of data
};
+// open a WAVE mono file and read header info
FILE *wave_open(const char *path, struct WAVE_INFO *audio_info);
+
+// close an open WAVE mono file
int wave_close(FILE *audio_file_ptr);
+
+// read samples from an open WAVE mono file
int wave_read_double(
FILE *audio_file_ptr,
struct WAVE_INFO *audio_info,
double *dest,
- const unsigned int from_sample,
- const unsigned int number_samples
+ const uint32_t from_sample,
+ const uint32_t number_samples
);
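To make the `WAVE_INFO` layout concrete, here is a hedged Python sketch of how its `le*` fields can be read and its `co*` fields derived, following the same steps as `wave_open`: check the `RIFF`/`WAVE` magic, locate the `fmt ` and `data` chunks in any order, reject non-PCM or non-mono files, and compute the number of samples. `parse_wave_header` and `_find_chunk` are hypothetical names, and the returned dictionary mirrors only a subset of the struct fields. Unlike `_seek_to_chunk` as shown above, the sketch also skips the pad byte that the RIFF format places after an odd-sized chunk.

```python
import struct

WAVE_FORMAT_PCM = 0x0001  # value from the enum above
WAVE_CHANNELS_MONO = 1    # mono, as checked by wave_open


def _find_chunk(raw, match, riff_end):
    """Return (offset_of_chunk_data, chunk_size) of the first chunk with ID match."""
    pos = 12  # skip "RIFF", leChunkSize and "WAVE"
    while pos + 8 <= riff_end:
        chunk_id = raw[pos:pos + 4]
        (size,) = struct.unpack_from("<I", raw, pos + 4)
        if chunk_id == match:
            return pos + 8, size
        # RIFF chunks are word-aligned: an odd-sized chunk is followed by a pad byte
        pos += 8 + size + (size & 1)
    raise ValueError("cannot locate %r chunk" % match)


def parse_wave_header(raw):
    """Parse a PCM mono WAVE header (bytes) into WAVE_INFO-like fields."""
    if raw[0:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    (chunk_size,) = struct.unpack_from("<I", raw, 4)
    riff_end = min(chunk_size + 8, len(raw))  # do not read past the real data
    fmt_start, fmt_size = _find_chunk(raw, b"fmt ", riff_end)
    if fmt_size < 16:
        raise ValueError("fmt chunk has length < 16")
    (audio_format, num_channels, sample_rate, byte_rate,
     block_align, bits_per_sample) = struct.unpack_from("<HHIIHH", raw, fmt_start)
    if audio_format != WAVE_FORMAT_PCM:
        raise ValueError("leAudioFormat is not PCM")
    if num_channels != WAVE_CHANNELS_MONO:
        raise ValueError("leNumChannels is not 1")
    data_start, data_size = _find_chunk(raw, b"data", riff_end)
    if data_size == 0:
        raise ValueError("data chunk has length zero")
    return {
        "leSampleRate": sample_rate,
        "leBitsPerSample": bits_per_sample,
        "leSubchunkDataSize": data_size,
        "coSubchunkDataStart": data_start,
        "coNumSamples": data_size // (num_channels * bits_per_sample // 8),
        "coBytesPerSample": bits_per_sample // 8,
        "coMaxDataPosition": data_start + data_size,
    }
```

Because the fixed-size fields are unpacked with an explicit little-endian format (`<`), the result is in host byte order regardless of the CPU, which is the same guarantee the `le_*_to_cpu` helpers from `cint.h` provide on the C side.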
diff --git a/aeneas/cwave/cwave_py.c b/aeneas/cwave/cwave_py.c
index 6d265bb2..06493adf 100644
--- a/aeneas/cwave/cwave_py.c
+++ b/aeneas/cwave/cwave_py.c
@@ -1,6 +1,6 @@
/*
-Python C Extension for reading WAVE files
+Python C Extension for reading WAVE mono files.
__author__ = "Alberto Pettarin"
__copyright__ = """
@@ -9,7 +9,7 @@ __copyright__ = """
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -31,10 +31,9 @@ __status__ = "Production"
static PyObject *get_audio_info(PyObject *self, PyObject *args) {
PyObject *tuple;
char *audio_file_path;
-
FILE *audio_file_ptr;
struct WAVE_INFO audio_info;
- unsigned int sample_rate, total_samples;
+ uint32_t sample_rate, total_samples; // a WAVE file cannot have more than 2^32 samples
// s = string
if (!PyArg_ParseTuple(args, "s", &audio_file_path)) {
@@ -42,8 +41,8 @@ static PyObject *get_audio_info(PyObject *self, PyObject *args) {
return NULL;
}
- memset(&audio_info, 0, sizeof(audio_info));
- if (!(audio_file_ptr = wave_open(audio_file_path, &audio_info))) {
+ audio_file_ptr = wave_open(audio_file_path, &audio_info);
+ if (audio_file_ptr == NULL) {
PyErr_SetString(PyExc_ValueError, "Error while opening the WAVE file");
return NULL;
}
@@ -63,12 +62,11 @@ static PyObject *read_audio_data(PyObject *self, PyObject *args) {
PyArrayObject *audio_data;
npy_intp audio_data_dimensions[1];
char *audio_file_path;
- unsigned int from_sample, num_samples;
-
FILE *audio_file_ptr;
struct WAVE_INFO audio_info;
- unsigned int sample_rate, total_samples;
- double *buffer;
+ uint32_t from_sample, num_samples, total_samples; // a WAVE file cannot have more than 2^32 samples
+ uint32_t sample_rate; // sample_rate is a uint32_t in the WAVE header
+ double *buffer; // this buffer will store the data read
// s = string
// I = unsigned int
@@ -77,8 +75,8 @@ static PyObject *read_audio_data(PyObject *self, PyObject *args) {
return NULL;
}
- memset(&audio_info, 0, sizeof(audio_info));
- if (!(audio_file_ptr = wave_open(audio_file_path, &audio_info))) {
+ audio_file_ptr = wave_open(audio_file_path, &audio_info);
+ if (audio_file_ptr == NULL) {
PyErr_SetString(PyExc_ValueError, "Error while opening the WAVE file");
return NULL;
}
@@ -93,7 +91,11 @@ static PyObject *read_audio_data(PyObject *self, PyObject *args) {
return NULL;
}
buffer = (double *)calloc(num_samples, sizeof(double));
- wave_read_double(audio_file_ptr, &audio_info, buffer, from_sample, num_samples);
+ if (wave_read_double(audio_file_ptr, &audio_info, buffer, from_sample, num_samples) != CWAVE_SUCCESS) {
+ wave_close(audio_file_ptr);
+ PyErr_SetString(PyExc_ValueError, "Error while reading WAVE data: unable to read data");
+ return NULL;
+ }
wave_close(audio_file_ptr);
// build the array to be returned
@@ -114,13 +116,19 @@ static PyMethodDef cwave_methods[] = {
"get_audio_info",
get_audio_info,
METH_VARARGS,
- "Get information about a WAVE file"
+ "Get information about a WAVE file\n"
+ ":param string audio_file_path: the file path of the audio file\n"
+ ":rtype: tuple (sample_rate, num_samples)"
},
{
"read_audio_data",
read_audio_data,
METH_VARARGS,
- "Get audio data from a WAVE file"
+ "Get audio data from a WAVE file\n"
+ ":param string audio_file_path: the file path of the audio file\n"
+ ":param uint from_sample: read from this sample index\n"
+ ":param uint num_samples: read this many samples\n"
+ ":rtype: tuple (sample_rate, data) where data is a NumPy 1D array of float64 values, one per sample"
},
{
NULL,
diff --git a/aeneas/cwave/cwave_setup.py b/aeneas/cwave/cwave_setup.py
index 0586b2e8..513a23ed 100644
--- a/aeneas/cwave/cwave_setup.py
+++ b/aeneas/cwave/cwave_setup.py
@@ -23,15 +23,15 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-CMODULE = Extension("cwave", sources=["cwave_py.c", "cwave_func.c"], include_dirs=[get_include()])
+CMODULE = Extension("cwave", sources=["cwave_py.c", "cwave_func.c", "cint.c"], include_dirs=[get_include()])
setup(
name="cwave",
- version="1.4.1",
+ version="1.5.0",
description="""
-Python C Extension for for reading WAVE files.
+Python C Extension for reading WAVE files.
""",
diff --git a/aeneas/diagnostics.py b/aeneas/diagnostics.py
index 1b659451..f2b571b6 100644
--- a/aeneas/diagnostics.py
+++ b/aeneas/diagnostics.py
@@ -2,12 +2,16 @@
# coding=utf-8
"""
-Check whether the setup of aeneas was successful.
+This module contains the following classes:
-Running the checks in this class makes sense only
-if you git-cloned the original GitHub repository
-and/or if you are interested in contributing to the
-development of aeneas.
+* :class:`~aeneas.diagnostics.Diagnostics`,
+ checking whether the setup of ``aeneas`` was successful.
+
+This module can be executed from the command line with::
+
+ python -m aeneas.diagnostics
+
+.. versionadded:: 1.4.1
"""
from __future__ import absolute_import
@@ -23,46 +27,20 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
SETUP_COMMAND = u"'python setup.py build_ext --inplace'"
-ANSI_ERROR = u"\033[91m"
-ANSI_OK = u"\033[92m"
-ANSI_WARNING = u"\033[93m"
-ANSI_END = u"\033[0m"
-
-def print_error(msg):
- if gf.is_posix():
- print(u"%s[ERRO] %s%s" % (ANSI_ERROR, msg, ANSI_END))
- else:
- print(u"[ERRO] %s" % (msg))
-
-def print_info(msg):
- print(u"[INFO] %s" % (msg))
-
-def print_success(msg):
- if gf.is_posix():
- print(u"%s[INFO] %s%s" % (ANSI_OK, msg, ANSI_END))
- else:
- print(u"[INFO] %s" % (msg))
-
-def print_warning(msg):
- if gf.is_posix():
- print(u"%s[WARN] %s%s" % (ANSI_WARNING, msg, ANSI_END))
- else:
- print(u"[WARN] %s" % (msg))
-
class Diagnostics(object):
"""
- Check whether the setup of aeneas was successful.
+ Check whether the setup of ``aeneas`` was successful.
"""
@classmethod
def check_shell_encoding(cls):
"""
- Check whether the shell (sys.stdin and sys.stdout) is UTF-8 encoded.
+ Check whether ``sys.stdin`` and ``sys.stdout`` are UTF-8 encoded.
Return ``True`` on failure and ``False`` on success.
@@ -75,25 +53,25 @@ def check_shell_encoding(cls):
if sys.stdout.encoding not in ["UTF-8", "UTF8"]:
is_out_utf8 = False
if (is_in_utf8) and (is_out_utf8):
- print_success(u"shell encoding OK")
+ gf.print_success(u"shell encoding OK")
else:
- print_warning(u"shell encoding WARNING")
+ gf.print_warning(u"shell encoding WARNING")
if not is_in_utf8:
- print_warning(u" The default input encoding of your shell is not UTF-8")
+ gf.print_warning(u" The default input encoding of your shell is not UTF-8")
if not is_out_utf8:
- print_warning(u" The default output encoding of your shell is not UTF-8")
- print_info(u" If you plan to use aeneas on the command line,")
+ gf.print_warning(u" The default output encoding of your shell is not UTF-8")
+ gf.print_info(u" If you plan to use aeneas on the command line,")
if gf.is_posix():
- print_info(u" you might want to 'export PYTHONIOENCODING=UTF-8' in your shell")
+ gf.print_info(u" you might want to 'export PYTHONIOENCODING=UTF-8' in your shell")
else:
- print_info(u" you might want to 'set PYTHONIOENCODING=UTF-8' in your shell")
- return True
+ gf.print_info(u" you might want to 'set PYTHONIOENCODING=UTF-8' in your shell")
+ return True
return False
@classmethod
def check_ffprobe(cls):
"""
- Check whether ffprobe can be called.
+ Check whether ``ffprobe`` can be called.
Return ``True`` on failure and ``False`` on success.
@@ -101,24 +79,23 @@ def check_ffprobe(cls):
"""
try:
from aeneas.ffprobewrapper import FFPROBEWrapper
- import aeneas.globalfunctions as gf
file_path = gf.absolute_path(u"tools/res/audio.mp3", __file__)
prober = FFPROBEWrapper()
properties = prober.read_properties(file_path)
- print_success(u"ffprobe OK")
+ gf.print_success(u"ffprobe OK")
return False
except:
pass
- print_error(u"ffprobe ERROR")
- print_info(u" Please make sure you have ffprobe installed correctly")
- print_info(u" (usually it is provided by the ffmpeg installer)")
- print_info(u" and that its path is in your PATH environment variable")
+ gf.print_error(u"ffprobe ERROR")
+ gf.print_info(u" Please make sure you have ffprobe installed correctly")
+ gf.print_info(u" (usually it is provided by the ffmpeg installer)")
+ gf.print_info(u" and that its path is in your PATH environment variable")
return True
@classmethod
def check_ffmpeg(cls):
"""
- Check whether ffmpeg can be called.
+ Check whether ``ffmpeg`` can be called.
Return ``True`` on failure and ``False`` on success.
@@ -126,26 +103,25 @@ def check_ffmpeg(cls):
"""
try:
from aeneas.ffmpegwrapper import FFMPEGWrapper
- import aeneas.globalfunctions as gf
input_file_path = gf.absolute_path(u"tools/res/audio.mp3", __file__)
handler, output_file_path = gf.tmp_file(suffix=u".wav")
converter = FFMPEGWrapper()
result = converter.convert(input_file_path, output_file_path)
gf.delete_file(handler, output_file_path)
if result:
- print_success(u"ffmpeg OK")
+ gf.print_success(u"ffmpeg OK")
return False
except:
pass
- print_error(u"ffmpeg ERROR")
- print_info(u" Please make sure you have ffmpeg installed correctly")
- print_info(u" and that its path is in your PATH environment variable")
+ gf.print_error(u"ffmpeg ERROR")
+ gf.print_info(u" Please make sure you have ffmpeg installed correctly")
+ gf.print_info(u" and that its path is in your PATH environment variable")
return True
@classmethod
def check_espeak(cls):
"""
- Check whether espeak can be called.
+ Check whether ``espeak`` can be called.
Return ``True`` on failure and ``False`` on success.
@@ -153,10 +129,8 @@ def check_espeak(cls):
"""
try:
from aeneas.espeakwrapper import ESPEAKWrapper
- from aeneas.language import Language
- import aeneas.globalfunctions as gf
text = u"From fairest creatures we desire increase,"
- language = Language.EN
+ language = u"eng"
handler, output_file_path = gf.tmp_file(suffix=u".wav")
espeak = ESPEAKWrapper()
result = espeak.synthesize_single(
@@ -166,21 +140,21 @@ def check_espeak(cls):
)
gf.delete_file(handler, output_file_path)
if result:
- print_success(u"espeak OK")
+ gf.print_success(u"espeak OK")
return False
except:
pass
- print_error(u"espeak ERROR")
- print_info(u" Please make sure you have espeak installed correctly")
- print_info(u" and that its path is in your PATH environment variable")
- print_info(u" You might also want to check that the espeak-data directory")
- print_info(u" is set up correctly, for example, it has the correct permissions")
+ gf.print_error(u"espeak ERROR")
+ gf.print_info(u" Please make sure you have espeak installed correctly")
+ gf.print_info(u" and that its path is in your PATH environment variable")
+ gf.print_info(u" You might also want to check that the espeak-data directory")
+ gf.print_info(u" is set up correctly, for example, it has the correct permissions")
return True
@classmethod
def check_tools(cls):
"""
- Check whether aeneas.tools.* can be imported.
+ Check whether ``aeneas.tools.*`` can be imported.
Return ``True`` on failure and ``False`` on success.
@@ -188,85 +162,87 @@ def check_tools(cls):
"""
try:
from aeneas.tools.convert_syncmap import ConvertSyncMapCLI
- from aeneas.tools.download import DownloadCLI
- from aeneas.tools.espeak_wrapper import ESPEAKWrapperCLI
+ # disabling this check, as it requires the optional dependency pafy
+ #from aeneas.tools.download import DownloadCLI
from aeneas.tools.execute_job import ExecuteJobCLI
from aeneas.tools.execute_task import ExecuteTaskCLI
from aeneas.tools.extract_mfcc import ExtractMFCCCLI
from aeneas.tools.ffmpeg_wrapper import FFMPEGWrapperCLI
from aeneas.tools.ffprobe_wrapper import FFPROBEWrapperCLI
+ # disabling this check, as it requires the optional dependency Pillow
+ #from aeneas.tools.plot_waveform import PlotWaveformCLI
from aeneas.tools.read_audio import ReadAudioCLI
from aeneas.tools.read_text import ReadTextCLI
from aeneas.tools.run_sd import RunSDCLI
from aeneas.tools.run_vad import RunVADCLI
from aeneas.tools.synthesize_text import SynthesizeTextCLI
from aeneas.tools.validate import ValidateCLI
- print_success(u"aeneas.tools OK")
+ gf.print_success(u"aeneas.tools OK")
return False
except:
pass
- print_error(u"aeneas.tools ERROR")
- print_info(u" Unable to import one or more aeneas.tools")
- print_info(u" Please check that you installed aeneas properly")
+ gf.print_error(u"aeneas.tools ERROR")
+ gf.print_info(u" Unable to import one or more aeneas.tools")
+ gf.print_info(u" Please check that you installed aeneas properly")
return True
@classmethod
def check_cdtw(cls):
"""
- Check whether Python C extension cdtw can be imported.
+ Check whether Python C extension ``cdtw`` can be imported.
Return ``True`` on failure and ``False`` on success.
:rtype: bool
"""
if gf.can_run_c_extension("cdtw"):
- print_success(u"aeneas.cdtw COMPILED")
+ gf.print_success(u"aeneas.cdtw COMPILED")
return False
- print_warning(u"aeneas.cdtw NOT COMPILED")
- print_info(u" You can still run aeneas but it will be significantly slower")
- print_info(u" To compile the cdtw module, run %s" % SETUP_COMMAND)
+ gf.print_warning(u"aeneas.cdtw NOT COMPILED")
+ gf.print_info(u" You can still run aeneas but it will be significantly slower")
+ gf.print_info(u" To compile the cdtw module, run %s" % SETUP_COMMAND)
return True
@classmethod
def check_cmfcc(cls):
"""
- Check whether Python C extension cmfcc can be imported.
+ Check whether Python C extension ``cmfcc`` can be imported.
Return ``True`` on failure and ``False`` on success.
:rtype: bool
"""
if gf.can_run_c_extension("cmfcc"):
- print_success(u"aeneas.cmfcc COMPILED")
+ gf.print_success(u"aeneas.cmfcc COMPILED")
return False
- print_warning(u"aeneas.cmfcc NOT COMPILED")
- print_info(u" You can still run aeneas but it will be significantly slower")
- print_info(u" To compile the cmfcc module, run %s" % SETUP_COMMAND)
+ gf.print_warning(u"aeneas.cmfcc NOT COMPILED")
+ gf.print_info(u" You can still run aeneas but it will be significantly slower")
+ gf.print_info(u" To compile the cmfcc module, run %s" % SETUP_COMMAND)
return True
@classmethod
def check_cew(cls):
"""
- Check whether Python C extension cew can be imported.
+ Check whether Python C extension ``cew`` can be imported.
Return ``True`` on failure and ``False`` on success.
For those OSes where ``cew`` is not available,
- print a warning but also return ``False`` (success).
+ print a warning and return ``False`` (success).
:rtype: bool
"""
if not gf.is_linux():
- print_warning(u"cew NOT AVAILABLE")
- print_info(u" The Python C Extension cew is not available for your OS")
- print_info(u" You can still run aeneas but it will be a bit slower (than Linux)")
+ gf.print_warning(u"aeneas.cew NOT AVAILABLE")
+ gf.print_info(u" The Python C Extension cew is not available for your OS")
+ gf.print_info(u" You can still run aeneas but it will be a bit slower (than Linux)")
return False
if gf.can_run_c_extension("cew"):
- print_success(u"aeneas.cew COMPILED")
+ gf.print_success(u"aeneas.cew COMPILED")
return False
- print_warning(u"aeneas.cew NOT COMPILED")
- print_info(u" You can still run aeneas but it will be a bit slower")
- print_info(u" To compile the cew module, run %s" % SETUP_COMMAND)
+ gf.print_warning(u"aeneas.cew NOT COMPILED")
+ gf.print_info(u" You can still run aeneas but it will be a bit slower")
+ gf.print_info(u" To compile the cew module, run %s" % SETUP_COMMAND)
return True
@classmethod
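The `check_cdtw`, `check_cmfcc` and `check_cew` methods above share one convention: each returns `True` on failure and `False` on success, printing a status line either way. Below is a minimal sketch of that convention. It assumes `importlib.import_module` as a stand-in for aeneas's own `gf.can_run_c_extension`, and the helper names `can_import` and `check_module` are hypothetical.

```python
import importlib


def can_import(module_name):
    """Return True if module_name imports cleanly.

    Hypothetical stand-in for aeneas.globalfunctions.can_run_c_extension.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def check_module(module_name):
    """Diagnostics-style check: return True on failure, False on success."""
    if can_import(module_name):
        print(u"%s AVAILABLE" % module_name)
        return False
    print(u"%s NOT AVAILABLE" % module_name)
    return True
```

The inverted return value (`True` means "something is wrong") lets callers accumulate failures with `or`, which is how `check_all` combines them.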
@@ -276,12 +252,9 @@ def check_all(cls, tools=True, encoding=True, c_ext=True):
Return a tuple of booleans ``(errors, warnings, c_ext_warnings)``.
- :param tools: if ``True``, check aeneas tools
- :type tools: bool
- :param encoding: if ``True``, check shell encoding
- :type encoding: bool
- :param c_ext: if ``True``, check Python C extensions
- :type c_ext: bool
+ :param bool tools: if ``True``, check aeneas tools
+ :param bool encoding: if ``True``, check shell encoding
+ :param bool c_ext: if ``True``, check Python C extensions
:rtype: (bool, bool, bool)
"""
# errors are fatal
@@ -293,20 +266,17 @@ def check_all(cls, tools=True, encoding=True, c_ext=True):
return (True, False, False)
if (tools) and (cls.check_tools()):
return (True, False, False)
-
# warnings are non-fatal
warnings = False
c_ext_warnings = False
-
if encoding:
warnings = cls.check_shell_encoding()
-
if c_ext:
# we do not want lazy evaluation
c_ext_warnings = cls.check_cdtw() or c_ext_warnings
c_ext_warnings = cls.check_cmfcc() or c_ext_warnings
c_ext_warnings = cls.check_cew() or c_ext_warnings
-
+ # return results
return (False, warnings, c_ext_warnings)
@@ -315,15 +285,11 @@ def main():
errors, warnings, c_ext_warnings = Diagnostics.check_all()
if errors:
sys.exit(1)
- #print_info(u"")
if c_ext_warnings:
- print_warning(u"All required dependencies are met but at least one available Python C extension is not compiled")
- #print_info(u"You can still run aeneas but it will be slower")
- #print_info(u"Enjoy running aeneas!")
+ gf.print_warning(u"All required dependencies are met but at least one available Python C extension is not compiled")
sys.exit(2)
else:
- print_success(u"All required dependencies are met and all available Python C extensions are compiled")
- #print_info(u"Enjoy running aeneas!")
+ gf.print_success(u"All required dependencies are met and all available Python C extensions are compiled")
sys.exit(0)
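`check_all` stops at the first fatal error but runs every non-fatal check, and `main` maps the result to exit status 1, 2 or 0. The following is a standalone sketch of that control flow, assuming plain callables in place of the real `check_*` class methods. `recording_check` is a hypothetical helper used only to show which checks actually run.

```python
def check_all(fatal_checks, warning_checks, c_ext_checks):
    """Aggregate checks into (errors, warnings, c_ext_warnings).

    Each check is a zero-argument callable returning True on failure,
    a hypothetical stand-in for check_ffmpeg, check_cdtw, and so on.
    """
    # errors are fatal: stop at the first failing check
    for check in fatal_checks:
        if check():
            return (True, False, False)
    # warnings are non-fatal: call each check *before* `or`,
    # so an earlier True never short-circuits a later check away
    warnings = False
    for check in warning_checks:
        warnings = check() or warnings
    c_ext_warnings = False
    for check in c_ext_checks:
        c_ext_warnings = check() or c_ext_warnings
    return (False, warnings, c_ext_warnings)


def exit_code(errors, c_ext_warnings):
    """Exit status chosen by main(): 1 on errors, 2 on C extension warnings, else 0."""
    if errors:
        return 1
    if c_ext_warnings:
        return 2
    return 0


def recording_check(name, failed, log):
    """Build a check that appends its own name to log whenever it runs."""
    def check():
        log.append(name)
        return failed
    return check
```

Writing `c_ext_warnings or check()` instead would skip the remaining checks, and their diagnostic messages, as soon as one of them had failed.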
diff --git a/aeneas/downloader.py b/aeneas/downloader.py
index b231248d..3436b46e 100644
--- a/aeneas/downloader.py
+++ b/aeneas/downloader.py
@@ -2,13 +2,17 @@
# coding=utf-8
"""
-Download files from various Web sources.
+This module contains the following classes:
+
* :class:`~aeneas.downloader.Downloader`, which downloads files from various Web sources.
+
+.. note:: This module requires Python modules ``youtube-dl`` and ``pafy`` (``pip install youtube-dl pafy``).
"""
from __future__ import absolute_import
from __future__ import print_function
-from aeneas.logger import Logger
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
import aeneas.globalfunctions as gf
@@ -19,31 +23,22 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class Downloader(object):
+class Downloader(Loggable):
"""
Download files from various Web sources.
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
+ :type logger: :class:`~aeneas.logger.Logger`
"""
TAG = u"Downloader"
- def __init__(self, rconf=None, logger=None):
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
def audio_from_youtube(
self,
source_url,
@@ -73,91 +68,80 @@ def audio_from_youtube(
Return the path of the downloaded file.
- :param source_url: the URL of the YouTube video
- :type source_url: string (url)
- :param download: if ``True``, download the audio stream
- best matching ``preferred_index`` or ``preferred_format``
- and ``largest_audio``;
- if ``False``, return the list of available audio streams
- :type download: bool
- :param output_file_path: the path where the downloaded audio should be saved;
- if ``None``, create a temporary file
- :type output_file_path: string (path)
- :param preferred_index: preferably download this audio stream
- :type preferred_index: int
- :param largest_audio: if ``True``, download the largest audio stream available;
- if ``False``, download the smallest one.
- :type largest_audio: bool
- :param preferred_format: preferably download this audio format
- :type preferred_format: string
-
- :rtype: string (path) or list of pafy audio streams
-
- :raise ImportError: if ``pafy`` is not installed
- :raise OSError: if ``output_file_path`` cannot be written
- :raise ValueError: if ``source_url`` is not a valid YouTube URL
+ :param string source_url: the URL of the YouTube video
+ :param bool download: if ``True``, download the audio stream
+ best matching ``preferred_index`` or ``preferred_format``
+ and ``largest_audio``;
+ if ``False``, return the list of available audio streams
+ :param string output_file_path: the path where the downloaded audio should be saved;
+ if ``None``, create a temporary file
+ :param int preferred_index: preferably download this audio stream
+ :param bool largest_audio: if ``True``, download the largest audio stream available;
+ if ``False``, download the smallest one.
+ :param string preferred_format: preferably download this audio format
+ :rtype: string or list of pafy audio streams
+ :raises: ImportError: if ``pafy`` is not installed
+ :raises: OSError: if ``output_file_path`` cannot be written
+ :raises: ValueError: if ``source_url`` is not a valid YouTube URL
"""
def select_audiostream(audiostreams):
""" Select the audiostream best matching the given parameters. """
if preferred_index is not None:
if preferred_index in range(len(audiostreams)):
- self._log([u"Selecting audiostream with index %d", preferred_index])
+ self.log([u"Selecting audiostream with index %d", preferred_index])
return audiostreams[preferred_index]
else:
- self._log([u"Audio stream index %d not allowed", preferred_index], Logger.WARNING)
- self._log(u"Ignoring the requested audio stream index", Logger.WARNING)
- # filter by preferred format
+ self.log_warn([u"Audio stream index '%d' not allowed", preferred_index])
+ self.log_warn(u"Ignoring the requested audio stream index")
+ # selecting by preferred format
streams = audiostreams
if preferred_format is not None:
- self._log([u"Filtering audiostreams by preferred format %s", preferred_format])
+ self.log([u"Selecting audiostreams by preferred format %s", preferred_format])
streams = [audiostream for audiostream in streams if audiostream.extension == preferred_format]
if len(streams) < 1:
- self._log([u"No audiostream with preferred format %s", preferred_format])
+ self.log([u"No audiostream with preferred format %s", preferred_format])
streams = audiostreams
# sort by size
streams = sorted([(audio.get_filesize(), audio) for audio in streams])
if largest_audio:
- self._log(u"Selecting largest audiostream")
+ self.log(u"Selecting largest audiostream")
selected = streams[-1][1]
else:
- self._log(u"Selecting smallest audiostream")
+ self.log(u"Selecting smallest audiostream")
selected = streams[0][1]
return selected
try:
import pafy
except ImportError as exc:
- self._log(u"pafy is not installed", Logger.CRITICAL)
- raise exc
+ self.log_exc(u"Python module pafy is not installed", exc, True, ImportError)
try:
video = pafy.new(source_url)
except (IOError, OSError, ValueError) as exc:
- self._log([u"The specified source URL '%s' is not a valid YouTube URL", source_url], Logger.CRITICAL)
- raise ValueError("The specified source URL is not a valid YouTube URL")
+ self.log_exc(u"The specified source URL '%s' is not a valid YouTube URL or you are offline" % (source_url), exc, True, ValueError)
if not download:
- self._log(u"Returning the list of audio streams")
+ self.log(u"Returning the list of audio streams")
return video.audiostreams
output_path = output_file_path
if output_file_path is None:
- self._log(u"output_path is None: creating temp file")
- handler, output_path = gf.tmp_file(root=self.rconf["tmp_path"])
+ self.log(u"output_path is None: creating temp file")
+ handler, output_path = gf.tmp_file(root=self.rconf[RuntimeConfiguration.TMP_PATH])
else:
if not gf.file_can_be_written(output_path):
- self._log([u"Path '%s' cannot be written (wrong permissions?)", output_path], Logger.CRITICAL)
- raise OSError("Path '%s' cannot be written (wrong permissions?)" % output_path)
+ self.log_exc(u"Path '%s' cannot be written. Wrong permissions?" % (output_path), None, True, OSError)
audiostream = select_audiostream(video.audiostreams)
if output_file_path is None:
gf.delete_file(handler, output_path)
output_path += "." + audiostream.extension
- self._log([u"output_path is '%s'", output_path])
- self._log(u"Downloading...")
+ self.log([u"output_path is '%s'", output_path])
+ self.log(u"Downloading...")
audiostream.download(filepath=output_path, quiet=True)
- self._log(u"Downloading... done")
+ self.log(u"Downloading... done")
return output_path
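The nested `select_audiostream` applies three rules in order. First, a valid `preferred_index` wins. Second, the candidates are filtered by `preferred_format`, falling back to all streams if none match. Third, the largest or smallest remaining stream is picked. The sketch below reproduces those rules using a hypothetical `FakeStream` tuple in place of pafy stream objects, modelling only `extension` and the file size. One deliberate difference is that it sorts with `key=` rather than on `(size, stream)` tuples. With tuples, two streams of equal size would be compared directly, which may raise `TypeError` on Python 3 if the stream type defines no ordering.

```python
from collections import namedtuple

# Hypothetical stand-in for a pafy audio stream: only the attributes
# that the selection logic relies on are modelled.
FakeStream = namedtuple("FakeStream", ["extension", "filesize"])


def select_audiostream(streams, preferred_index=None, preferred_format=None, largest_audio=True):
    # 1. a valid preferred index wins outright; an invalid one is ignored
    if preferred_index is not None and preferred_index in range(len(streams)):
        return streams[preferred_index]
    # 2. keep only streams in the preferred format, if any exist
    candidates = streams
    if preferred_format is not None:
        filtered = [s for s in streams if s.extension == preferred_format]
        if filtered:
            candidates = filtered
    # 3. pick by size; key= avoids comparing stream objects on ties
    ordered = sorted(candidates, key=lambda s: s.filesize)
    return ordered[-1] if largest_audio else ordered[0]
```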
diff --git a/aeneas/dtw.py b/aeneas/dtw.py
index 1f4a0bfd..afe168d9 100644
--- a/aeneas/dtw.py
+++ b/aeneas/dtw.py
@@ -7,24 +7,28 @@
to align two audio waves, represented by their
Mel-frequency cepstral coefficients (MFCCs).
-The two classes provided by this module are:
+This module contains the following classes:
-1. :class:`aeneas.dtw.DTWAlgorithm`
- is an enumeration of the available algorithms.
-2. :class:`aeneas.dtw.DTWAligner`
- is the actual feature extractor and aligner.
+* :class:`~aeneas.dtw.DTWAlgorithm`,
+ an enumeration of the available algorithms;
+* :class:`~aeneas.dtw.DTWAligner`,
+ the actual wave aligner;
+* :class:`~aeneas.dtw.DTWExact`,
+ a DTW aligner implementing the exact (full) DTW algorithm;
+* :class:`~aeneas.dtw.DTWStripe`,
+ a DTW aligner implementing the Sakoe-Chiba band heuristic.
To align two wave files:
-1. build an :class:`aeneas.dtw.DTWAligner` object
- passing the paths of the two wave files
- in the constructor, possibly with custom arguments
- to fine-tune the alignment;
-2. call ``compute_mfcc`` to extract the MFCCs of the two wave files;
-3. call ``compute_path`` to compute the min cost path between
- the MFCC representations of the two wave files;
-4. obtain the map between the two wave files by reading the
- ``computed_map`` property.
+1. build an :class:`~aeneas.dtw.DTWAligner` object,
+ passing in the constructor
+ the paths of the two wave files
+ or their MFCC representations;
+2. call :func:`~aeneas.dtw.DTWAligner.compute_path`
+ to compute the min cost path between
+ the MFCC representations of the two wave files.
+
+.. warning:: This module might be refactored in a future version
"""
from __future__ import absolute_import
@@ -32,8 +36,8 @@
from __future__ import print_function
import numpy
-from aeneas.audiofile import AudioFileMonoWAVE
-from aeneas.logger import Logger
+from aeneas.audiofilemfcc import AudioFileMFCC
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
import aeneas.globalfunctions as gf
@@ -44,7 +48,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -58,18 +62,19 @@ class DTWAlgorithm(object):
""" Classical (exact) DTW algorithm.
This implementation has ``O(nm)`` time and space complexity,
- where ``n`` (respectively, ``m``) is the number of MFCCs
+ where ``n`` (respectively, ``m``) is the number of MFCC window shifts (vectors)
of the real (respectively, synthesized) wave. """
STRIPE = "stripe"
""" DTW algorithm restricted to a stripe around the main diagonal
- (Sakoe-Chiba Band), for optimized memory usage and processing.
+ (Sakoe-Chiba Band), for reducing memory usage and run time.
Note that this is a heuristic approximation of the optimal (exact) path.
This implementation has ``O(nd)`` time and space complexity,
- where ``n`` is the number of MFCCs of the real wave,
- and ``d`` is the number of MFCCs
+ where ``n`` is the number of MFCC window shifts (vectors)
+ of the real wave,
+ and ``d`` is the number of MFCC window shifts
corresponding to the margin. """
ALLOWED_VALUES = [EXACT, STRIPE]
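The difference between `EXACT` and `STRIPE` is which cells of the accumulated cost matrix are filled. The following is a minimal NumPy sketch of both, under these assumptions:

* `band_ranges` copies the row-centring rule visible in `DTWStripe._compute_cost_matrix`.
* The cost is assumed to be a cosine distance, computed as 1 minus the normalized dot product.
* Steps are limited to down, right and diagonal.

All function names are hypothetical. For readability the sketch keeps the full `n x m` matrix and marks cells outside the band as `inf`. It therefore shows the band restriction, but not the `O(nd)` memory saving of storing only `delta` columns per row.

```python
import numpy


def cosine_cost(mfcc1, mfcc2):
    """(n, m) matrix of 1 - cosine similarity between frames (columns)."""
    dots = mfcc1.T.dot(mfcc2)
    norms = numpy.outer(numpy.linalg.norm(mfcc1, axis=0), numpy.linalg.norm(mfcc2, axis=0))
    return 1.0 - dots / norms


def band_ranges(n, m, delta):
    """Column range [start, end) kept for each row, centred on the diagonal."""
    delta = min(delta, m)
    ranges = []
    for i in range(n):
        center_j = (m * i) // n
        start = max(0, center_j - delta // 2)
        end = start + delta
        if end > m:
            end = m
            start = end - delta
        ranges.append((start, end))
    return ranges


def accumulated_cost(cost, delta=None):
    """Accumulated cost matrix: exact DTW if delta is None, banded otherwise."""
    n, m = cost.shape
    acc = numpy.full((n, m), numpy.inf)
    for i, (start, end) in enumerate(band_ranges(n, m, m if delta is None else delta)):
        for j in range(start, end):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    acc[i - 1, j] if i > 0 else numpy.inf,
                    acc[i, j - 1] if j > 0 else numpy.inf,
                    acc[i - 1, j - 1] if (i > 0 and j > 0) else numpy.inf,
                )
            acc[i, j] = cost[i, j] + best
    return acc
```

Because the band only removes candidate paths, the banded total cost can never be lower than the exact one. It is equal whenever the optimal path stays inside the stripe.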
@@ -77,246 +82,219 @@ class DTWAlgorithm(object):
-class DTWAligner(object):
+class DTWAlignerNotInitialized(Exception):
"""
- The MFCC extractor and wave aligner.
-
- :param real_wave_path: the path to the real wav file (must be mono!)
- :type real_wave_path: string (path)
- :param synt_wave_path: the path to the synthesized wav file (must be mono!)
- :type synt_wave_path: string (path)
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
- :param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
+ Error raised when trying to compute
+ using a DTWAligner object whose real and/or synt waves
+ are not initialized yet.
+ """
+ pass
- :raise ValueError: if ``real_wave_path`` or ``synt_wave_path`` is ``None``
- or it does not exist, or if ``algorithm`` is not an allowed value
+
+
+class DTWAligner(Loggable):
+ """
+ The audio wave aligner.
+
+ The two waves, henceforth named real and synthesized,
+ can be passed as :class:`~aeneas.audiofilemfcc.AudioFileMFCC` objects
+ or as file paths.
+ In the latter case, MFCCs will be extracted upon object creation.
+
+ :param real_wave_mfcc: the real audio file
+ :type real_wave_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC`
+ :param synt_wave_mfcc: the synthesized audio file
+ :type synt_wave_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC`
+ :param string real_wave_path: the path to the real audio file
+ :param string synt_wave_path: the path to the synthesized audio file
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param logger: the logger object
+ :type logger: :class:`~aeneas.logger.Logger`
+ :raises: ValueError: if ``real_wave_mfcc`` or ``synt_wave_mfcc`` is not ``None``
+ but not of type :class:`~aeneas.audiofilemfcc.AudioFileMFCC`
+ :raises: ValueError: if ``real_wave_path`` or ``synt_wave_path`` is not ``None``
+ but it cannot be read
"""
TAG = u"DTWAligner"
- def __init__(self, real_wave_path=None, synt_wave_path=None, rconf=None, logger=None):
+ def __init__(
+ self,
+ real_wave_mfcc=None,
+ synt_wave_mfcc=None,
+ real_wave_path=None,
+ synt_wave_path=None,
+ rconf=None,
+ logger=None
+ ):
+ if (real_wave_mfcc is not None) and (type(real_wave_mfcc) is not AudioFileMFCC):
+ raise ValueError(u"Real wave mfcc must be None or of type AudioFileMFCC")
+ if (synt_wave_mfcc is not None) and (type(synt_wave_mfcc) is not AudioFileMFCC):
+ raise ValueError(u"Synt wave mfcc must be None or of type AudioFileMFCC")
if (real_wave_path is not None) and (not gf.file_can_be_read(real_wave_path)):
- raise ValueError("Real wave cannot be read")
+ raise ValueError(u"Real wave cannot be read")
if (synt_wave_path is not None) and (not gf.file_can_be_read(synt_wave_path)):
- raise ValueError("Synt wave cannot be read")
- if (rconf is not None) and (rconf["dtw_algorithm"] not in DTWAlgorithm.ALLOWED_VALUES):
- raise ValueError("Algorithm value not allowed")
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
+ raise ValueError(u"Synt wave cannot be read")
+ if (rconf is not None) and (rconf[RuntimeConfiguration.DTW_ALGORITHM] not in DTWAlgorithm.ALLOWED_VALUES):
+ raise ValueError(u"Algorithm value not allowed")
+ super(DTWAligner, self).__init__(rconf=rconf, logger=logger)
+ self.real_wave_mfcc = real_wave_mfcc
+ self.synt_wave_mfcc = synt_wave_mfcc
self.real_wave_path = real_wave_path
self.synt_wave_path = synt_wave_path
- self.real_wave_full_mfcc = None
- self.synt_wave_full_mfcc = None
- self.real_wave_length = None
- self.synt_wave_length = None
- self.computed_path = None
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
- @property
- def real_wave_full_mfcc(self):
- """
- MFCCs of the real wave, including the 0-th.
-
- :rtype: numpy 2D array
-
- .. versionadded:: 1.1.0
- """
- return self.__real_wave_full_mfcc
-
- @real_wave_full_mfcc.setter
- def real_wave_full_mfcc(self, real_wave_full_mfcc):
- self.__real_wave_full_mfcc = real_wave_full_mfcc
+ if (self.real_wave_mfcc is None) and (self.real_wave_path is not None):
+ self.real_wave_mfcc = AudioFileMFCC(self.real_wave_path, rconf=self.rconf, logger=self.logger)
+ if (self.synt_wave_mfcc is None) and (self.synt_wave_path is not None):
+ self.synt_wave_mfcc = AudioFileMFCC(self.synt_wave_path, rconf=self.rconf, logger=self.logger)
- @property
- def real_wave_length(self):
- """
- The length, in seconds, of the real wave.
-
- :rtype: float
-
- .. versionadded:: 1.1.0
+ def compute_accumulated_cost_matrix(self):
"""
- return self.__real_wave_length
+ Compute the accumulated cost matrix, and return it.
- @real_wave_length.setter
- def real_wave_length(self, real_wave_length):
- self.__real_wave_length = real_wave_length
+ :rtype: :class:`numpy.ndarray` (2D)
+ :raises: RuntimeError: if neither the C extension nor
+ the pure Python code succeeded.
- @property
- def synt_wave_full_mfcc(self):
+ .. versionadded:: 1.2.0
"""
- MFCCs of the synthesized wave, including the 0-th.
-
- :rtype: numpy 2D array
+ dtw = self._setup_dtw()
+ self.log(u"Returning accumulated cost matrix")
+ return dtw.compute_accumulated_cost_matrix()
- .. versionadded:: 1.1.0
+ def compute_path(self):
"""
- return self.__synt_wave_full_mfcc
+ Compute the min cost path between the two waves, and return it.
- @synt_wave_full_mfcc.setter
- def synt_wave_full_mfcc(self, synt_wave_full_mfcc):
- self.__synt_wave_full_mfcc = synt_wave_full_mfcc
+ Return the computed path as a tuple with two elements,
+ each being a :class:`numpy.ndarray` (1D) of ``int`` indices: ::
- @property
- def synt_wave_length(self):
- """
- The length, in seconds, of the synthesized wave.
+ ([r_1, r_2, ..., r_k], [s_1, s_2, ..., s_k])
- :rtype: float
+ where ``r_i`` are the indices in the real wave
+ and ``s_i`` are the indices in the synthesized wave,
+ and ``k`` is the length of the min cost path.
- .. versionadded:: 1.1.0
+ :rtype: tuple (see above)
+ :raises: RuntimeError: if neither the C extension nor
+ the pure Python code succeeded.
"""
- return self.__synt_wave_length
-
- @synt_wave_length.setter
- def synt_wave_length(self, synt_wave_length):
- self.__synt_wave_length = synt_wave_length
-
- def compute_mfcc(self, real_wave=True, synt_wave=True):
- """
- Compute the MFCCs of the two waves,
- and store them internally.
-
- :param real_wave: if ``True``, extract MFCCs for the real wave
- :type real_wave: bool
- :param synt_wave: if ``True``, extract MFCCs for the synt wave
- :type synt_wave: bool
-
- :raise OSError: if the real or synt wave file cannot be read
+ dtw = self._setup_dtw()
+ self.log(u"Computing path...")
+ wave_path = dtw.compute_path()
+ self.log(u"Computing path... done")
+ self.log(u"Translating path to full wave indices...")
+ real_indices = numpy.array([t[0] for t in wave_path])
+ synt_indices = numpy.array([t[1] for t in wave_path])
+ # TODO this depends on whether we are masking or not
+ real_indices += self.real_wave_mfcc.head_length
+ self.log(u"Translating path to full wave indices... done")
+ return (real_indices, synt_indices)
+
+ def compute_boundaries(self, synt_anchors):
"""
- if real_wave:
- if not gf.file_can_be_read(self.real_wave_path):
- raise OSError("Real wave path is None or it cannot be read")
- self._log(u"Computing MFCCs for real wave...")
- wave = AudioFileMonoWAVE(self.real_wave_path, rconf=self.rconf, logger=self.logger)
- wave.extract_mfcc()
- self.real_wave_full_mfcc = wave.audio_mfcc
- self.real_wave_length = wave.audio_length
- self._log(u"Computing MFCCs for real wave... done")
-
- if synt_wave:
- if not gf.file_can_be_read(self.synt_wave_path):
- raise OSError("Synt wave path is None or it cannot be read")
- self._log(u"Computing MFCCs for synt wave...")
- wave = AudioFileMonoWAVE(self.synt_wave_path, rconf=self.rconf, logger=self.logger)
- wave.extract_mfcc()
- self.synt_wave_full_mfcc = wave.audio_mfcc
- self.synt_wave_length = wave.audio_length
- self._log(u"Computing MFCCs for synt wave... done")
+ Compute the min cost path between the two waves,
+ and return a list of boundary points,
+ representing the argmin values with respect to
+ the provided ``synt_anchors`` timings.
- def compute_accumulated_cost_matrix(self):
- """
- Compute the accumulated cost matrix,
- and return it.
+ If ``synt_anchors`` has ``k`` elements,
+ the returned array will have ``k+1`` elements,
+ accounting for the tail fragment.
- :rtype: numpy 2D array
+ :param synt_anchors: the anchor time values (in seconds) of the synthesized fragments,
+ each representing the begin time in the synthesized wave
+ of the corresponding fragment
+ :type synt_anchors: list of :class:`~aeneas.timevalue.TimeValue`
- :raise RuntimeError: if both the C extension and
- the pure Python code did not succeed.
+ Return the list of boundary indices.
- .. versionadded:: 1.2.0
+ :rtype: :class:`numpy.ndarray` (1D)
"""
- dtw = self._setup_dtw()
- self._log(u"Returning accumulated cost matrix")
- return dtw.compute_accumulated_cost_matrix()
+ self.log(u"Computing path...")
+ real_indices, synt_indices = self.compute_path()
+ self.log(u"Computing path... done")
+
+ self.log(u"Computing boundary indices...")
+ # both real_indices and synt_indices are w.r.t. the full wave
+ self.log([u"Fragments: %d", len(synt_anchors)])
+ self.log([u"Path length: %d", len(real_indices)])
+ # synt_anchors are in seconds: convert them to MFCC indices
+ mws = self.rconf.mws
+ anchor_indices = numpy.array([int(a[0] / mws) for a in synt_anchors])
+ # right side sets the split point at the very beginning of "next" fragment
+ begin_indices = numpy.searchsorted(synt_indices, anchor_indices, side="right")
+ # first split must occur at zero
+ begin_indices[0] = 0
+ # map onto real indices, obtaining "default" boundary indices
+ boundary_indices = numpy.append(real_indices[begin_indices], self.real_wave_mfcc.tail_begin)
+ self.log([u"Boundary indices: %d", len(boundary_indices)])
+ self.log(u"Computing boundary indices... done")
+ return boundary_indices
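The core of `compute_boundaries` is a single `numpy.searchsorted` over the synthesized side of the min cost path. The standalone sketch below keeps that step. Its assumptions differ slightly from the method above:

* `synt_anchors` is a plain list of floats, whereas the method indexes each anchor with `a[0]`.
* The window shift and tail start are passed in explicitly.

The function signature is hypothetical. `searchsorted` requires `synt_indices` to be sorted, which holds because a DTW path is monotone non-decreasing.

```python
import numpy


def compute_boundaries(real_indices, synt_indices, synt_anchors, mws, tail_begin):
    """Hypothetical standalone version of DTWAligner.compute_boundaries.

    real_indices, synt_indices: 1D index arrays describing the min cost path
    synt_anchors: fragment begin times (seconds) in the synthesized wave
    mws: MFCC window shift (seconds)
    tail_begin: real-wave index where the tail starts
    """
    anchor_indices = numpy.array([int(a / mws) for a in synt_anchors])
    # side="right": each split lands on the first path point whose
    # synt index is strictly greater than the anchor index
    begin_indices = numpy.searchsorted(synt_indices, anchor_indices, side="right")
    begin_indices[0] = 0  # the first fragment starts at the path origin
    # map path positions to real-wave indices, then close with the tail
    return numpy.append(real_indices[begin_indices], tail_begin)
```

For `k` anchors the result has `k+1` entries, matching the docstring above.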
- def compute_path(self):
+ def _setup_dtw(self):
"""
- Compute the min cost path between the two waves,
- and store it internally.
-
- :raise RuntimeError: if both the C extension and
- the pure Python code did not succeed.
+ Set the DTW object up.
"""
- dtw = self._setup_dtw()
- self._log(u"Computing path...")
- self.computed_path = dtw.compute_path()
- self._log(u"Computing path... done")
+ # check we have the AudioFileMFCC objects
+ if (self.real_wave_mfcc is None) or (self.real_wave_mfcc.middle_mfcc is None):
+ self.log_exc(u"The real wave MFCCs are not initialized", None, True, DTWAlignerNotInitialized)
+ if (self.synt_wave_mfcc is None) or (self.synt_wave_mfcc.middle_mfcc is None):
+ self.log_exc(u"The synt wave MFCCs are not initialized", None, True, DTWAlignerNotInitialized)
- def _setup_dtw(self):
- """ Setup DTW object """
# setup
- algorithm = self.rconf["dtw_algorithm"]
- delta = int(2 * self.rconf["dtw_margin"] / self.rconf["mfcc_win_shift"])
- mfcc2_length = self.synt_wave_full_mfcc.shape[1]
- self._log([u"Requested algorithm: '%s'", algorithm])
- self._log([u"delta = %d", delta])
- self._log([u"m = %d", mfcc2_length])
+ algorithm = self.rconf[RuntimeConfiguration.DTW_ALGORITHM]
+ delta = int(2 * self.rconf[RuntimeConfiguration.DTW_MARGIN] / self.rconf[RuntimeConfiguration.MFCC_WINDOW_SHIFT])
+ mfcc2_length = self.synt_wave_mfcc.middle_length
+ self.log([u"Requested algorithm: '%s'", algorithm])
+ self.log([u"delta = %d", delta])
+ self.log([u"m = %d", mfcc2_length])
# check if delta is >= length of synt wave
if mfcc2_length <= delta:
- self._log(u"We have mfcc2_length <= delta")
- if (self.rconf["c_ext"]) and (gf.can_run_c_extension()):
+ self.log(u"We have mfcc2_length <= delta")
+ if (self.rconf[RuntimeConfiguration.C_EXTENSIONS]) and (gf.can_run_c_extension()):
# the C code can be run: since it is still faster, do not run EXACT
- self._log(u"C extensions enabled and loaded: not selecting EXACT algorithm")
+ self.log(u"C extensions enabled and loaded: not selecting EXACT algorithm")
else:
- self._log(u"Selecting EXACT algorithm")
+ self.log(u"Selecting EXACT algorithm")
algorithm = DTWAlgorithm.EXACT
# execute the selected algorithm
if algorithm == DTWAlgorithm.EXACT:
- self._log(u"Computing with EXACT algo")
+ self.log(u"Computing with EXACT algo")
dtw = DTWExact(
- self.real_wave_full_mfcc,
- self.synt_wave_full_mfcc,
- self.logger
+ self.real_wave_mfcc.middle_mfcc,
+ self.synt_wave_mfcc.middle_mfcc,
+ rconf=self.rconf,
+ logger=self.logger
)
else:
- self._log(u"Computing with STRIPE algo")
+ self.log(u"Computing with STRIPE algo")
dtw = DTWStripe(
- self.real_wave_full_mfcc,
- self.synt_wave_full_mfcc,
+ self.real_wave_mfcc.middle_mfcc,
+ self.synt_wave_mfcc.middle_mfcc,
delta,
- self.logger
+ rconf=self.rconf,
+ logger=self.logger
)
return dtw
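`_setup_dtw` converts the margin into a stripe width in MFCC frames. It falls back to `EXACT` only when the synthesized wave is no longer than that width and the C extension cannot be used. Here is a hypothetical standalone version of just that decision. The `c_ext_usable` flag stands for "C extensions enabled in the runtime configuration and `gf.can_run_c_extension()` succeeds".

```python
EXACT = "exact"
STRIPE = "stripe"


def choose_algorithm(requested, dtw_margin, mfcc_window_shift, synt_length, c_ext_usable):
    """Return (algorithm, delta) following the selection rule in _setup_dtw."""
    # stripe width in MFCC frames: the margin extends on both sides
    delta = int(2 * dtw_margin / mfcc_window_shift)
    algorithm = requested
    # a stripe at least as wide as the synt wave covers the whole matrix,
    # so EXACT costs no more, unless the faster C code is available
    if synt_length <= delta and not c_ext_usable:
        algorithm = EXACT
    return algorithm, delta
```

For example, a 60 s margin with a 0.040 s window shift gives `delta = 2 * 60 / 0.040 = 3000` frames. These numbers are illustrative and are not claimed to be the aeneas defaults.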
- @property
- def computed_map(self):
- """
- Return the computed map between the two waves,
- as a list of lists, each being a pair of floats: ::
- [[r_1, s_1], [r_2, s_2], ..., [r_k, s_k]]
- where ``r_i`` are the time instants in the real wave
- and ``s_i`` are the time instants in the synthesized wave,
- and ``k = n + m`` (or ``k = n + d``)
- is the length of the min cost path.
- :rtype: list of pairs of floats (see above)
- """
- result = []
- for i in range(len(self.computed_path)):
- real_time = self.computed_path[i][0] * self.rconf["mfcc_win_shift"]
- synt_time = self.computed_path[i][1] * self.rconf["mfcc_win_shift"]
- result.append([real_time, synt_time])
- return result
-
-
-
-class DTWStripe(object):
+class DTWStripe(Loggable):
TAG = u"DTWStripe"
- def __init__(self, m1, m2, delta, logger=None):
+ def __init__(self, m1, m2, delta, rconf=None, logger=None):
+ super(DTWStripe, self).__init__(rconf=rconf, logger=logger)
self.m1 = m1
self.m2 = m2
self.delta = delta
- self.logger = logger or Logger()
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
def compute_accumulated_cost_matrix(self):
return gf.run_c_extension_with_fallback(
- self._log,
+ self.log,
"cdtw",
self._compute_acm_c_extension,
self._compute_acm_pure_python,
@@ -325,47 +303,43 @@ def compute_accumulated_cost_matrix(self):
)
def _compute_acm_c_extension(self):
- self._log(u"Computing acm using C extension...")
+ self.log(u"Computing acm using C extension...")
try:
- self._log(u"Importing cdtw...")
+ self.log(u"Importing cdtw...")
import aeneas.cdtw.cdtw
- self._log(u"Importing cdtw... done")
+ self.log(u"Importing cdtw... done")
# discard first MFCC component
mfcc1 = self.m1[1:, :]
mfcc2 = self.m2[1:, :]
n = mfcc1.shape[1]
m = mfcc2.shape[1]
delta = self.delta
- self._log([u"n m delta: %d %d %d", n, m, delta])
+ self.log([u"n m delta: %d %d %d", n, m, delta])
if delta > m:
- self._log(u"Limiting delta to m")
+ self.log(u"Limiting delta to m")
delta = m
cost_matrix, centers = aeneas.cdtw.cdtw.compute_cost_matrix_step(mfcc1, mfcc2, delta)
accumulated_cost_matrix = aeneas.cdtw.cdtw.compute_accumulated_cost_matrix_step(cost_matrix, centers)
- self._log(u"Computing acm using C extension... done")
+ self.log(u"Computing acm using C extension... done")
return (True, accumulated_cost_matrix)
except Exception as exc:
- self._log(u"Computing acm using C extension... failed")
- self._log(u"An unexpected exception occurred while running cdtw:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
+ self.log_exc(u"An unexpected error occurred while running cdtw", exc, False, None)
return (False, None)
def _compute_acm_pure_python(self):
- self._log(u"Computing acm using pure Python code...")
+ self.log(u"Computing acm using pure Python code...")
try:
cost_matrix, centers = self._compute_cost_matrix()
accumulated_cost_matrix = self._compute_accumulated_cost_matrix(cost_matrix, centers)
- self._log(u"Computing acm using pure Python code... done")
+ self.log(u"Computing acm using pure Python code... done")
return (True, accumulated_cost_matrix)
except Exception as exc:
- self._log(u"Computing acm using pure Python code... failed")
- self._log(u"An unexpected exception occurred while running pure Python code:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
+ self.log_exc(u"An unexpected error occurred while running pure Python code", exc, False, None)
return (False, None)
def compute_path(self):
return gf.run_c_extension_with_fallback(
- self._log,
+ self.log,
"cdtw",
self._compute_path_c_extension,
self._compute_path_pure_python,
@@ -374,50 +348,46 @@ def compute_path(self):
)
def _compute_path_c_extension(self):
- self._log(u"Computing path using C extension...")
+ self.log(u"Computing path using C extension...")
try:
- self._log(u"Importing cdtw...")
+ self.log(u"Importing cdtw...")
import aeneas.cdtw.cdtw
- self._log(u"Importing cdtw... done")
+ self.log(u"Importing cdtw... done")
# discard first MFCC component
mfcc1 = self.m1[1:, :]
mfcc2 = self.m2[1:, :]
n = mfcc1.shape[1]
m = mfcc2.shape[1]
delta = self.delta
- self._log([u"n m delta: %d %d %d", n, m, delta])
+ self.log([u"n m delta: %d %d %d", n, m, delta])
if delta > m:
- self._log(u"Limiting delta to m")
+ self.log(u"Limiting delta to m")
delta = m
best_path = aeneas.cdtw.cdtw.compute_best_path(
mfcc1,
mfcc2,
delta
)
- self._log(u"Computing path using C extension... done")
+ self.log(u"Computing path using C extension... done")
return (True, best_path)
except Exception as exc:
- self._log(u"Computing path using C extension... failed")
- self._log(u"An unexpected exception occurred while running cdtw:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
+ self.log_exc(u"An unexpected error occurred while running cdtw", exc, False, None)
return (False, None)
def _compute_path_pure_python(self):
- self._log(u"Computing path using pure Python code...")
+ self.log(u"Computing path using pure Python code...")
try:
cost_matrix, centers = self._compute_cost_matrix()
accumulated_cost_matrix = self._compute_accumulated_cost_matrix(cost_matrix, centers)
best_path = self._compute_best_path(accumulated_cost_matrix, centers)
- self._log(u"Computing path using pure Python code... done")
+ self.log(u"Computing path using pure Python code... done")
return (True, best_path)
except Exception as exc:
- self._log(u"Computing path using pure Python code... failed")
- self._log(u"An unexpected exception occurred while running cdtw:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
+ self.log_exc(u"An unexpected error occurred while running pure Python code", exc, False, None)
return (False, None)
def _compute_cost_matrix(self):
- self._log(u"Computing cost matrix...")
+ self.log(u"Computing cost matrix...")
# discard first MFCC component
mfcc1 = self.m1[1:, :]
mfcc2 = self.m2[1:, :]
@@ -426,28 +396,28 @@ def _compute_cost_matrix(self):
n = mfcc1.shape[1]
m = mfcc2.shape[1]
delta = self.delta
- self._log([u"n m delta: %d %d %d", n, m, delta])
+ self.log([u"n m delta: %d %d %d", n, m, delta])
if delta > m:
- self._log(u"Limiting delta to m")
+ self.log(u"Limiting delta to m")
delta = m
cost_matrix = numpy.zeros((n, delta))
centers = numpy.zeros(n)
for i in range(n):
# center j at row i
center_j = (m * i) // n
- #self._log([u"Center at row %d is %d", i, center_j])
+ #self.log([u"Center at row %d is %d", i, center_j])
range_start = max(0, center_j - (delta // 2))
range_end = range_start + delta
if range_end > m:
range_end = m
range_start = range_end - delta
centers[i] = range_start
- #self._log([u"Range at row %d is %d %d", i, range_start, range_end])
+ #self.log([u"Range at row %d is %d %d", i, range_start, range_end])
for j in range(range_start, range_end):
tmp = mfcc1[:, i].transpose().dot(mfcc2[:, j])
tmp /= norm2_1[i] * norm2_2[j]
cost_matrix[i][j - range_start] = 1 - tmp
- self._log(u"Computing cost matrix... done")
+ self.log(u"Computing cost matrix... done")
return (cost_matrix, centers)
def _compute_accumulated_cost_matrix(self, cost_matrix, centers):
@@ -458,9 +428,9 @@ def _compute_accumulated_cost_matrix(self, cost_matrix, centers):
return self._compute_acm_in_place(cost_matrix, centers)
def _compute_acm_in_place(self, cost_matrix, centers):
- self._log(u"Computing the acm with the in-place algorithm...")
+ self.log(u"Computing the acm with the in-place algorithm...")
n, delta = cost_matrix.shape
- self._log([u"n delta: %d %d", n, delta])
+ self.log([u"n delta: %d %d", n, delta])
current_row = numpy.copy(cost_matrix[0, :])
#cost_matrix[0][0] = current_row[0]
for j in range(1, delta):
@@ -480,15 +450,15 @@ def _compute_acm_in_place(self, cost_matrix, centers):
if ((j+offset-1) < delta) and ((j+offset-1) >= 0):
cost2 = cost_matrix[i-1][j+offset-1]
cost_matrix[i][j] = current_row[j] + min(cost0, cost1, cost2)
- self._log(u"Computing the acm with the in-place algorithm... done")
+ self.log(u"Computing the acm with the in-place algorithm... done")
return cost_matrix
# DISABLED
#def _compute_acm_not_in_place(self, cost_matrix, centers):
- # self._log(u"Computing the acm with the not-in-place algorithm...")
+ # self.log(u"Computing the acm with the not-in-place algorithm...")
# acc_matrix = numpy.zeros(cost_matrix.shape)
# n, delta = acc_matrix.shape
- # self._log([u"n delta: %d %d", n, delta])
+ # self.log([u"n delta: %d %d", n, delta])
# # first row
# acc_matrix[0][0] = cost_matrix[0][0]
# for j in range(1, delta):
@@ -507,14 +477,14 @@ def _compute_acm_in_place(self, cost_matrix, centers):
# if ((j+offset-1) < delta) and ((j+offset-1) >= 0):
# cost2 = acc_matrix[i-1][j+offset-1]
# acc_matrix[i][j] = cost_matrix[i][j] + min(cost0, cost1, cost2)
- # self._log(u"Computing the acm with the not-in-place algorithm... done")
+ # self.log(u"Computing the acm with the not-in-place algorithm... done")
# return acc_matrix
def _compute_best_path(self, acc_matrix, centers):
- self._log(u"Computing best path...")
+ self.log(u"Computing best path...")
# get dimensions
n, delta = acc_matrix.shape
- self._log([u"n delta: %d %d", n, delta])
+ self.log([u"n delta: %d %d", n, delta])
i = n - 1
j = delta - 1 + centers[i]
path = [(i, j)]
@@ -549,61 +519,57 @@ def _compute_best_path(self, acc_matrix, centers):
(i-1, j-1)
]
min_cost = numpy.argmin(costs)
- #self._log([u"Selected min cost move %d", min_cost])
+ #self.log([u"Selected min cost move %d", min_cost])
min_move = moves[min_cost]
path.append(min_move)
i, j = min_move
# reverse path and return
path.reverse()
- self._log(u"Computing best path... done")
+ self.log(u"Computing best path... done")
return path
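The stripe variant scores only a band of ``delta`` frames per row, centred on the column proportional to that row. The cost matrix is therefore n x delta instead of n x m. The sketch below shows how the per-row window and the cosine costs inside it could be computed. It is illustrative only. The function name ``stripe_cost_matrix`` is hypothetical, the inner loop over ``j`` is vectorized, and ``centers`` is stored as integers, whereas the code above uses a float array.

```python
import numpy


def stripe_cost_matrix(m1, m2, delta):
    """
    Hypothetical sketch: cosine-distance costs restricted to a stripe of width delta.

    m1: (k, n) MFCC matrix, m2: (k, m) MFCC matrix (coefficients x frames).
    Returns (cost, centers), where cost has shape (n, delta) and
    cost[i, c] is the distance between frame i of m1 and frame centers[i] + c of m2.
    """
    # discard the first MFCC component, as the code above does
    mfcc1 = m1[1:, :]
    mfcc2 = m2[1:, :]
    n = mfcc1.shape[1]
    m = mfcc2.shape[1]
    delta = min(delta, m)  # same effect as "Limiting delta to m"
    norm1 = numpy.sqrt(numpy.sum(mfcc1 ** 2, 0))
    norm2 = numpy.sqrt(numpy.sum(mfcc2 ** 2, 0))
    cost = numpy.zeros((n, delta))
    centers = numpy.zeros(n, dtype=int)
    for i in range(n):
        # center the window on the column proportional to row i
        center_j = (m * i) // n
        start = max(0, center_j - (delta // 2))
        end = start + delta
        if end > m:
            # shift the window left so that it stays inside the matrix
            end = m
            start = end - delta
        centers[i] = start
        # cosine similarity of frame i against frames start..end-1
        sims = mfcc1[:, i].dot(mfcc2[:, start:end]) / (norm1[i] * norm2[start:end])
        cost[i, :] = 1 - sims
    return cost, centers
```

With this layout, each row stores ``delta`` values, and ``centers[i]`` is needed to map a stripe column back to a real frame index.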
-class DTWExact(object):
+class DTWExact(Loggable):
TAG = u"DTWExact"
- def __init__(self, m1, m2, logger=None):
+ def __init__(self, m1, m2, rconf=None, logger=None):
+ super(DTWExact, self).__init__(rconf=rconf, logger=logger)
self.m1 = m1
self.m2 = m2
- self.logger = logger or Logger()
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
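The ``_log`` helper removed here is replaced by a ``log`` method inherited from ``Loggable``. The base class is initialized through ``super(...).__init__(rconf=rconf, logger=logger)``. The sketch below is a hypothetical stand-in that only shows the shape of this pattern. It is not ``aeneas.logger.Loggable``. The in-memory ``messages`` list and the plain-callable ``logger`` are assumptions made to keep the example self-contained. It does accept the ``[format, arg1, ...]`` list messages used throughout this diff.

```python
class Loggable(object):
    """Hypothetical minimal base class: stores rconf and logger, exposes log()."""

    TAG = u"Loggable"

    def __init__(self, rconf=None, logger=None):
        self.rconf = rconf if rconf is not None else {}
        self.logger = logger
        self.messages = []  # in-memory sink used when no logger is given

    def log(self, message, severity=u"DEBUG"):
        # messages may be given as [format_string, arg1, arg2, ...]
        if isinstance(message, list):
            message = message[0] % tuple(message[1:])
        entry = (self.TAG, severity, message)
        if self.logger is None:
            self.messages.append(entry)
        else:
            self.logger(entry)  # assumption: logger is a plain callable


class DTWExactSketch(Loggable):
    """Hypothetical subclass mirroring the new DTWExact constructor."""

    TAG = u"DTWExactSketch"

    def __init__(self, m1, m2, rconf=None, logger=None):
        super(DTWExactSketch, self).__init__(rconf=rconf, logger=logger)
        self.m1 = m1
        self.m2 = m2
```

With this pattern, each subclass only sets its ``TAG``, and the per-class ``_log`` boilerplate disappears.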
def compute_accumulated_cost_matrix(self):
- self._log(u"Computing acm using pure Python code...")
+ self.log(u"Computing acm using pure Python code...")
cost_matrix = self._compute_cost_matrix()
accumulated_cost_matrix = self._compute_accumulated_cost_matrix(cost_matrix)
- self._log(u"Computing acm using pure Python code... done")
+ self.log(u"Computing acm using pure Python code... done")
return accumulated_cost_matrix
def compute_path(self):
- self._log(u"Computing path using pure Python code...")
+ self.log(u"Computing path using pure Python code...")
accumulated_cost_matrix = self.compute_accumulated_cost_matrix()
best_path = self._compute_best_path(accumulated_cost_matrix)
- self._log(u"Computing path using pure Python code... done")
+ self.log(u"Computing path using pure Python code... done")
return best_path
def _compute_cost_matrix(self):
- self._log(u"Computing cost matrix...")
+ self.log(u"Computing cost matrix...")
# discard first MFCC component
mfcc1 = self.m1[1:, :]
mfcc2 = self.m2[1:, :]
norm2_1 = numpy.sqrt(numpy.sum(mfcc1 ** 2, 0))
norm2_2 = numpy.sqrt(numpy.sum(mfcc2 ** 2, 0))
# compute dot product
- self._log(u"Computing matrix with transpose+dot...")
+ self.log(u"Computing matrix with transpose+dot...")
cost_matrix = mfcc1.transpose().dot(mfcc2)
- self._log(u"Computing matrix with transpose+dot... done")
+ self.log(u"Computing matrix with transpose+dot... done")
# normalize
- self._log(u"Normalizing matrix...")
+ self.log(u"Normalizing matrix...")
norm_matrix = numpy.outer(norm2_1, norm2_2)
cost_matrix = 1 - (cost_matrix / norm_matrix)
- self._log(u"Normalizing matrix... done")
- self._log(u"Computing cost matrix... done")
+ self.log(u"Normalizing matrix... done")
+ self.log(u"Computing cost matrix... done")
return cost_matrix
def _compute_accumulated_cost_matrix(self, cost_matrix):
@@ -614,9 +580,9 @@ def _compute_accumulated_cost_matrix(self, cost_matrix):
return self._compute_acm_in_place(cost_matrix)
def _compute_acm_in_place(self, cost_matrix):
- self._log(u"Computing the acm with the in-place algorithm...")
+ self.log(u"Computing the acm with the in-place algorithm...")
n, m = cost_matrix.shape
- self._log([u"n m: %d %d", n, m])
+ self.log([u"n m: %d %d", n, m])
current_row = numpy.copy(cost_matrix[0, :])
#cost_matrix[0][0] = current_row[0]
for j in range(1, m):
@@ -630,15 +596,15 @@ def _compute_acm_in_place(self, cost_matrix):
cost_matrix[i][j-1],
cost_matrix[i-1][j-1]
)
- self._log(u"Computing the acm with the in-place algorithm... done")
+ self.log(u"Computing the acm with the in-place algorithm... done")
return cost_matrix
# DISABLED
#def _compute_acm_not_in_place(self, cost_matrix):
- # self._log(u"Computing the acm with the not-in-place algorithm...")
+ # self.log(u"Computing the acm with the not-in-place algorithm...")
# acc_matrix = numpy.zeros(cost_matrix.shape)
# n, m = acc_matrix.shape
- # self._log([u"n m: %d %d", n, m])
+ # self.log([u"n m: %d %d", n, m])
# acc_matrix[0][0] = cost_matrix[0][0]
# for j in range(1, m):
# acc_matrix[0][j] = acc_matrix[0][j-1] + cost_matrix[0][j]
@@ -651,14 +617,14 @@ def _compute_acm_in_place(self, cost_matrix):
# acc_matrix[i][j-1],
# acc_matrix[i-1][j-1]
# )
- # self._log(u"Computing the acm with the not-in-place algorithm... done")
+ # self.log(u"Computing the acm with the not-in-place algorithm... done")
# return acc_matrix
def _compute_best_path(self, acc_matrix):
- self._log(u"Computing best path...")
+ self.log(u"Computing best path...")
# get dimensions
n, m = acc_matrix.shape
- self._log([u"n m: %d %d", n, m])
+ self.log([u"n m: %d %d", n, m])
i = n - 1
j = m - 1
path = [(i, j)]
@@ -682,13 +648,13 @@ def _compute_best_path(self, acc_matrix):
(i-1, j-1)
]
min_cost = numpy.argmin(costs)
- #self._log([u"Selected min cost move %d", min_cost])
+ #self.log([u"Selected min cost move %d", min_cost])
min_move = moves[min_cost]
path.append(min_move)
i, j = min_move
# reverse path and return
path.reverse()
- self._log(u"Computing best path... done")
+ self.log(u"Computing best path... done")
return path
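The exact variant performs three steps:

1. It computes the cosine cost matrix over all frame pairs.
2. It builds the accumulated cost matrix.
3. It backtracks from the bottom-right cell to (0, 0).

The sketch below compresses these steps into one function. It is an illustrative re-implementation with a hypothetical name, not the aeneas code. It fills a separate accumulated matrix instead of working in place, and ``numpy.argmin`` breaks ties in favour of the first of (i-1, j), (i, j-1), (i-1, j-1).

```python
import numpy


def exact_dtw_path(mfcc1, mfcc2):
    """
    Hypothetical sketch of full (unbanded) DTW on two MFCC matrices
    (coefficients x frames). Returns (acc, path).
    """
    # discard the first MFCC component, then compute cosine distances
    a = mfcc1[1:, :]
    b = mfcc2[1:, :]
    norm_a = numpy.sqrt(numpy.sum(a ** 2, 0))
    norm_b = numpy.sqrt(numpy.sum(b ** 2, 0))
    cost = 1 - (a.T.dot(b) / numpy.outer(norm_a, norm_b))
    n, m = cost.shape

    # accumulated cost matrix, kept separate so that cost is not modified
    acc = numpy.zeros((n, m))
    acc[0, 0] = cost[0, 0]
    for j in range(1, m):
        acc[0, j] = acc[0, j - 1] + cost[0, j]
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, m):
            acc[i, j] = cost[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])

    # backtrack from the bottom-right cell to (0, 0)
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i > 0) or (j > 0):
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        else:
            moves = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
            costs = [acc[mv] for mv in moves]
            i, j = moves[int(numpy.argmin(costs))]
        path.append((i, j))
    path.reverse()
    return acc, path
```

This version uses O(n * m) memory. The stripe variant avoids that cost by restricting ``j`` to a window around each row's centre.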
diff --git a/aeneas/espeakwrapper.py b/aeneas/espeakwrapper.py
index 641d5a30..38c423fa 100644
--- a/aeneas/espeakwrapper.py
+++ b/aeneas/espeakwrapper.py
@@ -2,18 +2,18 @@
# coding=utf-8
"""
-Wrapper around ``espeak`` to synthesize text into a ``wav`` audio file.
+This module contains the following classes:
+
+* :class:`~aeneas.espeakwrapper.ESPEAKWrapper`, a wrapper for the ``eSpeak`` TTS engine.
"""
from __future__ import absolute_import
from __future__ import print_function
-import subprocess
-from aeneas.audiofile import AudioFileMonoWAVE
-from aeneas.audiofile import AudioFileUnsupportedFormatError
from aeneas.language import Language
-from aeneas.logger import Logger
from aeneas.runtimeconfiguration import RuntimeConfiguration
+from aeneas.timevalue import TimeValue
+from aeneas.ttswrapper import TTSWrapper
import aeneas.globalfunctions as gf
__author__ = "Alberto Pettarin"
@@ -23,126 +23,631 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class ESPEAKWrapper(object):
+class ESPEAKWrapper(TTSWrapper):
"""
- Wrapper around ``espeak`` to synthesize text into a ``wav`` audio file.
+ A wrapper for the ``espeak`` TTS engine.
+
+ This wrapper is the default TTS engine for ``aeneas``.
+
+ This wrapper supports calling the TTS engine
+ via ``subprocess`` or via a Python C extension.
+
+ In abstract terms, it performs one or more calls like ::
+
+ $ espeak -v voice_code -w /tmp/output_file.wav < text
+
+ To specify the path of the TTS executable, use ::
- It will perform one or more calls like ::
+ "tts=espeak|tts_path=/path/to/espeak"
- $ espeak -v language_code -w /tmp/output_file.wav < text
+ in the ``rconf`` object.
- In case of multiple text fragments, the resulting wav files
- will be joined together.
+ To run the ``cew`` Python C extension
+ in a separate process via
+ :class:`~aeneas.cewsubprocess.CEWSubprocess`, use ::
+ "cew_subprocess_enabled=True|cew_subprocess_path=/path/to/python"
+
+ in the ``rconf`` object.
+
+ See :class:`~aeneas.ttswrapper.TTSWrapper` for the available functions.
+ The languages supported by this wrapper are listed below.
+
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
+ :type logger: :class:`~aeneas.logger.Logger`
"""
- TAG = u"ESPEAKWrapper"
+ AFR = Language.AFR
+ """ Afrikaans (not tested) """
- def __init__(self, rconf=None, logger=None):
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
+ ARG = Language.ARG
+ """ Aragonese (not tested) """
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
+ BOS = Language.BOS
+ """ Bosnian (not tested) """
- def _replace_language(self, language):
- """
- Mock support for a given language by
- synthesizing using a similar language.
+ BUL = Language.BUL
+ """ Bulgarian """
- :param language: the requested language
- :type language: :class:`aeneas.language.Language` enum
- :rtype: :class:`aeneas.language.Language` enum
- """
- if language == Language.UK:
- self._log([u"Replaced '%s' with '%s'", Language.UK, Language.RU])
- return Language.RU
- return language
-
- def synthesize_multiple(
- self,
- text_file,
- output_file_path,
- quit_after=None,
- backwards=False
- ):
- """
- Synthesize the text contained in the given fragment list
- into a ``wav`` file.
-
- :param text_file: the text file to be synthesized
- :type text_file: :class:`aeneas.textfile.TextFile`
- :param output_file_path: the path to the output audio file
- :type output_file_path: string (path)
- :param quit_after: stop synthesizing as soon as
- reaching this many seconds
- :type quit_after: float
- :param backwards: synthesizing from the end of the text file
- :type backwards: bool
- :rtype: tuple (anchors, total_time, num_chars)
-
- :raise TypeError: if ``text_file`` is ``None`` or
- one of the text fragments is not a ``unicode`` object
- :raise ValueError: if ``rconf["allow_unlisted_languages"]`` is ``False`` and
- a fragment has its language code not listed in
- :class:`aeneas.language.Language`
- :raise OSError: if output file cannot be written to ``output_file_path``
- :raise RuntimeError: if both the C extension and
- the pure Python code did not succeed.
- """
- # check that text_file is not None
- if text_file is None:
- self._log(u"text_file is None", Logger.CRITICAL)
- raise TypeError("text_file is None")
-
- # check that the lines in the text file all have
- # a supported language code and unicode type
- if not self.rconf["allow_unlisted_languages"]:
- for fragment in text_file.fragments:
- if fragment.language not in Language.ALLOWED_VALUES:
- self._log([u"Language '%s' is not allowed", fragment.language], Logger.CRITICAL)
- raise ValueError("Language code not allowed")
- for fragment in text_file.fragments:
- for line in fragment.lines:
- if not gf.is_unicode(line):
- self._log(u"Text file must contain only unicode strings", Logger.CRITICAL)
- raise TypeError("Text file must contain only unicode strings")
-
- # log parameters
- if quit_after is not None:
- self._log([u"Quit after reaching %.3f", quit_after])
- if backwards:
- self._log(u"Synthesizing backwards")
-
- # check that output_file_path can be written
- if not gf.file_can_be_written(output_file_path):
- self._log([u"Cannot write output file to '%s'", output_file_path], Logger.CRITICAL)
- raise OSError("Cannot write output file")
-
- return gf.run_c_extension_with_fallback(
- self._log,
- "cew",
- self._synthesize_multiple_c_extension,
- self._synthesize_multiple_pure_python,
- (text_file, output_file_path, quit_after, backwards),
- c_extension=self.rconf["c_ext"]
+ CAT = Language.CAT
+ """ Catalan """
+
+ CES = Language.CES
+ """ Czech """
+
+ CMN = Language.CMN
+ """ Mandarin Chinese (not tested) """
+
+ CYM = Language.CYM
+ """ Welsh """
+
+ DAN = Language.DAN
+ """ Danish """
+
+ DEU = Language.DEU
+ """ German """
+
+ ELL = Language.ELL
+ """ Greek (Modern) """
+
+ ENG = Language.ENG
+ """ English """
+
+ EPO = Language.EPO
+ """ Esperanto (not tested) """
+
+ EST = Language.EST
+ """ Estonian """
+
+ FAS = Language.FAS
+ """ Persian """
+
+ FIN = Language.FIN
+ """ Finnish """
+
+ FRA = Language.FRA
+ """ French """
+
+ GLE = Language.GLE
+ """ Irish """
+
+ GRC = Language.GRC
+ """ Greek (Ancient) """
+
+ HIN = Language.HIN
+ """ Hindi (not tested) """
+
+ HRV = Language.HRV
+ """ Croatian """
+
+ HUN = Language.HUN
+ """ Hungarian """
+
+ HYE = Language.HYE
+ """ Armenian (not tested) """
+
+ IND = Language.IND
+ """ Indonesian (not tested) """
+
+ ISL = Language.ISL
+ """ Icelandic """
+
+ ITA = Language.ITA
+ """ Italian """
+
+ JBO = Language.JBO
+ """ Lojban (not tested) """
+
+ KAN = Language.KAN
+ """ Kannada (not tested) """
+
+ KAT = Language.KAT
+ """ Georgian (not tested) """
+
+ KUR = Language.KUR
+ """ Kurdish (not tested) """
+
+ LAT = Language.LAT
+ """ Latin """
+
+ LAV = Language.LAV
+ """ Latvian """
+
+ LFN = Language.LFN
+ """ Lingua Franca Nova (not tested) """
+
+ LIT = Language.LIT
+ """ Lithuanian """
+
+ MAL = Language.MAL
+ """ Malayalam (not tested) """
+
+ MKD = Language.MKD
+ """ Macedonian (not tested) """
+
+ MSA = Language.MSA
+ """ Malay (not tested) """
+
+ NEP = Language.NEP
+ """ Nepali (not tested) """
+
+ NLD = Language.NLD
+ """ Dutch """
+
+ NOR = Language.NOR
+ """ Norwegian """
+
+ PAN = Language.PAN
+ """ Panjabi (not tested) """
+
+ POL = Language.POL
+ """ Polish """
+
+ POR = Language.POR
+ """ Portuguese """
+
+ RON = Language.RON
+ """ Romanian """
+
+ RUS = Language.RUS
+ """ Russian """
+
+ SLK = Language.SLK
+ """ Slovak """
+
+ SPA = Language.SPA
+ """ Spanish """
+
+ SQI = Language.SQI
+ """ Albanian (not tested) """
+
+ SRP = Language.SRP
+ """ Serbian """
+
+ SWA = Language.SWA
+ """ Swahili """
+
+ SWE = Language.SWE
+ """ Swedish """
+
+ TAM = Language.TAM
+ """ Tamil (not tested) """
+
+ TUR = Language.TUR
+ """ Turkish """
+
+ UKR = Language.UKR
+ """ Ukrainian """
+
+ VIE = Language.VIE
+ """ Vietnamese (not tested) """
+
+ YUE = Language.YUE
+ """ Yue Chinese (not tested) """
+
+ ZHO = Language.ZHO
+ """ Chinese (not tested) """
+
+ ENG_GBR = "eng-GBR"
+ """ English (GB) """
+
+ ENG_SCT = "eng-SCT"
+ """ English (Scotland) (not tested) """
+
+ ENG_USA = "eng-USA"
+ """ English (USA) """
+
+ SPA_ESP = "spa-ESP"
+ """ Spanish (Castilian) """
+
+ FRA_BEL = "fra-BEL"
+ """ French (Belgium) (not tested) """
+
+ FRA_FRA = "fra-FRA"
+ """ French (France) """
+
+ POR_BRA = "por-bra"
+ """ Portuguese (Brazil) (not tested) """
+
+ POR_PRT = "por-prt"
+ """ Portuguese (Portugal) """
+
+ AF = "af"
+ """ Afrikaans (not tested) """
+
+ AN = "an"
+ """ Aragonese (not tested) """
+
+ BG = "bg"
+ """ Bulgarian """
+
+ BS = "bs"
+ """ Bosnian (not tested) """
+
+ CA = "ca"
+ """ Catalan """
+
+ CS = "cs"
+ """ Czech """
+
+ CY = "cy"
+ """ Welsh """
+
+ DA = "da"
+ """ Danish """
+
+ DE = "de"
+ """ German """
+
+ EL = "el"
+ """ Greek (Modern) """
+
+ EN = "en"
+ """ English """
+
+ EN_GB = "en-gb"
+ """ English (GB) """
+
+ EN_SC = "en-sc"
+ """ English (Scotland) (not tested) """
+
+ EN_UK_NORTH = "en-uk-north"
+ """ English (Northern) (not tested) """
+
+ EN_UK_RP = "en-uk-rp"
+ """ English (Received Pronunciation) (not tested) """
+
+ EN_UK_WMIDS = "en-uk-wmids"
+ """ English (Midlands) (not tested) """
+
+ EN_US = "en-us"
+ """ English (USA) """
+
+ EN_WI = "en-wi"
+ """ English (West Indies) (not tested) """
+
+ EO = "eo"
+ """ Esperanto (not tested) """
+
+ ES = "es"
+ """ Spanish (Castilian) """
+
+ ES_LA = "es-la"
+ """ Spanish (Latin America) (not tested) """
+
+ ET = "et"
+ """ Estonian """
+
+ FA = "fa"
+ """ Persian """
+
+ FA_PIN = "fa-pin"
+ """ Persian (Pinglish) """
+
+ FI = "fi"
+ """ Finnish """
+
+ FR = "fr"
+ """ French """
+
+ FR_BE = "fr-be"
+ """ French (Belgium) (not tested) """
+
+ FR_FR = "fr-fr"
+ """ French (France) """
+
+ GA = "ga"
+ """ Irish """
+
+ # NOTE already defined
+ #GRC = "grc"
+ #""" Greek (Ancient) """
+
+ HI = "hi"
+ """ Hindi (not tested) """
+
+ HR = "hr"
+ """ Croatian """
+
+ HU = "hu"
+ """ Hungarian """
+
+ HY = "hy"
+ """ Armenian (not tested) """
+
+ HY_WEST = "hy-west"
+ """ Armenian (West) (not tested) """
+
+ ID = "id"
+ """ Indonesian (not tested) """
+
+ IS = "is"
+ """ Icelandic """
+
+ IT = "it"
+ """ Italian """
+
+ # NOTE already defined
+ #JBO = "jbo"
+ #""" Lojban (not tested) """
+
+ KA = "ka"
+ """ Georgian (not tested) """
+
+ KN = "kn"
+ """ Kannada (not tested) """
+
+ KU = "ku"
+ """ Kurdish (not tested) """
+
+ LA = "la"
+ """ Latin """
+
+ # NOTE already defined
+ #LFN = "lfn"
+ #""" Lingua Franca Nova (not tested) """
+
+ LT = "lt"
+ """ Lithuanian """
+
+ LV = "lv"
+ """ Latvian """
+
+ MK = "mk"
+ """ Macedonian (not tested) """
+
+ ML = "ml"
+ """ Malayalam (not tested) """
+
+ MS = "ms"
+ """ Malay (not tested) """
+
+ NE = "ne"
+ """ Nepali (not tested) """
+
+ NL = "nl"
+ """ Dutch """
+
+ NO = "no"
+ """ Norwegian """
+
+ PA = "pa"
+ """ Panjabi (not tested) """
+
+ PL = "pl"
+ """ Polish """
+
+ PT = "pt"
+ """ Portuguese """
+
+ PT_BR = "pt-br"
+ """ Portuguese (Brazil) (not tested) """
+
+ PT_PT = "pt-pt"
+ """ Portuguese (Portugal) """
+
+ RO = "ro"
+ """ Romanian """
+
+ RU = "ru"
+ """ Russian """
+
+ SQ = "sq"
+ """ Albanian (not tested) """
+
+ SK = "sk"
+ """ Slovak """
+
+ SR = "sr"
+ """ Serbian """
+
+ SV = "sv"
+ """ Swedish """
+
+ SW = "sw"
+ """ Swahili """
+
+ TA = "ta"
+ """ Tamil (not tested) """
+
+ TR = "tr"
+ """ Turkish """
+
+ UK = "uk"
+ """ Ukrainian """
+
+ VI = "vi"
+ """ Vietnamese (not tested) """
+
+ VI_HUE = "vi-hue"
+ """ Vietnamese (Hue) (not tested) """
+
+ VI_SGN = "vi-sgn"
+ """ Vietnamese (sgn) (not tested) """
+
+ ZH = "zh"
+ """ Mandarin Chinese (not tested) """
+
+ ZH_YUE = "zh-yue"
+ """ Yue Chinese (not tested) """
+
+ LANGUAGE_TO_VOICE_CODE = {
+ AF : "af",
+ AN : "an",
+ BG : "bg",
+ BS : "bs",
+ CA : "ca",
+ CS : "cs",
+ CY : "cy",
+ DA : "da",
+ DE : "de",
+ EL : "el",
+ EN : "en",
+ EN_GB : "en-gb",
+ EN_SC : "en-sc",
+ EN_UK_NORTH : "en-uk-north",
+ EN_UK_RP : "en-uk-rp",
+ EN_UK_WMIDS : "en-uk-wmids",
+ EN_US : "en-us",
+ EN_WI : "en-wi",
+ EO : "eo",
+ ES : "es",
+ ES_LA : "es-la",
+ ET : "et",
+ FA : "fa",
+ FA_PIN : "fa-pin",
+ FI : "fi",
+ FR : "fr",
+ FR_BE : "fr-be",
+ FR_FR : "fr-fr",
+ GA : "ga",
+ #GRC : "grc",
+ HI : "hi",
+ HR : "hr",
+ HU : "hu",
+ HY : "hy",
+ HY_WEST : "hy-west",
+ ID : "id",
+ IS : "is",
+ IT : "it",
+ #JBO : "jbo",
+ KA : "ka",
+ KN : "kn",
+ KU : "ku",
+ LA : "la",
+ #LFN : "lfn",
+ LT : "lt",
+ LV : "lv",
+ MK : "mk",
+ ML : "ml",
+ MS : "ms",
+ NE : "ne",
+ NL : "nl",
+ NO : "no",
+ PA : "pa",
+ PL : "pl",
+ PT : "pt",
+ PT_BR : "pt-br",
+ PT_PT : "pt-pt",
+ RO : "ro",
+ RU : "ru",
+ SQ : "sq",
+ SK : "sk",
+ SR : "sr",
+ SV : "sv",
+ SW : "sw",
+ TA : "ta",
+ TR : "tr",
+ UK : "ru", # NOTE mocking support for Ukrainian with Russian voice
+ VI : "vi",
+ VI_HUE : "vi-hue",
+ VI_SGN : "vi-sgn",
+ ZH : "zh",
+ ZH_YUE : "zh-yue",
+ AFR : "af",
+ ARG : "an",
+ BOS : "bs",
+ BUL : "bg",
+ CAT : "ca",
+ CES : "cs",
+ CMN : "zh",
+ CYM : "cy",
+ DAN : "da",
+ DEU : "de",
+ ELL : "el",
+ ENG : "en",
+ EPO : "eo",
+ EST : "et",
+ FAS : "fa",
+ FIN : "fi",
+ FRA : "fr",
+ GLE : "ga",
+ GRC : "grc",
+ HIN : "hi",
+ HRV : "hr",
+ HUN : "hu",
+ HYE : "hy",
+ IND : "id",
+ ISL : "is",
+ ITA : "it",
+ JBO : "jbo",
+ KAN : "kn",
+ KAT : "ka",
+ KUR : "ku",
+ LAT : "la",
+ LAV : "lv",
+ LFN : "lfn",
+ LIT : "lt",
+ MAL : "ml",
+ MKD : "mk",
+ MSA : "ms",
+ NEP : "ne",
+ NLD : "nl",
+ NOR : "no",
+ PAN : "pa",
+ POL : "pl",
+ POR : "pt",
+ RON : "ro",
+ RUS : "ru",
+ SLK : "sk",
+ SPA : "es",
+ SQI : "sq",
+ SRP : "sr",
+ SWA : "sw",
+ SWE : "sv",
+ TAM : "ta",
+ TUR : "tr",
+ UKR : "ru", # NOTE mocking support for Ukrainian with Russian voice
+ VIE : "vi",
+ YUE : "zh-yue",
+ ZHO : "zh",
+ ENG_GBR : "en-gb",
+ ENG_SCT : "en-sc",
+ ENG_USA : "en-us",
+ SPA_ESP : "es-es",
+ FRA_BEL : "fr-be",
+ FRA_FRA : "fr-fr",
+ POR_BRA : "pt-br",
+ POR_PRT : "pt-pt"
+ }
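The table maps both the ``Language`` constants and eSpeak's own short codes to eSpeak voice codes. ``DEFAULT_LANGUAGE`` is used when a fragment has no language. The sketch below is a hypothetical stand-alone lookup in that spirit, not the ``TTSWrapper`` implementation. It rests on two assumptions:

- The string keys assume that ``Language.ENG`` and ``Language.UKR`` equal ``"eng"`` and ``"ukr"``.
- Passing an unlisted code through unchanged when unlisted languages are allowed is assumed behaviour, not documented.

```python
# Hypothetical excerpt of the mapping above; Ukrainian is mocked with the Russian voice
LANGUAGE_TO_VOICE_CODE = {
    "eng": "en",
    "eng-GBR": "en-gb",
    "ukr": "ru",
    "uk": "ru",
}
DEFAULT_LANGUAGE = "eng"


def language_to_voice_code(language, allow_unlisted=False):
    """Return the eSpeak voice code for a language code (hypothetical helper)."""
    if language is None:
        language = DEFAULT_LANGUAGE
    if language in LANGUAGE_TO_VOICE_CODE:
        return LANGUAGE_TO_VOICE_CODE[language]
    if allow_unlisted:
        # assumption: hand the code to eSpeak as-is
        return language
    raise ValueError("Language code not allowed: %s" % language)
```

Because several keys share a voice, aliases such as ``UKR`` and ``UK`` can both be redirected to ``"ru"`` without touching the synthesis code.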
+ DEFAULT_LANGUAGE = ENG
+
+ OUTPUT_MONO_WAVE = True
+
+ TAG = u"ESPEAKWrapper"
+
+ def __init__(self, rconf=None, logger=None):
+ super(ESPEAKWrapper, self).__init__(
+ has_subprocess_call=True,
+ has_c_extension_call=True,
+ has_python_call=False,
+ rconf=rconf,
+ logger=logger
)
+ self.set_subprocess_arguments([
+ self.rconf[RuntimeConfiguration.TTS_PATH],
+ u"-v",
+ TTSWrapper.CLI_PARAMETER_VOICE_CODE_STRING,
+ u"-w",
+ TTSWrapper.CLI_PARAMETER_WAVE_PATH,
+ TTSWrapper.CLI_PARAMETER_TEXT_STDIN
+ ])
+
+ def _synthesize_multiple_c_extension(self, text_file, output_file_path, quit_after=None, backwards=False):
+ """
+ Synthesize multiple text fragments using the ``cew`` C extension.
- def _synthesize_multiple_c_extension(
- self,
- text_file,
- output_file_path,
- quit_after=None,
- backwards=False
- ):
- self._log(u"Synthesizing using C extension...")
+ Return ``(True, (anchors, total_time, num_chars))`` on success, or ``(False, None)`` on failure.
+
+ :rtype: (bool, (list, :class:`~aeneas.timevalue.TimeValue`, int))
+ """
+ self.log(u"Synthesizing using C extension...")
# convert parameters from Python values to C values
try:
@@ -152,52 +657,75 @@ def _synthesize_multiple_c_extension(
c_backwards = 0
if backwards:
c_backwards = 1
- self._log([u"output_file_path: %s", output_file_path])
- self._log([u"c_quit_after: %.3f", c_quit_after])
- self._log([u"c_backwards: %d", c_backwards])
- self._log(u"Preparing c_text...")
- c_text = []
+ self.log([u"output_file_path: %s", output_file_path])
+ self.log([u"c_quit_after: %.3f", c_quit_after])
+ self.log([u"c_backwards: %d", c_backwards])
+ self.log(u"Preparing u_text...")
+ u_text = []
fragments = text_file.fragments
for fragment in fragments:
f_lang = fragment.language
f_text = fragment.filtered_text
if f_lang is None:
- f_lang = Language.EN
- f_lang = self._replace_language(f_lang)
+ f_lang = self.DEFAULT_LANGUAGE
+ f_voice_code = self._language_to_voice_code(f_lang)
if f_text is None:
f_text = u""
+ u_text.append((f_voice_code, f_text))
+ self.log(u"Preparing u_text... done")
+
+ # call C extension
+ sr = None
+ sf = None
+ intervals = None
+ if self.rconf[RuntimeConfiguration.CEW_SUBPROCESS_ENABLED]:
+ self.log(u"Using cewsubprocess to call aeneas.cew")
+ try:
+ self.log(u"Importing aeneas.cewsubprocess...")
+ from aeneas.cewsubprocess import CEWSubprocess
+ self.log(u"Importing aeneas.cewsubprocess... done")
+ self.log(u"Calling aeneas.cewsubprocess...")
+ cewsub = CEWSubprocess(rconf=self.rconf, logger=self.logger)
+ sr, sf, intervals = cewsub.synthesize_multiple(output_file_path, c_quit_after, c_backwards, u_text)
+ self.log(u"Calling aeneas.cewsubprocess... done")
+ except Exception as exc:
+ self.log_exc(u"An unexpected error occurred while running cewsubprocess", exc, False, None)
+ # NOTE not critical, try calling aeneas.cew directly
+ #return (False, None)
+
+ if sr is None:
+ self.log(u"Preparing c_text...")
if gf.PY2:
# Python 2 => pass byte strings
- c_text.append((gf.safe_bytes(f_lang), gf.safe_bytes(f_text)))
+ c_text = [(gf.safe_bytes(t[0]), gf.safe_bytes(t[1])) for t in u_text]
else:
# Python 3 => pass Unicode strings
- c_text.append((gf.safe_unicode(f_lang), gf.safe_unicode(f_text)))
- self._log(u"Preparing c_text... done")
+ c_text = [(gf.safe_unicode(t[0]), gf.safe_unicode(t[1])) for t in u_text]
+ self.log(u"Preparing c_text... done")
+
+ self.log(u"Calling aeneas.cew directly")
+ try:
+ self.log(u"Importing aeneas.cew...")
+ import aeneas.cew.cew
+ self.log(u"Importing aeneas.cew... done")
+ self.log(u"Calling aeneas.cew...")
+ sr, sf, intervals = aeneas.cew.cew.synthesize_multiple(
+ output_file_path,
+ c_quit_after,
+ c_backwards,
+ c_text
+ )
+ self.log(u"Calling aeneas.cew... done")
+ except Exception as exc:
+ self.log_exc(u"An unexpected error occurred while running cew", exc, False, None)
+ return (False, None)
- # call C extension
- try:
- self._log(u"Importing aeneas.cew...")
- import aeneas.cew.cew
- self._log(u"Importing aeneas.cew... done")
- self._log(u"Calling aeneas.cew...")
- sr, sf, intervals = aeneas.cew.cew.synthesize_multiple(
- output_file_path,
- c_quit_after,
- c_backwards,
- c_text
- )
- self._log(u"Calling aeneas.cew... done")
- except Exception as exc:
- self._log(u"Calling aeneas.cew... failed")
- self._log(u"An unexpected exception occurred while running cew:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
- return (False, None)
- self._log([u"sr: %d", sr])
- self._log([u"sf: %d", sf])
+ self.log([u"sr: %d", sr])
+ self.log([u"sf: %d", sf])
# create output
anchors = []
- current_time = 0.0
+ current_time = TimeValue("0.000")
num_chars = 0
if backwards:
fragments = fragments[::-1]
@@ -206,301 +734,78 @@ def _synthesize_multiple_c_extension(
fragment = fragments[i]
# store for later output
anchors.append([
- intervals[i][0],
+ TimeValue(intervals[i][0]),
fragment.identifier,
fragment.filtered_text
])
# increase the character counter
num_chars += fragment.characters
# update current_time
- current_time = intervals[i][1]
+ current_time = TimeValue(intervals[i][1])
# return output
# NOTE anchors do not make sense if backwards == True
- self._log([u"Returning %d time anchors", len(anchors)])
- self._log([u"Current time %.3f", current_time])
- self._log([u"Synthesized %d characters", num_chars])
- self._log(u"Synthesizing using C extension... done")
+ self.log([u"Returning %d time anchors", len(anchors)])
+ self.log([u"Current time %.3f", current_time])
+ self.log([u"Synthesized %d characters", num_chars])
+ self.log(u"Synthesizing using C extension... done")
return (True, (anchors, current_time, num_chars))
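The ``_synthesize_multiple_*`` and ``_compute_path_*`` methods in this diff share one return convention: ``(True, result)`` on success and ``(False, None)`` on failure. This lets a caller such as ``gf.run_c_extension_with_fallback`` move on to the next implementation. The method above applies the same idea internally: if ``sr`` is still ``None`` after the ``cewsubprocess`` attempt, it calls ``aeneas.cew`` directly. The generic helper below is a hypothetical sketch of that convention. Its signature is not that of ``gf.run_c_extension_with_fallback``.

```python
def run_with_fallback(attempts, *args):
    """
    Hypothetical helper: call each function in attempts in order.
    Each function returns (succeeded, result). Return the first
    successful result, or raise RuntimeError if every attempt fails.
    """
    for func in attempts:
        try:
            succeeded, result = func(*args)
        except Exception:
            # treat an unexpected exception like a (False, None) return
            succeeded, result = False, None
        if succeeded:
            return result
    raise RuntimeError("All attempts failed")
```

Keeping failure as a return value rather than an exception means each implementation only needs to log its own error, for example with ``log_exc(..., False, None)``. The caller then decides whether a fallback exists.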
- def _synthesize_multiple_pure_python(
- self,
- text_file,
- output_file_path,
- quit_after=None,
- backwards=False
- ):
- def synthesize_and_clean(text, language):
- """
- Synthesize a single fragment, pure Python,
- and immediately remove the temporary file.
- """
- self._log(u"Synthesizing text...")
- handler, tmp_destination = gf.tmp_file(suffix=u".wav", root=self.rconf["tmp_path"])
- result, data = self._synthesize_single_pure_python(
- text=(text + u" "),
- language=language,
- output_file_path=tmp_destination
- )
- self._log([u"Removing temporary file '%s'", tmp_destination])
- gf.delete_file(handler, tmp_destination)
- self._log(u"Synthesizing text... done")
- return data
-
- self._log(u"Synthesizing using pure Python...")
-
- try:
- # get sample rate and encoding
- du_nu, sample_rate, encoding, da_nu = synthesize_and_clean(
- u"Dummy text to get sample_rate",
- Language.EN
- )
-
- # open output file
- output_file = AudioFileMonoWAVE(
- file_path=output_file_path,
- logger=self.logger
- )
- output_file.audio_format = encoding
- output_file.audio_sample_rate = sample_rate
-
- # create output
- anchors = []
- current_time = 0.0
- num = 0
- num_chars = 0
- fragments = text_file.fragments
- if backwards:
- fragments = fragments[::-1]
- for fragment in fragments:
- # replace language
- language = self._replace_language(fragment.language)
- # synthesize and get the duration of the output file
- self._log([u"Synthesizing fragment %d", num])
- duration, sr_nu, enc_nu, data = synthesize_and_clean(
- text=fragment.filtered_text,
- language=language
- )
- # store for later output
- anchors.append([current_time, fragment.identifier, fragment.text])
- # increase the character counter
- num_chars += fragment.characters
- # append/prepend data
- self._log([u"Fragment %d starts at: %f", num, current_time])
- if duration > 0:
- self._log([u"Fragment %d duration: %f", num, duration])
- current_time += duration
- #
- # NOTE since numpy.append cannot be in place,
- # it seems that the only alternative to make
- # this more efficient consists in pre-allocating
- # the destination array,
- # possibly truncating or extending it as needed
- #
- if backwards:
- output_file.prepend_data(data)
- else:
- output_file.append_data(data)
- else:
- self._log([u"Fragment %d has zero duration", num])
-
- # increment fragment counter
- num += 1
-
- # check if we must stop synthesizing because we have enough audio
- if (quit_after is not None) and (current_time > quit_after):
- self._log([u"Quitting after reached duration %.3f", current_time])
- break
-
- # write output file
- self._log([u"Writing audio file '%s'", output_file_path])
- output_file.write(file_path=output_file_path)
- self._log(u"Synthesizing using pure Python... done")
- except Exception as exc:
- self._log(u"Synthesizing using pure Python... failed")
- self._log(u"An unexpected exception occurred while running pure Python code:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
- return (False, None)
-
- # return output
- # NOTE anchors do not make sense if backwards == True
- self._log([u"Returning %d time anchors", len(anchors)])
- self._log([u"Current time %.3f", current_time])
- self._log([u"Synthesized %d characters", num_chars])
- self._log(u"Synthesizing using pure Python... done")
- return (True, (anchors, current_time, num_chars))
-
- def synthesize_single(
- self,
- text,
- language,
- output_file_path
- ):
- """
- Create a ``wav`` audio file containing the synthesized text.
-
- The ``text`` must be a unicode string encodable with UTF-8,
- otherwise ``espeak`` might fail.
-
- Return the duration of the synthesized audio file, in seconds.
-
- :param text: the text to synthesize
- :type text: unicode
- :param language: the language to use
- :type language: :class:`aeneas.language.Language` enum
- :param output_file_path: the path of the output audio file
- :type output_file_path: string
- :rtype: float
-
- :raise TypeError: if ``text`` is ``None`` or it is not a ``unicode`` object
- :raise ValueError: if ``rconf["allow_unlisted_languages"]`` is ``False`` and
- ``language`` is not listed in
- :class:`aeneas.language.Language`
- :raise OSError: if output file cannot be written to ``output_file_path``
- :raise RuntimeError: if both the C extension and
- the pure Python code did not succeed.
- """
- # check that text_file is not None
- if text is None:
- self._log(u"text is None", Logger.CRITICAL)
- raise TypeError("text is None")
-
- # check that text has unicode type
- if not gf.is_unicode(text):
- self._log(u"text must be a unicode string", Logger.CRITICAL)
- raise TypeError("text must be a unicode string")
-
- # check that output_file_path can be written
- if not gf.file_can_be_written(output_file_path):
- self._log([u"Cannot write output file to '%s'", output_file_path], Logger.CRITICAL)
- raise OSError("Cannot write output file")
-
- # check that the requested language is listed in language.py
- if (language not in Language.ALLOWED_VALUES) and (not self.rconf["allow_unlisted_languages"]):
- self._log([u"Language '%s' is not allowed", language], Logger.CRITICAL)
- raise ValueError("Language code not allowed")
-
- self._log([u"Synthesizing text: '%s'", text])
- self._log([u"Synthesizing language: '%s'", language])
- self._log([u"Synthesizing to file: '%s'", output_file_path])
-
- # return zero if text is the empty string
- if len(text) == 0:
- self._log(u"len(text) is zero: returning 0.0")
- return 0.0
-
- # replace language
- language = self._replace_language(language)
- self._log([u"Using language: '%s'", language])
-
- result = gf.run_c_extension_with_fallback(
- self._log,
- "cew",
- self._synthesize_single_c_extension,
- self._synthesize_single_pure_python,
- (text, language, output_file_path),
- c_extension=self.rconf["c_ext"]
- )
- return result[0]
-
- def _synthesize_single_c_extension(self, text, language, output_file_path):
+ def _synthesize_single_c_extension(self, text, voice_code, output_file_path):
"""
- Synthesize a single text fragment, using cew extension.
+ Synthesize a single text fragment, using the cew extension.
Return the duration of the synthesized text, in seconds.
- :rtype: (bool, (float, ))
+ :rtype: (bool, (:class:`~aeneas.timevalue.TimeValue`, ))
"""
- self._log(u"Synthesizing using C extension...")
-
- self._log(u"Preparing c_text...")
- if gf.PY2:
- # Python 2 => pass byte strings
- c_text = gf.safe_bytes(text)
- else:
- # Python 3 => pass Unicode strings
- c_text = text
- # NOTE language has been replaced already!
- self._log(u"Preparing c_text... done")
-
- try:
- self._log(u"Importing aeneas.cew...")
- import aeneas.cew.cew
- self._log(u"Importing aeneas.cew... done")
- self._log(u"Calling aeneas.cew...")
- sr, begin, end = aeneas.cew.cew.synthesize_single(
- output_file_path,
- language,
- c_text
- )
- self._log(u"Calling aeneas.cew... done")
- except Exception as exc:
- self._log(u"Calling aeneas.cew... failed")
- self._log(u"An unexpected exception occurred while running cew:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
- return (False, None)
-
- self._log(u"Synthesizing using C extension... done")
- return (True, (end, ))
-
- def _synthesize_single_pure_python(self, text, language, output_file_path):
- """
- Synthesize a single text fragment, pure Python.
-
- :rtype: tuple (duration, sample_rate, encoding, data)
- """
- self._log(u"Synthesizing using pure Python...")
-
- # NOTE language has been replaced already!
-
- try:
- # call espeak via subprocess
- self._log(u"Calling espeak ...")
- arguments = [self.rconf["espeak_path"], "-v", language, "-w", output_file_path]
- self._log([u"Calling with arguments '%s'", " ".join(arguments)])
- self._log([u"Calling with text '%s'", text])
- proc = subprocess.Popen(
- arguments,
- stdout=subprocess.PIPE,
- stdin=subprocess.PIPE,
- stderr=subprocess.PIPE,
- universal_newlines=True)
+ self.log(u"Synthesizing using C extension...")
+
+ end = None
+ if self.rconf[RuntimeConfiguration.CEW_SUBPROCESS_ENABLED]:
+ self.log(u"Using cewsubprocess to call aeneas.cew")
+ try:
+ self.log(u"Importing aeneas.cewsubprocess...")
+ from aeneas.cewsubprocess import CEWSubprocess
+ self.log(u"Importing aeneas.cewsubprocess... done")
+ self.log(u"Calling aeneas.cewsubprocess...")
+ cewsub = CEWSubprocess(rconf=self.rconf, logger=self.logger)
+ end = cewsub.synthesize_single(output_file_path, voice_code, text)
+ self.log(u"Calling aeneas.cewsubprocess... done")
+ except Exception as exc:
+ self.log_exc(u"An unexpected error occurred while running cewsubprocess", exc, False, None)
+ # NOTE not critical, try calling aeneas.cew directly
+ #return (False, None)
+
+ if end is None:
+ self.log(u"Preparing c_text...")
if gf.PY2:
- proc.communicate(input=gf.safe_bytes(text))
+ # Python 2 => pass byte strings
+ c_text = gf.safe_bytes(text)
else:
- proc.communicate(input=text)
- proc.stdout.close()
- proc.stdin.close()
- proc.stderr.close()
- self._log(u"Calling espeak ... done")
- except Exception as exc:
- self._log(u"Calling espeak ... failed")
- self._log(u"An unexpected exception occurred while running pure Python code:", Logger.WARNING)
- self._log([u"%s", exc], Logger.WARNING)
- return (False, None)
-
- # check the file can be read
- if not gf.file_can_be_read(output_file_path):
- self._log([u"Output file '%s' does not exist", output_file_path], Logger.CRITICAL)
- return (False, None)
-
- # return the duration of the output file
- try:
- audio_file = AudioFileMonoWAVE(
- file_path=output_file_path,
- logger=self.logger
- )
- audio_file.load_data()
- duration = audio_file.audio_length
- sample_rate = audio_file.audio_sample_rate
- encoding = audio_file.audio_format
- data = audio_file.audio_data
- self._log([u"Duration of '%s': %f", output_file_path, duration])
- self._log(u"Synthesizing using pure Python... done")
- return (True, (duration, sample_rate, encoding, data))
- except (AudioFileUnsupportedFormatError, OSError) as exc:
- self._log(u"Error while trying reading the sythesized audio file", Logger.CRITICAL)
- return (False, None)
+ # Python 3 => pass Unicode strings
+ c_text = gf.safe_unicode(text)
+ self.log(u"Preparing c_text... done")
+
+ self.log(u"Calling aeneas.cew directly")
+ try:
+ self.log(u"Importing aeneas.cew...")
+ import aeneas.cew.cew
+ self.log(u"Importing aeneas.cew... done")
+ self.log(u"Calling aeneas.cew...")
+ sr, begin, end = aeneas.cew.cew.synthesize_single(
+ output_file_path,
+ voice_code,
+ c_text
+ )
+ end = TimeValue(end)
+ self.log(u"Calling aeneas.cew... done")
+ except Exception as exc:
+ self.log_exc(u"An unexpected error occurred while running cew", exc, False, None)
+ return (False, None)
+
+ self.log(u"Synthesizing using C extension... done")
+ return (True, (end, ))
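The removed `synthesize_single` delegated to `gf.run_c_extension_with_fallback`. It passed a C-extension callable and a pure-Python callable that each return a `(succeeded, payload)` tuple, and its docstring promised a `RuntimeError` when neither succeeds. What follows is a minimal, hypothetical sketch of that contract. The name `run_with_fallback`, its parameter list, and the `fake_*` stand-ins are assumptions for illustration, not the actual signature in `aeneas.globalfunctions`.

```python
import logging


def run_with_fallback(log, extension_name, c_function, py_function, args, c_extension=True):
    """
    Hypothetical sketch: call c_function(*args) first (if c_extension is True),
    then py_function(*args). Each callable returns a (succeeded, payload) tuple.
    Return the payload of the first success; raise RuntimeError if both fail.
    """
    if c_extension:
        log(u"Running C extension '%s'..." % extension_name)
        succeeded, payload = c_function(*args)
        if succeeded:
            return payload
        log(u"C extension '%s' failed, falling back to pure Python" % extension_name)
    else:
        log(u"C extension '%s' disabled, using pure Python" % extension_name)
    succeeded, payload = py_function(*args)
    if succeeded:
        return payload
    raise RuntimeError("Both the C extension and the pure Python code failed")


def fake_c_synthesize(text):
    # hypothetical stand-in for a C extension that is missing or broken
    return (False, None)


def fake_py_synthesize(text):
    # hypothetical stand-in for the pure Python path; 1.25 is an arbitrary "duration"
    return (True, (1.25, ))


if __name__ == "__main__":
    result = run_with_fallback(logging.getLogger("demo").debug, "cew",
                               fake_c_synthesize, fake_py_synthesize, (u"hello", ))
    print(result[0])  # prints 1.25
```

Returning the payload and then indexing `[0]` mirrors how the removed `synthesize_single` used `result[0]` as the duration.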
diff --git a/aeneas/executejob.py b/aeneas/executejob.py
index db1a5ba0..3590fe70 100644
--- a/aeneas/executejob.py
+++ b/aeneas/executejob.py
@@ -2,9 +2,13 @@
# coding=utf-8
"""
-Execute a job, that is, execute all of its tasks
-and generate the output container
-holding the generated sync maps.
+This module contains the following classes:
+
+* :class:`~aeneas.executejob.ExecuteJob`, a class to process a job;
+* :class:`~aeneas.executejob.ExecuteJobExecutionError`,
+* :class:`~aeneas.executejob.ExecuteJobInputError`, and
+* :class:`~aeneas.executejob.ExecuteJobOutputError`,
+ representing errors generated while processing jobs.
"""
from __future__ import absolute_import
@@ -15,7 +19,7 @@
from aeneas.container import ContainerFormat
from aeneas.executetask import ExecuteTask
from aeneas.job import Job
-from aeneas.logger import Logger
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
import aeneas.globalfunctions as gf
@@ -26,21 +30,21 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class ExecuteJobInputError(Exception):
+class ExecuteJobExecutionError(Exception):
"""
- Error raised when the input parameters of the job are invalid or missing.
+ Error raised when the execution of the job fails for internal reasons.
"""
pass
-class ExecuteJobExecutionError(Exception):
+class ExecuteJobInputError(Exception):
"""
- Error raised when the execution of the job fails for internal reasons.
+ Error raised when the input parameters of the job are invalid or missing.
"""
pass
@@ -54,7 +58,7 @@ class ExecuteJobOutputError(Exception):
-class ExecuteJob(object):
+class ExecuteJob(Loggable):
"""
Execute a job, that is, execute all of its tasks
and generate the output container
@@ -62,7 +66,7 @@ class ExecuteJob(object):
If you do not provide a job object in the constructor,
you must manually set it later, or load it from a container
- with ``load_job_from_container``.
+ with :func:`~aeneas.executejob.ExecuteJob.load_job_from_container`.
In the first case, you are responsible for setting
the absolute audio/text/sync map paths of each task of the job,
@@ -71,102 +75,92 @@ class ExecuteJob(object):
any temporary files you might have generated around.
In the second case, you are responsible for
- calling ``clean`` at the end of the job execution,
+ calling :func:`~aeneas.executejob.ExecuteJob.clean`
+ at the end of the job execution,
to delete the working directory
- created by ``load_job_from_container``
+ created by :func:`~aeneas.executejob.ExecuteJob.load_job_from_container`
when creating the job object.
:param job: the job to be executed
- :type job: :class:`aeneas.job.Job`
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :type job: :class:`~aeneas.job.Job`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
-
- :raise ExecuteJobInputError: if ``job`` is not an instance of ``Job``
+ :type logger: :class:`~aeneas.logger.Logger`
+ :raises: :class:`~aeneas.executejob.ExecuteJobInputError`: if ``job`` is not an instance of ``Job``
"""
TAG = u"ExecuteJob"
def __init__(self, job=None, rconf=None, logger=None):
+ super(ExecuteJob, self).__init__(rconf=rconf, logger=logger)
self.job = job
self.working_directory = None
self.tmp_directory = None
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
if job is not None:
self.load_job(self.job)
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
def load_job(self, job):
"""
Load the job from the given ``Job`` object.
:param job: the job to load
- :type job: :class:`aeneas.job.Job`
-
- :raise ExecuteJobInputError: if ``job`` is not an instance of ``Job``
+ :type job: :class:`~aeneas.job.Job`
+ :raises: :class:`~aeneas.executejob.ExecuteJobInputError`: if ``job`` is not an instance of :class:`~aeneas.job.Job`
"""
if not isinstance(job, Job):
- self._failed(u"job is not an instance of Job", "input")
+ self.log_exc(u"job is not an instance of Job", None, True, ExecuteJobInputError)
self.job = job
def load_job_from_container(self, container_path, config_string=None):
"""
- Load the job from the given ``Container`` object.
+ Load the job from the :class:`~aeneas.container.Container` at ``container_path``.
If ``config_string`` is ``None``,
the container must contain a configuration file;
otherwise use the provided config string
(i.e., the wizard case).
- :param container_path: the path to the input container
- :type container_path: string (path)
- :param config_string: the configuration string (from wizard)
- :type config_string: string
-
- :raise ExecuteJobInputError: if the given container does not contain a valid ``Job``
+ :param string container_path: the path to the input container
+ :param string config_string: the configuration string (from wizard)
+ :raises: :class:`~aeneas.executejob.ExecuteJobInputError`: if the given container does not contain a valid :class:`~aeneas.job.Job`
"""
- self._log(u"Loading job from container...")
+ self.log(u"Loading job from container...")
# create working directory where the input container
# will be decompressed
- self.working_directory = gf.tmp_directory(root=self.rconf["tmp_path"])
- self._log([u"Created working directory '%s'", self.working_directory])
+ self.working_directory = gf.tmp_directory(root=self.rconf[RuntimeConfiguration.TMP_PATH])
+ self.log([u"Created working directory '%s'", self.working_directory])
try:
- self._log(u"Decompressing input container...")
+ self.log(u"Decompressing input container...")
input_container = Container(container_path, logger=self.logger)
input_container.decompress(self.working_directory)
- self._log(u"Decompressing input container... done")
+ self.log(u"Decompressing input container... done")
except Exception as exc:
self.clean()
- self._failed(u"Unable to decompress container '%s': %s" % (container_path, exc), "input")
+ self.log_exc(u"Unable to decompress container '%s': %s" % (container_path, exc), None, True, ExecuteJobInputError)
try:
- self._log(u"Creating job from working directory...")
+ self.log(u"Creating job from working directory...")
working_container = Container(
self.working_directory,
logger=self.logger
)
analyzer = AnalyzeContainer(working_container, logger=self.logger)
self.job = analyzer.analyze(config_string=config_string)
- self._log(u"Creating job from working directory... done")
+ self.log(u"Creating job from working directory... done")
except Exception as exc:
self.clean()
- self._failed(u"Unable to analyze container '%s': %s" % (container_path, exc), "input")
+ self.log_exc(u"Unable to analyze container '%s': %s" % (container_path, exc), None, True, ExecuteJobInputError)
if self.job is None:
- self._failed(u"The container '%s' does not contain a valid Job" % container_path, "input")
+ self.log_exc(u"The container '%s' does not contain a valid Job" % (container_path), None, True, ExecuteJobInputError)
try:
# set absolute path for text file and audio file
# for each task in the job
- self._log(u"Setting absolute paths for tasks...")
+ self.log(u"Setting absolute paths for tasks...")
for task in self.job.tasks:
task.text_file_path_absolute = gf.norm_join(
self.working_directory,
@@ -176,12 +170,12 @@ def load_job_from_container(self, container_path, config_string=None):
self.working_directory,
task.audio_file_path
)
- self._log(u"Setting absolute paths for tasks... done")
+ self.log(u"Setting absolute paths for tasks... done")
- self._log(u"Loading job from container: succeeded")
+ self.log(u"Loading job from container: succeeded")
except Exception as exc:
self.clean()
- self._failed(u"Error while setting absolute paths for tasks: %s" % exc, "input")
+ self.log_exc(u"Error while setting absolute paths for tasks", exc, True, ExecuteJobInputError)
def execute(self):
"""
@@ -190,142 +184,129 @@ def execute(self):
Each produced sync map will be stored
inside the corresponding task object.
- :raise ExecuteJobExecutionError: if there is a problem during the job execution
+ :raises: :class:`~aeneas.executejob.ExecuteJobExecutionError`: if there is a problem during the job execution
"""
- self._log(u"Executing job")
+ self.log(u"Executing job")
if self.job is None:
- self._failed(u"The job object is None", "execution")
+ self.log_exc(u"The job object is None", None, True, ExecuteJobExecutionError)
if len(self.job) == 0:
- self._failed(u"The job has no tasks", "execution")
- if (self.rconf["job_max_tasks"] > 0) and (len(self.job) > self.rconf["job_max_tasks"]):
- self._failed(u"The Job has %d Tasks, more than the maximum allowed (%d)." % (
- len(self.job),
- self.rconf["job_max_tasks"]
- ), "execution")
- self._log([u"Number of tasks: '%d'", len(self.job)])
+ self.log_exc(u"The job has no tasks", None, True, ExecuteJobExecutionError)
+ job_max_tasks = self.rconf[RuntimeConfiguration.JOB_MAX_TASKS]
+ if (job_max_tasks > 0) and (len(self.job) > job_max_tasks):
+ self.log_exc(u"The Job has %d Tasks, more than the maximum allowed (%d)." % (len(self.job), job_max_tasks), None, True, ExecuteJobExecutionError)
+ self.log([u"Number of tasks: '%d'", len(self.job)])
for task in self.job.tasks:
try:
custom_id = task.configuration["custom_id"]
- self._log([u"Executing task '%s'...", custom_id])
+ self.log([u"Executing task '%s'...", custom_id])
executor = ExecuteTask(task, rconf=self.rconf, logger=self.logger)
executor.execute()
- self._log([u"Executing task '%s'... done", custom_id])
+ self.log([u"Executing task '%s'... done", custom_id])
except Exception as exc:
- self._failed(u"Error while executing task '%s': %s" % (custom_id, exc), "execution")
- self._log(u"Executing task: succeeded")
+ self.log_exc(u"Error while executing task '%s'" % (custom_id), exc, True, ExecuteJobExecutionError)
+ self.log(u"Executing task: succeeded")
- self._log(u"Executing job: succeeded")
+ self.log(u"Executing job: succeeded")
def write_output_container(self, output_directory_path):
"""
Write the output container for this job.
- Return the path to output container.
+ Return the path to the output container,
+ which is the concatenation of ``output_directory_path``
+ and of the output container file or directory name.
- :param output_directory_path: the path to a directory where
- the output container must be created
- :type output_directory_path: string (path)
+ :param string output_directory_path: the path to a directory where
+ the output container must be created
:rtype: string
+ :raises: :class:`~aeneas.executejob.ExecuteJobOutputError`: if there is a problem while writing the output container
"""
- self._log(u"Writing output container for this job")
+ self.log(u"Writing output container for this job")
if self.job is None:
- self._failed(u"The job object is None", "output")
+ self.log_exc(u"The job object is None", None, True, ExecuteJobOutputError)
if len(self.job) == 0:
- self._failed(u"The job has no tasks", "output")
- self._log([u"Number of tasks: '%d'", len(self.job)])
+ self.log_exc(u"The job has no tasks", None, True, ExecuteJobOutputError)
+ self.log([u"Number of tasks: '%d'", len(self.job)])
# create temporary directory where the sync map files
# will be created
# this temporary directory will be compressed into
# the output container
- self.tmp_directory = gf.tmp_directory(root=self.rconf["tmp_path"])
- self._log([u"Created temporary directory '%s'", self.tmp_directory])
+ self.tmp_directory = gf.tmp_directory(root=self.rconf[RuntimeConfiguration.TMP_PATH])
+ self.log([u"Created temporary directory '%s'", self.tmp_directory])
for task in self.job.tasks:
custom_id = task.configuration["custom_id"]
# check if the task has sync map and sync map file path
if task.sync_map_file_path is None:
- self._failed(u"Task '%s' has sync_map_file_path not set" % custom_id, "output")
+ self.log_exc(u"Task '%s' has sync_map_file_path not set" % (custom_id), None, True, ExecuteJobOutputError)
if task.sync_map is None:
- self._failed(u"Task '%s' has sync_map not set" % custom_id, "output")
+ self.log_exc(u"Task '%s' has sync_map not set" % (custom_id), None, True, ExecuteJobOutputError)
try:
# output sync map
- self._log([u"Outputting sync map for task '%s'...", custom_id])
+ self.log([u"Outputting sync map for task '%s'...", custom_id])
task.output_sync_map_file(self.tmp_directory)
- self._log([u"Outputting sync map for task '%s'... done", custom_id])
+ self.log([u"Outputting sync map for task '%s'... done", custom_id])
except Exception as exc:
- self._failed(u"Error while outputting sync map for task '%s': %s" % (custom_id, exc), "output")
+ self.log_exc(u"Error while outputting sync map for task '%s'" % (custom_id), exc, True, ExecuteJobOutputError)
# get output container info
output_container_format = self.job.configuration["o_container_format"]
- self._log([u"Output container format: '%s'", output_container_format])
+ self.log([u"Output container format: '%s'", output_container_format])
output_file_name = self.job.configuration["o_name"]
if ((output_container_format != ContainerFormat.UNPACKED) and
(not output_file_name.endswith(output_container_format))):
- self._log(u"Adding extension to output_file_name")
+ self.log(u"Adding extension to output_file_name")
output_file_name += "." + output_container_format
- self._log([u"Output file name: '%s'", output_file_name])
+ self.log([u"Output file name: '%s'", output_file_name])
output_file_path = gf.norm_join(
output_directory_path,
output_file_name
)
- self._log([u"Output file path: '%s'", output_file_path])
+ self.log([u"Output file path: '%s'", output_file_path])
try:
- self._log(u"Compressing...")
+ self.log(u"Compressing...")
container = Container(
output_file_path,
output_container_format,
logger=self.logger
)
container.compress(self.tmp_directory)
- self._log(u"Compressing... done")
- self._log([u"Created output file: '%s'", output_file_path])
- self._log(u"Writing output container for this job: succeeded")
+ self.log(u"Compressing... done")
+ self.log([u"Created output file: '%s'", output_file_path])
+ self.log(u"Writing output container for this job: succeeded")
self.clean(False)
return output_file_path
except Exception as exc:
self.clean(False)
- self._failed("%s" % (exc), "output")
+ self.log_exc(u"Error while compressing", exc, True, ExecuteJobOutputError)
return None
def clean(self, remove_working_directory=True):
"""
Remove the temporary directory.
- If ``remove_working_directory`` is True
+ If ``remove_working_directory`` is ``True``
remove the working directory as well,
otherwise just remove the temporary directory.
- :param remove_working_directory: if ``True``, remove
- the working directory as well
- :type remove_working_directory: bool
+ :param bool remove_working_directory: if ``True``, remove
+ the working directory as well
"""
if remove_working_directory is not None:
- self._log(u"Removing working directory... ")
+ self.log(u"Removing working directory... ")
gf.delete_directory(self.working_directory)
self.working_directory = None
- self._log(u"Removing working directory... done")
- self._log(u"Removing temporary directory... ")
+ self.log(u"Removing working directory... done")
+ self.log(u"Removing temporary directory... ")
gf.delete_directory(self.tmp_directory)
self.tmp_directory = None
- self._log(u"Removing temporary directory... done")
-
- def _failed(self, msg, during="execution"):
- """ Bubble exception up """
- if during == "input":
- self._log(msg, Logger.CRITICAL)
- raise ExecuteJobInputError(msg)
- elif during == "output":
- self._log(msg, Logger.CRITICAL)
- raise ExecuteJobOutputError(msg)
- else:
- self._log(msg, Logger.CRITICAL)
- raise ExecuteJobExecutionError(msg)
+ self.log(u"Removing temporary directory... done")
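This diff replaces the per-class `_log` and `_failed(msg, during)` helpers with a `Loggable` base class. Callers now write `self.log(...)` with either a string or a `[format, args...]` list, and `self.log_exc(message, exc, critical, raise_type)`. `ExecuteTask._step_end` also subtracts two values returned by `self.log`, which suggests that `log` returns a timestamp. The sketch below is a minimal, assumption-based base class with that calling convention. It is not the real `aeneas.logger.Loggable`: the use of Python's `logging` module, the dict standing in for `RuntimeConfiguration`, and the `Example*` names are all hypothetical.

```python
import datetime
import logging


class Loggable(object):
    """
    Hypothetical sketch of a logging base class with the calling
    convention used in this diff (not the real aeneas.logger.Loggable).
    """

    TAG = u"Loggable"

    def __init__(self, rconf=None, logger=None):
        # assumption: a plain dict stands in for RuntimeConfiguration
        self.rconf = rconf if rconf is not None else {}
        self.logger = logger if logger is not None else logging.getLogger(self.TAG)

    def log(self, message, severity=logging.DEBUG):
        """Log a string or a [format, arg1, arg2, ...] list; return the timestamp."""
        if isinstance(message, list):
            message = message[0] % tuple(message[1:])
        self.logger.log(severity, u"[%s] %s", self.TAG, message)
        return datetime.datetime.now()

    def log_exc(self, message, exc=None, critical=True, raise_type=None):
        """Log message and exc (if any); raise raise_type if it is not None."""
        severity = logging.CRITICAL if critical else logging.WARNING
        self.log(message, severity)
        if exc is not None:
            self.log([u"%s", exc], severity)
        if raise_type is not None:
            raise raise_type(message if exc is None else u"%s: %s" % (message, exc))


class ExampleInputError(Exception):
    pass


class ExampleRunner(Loggable):
    TAG = u"ExampleRunner"

    def __init__(self, job=None, rconf=None, logger=None):
        super(ExampleRunner, self).__init__(rconf=rconf, logger=logger)
        self.job = None
        if job is not None:
            self.load_job(job)

    def load_job(self, job):
        # same shape as ExecuteJob.load_job in the diff: validate, then log_exc + raise
        if not isinstance(job, str):
            self.log_exc(u"job is not a string", None, True, ExampleInputError)
        self.job = job
```

Because `log` returns a `datetime.datetime`, subtracting two results gives a `datetime.timedelta`. Its `total_seconds()` equals the `diff.seconds + diff.microseconds / 1000000.0` computed in `_step_end` whenever a step lasts less than a day.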
diff --git a/aeneas/executetask.py b/aeneas/executetask.py
index 84c33b13..c949878d 100644
--- a/aeneas/executetask.py
+++ b/aeneas/executetask.py
@@ -2,27 +2,36 @@
# coding=utf-8
"""
-Execute a task, that is, compute the sync map for it.
+This module contains the following classes:
+
+* :class:`~aeneas.executetask.ExecuteTask`, a class to process a task;
+* :class:`~aeneas.executetask.ExecuteTaskExecutionError`, and
+* :class:`~aeneas.executetask.ExecuteTaskInputError`,
+ representing errors generated while processing tasks.
"""
from __future__ import absolute_import
+from __future__ import division
from __future__ import print_function
import numpy
from aeneas.adjustboundaryalgorithm import AdjustBoundaryAlgorithm
-from aeneas.audiofile import AudioFileMonoWAVE
+from aeneas.audiofile import AudioFile
+from aeneas.audiofilemfcc import AudioFileMFCC
from aeneas.dtw import DTWAligner
from aeneas.ffmpegwrapper import FFMPEGWrapper
-from aeneas.language import Language
-from aeneas.logger import Logger
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
from aeneas.sd import SD
from aeneas.syncmap import SyncMap
from aeneas.syncmap import SyncMapFragment
from aeneas.syncmap import SyncMapHeadTailFormat
from aeneas.synthesizer import Synthesizer
+from aeneas.task import Task
+from aeneas.textfile import TextFileFormat
from aeneas.textfile import TextFragment
-from aeneas.vad import VAD
+from aeneas.timevalue import TimeValue
+from aeneas.tree import Tree
import aeneas.globalfunctions as gf
__author__ = "Alberto Pettarin"
@@ -32,378 +41,439 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class ExecuteTaskInputError(Exception):
+class ExecuteTaskExecutionError(Exception):
"""
- Error raised when the input parameters of the task are invalid or missing.
+ Error raised when the execution of the task fails for internal reasons.
"""
pass
-class ExecuteTaskExecutionError(Exception):
+class ExecuteTaskInputError(Exception):
"""
- Error raised when the execution of the task fails for internal reasons.
+ Error raised when the input parameters of the task are invalid or missing.
"""
pass
-class ExecuteTask(object):
+class ExecuteTask(Loggable):
"""
Execute a task, that is, compute the sync map for it.
:param task: the task to be executed
- :type task: :class:`aeneas.task.Task`
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :type task: :class:`~aeneas.task.Task`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
+ :type logger: :class:`~aeneas.logger.Logger`
"""
TAG = u"ExecuteTask"
- def __init__(self, task, rconf=None, logger=None):
+ def __init__(self, task=None, rconf=None, logger=None):
+ super(ExecuteTask, self).__init__(rconf=rconf, logger=logger)
self.task = task
- self.cleanup_info = []
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
+ self.step_index = 1
+ self.step_label = u""
+ self.step_begin_time = None
+ self.step_total = 0.000
+ if task is not None:
+ self.load_task(self.task)
+
+ def load_task(self, task):
+ """
+ Load the task from the given ``Task`` object.
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
+ :param task: the task to load
+ :type task: :class:`~aeneas.task.Task`
+ :raises: :class:`~aeneas.executetask.ExecuteTaskInputError`: if ``task`` is not an instance of :class:`~aeneas.task.Task`
+ """
+ if not isinstance(task, Task):
+ self.log_exc(u"task is not an instance of Task", None, True, ExecuteTaskInputError)
+ self.task = task
+
+ def _step_begin(self, label, log=True):
+ """ Log begin of a step """
+ if log:
+ self.step_label = label
+ self.step_begin_time = self.log(u"STEP %d BEGIN (%s)" % (self.step_index, label))
+
+ def _step_end(self, log=True):
+ """ Log end of a step """
+ if log:
+ step_end_time = self.log(u"STEP %d END (%s)" % (self.step_index, self.step_label))
+ diff = (step_end_time - self.step_begin_time)
+ diff = float(diff.seconds + diff.microseconds / 1000000.0)
+ self.step_total += diff
+ self.log(u"STEP %d DURATION %.3f (%s)" % (self.step_index, diff, self.step_label))
+ self.step_index += 1
+
+ def _step_failure(self, exc):
+ """ Log failure of a step """
+ self.log_crit(u"STEP %d (%s) FAILURE" % (self.step_index, self.step_label))
+ self.step_index += 1
+ self.log_exc(u"Unexpected error while executing task", exc, True, ExecuteTaskExecutionError)
+
+ def _step_total(self):
+ """ Log total """
+ self.log(u"STEP T DURATION %.3f" % (self.step_total))
def execute(self):
"""
Execute the task.
The sync map produced will be stored inside the task object.
- :raise ExecuteTaskInputError: if there is a problem with the input parameters
- :raise ExecuteTaskExecutionError: if there is a problem during the task execution
+ :raises: :class:`~aeneas.executetask.ExecuteTaskInputError`: if there is a problem with the input parameters
+ :raises: :class:`~aeneas.executetask.ExecuteTaskExecutionError`: if there is a problem during the task execution
"""
- self._log(u"Executing task")
+ self.log(u"Executing task...")
# check that we have the AudioFile object
if self.task.audio_file is None:
- self._failed(u"The task does not seem to have its audio file set", False)
+ self.log_exc(u"The task does not seem to have its audio file set", None, True, ExecuteTaskInputError)
if (
(self.task.audio_file.audio_length is None) or
(self.task.audio_file.audio_length <= 0)
):
- self._failed(u"The task seems to have an invalid audio file", False)
+ self.log_exc(u"The task seems to have an invalid audio file", None, True, ExecuteTaskInputError)
+ task_max_audio_length = self.rconf[RuntimeConfiguration.TASK_MAX_AUDIO_LENGTH]
if (
- (self.rconf["task_max_a_len"] > 0) and
- (self.task.audio_file.audio_length > self.rconf["task_max_a_len"])
+ (task_max_audio_length > 0) and
+ (self.task.audio_file.audio_length > task_max_audio_length)
):
- self._failed(u"The audio file of the task has length %.3f, more than the maximum allowed (%.3f)." % (
- self.task.audio_file.audio_length,
- self.rconf["task_max_a_len"]
- ), False)
+ self.log_exc(u"The audio file of the task has length %.3f, more than the maximum allowed (%.3f)." % (self.task.audio_file.audio_length, task_max_audio_length), None, True, ExecuteTaskInputError)
# check that we have the TextFile object
if self.task.text_file is None:
- self._failed(u"The task does not seem to have its text file set", False)
+ self.log_exc(u"The task does not seem to have its text file set", None, True, ExecuteTaskInputError)
if len(self.task.text_file) == 0:
- self._failed(u"The task text file seems to have no text fragments", False)
+ self.log_exc(u"The task text file seems to have no text fragments", None, True, ExecuteTaskInputError)
+ task_max_text_length = self.rconf[RuntimeConfiguration.TASK_MAX_TEXT_LENGTH]
if (
- (self.rconf["task_max_t_len"] > 0) and
- (len(self.task.text_file) > self.rconf["task_max_t_len"])
+ (task_max_text_length > 0) and
+ (len(self.task.text_file) > task_max_text_length)
):
- self._failed(u"The text file of the task has %d fragments, more than the maximum allowed (%d)." % (
- len(self.task.text_file),
- self.rconf["task_max_t_len"]
- ), False)
+ self.log_exc(u"The text file of the task has %d fragments, more than the maximum allowed (%d)." % (len(self.task.text_file), task_max_text_length), None, True, ExecuteTaskInputError)
if self.task.text_file.chars == 0:
- self._failed(u"The task text file seems to have empty text", False)
+ self.log_exc(u"The task text file seems to have empty text", None, True, ExecuteTaskInputError)
- self._log(u"Both audio and text input file are present")
- self.cleanup_info = []
+ self.log(u"Both audio and text input files are present")
- # real full wave = the real audio file, converted to WAVE format
- # real trimmed wave = real full wave, possibly with head and/or tail trimmed off
- # synt wave = WAVE file synthesized from text; it will be aligned to real trimmed wave
+ # execute
+ self.step_index = 1
+ self.step_total = 0.000
+ if self.task.text_file.file_format in [TextFileFormat.MPLAIN, TextFileFormat.MUNPARSED]:
+ self._execute_multi_level_task()
+ else:
+ self._execute_single_level_task()
+ self.log(u"Executing task... done")
- step_index = 0
+ def _execute_single_level_task(self):
+ """ Execute a single-level task """
+ self.log(u"Executing single level task...")
try:
- # STEP 0 : convert audio file to real full wave
- self._log(u"STEP %d BEGIN" % (step_index))
- real_full_handler, real_full_path = self._convert()
- self.cleanup_info.append([real_full_handler, real_full_path])
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 1 : extract MFCCs from real full wave
- self._log(u"STEP %d BEGIN" % (step_index))
- real_full_wave_full_mfcc, real_full_wave_length = self._extract_mfcc(real_full_path)
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 2 : cut head and/or tail off
- # detecting head/tail if requested, and
- # overwriting real_path
- # at the end, read_path will not have the head/tail
- self._log(u"STEP %d BEGIN" % (step_index))
- real_wave_modified = self._cut_head_tail(real_full_path)
- real_trimmed_path = real_full_path
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 3 : synthesize text to wave
- self._log(u"STEP %d BEGIN" % (step_index))
- synt_handler, synt_path, synt_anchors = self._synthesize()
- self.cleanup_info.append([synt_handler, synt_path])
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 4 : align waves
- self._log(u"STEP %d BEGIN" % (step_index))
- if real_wave_modified:
- wave_map = self._align_waves(real_trimmed_path, synt_path, None, None)
- else:
- wave_map = self._align_waves(real_trimmed_path, synt_path, real_full_wave_full_mfcc, real_full_wave_length)
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 5 : align text
- self._log(u"STEP %d BEGIN" % (step_index))
- text_map = self._align_text(wave_map, synt_anchors)
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 6 : translate the text_map, possibly putting back the head/tail
- self._log(u"STEP %d BEGIN" % (step_index))
- translated_text_map = self._translate_text_map(
- text_map,
- real_full_wave_length
- )
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 7 : adjust boundaries
- self._log(u"STEP %d BEGIN" % (step_index))
- adjusted_map = self._adjust_boundaries(
- translated_text_map,
- real_full_wave_full_mfcc,
- real_full_wave_length
- )
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 8 : create syncmap and add it to task
- self._log(u"STEP %d BEGIN" % (step_index))
- self._create_syncmap(adjusted_map)
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- # STEP 9 : cleanup
- self._log(u"STEP %d BEGIN" % (step_index))
- self._cleanup()
- self._log(u"STEP %d END" % (step_index))
- step_index += 1
-
- self._log(u"Execution completed")
- return True
+ # load audio file, extract MFCCs from real wave, clear audio file
+ self._step_begin(u"extract MFCC real wave")
+ real_wave_mfcc = self._extract_mfcc(file_path=self.task.audio_file_path_absolute, file_path_is_mono_wave=False)
+ self._step_end()
+
+ # compute head and/or tail and set it
+ self._step_begin(u"compute head tail")
+ (head_length, process_length, tail_length) = self._compute_head_process_tail(real_wave_mfcc)
+ real_wave_mfcc.set_head_middle_tail(head_length, process_length, tail_length)
+ self._step_end()
+
+ # compute a time map alignment
+ time_map = self._execute_inner(real_wave_mfcc, self.task.text_file, adjust_boundaries=True, log=True)
+
+ # convert time_map to tree and create syncmap and add it to task
+ self._step_begin(u"create sync map")
+ tree = self._level_time_map_to_tree(self.task.text_file, time_map)
+ self.task.sync_map = self._create_syncmap(tree)
+ self._step_end()
+
+ # check for fragments with zero duration
+ self._step_begin(u"check zero duration")
+ self._check_no_zero(self.rconf.mws)
+ self._step_end()
+
+ # log total
+ self._step_total()
+ self.log(u"Executing single level task... done")
except Exception as exc:
- self._log(u"STEP %d FAILURE" % step_index, Logger.CRITICAL)
- self._cleanup()
- self._failed("%s" % (exc), True)
- self._log(u"Executing task... done")
-
- def _failed(self, msg, during_execution=True):
- """ Bubble exception up """
- if during_execution:
- self._log(msg, Logger.CRITICAL)
- raise ExecuteTaskExecutionError(msg)
- else:
- self._log(msg, Logger.CRITICAL)
- raise ExecuteTaskInputError(msg)
+ self._step_failure(exc)
+
+ def _execute_multi_level_task(self):
+ """ Execute a multi-level task """
+ self.log(u"Executing multi level task...")
+
+ self.log(u"Saving rconf...")
+ # save original rconf
+ orig_rconf = self.rconf.clone()
+ # clone rconfs and set granularity
+ level_rconfs = [None, self.rconf.clone(), self.rconf.clone(), self.rconf.clone()]
+ level_mfccs = [None, None, None, None]
+ for i in range(1, len(level_rconfs)):
+ level_rconfs[i].set_granularity(i)
+ self.log([u"Level %d mws: %.3f", i, level_rconfs[i].mws])
+ self.log(u"Saving rconf... done")
+
+ try:
+ self.log(u"Creating AudioFile object...")
+ audio_file = self._load_audio_file()
+ self.log(u"Creating AudioFile object... done")
+
+ # extract MFCC for each level
+ for i in range(1, len(level_rconfs)):
+ self._step_begin(u"extract MFCC real wave level %d" % i)
+ if (i == 1) or (level_rconfs[i].mws != level_rconfs[i-1].mws) or (level_rconfs[i].mwl != level_rconfs[i-1].mwl):
+ self.rconf = level_rconfs[i]
+ level_mfccs[i] = self._extract_mfcc(audio_file=audio_file)
+ else:
+ self.log(u"Keeping MFCC real wave from previous level")
+ level_mfccs[i] = level_mfccs[i-1]
+ self._step_end()
+
+ self.log(u"Clearing AudioFile object...")
+ self.rconf = level_rconfs[1]
+ self._clear_audio_file(audio_file)
+ self.log(u"Clearing AudioFile object... done")
+
+ # compute head tail for the entire real wave (level 1)
+ self._step_begin(u"compute head tail")
+ (head_length, process_length, tail_length) = self._compute_head_process_tail(level_mfccs[1])
+ level_mfccs[1].set_head_middle_tail(head_length, process_length, tail_length)
+ self._step_end()
+
+ # compute alignment at each level
+ tree = Tree()
+ sync_roots = [tree]
+ text_files = [self.task.text_file]
+ aht = [None, True, False, False]
+ aba = [None, True, True, False]
+ for i in range(1, len(level_rconfs)):
+ self._step_begin(u"compute alignment level %d" % i)
+ text_files, sync_roots = self._execute_level(i, level_rconfs[i], level_mfccs[i], text_files, sync_roots, aht[i], aba[i])
+ self._step_end()
+
+ self._step_begin(u"select levels")
+ tree = self._select_levels(tree)
+ self._step_end()
+
+ self._step_begin(u"create sync map")
+ self.rconf = orig_rconf
+ self.task.sync_map = self._create_syncmap(tree)
+ self._step_end()
+
+ self._step_begin(u"check zero duration")
+ self._check_no_zero(level_rconfs[-1].mws)
+ self._step_end()
+
+ self._step_total()
+ self.log(u"Executing multi level task... done")
+ except Exception as exc:
+ self._step_failure(exc)
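The MFCC loop in `_execute_multi_level_task` extracts MFCCs for level 1 and, for each later level, reuses the previous level's MFCCs when `mws` and `mwl` (presumably the MFCC window shift and window length) are unchanged. The sketch below isolates that decision as a hypothetical helper, `plan_mfcc_extraction`. It uses a 0-based list instead of the 1-based `level_rconfs` list with a `None` placeholder, and the window values in the test are made up for illustration.

```python
from typing import List, Tuple


def plan_mfcc_extraction(window_params: List[Tuple[float, float]]) -> List[bool]:
    """
    Given the (mws, mwl) pair of each level, in order, return one boolean
    per level: True if MFCCs must be extracted afresh for that level,
    False if the previous level's MFCCs can be reused.
    """
    plan = []
    for i, (mws, mwl) in enumerate(window_params):
        if i == 0:
            # the first level always needs its own extraction
            plan.append(True)
        else:
            prev_mws, prev_mwl = window_params[i - 1]
            # re-extract only if either window parameter changed
            plan.append(mws != prev_mws or mwl != prev_mwl)
    return plan
```

For example, if levels 1 and 2 share the same windows and level 3 uses a finer shift, only levels 1 and 3 trigger an extraction. This saves one full pass over the audio.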
- def _cleanup(self):
+ def _execute_level(self, level, rconf, audio_file_mfcc, text_files, sync_roots, add_head_tail, adjust_boundaries):
"""
- Remove all temporary files.
+ Compute the alignment for all the nodes in the given level.
+
+ Return a pair (next_level_text_files, next_level_sync_roots),
+ containing two lists of text file subtrees and sync map subtrees
+ on the next level.
+
+ :param int level: the level
+ :param rconf: the runtime configuration for this level
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param audio_file_mfcc: the audio MFCC representation for this level
+ :type audio_file_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC`
+ :param list text_files: a list of :class:`~aeneas.textfile.TextFile` objects,
+ each representing a (sub)tree of the Task text file
+ :param list sync_roots: a list of :class:`~aeneas.tree.Tree` objects,
+ each representing a SyncMapFragment tree,
+ one for each element in ``text_files``
+ :param bool add_head_tail: if ``True``, add head and tail nodes to the sync map tree
+ :param bool adjust_boundaries: if ``True``, execute the adjust boundary algorithm
+ :rtype: (list, list)
"""
- self._log(u"Cleaning up...")
- for info in self.cleanup_info:
- handler, path = info
- self._log([u"Removing file '%s'", path])
- gf.delete_file(handler, path)
- self.cleanup_info = []
- self._log(u"Cleaning up... done")
-
- def _convert(self):
+ self.rconf = rconf
+ i = 0
+ next_level_text_files = []
+ next_level_sync_roots = []
+ for text_file in text_files:
+ self.log([u"Text level %d, fragment %d", level, i])
+ self.log([u" Len: %d", len(text_file)])
+ sync_root = sync_roots[i]
+ if (level > 1) and (len(text_file) == 1):
+ self.log(u" Level > 1 and only one child => returning trivial timemap")
+ time_map = [
+ (TimeValue("0.000"), sync_root.value.begin),
+ (sync_root.value.begin, sync_root.value.end),
+ (sync_root.value.end, audio_file_mfcc.audio_length)
+ ]
+ else:
+ self.log(u" Level 1 or more than one child => computing timemap")
+ if not sync_root.is_empty:
+ begin = sync_root.value.begin
+ end = sync_root.value.end
+ self.log([u" Begin: %.3f", begin])
+ self.log([u" End: %.3f", end])
+ audio_file_mfcc.set_head_middle_tail(head_length=begin, middle_length=(end - begin))
+ else:
+ self.log(u" No begin or end to set")
+ time_map = self._execute_inner(audio_file_mfcc, text_file, adjust_boundaries=adjust_boundaries, log=False)
+ self.log([u" Map: %s", str(time_map)])
+ self._level_time_map_to_tree(text_file, time_map, sync_root, add_head_tail=add_head_tail)
+ # store next level roots
+ next_level_text_files.extend(text_file.children_not_empty)
+ src = sync_root.children
+ if add_head_tail:
+ # if we added head and tail,
+ # we must not pass them to the next level
+ src = src[1:-1]
+ next_level_sync_roots.extend(src)
+ i += 1
+ return (next_level_text_files, next_level_sync_roots)
+
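`_execute_level` handles two special cases. A node at level > 1 with exactly one child gets a trivial three-interval time map (HEAD, the child spanning the parent's interval, TAIL). When head and tail nodes were added, they are stripped before the children are passed on to the next level. The sketch below shows both cases with plain floats and lists in place of `TimeValue` and `Tree`, which is a simplification for illustration. The helpers `trivial_time_map` and `next_level_roots` are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def trivial_time_map(parent_begin: float, parent_end: float, audio_length: float) -> List[Interval]:
    # HEAD, then the single child covering the whole parent interval, then TAIL
    return [
        (0.0, parent_begin),
        (parent_begin, parent_end),
        (parent_end, audio_length),
    ]


def next_level_roots(children: Sequence, add_head_tail: bool) -> list:
    # when HEAD and TAIL were added, they are the first and last children
    # and must not be passed down to the next level
    return list(children[1:-1]) if add_head_tail else list(children)
```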
+ def _execute_inner(self, audio_file_mfcc, text_file, adjust_boundaries=True, log=True):
"""
- Convert the entire audio file into a ``wav`` file.
+ Align a subinterval of the given AudioFileMFCC
+ with the given TextFile.
- (Head/tail will be cut off later.)
+ Return the computed time map, as a list of intervals.
- Return a pair:
+ The begin and end positions inside the AudioFileMFCC
+ must have been set beforehand by the caller.
- 1. handler of the generated wave file
- 2. path of the generated wave file
+ The text fragments being aligned are the vchildren of ``text_file``.
+
+ :param audio_file_mfcc: the audio file MFCC representation
+ :type audio_file_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC`
+ :param text_file: the text file subtree to align
+ :type text_file: :class:`~aeneas.textfile.TextFile`
+ :param bool adjust_boundaries: if ``True``, execute the adjust boundary algorithm
+ :param bool log: if ``True``, log steps
+ :rtype: list
+ """
+ self._step_begin(u"synthesize text", log=log)
+ synt_handler, synt_path, synt_anchors, synt_mono = self._synthesize(text_file)
+ self._step_end(log=log)
+
+ self._step_begin(u"extract MFCC synt wave", log=log)
+ synt_wave_mfcc = self._extract_mfcc(file_path=synt_path, file_path_is_mono_wave=synt_mono)
+ gf.delete_file(synt_handler, synt_path)
+ self._step_end(log=log)
+
+ self._step_begin(u"align waves", log=log)
+ indices = self._align_waves(audio_file_mfcc, synt_wave_mfcc, synt_anchors)
+ self._step_end(log=log)
+
+ self._step_begin(u"adjust boundaries", log=log)
+ time_map = self._adjust_boundaries(audio_file_mfcc, text_file, indices, adjust_boundaries)
+ self._step_end(log=log)
+
+ return time_map
+
+ def _load_audio_file(self):
"""
- self._log(u"Converting real audio to wav")
- handler = None
- path = None
- self._log(u"Creating an output tmp file")
- handler, path = gf.tmp_file(suffix=u".wav", root=self.rconf["tmp_path"])
- self._log(u"Creating a FFMPEGWrapper")
- ffmpeg = FFMPEGWrapper(rconf=self.rconf, logger=self.logger)
- self._log(u"Converting...")
- ffmpeg.convert(
- input_file_path=self.task.audio_file_path_absolute,
- output_file_path=path
+ Load audio in memory.
+
+ :rtype: :class:`~aeneas.audiofile.AudioFile`
+ """
+ self._step_begin(u"load audio file")
+ audio_file = AudioFile(
+ file_path=self.task.audio_file_path_absolute,
+ is_mono_wave=False,
+ rconf=self.rconf,
+ logger=self.logger
)
- self._log(u"Converting... done")
- self._log(u"Converting real audio to wav: succeeded")
- return (handler, path)
+ audio_file.read_samples_from_file()
+ self._step_end()
+ return audio_file
- def _extract_mfcc(self, audio_file_path):
+ def _clear_audio_file(self, audio_file):
"""
- Extract the MFCCs of the real full wave.
+ Clear audio from memory.
- Return a pair:
+ :param audio_file: the object to clear
+ :type audio_file: :class:`~aeneas.audiofile.AudioFile`
+ """
+ self._step_begin(u"clear audio file")
+ audio_file.clear_data()
+ audio_file = None
+ self._step_end()
- 1. audio MFCCs
- 2. audio length
+ def _extract_mfcc(self, file_path=None, file_path_is_mono_wave=False, audio_file=None):
"""
- self._log(u"Extracting MFCCs from real full wave")
- audio_file = AudioFileMonoWAVE(audio_file_path, rconf=self.rconf, logger=self.logger)
- audio_file.extract_mfcc()
- self._log(u"Extracting MFCCs from real full wave: succeeded")
- return (audio_file.audio_mfcc, audio_file.audio_length)
+ Extract the MFCCs from the given audio file.
- def _cut_head_tail(self, audio_file_path):
+ :rtype: :class:`~aeneas.audiofilemfcc.AudioFileMFCC`
+ """
+ return AudioFileMFCC(
+ file_path=file_path,
+ file_path_is_mono_wave=file_path_is_mono_wave,
+ audio_file=audio_file,
+ rconf=self.rconf,
+ logger=self.logger
+ )
+
+ def _compute_head_process_tail(self, audio_file_mfcc):
"""
Set the audio file head or tail,
- suitably cutting the audio file on disk,
- and setting the corresponding parameters in the task configuration.
+ by either reading the explicit values
+ from the Task configuration,
+ or using SD to determine them.
- Return ``True`` if head or tail has been cut;
- otherwise return ``False`` (real wave file not modified)
+ This function returns the lengths, in seconds,
+ of the (head, process, tail).
- :rtype: bool
+ :rtype: tuple (float, float, float)
"""
- self._log(u"Setting head and/or tail")
head_length = self.task.configuration["i_a_head"]
process_length = self.task.configuration["i_a_process"]
tail_length = self.task.configuration["i_a_tail"]
- detect_head_max = self.task.configuration["i_a_head_max"]
- detect_head_min = self.task.configuration["i_a_head_min"]
- detect_tail_max = self.task.configuration["i_a_tail_max"]
- detect_tail_min = self.task.configuration["i_a_tail_min"]
-
- # explicit head or process?
- explicit = (
+ head_max = self.task.configuration["i_a_head_max"]
+ head_min = self.task.configuration["i_a_head_min"]
+ tail_max = self.task.configuration["i_a_tail_max"]
+ tail_min = self.task.configuration["i_a_tail_min"]
+ if (
(head_length is not None) or
(process_length is not None) or
(tail_length is not None)
- )
-
- # at least one detect parameter?
- detect = (
- (detect_head_min is not None) or
- (detect_head_max is not None) or
- (detect_tail_min is not None) or
- (detect_tail_max is not None)
- )
-
- if not (explicit or detect):
- # nothing to do
- self._log(u"No explicit head/process or detect head/tail")
- self._log(u"Setting head and/or tail: succeeded")
- return False
-
- # we need to cut head/tail, hence load the audio data
- audio_file = AudioFileMonoWAVE(audio_file_path, rconf=self.rconf, logger=self.logger)
- audio_file.load_data()
-
- if explicit:
- self._log(u"Explicit head, process, or tail")
- else:
- self._log(u"No explicit head, process, or tail => detecting head/tail")
-
- head = 0.0
- if (detect_head_min is not None) or (detect_head_max is not None):
- self._log(u"Detecting head...")
- detect_head_min = gf.safe_float(detect_head_min, SD.MIN_HEAD_LENGTH)
- detect_head_max = gf.safe_float(detect_head_max, SD.MAX_HEAD_LENGTH)
- self._log([u"detect_head_min is %.3f", detect_head_min])
- self._log([u"detect_head_max is %.3f", detect_head_max])
- start_detector = SD(audio_file, self.task.text_file, rconf=self.rconf, logger=self.logger)
- head = start_detector.detect_head(detect_head_min, detect_head_max)
- self._log([u"Detected head: %.3f", head])
-
- tail = 0.0
- if (detect_tail_min is not None) or (detect_tail_max is not None):
- self._log(u"Detecting tail...")
- detect_tail_max = gf.safe_float(detect_tail_max, SD.MAX_TAIL_LENGTH)
- detect_tail_min = gf.safe_float(detect_tail_min, SD.MIN_TAIL_LENGTH)
- self._log([u"detect_tail_min is %.3f", detect_tail_min])
- self._log([u"detect_tail_max is %.3f", detect_tail_max])
- start_detector = SD(audio_file, self.task.text_file, rconf=self.rconf, logger=self.logger)
- tail = start_detector.detect_tail(detect_tail_min, detect_tail_max)
- self._log([u"Detected tail: %.3f", tail])
-
- head_length = max(0, head)
- process_length = max(0, audio_file.audio_length - tail - head)
- tail_length = audio_file.audio_length - head_length - process_length
-
- # we need to set these values
- # in the config object for later use
- self.task.configuration["i_a_head"] = head_length
- self.task.configuration["i_a_process"] = process_length
- self._log([u"Set head_length: %.3f", head_length])
- self._log([u"Set process_length: %.3f", process_length])
-
- # in case we are reading from config object
- if head_length is not None:
- self._log(u"head_length is not None, converting to float")
- head_length = float(head_length)
+ ):
+ self.log(u"Setting explicit head process tail")
else:
- self._log(u"head_length is None: setting it to 0.0")
- head_length = 0.0
- # note that process_length and tail_length are mutually exclusive
- # with process_length having precedence over tail_length
- if process_length is not None:
- self._log(u"process_length is not None, converting to float")
- process_length = float(process_length)
- if tail_length is not None:
- self._log(u"tail_length is not None, but it will be ignored")
- tail_length = float(tail_length)
- elif tail_length is not None:
- self._log(u"tail_length is not None, converting to float")
- tail_length = float(tail_length)
- self._log(u"computing process_length from tail_length")
- process_length = audio_file.audio_length - head_length - tail_length
-
- self._log([u"is_audio_file_head_length is %s", str(head_length)])
- self._log([u"is_audio_file_process_length is %s", str(process_length)])
- self._log([u"is_audio_file_tail_length is %s", str(tail_length)])
-
- self._log(u"Trimming audio data...")
- audio_file.trim(head_length, process_length)
- self._log(u"Trimming audio data... done")
-
- self._log(u"Writing audio file...")
- audio_file.write(audio_file_path)
- self._log(u"Writing audio file... done")
-
- self._log(u"Clearing audio data...")
- audio_file.clear_data()
- self._log(u"Clearing audio data... done")
-
- self._log(u"Setting head and/or tail: succeeded")
- return True
-
- def _synthesize(self):
+ self.log(u"Detecting head tail...")
+ sd = SD(audio_file_mfcc, self.task.text_file, rconf=self.rconf, logger=self.logger)
+ head_length = TimeValue("0.000")
+ process_length = None
+ tail_length = TimeValue("0.000")
+ if (head_min is not None) or (head_max is not None):
+ self.log(u"Detecting HEAD...")
+ head_length = sd.detect_head(head_min, head_max)
+ self.log([u"Detected HEAD: %.3f", head_length])
+ self.log(u"Detecting HEAD... done")
+ if (tail_min is not None) or (tail_max is not None):
+ self.log(u"Detecting TAIL...")
+ tail_length = sd.detect_tail(tail_min, tail_max)
+ self.log([u"Detected TAIL: %.3f", tail_length])
+ self.log(u"Detecting TAIL... done")
+ self.log(u"Detecting head tail... done")
+ self.log([u"Head: %s", gf.safe_float(head_length, None)])
+ self.log([u"Process: %s", gf.safe_float(process_length, None)])
+ self.log([u"Tail: %s", gf.safe_float(tail_length, None)])
+ return (head_length, process_length, tail_length)
+
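The new `_compute_head_process_tail` passes explicit values straight to `set_head_middle_tail`, whose behavior is not shown in this diff. The removed lines above spell out the precedence rules that applied before: a missing head means 0.0, and `process_length` takes precedence over `tail_length`. The sketch below applies those rules to an audio length in seconds. `resolve_head_process_tail` is a hypothetical helper. The clamping of out-of-range values is an added assumption and does not appear in the diff.

```python
from typing import Optional, Tuple


def resolve_head_process_tail(
    audio_length: float,
    head: Optional[float] = None,
    process: Optional[float] = None,
    tail: Optional[float] = None,
) -> Tuple[float, float, float]:
    """Return (head, process, tail) lengths in seconds, summing to audio_length."""
    # a missing head means processing starts at the beginning of the audio;
    # clamp it into [0, audio_length] (assumption, not in the diff)
    head = 0.0 if head is None else min(max(0.0, float(head)), audio_length)
    if process is not None:
        # process takes precedence over tail: an explicit tail is ignored
        process = float(process)
    elif tail is not None:
        # derive process from the explicit tail
        process = audio_length - head - float(tail)
    else:
        # nothing explicit: process everything after the head
        process = audio_length - head
    # clamp process so that head + process never exceeds the audio length
    process = max(0.0, min(process, audio_length - head))
    # whatever is left after head + process is the tail
    tail = audio_length - head - process
    return (head, process, tail)
```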
+ def _synthesize(self, text_file):
"""
- Synthesize text into a ``wav`` file.
+ Synthesize text into a WAVE file.
- Return a triple:
+ Return a tuple:
1. handler of the generated wave file
2. path of the generated wave file
@@ -411,238 +481,211 @@ def _synthesize(self):
each representing the start time of the corresponding
text fragment in the generated wave file
``[start_1, start_2, ..., start_n]``
- """
- self._log(u"Synthesizing text")
- handler = None
- path = None
- anchors = None
- self._log(u"Creating an output tmp file")
- handler, path = gf.tmp_file(suffix=u".wav", root=self.rconf["tmp_path"])
- self._log(u"Creating Synthesizer object")
- synt = Synthesizer(rconf=self.rconf, logger=self.logger)
- self._log(u"Synthesizing...")
- result = synt.synthesize(self.task.text_file, path)
- anchors = result[0]
- self._log(u"Synthesizing... done")
- self._log(u"Synthesizing text: succeeded")
- return (handler, path, anchors)
+ 4. ``True`` if the synthesizer produced a PCM16 mono WAVE file
- def _align_waves(self, real_path, synt_path, real_full_wave_full_mfcc=None, real_full_wave_length=None):
+ :param text_file: the text file to synthesize
+ :type text_file: :class:`~aeneas.textfile.TextFile`
+ :rtype: tuple (handler, string, list, bool)
"""
- Align two ``wav`` files.
-
- Return the computed alignment map, that is,
- a list of pairs of floats, each representing
- corresponding time instants
- in the real and synt wave, respectively
- ``[real_time, synt_time]``
+ synthesizer = Synthesizer(rconf=self.rconf, logger=self.logger)
+ handler, path = gf.tmp_file(suffix=u".wav", root=self.rconf[RuntimeConfiguration.TMP_PATH])
+ result = synthesizer.synthesize(text_file, path)
+ anchors = result[0]
+ return (handler, path, anchors, synthesizer.output_is_mono_wave)
- If ``real_full_wave_full_mfcc`` and ``real_full_wave_length``
- are not None, use them instead of computing MFCCs again.
+ def _align_waves(self, real_wave_mfcc, synt_wave_mfcc, synt_anchors):
"""
- self._log(u"Aligning waves")
- self._log(u"Creating DTWAligner object")
- aligner = DTWAligner(real_path, synt_path, rconf=self.rconf, logger=self.logger)
- self._log(u"Computing MFCC...")
- if (real_full_wave_full_mfcc is not None) and (real_full_wave_length is not None):
- self._log(u"Using real wave MFCCs already computed")
- aligner.real_wave_full_mfcc = real_full_wave_full_mfcc
- aligner.real_wave_length = real_full_wave_length
- aligner.compute_mfcc(real_wave=False, synt_wave=True)
- else:
- self._log(u"Computing both real and synt wave MFCCs")
- aligner.compute_mfcc(real_wave=True, synt_wave=True)
- self._log(u"Computing MFCC... done")
- self._log(u"Computing path...")
- aligner.compute_path()
- self._log(u"Computing path... done")
- self._log(u"Computing map...")
- computed_map = aligner.computed_map
- self._log(u"Computing map... done")
- self._log(u"Aligning waves: succeeded")
- return computed_map
-
- def _align_text(self, wave_map, synt_anchors):
- """
- Align the text with the real wave,
- using the ``wave_map`` (containing the mapping
- between real and synt waves) and ``synt_anchors``
- (containing the start times of text fragments
- in the synt wave).
-
- Return the computed interval map, that is,
- a list of triples ``[start_time, end_time, fragment_id]``
- """
- self._log(u"Aligning text")
- self._log([u"Number of frames: %d", len(wave_map)])
- self._log([u"Number of fragments: %d", len(synt_anchors)])
-
- real_times = numpy.array([t[0] for t in wave_map])
- synt_times = numpy.array([t[1] for t in wave_map])
- real_anchors = []
- anchor_index = 0
- # TODO numpy-fy this loop
- for anchor in synt_anchors:
- time, fragment_id, fragment_text = anchor
- self._log(u"Looking for argmin index...")
- # TODO allow an user-specified function instead of min
- # partially solved by AdjustBoundaryAlgorithm
- index = (numpy.abs(synt_times - time)).argmin()
- self._log(u"Looking for argmin index... done")
- real_time = real_times[index]
- real_anchors.append([real_time, fragment_id, fragment_text])
- self._log([u"Time for anchor %d: %f", anchor_index, real_time])
- anchor_index += 1
-
- # dummy last anchor, starting at the real file duration
- real_anchors.append([real_times[-1], None, None])
-
- # compute map
- self._log(u"Computing interval map...")
- # TODO numpy-fy this loop
- computed_map = []
- for i in range(len(real_anchors) - 1):
- fragment_id = real_anchors[i][1]
- fragment_text = real_anchors[i][2]
- start = real_anchors[i][0]
- end = real_anchors[i+1][0]
- computed_map.append([start, end, fragment_id, fragment_text])
- self._log(u"Computing interval map... done")
- self._log(u"Aligning text: succeeded")
- return computed_map
-
- def _translate_text_map(self, text_map, real_full_wave_length):
- """
- Translate the text_map by adding head and tail dummy fragments
+ Align two AudioFileMFCC objects,
+ representing WAVE files.
- Return the translated text map
+ Return a list of boundary indices.
"""
- translated = []
- head = gf.safe_float(self.task.configuration["i_a_head"], 0)
- translated.append([0, head, None, None])
- end = 0
- for element in text_map:
- start, end, fragment_id, fragment_text = element
- start += head
- end += head
- translated.append([start, end, fragment_id, fragment_text])
- translated.append([end, real_full_wave_length, None, None])
- return translated
-
- def _adjust_boundaries(
- self,
- text_map,
- real_wave_full_mfcc,
- real_wave_length
- ):
+ self.log(u"Creating DTWAligner...")
+ aligner = DTWAligner(real_wave_mfcc, synt_wave_mfcc, rconf=self.rconf, logger=self.logger)
+ self.log(u"Creating DTWAligner... done")
+ self.log(u"Computing boundary indices...")
+ boundary_indices = aligner.compute_boundaries(synt_anchors)
+ self.log(u"Computing boundary indices... done")
+ return boundary_indices
+
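`_align_waves` delegates to `DTWAligner.compute_boundaries`, whose internals are not part of this diff. As a stand-in, the sketch below computes a plain, unoptimized full-matrix DTW path between two 1-D feature sequences. The real aligner works on MFCC matrices and may use a different or optimized algorithm. The sketch then maps each synthesized anchor frame to a real frame by taking the first path point whose synthesized index is closest to the anchor. This mirrors the argmin lookup in the removed `_align_text` code. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(real: Sequence[float], synt: Sequence[float]) -> List[Tuple[int, int]]:
    """Return the DTW warping path as (real_index, synt_index) pairs."""
    n, m = len(real), len(synt)
    if n == 0 or m == 0:
        return []
    inf = float("inf")
    # accumulated cost matrix with an extra border row/column of infinities
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(real[i - 1] - synt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # backtrack from the bottom-right cell to (1, 1), preferring the diagonal on ties
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        candidates = [(a, b) for (a, b) in candidates if a >= 1 and b >= 1]
        i, j = min(candidates, key=lambda c: cost[c[0]][c[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def anchors_to_real_frames(path: List[Tuple[int, int]], synt_anchor_frames: Sequence[int]) -> List[int]:
    """For each synt anchor frame, return the real frame of the closest path point."""
    result = []
    for anchor in synt_anchor_frames:
        # min() returns the first index with the smallest distance, like numpy argmin
        k = min(range(len(path)), key=lambda idx: abs(path[idx][1] - anchor))
        result.append(path[k][0])
    return result
```

In the test below, the "real" sequence holds each synthesized value for a different number of frames. The recovered boundaries land where the real values change.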
+ def _adjust_boundaries(self, real_wave_mfcc, text_file, boundary_indices, adjust_boundaries=True):
"""
- Adjust the boundaries between consecutive fragments.
+ Adjust boundaries as requested by the user.
- Return the computed interval map, that is,
- a list of triples ``[start_time, end_time, fragment_id]``
+ Return the computed time map, that is,
+ a list of pairs ``[start_time, end_time]``,
+ of length equal to the number of fragments + 2,
+ where the two extra elements are for
+ the HEAD (first) and TAIL (last).
"""
- self._log(u"Adjusting boundaries")
- algo = self.task.configuration["aba_algorithm"]
- value = None
- if algo is None:
- self._log(u"No adjust boundary algorithm specified: returning")
- return text_map
- elif algo == AdjustBoundaryAlgorithm.AUTO:
- self._log(u"Requested adjust boundary algorithm AUTO: returning")
- return text_map
- elif algo == AdjustBoundaryAlgorithm.AFTERCURRENT:
- value = self.task.configuration["aba_aftercurrent_value"]
- elif algo == AdjustBoundaryAlgorithm.BEFORENEXT:
- value = self.task.configuration["aba_beforenext_value"]
- elif algo == AdjustBoundaryAlgorithm.OFFSET:
- value = self.task.configuration["aba_offset_value"]
- elif algo == AdjustBoundaryAlgorithm.PERCENT:
- value = self.task.configuration["aba_percent_value"]
- elif algo == AdjustBoundaryAlgorithm.RATE:
- value = self.task.configuration["aba_rate_value"]
- elif algo == AdjustBoundaryAlgorithm.RATEAGGRESSIVE:
- value = self.task.configuration["aba_rate_value"]
- self._log([u"Requested algo %s and value %s", algo, str(value)])
-
- self._log(u"Running VAD...")
- vad = VAD(real_wave_full_mfcc, real_wave_length, rconf=self.rconf, logger=self.logger)
- vad.compute_vad()
- self._log(u"Running VAD... done")
-
- self._log(u"Creating AdjustBoundaryAlgorithm object")
- adjust_boundary = AdjustBoundaryAlgorithm(
- algorithm=algo,
- text_map=text_map,
- speech=vad.speech,
- nonspeech=vad.nonspeech,
- value=value,
+ # boundary_indices contains the boundary indices in the all_mfcc of real_wave_mfcc,
+ # starting with the (HEAD, first fragment) boundary and ending with the (last fragment, TAIL) boundary
+ if adjust_boundaries:
+ aba_algorithm, aba_parameters = self.task.configuration.aba_parameters()
+ self.log([u"Running algorithm: '%s'", aba_algorithm])
+ else:
+ self.log(u"Forced running algorithm: 'auto'")
+ aba_algorithm = AdjustBoundaryAlgorithm.AUTO
+ aba_parameters = None
+ return AdjustBoundaryAlgorithm(
+ algorithm=aba_algorithm,
+ parameters=aba_parameters,
+ real_wave_mfcc=real_wave_mfcc,
+ boundary_indices=boundary_indices,
+ text_file=text_file,
rconf=self.rconf,
logger=self.logger
- )
- self._log(u"Adjusting boundaries...")
- adjusted_map = adjust_boundary.adjust()
- self._log(u"Adjusting boundaries... done")
- self._log(u"Adjusting boundaries: succeeded")
- return adjusted_map
+ ).to_time_map()
- def _create_syncmap(self, adjusted_map):
+ def _level_time_map_to_tree(self, text_file, time_map, tree=None, add_head_tail=True):
"""
- Create a sync map out of the provided interval map,
- and store it in the task object.
+ Convert a level time map into a Tree of SyncMapFragments.
+
+ The time map is
+ a list of pairs ``[start_time, end_time]``,
+ of length equal to the number of fragments + 2,
+ where the two extra elements are for
+ the HEAD (first) and TAIL (last).
+
+ :param text_file: the text file object
+ :type text_file: :class:`~aeneas.textfile.TextFile`
+ :param list time_map: the time map
+ :param tree: the tree; if ``None``, a new Tree will be built
+ :type tree: :class:`~aeneas.tree.Tree`
+ :rtype: :class:`~aeneas.tree.Tree`
"""
- self._log(u"Creating sync map")
- self._log([u"Number of fragments in adjusted map (including HEAD and TAIL): %d", len(adjusted_map)])
+ if tree is None:
+ tree = Tree()
+ if add_head_tail:
+ fragments = (
+ [TextFragment(u"HEAD", self.task.configuration["language"], [u""])] +
+ text_file.fragments +
+ [TextFragment(u"TAIL", self.task.configuration["language"], [u""])]
+ )
+ i = 0
+ else:
+ fragments = text_file.fragments
+ i = 1
+ for fragment in fragments:
+ interval = time_map[i]
+ sm_frag = SyncMapFragment(fragment, interval[0], interval[1])
+ tree.add_child(Tree(value=sm_frag))
+ i += 1
+ return tree
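`_level_time_map_to_tree` relies on the time map having two more entries than there are fragments, with HEAD first and TAIL last. With `add_head_tail=True`, every interval is used, with synthetic HEAD/TAIL labels. Otherwise, indexing starts at 1, so the HEAD and TAIL intervals are skipped. The sketch below reproduces that pairing with plain fragment identifiers in place of `TextFragment` and `SyncMapFragment`. The length check is an added assumption; the diff does not enforce it. `pair_fragments_with_intervals` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def pair_fragments_with_intervals(
    fragment_ids: Sequence[str],
    time_map: Sequence[Interval],
    add_head_tail: bool = True,
) -> List[Tuple[str, Interval]]:
    # the time map holds HEAD, one interval per fragment, then TAIL
    if len(time_map) != len(fragment_ids) + 2:
        raise ValueError("time map must have exactly two more entries than fragments")
    if add_head_tail:
        labels = ["HEAD"] + list(fragment_ids) + ["TAIL"]
        intervals = list(time_map)
    else:
        # skip the HEAD (first) and TAIL (last) intervals
        labels = list(fragment_ids)
        intervals = list(time_map[1:-1])
    return list(zip(labels, intervals))
```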
- # adjusted map has 2 elements (HEAD and TAIL) more than text_file
- #if len(adjusted_map) != len(self.task.text_file.fragments) + 2:
- # self._log(u"The number of sync map fragments does not match the number of text fragments (+2)", Logger.CRITICAL)
- # return False
+ def _select_levels(self, tree):
+ """
+ Select the correct levels in the tree,
+ reading the ``os_task_file_levels``
+ parameter in the Task configuration.
- sync_map = SyncMap()
- head = adjusted_map[0]
- tail = adjusted_map[-1]
+ If ``None`` or invalid, return the current sync map tree
+ unchanged.
+ Otherwise, return only the levels appearing in it.
- # get language
- language = Language.EN
- self._log([u"Language set to default: %s", language])
- if len(self.task.text_file.fragments) > 0:
- language = self.task.text_file.fragments[0].language
- self._log([u"Language read from text_file: %s", language])
+ :param tree: a Tree of SyncMapFragments
+ :type tree: :class:`~aeneas.tree.Tree`
+ :rtype: :class:`~aeneas.tree.Tree`
+ """
+ levels = self.task.configuration["o_levels"]
+ self.log([u"Levels: '%s'", levels])
+ if (levels is None) or (len(levels) < 1):
+ return tree
+ try:
+ levels = [int(l) for l in levels if int(l) > 0]
+ self.log([u"Converted levels: %s", levels])
+ except ValueError:
+ self.log_warn(u"Cannot convert levels to list of int, returning unchanged")
+ return tree
+ # remove head and tail nodes
+ head = tree.vchildren[0]
+ tail = tree.vchildren[-1]
+ tree.remove_child(0)
+ tree.remove_child(-1)
+ # keep only the selected levels
+ tree.keep_levels(levels)
+ # add head and tail back
+ tree.add_child(Tree(value=head), as_last=False)
+ tree.add_child(Tree(value=tail), as_last=True)
+ # return the new tree
+ return tree
+
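`_select_levels` iterates over the `o_levels` value and converts each item with `int()`. This suggests the value is a string of digits such as `"13"`, meaning "keep levels 1 and 3". That reading is an assumption based only on the code shown. The sketch below mirrors the parsing step. It returns `None` in the cases where `_select_levels` leaves the tree unchanged: a missing or empty value, or a non-numeric character. Zeros are dropped, as in the original comprehension. `parse_levels` is a hypothetical helper.

```python
from typing import List, Optional


def parse_levels(levels: Optional[str]) -> Optional[List[int]]:
    # None or empty: the caller keeps the tree unchanged
    if levels is None or len(levels) < 1:
        return None
    try:
        # keep only strictly positive level numbers, in the given order
        return [int(c) for c in levels if int(c) > 0]
    except ValueError:
        # a non-numeric character: the caller keeps the tree unchanged
        return None
```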
+ def _create_syncmap(self, tree):
+ """
+ Return a sync map corresponding to the provided tree of SyncMapFragments.
- # get head/tail format
+ :param tree: a Tree of SyncMapFragments
+ :type tree: :class:`~aeneas.tree.Tree`
+ :rtype: :class:`~aeneas.syncmap.SyncMap`
+ """
+ self.log([u"Fragments in time map (including HEAD/TAIL): %d", len(tree)])
head_tail_format = self.task.configuration["o_h_t_format"]
- self._log([u"Head/tail format: %s", str(head_tail_format)])
+ self.log([u"Head/tail format: %s", str(head_tail_format)])
+
+ children = tree.vchildren
+ head = children[0]
+ first = children[1]
+ last = children[-2]
+ tail = children[-1]
- # add head sync map fragment if needed
- if head_tail_format == SyncMapHeadTailFormat.ADD:
- head_frag = TextFragment(u"HEAD", language, [u""])
- sync_map_frag = SyncMapFragment(head_frag, head[0], head[1])
- sync_map.append_fragment(sync_map_frag)
- self._log([u"Adding head (ADD): %.3f %.3f", head[0], head[1]])
+ # remove HEAD fragment if needed
+ if head_tail_format != SyncMapHeadTailFormat.ADD:
+ tree.remove_child(0)
+ self.log(u"Removed HEAD")
# stretch first and last fragment timings if needed
if head_tail_format == SyncMapHeadTailFormat.STRETCH:
- self._log([u"Stretching (STRETCH): %.3f => %.3f (head) and %.3f => %.3f (tail)", adjusted_map[1][0], head[0], adjusted_map[-2][1], tail[1]])
- adjusted_map[1][0] = head[0]
- adjusted_map[-2][1] = tail[1]
-
- i = 1
- for fragment in self.task.text_file.fragments:
- start = adjusted_map[i][0]
- end = adjusted_map[i][1]
- sync_map_frag = SyncMapFragment(fragment, start, end)
- sync_map.append_fragment(sync_map_frag)
- i += 1
+ self.log([u"Stretched first.begin: %.3f => %.3f (head)", first.begin, head.begin])
+ self.log([u"Stretched last.end: %.3f => %.3f (tail)", last.end, tail.end])
+ first.begin = head.begin
+ last.end = tail.end
- # add tail sync map fragment if needed
- if head_tail_format == SyncMapHeadTailFormat.ADD:
- tail_frag = TextFragment(u"TAIL", language, [u""])
- sync_map_frag = SyncMapFragment(tail_frag, tail[0], tail[1])
- sync_map.append_fragment(sync_map_frag)
- self._log([u"Adding tail (ADD): %.3f %.3f", tail[0], tail[1]])
+ # remove TAIL fragment if needed
+ if head_tail_format != SyncMapHeadTailFormat.ADD:
+ tree.remove_child(-1)
+ self.log(u"Removed TAIL")
- self.task.sync_map = sync_map
- self._log(u"Creating sync map: succeeded")
+ # return sync map
+ sync_map = SyncMap()
+ sync_map.fragments_tree = tree
+ return sync_map
+
+ # TODO can this be done during the alignment?
+ def _check_no_zero(self, min_mws):
+ """ Check for fragments with zero duration """
+ if self.task.configuration["o_no_zero"]:
+ self.log(u"Checking for fragments with zero duration...")
+ # TODO use min_mws when doable, e.g. only one fragment?
+ delta = TimeValue("0.001")
+ leaves = self.task.sync_map.fragments_tree.vleaves_not_empty
+ # first and last leaves are HEAD and TAIL, skipping them
+ max_index = len(leaves) - 1
+ self.log([u"Fragment min index: %d", 1])
+ self.log([u"Fragment max index: %d", max_index - 1])
+ for i in range(1, max_index):
+ self.log([u"Checking index: %d", i])
+ j = i
+ while (j < max_index) and (leaves[j].end == leaves[i].begin):
+ j += 1
+ if j != i:
+ self.log(u"Fragment(s) with zero duration:")
+ for k in range(i, j):
+ self.log([u" %d : %s", k, leaves[k]])
+
+ if leaves[j].end - leaves[j].begin > (j - i) * delta:
+ # there is room after
+ # to give each zero-duration fragment a duration of 0.001
+ for k in range(j - i):
+ shift = (k + 1) * delta
+ leaves[i + k].end += shift
+ leaves[i + k + 1].begin += shift
+ self.log([u" Moved fragment %d forward by %.3f", i + k, shift])
+ else:
+ self.log_warn(u" Unable to fix")
+ i = j - 1
+ self.log(u"Checking for fragments with zero duration... done")
+ else:
+ self.log(u"Not checking for fragments with zero duration")
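The zero-duration repair in `_check_no_zero()` can be illustrated on plain data. This is a minimal sketch, not the aeneas implementation. It assumes fragments are mutable `[begin, end]` lists of `Decimal` seconds instead of `SyncMapFragment` leaves, with HEAD at index 0 and TAIL at the last index, and `fix_zero_duration` is a hypothetical name. It uses a `while` loop so that a processed run is actually skipped. In the diff, `i = j - 1` inside `for i in range(...)` does not change the next value of `i`.

```python
from decimal import Decimal


def fix_zero_duration(intervals, delta=Decimal("0.001")):
    """
    Give each zero-duration inner interval a duration of ``delta``.

    ``intervals`` is a list of mutable ``[begin, end]`` pairs (Decimal seconds).
    Index 0 is HEAD and the last index is TAIL, so neither is checked itself.
    Time is borrowed from the first following interval with positive
    duration, if it is long enough. Returns the indices left unfixed.
    """
    unfixed = []
    max_index = len(intervals) - 1
    i = 1
    while i < max_index:
        # find the run [i, j) of zero-duration intervals starting at i
        j = i
        while j < max_index and intervals[j][1] == intervals[i][0]:
            j += 1
        if j == i:
            i += 1
            continue
        run = j - i
        if intervals[j][1] - intervals[j][0] > run * delta:
            # interval i + k ends (k + 1) * delta later than before,
            # and the next interval starts at that same instant
            for k in range(run):
                shift = (k + 1) * delta
                intervals[i + k][1] += shift
                intervals[i + k + 1][0] += shift
        else:
            unfixed.extend(range(i, j))
        # continue right after the run
        i = j
    return unfixed
```

For example, with inner intervals `[1, 1]`, `[1, 1]`, `[1, 3]`, the first two become `[1, 1.001]` and `[1.001, 1.002]`, and the third then starts at `1.002`.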
diff --git a/aeneas/extra/.gitignore b/aeneas/extra/.gitignore
new file mode 100644
index 00000000..ef3a94cf
--- /dev/null
+++ b/aeneas/extra/.gitignore
@@ -0,0 +1 @@
+ctw_speect
diff --git a/aeneas/extra/README.md b/aeneas/extra/README.md
new file mode 100644
index 00000000..06a37b80
--- /dev/null
+++ b/aeneas/extra/README.md
@@ -0,0 +1,81 @@
+# aeneas extras
+
+This Python module (directory) contains
+a collection of extra tools for aeneas,
+mainly custom TTS engine wrappers.
+
+
+
+## `ctw_espeak.py`
+
+A wrapper for the `eSpeak` TTS engine
+that executes `eSpeak` via `subprocess`.
+
+This file is an example to illustrate
+how to write a custom TTS wrapper,
+and how to use it at runtime:
+
+1. Copy the `ctw_espeak.py` file to `/tmp/ctw_espeak.py`
+ (or any other directory you like).
+
+2. Run any `aeneas.tools.*` with the following options:
+
+ ```
+ -r="tts=custom|tts_path=/tmp/ctw_espeak.py"
+ ```
+
+ For example:
+
+ ```bash
+ python -m aeneas.tools.execute_task --example-srt -r="tts=custom|tts_path=/tmp/ctw_espeak.py"
+ ```
+
+For details, please inspect the `ctw_espeak.py` file,
+which is heavily commented and should help you
+create a new wrapper for your own TTS engine.
+
+Note: if you want to use `eSpeak` as your TTS engine
+in a production environment,
+do NOT use the `ctw_espeak.py` wrapper!
+`eSpeak` is the default TTS engine of `aeneas`,
+and the `aeneas.espeakwrapper` in the main library
+is faster than the `ctw_espeak.py` wrapper.
+
+
+
+## `ctw_speect.py`
+
+A wrapper for the `Speect` TTS engine
+that synthesizes text via Python calls
+to the `speect` Python module.
+
+To use it, do the following:
+
+1. Install `Speect` and compile the Python module `speect`:
+ see [http://speect.sourceforge.net/](http://speect.sourceforge.net/) for details.
+
+2. Download a voice for `Speect`, for example the `Speect CMU Arctic slt` voice
+ (file `cmu_arctic_slt-1.0.tar.gz`
+ from [http://hlt.mirror.ac.za/TTS/Speect/](http://hlt.mirror.ac.za/TTS/Speect/)),
+ and decompress it to `/tmp/cmu_arctic_slt/`
+ (or any other directory you like).
+
+3. Copy the `ctw_speect.py` file to `/tmp/cmu_arctic_slt/ctw_speect.py`
+ (or any other directory you like).
+
+4. Run any `aeneas.tools.*` with the following options:
+
+ ```
+ -r="tts=custom|tts_path=/tmp/cmu_arctic_slt/ctw_speect.py"
+ ```
+
+ For example:
+
+ ```bash
+ python -m aeneas.tools.execute_task --example-srt -r="tts=custom|tts_path=/tmp/cmu_arctic_slt/ctw_speect.py"
+ ```
+
+For details, please inspect the `ctw_speect.py` file.
+
+
+
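The `-r` value above is a list of `key=value` pairs separated by `|`; the double quotes are there for the shell, which would otherwise treat `|` as a pipe. A minimal sketch of splitting such a string, assuming only this simple format (no escaping). `parse_config_string` is a hypothetical helper, not the parser aeneas uses internally.

```python
def parse_config_string(config_string, pair_separator="|", assignment="="):
    """
    Split a string like 'tts=custom|tts_path=/tmp/ctw_espeak.py' into a dict.

    Only the first '=' of each pair separates key and value,
    so values may themselves contain '='. Empty pairs are skipped.
    """
    result = {}
    for pair in config_string.split(pair_separator):
        pair = pair.strip()
        if not pair:
            continue
        key, found, value = pair.partition(assignment)
        if not found:
            raise ValueError("Missing %r in pair %r" % (assignment, pair))
        result[key.strip()] = value.strip()
    return result
```

For example, `parse_config_string("tts=custom|tts_path=/tmp/ctw_espeak.py")` yields a dict with `tts` set to `custom` and `tts_path` set to `/tmp/ctw_espeak.py`.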
diff --git a/aeneas/extra/__init__.py b/aeneas/extra/__init__.py
new file mode 100644
index 00000000..8b698ed4
--- /dev/null
+++ b/aeneas/extra/__init__.py
@@ -0,0 +1,22 @@
+#!/usr/bin/env python
+# coding=utf-8
+
+"""
+aeneas.extra contains a collection of extra tools for aeneas,
+mainly custom TTS engine wrappers.
+"""
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL 3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+
+
+
diff --git a/aeneas/extra/ctw_espeak.py b/aeneas/extra/ctw_espeak.py
new file mode 100644
index 00000000..7f8828f5
--- /dev/null
+++ b/aeneas/extra/ctw_espeak.py
@@ -0,0 +1,152 @@
+#!/usr/bin/env python
+# coding=utf-8
+
+"""
+A wrapper for a custom TTS engine.
+"""
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+from aeneas.language import Language
+from aeneas.ttswrapper import TTSWrapper
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL v3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+class CustomTTSWrapper(TTSWrapper):
+ """
+ A wrapper for the ``espeak`` TTS engine,
+ to illustrate the use of custom TTS wrapper
+ loading at runtime.
+
+ It will perform one or more calls like ::
+
+ $ echo "text to be synthesized" | espeak -v en -w output_file.wav
+
+ This wrapper supports calling the TTS engine
+ only via ``subprocess``.
+
+ To use this TTS engine, specify ::
+
+ "tts=custom|tts_path=/path/to/this/file.py"
+
+ in the ``rconf`` object.
+
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param logger: the logger object
+ :type logger: :class:`~aeneas.logger.Logger`
+ """
+
+ TAG = u"CustomTTSWrapper"
+
+ #
+ # NOTE create aliases for the language codes
+ # supported by this TTS: in this example,
+ # English, Italian, Russian and Ukrainian
+ #
+ ENG = Language.ENG
+ """ English """
+
+ ITA = Language.ITA
+ """ Italian """
+
+ RUS = Language.RUS
+ """ Russian """
+
+ UKR = Language.UKR
+ """ Ukrainian """
+
+ #
+ # NOTE LANGUAGE_TO_VOICE_CODE maps a language code
+ # to the corresponding voice code
+ # supported by this custom TTS wrapper;
+ # mock support for Ukrainian with Russian voice
+ #
+ LANGUAGE_TO_VOICE_CODE = {
+ ENG : "en",
+ ITA : "it",
+ RUS : "ru",
+ UKR : "ru",
+ }
+ DEFAULT_LANGUAGE = ENG
+
+ #
+ # NOTE eSpeak always outputs to PCM16 mono WAVE (RIFF)
+ #
+ OUTPUT_MONO_WAVE = True
+
+ def __init__(self, rconf=None, logger=None):
+ #
+ # NOTE custom TTS wrappers must be implemented
+ # in a class named CustomTTSWrapper
+ # otherwise the Synthesizer will not work
+ #
+ # NOTE this custom TTS wrapper implements
+ # only the subprocess call method
+ # hence we set the following init parameters
+ #
+ super(CustomTTSWrapper, self).__init__(
+ has_subprocess_call=True,
+ has_c_extension_call=False,
+ has_python_call=False,
+ rconf=rconf,
+ logger=logger
+ )
+ #
+ # NOTE this example is minimal, as we implement only
+ # the subprocess call method
+ # hence, all we need to do is to specify
+ # how to map the command line arguments of the TTS engine
+ #
+ # NOTE if our TTS engine were callable via Python or a Python C extension,
+ # we would have needed to write a _synthesize_multiple_python()
+ # or a _synthesize_multiple_c_extension() function,
+ # with the same I/O interface as
+ # _synthesize_multiple_c_extension() in espeakwrapper.py
+ #
+ # NOTE on a command line, you will use eSpeak
+ # to synthesize some text to a WAVE file as follows:
+ #
+ # $ echo "text to be synthesized" | espeak -v en -w output_file.wav
+ #
+ # Observe that text is read from stdin, while the audio data
+ # is written to a file specified by a given output path,
+ # introduced by the "-w" switch.
+ # Also, there is a parameter to select the English voice ("en"),
+ # introduced by the "-v" switch.
+ #
+ self.set_subprocess_arguments([
+ u"/usr/bin/espeak", # path of espeak executable; you can use just "espeak" if it is in your PATH
+ u"-v", # append "-v"
+ TTSWrapper.CLI_PARAMETER_VOICE_CODE_STRING, # it will be replaced by the actual voice code
+ u"-w", # append "-w"
+ TTSWrapper.CLI_PARAMETER_WAVE_PATH, # it will be replaced by the actual output file path
+ TTSWrapper.CLI_PARAMETER_TEXT_STDIN # text is read from stdin
+ ])
+ #
+ # NOTE if your TTS engine only reads text from a file
+ # you can use the TTSWrapper.CLI_PARAMETER_TEXT_PATH placeholder.
+ #
+ # NOTE if your TTS engine only writes audio data to stdout
+ # you can use the TTSWrapper.CLI_PARAMETER_WAVE_STDOUT placeholder.
+ #
+ # NOTE if your TTS engine needs a more complex parameter
+ # for selecting the voice, e.g. Festival needs '-eval "(language_italian)"',
+ # you can implement a _voice_code_to_subprocess() function
+ # and use the TTSWrapper.CLI_PARAMETER_VOICE_CODE_FUNCTION placeholder
+ # instead of the TTSWrapper.CLI_PARAMETER_VOICE_CODE_STRING placeholder.
+ # See the aeneas/festivalwrapper.py file for an example.
+ #
+
+
+
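`set_subprocess_arguments()` describes the command line as a template with placeholders. A rough sketch of how such a template could be expanded for one fragment: the placeholder strings and `expand_arguments` are hypothetical stand-ins for the `TTSWrapper.CLI_PARAMETER_*` constants, whose actual values and handling live in `aeneas/ttswrapper.py`, and the language codes are assumed to be ISO 639-3 strings.

```python
import subprocess

# Hypothetical placeholder values standing in for TTSWrapper.CLI_PARAMETER_*
VOICE_CODE_STRING = "<voice_code>"
WAVE_PATH = "<wave_path>"
TEXT_STDIN = "<text_stdin>"

# Same idea as LANGUAGE_TO_VOICE_CODE (language codes assumed to be ISO 639-3)
LANGUAGE_TO_VOICE_CODE = {"eng": "en", "ita": "it", "rus": "ru", "ukr": "ru"}

TEMPLATE = ["/usr/bin/espeak", "-v", VOICE_CODE_STRING, "-w", WAVE_PATH, TEXT_STDIN]


def expand_arguments(template, voice_code, wave_path):
    """Return (argv, text_on_stdin) with placeholders substituted."""
    argv = []
    text_on_stdin = False
    for item in template:
        if item == VOICE_CODE_STRING:
            argv.append(voice_code)
        elif item == WAVE_PATH:
            argv.append(wave_path)
        elif item == TEXT_STDIN:
            # not an argument: the text is written to the process stdin
            text_on_stdin = True
        else:
            argv.append(item)
    return argv, text_on_stdin


def synthesize(text, language, wave_path):
    """Run the engine once; requires espeak at the path given in TEMPLATE."""
    voice_code = LANGUAGE_TO_VOICE_CODE[language]
    argv, text_on_stdin = expand_arguments(TEMPLATE, voice_code, wave_path)
    subprocess.run(
        argv,
        input=text.encode("utf-8") if text_on_stdin else None,
        check=True,
    )
```

Mapping `ukr` to `ru` mirrors the mock Ukrainian support described in the wrapper's comments.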
diff --git a/aeneas/extra/ctw_speect.py b/aeneas/extra/ctw_speect.py
new file mode 100644
index 00000000..b7ade692
--- /dev/null
+++ b/aeneas/extra/ctw_speect.py
@@ -0,0 +1,222 @@
+#!/usr/bin/env python
+# coding=utf-8
+
+"""
+A wrapper for the ``speect`` TTS engine.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import numpy
+import speect
+import speect.audio
+import speect.audio_riff
+
+from aeneas.audiofile import AudioFile
+from aeneas.language import Language
+from aeneas.timevalue import TimeValue
+from aeneas.ttswrapper import TTSWrapper
+import aeneas.globalfunctions as gf
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL v3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+class CustomTTSWrapper(TTSWrapper):
+ """
+ A wrapper for the ``speect`` TTS engine.
+
+ This wrapper supports calling the TTS engine
+ only via Python.
+
+ To use this TTS engine, specify ::
+
+ "tts=custom|tts_path=/path/to/this/file.py"
+
+ in the ``RuntimeConfiguration`` object.
+
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param logger: the logger object
+ :type logger: :class:`~aeneas.logger.Logger`
+ """
+
+ TAG = u"CustomTTSWrapper"
+
+ #
+ # NOTE in this example we load an English voice,
+ # hence we support only English language,
+ # and we map it to a dummy voice code
+ #
+ ENG = Language.ENG
+ """ English """
+ LANGUAGE_TO_VOICE_CODE = {
+ ENG : ENG
+ }
+ DEFAULT_LANGUAGE = ENG
+
+ #
+ # NOTE in this example we load a voice producing
+ # audio data in PCM16 mono WAVE (RIFF) format
+ #
+ OUTPUT_MONO_WAVE = True
+
+ def __init__(self, rconf=None, logger=None):
+ super(CustomTTSWrapper, self).__init__(
+ has_subprocess_call=False,
+ has_c_extension_call=False,
+ has_python_call=True,
+ rconf=rconf,
+ logger=logger)
+
+ def _synthesize_multiple_python(self, text_file, output_file_path, quit_after=None, backwards=False):
+ """
+ Synthesize multiple text fragments, via Python call.
+
+ Return a tuple (anchors, total_time, num_chars).
+
+ :rtype: (bool, (list, TimeValue, int))
+ """
+ #
+ # TODO in the Speect Python API I was not able to find a way
+ # to generate the wave incrementally
+ # so I essentially copy the subprocess call mechanism:
+ # generating wave data for each fragment,
+ # and concatenating them together
+ #
+ self.log(u"Calling TTS engine via Python...")
+ try:
+ # get sample rate and encoding
+ du_nu, sample_rate, encoding, da_nu = self._synthesize_single_helper(
+ text=u"Dummy text to get sample_rate",
+ voice_code=self.DEFAULT_LANGUAGE
+ )
+
+ # open output file
+ output_file = AudioFile(rconf=self.rconf, logger=self.logger)
+ output_file.audio_format = encoding
+ output_file.audio_channels = 1
+ output_file.audio_sample_rate = sample_rate
+
+ # create output
+ anchors = []
+ current_time = TimeValue("0.000")
+ num = 0
+ num_chars = 0
+ fragments = text_file.fragments
+ if backwards:
+ fragments = fragments[::-1]
+ for fragment in fragments:
+ # language to voice code
+ #
+ # NOTE since voice_code is actually ignored
+ # in _synthesize_single_helper(),
+ # the value of voice_code is irrelevant
+ #
+ # however, in general you need to apply
+ # the _language_to_voice_code() function that maps
+ # the text language to a voice code
+ #
+ # here we apply the _language_to_voice_code() defined in super()
+ # that sets voice_code = fragment.language
+ #
+ voice_code = self._language_to_voice_code(fragment.language)
+ # synthesize and get the duration of the output file
+ self.log([u"Synthesizing fragment %d", num])
+ duration, sr_nu, enc_nu, data = self._synthesize_single_helper(
+ text=(fragment.filtered_text + u" "),
+ voice_code=voice_code
+ )
+ # store for later output
+ anchors.append([current_time, fragment.identifier, fragment.text])
+ # increase the character counter
+ num_chars += fragment.characters
+ # append new data
+ self.log([u"Fragment %d starts at: %.3f", num, current_time])
+ if duration > 0:
+ self.log([u"Fragment %d duration: %.3f", num, duration])
+ current_time += duration
+ # if backwards, we append the data reversed
+ output_file.add_samples(data, reverse=backwards)
+ else:
+ self.log([u"Fragment %d has zero duration", num])
+ # increment fragment counter
+ num += 1
+ # check if we must stop synthesizing because we have enough audio
+ if (quit_after is not None) and (current_time > quit_after):
+ self.log([u"Quitting after reached duration %.3f", current_time])
+ break
+
+ # if backwards, we need to reverse the audio samples again
+ if backwards:
+ output_file.reverse()
+
+ # write output file
+ self.log([u"Writing audio file '%s'", output_file_path])
+ output_file.write(file_path=output_file_path)
+ except Exception as exc:
+ self.log_exc(u"An unexpected error occurred while calling TTS engine via Python", exc, False, None)
+ return (False, None)
+
+ # return output
+ # NOTE anchors do not make sense if backwards
+ self.log([u"Returning %d time anchors", len(anchors)])
+ self.log([u"Current time %.3f", current_time])
+ self.log([u"Synthesized %d characters", num_chars])
+ self.log(u"Calling TTS engine via Python... done")
+ return (True, (anchors, current_time, num_chars))
+
+ def _synthesize_single_python(self, text, voice_code, output_file_path):
+ """
+ Synthesize a single text fragment via Python call.
+
+ :rtype: tuple (result, (duration, sample_rate, encoding, data))
+ """
+ self.log(u"Synthesizing using Python call...")
+ data = self._synthesize_single_helper(text, voice_code, output_file_path)
+ return (True, data)
+
+ def _synthesize_single_helper(self, text, voice_code, output_file_path=None):
+ """
+ This is a helper function to synthesize a single text fragment via Python call.
+
+ The caller can choose whether the output file should be written to disk or not.
+
+ :rtype: tuple (duration, sample_rate, encoding, data)
+ """
+ #
+ # NOTE in this example, we assume that the Speect voice data files
+ # are located in the same directory as this .py source file
+ # and that the voice JSON file is called "voice.json"
+ #
+ # NOTE the voice_code value is ignored in this example,
+ # but in general one might select a voice file to load,
+ # depending on voice_code
+ #
+ voice_json_path = gf.safe_str(gf.absolute_path("voice.json", __file__))
+ voice = speect.SVoice(voice_json_path)
+ utt = voice.synth(text)
+ audio = utt.features["audio"]
+ if output_file_path is not None:
+ audio.save_riff(gf.safe_str(output_file_path))
+
+ # get length and data using speect Python API
+ waveform = audio.get_audio_waveform()
+ audio_sample_rate = int(waveform["samplerate"])
+ audio_length = TimeValue(audio.num_samples() / audio_sample_rate)
+ audio_format = "pcm16"
+ audio_samples = numpy.frombuffer(waveform["samples"], dtype=numpy.int16).astype("float64") / 32768
+
+ # return data
+ return (audio_length, audio_sample_rate, audio_format, audio_samples)
+
+
+
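The bookkeeping in `_synthesize_multiple_python()` reduces to accumulating fragment durations into start-time anchors, and the last step of `_synthesize_single_helper()` rescales signed 16-bit samples to floats in [-1, 1). A stdlib-only sketch of both steps, assuming durations are already known as `Decimal` seconds and raw audio is little-endian PCM16; `compute_anchors` and `pcm16_to_floats` are hypothetical names.

```python
import array
import sys
from decimal import Decimal


def compute_anchors(fragments, durations, quit_after=None):
    """
    fragments: list of (identifier, text); durations: Decimal seconds.
    Returns (anchors, total_time), each anchor being [start, identifier, text].
    """
    anchors = []
    current_time = Decimal("0.000")
    for (identifier, text), duration in zip(fragments, durations):
        anchors.append([current_time, identifier, text])
        if duration > 0:
            current_time += duration
        # stop early once enough audio has been produced
        if quit_after is not None and current_time > quit_after:
            break
    return anchors, current_time


def pcm16_to_floats(raw_bytes):
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1, 1)."""
    samples = array.array("h")  # signed short, 2 bytes on common platforms
    samples.frombytes(raw_bytes)
    if sys.byteorder == "big":
        samples.byteswap()
    return [s / 32768.0 for s in samples]
```

As in the wrapper, a zero-duration fragment still gets an anchor, at the same time as the next one.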
diff --git a/aeneas/festivalwrapper.py b/aeneas/festivalwrapper.py
new file mode 100644
index 00000000..67dadd34
--- /dev/null
+++ b/aeneas/festivalwrapper.py
@@ -0,0 +1,135 @@
+#!/usr/bin/env python
+# coding=utf-8
+
+"""
+This module contains the following classes:
+
+* :class:`~aeneas.festivalwrapper.FESTIVALWrapper`, a wrapper for the ``Festival`` TTS engine.
+"""
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+from aeneas.language import Language
+from aeneas.runtimeconfiguration import RuntimeConfiguration
+from aeneas.ttswrapper import TTSWrapper
+
+__author__ = "Alberto Pettarin"
+__copyright__ = """
+ Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
+ Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it)
+ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
+ """
+__license__ = "GNU AGPL v3"
+__version__ = "1.5.0"
+__email__ = "aeneas@readbeyond.it"
+__status__ = "Production"
+
+class FESTIVALWrapper(TTSWrapper):
+ """
+ A wrapper for the ``Festival`` TTS engine.
+
+ This wrapper supports calling the TTS engine
+ via ``subprocess`` only.
+
+ In abstract terms, it performs one or more calls like ::
+
+ $ echo "text" | text2wave -eval "(language_italian)" -o output_file.wav
+
+ To use this TTS engine, specify ::
+
+ "tts=festival|tts_path=/path/to/text2wave"
+
+ in the ``RuntimeConfiguration`` object.
+
+ See :class:`~aeneas.ttswrapper.TTSWrapper` for the available functions.
+ Below are listed the languages supported by this wrapper.
+
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param logger: the logger object
+ :type logger: :class:`~aeneas.logger.Logger`
+ """
+
+ CES = Language.CES
+ """ Czech """
+
+ CYM = Language.CYM
+ """ Welsh """
+
+ ENG = Language.ENG
+ """ English """
+
+ FIN = Language.FIN
+ """ Finnish """
+
+ ITA = Language.ITA
+ """ Italian """
+
+ RUS = Language.RUS
+ """ Russian """
+
+ SPA = Language.SPA
+ """ Spanish """
+
+ ENG_GBR = "eng-GBR"
+ """ English (GB) """
+
+ ENG_SCT = "eng-SCT"
+ """ English (Scotland) """
+
+ ENG_USA = "eng-USA"
+ """ English (USA) """
+
+ LANGUAGE_TO_VOICE_CODE = {
+ CES : CES,
+ CYM : CYM,
+ ENG : ENG,
+ ENG_GBR : ENG_GBR,
+ ENG_SCT : ENG_SCT,
+ ENG_USA : ENG_USA,
+ SPA : SPA,
+ FIN : FIN,
+ ITA : ITA,
+ RUS : RUS
+ }
+ DEFAULT_LANGUAGE = ENG
+
+ VOICE_CODE_TO_SUBPROCESS = {
+ CES : u"(language_czech)",
+ CYM : u"(language_welsh)",
+ ENG : u"(language_english)",
+ ENG_GBR : u"(language_british_english)",
+ ENG_SCT : u"(language_scots_gaelic)",
+ ENG_USA : u"(language_american_english)",
+ SPA : u"(language_castillian_spanish)",
+ FIN : u"(language_finnish)",
+ ITA : u"(language_italian)",
+ RUS : u"(language_russian)",
+ }
+
+ OUTPUT_MONO_WAVE = True
+
+ TAG = u"FESTIVALWrapper"
+
+ def __init__(self, rconf=None, logger=None):
+ super(FESTIVALWrapper, self).__init__(
+ has_subprocess_call=True,
+ has_c_extension_call=False,
+ has_python_call=False,
+ rconf=rconf,
+ logger=logger
+ )
+ self.set_subprocess_arguments([
+ self.rconf[RuntimeConfiguration.TTS_PATH],
+ TTSWrapper.CLI_PARAMETER_VOICE_CODE_FUNCTION,
+ u"-o",
+ TTSWrapper.CLI_PARAMETER_WAVE_PATH,
+ TTSWrapper.CLI_PARAMETER_TEXT_STDIN
+ ])
+
+ def _voice_code_to_subprocess(self, voice_code):
+ return [u"-eval", self.VOICE_CODE_TO_SUBPROCESS[voice_code]]
+
+
+
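Since the wrapper builds the command line as a list of separate arguments, the Scheme expression `(language_italian)` needs no shell quoting when `text2wave` is run via `subprocess`. A sketch of the resulting argument list, assuming a table like `VOICE_CODE_TO_SUBPROCESS` (only a subset here, with language codes assumed to be ISO 639-3 strings). `build_text2wave_args` is hypothetical and simply rejects unknown codes.

```python
# Subset of the Scheme expressions listed in FESTIVALWrapper
VOICE_CODE_TO_EVAL = {
    "eng": "(language_english)",
    "ita": "(language_italian)",
    "fin": "(language_finnish)",
    "spa": "(language_castillian_spanish)",
}


def build_text2wave_args(text2wave_path, voice_code, wave_path):
    """Return the argv list; the text itself is fed to text2wave via stdin."""
    try:
        scheme_expression = VOICE_CODE_TO_EVAL[voice_code]
    except KeyError:
        raise ValueError("Unsupported voice code: %r" % voice_code)
    return [text2wave_path, "-eval", scheme_expression, "-o", wave_path]
```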
diff --git a/aeneas/ffmpegwrapper.py b/aeneas/ffmpegwrapper.py
index 4da85420..108c7bae 100644
--- a/aeneas/ffmpegwrapper.py
+++ b/aeneas/ffmpegwrapper.py
@@ -2,14 +2,17 @@
# coding=utf-8
"""
-Wrapper around ``ffmpeg`` to convert audio files.
+This module contains the following classes:
+
+* :class:`~aeneas.ffmpegwrapper.FFMPEGWrapper`, a wrapper around ``ffmpeg`` to convert audio files;
+* :class:`~aeneas.ffmpegwrapper.FFMPEGPathError`, representing a failure to locate the ``ffmpeg`` executable.
"""
from __future__ import absolute_import
from __future__ import print_function
import subprocess
-from aeneas.logger import Logger
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
import aeneas.globalfunctions as gf
@@ -20,31 +23,32 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
class FFMPEGPathError(Exception):
"""
Error raised when the path to ``ffmpeg`` is not a valid executable.
+
+ .. versionadded:: 1.4.1
"""
pass
-class FFMPEGWrapper(object):
+class FFMPEGWrapper(Loggable):
"""
- Wrapper around ``ffmpeg`` to convert audio files.
+ A wrapper around ``ffmpeg`` to convert audio files.
- It will perform a call like::
+ In abstract terms, it will perform a call like::
$ ffmpeg -i /path/to/input.mp3 [parameters] /path/to/output.wav
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
+ :type logger: :class:`~aeneas.logger.Logger`
"""
FFMPEG_SAMPLE_8000 = ["-ar", "8000"]
@@ -134,14 +138,6 @@ class FFMPEGWrapper(object):
TAG = u"FFMPEGWrapper"
- def __init__(self, rconf=None, logger=None):
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
def convert(
self,
input_file_path,
@@ -166,43 +162,36 @@ def convert(
you can skip a portion at the beginning and at the end
of the original input file.
- :param input_file_path: the path of the audio file to convert
- :type input_file_path: string
- :param output_file_path: the path of the converted audio file
- :type output_file_path: string
- :param head_length: skip these many seconds
- from the beginning of the audio file
- :type head_length: float
- :param process_length: process these many seconds of the audio file
- :type process_length: float
-
- :raises FFMPEGPathError: if the path to the ``ffmpeg`` executable cannot be called
- :raise OSError: if ``input_file_path`` does not exist
- or ``output_file_path`` cannot be written
+ :param string input_file_path: the path of the audio file to convert
+ :param string output_file_path: the path of the converted audio file
+ :param float head_length: skip this many seconds
+ from the beginning of the audio file
+ :param float process_length: process this many seconds of the audio file
+ :raises: :class:`~aeneas.ffmpegwrapper.FFMPEGPathError`: if the path to the ``ffmpeg`` executable cannot be called
+ :raises: OSError: if ``input_file_path`` does not exist
+ or ``output_file_path`` cannot be written
"""
# test if we can read the input file
if not gf.file_can_be_read(input_file_path):
- self._log([u"Input file '%s' cannot be read", input_file_path], Logger.CRITICAL)
- raise OSError("Input file cannot be read")
+ self.log_exc(u"Input file '%s' cannot be read" % (input_file_path), None, True, OSError)
# test if we can write the output file
if not gf.file_can_be_written(output_file_path):
- self._log([u"Output file '%s' cannot be written", output_file_path], Logger.CRITICAL)
- raise OSError("Output file cannot be written")
+ self.log_exc(u"Output file '%s' cannot be written" % (output_file_path), None, True, OSError)
# call ffmpeg
- arguments = [self.rconf["ffmpeg_path"]]
+ arguments = [self.rconf[RuntimeConfiguration.FFMPEG_PATH]]
arguments.extend(["-i", input_file_path])
if head_length is not None:
arguments.extend(["-ss", head_length])
if process_length is not None:
arguments.extend(["-t", process_length])
- if self.rconf["ffmpeg_sample_rate"] in self.FFMPEG_PARAMETERS_MAP:
- arguments.extend(self.FFMPEG_PARAMETERS_MAP[self.rconf["ffmpeg_sample_rate"]])
+ if self.rconf[RuntimeConfiguration.FFMPEG_SAMPLE_RATE] in self.FFMPEG_PARAMETERS_MAP:
+ arguments.extend(self.FFMPEG_PARAMETERS_MAP[self.rconf[RuntimeConfiguration.FFMPEG_SAMPLE_RATE]])
else:
arguments.extend(self.FFMPEG_PARAMETERS_DEFAULT)
arguments.append(output_file_path)
- self._log([u"Calling with arguments '%s'", arguments])
+ self.log([u"Calling with arguments '%s'", arguments])
try:
proc = subprocess.Popen(
arguments,
@@ -214,18 +203,16 @@ def convert(
proc.stdout.close()
proc.stdin.close()
proc.stderr.close()
- except OSError:
- self._log([u"Unable to call the '%s' ffmpeg executable", self.rconf["ffmpeg_path"]], Logger.CRITICAL)
- raise FFMPEGPathError("Unable to call the specified ffmpeg executable")
- self._log(u"Call completed")
+ except OSError as exc:
+ self.log_exc(u"Unable to call the '%s' ffmpeg executable" % (self.rconf[RuntimeConfiguration.FFMPEG_PATH]), exc, True, FFMPEGPathError)
+ self.log(u"Call completed")
# check if the output file exists
if not gf.file_exists(output_file_path):
- self._log([u"Output file '%s' was not written", output_file_path], Logger.CRITICAL)
- raise OSError("Output file was not written")
+ self.log_exc(u"Output file '%s' was not written" % (output_file_path), None, True, OSError)
# returning the output file path
- self._log([u"Returning output file path '%s'", output_file_path])
+ self.log([u"Returning output file path '%s'", output_file_path])
return output_file_path
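The argument list assembled in `convert()` can be sketched as a pure function. This is an illustration under assumptions: `SAMPLE_RATE_PARAMETERS` and `DEFAULT_PARAMETERS` are hypothetical stand-ins for `FFMPEG_PARAMETERS_MAP` and `FFMPEG_PARAMETERS_DEFAULT`, whose full contents are not shown here, and numeric lengths go through `str()` because `subprocess` expects string arguments. Placing `-ss` after `-i` makes it an output option, so ffmpeg decodes and discards audio up to that position rather than seeking in the input.

```python
# Hypothetical stand-ins for FFMPEG_PARAMETERS_MAP / FFMPEG_PARAMETERS_DEFAULT
SAMPLE_RATE_PARAMETERS = {
    8000: ["-ar", "8000"],
    16000: ["-ar", "16000"],
}
DEFAULT_PARAMETERS = ["-ar", "16000"]


def build_ffmpeg_args(ffmpeg_path, input_path, output_path,
                      head_length=None, process_length=None, sample_rate=None):
    """Return an ffmpeg argv list in the same order used by convert()."""
    args = [ffmpeg_path, "-i", input_path]
    if head_length is not None:
        args.extend(["-ss", str(head_length)])  # skip this many seconds
    if process_length is not None:
        args.extend(["-t", str(process_length)])  # keep this many seconds
    args.extend(SAMPLE_RATE_PARAMETERS.get(sample_rate, DEFAULT_PARAMETERS))
    args.append(output_path)
    return args
```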
diff --git a/aeneas/ffprobewrapper.py b/aeneas/ffprobewrapper.py
index 89f9ab72..e3db20b4 100644
--- a/aeneas/ffprobewrapper.py
+++ b/aeneas/ffprobewrapper.py
@@ -2,7 +2,13 @@
# coding=utf-8
"""
-Wrapper around ``ffprobe`` to read the properties of an audio file.
+This module contains the following classes:
+
+* :class:`~aeneas.ffprobewrapper.FFPROBEWrapper`, a wrapper around ``ffprobe`` to read the properties of an audio file;
+* :class:`~aeneas.ffprobewrapper.FFPROBEParsingError`,
+* :class:`~aeneas.ffprobewrapper.FFPROBEPathError`, and
+* :class:`~aeneas.ffprobewrapper.FFPROBEUnsupportedFormatError`,
+ representing errors while reading the properties of audio files.
"""
from __future__ import absolute_import
@@ -10,8 +16,9 @@
import re
import subprocess
-from aeneas.logger import Logger
+from aeneas.logger import Loggable
from aeneas.runtimeconfiguration import RuntimeConfiguration
+from aeneas.timevalue import TimeValue
import aeneas.globalfunctions as gf
__author__ = "Alberto Pettarin"
@@ -21,7 +28,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
@@ -36,6 +43,8 @@ class FFPROBEParsingError(Exception):
class FFPROBEPathError(Exception):
"""
Error raised when the path to ``ffprobe`` is not a valid executable.
+
+ .. versionadded:: 1.4.1
"""
pass
@@ -48,7 +57,7 @@ class FFPROBEUnsupportedFormatError(Exception):
-class FFPROBEWrapper(object):
+class FFPROBEWrapper(Loggable):
"""
Wrapper around ``ffprobe`` to read the properties of an audio file.
@@ -99,11 +108,10 @@ class FFPROBEWrapper(object):
DISPOSITION:attached_pic=0
[/STREAM]
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
+ :type logger: :class:`~aeneas.logger.Logger`
"""
FFPROBE_PARAMETERS = [
@@ -136,14 +144,6 @@ class FFPROBEWrapper(object):
TAG = u"FFPROBEWrapper"
- def __init__(self, rconf=None, logger=None):
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
def read_properties(self, audio_file_path):
"""
Read the properties of an audio file
@@ -190,29 +190,26 @@ def read_properties(self, audio_file_path):
d["DISPOSITION:clean_effects"]=0
d["DISPOSITION:attached_pic"]=0
- :param audio_file_path: the path of the audio file to analyze
- :type audio_file_path: string (path)
+ :param string audio_file_path: the path of the audio file to analyze
:rtype: dict
-
- :raises TypeError: if ``audio_file_path`` is None
- :raises OSError: if the file at ``audio_file_path`` cannot be read
- :raises FFPROBEParsingError: if the call to ``ffprobe`` does not produce any output
- :raises FFPROBEPathError: if the path to the ``ffprobe`` executable cannot be called
- :raises FFPROBEUnsupportedFormatError: if the file has a format not supported by ``ffprobe``
+ :raises: TypeError: if ``audio_file_path`` is None
+ :raises: OSError: if the file at ``audio_file_path`` cannot be read
+ :raises: FFPROBEParsingError: if the call to ``ffprobe`` does not produce any output
+ :raises: FFPROBEPathError: if the path to the ``ffprobe`` executable cannot be called
+ :raises: FFPROBEUnsupportedFormatError: if the file has a format not supported by ``ffprobe``
"""
# test if we can read the file at audio_file_path
if audio_file_path is None:
- raise TypeError("The audio file path is None")
+ self.log_exc(u"The audio file path is None", None, True, TypeError)
if not gf.file_can_be_read(audio_file_path):
- self._log([u"Input file '%s' cannot be read", audio_file_path], Logger.CRITICAL)
- raise OSError("Input file cannot be read")
+ self.log_exc(u"Input file '%s' cannot be read" % (audio_file_path), None, True, OSError)
# call ffprobe
- arguments = [self.rconf["ffprobe_path"]]
+ arguments = [self.rconf[RuntimeConfiguration.FFPROBE_PATH]]
arguments.extend(self.FFPROBE_PARAMETERS)
arguments.append(audio_file_path)
- self._log([u"Calling with arguments '%s'", arguments])
+ self.log([u"Calling with arguments '%s'", arguments])
try:
proc = subprocess.Popen(
arguments,
@@ -224,23 +221,20 @@ def read_properties(self, audio_file_path):
proc.stdout.close()
proc.stdin.close()
proc.stderr.close()
- except OSError:
- self._log([u"Unable to call the '%s' ffprobe executable", self.rconf["ffprobe_path"]], Logger.CRITICAL)
- raise FFPROBEPathError("Unable to call the specified ffprobe executable")
- self._log(u"Call completed")
+ except OSError as exc:
+ self.log_exc(u"Unable to call the '%s' ffprobe executable" % (self.rconf[RuntimeConfiguration.FFPROBE_PATH]), exc, True, FFPROBEPathError)
+ self.log(u"Call completed")
- # if no output, raise error
+ # check there is some output
if (stdoutdata is None) or (len(stderrdata) == 0):
- self._log(u"No output produced by ffprobe", Logger.CRITICAL)
- raise FFPROBEParsingError("No output produced by ffprobe")
+ self.log_exc(u"ffprobe produced no output", None, True, FFPROBEParsingError)
# decode stdoutdata and stderrdata to Unicode string
try:
stdoutdata = gf.safe_unicode(stdoutdata)
stderrdata = gf.safe_unicode(stderrdata)
- except UnicodeDecodeError:
- self._log(u"Error decoding stdout/stderr.")
- raise FFPROBEParsingError("Unable to decode ffprobe out/err")
+ except UnicodeDecodeError as exc:
+ self.log_exc(u"Unable to decode ffprobe out/err", exc, True, FFPROBEParsingError)
# dictionary for the results
results = {
@@ -255,39 +249,34 @@ def read_properties(self, audio_file_path):
# TODO deal with multiple audio streams
for line in stdoutdata.splitlines():
if line == self.STDOUT_END_STREAM:
- self._log(u"Reached end of the stream")
+ self.log(u"Reached end of the stream")
break
elif len(line.split("=")) == 2:
key, value = line.split("=")
results[key] = value
- self._log([u"Found property '%s'='%s'", key, value])
-
- # convert duration to float
- if self.STDOUT_DURATION in results:
- self._log([u"Found duration: '%s'", results[self.STDOUT_DURATION]])
- results[self.STDOUT_DURATION] = gf.safe_float(
- results[self.STDOUT_DURATION],
- None
- )
- else:
- self._log(u"No duration found in stdout", Logger.WARNING)
+ self.log([u"Found property '%s'='%s'", key, value])
- # if audio_length is still None, try scanning ffprobe stderr output
- if results[self.STDOUT_DURATION] is None:
+ try:
+ self.log([u"Duration found in stdout: '%s'", results[self.STDOUT_DURATION]])
+ results[self.STDOUT_DURATION] = TimeValue(results[self.STDOUT_DURATION])
+ self.log(u"Valid duration")
+ except Exception:
+ self.log_warn(u"Invalid duration")
+ results[self.STDOUT_DURATION] = None
+ # try scanning ffprobe stderr output
for line in stderrdata.splitlines():
match = self.STDERR_DURATION_REGEX.search(line)
if match is not None:
- self._log([u"Found matching line '%s'", line])
+ self.log([u"Found matching line '%s'", line])
results[self.STDOUT_DURATION] = gf.time_from_hhmmssmmm(line)
- self._log([u"Extracted duration '%f'", results[self.STDOUT_DURATION]])
+ self.log([u"Extracted duration '%.3f'", results[self.STDOUT_DURATION]])
break
if results[self.STDOUT_DURATION] is None:
- self._log(u"No duration found in stdout or stderr (unsupported audio file format?)", Logger.CRITICAL)
- raise FFPROBEUnsupportedFormatError("Unsupported audio file format")
+ self.log_exc(u"No duration found in stdout or stderr. Unsupported audio file format?", None, True, FFPROBEUnsupportedFormatError)
# return dictionary
- self._log(u"Returning dict")
+ self.log(u"Returning dict")
return results
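The rewritten `read_properties` does three things: it collects `key=value` lines from ffprobe's stdout until the end-of-stream marker, tries to turn the `duration` value into a number, and, if that fails, falls back to a `Duration:` line in stderr. The standalone sketch below mirrors that flow so it can be tested without ffprobe. It is not the aeneas code: the `[/STREAM]` marker, the stderr regex, and the helper name `parse_ffprobe_output` are assumptions for illustration, and durations are plain floats rather than `TimeValue` objects.

```python
import re

# Assumed end-of-stream marker, as printed by ffprobe -show_streams
END_STREAM = "[/STREAM]"
DURATION_KEY = "duration"
# Assumed stderr pattern, e.g. "  Duration: 00:01:02.50, start: 0.000000"
DURATION_REGEX = re.compile(r"Duration: ([0-9]+):([0-9]{2}):([0-9]{2})\.([0-9]+)")


def parse_ffprobe_output(stdout, stderr):
    """Return stream properties; 'duration' is float seconds or None."""
    results = {DURATION_KEY: None}
    for line in stdout.splitlines():
        if line == END_STREAM:
            break  # only the first stream is considered
        parts = line.split("=")
        if len(parts) == 2:
            results[parts[0]] = parts[1]
    try:
        # float(None) raises TypeError, float("N/A") raises ValueError
        results[DURATION_KEY] = float(results[DURATION_KEY])
    except (TypeError, ValueError):
        results[DURATION_KEY] = None
        # fall back to scanning stderr for a Duration line
        for line in stderr.splitlines():
            match = DURATION_REGEX.search(line)
            if match is not None:
                hh, mm, ss, frac = match.groups()
                results[DURATION_KEY] = int(hh) * 3600 + int(mm) * 60 + float(ss + "." + frac)
                break
    return results
```

Returning `None` instead of raising keeps the sketch small; in the diff, the caller raises `FFPROBEUnsupportedFormatError` when no duration is found in either stream.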
diff --git a/aeneas/globalconstants.py b/aeneas/globalconstants.py
index 9b15599a..2d5aa210 100644
--- a/aeneas/globalconstants.py
+++ b/aeneas/globalconstants.py
@@ -13,36 +13,68 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-
### CONSTANTS ###
+CONFIG_RESERVED_CHARACTERS = ["~"]
+""" List of reserved characters which are forbidden in configuration files """
+
+CONFIG_STRING_ASSIGNMENT_SYMBOL = "="
+""" Assignment symbol in config string ``key=value`` pairs """
+
+CONFIG_STRING_SEPARATOR_SYMBOL = "|"
+""" Separator of ``key=value`` pairs in config strings """
+
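The two config-string constants above define strings such as `task_language=eng|is_text_type=plain`. As a minimal sketch of how such a string splits into parameters, assuming whitespace is trimmed, pairs without `=` are ignored, and each pair is split on its first `=` only (aeneas's own parser may behave differently), consider the hypothetical helper `parse_config_string`:

```python
# Symbols as defined in aeneas.globalconstants
CONFIG_STRING_ASSIGNMENT_SYMBOL = "="
CONFIG_STRING_SEPARATOR_SYMBOL = "|"


def parse_config_string(config_string):
    """Split 'k1=v1|k2=v2' into a dict (hypothetical helper)."""
    parameters = {}
    for pair in config_string.split(CONFIG_STRING_SEPARATOR_SYMBOL):
        # partition splits on the first '=' only
        key, sep, value = pair.partition(CONFIG_STRING_ASSIGNMENT_SYMBOL)
        if sep and key.strip():
            parameters[key.strip()] = value.strip()
    return parameters
```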
+PARSED_TEXT_SEPARATOR = "|"
+""" Separator for input text files in parsed format """
+
CONFIG_TXT_FILE_NAME = "config.txt"
""" File name for the TXT configuration file in containers """
CONFIG_XML_FILE_NAME = "config.xml"
""" File name for the XML configuration file in containers """
+CONFIG_XML_TASK_TAG = "task"
+""" ``<task>`` tag in the XML configuration file """
+
CONFIG_XML_TASKS_TAG = "tasks"
""" ``<tasks>`` tag in the XML configuration file """
-CONFIG_XML_TASK_TAG = "task"
-""" ``<task>`` tag in the XML configuration file """
+MIMETYPE_MAP = {
+ "aac": "audio/aac",
+ "aiff": "audio/x-aiff",
+ "flac": "audio/flac",
+ "mp3": "audio/mpeg",
+ "mp4": "audio/mp4",
+ "oga": "audio/x-vorbis+ogg",
+ "ogg": "audio/x-vorbis+ogg",
+ "wav": "audio/x-wav",
+ "webm": "video/webm"
+}
+""" Map from audio file extension to mimetype """
+
+TMP_PATH_DEFAULT_NONPOSIX = None
+"""
+Default temporary directory path for non-POSIX OSes.
+Set to ``None`` so that ``tempfile`` will select
+the most appropriate temporary directory root path.
-CONFIG_RESERVED_CHARACTERS = ["~"]
-""" List of reserved characters which are forbidden in configuration files """
+.. versionadded:: 1.4.1
+"""
-CONFIG_STRING_SEPARATOR_SYMBOL = "|"
-""" Separator of ``key=value`` pairs in config strings """
+TMP_PATH_DEFAULT_POSIX = "/tmp/"
+"""
+Default temporary directory path for POSIX OSes.
-CONFIG_STRING_ASSIGNMENT_SYMBOL = "="
-""" Assignment symbol in config string ``key=value`` pairs """
+.. versionadded:: 1.4.1
+"""
-PARSED_TEXT_SEPARATOR = "|"
-""" Separator for input text files in parsed format """
+
+
+### PARAMETER NAMES ###
# reserved parameter names (RPN)
RPN_JOB_IDENTIFIER = "job_identifier"
@@ -80,11 +112,13 @@
Usage: config string, TXT config file, XML config file
-Values: listed in :class:`aeneas.language.Language`
+Values: listed in :class:`~aeneas.language.Language`
Example::
- job_language=en
+ job_language=eng-GBR
+ job_language=eng-USA
+ job_language=ita-ITA
"""
@@ -110,7 +144,7 @@
Usage: config string, TXT config file
-Values: string (path)
+Values: string
Example::
@@ -127,7 +161,7 @@
Usage: config string, TXT config file
-Values: string (path)
+Values: string
Example::
@@ -142,7 +176,7 @@
Usage: config string, TXT config file
-Values: listed in :class:`aeneas.hierarchytype.HierarchyType`
+Values: listed in :class:`~aeneas.hierarchytype.HierarchyType`
Example::
@@ -170,18 +204,7 @@
PPN_JOB_IS_TEXT_FILE_FORMAT = "is_text_type"
"""
-The text file format of text files in input containers.
-
-Usage: config string, TXT config file, XML config file
-
-Values: listed in :class:`aeneas.textfile.TextFileFormat`
-
-Example::
-
- is_text_type=plain
- is_text_type=parsed
- is_text_type=unparsed
-
+See PPN_TASK_IS_TEXT_FILE_FORMAT
"""
PPN_JOB_IS_TEXT_FILE_NAME_REGEX = "is_text_file_name_regex"
@@ -207,7 +230,7 @@
Usage: config string, TXT config file
-Values: string (path)
+Values: string
Example::
@@ -217,56 +240,34 @@
"""
-PPN_JOB_IS_TEXT_UNPARSED_CLASS_REGEX = "is_text_unparsed_class_regex"
+PPN_JOB_IS_TEXT_MUNPARSED_L1_ID_REGEX = "is_text_munparsed_l1_id_regex"
+"""
+See PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX
"""
-The regex for matching the ``class`` attribute
-of XML elements containing text fragments to be extracted
-from ``unparsed`` text files.
-
-Usage: config string, TXT config file, XML config file
-
-Values: regex
-Example::
+PPN_JOB_IS_TEXT_MUNPARSED_L2_ID_REGEX = "is_text_munparsed_l2_id_regex"
+"""
+See PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX
+"""
- is_text_unparsed_class_regex=ra
- is_text_unparsed_class_regex=readaloud
- is_text_unparsed_class_regex=ra[0-9]+
+PPN_JOB_IS_TEXT_MUNPARSED_L3_ID_REGEX = "is_text_munparsed_l3_id_regex"
+"""
+See PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX
+"""
+PPN_JOB_IS_TEXT_UNPARSED_CLASS_REGEX = "is_text_unparsed_class_regex"
+"""
+See PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX
"""
PPN_JOB_IS_TEXT_UNPARSED_ID_REGEX = "is_text_unparsed_id_regex"
"""
-The regex for matching the ``id`` attribute
-of XML elements containing text fragments to be extracted
-from ``unparsed`` text files.
-
-Usage: config string, TXT config file, XML config file
-
-Values: regex
-
-Example::
-
- is_text_unparsed_id_regex=f[0-9]+
- is_text_unparsed_id_regex=ra.*
-
+See PPN_TASK_IS_TEXT_UNPARSED_ID_REGEX
"""
PPN_JOB_IS_TEXT_UNPARSED_ID_SORT = "is_text_unparsed_id_sort"
"""
-The sorting algorithm to be used to sort the text fragments
-extracted from ``unparsed`` text files, based on their ``id`` attributes.
-
-Usage: config string, TXT config file, XML config file
-
-Values: listed in :class:`aeneas.idsortingalgorithm.IDSortingAlgorithm`
-
-Example::
-
- is_text_unparsed_id_sort=lexicographic
- is_text_unparsed_id_sort=numeric
- is_text_unparsed_id_sort=unsorted
-
+See PPN_TASK_IS_TEXT_UNPARSED_ID_SORT
"""
PPN_JOB_OS_CONTAINER_FORMAT = "os_job_file_container"
@@ -275,7 +276,7 @@
Usage: config string, TXT config file, XML config file
-Values: listed in :class:`aeneas.container.ContainerFormat`
+Values: listed in :class:`~aeneas.container.ContainerFormat`
Example::
@@ -304,7 +305,7 @@
Usage: config string, TXT config file, XML config file
-Values: string (path)
+Values: string
Example::
@@ -318,7 +319,7 @@
Usage: config string, TXT config file, XML config file
-Values: listed in :class:`aeneas.hierarchytype.HierarchyType`
+Values: listed in :class:`~aeneas.hierarchytype.HierarchyType`
Example::
@@ -331,12 +332,13 @@
"""
Key for specifying the syncmap language
-Values: listed in :class:`aeneas.language.Language`
+Values: listed in :class:`~aeneas.language.Language`
Example::
- language=en
- language=it
+ language=eng-GBR
+ language=eng-USA
+ language=ita-ITA
.. versionadded:: 1.2.0
"""
@@ -375,11 +377,13 @@
Usage: config string, XML config file
-Values: listed in :class:`aeneas.language.Language`
+Values: listed in :class:`~aeneas.language.Language`
Example::
- task_language=en
+ task_language=eng-GBR
+ task_language=eng-USA
+ task_language=ita-ITA
"""
@@ -390,7 +394,7 @@
Usage: config string, TXT config file, XML config file
-Values: listed in :class:`aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm`
+Values: listed in :class:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm`
Example::
@@ -656,7 +660,7 @@
Usage: config string, TXT config file, XML config file
-Values: listed in :class:`aeneas.textfile.TextFileFormat`
+Values: listed in :class:`~aeneas.textfile.TextFileFormat`
Example::
@@ -689,7 +693,7 @@
Usage: config string, TXT config file, XML config file
-Values: string (path)
+Values: string
Example::
@@ -697,6 +701,85 @@
"""
+PPN_TASK_IS_TEXT_MPLAIN_WORD_SEPARATOR = "is_text_mplain_word_separator"
+"""
+The word separator to be used when splitting words
+in ``mplain`` input text files.
+
+You can use the following special strings:
+
+* ``equal`` for a ``=`` character (ASCII ``0x3D``),
+* ``pipe`` for a ``|`` character (ASCII ``0x7C``),
+* ``space`` for a space character (ASCII ``0x20``),
+* ``tab`` for a tab character (ASCII ``0x09``).
+
+Any other string will be used as the word separator.
+If not specified, ``space`` will be used.
+
+Usage: config string, TXT config file, XML config file
+
+Values: string
+
+Example::
+
+ is_text_mplain_word_separator=space
+ is_text_mplain_word_separator=tab
+ is_text_mplain_word_separator=,
+
+"""
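The special names documented for ``is_text_mplain_word_separator`` map to literal characters, and any other value is used verbatim. A minimal sketch of that lookup follows. The helper names `resolve_word_separator` and `split_words` are hypothetical, and dropping empty tokens (for example, from doubled spaces) is an assumption rather than documented aeneas behavior.

```python
# Special names accepted by is_text_mplain_word_separator, per the docstring
SPECIAL_SEPARATORS = {
    "equal": "=",
    "pipe": "|",
    "space": " ",
    "tab": "\t",
}


def resolve_word_separator(value=None):
    """Map a configured value to the actual separator (default: space)."""
    if value is None:
        return " "
    # any other string is used verbatim as the separator
    return SPECIAL_SEPARATORS.get(value, value)


def split_words(line, value=None):
    """Split one mplain line into words, dropping empty tokens (assumption)."""
    return [word for word in line.split(resolve_word_separator(value)) if word]
```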
+
+PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX = "is_text_munparsed_l1_id_regex"
+"""
+The regex to match ``id`` attributes for level 1 (paragraph) text fragments.
+It applies to ``munparsed`` text files only.
+
+Usage: config string, TXT config file, XML config file
+
+Values: regex
+
+Example::
+
+ is_text_munparsed_l1_id_regex=p[0-9]+
+
+.. versionadded:: 1.5.0
+"""
+
+PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX = "is_text_munparsed_l2_id_regex"
+"""
+The regex to match ``id`` attributes for level 2 (sentence) text fragments.
+It applies to ``munparsed`` text files only.
+
+Usage: config string, TXT config file, XML config file
+
+Values: regex
+
+Example::
+
+ is_text_munparsed_l2_id_regex=s[0-9]+
+ is_text_munparsed_l2_id_regex=p[0-9]+s[0-9]+
+
+.. versionadded:: 1.5.0
+
+"""
+
+PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX = "is_text_munparsed_l3_id_regex"
+"""
+The regex to match ``id`` attributes for level 3 (word) text fragments.
+It applies to ``munparsed`` text files only.
+
+Usage: config string, TXT config file, XML config file
+
+Values: regex
+
+Example::
+
+ is_text_munparsed_l3_id_regex=w[0-9]+
+ is_text_munparsed_l3_id_regex=p[0-9]+s[0-9]+w[0-9]+
+
+.. versionadded:: 1.5.0
+
+"""
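The three ``munparsed`` regexes assign each ``id`` to a paragraph, sentence, or word level. The sketch below classifies identifiers using the example regexes from the docstrings above. It assumes each regex must match the whole ``id``; with prefix matching (`re.match`), `p[0-9]+` would also accept `p001s002`, so the real behavior depends on how aeneas applies the regex. `level_of` is a hypothetical name.

```python
import re

# Example regexes copied from the docstrings above
LEVEL_REGEXES = {
    1: re.compile(r"p[0-9]+"),
    2: re.compile(r"p[0-9]+s[0-9]+"),
    3: re.compile(r"p[0-9]+s[0-9]+w[0-9]+"),
}


def level_of(identifier):
    """Return the level whose regex fully matches identifier, or None."""
    for level, regex in sorted(LEVEL_REGEXES.items()):
        # fullmatch (an assumption) keeps the three levels disjoint
        if regex.fullmatch(identifier):
            return level
    return None
```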
+
PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX = "is_text_unparsed_class_regex"
"""
The regex to match ``class`` attributes for text fragments.
@@ -733,11 +816,11 @@
PPN_TASK_IS_TEXT_UNPARSED_ID_SORT = "is_text_unparsed_id_sort"
"""
The algorithm to sort text fragments by their ``id`` attributes.
-It applies to unparsed text files only.
+It applies to ``unparsed`` text files only.
Usage: config string, TXT config file, XML config file
-Values: listed in :class:`aeneas.idsortingalgorithm.IDSortingAlgorithm`
+Values: listed in :class:`~aeneas.idsortingalgorithm.IDSortingAlgorithm`
Example::
@@ -753,7 +836,7 @@
Usage: config string, TXT config file, XML config file
-Values: listed in :class:`aeneas.syncmap.SyncMapFormat`
+Values: listed in :class:`~aeneas.syncmap.SyncMapFormat`
Example::
@@ -788,6 +871,26 @@
.. versionadded:: 1.3.1
"""
+PPN_TASK_OS_FILE_LEVELS = "os_task_file_levels"
+"""
+If the input text file is multilevel,
+output only the specified levels.
+
+This parameter has no effect for single-level
+input text files or single-level output sync map formats.
+
+Usage: config string, TXT config file, XML config file
+
+Values: string
+
+Example::
+
+ os_task_file_levels=123
+ os_task_file_levels=3
+
+.. versionadded:: 1.5.0
+"""
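The examples `os_task_file_levels=123` and `os_task_file_levels=3` suggest that each digit names one level to keep. A minimal sketch under that assumption follows. Ignoring non-digits and levels above 3 is also an assumption, and `parse_levels` and `filter_by_level` are hypothetical helpers working on `(level, text)` pairs rather than aeneas tree nodes.

```python
def parse_levels(value, max_level=3):
    """Turn a digit string such as '123' into a sorted list of levels."""
    levels = set()
    for char in value:
        # ASCII digits only; anything else or out-of-range levels are ignored
        if char in "0123456789" and 1 <= int(char) <= max_level:
            levels.add(int(char))
    return sorted(levels)


def filter_by_level(fragments, value):
    """Keep only (level, text) pairs whose level was requested."""
    wanted = set(parse_levels(value))
    return [(level, text) for (level, text) in fragments if level in wanted]
```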
+
PPN_TASK_OS_FILE_NAME = "os_task_file_name"
"""
The name of the sync map file output for the task.
@@ -806,6 +909,21 @@
"""
+PPN_TASK_OS_FILE_NO_ZERO = "os_task_file_no_zero"
+"""
+If specified, do not allow fragments with zero duration.
+
+Usage: config string, TXT config file, XML config file
+
+Values: string
+
+Example::
+
+ os_task_file_no_zero=True
+
+.. versionadded:: 1.5.0
+"""
+
PPN_TASK_OS_FILE_SMIL_AUDIO_REF = "os_task_file_smil_audio_ref"
"""
The value of the ``src`` attribute for the ``
+
+
+
+
+ 1
+
+
+
+
+
+
+
+ From
+ fairest
+ creatures
+ we
+ desire
+ increase,
+
+
+ That
+ thereby
+ beauty’s
+ rose
+ might
+ never
+ die,
+
+
+ But
+ as
+ the
+ riper
+ should
+ by
+ time
+ decease,
+
+
+ His
+ tender
+ heir
+ might
+ bear
+ his
+ memory:
+
+
+
+
+ But
+ thou
+ contracted
+ to
+ thine
+ own
+ bright
+ eyes,
+
+
+ Feed’st
+ thy
+ light’s
+ flame
+ with
+ self-substantial
+ fuel,
+
+
+ Making
+ a
+ famine
+ where
+ abundance
+ lies,
+
+
+ Thy
+ self
+ thy
+ foe,
+ to
+ thy
+ sweet
+ self
+ too
+ cruel:
+
+
+
+
+ Thou
+ that
+ art
+ now
+ the
+ world's
+ fresh
+ ornament,
+
+
+ And
+ only
+ herald
+ to
+ the
+ gaudy
+ spring,
+
+
+ Within
+ thine
+ own
+ bud
+ buriest
+ thy
+ content,
+
+
+ And
+ tender
+ churl
+ mak’st
+ waste
+ in
+ niggarding:
+
+
+
+
+ Pity
+ the
+ world,
+ or
+ else
+ this
+ glutton
+ be,
+
+
+ To
+ eat
+ the
+ world’s
+ due,
+ by
+ the
+ grave
+ and
+ thee.
+
+
+
+
+ * Multiple levels: yes (output only)
+ * Multiple lines: yes
"""
TXT = "txt"
@@ -378,6 +512,9 @@ class SyncMapFormat(object):
f002 00:00:01.234 00:00:05.678 "Second fragment text"
f003 00:00:05.678 00:00:07.890 "Third fragment text"
+ * Multiple levels: no
+ * Multiple lines: no
+
.. versionadded:: 1.0.4
"""
@@ -391,13 +528,15 @@ class SyncMapFormat(object):
f002 1.234 5.678 "Second fragment text"
f003 5.678 7.890 "Third fragment text"
+ * Multiple levels: no
+ * Multiple lines: no
+
.. versionadded:: 1.2.0
"""
VTT = "vtt"
"""
- WebVTT caption/subtitle format
- (it might have multiple lines per fragment)::
+ WebVTT caption/subtitle format::
WEBVTT
@@ -415,38 +554,39 @@ class SyncMapFormat(object):
Third fragment text
Second line of third fragment
+ * Multiple levels: no
+ * Multiple lines: yes
"""
XML = "xml"
"""
- XML (it might have multiple lines per fragment)::
+ XML::
+ * Multiple levels: yes (output only)
+ * Multiple lines: yes
"""
XML_LEGACY = "xml_legacy"
"""
- XML, legacy format.
-
- Deprecated, it will be removed in v2.0.0. Use XML instead.
-
- .. deprecated:: 1.2.0
-
- ::
+ XML, legacy format::
- """
+ * Multiple levels: no
+ * Multiple lines: no
+ Deprecated, it will be removed in v2.0.0. Use XML instead.
+
+ .. deprecated:: 1.2.0
+ """
ALLOWED_VALUES = [
+ AUD,
+ AUDH,
+ AUDM,
CSV,
CSVH,
CSVM,
DFXP,
+ EAF,
JSON,
RBSE,
SBV,
@@ -502,10 +651,10 @@ class SyncMapFormat(object):
-class SyncMap(object):
+class SyncMap(Loggable):
"""
- A synchronization map, that is, a list of
- :class:`aeneas.syncmap.SyncMapFragment`
+ A synchronization map, that is, a tree of
+ :class:`~aeneas.syncmap.SyncMapFragment`
objects.
"""
@@ -540,13 +689,9 @@ class SyncMap(object):
TAG = u"SyncMap"
- def __init__(self, logger=None):
- self.fragments = []
- self.logger = logger or Logger()
-
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
+ def __init__(self, rconf=None, logger=None):
+ super(SyncMap, self).__init__(rconf=rconf, logger=logger)
+ self.fragments_tree = Tree()
def __len__(self):
return len(self.fragments)
@@ -557,60 +702,92 @@ def __unicode__(self):
def __str__(self):
return gf.safe_str(self.__unicode__())
- def to_json(self):
+ @property
+ def fragments_tree(self):
"""
- Return a JSON representation of the sync map.
+ Return the current tree of fragments.
- :rtype: Unicode string
+ :rtype: :class:`~aeneas.tree.Tree`
+ """
+ return self.__fragments_tree
+ @fragments_tree.setter
+ def fragments_tree(self, fragments_tree):
+ self.__fragments_tree = fragments_tree
- .. versionadded:: 1.3.1
+ @property
+ def is_single_level(self):
"""
- output_fragments = []
- for fragment in self.fragments:
- text = fragment.text_fragment
- output_fragments.append({
- "id" : text.identifier,
- "language" : text.language,
- "lines" : text.lines,
- "begin" : gf.time_to_ssmmm(fragment.begin),
- "end" : gf.time_to_ssmmm(fragment.end)
- })
- return gf.safe_unicode(
- json.dumps({"fragments": output_fragments}, indent=1, sort_keys=True)
- )
+ Return ``True`` if the sync map
+ has only one level, that is,
+ if it is a list of fragments
+ rather than a hierarchical tree.
- def append_fragment(self, fragment):
+ :rtype: bool
"""
- Append the given sync map fragment.
+ return self.fragments_tree.height <= 2
- :param fragment: the sync map fragment to be appended
- :type fragment: :class:`aeneas.syncmap.SyncMapFragment`
+ @property
+ def fragments(self):
+ """
+ The current list of sync map fragments
+ which are the children of the root node
+ of the sync map tree.
- :raise TypeError: if ``fragment`` is ``None`` or
- it is not an instance of ``SyncMapFragment``
+ :rtype: list of :class:`~aeneas.syncmap.SyncMapFragment`
"""
- if not isinstance(fragment, SyncMapFragment):
- raise TypeError("fragment is not an instance of SyncMapFragment")
- self.fragments.append(fragment)
+ return self.fragments_tree.vchildren_not_empty
@property
- def fragments(self):
+ def json_string(self):
"""
- The current list of sync map fragments.
+ Return a JSON representation of the sync map.
+
+ :rtype: string
- :rtype: list of :class:`aeneas.syncmap.SyncMapFragment`
+ .. versionadded:: 1.3.1
"""
- return self.__fragments
- @fragments.setter
- def fragments(self, fragments):
- self.__fragments = fragments
+ def visit_children(node):
+ """ Recursively visit the fragments_tree """
+ output_fragments = []
+ for child in node.children_not_empty:
+ fragment = child.value
+ text = fragment.text_fragment
+ output_fragments.append({
+ "id" : text.identifier,
+ "language" : text.language,
+ "lines" : text.lines,
+ "begin" : gf.time_to_ssmmm(fragment.begin),
+ "end" : gf.time_to_ssmmm(fragment.end),
+ "children": visit_children(child)
+ })
+ return output_fragments
+ output_fragments = visit_children(self.fragments_tree)
+ return gf.safe_unicode(
+ json.dumps({"fragments": output_fragments}, indent=1, sort_keys=True)
+ )
+
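`json_string` now walks the fragment tree recursively, emitting each fragment with a nested `children` list. The sketch below reproduces that recursion on a tiny hypothetical `Node` class instead of `aeneas.tree.Tree`. Skipping nodes whose `value` is `None` is our assumed analogue of `children_not_empty`, and the fragment dicts are pre-formatted for brevity.

```python
import json


class Node(object):
    """Hypothetical minimal tree node; value is a dict, or None for the root."""

    def __init__(self, value=None, children=None):
        self.value = value
        self.children = children or []


def visit_children(node):
    """Recursively turn a node's children into nested JSON-ready dicts."""
    output = []
    for child in node.children:
        if child.value is None:
            continue  # skip nodes without a value (assumed analogue of children_not_empty)
        entry = dict(child.value)
        entry["children"] = visit_children(child)
        output.append(entry)
    return output


root = Node(children=[
    Node({"id": "p1", "begin": "0.000", "end": "2.000"}, [
        Node({"id": "p1s1", "begin": "0.000", "end": "2.000"}),
    ]),
])
print(json.dumps({"fragments": visit_children(root)}, indent=1, sort_keys=True))
```

Leaf fragments end up with `"children": []`, so single-level and multilevel maps share one JSON shape.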
+ def add_fragment(self, fragment, as_last=True):
+ """
+ Add the given sync map fragment,
+ as the first or last child of the root node
+ of the sync map tree.
+
+ :param fragment: the sync map fragment to be added
+ :type fragment: :class:`~aeneas.syncmap.SyncMapFragment`
+ :param bool as_last: if ``True``, append fragment; otherwise prepend it
+ :raises: TypeError: if ``fragment`` is ``None`` or
+ it is not an instance of :class:`~aeneas.syncmap.SyncMapFragment`
+ """
+ if not isinstance(fragment, SyncMapFragment):
+ self.log_exc(u"fragment is not an instance of SyncMapFragment", None, True, TypeError)
+ self.fragments_tree.add_child(Tree(value=fragment), as_last=as_last)
def clear(self):
"""
- Clear the sync map.
+ Clear the sync map, removing all the current fragments.
"""
- self._log(u"Clearing sync map")
- self.fragments = []
+ self.log(u"Clearing sync map")
+ self.fragments_tree = Tree()
def output_html_for_tuning(
self,
@@ -621,17 +798,14 @@ def output_html_for_tuning(
"""
Output an HTML file for fine tuning the sync map manually.
- :param audio_file_path: the path to the associated audio file
- :type audio_file_path: string (path)
- :param output_file_path: the path to the output file to write
- :type output_file_path: string (path)
- :param parameters: additional parameters
- :type parameters: dict
+ :param string audio_file_path: the path to the associated audio file
+ :param string output_file_path: the path to the output file to write
+ :param dict parameters: additional parameters
.. versionadded:: 1.3.1
"""
if not gf.file_can_be_written(output_file_path):
- raise OSError("Cannot output HTML file '%s' (wrong permissions?)" % output_file_path)
+ self.log_exc(u"Cannot output HTML file '%s'. Wrong permissions?" % (output_file_path), None, True, OSError)
if parameters is None:
parameters = {}
audio_file_path_absolute = gf.fix_slash(os.path.abspath(audio_file_path))
@@ -646,7 +820,7 @@ def output_html_for_tuning(
)
template = template.replace(
self.FINETUNEAS_REPLACE_FRAGMENTS,
- u"fragments = (%s).fragments;" % self.to_json()
+ u"fragments = (%s).fragments;" % self.json_string
)
if gc.PPN_TASK_OS_FILE_FORMAT in parameters:
output_format = parameters[gc.PPN_TASK_OS_FILE_FORMAT]
@@ -679,26 +853,27 @@ def output_html_for_tuning(
def read(self, sync_map_format, input_file_path, parameters=None):
"""
Read sync map fragments from the given file in the specified format,
- and append them the current (this) sync map.
+ and add them to the current (this) sync map.
Return ``True`` if the call succeeded,
``False`` if an error occurred.
:param sync_map_format: the format of the sync map
- :type sync_map_format: :class:`aeneas.syncmap.SyncMapFormat` enum
- :param input_file_path: the path to the input file to read
- :type input_file_path: string (path)
- :param parameters: additional parameters (e.g., for SMIL input)
- :type parameters: dict
-
- :raise ValueError: if ``sync_map_format`` is ``None`` or it is not an allowed value
- :raise OSError: if ``input_file_path`` does not exist
+ :type sync_map_format: :class:`~aeneas.syncmap.SyncMapFormat`
+ :param string input_file_path: the path to the input file to read
+ :param dict parameters: additional parameters (e.g., for ``SMIL`` input)
+ :raises: ValueError: if ``sync_map_format`` is ``None`` or it is not an allowed value
+ :raises: OSError: if ``input_file_path`` does not exist
"""
map_read_function = {
+ SyncMapFormat.AUD: partial(self._read_aud, parse_time=gf.time_from_ssmmm),
+ SyncMapFormat.AUDH: partial(self._read_aud, parse_time=gf.time_from_hhmmssmmm),
+ SyncMapFormat.AUDM: partial(self._read_aud, parse_time=gf.time_from_ssmmm),
SyncMapFormat.CSV: partial(self._read_csv, parse_time=gf.time_from_ssmmm),
SyncMapFormat.CSVH: partial(self._read_csv, parse_time=gf.time_from_hhmmssmmm),
SyncMapFormat.CSVM: partial(self._read_csv, parse_time=gf.time_from_ssmmm),
SyncMapFormat.DFXP: self._read_ttml,
+ SyncMapFormat.EAF: self._read_eaf,
SyncMapFormat.JSON: self._read_json,
SyncMapFormat.RBSE: self._read_rbse,
SyncMapFormat.SBV: partial(self._read_sub, use_newline=True),
@@ -723,51 +898,52 @@ def read(self, sync_map_format, input_file_path, parameters=None):
SyncMapFormat.XML_LEGACY: self._read_xml_legacy,
}
if sync_map_format is None:
- raise ValueError("Sync map format is None")
+ self.log_exc(u"Sync map format is None", None, True, ValueError)
if sync_map_format not in map_read_function:
- raise ValueError("Sync map format '%s' is not allowed" % sync_map_format)
+ self.log_exc(u"Sync map format '%s' is not allowed" % (sync_map_format), None, True, ValueError)
if not gf.file_can_be_read(input_file_path):
- raise OSError("Cannot read sync map file '%s' (wrong permissions?)" % input_file_path)
+ self.log_exc(u"Cannot read sync map file '%s'. Wrong permissions?" % (input_file_path), None, True, OSError)
- self._log([u"Input format: '%s'", sync_map_format])
- self._log([u"Input path: '%s'", input_file_path])
- self._log([u"Input parameters: '%s'", parameters])
+ self.log([u"Input format: '%s'", sync_map_format])
+ self.log([u"Input path: '%s'", input_file_path])
+ self.log([u"Input parameters: '%s'", parameters])
# open file for reading
- self._log(u"Opening input file")
+ self.log(u"Opening input file")
with io.open(input_file_path, "r", encoding="utf-8") as input_file:
map_read_function[sync_map_format](input_file)
# overwrite language if requested
language = gf.safe_get(parameters, gc.PPN_SYNCMAP_LANGUAGE, None)
if language is not None:
- self._log([u"Overwriting language to '%s'", language])
+ self.log([u"Overwriting language to '%s'", language])
for fragment in self.fragments:
fragment.text_fragment.language = language
def write(self, sync_map_format, output_file_path, parameters=None):
"""
- Write the current sync map to file in the required format.
+ Write the current sync map to file in the requested format.
Return ``True`` if the call succeeded,
``False`` if an error occurred.
:param sync_map_format: the format of the sync map
- :type sync_map_format: :class:`aeneas.syncmap.SyncMapFormat` enum
- :param output_file_path: the path to the output file to write
- :type output_file_path: string (path)
- :param parameters: additional parameters (e.g., for SMIL output)
- :type parameters: dict
-
- :raise ValueError: if ``sync_map_format`` is ``None`` or it is not an allowed value
- :raise TypeError: if a required parameter is missing
- :raise OSError: if ``output_file_path`` cannot be written
+ :type sync_map_format: :class:`~aeneas.syncmap.SyncMapFormat`
+ :param string output_file_path: the path to the output file to write
+ :param dict parameters: additional parameters (e.g., for ``SMIL`` output)
+ :raises: ValueError: if ``sync_map_format`` is ``None`` or it is not an allowed value
+ :raises: TypeError: if a required parameter is missing
+ :raises: OSError: if ``output_file_path`` cannot be written
"""
map_write_function = {
+ SyncMapFormat.AUD: partial(self._write_aud, format_time=gf.time_to_ssmmm),
+ SyncMapFormat.AUDH: partial(self._write_aud, format_time=gf.time_to_hhmmssmmm),
+ SyncMapFormat.AUDM: partial(self._write_aud, format_time=gf.time_to_ssmmm),
SyncMapFormat.CSV: partial(self._write_csv, format_time=gf.time_to_ssmmm),
SyncMapFormat.CSVH: partial(self._write_csv, format_time=gf.time_to_hhmmssmmm),
SyncMapFormat.CSVM: partial(self._write_csv, format_time=gf.time_to_ssmmm),
SyncMapFormat.DFXP: partial(self._write_ttml, parameters=parameters),
+ SyncMapFormat.EAF: partial(self._write_eaf, parameters=parameters),
SyncMapFormat.JSON: self._write_json,
SyncMapFormat.RBSE: self._write_rbse,
SyncMapFormat.SBV: partial(self._write_sub, use_newline=True),
@@ -792,15 +968,15 @@ def write(self, sync_map_format, output_file_path, parameters=None):
SyncMapFormat.XML_LEGACY: self._write_xml_legacy,
}
if sync_map_format is None:
- raise ValueError("Sync map format is None")
+ self.log_exc(u"Sync map format is None", None, True, ValueError)
if sync_map_format not in map_write_function:
- raise ValueError("Sync map format '%s' is not allowed" % sync_map_format)
+ self.log_exc(u"Sync map format '%s' is not allowed" % (sync_map_format), None, True, ValueError)
if not gf.file_can_be_written(output_file_path):
- raise OSError("Cannot output sync map file '%s' (wrong permissions?)" % output_file_path)
+ self.log_exc(u"Cannot write sync map file '%s'. Wrong permissions?" % (output_file_path), None, True, OSError)
- self._log([u"Output format: '%s'", sync_map_format])
- self._log([u"Output path: '%s'", output_file_path])
- self._log([u"Output parameters: '%s'", parameters])
+ self.log([u"Output format: '%s'", sync_map_format])
+ self.log([u"Output path: '%s'", output_file_path])
+ self.log([u"Output parameters: '%s'", parameters])
# create dir hierarchy, if needed
gf.ensure_parent_directory(output_file_path)
@@ -816,20 +992,48 @@ def write(self, sync_map_format, output_file_path, parameters=None):
gc.PPN_TASK_OS_FILE_SMIL_AUDIO_REF
]:
if gf.safe_get(parameters, key, None) is None:
- msg = u"Parameter %s must be specified for format %s" % (key, sync_map_format)
- self._log(msg, Logger.CRITICAL)
- raise SyncMapMissingParameterError(msg)
+ self.log_exc(u"Parameter %s must be specified for format %s" % (key, sync_map_format), None, True, SyncMapMissingParameterError)
# open file for writing
- self._log(u"Opening output file")
+ self.log(u"Opening output file")
with io.open(output_file_path, "w", encoding="utf-8") as output_file:
map_write_function[sync_map_format](output_file)
+ def _read_aud(self, input_file, parse_time):
+ """ Read from AUD file """
+ identifier_index = 1
+ for line in input_file.readlines():
+ split = line.strip().split("\t")
+ self.add_fragment(
+ SyncMapFragment(
+ text_fragment=TextFragment(
+ identifier=u"f" + str(identifier_index).zfill(6),
+ lines=[split[2]]
+ ),
+ begin=parse_time(split[0]),
+ end=parse_time(split[1])
+ )
+ )
+ identifier_index += 1
+
+ def _write_aud(self, output_file, format_time):
+ """ Write to AUD file """
+ msg = []
+ for fragment in self.fragments:
+ msg.append(
+ u"%s\t%s\t%s" % (
+ format_time(fragment.begin),
+ format_time(fragment.end),
+ u" ".join(fragment.text_fragment.lines)
+ )
+ )
+ output_file.write(u"\n".join(msg))
+
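`_read_aud` and `_write_aud` handle a tab-separated `begin<TAB>end<TAB>text` layout and synthesize `f000001`-style identifiers on read, because the format has no id column. The round-trip sketch below covers the plain seconds variant (`AUD`). It uses plain tuples instead of `SyncMapFragment`. Skipping lines with fewer than three fields is an added assumption: the diff's reader would raise `IndexError` on such lines.

```python
def format_ssmmm(seconds):
    """Format seconds as 'ss.mmm', e.g. 1.234."""
    return "%.3f" % seconds


def write_aud(fragments):
    """fragments: list of (begin, end, lines); return tab-separated text."""
    rows = []
    for begin, end, lines in fragments:
        # multiple text lines are joined with a space, as in _write_aud
        rows.append("%s\t%s\t%s" % (format_ssmmm(begin), format_ssmmm(end), " ".join(lines)))
    return "\n".join(rows)


def read_aud(text):
    """Parse tab-separated text into (identifier, begin, end, label) tuples."""
    fragments = []
    for line in text.splitlines():
        split = line.strip().split("\t")
        if len(split) < 3:
            continue  # skip blank or malformed lines (assumption)
        # AUD has no id column, so identifiers are synthesized
        identifier = "f" + str(len(fragments) + 1).zfill(6)
        fragments.append((identifier, float(split[0]), float(split[1]), split[2]))
    return fragments
```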
def _read_csv(self, input_file, parse_time):
""" Read from CSV file """
for line in input_file.readlines():
split = line.strip().split(u",")
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=split[0],
@@ -854,11 +1058,94 @@ def _write_csv(self, output_file, format_time):
)
output_file.write(u"\n".join(msg))
+ def _read_eaf(self, input_file):
+ """ Read from EAF file """
+ # namespaces
+ xsi = "http://www.w3.org/2001/XMLSchema-instance"
+ ns_map = {"xsi" : xsi}
+ # get root
+ root = etree.fromstring(gf.safe_bytes(input_file.read()))
+ # get time slots
+ time_slots = dict()
+ for ts in root.iter("TIME_SLOT"):
+ time_slots[ts.get("TIME_SLOT_ID")] = gf.time_from_ssmmm(ts.get("TIME_VALUE")) / 1000
+ # parse annotations
+ for alignable in root.iter("ALIGNABLE_ANNOTATION"):
+ identifier = gf.safe_unicode(alignable.get("ANNOTATION_ID"))
+ begin = time_slots[alignable.get("TIME_SLOT_REF1")]
+ end = time_slots[alignable.get("TIME_SLOT_REF2")]
+ lines = []
+ for value in alignable.iter("ANNOTATION_VALUE"):
+ lines.append(gf.safe_unicode(value.text))
+ self.add_fragment(
+ SyncMapFragment(
+ text_fragment=TextFragment(
+ identifier=identifier,
+ lines=lines
+ ),
+ begin=begin,
+ end=end
+ )
+ )
+
+ def _write_eaf(self, output_file, parameters=None):
+ """ Write to EAF file """
+ # namespaces
+ xsi = "http://www.w3.org/2001/XMLSchema-instance"
+ ns_map = {"xsi" : xsi}
+ # build doc
+ doc = etree.Element("ANNOTATION_DOCUMENT", nsmap=ns_map)
+ doc.attrib["{%s}noNamespaceSchemaLocation" % xsi] = "http://www.mpi.nl/tools/elan/EAFv2.8.xsd"
+ doc.attrib["AUTHOR"] = "aeneas"
+ doc.attrib["DATE"] = "2016-01-01T00:00:00+00:00"
+ doc.attrib["FORMAT"] = "2.8"
+ doc.attrib["VERSION"] = "2.8"
+ # header
+ header = etree.SubElement(doc, "HEADER")
+ header.attrib["MEDIA_FILE"] = ""
+ header.attrib["TIME_UNITS"] = "milliseconds"
+ if (parameters is not None) and ("audio_file_path_absolute" in parameters):
+ media = etree.SubElement(header, "MEDIA_DESCRIPTOR")
+ media.attrib["MEDIA_URL"] = "file://%s" % parameters["audio_file_path_absolute"]
+ media.attrib["MIME_TYPE"] = gf.mimetype_from_path(parameters["audio_file_path_absolute"])
+ # time order
+ time_order = etree.SubElement(doc, "TIME_ORDER")
+ # tier
+ tier = etree.SubElement(doc, "TIER")
+ tier.attrib["LINGUISTIC_TYPE_REF"] = "utterance"
+ tier.attrib["TIER_ID"] = "tier1"
+ i = 1
+ for fragment in self.fragments:
+ # time slots
+ begin_id = "ts%06db" % i
+ end_id = "ts%06de" % i
+ slot = etree.SubElement(time_order, "TIME_SLOT")
+ slot.attrib["TIME_SLOT_ID"] = begin_id
+ slot.attrib["TIME_VALUE"] = "%d" % (fragment.begin * 1000)
+ slot = etree.SubElement(time_order, "TIME_SLOT")
+ slot.attrib["TIME_SLOT_ID"] = end_id
+ slot.attrib["TIME_VALUE"] = "%d" % (fragment.end * 1000)
+ # annotation
+ annotation = etree.SubElement(tier, "ANNOTATION")
+ alignable = etree.SubElement(annotation, "ALIGNABLE_ANNOTATION")
+ alignable.attrib["ANNOTATION_ID"] = fragment.text_fragment.identifier
+ alignable.attrib["TIME_SLOT_REF1"] = begin_id
+ alignable.attrib["TIME_SLOT_REF2"] = end_id
+ value = etree.SubElement(alignable, "ANNOTATION_VALUE")
+ value.text = u" ".join(fragment.text_fragment.lines)
+ i += 1
+ # linguistic type
+ ling = etree.SubElement(doc, "LINGUISTIC_TYPE")
+ ling.attrib["LINGUISTIC_TYPE_ID"] = "utterance"
+ ling.attrib["TIME_ALIGNABLE"] = "true"
+ # write tree
+ self._write_tree_to_file(doc, output_file, xml_declaration=True)
+
def _read_json(self, input_file):
""" Read from JSON file """
contents_dict = json.loads(input_file.read())
for fragment in contents_dict["fragments"]:
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=fragment["id"],
@@ -872,13 +1159,13 @@ def _read_json(self, input_file):
def _write_json(self, output_file):
""" Write to JSON file """
- output_file.write(self.to_json())
+ output_file.write(self.json_string)
def _read_rbse(self, input_file):
""" Read from RBSE file """
contents_dict = json.loads(input_file.read())
for fragment in contents_dict["smil_data"]:
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=fragment["id"],
@@ -936,7 +1223,7 @@ def _read_smil(self, input_file):
end = gf.time_from_hhmmssmmm(child.get("clipEnd"))
if end is None:
end = gf.time_from_ssmmm(child.get("clipEnd"))
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=identifier,
@@ -963,21 +1250,56 @@ def _write_smil(self, output_file, format_time, parameters):
smil_elem.attrib["version"] = "3.0"
body_elem = etree.SubElement(smil_elem, "{%s}body" % smil_ns)
seq_elem = etree.SubElement(body_elem, "{%s}seq" % smil_ns)
- seq_elem.attrib["id"] = "s" + str(1).zfill(6)
+ seq_elem.attrib["id"] = "seq" + str(1).zfill(6)
seq_elem.attrib["{%s}textref" % epub_ns] = text_ref
- i = 1
- for fragment in self.fragments:
- text = fragment.text_fragment
- par_elem = etree.SubElement(seq_elem, "{%s}par" % smil_ns)
- par_elem.attrib["id"] = "p" + str(i).zfill(6)
- text_elem = etree.SubElement(par_elem, "{%s}text" % smil_ns)
- text_elem.attrib["src"] = "%s#%s" % (text_ref, text.identifier)
- audio_elem = etree.SubElement(par_elem, "{%s}audio" % smil_ns)
- audio_elem.attrib["src"] = audio_ref
- audio_elem.attrib["clipBegin"] = format_time(fragment.begin)
- audio_elem.attrib["clipEnd"] = format_time(fragment.end)
- i += 1
+ if self.is_single_level:
+ # single level
+ i = 1
+ for fragment in self.fragments:
+ text = fragment.text_fragment
+ par_elem = etree.SubElement(seq_elem, "{%s}par" % smil_ns)
+ par_elem.attrib["id"] = "par" + str(i).zfill(6)
+ text_elem = etree.SubElement(par_elem, "{%s}text" % smil_ns)
+ text_elem.attrib["src"] = "%s#%s" % (text_ref, text.identifier)
+ audio_elem = etree.SubElement(par_elem, "{%s}audio" % smil_ns)
+ audio_elem.attrib["src"] = audio_ref
+ audio_elem.attrib["clipBegin"] = format_time(fragment.begin)
+ audio_elem.attrib["clipEnd"] = format_time(fragment.end)
+ i += 1
+ else:
+ # TODO support generic multiple levels
+ # multiple levels
+ par_index = 1
+ for par_child in self.fragments_tree.children_not_empty:
+ par_seq_elem = etree.SubElement(seq_elem, "{%s}seq" % smil_ns)
+ #par_seq_elem.attrib["id"] = "p" + str(par_index).zfill(6)
+ par_seq_elem.attrib["{%s}type" % epub_ns] = "paragraph"
+ par_seq_elem.attrib["{%s}textref" % epub_ns] = text_ref + "#" + par_child.value.text_fragment.identifier
+ sen_index = 1
+ for sen_child in par_child.children_not_empty:
+ sen_seq_elem = etree.SubElement(par_seq_elem, "{%s}seq" % smil_ns)
+ #sen_seq_elem.attrib["id"] = par_seq_elem.attrib["id"] + "s" + str(sen_index).zfill(6)
+ sen_seq_elem.attrib["{%s}type" % epub_ns] = "sentence"
+ sen_seq_elem.attrib["{%s}textref" % epub_ns] = text_ref + "#" + sen_child.value.text_fragment.identifier
+ wor_index = 1
+ for wor_child in sen_child.children_not_empty:
+ fragment = wor_child.value
+ text = fragment.text_fragment
+ wor_seq_elem = etree.SubElement(sen_seq_elem, "{%s}seq" % smil_ns)
+ #wor_seq_elem.attrib["id"] = sen_seq_elem.attrib["id"] + "s" + str(wor_index).zfill(6)
+ wor_seq_elem.attrib["{%s}type" % epub_ns] = "word"
+ wor_seq_elem.attrib["{%s}textref" % epub_ns] = text_ref + "#" + text.identifier
+ wor_par_elem = etree.SubElement(wor_seq_elem, "{%s}par" % smil_ns)
+ text_elem = etree.SubElement(wor_par_elem, "{%s}text" % smil_ns)
+ text_elem.attrib["src"] = "%s#%s" % (text_ref, text.identifier)
+ audio_elem = etree.SubElement(wor_par_elem, "{%s}audio" % smil_ns)
+ audio_elem.attrib["src"] = audio_ref
+ audio_elem.attrib["clipBegin"] = format_time(fragment.begin)
+ audio_elem.attrib["clipEnd"] = format_time(fragment.end)
+ wor_index += 1
+ sen_index += 1
+ par_index += 1
# write tree
self._write_tree_to_file(smil_elem, output_file, xml_declaration=False)
@@ -1006,7 +1328,7 @@ def _read_srt(self, input_file):
# should never happen, but just in case...
if len(fragment_lines) == 0:
fragment_lines = [u""]
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=identifier,
@@ -1068,7 +1390,7 @@ def _read_sub(self, input_file, use_newline=False):
# should never happen, but just in case...
if len(fragment_lines) == 0:
fragment_lines = [u""]
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=identifier,
@@ -1105,7 +1427,7 @@ def _read_ssv(self, input_file, parse_time):
""" Read from SSV file """
for line in input_file.readlines():
split = line.strip().split(" ")
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=split[2],
@@ -1134,7 +1456,7 @@ def _read_tsv(self, input_file, parse_time):
""" Read from TSV file """
for line in input_file.readlines():
split = line.strip().split("\t")
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=split[2],
@@ -1169,7 +1491,7 @@ def _read_ttml(self, input_file):
begin = gf.time_from_ttml(elem.get("begin"))
end = gf.time_from_ttml(elem.get("end"))
fragment_lines = self._get_lines_from_node_text(elem)
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=identifier,
@@ -1204,17 +1526,37 @@ def _write_ttml(self, output_file, parameters):
#head_elem = etree.SubElement(tt_elem, "{%s}head" % ttml_ns)
body_elem = etree.SubElement(tt_elem, "{%s}body" % ttml_ns)
div_elem = etree.SubElement(body_elem, "{%s}div" % ttml_ns)
- for fragment in self.fragments:
- text = fragment.text_fragment
- p_string = u"<p xml:id=\"%s\" begin=\"%s\" end=\"%s\">%s</p>" % (
- text.identifier,
- gf.time_to_ttml(fragment.begin),
- gf.time_to_ttml(fragment.end),
- u" ".join(text.lines)
- )
- p_elem = etree.fromstring(p_string)
- div_elem.append(p_elem)
+ if self.is_single_level:
+ # single level
+ for fragment in self.fragments:
+ text = fragment.text_fragment
+ p_string = u"<p xml:id=\"%s\" begin=\"%s\" end=\"%s\">%s</p>" % (
+ text.identifier,
+ gf.time_to_ttml(fragment.begin),
+ gf.time_to_ttml(fragment.end),
+ u" ".join(text.lines)
+ )
+ p_elem = etree.fromstring(p_string)
+ div_elem.append(p_elem)
+ else:
+ # TODO support generic multiple levels
+ # multiple levels
+ for par_child in self.fragments_tree.children_not_empty:
+ text = par_child.value.text_fragment
+ p_elem = etree.SubElement(div_elem, "{%s}p" % ttml_ns)
+ p_elem.attrib["id"] = text.identifier
+ for sen_child in par_child.children_not_empty:
+ text = sen_child.value.text_fragment
+ sen_span_elem = etree.SubElement(p_elem, "{%s}span" % ttml_ns)
+ sen_span_elem.attrib["id"] = text.identifier
+ for wor_child in sen_child.children_not_empty:
+ fragment = wor_child.value
+ wor_span_elem = etree.SubElement(sen_span_elem, "{%s}span" % ttml_ns)
+ wor_span_elem.attrib["id"] = fragment.text_fragment.identifier
+ wor_span_elem.attrib["begin"] = gf.time_to_ttml(fragment.begin)
+ wor_span_elem.attrib["end"] = gf.time_to_ttml(fragment.end)
+ wor_span_elem.text = u" ".join(fragment.text_fragment.lines)
# write tree
self._write_tree_to_file(tt_elem, output_file)
@@ -1222,7 +1564,7 @@ def _read_txt(self, input_file, parse_time):
""" Read from TXT file """
for line in input_file.readlines():
split = line.strip().split(" ")
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=split[0],
@@ -1273,7 +1615,7 @@ def _read_vtt(self, input_file):
# should never happen, but just in case...
if len(fragment_lines) == 0:
fragment_lines = [u""]
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=identifier,
@@ -1316,7 +1658,7 @@ def _read_xml(self, input_file):
for child in frag:
if child.tag == "line":
lines.append(gf.safe_unicode(child.text))
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=identifier,
@@ -1329,15 +1671,21 @@ def _read_xml(self, input_file):
def _write_xml(self, output_file):
""" Write to XML file """
+ def visit_children(node, parent_elem):
+ """ Recursively visit the fragments_tree """
+ for child in node.children_not_empty:
+ fragment = child.value
+ fragment_elem = etree.SubElement(parent_elem, "fragment")
+ fragment_elem.attrib["id"] = fragment.text_fragment.identifier
+ fragment_elem.attrib["begin"] = gf.time_to_ssmmm(fragment.begin)
+ fragment_elem.attrib["end"] = gf.time_to_ssmmm(fragment.end)
+ for line in fragment.text_fragment.lines:
+ line_elem = etree.SubElement(fragment_elem, "line")
+ line_elem.text = line
+ children_elem = etree.SubElement(fragment_elem, "children")
+ visit_children(child, children_elem)
map_elem = etree.Element("map")
- for fragment in self.fragments:
- fragment_elem = etree.SubElement(map_elem, "fragment")
- fragment_elem.attrib["id"] = fragment.text_fragment.identifier
- fragment_elem.attrib["begin"] = gf.time_to_ssmmm(fragment.begin)
- fragment_elem.attrib["end"] = gf.time_to_ssmmm(fragment.end)
- for line in fragment.text_fragment.lines:
- line_elem = etree.SubElement(fragment_elem, "line")
- line_elem.text = line
+ visit_children(self.fragments_tree, map_elem)
self._write_tree_to_file(map_elem, output_file)
def _read_xml_legacy(self, input_file):
@@ -1351,7 +1699,7 @@ def _read_xml_legacy(self, input_file):
begin = gf.time_from_ssmmm(child.text)
elif child.tag == "end":
end = gf.time_from_ssmmm(child.text)
- self.append_fragment(
+ self.add_fragment(
SyncMapFragment(
text_fragment=TextFragment(
identifier=identifier,
@@ -1385,7 +1733,7 @@ def _write_tree_to_file(
xml_declaration=True
):
"""
- Write an lxml tree to the given output file.
+ Write an ``lxml`` tree to the given output file.
"""
string = etree.tostring(
root_element,
@@ -1399,8 +1747,8 @@ def _write_tree_to_file(
@classmethod
def _get_lines_from_node_text(cls, node):
"""
- Given an lxml node, get lines from node.text,
- where the line separator is " ".
+ Given an ``lxml`` node, get lines from ``node.text``,
+ where the line separator is `` ``.
"""
# TODO more robust parsing
parts = ([node.text] + list(chain(*([etree.tostring(c, with_tail=False), c.tail] for c in node.getchildren()))) + [node.tail])
@@ -1417,16 +1765,15 @@ def _get_lines_from_node_text(cls, node):
class SyncMapFragment(object):
"""
A sync map fragment, that is,
- a text fragment and an associated time interval.
+ a text fragment and an associated time interval ``[begin, end]``.
:param text_fragment: the text fragment
- :type text_fragment: :class:`aeneas.textfile.TextFragment`
+ :type text_fragment: :class:`~aeneas.textfile.TextFragment`
:param begin: the begin time of the audio interval
- :type begin: float
+ :type begin: :class:`~aeneas.timevalue.TimeValue`
:param end: the end time of the audio interval
- :type end: float
- :param confidence: the confidence of the audio timing
- :type confidence: float
+ :type end: :class:`~aeneas.timevalue.TimeValue`
+ :param float confidence: the confidence of the audio timing
"""
TAG = u"SyncMapFragment"
@@ -1444,35 +1791,21 @@ def __init__(
self.confidence = confidence
def __unicode__(self):
- return u"%s %.3f %.3f %.3f" % (
+ return u"%s %.3f %.3f" % (
self.text_fragment.identifier,
self.begin,
- self.end,
- self.confidence
+ self.end
)
def __str__(self):
return gf.safe_str(self.__unicode__())
- def __len__(self):
- return self.end - self.begin
-
- @property
- def audio_duration(self):
- """
- The audio duration of this sync map fragment,
- as end time minus begin time.
-
- :rtype: float
- """
- return len(self)
-
@property
def text_fragment(self):
"""
The text fragment associated with this sync map fragment.
- :rtype: :class:`aeneas.textfile.TextFragment`
+ :rtype: :class:`~aeneas.textfile.TextFragment`
"""
return self.__text_fragment
@text_fragment.setter
@@ -1484,7 +1817,7 @@ def begin(self):
"""
The begin time of this sync map fragment.
- :rtype: float
+ :rtype: :class:`~aeneas.timevalue.TimeValue`
"""
return self.__begin
@begin.setter
@@ -1496,7 +1829,7 @@ def end(self):
"""
The end time of this sync map fragment.
- :rtype: float
+ :rtype: :class:`~aeneas.timevalue.TimeValue`
"""
return self.__end
@end.setter
@@ -1506,9 +1839,9 @@ def end(self, end):
@property
def confidence(self):
"""
- The confidence of the audio timing, from 0 to 1.
+ The confidence of the audio timing, from ``0.0`` to ``1.0``.
- NOTE: currently always set to 1.0.
+ Currently this value is not used, and it is always ``1.0``.
:rtype: float
"""
@@ -1517,6 +1850,45 @@ def confidence(self):
def confidence(self, confidence):
self.__confidence = confidence
+ @property
+ def audio_duration(self):
+ """
+ The audio duration of this sync map fragment,
+ as end time minus begin time.
+
+ :rtype: :class:`~aeneas.timevalue.TimeValue`
+ """
+ if (self.begin is None) or (self.end is None):
+ return TimeValue("0.000")
+ return self.end - self.begin
+
+ @property
+ def chars(self):
+ """
+ Return the number of characters of the text fragment,
+ not including the line separators.
+
+ :rtype: int
+
+ .. versionadded:: 1.2.0
+ """
+ if self.text_fragment is None:
+ return 0
+ return self.text_fragment.chars
+
+ @property
+ def rate(self):
+ """
+ The rate, in characters/second, of this fragment.
+
+ :rtype: None or Decimal
+
+ .. versionadded:: 1.2.0
+ """
+ if self.audio_duration == TimeValue("0.000"):
+ return None
+ return Decimal(self.chars / self.audio_duration)
+
class SyncMapHeadTailFormat(object):
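The AUD reader and writer added to `aeneas/syncmap.py` treat each line as `begin<TAB>end<TAB>text` and generate identifiers `f000001`, `f000002`, and so on, while the new `rate` property divides the character count by the audio duration and returns `None` for a zero-length interval. The standalone sketch below illustrates both ideas under stated assumptions: it uses plain tuples and `decimal.Decimal` instead of aeneas's `SyncMapFragment` and `TimeValue`, it skips blank lines, and it formats times with three decimals (the real `format_time` callable passed to `_write_aud` may differ). The helper names `read_aud`, `write_aud` and `rate` are hypothetical.

```python
from decimal import Decimal


def read_aud(text):
    """Parse AUD text: one 'begin<TAB>end<TAB>label' entry per line.

    Returns (identifier, begin, end, label) tuples with identifiers
    f000001, f000002, ... as in _read_aud. Skipping blank lines is an
    assumption of this sketch, not behavior taken from aeneas.
    """
    fragments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        begin, end, label = line.strip().split("\t")
        identifier = u"f" + str(len(fragments) + 1).zfill(6)
        fragments.append((identifier, Decimal(begin), Decimal(end), label))
    return fragments


def write_aud(fragments):
    """Serialize tuples back to AUD text; identifiers are not written.

    Three-decimal times are an assumption (aeneas passes a format_time callable).
    """
    return u"\n".join(
        u"%.3f\t%.3f\t%s" % (begin, end, label)
        for _identifier, begin, end, label in fragments
    )


def rate(begin, end, label):
    """Characters per second, or None when the interval has zero length."""
    duration = end - begin
    if duration == 0:
        return None
    return Decimal(len(label)) / duration


# Usage: a two-line AUD document survives a read/write round trip.
sample = u"0.000\t1.500\tHello world\n1.500\t3.000\tSecond line"
fragments = read_aud(sample)
print(fragments[0][0])                 # f000001
print(write_aud(fragments) == sample)  # True
print(rate(*fragments[0][1:]))
```

Returning `None` instead of raising keeps zero-length fragments (which can occur at the head or tail of a sync map) from aborting a whole rate computation.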
diff --git a/aeneas/synthesizer.py b/aeneas/synthesizer.py
index 8cea7c87..abfa3aa8 100644
--- a/aeneas/synthesizer.py
+++ b/aeneas/synthesizer.py
@@ -2,15 +2,21 @@
# coding=utf-8
"""
-A class to synthesize text fragments into
-a single ``wav`` file,
-along with the corresponding time anchors.
+This module contains the following classes:
+
+* :class:`~aeneas.synthesizer.Synthesizer`,
+ for synthesizing text fragments into an audio file,
+ along with the corresponding time anchors.
+
+.. warning:: This module might be refactored in a future version.
"""
from __future__ import absolute_import
from __future__ import print_function
from aeneas.espeakwrapper import ESPEAKWrapper
-from aeneas.logger import Logger
+from aeneas.festivalwrapper import FESTIVALWrapper
+from aeneas.logger import Loggable
+from aeneas.nuancettsapiwrapper import NuanceTTSAPIWrapper
from aeneas.runtimeconfiguration import RuntimeConfiguration
from aeneas.textfile import TextFile
import aeneas.globalfunctions as gf
@@ -22,32 +28,98 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class Synthesizer(object):
+class Synthesizer(Loggable):
"""
A class to synthesize text fragments into
- a single ``wav`` file,
+ an audio file,
along with the corresponding time anchors.
- :param rconf: a runtime configuration. Default: ``None``, meaning that
- default settings will be used.
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
+ :type logger: :class:`~aeneas.logger.Logger`
+ :raises: OSError: if a custom TTS engine is requested but it cannot be loaded
+ :raises: ImportError: if the Nuance TTS API wrapper is requested but
+ the ``requests`` module is not installed
"""
+ CUSTOM = "custom"
+ """ Select custom TTS engine wrapper """
+
+ ESPEAK = "espeak"
+ """ Select eSpeak wrapper """
+
+ FESTIVAL = "festival"
+ """ Select Festival wrapper """
+
+ NUANCETTSAPI = "nuancettsapi"
+ """ Select Nuance TTS API wrapper """
+
+ ALLOWED_VALUES = [CUSTOM, ESPEAK, FESTIVAL, NUANCETTSAPI]
+ """ List of all the allowed values """
+
TAG = u"Synthesizer"
def __init__(self, rconf=None, logger=None):
- self.logger = logger or Logger()
- self.rconf = rconf or RuntimeConfiguration()
+ super(Synthesizer, self).__init__(rconf=rconf, logger=logger)
+ self.tts_engine = None
+ self._select_tts_engine()
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
+ def _select_tts_engine(self):
+ """
+ Select the TTS engine to be used by looking at the rconf object.
+ """
+ self.log(u"Selecting TTS engine...")
+ if self.rconf[RuntimeConfiguration.TTS] == self.CUSTOM:
+ self.log(u"TTS engine: custom")
+ tts_path = self.rconf[RuntimeConfiguration.TTS_PATH]
+ if not gf.file_can_be_read(tts_path):
+ self.log_exc(u"Cannot read tts_path", None, True, OSError)
+ try:
+ import imp
+ self.log([u"Loading CustomTTSWrapper module from '%s'...", tts_path])
+ imp.load_source("CustomTTSWrapperModule", tts_path)
+ self.log([u"Loading CustomTTSWrapper module from '%s'... done", tts_path])
+ self.log(u"Importing CustomTTSWrapper...")
+ from CustomTTSWrapperModule import CustomTTSWrapper
+ self.log(u"Importing CustomTTSWrapper... done")
+ self.log(u"Creating CustomTTSWrapper instance...")
+ self.tts_engine = CustomTTSWrapper(rconf=self.rconf, logger=self.logger)
+ self.log(u"Creating CustomTTSWrapper instance... done")
+ except Exception as exc:
+ self.log_exc(u"Unable to load custom TTS wrapper", exc, True, OSError)
+ elif self.rconf[RuntimeConfiguration.TTS] == self.FESTIVAL:
+ self.log(u"TTS engine: Festival")
+ self.tts_engine = FESTIVALWrapper(rconf=self.rconf, logger=self.logger)
+ elif self.rconf[RuntimeConfiguration.TTS] == self.NUANCETTSAPI:
+ try:
+ import requests
+ except ImportError as exc:
+ self.log_exc(u"Unable to import requests for Nuance TTS API wrapper", exc, True, ImportError)
+ self.log(u"TTS engine: Nuance TTS API")
+ self.tts_engine = NuanceTTSAPIWrapper(rconf=self.rconf, logger=self.logger)
+ else:
+ self.log(u"TTS engine: eSpeak")
+ self.tts_engine = ESPEAKWrapper(rconf=self.rconf, logger=self.logger)
+ self.log(u"Selecting TTS engine... done")
+
+ def output_is_mono_wave(self):
+ """
+ Return ``True`` if the TTS engine
+ outputs a PCM16 mono WAVE file.
+
+ This information can be used to avoid
+ converting the audio file output by the TTS engine.
+
+ :rtype: bool
+ """
+ if self.tts_engine is not None:
+ return self.tts_engine.OUTPUT_MONO_WAVE
+ return False
def synthesize(
self,
@@ -60,39 +132,41 @@ def synthesize(
Synthesize the text contained in the given fragment list
into a ``wav`` file.
+ Return a tuple ``(anchors, total_time, num_chars)``.
+
:param text_file: the text file to be synthesized
- :type text_file: :class:`aeneas.textfile.TextFile`
- :param audio_file_path: the path to the output audio file
- :type audio_file_path: string (path)
- :param quit_after: stop synthesizing as soon as
- reaching this many seconds
- :type quit_after: float
- :param backwards: if ``True``, synthesizing from the end of the text file
- :type backwards: bool
-
- :raise TypeError: if ``text_file`` is ``None`` or not an instance of ``TextFile``
- :raise OSError: if ``audio_file_path`` cannot be written
+ :type text_file: :class:`~aeneas.textfile.TextFile`
+ :param string audio_file_path: the path to the output audio file
+ :param float quit_after: stop synthesizing as soon as
+ this many seconds have been reached
+ :param bool backwards: if ``True``, synthesize starting from the end of the text file
+ :rtype: tuple
+ :raises: TypeError: if ``text_file`` is ``None`` or not an instance of ``TextFile``
+ :raises: OSError: if ``audio_file_path`` cannot be written
+ :raises: OSError: if ``tts=custom`` in the RuntimeConfiguration and ``tts_path`` cannot be read
"""
if text_file is None:
- raise TypeError("text_file is None")
+ self.log_exc(u"text_file is None", None, True, TypeError)
if not isinstance(text_file, TextFile):
- raise TypeError("text_file is not an instance of TextFile")
+ self.log_exc(u"text_file is not an instance of TextFile", None, True, TypeError)
if not gf.file_can_be_written(audio_file_path):
- raise OSError("audio_file_path cannot be written")
+ self.log_exc(u"Audio file path '%s' cannot be written" % (audio_file_path), None, True, OSError)
+ if self.tts_engine is None:
+ self.log_exc(u"Cannot select the TTS engine", None, True, ValueError)
- # at the moment only espeak TTS is supported
- self._log(u"Synthesizing using espeak...")
- espeak = ESPEAKWrapper(rconf=self.rconf, logger=self.logger)
- result = espeak.synthesize_multiple(
+ # synthesize
+ self.log(u"Synthesizing text...")
+ result = self.tts_engine.synthesize_multiple(
text_file=text_file,
output_file_path=audio_file_path,
quit_after=quit_after,
backwards=backwards
)
- self._log(u"Synthesizing using espeak... done")
+ self.log(u"Synthesizing text... done")
+ # check that the output file has been written
if not gf.file_exists(audio_file_path):
- raise OSError("audio_file_path was not written")
+ self.log_exc(u"Audio file path '%s' cannot be read" % (audio_file_path), None, True, OSError)
return result
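`Synthesizer._select_tts_engine` picks a wrapper by comparing the `tts` value of the runtime configuration against `custom`, `festival` and `nuancettsapi`, falls back to eSpeak for anything else, and loads a custom wrapper from `tts_path` with `imp.load_source`. The `imp` module was deprecated in Python 3.4 and removed in Python 3.12, so the sketch below shows the same dispatch using `importlib.util` instead. It is a minimal illustration, not the aeneas implementation: the stub classes and the helpers `load_custom_wrapper` and `select_engine` are hypothetical, and the Nuance branch (which needs `requests`) is omitted.

```python
import importlib.util
import os
import tempfile


# Hypothetical stand-ins for ESPEAKWrapper and FESTIVALWrapper.
class EspeakStub(object):
    OUTPUT_MONO_WAVE = True


class FestivalStub(object):
    OUTPUT_MONO_WAVE = True


# Map rconf "tts" values to wrapper classes; "custom" is handled separately.
ENGINES = {"espeak": EspeakStub, "festival": FestivalStub}


def load_custom_wrapper(tts_path, class_name="CustomTTSWrapper"):
    """Load class_name from the Python source file at tts_path via importlib."""
    if tts_path is None or not os.path.isfile(tts_path):
        raise OSError("Cannot read tts_path: %s" % tts_path)
    spec = importlib.util.spec_from_file_location("CustomTTSWrapperModule", tts_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file, defining its classes
    return getattr(module, class_name)


def select_engine(tts, tts_path=None):
    """Return an engine instance; unknown values fall back to eSpeak."""
    if tts == "custom":
        return load_custom_wrapper(tts_path)()
    return ENGINES.get(tts, EspeakStub)()


# Usage: write a throw-away custom wrapper module, load it, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write("class CustomTTSWrapper(object):\n    OUTPUT_MONO_WAVE = False\n")
engine = select_engine("custom", handle.name)
os.remove(handle.name)
print(type(engine).__name__, engine.OUTPUT_MONO_WAVE)  # CustomTTSWrapper False
```

Keeping the built-in engines in a dictionary makes the fall-back explicit through `dict.get`, mirroring the final `else` branch of the original method.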
diff --git a/aeneas/task.py b/aeneas/task.py
index b1ffc76a..09cc2f87 100644
--- a/aeneas/task.py
+++ b/aeneas/task.py
@@ -2,19 +2,23 @@
# coding=utf-8
"""
-A structure representing a task, that is,
-an audio file and a list of text fragments
-to be synchronized.
+This module contains the following classes:
+
+* :class:`~aeneas.task.Task`, representing a task;
+* :class:`~aeneas.task.TaskConfiguration`, representing a task configuration.
"""
from __future__ import absolute_import
from __future__ import print_function
import os
+from aeneas.adjustboundaryalgorithm import AdjustBoundaryAlgorithm
from aeneas.audiofile import AudioFile
-from aeneas.configurationobject import ConfigurationObject
-from aeneas.logger import Logger
+from aeneas.configuration import Configuration
+from aeneas.logger import Loggable
from aeneas.textfile import TextFile
+from aeneas.timevalue import Decimal
+from aeneas.timevalue import TimeValue
import aeneas.globalconstants as gc
import aeneas.globalfunctions as gf
@@ -25,29 +29,29 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
-class Task(object):
+class Task(Loggable):
"""
A structure representing a task, that is,
- an audio file and a list of text fragments
+ an audio file and an ordered set of text fragments
to be synchronized.
- :param config_string: the task configuration string
- :type config_string: string
+ :param string config_string: the task configuration string
+ :param rconf: a runtime configuration
+ :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
:param logger: the logger object
- :type logger: :class:`aeneas.logger.Logger`
-
- :raises TypeError: if ``config_string`` is not ``None`` and
- it is not a Unicode string
+ :type logger: :class:`~aeneas.logger.Logger`
+ :raises: TypeError: if ``config_string`` is not ``None`` and
+ it is not a Unicode string
"""
TAG = u"Task"
- def __init__(self, config_string=None, logger=None):
- self.logger = logger or Logger()
+ def __init__(self, config_string=None, rconf=None, logger=None):
+ super(Task, self).__init__(rconf=rconf, logger=logger)
self.identifier = gf.uuid_string()
self.configuration = None
self.audio_file_path = None # relative to input container root
@@ -62,10 +66,6 @@ def __init__(self, config_string=None, logger=None):
if config_string is not None:
self.configuration = TaskConfiguration(config_string)
- def _log(self, message, severity=Logger.DEBUG):
- """ Log """
- self.logger.log(message, severity, self.TAG)
-
def __unicode__(self):
msg = [
u"%s: '%s'" % (gc.RPN_TASK_IDENTIFIER, self.identifier),
@@ -99,7 +99,7 @@ def audio_file_path_absolute(self):
"""
The absolute path of the audio file.
- :rtype: string (path)
+ :rtype: string
"""
return self.__audio_file_path_absolute
@audio_file_path_absolute.setter
@@ -112,7 +112,7 @@ def text_file_path_absolute(self):
"""
The absolute path of the text file.
- :rtype: string (path)
+ :rtype: string
"""
return self.__text_file_path_absolute
@text_file_path_absolute.setter
@@ -125,7 +125,7 @@ def sync_map_file_path_absolute(self):
"""
The absolute path of the sync map file.
- :rtype: string (path)
+ :rtype: string
"""
return self.__sync_map_file_path_absolute
@sync_map_file_path_absolute.setter
@@ -143,51 +143,51 @@ def output_sync_map_file(self, container_root_path=None):
of the sync map inside the container.
Otherwise, the sync map file will be created at the path
- ``sync_map_file_path_absolute``.
+ ``self.sync_map_file_path_absolute``.
Return the path of the sync map file created,
or ``None`` if an error occurred.
- :param container_root_path: the path to the root directory
- for the output container
- :type container_root_path: string (path)
- :rtype: string (path)
+ :param string container_root_path: the path to the root directory
+ for the output container
+ :rtype: string
"""
if self.sync_map is None:
- self._log(u"sync_map is None", Logger.CRITICAL)
- raise TypeError("sync_map object has not been set")
+ self.log_exc(u"The sync_map object has not been set", None, True, TypeError)
if (container_root_path is not None) and (self.sync_map_file_path is None):
- self._log(u"The (internal) path of the sync map has been set", Logger.CRITICAL)
- raise TypeError("The (internal) path of the sync map has been set")
+ self.log_exc(u"The (internal) path of the sync map has not been set", None, True, TypeError)
- self._log([u"container_root_path is %s", container_root_path])
- self._log([u"self.sync_map_file_path is %s", self.sync_map_file_path])
- self._log([u"self.sync_map_file_path_absolute is %s", self.sync_map_file_path_absolute])
+ self.log([u"container_root_path is %s", container_root_path])
+ self.log([u"self.sync_map_file_path is %s", self.sync_map_file_path])
+ self.log([u"self.sync_map_file_path_absolute is %s", self.sync_map_file_path_absolute])
if (container_root_path is not None) and (self.sync_map_file_path is not None):
path = os.path.join(container_root_path, self.sync_map_file_path)
elif self.sync_map_file_path_absolute:
path = self.sync_map_file_path_absolute
gf.ensure_parent_directory(path)
- self._log([u"Output sync map to %s", path])
+ self.log([u"Output sync map to %s", path])
sync_map_format = self.configuration["o_format"]
audio_ref = self.configuration["o_smil_audio_ref"]
page_ref = self.configuration["o_smil_page_ref"]
- self._log([u"sync_map_format is %s", sync_map_format])
- self._log([u"page_ref is %s", page_ref])
- self._log([u"audio_ref is %s", audio_ref])
+ self.log([u"sync_map_format is %s", sync_map_format])
+ self.log([u"page_ref is %s", page_ref])
+ self.log([u"audio_ref is %s", audio_ref])
- self._log(u"Calling sync_map.write...")
- # TODO just pass self.configuration?
+ self.log(u"Calling sync_map.write...")
+ afpa = self.audio_file_path_absolute
+ if afpa is not None:
+ afpa = os.path.abspath(afpa)
parameters = {
+ "audio_file_path_absolute": afpa,
gc.PPN_TASK_OS_FILE_SMIL_PAGE_REF : page_ref,
gc.PPN_TASK_OS_FILE_SMIL_AUDIO_REF : audio_ref
}
self.sync_map.write(sync_map_format, path, parameters)
- self._log(u"Calling sync_map.write... done")
+ self.log(u"Calling sync_map.write... done")
return path
def _populate_audio_file(self):
@@ -195,34 +195,37 @@ def _populate_audio_file(self):
Create the ``self.audio_file`` object by reading
the audio file at ``self.audio_file_path_absolute``.
"""
- self._log(u"Populate audio file...")
+ self.log(u"Populate audio file...")
if self.audio_file_path_absolute is not None:
- self._log([u"audio_file_path_absolute is '%s'", self.audio_file_path_absolute])
+ self.log([u"audio_file_path_absolute is '%s'", self.audio_file_path_absolute])
self.audio_file = AudioFile(
file_path=self.audio_file_path_absolute,
logger=self.logger
)
self.audio_file.read_properties()
else:
- self._log(u"audio_file_path_absolute is None")
- self._log(u"Populate audio file... done")
+ self.log(u"audio_file_path_absolute is None")
+ self.log(u"Populate audio file... done")
def _populate_text_file(self):
"""
Create the ``self.text_file`` object by reading
the text file at ``self.text_file_path_absolute``.
"""
- self._log(u"Populate text file...")
+ self.log(u"Populate text file...")
if (
(self.text_file_path_absolute is not None) and
(self.configuration["language"] is not None)
):
- # TODO just pass self.configuration?
# the following values might be None
parameters = {
gc.PPN_TASK_IS_TEXT_FILE_IGNORE_REGEX : self.configuration["i_t_ignore_regex"],
gc.PPN_TASK_IS_TEXT_FILE_TRANSLITERATE_MAP : self.configuration["i_t_transliterate_map"],
- gc.PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX :self.configuration["i_t_unparsed_class_regex"],
+ gc.PPN_TASK_IS_TEXT_MPLAIN_WORD_SEPARATOR : self.configuration["i_t_mplain_word_separator"],
+ gc.PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX : self.configuration["i_t_munparsed_l1_id_regex"],
+ gc.PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX : self.configuration["i_t_munparsed_l2_id_regex"],
+ gc.PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX : self.configuration["i_t_munparsed_l3_id_regex"],
+ gc.PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX : self.configuration["i_t_unparsed_class_regex"],
gc.PPN_TASK_IS_TEXT_UNPARSED_ID_REGEX : self.configuration["i_t_unparsed_id_regex"],
gc.PPN_TASK_IS_TEXT_UNPARSED_ID_SORT : self.configuration["i_t_unparsed_id_sort"],
gc.PPN_TASK_OS_FILE_ID_REGEX : self.configuration["o_id_regex"]
@@ -235,90 +238,121 @@ def _populate_text_file(self):
)
self.text_file.set_language(self.configuration["language"])
else:
- self._log(u"text_file_path_absolute and/or language is None")
- self._log(u"Populate text file... done")
+ self.log(u"text_file_path_absolute and/or language is None")
+ self.log(u"Populate text file... done")
-class TaskConfiguration(ConfigurationObject):
+class TaskConfiguration(Configuration):
"""
A structure representing a configuration for a task, that is,
a series of directives for I/O and processing the task.
Allowed keys:
- * ``PPN_TASK_CUSTOM_ID`` or ``custom_id``
- * ``PPN_TASK_DESCRIPTION`` or ``description``
- * ``PPN_TASK_LANGUAGE`` or ``language``
- * ``PPN_TASK_ADJUST_BOUNDARY_AFTERCURRENT_VALUE`` or ``aba_aftercurrent_value``
- * ``PPN_TASK_ADJUST_BOUNDARY_ALGORITHM`` or ``aba_algorithm``
- * ``PPN_TASK_ADJUST_BOUNDARY_BEFORENEXT_VALUE`` or ``aba_beforenext_value``
- * ``PPN_TASK_ADJUST_BOUNDARY_OFFSET_VALUE`` or ``aba_offset_value``
- * ``PPN_TASK_ADJUST_BOUNDARY_PERCENT_VALUE`` or ``aba_percent_value``
- * ``PPN_TASK_ADJUST_BOUNDARY_RATE_VALUE`` or ``aba_rate_value``
- * ``PPN_TASK_IS_AUDIO_FILE_DETECT_HEAD_MAX`` or ``i_a_head_max``
- * ``PPN_TASK_IS_AUDIO_FILE_DETECT_HEAD_MIN`` or ``i_a_head_min``
- * ``PPN_TASK_IS_AUDIO_FILE_DETECT_TAIL_MAX`` or ``i_a_tail_max``
- * ``PPN_TASK_IS_AUDIO_FILE_DETECT_TAIL_MIN`` or ``i_a_tail_min``
- * ``PPN_TASK_IS_AUDIO_FILE_HEAD_LENGTH`` or ``i_a_head``
- * ``PPN_TASK_IS_AUDIO_FILE_PROCESS_LENGTH`` or ``i_a_process``
- * ``PPN_TASK_IS_AUDIO_FILE_TAIL_LENGTH`` or ``i_a_tail``
- * ``PPN_TASK_IS_TEXT_FILE_FORMAT`` or ``i_t_format``
- * ``PPN_TASK_IS_TEXT_FILE_IGNORE_REGEX`` or ``i_t_ignore_regex``
- * ``PPN_TASK_IS_TEXT_FILE_TRANSLITERATE_MAP`` or ``i_t_transliterate_map``
- * ``PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX`` or ``i_t_unparsed_class_regex``
- * ``PPN_TASK_IS_TEXT_UNPARSED_ID_REGEX`` or ``i_t_unparsed_id_regex``
- * ``PPN_TASK_IS_TEXT_UNPARSED_ID_SORT`` or ``i_t_unparsed_id_sort``
- * ``PPN_TASK_OS_FILE_FORMAT`` or ``o_format``
- * ``PPN_TASK_OS_FILE_HEAD_TAIL_FORMAT`` or ``o_h_t_format``
- * ``PPN_TASK_OS_FILE_ID_REGEX`` or ``o_id_regex``
- * ``PPN_TASK_OS_FILE_NAME`` or ``o_name``
- * ``PPN_TASK_OS_FILE_SMIL_AUDIO_REF`` or ``o_smil_audio_ref``
- * ``PPN_TASK_OS_FILE_SMIL_PAGE_REF`` or ``o_smil_page_ref``
-
- :param config_string: the job configuration string
- :type config_string: Unicode string
-
- :raises TypeError: if ``config_string`` is not ``None`` and
- it is not a Unicode string
- :raises KeyError: if trying to access a key not listed above
+ * :data:`~aeneas.globalconstants.PPN_TASK_CUSTOM_ID` or ``custom_id``
+ * :data:`~aeneas.globalconstants.PPN_TASK_DESCRIPTION` or ``description``
+ * :data:`~aeneas.globalconstants.PPN_TASK_LANGUAGE` or ``language``
+ * :data:`~aeneas.globalconstants.PPN_TASK_ADJUST_BOUNDARY_AFTERCURRENT_VALUE` or ``aba_aftercurrent_value``
+ * :data:`~aeneas.globalconstants.PPN_TASK_ADJUST_BOUNDARY_ALGORITHM` or ``aba_algorithm``
+ * :data:`~aeneas.globalconstants.PPN_TASK_ADJUST_BOUNDARY_BEFORENEXT_VALUE` or ``aba_beforenext_value``
+ * :data:`~aeneas.globalconstants.PPN_TASK_ADJUST_BOUNDARY_OFFSET_VALUE` or ``aba_offset_value``
+ * :data:`~aeneas.globalconstants.PPN_TASK_ADJUST_BOUNDARY_PERCENT_VALUE` or ``aba_percent_value``
+ * :data:`~aeneas.globalconstants.PPN_TASK_ADJUST_BOUNDARY_RATE_VALUE` or ``aba_rate_value``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_DETECT_HEAD_MAX` or ``i_a_head_max``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_DETECT_HEAD_MIN` or ``i_a_head_min``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_DETECT_TAIL_MAX` or ``i_a_tail_max``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_DETECT_TAIL_MIN` or ``i_a_tail_min``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_HEAD_LENGTH` or ``i_a_head``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_PROCESS_LENGTH` or ``i_a_process``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_AUDIO_FILE_TAIL_LENGTH` or ``i_a_tail``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_FILE_FORMAT` or ``i_t_format``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_FILE_IGNORE_REGEX` or ``i_t_ignore_regex``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_FILE_TRANSLITERATE_MAP` or ``i_t_transliterate_map``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_MPLAIN_WORD_SEPARATOR` or ``i_t_mplain_word_separator``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX` or ``i_t_munparsed_l1_id_regex``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX` or ``i_t_munparsed_l2_id_regex``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX` or ``i_t_munparsed_l3_id_regex``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX` or ``i_t_unparsed_class_regex``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_UNPARSED_ID_REGEX` or ``i_t_unparsed_id_regex``
+ * :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_UNPARSED_ID_SORT` or ``i_t_unparsed_id_sort``
+ * :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_FORMAT` or ``o_format``
+ * :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_HEAD_TAIL_FORMAT` or ``o_h_t_format``
+ * :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_ID_REGEX` or ``o_id_regex``
+ * :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_LEVELS` or ``o_levels``
+ * :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_NAME` or ``o_name``
+ * :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_SMIL_AUDIO_REF` or ``o_smil_audio_ref``
+ * :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_SMIL_PAGE_REF` or ``o_smil_page_ref``
+
+ :param string config_string: the job configuration string
+ :raises TypeError: if ``config_string`` is not ``None`` and
+ it is not a Unicode string
+ :raises KeyError: if trying to access a key not listed above
"""
- TAG = u"TaskConfiguration"
-
FIELDS = [
(gc.PPN_TASK_CUSTOM_ID, (None, None, ["custom_id"])),
(gc.PPN_TASK_DESCRIPTION, (None, None, ["description"])),
(gc.PPN_TASK_LANGUAGE, (None, None, ["language"])),
- (gc.PPN_TASK_ADJUST_BOUNDARY_AFTERCURRENT_VALUE, (None, float, ["aba_aftercurrent_value"])),
+ (gc.PPN_TASK_ADJUST_BOUNDARY_AFTERCURRENT_VALUE, (None, TimeValue, ["aba_aftercurrent_value"])),
(gc.PPN_TASK_ADJUST_BOUNDARY_ALGORITHM, (None, None, ["aba_algorithm"])),
- (gc.PPN_TASK_ADJUST_BOUNDARY_BEFORENEXT_VALUE, (None, float, ["aba_beforenext_value"])),
- (gc.PPN_TASK_ADJUST_BOUNDARY_OFFSET_VALUE, (None, float, ["aba_offset_value"])),
+ (gc.PPN_TASK_ADJUST_BOUNDARY_BEFORENEXT_VALUE, (None, TimeValue, ["aba_beforenext_value"])),
+ (gc.PPN_TASK_ADJUST_BOUNDARY_OFFSET_VALUE, (None, TimeValue, ["aba_offset_value"])),
(gc.PPN_TASK_ADJUST_BOUNDARY_PERCENT_VALUE, (None, int, ["aba_percent_value"])),
- (gc.PPN_TASK_ADJUST_BOUNDARY_RATE_VALUE, (None, float, ["aba_rate_value"])),
- (gc.PPN_TASK_IS_AUDIO_FILE_DETECT_HEAD_MAX, (None, float, ["i_a_head_max"])),
- (gc.PPN_TASK_IS_AUDIO_FILE_DETECT_HEAD_MIN, (None, float, ["i_a_head_min"])),
- (gc.PPN_TASK_IS_AUDIO_FILE_DETECT_TAIL_MAX, (None, float, ["i_a_tail_max"])),
- (gc.PPN_TASK_IS_AUDIO_FILE_DETECT_TAIL_MIN, (None, float, ["i_a_tail_min"])),
- (gc.PPN_TASK_IS_AUDIO_FILE_HEAD_LENGTH, (None, float, ["i_a_head"])),
- (gc.PPN_TASK_IS_AUDIO_FILE_PROCESS_LENGTH, (None, float, ["i_a_process"])),
- (gc.PPN_TASK_IS_AUDIO_FILE_TAIL_LENGTH, (None, float, ["i_a_tail"])),
+ (gc.PPN_TASK_ADJUST_BOUNDARY_RATE_VALUE, (None, Decimal, ["aba_rate_value"])),
+ (gc.PPN_TASK_IS_AUDIO_FILE_DETECT_HEAD_MAX, (None, TimeValue, ["i_a_head_max"])),
+ (gc.PPN_TASK_IS_AUDIO_FILE_DETECT_HEAD_MIN, (None, TimeValue, ["i_a_head_min"])),
+ (gc.PPN_TASK_IS_AUDIO_FILE_DETECT_TAIL_MAX, (None, TimeValue, ["i_a_tail_max"])),
+ (gc.PPN_TASK_IS_AUDIO_FILE_DETECT_TAIL_MIN, (None, TimeValue, ["i_a_tail_min"])),
+ (gc.PPN_TASK_IS_AUDIO_FILE_HEAD_LENGTH, (None, TimeValue, ["i_a_head"])),
+ (gc.PPN_TASK_IS_AUDIO_FILE_PROCESS_LENGTH, (None, TimeValue, ["i_a_process"])),
+ (gc.PPN_TASK_IS_AUDIO_FILE_TAIL_LENGTH, (None, TimeValue, ["i_a_tail"])),
(gc.PPN_TASK_IS_TEXT_FILE_FORMAT, (None, None, ["i_t_format"])),
(gc.PPN_TASK_IS_TEXT_FILE_IGNORE_REGEX, (None, None, ["i_t_ignore_regex"])),
(gc.PPN_TASK_IS_TEXT_FILE_TRANSLITERATE_MAP, (None, None, ["i_t_transliterate_map"])),
+ (gc.PPN_TASK_IS_TEXT_MPLAIN_WORD_SEPARATOR, (None, None, ["i_t_mplain_word_separator"])),
+ (gc.PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX, (None, None, ["i_t_munparsed_l1_id_regex"])),
+ (gc.PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX, (None, None, ["i_t_munparsed_l2_id_regex"])),
+ (gc.PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX, (None, None, ["i_t_munparsed_l3_id_regex"])),
(gc.PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX, (None, None, ["i_t_unparsed_class_regex"])),
(gc.PPN_TASK_IS_TEXT_UNPARSED_ID_REGEX, (None, None, ["i_t_unparsed_id_regex"])),
(gc.PPN_TASK_IS_TEXT_UNPARSED_ID_SORT, (None, None, ["i_t_unparsed_id_sort"])),
(gc.PPN_TASK_OS_FILE_FORMAT, (None, None, ["o_format"])),
(gc.PPN_TASK_OS_FILE_HEAD_TAIL_FORMAT, (None, None, ["o_h_t_format"])),
(gc.PPN_TASK_OS_FILE_ID_REGEX, (None, None, ["o_id_regex"])),
+ (gc.PPN_TASK_OS_FILE_LEVELS, (None, None, ["o_levels"])),
(gc.PPN_TASK_OS_FILE_NAME, (None, None, ["o_name"])),
+ (gc.PPN_TASK_OS_FILE_NO_ZERO, (None, bool, ["o_no_zero"])),
(gc.PPN_TASK_OS_FILE_SMIL_AUDIO_REF, (None, None, ["o_smil_audio_ref"])),
(gc.PPN_TASK_OS_FILE_SMIL_PAGE_REF, (None, None, ["o_smil_page_ref"])),
]
+ TAG = u"TaskConfiguration"
+
def __init__(self, config_string=None):
super(TaskConfiguration, self).__init__(config_string)
+ def aba_parameters(self):
+ """
+ Return a tuple ``(aba_algorithm, aba_parameters)``
+ representing the :class:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm`
+ algorithm and its parameters.
+
+ :rtype: tuple
+ """
+ ABA_MAP = {
+ AdjustBoundaryAlgorithm.AFTERCURRENT : [self[gc.PPN_TASK_ADJUST_BOUNDARY_AFTERCURRENT_VALUE]],
+ AdjustBoundaryAlgorithm.AUTO : [],
+ AdjustBoundaryAlgorithm.BEFORENEXT : [self[gc.PPN_TASK_ADJUST_BOUNDARY_BEFORENEXT_VALUE]],
+ AdjustBoundaryAlgorithm.OFFSET : [self[gc.PPN_TASK_ADJUST_BOUNDARY_OFFSET_VALUE]],
+ AdjustBoundaryAlgorithm.PERCENT : [self[gc.PPN_TASK_ADJUST_BOUNDARY_PERCENT_VALUE]],
+ AdjustBoundaryAlgorithm.RATE : [self[gc.PPN_TASK_ADJUST_BOUNDARY_RATE_VALUE]],
+ AdjustBoundaryAlgorithm.RATEAGGRESSIVE : [self[gc.PPN_TASK_ADJUST_BOUNDARY_RATE_VALUE]]
+ }
+ aba_algorithm = self["aba_algorithm"]
+ if aba_algorithm is None:
+ aba_algorithm = AdjustBoundaryAlgorithm.AUTO
+ return (aba_algorithm, ABA_MAP[aba_algorithm])
+
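The new `aba_parameters()` method is a dispatch table. Each adjust-boundary algorithm maps to the list of configuration values it needs, and a missing `aba_algorithm` falls back to `AUTO`. Below is a minimal standalone sketch of the same pattern over a plain dict. The algorithm strings (`"auto"`, `"rate"`, and so on) and the function name are assumptions for illustration, not the aeneas API.

```python
# Hypothetical sketch: these constants and names are illustrative,
# not the aeneas AdjustBoundaryAlgorithm API.
AFTERCURRENT = "aftercurrent"
AUTO = "auto"
BEFORENEXT = "beforenext"
OFFSET = "offset"
PERCENT = "percent"
RATE = "rate"
RATEAGGRESSIVE = "rateaggressive"

# Map each algorithm to the configuration keys holding its parameters.
ABA_PARAMETER_KEYS = {
    AFTERCURRENT: ["aba_aftercurrent_value"],
    AUTO: [],
    BEFORENEXT: ["aba_beforenext_value"],
    OFFSET: ["aba_offset_value"],
    PERCENT: ["aba_percent_value"],
    RATE: ["aba_rate_value"],
    RATEAGGRESSIVE: ["aba_rate_value"],  # shares the rate parameter
}


def aba_parameters(config):
    """Return (algorithm, [parameter values]) from a plain dict config."""
    algorithm = config.get("aba_algorithm")
    if algorithm is None:
        algorithm = AUTO  # default when no algorithm is configured
    # An unknown algorithm name raises KeyError here.
    keys = ABA_PARAMETER_KEYS[algorithm]
    # Unset parameters come back as None, as with an unset config key.
    return (algorithm, [config.get(key) for key in keys])
```

The diff's method reads every `aba_*_value` while building `ABA_MAP`. This sketch, by contrast, looks up only the keys that the selected algorithm needs. An unknown algorithm still raises `KeyError`.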
diff --git a/aeneas/tests/__init__.py b/aeneas/tests/__init__.py
index bf36a441..803dc12d 100644
--- a/aeneas/tests/__init__.py
+++ b/aeneas/tests/__init__.py
@@ -15,7 +15,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL 3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
__email__ = "aeneas@readbeyond.it"
__status__ = "Production"
diff --git a/aeneas/tests/long_test_festivalwrapper.py b/aeneas/tests/long_test_festivalwrapper.py
new file mode 100644
index 00000000..93caf0f9
--- /dev/null
+++ b/aeneas/tests/long_test_festivalwrapper.py
@@ -0,0 +1,169 @@
+#!/usr/bin/env python
+# coding=utf-8
+
+import unittest
+
+from aeneas.festivalwrapper import FESTIVALWrapper
+from aeneas.textfile import TextFile
+from aeneas.textfile import TextFragment
+from aeneas.runtimeconfiguration import RuntimeConfiguration
+import aeneas.globalfunctions as gf
+
+class TestFESTIVALWrapper(unittest.TestCase):
+
+ def synthesize_single(self, text, language, ofp=None, zero_length=False):
+ if ofp is None:
+ handler, output_file_path = gf.tmp_file(suffix=".wav")
+ else:
+ handler = None
+ output_file_path = ofp
+ try:
+ rconf = RuntimeConfiguration()
+ rconf[RuntimeConfiguration.TTS] = u"festival"
+ rconf[RuntimeConfiguration.TTS_PATH] = u"text2wave"
+ tts_engine = FESTIVALWrapper(rconf=rconf)
+ result = tts_engine.synthesize_single(text, language, output_file_path)
+ gf.delete_file(handler, output_file_path)
+ if zero_length:
+ self.assertEqual(result, 0)
+ else:
+ self.assertGreater(result, 0)
+ except (OSError, TypeError, UnicodeDecodeError, ValueError) as exc:
+ gf.delete_file(handler, output_file_path)
+ raise exc
+
+ def synthesize_multiple(self, text_file, ofp=None, quit_after=None, backwards=False, zero_length=False):
+ if ofp is None:
+ handler, output_file_path = gf.tmp_file(suffix=".wav")
+ else:
+ handler = None
+ output_file_path = ofp
+ try:
+ rconf = RuntimeConfiguration()
+ rconf[RuntimeConfiguration.TTS] = u"festival"
+ rconf[RuntimeConfiguration.TTS_PATH] = u"text2wave"
+ tts_engine = FESTIVALWrapper(rconf=rconf)
+ anchors, total_time, num_chars = tts_engine.synthesize_multiple(
+ text_file,
+ output_file_path,
+ quit_after,
+ backwards
+ )
+ gf.delete_file(handler, output_file_path)
+ if zero_length:
+ self.assertEqual(total_time, 0.0)
+ else:
+ self.assertGreater(total_time, 0.0)
+ except (OSError, TypeError, UnicodeDecodeError, ValueError) as exc:
+ gf.delete_file(handler, output_file_path)
+ raise exc
+
+ def tfl(self, frags):
+ tfl = TextFile()
+ for language, lines in frags:
+ tfl.add_fragment(TextFragment(language=language, lines=lines, filtered_lines=lines), as_last=True)
+ return tfl
+
+ def test_multiple_tfl_none(self):
+ with self.assertRaises(TypeError):
+ self.synthesize_multiple(None, zero_length=True)
+
+ def test_multiple_invalid_output_path(self):
+ tfl = self.tfl([(FESTIVALWrapper.ENG, [u"word"])])
+ with self.assertRaises(OSError):
+ self.synthesize_multiple(tfl, ofp="x/y/z/not_existing.wav")
+
+ def test_multiple_no_fragments(self):
+ tfl = TextFile()
+ tfl.set_language(FESTIVALWrapper.ENG)
+ with self.assertRaises(ValueError):
+ self.synthesize_multiple(tfl)
+
+ def test_multiple_unicode_ascii(self):
+ tfl = self.tfl([(FESTIVALWrapper.ENG, [u"word"])])
+ self.synthesize_multiple(tfl)
+
+ def test_multiple_unicode_unicode(self):
+ tfl = self.tfl([(FESTIVALWrapper.ENG, [u"Ausführliche"])])
+ self.synthesize_multiple(tfl)
+
+ # TODO disabling this test, festival does not handle empty text
+ #def test_multiple_empty(self):
+ # tfl = self.tfl([(FESTIVALWrapper.ENG, [u""])])
+ # self.synthesize_multiple(tfl)
+
+ # TODO disabling this test, festival does not handle empty text
+ #def test_multiple_empty_multiline(self):
+ # tfl = self.tfl([(FESTIVALWrapper.ENG, [u"", u"", u""])])
+ # self.synthesize_multiple(tfl)
+
+ # TODO disabling this test, festival does not handle empty text
+ #def test_multiple_empty_fragments(self):
+ # tfl = self.tfl([
+ # (FESTIVALWrapper.ENG, [u""]),
+ # (FESTIVALWrapper.ENG, [u""]),
+ # (FESTIVALWrapper.ENG, [u""]),
+ # ])
+ # self.synthesize_multiple(tfl)
+
+ def test_multiple_empty_mixed(self):
+ tfl = self.tfl([(FESTIVALWrapper.ENG, [u"Word", u"", u"Word"])])
+ self.synthesize_multiple(tfl)
+
+ # TODO disabling this test, festival does not handle empty text
+ #def test_multiple_empty_mixed_fragments(self):
+ # tfl = self.tfl([
+ # (FESTIVALWrapper.ENG, [u"Word"]),
+ # (FESTIVALWrapper.ENG, [u""]),
+ # (FESTIVALWrapper.ENG, [u"Word"]),
+ # ])
+ # self.synthesize_multiple(tfl)
+
+ def test_multiple_invalid_language(self):
+ tfl = self.tfl([("zzzz", [u"Word"])])
+ with self.assertRaises(ValueError):
+ self.synthesize_multiple(tfl)
+
+ def test_multiple_variation_language(self):
+ tfl = self.tfl([(FESTIVALWrapper.ENG_GBR, [u"Word"])])
+ self.synthesize_multiple(tfl)
+
+ def test_single_none(self):
+ with self.assertRaises(TypeError):
+ self.synthesize_single(None, FESTIVALWrapper.ENG)
+
+ def test_single_invalid_output_path(self):
+ with self.assertRaises(OSError):
+ self.synthesize_single(u"word", FESTIVALWrapper.ENG, ofp="x/y/z/not_existing.wav")
+
+ def test_single_empty_string(self):
+ self.synthesize_single(u"", FESTIVALWrapper.ENG, zero_length=True)
+
+ def test_single_text_str_ascii(self):
+ with self.assertRaises(TypeError):
+ self.synthesize_single(b"Word", FESTIVALWrapper.ENG)
+
+ def test_single_text_str_unicode(self):
+ with self.assertRaises(TypeError):
+ self.synthesize_single(b"Ausf\xc3\xbchrliche", FESTIVALWrapper.ENG)
+
+ def test_single_text_unicode_ascii(self):
+ self.synthesize_single(u"Word", FESTIVALWrapper.ENG)
+
+ def test_single_text_unicode_unicode(self):
+ self.synthesize_single(u"Ausführliche", FESTIVALWrapper.ENG)
+
+ def test_single_variation_language(self):
+ self.synthesize_single(u"Word", FESTIVALWrapper.ENG_GBR)
+
+ def test_single_invalid_language(self):
+ with self.assertRaises(ValueError):
+ self.synthesize_single(u"Word", "zzzz")
+
+
+
+if __name__ == '__main__':
+ unittest.main()
+
+
+
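Both helpers in `long_test_festivalwrapper.py` call `gf.delete_file(handler, output_file_path)` in two places: once after synthesis, and once in the `except` branch before re-raising. A `try`/`finally` context manager expresses the same cleanup once. It also covers exception types that the `except` clause does not list. The sketch below uses only the standard library (`tempfile`, `contextlib`) instead of `aeneas.globalfunctions`. The helper name `output_path` is hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def output_path(suffix=".wav", path=None):
    """Yield an output file path and remove that file afterwards."""
    if path is None:
        handle, path = tempfile.mkstemp(suffix=suffix)
        os.close(handle)  # only the path is needed; writers reopen it
    try:
        yield path
    finally:
        # Runs after success and after any exception alike.
        if os.path.isfile(path):
            os.remove(path)
```

A test body would then be `with output_path(path=ofp) as output_file_path:` followed by the synthesis call and the assertions, with no explicit delete calls. A non-existing `ofp` such as `x/y/z/not_existing.wav` is passed through unchanged, so the `OSError` tests still reach the synthesizer.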
diff --git a/aeneas/tests/long_test_sd.py b/aeneas/tests/long_test_sd.py
deleted file mode 100644
index d7426df4..00000000
--- a/aeneas/tests/long_test_sd.py
+++ /dev/null
@@ -1,175 +0,0 @@
-#!/usr/bin/env python
-# coding=utf-8
-
-import os
-import unittest
-
-from aeneas.audiofile import AudioFileMonoWAVE
-from aeneas.sd import SD
-from aeneas.sd import SDMetric
-from aeneas.textfile import TextFile
-from aeneas.textfile import TextFileFormat
-import aeneas.globalfunctions as gf
-
-class TestSD(unittest.TestCase):
-
- AUDIO_FILE = gf.absolute_path("res/cmfcc/audio.wav", __file__)
- TEXT_FILE = gf.absolute_path("res/inputtext/sonnet_plain.txt", __file__)
-
- def load(self):
- audio_file = AudioFileMonoWAVE(file_path=self.AUDIO_FILE)
- text_file = TextFile(file_path=self.TEXT_FILE, file_format=TextFileFormat.PLAIN)
- return SD(audio_file, text_file)
-
- def test_create_sd(self):
- sd = self.load()
-
- def test_detect_interval(self):
- sd = self.load()
- begin, end = sd.detect_interval()
-
- def test_detect_interval_head_min(self):
- sd = self.load()
- begin, end = sd.detect_interval(min_head_length=0.0)
-
- def test_detect_interval_head_max(self):
- sd = self.load()
- begin, end = sd.detect_interval(max_head_length=10.0)
-
- def test_detect_interval_head_min_max(self):
- sd = self.load()
- begin, end = sd.detect_interval(min_head_length=0.0, max_head_length=10.0)
-
- def test_detect_interval_tail_min(self):
- sd = self.load()
- begin, end = sd.detect_interval(min_tail_length=0.0)
-
- def test_detect_interval_tail_max(self):
- sd = self.load()
- begin, end = sd.detect_interval(max_tail_length=10.0)
-
- def test_detect_interval_tail_min_max(self):
- sd = self.load()
- begin, end = sd.detect_interval(min_tail_length=0.0, max_tail_length=10.0)
-
- def test_detect_interval_head_tail(self):
- sd = self.load()
- begin, end = sd.detect_interval(min_head_length=0.0, max_head_length=10.0, min_tail_length=0.0, max_tail_length=10.0)
-
- def test_detect_interval_metric_value(self):
- sd = self.load()
- begin, end = sd.detect_interval(metric=SDMetric.VALUE)
-
- def test_detect_interval_metric_distortion(self):
- sd = self.load()
- begin, end = sd.detect_interval(metric=SDMetric.DISTORTION)
-
- def test_detect_interval_metric_bad(self):
- sd = self.load()
- begin, end = sd.detect_interval(metric="foo")
-
- def test_detect_head(self):
- sd = self.load()
- begin = sd.detect_head()
-
- def test_detect_head_min(self):
- sd = self.load()
- begin = sd.detect_head(min_head_length=0.0)
-
- def test_detect_head_min_bad_1(self):
- sd = self.load()
- begin = sd.detect_head(min_head_length=-10.0)
-
- def test_detect_head_min_bad_2(self):
- sd = self.load()
- begin = sd.detect_head(min_head_length=1000.0)
-
- def test_detect_head_min_bad_3(self):
- sd = self.load()
- begin = sd.detect_head(min_head_length="foo")
-
- def test_detect_head_max(self):
- sd = self.load()
- begin = sd.detect_head(max_head_length=10.0)
-
- def test_detect_head_max_bad_1(self):
- sd = self.load()
- begin = sd.detect_head(max_head_length=-10.0)
-
- def test_detect_head_max_bad_2(self):
- sd = self.load()
- begin = sd.detect_head(max_head_length=1000.0)
-
- def test_detect_head_max_bad_3(self):
- sd = self.load()
- begin = sd.detect_head(max_head_length="foo")
-
- def test_detect_head_metric_value(self):
- sd = self.load()
- begin = sd.detect_head(metric=SDMetric.VALUE)
-
- def test_detect_head_metric_distortion(self):
- sd = self.load()
- begin = sd.detect_head(metric=SDMetric.DISTORTION)
-
- def test_detect_head_metric_bad(self):
- sd = self.load()
- begin = sd.detect_head(metric="foo")
-
- def test_detect_tail(self):
- sd = self.load()
- tail = sd.detect_tail()
-
- def test_detect_tail_min(self):
- sd = self.load()
- begin = sd.detect_tail(min_tail_length=0.0)
-
- def test_detect_tail_min_bad_1(self):
- sd = self.load()
- begin = sd.detect_tail(min_tail_length=-10.0)
-
- def test_detect_tail_min_bad_2(self):
- sd = self.load()
- begin = sd.detect_tail(min_tail_length=1000.0)
-
- def test_detect_tail_min_bad_3(self):
- sd = self.load()
- begin = sd.detect_tail(min_tail_length="foo")
-
- def test_detect_tail_max(self):
- sd = self.load()
- begin = sd.detect_tail(max_tail_length=10.0)
-
- def test_detect_tail_max_bad_1(self):
- sd = self.load()
- begin = sd.detect_tail(max_tail_length=-10.0)
-
- def test_detect_tail_max_bad_2(self):
- sd = self.load()
- begin = sd.detect_tail(max_tail_length=1000.0)
-
- def test_detect_tail_max_bad_3(self):
- sd = self.load()
- begin = sd.detect_tail(max_tail_length="foo")
-
- def test_detect_tail_metric_value(self):
- sd = self.load()
- begin = sd.detect_tail(metric=SDMetric.VALUE)
-
- def test_detect_tail_metric_distortion(self):
- sd = self.load()
- begin = sd.detect_tail(metric=SDMetric.DISTORTION)
-
- def test_detect_tail_metric_bad(self):
- sd = self.load()
- begin = sd.detect_tail(metric="foo")
-
- # TODO add more meaningful tests about the actual detection of head/tail
-
-
-
-if __name__ == '__main__':
- unittest.main()
-
-
-
diff --git a/aeneas/tests/res/audioformats/mono.16000.wav b/aeneas/tests/res/audioformats/mono.16000.wav
new file mode 100644
index 00000000..6aebc0d4
Binary files /dev/null and b/aeneas/tests/res/audioformats/mono.16000.wav differ
diff --git a/aeneas/tests/res/audioformats/mono.22050.wav b/aeneas/tests/res/audioformats/mono.22050.wav
new file mode 100644
index 00000000..c5ab1687
Binary files /dev/null and b/aeneas/tests/res/audioformats/mono.22050.wav differ
diff --git a/aeneas/tests/res/cmfcc/audio.wav b/aeneas/tests/res/audioformats/mono.44100.wav
similarity index 100%
rename from aeneas/tests/res/cmfcc/audio.wav
rename to aeneas/tests/res/audioformats/mono.44100.wav
diff --git a/aeneas/tests/res/audioformats/mono.48000.wav b/aeneas/tests/res/audioformats/mono.48000.wav
new file mode 100644
index 00000000..5d223243
Binary files /dev/null and b/aeneas/tests/res/audioformats/mono.48000.wav differ
diff --git a/aeneas/tests/res/audioformats/mono.empty.wav b/aeneas/tests/res/audioformats/mono.empty.wav
new file mode 100644
index 00000000..e69de29b
diff --git a/aeneas/tests/res/audioformats/mono.invalid.wav b/aeneas/tests/res/audioformats/mono.invalid.wav
new file mode 100644
index 00000000..1c1b1a69
Binary files /dev/null and b/aeneas/tests/res/audioformats/mono.invalid.wav differ
diff --git a/aeneas/tests/res/audioformats/mono.zero.wav b/aeneas/tests/res/audioformats/mono.zero.wav
new file mode 100644
index 00000000..8dbde954
Binary files /dev/null and b/aeneas/tests/res/audioformats/mono.zero.wav differ
diff --git a/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet001.mp3 b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet001.mp3
new file mode 120000
index 00000000..182cdadf
--- /dev/null
+++ b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet001.mp3
@@ -0,0 +1 @@
+../../../../container/job/assets/p001.mp3
\ No newline at end of file
diff --git a/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet001.txt b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet001.txt
new file mode 100644
index 00000000..b4d6117b
--- /dev/null
+++ b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet001.txt
@@ -0,0 +1,15 @@
+f000001|1
+f000002|From fairest creatures we desire increase,
+f000003|That thereby beauty's rose might never die,
+f000004|But as the riper should by time decease,
+f000005|His tender heir might bear his memory:
+f000006|But thou contracted to thine own bright eyes,
+f000007|Feed'st thy light's flame with self-substantial fuel,
+f000008|Making a famine where abundance lies,
+f000009|Thy self thy foe, to thy sweet self too cruel:
+f000010|Thou that art now the world's fresh ornament,
+f000011|And only herald to the gaudy spring,
+f000012|Within thine own bud buriest thy content,
+f000013|And tender churl mak'st waste in niggarding:
+f000014|Pity the world, or else this glutton be,
+f000015|To eat the world's due, by the grave and thee.
diff --git a/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet002.mp3 b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet002.mp3
new file mode 120000
index 00000000..182cdadf
--- /dev/null
+++ b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet002.mp3
@@ -0,0 +1 @@
+../../../../container/job/assets/p001.mp3
\ No newline at end of file
diff --git a/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet002.txt b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet002.txt
new file mode 100644
index 00000000..b4d6117b
--- /dev/null
+++ b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet002.txt
@@ -0,0 +1,15 @@
+f000001|1
+f000002|From fairest creatures we desire increase,
+f000003|That thereby beauty's rose might never die,
+f000004|But as the riper should by time decease,
+f000005|His tender heir might bear his memory:
+f000006|But thou contracted to thine own bright eyes,
+f000007|Feed'st thy light's flame with self-substantial fuel,
+f000008|Making a famine where abundance lies,
+f000009|Thy self thy foe, to thy sweet self too cruel:
+f000010|Thou that art now the world's fresh ornament,
+f000011|And only herald to the gaudy spring,
+f000012|Within thine own bud buriest thy content,
+f000013|And tender churl mak'st waste in niggarding:
+f000014|Pity the world, or else this glutton be,
+f000015|To eat the world's due, by the grave and thee.
diff --git a/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet003.mp3 b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet003.mp3
new file mode 120000
index 00000000..182cdadf
--- /dev/null
+++ b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet003.mp3
@@ -0,0 +1 @@
+../../../../container/job/assets/p001.mp3
\ No newline at end of file
diff --git a/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet003.txt b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet003.txt
new file mode 100644
index 00000000..b4d6117b
--- /dev/null
+++ b/aeneas/tests/res/example_jobs/example8/OEBPS/Resources/sonnet003.txt
@@ -0,0 +1,15 @@
+f000001|1
+f000002|From fairest creatures we desire increase,
+f000003|That thereby beauty's rose might never die,
+f000004|But as the riper should by time decease,
+f000005|His tender heir might bear his memory:
+f000006|But thou contracted to thine own bright eyes,
+f000007|Feed'st thy light's flame with self-substantial fuel,
+f000008|Making a famine where abundance lies,
+f000009|Thy self thy foe, to thy sweet self too cruel:
+f000010|Thou that art now the world's fresh ornament,
+f000011|And only herald to the gaudy spring,
+f000012|Within thine own bud buriest thy content,
+f000013|And tender churl mak'st waste in niggarding:
+f000014|Pity the world, or else this glutton be,
+f000015|To eat the world's due, by the grave and thee.
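The `sonnet00N.txt` files use the parsed text format: one fragment per line, with an identifier and the fragment text separated by `|`. The parser below is a minimal sketch. It assumes that each line splits at the first `|` and that blank lines are skipped. The real aeneas `TextFile` reader may differ on both points, and `parse_parsed_text` is a hypothetical name.

```python
def parse_parsed_text(text):
    """Parse 'id|text' lines into a list of (identifier, text) pairs."""
    fragments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no fragment
        # Split on the first '|' only, so the text may itself contain '|'.
        identifier, sep, fragment_text = line.partition("|")
        if not sep:
            raise ValueError("missing '|' separator in line: %r" % line)
        fragments.append((identifier, fragment_text))
    return fragments
```

Applied to any of the three identical files above, this yields 15 pairs, from `("f000001", "1")` to `("f000015", "To eat the world's due, by the grave and thee.")`.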
diff --git a/aeneas/tests/res/example_jobs/example8/config.txt b/aeneas/tests/res/example_jobs/example8/config.txt
new file mode 100644
index 00000000..f4254f37
--- /dev/null
+++ b/aeneas/tests/res/example_jobs/example8/config.txt
@@ -0,0 +1,17 @@
+is_hierarchy_type=flat
+is_hierarchy_prefix=OEBPS/Resources/
+is_text_file_relative_path=.
+is_text_file_name_regex=.*\.txt
+is_text_type=parsed
+is_audio_file_relative_path=.
+is_audio_file_name_regex=.*\.mp3
+
+os_job_file_name=output_example8
+os_job_file_container=zip
+os_job_file_hierarchy_type=flat
+os_job_file_hierarchy_prefix=OEBPS/Resources/
+os_task_file_name=$PREFIX.aud
+os_task_file_format=aud
+
+job_language=en
+job_description=Example 8 (flat hierarchy, parsed text files, 3 identical tasks)
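`config.txt` is a list of `key=value` lines. Blank lines serve only to group the input (`is_*`), output (`os_*`), and job-level (`job_*`) settings. The reader below is a minimal sketch. It assumes that each line splits at the first `=` and that whitespace around keys and values is insignificant. It does not expand placeholders such as `$PREFIX`, and `parse_config_txt` is a hypothetical name rather than the aeneas container reader.

```python
def parse_config_txt(text):
    """Parse 'key=value' lines into a dict, skipping blank lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate groups of settings
        # Split on the first '=' only; values such as regexes may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("missing '=' in line: %r" % line)
        config[key.strip()] = value.strip()
    return config
```

Values are kept as raw strings. For example, `is_text_file_name_regex` stays `.*\.txt`, and `os_task_file_name` stays `$PREFIX.aud` for a later stage to resolve.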
diff --git a/aeneas/tests/res/inputtext/sonnet_mplain.txt b/aeneas/tests/res/inputtext/sonnet_mplain.txt
new file mode 100644
index 00000000..33f28f6f
--- /dev/null
+++ b/aeneas/tests/res/inputtext/sonnet_mplain.txt
@@ -0,0 +1,20 @@
+1
+
+From fairest creatures we desire increase,
+That thereby beauty's rose might never die,
+But as the riper should by time decease,
+His tender heir might bear his memory:
+
+But thou contracted to thine own bright eyes,
+Feed'st thy light's flame with self-substantial fuel,
+Making a famine where abundance lies,
+Thy self thy foe, to thy sweet self too cruel:
+
+Thou that art now the world's fresh ornament,
+And only herald to the gaudy spring,
+Within thine own bud buriest thy content,
+And tender churl mak'st waste in niggarding:
+
+Pity the world, or else this glutton be,
+To eat the world's due, by the grave and thee.
+
diff --git a/aeneas/tests/res/inputtext/sonnet_mplain_multiple_blank.txt b/aeneas/tests/res/inputtext/sonnet_mplain_multiple_blank.txt
new file mode 100644
index 00000000..8e786954
--- /dev/null
+++ b/aeneas/tests/res/inputtext/sonnet_mplain_multiple_blank.txt
@@ -0,0 +1,24 @@
+1
+
+
+From fairest creatures we desire increase,
+That thereby beauty's rose might never die,
+But as the riper should by time decease,
+His tender heir might bear his memory:
+
+
+But thou contracted to thine own bright eyes,
+Feed'st thy light's flame with self-substantial fuel,
+Making a famine where abundance lies,
+Thy self thy foe, to thy sweet self too cruel:
+
+
+
+
+Thou that art now the world's fresh ornament,
+And only herald to the gaudy spring,
+Within thine own bud buriest thy content,
+And tender churl mak'st waste in niggarding:
+
+Pity the world, or else this glutton be,
+To eat the world's due, by the grave and thee.
diff --git a/aeneas/tests/res/inputtext/sonnet_mplain_no_end_newline.txt b/aeneas/tests/res/inputtext/sonnet_mplain_no_end_newline.txt
new file mode 100644
index 00000000..c0be2822
--- /dev/null
+++ b/aeneas/tests/res/inputtext/sonnet_mplain_no_end_newline.txt
@@ -0,0 +1,19 @@
+1
+
+From fairest creatures we desire increase,
+That thereby beauty's rose might never die,
+But as the riper should by time decease,
+His tender heir might bear his memory:
+
+But thou contracted to thine own bright eyes,
+Feed'st thy light's flame with self-substantial fuel,
+Making a famine where abundance lies,
+Thy self thy foe, to thy sweet self too cruel:
+
+Thou that art now the world's fresh ornament,
+And only herald to the gaudy spring,
+Within thine own bud buriest thy content,
+And tender churl mak'st waste in niggarding:
+
+Pity the world, or else this glutton be,
+To eat the world's due, by the grave and thee.
diff --git a/aeneas/tests/res/inputtext/sonnet_mplain_with_end_newline.txt b/aeneas/tests/res/inputtext/sonnet_mplain_with_end_newline.txt
new file mode 100644
index 00000000..33f28f6f
--- /dev/null
+++ b/aeneas/tests/res/inputtext/sonnet_mplain_with_end_newline.txt
@@ -0,0 +1,20 @@
+1
+
+From fairest creatures we desire increase,
+That thereby beauty's rose might never die,
+But as the riper should by time decease,
+His tender heir might bear his memory:
+
+But thou contracted to thine own bright eyes,
+Feed'st thy light's flame with self-substantial fuel,
+Making a famine where abundance lies,
+Thy self thy foe, to thy sweet self too cruel:
+
+Thou that art now the world's fresh ornament,
+And only herald to the gaudy spring,
+Within thine own bud buriest thy content,
+And tender churl mak'st waste in niggarding:
+
+Pity the world, or else this glutton be,
+To eat the world's due, by the grave and thee.
+
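The `sonnet_mplain*.txt` resources exercise the multilevel plain (mplain) format. In this format, blank lines separate paragraphs, each line holds one sentence, and each sentence splits into words. The variants differ only in runs of extra blank lines and in whether the file ends with a newline, so a reader should treat them alike. The sketch below shows that grouping and assumes that any run of blank lines closes a paragraph. The `word_separator` parameter loosely mirrors the new `i_t_mplain_word_separator` option. Its semantics here are an assumption: it takes a literal string, or `None` for any whitespace.

```python
def parse_mplain(text, word_separator=None):
    """Return paragraphs -> sentences -> words from mplain-style text."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            # One sentence per line; str.split(None) splits on whitespace runs.
            current.append(line.split(word_separator))
        elif current:
            # A blank line closes the paragraph; extra blanks add nothing.
            paragraphs.append(current)
            current = []
    if current:
        # Handles files that do not end with a newline or a blank line.
        paragraphs.append(current)
    return paragraphs
```

Under this sketch, all four mplain variants yield the same five paragraphs, holding 1, 4, 4, 4, and 2 sentences.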
diff --git a/aeneas/tests/res/inputtext/sonnet_munparsed.xhtml b/aeneas/tests/res/inputtext/sonnet_munparsed.xhtml
new file mode 100644
index 00000000..0d7a167e
--- /dev/null
+++ b/aeneas/tests/res/inputtext/sonnet_munparsed.xhtml
@@ -0,0 +1,165 @@
+
+
+