diff --git a/MANIFEST.in b/MANIFEST.in index ef99fc9b..94d3d07c 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,3 +1,10 @@ +recursive-include aeneas/cdtw * +recursive-include aeneas/cew * +recursive-include aeneas/cint * +recursive-include aeneas/cmfcc * +recursive-include aeneas/cwave * +recursive-include aeneas/extra * +prune aeneas/extra/ctw_speect recursive-include aeneas/res * recursive-include aeneas/tools/res * include aeneas_check_setup.py diff --git a/README.md b/README.md index 39fc7756..5e78ff43 100644 --- a/README.md +++ b/README.md @@ -2,13 +2,13 @@ **aeneas** is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). -* Version: 1.4.1 -* Date: 2016-02-13 +* Version: 1.5.0 +* Date: 2016-04-02 * Developed by: [ReadBeyond](http://www.readbeyond.it/) * Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/) * License: the GNU Affero General Public License Version 3 (AGPL v3) * Contact: [aeneas@readbeyond.it](mailto:aeneas@readbeyond.it) -* Quick Links: [Home](http://www.readbeyond.it/aeneas/) - [GitHub](https://github.com/readbeyond/aeneas/) - [PyPI](https://pypi.python.org/pypi/aeneas/) - [API Docs](http://www.readbeyond.it/aeneas/docs/) - [Mailing List](https://groups.google.com/d/forum/aeneas-forced-alignment) - [Web App](http://aeneasweb.org) +* Quick Links: [Home](http://www.readbeyond.it/aeneas/) - [GitHub](https://github.com/readbeyond/aeneas/) - [PyPI](https://pypi.python.org/pypi/aeneas/) - [Docs](http://www.readbeyond.it/aeneas/docs/) - [Tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html) - [Mailing List](https://groups.google.com/d/forum/aeneas-forced-alignment) - [Web App](http://aeneasweb.org) ## Goal @@ -19,32 +19,38 @@ and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) **forced alignment**. 
-For example, given [this text file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml) -and [this audio file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3), +For example, given +[this text file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml) +and +[this audio file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3), **aeneas** determines, for each fragment, the corresponding time interval in the audio file: ``` -1 => [00:00:00.000, 00:00:02.680] -From fairest creatures we desire increase, => [00:00:02.680, 00:00:05.480] -That thereby beauty's rose might never die, => [00:00:05.480, 00:00:08.640] -But as the riper should by time decease, => [00:00:08.640, 00:00:11.960] -His tender heir might bear his memory: => [00:00:11.960, 00:00:15.280] -But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.520] -Feed'st thy light's flame with self-substantial fuel, => [00:00:18.520, 00:00:22.760] -Making a famine where abundance lies, => [00:00:22.760, 00:00:25.720] -Thy self thy foe, to thy sweet self too cruel: => [00:00:25.720, 00:00:31.240] -Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.280] -And only herald to the gaudy spring, => [00:00:34.280, 00:00:36.960] -Within thine own bud buriest thy content, => [00:00:36.960, 00:00:40.640] -And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.600] -Pity the world, or else this glutton be, => [00:00:43.600, 00:00:48.000] -To eat the world's due, by the grave and thee. 
=> [00:00:48.000, 00:00:53.280] +1 => [00:00:00.000, 00:00:02.640] +From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880] +That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240] +But as the riper should by time decease, => [00:00:09.240, 00:00:11.920] +His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280] +But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800] +Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760] +Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680] +Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240] +Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400] +And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920] +Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640] +And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640] +Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080] +To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240] ``` +![Waveform with aligned labels, detail](wiki/align.png) + This synchronization map can be output to file in several formats: -SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed captioning, -JSON/RBSE for Web usage, -or raw CSV/SSV/TSV/TXT/XML for further processing. +EAF for research purposes, +SMIL for EPUB 3, +SBV/SRT/SUB/TTML/VTT for closed captioning, +JSON for Web usage, +or raw AUD/CSV/SSV/TSV/TXT/XML for further processing. ## System Requirements, Supported Platforms and Installation @@ -56,30 +62,33 @@ or raw CSV/SSV/TSV/TXT/XML for further processing. 3. [FFmpeg](https://www.ffmpeg.org/) 4. [eSpeak](http://espeak.sourceforge.net/) 5. Python modules `BeautifulSoup4`, `lxml`, and `numpy` -6. Python C headers to compile the Python C extensions (Optional but strongly recommended) -7. 
A shell supporting UTF-8 (Optional but strongly recommended) -8. Python module `pafy` (Optional, only required if you want to download audio from YouTube) +6. Python C headers to compile the Python C extensions (optional but strongly recommended) +7. A shell supporting UTF-8 (optional but strongly recommended) ### Supported Platforms **aeneas** has been developed and tested on **Debian 64bit**, which is the **only supported OS** at the moment. - -However, **aeneas** has been confirmed to work on +Nevertheless, **aeneas** has been confirmed to work on other Linux distributions, OS X, and Windows. -See the [PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md) for the details. +See the +[PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md) +for the details. If installing **aeneas** natively on your OS proves difficult, you are strongly encouraged to use [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant), which provides **aeneas** inside a virtualized Debian image -running under [VirtualBox](https://www.virtualbox.org/) -and [Vagrant](http://www.vagrantup.com/), which can be installed -on any modern OS (Linux, Mac OS X, Windows). +running under +[VirtualBox](https://www.virtualbox.org/) +and +[Vagrant](http://www.vagrantup.com/), +which can be installed on any modern OS (Linux, Mac OS X, Windows). ### Installation -1. Install [Python](https://python.org/) (2.7.x preferred), +1. Install + [Python](https://python.org/) (2.7.x preferred), [FFmpeg](https://www.ffmpeg.org/), and [eSpeak](http://espeak.sourceforge.net/) @@ -93,59 +102,76 @@ on any modern OS (Linux, Mac OS X, Windows). pip install aeneas ``` -See the [INSTALL file](https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md) +See the +[INSTALL file](https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md) for detailed, step-by-step procedures for Linux, OS X, and Windows. ## Usage -1. 
To check that you installed `aeneas` correctly, run: +1. To **check** whether you installed **aeneas** correctly, run: ```bash python -m aeneas.diagnostics ``` -2. Run `execute_task` or `execute_job` - with `-h` (resp., `--help`) to get a short (resp., long) usage message: +2. Run without arguments to get the **usage message**: ```bash - python -m aeneas.tools.execute_task -h - python -m aeneas.tools.execute_job -h + python -m aeneas.tools.execute_task + python -m aeneas.tools.execute_job ``` - The above commands also print a list of live usage examples - that you can immediately run on your machine, - thanks to the included example files. + You can also get a list of **live examples** + that you can immediately run on your machine + thanks to the included files: + + ```bash + python -m aeneas.tools.execute_task --examples + python -m aeneas.tools.execute_task --examples-all + ``` -3. To compute a synchronization map `map.json` for a pair - (`audio.mp3`, `text.txt` in [`plain`](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN) text format), you can run: +3. To **compute a synchronization map** `map.json` for a pair + (`audio.mp3`, `text.txt` in + [plain](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN) + text format), you can run: ```bash python -m aeneas.tools.execute_task \ audio.mp3 \ text.txt \ - "task_language=en|os_task_file_format=json|is_text_type=plain" \ + "task_language=eng|os_task_file_format=json|is_text_type=plain" \ map.json ``` - To compute a synchronization map `map.smil` for a pair - (`audio.mp3`, [`page.xhtml`](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED) containing fragments marked by `id` attributes like `f001`), + (The command has been split into lines with `\` for visual clarity; + in production you can have the entire command on a single line + and/or you can use shell variables.) 
+ + To **compute a synchronization map** `map.smil` for a pair + (`audio.mp3`, + [page.xhtml](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED) + containing fragments marked by `id` attributes like `f001`), you can run: ```bash python -m aeneas.tools.execute_task \ audio.mp3 \ page.xhtml \ - "task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \ + "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \ map.smil ``` - The third parameter (the _configuration string_) can specify several other parameters/options. - See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details. + As you can see, the third argument (the _configuration string_) + specifies the parameters controlling the I/O formats + and the processing options for the task. + Consult the + [documentation](http://www.readbeyond.it/aeneas/docs/) + for details. 4. If you have several tasks to process, - you can create a job container and a configuration file, - to process them all at once: + you can create a **job container** + to batch process them: ```bash python -m aeneas.tools.execute_job job.zip output_directory @@ -155,48 +181,59 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows. configuration file, providing **aeneas** with all the information needed to parse the input assets and format the output sync map files. - See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details. + Consult the + [documentation](http://www.readbeyond.it/aeneas/docs/) + for details. 
-The [documentation](http://www.readbeyond.it/aeneas/docs/) -provides an introduction to the concepts of -[`task`](http://www.readbeyond.it/aeneas/docs/#tasks) and -[`job`](http://www.readbeyond.it/aeneas/docs/#job), -and it lists of all the options and tools available in the library. +The +[documentation](http://www.readbeyond.it/aeneas/docs/) +contains a highly suggested +[tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html) +which explains how to use the built-in command line tools. ## Documentation and Support -Documentation: [http://www.readbeyond.it/aeneas/docs/](http://www.readbeyond.it/aeneas/docs/) - -High level description of how aeneas works: [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md) - -Tutorial: [A Practical Introduction To The aeneas Package](http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html) - -Mailing list: [https://groups.google.com/d/forum/aeneas-forced-alignment](https://groups.google.com/d/forum/aeneas-forced-alignment) - -Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.readbeyond.it/aeneas/docs/changelog.html) - -Development history: [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md) +* Documentation: + [http://www.readbeyond.it/aeneas/docs/](http://www.readbeyond.it/aeneas/docs/) +* Command line tools tutorial: + [http://www.readbeyond.it/aeneas/docs/clitutorial.html](http://www.readbeyond.it/aeneas/docs/clitutorial.html) +* Library tutorial: + [http://www.readbeyond.it/aeneas/docs/libtutorial.html](http://www.readbeyond.it/aeneas/docs/libtutorial.html) +* Old, verbose tutorial: + [A Practical Introduction To The aeneas Package](http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html) +* Mailing list: + [https://groups.google.com/d/forum/aeneas-forced-alignment](https://groups.google.com/d/forum/aeneas-forced-alignment) +* Changelog: + 
[http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.readbeyond.it/aeneas/docs/changelog.html) +* High level description of how **aeneas** works: + [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md) +* Development history: + [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md) ## Supported Features -* Input text files in plain, parsed, subtitles, or unparsed format +* Input text files in `parsed`, `plain`, `subtitles`, or `unparsed` (XML) format +* Multilevel input text files in `mplain` and `munparsed` (XML) format * Text extraction from XML (e.g., XHTML) files using `id` and `class` attributes * Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.) -* Input audio file formats: all those supported by `ffmpeg` -* Possibility of downloading the audio file from a YouTube video -* Batch processing -* Output sync map formats: CSV, JSON, RBSE, SMIL, SSV, TSV, TTML, TXT, VTT, XML -* Tested languages: BG, CA, CY, CS, DA, DE, EL, EN, EO, ES, ET, FA, FI, FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, SR, SV, SW, TR, UK +* Input audio file formats: all those readable by `ffmpeg` +* Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TSV, TTML, TXT, VTT, XML +* Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR +* MFCC and DTW computed via Python C extensions to reduce the processing time +* On Linux, eSpeak called via a Python C extension for faster audio synthesis +* Batch processing of multiple audio/text pairs +* Several built-in TTS engine wrappers: eSpeak (default, FLOSS), Festival (FLOSS), Nuance TTS API (commercial) +* Use custom TTS engine wrappers besides the built-in ones +* Download audio from a YouTube video +* In multilevel mode, recursive alignment from paragraph to sentence to word 
level * Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes -* Code suitable for a Web app deployment (e.g., on-demand AWS instances) * Adjustable splitting times, including a max character/second constraint for CC applications * Automated detection of audio head/tail -* MFCC and DTW computed via Python C extensions to reduce the processing time -* On Linux, `espeak` called via a Python C extension for faster audio synthesis -* Output an HTML file (from `finetuneas` project) for fine tuning the sync map manually +* Output an HTML file for fine tuning the sync map manually (`finetuneas` project) * Execution parameters tunable at runtime +* Code suitable for Web app deployment (e.g., on-demand cloud computing) ## Limitations and Missing Features @@ -204,7 +241,6 @@ Development history: [HISTORY](https://github.com/readbeyond/aeneas/blob/master/ * Audio should match the text: large portions of spurious text or audio might produce a wrong sync map * Audio is assumed to be spoken: not suitable/YMMV for song captioning * No protection against memory trashing if you feed extremely long audio files -* On Mac OS X and Windows, audio synthesis might be slow if you have thousands of text fragments * [Open issues](https://github.com/readbeyond/aeneas/issues) @@ -212,10 +248,12 @@ Development history: [HISTORY](https://github.com/readbeyond/aeneas/blob/master/ **aeneas** is released under the terms of the GNU Affero General Public License Version 3. -See the [LICENSE file](https://github.com/readbeyond/aeneas/blob/master/LICENSE) for details. +See the +[LICENSE file](https://github.com/readbeyond/aeneas/blob/master/LICENSE) for details. Licenses for third party code and files included in **aeneas** -can be found in the [licenses/](https://github.com/readbeyond/aeneas/blob/master/licenses/README.md) directory. +can be found in the +[licenses](https://github.com/readbeyond/aeneas/blob/master/licenses/README.md) directory. 
No copy rights were harmed in the making of this project. @@ -232,6 +270,8 @@ No copy rights were harmed in the making of this project. * **October 2015**: an anonymous donation sponsored the development of the "YouTube downloader" option (v1.3.0) +* **April 2016**: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0 + ### Supporting Would you like supporting the development of **aeneas**? @@ -245,7 +285,8 @@ I accept sponsorships to * support of third party installations, and * improve the documentation. -Feel free to [get in touch](mailto:aeneas@readbeyond.it). +Feel free to +[get in touch](mailto:aeneas@readbeyond.it). ### Contributing @@ -297,8 +338,13 @@ for its asynchronous usage. **Chris Hubbard** prepared the files for packaging aeneas as a Debian/Ubuntu `.deb`. -All the mighty [GitHub contributors](https://github.com/readbeyond/aeneas/graphs/contributors), -and the members of the [Google Group](https://groups.google.com/d/forum/aeneas-forced-alignment). +**Firat Ozdemir** contributed the `finetuneas` +HTML/JS code for fine tuning sync maps in the browser. + +All the mighty +[GitHub contributors](https://github.com/readbeyond/aeneas/graphs/contributors), +and the members of the +[Google Group](https://groups.google.com/d/forum/aeneas-forced-alignment). diff --git a/README.rst b/README.rst index 629ab29c..dec6c93f 100644 --- a/README.rst +++ b/README.rst @@ -4,16 +4,18 @@ aeneas **aeneas** is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). 
-- Version: 1.4.1 -- Date: 2016-02-13 +- Version: 1.5.0 +- Date: 2016-04-02 - Developed by: `ReadBeyond `__ - Lead Developer: `Alberto Pettarin `__ - License: the GNU Affero General Public License Version 3 (AGPL v3) - Contact: aeneas@readbeyond.it - Quick Links: `Home `__ - `GitHub `__ - - `PyPI `__ - `API - Docs `__ - `Mailing + `PyPI `__ - + `Docs `__ - + `Tutorial `__ + - `Mailing List `__ - `Web App `__ @@ -34,25 +36,31 @@ interval in the audio file: :: - 1 => [00:00:00.000, 00:00:02.680] - From fairest creatures we desire increase, => [00:00:02.680, 00:00:05.480] - That thereby beauty's rose might never die, => [00:00:05.480, 00:00:08.640] - But as the riper should by time decease, => [00:00:08.640, 00:00:11.960] - His tender heir might bear his memory: => [00:00:11.960, 00:00:15.280] - But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.520] - Feed'st thy light's flame with self-substantial fuel, => [00:00:18.520, 00:00:22.760] - Making a famine where abundance lies, => [00:00:22.760, 00:00:25.720] - Thy self thy foe, to thy sweet self too cruel: => [00:00:25.720, 00:00:31.240] - Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.280] - And only herald to the gaudy spring, => [00:00:34.280, 00:00:36.960] - Within thine own bud buriest thy content, => [00:00:36.960, 00:00:40.640] - And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.600] - Pity the world, or else this glutton be, => [00:00:43.600, 00:00:48.000] - To eat the world's due, by the grave and thee. => [00:00:48.000, 00:00:53.280] - -This synchronization map can be output to file in several formats: SMIL -for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed captioning, JSON/RBSE for -Web usage, or raw CSV/SSV/TSV/TXT/XML for further processing. 
+ 1 => [00:00:00.000, 00:00:02.640] + From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880] + That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240] + But as the riper should by time decease, => [00:00:09.240, 00:00:11.920] + His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280] + But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800] + Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760] + Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680] + Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240] + Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400] + And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920] + Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640] + And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640] + Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080] + To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240] + +.. figure:: wiki/align.png + :alt: Waveform with aligned labels, detail + + Waveform with aligned labels, detail + +This synchronization map can be output to file in several formats: EAF +for research purposes, SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed +captioning, JSON for Web usage, or raw AUD/CSV/SSV/TSV/TXT/XML for +further processing. System Requirements, Supported Platforms and Installation --------------------------------------------------------- @@ -66,20 +74,17 @@ System Requirements 3. `FFmpeg `__ 4. `eSpeak `__ 5. Python modules ``BeautifulSoup4``, ``lxml``, and ``numpy`` -6. Python C headers to compile the Python C extensions (Optional but +6. Python C headers to compile the Python C extensions (optional but strongly recommended) -7. A shell supporting UTF-8 (Optional but strongly recommended) -8. 
Python module ``pafy`` (Optional, only required if you want to - download audio from YouTube) +7. A shell supporting UTF-8 (optional but strongly recommended) Supported Platforms ~~~~~~~~~~~~~~~~~~~ **aeneas** has been developed and tested on **Debian 64bit**, which is -the **only supported OS** at the moment. - -However, **aeneas** has been confirmed to work on other Linux -distributions, OS X, and Windows. See the `PLATFORMS +the **only supported OS** at the moment. Nevertheless, **aeneas** has +been confirmed to work on other Linux distributions, OS X, and Windows. +See the `PLATFORMS file `__ for the details. @@ -115,25 +120,28 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows. Usage ----- -1. To check that you installed ``aeneas`` correctly, run: +1. To **check** whether you installed **aeneas** correctly, run: ``bash python -m aeneas.diagnostics`` -2. Run ``execute_task`` or ``execute_job`` with ``-h`` (resp., - ``--help``) to get a short (resp., long) usage message: +2. Run without arguments to get the **usage message**: .. code:: bash - python -m aeneas.tools.execute_task -h - python -m aeneas.tools.execute_job -h + python -m aeneas.tools.execute_task + python -m aeneas.tools.execute_job + + You can also get a list of **live examples** that you can immediately + run on your machine thanks to the included files: - The above commands also print a list of live usage examples that you - can immediately run on your machine, thanks to the included example - files. + .. code:: bash -3. To compute a synchronization map ``map.json`` for a pair + python -m aeneas.tools.execute_task --examples + python -m aeneas.tools.execute_task --examples-all + +3. To **compute a synchronization map** ``map.json`` for a pair (``audio.mp3``, ``text.txt`` in - ```plain`` `__ + `plain `__ text format), you can run: .. 
code:: bash @@ -141,11 +149,16 @@ Usage python -m aeneas.tools.execute_task \ audio.mp3 \ text.txt \ - "task_language=en|os_task_file_format=json|is_text_type=plain" \ + "task_language=eng|os_task_file_format=json|is_text_type=plain" \ map.json -To compute a synchronization map ``map.smil`` for a pair (``audio.mp3``, -```page.xhtml`` `__ +(The command has been split into lines with ``\`` for visual clarity; in +production you can have the entire command on a single line and/or you +can use shell variables.) + +To **compute a synchronization map** ``map.smil`` for a pair +(``audio.mp3``, +`page.xhtml `__ containing fragments marked by ``id`` attributes like ``f001``), you can run: @@ -155,16 +168,17 @@ run: python -m aeneas.tools.execute_task \ audio.mp3 \ page.xhtml \ - "task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \ + "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \ map.smil ``` -The third parameter (the *configuration string*) can specify several -other parameters/options. See the +As you can see, the third argument (the *configuration string*) +specifies the parameters controlling the I/O formats and the processing +options for the task. Consult the `documentation `__ for details. -4. If you have several tasks to process, you can create a job container - and a configuration file, to process them all at once: +4. If you have several tasks to process, you can create a **job + container** to batch process them: .. code:: bash @@ -172,63 +186,71 @@ other parameters/options. 
See the File ``job.zip`` should contain a ``config.txt`` or ``config.xml`` configuration file, providing **aeneas** with all the information needed -to parse the input assets and format the output sync map files. See the -`documentation `__ for details. +to parse the input assets and format the output sync map files. Consult +the `documentation `__ for +details. -The `documentation `__ provides -an introduction to the concepts of -```task`` `__ and -```job`` `__, and it lists of -all the options and tools available in the library. +The `documentation `__ contains a +highly suggested +`tutorial `__ +which explains how to use the built-in command line tools. Documentation and Support ------------------------- -Documentation: http://www.readbeyond.it/aeneas/docs/ - -High level description of how aeneas works: -`HOWITWORKS `__ - -Tutorial: `A Practical Introduction To The aeneas -Package `__ - -Mailing list: https://groups.google.com/d/forum/aeneas-forced-alignment - -Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html - -Development history: -`HISTORY `__ +- Documentation: http://www.readbeyond.it/aeneas/docs/ +- Command line tools tutorial: + http://www.readbeyond.it/aeneas/docs/clitutorial.html +- Library tutorial: + http://www.readbeyond.it/aeneas/docs/libtutorial.html +- Old, verbose tutorial: `A Practical Introduction To The aeneas + Package `__ +- Mailing list: + https://groups.google.com/d/forum/aeneas-forced-alignment +- Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html +- High level description of how **aeneas** works: + `HOWITWORKS `__ +- Development history: + `HISTORY `__ Supported Features ------------------ -- Input text files in plain, parsed, subtitles, or unparsed format +- Input text files in ``parsed``, ``plain``, ``subtitles``, or + ``unparsed`` (XML) format +- Multilevel input text files in ``mplain`` and ``munparsed`` (XML) + format - Text extraction from XML (e.g., XHTML) files using ``id`` and ``class`` attributes - 
Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.) -- Input audio file formats: all those supported by ``ffmpeg`` -- Possibility of downloading the audio file from a YouTube video -- Batch processing -- Output sync map formats: CSV, JSON, RBSE, SMIL, SSV, TSV, TTML, TXT, - VTT, XML -- Tested languages: BG, CA, CY, CS, DA, DE, EL, EN, EO, ES, ET, FA, FI, - FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, - SR, SV, SW, TR, UK +- Input audio file formats: all those readable by ``ffmpeg`` +- Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, + TSV, TTML, TXT, VTT, XML +- Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, + EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD, + NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR +- MFCC and DTW computed via Python C extensions to reduce the + processing time +- On Linux, eSpeak called via a Python C extension for faster audio + synthesis +- Batch processing of multiple audio/text pairs +- Several built-in TTS engine wrappers: eSpeak (default, FLOSS), + Festival (FLOSS), Nuance TTS API (commercial) +- Use custom TTS engine wrappers besides the built-in ones +- Download audio from a YouTube video +- In multilevel mode, recursive alignment from paragraph to sentence to + word level - Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes -- Code suitable for a Web app deployment (e.g., on-demand AWS - instances) - Adjustable splitting times, including a max character/second constraint for CC applications - Automated detection of audio head/tail -- MFCC and DTW computed via Python C extensions to reduce the - processing time -- On Linux, ``espeak`` called via a Python C extension for faster audio - synthesis -- Output an HTML file (from ``finetuneas`` project) for fine tuning the - sync map manually +- Output an HTML file for fine tuning the sync map manually 
+ (``finetuneas`` project) - Execution parameters tunable at runtime +- Code suitable for Web app deployment (e.g., on-demand cloud + computing) Limitations and Missing Features -------------------------------- @@ -238,8 +260,6 @@ Limitations and Missing Features - Audio is assumed to be spoken: not suitable/YMMV for song captioning - No protection against memory trashing if you feed extremely long audio files -- On Mac OS X and Windows, audio synthesis might be slow if you have - thousands of text fragments - `Open issues `__ License @@ -252,7 +272,7 @@ details. Licenses for third party code and files included in **aeneas** can be found in the -`licenses/ `__ +`licenses `__ directory. No copy rights were harmed in the making of this project. @@ -278,6 +298,9 @@ Sponsors - **October 2015**: an anonymous donation sponsored the development of the "YouTube downloader" option (v1.3.0) +- **April 2016**: the Fruch Foundation kindly sponsored the development + and documentation of v1.5.0 + Supporting ~~~~~~~~~~ @@ -337,6 +360,9 @@ asynchronous usage. **Chris Hubbard** prepared the files for packaging aeneas as a Debian/Ubuntu ``.deb``. +**Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine +tuning sync maps in the browser. + All the mighty `GitHub contributors `__, and the members of the `Google diff --git a/VERSION b/VERSION index 347f5833..bc80560f 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.4.1 +1.5.0 diff --git a/aeneas/__init__.py b/aeneas/__init__.py index 18a457ef..3bb2b30b 100644 --- a/aeneas/__init__.py +++ b/aeneas/__init__.py @@ -6,51 +6,6 @@ to automagically synchronize audio and text (aka forced alignment). 
 """
 
-from __future__ import absolute_import
-from __future__ import print_function
-from aeneas.adjustboundaryalgorithm import AdjustBoundaryAlgorithm
-from aeneas.analyzecontainer import AnalyzeContainer
-from aeneas.audiofile import AudioFile
-from aeneas.audiofile import AudioFileMonoWAVE
-from aeneas.audiofile import AudioFileUnsupportedFormatError
-from aeneas.container import Container
-from aeneas.container import ContainerFormat
-from aeneas.downloader import Downloader
-from aeneas.dtw import DTWAlgorithm
-from aeneas.dtw import DTWAligner
-from aeneas.espeakwrapper import ESPEAKWrapper
-from aeneas.executejob import ExecuteJob
-from aeneas.executetask import ExecuteTask
-from aeneas.executetask import ExecuteTaskExecutionError
-from aeneas.executetask import ExecuteTaskInputError
-from aeneas.ffmpegwrapper import FFMPEGWrapper
-from aeneas.ffprobewrapper import FFPROBEParsingError
-from aeneas.ffprobewrapper import FFPROBEUnsupportedFormatError
-from aeneas.ffprobewrapper import FFPROBEWrapper
-from aeneas.hierarchytype import HierarchyType
-from aeneas.idsortingalgorithm import IDSortingAlgorithm
-from aeneas.job import Job
-from aeneas.job import JobConfiguration
-from aeneas.language import Language
-from aeneas.logger import Logger
-from aeneas.sd import SD
-from aeneas.sd import SDMetric
-from aeneas.syncmap import SyncMap
-from aeneas.syncmap import SyncMapFormat
-from aeneas.syncmap import SyncMapFragment
-from aeneas.syncmap import SyncMapHeadTailFormat
-from aeneas.syncmap import SyncMapMissingParameterError
-from aeneas.synthesizer import Synthesizer
-from aeneas.task import Task
-from aeneas.task import TaskConfiguration
-from aeneas.textfile import TextFile
-from aeneas.textfile import TextFileFormat
-from aeneas.textfile import TextFragment
-from aeneas.vad import VAD
-from aeneas.validator import Validator
-import aeneas.globalconstants as gc
-import aeneas.globalfunctions as gf
-
 __author__ = "Alberto Pettarin"
 __copyright__ = """
 Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it)
@@ -58,7 +13,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
 """
 __license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
 __email__ = "aeneas@readbeyond.it"
 __status__ = "Production"

diff --git a/aeneas/adjustboundaryalgorithm.py b/aeneas/adjustboundaryalgorithm.py
index 2d110c06..4651c746 100644
--- a/aeneas/adjustboundaryalgorithm.py
+++ b/aeneas/adjustboundaryalgorithm.py
@@ -2,18 +2,26 @@
 # coding=utf-8

 """
-Enumeration of the available algorithms to adjust
-the boundary point between two fragments.
+This module contains the following classes:

-.. versionadded:: 1.0.4
+* :class:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm`
+  implementing functions to adjust
+  the boundary point between two consecutive fragments.
+
+.. warning:: This module is likely to be refactored in a future version
 """

 from __future__ import absolute_import
+from __future__ import division
 from __future__ import print_function
-import copy
+import numpy

-from aeneas.logger import Logger
+from aeneas.audiofilemfcc import AudioFileMFCC
+from aeneas.logger import Loggable
 from aeneas.runtimeconfiguration import RuntimeConfiguration
+from aeneas.textfile import TextFile
+from aeneas.timevalue import Decimal
+from aeneas.timevalue import TimeValue

 __author__ = "Alberto Pettarin"
 __copyright__ = """
@@ -22,67 +30,183 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
 """
 __license__ = "GNU AGPL v3"
-__version__ = "1.4.1"
+__version__ = "1.5.0"
 __email__ = "aeneas@readbeyond.it"
 __status__ = "Production"


-class AdjustBoundaryAlgorithm(object):
+class AdjustBoundaryAlgorithm(Loggable):
     """
-    Enumeration of the available algorithms to adjust
-    the boundary point between two consecutive fragments.
-
-    :param algorithm: the boundary adjustment algorithm to be used
-    :type algorithm: :class:`aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm` enum
-    :param text_map: a text map list [[start, end, id, text], ..., []]
-    :type text_map: list
-    :param speech: a list of time intervals [[s_1, e_1,], ..., [s_k, e_k]]
-                   containing speech
-    :type speech: list
-    :param nonspeech: a list of time intervals [[s_1, e_1,], ..., [s_j, e_j]]
-                      not containing speech
-    :type nonspeech: list
-    :param value: an optional parameter to be passed
-                  to the boundary adjustment algorithm,
-                  it will be converted (to int, to float) as needed,
-                  depending on the selected algorithm
-    :type value: string
-    :param rconf: a runtime configuration. Default: ``None``, meaning that
-                  default settings will be used.
-    :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration`
+    Enumeration and implementation of the available algorithms
+    to adjust the boundary point between two consecutive fragments.
+
+    :param algorithm: the algorithm to be used
+    :type algorithm: :class:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm`
+    :param list parameters: a list of additional parameters to be passed to the algorithm
+    :param boundary_indices: the current boundary indices,
+                             with respect to the audio file full MFCCs
+    :type boundary_indices: :class:`numpy.ndarray` (1D)
+    :param real_wave_mfcc: the audio file MFCCs
+    :type real_wave_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC`
+    :param text_file: the text file containing the text fragments associated
+    :type text_file: :class:`~aeneas.textfile.TextFile`
+    :param rconf: a runtime configuration
+    :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
     :param logger: the logger object
-    :type logger: :class:`aeneas.logger.Logger`
-
-    :raises ValueError: if one of `text_map`, `speech` or `nonspeech` is `None` or `algorithm` value is not allowed
+    :type logger: :class:`~aeneas.logger.Logger`
+    :raises: ValueError: if the value of ``algorithm`` is not allowed
+    :raises: TypeError: if one of ``boundary_indices``, ``real_wave_mfcc``,
+                        or ``text_file`` is ``None`` or it has a wrong type
     """

     AFTERCURRENT = "aftercurrent"
-    """ Set the boundary at ``value`` seconds
-    after the end of the current fragment """
+    """
+    Set the boundary at ``value`` seconds
+    after the end of the current fragment,
+    if the current boundary falls inside
+    a nonspeech interval.
+    If not, no adjustment is made.
+
+    Example (value ``0.200`` seconds):
+
+    .. image:: _static/aftercurrent.200.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and AFTERCURRENT labels with 0.200 seconds offset
+    """

     AUTO = "auto"
-    """ Auto (no adjustment) """
+    """
+    Auto (no adjustment).
+
+    Example:
+
+    .. image:: _static/auto.png
+        :scale: 100%
+        :align: center
+        :alt: The AUTO method does not change the time intervals
+    """

     BEFORENEXT = "beforenext"
-    """ Set the boundary at ``value`` seconds
-    before the beginning of the next fragment """
+    """
+    Set the boundary at ``value`` seconds
+    before the beginning of the next fragment,
+    if the current boundary falls inside
+    a nonspeech interval.
+    If not, no adjustment is made.
+
+    Example (value ``0.200`` seconds):
+
+    .. image:: _static/beforenext.200.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and BEFORENEXT labels with 0.200 seconds offset
+    """

     OFFSET = "offset"
-    """ Offset the current boundaries by ``value`` seconds
+    """
+    Offset the current boundaries by ``value`` seconds.
+    The ``value`` can be negative or positive.
+
+    Example (value ``-0.200`` seconds):
+
+    .. image:: _static/offset.m200.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and OFFSET labels with value -0.200
+
+    Example (value ``0.200`` seconds):
+
+    .. image:: _static/offset.200.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and OFFSET labels with value 0.200

     .. versionadded:: 1.1.0
     """

     PERCENT = "percent"
-    """ Set the boundary at ``value`` percent of
-    the nonspeech interval between the current and the next fragment """
+    """
+    Set the boundary at ``value`` percent of
+    the nonspeech interval between the current and the next fragment,
+    if the current boundary falls inside
+    a nonspeech interval.
+    The ``value`` must be an integer in ``[0, 100]``.
+    If not, no adjustment is made.
+
+    Example (value ``25`` %):
+
+    .. image:: _static/percent.25.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and PERCENT labels with value 25 %
+
+    Example (value ``50`` %):
+
+    .. image:: _static/percent.50.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and PERCENT labels with value 50 %
+
+    Example (value ``75`` %):
+
+    .. image:: _static/percent.75.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and PERCENT labels with value 75 %
+
+    """

     RATE = "rate"
-    """ Adjust boundaries trying to respect the
-    ``value`` characters/second constraint """
+    """
+    Adjust boundaries trying to respect the
+    ``value`` characters/second constraint.
+    The ``value`` must be positive.
+    First, the rates of all fragments are computed,
+    using the current boundaries.
+    For those fragments exceeding ``value`` characters/second,
+    the algorithm will try to move the end boundary forward,
+    so that its time interval increases (and hence its rate decreases).
+    Clearly, it is possible that not all fragments
+    can be adjusted this way: for example,
+    if you have three consecutive fragments exceeding ``value``,
+    the middle one cannot be stretched.
+
+    Example (value ``13.0``, note how ``f000003`` is modified):
+
+    .. image:: _static/rate.13.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and RATE labels with value 13.0
+
+    """

     RATEAGGRESSIVE = "rateaggressive"
-    """ Adjust boundaries trying to respect the
-    ``value`` characters/second constraint (aggressive mode)
+    """
+    Adjust boundaries trying to respect the
+    ``value`` characters/second constraint, in aggressive mode.
+    The ``value`` must be positive.
+    First, the rates of all fragments are computed,
+    using the current boundaries.
+    For those fragments exceeding ``value`` characters/second,
+    the algorithm will try to move the end boundary forward,
+    so that its time interval increases (and hence its rate decreases).
+    If moving the end boundary is not possible,
+    or it is not enough to keep the rate below ``value``,
+    the algorithm will try to move the begin boundary back;
+    this is the difference with the less aggressive
+    :data:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm.RATE`
+    algorithm.
+    Clearly, it is possible that not all fragments
+    can be adjusted this way: for example,
+    if you have three consecutive fragments exceeding ``value``,
+    the middle one cannot be stretched.
+
+    Example (value ``13.0``, note how ``f000003`` is modified):
+
+    .. image:: _static/rateaggressive.13.png
+        :scale: 100%
+        :align: center
+        :alt: Comparison between AUTO labels and RATEAGGRESSIVE labels with value 13.0

     .. versionadded:: 1.1.0
     """
@@ -98,515 +222,334 @@ class AdjustBoundaryAlgorithm(object):
     ]
     """ List of all the allowed values """

-    DEFAULT_MAX_RATE = 21.0
-    """ Default max rate (used only when ``RATE`` or ``RATEAGGRESSIVE``
-    algorithms are used) """
-
-    DEFAULT_PERCENT = 50
-    """ Default percent value (used only when ``PERCENT`` algorithm is used) """
-
-    TOLERANCE = 0.001
-    """ Tolerance when comparing floats """
-
     TAG = u"AdjustBoundaryAlgorithm"

     def __init__(
             self,
             algorithm,
-            text_map,
-            speech,
-            nonspeech,
-            value=None,
+            parameters,
+            boundary_indices,
+            real_wave_mfcc,
+            text_file,
             rconf=None,
             logger=None
     ):
         if algorithm not in self.ALLOWED_VALUES:
-            raise ValueError("Algorithm value not allowed")
-        if text_map is None:
-            raise ValueError("Text map is None")
-        if speech is None:
-            raise ValueError("Speech list is None")
-        if nonspeech is None:
-            raise ValueError("Nonspeech list is None")
+            raise ValueError(u"Algorithm value not allowed")
+        if boundary_indices is None:
+            raise TypeError(u"boundary_indices is None")
+        if (real_wave_mfcc is None) or (not isinstance(real_wave_mfcc, AudioFileMFCC)):
+            raise TypeError(u"real_wave_mfcc is None or not an AudioFileMFCC object")
+        if (text_file is None) or (not isinstance(text_file, TextFile)):
+            raise TypeError(u"text_file is None or not a TextFile object")
+        super(AdjustBoundaryAlgorithm, self).__init__(rconf=rconf, logger=logger)
         self.algorithm = algorithm
-        self.text_map = copy.deepcopy(text_map)
-        self.speech = speech
-        self.nonspeech = nonspeech
-        self.value = value
-        self.logger = logger or Logger()
-        self.rconf = rconf or RuntimeConfiguration()
-        self._parse_value()
-
-    def _log(self, message, severity=Logger.DEBUG):
-        """ Log """
-        self.logger.log(message, severity, self.TAG)
-
-    def _parse_value(self):
-        """
-        Parse the self.value value
-        """
-        if self.algorithm == self.AUTO:
-            return
-        elif self.algorithm == self.PERCENT:
-            try:
-                self.value = int(self.value)
-            except ValueError:
-                self.value = self.DEFAULT_PERCENT
-            self.value = max(min(self.value, 100), 0)
-        else:
-            try:
-                self.value = float(self.value)
-            except ValueError:
-                self.value = 0.0
-            if (
-                (self.value <= 0) and
-                (self.algorithm in [self.RATE, self.RATEAGGRESSIVE])
-            ):
-                self.value = self.DEFAULT_MAX_RATE
+        self.parameters = parameters
+        self.real_wave_mfcc = real_wave_mfcc
+        self.boundary_indices = boundary_indices
+        self.text_file = text_file
+        self.intervals = []

-    def adjust(self):
+    def to_time_map(self):
         """
-        Adjust the boundaries of the text map.
+        Adjust the boundaries of the text map
+        using the algorithm and parameters
+        specified in the constructor,
+        and return a list of time intervals.

         :rtype: list of intervals
         """
         if self.algorithm == self.AUTO:
-            return self._adjust_auto()
+            self._adjust_auto()
         elif self.algorithm == self.AFTERCURRENT:
-            return self._adjust_aftercurrent()
+            self._adjust_aftercurrent()
         elif self.algorithm == self.BEFORENEXT:
-            return self._adjust_beforenext()
+            self._adjust_beforenext()
         elif self.algorithm == self.OFFSET:
-            return self._adjust_offset()
+            self._adjust_offset()
         elif self.algorithm == self.PERCENT:
-            return self._adjust_percent()
+            self._adjust_percent()
         elif self.algorithm == self.RATE:
-            return self._adjust_rate(False)
+            self._adjust_rate(False)
         elif self.algorithm == self.RATEAGGRESSIVE:
-            return self._adjust_rate(True)
-        return self.text_map
+            self._adjust_rate(True)
+        else:
+            self._adjust_auto()
+        return self.intervals

     def _adjust_auto(self):
-        self._log(u"Called _adjust_auto: returning text_map unchanged")
-        return self.text_map
-
-    def _adjust_offset(self):
-        self._log(u"Called _adjust_offset")
-        try:
-            for index in range(1, len(self.text_map)):
-                current = self.text_map[index]
-                previous = self.text_map[index - 1]
-                if self.value >= 0:
-                    offset = min(self.value, current[1] - current[0])
-                else:
-                    offset = -min(-self.value, previous[1] - previous[0])
-                previous[1] += offset
-                current[0] += offset
-        except:
-            self._log(u"Exception in _adjust_offset: returning text_map unchanged")
-        return self.text_map
-
-    def _adjust_percent(self):
-        def new_time(current_boundary, nsi):
-            duration = nsi[1] - nsi[0]
-            percent = self.value / 100.0
-            return nsi[0] + duration * percent
-        return self._adjust_on_nsi(new_time)
-
-    def _adjust_aftercurrent(self):
-        def new_time(current_boundary, nsi):
-            duration = nsi[1] - nsi[0]
-            try:
-                delay = max(min(self.value, duration), 0)
-                if delay == 0:
-                    return current_boundary
-                return nsi[0] + delay
-            except:
-                return current_boundary
-        return self._adjust_on_nsi(new_time)
-
-    def _adjust_beforenext(self):
-        def new_time(current_boundary, nsi):
-            duration = nsi[1] - nsi[0]
-            try:
-                delay = max(min(self.value, duration), 0)
-                if delay == 0:
-                    return current_boundary
-                return nsi[1] - delay
-            except:
-                return current_boundary
-        return self._adjust_on_nsi(new_time)
-
-    def _adjust_on_nsi(self, new_time_function):
-        nsi_index = 0
-        # TODO numpy-fy this loop?
-        for index in range(len(self.text_map) - 1):
-            current_boundary = self.text_map[index][1]
-            self._log([u"current_boundary: %.3f", current_boundary])
-            # the tolerance comparison seems necessary
-            while (
-                (nsi_index < len(self.nonspeech)) and
-                (self.nonspeech[nsi_index][1] + self.TOLERANCE <= current_boundary)
-            ):
-                nsi_index += 1
-            nsi = None
-            if (
-                (nsi_index < len(self.nonspeech)) and
-                (current_boundary >= self.nonspeech[nsi_index][0] - self.TOLERANCE)
-            ):
-                nsi = self.nonspeech[nsi_index]
-                nsi_index += 1
-            if nsi:
-                self._log([u" in interval %.3f %.3f", nsi[0], nsi[1]])
-                new_time = new_time_function(current_boundary, nsi)
-                self._log([u" new_time: %.3f", new_time])
-                new_start = self.text_map[index][0]
-                new_end = self.text_map[index + 1][1]
-                if self._time_in_interval(new_time, new_start, new_end):
-                    self._log([u" updating %.3f => %.3f", current_boundary, new_time])
-                    self.text_map[index][1] = new_time
-                    self.text_map[index + 1][0] = new_time
-                else:
-                    self._log(u" new_time outside: no adjustment performed")
-            else:
-                self._log(u" no nonspeech interval found: no adjustment performed")
-        return self.text_map
-
-    def _len(self, string):
-        """
-        Return the length of the given string.
-        If it is greater than 2 times the self.value (= user max rate),
-        one space will become a newline,
-        and hence we do not count it
-        (e.g., value = 21 => max 42 chars per line).
-
-        :param string: the string to be counted
-        :type string: string
-        :rtype: int
         """
-        # TODO this should depend on the number of lines
-        # in the text fragment; current code assumes
-        # at most 2 lines of at most value characters each
-        # (the effect of this finesse is negligible in practice)
-        if string is None:
-            return 0
-        length = len(string)
-        if length > 2 * self.value:
-            length -= 1
-        return length
-
-    def _time_in_interval(self, time, start, end):
+        AUTO (do not modify)
         """
-        Decides whether the given time is within the given interval.
-
-        :param time: a time value
-        :type time: float
-        :param start: the start of the interval
-        :type start: float
-        :param end: the end of the interval
-        :type end: float
-        :rtype: bool
-        """
-        return (time >= start) and (time <= end)
+        self.log(u"Called _adjust_auto")
+        self._apply_offset(TimeValue("0.000"))

-    # TODO a more efficient search (e.g., binary) is possible
-    # the tolerance comparison seems necessary
-    def _find_interval_containing(self, intervals, time):
-        """
-        Return the interval containing the given time,
-        or None if no such interval exists.
-
-        :param intervals: a list of time intervals
-                          [[s_1, e_1], ..., [s_k, e_k]]
-        :type intervals: list of lists
-        :param time: a time value
-        :type time: float
-        :rtype: a time interval ``[s, e]`` or ``None``
-        """
-        for interval in intervals:
-            start = interval[0] - self.TOLERANCE
-            end = interval[1] + self.TOLERANCE
-            if self._time_in_interval(time, start, end):
-                return interval
-        return None
-
-    def _compute_rate_raw(self, start, end, length):
+    def _adjust_offset(self):
         """
-        Compute the rate of a fragment, that is,
-        the number of characters per second.
-
-        :param start: the start time
-        :type start: float
-        :param end: the end time
-        :type end: float
-        :param length: the number of character (possibly adjusted) of the text
-        :type length: int
-        :rtype: float
+        OFFSET
         """
-        duration = end - start
-        if duration > 0:
-            return length / duration
-        return 0
+        self.log(u"Called _adjust_offset")
+        # NOTE self.parameters[0] is TimeValue
+        self._apply_offset(self.parameters[0])

-    def _compute_rate(self, index):
+    def _adjust_percent(self):
+        """
+        PERCENT
         """
-        Compute the rate of a fragment, that is,
-        the number of characters per second.
+        def new_time(begin, end, current):
+            """ Compute new time """
+            # NOTE self.parameters[0] is an int
+            percent = max(min(Decimal(self.parameters[0]) / 100, 100), 0)
+            return (begin + (end + 1 - begin) * percent) * self.rconf.mws
+        self.log(u"Called _adjust_percent")
+        self._adjust_on_nonspeech(new_time)

-        :param index: the index of the fragment in the text map
-        :type index: int
-        :rtype: float
+    def _adjust_aftercurrent(self):
         """
-        if (index < 0) or (index >= len(self.text_map)):
-            return 0
-        fragment = self.text_map[index]
-        start = fragment[0]
-        end = fragment[1]
-        length = self._len(fragment[3])
-        return self._compute_rate_raw(start, end, length)
-
-    def _compute_slack(self, index):
+        AFTERCURRENT
         """
-        Return the slack of a fragment, that is,
-        the difference between the current duration
-        of the fragment and the duration it should have
-        if its rate was exactly self.value (= max rate)
-
-        If the slack is positive, the fragment
-        can be shrinken; if the slack is negative,
-        the fragment should be stretched.
-
-        The returned value can be None,
-        in case the index is out of self.text_map bounds.
+        def new_time(begin, end, current):
+            """ Compute new time """
+            mws = self.rconf.mws
+            # NOTE self.parameters[0] is TimeValue
+            delay = max(self.parameters[0], TimeValue("0.000"))
+            tentative = begin * mws + delay
+            if tentative > (end + 1) * mws:
+                return current * mws
+            return tentative
+        self.log(u"Called _adjust_aftercurrent")
+        self._adjust_on_nonspeech(new_time)

-        :param index: the index of the fragment in the text map
-        :type index: int
-        :rtype: float
+    def _adjust_beforenext(self):
+        """
+        BEFORENEXT
         """
-        if (index < 0) or (index >= len(self.text_map)):
-            return None
-        fragment = self.text_map[index]
-        start = fragment[0]
-        end = fragment[1]
-        length = self._len(fragment[3])
-        duration = end - start
-        return duration - (length / self.value)
+        def new_time(begin, end, current):
+            """ Compute new time """
+            mws = self.rconf.mws
+            # NOTE self.parameters[0] is TimeValue
+            delay = max(self.parameters[0], TimeValue("0.000"))
+            tentative = (end + 1) * mws - delay
+            if tentative < begin * mws:
+                return current * mws
+            return tentative
+        self.log(u"Called _adjust_beforenext")
+        self._adjust_on_nonspeech(new_time)

     def _adjust_rate(self, aggressive=False):
-        faster = []
-
-        # TODO numpy-fy this loop?
-        for index in range(len(self.text_map)):
-            fragment = self.text_map[index]
-            self._log([u"Fragment %d", index])
-            rate = self._compute_rate(index)
-            self._log([u" %.3f %.3f => %.3f", fragment[0], fragment[1], rate])
-            if rate > self.value:
-                self._log(u" too fast")
-                faster.append(index)
-
-        if len(self.text_map) == 1:
-            self._log(u"Only one fragment, and it is too fast")
-            return self.text_map
+        self.log(u"Called _adjust_rate")
+        # if only one fragment, return unchanged
+        if len(self.text_file) <= 1:
+            self.log(u"Only one fragment, returning")
+            self._apply_offset(TimeValue("0.000"))
+            return
+        # compute fragments too fast
+        mws = self.rconf.mws
+        # NOTE self.parameters[0] is Decimal
+        max_rate = self.parameters[0]
+        times = self.boundary_indices * mws
+        durations = numpy.diff(times)
+        lengths = numpy.array([f.chars for f in self.text_file.fragments])
+        # compute rates, dealing with division by zero
+        with numpy.errstate(divide="ignore", invalid="ignore"):
+            rates = numpy.divide(lengths, durations)
+        rates[rates == numpy.inf] = 0
+        rates = numpy.nan_to_num(rates)
+        faster = numpy.where(rates > max_rate)[0]
+
+        # if no fragment is faster, return unchanged
         if len(faster) == 0:
-            self._log([u"No fragment faster than max rate %.3f", self.value])
-            return self.text_map
+            self.log([u"No fragment faster than max rate %.3f", max_rate])
+            self._apply_offset(TimeValue("0.000"))
+            return

-        # TODO numpy-fy this loop?
         # try fixing faster fragments
-        self._log(u"Fixing faster fragments...")
         for index in faster:
-            self._log([u"Fixing faster fragment %d ...", index])
-            if aggressive:
-                try:
-                    self._rateaggressive_fix_fragment(index)
-                except:
-                    self._log(u"Exception in _rateaggressive_fix_fragment")
-            else:
-                try:
-                    self._rate_fix_fragment(index)
-                except:
-                    self._log(u"Exception in _rate_fix_fragment")
-            self._log([u"Fixing faster fragment %d ... done", index])
-        self._log(u"Fixing faster fragments... done")
-        return self.text_map
-
-    def _rate_fix_fragment(self, index):
-        """
-        Fix index-th fragment using the rate algorithm (standard variant).
-        """
-        succeeded = False
-        current = self.text_map[index]
-        current_start = current[0]
-        current_end = current[1]
-        current_rate = self._compute_rate(index)
-        previous_slack = self._compute_slack(index - 1)
-        current_slack = self._compute_slack(index)
-        next_slack = self._compute_slack(index + 1)
-        if previous_slack is not None:
-            previous = self.text_map[index - 1]
-            self._log([u" previous: %.3f %.3f => %.3f", previous[0], previous[1], self._compute_rate(index - 1)])
-            self._log([u" previous slack: %.3f", previous_slack])
-        if current_slack is not None:
-            self._log([u" current: %.3f %.3f => %.3f", current_start, current_end, current_rate])
-            self._log([u" current slack: %.3f", current_slack])
-        if next_slack is not None:
-            nextf = self.text_map[index]
-            self._log([u" next: %.3f %.3f => %.3f", nextf[0], nextf[1], self._compute_rate(index + 1)])
-            self._log([u" next slack: %.3f", next_slack])
-
-        # try expanding into the previous fragment
-        new_start = current_start
-        new_end = current_end
-        if (previous_slack is not None) and (previous_slack > 0):
-            self._log(u" can expand into previous")
-            nsi = self._find_interval_containing(self.nonspeech, current[0])
-            previous = self.text_map[index - 1]
-            if nsi is not None:
-                if nsi[0] > previous[0]:
-                    self._log([u" found suitable nsi: %.3f %.3f", nsi[0], nsi[1]])
-                    previous_slack = min(current[0] - nsi[0], previous_slack)
-                    self._log([u" previous slack after min: %.3f", previous_slack])
-                    if previous_slack + current_slack >= 0:
-                        self._log(u" enough slack to completely fix")
-                        steal_from_previous = -current_slack
-                        succeeded = True
-                    else:
-                        self._log(u" not enough slack to completely fix")
-                        steal_from_previous = previous_slack
-                    new_start = current_start - steal_from_previous
-                    self.text_map[index - 1][1] = new_start
-                    self.text_map[index][0] = new_start
-                    new_rate = self._compute_rate(index)
-                    self._log([u" old: %.3f %.3f => %.3f", current_start, current_end, current_rate])
-                    self._log([u" new: %.3f %.3f => %.3f", new_start, new_end, new_rate])
+            self.log([u"Fragment %d has rate %.3f", index, rates[index]])
+            fixed = False
+
+            # first, try moving begin time back
+            if index > 0:
+                self.log(u" Trying to move begin time back...")
+                lacking = lengths[index] / max_rate - durations[index]
+                self.log([u" Overflow current fragment: %.3f", lacking])
+                slack = durations[index - 1] - lengths[index - 1] / max_rate
+                self.log([u" Slack previous fragment: %.3f", slack])
+                if slack >= lacking:
+                    self.log([u" Moving begin time: %.3f => %.3f", times[index], times[index] - lacking])
+                    self.log(u" Complete fix (slack >= lacking)")
+                    times[index] -= lacking
+                    durations[index - 1] -= lacking
+                    durations[index] += lacking
+                    rates[index - 1] = lengths[index - 1] / durations[index - 1]
+                    rates[index] = lengths[index] / durations[index]
+                    fixed = True
+                elif slack > 0:
+                    self.log([u" Moving begin time: %.3f => %.3f", times[index], times[index] - slack])
+                    self.log(u" Partial fix (slack < lacking but slack > 0)")
+                    times[index] -= slack
+                    durations[index - 1] -= slack
+                    durations[index] += slack
+                    rates[index - 1] = lengths[index - 1] / durations[index - 1]
+                    rates[index] = lengths[index] / durations[index]
                 else:
-                    self._log(u" nsi found is not suitable")
-            else:
-                self._log(u" no nsi found")
-        else:
-            self._log(u" cannot expand into previous")
+                    self.log(u" Cannot move begin time back (slack <= 0)")
+
+            # if aggressive and not completely fixed, try moving end time forward
+            if (aggressive) and (not fixed) and (index < len(self.text_file) - 1):
+                self.log(u" Trying to move end time forward...")
+                lacking = lengths[index] / max_rate - durations[index]
+                self.log([u" Overflow current fragment: %.3f", lacking])
+                slack = durations[index + 1] - lengths[index + 1] / max_rate
+                self.log([u" Slack next fragment: %.3f", slack])
+                if slack >= lacking:
+                    self.log([u" Moving end time: %.3f => %.3f", times[index + 1], times[index + 1] + lacking])
+                    self.log(u" Complete fix (slack >= lacking)")
+                    times[index + 1] += lacking
+                    durations[index] += lacking
+                    durations[index + 1] -= lacking
+                    rates[index] = lengths[index] / durations[index]
+                    rates[index + 1] = lengths[index + 1] / durations[index + 1]
+                    fixed = True
+                elif slack > 0:
+                    self.log([u" Moving end time: %.3f => %.3f", times[index + 1], times[index + 1] + slack])
+                    self.log(u" Partial fix (slack < lacking but slack > 0)")
+                    times[index + 1] += slack
+                    durations[index] += slack
+                    durations[index + 1] -= slack
+                    rates[index] = lengths[index] / durations[index]
+                    rates[index + 1] = lengths[index + 1] / durations[index + 1]
+                else:
+                    self.log(u" Cannot move end time forward (slack <= 0)")

-        if succeeded:
-            self._log(u" succeeded: returning")
-            return
+            # if not completely fixed, log warning
+            if not fixed:
+                self.log_warn([u"Fragment %d is faster and could not be fixed", index])

-        # recompute current fragment
-        current_rate = self._compute_rate(index)
-        current_slack = self._compute_slack(index)
-        current_rate = self._compute_rate(index)
-
-        # try expanding into the next fragment
-        new_start = current_start
-        new_end = current_end
-        if (next_slack is not None) and (next_slack > 0):
-            self._log(u" can expand into next")
-            nsi = self._find_interval_containing(self.nonspeech, current[1])
-            previous = self.text_map[index - 1]
-            if nsi is not None:
-                if nsi[0] > previous[0]:
-                    self._log([u" found suitable nsi: %.3f %.3f", nsi[0], nsi[1]])
-                    next_slack = min(nsi[1] - current[1], next_slack)
-                    self._log([u" next slack after min: %.3f", next_slack])
-                    if next_slack + current_slack >= 0:
-                        self._log(u" enough slack to completely fix")
-                        steal_from_next = -current_slack
-                        succeeded = True
-                    else:
-                        self._log(u" not enough slack to completely fix")
-                        steal_from_next = next_slack
-                    new_end = current_end + steal_from_next
-                    self.text_map[index][1] = new_end
-                    self.text_map[index + 1][0] = new_end
-                    new_rate = self._compute_rate(index)
-                    self._log([u" old: %.3f %.3f => %.3f", current_start, current_end, current_rate])
-                    self._log([u" new: %.3f %.3f => %.3f", new_start, new_end, new_rate])
-                else:
-                    self._log(u" nsi found is not suitable")
-            else:
-                self._log(u" no nsi found")
-        else:
-            self._log(u" cannot expand into next")
+        # create intervals and return
+        self._times_to_intervals(times)

-        if succeeded:
-            self._log(u" succeeded: returning")
-            return
+    def _times_to_intervals(self, times):
+        """
+        Transform a list of time values into a list of intervals.

-        self._log(u" not succeeded, returning")
+        For example: [0,1,2,3,4] => [[0,1], [1,2], [2,3], [3,4]]

-    def _rateaggressive_fix_fragment(self, index):
+        :param times: the time values
+        :type times: list of :class:`~aeneas.timevalue.TimeValue`
         """
-        Fix index-th fragment using the rate algorithm (aggressive variant).
+        self.log(u"Converting times to intervals...")
+        intervals = [[times[i], times[i+1]] for i in range(len(times) - 1)]
+        self.log(u"Converting times to intervals... done")
+        self.log(u"Adding head and tail...")
+        self.intervals = [[TimeValue("0.000"), intervals[0][0]]] + intervals + [[intervals[-1][1], self.real_wave_mfcc.audio_length]]
+        self.log(u"Adding head and tail... done")
+
+    def _apply_offset(self, offset):
         """
-        current = self.text_map[index]
-        current_start = current[0]
-        current_end = current[1]
-        current_rate = self._compute_rate(index)
-        previous_slack = self._compute_slack(index - 1)
-        current_slack = self._compute_slack(index)
-        next_slack = self._compute_slack(index + 1)
-        if previous_slack is not None:
-            self._log([u" previous slack: %.3f", previous_slack])
-        if current_slack is not None:
-            self._log([u" current slack: %.3f", current_slack])
-        if next_slack is not None:
-            self._log([u" next slack: %.3f", next_slack])
-        steal_from_previous = 0
-        steal_from_next = 0
-        if (
-            (previous_slack is not None) and
-            (next_slack is not None) and
-            (previous_slack > 0) and
-            (next_slack > 0)
-        ):
-            self._log(u" can expand into both previous and next")
-            total_slack = previous_slack + next_slack
-            self._log([u" total slack: %.3f", total_slack])
-            if total_slack + current_slack >= 0:
-                self._log(u" enough total slack to completely fix")
-                # partition the needed slack proportionally
-                previous_percentage = previous_slack / total_slack
-                self._log([u" previous percentage: %.3f", previous_percentage])
-                steal_from_previous = -current_slack * previous_percentage
-                steal_from_next = -current_slack - steal_from_previous
-            else:
-                self._log(u" not enough total slack to completely fix")
-                # consume all the available slack
-                steal_from_previous = previous_slack
-                steal_from_next = next_slack
-        elif (previous_slack is not None) and (previous_slack > 0):
-            self._log(u" can expand into previous only")
-            if previous_slack + current_slack >= 0:
-                self._log(u" enough previous slack to completely fix")
-                steal_from_previous = -current_slack
-            else:
-                self._log(u" not enough previous slack to completely fix")
-                steal_from_previous = previous_slack
-        elif (next_slack is not None) and (next_slack > 0):
-            self._log(u" can expand into next only")
-            if next_slack + current_slack >= 0:
-                self._log(u" enough next slack to completely fix")
-                steal_from_next = -current_slack
+        Apply the given offset (negative, zero, or positive)
+        to all times.
+
+        :param offset: the offset, in seconds
+        :type offset: :class:`~aeneas.timevalue.TimeValue`
+        """
+        times = (self.boundary_indices * self.rconf.mws) + offset
+        if numpy.min(times) < TimeValue("0.000"):
+            self.log_warn(u"After applying offset some boundary times are negative")
+        if numpy.max(times) > self.real_wave_mfcc.audio_length:
+            self.log_warn(u"After applying offset some boundary times are beyond audio file duration")
+        times = numpy.clip(times, TimeValue("0.000"), self.real_wave_mfcc.audio_length)
+        self._times_to_intervals(times)
+
+    def _adjust_on_nonspeech(self, adjust_function):
+        """
+        Apply the adjust function to each boundary point
+        falling inside (extrema included) of a nonspeech interval.
+
+        The adjust function is not applied to a boundary index
+        if there are two or more boundary indices falling
+        inside the same nonspeech interval.
+
+        The adjust function is not applied to the last boundary index
+        to avoid anticipating the end of the audio file.
+
+        The adjust function takes three arguments: the begin and end
+        indices of the nonspeech interval, and the current boundary index.
+ """ + self.log(u"Called _adjust_on_nonspeech") + mws = self.rconf.mws + nonspeech_intervals = self.real_wave_mfcc.intervals(speech=False, time=False) + # + # first iteration + # nonspeech_counter[i] is the number of boundary indices + # falling in the i-th nonspeech interval + # + self.log(u" First iteration...") + nonspeech_counter = numpy.zeros(len(nonspeech_intervals), dtype=int) + i = 0 # index of current boundary_index + j = 0 # index of current nonspeech_interval + while i < len(self.boundary_indices): + # current boundary index + cbi = self.boundary_indices[i] + # current nonspeech interval + # with the property that it ends at an index >= cbi - 1 + while (j < len(nonspeech_intervals)) and (nonspeech_intervals[j][1] < cbi - 1): + j += 1 + if j >= len(nonspeech_intervals): + break + cni = nonspeech_intervals[j] + self.log([u"FI Current boundary index: %d %.3f", cbi, cbi * mws]) + self.log([u"FI Current nonspeech interval: %d %d", cni[0], cni[1]]) + if (cbi - 1 >= cni[0]) and (cbi - 1 <= cni[1]): + self.log(u"FI Current boundary index is inside nonspeech") + nonspeech_counter[j] += 1 + i += 1 + self.log(u" First iteration... done") + # + # second iteration + # we adjust the time value only for those boundary indices that + # 1. fall within a nonspeech interval and, + # 2. 
each is the only boundary index falling in that nonspeech interval + # all the other boundary indices are returned unchanged + # + self.log(u" Second iteration...") + times = numpy.zeros(len(self.boundary_indices), dtype=TimeValue) + i = 0 + j = 0 + while i < len(self.boundary_indices): + # current boundary index + cbi = self.boundary_indices[i] + # current nonspeech interval + # with the property that it ends at an index >= cbi - 1 + while (j < len(nonspeech_intervals)) and (nonspeech_intervals[j][1] < cbi - 1): + j += 1 + if j >= len(nonspeech_intervals): + break + cni = nonspeech_intervals[j] + self.log([u"SI Current boundary index: %d %.3f", cbi, cbi * mws]) + self.log([u"SI Current nonspeech interval: %d %d", cni[0], cni[1]]) + if ( + (cbi - 1 >= cni[0]) and + (cbi - 1 <= cni[1]) and + (nonspeech_counter[j] == 1) and (i < len(self.boundary_indices) - 1) + ): + # falling inside and unique and not last => adjust + times[i] = adjust_function(cni[0], cni[1], cbi) + self.log([u"SI Adjusted cbi %d : %.3f => %.3f", cbi, cbi * mws, times[i]]) else: - self._log(u" not enough next slack to completely fix") - steal_from_next = next_slack - else: - self._log([u" fragment %d cannot be fixed", index]) - - self._log([u" steal from previous: %.3f", steal_from_previous]) - self._log([u" steal from next: %.3f", steal_from_next]) - new_start = current_start - steal_from_previous - new_end = current_end + steal_from_next - if index - 1 >= 0: - self.text_map[index - 1][1] = new_start - self.text_map[index][0] = new_start - self.text_map[index][1] = new_end - if index + 1 < len(self.text_map): - self.text_map[index + 1][0] = new_end - new_rate = self._compute_rate(index) - self._log([u" old: %.3f %.3f => %.3f", current_start, current_end, current_rate]) - self._log([u" new: %.3f %.3f => %.3f", new_start, new_end, new_rate]) + # not falling inside or not unique or last => do not adjust + times[i] = cbi * mws + self.log([u"SI Not adjusted cbi %d : %.3f => %.3f", cbi, times[i], 
times[i]]) + i += 1 + while i < len(self.boundary_indices): + # complete with remaining indices + cbi = self.boundary_indices[i] + times[i] = cbi * mws + self.log([u"Not adjusting %d %.3f", cbi, times[i]]) + i += 1 + self.log(u" Second iteration... done") + self._times_to_intervals(times) diff --git a/aeneas/analyzecontainer.py b/aeneas/analyzecontainer.py index 2bfea32f..9fcf6868 100644 --- a/aeneas/analyzecontainer.py +++ b/aeneas/analyzecontainer.py @@ -2,7 +2,13 @@ # coding=utf-8 """ -Analyze a given container and build the corresponding job. +This module contains the following classes: + +* :class:`~aeneas.analyzecontainer.AnalyzeContainer` + implementing functions to analyze a given container + and build the corresponding job object. + +.. warning:: This module might be refactored in a future version """ from __future__ import absolute_import @@ -13,7 +19,7 @@ from aeneas.container import Container from aeneas.hierarchytype import HierarchyType from aeneas.job import Job -from aeneas.logger import Logger +from aeneas.logger import Loggable from aeneas.runtimeconfiguration import RuntimeConfiguration from aeneas.task import Task import aeneas.globalconstants as gc @@ -26,40 +32,33 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -class AnalyzeContainer(object): +class AnalyzeContainer(Loggable): """ Analyze a given container and build the corresponding job. :param container: the container to be analyzed - :type container: :class:`aeneas.container.Container` - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. 
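The two iterations of `_adjust_on_nonspeech` can be read as "count, then adjust only the unique ones". The following is a simplified, hypothetical sketch of that logic, not the aeneas code. It assumes nonspeech intervals are sorted, non-overlapping `(begin, end)` frame-index pairs with both extrema included. Like the diff, it tests a boundary index `b` via `b - 1`. For readability it replaces the original two-pointer scan with a linear search per boundary. The `midpoint` adjust function is an illustrative assumption.

```python
def adjust_on_nonspeech(boundary_indices, nonspeech_intervals, mws, adjust_function):
    """
    Return one time (in seconds) per boundary index.

    Only boundaries that (1) fall inside a nonspeech interval,
    (2) are the only boundary in that interval, and (3) are not
    the last boundary are passed to ``adjust_function``.
    """
    def containing_interval(b):
        # index of the nonspeech interval containing frame b - 1, or None
        for j, (begin, end) in enumerate(nonspeech_intervals):
            if begin <= b - 1 <= end:
                return j
        return None

    # first pass: count how many boundaries fall in each interval
    located = [containing_interval(b) for b in boundary_indices]
    counter = [0] * len(nonspeech_intervals)
    for j in located:
        if j is not None:
            counter[j] += 1

    # second pass: adjust unique, non-last boundaries; keep the others
    last = len(boundary_indices) - 1
    times = []
    for i, (b, j) in enumerate(zip(boundary_indices, located)):
        if j is not None and counter[j] == 1 and i < last:
            begin, end = nonspeech_intervals[j]
            times.append(adjust_function(begin, end, b))
        else:
            times.append(b * mws)
    return times


# hypothetical adjust function: move the boundary to the middle of the pause
mws = 0.040
midpoint = lambda begin, end, b: (begin + end) / 2.0 * mws
print(adjust_on_nonspeech([0, 26, 52, 61, 63, 100], [(20, 30), (55, 65)], mws, midpoint))
```

In this example, boundary `26` is moved to the middle of interval `(20, 30)`. Boundaries `61` and `63` share interval `(55, 65)`, so both keep their original times. The two-pointer version in the diff avoids the quadratic search, but the selection rule is the same.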
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` + :type container: :class:`~aeneas.container.Container` + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` - - :raise TypeError: if ``container`` is ``None`` or not an instance of ``Container`` + :type logger: :class:`~aeneas.logger.Logger` + :raises: TypeError: if ``container`` is ``None`` or not an instance of :class:`~aeneas.container.Container` """ TAG = u"AnalyzeContainer" def __init__(self, container, rconf=None, logger=None): if container is None: - raise TypeError("container is None") + raise TypeError(u"container is None") if not isinstance(container, Container): - raise TypeError("container is not an instance of Container") - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() + raise TypeError(u"container is not an instance of Container") + super(AnalyzeContainer, self).__init__(rconf=rconf, logger=logger) self.container = container - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) - def analyze(self, config_string=None): """ Analyze the given container and @@ -67,25 +66,23 @@ def analyze(self, config_string=None): On error, it will return ``None``. 
- :param config_string: the configuration string generated by wizard - :type config_string: string - :rtype: :class:`aeneas.job.Job` + :param string config_string: the configuration string generated by wizard + :rtype: :class:`~aeneas.job.Job` or ``None`` """ try: if config_string is not None: - self._log(u"Analyzing container with the given config string") + self.log(u"Analyzing container with the given config string") return self._analyze_txt_config(config_string=config_string) elif self.container.has_config_xml: - self._log(u"Analyzing container with XML config file") + self.log(u"Analyzing container with XML config file") return self._analyze_xml_config(config_contents=None) elif self.container.has_config_txt: - self._log(u"Analyzing container with TXT config file") + self.log(u"Analyzing container with TXT config file") return self._analyze_txt_config(config_string=None) else: - self._log(u"No configuration file in this container, returning None") + self.log(u"No configuration file in this container, returning None") except (OSError, KeyError, TypeError) as exc: - self._log(u"Error in analyze", Logger.CRITICAL) - self._log([u"Message: %s", exc], Logger.CRITICAL) + self.log_exc(u"An unexpected error occurred while analyzing", exc, True, None) return None def _analyze_txt_config(self, config_string=None): @@ -95,96 +92,95 @@ def _analyze_txt_config(self, config_string=None): If ``config_string`` is ``None``, try reading it from the TXT config file inside the container. 
- :param config_string: the configuration string - :type config_string: string - :rtype: :class:`aeneas.job.Job` + :param string config_string: the configuration string + :rtype: :class:`~aeneas.job.Job` """ - self._log(u"Analyzing container with TXT config string") + self.log(u"Analyzing container with TXT config string") if config_string is None: - self._log(u"Analyzing container with TXT config file") + self.log(u"Analyzing container with TXT config file") config_entry = self.container.entry_config_txt - self._log([u"Found TXT config entry '%s'", config_entry]) + self.log([u"Found TXT config entry '%s'", config_entry]) config_dir = os.path.dirname(config_entry) - self._log([u"Directory of TXT config entry: '%s'", config_dir]) - self._log([u"Reading TXT config entry: '%s'", config_entry]) + self.log([u"Directory of TXT config entry: '%s'", config_dir]) + self.log([u"Reading TXT config entry: '%s'", config_entry]) config_contents = self.container.read_entry(config_entry) - self._log(u"Converting config contents to config string") + self.log(u"Converting config contents to config string") config_contents = gf.safe_unicode(config_contents) config_string = gf.config_txt_to_string(config_contents) else: - self._log([u"Analyzing container with TXT config string '%s'", config_string]) + self.log([u"Analyzing container with TXT config string '%s'", config_string]) config_dir = "" - self._log(u"Creating the Job object") + self.log(u"Creating the Job object") job = Job(config_string) - self._log(u"Getting entries") - entries = self.container.entries() + self.log(u"Getting entries") + entries = self.container.entries - self._log(u"Converting config string into config dict") + self.log(u"Converting config string into config dict") parameters = gf.config_string_to_dict(config_string) - self._log(u"Calculating the path of the tasks root directory") + self.log(u"Calculating the path of the tasks root directory") tasks_root_directory = gf.norm_join( config_dir, 
parameters[gc.PPN_JOB_IS_HIERARCHY_PREFIX] ) - self._log([u"Path of the tasks root directory: '%s'", tasks_root_directory]) + self.log([u"Path of the tasks root directory: '%s'", tasks_root_directory]) - self._log(u"Calculating the path of the sync map root directory") + self.log(u"Calculating the path of the sync map root directory") sync_map_root_directory = gf.norm_join( config_dir, parameters[gc.PPN_JOB_OS_HIERARCHY_PREFIX] ) job_os_hierarchy_type = parameters[gc.PPN_JOB_OS_HIERARCHY_TYPE] - self._log([u"Path of the sync map root directory: '%s'", sync_map_root_directory]) + self.log([u"Path of the sync map root directory: '%s'", sync_map_root_directory]) text_file_relative_path = parameters[gc.PPN_JOB_IS_TEXT_FILE_RELATIVE_PATH] - self._log([u"Relative path for text file: '%s'", text_file_relative_path]) + self.log([u"Relative path for text file: '%s'", text_file_relative_path]) text_file_name_regex = re.compile(r"" + parameters[gc.PPN_JOB_IS_TEXT_FILE_NAME_REGEX]) - self._log([u"Regex for text file: '%s'", parameters[gc.PPN_JOB_IS_TEXT_FILE_NAME_REGEX]]) + self.log([u"Regex for text file: '%s'", parameters[gc.PPN_JOB_IS_TEXT_FILE_NAME_REGEX]]) audio_file_relative_path = parameters[gc.PPN_JOB_IS_AUDIO_FILE_RELATIVE_PATH] - self._log([u"Relative path for audio file: '%s'", audio_file_relative_path]) + self.log([u"Relative path for audio file: '%s'", audio_file_relative_path]) audio_file_name_regex = re.compile(r"" + parameters[gc.PPN_JOB_IS_AUDIO_FILE_NAME_REGEX]) - self._log([u"Regex for audio file: '%s'", parameters[gc.PPN_JOB_IS_AUDIO_FILE_NAME_REGEX]]) + self.log([u"Regex for audio file: '%s'", parameters[gc.PPN_JOB_IS_AUDIO_FILE_NAME_REGEX]]) if parameters[gc.PPN_JOB_IS_HIERARCHY_TYPE] == HierarchyType.FLAT: - self._log(u"Looking for text/audio pairs in flat hierarchy") + self.log(u"Looking for text/audio pairs in flat hierarchy") text_files = self._find_files( entries, tasks_root_directory, text_file_relative_path, text_file_name_regex ) - 
self._log([u"Found text files: '%s'", text_files]) + self.log([u"Found text files: '%s'", text_files]) audio_files = self._find_files( entries, tasks_root_directory, audio_file_relative_path, audio_file_name_regex ) - self._log([u"Found audio files: '%s'", audio_files]) + self.log([u"Found audio files: '%s'", audio_files]) - self._log(u"Matching files in flat hierarchy...") + self.log(u"Matching files in flat hierarchy...") matched_tasks = self._match_files_flat_hierarchy( text_files, audio_files ) - self._log(u"Matching files in flat hierarchy... done") + self.log(u"Matching files in flat hierarchy... done") for task_info in matched_tasks: - self._log([u"Creating task: '%s'", str(task_info)]) + self.log([u"Creating task: '%s'", str(task_info)]) task = self._create_task( task_info, config_string, sync_map_root_directory, job_os_hierarchy_type ) - job.append_task(task) + job.add_task(task) if parameters[gc.PPN_JOB_IS_HIERARCHY_TYPE] == HierarchyType.PAGED: - self._log(u"Looking for text/audio pairs in paged hierarchy") + self.log(u"Looking for text/audio pairs in paged hierarchy") # find all subdirectories of tasks_root_directory # that match gc.PPN_JOB_IS_TASK_DIRECTORY_NAME_REGEX matched_directories = self._match_directories( @@ -198,7 +194,7 @@ def _analyze_txt_config(self, config_string=None): tasks_root_directory, matched_directory ) - self._log([u"Looking for text/audio pairs in directory '%s'", matched_directory_full_path]) + self.log([u"Looking for text/audio pairs in directory '%s'", matched_directory_full_path]) # look for text and audio files there text_files = self._find_files( @@ -207,38 +203,38 @@ def _analyze_txt_config(self, config_string=None): text_file_relative_path, text_file_name_regex ) - self._log([u"Found text files: '%s'", text_files]) + self.log([u"Found text files: '%s'", text_files]) audio_files = self._find_files( entries, matched_directory_full_path, audio_file_relative_path, audio_file_name_regex ) - self._log([u"Found audio files: 
'%s'", audio_files]) + self.log([u"Found audio files: '%s'", audio_files]) # if we have found exactly one text and one audio file, # create a Task if (len(text_files) == 1) and (len(audio_files) == 1): - self._log([u"Exactly one text file and one audio file in '%s'", matched_directory]) + self.log([u"Exactly one text file and one audio file in '%s'", matched_directory]) task_info = [ matched_directory, text_files[0], audio_files[0] ] - self._log([u"Creating task: '%s'", str(task_info)]) + self.log([u"Creating task: '%s'", str(task_info)]) task = self._create_task( task_info, config_string, sync_map_root_directory, job_os_hierarchy_type ) - job.append_task(task) + job.add_task(task) elif len(text_files) > 1: - self._log([u"More than one text file in '%s'", matched_directory]) + self.log([u"More than one text file in '%s'", matched_directory]) elif len(audio_files) > 1: - self._log([u"More than one audio file in '%s'", matched_directory]) + self.log([u"More than one audio file in '%s'", matched_directory]) else: - self._log([u"No text nor audio file in '%s'", matched_directory]) + self.log([u"No text nor audio file in '%s'", matched_directory]) return job @@ -249,53 +245,52 @@ def _analyze_xml_config(self, config_contents=None): If ``config_contents`` is ``None``, try reading it from the XML config file inside the container. 
- :param config_contents: the contents of the XML config file - :type config_contents: string - :rtype: :class:`aeneas.job.Job` + :param string config_contents: the contents of the XML config file + :rtype: :class:`~aeneas.job.Job` """ - self._log(u"Analyzing container with XML config string") + self.log(u"Analyzing container with XML config string") if config_contents is None: - self._log(u"Analyzing container with XML config file") + self.log(u"Analyzing container with XML config file") config_entry = self.container.entry_config_xml - self._log([u"Found XML config entry '%s'", config_entry]) + self.log([u"Found XML config entry '%s'", config_entry]) config_dir = os.path.dirname(config_entry) - self._log([u"Directory of XML config entry: '%s'", config_dir]) - self._log([u"Reading XML config entry: '%s'", config_entry]) + self.log([u"Directory of XML config entry: '%s'", config_dir]) + self.log([u"Reading XML config entry: '%s'", config_entry]) config_contents = self.container.read_entry(config_entry) else: - self._log(u"Analyzing container with XML config contents") + self.log(u"Analyzing container with XML config contents") config_dir = "" - self._log(u"Converting config contents into job config dict") + self.log(u"Converting config contents into job config dict") job_parameters = gf.config_xml_to_dict( config_contents, result=None, parse_job=True ) - self._log(u"Converting config contents into tasks config dict") + self.log(u"Converting config contents into tasks config dict") tasks_parameters = gf.config_xml_to_dict( config_contents, result=None, parse_job=False ) - self._log(u"Calculating the path of the sync map root directory") + self.log(u"Calculating the path of the sync map root directory") sync_map_root_directory = gf.norm_join( config_dir, job_parameters[gc.PPN_JOB_OS_HIERARCHY_PREFIX] ) job_os_hierarchy_type = job_parameters[gc.PPN_JOB_OS_HIERARCHY_TYPE] - self._log([u"Path of the sync map root directory: '%s'", sync_map_root_directory]) + 
self.log([u"Path of the sync map root directory: '%s'", sync_map_root_directory]) - self._log(u"Converting job config dict into job config string") + self.log(u"Converting job config dict into job config string") config_string = gf.config_dict_to_string(job_parameters) job = Job(config_string) for task_parameters in tasks_parameters: - self._log(u"Converting task config dict into task config string") + self.log(u"Converting task config dict into task config string") config_string = gf.config_dict_to_string(task_parameters) - self._log([u"Creating task with config string '%s'", config_string]) + self.log([u"Creating task with config string '%s'", config_string]) try: custom_id = task_parameters[gc.PPN_TASK_CUSTOM_ID] except KeyError: @@ -311,14 +306,14 @@ def _analyze_xml_config(self, config_contents=None): task_parameters[gc.PPN_TASK_IS_AUDIO_FILE_XML] ) ] - self._log([u"Creating task: '%s'", str(task_info)]) + self.log([u"Creating task: '%s'", str(task_info)]) task = self._create_task( task_info, config_string, sync_map_root_directory, job_os_hierarchy_type ) - job.append_task(task) + job.add_task(task) return job @@ -335,67 +330,64 @@ def _create_task( 1. the ``task_info`` found analyzing the container entries, and 2. the given ``config_string``. 
- :param task_info: the task information: ``[prefix, text_path, audio_path]`` - :type task_info: list of strings - :param config_string: the configuration string - :type config_string: string - :param sync_map_root_directory: the root directory for the sync map files - :type sync_map_root_directory: string (path) + :param list task_info: the task information: ``[prefix, text_path, audio_path]`` + :param string config_string: the configuration string + :param string sync_map_root_directory: the root directory for the sync map files :param job_os_hierarchy_type: type of job output hierarchy - :type job_os_hierarchy_type: :class:`aeneas.hierarchytype.HierarchyType` - :rtype: :class:`aeneas.task.Task` + :type job_os_hierarchy_type: :class:`~aeneas.hierarchytype.HierarchyType` + :rtype: :class:`~aeneas.task.Task` """ - self._log(u"Converting config string to config dict") + self.log(u"Converting config string to config dict") parameters = gf.config_string_to_dict(config_string) - self._log(u"Creating task") + self.log(u"Creating task") task = Task(config_string, logger=self.logger) task.configuration["description"] = "Task %s" % task_info[0] - self._log([u"Task description: %s", task.configuration["description"]]) + self.log([u"Task description: %s", task.configuration["description"]]) try: task.configuration["language"] = parameters[gc.PPN_TASK_LANGUAGE] - self._log([u"Set language from task: '%s'", task.configuration["language"]]) + self.log([u"Set language from task: '%s'", task.configuration["language"]]) except KeyError: task.configuration["language"] = parameters[gc.PPN_JOB_LANGUAGE] - self._log([u"Set language from job: '%s'", task.configuration["language"]]) + self.log([u"Set language from job: '%s'", task.configuration["language"]]) custom_id = task_info[0] task.configuration["custom_id"] = custom_id - self._log([u"Task custom_id: %s", task.configuration["custom_id"]]) + self.log([u"Task custom_id: %s", task.configuration["custom_id"]]) task.text_file_path = 
task_info[1] - self._log([u"Task text file path: %s", task.text_file_path]) + self.log([u"Task text file path: %s", task.text_file_path]) task.audio_file_path = task_info[2] - self._log([u"Task audio file path: %s", task.audio_file_path]) + self.log([u"Task audio file path: %s", task.audio_file_path]) task.sync_map_file_path = self._compute_sync_map_file_path( sync_map_root_directory, job_os_hierarchy_type, custom_id, task.configuration["o_name"] ) - self._log([u"Task sync map file path: %s", task.sync_map_file_path]) + self.log([u"Task sync map file path: %s", task.sync_map_file_path]) - self._log(u"Replacing placeholder in os_file_smil_audio_ref") + self.log(u"Replacing placeholder in os_file_smil_audio_ref") task.configuration["o_smil_audio_ref"] = self._replace_placeholder( task.configuration["o_smil_audio_ref"], custom_id ) - self._log(u"Replacing placeholder in os_file_smil_page_ref") + self.log(u"Replacing placeholder in os_file_smil_page_ref") task.configuration["o_smil_page_ref"] = self._replace_placeholder( task.configuration["o_smil_page_ref"], custom_id ) - self._log(u"Returning task") + self.log(u"Returning task") return task def _replace_placeholder(self, string, custom_id): """ Replace the prefix placeholder - :class:`aeneas.globalconstants.PPV_OS_TASK_PREFIX` + :class:`~aeneas.globalconstants.PPV_OS_TASK_PREFIX` with ``custom_id`` and return the resulting string. :rtype: string """ if string is None: return None - self._log([u"Replacing '%s' with '%s' in '%s'", gc.PPV_OS_TASK_PREFIX, custom_id, string]) + self.log([u"Replacing '%s' with '%s' in '%s'", gc.PPV_OS_TASK_PREFIX, custom_id, string]) return string.replace(gc.PPV_OS_TASK_PREFIX, custom_id) def _compute_sync_map_file_path( @@ -408,16 +400,13 @@ def _compute_sync_map_file_path( """ Compute the sync map file path inside the output container. 
- :param root: the root of the sync map files inside the container - :type root: string (path) + :param string root: the root of the sync map files inside the container :param job_os_hierarchy_type: type of job output hierarchy - :type job_os_hierarchy_type: :class:`aeneas.hierarchytype.HierarchyType` - :param custom_id: the task custom id (flat) or - page directory name (paged) - :type custom_id: string - :param file_name: the output file name for the sync map - :type file_name: string - :rtype: string (path) + :type job_os_hierarchy_type: :class:`~aeneas.hierarchytype.HierarchyType` + :param string custom_id: the task custom id (flat) or + page directory name (paged) + :param string file_name: the output file name for the sync map + :rtype: string """ prefix = root if hierarchy_type == HierarchyType.PAGED: @@ -432,34 +421,30 @@ def _find_files(self, entries, root, relative_path, file_name_regex): 1. are in ``root/relative_path``, and 2. match ``file_name_regex``. - :param entries: the list of entries (file paths) in the container - :type entries: list of strings (path) - :param root: the root directory of the container - :type root: string (path) - :param relative_path: the relative path in which we must search - :type relative_path: string (path) - :param file_name_regex: the regex matching the desired file names - :type file_name_regex: regex + :param list entries: the list of entries (file paths) in the container + :param string root: the root directory of the container + :param string relative_path: the relative path in which we must search + :param regex file_name_regex: the regex matching the desired file names :rtype: list of strings (path) """ - self._log([u"Finding files within root: '%s'", root]) + self.log([u"Finding files within root: '%s'", root]) target = root if relative_path is not None: - self._log([u"Joining relative path: '%s'", relative_path]) + self.log([u"Joining relative path: '%s'", relative_path]) target = gf.norm_join(root, 
relative_path) - self._log([u"Finding files within target: '%s'", target]) + self.log([u"Finding files within target: '%s'", target]) files = [] target_len = len(target) for entry in entries: if entry.startswith(target): - self._log([u"Examining entry: '%s'", entry]) + self.log([u"Examining entry: '%s'", entry]) entry_suffix = entry[target_len + 1:] - self._log([u"Examining entry suffix: '%s'", entry_suffix]) + self.log([u"Examining entry suffix: '%s'", entry_suffix]) if re.search(file_name_regex, entry_suffix) is not None: - self._log([u"Match: '%s'", entry]) + self.log([u"Match: '%s'", entry]) files.append(entry) else: - self._log([u"No match: '%s'", entry]) + self.log([u"No match: '%s'", entry]) return sorted(files) def _match_files_flat_hierarchy(self, text_files, audio_files): @@ -477,32 +462,30 @@ def _match_files_flat_hierarchy(self, text_files, audio_files): foo/res/c.txt foo/res/c.mp3 => match: ["c", "foo/res/c.txt", "foo/res/c.mp3"] foo/res/d.txt foo/res/e.mp3 => no match - :param text_files: the entries corresponding to text files - :type text_files: list of strings (path) - :param audio_files: the entries corresponding to audio files - :type audio_files: list of strings (path) + :param list text_files: the entries corresponding to text files + :param list audio_files: the entries corresponding to audio files :rtype: list of lists (see above) """ - self._log(u"Matching files in flat hierarchy") - self._log([u"Text files: '%s'", text_files]) - self._log([u"Audio files: '%s'", audio_files]) + self.log(u"Matching files in flat hierarchy") + self.log([u"Text files: '%s'", text_files]) + self.log([u"Audio files: '%s'", audio_files]) d_text = {} d_audio = {} for text_file in text_files: text_file_no_ext = gf.file_name_without_extension(text_file) d_text[text_file_no_ext] = text_file - self._log([u"Added text file '%s' to key '%s'", text_file, text_file_no_ext]) + self.log([u"Added text file '%s' to key '%s'", text_file, text_file_no_ext]) for audio_file in 
audio_files: audio_file_no_ext = gf.file_name_without_extension(audio_file) d_audio[audio_file_no_ext] = audio_file - self._log([u"Added audio file '%s' to key '%s'", audio_file, audio_file_no_ext]) + self.log([u"Added audio file '%s' to key '%s'", audio_file, audio_file_no_ext]) tasks = [] for key in d_text.keys(): - self._log([u"Examining text key '%s'", key]) + self.log([u"Examining text key '%s'", key]) if key in d_audio: - self._log([u"Key '%s' is also in audio", key]) + self.log([u"Key '%s' is also in audio", key]) tasks.append([key, d_text[key], d_audio[key]]) - self._log([u"Added pair ('%s', '%s')", d_text[key], d_audio[key]]) + self.log([u"Added pair ('%s', '%s')", d_text[key], d_audio[key]]) return tasks def _match_directories(self, entries, root, regex_string): @@ -525,24 +508,21 @@ def _match_directories(self, entries, root, regex_string): => ["/foo/bar/1", "/foo/bar/2", "/foo/bar/3"] - :param entries: the list of entries (paths) of a container - :type entries: list of strings (paths) - :param root: the root directory to search within - :type root: string (path) - :param regex_string: regex string to match directory names - :type regex_string: string + :param list entries: the list of entries (paths) of a container + :param string root: the root directory to search within + :param string regex_string: regex string to match directory names :rtype: list of matched directories """ - self._log(u"Matching directory names in paged hierarchy") - self._log([u"Matching within '%s'", root]) - self._log([u"Matching regex '%s'", regex_string]) + self.log(u"Matching directory names in paged hierarchy") + self.log([u"Matching within '%s'", root]) + self.log([u"Matching regex '%s'", regex_string]) regex = re.compile(r"" + regex_string) directories = set() root_len = len(root) for entry in entries: # look only inside root dir if entry.startswith(root): - self._log([u"Examining '%s'", entry]) + self.log([u"Examining '%s'", entry]) # remove common prefix root/ entry = 
entry[root_len + 1:] # split path @@ -551,9 +531,9 @@ def _match_directories(self, entries, root, regex_string): if ((len(entry_splitted) >= 2) and (re.match(regex, entry_splitted[0]) is not None)): directories.add(entry_splitted[0]) - self._log([u"Match: '%s'", entry_splitted[0]]) + self.log([u"Match: '%s'", entry_splitted[0]]) else: - self._log([u"No match: '%s'", entry]) + self.log([u"No match: '%s'", entry]) return sorted(directories) diff --git a/aeneas/audiofile.py b/aeneas/audiofile.py index 3c4f8dfd..25638fcc 100644 --- a/aeneas/audiofile.py +++ b/aeneas/audiofile.py @@ -2,7 +2,14 @@ # coding=utf-8 """ -A class representing an audio file. +This module contains the following classes: + +* :class:`~aeneas.audiofile.AudioFile`, representing an audio file; +* :class:`~aeneas.audiofile.AudioFileConverterError`, +* :class:`~aeneas.audiofile.AudioFileNotInitializedError`, +* :class:`~aeneas.audiofile.AudioFileProbeError`, and +* :class:`~aeneas.audiofile.AudioFileUnsupportedFormatError`, + representing errors generated by audio files. 
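`_match_files_flat_hierarchy` and `_match_directories` are pure functions of the container entry list, so their logic can be sketched without a `Container`. Both sketches below are hypothetical. The first assumes that `gf.file_name_without_extension` returns the base name without its extension, as the `c.txt`/`c.mp3` example in the docstring suggests. The second assumes entries use `/` as the path separator. Following the code in the diff, it returns the matched directory names rather than full paths.

```python
import os
import re


def match_files_flat_hierarchy(text_files, audio_files):
    """Pair text and audio entries whose base names (without extension) agree."""
    def key(path):
        # "foo/res/c.txt" -> "c"
        return os.path.splitext(os.path.basename(path))[0]
    d_text = dict((key(p), p) for p in text_files)
    d_audio = dict((key(p), p) for p in audio_files)
    # sorted keys give a deterministic task order
    return [[k, d_text[k], d_audio[k]] for k in sorted(d_text) if k in d_audio]


def match_directories(entries, root, regex_string):
    """Return the sorted names of the subdirectories of root matching regex_string."""
    regex = re.compile(regex_string)
    directories = set()
    for entry in entries:
        if entry.startswith(root):
            # drop the "root/" prefix, then inspect the first path component
            parts = entry[len(root) + 1:].split("/")
            # require at least one more component, i.e. a file inside the directory
            if len(parts) >= 2 and regex.match(parts[0]) is not None:
                directories.add(parts[0])
    return sorted(directories)
```

For example, `match_directories(["OEBPS/Resources/1/a.txt", "OEBPS/Resources/x/b.txt"], "OEBPS/Resources", r"[0-9]+")` keeps only `"1"`. A plain `startswith(root)` check also accepts sibling prefixes such as `OEBPS/ResourcesX`, which mirrors the prefix test used in the diff.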
""" from __future__ import absolute_import @@ -14,9 +21,11 @@ from aeneas.ffprobewrapper import FFPROBEPathError from aeneas.ffprobewrapper import FFPROBEUnsupportedFormatError from aeneas.ffprobewrapper import FFPROBEWrapper -from aeneas.logger import Logger -from aeneas.mfcc import MFCC +from aeneas.ffmpegwrapper import FFMPEGPathError +from aeneas.ffmpegwrapper import FFMPEGWrapper +from aeneas.logger import Loggable from aeneas.runtimeconfiguration import RuntimeConfiguration +from aeneas.timevalue import TimeValue from aeneas.wavfile import read as scipywavread from aeneas.wavfile import write as scipywavwrite import aeneas.globalfunctions as gf @@ -28,13 +37,31 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" +class AudioFileConverterError(Exception): + """ + Error raised when the audio converter executable cannot be executed. + """ + pass + + + +class AudioFileNotInitializedError(Exception): + """ + Error raised when trying to access audio samples from + an :class:`~aeneas.audiofile.AudioFile` object which + has not been initialized yet. + """ + pass + + + class AudioFileProbeError(Exception): """ - Error raised when the probe executable cannot be executed. + Error raised when the audio probe executable cannot be executed. """ pass @@ -48,39 +75,61 @@ class AudioFileUnsupportedFormatError(Exception): -class AudioFile(object): +class AudioFile(Loggable): """ A class representing an audio file. + This class can be used either to extract properties + from an audio file on disk, + or to load/edit/save a monaural (single channel) audio file, + represented as an array of audio samples. + The properties of the audio file (length, format, etc.)
- are set by invoking the ``read_properties()`` function, + can be set by invoking the :func:`~aeneas.audiofile.AudioFile.read_properties` function, which calls an audio file probe. - (Currently, the probe is :class:`aeneas.ffprobewrapper.FFPROBEWrapper`) - - :param file_path: the path of the audio file - :type file_path: Unicode string (path) - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. - :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` + (Currently, the probe is :class:`~aeneas.ffprobewrapper.FFPROBEWrapper`) + + Moreover, this class can read the audio data, + by converting the original file format + into a temporary PCM16 Mono WAVE (RIFF) file, + which is deleted as soon as audio data is read into memory. + (Currently, the converter is :class:`~aeneas.ffmpegwrapper.FFMPEGWrapper`) + + The internal representation of the wave is + a NumPy 1D array of ``float64`` values in ``[-1.0, 1.0]``. + It supports append, reverse, and trim operations. + Audio samples can be written to file. + Memory can be pre-allocated to speed up append operations. + Allocated memory is doubled when an append operation + requires more memory than what is available; + this leads to an amortized linear complexity + (in the number of audio samples) + for append operations. + ..
note:: Support for stereo WAVE files might be implemented in a future version + + :param string file_path: the path of the audio file + :param bool is_mono_wave: set to ``True`` if the audio file is a PCM16 mono WAVE file + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` + :type logger: :class:`~aeneas.logger.Logger` """ TAG = u"AudioFile" - def __init__(self, file_path=None, rconf=None, logger=None): - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() + def __init__(self, file_path=None, is_mono_wave=False, rconf=None, logger=None): + super(AudioFile, self).__init__(rconf=rconf, logger=logger) self.file_path = file_path self.file_size = None + self.is_mono_wave = is_mono_wave self.audio_length = None self.audio_format = None self.audio_sample_rate = None self.audio_channels = None - - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) + self.__samples_capacity = 0 + self.__samples_length = 0 + self.__samples = None def __unicode__(self): msg = [ @@ -90,6 +139,8 @@ def __unicode__(self): u"Audio format: %s" % self.audio_format, u"Audio sample rate: %s" % gf.safe_int(self.audio_sample_rate), u"Audio channels: %s" % gf.safe_int(self.audio_channels), + u"Samples capacity: %s" % gf.safe_int(self.__samples_capacity), + u"Samples length: %s" % gf.safe_int(self.__samples_length), ] return u"\n".join(msg) @@ -101,7 +152,7 @@ def file_path(self): """ The path of the audio file. - :rtype: Unicode string + :rtype: string """ return self.__file_path @file_path.setter @@ -125,7 +176,7 @@ def audio_length(self): """ The length of the audio file, in seconds. - :rtype: float + :rtype: :class:`~aeneas.timevalue.TimeValue` """ return self.__audio_length @audio_length.setter @@ -137,7 +188,7 @@ def audio_format(self): """ The format of the audio file. 
- :rtype: Unicode string + :rtype: string """ return self.__audio_format @audio_format.setter @@ -168,225 +219,245 @@ def audio_channels(self): def audio_channels(self, audio_channels): self.__audio_channels = audio_channels + @property + def audio_samples(self): + """ + The audio samples, that is, an array of ``float64`` values, + each representing an audio sample in ``[-1.0, 1.0]``. + + Note that this function returns a view into the + first ``self.__samples_length`` elements of ``self.__samples``. + If you want to clone the values, + you must use e.g. ``numpy.array(audiofile.audio_samples)``. + + :rtype: :class:`numpy.ndarray` (1D, view) + :raises: :class:`~aeneas.audiofile.AudioFileNotInitializedError`: if the audio file is not initialized yet + """ + if self.__samples is None: + if self.file_path is None: + self.log_exc(u"AudioFile object not initialized", None, True, AudioFileNotInitializedError) + else: + self.read_samples_from_file() + return self.__samples[0:self.__samples_length] + def read_properties(self): """ Populate this object by reading the audio properties of the file at the given path. Currently this function uses - :class:`aeneas.ffprobewrapper.FFPROBEWrapper` + :class:`~aeneas.ffprobewrapper.FFPROBEWrapper` to get the audio file properties.
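The `audio_samples` property hands out a slice of a larger preallocated buffer rather than a copy. A minimal NumPy sketch of that view-versus-copy behaviour follows; the plain `buffer` array and `length` variable are hypothetical stand-ins for the private `__samples` and `__samples_length` attributes.

```python
import numpy

# Hypothetical stand-in for the private buffer: capacity 8, 5 valid samples.
buffer = numpy.zeros(8)
length = 5
buffer[0:length] = [0.1, -0.2, 0.3, -0.4, 0.5]

# A basic slice is a view: it shares memory with the buffer.
samples = buffer[0:length]
assert numpy.shares_memory(samples, buffer)

# Writing through the view modifies the buffer itself.
samples[0] = 0.9
assert buffer[0] == 0.9

# numpy.array(...) returns an independent copy.
clone = numpy.array(samples)
clone[1] = 0.0
assert buffer[1] == -0.2
assert not numpy.shares_memory(clone, buffer)
```

Returning a view keeps the property cheap even for long recordings, at the cost that callers who mutate the result also mutate the object's internal state.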
- :raises AudioFileProbeError: if the path to the ``ffprobe`` executable cannot be called - :raises AudioFileUnsupportedFormatError: if the audio file has a format not supported - :raises OSError: if the audio file cannot be read + :raises: :class:`~aeneas.audiofile.AudioFileProbeError`: if the path to the ``ffprobe`` executable cannot be called + :raises: :class:`~aeneas.audiofile.AudioFileUnsupportedFormatError`: if the audio file has a format not supported + :raises: OSError: if the audio file cannot be read """ - - self._log(u"Reading properties...") + self.log(u"Reading properties...") # check the file can be read if not gf.file_can_be_read(self.file_path): - self._log([u"File '%s' cannot be read", self.file_path], Logger.CRITICAL) - raise OSError(u"File '%s' cannot be read" % self.file_path) + self.log_exc(u"File '%s' cannot be read" % (self.file_path), None, True, OSError) # get the file size - self._log([u"Getting file size for '%s'", self.file_path]) + self.log([u"Getting file size for '%s'", self.file_path]) self.file_size = gf.file_size(self.file_path) - self._log([u"File size for '%s' is '%d'", self.file_path, self.file_size]) + self.log([u"File size for '%s' is '%d'", self.file_path, self.file_size]) # get the audio properties using FFPROBEWrapper try: - self._log(u"Reading properties with FFPROBEWrapper...") - properties = FFPROBEWrapper(rconf=self.rconf, logger=self.logger).read_properties(self.file_path) - self._log(u"Reading properties with FFPROBEWrapper... done") + self.log(u"Reading properties with FFPROBEWrapper...") + properties = FFPROBEWrapper( + rconf=self.rconf, + logger=self.logger + ).read_properties(self.file_path) + self.log(u"Reading properties with FFPROBEWrapper... done") except FFPROBEPathError: - self._log(u"Reading properties with FFPROBEWrapper... 
failed", Logger.CRITICAL) - self._log(u"Unable to call ffprobe executable", Logger.CRITICAL) - raise AudioFileProbeError("Unable to call the audio probe executable") - except FFPROBEUnsupportedFormatError: - self._log(u"Reading properties with FFPROBEWrapper... failed", Logger.CRITICAL) - self._log(u"Unsupported audio file format", Logger.CRITICAL) - raise AudioFileUnsupportedFormatError("Unsupported audio file format") - except FFPROBEParsingError: - self._log(u"Reading properties with FFPROBEWrapper... failed", Logger.CRITICAL) - self._log(u"Failed while parsing the ffprobe output", Logger.CRITICAL) - raise AudioFileUnsupportedFormatError("Unsupported audio file format") + self.log_exc(u"Unable to call ffprobe executable", None, True, AudioFileProbeError) + except (FFPROBEUnsupportedFormatError, FFPROBEParsingError): + self.log_exc(u"Audio file format not supported by ffprobe", None, True, AudioFileUnsupportedFormatError) # save relevant properties in results inside the audiofile object - self.audio_length = gf.safe_float(properties[FFPROBEWrapper.STDOUT_DURATION]) + self.audio_length = TimeValue(properties[FFPROBEWrapper.STDOUT_DURATION]) self.audio_format = properties[FFPROBEWrapper.STDOUT_CODEC_NAME] self.audio_sample_rate = gf.safe_int(properties[FFPROBEWrapper.STDOUT_SAMPLE_RATE]) self.audio_channels = gf.safe_int(properties[FFPROBEWrapper.STDOUT_CHANNELS]) - self._log([u"Stored audio_length: '%s'", self.audio_length]) - self._log([u"Stored audio_format: '%s'", self.audio_format]) - self._log([u"Stored audio_sample_rate: '%s'", self.audio_sample_rate]) - self._log([u"Stored audio_channels: '%s'", self.audio_channels]) - self._log(u"Reading properties... done") - - - -class AudioFileMonoWAVE(AudioFile): - """ - A monoaural (single-channel) WAVE audio file. - - Its data can be read from and write to file, set from a ``numpy`` 1D array. - - It supports append, prepend, reverse, and trim operations. 
- - It can also extract MFCCs and store them internally, - also after the audio data has been discarded. - - NOTE - At the moment, the state of this object might be inconsistent - (e.g., setting a new path after loading audio data will not flush the audio data). - Use this class with care. - - :param file_path: the path of the audio file - :type file_path: Unicode string (path) - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. - :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` - :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` - """ - - TAG = u"AudioFileMonoWAVE" - - def __init__(self, file_path=None, rconf=None, logger=None): - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() - self.audio_data = None - self.audio_mfcc = None - AudioFile.__init__(self, file_path=file_path, rconf=rconf, logger=logger) - - @property - def audio_data(self): - """ - The audio data. - - :rtype: numpy 1D array - """ - return self.__audio_data - @audio_data.setter - def audio_data(self, audio_data): - self.__audio_data = audio_data - - @property - def audio_mfcc(self): - """ - The MFCCs of the audio file. + self.log([u"Stored audio_length: '%s'", self.audio_length]) + self.log([u"Stored audio_format: '%s'", self.audio_format]) + self.log([u"Stored audio_sample_rate: '%s'", self.audio_sample_rate]) + self.log([u"Stored audio_channels: '%s'", self.audio_channels]) + self.log(u"Reading properties... done") - :rtype: numpy 2D array + def read_samples_from_file(self): """ - return self.__audio_mfcc + Load the audio samples from file into memory. - @audio_mfcc.setter - def audio_mfcc(self, audio_mfcc): - self.__audio_mfcc = audio_mfcc + If ``self.is_mono_wave`` is ``False``, + the file will be first converted + to a temporary PCM16 mono WAVE file. + Audio data will be read from this temporary file, + which will be then deleted from disk immediately. 
- def load_data(self): - """ - Load the audio file data. + If ``self.is_mono_wave`` is ``True``, + the audio data will be read directly + from the given file, + which will not be deleted from disk. - :raises AudioFileUnsupportedFormatError: if the audio file is not a mono WAVE file - :raises OSError: if the audio file cannot be read + :raises: :class:`~aeneas.audiofile.AudioFileConverterError`: if the path to the ``ffmpeg`` executable cannot be called + :raises: :class:`~aeneas.audiofile.AudioFileUnsupportedFormatError`: if the audio file has a format not supported + :raises: OSError: if the audio file cannot be read """ - self._log(u"Loading audio data...") + self.log(u"Loading audio data...") # check the file can be read if not gf.file_can_be_read(self.file_path): - self._log([u"File '%s' cannot be read", self.file_path], Logger.CRITICAL) - raise OSError("File '%s' cannot be read" % self.file_path) + self.log_exc(u"File '%s' cannot be read" % (self.file_path), None, True, OSError) + # convert file to PCM16 mono WAVE + if self.is_mono_wave: + self.log(u"is_mono_wave=True => reading self.file_path directly") + tmp_handler = None + tmp_file_path = self.file_path + else: + self.log(u"is_mono_wave=False => converting self.file_path") + tmp_handler, tmp_file_path = gf.tmp_file(suffix=u".wav", root=self.rconf[RuntimeConfiguration.TMP_PATH]) + self.log([u"Temporary PCM16 mono WAVE file: '%s'", tmp_file_path]) + try: + self.log(u"Converting audio file to mono...") + converter = FFMPEGWrapper(rconf=self.rconf, logger=self.logger) + converter.convert(self.file_path, tmp_file_path) + self.log(u"Converting audio file to mono... 
done") + except FFMPEGPathError: + gf.delete_file(tmp_handler, tmp_file_path) + self.log_exc(u"Unable to call ffmpeg executable", None, True, AudioFileConverterError) + except OSError: + gf.delete_file(tmp_handler, tmp_file_path) + self.log_exc(u"Audio file format not supported by ffmpeg", None, True, AudioFileUnsupportedFormatError) + + # TODO allow calling C extension cwave to read samples faster try: self.audio_format = "pcm16" - self.audio_sample_rate, self.audio_data = scipywavread(self.file_path) + self.audio_channels = 1 + self.audio_sample_rate, self.__samples = scipywavread(tmp_file_path) # scipy reads a sample as an int16_t, that is, a number in [-32768, 32767] # so we convert it to a float64 in [-1, 1] - self.audio_data = self.audio_data.astype("float64") / 32768 + self.__samples = self.__samples.astype("float64") / 32768 + self.__samples_capacity = len(self.__samples) + self.__samples_length = self.__samples_capacity + self._update_length() except ValueError: - self._log(u"Unsupported audio file format", Logger.CRITICAL) - raise AudioFileUnsupportedFormatError("Unsupported audio file format") + self.log_exc(u"Audio format not supported by scipywavread", None, True, AudioFileUnsupportedFormatError) + + if not self.is_mono_wave: + gf.delete_file(tmp_handler, tmp_file_path) + self.log([u"Deleted temporary PCM16 mono WAVE file: '%s'", tmp_file_path]) self._update_length() - self._log([u"Sample length: %f", self.audio_length]) - self._log([u"Sample rate: %f", self.audio_sample_rate]) - self._log([u"Audio format: %s", self.audio_format]) - self._log(u"Loading audio data... done") + self.log([u"Sample length: %.3f", self.audio_length]) + self.log([u"Sample rate: %d", self.audio_sample_rate]) + self.log([u"Audio format: %s", self.audio_format]) + self.log([u"Audio channels: %d", self.audio_channels]) + self.log(u"Loading audio data... 
done") - def append_data(self, new_data): + def preallocate_memory(self, capacity): """ - Append the given new data to the current audio data. + Preallocate memory to store audio samples, + to avoid repeated new allocations and copies + while performing several consecutive append operations. - If audio data is not loaded, create an empty data structure - and then append to it. + If ``self.__samples`` is not initialized, + it will become an array of ``capacity`` zeros. - :param new_data: the new data to be appended - :type new_data: numpy 1D array + If ``capacity`` is larger than the current capacity, + the current ``self.__samples`` will be extended with zeros. - .. versionadded:: 1.2.1 + If ``capacity`` is smaller than the current capacity, + the first ``capacity`` values of ``self.__samples`` + will be retained. + + :param int capacity: the new capacity, in number of samples + :raises: ValueError: if ``capacity`` is negative + + .. versionadded:: 1.5.0 """ - self._log(u"Appending audio data...") - self._audio_data_is_initialized(load=False) - self.audio_data = numpy.append(self.audio_data, new_data) - self._update_length() - self._log(u"Appending audio data... done") + if capacity < 0: + raise ValueError(u"The capacity value cannot be negative") + if self.__samples is None: + self.log(u"Not initialized") + self.__samples = numpy.zeros(capacity) + self.__samples_length = 0 + else: + self.log([u"Previous sample length was (samples): %d", self.__samples_length]) + self.log([u"Previous sample capacity was (samples): %d", self.__samples_capacity]) + self.__samples = numpy.concatenate((self.__samples[0:capacity], numpy.zeros(max(0, capacity - len(self.__samples))))) + self.__samples_length = min(self.__samples_length, capacity) + self.__samples_capacity = capacity + self.log([u"Current sample capacity is (samples): %d", self.__samples_capacity]) + + def minimize_memory(self): + """ + Reduce the allocated memory to the minimum + required to store the current audio samples.
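The `preallocate_memory` docstring promises that a larger capacity extends the buffer with zeros. Note that the `numpy.resize` function pads by repeating the input rather than with zeros, so a zero-filling resize needs an explicit copy. A minimal sketch, using a hypothetical `preallocate` helper that is not part of aeneas:

```python
import numpy


def preallocate(samples, length, capacity):
    """Return (buffer, length) with room for exactly `capacity` samples.

    Hypothetical helper, not part of aeneas: keeps the first
    min(length, capacity) valid samples and zero-fills every other slot.
    """
    if capacity < 0:
        raise ValueError("The capacity value cannot be negative")
    buffer = numpy.zeros(capacity)
    if samples is None:
        return buffer, 0
    keep = min(length, capacity)
    buffer[0:keep] = samples[0:keep]
    return buffer, keep


old = numpy.array([1.0, 2.0, 3.0])

# numpy.resize pads by repeating the input, not with zeros.
print(numpy.resize(old, 5).tolist())   # [1.0, 2.0, 3.0, 1.0, 2.0]

grown, grown_length = preallocate(old, 3, 5)
print(grown.tolist(), grown_length)    # [1.0, 2.0, 3.0, 0.0, 0.0] 3

shrunk, shrunk_length = preallocate(old, 3, 2)
print(shrunk.tolist(), shrunk_length)  # [1.0, 2.0] 2
```

Because the valid length is tracked separately, slots beyond it are never read, so repeated padding is harmless in practice. Zero-filling simply makes the buffer match its documented contract.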
+ + This function is meant to be called + when building a wave incrementally, + after the last append operation. - def prepend_data(self, new_data): + .. versionadded:: 1.5.0 """ - Prepend the given new data to the current audio data. + if self.__samples is None: + self.log(u"Not initialized, returning") + else: + self.log(u"Initialized, minimizing memory...") + self.preallocate_memory(self.__samples_length) + self.log(u"Initialized, minimizing memory... done") + + def add_samples(self, samples, reverse=False): + """ + Concatenate the given new samples to the current audio data. - If audio data is not loaded, create an empty data structure - and then preppend to it. + This function initializes the memory if no audio data + is present already. - :param new_data: the new data to be prepended - :type new_data: numpy 1D array + If ``reverse`` is ``True``, the new samples + will be reversed and then concatenated. + + :param samples: the new samples to be concatenated + :type samples: :class:`numpy.ndarray` (1D) + :param bool reverse: if ``True``, concatenate new samples after reversing them .. versionadded:: 1.2.1 """ - self._log(u"Prepending audio data...") - self._audio_data_is_initialized(load=False) - self.audio_data = numpy.append(new_data, self.audio_data) - self._update_length() - self._log(u"Prepending audio data... done") - - def extract_mfcc(self): - """ - Extract MFCCs from the given audio file. - - If audio data is not loaded, load it, extract MFCCs, - store them internally, and discard the audio data immediately. - - :raise RuntimeError: if both the C extension and - the pure Python code did not succeed. 
- """ - had_audio_data = self._audio_data_is_initialized(load=True) - gf.run_c_extension_with_fallback( - self._log, - "cmfcc", - self._compute_mfcc_c_extension, - self._compute_mfcc_pure_python, - (), - c_extension=self.rconf["c_ext"] - ) - if not had_audio_data: - self._log(u"Audio data was not loaded, clearing it") - self.clear_data() + self.log(u"Adding samples...") + samples_length = len(samples) + current_length = self.__samples_length + future_length = current_length + samples_length + if (self.__samples is None) or (self.__samples_capacity < future_length): + self.preallocate_memory(2 * future_length) + if reverse: + self.__samples[current_length:future_length] = samples[::-1] else: - self._log(u"Audio data was loaded, not clearing it") + self.__samples[current_length:future_length] = samples[:] + self.__samples_length = future_length + self._update_length() + self.log(u"Adding samples... done") def reverse(self): """ Reverse the audio data. - If audio data is not loaded, load it and then reverse it. + :raises: :class:`~aeneas.audiofile.AudioFileNotInitializedError`: if the audio file is not initialized yet .. versionadded:: 1.2.0 """ - self._log(u"Reversing...") - self._audio_data_is_initialized(load=True) - self.audio_data = self.audio_data[::-1] - self._log(u"Reversing... done") + if self.__samples is None: + if self.file_path is None: + self.log_exc(u"AudioFile object not initialized", None, True, AudioFileNotInitializedError) + else: + self.read_samples_from_file() + self.log(u"Reversing...") + self.__samples[0:self.__samples_length] = numpy.flipud(self.__samples[0:self.__samples_length]) + self.log(u"Reversing... done") def trim(self, begin=None, length=None): """ @@ -396,62 +467,72 @@ def trim(self, begin=None, length=None): If audio data is not loaded, load it and then slice it. 
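The `add_samples` and `reverse` changes above rely on a capacity-doubling buffer: when an append does not fit, twice the required size is allocated, so most appends are plain slice assignments. A minimal standalone sketch of that strategy follows. The `SampleBuffer` class and its `reallocations` counter are hypothetical illustrations, not the aeneas API.

```python
import numpy


class SampleBuffer(object):
    """Hypothetical growable float64 buffer using capacity doubling."""

    def __init__(self):
        self.samples = None
        self.length = 0
        self.capacity = 0
        self.reallocations = 0

    def _reallocate(self, capacity):
        # Copy only the valid prefix into a fresh zero-filled array.
        new_samples = numpy.zeros(capacity)
        if self.samples is not None:
            new_samples[0:self.length] = self.samples[0:self.length]
        self.samples = new_samples
        self.capacity = capacity
        self.reallocations += 1

    def add_samples(self, samples, reverse=False):
        future_length = self.length + len(samples)
        if (self.samples is None) or (self.capacity < future_length):
            # Allocate twice what is needed, so most later appends
            # are plain slice assignments with no reallocation.
            self._reallocate(2 * future_length)
        chunk = samples[::-1] if reverse else samples
        self.samples[self.length:future_length] = chunk
        self.length = future_length

    @property
    def view(self):
        return self.samples[0:self.length]


buf = SampleBuffer()
for _ in range(1000):
    buf.add_samples(numpy.ones(10))
print(buf.length, buf.capacity, buf.reallocations)   # 10000 10220 9

rev = SampleBuffer()
rev.add_samples(numpy.array([1.0, 2.0, 3.0]), reverse=True)
print(rev.view.tolist())                              # [3.0, 2.0, 1.0]
```

A thousand appends trigger only nine reallocations. This is the amortized linear behaviour the class docstring describes. Calling something like `minimize_memory` afterwards would trim the unused tail.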
:param begin: the start position, in seconds - :type begin: float + :type begin: :class:`~aeneas.timevalue.TimeValue` :param length: the length, in seconds - :type length: float + :type length: :class:`~aeneas.timevalue.TimeValue` + :raises: TypeError: if one of the arguments is not ``None`` + or :class:`~aeneas.timevalue.TimeValue` + :raises: :class:`~aeneas.audiofile.AudioFileNotInitializedError`: if the audio file is not initialized yet .. versionadded:: 1.2.0 """ - self._log(u"Trimming...") + for variable, name in [(begin, "begin"), (length, "length")]: + if (variable is not None) and (not isinstance(variable, TimeValue)): + raise TypeError(u"%s is not None or TimeValue" % name) + self.log(u"Trimming...") if (begin is None) and (length is None): - self._log(u"begin and length are both None: nothing to do") + self.log(u"begin and length are both None: nothing to do") else: - self._audio_data_is_initialized(load=True) - self._log([u"audio_length is %.3f", self.audio_length]) + if self.__samples is None: + if self.file_path is None: + self.log_exc(u"AudioFile object not initialized", None, True, AudioFileNotInitializedError) + else: + self.read_samples_from_file() if begin is None: - begin = 0 - self._log([u"begin was None, now set to %.3f", begin]) - begin = min(max(0, begin), self.audio_length) - self._log([u"begin is %.3f", begin]) + begin = TimeValue("0.000") + self.log([u"begin was None, now set to %.3f", begin]) + begin = min(max(TimeValue("0.000"), begin), self.audio_length) + self.log([u"begin is %.3f", begin]) if length is None: length = self.audio_length - begin - self._log([u"length was None, now set to %.3f", length]) - length = min(max(0, length), self.audio_length - begin) - self._log([u"length is %.3f", length]) + self.log([u"length was None, now set to %.3f", length]) + length = min(max(TimeValue("0.000"), length), self.audio_length - begin) + self.log([u"length is %.3f", length]) begin_index = int(begin * self.audio_sample_rate) end_index = int((begin + length) * self.audio_sample_rate) - self.audio_data = self.audio_data[begin_index:end_index] + new_idx = end_index - begin_index + self.__samples[0:new_idx] = self.__samples[begin_index:end_index] + self.__samples_length = new_idx self._update_length() - self._log(u"Trimming...
done") + self.log(u"Trimming... done") def write(self, file_path): """ Write the audio data to file. Return ``True`` on success, or ``False`` otherwise. - :param file_path: the path of the output file to be written - :type file_path: Unicode string (path) + :param string file_path: the path of the output file to be written + :raises: :class:`~aeneas.audiofile.AudioFileNotInitializedError`: if the audio file is not initialized yet .. versionadded:: 1.2.0 """ - self._log([u"Writing audio file '%s'...", file_path]) - self._audio_data_is_initialized(load=False) + if self.__samples is None: + if self.file_path is None: + self.log_exc(u"AudioFile object not initialized", None, True, AudioFileNotInitializedError) + else: + self.read_samples_from_file() + self.log([u"Writing audio file '%s'...", file_path]) try: # our value is a float64 in [-1, 1] # scipy writes the sample as an int16_t, that is, a number in [-32768, 32767] - data = (self.audio_data * 32768).astype("int16") + data = (self.audio_samples * 32768).astype("int16") scipywavwrite(file_path, self.audio_sample_rate, data) - except: - self._log(u"Error writing audio file", severity=Logger.CRITICAL) - raise OSError("Error writing audio file to '%s'" % file_path) - self._log([u"Writing audio file '%s'... done", file_path]) + except Exception as exc: + self.log_exc(u"Error writing audio file to '%s'" % (file_path), exc, True, OSError) + self.log([u"Writing audio file '%s'... done", file_path]) def clear_data(self): """ Clear the audio data, freeing memory. """ - self._log(u"Clear audio_data") - self.audio_data = None + self.log(u"Clear audio_data") + self.__samples_capacity = 0 + self.__samples_length = 0 + self.__samples = None def _update_length(self): """ @@ -459,81 +540,10 @@ def _update_length(self): according to the length of the current audio data and audio sample rate. - This function fails silently if one of the two is None. 
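`read_samples_from_file` divides the int16 samples returned by `scipywavread` by 32768, and `write` multiplies by 32768 before casting back to int16. A minimal sketch of that round trip follows. The helper names are hypothetical, and the clipping step is an addition of this sketch that the diff's `write` does not perform: a sample equal to exactly `1.0` (possible after `add_samples`) scales to 32768, which does not fit in int16.

```python
import numpy


def pcm16_to_float(data):
    # int16 values in [-32768, 32767] -> float64 values in [-1.0, 1.0)
    return data.astype("float64") / 32768


def float_to_pcm16(samples):
    # Scale back to the int16 range. The clip is an addition of this sketch:
    # astype does not saturate out-of-range values (the result of casting
    # 32768.0 to int16 is platform dependent), so clamp first.
    scaled = numpy.clip(samples * 32768, -32768, 32767)
    return scaled.astype("int16")


original = numpy.array([-32768, -1, 0, 1, 32767], dtype="int16")
restored = float_to_pcm16(pcm16_to_float(original))
assert numpy.array_equal(original, restored)

print(float_to_pcm16(numpy.array([1.0, -1.0, 0.5])).tolist())  # [32767, -32768, 16384]
```

Dividing and multiplying by a power of two is exact in float64, so every int16 value survives the round trip unchanged.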
- """ - if (self.audio_sample_rate is not None) and (self.audio_data is not None): - self.audio_length = len(self.audio_data) / self.audio_sample_rate - - def _audio_data_is_initialized(self, load=True): - """ - Check if audio data is loaded: - if so, return True. - - Otherwise, either load or initialize the audio data - and return False. - - :param load: if True, load from file; if False, initialize to empty - :type load: bool - :rtype: bool - """ - if self.audio_data is not None: - self._log(u"audio data is not None: returning True") - return True - if load: - self._log(u"No audio data: loading it from file") - self.load_data() - else: - self._log(u"No audio data: initializing it to an empty data structure") - self.audio_data = numpy.array([]) - self._log(u"audio data was None: returning False") - return False - - def _compute_mfcc_c_extension(self): + This function fails silently if one of the two is ``None``. """ - Compute MFCCs using the Python C extension cmfcc. - """ - self._log(u"Computing MFCCs using C extension...") - try: - self._log(u"Importing cmfcc...") - import aeneas.cmfcc.cmfcc - self._log(u"Importing cmfcc... done") - self.audio_mfcc = (aeneas.cmfcc.cmfcc.compute_from_data( - self.audio_data, - self.audio_sample_rate, - self.rconf["mfcc_filters"], - self.rconf["mfcc_size"], - self.rconf["mfcc_order"], - self.rconf["mfcc_lower_freq"], - self.rconf["mfcc_upper_freq"], - self.rconf["mfcc_emph"], - self.rconf["mfcc_win_len"], - self.rconf["mfcc_win_shift"] - )[0]).transpose() - self._log(u"Computing MFCCs using C extension... done") - return (True, None) - except Exception as exc: - self._log(u"Computing MFCCs using C extension... failed") - self._log(u"An unexpected exception occurred while running cmfcc:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) - return (False, None) - - def _compute_mfcc_pure_python(self): - """ - Compute MFCCs using the pure Python code. 
- """ - self._log(u"Computing MFCCs using pure Python code...") - try: - self.audio_mfcc = MFCC( - rconf=self.rconf, - logger=self.logger - ).compute_from_data(self.audio_data, self.audio_sample_rate).transpose() - self._log(u"Computing MFCCs using pure Python code... done") - return (True, None) - except Exception as exc: - self._log(u"Computing MFCCs using pure Python code... failed") - self._log(u"An unexpected exception occurred while running pure Python code:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) - return (False, None) + if (self.audio_sample_rate is not None) and (self.__samples is not None): + self.audio_length = TimeValue(self.__samples_length / self.audio_sample_rate) diff --git a/aeneas/audiofilemfcc.py b/aeneas/audiofilemfcc.py new file mode 100644 index 00000000..2dd5efb4 --- /dev/null +++ b/aeneas/audiofilemfcc.py @@ -0,0 +1,637 @@ +#!/usr/bin/env python +# coding=utf-8 + +""" +This module contains the following classes: + +* :class:`~aeneas.audiofilemfcc.AudioFileMFCC`, + representing a mono WAVE audio file as a matrix of + Mel-frequency cepstral coefficients (MFCC).
+""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import numpy + +from aeneas.audiofile import AudioFile +from aeneas.logger import Loggable +from aeneas.mfcc import MFCC +from aeneas.runtimeconfiguration import RuntimeConfiguration +from aeneas.timevalue import TimeValue +from aeneas.vad import VAD +import aeneas.globalfunctions as gf + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL v3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + +class AudioFileMFCC(Loggable): + """ + A monoaural (single channel) WAVE audio file, + represented as a NumPy 2D matrix of + Mel-frequency cepstral coefficients (MFCC). + + The matrix is "fat", that is, + its number of rows is equal to the number of MFCC coefficients + and its number of columns is equal to the number of window shifts + in the audio file. + The number of MFCC coefficients and the MFCC window shift can + be modified via the + :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_SIZE` + and + :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_WINDOW_SHIFT` + keys in the ``rconf`` object. + + If ``mfcc_matrix`` is not ``None``, + it will be used as the MFCC matrix. + + If ``file_path`` or ``audio_file`` is not ``None``, + the MFCCs will be computed upon creation of the object, + possibly converting to PCM16 Mono WAVE and/or + loading audio data into memory. + + The MFCCs for the entire wave + are divided into three + contiguous intervals (possibly, zero-length):: + + HEAD = [:middle_begin[ + MIDDLE = [middle_begin:middle_end[ + TAIL = [middle_end:[ + + The usual NumPy convention of including the left/start index + and excluding the right/end index is adopted.
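The HEAD/MIDDLE/TAIL split described above amounts to three column slices of the fat MFCC matrix. Each slice is a NumPy view, so no MFCC data is copied. A minimal sketch follows. The 13 x 10 matrix and the `middle_begin`/`middle_end` values are made-up illustrations, and the local names only mirror the private attributes of `AudioFileMFCC`.

```python
import numpy

# Hypothetical 13 x 10 "fat" MFCC matrix: 13 coefficients, 10 frames.
mfcc = numpy.arange(130, dtype="float64").reshape(13, 10)
middle_begin, middle_end = 2, 7

head = mfcc[:, :middle_begin]               # frames 0, 1
middle = mfcc[:, middle_begin:middle_end]   # frames 2..6
tail = mfcc[:, middle_end:]                 # frames 7..9

# The three intervals are contiguous and cover every frame.
assert head.shape[1] + middle.shape[1] + tail.shape[1] == mfcc.shape[1]
# Basic slices are views: no MFCC data is copied.
assert numpy.shares_memory(middle, mfcc)

# Frame i of MIDDLE is frame i + head_length of the full matrix,
# which is exactly what an arange-based middle map encodes.
head_length = middle_begin
middle_map = numpy.arange(middle_begin, middle_end)
assert numpy.array_equal(middle_map, numpy.arange(middle.shape[1]) + head_length)
```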
+ + For alignment purposes, only the ``MIDDLE`` portion of the wave + is taken into account; the ``HEAD`` and ``TAIL`` intervals are ignored. + + This class heavily uses NumPy views and in-place operations + to avoid creating temporary data or copying data around. + + :param string file_path: the path of the PCM16 mono WAVE file, or ``None`` + :param bool file_path_is_mono_wave: set to ``True`` if the audio file at ``file_path`` is a PCM16 mono WAVE file + :param mfcc_matrix: the MFCC matrix to be set, or ``None`` + :type mfcc_matrix: :class:`numpy.ndarray` + :param audio_file: an audio file, or ``None`` + :type audio_file: :class:`~aeneas.audiofile.AudioFile` + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` + :param logger: the logger object + :type logger: :class:`~aeneas.logger.Logger` + :raises: ValueError: if ``file_path``, ``audio_file``, and ``mfcc_matrix`` are all ``None`` + + .. versionadded:: 1.5.0 + """ + + TAG = u"AudioFileMFCC" + + def __init__( + self, + file_path=None, + file_path_is_mono_wave=False, + mfcc_matrix=None, + audio_file=None, + rconf=None, + logger=None + ): + if (file_path is None) and (audio_file is None) and (mfcc_matrix is None): + raise ValueError(u"You must initialize with at least one of: file_path, audio_file, or mfcc_matrix") + super(AudioFileMFCC, self).__init__(rconf=rconf, logger=logger) + self.file_path = file_path + self.audio_file = audio_file + self.is_reversed = False + self.__mfcc = None + self.__mfcc_mask = None + self.__mfcc_mask_map = None + self.__speech_intervals = None + self.__nonspeech_intervals = None + self.log(u"Initializing MFCCs...") + if mfcc_matrix is not None: + self.__mfcc = mfcc_matrix + self.audio_length = self.all_length * self.rconf.mws + elif (self.file_path is not None) or (self.audio_file is not None): + audio_file_was_none = False + if self.audio_file is None: + audio_file_was_none = True + self.audio_file = AudioFile( + 
self.file_path, + is_mono_wave=file_path_is_mono_wave, + rconf=self.rconf, + logger=self.logger + ) + # NOTE load audio samples into memory, if not present already + self.audio_file.audio_samples + gf.run_c_extension_with_fallback( + self.log, + "cmfcc", + self._compute_mfcc_c_extension, + self._compute_mfcc_pure_python, + (), + c_extension=self.rconf[RuntimeConfiguration.C_EXTENSIONS] + ) + self.audio_length = self.audio_file.audio_length + if audio_file_was_none: + self.log(u"Clearing the audio data...") + self.audio_file.clear_data() + self.audio_file = None + self.log(u"Clearing the audio data... done") + self.__middle_begin = 0 + self.__middle_end = self.__mfcc.shape[1] + self.log(u"Initializing MFCCs... done") + + def __unicode__(self): + msg = [ + u"File path: %s" % self.file_path, + u"Audio length (s): %s" % gf.safe_float(self.audio_length), + ] + return u"\n".join(msg) + + def __str__(self): + return gf.safe_str(self.__unicode__()) + + @property + def all_mfcc(self): + """ + The MFCCs of the entire audio file, + that is, HEAD + MIDDLE + TAIL. + + :rtype: :class:`numpy.ndarray` (2D) + """ + return self.__mfcc + + @property + def all_length(self): + """ + The length, in MFCC coefficients, + of the entire audio file, + that is, HEAD + MIDDLE + TAIL. + + :rtype: int + """ + return self.__mfcc.shape[1] + + @property + def middle_mfcc(self): + """ + The MFCCs of the middle part of the audio file, + that is, without HEAD and TAIL. + + :rtype: :class:`numpy.ndarray` (2D) + """ + return self.__mfcc[:, self.__middle_begin:self.__middle_end] + + @property + def middle_length(self): + """ + The length, in MFCC coefficients, + of the middle part of the audio file, + that is, without HEAD and TAIL. 
+ + :rtype: int + """ + return self.__middle_end - self.__middle_begin + + @property + def middle_map(self): + """ + Return the map + from the MFCC frame indices + in the MIDDLE portion of the wave + to the MFCC FULL frame indices, + that is, a ``numpy.arange(self.middle_begin, self.middle_end)``. + + NOTE: to translate indices of MIDDLE, + instead of using fancy indexing with the + result of this function, you might want to simply + add ``self.head_length``. + This function is provided mostly for consistency + with the MASKED case. + + :rtype: :class:`numpy.ndarray` (1D) + """ + return numpy.arange(self.__middle_begin, self.__middle_end) + + @property + def head_length(self): + """ + The length, in MFCC coefficients, + of the HEAD of the audio file. + + :rtype: int + """ + return self.__middle_begin + + @property + def tail_length(self): + """ + The length, in MFCC coefficients, + of the TAIL of the audio file. + + :rtype: int + """ + return self.all_length - self.__middle_end + + @property + def tail_begin(self): + """ + The index, in MFCC coefficients, + where the TAIL of the audio file starts. + + :rtype: int + """ + return self.__middle_end + + @property + def audio_length(self): + """ + The length, in seconds, of the audio file. + + This value is the actual length of the audio file, + computed as ``number of samples / sample_rate``, + hence it might differ from ``self.all_length * mfcc_window_shift``. + + :rtype: :class:`~aeneas.timevalue.TimeValue` + """ + return self.__audio_length + @audio_length.setter + def audio_length(self, audio_length): + self.__audio_length = audio_length + + @property + def is_reversed(self): + """ + Return ``True`` if currently reversed. + + :rtype: bool + """ + return self.__is_reversed + @is_reversed.setter + def is_reversed(self, is_reversed): + self.__is_reversed = is_reversed + + @property + def masked_mfcc(self): + """ + Return the MFCC speech frames + in the FULL wave.
+ + :rtype: :class:`numpy.ndarray` (2D) + """ + self._ensure_mfcc_mask() + return self.__mfcc[:, self.__mfcc_mask] + + @property + def masked_length(self): + """ + Return the number of MFCC speech frames + in the FULL wave. + + :rtype: int + """ + self._ensure_mfcc_mask() + return len(self.__mfcc_mask_map) + + @property + def masked_map(self): + """ + Return the map + from the MFCC speech frame indices + to the MFCC FULL frame indices. + + :rtype: :class:`numpy.ndarray` (1D) + """ + self._ensure_mfcc_mask() + return self.__mfcc_mask_map + + @property + def masked_middle_mfcc(self): + """ + Return the MFCC speech frames + in the MIDDLE portion of the wave. + + :rtype: :class:`numpy.ndarray` (2D) + """ + begin, end = self._masked_middle_begin_end() + return (self.masked_mfcc)[:, begin:end] + + @property + def masked_middle_length(self): + """ + Return the number of MFCC speech frames + in the MIDDLE portion of the wave. + + :rtype: int + """ + begin, end = self._masked_middle_begin_end() + return end - begin + + @property + def masked_middle_map(self): + """ + Return the map + from the MFCC speech frame indices + in the MIDDLE portion of the wave + to the MFCC FULL frame indices. + + :rtype: :class:`numpy.ndarray` (1D) + """ + begin, end = self._masked_middle_begin_end() + return self.__mfcc_mask_map[begin:end] + + def _masked_middle_begin_end(self): + """ + Return the begin and end indices w.r.t. ``self.__mfcc_mask_map``, + corresponding to indices in the MIDDLE portion of the wave, + that is, which fall between ``self.__middle_begin`` and + ``self.__middle_end`` in ``self.__mfcc``. 
+ + :rtype: (int, int) + """ + self._ensure_mfcc_mask() + begin = numpy.searchsorted(self.__mfcc_mask_map, self.__middle_begin, side="left") + end = numpy.searchsorted(self.__mfcc_mask_map, self.__middle_end, side="right") + return (begin, end) + + def intervals(self, speech=True, time=True): + """ + Return a list of intervals:: + + [(b_1, e_1), (b_2, e_2), ..., (b_k, e_k)] + + where ``b_i`` is the time when the ``i``-th interval begins, + and ``e_i`` is the time when it ends. + + :param bool speech: if ``True``, return speech intervals, + otherwise return nonspeech intervals + :param bool time: if ``True``, return values in seconds (:class:`~aeneas.timevalue.TimeValue`), + otherwise in indices (int) + :rtype: list of pairs (see above) + """ + self._ensure_mfcc_mask() + if speech: + self.log(u"Converting speech runs to intervals") + intervals = self.__speech_intervals + else: + self.log(u"Converting nonspeech runs to intervals") + intervals = self.__nonspeech_intervals + if time: + mws = self.rconf.mws + return [(i[0] * mws, (i[1] + 1) * mws) for i in intervals] + return intervals + + def inside_nonspeech(self, index): + """ + If ``index`` is contained in a nonspeech interval, + return a pair ``(interval_begin, interval_end)`` + such that ``interval_begin <= index < interval_end``, + i.e., ``interval_end`` is assumed not to be included. + + Otherwise, return ``None``. + + :rtype: ``None`` or tuple + """ + self._ensure_mfcc_mask() + if (index < 0) or (index >= self.all_length) or (self.__mfcc_mask[index]): + return None + return self._binary_search_intervals(self.__nonspeech_intervals, index) + + @classmethod + def _binary_search_intervals(cls, intervals, index): + """ + Binary search for the interval containing index, + assuming there is such an interval. + This function should never return ``None``. 
+ """ + start = 0 + end = len(intervals) - 1 + while start <= end: + middle_index = start + ((end - start) // 2) + middle = intervals[middle_index] + if (middle[0] <= index) and (index < middle[1]): + return middle + elif middle[0] > index: + end = middle_index - 1 + else: + start = middle_index + 1 + return None + + @property + def middle_begin(self): + """ + Return the index where MIDDLE starts. + + :rtype: int + """ + return self.__middle_begin + + @middle_begin.setter + def middle_begin(self, index): + """ + Set the index where MIDDLE starts. + + :param int index: the new index for MIDDLE begin + """ + if (index < 0) or (index > self.all_length): + raise ValueError(u"The given index is not valid") + self.__middle_begin = index + + @property + def middle_begin_seconds(self): + """ + Return the time instant, in seconds, where MIDDLE starts. + + :rtype: :class:`~aeneas.timevalue.TimeValue` + """ + return TimeValue(self.__middle_begin) * self.rconf.mws + + @property + def middle_end(self): + """ + Return the index (+1) where MIDDLE ends. + + :rtype: int + """ + return self.__middle_end + + @middle_end.setter + def middle_end(self, index): + """ + Set the index (+1) where MIDDLE ends. + + :param int index: the new index for MIDDLE end + """ + if (index < 0) or (index > self.all_length): + raise ValueError(u"The given index is not valid") + self.__middle_end = index + + @property + def middle_end_seconds(self): + """ + Return the time instant, in seconds, where MIDDLE ends. + + :rtype: :class:`~aeneas.timevalue.TimeValue` + """ + return TimeValue(self.__middle_end) * self.rconf.mws + + def _ensure_mfcc_mask(self): + """ + Ensure that ``run_vad()`` has already been called, + and hence ``self.__mfcc_mask`` has a meaningful value. + """ + if self.__mfcc_mask is None: + self.log(u"VAD was not run: running it now") + self.run_vad() + + def _compute_mfcc_c_extension(self): + """ + Compute MFCCs using the Python C extension cmfcc. 
+ """ + self.log(u"Computing MFCCs using C extension...") + try: + self.log(u"Importing cmfcc...") + import aeneas.cmfcc.cmfcc + self.log(u"Importing cmfcc... done") + self.__mfcc = (aeneas.cmfcc.cmfcc.compute_from_data( + self.audio_file.audio_samples, + self.audio_file.audio_sample_rate, + self.rconf[RuntimeConfiguration.MFCC_FILTERS], + self.rconf[RuntimeConfiguration.MFCC_SIZE], + self.rconf[RuntimeConfiguration.MFCC_FFT_ORDER], + self.rconf[RuntimeConfiguration.MFCC_LOWER_FREQUENCY], + self.rconf[RuntimeConfiguration.MFCC_UPPER_FREQUENCY], + self.rconf[RuntimeConfiguration.MFCC_EMPHASIS_FACTOR], + self.rconf[RuntimeConfiguration.MFCC_WINDOW_LENGTH], + self.rconf[RuntimeConfiguration.MFCC_WINDOW_SHIFT] + )[0]).transpose() + self.log(u"Computing MFCCs using C extension... done") + return (True, None) + except Exception as exc: + self.log_exc(u"An unexpected error occurred while running cmfcc", exc, False, None) + return (False, None) + + def _compute_mfcc_pure_python(self): + """ + Compute MFCCs using the pure Python code. + """ + self.log(u"Computing MFCCs using pure Python code...") + try: + self.__mfcc = MFCC( + rconf=self.rconf, + logger=self.logger + ).compute_from_data( + self.audio_file.audio_samples, + self.audio_file.audio_sample_rate + ).transpose() + self.log(u"Computing MFCCs using pure Python code... done") + return (True, None) + except Exception as exc: + self.log_exc(u"An unexpected error occurred while running pure Python code", exc, False, None) + return (False, None) + + def reverse(self): + """ + Reverse the audio file. + + The reversing is done efficiently in place, using NumPy views + instead of swapping values. + + Only speech and nonspeech intervals are actually recomputed + as Python lists.
+ """ + self.log(u"Reversing...") + all_length = self.all_length + self.__mfcc = self.__mfcc[:, ::-1] + tmp = self.__middle_end + self.__middle_end = all_length - self.__middle_begin + self.__middle_begin = all_length - tmp + if self.__mfcc_mask is not None: + self.__mfcc_mask = self.__mfcc_mask[::-1] + # equivalent to + # self.__mfcc_mask_map = ((all_length - 1) - self.__mfcc_mask_map)[::-1] + # but done in place using NumPy view + self.__mfcc_mask_map *= -1 + self.__mfcc_mask_map += all_length - 1 + self.__mfcc_mask_map = self.__mfcc_mask_map[::-1] + self.__speech_intervals = [(all_length - i[1], all_length - i[0]) for i in self.__speech_intervals[::-1]] + self.__nonspeech_intervals = [(all_length - i[1], all_length - i[0]) for i in self.__nonspeech_intervals[::-1]] + self.is_reversed = not self.is_reversed + self.log(u"Reversing...done") + + def run_vad(self): + """ + Determine which frames contain speech and nonspeech, + and store the resulting boolean mask internally. + """ + def _compute_runs(array): + """ + Compute runs as a list of arrays, + each containing the indices of a contiguous run. + + :param array: the data array + :type array: :class:`numpy.ndarray` (1D) + :rtype: list of :class:`numpy.ndarray` (1D) + """ + if len(array) < 1: + return [] + return numpy.split(array, numpy.where(numpy.diff(array) != 1)[0] + 1) + self.log(u"Creating VAD object") + vad = VAD(rconf=self.rconf, logger=self.logger) + self.log(u"Running VAD...") + self.__mfcc_mask = vad.run_vad(self.__mfcc[0]) + self.__mfcc_mask_map = (numpy.where(self.__mfcc_mask))[0] + self.log(u"Running VAD... 
done") + self.log(u"Storing speech and nonspeech intervals...") + # where( == True) already computed, reusing + #runs = _compute_runs((numpy.where(self.__mfcc_mask))[0]) + runs = _compute_runs(self.__mfcc_mask_map) + self.__speech_intervals = [(r[0], r[-1]) for r in runs] + # where( == False) not already computed, computing now + runs = _compute_runs((numpy.where(~self.__mfcc_mask))[0]) + self.__nonspeech_intervals = [(r[0], r[-1]) for r in runs] + self.log(u"Storing speech and nonspeech intervals... done") + + def set_head_middle_tail(self, head_length=None, middle_length=None, tail_length=None): + """ + Set the HEAD, MIDDLE, TAIL explicitly. + + If a parameter is ``None``, it will be ignored. + If both ``middle_length`` and ``tail_length`` are specified, + only ``middle_length`` will be applied. + + :param head_length: the length of HEAD, in seconds + :type head_length: :class:`~aeneas.timevalue.TimeValue` + :param middle_length: the length of MIDDLE, in seconds + :type middle_length: :class:`~aeneas.timevalue.TimeValue` + :param tail_length: the length of TAIL, in seconds + :type tail_length: :class:`~aeneas.timevalue.TimeValue` + :raises: TypeError: if one of the arguments is not ``None`` + or :class:`~aeneas.timevalue.TimeValue` + """ + for variable, name in [ + (head_length, "head_length"), + (middle_length, "middle_length"), + (tail_length, "tail_length") + ]: + if (variable is not None) and (not isinstance(variable, TimeValue)): + raise TypeError(u"%s is not None or TimeValue" % name) + self.log(u"Setting head middle tail...") + mws = self.rconf.mws + self.log([u"Before: 0 %d %d %d", self.middle_begin, self.middle_end, self.all_length]) + if head_length is not None: + self.middle_begin = int(head_length / mws) + if middle_length is not None: + self.middle_end = self.middle_begin + int(middle_length / mws) + elif tail_length is not None: + self.middle_end = self.all_length - int(tail_length / mws) + self.log([u"After: 0 %d %d %d", self.middle_begin, 
self.middle_end, self.all_length]) + self.log(u"Setting head middle tail... done") + + + diff --git a/aeneas/cdtw/000_compile_driver.sh b/aeneas/cdtw/000_compile_driver.sh new file mode 100644 index 00000000..4cb2f2b7 --- /dev/null +++ b/aeneas/cdtw/000_compile_driver.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +gcc cdtw_driver.c cdtw_func.c cint.c -o cdtw_driver -lm -Wall -pedantic -std=c99 + + + diff --git a/aeneas/cdtw/100_run_driver.sh b/aeneas/cdtw/100_run_driver.sh new file mode 100644 index 00000000..a68ab4f8 --- /dev/null +++ b/aeneas/cdtw/100_run_driver.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +if [ ! -e cdtw_driver ] +then + bash 000_compile_driver.sh +fi + +echo "Run 1" +./cdtw_driver +echo "" + +echo "Run 2 (no stdout)" +./cdtw_driver 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 cm > /dev/null +echo "" + +echo "Run 3 (no stdout)" +./cdtw_driver 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 acm > /dev/null +echo "" + +echo "Run 4 (no stdout)" +./cdtw_driver 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 path > /dev/null +echo "" + + + diff --git a/aeneas/cdtw/800_compile_py.sh b/aeneas/cdtw/800_compile_py.sh new file mode 100644 index 00000000..46ad13b6 --- /dev/null +++ b/aeneas/cdtw/800_compile_py.sh @@ -0,0 +1,5 @@ +#!/bin/bash + +rm -rf build *.so +python cdtw_setup.py build_ext --inplace + diff --git a/aeneas/cdtw/README.md b/aeneas/cdtw/README.md new file mode 100644 index 00000000..f1b6e035 --- /dev/null +++ b/aeneas/cdtw/README.md @@ -0,0 +1,22 @@ +# aeneas.cdtw + +**aeneas.cdtw** is a Python C extension for computing the DTW. + +## API + +See the [__init__.py](__init__.py) file. 
+ + ## Compiling the Python C extension locally + + ```bash + $ python cdtw_setup.py build_ext --inplace + ``` + + ## Compiling the pure C driver program + + ```bash + $ bash 000_compile_driver.sh + ``` + + + diff --git a/aeneas/cdtw/__init__.py b/aeneas/cdtw/__init__.py index 9afa8c04..a8215303 100644 --- a/aeneas/cdtw/__init__.py +++ b/aeneas/cdtw/__init__.py @@ -3,6 +3,108 @@ """ aeneas.cdtw is a Python C extension for computing the DTW. + +.. function:: cdtw.compute_best_path(mfcc1, mfcc2, delta) + + Compute the DTW (approximated) best path + for the two audio waves, represented by their MFCCs. + + This function implements the Sakoe-Chiba heuristic, + that is, it explores only a band of width ``delta`` + around the main diagonal of the cost matrix. + + The computation is done in-memory, and it might fail + if there is not enough memory to allocate the cost matrix + or the list to be returned. + + The returned list contains tuples ``(i, j)``, + representing the best path from ``(0, 0)`` to ``(n-1, m-1)``, + where ``n`` is the length of ``mfcc1``, and + ``m`` is the length of ``mfcc2``. + The returned list has length between ``max(n, m)`` and ``n + m - 1`` + (it is shorter than ``n + m - 1`` if diagonal steps + are selected in the best path). + + :param mfcc1: the MFCCs of the first wave ``(n, mfcc_size)`` + :type mfcc1: :class:`numpy.ndarray` + :param mfcc2: the MFCCs of the second wave ``(m, mfcc_size)`` + :type mfcc2: :class:`numpy.ndarray` + :param int delta: the margin parameter + :rtype: list of tuples + +.. function:: cdtw.compute_cost_matrix_step(mfcc1, mfcc2, delta) + + Compute the DTW (approximated) cost matrix + for the two audio waves, represented by their MFCCs. + + This function implements the Sakoe-Chiba heuristic, + that is, it explores only a band of width ``delta`` + around the main diagonal of the cost matrix. + + The computation is done in-memory, and it might fail + if there is not enough memory to allocate the cost matrix.
+ + The returned tuple ``(cost_matrix, centers)`` + contains the cost matrix (NumPy 2D array of shape (n, delta)) + and the row centers (NumPy 1D array of size n). + + :param mfcc1: the MFCCs of the first wave ``(n, mfcc_size)`` + :type mfcc1: :class:`numpy.ndarray` + :param mfcc2: the MFCCs of the second wave ``(m, mfcc_size)`` + :type mfcc2: :class:`numpy.ndarray` + :param int delta: the margin parameter + :rtype: tuple + +.. function:: cdtw.compute_accumulated_cost_matrix_step(cost_matrix, centers) + + Compute the DTW (approximated) accumulated cost matrix + from the cost matrix and the row centers. + + This function implements the Sakoe-Chiba heuristic, + that is, it explores only a band of width ``delta`` + around the main diagonal of the cost matrix. + + The computation is done in-memory, + and the accumulated cost matrix is computed in place, + that is, the original cost matrix is destroyed + and its allocated memory is used to store + the accumulated cost matrix. + Hence, only a single row buffer is allocated, so this call is unlikely to fail for memory reasons. + + The returned NumPy 2D array of shape ``(n, delta)`` + contains the accumulated cost matrix. + + :param cost_matrix: the cost matrix ``(n, delta)`` + :type cost_matrix: :class:`numpy.ndarray` + :param centers: the row centers ``(n,)`` + :type centers: :class:`numpy.ndarray` + :rtype: :class:`numpy.ndarray` + +.. function:: cdtw.compute_best_path_step(accumulated_cost_matrix, centers) + + Compute the DTW (approximated) best path + from the accumulated cost matrix and the row centers. + + This function implements the Sakoe-Chiba heuristic, + that is, it explores only a band of width ``delta`` + around the main diagonal of the cost matrix. + + The computation is done in-memory, and it might fail + if there is not enough memory to allocate the list to be returned.
+ + The returned list contains tuples ``(i, j)``, + representing the best path from ``(0, 0)`` to ``(n-1, m-1)``, + where ``n`` is the length of ``mfcc1``, and + ``m`` is the length of ``mfcc2``. + The returned list has length between ``max(n, m)`` and ``n + m - 1`` + (it is shorter than ``n + m - 1`` if diagonal steps + are selected in the best path). + + :param accumulated_cost_matrix: the accumulated cost matrix ``(n, delta)`` + :type accumulated_cost_matrix: :class:`numpy.ndarray` + :param centers: the row centers ``(n,)`` + :type centers: :class:`numpy.ndarray` + :rtype: list of tuples """ __author__ = "Alberto Pettarin" @@ -12,7 +114,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL 3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" diff --git a/aeneas/cdtw/cdtw_driver.c b/aeneas/cdtw/cdtw_driver.c index 09c3fb7d..6d511227 100644 --- a/aeneas/cdtw/cdtw_driver.c +++ b/aeneas/cdtw/cdtw_driver.c @@ -1,6 +1,6 @@ /* -Python C Extension for computing the MFCC +Python C Extension for computing the DTW __author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -20,11 +20,30 @@ __status__ = "Production" #include #include "cdtw_func.h" -void _read_matrix(const char *file_name, double *matrix, unsigned int n, unsigned int m) { - unsigned int i, j; +#define DRIVER_SUCCESS 0 +#define DRIVER_FAILURE 1 + +// print usage +void _usage(const char *prog) { + printf("\n"); + printf("Usage: %s MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN [cm|acm|path]\n", prog); + printf("\n"); + printf("Example: %s 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 cm\n", prog); + printf(" %s 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332
../tests/res/cdtw/mfcc2_12_868 868 acm\n", prog); + printf(" %s 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 path\n", prog); + printf("\n"); +} + +// read matrix from file +int _read_matrix(const char *file_name, double *matrix, uint32_t n, uint32_t m) { + uint32_t i, j; FILE *file_ptr; file_ptr = fopen(file_name, "r"); + if (file_ptr == NULL) { + return DRIVER_FAILURE; + } + for (i = 0; i < n; ++i) { for (j = 0; j < m; ++j) { if (!fscanf(file_ptr, "%lf", matrix + i * m + j)) { @@ -35,10 +54,12 @@ void _read_matrix(const char *file_name, double *matrix, unsigned int n, unsigne } fclose(file_ptr); file_ptr = NULL; + return DRIVER_SUCCESS; } -void _print_matrix(double *matrix, unsigned int n, unsigned int m) { - unsigned int i, j; +// print matrix to stdout +void _print_matrix(double *matrix, uint32_t n, uint32_t m) { + uint32_t i, j; for (i = 0; i < n; ++i) { for (j = 0; j < m; ++j) { @@ -48,33 +69,18 @@ void _print_matrix(double *matrix, unsigned int n, unsigned int m) { } } -// -// this is a simple driver to test on the command line -// -// compile it with: -// -// $ gcc cdtw_driver.c cdtw_func.c -o cdtw_driver -lm -// -// use it as follows: -// -// ./cdtw_driver MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN cm => compute and print cost matrix -// ./cdtw_driver MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN acm => compute and print accumulated cost matrix -// ./cdtw_driver MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN path => compute and print best path -// -// example: -// ./cdtw_driver 12 3000 ../tests/res/cdtw/mfcc1_12_1332 1332 ../tests/res/cdtw/mfcc2_12_868 868 path -// int main(int argc, char **argv) { double *mfcc1_ptr, *mfcc2_ptr, *cost_matrix_ptr; char *mfcc1_file_name, *mfcc2_file_name, *mode; - unsigned int *centers_ptr; - unsigned int mfcc_size, delta, mfcc1_len, mfcc2_len, best_path_length, k; + uint32_t *centers_ptr; + uint32_t mfcc_size, delta, mfcc1_len, mfcc2_len; + uint32_t 
best_path_length, k; struct PATH_CELL *best_path; if (argc < 8) { - printf("\nUsage: %s MFCC_SIZE DELTA MFCC1_FILE MFCC1_LEN MFCC2_FILE MFCC2_LEN [cm|acm|path]\n\n", argv[0]); - return 1; + _usage(argv[0]); + return DRIVER_FAILURE; } mfcc_size = atoi(argv[1]); delta = atoi(argv[2]); @@ -88,29 +94,62 @@ int main(int argc, char **argv) { delta = mfcc2_len; } + // allocate space for the MFCCs and read the input files mfcc1_ptr = (double *)calloc(mfcc1_len * mfcc_size, sizeof(double)); - _read_matrix(mfcc1_file_name, mfcc1_ptr, mfcc1_len, mfcc_size); mfcc2_ptr = (double *)calloc(mfcc2_len * mfcc_size, sizeof(double)); - _read_matrix(mfcc2_file_name, mfcc2_ptr, mfcc2_len, mfcc_size); + if ((mfcc1_ptr == NULL) || (mfcc2_ptr == NULL)) { + printf("Error: unable to allocate space for the input MFCCs.\n"); + return DRIVER_FAILURE; + } + if (_read_matrix(mfcc1_file_name, mfcc1_ptr, mfcc1_len, mfcc_size) != DRIVER_SUCCESS) { + printf("Error: unable to read MFCC1.\n"); + return DRIVER_FAILURE; + } + if (_read_matrix(mfcc2_file_name, mfcc2_ptr, mfcc2_len, mfcc_size) != DRIVER_SUCCESS) { + printf("Error: unable to read MFCC2.\n"); + return DRIVER_FAILURE; + } - // allocate space + // allocate space for the cost matrix cost_matrix_ptr = (double *)calloc(mfcc1_len * delta, sizeof(double)); - centers_ptr = (unsigned int *)calloc(mfcc1_len, sizeof(unsigned int)); + centers_ptr = (uint32_t *)calloc(mfcc1_len, sizeof(uint32_t)); + if ((cost_matrix_ptr == NULL) || (centers_ptr == NULL)) { + printf("Error: unable to allocate space for the cost matrix and the centers.\n"); + return DRIVER_FAILURE; + } // compute cost matrix - _compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, mfcc1_len, mfcc2_len, mfcc_size); + if (_compute_cost_matrix( + mfcc1_ptr, + mfcc2_ptr, + delta, + cost_matrix_ptr, + centers_ptr, + mfcc1_len, + mfcc2_len, + mfcc_size) != CDTW_SUCCESS) { + printf("Error: unable to compute cost matrix.\n"); + return DRIVER_FAILURE; + } + if (strcmp(mode, 
"cm") == 0) { // print cost matrix _print_matrix(cost_matrix_ptr, mfcc1_len, delta); } else if ((strcmp(mode, "acm") == 0) || (strcmp(mode, "path") == 0)) { // compute accumulated cost matrix - _compute_accumulated_cost_matrix_in_place(cost_matrix_ptr, centers_ptr, mfcc1_len, delta); + if (_compute_accumulated_cost_matrix_in_place(cost_matrix_ptr, centers_ptr, mfcc1_len, delta) != CDTW_SUCCESS) { + printf("Error: unable to compute accumulated cost matrix.\n"); + return DRIVER_FAILURE; + } if (strcmp(mode, "acm") == 0) { // print accumulated cost matrix _print_matrix(cost_matrix_ptr, mfcc1_len, delta); } else { // print best path - _compute_best_path(cost_matrix_ptr, centers_ptr, mfcc1_len, delta, &best_path, &best_path_length); + if (_compute_best_path(cost_matrix_ptr, centers_ptr, mfcc1_len, delta, &best_path, &best_path_length) != CDTW_SUCCESS) { + printf("Error: unable to compute best path.\n"); + return DRIVER_FAILURE; + } for (k = 0; k < best_path_length; ++k) { printf("%u %u\n", best_path[k].i, best_path[k].j); } @@ -124,7 +163,7 @@ int main(int argc, char **argv) { free((void *)mfcc2_ptr); free((void *)mfcc1_ptr); - return 0; + return DRIVER_SUCCESS; } diff --git a/aeneas/cdtw/cdtw_func.c b/aeneas/cdtw/cdtw_func.c index fd55df51..6bed35a3 100644 --- a/aeneas/cdtw/cdtw_func.c +++ b/aeneas/cdtw/cdtw_func.c @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -27,23 +27,27 @@ __status__ = "Production" #define NPY_INFINITY DBL_MAX #endif -// return the max of the given arguments -unsigned int _max(const int a, const int b) { - if (a > b) { - return a; +#define MOVE0 0 // up +#define MOVE1 1 // left +#define MOVE2 2 // up and left + +// return the max(0, center_j - half_delta) +uint32_t _nonnegative_difference(uint32_t center_j, uint32_t half_delta) { + if (half_delta > 
center_j) { + return 0; } - return b; + return center_j - half_delta; } // return the argmin of the three arguments unsigned int _three_way_argmin(const double cost0, const double cost1, const double cost2) { if ((cost0 <= cost1) && (cost0 <= cost2)) { - return 0; + return MOVE0; } if (cost1 <= cost2) { - return 1; + return MOVE1; } - return 2; + return MOVE2; } // return the min of three arguments @@ -58,20 +62,19 @@ double _three_way_min(const double cost0, const double cost1, const double cost2 } // copy the row-th row of cost_matrix into buffer -void _copy_cost_matrix_row(const double *cost_matrix_ptr, const unsigned int row, const unsigned int width, double *buffer_ptr) { +void _copy_cost_matrix_row(const double *cost_matrix_ptr, const uint32_t row, const uint32_t width, double *buffer_ptr) { memcpy(buffer_ptr, cost_matrix_ptr + row * width, width * sizeof(double)); } // appen the given (i, j) cell to the k-th position of the best path -void _append(struct PATH_CELL *best_path_ptr, const unsigned int k, const unsigned int i, const unsigned int j) { +void _append(struct PATH_CELL *best_path_ptr, const uint32_t k, const uint32_t i, const uint32_t j) { best_path_ptr[k].i = i; best_path_ptr[k].j = j; } // reverse the best path -void _reverse(struct PATH_CELL *best_path_ptr, const unsigned int best_path_len) { - unsigned int tmp_i, tmp_j; - unsigned int a, b; +void _reverse(struct PATH_CELL *best_path_ptr, const uint32_t best_path_len) { + uint32_t a, b, tmp_i, tmp_j; // reverse the min path for (a = 0; a < best_path_len / 2; ++a) { @@ -86,13 +89,13 @@ void _reverse(struct PATH_CELL *best_path_ptr, const unsigned int best_path_len) } // compute the norm2 of the given MFCCs vector -void _compute_norm2(double *mfcc_ptr, const unsigned int mfcc_len, const unsigned int mfcc_coeffs, double *norm2_ptr) { - unsigned int i, k; +void _compute_norm2(double *mfcc_ptr, const uint32_t mfcc_len, const uint32_t mfcc_size, double *norm2_ptr) { + uint32_t i, k; double v, sum; for 
(i = 0; i < mfcc_len; ++i) { sum = 0.0; - for (k = 0; k < mfcc_coeffs; ++k) { + for (k = 0; k < mfcc_size; ++k) { v = mfcc_ptr[k * mfcc_len + i]; sum += v * v; } @@ -101,31 +104,34 @@ void _compute_norm2(double *mfcc_ptr, const unsigned int mfcc_len, const unsigne } // compute cost matrix from mfcc? -void _compute_cost_matrix( +int _compute_cost_matrix( double *mfcc1_ptr, // pointer to the MFCCs of the first wave (2D, l x n) double *mfcc2_ptr, // pointer to the MFCCs of the second wave (2D, l x m) - const unsigned int delta, // margin parameter + const uint32_t delta, // margin parameter double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) - unsigned int *centers_ptr, // pointer to the centers (1D, n); centers[i] = center for the i-th row; delta/2 <= centers[i] < m - delta/2 - const unsigned int n, // number of frames of the first wave - const unsigned int m, // number of frames of the second wave - const unsigned int l // number of MFCCs + uint32_t *centers_ptr, // pointer to the centers (1D, n); centers[i] = center for the i-th row + const uint32_t n, // number of frames (MFCC vectors) of the first wave + const uint32_t m, // number of frames (MFCC vectors) of the second wave + const uint32_t l // MFCC size ) { double *norm2_1_ptr, *norm2_2_ptr; double sum; - unsigned int center_j, range_start, range_end; - unsigned int i, j, k; + uint32_t center_j, range_start, range_end; + uint32_t i, j, k; // compute norm2 vectors norm2_1_ptr = (double *)calloc(n, sizeof(double)); norm2_2_ptr = (double *)calloc(m, sizeof(double)); + if ((norm2_1_ptr == NULL) || (norm2_2_ptr == NULL)) { + return CDTW_FAILURE; + } _compute_norm2(mfcc1_ptr, n, l, norm2_1_ptr); _compute_norm2(mfcc2_ptr, m, l, norm2_2_ptr); for (i = 0; i < n; ++i) { center_j = (int)floor(m * (1.0 * i / n)); - range_start = _max(0, center_j - (delta / 2)); + range_start = _nonnegative_difference(center_j, delta / 2); range_end = range_start + delta; if (range_end > m) { range_end = m; @@ -144,20 
+150,21 @@ void _compute_cost_matrix( // deallocate norm2 vectors as they are no longer needed free((void *)norm2_1_ptr); free((void *)norm2_2_ptr); + return CDTW_SUCCESS; } // compute accumulated cost matrix, not in-place -void _compute_accumulated_cost_matrix( - double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) - unsigned int *centers_ptr, // pointer to the centers (1D, n) - unsigned int n, // number of frames of the first wave - unsigned int delta, // margin parameter - double *accumulated_cost_matrix_ptr // pointer to the accumulated cost matrix (2D, n x delta) +int _compute_accumulated_cost_matrix( + const double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) + const uint32_t *centers_ptr, // pointer to the centers (1D, n) + const uint32_t n, // number of frames of the first wave + const uint32_t delta, // margin parameter + double *accumulated_cost_matrix_ptr // pointer to the accumulated cost matrix (2D, n x delta) ) { double cost0, cost1, cost2; - unsigned int current_idx, offset; - unsigned int i, j; + uint32_t current_idx, offset; + uint32_t i, j; accumulated_cost_matrix_ptr[0] = cost_matrix_ptr[0]; for (j = 1; j < delta; ++j) { @@ -182,29 +189,33 @@ void _compute_accumulated_cost_matrix( accumulated_cost_matrix_ptr[current_idx] = cost_matrix_ptr[current_idx] + _three_way_min(cost0, cost1, cost2); } } + return CDTW_SUCCESS; } // compute accumulated cost matrix, in-place // (i.e., this function overwrites cost_matrix with the accumulated cost values) -void _compute_accumulated_cost_matrix_in_place( - double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) - unsigned int *centers_ptr, // pointer to the centers (1D, n) - const unsigned int n, // number of frames of the first wave - const unsigned int delta // margin parameter +int _compute_accumulated_cost_matrix_in_place( + double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) + const uint32_t *centers_ptr, // pointer to the centers (1D, 
n) + const uint32_t n, // number of frames of the first wave + const uint32_t delta // margin parameter ) { double *current_row_ptr; double cost0, cost1, cost2; - unsigned int current_idx, offset; - unsigned int i, j; + uint32_t current_idx, offset; + uint32_t i, j; // to compute the i-th row of the accumulated cost matrix // we only need the i-th row of the cost matrix - current_row_ptr = (double *)malloc(delta * sizeof(double)); + current_row_ptr = (double *)calloc(delta, sizeof(double)); + if (current_row_ptr == NULL) { + return CDTW_FAILURE; + } // copy the first row of cost_matrix_ptr to current row buffer _copy_cost_matrix_row(cost_matrix_ptr, 0, delta, current_row_ptr); - //cost_matrix_ptr[0] = current_row_ptr[0]; + //cost_matrix_ptr[0] = current_row_ptr[0]; // not needed! for (j = 1; j < delta; ++j) { cost_matrix_ptr[j] = current_row_ptr[j] + cost_matrix_ptr[j-1]; } @@ -230,21 +241,22 @@ void _compute_accumulated_cost_matrix_in_place( } } free((void *)current_row_ptr); + return CDTW_SUCCESS; } // compute best path and return it as a list of (i, j) tuples, from (0,0) to (n-1, delta-1) -void _compute_best_path( - double *accumulated_cost_matrix_ptr, // pointer to the accumulated cost matrix (2D, n x delta) - unsigned int *centers_ptr, // pointer to the centers (1D, n) - const unsigned int n, // number of frames of the first wave - const unsigned int delta, // margin parameter - struct PATH_CELL **best_path_ptr, // pointer to the list of cells making the best path - unsigned int *best_path_len // length of the best path +int _compute_best_path( + const double *accumulated_cost_matrix_ptr, // pointer to the accumulated cost matrix (2D, n x delta) + const uint32_t *centers_ptr, // pointer to the centers (1D, n) + const uint32_t n, // number of frames of the first wave + const uint32_t delta, // margin parameter + struct PATH_CELL **best_path_ptr, // pointer to the list of cells making the best path + uint32_t *best_path_len // length of the best path ) { double 
cost0, cost1, cost2; - unsigned int argmin, offset; - unsigned int i, j, k, r_j, max_path_len; + uint32_t argmin, r_j, offset; + uint32_t i, j, k, max_path_len; // allocate space for keeping the best path // @@ -256,6 +268,9 @@ void _compute_best_path( // max_path_len = n + centers_ptr[n-1] + delta; *best_path_ptr = (struct PATH_CELL *)calloc(max_path_len, sizeof(struct PATH_CELL)); + if ((*best_path_ptr) == NULL) { + return CDTW_FAILURE; + } i = n - 1; j = centers_ptr[i] + delta - 1; @@ -283,9 +298,9 @@ void _compute_best_path( cost2 = accumulated_cost_matrix_ptr[(i-1) * delta + (r_j+offset-1)]; } argmin = _three_way_argmin(cost0, cost1, cost2); - if (argmin == 0) { + if (argmin == MOVE0) { _append(*best_path_ptr, k++, --i, j); - } else if (argmin == 1) { + } else if (argmin == MOVE1) { _append(*best_path_ptr, k++, i, --j); } else { _append(*best_path_ptr, k++, --i, --j); @@ -298,6 +313,7 @@ void _compute_best_path( // reverse the path _reverse(*best_path_ptr, k); + return CDTW_SUCCESS; } diff --git a/aeneas/cdtw/cdtw_func.h b/aeneas/cdtw/cdtw_func.h index a7f3f16c..abef0212 100644 --- a/aeneas/cdtw/cdtw_func.h +++ b/aeneas/cdtw/cdtw_func.h @@ -9,55 +9,60 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" */ +#include "cint.h" + +#define CDTW_SUCCESS 0 +#define CDTW_FAILURE 1 + struct PATH_CELL { - unsigned int i; - unsigned int j; + uint32_t i; // row index in the virtual full matrix (n x m) + uint32_t j; // column index in the virtual full matrix (n x m) }; // compute cost matrix from mfcc? 
-void _compute_cost_matrix( - double *mfcc1_ptr, // pointer to the MFCCs of the first wave (2D, l x n) - double *mfcc2_ptr, // pointer to the MFCCs of the second wave (2D, l x m) - const unsigned int delta, // margin parameter - double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) - unsigned int *centers_ptr, // pointer to the centers (1D, n); centers[i] = center for the i-th row; delta/2 <= centers[i] < m - delta/2 - const unsigned int n, // number of frames of the first wave - const unsigned int m, // number of frames of the second wave - const unsigned int l // number of MFCCs +int _compute_cost_matrix( + double *mfcc1_ptr, // pointer to the MFCCs of the first wave (2D, l x n) + double *mfcc2_ptr, // pointer to the MFCCs of the second wave (2D, l x m) + const uint32_t delta, // margin parameter + double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) + uint32_t *centers_ptr, // pointer to the centers (1D, n); centers[i] = center for the i-th row + const uint32_t n, // number of frames (MFCC vectors) of the first wave + const uint32_t m, // number of frames (MFCC vectors) of the second wave + const uint32_t l // MFCC size ); // compute accumulated cost matrix, not in-place -void _compute_accumulated_cost_matrix( - double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) - unsigned int *centers_ptr, // pointer to the centers (1D, n) - const unsigned int n, // number of frames of the first wave - const unsigned int delta, // margin parameter - double *accumulated_cost_matrix_ptr // pointer to the accumulated cost matrix (2D, n x delta) +int _compute_accumulated_cost_matrix( + const double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) + const uint32_t *centers_ptr, // pointer to the centers (1D, n) + const uint32_t n, // number of frames of the first wave + const uint32_t delta, // margin parameter + double *accumulated_cost_matrix_ptr // pointer to the accumulated cost matrix (2D, n x delta) ); // 
compute accumulated cost matrix, in-place // (i.e., this function overwrites cost_matrix with the accumulated cost values) -void _compute_accumulated_cost_matrix_in_place( - double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) - unsigned int *centers_ptr, // pointer to the centers (1D, n) - const unsigned int n, // number of frames of the first wave - const unsigned int delta // margin parameter +int _compute_accumulated_cost_matrix_in_place( + double *cost_matrix_ptr, // pointer to the cost matrix (2D, n x delta) + const uint32_t *centers_ptr, // pointer to the centers (1D, n) + const uint32_t n, // number of frames of the first wave + const uint32_t delta // margin parameter ); -// compute best path and return it as a list of (i, j) tuples, from (0,0) to (n-1, delta-1) -void _compute_best_path( - double *accumulated_cost_matrix_ptr, // pointer to the accumulated cost matrix (2D, n x delta) - unsigned int *centers_ptr, // pointer to the centers (1D, n) - const unsigned int n, // number of frames of the first wave - const unsigned int delta, // margin parameter - struct PATH_CELL **best_path_ptr, // pointer to the list of cells making the best path - unsigned int *best_path_len // length of the best path +// compute best path and return it as a list of (i, j) tuples, from (0,0) to (n-1, m-1) +int _compute_best_path( + const double *accumulated_cost_matrix_ptr, // pointer to the accumulated cost matrix (2D, n x delta) + const uint32_t *centers_ptr, // pointer to the centers (1D, n) + const uint32_t n, // number of frames of the first wave + const uint32_t delta, // margin parameter + struct PATH_CELL **best_path_ptr, // pointer to the list of cells making the best path + uint32_t *best_path_len // length of the best path ); diff --git a/aeneas/cdtw/cdtw_py.c b/aeneas/cdtw/cdtw_py.c index fc004679..14e516ea 100644 --- a/aeneas/cdtw/cdtw_py.c +++ b/aeneas/cdtw/cdtw_py.c @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin 
(www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -25,7 +25,7 @@ __status__ = "Production" #include "cdtw_func.h" // append a new tuple (i, j) to the given list -static void _append(PyObject *list, unsigned int i, unsigned int j) { +static void _append(PyObject *list, uint32_t i, uint32_t j) { PyObject *tuple; tuple = PyTuple_New(2); @@ -36,15 +36,10 @@ static void _append(PyObject *list, unsigned int i, unsigned int j) { } // convert array of struct to list of tuples -static void _array_to_list(struct PATH_CELL *best_path, unsigned int best_path_length, PyObject *list) { - //unsigned int i, j; - unsigned int k; +static void _array_to_list(struct PATH_CELL *best_path, uint32_t best_path_length, PyObject *list) { + uint32_t k; for (k = 0; k < best_path_length; ++k) { - //i = (*best_path).i; - //j = (*best_path).j; - //printf("k = %d : i = %d, j = %d\n", k, (int)i, (int)j); - //printf("k = %d : i = %d, j = %d\n", k, best_path[k].i, best_path[k].j); _append(list, best_path[k].i, best_path[k].j); } } @@ -53,22 +48,22 @@ static void _array_to_list(struct PATH_CELL *best_path, unsigned int best_path_l // take the PyObject containing the following arguments: // - mfcc1: 2D array (l x n) of double, MFCCs of the first wave // - mfcc2: 2D array (l x m) of double, MFCCs of the second wave -// - delta: int, the number of frames of margin -// and return the best path as a list of (i, j) tuples, from (0,0) to (n-1, delta-1) +// - delta: uint, the number of frames of margin +// and return the best path as a list of (i, j) tuples, from (0,0) to (n-1, m-1) static PyObject *compute_best_path(PyObject *self, PyObject *args) { PyObject *mfcc1_raw; PyObject *mfcc2_raw; - unsigned int delta; + uint32_t delta; PyArrayObject *mfcc1, *mfcc2, *cost_matrix, *centers; PyObject *best_path_ptr; npy_intp cost_matrix_dimensions[2]; npy_intp centers_dimensions[1]; double *mfcc1_ptr, 
*mfcc2_ptr, *cost_matrix_ptr; - unsigned int *centers_ptr; - unsigned int l1, l2, n, m; + uint32_t *centers_ptr; + uint32_t l1, l2, n, m; struct PATH_CELL *best_path; - unsigned int best_path_length; + uint32_t best_path_length; // O = object (do not convert or check for errors) // I = unsigned int @@ -87,13 +82,11 @@ static PyObject *compute_best_path(PyObject *self, PyObject *args) { return NULL; } - // NOTE: if arrived here, the mfcc? have the correct number of dimensions (2) - // get the dimensions of the input arguments l1 = PyArray_DIMS(mfcc1)[0]; // number of MFCCs in the first wave l2 = PyArray_DIMS(mfcc2)[0]; // number of MFCCs in the second wave - n = PyArray_DIMS(mfcc1)[1]; // number of frames in the first wave - m = PyArray_DIMS(mfcc2)[1];; // number of frames in the second wave + n = PyArray_DIMS(mfcc1)[1]; // number of frames in the first wave + m = PyArray_DIMS(mfcc2)[1]; // number of frames in the second wave // check that the number of MFCCs is the same for both waves if (l1 != l2) { @@ -107,8 +100,8 @@ static PyObject *compute_best_path(PyObject *self, PyObject *args) { } // pointer to cost matrix data - mfcc1_ptr = (double *)PyArray_DATA(mfcc1); - mfcc2_ptr = (double *)PyArray_DATA(mfcc2); + mfcc1_ptr = (double *)PyArray_DATA(mfcc1); + mfcc2_ptr = (double *)PyArray_DATA(mfcc2); // create cost matrix object cost_matrix_dimensions[0] = n; @@ -118,13 +111,36 @@ static PyObject *compute_best_path(PyObject *self, PyObject *args) { // create centers object centers_dimensions[0] = n; - centers = (PyArrayObject *)PyArray_SimpleNew(1, centers_dimensions, NPY_INT32); - centers_ptr = (unsigned int *)PyArray_DATA(centers); + centers = (PyArrayObject *)PyArray_SimpleNew(1, centers_dimensions, NPY_UINT32); + centers_ptr = (uint32_t *)PyArray_DATA(centers); // actual computation - _compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, n, m, l1); - _compute_accumulated_cost_matrix_in_place(cost_matrix_ptr, centers_ptr, n, delta); - 
_compute_best_path(cost_matrix_ptr, centers_ptr, n, delta, &best_path, &best_path_length); + if (_compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, n, m, l1) != CDTW_SUCCESS) { + Py_XDECREF(mfcc1); + Py_XDECREF(mfcc2); + Py_XDECREF(cost_matrix); + Py_XDECREF(centers); + PyErr_SetString(PyExc_ValueError, "Error while computing cost matrix"); + return NULL; + } + + if (_compute_accumulated_cost_matrix_in_place(cost_matrix_ptr, centers_ptr, n, delta) != CDTW_SUCCESS) { + Py_XDECREF(mfcc1); + Py_XDECREF(mfcc2); + Py_XDECREF(cost_matrix); + Py_XDECREF(centers); + PyErr_SetString(PyExc_ValueError, "Error while computing accumulated cost matrix"); + return NULL; + } + + if (_compute_best_path(cost_matrix_ptr, centers_ptr, n, delta, &best_path, &best_path_length) != CDTW_SUCCESS) { + Py_XDECREF(mfcc1); + Py_XDECREF(mfcc2); + Py_XDECREF(cost_matrix); + Py_XDECREF(centers); + PyErr_SetString(PyExc_ValueError, "Error while computing best path"); + return NULL; + } // convert array of struct to list of tuples best_path_ptr = PyList_New(0); @@ -145,22 +161,22 @@ static PyObject *compute_best_path(PyObject *self, PyObject *args) { // take the PyObject containing the following arguments: // - mfcc1: 2D array (l x n) of double, MFCCs of the first wave // - mfcc2: 2D array (l x m) of double, MFCCs of the second wave -// - delta: int, the number of frames of margin +// - delta: uint, the number of frames of margin // and return a tuple (cost_matrix, centers), where // - cost_matrix: 2D array (n x delta) of double -// - centers: 1D array (n x 1) of int, centers[i] is the 0 <= center < m of the stripe at row i +// - centers: 1D array (n x 1) of uint, centers[i] is the 0 <= center < m of the stripe at row i static PyObject *compute_cost_matrix_step(PyObject *self, PyObject *args) { PyObject *mfcc1_raw; PyObject *mfcc2_raw; - unsigned int delta; + uint32_t delta; PyArrayObject *mfcc1, *mfcc2, *cost_matrix, *centers; PyObject *tuple; npy_intp 
cost_matrix_dimensions[2]; npy_intp centers_dimensions[1]; double *mfcc1_ptr, *mfcc2_ptr, *cost_matrix_ptr; - unsigned int *centers_ptr; - unsigned int l1, l2, n, m; + uint32_t *centers_ptr; + uint32_t l1, l2, n, m; // O = object (do not convert or check for errors) // I = unsigned int @@ -179,13 +195,11 @@ static PyObject *compute_cost_matrix_step(PyObject *self, PyObject *args) { return NULL; } - // NOTE: if arrived here, the mfcc? have the correct number of dimensions (2) - // get the dimensions of the input arguments l1 = PyArray_DIMS(mfcc1)[0]; // number of MFCCs in the first wave l2 = PyArray_DIMS(mfcc2)[0]; // number of MFCCs in the second wave - n = PyArray_DIMS(mfcc1)[1]; // number of frames in the first wave - m = PyArray_DIMS(mfcc2)[1];; // number of frames in the second wave + n = PyArray_DIMS(mfcc1)[1]; // number of frames in the first wave + m = PyArray_DIMS(mfcc2)[1]; // number of frames in the second wave // check that the number of MFCCs is the same for both waves if (l1 != l2) { @@ -199,8 +213,8 @@ static PyObject *compute_cost_matrix_step(PyObject *self, PyObject *args) { } // pointer to cost matrix data - mfcc1_ptr = (double *)PyArray_DATA(mfcc1); - mfcc2_ptr = (double *)PyArray_DATA(mfcc2); + mfcc1_ptr = (double *)PyArray_DATA(mfcc1); + mfcc2_ptr = (double *)PyArray_DATA(mfcc2); // create cost matrix object cost_matrix_dimensions[0] = n; @@ -210,11 +224,18 @@ static PyObject *compute_cost_matrix_step(PyObject *self, PyObject *args) { // create centers object centers_dimensions[0] = n; - centers = (PyArrayObject *)PyArray_SimpleNew(1, centers_dimensions, NPY_INT32); - centers_ptr = (unsigned int *)PyArray_DATA(centers); + centers = (PyArrayObject *)PyArray_SimpleNew(1, centers_dimensions, NPY_UINT32); + centers_ptr = (uint32_t *)PyArray_DATA(centers); // compute cost matrix - _compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, centers_ptr, n, m, l1); + if (_compute_cost_matrix(mfcc1_ptr, mfcc2_ptr, delta, cost_matrix_ptr, 
centers_ptr, n, m, l1) != CDTW_SUCCESS) { + Py_XDECREF(mfcc1); + Py_XDECREF(mfcc2); + Py_XDECREF(cost_matrix); + Py_XDECREF(centers); + PyErr_SetString(PyExc_ValueError, "Error while computing cost matrix"); + return NULL; + } // decrement reference to local object no longer needed Py_DECREF(mfcc1); @@ -242,8 +263,8 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject * PyArrayObject *cost_matrix, *centers, *accumulated_cost_matrix; npy_intp accumulated_cost_matrix_dimensions[2]; double *cost_matrix_ptr, *accumulated_cost_matrix_ptr; - unsigned int *centers_ptr; - unsigned int n, delta; + uint32_t *centers_ptr; + uint32_t n, delta; // O = object (do not convert or check for errors) if (!PyArg_ParseTuple(args, "OO", &cost_matrix_raw, &centers_raw)) { @@ -253,7 +274,7 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject * // convert to C contiguous array cost_matrix = (PyArrayObject *) PyArray_ContiguousFromAny(cost_matrix_raw, NPY_DOUBLE, 2, 2); - centers = (PyArrayObject *) PyArray_ContiguousFromAny(centers_raw, NPY_INT32, 1, 1); + centers = (PyArrayObject *) PyArray_ContiguousFromAny(centers_raw, NPY_UINT32, 1, 1); // pointer to cost matrix data cost_matrix_ptr = (double *)PyArray_DATA(cost_matrix); @@ -269,7 +290,7 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject * } // pointer to centers data - centers_ptr = (unsigned int *)PyArray_DATA(centers); + centers_ptr = (uint32_t *)PyArray_DATA(centers); // create accumulated cost matrix object accumulated_cost_matrix_dimensions[0] = n; @@ -280,7 +301,12 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject * accumulated_cost_matrix_ptr = (double *)PyArray_DATA(accumulated_cost_matrix); // compute accumulated cost matrix - _compute_accumulated_cost_matrix(cost_matrix_ptr, centers_ptr, n, delta, accumulated_cost_matrix_ptr); + if (_compute_accumulated_cost_matrix(cost_matrix_ptr, centers_ptr, n, delta,
accumulated_cost_matrix_ptr) != CDTW_SUCCESS) { + Py_XDECREF(cost_matrix); + Py_XDECREF(centers); + PyErr_SetString(PyExc_ValueError, "Error while computing accumulated cost matrix"); + return NULL; + } // decrement reference to local object no longer needed Py_DECREF(cost_matrix); @@ -294,7 +320,7 @@ static PyObject *compute_accumulated_cost_matrix_step(PyObject *self, PyObject * // take the PyObject containing the following arguments: // - accumulated_cost_matrix: 2D array (n x delta) of double // - centers: 1D array (n x 1) of int, centers[i] is the 0 <= center < m of the stripe at row i -// and return the best path as a list of (i, j) tuples, from (0,0) to (n-1, delta-1) +// and return the best path as a list of (i, j) tuples, from (0,0) to (n-1, m-1) static PyObject *compute_best_path_step(PyObject *self, PyObject *args) { PyObject *accumulated_cost_matrix_raw; PyObject *centers_raw; @@ -302,10 +328,10 @@ static PyObject *compute_best_path_step(PyObject *self, PyObject *args) { PyArrayObject *accumulated_cost_matrix, *centers; PyObject *best_path_ptr; double *accumulated_cost_matrix_ptr; - unsigned int *centers_ptr; - unsigned int n, delta; + uint32_t *centers_ptr; + uint32_t n, delta; struct PATH_CELL *best_path; - unsigned int best_path_length; + uint32_t best_path_length; // O = object (do not convert or check for errors) if (!PyArg_ParseTuple(args, "OO", &accumulated_cost_matrix_raw, &centers_raw)) { @@ -315,7 +341,7 @@ static PyObject *compute_best_path_step(PyObject *self, PyObject *args) { // convert to C contiguous array accumulated_cost_matrix = (PyArrayObject *) PyArray_ContiguousFromAny(accumulated_cost_matrix_raw, NPY_DOUBLE, 2, 2); - centers = (PyArrayObject *) PyArray_ContiguousFromAny(centers_raw, NPY_INT32, 1, 1); + centers = (PyArrayObject *) PyArray_ContiguousFromAny(centers_raw, NPY_UINT32, 1, 1); // pointer to cost matrix data accumulated_cost_matrix_ptr = (double *)PyArray_DATA(accumulated_cost_matrix); @@ -331,13 +357,18 @@ static PyObject
*compute_best_path_step(PyObject *self, PyObject *args) { } // pointer to centers data - centers_ptr = (unsigned int *)PyArray_DATA(centers); + centers_ptr = (uint32_t *)PyArray_DATA(centers); // create best path array of integers best_path_ptr = PyList_New(0); // compute best path - _compute_best_path(accumulated_cost_matrix_ptr, centers_ptr, n, delta, &best_path, &best_path_length); + if (_compute_best_path(accumulated_cost_matrix_ptr, centers_ptr, n, delta, &best_path, &best_path_length) != CDTW_SUCCESS) { + Py_XDECREF(accumulated_cost_matrix); + Py_XDECREF(centers); + PyErr_SetString(PyExc_ValueError, "Error while computing best path"); + return NULL; + } // convert array of struct to list of tuples _array_to_list(best_path, best_path_length, best_path_ptr); @@ -355,31 +386,43 @@ static PyObject *compute_best_path_step(PyObject *self, PyObject *args) { static PyMethodDef cdtw_methods[] = { - // compute best path at once { "compute_best_path", compute_best_path, METH_VARARGS, - "Given the MFCCs of the two waves, compute and return the DTW best path at once" + "Given the MFCCs of the two waves, compute and return the DTW best path at once\n" + ":param object mfcc1: numpy 2D matrix (mfcc_size, n) of MFCCs of the first wave\n" + ":param object mfcc2: numpy 2D matrix (mfcc_size, m) of MFCCs of the second wave\n" + ":param uint delta: the margin, in number of frames\n" + ":rtype: a list of tuples (i, j), from (0, 0) to (n-1, m-1) representing the best path" }, - // compute in separate steps { "compute_cost_matrix_step", compute_cost_matrix_step, METH_VARARGS, - "Given the MFCCs of the two waves, compute and return the DTW cost matrix" + "Given the MFCCs of the two waves, compute and return the DTW cost matrix\n" + ":param object mfcc1: numpy 2D matrix (mfcc_size, n) of MFCCs of the first wave\n" + ":param object mfcc2: numpy 2D matrix (mfcc_size, m) of MFCCs of the second wave\n" + ":param uint delta: the margin, in number of frames\n" + ":rtype: tuple
(cost_matrix, centers)" }, { "compute_accumulated_cost_matrix_step", compute_accumulated_cost_matrix_step, METH_VARARGS, - "Given the DTW cost matrix, compute and return the DTW accumulated cost matrix" + "Given the DTW cost matrix, compute and return the DTW accumulated cost matrix\n" + ":param object cost_matrix: the cost matrix (n, delta)\n" + ":param object centers: the centers (n)\n" + ":rtype: the accumulated cost matrix" }, { "compute_best_path_step", compute_best_path_step, METH_VARARGS, - "Given the DTW accumulated cost matrix, compute and return the DTW best path" + "Given the DTW accumulated cost matrix, compute and return the DTW best path\n" + ":param object accumulated_cost_matrix: the accumulated cost matrix (n, delta)\n" + ":param object centers: the centers (n)\n" + ":rtype: a list of tuples (i, j), from (0, 0) to (n-1, m-1) representing the best path" }, { NULL, diff --git a/aeneas/cdtw/cdtw_setup.py b/aeneas/cdtw/cdtw_setup.py index a844b8fe..aab0c739 100644 --- a/aeneas/cdtw/cdtw_setup.py +++ b/aeneas/cdtw/cdtw_setup.py @@ -23,15 +23,15 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -CMODULE = Extension("cdtw", sources=["cdtw_py.c", "cdtw_func.c"], include_dirs=[get_include()]) +CMODULE = Extension("cdtw", sources=["cdtw_py.c", "cdtw_func.c", "cint.c"], include_dirs=[get_include()]) setup( name="cdtw", - version="1.4.1", + version="1.5.0", description=""" Python C Extension for computing the DTW as fast as your bare metal allows. 
""", diff --git a/aeneas/cdtw/cint.c b/aeneas/cdtw/cint.c new file mode 120000 index 00000000..8e1c9dae --- /dev/null +++ b/aeneas/cdtw/cint.c @@ -0,0 +1 @@ +../cint/cint.c \ No newline at end of file diff --git a/aeneas/cdtw/cint.h b/aeneas/cdtw/cint.h new file mode 120000 index 00000000..27a6bb39 --- /dev/null +++ b/aeneas/cdtw/cint.h @@ -0,0 +1 @@ +../cint/cint.h \ No newline at end of file diff --git a/aeneas/cew/000_compile_driver.sh b/aeneas/cew/000_compile_driver.sh new file mode 100644 index 00000000..e26727c6 --- /dev/null +++ b/aeneas/cew/000_compile_driver.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +gcc cew_driver.c cew_func.c -lespeak -o cew_driver -Wall -pedantic -std=c99 + + + diff --git a/aeneas/cew/100_run_driver.sh b/aeneas/cew/100_run_driver.sh new file mode 100644 index 00000000..f11e4eef --- /dev/null +++ b/aeneas/cew/100_run_driver.sh @@ -0,0 +1,30 @@ +#!/bin/bash + +if [ ! -e cew_driver ] +then + bash 000_compile_driver.sh +fi + +echo "Run 1" +./cew_driver +echo "" + +echo "Run 2" +./cew_driver en "Hello World" /tmp/out.wav single +echo "" + +echo "Run 3" +./cew_driver en "Hello|World|My|Dear|Friend" /tmp/out.wav multi 0.0 0 +echo "" + +echo "Run 4" +./cew_driver en "Hello|World|My|Dear|Friend" /tmp/out.wav multi 0.0 1 +echo "" + +echo "Run 4" +./cew_driver en "Hello|World|My|Dear|Friend" /tmp/out.wav multi 2.0 1 +echo "" + + + + diff --git a/aeneas/cew/800_compile_py.sh b/aeneas/cew/800_compile_py.sh new file mode 100644 index 00000000..390950af --- /dev/null +++ b/aeneas/cew/800_compile_py.sh @@ -0,0 +1,5 @@ +#!/bin/bash + +rm -rf build *.so +python cew_setup.py build_ext --inplace + diff --git a/aeneas/cew/README.md b/aeneas/cew/README.md new file mode 100644 index 00000000..8925eb4f --- /dev/null +++ b/aeneas/cew/README.md @@ -0,0 +1,22 @@ +# aeneas.cew + +**aeneas.cew** is a Python C extension to synthesize text with eSpeak. + +## API + +See the [__init__.py](__init__.py) file. 
+ +## Compiling the Python C extension locally + +```bash +$ python cew_setup.py build_ext --inplace +``` + +## Compiling the pure C driver program + +```bash +$ bash 000_compile_driver.sh +``` + + + diff --git a/aeneas/cew/__init__.py b/aeneas/cew/__init__.py index 96f3cb04..eea4eea4 100644 --- a/aeneas/cew/__init__.py +++ b/aeneas/cew/__init__.py @@ -2,7 +2,56 @@ # coding=utf-8 """ -aeneas.cew is a Python C extension to synthesize text with eSpeak +aeneas.cew is a Python C extension to synthesize text with eSpeak. + +The functions provided by this module are: + +.. function:: cew.synthesize_single(output_file_path, voice_code, text) + + Synthesize a single text fragment into a single WAVE file. + + The returned tuple ``(sr, begin, end)`` contains + the sample rate and the begin and end time values + of the output WAVE file. + + Note that ``begin`` is always ``0.0``, while ``end`` is equal to the + duration of the synthesized WAVE file, in seconds. + + :param string output_file_path: the path of the WAVE file to be created, UTF-8 encoded + :param string voice_code: the eSpeak voice code (e.g., ``en``, ``en-gb``, ``it``, etc.) + :param string text: the text to be synthesized, UTF-8 encoded + :rtype: tuple + + +.. function:: cew.synthesize_multiple(output_file_path, quit_after, backwards, text) + + Synthesize several text fragments into a single WAVE file. + + The returned tuple ``(sr, synt, anchors)`` contains + the sample rate of the output WAVE file, + the number of fragments actually synthesized, + and a list of time values, each representing + the begin time in the output WAVE file + of the corresponding text fragment. + + Note that if ``quit_after`` is specified, + the number ``synt`` of fragments actually synthesized + might be less than the number of fragments in ``text``. 
+ + :param string output_file_path: the path of the WAVE file to be created, UTF-8 encoded + :param float quit_after: stop synthesizing after reaching the given duration (in seconds) + :param int backwards: if nonzero, synthesize backwards, that is, + starting from the last fragment. + In any case, the fragments in the output WAVE file + will be in natural order. + This option is meaningful only if ``quit_after > 0``. + :param list text: a list of ``(voice_code, fragment_text)`` tuples + with the text to be synthesized. + The ``voice_code`` is the eSpeak voice code + (e.g., ``en``, ``en-gb``, ``it``, etc.). + The ``fragment_text`` must be UTF-8 encoded. + :rtype: tuple + """ __author__ = "Alberto Pettarin" @@ -12,7 +61,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL 3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" diff --git a/aeneas/cew/cew_driver.c b/aeneas/cew/cew_driver.c index 3f6dbd71..65354bc9 100644 --- a/aeneas/cew/cew_driver.c +++ b/aeneas/cew/cew_driver.c @@ -1,6 +1,6 @@ /* -Python C Extension for computing the MFCC +Python C Extension for synthesizing text with eSpeak __author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -21,17 +21,26 @@ __status__ = "Production" #include "cew_func.h" +#define DRIVER_SUCCESS 0 +#define DRIVER_FAILURE 1 + // print usage -void usage(const char *prog) { +void _usage(const char *prog) { + printf("\n"); + printf("Usage: %s VOICE_CODE TEXT AUDIO_FILE.wav single\n", prog); + printf(" %s VOICE_CODE TEXT AUDIO_FILE.wav multi QUIT_AFTER BACKWARDS\n", prog); + printf("\n"); + printf("Example: %s en \"Hello World\" /tmp/out.wav single\n", prog); + printf(" %s en
\"Hello|World|My|Dear|Friend\" /tmp/out.wav multi 0.0 0\n", prog); + printf(" %s en \"Hello|World|My|Dear|Friend\" /tmp/out.wav multi 0.0 1\n", prog); + printf(" %s en \"Hello|World|My|Dear|Friend\" /tmp/out.wav multi 2.0 1\n", prog); printf("\n"); - printf("Usage: %s VOICE_CODE TEXT AUDIO_FILE.wav single\n", prog); - printf(" %s VOICE_CODE TEXT AUDIO_FILE.wav multi QUIT_AFTER BACKWARDS\n\n", prog); } // split a given string using a delimiter character -// adapted from http://stackoverflow.com/questions/9210528/split-string-with-delimiters-in-c -char** str_split(char* a_str, const char a_delim, int *count) -{ +// adapted from +// http://stackoverflow.com/questions/9210528/split-string-with-delimiters-in-c +char **_str_split(char* a_str, const char a_delim, int *count) { char** result = 0; char* tmp = a_str; char* last_delim = 0; @@ -51,8 +60,8 @@ char** str_split(char* a_str, const char a_delim, int *count) // add space for trailing token (*count) += last_delim < (a_str + strlen(a_str) - 1); - result = malloc(sizeof(char*) * (*count)); - + // tokenize + result = calloc((*count), sizeof(char*)); if (result) { size_t idx = 0; char* token = strtok(a_str, delim); @@ -66,19 +75,6 @@ char** str_split(char* a_str, const char a_delim, int *count) return result; } -// -// this is a simple driver to test on the command line -// compile with: -// -// gcc cew_driver.c cew_func.c -lespeak -o cew_driver -// -// and use it as: -// -// ./cew_driver en "Hello World" out.wav single => synth single -// ./cew_driver en "Hello|World|My|Dear|Friend" out.wav multi 0.0 0 => synth multi normal -// ./cew_driver en "Hello|World|My|Dear|Friend" out.wav multi 0.0 1 => synth multi normal, quit after reaching 2.0 seconds -// ./cew_driver en "Hello|World|My|Dear|Friend" out.wav multi 2.0 1 => synth multi backwards, quit after reaching 2.0 seconds -// int main(int argc, char **argv) { const char *voice_code, *text, *output_file_name, *mode; @@ -88,11 +84,11 @@ int main(int argc, char **argv) { 
struct FRAGMENT_INFO *fragments; char **texts; int i, n; - unsigned int synthesized_ret; + size_t synthesized_ret; if (argc < 5) { - usage(argv[0]); - return 1; + _usage(argv[0]); + return DRIVER_FAILURE; } voice_code = argv[1]; text = argv[2]; @@ -101,15 +97,15 @@ int main(int argc, char **argv) { if (strcmp(mode, "multi") == 0) { if (argc < 7) { - usage(argv[0]); - return 1; + _usage(argv[0]); + return DRIVER_FAILURE; } quit_after = (float)atof(argv[5]); backwards = atoi(argv[6]); // split text into fragments n = 0; - texts = str_split((char *)text, '|', &n); + texts = _str_split((char *)text, '|', &n); // create fragments fragments = (struct FRAGMENT_INFO *)calloc(sizeof(fragment), n); @@ -127,12 +123,12 @@ int main(int argc, char **argv) { backwards, &sample_rate_ret, &synthesized_ret - ) != 0) { + ) != CEW_SUCCESS) { printf("Error while calling _synthesize_single()\n"); - return 1; + return DRIVER_FAILURE; } printf("Sample rate: %d\n", sample_rate_ret); - printf("Synthesized: %u\n", synthesized_ret); + printf("Synthesized: %zu\n", synthesized_ret); for (i = 0; i < synthesized_ret; ++i) { printf("%d %.3f %.3f\n", i, fragments[i].begin, fragments[i].end); } @@ -148,15 +144,15 @@ int main(int argc, char **argv) { } else { fragment.voice_code = voice_code; fragment.text = text; - if (_synthesize_single(output_file_name, &sample_rate_ret, &fragment) != 0) { + if (_synthesize_single(output_file_name, &sample_rate_ret, &fragment) != CEW_SUCCESS) { printf("Error while calling _synthesize_single()\n"); - return 1; + return DRIVER_FAILURE; } printf("Sample rate: %d\n", sample_rate_ret); printf("Begin: %.3f\n", fragment.begin); printf("End: %.3f\n", fragment.end); } - return 0; + return DRIVER_SUCCESS; } diff --git a/aeneas/cew/cew_func.c b/aeneas/cew/cew_func.c index a4481fcb..2cfee22b 100644 --- a/aeneas/cew/cew_func.c +++ b/aeneas/cew/cew_func.c @@ -1,6 +1,6 @@ /* -Python C Extension for synthesizing text with espeak +Python C Extension for synthesizing text with
eSpeak __author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -30,8 +30,19 @@ static int sample_rate; static FILE *wave_file = NULL; -// write an uint32_t as a big endian int to file -// that is, least significant first +/* +00000000 52 49 46 46 XX XX XX XX 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt | +00000010 10 00 00 00 01 00 01 00 22 56 00 00 44 ac 00 00 |........"V..D...| +00000020 02 00 10 00 64 61 74 61 XX XX XX XX |....data.... | +*/ +static const unsigned char wave_hdr[44] = { + 'R' , 'I', 'F' , 'F', 0x2c , 0 , 0 , 0 , 'W' , 'A' , 'V' , 'E' , 'f' , 'm' , 't', ' ', + 0x10, 0 , 0 , 0 , 1 , 0 , 1 , 0 , 9 , 0x3d, 0 , 0 , 0x12, 0x7a, 0 , 0 , + 2 , 0 , 0x10 , 0 , 'd' , 'a' , 't' , 'a' , 0 , 0 , 0 , 0 +}; + +// write an uint32_t as a little endian int to file +// that is, least significant byte first void _write_uint32_t(FILE *f, int value) { int ix; for (ix = 0; ix < 4; ix++) { @@ -45,19 +56,8 @@ void _write_uint32_t(FILE *f, int value) { // will be set by _close_wave_file() // once all audio samples are generated int _open_wave_file(char const *path, int rate) { - /* - 00000000 52 49 46 46 XX XX XX XX 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt | - 00000010 10 00 00 00 01 00 01 00 22 56 00 00 44 ac 00 00 |........"V..D...| - 00000020 02 00 10 00 64 61 74 61 XX XX XX XX |....data.... 
| - */ - static unsigned char wave_hdr[44] = { - 'R' , 'I', 'F' , 'F', 0x2c , 0 , 0 , 0 , 'W' , 'A' , 'V' , 'E' , 'f' , 'm' , 't', ' ', - 0x10, 0 , 0 , 0 , 1 , 0 , 1 , 0 , 9 , 0x3d, 0 , 0 , 0x12, 0x7a, 0 , 0 , - 2 , 0 , 0x10 , 0 , 'd' , 'a' , 't' , 'a' , 0 , 0 , 0 , 0 - }; - if (path == NULL) { - return 2; + return CEW_FAILURE; } while (isspace(*path)) { @@ -71,22 +71,22 @@ int _open_wave_file(char const *path, int rate) { } if (wave_file == NULL) { - return 1; + return CEW_FAILURE; } fwrite(wave_hdr, 1, 24, wave_file); _write_uint32_t(wave_file, rate); _write_uint32_t(wave_file, rate * 2); fwrite(&wave_hdr[32], 1, 12, wave_file); - return 0; + return CEW_SUCCESS; } // close wave file int _close_wave_file(void) { - unsigned int pos; + long pos; if (wave_file == NULL) { - return 1; + return CEW_FAILURE; } // flush and get the current position, @@ -106,13 +106,13 @@ int _close_wave_file(void) { fclose(wave_file); wave_file = NULL; - return 0; + return CEW_SUCCESS; } // callback for synth events int _synth_callback(short *wav, int numsamples, espeak_EVENT *events) { if (wav == NULL) { - return 1; + return CEW_FAILURE; } while (events->type != 0) { if (events->type == espeakEVENT_SAMPLERATE) { @@ -128,7 +128,7 @@ int _synth_callback(short *wav, int numsamples, espeak_EVENT *events) { if (numsamples > 0) { fwrite(wav, numsamples * 2, 1, wave_file); } - return 0; + return CEW_SUCCESS; } // terminate synthesis and close file @@ -145,10 +145,10 @@ int _synthesize_string(char const *text) { espeak_Synth(text, size + 1, 0, POS_CHARACTER, 0, synth_flags, NULL, NULL); } if (espeak_Synchronize() != EE_OK) { - return 1; + return CEW_FAILURE; } current_time += last_end_time; - return 0; + return CEW_SUCCESS; } // set the current language @@ -159,9 +159,9 @@ int _set_voice_code(char const *voice_code) { memset(&voice, 0, sizeof(voice)); voice.languages = voice_code; if (espeak_SetVoiceByProperties(&voice) != EE_OK) { - return 1; + return CEW_FAILURE; } - return 0; + return 
CEW_SUCCESS; } // initialize the synthesizer @@ -176,7 +176,8 @@ int _initialize_synthesizer(char const *output_file_path) { sample_rate = 0; // synthesizer flags - synth_flags = espeakCHARS_UTF8 | espeakPHONEMES | espeakENDPAUSE; + // TODO let the user control espeakENDPAUSE + synth_flags = espeakCHARS_UTF8 | espeakENDPAUSE; // writing to a file (or no output), we can use synchronous mode sample_rate = espeak_Initialize(AUDIO_OUTPUT_SYNCHRONOUS, 0, data_path, 0); @@ -217,8 +218,8 @@ int _initialize_synthesizer(char const *output_file_path) { // open wave file if (wave_file == NULL) { - if(_open_wave_file(output_file_path, sample_rate) != 0) { - return 1; + if(_open_wave_file(output_file_path, sample_rate) != CEW_SUCCESS) { + return CEW_FAILURE; } } @@ -226,52 +227,52 @@ int _initialize_synthesizer(char const *output_file_path) { current_time = 0.0; last_end_time = 0.0; - return 0; + return CEW_SUCCESS; } // synthesize a single text fragment int _synthesize_single( const char *output_file_path, int *sample_rate_ret, - struct FRAGMENT_INFO *fragment + struct FRAGMENT_INFO *fragment_ret ) { // open output wave file - if (_initialize_synthesizer(output_file_path) != 0) { - return 1; + if (_initialize_synthesizer(output_file_path) != CEW_SUCCESS) { + return CEW_FAILURE; } // set voice code - if (_set_voice_code((*fragment).voice_code) != 0) { - return 1; + if (_set_voice_code((*fragment_ret).voice_code) != CEW_SUCCESS) { + return CEW_FAILURE; } // synthesize text *sample_rate_ret = sample_rate; - (*fragment).begin = current_time; - if (_synthesize_string((*fragment).text) != 0) { - return 1; + (*fragment_ret).begin = current_time; + if (_synthesize_string((*fragment_ret).text) != CEW_SUCCESS) { + return CEW_FAILURE; } - (*fragment).end = current_time; + (*fragment_ret).end = current_time; // close output wave file _terminate_synthesis(); - return 0; + return CEW_SUCCESS; } // synthesize multiple fragments int _synthesize_multiple( const char *output_file_path, - struct 
FRAGMENT_INFO **ret, - const int number_of_fragments, + struct FRAGMENT_INFO **fragments_ret, + const size_t number_of_fragments, const float quit_after, const int backwards, int *sample_rate_ret, - unsigned int *synthesized_ret + size_t *synthesized_ret ) { - int i, synthesized, start; + size_t i, synthesized, start; start = 0; @@ -280,23 +281,27 @@ int _synthesize_multiple( // from the back we need to reach quit_after seconds of audio // open output wave file - if (_initialize_synthesizer(output_file_path) != 0) { - return 1; + if (_initialize_synthesizer(output_file_path) != CEW_SUCCESS) { + return CEW_FAILURE; } // synthesize from the back - for (i = number_of_fragments - 1; i >= 0 ; --i) { - if (_set_voice_code((*ret)[i].voice_code) != 0) { - return 1; + for (i = number_of_fragments - 1; ; --i) { + if (_set_voice_code((*fragments_ret)[i].voice_code) != CEW_SUCCESS) { + return CEW_FAILURE; } - if (_synthesize_string((*ret)[i].text) != 0) { - return 1; + if (_synthesize_string((*fragments_ret)[i].text) != CEW_SUCCESS) { + return CEW_FAILURE; } start = i; // check if we generated >= quit_after seconds of audio if (current_time >= quit_after) { break; } + // end of the loop, checked here because i is size_t i.e. unsigned! 
+ if (i == 0) { + break; + } } // close output wave file @@ -304,8 +309,8 @@ int _synthesize_multiple( } // open output wave file - if (_initialize_synthesizer(output_file_path) != 0) { - return 1; + if (_initialize_synthesizer(output_file_path) != CEW_SUCCESS) { + return CEW_FAILURE; } // number of synthesized fragments @@ -313,8 +318,8 @@ int _synthesize_multiple( // loop over all input fragments for (i = start; i < number_of_fragments; ++i) { - if (_set_voice_code((*ret)[i].voice_code) != 0) { - return 1; + if (_set_voice_code((*fragments_ret)[i].voice_code) != CEW_SUCCESS) { + return CEW_FAILURE; } // NOTE: if backwards, we move the anchor times to the first fragments, @@ -322,11 +327,11 @@ int _synthesize_multiple( // despite the fact that they will not be saved with the "correct" text // this trick avoids copying data around // if backwards, the user is not expected to use the time anchors anyway - (*ret)[i-start].begin = current_time; - if (_synthesize_string((*ret)[i].text) != 0) { - return 1; + (*fragments_ret)[i-start].begin = current_time; + if (_synthesize_string((*fragments_ret)[i].text) != CEW_SUCCESS) { + return CEW_FAILURE; } - (*ret)[i-start].end = current_time; + (*fragments_ret)[i-start].end = current_time; synthesized += 1; // check if we generated >= quit_after seconds of audio @@ -342,7 +347,7 @@ int _synthesize_multiple( *sample_rate_ret = sample_rate; *synthesized_ret = synthesized; - return 0; + return CEW_SUCCESS; } diff --git a/aeneas/cew/cew_func.h b/aeneas/cew/cew_func.h index 58749679..7010d19c 100644 --- a/aeneas/cew/cew_func.h +++ b/aeneas/cew/cew_func.h @@ -1,6 +1,6 @@ /* -Python C Extension for synthesizing text with espeak +Python C Extension for synthesizing text with eSpeak __author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,12 +9,15 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = 
"aeneas@readbeyond.it" __status__ = "Production" */ +#define CEW_SUCCESS 0 +#define CEW_FAILURE 1 + struct FRAGMENT_INFO { float begin; float end; @@ -22,22 +25,49 @@ struct FRAGMENT_INFO { const char *text; }; -// synthesize a single text fragment +/* + Synthesize a single text fragment, + described by the FRAGMENT_INFO fragment_ret, + creating a WAVE file at output_file_path. + + The sample rate of the output WAVE file is stored + in sample_rate_ret, and the begin and end times + are stored in the begin and end attributes of + fragment_ret. +*/ int _synthesize_single( const char *output_file_path, - int *sample_rate_ret, - struct FRAGMENT_INFO *ret + int *sample_rate_ret, // int because the espeak lib returns it as such + struct FRAGMENT_INFO *fragment_ret ); -// synthesize multiple fragments +/* + Synthesize multiple text fragments, + described by the FRAGMENT_INFO fragments_ret array, + creating a WAVE file at output_file_path. + + If quit_after > 0, then the synthesis is terminated + as soon as the total duration reaches >= quit_after seconds. + + If backwards is != 0, then the synthesis is done + backwards, from the end of the fragments array. + This option is meaningful only if quit_after is > 0, + otherwise it has no effect. + + The sample rate of the output WAVE file is stored + in sample_rate_ret, the number of synthesized fragments + in synthesized_ret, and the begin and end times + are stored in the begin and end attributes of + the elements of fragments_ret. 
+*/ int _synthesize_multiple( const char *output_file_path, - struct FRAGMENT_INFO **ret, - const int number_of_fragments, + struct FRAGMENT_INFO **fragments_ret, + const size_t number_of_fragments, const float quit_after, const int backwards, - int *sample_rate_ret, - unsigned int *synthesized_ret + int *sample_rate_ret, // int because the espeak lib returns it as such + size_t *synthesized_ret ); diff --git a/aeneas/cew/cew_py.c b/aeneas/cew/cew_py.c index d41d3c43..77661b60 100644 --- a/aeneas/cew/cew_py.c +++ b/aeneas/cew/cew_py.c @@ -1,6 +1,6 @@ /* -Python C Extension for synthesizing text with espeak +Python C Extension for synthesizing text with eSpeak __author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -28,7 +28,7 @@ static PyObject *synthesize_single(PyObject *self, PyObject *args) { PyObject *tuple; char const *output_file_path; struct FRAGMENT_INFO ret; - int sample_rate; + int sample_rate; // int because espeak lib returns it as such // s = string if (!PyArg_ParseTuple(args, "sss", &output_file_path, &ret.voice_code, &ret.text)) { @@ -42,7 +42,6 @@ static PyObject *synthesize_single(PyObject *self, PyObject *args) { } // build the tuple to be returned - // NOTE: returning sample_rate as an int, as the espeak lib does tuple = PyTuple_New(3); PyTuple_SetItem(tuple, 0, Py_BuildValue("i", sample_rate)); PyTuple_SetItem(tuple, 1, Py_BuildValue("f", ret.begin)); @@ -60,8 +59,8 @@ static PyObject *synthesize_multiple(PyObject *self, PyObject *args) { int const backwards; struct FRAGMENT_INFO *fragments_synt; - int sample_rate; - unsigned int number_of_fragments, i, synthesized; + int sample_rate; // int because espeak lib returns it as such + size_t number_of_fragments, i, synthesized; // s = string // f = float @@ -107,7 
+106,8 @@ static PyObject *synthesize_multiple(PyObject *self, PyObject *args) { number_of_fragments, quit_after, backwards, - &sample_rate, &synthesized + &sample_rate, + &synthesized ) != 0) { PyErr_SetString(PyExc_ValueError, "Error while synthesizing multiple fragments"); free((void*)fragments_synt); @@ -150,13 +150,22 @@ static PyMethodDef cew_methods[] = { "synthesize_single", synthesize_single, METH_VARARGS, - "Synthesize a single text fragment with espeak" + "Synthesize a single text fragment with eSpeak\n" + ":param string output_file_path: the path of the WAVE file to be created\n" + ":param string voice_code: the voice code of the language to be used\n" + ":param string text: the text to be synthesized\n" + ":rtype: tuple (sample_rate, begin, end)" }, { "synthesize_multiple", synthesize_multiple, METH_VARARGS, - "Synthesize multiple text fragments with espeak" + "Synthesize multiple text fragments with eSpeak\n" + ":param string output_file_path: the path of the WAVE file to be created\n" + ":param float quit_after: if > 0, stop synthesizing when reaching quit_after seconds\n" + ":param int backwards: if 1, synthesize backwards, from the last fragment to the first\n" + ":param list fragments: list of (voice_code, text) tuples of text fragments to be synthesized\n" + ":rtype: tuple (sample_rate, synthesized, list) where list is a list of (begin, end) time values" }, { NULL, diff --git a/aeneas/cew/cew_setup.py b/aeneas/cew/cew_setup.py index d0f52ba1..41de70a1 100644 --- a/aeneas/cew/cew_setup.py +++ b/aeneas/cew/cew_setup.py @@ -2,7 +2,7 @@ # coding=utf-8 """ -Compile the Python C Extension for synthesizing text with espeak. +Compile the Python C Extension for synthesizing text with eSpeak. .. 
versionadded:: 1.3.0 """ @@ -21,7 +21,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -29,9 +29,9 @@ setup( name="cew", - version="1.4.1", + version="1.5.0", description=""" - Python C Extension for synthesizing text with espeak. + Python C Extension for synthesizing text with eSpeak. """, ext_modules=[CMODULE] ) diff --git a/aeneas/cewsubprocess.py b/aeneas/cewsubprocess.py new file mode 100644 index 00000000..1b69db80 --- /dev/null +++ b/aeneas/cewsubprocess.py @@ -0,0 +1,208 @@ +#!/usr/bin/env python +# coding=utf-8 + +""" +This module contains the following classes: + +* :class:`aeneas.cewsubprocess.CEWSubprocess` which is a + helper class that executes the :mod:`aeneas.cew` C extension + in a separate process via ``subprocess``. + +This module works around a problem with the ``eSpeak`` library, +which seems to generate different audio data for the same +input parameters/text, when run multiple times in the same process. +See the following discussions for details: + +#. https://groups.google.com/d/msg/aeneas-forced-alignment/NLbtSRf2_vg/mMHuTQiFEgAJ +#. https://sourceforge.net/p/espeak/mailman/message/34861696/ + +.. warning:: This module might be removed in a future version. + +..
versionadded:: 1.5.0 +""" + +from __future__ import absolute_import +from __future__ import print_function +import io +import subprocess +import sys + +from aeneas.logger import Loggable +from aeneas.runtimeconfiguration import RuntimeConfiguration +from aeneas.timevalue import TimeValue +import aeneas.globalfunctions as gf + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL v3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + +class CEWSubprocess(Loggable): + """ + This helper class executes the ``aeneas.cew`` C extension + in a separate process by running + the :func:`aeneas.cewsubprocess.CEWSubprocess.main` function + via ``subprocess``. + + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` + :param logger: the logger object + :type logger: :class:`~aeneas.logger.Logger` + """ + + TAG = u"CEWSubprocess" + + def synthesize_single(self, audio_file_path, voice_code, text): + """ + Create a ``wav`` audio file containing the synthesized text. + + The ``text`` must be a unicode string encodable with UTF-8, + otherwise ``espeak`` might fail. + + Return the duration of the synthesized audio file, in seconds. 
+ + :param string audio_file_path: the path of the output audio file + :param string voice_code: the code of the voice to use + :param string text: the text to synthesize + :rtype: :class:`~aeneas.timevalue.TimeValue` + """ + u_text = [(voice_code, text)] + sr, sf, intervals = self.synthesize_multiple(audio_file_path, 0, 0, u_text) + if len(intervals) > 0: + return intervals[0][1] + return None + + def synthesize_multiple(self, audio_file_path, c_quit_after, c_backwards, u_text): + """ + Synthesize the text contained in the given fragment list + into a ``wav`` file. + + :param string audio_file_path: the path to the output audio file + :param float c_quit_after: stop synthesizing as soon as + reaching this many seconds + :param bool c_backwards: synthesizing from the end of the text file + :param object u_text: a list of ``(voice_code, text)`` tuples + :rtype: tuple ``(sample_rate, synthesized, intervals)`` + """ + self.log([u"Audio file path: '%s'", audio_file_path]) + self.log([u"c_quit_after: '%.3f'", c_quit_after]) + self.log([u"c_backwards: '%d'", c_backwards]) + + text_file_handler, text_file_path = gf.tmp_file() + data_file_handler, data_file_path = gf.tmp_file() + self.log([u"Temporary text file path: '%s'", text_file_path]) + self.log([u"Temporary data file path: '%s'", data_file_path]) + + self.log(u"Populating the text file...") + with io.open(text_file_path, "w", encoding="utf-8") as tmp_text_file: + for f_voice_code, f_text in u_text: + tmp_text_file.write(u"%s %s\n" % (f_voice_code, f_text)) + self.log(u"Populating the text file... 
done") + + arguments = [ + self.rconf[RuntimeConfiguration.CEW_SUBPROCESS_PATH], + "-m", + "aeneas.cewsubprocess", + "%.3f" % c_quit_after, + "%d" % c_backwards, + text_file_path, + audio_file_path, + data_file_path + ] + self.log([u"Calling with arguments '%s'", u" ".join(arguments)]) + proc = subprocess.Popen( + arguments, + stdout=subprocess.PIPE, + stdin=subprocess.PIPE, + stderr=subprocess.PIPE, + universal_newlines=True) + proc.communicate() + + self.log(u"Reading output data...") + with io.open(data_file_path, "r", encoding="utf-8") as data_file: + lines = data_file.read().splitlines() + sr = int(lines[0]) + sf = int(lines[1]) + intervals = [] + for line in lines[2:]: + values = line.split(u" ") + if len(values) == 2: + intervals.append((TimeValue(values[0]), TimeValue(values[1]))) + self.log(u"Reading output data... done") + + self.log(u"Deleting text and data files...") + gf.delete_file(text_file_handler, text_file_path) + gf.delete_file(data_file_handler, data_file_path) + self.log(u"Deleting text and data files... done") + + return (sr, sf, intervals) + + + +def main(): + """ + Run ``aeneas.cew``, reading input text from file and writing audio and interval data to file. 
+ """ + + # make sure we have enough parameters + if len(sys.argv) < 6: + print("You must pass five arguments: QUIT_AFTER BACKWARDS TEXT_FILE_PATH AUDIO_FILE_PATH DATA_FILE_PATH") + return 1 + + # read parameters + c_quit_after = float(sys.argv[1]) # NOTE: cew needs float, not TimeValue + c_backwards = int(sys.argv[2]) + text_file_path = sys.argv[3] + audio_file_path = sys.argv[4] + data_file_path = sys.argv[5] + + # read (voice_code, text) from file + s_text = [] + with io.open(text_file_path, "r", encoding="utf-8") as text: + for line in text.readlines(): + # NOTE: not using strip() to avoid removing trailing blank characters + line = line.replace(u"\n", u"").replace(u"\r", u"") + idx = line.find(" ") + if idx > 0: + f_voice_code = line[:idx] + f_text = line[idx+1:] + #print("%s => '%s' and '%s'" % (line, f_voice_code, f_text)) + s_text.append((f_voice_code, f_text)) + + # convert to bytes/unicode as required by subprocess + c_text = [] + if gf.PY2: + for f_voice_code, f_text in s_text: + c_text.append((gf.safe_bytes(f_voice_code), gf.safe_bytes(f_text))) + else: + for f_voice_code, f_text in s_text: + c_text.append((gf.safe_unicode(f_voice_code), gf.safe_unicode(f_text))) + + try: + import aeneas.cew.cew + sr, sf, intervals = aeneas.cew.cew.synthesize_multiple( + audio_file_path, + c_quit_after, + c_backwards, + c_text + ) + with io.open(data_file_path, "w", encoding="utf-8") as data: + data.write(u"%d\n" % (sr)) + data.write(u"%d\n" % (sf)) + data.write(u"\n".join([u"%.3f %.3f" % (i[0], i[1]) for i in intervals])) + except Exception as exc: + print(u"Unexpected error: %s" % str(exc)) + + + +if __name__ == "__main__": + main() + + + diff --git a/aeneas/cint/README.md b/aeneas/cint/README.md new file mode 100644 index 00000000..f08a006f --- /dev/null +++ b/aeneas/cint/README.md @@ -0,0 +1,6 @@ +# aeneas.cint + +This directory contains portable +fixed-size int type definitions and functions +for the other Python C extensions. 
+ diff --git a/aeneas/cint/__init__.py b/aeneas/cint/__init__.py new file mode 100644 index 00000000..7087cd97 --- /dev/null +++ b/aeneas/cint/__init__.py @@ -0,0 +1,23 @@ +#!/usr/bin/env python +# coding=utf-8 + +""" +aeneas.cint contains portable +fixed-size int type definitions and functions +for the other Python C extensions. +""" + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL 3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + + + + diff --git a/aeneas/cint/cint.c b/aeneas/cint/cint.c new file mode 100644 index 00000000..99541bfc --- /dev/null +++ b/aeneas/cint/cint.c @@ -0,0 +1,111 @@ +/* + +Portable fixed-size int definitions for the other Python C extensions. + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL v3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + +*/ + +#include "cint.h" + +uint8_t le_u8_to_cpu(const unsigned char *buf) { + return (uint8_t)buf[0]; +} +uint8_t be_u8_to_cpu(const unsigned char *buf) { + return (uint8_t)buf[0]; +} +uint16_t le_u16_to_cpu(const unsigned char *buf) { + return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8); +} +uint16_t be_u16_to_cpu(const unsigned char *buf) { + return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8); +} +uint32_t le_u32_to_cpu(const unsigned char *buf) { + return ((uint32_t)buf[0]) | (((uint32_t)buf[1]) << 8) | (((uint32_t)buf[2]) << 16) | (((uint32_t)buf[3]) << 24); +} +uint32_t be_u32_to_cpu(const unsigned char *buf) { + return ((uint32_t)buf[3]) | (((uint32_t)buf[2]) << 8) | 
(((uint32_t)buf[1]) << 16) | (((uint32_t)buf[0]) << 24); +} + +int8_t le_s8_to_cpu(const unsigned char *buf) { + return (uint8_t)buf[0]; +} +int8_t be_s8_to_cpu(const unsigned char *buf) { + return (uint8_t)buf[0]; +} +int16_t le_s16_to_cpu(const unsigned char *buf) { + return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8); +} +int16_t be_s16_to_cpu(const unsigned char *buf) { + return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8); +} +int32_t le_s32_to_cpu(const unsigned char *buf) { + return ((uint32_t)buf[0]) | (((uint32_t)buf[1]) << 8) | (((uint32_t)buf[2]) << 16) | (((uint32_t)buf[3]) << 24); +} +int32_t be_s32_to_cpu(const unsigned char *buf) { + return ((uint32_t)buf[3]) | (((uint32_t)buf[2]) << 8) | (((uint32_t)buf[1]) << 16) | (((uint32_t)buf[0]) << 24); +} + +void cpu_to_le_u8(unsigned char *buf, uint8_t val) { + buf[0] = (val & 0xFF); +} +void cpu_to_be_u8(uint8_t *buf, uint8_t val) { + buf[0] = (val & 0xFF); +} +void cpu_to_le_u16(unsigned char *buf, uint16_t val) { + buf[0] = (val & 0x00FF); + buf[1] = (val & 0xFF00) >> 8; +} +void cpu_to_be_u16(uint8_t *buf, uint16_t val) { + buf[0] = (val & 0xFF00) >> 8; + buf[1] = (val & 0x00FF); +} +void cpu_to_le_u32(unsigned char *buf, uint32_t val) { + buf[0] = (val & 0x000000FF); + buf[1] = (val & 0x0000FF00) >> 8; + buf[2] = (val & 0x00FF0000) >> 16; + buf[3] = (val & 0xFF000000) >> 24; +} +void cpu_to_be_u32(uint8_t *buf, uint32_t val) { + buf[0] = (val & 0xFF000000) >> 24; + buf[1] = (val & 0x00FF0000) >> 16; + buf[2] = (val & 0x0000FF00) >> 8; + buf[3] = (val & 0x000000FF); +} + +void cpu_to_le_s8(unsigned char *buf, int8_t val) { + buf[0] = (val & 0xFF); +} +void cpu_to_be_s8(uint8_t *buf, int8_t val) { + buf[0] = (val & 0xFF); +} +void cpu_to_le_s16(unsigned char *buf, int16_t val) { + buf[0] = (val & 0x00FF); + buf[1] = (val & 0xFF00) >> 8; +} +void cpu_to_be_s16(uint8_t *buf, int16_t val) { + buf[0] = (val & 0xFF00) >> 8; + buf[1] = (val & 0x00FF); +} +void cpu_to_le_s32(unsigned char *buf, int32_t 
val) { + buf[0] = (val & 0x000000FF); + buf[1] = (val & 0x0000FF00) >> 8; + buf[2] = (val & 0x00FF0000) >> 16; + buf[3] = (val & 0xFF000000) >> 24; +} +void cpu_to_be_s32(uint8_t *buf, int32_t val) { + buf[0] = (val & 0xFF000000) >> 24; + buf[1] = (val & 0x00FF0000) >> 16; + buf[2] = (val & 0x0000FF00) >> 8; + buf[3] = (val & 0x000000FF); +} + diff --git a/aeneas/cint/cint.h b/aeneas/cint/cint.h new file mode 100644 index 00000000..a8abed0e --- /dev/null +++ b/aeneas/cint/cint.h @@ -0,0 +1,58 @@ +/* + +Portable fixed-size int definitions for the other Python C extensions. + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL v3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + +*/ + +#ifdef _MSC_VER +typedef __int8 int8_t; +typedef __int16 int16_t; +typedef __int32 int32_t; +typedef __int64 int64_t; +typedef unsigned __int8 uint8_t; +typedef unsigned __int16 uint16_t; +typedef unsigned __int32 uint32_t; +typedef unsigned __int64 uint64_t; +#else +#include +#endif + +uint8_t le_u8_to_cpu(const unsigned char *buf); +uint8_t be_u8_to_cpu(const unsigned char *buf); +uint16_t le_u16_to_cpu(const unsigned char *buf); +uint16_t be_u16_to_cpu(const unsigned char *buf); +uint32_t le_u32_to_cpu(const unsigned char *buf); +uint32_t be_u32_to_cpu(const unsigned char *buf); + +int8_t le_s8_to_cpu(const unsigned char *buf); +int8_t be_s8_to_cpu(const unsigned char *buf); +int16_t le_s16_to_cpu(const unsigned char *buf); +int16_t be_s16_to_cpu(const unsigned char *buf); +int32_t le_s32_to_cpu(const unsigned char *buf); +int32_t be_s32_to_cpu(const unsigned char *buf); + +void cpu_to_le_u8(unsigned char *buf, uint8_t val); +void cpu_to_be_u8(unsigned char *buf, uint8_t val); +void cpu_to_le_u16(unsigned char *buf, uint16_t 
val); +void cpu_to_be_u16(unsigned char *buf, uint16_t val); +void cpu_to_le_u32(unsigned char *buf, uint32_t val); +void cpu_to_be_u32(unsigned char *buf, uint32_t val); + +void cpu_to_le_s8(unsigned char *buf, int8_t val); +void cpu_to_be_s8(unsigned char *buf, int8_t val); +void cpu_to_le_s16(unsigned char *buf, int16_t val); +void cpu_to_be_s16(unsigned char *buf, int16_t val); +void cpu_to_le_s32(unsigned char *buf, int32_t val); +void cpu_to_be_s32(unsigned char *buf, int32_t val); + diff --git a/aeneas/cmfcc/000_compile_driver.sh b/aeneas/cmfcc/000_compile_driver.sh new file mode 100644 index 00000000..63fc926d --- /dev/null +++ b/aeneas/cmfcc/000_compile_driver.sh @@ -0,0 +1,8 @@ +#!/bin/bash + +gcc cmfcc_driver.c cmfcc_func.c cwave_func.c cint.c -o cmfcc_driver_wo_fo -Wall -pedantic -std=c99 -lm +gcc cmfcc_driver.c cmfcc_func.c cwave_func.c cint.c -o cmfcc_driver_wo_ff -Wall -pedantic -std=c99 -lm -lrfftw -lfftw -DUSE_FFTW +gcc cmfcc_driver.c cmfcc_func.c cwave_func.c cint.c -o cmfcc_driver_ws_fo -Wall -pedantic -std=c99 -lm -lsndfile -DUSE_SNDFILE +gcc cmfcc_driver.c cmfcc_func.c cwave_func.c cint.c -o cmfcc_driver_ws_ff -Wall -pedantic -std=c99 -lm -lsndfile -lrfftw -lfftw -DUSE_SNDFILE -DUSE_FFTW + + diff --git a/aeneas/cmfcc/100_run_driver.sh b/aeneas/cmfcc/100_run_driver.sh new file mode 100644 index 00000000..6c3fdb8b --- /dev/null +++ b/aeneas/cmfcc/100_run_driver.sh @@ -0,0 +1,29 @@ +#!/bin/bash + +if [ ! 
-e cmfcc_driver_wo_fo ] +then + bash 000_compile_driver.sh +fi + +echo "Run 1" +./cmfcc_driver_wo_fo +echo "" + +echo "Run 2" +./cmfcc_driver_wo_fo ../tools/res/audio.wav /tmp/out.dt.bin data text +echo "" + +echo "Run 3" +./cmfcc_driver_wo_fo ../tools/res/audio.wav /tmp/out.db.bin data binary +echo "" + +echo "Run 4" +./cmfcc_driver_wo_fo ../tools/res/audio.wav /tmp/out.ft.bin file text +echo "" + +echo "Run 5" +./cmfcc_driver_wo_fo ../tools/res/audio.wav /tmp/out.fb.bin file binary +echo "" + + + diff --git a/aeneas/cmfcc/800_compile_py.sh b/aeneas/cmfcc/800_compile_py.sh new file mode 100644 index 00000000..94186b22 --- /dev/null +++ b/aeneas/cmfcc/800_compile_py.sh @@ -0,0 +1,5 @@ +#!/bin/bash + +rm -rf build *.so +python cmfcc_setup.py build_ext --inplace + diff --git a/aeneas/cmfcc/README.md b/aeneas/cmfcc/README.md new file mode 100644 index 00000000..63244e2d --- /dev/null +++ b/aeneas/cmfcc/README.md @@ -0,0 +1,22 @@ +# aeneas.cmfcc + +**aeneas.cmfcc** is a Python C extension to extract MFCCs from a WAVE mono file. + +## API + +See the [__init__.py](__init__.py) file. + +## Compiling the Python C extension locally + +```bash +$ python cmfcc_setup.py build_ext --inplace +``` + +## Compiling the pure C driver program + +```bash +$ bash 000_compile_driver.sh +``` + + + diff --git a/aeneas/cmfcc/__init__.py b/aeneas/cmfcc/__init__.py index 4cddbc42..16ea6cb7 100644 --- a/aeneas/cmfcc/__init__.py +++ b/aeneas/cmfcc/__init__.py @@ -2,7 +2,53 @@ # coding=utf-8 """ -aeneas.cmfcc is a Python C extension to extract MFCCs from a wave file +aeneas.cmfcc is a Python C Extension for computing the MFCCs from a WAVE mono file. + +.. function:: cmfcc.compute_from_data(data, sample_rate, filter_bank_size, mfcc_size, fft_order, lower_frequency, upper_frequency, emphasis_factor, window_length, window_shift) + + Compute MFCCs for a given WAVE mono file, + passed as a NumPy 1D array of ``float64`` values in ``[-1.0, 1.0]``. 
+ + The returned tuple ``(mfcc, length, sr)`` contains + the MFCCs as a NumPy 2D matrix of shape ``(n, mfcc_size)``, + and the number of samples and sample rate of the WAVE file. + + The last two elements ``length`` and ``sr`` + are returned to make the signature of this function + consistent with that of function :func:`cmfcc.compute_from_file`. + + :param data: the audio data + :type data: :class:`numpy.ndarray` (1D) + :param int sample_rate: the audio sample rate + :param int filter_bank_size: the number of Mel filters + :param int mfcc_size: the number of MFCC coefficients + :param int fft_order: the order of the FFT + :param float lower_frequency: the lower frequency to cut, in Hz + :param float upper_frequency: the upper frequency to cut, in Hz + :param float emphasis_factor: the pre-emphasis factor + :param float window_length: the length of the MFCC window, in seconds + :param float window_shift: the shift of the MFCC window, in seconds + :rtype: tuple + +.. function:: cmfcc.compute_from_file(audio_file_path, filter_bank_size, mfcc_size, fft_order, lower_frequency, upper_frequency, emphasis_factor, window_length, window_shift) + + Compute MFCCs for a given WAVE mono file, + passed as a file path on disk. + + The returned tuple ``(mfcc, length, sr)`` contains + the MFCCs as a NumPy 2D matrix of shape ``(n, mfcc_size)``, + and the number of samples and sample rate of the WAVE file. 
+ + :param string audio_file_path: the path of the WAVE file to be read, UTF-8 encoded + :param int filter_bank_size: the number of Mel filters + :param int mfcc_size: the number of MFCC coefficients + :param int fft_order: the order of the FFT + :param float lower_frequency: the lower frequency to cut, in Hz + :param float upper_frequency: the upper frequency to cut, in Hz + :param float emphasis_factor: the pre-emphasis factor + :param float window_length: the length of the MFCC window, in seconds + :param float window_shift: the shift of the MFCC window, in seconds + :rtype: tuple """ __author__ = "Alberto Pettarin" @@ -12,7 +58,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL 3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" diff --git a/aeneas/cmfcc/cint.c b/aeneas/cmfcc/cint.c new file mode 120000 index 00000000..8e1c9dae --- /dev/null +++ b/aeneas/cmfcc/cint.c @@ -0,0 +1 @@ +../cint/cint.c \ No newline at end of file diff --git a/aeneas/cmfcc/cint.h b/aeneas/cmfcc/cint.h new file mode 120000 index 00000000..27a6bb39 --- /dev/null +++ b/aeneas/cmfcc/cint.h @@ -0,0 +1 @@ +../cint/cint.h \ No newline at end of file diff --git a/aeneas/cmfcc/cmfcc_driver.c b/aeneas/cmfcc/cmfcc_driver.c index 4ff40ea7..cd87a2f5 100644 --- a/aeneas/cmfcc/cmfcc_driver.c +++ b/aeneas/cmfcc/cmfcc_driver.c @@ -1,6 +1,6 @@ /* -Python C Extension for computing the MFCC +Python C Extension for computing the MFCCs from a WAVE mono file.
__author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -26,26 +26,21 @@ __status__ = "Production" #include "cwave_func.h" #endif -// -// this is a simple driver to test on the command line -// -// you can compile it with: sndfile (s) or cwave_func (o) for reading WAVE info and data, -// and with: fftw (f) or cmfcc_func (o) for computing the RFFT -// -// $ gcc cmfcc_driver.c cmfcc_func.c cwave_func.c -o cmfcc_driver_wo_fo -lm -// $ gcc cmfcc_driver.c cmfcc_func.c cwave_func.c -o cmfcc_driver_wo_ff -lm -lrfftw -lfftw -DUSE_FFTW -// $ gcc cmfcc_driver.c cmfcc_func.c cwave_func.c -o cmfcc_driver_ws_fo -lm -lsndfile -DUSE_SNDFILE -// $ gcc cmfcc_driver.c cmfcc_func.c cwave_func.c -o cmfcc_driver_ws_ff -lm -lsndfile -lrfftw -lfftw -DUSE_SNDFILE -DUSE_FFTW -// -// use it as follows: -// -// ./cmfcc_driver_X audio.wav out.mfcc data text => load file in RAM, then compute MFCC, output text -// ./cmfcc_driver_X audio.wav out.mfcc file text => compute MFCC directly from file, output text -// ./cmfcc_driver_X audio.wav out.mfcc data binary => load file in RAM, then compute MFCC, output binary -// ./cmfcc_driver_X audio.wav out.mfcc file binary => compute MFCC directly from file, output binary -// -// where X is wo_fo|wo_ff|ws_fo|ws_ff as described above -// +#define DRIVER_SUCCESS 0 +#define DRIVER_FAILURE 1 + +// print usage +void _usage(const char *prog) { + printf("\n"); + printf("Usage: %s AUDIO_FILE.wav OUTPUT.bin [data|file] [text|binary]\n", prog); + printf("\n"); + printf("Example: %s ../tools/res/audio.wav /tmp/out.dt.bin data text\n", prog); + printf(" %s ../tools/res/audio.wav /tmp/out.db.bin data binary\n", prog); + printf(" %s ../tools/res/audio.wav /tmp/out.ft.bin file text\n", prog); + printf(" %s ../tools/res/audio.wav 
/tmp/out.fb.bin file binary\n", prog); + printf("\n"); +} + int main(int argc, char **argv) { #if USE_SNDFILE @@ -58,22 +53,23 @@ int main(int argc, char **argv) { char *audio_file_name, *output_file_name, *mode, *output_format; double *data_ptr, *mfcc_ptr; - unsigned int data_length, sample_rate, mfcc_length; FILE *output_file; - unsigned int i, j; + uint32_t sample_rate; + uint32_t data_length, mfcc_length; + uint32_t i, j; - const unsigned int filter_bank_size = 40; - const unsigned int mfcc_size = 13; - const unsigned int fft_order = 512; + const uint32_t filter_bank_size = 40; + const uint32_t mfcc_size = 13; + const uint32_t fft_order = 512; const double lower_frequency = 133.3333; const double upper_frequency = 6855.4976; const double emphasis_factor = 0.97; - const double window_length = 0.025; - const double window_shift = 0.010; + const double window_length = 0.100; + const double window_shift = 0.040; if (argc < 5) { - printf("\nUsage: %s AUDIO_FILE.wav OUTPUT.bin [data|file] [text|binary]\n\n", argv[0]); - return 1; + _usage(argv[0]); + return DRIVER_FAILURE; } audio_file_name = argv[1]; output_file_name = argv[2]; @@ -102,7 +98,7 @@ int main(int argc, char **argv) { if (!(audio_file = sf_open(audio_file_name, SFM_READ, &audio_info))) { printf("Error: unable to open input file %s.\n", audio_file_name); puts(sf_strerror(NULL)); - return 1; + return DRIVER_FAILURE; } data_length = audio_info.frames; sample_rate = audio_info.samplerate; @@ -110,10 +106,9 @@ int main(int argc, char **argv) { sf_read_double(audio_file, data_ptr, audio_info.frames); sf_close(audio_file); #else - memset(&audio_info, 0, sizeof(audio_info)); if (!(audio_file = wave_open(audio_file_name, &audio_info))) { printf("Error: unable to open input file %s.\n", audio_file_name); - return 1; + return DRIVER_FAILURE; } data_length = audio_info.coNumSamples; sample_rate = audio_info.leSampleRate; @@ -193,6 +188,6 @@ int main(int argc, char **argv) { free((void *)mfcc_ptr); mfcc_ptr = NULL; - 
return 0; + return DRIVER_SUCCESS; } diff --git a/aeneas/cmfcc/cmfcc_func.c b/aeneas/cmfcc/cmfcc_func.c index 2f11741c..7a9c1b7f 100644 --- a/aeneas/cmfcc/cmfcc_func.c +++ b/aeneas/cmfcc/cmfcc_func.c @@ -1,6 +1,6 @@ /* -Python C Extension for computing the MFCC +Python C Extension for computing the MFCCs from a WAVE mono file. __author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -34,7 +34,7 @@ __status__ = "Production" #endif // return the min of the given arguments -unsigned int _min(unsigned int a, unsigned int b) { +uint32_t _min(uint32_t a, uint32_t b) { if (a < b) { return a; } @@ -42,7 +42,7 @@ unsigned int _min(unsigned int a, unsigned int b) { } // return the max of the given arguments -unsigned int _max(unsigned int a, unsigned int b) { +uint32_t _max(uint32_t a, uint32_t b) { if (a > b) { return a; } @@ -50,24 +50,26 @@ unsigned int _max(unsigned int a, unsigned int b) { } // round the given number to the nearest integer -// or return 0 if the argument is negative +// or return zero if the argument is negative // e.g.: 1.1 => 1; 1.6 => 2 -unsigned int _round(double x) { - if (x <= 0) { - //printf("Error: _round argument is negative!!!\n"); - return 0; +uint32_t _round(double x) { + if (x < 0.0) { + return 0; //printf("Error: _round argument is negative!!!\n"); } - return (unsigned int)floor(x + 0.5); + return (uint32_t)floor(x + 0.5); } // precompute the sin table for the FFT/RFFT -double *_precompute_sin_table(unsigned int m) { +double *_precompute_sin_table(uint32_t m) { const double arg = PI / m * 2; - const unsigned int size = m - m / 4 + 1; + const uint32_t size = m - m / 4 + 1; double *table; int k; table = (double *)calloc(size, sizeof(double)); + if (table == NULL) { + return NULL; + } table[0] = 0; for (k = 1; 
k < size; ++k) { table[k] = sin(arg * k); @@ -75,7 +77,7 @@ double *_precompute_sin_table(unsigned int m) { table[m / 2] = 0; return table; } -int fft(double *x, double *y, const unsigned int m, double *sin_table) { +int fft(double *x, double *y, const uint32_t m, double *sin_table) { // code adapted from the fft function of SPTK double t1, t2; double *cosp, *sinp, *xp, *yp; @@ -146,12 +148,12 @@ int fft(double *x, double *y, const unsigned int m, double *sin_table) { yp = y + j; } - return 0; + return CMFCC_SUCCESS; } int rfft( double *x, double *y, - const unsigned int m, + const uint32_t m, double *sin_table_full, double *sin_table_half ) { @@ -169,7 +171,7 @@ int rfft( } if (fft(x, y, mv2, sin_table_half) == -1) { - return -1; + return CMFCC_FAILURE; } sinp = sin_table_full; @@ -202,7 +204,7 @@ int rfft( *yp++ = -(*(--yq)); } - return 0; + return CMFCC_SUCCESS; } // convert Hz frequency to Mel frequency @@ -217,18 +219,21 @@ double _mel2hz(const double m) { // pre emphasis of the given frame // returns the prior to be used for the next frame -void _apply_emphasis( +int _apply_emphasis( double *frame, - const unsigned int length, + const uint32_t length, const double emphasis_factor, double *prior ) { double prior_orig; double *frame_orig; - unsigned int i; + uint32_t i; prior_orig = frame[length - 1]; frame_orig = (double *)calloc(length, sizeof(double)); + if (frame_orig == NULL) { + return CMFCC_FAILURE; + } memcpy(frame_orig, frame, length * sizeof(double)); frame[0] = frame_orig[0] - emphasis_factor * (*prior); for (i = 1; i < length; ++i) { @@ -237,23 +242,27 @@ void _apply_emphasis( free((void *)frame_orig); frame_orig = NULL; *prior = prior_orig; + return CMFCC_SUCCESS; } // own code // compute the power of the given frame -void _compute_power( +int _compute_power( double *frame, // it has length == fft_order double *power, // power has length == (fft_order / 2) + 1 - const unsigned int fft_order, + const uint32_t fft_order, double *sin_table_full, 
double *sin_table_half ) { double *tmp; - unsigned int k; - const unsigned int n = fft_order; // length of the I/O vectors - const unsigned int m = (fft_order / 2) + 1; // length of power + uint32_t k; + const uint32_t n = fft_order; // length of the I/O vectors + const uint32_t m = (fft_order / 2) + 1; // length of power tmp = (double *)calloc(n + m, sizeof(double)); + if (tmp == NULL) { + return CMFCC_FAILURE; + } rfft(frame, tmp, fft_order, sin_table_full, sin_table_half); power[0] = frame[0] * frame[0]; for (k = 1; k < m; ++k) { @@ -261,24 +270,28 @@ void _compute_power( } free((void *)tmp); tmp = NULL; + return CMFCC_SUCCESS; } #ifdef USE_FFTW // fftw code // compute the power of the given frame -void _compute_power_fftw( +int _compute_power_fftw( double *frame, // it has length == fft_order double *power, // power has length == (fft_order / 2) + 1 - const unsigned int fft_order, + const uint32_t fft_order, rfftw_plan plan ) { - unsigned int k; + uint32_t k; double *out; - const unsigned int n = fft_order; // length of the I/O vectors - //const unsigned int m = (fft_order / 2) + 1; // length of power + const uint32_t n = fft_order; // length of the I/O vectors + //const uint32_t m = (fft_order / 2) + 1; // length of power out = (double *)calloc(n, sizeof(double)); + if (out == NULL) { + return CMFCC_FAILURE; + } rfftw_one(plan, frame, out); power[0] = out[0] * out[0]; for (k = 1; k < (n+1)/2; ++k) { @@ -289,27 +302,32 @@ void _compute_power_fftw( } free((void *)out); out = NULL; + return CMFCC_SUCCESS; } #endif // transform the frame using the Hamming window -void _apply_hamming( +int _apply_hamming( double *frame, - const unsigned int frame_length, + const uint32_t frame_length, double *coefficients ) { - unsigned int k; + uint32_t k; for (k = 0; k < frame_length; ++k) { frame[k] *= coefficients[k]; } + return CMFCC_SUCCESS; } -double *_precompute_hamming(const unsigned int frame_length) { +double *_precompute_hamming(const uint32_t frame_length) { const 
double arg = PI_2 / (frame_length - 1); double *coefficients; - unsigned int k; + uint32_t k; coefficients = (double *)calloc(frame_length, sizeof(double)); + if (coefficients == NULL) { + return NULL; + } for (k = 0; k < frame_length; ++k) { coefficients[k] = (0.54 - 0.46 * cos(k * arg)); } @@ -319,9 +337,9 @@ double *_precompute_hamming(const unsigned int frame_length) { // create Mel filter bank // return a pointer to a 2D matrix (filters_n x filter_bank_size) double *_create_mel_filter_bank( - unsigned int fft_order, - unsigned int filter_bank_size, - unsigned int sample_rate, + uint32_t fft_order, + uint32_t filter_bank_size, + uint32_t sample_rate, double upper_frequency, double lower_frequency ) { @@ -329,28 +347,34 @@ double *_create_mel_filter_bank( const double melmax = _hz2mel(upper_frequency); const double melmin = _hz2mel(lower_frequency); const double melstep = (melmax - melmin) / (filter_bank_size + 1); - const unsigned int filter_edge_length = filter_bank_size + 2; - const unsigned int filters_n = (fft_order / 2) + 1; + const uint32_t filter_edge_length = filter_bank_size + 2; + const uint32_t filters_n = (fft_order / 2) + 1; double *filter_edges, *filters; - unsigned int k; + uint32_t k; // filter bank filters = (double *)calloc(filters_n * filter_bank_size, sizeof(double)); + if (filters == NULL) { + return NULL; + } // filter edges filter_edges = (double *)calloc(filter_edge_length, sizeof(double)); + if (filter_edges == NULL) { + return NULL; + } for (k = 0; k < filter_edge_length; ++k) { filter_edges[k] = _mel2hz(melmin + melstep * k); } for (k = 0; k < filter_bank_size; ++k) { - const unsigned int left_frequency = _round(filter_edges[k] / step_frequency); - const unsigned int center_frequency = _round(filter_edges[k + 1] / step_frequency); - const unsigned int right_frequency = _round(filter_edges[k + 2] / step_frequency); + const uint32_t left_frequency = _round(filter_edges[k] / step_frequency); + const uint32_t center_frequency = 
_round(filter_edges[k + 1] / step_frequency); + const uint32_t right_frequency = _round(filter_edges[k + 2] / step_frequency); const double width_frequency = (right_frequency - left_frequency) * step_frequency; const double height_frequency = 2.0 / width_frequency; double left_slope, right_slope; - unsigned int current_frequency; + uint32_t current_frequency; left_slope = 0.0; if (center_frequency != left_frequency) { @@ -381,11 +405,14 @@ double *_create_mel_filter_bank( // create the DCT matrix // return a pointer to a 2D matrix (mfcc_size x filter_bank_size) -double *_create_dct_matrix(unsigned int mfcc_size, unsigned int filter_bank_size) { +double *_create_dct_matrix(uint32_t mfcc_size, uint32_t filter_bank_size) { double *s2dct; - unsigned int i, j; + uint32_t i, j; s2dct = (double *)calloc(mfcc_size * filter_bank_size, sizeof(double)); + if (s2dct == NULL) { + return NULL; + } for (i = 0; i < mfcc_size; ++i) { const double frequency = PI * i / filter_bank_size; for (j = 0; j < filter_bank_size; ++j) { @@ -404,26 +431,26 @@ int _compute_mfcc( double *data_ptr, FILE *audio_file_ptr, struct WAVE_INFO header, - const unsigned int data_length, - const unsigned int sample_rate, - const unsigned int filter_bank_size, - const unsigned int mfcc_size, - const unsigned int fft_order, + const uint32_t data_length, + const uint32_t sample_rate, + const uint32_t filter_bank_size, + const uint32_t mfcc_size, + const uint32_t fft_order, const double lower_frequency, const double upper_frequency, const double emphasis_factor, const double window_length, const double window_shift, double **mfcc_ptr, - unsigned int *mfcc_length + uint32_t *mfcc_length ) { double *filters, *s2dct, *sin_table_full, *sin_table_half, *hamming_coefficients; double *frame, *power, *logsp; double prior, acc; - unsigned int i, j, filters_n; - unsigned int frame_length, frame_shift, frame_length_padded; - unsigned int number_of_frames, frame_index, frame_start, frame_end; + uint32_t filters_n, 
frame_length, frame_shift, frame_length_padded; + uint32_t number_of_frames, frame_index, frame_start, frame_end; + uint32_t i, j; #if USE_FFTW rfftw_plan plan; @@ -431,7 +458,7 @@ int _compute_mfcc( if (upper_frequency > (sample_rate / 2.0)) { // upper frequency exceeds Nyquist - return 0; + return CMFCC_FAILURE; } #if USE_FFTW @@ -447,34 +474,43 @@ int _compute_mfcc( sample_rate, upper_frequency, lower_frequency); + if (filters == NULL) { + return CMFCC_FAILURE; + } // compute DCT matrix s2dct = _create_dct_matrix(mfcc_size, filter_bank_size); + if (s2dct == NULL) { + return CMFCC_FAILURE; + } // length of a frame, in samples - frame_length = (unsigned int)floor(window_length * sample_rate); + frame_length = (uint32_t)floor(window_length * sample_rate); frame_length_padded = _max(frame_length, fft_order); // shift of a frame, in samples - frame_shift = (unsigned int)floor(window_shift * sample_rate); + frame_shift = (uint32_t)floor(window_shift * sample_rate); // value of the last sample in the previous frame prior = 0.0; // number of frames - number_of_frames = (unsigned int)floor(1.0 * data_length / frame_shift); + number_of_frames = (uint32_t)floor(1.0 * data_length / frame_shift); *mfcc_length = number_of_frames; // allocate the mfcc matrix *mfcc_ptr = (double *)calloc(number_of_frames * mfcc_size, sizeof(double)); + if ((*mfcc_ptr) == NULL) { + return CMFCC_FAILURE; + } - // precompute sin tables + // precompute sin tables and hamming coefficients sin_table_full = _precompute_sin_table(fft_order); sin_table_half = _precompute_sin_table(fft_order / 2); - - // precompute hamming coefficients hamming_coefficients = _precompute_hamming(frame_length); - + if ((sin_table_full == NULL) || (sin_table_half == NULL) || (hamming_coefficients == NULL)) { + return CMFCC_FAILURE; + } //printf("Frame length: %d\n", frame_length); //printf("Frame shift: %d\n", frame_shift); //printf("Frame length padded: %d\n", frame_length_padded); @@ -489,6 +525,9 @@ int _compute_mfcc( 
frame = (double *)calloc(frame_length_padded, sizeof(double)); power = (double *)calloc(filters_n, sizeof(double)); logsp = (double *)calloc(filter_bank_size, sizeof(double)); + if ((frame == NULL) || (power == NULL) || (logsp == NULL)) { + return CMFCC_FAILURE; + } // process frames for (frame_index = 0; frame_index < number_of_frames; ++frame_index) { @@ -507,7 +546,9 @@ int _compute_mfcc( frame_start = frame_index * frame_shift; frame_end = _min(frame_start + frame_length, data_length); if (data_ptr == NULL) { - wave_read_double(audio_file_ptr, &header, frame, frame_start, (frame_end - frame_start)); + if (wave_read_double(audio_file_ptr, &header, frame, frame_start, (frame_end - frame_start)) != CWAVE_SUCCESS) { + return CMFCC_FAILURE; + } } else { memcpy(frame, data_ptr + frame_start, (frame_end - frame_start) * sizeof(double)); } @@ -515,15 +556,23 @@ int _compute_mfcc( //printf("Frame %d : %d -> %d\n", frame_index, frame_start, frame_end); // emphasis + hamming + compute power - _apply_emphasis(frame, frame_length, emphasis_factor, &prior); - _apply_hamming(frame, frame_length, hamming_coefficients); + if (_apply_emphasis(frame, frame_length, emphasis_factor, &prior) != CMFCC_SUCCESS) { + return CMFCC_FAILURE; + } + if (_apply_hamming(frame, frame_length, hamming_coefficients) != CMFCC_SUCCESS) { + return CMFCC_FAILURE; + } #ifdef USE_FFTW // fftw code - _compute_power_fftw(frame, power, fft_order, plan); + if (_compute_power_fftw(frame, power, fft_order, plan) != CMFCC_SUCCESS) { + return CMFCC_FAILURE; + } #else // own code - _compute_power(frame, power, fft_order, sin_table_full, sin_table_half); + if (_compute_power(frame, power, fft_order, sin_table_full, sin_table_half) != CMFCC_SUCCESS) { + return CMFCC_FAILURE; + } #endif // apply Mel filter bank @@ -568,25 +617,24 @@ int _compute_mfcc( sin_table_full = NULL; s2dct = NULL; filters = NULL; - - return 1; + return CMFCC_SUCCESS; } // compute MFCC from data loaded in RAM int compute_mfcc_from_data( 
double *data_ptr, - const unsigned int data_length, - const unsigned int sample_rate, - const unsigned int filter_bank_size, - const unsigned int mfcc_size, - const unsigned int fft_order, + const uint32_t data_length, + const uint32_t sample_rate, + const uint32_t filter_bank_size, + const uint32_t mfcc_size, + const uint32_t fft_order, const double lower_frequency, const double upper_frequency, const double emphasis_factor, const double window_length, const double window_shift, double **mfcc_ptr, - unsigned int *mfcc_length + uint32_t *mfcc_length ) { // to keep the compile happy, it will never be used @@ -614,41 +662,42 @@ int compute_mfcc_from_data( // compute MFCC from file on disk int compute_mfcc_from_file( char *audio_file_path, - const unsigned int filter_bank_size, - const unsigned int mfcc_size, - const unsigned int fft_order, + const uint32_t filter_bank_size, + const uint32_t mfcc_size, + const uint32_t fft_order, const double lower_frequency, const double upper_frequency, const double emphasis_factor, const double window_length, const double window_shift, - unsigned int *data_length_ret, - unsigned int *sample_rate_ret, + uint32_t *data_length, + uint32_t *sample_rate, double **mfcc_ptr, - unsigned int *mfcc_length + uint32_t *mfcc_length ) { FILE *audio_file_ptr; struct WAVE_INFO header; - unsigned int data_length, sample_rate; + uint32_t sample_rate_loc; + uint32_t data_length_loc; int ret; // open file - memset(&header, 0, sizeof(header)); - if (! 
(audio_file_ptr = wave_open(audio_file_path, &header))) { + audio_file_ptr = wave_open(audio_file_path, &header); + if (audio_file_ptr == NULL) { //printf("Error: cannot open file\n"); - return 0; + return CMFCC_FAILURE; } - data_length = header.coNumSamples; - sample_rate = header.leSampleRate; + data_length_loc = header.coNumSamples; + sample_rate_loc = header.leSampleRate; // compute mfcc ret = _compute_mfcc( NULL, audio_file_ptr, header, - data_length, - sample_rate, + data_length_loc, + sample_rate_loc, filter_bank_size, mfcc_size, fft_order, @@ -663,11 +712,11 @@ int compute_mfcc_from_file( // close file wave_close(audio_file_ptr); - *data_length_ret = data_length; - *sample_rate_ret = sample_rate; + *data_length = data_length_loc; + *sample_rate = sample_rate_loc; return ret; -}; +} diff --git a/aeneas/cmfcc/cmfcc_func.h b/aeneas/cmfcc/cmfcc_func.h index ba80bfc6..e6aa89b2 100644 --- a/aeneas/cmfcc/cmfcc_func.h +++ b/aeneas/cmfcc/cmfcc_func.h @@ -1,6 +1,6 @@ /* -Python C Extension for computing the MFCC +Python C Extension for computing the MFCCs from a WAVE mono file. 
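Each frame processed by ``_compute_mfcc()`` first goes through pre-emphasis (``_apply_emphasis()``) and then through a Hamming window whose coefficients ``_precompute_hamming()`` computes as ``0.54 - 0.46 * cos(k * PI_2 / (frame_length - 1))``. Below is a minimal pure-Python sketch of both steps, under two assumptions: ``PI_2`` is taken to mean 2π (the standard Hamming definition), and the pre-emphasis loop body, which is elided in this diff, is assumed to be the usual ``y[i] = x[i] - emphasis_factor * x[i - 1]``. The function names are hypothetical.

```python
import math


def apply_emphasis(frame, emphasis_factor, prior):
    """
    Hypothetical sketch of _apply_emphasis(): return the emphasized frame
    and the prior (last original sample) to be used for the next frame.
    ASSUMPTION: y[i] = x[i] - emphasis_factor * x[i - 1] for i >= 1,
    since the C loop body is not shown in the diff.
    """
    out = [frame[0] - emphasis_factor * prior]
    for i in range(1, len(frame)):
        out.append(frame[i] - emphasis_factor * frame[i - 1])
    return out, frame[-1]


def hamming_coefficients(frame_length):
    """Sketch of _precompute_hamming(), assuming PI_2 == 2 * pi."""
    arg = 2.0 * math.pi / (frame_length - 1)
    return [0.54 - 0.46 * math.cos(k * arg) for k in range(frame_length)]


def apply_hamming(frame, coefficients):
    # element-wise product, as done in place by _apply_hamming()
    return [x * c for x, c in zip(frame, coefficients)]


emphasized, prior = apply_emphasis([1.0, 2.0, 3.0], 0.97, 0.0)
windowed = apply_hamming(emphasized, hamming_coefficients(3))
print(emphasized, prior, windowed)
```

As in the C code, the prior handed to the next frame is the last original (not emphasized) sample of the current frame.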
__author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,47 +9,48 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" */ -// NOTE: using unsigned int as it is 32-bit wide on all modern architectures -// not using uint32_t because the MS C compiler does not have -// or, at least, it is not easy to use it +#include "cint.h" + +#define CMFCC_SUCCESS 0 +#define CMFCC_FAILURE 1 // compute MFCC from data loaded in RAM int compute_mfcc_from_data( double *data_ptr, - const unsigned int data_length, - const unsigned int sample_rate, - const unsigned int filter_bank_size, - const unsigned int mfcc_size, - const unsigned int fft_order, + const uint32_t data_length, + const uint32_t sample_rate, + const uint32_t filter_bank_size, + const uint32_t mfcc_size, + const uint32_t fft_order, const double lower_frequency, const double upper_frequency, const double emphasis_factor, const double window_length, const double window_shift, double **mfcc_ptr, - unsigned int *mfcc_length + uint32_t *mfcc_length ); // compute MFCC from file on disk int compute_mfcc_from_file( char *audio_file_path, - const unsigned int filter_bank_size, - const unsigned int mfcc_size, - const unsigned int fft_order, + const uint32_t filter_bank_size, + const uint32_t mfcc_size, + const uint32_t fft_order, const double lower_frequency, const double upper_frequency, const double emphasis_factor, const double window_length, const double window_shift, - unsigned int *data_length_ret, - unsigned int *sample_rate_ret, + uint32_t *data_length, + uint32_t *sample_rate, double **mfcc_ptr, - unsigned int *mfcc_length + uint32_t *mfcc_length ); diff --git a/aeneas/cmfcc/cmfcc_py.c b/aeneas/cmfcc/cmfcc_py.c index 59319b97..b65df710 100644 --- a/aeneas/cmfcc/cmfcc_py.c +++ b/aeneas/cmfcc/cmfcc_py.c @@ -1,6 +1,6 @@ /* -Python C Extension for computing 
the MFCC +Python C Extension for computing the MFCCs from a WAVE mono file. __author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -25,30 +25,26 @@ __status__ = "Production" #include "cmfcc_func.h" // compute the MFCCs of the given audio data (mono) -// take the PyObject containing the following arguments (see below) -// and return the MFCCs as a n x mfcc_size 2D array of double, where -// - n is the number of frames -// - mfcc_size is the number of ceptral coefficients (including the 0-th) static PyObject *compute_from_data(PyObject *self, PyObject *args) { - PyObject *data_raw; // 1D array of double, holding the data - unsigned int sample_rate; // sample rate (default: 16000) - unsigned int filter_bank_size; // number of filters in the filter bank (default: 40) - unsigned int mfcc_size; // number of ceptral coefficients (default: 13) - unsigned int fft_order; // FFT order; must be a power of 2 (default: 512) - double lower_frequency; // lower frequency (default: 133.3333) - double upper_frequency; // upper frequency; must be <= sample_rate/2 = Nyquist frequency (default: 6855.4976) - double emphasis_factor; // pre-emphasis factor (default: 0.97) - double window_length; // window length (default: 0.0250) - double window_shift; // window shift (default: 0.010) + PyObject *data_raw; // 1D array of double, holding the data + uint32_t sample_rate; // sample rate (default: 16000) + uint32_t filter_bank_size; // number of filters in the filter bank (default: 40) + uint32_t mfcc_size; // number of cepstral coefficients (default: 13) + uint32_t fft_order; // FFT order; must be a power of 2 (default: 512) + double lower_frequency; // lower frequency (default: 133.3333) + double upper_frequency; // upper frequency; must be <= sample_rate/2 =
Nyquist frequency (default: 6855.4976) + double emphasis_factor; // pre-emphasis factor (default: 0.97) + double window_length; // window length (default: 0.0250) + double window_shift; // window shift (default: 0.010) PyObject *tuple; PyArrayObject *data, *mfcc; npy_intp mfcc_dimensions[2]; double *data_ptr, *mfcc_ptr; - unsigned int data_length, mfcc_length; + uint32_t data_length, mfcc_length; // O = object (do not convert or check for errors) - // I = unsigned integer + // I = unsigned int (read into uint32_t) // d = double if (!PyArg_ParseTuple( args, @@ -75,10 +71,10 @@ static PyObject *compute_from_data(PyObject *self, PyObject *args) { data_ptr = (double *)PyArray_DATA(data); // number of audio samples in data (= duration in seconds * sample_rate) - data_length = (unsigned int)PyArray_DIMS(data)[0]; + data_length = (uint32_t)PyArray_DIMS(data)[0]; // compute MFCC matrix - if (!compute_mfcc_from_data( + if (compute_mfcc_from_data( data_ptr, data_length, sample_rate, @@ -91,7 +87,7 @@ static PyObject *compute_from_data(PyObject *self, PyObject *args) { window_length, window_shift, &mfcc_ptr, - &mfcc_length) + &mfcc_length) != CMFCC_SUCCESS ) { // failed PyErr_SetString(PyExc_ValueError, "Error while calling compute_mfcc_from_data()"); @@ -115,30 +111,27 @@ static PyObject *compute_from_data(PyObject *self, PyObject *args) { return tuple; } -// compute the MFCCs of the given data -// take the PyObject containing the following arguments (see below) -// and return the MFCCs as a n x mfcc_size 2D array of double, where -// - n is the number of frames -// - mfcc_size is the number of ceptral coefficients (including the 0-th) +// compute the MFCCs of the given audio file static PyObject *compute_from_file(PyObject *self, PyObject *args) { - char *audio_file_path; // path of the WAVE file - unsigned int filter_bank_size; // number of filters in the filter bank (default: 40) - unsigned int mfcc_size; // number of ceptral coefficients (default: 13) - unsigned int fft_order; // FFT order; must be a power of 2 (default: 512) - double lower_frequency; // lower frequency (default: 133.3333) - double upper_frequency; // upper frequency; must be <= sample_rate/2 = Nyquist frequency (default: 6855.4976) - double emphasis_factor; // pre-emphasis factor (default: 0.97) - double window_length; // window length (default: 0.0250) - double window_shift; // window shift (default: 0.010) + char *audio_file_path; // path of the WAVE file + uint32_t filter_bank_size; // number of filters in the filter bank (default: 40) + uint32_t mfcc_size; // number of cepstral coefficients (default: 13) + uint32_t fft_order; // FFT order;
must be a power of 2 (default: 512) + double lower_frequency; // lower frequency (default: 133.3333) + double upper_frequency; // upper frequency; must be <= sample_rate/2 = Nyquist frequency (default: 6855.4976) + double emphasis_factor; // pre-emphasis factor (default: 0.97) + double window_length; // window length (default: 0.0250) + double window_shift; // window shift (default: 0.010) PyObject *tuple; PyArrayObject *mfcc; npy_intp mfcc_dimensions[2]; double *mfcc_ptr; - unsigned int data_length, sample_rate, mfcc_length; + uint32_t sample_rate; + uint32_t data_length, mfcc_length; // s = string - // I = unsigned integer + // I = unsigned int (read into uint32_t) // d = double if (!PyArg_ParseTuple( args, @@ -158,7 +151,7 @@ static PyObject *compute_from_file(PyObject *self, PyObject *args) { } // compute MFCC matrix - if (!compute_mfcc_from_file( + if (compute_mfcc_from_file( audio_file_path, filter_bank_size, mfcc_size, @@ -171,7 +164,7 @@ static PyObject *compute_from_file(PyObject *self, PyObject *args) { &data_length, &sample_rate, &mfcc_ptr, - &mfcc_length) + &mfcc_length) != CMFCC_SUCCESS ) { // failed PyErr_SetString(PyExc_ValueError, "Error while calling compute_mfcc_from_file()"); @@ -197,13 +190,34 @@ static PyMethodDef cmfcc_methods[] = { "compute_from_data", compute_from_data, METH_VARARGS, - "Given the data from a mono PCM16 WAVE file, compute and
return the MFCCs" + "Given the data from a mono PCM16 WAVE file, compute and return the MFCCs\n" + ":param object data_raw: numpy 1D array of float values, one per sample\n" + ":param uint sample_rate: the sample rate of the WAVE file\n" + ":param uint filter_bank_size: the number of Mel filters\n" + ":param uint mfcc_size: the number of MFCCs\n" + ":param uint fft_order: the order of the FFT\n" + ":param float lower_frequency: cut below this frequency, in Hz\n" + ":param float upper_frequency: cut above this frequency, in Hz\n" + ":param float emphasis_factor: the pre-emphasis factor\n" + ":param float window_length: MFCC window length, in s\n" + ":param float window_shift: MFCC window shift, in s\n" + ":rtype: tuple (mfccs, data_length, sample_rate)" }, { "compute_from_file", compute_from_file, METH_VARARGS, - "Given the path of the mono PCM16 WAVE file, compute and return the MFCCs" + "Given the path of the mono PCM16 WAVE file, compute and return the MFCCs\n" + ":param string audio_file_path: the path of the audio file\n" + ":param uint filter_bank_size: the number of Mel filters\n" + ":param uint mfcc_size: the number of MFCCs\n" + ":param uint fft_order: the order of the FFT\n" + ":param float lower_frequency: cut below this frequency, in Hz\n" + ":param float upper_frequency: cut above this frequency, in Hz\n" + ":param float emphasis_factor: the pre-emphasis factor\n" + ":param float window_length: MFCC window length, in s\n" + ":param float window_shift: MFCC window shift, in s\n" + ":rtype: tuple (mfccs, data_length, sample_rate)" }, { NULL, diff --git a/aeneas/cmfcc/cmfcc_setup.py b/aeneas/cmfcc/cmfcc_setup.py index b0c18122..dbc1fa1e 100644 --- a/aeneas/cmfcc/cmfcc_setup.py +++ b/aeneas/cmfcc/cmfcc_setup.py @@ -2,7 +2,7 @@ # coding=utf-8 """ -Compile the Python C Extension for computing the MFCCs. +Compile the Python C Extension for computing the MFCCs from a WAVE mono file. ..
versionadded:: 1.1.0 """ @@ -23,15 +23,15 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -CMODULE = Extension("cmfcc", sources=["cmfcc_py.c", "cmfcc_func.c", "cwave_func.c"], include_dirs=[get_include()]) +CMODULE = Extension("cmfcc", sources=["cmfcc_py.c", "cmfcc_func.c", "cwave_func.c", "cint.c"], include_dirs=[get_include()]) setup( name="cmfcc", - version="1.4.1", + version="1.5.0", description=""" Python C Extension for computing the MFCCs as fast as your bare metal allows. """, diff --git a/aeneas/configurationobject.py b/aeneas/configuration.py similarity index 62% rename from aeneas/configurationobject.py rename to aeneas/configuration.py index 44dbb5d9..9f45193a 100644 --- a/aeneas/configurationobject.py +++ b/aeneas/configuration.py @@ -2,8 +2,13 @@ # coding=utf-8 """ -Basically a dictionary with a fixed set of keys, -with default values and aliases. +This module contains the following classes: + +* :class:`~aeneas.configuration.Configuration` + which is a dictionary with a fixed set of keys, + possibly with default values and key aliases. + +.. versionadded:: 1.4.1 """ from __future__ import absolute_import @@ -19,28 +24,35 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -class ConfigurationObject(object): +class Configuration(object): """ - A structure representing a generic configuration object, that is, + A generic configuration object, that is, a dictionary with a fixed set of keys, - each with a default value, type, and possibly aliases. + each with a type, a default value, and possibly aliases. + + Keys are (unique) Unicode strings. 
- Values are stored as Unicode strings, and casted to int or float + Values are stored as Unicode strings (or ``None``), and cast + to the type of the field (``int``, ``float``, + ``bool``, :class:`~aeneas.timevalue.TimeValue`, etc.) when accessed. - :param config_string: the job configuration string - :type config_string: Unicode string + For ``bool`` keys, values listed in + :data:`~aeneas.configuration.Configuration.TRUE_ALIASES` + are considered equivalent to a ``True`` value. - :raises TypeError: if ``config_string`` is not ``None`` and - it is not a Unicode string - :raises KeyError: if trying to access a key not listed above - """ + If ``config_string`` is not ``None``, the given string will be parsed + and ``key=value`` pairs will be stored in the object, + provided that ``key`` is listed in :data:`~aeneas.configuration.Configuration.FIELDS`. - TAG = u"ConfigurationObject" + :param string config_string: the configuration string to be parsed + :raises: TypeError: if ``config_string`` is not ``None`` and it is not a Unicode string + :raises: KeyError: if trying to access a key not listed above + """ FIELDS = [ # @@ -49,13 +61,26 @@ class ConfigurationObject(object): # # examples: # (gc.FOO, (None, None, ["foo"])) - # (gc.BAR, (0.0, float, ["bar", "baz"])) + # (gc.BAR, (0.0, float, ["bar", "barrr"])) + # (gc.BAZ, (None, TimeValue, ["baz"])) # ] + """ + The fields, that is, key names each with associated + default value, type, and possibly aliases, + of this object.
+ """ + + TRUE_ALIASES = [True, u"TRUE", u"True", u"true", u"YES", u"Yes", u"yes", u"1", 1] + """ + Aliases for a ``True`` value for ``bool`` fields + """ + + TAG = u"Configuration" def __init__(self, config_string=None): if (config_string is not None) and (not gf.is_unicode(config_string)): - raise TypeError("config_string is not a Unicode string") + raise TypeError(u"config_string is not a Unicode string") # set dictionaries up to keep the config data self.data = {} @@ -70,7 +95,7 @@ def __init__(self, config_string=None): if config_string is not None: # strip leading/trailing " or ' characters - if (config_string[0] == config_string[-1]) and (config_string[0] in [u"\"", u"'"]): + if (len(config_string) > 0) and (config_string[0] == config_string[-1]) and (config_string[0] in [u"\"", u"'"]): config_string = config_string[1:-1] # populate values from config_string, # ignoring keys not present in FIELDS @@ -106,7 +131,7 @@ def __str__(self): def _cast(self, key, value): if (value is not None) and (self.types[key] is not None): if self.types[key] is bool: - return value in [True, u"True", u"true", u"Yes", u"yes", u"1", 1] + return value in self.TRUE_ALIASES else: return self.types[key](value) return value @@ -114,7 +139,7 @@ def _cast(self, key, value): def config_string(self): """ Build the storable string corresponding - to this job configuration object. + to this configuration object. :rtype: string """ diff --git a/aeneas/container.py b/aeneas/container.py index 04f330c1..c4aaeba2 100644 --- a/aeneas/container.py +++ b/aeneas/container.py @@ -2,18 +2,15 @@ # coding=utf-8 """ -A container is an abstraction for a group of files (entries) -compressed into an archive file (e.g., ZIP or TAR) -or uncompressed inside a directory. - -This module contains two main classes. - -1. :class:`aeneas.container.Container` - is the main class, exposing functions - like extracting all or just one entry, - listing the entries in the container, etc. -2. 
:class:`aeneas.container.ContainerFormat` - is an enumeration of the supported container formats. +This module contains the following classes: + +* :class:`~aeneas.container.Container` + is the main class, exposing functions + like extracting all entries, + extracting just one entry, + listing the entries in the container, etc.; +* :class:`~aeneas.container.ContainerFormat` + is an enumeration of the supported container formats. """ from __future__ import absolute_import @@ -23,7 +20,7 @@ import tarfile import zipfile -from aeneas.logger import Logger +from aeneas.logger import Loggable import aeneas.globalconstants as gc import aeneas.globalfunctions as gf @@ -34,7 +31,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -67,52 +64,49 @@ class ContainerFormat(object): ALLOWED_VALUES = [EPUB, TAR, TAR_GZ, TAR_BZ2, UNPACKED, ZIP] """ List of all the allowed values """ -class Container(object): + + +class Container(Loggable): """ An abstraction for different archive formats like ZIP or TAR, - exposing common functions like extracting all files or - a single file, listing the files, etc. + exposing common functions like extracting all entries or + just a single entry, listing the entries, etc. An (uncompressed) directory can be used in lieu of a compressed file. 
- :param file_path: the path to the container file (or directory) - :type file_path: string (path) + :param string file_path: the path to the container file (or directory) :param container_format: the format of the container - :type container_format: :class:`aeneas.container.ContainerFormat` + :type container_format: :class:`~aeneas.container.ContainerFormat` + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` - - :raise TypeError: if ``file_path`` is None - :raise ValueError: if ``container_format`` is not None and is not an allowed value + :type logger: :class:`~aeneas.logger.Logger` + :raises: TypeError: if ``file_path`` is ``None`` + :raises: ValueError: if ``container_format`` is not ``None`` and is not an allowed value """ TAG = u"Container" - def __init__(self, file_path, container_format=None, logger=None): + def __init__(self, file_path, container_format=None, rconf=None, logger=None): if file_path is None: - raise TypeError("File path is None") + raise TypeError(u"File path is None") if ( (container_format is not None) and (container_format not in ContainerFormat.ALLOWED_VALUES) ): - raise ValueError("Container format not allowed") + raise ValueError(u"Container format not allowed") + super(Container, self).__init__(rconf=rconf, logger=logger) self.file_path = file_path self.container_format = container_format self.actual_container = None - self.logger = logger or Logger() - self._log(u"Setting actual Container object") self._set_actual_container() - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) - @property def file_path(self): """ The path of this container. - :rtype: string (path) + :rtype: string """ return self.__file_path @file_path.setter @@ -124,7 +118,7 @@ def container_format(self): """ The format of this container. 
- :rtype: :class:`aeneas.container.ContainerFormat` + :rtype: :class:`~aeneas.container.ContainerFormat` """ return self.__container_format @container_format.setter @@ -138,8 +132,7 @@ def has_config_xml(self): ``False`` otherwise. :rtype: bool - - :raise: see ``entries()`` + :raises: same as :func:`~aeneas.container.Container.entries` """ return self.entry_config_xml is not None @@ -150,9 +143,8 @@ def entry_config_xml(self): of the XML config file in this container, or ``None`` if not present. - :rtype: string (path) - - :raise: see ``entries()`` + :rtype: string + :raises: same as :func:`~aeneas.container.Container.entries` """ return self.find_entry(gc.CONFIG_XML_FILE_NAME, exact=False) @@ -163,8 +155,7 @@ def has_config_txt(self): ``False`` otherwise. :rtype: bool - - :raise: see ``entries()`` + :raises: same as :func:`~aeneas.container.Container.entries` """ return self.entry_config_txt is not None @@ -175,9 +166,8 @@ def entry_config_txt(self): of the TXT config file in this container, or ``None`` if not present. - :rtype: string (path) - - :raise: see ``entries()`` + :rtype: string + :raises: same as :func:`~aeneas.container.Container.entries` """ return self.find_entry(gc.CONFIG_TXT_FILE_NAME, exact=False) @@ -188,16 +178,14 @@ def is_safe(self): that is, if all its entries are safe, ``False`` otherwise. 
:rtype: bool - - :raise: see ``entries()`` + :raises: same as :func:`~aeneas.container.Container.entries` """ - self._log(u"Checking if this container is safe") - entries = self.entries() - for entry in entries: + self.log(u"Checking if this container is safe") + for entry in self.entries: if not self.is_entry_safe(entry): - self._log([u"This container is not safe: found unsafe entry '%s'", entry]) + self.log([u"This container is not safe: found unsafe entry '%s'", entry]) return False - self._log(u"This container is safe") + self.log(u"This container is safe") return True def is_entry_safe(self, entry): @@ -210,27 +198,28 @@ def is_entry_safe(self, entry): """ normalized = os.path.normpath(entry) if normalized.startswith(os.sep) or normalized.startswith(".." + os.sep): - self._log([u"Entry '%s' is not safe", entry]) + self.log([u"Entry '%s' is not safe", entry]) return False - self._log([u"Entry '%s' is safe", entry]) + self.log([u"Entry '%s' is safe", entry]) return True + @property def entries(self): """ Return the sorted list of entries in this container, each represented by its full path inside the container. :rtype: list of strings (path) - - :raise TypeError: if this container does not exist - :raise OSError: if an error occurred reading the given container (e.g., empty file, damaged file, etc.) + :raises: TypeError: if this container does not exist + :raises: OSError: if an error occurred reading the given container + (e.g., empty file, damaged file, etc.) """ - self._log(u"Getting entries") + self.log(u"Getting entries") if not self.exists(): - raise TypeError("This container does not exist (wrong path?)") + self.log_exc(u"This container does not exist. 
Wrong path?", None, True, TypeError) if self.actual_container is None: - raise TypeError("The actual container object has not been set") - return self.actual_container.entries() + self.log_exc(u"The actual container object has not been set", None, True, TypeError) + return self.actual_container.entries def find_entry(self, entry, exact=True): """ @@ -246,32 +235,29 @@ def find_entry(self, entry, exact=True): entry = "config.txt" - might match: :: - - config.txt - foo/config.txt (if exact = False) - foo/bar/config.txt (if exact = False) + matches: :: - :param entry: the entry name to be searched for - :type entry: string (path) - :param exact: look for the exact entry path - :type exact: bool - :rtype: string (path) + config.txt (if exact == True or exact == False) + foo/config.txt (if exact == False) + foo/bar/config.txt (if exact == False) - :raise: see ``entries()`` + :param string entry: the entry name to be searched for + :param bool exact: look for the exact entry path + :rtype: string + :raises: same as :func:`~aeneas.container.Container.entries` """ if exact: - self._log([u"Finding entry '%s' with exact=True", entry]) - if entry in self.entries(): - self._log([u"Found entry '%s'", entry]) + self.log([u"Finding entry '%s' with exact=True", entry]) + if entry in self.entries: + self.log([u"Found entry '%s'", entry]) return entry else: - self._log([u"Finding entry '%s' with exact=False", entry]) - for ent in self.entries(): + self.log([u"Finding entry '%s' with exact=False", entry]) + for ent in self.entries: if os.path.basename(ent) == entry: - self._log([u"Found entry '%s'", ent]) + self.log([u"Found entry '%s'", ent]) return ent - self._log([u"Entry '%s' not found", entry]) + self.log([u"Entry '%s' not found", entry]) return None def read_entry(self, entry): @@ -283,68 +269,63 @@ def read_entry(self, entry): or it cannot be found. 
:rtype: byte string - - :raise: see ``entries()`` + :raises: same as :func:`~aeneas.container.Container.entries` """ if not self.is_entry_safe(entry): - self._log([u"Accessing entry '%s' is not safe", entry]) + self.log([u"Accessing entry '%s' is not safe", entry]) return None - if entry not in self.entries(): - self._log([u"Entry '%s' not found in this container", entry]) + if entry not in self.entries: + self.log([u"Entry '%s' not found in this container", entry]) return None - self._log([u"Reading contents of entry '%s'", entry]) + self.log([u"Reading contents of entry '%s'", entry]) try: return self.actual_container.read_entry(entry) except: - self._log([u"An error occurred while reading the contents of '%s'", entry]) + self.log([u"An error occurred while reading the contents of '%s'", entry]) return None def decompress(self, output_path): """ Decompress the entire container into the given directory. - :param output_path: path of the destination directory - :type output_path: string (path) - - :raise TypeError: if this container does not exist - :raise ValueError: if this container contains unsafe entries, - or ``output_path`` is not an existing directory - :raise OSError: if an error occurred decompressing the given container - (e.g., empty file, damaged file, etc.) + :param string output_path: path of the destination directory + :raises: TypeError: if this container does not exist + :raises: ValueError: if this container contains unsafe entries, + or ``output_path`` is not an existing directory + :raises: OSError: if an error occurred decompressing the given container + (e.g., empty file, damaged file, etc.) """ - self._log([u"Decompressing the container into '%s'", output_path]) + self.log([u"Decompressing the container into '%s'", output_path]) if not self.exists(): - raise TypeError("This container does not exist (wrong path?)") + self.log_exc(u"This container does not exist. 
Wrong path?", None, True, TypeError) if self.actual_container is None: - raise TypeError("The actual container object has not been set") + self.log_exc(u"The actual container object has not been set", None, True, TypeError) if not gf.directory_exists(output_path): - raise ValueError("The output_path is not an existing directory") + self.log_exc(u"The output path is not an existing directory", None, True, ValueError) if not self.is_safe: - raise ValueError("This container contains unsafe entries") + self.log_exc(u"This container contains unsafe entries", None, True, ValueError) self.actual_container.decompress(output_path) def compress(self, input_path): """ Compress the contents of the given directory. - :param input_path: path of the input directory - :type input_path: string (path) - - :raise TypeError: if the container path has not been set - :raise ValueError: if ``input_path`` is not an existing directory - :raise OSError: if an error occurred compressing the given container - (e.g., empty file, damaged file, etc.) + :param string input_path: path of the input directory + :raises: TypeError: if the container path has not been set + :raises: ValueError: if ``input_path`` is not an existing directory + :raises: OSError: if an error occurred compressing the given container + (e.g., empty file, damaged file, etc.) 
""" - self._log([u"Compressing '%s' into this container", input_path]) + self.log([u"Compressing '%s' into this container", input_path]) if self.file_path is None: - raise TypeError("The container path has not been set") + self.log_exc(u"The container path has not been set", None, True, TypeError) if self.actual_container is None: - raise TypeError("The actual container object has not been set") + self.log_exc(u"The actual container object has not been set", None, True, TypeError) if not gf.directory_exists(input_path): - raise ValueError("The input_path is not an existing directory") + self.log_exc(u"The input path is not an existing directory", None, True, ValueError) gf.ensure_parent_directory(input_path) self.actual_container.compress(input_path) @@ -364,62 +345,63 @@ def _set_actual_container(self): If the container format is not specified, infer it from the (lowercased) extension of the file path. If the format cannot be inferred, it is assumed to be - of type :class:`aeneas.container.ContainerFormat.UNPACKED` + of type :class:`~aeneas.container.ContainerFormat.UNPACKED` (unpacked directory). """ - self._log(u"Setting actual container") - # infer container format if self.container_format is None: - self._log(u"Inferring actual container format") + self.log(u"Inferring actual container format...") path_lowercased = self.file_path.lower() - self._log([u"Lowercased file path: '%s'", path_lowercased]) + self.log([u"Lowercased file path: '%s'", path_lowercased]) self.container_format = ContainerFormat.UNPACKED for fmt in ContainerFormat.ALLOWED_FILE_VALUES: if path_lowercased.endswith(fmt): self.container_format = fmt break - self._log([u"Inferred format: '%s'", self.container_format]) + self.log(u"Inferring actual container format... 
done") + self.log([u"Inferred format: '%s'", self.container_format]) # set the actual container - self._log(u"Setting actual container") + self.log(u"Setting actual container...") + # TODO map this if self.container_format == ContainerFormat.ZIP: - self.actual_container = _ContainerZIP(self.file_path) + self.actual_container = _ContainerZIP(self.file_path, rconf=self.rconf, logger=self.logger) elif self.container_format == ContainerFormat.EPUB: - self.actual_container = _ContainerZIP(self.file_path) + self.actual_container = _ContainerZIP(self.file_path, rconf=self.rconf, logger=self.logger) elif self.container_format == ContainerFormat.TAR: - self.actual_container = _ContainerTAR(self.file_path, "") + self.actual_container = _ContainerTAR(self.file_path, "", rconf=self.rconf, logger=self.logger) elif self.container_format == ContainerFormat.TAR_GZ: - self.actual_container = _ContainerTAR(self.file_path, ":gz") + self.actual_container = _ContainerTAR(self.file_path, ":gz", rconf=self.rconf, logger=self.logger) elif self.container_format == ContainerFormat.TAR_BZ2: - self.actual_container = _ContainerTAR(self.file_path, ":bz2") + self.actual_container = _ContainerTAR(self.file_path, ":bz2", rconf=self.rconf, logger=self.logger) elif self.container_format == ContainerFormat.UNPACKED: - self.actual_container = _ContainerUnpacked(self.file_path) - self._log([u"Actual container format: '%s'", self.container_format]) - self._log(u"Actual container set") + self.actual_container = _ContainerUnpacked(self.file_path, rconf=self.rconf, logger=self.logger) + self.log([u"Actual container format: '%s'", self.container_format]) + self.log(u"Setting actual container... done") -class _ContainerTAR(object): +class _ContainerTAR(Loggable): """ A TAR container. 
""" TAG = u"ContainerTAR" - def __init__(self, file_path, variant, logger=None): + def __init__(self, file_path, variant, rconf=None, logger=None): + super(_ContainerTAR, self).__init__(rconf=rconf, logger=logger) self.file_path = file_path self.variant = variant - self.logger = logger or Logger() + @property def entries(self): try: argument = "r" + self.variant with tarfile.open(self.file_path, argument) as tar_file: result = [e.name for e in tar_file.getmembers() if e.isfile()] return sorted(result) - except: - raise OSError("Cannot read entries from TAR file") + except Exception as exc: + self.log_exc(u"Cannot read entries from TAR file", exc, True, OSError) def read_entry(self, entry): try: @@ -429,16 +411,16 @@ def read_entry(self, entry): result = tar_entry.read() tar_entry.close() return result - except: - raise OSError("Cannot read entry from TAR file") + except Exception as exc: + self.log_exc(u"Cannot read entry from TAR file", exc, True, OSError) def decompress(self, output_path): try: argument = "r" + self.variant with tarfile.open(self.file_path, argument) as tar_file: tar_file.extractall(output_path) - except: - raise OSError("Cannot decompress TAR file") + except Exception as exc: + self.log_exc(u"Cannot decompress TAR file", exc, True, OSError) def compress(self, input_path): try: @@ -451,27 +433,30 @@ def compress(self, input_path): fullpath = os.path.join(root, f) archive_name = os.path.join(archive_root, f) tar_file.add(name=fullpath, arcname=archive_name) - except: - raise OSError("Cannot compress TAR File") + except Exception as exc: + self.log_exc(u"Cannot compress TAR File", exc, True, OSError) + + -class _ContainerZIP(object): +class _ContainerZIP(Loggable): """ A ZIP container. 
""" TAG = u"ContainerZIP" - def __init__(self, file_path, logger=None): + def __init__(self, file_path, rconf=None, logger=None): + super(_ContainerZIP, self).__init__(rconf=rconf, logger=logger) self.file_path = file_path - self.logger = logger or Logger() + @property def entries(self): try: with zipfile.ZipFile(self.file_path) as zip_file: result = [e for e in zip_file.namelist() if not e.endswith("/")] return sorted(result) - except: - raise OSError("Cannot read entries from ZIP file") + except Exception as exc: + self.log_exc(u"Cannot read entries from ZIP file", exc, True, OSError) def read_entry(self, entry): try: @@ -480,15 +465,15 @@ def read_entry(self, entry): result = zip_entry.read() zip_entry.close() return result - except: - raise OSError("Cannot read entry from ZIP file") + except Exception as exc: + self.log_exc(u"Cannot read entry from ZIP file", exc, True, OSError) def decompress(self, output_path): try: with zipfile.ZipFile(self.file_path) as zip_file: zip_file.extractall(output_path) - except: - raise OSError("Cannot decompress ZIP file") + except Exception as exc: + self.log_exc(u"Cannot decompress ZIP file", exc, True, OSError) def compress(self, input_path): try: @@ -500,20 +485,23 @@ def compress(self, input_path): fullpath = os.path.join(root, f) archive_name = os.path.join(archive_root, f) zip_file.write(fullpath, archive_name) - except: - raise OSError("Cannot compress ZIP file") + except Exception as exc: + self.log_exc(u"Cannot compress ZIP file", exc, True, OSError) + -class _ContainerUnpacked(object): + +class _ContainerUnpacked(Loggable): """ An unpacked container. 
""" TAG = u"ContainerUnpacked" - def __init__(self, file_path, logger=None): + def __init__(self, file_path, rconf=None, logger=None): + super(_ContainerUnpacked, self).__init__(rconf=rconf, logger=logger) self.file_path = file_path - self.logger = logger or Logger() + @property def entries(self): try: result = [] @@ -524,32 +512,32 @@ def entries(self): relative_path = os.path.join(current_dir_abs, f)[root_len+1:] result.append(relative_path) return sorted(result) - except: - raise OSError("Cannot read entries from unpacked") + except Exception as exc: + self.log_exc(u"Cannot read entries from unpacked", exc, True, OSError) def read_entry(self, entry): try: with io.open(os.path.join(self.file_path, entry), "rb") as unpacked_entry: result = unpacked_entry.read() return result - except: - raise OSError("Cannot read entry from unpacked") + except Exception as exc: + self.log_exc(u"Cannot read entry from unpacked", exc, True, OSError) def decompress(self, output_path): try: if os.path.abspath(output_path) == os.path.abspath(self.file_path): return gf.copytree(self.file_path, output_path) - except: - raise OSError("Cannot decompress unpacked") + except Exception as exc: + self.log_exc(u"Cannot decompress unpacked", exc, True, OSError) def compress(self, input_path): try: if os.path.abspath(input_path) == os.path.abspath(self.file_path): return gf.copytree(input_path, self.file_path) - except: - raise OSError("Cannot compress unpacked") + except Exception as exc: + self.log_exc(u"Cannot compress unpacked", exc, True, OSError) diff --git a/aeneas/cwave/000_compile_driver.sh b/aeneas/cwave/000_compile_driver.sh new file mode 100644 index 00000000..4b24ca6d --- /dev/null +++ b/aeneas/cwave/000_compile_driver.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +gcc cwave_driver.c cwave_func.c cint.c -o cwave_driver -Wall -pedantic -std=c99 + + + diff --git a/aeneas/cwave/100_run_driver.sh b/aeneas/cwave/100_run_driver.sh new file mode 100644 index 00000000..37812bdd --- /dev/null +++ 
b/aeneas/cwave/100_run_driver.sh @@ -0,0 +1,33 @@ +#!/bin/bash + +if [ ! -e cwave_driver ] +then + bash 000_compile_driver.sh +fi + +echo "Run 1" +./cwave_driver +echo "" + +echo "Run 2" +./cwave_driver ../tools/res/audio.wav +echo "" + +echo "Run 3" +./cwave_driver ../tools/res/audio.wav 0 10 +echo "" + +echo "Run 4" +./cwave_driver ../tools/res/audio.wav 5 5 +echo "" + +echo "Run 5" +./cwave_driver ../tests/res/audioformats/mono.empty.wav +./cwave_driver ../tests/res/audioformats/mono.invalid.wav +./cwave_driver ../tests/res/audioformats/mono.zero.wav +./cwave_driver ../tests/res/audioformats/mono.16000.wav +./cwave_driver ../tests/res/audioformats/mono.22050.wav +./cwave_driver ../tests/res/audioformats/mono.44100.wav +./cwave_driver ../tests/res/audioformats/mono.48000.wav +echo "" + diff --git a/aeneas/cwave/800_compile_py.sh b/aeneas/cwave/800_compile_py.sh new file mode 100644 index 00000000..62de2877 --- /dev/null +++ b/aeneas/cwave/800_compile_py.sh @@ -0,0 +1,5 @@ +#!/bin/bash + +rm -rf build *.so +python cwave_setup.py build_ext --inplace + diff --git a/aeneas/cwave/README.md b/aeneas/cwave/README.md new file mode 100644 index 00000000..72daaca6 --- /dev/null +++ b/aeneas/cwave/README.md @@ -0,0 +1,22 @@ +# aeneas.cwave + +**aeneas.cwave** is a Python C extension to read WAVE files. + +## API + +See the [__init__.py](__init__.py) file. + +## Compiling the Python C extension locally + +```bash +$ python cwave_setup.py build_ext --inplace +``` + +## Compiling the pure C driver program + +```bash +$ bash 000_compile_driver.sh +``` + + + diff --git a/aeneas/cwave/__init__.py b/aeneas/cwave/__init__.py index d2dca9bf..a435100a 100644 --- a/aeneas/cwave/__init__.py +++ b/aeneas/cwave/__init__.py @@ -2,7 +2,32 @@ # coding=utf-8 """ -aeneas.cwave is a Python C extension to read WAVE files. +aeneas.cwave is a Python C extension to read WAVE mono files. + +.. 
function:: cwave.get_audio_info(audio_file_path) + + Read the sample rate and length of the given WAVE mono file. + + The returned tuple ``(sr, length)`` contains + the sample rate and the number of samples + of the WAVE file. + + :param string audio_file_path: the path of the WAVE file to be read, UTF-8 encoded + :rtype: tuple + +.. function:: cwave.read_audio_data(audio_file_path, from_sample, num_samples) + + Read audio samples from the given WAVE mono file. + + The returned tuple ``(sr, data)`` contains + the sample rate of the WAVE file, + and the samples read as a NumPy 1D array + of ``float64`` values in ``[-1.0, 1.0]``. + + :param string audio_file_path: the path of the WAVE file to be read, UTF-8 encoded + :param int from_sample: index of the first sample to be read + :param int num_samples: number of samples to be read + :rtype: tuple """ __author__ = "Alberto Pettarin" @@ -12,7 +37,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL 3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" diff --git a/aeneas/cwave/cint.c b/aeneas/cwave/cint.c new file mode 120000 index 00000000..8e1c9dae --- /dev/null +++ b/aeneas/cwave/cint.c @@ -0,0 +1 @@ +../cint/cint.c \ No newline at end of file diff --git a/aeneas/cwave/cint.h b/aeneas/cwave/cint.h new file mode 120000 index 00000000..27a6bb39 --- /dev/null +++ b/aeneas/cwave/cint.h @@ -0,0 +1 @@ +../cint/cint.h \ No newline at end of file diff --git a/aeneas/cwave/cwave_driver.c b/aeneas/cwave/cwave_driver.c index 5164cdcc..b1790e9f 100644 --- a/aeneas/cwave/cwave_driver.c +++ b/aeneas/cwave/cwave_driver.c @@ -1,6 +1,6 @@ /* -Python C Extension for computing the MFCC +Python C Extension for reading WAVE mono files. 
__author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,43 +9,44 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" */ -// -// this is a simple driver to test on the command line -// -// you can compile it with: -// -// $ gcc cwave_driver.c cwave_func.c -o cwave_driver -// -// use it as follows: -// -// ./cwave_driver audio.wav => print info about the WAVE file -// ./cwave_driver audio.wav 0 100 => print the value of the first 100 samples, as (signed) double -// ./cwave_driver audio.wav 25 75 => print the value of the samples with index (starting at 0) 25-99, as (signed) double -// - #include #include #include #include "cwave_func.h" +#define DRIVER_SUCCESS 0 +#define DRIVER_FAILURE 1 + +// print usage +void _usage(const char *prog) { + printf("\n"); + printf("Usage: $ %s AUDIO.wav [FROM_SAMPLE] [NUM_SAMPLES]\n", prog); + printf("\n"); + printf("Example: %s ../tools/res/audio.wav\n", prog); + printf(" %s ../tools/res/audio.wav 0 100\n", prog); + printf(" %s ../tools/res/audio.wav 25 75\n", prog); + printf("\n"); +} + int main(int argc, char **argv) { FILE *audio_file_ptr; struct WAVE_INFO audio_info; char *filename; double *buffer; double duration; - unsigned int i, from_sample, num_samples; + uint32_t i, from_sample, num_samples; // a WAVE file cannot have more than 2^32 samples + // parse arguments if (argc < 2) { - printf("\nUsage: $ %s AUDIO.wav [FROM_SAMPLE] [NUM_SAMPLES]\n\n", argv[0]); - return 1; + _usage(argv[0]); + return DRIVER_FAILURE; } filename = argv[1]; from_sample = 0; @@ -55,20 +56,24 @@ int main(int argc, char **argv) { num_samples = atol(argv[3]); } - memset(&audio_info, 0, sizeof(audio_info)); - if (!(audio_file_ptr = wave_open(filename, &audio_info))) { + audio_file_ptr = wave_open(filename, &audio_info); + if (audio_file_ptr == NULL) { printf("Error: cannot open file %s\n",
filename); - return 1; + return DRIVER_FAILURE; } duration = 1.0 * audio_info.coNumSamples / audio_info.leSampleRate; if (num_samples > 0) { buffer = (double *)calloc(num_samples, sizeof(double)); - if (!wave_read_double(audio_file_ptr, &audio_info, buffer, from_sample, num_samples)) { + if (buffer == NULL) { + printf("Error: cannot allocate buffer\n"); + return DRIVER_FAILURE; + } + if (wave_read_double(audio_file_ptr, &audio_info, buffer, from_sample, num_samples) != CWAVE_SUCCESS) { printf("Error: cannot read the specified range: %u %u\n", from_sample, num_samples); free((void *)buffer); buffer = NULL; - return 1; + return DRIVER_FAILURE; } for (i = 0; i < num_samples; ++i) { printf("%.12f\n", buffer[i]); @@ -86,5 +91,5 @@ int main(int argc, char **argv) { } wave_close(audio_file_ptr); - return 0; + return DRIVER_SUCCESS; } diff --git a/aeneas/cwave/cwave_func.c b/aeneas/cwave/cwave_func.c index 4aa97ff2..5e717e99 100644 --- a/aeneas/cwave/cwave_func.c +++ b/aeneas/cwave/cwave_func.c @@ -1,6 +1,6 @@ /* -Python C Extension for computing the MFCC +Python C Extension for reading WAVE mono files. 
__author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -23,207 +23,183 @@ __status__ = "Production" static const int CWAVE_BUFFER_SIZE = 4096; -// TODO make me faster and more portable -// convert a little-endian buffer to big-endian as unsigned int -// return the number of bytes read -unsigned int _be_to_le_uint(unsigned char *buffer, const int length) { - unsigned int ret; - - ret = 0; +// convert a little-endian buffer to signed double +static double _le_to_double(unsigned char *buffer, const uint32_t length) { if (length == 1) { - ret = buffer[0]; + return ((double)le_s8_to_cpu(buffer)) / 128; } if (length == 2) { - ret = buffer[0]; - ret |= ((buffer[1]) << 8); + return ((double)le_s16_to_cpu(buffer)) / 32768; } if (length == 4) { - ret = buffer[0]; - ret |= ((buffer[1]) << 8); - ret |= ((buffer[2]) << 16); - ret |= ((buffer[3]) << 24); + return ((double)le_s32_to_cpu(buffer)) / 2147483648; } - return ret; + return 0.0; } -// TODO make me faster and more portable -// convert a little-endian buffer to big-endian as unsigned int -// return the number of bytes read -int _be_to_le_int(unsigned char *buffer, const int length) { - int ret; - - ret = 0; - if (length == 1) { - ret = buffer[0]; - ret = (ret << 24) >> 24; - } - if (length == 2) { - ret = buffer[0]; - ret |= ((buffer[1]) << 8); - ret = (ret << 16) >> 16; - } - if (length == 4) { - ret = buffer[0]; - ret |= ((buffer[1]) << 8); - ret |= ((buffer[2]) << 16); - ret |= ((buffer[3]) << 24); - } - return ret; -} +// read a little-endian u16 field +static int _read_le_u16_field(FILE *ptr, uint16_t *dest) { + unsigned char buffer[2]; -// TODO make me faster and more portable -// convert a little-endian buffer to big-endian as signed double -// return the number of bytes read 
-double _be_to_le_double(unsigned char *buffer, const int length) { - if (length == 1) { - return ((double)_be_to_le_int(buffer, length)) / 128; + if (fread(buffer, 2, 1, ptr) != 1) { + return CWAVE_FAILURE; } - if (length == 2) { - return ((double)_be_to_le_int(buffer, length)) / 32768; - } - if (length == 4) { - return ((double)_be_to_le_int(buffer, length)) / 2147483648; - } - return 0; + *dest = le_u16_to_cpu(buffer); + return CWAVE_SUCCESS; } -// TODO make me faster and more portable -// read a little-endian field and convert it to big-endian into an int -// return the number of bytes read -int _read_le_field(FILE *ptr, unsigned int *dest, const int length) { - unsigned char buffer1[1]; - unsigned char buffer2[2]; - unsigned char buffer4[4]; - unsigned char *buffer; - int read; +// read a little-endian u32 field +static int _read_le_u32_field(FILE *ptr, uint32_t *dest) { + unsigned char buffer[4]; - if (length == 1) { - buffer = buffer1; - } else if (length == 2) { - buffer = buffer2; - } else if (length == 4) { - buffer = buffer4; - } else { - return 0; + if (fread(buffer, 4, 1, ptr) != 1) { + return CWAVE_FAILURE; } - read = fread(buffer, length, 1, ptr); - *dest = _be_to_le_uint(buffer, length); - return read; + *dest = le_u32_to_cpu(buffer); + return CWAVE_SUCCESS; } -// TODO make me faster and more portable // read a big-endian field -// return the number of bytes read -int _read_be_field(FILE *ptr, char *dest, const int length) { - return fread(dest, length, 1, ptr); +static int _read_be_field(FILE *ptr, char *dest, const int length) { + if (fread(dest, length, 1, ptr) != 1) { + return CWAVE_FAILURE; + } + return CWAVE_SUCCESS; } -// find the "match" chunk, and store its size in size -// return 1 on success or 0 on failure -int _seek_to_chunk(FILE *ptr, struct WAVE_INFO *header, const char *match, unsigned int *size) { +// find the "match" chunk, and store its size in "size" +static int _seek_to_chunk(FILE *ptr, struct WAVE_INFO *header, const char 
*match, uint32_t *size) { char buffer4[4]; - unsigned int chunk_size; - const unsigned int max_pos = (*header).leChunkSize + 8; // max pos in file + uint32_t chunk_size; + const uint32_t max_pos = (*header).leChunkSize + 8; // max pos in file rewind(ptr); chunk_size = 12; // skip first 12 bytes - while(ftell(ptr) + chunk_size + 8 < max_pos) { + while((ftell(ptr) >= 0) && (ftell(ptr) + chunk_size + 8 < max_pos)) { + // seek to the next chunk if (fseek(ptr, chunk_size, SEEK_CUR) != 0) { - return 0; + return CWAVE_FAILURE; } - if (_read_be_field(ptr, buffer4, 4) != 1) { - return 0; + // read the chunk description + if (_read_be_field(ptr, buffer4, 4) != CWAVE_SUCCESS) { + return CWAVE_FAILURE; } - if (_read_le_field(ptr, &chunk_size, 4) != 1) { - return 0; + // read the chunk size + if (_read_le_u32_field(ptr, &chunk_size) != CWAVE_SUCCESS) { + return CWAVE_FAILURE; } + // compare the chunk description with the desired string if (memcmp(buffer4, match, 4) == 0) { *size = chunk_size; - return 1; + return CWAVE_SUCCESS; } } - return 0; + return CWAVE_FAILURE; } -// parse the header -// it assumes the given file is a RIFF WAVE file +// open a WAVE mono file and read header info +// the header is always initialized to zero FILE *wave_open(const char *path, struct WAVE_INFO *header) { FILE *ptr; char buffer4[4]; struct WAVE_INFO h; + // initialize header + memset(header, 0, sizeof(*header)); + // open file if (path == NULL) { - printf("Error: path is NULL\n"); + //printf("Error: path is NULL\n"); return NULL; } ptr = fopen(path, "rb"); if (ptr == NULL) { - printf("Error: unable to open input file %s\n", path); + //printf("Error: unable to open input file %s\n", path); return NULL; } // read first 12 bytes: RIFF header.leChunkSize WAVE rewind(ptr); - if (_read_be_field(ptr, buffer4, 4) != 1) { - printf("Error: cannot read beChunkID\n"); + if (_read_be_field(ptr, buffer4, 4) != CWAVE_SUCCESS) { + //printf("Error: cannot read beChunkID\n"); return NULL; } if (memcmp(buffer4, 
"RIFF", 4) != 0) { - printf("Error: beChunkID is not RIFF\n"); + //printf("Error: beChunkID is not RIFF\n"); return NULL; } - if (_read_le_field(ptr, &h.leChunkSize, 4) != 1) { - printf("Error: cannot read leChunkSize\n"); + if (_read_le_u32_field(ptr, &h.leChunkSize) != CWAVE_SUCCESS) { + //printf("Error: cannot read leChunkSize\n"); return NULL; } - //printf("leChunkSize: %d\n", header.leChunkSize); - if (_read_be_field(ptr, buffer4, 4) != 1) { - printf("Error: cannot read beFormat\n"); - return 0; + if (_read_be_field(ptr, buffer4, 4) != CWAVE_SUCCESS) { + //printf("Error: cannot read beFormat\n"); + return NULL; } if (memcmp(buffer4, "WAVE", 4) != 0) { - printf("Error: beFormat is not WAVE\n"); + //printf("Error: beFormat is not WAVE\n"); return NULL; } // locate the fmt chunk - if (! _seek_to_chunk(ptr, &h, "fmt ", &h.leSubchunkFmtSize)) { - printf("Error: cannot locate fmt chunk\n"); + if (_seek_to_chunk(ptr, &h, "fmt ", &h.leSubchunkFmtSize) != CWAVE_SUCCESS) { + //printf("Error: cannot locate fmt chunk\n"); return NULL; } if (h.leSubchunkFmtSize < 16) { - printf("Error: fmt chunk has length < 16\n"); + //printf("Error: fmt chunk has length < 16\n"); return NULL; } - _read_le_field(ptr, &h.leAudioFormat, 2); - _read_le_field(ptr, &h.leNumChannels, 2); - _read_le_field(ptr, &h.leSampleRate, 4); - _read_le_field(ptr, &h.leByteRate, 4); - _read_le_field(ptr, &h.leBlockAlign, 2); - _read_le_field(ptr, &h.leBitsPerSample, 2); + + // read fields + if (_read_le_u16_field(ptr, &h.leAudioFormat) != CWAVE_SUCCESS) { + //printf("Error: cannot read leAudioFormat\n"); + return NULL; + } + // NOTE we fail here because we are only interested in PCM files! 
if (h.leAudioFormat != WAVE_FORMAT_PCM) { - printf("Error: leAudioFormat is not PCM\n"); + //printf("Error: leAudioFormat is not PCM\n"); + return NULL; + } + if (_read_le_u16_field(ptr, &h.leNumChannels) != CWAVE_SUCCESS) { + //printf("Error: cannot read leNumChannels\n"); return NULL; } + // NOTE we fail here because we are only interested in mono files! if (h.leNumChannels != WAVE_CHANNELS_MONO) { - printf("Error: leNumChannels is not 1\n"); + //printf("Error: leNumChannels is not 1\n"); + return NULL; + } + if (_read_le_u32_field(ptr, &h.leSampleRate) != CWAVE_SUCCESS) { + //printf("Error: cannot read leSampleRate\n"); + return NULL; + } + if (_read_le_u32_field(ptr, &h.leByteRate) != CWAVE_SUCCESS) { + //printf("Error: cannot read leByteRate\n"); + return NULL; + } + if (_read_le_u16_field(ptr, &h.leBlockAlign) != CWAVE_SUCCESS) { + //printf("Error: cannot read leBlockAlign\n"); + return NULL; + } + if (_read_le_u16_field(ptr, &h.leBitsPerSample) != CWAVE_SUCCESS) { + //printf("Error: cannot read leBitsPerSample\n"); return NULL; } // locate the data chunk - if (! 
_seek_to_chunk(ptr, &h, "data", &h.leSubchunkDataSize)) { - printf("Error: cannot locate data chunk\n"); + if (_seek_to_chunk(ptr, &h, "data", &h.leSubchunkDataSize) != CWAVE_SUCCESS) { + //printf("Error: cannot locate data chunk\n"); return NULL; } if (h.leSubchunkDataSize == 0) { - printf("Error: data chunk has length zero\n"); + //printf("Error: data chunk has length zero\n"); return NULL; } // here ptr is at the beginnig of the data info - h.coSubchunkDataStart = ftell(ptr); + h.coSubchunkDataStart = (uint32_t)ftell(ptr); // compute number of samples h.coNumSamples = (h.leSubchunkDataSize / (h.leNumChannels * h.leBitsPerSample / 8)); // compute number of bytes/sample (single channel) @@ -231,12 +207,12 @@ FILE *wave_open(const char *path, struct WAVE_INFO *header) { // max byte position h.coMaxDataPosition = h.coSubchunkDataStart + h.leSubchunkDataSize; - // copy h into header and return success + // copy h into header and return the pointer to the audio file *header = h; return ptr; } -// close file +// close a WAVE mono file previously opened int wave_close(FILE *ptr) { int ret; @@ -245,23 +221,22 @@ int wave_close(FILE *ptr) { return ret; } -// read number_samples samples, starting from sample with index from_sample -// and save them as doubles into dest +// read samples from an open WAVE mono file int wave_read_double( FILE *ptr, struct WAVE_INFO *header, double *dest, - const unsigned int from_sample, - const unsigned int number_samples + const uint32_t from_sample, + const uint32_t number_samples ) { unsigned char *buffer; - unsigned int target_pos; - unsigned int i, j, read, remaining; - const unsigned int bytes_per_sample = (*header).coBytesPerSample; + uint32_t target_pos; + const uint32_t bytes_per_sample = (*header).coBytesPerSample; + uint32_t i, j, read, remaining; if (from_sample + number_samples > (*header).coNumSamples) { - printf("Error: attempted reading outside data\n"); - return 0; + //printf("Error: attempted reading outside data\n"); +
return CWAVE_FAILURE; } target_pos = (*header).coSubchunkDataStart + bytes_per_sample * from_sample; @@ -279,14 +254,14 @@ int wave_read_double( read = fread(buffer, bytes_per_sample, remaining, ptr); } for (i = 0; i < read; ++i) { - dest[j++] = _be_to_le_double(buffer + i * bytes_per_sample, bytes_per_sample); + dest[j++] = _le_to_double(buffer + i * bytes_per_sample, bytes_per_sample); } remaining -= read; } free((void *)buffer); buffer = NULL; - return 1; + return CWAVE_SUCCESS; } diff --git a/aeneas/cwave/cwave_func.h b/aeneas/cwave/cwave_func.h index adfc14bd..ae429924 100644 --- a/aeneas/cwave/cwave_func.h +++ b/aeneas/cwave/cwave_func.h @@ -1,6 +1,6 @@ /* -Python C Extension for computing the MFCC +Python C Extension for reading WAVE mono files. __author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,15 +9,16 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" */ -// NOTE: using unsigned int as it is 32-bit wide on all modern architectures -// not using uint32_t because the MS C compiler does not have -// or, at least, it is not easy to use it +#include "cint.h" + +#define CWAVE_SUCCESS 0 +#define CWAVE_FAILURE 1 enum { WAVE_FORMAT_PCM = 0x0001, // PCM @@ -31,35 +32,46 @@ enum { }; struct WAVE_INFO { - // be = big endian - // le = little endian - - // read - unsigned int leChunkSize; // (size of the whole file in bytes - 8) - unsigned int leSubchunkFmtSize; // (size of the subchunk 1 in bytes - 4) - unsigned int leAudioFormat; // one of the WAVE_FORMAT_* values - unsigned int leNumChannels; // number of channels (1 = mono, 2 = stereo) - unsigned int leSampleRate; // samples per second (e.g. 
48000, 44100, 22050, 16000, 8000) - unsigned int leByteRate; // leSampleRate * leNumChannels * leBitsPerSample/8 => data bytes/s - unsigned int leBlockAlign; // beNumChannels * beBitsPerSample/8 => bytes/sample, including all channels - unsigned int leBitsPerSample; // number of bits per sample (e.g., 8, 16, 32) - unsigned int leSubchunkDataSize; // leNumSamples * leNumChannels * leBitsPerSample/8 => data bytes + // be = big endian in file => converted into cpu endianness + // le = little endian in file => converted into cpu endianness + // co = computed, always in cpu endianness + + // first 12 bytes + //uint32_t beChunkID; // string 'RIFF' + uint32_t leChunkSize; // (size of the whole file in bytes - 8) + //uint32_t beFormat; // string 'WAVE' + + // then, we have at least the SubchunkFmt and SubchunkData + // in any order, and other kinds of Subchunk can be present as well + uint32_t leSubchunkFmtSize; // (size of the subchunk 1 in bytes - 4) + uint16_t leAudioFormat; // one of the WAVE_FORMAT_* values + uint16_t leNumChannels; // number of channels (1 = mono, 2 = stereo) + uint32_t leSampleRate; // samples per second (e.g. 
48000, 44100, 22050, 16000, 8000) + uint32_t leByteRate; // leSampleRate * leNumChannels * leBitsPerSample/8 => data bytes/s + uint16_t leBlockAlign; // leNumChannels * leBitsPerSample/8 => bytes/sample, including all channels + uint16_t leBitsPerSample; // number of bits per sample (e.g., 8, 16, 32) + uint32_t leSubchunkDataSize; // leNumSamples * leNumChannels * leBitsPerSample/8 => data bytes // computed - unsigned int coNumSamples; // number of samples - unsigned int coSubchunkDataStart; // byte at which the data chunk starts - unsigned int coBytesPerSample; // leBitsPerSample / 8 => bytes/sample (single channel) - unsigned int coMaxDataPosition; // coSubchunkDataStart + leSubchunkDataSize => max byte position of data + uint32_t coNumSamples; // number of samples + uint32_t coSubchunkDataStart; // byte at which the data chunk starts + uint32_t coBytesPerSample; // leBitsPerSample / 8 => bytes/sample (single channel) + uint32_t coMaxDataPosition; // coSubchunkDataStart + leSubchunkDataSize => max byte position of data }; +// open a WAVE mono file and read header info FILE *wave_open(const char *path, struct WAVE_INFO *audio_info); + +// close an open WAVE mono file int wave_close(FILE *audio_file_ptr); + +// read samples from an open WAVE mono file int wave_read_double( FILE *audio_file_ptr, struct WAVE_INFO *audio_info, double *dest, - const unsigned int from_sample, - const unsigned int number_samples + const uint32_t from_sample, + const uint32_t number_samples ); diff --git a/aeneas/cwave/cwave_py.c b/aeneas/cwave/cwave_py.c index 6d265bb2..06493adf 100644 --- a/aeneas/cwave/cwave_py.c +++ b/aeneas/cwave/cwave_py.c @@ -1,6 +1,6 @@ /* -Python C Extension for reading WAVE files +Python C Extension for reading WAVE mono files. 
__author__ = "Alberto Pettarin" __copyright__ = """ @@ -9,7 +9,7 @@ __copyright__ = """ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -31,10 +31,9 @@ __status__ = "Production" static PyObject *get_audio_info(PyObject *self, PyObject *args) { PyObject *tuple; char *audio_file_path; - FILE *audio_file_ptr; struct WAVE_INFO audio_info; - unsigned int sample_rate, total_samples; + uint32_t sample_rate, total_samples; // a WAVE file cannot have more than 2^32 samples // s = string if (!PyArg_ParseTuple(args, "s", &audio_file_path)) { @@ -42,8 +41,8 @@ static PyObject *get_audio_info(PyObject *self, PyObject *args) { return NULL; } - memset(&audio_info, 0, sizeof(audio_info)); - if (!(audio_file_ptr = wave_open(audio_file_path, &audio_info))) { + audio_file_ptr = wave_open(audio_file_path, &audio_info); + if (audio_file_ptr == NULL) { PyErr_SetString(PyExc_ValueError, "Error while opening the WAVE file"); return NULL; } @@ -63,12 +62,11 @@ static PyObject *read_audio_data(PyObject *self, PyObject *args) { PyArrayObject *audio_data; npy_intp audio_data_dimensions[1]; char *audio_file_path; - unsigned int from_sample, num_samples; - FILE *audio_file_ptr; struct WAVE_INFO audio_info; - unsigned int sample_rate, total_samples; - double *buffer; + uint32_t from_sample, num_samples, total_samples; // a WAVE file cannot have more than 2^32 samples + uint32_t sample_rate; // sample_rate is a uint32_t in the WAVE header + double *buffer; // this buffer will store the data read // s = string // I = unsigned int @@ -77,8 +75,8 @@ static PyObject *read_audio_data(PyObject *self, PyObject *args) { return NULL; } - memset(&audio_info, 0, sizeof(audio_info)); - if (!(audio_file_ptr = wave_open(audio_file_path, &audio_info))) { + audio_file_ptr = wave_open(audio_file_path, &audio_info); + if (audio_file_ptr == NULL) { 
PyErr_SetString(PyExc_ValueError, "Error while opening the WAVE file"); return NULL; } @@ -93,7 +91,11 @@ static PyObject *read_audio_data(PyObject *self, PyObject *args) { return NULL; } buffer = (double *)calloc(num_samples, sizeof(double)); - wave_read_double(audio_file_ptr, &audio_info, buffer, from_sample, num_samples); + if (wave_read_double(audio_file_ptr, &audio_info, buffer, from_sample, num_samples) != CWAVE_SUCCESS) { + wave_close(audio_file_ptr); + PyErr_SetString(PyExc_ValueError, "Error while reading WAVE data: unable to read data"); + return NULL; + } wave_close(audio_file_ptr); // build the array to be returned @@ -114,13 +116,19 @@ static PyMethodDef cwave_methods[] = { "get_audio_info", get_audio_info, METH_VARARGS, - "Get information about a WAVE file" + "Get information about a WAVE file\n" + ":param string audio_file_path: the file path of the audio file\n" + ":rtype: tuple (sample_rate, num_samples)" }, { "read_audio_data", read_audio_data, METH_VARARGS, - "Get audio data from a WAVE file" + "Get audio data from a WAVE file\n" + ":param string audio_file_path: the file path of the audio file\n" + ":param uint from_sample: read from this sample index\n" + ":param uint num_samples: read this many samples\n" + ":rtype: tuple (sample_rate, list) where list is a list of float values, one per sample" }, { NULL, diff --git a/aeneas/cwave/cwave_setup.py b/aeneas/cwave/cwave_setup.py index 0586b2e8..513a23ed 100644 --- a/aeneas/cwave/cwave_setup.py +++ b/aeneas/cwave/cwave_setup.py @@ -23,15 +23,15 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -CMODULE = Extension("cwave", sources=["cwave_py.c", "cwave_func.c"], include_dirs=[get_include()]) +CMODULE = Extension("cwave", sources=["cwave_py.c", "cwave_func.c", "cint.c"], include_dirs=[get_include()]) setup( name="cwave", - version="1.4.1", + 
version="1.5.0", description=""" Python C Extension for for reading WAVE files. """, diff --git a/aeneas/diagnostics.py b/aeneas/diagnostics.py index 1b659451..f2b571b6 100644 --- a/aeneas/diagnostics.py +++ b/aeneas/diagnostics.py @@ -2,12 +2,16 @@ # coding=utf-8 """ -Check whether the setup of aeneas was successful. +This module contains the following classes: -Running the checks in this class makes sense only -if you git-cloned the original GitHub repository -and/or if you are interested in contributing to the -development of aeneas. +* :class:`~aeneas.diagnostics.Diagnostics`, + checking whether the setup of ``aeneas`` was successful. + +This module can be executed from command line with:: + + python -m aeneas.diagnostics + +.. versionadded:: 1.4.1 """ from __future__ import absolute_import @@ -23,46 +27,20 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" SETUP_COMMAND = u"'python setup.py build_ext --inplace'" -ANSI_ERROR = u"\033[91m" -ANSI_OK = u"\033[92m" -ANSI_WARNING = u"\033[93m" -ANSI_END = u"\033[0m" - -def print_error(msg): - if gf.is_posix(): - print(u"%s[ERRO] %s%s" % (ANSI_ERROR, msg, ANSI_END)) - else: - print(u"[ERRO] %s" % (msg)) - -def print_info(msg): - print(u"[INFO] %s" % (msg)) - -def print_success(msg): - if gf.is_posix(): - print(u"%s[INFO] %s%s" % (ANSI_OK, msg, ANSI_END)) - else: - print(u"[INFO] %s" % (msg)) - -def print_warning(msg): - if gf.is_posix(): - print(u"%s[WARN] %s%s" % (ANSI_WARNING, msg, ANSI_END)) - else: - print(u"[WARN] %s" % (msg)) - class Diagnostics(object): """ - Check whether the setup of aeneas was successful. + Check whether the setup of ``aeneas`` was successful. """ @classmethod def check_shell_encoding(cls): """ - Check whether the shell (sys.stdin and sys.stdout) is UTF-8 encoded. + Check whether ``sys.stdin`` and ``sys.stdout`` are UTF-8 encoded. 
Return ``True`` on failure and ``False`` on success. @@ -75,25 +53,25 @@ def check_shell_encoding(cls): if sys.stdout.encoding not in ["UTF-8", "UTF8"]: is_out_utf8 = False if (is_in_utf8) and (is_out_utf8): - print_success(u"shell encoding OK") + gf.print_success(u"shell encoding OK") else: - print_warning(u"shell encoding WARNING") + gf.print_warning(u"shell encoding WARNING") if not is_in_utf8: - print_warning(u" The default input encoding of your shell is not UTF-8") + gf.print_warning(u" The default input encoding of your shell is not UTF-8") if not is_out_utf8: - print_warning(u" The default output encoding of your shell is not UTF-8") - print_info(u" If you plan to use aeneas on the command line,") + gf.print_warning(u" The default output encoding of your shell is not UTF-8") + gf.print_info(u" If you plan to use aeneas on the command line,") if gf.is_posix(): - print_info(u" you might want to 'export PYTHONIOENCODING=UTF-8' in your shell") + gf.print_info(u" you might want to 'export PYTHONIOENCODING=UTF-8' in your shell") else: - print_info(u" you might want to 'set PYTHONIOENCODING=UTF-8' in your shell") - return True + gf.print_info(u" you might want to 'set PYTHONIOENCODING=UTF-8' in your shell") + return True return False @classmethod def check_ffprobe(cls): """ - Check whether ffprobe can be called. + Check whether ``ffprobe`` can be called. Return ``True`` on failure and ``False`` on success. 
@@ -101,24 +79,23 @@ def check_ffprobe(cls): """ try: from aeneas.ffprobewrapper import FFPROBEWrapper - import aeneas.globalfunctions as gf file_path = gf.absolute_path(u"tools/res/audio.mp3", __file__) prober = FFPROBEWrapper() properties = prober.read_properties(file_path) - print_success(u"ffprobe OK") + gf.print_success(u"ffprobe OK") return False except: pass - print_error(u"ffprobe ERROR") - print_info(u" Please make sure you have ffprobe installed correctly") - print_info(u" (usually it is provided by the ffmpeg installer)") - print_info(u" and that its path is in your PATH environment variable") + gf.print_error(u"ffprobe ERROR") + gf.print_info(u" Please make sure you have ffprobe installed correctly") + gf.print_info(u" (usually it is provided by the ffmpeg installer)") + gf.print_info(u" and that its path is in your PATH environment variable") return True @classmethod def check_ffmpeg(cls): """ - Check whether ffmpeg can be called. + Check whether ``ffmpeg`` can be called. Return ``True`` on failure and ``False`` on success. @@ -126,26 +103,25 @@ def check_ffmpeg(cls): """ try: from aeneas.ffmpegwrapper import FFMPEGWrapper - import aeneas.globalfunctions as gf input_file_path = gf.absolute_path(u"tools/res/audio.mp3", __file__) handler, output_file_path = gf.tmp_file(suffix=u".wav") converter = FFMPEGWrapper() result = converter.convert(input_file_path, output_file_path) gf.delete_file(handler, output_file_path) if result: - print_success(u"ffmpeg OK") + gf.print_success(u"ffmpeg OK") return False except: pass - print_error(u"ffmpeg ERROR") - print_info(u" Please make sure you have ffmpeg installed correctly") - print_info(u" and that its path is in your PATH environment variable") + gf.print_error(u"ffmpeg ERROR") + gf.print_info(u" Please make sure you have ffmpeg installed correctly") + gf.print_info(u" and that its path is in your PATH environment variable") return True @classmethod def check_espeak(cls): """ - Check whether espeak can be called. 
+ Check whether ``espeak`` can be called. Return ``True`` on failure and ``False`` on success. @@ -153,10 +129,8 @@ def check_espeak(cls): """ try: from aeneas.espeakwrapper import ESPEAKWrapper - from aeneas.language import Language - import aeneas.globalfunctions as gf text = u"From fairest creatures we desire increase," - language = Language.EN + language = u"eng" handler, output_file_path = gf.tmp_file(suffix=u".wav") espeak = ESPEAKWrapper() result = espeak.synthesize_single( @@ -166,21 +140,21 @@ def check_espeak(cls): ) gf.delete_file(handler, output_file_path) if result: - print_success(u"espeak OK") + gf.print_success(u"espeak OK") return False except: pass - print_error(u"espeak ERROR") - print_info(u" Please make sure you have espeak installed correctly") - print_info(u" and that its path is in your PATH environment variable") - print_info(u" You might also want to check that the espeak-data directory") - print_info(u" is set up correctly, for example, it has the correct permissions") + gf.print_error(u"espeak ERROR") + gf.print_info(u" Please make sure you have espeak installed correctly") + gf.print_info(u" and that its path is in your PATH environment variable") + gf.print_info(u" You might also want to check that the espeak-data directory") + gf.print_info(u" is set up correctly, for example, it has the correct permissions") return True @classmethod def check_tools(cls): """ - Check whether aeneas.tools.* can be imported. + Check whether ``aeneas.tools.*`` can be imported. Return ``True`` on failure and ``False`` on success. 
@@ -188,85 +162,87 @@ def check_tools(cls): """ try: from aeneas.tools.convert_syncmap import ConvertSyncMapCLI - from aeneas.tools.download import DownloadCLI - from aeneas.tools.espeak_wrapper import ESPEAKWrapperCLI + # disabling this check, as it contains optional dependency pafy + #from aeneas.tools.download import DownloadCLI from aeneas.tools.execute_job import ExecuteJobCLI from aeneas.tools.execute_task import ExecuteTaskCLI from aeneas.tools.extract_mfcc import ExtractMFCCCLI from aeneas.tools.ffmpeg_wrapper import FFMPEGWrapperCLI from aeneas.tools.ffprobe_wrapper import FFPROBEWrapperCLI + # disabling this check, as it contains optional dependency Pillow + #from aeneas.tools.plot_waveform import PlotWaveformCLI from aeneas.tools.read_audio import ReadAudioCLI from aeneas.tools.read_text import ReadTextCLI from aeneas.tools.run_sd import RunSDCLI from aeneas.tools.run_vad import RunVADCLI from aeneas.tools.synthesize_text import SynthesizeTextCLI from aeneas.tools.validate import ValidateCLI - print_success(u"aeneas.tools OK") + gf.print_success(u"aeneas.tools OK") return False except: pass - print_error(u"aeneas.tools ERROR") - print_info(u" Unable to import one or more aeneas.tools") - print_info(u" Please check that you installed aeneas properly") + gf.print_error(u"aeneas.tools ERROR") + gf.print_info(u" Unable to import one or more aeneas.tools") + gf.print_info(u" Please check that you installed aeneas properly") return True @classmethod def check_cdtw(cls): """ - Check whether Python C extension cdtw can be imported. + Check whether Python C extension ``cdtw`` can be imported. Return ``True`` on failure and ``False`` on success. 
:rtype: bool """ if gf.can_run_c_extension("cdtw"): - print_success(u"aeneas.cdtw COMPILED") + gf.print_success(u"aeneas.cdtw COMPILED") return False - print_warning(u"aeneas.cdtw NOT COMPILED") - print_info(u" You can still run aeneas but it will be significantly slower") - print_info(u" To compile the cdtw module, run %s" % SETUP_COMMAND) + gf.print_warning(u"aeneas.cdtw NOT COMPILED") + gf.print_info(u" You can still run aeneas but it will be significantly slower") + gf.print_info(u" To compile the cdtw module, run %s" % SETUP_COMMAND) return True @classmethod def check_cmfcc(cls): """ - Check whether Python C extension cmfcc can be imported. + Check whether Python C extension ``cmfcc`` can be imported. Return ``True`` on failure and ``False`` on success. :rtype: bool """ if gf.can_run_c_extension("cmfcc"): - print_success(u"aeneas.cmfcc COMPILED") + gf.print_success(u"aeneas.cmfcc COMPILED") return False - print_warning(u"aeneas.cmfcc NOT COMPILED") - print_info(u" You can still run aeneas but it will be significantly slower") - print_info(u" To compile the cmfcc module, run %s" % SETUP_COMMAND) + gf.print_warning(u"aeneas.cmfcc NOT COMPILED") + gf.print_info(u" You can still run aeneas but it will be significantly slower") + gf.print_info(u" To compile the cmfcc module, run %s" % SETUP_COMMAND) return True @classmethod def check_cew(cls): """ - Check whether Python C extension cew can be imported. + Check whether Python C extension ``cew`` can be imported. Return ``True`` on failure and ``False`` on success. For those OSes where ``cew`` is not available, - print a warning but also return ``False`` (success). + print a warning and return ``False`` (success). 
:rtype: bool """ if not gf.is_linux(): - print_warning(u"cew NOT AVAILABLE") - print_info(u" The Python C Extension cew is not available for your OS") - print_info(u" You can still run aeneas but it will be a bit slower (than Linux)") + gf.print_warning(u"aeneas.cew NOT AVAILABLE") + gf.print_info(u" The Python C Extension cew is not available for your OS") + gf.print_info(u" You can still run aeneas but it will be a bit slower (than Linux)") return False if gf.can_run_c_extension("cew"): - print_success(u"aeneas.cew COMPILED") + gf.print_success(u"aeneas.cew COMPILED") return False - print_warning(u"aeneas.cew NOT COMPILED") - print_info(u" You can still run aeneas but it will be a bit slower") - print_info(u" To compile the cew module, run %s" % SETUP_COMMAND) + gf.print_warning(u"aeneas.cew NOT COMPILED") + gf.print_info(u" You can still run aeneas but it will be a bit slower") + gf.print_info(u" To compile the cew module, run %s" % SETUP_COMMAND) return True @classmethod @@ -276,12 +252,9 @@ def check_all(cls, tools=True, encoding=True, c_ext=True): Return a tuple of booleans ``(errors, warnings, c_ext_warnings)``. 
- :param tools: if ``True``, check aeneas tools - :type tools: bool - :param encoding: if ``True``, check shell encoding - :type encoding: bool - :param c_ext: if ``True``, check Python C extensions - :type c_ext: bool + :param bool tools: if ``True``, check aeneas tools + :param bool encoding: if ``True``, check shell encoding + :param bool c_ext: if ``True``, check Python C extensions :rtype: (bool, bool, bool) """ # errors are fatal @@ -293,20 +266,17 @@ def check_all(cls, tools=True, encoding=True, c_ext=True): return (True, False, False) if (tools) and (cls.check_tools()): return (True, False, False) - # warnings are non-fatal warnings = False c_ext_warnings = False - if encoding: warnings = cls.check_shell_encoding() - if c_ext: # we do not want lazy evaluation c_ext_warnings = cls.check_cdtw() or c_ext_warnings c_ext_warnings = cls.check_cmfcc() or c_ext_warnings c_ext_warnings = cls.check_cew() or c_ext_warnings - + # return results return (False, warnings, c_ext_warnings) @@ -315,15 +285,11 @@ def main(): errors, warnings, c_ext_warnings = Diagnostics.check_all() if errors: sys.exit(1) - #print_info(u"") if c_ext_warnings: - print_warning(u"All required dependencies are met but at least one available Python C extension is not compiled") - #print_info(u"You can still run aeneas but it will be slower") - #print_info(u"Enjoy running aeneas!") + gf.print_warning(u"All required dependencies are met but at least one available Python C extension is not compiled") sys.exit(2) else: - print_success(u"All required dependencies are met and all available Python C extensions are compiled") - #print_info(u"Enjoy running aeneas!") + gf.print_success(u"All required dependencies are met and all available Python C extensions are compiled") sys.exit(0) diff --git a/aeneas/downloader.py b/aeneas/downloader.py index b231248d..3436b46e 100644 --- a/aeneas/downloader.py +++ b/aeneas/downloader.py @@ -2,13 +2,17 @@ # coding=utf-8 """ -Download files from various Web sources. 
+This module contains the following classes: + +* :class:`~aeneas.downloader.Downloader`, which downloads files from various Web sources. + +.. note:: This module requires Python modules ``youtube-dl`` and ``pafy`` (``pip install youtube-dl pafy``). """ from __future__ import absolute_import from __future__ import print_function -from aeneas.logger import Logger +from aeneas.logger import Loggable from aeneas.runtimeconfiguration import RuntimeConfiguration import aeneas.globalfunctions as gf @@ -19,31 +23,22 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -class Downloader(object): +class Downloader(Loggable): """ Download files from various Web sources. - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. - :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` + :type logger: :class:`~aeneas.logger.Logger` """ TAG = u"Downloader" - def __init__(self, rconf=None, logger=None): - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() - - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) - def audio_from_youtube( self, source_url, @@ -73,91 +68,80 @@ def audio_from_youtube( Return the path of the downloaded file.
- :param source_url: the URL of the YouTube video - :type source_url: string (url) - :param download: if ``True``, download the audio stream - best matching ``preferred_index`` or ``preferred_format`` - and ``largest_audio``; - if ``False``, return the list of available audio streams - :type download: bool - :param output_file_path: the path where the downloaded audio should be saved; - if ``None``, create a temporary file - :type output_file_path: string (path) - :param preferred_index: preferably download this audio stream - :type preferred_index: int - :param largest_audio: if ``True``, download the largest audio stream available; - if ``False``, download the smallest one. - :type largest_audio: bool - :param preferred_format: preferably download this audio format - :type preferred_format: string - - :rtype: string (path) or list of pafy audio streams - - :raise ImportError: if ``pafy`` is not installed - :raise OSError: if ``output_file_path`` cannot be written - :raise ValueError: if ``source_url`` is not a valid YouTube URL + :param string source_url: the URL of the YouTube video + :param bool download: if ``True``, download the audio stream + best matching ``preferred_index`` or ``preferred_format`` + and ``largest_audio``; + if ``False``, return the list of available audio streams + :param string output_file_path: the path where the downloaded audio should be saved; + if ``None``, create a temporary file + :param int preferred_index: preferably download this audio stream + :param bool largest_audio: if ``True``, download the largest audio stream available; + if ``False``, download the smallest one. 
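The stream-selection rules described for `audio_from_youtube` can be sketched without `pafy`. The order is: a valid explicit index wins, then the preferred format filters the candidates (falling back to all streams if none match), then size decides. `Stream` and the standalone `select_audiostream` signature are hypothetical stand-ins. The real code works on pafy stream objects and calls `get_filesize()` on them.

```python
from collections import namedtuple

# Hypothetical stand-in for a pafy audio stream: only the two attributes
# the selection logic needs (file extension and file size in bytes).
Stream = namedtuple("Stream", ["extension", "filesize"])


def select_audiostream(streams, preferred_index=None, preferred_format=None, largest_audio=True):
    """Pick one stream following the audio_from_youtube selection rules (sketch)."""
    # 1. An explicit, in-range index wins outright; an out-of-range one is ignored.
    if preferred_index is not None and 0 <= preferred_index < len(streams):
        return streams[preferred_index]
    # 2. Keep only streams in the preferred format, falling back to all streams.
    candidates = streams
    if preferred_format is not None:
        matching = [s for s in streams if s.extension == preferred_format]
        if matching:
            candidates = matching
    # 3. Order by size with an explicit key, so equal sizes never compare stream objects.
    ordered = sorted(candidates, key=lambda s: s.filesize)
    return ordered[-1] if largest_audio else ordered[0]
```

The diff sorts `(size, stream)` tuples instead. On equal sizes, Python would then compare the stream objects themselves, which may fail if they define no ordering. The explicit `key=` avoids that case.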
+ :param string preferred_format: preferably download this audio format + :rtype: string or list of pafy audio streams + :raises: ImportError: if ``pafy`` is not installed + :raises: OSError: if ``output_file_path`` cannot be written + :raises: ValueError: if ``source_url`` is not a valid YouTube URL """ def select_audiostream(audiostreams): """ Select the audiostream best matching the given parameters. """ if preferred_index is not None: if preferred_index in range(len(audiostreams)): - self._log([u"Selecting audiostream with index %d", preferred_index]) + self.log([u"Selecting audiostream with index %d", preferred_index]) return audiostreams[preferred_index] else: - self._log([u"Audio stream index %d not allowed", preferred_index], Logger.WARNING) - self._log(u"Ignoring the requested audio stream index", Logger.WARNING) - # filter by preferred format + self.log_warn([u"Audio stream index '%d' not allowed", preferred_index]) + self.log_warn(u"Ignoring the requested audio stream index") + # selecting by preferred format streams = audiostreams if preferred_format is not None: - self._log([u"Filtering audiostreams by preferred format %s", preferred_format]) + self.log([u"Selecting audiostreams by preferred format %s", preferred_format]) streams = [audiostream for audiostream in streams if audiostream.extension == preferred_format] if len(streams) < 1: - self._log([u"No audiostream with preferred format %s", preferred_format]) + self.log([u"No audiostream with preferred format %s", preferred_format]) streams = audiostreams # sort by size streams = sorted([(audio.get_filesize(), audio) for audio in streams]) if largest_audio: - self._log(u"Selecting largest audiostream") + self.log(u"Selecting largest audiostream") selected = streams[-1][1] else: - self._log(u"Selecting smallest audiostream") + self.log(u"Selecting smallest audiostream") selected = streams[0][1] return selected try: import pafy except ImportError as exc: - self._log(u"pafy is not installed", 
Logger.CRITICAL) - raise exc + self.log_exc(u"Python module pafy is not installed", exc, True, ImportError) try: video = pafy.new(source_url) except (IOError, OSError, ValueError) as exc: - self._log([u"The specified source URL '%s' is not a valid YouTube URL", source_url], Logger.CRITICAL) - raise ValueError("The specified source URL is not a valid YouTube URL") + self.log_exc(u"The specified source URL '%s' is not a valid YouTube URL or you are offline" % (source_url), exc, True, ValueError) if not download: - self._log(u"Returning the list of audio streams") + self.log(u"Returning the list of audio streams") return video.audiostreams output_path = output_file_path if output_file_path is None: - self._log(u"output_path is None: creating temp file") - handler, output_path = gf.tmp_file(root=self.rconf["tmp_path"]) + self.log(u"output_path is None: creating temp file") + handler, output_path = gf.tmp_file(root=self.rconf[RuntimeConfiguration.TMP_PATH]) else: if not gf.file_can_be_written(output_path): - self._log([u"Path '%s' cannot be written (wrong permissions?)", output_path], Logger.CRITICAL) - raise OSError("Path '%s' cannot be written (wrong permissions?)" % output_path) + self.log_exc(u"Path '%s' cannot be written. Wrong permissions?" % (output_path), None, True, OSError) audiostream = select_audiostream(video.audiostreams) if output_file_path is None: gf.delete_file(handler, output_path) output_path += "." + audiostream.extension - self._log([u"output_path is '%s'", output_path]) - self._log(u"Downloading...") + self.log([u"output_path is '%s'", output_path]) + self.log(u"Downloading...") audiostream.download(filepath=output_path, quiet=True) - self._log(u"Downloading... done") + self.log(u"Downloading... done") return output_path diff --git a/aeneas/dtw.py b/aeneas/dtw.py index 1f4a0bfd..afe168d9 100644 --- a/aeneas/dtw.py +++ b/aeneas/dtw.py @@ -7,24 +7,28 @@ to align two audio waves, represented by their Mel-frequency cepstral coefficients (MFCCs). 
-The two classes provided by this module are: +This module contains the following classes: -1. :class:`aeneas.dtw.DTWAlgorithm` - is an enumeration of the available algorithms. -2. :class:`aeneas.dtw.DTWAligner` - is the actual feature extractor and aligner. +* :class:`~aeneas.dtw.DTWAlgorithm`, + an enumeration of the available algorithms; +* :class:`~aeneas.dtw.DTWAligner`, + the actual wave aligner; +* :class:`~aeneas.dtw.DTWExact`, + a DTW aligner implementing the exact (full) DTW algorithm; +* :class:`~aeneas.dtw.DTWStripe`, + a DTW aligner implementing the Sakoe-Chiba band heuristic. To align two wave files: -1. build an :class:`aeneas.dtw.DTWAligner` object - passing the paths of the two wave files - in the constructor, possibly with custom arguments - to fine-tune the alignment; -2. call ``compute_mfcc`` to extract the MFCCs of the two wave files; -3. call ``compute_path`` to compute the min cost path between - the MFCC representations of the two wave files; -4. obtain the map between the two wave files by reading the - ``computed_map`` property. +1. build an :class:`~aeneas.dtw.DTWAligner` object, + passing in the constructor + the paths of the two wave files + or their MFCC representations; +2. call :func:`~aeneas.dtw.DTWAligner.compute_path` + to compute the min cost path between + the MFCC representations of the two wave files. + +.
warning:: This module might be refactored in a future version """ from __future__ import absolute_import @@ -32,8 +36,8 @@ from __future__ import print_function import numpy -from aeneas.audiofile import AudioFileMonoWAVE -from aeneas.logger import Logger +from aeneas.audiofilemfcc import AudioFileMFCC +from aeneas.logger import Loggable from aeneas.runtimeconfiguration import RuntimeConfiguration import aeneas.globalfunctions as gf @@ -44,7 +48,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -58,18 +62,19 @@ class DTWAlgorithm(object): """ Classical (exact) DTW algorithm. This implementation has ``O(nm)`` time and space complexity, - where ``n`` (respectively, ``m``) is the number of MFCCs + where ``n`` (respectively, ``m``) is the number of MFCC window shifts (vectors) of the real (respectively, synthesized) wave. """ STRIPE = "stripe" """ DTW algorithm restricted to a stripe around the main diagonal - (Sakoe-Chiba Band), for optimized memory usage and processing. + (Sakoe-Chiba Band), for reducing memory usage and run time. Note that this is an heuristic approximation of the optimal (exact) path. This implementation has ``O(nd)`` time and space complexity, - where ``n`` is the number of MFCCs of the real wave, - and ``d`` is the number of MFCCs + where ``n`` is the number of MFCC window shifts (vectors) + of the real wave, + and ``d`` is the number of MFCC window shifts corresponding to the margin. """ ALLOWED_VALUES = [EXACT, STRIPE] @@ -77,246 +82,219 @@ class DTWAlgorithm(object): -class DTWAligner(object): +class DTWAlignerNotInitialized(Exception): """ - The MFCC extractor and wave aligner. - - :param real_wave_path: the path to the real wav file (must be mono!) - :type real_wave_path: string (path) - :param synt_wave_path: the path to the synthesized wav file (must be mono!) 
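The stripe geometry that `DTWStripe._compute_cost_matrix` uses later in this diff can be illustrated with a small NumPy sketch:

* row `i` of the real wave is centered on column `(m * i) // n` of the synthesized wave;
* the window has width `delta`;
* the window is clamped inside `[0, m)`.

The names `band_ranges` and `banded_cost_matrix` are hypothetical. The vectorized per-row dot product simplifies the module's nested loops but computes the same cosine distance. The result is an `(n, delta)` matrix, so memory stays `O(nd)` rather than `O(nm)`.

```python
import numpy


def band_ranges(n, m, delta):
    """For each of the n real-wave frames, return the [start, end) window of
    synthesized-wave frames kept inside a stripe of width delta."""
    delta = min(delta, m)  # the stripe cannot be wider than the synt wave
    ranges = []
    for i in range(n):
        center_j = (m * i) // n  # point on the rescaled main diagonal
        start = max(0, center_j - delta // 2)
        end = start + delta
        if end > m:  # shift the window back inside [0, m)
            end = m
            start = end - delta
        ranges.append((start, end))
    return ranges


def banded_cost_matrix(mfcc1, mfcc2, delta):
    """Cosine-distance costs restricted to the stripe, shape (n, min(delta, m)).
    mfcc1 and mfcc2 are (coefficients x frames) arrays."""
    # discard the 0-th MFCC component, as the module's code does
    mfcc1, mfcc2 = mfcc1[1:, :], mfcc2[1:, :]
    n, m = mfcc1.shape[1], mfcc2.shape[1]
    width = min(delta, m)
    norms1 = numpy.linalg.norm(mfcc1, axis=0)
    norms2 = numpy.linalg.norm(mfcc2, axis=0)
    cost = numpy.zeros((n, width))
    centers = numpy.zeros(n, dtype=int)  # first column of each row's window
    for i, (start, end) in enumerate(band_ranges(n, m, delta)):
        dots = mfcc1[:, i].dot(mfcc2[:, start:end])
        cost[i, :] = 1.0 - dots / (norms1[i] * norms2[start:end])
        centers[i] = start
    return cost, centers
```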
- :type synt_wave_path: string (path) - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. - :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` - :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` + Error raised when trying to compute + using a DTWAligner object whose real and/or synt waves + are not initialized yet. + """ + pass - :raise ValueError: if ``real_wave_path`` or ``synt_wave_path`` is ``None`` - or it does not exist, or if ``algorithm`` is not an allowed value + + +class DTWAligner(Loggable): + """ + The audio wave aligner. + + The two waves, henceforth named real and synthesized, + can be passed as :class:`~aeneas.audiofilemfcc.AudioFileMFCC` objects + or as file paths. + In the latter case, MFCCs will be extracted upon object creation. + + :param real_wave_mfcc: the real audio file + :type real_wave_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC` + :param synt_wave_mfcc: the synthesized audio file + :type synt_wave_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC` + :param string real_wave_path: the path to the real audio file + :param string synt_wave_path: the path to the synthesized audio file + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` + :param logger: the logger object + :type logger: :class:`~aeneas.logger.Logger` + :raises: ValueError: if ``real_wave_mfcc`` or ``synt_wave_mfcc`` is not ``None`` + but not of type :class:`~aeneas.audiofilemfcc.AudioFileMFCC` + :raises: ValueError: if ``real_wave_path`` or ``synt_wave_path`` is not ``None`` + but it cannot be read """ TAG = u"DTWAligner" - def __init__(self, real_wave_path=None, synt_wave_path=None, rconf=None, logger=None): + def __init__( + self, + real_wave_mfcc=None, + synt_wave_mfcc=None, + real_wave_path=None, + synt_wave_path=None, + rconf=None, + logger=None + ): + if (real_wave_mfcc is not None) and
(type(real_wave_mfcc) is not AudioFileMFCC): + raise ValueError(u"Real wave mfcc must be None or of type AudioFileMFCC") + if (synt_wave_mfcc is not None) and (type(synt_wave_mfcc) is not AudioFileMFCC): + raise ValueError(u"Synt wave mfcc must be None or of type AudioFileMFCC") if (real_wave_path is not None) and (not gf.file_can_be_read(real_wave_path)): - raise ValueError("Real wave cannot be read") + raise ValueError(u"Real wave cannot be read") if (synt_wave_path is not None) and (not gf.file_can_be_read(synt_wave_path)): - raise ValueError("Synt wave cannot be read") - if (rconf is not None) and (rconf["dtw_algorithm"] not in DTWAlgorithm.ALLOWED_VALUES): - raise ValueError("Algorithm value not allowed") - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() + raise ValueError(u"Synt wave cannot be read") + if (rconf is not None) and (rconf[RuntimeConfiguration.DTW_ALGORITHM] not in DTWAlgorithm.ALLOWED_VALUES): + raise ValueError(u"Algorithm value not allowed") + super(DTWAligner, self).__init__(rconf=rconf, logger=logger) + self.real_wave_mfcc = real_wave_mfcc + self.synt_wave_mfcc = synt_wave_mfcc self.real_wave_path = real_wave_path self.synt_wave_path = synt_wave_path - self.real_wave_full_mfcc = None - self.synt_wave_full_mfcc = None - self.real_wave_length = None - self.synt_wave_length = None - self.computed_path = None - - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) - - @property - def real_wave_full_mfcc(self): - """ - MFCCs of the real wave, including the 0-th. - - :rtype: numpy 2D array - - .. 
versionadded:: 1.1.0 - """ - return self.__real_wave_full_mfcc - - @real_wave_full_mfcc.setter - def real_wave_full_mfcc(self, real_wave_full_mfcc): - self.__real_wave_full_mfcc = real_wave_full_mfcc + if (self.real_wave_mfcc is None) and (self.real_wave_path is not None): + self.real_wave_mfcc = AudioFileMFCC(self.real_wave_path, rconf=self.rconf, logger=self.logger) + if (self.synt_wave_mfcc is None) and (self.synt_wave_path is not None): + self.synt_wave_mfcc = AudioFileMFCC(self.synt_wave_path, rconf=self.rconf, logger=self.logger) - @property - def real_wave_length(self): - """ - The length, in seconds, of the real wave. - - :rtype: float - - .. versionadded:: 1.1.0 + def compute_accumulated_cost_matrix(self): """ - return self.__real_wave_length + Compute the accumulated cost matrix, and return it. - @real_wave_length.setter - def real_wave_length(self, real_wave_length): - self.__real_wave_length = real_wave_length + :rtype: :class:`numpy.ndarray` (2D) + :raises: RuntimeError: if both the C extension and + the pure Python code did not succeed. - @property - def synt_wave_full_mfcc(self): + .. versionadded:: 1.2.0 """ - MFCCs of the synthesized wave, including the 0-th. - - :rtype: numpy 2D array + dtw = self._setup_dtw() + self.log(u"Returning accumulated cost matrix") + return dtw.compute_accumulated_cost_matrix() - .. versionadded:: 1.1.0 + def compute_path(self): """ - return self.__synt_wave_full_mfcc + Compute the min cost path between the two waves, and return it. - @synt_wave_full_mfcc.setter - def synt_wave_full_mfcc(self, synt_wave_full_mfcc): - self.__synt_wave_full_mfcc = synt_wave_full_mfcc + Return the computed path as a tuple with two elements, + each being a :class:`numpy.ndarray` (1D) of ``int`` indices: :: - @property - def synt_wave_length(self): - """ - The length, in seconds, of the synthesized wave. 
+ ([r_1, r_2, ..., r_k], [s_1, s_2, ..., s_k]) - :rtype: float + where ``r_i`` are the indices in the real wave + and ``s_i`` are the indices in the synthesized wave, + and ``k`` is the length of the min cost path. - .. versionadded:: 1.1.0 + :rtype: tuple (see above) + :raises: RuntimeError: if both the C extension and + the pure Python code did not succeed. """ - return self.__synt_wave_length - - @synt_wave_length.setter - def synt_wave_length(self, synt_wave_length): - self.__synt_wave_length = synt_wave_length - - def compute_mfcc(self, real_wave=True, synt_wave=True): - """ - Compute the MFCCs of the two waves, - and store them internally. - - :param real_wave: if ``True``, extract MFCCs for the real wave - :type real_wave: bool - :param synt_wave: if ``True``, extract MFCCs for the synt wave - :type synt_wave: bool - - :raise OSError: if the real or synt wave file cannot be read + dtw = self._setup_dtw() + self.log(u"Computing path...") + wave_path = dtw.compute_path() + self.log(u"Computing path... done") + self.log(u"Translating path to full wave indices...") + real_indices = numpy.array([t[0] for t in wave_path]) + synt_indices = numpy.array([t[1] for t in wave_path]) + # TODO this depends whether we are masking or not + real_indices += self.real_wave_mfcc.head_length + self.log(u"Translating path to full wave indices... done") + return (real_indices, synt_indices) + + def compute_boundaries(self, synt_anchors): """ - if real_wave: - if not gf.file_can_be_read(self.real_wave_path): - raise OSError("Real wave path is None or it cannot be read") - self._log(u"Computing MFCCs for real wave...") - wave = AudioFileMonoWAVE(self.real_wave_path, rconf=self.rconf, logger=self.logger) - wave.extract_mfcc() - self.real_wave_full_mfcc = wave.audio_mfcc - self.real_wave_length = wave.audio_length - self._log(u"Computing MFCCs for real wave... 
done") - - if synt_wave: - if not gf.file_can_be_read(self.synt_wave_path): - raise OSError("Synt wave path is None or it cannot be read") - self._log(u"Computing MFCCs for synt wave...") - wave = AudioFileMonoWAVE(self.synt_wave_path, rconf=self.rconf, logger=self.logger) - wave.extract_mfcc() - self.synt_wave_full_mfcc = wave.audio_mfcc - self.synt_wave_length = wave.audio_length - self._log(u"Computing MFCCs for synt wave... done") + Compute the min cost path between the two waves, + and return a list of boundary points, + representing the argmin values with respect to + the provided ``synt_anchors`` timings. - def compute_accumulated_cost_matrix(self): - """ - Compute the accumulated cost matrix, - and return it. + If ``synt_anchors`` has ``k`` elements, + the returned array will have ``k+1`` elements, + accounting for the tail fragment. - :rtype: numpy 2D array + :param synt_anchors: the anchor time values (in seconds) of the synthesized fragments, + each representing the begin time in the synthesized wave + of the corresponding fragment + :type synt_anchors: list of :class:`~aeneas.timevalue.TimeValue` - :raise RuntimeError: if both the C extension and - the pure Python code did not succeed. + Return the list of boundary indices. - .. versionadded:: 1.2.0 + :rtype: :class:`numpy.ndarray` (1D) """ - dtw = self._setup_dtw() - self._log(u"Returning accumulated cost matrix") - return dtw.compute_accumulated_cost_matrix() + self.log(u"Computing path...") + real_indices, synt_indices = self.compute_path() + self.log(u"Computing path... done") + + self.log(u"Computing boundary indices...") + # both real_indices and synt_indices are w.r.t. 
the full wave + self.log([u"Fragments: %d", len(synt_anchors)]) + self.log([u"Path length: %d", len(real_indices)]) + # synt_anchors are in seconds: convert them to MFCC indices + mws = self.rconf.mws + anchor_indices = numpy.array([int(a[0] / mws) for a in synt_anchors]) + # right side sets the split point at the very beginning of "next" fragment + begin_indices = numpy.searchsorted(synt_indices, anchor_indices, side="right") + # first split must occur at zero + begin_indices[0] = 0 + # map onto real indices, obtaining "default" boundary indices + boundary_indices = numpy.append(real_indices[begin_indices], self.real_wave_mfcc.tail_begin) + self.log([u"Boundary indices: %d", len(boundary_indices)]) + self.log(u"Computing boundary indices... done") + return boundary_indices - def compute_path(self): + def _setup_dtw(self): """ - Compute the min cost path between the two waves, - and store it internally. - - :raise RuntimeError: if both the C extension and - the pure Python code did not succeed. + Set the DTW object up. """ - dtw = self._setup_dtw() - self._log(u"Computing path...") - self.computed_path = dtw.compute_path() - self._log(u"Computing path...
done") + # check we have the AudioFileMFCC objects + if (self.real_wave_mfcc is None) or (self.real_wave_mfcc.middle_mfcc is None): + self.log_exc(u"The real wave MFCCs are not initialized", None, True, DTWAlignerNotInitialized) + if (self.synt_wave_mfcc is None) or (self.synt_wave_mfcc.middle_mfcc is None): + self.log_exc(u"The synt wave MFCCs are not initialized", None, True, DTWAlignerNotInitialized) - def _setup_dtw(self): - """ Setup DTW object """ # setup - algorithm = self.rconf["dtw_algorithm"] - delta = int(2 * self.rconf["dtw_margin"] / self.rconf["mfcc_win_shift"]) - mfcc2_length = self.synt_wave_full_mfcc.shape[1] - self._log([u"Requested algorithm: '%s'", algorithm]) - self._log([u"delta = %d", delta]) - self._log([u"m = %d", mfcc2_length]) + algorithm = self.rconf[RuntimeConfiguration.DTW_ALGORITHM] + delta = int(2 * self.rconf[RuntimeConfiguration.DTW_MARGIN] / self.rconf[RuntimeConfiguration.MFCC_WINDOW_SHIFT]) + mfcc2_length = self.synt_wave_mfcc.middle_length + self.log([u"Requested algorithm: '%s'", algorithm]) + self.log([u"delta = %d", delta]) + self.log([u"m = %d", mfcc2_length]) # check if delta is >= length of synt wave if mfcc2_length <= delta: - self._log(u"We have mfcc2_length <= delta") - if (self.rconf["c_ext"]) and (gf.can_run_c_extension()): + self.log(u"We have mfcc2_length <= delta") + if (self.rconf[RuntimeConfiguration.C_EXTENSIONS]) and (gf.can_run_c_extension()): # the C code can be run: since it is still faster, do not run EXACT - self._log(u"C extensions enabled and loaded: not selecting EXACT algorithm") + self.log(u"C extensions enabled and loaded: not selecting EXACT algorithm") else: - self._log(u"Selecting EXACT algorithm") + self.log(u"Selecting EXACT algorithm") algorithm = DTWAlgorithm.EXACT # execute the selected algorithm if algorithm == DTWAlgorithm.EXACT: - self._log(u"Computing with EXACT algo") + self.log(u"Computing with EXACT algo") dtw = DTWExact( - self.real_wave_full_mfcc, - self.synt_wave_full_mfcc, - 
self.logger + self.real_wave_mfcc.middle_mfcc, + self.synt_wave_mfcc.middle_mfcc, + rconf=self.rconf, + logger=self.logger ) else: - self._log(u"Computing with STRIPE algo") + self.log(u"Computing with STRIPE algo") dtw = DTWStripe( - self.real_wave_full_mfcc, - self.synt_wave_full_mfcc, + self.real_wave_mfcc.middle_mfcc, + self.synt_wave_mfcc.middle_mfcc, delta, - self.logger + rconf=self.rconf, + logger=self.logger ) return dtw - @property - def computed_map(self): - """ - Return the computed map between the two waves, - as a list of lists, each being a pair of floats: :: - [[r_1, s_1], [r_2, s_2], ..., [r_k, s_k]] - where ``r_i`` are the time instants in the real wave - and ``s_i`` are the time instants in the synthesized wave, - and ``k = n + m`` (or ``k = n + d``) - is the length of the min cost path. - :rtype: list of pairs of floats (see above) - """ - result = [] - for i in range(len(self.computed_path)): - real_time = self.computed_path[i][0] * self.rconf["mfcc_win_shift"] - synt_time = self.computed_path[i][1] * self.rconf["mfcc_win_shift"] - result.append([real_time, synt_time]) - return result - - - -class DTWStripe(object): +class DTWStripe(Loggable): TAG = u"DTWStripe" - def __init__(self, m1, m2, delta, logger=None): + def __init__(self, m1, m2, delta, rconf=None, logger=None): + super(DTWStripe, self).__init__(rconf=rconf, logger=logger) self.m1 = m1 self.m2 = m2 self.delta = delta - self.logger = logger or Logger() - - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) def compute_accumulated_cost_matrix(self): return gf.run_c_extension_with_fallback( - self._log, + self.log, "cdtw", self._compute_acm_c_extension, self._compute_acm_pure_python, @@ -325,47 +303,43 @@ def compute_accumulated_cost_matrix(self): ) def _compute_acm_c_extension(self): - self._log(u"Computing acm using C extension...") + self.log(u"Computing acm using C extension...") try: - self._log(u"Importing cdtw...") + 
self.log(u"Importing cdtw...") import aeneas.cdtw.cdtw - self._log(u"Importing cdtw... done") + self.log(u"Importing cdtw... done") # discard first MFCC component mfcc1 = self.m1[1:, :] mfcc2 = self.m2[1:, :] n = mfcc1.shape[1] m = mfcc2.shape[1] delta = self.delta - self._log([u"n m delta: %d %d %d", n, m, delta]) + self.log([u"n m delta: %d %d %d", n, m, delta]) if delta > m: - self._log(u"Limiting delta to m") + self.log(u"Limiting delta to m") delta = m cost_matrix, centers = aeneas.cdtw.cdtw.compute_cost_matrix_step(mfcc1, mfcc2, delta) accumulated_cost_matrix = aeneas.cdtw.cdtw.compute_accumulated_cost_matrix_step(cost_matrix, centers) - self._log(u"Computing acm using C extension... done") + self.log(u"Computing acm using C extension... done") return (True, accumulated_cost_matrix) except Exception as exc: - self._log(u"Computing acm using C extension... failed") - self._log(u"An unexpected exception occurred while running cdtw:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) + self.log_exc(u"An unexpected error occurred while running cdtw", exc, False, None) return (False, None) def _compute_acm_pure_python(self): - self._log(u"Computing acm using pure Python code...") + self.log(u"Computing acm using pure Python code...") try: cost_matrix, centers = self._compute_cost_matrix() accumulated_cost_matrix = self._compute_accumulated_cost_matrix(cost_matrix, centers) - self._log(u"Computing acm using pure Python code... done") + self.log(u"Computing acm using pure Python code... done") return (True, accumulated_cost_matrix) except Exception as exc: - self._log(u"Computing acm using pure Python code... 
failed") - self._log(u"An unexpected exception occurred while running pure Python code:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) + self.log_exc(u"An unexpected error occurred while running pure Python code", exc, False, None) return (False, None) def compute_path(self): return gf.run_c_extension_with_fallback( - self._log, + self.log, "cdtw", self._compute_path_c_extension, self._compute_path_pure_python, @@ -374,50 +348,46 @@ def compute_path(self): ) def _compute_path_c_extension(self): - self._log(u"Computing path using C extension...") + self.log(u"Computing path using C extension...") try: - self._log(u"Importing cdtw...") + self.log(u"Importing cdtw...") import aeneas.cdtw.cdtw - self._log(u"Importing cdtw... done") + self.log(u"Importing cdtw... done") # discard first MFCC component mfcc1 = self.m1[1:, :] mfcc2 = self.m2[1:, :] n = mfcc1.shape[1] m = mfcc2.shape[1] delta = self.delta - self._log([u"n m delta: %d %d %d", n, m, delta]) + self.log([u"n m delta: %d %d %d", n, m, delta]) if delta > m: - self._log(u"Limiting delta to m") + self.log(u"Limiting delta to m") delta = m best_path = aeneas.cdtw.cdtw.compute_best_path( mfcc1, mfcc2, delta ) - self._log(u"Computing path using C extension... done") + self.log(u"Computing path using C extension... done") return (True, best_path) except Exception as exc: - self._log(u"Computing path using C extension... 
failed") - self._log(u"An unexpected exception occurred while running cdtw:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) + self.log_exc(u"An unexpected error occurred while running cdtw", exc, False, None) return (False, None) def _compute_path_pure_python(self): - self._log(u"Computing path using pure Python code...") + self.log(u"Computing path using pure Python code...") try: cost_matrix, centers = self._compute_cost_matrix() accumulated_cost_matrix = self._compute_accumulated_cost_matrix(cost_matrix, centers) best_path = self._compute_best_path(accumulated_cost_matrix, centers) - self._log(u"Computing path using pure Python code... done") + self.log(u"Computing path using pure Python code... done") return (True, best_path) except Exception as exc: - self._log(u"Computing path using pure Python code... failed") - self._log(u"An unexpected exception occurred while running cdtw:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) + self.log_exc(u"An unexpected error occurred while running pure Python code", exc, False, None) return (False, None) def _compute_cost_matrix(self): - self._log(u"Computing cost matrix...") + self.log(u"Computing cost matrix...") # discard first MFCC component mfcc1 = self.m1[1:, :] mfcc2 = self.m2[1:, :] @@ -426,28 +396,28 @@ def _compute_cost_matrix(self): n = mfcc1.shape[1] m = mfcc2.shape[1] delta = self.delta - self._log([u"n m delta: %d %d %d", n, m, delta]) + self.log([u"n m delta: %d %d %d", n, m, delta]) if delta > m: - self._log(u"Limiting delta to m") + self.log(u"Limiting delta to m") delta = m cost_matrix = numpy.zeros((n, delta)) centers = numpy.zeros(n) for i in range(n): # center j at row i center_j = (m * i) // n - #self._log([u"Center at row %d is %d", i, center_j]) + #self.log([u"Center at row %d is %d", i, center_j]) range_start = max(0, center_j - (delta // 2)) range_end = range_start + delta if range_end > m: range_end = m range_start = range_end - delta centers[i] = range_start - 
#self._log([u"Range at row %d is %d %d", i, range_start, range_end]) + #self.log([u"Range at row %d is %d %d", i, range_start, range_end]) for j in range(range_start, range_end): tmp = mfcc1[:, i].transpose().dot(mfcc2[:, j]) tmp /= norm2_1[i] * norm2_2[j] cost_matrix[i][j - range_start] = 1 - tmp - self._log(u"Computing cost matrix... done") + self.log(u"Computing cost matrix... done") return (cost_matrix, centers) def _compute_accumulated_cost_matrix(self, cost_matrix, centers): @@ -458,9 +428,9 @@ def _compute_accumulated_cost_matrix(self, cost_matrix, centers): return self._compute_acm_in_place(cost_matrix, centers) def _compute_acm_in_place(self, cost_matrix, centers): - self._log(u"Computing the acm with the in-place algorithm...") + self.log(u"Computing the acm with the in-place algorithm...") n, delta = cost_matrix.shape - self._log([u"n delta: %d %d", n, delta]) + self.log([u"n delta: %d %d", n, delta]) current_row = numpy.copy(cost_matrix[0, :]) #cost_matrix[0][0] = current_row[0] for j in range(1, delta): @@ -480,15 +450,15 @@ def _compute_acm_in_place(self, cost_matrix, centers): if ((j+offset-1) < delta) and ((j+offset-1) >= 0): cost2 = cost_matrix[i-1][j+offset-1] cost_matrix[i][j] = current_row[j] + min(cost0, cost1, cost2) - self._log(u"Computing the acm with the in-place algorithm... done") + self.log(u"Computing the acm with the in-place algorithm... 
done") return cost_matrix # DISABLED #def _compute_acm_not_in_place(self, cost_matrix, centers): - # self._log(u"Computing the acm with the not-in-place algorithm...") + # self.log(u"Computing the acm with the not-in-place algorithm...") # acc_matrix = numpy.zeros(cost_matrix.shape) # n, delta = acc_matrix.shape - # self._log([u"n delta: %d %d", n, delta]) + # self.log([u"n delta: %d %d", n, delta]) # # first row # acc_matrix[0][0] = cost_matrix[0][0] # for j in range(1, delta): @@ -507,14 +477,14 @@ def _compute_acm_in_place(self, cost_matrix, centers): # if ((j+offset-1) < delta) and ((j+offset-1) >= 0): # cost2 = acc_matrix[i-1][j+offset-1] # acc_matrix[i][j] = cost_matrix[i][j] + min(cost0, cost1, cost2) - # self._log(u"Computing the acm with the not-in-place algorithm... done") + # self.log(u"Computing the acm with the not-in-place algorithm... done") # return acc_matrix def _compute_best_path(self, acc_matrix, centers): - self._log(u"Computing best path...") + self.log(u"Computing best path...") # get dimensions n, delta = acc_matrix.shape - self._log([u"n delta: %d %d", n, delta]) + self.log([u"n delta: %d %d", n, delta]) i = n - 1 j = delta - 1 + centers[i] path = [(i, j)] @@ -549,61 +519,57 @@ def _compute_best_path(self, acc_matrix, centers): (i-1, j-1) ] min_cost = numpy.argmin(costs) - #self._log([u"Selected min cost move %d", min_cost]) + #self.log([u"Selected min cost move %d", min_cost]) min_move = moves[min_cost] path.append(min_move) i, j = min_move # reverse path and return path.reverse() - self._log(u"Computing best path... done") + self.log(u"Computing best path... 
done") return path -class DTWExact(object): +class DTWExact(Loggable): TAG = u"DTWExact" - def __init__(self, m1, m2, logger=None): + def __init__(self, m1, m2, rconf=None, logger=None): + super(DTWExact, self).__init__(rconf=rconf, logger=logger) self.m1 = m1 self.m2 = m2 - self.logger = logger or Logger() - - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) def compute_accumulated_cost_matrix(self): - self._log(u"Computing acm using pure Python code...") + self.log(u"Computing acm using pure Python code...") cost_matrix = self._compute_cost_matrix() accumulated_cost_matrix = self._compute_accumulated_cost_matrix(cost_matrix) - self._log(u"Computing acm using pure Python code... done") + self.log(u"Computing acm using pure Python code... done") return accumulated_cost_matrix def compute_path(self): - self._log(u"Computing path using pure Python code...") + self.log(u"Computing path using pure Python code...") accumulated_cost_matrix = self.compute_accumulated_cost_matrix() best_path = self._compute_best_path(accumulated_cost_matrix) - self._log(u"Computing path using pure Python code... done") + self.log(u"Computing path using pure Python code... done") return best_path def _compute_cost_matrix(self): - self._log(u"Computing cost matrix...") + self.log(u"Computing cost matrix...") # discard first MFCC component mfcc1 = self.m1[1:, :] mfcc2 = self.m2[1:, :] norm2_1 = numpy.sqrt(numpy.sum(mfcc1 ** 2, 0)) norm2_2 = numpy.sqrt(numpy.sum(mfcc2 ** 2, 0)) # compute dot product - self._log(u"Computing matrix with transpose+dot...") + self.log(u"Computing matrix with transpose+dot...") cost_matrix = mfcc1.transpose().dot(mfcc2) - self._log(u"Computing matrix with transpose+dot... done") + self.log(u"Computing matrix with transpose+dot... 
done") # normalize - self._log(u"Normalizing matrix...") + self.log(u"Normalizing matrix...") norm_matrix = numpy.outer(norm2_1, norm2_2) cost_matrix = 1 - (cost_matrix / norm_matrix) - self._log(u"Normalizing matrix... done") - self._log(u"Computing cost matrix... done") + self.log(u"Normalizing matrix... done") + self.log(u"Computing cost matrix... done") return cost_matrix def _compute_accumulated_cost_matrix(self, cost_matrix): @@ -614,9 +580,9 @@ def _compute_accumulated_cost_matrix(self, cost_matrix): return self._compute_acm_in_place(cost_matrix) def _compute_acm_in_place(self, cost_matrix): - self._log(u"Computing the acm with the in-place algorithm...") + self.log(u"Computing the acm with the in-place algorithm...") n, m = cost_matrix.shape - self._log([u"n m: %d %d", n, m]) + self.log([u"n m: %d %d", n, m]) current_row = numpy.copy(cost_matrix[0, :]) #cost_matrix[0][0] = current_row[0] for j in range(1, m): @@ -630,15 +596,15 @@ def _compute_acm_in_place(self, cost_matrix): cost_matrix[i][j-1], cost_matrix[i-1][j-1] ) - self._log(u"Computing the acm with the in-place algorithm... done") + self.log(u"Computing the acm with the in-place algorithm... done") return cost_matrix # DISABLED #def _compute_acm_not_in_place(self, cost_matrix): - # self._log(u"Computing the acm with the not-in-place algorithm...") + # self.log(u"Computing the acm with the not-in-place algorithm...") # acc_matrix = numpy.zeros(cost_matrix.shape) # n, m = acc_matrix.shape - # self._log([u"n m: %d %d", n, m]) + # self.log([u"n m: %d %d", n, m]) # acc_matrix[0][0] = cost_matrix[0][0] # for j in range(1, m): # acc_matrix[0][j] = acc_matrix[0][j-1] + cost_matrix[0][j] @@ -651,14 +617,14 @@ def _compute_acm_in_place(self, cost_matrix): # acc_matrix[i][j-1], # acc_matrix[i-1][j-1] # ) - # self._log(u"Computing the acm with the not-in-place algorithm... done") + # self.log(u"Computing the acm with the not-in-place algorithm... 
done") # return acc_matrix def _compute_best_path(self, acc_matrix): - self._log(u"Computing best path...") + self.log(u"Computing best path...") # get dimensions n, m = acc_matrix.shape - self._log([u"n m: %d %d", n, m]) + self.log([u"n m: %d %d", n, m]) i = n - 1 j = m - 1 path = [(i, j)] @@ -682,13 +648,13 @@ def _compute_best_path(self, acc_matrix): (i-1, j-1) ] min_cost = numpy.argmin(costs) - #self._log([u"Selected min cost move %d", min_cost]) + #self.log([u"Selected min cost move %d", min_cost]) min_move = moves[min_cost] path.append(min_move) i, j = min_move # reverse path and return path.reverse() - self._log(u"Computing best path... done") + self.log(u"Computing best path... done") return path diff --git a/aeneas/espeakwrapper.py b/aeneas/espeakwrapper.py index 641d5a30..38c423fa 100644 --- a/aeneas/espeakwrapper.py +++ b/aeneas/espeakwrapper.py @@ -2,18 +2,18 @@ # coding=utf-8 """ -Wrapper around ``espeak`` to synthesize text into a ``wav`` audio file. +This module contains the following classes: + +* :class:`~aeneas.espeakwrapper.ESPEAKWrapper`, a wrapper for the ``eSpeak`` TTS engine. """ from __future__ import absolute_import from __future__ import print_function -import subprocess -from aeneas.audiofile import AudioFileMonoWAVE -from aeneas.audiofile import AudioFileUnsupportedFormatError from aeneas.language import Language -from aeneas.logger import Logger from aeneas.runtimeconfiguration import RuntimeConfiguration +from aeneas.timevalue import TimeValue +from aeneas.ttswrapper import TTSWrapper import aeneas.globalfunctions as gf __author__ = "Alberto Pettarin" @@ -23,126 +23,631 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -class ESPEAKWrapper(object): +class ESPEAKWrapper(TTSWrapper): """ - Wrapper around ``espeak`` to synthesize text into a ``wav`` audio file. 
+ A wrapper for the ``espeak`` TTS engine. + + This wrapper is the default TTS engine for ``aeneas``. + + This wrapper supports calling the TTS engine + via ``subprocess`` or via Python C extension. + + In abstract terms, it performs one or more calls like :: + + $ espeak -v voice_code -w /tmp/output_file.wav < text + + To specify the path of the TTS executable, use :: - It will perform one or more calls like :: + "tts=espeak|tts_path=/path/to/espeak" - $ espeak -v language_code -w /tmp/output_file.wav < text + in the ``rconf`` object. - In case of multiple text fragments, the resulting wav files - will be joined together. + To run the ``cew`` Python C extension + in a separate process via + :class:`~aeneas.cewsubprocess.CEWSubprocess`, use :: + "cew_subprocess_enabled=True|cew_subprocess_path=/path/to/python" + + in the ``rconf`` object. + + See :class:`~aeneas.ttswrapper.TTSWrapper` for the available functions. + Below are listed the languages supported by this wrapper. + + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` + :type logger: :class:`~aeneas.logger.Logger` """ - TAG = u"ESPEAKWrapper" + AFR = Language.AFR + """ Afrikaans (not tested) """ - def __init__(self, rconf=None, logger=None): - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() + ARG = Language.ARG + """ Aragonese (not tested) """ - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) + BOS = Language.BOS + """ Bosnian (not tested) """ - def _replace_language(self, language): - """ - Mock support for a given language by - synthesizing using a similar language. 
+ BUL = Language.BUL + """ Bulgarian """ - :param language: the requested language - :type language: :class:`aeneas.language.Language` enum - :rtype: :class:`aeneas.language.Language` enum - """ - if language == Language.UK: - self._log([u"Replaced '%s' with '%s'", Language.UK, Language.RU]) - return Language.RU - return language - - def synthesize_multiple( - self, - text_file, - output_file_path, - quit_after=None, - backwards=False - ): - """ - Synthesize the text contained in the given fragment list - into a ``wav`` file. - - :param text_file: the text file to be synthesized - :type text_file: :class:`aeneas.textfile.TextFile` - :param output_file_path: the path to the output audio file - :type output_file_path: string (path) - :param quit_after: stop synthesizing as soon as - reaching this many seconds - :type quit_after: float - :param backwards: synthesizing from the end of the text file - :type backwards: bool - :rtype: tuple (anchors, total_time, num_chars) - - :raise TypeError: if ``text_file`` is ``None`` or - one of the text fragments is not a ``unicode`` object - :raise ValueError: if ``rconf["allow_unlisted_languages"]`` is ``False`` and - a fragment has its language code not listed in - :class:`aeneas.language.Language` - :raise OSError: if output file cannot be written to ``output_file_path`` - :raise RuntimeError: if both the C extension and - the pure Python code did not succeed. 
- """ - # check that text_file is not None - if text_file is None: - self._log(u"text_file is None", Logger.CRITICAL) - raise TypeError("text_file is None") - - # check that the lines in the text file all have - # a supported language code and unicode type - if not self.rconf["allow_unlisted_languages"]: - for fragment in text_file.fragments: - if fragment.language not in Language.ALLOWED_VALUES: - self._log([u"Language '%s' is not allowed", fragment.language], Logger.CRITICAL) - raise ValueError("Language code not allowed") - for fragment in text_file.fragments: - for line in fragment.lines: - if not gf.is_unicode(line): - self._log(u"Text file must contain only unicode strings", Logger.CRITICAL) - raise TypeError("Text file must contain only unicode strings") - - # log parameters - if quit_after is not None: - self._log([u"Quit after reaching %.3f", quit_after]) - if backwards: - self._log(u"Synthesizing backwards") - - # check that output_file_path can be written - if not gf.file_can_be_written(output_file_path): - self._log([u"Cannot write output file to '%s'", output_file_path], Logger.CRITICAL) - raise OSError("Cannot write output file") - - return gf.run_c_extension_with_fallback( - self._log, - "cew", - self._synthesize_multiple_c_extension, - self._synthesize_multiple_pure_python, - (text_file, output_file_path, quit_after, backwards), - c_extension=self.rconf["c_ext"] + CAT = Language.CAT + """ Catalan """ + + CES = Language.CES + """ Czech """ + + CMN = Language.CMN + """ Mandarin Chinese (not tested) """ + + CYM = Language.CYM + """ Welsh """ + + DAN = Language.DAN + """ Danish """ + + DEU = Language.DEU + """ German """ + + ELL = Language.ELL + """ Greek (Modern) """ + + ENG = Language.ENG + """ English """ + + EPO = Language.EPO + """ Esperanto (not tested) """ + + EST = Language.EST + """ Estonian """ + + FAS = Language.FAS + """ Persian """ + + FIN = Language.FIN + """ Finnish """ + + FRA = Language.FRA + """ French """ + + GLE = Language.GLE + """ 
Irish """ + + GRC = Language.GRC + """ Greek (Ancient) """ + + HIN = Language.HIN + """ Hindi (not tested) """ + + HRV = Language.HRV + """ Croatian """ + + HUN = Language.HUN + """ Hungarian """ + + HYE = Language.HYE + """ Armenian (not tested) """ + + IND = Language.IND + """ Indonesian (not tested) """ + + ISL = Language.ISL + """ Icelandic """ + + ITA = Language.ITA + """ Italian """ + + JBO = Language.JBO + """ Lojban (not tested) """ + + KAN = Language.KAN + """ Kannada (not tested) """ + + KAT = Language.KAT + """ Georgian (not tested) """ + + KUR = Language.KUR + """ Kurdish (not tested) """ + + LAT = Language.LAT + """ Latin """ + + LAV = Language.LAV + """ Latvian """ + + LFN = Language.LFN + """ Lingua Franca Nova (not tested) """ + + LIT = Language.LIT + """ Lithuanian """ + + MAL = Language.MAL + """ Malayalam (not tested) """ + + MKD = Language.MKD + """ Macedonian (not tested) """ + + MSA = Language.MSA + """ Malay (not tested) """ + + NEP = Language.NEP + """ Nepali (not tested) """ + + NLD = Language.NLD + """ Dutch """ + + NOR = Language.NOR + """ Norwegian """ + + PAN = Language.PAN + """ Panjabi (not tested) """ + + POL = Language.POL + """ Polish """ + + POR = Language.POR + """ Portuguese """ + + RON = Language.RON + """ Romanian """ + + RUS = Language.RUS + """ Russian """ + + SLK = Language.SLK + """ Slovak """ + + SPA = Language.SPA + """ Spanish """ + + SQI = Language.SQI + """ Albanian (not tested) """ + + SRP = Language.SRP + """ Serbian """ + + SWA = Language.SWA + """ Swahili """ + + SWE = Language.SWE + """ Swedish """ + + TAM = Language.TAM + """ Tamil (not tested) """ + + TUR = Language.TUR + """ Turkish """ + + UKR = Language.UKR + """ Ukrainian """ + + VIE = Language.VIE + """ Vietnamese (not tested) """ + + YUE = Language.YUE + """ Yue Chinese (not tested) """ + + ZHO = Language.ZHO + """ Chinese (not tested) """ + + ENG_GBR = "eng-GBR" + """ English (GB) """ + + ENG_SCT = "eng-SCT" + """ English (Scotland) (not tested) """ + + 
ENG_USA = "eng-USA" + """ English (USA) """ + + SPA_ESP = "spa-ESP" + """ Spanish (Castilian) """ + + FRA_BEL = "fra-BEL" + """ French (Belgium) (not tested) """ + + FRA_FRA = "fra-FRA" + """ French (France) """ + + POR_BRA = "por-bra" + """ Portuguese (Brazil) (not tested) """ + + POR_PRT = "por-prt" + """ Portuguese (Portugal) """ + + AF = "af" + """ Afrikaans (not tested) """ + + AN = "an" + """ Aragonese (not tested) """ + + BG = "bg" + """ Bulgarian """ + + BS = "bs" + """ Bosnian (not tested) """ + + CA = "ca" + """ Catalan """ + + CS = "cs" + """ Czech """ + + CY = "cy" + """ Welsh """ + + DA = "da" + """ Danish """ + + DE = "de" + """ German """ + + EL = "el" + """ Greek (Modern) """ + + EN = "en" + """ English """ + + EN_GB = "en-gb" + """ English (GB) """ + + EN_SC = "en-sc" + """ English (Scotland) (not tested) """ + + EN_UK_NORTH = "en-uk-north" + """ English (Northern) (not tested) """ + + EN_UK_RP = "en-uk-rp" + """ English (Received Pronunciation) (not tested) """ + + EN_UK_WMIDS = "en-uk-wmids" + """ English (Midlands) (not tested) """ + + EN_US = "en-us" + """ English (USA) """ + + EN_WI = "en-wi" + """ English (West Indies) (not tested) """ + + EO = "eo" + """ Esperanto (not tested) """ + + ES = "es" + """ Spanish (Castilian) """ + + ES_LA = "es-la" + """ Spanish (Latin America) (not tested) """ + + ET = "et" + """ Estonian """ + + FA = "fa" + """ Persian """ + + FA_PIN = "fa-pin" + """ Persian (Pinglish) """ + + FI = "fi" + """ Finnish """ + + FR = "fr" + """ French """ + + FR_BE = "fr-be" + """ French (Belgium) (not tested) """ + + FR_FR = "fr-fr" + """ French (France) """ + + GA = "ga" + """ Irish """ + + # NOTE already defined + #GRC = "grc" + #""" Greek (Ancient) """ + + HI = "hi" + """ Hindi (not tested) """ + + HR = "hr" + """ Croatian """ + + HU = "hu" + """ Hungarian """ + + HY = "hy" + """ Armenian (not tested) """ + + HY_WEST = "hy-west" + """ Armenian (West) (not tested) """ + + ID = "id" + """ Indonesian (not 
tested) """ + + IS = "is"
+ """ Icelandic """ + + IT = "it" + """ Italian """ + + # NOTE already defined + #JBO = "jbo" + #""" Lojban (not tested) """ + + KA = "ka" + """ Georgian (not tested) """ + + KN = "kn" + """ Kannada (not tested) """ + + KU = "ku" + """ Kurdish (not tested) """ + + LA = "la" + """ Latin """ + + # NOTE already defined + #LFN = "lfn" + #""" Lingua Franca Nova (not tested) """ + + LT = "lt" + """ Lithuanian """ + + LV = "lv" + """ Latvian """ + + MK = "mk" + """ Macedonian (not tested) """ + + ML = "ml" + """ Malayalam (not tested) """ + + MS = "ms" + """ Malay (not tested) """ + + NE = "ne" + """ Nepali (not tested) """ + + NL = "nl" + """ Dutch """ + + NO = "no" + """ Norwegian """ + + PA = "pa" + """ Panjabi (not tested) """ + + PL = "pl" + """ Polish """ + + PT = "pt" + """ Portuguese """ + + PT_BR = "pt-br" + """ Portuguese (Brazil) (not tested) """ + + PT_PT = "pt-pt" + """ Portuguese (Portugal) """ + + RO = "ro" + """ Romanian """ + + RU = "ru" + """ Russian """ + + SQ = "sq" + """ Albanian (not tested) """ + + SK = "sk" + """ Slovak """ + + SR = "sr" + """ Serbian """ + + SV = "sv" + """ Swedish """ + + SW = "sw" + """ Swahili """ + + TA = "ta" + """ Tamil (not tested) """ + + TR = "tr" + """ Turkish """ + + UK = "uk" + """ Ukrainian """ + + VI = "vi" + """ Vietnamese (not tested) """ + + VI_HUE = "vi-hue" + """ Vietnamese (hue) (not tested) """ + + VI_SGN = "vi-sgn" + """ Vietnamese (sgn) (not tested) """ + + ZH = "zh" + """ Mandarin Chinese (not tested) """ + + ZH_YUE = "zh-yue" + """ Yue Chinese (not tested) """ + + LANGUAGE_TO_VOICE_CODE = { + AF : "af", + AN : "an", + BG : "bg", + BS : "bs", + CA : "ca", + CS : "cs", + CY : "cy", + DA : "da", + DE : "de", + EL : "el", + EN : "en", + EN_GB : "en-gb", + EN_SC : "en-sc", + EN_UK_NORTH : "en-uk-north", + EN_UK_RP : "en-uk-rp", + EN_UK_WMIDS : "en-uk-wmids", + EN_US : "en-us", + EN_WI : "en-wi", + EO : "eo", + ES : "es", + ES_LA : "es-la", + ET : "et", + FA : "fa", + FA_PIN : "fa-pin", + FI : "fi", + FR : "fr", 
+ FR_BE : "fr-be", + FR_FR : "fr-fr", + GA : "ga", + #GRC : "grc", + HI : "hi", + HR : "hr", + HU : "hu", + HY : "hy", + HY_WEST : "hy-west", + ID : "id", + IS : "is", + IT : "it", + #JBO : "jbo", + KA : "ka", + KN : "kn", + KU : "ku", + LA : "la", + #LFN : "lfn", + LT : "lt", + LV : "lv", + MK : "mk", + ML : "ml", + MS : "ms", + NE : "ne", + NL : "nl", + NO : "no", + PA : "pa", + PL : "pl", + PT : "pt", + PT_BR : "pt-br", + PT_PT : "pt-pt", + RO : "ro", + RU : "ru", + SQ : "sq", + SK : "sk", + SR : "sr", + SV : "sv", + SW : "sw", + TA : "ta", + TR : "tr", + UK : "ru", # NOTE mocking support for Ukrainian with Russian voice + VI : "vi", + VI_HUE : "vi-hue", + VI_SGN : "vi-sgn", + ZH : "zh", + ZH_YUE : "zh-yue", + AFR : "af", + ARG : "an", + BOS : "bs", + BUL : "bg", + CAT : "ca", + CES : "cs", + CMN : "zh", + CYM : "cy", + DAN : "da", + DEU : "de", + ELL : "el", + ENG : "en", + EPO : "eo", + EST : "et", + FAS : "fa", + FIN : "fi", + FRA : "fr", + GLE : "ga", + GRC : "grc", + HIN : "hi", + HRV : "hr", + HUN : "hu", + HYE : "hy", + IND : "id", + ISL : "is", + ITA : "it", + JBO : "jbo", + KAN : "kn", + KAT : "ka", + KUR : "ku", + LAT : "la", + LAV : "lv", + LFN : "lfn", + LIT : "lt", + MAL : "ml", + MKD : "mk", + MSA : "ms", + NEP : "ne", + NLD : "nl", + NOR : "no", + PAN : "pa", + POL : "pl", + POR : "pt", + RON : "ro", + RUS : "ru", + SLK : "sk", + SPA : "es", + SQI : "sq", + SRP : "sr", + SWA : "sw", + SWE : "sv", + TAM : "ta", + TUR : "tr", + UKR : "ru", # NOTE mocking support for Ukrainian with Russian voice + VIE : "vi", + YUE : "zh-yue", + ZHO : "zh", + ENG_GBR : "en-gb", + ENG_SCT : "en-sc", + ENG_USA : "en-us", + SPA_ESP : "es-es", + FRA_BEL : "fr-be", + FRA_FRA : "fr-fr", + POR_BRA : "pt-br", + POR_PRT : "pt-pt" + } + DEFAULT_LANGUAGE = ENG + + OUTPUT_MONO_WAVE = True + + TAG = u"ESPEAKWrapper" + + def __init__(self, rconf=None, logger=None): + super(ESPEAKWrapper, self).__init__( + has_subprocess_call=True, + has_c_extension_call=True, + 
has_python_call=False, + rconf=rconf, + logger=logger ) + self.set_subprocess_arguments([ + self.rconf[RuntimeConfiguration.TTS_PATH], + u"-v", + TTSWrapper.CLI_PARAMETER_VOICE_CODE_STRING, + u"-w", + TTSWrapper.CLI_PARAMETER_WAVE_PATH, + TTSWrapper.CLI_PARAMETER_TEXT_STDIN + ]) + + def _synthesize_multiple_c_extension(self, text_file, output_file_path, quit_after=None, backwards=False): + """ + Synthesize multiple text fragments, using the cew extension. - def _synthesize_multiple_c_extension( - self, - text_file, - output_file_path, - quit_after=None, - backwards=False - ): - self._log(u"Synthesizing using C extension...") + Return a tuple (anchors, total_time, num_chars). + + :rtype: (bool, (list, :class:`~aeneas.timevalue.TimeValue`, int)) + """ + self.log(u"Synthesizing using C extension...") # convert parameters from Python values to C values try: @@ -152,52 +657,75 @@ def _synthesize_multiple_c_extension( c_backwards = 0 if backwards: c_backwards = 1 - self._log([u"output_file_path: %s", output_file_path]) - self._log([u"c_quit_after: %.3f", c_quit_after]) - self._log([u"c_backwards: %d", c_backwards]) - self._log(u"Preparing c_text...") - c_text = [] + self.log([u"output_file_path: %s", output_file_path]) + self.log([u"c_quit_after: %.3f", c_quit_after]) + self.log([u"c_backwards: %d", c_backwards]) + self.log(u"Preparing u_text...") + u_text = [] fragments = text_file.fragments for fragment in fragments: f_lang = fragment.language f_text = fragment.filtered_text if f_lang is None: - f_lang = Language.EN - f_lang = self._replace_language(f_lang) + f_lang = self.DEFAULT_LANGUAGE + f_voice_code = self._language_to_voice_code(f_lang) if f_text is None: f_text = u"" + u_text.append((f_voice_code, f_text)) + self.log(u"Preparing u_text... 
done") + + # call C extension + sr = None + sf = None + intervals = None + if self.rconf[RuntimeConfiguration.CEW_SUBPROCESS_ENABLED]: + self.log(u"Using cewsubprocess to call aeneas.cew") + try: + self.log(u"Importing aeneas.cewsubprocess...") + from aeneas.cewsubprocess import CEWSubprocess + self.log(u"Importing aeneas.cewsubprocess... done") + self.log(u"Calling aeneas.cewsubprocess...") + cewsub = CEWSubprocess(rconf=self.rconf, logger=self.logger) + sr, sf, intervals = cewsub.synthesize_multiple(output_file_path, c_quit_after, c_backwards, u_text) + self.log(u"Calling aeneas.cewsubprocess... done") + except Exception as exc: + self.log_exc(u"An unexpected error occurred while running cewsubprocess", exc, False, None) + # NOTE not critical, try calling aeneas.cew directly + #return (False, None) + + if sr is None: + self.log(u"Preparing c_text...") if gf.PY2: # Python 2 => pass byte strings - c_text.append((gf.safe_bytes(f_lang), gf.safe_bytes(f_text))) + c_text = [(gf.safe_bytes(t[0]), gf.safe_bytes(t[1])) for t in u_text] else: # Python 3 => pass Unicode strings - c_text.append((gf.safe_unicode(f_lang), gf.safe_unicode(f_text))) - self._log(u"Preparing c_text... done") + c_text = [(gf.safe_unicode(t[0]), gf.safe_unicode(t[1])) for t in u_text] + self.log(u"Preparing c_text... done") + + self.log(u"Calling aeneas.cew directly") + try: + self.log(u"Importing aeneas.cew...") + import aeneas.cew.cew + self.log(u"Importing aeneas.cew... done") + self.log(u"Calling aeneas.cew...") + sr, sf, intervals = aeneas.cew.cew.synthesize_multiple( + output_file_path, + c_quit_after, + c_backwards, + c_text + ) + self.log(u"Calling aeneas.cew... done") + except Exception as exc: + self.log_exc(u"An unexpected error occurred while running cew", exc, False, None) + return (False, None) - # call C extension - try: - self._log(u"Importing aeneas.cew...") - import aeneas.cew.cew - self._log(u"Importing aeneas.cew... 
done") - self._log(u"Calling aeneas.cew...") - sr, sf, intervals = aeneas.cew.cew.synthesize_multiple( - output_file_path, - c_quit_after, - c_backwards, - c_text - ) - self._log(u"Calling aeneas.cew... done") - except Exception as exc: - self._log(u"Calling aeneas.cew... failed") - self._log(u"An unexpected exception occurred while running cew:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) - return (False, None) - self._log([u"sr: %d", sr]) - self._log([u"sf: %d", sf]) + self.log([u"sr: %d", sr]) + self.log([u"sf: %d", sf]) # create output anchors = [] - current_time = 0.0 + current_time = TimeValue("0.000") num_chars = 0 if backwards: fragments = fragments[::-1] @@ -206,301 +734,78 @@ def _synthesize_multiple_c_extension( fragment = fragments[i] # store for later output anchors.append([ - intervals[i][0], + TimeValue(intervals[i][0]), fragment.identifier, fragment.filtered_text ]) # increase the character counter num_chars += fragment.characters # update current_time - current_time = intervals[i][1] + current_time = TimeValue(intervals[i][1]) # return output # NOTE anchors do not make sense if backwards == True - self._log([u"Returning %d time anchors", len(anchors)]) - self._log([u"Current time %.3f", current_time]) - self._log([u"Synthesized %d characters", num_chars]) - self._log(u"Synthesizing using C extension... done") + self.log([u"Returning %d time anchors", len(anchors)]) + self.log([u"Current time %.3f", current_time]) + self.log([u"Synthesized %d characters", num_chars]) + self.log(u"Synthesizing using C extension... done") return (True, (anchors, current_time, num_chars)) - def _synthesize_multiple_pure_python( - self, - text_file, - output_file_path, - quit_after=None, - backwards=False - ): - def synthesize_and_clean(text, language): - """ - Synthesize a single fragment, pure Python, - and immediately remove the temporary file. 
- """ - self._log(u"Synthesizing text...") - handler, tmp_destination = gf.tmp_file(suffix=u".wav", root=self.rconf["tmp_path"]) - result, data = self._synthesize_single_pure_python( - text=(text + u" "), - language=language, - output_file_path=tmp_destination - ) - self._log([u"Removing temporary file '%s'", tmp_destination]) - gf.delete_file(handler, tmp_destination) - self._log(u"Synthesizing text... done") - return data - - self._log(u"Synthesizing using pure Python...") - - try: - # get sample rate and encoding - du_nu, sample_rate, encoding, da_nu = synthesize_and_clean( - u"Dummy text to get sample_rate", - Language.EN - ) - - # open output file - output_file = AudioFileMonoWAVE( - file_path=output_file_path, - logger=self.logger - ) - output_file.audio_format = encoding - output_file.audio_sample_rate = sample_rate - - # create output - anchors = [] - current_time = 0.0 - num = 0 - num_chars = 0 - fragments = text_file.fragments - if backwards: - fragments = fragments[::-1] - for fragment in fragments: - # replace language - language = self._replace_language(fragment.language) - # synthesize and get the duration of the output file - self._log([u"Synthesizing fragment %d", num]) - duration, sr_nu, enc_nu, data = synthesize_and_clean( - text=fragment.filtered_text, - language=language - ) - # store for later output - anchors.append([current_time, fragment.identifier, fragment.text]) - # increase the character counter - num_chars += fragment.characters - # append/prepend data - self._log([u"Fragment %d starts at: %f", num, current_time]) - if duration > 0: - self._log([u"Fragment %d duration: %f", num, duration]) - current_time += duration - # - # NOTE since numpy.append cannot be in place, - # it seems that the only alternative to make - # this more efficient consists in pre-allocating - # the destination array, - # possibly truncating or extending it as needed - # - if backwards: - output_file.prepend_data(data) - else: - output_file.append_data(data) - 
else: - self._log([u"Fragment %d has zero duration", num]) - - # increment fragment counter - num += 1 - - # check if we must stop synthesizing because we have enough audio - if (quit_after is not None) and (current_time > quit_after): - self._log([u"Quitting after reached duration %.3f", current_time]) - break - - # write output file - self._log([u"Writing audio file '%s'", output_file_path]) - output_file.write(file_path=output_file_path) - self._log(u"Synthesizing using pure Python... done") - except Exception as exc: - self._log(u"Synthesizing using pure Python... failed") - self._log(u"An unexpected exception occurred while running pure Python code:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) - return (False, None) - - # return output - # NOTE anchors do not make sense if backwards == True - self._log([u"Returning %d time anchors", len(anchors)]) - self._log([u"Current time %.3f", current_time]) - self._log([u"Synthesized %d characters", num_chars]) - self._log(u"Synthesizing using pure Python... done") - return (True, (anchors, current_time, num_chars)) - - def synthesize_single( - self, - text, - language, - output_file_path - ): - """ - Create a ``wav`` audio file containing the synthesized text. - - The ``text`` must be a unicode string encodable with UTF-8, - otherwise ``espeak`` might fail. - - Return the duration of the synthesized audio file, in seconds. 
- - :param text: the text to synthesize - :type text: unicode - :param language: the language to use - :type language: :class:`aeneas.language.Language` enum - :param output_file_path: the path of the output audio file - :type output_file_path: string - :rtype: float - - :raise TypeError: if ``text`` is ``None`` or it is not a ``unicode`` object - :raise ValueError: if ``rconf["allow_unlisted_languages"]`` is ``False`` and - ``language`` is not listed in - :class:`aeneas.language.Language` - :raise OSError: if output file cannot be written to ``output_file_path`` - :raise RuntimeError: if both the C extension and - the pure Python code did not succeed. - """ - # check that text_file is not None - if text is None: - self._log(u"text is None", Logger.CRITICAL) - raise TypeError("text is None") - - # check that text has unicode type - if not gf.is_unicode(text): - self._log(u"text must be a unicode string", Logger.CRITICAL) - raise TypeError("text must be a unicode string") - - # check that output_file_path can be written - if not gf.file_can_be_written(output_file_path): - self._log([u"Cannot write output file to '%s'", output_file_path], Logger.CRITICAL) - raise OSError("Cannot write output file") - - # check that the requested language is listed in language.py - if (language not in Language.ALLOWED_VALUES) and (not self.rconf["allow_unlisted_languages"]): - self._log([u"Language '%s' is not allowed", language], Logger.CRITICAL) - raise ValueError("Language code not allowed") - - self._log([u"Synthesizing text: '%s'", text]) - self._log([u"Synthesizing language: '%s'", language]) - self._log([u"Synthesizing to file: '%s'", output_file_path]) - - # return zero if text is the empty string - if len(text) == 0: - self._log(u"len(text) is zero: returning 0.0") - return 0.0 - - # replace language - language = self._replace_language(language) - self._log([u"Using language: '%s'", language]) - - result = gf.run_c_extension_with_fallback( - self._log, - "cew", - 
self._synthesize_single_c_extension, - self._synthesize_single_pure_python, - (text, language, output_file_path), - c_extension=self.rconf["c_ext"] - ) - return result[0] - - def _synthesize_single_c_extension(self, text, language, output_file_path): + def _synthesize_single_c_extension(self, text, voice_code, output_file_path): """ - Synthesize a single text fragment, using cew extension. + Synthesize a single text fragment, using the cew extension. Return the duration of the synthesized text, in seconds. - :rtype: (bool, (float, )) + :rtype: (bool, (:class:`~aeneas.timevalue.TimeValue`, )) """ - self._log(u"Synthesizing using C extension...") - - self._log(u"Preparing c_text...") - if gf.PY2: - # Python 2 => pass byte strings - c_text = gf.safe_bytes(text) - else: - # Python 3 => pass Unicode strings - c_text = text - # NOTE language has been replaced already! - self._log(u"Preparing c_text... done") - - try: - self._log(u"Importing aeneas.cew...") - import aeneas.cew.cew - self._log(u"Importing aeneas.cew... done") - self._log(u"Calling aeneas.cew...") - sr, begin, end = aeneas.cew.cew.synthesize_single( - output_file_path, - language, - c_text - ) - self._log(u"Calling aeneas.cew... done") - except Exception as exc: - self._log(u"Calling aeneas.cew... failed") - self._log(u"An unexpected exception occurred while running cew:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) - return (False, None) - - self._log(u"Synthesizing using C extension... done") - return (True, (end, )) - - def _synthesize_single_pure_python(self, text, language, output_file_path): - """ - Synthesize a single text fragment, pure Python. - - :rtype: tuple (duration, sample_rate, encoding, data) - """ - self._log(u"Synthesizing using pure Python...") - - # NOTE language has been replaced already! 
- - try: - # call espeak via subprocess - self._log(u"Calling espeak ...") - arguments = [self.rconf["espeak_path"], "-v", language, "-w", output_file_path] - self._log([u"Calling with arguments '%s'", " ".join(arguments)]) - self._log([u"Calling with text '%s'", text]) - proc = subprocess.Popen( - arguments, - stdout=subprocess.PIPE, - stdin=subprocess.PIPE, - stderr=subprocess.PIPE, - universal_newlines=True) + self.log(u"Synthesizing using C extension...") + + end = None + if self.rconf[RuntimeConfiguration.CEW_SUBPROCESS_ENABLED]: + self.log(u"Using cewsubprocess to call aeneas.cew") + try: + self.log(u"Importing aeneas.cewsubprocess...") + from aeneas.cewsubprocess import CEWSubprocess + self.log(u"Importing aeneas.cewsubprocess... done") + self.log(u"Calling aeneas.cewsubprocess...") + cewsub = CEWSubprocess(rconf=self.rconf, logger=self.logger) + end = cewsub.synthesize_single(output_file_path, voice_code, text) + self.log(u"Calling aeneas.cewsubprocess... done") + except Exception as exc: + self.log_exc(u"An unexpected error occurred while running cewsubprocess", exc, False, None) + # NOTE not critical, try calling aeneas.cew directly + #return (False, None) + + if end is None: + self.log(u"Preparing c_text...") if gf.PY2: - proc.communicate(input=gf.safe_bytes(text)) + # Python 2 => pass byte strings + c_text = gf.safe_bytes(text) else: - proc.communicate(input=text) - proc.stdout.close() - proc.stdin.close() - proc.stderr.close() - self._log(u"Calling espeak ... done") - except Exception as exc: - self._log(u"Calling espeak ... 
failed") - self._log(u"An unexpected exception occurred while running pure Python code:", Logger.WARNING) - self._log([u"%s", exc], Logger.WARNING) - return (False, None) - - # check the file can be read - if not gf.file_can_be_read(output_file_path): - self._log([u"Output file '%s' does not exist", output_file_path], Logger.CRITICAL) - return (False, None) - - # return the duration of the output file - try: - audio_file = AudioFileMonoWAVE( - file_path=output_file_path, - logger=self.logger - ) - audio_file.load_data() - duration = audio_file.audio_length - sample_rate = audio_file.audio_sample_rate - encoding = audio_file.audio_format - data = audio_file.audio_data - self._log([u"Duration of '%s': %f", output_file_path, duration]) - self._log(u"Synthesizing using pure Python... done") - return (True, (duration, sample_rate, encoding, data)) - except (AudioFileUnsupportedFormatError, OSError) as exc: - self._log(u"Error while trying reading the sythesized audio file", Logger.CRITICAL) - return (False, None) + # Python 3 => pass Unicode strings + c_text = gf.safe_unicode(text) + self.log(u"Preparing c_text... done") + + self.log(u"Calling aeneas.cew directly") + try: + self.log(u"Importing aeneas.cew...") + import aeneas.cew.cew + self.log(u"Importing aeneas.cew... done") + self.log(u"Calling aeneas.cew...") + sr, begin, end = aeneas.cew.cew.synthesize_single( + output_file_path, + voice_code, + c_text + ) + end = TimeValue(end) + self.log(u"Calling aeneas.cew... done") + except Exception as exc: + self.log_exc(u"An unexpected error occurred while running cew", exc, False, None) + return (False, None) + + self.log(u"Synthesizing using C extension... 
done") + return (True, (end, )) diff --git a/aeneas/executejob.py b/aeneas/executejob.py index db1a5ba0..3590fe70 100644 --- a/aeneas/executejob.py +++ b/aeneas/executejob.py @@ -2,9 +2,13 @@ # coding=utf-8 """ -Execute a job, that is, execute all of its tasks -and generate the output container -holding the generated sync maps. +This module contains the following classes: + +* :class:`~aeneas.executejob.ExecuteJob`, a class to process a job; +* :class:`~aeneas.executejob.ExecuteJobExecutionError`, +* :class:`~aeneas.executejob.ExecuteJobInputError`, and +* :class:`~aeneas.executejob.ExecuteJobOutputError`, + representing errors generated while processing jobs. """ from __future__ import absolute_import @@ -15,7 +19,7 @@ from aeneas.container import ContainerFormat from aeneas.executetask import ExecuteTask from aeneas.job import Job -from aeneas.logger import Logger +from aeneas.logger import Loggable from aeneas.runtimeconfiguration import RuntimeConfiguration import aeneas.globalfunctions as gf @@ -26,21 +30,21 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -class ExecuteJobInputError(Exception): +class ExecuteJobExecutionError(Exception): """ - Error raised when the input parameters of the job are invalid or missing. + Error raised when the execution of the job fails for internal reasons. """ pass -class ExecuteJobExecutionError(Exception): +class ExecuteJobInputError(Exception): """ - Error raised when the execution of the job fails for internal reasons. + Error raised when the input parameters of the job are invalid or missing. 
""" pass @@ -54,7 +58,7 @@ class ExecuteJobOutputError(Exception): -class ExecuteJob(object): +class ExecuteJob(Loggable): """ Execute a job, that is, execute all of its tasks and generate the output container @@ -62,7 +66,7 @@ class ExecuteJob(object): If you do not provide a job object in the constructor, you must manually set it later, or load it from a container - with ``load_job_from_container``. + with :func:`~aeneas.executejob.ExecuteJob.load_job_from_container`. In the first case, you are responsible for setting the absolute audio/text/sync map paths of each task of the job, @@ -71,102 +75,92 @@ class ExecuteJob(object): any temporary files you might have generated around. In the second case, you are responsible for - calling ``clean`` at the end of the job execution, + calling :func:`~aeneas.executejob.ExecuteJob.clean` + at the end of the job execution, to delete the working directory - created by ``load_job_from_container`` + created by :func:`~aeneas.executejob.ExecuteJob.load_job_from_container` when creating the job object. :param job: the job to be executed - :type job: :class:`aeneas.job.Job` - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. 
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` + :type job: :class:`~aeneas.job.Job` + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` - - :raise ExecuteJobInputError: if ``job`` is not an instance of ``Job`` + :type logger: :class:`~aeneas.logger.Logger` + :raises: :class:`~aeneas.executejob.ExecuteJobInputError`: if ``job`` is not an instance of ``Job`` """ TAG = u"ExecuteJob" def __init__(self, job=None, rconf=None, logger=None): + super(ExecuteJob, self).__init__(rconf=rconf, logger=logger) self.job = job self.working_directory = None self.tmp_directory = None - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() if job is not None: self.load_job(self.job) - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) - def load_job(self, job): """ Load the job from the given ``Job`` object. :param job: the job to load - :type job: :class:`aeneas.job.Job` - - :raise ExecuteJobInputError: if ``job`` is not an instance of ``Job`` + :type job: :class:`~aeneas.job.Job` + :raises: :class:`~aeneas.executejob.ExecuteJobInputError`: if ``job`` is not an instance of :class:`~aeneas.job.Job` """ if not isinstance(job, Job): - self._failed(u"job is not an instance of Job", "input") + self.log_exc(u"job is not an instance of Job", None, True, ExecuteJobInputError) self.job = job def load_job_from_container(self, container_path, config_string=None): """ - Load the job from the given ``Container`` object. + Load the job from the given :class:`aeneas.container.Container` object. If ``config_string`` is ``None``, the container must contain a configuration file; otherwise use the provided config string (i.e., the wizard case). 
- :param container_path: the path to the input container - :type container_path: string (path) - :param config_string: the configuration string (from wizard) - :type config_string: string - - :raise ExecuteJobInputError: if the given container does not contain a valid ``Job`` + :param string container_path: the path to the input container + :param string config_string: the configuration string (from wizard) + :raises: :class:`~aeneas.executejob.ExecuteJobInputError`: if the given container does not contain a valid :class:`~aeneas.job.Job` """ - self._log(u"Loading job from container...") + self.log(u"Loading job from container...") # create working directory where the input container # will be decompressed - self.working_directory = gf.tmp_directory(root=self.rconf["tmp_path"]) - self._log([u"Created working directory '%s'", self.working_directory]) + self.working_directory = gf.tmp_directory(root=self.rconf[RuntimeConfiguration.TMP_PATH]) + self.log([u"Created working directory '%s'", self.working_directory]) try: - self._log(u"Decompressing input container...") + self.log(u"Decompressing input container...") input_container = Container(container_path, logger=self.logger) input_container.decompress(self.working_directory) - self._log(u"Decompressing input container... done") + self.log(u"Decompressing input container... done") except Exception as exc: self.clean() - self._failed(u"Unable to decompress container '%s': %s" % (container_path, exc), "input") + self.log_exc(u"Unable to decompress container '%s': %s" % (container_path, exc), None, True, ExecuteJobInputError) try: - self._log(u"Creating job from working directory...") + self.log(u"Creating job from working directory...") working_container = Container( self.working_directory, logger=self.logger ) analyzer = AnalyzeContainer(working_container, logger=self.logger) self.job = analyzer.analyze(config_string=config_string) - self._log(u"Creating job from working directory... 
done") + self.log(u"Creating job from working directory... done") except Exception as exc: self.clean() - self._failed(u"Unable to analyze container '%s': %s" % (container_path, exc), "input") + self.log_exc(u"Unable to analyze container '%s': %s" % (container_path, exc), None, True, ExecuteJobInputError) if self.job is None: - self._failed(u"The container '%s' does not contain a valid Job" % container_path, "input") + self.log_exc(u"The container '%s' does not contain a valid Job" % (container_path), None, True, ExecuteJobInputError) try: # set absolute path for text file and audio file # for each task in the job - self._log(u"Setting absolute paths for tasks...") + self.log(u"Setting absolute paths for tasks...") for task in self.job.tasks: task.text_file_path_absolute = gf.norm_join( self.working_directory, @@ -176,12 +170,12 @@ def load_job_from_container(self, container_path, config_string=None): self.working_directory, task.audio_file_path ) - self._log(u"Setting absolute paths for tasks... done") + self.log(u"Setting absolute paths for tasks... done") - self._log(u"Loading job from container: succeeded") + self.log(u"Loading job from container: succeeded") except Exception as exc: self.clean() - self._failed(u"Error while setting absolute paths for tasks: %s" % exc, "input") + self.log_exc(u"Error while setting absolute paths for tasks", exc, True, ExecuteJobInputError) def execute(self): """ @@ -190,142 +184,129 @@ def execute(self): Each produced sync map will be stored inside the corresponding task object. 
- :raise ExecuteJobExecutionError: if there is a problem during the job execution + :raises: :class:`~aeneas.executejob.ExecuteJobExecutionError`: if there is a problem during the job execution """ - self._log(u"Executing job") + self.log(u"Executing job") if self.job is None: - self._failed(u"The job object is None", "execution") + self.log_exc(u"The job object is None", None, True, ExecuteJobExecutionError) if len(self.job) == 0: - self._failed(u"The job has no tasks", "execution") - if (self.rconf["job_max_tasks"] > 0) and (len(self.job) > self.rconf["job_max_tasks"]): - self._failed(u"The Job has %d Tasks, more than the maximum allowed (%d)." % ( - len(self.job), - self.rconf["job_max_tasks"] - ), "execution") - self._log([u"Number of tasks: '%d'", len(self.job)]) + self.log_exc(u"The job has no tasks", None, True, ExecuteJobExecutionError) + job_max_tasks = self.rconf[RuntimeConfiguration.JOB_MAX_TASKS] + if (job_max_tasks > 0) and (len(self.job) > job_max_tasks): + self.log_exc(u"The Job has %d Tasks, more than the maximum allowed (%d)." % (len(self.job), job_max_tasks), None, True, ExecuteJobExecutionError) + self.log([u"Number of tasks: '%d'", len(self.job)]) for task in self.job.tasks: try: custom_id = task.configuration["custom_id"] - self._log([u"Executing task '%s'...", custom_id]) + self.log([u"Executing task '%s'...", custom_id]) executor = ExecuteTask(task, rconf=self.rconf, logger=self.logger) executor.execute() - self._log([u"Executing task '%s'... done", custom_id]) + self.log([u"Executing task '%s'... 
done", custom_id]) except Exception as exc: - self._failed(u"Error while executing task '%s': %s" % (custom_id, exc), "execution") - self._log(u"Executing task: succeeded") + self.log_exc(u"Error while executing task '%s'" % (custom_id), exc, True, ExecuteJobExecutionError) + self.log(u"Executing task: succeeded") - self._log(u"Executing job: succeeded") + self.log(u"Executing job: succeeded") def write_output_container(self, output_directory_path): """ Write the output container for this job. - Return the path to output container. + Return the path to output container, + which is the concatenation of ``output_directory_path`` + and of the output container file or directory name. - :param output_directory_path: the path to a directory where - the output container must be created - :type output_directory_path: string (path) + :param string output_directory_path: the path to a directory where + the output container must be created :rtype: string + :raises: :class:`~aeneas.executejob.ExecuteJobOutputError`: if there is a problem while writing the output container """ - self._log(u"Writing output container for this job") + self.log(u"Writing output container for this job") if self.job is None: - self._failed(u"The job object is None", "output") + self.log_exc(u"The job object is None", None, True, ExecuteJobOutputError) if len(self.job) == 0: - self._failed(u"The job has no tasks", "output") - self._log([u"Number of tasks: '%d'", len(self.job)]) + self.log_exc(u"The job has no tasks", None, True, ExecuteJobOutputError) + self.log([u"Number of tasks: '%d'", len(self.job)]) # create temporary directory where the sync map files # will be created # this temporary directory will be compressed into # the output container - self.tmp_directory = gf.tmp_directory(root=self.rconf["tmp_path"]) - self._log([u"Created temporary directory '%s'", self.tmp_directory]) + self.tmp_directory = gf.tmp_directory(root=self.rconf[RuntimeConfiguration.TMP_PATH]) + self.log([u"Created 
temporary directory '%s'", self.tmp_directory]) for task in self.job.tasks: custom_id = task.configuration["custom_id"] # check if the task has sync map and sync map file path if task.sync_map_file_path is None: - self._failed(u"Task '%s' has sync_map_file_path not set" % custom_id, "output") + self.log_exc(u"Task '%s' has sync_map_file_path not set" % (custom_id), None, True, ExecuteJobOutputError) if task.sync_map is None: - self._failed(u"Task '%s' has sync_map not set" % custom_id, "output") + self.log_exc(u"Task '%s' has sync_map not set" % (custom_id), None, True, ExecuteJobOutputError) try: # output sync map - self._log([u"Outputting sync map for task '%s'...", custom_id]) + self.log([u"Outputting sync map for task '%s'...", custom_id]) task.output_sync_map_file(self.tmp_directory) - self._log([u"Outputting sync map for task '%s'... done", custom_id]) + self.log([u"Outputting sync map for task '%s'... done", custom_id]) except Exception as exc: - self._failed(u"Error while outputting sync map for task '%s': %s" % (custom_id, exc), "output") + self.log_exc(u"Error while outputting sync map for task '%s'" % (custom_id), None, True, ExecuteJobOutputError) # get output container info output_container_format = self.job.configuration["o_container_format"] - self._log([u"Output container format: '%s'", output_container_format]) + self.log([u"Output container format: '%s'", output_container_format]) output_file_name = self.job.configuration["o_name"] if ((output_container_format != ContainerFormat.UNPACKED) and (not output_file_name.endswith(output_container_format))): - self._log(u"Adding extension to output_file_name") + self.log(u"Adding extension to output_file_name") output_file_name += "." 
+ output_container_format - self._log([u"Output file name: '%s'", output_file_name]) + self.log([u"Output file name: '%s'", output_file_name]) output_file_path = gf.norm_join( output_directory_path, output_file_name ) - self._log([u"Output file path: '%s'", output_file_path]) + self.log([u"Output file path: '%s'", output_file_path]) try: - self._log(u"Compressing...") + self.log(u"Compressing...") container = Container( output_file_path, output_container_format, logger=self.logger ) container.compress(self.tmp_directory) - self._log(u"Compressing... done") - self._log([u"Created output file: '%s'", output_file_path]) - self._log(u"Writing output container for this job: succeeded") + self.log(u"Compressing... done") + self.log([u"Created output file: '%s'", output_file_path]) + self.log(u"Writing output container for this job: succeeded") self.clean(False) return output_file_path except Exception as exc: self.clean(False) - self._failed("%s" % (exc), "output") + self.log_exc(u"Error while compressing", exc, True, ExecuteJobOutputError) return None def clean(self, remove_working_directory=True): """ Remove the temporary directory. - If ``remove_working_directory`` is True + If ``remove_working_directory`` is ``True`` remove the working directory as well, otherwise just remove the temporary directory. - :param remove_working_directory: if ``True``, remove - the working directory as well - :type remove_working_directory: bool + :param bool remove_working_directory: if ``True``, remove + the working directory as well """ if remove_working_directory is not None: - self._log(u"Removing working directory... ") + self.log(u"Removing working directory... ") gf.delete_directory(self.working_directory) self.working_directory = None - self._log(u"Removing working directory... done") - self._log(u"Removing temporary directory... ") + self.log(u"Removing working directory... done") + self.log(u"Removing temporary directory... 
") gf.delete_directory(self.tmp_directory) self.tmp_directory = None - self._log(u"Removing temporary directory... done") - - def _failed(self, msg, during="execution"): - """ Bubble exception up """ - if during == "input": - self._log(msg, Logger.CRITICAL) - raise ExecuteJobInputError(msg) - elif during == "output": - self._log(msg, Logger.CRITICAL) - raise ExecuteJobOutputError(msg) - else: - self._log(msg, Logger.CRITICAL) - raise ExecuteJobExecutionError(msg) + self.log(u"Removing temporary directory... done") diff --git a/aeneas/executetask.py b/aeneas/executetask.py index 84c33b13..c949878d 100644 --- a/aeneas/executetask.py +++ b/aeneas/executetask.py @@ -2,27 +2,36 @@ # coding=utf-8 """ -Execute a task, that is, compute the sync map for it. +This module contains the following classes: + +* :class:`~aeneas.executetask.ExecuteTask`, a class to process a task; +* :class:`~aeneas.executetask.ExecuteTaskExecutionError`, and +* :class:`~aeneas.executetask.ExecuteTaskInputError`, + representing errors generated while processing tasks. 
""" from __future__ import absolute_import +from __future__ import division from __future__ import print_function import numpy from aeneas.adjustboundaryalgorithm import AdjustBoundaryAlgorithm -from aeneas.audiofile import AudioFileMonoWAVE +from aeneas.audiofile import AudioFile +from aeneas.audiofilemfcc import AudioFileMFCC from aeneas.dtw import DTWAligner from aeneas.ffmpegwrapper import FFMPEGWrapper -from aeneas.language import Language -from aeneas.logger import Logger +from aeneas.logger import Loggable from aeneas.runtimeconfiguration import RuntimeConfiguration from aeneas.sd import SD from aeneas.syncmap import SyncMap from aeneas.syncmap import SyncMapFragment from aeneas.syncmap import SyncMapHeadTailFormat from aeneas.synthesizer import Synthesizer +from aeneas.task import Task +from aeneas.textfile import TextFileFormat from aeneas.textfile import TextFragment -from aeneas.vad import VAD +from aeneas.timevalue import TimeValue +from aeneas.tree import Tree import aeneas.globalfunctions as gf __author__ = "Alberto Pettarin" @@ -32,378 +41,439 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" -class ExecuteTaskInputError(Exception): +class ExecuteTaskExecutionError(Exception): """ - Error raised when the input parameters of the task are invalid or missing. + Error raised when the execution of the task fails for internal reasons. """ pass -class ExecuteTaskExecutionError(Exception): +class ExecuteTaskInputError(Exception): """ - Error raised when the execution of the task fails for internal reasons. + Error raised when the input parameters of the task are invalid or missing. """ pass -class ExecuteTask(object): +class ExecuteTask(Loggable): """ Execute a task, that is, compute the sync map for it. 
:param task: the task to be executed - :type task: :class:`aeneas.task.Task` - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. - :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` + :type task: :class:`~aeneas.task.Task` + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` + :type logger: :class:`~aeneas.logger.Logger` """ TAG = u"ExecuteTask" - def __init__(self, task, rconf=None, logger=None): + def __init__(self, task=None, rconf=None, logger=None): + super(ExecuteTask, self).__init__(rconf=rconf, logger=logger) self.task = task - self.cleanup_info = [] - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() + self.step_index = 1 + self.step_label = u"" + self.step_begin_time = None + self.step_total = 0.000 + if task is not None: + self.load_task(self.task) + + def load_task(self, task): + """ + Load the task from the given ``Task`` object. 
- def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) + :param task: the task to load + :type task: :class:`~aeneas.task.Task` + :raises: :class:`~aeneas.executetask.ExecuteTaskInputError`: if ``task`` is not an instance of :class:`~aeneas.task.Task` + """ + if not isinstance(task, Task): + self.log_exc(u"task is not an instance of Task", None, True, ExecuteTaskInputError) + self.task = task + + def _step_begin(self, label, log=True): + """ Log begin of a step """ + if log: + self.step_label = label + self.step_begin_time = self.log(u"STEP %d BEGIN (%s)" % (self.step_index, label)) + + def _step_end(self, log=True): + """ Log end of a step """ + if log: + step_end_time = self.log(u"STEP %d END (%s)" % (self.step_index, self.step_label)) + diff = (step_end_time - self.step_begin_time) + diff = float(diff.seconds + diff.microseconds / 1000000.0) + self.step_total += diff + self.log(u"STEP %d DURATION %.3f (%s)" % (self.step_index, diff, self.step_label)) + self.step_index += 1 + + def _step_failure(self, exc): + """ Log failure of a step """ + self.log_crit(u"STEP %d (%s) FAILURE" % (self.step_index, self.step_label)) + self.step_index += 1 + self.log_exc(u"Unexpected error while executing task", exc, True, ExecuteTaskExecutionError) + + def _step_total(self): + """ Log total """ + self.log(u"STEP T DURATION %.3f" % (self.step_total)) def execute(self): """ Execute the task. The sync map produced will be stored inside the task object. 
- :raise ExecuteTaskInputError: if there is a problem with the input parameters - :raise ExecuteTaskExecutionError: if there is a problem during the task execution + :raises: :class:`~aeneas.executetask.ExecuteTaskInputError`: if there is a problem with the input parameters + :raises: :class:`~aeneas.executetask.ExecuteTaskExecutionError`: if there is a problem during the task execution """ - self._log(u"Executing task") + self.log(u"Executing task...") # check that we have the AudioFile object if self.task.audio_file is None: - self._failed(u"The task does not seem to have its audio file set", False) + self.log_exc(u"The task does not seem to have its audio file set", None, True, ExecuteTaskInputError) if ( (self.task.audio_file.audio_length is None) or (self.task.audio_file.audio_length <= 0) ): - self._failed(u"The task seems to have an invalid audio file", False) + self.log_exc(u"The task seems to have an invalid audio file", None, True, ExecuteTaskInputError) + task_max_audio_length = self.rconf[RuntimeConfiguration.TASK_MAX_AUDIO_LENGTH] if ( - (self.rconf["task_max_a_len"] > 0) and - (self.task.audio_file.audio_length > self.rconf["task_max_a_len"]) + (task_max_audio_length > 0) and + (self.task.audio_file.audio_length > task_max_audio_length) ): - self._failed(u"The audio file of the task has length %.3f, more than the maximum allowed (%.3f)." % ( - self.task.audio_file.audio_length, - self.rconf["task_max_a_len"] - ), False) + self.log_exc(u"The audio file of the task has length %.3f, more than the maximum allowed (%.3f)." 
% (self.task.audio_file.audio_length, task_max_audio_length), None, True, ExecuteTaskInputError) # check that we have the TextFile object if self.task.text_file is None: - self._failed(u"The task does not seem to have its text file set", False) + self.log_exc(u"The task does not seem to have its text file set", None, True, ExecuteTaskInputError) if len(self.task.text_file) == 0: - self._failed(u"The task text file seems to have no text fragments", False) + self.log_exc(u"The task text file seems to have no text fragments", None, True, ExecuteTaskInputError) + task_max_text_length = self.rconf[RuntimeConfiguration.TASK_MAX_TEXT_LENGTH] if ( - (self.rconf["task_max_t_len"] > 0) and - (len(self.task.text_file) > self.rconf["task_max_t_len"]) + (task_max_text_length > 0) and + (len(self.task.text_file) > task_max_text_length) ): - self._failed(u"The text file of the task has %d fragments, more than the maximum allowed (%d)." % ( - len(self.task.text_file), - self.rconf["task_max_t_len"] - ), False) + self.log_exc(u"The text file of the task has %d fragments, more than the maximum allowed (%d)." 
% (len(self.task.text_file), task_max_text_length), None, True, ExecuteTaskInputError) if self.task.text_file.chars == 0: - self._failed(u"The task text file seems to have empty text", False) + self.log_exc(u"The task text file seems to have empty text", None, True, ExecuteTaskInputError) - self._log(u"Both audio and text input file are present") - self.cleanup_info = [] + self.log(u"Both audio and text input file are present") - # real full wave = the real audio file, converted to WAVE format - # real trimmed wave = real full wave, possibly with head and/or tail trimmed off - # synt wave = WAVE file synthesized from text; it will be aligned to real trimmed wave + # execute + self.step_index = 1 + self.step_total = 0.000 + if self.task.text_file.file_format in [TextFileFormat.MPLAIN, TextFileFormat.MUNPARSED]: + self._execute_multi_level_task() + else: + self._execute_single_level_task() + self.log(u"Executing task... done") - step_index = 0 + def _execute_single_level_task(self): + """ Execute a single-level task """ + self.log(u"Executing single level task...") try: - # STEP 0 : convert audio file to real full wave - self._log(u"STEP %d BEGIN" % (step_index)) - real_full_handler, real_full_path = self._convert() - self.cleanup_info.append([real_full_handler, real_full_path]) - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 1 : extract MFCCs from real full wave - self._log(u"STEP %d BEGIN" % (step_index)) - real_full_wave_full_mfcc, real_full_wave_length = self._extract_mfcc(real_full_path) - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 2 : cut head and/or tail off - # detecting head/tail if requested, and - # overwriting real_path - # at the end, read_path will not have the head/tail - self._log(u"STEP %d BEGIN" % (step_index)) - real_wave_modified = self._cut_head_tail(real_full_path) - real_trimmed_path = real_full_path - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 3 : synthesize text to 
wave - self._log(u"STEP %d BEGIN" % (step_index)) - synt_handler, synt_path, synt_anchors = self._synthesize() - self.cleanup_info.append([synt_handler, synt_path]) - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 4 : align waves - self._log(u"STEP %d BEGIN" % (step_index)) - if real_wave_modified: - wave_map = self._align_waves(real_trimmed_path, synt_path, None, None) - else: - wave_map = self._align_waves(real_trimmed_path, synt_path, real_full_wave_full_mfcc, real_full_wave_length) - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 5 : align text - self._log(u"STEP %d BEGIN" % (step_index)) - text_map = self._align_text(wave_map, synt_anchors) - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 6 : translate the text_map, possibly putting back the head/tail - self._log(u"STEP %d BEGIN" % (step_index)) - translated_text_map = self._translate_text_map( - text_map, - real_full_wave_length - ) - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 7 : adjust boundaries - self._log(u"STEP %d BEGIN" % (step_index)) - adjusted_map = self._adjust_boundaries( - translated_text_map, - real_full_wave_full_mfcc, - real_full_wave_length - ) - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 8 : create syncmap and add it to task - self._log(u"STEP %d BEGIN" % (step_index)) - self._create_syncmap(adjusted_map) - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - # STEP 9 : cleanup - self._log(u"STEP %d BEGIN" % (step_index)) - self._cleanup() - self._log(u"STEP %d END" % (step_index)) - step_index += 1 - - self._log(u"Execution completed") - return True + # load audio file, extract MFCCs from real wave, clear audio file + self._step_begin(u"extract MFCC real wave") + real_wave_mfcc = self._extract_mfcc(file_path=self.task.audio_file_path_absolute, file_path_is_mono_wave=False) + self._step_end() + + # compute head and/or tail and set it + 
self._step_begin(u"compute head tail") + (head_length, process_length, tail_length) = self._compute_head_process_tail(real_wave_mfcc) + real_wave_mfcc.set_head_middle_tail(head_length, process_length, tail_length) + self._step_end() + + # compute a time map alignment + time_map = self._execute_inner(real_wave_mfcc, self.task.text_file, adjust_boundaries=True, log=True) + + # convert time_map to tree and create syncmap and add it to task + self._step_begin(u"create sync map") + tree = self._level_time_map_to_tree(self.task.text_file, time_map) + self.task.sync_map = self._create_syncmap(tree) + self._step_end() + + # check for fragments with zero duration + self._step_begin(u"check zero duration") + self._check_no_zero(self.rconf.mws) + self._step_end() + + # log total + self._step_total() + self.log(u"Executing single level task... done") except Exception as exc: - self._log(u"STEP %d FAILURE" % step_index, Logger.CRITICAL) - self._cleanup() - self._failed("%s" % (exc), True) - self._log(u"Executing task... done") - - def _failed(self, msg, during_execution=True): - """ Bubble exception up """ - if during_execution: - self._log(msg, Logger.CRITICAL) - raise ExecuteTaskExecutionError(msg) - else: - self._log(msg, Logger.CRITICAL) - raise ExecuteTaskInputError(msg) + self._step_failure(exc) + + def _execute_multi_level_task(self): + """ Execute a multi-level task """ + self.log(u"Executing multi level task...") + + self.log(u"Saving rconf...") + # save original rconf + orig_rconf = self.rconf.clone() + # clone rconfs and set granularity + level_rconfs = [None, self.rconf.clone(), self.rconf.clone(), self.rconf.clone()] + level_mfccs = [None, None, None, None] + for i in range(1, len(level_rconfs)): + level_rconfs[i].set_granularity(i) + self.log([u"Level %d mws: %.3f", i, level_rconfs[i].mws]) + self.log(u"Saving rconf... done") + + try: + self.log(u"Creating AudioFile object...") + audio_file = self._load_audio_file() + self.log(u"Creating AudioFile object... 
done") + + # extract MFCC for each level + for i in range(1, len(level_rconfs)): + self._step_begin(u"extract MFCC real wave level %d" % i) + if (i == 1) or (level_rconfs[i].mws != level_rconfs[i-1].mws) or (level_rconfs[i].mwl != level_rconfs[i-1].mwl): + self.rconf = level_rconfs[i] + level_mfccs[i] = self._extract_mfcc(audio_file=audio_file) + else: + self.log(u"Keeping MFCC real wave from previous level") + level_mfccs[i] = level_mfccs[i-1] + self._step_end() + + self.log(u"Clearing AudioFile object...") + self.rconf = level_rconfs[1] + self._clear_audio_file(audio_file) + self.log(u"Clearing AudioFile object... done") + + # compute head tail for the entire real wave (level 1) + self._step_begin(u"compute head tail") + (head_length, process_length, tail_length) = self._compute_head_process_tail(level_mfccs[1]) + level_mfccs[1].set_head_middle_tail(head_length, process_length, tail_length) + self._step_end() + + # compute alignment at each level + tree = Tree() + sync_roots = [tree] + text_files = [self.task.text_file] + aht = [None, True, False, False] + aba = [None, True, True, False] + for i in range(1, len(level_rconfs)): + self._step_begin(u"compute alignment level %d" % i) + text_files, sync_roots = self._execute_level(i, level_rconfs[i], level_mfccs[i], text_files, sync_roots, aht[i], aba[i]) + self._step_end() + + self._step_begin(u"select levels") + tree = self._select_levels(tree) + self._step_end() + + self._step_begin(u"create sync map") + self.rconf = orig_rconf + self.task.sync_map = self._create_syncmap(tree) + self._step_end() + + self._step_begin(u"check zero duration") + self._check_no_zero(level_rconfs[-1].mws) + self._step_end() + + self._step_total() + self.log(u"Executing multi level task... done") + except Exception as exc: + self._step_failure(exc) - def _cleanup(self): + def _execute_level(self, level, rconf, audio_file_mfcc, text_files, sync_roots, add_head_tail, adjust_boundaries): """ - Remove all temporary files. 
+ Compute the alignment for all the nodes in the given level. + + Return a pair (next_level_text_files, next_level_sync_roots), + containing two lists of text file subtrees and sync map subtrees + on the next level. + + :param int level: the level + :param rconf: the runtime configuration for this level + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` + :param audio_file_mfcc: the audio MFCC representation for this level + :type audio_file_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC` + :param list text_files: a list of :class:`~aeneas.textfile.TextFile` objects, + each representing a (sub)tree of the Task text file + :param list sync_roots: a list of :class:`~aeneas.tree.Tree` objects, + each representing a SyncMapFragment tree, + one for each element in ``text_files`` + :param bool add_head_tail: if ``True``, add head and tail nodes to the sync map tree + :param bool adjust_boundaries: if ``True``, execute the adjust boundary algorithm + :rtype: (list, list) """ - self._log(u"Cleaning up...") - for info in self.cleanup_info: - handler, path = info - self._log([u"Removing file '%s'", path]) - gf.delete_file(handler, path) - self.cleanup_info = [] - self._log(u"Cleaning up... 
done") - - def _convert(self): + self.rconf = rconf + i = 0 + next_level_text_files = [] + next_level_sync_roots = [] + for text_file in text_files: + self.log([u"Text level %d, fragment %d", level, i]) + self.log([u" Len: %d", len(text_file)]) + sync_root = sync_roots[i] + if (level > 1) and (len(text_file) == 1): + self.log(u" Level > 1 and only one child => returning trivial timemap") + time_map = [ + (TimeValue("0.000"), sync_root.value.begin), + (sync_root.value.begin, sync_root.value.end), + (sync_root.value.end, audio_file_mfcc.audio_length) + ] + else: + self.log(u" Level 1 or more than one child => computing timemap") + if not sync_root.is_empty: + begin = sync_root.value.begin + end = sync_root.value.end + self.log([u" Begin: %.3f", begin]) + self.log([u" End: %.3f", end]) + audio_file_mfcc.set_head_middle_tail(head_length=begin, middle_length=(end - begin)) + else: + self.log(u" No begin or end to set") + time_map = self._execute_inner(audio_file_mfcc, text_file, adjust_boundaries=adjust_boundaries, log=False) + self.log([u" Map: %s", str(time_map)]) + self._level_time_map_to_tree(text_file, time_map, sync_root, add_head_tail=add_head_tail) + # store next level roots + next_level_text_files.extend(text_file.children_not_empty) + src = sync_root.children + if add_head_tail: + # if we added head and tail, + # we must not pass them to the next level + src = src[1:-1] + next_level_sync_roots.extend(src) + i += 1 + return (next_level_text_files, next_level_sync_roots) + + def _execute_inner(self, audio_file_mfcc, text_file, adjust_boundaries=True, log=True): """ - Convert the entire audio file into a ``wav`` file. + Align a subinterval of the given AudioFileMFCC + with the given TextFile. - (Head/tail will be cut off later.) + Return the computed time map, as a list of intervals. - Return a pair: + The begin and end positions inside the AudioFileMFCC + must have been set ahead by the caller. - 1. handler of the generated wave file - 2. 
path of the generated wave file + The text fragments being aligned are the vchildren of ``text_file``. + + :param audio_file_mfcc: the audio file MFCC representation + :type audio_file_mfcc: :class:`~aeneas.audiofilemfcc.AudioFileMFCC` + :param text_file: the text file subtree to align + :type text_file: :class:`~aeneas.textfile.TextFile` + :param bool adjust_boundaries: if ``True``, execute the adjust boundary algorithm + :param bool log: if ``True``, log steps + :rtype: list + """ + self._step_begin(u"synthesize text", log=log) + synt_handler, synt_path, synt_anchors, synt_mono = self._synthesize(text_file) + self._step_end(log=log) + + self._step_begin(u"extract MFCC synt wave", log=log) + synt_wave_mfcc = self._extract_mfcc(file_path=synt_path, file_path_is_mono_wave=synt_mono) + gf.delete_file(synt_handler, synt_path) + self._step_end(log=log) + + self._step_begin(u"align waves", log=log) + indices = self._align_waves(audio_file_mfcc, synt_wave_mfcc, synt_anchors) + self._step_end(log=log) + + self._step_begin(u"adjust boundaries", log=log) + time_map = self._adjust_boundaries(audio_file_mfcc, text_file, indices, adjust_boundaries) + self._step_end(log=log) + + return time_map + + def _load_audio_file(self): """ - self._log(u"Converting real audio to wav") - handler = None - path = None - self._log(u"Creating an output tmp file") - handler, path = gf.tmp_file(suffix=u".wav", root=self.rconf["tmp_path"]) - self._log(u"Creating a FFMPEGWrapper") - ffmpeg = FFMPEGWrapper(rconf=self.rconf, logger=self.logger) - self._log(u"Converting...") - ffmpeg.convert( - input_file_path=self.task.audio_file_path_absolute, - output_file_path=path + Load audio in memory. + + :rtype: :class:`~aeneas.audiofile.AudioFile` + """ + self._step_begin(u"load audio file") + audio_file = AudioFile( + file_path=self.task.audio_file_path_absolute, + is_mono_wave=False, + rconf=self.rconf, + logger=self.logger ) - self._log(u"Converting... 
done") - self._log(u"Converting real audio to wav: succeeded") - return (handler, path) + audio_file.read_samples_from_file() + self._step_end() + return audio_file - def _extract_mfcc(self, audio_file_path): + def _clear_audio_file(self, audio_file): """ - Extract the MFCCs of the real full wave. + Clear audio from memory. - Return a pair: + :param audio_file: the object to clear + :type audio_file: :class:`~aeneas.audiofile.AudioFile` + """ + self._step_begin(u"clear audio file") + audio_file.clear_data() + audio_file = None + self._step_end() - 1. audio MFCCs - 2. audio length + def _extract_mfcc(self, file_path=None, file_path_is_mono_wave=False, audio_file=None): """ - self._log(u"Extracting MFCCs from real full wave") - audio_file = AudioFileMonoWAVE(audio_file_path, rconf=self.rconf, logger=self.logger) - audio_file.extract_mfcc() - self._log(u"Extracting MFCCs from real full wave: succeeded") - return (audio_file.audio_mfcc, audio_file.audio_length) + Extract the MFCCs from the given audio file. - def _cut_head_tail(self, audio_file_path): + :rtype: :class:`~aeneas.audiofilemfcc.AudioFileMFCC` + """ + return AudioFileMFCC( + file_path=file_path, + file_path_is_mono_wave=file_path_is_mono_wave, + audio_file=audio_file, + rconf=self.rconf, + logger=self.logger + ) + + def _compute_head_process_tail(self, audio_file_mfcc): """ Set the audio file head or tail, - suitably cutting the audio file on disk, - and setting the corresponding parameters in the task configuration. + by either reading the explicit values + from the Task configuration, + or using SD to determine them. - Return ``True`` if head or tail has been cut; - otherwise return ``False`` (real wave file not modified) + This function returns the lengths, in seconds, + of the (head, process, tail). 
- :rtype: bool + :rtype: tuple (float, float, float) """ - self._log(u"Setting head and/or tail") head_length = self.task.configuration["i_a_head"] process_length = self.task.configuration["i_a_process"] tail_length = self.task.configuration["i_a_tail"] - detect_head_max = self.task.configuration["i_a_head_max"] - detect_head_min = self.task.configuration["i_a_head_min"] - detect_tail_max = self.task.configuration["i_a_tail_max"] - detect_tail_min = self.task.configuration["i_a_tail_min"] - - # explicit head or process? - explicit = ( + head_max = self.task.configuration["i_a_head_max"] + head_min = self.task.configuration["i_a_head_min"] + tail_max = self.task.configuration["i_a_tail_max"] + tail_min = self.task.configuration["i_a_tail_min"] + if ( (head_length is not None) or (process_length is not None) or (tail_length is not None) - ) - - # at least one detect parameter? - detect = ( - (detect_head_min is not None) or - (detect_head_max is not None) or - (detect_tail_min is not None) or - (detect_tail_max is not None) - ) - - if not (explicit or detect): - # nothing to do - self._log(u"No explicit head/process or detect head/tail") - self._log(u"Setting head and/or tail: succeeded") - return False - - # we need to cut head/tail, hence load the audio data - audio_file = AudioFileMonoWAVE(audio_file_path, rconf=self.rconf, logger=self.logger) - audio_file.load_data() - - if explicit: - self._log(u"Explicit head, process, or tail") - else: - self._log(u"No explicit head, process, or tail => detecting head/tail") - - head = 0.0 - if (detect_head_min is not None) or (detect_head_max is not None): - self._log(u"Detecting head...") - detect_head_min = gf.safe_float(detect_head_min, SD.MIN_HEAD_LENGTH) - detect_head_max = gf.safe_float(detect_head_max, SD.MAX_HEAD_LENGTH) - self._log([u"detect_head_min is %.3f", detect_head_min]) - self._log([u"detect_head_max is %.3f", detect_head_max]) - start_detector = SD(audio_file, self.task.text_file, rconf=self.rconf, 
logger=self.logger) - head = start_detector.detect_head(detect_head_min, detect_head_max) - self._log([u"Detected head: %.3f", head]) - - tail = 0.0 - if (detect_tail_min is not None) or (detect_tail_max is not None): - self._log(u"Detecting tail...") - detect_tail_max = gf.safe_float(detect_tail_max, SD.MAX_TAIL_LENGTH) - detect_tail_min = gf.safe_float(detect_tail_min, SD.MIN_TAIL_LENGTH) - self._log([u"detect_tail_min is %.3f", detect_tail_min]) - self._log([u"detect_tail_max is %.3f", detect_tail_max]) - start_detector = SD(audio_file, self.task.text_file, rconf=self.rconf, logger=self.logger) - tail = start_detector.detect_tail(detect_tail_min, detect_tail_max) - self._log([u"Detected tail: %.3f", tail]) - - head_length = max(0, head) - process_length = max(0, audio_file.audio_length - tail - head) - tail_length = audio_file.audio_length - head_length - process_length - - # we need to set these values - # in the config object for later use - self.task.configuration["i_a_head"] = head_length - self.task.configuration["i_a_process"] = process_length - self._log([u"Set head_length: %.3f", head_length]) - self._log([u"Set process_length: %.3f", process_length]) - - # in case we are reading from config object - if head_length is not None: - self._log(u"head_length is not None, converting to float") - head_length = float(head_length) + ): + self.log(u"Setting explicit head process tail") else: - self._log(u"head_length is None: setting it to 0.0") - head_length = 0.0 - # note that process_length and tail_length are mutually exclusive - # with process_length having precedence over tail_length - if process_length is not None: - self._log(u"process_length is not None, converting to float") - process_length = float(process_length) - if tail_length is not None: - self._log(u"tail_length is not None, but it will be ignored") - tail_length = float(tail_length) - elif tail_length is not None: - self._log(u"tail_length is not None, converting to float") - tail_length = 
float(tail_length) - self._log(u"computing process_length from tail_length") - process_length = audio_file.audio_length - head_length - tail_length - - self._log([u"is_audio_file_head_length is %s", str(head_length)]) - self._log([u"is_audio_file_process_length is %s", str(process_length)]) - self._log([u"is_audio_file_tail_length is %s", str(tail_length)]) - - self._log(u"Trimming audio data...") - audio_file.trim(head_length, process_length) - self._log(u"Trimming audio data... done") - - self._log(u"Writing audio file...") - audio_file.write(audio_file_path) - self._log(u"Writing audio file... done") - - self._log(u"Clearing audio data...") - audio_file.clear_data() - self._log(u"Clearing audio data... done") - - self._log(u"Setting head and/or tail: succeeded") - return True - - def _synthesize(self): + self.log(u"Detecting head tail...") + sd = SD(audio_file_mfcc, self.task.text_file, rconf=self.rconf, logger=self.logger) + head_length = TimeValue("0.000") + process_length = None + tail_length = TimeValue("0.000") + if (head_min is not None) or (head_max is not None): + self.log(u"Detecting HEAD...") + head_length = sd.detect_head(head_min, head_max) + self.log([u"Detected HEAD: %.3f", head_length]) + self.log(u"Detecting HEAD... done") + if (tail_min is not None) or (tail_max is not None): + self.log(u"Detecting TAIL...") + tail_length = sd.detect_tail(tail_min, tail_max) + self.log([u"Detected TAIL: %.3f", tail_length]) + self.log(u"Detecting TAIL... done") + self.log(u"Detecting head tail... done") + self.log([u"Head: %s", gf.safe_float(head_length, None)]) + self.log([u"Process: %s", gf.safe_float(process_length, None)]) + self.log([u"Tail: %s", gf.safe_float(tail_length, None)]) + return (head_length, process_length, tail_length) + + def _synthesize(self, text_file): """ - Synthesize text into a ``wav`` file. + Synthesize text into a WAVE file. - Return a triple: + Return: 1. handler of the generated wave file 2. 
 path of the generated wave file @@ -411,238 +481,211 @@ def _synthesize(self): each representing the start time of the corresponding text fragment in the generated wave file ``[start_1, start_2, ..., start_n]`` - """ - self._log(u"Synthesizing text") - handler = None - path = None - anchors = None - self._log(u"Creating an output tmp file") - handler, path = gf.tmp_file(suffix=u".wav", root=self.rconf["tmp_path"]) - self._log(u"Creating Synthesizer object") - synt = Synthesizer(rconf=self.rconf, logger=self.logger) - self._log(u"Synthesizing...") - result = synt.synthesize(self.task.text_file, path) - anchors = result[0] - self._log(u"Synthesizing... done") - self._log(u"Synthesizing text: succeeded") - return (handler, path, anchors) + 4. if the synthesizer produced a PCM16 mono WAVE file - def _align_waves(self, real_path, synt_path, real_full_wave_full_mfcc=None, real_full_wave_length=None): + :param text_file: the text file to synthesize + :type text_file: :class:`~aeneas.textfile.TextFile` + :rtype: tuple (handler, string, list, bool) """ - Align two ``wav`` files. - - Return the computed alignment map, that is, - a list of pairs of floats, each representing - corresponding time instants - in the real and synt wave, respectively - ``[real_time, synt_time]`` + synthesizer = Synthesizer(rconf=self.rconf, logger=self.logger) + handler, path = gf.tmp_file(suffix=u".wav", root=self.rconf[RuntimeConfiguration.TMP_PATH]) + result = synthesizer.synthesize(text_file, path) + anchors = result[0] + return (handler, path, anchors, synthesizer.output_is_mono_wave) - If ``real_full_wave_full_mfcc`` and ``real_full_wave_length`` - are not None, use them instead of computing MFCCs again.
+ def _align_waves(self, real_wave_mfcc, synt_wave_mfcc, synt_anchors): """ - self._log(u"Aligning waves") - self._log(u"Creating DTWAligner object") - aligner = DTWAligner(real_path, synt_path, rconf=self.rconf, logger=self.logger) - self._log(u"Computing MFCC...") - if (real_full_wave_full_mfcc is not None) and (real_full_wave_length is not None): - self._log(u"Using real wave MFCCs already computed") - aligner.real_wave_full_mfcc = real_full_wave_full_mfcc - aligner.real_wave_length = real_full_wave_length - aligner.compute_mfcc(real_wave=False, synt_wave=True) - else: - self._log(u"Computing both real and synt wave MFCCs") - aligner.compute_mfcc(real_wave=True, synt_wave=True) - self._log(u"Computing MFCC... done") - self._log(u"Computing path...") - aligner.compute_path() - self._log(u"Computing path... done") - self._log(u"Computing map...") - computed_map = aligner.computed_map - self._log(u"Computing map... done") - self._log(u"Aligning waves: succeeded") - return computed_map - - def _align_text(self, wave_map, synt_anchors): - """ - Align the text with the real wave, - using the ``wave_map`` (containing the mapping - between real and synt waves) and ``synt_anchors`` - (containing the start times of text fragments - in the synt wave). 
- - Return the computed interval map, that is, - a list of triples ``[start_time, end_time, fragment_id]`` - """ - self._log(u"Aligning text") - self._log([u"Number of frames: %d", len(wave_map)]) - self._log([u"Number of fragments: %d", len(synt_anchors)]) - - real_times = numpy.array([t[0] for t in wave_map]) - synt_times = numpy.array([t[1] for t in wave_map]) - real_anchors = [] - anchor_index = 0 - # TODO numpy-fy this loop - for anchor in synt_anchors: - time, fragment_id, fragment_text = anchor - self._log(u"Looking for argmin index...") - # TODO allow an user-specified function instead of min - # partially solved by AdjustBoundaryAlgorithm - index = (numpy.abs(synt_times - time)).argmin() - self._log(u"Looking for argmin index... done") - real_time = real_times[index] - real_anchors.append([real_time, fragment_id, fragment_text]) - self._log([u"Time for anchor %d: %f", anchor_index, real_time]) - anchor_index += 1 - - # dummy last anchor, starting at the real file duration - real_anchors.append([real_times[-1], None, None]) - - # compute map - self._log(u"Computing interval map...") - # TODO numpy-fy this loop - computed_map = [] - for i in range(len(real_anchors) - 1): - fragment_id = real_anchors[i][1] - fragment_text = real_anchors[i][2] - start = real_anchors[i][0] - end = real_anchors[i+1][0] - computed_map.append([start, end, fragment_id, fragment_text]) - self._log(u"Computing interval map... done") - self._log(u"Aligning text: succeeded") - return computed_map - - def _translate_text_map(self, text_map, real_full_wave_length): - """ - Translate the text_map by adding head and tail dummy fragments + Align two AudioFileMFCC objects, + representing WAVE files. - Return the translated text map + Return a list of boundary indices. 
""" - translated = [] - head = gf.safe_float(self.task.configuration["i_a_head"], 0) - translated.append([0, head, None, None]) - end = 0 - for element in text_map: - start, end, fragment_id, fragment_text = element - start += head - end += head - translated.append([start, end, fragment_id, fragment_text]) - translated.append([end, real_full_wave_length, None, None]) - return translated - - def _adjust_boundaries( - self, - text_map, - real_wave_full_mfcc, - real_wave_length - ): + self.log(u"Creating DTWAligner...") + aligner = DTWAligner(real_wave_mfcc, synt_wave_mfcc, rconf=self.rconf, logger=self.logger) + self.log(u"Creating DTWAligner... done") + self.log(u"Computing boundary indices...") + boundary_indices = aligner.compute_boundaries(synt_anchors) + self.log(u"Computing boundary indices... done") + return boundary_indices + + def _adjust_boundaries(self, real_wave_mfcc, text_file, boundary_indices, adjust_boundaries=True): """ - Adjust the boundaries between consecutive fragments. + Adjust boundaries as requested by the user. - Return the computed interval map, that is, - a list of triples ``[start_time, end_time, fragment_id]`` + Return the computed time map, that is, + a list of pairs ``[start_time, end_time]``, + of length equal to number of fragments + 2, + where the two extra elements are for + the HEAD (first) and TAIL (last). 
""" - self._log(u"Adjusting boundaries") - algo = self.task.configuration["aba_algorithm"] - value = None - if algo is None: - self._log(u"No adjust boundary algorithm specified: returning") - return text_map - elif algo == AdjustBoundaryAlgorithm.AUTO: - self._log(u"Requested adjust boundary algorithm AUTO: returning") - return text_map - elif algo == AdjustBoundaryAlgorithm.AFTERCURRENT: - value = self.task.configuration["aba_aftercurrent_value"] - elif algo == AdjustBoundaryAlgorithm.BEFORENEXT: - value = self.task.configuration["aba_beforenext_value"] - elif algo == AdjustBoundaryAlgorithm.OFFSET: - value = self.task.configuration["aba_offset_value"] - elif algo == AdjustBoundaryAlgorithm.PERCENT: - value = self.task.configuration["aba_percent_value"] - elif algo == AdjustBoundaryAlgorithm.RATE: - value = self.task.configuration["aba_rate_value"] - elif algo == AdjustBoundaryAlgorithm.RATEAGGRESSIVE: - value = self.task.configuration["aba_rate_value"] - self._log([u"Requested algo %s and value %s", algo, str(value)]) - - self._log(u"Running VAD...") - vad = VAD(real_wave_full_mfcc, real_wave_length, rconf=self.rconf, logger=self.logger) - vad.compute_vad() - self._log(u"Running VAD... 
done") - - self._log(u"Creating AdjustBoundaryAlgorithm object") - adjust_boundary = AdjustBoundaryAlgorithm( - algorithm=algo, - text_map=text_map, - speech=vad.speech, - nonspeech=vad.nonspeech, - value=value, + # boundary_indices contains the boundary indices in the all_mfcc of real_wave_mfcc + # starting with the (head-1st fragment) and ending with (-1th fragment-tail) + if adjust_boundaries: + aba_algorithm, aba_parameters = self.task.configuration.aba_parameters() + self.log([u"Running algorithm: '%s'", aba_algorithm]) + else: + self.log(u"Forced running algorithm: 'auto'") + aba_algorithm = AdjustBoundaryAlgorithm.AUTO + aba_parameters = None + return AdjustBoundaryAlgorithm( + algorithm=aba_algorithm, + parameters=aba_parameters, + real_wave_mfcc=real_wave_mfcc, + boundary_indices=boundary_indices, + text_file=text_file, rconf=self.rconf, logger=self.logger - ) - self._log(u"Adjusting boundaries...") - adjusted_map = adjust_boundary.adjust() - self._log(u"Adjusting boundaries... done") - self._log(u"Adjusting boundaries: succeeded") - return adjusted_map + ).to_time_map() - def _create_syncmap(self, adjusted_map): + def _level_time_map_to_tree(self, text_file, time_map, tree=None, add_head_tail=True): """ - Create a sync map out of the provided interval map, - and store it in the task object. + Convert a level time map into a Tree of SyncMapFragments. + + The time map is + a list of pairs ``[start_time, end_time]``, + of length equal to number of fragments + 2, + where the two extra elements are for + the HEAD (first) and TAIL (last). 
+ + :param text_file: the text file object + :type text_file: :class:`~aeneas.textfile.TextFile` + :param list time_map: the time map + :param tree: the tree; if ``None``, a new Tree will be built + :type tree: :class:`~aeneas.tree.Tree` + :rtype: :class:`~aeneas.tree.Tree` """ - self._log(u"Creating sync map") - self._log([u"Number of fragments in adjusted map (including HEAD and TAIL): %d", len(adjusted_map)]) + if tree is None: + tree = Tree() + if add_head_tail: + fragments = ( + [TextFragment(u"HEAD", self.task.configuration["language"], [u""])] + + text_file.fragments + + [TextFragment(u"TAIL", self.task.configuration["language"], [u""])] + ) + i = 0 + else: + fragments = text_file.fragments + i = 1 + for fragment in fragments: + interval = time_map[i] + sm_frag = SyncMapFragment(fragment, interval[0], interval[1]) + tree.add_child(Tree(value=sm_frag)) + i += 1 + return tree - # adjusted map has 2 elements (HEAD and TAIL) more than text_file - #if len(adjusted_map) != len(self.task.text_file.fragments) + 2: - # self._log(u"The number of sync map fragments does not match the number of text fragments (+2)", Logger.CRITICAL) - # return False + def _select_levels(self, tree): + """ + Select the correct levels in the tree, + reading the ``os_task_file_levels`` + parameter in the Task configuration. - sync_map = SyncMap() - head = adjusted_map[0] - tail = adjusted_map[-1] + If ``None`` or invalid, return the current sync map tree + unchanged. + Otherwise, return only the levels appearing in it. 
- # get language - language = Language.EN - self._log([u"Language set to default: %s", language]) - if len(self.task.text_file.fragments) > 0: - language = self.task.text_file.fragments[0].language - self._log([u"Language read from text_file: %s", language]) + :param tree: a Tree of SyncMapFragments + :type tree: :class:`~aeneas.tree.Tree` + :rtype: :class:`~aeneas.tree.Tree` + """ + levels = self.task.configuration["o_levels"] + self.log([u"Levels: '%s'", levels]) + if (levels is None) or (len(levels) < 1): + return tree + try: + levels = [int(l) for l in levels if int(l) > 0] + self.log([u"Converted levels: %s", levels]) + except ValueError: + self.log_warn(u"Cannot convert levels to list of int, returning unchanged") + return tree + # remove head and tail nodes + head = tree.vchildren[0] + tail = tree.vchildren[-1] + tree.remove_child(0) + tree.remove_child(-1) + # keep only the selected levels + tree.keep_levels(levels) + # add head and tail back + tree.add_child(Tree(value=head), as_last=False) + tree.add_child(Tree(value=tail), as_last=True) + # return the new tree + return tree + + def _create_syncmap(self, tree): + """ + Return a sync map corresponding to the provided tree of SyncMapFragments.
- # get head/tail format + :param tree: a Tree of SyncMapFragments + :type tree: :class:`~aeneas.tree.Tree` + :rtype: :class:`~aeneas.syncmap.SyncMap` + """ + self.log([u"Fragments in time map (including HEAD/TAIL): %d", len(tree)]) head_tail_format = self.task.configuration["o_h_t_format"] - self._log([u"Head/tail format: %s", str(head_tail_format)]) + self.log([u"Head/tail format: %s", str(head_tail_format)]) + + children = tree.vchildren + head = children[0] + first = children[1] + last = children[-2] + tail = children[-1] - # add head sync map fragment if needed - if head_tail_format == SyncMapHeadTailFormat.ADD: - head_frag = TextFragment(u"HEAD", language, [u""]) - sync_map_frag = SyncMapFragment(head_frag, head[0], head[1]) - sync_map.append_fragment(sync_map_frag) - self._log([u"Adding head (ADD): %.3f %.3f", head[0], head[1]]) + # remove HEAD fragment if needed + if head_tail_format != SyncMapHeadTailFormat.ADD: + tree.remove_child(0) + self.log(u"Removed HEAD") # stretch first and last fragment timings if needed if head_tail_format == SyncMapHeadTailFormat.STRETCH: - self._log([u"Stretching (STRETCH): %.3f => %.3f (head) and %.3f => %.3f (tail)", adjusted_map[1][0], head[0], adjusted_map[-2][1], tail[1]]) - adjusted_map[1][0] = head[0] - adjusted_map[-2][1] = tail[1] - - i = 1 - for fragment in self.task.text_file.fragments: - start = adjusted_map[i][0] - end = adjusted_map[i][1] - sync_map_frag = SyncMapFragment(fragment, start, end) - sync_map.append_fragment(sync_map_frag) - i += 1 + self.log([u"Stretched first.begin: %.3f => %.3f (head)", first.begin, head.begin]) + self.log([u"Stretched last.end: %.3f => %.3f (tail)", last.end, tail.end]) + first.begin = head.begin + last.end = tail.end - # add tail sync map fragment if needed - if head_tail_format == SyncMapHeadTailFormat.ADD: - tail_frag = TextFragment(u"TAIL", language, [u""]) - sync_map_frag = SyncMapFragment(tail_frag, tail[0], tail[1]) - sync_map.append_fragment(sync_map_frag) - 
self._log([u"Adding tail (ADD): %.3f %.3f", tail[0], tail[1]]) + # remove TAIL fragment if needed + if head_tail_format != SyncMapHeadTailFormat.ADD: + tree.remove_child(-1) + self.log(u"Removed TAIL") - self.task.sync_map = sync_map - self._log(u"Creating sync map: succeeded") + # return sync map + sync_map = SyncMap() + sync_map.fragments_tree = tree + return sync_map + + # TODO can this be done during the alignment? + def _check_no_zero(self, min_mws): + """ Check for fragments with zero duration """ + if self.task.configuration["o_no_zero"]: + self.log(u"Checking for fragments with zero duration...") + # TODO use min_mws when doable, e.g. only one fragment? + delta = TimeValue("0.001") + leaves = self.task.sync_map.fragments_tree.vleaves_not_empty + # first and last leaves are HEAD and TAIL, skipping them + max_index = len(leaves) - 1 + self.log([u"Fragment min index: %d", 1]) + self.log([u"Fragment max index: %d", max_index - 1]) + for i in range(1, max_index): + self.log([u"Checking index: %d", i]) + j = i + while (j < max_index) and (leaves[j].end == leaves[i].begin): + j += 1 + if j != i: + self.log(u"Fragment(s) with zero duration:") + for k in range(i, j): + self.log([u" %d : %s", k, leaves[k]]) + + if leaves[j].end - leaves[j].begin > (j - i) * delta: + # there is room after + # to move each zero fragment forward by 0.001 + for k in range(j - i): + shift = (k + 1) * delta + leaves[i + k].end += shift + leaves[i + k + 1].begin += shift + self.log([u" Moved fragment %d forward by %.3f", i + k, shift]) + else: + self.log_warn(u" Unable to fix") + i = j - 1 + self.log(u"Checking for fragments with zero duration... 
done") + else: + self.log(u"Not checking for fragments with zero duration") diff --git a/aeneas/extra/.gitignore b/aeneas/extra/.gitignore new file mode 100644 index 00000000..ef3a94cf --- /dev/null +++ b/aeneas/extra/.gitignore @@ -0,0 +1 @@ +ctw_speect diff --git a/aeneas/extra/README.md b/aeneas/extra/README.md new file mode 100644 index 00000000..06a37b80 --- /dev/null +++ b/aeneas/extra/README.md @@ -0,0 +1,81 @@ +# aeneas extras + +This Python module (directory) contains +a collection of extra tools for aeneas, +mainly custom TTS engine wrappers. + + + +## `ctw_espeak.py` + +A wrapper for the `eSpeak` TTS engine +that executes `eSpeak` via `subprocess`. + +This file is an example to illustrate +how to write a custom TTS wrapper, +and how to use it at runtime: + +1. Copy the `ctw_espeak.py` file to `/tmp/ctw_espeak.py` + (or any other directory you like). + +2. Run any `aeneas.tools.*` with the following options: + + ``` + -r="tts=custom|tts_path=/tmp/ctw_espeak.py" + ``` + + For example: + + ```bash + python -m aeneas.tools.execute_task --example-srt -r="tts=custom|tts_path=/tmp/ctw_espeak.py" + ``` + +For details, please inspect the `ctw_espeak.py` file, +which is heavily commented and should help you +create a new wrapper for your own TTS engine. + +Note: if you want to use `eSpeak` as your TTS engine +in a production environment, +do NOT use the `ctw_espeak.py` wrapper! +`eSpeak` is the default TTS engine of `aeneas`, +and the `aeneas.espeakwrapper` in the main library +is faster than the `ctw_espeak.py` wrapper. + + + +## `ctw_speect.py` + +A wrapper for the `Speect` TTS engine +that synthesizes text via Python calls +to the `speect` Python module. + +To use it, do the following: + +1. Install `Speect` and compile the Python module `speect`: +see [http://speect.sourceforge.net/](http://speect.sourceforge.net/) for details. + +2.
Download a voice for `Speect`, for example the `Speect CMU Arctic slt` voice +(file `cmu_arctic_slt-1.0.tar.gz` +from [http://hlt.mirror.ac.za/TTS/Speect/](http://hlt.mirror.ac.za/TTS/Speect/)), +and decompress it to `/tmp/cmu_arctic_slt/` +(or any other directory you like). + +3. Copy the `ctw_speect.py` file to `/tmp/cmu_arctic_slt/ctw_speect.py` + (or any other directory you like). + +4. Run any `aeneas.tools.*` with the following options: + + ``` + -r="tts=custom|tts_path=/tmp/cmu_arctic_slt/ctw_speect.py" + ``` + + For example: + + ```bash + python -m aeneas.tools.execute_task --example-srt -r="tts=custom|tts_path=/tmp/cmu_arctic_slt/ctw_speect.py" + ``` + +For details, please inspect the `ctw_speect.py` file. + + + diff --git a/aeneas/extra/__init__.py b/aeneas/extra/__init__.py new file mode 100644 index 00000000..8b698ed4 --- /dev/null +++ b/aeneas/extra/__init__.py @@ -0,0 +1,22 @@ +#!/usr/bin/env python +# coding=utf-8 + +""" +aeneas.extra contains a collection of extra tools for aeneas, +mainly custom TTS engine wrappers. +""" + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL 3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + + + + diff --git a/aeneas/extra/ctw_espeak.py b/aeneas/extra/ctw_espeak.py new file mode 100644 index 00000000..7f8828f5 --- /dev/null +++ b/aeneas/extra/ctw_espeak.py @@ -0,0 +1,152 @@ +#!/usr/bin/env python +# coding=utf-8 + +""" +A wrapper for a custom TTS engine. 
+""" + +from __future__ import absolute_import +from __future__ import print_function + +from aeneas.language import Language +from aeneas.ttswrapper import TTSWrapper + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL v3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + +class CustomTTSWrapper(TTSWrapper): + """ + A wrapper for the ``espeak`` TTS engine, + to illustrate the use of custom TTS wrapper + loading at runtime. + + It will perform one or more calls like :: + + $ echo "text to be synthesized" | espeak -v en -w output_file.wav + + This wrapper supports calling the TTS engine + only via ``subprocess``. + + To use this TTS engine, specify :: + + "tts=custom|tts_path=/path/to/this/file.py" + + in the ``rconf`` object. + + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` + :param logger: the logger object + :type logger: :class:`~aeneas.logger.Logger` + """ + + TAG = u"CustomTTSWrapper" + + # + # NOTE create aliases for the language codes + # supported by this TTS: in this example, + # English, Italian, Russian and Ukrainian + # + ENG = Language.ENG + """ English """ + + ITA = Language.ITA + """ Italian """ + + RUS = Language.RUS + """ Russian """ + + UKR = Language.UKR + """ Ukrainian """ + + # + # NOTE LANGUAGE_TO_VOICE_CODE maps a language code + # to the corresponding voice code + # supported by this custom TTS wrapper; + # mock support for Ukrainian with Russian voice + # + LANGUAGE_TO_VOICE_CODE = { + ENG : "en", + ITA : "it", + RUS : "ru", + UKR : "ru", + } + DEFAULT_LANGUAGE = ENG + + # + # NOTE eSpeak always outputs to PCM16 mono WAVE (RIFF) + # + OUTPUT_MONO_WAVE = True + + def __init__(self, rconf=None, logger=None): + # + # NOTE 
custom TTS wrappers must be implemented + # in a class named CustomTTSWrapper + # otherwise the Synthesizer will not work + # + # NOTE this custom TTS wrapper implements + # only the subprocess call method + # hence we set the following init parameters + # + super(CustomTTSWrapper, self).__init__( + has_subprocess_call=True, + has_c_extension_call=False, + has_python_call=False, + rconf=rconf, + logger=logger + ) + # + # NOTE this example is minimal, as we implement only + # the subprocess call method + # hence, all we need to do is to specify + # how to map the command line arguments of the TTS engine + # + # NOTE if our TTS engine was callable via Python or a Python C extension, + # we would have needed to write a _synthesize_multiple_python() + # or a _synthesize_multiple_c_extension() function, + # with the same I/O interface of + # _synthesize_multiple_c_extension() in espeakwrapper.py + # + # NOTE on a command line, you will use eSpeak + # to synthesize some text to a WAVE file as follows: + # + # $ echo "text to be synthesized" | espeak -v en -w output_file.wav + # + # Observe that text is read from stdin, while the audio data + # is written to a file specified by a given output path, + # introduced by the "-w" switch. + # Also, there is a parameter to select the English voice ("en"), + # introduced by the "-v" switch. + # + self.set_subprocess_arguments([ + u"/usr/bin/espeak", # path of espeak executable; you can use just "espeak" if it is in your PATH + u"-v", # append "-v" + TTSWrapper.CLI_PARAMETER_VOICE_CODE_STRING, # it will be replaced by the actual voice code + u"-w", # append "-w" + TTSWrapper.CLI_PARAMETER_WAVE_PATH, # it will be replaced by the actual output file path + TTSWrapper.CLI_PARAMETER_TEXT_STDIN # text is read from stdin + ]) + # + # NOTE if your TTS engine only reads text from a file + # you can use the TTSWrapper.CLI_PARAMETER_TEXT_PATH placeholder. 
+ # + # NOTE if your TTS engine only writes audio data to stdout + # you can use the TTSWrapper.CLI_PARAMETER_WAVE_STDOUT placeholder. + # + # NOTE if your TTS engine needs a more complex parameter + # for selecting the voice, e.g. Festival needs '-eval "(language_italian)"', + # you can implement a _voice_code_to_subprocess() function + # and use the TTSWrapper.CLI_PARAMETER_VOICE_CODE_FUNCTION placeholder + # instead of the TTSWrapper.CLI_PARAMETER_VOICE_CODE_STRING placeholder. + # See the aeneas/festivalwrapper.py file for an example. + # + + + diff --git a/aeneas/extra/ctw_speect.py b/aeneas/extra/ctw_speect.py new file mode 100644 index 00000000..b7ade692 --- /dev/null +++ b/aeneas/extra/ctw_speect.py @@ -0,0 +1,222 @@ +#!/usr/bin/env python +# coding=utf-8 + +""" +A wrapper for the ``speect`` TTS engine. +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import numpy +import speect +import speect.audio +import speect.audio_riff + +from aeneas.audiofile import AudioFile +from aeneas.language import Language +from aeneas.timevalue import TimeValue +from aeneas.ttswrapper import TTSWrapper +import aeneas.globalfunctions as gf + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL v3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + +class CustomTTSWrapper(TTSWrapper): + """ + A wrapper for the ``speect`` TTS engine. + + This wrapper supports calling the TTS engine + only via Python. + + To use this TTS engine, specify :: + + "tts=custom|tts_path=/path/to/this/file.py" + + in the ``RuntimeConfiguration`` object. 
+ + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` + :param logger: the logger object + :type logger: :class:`~aeneas.logger.Logger` + """ + + TAG = u"CustomTTSWrapper" + + # + # NOTE in this example we load an English voice, + # hence we support only English language, + # and we map it to a dummy voice code + # + ENG = Language.ENG + """ English """ + LANGUAGE_TO_VOICE_CODE = { + ENG : ENG + } + DEFAULT_LANGUAGE = ENG + + # + # NOTE in this example we load a voice producing + # audio data in PCM16 mono WAVE (RIFF) format + # + OUTPUT_MONO_WAVE = True + + def __init__(self, rconf=None, logger=None): + super(CustomTTSWrapper, self).__init__( + has_subprocess_call=False, + has_c_extension_call=False, + has_python_call=True, + rconf=rconf, + logger=logger) + + def _synthesize_multiple_python(self, text_file, output_file_path, quit_after=None, backwards=False): + """ + Synthesize multiple text fragments, via Python call. + + Return a tuple (anchors, total_time, num_chars). 
+ + :rtype: (bool, (list, TimeValue, int)) + """ + # + # TODO in the Speect Python API I was not able to find a way + # to generate the wave incrementally + # so I essentially copy the subprocess call mechanism: + # generating wave data for each fragment, + # and concatenating them together + # + self.log(u"Calling TTS engine via Python...") + try: + # get sample rate and encoding + du_nu, sample_rate, encoding, da_nu = self._synthesize_single_helper( + text=u"Dummy text to get sample_rate", + voice_code=self.DEFAULT_LANGUAGE + ) + + # open output file + output_file = AudioFile(rconf=self.rconf, logger=self.logger) + output_file.audio_format = encoding + output_file.audio_channels = 1 + output_file.audio_sample_rate = sample_rate + + # create output + anchors = [] + current_time = TimeValue("0.000") + num = 0 + num_chars = 0 + fragments = text_file.fragments + if backwards: + fragments = fragments[::-1] + for fragment in fragments: + # language to voice code + # + # NOTE since voice_code is actually ignored + # in _synthesize_single_helper(), + # the value of voice_code is irrelevant + # + # however, in general you need to apply + # the _language_to_voice_code() function that maps + # the text language to a voice code + # + # here we apply the _language_to_voice_code() defined in super() + # that sets voice_code = fragment.language + # + voice_code = self._language_to_voice_code(fragment.language) + # synthesize and get the duration of the output file + self.log([u"Synthesizing fragment %d", num]) + duration, sr_nu, enc_nu, data = self._synthesize_single_helper( + text=(fragment.filtered_text + u" "), + voice_code=voice_code + ) + # store for later output + anchors.append([current_time, fragment.identifier, fragment.text]) + # increase the character counter + num_chars += fragment.characters + # append new data + self.log([u"Fragment %d starts at: %.3f", num, current_time]) + if duration > 0: + self.log([u"Fragment %d duration: %.3f", num, duration]) + current_time 
+= duration + # if backwards, we append the data reversed + output_file.add_samples(data, reverse=backwards) + else: + self.log([u"Fragment %d has zero duration", num]) + # increment fragment counter + num += 1 + # check if we must stop synthesizing because we have enough audio + if (quit_after is not None) and (current_time > quit_after): + self.log([u"Quitting after reaching duration %.3f", current_time]) + break + + # if backwards, we need to reverse the audio samples again + if backwards: + output_file.reverse() + + # write output file + self.log([u"Writing audio file '%s'", output_file_path]) + output_file.write(file_path=output_file_path) + except Exception as exc: + self.log_exc(u"An unexpected error occurred while calling TTS engine via Python", exc, False, None) + return (False, None) + + # return output + # NOTE anchors do not make sense if backwards + self.log([u"Returning %d time anchors", len(anchors)]) + self.log([u"Current time %.3f", current_time]) + self.log([u"Synthesized %d characters", num_chars]) + self.log(u"Calling TTS engine via Python... done") + return (True, (anchors, current_time, num_chars)) + + def _synthesize_single_python(self, text, voice_code, output_file_path): + """ + Synthesize a single text fragment via Python call. + + :rtype: tuple (result, (duration, sample_rate, encoding, data)) + """ + self.log(u"Synthesizing using Python call...") + data = self._synthesize_single_helper(text, voice_code, output_file_path) + return (True, data) + + def _synthesize_single_helper(self, text, voice_code, output_file_path=None): + """ + This is a helper function to synthesize a single text fragment via Python call. + + The caller can choose whether the output file should be written to disk or not.
+ + :rtype: tuple (duration, sample_rate, encoding, data) + """ + # + # NOTE in this example, we assume that the Speect voice data files + # are located in the same directory as this .py source file + # and that the voice JSON file is called "voice.json" + # + # NOTE the voice_code value is ignored in this example, + # but in general one might select a voice file to load, + # depending on voice_code + # + voice_json_path = gf.safe_str(gf.absolute_path("voice.json", __file__)) + voice = speect.SVoice(voice_json_path) + utt = voice.synth(text) + audio = utt.features["audio"] + if output_file_path is not None: + audio.save_riff(gf.safe_str(output_file_path)) + + # get length and data using speect Python API + waveform = audio.get_audio_waveform() + audio_sample_rate = int(waveform["samplerate"]) + audio_length = TimeValue(audio.num_samples() / audio_sample_rate) + audio_format = "pcm16" + audio_samples = numpy.fromstring(waveform["samples"], dtype=numpy.int16).astype("float64") / 32768 + + # return data + return (audio_length, audio_sample_rate, audio_format, audio_samples) + + + diff --git a/aeneas/festivalwrapper.py b/aeneas/festivalwrapper.py new file mode 100644 index 00000000..67dadd34 --- /dev/null +++ b/aeneas/festivalwrapper.py @@ -0,0 +1,135 @@ +#!/usr/bin/env python +# coding=utf-8 + +""" +This module contains the following classes: + +* :class:`~aeneas.festivalwrapper.FESTIVALWrapper`, a wrapper for the ``Festival`` TTS engine.
+""" + +from __future__ import absolute_import +from __future__ import print_function + +from aeneas.language import Language +from aeneas.runtimeconfiguration import RuntimeConfiguration +from aeneas.ttswrapper import TTSWrapper + +__author__ = "Alberto Pettarin" +__copyright__ = """ + Copyright 2012-2013, Alberto Pettarin (www.albertopettarin.it) + Copyright 2013-2015, ReadBeyond Srl (www.readbeyond.it) + Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) + """ +__license__ = "GNU AGPL v3" +__version__ = "1.5.0" +__email__ = "aeneas@readbeyond.it" +__status__ = "Production" + +class FESTIVALWrapper(TTSWrapper): + """ + A wrapper for the ``Festival`` TTS engine. + + This wrapper supports calling the TTS engine + via ``subprocess`` only. + + In abstract terms, it performs one or more calls like :: + + $ echo text | text2wave -eval (language_italian) -o output_file.wav + + To use this TTS engine, specify :: + + "tts=festival|tts_path=/path/to/text2wave" + + in the ``RuntimeConfiguration`` object. + + See :class:`~aeneas.ttswrapper.TTSWrapper` for the available functions. + Below are listed the languages supported by this wrapper.
+ + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` + :param logger: the logger object + :type logger: :class:`~aeneas.logger.Logger` + """ + + CES = Language.CES + """ Czech """ + + CYM = Language.CYM + """ Welsh """ + + ENG = Language.ENG + """ English """ + + FIN = Language.FIN + """ Finnish """ + + ITA = Language.ITA + """ Italian """ + + RUS = Language.RUS + """ Russian """ + + SPA = Language.SPA + """ Spanish """ + + ENG_GBR = "eng-GBR" + """ English (GB) """ + + ENG_SCT = "eng-SCT" + """ English (Scotland) """ + + ENG_USA = "eng-USA" + """ English (USA) """ + + LANGUAGE_TO_VOICE_CODE = { + CES : CES, + CYM : CYM, + ENG : ENG, + ENG_GBR : ENG_GBR, + ENG_SCT : ENG_SCT, + ENG_USA : ENG_USA, + SPA : SPA, + FIN : FIN, + ITA : ITA, + RUS : RUS + } + DEFAULT_LANGUAGE = ENG + + VOICE_CODE_TO_SUBPROCESS = { + CES : u"(language_czech)", + CYM : u"(language_welsh)", + ENG : u"(language_english)", + ENG_GBR : u"(language_british_english)", + ENG_SCT : u"(language_scots_gaelic)", + ENG_USA : u"(language_american_english)", + SPA : u"(language_castillian_spanish)", + FIN : u"(language_finnish)", + ITA : u"(language_italian)", + RUS : u"(language_russian)", + } + + OUTPUT_MONO_WAVE = True + + TAG = u"FESTIVALWrapper" + + def __init__(self, rconf=None, logger=None): + super(FESTIVALWrapper, self).__init__( + has_subprocess_call=True, + has_c_extension_call=False, + has_python_call=False, + rconf=rconf, + logger=logger + ) + self.set_subprocess_arguments([ + self.rconf[RuntimeConfiguration.TTS_PATH], + TTSWrapper.CLI_PARAMETER_VOICE_CODE_FUNCTION, + u"-o", + TTSWrapper.CLI_PARAMETER_WAVE_PATH, + TTSWrapper.CLI_PARAMETER_TEXT_STDIN + ]) + + def _voice_code_to_subprocess(self, voice_code): + return [u"-eval", self.VOICE_CODE_TO_SUBPROCESS[voice_code]] + + + diff --git a/aeneas/ffmpegwrapper.py b/aeneas/ffmpegwrapper.py index 4da85420..108c7bae 100644 --- a/aeneas/ffmpegwrapper.py +++ b/aeneas/ffmpegwrapper.py @@ 
-2,14 +2,17 @@ # coding=utf-8 """ -Wrapper around ``ffmpeg`` to convert audio files. +This module contains the following classes: + +* :class:`~aeneas.ffmpegwrapper.FFMPEGWrapper`, a wrapper around ``ffmpeg`` to convert audio files; +* :class:`~aeneas.ffmpegwrapper.FFMPEGPathError`, representing a failure to locate the ``ffmpeg`` executable. """ from __future__ import absolute_import from __future__ import print_function import subprocess -from aeneas.logger import Logger +from aeneas.logger import Loggable from aeneas.runtimeconfiguration import RuntimeConfiguration import aeneas.globalfunctions as gf @@ -20,31 +23,32 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" class FFMPEGPathError(Exception): """ Error raised when the path to ``ffmpeg`` is not a valid executable. + + .. versionadded:: 1.4.1 """ pass -class FFMPEGWrapper(object): +class FFMPEGWrapper(Loggable): """ - Wrapper around ``ffmpeg`` to convert audio files. + A wrapper around ``ffmpeg`` to convert audio files. - It will perform a call like:: + In abstract terms, it will perform a call like:: $ ffmpeg -i /path/to/input.mp3 [parameters] /path/to/output.wav - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. 
- :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` + :type logger: :class:`~aeneas.logger.Logger` """ FFMPEG_SAMPLE_8000 = ["-ar", "8000"] @@ -134,14 +138,6 @@ class FFMPEGWrapper(object): TAG = u"FFMPEGWrapper" - def __init__(self, rconf=None, logger=None): - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() - - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) - def convert( self, input_file_path, @@ -166,43 +162,36 @@ def convert( you can skip a portion at the beginning and at the end of the original input file. - :param input_file_path: the path of the audio file to convert - :type input_file_path: string - :param output_file_path: the path of the converted audio file - :type output_file_path: string - :param head_length: skip these many seconds - from the beginning of the audio file - :type head_length: float - :param process_length: process these many seconds of the audio file - :type process_length: float - - :raises FFMPEGPathError: if the path to the ``ffmpeg`` executable cannot be called - :raise OSError: if ``input_file_path`` does not exist - or ``output_file_path`` cannot be written + :param string input_file_path: the path of the audio file to convert + :param string output_file_path: the path of the converted audio file + :param float head_length: skip these many seconds + from the beginning of the audio file + :param float process_length: process these many seconds of the audio file + :raises: :class:`~aeneas.ffmpegwrapper.FFMPEGPathError`: if the path to the ``ffmpeg`` executable cannot be called + :raises: OSError: if ``input_file_path`` does not exist + or ``output_file_path`` cannot be written """ # test if we can read the input file if not 
gf.file_can_be_read(input_file_path): - self._log([u"Input file '%s' cannot be read", input_file_path], Logger.CRITICAL) - raise OSError("Input file cannot be read") + self.log_exc(u"Input file '%s' cannot be read" % (input_file_path), None, True, OSError) # test if we can write the output file if not gf.file_can_be_written(output_file_path): - self._log([u"Output file '%s' cannot be written", output_file_path], Logger.CRITICAL) - raise OSError("Output file cannot be written") + self.log_exc(u"Output file '%s' cannot be written" % (output_file_path), None, True, OSError) # call ffmpeg - arguments = [self.rconf["ffmpeg_path"]] + arguments = [self.rconf[RuntimeConfiguration.FFMPEG_PATH]] arguments.extend(["-i", input_file_path]) if head_length is not None: arguments.extend(["-ss", head_length]) if process_length is not None: arguments.extend(["-t", process_length]) - if self.rconf["ffmpeg_sample_rate"] in self.FFMPEG_PARAMETERS_MAP: - arguments.extend(self.FFMPEG_PARAMETERS_MAP[self.rconf["ffmpeg_sample_rate"]]) + if self.rconf[RuntimeConfiguration.FFMPEG_SAMPLE_RATE] in self.FFMPEG_PARAMETERS_MAP: + arguments.extend(self.FFMPEG_PARAMETERS_MAP[self.rconf[RuntimeConfiguration.FFMPEG_SAMPLE_RATE]]) else: arguments.extend(self.FFMPEG_PARAMETERS_DEFAULT) arguments.append(output_file_path) - self._log([u"Calling with arguments '%s'", arguments]) + self.log([u"Calling with arguments '%s'", arguments]) try: proc = subprocess.Popen( arguments, @@ -214,18 +203,16 @@ def convert( proc.stdout.close() proc.stdin.close() proc.stderr.close() - except OSError: - self._log([u"Unable to call the '%s' ffmpeg executable", self.rconf["ffmpeg_path"]], Logger.CRITICAL) - raise FFMPEGPathError("Unable to call the specified ffmpeg executable") - self._log(u"Call completed") + except OSError as exc: + self.log_exc(u"Unable to call the '%s' ffmpeg executable" % (self.rconf[RuntimeConfiguration.FFMPEG_PATH]), exc, True, FFMPEGPathError) + self.log(u"Call completed") # check if the output file 
exists if not gf.file_exists(output_file_path): - self._log([u"Output file '%s' was not written", output_file_path], Logger.CRITICAL) - raise OSError("Output file was not written") + self.log_exc(u"Output file '%s' was not written" % (output_file_path), None, True, OSError) # returning the output file path - self._log([u"Returning output file path '%s'", output_file_path]) + self.log([u"Returning output file path '%s'", output_file_path]) return output_file_path diff --git a/aeneas/ffprobewrapper.py b/aeneas/ffprobewrapper.py index 89f9ab72..e3db20b4 100644 --- a/aeneas/ffprobewrapper.py +++ b/aeneas/ffprobewrapper.py @@ -2,7 +2,13 @@ # coding=utf-8 """ -Wrapper around ``ffprobe`` to read the properties of an audio file. +This module contains the following classes: + +* :class:`~aeneas.ffprobewrapper.FFPROBEWrapper`, a wrapper around ``ffprobe`` to read the properties of an audio file; +* :class:`~aeneas.ffprobewrapper.FFPROBEParsingError`, +* :class:`~aeneas.ffprobewrapper.FFPROBEPathError`, and +* :class:`~aeneas.ffprobewrapper.FFPROBEUnsupportedFormatError`, + representing errors while reading the properties of audio files. """ from __future__ import absolute_import @@ -10,8 +16,9 @@ import re import subprocess -from aeneas.logger import Logger +from aeneas.logger import Loggable from aeneas.runtimeconfiguration import RuntimeConfiguration +from aeneas.timevalue import TimeValue import aeneas.globalfunctions as gf __author__ = "Alberto Pettarin" @@ -21,7 +28,7 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" @@ -36,6 +43,8 @@ class FFPROBEParsingError(Exception): class FFPROBEPathError(Exception): """ Error raised when the path to ``ffprobe`` is not a valid executable. + + .. 
versionadded:: 1.4.1 """ pass @@ -48,7 +57,7 @@ class FFPROBEUnsupportedFormatError(Exception): -class FFPROBEWrapper(object): +class FFPROBEWrapper(Loggable): """ Wrapper around ``ffprobe`` to read the properties of an audio file. @@ -99,11 +108,10 @@ class FFPROBEWrapper(object): DISPOSITION:attached_pic=0 [/STREAM] - :param rconf: a runtime configuration. Default: ``None``, meaning that - default settings will be used. - :type rconf: :class:`aeneas.runtimeconfiguration.RuntimeConfiguration` + :param rconf: a runtime configuration + :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration` :param logger: the logger object - :type logger: :class:`aeneas.logger.Logger` + :type logger: :class:`~aeneas.logger.Logger` """ FFPROBE_PARAMETERS = [ @@ -136,14 +144,6 @@ class FFPROBEWrapper(object): TAG = u"FFPROBEWrapper" - def __init__(self, rconf=None, logger=None): - self.logger = logger or Logger() - self.rconf = rconf or RuntimeConfiguration() - - def _log(self, message, severity=Logger.DEBUG): - """ Log """ - self.logger.log(message, severity, self.TAG) - def read_properties(self, audio_file_path): """ Read the properties of an audio file @@ -190,29 +190,26 @@ def read_properties(self, audio_file_path): d["DISPOSITION:clean_effects"]=0 d["DISPOSITION:attached_pic"]=0 - :param audio_file_path: the path of the audio file to analyze - :type audio_file_path: string (path) + :param string audio_file_path: the path of the audio file to analyze :rtype: dict - - :raises TypeError: if ``audio_file_path`` is None - :raises OSError: if the file at ``audio_file_path`` cannot be read - :raises FFPROBEParsingError: if the call to ``ffprobe`` does not produce any output - :raises FFPROBEPathError: if the path to the ``ffprobe`` executable cannot be called - :raises FFPROBEUnsupportedFormatError: if the file has a format not supported by ``ffprobe`` + :raises: TypeError: if ``audio_file_path`` is None + :raises: OSError: if the file at ``audio_file_path`` cannot be 
read + :raises: FFPROBEParsingError: if the call to ``ffprobe`` does not produce any output + :raises: FFPROBEPathError: if the path to the ``ffprobe`` executable cannot be called + :raises: FFPROBEUnsupportedFormatError: if the file has a format not supported by ``ffprobe`` """ # test if we can read the file at audio_file_path if audio_file_path is None: - raise TypeError("The audio file path is None") + self.log_exc(u"The audio file path is None", None, True, TypeError) if not gf.file_can_be_read(audio_file_path): - self._log([u"Input file '%s' cannot be read", audio_file_path], Logger.CRITICAL) - raise OSError("Input file cannot be read") + self.log_exc(u"Input file '%s' cannot be read" % (audio_file_path), None, True, OSError) # call ffprobe - arguments = [self.rconf["ffprobe_path"]] + arguments = [self.rconf[RuntimeConfiguration.FFPROBE_PATH]] arguments.extend(self.FFPROBE_PARAMETERS) arguments.append(audio_file_path) - self._log([u"Calling with arguments '%s'", arguments]) + self.log([u"Calling with arguments '%s'", arguments]) try: proc = subprocess.Popen( arguments, @@ -224,23 +221,20 @@ def read_properties(self, audio_file_path): proc.stdout.close() proc.stdin.close() proc.stderr.close() - except OSError: - self._log([u"Unable to call the '%s' ffprobe executable", self.rconf["ffprobe_path"]], Logger.CRITICAL) - raise FFPROBEPathError("Unable to call the specified ffprobe executable") - self._log(u"Call completed") + except OSError as exc: + self.log_exc(u"Unable to call the '%s' ffprobe executable" % (self.rconf[RuntimeConfiguration.FFPROBE_PATH]), exc, True, FFPROBEPathError) + self.log(u"Call completed") - # if no output, raise error + # check there is some output if (stdoutdata is None) or (len(stderrdata) == 0): - self._log(u"No output produced by ffprobe", Logger.CRITICAL) - raise FFPROBEParsingError("No output produced by ffprobe") + self.log_exc(u"ffprobe produced no output", None, True, FFPROBEParsingError) # decode stdoutdata and stderrdata to 
Unicode string try: stdoutdata = gf.safe_unicode(stdoutdata) stderrdata = gf.safe_unicode(stderrdata) - except UnicodeDecodeError: - self._log(u"Error decoding stdout/stderr.") - raise FFPROBEParsingError("Unable to decode ffprobe out/err") + except UnicodeDecodeError as exc: + self.log_exc(u"Unable to decode ffprobe out/err", exc, True, FFPROBEParsingError) # dictionary for the results results = { @@ -255,39 +249,34 @@ def read_properties(self, audio_file_path): # TODO deal with multiple audio streams for line in stdoutdata.splitlines(): if line == self.STDOUT_END_STREAM: - self._log(u"Reached end of the stream") + self.log(u"Reached end of the stream") break elif len(line.split("=")) == 2: key, value = line.split("=") results[key] = value - self._log([u"Found property '%s'='%s'", key, value]) - - # convert duration to float - if self.STDOUT_DURATION in results: - self._log([u"Found duration: '%s'", results[self.STDOUT_DURATION]]) - results[self.STDOUT_DURATION] = gf.safe_float( - results[self.STDOUT_DURATION], - None - ) - else: - self._log(u"No duration found in stdout", Logger.WARNING) + self.log([u"Found property '%s'='%s'", key, value]) - # if audio_length is still None, try scanning ffprobe stderr output - if results[self.STDOUT_DURATION] is None: + try: + self.log([u"Duration found in stdout: '%s'", results[self.STDOUT_DURATION]]) + results[self.STDOUT_DURATION] = TimeValue(results[self.STDOUT_DURATION]) + self.log(u"Valid duration") + except: + self.log_warn(u"Invalid duration") + results[self.STDOUT_DURATION] = None + # try scanning ffprobe stderr output for line in stderrdata.splitlines(): match = self.STDERR_DURATION_REGEX.search(line) if match is not None: - self._log([u"Found matching line '%s'", line]) + self.log([u"Found matching line '%s'", line]) results[self.STDOUT_DURATION] = gf.time_from_hhmmssmmm(line) - self._log([u"Extracted duration '%f'", results[self.STDOUT_DURATION]]) + self.log([u"Extracted duration '%.3f'", 
results[self.STDOUT_DURATION]]) break if results[self.STDOUT_DURATION] is None: - self._log(u"No duration found in stdout or stderr (unsupported audio file format?)", Logger.CRITICAL) - raise FFPROBEUnsupportedFormatError("Unsupported audio file format") + self.log_exc(u"No duration found in stdout or stderr. Unsupported audio file format?", None, True, FFPROBEUnsupportedFormatError) # return dictionary - self._log(u"Returning dict") + self.log(u"Returning dict") return results diff --git a/aeneas/globalconstants.py b/aeneas/globalconstants.py index 9b15599a..2d5aa210 100644 --- a/aeneas/globalconstants.py +++ b/aeneas/globalconstants.py @@ -13,36 +13,68 @@ Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it) """ __license__ = "GNU AGPL v3" -__version__ = "1.4.1" +__version__ = "1.5.0" __email__ = "aeneas@readbeyond.it" __status__ = "Production" - ### CONSTANTS ### +CONFIG_RESERVED_CHARACTERS = ["~"] +""" List of reserved characters which are forbidden in configuration files """ + +CONFIG_STRING_ASSIGNMENT_SYMBOL = "=" +""" Assignment symbol in config string ``key=value`` pairs """ + +CONFIG_STRING_SEPARATOR_SYMBOL = "|" +""" Separator of ``key=value`` pairs in config strings """ + +PARSED_TEXT_SEPARATOR = "|" +""" Separator for input text files in parsed format """ + CONFIG_TXT_FILE_NAME = "config.txt" """ File name for the TXT configuration file in containers """ CONFIG_XML_FILE_NAME = "config.xml" """ File name for the XML configuration file in containers """ +CONFIG_XML_TASK_TAG = "task" +""" ``<task>`` tag in the XML configuration file """ + CONFIG_XML_TASKS_TAG = "tasks" """ ``<tasks>`` tag in the XML configuration file """ -CONFIG_XML_TASK_TAG = "task" -""" ``<task>`` tag in the XML configuration file """ +MIMETYPE_MAP = { + "aac": "audio/aac", + "aiff": "audio/x-aiff", + "flac": "audio/flac", + "mp3": "audio/mpeg", + "mp4": "audio/mp4", + "oga": "audio/x-vorbis+ogg", + "ogg": "audio/x-vorbis+ogg", + "wav": "audio/x-wav", + "webm": "video/webm" +} +""" Map from audio
file extension to mimetype """ + +TMP_PATH_DEFAULT_NONPOSIX = None +""" +Default temporary directory path for non-POSIX OSes. +Set to ``None`` so that ``tempfile`` will select +the most appropriate temporary directory root path. -CONFIG_RESERVED_CHARACTERS = ["~"] -""" List of reserved characters which are forbidden in configuration files """ +.. versionadded:: 1.4.1 +""" -CONFIG_STRING_SEPARATOR_SYMBOL = "|" -""" Separator of ``key=value`` pairs in config strings """ +TMP_PATH_DEFAULT_POSIX = "/tmp/" +""" +Default temporary directory path for POSIX OSes. -CONFIG_STRING_ASSIGNMENT_SYMBOL = "=" -""" Assignment symbol in config string ``key=value`` pairs """ +.. versionadded:: 1.4.1 +""" -PARSED_TEXT_SEPARATOR = "|" -""" Separator for input text files in parsed format """ + + +### PARAMETER NAMES ### # reserved parameter names (RPN) RPN_JOB_IDENTIFIER = "job_identifier" @@ -80,11 +112,13 @@ Usage: config string, TXT config file, XML config file -Values: listed in :class:`aeneas.language.Language` +Values: listed in :class:`~aeneas.language.Language` Example:: - job_language=en + job_language=eng-GBR + job_language=eng-USA + job_language=ita-ITA """ @@ -110,7 +144,7 @@ Usage: config string, TXT config file -Values: string (path) +Values: string Example:: @@ -127,7 +161,7 @@ Usage: config string, TXT config file -Values: string (path) +Values: string Example:: @@ -142,7 +176,7 @@ Usage: config string, TXT config file -Values: listed in :class:`aeneas.hierarchytype.HierarchyType` +Values: listed in :class:`~aeneas.hierarchytype.HierarchyType` Example:: @@ -170,18 +204,7 @@ PPN_JOB_IS_TEXT_FILE_FORMAT = "is_text_type" """ -The text file format of text files in input containers.
- -Usage: config string, TXT config file, XML config file - -Values: listed in :class:`aeneas.textfile.TextFileFormat` - -Example:: - - is_text_type=plain - is_text_type=parsed - is_text_type=unparsed - +See PPN_TASK_IS_TEXT_FILE_FORMAT """ PPN_JOB_IS_TEXT_FILE_NAME_REGEX = "is_text_file_name_regex" @@ -207,7 +230,7 @@ Usage: config string, TXT config file -Values: string (path) +Values: string Example:: @@ -217,56 +240,34 @@ """ -PPN_JOB_IS_TEXT_UNPARSED_CLASS_REGEX = "is_text_unparsed_class_regex" +PPN_JOB_IS_TEXT_MUNPARSED_L1_ID_REGEX = "is_text_munparsed_l1_id_regex" +""" +See PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX """ -The regex for matching the ``class`` attribute -of XML elements containing text fragments to be extracted -from ``unparsed`` text files. - -Usage: config string, TXT config file, XML config file - -Values: regex -Example:: +PPN_JOB_IS_TEXT_MUNPARSED_L2_ID_REGEX = "is_text_munparsed_l2_id_regex" +""" +See PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX +""" - is_text_unparsed_class_regex=ra - is_text_unparsed_class_regex=readaloud - is_text_unparsed_class_regex=ra[0-9]+ +PPN_JOB_IS_TEXT_MUNPARSED_L3_ID_REGEX = "is_text_munparsed_l3_id_regex" +""" +See PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX +""" +PPN_JOB_IS_TEXT_UNPARSED_CLASS_REGEX = "is_text_unparsed_class_regex" +""" +See PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX """ PPN_JOB_IS_TEXT_UNPARSED_ID_REGEX = "is_text_unparsed_id_regex" """ -The regex for matching the ``id`` attribute -of XML elements containing text fragments to be extracted -from ``unparsed`` text files. - -Usage: config string, TXT config file, XML config file - -Values: regex - -Example:: - - is_text_unparsed_id_regex=f[0-9]+ - is_text_unparsed_id_regex=ra.* - +See PPN_TASK_IS_TEXT_UNPARSED_ID_REGEX """ PPN_JOB_IS_TEXT_UNPARSED_ID_SORT = "is_text_unparsed_id_sort" """ -The sorting algorithm to be used to sort the text fragments -extracted from ``unparsed`` text files, based on their ``id`` attributes. 
- -Usage: config string, TXT config file, XML config file - -Values: listed in :class:`aeneas.idsortingalgorithm.IDSortingAlgorithm` - -Example:: - - is_text_unparsed_id_sort=lexicographic - is_text_unparsed_id_sort=numeric - is_text_unparsed_id_sort=unsorted - +See PPN_TASK_IS_TEXT_UNPARSED_ID_SORT """ PPN_JOB_OS_CONTAINER_FORMAT = "os_job_file_container" @@ -275,7 +276,7 @@ Usage: config string, TXT config file, XML config file -Values: listed in :class:`aeneas.container.ContainerFormat` +Values: listed in :class:`~aeneas.container.ContainerFormat` Example:: @@ -304,7 +305,7 @@ Usage: config string, TXT config file, XML config file -Values: string (path) +Values: string Example:: @@ -318,7 +319,7 @@ Usage: config string, TXT config file, XML config file -Values: listed in :class:`aeneas.hierarchytype.HierarchyType` +Values: listed in :class:`~aeneas.hierarchytype.HierarchyType` Example:: @@ -331,12 +332,13 @@ """ Key for specifying the syncmap language -Values: listed in :class:`aeneas.language.Language` +Values: listed in :class:`~aeneas.language.Language` Example:: - language=en - language=it + language=eng-GBR + language=eng-USA + language=ita-ITA .. 
versionadded:: 1.2.0 """ @@ -375,11 +377,13 @@ Usage: config string, XML config file -Values: listed in :class:`aeneas.language.Language` +Values: listed in :class:`~aeneas.language.Language` Example:: - task_language=en + task_language=eng-GBR + task_language=eng-USA + task_language=ita-ITA """ @@ -390,7 +394,7 @@ Usage: config string, TXT config file, XML config file -Values: listed in :class:`aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm` +Values: listed in :class:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm` Example:: @@ -656,7 +660,7 @@ Usage: config string, TXT config file, XML config file -Values: listed in :class:`aeneas.textfile.TextFileFormat` +Values: listed in :class:`~aeneas.textfile.TextFileFormat` Example:: @@ -689,7 +693,7 @@ Usage: config string, TXT config file, XML config file -Values: string (path) +Values: string Example:: @@ -697,6 +701,85 @@ """ +PPN_TASK_IS_TEXT_MPLAIN_WORD_SEPARATOR = "is_text_mplain_word_separator" +""" +The word separator to be used when splitting words +in ``mplain`` input text files. + +You can use the following special strings: + +* ``equal`` for a ``=`` character (ASCII ``0x3D``), +* ``pipe`` for a ``|`` character (ASCII ``0x7C``), +* ``space`` for a space character (ASCII ``0x20``), +* ``tab`` for a tab character (ASCII ``0x09``). + +Any other string will be used as the word separator. +If not specified, ``space`` will be used. + +Usage: config string, TXT config file, XML config file + +Values: string + +Example:: + + is_text_mplain_word_separator=space + is_text_mplain_word_separator=tab + is_text_mplain_word_separator=, + +""" + +PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX = "is_text_munparsed_l1_id_regex" +""" +The regex to match ``id`` attributes for level 1 (paragraph) text fragments. +It applies to ``munparsed`` text files only. + +Usage: config string, TXT config file, XML config file + +Values: regex + +Example:: + + is_text_munparsed_l1_id_regex=p[0-9]+ + +..
versionadded:: 1.5.0 +""" + +PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX = "is_text_munparsed_l2_id_regex" +""" +The regex to match ``id`` attributes for level 2 (sentence) text fragments. +It applies to ``munparsed`` text files only. + +Usage: config string, TXT config file, XML config file + +Values: regex + +Example:: + + is_text_munparsed_l2_id_regex=s[0-9]+ + is_text_munparsed_l2_id_regex=p[0-9]+s[0-9]+ + +.. versionadded:: 1.5.0 + +""" + +PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX = "is_text_munparsed_l3_id_regex" +""" +The regex to match ``id`` attributes for level 3 (word) text fragments. +It applies to ``munparsed`` text files only. + +Usage: config string, TXT config file, XML config file + +Values: regex + +Example:: + + is_text_munparsed_l3_id_regex=w[0-9]+ + is_text_munparsed_l3_id_regex=p[0-9]+s[0-9]+w[0-9]+ + +.. versionadded:: 1.5.0 + +""" + PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX = "is_text_unparsed_class_regex" """ The regex to match ``class`` attributes for text fragments. @@ -733,11 +816,11 @@ PPN_TASK_IS_TEXT_UNPARSED_ID_SORT = "is_text_unparsed_id_sort" """ The algorithm to sort text fragments by their ``id`` attributes. -It applies to unparsed text files only. +It applies to ``unparsed`` text files only. Usage: config string, TXT config file, XML config file -Values: listed in :class:`aeneas.idsortingalgorithm.IDSortingAlgorithm` +Values: listed in :class:`~aeneas.idsortingalgorithm.IDSortingAlgorithm` Example:: @@ -753,7 +836,7 @@ Usage: config string, TXT config file, XML config file -Values: listed in :class:`aeneas.syncmap.SyncMapFormat` +Values: listed in :class:`~aeneas.syncmap.SyncMapFormat` Example:: @@ -788,6 +871,26 @@ .. versionadded:: 1.3.1 """ +PPN_TASK_OS_FILE_LEVELS = "os_task_file_levels" +""" +If the input text file is multilevel, +only outputs the specified levels. + +This parameter has no effect for single-level +input text files or output sync map formats. 
+ +Usage: config string, TXT config file, XML config file + +Values: string + +Example:: + + os_task_file_levels=123 + os_task_file_levels=3 + +.. versionadded:: 1.5.0 +""" + PPN_TASK_OS_FILE_NAME = "os_task_file_name" """ The name of the sync map file output for the task. @@ -806,6 +909,21 @@ """ +PPN_TASK_OS_FILE_NO_ZERO = "os_task_file_no_zero" +""" +If specified, do not allow fragments with zero duration. + +Usage: config string, TXT config file, XML config file + +Values: string + +Example:: + + os_task_file_no_zero=True + +.. versionadded:: 1.5.0 +""" + PPN_TASK_OS_FILE_SMIL_AUDIO_REF = "os_task_file_smil_audio_ref" """ The value of the ``src`` attribute for the ``