7 Recommended Open-Source TTS (Text-to-Speech) Systems

阿新 • Published: 2019-01-16

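Foreword: In television products, TTS helps blind and visually impaired people who cannot use the TV interface through standard visual means. Europe and the United States have already begun drawing up implementation standards for it, along with regulations for enforcing them.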

Ref:

http://www.iteye.com/news/23832

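TTS (Text To Speech) is one application of speech synthesis: it converts files stored on a computer, such as help files or web pages, into natural-sounding speech output. TTS can help people with visual impairments read information on a computer, or can simply be used to make text documents more accessible. TTS is often used together with speech recognition programs.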

This article introduces 7 open-source TTS systems that you can study or use in your own projects.

MARY is a multilingual text-to-speech platform written in Java. It supports German, British and American English, Telugu, Turkish, and Russian.

The MARY Text-to-Speech System (MaryTTS)
MaryTTS is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.

As of version 5.2, MaryTTS supports German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, and Turkish; more languages are in preparation. MaryTTS comes with toolkits for quickly adding support for new languages and for building unit selection and HMM-based synthesis voices.
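As a rough illustration of how a client might request audio from MaryTTS, the sketch below builds a request URL for a locally running MaryTTS server. The quoted description does not cover the HTTP interface, so the port (59125), the `/process` path, and the parameter names are assumptions based on common MaryTTS 5.x setups and should be checked against your installation. The function names `mary_process_url` and `synthesize` are hypothetical helpers, not part of MaryTTS.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def mary_process_url(text, locale="en_US", host="localhost", port=59125):
    """Build a request URL for a MaryTTS server (endpoint and keys are assumed)."""
    query = urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    })
    return f"http://{host}:{port}/process?{query}"


def synthesize(text, wav_path, **kwargs):
    """Fetch audio from a running server and save it; needs a live MaryTTS server."""
    with urlopen(mary_process_url(text, **kwargs), timeout=10) as response:
        audio = response.read()
    with open(wav_path, "wb") as out:
        out.write(audio)


if __name__ == "__main__":
    # Only prints the URL, so it runs without a server.
    print(mary_process_url("Hello world"))
```

Calling `synthesize("Hello world", "hello.wav")` would only succeed with a server actually listening at the assumed address.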

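SpeakRight is a Java framework for writing speech recognition applications, based on VoiceXML technology. It uses the StringTemplate template engine to generate VoiceXML documents automatically.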

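Festival provides a general framework for building speech synthesis systems and includes examples of various modules. It offers a complete text-to-speech API, runs natively on Mac OS, and supports languages including English and Spanish.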

Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, and through an Emacs interface. Festival is multi-lingual (currently English (British and American) and Spanish), though English is the most advanced. Other groups release new languages for the system. Full tools and documentation for building new voices are available through Carnegie Mellon's FestVox project (http://festvox.org).

The system is written in C++ and uses the Edinburgh Speech Tools Library for low level architecture and has a Scheme (SIOD) based command interpreter for control. Documentation is given in the FSF texinfo format, which can generate a printed manual, info files and HTML.

Festival is free software. Festival and the speech tools are distributed under an X11-type licence allowing unrestricted commercial and non-commercial use alike.

This distribution includes:

Full English (British and American English) text to speech
Full C++ source for modules, SIOD interpreter, and Scheme library
Lexicon based on CMULEX and OALD (OALD is restricted to non-commercial use only)
Edinburgh Speech Tools, low level C++ library
Full documentation (html, postscript and GNU info format)
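The shell-level access mentioned above can be sketched from Python by piping text into the command-line tools. This is a minimal sketch, assuming the `festival` and `text2wave` executables from a standard Festival installation are on the PATH. The helper names are hypothetical, and the runner simply returns False when the tool is missing.

```python
import shutil
import subprocess


def festival_tts_command():
    """Command that reads text from stdin and speaks it."""
    return ["festival", "--tts"]


def text2wave_command(wav_path):
    """Command that reads text from stdin and writes a WAV file."""
    return ["text2wave", "-o", wav_path]


def run_with_stdin(command, text):
    """Run the command with text on stdin; return False if the tool is not installed."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, input=text, text=True, check=True)
    return True


if __name__ == "__main__":
    # Only prints the commands; call run_with_stdin() to actually synthesize.
    print(festival_tts_command())
    print(text2wave_command("hello.wav"))
```

For example, `run_with_stdin(text2wave_command("hello.wav"), "Hello from Festival")` would write the speech to `hello.wav` on a machine with Festival installed.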

FreeTTS is a speech synthesis system written entirely in Java. It is based on Flite, a small speech synthesis engine developed at Carnegie Mellon University.

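The Festvox project makes building new synthetic voices more systematic. Its tools and documentation are the basis for the voices used by Festival and Flite.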

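eSpeak is a compact open-source speech synthesizer that supports many languages. It uses formant synthesis, which keeps the language data files very small. It provides a SAPI5 version for Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. eSpeak can also convert text into phoneme codes, so it can serve as the front end for another speech synthesis engine.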

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. http://espeak.sourceforge.net
eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

eSpeak is available as:

A command line program (Linux and Windows) to speak text from a file or from stdin.
A shared library version for use by other programs. (On Windows this is a DLL).
A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.
eSpeak has been ported to other platforms, including Android, Mac OSX and Solaris.
Features.
Includes different Voices, whose characteristics can be altered.
Can produce speech output as a WAV file.
SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.
Compact size. The program and its data, including many languages, totals about 2 Mbytes.
Can be used as a front-end to MBROLA diphone voices, see mbrola.html. eSpeak converts text to phonemes with pitch and length information.
Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
Development tools are available for producing and tuning phoneme data.
Written in C.
I regularly use eSpeak to listen to blogs and news sites. I prefer the sound through a domestic stereo system rather than small computer speakers, which can sound rather harsh.
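Several of the features listed above (voices, WAV output, phoneme codes) map onto options of the command-line program. The sketch below only builds the argument list. It assumes the standard `espeak` options `-v` (voice), `-s` (speed in words per minute), `-w` (write a WAV file), `-x` (print phoneme mnemonics) and `-q` (no audio). `espeak_command` is a hypothetical helper name.

```python
import shutil
import subprocess


def espeak_command(text, voice="en", speed=None, wav_path=None, phonemes=False):
    """Build an espeak argument list for the given text and options."""
    cmd = ["espeak", "-v", voice]
    if speed is not None:
        cmd += ["-s", str(speed)]   # speaking rate in words per minute
    if wav_path is not None:
        cmd += ["-w", wav_path]     # write audio to a file instead of playing it
    if phonemes:
        cmd += ["-q", "-x"]         # stay silent and print phoneme mnemonics
    cmd.append(text)
    return cmd


def run_espeak(*args, **kwargs):
    """Run espeak if it is installed; return its stdout, or None if it is missing."""
    if shutil.which("espeak") is None:
        return None
    result = subprocess.run(espeak_command(*args, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_espeak() to actually run it.
    print(espeak_command("Hello world", speed=160, wav_path="hello.wav"))
```

With eSpeak installed, `run_espeak("Hello world", phonemes=True)` would return the phoneme string that another synthesis engine could consume.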

Flite is a small, fast TTS system. It is the C counterpart of the well-known Festival speech synthesis system and is suitable for embedded systems.

Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools.
Flite 1.4-release is now released as source. Flite offers:

Completely in C (no C++ or Scheme) for portability, size and speed
Reimplementation of the core parts of the Festival architecture (HRG) allowing close compatibility between voices built for each system.
Support for compiling FestVox voices into Flite voices.
Thread safe
Scalable voice size with all data const so it can be in ROM
Target architectures, ipaq (Linux/WinCE), Palm OS (treo) and smaller
Flite is basically written and is in its first stages of testing before release as free software. A small diphone voice based on the CMU KAL voice is included, along with a sample limited-domain talking clock.
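For comparison with Festival, a minimal sketch of driving the `flite` command-line program follows. It assumes the usual `-t` (text), `-f` (text file) and `-o` (output WAV) options. `flite_command` is a hypothetical helper that only builds the argument list, so it can be inspected without Flite installed.

```python
import shutil
import subprocess


def flite_command(text=None, text_file=None, wav_path=None):
    """Build a flite argument list from either literal text or a text file."""
    if (text is None) == (text_file is None):
        raise ValueError("pass exactly one of text or text_file")
    cmd = ["flite"]
    if text is not None:
        cmd += ["-t", text]
    else:
        cmd += ["-f", text_file]
    if wav_path is not None:
        cmd += ["-o", wav_path]     # without -o, flite plays the audio directly
    return cmd


def run_flite(**kwargs):
    """Run flite if it is installed; return False if it is missing."""
    if shutil.which("flite") is None:
        return False
    subprocess.run(flite_command(**kwargs), check=True)
    return True


if __name__ == "__main__":
    # Only prints the command; call run_flite() to actually synthesize.
    print(flite_command(text="Hello from Flite", wav_path="hello.wav"))
```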

[HSY75 note]

A few TTS sites verified to be accessible:

http://festvox.org/

http://espeak.sourceforge.net/

http://mary.dfki.de/

[HSY75 note]

Other references:

TTS technology

http://blog.csdn.net/qq_39351311/article/details/75193777?locationNum=2&fps=1

Architecture Walkthrough

http://mary.dfki.de/documentation/module-architecture.html

https://en.wikipedia.org/wiki/Speech_synthesis

https://en.wikipedia.org/wiki/Text_to_speech_in_digital_television

http://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html

http://festvox.org/festvox/

http://www.cstr.ed.ac.uk/projects/festival/download.html

http://espeak.sourceforge.net/docindex.html

https://sourceforge.net/projects/espeak/

http://www.speech.cs.cmu.edu/flite/slides.pdf
