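TTS (Text To Speech) is one application of speech synthesis: it converts files stored on a computer, such as help files or web pages, into natural-sounding speech output. TTS can help people with visual impairments read information on a computer, or simply make text documents more readable. TTS is often used together with speech recognition programs.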



The MARY Text-to-Speech System (MaryTTS)
MaryTTS is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.

As of version 5.2, MaryTTS supports German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, and Turkish; more languages are in preparation. MaryTTS comes with toolkits for quickly adding support for new languages and for building unit selection and HMM-based synthesis voices.
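The description above does not say how a client talks to MaryTTS. As a minimal sketch, assuming a MaryTTS 5.x server is running locally and exposes its HTTP interface on port 59125 with a `/process` endpoint, a synthesis request could be built in Python roughly as follows. The function names `marytts_url` and `synthesize_to_wav` are hypothetical helpers, and the host, port and parameter values are assumptions to check against your installation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def marytts_url(text, locale="en_US", host="localhost", port=59125):
    """Build a request URL for a (assumed) local MaryTTS HTTP server."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    # urlencode percent-encodes values and turns spaces into '+'.
    return f"http://{host}:{port}/process?{urlencode(params)}"


def synthesize_to_wav(text, wav_path, **kwargs):
    """Fetch synthesized audio and write it to wav_path (needs a running server)."""
    with urlopen(marytts_url(text, **kwargs)) as response:
        audio = response.read()
    with open(wav_path, "wb") as out:
        out.write(audio)
```

Building the URL separately from the network call keeps the request easy to inspect or log before any audio is fetched.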

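SpeakRight is a Java framework for writing speech recognition applications, based on VoiceXML technology. It uses the StringTemplate template engine to generate VoiceXML documents automatically.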

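Festival provides a general framework for building speech synthesis systems and includes examples of various modules. It offers a complete text-to-speech API, natively supports Mac OS, and supports languages including English and Spanish.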

Festival offers a general framework for building speech synthesis systems and includes examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from the shell level, through a Scheme command interpreter, as a C++ library, and through an Emacs interface. Festival is multilingual (currently British and American English, and Spanish), though English is the most advanced. Other groups release new languages for the system, and full tools and documentation for building new voices are available through Carnegie Mellon's FestVox project (http://festvox.org).

The system is written in C++, uses the Edinburgh Speech Tools Library for its low-level architecture, and has a Scheme (SIOD) based command interpreter for control. Documentation is given in the FSF texinfo format, which can generate a printed manual, info files and HTML.

Festival is free software. Festival and the speech tools are distributed under an X11-type licence allowing unrestricted commercial and non-commercial use alike.

This distribution includes:

Full English (British and American English) text to speech
Full C++ source for modules, SIOD interpreter, and Scheme library
Lexicon based on CMULEX and OALD (OALD is restricted to non-commercial use only)
Edinburgh Speech Tools, low level C++ library
Full documentation (html, postscript and GNU info format)
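
Festival's shell-level and Scheme interfaces mentioned above can be driven from a script. The following is a minimal sketch, assuming the `festival` executable is on the PATH, that `festival --tts` reads text from standard input, and that batch mode (`-b`) evaluates a parenthesized argument as a Scheme command. The helper names are hypothetical and not part of Festival itself.

```python
import shutil
import subprocess


def saytext_expression(text):
    """Return a Festival Scheme expression that speaks `text`."""
    # Escape backslashes first, then double quotes, for a Scheme string literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'(SayText "{escaped}")'


def festival_batch_command(text):
    """Build the argv that evaluates one SayText expression in batch mode."""
    return ["festival", "-b", saytext_expression(text)]


def speak_with_festival(text):
    """Pipe text to `festival --tts`; return False if Festival is not installed."""
    if shutil.which("festival") is None:
        return False
    subprocess.run(["festival", "--tts"], input=text, text=True, check=True)
    return True
```

Passing the command as a list rather than a shell string avoids a second layer of quoting around the Scheme expression.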

FreeTTS is a speech synthesis system written entirely in Java. It is based on Flite, a small speech synthesis engine developed at Carnegie Mellon University.


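eSpeak is a small, open-source speech synthesis system that supports many languages. It uses formant synthesis, which keeps the language files it ships very small. It supports SAPI5 on Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. eSpeak can also convert text into phoneme codes, so it can serve as the front end for another speech synthesis engine.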

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows (http://espeak.sourceforge.net).
eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

eSpeak is available as:

A command line program (Linux and Windows) to speak text from a file or from stdin.
A shared library version for use by other programs. (On Windows this is a DLL).
A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.
eSpeak has been ported to other platforms, including Android, Mac OS X and Solaris.

Features:

Includes different voices, whose characteristics can be altered.
Can produce speech output as a WAV file.
SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.
Compact size. The program and its data, including many languages, total about 2 Mbytes.
Can be used as a front-end to MBROLA diphone voices, see mbrola.html. eSpeak converts text to phonemes with pitch and length information.
Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
Development tools are available for producing and tuning phoneme data.
Written in C.
I regularly use eSpeak to listen to blogs and news sites. I prefer the sound through a domestic stereo system rather than small computer speakers, which can sound rather harsh.
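
Two of the capabilities listed above, WAV output and translation to phoneme codes, are reachable from the command-line program. The following is a minimal sketch, assuming the `espeak` executable is on the PATH and accepts the options `-v` (voice), `-s` (speed in words per minute), `-w` (write a WAV file), `-q` (quiet) and `-x` (print phoneme mnemonics). The helper names and default values are hypothetical choices for illustration.

```python
import shutil
import subprocess


def espeak_wav_command(text, wav_path, voice="en", speed_wpm=175):
    """Build the argv for rendering `text` to a WAV file with eSpeak."""
    return ["espeak", "-v", voice, "-s", str(speed_wpm), "-w", wav_path, text]


def espeak_phoneme_command(text, voice="en"):
    """Build the argv for printing phoneme mnemonics without speaking."""
    return ["espeak", "-v", voice, "-q", "-x", text]


def run_if_installed(argv):
    """Run argv only if its executable is on PATH; return stdout, else None."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_if_installed(espeak_phoneme_command("hello world"))` returns eSpeak's phoneme string when the program is installed and `None` otherwise, which is the kind of output another synthesis engine could consume as front-end input.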


Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools.
Flite 1.4-release is now released as source. Flite offers:

Completely in C (no C++ or Scheme) for portability, size and speed
Reimplementation of the core parts of the Festival architecture (HRG), allowing close compatibility between voices built for each system.
Support for compiling FestVox voices into Flite voices.
Thread safe
Scalable voice size with all data const so it can be in ROM
Target architectures: iPAQ (Linux/WinCE), Palm OS (Treo) and smaller
Flite is basically written and is in its first stages of testing before release as free software. A small diphone voice based on the CMU KAL voice is included, along with a sample limited-domain talking clock.


