
Notes on Commonly Used Datasets


A quick record of the datasets I use often.

  • TIMIT
    I've forgotten where I downloaded it from, and I haven't found a good link online since.
    TIMIT, in full The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus, is an acoustic-phonetic continuous speech corpus built jointly by Texas Instruments (TI), the Massachusetts Institute of Technology (MIT), and the Stanford Research Institute (SRI). The speech is sampled at 16 kHz, and the corpus contains 6,300 sentences in total: 630 speakers from the eight major dialect regions of the United States each read ten given sentences. All sentences were manually segmented and labeled at the phone level. 70% of the speakers are male, and most speakers are adult and white.
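The phone-level segmentation mentioned above is what makes TIMIT special: each utterance ships with a annotation file giving start sample, end sample, and phone label per line, at the 16 kHz sampling rate. As a minimal sketch (the excerpt below is illustrative, not taken from an actual TIMIT file):

```python
def parse_phn(text):
    """Parse TIMIT-style phone segmentation text: each line is
    'start_sample end_sample phone_label', samples at 16 kHz."""
    segments = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    return segments

# Illustrative excerpt in the documented three-column format
sample = """0 9640 h#
9640 11240 sh
11240 12783 iy"""

for start, end, phone in parse_phn(sample):
    # convert sample indices to a duration in seconds at 16 kHz
    print(phone, (end - start) / 16000)
```

Dividing by the 16 kHz sampling rate turns the sample indices into times, which is what most forced-alignment or phone-recognition pipelines need.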
  • THCHS30
    THCHS30 is an open speech dataset released by Dong Wang, Xuewei Zhang, and Zhiyong Zhang; it can be used to develop Mandarin Chinese speech recognition systems.
  • CSTR VCTK Corpus

The dataset used by Google's WaveNet.
This CSTR VCTK Corpus includes speech data uttered by 109 native speakers of English with various accents. Each speaker reads out about 400 sentences, most of which were selected from a newspaper plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent. The newspaper texts were taken from The Herald (Glasgow), with permission from Herald & Times Group. Each speaker reads a different set of the newspaper sentences, where each set was selected using a greedy algorithm designed to maximise the contextual and phonetic coverage. The Rainbow Passage and elicitation paragraph are the same for all speakers. The Rainbow Passage can be found in the International Dialects of English Archive: (http://web.ku.edu/~idea/readings/rainbow.htm). The elicitation paragraph is identical to the one used for the speech accent archive (http://accent.gmu.edu). The details of the speech accent archive can be found at http://www.ualberta.ca/~aacl2009/PDFs/WeinbergerKunath2009AACL.pdf


All speech data was recorded using an identical recording setup: an omni-directional head-mounted microphone (DPA 4035), a 96 kHz sampling frequency at 24 bits, in a hemi-anechoic chamber at the University of Edinburgh. All recordings were converted into 16 bits, were downsampled to 48 kHz based on STPK, and were manually end-pointed. This corpus was recorded for the purpose of building HMM-based text-to-speech synthesis systems, especially for speaker-adaptive HMM-based speech synthesis using average voice models trained on multiple speakers and speaker adaptation technologies.

  • VoxForge (an open-source speech recognition corpus)

VoxForge was created to collect transcribed recordings for free and open-source speech recognition engines (on Linux/Unix, Windows, and Mac platforms).
All submitted recordings are released under the GPL, and acoustic models are built from them for use by open-source speech recognition engines such as CMU Sphinx, ISIP, Julius (on GitHub), and HTK (note: HTK comes with distribution restrictions).

  • OpenSLR

OpenSLR is a site hosting speech and language datasets, including audiobook-based corpora.

OpenSLR is a site devoted to hosting speech and language resources, such as training corpora for speech recognition, and software related to speech recognition. We intend to be a convenient place for anyone to put resources that they have created, so that they can be downloaded publicly.


The following is excerpted from: http://www.cnblogs.com/AriesQt/articles/6742721.html

From the paper Zhang et al., 2015. This is a large collection made up of eight text classification datasets, and it is the most commonly used benchmark for new text classification work. Sample sizes range from 120K to 3.6M, covering problems from binary up to 14 classes. The datasets come from DBPedia, Amazon, Yelp, Yahoo!, Sogou, and AG.

Address: https://drive.google.com/drive/u/0/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
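The Zhang et al. (2015) releases ship each dataset as CSVs whose rows hold a 1-based class index, a title, and a description. A minimal loading sketch (the two rows below are made up for illustration):

```python
import csv
import io

# Illustrative rows in the 'class,title,description' CSV layout
raw = io.StringIO(
    '"3","Markets Rally","Stocks rose broadly on Tuesday."\n'
    '"1","Peace Talks Resume","Delegates met again this week."\n'
)

labels, texts = [], []
for cls, title, desc in csv.reader(raw):
    labels.append(int(cls) - 1)          # shift to 0-based labels
    texts.append(title + " " + desc)     # join title and body text

print(labels)   # [2, 0]
```

Shifting to 0-based labels is a common convenience when feeding the data to classifiers whose loss functions index classes from zero.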

WikiText