[Information Technology] [2006] Entropy and Speech

This is a 54-page doctoral thesis by Mattias Nilsson, KTH Royal Institute of Technology, Sweden.

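In this thesis, we study the representation of speech signals and the estimation of information-theoretic measures from observations containing features of the speech signal. The main body of the thesis consists of four research papers.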

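Paper A presents a compact representation of the speech signal that facilitates perfect reconstruction. The representation consists of models, model parameters, and signal coefficients. Unlike existing speech representations, it seeks compactness by adapting the models to maximally concentrate the energy of the signal coefficients according to a selected energy concentration criterion. The individual parts of the representation are closely related to speech signal properties such as the spectral envelope, pitch, and voiced/unvoiced signal coefficients, which benefits both speech coding and speech modification.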
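
The abstract does not specify Paper A's models, so the following is only a toy sketch of the general idea: adapt the model to each frame so that the coefficient energy is as concentrated as possible, while keeping the representation invertible. Several elements are hypothetical assumptions and do not come from the thesis:

- the candidate set (an identity basis and an orthonormal DCT-II);
- the criterion (share of energy in the k largest coefficients);
- the names `represent` and `reconstruct`.

Because every candidate is orthonormal, the frame is recovered exactly up to rounding.

```python
import math

# Toy sketch (hypothetical, not the thesis's models): each "model" is an
# orthonormal analysis/synthesis pair, so reconstruction is always perfect.


def dct_ii(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct_ii(c):
    """Inverse of the orthonormal DCT-II (the orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n)
            * c[k]
            * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


# Hypothetical candidate set: name -> (analysis, synthesis).
CANDIDATES = {
    "identity": (list, list),
    "dct": (dct_ii, idct_ii),
}


def concentration(coeffs, k):
    """Assumed criterion: share of total energy held by the k largest coefficients."""
    energies = sorted((c * c for c in coeffs), reverse=True)
    total = sum(energies)
    return sum(energies[:k]) / total if total > 0.0 else 1.0


def represent(frame, k=2):
    """Adapt the model to the frame: keep the candidate that maximizes concentration."""
    best = max(CANDIDATES, key=lambda name: concentration(CANDIDATES[name][0](frame), k))
    return best, CANDIDATES[best][0](frame)


def reconstruct(model, coeffs):
    """Invert the chosen model; exact up to floating-point rounding."""
    return CANDIDATES[model][1](coeffs)


if __name__ == "__main__":
    n = 16
    smooth = [math.cos(math.pi * (i + 0.5) / n) for i in range(n)]  # slowly varying frame
    impulse = [1.0 if i == 5 else 0.0 for i in range(n)]  # click-like frame
    for frame in (smooth, impulse):
        model, coeffs = represent(frame)
        err = max(abs(a - b) for a, b in zip(reconstruct(model, coeffs), frame))
        print(model, round(concentration(coeffs, 2), 3), err)
```

The two frames behave as expected:

- The slowly varying frame favours the DCT, because its energy collapses into one coefficient.
- The impulse favours the identity basis.

A real speech representation would adapt richer, parameterized models (for example ones tied to pitch and spectral envelope) rather than picking from two fixed transforms.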

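From the information-theoretic measure of entropy, performance limits in coding and classification can be derived. Papers B and C discuss the estimation of differential entropy. Paper B describes a method for estimating differential entropy when the set of vector observations (from the representation) lies on a lower-dimensional surface (manifold) in the embedding space. In contrast, Paper C introduces a method in which the manifold structures are destroyed by constraining the resolution of the observation space. This facilitates the estimation of bounds on classification error rates even when the manifolds have varying dimensionality within the embedding space.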
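
The abstract does not give the estimators of Papers B and C, so the sketch below does not reproduce them. It uses the standard Kozachenko-Leonenko nearest-neighbour estimator (k = 1) to illustrate the problem Paper B addresses:

- If the data lie on a manifold but the estimator uses the ambient dimension, the estimate keeps falling toward minus infinity as the sample grows.
- Plugging in the intrinsic dimension gives a finite value: the entropy with respect to the manifold's own measure, which is arc length for the circle used here.

The function names and the unit-circle test data are illustrative assumptions.

```python
import math
import random


def unit_ball_volume(d):
    """Volume of the d-dimensional Euclidean unit ball: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def kl_entropy(points, d=None):
    """Kozachenko-Leonenko differential entropy estimate (k = 1), in nats.

    points: list of equal-length coordinate tuples, n >= 2, no duplicates.
    d: dimension plugged into the estimator. Defaults to the ambient
       dimension; pass the intrinsic dimension when the data lie on a
       lower-dimensional manifold (an illustrative choice, not Paper B's method).
    """
    n = len(points)
    if d is None:
        d = len(points[0])
    log_r_sum = 0.0
    for i, p in enumerate(points):
        # Brute-force O(n^2) search for the Euclidean nearest-neighbour distance.
        r = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_r_sum += math.log(r)
    # For k = 1, psi(n) - psi(1) equals the harmonic number H_{n-1}.
    harmonic = sum(1.0 / m for m in range(1, n))
    return harmonic + math.log(unit_ball_volume(d)) + d * log_r_sum / n


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    gauss = [(rng.gauss(0.0, 1.0),) for _ in range(n)]
    circle = [(math.cos(a), math.sin(a))
              for a in (rng.uniform(0.0, 2.0 * math.pi) for _ in range(n))]
    print("Gaussian:", kl_entropy(gauss), "true:", 0.5 * math.log(2 * math.pi * math.e))
    print("circle, d=1:", kl_entropy(circle, d=1), "true:", math.log(2 * math.pi))
    print("circle, d=2:", kl_entropy(circle))  # keeps falling as n grows
```

The reference values printed next to the estimates are exact:

- about 1.419 nats for a standard Gaussian;
- log(2*pi), about 1.838 nats, for the uniform distribution on the circle with respect to arc length.

With mixed or unknown intrinsic dimension, a single d no longer fits all the data. That situation is what motivates Paper C's approach of limiting the observation resolution instead.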

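Finally, Paper D investigates the amount of information shared between spectral features of narrow-band (0.3-3.4 kHz) and high-band (3.4-8 kHz) speech. The results indicate that the information shared between the high band and the narrow band is insufficient for high-quality wideband speech coding (0.3-8 kHz) without transmitting extra information that describes the high band.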
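
Paper D's chapter title indicates that the mutual information is estimated with Gaussian mixture models. The sketch below shows one common way to obtain I(X;Y) from a GMM: average log p(x,y) / (p(x) p(y)) over samples drawn from the model. It is not necessarily the procedure used in Paper D. Its assumptions are:

- the GMM is already fitted (EM is omitted);
- scalar x and y stand in for the narrow-band and high-band feature vectors;
- the `Component` class and function names are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical bivariate Gaussian component of a GMM over (x, y)."""
    weight: float  # mixture weight; weights should sum to 1
    mx: float      # mean of x (stand-in for narrow-band features)
    my: float      # mean of y (stand-in for high-band features)
    sx: float      # standard deviation of x
    sy: float      # standard deviation of y
    rho: float     # correlation between x and y, |rho| < 1


def normal_pdf(v, m, s):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def bivariate_pdf(x, y, c):
    """Density of one bivariate Gaussian component at (x, y)."""
    zx, zy = (x - c.mx) / c.sx, (y - c.my) / c.sy
    q = (zx * zx - 2.0 * c.rho * zx * zy + zy * zy) / (1.0 - c.rho ** 2)
    norm = 2.0 * math.pi * c.sx * c.sy * math.sqrt(1.0 - c.rho ** 2)
    return math.exp(-0.5 * q) / norm


def gmm_mutual_information(components, n_samples=20000, seed=0):
    """Monte Carlo estimate of I(X;Y) in bits under a given GMM.

    Averages log p(x,y) / (p(x) p(y)) over samples drawn from the model;
    each marginal is the 1-D mixture of the components' marginals.
    """
    rng = random.Random(seed)
    weights = [c.weight for c in components]
    total = 0.0
    for _ in range(n_samples):
        # Draw a component, then a correlated (x, y) pair from it.
        c = rng.choices(components, weights=weights)[0]
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = c.mx + c.sx * u
        y = c.my + c.sy * (c.rho * u + math.sqrt(1.0 - c.rho ** 2) * v)
        pxy = sum(k.weight * bivariate_pdf(x, y, k) for k in components)
        px = sum(k.weight * normal_pdf(x, k.mx, k.sx) for k in components)
        py = sum(k.weight * normal_pdf(y, k.my, k.sy) for k in components)
        total += math.log(pxy / (px * py))
    return total / n_samples / math.log(2.0)  # nats -> bits


if __name__ == "__main__":
    rho = 0.8
    single = [Component(1.0, 0.0, 0.0, 1.0, 1.0, rho)]
    print(gmm_mutual_information(single), -0.5 * math.log2(1.0 - rho ** 2))
```

For a single component, the estimate should approach the closed-form Gaussian value -0.5 * log2(1 - rho^2), which makes a convenient sanity check. With real spectral features, X and Y would be vectors, and the mixture would have to be fitted to training data first.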

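1 Introduction
2 A canonical representation of speech
3 On the estimation of differential entropy from data located on embedded manifolds
4 Intrinsic dimensionality and its implications for performance prediction in pattern classification
5 Gaussian mixture model based estimation of mutual information between frequency bands in speech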

Download the original English text:

http://page5.dfpan.com/fs/dlcaj28211293169e77/

更多精彩文章請關注微訊號:在這裡插入圖片描述