1. 程式人生 > >短時傅裏葉變換(Short Time Fourier Transform)原理及 Python 實現

短時傅裏葉變換(Short Time Fourier Transform)原理及 Python 實現

src 參考 函數 ade block return 技術 數學公式 def

原理

  短時傅裏葉變換(Short Time Fourier Transform, STFT) 是一個用於語音信號處理的通用工具.它定義了一個非常有用的時間和頻率分布類, 其指定了任意信號隨時間和頻率變化的復數幅度. 實際上,計算短時傅裏葉變換的過程是把一個較長的時間信號分成相同長度的更短的段, 在每個更短的段上計算傅裏葉變換, 即傅裏葉頻譜.

短時傅裏葉變換通常的數學定義如下:

技術分享圖片

其中,

技術分享圖片

DTFT (Decrete Time Fourier Transform) 為離散時間傅裏葉變換. 其數學公式, 如下所示:

技術分享圖片

  其中, x(n) 為在采樣數 n 處的信號幅度. ω~ 的定義如下:

技術分享圖片

  實現時, 短時傅裏葉變換被計算為一系列加窗數據幀的快速傅裏葉變換 (Fast Fourier Transform, FFT),其中窗口隨時間 “滑動” (slide) 或“跳躍” (hop) 。

Python 實現

  在程序中, frame_size 為將信號分為較短的幀的大小, 在語音處理中, 通常幀大小在 20ms 到 40ms 之間. 這裏設置為 25ms, 即 frame_size = 0.025;

  frame_stride 為相鄰幀的滑動尺寸或跳躍尺寸, 通常幀的滑動尺寸在 10ms 到 20ms 之間, 這裏設置為 10ms, 即 frame_stride = 0.01. 此時, 相鄰幀的交疊大小為 15ms;

  窗函數采用漢明窗函數 (Hamming Function) ;

  在每一幀, 進行 512 點快速傅裏葉變換, 即 NFFT = 512. 具體程序如下:

# -*- coding: utf8 -*-
import numpy as np

def calc_stft(signal, sample_rate=16000, frame_size=0.025, frame_stride=0.01, winfunc=np.hamming, NFFT=512):

    # Calculate the number of frames from the signal
    frame_length = frame_size * sample_rate
    frame_step 
= frame_stride * sample_rate signal_length = len(signal) frame_length = int(round(frame_length)) frame_step = int(round(frame_step)) num_frames = 1 + int(np.ceil(float(np.abs(signal_length - frame_length)) / frame_step)) # zero padding pad_signal_length = num_frames * frame_step + frame_length z = np.zeros((pad_signal_length - signal_length)) # Pad signal to make sure that all frames have equal number of samples # without truncating any samples from the original signal pad_signal = np.append(signal, z) # Slice the signal into frames from indices indices = np.tile(np.arange(0, frame_length), (num_frames, 1)) + np.tile(np.arange(0, num_frames * frame_step, frame_step), (frame_length, 1)).T frames = pad_signal[indices.astype(np.int32, copy=False)] # Get windowed frames frames *= winfunc(frame_length) # Compute the one-dimensional n-point discrete Fourier Transform(DFT) of # a real-valued array by means of an efficient algorithm called Fast Fourier Transform (FFT) mag_frames = np.absolute(np.fft.rfft(frames, NFFT)) # Compute power spectrum pow_frames = (1.0 / NFFT) * ((mag_frames) ** 2) return pow_frames if __name__ == __main__: import scipy.io.wavfile import matplotlib.pyplot as plt # Read wav file # "OSR_us_000_0010_8k.wav" is downloaded from http://www.voiptroubleshooter.com/open_speech/american.html sample_rate, signal = scipy.io.wavfile.read("OSR_us_000_0010_8k.wav") # Get speech data in the first 2 seconds signal = signal[0:int(2. * sample_rate)] # Calculate the short time fourier transform pow_spec = calc_stft(signal, sample_rate) plt.imshow(pow_spec) plt.tight_layout() plt.show()

參考資料

1. DISCRETE TIME FOURIER TRANSFORM (DTFT). https://www.dsprelated.com/freebooks/mdft/Discrete_Time_Fourier_Transform.html

2. THE SHORT-TIME FOURIER TRANSFORM. https://www.dsprelated.com/freebooks/sasp/Short_Time_Fourier_Transform.html

3. Short-time Fourier transform. https://en.wikipedia.org/wiki/Short-time_Fourier_transform

4. Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What‘s In-Between. https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html

短時傅裏葉變換(Short Time Fourier Transform)原理及 Python 實現