ffmpeg: audio/video playback, synchronization, and hardware decoding

FFmpeg · Published 2019-04-21 18:59:43


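Background

Since version 1803, Windows 10 no longer ships with a built-in H.265 (HEVC) video decoder. A plug-in can be installed to play such video, but on an NVIDIA card that can hardware-decode 8K video it still played with software decoding. That overturned what I thought I knew about DXVA (DirectX Video Acceleration) on Windows, so I had no choice but to do hardware decoding through the SDK that NVIDIA provides. That SDK uses ffmpeg, which turned out to be a good starting point: I learned a lot about audio and video during development, and I share it here.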

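Playing video

Playing video essentially means showing a sequence of pictures one after another at the frame rate. The frame rate determines when to switch to the next picture, and "picture" here means a set of ARGB pixels, not a compressed PNG or JPEG.

Suppose a video has a frame rate (fps) of 30, a size of 1920x1080, and a duration of 30 seconds. The raw data then takes 1920 x 1080 x 30 x 30 x 4 bytes, about 7.5 GB. A real video file is nowhere near that large, a few tens of MB at most, so playing video becomes a matter of decompressing the data into ARGB pixels and then showing the pictures one by one at the frame rate.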
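
The arithmetic above can be checked with a few lines of C++. This is only a sketch; the function name rawVideoBytes is made up for illustration and is not part of ffmpeg.

```cpp
#include <cstdint>

// Size in bytes of uncompressed video:
// (width * height * bytesPerPixel) bytes per frame, times (fps * seconds) frames.
std::uint64_t rawVideoBytes(std::uint64_t width, std::uint64_t height,
                            std::uint64_t fps, std::uint64_t seconds,
                            std::uint64_t bytesPerPixel)
{
    return width * height * bytesPerPixel * fps * seconds;
}
```

For the example in the text, rawVideoBytes(1920, 1080, 30, 30, 4) gives 7,464,960,000 bytes.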

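Video decoding

Video decoding really comes down to two steps:

1. Decode the picture of each frame according to the video's codec.

2. Convert the picture of each frame from its own color space to RGB.

Decoding the picture of each frame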

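There are many video codecs, such as H.264, HEVC, and VP9. On the ffmpeg command line, the -c:v option selects the video codec.

[Image not preserved: an ffmpeg command line that uses -c:v to select VP9]

The command in that image produced a WebM video with the video codec set to VP9.

Complete video decoding with ffmpeg looks like this: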

// Needs <stdexcept>, libavformat/avformat.h and libavcodec/avcodec.h
// Open the media file
AVFormatContext* fmtc = NULL;
avformat_open_input(&fmtc, "the video file path", NULL, NULL);
avformat_find_stream_info(fmtc, NULL);
int videoIndex = av_find_best_stream(fmtc, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);

// Create a decoder that matches the stream's codec
AVCodecContext* avctx = avcodec_alloc_context3(NULL);
auto st = fmtc->streams[videoIndex];
avcodec_parameters_to_context(avctx, st->codecpar);
const AVCodec* codec = avcodec_find_decoder(avctx->codec_id);
avcodec_open2(avctx, codec, NULL);

// Decode the video
AVPacket* packet = av_packet_alloc();
AVFrame* frame = av_frame_alloc();

while (av_read_frame(fmtc, packet) >= 0)
{
    if (packet->stream_index == videoIndex)
    {
        avcodec_send_packet(avctx, packet);

        while (true)
        {
            int ret = avcodec_receive_frame(avctx, frame);
            if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                break; // the decoder needs more input, or has been drained
            if (ret < 0)
            {
                throw std::runtime_error("CantDecode");
            }
            // a newly decoded frame is now available in `frame`
        }
    }
    av_packet_unref(packet); // release the packet's buffer before reading the next one
}

// Release resources
av_packet_free(&packet);
avcodec_free_context(&avctx);
av_frame_free(&frame);
avformat_close_input(&fmtc);

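As the code above shows, decoding with ffmpeg ultimately yields an AVFrame, which is the decompressed picture of a video frame. An AVFrame is produced from an AVPacket and an AVCodecContext: the AVPacket is the compressed video frame, read from the AVFormatContext, and the AVCodecContext is created from the video's codec.

There is also a while loop around receiving the AVFrame. Shouldn't one AVPacket simply correspond to one AVFrame? Not quite. With some codecs, even though a packet corresponds to one frame, the complete picture can only be reconstructed with help from neighboring packets. Video compression uses many techniques to shrink the data, and the most common is the I/P/B scheme (intra-coded frames, predicted pictures, and bi-directional predictive pictures). A packet may carry an I-frame, a P-frame, or a B-frame: an I-frame depends only on itself, a P-frame records the difference from an earlier frame, and a B-frame records the differences from both an earlier and a later frame. That explains the while loop: after a packet is sent the decoder may not be able to output a frame yet (avcodec_receive_frame returns AVERROR(EAGAIN)), for example when a B-frame still has to wait for the next packet, so we keep sending packets and drain whatever frames are ready.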

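Converting to RGB

An AVFrame cannot be displayed as soon as we get it, because its pixels are not necessarily in the RGB color space. Most of the time they are YUV, and YUV has to be converted to RGB.

On the ffmpeg command line, the -pix_fmt option specifies the pixel format.

ffmpeg -i in.mov -c:v libx264 -pix_fmt yuv420p out.mp4

YUV differs from RGB: "Y" is the brightness (luminance, luma), while "U" and "V" carry the color (chrominance, chroma). An image without UV is just a grayscale image; adding UV makes it a color image. Compression takes advantage of this: when the Y values of four pixels share one UV pair the format is YUV420, when the Y values of two pixels share one UV pair it is YUV422, and when every Y has its own UV pair it is YUV444.

The P in YUV420P stands for planar, meaning that Y, U, and V are stored separately. That is why the data field of AVFrame is an array of plane pointers (uint8_t* data[8]): data[0] is the Y plane, data[1] is the U plane, and data[2] is the V plane.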
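
To make the subsampling concrete, here is a small sketch that computes how many samples each plane of a planar 8-bit YUV picture holds. The names PlaneSizes and planarYuvSizes are hypothetical, and the sketch ignores the row padding that AVFrame's linesize may add.

```cpp
#include <cstddef>

struct PlaneSizes { std::size_t y, u, v; };

// hShift / vShift are log2 of the horizontal / vertical chroma subsampling:
//   4:2:0 -> (1, 1), 4:2:2 -> (1, 0), 4:4:4 -> (0, 0)
PlaneSizes planarYuvSizes(int width, int height, int hShift, int vShift)
{
    // Round up so that odd dimensions still cover the last column / row.
    std::size_t chromaW = static_cast<std::size_t>((width + (1 << hShift) - 1) >> hShift);
    std::size_t chromaH = static_cast<std::size_t>((height + (1 << vShift) - 1) >> vShift);
    std::size_t luma = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    return { luma, chromaW * chromaH, chromaW * chromaH };
}
```

For 1920x1080 in YUV420P this gives 2,073,600 Y samples and 518,400 samples each for U and V, so a frame takes 1.5 bytes per pixel instead of the 4 bytes of ARGB.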

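The color conversion code is as follows:

// Color-space conversion (needs libswscale/swscale.h and libavutil/imgutils.h)
// This part goes where the video stream is opened
int w = fmtc->streams[videoIndex]->codecpar->width;
int h = fmtc->streams[videoIndex]->codecpar->height;

SwsContext* swsctx = NULL;
uint8_t* pixels = new uint8_t[w * h * 4]; // one 32-bit pixel per output position

// This part goes where an AVFrame has been received
swsctx = sws_getCachedContext(swsctx, frame->width, frame->height, (AVPixelFormat)frame->format, w, h, AV_PIX_FMT_RGB32, SWS_BICUBIC, NULL, NULL, NULL);
// av_image_fill_arrays replaces the deprecated AVPicture / avpicture_fill;
// the destination is described by the output size w x h, not the source size
uint8_t* dstData[4] = { NULL };
int dstLinesize[4] = { 0 };
av_image_fill_arrays(dstData, dstLinesize, pixels, AV_PIX_FMT_RGB32, w, h, 1);
sws_scale(swsctx, frame->data, frame->linesize, 0, frame->height, dstData, dstLinesize);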

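Video playback

Once we have the RGB pixel values, we can display the video at the frame rate.

// Get the time the computer has been running, in milliseconds
double getTime()
{
    __int64 freq = 0;
    __int64 count = 0;
    if (QueryPerformanceFrequency((LARGE_INTEGER*)&freq) && freq > 0 && QueryPerformanceCounter((LARGE_INTEGER*)&count))
    {
        return (double)count / (double)freq * 1000.0;
    }
    return 0.0;
}

double interval = 1000.0 / av_q2d(fmtc->streams[videoIndex]->r_frame_rate); // milliseconds per frame
double estimateTime = frameIndex * interval; // expected time
double actualTime = (getTime() - startTime); // actual time

In the code above, the expected time is derived from the frame rate, and the actual time is the current time minus the time playback started. If the actual time is less than the expected time, we sleep until the expected time and then show the next frame; otherwise the next frame has to be shown as soon as possible.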
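
The wait-or-hurry decision described above could be written as a small helper. This is a sketch under the same assumptions as the text (constant frame rate, times in milliseconds); the name delayBeforeFrameMs is made up here.

```cpp
// Returns how many milliseconds to sleep before showing frame `frameIndex`.
// A result of 0 means playback is on time or late and the frame should be
// shown immediately.
double delayBeforeFrameMs(long long frameIndex, double fps, double elapsedMs)
{
    double expectedMs = static_cast<double>(frameIndex) * 1000.0 / fps; // when the frame is due
    double waitMs = expectedMs - elapsedMs;
    return waitMs > 0.0 ? waitMs : 0.0;
}
```

With the variables of the article's code, the call would look like Sleep((DWORD)delayBeforeFrameMs(frameIndex, fps, getTime() - startTime)), where fps is av_q2d of the stream's frame rate.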

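Hardware decoding

In the code above, a frame rate of 30 frames per second means a new frame every 33 milliseconds. The catch is that if the next frame has not been decoded within those 33 milliseconds, playback is delayed or frames are dropped. Hardware decoding makes that far less likely, because a CPU cannot match a GPU at this kind of highly parallel image processing.

Hardware decoding works by submitting the AVPacket directly to the GPU. The GPU decodes it and hands the application a surface that lives in video memory. DirectX 9 is used here, so the CPU side reaches the surface indirectly as an IDirect3DTexture9 and renders it directly as a texture. Note that for H.264 and HEVC the AVPacket has had the PPS and related information stripped out; it has to be added back before the packet is submitted to the GPU, as follows: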

// Hardware decoding
// SPS = Sequence Parameter Set, PPS = Picture Parameter Set
// Convert an H.264 bitstream from length prefixed mode to start code prefixed mode (as defined in the Annex B of the ITU-T H.264 specification).
AVPacket* pktFiltered = av_packet_alloc();
AVBSFContext* bsfc = NULL;

// Pick the bitstream filter that matches the stream's codec
AVCodecParameters* par = fmtc->streams[videoIndex]->codecpar;
const char* bsfName = (par->codec_id == AV_CODEC_ID_HEVC) ? "hevc_mp4toannexb" : "h264_mp4toannexb";
const AVBitStreamFilter* bsf = av_bsf_get_by_name(bsfName);
av_bsf_alloc(bsf, &bsfc);
avcodec_parameters_copy(bsfc->par_in, par);
av_bsf_init(bsfc);

// For every packet read from the file: filter it, then submit pktFiltered to the GPU
av_bsf_send_packet(bsfc, packet);
av_bsf_receive_packet(bsfc, pktFiltered);

For a complete hardware decoding example, see the samples in NVIDIA's SDK.
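
To show what "length prefixed mode to start code prefixed mode" means, here is a simplified sketch of the rewrite itself. It is not ffmpeg's implementation: it assumes every NAL unit is preceded by a 4-byte big-endian length (the real filters read the length size from the stream's extradata) and it does not re-insert the SPS/PPS, which the real filters also take care of. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rewrites [4-byte length][NAL][4-byte length][NAL]... into
// [00 00 00 01][NAL][00 00 00 01][NAL]...
// Returns an empty vector if the input is truncated or malformed.
std::vector<std::uint8_t> lengthPrefixedToAnnexB(const std::vector<std::uint8_t>& in)
{
    static const std::uint8_t startCode[4] = { 0, 0, 0, 1 };
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 4) return {};              // not enough bytes for a length field
        std::uint32_t len = (std::uint32_t(in[pos]) << 24) | (std::uint32_t(in[pos + 1]) << 16) |
                            (std::uint32_t(in[pos + 2]) << 8) | std::uint32_t(in[pos + 3]);
        pos += 4;
        if (len > in.size() - pos) return {};            // NAL unit runs past the end of the buffer
        out.insert(out.end(), startCode, startCode + 4); // replace the length with a start code
        out.insert(out.end(), in.data() + pos, in.data() + pos + len);
        pos += len;
    }
    return out;
}
```

In real code the bitstream filter shown above remains the right tool; the sketch only illustrates why a packet taken from an MP4-style container cannot be handed to a decoder that expects Annex B input as it is.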

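Semi-hardware decoding

There are countless video codecs, and a GPU has strict requirements on which codecs and resolutions it can decode. When the hardware does not support a video, we still have to decode on the CPU, but the YUV-to-RGB conversion can be moved to the GPU to take load off the CPU. ffmpeg contains source code for YUV-to-RGB conversion, but it is of little use as a reference here because it is optimized for the CPU with integer arithmetic, whereas a GPU is good at floating-point arithmetic. The sample code below converts yuv420p to RGB.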

// GPU code
// Color-space conversion: YUV420P to RGB
float4x4 colormtx;
texture tex0; // Y plane
texture tex1; // U plane
texture tex2; // V plane
sampler sam0 = sampler_state { Texture = <tex0>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam1 = sampler_state { Texture = <tex1>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam2 = sampler_state { Texture = <tex2>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };

// Inside the pixel shader: uv is the texture coordinate, color is the output
float4 c = float4(tex2D(sam0, uv).a, tex2D(sam1, uv).a, tex2D(sam2, uv).a, 1);
color = mul(c, colormtx);

// CPU code
D3DXMATRIXA16 yuv2rgbMatrix()
{
    /*
    With Y, U, V in the range 0..255:
    FLOAT r = (1.164 * (Y - 16) + 1.596 * (V - 128));
    FLOAT g = (1.164 * (Y - 16) - 0.813 * (V - 128) - 0.391 * (U - 128));
    FLOAT b = (1.164 * (Y - 16) + 2.018 * (U - 128));

    The same formulas with Y, U, V normalized to 0..1, as sampled in the shader:
    FLOAT r = 1.164 * Y + 1.596*V - 1.596*128.0/255.0 - 1.164*16.0/255.0;
    FLOAT g = 1.164 * Y - 0.391*U - 0.813*V - 1.164*16.0/255.0+0.813*128.0/255.0+0.391*128.0/255.0;
    FLOAT b = 1.164 * Y + 2.018*U - 1.164*16.0/255.0- 2.018*128.0/255.0;
    */
    D3DXMATRIXA16 m(
        1.164, 0, 1.596, -1.596*128.0 / 255.0 - 1.164*16.0 / 255.0,
        1.164, -0.391, -0.813, -1.164*16.0 / 255.0 + 0.813*128.0 / 255.0 + 0.391*128.0 / 255.0,
        1.164, 2.018, 0, -1.164*16.0 / 255.0 - 2.018*128.0 / 255.0,
        0, 0, 0, 1
    );
    // The shader computes mul(c, colormtx) with c as a row vector, hence the transpose
    D3DXMatrixTranspose(&m, &m);
    return m;
}

void update(AVFrame* frame)
{
    int w = ctx_->textureWidth();
    int h = ctx_->textureHeight();
    int w2 = w / 2; // the chroma planes of YUV420P are half size in both directions
    int h2 = h / 2;
    auto device = ctx_->getDevice3D(); // IDirect3DDevice9Ex

    if (!texY_) // IDirect3DTexture9*
    {
        auto effect = render_->effect(); // ID3DXEffect*
        device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texY_, NULL);
        device->CreateTexture(w2, h2, 1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texU_, NULL);
        device->CreateTexture(w2, h2, 1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texV_, NULL);
        effect->SetTexture("tex0", texY_);
        effect->SetTexture("tex1", texU_);
        effect->SetTexture("tex2", texV_);
        D3DXMATRIXA16 m = yuv2rgbMatrix();
        effect->SetMatrix("colormtx", &m);
    }
    upload(frame->data[0], frame->linesize[0], h, texY_);
    upload(frame->data[1], frame->linesize[1], h2, texU_);
    upload(frame->data[2], frame->linesize[2], h2, texV_);
}

void upload(uint8_t* data, int linesize, int h, IDirect3DTexture9* tex)
{
    D3DLOCKED_RECT locked = { 0 };
    HRESULT hr = tex->LockRect(0, &locked, NULL, D3DLOCK_DISCARD);
    if (SUCCEEDED(hr))
    {
        uint8_t* dst = (uint8_t*)locked.pBits;
        // source rows and texture rows may have different strides: copy the smaller one
        int size = linesize < locked.Pitch ? linesize : locked.Pitch;
        for (INT y = 0; y < h; y++)
        {
            CopyMemory(dst, data, size);
            dst += locked.Pitch;
            data += linesize;
        }
        tex->UnlockRect(0);
    }
}
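
When debugging a shader like this, a CPU reference for single pixels is handy. The sketch below applies the same coefficients as the comment in yuv2rgbMatrix to 8-bit values; the names Rgb, clampToByte, and yuvToRgb601 are made up for this example.

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Clamp to 0..255 and round to the nearest integer.
static std::uint8_t clampToByte(double v)
{
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, v)) + 0.5);
}

// Limited-range YUV (Y in 16..235, U and V centered on 128) to RGB,
// using the coefficients quoted in the matrix code above.
Rgb yuvToRgb601(std::uint8_t y, std::uint8_t u, std::uint8_t v)
{
    double c = 1.164 * (y - 16);
    double d = u - 128.0;
    double e = v - 128.0;
    return { clampToByte(c + 1.596 * e),
             clampToByte(c - 0.813 * e - 0.391 * d),
             clampToByte(c + 2.018 * d) };
}
```

Black (Y=16, U=V=128) maps to (0, 0, 0) and white (Y=235, U=V=128) maps to (255, 255, 255), which makes a quick sanity check for the matrix that is passed to the shader.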

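Here is one more example, converting YUVA444P12LE to ARGB. In this format each of the four components (Y, U, V, and alpha) is 12 bits wide, and ffmpeg stores every component in its own plane as a two-byte little-endian value, so the textures are created with D3DFMT_L16.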

// YUVA444P12LE to ARGB
///< planar YUV 4:4:4,36bpp, (1 Cr & Cb sample per 1x1 Y samples), 12b alpha, little-endian
// GPU code
float4x4 colormtx;
texture tex0; // Y plane
texture tex1; // U plane
texture tex2; // V plane
texture tex3; // alpha plane
sampler sam0 = sampler_state { Texture = <tex0>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam1 = sampler_state { Texture = <tex1>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam2 = sampler_state { Texture = <tex2>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam3 = sampler_state { Texture = <tex3>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };

// A 12-bit value stored in a 16-bit texel samples as at most 0xfff/0xffff; rescale it to 0..1
float4 c = float4(tex2D(sam0, uv).x, tex2D(sam1, uv).x, tex2D(sam2, uv).x, 0.06248569466697185); // 0xfff/0xffff
c = c * 16.003663003663004; // 0xffff/0xfff
color = mul(c, colormtx);
color.a = tex2D(sam3, uv).x * 16.003663003663004;


// CPU code
int w = ctx_->textureWidth();
int h = ctx_->textureHeight();
auto device = ctx_->getDevice3D(); // IDirect3DDevice9Ex

if (!texY_) // IDirect3DTexture9*
{
    auto effect = render_->effect(); // ID3DXEffect*
    // 4:4:4 with alpha: all four planes have the full picture size
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texY_, NULL)); // 12b Y
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texU_, NULL)); // 12b U
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texV_, NULL)); // 12b V
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texA_, NULL)); // 12b A
    effect->SetTexture("tex0", texY_);
    effect->SetTexture("tex1", texU_);
    effect->SetTexture("tex2", texV_);
    effect->SetTexture("tex3", texA_);
    D3DXMATRIXA16 m = yuv2rgbMatrix();
    effect->SetMatrix("colormtx", &m);
}
upload(frame->data[0], frame->linesize[0], h, texY_);
upload(frame->data[1], frame->linesize[1], h, texU_);
upload(frame->data[2], frame->linesize[2], h, texV_);
upload(frame->data[3], frame->linesize[3], h, texA_);
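
The two magic constants in the shader come from storing 12-bit samples in 16-bit texels. The following sketch reproduces that arithmetic on the CPU, assuming that sampling a D3DFMT_L16 texel yields the raw value divided by 65535; the function names are hypothetical.

```cpp
#include <cstdint>

// What the shader is assumed to read from a D3DFMT_L16 texel: raw / 0xffff.
double sampleL16(std::uint16_t raw)
{
    return raw / 65535.0;
}

// Rescale so that the 12-bit maximum (0x0fff) maps to 1.0, as the shader
// does by multiplying with 0xffff / 0xfff.
double normalize12Bit(std::uint16_t raw)
{
    const double scale = 65535.0 / 4095.0; // 16.003663...
    return sampleL16(raw) * scale;
}
```

The fourth component of c in the shader is set to 0xfff/0xffff for the same reason: after the multiplication it becomes 1, so the constant column of colormtx is applied unchanged.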

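Playing audio

Raw audio data has a few important parameters: the sample rate (samples per second, sps), the number of channels, and the number of bits each sample occupies (bits per sample, bps).

Playing audio simply means sending audio data to the sound card continuously; the sound card produces sound according to the sps, channels, and bps. For example, take 4 MB of audio data with a sample rate of 44100, 2 channels, and 16 bits per sample. If this data is sent to the sound card, then after (4x1024x1024x8)/(44100x2x16) seconds, about 23.8 seconds, the sound card reports that playback has finished.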
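
The duration formula above, written out as a function. This is a sketch; the name pcmDurationSeconds is made up for illustration.

```cpp
#include <cstdint>

// Playback time of raw PCM data: total bits divided by bits consumed per second.
double pcmDurationSeconds(std::uint64_t byteCount, int sampleRate, int channels, int bitsPerSample)
{
    double bitsPerSecond = static_cast<double>(sampleRate) * channels * bitsPerSample;
    return static_cast<double>(byteCount) * 8.0 / bitsPerSecond;
}
```

For the 4 MB example, pcmDurationSeconds(4 * 1024 * 1024, 44100, 2, 16) evaluates to roughly 23.78 seconds.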

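Playing audio with the wave API

On Windows, audio can be played with the wave API in three steps: open, write, and close. Raw audio data can be exported with the GoldWave software by saving as an snd file; make sure to set the channels, sample rate, and bps when exporting. Below is code that plays raw audio data with an sps of 44100, 2 channels, and 16 bps.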

// Play audio with the wave API (link with winmm.lib)
#include <windows.h>
#include <mmsystem.h>

const byte* pcmData =....// assume this is the audio data to play, together with its size
int pcmSize = ....
openAudio();
writeAudio(pcmData, pcmSize);
closeAudio();

////////////////////////////
#define AUDIO_DEV_BLOCK_SIZE 8192
#define AUDIO_DEV_BLOCK_COUNT 4

HWAVEOUT dev = 0;
volatile int available = AUDIO_DEV_BLOCK_COUNT; // blocks that are not queued on the device
WAVEHDR* blocks = 0;
int index = 0;
Mutex mtx; // custom class built on EnterCriticalSection and LeaveCriticalSection

// Callback registered in openAudio, reconstructed from the description below:
// WOM_DONE means that one block has finished playing and can be reused
void CALLBACK waveOutProc(HWAVEOUT hWaveOut, UINT uMsg, DWORD_PTR dwInstance, DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg != WOM_DONE)
        return;
    mtx.lock();
    available++;
    mtx.unlock();
}

void openAudio()
{
    WAVEFORMATEX wfx = {0};
    wfx.nSamplesPerSec = 44100;
    wfx.wBitsPerSample = 16;
    wfx.nChannels = 2;
    wfx.cbSize = 0;
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nBlockAlign = (wfx.wBitsPerSample * wfx.nChannels) >> 3; // bytes per sample frame
    wfx.nAvgBytesPerSec = wfx.nBlockAlign * wfx.nSamplesPerSec;
    if (waveOutOpen(&dev, WAVE_MAPPER, &wfx, (DWORD_PTR)waveOutProc, (DWORD_PTR)0, CALLBACK_FUNCTION) != MMSYSERR_NOERROR)
    {
        dev = 0; // writeAudio checks this
        return;
    }

    blocks = new WAVEHDR[AUDIO_DEV_BLOCK_COUNT];
    memset(blocks, 0, sizeof(WAVEHDR) * AUDIO_DEV_BLOCK_COUNT);
    for (int i = 0; i < AUDIO_DEV_BLOCK_COUNT; i++)
    {
        blocks[i].lpData = new char[AUDIO_DEV_BLOCK_SIZE];
        blocks[i].dwBufferLength = AUDIO_DEV_BLOCK_SIZE;
    }
}

void closeAudio()
{
    if (!dev) return;
    waveOutReset(dev); // stop playback so that every queued block is returned
    for (int i = 0; i < AUDIO_DEV_BLOCK_COUNT; i++)
    {
        if (blocks[i].dwFlags & WHDR_PREPARED)
        {
            waveOutUnprepareHeader(dev, &blocks[i], sizeof(WAVEHDR));
        }
        delete[] blocks[i].lpData;
    }
    delete[] blocks;
    waveOutClose(dev);
}

void writeAudio(const byte* data, int size)
{
    if (!dev) return; // the device was not opened
    WAVEHDR* current;
    int remain;
    current = &blocks[index];
    while (size > 0)
    {
        if (current->dwFlags & WHDR_PREPARED)
        {
            waveOutUnprepareHeader(dev, current, sizeof(WAVEHDR));
        }
        // dwUser tracks how many bytes of the current block are already filled
        if (size < (int)(AUDIO_DEV_BLOCK_SIZE - current->dwUser))
        {
            // not enough data to fill the block: keep it for the next call
            memcpy(current->lpData + current->dwUser, data, size);
            current->dwUser += size;
            break;
        }
        remain = AUDIO_DEV_BLOCK_SIZE - current->dwUser;
        memcpy(current->lpData + current->dwUser, data, remain);
        size -= remain;
        data += remain;
        current->dwBufferLength = AUDIO_DEV_BLOCK_SIZE;
        waveOutPrepareHeader(dev, current, sizeof(WAVEHDR));
        waveOutWrite(dev, current, sizeof(WAVEHDR));

        mtx.lock();
        available--;
        mtx.unlock();

        // all blocks are queued: wait until the callback frees one
        while (!available)
        {
            Sleep(10);
        }
        index++;
        index %= AUDIO_DEV_BLOCK_COUNT;
        current = &blocks[index];
        current->dwUser = 0;
    }
}

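The code above registers the callback waveOutProc when the device is opened. When that function is called, one 8192-byte block of audio data has finished playing. writeAudio keeps cycling through four 8192-byte blocks: up to four blocks are queued ahead of time with waveOutWrite, and whenever the waveOutProc callback reports that a block is free again, the next data is written into it. That way the sound plays without interruption.

Decoding audio with ffmpeg

Likewise, the audio in a video file is compressed, frame by frame. The complete code for decoding the audio is as follows: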

// Open the video file
AVFormatContext* fmtc = NULL;
avformat_network_init();
avformat_open_input(&fmtc, "video file path", NULL, NULL);
avformat_find_stream_info(fmtc, NULL);
int audioIndex = av_find_best_stream(fmtc, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);

// Create the audio decoder
AVCodecContext* avctx = avcodec_alloc_context3(NULL);
auto st = fmtc->streams[audioIndex];
avcodec_parameters_to_context(avctx, st->codecpar);
const AVCodec* codec = avcodec_find_decoder(avctx->codec_id);
avcodec_open2(avctx, codec, NULL);

// Decode the audio
AVFrame* frame = av_frame_alloc();
AVPacket* pkt = av_packet_alloc();

while (av_read_frame(fmtc, pkt) >= 0)
{
    // avcodec_decode_audio4 is deprecated; the send/receive API mirrors the video loop
    if (pkt->stream_index == audioIndex && avcodec_send_packet(avctx, pkt) >= 0)
    {
        while (avcodec_receive_frame(avctx, frame) >= 0)
        {
            // Only correct for packed 16-bit stereo at 44100 Hz (see the next section)
            int bytes = frame->nb_samples * 2 /* channels */ * 2 /* bytes per sample */;
            writeAudio(frame->extended_data[0], bytes);
        }
    }
    av_packet_unref(pkt);
}

// Clean up
avcodec_free_context(&avctx);
av_frame_free(&frame);
av_packet_free(&pkt);
avformat_close_input(&fmtc);
return 0;

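As you can see, this is just like decoding video: the final audio data ends up in an AVFrame.

Audio conversion

The example above decodes the audio and plays it, but only under one condition: the audio in the video file must have an sps of 44100, a bps of 16, and 2 channels. Otherwise the audio does not play correctly.

The sps, bps, and channel count in video files are not fixed, so the audio has to be converted to the format we can play. The conversion code is as follows: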

// Audio conversion (needs libswresample/swresample.h and libavutil/opt.h)
// Create the audio converter. avctx->channels and av_get_default_channel_layout() belong to the
// FFmpeg 4.x API used in this article; FFmpeg 5.1 and later replace them with AVChannelLayout.
const int devBitsPerSample = 16; // bps the audio device was opened with
auto devSampleFormat = devBitsPerSample == 8 ? AV_SAMPLE_FMT_U8 : AV_SAMPLE_FMT_S16;
SwrContext* swrc = swr_alloc();
av_opt_set_int(swrc, "in_channel_layout", av_get_default_channel_layout(avctx->channels), 0);
av_opt_set_int(swrc, "in_sample_rate", avctx->sample_rate, 0);
av_opt_set_sample_fmt(swrc, "in_sample_fmt", avctx->sample_fmt, 0);

av_opt_set_int(swrc, "out_channel_layout", av_get_default_channel_layout(2), 0);
av_opt_set_int(swrc, "out_sample_rate", 44100, 0);
av_opt_set_sample_fmt(swrc, "out_sample_fmt", devSampleFormat, 0);
swr_init(swrc);

struct SwrBuffer
{
    int samplesPerSec;
    int numSamples, maxNumSamples;
    uint8_t** data;
    int channels;
    int linesize;
};
SwrBuffer dst = {0};
dst.samplesPerSec = dev.samplesPerSec(); // dev: the audio device wrapper (44100 here)
dst.channels = dev.channels();           // 2 here
// Size the output buffer from the first decoded frame
dst.numSamples = dst.maxNumSamples = (int)av_rescale_rnd(frame->nb_samples, dst.samplesPerSec, avctx->sample_rate, AV_ROUND_UP);
av_samples_alloc_array_and_samples(&dst.data, &dst.linesize, dst.channels, dst.numSamples, devSampleFormat, 0);

// Convert the audio (for every decoded frame)
// Samples still buffered inside the resampler also need room in the output
dst.numSamples = (int)av_rescale_rnd(swr_get_delay(swrc, avctx->sample_rate) + frame->nb_samples, dst.samplesPerSec, avctx->sample_rate, AV_ROUND_UP);

if (dst.numSamples > dst.maxNumSamples) {
    // grow the output buffer
    av_freep(&dst.data[0]);
    av_samples_alloc(dst.data, &dst.linesize, dst.channels, dst.numSamples, devSampleFormat, 1);
    dst.maxNumSamples = dst.numSamples;
}
/* convert to destination format */
int ret = swr_convert(swrc, dst.data, dst.numSamples, (const uint8_t**)frame->extended_data, frame->nb_samples);
if (ret < 0) {
    // error
}
// ret is the number of samples per channel that were actually produced
int bufsize = av_samples_get_buffer_size(&dst.linesize, dst.channels, ret, devSampleFormat, 1);
if (bufsize < 0) {
    //fprintf(stderr, "Could not get sample buffer size\n");
}
writeAudio(dst.data[0], bufsize);
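
The buffer sizing above relies on av_rescale_rnd with AV_ROUND_UP. For non-negative inputs that amounts to a rounded-up rate conversion, which the sketch below reproduces without ffmpeg. The function name is hypothetical, and unlike av_rescale_rnd it makes no attempt to guard against 64-bit overflow.

```cpp
#include <cassert>
#include <cstdint>

// Number of output samples needed for `inSamples` input samples when
// converting from inRate to outRate, rounded up so the buffer is never too small.
std::int64_t resampledSampleCount(std::int64_t inSamples, std::int64_t inRate, std::int64_t outRate)
{
    assert(inSamples >= 0 && inRate > 0 && outRate > 0);
    return (inSamples * outRate + inRate - 1) / inRate; // ceil(inSamples * outRate / inRate)
}
```

A 1024-sample frame at 48000 Hz, for example, needs room for 941 samples at 44100 Hz (1024 x 44100 / 48000 = 940.8, rounded up).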

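Querying the parameters the sound card supports

The examples above have played sound with 44100, 16, 2 all along, and those are indeed the parameters I open the sound device with. If the sound card does not support them, waveOutOpen fails. The following code checks which parameters the sound card supports: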

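// Remember the first supported format; bps_, channels_ and sps_ are members that start at 0
void checkCaps(DWORD devfmt, DWORD fmt, int sps, int channels, int bps)
{
    if (bps_) return; // a format has already been chosen
    if (devfmt & fmt)
    {
        bps_ = bps;
        channels_ = channels;
        sps_ = sps;
    }
}

// Playback is an output device, so query it with waveOutGetDevCaps / WAVEOUTCAPS
WAVEOUTCAPS caps = {0};
if (waveOutGetDevCaps(0, &caps, sizeof(caps)) == MMSYSERR_NOERROR)
{
    checkCaps(caps.dwFormats, WAVE_FORMAT_4S16, 44100, 2, 16);
    //checkCaps(caps.dwFormats, WAVE_FORMAT_96S16, 96000, 2, 16);
    //checkCaps(caps.dwFormats, WAVE_FORMAT_96S08, 96000, 2, 8);
}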

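Audio/video synchronization

Video and audio have to be played in sync; they cannot each run on their own, or the lips will most likely not match the sound. Here we synchronize the video to the audio.

From the sample rate and the other parameters we know exactly how long the audio has been playing, and the video can be synchronized to that time. The pseudocode is as follows: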

int audioFrameIndex = 0; // audio blocks that have finished playing
int videoFrameIndex = 0; // video frames decoded so far

// Thread 1
while (true)
{
    decodeAudioData();
    writeAudio(...);
}
void CALLBACK waveOutProc(HWAVEOUT hWaveOut, UINT uMsg, DWORD_PTR dwInstance, DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg != WOM_DONE)
        return;
    ...
    audioFrameIndex++;
}

// Thread 2
double audioBitsPerSec = audioDev->bitsPerSample() * audioDev->samplesPerSec() * audioDev->channels();
double interval = 1000.0 / av_q2d(fmtc->streams[videoIndex]->r_frame_rate);
while (true)
{
    if (!decodeVideoFrame())
        continue;
    videoFrameIndex++;
    while (true)
    {
        double bits = audioFrameIndex * AUDIO_DEV_BLOCK_SIZE * 8.0;
        double ms = bits / audioBitsPerSec * 1000.0; // time the audio has actually played
        double to = videoFrameIndex * interval;      // time this frame is due
        if (ms < to) // the audio has not reached this frame yet: wait
        {
            Sleep(1);
            continue;
        }
        presentVideoFrame();
        break;
    }
}

Thread locking is left out of this multithreaded code, so proceed with care.
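
One way to share the audio position between the two threads without a lock is an atomic counter. The sketch below is an assumption about how that could look, not the author's implementation; AudioClock and frameIsDue are hypothetical names, and the block size and audio format are passed in explicitly.

```cpp
#include <atomic>
#include <cstdint>

// Audio clock driven by the number of blocks the device has finished playing.
class AudioClock {
public:
    AudioClock(int blockBytes, int sampleRate, int channels, int bitsPerSample)
        : bitsPerBlock_(blockBytes * 8.0),
          bitsPerSecond_(static_cast<double>(sampleRate) * channels * bitsPerSample) {}

    // Called from the audio callback each time one block has finished playing.
    void onBlockDone() { blocksPlayed_.fetch_add(1, std::memory_order_release); }

    // Milliseconds of audio played so far; safe to call from the video thread.
    double elapsedMs() const
    {
        return blocksPlayed_.load(std::memory_order_acquire) * bitsPerBlock_ / bitsPerSecond_ * 1000.0;
    }

private:
    std::atomic<std::int64_t> blocksPlayed_{0};
    double bitsPerBlock_;
    double bitsPerSecond_;
};

// True once the audio clock has reached the time at which the frame is due.
bool frameIsDue(const AudioClock& clock, std::int64_t videoFrameIndex, double fps)
{
    double dueMs = static_cast<double>(videoFrameIndex) * 1000.0 / fps;
    return clock.elapsedMs() >= dueMs;
}
```

With 8192-byte blocks of 44100 Hz 16-bit stereo audio, the clock advances in steps of about 46 ms, so the video can lag the ideal time by up to one block; smaller blocks give a finer clock at the cost of more callbacks.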

Afterword

These are some insights from my own development work. Where they fall short, corrections are very welcome.