
H.265/HEVC Rate Control: Overall Framework and Code Walkthrough (Part 1)

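Rate control in HEVC was the focus of my graduate research. My advisor has recently been pushing me to start writing my thesis, which means summarizing the whole rate control framework together with my own improvements to the algorithm. I am taking this opportunity to organize both the theory and the actual code, in the hope that it can serve as a reference for other readers. If you find any mistakes, please point them out! (Reference software version: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.15/ )

Part 1: the initialization of rate control

The job of rate control is to get the best overall video quality out of a limited bandwidth while keeping the bit allocation accurate. In HEVC the bandwidth is expressed as the bitrate. The best starting point for understanding rate control is the proposal JCTVC-K0103 (http://phenix.it-sudparis.eu/jct/), which laid down the overall framework. The later proposals JCTVC-M0036 and JCTVC-M0257 extend and refine JCTVC-K0103, but the overall framework is unchanged. The flow of rate control is roughly as follows.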
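  1. Initialize rate control (at four levels: sequence, GOP, frame and LCU).
  2. Allocate bits at each of these four levels.
  3. Convert the bits allocated to each unit into the best QP for that unit using the R-lambda and lambda-QP models, and apply that QP during encoding.
The rest of this post walks through these three parts and adds some of my own understanding of each module.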
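To make step 3 concrete, here is a minimal sketch of the two models. It assumes the forms I know from the JCTVC-K0103 framework: lambda = alpha * bpp^beta for R-lambda, and QP = 4.2005 * ln(lambda) + 13.7122 for lambda-QP. The function names lambdaFromBpp and qpFromLambda are hypothetical and do not exist in HM; alpha and beta are passed in because the encoder updates them as it goes.

```cpp
#include <cassert>
#include <cmath>

// R-lambda model (assumed form): lambda = alpha * bpp^beta,
// where bpp is the target number of bits per pixel.
double lambdaFromBpp(double alpha, double beta, double bpp)
{
  assert(bpp > 0.0);
  return alpha * std::pow(bpp, beta);
}

// lambda-QP model (assumed constants): QP = 4.2005 * ln(lambda) + 13.7122,
// rounded by adding 0.5 and truncating.
int qpFromLambda(double lambda)
{
  assert(lambda > 0.0);
  return static_cast<int>(4.2005 * std::log(lambda) + 13.7122 + 0.5);
}
```

Because beta is negative, a smaller bit budget per pixel gives a larger lambda and therefore a larger QP, which is the behaviour one expects from a rate controller.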

1. Initialization

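1.1. In the function TEncTop::create(), if m_RCEnableRateControl is enabled (RateControl set to 1 in the cfg file), the whole rate control module is initialized.
// The initialization function m_cRateCtrl.init() works from sequence-level parameters: Int totalFrames, Int targetBitrate, Int frameRate, Int GOPSize, Int picWidth, Int picHeight, Int LCUWidth, Int LCUHeight, Int keepHierBits, Bool useLCUSeparateModel, GOPEntry GOPList[MAX_GOP].
// The parameter worth explaining is keepHierBits, which selects hierarchical bit allocation. When it is on, the frames of a GOP receive different weights that depend on bpp and the coding structure; when it is off, every reference frame gets weight 10 and every non-reference frame gets weight 2. The code further down shows the details.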
if ( m_RCEnableRateControl )
{
m_cRateCtrl.init( m_framesToBeEncoded, m_RCTargetBitrate, (Int)( (Double)m_iFrameRate/m_temporalSubsampleRatio + 0.5), m_iGOPSize, m_iSourceWidth, m_iSourceHeight,
m_maxCUWidth, m_maxCUHeight,m_RCKeepHierarchicalBit, m_RCUseLCUSeparateModel, m_GOPList );
}
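The arguments of this call already determine three quantities that come up repeatedly in rate control: the effective frame rate, the average bit budget per picture and the bits per pixel (bpp). The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of HM.

```cpp
#include <cassert>

// Frame rate after temporal subsampling, rounded to the nearest integer
// (same expression as the third argument of m_cRateCtrl.init()).
int effectiveFrameRate(int frameRate, int temporalSubsampleRatio)
{
  assert(frameRate > 0 && temporalSubsampleRatio > 0);
  return static_cast<int>(static_cast<double>(frameRate) / temporalSubsampleRatio + 0.5);
}

// Average bit budget of one picture at the target bitrate (bits per second).
int averageBitsPerPicture(int targetBitrate, int frameRate)
{
  assert(frameRate > 0);
  return targetBitrate / frameRate;
}

// Bits per pixel: the bitrate spread over every pixel of every frame in one second.
double bitsPerPixel(int targetBitrate, int frameRate, int picWidth, int picHeight)
{
  assert(frameRate > 0 && picWidth > 0 && picHeight > 0);
  return targetBitrate / (static_cast<double>(frameRate) * picWidth * picHeight);
}
```

For example, 3,000,000 bit/s at 30 fps leaves 100,000 bits per picture on average; the per-frame weights discussed in init() then decide how unevenly those bits are spread inside a GOP.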
Moving on to the definition of m_cRateCtrl.init(). Some else branches have been removed here to make the code easier to read.
Void TEncRateCtrl::init( Int totalFrames, Int targetBitrate, Int frameRate, Int GOPSize, Int picWidth, Int picHeight, Int LCUWidth, Int LCUHeight, Int keepHierBits, Bool useLCUSeparateModel, GOPEntry  GOPList[MAX_GOP] )
{
  destroy();
  Bool isLowdelay = true;  // whether the coding structure is low delay (POCs never decrease in coding order)
  for ( Int i=0; i<GOPSize-1; i++ )
  {
    if ( GOPList[i].m_POC > GOPList[i+1].m_POC )
    {
      isLowdelay = false;
      break;
    }
  }
  Int numberOfLevel = 1;
  Int adaptiveBit = 0;
  if ( keepHierBits > 0 )
  {
    numberOfLevel = Int( log((Double)GOPSize)/log(2.0) + 0.5 ) + 1;
  }
  if ( !isLowdelay && GOPSize == 8 )
  {
    numberOfLevel = Int( log((Double)GOPSize)/log(2.0) + 0.5 ) + 1;
  }
  numberOfLevel++;    // intra picture
  numberOfLevel++;    // non-reference picture
  Int* bitsRatio;
  bitsRatio = new Int[ GOPSize ]; // initial weight of each frame in the GOP
  for ( Int i=0; i<GOPSize; i++ )
  {
    bitsRatio[i] = 10;
    if ( !GOPList[i].m_refPic )
    {
      bitsRatio[i] = 2;
    }
  }
  if ( keepHierBits > 0 )  // with hierarchical bit allocation each frame gets its own weight, i.e. its share of the bits of the GOP
  {
    Double bpp = (Double)( targetBitrate / (Double)( frameRate*picWidth*picHeight ) );   // bits per pixel: the target bitrate spread over every pixel of every frame; the weight table is chosen according to bpp
    if ( GOPSize == 4 && isLowdelay )    // per-frame weights for low delay
    {
      if ( bpp > 0.2 )
      {
        bitsRatio[0] = 2;
        bitsRatio[1] = 3;
        bitsRatio[2] = 2;
        bitsRatio[3] = 6;
      }
      else if( bpp > 0.1 )
      {
        bitsRatio[0] = 2;
        bitsRatio[1] = 3;
        bitsRatio[2] = 2;
        bitsRatio[3] = 10;
      }
      if ( keepHierBits == 2 )
      {
        adaptiveBit = 1;
      }
    }
    else if ( GOPSize == 8 && !isLowdelay )     // per-frame weights for Random Access
    {
      if ( bpp > 0.2 )
      {
        bitsRatio[0] = 15;
        bitsRatio[1] = 5;
        bitsRatio[2] = 4;
        bitsRatio[3] = 1;
        bitsRatio[4] = 1;
        bitsRatio[5] = 4;
        bitsRatio[6] = 1;
        bitsRatio[7] = 1;
      }
      else if ( bpp > 0.1 )
      {
        bitsRatio[0] = 20;
        bitsRatio[1] = 6;
        bitsRatio[2] = 4;
        bitsRatio[3] = 1;
        bitsRatio[4] = 1;
        bitsRatio[5] = 4;
        bitsRatio[6] = 1;
        bitsRatio[7] = 1;
      }

      if ( keepHierBits == 2 )
      {
        adaptiveBit = 2;
      }
    }
    else
    {
      printf( "\n hierarchical bit allocation is not support for the specified coding structure currently.\n" );
    }
  }
  Int* GOPID2Level = new Int[ GOPSize ];
  for ( Int i=0; i<GOPSize; i++ )
  {
    GOPID2Level[i] = 1;
    if ( !GOPList[i].m_refPic )
    {
      GOPID2Level[i] = 2;
    }
  }
  if ( keepHierBits > 0 )  // GOPID2Level is explained with an illustration after this code block
  {
    if ( GOPSize == 4 && isLowdelay )
    {
      GOPID2Level[0] = 3;
      GOPID2Level[1] = 2;
      GOPID2Level[2] = 3;
      GOPID2Level[3] = 1;
    }
  }
  if ( !isLowdelay && GOPSize == 8 )
  {
    GOPID2Level[0] = 1;
    GOPID2Level[1] = 2;
    GOPID2Level[2] = 3;
    GOPID2Level[3] = 4;
    GOPID2Level[4] = 4;
    GOPID2Level[5] = 3;
    GOPID2Level[6] = 4;
    GOPID2Level[7] = 4;
  }
  m_encRCSeq = new TEncRCSeq;     // pass the values computed above to the sequence-level initialization (TEncRateCtrl.cpp)
  m_encRCSeq->create( totalFrames, targetBitrate, frameRate, GOPSize, picWidth, picHeight, LCUWidth, LCUHeight, numberOfLevel, useLCUSeparateModel, adaptiveBit );
  m_encRCSeq->initBitsRatio( bitsRatio );
  m_encRCSeq->initGOPID2Level( GOPID2Level );
  m_encRCSeq->initPicPara();
  if ( useLCUSeparateModel )
  {
    m_encRCSeq->initLCUPara();
  }
  m_CpbSaturationEnabled = false;
  m_cpbSize              = targetBitrate;
  m_cpbState             = (UInt)(m_cpbSize*0.5f);
  m_bufferingRate        = (Int)(targetBitrate / frameRate);
  delete[] bitsRatio;      // release the temporary arrays
  delete[] GOPID2Level;
}
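The structural checks at the top of init() can be isolated into a small sketch: one function that decides whether a GOP is low delay and one that counts the rate-control levels. The names isLowDelay and numberOfLevels are hypothetical, and the Random Access POC order mentioned in the usage note is my assumption about a typical GOP-8 configuration, not something taken from the code above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A GOP is treated as low delay when its POCs never decrease in coding order.
bool isLowDelay(const std::vector<int>& pocInCodingOrder)
{
  for (std::size_t i = 0; i + 1 < pocInCodingOrder.size(); ++i)
  {
    if (pocInCodingOrder[i] > pocInCodingOrder[i + 1])
    {
      return false;
    }
  }
  return true;
}

// Number of rate-control levels: the hierarchy levels of the GOP,
// plus one level for intra pictures and one for non-reference pictures.
int numberOfLevels(int gopSize, bool keepHierBits, bool lowDelay)
{
  assert(gopSize > 0);
  int levels = 1;
  if (keepHierBits || (!lowDelay && gopSize == 8))
  {
    // round(log2(gopSize)) + 1, written the same way as in init()
    levels = static_cast<int>(std::log(static_cast<double>(gopSize)) / std::log(2.0) + 0.5) + 1;
  }
  return levels + 2;
}
```

With an assumed Random Access coding order of POC 8, 4, 2, 1, 3, 6, 5, 7 the GOP is not low delay and gets 6 levels, while a low-delay GOP of size 4 with hierarchical bit allocation gets 5.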

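PS: The figure below explains the concept of GOPID2Level with a Random Access example (the figure is a bit ugly, sorry...). The numbers 1 to 8 in the figure are the 8 frames of one GOP. After the I frame (0) has been encoded, frame 8 is encoded first. Frames 0 and 8 then serve as the reference frames for frame 4, frames 0 and 4 for frame 2, and frames 0 and 2 for frame 1. The result looks like a hierarchy that is encoded layer by layer, which is why each frame carries a layer identifier marking the layer it belongs to. GOPID2Level is exactly this layer. It is indexed by coding order: GOPID2Level[0] corresponds to frame 8, GOPID2Level[1] to frame 4, GOPID2Level[2] to frame 2, GOPID2Level[3] and GOPID2Level[4] to frames 1 and 3, and so on.

1.2. In TEncTop::encode(), the GOP level is initialized. The main work is the per-frame computation for one GOP, i.e. the target bits of each frame.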
if ( m_RCEnableRateControl )
{
m_cRateCtrl.initRCGOP( m_iNumPicRcvd );
}

initRCGOP() contains only two lines of code. The important one is m_encRCGOP->create( m_encRCSeq, numberOfPictures ), shown below.
Void TEncRCGOP::create( TEncRCSeq* encRCSeq, Int numPic )
{
  destroy();
  Int targetBits = xEstGOPTargetBits( encRCSeq, numPic );     // target bits allocated to this GOP
  if ( encRCSeq->getAdaptiveBits() > 0 && encRCSeq->getLastLambda() > 0.1 ) // normally skipped; only entered when adaptive bit allocation is on (keepHierBits == 2)
  {
    Double targetBpp = (Double)targetBits / encRCSeq->getNumPixel();
    Double basicLambda = 0.0;
    Double* lambdaRatio = new Double[encRCSeq->getGOPSize()];
    Double* equaCoeffA = new Double[encRCSeq->getGOPSize()];
    Double* equaCoeffB = new Double[encRCSeq->getGOPSize()];
    if ( encRCSeq->getAdaptiveBits() == 1 )   // for GOP size =4, low delay case
    {
      if ( encRCSeq->getLastLambda() < 120.0 )
      {
        lambdaRatio[1] = 0.725 * log( encRCSeq->getLastLambda() ) + 0.5793;
        lambdaRatio[0] = 1.3 * lambdaRatio[1];
        lambdaRatio[2] = 1.3 * lambdaRatio[1];
        lambdaRatio[3] = 1.0;
      }
      else
      {
        lambdaRatio[0] = 5.0;
        lambdaRatio[1] = 4.0;
        lambdaRatio[2] = 5.0;
        lambdaRatio[3] = 1.0;
      }
    }
    else if ( encRCSeq->getAdaptiveBits() == 2 )  // for GOP size = 8, random access case
    {
      if ( encRCSeq->getLastLambda() < 90.0 )
      {
        lambdaRatio[0] = 1.0;
        lambdaRatio[1] = 0.725 * log( encRCSeq->getLastLambda() ) + 0.7963;
        lambdaRatio[2] = 1.3 * lambdaRatio[1];
        lambdaRatio[3] = 3.25 * lambdaRatio[1];
        lambdaRatio[4] = 3.25 * lambdaRatio[1];
        lambdaRatio[5] = 1.3  * lambdaRatio[1];
        lambdaRatio[6] = 3.25 * lambdaRatio[1];
        lambdaRatio[7] = 3.25 * lambdaRatio[1];
      }
      else
      {
        lambdaRatio[0] = 1.0;
        lambdaRatio[1] = 4.0;
        lambdaRatio[2] = 5.0;
        lambdaRatio[3] = 12.3;
        lambdaRatio[4] = 12.3;
        lambdaRatio[5] = 5.0;
        lambdaRatio[6] = 12.3;
        lambdaRatio[7] = 12.3;
      }
    }
    xCalEquaCoeff( encRCSeq, lambdaRatio, equaCoeffA, equaCoeffB, encRCSeq->getGOPSize() );
    basicLambda = xSolveEqua( targetBpp, equaCoeffA, equaCoeffB, encRCSeq->getGOPSize() );
    encRCSeq->setAllBitRatio( basicLambda, equaCoeffA, equaCoeffB );
    delete []lambdaRatio;
    delete []equaCoeffA;
    delete []equaCoeffB;
  }
  m_picTargetBitInGOP = new Int[numPic];
  Int i;
  Int totalPicRatio = 0;
  Int currPicRatio = 0;
  for ( i=0; i<numPic; i++ )      // sum the weights of all frames in the GOP (the weights defined in 1.1)
  {
    totalPicRatio += encRCSeq->getBitRatio( i );
  }
  for ( i=0; i<numPic; i++ )
  {
    currPicRatio = encRCSeq->getBitRatio( i );
    m_picTargetBitInGOP[i] = (Int)( ((Double)targetBits) * currPicRatio / totalPicRatio );     // each frame receives a share of the GOP bits proportional to its weight
  }
  m_encRCSeq    = encRCSeq;
  m_numPic       = numPic;
  m_targetBits   = targetBits;
  m_picLeft      = m_numPic;
  m_bitsLeft     = m_targetBits;
}
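The last part of create() is a plain proportional split, which is easier to see in isolation. The sketch below reproduces only that step; splitGopBits is a hypothetical name, and the weights in the usage note are the low-delay table from 1.1 for bpp > 0.2.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Split a GOP bit budget across pictures in proportion to their integer weights.
// Each share is truncated toward zero, like the (Int) cast in TEncRCGOP::create().
std::vector<int> splitGopBits(int gopTargetBits, const std::vector<int>& weights)
{
  const int totalWeight = std::accumulate(weights.begin(), weights.end(), 0);
  assert(totalWeight > 0);
  std::vector<int> picTargetBits;
  picTargetBits.reserve(weights.size());
  for (int w : weights)
  {
    picTargetBits.push_back(static_cast<int>(static_cast<double>(gopTargetBits) * w / totalWeight));
  }
  return picTargetBits;
}
```

With weights {2, 3, 2, 6} and a GOP budget of 130,000 bits, the four frames receive 20,000, 30,000, 20,000 and 60,000 bits. When the budget is not divisible by the weight sum, truncation leaves the total slightly below the GOP budget.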
1.3. In TEncGOP::compressGOP(), the frame level is initialized.
    if ( m_pcCfg->getUseRateCtrl() ) // TODO: does this work with multiple slices and slice-segments?
    {
      Int frameLevel = m_pcRateCtrl->getRCSeq()->getGOPID2Level( iGOPid );
      if ( pcPic->getSlice(0)->getSliceType() == I_SLICE )
      {
        frameLevel = 0;
      }
      m_pcRateCtrl->initRCPic( frameLevel );    // initialize the frame level

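Inside m_pcRateCtrl->initRCPic(), the call to m_encRCPic->create() sets up the frame-level parameters, including the basic information shown below. (This part is not difficult; the variable names make their meaning clear.)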
Void TEncRCPic::create( TEncRCSeq* encRCSeq, TEncRCGOP* encRCGOP, Int frameLevel, list<TEncRCPic*>& listPreviousPictures )
{
  destroy();
  m_encRCSeq = encRCSeq;
  m_encRCGOP = encRCGOP;
  Int targetBits    = xEstPicTargetBits( encRCSeq, encRCGOP );
  Int estHeaderBits = xEstPicHeaderBits( listPreviousPictures, frameLevel );
  if ( targetBits < estHeaderBits + 100 )
  {
    targetBits = estHeaderBits + 100;   // at least allocate 100 bits for picture data
  }
  m_frameLevel       = frameLevel; // basic information
  m_numberOfPixel    = encRCSeq->getNumPixel();
  m_numberOfLCU      = encRCSeq->getNumberOfLCU();
  m_estPicLambda     = 100.0;
  m_targetBits       = targetBits;
  m_estHeaderBits    = estHeaderBits;
  m_bitsLeft         = m_targetBits;
  Int picWidth       = encRCSeq->getPicWidth();
  Int picHeight      = encRCSeq->getPicHeight();
  Int LCUWidth       = encRCSeq->getLCUWidth();
  Int LCUHeight      = encRCSeq->getLCUHeight();
  Int picWidthInLCU  = ( picWidth  % LCUWidth  ) == 0 ? picWidth  / LCUWidth  : picWidth  / LCUWidth  + 1;
  Int picHeightInLCU = ( picHeight % LCUHeight ) == 0 ? picHeight / LCUHeight : picHeight / LCUHeight + 1;
  m_lowerBound       = xEstPicLowerBound( encRCSeq, encRCGOP );
  m_LCULeft         = m_numberOfLCU;
  m_bitsLeft       -= m_estHeaderBits;
  m_pixelsLeft      = m_numberOfPixel;
  m_LCUs           = new TRCLCU[m_numberOfLCU];
  Int i, j;
  Int LCUIdx;
  for ( i=0; i<picWidthInLCU; i++ )      // initialize the parameters of each LCU
  {
    for ( j=0; j<picHeightInLCU; j++ )
    {
      LCUIdx = j*picWidthInLCU + i;
      m_LCUs[LCUIdx].m_actualBits = 0;
      m_LCUs[LCUIdx].m_QP         = 0;
      m_LCUs[LCUIdx].m_lambda     = 0.0;
      m_LCUs[LCUIdx].m_targetBits = 0;
      m_LCUs[LCUIdx].m_bitWeight  = 1.0;
      Int currWidth  = ( (i == picWidthInLCU -1) ? picWidth  - LCUWidth *(picWidthInLCU -1) : LCUWidth  );
      Int currHeight = ( (j == picHeightInLCU-1) ? picHeight - LCUHeight*(picHeightInLCU-1) : LCUHeight );
      m_LCUs[LCUIdx].m_numberOfPixel = currWidth * currHeight;
    }
  }
  m_picActualHeaderBits = 0;
  m_picActualBits       = 0;
  m_picQP               = 0;
  m_picLambda           = 0.0;
}
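The only arithmetic in this function that deserves a second look is the LCU grid: the picture size in LCUs is a ceiling division, and LCUs on the right and bottom borders are cropped to the picture. A minimal sketch of just that part follows; sizeInLCU and lcuPixelCounts are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// Number of LCUs needed to cover picSize samples (ceiling division).
int sizeInLCU(int picSize, int lcuSize)
{
  assert(picSize > 0 && lcuSize > 0);
  return (picSize % lcuSize == 0) ? picSize / lcuSize : picSize / lcuSize + 1;
}

// Pixel count of every LCU in raster-scan order. LCUs in the last column or
// last row only count the part that lies inside the picture.
std::vector<int> lcuPixelCounts(int picWidth, int picHeight, int lcuWidth, int lcuHeight)
{
  const int widthInLCU  = sizeInLCU(picWidth, lcuWidth);
  const int heightInLCU = sizeInLCU(picHeight, lcuHeight);
  std::vector<int> pixels(static_cast<std::size_t>(widthInLCU) * heightInLCU);
  for (int j = 0; j < heightInLCU; ++j)
  {
    for (int i = 0; i < widthInLCU; ++i)
    {
      const int currWidth  = (i == widthInLCU - 1)  ? picWidth  - lcuWidth  * (widthInLCU - 1)  : lcuWidth;
      const int currHeight = (j == heightInLCU - 1) ? picHeight - lcuHeight * (heightInLCU - 1) : lcuHeight;
      pixels[j * widthInLCU + i] = currWidth * currHeight;
    }
  }
  return pixels;
}
```

For a 1920x1080 picture with 64x64 LCUs this gives a 30 x 17 grid. The bottom row is only 56 pixels high, so its LCUs hold 64 * 56 = 3584 pixels instead of 4096, and the per-LCU counts add up to exactly 1920 * 1080.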
1.4. In TEncSlice::compressSlice(), the LCU level is initialized. The main parameters are bpp, lambda and QP.
    if ( m_pcCfg->getUseRateCtrl() )
    {
      Int estQP        = pcSlice->getSliceQp();
      Double estLambda = -1.0;
      Double bpp       = -1.0;
      // ... (remainder of the block omitted in this excerpt)
    }
That covers the setup of all the initialization parameters in RC.