Shannon-Fano Coding: Principle and Implementation

阿新 • Published: 2018-11-13

Principle of the Shannon-Fano algorithm (Shannon-Fano coding)

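Like Huffman coding, Shannon-Fano coding encodes characters with a binary tree. In practice, however, Shannon-Fano coding is of little use because its coding efficiency is lower than that of Huffman coding. Its average codeword length is never shorter than Huffman's and is often longer. Its basic idea is still worth studying.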

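Following the explanation on Wikipedia, let's look at how the Shannon-Fano algorithm works.

A Shannon-Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple: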

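1. For a given list of symbols, build a corresponding list of probabilities or frequency counts, so that the relative frequency of occurrence of each symbol is known.
2. Sort the list of symbols by frequency, with the most frequently occurring symbols on the left and the least frequent on the right.
3. Divide the list into two parts, so that the total frequency of the left part is as close as possible to the total frequency of the right part.
4. Assign the binary digit 0 to the left part of the list and the digit 1 to the right part. This means that the codes of the symbols in the first part all start with 0, and the codes in the second part all start with 1.
5. Recursively apply steps 3 and 4 to each of the two parts, subdividing the groups and appending bits to the codes, until each symbol has become a leaf of the code tree.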

Example

[Figure: Shannon-Fano coding algorithm]

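This example shows the construction of a Shannon-Fano code for a small alphabet (see figure a). The five symbols to be encoded occur with the following counts: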

Symbol	A	B	C	D	E
Count	15	7	6	6	5
Probabilities	0.38461538	0.17948718	0.15384615	0.15384615	0.12820513

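All symbols are sorted by count from left to right (see figure a). Drawing the dividing line between B and C gives a left group with a total count of 22 and a right group with a total of 17, which minimizes the difference between the two groups. With this division, the codes of A and B both start with 0 and the codes of C, D and E all start with 1, as shown in figure b. Next, a new dividing line is drawn between A and B in the left half of the tree. A becomes a leaf with code 00 and B a leaf with code 01. After four divisions the code tree is complete. As the table below shows, in the final tree the three most frequent symbols receive 2-bit codes and the two less frequent symbols receive 3-bit codes.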

Symbol	A	B	C	D	E
Code	00	01	10	110	111

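Entropy and average codeword length: the entropy $H = -\sum_i p_i \log_2 p_i \approx 2.186$ bits per symbol is the lower bound on the average codeword length of any prefix code for these probabilities. The code above has an average length of $\bar{L} = \sum_i p_i \ell_i = (2 \cdot 15 + 2 \cdot 7 + 2 \cdot 6 + 3 \cdot 6 + 3 \cdot 5)/39 = 89/39 \approx 2.28$ bits per symbol, where $\ell_i$ is the length of the code for symbol $i$.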

Pseudo-code

begin
   count source units
   sort source units to non-decreasing order
   SF-Split(S)
   output(count of symbols, encoded tree, symbols)
   write output
end

procedure SF-Split(S)
begin
   if (|S| > 1) then
   begin
      divide S into S1 and S2 with about the same count of units
      append 1 to the codes in S1
      append 0 to the codes in S2
      SF-Split(S1)
      SF-Split(S2)
   end
end

If the algorithm is still unclear, take a look at the simulation program on this website; it illustrates the process very clearly.

Implementing the Shannon-Fano algorithm (Shannon-Fano coding implementation in C++)

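From the algorithm above, we repeatedly need to find an optimal split point that makes the total frequencies of the left and right subtrees of every node as close as possible. Here I find that point by sequential search. Because the difference between the left and right totals grows monotonically as the split point moves right, binary search (dichotomy) could find it more efficiently.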

/************************************************************************
 * File Name: Shanno-Fano.cpp
 * @Function: Lossless compression
 * @Author: Sophia
 * @Last Modify: 2012-9-26 20:57
 ************************************************************************/
#include <algorithm>
#include <climits>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <map>
#include <vector>
using namespace std;

const int NChar = 8;              // suppose 8 bits are used to describe every symbol
const int Nsymbols = 1 << NChar;  // 256 symbols in total (includes a-z, A-Z)
const int INF = INT_MAX;

typedef vector<bool> SF_Code;     // code of one character
map<char, SF_Code> SF_Dic;        // Shannon-Fano coding dictionary
int Sumvec[Nsymbols + 1];         // Sumvec[i] = total count of the first i symbols after sorting

class HTree {
public:
    HTree* left;
    HTree* right;
    char ch;
    int weight;
    HTree() : left(NULL), right(NULL), ch('\0'), weight(0) {}
    HTree(HTree* l, HTree* r, int w, char c) : left(l), right(r), ch(c), weight(w) {}
    HTree(const HTree&) = delete;             // copying would double-delete children
    HTree& operator=(const HTree&) = delete;
    ~HTree() { delete left; delete right; }   // frees the whole subtree
    bool Isleaf() const { return !left && !right; }
};

bool comp(const HTree* t1, const HTree* t2)   // sort by weight, largest first
{
    return t1->weight > t2->weight;
}

typedef vector<HTree*> TreeVector;
TreeVector TreeArr;   // leaf nodes, sorted by count

// Build the subtree for sorted symbols a..b (1-based, inclusive): find the split
// point x that makes the left and right totals as close as possible, then recurse.
HTree* Optimize_Tree(int a, int b)
{
    if (a == b)                      // one symbol: reuse its leaf node
        return TreeArr[a - 1];
    int x = a, minn = INF;
    // sequential search for the best split; this can also be done by dichotomy
    for (int i = a; i < b; i++) {
        // (left total) - (right total) when the split falls after symbol i
        int curdiff = Sumvec[i] * 2 - Sumvec[a - 1] - Sumvec[b];
        if (abs(curdiff) < minn) {
            x = i;
            minn = abs(curdiff);
        }
        else break;                  // the difference is monotonic, so stop early
    }
    HTree* lc = Optimize_Tree(a, x);
    HTree* rc = Optimize_Tree(x + 1, b);
    return new HTree(lc, rc, Sumvec[b] - Sumvec[a - 1], '\0');
}

HTree* BuildTree(const int* frequency)   // create the tree using Optimize_Tree
{
    TreeArr.clear();
    for (int i = 0; i < Nsymbols; i++)    // one leaf per symbol that occurs
        if (frequency[i])
            TreeArr.push_back(new HTree(NULL, NULL, frequency[i], (char)i));
    if (TreeArr.empty())
        return NULL;
    // stable_sort keeps equal counts in byte order, so the codes are deterministic
    stable_sort(TreeArr.begin(), TreeArr.end(), comp);
    int n = (int)TreeArr.size();
    Sumvec[0] = 0;
    for (int i = 1; i <= n; i++)
        Sumvec[i] = Sumvec[i - 1] + TreeArr[i - 1]->weight;
    return Optimize_Tree(1, n);
}

/************************************************************************
 * Assign Shannon-Fano codes to the leaves of the tree.
 * PS: this generation step is the same as in Huffman coding.
 ************************************************************************/
void Generate_Coding(HTree* root, SF_Code& curcode)
{
    if (root->Isleaf()) {
        // a tree with a single symbol still needs a one-bit code
        SF_Dic[root->ch] = curcode.empty() ? SF_Code(1, false) : curcode;
        return;
    }
    SF_Code lcode = curcode;
    SF_Code rcode = curcode;
    lcode.push_back(false);          // left branch: 0
    rcode.push_back(true);           // right branch: 1
    Generate_Coding(root->left, lcode);
    Generate_Coding(root->right, rcode);
}

int main()
{
    int freq[Nsymbols] = {0};
    const char* str = "bbbbbbbccccccaaaaaaaaaaaaaaaeeeeedddddd"; // 15a, 7b, 6c, 6d, 5e
    while (*str != '\0')             // count character frequencies
        freq[(unsigned char)*str++]++;
    HTree* r = BuildTree(freq);      // build the tree
    if (r == NULL)
        return 0;
    SF_Code nullcode;
    Generate_Coding(r, nullcode);
    for (map<char, SF_Code>::iterator it = SF_Dic.begin(); it != SF_Dic.end(); ++it) {
        cout << it->first << '\t';
        copy(it->second.begin(), it->second.end(), ostream_iterator<bool>(cout));
        cout << endl;
    }
    delete r;                        // the destructor frees the whole tree
    return 0;
}

Result:

We encode the statistics from the figure above as an example.

Symbol	A	B	C	D	E
Count	15	7	6	6	5

References:

Shannon-Fano coding. Wikipedia, the free encyclopedia.
Claude Elwood Shannon. Wikipedia, the free encyclopedia.
C. E. Shannon: A Mathematical Theory of Communication. The Bell System Technical Journal, Vol. 27, July and October 1948.
C. E. Shannon: Prediction and Entropy of Printed English. The Bell System Technical Journal, Vol. 30, 1951.
C. E. Shannon: Communication Theory of Secrecy Systems. The Bell System Technical Journal, Vol. 28, 1949.
http://www.stringology.org/DataCompression/sf/index_en.html

More study material on compression will follow; please follow this blog and Sina Weibo Sophia_qing.
