#Data Structures and Algorithms Study Notes# PTA17: Huffman Tree & Huffman Code (C/C++)
2018.5.16
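Lately I have been busy with lab projects and assorted chores for the student counselor, so I went half a week without settling down to study. My advisor recently took on a project that goes head-to-head with BOE and put me in overall charge, which was a pleasant surprise. My sense of responsibility tells me not to do a sloppy job, but I still think it is better to try handing the real authority over to my senior labmates.
This problem is arguably the finale of the tree section: in both code volume and conceptual difficulty it is in a different league from the other problems. The task: given N characters with their weights (frequencies) and M student submissions, each of which assigns a code to every one of those characters, output Yes for a submission that is an optimal code and No otherwise.
The problem is about optimal code length, which really means Huffman trees and Huffman codes. In case that does not occur to you, the problem statement opens by introducing David A. Huffman and his paper "A Method for the Construction of Minimum-Redundancy Codes", that is, Huffman codes.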
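Every weighted character sequence can be built into a Huffman tree. Depending on how ties are broken, the shape of the tree may differ, but for a given weighted sequence the optimal code length is fixed: it is the total encoded length (the sum over all characters of frequency × code length) under any one of its Huffman trees.
To build a Huffman tree, repeatedly take the two nodes with the smallest weights and merge them into a binary tree whose root weight is the sum of the two subtree weights, then treat that tree as a single node in the next round. Repeat until all nodes are used up and one tree remains (N leaves take N-1 merges).
Note that a Huffman code always has the optimal length, but a code with the optimal length is not necessarily a Huffman code (the last sentence of the problem statement points this out explicitly, which is rather kind of the problem setter). So validating a submission takes two checks: 1. its total length equals the optimal code length, 2. it can be decoded without ambiguity.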
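The merge procedure described above can be sketched with `std::priority_queue` instead of a hand-written heap. This is only a minimal sketch, not the code used later in this post: the function name `optimalLength` is made up for illustration, and it relies on the fact that the total encoded length equals the sum of the weights of all merged (internal) nodes, so no tree has to be stored.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the optimal total code length
// (sum of frequency * code length) for the given frequencies.
long long optimalLength(const std::vector<int>& freqs) {
    // Min-heap of weights: std::greater puts the smallest weight on top.
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> heap(freqs.begin(), freqs.end());
    long long total = 0;
    // N leaves need N-1 merges.
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();
        long long b = heap.top(); heap.pop();
        total += a + b;      // every merge adds one bit to each leaf below it
        heap.push(a + b);    // the merged tree goes back in as a single node
    }
    return total;
}
```

For the frequencies 4, 2, 1, 1 from the problem statement the merges are 1+1, 2+2 and 4+4, which add up to the 14 bits quoted there.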
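That gives the approach. First build a Huffman tree from the input and obtain the optimal code length. Then check each submission: 1. whether its total length equals the optimal code length, 2. whether it obeys the unambiguous-decoding rule (it is a prefix code, so characters sit only at the leaves of the binary tree).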
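As a small illustration of check 1, the total length of a submission is just the sum of frequency × code length. The helper below is a sketch with a made-up name (`encodedLength`); it assumes the codes are given in the same order as the frequencies, as the problem statement specifies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: total number of bits used by a submission,
// assuming codes[i] is the code of the character with frequency freqs[i].
long long encodedLength(const std::vector<int>& freqs,
                        const std::vector<std::string>& codes) {
    long long total = 0;
    for (std::size_t i = 0; i < freqs.size() && i < codes.size(); ++i) {
        total += static_cast<long long>(freqs[i]) *
                 static_cast<long long>(codes[i].size());
    }
    return total;
}
```

With the frequencies 4, 2, 1, 1, the incorrect code {0, 01, 011, 001} from the problem statement also comes out at 14 bits, which is why the length check alone is not enough and the prefix check is needed as well.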
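Getting the optimal code length requires a Huffman tree, which in turn requires putting the weighted sequence into a min-heap first. In each round, pop the top of the min-heap twice, use the two nodes as the left and right subtrees of a new binary tree, set the new root's weight to their sum, and push the new tree back into the min-heap. Continue until only one element is left in the heap; the final pop yields the complete Huffman tree. (For building a min-heap see: #Data Structures and Algorithms Study Notes# PTA14: Min-Heap and Max-Heap (C/C++).) Then compute the total code length of each submission and compare it with the length of the standard Huffman code. Every insertion into and every pop from the min-heap adjusts the heap along one path between the root and a leaf; see the code comments for the details.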
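To make the "adjust along one path" remark concrete, here is a minimal sketch of a 1-indexed min-heap with a sentinel in slot 0, the same layout the solution in this post uses. It stores plain `int` keys instead of tree nodes to stay short, and the type name `MinHeap` and its members are made up for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Sketch of a 1-indexed binary min-heap; data[0] is a sentinel that is
// never larger than any key, so the sift-up loop needs no bounds check.
struct MinHeap {
    std::vector<int> data{INT_MIN};

    bool empty() const { return data.size() == 1; }

    // Insert: walk from the new last slot up towards the root,
    // moving larger parents down one level.
    void push(int value) {
        std::size_t i = data.size();
        data.push_back(value);
        while (data[i / 2] > value) {
            data[i] = data[i / 2];
            i /= 2;
        }
        data[i] = value;
    }

    // Pop (precondition: !empty()): take the root, then sift the former
    // last element down from the root, moving smaller children up.
    int pop() {
        int top = data[1];
        int last = data.back();
        data.pop_back();
        std::size_t n = data.size() - 1;  // elements left after removal
        if (n == 0) return top;
        std::size_t parent = 1;
        while (2 * parent <= n) {
            std::size_t child = 2 * parent;
            if (child < n && data[child + 1] < data[child]) ++child;
            if (last <= data[child]) break;
            data[parent] = data[child];
            parent = child;
        }
        data[parent] = last;
        return top;
    }
};
```

Both operations touch only one root-to-leaf path, so each costs O(log n) for a heap of n elements.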
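Checking the prefix property means replaying each character's code as a path in a tree (each code describes one path in a binary tree, where 0 goes to the left subtree and 1 to the right subtree). Two situations during this replay show that the prefix requirement is violated (the original post illustrates them with a figure): 1. a later branch passes through or extends beyond a node that has already been marked as a leaf, 2. a later branch ends on a node that is not a leaf, i.e. a node that already has children. See the code comments for the details.
Problem statement: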
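The two failure cases can be sketched with a small trie stored in a `std::vector`, which avoids manual `new`/`delete`. This is an illustrative variant rather than the function used in the solution: the name `isPrefixFree` and the `TrieNode` type are made up, and the sketch assumes every code is a non-empty string of '0' and '1' only.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical helper: true when no code is a prefix of (or equal to)
// another code. Assumes codes contain only '0' and '1'.
bool isPrefixFree(const std::vector<std::string>& codes) {
    struct TrieNode {
        std::array<int, 2> child{{-1, -1}};  // index of 0-child and 1-child
        bool isLeaf = false;                 // true if some code ends here
    };
    std::vector<TrieNode> trie(1);           // trie[0] is the root
    for (const std::string& code : codes) {
        int cur = 0;
        bool createdNode = false;
        for (char bit : code) {
            int b = bit - '0';
            if (trie[cur].child[b] == -1) {
                trie[cur].child[b] = static_cast<int>(trie.size());
                trie.emplace_back();
                createdNode = true;
            }
            cur = trie[cur].child[b];
            // Case 1: the path runs through (or ends on) an existing leaf.
            if (trie[cur].isLeaf) return false;
        }
        // Case 2: the path ended without creating a node, so it stops on
        // an existing internal node and is a prefix of an earlier code.
        if (!createdNode) return false;
        trie[cur].isLeaf = true;
    }
    return true;
}
```

For the examples in the problem statement, {0, 10, 110, 111} and {0, 11, 100, 101} pass, while {0, 01, 011, 001} fails as soon as 01 runs through the leaf created for 0.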
In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i] is the i-th character and code[i] is an non-empty string of no more than 63 '0's and '1's.
Output Specification:
For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.
Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
Implementation:
// HuffmanCodes.cpp : entry point of the console application.
//
#include <cstring>
#include <iostream>
#include <vector>
using namespace std;

// Huffman tree node
class Node {
public:
    Node() : element('\0'), weight(0) {}
    Node(char element, int weight)
        : element(element), weight(weight), left(NULL), right(NULL) {}
    char element;
    int weight;
    Node* left = NULL;
    Node* right = NULL;
    bool isleave = false;   // true when a code ends at this node (a leaf)
};
typedef Node* HFMTree;

// One line of a student submission: a character and its code
class Case {
public:
    char element;
    char route[1000];   // the code as a string of '0' and '1'
    int length;
    int getlength() {
        return strlen(this->route);
    }
};

void Read(int num, vector<HFMTree>& minHeap, vector<HFMTree>& inputlist);
void Insert(vector<HFMTree>& minHeap, HFMTree node);    // insert a node into the min-heap
HFMTree CreateHFMT(vector<HFMTree>& minHeap);           // build the Huffman tree from the min-heap
HFMTree DeleteMinHeap(vector<HFMTree>& minHeap);        // pop the minimum node and re-adjust the min-heap
int getHFMLength(HFMTree hfmtree, int depth);           // total code length of the tree
void Input(vector<Case>& testcase, int num);
bool isOptimalLen(vector<Case>& testcase, vector<HFMTree>& inputlist, int weight);  // check 1: optimal code length
bool isPrefixCode(vector<Case>& testcase);              // check 2: prefix code

int main()
{
    /* Build the Huffman tree from the input and get the optimal code length */
    int num;
    cin >> num;
    vector<HFMTree> minHeap;    // min-heap that stores the input nodes
    vector<HFMTree> inputlist;  // keeps the input order and the weights
    HFMTree flag = new Node('-', -1);   // sentinel at index 0, assumed smaller than any real frequency
    minHeap.push_back(flag);
    Read(num, minHeap, inputlist);
    HFMTree hfmtree = CreateHFMT(minHeap);          // build the Huffman tree from the min-heap
    int optcodelength = getHFMLength(hfmtree, 0);   // optimal code length from the Huffman tree
    /* Check each submission: 1. optimal code length, 2. unambiguous decoding
       (prefix code: characters sit only at the leaves of the binary tree) */
    int count;
    cin >> count;
    for (int i = 0; i < count; i++) {
        vector<Case> testcase;
        Input(testcase, num);
        bool isoptimallen = isOptimalLen(testcase, inputlist, optcodelength);
        bool isprefixcode = isPrefixCode(testcase);
        if (isoptimallen && isprefixcode) {
            cout << "Yes" << endl;
        }
        else {
            cout << "No" << endl;
        }
    }
    return 0;
}

void Read(int num, vector<HFMTree>& minHeap, vector<HFMTree>& inputlist) {
    char element;
    int weight;
    for (int i = 0; i < num; i++) {
        cin >> element >> weight;
        HFMTree node = new Node(element, weight);
        inputlist.push_back(node);
        Insert(minHeap, node);
    }
    //minHeap.erase(minHeap.begin());
}

void Insert(vector<HFMTree>& minHeap, HFMTree node) {
    int index = minHeap.size();
    minHeap.push_back(node);
    // After each insertion, adjust from the bottom up; the sentinel at index 0 stops the loop
    while ((*minHeap[index / 2]).weight > (*node).weight) {
        // Copying only the values is not enough here: the node pointers themselves must move
        //(*minHeap[index]).element = (*minHeap[index / 2]).element;
        //(*minHeap[index]).weight = (*minHeap[index / 2]).weight;
        minHeap[index] = minHeap[index / 2];
        index /= 2;
    }
    minHeap[index] = node;
}

HFMTree CreateHFMT(vector<HFMTree>& minHeap) {
    int size = minHeap.size() - 1;  // number of real nodes (the sentinel is excluded)
    // Perform size-1 merges
    for (int i = 1; i < size; i++) {
        HFMTree node = new Node();
        // Each round, pop the two smallest nodes and make them the children of the new node
        node->left = DeleteMinHeap(minHeap);
        node->right = DeleteMinHeap(minHeap);
        node->weight = node->left->weight + node->right->weight;
        // Push the binary tree rooted at the new node back into the min-heap
        Insert(minHeap, node);
    }
    // The only node left in the min-heap is the finished Huffman tree
    return DeleteMinHeap(minHeap);
}
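
HFMTree DeleteMinHeap(vector<HFMTree>& minHeap) {
    // The heap is empty when only the sentinel is left
    if (minHeap.size() == 1) {
        return NULL;
    }
    // The minimum element sits at the root (index 1); it is returned at the end
    HFMTree node = minHeap[1];
    // Re-adjust the heap
    int size = minHeap.size();
    int parent, child;
    // Take the last element of the min-heap and sift it down from the root
    HFMTree cmp = minHeap[size - 1];
    // Start at the root: parent is the current index, child is the index of its smaller child;
    // each iteration moves parent to the child chosen in the previous iteration.
    // The loop ends when parent has no children (in a lopsided heap parent may not end up
    // on the bottom level of its subtree, but the result is still correct)
    for (parent = 1; 2 * parent < size; parent = child) {
        child = parent * 2;
        // If the left child is not the last element, let child point to the smaller of the two children
        if ((child != size - 1) && ((*minHeap[child]).weight > (*minHeap[child + 1]).weight)) {
            child++;
        }
        // Stop once the last element is no larger than that child
        // (the last element then takes the parent slot, not the child slot)
        if (cmp->weight <= (*minHeap[child]).weight) {
            break;
        }
        else {
            minHeap[parent] = minHeap[child];
        }
    }
    // Put the last element into the current parent slot
    minHeap[parent] = cmp;
    // Remove the last slot. minHeap.erase(minHeap.end()) would be undefined behaviour,
    // because end() does not refer to an element; pop_back() removes the last element
    minHeap.pop_back();
    // Return the former root
    return node;
}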

int getHFMLength(HFMTree hfmtree, int depth) {
    // A leaf contributes weight * depth, i.e. frequency * code length
    if (!hfmtree->left && !hfmtree->right) {
        return hfmtree->weight * depth;
    }
    // Any other node has exactly two subtrees: return the sum of their lengths, one level deeper
    else {
        return getHFMLength(hfmtree->left, depth + 1) + getHFMLength(hfmtree->right, depth + 1);
    }
}

void Input(vector<Case>& testcase, int num) {
    for (int i = 0; i < num; i++) {
        Case inputcase;
        cin >> inputcase.element >> inputcase.route;
        inputcase.length = inputcase.getlength();
        testcase.push_back(inputcase);
    }
}

bool isOptimalLen(vector<Case>& testcase, vector<HFMTree>& inputlist, int weight) {
    // Total length of the submission: sum of code length * frequency.
    // The i-th submitted code belongs to the i-th input character.
    int testweight = 0;
    for (size_t i = 0; i < testcase.size(); i++) {
        testweight += (testcase[i].length * (*inputlist[i]).weight);
    }
    return testweight == weight;
}

bool isPrefixCode(vector<Case>& testcase) {
    bool isprefixcode = true;
    HFMTree newtree = new Node();
    // Two situations violate the prefix requirement:
    // 1. a later branch passes through or extends beyond an existing leaf,
    // 2. a later branch ends on a node that is not a leaf
    for (size_t i = 0; i < testcase.size(); i++) {
        HFMTree point = newtree;
        if (isprefixcode == false) break;
        for (int j = 0; j < testcase[i].length; j++) {
            if (isprefixcode == false) break;
            if (testcase[i].route[j] == '0') {
                // If there is no left child yet, create one
                if (!point->left) {
                    HFMTree newnode = new Node();
                    point->left = newnode;
                    point = point->left;
                    // If this is the last step of the branch, mark the node as a leaf
                    if (j == testcase[i].length - 1) {
                        point->isleave = true;
                    }
                }
                // If the left child exists, move the pointer to it first
                else {
                    point = point->left;
                    // If the left child is a leaf, the submission is invalid
                    if (point->isleave) {
                        isprefixcode = false;
                        break;
                    }
                    // If this is the last step of the branch and the node already has children, the submission is invalid
                    if ((j == testcase[i].length - 1) && (point->left || point->right)) {
                        isprefixcode = false;
                        break;
                    }
                }
            }
            else if (testcase[i].route[j] == '1') {
                // If there is no right child yet, create one
                if (!point->right) {
                    HFMTree newnode = new Node();
                    point->right = newnode;
                    point = point->right;
                    // If this is the last step of the branch, mark the node as a leaf
                    if (j == testcase[i].length - 1) {
                        point->isleave = true;
                    }
                }
                // If the right child exists, move the pointer to it first
                else {
                    point = point->right;
                    // If the right child is a leaf, the submission is invalid
                    if (point->isleave) {
                        isprefixcode = false;
                        break;
                    }
                    // If this is the last step of the branch and the node already has children, the submission is invalid
                    if ((j == testcase[i].length - 1) && (point->left || point->right)) {
                        isprefixcode = false;
                        break;
                    }
                }
            }
        }
    }
    return isprefixcode;
}
#An hour of coding, a second of copying. Leave a comment and a like, thank you#