
#Data Structures and Algorithms Study Notes# PTA17: Huffman Tree & Huffman Code (C/C++)

2018.5.16

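Lately I have been busy with various lab projects and odd jobs for my counselor, so I went half a week without settling down to study. My advisor recently took on a project meant to go head-to-head with BOE and put me in overall charge of it, which was quite a surprise. My sense of responsibility tells me not to do a sloppy job, but I still think it would be better to try handing the real authority over to my senior labmates.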

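This is arguably the capstone problem of the tree unit: in both code size and difficulty of reasoning it is in a different league from the others. The task: given N characters with their weights (frequencies) and M student submissions, each of which assigns a binary codeword to every one of those N characters, print Yes for a submission that is an optimal code and No otherwise.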

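The problem tests the optimal code length, which really means Huffman trees and Huffman codes. In case you don't think of it, the setter even opens the statement by introducing David A. Huffman and his paper "A Method for the Construction of Minimum-Redundancy Codes", i.e. Huffman codes.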

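Any weighted character set can be built into a Huffman tree. Depending on how the tree is built (e.g. how ties are broken), its shape may vary, but for a given set of weights the optimal total code length, i.e. the weighted path length WPL = Σ f[i] × len(code[i]), is fixed: it equals the total length of the Huffman code obtained from any such tree. (To build a Huffman tree, repeatedly merge the two nodes with the smallest weights into a binary tree whose root weight is the sum of the two subtree weights, then treat that tree as a single node and keep merging until every node has been used and one tree remains; N nodes take N-1 merges.)

Note, however, that although a Huffman code always has the optimal length, a code of optimal length need not be a Huffman code (the last sentence of the statement points this out explicitly, which is rather kind of the problem setter). A submission therefore has to be verified on two points: 1. its total length is optimal; 2. it decodes unambiguously, which here means it is a prefix code (no codeword is a prefix of another).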
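The claim that the optimal length is fixed can be checked numerically. The sketch below is a hypothetical helper `optimalCodeLength` that swaps the post's hand-written heap for `std::priority_queue` and never builds the tree; it assumes only the total length is needed and relies on the fact that each merge adds the merged weight to the total.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper (not from the post). Merging two subtrees of weights
// a and b pushes every leaf in them one level deeper, adding a + b bits to the
// total, so the optimal weighted code length equals the sum of the weights of
// the N-1 merged (internal) nodes.
long long optimalCodeLength(const std::vector<int>& freqs) {
    // std::greater turns std::priority_queue into a min-heap.
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> heap;
    for (int f : freqs) heap.push(f);

    long long total = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();  // smallest weight
        long long b = heap.top(); heap.pop();  // second smallest
        total += a + b;                        // cost added by this merge
        heap.push(a + b);                      // merged tree goes back in
    }
    return total;
}
```

For the statement's frequencies 4, 2, 1, 1 the merges cost 2, 4 and 8, giving 14, which matches the 14 bits quoted in the problem.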

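So the approach is clear. First build a Huffman tree from the input weights to obtain the optimal code length. Then check each submission: 1. does its weighted length equal the optimum? 2. does it satisfy the unambiguous-decoding rule (a prefix code, where every character sits on a leaf of the binary code tree)?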
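Check 1 is just a weighted sum. A minimal sketch with a hypothetical `submissionLength` helper, assuming the frequencies and codewords are stored in maps keyed by character (the post's own `isOptimalLen` instead pairs them by input order):

```cpp
#include <map>
#include <string>

// Hypothetical helper (not from the post): total encoded length of a
// submission, i.e. the sum of f[c] * |code[c]| over all characters c.
// Assumes both maps are keyed by the same characters; code.at() throws
// std::out_of_range if a character has no codeword.
long long submissionLength(const std::map<char, int>& freq,
                           const std::map<char, std::string>& code) {
    long long total = 0;
    for (const auto& entry : freq) {
        long long len = static_cast<long long>(code.at(entry.first).size());
        total += entry.second * len;
    }
    return total;
}
```

For frequencies a=4, x=2, u=1, z=1, both the valid code {0, 10, 110, 111} and the ambiguous code {0, 01, 011, 001} from the statement total 14 bits, so the length test alone cannot reject the ambiguous one; that is why check 2 is needed.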

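Getting the optimal code length requires building a Huffman tree, which in turn requires putting the weighted nodes into a min-heap first. Each round pops the top of the heap twice, makes the two nodes the left and right subtrees of a new node whose weight is their sum, and pushes the merged tree back into the heap, repeating until a single node is left: that node is the complete Huffman tree. (For how to build a min-heap, see: #Data Structures and Algorithms Study Notes# PTA14: Min-Heap and Max-Heap (C/C++).) Then compute each submission's weighted code length and compare it with that of the Huffman code. Every insertion into or deletion from the min-heap adjusts the nodes along one path between the root and a leaf, so each costs O(log N); see the code comments for details.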
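A minimal sketch of the sentinel-style array heap described here, reduced to plain `int` weights (the post stores `Node` pointers instead). The struct name `MinHeap` is hypothetical, and the sentinel is `std::numeric_limits<int>::min()` rather than the post's -1, so it stays below any weight.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: slot 0 holds a sentinel smaller than any real weight,
// and the heap occupies slots 1..n, so the parent of slot i is i/2 and its
// children are 2i and 2i+1.
struct MinHeap {
    std::vector<int> a{std::numeric_limits<int>::min()};  // a[0] is the sentinel

    bool empty() const { return a.size() == 1; }

    // Sift up: move larger parents down one level until x fits.
    void push(int x) {
        std::size_t i = a.size();
        a.push_back(x);
        while (a[i / 2] > x) {  // the sentinel stops the loop at the root
            a[i] = a[i / 2];
            i /= 2;
        }
        a[i] = x;
    }

    // Remove and return the minimum (precondition: !empty()).
    int pop() {
        int top = a[1];
        int last = a.back();
        a.pop_back();
        std::size_t n = a.size() - 1;  // number of elements left
        std::size_t parent = 1;
        // Sift down: pull the smaller child up until `last` fits.
        while (2 * parent <= n) {
            std::size_t child = 2 * parent;
            if (child < n && a[child + 1] < a[child]) ++child;
            if (last <= a[child]) break;
            a[parent] = a[child];
            parent = child;
        }
        if (n > 0) a[parent] = last;
        return top;
    }
};
```

Pushing 5, 3, 8, 1, 4 and then popping five times yields 1, 3, 4, 5, 8. The sentinel removes the `i > 1` bounds check from the sift-up loop, which is the design choice the post's `Insert` relies on as well.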

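Checking the prefix property simulates each character's codeword as a path in a binary tree, built step by step (0 means the left child, 1 means the right child). Either of two situations during the simulation shows that the code is not a prefix code (see the figure below): 1. a newly built path runs into or past a node already marked as a leaf, i.e. an earlier codeword is a prefix of (or equal to) the new one; 2. a newly built path ends before reaching a leaf, i.e. it stops on a node that already has children, so the new codeword is a prefix of an earlier one. See the code comments for details.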
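The same two-case trie check can be sketched over plain strings. The function `isPrefixFree` and struct `TrieNode` are hypothetical, the codewords are assumed to contain only '0' and '1', and ownership uses `std::unique_ptr` instead of the post's raw `Node` pointers so the trie is freed automatically.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of the trie simulation: each codeword is walked from
// the root ('0' = left, '1' = right), creating nodes as needed.
struct TrieNode {
    std::unique_ptr<TrieNode> child[2];
    bool isLeaf = false;
};

bool isPrefixFree(const std::vector<std::string>& codes) {
    TrieNode root;
    for (const std::string& code : codes) {
        if (code.empty()) return false;  // the problem guarantees non-empty codes
        TrieNode* p = &root;
        for (char bit : code) {
            std::unique_ptr<TrieNode>& next = p->child[bit == '1'];
            if (!next) next.reset(new TrieNode());
            p = next.get();
            // Case 1: the path runs into (or ends on) an earlier codeword.
            if (p->isLeaf) return false;
        }
        // Case 2: the path ends on a node that already has children,
        // i.e. this codeword is a prefix of an earlier one.
        if (p->child[0] || p->child[1]) return false;
        p->isLeaf = true;
    }
    return true;
}
```

The three valid codes from the statement pass; {0, 01, 011, 001} fails at "01" because the leaf for 'a' (= "0") lies on its path.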

Problem statement:

In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.


Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.

Output Specification:
For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
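The statement's claim about the incorrect code {'a'=0, 'x'=01, 'u'=011, 'z'=001} can be reproduced with a hypothetical `encode` helper that simply concatenates codewords:

```cpp
#include <map>
#include <string>

// Hypothetical helper: concatenate the codewords of `text` under the code
// table `code` (code.at() throws if a character has no codeword).
std::string encode(const std::string& text, const std::map<char, std::string>& code) {
    std::string bits;
    for (char c : text) bits += code.at(c);
    return bits;
}
```

With that table, `encode("aaaxuaxz", ...)` and `encode("aazuaxax", ...)` both give 00001011001001. The root cause is that 'a' = "0" is a prefix of 'x' = "01", exactly the situation the prefix check has to reject.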

Implementation:

// HuffmanCodes.cpp : entry point of the console application.
//

#include <vector>
#include <iostream>
#include <cstring>

using namespace std;

//Huffman tree node
class Node {
public:
	Node() {}
	Node(char element, int weight)
		:element(element), weight(weight), left(NULL), right(NULL) {}

	char element = '\0';
	int weight = 0;
	Node* left = NULL;
	Node* right = NULL;
	bool isleave = false;	//marks the node where a codeword ends (used by isPrefixCode)
};
typedef Node* HFMTree;

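//One line of a student submission: a character and its codeword
class Case {
public:
	char element;
	char route[1000];	//the codeword as a string of '0'/'1' (at most 63 characters per the problem)
	int length;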

	int getlength() {
		return strlen(this->route);
	}
};

void Read(int num, vector<HFMTree>& minHeap, vector<HFMTree>& inputlist);	//read the characters and weights, inserting each into the min-heap
void Insert(vector<HFMTree>& minHeap, HFMTree node);		//insert a node into the min-heap
HFMTree CreateHFMT(vector<HFMTree>& minHeap);			//build the Huffman tree from the min-heap
HFMTree DeleteMinHeap(vector<HFMTree>& minHeap);		//remove and return the minimum node, then re-adjust the min-heap
int getHFMLength(HFMTree hfmtree, int depth);						//weighted code length (WPL) of the tree

void Input(vector<Case>& testcase, int num);		//read one student submission
bool isOptimalLen(vector<Case>& testcase, vector<HFMTree>& inputlist, int weight);	//check for the optimal code length
bool isPrefixCode(vector<Case>& testcase);				//check that the code is a prefix code


int main()
{
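	/*Build the Huffman tree from the input and obtain the optimal code length*/
	int num;
	cin >> num;

	vector<HFMTree> minHeap;		//min-heap used to store the nodes
	vector<HFMTree> inputlist;		//remembers the input order and the weights
	HFMTree flag = new Node('-', -1);	//sentinel in slot 0: smaller than any (non-negative) weight, it stops the sift-up in Insert at the root
	minHeap.push_back(flag);
	Read(num, minHeap, inputlist);

	HFMTree hfmtree;				//build the Huffman tree using the min-heap
	hfmtree = CreateHFMT(minHeap);
	int optcodelength = getHFMLength(hfmtree, 0);	//the optimal code length is the WPL of the Huffman tree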


	/*Check each submission: 1. is the code length optimal, 2. is it unambiguously decodable (a prefix code: every character sits on a leaf of the binary tree)*/
	int count;
	cin >> count;

	for (int i = 0;i < count;i++) {
		vector<Case> testcase;
		Input(testcase, num);
		bool isoptimallen = isOptimalLen(testcase, inputlist, optcodelength);
		bool isprefixcode = isPrefixCode(testcase);
		if (isoptimallen && isprefixcode) {
			cout << "Yes" << endl;
		}
		else {
			cout << "No" << endl;
		}
	}

	return 0;
}

void Read(int num, vector<HFMTree>& minHeap, vector<HFMTree>& inputlist) {
	char element;
	int weight;
	for (int i = 0; i < num; i++) {
		cin >> element >> weight;
		HFMTree node = new Node(element, weight);
		inputlist.push_back(node);
		Insert(minHeap, node);
	}
	//minHeap.erase(minHeap.begin());
}

void Insert(vector<HFMTree>& minHeap, HFMTree node) {
	int index = minHeap.size();
	minHeap.push_back(node);

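	//after each insertion, adjust bottom-up (sift up); the sentinel in slot 0 stops the loop at the root
	while ((*minHeap[index / 2]).weight > (*node).weight) {
		//do not just copy the element/weight values here (as in the two commented-out lines); move the node pointers, because merged nodes carry whole subtrees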
		//(*minHeap[index]).element = (*minHeap[index / 2]).element;
		//(*minHeap[index]).weight = (*minHeap[index / 2]).weight;
		minHeap[index] = minHeap[index / 2];
		index /= 2;
	}
	minHeap[index] = node;
}

HFMTree CreateHFMT(vector<HFMTree>& minHeap) {

	HFMTree hfmtree;
	int size = minHeap.size() - 1;	//number of nodes, excluding the sentinel
	//perform size-1 merges
	for (int i = 1; i < size; i++) {
		HFMTree node = new Node();
		//each round, take the two top nodes of the min-heap as this node's left and right children
		node->left = DeleteMinHeap(minHeap);
		node->right = DeleteMinHeap(minHeap);
		node->weight = node->left->weight + node->right->weight;
		//push the binary tree rooted at this node back into the min-heap
		Insert(minHeap, node);
	}

	//the last node left in the heap is the finished Huffman tree
	hfmtree = DeleteMinHeap(minHeap);

	return hfmtree;
}

HFMTree DeleteMinHeap(vector<HFMTree>& minHeap) {
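	//check whether the heap is empty (only the sentinel is left)
	if (minHeap.size() == 1) {
		return NULL;
	}

	//the minimum element is at the root; keep it to return later
	HFMTree node = minHeap[1];

	//re-adjust the heap
	int size = minHeap.size();
	int parent, child;
	//take the last element of the min-heap and sift it down from the root
	HFMTree cmp = minHeap[size - 1];

	//parent starts at the root and child tracks its smaller child; each iteration moves parent down to child
	//the loop ends when parent reaches the bottom level (in edge cases, e.g. a lopsided heap, parent may stop above the bottom of its subtree, but this does not affect the result)
	for (parent = 1; 2 * parent < size; parent = child) {
		child = parent * 2;
		//if child is not the last node, a right child exists: point child at the smaller of the two
		if ((child != size - 1) && ((*minHeap[child]).weight > (*minHeap[child + 1]).weight)) {
			child++;
		}
		//stop once the last node's weight is <= the child's (the last node then fills the parent slot, not the child slot)
		if (cmp->weight <= (*minHeap[child]).weight) {
			break;
		}
		else {
			minHeap[parent] = minHeap[child];
		}
	}
	//put the last node into the current parent slot
	minHeap[parent] = cmp;

	//remove the now-duplicated last slot
	//note: minHeap.erase(minHeap.end()) would be undefined behavior, because end() does not refer to an element; pop_back() removes the last one
	minHeap.pop_back();

	//return the saved minimum
	return node;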
}

int getHFMLength(HFMTree hfmtree, int depth) {
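	//leaf: return its weight times its depth (the length of its codeword)
	if (!hfmtree->left && !hfmtree->right) {
		return hfmtree->weight * depth;
	}
	//otherwise the node is internal and always has two subtrees: return the sum of their lengths, one level deeper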
	else {
		return getHFMLength(hfmtree->left, depth + 1) + getHFMLength(hfmtree->right, depth + 1);
	}
}

void Input(vector<Case>& testcase, int num) {
	for (int i = 0;i < num;i++) {
		Case inputcase;
		cin >> inputcase.element >> inputcase.route;
		inputcase.length = inputcase.getlength();
		testcase.push_back(inputcase);
	}
}

bool isOptimalLen(vector<Case>& testcase, vector<HFMTree>& inputlist, int weight) {
	int testweight = 0;
	for (int i = 0;i < testcase.size();i++) {
		testweight += (testcase[i].length * (*inputlist[i]).weight);
	}
	if (testweight == weight) {
		return true;
	}
	else {
		return false;
	}

}

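//free a trie built by isPrefixCode
void FreeTree(HFMTree tree) {
	if (!tree) return;
	FreeTree(tree->left);
	FreeTree(tree->right);
	delete tree;
}

bool isPrefixCode(vector<Case>& testcase) {
	bool isprefixcode = true;
	HFMTree newtree = new Node();

	//two situations show that the code is not a prefix code: 1. a later path runs into or past a node already marked as a leaf, 2. a later path ends before reaching a leaf (it stops on a node that has children)
	for (int i = 0;i < testcase.size();i++) {
		HFMTree point = newtree;
		if (isprefixcode == false)break;

		for (int j = 0;j < testcase[i].length;j++) {

			if (isprefixcode == false)break;

			if (testcase[i].route[j] == '0') {
				//check whether the left child exists; if not, create it
				if (!point->left) {
					HFMTree newnode = new Node();
					point->left = newnode;
					point = point->left;
					//if this is the last step of the path, mark the node as a leaf
					if (j == testcase[i].length - 1) {
						point->isleave = true;
					}
				}
				//if the left child exists, move the pointer to it first
				else {
					point = point->left;
					//the left child is a leaf: an earlier codeword is a prefix of (or equal to) this one
					if (point->isleave) {
						isprefixcode = false;
						break;
					}
					//last step, but the node still has children: this codeword is a prefix of an earlier one
					if ((j == testcase[i].length - 1) && (point->left || point->right)) {
						isprefixcode = false;
						break;
					}
				}
			}
			else if (testcase[i].route[j] == '1') {
				//check whether the right child exists; if not, create it
				if (!point->right) {
					HFMTree newnode = new Node();
					point->right = newnode;
					point = point->right;
					//if this is the last step of the path, mark the node as a leaf
					if (j == testcase[i].length - 1) {
						point->isleave = true;
					}
				}
				//if the right child exists, move the pointer to it first
				else {
					point = point->right;
					//the right child is a leaf: an earlier codeword is a prefix of (or equal to) this one
					if (point->isleave) {
						isprefixcode = false;
						break;
					}
					//last step, but the node still has children: this codeword is a prefix of an earlier one
					if ((j == testcase[i].length - 1) && (point->left || point->right)) {
						isprefixcode = false;
						break;
					}
				}
			}
		}
	}

	FreeTree(newtree);	//release the trie so that up to M submissions do not pile up leaked nodes
	return isprefixcode;
}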

#An hour of coding, a second of copying. Please leave a comment and a like, thank you#