Implementing the LZW String Compression Algorithm

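A worked example of the idea behind LZW:

The original input data is: A B A B A B A B B B A B A B A A C D A C D A D C A B A A A B A B .....
Compressing it with LZW can be laid out step by step in a table.
Note that the original data contains only 4 distinct characters: A, B, C and D.
Two bits are enough to represent them. Following the LZW convention (the one used by GIF), the code width is first extended by one bit to 3 bits, and two control codes are reserved: Clear = 2^2 = 4 and End = Clear + 1 = 5.
The initial code table should therefore be: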


Code     0   1   2   3   4       5
Meaning  A   B   C   D   Clear   End
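
The arithmetic behind this initial table can be sketched in a few lines. This is only an illustration: the struct InitialCodes and the function makeInitialCodes are hypothetical names, not part of the implementation further down, and the sketch assumes the GIF-style convention described above (Clear = 2^n, End = Clear + 1, new entries start right after End).

```cpp
// Hypothetical helper (not from the original post): derives the control
// codes for an alphabet whose symbols fit in bitsPerSymbol bits.
struct InitialCodes {
    int clearCode;      // 2^bitsPerSymbol
    int endCode;        // clearCode + 1
    int firstFreeCode;  // first code available for new table entries
    int codeWidth;      // initial code width in bits: bitsPerSymbol + 1
};

InitialCodes makeInitialCodes(int bitsPerSymbol) {
    InitialCodes c;
    c.clearCode = 1 << bitsPerSymbol;   // assumed convention: Clear = 2^n
    c.endCode = c.clearCode + 1;        // End directly follows Clear
    c.firstFreeCode = c.clearCode + 2;  // new entries start after End
    c.codeWidth = bitsPerSymbol + 1;    // one extra bit for the new codes
    return c;
}
```

With bitsPerSymbol = 2 (the four characters A, B, C, D) this gives Clear = 4, End = 5, a first free code of 6 and a 3-bit code width, matching the table above.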

The compression then proceeds as follows:

Step  Prefix  Suffix  Entry    Known (Y/N)  Output  New code
1             A       (,A)
2     A       B       (A,B)    N            A       6
3     B       A       (B,A)    N            B       7
4     A       B       (A,B)    Y
5     6       A       (6,A)    N            6       8
6     A       B       (A,B)    Y
7     6       A       (6,A)    Y
8     8       B       (8,B)    N            8       9
9     B       B       (B,B)    N            B       10
10    B       B       (B,B)    Y
11    10      A       (10,A)   N            10      11
12    A       B       (A,B)    Y

.....

By the time step 12 is reached, the code table should be:

Code     0   1   2   3   4       5     6    7    8    9    10   11
Meaning  A   B   C   D   Clear   End   AB   BA   6A   8B   BB   10A
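
The trace above can be reproduced with a small sketch that stores table entries the same way the Entry column does, as (prefix code, suffix symbol) pairs. The function name compressABCD and its parameters are hypothetical, chosen only for this illustration. It assumes the input contains nothing but the letters A to D (mapped to codes 0 to 3) and it does not emit the Clear or End codes.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post). Returns the emitted
// codes and records each new (prefix, suffix) entry in creation order;
// the i-th recorded entry gets code 6 + i.
std::vector<int> compressABCD(const std::string& input,
                              std::vector<std::pair<int, int> >& newEntries) {
    std::map<std::pair<int, int>, int> table;
    int nextCode = 6;  // 0..3 = A..D, 4 = Clear, 5 = End
    std::vector<int> output;
    if (input.empty())
        return output;

    int prefix = input[0] - 'A';  // step 1: the first symbol becomes the prefix
    for (std::string::size_type i = 1; i < input.size(); ++i) {
        int suffix = input[i] - 'A';
        std::pair<int, int> entry(prefix, suffix);
        std::map<std::pair<int, int>, int>::const_iterator found = table.find(entry);
        if (found != table.end()) {
            prefix = found->second;     // known (Y): continue with its code
        } else {
            output.push_back(prefix);   // unknown (N): emit the prefix
            table[entry] = nextCode++;  // and assign the next free code
            newEntries.push_back(entry);
            prefix = suffix;
        }
    }
    output.push_back(prefix);  // flush the pending prefix at end of input
    return output;
}
```

Called on the first 12 characters, "ABABABABBBAB", it emits 0, 1, 6, 8, 1, 10 (that is A, B, 6, 8, B, 10, as in the Output column) followed by a final 6, which is only the flush of the prefix still pending at step 12. The recorded entries are (A,B), (B,A), (6,A), (8,B), (B,B), (10,A), corresponding to codes 6 through 11.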


Implementation:

#include <algorithm>
#include <string>
#include <map>
#include <iostream>
#include <iterator>
#include <vector>

// Compress a string to a list of output codes.
// The codes are appended to "vec".
void compress(const std::string &uncompressed, std::vector<int>& vec) {
	// Build the dictionary: one entry for every single-byte string.
	int dictSize = 256;
	std::map<std::string,int> dictionary;
	for (int i = 0; i < 256; i++)
	{
		dictionary[std::string(1, static_cast<char>(i))] = i;
	}

	std::string w;
	for (std::string::const_iterator it = uncompressed.begin();
		 it != uncompressed.end(); ++it) {
		char c = *it;
		std::string wc = w + c;
		if (dictionary.count(wc))
			w = wc;
		else {
			vec.push_back(dictionary[w]);
			// Add wc to the dictionary.
			dictionary[wc] = dictSize++;
			w = std::string(1, c);
		}
	}
 
	// Output the code for w.
	if (!w.empty())
		vec.push_back( dictionary[w]);
}
 
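// Decompress a list of output codes to a string.
// "vec" must hold codes produced by compress(); an empty list
// yields an empty string.
std::string decompress(const std::vector<int>& vec) {
	// Guard: an empty input would otherwise dereference vec.begin() below.
	if (vec.empty())
		return std::string();

	// Build the dictionary.
	int dictSize = 256;
	std::map<int,std::string> dictionary;
	for (int i = 0; i < 256; i++)
		dictionary[i] = std::string(1, static_cast<char>(i));

	// The first code always stands for a single character (code < 256).
	std::vector<int>::const_iterator it = vec.begin();
	std::string w(1, static_cast<char>(*it));
	std::string result = w;
	std::string entry;
	for ( it++; it != vec.end(); it++) {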
		int k = *it;
		if (dictionary.count(k))
			entry = dictionary[k];
		else if (k == dictSize)
			entry = w + w[0];
		else
			throw "Bad compressed k";
 
		result += entry;
 
		// Add w+entry[0] to the dictionary.
		dictionary[dictSize++] = w + entry[0];
 
		w = entry;
	}
	return result;
}
 
int main() {
	std::vector<int> compressed;
	compress("TOBEORNOTTOBEORTOBEORNOT", compressed);
	std::copy(compressed.begin(), compressed.end(), std::ostream_iterator<int>(std::cout, ", "));
	std::cout << std::endl;
	std::string decompressed = decompress(compressed);
	std::cout << decompressed << std::endl;

	return 0;
}
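
The `k == dictSize` branch in decompress() handles the one case where the decoder receives a code that it has not yet added to its own dictionary. The sketch below is a hypothetical variant of the decoder (the name decompressCounting and the counter parameter are assumptions made for illustration) that counts how often that branch is taken, so the effect can be observed on a small input.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical variant of decompress() (not from the original post) that
// also reports how many times the "code not yet in the dictionary" branch
// is taken.
std::string decompressCounting(const std::vector<int>& codes, int& specialCaseHits) {
    specialCaseHits = 0;
    if (codes.empty())
        return std::string();
    if (codes[0] < 0 || codes[0] >= 256)
        throw std::runtime_error("first code must be a single character");

    std::map<int, std::string> dictionary;
    for (int i = 0; i < 256; i++)
        dictionary[i] = std::string(1, static_cast<char>(i));
    int dictSize = 256;

    std::string w(1, static_cast<char>(codes[0]));
    std::string result = w;
    for (std::vector<int>::size_type i = 1; i < codes.size(); ++i) {
        int k = codes[i];
        std::string entry;
        if (dictionary.count(k)) {
            entry = dictionary[k];
        } else if (k == dictSize) {
            // The encoder used an entry it had only just created; that entry
            // must be the previous string plus its own first character.
            entry = w + w[0];
            ++specialCaseHits;
        } else {
            throw std::runtime_error("bad compressed code");
        }
        result += entry;
        dictionary[dictSize++] = w + entry[0];
        w = entry;
    }
    return result;
}
```

For the codes 65, 66, 256, 258, which is what compress() above produces for "ABABABA" on an ASCII system, the sketch returns "ABABABA" and takes the special branch once: code 258 (ABA) arrives before the decoder has created entry 258.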



Implementations in various languages:

http://rosettacode.org/wiki/LZW_compression