
The Road of Mathematics: Python Computing in Practice (4): Lempel-Ziv Compression (2)


Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The 'Standard size' column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.
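The difference between standard and native size can be checked directly with `struct.calcsize`. The following is a minimal sketch, written so that it runs on Python 3 (where packed values are `bytes` objects); the variable names are illustrative only, and the native sizes it prints depend on the machine it runs on.

```python
import struct

# With a '<', '>', '!' or '=' prefix, the "standard" sizes apply on every platform.
standard_int = struct.calcsize("<i")    # 4 bytes
standard_long = struct.calcsize("<l")   # 4 bytes

# Without a prefix, native size and alignment are used,
# so these values depend on the platform and compiler.
native_int = struct.calcsize("i")
native_long = struct.calcsize("l")

# The prefix also fixes the byte order of the packed value.
little = struct.pack("<H", 1)   # least significant byte first
big = struct.pack(">H", 1)      # most significant byte first

print("standard: int=%d long=%d" % (standard_int, standard_long))
print("native:   int=%d long=%d" % (native_int, native_long))
print("little=%r big=%r" % (little, big))
```

Using an explicit prefix is the safer choice for a file format, because a file written on one machine can then be read back on another.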

All content on this blog is original. If you repost it, please credit the source:

http://blog.csdn.net/myhaspl/


Format  C Type              Python type         Standard size  Notes
x       pad byte            no value
c       char                string of length 1  1
b       signed char         integer             1              (3)
B       unsigned char       integer             1              (3)
?       _Bool               bool                1              (1)
h       short               integer             2              (3)
H       unsigned short      integer             2              (3)
i       int                 integer             4              (3)
I       unsigned int        integer             4              (3)
l       long                integer             4              (3)
L       unsigned long       integer             4              (3)
q       long long           integer             8              (2), (3)
Q       unsigned long long  integer             8              (2), (3)
f       float               float               4              (4)
d       double              float               8              (4)
s       char[]              string
p       char[]              string
P       void *              integer                            (5), (3)
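The standard sizes in the table can be verified with a few lines of Python. This is only a sketch: the dictionary `table_sizes` is copied by hand from the table, and the rows without a listed standard size (`x`, `s`, `p`, `P`) are left out.

```python
import struct

# Standard sizes as listed in the table (format character -> size in bytes).
table_sizes = {
    "c": 1, "b": 1, "B": 1, "?": 1,
    "h": 2, "H": 2,
    "i": 4, "I": 4, "l": 4, "L": 4,
    "q": 8, "Q": 8,
    "f": 4, "d": 8,
}

# The '<' prefix selects standard size, so the result is platform-independent.
mismatches = [ch for ch, size in sorted(table_sizes.items())
              if struct.calcsize("<" + ch) != size]

# In standard mode no padding is inserted, so sizes simply add up: 2 + 4 + 8.
combined_size = struct.calcsize("<hIq")

print("mismatches: %r" % (mismatches,))
print("size of '<hIq': %d" % combined_size)
```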

struct.pack(fmt, v1, v2, ...)

Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.
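As a small illustration of `struct.pack`, the sketch below packs three values and shows what happens when the arguments do not match the format. It is written to run on Python 3, where the result is a `bytes` object rather than the string described above; the values are arbitrary.

```python
import struct

# Pack a 2-byte unsigned short, a 4-byte unsigned int and a single char,
# little-endian and with standard sizes.
packed = struct.pack("<HIc", 513, 1, b"A")
# 513 = 0x0201 -> b"\x01\x02", 1 -> b"\x01\x00\x00\x00", then b"A"

# The number of arguments must match the format exactly.
pack_failed = False
try:
    struct.pack("<HI", 1)          # one value supplied, two required
except struct.error:
    pack_failed = True

print("%r (%d bytes)" % (packed, len(packed)))
print("mismatched arguments rejected: %s" % pack_failed)
```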

struct.unpack(fmt, string)

Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
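Two points in this description are easy to miss: the result is always a tuple, and the input length must match the format exactly. A minimal sketch, assuming Python 3 and an arbitrary example value:

```python
import struct

data = struct.pack("<I", 9938)

# The result is a tuple even for a single item.
value = struct.unpack("<I", data)
(index,) = value                    # tuple unpacking extracts the integer

# The data must be exactly calcsize(fmt) bytes long.
length_error = None
try:
    struct.unpack("<I", data + b"\x00")   # 5 bytes for a 4-byte format
except struct.error as exc:
    length_error = str(exc)

print("value=%r index=%d" % (value, index))
print("length error raised: %s" % (length_error is not None))
```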

The program reads a text file, compresses it, and then decompresses it. Part of the code is shown below:

# -*- coding: utf-8 -*-
# Lempel-Ziv algorithm
#code:[email protected]
import struct
mystr=""
print "\nReading the source file".decode("utf8")
mytextfile= open('test2.txt','r')
try:
    mystr=mytextfile.read()
finally:
    mytextfile.close()
my_str=mystr
# Code table: maps each code word (a substring) to its index
codeword_dictionary={}
# Length of the text to compress
str_len=len(my_str)
# Maximum code word length so far
dict_maxlen=1
# Position of the text segment to parse (starting point of the next parse)
now_index=0
# Largest index in the code table
max_index=0

# Compressed data
print "\nGenerating compressed data".decode("utf8")
compresseddata=[]
while (now_index<str_len):
    # Number of characters to step forward
    mystep=0
    # Current match length, capped at the length of the remaining text
    now_len=dict_maxlen
    if now_len>str_len-now_index:
        now_len=str_len-now_index
    # Index found in the code table; 0 means nothing was found
    cw_addr=0
    while (now_len>0):
        cw_index=codeword_dictionary.get(my_str[now_index:now_index+now_len])
        if cw_index is not None:
            # Code word found
            cw_addr=cw_index
            mystep=now_len
            break
        now_len-=1
    if cw_addr==0:
        # No code word found: add a new single-character code word
        max_index+=1
        mystep=1
        codeword_dictionary[my_str[now_index:now_index+mystep]]=max_index
        print "don't find the Code word,add Code word:%s index:%d"%(my_str[now_index:now_index+mystep],max_index)
    else:
        # Code word found: add a new code word (the match plus the next character)
        max_index+=1
        if now_index+mystep+1<=str_len:
            codeword_dictionary[my_str[now_index:now_index+mystep+1]]=max_index
            if mystep+1>dict_maxlen:
                dict_maxlen=mystep+1
        print "find the Code word:%s  add Code word:%s index:%d"%(my_str[now_index:now_index+now_len],my_str[now_index:now_index+mystep+1],max_index)
.......
......
        my_codeword_dictionary[my_maxindex]=my_codeword_dictionary[cwkey]+cwlaster        
        uncompressdata.append(my_codeword_dictionary[cwkey])
        uncompressdata.append(cwlaster)     
    print ".",
uncompress_str=uncompress_str.join(uncompressdata)
uncompressstr=uncompress_str
print "\nWriting the decompressed result to file..\n".decode("utf8")
uncompress_file= open('uncompress.txt','w')
try:
    uncompress_file.write(uncompressstr)
    print "\nDecompression succeeded; output written to uncompress.txt!\n".decode("utf8")
finally:
    uncompress_file.close()
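Because the listing above omits its middle part, the complete round trip is hard to follow from the excerpt alone. The following is a self-contained sketch of the textbook LZ78 scheme, which builds the same kind of code table (each new code word is a known code word plus one more character). It is not the author's omitted code: the function names `lz78_compress` and `lz78_decompress` are hypothetical, and it grows the match one character at a time instead of searching downward from `dict_maxlen`.

```python
def lz78_compress(text):
    """Return (index, next_char) pairs; index 0 means 'no known prefix'."""
    dictionary = {}          # code word -> index, starting at 1
    pairs = []
    phrase = ""              # longest known code word matched so far
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate                      # keep extending the match
        else:
            pairs.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known code word: emit it with no next character.
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decompress(pairs):
    """Rebuild the text by replaying the same code table construction."""
    phrases = {0: ""}        # index -> code word
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases[len(phrases)] = phrase              # next free index
        out.append(phrase)
    return "".join(out)


sample = "abababab"
pairs = lz78_compress(sample)
restored = lz78_decompress(pairs)
print(pairs)
print(restored == sample)
```

The decompressor never needs the code table to be stored: it reconstructs the table from the pairs in the same order in which the compressor created it.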

Next, we compress the text of the Chinese Wikipedia article on Python:


Running the program first compresses the input into a compressed file, then opens that file and decompresses it:

$ pypy lempel-ziv-compress.py python.txt python.lzv

………………..

find the Code word: C add Code word: CP index:9938

find the Code word:ython add Code word:ython index:9939

find the Code word:

^ add Code word:

^ h index:9940

find the Code word:ttp add Code word:ttp: index:9941

find the Code word:// add Code word://e index:9942

find the Code word:dit add Code word:ditr index:9943

find the Code word:a. add Code word:a.o index:9944

Generating the compressed data header

Writing the compressed data to the compressed file

…………….

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ……………….

Writing the decompressed result to file..

Decompression succeeded; output written to uncompress.txt!
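The log mentions a header and a write to the compressed file, but the code for that step is not shown. One way such a file could be laid out with `struct` is sketched below. Everything about the layout is an assumption made for illustration: a 4-byte pair count as the header, followed by one 5-byte record (4-byte code table index plus 1-byte character code) per pair. The author's actual format may differ.

```python
import struct

HEADER = "<I"    # hypothetical header: number of pairs
PAIR = "<IB"     # hypothetical record: code table index + character code


def pack_pairs(pairs):
    """Serialize (index, char_code) pairs into one bytes object."""
    chunks = [struct.pack(HEADER, len(pairs))]
    for index, char_code in pairs:
        chunks.append(struct.pack(PAIR, index, char_code))
    return b"".join(chunks)


def unpack_pairs(blob):
    """Read the pair count from the header, then each fixed-size record."""
    (count,) = struct.unpack_from(HEADER, blob, 0)
    offset = struct.calcsize(HEADER)
    record_size = struct.calcsize(PAIR)
    pairs = []
    for _ in range(count):
        pairs.append(struct.unpack_from(PAIR, blob, offset))
        offset += record_size
    return pairs


example_pairs = [(0, 97), (0, 98), (1, 98)]     # (index, ord(char))
blob = pack_pairs(example_pairs)
print("%d bytes" % len(blob))
print(unpack_pairs(blob) == example_pairs)
```

With fixed 5-byte records, a pair saves space only when it stands for a code word longer than five characters, which is one plausible reason a simple implementation achieves modest ratios on small inputs.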

Check the compression result:

$ ls -l -h

…………….

-rw-rw-r-- 1 deep deep 5.0K Jul 1 20:55 lempel-ziv-compress.py

-rw-rw-r-- 1 deep deep 30K Jul 1 20:55 python.lzv

-rw-rw-r-- 1 deep deep 36K Jul 1 20:57 python.txt

-rw-rw-r-- 1 deep deep 36K Jul 1 20:55 uncompress.txt

As the listing above shows, the file is 36K before compression and 30K after.

Compressing the entire source code of SQLite 3.8.5:

$ pypy lempel-ziv-compress.py sqlitesrc.txt sqlitesrc.lzv

Check the compression result:

$ ls -l -h

…………….

-rw-rw-r-- 1 deep deep 3.2M Jul 1 21:18 sqlitesrc.lzv

-rw-rw-r-- 1 deep deep 5.2M Jul 1 21:16 sqlitesrc.txt

-rw-rw-r-- 1 deep deep 5.2M Jul 1 21:18 uncompress.txt

The source is 5.2M before compression and 3.2M after.
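The two results can be compared as ratios. The sketch below uses the sizes reported by `ls -l -h` above; since those figures are rounded, the percentages are approximate. The helper name `compression_ratio` is illustrative.

```python
def compression_ratio(original_size, compressed_size):
    """Compressed size as a fraction of the original size."""
    return float(compressed_size) / original_size


python_ratio = compression_ratio(36, 30)      # python.txt, sizes in K
sqlite_ratio = compression_ratio(5.2, 3.2)    # sqlitesrc.txt, sizes in M

print("python.txt:    %.1f%% of original size" % (100 * python_ratio))
print("sqlitesrc.txt: %.1f%% of original size" % (100 * sqlite_ratio))
```

The much larger SQLite source compresses noticeably better than the short Wikipedia text, which fits how dictionary coders behave: the code table needs input to build up long code words before they start to pay off.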

