ZIP File Format

Official documentation

https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.2.0.txt

Format overview

The official documentation gives the overall ZIP format as follows:

  Overall .ZIP file format:

    [local file header 1]
    [file data 1]
    [data descriptor 1]
    . 
    .
    .
    [local file header n]
    [file data n]
    [data descriptor n]
    [archive decryption header] (EFS)
    [archive extra data record] (EFS)
    [central directory]
    [zip64 end of central directory record]
    [zip64 end of central directory locator] 
    [end of central directory record]

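In practice, the ZIP files we usually work with have this layout:

[local file header + file data + data descriptor]{1,n} + central directory + end of central directory record

That is, the [local file header + file data + data descriptor] group appears once for every file in the archive (n times in total), followed by the central directory and the end of central directory record.

This article covers only this common layout. For the complete ZIP file format, see the official documentation.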
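
The layout above can be checked against a real archive. The following is a minimal sketch, assuming Python's standard `zipfile` module as the writer and a small in-memory archive with no archive comment (so the end of central directory record is exactly the last 22 bytes); the entry names and contents are made up for illustration.

```python
import io
import struct
import zipfile

# Build a small two-entry archive in memory (names and contents are illustrative).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "hello " * 20)
    zf.writestr("b.txt", "world " * 20)
    # zipfile records where each local file header was written.
    local_offsets = [info.header_offset for info in zf.infolist()]
data = buf.getvalue()

LOCAL_SIG = b"PK\x03\x04"    # 0x04034b50 stored little-endian
CENTRAL_SIG = b"PK\x01\x02"  # 0x02014b50
EOCD_SIG = b"PK\x05\x06"     # 0x06054b50

# Every entry begins with a local file header signature.
entries_ok = all(data[o:o + 4] == LOCAL_SIG for o in local_offsets)

# With no archive comment the EOCD is the last 22 bytes; its field at
# offset 16 holds the offset of the start of the central directory.
eocd_ok = data[-22:-18] == EOCD_SIG
(cd_offset,) = struct.unpack_from("<I", data, len(data) - 6)
central_ok = data[cd_offset:cd_offset + 4] == CENTRAL_SIG

print("local headers at", local_offsets)
print("central directory at", cd_offset, "of", len(data), "bytes")
```

The local headers come first, the central directory starts after the last entry, and the EOCD closes the file, which matches the simplified layout.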

File data area

[local file header + file data + data descriptor]

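This area holds the contents of all archived files. Each archived file consists of three parts: local file header, file data, and data descriptor. Every archived source file or directory is one record in this area.

Local file header

Marks the start of an entry and records information about the archived file.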

Offset  Bytes  Description
0       4      Local file header signature = 0x04034b50 (read as a little-endian number); fixed value
4       2      Version needed to extract (minimum)
6       2      General purpose bit flag (bit 0 set = encrypted; details below)
8       2      Compression method (details below)
10      2      File last modification time
12      2      File last modification date
14      4      CRC-32
18      4      Compressed size
22      4      Uncompressed size
26      2      File name length (n)
28      2      Extra field length (m)
30      n      File name
30+n    m      Extra field
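
As a minimal sketch of how the fixed 30-byte part of this header can be decoded, the example below uses Python's `struct` module with a little-endian format string. The function name `parse_local_header`, the dictionary keys, and the sample entry are illustrative assumptions, not part of the specification.

```python
import io
import struct
import zipfile
import zlib

# "<" = little-endian without padding; I = 4 bytes, H = 2 bytes.
# signature, version, flags, method, time, date, crc, csize, usize, n, m
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # 30 bytes

def parse_local_header(data, offset=0):
    """Decode one local file header starting at `offset` (hypothetical helper)."""
    (sig, version, flags, method, mtime, mdate,
     crc32, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != 0x04034B50:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size          # offset 30
    extra_start = name_start + name_len              # offset 30 + n
    return {
        "version_needed": version,
        "flags": flags,
        "method": method,
        "crc32": crc32,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "name": data[name_start:extra_start],
        "extra": data[extra_start:extra_start + extra_len],
        "data_offset": extra_start + extra_len,      # where file data begins
    }

# Demo archive: one stored (uncompressed) entry written to a seekable buffer.
payload = b"hello zip"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", payload)
raw = buf.getvalue()

header = parse_local_header(raw)
print(header["name"], header["method"], header["uncompressed_size"])
```

Because the entry is stored without compression, the bytes at `header["data_offset"]` are the original payload, and the CRC-32 field equals `zlib.crc32(payload)`.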

general purpose bit flag: (2 bytes)

      Bit 0: If set, indicates that the file is encrypted.

      (For Method 6 - Imploding)
      Bit 1: If the compression method used was type 6,
             Imploding, then this bit, if set, indicates
             an 8K sliding dictionary was used.  If clear,
             then a 4K sliding dictionary was used.
      Bit 2: If the compression method used was type 6,
             Imploding, then this bit, if set, indicates
             3 Shannon-Fano trees were used to encode the
             sliding dictionary output.  If clear, then 2
             Shannon-Fano trees were used.

      (For Methods 8 and 9 - Deflating)
      Bit 2  Bit 1
        0      0    Normal (-en) compression option was used.
        0      1    Maximum (-exx/-ex) compression option was used.
        1      0    Fast (-ef) compression option was used.
        1      1    Super Fast (-es) compression option was used.

      Note:  Bits 1 and 2 are undefined if the compression
             method is any other.

      Bit 3: If this bit is set, the fields crc-32, compressed 
             size and uncompressed size are set to zero in the 
             local header.  The correct values are put in the 
             data descriptor immediately following the compressed
             data.  (Note: PKZIP version 2.04g for DOS only 
             recognizes this bit for method 8 compression, newer 
             versions of PKZIP recognize this bit for any 
             compression method.)

      Bit 4: Reserved for use with method 8, for enhanced
             deflating. 

      Bit 5: If this bit is set, this indicates that the file is 
             compressed patched data.  (Note: Requires PKZIP 
             version 2.70 or greater)

      Bit 6: Strong encryption.  If this bit is set, you should
             set the version needed to extract value to at least
             50 and you must also set bit 0.  If AES encryption
             is used, the version needed to extract value must 
             be at least 51.

      Bit 7: Currently unused.

      Bit 8: Currently unused.

      Bit 9: Currently unused.

      Bit 10: Currently unused.

      Bit 11: Currently unused.

      Bit 12: Reserved by PKWARE for enhanced compression.

      Bit 13: Used when encrypting the Central Directory to indicate 
              selected data values in the Local Header are masked to
              hide their actual values.  See the section describing 
              the Strong Encryption Specification for details.

      Bit 14: Reserved by PKWARE.

      Bit 15: Reserved by PKWARE.
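
The bit assignments above can be turned into a small decoder. This is only a sketch covering the bits listed in this excerpt; the function name `describe_flags` and the wording of the returned labels are assumptions made for illustration.

```python
# Meaning of (bit 2, bit 1) for methods 8 and 9, indexed by (flags >> 1) & 0b11.
DEFLATE_OPTIONS = {0: "normal", 1: "maximum", 2: "fast", 3: "super fast"}

def describe_flags(flags, method):
    """Return human-readable notes for a general purpose bit flag (hypothetical helper)."""
    notes = []
    if flags & (1 << 0):
        notes.append("encrypted")
    if method == 6:
        # Bits 1 and 2 describe the Imploding parameters.
        notes.append("8K sliding dictionary" if flags & (1 << 1) else "4K sliding dictionary")
        notes.append("3 Shannon-Fano trees" if flags & (1 << 2) else "2 Shannon-Fano trees")
    elif method in (8, 9):
        # Bits 1 and 2 together select the Deflate option.
        notes.append(DEFLATE_OPTIONS[(flags >> 1) & 0b11] + " deflate option")
    # For any other method bits 1 and 2 are undefined, so they are ignored.
    if flags & (1 << 3):
        notes.append("sizes and CRC in data descriptor")
    if flags & (1 << 5):
        notes.append("compressed patched data")
    if flags & (1 << 6):
        notes.append("strong encryption")
    if flags & (1 << 13):
        notes.append("local header values masked")
    return notes

print(describe_flags(0x0009, 8))
print(describe_flags(0x0006, 6))
```

Note that bit 1 is the low bit of the two-bit Deflate option, so a flag value of 0x0002 with method 8 means the maximum compression option.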

compression method: (2 bytes)

      (see accompanying documentation for algorithm
      descriptions)

      0 - The file is stored (no compression)
      1 - The file is Shrunk
      2 - The file is Reduced with compression factor 1
      3 - The file is Reduced with compression factor 2
      4 - The file is Reduced with compression factor 3
      5 - The file is Reduced with compression factor 4
      6 - The file is Imploded
      7 - Reserved for Tokenizing compression algorithm
      8 - The file is Deflated
      9 - Enhanced Deflating using Deflate64(tm)
     10 - PKWARE Data Compression Library Imploding
     11 - Reserved by PKWARE
     12 - File is compressed using BZIP2 algorithm
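
For reading tools it is often enough to map the method number to a label. The sketch below copies the values from the list above into a lookup table and applies it to the `compress_type` attribute that Python's `zipfile` module exposes for each entry; the helper name `method_name` and the demo entries are illustrative.

```python
import io
import zipfile

# Values 0-12 as listed in the excerpt above.
METHODS = {
    0: "Stored (no compression)",
    1: "Shrunk",
    2: "Reduced with compression factor 1",
    3: "Reduced with compression factor 2",
    4: "Reduced with compression factor 3",
    5: "Reduced with compression factor 4",
    6: "Imploded",
    7: "Reserved for Tokenizing compression algorithm",
    8: "Deflated",
    9: "Enhanced Deflating using Deflate64",
    10: "PKWARE Data Compression Library Imploding",
    11: "Reserved by PKWARE",
    12: "BZIP2",
}

def method_name(method):
    """Label a compression method number (hypothetical helper)."""
    return METHODS.get(method, "unknown method %d" % method)

# Demo: one stored and one deflated entry in the same archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", "abc", compress_type=zipfile.ZIP_STORED)
    zf.writestr("packed.txt", "abc" * 100, compress_type=zipfile.ZIP_DEFLATED)
    names = [method_name(info.compress_type) for info in zf.infolist()]

print(names)
```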

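File data

Holds the data of the corresponding archived file.

Data descriptor

Marks the end of an entry's compressed data. This structure is present only when bit 3 of the general purpose bit flag in the corresponding local file header is set, and it immediately follows the compressed data. The data descriptor is used when the output ZIP file cannot be seeked, for example when the archive is written to a non-seekable device such as a tape drive. ZIP files written to disk usually do not need it.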

Offset  Bytes  Description
0       4      CRC-32
4       4      Compressed size
8       4      Uncompressed size
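
A reader can decode these 12 bytes with a single `struct` call. The sketch below also accepts an optional leading signature 0x08074b50, which is not in the table above but which many writers emit in practice; the function name and the sample values are assumptions for illustration.

```python
import struct

DESCRIPTOR_SIG = 0x08074B50  # optional marker written by many tools (not in the table above)

def parse_data_descriptor(data, offset=0):
    """Decode a data descriptor with or without the optional signature (hypothetical helper)."""
    (first,) = struct.unpack_from("<I", data, offset)
    if first == DESCRIPTOR_SIG:
        # Heuristic: a CRC-32 that happens to equal the signature would be
        # misread here; robust readers cross-check against the central directory.
        offset += 4
    crc32, csize, usize = struct.unpack_from("<III", data, offset)
    return {
        "crc32": crc32,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "end_offset": offset + 12,  # first byte after the descriptor
    }

# Made-up sample values, packed once without and once with the signature.
bare = struct.pack("<III", 0xDEADBEEF, 120, 300)
signed = struct.pack("<I", DESCRIPTOR_SIG) + bare

plain_result = parse_data_descriptor(bare)
signed_result = parse_data_descriptor(signed)
print(plain_result["compressed_size"], signed_result["end_offset"])
```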

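Central directory

Records the directory information of the archive. Each record in this area corresponds to one entry in the file data area.

Each central directory file header has the following structure: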

Offset  Bytes  Description
0       4      Central directory file header signature = 0x02014b50
4       2      Version made by
6       2      Version needed to extract (minimum)
8       2      General purpose bit flag
10      2      Compression method
12      2      File last modification time
14      2      File last modification date
16      4      CRC-32
20      4      Compressed size
24      4      Uncompressed size
28      2      File name length (n)
30      2      Extra field length (m)
32      2      File comment length (k)
34      2      Disk number where file starts
36      2      Internal file attributes
38      4      External file attributes
42      4      Relative offset of local header
46      n      File name
46+n    m      Extra field
46+n+m  k      File comment
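
Since every record carries its own lengths n, m, and k, the central directory can be walked record by record. The following sketch assumes a single-disk archive without an archive comment, so the start offset can be read from the last 22 bytes (the end of central directory record); `iter_central_directory` and the demo entries are illustrative names.

```python
import io
import struct
import zipfile

# signature, 6 two-byte fields, 3 four-byte fields, 5 two-byte fields, 2 four-byte fields
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")  # 46 bytes

def iter_central_directory(data, offset):
    """Yield one dict per central directory record (hypothetical helper)."""
    while data[offset:offset + 4] == b"PK\x01\x02":
        (_sig, _made_by, _needed, flags, method, _mtime, _mdate,
         crc32, csize, usize, name_len, extra_len, comment_len,
         _disk, _internal, _external, local_offset) = CENTRAL_HEADER.unpack_from(data, offset)
        name_start = offset + CENTRAL_HEADER.size  # offset 46
        yield {
            "name": data[name_start:name_start + name_len],
            "flags": flags,
            "method": method,
            "crc32": crc32,
            "compressed_size": csize,
            "uncompressed_size": usize,
            "local_header_offset": local_offset,
        }
        # Skip the variable-length tail: file name, extra field, file comment.
        offset = name_start + name_len + extra_len + comment_len

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", "first")
    zf.writestr("b.txt", "second")
data = buf.getvalue()

# Offset of the central directory, taken from the EOCD at the end of the file
# (assumes no archive comment, so the EOCD is exactly the last 22 bytes).
(cd_offset,) = struct.unpack_from("<I", data, len(data) - 6)
entries = list(iter_central_directory(data, cd_offset))
for entry in entries:
    print(entry["name"], entry["local_header_offset"])
```

Each `local_header_offset` points back at a local file header, which is how a reader jumps from the directory to the actual file data.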

End of central directory record (EOCD)

The EOCD sits at the very end of the archive and marks the end of the central directory data. Every ZIP file must contain exactly one EOCD record.

Offset  Bytes  Description
0       4      End of central directory signature = 0x06054b50
4       2      Number of this disk
6       2      Number of the disk with the start of the central directory
8       2      Total number of entries in the central directory on this disk
10      2      Total number of entries in the central directory
12      4      Size of the central directory (bytes)
16      4      Offset of start of central directory with respect to the starting disk number
20      2      .ZIP file comment length (n)
22      n      .ZIP file comment
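
Because the comment has variable length, a reader cannot know in advance where the EOCD starts and typically searches backward from the end of the file for the signature. The sketch below shows one way to do that for a single-disk archive without ZIP64 records; `find_eocd`, the dictionary keys, and the demo comment are assumptions made for illustration.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<IHHHHIIH")  # 22 bytes of fixed fields
EOCD_SIG = b"PK\x05\x06"           # 0x06054b50 stored little-endian

def find_eocd(data):
    """Locate and decode the EOCD by scanning backward (hypothetical helper)."""
    # The comment length is a 2-byte field, so the comment is at most 65535
    # bytes and the EOCD must start within the last 22 + 65535 bytes.
    start = max(0, len(data) - (EOCD.size + 0xFFFF))
    end = len(data)
    while True:
        pos = data.rfind(EOCD_SIG, start, end)
        if pos == -1:
            raise ValueError("end of central directory record not found")
        if pos + EOCD.size <= len(data):
            (_sig, disk, cd_disk, entries_here, entries_total,
             cd_size, cd_offset, comment_len) = EOCD.unpack_from(data, pos)
            # Accept the candidate only if its comment ends exactly at EOF.
            if pos + EOCD.size + comment_len == len(data):
                return {
                    "position": pos,
                    "entries_total": entries_total,
                    "cd_size": cd_size,
                    "cd_offset": cd_offset,
                    "comment": data[pos + EOCD.size:],
                }
        end = pos + 3  # continue with matches that start before this one

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "first")
    zf.writestr("b.txt", "second")
    zf.comment = b"demo comment"
archive = buf.getvalue()

eocd = find_eocd(archive)
print(eocd["entries_total"], eocd["cd_offset"], eocd["comment"])
```

Checking that the comment ends exactly at the end of the file guards against a signature-like byte sequence that merely appears inside the comment or the file data.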
