
How MySQL InnoDB Stores Large Objects (BLOB, etc.): An In-Depth Analysis

                       Externally Stored Fields in InnoDB


This article discusses the storage (inline and external) of field data in the InnoDB storage engine. Fields of all variable-length types, such as VARCHAR, VARBINARY, BLOB and TEXT, can be stored inline within the clustered index record, or stored externally in separate BLOB pages outside of the index record (but within the same tablespace). All of these fields can be classified as large objects, which are either binary large objects or character large objects. Binary large objects do not have an associated character set, whereas character large objects do.
>> Translator's note: InnoDB tables are index-organized. Whether a large object is stored in the B-tree (index) pages or in external overflow pages depends mainly on three factors: 1) the size of the large object; 2) the size of the whole row; 3) the InnoDB row format.

Within the InnoDB storage engine, character large objects and binary large objects are handled in exactly the same way. Throughout this article we use the term “BLOB field” to refer to any of the aforementioned field types that can be chosen for external storage.    >> Translator's note: “BLOB field” is therefore an umbrella term for all the large-object types MySQL supports, such as BLOB, TEXT and LONG VARCHAR (and presumably LONG VARBINARY as well).

This article covers the following:

  • When a BLOB field is stored inline and when it is stored externally, with respect to the clustered index record.
  • The structure of the BLOB reference.
  • The BLOB prefix that is stored in the clustered index record when the BLOB is stored externally.    >> Translator's note: the prefix (the first 768 bytes of the value) exists only for the REDUNDANT and COMPACT row formats. The DYNAMIC and COMPRESSED row formats, which innodb_file_format=Barracuda adds on top of REDUNDANT and COMPACT, keep only the 20-byte BLOB reference in the index page.
  • Utility gdb functions to examine the BLOB reference and the record offsets array.

The BLOB fields are associated with the clustered index records (the primary key) of a table. Only the clustered index can store a BLOB field externally; a secondary index cannot have externally stored fields. For the purposes of this article, we won’t deal with any secondary indexes.

The Schema

The following example table is used to present the information:

Database schema (MySQL):

```sql
CREATE TABLE t1 (f1 INT PRIMARY KEY, f2 BLOB, f3 TEXT);
INSERT INTO t1 VALUES (1, REPEAT('௲', 1000), REPEAT('௱', 1000));
INSERT INTO t1 VALUES (2, REPEAT('௲', 20000), REPEAT('௱', 20000));
INSERT INTO t1 VALUES (3, REPEAT('௲', 20000), REPEAT('௱', 1500));
INSERT INTO t1 VALUES (4, REPEAT('௲', 1500), REPEAT('௱', 20000));
```
Note: ௱ – Tamil number one hundred (Unicode 0BF1), ௲ – Tamil number one thousand (Unicode 0BF2)

A single clustered index record can have one or more externally stored BLOBs. So for the given table definition of t1, there are four possible ways in which the BLOB fields f2 and f3 can be stored:

  1. f2 and f3 are both stored inline within the clustered index page
  2. f2 is stored inline, while f3 is stored externally
  3. f3 is stored inline, while f2 is stored externally
  4. both f2 and f3 are stored externally

In the following sections, let us see which of the BLOB columns are stored externally and which are stored inline for each sample row created above. Note that the row format of table t1 is not explicitly specified, so in MySQL 5.6 it defaults to the COMPACT row format. Please keep this in mind as we discuss the example.

Overview of BLOB Storage

The BLOB data can be stored inline in the clustered index record, or externally in separate BLOB pages. These external BLOB pages are allocated from the same tablespace in which the clustered index resides. However, BLOB data is always stored inline whenever possible. If and only if this is not possible because of the record size (that is, when the record is too large for an index page to hold at least two records), the BLOB field is stored externally. This is true for all of the current row formats: REDUNDANT, COMPACT, DYNAMIC, and COMPRESSED. Let’s now take a look at the storage details for the BLOB columns in our example table.

In MySQL 5.6, the default row format of a table is COMPACT, so that is what our t1 table uses. The default page size is 16K, which is also what we are using. The maximum record size that can be stored in a 16K page using the COMPACT row format is 8126 bytes. The function page_get_free_space_of_empty() returns the total free space available in an empty page; dividing that value by 2 gives the maximum allowed record size. The division by 2 is required because an index page must contain at least 2 records. Let’s look at an example (the argument value of “1” tells the function that the row format of the page is COMPACT):

```
(gdb) call page_get_free_space_of_empty(1)/2
$4 = 8126
(gdb)
```
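As a cross-check, the 8126 bytes can be re-derived by hand from the layout of an empty 16K COMPACT page. The sketch below is only an illustration: the constant values (38-byte FIL header, 36-byte page header, two 10-byte file segment headers, 5-byte record headers plus 8 bytes of payload for the infimum and supremum records, an 8-byte FIL trailer and two 2-byte directory slots) are assumptions taken from the MySQL 5.6 InnoDB headers, not from this article.

```python
# Sketch: re-deriving page_get_free_space_of_empty(1) / 2 for a 16K COMPACT page.
# All constant values are assumptions based on the MySQL 5.6 InnoDB headers.
UNIV_PAGE_SIZE = 16 * 1024
FIL_PAGE_DATA = 38          # FIL header at the start of every page
PAGE_HEADER_SIZE = 36       # index page header fields
FSEG_HEADER_SIZE = 10       # each of the two file segment headers
REC_N_NEW_EXTRA_BYTES = 5   # COMPACT record header size
INFIMUM_SUPREMUM_DATA = 8   # payload of the infimum / supremum system records
FIL_PAGE_DATA_END = 8       # FIL trailer at the end of every page
PAGE_DIR_SLOT_SIZE = 2      # an empty page still has two directory slots

PAGE_DATA = FIL_PAGE_DATA + PAGE_HEADER_SIZE + 2 * FSEG_HEADER_SIZE
PAGE_NEW_SUPREMUM_END = PAGE_DATA + 2 * (REC_N_NEW_EXTRA_BYTES + INFIMUM_SUPREMUM_DATA)


def free_space_of_empty_compact_page() -> int:
    """Bytes left for user records on an empty COMPACT index page."""
    return (UNIV_PAGE_SIZE - PAGE_NEW_SUPREMUM_END
            - FIL_PAGE_DATA_END - 2 * PAGE_DIR_SLOT_SIZE)


# A page must hold at least two records, hence the division by 2.
MAX_REC_SIZE = free_space_of_empty_compact_page() // 2

print(PAGE_DATA, PAGE_NEW_SUPREMUM_END, MAX_REC_SIZE)  # 94 120 8126
```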

The following table shows how the BLOB columns of each row in table t1 are stored. Each Tamil character occupies 3 bytes in UTF-8, so REPEAT('௲', 1000) yields 3000 bytes and REPEAT('௲', 20000) yields 60000 bytes. Given the maximum allowed record size of 8126 bytes, it is clear why 60000 bytes of BLOB data is always stored externally: it simply will not fit within a single clustered index record.

| PRIMARY KEY | LENGTH OF F2 | STORAGE OF F2 | LENGTH OF F3 | STORAGE OF F3 |
|---|---|---|---|---|
| 1 | 3000 bytes | Inline | 3000 bytes | Inline |
| 2 | 60000 bytes | External | 60000 bytes | External |
| 3 | 60000 bytes | External | 4500 bytes | Inline |
| 4 | 4500 bytes | Inline | 60000 bytes | External |
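The byte lengths in this table follow from the UTF-8 encoding of the two Tamil numerals. A quick Python check, assuming the client connection and the TEXT column both use a UTF-8 character set (the article does not state the character set explicitly):

```python
# Byte lengths of the sample values, assuming a UTF-8 client character set.
TAMIL_ONE_THOUSAND = "\u0bf2"  # ௲
TAMIL_ONE_HUNDRED = "\u0bf1"   # ௱

# primary key -> (REPEAT count for f2, REPEAT count for f3), from the INSERTs above
ROWS = {1: (1000, 1000), 2: (20000, 20000), 3: (20000, 1500), 4: (1500, 20000)}


def byte_lengths(rows):
    """Map each primary key to the UTF-8 byte lengths of (f2, f3)."""
    return {
        pk: (len((TAMIL_ONE_THOUSAND * n2).encode("utf-8")),
             len((TAMIL_ONE_HUNDRED * n3).encode("utf-8")))
        for pk, (n2, n3) in rows.items()
    }


for pk, (f2_len, f3_len) in byte_lengths(ROWS).items():
    print(pk, f2_len, f3_len)
```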

Inline Storage of BLOBs

As mentioned previously, no BLOB fields are stored externally if the record size is less than the maximum record size allowed in a page. In our example table, the row with primary key value 1 needs no externally stored BLOB fields, because its full record size is less than 8126 bytes. The following table gives the sizing details for each row in our example table:

| PRIMARY KEY | CLUSTERED INDEX RECORD SIZE (IN BYTES) | FIELDS MOVED OUT | RECORD SIZE AFTER MOVING BLOB OUTSIDE (IN BYTES) | MAXIMUM ALLOWED RECORD SIZE (IN BYTES) |
|---|---|---|---|---|
| 1 | 6027 | - | 6027 | 8126 |
| 2 | 120027 | f2, f3 | 1603 | 8126 |
| 3 | 64527 | f2 | 5315 | 8126 |
| 4 | 64527 | f3 | 5315 | 8126 |

As we can see, BLOB fields are moved to external storage until the record size falls below the limit. In the table above, column 2 gives the initial clustered index record size. If this size is greater than the maximum allowed record size (column 5), the function dtuple_convert_big_rec() is invoked to choose the fields destined for external storage. Column 3 lists the fields chosen by this function, and column 4 shows the clustered index record size after moving them to external storage. This value must be less than the maximum record size in column 5 (8126 bytes in our example).

All of the size details above were obtained with the debugger from the callers of this function.

For more clarity, let me explain the clustered index record length for the first row, with primary key value 1 (shown as 6027 bytes). The lengths of the user fields f1, f2 and f3 are 4 bytes, 3000 bytes and 3000 bytes respectively. The lengths of the system fields DB_ROLL_PTR and DB_TRX_ID are 7 bytes and 6 bytes respectively. The record header stores the lengths of the f2 and f3 fields, taking 4 bytes (2 bytes per field). The record header also contains a null bit array, which takes 1 byte for this record. Lastly, the record header contains REC_N_NEW_EXTRA_BYTES of additional information, which is 5 bytes for the COMPACT row format. The complete storage details for this record are presented in the following table:

| FIELD | LENGTH (IN BYTES) |
|---|---|
| Length of f3 | 2 |
| Length of f2 | 2 |
| Null bit array | 1 |
| REC_N_NEW_EXTRA_BYTES | 5 |
| f1 | 4 |
| DB_TRX_ID | 6 |
| DB_ROLL_PTR | 7 |
| f2 | 3000 |
| f3 | 3000 |
| TOTAL | 6027 |
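The same arithmetic can be written as a small function for a t1 row stored entirely inline. This is a sketch specific to t1 (one INT key and two nullable BLOB/TEXT columns). The rule that a field length takes 1 header byte for values under 128 bytes and 2 bytes otherwise is an assumption about COMPACT headers for BLOB/TEXT columns, not something this article states; for the rows here every length is 2 bytes, as in the table above.

```python
# Sketch: inline COMPACT record size for a row of table t1.
REC_N_NEW_EXTRA_BYTES = 5  # fixed COMPACT record header
DB_TRX_ID_LEN = 6
DB_ROLL_PTR_LEN = 7
F1_LEN = 4                 # INT primary key


def length_bytes(field_len: int) -> int:
    """Header bytes storing one BLOB/TEXT field length (assumed rule)."""
    return 1 if field_len < 128 else 2


def t1_inline_record_size(f2_len: int, f3_len: int) -> int:
    """Size of a t1 clustered index record with f2 and f3 stored inline."""
    header = length_bytes(f2_len) + length_bytes(f3_len)
    header += 1  # null bit array: two nullable columns fit in one byte
    header += REC_N_NEW_EXTRA_BYTES
    return header + F1_LEN + DB_TRX_ID_LEN + DB_ROLL_PTR_LEN + f2_len + f3_len


print(t1_inline_record_size(3000, 3000))    # 6027
print(t1_inline_record_size(60000, 60000))  # 120027
```

The same function gives 64527 bytes for rows 3 and 4, matching column 2 of the sizing table.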

You can refer to the documentation in  for more details about the COMPACT and REDUNDANT row formats. The REDUNDANT row format is also explained in the blog article InnoDB REDUNDANT Row Format.

Choosing Fields for External Storage

As discussed above, the function dtuple_convert_big_rec() is invoked to decide which parts of the oversized clustered index record will be chosen for external storage. It uses the following rules to decide:

  • Fixed-length fields are never chosen for external storage.
  • Variable-length fields whose size is less than or equal to 40 bytes (2 * BTR_EXTERN_FIELD_REF_SIZE) are never chosen for external storage.
  • Variable-length fields whose size is less than the BLOB prefix size are never chosen for external storage. For the REDUNDANT and COMPACT row formats, this means that a field whose data length is less than or equal to 768 bytes (DICT_ANTELOPE_MAX_INDEX_COL_LEN) will not be chosen. This rule does not apply to the DYNAMIC and COMPRESSED row formats, because their BLOB prefix is 0 bytes.

In dtuple_convert_big_rec(), BLOB fields are examined one at a time as candidates for external storage; each field that passes the criteria above is moved to external storage, until the clustered index record size falls within the maximum allowed. Larger fields are selected before smaller ones to maximize the space saved in the clustered index page, so that more records can be stored in each clustered index page.
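The selection loop can be modelled roughly as follows. This is a simplified Python sketch, not the InnoDB source: it applies only the three rules listed above, picks the largest remaining eligible field in each round, and assumes that each field moved out of a COMPACT record leaves a 768-byte prefix plus a 20-byte BLOB reference behind. The 27 bytes of fixed overhead (header, f1, DB_TRX_ID, DB_ROLL_PTR) are specific to table t1, and the function name is hypothetical.

```python
# Simplified model of choosing fields for external storage (COMPACT, table t1).
BTR_EXTERN_FIELD_REF_SIZE = 20
BLOB_PREFIX = 768          # DICT_ANTELOPE_MAX_INDEX_COL_LEN (REDUNDANT/COMPACT)
MAX_REC_SIZE = 8126        # 16K page, COMPACT
T1_FIXED_OVERHEAD = 27     # header + f1 + DB_TRX_ID + DB_ROLL_PTR for table t1


def choose_external_fields(fields, fixed_len_fields=()):
    """Return (fields moved out, record size afterwards) for one t1 row.

    fields maps column name -> data length in bytes; fixed_len_fields names
    columns that can never be moved out.
    """
    sizes = dict(fields)
    moved = []

    def record_size():
        return T1_FIXED_OVERHEAD + sum(sizes.values())

    while record_size() > MAX_REC_SIZE:
        eligible = [
            name for name, length in sizes.items()
            if name not in moved
            and name not in fixed_len_fields              # rule 1: no fixed-length fields
            and length > 2 * BTR_EXTERN_FIELD_REF_SIZE    # rule 2: more than 40 bytes
            and length > BLOB_PREFIX                      # rule 3: more than the prefix
        ]
        if not eligible:
            raise ValueError("record cannot be made small enough")
        # Largest field first, for the biggest saving in the index page.
        longest = max(eligible, key=lambda name: sizes[name])
        sizes[longest] = BLOB_PREFIX + BTR_EXTERN_FIELD_REF_SIZE
        moved.append(longest)

    return moved, record_size()


for pk, f2, f3 in [(1, 3000, 3000), (2, 60000, 60000), (3, 60000, 4500), (4, 4500, 60000)]:
    print(pk, choose_external_fields({"f2": f2, "f3": f3}))
```

Under these assumptions the model reproduces columns 3 and 4 of the sizing table (for example, row 2 ends at 27 + 2 × 788 = 1603 bytes).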

BLOB Reference

When a BLOB field is stored externally, a BLOB reference is stored in the clustered index record, after the BLOB prefix if there is one. This BLOB reference is 20 bytes long and contains the following information:

  1. The space identifier (4 bytes)
  2. The page number where the first page of the BLOB is stored (4 bytes)
  3. The offset of the BLOB header within that page (4 bytes)
  4. The total size of the BLOB data (8 bytes)

Even though 8 bytes are reserved for the total size of the BLOB data, only the last (least significant) 4 bytes are actually used. Since 4 bytes hold 32 bits, the maximum size of a single BLOB field in InnoDB is currently 2^32 − 1 bytes, just under 4GB.
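To make the 20-byte layout concrete, here is a hypothetical Python helper that packs the four parts in big-endian (network) byte order, the order InnoDB uses on disk. The name pack_blob_ref is illustrative only and is not an InnoDB function; the flag bits stored in the top byte of the length field are ignored here, and the example values are made up.

```python
import struct

BTR_EXTERN_FIELD_REF_SIZE = 20
MAX_BLOB_LEN = 2**32 - 1  # only the low 4 bytes of the 8-byte length are used


def pack_blob_ref(space_id: int, page_no: int, offset: int, length: int) -> bytes:
    """Build a 20-byte BLOB reference: three 4-byte fields plus an 8-byte length."""
    if not 0 <= length <= MAX_BLOB_LEN:
        raise ValueError("BLOB length must fit in 4 bytes (just under 4GB)")
    # '>' = big-endian with no padding; I = 4-byte unsigned, Q = 8-byte unsigned.
    return struct.pack(">IIIQ", space_id, page_no, offset, length)


ref = pack_blob_ref(5, 300, 38, 60000)
print(len(ref), ref.hex())
```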

[Figure: Structure of InnoDB BLOB Reference]

Within the length field, two bits store ownership and inheritance information: the most significant bit holds the ownership flag and the second most significant bit holds the inheritance flag. These flags are not discussed in this article; we will cover them in a subsequent blog article.

Here is a gdb function that prints the contents of a BLOB reference. It takes a pointer to the external BLOB reference as its argument. The calls to ntohl() are required because InnoDB stores its on-disk integers in network (big-endian) byte order.

Contents of a BLOB Reference
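```
# Usage: ib_print_blob_ref <pointer to the 20-byte BLOB reference>
# Byte-pointer arithmetic and 4-byte reads keep the offsets correct on 64-bit
# builds; the casts on ntohl() are needed when gdb has no debug info for it.
define ib_print_blob_ref
  set $ref = (unsigned char *) $arg0
  set $space_id = (unsigned int) ntohl(*(unsigned int *)($ref))
  set $page_no = (unsigned int) ntohl(*(unsigned int *)($ref + 4))
  set $offset = (unsigned int) ntohl(*(unsigned int *)($ref + 8))
  set $flags = *($ref + 12)
  set $ownership = $flags & 0x80
  set $inherited = $flags & 0x40
  set $length = (unsigned int) ntohl(*(unsigned int *)($ref + 16))
  printf "space_id  : %lu\n", $space_id
  printf "page_no   : %lu\n", $page_no
  printf "offset    : %lu\n", $offset
  printf "ownership : %x\n", $ownership
  printf "inherited : %x\n", $inherited
  printf "length    : %lu\n", $length
end
```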
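When a BLOB reference has been dumped as raw bytes (for example with gdb's x command, or from a copy of the tablespace file), the same decoding can be done offline. The sketch below mirrors ib_print_blob_ref: it reads the same offsets, uses big-endian decoding in place of ntohl(), and masks the same two flag bits. The function name parse_blob_ref and the sample bytes are hypothetical.

```python
import struct

OWNERSHIP_FLAG = 0x80  # most significant bit of the length field
INHERITED_FLAG = 0x40  # second most significant bit


def parse_blob_ref(ref: bytes) -> dict:
    """Decode a 20-byte InnoDB BLOB reference stored in big-endian order."""
    if len(ref) != 20:
        raise ValueError("a BLOB reference is exactly 20 bytes")
    # Five big-endian 4-byte unsigned integers; the 4th is the unused high half of the length.
    space_id, page_no, offset, _len_high, length = struct.unpack(">5I", ref)
    flags = ref[12]  # first byte of the 8-byte length field
    return {
        "space_id": space_id,
        "page_no": page_no,
        "offset": offset,
        "ownership": flags & OWNERSHIP_FLAG,
        "inherited": flags & INHERITED_FLAG,
        "length": length,  # only the low 4 bytes carry the length
    }


sample = bytes.fromhex("00000005" "0000012c" "00000026" "80000000" "0000ea60")
print(parse_blob_ref(sample))
```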

BLOB Prefix

When a BLOB field is stored externally, a prefix of the value may also be stored in the clustered index record, depending on the row format. For the REDUNDANT and COMPACT row formats, a 768-byte BLOB prefix is stored in the clustered index record, followed by the BLOB reference. For the DYNAMIC and COMPRESSED row formats, a BLOB prefix is never stored; the clustered index record holds only the BLOB reference.
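The space an externally stored field still occupies in the clustered index record therefore depends only on the row format. A tiny hypothetical helper makes the difference explicit (it counts the prefix and the BLOB reference only, not the field's length bytes in the record header); the 788-byte COMPACT figure is the one behind the record sizes of rows 2 to 4 in the sizing table earlier.

```python
BTR_EXTERN_FIELD_REF_SIZE = 20

# BLOB prefix kept in the clustered index record, per row format (as described above).
BLOB_PREFIX_LEN = {
    "REDUNDANT": 768,
    "COMPACT": 768,
    "DYNAMIC": 0,
    "COMPRESSED": 0,
}


def inline_bytes_for_external_field(row_format: str) -> int:
    """Bytes an externally stored BLOB field still uses inside the index record."""
    return BLOB_PREFIX_LEN[row_format.upper()] + BTR_EXTERN_FIELD_REF_SIZE


for fmt in BLOB_PREFIX_LEN:
    print(fmt, inline_bytes_for_external_field(fmt))
```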

The BLOB prefix, when present, makes it possible to compute the secondary index key without fetching the externally stored BLOB data (which would require at least one extra page read). This works because the maximum length of a secondary index key is 767 bytes. If we attempt to create a secondary index with a larger length, it will be automatically truncated with a warning. For example, consider the following statement:

```sql
CREATE INDEX
```
