Thrift Object Serialization and Deserialization: Analyzing the Byte Array

Thrift · Published 2018-11-24 23:27:49

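Overview

This post only analyzes the byte array that Thrift produces when it serializes an object, and the principle behind Thrift serialization and deserialization. Other parts of the source code will be covered in separate posts.

Preparation

Define a Thrift file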

struct Person {
    1: required i32 age;
    2: required string name;
}

Generate the Java code

thrift -r --gen java test.thrift

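Test code

import java.util.Arrays;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.junit.Test; // assumes JUnit 4; the original does not say which version

@Test
public void testPerson() throws TException {
    // Build the object and print it
    Person person = new Person().setAge(18).setName("yano");
    System.out.println(person);

    // Serialize with the default protocol (TBinaryProtocol)
    TSerializer serializer = new TSerializer();
    byte[] bytes = serializer.serialize(person);
    System.out.println(Arrays.toString(bytes));

    // Deserialize into a fresh object and print it again
    Person parsePerson = new Person();
    TDeserializer deserializer = new TDeserializer();
    deserializer.deserialize(parsePerson, bytes);
    System.out.println(parsePerson);
}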

Output

com.yano.nankai.spring.thrift.Person(age:18, name:yano)
[8, 0, 1, 0, 0, 0, 18, 11, 0, 2, 0, 0, 0, 4, 121, 97, 110, 111, 0]
com.yano.nankai.spring.thrift.Person(age:18, name:yano)

Serialization

The test case above first creates a Person object, which has only two fields, and then calls Thrift's TSerializer to serialize it.

The resulting byte array is:

[8, 0, 1, 0, 0, 0, 18, 11, 0, 2, 0, 0, 0, 4, 121, 97, 110, 111, 0]

The serialize method of TSerializer is shown below. In the end it simply calls the write method of the person object.

public byte[] serialize(TBase base) throws TException {
    this.baos_.reset();
    base.write(this.protocol_);
    return this.baos_.toByteArray();
}

The write method of the Person class:

public void write(TProtocol oprot) throws TException {
    validate();

    oprot.writeStructBegin(STRUCT_DESC);
    oprot.writeFieldBegin(AGE_FIELD_DESC);
    oprot.writeI32(this.age);
    oprot.writeFieldEnd();
    if (this.name != null) {
        oprot.writeFieldBegin(NAME_FIELD_DESC);
        oprot.writeString(this.name);
        oprot.writeFieldEnd();
    }
    oprot.writeFieldStop();
    oprot.writeStructEnd();
}

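The TProtocol here defaults to TBinaryProtocol, whose writeStructBegin() and writeStructEnd() methods are empty, so they add no bytes. The first call that actually writes something is: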

oprot.writeFieldBegin(AGE_FIELD_DESC);

Its implementation in TBinaryProtocol is:

public void writeFieldBegin(TField field) throws TException {
    this.writeByte(field.type);
    this.writeI16(field.id);
}

As shown, it first writes one byte for the field's type, followed by the field id as a 16-bit integer. The TField AGE_FIELD_DESC used here is:

private static final TField AGE_FIELD_DESC = new TField("age", TType.I32, (short)1);

The first field defined in the Thrift file is:

1: required i32 age;

TType is defined as follows:

public final class TType {
    public static final byte STOP = 0;
    public static final byte VOID = 1;
    public static final byte BOOL = 2;
    public static final byte BYTE = 3;
    public static final byte DOUBLE = 4;
    public static final byte I16 = 6;
    public static final byte I32 = 8;
    public static final byte I64 = 10;
    public static final byte STRING = 11;
    public static final byte STRUCT = 12;
    public static final byte MAP = 13;
    public static final byte SET = 14;
    public static final byte LIST = 15;
    public static final byte ENUM = 16;

    public TType() {
    }
}

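So the first element of the byte array is the type code for i32, which is 8.

Next comes the id declared for the field. The age field has id 1, and the id is written as an i16 (two bytes, big-endian), so the next two elements of the byte array are 0, 1.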
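The field header described above can be reproduced without Thrift on the classpath. The following is only a minimal sketch: FieldHeaderSketch and fieldHeader are hypothetical names made up for illustration, not part of the Thrift API, and the sketch assumes the big-endian layout that TBinaryProtocol uses.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustration only: this class and method are hypothetical, not Thrift code.
class FieldHeaderSketch {

    // Mirrors what writeFieldBegin produces: 1 type byte, then the id as a 16-bit integer.
    static byte[] fieldHeader(byte type, short id) {
        ByteBuffer buf = ByteBuffer.allocate(3); // ByteBuffer is big-endian by default
        buf.put(type);
        buf.putShort(id);
        return buf.array();
    }

    public static void main(String[] args) {
        // 8 is TType.I32, 11 is TType.STRING (values taken from the TType listing)
        System.out.println(Arrays.toString(fieldHeader((byte) 8, (short) 1)));  // age header
        System.out.println(Arrays.toString(fieldHeader((byte) 11, (short) 2))); // name header
    }
}
```

Under these assumptions the two headers come out as [8, 0, 1] and [11, 0, 2], which are the first three bytes of each field in the serialized array.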

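The name field is handled the same way.

What each value in the output byte array means:

8 // data type is i32
0, 1 // field id is 1
0, 0, 0, 18 // value of field id 1 (age), 4 bytes
11 // data type is string
0, 2 // field id is 2 (name)
0, 0, 0, 4 // length of the string name, 4 bytes
121, 97, 110, 111 // the 4 bytes of "yano" (they look like ASCII codes, but the encoding is actually UTF-8)
0 // TType.STOP, marks the end of the struct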
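To check the breakdown above, the whole array can be rebuilt by hand with only the JDK. This is a sketch under stated assumptions, not Thrift's implementation: BinaryLayoutSketch and encodePerson are hypothetical names, and the field ids and type codes are hard-coded from the Person struct and the TType listing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: hand-written layout, hypothetical class and method names.
class BinaryLayoutSketch {

    static byte[] encodePerson(int age, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        // age field: 1 + 2 + 4 bytes; name field: 1 + 2 + 4 + payload; STOP: 1 byte
        ByteBuffer buf = ByteBuffer.allocate(7 + 7 + utf8.length + 1);

        buf.put((byte) 8);         // type: i32
        buf.putShort((short) 1);   // field id 1
        buf.putInt(age);           // 4-byte big-endian value

        buf.put((byte) 11);        // type: string
        buf.putShort((short) 2);   // field id 2
        buf.putInt(utf8.length);   // 4-byte length prefix
        buf.put(utf8);             // UTF-8 payload

        buf.put((byte) 0);         // STOP
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodePerson(18, "yano")));
    }
}
```

For age 18 and name "yano" this prints [8, 0, 1, 0, 0, 0, 18, 11, 0, 2, 0, 0, 0, 4, 121, 97, 110, 111, 0], the same 19 bytes that TSerializer produced in the test.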

Deserialization

The deserialization statements are:

Person parsePerson = new Person();
TDeserializer deserializer = new TDeserializer();
deserializer.deserialize(parsePerson, bytes);

The read method of the Person class:

public void read(TProtocol iprot) throws TException {
    TField field;
    iprot.readStructBegin();
    while (true) {
        field = iprot.readFieldBegin();
        if (field.type == TType.STOP) {
            break;
        }
        switch (field.id) {
            case 1: // AGE
                if (field.type == TType.I32) {
                    this.age = iprot.readI32();
                    setAgeIsSet(true);
                } else {
                    TProtocolUtil.skip(iprot, field.type);
                }
                break;
            case 2: // NAME
                if (field.type == TType.STRING) {
                    this.name = iprot.readString();
                } else {
                    TProtocolUtil.skip(iprot, field.type);
                }
                break;
            default:
                TProtocolUtil.skip(iprot, field.type);
        }
        iprot.readFieldEnd();
    }
    iprot.readStructEnd();

    // check for required fields of primitive type, which can't be checked in the validate method
    if (!isSetAge()) {
        throw new TProtocolException("Required field 'age' was not found in serialized data! Struct: " + toString());
    }
    validate();
}

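The code is simple and clear. It first reads a TField header from the byte array (3 bytes: a 1-byte type plus a 2-byte id; for STOP only the type byte is present), then uses the id to assign the value to the matching field. A field whose id is unknown, or whose type does not match what is expected, is skipped with TProtocolUtil.skip.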
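The read loop can also be imitated with plain JDK classes. The following is a simplified sketch, not the generated code: BinaryReadSketch and decodePerson are hypothetical names, and where the real read method skips unexpected fields via TProtocolUtil.skip, this sketch simply throws.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: a hand-written reader for the Person layout, hypothetical names.
class BinaryReadSketch {

    static String decodePerson(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes); // big-endian by default
        Integer age = null;
        String name = null;

        while (true) {
            byte type = in.get();
            if (type == 0) {            // STOP: no field id follows
                break;
            }
            short id = in.getShort();   // 2-byte field id
            if (id == 1 && type == 8) {           // age, i32
                age = in.getInt();
            } else if (id == 2 && type == 11) {   // name, string
                byte[] utf8 = new byte[in.getInt()];
                in.get(utf8);
                name = new String(utf8, StandardCharsets.UTF_8);
            } else {
                // Simplification: the generated code would skip this field instead.
                throw new IllegalArgumentException("unhandled field id " + id + " of type " + type);
            }
        }

        if (age == null) {
            throw new IllegalStateException("Required field 'age' was not found");
        }
        return "Person(age:" + age + ", name:" + name + ")";
    }

    public static void main(String[] args) {
        byte[] bytes = {8, 0, 1, 0, 0, 0, 18, 11, 0, 2, 0, 0, 0, 4, 121, 97, 110, 111, 0};
        System.out.println(decodePerson(bytes));
    }
}
```

Fed the 19 bytes from the serialization step, the sketch returns Person(age:18, name:yano).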

There are many more details that I won't go through one by one; the source code explains them more clearly than I can.

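Comparison with Google Protocol Buffers

I previously analyzed the serialized bytes of Google Protocol Buffers in "Google Protocol Buffers Serialization Algorithm Analysis". The two differ considerably in how they lay out the serialized byte array: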

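Thrift's encoding (with the default TBinaryProtocol used here) is not compact: every field carries a 1-byte type and a 2-byte id, and an i32 always takes 4 bytes. Google Protocol Buffers packs the field number and the wire type into a single varint tag (one byte for field numbers 1 through 15), and it also encodes integer types such as int32 as varints to shorten the array.
The Java code Thrift generates is concise, and so is its implementation; the Java code Google Protocol Buffers generates easily runs to thousands of lines.
Thrift is not just a serialization protocol but also an RPC framework. Google Protocol Buffers by itself only covers serialization and relies on a separate framework for RPC.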
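To make the size difference concrete, here is a rough sketch of how the same two values would look in the Protocol Buffers wire format. It assumes a hypothetical equivalent message with an int32 field 1 and a string field 2 (that message is not defined anywhere in this post), and VarintSizeSketch, writeVarint and encodePerson are made-up names. The sketch only handles non-negative integers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: hand-rolled protobuf-style encoding, hypothetical names.
class VarintSizeSketch {

    // Base-128 varint: 7 payload bits per byte, high bit set while more bytes follow.
    // Assumes value >= 0.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] encodePerson(int age, String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (1 << 3) | 0);   // tag: field 1, wire type 0 (varint)
        writeVarint(out, age);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (2 << 3) | 2);   // tag: field 2, wire type 2 (length-delimited)
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodePerson(18, "yano");
        System.out.println(Arrays.toString(bytes) + " (" + bytes.length + " bytes)");
    }
}
```

Under these assumptions the result is [8, 18, 18, 4, 121, 97, 110, 111], 8 bytes, compared with 19 bytes for the TBinaryProtocol output above.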