A Brief Introduction to Google Protocol Buffers

阿新 · Published: 2019-01-13

The following is mainly compiled from the official documentation.

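Why Use Protocol Buffers

Common ways to serialize and parse structured data include:

Java's built-in serialization. Its drawbacks are obvious: poor performance and poor cross-language support.
Encoding the data into a custom string format. Simple and efficient, but only suitable for fairly simple data.
XML serialization. A common approach with clear advantages: human-readable, highly extensible, and self-describing. However, XML is relatively verbose, and parsing it is complex and slow.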

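Protocol Buffers is a more flexible, efficient, and automated solution. You describe the data structure you want in a .proto file, and Java classes that parse that structure are generated automatically. These classes provide an efficient API for reading and writing the binary format. Most importantly, Protocol Buffers is highly extensible and compatible: as long as you follow a few rules, you keep both forward and backward compatibility.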

The .proto file

syntax = "proto2";

package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}

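Protocol Buffers Syntax

The syntax of a .proto file is similar to Java. A message roughly corresponds to a class, and an enum is an enumeration type. The basic scalar types include bool, int32, float, double, and string. Each field's type is preceded by a modifier:

required: the field must be set
optional: the field may be set or left unset
repeated: the field may occur any number of times (including zero)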

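NOTE 1: For historical reasons, repeated fields of numeric scalar types should be declared with [packed=true], which gives a more compact encoding: repeated int32 samples = 4 [packed=true]; (In proto3, repeated numeric scalar fields are packed by default.)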
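The following hand-written sketch encodes the same repeated int32 values both ways, following the wire-format rules. It shows why packing helps. The class and method names are hypothetical, not part of the protobuf API, and real code should let the generated classes do the encoding. Unpacked, every element carries its own key. Packed, the whole list becomes one length-delimited field.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper comparing unpacked vs. packed encoding of a repeated int32 field.
final class PackedDemo {
    private PackedDemo() {}

    // Base-128 varint: 7 payload bits per byte, continuation bit set on all but the last.
    // Negative ints are sign-extended to 64 bits and take 10 bytes, as protobuf does for int32.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Unpacked: each element repeats the key (wire type 0 = varint). */
    static byte[] unpacked(int fieldNumber, int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeVarint(out, (fieldNumber << 3) | 0);
            writeVarint(out, v);
        }
        return out.toByteArray();
    }

    /** Packed: one key (wire type 2 = length-delimited), one length, then all values back to back. */
    static byte[] packed(int fieldNumber, int[] values) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        for (int v : values) {
            writeVarint(payload, v);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | 2);
        writeVarint(out, payload.size());
        out.write(payload.toByteArray(), 0, payload.size());
        return out.toByteArray();
    }
}
```

For the three values {1, 2, 3} in field 4, unpacked() produces 6 bytes (a key plus a value per element) and packed() produces 5 (one key, one length, three values). The gap grows with the number of elements.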

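NOTE 2: Early versions of Protocol Buffers did not support maps; the workaround was two repeated fields, one for keys and one for values. Since protobuf 3.0, map<key_type, value_type> fields are supported natively, in both proto2 and proto3 syntax.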

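The numbers 1, 2, 3, … after the fields are field numbers (tag numbers). Once a protocol is in use, these numbers must never change as the protocol evolves. [default = HOME] sets a default value. To avoid naming conflicts, each .proto file should declare a package. A package works much as in Java, and import is supported too: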

import "myproject/other_protos.proto";
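On the wire, field names never appear. Each field is preceded only by a key computed from its field number and a 3-bit wire type. That is why a field number must never change: an old reader would attribute the bytes to a different field. The sketch below covers only the key arithmetic from the encoding specification. The class, constant, and method names are hypothetical.

```java
// Hypothetical helper: how a field number and wire type combine into the field key.
final class FieldKey {
    static final int WIRETYPE_VARINT = 0;           // int32, int64, bool, enum, ...
    static final int WIRETYPE_FIXED64 = 1;          // fixed64, double
    static final int WIRETYPE_LENGTH_DELIMITED = 2; // string, bytes, embedded messages, packed repeated
    static final int WIRETYPE_FIXED32 = 5;          // fixed32, float

    private FieldKey() {}

    /** key = (field_number << 3) | wire_type; the key itself is then written as a varint. */
    static int makeKey(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    static int fieldNumber(int key) {
        return key >>> 3;
    }

    static int wireType(int key) {
        return key & 0x7;
    }
}
```

For the Person message above, name (field 1, string) gets key 0x0A and id (field 2, int32) gets key 0x10. Keys for field numbers 1 through 15 fit in a single varint byte, which is why the official guidance reserves that range for frequently used fields.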

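Extensions

Although PB syntax resembles Java, it has no inheritance mechanism. Instead it has so-called extensions, which are quite different from the object-oriented, JavaBeans-style protocol design we are used to.

With extensions, you reserve a range of field numbers when defining a message so that third parties can extend it.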

message Foo {
  required int32 a = 1;
  extensions 100 to 199;
}

message Bar {

    optional string name =1;
    optional Foo foo = 2;
} 

extend Foo {
    optional int32 bar = 102;
}

Extensions can also be declared nested inside another message:

message Bar {

    extend Foo {
    optional int32 bar = 102;
    }

    optional string name =1;
    optional Foo foo = 2;
}

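Setting an extension field in Java:

// Assumes the nested declaration above (extend Foo inside message Bar), so the
// extension identifier is BarProto.Bar.bar; BarProto and FooProto are the outer
// classes generated for the .proto files that define Bar and Foo.
BarProto.Bar.Builder bar = BarProto.Bar.newBuilder();
bar.setName("zjd");

FooProto.Foo.Builder foo = FooProto.Foo.newBuilder();
foo.setA(1);
foo.setExtension(BarProto.Bar.bar, 12); // sets extension field 102 on Foo

bar.setFoo(foo.build());
System.out.println(bar.getFoo().getExtension(BarProto.Bar.bar));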

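Personally, I find this rather inconvenient to use.

For a detailed description of PB syntax, see the official documentation. The syntax is fairly simple. Because messages can be nested, it can still express very complex data structures, which covers essentially all of our needs.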

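Compiling .proto Files

Compile .proto files with protoc, the compiler provided by Google. On Windows, download protoc.exe. Basic usage:

protoc.exe -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto

The java_package and java_outer_classname options in the .proto file determine the package and class name of the generated Java class.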

Protocol Buffers API

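AddressBookProtos.java contains one nested class for each message in the .proto file: AddressBook and Person. Each of these classes has its own nested Builder class for creating instances. Messages provide only read-only getters; builders provide both getters and setters.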

Person

// required string name = 1;
public boolean hasName();
public String getName();

// required int32 id = 2;
public boolean hasId();
public int getId();

// optional string email = 3;
public boolean hasEmail();
public String getEmail();

// repeated .tutorial.Person.PhoneNumber phone = 4;
public List<PhoneNumber> getPhoneList();
public int getPhoneCount();
public PhoneNumber getPhone(int index);

Person.Builder

// required string name = 1;
public boolean hasName();
public java.lang.String getName();
public Builder setName(String value);
public Builder clearName();

// required int32 id = 2;
public boolean hasId();
public int getId();
public Builder setId(int value);
public Builder clearId();

// optional string email = 3;
public boolean hasEmail();
public String getEmail();
public Builder setEmail(String value);
public Builder clearEmail();

// repeated .tutorial.Person.PhoneNumber phone = 4;
public List<PhoneNumber> getPhoneList();
public int getPhoneCount();
public PhoneNumber getPhone(int index);
public Builder setPhone(int index, PhoneNumber value);
public Builder addPhone(PhoneNumber value);
public Builder addAllPhone(Iterable<PhoneNumber> value);
public Builder clearPhone();

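Besides the JavaBeans-style getters and setters, some other accessors are generated:

has_: every non-repeated field has one of these, which tells whether the field was explicitly set or is just returning its default value.
clear_: every field has a clear method that resets it to its empty state.
_Count: returns the number of elements in a repeated field.
addAll_: appends all elements of a collection to a repeated field.
Repeated fields also have methods to get and set elements by index.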
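The hand-written class below makes these accessor conventions concrete. It is not generated code, and its name is hypothetical. It mimics the has/get/set/clear accessors of a proto2 field like `optional string email = 3;` and the repeated-field accessors, using plain strings in place of PhoneNumber messages. The key point is that presence is tracked separately from the value. So setting an empty string still makes hasEmail() return true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written builder mimicking generated accessor conventions.
final class PersonBuilderSketch {
    private String email = "";        // proto2 default for a string field is ""
    private boolean hasEmail = false; // presence flag, independent of the value
    private final List<String> phones = new ArrayList<>(); // stands in for repeated PhoneNumber

    boolean hasEmail() { return hasEmail; }
    String getEmail() { return email; }

    PersonBuilderSketch setEmail(String value) {
        if (value == null) {
            throw new NullPointerException("generated setters also reject null");
        }
        email = value;
        hasEmail = true;
        return this;
    }

    PersonBuilderSketch clearEmail() {
        email = "";
        hasEmail = false;
        return this;
    }

    int getPhoneCount() { return phones.size(); }
    String getPhone(int index) { return phones.get(index); }

    PersonBuilderSketch setPhone(int index, String value) {
        phones.set(index, value);
        return this;
    }

    PersonBuilderSketch addPhone(String value) {
        phones.add(value);
        return this;
    }

    PersonBuilderSketch addAllPhone(Iterable<String> values) {
        for (String v : values) {
            phones.add(v); // appends; does not replace existing elements
        }
        return this;
    }

    PersonBuilderSketch clearPhone() {
        phones.clear();
        return this;
    }
}
```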

Enums and Nested Classes

A nested message generates a nested class, and an enum generates a Java 5 enum type.

public static enum PhoneType {
  MOBILE(0, 0),
  HOME(1, 1),
  WORK(2, 2),
  ;
  ...
}

Builders vs. Messages

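All generated message classes are immutable, just like Java's String. To create a message you first create a builder, and a message's fields can only be set through the builder's setters. Each setter returns the builder itself, so all fields can be set in a single chained expression: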

Person john =
  Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .addPhone(
      Person.PhoneNumber.newBuilder()
        .setNumber("555-4321")
        .setType(Person.PhoneType.HOME))
    .build();
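The message/builder split can be imitated in plain Java. The sketch below is a hand-written illustration, not what protoc generates, and PhoneBook and its methods are hypothetical names. The message has final fields and an unmodifiable list, and every change goes through a builder whose setters return `this`. The toBuilder() method mirrors the method of the same name on generated messages, which starts a new builder from an existing message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable message with a chaining builder.
final class PhoneBook {
    private final String owner;
    private final List<String> numbers;

    private PhoneBook(Builder b) {
        this.owner = b.owner;
        // Copy so that later changes to the builder cannot leak into a built message.
        this.numbers = Collections.unmodifiableList(new ArrayList<>(b.numbers));
    }

    String getOwner() { return owner; }
    List<String> getNumberList() { return numbers; }

    static Builder newBuilder() { return new Builder(); }

    /** Returns a new builder pre-filled with this message's values. */
    Builder toBuilder() {
        Builder b = new Builder().setOwner(owner);
        for (String n : numbers) {
            b.addNumber(n);
        }
        return b;
    }

    static final class Builder {
        private String owner = "";
        private final List<String> numbers = new ArrayList<>();

        private Builder() {}

        Builder setOwner(String value) { owner = value; return this; }
        Builder addNumber(String value) { numbers.add(value); return this; }
        PhoneBook build() { return new PhoneBook(this); }
    }
}
```

For example, `PhoneBook.newBuilder().setOwner("John Doe").addNumber("555-4321").build()` creates a message. Calling `toBuilder().addNumber(...).build()` on it yields a second message and leaves the first one unchanged.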

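Every message and builder provides the following methods:

isInitialized(): checks whether all required fields have been set.
toString(): returns a human-readable representation, which is very useful for debugging.
mergeFrom(Message other): builder only. Merges another message into this one. Singular fields set in other overwrite the current values, repeated fields are concatenated, and embedded message fields are merged recursively.
clear(): builder only. Resets all fields to the empty state.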
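The mergeFrom rules for simple fields can be sketched on a hand-written message with one singular field and one repeated field. The class and field names are hypothetical. Recursive merging of embedded messages is left out to keep it short. Only singular fields that are actually set in `other` overwrite the current values, and repeated fields are appended.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message illustrating proto2 mergeFrom semantics for scalar and repeated fields.
final class MergeSketch {
    String name = "";
    boolean hasName = false;
    final List<String> tags = new ArrayList<>();

    MergeSketch setName(String v) {
        name = v;
        hasName = true;
        return this;
    }

    MergeSketch addTag(String v) {
        tags.add(v);
        return this;
    }

    MergeSketch mergeFrom(MergeSketch other) {
        if (other.hasName) {     // an unset field in `other` leaves ours untouched
            setName(other.name);
        }
        tags.addAll(other.tags); // repeated: concatenate, never replace
        return this;
    }
}
```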

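Parsing and Serialization

Each message provides the following methods for writing and reading the binary protocol buffer format. For details of the binary format, see the official encoding documentation (reaching it from mainland China may require a proxy).

byte[] toByteArray(); serializes the message to a byte[].
static Person parseFrom(byte[] data); parses a message from a byte[].
void writeTo(OutputStream output); serializes the message and writes it to an OutputStream.
static Person parseFrom(InputStream input); reads and parses a message from an InputStream.

The generated Protocol Buffers classes provide only basic operations on binary data and are not very object-oriented. If you need richer behavior, or cannot modify the .proto file, wrap the generated classes in a class of your own rather than extending them.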

Writing A Message

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;

class AddPerson {
  // This function fills in a Person message based on user input.
  static Person PromptForAddress(BufferedReader stdin,
                                 PrintStream stdout) throws IOException {
    Person.Builder person = Person.newBuilder();

    stdout.print("Enter person ID: ");
    person.setId(Integer.valueOf(stdin.readLine()));

    stdout.print("Enter name: ");
    person.setName(stdin.readLine());

    stdout.print("Enter email address (blank for none): ");
    String email = stdin.readLine();
    if (email.length() > 0) {
      person.setEmail(email);
    }

    while (true) {
      stdout.print("Enter a phone number (or leave blank to finish): ");
      String number = stdin.readLine();
      if (number.length() == 0) {
        break;
      }

      Person.PhoneNumber.Builder phoneNumber =
        Person.PhoneNumber.newBuilder().setNumber(number);

      stdout.print("Is this a mobile, home, or work phone? ");
      String type = stdin.readLine();
      if (type.equals("mobile")) {
        phoneNumber.setType(Person.PhoneType.MOBILE);
      } else if (type.equals("home")) {
        phoneNumber.setType(Person.PhoneType.HOME);
      } else if (type.equals("work")) {
        phoneNumber.setType(Person.PhoneType.WORK);
      } else {
        stdout.println("Unknown phone type.  Using default.");
      }

      person.addPhone(phoneNumber);
    }

    return person.build();
  }

  // Main function:  Reads the entire address book from a file,
  //   adds one person based on user input, then writes it back out to the same
  //   file.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage:  AddPerson ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    AddressBook.Builder addressBook = AddressBook.newBuilder();

    // Read the existing address book.
    try {
      addressBook.mergeFrom(new FileInputStream(args[0]));
    } catch (FileNotFoundException e) {
      System.out.println(args[0] + ": File not found.  Creating a new file.");
    }

    // Add an address.
    addressBook.addPerson(
      PromptForAddress(new BufferedReader(new InputStreamReader(System.in)),
                       System.out));

    // Write the new address book back to disk.
    FileOutputStream output = new FileOutputStream(args[0]);
    addressBook.build().writeTo(output);
    output.close();
  }
}


Reading A Message

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintStream;

class ListPeople {
  // Iterates though all people in the AddressBook and prints info about them.
  static void Print(AddressBook addressBook) {
    for (Person person: addressBook.getPersonList()) {
      System.out.println("Person ID: " + person.getId());
      System.out.println("  Name: " + person.getName());
      if (person.hasEmail()) {
        System.out.println("  E-mail address: " + person.getEmail());
      }

      for (Person.PhoneNumber phoneNumber : person.getPhoneList()) {
        switch (phoneNumber.getType()) {
          case MOBILE:
            System.out.print("  Mobile phone #: ");
            break;
          case HOME:
            System.out.print("  Home phone #: ");
            break;
          case WORK:
            System.out.print("  Work phone #: ");
            break;
        }
        System.out.println(phoneNumber.getNumber());
      }
    }
  }

  // Main function:  Reads the entire address book from a file and prints all
  //   the information inside.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage:  ListPeople ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    // Read the existing address book.
    AddressBook addressBook =
      AddressBook.parseFrom(new FileInputStream(args[0]));

    Print(addressBook);
  }
}


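Extending the Protocol

In practice, .proto files often need to evolve, and extending a protocol raises compatibility concerns. Protocol Buffers handles this well, as long as you follow a few rules:

Do not change the tag number of any existing field.
Do not add or delete required fields.
You may delete optional and repeated fields, but never reuse their tag numbers.
You may add new optional and repeated fields, but they must use new tag numbers.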

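Forward compatibility (old code reading new messages): old code ignores the new fields. Deleted optional fields take their default values, and deleted repeated fields are empty.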
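Old code can ignore new fields because the low 3 bits of every key give the wire type, and that alone tells a reader how many bytes to skip. The hand-written reader below demonstrates this. It is only an illustration; generated code already handles this for you. The class and method names are hypothetical. The "old" reader knows only field 1 (an int32) and skips everything else.

```java
import java.io.ByteArrayInputStream;

// Hypothetical "old" reader that understands only field 1 and skips unknown fields.
final class SkipUnknown {
    private SkipUnknown() {}

    static long readVarint(ByteArrayInputStream in) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalArgumentException("truncated varint");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    /** Returns field 1 (varint) if present, else 0, the proto2 default for a numeric field. */
    static long readField1(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long field1 = 0;
        while (in.available() > 0) {
            long key = readVarint(in);
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (fieldNumber == 1 && wireType == 0) {
                field1 = readVarint(in);
                continue;
            }
            switch (wireType) { // unknown field: skip it based on its wire type
                case 0: readVarint(in); break;          // varint
                case 1: in.skip(8); break;              // fixed 64-bit
                case 2: in.skip(readVarint(in)); break; // length-delimited
                case 5: in.skip(4); break;              // fixed 32-bit
                default: throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return field1;
    }
}
```

For example, the bytes 08 96 01 encode field 1 = 150. If a newer writer appends a string field 2 and a fixed32 field 4, readField1 still returns 150.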

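Backward compatibility (new code reading old messages): new code can handle old messages transparently, but keep in mind that newly added fields are absent from old messages. You can check explicitly with the has_ method whether such a field is set. Alternatively, give the new fields sensible defaults in the new .proto. If an optional field has no default in the .proto, the type-specific default applies: the empty string for strings, 0 for numeric types, false for booleans, and the first listed value for enums.

Note that newly added repeated fields have no has_ method. So when one is empty, you cannot tell whether new code left it empty or old code produced the message.

It is advisable to declare every field optional, which leaves the most room for extension.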

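Encoding

Readers comfortable with English can go straight to the official documentation, but I find an article on cnblogs explains it more clearly.

Overall, Protocol Buffers encoding is very compact and efficient: it takes little space and parses quickly, which makes it well suited to mobile devices. The downside is that the encoded data carries no field names or full type information, so it is not self-describing (though some tricks can make it so). Parsing therefore depends on the .proto definition.

Google calls this encoding the wire format.

Protocol Buffers owes much of its compactness to Varint, a variable-length integer encoding in which smaller values take fewer bytes.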
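Below is a minimal Java sketch of base-128 varint encoding as described in the encoding specification. The class name is hypothetical. Each byte carries 7 bits of the value, least significant group first, and the high bit marks whether more bytes follow. Negative values are treated as unsigned 64-bit integers, matching how protobuf encodes negative int32/int64 values.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical varint encoder/decoder following the protobuf wire-format rules.
final class Varint {
    private Varint() {}

    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits + continuation bit
            value >>>= 7;                             // unsigned shift: negatives end after 10 bytes
        }
        out.write((int) value);                       // last byte: continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a varint starting at data[0]. */
    static long decode(byte[] data) {
        long result = 0;
        int shift = 0;
        for (byte b : data) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }
}
```

encode(300) yields the two bytes 0xAC 0x02, the example used in the official encoding guide. Values below 128 take a single byte. Any negative number takes ten bytes under this scheme, which is why protobuf also offers sint32/sint64 with ZigZag encoding for fields that are often negative.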

Comparison with XML and JSON

Data Size

Let's briefly compare Protocol Buffers with XML and JSON.

.proto

message Request {
  repeated string str = 1;
  repeated int32 a = 2;
}

JavaBean

public class Request {
    public List<String> strList;
    public List<Integer> iList;
}

First, compare the size of the generated data. The test code is simple:

public static void main(String[] args) throws Exception {
    int n = 5;
    String str = "testtesttesttesttesttesttesttest";
    int val = 100;
    for (int i = 1; i <=n; i++) {
        for (int j = 0; j < i; j++) {
            str += str;
        }
        // For i = 5, 100^5 = 10^10 exceeds Integer.MAX_VALUE, so the (int) cast yields 2147483647.
        protobuf(i, (int) Math.pow(val, i), str);
        serialize(i, (int) Math.pow(val, i), str);
        System.out.println();
    }
}

public static void protobuf(int n, int in, String str) {
    RequestProto.Request.Builder req = RequestProto.Request.newBuilder();

    List<Integer> alist = new ArrayList<Integer>();
    for (int i = 0; i < n; i++) {
        alist.add(in);
    }
    req.addAllA(alist);

    List<String> strList = new ArrayList<String>();
    for (int i = 0; i < n; i++) {
        strList.add(str);
    }
    req.addAllStr(strList);

    // System.out.println(req.build());
    byte[] data = req.build().toByteArray();
    System.out.println("protobuf size:" + data.length);
}

public static void serialize(int n, int in, String str) throws Exception {
    Request req = new Request();

    List<String> strList = new ArrayList<String>();
    for (int i = 0; i < n; i++) {
        strList.add(str);
    }
    req.strList = strList;

    List<Integer> iList = new ArrayList<Integer>();

    for (int i = 0; i < n; i++) {
        iList.add(in);
    }
    req.iList = iList;

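    // SerializationInstance is a helper from the author's own project (not shown);
    // it is not part of Protocol Buffers.
    String xml = SerializationInstance.sharedInstance().simpleToXml(req);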
    // System.out.println(xml);
    System.out.println("xml size:" + xml.getBytes().length);

    String json = SerializationInstance.sharedInstance().fastToJson(req);
    // System.out.println(json);
    System.out.println("json size:" + json.getBytes().length);
}


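As n increases, the int values get larger and so do the strings. First, we set str to an empty string.

Then we restored str and set val to 1.

For int fields, protobuf output is considerably smaller than both XML and JSON (especially XML), thanks to Varint encoding. For string fields, the differences between the three essentially disappear as the strings get longer.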
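The integer part of this gap can be estimated directly. The sketch below compares the bytes a varint needs with the number of ASCII digits a text format needs for the same value. The class and method names are hypothetical. It deliberately ignores field keys, XML tags, JSON keys, quotes, and separators, so it shows only the value portion of the difference.

```java
// Hypothetical helper comparing varint size with decimal text size for an integer value.
final class IntSizeDemo {
    private IntSizeDemo() {}

    /** Bytes a non-negative value occupies as a varint (7 payload bits per byte). */
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** Characters needed to write the value in decimal, as XML and JSON do. */
    static int decimalSize(long value) {
        return Long.toString(value).length();
    }
}
```

For the test's values 100, 10^4, 10^6, and 10^8, the varint needs 1, 2, 3, and 4 bytes, compared with 3, 5, 7, and 9 decimal digits. That difference comes before any markup is counted.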

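For serialization and parsing (deserialization) performance, I compared Protocol Buffers with several approaches commonly used in our projects. This was only a simple benchmark (using bb.jar).

Serialization Performance

With small payloads, protobuf serializes 1 to 2 orders of magnitude faster than typical XML and JSON serializers. fastjson is already fast, but protobuf is still noticeably faster.

Parsing Performance

protobuf parses 2 to 3 orders of magnitude faster than typical XML and JSON deserializers, and about one order of magnitude faster than fastjson.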
