
How Far Can BSON Go?


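BSON is a new data format that someone proposed on top of JSON: it represents an object directly in binary form. There is almost nothing about it on Baidu at the moment, and very little on Google either. Whether a new idea gets accepted always takes some time to find out. I'm not going to predict its future, but I do think that proposing such a format is progress in itself. Compared with the JSON that everyone currently uses to POST and GET data, if your web application or Windows app starts using BSON as its data transfer format, I believe your application's security will be somewhat better.

As an aside, here is something I've noticed recently while riding the subway.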

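Anyone who takes public transport has experienced this: the area by the doors is packed, yet the people boarding behind still squeeze in by the doors rather than push their way to the middle of the car. So the people in the middle don't feel crowded at all, and one person may even have room for two. It looks like the figure below: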

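A kind of isobar forms inside the car too (the green line in the figure). If you always used to end up in the red zone, you might say it wasn't that you didn't want to reach the blue zone but that you couldn't get there, or that you were about to get off and so stood there blocking other people's way to the blue zone. In reality that is far from the whole story. On a bus, for example, we board at the front door and get off at the back, yet even when many passengers are riding to the last stop, you'll find everyone still crowding around the front door. I won't analyze the reasons any further. This is only about riding a bus, but ask yourself whether doing business in China isn't the same: how many people work in red-zone industries, or in one crowded segment of an industry?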

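Back to the topic: technology. It's the same for those of us who work in tech: you want to cross that isobar into the uncrowded zone, where life is more comfortable and the pay is better too. It's well worth thinking about.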

There isn't much material on BSON. Here are a few pieces I managed to find.

BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.

BSON can be compared to binary interchange formats, like Protocol Buffers. BSON is more "schema-less" than Protocol Buffers, which can give it an advantage in flexibility but also a slight disadvantage in space efficiency (BSON has overhead for field names within the serialized data).

BSON was designed to have the following three characteristics:

  1. Lightweight

    Keeping spatial overhead to a minimum is important for any data representation format, especially when used over the network.

  2. Traversable

    BSON is designed to be traversed easily. This is a vital property in its role as the primary data representation for MongoDB.

  3. Efficient

    Encoding data to BSON and decoding from BSON can be performed very quickly in most languages due to the use of C data types.
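To make the "traversable" point concrete, here is a minimal Python sketch that lists the top-level field names of a BSON document without decoding any values, relying only on the length information that each value carries. The function name `top_level_names` and the `FIXED_SIZES` table are illustrative assumptions, and only a handful of element types from the BSON specification are handled.

```python
import struct

# Value sizes for a few fixed-width BSON types (type byte -> bytes to skip).
FIXED_SIZES = {0x01: 8, 0x08: 1, 0x10: 4, 0x12: 8}  # double, bool, int32, int64

def top_level_names(doc: bytes) -> list:
    """Return top-level field names, skipping over values without decoding them."""
    (total,) = struct.unpack_from("<i", doc, 0)
    if total != len(doc):
        raise ValueError("length prefix does not match the buffer size")
    pos, names = 4, []
    while doc[pos] != 0x00:                  # \x00 terminates the document
        type_byte = doc[pos]
        end = doc.index(b"\x00", pos + 1)    # e_name is a NUL-terminated cstring
        names.append(doc[pos + 1:end].decode("utf-8"))
        pos = end + 1
        if type_byte in FIXED_SIZES:
            pos += FIXED_SIZES[type_byte]
        elif type_byte == 0x02:              # string: int32 length, then the bytes
            (n,) = struct.unpack_from("<i", doc, pos)
            pos += 4 + n
        elif type_byte in (0x03, 0x04):      # embedded document / array
            (n,) = struct.unpack_from("<i", doc, pos)
            pos += n                         # the size already includes its own prefix
        else:
            raise ValueError(f"type 0x{type_byte:02x} is not handled in this sketch")
    return names

# {"valid": true} encoded as BSON
print(top_level_names(bytes.fromhex("0d0000000876616c6964000100")))  # ['valid']
```

Because an embedded document or array announces its own size, the loop can jump over it in one step, which is the property the list above describes.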

BSON is a binary-encoded serialization of JSON-like documents, which essentially means it's an efficient way of transferring information. Part of my work on the MongoDB NoRM drivers, discussed in more detail by Rob Conery, is to write an efficient and maintainable BSON serializer and deserializer. The goal of the serializer is that you give it a .NET object and you get a byte array out of it which represents valid BSON. The deserializer does the opposite: give it a byte array and out pops your object. Of course, there are limits to what they can do; they are mostly meant to be used against POCO/domain entities.

Grammar
The first thing to understand when building serializers is how to read grammar. In programming languages, grammar is a way to express the valid keywords and values a parser might run into. Both the JSON and BSON grammars are great to learn, given how simple yet powerful they are. The JSON grammar, available on the homepage of json.org, gives a nice representation of what valid JSON should look like. The BSON grammar, available at bsonspec.org under the specification button, follows a more traditional dialect. Essentially, you have symbols on the left and expressions on the right. The expressions can, and often will, be made up of additional symbols and/or actual values. Eventually though, you'll end up with a symbol which is only made up of values, which means you can stop going down the rabbit hole. It's also very common for a child symbol to reference a parent symbol, but eventually something breaks this cycle.

An Example
So, say we wanted to serialize the following JSON:

{"valid": true}

Everything in BSON starts with a document. From the BSON specification, we can see that a document is made up of a 32-bit integer (representing the total size of the document, including the integer itself), another symbol called an e_list, and finally a termination character. As a start, we'd have something like:
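A minimal Python sketch of that first step, assuming the e_list has already been encoded. The helper name `wrap_document` is hypothetical; the little-endian byte order of the length prefix comes from the BSON specification.

```python
import struct

def wrap_document(e_list: bytes) -> bytes:
    """Wrap an already-encoded e_list as a BSON document."""
    # The total size counts the 4-byte prefix, the e_list, and the trailing \x00.
    total_size = 4 + len(e_list) + 1
    return struct.pack("<i", total_size) + e_list + b"\x00"

# With an empty e_list this yields the smallest valid document.
print(wrap_document(b"").hex(" "))  # 05 00 00 00 00
```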

Now, an e_list itself is made up of a symbol called an element followed by another e_list or an empty string. An element is made up of a single-byte type (with \x08 representing a boolean), a symbol called e_name, and a byte value for true or false. So now we have:
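As a sketch in Python, a boolean element could be assembled like this. The function name `encode_bool_element` is hypothetical, and the e_name bytes are taken as given here rather than derived.

```python
def encode_bool_element(e_name: bytes, value: bool) -> bytes:
    """Build a boolean element: type byte, e_name, then \\x01 (true) or \\x00 (false)."""
    return b"\x08" + e_name + (b"\x01" if value else b"\x00")

element = encode_bool_element(b"valid\x00", True)
print(element.hex(" "))  # 08 76 61 6c 69 64 00 01
```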

The only thing missing now is our e_name (which represents the word "valid" in the original JSON). An e_name is really just a cstring, which is our value UTF-8 encoded into an array of bytes with a trailing byte of \x00:
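In Python, a cstring encoder might look like the following sketch. The name `encode_cstring` is an assumption; the check for an embedded NUL byte reflects the fact that a cstring is terminated by its first \x00.

```python
def encode_cstring(text: str) -> bytes:
    """UTF-8 encode the text and append the terminating \\x00 byte."""
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("a cstring cannot contain an embedded NUL byte")
    return raw + b"\x00"

print(encode_cstring("valid"))  # b'valid\x00'
```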

Our final byte array looks something like:
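Putting the pieces together, this Python sketch produces the complete byte array for {"valid": true}. The function name `serialize_bool_document` is hypothetical, and the sketch handles only a single boolean field.

```python
import struct

def serialize_bool_document(name: str, value: bool) -> bytes:
    """Serialize a one-field document whose only value is a boolean."""
    e_name = name.encode("utf-8") + b"\x00"                       # cstring
    element = b"\x08" + e_name + (b"\x01" if value else b"\x00")  # boolean element
    total_size = 4 + len(element) + 1                             # prefix + e_list + terminator
    return struct.pack("<i", total_size) + element + b"\x00"

data = serialize_bool_document("valid", True)
print(data.hex(" "))  # 0d 00 00 00 08 76 61 6c 69 64 00 01 00
```

The leading 0d is the total length, 13 bytes, stored as a little-endian 32-bit integer.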

Serializing a single bool value might be the simplest of cases, but once you understand that, you're well on your way to being able to serialize anything. Sure, serializing an array might be a bit trickier, since an array is itself encoded as a document whose field names are the element indexes ("0", "1", and so on), but the challenge is mostly one of implementation rather than concept.
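To illustrate the array case, here is a Python sketch that serializes {"flags": [true, false]}. All helper names are hypothetical; the array type byte \x04 and the convention of using "0", "1", ... as keys come from the BSON specification.

```python
import struct

def cstring(text: str) -> bytes:
    return text.encode("utf-8") + b"\x00"

def document(e_list: bytes) -> bytes:
    return struct.pack("<i", 4 + len(e_list) + 1) + e_list + b"\x00"

def bool_element(name: str, value: bool) -> bytes:
    return b"\x08" + cstring(name) + (b"\x01" if value else b"\x00")

def bool_array_element(name: str, values: list) -> bytes:
    # An array is an embedded document whose keys are the indexes "0", "1", ...
    inner = b"".join(bool_element(str(i), v) for i, v in enumerate(values))
    return b"\x04" + cstring(name) + document(inner)

data = document(bool_array_element("flags", [True, False]))
print(len(data))  # 25
```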

Json.NET (Newtonsoft) already ships a BSON serialization and deserialization library for .NET that anyone can use.

I'll upload the code shortly so you can take a look.

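Compared with JSON, BSON is less demanding on network speed, although in many cases its encoded length is actually longer than the equivalent JSON. There also aren't many JavaScript libraries for serializing BSON yet, so its future is still a mystery.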
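The point about length is easy to check on a small case. This Python sketch encodes {"n": 1} both ways; the helper `bson_int32_document` is hypothetical and handles only a single 32-bit integer field (type byte \x10 in the BSON specification).

```python
import json
import struct

def bson_int32_document(name: str, value: int) -> bytes:
    """Serialize a one-field document whose only value is a 32-bit integer."""
    element = b"\x10" + name.encode("utf-8") + b"\x00" + struct.pack("<i", value)
    return struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"

json_bytes = json.dumps({"n": 1}, separators=(",", ":")).encode("utf-8")
bson_bytes = bson_int32_document("n", 1)
print(len(json_bytes), len(bson_bytes))  # 7 12
```

Small integers take a fixed four bytes in BSON, and every document carries a length prefix and a terminator, so short documents like this one come out larger than compact JSON.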

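My own thinking is that BSON would make fetching images (stored as binary) over Ajax a lot easier. Today we all handle them on the server side, because JSON cannot carry binary data directly. Perhaps some expert will come along and bring BSON's advantages to light.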
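As a rough illustration of the binary case, this Python sketch stores the same payload as a BSON binary element and as a base64 string inside JSON. The helper `bson_binary_document`, the field name "img", and the stand-in payload are all assumptions; the binary layout (type byte \x05, int32 length, subtype byte, raw bytes) follows the BSON specification.

```python
import base64
import json
import struct

def bson_binary_document(name: str, payload: bytes) -> bytes:
    """Serialize a one-field document holding raw bytes (generic subtype \\x00)."""
    element = (b"\x05" + name.encode("utf-8") + b"\x00"
               + struct.pack("<i", len(payload)) + b"\x00" + payload)
    return struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"

payload = bytes(range(256)) * 4  # 1024 bytes standing in for image data
bson_doc = bson_binary_document("img", payload)
json_doc = json.dumps({"img": base64.b64encode(payload).decode("ascii")}).encode("utf-8")

# The raw bytes appear unchanged inside the BSON document; JSON needs base64.
print(len(bson_doc), len(json_doc))
```

Base64 expands the data by roughly a third, while the BSON version adds only a few bytes of framing around the raw payload.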