How Far Can BSON Go?
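BSON is a new data format that someone came up with on top of JSON: it represents an object directly as binary data. At the moment there is essentially nothing about it on Baidu, and very little on Google either. Whether people accept something new always takes a while to find out. I'm not going to try to predict its future, but I do think that proposing a format like this is a step forward in itself. Everyone currently POSTs and GETs data as JSON; if your web application or Windows app starts using BSON as its transport format, the payload is at least no longer readable at a glance. Bear in mind that a binary encoding is not encryption, so this is obscurity rather than real security.
As an aside, here is something I noticed recently on the subway.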
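Anyone who rides public transport has been through this: the area by the doors is packed, yet the people boarding behind keep squeezing into that same spot instead of making the effort to push through to the middle of the carriage. So the people in the middle don't feel crowded at all, and some even have room for two. It looks something like this:
[Figure: a carriage with a crowded red zone by the doors, a roomy blue zone in the middle, and a green line between them]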
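An isobar forms inside the carriage too (the green line in the figure). If you have always ended up in the red zone, you might say it's not that you don't want to move to the blue zone but that you can't get through, or that you're getting off soon, and that is why you are standing there blocking everyone else's way to the blue zone. In practice that is far from the whole story. On buses, for instance, we board at the front door and get off at the back, yet even when most passengers are riding to the terminus, everyone still crowds around the front door. I won't analyze the causes further. This is only about riding a bus, but think about doing business in China: how many people are all working in red-zone industries, or in the same narrow segment of one industry?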
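Back to the topic, technology. It's the same for those of us in tech: you need to cross that isobar into the roomy zone, where life is more comfortable and the pay is better too. It's worth spending some time thinking about.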
There isn't much material on BSON, so let me paste a few things I found.
BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
BSON can be compared to binary interchange formats, like Protocol Buffers. BSON is more "schema-less" than Protocol Buffers, which can give it an advantage in flexibility but also a slight disadvantage in space efficiency (BSON has overhead for field names within the serialized data).
BSON was designed to have the following three characteristics:
- Lightweight: Keeping spatial overhead to a minimum is important for any data representation format, especially when used over the network.
- Traversable: BSON is designed to be traversed easily. This is a vital property in its role as the primary data representation for MongoDB.
- Efficient: Encoding data to BSON and decoding from BSON can be performed very quickly in most languages due to the use of C data types.
BSON is a binary-encoded serialization of JSON-like documents, which essentially means it's an efficient way of transferring information. Part of my work on the MongoDB NoRM drivers, discussed in more detail by Rob Conery, is to write an efficient and maintainable BSON serializer and deserializer. The goal of the serializer is that you give it a .NET object and you get a byte array out of it which represents valid BSON. The deserializer does the opposite - give it a byte array and out pops your object. Of course, there are limits to what they can do - they are mostly meant to be used against POCO/domain entities.
Grammar
The first thing to understand when building serializers is how to read a grammar. In programming languages, a grammar is a way to express the valid keywords and values a parser might run into. Both the JSON and BSON grammars are great to learn from, given how simple yet powerful they are. The JSON grammar, available on the homepage of json.org, gives a nice representation of what valid JSON should look like. The BSON grammar, available at bsonspec.org under the specification button, follows a more traditional dialect. Essentially, you have symbols on the left and expressions on the right. The expressions can, and often will, be made up of additional symbols and/or actual values. Eventually though, you'll end up with a symbol which is only made up of values - which means you can stop going down the rabbit hole. It's also very common for a child symbol to reference a parent symbol - but eventually something breaks this cycle.
An Example
So, say we wanted to serialize the following JSON:
{"valid": true}
Everything in BSON starts with a document. From the BSON specification, we can see that a document is made up of a 32-bit integer (representing the total size of the document, including the integer itself), another symbol called an e_list, and finally a termination character. As a start, we'd have something like:
    [int32 size] [e_list] [0x00]
Now, an e_list itself is made up of a symbol called an element followed by another e_list or an empty string. An element is made up of a single type byte (with \x08 representing a boolean), a symbol called e_name, and a byte value for true or false. So now we have:
    [int32 size] [0x08] [e_name] [0x01] [0x00]
The only thing missing now is our e_name (which represents the word "valid" in the original JSON). An e_name is really just a cstring, which is our value UTF-8 encoded into an array of bytes with a trailing byte of \x00:
    0x76 0x61 0x6C 0x69 0x64 0x00
Our final byte array looks something like:
    0x0D 0x00 0x00 0x00 0x08 0x76 0x61 0x6C 0x69 0x64 0x00 0x01 0x00
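The walkthrough above can be reproduced in a few lines. This is only a minimal Python sketch built on the standard `struct` module, not the NoRM serializer; the helper names (`encode_cstring`, `encode_bool_element`, `encode_document`) are made up for illustration, and it assumes the little-endian int32 size field described in the BSON specification.

```python
import struct


def encode_cstring(text: str) -> bytes:
    """cstring: the UTF-8 bytes of the text followed by a 0x00 terminator."""
    return text.encode("utf-8") + b"\x00"


def encode_bool_element(name: str, value: bool) -> bytes:
    """element: type byte 0x08, the e_name (a cstring), then 0x01 or 0x00."""
    return b"\x08" + encode_cstring(name) + (b"\x01" if value else b"\x00")


def encode_document(e_list: bytes) -> bytes:
    """document: int32 total size (little-endian), the e_list, then 0x00."""
    # The size counts the 4-byte size field itself and the trailing terminator.
    size = 4 + len(e_list) + 1
    return struct.pack("<i", size) + e_list + b"\x00"


# {"valid": true}
doc = encode_document(encode_bool_element("valid", True))
print(doc.hex(" "))  # 0d 00 00 00 08 76 61 6c 69 64 00 01 00
```

The document comes to 13 bytes: 4 for the size, 1 for the type, 6 for the name with its terminator, 1 for the value and 1 for the document terminator.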
Serializing a single bool value might be the simplest of cases, but once you understand that, you're well on your way to being able to serialize anything. Sure, serializing an array might be a bit trickier, since an array is itself encoded as an embedded document whose keys are the element indexes ("0", "1", and so on) - but the challenge is mostly one of implementation rather than concept.
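In the BSON specification an array element uses type byte \x04 and its value is a document keyed by the decimal indexes "0", "1", and so on. Here is a rough Python sketch of that idea, limited to arrays of booleans; the helper names and the field name "flags" are hypothetical and chosen only for this example.

```python
import struct


def cstring(text: str) -> bytes:
    # UTF-8 bytes plus the 0x00 terminator.
    return text.encode("utf-8") + b"\x00"


def document(e_list: bytes) -> bytes:
    # int32 size (little-endian, includes itself and the terminator), e_list, 0x00.
    return struct.pack("<i", 4 + len(e_list) + 1) + e_list + b"\x00"


def bool_element(name: str, value: bool) -> bytes:
    # Type 0x08 is a boolean.
    return b"\x08" + cstring(name) + (b"\x01" if value else b"\x00")


def bool_array_element(name: str, values: list) -> bytes:
    # Type 0x04 is an array; its value is a document keyed "0", "1", ...
    inner = document(
        b"".join(bool_element(str(i), v) for i, v in enumerate(values))
    )
    return b"\x04" + cstring(name) + inner


# {"flags": [true, false]}
doc = document(bool_array_element("flags", [True, False]))
print(len(doc), doc.hex(" "))
```

The same `document` helper is used for both the outer document and the array body, which is why the tricky part ends up being bookkeeping rather than anything new in the grammar.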
Newtonsoft's Json.NET now has a BSON serialization and deserialization library that you can use. It is written for .NET.
I'll upload the code shortly so you can take a look.
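Compared with JSON, BSON is meant to be quicker to encode, decode and traverse, but that does not always mean less data on the wire: in many cases the encoded document is actually longer than the equivalent JSON. There also aren't many JavaScript libraries for serializing BSON yet, so its future is still a mystery.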
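Which format comes out shorter depends on the data. As a rough illustration, the Python sketch below hand-encodes a single int32 field (type byte \x10 in the BSON specification) and compares it with compact JSON; the helper names are hypothetical and the two sample documents are chosen only to show both outcomes.

```python
import json
import struct


def cstring(text: str) -> bytes:
    # UTF-8 bytes plus the 0x00 terminator.
    return text.encode("utf-8") + b"\x00"


def document(e_list: bytes) -> bytes:
    # int32 size (little-endian, includes itself and the terminator), e_list, 0x00.
    return struct.pack("<i", 4 + len(e_list) + 1) + e_list + b"\x00"


def int32_element(name: str, value: int) -> bytes:
    # Type 0x10 is a 32-bit integer, always stored in 4 bytes.
    return b"\x10" + cstring(name) + struct.pack("<i", value)


def compact_json(obj) -> bytes:
    # JSON with no optional whitespace, encoded as UTF-8.
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


json_small = compact_json({"a": 1})
bson_small = document(int32_element("a", 1))
json_large = compact_json({"a": 2000000000})
bson_large = document(int32_element("a", 2000000000))

print(len(json_small), len(bson_small))  # 7 12
print(len(json_large), len(bson_large))  # 16 12
```

A small integer costs one character in JSON but always four bytes in BSON, plus the size prefix and terminators, so short documents full of small values tend to grow.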
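My own guess is that BSON would make fetching images (stored as binary) over Ajax a lot easier. Today we have to handle that on the server side, because JSON has no way to carry raw binary data. Perhaps someone more capable than me will be the one to bring out BSON's real advantages.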
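To make the binary case concrete: the BSON specification defines a binary element (type byte \x05) that carries an int32 length, a one-byte subtype and then the raw bytes, whereas JSON needs something like base64. The Python sketch below compares the two; the helper names, the field name "img" and the 1024-byte stand-in payload are all assumptions for illustration, not real image data.

```python
import base64
import json
import struct


def cstring(text: str) -> bytes:
    # UTF-8 bytes plus the 0x00 terminator.
    return text.encode("utf-8") + b"\x00"


def document(e_list: bytes) -> bytes:
    # int32 size (little-endian, includes itself and the terminator), e_list, 0x00.
    return struct.pack("<i", 4 + len(e_list) + 1) + e_list + b"\x00"


def binary_element(name: str, payload: bytes) -> bytes:
    # Type 0x05 is binary: int32 payload length, subtype 0x00 (generic), raw bytes.
    return (
        b"\x05" + cstring(name)
        + struct.pack("<i", len(payload)) + b"\x00" + payload
    )


# Stand-in for image bytes; a real application would read them from a file.
payload = bytes(range(256)) * 4

bson_doc = document(binary_element("img", payload))
json_doc = json.dumps(
    {"img": base64.b64encode(payload).decode("ascii")},
    separators=(",", ":"),
).encode("utf-8")

print(len(payload), len(bson_doc), len(json_doc))  # 1024 1039 1378
```

For this payload BSON adds 15 bytes of framing, while the base64 text in JSON is roughly a third larger than the raw data and still has to be decoded on the receiving side.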