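Serializing and deserializing C++ in-memory data structures to and from binary files
Use case
Many back-end retrieval servers build an in-memory index from files at startup. This step is often slow, which makes server startup slow, and the cost becomes more noticeable when many servers are involved.
Index construction can instead be moved into a separate program. That program builds the index structure from the raw files and serializes it to a local binary file. At startup the server only needs to read the binary file to reconstruct the index, which can greatly reduce startup time.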
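The split described above can be sketched as two functions, one for the offline builder and one for server startup. This is only a minimal sketch under stated assumptions: `IndexEntry`, `BuildAndSaveIndex`, `LoadIndex`, and the file layout (a 64-bit count followed by the raw entries) are hypothetical names and choices made for illustration, and a real index is usually more complex than a sorted vector.

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical index entry: maps a term id to a document id.
struct IndexEntry
{
    std::uint32_t term_id;
    std::uint32_t doc_id;
};
static_assert(std::is_trivially_copyable<IndexEntry>::value,
              "IndexEntry is written to disk bytewise");

// Offline step (separate builder program): sort the raw entries and dump them.
bool BuildAndSaveIndex(std::vector<IndexEntry> raw, const std::string &path)
{
    std::sort(raw.begin(), raw.end(), [](const IndexEntry &l, const IndexEntry &r) {
        return l.term_id != r.term_id ? l.term_id < r.term_id : l.doc_id < r.doc_id;
    });
    std::ofstream out(path, std::ios::binary);
    const std::uint64_t count = raw.size();
    out.write(reinterpret_cast<const char *>(&count), sizeof(count));
    if (count != 0)
        out.write(reinterpret_cast<const char *>(raw.data()), count * sizeof(IndexEntry));
    return static_cast<bool>(out); // false if the file could not be opened or written
}

// Server startup: read the prebuilt index back without rebuilding it.
bool LoadIndex(const std::string &path, std::vector<IndexEntry> &index)
{
    std::ifstream in(path, std::ios::binary);
    std::uint64_t count = 0;
    if (!in.read(reinterpret_cast<char *>(&count), sizeof(count)))
        return false;
    index.resize(count);
    if (count != 0)
        in.read(reinterpret_cast<char *>(index.data()), count * sizeof(IndexEntry));
    return static_cast<bool>(in);
}
```

The builder pays the sorting cost once, while each server only performs two bulk reads at startup.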
Example code
io.hpp wraps std::ifstream and std::ofstream and provides interfaces for serializing a vector to a binary file and deserializing a binary file back into a vector.
#ifndef IO_HPP
#define IO_HPP
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <type_traits>
#include <vector>

class FileReader
{
  public:
    explicit FileReader(const std::string &filename)
        : input_stream(filename, std::ios::binary)
    {
    }

    /* Read count objects of type T into pointer dest */
    template <typename T> void ReadInto(T *dest, const std::size_t count)
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "bytewise reading requires trivially copyable type");
        if (count == 0)
            return;
        input_stream.read(reinterpret_cast<char *>(dest), count * sizeof(T));
        const std::size_t bytes_read = static_cast<std::size_t>(input_stream.gcount());
        // Report short or failed reads instead of silently ignoring them.
        if (!input_stream || bytes_read != count * sizeof(T))
        {
            throw std::ios_base::failure("failed to read the requested number of bytes");
        }
    }

    /* Fill an already sized vector */
    template <typename T> void ReadInto(std::vector<T> &target)
    {
        ReadInto(target.data(), target.size());
    }

    /* Read a single object */
    template <typename T> void ReadInto(T &target)
    {
        ReadInto(&target, 1);
    }

    template <typename T> T ReadOne()
    {
        T tmp;
        ReadInto(tmp);
        return tmp;
    }

    std::uint32_t ReadElementCount32()
    {
        return ReadOne<std::uint32_t>();
    }

    std::uint64_t ReadElementCount64()
    {
        return ReadOne<std::uint64_t>();
    }

    /* Read a 64-bit element count, then that many elements */
    template <typename T> void DeserializeVector(std::vector<T> &data)
    {
        const auto count = ReadElementCount64();
        data.resize(count);
        ReadInto(data.data(), count);
    }

  private:
    std::ifstream input_stream;
};

class FileWriter
{
  public:
    explicit FileWriter(const std::string &filename)
        : output_stream(filename, std::ios::binary)
    {
    }

    /* Write count objects of type T from pointer src to output stream */
    template <typename T> void WriteFrom(const T *src, const std::size_t count)
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "bytewise writing requires trivially copyable type");
        if (count == 0)
            return;
        output_stream.write(reinterpret_cast<const char *>(src), count * sizeof(T));
        // Report failed writes instead of discarding the stream state.
        if (!output_stream)
        {
            throw std::ios_base::failure("failed to write the requested number of bytes");
        }
    }

    /* Write a single object */
    template <typename T> void WriteFrom(const T &target)
    {
        WriteFrom(&target, 1);
    }

    template <typename T> void WriteOne(const T tmp)
    {
        WriteFrom(tmp);
    }

    void WriteElementCount32(const std::uint32_t count)
    {
        WriteOne<std::uint32_t>(count);
    }

    void WriteElementCount64(const std::uint64_t count)
    {
        WriteOne<std::uint64_t>(count);
    }

    /* Write a 64-bit element count, then the elements themselves */
    template <typename T> void SerializeVector(const std::vector<T> &data)
    {
        const std::uint64_t count = data.size();
        WriteElementCount64(count);
        WriteFrom(data.data(), data.size());
    }

  private:
    std::ofstream output_stream;
};
#endif
binary_io.cpp
#include "io.hpp"
#include <iostream>
struct Data
{
    int a;
    double b;

    friend std::ostream &operator<<(std::ostream &out, const Data &data)
    {
        out << data.a << "," << data.b;
        return out;
    }
};

// Print every element of the vector on one line.
template <typename T> void printData(const std::vector<T> &data_vec)
{
    // Iterate by const reference to avoid copying each element.
    for (const auto &data : data_vec)
    {
        std::cout << "{" << data << "} ";
    }
    std::cout << std::endl;
}

// Write the vector to a binary file; the file is closed when file_writer goes out of scope.
template <typename T>
void serializeVector(const std::string &filename, const std::vector<T> &data_vec)
{
    FileWriter file_writer(filename);
    file_writer.SerializeVector<T>(data_vec);
}

// Read a vector back from a binary file written by serializeVector.
template <typename T>
void deserializeVector(const std::string &filename, std::vector<T> &data_vec)
{
    FileReader file_reader(filename);
    file_reader.DeserializeVector<T>(data_vec);
}

int main()
{
    std::vector<Data> vec1 = {{1, 1.1}, {2, 2.2}, {3, 3.3}, {4, 4.4}};
    std::cout << "before write to binary file.\n";
    printData(vec1);

    const std::string filename = "vector_data";
    std::cout << "serialize vector to binary file.\n";
    serializeVector<Data>(filename, vec1);

    std::vector<Data> vec2;
    deserializeVector<Data>(filename, vec2);
    std::cout << "vector read from binary file.\n";
    printData(vec2);
    return 0;
}
Compile the code
g++ -std=c++11 binary_io.cpp -o binary_io
Run the program
./binary_io
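Output
The program writes the in-memory vector to a binary file and then deserializes that file into a new vector. The contents printed before serialization and after deserialization are identical.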
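Since SerializeVector writes a 64-bit element count followed by the raw elements, the size of the resulting file can be predicted and checked. The sketch below is self-contained and does not use io.hpp; `ExpectedFileSize` and `ActualFileSize` are hypothetical helper names introduced only for illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Same layout as the Data struct in binary_io.cpp (without the stream operator).
struct Data
{
    int a;
    double b;
};

// Size in bytes of a file laid out as: 64-bit element count + raw elements.
template <typename T> std::uint64_t ExpectedFileSize(const std::uint64_t count)
{
    return sizeof(std::uint64_t) + count * sizeof(T);
}

// Actual size of a file on disk, or -1 if it cannot be opened.
std::int64_t ActualFileSize(const std::string &path)
{
    // std::ios::ate positions the stream at the end, so tellg() yields the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return -1;
    const std::streamoff size = in.tellg();
    return static_cast<std::int64_t>(size);
}
```

For the four-element vector in the example, the expected size is `sizeof(std::uint64_t) + 4 * sizeof(Data)`. Because sizeof(Data) includes any padding the compiler inserts between `a` and `b`, the exact number depends on the platform.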
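Notes
A data structure that is serialized to a file this way must satisfy std::is_trivially_copyable. The std::is_trivially_copyable trait was introduced in C++11. A TriviallyCopyable type has the following properties:
Every copy constructor is trivial or deleted
Every move constructor is trivial or deleted
Every copy assignment operator is trivial or deleted
Every move assignment operator is trivial or deleted
At least one of the above is non-deleted
The destructor is trivial and non-deleted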
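The properties listed above can be checked at compile time with std::is_trivially_copyable. The following is a small sketch; `Plain`, `WithString`, and `WithDtor` are hypothetical types made up for illustration.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Only scalar members and no user-provided special member functions.
struct Plain
{
    int a;
    double b;
};

// std::string has non-trivial copy operations and a non-trivial destructor.
struct WithString
{
    int id;
    std::string name;
};

// A user-provided destructor is never trivial, even if its body is empty.
struct WithDtor
{
    int a;
    ~WithDtor() {}
};

static_assert(std::is_trivially_copyable<Plain>::value,
              "Plain can be read and written bytewise");
static_assert(!std::is_trivially_copyable<WithString>::value,
              "WithString must not be read or written bytewise");
static_assert(!std::is_trivially_copyable<WithDtor>::value,
              "WithDtor has a non-trivial destructor");
static_assert(!std::is_trivially_copyable<std::vector<int>>::value,
              "a vector object itself is not trivially copyable");
```

This is why the reader and writer classes operate on the vector's elements through `data()` rather than on the vector object itself.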
This property of trivially copyable types is explained as follows:
Objects of trivially-copyable types are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read(). In general, a trivially copyable type is any type for which the underlying bytes can be copied to an array of char or unsigned char and into a new object of the same type, and the resulting object would have the same value as the original
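Only objects of trivially copyable types are guaranteed to keep the same value after being serialized to a binary file and then deserialized from that file back into memory.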
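The byte-copy guarantee quoted above can be sketched in memory, without any file involved, by copying an object into an array of unsigned char and then into a new object. `RoundTripThroughBytes` is a hypothetical helper name, and the sketch additionally assumes that T is default constructible.

```cpp
#include <cstring>
#include <type_traits>

struct Data
{
    int a;
    double b;
};

// Copy an object's bytes into a buffer and then into a fresh object of the same type.
template <typename T> T RoundTripThroughBytes(const T &source)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "bytewise copying requires trivially copyable type");
    unsigned char buffer[sizeof(T)];
    std::memcpy(buffer, &source, sizeof(T)); // object -> raw bytes
    T copy;
    std::memcpy(&copy, buffer, sizeof(T));   // raw bytes -> new object
    return copy;
}
```

Writing the buffer to a file and reading it back later is the same operation with the file acting as the intermediate byte array, as long as the reader uses the same type layout as the writer.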