
Leveldb Data Compaction Source Code Analysis (1)


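This section covers Leveldb's compaction process. The previous section covered how Leveldb looks up data (article link:). At the end of that section, when describing the level-n lookup, I suspect some readers were left puzzled: a binary search finds the sstable, and loading it through the cache finds the entry. The source code there may look confusing, but it is tied to Leveldb's compaction process, which is what this section explains.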

Compaction


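Leveldb has two kinds of compaction, minor compaction and major compaction. A minor compaction dumps the data of a memtable into an sstable, while a major compaction merges different sstables. The latter is more complex and will be explained with its source code later, so we start with minor compaction.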
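To picture what "merging sstables" means in a major compaction, here is a toy sketch, not Leveldb's code: two runs, each sorted by key, are merged in one pass, and when both contain the same key only the entry from the newer run survives. The `ToyMajorMerge` class and its `Entry` type are hypothetical. Real Leveldb merges internal keys (user key plus sequence number) through iterators and applies further rules about what may be dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: a "run" is a list of entries sorted by key with unique keys,
// standing in for an sstable.
final class ToyMajorMerge {
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    private ToyMajorMerge() {}

    // Two-pointer merge; on equal keys, the entry from `newer` shadows the one from `older`.
    static List<Entry> merge(List<Entry> older, List<Entry> newer) {
        List<Entry> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < older.size() || j < newer.size()) {
            if (j == newer.size()) {
                out.add(older.get(i++));
                continue;
            }
            if (i == older.size()) {
                out.add(newer.get(j++));
                continue;
            }
            int cmp = older.get(i).key.compareTo(newer.get(j).key);
            if (cmp < 0) {
                out.add(older.get(i++));
            }
            else if (cmp > 0) {
                out.add(newer.get(j++));
            }
            else {
                out.add(newer.get(j++)); // same key: keep only the newer version
                i++;
            }
        }
        return out;
    }
}
```

For example, merging an older run `a=1, c=1` with a newer run `b=2, c=2` gives `a=1, b=2, c=2`.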

minor Compaction


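A minor compaction is what happens when the memtable in memory reaches a certain size: it is frozen as the immutable memtable, and its data is then saved to an sstable file. First, let's look at the method that schedules compaction: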

    private void maybeScheduleCompaction()
    {
        checkState(mutex.isHeldByCurrentThread());

        if (backgroundCompaction != null) {
            // Already scheduled
        }
        else if (shuttingDown.get()) {
            // DB is being shutdown; no more background compactions
        }
        else if (immutableMemTable == null && manualCompaction == null && !versions.needsCompaction()) {
            // No work to be done
        }
        else {
            // Run backgroundCall() on the compaction executor; failures are recorded, not thrown
            backgroundCompaction = compactionExecutor.submit(new Callable<Void>() {
                @Override
                public Void call() throws Exception {
                    try {
                        backgroundCall();
                    }
                    catch (DatabaseShutdownException ignored) {
                    }
                    catch (Throwable e) {
                        backgroundException = e;
                    }
                    return null;
                }
            });
        }
    }
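The `!versions.needsCompaction()` condition asks the version set whether some level has grown too large. The sketch below models the size-based scoring used by upstream (C++) LevelDB. The constants are LevelDB's defaults as assumed here, not values read from this Java code: level 0 is scored by file count against a trigger of 4 files, each level L ≥ 1 by total bytes against a budget of 10 MB × 10^(L-1), and a best score of at least 1 means a compaction is needed. `CompactionScore` and its methods are hypothetical names, and LevelDB's seek-triggered compactions are left out.

```java
import java.util.List;

// Hypothetical sketch of LevelDB-style size-based compaction scoring.
// Seek-triggered compactions, which LevelDB also schedules, are omitted.
final class CompactionScore {
    static final int NUM_LEVELS = 7;            // LevelDB default (assumed)
    static final int L0_COMPACTION_TRIGGER = 4; // level-0 file-count trigger (assumed)

    private CompactionScore() {}

    // Byte budget for level >= 1: 10 MB at level 1, multiplied by 10 per deeper level.
    static double maxBytesForLevel(int level) {
        double result = 10.0 * 1048576.0;
        while (level > 1) {
            result *= 10;
            level--;
        }
        return result;
    }

    // fileSizes.get(level) holds the sizes in bytes of the files in that level.
    // The last level is never a compaction source, so it is not scored.
    static double bestScore(List<List<Long>> fileSizes) {
        double best = -1;
        for (int level = 0; level < NUM_LEVELS - 1 && level < fileSizes.size(); level++) {
            List<Long> files = fileSizes.get(level);
            double score;
            if (level == 0) {
                score = files.size() / (double) L0_COMPACTION_TRIGGER;
            }
            else {
                long total = 0;
                for (long size : files) {
                    total += size;
                }
                score = total / maxBytesForLevel(level);
            }
            best = Math.max(best, score);
        }
        return best;
    }

    static boolean needsCompaction(List<List<Long>> fileSizes) {
        return bestScore(fileSizes) >= 1.0;
    }
}
```

With this scoring, three level-0 files give a score of 0.75 (no compaction), while a fourth file raises it to 1.0.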

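maybeScheduleCompaction() submits a task to compactionExecutor that runs backgroundCall() on a background thread. It only does so when no compaction is already scheduled, the database is not shutting down, and there is work to do (an immutable memtable to flush, a manual compaction request, or a version that needs compaction). Next, the backgroundCall() method: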

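    private void backgroundCall()
            throws IOException
    {
        mutex.lock();
        try {
            if (backgroundCompaction == null) {
                return;
            }

            try {
                if (!shuttingDown.get()) {
                    backgroundCompaction();
                }
            }
            finally {
                backgroundCompaction = null;
            }
        }
        finally {
            try {
                // A compaction may leave more work behind, so try to schedule another one
                maybeScheduleCompaction();
            }
            finally {
                backgroundCondition.signalAll();
                mutex.unlock();
            }
        }
    }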
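The handshake between maybeScheduleCompaction() and backgroundCall() can be modeled with a small toy. Under a single lock, the `backgroundCompaction` future doubles as the "already scheduled" flag, so at most one background run is in flight; each run clears that field and then tries to schedule again, so work that arrived in the meantime is not lost. Rescheduling at the end of a background run is also what upstream LevelDB does. The `pendingWork` counter and the class itself are hypothetical stand-ins for the real compaction state.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the maybeScheduleCompaction()/backgroundCall() handshake.
// pendingWork is a hypothetical stand-in for "an immutable memtable or a level
// that needs compaction"; each background run consumes one unit of it.
final class ToyCompactionScheduler {
    private final ReentrantLock mutex = new ReentrantLock();
    private final Condition backgroundCondition = mutex.newCondition();
    private final ExecutorService compactionExecutor = Executors.newSingleThreadExecutor();
    private Future<?> backgroundCompaction; // non-null while a run is scheduled
    private int pendingWork;
    private int runs;

    void addWork(int units) {
        mutex.lock();
        try {
            pendingWork += units;
            maybeScheduleCompaction();
        }
        finally {
            mutex.unlock();
        }
    }

    // Caller must hold mutex. Keeps at most one background run in flight.
    private void maybeScheduleCompaction() {
        if (backgroundCompaction != null || pendingWork == 0) {
            return; // already scheduled, or no work to be done
        }
        backgroundCompaction = compactionExecutor.submit(this::backgroundCall);
    }

    private void backgroundCall() {
        mutex.lock();
        try {
            pendingWork--; // stand-in for the real compaction work
            runs++;
            backgroundCompaction = null;
            maybeScheduleCompaction(); // more work may remain: reschedule
            backgroundCondition.signalAll();
        }
        finally {
            mutex.unlock();
        }
    }

    // Waits until no work is pending and no run is scheduled; returns the number of runs.
    int awaitIdle() {
        mutex.lock();
        try {
            while (pendingWork > 0 || backgroundCompaction != null) {
                backgroundCondition.awaitUninterruptibly();
            }
            return runs;
        }
        finally {
            mutex.unlock();
        }
    }

    void shutdown() {
        compactionExecutor.shutdown();
    }
}
```

Adding three units of work, for instance, yields exactly three sequential runs on the single executor thread, never two at once.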

Next, the backgroundCompaction() method:

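    private void backgroundCompaction()
            throws IOException
    {
        checkState(mutex.isHeldByCurrentThread());

        // First flush the immutable memtable, if any (minor compaction)
        compactMemTableInternal();

        // Then choose a major compaction: the manual range if one was requested,
        // otherwise let the version set pick one
        Compaction compaction;
        if (manualCompaction != null) {
            compaction = versions.compactRange(manualCompaction.level,
                    new InternalKey(manualCompaction.begin, MAX_SEQUENCE_NUMBER, VALUE),
                    new InternalKey(manualCompaction.end, 0, DELETION));
        }
        else {
            compaction = versions.pickCompaction();
        }

        // ... the rest of the method (running the major compaction) is omitted here
    }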
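Why does a manual compaction use `(begin, MAX_SEQUENCE_NUMBER, VALUE)` and `(end, 0, DELETION)` as its bounds? Internal keys are ordered by user key ascending and then by sequence number descending (newest first), so the first bound is the smallest internal key that user key `begin` can have and the second is the largest that `end` can have. The range therefore covers every version of every user key in [begin, end]. The toy comparator below shows that ordering. It is a sketch under assumptions: `ToyInternalKey` is hypothetical, it ignores the value-type tiebreak Leveldb packs next to the sequence number, and `MAX_SEQUENCE_NUMBER` is set to LevelDB's 56-bit limit, `(1L << 56) - 1`, as an assumption.

```java
import java.util.Comparator;

// Hypothetical toy internal key: a user key plus a sequence number (value type omitted).
final class ToyInternalKey {
    static final long MAX_SEQUENCE_NUMBER = (1L << 56) - 1; // assumed 56-bit sequence limit

    final String userKey;
    final long sequence;

    ToyInternalKey(String userKey, long sequence) {
        this.userKey = userKey;
        this.sequence = sequence;
    }

    // User key ascending, then sequence number descending (newer versions sort first).
    static final Comparator<ToyInternalKey> ORDER =
            Comparator.comparing((ToyInternalKey k) -> k.userKey)
                    .thenComparing(Comparator.comparingLong((ToyInternalKey k) -> k.sequence).reversed());

    // True if key lies in the inclusive internal-key range [lo, hi].
    static boolean inRange(ToyInternalKey key, ToyInternalKey lo, ToyInternalKey hi) {
        return ORDER.compare(lo, key) <= 0 && ORDER.compare(key, hi) <= 0;
    }

    @Override
    public String toString() {
        return userKey + "@" + sequence;
    }
}
```

Sorting `b@1, b@5, a@3` with this order gives `a@3, b@5, b@1`: the newest version of `b` comes first, and with bounds `b@MAX` and `d@0` every version of `b`, `c` and `d` falls inside the range.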

Now let's look at compactMemTableInternal(), which is where the minor compaction happens:

    private void compactMemTableInternal()
            throws IOException
    {
        checkState(mutex.isHeldByCurrentThread());
        if (immutableMemTable == null) {
            return;
        }

        try {
            // Save the contents of the memtable as a new Table
            VersionEdit edit = new VersionEdit();
            Version base = versions.getCurrent();
            writeLevel0Table(immutableMemTable, edit, base);

            if (shuttingDown.get()) {
                throw new DatabaseShutdownException("Database shutdown during memtable compaction");
            }

            // Replace immutable memtable with the generated Table
            edit.setPreviousLogNumber(0);
            edit.setLogNumber(log.getFileNumber());  // Earlier logs no longer needed
            versions.logAndApply(edit);

            immutableMemTable = null;

            deleteObsoleteFiles();
        }
        finally {
            backgroundCondition.signalAll();
        }
    }
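The `edit.setLogNumber(log.getFileNumber())` step is what allows deleteObsoleteFiles() to reclaim write-ahead logs: once the new table is recorded in the manifest, logs older than the current one no longer hold any data that has not been written into an sstable. The hypothetical sketch below encodes the rule upstream LevelDB applies to log files (keep a log if its number is at least the recorded log number, or equals the recorded previous log number). The class and method names are illustrative, not this port's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the obsolete-log rule, modeled on upstream LevelDB.
final class ObsoleteLogs {
    private ObsoleteLogs() {}

    // A log is still needed if it is the current log (or newer), or if it is the
    // "previous" log recorded in the manifest. A prevLogNumber of 0 means "none"
    // (this sketch assumes no log file is ever numbered 0).
    static boolean keepLog(long number, long logNumber, long prevLogNumber) {
        return number >= logNumber || number == prevLogNumber;
    }

    // Returns the log file numbers that could be deleted, in input order.
    static List<Long> deletable(List<Long> logFiles, long logNumber, long prevLogNumber) {
        List<Long> out = new ArrayList<>();
        for (long n : logFiles) {
            if (!keepLog(n, logNumber, prevLogNumber)) {
                out.add(n);
            }
        }
        return out;
    }
}
```

For example, with logs 3, 5 and 8 on disk, log number 8 and previous log number 0 (as set by the edit above), logs 3 and 5 are deletable.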

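compactMemTableInternal() first checks whether immutableMemTable is null and returns immediately if it is. immutableMemTable is null whenever no frozen memtable is waiting to be flushed, for example right after Leveldb has been opened, before any memtable has filled up. Otherwise it writes the immutable memtable out with writeLevel0Table(), records the new file and the current log number in the version set through versions.logAndApply(edit), sets immutableMemTable to null, deletes files that are no longer needed, and finally wakes up any threads waiting on backgroundCondition. The source of writeLevel0Table() is: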

    private void writeLevel0Table(MemTable mem, VersionEdit edit, Version base)
            throws IOException
    {
        checkState(mutex.isHeldByCurrentThread());

        // skip empty mem table
        if (mem.isEmpty()) {
            return;
        }

        // write the memtable to a new sstable
        long fileNumber = versions.getNextFileNumber();
        pendingOutputs.add(fileNumber);
        mutex.unlock();
        FileMetaData meta;
        try {
            meta = buildTable(mem, fileNumber);
        }
        finally {
            mutex.lock();
        }
        pendingOutputs.remove(fileNumber);

        // Note that if file size is zero, the file has been deleted and
        // should not be added to the manifest.
        int level = 0;
        if (meta != null && meta.getFileSize() > 0) {
            Slice minUserKey = meta.getSmallest().getUserKey();
            Slice maxUserKey = meta.getLargest().getUserKey();
            if (base != null) {
                level = base.pickLevelForMemTableOutput(minUserKey, maxUserKey);
            }
            edit.addFile(level, meta);
        }
    }
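Despite the method's name, the file written by writeLevel0Table() does not always land in level 0: `base.pickLevelForMemTableOutput(minUserKey, maxUserKey)` may push it deeper. The sketch below is modeled on upstream LevelDB's rule, with its default constants assumed rather than read from this port. If the new file overlaps nothing in level 0, it moves down one level at a time (to level 2 at most) as long as it overlaps nothing in the next level and overlaps no more than 20 MB (10 × the 2 MB target file size) in the level after that. `MemTableLevelPicker` and `FileRange` are hypothetical, and user keys are compared as plain strings.

```java
import java.util.List;

// Hypothetical sketch modeled on upstream LevelDB's PickLevelForMemTableOutput.
// Files are reduced to a [smallest, largest] user-key range plus a size in bytes.
final class MemTableLevelPicker {
    static final int MAX_MEM_COMPACT_LEVEL = 2;                          // assumed default
    static final long MAX_GRANDPARENT_OVERLAP_BYTES = 10L * 2 * 1048576; // assumed: 10 x 2 MB

    static final class FileRange {
        final String smallest;
        final String largest;
        final long size;

        FileRange(String smallest, String largest, long size) {
            this.smallest = smallest;
            this.largest = largest;
            this.size = size;
        }

        boolean overlaps(String lo, String hi) {
            return largest.compareTo(lo) >= 0 && smallest.compareTo(hi) <= 0;
        }
    }

    private MemTableLevelPicker() {}

    static boolean anyOverlap(List<FileRange> files, String lo, String hi) {
        for (FileRange f : files) {
            if (f.overlaps(lo, hi)) {
                return true;
            }
        }
        return false;
    }

    static long overlapBytes(List<FileRange> files, String lo, String hi) {
        long sum = 0;
        for (FileRange f : files) {
            if (f.overlaps(lo, hi)) {
                sum += f.size;
            }
        }
        return sum;
    }

    // levels.get(i) lists the files of level i; at least MAX_MEM_COMPACT_LEVEL + 2 levels expected.
    static int pickLevel(List<List<FileRange>> levels, String lo, String hi) {
        int level = 0;
        if (anyOverlap(levels.get(0), lo, hi)) {
            return level; // overlaps existing level-0 data, so it must stay in level 0
        }
        while (level < MAX_MEM_COMPACT_LEVEL) {
            if (anyOverlap(levels.get(level + 1), lo, hi)) {
                break; // moving deeper would overlap the next level
            }
            if (overlapBytes(levels.get(level + 2), lo, hi) > MAX_GRANDPARENT_OVERLAP_BYTES) {
                break; // too many bytes overlap in the level after that
            }
            level++;
        }
        return level;
    }
}
```

On an empty tree the new file therefore goes straight to level 2, while any overlap with level 0 keeps it in level 0.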

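writeLevel0Table() takes the next file number from the version set and adds it to pendingOutputs, so that obsolete-file cleanup will not delete the file while it is still being written. It then releases the mutex, so other threads are not blocked during the disk I/O, writes the data in mem to the new file with buildTable(), and gets back the file's metadata meta. If the file is non-empty, meta is added to the VersionEdit with edit.addFile(level, meta), so that the version produced from this edit can find the file during lookups. Note that level comes from pickLevelForMemTableOutput() and is not necessarily 0. Now let's look at buildTable():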

    private FileMetaData buildTable(SeekingIterable<InternalKey, Slice> data, long fileNumber)
            throws IOException
    {
        File file = new File(databaseDir, Filename.tableFileName(fileNumber));
        try {
            InternalKey smallest = null;
            InternalKey largest = null;
            FileChannel channel = new FileOutputStream(file).getChannel();
            try {
                TableBuilder tableBuilder = new TableBuilder(options, channel, new InternalUserComparator(internalKeyComparator));

                for (Entry<InternalKey, Slice> entry : data) {
                    // update keys
                    InternalKey key = entry.getKey();
                    if (smallest == null) {
                        smallest = key;
                    }
                    largest = key;

                    tableBuilder.add(key.encode(), entry.getValue());
                }

                tableBuilder.finish();
            }
            finally {
                try {
                    channel.force(true);
                }
                finally {
                    channel.close();
                }
            }

            if (smallest == null) {
                return null;
            }
            FileMetaData fileMetaData = new FileMetaData(fileNumber, file.length(), smallest, largest);

            // verify table can be opened
            tableCache.newIterator(fileMetaData);

            pendingOutputs.remove(fileNumber);

            return fileMetaData;

        }
        catch (IOException e) {
            file.delete();
            throw e;
        }
    }

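This method simply writes the contents of the memtable to the file in sorted order. It does not merge with other files or discard anything: every entry, including older versions of the same key and deletion markers, is written out. Along the way it records the smallest and largest keys, forces the file to disk, checks that the new table can be opened through the table cache, and returns the file's metadata.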
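A toy model makes the "no deduplication" point concrete. Entries are kept in internal-key order (user key ascending, newest sequence number first), and the flush writes all of them, so two writes to the same key and a delete all reach the output file. As in buildTable(), the first key iterated is the smallest and the last is the largest. `ToyMemTableFlush` and its string deletion marker are hypothetical, not Leveldb types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a minor compaction (memtable flush); all names are hypothetical.
final class ToyMemTableFlush {
    static final String DELETED = "<deleted>"; // toy stand-in for a deletion marker

    // Internal-key-like order: user key ascending, then sequence number descending.
    static final class Key implements Comparable<Key> {
        final String userKey;
        final long seq;

        Key(String userKey, long seq) {
            this.userKey = userKey;
            this.seq = seq;
        }

        @Override
        public int compareTo(Key other) {
            int c = userKey.compareTo(other.userKey);
            return c != 0 ? c : Long.compare(other.seq, seq); // newer (higher seq) first
        }

        @Override
        public String toString() {
            return userKey + "@" + seq;
        }
    }

    static final class Flushed {
        final List<String> entries = new ArrayList<>();
        Key smallest;
        Key largest;
    }

    private final TreeMap<Key, String> memtable = new TreeMap<>();
    private long nextSeq = 1;

    void put(String userKey, String value) {
        memtable.put(new Key(userKey, nextSeq++), value);
    }

    void delete(String userKey) {
        memtable.put(new Key(userKey, nextSeq++), DELETED);
    }

    // Like buildTable(): write every entry in sorted order with no deduplication;
    // the first key seen is the smallest, the last one seen is the largest.
    Flushed flush() {
        if (memtable.isEmpty()) {
            return null; // nothing to write
        }
        Flushed out = new Flushed();
        for (Map.Entry<Key, String> e : memtable.entrySet()) {
            if (out.smallest == null) {
                out.smallest = e.getKey();
            }
            out.largest = e.getKey();
            out.entries.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }
}
```

After `put("b", ...)`, `put("a", ...)`, `put("b", ...)` and `delete("a")`, the flush writes four entries, `a@4`, `a@2`, `b@3`, `b@1`, even though only two user keys exist.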

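Because a minor compaction keeps every entry, this also explains why, as described in the previous section, level-0 lookups have to check the files one by one, ordered by how recently they were written. A minor compaction runs each time a memtable fills up and does no deduplication, so level-0 files can have overlapping key ranges and often contain the same keys; a lookup must check them from newest to oldest to find the most recently written value of a key. Files in level 1 and above, by contrast, are kept with disjoint key ranges, which is why a binary search there finds the single candidate sstable.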
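The difference between the two lookups can be sketched with a toy model. In level 0, every file whose key range covers the key is a candidate, and candidates are probed from the highest file number down (file numbers only increase, so higher means newer). In level 1 and above, the files are disjoint and sorted, so a binary search on each file's largest key selects the single file to probe. `ToyLevelLookup` is hypothetical, and it ignores deletion markers, whereas a real lookup stops at the first entry it finds for the key, even if that entry is a deletion.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy lookup over "sstables" reduced to a file number and a sorted map; hypothetical names.
final class ToyLevelLookup {
    static final class Table {
        final long fileNumber;
        final TreeMap<String, String> data;
        final String smallest;
        final String largest;

        Table(long fileNumber, Map<String, String> contents) { // contents must be non-empty
            this.fileNumber = fileNumber;
            this.data = new TreeMap<>(contents);
            this.smallest = data.firstKey();
            this.largest = data.lastKey();
        }
    }

    private ToyLevelLookup() {}

    // Level 0: files may overlap, so probe every covering file, newest (highest number) first.
    static String getLevel0(List<Table> level0, String key) {
        List<Table> candidates = new ArrayList<>();
        for (Table t : level0) {
            if (t.smallest.compareTo(key) <= 0 && key.compareTo(t.largest) <= 0) {
                candidates.add(t);
            }
        }
        candidates.sort(Comparator.comparingLong((Table t) -> t.fileNumber).reversed());
        for (Table t : candidates) {
            String value = t.data.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Level >= 1: files are disjoint and sorted by key, so binary-search for the
    // first file whose largest key is >= key; only that file can contain it.
    static String getLevelN(List<Table> sortedFiles, String key) {
        int lo = 0;
        int hi = sortedFiles.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedFiles.get(mid).largest.compareTo(key) < 0) {
                lo = mid + 1;
            }
            else {
                hi = mid;
            }
        }
        if (lo == sortedFiles.size()) {
            return null; // key is past the last file
        }
        Table t = sortedFiles.get(lo);
        return key.compareTo(t.smallest) < 0 ? null : t.data.get(key);
    }
}
```

If files 5 and 9 in level 0 both contain `a`, `getLevel0` returns the value from file 9; in a level holding files `a..f` and `k..p`, a lookup for `h` probes only `k..p` and finds nothing.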


Major compaction is a longer process, so it is left for the next section.