MongoDB Source Code Analysis (5): Query Part 2, Database Loading in mongod

阿新 • Published: 2019-01-23

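The previous article covered how the client sends a query request. This one turns to the server side and follows the request from the moment the server starts handling it until the database has been loaded correctly. The main flow consists of reading the database in and authenticating the user.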

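mongod handles client requests in MyMessageHandler::process in mongo/db/db.cpp, which calls the function assembleResponse to build the response. We start the analysis from this function. The code is long, so side branches and unrelated code have been removed.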

    void assembleResponse( Message &m, DbResponse &dbresponse, const HostAndPort& remote ) {
        if ( op == dbQuery ) {
            if( strstr(ns, ".$cmd") ) {
                isCommand = true;
                // Write the diagnostic log. The default level is 0 (off); enable it at startup with --diaglog x:
                // 0 = off; 1 = writes, 2 = reads, 3 = both, 7 = log a few reads, and all writes.
                opwrite(m);
                if( strstr(ns, ".$cmd.sys.") ) {
                    if( strstr(ns, "$cmd.sys.inprog") ) {
                        inProgCmd(m, dbresponse);// list the operations currently in progress
                        return;
                    }
                    if( strstr(ns, "$cmd.sys.killop") ) {
                        killOp(m, dbresponse);// terminate an in-progress operation
                        return;
                    }
                    if( strstr(ns, "$cmd.sys.unlock") ) {
                        unlockFsync(ns, m, dbresponse);
                        return;
                    }
                }
            }
            else {
                opread(m);
            }
        }
        else if( op == dbGetMore ) {
            opread(m);
        }
        else {
            opwrite(m);
        }
        // slowMS is a startup parameter (default 100 ms, set with --slowms). With --profile 1, mongodb
        // records only slow operations, i.e. those that take longer than slowMS; with --profile 2 it
        // records all operations.
        long long logThreshold = cmdLine.slowMS;
        bool shouldLog = logLevel >= 1;
        if ( op == dbQuery ) {
            if ( handlePossibleShardedMessage( m , &dbresponse ) )// sharding related; a later article covers it
                return;
            receivedQuery(c , dbresponse, m );// the real query entry point
        }
        else if ( op == dbGetMore ) {// the query has already run; this is the entry point for fetching more results
            if ( ! receivedGetMore(dbresponse, m, currentOp) )
                shouldLog = true;
        }
                if ( op == dbKillCursors ) {
                    currentOp.ensureStarted();
                    logThreshold = 10;
                    receivedKillCursors(m);
                }
                else if ( op == dbInsert ) {// entry point for inserts
                    receivedInsert(m, currentOp);
                }
                else if ( op == dbUpdate ) {// entry point for updates
                    receivedUpdate(m, currentOp);
                }
                else if ( op == dbDelete ) {// entry point for deletes
                    receivedDelete(m, currentOp);
                }
        // The operation gets recorded for one of two reasons: (1) --profile 2 was set at startup, so every
        // operation is recorded; (2) --profile 1 was set and the operation took longer than slowMS.
        if ( currentOp.shouldDBProfile( debug.executionTime ) ) {
            // performance profiling is on
            // (the preceding if branches were removed from this excerpt; they skip recording when the lock cannot be taken)
            else {
                Lock::DBWrite lk( currentOp.getNS() );// record the operation by inserting its details into the xxx.system.profile collection
                if ( dbHolder()._isLoaded( nsToDatabase( currentOp.getNS() ) , dbpath ) ) {
                    Client::Context cx( currentOp.getNS(), dbpath, false );
                    profile(c , currentOp );
                }
            }
        }
    }
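The profiling rule described in the comments above fits in a few lines. The following is a minimal sketch rather than mongod's code: `ProfileLevel`, `shouldProfile`, and the parameter names are hypothetical, and treating an operation that takes exactly `slowMs` as slow is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical names; the values mirror the --profile levels described above.
enum ProfileLevel { kProfileOff = 0, kProfileSlowOnly = 1, kProfileAll = 2 };

// Returns true when an operation that took elapsedMs should be recorded.
// slowMs plays the role of --slowms (default 100 ms in the article).
inline bool shouldProfile(ProfileLevel level, int elapsedMs, int slowMs = 100) {
    if (level == kProfileOff) return false;  // profiling disabled
    if (level == kProfileAll) return true;   // record every operation
    return elapsedMs >= slowMs;              // level 1: slow operations only (assumed >=)
}
```

With level 1 and the default threshold, a 150 ms operation would be recorded and a 50 ms one would not.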

Moving on to receivedQuery: it parses the received data, calls runQuery to process the query, and handles any exception that runQuery throws. Let us go straight into runQuery.

    string runQuery(Message& m, QueryMessage& q, CurOp& curop, Message &result) {
        shared_ptr<ParsedQuery> pq_shared( new ParsedQuery(q) );
        ParsedQuery& pq( *pq_shared );
        if ( pq.couldBeCommand() ) {
            // This is a command. MongoDB command handling is explained in the article below, so it is not analyzed again here:
            // http://www.cnblogs.com/daizhj/archive/2011/04/29/mongos_command_source_code.html
            BSONObjBuilder cmdResBuf;
            if ( runCommands(ns, jsobj, curop, bb, cmdResBuf, false, queryOptions) ){} // body removed from this excerpt
        }

        bool explain = pq.isExplain();// true when the query was issued as db.coll.find().explain(), false otherwise
        BSONObj order = pq.getOrder();
        BSONObj query = pq.getFilter();
        // Run a simple id query.
        if ( ! (explain || pq.showDiskLoc()) && isSimpleIdQuery( query ) && !pq.hasOption( QueryOption_CursorTailable ) ) {
            if ( queryIdHack( ns, query, pq, curop, result ) ) {// optimization for queries on _id
                return "";
            }
        }
        bool hasRetried = false;
        while ( 1 ) {
            // ReadContext is the protagonist of this article: the first time it locks a database, it also loads that database.
            Client::ReadContext ctx( ns , dbpath ); // read locks
            replVerifyReadsOk(&pq);// recall that a secondary in replset mode cannot be queried by default; that restriction is enforced here
            BSONObj oldPlan;
            if ( ! hasRetried && explain && ! pq.hasIndexSpecifier() ) {
                scoped_ptr<MultiPlanScanner> mps( MultiPlanScanner::make( ns, query, order ) );
                oldPlan = mps->cachedPlanExplainSummary();
            }
            // The real query happens here. Its internals are complex and are covered in the next article.
            return queryWithQueryOptimizer( queryOptions, ns, jsobj, curop, query, order,
                                            pq_shared, oldPlan, shardingVersionAtStart, 
                                            pgfs, npfe, result );
        }
    }

    Client::ReadContext::ReadContext(const string& ns, string path, bool doauth ) {
        {
            lk.reset( new Lock::DBRead(ns) );// database read lock; mongodb's locking mechanism is outside the scope of this article
            Database *db = dbHolder().get(ns, path);
            if( db ) {// null on first access, because the database has not been loaded yet
                c.reset( new Context(path, ns, db, doauth) );
                return;
            }
        }
        if( Lock::isW() ) { // the global write lock is held
            // write locked already
            DEV RARELY log() << "write locked on ReadContext construction " << ns << endl;
            c.reset( new Context(ns, path, doauth) );
        }
        else if( !Lock::nested() ) { 
            lk.reset(0);
            {
                Lock::GlobalWrite w;// take the global write lock; this is where the database actually gets loaded
                Context c(ns, path, doauth);
            }
            // db could be closed at this interim point -- that is ok, we will throw, and don't mind throwing.
            lk.reset( new Lock::DBRead(ns) );
            c.reset( new Context(ns, path, doauth) );
        }
    }
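The constructor above follows a common pattern: try the lookup under a read lock, and only when the database is missing drop that lock, load under an exclusive lock, and then take the read lock again. Below is a minimal sketch of that pattern using the standard library. `ToyDatabase`, `ToyHolder`, and `openForRead` are hypothetical names, and a single `std::shared_timed_mutex` stands in for mongod's per-database and global locks, which is a simplification.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical stand-in for mongod's Database.
struct ToyDatabase { std::string name; };

class ToyHolder {
public:
    std::shared_ptr<ToyDatabase> openForRead(const std::string& name) {
        {
            // Fast path: the database is already loaded.
            std::shared_lock<std::shared_timed_mutex> rd(_lock);
            auto it = _dbs.find(name);
            if (it != _dbs.end()) return it->second;
        }
        {
            // Slow path: load under the exclusive lock. Another thread may
            // have loaded it in the gap, so check again before loading.
            std::unique_lock<std::shared_timed_mutex> wr(_lock);
            if (_dbs.find(name) == _dbs.end()) {
                _dbs[name] = std::make_shared<ToyDatabase>(ToyDatabase{name});
                ++_loads;
            }
        }
        // Back to a read lock for the actual use of the database.
        std::shared_lock<std::shared_timed_mutex> rd(_lock);
        auto it = _dbs.find(name);
        if (it == _dbs.end()) return std::shared_ptr<ToyDatabase>();
        return it->second;
    }
    int loads() const { return _loads; }  // not synchronized; for single-threaded inspection only

private:
    std::shared_timed_mutex _lock;
    std::map<std::string, std::shared_ptr<ToyDatabase>> _dbs;
    int _loads = 0;
};
```

Calling `openForRead("test")` twice on the same holder returns the same object and loads it only once. The exclusive section is kept short so that readers are blocked only while a database is being loaded for the first time.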

    Client::Context::Context(const string& ns, string path , bool doauth, bool doVersion ) :
        _client( currentClient.get() ), 
        _oldContext( _client->_context ),
        _path( path ), 
        _justCreated(false), // set for real in finishInit
        _doVersion(doVersion),
        _ns( ns ), 
        _db(0) 
    {
        _finishInit( doauth );
    }

Next, the _finishInit function:

    void Client::Context::_finishInit( bool doauth ) {
        _db = dbHolderUnchecked().getOrCreate( _ns , _path , _justCreated );// fetch the database, or create it
        checkNsAccess( doauth, writeLocked ? 1 : 0 );// authentication check
    }

    Database* DatabaseHolder::getOrCreate( const string& ns , const string& path , bool& justCreated ) {
        string dbname = _todb( ns );// turns a string such as test.coll into test
        {
            SimpleMutex::scoped_lock lk(_m);
            Lock::assertAtLeastReadLocked(ns);
            DBs& m = _paths[path];// if the database is already loaded under the configured path, return it directly
            {
                DBs::iterator i = m.find(dbname); 
                if( i != m.end() ) {
                    justCreated = false;
                    return i->second;
                }
            }
        }
        Database *db = new Database( dbname.c_str() , justCreated , path );// the actual data loading
        {
            SimpleMutex::scoped_lock lk(_m);// once loaded, register the database under its path
            DBs& m = _paths[path];
            verify( m[dbname] == 0 );
            m[dbname] = db;
            _size++;
        }
        return db;
    }
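getOrCreate looks the database up under a mutex, constructs it outside the mutex, and then registers it under the mutex again. The sketch below shows that shape with standard containers. `ToyDb`, `ToyDatabaseHolder`, and `toDbName` are hypothetical names, and setting `justCreated` to true on every miss is a simplification (in the excerpt the Database constructor sets it). Like the original, the sketch assumes the caller already holds a lock that keeps two threads from creating the same database at once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// "test.coll" -> "test"; a namespace without a dot is returned unchanged.
inline std::string toDbName(const std::string& ns) {
    return ns.substr(0, ns.find('.'));
}

struct ToyDb { std::string name; };  // hypothetical stand-in for Database

class ToyDatabaseHolder {
public:
    ToyDb* getOrCreate(const std::string& ns, const std::string& path, bool& justCreated) {
        const std::string dbname = toDbName(ns);
        {
            std::lock_guard<std::mutex> lk(_m);
            auto& dbs = _paths[path];
            auto it = dbs.find(dbname);
            if (it != dbs.end()) {        // already loaded under this path
                justCreated = false;
                return it->second.get();
            }
        }
        // Construct outside the mutex: opening files may be slow.
        std::unique_ptr<ToyDb> db(new ToyDb{dbname});
        ToyDb* raw = db.get();
        justCreated = true;               // simplification, see the lead-in
        {
            std::lock_guard<std::mutex> lk(_m);
            _paths[path][dbname] = std::move(db);
            ++_size;
        }
        return raw;
    }
    std::size_t size() const { return _size; }

private:
    std::mutex _m;
    std::map<std::string, std::map<std::string, std::unique_ptr<ToyDb>>> _paths;
    std::size_t _size = 0;
};
```

Two namespaces in the same database, for example `test.coll` and `test.other`, resolve to the same `ToyDb` object, and only the first call reports `justCreated` as true.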

    Database::Database(const char *nm, bool& newDb, const string& _path )
        : name(nm), path(_path), namespaceIndex( path, name ),
          profileName(name + ".system.profile")
    {
        try {
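            // Check whether the xxx.ns file exists, which tells whether the database has already been created.
            // newDb is false for an existing database, so the branch below opens it.
            newDb = namespaceIndex.exists();
            // If already exists, open.  Otherwise behave as if empty until
            // there's a write, then open.
            if (!newDb) {
                namespaceIndex.init();// load the xxx.ns file
                if( _openAllFiles )
                    openAllFiles();// load all data files of the form xxx.0, xxx.1, xxx.2
            }
            magic = 781231;
        }
        // (catch block removed from this excerpt)
    }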

Next comes NamespaceIndex::init. If the index has not been initialized it calls _init to initialize it; if it has, it does nothing. So we go straight to NamespaceIndex::_init.

    NOINLINE_DECL void NamespaceIndex::_init() {
        unsigned long long len = 0;
        boost::filesystem::path nsPath = path();// xxx.ns
        string pathString = nsPath.string();
        void *p = 0;
        if( boost::filesystem::exists(nsPath) ) {// if the file exists, map it as a memory-mapped file
            if( f.open(pathString, true) ) {// f is a MongoMMF object
                len = f.length();
                if ( len % (1024*1024) != 0 ) {
                    log() << "bad .ns file: " << pathString << endl;
                    uassert( 10079 ,  "bad .ns file length, cannot open database", len % (1024*1024) == 0 );
                }
                p = f.getView();// pointer to the mapped file
            }
        }
        else {
            // use lenForNewNsFiles, we are making a new database
            massert( 10343, "bad lenForNewNsFiles", lenForNewNsFiles >= 1024*1024 );
            maybeMkdir();
            // Create the ns file. The default size is 16 MB; --nssize sets the size in MB and
            // only takes effect for newly created databases.
            unsigned long long l = lenForNewNsFiles;
            if( f.create(pathString, l, true) ) {
                getDur().createdFile(pathString, l); // always a new file
                len = l;
                verify( len == lenForNewNsFiles );
                p = f.getView();
            }
        }
        verify( len <= 0x7fffffff );
        ht = new HashTable<Namespace,NamespaceDetails>(p, (int) len, "namespace index");
        if( checkNsFilesOnLoad )
            ht->iterAll(namespaceOnLoadCallback);
    }
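_init applies two length checks before it builds the hash table: an existing .ns file must be a whole number of megabytes long, and the length must fit in a signed 32-bit int because it is cast to `int`. A minimal sketch of those two checks follows; `isValidNsFileLength` is a hypothetical helper, not a function in the MongoDB source.

```cpp
#include <cassert>
#include <cstdint>

// True when len passes both checks from _init: a multiple of 1 MB and
// no larger than 0x7fffffff, so that the (int) cast does not overflow.
// Note that 0 passes both checks; this sketch does not treat it specially.
inline bool isValidNsFileLength(std::uint64_t len) {
    const std::uint64_t oneMB = 1024 * 1024;
    return len % oneMB == 0 && len <= 0x7fffffffULL;
}
```

Under these two checks the default 16 MB file is accepted, and the largest acceptable length is 2047 MB.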

Next, the MongoMMF::open flow:

    bool MongoMMF::open(string fname, bool sequentialHint) {
        LOG(3) << "mmf open " << fname << endl;
        setPath(fname);
        _view_write = mapWithOptions(fname.c_str(), sequentialHint ? SEQUENTIAL : 0);// this is where the file actually gets mapped
        return finishOpening();
    }

    bool MongoMMF::finishOpening() {
        if( _view_write ) {
            if( cmdLine.dur ) {// with journaling enabled, create an additional private map; journaling will get its own article later
                _view_private = createPrivateMap();
                if( _view_private == 0 ) {
                    msgasserted(13636, str::stream() << "file " << filename() << " open/create failed in createPrivateMap (look in log for more information)");
                }
                privateViews.add(_view_private, this); // note that testIntent builds use this, even though it points to view_write then...
            }
            else {
                _view_private = _view_write;
            }
            return true;
        }
        return false;
    }
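finishOpening keeps two views of the same file when journaling is on: a write view and a private view. On POSIX systems one way to obtain such a pair is to map the file once with `MAP_SHARED` and once with `MAP_PRIVATE`. The sketch below illustrates that idea only: `ToyViews` and `openViews` are hypothetical names, and the excerpt does not show how `mapWithOptions` or `createPrivateMap` are implemented, so the mapping flags here are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct ToyViews {
    void* writeView = nullptr;    // MAP_SHARED: stores are carried through to the file
    void* privateView = nullptr;  // MAP_PRIVATE: copy-on-write, stores stay in memory
    std::size_t length = 0;
};

// Maps the first `length` bytes of an existing file. When `journaling` is
// false, both views point at the same shared mapping, as in finishOpening.
inline bool openViews(const char* path, std::size_t length, bool journaling, ToyViews& out) {
    int fd = ::open(path, O_RDWR);
    if (fd < 0) return false;
    void* shared = ::mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED) { ::close(fd); return false; }
    void* priv = shared;
    if (journaling) {
        priv = ::mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (priv == MAP_FAILED) { ::munmap(shared, length); ::close(fd); return false; }
    }
    ::close(fd);  // mappings remain valid after the descriptor is closed
    out.writeView = shared;
    out.privateView = priv;
    out.length = length;
    return true;
}
```

A store through `privateView` does not show up in `writeView`, which is what lets a journaling layer decide when changes reach the data file. The sketch leaves unmapping to the caller.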

Back in NamespaceIndex::_init:

        ht = new HashTable<Namespace,NamespaceDetails>(p, (int) len, "namespace index");

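The NamespaceDetails structure deserves a closer look here. Each collection corresponds to one NamespaceDetails structure, whose purpose is described as follows (taken from the comment above the NamespaceDetails definition):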

NamespaceDetails : this is the "header" for a collection that has all its details.
It's in the .ns file and this is a memory mapped region (thus the pack pragma above).

    class NamespaceDetails {
    public:
        enum { NIndexesMax = 64, NIndexesExtra = 30, NIndexesBase  = 10 };
        /*-------- data fields, as present on disk : */
        DiskLoc firstExtent;// the first extent; mongodb's storage layout is discussed in detail when we analyze inserts
        DiskLoc lastExtent;// the last extent
        /* NOTE: capped collections v1 override the meaning of deletedList.
                 deletedList[0] points to a list of free records (DeletedRecord's) for all extents in
                 the capped namespace.
                 deletedList[1] points to the last record in the prev extent.  When the "current extent"
                 changes, this value is updated.  !deletedList[1].isValid() when this value is not
                 yet computed.
        */
        DiskLoc deletedList[Buckets];
        // ofs 168 (8 byte aligned)
        struct Stats {
            // datasize and nrecords MUST Be adjacent code assumes!
            long long datasize; // this includes padding, but not record headers
            long long nrecords;
        } stats;
        int lastExtentSize;
        int nIndexes;
    private:
        // ofs 192
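        // The first 10 indexes are stored here. If a collection has more than 10 indexes, the others
        // live in an "extra" block whose location is stored in extraOffset below.
        IndexDetails _indexes[NIndexesBase];
        // ofs 352 (16 byte aligned)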
        int _isCapped;                         // there is wasted space here if I'm right (ERH)
        int _maxDocsInCapped;                  // max # of objects for a capped table.  TODO: should this be 64 bit?
        double _paddingFactor;                 // 1.0 = no padding.
        // ofs 386 (16)
        int _systemFlags; // things that the system sets/cares about
    public:
        DiskLoc capExtent;
        DiskLoc capFirstNewRecord;
        unsigned short dataFileVersion;       // NamespaceDetails version.  So we can do backward compatibility in the future. See filever.h
        unsigned short indexFileVersion;
        unsigned long long multiKeyIndexBits;
    private:
        // ofs 400 (16)
        unsigned long long reservedA;
        long long extraOffset;                // where the $extra info is located (bytes relative to this)
    public:
        int indexBuildInProgress;             // 1 if in prog
    private:
        int _userFlags;
        char reserved[72];
        /*-------- end data 496 bytes */
    };
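Because NamespaceDetails lives inside a memory-mapped file, its byte layout has to be fixed, which is why the source mentions a pack pragma. The following sketch shows the idea on a much smaller structure. `ToyDiskLoc`, `ToyCollectionHeader`, and `readHeader` are hypothetical and do not reproduce the real field layout; `#pragma pack` is a widely supported compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct ToyDiskLoc {            // hypothetical: file number plus byte offset
    std::int32_t fileNo;
    std::int32_t offset;
};
struct ToyCollectionHeader {   // hypothetical, far smaller than NamespaceDetails
    ToyDiskLoc firstExtent;
    ToyDiskLoc lastExtent;
    std::int64_t datasize;
    std::int64_t nrecords;
    std::int32_t lastExtentSize;
    std::int32_t nIndexes;
};
#pragma pack(pop)

// The on-disk size must not depend on compiler padding decisions.
static_assert(sizeof(ToyDiskLoc) == 8, "unexpected ToyDiskLoc size");
static_assert(sizeof(ToyCollectionHeader) == 40, "unexpected header size");

// Copies a header out of a mapped region at the given byte offset.
inline ToyCollectionHeader readHeader(const unsigned char* region, std::size_t byteOffset) {
    ToyCollectionHeader h;
    std::memcpy(&h, region + byteOffset, sizeof h);
    return h;
}
```

The `static_assert` lines play the same role as the "ofs" comments in the original structure: they pin down the layout so that a file written by one build can be read by another.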

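From this we can see that the ns file stores the header information of every collection, including the collection's first extent, its last extent, and where its indexes are.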

Once _init has finished, we go back up to the Database::Database() constructor:

                if( _openAllFiles )
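                    // Map all files of the form xx.0, xx.1 and record the mapped files. The mapping works the
                    // same way as for xx.ns, and with journaling enabled two addresses are kept per file.
                    // This is not analyzed further here; interested readers can study it on their own.
                    openAllFiles();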

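This completes the mapping of the database. Going back up to Client::Context::_finishInit, let us look at the permission check checkNsAccess. It ultimately calls the function below, which returns true if the check passes and false otherwise. A false result makes mongod send an unauthorized message to the client, and the client's operation fails.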

    bool AuthenticationInfo::_isAuthorized(const string& dbname, Auth::Level level) const {
        if ( noauth ) {// --noauth sets this to true at startup and --auth sets it to false; it is true by default, since authentication is off unless --auth is given
            return true;
        }
        {
            scoped_spinlock lk(_lock);
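            // Check whether this connection is already authenticated for dbname. The authentication data
            // was stored when the mongo shell connected to the server and authenticated successfully.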
            if ( _isAuthorizedSingle_inlock( dbname , level ) )
                return true;

            if ( _isAuthorizedSingle_inlock( "admin" , level ) )
                return true;

            if ( _isAuthorizedSingle_inlock( "local" , level ) )
                return true;
        }
        // If none of the checks above pass, see whether _isLocalHostAndLocalHostIsAuthorizedForAll is set,
        // i.e. whether this connection comes from localhost.
        return _isAuthorizedSpecialChecks( dbname );
    }
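The order of checks in _isAuthorized is easy to restate on its own: authentication disabled, then the requested database, then `admin`, then `local`, and finally the localhost special case. Below is a minimal sketch of that order. `ToyAuthInfo`, `ToyLevel`, and the way grants are stored are hypothetical, and the localhost check is reduced to a single flag.

```cpp
#include <cassert>
#include <map>
#include <string>

enum ToyLevel { kNone = 0, kRead = 1, kWrite = 2 };  // hypothetical levels

class ToyAuthInfo {
public:
    explicit ToyAuthInfo(bool noauth, bool localhostException = false)
        : _noauth(noauth), _localhostException(localhostException) {}

    // Records that this connection authenticated against db at the given level.
    void grant(const std::string& db, ToyLevel level) { _granted[db] = level; }

    bool isAuthorized(const std::string& db, ToyLevel needed) const {
        if (_noauth) return true;                   // authentication disabled
        if (hasLevel(db, needed)) return true;      // the requested database
        if (hasLevel("admin", needed)) return true; // admin credentials cover every database
        if (hasLevel("local", needed)) return true; // so do credentials on local
        return _localhostException;                 // last resort: localhost special case
    }

private:
    bool hasLevel(const std::string& db, ToyLevel needed) const {
        auto it = _granted.find(db);
        return it != _granted.end() && it->second >= needed;
    }

    bool _noauth;
    bool _localhostException;
    std::map<std::string, ToyLevel> _granted;
};
```

A connection that authenticated for reads on `test` can read `test` but not write to it, while write credentials on `admin` authorize writes to any database.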

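That concludes this article. It traced mongod's execution flow from receiving a client request through to loading the database. The important points are the role of the ns file and the mapping of the ordinary data files xx.0 and xx.1. The next article continues with how the query request is processed.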

Author: yhjj0108, 楊浩
