
Paginated queries over hundred-million-record MongoDB collections: a Java implementation

1. Prepare the environment

  1.1 Download MongoDB

  1.2 Start MongoDB

     C:\mongodb\bin\mongod --dbpath D:\mongodb\data

  1.3 Download Robo 3T, a GUI client for MongoDB

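2. Prepare the data

  Add the MongoDB Java driver to the Maven pom.xml:

        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongo-java-driver</artifactId>
            <version>3.6.1</version>
        </dependency>

Then run the following Java code to insert 100,000,000 test documents into the person collection of the www database: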

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;

    public class InsertTestData {

        public static void main(String[] args) {
            /**** Connect to MongoDB ****/
            // Since 2.10.0, uses MongoClient
            MongoClient mongo = new MongoClient("localhost", 27017);

            try {
                /**** Get database ****/
                // if the database doesn't exist, MongoDB will create it for you
                DB db = mongo.getDB("www");

                /**** Get collection / table "person" ****/
                // if the collection doesn't exist, MongoDB will create it for you
                DBCollection table = db.getCollection("person");

                /**** Insert ****/
                // 100,000,000 test documents, one insert call per document
                for (int i = 0; i < 100000000; i++) {
                    BasicDBObject document = new BasicDBObject();
                    document.put("name", "mkyong" + i);
                    document.put("age", 30);
                    document.put("sex", "f");
                    table.insert(document);
                }

                /**** Done ****/
                System.out.println("Done");
            } catch (MongoException e) {
                // 3.x driver constructors no longer throw the checked UnknownHostException,
                // so catching it here would not compile
                e.printStackTrace();
            } finally {
                mongo.close();
            }
        }
    }
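Sending one insert per document means 100,000,000 round trips to the server. A common speed-up is to insert in batches, because the legacy DBCollection API also has an insert(List<? extends DBObject>) overload.

The sketch below is a stdlib-only illustration of the batching logic. It generates the same fields as the loop above. The class and method names are hypothetical. In a real program, the sink would convert each map with new BasicDBObject(map) and pass the resulting list to table.insert(...). The demo uses a batch size of 10 for readability.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original program or the driver.
public class BatchInsertSketch {

    // Same fields as the single-document insert loop (name, age, sex).
    static Map<String, Object> document(long i) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "mkyong" + i);
        doc.put("age", 30);
        doc.put("sex", "f");
        return doc;
    }

    // Generates `total` documents and hands them to `sink` in groups of at most
    // `batchSize`, so only one batch is held in memory at a time.
    // Returns the number of batches, i.e. the number of insert round trips.
    static long insertInBatches(long total, int batchSize, Consumer<List<Map<String, Object>>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long calls = 0;
        List<Map<String, Object>> batch = new ArrayList<>(batchSize);
        for (long i = 0; i < total; i++) {
            batch.add(document(i));
            if (batch.size() == batchSize) {
                sink.accept(batch);
                calls++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) { // flush the final, partially filled batch
            sink.accept(batch);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        long calls = insertInBatches(25, 10, b -> sizes.add(b.size()));
        System.out.println(calls + " calls, batch sizes " + sizes); // 3 calls, batch sizes [10, 10, 5]
    }
}
```

With a batch size of 10,000, the 100,000,000 documents would need 10,000 insert calls instead of 100,000,000.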

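3. Paginated queries

 Paging with skip() and limit() gets slow on a large collection. The server still has to step over every skipped document before it returns a page, so each later page costs more than the one before. Instead I borrowed the idea used by logstash-input-mongodb [1]: remember the last _id of the previous batch and request the next batch with _id greater than it: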

  public
  def get_cursor_for_collection(mongodb, mongo_collection_name, last_id_object, batch_size)
    collection = mongodb.collection(mongo_collection_name)
    # Need to make this sort by date in object id then get the first of the series
    # db.events_20150320.find().limit(1).sort({ts:1})
    return collection.find({:_id => {:$gt => last_id_object}}).limit(batch_size)
  end

          collection_name = collection[:name]
          @logger.debug("collection_data is: #{@collection_data}")
          last_id = @collection_data[index][:last_id]
          #@logger.debug("last_id is #{last_id}", :index => index, :collection => collection_name)
          # get batch of events starting at the last_place if it is set


          last_id_object = last_id
          if since_type == 'id'
            last_id_object = BSON::ObjectId(last_id)
          elsif since_type == 'time'
            if last_id != ''
              last_id_object = Time.at(last_id)
            end
          end
          cursor = get_cursor_for_collection(@mongodb, collection_name, last_id_object, batch_size)
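The comment about "date in object id" and the 'time' mode both rely on the layout of an ObjectId. Its first 4 bytes are the creation time in seconds since the Unix epoch. Sorting by _id is therefore approximately insertion-time order; ids created in the same second on different machines or processes are not strictly ordered by time.

The minimal sketch below decodes that timestamp from the hex form using plain Java. The method name epochSeconds is hypothetical. With the Java driver, org.bson.types.ObjectId#getDate() returns the same information. The example id is the starting ObjectId hardcoded in the Java implementation.

```java
import java.time.Instant;

// Hypothetical helper; with the Java driver, ObjectId#getDate() gives the same value.
public class ObjectIdTime {

    // An ObjectId is 12 bytes, shown as 24 hex characters. The first 4 bytes
    // (8 hex characters) are the creation time in seconds since the Unix epoch, big-endian.
    static long epochSeconds(String objectIdHex) {
        if (objectIdHex.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters: " + objectIdHex);
        }
        return Long.parseLong(objectIdHex.substring(0, 8), 16);
    }

    public static void main(String[] args) {
        String id = "5bda8f66ef2ed979bab041aa";
        System.out.println(epochSeconds(id));                        // 1541050214
        System.out.println(Instant.ofEpochSecond(epochSeconds(id))); // 2018-11-01T05:30:14Z
    }
}
```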

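The Java implementation follows the same idea, but it also sorts explicitly on _id. The Ruby excerpt above calls find(...).limit(...) without a sort, and without a sort MongoDB does not guarantee which matching documents make up the batch.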

import java.util.List;

import org.bson.types.ObjectId;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

public class Test {

    public static void main(String[] args) {
        int pageSize=50000;

        /**** Connect to MongoDB ****/
        // Since 2.10.0, uses MongoClient
        MongoClient mongo = new MongoClient("localhost", 27017);

        try {
            /**** Get database ****/
            // if the database doesn't exist, MongoDB will create it for you
            DB db = mongo.getDB("www");

            /**** Get collection / table "person" ****/
            // if the collection doesn't exist, MongoDB will create it for you
            DBCollection table = db.getCollection("person");
            long cnt=table.count();
            //System.out.println(table.getStats());
            long page=getPageSize(cnt,pageSize);
            // Start after this _id: documents with _id <= it are not returned.
            // Pass null instead to start from the very first document.
            ObjectId lastIdObject=new ObjectId("5bda8f66ef2ed979bab041aa");

            for(long i=0;i<page;i++) {
                long start=System.currentTimeMillis();
                DBCursor dbObjects=getCursorForCollection(table, lastIdObject, pageSize);
                // find() is lazy: the query only runs when the cursor is iterated,
                // so toArray() must be inside the timed section
                List<DBObject> objs=dbObjects.toArray();
                System.out.println("Query "+(i+1)+" took "+(System.currentTimeMillis()-start)+" ms");
                if(objs.isEmpty()) {
                    break; // no documents left
                }
                lastIdObject=(ObjectId) objs.get(objs.size()-1).get("_id");
            }

        } catch (MongoException e) {
            // 3.x driver constructors no longer throw the checked UnknownHostException
            e.printStackTrace();
        } finally {
            mongo.close();
        }
    }
    
    public static DBCursor getCursorForCollection(DBCollection collection,ObjectId lastIdObject,int pageSize) {
        BasicDBObject query=new BasicDBObject();
        // null means "first page": use no _id filter. Looking up findOne() and then
        // applying $gt would skip that document, and findOne() is not guaranteed
        // to return the smallest _id anyway.
        if(lastIdObject!=null) {
            // only documents after the last _id of the previous page
            query.append("_id",new BasicDBObject("$gt",lastIdObject));
        }
        BasicDBObject sort=new BasicDBObject("_id",1);
        return collection.find(query).limit(pageSize).sort(sort);
    }
    
    public static Long getPageSize(Long cnt,int pageSize) {
        return cnt%pageSize==0?cnt/pageSize:cnt/pageSize+1;
    }

}
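To see why the _id-based loop scales better than skip/limit, here is a self-contained in-memory model, built on a few assumptions:
- A sorted set of ids stands in for the _id index.
- NavigableSet.tailSet(lastId, false) plays the role of {$gt: lastId}.
- "Cost" counts index entries stepped over and ignores the logarithmic index seek.

The class and method names are hypothetical. The keyset loop also stops as soon as a page comes back empty, so it does not depend on count() being exact.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory model of the two paging strategies; not driver code.
public class PagingModel {

    // skip/limit: page k runs find().sort({_id:1}).skip(k * pageSize).limit(pageSize),
    // so the server steps over the skipped entries and then reads the page.
    static long offsetPagingCost(long total, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        long examined = 0;
        for (long offset = 0; offset < total; offset += pageSize) {
            examined += Math.min(total, offset + pageSize); // skipped + returned
        }
        return examined;
    }

    // _id-based: each page seeks to lastId in the index and reads at most pageSize entries.
    static long keysetPagingCost(long total, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        long examined = 0;
        for (long offset = 0; offset < total; offset += pageSize) {
            examined += Math.min(pageSize, total - offset); // returned only
        }
        return examined;
    }

    // Runs the keyset loop over sorted ids; tailSet(lastId, false) models {$gt: lastId}.
    static List<List<Long>> keysetPages(NavigableSet<Long> ids, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        List<List<Long>> pages = new ArrayList<>();
        Long lastId = null; // null: start from the first id
        while (true) {
            NavigableSet<Long> rest = (lastId == null) ? ids : ids.tailSet(lastId, false);
            List<Long> page = new ArrayList<>();
            for (Long id : rest) {
                if (page.size() == pageSize) {
                    break;
                }
                page.add(id);
            }
            if (page.isEmpty()) {
                break; // nothing left, no count() needed
            }
            pages.add(page);
            lastId = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        NavigableSet<Long> ids = new TreeSet<>();
        for (long id = 1; id <= 10; id++) {
            ids.add(id);
        }
        System.out.println(keysetPages(ids, 3)); // [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

        long total = 100_000_000L;
        int pageSize = 50_000;
        System.out.println("skip/limit: " + offsetPagingCost(total, pageSize)); // skip/limit: 100050000000
        System.out.println("keyset: " + keysetPagingCost(total, pageSize));     // keyset: 100000000
    }
}
```

With the article's numbers (100,000,000 documents, pageSize 50,000), this model gives about 1,000 times more entries examined for skip/limit than for the _id-based loop.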

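4. Lessons learned

  1. I accidentally left out the $ in the $gt operator, so the query returned no data, and it took me a while to find the cause. Without the $, {_id: {gt: ...}} is treated as an exact match against an embedded document {gt: ...}, which no _id equals. The correct line is: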

query.append("_id",new BasicDBObject("$gt",lastIdObject));
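2. Creating indexes
  Create an ordinary single-field index: db.collection.ensureIndex({field:1/-1});  where 1 means ascending and -1 descending.
Example: db.articles.ensureIndex({title:1}) // note: in my case, wrapping the field name in "" double quotes made index creation fail
  Since MongoDB 3.0, ensureIndex() is deprecated in favour of db.collection.createIndex(), which takes the same arguments [3]. The _id-based paging above needs no extra index, because every collection automatically has a unique index on _id.
  View the current indexes: db.collection.getIndexes();
  Example:
  db.articles.getIndexes();
  Drop a single index: db.collection.dropIndex({field:1/-1});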

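      3. Execution plans

   Use explain() to check whether a query uses an index (an IXSCAN stage in the winning plan) or scans the whole collection (COLLSCAN):

   db.student.find({"name":"dd1"}).explain()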

 References:

[1] https://github.com/phutchins/logstash-input-mongodb/blob/master/lib/logstash/inputs/mongodb.rb

[2] https://www.cnblogs.com/yxlblogs/p/4930308.html

[3] https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/