1. 程式人生 > >使用ElasticSearch完成百萬級資料查詢附近的人功能

使用ElasticSearch完成百萬級資料查詢附近的人功能

上一篇文章介紹了ElasticSearch使用Repository和ElasticSearchTemplate完成構建複雜查詢條件,簡單介紹了ElasticSearch使用地理位置的功能。

這一篇我們來看一下使用ElasticSearch完成大資料量查詢附近的人功能,搜尋N米範圍的內的資料。

準備環境

本機測試使用了ElasticSearch最新版5.5.1,SpringBoot1.5.4,spring-data-ElasticSearch2.1.4.新建Springboot專案,勾選ElasticSearch和web。pom檔案如下
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.tianyalei</groupId>
	<artifactId>elasticsearch</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>elasticsearch</name>
	<description>Demo project for Spring Boot</description>

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>1.5.4.RELEASE</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
		<java.version>1.8</java.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>com.sun.jna</groupId>
			<artifactId>jna</artifactId>
			<version>3.0.9</version>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>


</project>
新建model類Person
package com.tianyalei.elasticsearch.model;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.GeoPointField;

import java.io.Serializable;

/**
 * model類
 */
@Document(indexName="elastic_search_project",type="person",indexStoreType="fs",shards=5,replicas=1,refreshInterval="-1")
public class Person implements Serializable {
    @Id
    private int id;

    private String name;

    private String phone;

    /**
     * 地理位置經緯度
     * lat緯度,lon經度 "40.715,-74.011"
     * 如果用陣列則相反[-73.983, 40.719]
     */
    @GeoPointField
    private String address;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getPhone() {
        return phone;
    }

    public void setPhone(String phone) {
        this.phone = phone;
    }

    public String getAddress() {
        return address;
    }

    public void setAddress(String address) {
        this.address = address;
    }
}
我用address欄位表示經緯度位置。注意,使用String[]和String分別來表示經緯度時是不同的,見註釋。
import com.tianyalei.elasticsearch.model.Person;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface PersonRepository extends ElasticsearchRepository<Person, Integer> {

}
看一下Service類,完成插入測試資料的功能,查詢的功能我放在Controller裡了,為了方便檢視,正常是應該放在Service裡
package com.tianyalei.elasticsearch.service;

import com.tianyalei.elasticsearch.model.Person;
import com.tianyalei.elasticsearch.repository.PersonRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.IndexQuery;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.List;

@Service
public class PersonService {
    @Autowired
    PersonRepository personRepository;
    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;

    private static final String PERSON_INDEX_NAME = "elastic_search_project";
    private static final String PERSON_INDEX_TYPE = "person";

    public Person add(Person person) {
        return personRepository.save(person);
    }

    public void bulkIndex(List<Person> personList) {
        int counter = 0;
        try {
            if (!elasticsearchTemplate.indexExists(PERSON_INDEX_NAME)) {
                elasticsearchTemplate.createIndex(PERSON_INDEX_TYPE);
            }
            List<IndexQuery> queries = new ArrayList<>();
            for (Person person : personList) {
                IndexQuery indexQuery = new IndexQuery();
                indexQuery.setId(person.getId() + "");
                indexQuery.setObject(person);
                indexQuery.setIndexName(PERSON_INDEX_NAME);
                indexQuery.setType(PERSON_INDEX_TYPE);

                //上面的那幾步也可以使用IndexQueryBuilder來構建
                //IndexQuery index = new IndexQueryBuilder().withId(person.getId() + "").withObject(person).build();

                queries.add(indexQuery);
                if (counter % 500 == 0) {
                    elasticsearchTemplate.bulkIndex(queries);
                    queries.clear();
                    System.out.println("bulkIndex counter : " + counter);
                }
                counter++;
            }
            if (queries.size() > 0) {
                elasticsearchTemplate.bulkIndex(queries);
            }
            System.out.println("bulkIndex completed.");
        } catch (Exception e) {
            System.out.println("IndexerService.bulkIndex e;" + e.getMessage());
            throw e;
        }
    }
}
注意看bulkIndex方法,這個是批量插入資料用的,bulk也是ES官方推薦使用的批量插入資料的方法。這裡是每逢500的整數倍就bulk插入一次。
package com.tianyalei.elasticsearch.controller;

import com.tianyalei.elasticsearch.model.Person;
import com.tianyalei.elasticsearch.service.PersonService;
import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.query.GeoDistanceQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.GeoDistanceSortBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

@RestController
public class PersonController {
    @Autowired
    PersonService personService;
    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;

    @GetMapping("/add")
    public Object add() {
        double lat = 39.929986;
        double lon = 116.395645;

        List<Person> personList = new ArrayList<>(900000);
        for (int i = 100000; i < 1000000; i++) {
            double max = 0.00001;
            double min = 0.000001;
            Random random = new Random();
            double s = random.nextDouble() % (max - min + 1) + max;
            DecimalFormat df = new DecimalFormat("######0.000000");
            // System.out.println(s);
            String lons = df.format(s + lon);
            String lats = df.format(s + lat);
            Double dlon = Double.valueOf(lons);
            Double dlat = Double.valueOf(lats);

            Person person = new Person();
            person.setId(i);
            person.setName("名字" + i);
            person.setPhone("電話" + i);
            person.setAddress(dlat + "," + dlon);

            personList.add(person);
        }
        personService.bulkIndex(personList);

//        SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(QueryBuilders.queryStringQuery("spring boot OR 書籍")).build();
//        List<Article> articles = elas、ticsearchTemplate.queryForList(se、archQuery, Article.class);
//        for (Article article : articles) {
//            System.out.println(article.toString());
//        }

        return "新增資料";
    }

    /**
     *
     geo_distance: 查詢距離某個中心點距離在一定範圍內的位置
     geo_bounding_box: 查詢某個長方形區域內的位置
     geo_distance_range: 查詢距離某個中心的距離在min和max之間的位置
     geo_polygon: 查詢位於多邊形內的地點。
     sort可以用來排序
     */
    @GetMapping("/query")
    public Object query() {
        double lat = 39.929986;
        double lon = 116.395645;

        Long nowTime = System.currentTimeMillis();
        //查詢某經緯度100米範圍內
        GeoDistanceQueryBuilder builder = QueryBuilders.geoDistanceQuery("address").point(lat, lon)
                .distance(100, DistanceUnit.METERS);

        GeoDistanceSortBuilder sortBuilder = SortBuilders.geoDistanceSort("address")
                .point(lat, lon)
                .unit(DistanceUnit.METERS)
                .order(SortOrder.ASC);

        Pageable pageable = new PageRequest(0, 50);

        NativeSearchQueryBuilder builder1 = new NativeSearchQueryBuilder().withFilter(builder).withSort(sortBuilder).withPageable(pageable);
        SearchQuery searchQuery = builder1.build();

        //queryForList預設是分頁,走的是queryForPage,預設10個
        List<Person> personList = elasticsearchTemplate.queryForList(searchQuery, Person.class);

        System.out.println("耗時:" + (System.currentTimeMillis() - nowTime));
        return personList;
    }
}
看Controller類,在add方法中,我們插入90萬條測試資料,隨機產生不同的經緯度地址。在查詢方法中,我們構建了一個查詢100米範圍內、按照距離遠近排序,分頁每頁50條的查詢條件。如果不指明Pageable的話,ESTemplate的queryForList預設是10條,通過原始碼可以看到。啟動專案,先執行add,等待百萬資料插入,大概幾十秒。然後執行查詢,看一下結果。
第一次查詢花費300多ms,再次查詢後時間就大幅下降,到30ms左右,因為ES已經自動快取到記憶體了。可見,ES完成地理位置的查詢還是非常快的。適用於查詢附近的人、範圍查詢之類的功能。-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------後記,在後來的使用中,Elasticsearch2.3版本時,按上面的寫法出現了geo型別無法索引的情況,進入es的為String,而不是標註的geofiled。在此記錄一下解決方法,將String型別修改為GeoPoint,且是org.springframework.data.elasticsearch.core.geo.GeoPoint包下的。然後需要在建立index時,顯式呼叫一下mapping方法,才能正確的對映為geofield。如下
if (!elasticsearchTemplate.indexExists("abc")) {
			elasticsearchTemplate.createIndex("abc");
			elasticsearchTemplate.putMapping(Person.class);
		}



參考:ES根據地理位置查詢 http://blog.csdn.net/bingduanlbd/article/details/52253542