Elasticsearch hot word (new word / custom word) update configuration

Internet vocabulary changes every day. How can newly coined hot words (or any specific terms) be made available to our search in real time?

First, test with ik:

curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_max_word' -d '
成龍原名陳港生
'
# Response
{
  "tokens" : [ {
    "token" : "成龍",
    "start_offset" : 1,
    "end_offset" : 3,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "原名",
    "start_offset" : 3,
    "end_offset" : 5,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "陳",
    "start_offset" : 5,
    "end_offset" : 6,
    "type" : "CN_CHAR",
    "position" : 2
  }, {
    "token" : "港",
    "start_offset" : 6,
    "end_offset" : 7,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "生",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "CN_CHAR",
    "position" : 4
  } ]
}
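The ik main dictionary does not contain the word "陳港生", so it was split into single characters.
Now let's configure it.
Edit the IK configuration file: <ES directory>/plugins/ik/config/ik/IKAnalyzer.cfg.xml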

Change it as follows:

<?xml version="1.0" encoding="UTF-8"?>  
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">    
<properties>    
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionaries here -->
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <!-- Users can configure their own extension stopword dictionaries here -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- Users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://192.168.1.136/hotWords.php</entry>
    <!-- Users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>  
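Here I use the remote extension dictionary, because other programs can update it and ES does not need to be restarted, which is convenient. Extending the vocabulary through local files requires restarting ES. The custom mydict.dic dictionary is still handy, of course: one word per line, and you simply add your own.
Since this is a remote dictionary, it has to be a reachable URL. It can be a page or a txt document, as long as the output is UTF-8 encoded.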
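
If no PHP host is at hand, the same kind of endpoint can be sketched with the HTTP server that ships with the JDK. This is only a minimal sketch under stated assumptions: the class name `HotWordServer`, the `/hotWords` path, and the idea of deriving the ETag from a CRC32 of the content are illustrative choices, not something ik requires. The one-word-per-line, UTF-8 layout follows the description above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical stand-alone dictionary endpoint; not part of ik or Elasticsearch. */
class HotWordServer {

    /** Joins the words one per line and encodes them as UTF-8. */
    static byte[] body(List<String> words) {
        return String.join("\n", words).getBytes(StandardCharsets.UTF_8);
    }

    /** Derives an ETag from the content, so it only changes when the word list changes (assumption). */
    static String etag(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return "\"" + Long.toHexString(crc.getValue()) + "-" + payload.length + "\"";
    }

    /** Starts a server on the given port (0 picks a free port) that exposes /hotWords. */
    static HttpServer start(int port, List<String> words) throws IOException {
        byte[] payload = body(words);
        String etag = etag(payload);
        // Fixed at startup: the word list in this sketch never changes while the server runs.
        String lastModified = DateTimeFormatter.RFC_1123_DATE_TIME
                .format(ZonedDateTime.now(ZoneOffset.UTC));

        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hotWords", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
            exchange.getResponseHeaders().set("Last-Modified", lastModified);
            exchange.getResponseHeaders().set("ETag", etag);
            boolean head = "HEAD".equals(exchange.getRequestMethod());
            // For HEAD only the headers are sent; -1 means "no response body".
            exchange.sendResponseHeaders(200, head ? -1 : payload.length);
            if (!head) {
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(payload);
                }
            }
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

Calling `HotWordServer.start(8080, List.of("陳港生", "元樓", "藍瘦"))` would serve the three sample words at `http://localhost:8080/hotWords`, which could then be used as the `remote_ext_dict` value.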

Contents of hotWords.php:

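<?php
// Word list served to ik, one word per line
$s = <<<'EOF'
陳港生
元樓
藍瘦
EOF;
// Last-Modified is set to the current time, so it differs on every request
header('Last-Modified: '.gmdate('D, d M Y H:i:s', time()).' GMT', true, 200);
header('ETag: "5816f349-19"');
echo $s;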

ik reads two response headers, Last-Modified and ETag. A change in either one triggers a dictionary update, and ik polls the URL once per minute.
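
The "either header changed" rule can be sketched as a small state holder. This is not ik's source code; the class name `RemoteDictState` is hypothetical, and treating a missing header as "no change" is an assumption made for this sketch.

```java
/** Hypothetical sketch of the change check a poller could run once per minute. */
class RemoteDictState {
    private String lastModified; // last Last-Modified value seen, null before the first poll
    private String etag;         // last ETag value seen, null before the first poll

    /**
     * Returns true when either header differs from the previously seen value,
     * and remembers the new values in that case. A null (absent) header is
     * treated as "no change" (assumption).
     */
    boolean update(String newLastModified, String newEtag) {
        boolean changed =
                (newLastModified != null && !newLastModified.equalsIgnoreCase(lastModified))
                || (newEtag != null && !newEtag.equalsIgnoreCase(etag));
        if (changed) {
            lastModified = newLastModified;
            etag = newEtag;
        }
        return changed;
    }
}
```

With a check like this, a script that sends a fresh Last-Modified on every request, as hotWords.php above does, reports a change on every poll, while an endpoint that keeps both headers stable reports one only when the dictionary is actually republished.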

Restart Elasticsearch and check the startup log. The three words have been loaded:

[2016-10-31 15:08:57,749][INFO ][ik-analyzer              ] 陳港生  
[2016-10-31 15:08:57,749][INFO ][ik-analyzer              ] 元樓  
[2016-10-31 15:08:57,749][INFO ][ik-analyzer              ] 藍瘦  

Now let's test it. Running the request above again returns:

...  
  }, {  
    "token" : "陳港生",  
    "start_offset" : 5,  
    "end_offset" : 8,  
    "type" : "CN_WORD",  
    "position" : 2  
  }, {  
... 

As you can see, the ik analyzer now matches the word "陳港生".

Java server-side implementation: endpoints for loading extension words, adding extension words, and refreshing extension words

<!-- Users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://ip:port/es/dic/loadExtDict</entry>  
@RestController
@RequestMapping("/es/dic")
public class DicController {
	
	private static final Logger logger = LoggerFactory.getLogger(DicController.class);
	
	@Autowired
	private DictRedis dictRedis;
	
	private static final String EXT_DICT_PATH = "E:\\ext_dict.txt";
	
	/**
	 * Description: loads the extension words.
	 * @param response the HTTP response the word list is written to
	 */
	@RequestMapping(value = "/loadExtDict")
	public void loadExtDict(HttpServletResponse response) {
		logger.info("extDict get start");
		long count = dictRedis.incr(RedisKeyConstants.ES_EXT_DICT_FLUSH);
		// Make sure every node is able to fetch the extension words
		if(count > getEsClusterNodesNum()) {
			return;
		}
		
		String result = FileUtil.read(EXT_DICT_PATH);
		if(StringUtils.isEmpty(result)) {
			return;
		}
		
//		String result = "黃燜雞米飯\n騰衝大救駕\n陳港生\n大西瓜\n大南瓜";
		try {
			response.setHeader("Last-Modified", TimeUtil.currentTimeHllDT().toString());
			response.setHeader("ETag",TimeUtil.currentTimeHllDT().toString());
			response.setContentType("text/plain; charset=UTF-8");
			PrintWriter out = response.getWriter();
			out.write(result);
			out.flush();
		} catch (IOException e) {
			logger.error("DicController loadExtDict exception", e);
		}

		logger.info("extDict get end,result:{}", result);
	}
	
	/**
	 * Description: refreshes the extension words by deleting the fetch counter.
	 * @return "ok" on success, otherwise the exception message
	 */
	@RequestMapping(value = "/extDictFlush")
	public String extDictFlush() {
		String result = "ok";
		try {
			dictRedis.del(RedisKeyConstants.ES_EXT_DICT_FLUSH);
		} catch (Exception e) {
			result = e.getMessage();
		}
		return result;
	}
	
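	/**
	 * Description: adds words to the extension dictionary; separate multiple words with commas (",").
	 * @param dict the comma-separated words to add
	 * @return "ok" on success, "fail" if the file could not be written
	 */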
	@RequestMapping(value = "/addExtDict")
	public String addExtDict(String dict) {
		String result = "ok";
		if(StringUtils.isEmpty(dict)) {
			return "The words to add must not be empty";
		}
		
		StringBuilder sb = new StringBuilder();
		String[] dicts = dict.split(",");
		for (String str : dicts) {
			sb.append("\n").append(str);
		}
		
		boolean flag = FileUtil.write(EXT_DICT_PATH, sb.toString());
		if(flag) {
			extDictFlush();
		} else {
			result = "fail";
		}
		
		return result;
	}
	
	/**
	 * Description: returns the number of cluster nodes; defaults to 10 if it cannot be determined.
	 * @return the node count
	 */
	private int getEsClusterNodesNum() {
		int num = 10;
		String esAddress = PropertyConfigurer.getString("es.address","http://172.16.32.69:9300,http://172.16.32.48:9300");
		List<String> clusterNodes = Arrays.asList(esAddress.split(","));
		if(clusterNodes != null && clusterNodes.size() != 0) {
			num = clusterNodes.size();
		}
		return num;
	}
}
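
The counter check in loadExtDict and the reset in extDictFlush can be illustrated in isolation. The sketch below swaps the Redis counter for an in-memory `AtomicLong`, so it only works within a single JVM; the class name `FlushGate` and its method names are hypothetical and serve only to show the gating idea.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for the Redis counter used by DicController. */
class FlushGate {
    private final AtomicLong counter = new AtomicLong();
    private final int nodeCount;

    FlushGate(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    /**
     * Counts one fetch and reports whether the dictionary should still be served.
     * Mirrors "incr, then return early once count exceeds the node count".
     */
    boolean shouldServe() {
        return counter.incrementAndGet() <= nodeCount;
    }

    /** Mirrors extDictFlush: clearing the counter lets every node fetch again. */
    void reset() {
        counter.set(0);
    }
}
```

With `new FlushGate(2)`, the first two calls to `shouldServe()` return true and later calls return false until `reset()` is called, which is the effect addExtDict relies on after writing new words.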

File read/write utility class:

public class FileUtil {

	private static final Logger logger = LoggerFactory.getLogger(FileUtil.class);

	/**
	 * Description: reads a file as UTF-8 text.
	 *
	 * @param path the file to read
	 * @return the file content, or an empty string if reading fails
	 */
	public static String read(String path) {
		StringBuilder sb = new StringBuilder();
		BufferedReader reader = null;
		try {
			BufferedInputStream fis = new BufferedInputStream(new FileInputStream(new File(path)));
			reader = new BufferedReader(new InputStreamReader(fis, "utf-8"), 512); // read the text file with a 512-character buffer

			String line = "";
			while ((line = reader.readLine()) != null) {
				sb.append(line).append("\n");
			}
		} catch (Exception e) {
			logger.error("FileUtil read exception", e);
		} finally {
			if(reader != null) {
				try {
					reader.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
		return sb.toString();
	}

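	/**
	 * Description: appends content to a file.
	 *
	 * @param path the file to append to
	 * @param content the text to append
	 * @return true on success, false if an I/O error occurred
	 */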
	public static boolean write(String path, String content) {
		boolean flag = true;
		BufferedWriter out = null;
		try {
			// Append mode; write as UTF-8 so the encoding matches read()
			out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(path), true), "utf-8"));
			out.write(content);
		} catch (IOException e) {
			flag = false;
			logger.error("FileUtil write exception", e);
		} finally {
			try {
				if(out != null) {
					out.close();
				}
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
		return flag;
	}

}
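
As an alternative to the stream-based helper above, the same append and read operations can be sketched with `java.nio.file.Files`, which takes the charset explicitly and closes the file itself. The class name `DictFile` and the `parse` helper (which also trims the words and skips empty entries, something the original addExtDict does not do) are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical NIO-based variant of the dictionary file helper. */
class DictFile {

    /** Splits a comma-separated string into words, trimming blanks and dropping empty entries. */
    static List<String> parse(String commaSeparated) {
        List<String> words = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String word = part.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    /** Appends the words as UTF-8, one per line, creating the file if it does not exist. */
    static void append(Path path, List<String> words) throws IOException {
        Files.write(path, words, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Reads all words back as UTF-8, one list element per line. */
    static List<String> readAll(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }
}
```

Unlike FileUtil, these methods throw `IOException` instead of returning a flag, so the caller decides how to report a failure.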