Three ways to process JSON-format log files with Logstash

Suppose each line of the log file is a JSON record, for example:

{"Method":"JSAPI.JSTicket","Message":"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw","CreateTime":"2015/10/13 9:39:59","AppGUID":"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d","_PartitionKey":"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d","_RowKey":"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c","_UnixTS":1444700398710}

With the default configuration, once Logstash has processed the record and inserted it into Elasticsearch, a query returns the following:

{
    "_index": "logstash-2015.10.16",
    "_type": "voip_feedback",
    "_id": "sheE9eXiQASMDVtRJ0EYcg",
    "_version": 1,
    "found": true,
    "_source": {
        "message": "{\"Method\":\"JSAPI.JSTicket\",\"Message\":\"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw\",\"CreateTime\":\"2015/10/13 9:39:59\",\"AppGUID\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_PartitionKey\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_RowKey\":\"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c\",\"_UnixTS\":1444700398710}",
        "@version": "1",
        "@timestamp": "2015-10-16T00:39:51.252Z",
        "type": "voip_feedback",
        "host": "ipphone",
        "path": "/usr1/data/voip_feedback.txt"
    }
}

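In other words, the whole JSON record is stored as a single string under "message". What I want is for Logstash to parse the JSON record automatically and store each field in Elasticsearch. There are three ways to configure this.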

Method 1: set format => json directly

    file {
        type => "voip_feedback"
        path => ["/usr1/data/voip_feedback.txt"]
        format => json
        sincedb_path => "/home/jfy/soft/logstash-1.4.2/voip_feedback.access"
    }

With this method, the query result is:

{
    "_index": "logstash-2015.10.16",
    "_type": "voip_feedback",
    "_id": "NrNX8HrxSzCvLl4ilKeyCQ",
    "_version": 1,
    "found": true,
    "_source": {
        "Method": "JSAPI.JSTicket",
        "Message": "JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw",
        "CreateTime": "2015/10/13 9:39:59",
        "AppGUID": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
        "_PartitionKey": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
        "_RowKey": "1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c",
        "_UnixTS": 1444700398710,
        "@version": "1",
        "@timestamp": "2015-10-16T00:16:11.455Z",
        "type": "voip_feedback",
        "host": "ipphone",
        "path": "/usr1/data/voip_feedback.txt"
    }
}

As you can see, the JSON record has been parsed directly into individual fields under _source, but the original record text is not kept.

Method 2: use codec => json

    file {
        type => "voip_feedback"
        path => ["/usr1/data/voip_feedback.txt"]
        sincedb_path => "/home/jfy/soft/logstash-1.4.2/voip_feedback.access"
        codec => json {
            charset => "UTF-8"
        }
    }

The query result is the same as with the first method: the fields are parsed, and the original record text is not kept either.

Method 3: use the json filter

filter {
    if [type] == "voip_feedback" {
        json {
            source => "message"
            #target => "doc"
            #remove_field => ["message"]
        }
    }
}

With this method, the query result looks like this:

{
    "_index": "logstash-2015.10.16",
    "_type": "voip_feedback",
    "_id": "CUtesLCETAqhX73NKXZfug",
    "_version": 1,
    "found": true,
    "_source": {
        "message": "{\"Method222\":\"JSAPI.JSTicket\",\"Message\":\"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw\",\"CreateTime\":\"2015/10/13 9:39:59\",\"AppGUID\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_PartitionKey\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_RowKey\":\"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c\",\"_UnixTS\":1444700398710}",
        "@version": "1",
        "@timestamp": "2015-10-16T00:28:20.018Z",
        "type": "voip_feedback",
        "host": "ipphone",
        "path": "/usr1/data/voip_feedback.txt",
        "Method222": "JSAPI.JSTicket",
        "Message": "JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw",
        "CreateTime": "2015/10/13 9:39:59",
        "AppGUID": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
        "_PartitionKey": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
        "_RowKey": "1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c",
        "_UnixTS": 1444700398710,
        "tags": [
            "111",
            "222"
        ]
    }
}

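As you can see, the original record is kept in "message" and the parsed fields are stored alongside it. If you are sure you do not need the original record text, add the setting remove_field => ["message"].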
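For reference, the third method with that setting enabled might look like the sketch below. It assumes the same "voip_feedback" type as the example above. Because remove_field is applied only when the filter succeeds, a line that fails to parse should still keep its "message" field.

```
filter {
    if [type] == "voip_feedback" {
        json {
            # Parse the JSON text held in the "message" field
            source => "message"
            # Drop the raw line once it has been parsed successfully
            remove_field => ["message"]
        }
    }
}
```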

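Comparing the three methods, the most convenient and direct is to set format => json in the file input. Note, however, that format is a legacy input option that later Logstash releases replaced with codec, so on newer versions the second method is the closest equivalent.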

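One more thing to watch out for: when inserting data into ES, Logstash by default adds three fields under _source: type, host, and path. If the JSON content itself contains type, host, or path fields, the parsed values overwrite these three Logstash defaults. The type field matters most, because it is also used as the index/type. Once it is overwritten, the index/type written to ES comes from the JSON record rather than from the type value set in the Logstash config.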

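In that case, set target on the json filter. With this setting, the parsed JSON content is no longer placed directly under _source but under the configured field, here "doc":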

{
    "_index": "logstash-2015.10.20",
    "_type": "3alogic_log",
    "_id": "xfj3ngd5S3iH2YABjyU6EA",
    "_version": 1,
    "found": true,
    "_source": {
        "@version": "1",
        "@timestamp": "2015-10-20T11:36:24.503Z",
        "type": "3alogic_log",
        "host": "server114",
        "path": "/usr1/app/log/mysql_3alogic_log.log",
        "doc": {
            "id": 633796,
            "identity": "13413602120",
            "type": "EAP_TYPE_PEAP",
            "apmac": "88-25-93-4E-1F-96",
            "usermac": "00-65-E0-31-62-5D",
            "time": "20151020-193624",
            "apmaccompany": "TP-LINK TECHNOLOGIES CO.,LTD",
            "usermaccompany": ""
        }
    }
}

This way the type, host, and path values under _source are not overwritten, and in Kibana the fields appear under names such as doc.type, doc.id, and so on.
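The filter configuration behind the result above is not shown, so the following is only a sketch of what it might look like. The "3alogic_log" type is taken from that result. The remove_field line is an assumption, inferred from the fact that the result contains no "message" field.

```
filter {
    if [type] == "3alogic_log" {
        json {
            source => "message"
            # Nest the parsed fields under "doc" instead of the event root,
            # so a "type" key in the record cannot overwrite the event's type
            target => "doc"
            # Assumption: the raw line was removed, since the result has no "message"
            remove_field => ["message"]
        }
    }
}
```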

Addendum: do not write unparseable JSON to Elasticsearch
output {
    stdout {
        codec => rubydebug
    }
    # Do not write unparseable JSON to Elasticsearch
    if "_jsonparsefailure" not in [tags] {
        elasticsearch {
            host => "localhost"
        }
    }
}
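An alternative, which is not part of the original setup, would be to discard failed events in the filter stage so that no output sees them, stdout included. The sketch below relies on the standard drop filter and on the "_jsonparsefailure" tag used above. It assumes the block is placed after the json filter in the filter section.

```
filter {
    # Events that could not be parsed as JSON carry this tag
    if "_jsonparsefailure" in [tags] {
        # Discard the event entirely; it reaches no output
        drop { }
    }
}
```

The conditional in the output section is the better choice if failed lines should still show up on stdout for debugging.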

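My own project only handles logs made up of JSON strings, but while gathering material online I also found resources on processing system-log formats and plain printed-string log formats. I am leaving the links here for future reference.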