[NLP Techniques] Implementing Keyword Extraction

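Implementation code (nodenlp.js):

// Keyword extraction with nodejieba (Chinese word segmentation for Node.js).
const nodejieba = require("nodejieba");
const fs = require("fs");

// Maximum number of keywords to return.
const topN = 100;

// Read the sample text as UTF-8 and echo it for reference.
const data = fs.readFileSync("t.txt", "utf8");
console.log(data);

// extract() returns an array of { word, weight } objects, highest weight first.
const result = nodejieba.extract(data, topN);
console.log("11==>", result);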

Input file t.txt (Chinese news text):

據中國之聲《新聞縱橫》報道,在剛剛過去的中秋之夜,一顆“火流星”滑亮了雲南省迪慶州的夜空。根據相關天文機構公佈的資訊,隕石墜落的地點,可能位於香格里拉市的巴拉格宗景區範圍內。

事發一週之後,昨天(11日)下午,記者專訪了巴拉格宗景區相關人員。對方稱,目前還是沒有確定隕石墜落的具體位置。最近,有很多人員都在當地尋找隕石,但至今沒有任何訊息。雖然隕石還沒有找到,但在網上有關隕石歸屬的問題已經引發了討論。

巴拉格宗景區的工作人員洛桑培楚說,事發當時,景區的多位工作人員都目睹了那顆“火流星”,“因為我們酒店的位置,剛好是在一個U字型的峽谷裡,感覺突然間天空特別亮,有個東西就飛過來了,打在對面的崖壁上,過了幾分鐘之後,就聽見咚的一聲,附近村民有明顯的震感。”

Output:

liuyugang:NodeJieBa apple$ node nodenlp.js
....
11==> [ { word: '隕石', weight: 45.6077707943 },
  { word: '格宗', weight: 35.21761292125063 },
  { word: '景區', weight: 32.27518069876 },
  { word: '巴拉', weight: 29.735080816230003 },
  { word: '火流星', weight: 24.582479479 },
  { word: '墜落', weight: 18.22637181838 },
  { word: '事發', weight: 16.80701885336 },
  { word: '工作人員', weight: 13.28734988976 },
  { word: '震感', weight: 12.5143832909 },
  { word: '迪慶', weight: 11.9547675029 },
  { word: '11', weight: 11.739204307083542 },
  { word: '培楚', weight: 11.739204307083542 },
  { word: '有個', weight: 11.739204307083542 },
  { word: '人員', weight: 11.18200151198 },
  { word: '新聞縱橫', weight: 11.0103058941 },
  { word: '具體位置', weight: 10.8096351986 },
  { word: '飛過來', weight: 10.765183436 },
  { word: '香格里拉', weight: 10.642581114 },
  { word: '洛桑', weight: 10.2630914922 },
  { word: '字型', weight: 10.0088573539 },
  { word: '相關', weight: 9.67141986604 },
  { word: '崖壁', weight: 9.65218240993 },
  { word: '沒有', weight: 9.338470695449999 },
  { word: '目睹', weight: 8.79473217808 },
  { word: '之後', weight: 8.7536825453 },
  { word: '夜空', weight: 8.75318317516 },
  { word: '之夜', weight: 8.65893063692 },
  { word: '中秋', weight: 8.55357012126 },
  { word: '那顆', weight: 8.5488195185 },
  { word: '幾分鐘', weight: 8.4980002701 },
  { word: '專訪', weight: 8.35941410682 },
  { word: '多位', weight: 8.01735526349 },
  { word: '雲南省', weight: 8.00903344015 },
  { word: '歸屬', weight: 8.00078029839 },
  { word: '剛好', weight: 7.90174109003 },
  { word: '之聲', weight: 7.58531965045 },
  { word: '天文', weight: 7.45973111134 },
  { word: '峽谷', weight: 7.41757030052 },
  { word: '村民', weight: 7.28595205177 },
  { word: '酒店', weight: 7.19748953873 },
  { word: '對面', weight: 7.13679274341 },
  { word: '天空', weight: 6.90491149567 },
  { word: '一顆', weight: 6.84364067028 },
  { word: '地點', weight: 6.68250081357 },
  { word: '一週', weight: 6.6090214428 },
  { word: '討論', weight: 6.28144423575 },
  { word: '引發', weight: 6.18600017817 },
  { word: '網上', weight: 6.15610784262 },
  { word: '尋找', weight: 6.04010686644 },
  { word: '下午', weight: 5.96939289045 },
  { word: '昨天', weight: 5.92683327603 },
  { word: '聽見', weight: 5.92339566522 },
  { word: '報道', weight: 5.88040717916 },
  { word: '剛剛', weight: 5.78366356424 },
  { word: '最近', weight: 5.76738379075 },
  { word: '位置', weight: 5.67463922249 },
  { word: '找到', weight: 5.66161232021 },
  { word: '感覺', weight: 5.64147828931 },
  { word: '確定', weight: 5.35063012369 },
  { word: '資訊', weight: 5.25386069277 },
  { word: '範圍', weight: 5.19468393767 },
  { word: '附近', weight: 5.16934129144 },
  { word: '一聲', weight: 5.15269025031 },
  { word: '公佈', weight: 5.06198083963 },
  { word: '訊息', weight: 5.03989475617 },
  { word: '突然', weight: 4.99713421631 },
  { word: '位於', weight: 4.96609078159 },
  { word: '很多', weight: 4.85828267085 },
  { word: '東西', weight: 4.77328420082 },
  { word: '過去', weight: 4.75519585235 },
  { word: '特別', weight: 4.74775455087 },
  { word: '當時', weight: 4.67584283385 },
  { word: '機構', weight: 4.65227107919 },
  { word: '明顯', weight: 4.63964416568 },
  { word: '記者', weight: 4.29694475313 },
  { word: '問題', weight: 3.96351357308 },
  { word: '目前', weight: 3.91528758382 },
  { word: '可能', weight: 3.74802798573 },
  { word: '已經', weight: 3.42054864564 },
  { word: '中國', weight: 3.02732068666 },
  { word: '一個', weight: 2.81755097213 } ]
liuyugang:NodeJieBa apple$
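
A note on how to read the weights: '格宗' occurs three times in t.txt and scores 35.2176..., which is three times the 11.7392... shared by the single-occurrence words '11', '培楚' and '有個'. That pattern suggests the score is the occurrence count multiplied by a per-word IDF value, with a fallback IDF for words missing from the dictionary. This is an inference from the output above, not a description of nodejieba's internals. The sketch below illustrates that scoring scheme on already-segmented tokens; scoreKeywords, idfTable and defaultIdf are hypothetical names, and the IDF numbers are made up for the example.

```javascript
// Hypothetical helper: scores each distinct token as (occurrence count) x IDF.
// idfTable and defaultIdf are illustrative stand-ins, not nodejieba's real dictionary.
function scoreKeywords(tokens, idfTable, defaultIdf, topN) {
  // Count how many times each token occurs.
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }

  // Weight = count x IDF; unknown words fall back to defaultIdf.
  const scored = [];
  for (const [word, count] of counts) {
    const idf = idfTable.has(word) ? idfTable.get(word) : defaultIdf;
    scored.push({ word, weight: count * idf });
  }

  // Highest weight first, truncated to the requested number of keywords.
  scored.sort((a, b) => b.weight - a.weight);
  return scored.slice(0, topN);
}

// Toy input: tokens as a segmenter might produce them; IDF values are invented.
const tokens = ["隕石", "墜落", "隕石", "格宗", "格宗", "格宗", "景區"];
const idfTable = new Map([["隕石", 9.0], ["墜落", 6.0], ["景區", 8.0]]);
const keywords = scoreKeywords(tokens, idfTable, 10.0, 3);
console.log(keywords);
```

In this toy run '格宗' is absent from idfTable, so it takes the fallback IDF and still ranks first because it occurs three times, which mirrors why a dictionary-unknown fragment can rank near the top of the real output.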

Source code link