Technical Documentation Translation: GloVe README (1)

阿新 • Published: 2018-02-23


Package Contents
To train your own GloVe vectors, first you'll need to prepare your corpus as a single text file with all words separated by a single space. If your corpus has multiple documents, simply concatenate documents together with a single space. If your documents are particularly short, it's possible that padding the gap between documents with e.g. 5 "dummy" words will produce better vectors. Once you create your corpus, you can train GloVe vectors using the following 4 tools. An example is included in demo.sh, which you can modify as necessary.
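The corpus format described above can be sketched in a few lines of Python, assuming the documents are already tokenized. The helper `build_corpus`, its `pad` parameter and the `<pad>` dummy token are hypothetical names chosen for illustration; they are not part of the GloVe package.

```python
from typing import Iterable, List


def build_corpus(documents: Iterable[str], pad: int = 0, dummy: str = "<pad>") -> str:
    """Join documents into one string of single-space-separated tokens.

    Hypothetical helper: `pad` inserts that many `dummy` tokens between
    consecutive documents, mimicking the optional padding described above.
    """
    separator: List[str] = [dummy] * pad
    tokens: List[str] = []
    for i, doc in enumerate(documents):
        if i > 0:
            tokens.extend(separator)
        # str.split() with no argument collapses runs of whitespace,
        # so every word ends up separated by exactly one space.
        tokens.extend(doc.split())
    return " ".join(tokens)


if __name__ == "__main__":
    docs = ["the cat sat", "on  the\nmat"]
    print(build_corpus(docs))         # the cat sat on the mat
    print(build_corpus(docs, pad=2))  # the cat sat <pad> <pad> on the mat
```

Writing the returned string to disk (for example with `pathlib.Path.write_text`) would give the single text file the tools read; the file name is up to you.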
The four main tools in this package are:

1) vocab_count
This tool requires an input corpus that should already consist of whitespace-separated tokens, so run something like the Stanford Tokenizer on the raw text first. From the corpus, it constructs unigram counts and optionally thresholds the resulting vocabulary based on total vocabulary size or minimum frequency count.
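What vocab_count computes can be approximated in plain Python. This is a sketch under stated assumptions, not the tool's source: `count_vocab` and `format_vocab` are hypothetical names, `min_count` and `max_vocab` stand in for the two thresholds mentioned above, and the one-`word count`-pair-per-line output with ties broken alphabetically is an assumed layout.

```python
from collections import Counter
from typing import List, Optional, Tuple


def count_vocab(
    corpus: str, min_count: int = 1, max_vocab: Optional[int] = None
) -> List[Tuple[str, int]]:
    """Unigram counts sorted by descending frequency, with optional thresholds."""
    counts = Counter(corpus.split())
    # Ties are broken alphabetically so the output is deterministic (assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Threshold 1: drop words rarer than min_count.
    ranked = [(word, n) for word, n in ranked if n >= min_count]
    # Threshold 2: keep at most max_vocab of the most frequent words.
    if max_vocab is not None:
        ranked = ranked[:max_vocab]
    return ranked


def format_vocab(ranked: List[Tuple[str, int]]) -> str:
    """Render one "word count" pair per line (assumed vocabulary-file layout)."""
    return "\n".join(f"{word} {n}" for word, n in ranked)


if __name__ == "__main__":
    corpus = "the cat sat on the mat the end"
    print(format_vocab(count_vocab(corpus, min_count=2)))  # the 3
```

Applying the frequency cutoff before the size cutoff is a design choice of this sketch; with both thresholds set, the result is the most frequent words that also clear `min_count`.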

2) cooccur
Constructs word-word cooccurrence statistics from a corpus. The user should supply a vocabulary file, as produced by vocab_count, and may specify a variety of parameters, as described by running ./build/cooccur.
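A small sketch of word-word co-occurrence counting follows. It assumes a symmetric context window and weights a pair of words that are d positions apart by 1/d, the decreasing weighting described in the GloVe paper. The function name `cooccurrence`, the `window` parameter (its default of 15 is chosen for illustration) and the decision to drop out-of-vocabulary tokens before windowing are assumptions, not a description of exactly what ./build/cooccur does.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cooccurrence(
    tokens: List[str], vocab: Dict[str, int], window: int = 15
) -> Dict[Tuple[int, int], float]:
    """Symmetric, distance-weighted counts keyed by (word_id, context_id)."""
    # Assumption: out-of-vocabulary tokens are dropped before windowing.
    ids = [vocab[t] for t in tokens if t in vocab]
    counts: Dict[Tuple[int, int], float] = defaultdict(float)
    for i, center in enumerate(ids):
        # Look back up to `window` positions; adding both (a, b) and (b, a)
        # makes the counts symmetric without a separate forward pass.
        for d in range(1, min(window, i) + 1):
            context = ids[i - d]
            counts[(center, context)] += 1.0 / d
            counts[(context, center)] += 1.0 / d
    return dict(counts)
```

For example, `cooccurrence("a b a".split(), {"a": 0, "b": 1}, window=2)` gives 2.0 for both (0, 1) and (1, 0) and 1.0 for (0, 0): the two "a" tokens are 2 positions apart, so they contribute 1/2 in each direction.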
3) shuffle
Shuffles the binary file of cooccurrence statistics produced by cooccur. For large files, the file is automatically split into chunks, each of which is shuffled and stored on disk before being merged and shuffled together. The user may specify a number of parameters, as described by running ./build/shuffle.
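The chunk-then-merge strategy described for shuffle can be sketched with temporary files. The record layout (two ints and a double per entry), the name `shuffle_records`, and the `chunk_size` parameter (standing in for a memory limit) are assumptions for illustration; the real tool's merge step may differ in detail.

```python
import os
import random
import struct
import tempfile
from typing import Iterator, List, Tuple

Record = Tuple[int, int, float]
# Assumed binary layout of one co-occurrence record: word id, context id, value.
REC = struct.Struct("iid")


def shuffle_records(records: List[Record], chunk_size: int, seed: int = 0) -> Iterator[Record]:
    """Two-phase external shuffle: shuffle chunks to disk, then merge-shuffle them."""
    rng = random.Random(seed)
    paths: List[str] = []
    try:
        # Phase 1: shuffle each chunk in memory and spill it to a temporary file.
        for start in range(0, len(records), chunk_size):
            chunk = records[start:start + chunk_size]
            rng.shuffle(chunk)
            fd, path = tempfile.mkstemp(suffix=".bin")
            with os.fdopen(fd, "wb") as f:
                for rec in chunk:
                    f.write(REC.pack(*rec))
            paths.append(path)
        # Phase 2: read a slice from every chunk file, shuffle the slices together, emit.
        files = [open(p, "rb") for p in paths]
        try:
            per_file = max(1, chunk_size // max(1, len(files)))
            while True:
                buf: List[Record] = []
                for f in files:
                    data = f.read(per_file * REC.size)
                    buf.extend(REC.iter_unpack(data))
                if not buf:
                    break
                rng.shuffle(buf)
                yield from buf
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files once the merge is finished.
        for p in paths:
            os.remove(p)
```

Only about one chunk's worth of records is held in memory during the merge, which is the point of splitting large files; a single in-memory `random.shuffle` would be simpler when everything fits.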
4) glove
Trains the GloVe model on the specified cooccurrence data, which typically will be the output of the shuffle tool. The user should supply a vocabulary file, as given by vocab_count, and may specify a number of other parameters, which are described by running ./build/glove.
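The objective behind this training step, as given in the GloVe paper, is a weighted least-squares fit of word and context vectors to the log co-occurrence counts: J = Σ f(X_ij)(w_i·w̃_j + b_i + b̃_j − log X_ij)², with f(x) = (x/x_max)^α for x < x_max and 1 otherwise. The pure-Python sketch below minimizes it with AdaGrad. The function names, the defaults (x_max = 100, α = 0.75 and learning rate 0.05, as reported in the paper), the accumulator initialization and returning W + W̃ as the final vectors are assumptions, not a description of ./build/glove's internals.

```python
import math
import random
from typing import Dict, List, Tuple


def weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """Weighting function f(x): (x / x_max) ** alpha below x_max, else 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def train_glove(
    cooc: Dict[Tuple[int, int], float],
    vocab_size: int,
    dim: int = 10,
    epochs: int = 50,
    lr: float = 0.05,
    x_max: float = 100.0,
    alpha: float = 0.75,
    seed: int = 0,
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical AdaGrad trainer; returns (W + W_tilde vectors, loss per epoch)."""
    rng = random.Random(seed)

    def small_vector() -> List[float]:
        return [(rng.random() - 0.5) / dim for _ in range(dim)]

    W = [small_vector() for _ in range(vocab_size)]  # word vectors w_i
    C = [small_vector() for _ in range(vocab_size)]  # context vectors w~_j
    b = [0.0] * vocab_size                           # word biases b_i
    bc = [0.0] * vocab_size                          # context biases b~_j
    # AdaGrad squared-gradient accumulators, started at 1.0 (assumption).
    gW = [[1.0] * dim for _ in range(vocab_size)]
    gC = [[1.0] * dim for _ in range(vocab_size)]
    gb = [1.0] * vocab_size
    gbc = [1.0] * vocab_size

    pairs = list(cooc.items())
    losses: List[float] = []
    for _ in range(epochs):
        rng.shuffle(pairs)
        total = 0.0
        for (i, j), x in pairs:
            # Residual of the prediction against log X_ij (requires x > 0).
            diff = sum(W[i][k] * C[j][k] for k in range(dim)) + b[i] + bc[j] - math.log(x)
            fdiff = weight(x, x_max, alpha) * diff
            total += 0.5 * fdiff * diff  # half the weighted squared error
            for k in range(dim):
                grad_w = fdiff * C[j][k]
                grad_c = fdiff * W[i][k]
                W[i][k] -= lr * grad_w / math.sqrt(gW[i][k])
                C[j][k] -= lr * grad_c / math.sqrt(gC[j][k])
                gW[i][k] += grad_w * grad_w
                gC[j][k] += grad_c * grad_c
            b[i] -= lr * fdiff / math.sqrt(gb[i])
            bc[j] -= lr * fdiff / math.sqrt(gbc[j])
            gb[i] += fdiff * fdiff
            gbc[j] += fdiff * fdiff
        losses.append(total)

    # Sum word and context vectors to form the final embeddings.
    vectors = [[W[i][k] + C[i][k] for k in range(dim)] for i in range(vocab_size)]
    return vectors, losses
```

The weighting caps the influence of very frequent pairs at 1 and shrinks rare, noisy pairs toward 0, which is why f(x) multiplies the residual in every gradient above.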

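If you want to train your own GloVe word vectors, you first need to prepare your corpus as a single file in which all words are separated by a single space. If your corpus contains multiple documents, join them together with a single space between each pair. If your documents are very short, padding the gap between documents with, e.g., 5 "dummy" words may produce better word vectors. Once you have created the corpus, you can train GloVe word vectors with the following 4 tools. demo.sh contains an example that you can modify as needed.

The four main tools in the package are:
    (1) vocab_count
        This tool requires an input corpus that is already whitespace-separated, so first run something like the Stanford Tokenizer on the raw text. It counts the unigrams in the corpus and optionally thresholds the resulting vocabulary by total vocabulary size or minimum frequency count.
    (2) cooccur
        Builds word-word co-occurrence statistics from the corpus. The user should provide a vocabulary file produced by vocab_count and may specify a range of parameters, as described when running ./build/cooccur.
    (3) shuffle
        Shuffles the binary co-occurrence statistics file produced by cooccur. For large files, the file is automatically split into chunks; each chunk is shuffled and stored on disk before the chunks are merged and shuffled together. The user may specify a number of parameters, as described when running ./build/shuffle.
    (4) glove
        Trains the GloVe model on the specified co-occurrence data, which is usually the output of the shuffle tool. The user should provide a vocabulary file produced by vocab_count and may specify a range of parameters, as described when running ./build/glove.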
