
A Summary of Machine Learning Open-Source Libraries and Projects

A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php. Other awesome lists can be found in the awesome-awesomeness list.

If you want to contribute to this list (please do), send me a pull request or contact me @josephmisiti

For a list of free machine learning books available for download, go here.


C

General-Purpose Machine Learning

  • Recommender - A C library for product recommendations/suggestions using collaborative filtering (CF).
  • Accord-Framework - The Accord.NET Framework is a complete framework for building machine learning, computer vision, computer audition, signal processing and statistical applications.

Computer Vision

  • CCV - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library
  • VLFeat - VLFeat is an open and portable library of computer vision algorithms, with a MATLAB toolbox

C++

Computer Vision

  • OpenCV - OpenCV has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS.
  • DLib - DLib has C++ and Python interfaces for face detection and training general object detectors.
  • EBLearn - Eblearn is an object-oriented C++ library that implements various machine learning models

General-Purpose Machine Learning

  • DLib - A suite of ML tools designed to be easy to embed in other applications
  • ecogg
  • shark
  • Vowpal Wabbit (VW) - A fast out-of-core learning system.
  • sofia-ml - Suite of fast incremental algorithms.
  • Shogun - The Shogun Machine Learning Toolbox
  • Caffe - A deep learning framework developed with cleanliness, readability, and speed in mind. [DEEP LEARNING]
  • CXXNET - Yet another deep learning framework with fewer than 1000 lines of core code [DEEP LEARNING]
  • XGBoost - A parallelized optimized general purpose gradient boosting library.
  • CUDA - This is a fast C++/CUDA implementation of convolutional neural networks [DEEP LEARNING]
  • Stan - A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling
  • BanditLib - A simple Multi-armed Bandit library.
  • Timbl - A software package/C++ library implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification, and IGTree, a decision-tree approximation of IB1-IG. Commonly used for NLP.

Natural Language Processing

  • MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction
  • CRF++ - Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks.
  • BLLIP Parser - BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
  • colibri-core - C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
  • ucto - Unicode-aware regular-expression based tokeniser for various languages. Tool and C++ library. Supports FoLiA format.
  • frog - Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyser.

Speech Recognition

  • Kaldi - Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

Sequence Analysis

  • ToPS - This is an object-oriented framework that facilitates the integration of probabilistic models for sequences over a user-defined alphabet.

Clojure

Natural Language Processing

  • Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
  • Inflections-clj - Rails-like inflection library for Clojure and ClojureScript

General-Purpose Machine Learning

  • Touchstone - Clojure A/B testing library
  • Clojush - The Push programming language and the PushGP genetic programming system implemented in Clojure
  • Infer - Inference and machine learning in Clojure
  • Clj-ML - A machine learning library for Clojure built on top of Weka and friends
  • Encog - Clojure wrapper for Encog (v3) (Machine-Learning framework that specialises in neural-nets)
  • Fungp - A genetic programming library for Clojure
  • Statistiker - Basic Machine Learning algorithms in Clojure.
  • clortex - General Machine Learning library using Numenta’s Cortical Learning Algorithm
  • comportex - Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm

Data Analysis / Data Visualization

  • Incanter - Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
  • PigPen - Map-Reduce for Clojure.
  • Envision - Clojure Data Visualisation library, based on Statistiker and D3

Erlang

General-Purpose Machine Learning

  • Disco - Map Reduce in Erlang

Go

Natural Language Processing

  • go-porterstemmer - A native Go clean room implementation of the Porter Stemming algorithm.
  • paicehusk - Golang implementation of the Paice/Husk Stemming Algorithm.
  • snowball - Snowball Stemmer for Go.
  • go-ngram - In-memory n-gram index with compression.

General-Purpose Machine Learning

  • Go Learn - Machine Learning for Go
  • go-pr - Pattern recognition package in Go.
  • bayesian - Naive Bayesian Classification for Golang.
  • go-galib - Genetic Algorithms library written in Go / golang
  • Cloudforest - Ensembles of decision trees in go/golang.
  • gobrain - Neural Networks written in go

Data Analysis / Data Visualization

  • go-graph - Graph library for Go/golang language.
  • SVGo - The Go Language library for SVG generation

Haskell

General-Purpose Machine Learning

  • haskell-ml - Haskell implementations of various ML algorithms.
  • HLearn - A suite of libraries for interpreting machine learning models according to their algebraic structure.
  • hnn - Haskell Neural Network library.
  • hopfield-networks - Hopfield Networks for unsupervised learning in Haskell.

Java

Natural Language Processing

  • CoreNLP - Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words
  • Stanford Parser - A natural language parser is a program that works out the grammatical structure of sentences
  • Stanford POS Tagger - A Part-Of-Speech Tagger (POS Tagger).
  • Stanford Named Entity Recognizer - Stanford NER is a Java implementation of a Named Entity Recognizer.
  • Stanford Word Segmenter - Tokenization of raw text is a standard pre-processing step for many NLP tasks.
  • Tregex, Tsurgeon and Semgrex - Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").
  • Stanford Phrasal - Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
  • Stanford English Tokenizer - A tokenizer divides text into a sequence of tokens, which roughly correspond to "words".
  • Stanford Tokens Regex - A framework for defining patterns over sequences of tokens.
  • Stanford Temporal Tagger - SUTime is a library for recognizing and normalizing time expressions.
  • Stanford SPIED - Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion
  • Stanford Topic Modeling Toolbox - Topic modeling tools for social scientists and others who wish to perform analysis on datasets
  • Twitter Text Java - A Java implementation of Twitter's text processing library
  • MALLET - A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
  • OpenNLP - A machine learning based toolkit for the processing of natural language text.
  • LingPipe - A tool kit for processing text using computational linguistics.
  • ClearTK - ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA.
  • Apache cTAKES - Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text.

General-Purpose Machine Learning

  • Datumbox - Machine Learning framework for rapid development of Machine Learning and Statistical applications
  • ELKI - Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
  • H2O - ML engine that supports distributed learning on data stored in HDFS.
  • htm.java - General Machine Learning library using Numenta’s Cortical Learning Algorithm
  • java-deeplearning - Distributed Deep Learning Platform for Java, Clojure, and Scala
  • JAVA-ML - A general ML library with a common interface for all algorithms in Java
  • JSAT - Numerous Machine Learning algorithms for classification, regression, and clustering.
  • Mahout - Distributed machine learning
  • Meka - An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
  • MLlib in Apache Spark - Distributed machine learning library in Spark
  • Neuroph - Neuroph is a lightweight Java neural network framework
  • ORYX - Simple real-time large-scale machine learning infrastructure.
  • RankLib - RankLib is a library of learning to rank algorithms
  • RapidMiner - RapidMiner integration into Java code
  • Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.
  • WalnutiQ - Object-oriented model of the human brain
  • Weka - Weka is a collection of machine learning algorithms for data mining tasks

Speech Recognition

  • CMU Sphinx - Open source toolkit for speech recognition, based purely on a Java speech recognition library.

Data Analysis / Data Visualization

  • Hadoop - Hadoop/HDFS
  • Spark - Spark is a fast and general engine for large-scale data processing.
  • Impala - Real-time Query for Hadoop

JavaScript

Natural Language Processing

  • Twitter-text-js - A JavaScript implementation of Twitter's text processing library
  • NLP.js - NLP utilities in JavaScript and CoffeeScript
  • natural - General natural language facilities for node
  • Knwl.js - A Natural Language Processor in JS
  • Retext - Extensible system for analysing and manipulating natural language
  • TextProcessing - Sentiment analysis, stemming and lemmatization, part-of-speech tagging and chunking, phrase extraction and named entity recognition.

Data Analysis / Data Visualization

  • D3.js
  • dc.js
  • D3xter - Straightforward plotting built on D3
  • statkit - Statistics kit for JavaScript
  • science.js - Scientific and statistical computing in JavaScript.
  • Z3d - Easily make interactive 3d plots built on Three.js

General-Purpose Machine Learning

  • Convnet.js - ConvNetJS is a JavaScript library for training Deep Learning models [DEEP LEARNING]
  • Clustering.js - Clustering algorithms implemented in JavaScript for Node.js and the browser
  • Decision Trees - Node.js implementation of a decision tree using the ID3 algorithm
  • Node-fann - FANN (Fast Artificial Neural Network Library) bindings for Node.js
  • Kmeans.js - Simple JavaScript implementation of the k-means algorithm, for Node.js and the browser
  • LDA.js - LDA topic modeling for Node.js
  • Learning.js - JavaScript implementation of logistic regression/C4.5 decision tree
  • Machine Learning - Machine learning library for Node.js
  • Node-SVM - Support Vector Machine for Node.js
  • Brain - Neural networks in JavaScript
  • Bayesian-Bandit - Bayesian bandit implementation for Node and the browser.
  • Synaptic - Architecture-free neural network library for node.js and the browser
  • kNear - JavaScript implementation of the k nearest neighbors algorithm for supervised learning

Julia

General-Purpose Machine Learning

  • PGM - A Julia framework for probabilistic graphical models.
  • DA - Julia package for Regularized Discriminant Analysis
  • Regression - Algorithms for regression analysis (e.g. linear regression and logistic regression)
  • Local Regression - Local regression, so smooooth!
  • Naive Bayes - Simple Naive Bayes implementation in Julia
  • Mixed Models - A Julia package for fitting (statistical) mixed-effects models
  • Simple MCMC - Basic MCMC sampler implemented in Julia
  • Distance - Julia module for Distance evaluation
  • Decision Tree - Decision Tree Classifier and Regressor
  • Neural - A neural network in Julia
  • MCMC - MCMC tools for Julia
  • GLM - Generalized linear models in Julia
  • GLMNet - Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet
  • Clustering - Basic functions for clustering data: k-means, dp-means, etc.
  • SVM - SVMs for Julia
  • Kernel Density - Kernel density estimators for Julia
  • NMF - A Julia package for non-negative matrix factorization
  • ANN - Julia artificial neural networks
  • Mocha.jl - Deep Learning framework for Julia inspired by Caffe
  • XGBoost.jl - eXtreme Gradient Boosting Package in Julia

Natural Language Processing

Data Analysis / Data Visualization

  • Graph Layout - Graph layout algorithms in pure Julia
  • Data Frames Meta - Metaprogramming tools for DataFrames
  • Julia Data - library for working with tabular data in Julia
  • Data Read - Read files from Stata, SAS, and SPSS
  • Hypothesis Tests - Hypothesis tests for Julia
  • Gadfly - Crafty statistical graphics for Julia.
  • Stats - Statistical tests for Julia
  • RDataSets - Julia package for loading many of the data sets available in R
  • DataFrames - library for working with tabular data in Julia
  • Distributions - A Julia package for probability distributions and associated functions.
  • Data Arrays - Data structures that allow missing values
  • Time Series - Time series toolkit for Julia
  • Sampling - Basic sampling algorithms for Julia

Misc Stuff / Presentations

  • DSP - Digital Signal Processing (filtering, periodograms, spectrograms, window functions).
  • SignalProcessing - Signal Processing tools for Julia
  • Images - An image library for Julia

Lua

General-Purpose Machine Learning

    • cephes - Cephes mathematical functions library, wrapped for Torch. Provides and wraps the 180+ special mathematical functions from the Cephes mathematical library, developed by Stephen L. Moshier. It is used, among many other places, at the heart of SciPy.
    • graph - Graph package for Torch
    • randomkit - Numpy's randomkit, wrapped for Torch
    • signal - A signal processing toolbox for Torch-7. FFT, DCT, Hilbert, cepstrums, stft
    • nn - Neural Network package for Torch
    • nngraph - This package provides graphical computation for nn library in Torch7.
    • nnx - A completely unstable and experimental package that extends Torch's builtin nn library
    • optim - An optimization library for Torch. SGD, Adagrad, Conjugate-Gradient, LBFGS, RProp and more.
    • unsup - A package for unsupervised learning in Torch. Provides modules that are compatible with nn (LinearPsd, ConvPsd, AutoEncoder, ...), and self-contained algorithms (k-means, PCA).
    • manifold - A package to manipulate manifolds
    • svm - Torch-SVM library
    • lbfgs - FFI Wrapper for liblbfgs
    • vowpalwabbit - An old vowpalwabbit interface to Torch.
    • OpenGM - OpenGM is a C++ library for graphical modeling, and inference. The Lua bindings provide a simple way of describing graphs, from Lua, and then optimizing them with OpenGM.
    • spaghetti - Spaghetti (sparse linear) module for torch7 by @MichaelMathieu
    • LuaSHKit - A Lua wrapper around the locality-sensitive hashing library SHKit
    • kernel smoothing - KNN, kernel-weighted average, local linear regression smoothers
    • cutorch - Torch CUDA Implementation
    • cunn - Torch CUDA Neural Network Implementation
    • imgraph - An image/graph library for Torch. This package provides routines to construct graphs on images, segment them, build trees out of them, and convert them back to images.
    • videograph - A video/graph library for Torch. This package provides routines to construct graphs on videos, segment them, build trees out of them, and convert them back to videos.
    • saliency - Code and tools around integral images. A library for finding interest points based on fast integral histograms.
    • stitch - Uses hugin to stitch images and applies the same stitching to a video sequence
    • sfm - A bundle adjustment/structure from motion package
    • fex - A package for feature extraction in Torch. Provides SIFT and dSIFT modules.
    • OverFeat - A state-of-the-art generic dense feature extractor
  • Lunum

Demos and Scripts

  • Core torch7 demos repository.
    • linear-regression, logistic-regression
    • face detector (training and detection as separate demos)
    • mst-based-segmenter
    • train-a-digit-classifier
    • train-autoencoder
    • optical flow demo
    • train-on-housenumbers
    • train-on-cifar
    • tracking with deep nets
    • kinect demo
    • filter-bank visualization
    • saliency-networks
  • Music Tagging - Music Tagging scripts for torch7
  • torch-datasets - Scripts to load several popular datasets including:
    • BSR 500
    • CIFAR-10
    • COIL
    • Street View House Numbers
    • MNIST
    • NORB
  • Atari2600 - Scripts to generate a dataset with static frames from the Arcade Learning Environment

Matlab

Computer Vision

  • Contourlets - MATLAB source code that implements the contourlet transform and its utility functions.
  • Shearlets - MATLAB code for shearlet transform