Top 10 Open Image Datasets for Machine Learning Research

Posted by 阿新 • Published: 2018-12-28

This article briefly describes ten of the best image datasets for fundamental computer vision problems such as classification, detection and segmentation. Because traditional computer vision approaches are still in use, and to encourage readers with limited resources who want to get started with computer vision, the list also includes some smaller image datasets.

Open Image Dataset Resources

IMAGENET

[Classification][Detection]

ImageNet has been more or less the de facto standard for image classification since the deep learning revolution. It contains more than 14M images across 21,841 synsets. To make such a large dataset practical to download, the organizers provide options to download raw images, URLs, SIFT features, bounding boxes and object attributes. As an added advantage, it also has API integration.

Classification SOTA: 3.57% top-5 error (ResNet, 2015). The detection problem has 150 images for each of 3k synsets.

Detection SOTA: 73.1 mAP for 85 object categories.
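To make the top-5 error figure concrete, here is a minimal sketch of how such a metric can be computed. The function name `top_k_error` and the toy scores are made up for illustration; this is not the official ImageNet evaluation code.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Toy example: 3 samples, 6 classes (hypothetical scores, not ImageNet data).
scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.9],  # true class 0 has the lowest score
    [0.9, 0.1, 0.1, 0.1, 0.1, 0.1],  # true class 0 has the highest score
    [0.2, 0.8, 0.1, 0.3, 0.4, 0.5],  # true class 5 has the second-highest score
]
labels = [0, 0, 5]

top5 = top_k_error(scores, labels, k=5)  # only the first sample misses
top1 = top_k_error(scores, labels, k=1)  # the first and third samples miss
```

A prediction counts as correct under top-5 as long as the true class appears anywhere in the five best guesses, which is why top-5 error is never higher than top-1 error.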

PASCAL VOC

[Detection][Segmentation]

Covering 20 classes with 11.5k images and 27.5k annotated objects, PASCAL VOC has also been used for segmentation, with 7k labeled images.

The PASCAL VOC object detection challenge closed after a 7-year run, and the excerpts have been published.

Detection SOTA: 75.9 mAP at IoU = 0.5
Segmentation SOTA: 89.0 mAP
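The "IoU = 0.5" qualifier refers to intersection over union: a predicted box counts as a match only if its overlap with the ground-truth box, divided by the area of their union, reaches the threshold. Below is a minimal sketch, assuming boxes given as `(x_min, y_min, x_max, y_max)` in continuous coordinates. The function name and the sample boxes are illustrative, and the official evaluation code may use a different pixel convention.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = (0, 0, 10, 10)  # hypothetical annotation
prediction = (5, 0, 15, 10)    # hypothetical detection shifted right by half a box

overlap = iou(ground_truth, prediction)  # intersection 50, union 150
is_match = overlap >= 0.5                # not a match at the 0.5 threshold
```

Note that a detection covering half of the object reaches an IoU of only one third, so the 0.5 threshold is stricter than it may first appear.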

MS COCO

[Detection][Segmentation][Image Captioning][Keypoint detection]

With more than 200k labeled images containing 1.5M instances of 80 classes, MS COCO has also been annotated with 5 captions per image. The dataset also contains 250k people with keypoint annotations.

Detection SOTA: 0.52 mAP at IoU = 0.5:0.05:0.95
Segmentation SOTA: 0.48 mAP
Keypoint SOTA: 0.76 mAP
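The notation 0.5:0.05:0.95 means that average precision is evaluated at ten IoU thresholds, from 0.5 to 0.95 in steps of 0.05, and then averaged. The sketch below only illustrates that threshold sweep. The helper name, the single detection's IoU and the per-threshold AP values are all hypothetical, and this is not the official COCO evaluation code.

```python
def coco_iou_thresholds(start=0.5, step=0.05, stop=0.95):
    """Expand a start:step:stop range into an explicit list of IoU thresholds."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]

thresholds = coco_iou_thresholds()  # 0.5, 0.55, ..., 0.95

# A single hypothetical detection whose IoU with the ground truth is 0.72
# counts as a match at the looser thresholds and as a miss at the stricter ones.
detection_iou = 0.72
matched_at = [t for t in thresholds if detection_iou >= t]

# Hypothetical AP values, one per threshold; the reported metric is their mean.
ap_per_threshold = [0.70, 0.68, 0.65, 0.61, 0.56, 0.50, 0.42, 0.32, 0.20, 0.06]
mean_ap = sum(ap_per_threshold) / len(ap_per_threshold)
```

Averaging over strict thresholds rewards precise localization, which is one reason these scores are much lower than scores measured at a single threshold of 0.5.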

Sports-1M

[Action Recognition]
Sports-1M contains 1M sports videos with an average length of 5.5 minutes, labelled for 487 sports classes.
SOTA: 73.3%

YouTube-8M

[Action Recognition]
YouTube-8M is a curated set of 8M YouTube videos that are between 2 and 10 minutes long and have at least 1,000 views. It has been labeled with 4,800 entities. The average video length is about 4 minutes.
SOTA: 0.839 GAP (Global Average Precision)
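Global Average Precision pools the predictions from all videos into one ranked list and computes average precision over it. The sketch below follows that idea under stated assumptions: the function name, the input layout and the toy numbers are illustrative, and details of the official challenge metric (such as keeping only a fixed number of top predictions per video) are not reproduced here.

```python
def global_average_precision(predictions, total_positives):
    """Average precision over predictions pooled across all videos.

    predictions: list of (confidence, is_correct) pairs from every video.
    total_positives: number of ground-truth labels across all videos.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    hits = 0
    gap = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            # Precision at this rank times the recall gained by this hit.
            gap += (hits / rank) * (1.0 / total_positives)
    return gap

# Toy pooled predictions (hypothetical confidences and correctness flags).
pooled = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
gap = global_average_precision(pooled, total_positives=2)  # 0.5 * (1/1 + 2/3)
```

Because the ranking is global, a confident wrong prediction on one video lowers the precision credited to correct predictions on every video ranked below it.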

Those are the big shots. Are you resource constrained but still interested in getting started with deep learning? You could use the following smaller image datasets for tasks such as classification.

CIFAR-10

[Classification]
CIFAR-10 consists of 60k small (32×32) images classified into 10 classes; it can be used to try out SIFT-based approaches or to build a custom CNN of your own.
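Before trying either approach, the raw data has to be turned into image arrays. Here is a minimal sketch, assuming the layout of the "python version" of the CIFAR-10 archive, where each batch is a pickled dict with a `b"data"` array of shape (N, 3072) and a `b"labels"` list. The function name `decode_cifar_batch` is made up, and a synthetic batch stands in for a real file so the example runs without a download.

```python
import numpy as np

def decode_cifar_batch(batch):
    """Convert one CIFAR batch dict into (images, labels).

    Assumes each row of batch[b"data"] holds 1024 red, then 1024 green,
    then 1024 blue values for one 32x32 image, stored row by row.
    """
    data = np.asarray(batch[b"data"], dtype=np.uint8)
    # (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3), the layout most image tools expect.
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.asarray(batch[b"labels"], dtype=np.int64)
    return images, labels

# Synthetic stand-in for a real batch: 4 pure-red images with labels 0..3.
fake_batch = {
    b"data": np.zeros((4, 3072), dtype=np.uint8),
    b"labels": [0, 1, 2, 3],
}
fake_batch[b"data"][:, :1024] = 255  # fill the red plane only

images, labels = decode_cifar_batch(fake_batch)
```

With a real archive, each batch file would first be read with `pickle.load(f, encoding="bytes")` and the resulting dict passed to the same function.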

CIFAR-100

[Classification]

CIFAR-100 is an image dataset for fine-grained classification. It contains 100 classes grouped into superclasses. Each class contains 500 training images and 100 test images.

CALTECH datasets

[Classification]

CALTECH-101 has 101 classes with 40-800 images per class, each with dimensions of 300×200 pixels, compiled for classification. CALTECH-256, a scaled-up extension of its predecessor, contains 256 classes encompassing 30,607 images.
