Paper Study: YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

Abstract: The computational effort of today’s CNNs requires power-hungry parallel processors or GP-GPUs (general-purpose graphics processing units). Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1.5 TOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 1.9 mm² and with a power dissipation of 895 μW in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state-of-the-art in terms of energy and area efficiency, achieving 61.2 TOp/s/W@0.6V and 1.1 TOp/s/MGE@1.2V, respectively.
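The abstract's central claim is that binary (+1/-1) weights remove the need for multiplications and shrink weight storage. The following is a minimal software sketch of that idea, not the accelerator's actual datapath: the function names `binary_weight_dot`, `binary_conv2d`, and `pack_signs` are hypothetical and chosen only for illustration. Each weight selects whether an activation is added or subtracted, and each weight can be stored as a single bit.

```python
def binary_weight_dot(activations, weight_signs):
    """Dot product with weights restricted to +1/-1, using only add/subtract."""
    if len(activations) != len(weight_signs):
        raise ValueError("activations and weights must have the same length")
    acc = 0
    for x, s in zip(activations, weight_signs):
        if s not in (1, -1):
            raise ValueError("binary weights must be +1 or -1")
        # A +1 weight adds the activation, a -1 weight subtracts it,
        # so no multiplier is needed.
        acc = acc + x if s == 1 else acc - x
    return acc


def binary_conv2d(image, kernel_signs):
    """'Valid' 2-D convolution (cross-correlation form) with a +1/-1 kernel."""
    kh, kw = len(kernel_signs), len(kernel_signs[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    flat_kernel = [s for row in kernel_signs for s in row]
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            # Gather the window under the kernel in the same row-major order.
            window = [image[i + di][j + dj]
                      for di in range(kh) for dj in range(kw)]
            out_row.append(binary_weight_dot(window, flat_kernel))
        output.append(out_row)
    return output


def pack_signs(weight_signs):
    """Pack +1/-1 weights into an int, one bit per weight (+1 -> 1, -1 -> 0)."""
    bits = 0
    for k, s in enumerate(weight_signs):
        if s == 1:
            bits |= 1 << k
    return bits


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernel = [[1, 1],
              [-1, 1]]
    print(binary_conv2d(image, kernel))
    print(bin(pack_signs([1, -1, 1, 1])))
```

Because every weight fits in one bit, a filter's storage and I/O cost drops in proportion to the bit width a multi-bit fixed-point weight would otherwise need, which is the bandwidth and storage saving the abstract refers to.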
