Paper Review: fpgaConvNet--A Framework for Mapping Convolutional Neural Networks on FPGAs

Posted by 阿新 · Published: 2018-12-11

Note: all figures in this post are taken from the original authors' paper and slides.

Basic Information

Title: fpgaConvNet: a framework for mapping CNNs onto FPGAs
Authors: Stylianos I. Venieris, Christos-Savvas Bouganis
Institution: Imperial College London
Year of publication: 2016
Follow-up papers:
fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs (2017)
Latency-Driven Design for FPGA-based Convolutional Neural Networks (2017)
Project homepage:

http://cas.ee.ic.ac.uk/people/sv1310/fpgaConvNet.html
Other: the framework is the product of the author's PhD research.

Main Content

Core Content

Basic Idea

[Figure: basic idea]

Keywords

domain specific modelling framework
automated design methodology
design space exploration
synchronous data flow for capturing CNN workloads as streaming computations
domain specific language

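Key Work

The CNN's processing is treated as a streaming architecture, and the CNN is described as an SDF (Synchronous Data Flow) model, as shown in the figure. The framework then performs design space exploration over this model and uses a library of transformations to map the CNN from its SDF description onto the FPGA, finally emitting a synthesizable Vivado HLS hardware design.
[Figure: SDF modeling]
There are four configurable ways to map the SDF model onto hardware building blocks:
- Partition the SDFG (SDF graph), with each subgraph implemented as a custom design on fully reconfigurable FPGA resources
[Figure: reconfiguration]
- Parameterize the degree of unrolling of a network layer, as shown in the figures
[Figure: coarse-grained folding]
[Figure: fine-grained folding]
- Parameterize the degree of unrolling of the dot products, as shown in the figure above
- Weights reloading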
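
The two unrolling parameters in the list above can be illustrated with a toy cost model. This is only a sketch under simplifying assumptions, not the performance model from the paper: the function `fold_layer`, its parameter names, and the assumption that the cycle count scales inversely with the number of parallel units are all hypothetical.

```python
import math

def fold_layer(num_units, dot_length, coarse_parallel, fine_parallel):
    """Toy estimate of hardware cost versus time for one folded layer.

    num_units       -- independent units of work in the layer (hypothetical)
    dot_length      -- multiplications in one dot product (9 for a 3x3 kernel)
    coarse_parallel -- units instantiated in hardware (coarse-grained unrolling)
    fine_parallel   -- multipliers per dot product (fine-grained unrolling)

    Returns (multipliers, cycles_per_output).
    """
    if not 1 <= coarse_parallel <= num_units:
        raise ValueError("coarse_parallel must be in 1..num_units")
    if not 1 <= fine_parallel <= dot_length:
        raise ValueError("fine_parallel must be in 1..dot_length")
    # Folding time-multiplexes the hardware: fewer units means more passes.
    coarse_passes = math.ceil(num_units / coarse_parallel)
    fine_passes = math.ceil(dot_length / fine_parallel)
    return coarse_parallel * fine_parallel, coarse_passes * fine_passes

# Fully unrolled: most multipliers, fewest cycles.
print(fold_layer(16, 9, 16, 9))  # (144, 1)
# Half of the units, one third of the multipliers per dot product.
print(fold_layer(16, 9, 8, 3))   # (24, 6)
```

In this model, folding trades multipliers for cycles, which is the kind of resource/performance trade-off a design space exploration would search over.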

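Distinguishing Features of the Design

Compared with earlier FPGA-based CNN designs and optimizations, the authors' optimization and mapping method covers the convolutional, pooling and nonlinear layers of a CNN, and can take the parameters of any FPGA platform into account.

Framework Processing Flow

[Figure: processing flow]

Framework Structure

[Figure: framework structure]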

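Updates to the Work

The 2016 version contains the core of the fpgaConvNet design: SDF-based modelling and mapping, and the automated design flow. It offers three ways to map from SDF to hardware. The first is SDFG partitioning combined with reconfiguration of the FPGA resources. The second is coarse-grained folding, which parameterizes how far a layer is unrolled: with enough resources the layer can be fully unrolled and run in parallel, or it can be only half unrolled and processed in two passes. The third is fine-grained folding, which parameterizes the parallelism of the dot products, again ranging from fully parallel to time-multiplexed. Note that the FPGA implementation performs best when neither coarse- nor fine-grained folding is applied. At this stage fpgaConvNet mainly targeted high-throughput applications.
The 2017 version extended the framework with latency-oriented design optimization, and it can also optimize large networks such as AlexNet and VGG16.
The more important 2017 update is a new SDF transformation, weights reloading, which can reduce latency without requiring batch processing of the inputs.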
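
The latency argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a design split into partitions that are each loaded once per batch, so the cost of switching between partitions is shared by all inputs in the batch. The function name and every timing value are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
def average_time_per_input(compute_times_s, switch_time_s, batch_size):
    """Average seconds per input for a design split into several partitions.

    compute_times_s -- per-input compute time of each partition (hypothetical)
    switch_time_s   -- time to switch to a partition, e.g. a full
                       reconfiguration or a weights reload (hypothetical)
    batch_size      -- inputs processed by a partition before switching
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Each partition is loaded once, then processes the whole batch.
    total = sum(batch_size * t + switch_time_s for t in compute_times_s)
    return total / batch_size

partitions = [0.010, 0.010]  # made-up compute times, in seconds

# Slow switching, no batching: the switch time dominates.
print(average_time_per_input(partitions, 0.100, batch_size=1))
# Slow switching amortized over a batch of 100 inputs.
print(average_time_per_input(partitions, 0.100, batch_size=100))
# Fast switching keeps a single input cheap without any batching.
print(average_time_per_input(partitions, 0.001, batch_size=1))
```

With `batch_size=1` the result is the latency of a single input, so a cheaper switch helps latency-sensitive use directly, whereas batching only improves the average.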

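Motivation

CNNs are compute-intensive machine learning algorithms, which hinders their wider deployment, especially in embedded AI applications.
FPGAs are a reconfigurable fabric that allows trade-offs between performance, power and cost.
FPGA-based CNN design is affected by the resources and size of the FPGA, the type and size of the CNN, and the changing requirements of the application, so a platform that abstracts FPGA resources is needed to improve the portability and scalability of FPGA-based CNN designs.
The framework also lowers the barrier for deep learning experts who want to implement CNNs in hardware.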

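Background

SDF

An SDF model is visualized as a directed graph in which each node represents a computation and each edge represents a data stream. Nodes are data-driven: a node fires as soon as its input data is available. The advantages are that the computation can be scheduled statically and that the buffering between nodes is bounded and predictable; the drawback is that conditional computation cannot be expressed. The mathematical properties of SDF can also be exploited to strengthen the analysis. See the figures.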

[Figure: hardware mapping]

[Figure: workload mapping]
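
The static-scheduling property described above comes from the SDF balance equations: on every edge, the tokens produced in one schedule iteration must equal the tokens consumed. The sketch below solves these equations for a small connected graph using standard SDF theory; it is not code from fpgaConvNet, and the node names and token rates in the example are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(actors, edges):
    """Solve the SDF balance equations for a connected graph.

    actors -- list of node names
    edges  -- list of (src, dst, produced, consumed): tokens produced by src
              and consumed by dst per firing
    Returns the smallest positive integer firing count for each actor.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the edges
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent rates: no bounded static schedule")
    # Scale the fractions to the smallest all-integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    return {name: int(r * scale) for name, r in rate.items()}

# Hypothetical three-node pipeline: 2x2 pooling consumes 4 tokens per firing.
edges = [("conv", "pool", 1, 4), ("pool", "relu", 1, 1)]
print(repetition_vector(["conv", "pool", "relu"], edges))
# {'conv': 4, 'pool': 1, 'relu': 1}
```

Because the firing counts are known before anything runs, the schedule and the buffer sizes between nodes can be fixed at design time.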

Experimental Comparison

Metrics of interest: performance density (performance per FPGA slice, in Gops/s/slice) and performance efficiency (performance per watt, in Gops/s/Watt)
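
Both metrics are simple ratios, as the short sketch below shows. The function names and the input numbers are made up for illustration only and are not results from the paper.

```python
def performance_density(gops_per_s, slices):
    """Performance per FPGA slice, in Gops/s/slice."""
    return gops_per_s / slices

def performance_efficiency(gops_per_s, watts):
    """Performance per watt, in Gops/s/Watt."""
    return gops_per_s / watts

# Hypothetical design point, not a measured result.
design = {"gops_per_s": 12.0, "slices": 6000, "watts": 4.0}
print(performance_density(design["gops_per_s"], design["slices"]))
print(performance_efficiency(design["gops_per_s"], design["watts"]))  # 3.0
```

Normalizing by slices or by watts makes designs on differently sized devices comparable.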

[Figure: benchmark]

Evolved Version

The evolved version adds two features:
- Support for Irregular Networks
fpgaConvNet offers support for a wide range of networks, including both conventional ConvNets with regular layer connectivity as well as compound modules, such as Inception modules, residual blocks and dense blocks.
- Support for large networks
fpgaConvNet makes no assumptions on the size of ConvNets and supports the mapping of deep and wide networks independently of the target FPGA resources. This is achieved by supporting (i) bitstream-level reconfiguration which allows the mapping of ConvNets of large depth and (ii) the weights reloading of a layer which allows ConvNets to have wide convolutional layers without being constrained by the available on-chip memory. Both the reconfiguration and weights reloading employed by the generated hardware architecture are parametrised and optimised by fpgaConvNet for the target ConvNet-FPGA pair.
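
The idea behind weights reloading for a wide layer can be sketched as splitting the layer's filters into groups whose weights fit in on-chip memory, then loading one group at a time. This is a simple greedy split under stated assumptions, not the optimisation fpgaConvNet performs: the function name, the capacity measured in number of weights, and the example sizes are all hypothetical.

```python
def plan_weights_reloading(num_filters, weights_per_filter, on_chip_capacity):
    """Split a layer's filters into groups that fit in on-chip memory.

    num_filters        -- filters in the convolutional layer
    weights_per_filter -- weights in one filter (e.g. 3 * 3 * input channels)
    on_chip_capacity   -- weights that fit on chip at once (hypothetical unit)

    Returns a list of (start, end) filter index ranges, one per weights load.
    """
    if weights_per_filter > on_chip_capacity:
        raise ValueError("a single filter does not fit in on-chip memory")
    filters_per_load = min(num_filters, on_chip_capacity // weights_per_filter)
    return [
        (start, min(start + filters_per_load, num_filters))
        for start in range(0, num_filters, filters_per_load)
    ]

# Hypothetical layer: 256 filters of size 3x3 over 128 input channels.
plan = plan_weights_reloading(256, 3 * 3 * 128, on_chip_capacity=100_000)
print(plan)       # [(0, 86), (86, 172), (172, 256)]
print(len(plan))  # 3
```

Each range corresponds to one pass over the input with a different set of weights loaded, so layer width is limited by time rather than by on-chip memory.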
