[Paper Reading] CTPN---Detecting Text in Natural Image with Connectionist Text Proposal Network

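The idea of this paper is close to Faster R-CNN: first extract proposals with an RPN (Region Proposal Network), then classify the extracted proposals.
The paper improves on Faster R-CNN in the following ways:
- Faster R-CNN uses 9 anchors, combining 3 sizes with 3 aspect ratios. CTPN instead fixes the anchor width at 16 px (with VGG16, whose 4 pooling layers give a feature stride of 16) and only varies the height over 10 values. This suits text detection, where the targets are usually long and thin.
- After obtaining the feature map, a BD-LSTM (bidirectional LSTM) extracts a feature for each pixel so that global information can be used. Each row of pixels is treated as a sequence and fed to the BD-LSTM. From the BD-LSTM output, the network then predicts each anchor's score and the corresponding anchor coordinates.
- Another contribution is a fine adjustment of the horizontal coordinate (side-refinement). The formula is:
  $o = (x_{side} - c_x^a) / w^a, o^* = (x_{side}^* - c_x^a) / w^a$
  Here $o$ is the prediction and $o^*$ is the GT. $x_{side}$ is the predicted x-coordinate of the horizontal side (left or right) nearest to the anchor, and $x_{side}^*$ is the corresponding ground truth. $c_x^a$ is the x-coordinate of the anchor's center. $w^a$ is the anchor width, which is fixed (16). Dividing by the width acts as a normalization.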
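
The anchor design and the side-refinement offset can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function names are hypothetical, and the concrete height values (starting at 11 px and dividing by 0.7 each step) are an assumption taken from the paper, since the notes above only say that there are 10 heights.

```python
import numpy as np

ANCHOR_WIDTH = 16  # fixed anchor width in pixels (VGG16 feature stride)


def anchor_heights(num=10, start=11.0, ratio=0.7):
    """Return `num` anchor heights; start/ratio are assumed values from the paper."""
    return np.array([start / ratio ** i for i in range(num)])


def side_offset(x_side, cx_a, w_a=ANCHOR_WIDTH):
    """Encode a side coordinate as o = (x_side - c_x^a) / w^a."""
    return (x_side - cx_a) / w_a


def decode_side(o, cx_a, w_a=ANCHOR_WIDTH):
    """Invert side_offset: recover x_side from a relative offset o."""
    return o * w_a + cx_a


heights = anchor_heights()
o = side_offset(x_side=104.0, cx_a=96.0)  # side lies 8 px right of the anchor center
x = decode_side(o, cx_a=96.0)
```

Because the width is fixed, each anchor only needs a height and a vertical position, and the offset `o` is unitless: a value of 0.5 means the side is half an anchor width to the right of the anchor center.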

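The algorithm's pipeline, as shown in the figure above:
- First, a standard feature extraction module (for example, VGG16) produces a feature map of size h*w*c.
- A convolutional layer converts it to shape h*w*256.
- It is reshaped to h*(w*256), where each row's w*256 values are treated as an input sequence of length w (one 256-dimensional vector per position) and fed to the BD-LSTM.
- The resulting feature is reshaped to h*w*d, where d is the output dimension of the BD-LSTM.
- Fully connected layers then predict the score and the coordinates of each anchor. Note that the prediction is made for every pixel of the feature map, so the fc outputs have shapes h*w*(10*2) and h*w*(10*4) respectively.
- Finally, the network trained with the steps above produces many anchors like those in panel B of the figure above, and a connection algorithm links them together. The connection algorithm is defined as follows:
  - First, select all anchors with score > 0.7.
  - For each anchor $B_i$, define its neighbor anchor $B_j$. The pair must satisfy the following conditions:
    - the two anchors are nearest to each other
    - the distance between the anchors is less than 50 pixels
  - If $B_i$ and $B_j$ are neighbors of each other, merge them, and repeat until no mutually neighboring anchors remain.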
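
The connection step can be sketched as follows. This is a rough illustration under stated assumptions, not the reference implementation. Boxes are assumed to be `(x1, y1, x2, y2)` arrays, distance is measured between left edges, and all function names are hypothetical. The `min_v_overlap` check is an extra condition from the paper (vertical overlap > 0.7) that the list above does not mention; it keeps anchors on different text rows from being linked.

```python
import numpy as np


def vertical_overlap(a, b):
    """Overlap of the two y-ranges divided by the shorter box's height."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    return max(inter, 0.0) / min(a[3] - a[1], b[3] - b[1])


def nearest(boxes, i, direction, max_gap, min_v_overlap):
    """Index of the nearest compatible anchor right (+1) or left (-1) of i, else -1."""
    best, best_dist = -1, max_gap
    for j in range(len(boxes)):
        dist = direction * (boxes[j][0] - boxes[i][0])
        if 0 < dist < best_dist and vertical_overlap(boxes[i], boxes[j]) > min_v_overlap:
            best, best_dist = j, dist
    return best


def connect_proposals(boxes, scores, min_score=0.7, max_gap=50, min_v_overlap=0.7):
    """Merge chains of mutually neighboring anchors into text-line boxes."""
    boxes = np.asarray(boxes, dtype=float)[np.asarray(scores) > min_score]
    n = len(boxes)
    right = [nearest(boxes, i, +1, max_gap, min_v_overlap) for i in range(n)]
    left = [nearest(boxes, i, -1, max_gap, min_v_overlap) for i in range(n)]
    # Keep the link i -> j only when the two anchors pick each other.
    link = [j if j >= 0 and left[j] == i else -1 for i, j in enumerate(right)]
    has_predecessor = {j for j in link if j >= 0}
    lines = []
    for start in range(n):
        if start in has_predecessor:
            continue  # this anchor is in the middle of some chain
        chain, cur = [start], start
        while link[cur] >= 0:
            cur = link[cur]
            chain.append(cur)
        grp = boxes[chain]
        lines.append([float(grp[:, 0].min()), float(grp[:, 1].min()),
                      float(grp[:, 2].max()), float(grp[:, 3].max())])
    return lines


boxes = [[0, 10, 16, 40], [16, 12, 32, 42], [32, 11, 48, 41],
         [200, 10, 216, 40], [16, 10, 32, 40]]
scores = [0.9, 0.8, 0.95, 0.9, 0.3]
text_lines = connect_proposals(boxes, scores)
```

In this example the low-score anchor is dropped, the first three anchors chain into one text line, and the anchor at x = 200 stays on its own because it is more than 50 pixels away from its nearest neighbor.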
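With the pipeline covered, the loss definition shows how the network is actually trained:
  $L(s_i, v_j, o_k) = \frac{1}{N_s}\sum_i L_s^{cl}(s_i, s_i^*) + \frac{\lambda_1}{N_v}\sum_j L_v^{re}(v_j, v_j^*) + \frac{\lambda_2}{N_o}\sum_k L_o^{re}(o_k, o_k^*)$
- This is the overall training loss. It has three parts: the first is the cross-entropy for classification, the second is a Smooth L1 loss for regressing the vertical coordinates, and the third is a Smooth L1 loss for regressing the horizontal (side) offsets $o_k$. $\lambda_1$ and $\lambda_2$ are loss weights, and $N_s$, $N_v$, $N_o$ are the numbers of anchors used by each term.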
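- $s_i$ is the predicted probability that the i-th anchor is text, and $s_i^*$ is the corresponding ground truth in {0,1}.
- $v_j$ and $v_j^*$ are the predicted and ground-truth vertical coordinates of the j-th anchor. The index j differs from i because only anchors with probability > 0.7 or with $s_j^* = 1$ are counted here, i.e. only positive samples.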
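- The Smooth L1 loss is defined as follows (the standard form from Fast R-CNN):
  $smooth_{L1}(x) = 0.5x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise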
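
As a rough numerical sketch of the three-part loss, the NumPy code below computes a binary cross-entropy term plus two Smooth L1 terms. It is not the authors' training code. The function names are hypothetical, the array shapes are assumptions (`v` has shape `(N_v, 2)`, `o` has shape `(N_o,)`), and the weights `lambda1=1.0`, `lambda2=2.0` are assumed from the paper's reported settings.

```python
import numpy as np


def smooth_l1(x):
    """Elementwise Smooth L1: 0.5*x^2 where |x| < 1, |x| - 0.5 elsewhere."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)


def cross_entropy(s, s_star, eps=1e-7):
    """Binary cross-entropy between text probabilities s and labels s_star."""
    s = np.clip(s, eps, 1.0 - eps)  # avoid log(0)
    return -(s_star * np.log(s) + (1.0 - s_star) * np.log(1.0 - s))


def ctpn_loss(s, s_star, v, v_star, o, o_star, lambda1=1.0, lambda2=2.0):
    """Classification + vertical regression + side-refinement regression."""
    cls = cross_entropy(s, s_star).sum() / len(s)   # averaged over N_s anchors
    ver = smooth_l1(v - v_star).sum() / len(v)      # averaged over N_v anchors
    side = smooth_l1(o - o_star).sum() / len(o)     # averaged over N_o anchors
    return cls + lambda1 * ver + lambda2 * side


s = np.array([0.5, 0.5])
s_star = np.array([1.0, 0.0])
v = np.array([[0.5, 0.0], [2.0, 0.0]])
v_star = np.zeros((2, 2))
o = np.array([0.5])
o_star = np.array([0.0])
loss = ctpn_loss(s, s_star, v, v_star, o, o_star)
```

Each regression term is summed over the coordinates of an anchor and then averaged over the anchors that contribute to it, which is why the three terms are normalized by different counts.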
