
Using the Python module plotdigitizer to extract data from figures in papers

# Technical Background

Researchers in every field often run into the same problem: a good paper contains useful data, but the data appear only as a figure. If we could extract the data from the figure, we could present them again in our own format and even add our own improved results alongside them. Extracting data from figures in papers is therefore a very practical need. As an example, take the [earlier blog post on quantum annealing](https://www.cnblogs.com/dechinphy/p/annealer.html), which contains the following figure:

![](https://img2020.cnblogs.com/blog/2277440/202103/2277440-20210305201349729-1744984352.png)

In this article, we show how to extract the data from such a figure with Python.

# Installing plotdigitizer

Here we use `pip` to install the third-party Python library `plotdigitizer`, whose main purpose is to extract data from images automatically. Tencent's PyPI mirror can be used to speed up the installation:

```bash
[dechin@dechin-manjaro plotdigitizer]$ python3 -m pip install -i https://mirrors.cloud.tencent.com/pypi/simple plotdigitizer
Looking in indexes: https://mirrors.cloud.tencent.com/pypi/simple
Collecting plotdigitizer
  Downloading https://mirrors.cloud.tencent.com/pypi/packages/89/bb/ff753093458c05ce3b52fd17527b6b0622ca096aadcf561c6316320ab793/plotdigitizer-0.1.3-py3-none-any.whl (20 kB)
Collecting loguru<0.6.0,>=0.5.3
  Downloading https://mirrors.cloud.tencent.com/pypi/packages/6d/48/0a7d5847e3de329f1d0134baf707b689700b53bd3066a5a8cfd94b3c9fc8/loguru-0.5.3-py3-none-any.whl (57 kB)
     |████████████████████████████████| 57 kB 521 kB/s
Collecting opencv-python<5.0.0,>=4.5.1
  Downloading https://mirrors.cloud.tencent.com/pypi/packages/2a/9a/ff309b530ac1b029bfdb9af3a95eaff0f5f45f6a2dbe37b3454ae8412f4c/opencv_python-4.5.1.48-cp38-cp38-manylinux2014_x86_64.whl (50.4 MB)
     |████████████████████████████████| 50.4 MB 467 kB/s
Collecting numpy<2.0.0,>=1.19.5
  Downloading https://mirrors.cloud.tencent.com/pypi/packages/c7/e6/dccac76b7e825915ffb906beeba5a953597b6cfe1fe686b5276e122cb07c/numpy-1.20.1-cp38-cp38-manylinux2010_x86_64.whl (15.4 MB)
     |████████████████████████████████| 15.4 MB 20.4 MB/s
Collecting matplotlib<4.0.0,>=3.3.4
  Downloading https://mirrors.cloud.tencent.com/pypi/packages/ab/20/60cfe5d611ac86df07b7b1f9b9582f22f7eda5edbe2124ba85bdf3133822/matplotlib-3.3.4-cp38-cp38-manylinux1_x86_64.whl (11.6 MB)
     |████████████████████████████████| 11.6 MB 4.4 MB/s
Requirement already satisfied: python-dateutil>=2.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (0.10.0)
Requirement already satisfied: pillow>=6.2.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (8.0.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (1.3.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (2.4.7)
Requirement already satisfied: six>=1.5 in /home/dechin/anaconda3/lib/python3.8/site-packages (from python-dateutil>=2.1->matplotlib<4.0.0,>=3.3.4->plotdigitizer) (1.15.0)
Installing collected packages: loguru, numpy, opencv-python, matplotlib, plotdigitizer
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.2
    Uninstalling numpy-1.19.2:
      Successfully uninstalled numpy-1.19.2
  Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.3.2
    Uninstalling matplotlib-3.3.2:
      Successfully uninstalled matplotlib-3.3.2
Successfully installed loguru-0.5.3 matplotlib-3.3.4 numpy-1.20.1 opencv-python-4.5.1.48 plotdigitizer-0.1.3
```

Running the help command lets us check whether the installation succeeded:

```bash
[dechin@dechin-manjaro plotdigitizer]$ plotdigitizer -h
usage: plotdigitizer [-h] --data-point DATA_POINT [--location LOCATION]
                     [--plot PLOT] [--output OUTPUT] [--preprocess] [--debug]
                     INPUT

Digitize image.

positional arguments:
  INPUT                 Input image file.

optional arguments:
  -h, --help            show this help message and exit
  --data-point DATA_POINT, -p DATA_POINT
                        Datapoints (min 3 required). You have to click on them
                        later. At least 3 points are recommended. e.g -p 0,0
                        -p 10,0 -p 0,1 Make sure that point are comma
                        separated without any space.
  --location LOCATION, -l LOCATION
                        Location of a points on figure in pixels (integer).
                        These values should appear in the same order as -p
                        option. If not given, you will be asked to click on
                        the figure.
  --plot PLOT           Plot the final result. Requires matplotlib.
  --output OUTPUT, -o OUTPUT
                        Name of the output file else trajectory will be
                        written to .traj.csv
  --preprocess          Preprocess the image. Useful with bad resolution
                        images.
  --debug               Enable debug logger
```

# Running the Command and the Output Figure

First place the image you want to extract data from in the current directory, then run the following command:

```bash
plotdigitizer ./test1.png -p 0,-1 -p 20,0 -p 0,0.1 --plot output.png
```

This command extracts the data from `test1.png`, and the `-o` option saves them as a CSV table. The three `-p` arguments are the data coordinates of three reference points on the axes; because `--location` is not given, the tool asks you to click on each of these points in the figure. In practice we found that on `Manjaro Linux` the tool keeps outputting figures even without the `--plot` option, and the process can only be stopped forcibly with `kill -9`. This may be a bug in the open-source library. Here is the figure redrawn from the extracted data:

![](https://img2020.cnblogs.com/blog/2277440/202103/2277440-20210305201020309-1172785513.png)

After the run finishes, this figure is written to the temporary folder `tmp/plotdigitizer/`. Note that images produced earlier are overwritten by later temporary files.

# Summary

Here we have only introduced and demonstrated the basic usage of plotdigitizer. Because it is an image-data tool written in Python, it fits the habits and logic of a `pythoner` well. Although various problems may come up in actual use, on the whole it is a good tool and worth recommending.

# Copyright Notice

The original link of this article is: https://www.cnblogs.com/dechinphy/p/plotdigitizer.html
Author ID: DechinPhy
For more original articles, see: https://www.cnblogs.com/dec
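The help text above says plotdigitizer needs at least three reference points: their data coordinates come from `-p`, and their pixel positions are either clicked or passed with `-l`. Three non-collinear point pairs are exactly enough to fix an affine map from pixel coordinates to data coordinates. The NumPy sketch below illustrates that calibration idea; it is one common approach and is not taken from plotdigitizer's source, and the helper names `fit_pixel_to_data` and `pixel_to_data` are hypothetical.

```python
import numpy as np


def fit_pixel_to_data(pixels, data):
    """Fit an affine map data = [px, py, 1] @ coeffs from matching point pairs.

    pixels, data: sequences of (x, y) pairs in the same order, in the spirit
    of plotdigitizer's -l (pixels) and -p (data) options.
    """
    pixels = np.asarray(pixels, dtype=float)
    data = np.asarray(data, dtype=float)
    if len(pixels) < 3 or pixels.shape != data.shape:
        raise ValueError("need at least 3 matching (x, y) pairs")
    # Each row is [px, py, 1], so that data is approximately design @ coeffs.
    design = np.column_stack([pixels, np.ones(len(pixels))])
    # Exact for 3 non-collinear points, least-squares best fit for more.
    coeffs, *_ = np.linalg.lstsq(design, data, rcond=None)
    return coeffs  # shape (3, 2): one column per data axis


def pixel_to_data(coeffs, pixels):
    """Convert one or more pixel (x, y) positions to data coordinates."""
    pixels = np.atleast_2d(np.asarray(pixels, dtype=float))
    design = np.column_stack([pixels, np.ones(len(pixels))])
    return design @ coeffs


# Data points from the article's command; the pixel positions are made up.
coeffs = fit_pixel_to_data(
    pixels=[(50, 400), (450, 200), (50, 180)],
    data=[(0, -1), (20, 0), (0, 0.1)],
)
print(pixel_to_data(coeffs, (250, 300)))  # approximately [[10.0, -0.5]]
```

Using least squares rather than a plain 3x3 solve means the same function also works if you supply more than three reference points, which can reduce the error from imprecise clicks.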
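Because the interactive run could hang on Manjaro Linux until killed with `kill -9`, it may help to script the command with the pixel locations given up front and a timeout. The sketch below builds the argument list from the `-p`, `-l` and `-o` options listed in the help output. It assumes, without having verified it, that `-l` takes the same comma-separated `x,y` format as `-p`. The helpers `build_command` and `run` are hypothetical names.

```python
import shutil
import subprocess


def build_command(image, data_points, locations=None, output=None):
    """Build a plotdigitizer argument list.

    data_points: data coordinates for -p, e.g. [(0, -1), (20, 0), (0, 0.1)].
    locations: optional pixel positions for -l, in the same order as
    data_points (ASSUMED to use the same "x,y" format as -p). Without them,
    the tool asks you to click on the figure.
    """
    if len(data_points) < 3:
        raise ValueError("plotdigitizer needs at least 3 data points")
    cmd = ["plotdigitizer", str(image)]
    for x, y in data_points:
        cmd += ["-p", f"{x},{y}"]  # comma separated, no spaces
    if locations is not None:
        if len(locations) != len(data_points):
            raise ValueError("need one pixel location per data point")
        for px, py in locations:
            cmd += ["-l", f"{int(px)},{int(py)}"]  # integer pixels
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


def run(cmd, timeout=60):
    """Run the command, killing it if it is still running after `timeout` s."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    # On timeout, subprocess.run kills the child (SIGKILL on POSIX, the same
    # signal as kill -9) and then raises subprocess.TimeoutExpired.
    return subprocess.run(cmd, check=True, timeout=timeout)


cmd = build_command("./test1.png", [(0, -1), (20, 0), (0, 0.1)], output="test1.csv")
print(" ".join(cmd))
```

The printed command matches the article's invocation with `-o test1.csv` added in place of `--plot`. Calling `run(cmd)` would then execute it, provided plotdigitizer is installed and on `PATH`.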