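Batch-downloading papers with Python: still scraping Tieba images? Try batch-downloading SCI papers instead, by title or DOI, with scihub, a handy tool for research downloads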

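Last night I was downloading SCI papers, 295 in total. Downloading them all by hand would be exhausting.

So I wondered whether there was a way to batch-download SCI papers.

From Web of Science, export the titles, DOIs, and other fields of the papers to download as a txt file, then filter out the DOI and title of each record and save them to a new file.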
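The filtering step could be sketched roughly as follows. This assumes the export is the Web of Science plain-text format, where each line starts with a two-character field tag (`TI` for the title, `DI` for the DOI), wrapped lines are indented by three spaces, and `ER` closes a record. The function names `parse_wos_export` and `save_pairs` are made up for this illustration.

```python
import re

# Assumed layout: "XX value" lines, continuation lines indented by three spaces.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9]) (.*)$")

def parse_wos_export(text):
    """Return a list of (doi, title) pairs; records without a DOI are skipped."""
    records, fields, tag = [], {}, None
    for line in text.splitlines():
        if line.strip() == "ER":                    # end of one record
            if "DI" in fields and "TI" in fields:
                records.append((fields["DI"], fields["TI"]))
            fields, tag = {}, None
            continue
        match = TAG_LINE.match(line)
        if match:                                   # a new tagged field starts
            tag = match.group(1)
            fields[tag] = match.group(2).strip()
        elif tag and line.startswith("   "):        # wrapped continuation line
            fields[tag] += " " + line.strip()
    return records

def save_pairs(records, path):
    """Write one 'DOI<TAB>title' line per record to a new file."""
    with open(path, "w", encoding="utf-8") as handle:
        for doi, title in records:
            handle.write(f"{doi}\t{title}\n")
```

Records with no `DI` field are dropped here, since the later download step needs an identifier to look up.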

Then loop over the DOIs and titles, download each paper, and save it under its title.
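A minimal sketch of that loop is below. It is not the original script: `safe_filename` and `download_all` are hypothetical helpers, and the downloader is passed in as a callable so the loop can be tried without network access. It assumes the callable accepts `path=` like `SciHub.download` in the README that follows, and that a failed download comes back as a dictionary with an `'err'` key, which is an assumption about scihub.py rather than something the README states.

```python
import os
import re

def safe_filename(title, max_len=150):
    """Turn a paper title into a file name usable on common file systems."""
    name = re.sub(r'[\\/:*?"<>|\r\n\t]+', " ", title)   # characters Windows forbids
    name = re.sub(r"\s+", " ", name).strip(" .")
    return (name[:max_len].rstrip(" .") or "untitled") + ".pdf"

def download_all(records, download, out_dir="papers"):
    """Call download(doi, path=...) for each (doi, title); return the DOIs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for doi, title in records:
        path = os.path.join(out_dir, safe_filename(title))
        if os.path.exists(path):                 # already saved on an earlier run
            continue
        try:
            result = download(doi, path=path)
        except Exception as exc:                 # network errors and the like
            result = {"err": str(exc)}
        # Assumption: failures are reported as {'err': ...}
        if isinstance(result, dict) and "err" in result:
            failed.append(doi)
    return failed

# Possible usage with scihub.py (needs network access):
# from scihub import SciHub
# failed = download_all(records, SciHub().download)
```

Titles often contain characters such as `/` or `:` that are not allowed in file names, which is why they are cleaned before being used as the save path.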

The program is based on the following repository:

https://github.com/zaytoun/scihub.py

Setup

pip install -r requirements.txt

Usage

You can interact with scihub.py from the commandline:

usage: scihub.py [-h] [-d (DOI|PMID|URL)] [-f path] [-s query] [-sd query]
                 [-l N] [-o path] [-v]

SciHub - To remove all barriers in the way of science.

optional arguments:
  -h, --help            show this help message and exit
  -d (DOI|PMID|URL), --download (DOI|PMID|URL)
                        tries to find and download the paper
  -f path, --file path  pass file with list of identifiers and download each
  -s query, --search query
                        search Google Scholars
  -sd query, --search_download query
                        search Google Scholars and download if possible
  -l N, --limit N       the number of search results to limit to
  -o path, --output path
                        directory to store papers
  -v, --verbose         increase output verbosity
  -p, --proxy           set proxy

You can also import scihub. The examples below demonstrate all the features.

fetch

from scihub import SciHub

sh = SciHub()

# fetch specific article (don't download to disk)
# this will return a dictionary in the form 
# {'pdf': PDF_DATA,
#  'url': SOURCE_URL,
#  'name': UNIQUE_GENERATED NAME
# }
result = sh.fetch('http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1648853')

download

from scihub import SciHub

sh = SciHub()

# exactly the same thing as fetch except downloads the articles to disk
# if no path given, a unique name will be used as the file name
result = sh.download('http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1648853', path='paper.pdf')

search

from scihub import SciHub

sh = SciHub()

# retrieve 5 articles on Google Scholars related to 'bittorrent'
results = sh.search('bittorrent', 5)

# download the papers; will use sci-hub.io if it must
for paper in results['papers']:
	sh.download(paper['url'])

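However, Sci-Hub presents a CAPTCHA. How can that be dealt with?

http://sci-hub.tw/

The site asks for a CAPTCHA, which makes the crawl fail. Solving the CAPTCHA recognition problem is the key!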
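The sketch below does not solve the CAPTCHA. It only shows one way to notice when a response is not a real PDF, so the affected identifiers can be collected and retried later. It relies on the `{'pdf': ...}` result layout described in the README comments above. The helper names are invented for this example, and treating any result without a valid `'pdf'` entry as a failure is an assumption.

```python
def looks_like_pdf(data):
    """True if the payload starts with the PDF magic bytes '%PDF-'."""
    return isinstance(data, (bytes, bytearray)) and data.lstrip()[:5] == b"%PDF-"

def split_results(results):
    """Partition {identifier: fetch_result} into real PDFs and identifiers to retry."""
    ok, retry = {}, []
    for ident, res in results.items():
        pdf = res.get("pdf") if isinstance(res, dict) else None
        if looks_like_pdf(pdf):
            ok[ident] = pdf
        else:                      # e.g. an HTML CAPTCHA page or an error result
            retry.append(ident)
    return ok, retry
```

A CAPTCHA page is HTML, so it fails the magic-byte check even when the request itself succeeded.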

I'll give it another try when I have time.