Batch downloading with wget on Linux

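Option 1: use wget's built-in -i option, which reads the download URLs from a given file. The advantage is that a single wget process downloads all the PDFs, so no process is repeatedly started and stopped.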

[[email protected] tmp]# pwd
/root/tmp
[[email protected] tmp]# wc -l 50pdf.log 
50 50pdf.log
[[email protected] tmp]# head -3 50pdf.log 
14788669468643331.pdf
1479035133045678.pdf
14799731544302441.pdf
[[email protected] tmp]# awk '{print "http://xxxxx/"$1}' 50pdf.log > download.log
[[email protected] tmp]# head -3 download.log
http://xxxxx/14788669468643331.pdf
http://xxxxx/1479035133045678.pdf
http://xxxxx/14799731544302441.pdf
[[email protected] tmp]# wget -i download.log
--2017-09-05 16:12:52--  http://xxxxx/14788669468643331.pdf
Resolving nfs.htbaobao.com... 106.75.138.13
Connecting to nfs.htbaobao.com|106.75.138.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2601963 (2.5M) [application/pdf]
Saving to: “14788669468643331.pdf”

100%[========================================================================================================================================================================>] 2,601,963    244K/s   in 10s

2017-09-05 16:13:02 (245 KB/s) - “14788669468643331.pdf” saved [2601963/2601963]
.......................................(middle part omitted)
--2017-09-05 16:14:04--  http://xxxxx/1481341338750833.pdf
Reusing existing connection to nfs.htbaobao.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 152155 (149K) [application/pdf]
Saving to: “1481341338750833.pdf”

100%[========================================================================================================================================================================>] 152,155      209K/s   in 0.7s

2017-09-05 16:14:05 (209 KB/s) - “1481341338750833.pdf” saved [152155/152155]

FINISHED --2017-09-05 16:14:05--
Downloaded: 50 files, 16M in 1m 13s (226 KB/s)

[[email protected] tmp]# ls
14788669468643331.pdf 1481187682278708.pdf 1481262534034760.pdf 1481266593232456.pdf 1481340827926207.pdf 1481340948842260.pdf 1481341049634040.pdf 1481341172815801.pdf 1481341307823881.pdf
1479035133045678.pdf 1481193562811982.pdf 1481262611307371.pdf 1481267034803389.pdf 1481340853666343.pdf 1481340973957872.pdf 1481341112979143.pdf 1481341185245978.pdf 1481341338750833.pdf
14799731544302441.pdf 1481247789582233.pdf 1481262623674903.pdf 1481270022285676.pdf 1481340897933322.pdf 1481341008561312.pdf 1481341130545646.pdf 1481341216517700.pdf 50pdf.log
14799944743125144.pdf 1481262178457017.pdf 1481262846773279.pdf 1481286012498927.pdf 1481340922434822.pdf 1481341008584230.pdf 1481341134346522.pdf 1481341229730723.pdf download.log
1481034002739896.pdf 1481262229905206.pdf 1481265452669335.pdf 1481340787767089.pdf 1481340927135663.pdf 1481341022043499.pdf 1481341148759269.pdf 1481341244148718.pdf
1481095290513785.pdf 1481262241457479.pdf 1481265807661321.pdf 1481340826599027.pdf 1481340943094250.pdf 1481341045655154.pdf 1481341159027852.pdf 1481341261314587.pdf

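The list-building step above can be wrapped in a small helper. This is a minimal sketch, not the author's script: `BASE_URL` stands in for the masked `http://xxxxx/` prefix, and the function name `build_url_list` is made up for illustration. Unlike the one-liner above, it also skips blank lines in the name list.

```shell
#!/usr/bin/env bash
# Sketch: turn a list of file names into a URL list for `wget -i`.
# BASE_URL is a placeholder for the masked host in the article (assumption).
BASE_URL="${BASE_URL:-http://xxxxx/}"

# build_url_list NAMES_FILE URL_FILE  (hypothetical helper name)
# Prefix the first field of every non-empty line of NAMES_FILE with BASE_URL
# and write the result to URL_FILE.
build_url_list() {
    awk -v base="$BASE_URL" 'NF { print base $1 }' "$1" > "$2"
}
```

Typical use would be `build_url_list 50pdf.log download.log` followed by `wget -c -P pdfs -i download.log`. Here `-c` resumes partially downloaded files and `-P` sets the target directory (`pdfs` is just an example name); both are standard wget options.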

While the download is running, open another terminal window and check whether the same wget process is handling all the files.

[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11752  9933  0 16:12 pts/1    00:00:00 wget -i download.log
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11752  9933  0 16:12 pts/1    00:00:00 wget -i download.log
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11752  9933  0 16:12 pts/1    00:00:00 wget -i download.log
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11752  9933  0 16:12 pts/1    00:00:00 wget -i download.log
[[email protected] ~]# ps -ef|grep -v grep|grep wget
[[email protected] ~]# 
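
The repeated manual check above can also be automated. The following is a rough sketch, assuming `ps -ef` style input in which the PID is the second column and the command name is the eighth; `count_wget_pids` is a hypothetical helper name. It counts the distinct PIDs whose command is `wget`, so a single-process download reports 1 no matter how many samples are taken.

```shell
#!/usr/bin/env bash
# count_wget_pids: read `ps -ef` lines on stdin and print the number of
# distinct PIDs (column 2) whose command name (column 8) is exactly "wget".
# Wrapper processes such as `sh wget_pdf.sh` are not counted.
count_wget_pids() {
    awk '$8 == "wget" && !seen[$2]++ { n++ } END { print n + 0 }'
}
```

One possible way to feed it is `while pgrep -x wget >/dev/null; do ps -ef; sleep 1; done | count_wget_pids`, which samples once a second until no wget process is left. Short-lived processes can slip between samples, so the result is a lower bound.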

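Option 2: put the URLs in a file and write a script that loops over it, handing one URL at a time to wget. The drawback is that every PDF starts a new wget process, which exits when that download finishes, and this repeats until the last URL. Each run pays the process startup cost again and, as the log below shows, resolves the host name and opens a new connection instead of reusing the existing one.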

[[email protected] tmp]# ls
50pdf.log  download.log  wget_pdf.sh
[[email protected] tmp]# cat wget_pdf.sh
#!/usr/bin/env bash
# Download every URL listed in download.log, one wget process per URL.
while IFS= read -r url; do
    wget "$url"
done < /root/tmp/download.log
[[email protected] tmp]# sh wget_pdf.sh 
--2017-09-05 16:24:06--  http://xxxxx/14788669468643331.pdf
Resolving nfs.htbaobao.com... 106.75.138.13
Connecting to nfs.htbaobao.com|106.75.138.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2601963 (2.5M) [application/pdf]
Saving to: “14788669468643331.pdf”

100%[========================================================================================================================================================================>] 2,601,963    230K/s   in 11s     

2017-09-05 16:24:17 (224 KB/s) - “14788669468643331.pdf” saved [2601963/2601963]
......................................................(middle part omitted)
--2017-09-05 16:25:21--  http://xxxxx/1481341338750833.pdf
Resolving nfs.htbaobao.com... 106.75.138.13
Connecting to nfs.htbaobao.com|106.75.138.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 152155 (149K) [application/pdf]
Saving to: “1481341338750833.pdf”

100%[========================================================================================================================================================================>] 152,155      184K/s   in 0.8s    

2017-09-05 16:25:22 (184 KB/s) - “1481341338750833.pdf” saved [152155/152155]

[[email protected] tmp]# ls
14788669468643331.pdf  1481187682278708.pdf  1481262534034760.pdf  1481266593232456.pdf  1481340827926207.pdf  1481340948842260.pdf  1481341049634040.pdf  1481341172815801.pdf  1481341307823881.pdf
1479035133045678.pdf   1481193562811982.pdf  1481262611307371.pdf  1481267034803389.pdf  1481340853666343.pdf  1481340973957872.pdf  1481341112979143.pdf  1481341185245978.pdf  1481341338750833.pdf
14799731544302441.pdf  1481247789582233.pdf  1481262623674903.pdf  1481270022285676.pdf  1481340897933322.pdf  1481341008561312.pdf  1481341130545646.pdf  1481341216517700.pdf  50pdf.log
14799944743125144.pdf  1481262178457017.pdf  1481262846773279.pdf  1481286012498927.pdf  1481340922434822.pdf  1481341008584230.pdf  1481341134346522.pdf  1481341229730723.pdf  download.log
1481034002739896.pdf   1481262229905206.pdf  1481265452669335.pdf  1481340787767089.pdf  1481340927135663.pdf  1481341022043499.pdf  1481341148759269.pdf  1481341244148718.pdf  wget_pdf.sh
1481095290513785.pdf   1481262241457479.pdf  1481265807661321.pdf  1481340826599027.pdf  1481340943094250.pdf  1481341045655154.pdf  1481341159027852.pdf  1481341261314587.pdf
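
If a per-URL loop is still wanted, for example to react to each failure individually, it helps to read the list line by line and record which URLs failed. A minimal sketch under assumptions: `download_each` and the `DOWNLOADER` variable are illustrative names, not part of wget. `DOWNLOADER` defaults to `wget` and can be swapped for another command for a dry run.

```shell
#!/usr/bin/env bash
# DOWNLOADER is the command run once per URL (assumption: defaults to wget).
DOWNLOADER="${DOWNLOADER:-wget}"

# download_each URL_FILE FAILED_FILE  (hypothetical helper)
# Runs $DOWNLOADER once per non-empty line of URL_FILE and writes the URLs
# whose download failed to FAILED_FILE. Returns non-zero if any failed.
download_each() {
    local url status=0
    : > "$2"                      # start with an empty failure list
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue # skip blank lines
        if ! "$DOWNLOADER" "$url" < /dev/null; then
            printf '%s\n' "$url" >> "$2"
            status=1
        fi
    done < "$1"
    return "$status"
}
```

For example, `download_each download.log failed.log` would leave the failed URLs in `failed.log`, which can then be retried with `wget -i failed.log`. This still starts one process per URL, so it shares the drawback described for option 2.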

While the download is running, open another terminal window and check whether the same wget process is handling all the files.

[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11780 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/14788669468643331.pdf
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11784 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/1479035133045678.pdf
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11784 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/1479035133045678.pdf
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11791 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/14799731544302441.pdf
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11791 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/14799731544302441.pdf
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11798 11778  0 16:24 pts/1    00:00:00 wget http://xxxxx/14799944743125144.pdf
[[email protected] ~]# ps -ef|grep -v grep|grep wget
root     11778  9933  0 16:24 pts/1    00:00:00 sh wget_pdf.sh
root     11846 11778  0 16:25 pts/1    00:00:00 wget http://xxxxx/1481341307823881.pdf

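Summary:

  1. With option 1, a single process downloads everything, and at the end wget reports how many files were downloaded, their total size, and related statistics.

  2. With option 2, every download starts a new wget process, so the system keeps creating and tearing down processes, and each process resolves the host and connects again.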
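
To put a rough number on the difference, the first and last timestamps in the two logs can be compared. A small sketch, assuming GNU `date` (the `-d` option is not available in every `date` implementation); `elapsed_seconds` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# elapsed_seconds START END: print the number of seconds between two
# timestamps such as "2017-09-05 16:12:52". Requires GNU date for -d.
elapsed_seconds() {
    local start end
    start=$(date -d "$1" +%s) || return 1
    end=$(date -d "$2" +%s) || return 1
    echo $(( end - start ))
}
```

With the timestamps shown above, `elapsed_seconds "2017-09-05 16:12:52" "2017-09-05 16:14:05"` gives 73 for option 1, matching the reported 1m 13s, and `elapsed_seconds "2017-09-05 16:24:06" "2017-09-05 16:25:22"` gives 76 for option 2. A single run like this is only an indication, not a benchmark.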
