
Collecting Baidu search result URLs with Python


Modules used: threading (multithreading), requests, and BeautifulSoup (parsing is done with the lxml parser, so lxml needs to be available as well).
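If the third-party packages are not already installed, they can typically be obtained with pip (assuming the standard package names):

pip install requests beautifulsoup4 lxml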

Functionality: the search keyword and the number of threads can be set from the command line, and the script collects the URLs behind Baidu's search results.

The code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date    : 2017-08-25 12:47:59
# @Author  : arong
# @Link    :
# @Version : $Id$

import requests, threading
from bs4 import BeautifulSoup as bs
import time, Queue
import sys

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}

class BaiduSpider(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        # keep pulling search-result-page URLs from the queue until it is empty
        while not self._queue.empty():
            url = self._queue.get()
            try:
                self.spider(url)
            except Exception, e:
                print e
                pass

    def spider(self, url):
        r = requests.get(url=url, headers=headers)
        soup = bs(r.content, 'lxml')
        # each result's display link is an <a> tag with class "c-showurl"
        result = soup.find_all(name='a', attrs={'class': 'c-showurl'})
        for url in result:
            url2 = url['href']
            # follow the Baidu redirect link to reach the real target URL
            r_get_url = requests.get(url=url2, headers=headers, timeout=8)
            if r_get_url.status_code == 200:
                url_tmp = r_get_url.url.split('/')
                # print only the host part of the final URL
                print url_tmp[2]

def main(keyword, thread_count):
    queue = Queue.Queue()
    # pn advances 10 per result page, so 0-40 covers the first five pages
    for i in range(0, 50, 10):
        queue.put('https://www.baidu.com/s?wd=%s&pn=%s' % (keyword, str(i)))
    threads = []
    thread_count = int(thread_count)
    for i in range(thread_count):
        threads.append(BaiduSpider(queue))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    if len(sys.argv) != 3:
        print 'usage: %s keyword thread_count' % sys.argv[0]
        sys.exit(1)
    else:
        main(sys.argv[1], sys.argv[2])
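For example, assuming the script above is saved as baidu_spider.py (the filename and keyword here are arbitrary), it could be run with a keyword and a thread count:

python baidu_spider.py sqlmap 5

This searches Baidu for "sqlmap" across the first five result pages using 5 worker threads and prints the host of each collected result URL.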

It still feels a bit slow; optimization will have to wait until I've learned some more, haha.
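One possible direction, kept here as a minimal sketch rather than a drop-in replacement: give each thread its own requests.Session so TCP connections are reused instead of being rebuilt for every request (the c-showurl parsing from spider() above would slot into run() unchanged):

import requests, threading, Queue

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}

class BaiduSpider(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue
        # one Session per thread: keeps connections alive between requests
        self._session = requests.Session()
        self._session.headers.update(headers)

    def run(self):
        while not self._queue.empty():
            url = self._queue.get()
            try:
                # the pooled session replaces the bare requests.get() calls
                r = self._session.get(url, timeout=8)
                print r.status_code, r.url
            except Exception, e:
                print e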
