Python Web Scraping Study Notes: Multiprocessing in Python

阿新 • Published: 2019-01-21

Creating multiple processes with the multiprocessing module:

import os
from multiprocessing import Process

# Code executed by each child process
def run_proc(name):
    print('Child process %s (%s) running...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    procs = []
    for i in range(5):
        p = Process(target=run_proc, args=(str(i),))
        print('Process will start.')
        p.start()
        procs.append(p)
    # Wait for every child, not only the last one started
    for p in procs:
        p.join()
    print('Process end.')


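Using the Pool class from the multiprocessing module as a process pool:

A Pool provides a specified number of worker processes for the caller to use; by default, the pool size is the number of CPU cores.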

from multiprocessing import Pool
import os, time, random

# Code executed by each task in the pool
def run_task(name):
    print('Task %s (pid=%s) is running...' % (name, os.getpid()))
    time.sleep(random.random() * 3)
    print('Task %s end.' % name)


if __name__ == '__main__':
    print('Current process %s.' % os.getpid())
    # Create a pool of 3 processes; p = Pool() would size the pool to the number of CPU cores
    p = Pool(processes=3)
    for i in range(5):
        p.apply_async(run_task, args=(i,))
    print('Waiting for all subprocesses done...')
    # close() stops new submissions; join() then waits for the queued tasks to finish
    p.close()
    p.join()
    print('All subprocesses done.')
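
The example above only prints from inside the workers, so the parent never sees a return value, and an exception raised in a task would go unnoticed. A minimal sketch of one way to collect results follows. It assumes a variant of run_task that returns a value, and run_all is a hypothetical helper name that does not appear in the original notes.

```python
import os
from multiprocessing import Pool


def run_task(name):
    """Worker (assumed variant): return the task name and the pid that handled it."""
    return name, os.getpid()


def run_all(task_count, pool_size=3):
    """Hypothetical helper: run tasks in a pool and return results in submission order."""
    with Pool(processes=pool_size) as pool:
        # apply_async returns an AsyncResult; keep it so the result can be read later
        pending = [pool.apply_async(run_task, args=(i,)) for i in range(task_count)]
        # get() blocks until the task finishes and re-raises any exception from the worker
        return [r.get(timeout=10) for r in pending]


if __name__ == '__main__':
    for name, pid in run_all(5):
        print('Task %s was handled by pid %s' % (name, pid))
```

With a pool of 3 and 5 tasks, some pids should appear more than once in the printed lines, since the workers are reused across tasks.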


Inter-process communication:

Communicating through a Queue

from multiprocessing import Process, Queue
import os, time, random

# Code executed by the writer processes
def proc_write(q, urls):
    print('Process (%s) is writing...' % os.getpid())
    for url in urls:
        q.put(url)
        print('Put %s to queue...' % url)
        time.sleep(random.random())

# Code executed by the reader process
def proc_read(q):
    print('Process (%s) is reading...' % os.getpid())
    while True:
        # Block until an item is available
        url = q.get(True)
        print('Get %s from queue.' % url)

if __name__ == '__main__':
    # The parent process creates the Queue and passes it to each child process
    q = Queue()
    proc_writer1 = Process(target=proc_write, args=(q, ['url_1', 'url_2', 'url_3']))
    proc_writer2 = Process(target=proc_write, args=(q, ['url_4', 'url_5', 'url_6']))
    proc_reader = Process(target=proc_read, args=(q,))
    # Start the writer child processes
    proc_writer1.start()
    proc_writer2.start()
    # Start the reader child process
    proc_reader.start()
    # Wait for the writers to finish
    proc_writer1.join()
    proc_writer2.join()
    # proc_read loops forever, so it cannot be joined; it has to be terminated forcibly
    proc_reader.terminate()
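
Calling terminate() right after the writers finish may stop the reader before it has read the last few items. A common alternative is to send an end-of-stream marker and let the reader exit on its own. The sketch below is one way to do that under stated assumptions: SENTINEL is a hypothetical marker introduced here, and proc_read is changed to return the URLs it saw.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical end-of-stream marker; assumes None is never a real URL


def proc_write(q, urls):
    """Put every URL on the queue."""
    for url in urls:
        q.put(url)


def proc_read(q):
    """Read until the sentinel arrives, then return the URLs that were seen."""
    seen = []
    while True:
        url = q.get()
        if url is SENTINEL:
            break
        print('Get %s from queue.' % url)
        seen.append(url)
    return seen


if __name__ == '__main__':
    q = Queue()
    writers = [
        Process(target=proc_write, args=(q, ['url_1', 'url_2', 'url_3'])),
        Process(target=proc_write, args=(q, ['url_4', 'url_5', 'url_6'])),
    ]
    reader = Process(target=proc_read, args=(q,))
    for p in writers + [reader]:
        p.start()
    for p in writers:
        p.join()
    # Every writer has exited, so the sentinel is the last item put on the queue
    q.put(SENTINEL)
    # The reader leaves its loop on the sentinel, so join() returns normally
    reader.join()
```

With more than one reader, one sentinel per reader would be needed, since each reader consumes the marker it receives.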

