Processes vs. threads, object-oriented programming addendum, processes, data sharing, locks, process pools, and crawler modules (requests, bs4 (BeautifulSoup))
阿新 • Published: 2018-09-12
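1. What is the difference between a process and a thread?
First:
A process is the smallest unit of resource allocation by the operating system.
A thread is the smallest unit of CPU scheduling.
Second:
A process can contain multiple threads.
Third:
Python (specifically CPython) differs from other languages here because it has the GIL (global interpreter lock).
The GIL ensures that, within one process, only one thread executes Python bytecode at any given moment.
I/O-bound work can use multiple threads; CPU-bound work should use multiple processes.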
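To make the I/O-bound case concrete, here is a minimal sketch. It assumes `time.sleep` as a stand-in for a blocking network call, and the names `fake_io` and `run_io_bound` are made up for illustration. A sleeping thread releases the GIL, so four 0.2-second waits overlap instead of running back to back, and every thread reports the same process ID.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(n):
    """Stand-in for a blocking I/O call (assumption: sleep models the wait)."""
    time.sleep(0.2)
    return n, os.getpid(), threading.get_ident()


def run_io_bound(count=4):
    """Run `count` fake I/O calls in a thread pool and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(pool.map(fake_io, range(count)))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == '__main__':
    results, elapsed = run_io_bound()
    for n, pid, thread_id in results:
        print(n, pid, thread_id)   # same pid on every line: one process, several threads
    print('elapsed: %.2fs' % elapsed)
```

For CPU-bound work the same pattern would not help, because the threads would take turns holding the GIL; that is the case where separate processes pay off.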
2. Object-oriented programming addendum:
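    class Foo(object):
        def __init__(self):
            # Call object.__setattr__ directly: this is what attribute
            # assignment does under the hood on an inherited object, and it
            # bypasses the __setattr__ defined below.
            object.__setattr__(self, 'info', {})

        def __setattr__(self, key, value):
            # Intercepts every attribute assignment on the instance.
            self.info[key] = value

        def __getattr__(self, item):
            # Intercepts dotted access, but only for names that normal lookup
            # (the instance and its inheritance tree) cannot find. The
            # attribute name is passed in as a string.
            print(item)  # name
            return self.info[item]

    obj = Foo()
    obj.name = 'nacho'
    print(obj.name)  # prints "name" (from __getattr__), then "nacho"
    print(obj.info)  # {'name': 'nacho'}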
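Two details of that class are easy to miss, so here is a small sketch of each. The classes `Broken` and `Recorder` are hypothetical and exist only for this illustration. `Broken` shows why `__init__` has to go through `object.__setattr__`: a plain `self.info = {}` is routed into the overridden `__setattr__`, which needs `self.info` before it exists. `Recorder` shows that `__getattr__` runs only when normal lookup fails.

```python
class Broken:
    """Like Foo, but __init__ uses a plain assignment (hypothetical example)."""

    def __init__(self):
        self.info = {}            # routed through our own __setattr__

    def __setattr__(self, key, value):
        self.info[key] = value    # 'info' is not set yet, so __getattr__ runs

    def __getattr__(self, item):
        return self.info[item]    # looks up 'info' again: endless recursion


class Recorder:
    """Records which attribute names actually reach __getattr__."""

    kind = 'found on the class'

    def __init__(self):
        self.missed = []          # no __setattr__ override, so this is safe

    def __getattr__(self, name):
        self.missed.append(name)  # reached only when normal lookup fails
        return None


def broken_raises():
    """Return True if constructing Broken ends in a RecursionError."""
    try:
        Broken()
    except RecursionError:
        return True
    return False


if __name__ == '__main__':
    print(broken_raises())
    r = Recorder()
    print(r.kind)      # found by normal lookup, __getattr__ is not called
    print(r.unknown)   # not found anywhere, so __getattr__ handles it
    print(r.missed)
```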
3. Processes
- Data is not shared between processes
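    import multiprocessing

    data_list = []

    def task(arg):
        data_list.append(arg)
        print(data_list)  # each child sees only its own copy of the list

    def run():
        for i in range(10):
            p = multiprocessing.Process(target=task, args=(i,))
            # p = threading.Thread(target=task, args=(i,))  # threads would share data_list
            p.start()

    if __name__ == '__main__':  # required on Windows 10; not needed on Linux when processes are forked
        run()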
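The contrast with threads can be checked directly. The sketch below is an illustration only; the helper `run_with` and the list name `shared` are made up here. It runs the same append function once in threads and once in processes, then reports what the parent sees afterwards.

```python
import multiprocessing
import threading

shared = []


def append_item(value):
    shared.append(value)


def run_with(worker_cls, count=3):
    """Start `count` workers, wait for them, return the parent's view of `shared`."""
    shared.clear()
    workers = [worker_cls(target=append_item, args=(i,)) for i in range(count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(shared)


if __name__ == '__main__':
    print(run_with(threading.Thread))         # [0, 1, 2]: threads share the list
    print(run_with(multiprocessing.Process))  # []: each child changed its own copy
```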
- Common features:
    - join
    - daemon
    - name
    - multiprocessing.current_process()
    - multiprocessing.current_process().ident/pid
- Creating a process by subclassing
    import multiprocessing

    class MyProcess(multiprocessing.Process):
        def run(self):
            # run() is what executes in the child after start()
            print('current process', multiprocessing.current_process())

    def run():
        p1 = MyProcess()
        p1.start()
        p2 = MyProcess()
        p2.start()

    if __name__ == '__main__':
        run()
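The common features listed above can be exercised in a few lines. This is a sketch under assumptions: the process name `worker-1` and the functions `report` and `demo` are invented for the example.

```python
import multiprocessing
import time


def report(delay):
    """Runs in the child: look up the current process and print its details."""
    time.sleep(delay)
    me = multiprocessing.current_process()
    print(me.name, me.pid)


def demo():
    p = multiprocessing.Process(target=report, args=(0.1,), name='worker-1')
    p.daemon = True        # must be set before start(); daemon children are
                           # terminated when the parent process exits
    p.start()
    same_id = (p.pid == p.ident)   # ident is an alias for pid
    p.join()               # block until the child has finished
    return p.name, same_id, p.exitcode, p.is_alive()


if __name__ == '__main__':
    print(demo())
```

Without the `join()` call, a daemon child like this one could be killed before it prints anything, because the parent would reach the end of the script first.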
4. Sharing data between processes
Queue:
Linux:
    import multiprocessing

    q = multiprocessing.Queue()

    def task(arg, q):
        q.put(arg)

    def run():
        for i in range(10):
            p = multiprocessing.Process(target=task, args=(i, q,))
            p.start()
        for _ in range(10):  # one get() per put(); an unbounded loop would block forever on the empty queue
            v = q.get()
            print(v)

    run()

Windows:
    import multiprocessing

    def task(arg, q):
        q.put(arg)

    if __name__ == '__main__':
        q = multiprocessing.Queue()
        for i in range(10):
            p = multiprocessing.Process(target=task, args=(i, q,))
            p.start()
        for _ in range(10):  # one get() per put()
            v = q.get()
            print(v)

Manager: (*)
Linux:
    import multiprocessing

    m = multiprocessing.Manager()
    dic = m.dict()

    def task(arg):
        dic[arg] = 100

    def run():
        for i in range(10):
            p = multiprocessing.Process(target=task, args=(i,))
            p.start()
        input('>>>')  # keeps the parent, and with it the manager, alive until the children finish
        print(dic.values())

    if __name__ == '__main__':
        run()

Windows:
    import time
    import multiprocessing

    def task(arg, dic):
        time.sleep(2)
        dic[arg] = 100

    if __name__ == '__main__':
        m = multiprocessing.Manager()
        dic = m.dict()
        process_list = []
        for i in range(10):
            p = multiprocessing.Process(target=task, args=(i, dic,))
            p.start()
            process_list.append(p)
        while True:  # poll until every child has exited
            count = 0
            for p in process_list:
                if not p.is_alive():
                    count += 1
            if count == len(process_list):
                break
        print(dic)
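As an alternative to polling `is_alive()` or waiting on `input()`, the parent can simply `join()` each child before reading the shared dictionary. The following is a sketch of that variant, not the author's code; `record_square` and `collect_squares` are hypothetical names. The manager proxy is passed as an argument, so the same script should work whether child processes are forked or spawned.

```python
import multiprocessing


def record_square(n, results):
    """Runs in a child: write one entry into the manager-backed dict."""
    results[n] = n * n


def collect_squares(count=5):
    with multiprocessing.Manager() as manager:
        results = manager.dict()
        procs = [
            multiprocessing.Process(target=record_square, args=(n, results))
            for n in range(count)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()           # wait for every child before reading
        # Copy into a plain dict while the manager is still running.
        return dict(results)


if __name__ == '__main__':
    print(collect_squares())   # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16} (key order may vary)
```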
5. Process locks
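    import time
    import multiprocessing

    def task(arg, lock):
        print('here they come')
        with lock:  # only one process at a time runs this block
            time.sleep(2)
            print(arg)

    if __name__ == '__main__':
        # Create the lock in the parent and pass it to each child. A
        # module-level lock is re-created in every child when processes are
        # spawned (the default on Windows), so it would not be shared.
        lock = multiprocessing.RLock()
        p1 = multiprocessing.Process(target=task, args=(1, lock))
        p1.start()
        p2 = multiprocessing.Process(target=task, args=(2, lock))
        p2.start()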
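A lock matters most when several processes update the same value. The sketch below assumes a shared integer created with `multiprocessing.Value` (not used in the original notes) and made-up function names. Each worker increments the counter under the lock, so no update is lost.

```python
import multiprocessing


def add_many(counter, lock, times):
    """Runs in a child: increment the shared counter `times` times."""
    for _ in range(times):
        with lock:                 # released even if the body raises
            counter.value += 1     # read-modify-write, unsafe without the lock


def count_with_lock(workers=4, times=1000):
    lock = multiprocessing.RLock()
    # lock=False gives a raw shared int, so our own lock is the only protection.
    counter = multiprocessing.Value('i', 0, lock=False)
    procs = [
        multiprocessing.Process(target=add_many, args=(counter, lock, times))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == '__main__':
    print(count_with_lock())   # 4000
```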
6. Process pools
    import time
    from concurrent.futures import ProcessPoolExecutor

    def task(arg):
        time.sleep(2)
        print(arg)

    if __name__ == '__main__':
        pool = ProcessPoolExecutor(6)  # pool size; choose it based on the number of CPU cores
        for i in range(10):
            pool.submit(task, i)
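`submit` returns a `Future`, which is how a task's return value gets back to the parent. Here is a minimal sketch, with `square` and `squares` as illustrative names only, that collects the results in submission order.

```python
from concurrent.futures import ProcessPoolExecutor


def square(n):
    return n * n


def squares(numbers, workers=2):
    """Compute n * n for each number in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(square, n) for n in numbers]
        # result() blocks until that task is done and re-raises any
        # exception that occurred in the worker.
        return [f.result() for f in futures]


if __name__ == '__main__':
    print(squares(range(5)))   # [0, 1, 4, 9, 16]
```

Using the pool as a context manager also waits for all submitted tasks and shuts the workers down on exit.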
7. Web crawler:
Example:
    import requests
    from bs4 import BeautifulSoup
    from concurrent.futures import ThreadPoolExecutor

    # Simulate a browser sending a request.
    # Internally, requests creates a socket: sk = socket.socket()
    # and opens a socket connection to Chouti: sk.connect(...)
    # sk.sendall('...')
    # sk.recv(...)

    def task(url):
        print(url)
        r1 = requests.get(
            url=url,
            headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36'
            }
        )
        # Inspect the downloaded text
        soup = BeautifulSoup(r1.text, 'html.parser')
        print(soup.text)
        # content_list = soup.find('div', attrs={'id': 'content-list'})
        # for item in content_list.find_all('div', attrs={'class': 'item'}):
        #     title = item.find('a').text.strip()
        #     target_url = item.find('a').get('href')
        #     print(title, target_url)

    def run():
        pool = ThreadPoolExecutor(5)
        for i in range(1, 50):
            pool.submit(task, 'https://dig.chouti.com/all/hot/recent/%s' % i)

    if __name__ == '__main__':
        run()
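One thing the example above does not show: an exception raised inside a task submitted to a pool stays hidden unless the future's `result()` is read. The sketch below is one possible way to collect pages and failures separately. The function `crawl` and its `fetch` parameter are assumptions made for this illustration; `fetch` can be any callable that takes a URL and returns text.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl(urls, fetch, max_workers=5):
    """Run fetch(url) in a thread pool; return (pages, errors) keyed by URL."""
    pages, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_url = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                pages[url] = future.result()   # re-raises the task's exception
            except Exception as exc:
                errors[url] = exc
    return pages, errors


if __name__ == '__main__':
    def fake_fetch(url):
        # Stand-in for a real download so the sketch runs without a network.
        if url.endswith('/2'):
            raise ValueError('simulated failure')
        return 'page body for ' + url

    urls = ['https://dig.chouti.com/all/hot/recent/%s' % i for i in range(1, 4)]
    pages, errors = crawl(urls, fake_fetch)
    print(sorted(pages))
    print(errors)
```

With the requests library, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`.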