
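Differences between processes and threads, object-oriented programming supplement, processes, data sharing, locks, process pools, crawler modules (requests, bs4 (BeautifulSoup))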


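1. What are the differences between processes and threads?
First:
A process is the smallest unit of resource allocation.
A thread is the smallest unit of CPU scheduling.
Second:
A process can contain multiple threads.
Third:
Python (specifically CPython) differs from other languages here because it has the GIL (global interpreter lock).
The GIL ensures that, within one process, only one thread executes Python bytecode at any given moment.

Use multithreading for I/O-bound work, because a thread releases the GIL while it waits on I/O. Use multiprocessing for CPU-bound work, because each process has its own interpreter and its own GIL.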
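
The rule of thumb above can be sketched as follows. This is a minimal illustration rather than a benchmark: `fake_io`, `count_primes`, `run_io_in_threads`, and the timing values are names and numbers assumed for the example, with `time.sleep` standing in for a blocking I/O call.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_io_in_threads(task_count, delay):
    """Run `task_count` sleeping tasks in threads and return the elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=task_count) as pool:
        list(pool.map(fake_io, [delay] * task_count))
    return time.perf_counter() - start


if __name__ == '__main__':
    # I/O-bound: the four 0.2 s waits overlap, so the total should be
    # well under the 0.8 s a sequential run would need.
    print('threads, I/O-bound: %.2fs' % run_io_in_threads(4, 0.2))

    # CPU-bound: each worker process has its own interpreter and GIL,
    # so the two computations can run on separate cores.
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(list(pool.map(count_primes, [20000, 20000])))
```

Running `count_primes` in threads instead would still give correct results, but the threads would take turns holding the GIL rather than running in parallel.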

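2. Object-oriented programming supplement:

class Foo(object):

    def __init__(self):
        # Call object.__setattr__ directly so that creating `info` bypasses
        # the overridden __setattr__ below, which itself relies on self.info.
        object.__setattr__(self, 'info', {})

    def __setattr__(self, key, value):
        # Intercepts every attribute assignment on the instance.
        self.info[key] = value

    def __getattr__(self, item):
        # Intercepts the dot operator, but only when normal lookup fails:
        # if the attribute is found on the instance or anywhere in the
        # inheritance tree, this method is not called. The attribute name
        # is passed in as a string.
        print(item)             # name
        return self.info[item]

obj = Foo()
obj.name = 'nacho'
print(obj.name)     # prints name (from __getattr__), then nacho
print(obj.info)     # {'name': 'nacho'}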
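
One consequence of the class above is that looking up a name that was never assigned raises `KeyError`, which `hasattr()` and `getattr()` with a default do not catch. A possible variant, sketched here under the hypothetical name `AttrStore`, converts the missing key into `AttributeError` so those built-ins behave as usual.

```python
class AttrStore(object):
    """Variant of Foo that reports missing names as AttributeError."""

    def __init__(self):
        # Bypass the overridden __setattr__ when creating the backing dict.
        object.__setattr__(self, 'info', {})

    def __setattr__(self, key, value):
        self.info[key] = value

    def __getattr__(self, item):
        # Only reached when normal attribute lookup has already failed.
        try:
            return self.info[item]
        except KeyError:
            raise AttributeError(item) from None


store = AttrStore()
store.name = 'nacho'
print(store.name)                 # nacho
print(hasattr(store, 'age'))      # False
print(getattr(store, 'age', 18))  # 18
```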

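3. Processes
- Processes do not share data

import multiprocessing

data_list = []

def task(arg):
    # Each child process appends to its own copy of data_list.
    data_list.append(arg)
    print(data_list)    # every process prints a one-element list

def run():
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i,))
        # p = threading.Thread(target=task, args=(i,))   # threads would share data_list
        p.start()

if __name__ == '__main__':      # required with the spawn start method (Windows); optional with fork on Linux
    run()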
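
To make the isolation visible from the parent's side, the sketch below waits for the children and then prints the parent's own list. It is a reduced version of the example above, with three children instead of ten.

```python
import multiprocessing

data_list = []


def task(arg):
    # Runs in the child: this appends to the child's own copy of data_list.
    data_list.append(arg)


if __name__ == '__main__':
    processes = [
        multiprocessing.Process(target=task, args=(i,)) for i in range(3)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    # The children's appends never reach the parent's list.
    print(data_list)    # []
```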

- Commonly used features:
- join
- daemon
- name
- multiprocessing.current_process()
- multiprocessing.current_process().ident/pid
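
The features listed above can be seen together in a short sketch. The function name `worker`, the process name `worker-1`, and the 0.5 second delay are assumptions chosen for the example.

```python
import multiprocessing
import time


def worker(seconds):
    # current_process() returns the Process object for the running process.
    current = multiprocessing.current_process()
    print('running in', current.name, 'pid', current.pid)
    time.sleep(seconds)


if __name__ == '__main__':
    p = multiprocessing.Process(target=worker, args=(0.5,), name='worker-1')

    # daemon must be set before start(); daemonic children are terminated
    # when the parent process exits.
    p.daemon = True
    p.start()

    # ident and pid both hold the child's process ID once it has started.
    print(p.name, p.pid, p.ident)

    # join() blocks until the child has finished.
    p.join()
    print(p.is_alive(), p.exitcode)
```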

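- Creating a process by subclassing Process

import multiprocessing

class MyProcess(multiprocessing.Process):

    def run(self):
        # start() invokes this method in the new process.
        print('current process', multiprocessing.current_process())


def run():
    p1 = MyProcess()
    p1.start()

    p2 = MyProcess()
    p2.start()

if __name__ == '__main__':
    run()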

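4. Sharing data between processes

Queue:
    Linux (fork start method):
        import multiprocessing

        q = multiprocessing.Queue()

        def task(arg, q):
            q.put(arg)

        def run():
            for i in range(10):
                p = multiprocessing.Process(target=task, args=(i, q))
                p.start()

            # Read exactly as many items as were put. An unbounded
            # `while True` loop would block forever on the 11th get().
            for _ in range(10):
                v = q.get()
                print(v)

        run()
    Windows (spawn start method):
        import multiprocessing

        def task(arg, q):
            q.put(arg)

        if __name__ == '__main__':
            q = multiprocessing.Queue()
            for i in range(10):
                p = multiprocessing.Process(target=task, args=(i, q))
                p.start()
            for _ in range(10):
                v = q.get()
                print(v)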

Manager: (*)
    Linux (fork start method):
        import multiprocessing

        m = multiprocessing.Manager()
        dic = m.dict()

        def task(arg):
            dic[arg] = 100

        def run():
            for i in range(10):
                p = multiprocessing.Process(target=task, args=(i,))
                p.start()

            input('>>>')    # wait for Enter so the children have time to finish
            print(dic.values())

        if __name__ == '__main__':
            run()

    Windows (spawn start method):
        import time
        import multiprocessing

        def task(arg, dic):
            time.sleep(2)
            dic[arg] = 100

        if __name__ == '__main__':
            m = multiprocessing.Manager()
            dic = m.dict()

            process_list = []
            for i in range(10):
                p = multiprocessing.Process(target=task, args=(i, dic))
                p.start()

                process_list.append(p)

            # Poll until every child process has exited.
            while True:
                count = 0
                for p in process_list:
                    if not p.is_alive():
                        count += 1
                if count == len(process_list):
                    break
            print(dic)
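
The polling loop in the Windows variant can be replaced by `join()`, which blocks until a child exits without spinning the CPU. The following is a sketch of that alternative; the helper name `collect` is hypothetical, and using the manager as a context manager is one possible way to shut it down cleanly.

```python
import multiprocessing


def task(arg, dic):
    dic[arg] = 100


def collect(worker_count):
    """Start `worker_count` children and return what they wrote."""
    with multiprocessing.Manager() as m:
        dic = m.dict()
        processes = [
            multiprocessing.Process(target=task, args=(i, dic))
            for i in range(worker_count)
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()        # wait for each child instead of polling is_alive()
        # Copy into a plain dict before the manager shuts down.
        return dict(dic)


if __name__ == '__main__':
    print(collect(10))
```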

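5. Process locks

import time
import multiprocessing


def task(arg, lock):
    print('task started')
    lock.acquire()
    time.sleep(2)
    print(arg)
    lock.release()

if __name__ == '__main__':
    # Create the lock once in the parent and pass it to each child. Under the
    # spawn start method (Windows) a module-level lock is re-created in every
    # child, so it would not synchronize anything.
    lock = multiprocessing.RLock()

    p1 = multiprocessing.Process(target=task, args=(1, lock))
    p1.start()

    p2 = multiprocessing.Process(target=task, args=(2, lock))
    p2.start()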
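
The acquire/release pair can also be written as a `with` statement, which releases the lock even if the body raises an exception. A minimal sketch, assuming the lock is passed to each child as an argument and using a shorter sleep than the example above:

```python
import multiprocessing
import time


def task(arg, lock):
    # The lock is acquired on entry and released on exit from the block,
    # including when the block exits through an exception.
    with lock:
        time.sleep(0.1)
        print(arg)


if __name__ == '__main__':
    lock = multiprocessing.Lock()
    processes = [
        multiprocessing.Process(target=task, args=(i, lock)) for i in (1, 2)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```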

6. Process pools

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def task(arg):
    time.sleep(2)
    print(arg)

if __name__ == '__main__':

    pool = ProcessPoolExecutor(6)      # pool size; usually chosen based on the number of CPU cores
    for i in range(10):
        pool.submit(task, i)
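
The example above discards whatever `task` returns. When the results are needed, one option is `pool.map`, which returns them in input order, combined with a `with` block that waits for the workers and shuts the pool down. The names `square` and `run` are assumptions made for this sketch.

```python
from concurrent.futures import ProcessPoolExecutor


def square(n):
    return n * n


def run(values, workers=None):
    # workers=None lets the executor pick a default based on the machine's
    # CPU count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(square, values))


if __name__ == '__main__':
    print(run(range(10)))   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```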


7. Crawler:
Example:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


# Simulate a browser sending a request.
# Internally, requests creates a socket: sk = socket.socket()
# connects it to Chouti: sk.connect(...)
# sk.sendall('...')
# sk.recv(...)

def task(url):
    print(url)
    r1 = requests.get(
        url=url,
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36'
        }
    )

    # Inspect the downloaded text
    soup = BeautifulSoup(r1.text, 'html.parser')
    print(soup.text)
    # content_list = soup.find('div', attrs={'id': 'content-list'})
    # for item in content_list.find_all('div', attrs={'class': 'item'}):
    #     title = item.find('a').text.strip()
    #     target_url = item.find('a').get('href')
    #     print(title, target_url)

def run():
    pool = ThreadPoolExecutor(5)
    for i in range(1, 50):      # pages 1 to 49
        pool.submit(task, 'https://dig.chouti.com/all/hot/recent/%s' % i)


if __name__ == '__main__':
    run()
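
With `pool.submit`, an exception raised inside `task` (a network error, for example) is stored on the returned future and never printed unless the future is inspected. The sketch below shows one way to collect both results and errors with `as_completed`. To keep it runnable offline, it uses a hypothetical stand-in `fake_fetch` in place of `requests.get`; `crawl` and its parameters are likewise assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl(urls, fetch, max_workers=5):
    """Fetch every URL in a thread pool; return (results, errors) by URL."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_url = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # result() re-raises any exception from the worker thread.
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors


def fake_fetch(url):
    """Hypothetical offline stand-in for requests.get(url).text."""
    if url.endswith('/3'):
        raise RuntimeError('simulated network failure')
    return '<html>%s</html>' % url


if __name__ == '__main__':
    urls = ['https://dig.chouti.com/all/hot/recent/%s' % i for i in range(1, 5)]
    results, errors = crawl(urls, fake_fetch)
    print(sorted(results))
    print(errors)
```

In a real crawler, `fetch` would wrap the `requests.get` call from the example above, and the `errors` dict would show which pages need a retry.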
