
# How to Execute Multiple Web Requests in Parallel with Python 3.5 (Without aiohttp)

> The author's production environment had just been upgraded from 2.6 to 3.5.0, which still falls short of aiohttp's minimum version requirement. Hence this article: how to restructure your code to take full advantage of the asynchronous features that asyncio provides in Python 3.5. Original article link


Our IT department recently finished upgrading the distributed Python version in our work environment to 3.5.0. Coming from 2.6, this is a huge upgrade, but it still leaves something to be desired: 3.5.0 does not satisfy the minimum version requirements of some libraries, including aiohttp.

Despite this limitation, I still needed to write a script that fetched hundreds of csv files from our API and then processed the data. Python is not event-driven and natively asynchronous the way NodeJS is, but that doesn't prevent Python from achieving the same thing. This article details how I learned to perform operations asynchronously, and lays out the advantages of doing so.

Disclaimer: if you are on a newer version (3.5.2+), I strongly recommend using aiohttp instead. It is a very robust library, particularly well suited to this kind of problem, and there are plenty of tutorials for it online.
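For readers who do meet that requirement, a minimal aiohttp sketch of the same idea might look like the following (the URLs here are placeholders, not the files used later in this article):

import asyncio
import aiohttp

async def fetch_one(session, url):
    # aiohttp performs the request without blocking the event loop
    async with session.get(url) as response:
        return await response.text()

async def fetch_all():
    async with aiohttp.ClientSession() as session:
        # Fire off both requests concurrently and wait for all the bodies
        return await asyncio.gather(
            fetch_one(session, "https://example.com/a.csv"),
            fetch_one(session, "https://example.com/b.csv"),
        )

loop = asyncio.get_event_loop()
results = loop.run_until_complete(fetch_all())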

Assumptions

This article assumes that you:

> * Are familiar with Python and its syntax
> * Are familiar with basic web requests
> * Understand the concept of asynchronous execution

Getting Started

Install requests:

$ python -m pip install requests

If you don't have the permissions for that, install it for your user instead:

$ python -m pip install requests --user

The Wrong Way: Synchronous Requests

To show off the benefits of parallelism, let's first look at the synchronous approach. I'll briefly describe what the code is going to do: execute a GET request that fetches a csv file, and measure how long it takes to read the text inside.

We will download multiple csv files from this address (https://people.sc.fsu.edu/~jburkardt/data/csv/), which hosts plenty of sample data.

To be clear, we will use the Session object from the requests library to execute our GET requests.
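A Session carries configuration that applies to every request it makes, which is why the code below notes that headers or authentication should be set on it before the fetch loop starts. A small sketch (the header value is just an example):

import requests

with requests.Session() as session:
    # Anything set on the session is sent with every request it makes
    session.headers.update({"User-Agent": "csv-fetcher-example"})
    # session.auth = ("user", "pass")  # authentication works the same way
    response = session.get("https://people.sc.fsu.edu/~jburkardt/data/csv/cities.csv")
    print(response.status_code)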

First, we need a function to execute the web request:

def fetch(session, csv):
    base_url = "https://people.sc.fsu.edu/~jburkardt/data/csv/"
    with session.get(base_url + csv) as response:
        data = response.text
        if response.status_code != 200:
            print("FAILURE::{0}".format(url))
        # Return .csv data for future consumption
        return data

This function takes a Session object and a csv name, executes the web request, and returns the text content of the response.
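As a quick sanity check, you could call it on a single file by hand:

import requests

with requests.Session() as session:
    # Download one file and show its first two lines
    csv_data = fetch(session, "cities.csv")
    print("\n".join(csv_data.splitlines()[:2]))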

Next, we need a function that iterates over the list of files, requests each one, and tracks how long the requests take.

import requests
from timeit import default_timer

def get_data_synchronous():
    csvs_to_fetch = [
        "ford_escort.csv",
        "cities.csv",
        "hw_25000.csv",
        "mlb_teams_2012.csv",
        "nile.csv",
        "homes.csv",
        "hooke.csv",
        "lead_shot.csv",
        "news_decline.csv",
        "snakes_count_10000.csv",
        "trees.csv",
        "zillow.csv"
    ]

    with requests.Session() as session:
        print("{0:<30} {1:>20}".format("File", "Completed at"))
        
        # Set any session parameters here before calling `fetch`
        # For instance, if you needed to set Headers or Authentication
        # this can be done before starting the loop
        
        total_start_time = default_timer()
        for csv in csvs_to_fetch:
            fetch(session, csv)
            elapsed = default_timer() - total_start_time
            time_completed_at = "{:5.2f}s".format(elapsed)
            print("{0:<30} {1:>20}".format(csv, time_completed_at))

This function creates a Session object and then iterates over each file in csvs_to_fetch. As each fetch completes, it computes the time elapsed since the loop started and prints it in a readable format.

Finally, the main function call:

def main():
    # Simple for now
    get_data_synchronous()

main()

The complete synchronous code:


import requests
from timeit import default_timer

def fetch(session, csv):
    base_url = "https://people.sc.fsu.edu/~jburkardt/data/csv/"
    with session.get(base_url + csv) as response:
        data = response.text
        if response.status_code != 200:
            print("FAILURE::{0}".format(url))
        # Return .csv data for future consumption
        return data

def get_data_synchronous():
    csvs_to_fetch = [
        "ford_escort.csv",
        "cities.csv",
        "hw_25000.csv",
        "mlb_teams_2012.csv",
        "nile.csv",
        "homes.csv",
        "hooke.csv",
        "lead_shot.csv",
        "news_decline.csv",
        "snakes_count_10000.csv",
        "trees.csv",
        "zillow.csv"
    ]

    with requests.Session() as session:
        print("{0:<30} {1:>20}".format("File", "Completed at"))
        
        # Set any session parameters here before calling `fetch`
        # For instance, if you needed to set Headers or Authentication
        # this can be done before starting the loop
        
        total_start_time = default_timer()
        for csv in csvs_to_fetch:
            fetch(session, csv)
            elapsed = default_timer() - total_start_time
            time_completed_at = "{:5.2f}s".format(elapsed)
            print("{0:<30} {1:>20}".format(csv, time_completed_at))

def main():
    # Simple for now
    get_data_synchronous()

main()

Results:

Synchronous run. Notice that each request must complete before the next one starts.

Thanks to Python 3's asyncio, we can improve performance dramatically.

The Right Way: Multiple Asynchronous Requests at Once

To make this work, we first have to rework the existing code, starting with fetch:

import requests
from timeit import default_timer

# We'll need access to this variable later
START_TIME = default_timer()

def fetch(session, csv):
    base_url = "https://people.sc.fsu.edu/~jburkardt/data/csv/"
    with session.get(base_url + csv) as response:
        data = response.text
        if response.status_code != 200:
            print("FAILURE::{0}".format(url))
        # Now we will print how long it took to complete the operation from the 
        # `fetch` function itself
        elapsed = default_timer() - START_TIME
        time_completed_at = "{:5.2f}s".format(elapsed)
        print("{0:<30} {1:>20}".format(csv, time_completed_at))

        return data

Next, rework get_data into an asynchronous function:

import asyncio
from timeit import default_timer
from concurrent.futures import ThreadPoolExecutor

async def get_data_asynchronous():
    csvs_to_fetch = [
        "ford_escort.csv",
        "cities.csv",
        "hw_25000.csv",
        "mlb_teams_2012.csv",
        "nile.csv",
        "homes.csv",
        "hooke.csv",
        "lead_shot.csv",
        "news_decline.csv",
        "snakes_count_10000.csv",
        "trees.csv",
        "zillow.csv"
    ]
    print("{0:<30} {1:>20}".format("File", "Completed at"))
    
    # Note: max_workers is set to 10 simply for this example,
    # you'll have to tweak with this number for your own projects
    # as you see fit
    with ThreadPoolExecutor(max_workers=10) as executor:
        with requests.Session() as session:
            # Set any session parameters here before calling `fetch`

            # Initialize the event loop        
            loop = asyncio.get_event_loop()
            
            # Reset the module-level START_TIME used by the `fetch` function;
            # without `global`, this assignment would only create a local variable
            global START_TIME
            START_TIME = default_timer()
            
            # Use list comprehension to create a list of
            # tasks to complete. The executor will run the `fetch`
            # function for each csv in the csvs_to_fetch list
            tasks = [
                loop.run_in_executor(
                    executor,
                    fetch,
                    *(session, csv) # Allows us to pass in multiple arguments to `fetch`
                )
                for csv in csvs_to_fetch
            ]
            
            # Initializes the tasks to run and awaits their results
            for response in await asyncio.gather(*tasks):
                pass

The code now runs the fetch function on a pool of worker threads, one task per csv file to download.
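Two details are worth noting. First, run_in_executor only accepts positional arguments, which is what the *(session, csv) unpacking relies on; if fetch ever needed keyword arguments, functools.partial is the usual workaround (a sketch, not part of the original code):

import functools

# Equivalent to loop.run_in_executor(executor, fetch, session, csv),
# but partial() can also bind keyword arguments
tasks = [
    loop.run_in_executor(executor, functools.partial(fetch, session, csv))
    for csv in csvs_to_fetch
]

Second, this approach shares one requests.Session across all worker threads. That works well in practice here, but requests does not formally guarantee that Session is thread-safe, so a more conservative design would give each thread its own session.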

Finally, our main function also needs a small change to initialize the asynchronous function correctly.

def main():
    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(get_data_asynchronous())
    loop.run_until_complete(future)

main()
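As an aside, on Python 3.7 or newer this boilerplate shrinks to a single call (shown for comparison only; it does not exist in 3.5):

def main():
    # asyncio.run() creates the loop, runs the coroutine, and closes the loop
    asyncio.run(get_data_asynchronous())

main()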

Run it again and look at the results:

Asynchronous run. Notice that the files no longer complete in order.

With these slight modifications, the 12 files download in 3.43s instead of 10.84s, cutting the download time by nearly 70%. Here is the complete asynchronous code:

import requests
import asyncio
from concurrent.futures import ThreadPoolExecutor
from timeit import default_timer

START_TIME = default_timer()

def fetch(session, csv):
    base_url = "https://people.sc.fsu.edu/~jburkardt/data/csv/"
    with session.get(base_url + csv) as response:
        data = response.text
        if response.status_code != 200:
            print("FAILURE::{0}".format(url))

        elapsed = default_timer() - START_TIME
        time_completed_at = "{:5.2f}s".format(elapsed)
        print("{0:<30} {1:>20}".format(csv, time_completed_at))

        return data

async def get_data_asynchronous():
    csvs_to_fetch = [
        "ford_escort.csv",
        "cities.csv",
        "hw_25000.csv",
        "mlb_teams_2012.csv",
        "nile.csv",
        "homes.csv",
        "hooke.csv",
        "lead_shot.csv",
        "news_decline.csv",
        "snakes_count_10000.csv",
        "trees.csv",
        "zillow.csv"
    ]
    print("{0:<30} {1:>20}".format("File", "Completed at"))
    with ThreadPoolExecutor(max_workers=10) as executor:
        with requests.Session() as session:
            # Set any session parameters here before calling `fetch`
            loop = asyncio.get_event_loop()
            global START_TIME  # reset the module-level timer used by `fetch`
            START_TIME = default_timer()
            tasks = [
                loop.run_in_executor(
                    executor,
                    fetch,
                    *(session, csv) # Allows us to pass in multiple arguments to `fetch`
                )
                for csv in csvs_to_fetch
            ]
            for response in await asyncio.gather(*tasks):
                pass

def main():
    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(get_data_asynchronous())
    loop.run_until_complete(future)

main()

I hope you enjoyed this article and can apply these techniques to projects that are stuck on older versions of Python. Even though Python does not make async/await quite as simple as NodeJS does, you can still achieve similar results.