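Building a Web Application from Scratch - Part 1
Translator's Preface
Building web applications in Python is convenient, and there are many mature frameworks to choose from, such as Flask and Django. This series instead builds everything from scratch, which is a good way to learn the HTTP protocol and many of the underlying principles, and that helps a great deal in understanding web application development in depth. The series currently has 4 parts; this is a translation of the first.
This is the first post in a series in which I build a web application (and its web server) from scratch in Python. The only dependency for the whole series is the Python standard library, and I will ignore the WSGI standard.
Without further ado, let's get started!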
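The web server
To start, we will write an HTTP server to run our web application. But first, we need to spend a little time looking at how the HTTP protocol works.
How HTTP works
Simply put, HTTP clients connect to HTTP servers over the network and send them requests made up of string data. The server parses those requests and sends a response back to the client. The whole protocol, along with the request and response formats, is described in detail in RFC 2616, but I will explain it informally here so you don't have to read the whole document.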
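Request format
A request is represented by a series of lines separated by \r\n. The first line is called the "request line". It is made up of the HTTP method, followed by a space, followed by the path of the requested file, followed by another space, then the HTTP protocol version the client speaks and, finally, a carriage return (\r) and a line feed (\n):

    GET /some-path HTTP/1.1\r\n

After the request line come zero or more header lines. Each header line is made up of the header name, followed by a colon, followed by an optional value, followed by \r\n:

    Host: example.com\r\n
    Accept: text/html\r\n

An empty line marks the end of the headers:

    \r\n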
Finally, the request may contain a body: an arbitrary payload that is sent to the server along with the request.
Putting all of that together, here is a simple GET request:

    GET / HTTP/1.1\r\n
    Host: example.com\r\n
    Accept: text/html\r\n
    \r\n

And here is a POST request with a body:

    POST / HTTP/1.1\r\n
    Host: example.com\r\n
    Accept: application/json\r\n
    Content-type: application/json\r\n
    Content-length: 2\r\n
    \r\n
    {}
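To make the layout above concrete, here is a minimal sketch that serializes a request into bytes. The helper name `build_request` and its parameters are hypothetical and not part of the article's server; the sketch only illustrates how the request line, the headers, the empty line and the body fit together.

```python
import typing


def build_request(
    method: str,
    path: str,
    headers: typing.Optional[typing.Mapping[str, str]] = None,
    body: bytes = b"",
) -> bytes:
    """Serialize an HTTP/1.1 request (hypothetical helper, for illustration only)."""
    # The request line: method, space, path, space, protocol version.
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF, and an empty line terminates the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    # The body, if any, follows the empty line unchanged.
    return head.encode("ascii") + body


get_request = build_request("GET", "/", {"Host": "example.com", "Accept": "text/html"})
post_request = build_request(
    "POST",
    "/",
    {
        "Host": "example.com",
        "Accept": "application/json",
        "Content-type": "application/json",
        "Content-length": "2",
    },
    body=b"{}",
)
```

Printing `get_request` shows the same bytes as the GET example above, with each `\r\n` as a real carriage return and line feed.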
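Response format
Responses, like requests, are made up of a series of lines separated by \r\n. The first line of the response is called the "status line". It contains the HTTP protocol version, followed by a space, followed by the response status code, followed by another space, then the reason phrase for that status code and, finally, \r\n:

    HTTP/1.1 200 OK\r\n

After the status line come the response headers, then an empty line, and then an optional response body:

    HTTP/1.1 200 OK\r\n
    Content-type: text/html\r\n
    Content-length: 15\r\n
    \r\n
    <h1>Hello!</h1>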
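Since the Content-length header has to match the number of bytes in the body, it is easy to miscount when writing responses by hand. The following sketch computes it instead. The helper name `build_response` is an assumption made for illustration and is not something the article defines.

```python
def build_response(status: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Serialize an HTTP/1.1 response (hypothetical helper, for illustration only)."""
    head = (
        f"HTTP/1.1 {status}\r\n"                # status line
        f"Content-type: {content_type}\r\n"     # response headers
        f"Content-length: {len(body)}\r\n"      # computed, never typed by hand
        "\r\n"                                  # empty line ends the headers
    )
    return head.encode("ascii") + body


hello_response = build_response("200 OK", b"<h1>Hello!</h1>")
```

Here `hello_response` contains the same bytes as the example response above, including `Content-length: 15`.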
A simple server
Based on what we know about the protocol so far, let's write a server that sends the same response regardless of the incoming request.
We need to create a socket, bind it to an address, and then start listening for connections:

    import socket

    HOST = "127.0.0.1"
    PORT = 9000

    # By default, socket.socket creates TCP sockets.
    with socket.socket() as server_sock:
        # This tells the kernel to reuse sockets that are in `TIME_WAIT` state.
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

        # This tells the socket what address to bind to.
        server_sock.bind((HOST, PORT))

        # 0 is the number of pending connections the socket may have before
        # new connections are refused. Since this server is going to process
        # one connection at a time, we want to refuse any additional connections.
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")
If you run this code now, it will print that it is listening on 127.0.0.1:9000 and then exit immediately. To actually process incoming connections, we need to call the accept method on our socket. Doing so blocks the process until a client connects to our server.

    with socket.socket() as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")

        client_sock, client_addr = server_sock.accept()
        print(f"New connection from {client_addr}.")
Once we have a socket connection to the client, we can start communicating with it. Using the sendall method, let's send the client a response:

    RESPONSE = b"""\
    HTTP/1.1 200 OK
    Content-type: text/html
    Content-length: 15

    <h1>Hello!</h1>""".replace(b"\n", b"\r\n")

    with socket.socket() as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")

        client_sock, client_addr = server_sock.accept()
        print(f"New connection from {client_addr}.")
        with client_sock:
            client_sock.sendall(RESPONSE)

Note that the lines of the RESPONSE literal must start at the left margin in your source file, otherwise the leading spaces become part of the response.
If you run the code now and visit http://127.0.0.1:9000 in your browser, you should see the string "Hello!". Unfortunately, the server exits as soon as it has sent the response, so refreshing the page will fail. Let's fix that:

    RESPONSE = b"""\
    HTTP/1.1 200 OK
    Content-type: text/html
    Content-length: 15

    <h1>Hello!</h1>""".replace(b"\n", b"\r\n")

    with socket.socket() as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")

        while True:
            client_sock, client_addr = server_sock.accept()
            print(f"New connection from {client_addr}.")
            with client_sock:
                client_sock.sendall(RESPONSE)

At this point we have a web server that can serve a simple HTML page, all in just 25 lines of code. That's not too bad!
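If you prefer to check the server without a browser, the exchange can also be driven from Python. The sketch below is an assumption-laden illustration, not the article's code: `serve_once`, `fetch` and `demo` are hypothetical names, the server side handles a single connection in a background thread, it binds to port 0 so the operating system picks a free port, and, unlike the loop above, it reads the request before replying.

```python
import socket
import threading

RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 15\r\n"
    b"\r\n"
    b"<h1>Hello!</h1>"
)


def serve_once(server_sock: socket.socket) -> None:
    """Accept one connection, send the fixed response, then close it."""
    client_sock, _ = server_sock.accept()
    with client_sock:
        client_sock.recv(16_384)  # read (and ignore) the request
        client_sock.sendall(RESPONSE)


def fetch(host: str, port: int) -> bytes:
    """Send a minimal GET request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        chunks = []
        while True:
            data = sock.recv(16_384)
            if not data:  # an empty read means the peer closed the socket
                break
            chunks.append(data)
        return b"".join(chunks)


def demo() -> bytes:
    """Run the one-shot server in a thread and fetch its response."""
    with socket.socket() as server_sock:
        server_sock.settimeout(5)           # don't wait forever in accept()
        server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server_sock.listen(1)
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()
        try:
            return fetch("127.0.0.1", port)
        finally:
            worker.join()
```

Calling `demo()` returns the raw response bytes, which makes it easy to see the status line, the headers and the body exactly as they travel over the wire.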
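A file server
Let's extend the HTTP server so that it can serve files from disk.
Request abstraction
Before we can do that, we have to be able to read and parse the request data coming from the client. Since we already know that request data is represented by a series of lines, each separated by \r\n, let's write a generator function that reads data from a socket and yields each individual line:

    import typing


    def iter_lines(sock: socket.socket, bufsize: int = 16_384) -> typing.Generator[bytes, None, bytes]:
        """Given a socket, read all the individual CRLF-separated lines
        and yield each one until an empty one is found. Returns the
        remainder after the empty line.
        """
        buff = b""
        while True:
            data = sock.recv(bufsize)
            if not data:
                return b""

            buff += data
            while True:
                try:
                    i = buff.index(b"\r\n")
                    line, buff = buff[:i], buff[i + 2:]
                    if not line:
                        return buff

                    yield line
                except ValueError:
                    # bytes.index raises ValueError when no complete line
                    # is buffered yet, so go back and read more data.
                    break

This may look a bit daunting, but all it does is read as much data as it can from the socket, accumulate it in a buffer, and keep splitting the buffered data into individual lines, yielding one at a time. Once it finds an empty line, it returns whatever data is left in the buffer.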
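The buffering logic is easier to follow without a socket in the way. The following sketch applies the same splitting to plain bytes. The function name `split_lines` and its return shape are assumptions made for illustration; the article's generator interleaves this work with calls to recv.

```python
import typing


def split_lines(buff: bytes) -> typing.Tuple[typing.List[bytes], bytes, bool]:
    """Split buffered bytes into complete CRLF-terminated lines.

    Returns (lines, remainder, done), where `done` is True once the empty
    line that ends the headers has been seen. Hypothetical helper.
    """
    lines = []
    while True:
        i = buff.find(b"\r\n")  # -1 when no complete line is buffered
        if i == -1:
            # Keep the partial line around until more data arrives.
            return lines, buff, False

        line, buff = buff[:i], buff[i + 2:]
        if not line:
            # Empty line: whatever is left is the start of the body.
            return lines, buff, True

        lines.append(line)


# First chunk: one complete line plus the start of a header.
lines, rest, done = split_lines(b"GET / HTTP/1.1\r\nHost: exa")
# Second chunk: the rest of the header, the empty line and a body.
more_lines, body, finished = split_lines(rest + b"mple.com\r\n\r\n{}")
```

After the first call the partial `Host` header stays in `rest`. After the second call, `more_lines` holds the completed header line and `body` holds the bytes that followed the empty line.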
Using iter_lines, we can start printing out the requests we get from our clients:

    with socket.socket() as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")

        while True:
            client_sock, client_addr = server_sock.accept()
            print(f"New connection from {client_addr}.")
            with client_sock:
                for request_line in iter_lines(client_sock):
                    print(request_line)

                client_sock.sendall(RESPONSE)

If you run the code now and visit http://127.0.0.1:9000 in your browser, you should see something like this in your console:

    New connection from ('127.0.0.1', 62086).
    b'GET / HTTP/1.1'
    b'Host: localhost:9000'
    b'Connection: keep-alive'
    b'Cache-Control: max-age=0'
    b'Upgrade-Insecure-Requests: 1'
    b'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36'
    b'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
    b'Accept-Encoding: gzip, deflate, br'
    b'Accept-Language: en-US,en;q=0.9,ro;q=0.8'
Pretty neat! Let's abstract over that data by defining a Request class:

    import typing


    class Request(typing.NamedTuple):
        method: str
        path: str
        headers: typing.Mapping[str, str]

For now, the Request class only knows about the method, the path and the headers. Later on, we will extend it to support query string parameters and reading the request body.
To encapsulate the logic needed to build a request, we add a class method called from_socket to the Request class:

    class Request(typing.NamedTuple):
        method: str
        path: str
        headers: typing.Mapping[str, str]

        @classmethod
        def from_socket(cls, sock: socket.socket) -> "Request":
            """Read and parse the request from a socket object.

            Raises:
              ValueError: When the request cannot be parsed.
            """
            lines = iter_lines(sock)

            try:
                request_line = next(lines).decode("ascii")
            except StopIteration:
                raise ValueError("Request line missing.")

            try:
                method, path, _ = request_line.split(" ")
            except ValueError:
                raise ValueError(f"Malformed request line {request_line!r}.")

            headers = {}
            for line in lines:
                try:
                    name, _, value = line.decode("ascii").partition(":")
                    headers[name.lower()] = value.lstrip()
                except ValueError:
                    raise ValueError(f"Malformed header line {line!r}.")

            return cls(method=method.upper(), path=path, headers=headers)
It uses the iter_lines function we defined earlier to read the request line. That is where it gets the method and the path. It then reads each header line and parses it. Finally, it builds the Request object and returns it. If we plug that into our server loop, it looks like this:

    with socket.socket() as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")

        while True:
            client_sock, client_addr = server_sock.accept()
            print(f"Received connection from {client_addr}...")
            with client_sock:
                request = Request.from_socket(client_sock)
                print(request)
                client_sock.sendall(RESPONSE)

If you connect to the server now, you should see lines like this one:

    Request(method='GET', path='/', headers={'host': 'localhost:9000', 'user-agent': 'curl/7.54.0', 'accept': '*/*'})
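The parsing steps can be tried out on raw bytes, which is handy for experimenting without a running server. This is a sketch under stated assumptions: `ParsedRequest` and `parse_request_head` are hypothetical names, the input is assumed to already contain the whole header section, and, unlike from_socket, the sketch rejects header lines that have no colon.

```python
import typing


class ParsedRequest(typing.NamedTuple):
    """Hypothetical stand-in for the article's Request class."""
    method: str
    path: str
    headers: typing.Dict[str, str]


def parse_request_head(data: bytes) -> ParsedRequest:
    """Parse the request line and headers out of raw request bytes."""
    # Everything before the first empty line is the head of the request.
    head, _, _ = data.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")

    try:
        method, path, _version = request_line.split(" ")
    except ValueError:
        raise ValueError(f"Malformed request line {request_line!r}.") from None

    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:  # no colon at all, so this is not a header line
            raise ValueError(f"Malformed header line {line!r}.")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.lower()] = value.strip()

    return ParsedRequest(method.upper(), path, headers)


raw = (
    b"GET / HTTP/1.1\r\n"
    b"Host: localhost:9000\r\n"
    b"User-Agent: curl/7.54.0\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)
parsed = parse_request_head(raw)
```

Only the first colon separates a header name from its value, which is why `localhost:9000` survives intact as the value of the `host` header.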
Because from_socket can raise an exception under certain circumstances, the server may crash if you give it an invalid request right now. To simulate this, you can use telnet to connect to the server and send it some bogus data:

    > telnet 127.0.0.1 9000
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    hello
    Connection closed by foreign host.

Sure enough, the server crashed:

    Received connection from ('127.0.0.1', 62404)...
    Traceback (most recent call last):
      File "server.py", line 53, in parse
        request_line = next(lines).decode("ascii")
    ValueError: not enough values to unpack (expected 3, got 1)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "server.py", line 82, in <module>
        with client_sock:
      File "server.py", line 55, in parse
        raise ValueError("Request line missing.")
    ValueError: Malformed request line 'hello'.
To handle these kinds of issues more gracefully, let's wrap the call to from_socket in a try-except block and send the client a "400 Bad Request" response when we get a malformed request:

    BAD_REQUEST_RESPONSE = b"""\
    HTTP/1.1 400 Bad Request
    Content-type: text/plain
    Content-length: 11

    Bad Request""".replace(b"\n", b"\r\n")

    with socket.socket() as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")

        while True:
            client_sock, client_addr = server_sock.accept()
            print(f"Received connection from {client_addr}...")
            with client_sock:
                try:
                    request = Request.from_socket(client_sock)
                    print(request)
                    client_sock.sendall(RESPONSE)
                except Exception as e:
                    print(f"Failed to parse request: {e}")
                    client_sock.sendall(BAD_REQUEST_RESPONSE)

If we try to break it now, our client gets a response back and the server keeps running:

    ~> telnet 127.0.0.1 9000
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    hello
    HTTP/1.1 400 Bad Request
    Content-type: text/plain
    Content-length: 11

    Bad RequestConnection closed by foreign host.
Now we are ready to move on to serving files. First, let's define a default "404 Not Found" response:

    NOT_FOUND_RESPONSE = b"""\
    HTTP/1.1 404 Not Found
    Content-type: text/plain
    Content-length: 9

    Not Found""".replace(b"\n", b"\r\n")

    # ...

    with socket.socket() as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")

        while True:
            client_sock, client_addr = server_sock.accept()
            print(f"Received connection from {client_addr}...")
            with client_sock:
                try:
                    request = Request.from_socket(client_sock)
                    print(request)
                    client_sock.sendall(NOT_FOUND_RESPONSE)
                except Exception as e:
                    print(f"Failed to parse request: {e}")
                    client_sock.sendall(BAD_REQUEST_RESPONSE)

Additionally, let's add a "405 Method Not Allowed" response. We are only going to handle GET requests:

    METHOD_NOT_ALLOWED_RESPONSE = b"""\
    HTTP/1.1 405 Method Not Allowed
    Content-type: text/plain
    Content-length: 18

    Method Not Allowed""".replace(b"\n", b"\r\n")
Let's define a SERVER_ROOT constant and a serve_file function. The constant represents the directory the server serves files from.

    import mimetypes
    import os
    import socket
    import typing

    SERVER_ROOT = os.path.abspath("www")

    FILE_RESPONSE_TEMPLATE = """\
    HTTP/1.1 200 OK
    Content-type: {content_type}
    Content-length: {content_length}

    """.replace("\n", "\r\n")


    def serve_file(sock: socket.socket, path: str) -> None:
        """Given a socket and the relative path to a file (relative to
        SERVER_ROOT), send that file to the socket if it exists. If the
        file doesn't exist, send a "404 Not Found" response.
        """
        if path == "/":
            path = "/index.html"

        abspath = os.path.normpath(os.path.join(SERVER_ROOT, path.lstrip("/")))
        if not abspath.startswith(SERVER_ROOT):
            sock.sendall(NOT_FOUND_RESPONSE)
            return

        try:
            with open(abspath, "rb") as f:
                stat = os.fstat(f.fileno())
                content_type, encoding = mimetypes.guess_type(abspath)
                if content_type is None:
                    content_type = "application/octet-stream"

                # Note: `encoding` is a content encoding such as "gzip",
                # not a character set.
                if encoding is not None:
                    content_type += f"; charset={encoding}"

                response_headers = FILE_RESPONSE_TEMPLATE.format(
                    content_type=content_type,
                    content_length=stat.st_size,
                ).encode("ascii")

                sock.sendall(response_headers)
                sock.sendfile(f)
        except FileNotFoundError:
            sock.sendall(NOT_FOUND_RESPONSE)
            return
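serve_file takes the client socket and a path to a file. It then tries to resolve that path to a real file inside SERVER_ROOT, returning a "not found" response if the file resolves outside of SERVER_ROOT. It then tries to open the file and figure out its MIME type and size (using os.fstat), builds the response headers and uses the sendfile system call to write the file to the socket. If it cannot find the file on disk, it sends a "not found" response.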
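The path resolution step is the part that keeps clients from reading files outside the server root, so it is worth looking at in isolation. The sketch below is one possible variation rather than the article's code: `resolve_path` is a hypothetical helper, and it swaps the string prefix check for os.path.commonpath, which compares whole path components instead of characters.

```python
import os
import typing


def resolve_path(root: str, request_path: str) -> typing.Optional[str]:
    """Map a request path to an absolute path inside `root`.

    Returns None when the path would escape `root`. Hypothetical helper.
    """
    root = os.path.abspath(root)
    # Strip the leading slash so join() treats the path as relative, then
    # let normpath() collapse any ".." segments.
    abspath = os.path.normpath(os.path.join(root, request_path.lstrip("/")))

    # commonpath compares whole components, so a sibling directory such
    # as "www-private" does not count as being inside "www".
    if os.path.commonpath([root, abspath]) != root:
        return None

    return abspath


root = os.path.abspath("www")
inside = resolve_path("www", "/css/../index.html")
outside = resolve_path("www", "/../server.py")
sibling = resolve_path("www", "/../www-private/secret.txt")
```

Here `inside` resolves to `index.html` under the root, while `outside` and `sibling` are both None. A plain prefix check would let the `www-private` path through, because its absolute path still starts with the root's characters.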
With serve_file added, our server loop now looks like this:

    with socket.socket() as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((HOST, PORT))
        server_sock.listen(0)
        print(f"Listening on {HOST}:{PORT}...")

        while True:
            client_sock, client_addr = server_sock.accept()
            print(f"Received connection from {client_addr}...")
            with client_sock:
                try:
                    request = Request.from_socket(client_sock)
                    if request.method != "GET":
                        client_sock.sendall(METHOD_NOT_ALLOWED_RESPONSE)
                        continue

                    serve_file(client_sock, request.path)
                except Exception as e:
                    print(f"Failed to parse request: {e}")
                    client_sock.sendall(BAD_REQUEST_RESPONSE)

If you add a file called www/index.html next to your server.py file and visit http://localhost:9000, you should see the contents of that file.
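Wrapping up
That's it for Part 1. In Part 2, we will extract the Server and Response abstractions and look at how to handle multiple concurrent requests. If you would like to see the full source code, you can find it here.
Original article: WEB APPLICATION FROM SCRATCH, PART I
- Author: Bogdan Popa
- Translator: noONE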