Understanding SPDK in Depth: Memory Management

The CPU's physical memory address space is interleaved across memory channels and memory controllers.

NVMe devices transfer data via DMA:
NVMe devices transfer data to and from system memory using Direct Memory Access (DMA). Specifically, they send messages across the PCI bus requesting data transfers. In the absence of an IOMMU, these messages contain physical memory addresses. These data transfers happen without involving the CPU, and the MMU is responsible for making access to memory coherent.

Restrictions that NVMe devices place on the physical memory they use:
The NVMe 1.0 specification requires all physical memory to be describable by what is called a PRP list. To be described by a PRP list, memory must have the following properties:

  • The memory is broken into physical 4KiB pages, which we'll call device pages.
  • The first device page can be a partial page starting at any 4-byte aligned address. It may extend up to the end of the current physical page, but not beyond.
  • If there is more than one device page, the first device page must end on a physical 4KiB page boundary.
  • The last device page begins on a physical 4KiB page boundary, but is not required to end on a physical 4KiB page boundary.

The specification allows for device pages to be other sizes than 4KiB, but all known devices as of this writing use 4KiB.
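
The PRP rules above are mostly address arithmetic, so a small sketch may make them concrete. The function below is illustrative only and is not SPDK code: the name `split_into_device_pages` is hypothetical, and it assumes a physically contiguous buffer and the 4KiB device page size mentioned in the text.

```python
PAGE_SIZE = 4096  # device page size assumed by the text (4KiB)


def split_into_device_pages(phys_addr, length):
    """Split a physically contiguous buffer into (address, size) device pages.

    Hypothetical helper: it only mirrors the PRP layout rules quoted above.
    """
    if phys_addr % 4 != 0:
        raise ValueError("first device page must start at a 4-byte aligned address")
    if length <= 0:
        raise ValueError("length must be positive")

    pages = []
    addr, remaining = phys_addr, length
    while remaining > 0:
        # Bytes left before the next physical 4KiB boundary.
        room = PAGE_SIZE - (addr % PAGE_SIZE)
        size = min(room, remaining)
        pages.append((addr, size))
        addr += size
        remaining -= size
    return pages


if __name__ == "__main__":
    for addr, size in split_into_device_pages(0x1010, 0x2000):
        print(hex(addr), hex(size))
```

Only the first entry can start off a page boundary and only the last entry can end before one; every page in between is a full, aligned 4KiB page, which is the shape the list describes.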

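User-space programs such as SPDK work with user-space (virtual) addresses, whereas NVMe devices need physical addresses, so a translation (mapping) between the two kinds of address has to be implemented.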

Approaches that can be considered:

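  • Inspect /proc/self (on Linux, the pagemap file) to look up the mapping between virtual and physical addresses.
    However, pages can be swapped out and back in, which may change the mapping, so this cannot guarantee that the pages stay pinned for the duration of an NVMe DMA transfer.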

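  • The mlock call

    mlock forces a virtual page to always be backed by a physical page, which prevents it from being swapped. Even so, it does not guarantee a static mapping: POSIX does not define an API for pinning memory, and the mechanism for allocating pinned memory is OS-specific.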

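  • Huge pages

    Although it is not a deliberate design decision, the kernel handles huge pages differently from conventional 4KiB pages: it never changes the physical memory location that backs them.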
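
To illustrate the first approach, here is a minimal sketch of a lookup through /proc/self/pagemap on Linux. The function names are hypothetical and the 4KiB base page size is an assumption. The bit layout (bit 63 = present, bit 62 = swapped, bits 0-54 = page frame number) follows the kernel's pagemap documentation. On recent kernels an unprivileged process reads the frame number as zero, so the lookup returns `None` in that case. Nothing here pins the page, so the result can become stale, which is exactly the weakness noted above.

```python
import struct

PAGE_SIZE = 4096            # assumed base page size
PFN_MASK = (1 << 55) - 1    # bits 0-54 hold the page frame number


def decode_pagemap_entry(entry):
    """Return (present, swapped, pfn) for one 64-bit pagemap entry."""
    present = bool((entry >> 63) & 1)
    swapped = bool((entry >> 62) & 1)
    pfn = (entry & PFN_MASK) if present else None
    return present, swapped, pfn


def pagemap_offset(vaddr):
    """Byte offset of the entry for vaddr: one 8-byte entry per virtual page."""
    return (vaddr // PAGE_SIZE) * 8


def virt_to_phys(vaddr, pagemap_path="/proc/self/pagemap"):
    """Best-effort lookup; returns None if the page is not resident
    or the frame number is hidden from this process."""
    with open(pagemap_path, "rb") as f:
        f.seek(pagemap_offset(vaddr))
        raw = f.read(8)
    if len(raw) != 8:
        return None
    (entry,) = struct.unpack("=Q", raw)
    present, _swapped, pfn = decode_pagemap_entry(entry)
    if not present or not pfn:
        return None
    return pfn * PAGE_SIZE + (vaddr % PAGE_SIZE)
```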

Without an IOMMU, the virtual addresses obtained through huge pages still have to be translated into physical addresses.
MMU: mem virtual address <----> physical address of memory
IOMMU: PCI bus address of NVMe device buffer/cache <----> buffer virtual address
Data exchange: mem virtual address <=====> buffer virtual address
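
Because a huge page keeps its physical location, a user-space driver can record the translation once per huge page and reuse it for every address inside that page. The sketch below shows the idea with a plain dictionary. It is an assumption-laden illustration and not SPDK's actual data structure: the class name `VtophysMap`, its methods, and the 2MiB huge page size are all hypothetical choices made for the example.

```python
HUGEPAGE_SIZE = 2 * 1024 * 1024  # assumed huge page size (2MiB)


class VtophysMap:
    """Hypothetical per-huge-page table from virtual to physical base address."""

    def __init__(self):
        self._table = {}

    def register(self, vaddr, paddr):
        """Record that the huge page at vaddr is backed by physical address paddr."""
        if vaddr % HUGEPAGE_SIZE or paddr % HUGEPAGE_SIZE:
            raise ValueError("both addresses must be huge-page aligned")
        self._table[vaddr // HUGEPAGE_SIZE] = paddr

    def translate(self, vaddr):
        """Return the physical address for vaddr, or None if it is unregistered."""
        base = self._table.get(vaddr // HUGEPAGE_SIZE)
        if base is None:
            return None
        # The offset within the huge page is identical on both sides.
        return base + (vaddr % HUGEPAGE_SIZE)


if __name__ == "__main__":
    m = VtophysMap()
    m.register(5 * HUGEPAGE_SIZE, 0x40000000)
    print(hex(m.translate(5 * HUGEPAGE_SIZE + 0x123)))
```

The lookup costs one division and one dictionary access, and the table stays small because there is one entry per huge page rather than one per 4KiB page.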
