Understanding the Linux Kernel, 3rd Edition -- Notes, Part 2
Chapter 8. Memory Management
8.1. Page Frame Management
8.1.1. Page Descriptors
State information of a page frame is kept in a page descriptor of type page
All page descriptors are stored in the mem_map array.
virt_to_page(addr)
pfn_to_page(pfn)
8.1.2. Non-Uniform Memory Access (NUMA)
The physical memory inside each node can be split into several zones, as we will see in the next
section. Each node has a descriptor of type pg_data_t,
8.1.3. Memory Zones
Linux 2.6 partitions the physical memory of every memory node
into three zones
ZONE_DMA
Contains page frames of memory below 16 MB
ZONE_NORMAL
Contains page frames of memory at and above 16 MB and below 896 MB
ZONE_HIGHMEM
Contains page frames of memory at and above 896 MB
The ZONE_DMA and ZONE_NORMAL zones include the "normal" page frames that can be directly accessed
by the kernel through the linear mapping in the fourth gigabyte of the linear address space (see the
section "Kernel Page Tables" in Chapter 2). Conversely, the ZONE_HIGHMEM zone includes page frames
that cannot be directly accessed by the kernel through the linear mapping in the fourth gigabyte of
linear address space (see the section "Kernel Mappings of High-Memory Page Frames" later in this
chapter). The ZONE_HIGHMEM zone is always empty on 64-bit architectures.
Each memory zone has its own descriptor of type zone. Its fields are shown in Table 8-4.
8.1.4. The Pool of Reserved Page Frames
min_free_kbytes:
initially, min_free_kbytes cannot be lower than 128 or greater than 65,536 (in KB)
The pages_min field of the zone descriptor stores the number of reserved page frames inside the
zone. As we'll see in Chapter 17, this field plays also a role for the page frame reclaiming algorithm,
together with the pages_low and pages_high fields. The pages_low field is always set to 5/4 of the
value of pages_min, and pages_high is always set to 3/2 of the value of pages_min
8.1.5. The Zoned Page Frame Allocator
8.1.5.1. Requesting and releasing page frames
alloc_pages(gfp_mask, order)
alloc_page(gfp_mask)
Macro used to request 2^order contiguous page frames. It returns the address of the descriptor
of the first allocated page frame or returns NULL if the allocation failed.
__get_free_pages(gfp_mask, order)
__get_free_page(gfp_mask)
get_zeroed_page(gfp_mask)
__get_dma_pages(gfp_mask, order)
These macros are analogous to alloc_pages( ), but they return the linear address of the first allocated page.
__free_pages(page, order)
__free_page(page)
This function checks the page descriptor pointed to by page; if the page frame is not reserved
(i.e., if the PG_reserved flag is equal to 0), it decreases the count field of the descriptor. If
count becomes 0, it assumes that 2^order contiguous page frames starting from the one
corresponding to page are no longer used. In this case, the function releases the page frames
as explained in the later section
free_pages(addr, order)
free_page(addr)
These are similar to __free_pages( ) and __free_page( ), but they receive as an argument the linear address addr of the first page frame to be released.
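A typical allocate/use/release cycle with this API might look as follows. This is a kernel-module sketch, not a runnable user-space program; the error handling shown is an assumption about the caller's context:

```c
/* Grab four contiguous page frames (order 2: 2^2 = 4 frames, 16 KB on x86),
 * use them, then give them back with the same order. */
unsigned long addr;

addr = __get_free_pages(GFP_KERNEL, 2);
if (!addr)
        return -ENOMEM;         /* the allocation may fail; always check */
/* ... use the 16 KB starting at the linear address addr ... */
free_pages(addr, 2);            /* order must match the allocation */
```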
8.1.6. Kernel Mappings of High-Memory Page Frames
The kernel uses three different mechanisms to map page frames in high memory; they are called
permanent kernel mapping, temporary kernel mapping, and noncontiguous memory allocation. In
this section, we'll cover the first two techniques; the third one is discussed in the section
"Noncontiguous Memory Area Management" later in this chapter
8.1.6.1. Permanent kernel mappings
page_address( )
The page_address( ) function returns the linear address associated with the page frame, or NULL if the page frame is in high memory and is not mapped.
kmap_high( )
The kmap_high( ) function is invoked if the page frame really belongs to high memory.
kunmap( )
The kunmap( ) function destroys a permanent kernel mapping established previously by kmap( ).
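A minimal permanent-mapping sketch (kernel-module code, not runnable stand-alone; the page pointer is assumed to come from a high-memory allocation made elsewhere):

```c
/* 'page' is assumed to point to a page descriptor obtained elsewhere,
 * e.g. from alloc_page(GFP_HIGHUSER). */
void *vaddr;

vaddr = kmap(page);            /* may sleep: never call in interrupt context */
memset(vaddr, 0, PAGE_SIZE);   /* the high-memory frame is now reachable */
kunmap(page);                  /* destroy the permanent mapping */
```

For code that cannot sleep (e.g., interrupt handlers), the temporary-mapping primitive kmap_atomic( ) of the next section is used instead.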
8.1.6.2. Temporary kernel mappings
kmap_atomic( )
8.1.7. The Buddy System Algorithm
The technique adopted by Linux to solve the external fragmentation problem is based on the well-known
buddy system algorithm. All free page frames are grouped into 11 lists of blocks that contain
groups of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames, respectively. The
largest request of 1024 page frames corresponds to a chunk of 4 MB of contiguous RAM. The
physical address of the first page frame of a block is a multiple of the group size. For example, the
initial address of a 16-page-frame block is a multiple of 16 × 2^12 (2^12 = 4,096, which is the regular
page size).
8.1.7.1. Data structures
1: zone->zone_mem_map Pointer to the first page descriptor of the zone.
2: An array consisting of eleven elements of type free_area, one element for each group size.
The array is stored in the free_area field of the zone descriptor.
zone->free_area [k]
8.1.7.2. Allocating a block
The __rmqueue( ) function is used to find a free block in a zone.
8.1.7.3. Freeing a block
__free_pages_bulk( ) / __free_one_page( )
This function implements the buddy system strategy for freeing page frames.
8.1.8. The Per-CPU Page Frame Cache
The main data structure implementing the per-CPU page frame cache is an array of per_cpu_pageset
data structures stored in the pageset field of the memory zone descriptor. The array includes one
element for each CPU; this element, in turn, consists of two per_cpu_pages descriptors, one for the
hot cache and the other for the cold cache. The fields of the per_cpu_pages descriptor are listed in
Table 8-7. The fields of the per_cpu_pages descriptor
Type Name Description
int count Number of page frames in the cache
int low Low watermark for cache replenishing
int high High watermark for cache depletion
int batch Number of page frames to be added or subtracted from the cache
struct list_head list List of descriptors of the page frames included in the cache
8.1.8.1. Allocating page frames through the per-CPU page frame caches
buffered_rmqueue( )
8.1.8.2. Releasing page frames to the per-CPU page frame caches
free_hot_cold_page( )
8.1.9. The Zone Allocator
__alloc_pages( ) --> zone_watermark_ok( )
__free_pages( ) --> __free_one_page( )
8.2. Memory Area Management
8.2.1. The Slab Allocator
Figure 8-3. The slab allocator components
8.2.2. Cache Descriptor
1: Each cache is described by a structure of type kmem_cache_t (a typedef for struct kmem_cache)
Table 8-8. The fields of the kmem_cache_t descriptor
Type Name Description
struct array_cache * array[NR_CPUS] Per-CPU array of pointers to local caches of free objects (see the section "Local Caches of Free Slab Objects" later in this chapter).
unsigned int batchcount Number of objects to be transferred in bulk to or from the local caches.
unsigned int limit Maximum number of free objects in the local caches. This is tunable.
struct kmem_list3 lists See next table.
unsigned int objsize Size of the objects included in the cache
unsigned int flags Set of flags that describes permanent properties of the cache.
unsigned int num Number of objects packed into a single slab. (All slabs of the cache
have the same size.)
unsigned int free_limit Upper limit of free objects in the whole slab cache
spinlock_t spinlock Cache spin lock.
unsigned int gfporder Logarithm of the number of contiguous page frames included in a single slab.
unsigned int gfpflags Set of flags passed to the buddy system function when allocating page frames.
size_t colour Number of colors for the slabs (see the section "Slab Coloring" later
in this chapter).
unsigned int colour_off Basic alignment offset in the slabs.
unsigned int colour_next Color to use for the next allocated slab.
kmem_cache_t* slabp_cache Pointer to the general slab cache containing the slab descriptors
(NULL if internal slab descriptors are used; see next section).
unsigned int slab_size The size of a single slab
unsigned int dflags Set of flags that describe dynamic properties of the cache
void * ctor Pointer to constructor method associated with the cache
void * dtor Pointer to destructor method associated with the cache
const char * name Character array storing the name of the cache
struct list_head next Pointers for the doubly linked list of cache descriptors.
The CFLGS_OFF_SLAB flag in the flags field of the cache descriptor is set to one if the slab descriptor is stored outside the slab; it is set to zero otherwise.
2: The lists field of the kmem_cache_t descriptor
8.2.3. Slab Descriptor
kmem_cache->flags :
External slab descriptor
Internal slab descriptor
Figure 8-4. Relationship between cache and slab descriptors
8.2.4. General and Specific Caches
general caches are:
1: A first cache called kmem_cache, whose objects are the cache descriptors of the remaining
caches used by the kernel. The cache_cache variable contains the descriptor of this special cache.
2: Several additional caches contain general purpose memory areas. The range of the memory
area sizes typically includes 13 geometrically distributed sizes. A table called malloc_sizes
(whose elements are of type cache_sizes) points to 26 cache descriptors associated with
memory areas of size 32, 64, 128, 256, 512, 1,024, 2,048, 4,096, 8,192, 16,384, 32,768,
65,536, and 131,072 bytes. For each size, there are two caches: one suitable for ISA DMA
allocations and the other for normal allocations.
specific caches
Specific caches are created by the kmem_cache_create( ) function. Depending on the parameters,
the function first determines the best way to handle the new cache (for instance, whether to include
the slab descriptor inside or outside of the slab). It then allocates a cache descriptor for the new
cache from the cache_cache general cache and inserts the descriptor in the cache_chain list of cache
descriptors (the insertion is done after having acquired the cache_chain_sem semaphore that
protects the list from concurrent accesses).
It is also possible to destroy a cache and remove it from the cache_chain list by invoking
kmem_cache_destroy( ). This function is mostly useful to modules that create their own caches when
loaded and destroy them when unloaded. To avoid wasting memory space, the kernel must destroy
all slabs before destroying the cache itself. The kmem_cache_shrink( ) function destroys all the slabs
in a cache by invoking slab_destroy( ) iteratively (see the later section "Releasing a Slab from a
Cache").
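The life cycle described above can be sketched as follows. This is a kernel-module sketch using the 2.6-era six-argument kmem_cache_create( ) signature; the structure and cache name are illustrative:

```c
/* A specific cache for a module's own objects. */
struct my_object { int a, b; };
static kmem_cache_t *my_cachep;

/* Create: name, object size, alignment, flags, constructor, destructor. */
my_cachep = kmem_cache_create("my_object_cache", sizeof(struct my_object),
                              0, 0, NULL, NULL);
if (!my_cachep)
        return -ENOMEM;

/* Allocate and free objects from the cache (see sections 8.2.12-8.2.13). */
struct my_object *obj = kmem_cache_alloc(my_cachep, GFP_KERNEL);
kmem_cache_free(my_cachep, obj);

/* On module unload: all slabs must be freed before the cache is destroyed. */
kmem_cache_destroy(my_cachep);
```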
The names of all general and specific caches can be obtained at runtime by reading /proc/slabinfo;
this file also specifies the number of free objects and the number of allocated objects in each cache.
8.2.5. Interfacing the Slab Allocator with the Zoned Page Frame Allocator
kmem_getpages( )
kmem_freepages( )
8.2.6. Allocating a Slab to a Cache
cache_grow( )
8.2.7. Releasing a Slab from a Cache
slab_destroy( )
8.2.8. Object Descriptor
Internal object descriptors
External object descriptors
The first object descriptor in the array describes the first object in the slab, and so on. An object
descriptor is simply an unsigned short integer, which is meaningful only when the object is free. It
contains the index of the next free object in the slab, thus implementing a simple list of free objects
inside the slab. The object descriptor of the last element in the free object list is marked by the
conventional value BUFCTL_END (0xffff).
Figure 8-5. Relationships between slab and object descriptors
8.2.9. Aligning Objects in Memory
8.2.10. Slab Coloring
Objects that have the same offset within different slabs will end up mapped in the same cache line.
The cache hardware might therefore waste memory cycles transferring two objects from the same cache line back and forth to different RAM
locations, while other cache lines go underutilized.
A policy called slab coloring is applied: different arbitrary values called colors are assigned to the slabs.
Figure 8-6. Slab with color col and alignment aln
8.2.11. Local Caches of Free Slab Objects
Each cache of the slab allocator includes a per-CPU data structure consisting of a small array of pointers to freed objects, called the slab local cache. The slab data structures get involved only when the local cache underflows or overflows.
kmem_cache->array
Table 8-11. The fields of the array_cache structure
Type Name Description
unsigned int avail Number of pointers to available objects in the local cache. The field also
acts as the index of the first free slot in the cache.
unsigned int limit Size of the local cache, that is, the maximum number of pointers in the
local cache.
unsigned int batchcount Chunk size used when refilling or emptying the local cache
unsigned int touched Flag set to 1 if the local cache has been recently used
8.2.12. Allocating a Slab Object
kmem_cache_alloc( )
-->cache_alloc_refill( )
8.2.13. Freeing a Slab Object
kmem_cache_free( )
-->cache_flusharray( )
8.2.14. General Purpose Objects
kmalloc( )
kfree()
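Requests to kmalloc( ) are served by the nearest malloc_sizes general cache. A kernel-module sketch (the request size is illustrative):

```c
/* A 300-byte request is rounded up to the 512-byte general cache. */
char *buf = kmalloc(300, GFP_KERNEL);
if (!buf)
        return -ENOMEM;
/* ... use the buffer ... */
kfree(buf);   /* returns the object to the same general cache */
```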
8.2.15. Memory Pools
"The Pool of Reserved Page Frames."
those page frames can be used only to satisfy atomic memory allocation requests issued by interrupt handlers or inside critical regions.
Memory Pools
A memory pool is a reserve of dynamic memory that can be used only by a specific kernel component, namely the "owner" of the pool.
A memory pool is described by a mempool_t object
Table 8-12. The fields of the mempool_t object
Type Name Description
spinlock_t lock Spin lock protecting the object fields
int min_nr Minimum number of elements in the memory pool
int curr_nr Current number of elements in the memory pool
void ** elements Pointer to an array of pointers to the reserved elements
void * pool_data Private data available to the pool's owner
mempool_alloc_t * alloc Method to allocate an element
mempool_free_t * free Method to free an element
wait_queue_head_t wait Wait queue used when the memory pool is empty
mempool_create( )
mempool_destroy( )
mempool_alloc( )
mempool_free( )
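A typical pool built on top of a slab cache looks like this (kernel-module sketch; mempool_alloc_slab and mempool_free_slab are the stock helpers that forward to kmem_cache_alloc/free, and my_cachep is an assumed pre-existing cache):

```c
#define MIN_NR 16   /* elements guaranteed to the pool's owner */
mempool_t *pool;
void *obj;

/* pool_data is the slab cache the helpers draw from. */
pool = mempool_create(MIN_NR, mempool_alloc_slab, mempool_free_slab,
                      my_cachep);
if (!pool)
        return -ENOMEM;

obj = mempool_alloc(pool, GFP_KERNEL); /* falls back on the reserve if needed */
mempool_free(obj, pool);

mempool_destroy(pool);  /* all elements must have been returned first */
```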
8.3. Noncontiguous Memory Area Management
it makes sense to consider an allocation scheme based on noncontiguous page frames accessed through contiguous linear
addresses. The main advantage of this scheme is to avoid external fragmentation.
8.3.1. Linear Addresses of Noncontiguous Memory Areas
Figure 8-7. The linear address interval starting from PAGE_OFFSET
1: The beginning of the area includes the linear addresses that map the first 896 MB of RAM (see
the section "Process Page Tables" in Chapter 2); the linear address that corresponds to the end
of the directly mapped physical memory is stored in the high_memory variable.
2: The end of the area contains the fix-mapped linear addresses (see the section "Fix-Mapped
Linear Addresses" in Chapter 2).
3: The remaining linear addresses can be used for noncontiguous memory areas. A safety interval
of size 8 MB (macro VMALLOC_OFFSET) is inserted between the end of the physical memory
mapping and the first memory area; its purpose is to "capture" out-of-bounds memory
accesses. For the same reason, additional safety intervals of size 4 KB are inserted to separate
noncontiguous memory areas.
8.3.2. Descriptors of Noncontiguous Memory Areas
Each noncontiguous memory area is associated with a descriptor of type vm_struct
Table 8-13. The fields of the vm_struct descriptor
Type Name Description
void * addr Linear address of the first memory cell of the area
unsigned long size Size of the area plus 4,096 (inter-area safety interval)
unsigned long flags Type of memory mapped by the noncontiguous memory area
struct page ** pages Pointer to array of nr_pages pointers to page descriptors
unsigned int nr_pages Number of pages filled by the area
unsigned long phys_addr Set to 0 unless the area has been created to map the I/O shared
memory of a hardware device
struct vm_struct * next Pointer to next vm_struct structure
1: These descriptors are inserted in a simple list by means of the next field; the address of the first element of the list is stored in the vmlist variable.
2: The flags field identifies the type of memory mapped by the area:
VM_ALLOC for pages obtained by means of vmalloc( ),
VM_MAP for already allocated pages mapped by means of vmap() (see the next section), and
VM_IOREMAP for on-board memory of hardware devices mapped by means of ioremap( ) (see Chapter 13).
3: The get_vm_area( ) function looks for a free range of linear addresses between VMALLOC_START and VMALLOC_END.
8.3.3. Allocating a Noncontiguous Memory Area
vmalloc( )
The last crucial step consists of fiddling with the page table entries used by the kernel to
indicate that each page frame allocated to the noncontiguous memory area is now associated with a
linear address included in the interval of contiguous linear addresses yielded by vmalloc( ). This is
what map_vm_area( ) does.
8.3.4. Releasing a Noncontiguous Memory Area
vfree( )
-->remove_vm_area( )
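Usage is symmetric to the page-frame allocator, but the buffer is only virtually contiguous (kernel-module sketch; the size is illustrative):

```c
/* 64 KB that is contiguous in linear addresses but may be scattered
 * across 16 non-adjacent page frames. */
void *buf = vmalloc(64 * 1024);
if (!buf)
        return -ENOMEM;
/* ... buf can be addressed linearly, but must not be handed to DMA,
 * which needs physically contiguous memory ... */
vfree(buf);
```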
Chapter 9. Process Address Space
9.1. The Process's Address Space
The kernel represents intervals of linear addresses by means of resources called memory regions
Table 9-1. System calls related to memory region creation and deletion
System call Description
brk( ) Changes the heap size of the process
execve( ) Loads a new executable file, thus changing the process address space
_exit( ) Terminates the current process and destroys its address space
fork( ) Creates a new process, and thus a new address space
mmap( ), mmap2( ) Creates a memory mapping for a file, thus enlarging the process address space
mremap( ) Expands or shrinks a memory region
remap_file_pages() Creates a non-linear mapping for a file (see Chapter 16)
munmap( ) Destroys a memory mapping for a file, thus contracting the process address space
shmat( ) Attaches a shared memory region
shmdt( ) Detaches a shared memory region
9.2. The Memory Descriptor
mm_struct
Table 9-2. The fields of the memory descriptor
Type Field Description
struct vm_area_struct* mmap Pointer to the head of the list of memory region objects
struct rb_root mm_rb Pointer to the root of the red-black tree of memory region objects
struct vm_area_struct* mmap_cache Pointer to the last referenced memory region object
unsigned long(*)( ) get_unmapped_area Method that searches an available linear address interval in
the process address space
void (*)( ) unmap_area Method invoked when releasing a linear address interval
unsigned long mmap_base Identifies the linear address of the first allocated
anonymous memory region or file memory mapping (see
the section "Program Segments and Process Memory
Regions" in Chapter 20)
unsigned long free_area_cache Address from which the kernel will look for a free interval of
linear addresses in the process address space
pgd_t * pgd Pointer to the Page Global Directory
atomic_t mm_users Secondary usage counter
atomic_t mm_count Main usage counter
struct rw_semaphore mmap_sem Memory regions' read/write semaphore
spinlock_t page_table_lock Memory regions' and Page Tables' spin lock
struct list_head mmlist Pointers to adjacent elements in the list of memory descriptors
unsigned long start_code Initial address of executable code
unsigned long end_code Final address of executable code
unsigned long start_data