
Understanding the Linux Kernel, 3rd Edition: Notes, Part 2

Chapter 8. Memory Management

8.1. Page Frame Management

8.1.1. Page Descriptors

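State information of a page frame is kept in a page descriptor of type page. All page descriptors are stored in the mem_map array.

virt_to_page(addr)
    Yields the address of the page descriptor associated with the linear address addr.

pfn_to_page(pfn)
    Yields the address of the page descriptor associated with the page frame having number pfn.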

8.1.2. Non-Uniform Memory Access (NUMA)

The physical memory inside each node can be split into several zones, as we will see in the next section. Each node has a descriptor of type pg_data_t.

8.1.3. Memory Zones

Linux 2.6 partitions the physical memory of every memory node into three zones. In the 80x86 UMA architecture the zones are:

ZONE_DMA
    Contains page frames of memory below 16 MB

ZONE_NORMAL
    Contains page frames of memory at and above 16 MB and below 896 MB

ZONE_HIGHMEM
    Contains page frames of memory at and above 896 MB

The ZONE_DMA and ZONE_NORMAL zones include the "normal" page frames that can be directly accessed by the kernel through the linear mapping in the fourth gigabyte of the linear address space (see the section "Kernel Page Tables" in Chapter 2). Conversely, the ZONE_HIGHMEM zone includes page frames that cannot be directly accessed by the kernel through the linear mapping in the fourth gigabyte of linear address space (see the section "Kernel Mappings of High-Memory Page Frames" later in this chapter). The ZONE_HIGHMEM zone is always empty on 64-bit architectures.

Each memory zone has its own descriptor of type zone. Its fields are shown in Table 8-4.
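
As a quick illustration of the boundaries listed above, the following Python sketch classifies a physical address into one of the three 80x86 zones. The names zone_of and ZONE_LIMITS are hypothetical helpers written for these notes; they are not kernel code.

```python
MB = 1 << 20

# Upper bounds (exclusive) of the two directly mapped zones on 80x86 UMA.
ZONE_LIMITS = [
    ("ZONE_DMA", 16 * MB),      # below 16 MB
    ("ZONE_NORMAL", 896 * MB),  # from 16 MB up to, but not including, 896 MB
]


def zone_of(phys_addr):
    """Return the name of the zone that contains the given physical address."""
    for name, upper in ZONE_LIMITS:
        if phys_addr < upper:
            return name
    return "ZONE_HIGHMEM"       # at and above 896 MB
```

On a 64-bit architecture the last branch would never be taken, since ZONE_HIGHMEM is always empty there.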

8.1.4. The Pool of Reserved Page Frames

The amount of reserved memory (in kilobytes) is stored in the min_free_kbytes variable. Initially, min_free_kbytes cannot be lower than 128 or greater than 65,536.

The pages_min field of the zone descriptor stores the number of reserved page frames inside the zone. As we'll see in Chapter 17, this field also plays a role in the page frame reclaiming algorithm, together with the pages_low and pages_high fields. The pages_low field is always set to 5/4 of the value of pages_min, and pages_high is always set to 3/2 of the value of pages_min.
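
The arithmetic above can be sketched in a few lines of Python. The clamp to [128, 65,536] and the 5/4 and 3/2 ratios come from the text; the square-root formula for the initial value is an assumption taken from the kernel's initialization code rather than from these notes, and both function names are hypothetical.

```python
import math


def initial_min_free_kbytes(lowmem_kbytes):
    """Initial size of the reserved pool, in kilobytes.

    Assumption: the starting value is floor(sqrt(16 * directly mapped memory)),
    which is then clamped to the range given in the text.
    """
    value = math.isqrt(16 * lowmem_kbytes)
    return max(128, min(value, 65536))


def watermarks(pages_min):
    """Derive pages_low and pages_high from pages_min (integer arithmetic)."""
    return {
        "pages_min": pages_min,
        "pages_low": pages_min * 5 // 4,
        "pages_high": pages_min * 3 // 2,
    }
```

For example, with 896 MB of directly mapped memory the sketch yields a reserve of 3,831 KB, and a zone with pages_min equal to 100 gets pages_low 125 and pages_high 150.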

8.1.5. The Zoned Page Frame Allocator

8.1.5.1. Requesting and releasing page frames

alloc_pages(gfp_mask, order)
alloc_page(gfp_mask)
    Macros used to request 2^order contiguous page frames (a single page frame in the case of alloc_page). They return the address of the descriptor of the first allocated page frame, or NULL if the allocation failed.

__get_free_pages(gfp_mask, order)
__get_free_page(gfp_mask)
get_zeroed_page(gfp_mask)
__get_dma_pages(gfp_mask, order)
    Similar to alloc_pages( ), but they return the linear address of the first allocated page.

__free_pages(page, order)
__free_page(page)
    This function checks the page descriptor pointed to by page; if the page frame is not reserved (i.e., if the PG_reserved flag is equal to 0), it decreases the count field of the descriptor. If count becomes 0, it assumes that 2^order contiguous page frames starting from the one corresponding to page are no longer used. In this case, the function releases the page frames as explained in a later section.

free_pages(addr, order)
free_page(addr)
    Similar to __free_pages( ), but they receive as an argument the linear address addr of the first page frame to be released.

8.1.6. Kernel Mappings of High-Memory Page Frames

The kernel uses three different mechanisms to map page frames in high memory; they are called permanent kernel mapping, temporary kernel mapping, and noncontiguous memory allocation. In this section, we'll cover the first two techniques; the third one is discussed in the section "Noncontiguous Memory Area Management" later in this chapter.

8.1.6.1. Permanent kernel mappings

page_address( )
    Returns the linear address associated with the page frame, or NULL if the page frame is in high memory and is not mapped.

kmap_high( )
    Invoked by kmap( ) if the page frame really belongs to high memory.

kunmap( )
    Destroys a permanent kernel mapping established previously by kmap( ).

8.1.6.2. Temporary kernel mappings

kmap_atomic( )

8.1.7. The Buddy System Algorithm

The technique adopted by Linux to solve the external fragmentation problem is based on the well-known buddy system algorithm. All free page frames are grouped into 11 lists of blocks that contain groups of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames, respectively. The largest request of 1024 page frames corresponds to a chunk of 4 MB of contiguous RAM. The physical address of the first page frame of a block is a multiple of the group size; for example, the initial address of a 16-page-frame block is a multiple of 16 x 2^12 (2^12 = 4,096, which is the regular page size).
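
The block-size and alignment rules above reduce to a little bit arithmetic, sketched here in Python. The helper names are hypothetical. The XOR rule in buddy_index is not stated in these notes; it is the usual way to locate the buddy of a block and is included as an assumption.

```python
PAGE_SIZE = 1 << 12   # 4,096 bytes, the regular page size
MAX_ORDER = 11        # orders 0..10, i.e. blocks of 1..1024 page frames


def block_bytes(order):
    """Size in bytes of a block of 2**order contiguous page frames."""
    return (1 << order) * PAGE_SIZE


def is_block_aligned(phys_addr, order):
    """True if phys_addr can be the start of a block of the given order."""
    return phys_addr % block_bytes(order) == 0


def buddy_index(page_idx, order):
    """Index of the first page frame of the buddy of the block at page_idx.

    Assumption: buddies of order k differ only in bit k of their page index.
    """
    return page_idx ^ (1 << order)
```

With these definitions, block_bytes(10) is 4 MB, and a 16-page-frame block (order 4) must start at a multiple of 65,536 bytes.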

8.1.7.1. Data structures

1: zone->zone_mem_map: pointer to the first page descriptor of the zone.

2: An array consisting of eleven elements of type free_area, one element for each group size. The array is stored in the free_area field of the zone descriptor; zone->free_area[k] describes the free blocks of 2^k page frames.

8.1.7.2. Allocating a block

The __rmqueue( ) function is used to find a free block in a zone.

8.1.7.3. Freeing a block

The __free_pages_bulk( ) / __free_one_page( ) function implements the buddy system strategy for freeing page frames.
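
To make the allocation and freeing strategy concrete, here is a minimal Python model of one zone. It is a sketch under simplifying assumptions, not the kernel's code: page frames are plain integers, each free_area[k] is a set of block start indexes instead of a linked list of page descriptors, and the class and method names are hypothetical.

```python
MAX_ORDER = 11


class BuddyZone:
    def __init__(self, nr_pages):
        top = MAX_ORDER - 1
        if nr_pages % (1 << top) != 0:
            raise ValueError("model expects a multiple of 1024 page frames")
        # free_area[k] holds the first page index of every free 2**k block.
        self.free_area = [set() for _ in range(MAX_ORDER)]
        for start in range(0, nr_pages, 1 << top):
            self.free_area[top].add(start)

    def alloc(self, order):
        """Find a free block of 2**order frames, splitting a larger one if needed."""
        for k in range(order, MAX_ORDER):
            if self.free_area[k]:
                start = min(self.free_area[k])
                self.free_area[k].remove(start)
                while k > order:
                    k -= 1
                    # Keep the lower half, put the upper half back as free.
                    self.free_area[k].add(start + (1 << k))
                return start
        return None

    def free(self, start, order):
        """Release a block, merging it with its buddy as long as the buddy is free."""
        while order < MAX_ORDER - 1:
            buddy = start ^ (1 << order)
            if buddy not in self.free_area[order]:
                break
            self.free_area[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_area[order].add(start)
```

Allocating two single frames from a fresh 1024-frame zone splits the only order-10 block all the way down; freeing both frames again coalesces everything back into one order-10 block.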

8.1.8. The Per-CPU Page Frame Cache

The main data structure implementing the per-CPU page frame cache is an array of per_cpu_pageset data structures stored in the pageset field of the memory zone descriptor. The array includes one element for each CPU; this element, in turn, consists of two per_cpu_pages descriptors, one for the hot cache and the other for the cold cache. The fields of the per_cpu_pages descriptor are listed in Table 8-7.

Table 8-7. The fields of the per_cpu_pages descriptor

Type              Name   Description
int               count  Number of page frames in the cache
int               low    Low watermark for cache replenishing
int               high   High watermark for cache depletion
int               batch  Number of page frames to be added or subtracted from the cache
struct list_head  list   List of descriptors of the page frames included in the cache

8.1.8.1. Allocating page frames through the per-CPU page frame caches

buffered_rmqueue( )

8.1.8.2. Releasing page frames to the per-CPU page frame caches

free_hot_cold_page( )
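
The role of the low, high, and batch fields can be illustrated with a small Python model of a single per_cpu_pages descriptor. This is a sketch under assumptions: the exact comparisons against the watermarks are my reading of the mechanism, the buddy system is replaced by two callbacks, and pcp_alloc and pcp_free are hypothetical names standing in for the parts of buffered_rmqueue( ) and free_hot_cold_page( ) that touch the cache.

```python
class PerCpuPages:
    def __init__(self, low, high, batch):
        self.low, self.high, self.batch = low, high, batch
        self.list = []          # most recently freed page frame is at the end

    @property
    def count(self):
        return len(self.list)


def pcp_alloc(pcp, buddy_alloc):
    """Take one page frame from the cache, replenishing it first if needed."""
    if pcp.count <= pcp.low:
        for _ in range(pcp.batch):
            page = buddy_alloc()
            if page is None:
                break
            pcp.list.append(page)
    return pcp.list.pop() if pcp.list else None


def pcp_free(pcp, page, buddy_free):
    """Put one page frame into the cache, depleting it first if it is full."""
    if pcp.count >= pcp.high:
        for _ in range(min(pcp.batch, pcp.count)):
            buddy_free(pcp.list.pop(0))   # give back the oldest frames
    pcp.list.append(page)
```

The cache behaves as a stack, so the frame handed out is the one freed most recently, which is the one most likely to still be in the hardware cache.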

8.1.9. The Zone Allocator

__alloc_pages( ) --> zone_watermark_ok( )

__free_pages( ) --> __free_one_page( )

8.2. Memory Area Management

8.2.1. The Slab Allocator

Figure 8-3. The slab allocator components

8.2.2. Cache Descriptor

1: Each cache is described by a structure of type kmem_cache_t (e.g., kmem_cache).

Table 8-8. The fields of the kmem_cache_t descriptor

Type                    Name         Description
struct array_cache *[]  array        Per-CPU array of pointers to local caches of free objects (see the section "Local Caches of Free Slab Objects" later in this chapter).
unsigned int            batchcount   Number of objects to be transferred in bulk to or from the local caches.
unsigned int            limit        Maximum number of free objects in the local caches. This is tunable.
struct kmem_list3       lists        See next table.
unsigned int            objsize      Size of the objects included in the cache.
unsigned int            flags        Set of flags that describes permanent properties of the cache.
unsigned int            num          Number of objects packed into a single slab. (All slabs of the cache have the same size.)
unsigned int            free_limit   Upper limit of free objects in the whole slab cache.
spinlock_t              spinlock     Cache spin lock.
unsigned int            gfporder     Logarithm of the number of contiguous page frames included in a single slab.
unsigned int            gfpflags     Set of flags passed to the buddy system function when allocating page frames.
size_t                  colour       Number of colors for the slabs (see the section "Slab Coloring" later in this chapter).
unsigned int            colour_off   Basic alignment offset in the slabs.
unsigned int            colour_next  Color to use for the next allocated slab.
kmem_cache_t *          slabp_cache  Pointer to the general slab cache containing the slab descriptors (NULL if internal slab descriptors are used; see next section).
unsigned int            slab_size    The size of a single slab.
unsigned int            dflags       Set of flags that describe dynamic properties of the cache.
void *                  ctor         Pointer to constructor method associated with the cache.
void *                  dtor         Pointer to destructor method associated with the cache.
const char *            name         Character array storing the name of the cache.
struct list_head        next         Pointers for the doubly linked list of cache descriptors.

The CFLGS_OFF_SLAB flag in the flags field of the cache descriptor is set to one if the slab descriptor is stored outside the slab; it is set to zero otherwise.

2: The lists field of the kmem_cache_t descriptor (of type struct kmem_list3).

8.2.3. Slab Descriptor

Depending on the CFLGS_OFF_SLAB flag in kmem_cache->flags, a slab descriptor is one of the following:

External slab descriptor
    Stored outside the slab (CFLGS_OFF_SLAB is set to one).

Internal slab descriptor
    Stored inside the slab (CFLGS_OFF_SLAB is set to zero).

Figure 8-4. Relationship between cache and slab descriptors

8.2.4. General and Specific Caches

General caches are:

1: A first cache called kmem_cache whose objects are the cache descriptors of the remaining caches used by the kernel. The cache_cache variable contains the descriptor of this special cache.

2: Several additional caches contain general purpose memory areas. The range of the memory area sizes typically includes 13 geometrically distributed sizes. A table called malloc_sizes (whose elements are of type cache_sizes) points to 26 cache descriptors associated with memory areas of size 32, 64, 128, 256, 512, 1,024, 2,048, 4,096, 8,192, 16,384, 32,768, 65,536, and 131,072 bytes. For each size, there are two caches: one suitable for ISA DMA allocations and the other for normal allocations.
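
The 13 geometrically distributed sizes, and the way a request is matched to one of them, can be sketched as follows. The function name general_cache_size is hypothetical; the assumption is simply that a request is served by the smallest general cache whose objects are large enough.

```python
# 32, 64, 128, ..., 131072: thirteen sizes, each twice the previous one.
GENERAL_CACHE_SIZES = [32 << i for i in range(13)]


def general_cache_size(nbytes):
    """Object size of the smallest general cache that can hold nbytes, or None."""
    for size in GENERAL_CACHE_SIZES:
        if nbytes <= size:
            return size
    return None   # larger than the biggest general cache
```

A 100-byte request therefore lands in the 128-byte cache and wastes 28 bytes, which is the internal fragmentation the geometric spacing keeps below 50 percent.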

Specific caches

Specific caches are created by the kmem_cache_create( ) function. Depending on the parameters, the function first determines the best way to handle the new cache (for instance, whether to include the slab descriptor inside or outside of the slab). It then allocates a cache descriptor for the new cache from the cache_cache general cache and inserts the descriptor in the cache_chain list of cache descriptors (the insertion is done after having acquired the cache_chain_sem semaphore that protects the list from concurrent accesses).

It is also possible to destroy a cache and remove it from the cache_chain list by invoking kmem_cache_destroy( ). This function is mostly useful to modules that create their own caches when loaded and destroy them when unloaded. To avoid wasting memory space, the kernel must destroy all slabs before destroying the cache itself. The kmem_cache_shrink( ) function destroys all the slabs in a cache by invoking slab_destroy( ) iteratively (see the later section "Releasing a Slab from a Cache").

The names of all general and specific caches can be obtained at runtime by reading /proc/slabinfo; this file also specifies the number of free objects and the number of allocated objects in each cache.

8.2.5. Interfacing the Slab Allocator with the Zoned Page Frame Allocator

kmem_getpages( )

kmem_freepages( )

8.2.6. Allocating a Slab to a Cache

cache_grow( )

8.2.7. Releasing a Slab from a Cache

slab_destroy( )

8.2.8. Object Descriptor

Internal object descriptors

External object descriptors

The first object descriptor in the array describes the first object in the slab, and so on. An object descriptor is simply an unsigned short integer, which is meaningful only when the object is free. It contains the index of the next free object in the slab, thus implementing a simple list of free objects inside the slab. The object descriptor of the last element in the free object list is marked by the conventional value BUFCTL_END (0xffff).
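
The index-based free list described above can be modeled in a few lines of Python. This is only a sketch: the class name Slab and its methods are hypothetical, and the free and inuse fields are assumed to play the role of the corresponding fields of the slab descriptor, which these notes do not list.

```python
BUFCTL_END = 0xFFFF   # marks the last element of the free object list


class Slab:
    def __init__(self, num):
        # Descriptor i holds the index of the next free object after object i.
        self.bufctl = [i + 1 for i in range(num)]
        self.bufctl[num - 1] = BUFCTL_END
        self.free = 0     # index of the first free object
        self.inuse = 0    # number of objects currently allocated

    def alloc(self):
        """Unlink and return the index of the first free object, or None."""
        if self.free == BUFCTL_END:
            return None
        idx = self.free
        self.free = self.bufctl[idx]
        self.inuse += 1
        return idx

    def release(self, idx):
        """Push object idx back onto the head of the free list."""
        self.bufctl[idx] = self.free
        self.free = idx
        self.inuse -= 1
```

Because a descriptor is consulted only while its object is free, no extra memory is needed to track allocated objects.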

Figure 8-5. Relationships between slab and object descriptors

8.2.9. Aligning Objects in Memory

8.2.10. Slab Coloring

Objects that have the same offset within different slabs will end up mapped in the same cache line. The cache hardware might therefore waste memory cycles transferring two objects from the same cache line back and forth to different RAM locations, while other cache lines go underutilized.

The slab allocator tries to reduce this behavior by a policy called slab coloring: different arbitrary values called colors are assigned to the slabs.

Figure 8-6. Slab with color col and alignment aln
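
A rough numeric sketch of the coloring policy follows. The notes give only the idea, so the formulas here are assumptions: free stands for the unused bytes in a slab, aln for the object alignment, dsize for the size of the slab and object descriptors placed before the first object, and all three function names are hypothetical.

```python
def num_colors(free, aln):
    """Number of distinct colors; at least one even if free is smaller than aln."""
    return max(free // aln, 1)


def first_object_offset(col, aln, dsize):
    """Offset of the first object in a slab of color col (assumed layout)."""
    return col * aln + dsize


def color_sequence(free, aln, nr_slabs):
    """Colors assigned to successive slabs, cycling through the available ones."""
    colors = num_colors(free, aln)
    return [i % colors for i in range(nr_slabs)]
```

With 100 unused bytes and a 32-byte alignment there are three colors, so successive slabs start their objects at three different offsets before the pattern repeats.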

8.2.11. Local Caches of Free Slab Objects

Each cache of the slab allocator includes a per-CPU data structure consisting of a small array of pointers to freed objects, called the slab local cache. The slab data structures get involved only when the local cache underflows or overflows.

kmem_cache->array

Table 8-11. The fields of the array_cache structure

Type          Name        Description
unsigned int  avail       Number of pointers to available objects in the local cache. The field also acts as the index of the first free slot in the cache.
unsigned int  limit       Size of the local cache, that is, the maximum number of pointers in the local cache.
unsigned int  batchcount  Chunk size for local cache refill or emptying.
unsigned int  touched     Flag set to 1 if the local cache has been recently used.
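
The following Python model shows how avail, limit, and batchcount interact when the local cache underflows or overflows. It is a sketch, not the kernel's code: the slab layer is replaced by the refill and flush callbacks, limit and batchcount are assumed to be at least 1, and local_alloc and local_free are hypothetical names.

```python
class ArrayCache:
    def __init__(self, limit, batchcount):
        self.limit = limit
        self.batchcount = batchcount
        self.entries = [None] * limit   # the array of object pointers
        self.avail = 0                  # also the index of the first free slot
        self.touched = 0


def local_alloc(ac, refill):
    """Pop one object; on underflow ask refill(n) for up to batchcount objects."""
    if ac.avail == 0:
        for obj in refill(ac.batchcount)[:ac.limit]:
            ac.entries[ac.avail] = obj
            ac.avail += 1
    if ac.avail == 0:
        return None
    ac.touched = 1
    ac.avail -= 1
    return ac.entries[ac.avail]


def local_free(ac, obj, flush):
    """Push one object; on overflow hand the oldest batchcount objects to flush."""
    if ac.avail == ac.limit:
        n = min(ac.batchcount, ac.avail)
        flush(ac.entries[:n])
        # Slide the surviving pointers down to the start of the array.
        ac.entries[:ac.avail - n] = ac.entries[n:ac.avail]
        ac.avail -= n
    ac.entries[ac.avail] = obj
    ac.avail += 1
```

In the common case neither callback runs, which is the point of the local cache: allocation and release touch only the per-CPU array.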

8.2.12. Allocating a Slab Object

kmem_cache_alloc( ) --> cache_alloc_refill( )

8.2.13. Freeing a Slab Object

kmem_cache_free( ) --> cache_flusharray( )

8.2.14. General Purpose Objects

kmalloc( )

kfree( )

8.2.15. Memory Pools

Memory pools should not be confused with the reserved page frames described in the earlier section "The Pool of Reserved Page Frames": those page frames can be used only to satisfy atomic memory allocation requests issued by interrupt handlers or inside critical regions.

A memory pool, instead, is a reserve of dynamic memory that can be used only by a specific kernel component, namely the "owner" of the pool.

A memory pool is described by a mempool_t object.

Table 8-12. The fields of the mempool_t object

Type               Name       Description
spinlock_t         lock       Spin lock protecting the object fields
int                min_nr     Minimum number of elements in the memory pool
int                curr_nr    Current number of elements in the memory pool
void **            elements   Pointer to an array of pointers to the reserved elements
void *             pool_data  Private data available to the pool's owner
mempool_alloc_t *  alloc      Method to allocate an element
mempool_free_t *   free       Method to free an element
wait_queue_head_t  wait       Wait queue used when the memory pool is empty

mempool_create( )

mempool_destroy( )

mempool_alloc( )

mempool_free( )
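
The idea behind the four functions can be sketched with a small Python model. It is written under stated assumptions: the reserve is touched only when the owner's alloc method fails, a freed element refills the reserve before anything is handed back to the free method, and the wait queue is left out (the model returns None where a real pool could make the caller sleep). The names MemPool, pool_alloc, and pool_free are hypothetical.

```python
class MemPool:
    def __init__(self, min_nr, alloc, free):
        self.min_nr = min_nr
        self.alloc_fn = alloc
        self.free_fn = free
        # Fill the reserve up front, as pool creation is expected to do.
        self.elements = [alloc() for _ in range(min_nr)]

    @property
    def curr_nr(self):
        return len(self.elements)


def pool_alloc(pool):
    """Try the normal allocator first; fall back to the reserved elements."""
    element = pool.alloc_fn()
    if element is not None:
        return element
    if pool.elements:
        return pool.elements.pop()
    return None   # reserve exhausted; no sleeping in this model


def pool_free(pool, element):
    """Refill the reserve if it is below min_nr, otherwise really free."""
    if pool.curr_nr < pool.min_nr:
        pool.elements.append(element)
    else:
        pool.free_fn(element)
```

The reserve thus acts as a last resort for the pool's owner and is topped up again as soon as elements are released.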

8.3. Noncontiguous Memory Area Management

It makes sense to consider an allocation scheme based on noncontiguous page frames accessed through contiguous linear addresses. The main advantage of this scheme is that it avoids external fragmentation.

8.3.1. Linear Addresses of Noncontiguous Memory Areas

Figure 8-7. The linear address interval starting from PAGE_OFFSET

1: The beginning of the area includes the linear addresses that map the first 896 MB of RAM (see the section "Process Page Tables" in Chapter 2); the linear address that corresponds to the end of the directly mapped physical memory is stored in the high_memory variable.

2: The end of the area contains the fix-mapped linear addresses (see the section "Fix-Mapped Linear Addresses" in Chapter 2).

3: The remaining linear addresses can be used for noncontiguous memory areas. A safety interval of size 8 MB (macro VMALLOC_OFFSET) is inserted between the end of the physical memory mapping and the first memory area; its purpose is to "capture" out-of-bounds memory accesses. For the same reason, additional safety intervals of size 4 KB are inserted to separate noncontiguous memory areas.

8.3.2. Descriptors of Noncontiguous Memory Areas

Each noncontiguous memory area is associated with a descriptor of type vm_struct.

Table 8-13. The fields of the vm_struct descriptor

Type                Name       Description
void *              addr       Linear address of the first memory cell of the area
unsigned long       size       Size of the area plus 4,096 (inter-area safety interval)
unsigned long       flags      Type of memory mapped by the noncontiguous memory area
struct page **      pages      Pointer to array of nr_pages pointers to page descriptors
unsigned int        nr_pages   Number of pages filled by the area
unsigned long       phys_addr  Set to 0 unless the area has been created to map the I/O shared memory of a hardware device
struct vm_struct *  next       Pointer to next vm_struct structure

1: These descriptors are inserted in a simple list by means of the next field; the address of the first element of the list is stored in the vmlist variable.

2: The flags field identifies the type of memory mapped by the area:
    VM_ALLOC for pages obtained by means of vmalloc( ),
    VM_MAP for already allocated pages mapped by means of vmap( ) (see the next section), and
    VM_IOREMAP for on-board memory of hardware devices mapped by means of ioremap( ) (see Chapter 13).

3: The get_vm_area( ) function looks for a free range of linear addresses between VMALLOC_START and VMALLOC_END.
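
The search for a free range can be sketched as a first-fit scan over an address-ordered list, with the 4 KB safety interval folded into the size of each area as in Table 8-13. This is a Python model under assumptions: the list holds plain (addr, size) tuples instead of vm_struct descriptors, the first-fit policy is my simplification, and find_vm_range is a hypothetical name, not the kernel function.

```python
PAGE_SIZE = 4096


def page_align(nbytes):
    """Round nbytes up to a multiple of the page size."""
    return (nbytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def find_vm_range(vmlist, size, start, end):
    """Reserve the first free range of linear addresses that fits.

    vmlist is a list of (addr, size) tuples kept sorted by address; the
    stored size includes the 4 KB inter-area safety interval.
    Returns the start address of the new area, or None if nothing fits.
    """
    size = page_align(size) + PAGE_SIZE
    addr = start
    for i, (area_addr, area_size) in enumerate(vmlist):
        if addr + size <= area_addr:
            vmlist.insert(i, (addr, size))
            return addr
        addr = max(addr, area_addr + area_size)
    if addr + size <= end:
        vmlist.append((addr, size))
        return addr
    return None
```

Because the guard page is part of each recorded size, two neighboring areas are always separated by at least 4 KB of unmapped linear addresses.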

8.3.3. Allocating a Noncontiguous Memory Area

vmalloc( )

The last crucial step consists of fiddling with the page table entries used by the kernel to indicate that each page frame allocated to the noncontiguous memory area is now associated with a linear address included in the interval of contiguous linear addresses yielded by vmalloc( ). This is what map_vm_area( ) does.

8.3.4. Releasing a Noncontiguous Memory Area

vfree( ) --> remove_vm_area( )

Chapter 9. Process Address Space

9.1. The Process's Address Space

The kernel represents intervals of linear addresses by means of resources called memory regions.

Table 9-1. System calls related to memory region creation and deletion

System call          Description
brk( )               Changes the heap size of the process
execve( )            Loads a new executable file, thus changing the process address space
_exit( )             Terminates the current process and destroys its address space
fork( )              Creates a new process, and thus a new address space
mmap( ), mmap2( )    Creates a memory mapping for a file, thus enlarging the process address space
mremap( )            Expands or shrinks a memory region
remap_file_pages( )  Creates a non-linear mapping for a file (see Chapter 16)
munmap( )            Destroys a memory mapping for a file, thus contracting the process address space
shmat( )             Attaches a shared memory region
shmdt( )             Detaches a shared memory region

9.2. The Memory Descriptor

mm_struct

Table 9-2. The fields of the memory descriptor

Type                      Field              Description
struct vm_area_struct *   mmap               Pointer to the head of the list of memory region objects
struct rb_root            mm_rb              Pointer to the root of the red-black tree of memory region objects
struct vm_area_struct *   mmap_cache         Pointer to the last referenced memory region object
unsigned long (*)( )      get_unmapped_area  Method that searches an available linear address interval in the process address space
void (*)( )               unmap_area         Method invoked when releasing a linear address interval
unsigned long             mmap_base          Identifies the linear address of the first allocated anonymous memory region or file memory mapping (see the section "Program Segments and Process Memory Regions" in Chapter 20)
unsigned long             free_area_cache    Address from which the kernel will look for a free interval of linear addresses in the process address space
pgd_t *                   pgd                Pointer to the Page Global Directory
atomic_t                  mm_users           Secondary usage counter
atomic_t                  mm_count           Main usage counter
struct rw_semaphore       mmap_sem           Memory regions' read/write semaphore
spinlock_t                page_table_lock    Memory regions' and Page Tables' spin lock
struct list_head          mmlist             Pointers to adjacent elements in the list of memory descriptors
unsigned long             start_code         Initial address of executable code
unsigned long             end_code           Final address of executable code
unsigned long             start_data
