
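Linux I/O Architecture and Device Drivers

The Device Driver Model

Based on Linux 3.13

The sysfs Filesystem

sysfs is a filesystem that lets user-mode applications access kernel-internal data structures. It is mounted on /sys, and its top-level directories are: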

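block
    Block devices, independent of the bus they are attached to
devices
    All hardware devices recognized by the kernel, organized by the bus they are connected to
bus
    The buses in the system that connect devices
dev
    Devices registered in the kernel, split into block and char; each entry is named by its device number
    (major:minor) and is a symbolic link to the device under /sys/devices
class
    Device classes in the system (sound cards, network cards, etc.); one class may contain devices attached
    through different buses and therefore driven by different drivers
power
    Files for handling the power state of some hardware devices
firmware
    Files for handling the firmware of some hardware devices
module
    Information about all compiled modules
fs
    Some special filesystems, such as cgroup and fuse
hypervisor
    Related to Xen virtualization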

kobject

include/linux/kobject.h
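The core data structure of the device driver model is a simple, generic structure called kobject. Several kobjects are grouped into a kset; since a kset itself embeds a kobject, ksets can in turn be grouped into a higher-level kset. The result is a tree that corresponds to the files under /sys.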

                 kset(kobject)
       /             / ...           \
   kset(kobject)    kset(kobject)     kset(kobject)
   /       \           /       \         /       \  
 kobject   kobject   kobject   kobject  kobject  kobject

struct kobject {
    const char          *name;    // string holding the name of the container
    struct list_head    entry;    // links the kobject into the list it belongs to
    struct kobject      *parent;  // parent kobject
    struct kset         *kset;    // kset containing this kobject
    struct kobj_type    *ktype;   // type of the kobject
    struct sysfs_dirent *sd;      // sysfs_dirent of the sysfs directory for this kobject
    struct kref         kref;     // reference count
#ifdef CONFIG_DEBUG_KOBJECT_RELEASE
    struct delayed_work release;
#endif
    unsigned int state_initialized:1;
    unsigned int state_in_sysfs:1;
    unsigned int state_add_uevent_sent:1;
    unsigned int state_remove_uevent_sent:1;
    unsigned int uevent_suppress:1;
};

struct kobj_type {
    void (*release)(struct kobject *kobj);   // frees the kobject (and its container)
    const struct sysfs_ops *sysfs_ops;       // sysfs operations: show() and store(), for reads and writes
    struct attribute **default_attrs;        // NULL-terminated array of default sysfs attributes
    const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
    const void *(*namespace)(struct kobject *kobj);
};

struct kset {
    struct list_head list;                    // head of the list of kobjects in this kset
    spinlock_t list_lock;                     // lock protecting traversal of the kobject list
    struct kobject kobj;                      // kobject embedded in the kset
    const struct kset_uevent_ops *uevent_ops; // uevent operations shared by all kobjects in the kset
};
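The kref field is what makes a kobject's lifetime automatic: each user takes a reference, and the container is freed through ktype->release when the last reference is dropped. Below is a minimal userspace sketch of that get/put pattern, assuming C11 atomics in place of the kernel's atomic_t; mini_kref, mini_kref_init(), mini_kref_get(), mini_kref_put() and demo_release() are hypothetical names modelled on kref_init()/kref_get()/kref_put(), not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace model of struct kref; the kernel version wraps an atomic_t. */
struct mini_kref {
    atomic_int refcount;
};

/* Like kref_init(): a new object starts with one reference. */
static void mini_kref_init(struct mini_kref *k)
{
    atomic_init(&k->refcount, 1);
}

/* Like kref_get(): take an additional reference. */
static void mini_kref_get(struct mini_kref *k)
{
    atomic_fetch_add(&k->refcount, 1);
}

/* Like kref_put(): drop a reference and call @release when the count
 * reaches zero (for a kobject this ends in ktype->release).
 * Returns 1 if the object was released, 0 otherwise. */
static int mini_kref_put(struct mini_kref *k,
                         void (*release)(struct mini_kref *))
{
    if (atomic_fetch_sub(&k->refcount, 1) == 1) {
        release(k);
        return 1;
    }
    return 0;
}

/* Demo release callback: only counts how often it runs. */
static int released;

static void demo_release(struct mini_kref *k)
{
    (void)k;
    released++;
}
```

After mini_kref_init() and one mini_kref_get() the count is 2, so the first mini_kref_put() returns 0 and only the second one invokes the release callback.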

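To make a kobject or kset appear in the sysfs tree, it must first be registered. The directory corresponding to a kobject always appears inside the directory of its parent kobject, so the structure of the sysfs tree describes the hierarchical relationships among the registered kobjects and among the container objects (ksets).
kset_register() and kset_unregister() register and unregister a kset, respectively.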
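Because a kobject's directory always sits inside its parent's directory, an object's sysfs path can be recovered just by following parent pointers. The sketch below shows this in userspace; struct mini_kobject and mini_kobject_path() are hypothetical, and the function only imitates the two-pass computation that the kernel's kobject_get_path() performs (the result is relative to /sys).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct kobject: only the name and the parent
 * pointer are modelled, since those decide where the directory appears. */
struct mini_kobject {
    const char *name;
    struct mini_kobject *parent;
};

/* Build the sysfs-relative path of @kobj into @buf by following parent
 * pointers. Returns the path length, or -1 if @len is too small. */
static int mini_kobject_path(const struct mini_kobject *kobj,
                             char *buf, size_t len)
{
    const struct mini_kobject *k;
    size_t total = 0, pos;

    assert(kobj != NULL);

    /* Pass 1: each level contributes "/" + name. */
    for (k = kobj; k; k = k->parent)
        total += 1 + strlen(k->name);
    if (total + 1 > len)
        return -1;

    /* Pass 2: fill the buffer from the end, leaf first. */
    buf[total] = '\0';
    pos = total;
    for (k = kobj; k; k = k->parent) {
        size_t n = strlen(k->name);
        pos -= n;
        memcpy(buf + pos, k->name, n);
        buf[--pos] = '/';
    }
    return (int)total;
}
```

For the chain devices → pci0000:00 → 0000:00:1f.2 it yields "/devices/pci0000:00/0000:00:1f.2", i.e. the directory /sys/devices/pci0000:00/0000:00:1f.2.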

Components of the Device Driver Model

include/linux/device.h
The device driver model is built on a few basic data structures:

/**
 * struct device - The basic device structure
 * @parent: The device's "parent" device, the device to which it is attached.
 *      In most cases, a parent device is some sort of bus or host
 *      controller. If parent is NULL, the device, is a top-level device,
 *      which is not usually what you want.
 * @p:      Holds the private data of the driver core portions of the device.
 *      See the comment of the struct device_private for detail.
 * @kobj:   A top-level, abstract class from which other classes are derived.
 * @init_name:  Initial name of the device.
 * @type:   The type of device.
 *      This identifies the device type and carries type-specific
 *      information.
 * @mutex:  Mutex to synchronize calls to its driver.
 * @bus:    Type of bus device is on.
 * @driver: Which driver has allocated this
 * @platform_data: Platform data specific to the device.
 *      Example: For devices on custom boards, as typical of embedded
 *      and SOC based hardware, Linux often uses platform_data to point
 *      to board-specific structures describing devices and how they
 *      are wired.  That can include what ports are available, chip
 *      variants, which GPIO pins act in what additional roles, and so
 *      on.  This shrinks the "Board Support Packages" (BSPs) and
 *      minimizes board-specific #ifdefs in drivers.
 * @power:  For device power management.
 *      See Documentation/power/devices.txt for details.
 * @pm_domain:  Provide callbacks that are executed during system suspend,
 *      hibernation, system resume and during runtime PM transitions
 *      along with subsystem-level and driver-level callbacks.
 * @pins:   For device pin management.
 *      See Documentation/pinctrl.txt for details.
 * @numa_node:  NUMA node this device is close to.
 * @dma_mask:   Dma mask (if dma'ble device).
 * @coherent_dma_mask: Like dma_mask, but for alloc_coherent mapping as not all
 *      hardware supports 64-bit addresses for consistent allocations
 *      such descriptors.
 * @dma_parms:  A low level driver may set these to teach IOMMU code about
 *      segment limitations.
 * @dma_pools:  Dma pools (if dma'ble device).
 * @dma_mem:    Internal for coherent mem override.
 * @cma_area:   Contiguous memory area for dma allocations
 * @archdata:   For arch-specific additions.
 * @of_node:    Associated device tree node.
 * @acpi_node:  Associated ACPI device node.
 * @devt:   For creating the sysfs "dev".
 * @id:     device instance
 * @devres_lock: Spinlock to protect the resource of the device.
 * @devres_head: The resources list of the device.
 * @knode_class: The node used to add the device to the class list.
 * @class:  The class of the device.
 * @groups: Optional attribute groups.
 * @release:    Callback to free the device after all references have
 *      gone away. This should be set by the allocator of the
 *      device (i.e. the bus driver that discovered the device).
 * @iommu_group: IOMMU group the device belongs to.
 *
 * @offline_disabled: If set, the device is permanently online.
 * @offline:    Set after successful invocation of bus type's .offline().
 *
 * At the lowest level, every device in a Linux system is represented by an
 * instance of struct device. The device structure contains the information
 * that the device model core needs to model the system. Most subsystems,
 * however, track additional information about the devices they host. As a
 * result, it is rare for devices to be represented by bare device structures;
 * instead, that structure, like kobject structures, is usually embedded within
 * a higher-level representation of the device.
 */
struct device {
    struct device       *parent;   

    struct device_private   *p;  // driver-core private data

    struct kobject kobj;  // embedded kobject
    const char      *init_name; /* initial name of the device */
    const struct device_type *type; // device type: carries type-specific information and common operations

    struct mutex        mutex;  /* mutex to synchronize calls to its driver.    */

    struct bus_type *bus;       /* type of bus device is on */
    struct device_driver *driver;   /* which driver has allocated this device */
    void    *platform_data; /* Platform specific data, device
                       core doesn't touch it */
    struct dev_pm_info  power;
    struct dev_pm_domain    *pm_domain;

#ifdef CONFIG_PINCTRL
    struct dev_pin_info *pins;
#endif

#ifdef CONFIG_NUMA
    int     numa_node;  /* NUMA node this device is close to */
#endif
    u64     *dma_mask;  /* dma mask (if dma'able device) */
    u64     coherent_dma_mask;/* Like dma_mask, but for
                         alloc_coherent mappings as
                         not all hardware supports
                         64 bit addresses for consistent
                         allocations such descriptors. */

    struct device_dma_parameters *dma_parms;

    struct list_head    dma_pools;  /* dma pools (if dma'ble) */

    struct dma_coherent_mem *dma_mem; /* internal for coherent mem override */
#ifdef CONFIG_DMA_CMA
    struct cma *cma_area;       /* contiguous memory area for dma allocations */
#endif
    /* arch specific additions */
    struct dev_archdata archdata;

    struct device_node  *of_node; /* associated device tree node */
    struct acpi_dev_node    acpi_node; /* associated ACPI device node */

    dev_t       devt;   /* dev_t, creates the sysfs "dev" */
    u32         id; /* device instance */

    spinlock_t      devres_lock;
    struct list_head    devres_head; // list of the device's managed resources

    struct klist_node   knode_class;
    struct class        *class;
    const struct attribute_group **groups;  /* optional groups */

    void    (*release)(struct device *dev);
    struct iommu_group  *iommu_group;

    bool            offline_disabled:1;
    bool            offline:1;
};

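device_register() inserts a new device object into the device driver model. Through its embedded kobject, the device is linked into the global kobject hierarchy, and it is then linked into other subsystems such as its bus and class.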
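Embedding the kobject, rather than pointing to one, is what lets generic kobject code hand back a pointer from which the enclosing struct device can be recovered with container_of(). A minimal userspace sketch, assuming trimmed hypothetical structs (mini_kobject, mini_device) and a simplified mini_container_of() macro that omits the kernel version's type check:

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(): subtract the member's offset
 * from the member's address to reach the enclosing structure. */
#define mini_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed stand-ins for kobject and device. */
struct mini_kobject {
    const char *name;
    int refcount;
};

struct mini_device {
    struct mini_device *parent;
    struct mini_kobject kobj;    /* embedded, as in struct device */
    const char *init_name;
};

/* Generic code only sees the kobject; this recovers the device from it. */
static struct mini_device *mini_kobj_to_dev(struct mini_kobject *kobj)
{
    return mini_container_of(kobj, struct mini_device, kobj);
}
```

The same trick works for any structure that embeds a kobject, which is why the driver core can keep one generic hierarchy for devices, drivers, buses and classes.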

/**
 * struct device_driver - The basic device driver structure
 * @name:   Name of the device driver.
 * @bus:    The bus which the device of this driver belongs to.
 * @owner:  The module owner.
 * @mod_name:   Used for built-in modules.
 * @suppress_bind_attrs: Disables bind/unbind via sysfs.
 * @of_match_table: The open firmware table.
 * @acpi_match_table: The ACPI match table.
 * @probe:  Called to query the existence of a specific device,
 *      whether this driver can work with it, and bind the driver
 *      to a specific device.
 * @remove: Called when the device is removed from the system to
 *      unbind a device from this driver.
 * @shutdown:   Called at shut-down time to quiesce the device.
 * @suspend:    Called to put the device to sleep mode. Usually to a
 *      low power state.
 * @resume: Called to bring a device from sleep mode.
 * @groups: Default attributes that get created by the driver core
 *      automatically.
 * @pm:     Power management operations of the device which matched
 *      this driver.
 * @p:      Driver core's private data, no one other than the driver
 *      core can touch this.
 *
 * The device driver-model tracks all of the drivers known to the system.
 * The main reason for this tracking is to enable the driver core to match
 * up drivers with new devices. Once drivers are known objects within the
 * system, however, a number of other things become possible. Device drivers
 * can export information and configuration variables that are independent
 * of any specific device.
 */
struct device_driver {
    const char      *name;
    struct bus_type     *bus;

    struct module       *owner;
    const char      *mod_name;  /* used for built-in modules */

    bool suppress_bind_attrs;   /* disables bind/unbind via sysfs. In the kernel, bind/unbind is the mechanism that lets user space manually bind a driver to, or unbind it from, a specific device. */

    const struct of_device_id   *of_match_table;  // Open Firmware (device tree) table used to match devices
    const struct acpi_device_id *acpi_match_table; // table used to match ACPI-enumerated devices

    int (*probe) (struct device *dev);
    int (*remove) (struct device *dev);
    void (*shutdown) (struct device *dev);
    int (*suspend) (struct device *dev, pm_message_t state);
    int (*resume) (struct device *dev);
    const struct attribute_group **groups;  // default attributes created by the driver

    const struct dev_pm_ops *pm;  // power management operations

    struct driver_private *p;  // private data
};

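The probe method of device_driver is invoked when the driver core finds a device that this driver may be able to handle; the corresponding function probes the hardware to examine the device further.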
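The division of labour between the bus's match() and the driver's probe() can be sketched in userspace as follows. All names here (mini_driver, mini_device, mini_bus_match(), mini_try_bind(), demo_probe()) are hypothetical, and matching by device name against a NULL-terminated table is a simplification: real buses compare their own ID tables or device-tree compatible strings, and the core calls bus->probe instead of drv->probe when the bus provides one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mini_device;

/* Hypothetical stand-in for device_driver: a NULL-terminated table of
 * supported device names plus a probe callback. */
struct mini_driver {
    const char *name;
    const char *const *id_table;
    int (*probe)(struct mini_device *dev);
};

/* Hypothetical stand-in for device: a name and the bound driver. */
struct mini_device {
    const char *name;
    struct mini_driver *driver;   /* NULL until a driver is bound */
};

/* Plays the role of bus_type->match(): nonzero if @drv can handle @dev. */
static int mini_bus_match(struct mini_device *dev, struct mini_driver *drv)
{
    for (const char *const *id = drv->id_table; *id; id++)
        if (strcmp(*id, dev->name) == 0)
            return 1;
    return 0;
}

/* Pair a device with a driver: match first, then let probe() examine
 * the hardware, and record the binding only if probe() returns 0. */
static int mini_try_bind(struct mini_device *dev, struct mini_driver *drv)
{
    if (dev->driver || !mini_bus_match(dev, drv))
        return -1;
    if (drv->probe && drv->probe(dev) != 0)
        return -1;
    dev->driver = drv;
    return 0;
}

/* Demo probe callback: a real one would access the hardware here. */
static int demo_probe(struct mini_device *dev)
{
    return dev->name[0] != '\0' ? 0 : -1;
}
```

A device that is already bound, or that the bus does not match, never reaches probe(); that ordering keeps drivers from touching hardware they cannot handle.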

/**
 * struct bus_type - The bus type of the device
 *
 * @name:   The name of the bus.
 * @dev_name:   Used for subsystems to enumerate devices like ("foo%u", dev->id).
 * @dev_root:   Default device to use as the parent.
 * @dev_attrs:  Default attributes of the devices on the bus.
 * @bus_groups: Default attributes of the bus.
 * @dev_groups: Default attributes of the devices on the bus.
 * @drv_groups: Default attributes of the device drivers on the bus.
 * @match:  Called, perhaps multiple times, whenever a new device or driver
 *      is added for this bus. It should return a nonzero value if the
 *      given device can be handled by the given driver.
 * @uevent: Called when a device is added, removed, or a few other things
 *      that generate uevents to add the environment variables.
 * @probe:  Called when a new device or driver add to this bus, and callback
 *      the specific driver's probe to initial the matched device.
 * @remove: Called when a device removed from this bus.
 * @shutdown:   Called at shut-down time to quiesce the device.
 *
 * @online: Called to put the device back online (after offlining it).
 * @offline:    Called to put the device offline for hot-removal. May fail.
 *
 * @suspend:    Called when a device on this bus wants to go to sleep mode.
 * @resume: Called to bring a device on this bus out of sleep mode.
 * @pm:     Power management operations of this bus, callback the specific
 *      device driver's pm-ops.
 * @iommu_ops:  IOMMU specific operations for this bus, used to attach IOMMU
 *              driver implementations to a bus and allow the driver to do
 *              bus-specific setup
 * @p:      The private data of the driver core, only the driver core can
 *      touch this.
 * @lock_key:   Lock class key for use by the lock validator
 *
 * A bus is a channel between the processor and one or more devices. For the
 * purposes of the device model, all devices are connected via a bus, even if
 * it is an internal, virtual, "platform" bus. Buses can plug into each other.
 * A USB controller is usually a PCI device, for example. The device model
 * represents the actual connections between buses and the devices they control.
 * A bus is represented by the bus_type structure. It contains the name, the
 * default attributes, the bus' methods, PM operations, and the driver core's
 * private data.
 */
struct bus_type {
    const char      *name;
    const char      *dev_name;
    struct device       *dev_root;
    struct device_attribute *dev_attrs; /* use dev_groups instead */
    const struct attribute_group **bus_groups;
    const struct attribute_group **dev_groups;
    const struct attribute_group **drv_groups;

    int (*match)(struct device *dev, struct device_driver *drv);
    int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
    int (*probe)(struct device *dev);
    int (*remove)(struct device *dev);
    void (*shutdown)(struct device *dev);

    int (*online)(struct device *dev);
    int (*offline)(struct device *dev);

    int (*suspend)(struct device *dev, pm_message_t state);
    int (*resume)(struct device *dev);

    const struct dev_pm_ops *pm;

    struct iommu_ops *iommu_ops;

    struct subsys_private *p;
    struct lock_class_key lock_key;
};

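bus_type corresponds to the directories under /sys/bus, one per bus type, such as pci. Each of them (for example /sys/bus/pci) contains a devices subdirectory listing the devices attached to that bus. Since the devices already appear under /sys/devices, the entries in /sys/bus/pci/devices are symbolic links to the devices under /sys/devices.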

/**
 * struct class - device classes
 * @name:   Name of the class.
 * @owner:  The module owner.
 * @class_attrs: Default attributes of this class.
 * @dev_groups: Default attributes of the devices that belong to the class.
 * @dev_kobj:   The kobject that represents this class and links it into the hierarchy.
 * @dev_uevent: Called when a device is added, removed from this class, or a
 *      few other things that generate uevents to add the environment
 *      variables.
 * @devnode:    Callback to provide the devtmpfs.
 * @class_release: Called to release this class.
 * @dev_release: Called to release the device.
 * @suspend:    Used to put the device to sleep mode, usually to a low power
 *      state.
 * @resume: Used to bring the device from the sleep mode.
 * @ns_type:    Callbacks so sysfs can detemine namespaces.
 * @namespace:  Namespace of the device belongs to this class.
 * @pm:     The default device power management operations of this class.
 * @p:      The private data of the driver core, no one other than the
 *      driver core can touch this.
 *
 * A class is a higher-level view of a device that abstracts out low-level
 * implementation details. Drivers may see a SCSI disk or an ATA disk, but,
 * at the class level, they are all simply disks. Classes allow user space
 * to work with devices based on what they do, rather than how they are
 * connected or how they work.
 */
struct class {
    const char      *name;
    struct module       *owner;

    struct class_attribute      *class_attrs;
    const struct attribute_group    **dev_groups;
    struct kobject          *dev_kobj;

    int (*dev_uevent)(struct device *dev, struct kobj_uevent_env *env);
    char *(*devnode)(struct device *dev, umode_t *mode);

    void (*class_release)(struct class *class);
    void (*dev_release)(struct device *dev);

    int (*suspend)(struct device *dev, pm_message_t state);
    int (*resume)(struct device *dev);

    const struct kobj_ns_type_operations *ns_type;
    const void *(*namespace)(struct device *dev);

    const struct dev_pm_ops *pm;

    struct subsys_private *p;
};

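A class is a high-level abstraction of a kind of device: SCSI disks and ATA disks, for example, are both classified simply as disks. This abstraction lets user space deal only with the generic properties of disks, without caring about the underlying connection, seeking, and so on.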

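Device Files

  Unix-like systems are built on the notion of a file, an information container structured as a sequence of bytes. Accordingly, I/O devices can be treated as special files called device files, so the same system calls used to interact with regular files on disk can be used directly on I/O devices.
  Usually, a device identifier consists of the device file type (character or block) and a pair of numbers. The first, the major number, identifies the device type. The second, the minor number, identifies a specific device within the group of devices that share the same major number. All device files with the same major number and type share the same set of file operations, because they are handled by the same device driver.
  The mknod() system call creates device files. Its parameters are the device file's pathname, the device type, and the major and minor numbers.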
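Creating a node with mknod() normally requires the CAP_MKNOD capability, so the sketch below goes the other way and inspects an existing device file with stat(): S_ISCHR() tells whether it is a character device, and major()/minor() from <sys/sysmacros.h> split st_rdev into the pair of numbers described above. The helper is_char_device() is a hypothetical name; on Linux, /dev/null is conventionally character device 1:3.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

/* Returns 1 and fills *maj/*min if @path is a character device file,
 * 0 if it is some other kind of file, -1 if stat() fails. */
static int is_char_device(const char *path, unsigned *maj, unsigned *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode))
        return 0;
    *maj = major(st.st_rdev);   /* selects the driver */
    *min = minor(st.st_rdev);   /* selects the device it manages */
    return 1;
}
```

On a typical Linux system, calling it on /dev/null reports major 1 and minor 3, while a directory such as / simply returns 0.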

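User-Mode Handling of Device Files

The major and minor numbers are combined into a single value of type dev_t. To extract them, use the MAJOR and MINOR macros rather than manipulating the bits directly, so that the code does not have to change if dev_t is later widened to 64 bits.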
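Inside the kernel, dev_t is a 32-bit value holding a 12-bit major and a 20-bit minor number, and the macros in include/linux/kdev_t.h hide that layout. The userspace copy below reproduces those definitions for illustration only; the encoding user space sees in st_rdev (and glibc's makedev()) is different, which is exactly why code should go through the macros instead of shifting bits itself.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the kernel-internal dev_t layout from
 * include/linux/kdev_t.h: major in the top 12 bits, minor in the
 * low 20 bits of a 32-bit value. For illustration only. */
#define MINORBITS     20
#define MINORMASK     ((1U << MINORBITS) - 1)

#define MAJOR(dev)    ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)    ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))
```

MKDEV(8, 17) packs to (8 << 20) | 17, and MAJOR/MINOR recover 8 and 17 from it.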

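Dynamic Device Number Assignment

Because statically chosen device numbers may conflict, a driver can obtain its device numbers through dynamic allocation instead. In that case the driver cannot rely on a permanently fixed device number, so a standard way is needed to export the numbers each driver uses to user-mode applications. Usually they are exported through the dev attribute in the device's directory under /sys/class.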
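Reading such a dev attribute yields text of the form "major:minor" followed by a newline. A minimal parser, assuming exactly that format; parse_dev_attr() is a hypothetical helper, and the path in its comment is only a placeholder:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the contents of a sysfs "dev" attribute (for example the text
 * read from /sys/class/<class>/<device>/dev), i.e. "major:minor\n".
 * Returns 0 on success, -1 on malformed input. */
static int parse_dev_attr(const char *text, unsigned *maj, unsigned *min)
{
    char trailing;
    /* %c catches anything after the numbers; only a newline is allowed. */
    int n = sscanf(text, "%u:%u%c", maj, min, &trailing);

    if (n == 2 || (n == 3 && trailing == '\n'))
        return 0;
    return -1;
}
```

A user-space tool would read the attribute into a buffer, call this, and then create or open the matching device file with the recovered numbers.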

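Dynamic Device File Creation

The Linux kernel can create device files dynamically: there is no need to populate /dev with a device file for every conceivable hardware device, because device files can be created on demand. Building on the device driver model, Linux 2.6 introduced a set of tools called udev. udev is a generic device manager: it runs as a daemon and listens for the uevents that the kernel sends (over a netlink socket) when a new device is initialized or a device is removed from the system, and then performs the corresponding actions.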
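The uevents udev listens to arrive on a NETLINK_KOBJECT_UEVENT socket as an "action@devpath" header followed by NUL-separated KEY=VALUE strings (ACTION, DEVPATH, SUBSYSTEM and, for devices with a device number, MAJOR, MINOR and DEVNAME, among others). The sketch below only parses such a buffer and leaves out the socket setup; uevent_get() and field_len() are hypothetical helpers, and the message used with it is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the NUL-terminated field at @p, never looking past @max bytes. */
static size_t field_len(const char *p, size_t max)
{
    const char *nul = memchr(p, '\0', max);
    return nul ? (size_t)(nul - p) : max;
}

/* Return the value of @key in a raw uevent message of @len bytes
 * ("action@devpath" header, then NUL-separated KEY=VALUE strings), or
 * NULL if the key is absent. Assumes the message ends with a NUL. */
static const char *uevent_get(const char *msg, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t off = field_len(msg, len) + 1;   /* skip the header */

    while (off < len) {
        const char *field = msg + off;
        size_t flen = field_len(field, len - off);

        if (flen > klen && field[klen] == '=' &&
            strncmp(field, key, klen) == 0)
            return field + klen + 1;
        off += flen + 1;
    }
    return NULL;
}
```

With MAJOR, MINOR and DEVNAME extracted this way, a device manager has everything it needs to call mknod() under /dev for an "add" event.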

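VFS Handling of Device Files

When the VFS finds that the inode it is looking up corresponds to a device file, it initializes the inode's i_rdev field with the device file's major and minor numbers and sets the inode's i_fop field to the address of the def_blk_fops or def_chr_fops file operation table (this is done by init_special_inode() in fs/inode.c). This hides the difference between device files and regular files; in the end, the device-specific operations are the ones invoked.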
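The dispatch described above fits in a few lines of userspace C. This is a hypothetical model: mini_inode, mini_fops and the *_model tables stand in for struct inode, struct file_operations, def_chr_fops and def_blk_fops, and only the choice between the character and block tables is reproduced (FIFOs and sockets get other tables in the kernel).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for struct file_operations and struct inode. */
struct mini_fops {
    const char *name;
};

static const struct mini_fops def_chr_fops_model = { "def_chr_fops" };
static const struct mini_fops def_blk_fops_model = { "def_blk_fops" };

struct mini_inode {
    unsigned int mode;              /* file type and permission bits */
    unsigned int rdev;              /* device number (device files only) */
    const struct mini_fops *fop;    /* file operations table */
};

/* Modelled on init_special_inode(): record the device number and point
 * the inode at the generic char or block table, whose open method
 * later locates the real driver from the device number. */
static void mini_init_special_inode(struct mini_inode *inode,
                                    unsigned int mode, unsigned int rdev)
{
    inode->mode = mode;
    inode->rdev = 0;
    inode->fop = NULL;              /* FIFOs and sockets: not modelled */

    if (S_ISCHR(mode)) {
        inode->fop = &def_chr_fops_model;
        inode->rdev = rdev;
    } else if (S_ISBLK(mode)) {
        inode->fop = &def_blk_fops_model;
        inode->rdev = rdev;
    }
}
```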

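Device Drivers

  1. Registering a device driver
    • Allocate a device_driver object
    • Call driver_register() to insert it into the data structures of the device driver model
  2. Initializing the device driver
    • Allocate resources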

Character Device Drivers

A character device driver is described by a cdev structure.

struct cdev {
    struct kobject kobj;
    struct module *owner;
    const struct file_operations *ops;
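    struct list_head list;  // head of the list of inodes of the character device files for this device; several device files may share the same device number and thus the same character device
    dev_t dev;              // first device number assigned to the driver
    unsigned int count;     // size of the range of device numbers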
};

void cdev_init(struct cdev *, const struct file_operations *);

struct cdev *cdev_alloc(void);

void cdev_put(struct cdev *p);

int cdev_add(struct cdev *, dev_t, unsigned);

void cdev_del(struct cdev *);

void cd_forget(struct inode *);

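cdev_alloc() allocates a cdev descriptor dynamically and initializes its embedded kobject; the descriptor is freed automatically when its reference count drops to zero.
cdev_add() registers a cdev descriptor in the device driver model. It initializes the dev and count fields of the cdev, then calls kobj_map(), which sets up the device driver model's data structures that link the range of device numbers to the device descriptor.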
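What kobj_map() provides is a fast answer to "which cdev owns this device number?", the question chrdev_open() asks through kobj_lookup() when a character device file is opened. The sketch below replaces the kernel's hashed probe lists with a flat array; mini_cdev, mini_cdev_add(), mini_cdev_lookup() and MAP_SLOTS are hypothetical, and MKDEV copies the kernel's 20-bit minor layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MINORBITS 20
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))

/* Hypothetical, trimmed cdev: just the fields cdev_add() fills in. */
struct mini_cdev {
    const char *name;
    uint32_t dev;      /* first device number */
    unsigned count;    /* number of consecutive device numbers */
};

#define MAP_SLOTS 16

/* Flat table standing in for the kernel's cdev_map. */
static struct mini_cdev *cdev_table[MAP_SLOTS];

/* Like cdev_add(): record the range, then make it findable. */
static int mini_cdev_add(struct mini_cdev *p, uint32_t dev, unsigned count)
{
    p->dev = dev;
    p->count = count;
    for (int i = 0; i < MAP_SLOTS; i++)
        if (!cdev_table[i]) {
            cdev_table[i] = p;
            return 0;
        }
    return -1;   /* table full */
}

/* Find the cdev whose [dev, dev + count) range contains @devt. */
static struct mini_cdev *mini_cdev_lookup(uint32_t devt)
{
    for (int i = 0; i < MAP_SLOTS; i++) {
        struct mini_cdev *p = cdev_table[i];
        if (p && devt >= p->dev && devt - p->dev < p->count)
            return p;
    }
    return NULL;
}
```

Registering a range of 4 minors starting at 240:0 makes 240:3 resolve to that cdev, while 240:4 and 241:0 find nothing.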

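Allocating Device Numbers

  1. register_chrdev_region() and alloc_chrdev_region() assign an arbitrary range of device numbers to a driver. They do not call cdev_add(), so the driver must still call cdev_add() afterwards. alloc_chrdev_region() allocates the major number dynamically. register_chrdev_region() checks whether the requested range spans more than one major number; if so, it determines each major number and the sub-range of device numbers that falls under it, and registers each sub-range separately.
  2. register_chrdev() assigns a fixed-size range of device numbers (one major number with minors 0 to 255). It calls cdev_add() internally, so the driver does not need to call it.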
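The splitting performed by register_chrdev_region() is plain dev_t arithmetic. The following sketch keeps the loop structure of the 3.13 implementation in fs/char_dev.c but only records each per-major piece instead of registering it; split_by_major() and struct chunk are hypothetical names, and the MAJOR/MINOR/MKDEV definitions copy the kernel's 20-bit minor layout.

```c
#include <assert.h>
#include <stdint.h>

#define MINORBITS     20
#define MINORMASK     ((1U << MINORBITS) - 1)
#define MAJOR(dev)    ((unsigned)((dev) >> MINORBITS))
#define MINOR(dev)    ((unsigned)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))

/* One per-major piece of the requested range. */
struct chunk {
    unsigned major, baseminor, count;
};

/* Split [from, from + count) at major-number boundaries.
 * Returns the number of chunks written to @out (at most @max). */
static int split_by_major(uint32_t from, unsigned count,
                          struct chunk *out, int max)
{
    uint32_t to = from + count, n, next;
    int k = 0;

    for (n = from; n < to && k < max; n = next) {
        next = MKDEV(MAJOR(n) + 1, 0);   /* first number of the next major */
        if (next > to)
            next = to;
        out[k].major = MAJOR(n);
        out[k].baseminor = MINOR(n);
        out[k].count = next - n;
        k++;
    }
    return k;
}
```

A range of 5 numbers starting two minors below the end of major 200 becomes two pieces: 2 numbers under major 200 and 3 under major 201.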

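Block Device Drivers

Handling Block Devices

Name    | Meaning                                                                                   | Size
Sector  | Smallest unit of disk transfer                                                            | Usually 512 bytes, sometimes larger
Block   | Basic unit of data transfer for the VFS and filesystems                                   | A power of 2, no larger than a page frame, and a multiple of the sector size
Segment | For scatter-gather DMA: a memory page, or part of one, holding data from adjacent sectors | A multiple of the sector size, no larger than a page
Page    | Unit into which memory management divides memory                                          | Usually 4096 bytes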
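A few helpers make the relationships in the table concrete. This is a sketch under stated assumptions: sectors are counted in 512-byte units (the unit of bio->bi_sector), and byte_to_sector(), block_to_sector() and valid_block_size() are hypothetical names; real code must respect the device's actual logical block size.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)   /* 512 bytes */

/* Byte offset on the device -> 512-byte sector number. */
static uint64_t byte_to_sector(uint64_t off)
{
    return off >> SECTOR_SHIFT;
}

/* Filesystem block number -> first sector of that block. */
static uint64_t block_to_sector(uint64_t block, unsigned block_size)
{
    return block * (block_size / SECTOR_SIZE);
}

/* A block size is valid if it is a power of two, at least one sector
 * (hence a multiple of 512), and no larger than a page. */
static int valid_block_size(unsigned bs, unsigned page_size)
{
    return bs >= SECTOR_SIZE && bs <= page_size && (bs & (bs - 1)) == 0;
}
```

With 4096-byte blocks each block spans 8 sectors, so block 3 starts at sector 24; 1536 is rejected because it is not a power of two.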

The Generic Block Layer

include/linux/blk_types.h

/*
 * main unit of I/O for the block layer and lower layers (ie drivers and
 * stacking drivers)
 */
struct bio {
    sector_t        bi_sector;  /* device address in 512 byte
                           sectors */
    struct bio      *bi_next;   /* request queue link */
    struct block_device *bi_bdev;
    unsigned long       bi_flags;   /* status, command, etc */
    unsigned long       bi_rw;      /* bottom bits READ/WRITE,
                         * top bits priority
                         */

    unsigned short      bi_vcnt;    /* how many bio_vec's */
    unsigned short      bi_idx;     /* current index into bvl_vec */

    /* Number of segments in this BIO after
     * physical address coalescing is performed.
     */
    unsigned int        bi_phys_segments;

    unsigned int        bi_size;    /* residual I/O count */

    /*
     * To keep track of the max segment size, we account for the
     * sizes of the first and last mergeable segments in this bio.
     */
    unsigned int        bi_seg_front_size;
    unsigned int        bi_seg_back_size;

    bio_end_io_t        *bi_end_io;

    void            *bi_private;
#ifdef CONFIG_BLK_CGROUP
    /*
     * Optional ioc and css associated with this bio.  Put on bio
     * release.  Read comment on top of bio_associate_current().
     */
    struct io_context   *bi_ioc;
    struct cgroup_subsys_state *bi_css;
#endif
#if defined(CONFIG_BLK_DEV_INTEGRITY)
    struct bio_integrity_payload *bi_integrity;  /* data integrity */
#endif

    /*
     * Everything starting with bi_max_vecs will be preserved by bio_reset()
     */

    unsigned int        bi_max_vecs;    /* max bvl_vecs we can hold */

    atomic_t        bi_cnt;     /* pin count */

    struct bio_vec      *bi_io_vec; /* the actual vec list */

    struct bio_set      *bi_pool;

    /*
     * We can inline a number of vecs at the end of the bio, to avoid
     * double allocations for a small number of bio_vecs. This member
     * MUST obviously be kept at the very end of the bio.
     */
    struct bio_vec      bi_inline_vecs[0];
};


/*
 * was unsigned short, but we might as well be ready for > 64kB I/O pages
 */
struct bio_vec {
    struct page *bv_page;  // descriptor of the page containing the segment
    unsigned int    bv_len; // length of the segment in bytes
    unsigned int    bv_offset; // offset of the segment within the page
};

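bi_io_vec points to an array of bio_vec structures describing the segments contained in the bio; bi_vcnt holds the number of segments in bi_io_vec, and bi_idx is the index of the current segment in bi_io_vec.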
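Walking a bio's segments means iterating over bi_io_vec from bi_idx up to bi_vcnt, which is what bio_for_each_segment() did in 3.13. A userspace sketch with trimmed, hypothetical copies of the structures (mini_bio, mini_bio_vec; bv_page is only a void pointer here) that sums the bytes still to be transferred:

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-ins for the 3.13 fields discussed above. */
struct mini_bio_vec {
    void *bv_page;           /* stands in for struct page * */
    unsigned int bv_len;     /* segment length in bytes */
    unsigned int bv_offset;  /* segment offset within the page */
};

struct mini_bio {
    unsigned short bi_vcnt;          /* number of bio_vecs */
    unsigned short bi_idx;           /* current index into bi_io_vec */
    unsigned int bi_size;            /* residual I/O count in bytes */
    struct mini_bio_vec *bi_io_vec;
};

/* Sum the lengths of the segments from bi_idx to bi_vcnt; for a
 * consistent, not yet started bio this equals bi_size. */
static unsigned int bio_remaining_bytes(const struct mini_bio *bio)
{
    unsigned int total = 0;

    for (unsigned short i = bio->bi_idx; i < bio->bi_vcnt; i++)
        total += bio->bi_io_vec[i].bv_len;
    return total;
}
```

Advancing bi_idx past a completed segment shrinks the remaining total, which is how partially completed bios were tracked before later kernels moved this state into a separate iterator.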

include/linux/genhd.h


struct gendisk {
    /* major, first_minor and minors are input parameters only,
     * don't use directly.  Use disk_devt() and disk_max_parts().
     */
    int major;          /* major number of driver */
    int first_minor;
    int minors;                     /* maximum number of minors, =1 for
                                         * disks that can't be partitioned. */

    char disk_name[DISK_NAME_LEN];  /* name of major driver */
    char *(*devnode)(struct gendisk *gd, umode_t *mode);

    unsigned int events;        /* supported events */
    unsigned int async_events;  /* async events, subset of all */

    /* Array of pointers to partitions indexed by partno.
     * Protected with matching bdev lock but stat and other
     * non-critical accesses use RCU.  Always access through
     * helpers.
     */
    struct disk_part_tbl __rcu *part_tbl; // partition table: all logical partitions of the disk
    struct hd_struct part0; // partition 0, which represents the whole disk

    const struct block_device_operations *fops;
    struct request_queue *queue;
    void *private_data;

    int flags;
    struct device *driverfs_dev;  // FIXME: remove
    struct kobject *slave_dir;

    struct timer_rand_state *random;
    atomic_t sync_io;       /* RAID */
    struct disk_events *ev;
#ifdef  CONFIG_BLK_DEV_INTEGRITY
    struct blk_integrity *integrity;
#endif
    int node_id;
};

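flags records the state of the gendisk: GENHD_FL_UP means the disk is initialized and working, and GENHD_FL_REMOVABLE means the disk supports removable media, such as floppy disks and optical discs. fops points to a block_device_operations table holding the disk's generic operations; the most important are open, release, and ioctl.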
include/linux/blkdev.h

struct block_device_operations {
    int (*open) (struct block_device *, fmode_t); // opens the block device file
    void (*release) (struct gendisk *, fmode_t); // closes the last reference to the block device file
    int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);  // handles an ioctl() system call
    int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); // handles an ioctl() from a 32-bit process on a 64-bit kernel
    int (*direct_access) (struct block_device *, sector_t,
                        void **, unsigned long *);
    unsigned int (*check_events) (struct gendisk *disk,
                      unsigned int clearing);
    /* ->media_changed() is DEPRECATED, use ->check_events() instead */
    int (*media_changed) (struct gendisk *);
    void (*unlock_native_capacity) (struct gendisk *);
    int (*revalidate_disk) (struct gendisk *);
    int (*getgeo)(struct block_device *, struct hd_geometry *);
    /* this callback is with swap_lock and sometimes page table lock held */
    void (*swap_slot_free_notify) (struct block_device *, unsigned long);
    struct module *owner;
};

A disk may contain several logical partitions, each represented by an hd_struct structure:
include/linux/genhd.h

struct hd_struct {
    sector_t start_sect;  // first sector of the partition
    /*
     * nr_sects is protected by sequence counter. One might extend a
     * partition while IO is happening to it and update of nr_sects
     * can be non-atomic on 32bit machines with 64bit sector_t.
     */
    sector_t nr_sects;  // length of the partition in sectors
    seqcount_t nr_sects_seq;
    sector_t alignment_offset;
    unsigned int discard_alignment;
    struct device __dev;
    struct kobject *holder_dir;
    int policy, partno;
    struct partition_meta_info *info;
#ifdef CONFIG_FAIL_MAKE_REQUEST
    int make_it_fail;
#endif
    unsigned long stamp;
    atomic_t in_flight[2];
#ifdef  CONFIG_SMP
    struct disk_stats __percpu *dkstats;
#else
    struct disk_stats dkstats;
#endif
    atomic_t ref;
    struct rcu_head rcu_head;
};

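When the kernel detects a new disk, it calls alloc_disk() to allocate and initialize a gendisk object; if the disk is divided into partitions, an hd_struct is allocated for each partition. Finally, add_disk() inserts the gendisk into the data structures of the generic block layer.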

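The steps the system takes to submit an I/O request are as follows.
1. Allocate a bio with bio_alloc() and initialize its fields.
2. Call generic_make_request(), which:

  1. Calls generic_make_request_checks() to check whether bio->bi_sector exceeds the number of sectors of the block device.
    1. If it does, it sets the BIO_EOF flag in bio->bi_flags, prints a kernel error message, calls bio_endio(), and stops.
    2. Otherwise, it calls blk_partition_remap() to check whether the device is a disk partition. If so, it remaps the sector to a sector of the whole disk (by adding the partition's starting sector) and points bio->bi_bdev at the block device descriptor of the whole disk. From this point on, the I/O scheduler and the disk driver work only on the whole disk; partitions no longer exist as a concept.
  2. Gets the disk's request_queue and calls its make_request_fn method to insert the bio into the request queue.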
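Step 1 boils down to a range check followed by an addition. The sketch below models it with hypothetical structures (mini_part for hd_struct, mini_bio for bio, with an on_partition flag standing in for the bi_bdev switch); it returns an error where the kernel would set BIO_EOF and end the bio, and it assumes 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-ins keeping only the fields the checks use. */
struct mini_part {                 /* cf. hd_struct */
    sector_t start_sect;           /* first sector of the partition */
    sector_t nr_sects;             /* partition length in sectors */
};

struct mini_bio {
    sector_t bi_sector;            /* partition-relative start sector */
    unsigned bi_size;              /* bytes to transfer */
    int on_partition;              /* 1 while bi_bdev names the partition */
};

/* Reject a bio that runs past the end of the partition, otherwise remap
 * it to whole-disk sectors as blk_partition_remap() does.
 * Returns 0 on success, -1 if the bio is out of range. */
static int check_and_remap(struct mini_bio *bio, const struct mini_part *p)
{
    sector_t nr = bio->bi_size >> 9;            /* bytes -> sectors */

    if (bio->bi_sector > p->nr_sects || nr > p->nr_sects - bio->bi_sector)
        return -1;
    bio->bi_sector += p->start_sect;            /* now disk-relative */
    bio->on_partition = 0;                      /* bi_bdev -> whole disk */
    return 0;
}
```

For a partition starting at sector 2048, a 4 KiB bio at partition sector 10 ends up at disk sector 2058; one that would cross the partition's end is left untouched and rejected.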

Each device has an associated request queue, described in Linux by the following data structure.
include/linux/blkdev.h

struct request_queue {
    /*
     * Together with queue_head for cacheline sharing
     */
    struct list_head    queue_head; // head of the list of pending requests
    struct request      *last_merge;
    struct elevator_queue   *elevator; // I/O scheduler (elevator) instance
    int         nr_rqs[2];  /* # allocated [a]sync rqs */
    int         nr_rqs_elvpriv; /* # allocated rqs w/ elvpriv */

    /*
     * If blkcg is not used, @q->root_rl serves all requests.  If blkcg
     * is used, root blkg allocates from @q->root_rl and all other
     * blkgs from their own blkg->rl.  Which one to use should be
     * determined using bio_request_list().
     */
    struct request_list root_rl;

    request_fn_proc     *request_fn;  // entry point of the driver's strategy routine
    make_request_fn     *make_request_fn; // called when a new bio is submitted to the queue
    prep_rq_fn      *prep_rq_fn;
    unprep_rq_fn        *unprep_rq_fn;
    merge_bvec_fn       *merge_bvec_fn;
    softirq_done_fn     *softirq_done_fn;
    rq_timed_out_fn     *rq_timed_out_fn;
    dma_drain_needed_fn *dma_drain_needed;
    lld_busy_fn     *lld_busy_fn;

    struct blk_mq_ops   *mq_ops;

    unsigned int        *mq_map;

    /* sw queues */
    struct blk_mq_ctx   *queue_ctx;
    unsigned int        nr_queues;

    /* hw dispatch queues */
    struct blk_mq_hw_ctx    **queue_hw_ctx;
    unsigned int        nr_hw_queues;

    /*
     * Dispatch queue sorting
     */
    sector_t        end_sector;
    struct request      *boundary_rq;

    /*
     * Delayed queue handling
     */
    struct delayed_work delay_work;

    struct backing_dev_info backing_dev_info;

    /*
     * The queue owner gets to use this for whatever they like.
     * ll_rw_blk doesn't touch it.
     */
    void            *queuedata;

    /*
     * various queue flags, see QUEUE_* below
     */
    unsigned long       queue_flags;

    /*
     * ida allocated id for this queue.  Used to index queues from
     * ioctx.
     */
    int         id;

    /*
     * queue needs bounce pages for pages above this limit
     */
    gfp_t           bounce_gfp;

    /*
     * protects queue structures from reentrancy. ->__queue_lock should
     * _never_ be used directly, it is queue private. always use
     * ->queue_lock.
     */
    spinlock_t      __queue_lock;
    spinlock_t      *queue_lock;

    /*
     * queue kobject
     */
    struct kobject kobj;

    /*
     * mq queue kobject
     */
    struct kobject mq_kobj;

#ifdef CONFIG_PM_RUNTIME
    struct device       *dev;
    int         rpm_status;
    unsigned int        nr_pending;
#endif

    /*
     * queue settings
     */
    unsigned long       nr_requests;    /* Max # of requests */
    unsigned int        nr_congestion_on;
    unsigned int        nr_congestion_off;
    unsigned int        nr_batching;

    unsigned int        dma_drain_size;
    void            *dma_drain_buffer;
    unsigned int        dma_pad_mask;
    unsigned int        dma_alignment;

    struct blk_queue_tag    *queue_tags;
    struct list_head    tag_busy_list;

    unsigned int        nr_sorted;
    unsigned int        in_flight[2];
    /*
     * Number of active block driver functions for which blk_drain_queue()
     * must wait. Must be incremented around functions that unlock the
     * queue_lock internally, e.g. scsi_request_fn().
     */
    unsigned int        request_fn_active;

    unsigned int        rq_timeout;
    struct timer_list   timeout;
    struct list_head    timeout_list;

    struct list_head    icq_list;
#ifdef CONFIG_BLK_CGROUP
    DECLARE_BITMAP      (blkcg_pols, BLKCG_MAX_POLS);
    struct blkcg_gq     *root_blkg;
    struct list_head    blkg_list;
#endif

    struct queue_limits limits;

    /*
     * sg stuff
     */
    unsigned int        sg_timeout;
    unsigned int        sg_reserved_size;
    int         node;
#ifdef CONFIG_BLK_DEV_IO_TRACE
    struct blk_trace    *blk_trace;
#endif
    /*
     * for flush operations
     */
    unsigned int        flush_flags;
    unsigned int        flush_not_queueable:1;
    unsigned int        flush_queue_delayed:1;
    unsigned int        flush_pending_idx:1;
    unsigned int        flush_running_idx:1;
    unsigned long       flush_pending_since;
    struct list_head    flush_queue[2];
    struct list_head    flush_data_in_flight;
    union {
        struct request  flush_rq;
        struct {
            spinlock_t mq_flush_lock;
            struct work_struct mq_flush_work;
        };
    };

    struct mutex        sysfs_lock;

    int         bypass_depth;

#if defined(CONFIG_BLK_DEV_BSG)
    bsg_job_fn      *bsg_job_fn;
    int         bsg_job_size;
    struct bsg_class_device bsg_dev;
#endif

#ifdef CONFIG_BLK_DEV_THROTTLING
    /* Throttle data */
    struct throtl_data *td;
#endif
    struct rcu_head     rcu_head;
    wait_queue_head_t   mq_freeze_wq;
    struct percpu_counter   mq_usage_counter;
    struct list_head    all_q_node;
};

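backing_dev_info stores information about the I/O data flow of the underlying hardware block device, such as read-ahead settings and the congestion state of the request queue.
Each I/O request is described by the following data structure: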
include/linux/blkdev.h

/*
 * try to put the fields that are referenced together in the same cacheline.
 * if you modify this structure, be sure to check block/blk-core.c:blk_rq_init()
 * as well!
 */
struct request {
    union {
        struct list_head queuelist; // links the request into the request_queue
        struct llist_node ll_list;
    };
    
            
           
