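Learning to Write a Block Device Driver (3): Kicking Out the IO Scheduler and Handling bios Ourselves (Part 2)
Part 1 of this installment worked out how the in-memory block device driver we have already written connects to the generic block layer. Now it is time to do what we actually want: kick out the IO scheduler and handle bios ourselves.
Kicking out the IO scheduler is easy: we simply stop using __make_request, the heavyweight function the system installs by default. How? As we saw inside blk_init_queue() in Part 1, the system calls blk_queue_make_request(q, __make_request). We can call the same function to install our own policy function in place of __make_request. Once we do that, the blk_init_queue() call that used to initialize the request_queue is no longer needed either.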
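The replacement described above is really just swapping a function pointer stored in the queue. Here is a minimal user-space sketch of that idea. Every name in it (the mq_ types and functions) is hypothetical and only imitates the role of the kernel's request_queue, __make_request and blk_queue_make_request(); it is not kernel code.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's definitions. */
struct mq_bio { int handled_by; };
struct mq_queue;
typedef int (mq_make_request_fn)(struct mq_queue *q, struct mq_bio *bio);

struct mq_queue {
    mq_make_request_fn *make_request_fn; /* entry point every bio goes through */
};

/* Plays the role of __make_request: the default path via the IO scheduler. */
static int mq_default_make_request(struct mq_queue *q, struct mq_bio *bio)
{
    (void)q;
    bio->handled_by = 1; /* 1 = default scheduler path */
    return 0;
}

/* Plays the role of our own policy function that handles the bio directly. */
static int mq_simp_make_request(struct mq_queue *q, struct mq_bio *bio)
{
    (void)q;
    bio->handled_by = 2; /* 2 = the driver's own handler */
    return 0;
}

/* Plays the role of blk_queue_make_request(): remember which function to call. */
static void mq_queue_make_request(struct mq_queue *q, mq_make_request_fn *mfn)
{
    q->make_request_fn = mfn;
}

/* Plays the role of the generic block layer handing a bio to the queue. */
static int mq_submit(struct mq_queue *q, struct mq_bio *bio)
{
    return q->make_request_fn(q, bio);
}
```

Whichever function was installed last is the one that sees the bio, which is why installing our own handler is enough to take the default path out of the picture.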
Let's go straight to the modified source code:
simp_blkdev.c:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/genhd.h>
#include <linux/fs.h>
#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/highmem.h>  // kmap() / kunmap()

#define SIMP_BLKDEV_DISKNAME "simp_blkdev"
#define SIMP_BLKDEV_DEVICEMAJOR COMPAQ_SMART2_MAJOR
#define SIMP_BLKDEV_BYTES (8*1024*1024)

static DEFINE_SPINLOCK(rq_lock);  // not used in this version
unsigned char simp_blkdev_data[SIMP_BLKDEV_BYTES];
static struct gendisk *simp_blkdev_disk;
static struct request_queue *simp_blkdev_queue;  // device's request queue

struct block_device_operations simp_blkdev_fops = {
    .owner = THIS_MODULE,
};

// handle bio
static int simp_blkdev_make_request(struct request_queue *q, struct bio *bio)
{
    struct bio_vec *bvec;
    int i;
    void *dsk_mem;

    // reject requests that run past the end of the device
    // (bi_sector counts 512-byte sectors, bi_size counts bytes)
    if ((bio->bi_sector << 9) + bio->bi_size > SIMP_BLKDEV_BYTES) {
        printk(KERN_ERR SIMP_BLKDEV_DISKNAME ":bad request:block=%llu,count=%u\n",
               (unsigned long long)bio->bi_sector, bio->bi_size);
        bio_endio(bio, -EIO);
        return 0;
    }

    // start address of the request inside our in-memory "disk"
    dsk_mem = simp_blkdev_data + (bio->bi_sector << 9);

    bio_for_each_segment(bvec, bio, i) {
        void *iovec_mem;

        switch (bio_rw(bio)) {
        case READ:
        case READA:
            iovec_mem = kmap(bvec->bv_page) + bvec->bv_offset;
            memcpy(iovec_mem, dsk_mem, bvec->bv_len);
            kunmap(bvec->bv_page);
            break;
        case WRITE:
            iovec_mem = kmap(bvec->bv_page) + bvec->bv_offset;
            memcpy(dsk_mem, iovec_mem, bvec->bv_len);
            kunmap(bvec->bv_page);
            break;
        default:
            printk(KERN_ERR SIMP_BLKDEV_DISKNAME ": unknown value of bio_rw: %lu\n",
                   bio_rw(bio));
            bio_endio(bio, -EIO);
            return 0;
        }
        dsk_mem += bvec->bv_len;
    }

    bio_endio(bio, 0);
    return 0;
}

static int simp_blkdev_init(void)
{
    int ret;

    simp_blkdev_queue = blk_alloc_queue(GFP_KERNEL);
    if (!simp_blkdev_queue) {
        ret = -ENOMEM;
        goto error_alloc_queue;
    }
    blk_queue_make_request(simp_blkdev_queue, simp_blkdev_make_request);

    // alloc the resource of gendisk
    simp_blkdev_disk = alloc_disk(1);
    if (!simp_blkdev_disk) {
        ret = -ENOMEM;
        goto error_alloc_disk;
    }

    // populate the gendisk structure
    strcpy(simp_blkdev_disk->disk_name, SIMP_BLKDEV_DISKNAME);
    simp_blkdev_disk->major = SIMP_BLKDEV_DEVICEMAJOR;
    simp_blkdev_disk->first_minor = 0;
    simp_blkdev_disk->fops = &simp_blkdev_fops;
    simp_blkdev_disk->queue = simp_blkdev_queue;
    set_capacity(simp_blkdev_disk, SIMP_BLKDEV_BYTES >> 9);
    add_disk(simp_blkdev_disk);

    printk("module simp_blkdev added.\n");
    return 0;

// if alloc_disk() failed, the queue already exists and must be released;
// if blk_alloc_queue() failed, there is nothing to clean up
error_alloc_disk:
    blk_cleanup_queue(simp_blkdev_queue);
error_alloc_queue:
    return ret;
}

static void simp_blkdev_exit(void)
{
    del_gendisk(simp_blkdev_disk);
    put_disk(simp_blkdev_disk);
    blk_cleanup_queue(simp_blkdev_queue);
    printk("module simp_blkdev removed.\n");
}

module_init(simp_blkdev_init);
module_exit(simp_blkdev_exit);

To bypass the IO scheduler and handle bios ourselves, we need to understand the following key functions and data structures:
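struct request_queue *blk_alloc_queue(gfp_t gfp_mask) // Allocates and initializes a request_queue, filling in basic members such as list heads and locks.
void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) // The comment in the kernel source explains this function clearly: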
/**
 * blk_queue_make_request - define an alternate make_request function for a device
 * @q:  the request queue for the device to be affected
 * @mfn: the alternate make_request function
 *
 * Description:
 *    The normal way for &struct bios to be passed to a device
 *    driver is for them to be collected into requests on a request
 *    queue, and then to allow the device driver to select requests
 *    off that queue when it is ready.  This works well for many block
 *    devices. However some block devices (typically virtual devices
 *    such as md or lvm) do not benefit from the processing on the
 *    request queue, and are served best by having the requests passed
 *    directly to them.  This can be achieved by providing a function
 *    to blk_queue_make_request().
 *
 * Caveat:
 *    The driver that does this *must* be able to deal appropriately
 *    with buffers in "highmemory". This can be accomplished by either calling
 *    __bio_kmap_atomic() to get a temporary kernel mapping, or by calling
 *    blk_queue_bounce() to create a buffer in normal memory.
 **/
That should make it clear. Our driver is also a virtual block device, so it gains nothing from IO scheduling and is better served by handling bios directly. The second argument of this function is the bio-handling function we have to write.
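int (your_make_request) (struct request_queue *q, struct bio *bio) // This is the main function we have to write; its job is to process the bio. Look up the full layout of struct bio yourself. Here we only point out that a bio describes a request for one contiguous range on the block device, and the bio_vec entries it contains identify the individual pieces of memory that the request covers. So the essence of this function is a loop that processes each bio_vec in the bio.
bio_for_each_segment(bvl, bio, i) // A macro that makes it convenient to iterate over the segments of a bio.
bio->bi_sector // The starting sector of the bio's request on the block device.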
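bio->bi_size // The size of the bio's request in bytes (not in sectors, which is why the driver adds it directly to bi_sector << 9).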
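void bio_endio(struct bio *bio, int error) // Completes the bio request.
void *kmap(struct page *page) // Returns the virtual address of the page. If the page is in high memory, it is first mapped into the non-linear mapping area, and that address is returned.
void kunmap(struct page *page) // Gives the mapped non-linear area back to the system.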
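Since bi_sector is counted in 512-byte sectors while bi_size is counted in bytes, it helps to see the range check from simp_blkdev_make_request as plain arithmetic. The following is a user-space sketch only: rng_byte_offset and rng_request_in_range are hypothetical helper names, and the 8 MiB constant simply mirrors SIMP_BLKDEV_BYTES.

```c
#include <stdint.h>

#define RNG_SECTOR_SHIFT 9                     /* 512-byte sectors, like the driver's << 9 */
#define RNG_DISK_BYTES   (8ULL * 1024 * 1024)  /* same 8 MiB size as SIMP_BLKDEV_BYTES */

/* Byte offset on the device of a request that starts at the given sector. */
static uint64_t rng_byte_offset(uint64_t sector)
{
    return sector << RNG_SECTOR_SHIFT;
}

/* Returns 1 if the request [sector, sector + size_bytes) fits on the device, else 0. */
static int rng_request_in_range(uint64_t sector, uint32_t size_bytes)
{
    return rng_byte_offset(sector) + size_bytes <= RNG_DISK_BYTES;
}
```

An 8 MiB device has 16384 sectors, so a 512-byte request at sector 16383 still fits, while a 1024-byte request at the same sector would run past the end and be rejected.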
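With the above in hand, simp_blkdev_make_request becomes readable. Overall, the bio_for_each_segment loop handles each bio_vec in the bio according to whether the request is a read or a write. For each bio_vec, the idea is to compute the memory address the bio_vec describes and the matching address dsk_mem inside our block device, then memcpy between them. The only details are how the two addresses are computed.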
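To make the two address calculations concrete, here is a user-space model of the same loop. It is a sketch under stated assumptions, not the driver: struct seg_vec and struct seg_bio are hypothetical simplifications of bio_vec and bio, a plain pointer stands in for kmap(bv_page) + bv_offset, and a small static array stands in for simp_blkdev_data. Unlike the driver, the model validates the direction before the loop rather than inside it.

```c
#include <stddef.h>
#include <string.h>

#define SEG_DISK_BYTES 4096
#define SEG_READ  0
#define SEG_WRITE 1

/* Hypothetical stand-in for struct bio_vec: one piece of memory. */
struct seg_vec {
    unsigned char *mem; /* stands in for kmap(bv_page) + bv_offset */
    size_t len;         /* stands in for bv_len */
};

/* Hypothetical stand-in for struct bio: one contiguous disk range, scattered memory. */
struct seg_bio {
    size_t disk_offset; /* stands in for bi_sector << 9 */
    int rw;             /* SEG_READ or SEG_WRITE */
    const struct seg_vec *vecs;
    size_t nvecs;
};

static unsigned char seg_disk[SEG_DISK_BYTES]; /* the in-memory "disk" */

/* Returns 0 on success, -1 if the request is out of range or the direction is unknown. */
static int seg_handle_bio(const struct seg_bio *bio)
{
    size_t total = 0, i;
    unsigned char *dsk_mem;

    for (i = 0; i < bio->nvecs; i++)
        total += bio->vecs[i].len;
    if (bio->disk_offset > SEG_DISK_BYTES ||
        total > SEG_DISK_BYTES - bio->disk_offset)
        return -1;
    if (bio->rw != SEG_READ && bio->rw != SEG_WRITE)
        return -1;

    /* The disk side is contiguous, so one pointer just advances segment by segment. */
    dsk_mem = seg_disk + bio->disk_offset;
    for (i = 0; i < bio->nvecs; i++) {
        const struct seg_vec *v = &bio->vecs[i];

        if (bio->rw == SEG_READ)
            memcpy(v->mem, dsk_mem, v->len);  /* disk -> memory */
        else
            memcpy(dsk_mem, v->mem, v->len);  /* memory -> disk */
        dsk_mem += v->len;
    }
    return 0;
}
```

Writing two separate buffers of 3 and 2 bytes at offset 100 leaves 5 consecutive bytes on the model disk, and a later read with a single 5-byte segment gets them back in order. That is exactly the "contiguous on disk, scattered in memory" relationship between a bio and its bio_vec entries.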
That's it. Let's try out our new block device driver:
Initialize the block device:
Mount it:
Read and write: