Data Exchange Between Kernel Space and User Space (Part 1)

阿新 • Published: 2019-02-06

debugfs

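Kernel developers often need to export debugging information to user-space applications. A stable system may not need this information at all, but during development it is essential for understanding what the kernel is doing. printk is probably the most widely used tool, but it is not the best one: debugging output is only needed while debugging, whereas printk statements keep printing, so the unnecessary ones have to be removed once development is finished. In addition, printk cannot help when the developer wants a user-space application to change kernel behavior. What is needed is a mechanism that is used only on demand and that exposes debugging information to user-space applications by creating one or more files in a virtual file system when required.

There are several ways to meet these requirements:
(1) Use procfs and create files under /proc to export debugging information. However, procfs is awkward for output larger than one memory page (4 KB on x86), it is slow, and it sometimes causes unexpected problems.
(2) Use sysfs (a virtual file system introduced in the 2.6 kernel). Debugging information can be placed there in many cases, but sysfs is mainly meant for system administration and expects each file to correspond to a single kernel variable, so exporting complex data structures or debugging information through it is very difficult.
(3) Use libfs to create a new file system. This approach is extremely flexible, since the developer can define the rules of the new file system, and libfs makes creating a file system simpler, but it is still more work than most developers expect.
(4) To make such a mechanism easier to use, Greg Kroah-Hartman developed debugfs (first introduced in 2.6.11), a virtual file system dedicated to exporting debugging information. It is very small and easy to use, and it can be selected or left out when the kernel is configured. When it is not selected, kernel code that uses its API does not need to be changed.

A developer using debugfs first creates a directory in the file system. The following function creates a directory in debugfs: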

      struct dentry  *debugfs_create_dir(const char *name, struct dentry *parent);

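The name parameter is the name of the directory to create.
The parent parameter is the dentry of the parent directory; if it is NULL, the directory is created in the root of the debugfs file system. If debugfs was not compiled into the kernel, the function returns -ENODEV encoded as an error pointer, ERR_PTR(-ENODEV). If it returns NULL, creation failed for some other reason. On success it returns a pointer to the dentry of the new directory.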
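Because the function returns a pointer, the -ENODEV case is carried as an error code encoded in the pointer value, so a plain NULL test does not catch it. The following is a minimal user-space model of that convention, not kernel code: err_ptr(), is_err() and ptr_err() are re-implementations written for this sketch that mirror the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() helpers, and fake_create_dir() is a hypothetical stand-in for debugfs_create_dir().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (models the kernel's ERR_PTR()). */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* A pointer is an error if it falls in the top MAX_ERRNO addresses (models IS_ERR()). */
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the errno value from an error pointer (models PTR_ERR()). */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static int dummy_dentry; /* stands in for a real dentry object */

/* Hypothetical stand-in for debugfs_create_dir(), for illustration only. */
void *fake_create_dir(int debugfs_enabled, int out_of_memory)
{
    if (!debugfs_enabled)
        return err_ptr(-ENODEV); /* debugfs not built into the kernel */
    if (out_of_memory)
        return NULL;             /* any other creation failure */
    return &dummy_dentry;        /* success */
}

/* Classify a result: 0 on success, a negative errno otherwise. */
long check_dir(const void *dentry)
{
    if (is_err(dentry))
        return ptr_err(dentry);
    if (!dentry)
        return -ENOMEM;
    return 0;
}
```

The order of the two checks in check_dir() does not matter for correctness, but both are needed: NULL is not an error pointer, and an error pointer is not NULL.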

The following function creates a file in the debugfs file system:

 struct dentry  *debugfs_create_file(const char *name, mode_t mode, struct  dentry *parent,
 void *data, struct  file_operations *fops);

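The name parameter is the name of the file to create.
The mode parameter gives the access permissions of the file.
The parent parameter points to the directory that will contain the file.
The data parameter is private data associated with the file.
The fops parameter is a pointer to the file_operations structure that implements the file operations on the file. In many cases the file operations provided by seq_file are sufficient, which makes debugfs easy to use. In some cases a developer only needs variables that a user-space application can control for debugging, and debugfs provides four APIs for that purpose: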

struct dentry *debugfs_create_u8(const char *name, mode_t mode, struct dentry *parent, u8 *value);
struct dentry *debugfs_create_u16(const char *name, mode_t mode, struct dentry *parent, u16 *value);
struct dentry *debugfs_create_u32(const char *name, mode_t mode, struct dentry *parent, u32 *value);
struct dentry *debugfs_create_bool(const char *name, mode_t mode, struct dentry  *parent, u32 *value);

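The name and mode parameters give the file name and access permissions.
The value parameter is a pointer to the kernel variable that the user-space application is allowed to control.

debugfs does not automatically remove the directories or files a module created when the module is unloaded, so the developer must call the following function for every file or directory that was created: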

  void debugfs_remove(struct dentry *dentry);

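The dentry parameter is the dentry pointer returned by the file and directory creation functions above.

The example module debugfs_exam.c below shows how to use debugfs. For the module to work, the kernel must support debugfs. Because debugfs is a debugging feature, its option, named Debug Filesystem, is located in the Kernel hacking menu and can only be selected after the Kernel debugging option is enabled. To use debugfs from user space, it must be mounted. The following output is from the author's system: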

  $ mkdir -p  /debugfs
  $ mount -t debugfs debugfs /debugfs
  $ insmod  ./debugfs_exam.ko
  $ ls /debugfs
  debugfs-exam
  $ ls /debugfs/debugfs-exam
  u8_var         u16_var        u32_var        bool_var
  $ cd /debugfs/debugfs-exam
  $ cat u8_var
  0
  $ echo 200 > u8_var
  $ cat u8_var
  200
  $ cat bool_var
  N
  $ echo 1 > bool_var
  $ cat bool_var
  Y
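The same cat and echo interaction can be driven from a program. The sketch below is ordinary user-space C that reads and writes a decimal value through a file; the function names are made up for this example, and the path in the usage note assumes debugfs is mounted on /debugfs as in the session above.

```c
#include <assert.h>
#include <stdio.h>

/* Read an unsigned decimal value from a file. Returns 0 on success, -1 on failure. */
int read_uint_file(const char *path, unsigned int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%u", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Write an unsigned decimal value followed by a newline. Returns 0 on success, -1 on failure. */
int write_uint_file(const char *path, unsigned int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = (fprintf(f, "%u\n", value) > 0);
    if (fclose(f) != 0) /* buffered data is flushed here, so check the result */
        ok = 0;
    return ok ? 0 : -1;
}
```

With the example module loaded, write_uint_file("/debugfs/debugfs-exam/u8_var", 200) followed by read_uint_file() on the same path would be expected to behave like the echo and cat commands shown above.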

debugfs example module

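//kernel module: debugfs_exam.c
#include <linux/config.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/debugfs.h>
#include <linux/err.h>
#include <linux/types.h>

/*
 * dentry: a directory entry, which is a link to an index node (inode) in a
 * Linux file system. The inode may be a file or a directory. Linux uses
 * struct dentry to describe a directory entry linked to the inode of some
 * file (either a regular file or a directory). A dentry object can be in one
 * of the following states:
 *   (1) Unused: the reference count d_count is 0, but d_inode still points to
 *       the associated inode. The entry still holds valid information; it is
 *       just not referenced by anyone. Such dentries may be freed when memory
 *       is reclaimed.
 *   (2) In use: d_count is greater than 0 and d_inode points to the associated
 *       inode. Such dentries cannot be freed.
 *   (3) Negative: the associated inode no longer exists (the on-disk inode may
 *       have been deleted) and d_inode is NULL. The dentry is still kept in
 *       the dcache so that later lookups of the same name complete quickly.
 *       Such dentries are freed first when memory is reclaimed.
 */
static struct dentry *root_entry, *u8_entry, *u16_entry, *u32_entry, *bool_entry;
static u8 var8;
static u16 var16;
static u32 var32;
static u32 varbool;

static int __init exam_debugfs_init(void)
{
        root_entry = debugfs_create_dir("debugfs-exam", NULL);
        /* NULL means creation failed; an error pointer means debugfs is not built in. */
        if (!root_entry || IS_ERR(root_entry)) {
                printk(KERN_ERR "Failed to create debugfs dir: debugfs-exam\n");
                return root_entry ? PTR_ERR(root_entry) : -ENOMEM;
        }

        /* File names match the ones used in the shell session above. */
        u8_entry = debugfs_create_u8("u8_var", 0644, root_entry, &var8);
        u16_entry = debugfs_create_u16("u16_var", 0644, root_entry, &var16);
        u32_entry = debugfs_create_u32("u32_var", 0644, root_entry, &var32);
        bool_entry = debugfs_create_bool("bool_var", 0644, root_entry, &varbool);

        return 0;
}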

static void __exit exam_debugfs_exit(void)
{
        debugfs_remove(u8_entry);
        debugfs_remove(u16_entry);
        debugfs_remove(u32_entry);
        debugfs_remove(bool_entry);
        debugfs_remove(root_entry);
}

module_init(exam_debugfs_init);
module_exit(exam_debugfs_exit);
MODULE_LICENSE("GPL");

procfs

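procfs is one of the older ways of exchanging data between user space and the kernel. Much kernel data is exported to users this way, and many kernel parameters can be conveniently set through it. Apart from the parameters that sysctl exports under /proc, most of the kernel parameters provided by procfs are read-only. Many applications depend heavily on procfs, so it is practically an indispensable component. This section explains how to use procfs.
procfs provides the following API: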

  struct proc_dir_entry *create_proc_entry(const char *name, mode_t mode, struct proc_dir_entry *parent)

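This function creates a regular proc entry.
The name parameter is the name of the proc entry to create.
The mode parameter gives the access permissions of the entry.
The parent parameter is the directory that will contain the entry. To create the entry directly under /proc, parent should be NULL; otherwise it should be the struct proc_dir_entry pointer returned by proc_mkdir.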

  extern void remove_proc_entry(const char *name, struct proc_dir_entry *parent)

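This function removes a proc entry created by the function above.
The name parameter is the name of the proc entry to remove.
The parent parameter is the directory that contains the entry.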

  struct  proc_dir_entry *proc_mkdir(const char * name, struct proc_dir_entry *parent)

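This function creates a proc directory.
The name parameter is the name of the proc directory to create.
The parent parameter is the directory that will contain the new directory.

  extern struct proc_dir_entry *proc_mkdir_mode(const char *name, mode_t mode, struct proc_dir_entry *parent)

This function is like proc_mkdir, but it also takes the access mode of the new directory.

  struct proc_dir_entry *proc_symlink(const char * name, struct proc_dir_entry* parent, const char *dest)

This function creates a symbolic link to a proc entry.
The name parameter is the name of the symbolic-link proc entry to create.
The parent parameter is the directory that will contain the symbolic link.
The dest parameter is the name of the proc entry the link points to.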

struct  proc_dir_entry *create_proc_read_entry(const char *name, mode_t mode, struct proc_dir_entry *base,
        read_proc_t *read_proc, void * data);

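This function creates a regular read-only proc entry.
The name parameter is the name of the proc entry to create.
The mode parameter gives the access permissions of the entry.
The base parameter is the directory that will contain the entry.
The read_proc parameter is the function used to read the entry.
The data parameter is private data for the entry. It is stored in the data field of the entry's struct proc_dir_entry and passed to read_proc as its data argument.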

struct  proc_dir_entry *create_proc_info_entry(const char *name, mode_t mode, struct proc_dir_entry *base,
        get_info_t *get_info);

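This function creates an info-type proc entry.
The name parameter is the name of the proc entry to create.
The mode parameter gives the access permissions of the entry.
The base parameter is the directory that will contain the entry.
The get_info parameter is the get_info operation for the entry. get_info is essentially equivalent to read_proc: if an entry does not define read_proc, reads on the entry use get_info instead, so this function is very similar to create_proc_read_entry.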

  struct  proc_dir_entry *proc_net_create(const char *name, mode_t mode, get_info_t *get_info)

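This function creates a proc entry under the /proc/net directory.
The name parameter is the name of the proc entry to create.
The mode parameter gives the access permissions of the entry.
The get_info parameter is the get_info operation for the entry.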

struct  proc_dir_entry *proc_net_fops_create(const char *name, mode_t mode, struct file_operations *fops)

This function also creates a proc entry under /proc/net, but it additionally specifies the file operations for the entry.

  void proc_net_remove(const char *name)

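This function removes a proc entry that either of the previous two functions created under /proc/net.
The name parameter is the name of the proc entry to remove.

Besides these functions, struct proc_dir_entry deserves a mention. To create a writable proc entry and specify its write handler, set the write_proc field of the struct proc_dir_entry returned by the creation functions above, and give the entry write permission in its access mode.
To use these interface functions and struct proc_dir_entry, a module must include the header linux/proc_fs.h.
The source package contains the procfs example procfs_exam.c, which defines three proc file entries and one proc directory entry. After inserting the module, the reader should see the following structure: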

  $ ls /proc/myproctest
  aint  astring  bigprocfile
  $

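The reader can view and set these proc files with file tools such as cat and echo. Note in particular that bigprocfile is a large file (larger than one memory page). procfs has limitations for such files because the buffer it provides is only one page, so the part beyond one page needs special care. Handling it is fairly complicated and error-prone, which is why procfs is not suited to large volumes of input and output. The seq_file interface described in the next section was designed because of this shortcoming, although seq_file still relies on some basic procfs functionality.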
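The one-page limit means a large buffer has to be served in chunks, with each call copying the part that starts at the requested offset. The following user-space model sketches that idea under simplifying assumptions: model_read_proc() only imitates the shape of a read handler (it is not the kernel's read_proc_t signature), MODEL_PAGE_SIZE is fixed at 4096, and model_read_all() plays the role of a reader that keeps calling until end of file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/*
 * Copy at most one page of data, starting at offset off, into page.
 * Sets *eof when the last byte of data has been delivered.
 * Returns the number of bytes copied.
 */
int model_read_proc(char *page, long off, int count, int *eof,
                    const char *data, long data_len)
{
    if (off >= data_len) {
        *eof = 1;
        return 0;
    }
    if (count > MODEL_PAGE_SIZE)
        count = MODEL_PAGE_SIZE;
    if (data_len - off <= count) {
        count = (int)(data_len - off);
        *eof = 1;
    }
    /* The source pointer must be advanced by off, or every chunk repeats page 0. */
    memcpy(page, data + off, (size_t)count);
    return count;
}

/* Drain the whole buffer with repeated page-sized reads at a growing offset. */
long model_read_all(char *out, const char *data, long data_len)
{
    char page[MODEL_PAGE_SIZE];
    long off = 0;
    int eof = 0;

    while (!eof) {
        int n = model_read_proc(page, off, MODEL_PAGE_SIZE, &eof, data, data_len);

        memcpy(out + off, page, (size_t)n);
        off += n;
    }
    return off;
}
```

For a 10000-byte buffer this takes three calls (4096, 4096 and 1808 bytes). Forgetting to add the offset to the source pointer is an easy mistake to make in a handler of this kind.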

//kernel module: procfs_exam.c
#include <linux/config.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <asm/uaccess.h>

#define STR_MAX_SIZE 255
static int int_var;
static char string_var[256];
static char big_buffer[65536];
static int big_buffer_len = 0;
static struct proc_dir_entry * myprocroot;
static int first_write_flag = 1;

/* Report the integer that data points to; the value always fits in one page. */
int int_read_proc(char *page, char **start, off_t off, int count, int *eof, void *data)
{
        count = sprintf(page, "%d", *(int *)data);
        *eof = 1;
        return count;
}

/* Parse an unsigned decimal number written by the user and store it in *data. */
int int_write_proc(struct file *file, const char __user *buffer, unsigned long count, void *data)
{
        unsigned int c = 0, len = 0, val, sum = 0;
        int *temp = (int *)data;

        while (count) {
                if (get_user(c, buffer)) /* fetch one byte from user space */
                        return -EFAULT;

                len++;
                buffer++;
                count--;

                if (c == 10 || c == 0) /* stop at newline or NUL */
                        break;
                val = c - '0';
                if (val > 9)
                        return -EINVAL;
                sum *= 10;
                sum += val;
        }
        *temp = sum;
        return len;
}

int string_read_proc(char *page, char **start, off_t off, int count, int *eof, void *data)
{
        count = sprintf(page, "%s", (char *)data);
        *eof = 1;
        return count;
}

int string_write_proc(struct file *file, const char __user *buffer, unsigned long count, void *data)
{
        char *str = (char *)data;

        if (count > STR_MAX_SIZE)
                count = STR_MAX_SIZE;
        if (copy_from_user(str, buffer, count))
                return -EFAULT;
        str[count] = '\0'; /* string_var holds STR_MAX_SIZE + 1 bytes */
        return count;
}

int bigfile_read_proc(char *page, char **start, off_t off, int count, int *eof, void *data)
{
        if (off >= big_buffer_len) {
                *eof = 1;
                return 0;
        }

        if (count > PAGE_SIZE)
                count = PAGE_SIZE;

        if (big_buffer_len - off <= count) {
                count = big_buffer_len - off;
                *eof = 1;
        }

        /* Copy the chunk that starts at the requested offset, not at offset 0. */
        memcpy(page, (char *)data + off, count);
        *start = page;
        return count;
}

int bigfile_write_proc(struct file *file, const char __user *buffer, unsigned long count, void *data)
{
        char *p = (char *)data;

        /* The first write after the buffer filled up starts over from the beginning. */
        if (first_write_flag) {
                big_buffer_len = 0;
                first_write_flag = 0;
        }

        if (65536 - big_buffer_len < count) {
                count = 65536 - big_buffer_len;
                first_write_flag = 1;
        }

        if (copy_from_user(p + big_buffer_len, buffer, count))
                return -EFAULT;
        big_buffer_len += count;
        return count;
}

static int __init procfs_exam_init(void)
{
#ifdef CONFIG_PROC_FS
        struct proc_dir_entry *entry;

        myprocroot = proc_mkdir("myproctest", NULL);
        if (!myprocroot)
                return -ENOMEM;

        entry = create_proc_entry("aint", 0644, myprocroot);
        if (entry) {
                entry->data = &int_var;
                entry->read_proc = &int_read_proc;
                entry->write_proc = &int_write_proc;
        }

        entry = create_proc_entry("astring", 0644, myprocroot);
        if (entry) {
                entry->data = &string_var;
                entry->read_proc = &string_read_proc;
                entry->write_proc = &string_write_proc;
        }

        entry = create_proc_entry("bigprocfile", 0644, myprocroot);
        if (entry) {
                entry->data = &big_buffer;
                entry->read_proc = &bigfile_read_proc;
                entry->write_proc = &bigfile_write_proc;
        }
#else
        printk(KERN_WARNING "This module requires the kernel to support procfs.\n");
#endif

        return 0;
}

static void __exit procfs_exam_exit(void)
{
#ifdef CONFIG_PROC_FS
        remove_proc_entry("aint", myprocroot);
        remove_proc_entry("astring", myprocroot);
        remove_proc_entry("bigprocfile", myprocroot);
        remove_proc_entry("myproctest", NULL);
#endif
}

module_init(procfs_exam_init);
module_exit(procfs_exam_exit);
MODULE_LICENSE("GPL");

seq_file

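In general, the kernel provides output to user space by creating files in procfs, and user space can view them with any text-reading application. procfs has a drawback, though: if the output is larger than one memory page it has to be read in several passes, which makes it hard to handle. Large output is also slow and sometimes leads to unexpected behavior. Alexander Viro implemented a new facility that makes it easier for the kernel to output large files. It is present in all 2.4 kernels from 2.4.15 onward and in the 2.6 kernel, where it is used extensively.
To use seq_file, a developer includes the header linux/seq_file.h and defines and fills in a seq_operations structure (similar to file_operations):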

  struct seq_operations {
          void* (*start) (struct seq_file *m, loff_t *pos);
          void  (*stop) (struct seq_file *m, void *v);
          void* (*next) (struct seq_file *m, void *v, loff_t *pos);
          int   (*show) (struct seq_file *m, void *v);
  };

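The start function sets the position at which reading of the seq_file file begins and returns the actual starting position. If the requested position is past the end of the file, it should return NULL. start may also return the special value SEQ_START_TOKEN, which tells the show function to output a file header, but only when pos is 0.
The next function advances the current read position to the next one and returns it. If the end of the file has been reached, it returns NULL.
The stop function is called after the seq_file file has been read. It is similar to the close file operation and performs any necessary cleanup, such as freeing memory.
The show function formats the output. It returns 0 on success and an error code otherwise.

seq_file also defines some helper functions for formatted output: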
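The calling order of the four operations can be easier to see in a small model. The sketch below is user-space C that imitates one complete traversal (start, then show and next alternately until NULL, then stop); it is not the kernel's seq_read() and ignores partial reads. The model_seq types, the ex_* callbacks and the three sample names are all made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Minimal output buffer standing in for struct seq_file. */
struct model_seq {
    char buf[256];
    size_t len;
};

/* Same four operations as seq_operations, with long instead of loff_t. */
struct model_seq_ops {
    void *(*start)(struct model_seq *m, long *pos);
    void  (*stop)(struct model_seq *m, void *v);
    void *(*next)(struct model_seq *m, void *v, long *pos);
    int   (*show)(struct model_seq *m, void *v);
};

/* Hypothetical records to iterate over. */
static const char *names[] = { "init", "kthreadd", "sshd" };
#define N_NAMES ((long)(sizeof(names) / sizeof(names[0])))

/* Return the record at *pos, or NULL if *pos is past the end. */
static void *ex_start(struct model_seq *m, long *pos)
{
    (void)m;
    return (*pos >= 0 && *pos < N_NAMES) ? (void *)&names[*pos] : NULL;
}

/* Advance the position and return the next record, or NULL at the end. */
static void *ex_next(struct model_seq *m, void *v, long *pos)
{
    (void)m;
    (void)v;
    ++*pos;
    return (*pos < N_NAMES) ? (void *)&names[*pos] : NULL;
}

/* Nothing to clean up in this model. */
static void ex_stop(struct model_seq *m, void *v)
{
    (void)m;
    (void)v;
}

/* Format one record into the output buffer; 0 on success, -1 if it does not fit. */
static int ex_show(struct model_seq *m, void *v)
{
    const char *name = *(const char **)v;
    size_t room = sizeof(m->buf) - m->len;
    int n = snprintf(m->buf + m->len, room, "%s\n", name);

    if (n < 0 || (size_t)n >= room)
        return -1;
    m->len += (size_t)n;
    return 0;
}

static const struct model_seq_ops ex_ops = {
    .start = ex_start,
    .stop  = ex_stop,
    .next  = ex_next,
    .show  = ex_show,
};

/* One traversal: start, then show/next until NULL or an error, then stop. */
int model_seq_traverse(struct model_seq *m, const struct model_seq_ops *ops)
{
    long pos = 0;
    int err = 0;
    void *v = ops->start(m, &pos);

    while (v && !err) {
        err = ops->show(m, v);
        if (!err)
            v = ops->next(m, v, &pos);
    }
    ops->stop(m, v);
    return err;
}
```

Note that start() must work for any position, not only 0: a reader that resumes in the middle of the file calls it with the saved position.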

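  /* seq_putc outputs a single character to the seq_file file. */
  int seq_putc(struct seq_file *m, char c);

  /* seq_puts outputs a string to the seq_file file. */
  int seq_puts(struct seq_file *m, const char *s);

  /* seq_escape is like seq_puts, except that every character of the first
     string argument that also appears in the second string argument is
     output in octal form, that is, escaped. */
  int seq_escape(struct seq_file *, const char *, const char *);

  /* seq_printf is the most commonly used output function. It writes the
     given arguments to the seq_file file according to the given format. */
  int seq_printf(struct seq_file *, const char *, ...)__attribute__ ((format(printf,2,3)));

  /* seq_path outputs a file name. The string argument lists the file-name
     characters that need escaping. It is mainly used by file systems. */
  int seq_path(struct seq_file *, struct vfsmount *, struct dentry *, char *);

After defining struct seq_operations, the user also needs to define the open function that opens the seq_file file, so that the structure is associated with the struct file of that file. For example, if struct seq_operations is defined as: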

  struct seq_operations exam_seq_ops = {
     .start = exam_seq_start,
     .stop = exam_seq_stop,
     .next = exam_seq_next,
     .show = exam_seq_show
  };

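then the open function should be defined as follows:

  static int exam_seq_open(struct inode *inode, struct file *file)
  {
      return seq_open(file, &exam_seq_ops);
  }

Note that seq_open is provided by seq_file. It associates the struct seq_operations structure with the seq_file file.

Finally, the user sets up the struct file_operations structure as follows: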

  struct  file_operations exam_seq_file_ops = {
          .owner   = THIS_MODULE,
          .open    = exam_seq_open,
          .read    = seq_read,
          .llseek  = seq_lseek,
          .release = seq_release
  };

Note that the user only needs to supply the open function; all the others are provided by seq_file.
The user then creates a /proc file and sets its file operations to exam_seq_file_ops:

  struct proc_dir_entry *entry;
  entry =  create_proc_entry("exam_seq_file", 0, NULL);
  if (entry)
          entry->proc_fops = &exam_seq_file_ops;

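For simple output, a seq_file user does not need to define and set up this many functions and structures. It is enough to define one show function and then use single_open to define the open function. The general steps for this simple form are:
1) Define a show function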

  int exam_show(struct seq_file *p, void *v)
  {
  …
  }

2) Define the open function

  int  exam_single_open(struct inode *inode, struct file *file)
  {
          return(single_open(file, exam_show,  NULL));
  }

Note that single_open must be used here, not seq_open.

3) Define the struct file_operations structure

  struct file_operations exam_single_seq_file_operations = {
          .open           = exam_single_open,
          .read           = seq_read,
          .llseek         = seq_lseek,
          .release        = single_release,
  };

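Note that if the open function uses single_open, the release function must be single_release, not seq_release. The example seqfile_exam.c below uses seq_file to provide a /proc interface that lists all processes currently running on the system. After compiling and inserting the module, the user can list all processes with the command "cat /proc/exam_esq_file".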

//kernel module: seqfile_exam.c
#include <linux/config.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/percpu.h>
#include <linux/sched.h>

static struct proc_dir_entry *entry;

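static void *l_start(struct seq_file *m, loff_t * pos)
{
        struct task_struct *task = &init_task;
        loff_t index = *pos;

        if (index == 0)
                seq_printf(m, "Current all the processes in system:\n"
                           "%-24s%-5s\n", "name", "pid");

        /*
         * A read that resumes at a non-zero position walks forward to the
         * task at that index; returning NULL here would cut the listing off
         * after the first buffer. The task list is walked without locking
         * to keep the example short.
         */
        while (index-- > 0) {
                task = next_task(task);
                if (task == &init_task)
                        return NULL;
        }
        return task;
}

static void *l_next(struct seq_file *m, void *p, loff_t * pos)
{
        struct task_struct *task = (struct task_struct *)p;

        ++*pos;
        task = next_task(task);
        if (task == &init_task) /* wrapped around: end of the list */
                return NULL;
        return task;
}

static void l_stop(struct seq_file *m, void *p)
{
}

static int l_show(struct seq_file *m, void *p)
{
        struct task_struct *task = (struct task_struct *)p;

        seq_printf(m, "%-24s%-5d\n", task->comm, task->pid);
        return 0;
}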

static struct seq_operations exam_seq_op = {
        .start = l_start,
        .next  = l_next,
        .stop  = l_stop,
        .show  = l_show
};

static int exam_seq_open(struct inode *inode, struct file *file)
{
        return seq_open(file, &exam_seq_op);
}

static struct file_operations exam_seq_fops = {
        .open = exam_seq_open,
        .read = seq_read,
        .llseek = seq_lseek,
        .release = seq_release,
};

static int __init exam_seq_init(void)
{

        entry = create_proc_entry("exam_esq_file", 0, NULL);
        if (entry)
                entry->proc_fops = &exam_seq_fops;

        return 0;
}

static void __exit exam_seq_exit(void)
{
        remove_proc_entry("exam_esq_file", NULL);
}

module_init(exam_seq_init);
module_exit(exam_seq_exit);
MODULE_LICENSE("GPL");

relayfs

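relayfs is a file system for relaying data quickly, and it takes its name from that function. It gives tools and applications that need to move large amounts of data from kernel space to user space a fast and efficient relay mechanism.

The channel is the central concept defined by relayfs. Each channel consists of a set of kernel buffers, one per CPU, and each kernel buffer is represented by a file in the relayfs file system. The kernel uses the write functions provided by relayfs to write the data to be relayed quickly into the channel buffer of the current CPU, and user-space applications retrieve that data quickly from the corresponding channel file through standard file I/O functions or mmap. The format of the data written to a channel is entirely up to the kernel module or subsystem that created the channel.

The relayfs user-space API:
relayfs implements five standard file I/O functions: open, mmap, read, poll and close.

open(): opens the file corresponding to a channel's buffer on one CPU.
mmap(): maps the opened channel buffer into the caller's address space.
read(): reads from the channel buffer. Bytes consumed by a read are not seen by later reads. If the channel operates in no-overwrite mode, a user-space application may read while a kernel module is writing. If the channel operates in overwrite mode, the result of reading while a kernel module is writing is unpredictable, so for an overwrite channel the user should read only after all writing to the channel has finished.
poll(): notifies user-space applications when relayed data crosses a sub-buffer boundary. The supported poll flags are POLLIN, POLLRDNORM and POLLERR.
close(): closes the file descriptor returned by open. If no process or kernel module still has the channel buffer open, close releases the buffer.

Note: a user-space application must make sure relayfs is mounted before using the API above, but the kernel does not need relayfs to be mounted in order to create and use channels. The following command mounts relayfs on /mnt/relay.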

  mount -t relayfs relayfs /mnt/relay
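From user space, a channel file is consumed with the ordinary calls listed above. The sketch below waits for data with poll() and then reads until nothing more is available. It is generic POSIX file I/O rather than anything specific to relayfs; the function name and the example path in the usage note are assumptions for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open a channel file, wait up to timeout_ms for readable data, then read
 * until the file has no more data or out is full.
 * Returns the number of bytes read, or -1 if the file cannot be opened.
 */
ssize_t drain_channel_file(const char *path, char *out, size_t cap, int timeout_ms)
{
    struct pollfd pfd;
    size_t total = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN)) {
        while (total < cap) {
            ssize_t n = read(fd, out + total, cap - total);

            if (n <= 0) /* 0: nothing left to read, negative: error */
                break;
            total += (size_t)n;
        }
    }
    close(fd);
    return (ssize_t)total;
}
```

With relayfs mounted on /mnt/relay and a channel whose base file name is, say, example, the per-CPU file for CPU 0 would be read with drain_channel_file("/mnt/relay/example0", buf, sizeof(buf), 1000).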

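The relayfs kernel API:

The API that relayfs provides to the kernel falls into four categories:
channel management, write functions, callbacks and helper functions.

The channel management functions are: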

relay_open(base_filename, parent, subbuf_size, n_subbufs, overwrite, callbacks) 
relay_close(chan)
relay_flush(chan)
relay_reset(chan)
relayfs_create_dir(name, parent)
relayfs_remove_dir(dentry)
relay_commit(buf, reserved, count)
relay_subbufs_consumed(chan, cpu, subbufs_consumed)

The write functions are:

relay_write(chan, data, length)
__relay_write(chan, data, length)
relay_reserve(chan, length)

The callbacks are:

subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
buf_mapped(buf, filp)
buf_unmapped(buf, filp)

The helper functions are:

relay_buf_full(buf)
subbuf_start_reserve(buf, length)

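As mentioned above, each channel consists of a set of channel buffers, one per CPU. Each buffer in turn consists of one or more sub-buffers arranged as a ring.

relay_open() creates a channel and allocates a buffer for each CPU. User-space applications access the channel buffers through the corresponding files in the relayfs file system. The base_filename parameter gives the channel's file name: relay_open() creates base_filename0 through base_filenameN-1 in relayfs, one channel file per CPU, where N is the number of CPUs. By default these files are created in the root of the relayfs file system; if the parent parameter is not NULL, they are created in the parent directory instead. The parent directory is created with relayfs_create_dir() and removed with relayfs_remove_dir(); whoever creates a directory is responsible for removing it when it is no longer needed.
The subbuf_size parameter is the size of each sub-buffer in a channel buffer.
The n_subbufs parameter is the number of sub-buffers in a channel buffer, so the actual size of a channel buffer is (subbuf_size x n_subbufs).
The overwrite parameter selects the channel's operating mode. relayfs offers two write modes, overwrite and no-overwrite, and which one is in effect depends entirely on how subbuf_start is implemented. In overwrite mode, when the buffer is full, writing continues unconditionally from the start of the buffer regardless of whether the data there has been read by the user application, so a write never fails. In no-overwrite mode a write fails when the buffer is full, and the kernel notifies relayfs through relay_subbufs_consumed() as the user-space application reads buffered data. If the user-space application does not consume the data in time, or the buffer is full, both modes lose data. The only difference is that overwrite mode loses data at the beginning of the buffer while no-overwrite mode loses it at the end. Once the kernel calls relay_subbufs_consumed() again, a full buffer is no longer full and can be written again. When the buffer becomes full, relayfs calls the buf_full() callback to notify the kernel module or subsystem. When new data is too large to fit in the space left in the current sub-buffer, relayfs calls the subbuf_start() callback to tell the kernel module or subsystem that a new sub-buffer is needed. The kernel module must do the following in that callback: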

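1) Initialize the new sub-buffer.

2) If step 1 succeeds, finalize the current sub-buffer.

3) If step 2 succeeds, return whether the sub-buffer switch completed successfully.

In no-overwrite mode, the subbuf_start() callback should be implemented as follows: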

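static int subbuf_start(struct rchan_buf *buf, void *subbuf, void *prev_subbuf, unsigned int prev_padding)
{
        if (prev_subbuf)
               *((unsigned *)prev_subbuf) = prev_padding;

        if (relay_buf_full(buf))
               return 0;

        subbuf_start_reserve(buf, sizeof(unsigned int));
        return 1;
}

If the current buffer is full, that is, none of its sub-buffers has been read, the function returns 0 to indicate that the sub-buffer switch failed. After reading sub-buffers, the reader is responsible for notifying relayfs through relay_subbufs_consumed(). Once a reader has consumed sub-buffer data, relay_buf_full() returns 0 and the sub-buffer switch succeeds.

In overwrite mode, the implementation of subbuf_start() is similar to the no-overwrite version: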

static int subbuf_start(struct rchan_buf *buf, void *subbuf, void *prev_subbuf, unsigned int prev_padding)
{
        if (prev_subbuf)
               *((unsigned *)prev_subbuf) = prev_padding;

        subbuf_start_reserve(buf, sizeof(unsigned int));

        return 1;
}

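The only difference is that relay_buf_full() is not checked, because in this mode the buffer is a ring that can be written unconditionally. A sub-buffer switch therefore always succeeds, and relay_subbufs_consumed() never needs to be called. If the channel writer does not define subbuf_start(), a default implementation is used. Header space can be reserved in each sub-buffer by calling the helper subbuf_start_reserve() from the subbuf_start() callback, and the reserved space can hold any information that is needed. In the examples above it stores the number of padding bytes of the sub-buffer: the subbuf_start() implementation sets the padding value of the previous sub-buffer. That padding value is passed to subbuf_start() together with a pointer to the previous sub-buffer, because the padding is only known once a sub-buffer is complete. subbuf_start() is also called when the first sub-buffer of each channel buffer is allocated at channel creation, so that header space can be reserved, but in that case the previous sub-buffer pointer is NULL.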
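The difference between the two modes comes down to a little bookkeeping on produced and consumed sub-buffer counts. The following user-space model illustrates it under simplifying assumptions: the struct and function names are made up for this sketch (they are not relayfs symbols), only whole sub-buffers are counted, and no data is actually stored.

```c
#include <assert.h>
#include <stddef.h>

/* Bookkeeping for one per-CPU channel buffer made of n_subbufs sub-buffers. */
struct model_rchan_buf {
    size_t n_subbufs;
    size_t subbufs_produced; /* sub-buffers filled by the writer */
    size_t subbufs_consumed; /* sub-buffers the reader reported as read */
    size_t dropped;          /* switches refused because the buffer was full */
};

/* Full means every sub-buffer holds data the reader has not consumed yet. */
int model_buf_full(const struct model_rchan_buf *buf)
{
    return buf->subbufs_produced - buf->subbufs_consumed >= buf->n_subbufs;
}

/*
 * Try to move the writer to the next sub-buffer.
 * No-overwrite mode refuses the switch when the buffer is full (returns 0);
 * overwrite mode always succeeds and reuses the oldest sub-buffer.
 */
int model_subbuf_switch(struct model_rchan_buf *buf, int overwrite)
{
    if (!overwrite && model_buf_full(buf)) {
        buf->dropped++;
        return 0;
    }
    buf->subbufs_produced++;
    return 1;
}

/* The reader reports that it has finished with n sub-buffers. */
void model_subbufs_consumed(struct model_rchan_buf *buf, size_t n)
{
    buf->subbufs_consumed += n;
    if (buf->subbufs_consumed > buf->subbufs_produced)
        buf->subbufs_consumed = buf->subbufs_produced;
}

/* Index of the sub-buffer the writer is currently filling (ring position). */
size_t model_current_subbuf(const struct model_rchan_buf *buf)
{
    return buf->subbufs_produced % buf->n_subbufs;
}
```

With four sub-buffers, a no-overwrite writer is refused on the fifth switch until the reader reports at least one consumed sub-buffer, while an overwrite writer simply keeps wrapping around the ring.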

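A kernel module writes the data to be relayed into the channel buffer with relay_write() or __relay_write(). The difference is that the former disables local interrupts while the latter only disables preemption. The former can therefore be used safely in any kernel context, whereas the latter should only be used when no interrupt context will write to the channel buffer. Neither function returns a value, so the caller cannot tell directly whether a write failed. When the buffer is full and the write mode is no-overwrite, relayfs notifies the kernel module through the buf_full callback.

relay_reserve() reserves a region of the channel buffer to be written later. Kernel modules that write directly into the channel buffer without a temporary buffer may need it. A module that uses this function can call relay_commit() to notify relayfs when it actually writes the reserved region. When all the reserved space has been written and relayfs has been notified through relay_commit(), relayfs calls the deliver() callback to tell the kernel module that a complete sub-buffer has been filled. Because reserving space is not fully under the control of the module writing to the channel, relay_reserve() cannot protect the buffer well, so a module that calls relay_reserve() must use appropriate synchronization.

When a kernel module has finished using a channel it must call relay_close() to close it. If no user still references the channel, the channel and its buffers are all freed.

relay_flush() forces a sub-buffer switch on all of the channel's buffers. It is used before a channel is closed, to terminate and process the last sub-buffers.

relay_reset() restores a channel to its initial state, so the channel can be reused without releasing the existing memory mappings and allocating new channel buffers. The call is only safe when nothing is writing to the channel.

The buf_mapped() callback is called when a channel buffer is mapped into user space.

The buf_unmapped() callback is called when that mapping is released. A kernel module can use these two callbacks to trigger kernel actions such as starting or stopping writes to the channel.

The source package contains the relayfs example relayfs_exam.c. It consists of a kernel module only; more complex uses need a cooperating application. The module implements functionality similar to that of the seq_file example.

To use relayfs, the kernel must of course support relayfs and the file system must be mounted. The following is the output from using the module on the author's system: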

$ mkdir -p /relayfs
$ insmod ./relayfs-exam.ko
$ mount -t relayfs relayfs /relayfs
$ cat /relayfs/example0
…
$

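relayfs is a relatively complex way of exchanging data between the kernel and user space, and this example only shows a fairly simple use. For more complex uses, see the relayfs examples page at http://relayfs.sourceforge.net/examples.html.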

//kernel module: relayfs-exam.c
#include <linux/module.h>
#include <linux/relayfs_fs.h>
#include <linux/string.h>
#include <linux/sched.h>

#define WRITE_PERIOD (HZ * 60)
static struct rchan *   chan;
static size_t           subbuf_size = 65536;
static size_t           n_subbufs = 4;
static char buffer[256];

void relayfs_exam_write(unsigned long data);

static DEFINE_TIMER(relayfs_exam_timer, relayfs_exam_write, 0, 0);

void relayfs_exam_write(unsigned long data)
{
        int len;
        struct task_struct *p = NULL;

        len = sprintf(buffer, "Current all the processes:\n"); 
        len += sprintf(buffer + len, "process name\t\tpid\n"); 
        relay_write(chan, buffer, len);

        for_each_process(p) {
                len = sprintf(buffer, "%s\t\t%d\n", p->comm, p->pid); 
                relay_write(chan, buffer, len);
        }
        len = sprintf(buffer, "\n\n"); 
        relay_write(chan, buffer, len);

        relayfs_exam_timer.expires = jiffies + WRITE_PERIOD;
        add_timer(&relayfs_exam_timer);
}


/*
 * subbuf_start() relayfs callback.
 *
 * Defined so that we can 1) reserve room for the padding count at the start
 * of each sub-buffer, and 2) refuse the sub-buffer switch when the buffer is
 * full (no-overwrite mode).
 */
static int subbuf_start(struct rchan_buf *buf,
                        void *subbuf,
                        void *prev_subbuf,
                        unsigned int prev_padding)
{
        if (prev_subbuf)
                *((unsigned *)prev_subbuf) = prev_padding;

        if (relay_buf_full(buf))
                return 0;

        subbuf_start_reserve(buf, sizeof(unsigned int));

        return 1;
}

/*
 * relayfs callbacks
 */
static struct rchan_callbacks relayfs_callbacks =
{
        .subbuf_start = subbuf_start,
};

/**
 *      module init - creates the relay channel and starts the periodic writer
 *
 *      Returns 0 on success, negative otherwise.
 */
static int init(void)
{

        chan = relay_open("example", NULL, subbuf_size,
                          n_subbufs, &relayfs_callbacks);

        if (!chan) {
                printk(KERN_ERR "relay channel creation failed.\n");
                return -ENOMEM;
        }
        relayfs_exam_timer.expires = jiffies + WRITE_PERIOD;
        add_timer(&relayfs_exam_timer);

        return 0;
}

static void cleanup(void)
{
        del_timer_sync(&relayfs_exam_timer);
        if (chan) {
                relay_close(chan);
                chan = NULL;
        }
}

module_init(init);
module_exit(cleanup);
MODULE_LICENSE("GPL");
