20189220 Yu Chao, "Linux Kernel Principles and Analysis", Week 9 Assignment

阿新 • Published: 2018-12-09


Understanding when process scheduling happens: tracing and analyzing process scheduling and process switching

Summary of this chapter's background knowledge

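In general, process scheduling happens at three kinds of points. First, during interrupt handling (including timer interrupts, I/O interrupts, system calls, and exceptions), schedule is either called directly or called on the return to user mode according to the need_resched flag. Second, a kernel thread can call schedule directly to switch processes, and it can also be scheduled during interrupt handling; in other words, kernel threads, as a special kind of process, can be scheduled both actively and passively. Third, a user-mode process cannot schedule itself actively; it can only be scheduled at some point after trapping into kernel mode, i.e. during interrupt handling.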
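To make the active/passive distinction concrete, here is a minimal user-space sketch, assuming a single per-task flag in the role of need_resched. All toy_* names and schedule_calls are hypothetical and this is not kernel code: a timer tick only marks the task, the return-to-user path is where the switch actually happens, and a kernel thread may call the scheduler directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy task: need_resched plays the role of the kernel flag. */
struct toy_task {
    bool need_resched;   /* set when the task should give up the CPU */
    int  timeslice;      /* remaining timer ticks */
};

static int schedule_calls = 0;   /* how often the stand-in scheduler ran */

/* Stand-in for schedule(): count the call and clear the flag. */
static void toy_schedule(struct toy_task *t)
{
    assert(t != NULL);
    schedule_calls++;
    t->need_resched = false;
}

/* Timer interrupt: consume a tick; when the slice runs out, only mark the
 * task. No switch happens here (passive scheduling). */
static void toy_timer_interrupt(struct toy_task *t)
{
    if (t->timeslice > 0 && --t->timeslice == 0)
        t->need_resched = true;
}

/* Return-to-user path: the point where a user process can be switched. */
static void toy_return_to_user(struct toy_task *t)
{
    if (t->need_resched)
        toy_schedule(t);
}

/* A kernel thread may call the scheduler directly (active scheduling). */
static void toy_kthread_yield(struct toy_task *t)
{
    toy_schedule(t);
}
```

A driver would call toy_timer_interrupt() once per tick and toy_return_to_user() on every exit to user mode; the block deliberately has no main() so it can be dropped into a test harness.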
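To control process execution, the kernel must be able to suspend the process currently running on the CPU and resume the execution of some previously suspended process; this is called a process switch, task switch, or context switch. Suspending the process running on the CPU differs from saving state on an interrupt: before and after an interrupt, execution stays in the same process context and merely moves from user mode to kernel mode. In other words, an interrupt is handled within the same process, whereas a process context switch moves between different processes.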
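The difference can be sketched with a toy register file, assuming the saved context is a single struct per process (the names toy_regs, toy_proc, toy_interrupt, and toy_process_switch are hypothetical, and real kernels save far more state): an interrupt saves and restores the same process, while a process switch saves into one process and loads another.

```c
#include <assert.h>

/* Hypothetical, heavily simplified register file. */
struct toy_regs { unsigned long ip, sp; };

struct toy_proc {
    int pid;
    struct toy_regs saved;   /* context kept while the process is off the CPU */
};

static struct toy_regs cpu;  /* the live CPU registers */

/* Interrupt: save and restore inside the SAME process context. */
static void toy_interrupt(struct toy_proc *cur, unsigned long handler_ip)
{
    cur->saved = cpu;        /* saved on cur's own kernel stack */
    cpu.ip = handler_ip;     /* the handler runs, still as cur */
    cpu = cur->saved;        /* return: the same process continues */
}

/* Process switch: save into one process, resume a DIFFERENT one. */
static void toy_process_switch(struct toy_proc *prev, struct toy_proc *next)
{
    assert(prev != next);
    prev->saved = cpu;       /* suspend prev */
    cpu = next->saved;       /* resume next where it left off */
}
```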

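The process context consists of the user address space (program code, data, user stack, etc.), control information (the process descriptor, the kernel stack, etc.), and the hardware context (note that an interrupt also saves a hardware context, only in a different way).

The schedule function selects a new process to run and calls the context_switch function to switch context; context_switch in turn calls the switch_to macro to perform the key part of the switch. In switch_to(prev, next, last), prev points to the current process and next points to the process being scheduled in.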
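The split between address space and hardware context shows up in what a context switch has to change. The sketch below assumes a toy model in which the address space is just a page-table root and the hardware context is just sp/ip; it mirrors the idea that context_switch first switches the address space (switch_mm in the kernel) and a kernel thread, which has no mm of its own, borrows the previous one before switch_to swaps the registers. All toy_* names, cr3, and cpu are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; names only loosely mirror the kernel's. */
struct toy_mm { unsigned long pgd; };          /* user address space (page-table root) */
struct toy_thread { unsigned long sp, ip; };   /* hardware context */

struct toy_task {
    struct toy_mm *mm;         /* NULL for a kernel thread */
    struct toy_mm *active_mm;  /* address space actually in use */
    struct toy_thread thread;  /* saved sp/ip while switched out */
};

static unsigned long cr3;      /* stands in for the page-table register */
static struct toy_thread cpu;  /* stands in for the live sp/ip */

/* Swap hardware context: save prev's, load next's. */
static void toy_switch_to(struct toy_task *prev, struct toy_task *next)
{
    prev->thread = cpu;
    cpu = next->thread;
}

static void toy_context_switch(struct toy_task *prev, struct toy_task *next)
{
    assert(prev != NULL && next != NULL);
    if (next->mm) {
        next->active_mm = next->mm;
        cr3 = next->mm->pgd;                /* like switch_mm() */
    } else {
        next->active_mm = prev->active_mm;  /* kernel thread borrows prev's mm */
    }
    toy_switch_to(prev, next);
}
```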

Experiment procedure

1. Debug with gdb and set breakpoints at the relevant functions.

2. Breakpoint screenshot of schedule(), the core of process scheduling.

3. Breakpoint screenshot of context_switch, which performs the process switch.

4. Breakpoint screenshot of pick_next_task, which applies a scheduling policy to choose the next process to switch to.

Code analysis

static void __sched __schedule(void)
{
  struct task_struct *prev, *next;
  unsigned long *switch_count;
  struct rq *rq;
  int cpu;

need_resched:
  preempt_disable();
  cpu = smp_processor_id();
  rq = cpu_rq(cpu);
  rcu_note_context_switch(cpu);
  prev = rq->curr;

  schedule_debug(prev);

  if (sched_feat(HRTICK))
    hrtick_clear(rq);

  /*
   * Make sure that signal_pending_state()->signal_pending() below
   * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
   * done by the caller to avoid the race with signal_wake_up().
   */
  smp_mb__before_spinlock();
  raw_spin_lock_irq(&rq->lock);

  switch_count = &prev->nivcsw;
  if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
    if (unlikely(signal_pending_state(prev->state, prev))) {
      prev->state = TASK_RUNNING;
    } else {
      deactivate_task(rq, prev, DEQUEUE_SLEEP);
      prev->on_rq = 0;

      /*
       * If a worker went to sleep, notify and ask workqueue
       * whether it wants to wake up a task to maintain
       * concurrency.
       */
      if (prev->flags & PF_WQ_WORKER) {
        struct task_struct *to_wakeup;

        to_wakeup = wq_worker_sleeping(prev, cpu);
        if (to_wakeup)
          try_to_wake_up_local(to_wakeup);
      }
    }
    switch_count = &prev->nvcsw;
  }

  if (task_on_rq_queued(prev) || rq->skip_clock_update < 0)
    update_rq_clock(rq);

  next = pick_next_task(rq, prev);
  clear_tsk_need_resched(prev);
  clear_preempt_need_resched();
  rq->skip_clock_update = 0;

  if (likely(prev != next)) {
    rq->nr_switches++;
    rq->curr = next;
    ++*switch_count;

    context_switch(rq, prev, next); /* unlocks the rq */
    /*
     * The context switch have flipped the stack from under us
     * and restored the local variables which were saved when
     * this task called schedule() in the past. prev == current
     * is still correct, but it can be moved to another cpu/rq.
     */
    cpu = smp_processor_id();
    rq = cpu_rq(cpu);
  } else
    raw_spin_unlock_irq(&rq->lock);

  post_schedule(rq);

  sched_preempt_enable_no_resched();
  if (need_resched())
    goto need_resched;
}

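The schedule function mainly does the following:
1. Handles preemption (preempt_disable() on entry, sched_preempt_enable_no_resched() on exit);
2. Checks prev's state: if prev is about to block but has a signal pending, its state is reset to TASK_RUNNING; otherwise it is removed from the runqueue with deactivate_task();
3. Updates the runqueue clock (update_rq_clock());
4. next = pick_next_task(rq, prev); // chooses the next process to run
5. context_switch(rq, prev, next); // switches process context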
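The bookkeeping in this part of __schedule() (deactivating a task that is going to sleep, choosing next, and charging the switch to nvcsw for a voluntary switch or nivcsw for a preemption) can be sketched with a toy runqueue. This assumes a plain FIFO round-robin pick in place of the kernel's scheduling classes and ignores locking, signals, and the idle task; all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_RUNNING = 0, TOY_SLEEPING = 1 };

struct toy_task {
    int state;             /* 0 = runnable, nonzero = about to sleep */
    int on_rq;
    unsigned long nvcsw;   /* voluntary context switches */
    unsigned long nivcsw;  /* involuntary context switches */
};

#define TOY_NR 8
struct toy_rq {
    struct toy_task *queue[TOY_NR];  /* FIFO of waiting runnable tasks */
    int head, count;
    struct toy_task *curr;
    unsigned long nr_switches;
};

static void toy_enqueue(struct toy_rq *rq, struct toy_task *t)
{
    assert(rq->count < TOY_NR);
    rq->queue[(rq->head + rq->count++) % TOY_NR] = t;
    t->on_rq = 1;
}

/* Round-robin stand-in for pick_next_task(). With nothing else queued it
 * returns prev; a real kernel would fall back to the idle task. */
static struct toy_task *toy_pick_next(struct toy_rq *rq, struct toy_task *prev)
{
    if (rq->count == 0)
        return prev;
    struct toy_task *t = rq->queue[rq->head];
    rq->head = (rq->head + 1) % TOY_NR;
    rq->count--;
    return t;
}

static void toy_schedule(struct toy_rq *rq)
{
    struct toy_task *prev = rq->curr, *next;
    unsigned long *switch_count = &prev->nivcsw;  /* assume preemption */

    if (prev->state != TOY_RUNNING) {
        prev->on_rq = 0;               /* like deactivate_task() */
        switch_count = &prev->nvcsw;   /* going to sleep: voluntary */
    } else {
        toy_enqueue(rq, prev);         /* preempted: back of the queue */
    }

    next = toy_pick_next(rq, prev);
    if (next != prev) {                /* only a real switch is counted */
        rq->nr_switches++;
        rq->curr = next;
        ++*switch_count;
    }
}
```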

The switch_to code

asm volatile("pushfl\n\t"                /* save    flags */
             "pushl %%ebp\n\t"           /* save    EBP   */
             "movl %%esp,%[prev_sp]\n\t" /* save    ESP   */
             "movl %[next_sp],%%esp\n\t" /* restore ESP   */
             "movl $1f,%[prev_ip]\n\t"   /* save    EIP   */
             "pushl %[next_ip]\n\t"      /* restore EIP   */
             "jmp __switch_to\n"         /* regparm call  */
             "1:\t"
             "popl %%ebp\n\t"            /* restore EBP   */
             "popfl\n"                   /* restore flags */

             /* output parameters */
             : [prev_sp] "=m" (prev->thread.sp), /* "=m": memory operand; what is stored to [prev_sp] ends up in prev->thread.sp */
               [prev_ip] "=m" (prev->thread.ip),
               "=a" (last),                      /* "=a": after the asm, last receives the value left in eax */

               /* clobbered output registers: */
               "=b" (ebx), "=c" (ecx), "=d" (edx), /* b = ebx, c = ecx, d = edx, S = esi, D = edi */
               "=S" (esi), "=D" (edi)

               /* input parameters: */
             : [next_sp]  "m" (next->thread.sp), /* [next_sp] reads next->thread.sp from memory */
               [next_ip]  "m" (next->thread.ip),

               /* regparm parameters for __switch_to(): */
               [prev]     "a" (prev),            /* eax = prev, edx = next */
               [next]     "d" (next)

             : /* reloaded segment registers */
               "memory");
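The effect of the key instructions above, saving esp/eip into prev->thread and loading them from next->thread, can be imitated in user space with the POSIX ucontext API (getcontext, makecontext, swapcontext; marked obsolescent in POSIX but still provided by glibc on Linux). This is only an analogy under that assumption, not how switch_to is implemented: swapcontext() saves the whole register set, including the stack and instruction pointers, into prev's context and loads next's. The toy_* names, stack_b, trace, and run_demo() are hypothetical.

```c
#include <assert.h>
#include <ucontext.h>

/* Each toy task owns a saved context (registers, incl. sp and ip). */
struct toy_task { ucontext_t ctx; };

static struct toy_task task_a, task_b;
static char stack_b[64 * 1024];   /* B's private stack */
static int trace[8];
static int ntrace = 0;

/* Save prev's registers and resume next. Returns only when some other
 * task switches back to prev, much like switch_to in context_switch(). */
static void toy_switch_to(struct toy_task *prev, struct toy_task *next)
{
    int rc = swapcontext(&prev->ctx, &next->ctx);
    assert(rc == 0);
    (void)rc;
}

static void task_b_body(void)
{
    trace[ntrace++] = 2;              /* B runs on its own stack */
    toy_switch_to(&task_b, &task_a);  /* B -> A; never resumed here */
}

static void run_demo(void)
{
    int rc = getcontext(&task_b.ctx);
    assert(rc == 0);
    (void)rc;
    task_b.ctx.uc_stack.ss_sp = stack_b;
    task_b.ctx.uc_stack.ss_size = sizeof stack_b;
    task_b.ctx.uc_link = &task_a.ctx; /* where to go if task_b_body returns */
    makecontext(&task_b.ctx, task_b_body, 0);

    trace[ntrace++] = 1;              /* A runs */
    toy_switch_to(&task_a, &task_b);  /* A -> B */
    trace[ntrace++] = 3;              /* A resumes right after the switch */
}
```

After run_demo() the trace reads 1, 2, 3: A runs, B runs on its own stack, and A continues exactly where it called toy_switch_to.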

switch_to switches from process A to process B in the following steps:
1. Copy two variables into registers: [prev] "a" (prev) and [next] "d" (next), i.e. eax <== prev_A (or eax <== %p(%ebp_A)) and edx <== next_A (or edx <== %n(%ebp_A)).

2. Save A's eflags and ebp. Note that because esp still points into A's stack at this point, they are saved on A's kernel stack.

3. Save the current esp into A's process descriptor: prev_A->thread.sp <== esp_A. When switch_to is called, prev points to A's own process descriptor.

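4. Load esp_B, which was saved when B was last switched out, from the descriptor of next (process B). Note that in A, next points to B's process descriptor. From this point on, the CPU is effectively running process B, because esp already points to B's kernel stack. However, ebp still points to A's kernel stack, so all local variables are still A's; for example, next is really %n(%ebp_A), i.e. next_A, which points to B's process descriptor.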

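5. Save the address of the instruction labeled 1 into the ip field of A's process descriptor: when A is later switched back in, it resumes from this instruction. The mechanism is the same one by which B resumes in the following steps.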

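6. Push the return address (next_ip) onto the stack, then jump to __switch_to(), which completes the hardware context switch. Note: if B was previously switched out by switch_to, [next_ip] holds the label 1f; but if B was just created and has never been switched out, [next_ip] holds ret_from_fork (see copy_thread()).
When __switch_to() returns here, its return value prev_A is placed in %eax again, so throughout the switch_to macro eax always holds prev_A, or more precisely, the "pointer" to A's process descriptor.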

7. After returning from __switch_to(), execution continues after label 1: ebp is restored from B's kernel stack, and B's eflags are restored.

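8. Write eax into last, so that B's stack holds the correct prev. Thus last is effectively prev, and after the switch_to macro finishes, prev_B correctly points to A's process descriptor. In effect, last copies the address of A's process descriptor from A's stack to B's stack.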

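9. At this point switch_to is complete: A stops running and B starts. Later, A may be chosen again in some scheduling round, resulting in a call like switch_to(C, A). When A is scheduled again, it continues in context_switch() right after switch_to, and the prev_A it sees now points to C's process descriptor.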
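The role of last in steps 8 and 9 can be checked with a small simulation. It assumes each task's context_switch() locals prev/next live in a per-task frame standing in for the kernel stack, and models eax as a plain variable; toy_task, toy_frame, current_task, and toy_switch_to are hypothetical names, and no real stack switch takes place.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task;

/* context_switch()'s locals as they sit on one task's kernel stack. */
struct toy_frame {
    struct toy_task *prev;
    struct toy_task *next;
};

struct toy_task {
    const char *name;
    struct toy_frame frame;   /* meaningful while the task is switched out */
};

static struct toy_task *current_task;

/* Simulates switch_to(prev, next, last) called with last == prev.
 * Returns the frame of the task that runs afterwards. */
static struct toy_frame *toy_switch_to(struct toy_task *prev,
                                       struct toy_task *next)
{
    struct toy_task *eax = prev;   /* [prev] "a" (prev): eax = prev */

    prev->frame.prev = prev;       /* prev's locals stay on prev's stack */
    prev->frame.next = next;
    current_task = next;           /* esp now points into next's stack */

    /* Without this step next would still see the prev it stored when it
     * was switched out, i.e. itself; "=a" (last) writes eax into it. */
    next->frame.prev = eax;
    return &next->frame;
}
```

Running A to B, B to C, then C to A shows that A resumes with prev pointing to C, while its own next local still names B.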

Chapter summary

General case:

Switching from the running user-mode process A to running user-mode process B:

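1. User-mode process A is running;
2. Interrupt: save cs:eip/ss:esp/eflags of the current process to the kernel stack, then load cs:eip (entry of the specific ISR) and ss:esp (pointing to the kernel stack);
3. SAVE_ALL // save the context;
4. schedule is called during interrupt handling or before the interrupt returns; within it, switch_to performs the key process context switch;
5. Execution resumes after label 1, now as process B (still in kernel mode, on B's kernel stack);
6. restore_all // restore the context;
7. iret: pop cs:eip/ss:esp/eflags from the kernel stack;
8. User-mode process B continues running.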
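The key point of steps 2 to 7 is that the user-mode registers are saved on A's kernel stack but restored from B's, because switch_to changed which kernel stack is current in between. A minimal sketch, assuming a tiny fixed-size array per task as the kernel stack and a three-field register set; all toy_* names, cpu, and current are hypothetical.

```c
#include <assert.h>

/* Hypothetical user-mode register set saved on interrupt entry. */
struct toy_uregs { unsigned long ip, sp, flags; };

struct toy_task {
    struct toy_uregs kstack[4];  /* tiny per-task "kernel stack" */
    int top;
};

static struct toy_uregs cpu;     /* live user-mode registers */
static struct toy_task *current; /* whose kernel stack is in use */

static void toy_push(struct toy_task *t, struct toy_uregs r)
{
    assert(t->top < 4);
    t->kstack[t->top++] = r;
}

static struct toy_uregs toy_pop(struct toy_task *t)
{
    assert(t->top > 0);
    return t->kstack[--t->top];
}

/* Steps 2-3: interrupt entry + SAVE_ALL push onto current's kernel stack. */
static void toy_interrupt_entry(void) { toy_push(current, cpu); }

/* Step 4: schedule()/switch_to changes which kernel stack is current. */
static void toy_schedule_to(struct toy_task *next) { current = next; }

/* Steps 6-7: restore_all + iret pop from the NEW current's kernel stack. */
static void toy_interrupt_exit(void) { cpu = toy_pop(current); }
```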

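Special cases:

1. Scheduling during interrupt handling switches between user-mode processes and kernel threads; this is similar to the general case;
2. A kernel thread calls schedule directly: there is only a process context switch and no interrupt context switch;
3. A child process created by a system call such as fork starts executing at a special point and returns to user mode from there;
4. Returning to user mode after loading a new executable, e.g. with execve.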
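For special case 3, the detail from step 6 of the switch_to walkthrough applies: a child that has never been switched out has no label 1 to return to, so copy_thread() points its saved ip at ret_from_fork, and the child's fork() return value of 0 comes from the eax in its saved register frame, which copy_thread() sets to 0. The sketch below models the saved ip as a function pointer; all toy_* names are hypothetical.

```c
#include <assert.h>

struct toy_task;
/* Saved "instruction pointer": where a task resumes when switched in. */
typedef int (*toy_resume_fn)(struct toy_task *);

struct toy_task {
    toy_resume_fn ip;   /* like thread.ip */
    int ax;             /* like the eax saved in the task's register frame */
};

/* A task switched out earlier by switch_to resumes after label 1. */
static int toy_resume_at_label1(struct toy_task *t)
{
    (void)t;
    return 1;           /* marker meaning "resumed after label 1" */
}

/* A new child starts here and returns to user mode with its saved eax. */
static int toy_ret_from_fork(struct toy_task *t)
{
    return t->ax;
}

/* Mimics the relevant part of copy_thread(): the child sees 0 from fork(). */
static void toy_copy_thread(struct toy_task *child)
{
    child->ax = 0;
    child->ip = toy_ret_from_fork;
}

/* Switching a task in transfers control to its saved ip. */
static int toy_switch_in(struct toy_task *t)
{
    return t->ip(t);
}
```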
