1. 程式人生 > >十四 第三章再續 快速選擇SELECT演算法的深入分析與實現

十四 第三章再續 快速選擇SELECT演算法的深入分析與實現

               

                          十四、亦第三章再續:快速選擇SELECT演算法的深入分析與實現

前言

    經典演算法研究系列已經寫了十三個演算法,共計22篇文章(詳情,見這:十三個經典演算法研究與總結、目錄+索引),我很怕我自己不再把這個算法系列給繼續寫下去了。沉思良久,到底是不想因為要創作狂想曲系列而耽擱這個經典演算法研究系列,何況它,至今反響還不錯。

    ok,狂想曲第三章提出了一個演算法,就是快速選擇SELECT演算法,關於這個SELECT演算法通過選取陣列中中位數的中位數作為樞紐元能保證在最壞情況下,亦能做到線性O(N)的時間複雜度的證明,在狂想曲第三章也已經給出。

   本文咱們從快速排序演算法分析開始(因為如你所知,快速選擇演算法與快速排序演算法在partition劃分過程上是類似的),參考Mark的資料結構與演算法分析-c語言描述一書,而後逐步深入分析快速選擇SELECT演算法,最後,給出SELECT演算法的程式實現。

   同時,本文有部分內容來自狂想曲系列第三章,也算是對第三章、尋找最小的k個數的一個總結。yeah,有任何問題,歡迎各位批評指正,如果你挑出了本文章或本blog任何一個問題或錯誤,當即免費給予單獨贈送本blog最新一期第6期的博文集錦CHM檔案,謝謝。

第一節、快速排序

1.1、快速排序演算法的介紹

      關於快速排序演算法,本人已經寫了3篇文章(可參見其中的兩篇:1、

十二、快速排序演算法之所有版本的c/c++實現,2、一之續、快速排序演算法的深入分析),為何又要舊事重提列?正如很多事物都有相似的地方,而咱們面臨的問題--快速選擇演算法中的劃分過程等同於快速排序,所以,在分析快速選擇SELECT演算法之前,咱們先再來簡單回顧和分析下快速排序,ok,今天看到Mark的資料結構與演算法分析-c語言描述一書上對快速排序也有不錯的介紹,所以為了增加點新鮮感,就不用自己以前的文章而改為直接引用Mark的敘述了:

    As its name implies, quicksort is the fastest known sorting algorithm in practice. Its average running time is O(n log n)(快速排序是實踐中已知的最快的排序演算法,他的平均執行時間為O(N*logN)). It is very fast, mainly due to a very tight and highly optimized inner loop. It has O(n2) worst-case performance(最壞情形的效能為O(N^2)), but this can be made exponentially unlikely with a little effort.

    The quicksort algorithm is simple to understand and prove correct, although for many years it had the reputation of being an algorithm that could in theory be highly optimized but in practice was impossible to code correctly (no doubt because of FORTRAN).

    Like mergesort, quicksort is a divide-and-conquer recursive algorithm(像歸併排序一樣,快速排序也是一種採取分治方法的遞迴演算法). The basic algorithm to sort an array S consists of the following four easy steps(通過下面的4個步驟將陣列S排序的演算法如下):

1. If the number of elements in S is 0 or 1, then return(如果S中元素個數是0或1,則返回).2. Pick any element v in S. This is called the pivot(取S中任一元素v,作為樞紐元).3. Partition S - {v} (the remaining elements in S) into two disjoint groups(樞紐元v將S中其餘的元素分成兩個不相交的集合): S1 = {x(- S-{v}| x <= v}, and S2 = {x(- S-{v}| x >= v}.4. Return { quicksort(S1) followed by v followed by quicksort(S2)}.

下面依據上述步驟對序列13,81,92,43,65,31,57,26,75,0 進行第一趟劃分處理,可得到如下圖所示的過程:

1.2、選取樞紐元的幾種方法1、糟糕的方法    通常的做法是選擇陣列中第一個元素作為樞紐元,如果輸入是隨機的,那麼這是可以接受的。但是,如果輸入序列是預排序的或者是反序的,那麼依據這樣的樞紐元進行劃分則會出現相當糟糕的情況,因為可能所有的元素不是被劃入S1,就是都被劃入S2中。2、較好的方法   一個比較好的做法是隨機選取樞紐元,一般來說,這種策略是比較妥當的。3、三數取取中值方法   例如,輸入序列為 8, 1, 4, 9, 6, 3, 5, 2, 7, 0 ,它的左邊元素為8,右邊元素為0,中間位置|_left+right)/2_|上的元素為6,於是樞紐元為6.顯然,使用三數中值分割法消除了預排序輸入的壞情形,並且減少了快速排序大約5%(此為前人實驗所得資料,無法具體證明)的執行時間。

1.3、劃分過程   下面,我們再對序列8, 1, 4, 9, 6, 3, 5, 2, 7, 0進行第一趟劃分,我們要達到的劃分目的就是為了把所有小於樞紐元(據三數取中分割法取元素6為樞紐元)的元素移到陣列的左邊,而把所有大於樞紐元的元素全部移到陣列的右邊。

   此過程,如下述幾個圖所示:8  1  4  9  0  3  5  2  7  6                     i                               j

8  1  4  9  0  3  5  2  7  6                 i                           j

      After First Swap:----------------------------2  1  4  9  0  3  5  8  7  6               i                           j

      Before Second Swap:----------------------------2  1  4  9  0  3  5  8  7  6                            i           j

      After Second Swap:----------------------------2  1  4  5  0  3  9  8  7  6                           i           j

     Before Third Swap----------------------------2  1  4  5  0  3  9  8  7  6                    j   i   //i,j在元素3處碰頭之後,i++指向了9,最後與6交換後,得到:

2  1  4  5  0  3  6  8  7  9                                                         i         pivot

至此,第一趟劃分過程結束,樞紐元6將整個序列劃分成了左小右大兩個部分。

1.4、四個細節

下面,是4個值得你注意的細節問題:    1、我們要考慮一下,就是如何處理那些等於樞紐元的元素,問題在於當i遇到第一個等於樞紐元的關鍵字時,是否應該停止移動i,或者當j遇到一個等於樞紐元的元素時是否應該停止移動j。答案是:如果i,j遇到等於樞紐元的元素,那麼我們就讓i和j都停止移動。    2、對於很小的陣列,如陣列的大小N<=20時,快速排序不如插入排序好。    3、只通過元素間進行比較達到排序目的的任何排序演算法都需要進行O(N*logN)次比較,如快速排序演算法(最壞O(N^2),最好O(N*logN)),歸併排序演算法(最壞O(N*logN,不過歸併排序的問題在於合併兩個待排序的序列需要附加線性記憶體,在整個演算法中,還要將資料拷貝到臨時陣列再拷貝回來這樣一些額外的開銷,放慢了歸併排序的速度)等。    4、下面是實現三數取中的劃分方法的程式:

//三數取中分割法input_type median3( input_type a[], int left, int right )    //下面的快速排序演算法實現之一,及通過三數取中分割法尋找最小的k個數的快速選擇SELECT演算法都要呼叫這個median3函式{  int center;  center = (left + right) / 2;  if( a[left] > a[center] )    swap( &a[left], &a[center] );  if( a[left] > a[right] )    swap( &a[left], &a[right] );  if( a[center] > a[right] )    swap( &a[center], &a[right] );   /* invariant: a[left] <= a[center] <= a[right] */  swap( &a[center], &a[right-1] );     /* hide pivot */  return a[right-1];                   /* return pivot */ } 

下面的程式是利用上面的三數取中分割法而執行的快速排序演算法:

//快速排序的實現之一void q_sort( input_type a[], int left, int right ){  int i, j;  input_type pivot;  if( left + CUTOFF <= right )  {    pivot = median3( a, left, right );   //呼叫上面的實現三數取中分割法的median3函式  i=left; j=right-1;   //第8句   for(;;)    {      while( a[++i] < pivot );      while( a[--j] > pivot );     if( i < j )       swap( &a[i], &a[j] );      else        break;       //第16句     }    swap( &a[i], &a[right-1] );   /*restore pivot*/      q_sort( a, left, i-1 );          q_sort( a, i+1, right );     //如上所見,在劃分過程(partition)後,快速排序需要兩次遞迴,一次對左邊遞迴  //一次對右邊遞迴。下面,你將看到,快速選擇SELECT演算法始終只對一邊進行遞迴。  //這從直觀上也能反應出:此快速排序演算法(O(N*logN))明顯會比  //下面第二節中的快速選擇SELECT演算法(O(N))平均花費更多的執行時間。 }   }

如果上面的第8-16句,改寫成以下這樣:

i=left+1; j=right-2;for(;;){ while( a[i] < pivot ) i++;  while( a[j] > pivot ) j--;  if( i < j )   swap( &a[i], &a[j] );  else  break; }

那麼,當a[i] = a[j] = pivot則會產生無限,即死迴圈(相信,不用我多餘解釋,:D)。ok,接下來,咱們將進入正題--快速選擇SELECT演算法。

第二節、線性期望時間的快速選擇SELECT演算法

2.1、快速選擇SELECT演算法的介紹

  Quicksort can be modified to solve the selection problem, which we have seen in chapters 1 and 6. Recall that by using a priority queue, we can find the kth largest (or smallest) element in O(n + k log n)(以用最小堆初始化陣列,然後取這個優先佇列前k個值,複雜度O(n)+k*O(log n)。實際上,最好採用最大堆尋找最小的k個數,那樣,此時複雜度為n*logk。更多詳情,請參見:狂想曲系列第三章、尋找最小的k個數). For the special case of finding the median, this gives an O(n log n) algorithm.

    Since we can sort the file in O(nlog n) time, one might expect to obtain a better time bound for selection. The algorithm we present to find the kth smallest element in a set S is almost identical to quicksort. In fact, the first three steps are the same. We will call this algorithm quickselect(叫做快速選擇). Let |Si| denote the number of elements in Si(令|Si|為Si中元素的個數). The steps of quickselect are:

    1. If |S| = 1, then k = 1 and return the elements in S as the answer. If a cutoff for small files is being used and |S| <=CUTOFF, then sort S and return the kth smallest element.    2. Pick a pivot element, v (- S.(選取一個樞紐元v屬於S)    3. Partition S - {v} into S1 and S2, as was done with quicksort.(將集合S-{v}分割成S1和S2,就像我們在快速排序中所作的那樣)

    4. If k <= |S1|, then the kth smallest element must be in S1. In this case, return quickselect (S1, k). If k = 1 + |S1|, then the pivot is the kth smallest element and we can return it as the answer. Otherwise, the kth smallest element lies in S2, and it is the (k - |S1| - 1)st smallest element in S2. We make a recursive call and return quickselect (S2, k - |S1| - 1).(如果k<=|S1|,那麼第k個最小元素必然在S1中。在這種情況下,返回quickselect(S1,k)。如果k=1+|S1|,那麼樞紐元素就是第k個最小元素,即找到,直接返回它。否則,這第k個最小元素就在S2中,即S2中的第(k-|S1|-1)個最小元素,我們遞迴呼叫並返回quickselect(S2,k-|S1|-1))(下面幾節的程式關於k的表述可能會有所出入,但無礙,抓住原理即ok)。

    In contrast to quicksort, quickselect makes only one recursive call instead of two. The worst case of quickselect is identical to that of quicksort and is O(n2). Intuitively, this is because quicksort's worst case is when one of S1 and S2 is empty; thus, quickselect(快速選擇) is not really saving a recursive call. The average running time, however, is O(n)(不過,其平均執行時間為O(N)。看到了沒,就是平均複雜度為O(N)這句話). The analysis is similar to quicksort's and is left as an exercise.

    The implementation of quickselect is even simpler than the abstract description might imply. The code to do this shown in Figure 7.16. When the algorithm terminates, the kth smallest element is in position k. This destroys the original ordering; if this is not desirable, then a copy must be made.

2.2、三數中值分割法尋找第k小的元素

    第一節,已經介紹過此三數中值分割法,有個細節,你要注意,即陣列元素索引是從“0...i”開始計數的,所以第k小的元素應該是返回a[i]=a[k-1].即k-1=i。換句話就是說,第k小元素,實際上應該在陣列中對應下標為k-1。ok,下面給出三數中值分割法尋找第k小的元素的程式的兩個程式碼實現:

[cpp] view plain copy print?
  1. //程式碼實現一
  2. //[email protected] mark allen weiss
  3. //July、updated,2011.05.05凌晨.
  4. //三數中值分割法尋找第k小的元素的快速選擇SELECT演算法
  5. void q_select( input_type a[], int k, int left, int right )  
  6. {  
  7.     int i, j;   
  8.     input_type pivot;    
  9.     if( left /*+ CUTOFF*/ <= right )  //去掉CUTOFF常量,無用
  10.     {   
  11.         pivot = median3( a, left, right );   //呼叫1、4節裡的實現三數取中分割法的median3函式
  12.         //取三數中值作為樞紐元,可以消除最壞情況而保證此演算法是O(N)的。不過,這還只侷限在理論意義上。
  13.         //稍後,您將看到另一種選取樞紐元的方法。
  14.         i=left; j=right-1;    
  15.         for(;;)  //此句到下面的九行程式碼,即為快速排序中的partition過程的實現之一
  16.         {    
  17.             while( a[++i] < pivot ){}    
  18.             while( a[--j] > pivot ){}   
  19.             if (i < j )    
  20.                 swap( &a[i], &a[j] );    
  21.             else
  22.                 break;     
  23.         }   
  24.         swap( &a[i], &a[right-1] ); /* restore pivot */
  25.         if( k < i)   
  26.             q_select( a, k, left, i-1 );    
  27.         else
  28.             if( k-1 > i )  //此條語句相當於:if(k>i+1)
  29.                 q-select( a, k, i+1, right );      
  30.             //1、希望你已經看到,通過上面的if-else語句表明,此快速選擇SELECT演算法始終只對陣列的一邊進行遞迴,
  31.             //這也是其與第一節中的快速排序演算法的本質性區別。
  32.             //2、這個區別則直接決定了:快速排序演算法最快能達到O(N*logN),
  33.             //而快速選擇SELECT演算法則最壞亦能達到O(N)的線性時間複雜度。
  34.             //3、而確保快速選擇演算法最壞情況下能做到O(N)的根本保障在於樞紐元元素的選取,
  35.             //即採取稍後的2.3節裡的五分化中項的中項,或2.4節裡的中位數的中外位數的樞紐元選擇方法達到O(N)的目的。
  36.             //後天老爸生日,孩兒深深祝福。July、updated,2011.05.19。
  37.     }  
  38.     else
  39.         insert_sort(a, left, right-left+1 );    
  40. }  
  41. //程式碼實現二
  42. //copyright @ 飛羽
  43. //July、updated,2011.05.11。
  44. //三數中值分割法尋找第k小的元素
  45. bool median_select(int array[], int left, int right, int k)     
  46. {     
  47.     //第k小元素,實際上應該在陣列中下標為k-1   
  48.     if (k-1 > right || k-1 < left)        
  49.         returnfalse;     
  50.     //三數中值作為樞紐元方法,關鍵程式碼就是下述六行:   
  51.     int midIndex=(left+right)/2;     
  52.     if(array[left]<array[midIndex])     
  53.         swap(array[left],array[midIndex]);     
  54.     if(array[right]<array[midIndex])     
  55.         swap(array[right],array[midIndex]);     
  56.     if(array[right]<array[left])     
  57.         swap(array[right],array[left]);     
  58.     swap(array[midIndex], array[right]);     
  59.     int pos = partition(array, left, right);     
  60.     if (pos == k-1)   //第k小元素,實際上應該在陣列中下標為k-1  
  61.         returntrue;     
  62.     elseif (pos > k-1)     
  63.         return median_select(array, left, pos-1, k);     
  64.     elsereturn median_select(array, pos+1, right, k);     
  65. }  
//程式碼實現一//[email protected] mark allen weiss//July、updated,2011.05.05凌晨.//三數中值分割法尋找第k小的元素的快速選擇SELECT演算法void q_select( input_type a[], int k, int left, int right ){ int i, j;  input_type pivot;   if( left /*+ CUTOFF*/ <= right )  //去掉CUTOFF常量,無用 {   pivot = median3( a, left, right );   //呼叫1、4節裡的實現三數取中分割法的median3函式  //取三數中值作為樞紐元,可以消除最壞情況而保證此演算法是O(N)的。不過,這還只侷限在理論意義上。  //稍後,您將看到另一種選取樞紐元的方法。    i=left; j=right-1;    for(;;)  //此句到下面的九行程式碼,即為快速排序中的partition過程的實現之一  {     while( a[++i] < pivot ){}     while( a[--j] > pivot ){}    if (i < j )      swap( &a[i], &a[j] );     else       break;     }   swap( &a[i], &a[right-1] ); /* restore pivot */      if( k < i)    q_select( a, k, left, i-1 );    else     if( k-1 > i )  //此條語句相當於:if(k>i+1)    q-select( a, k, i+1, right );       //1、希望你已經看到,通過上面的if-else語句表明,此快速選擇SELECT演算法始終只對陣列的一邊進行遞迴,   //這也是其與第一節中的快速排序演算法的本質性區別。      //2、這個區別則直接決定了:快速排序演算法最快能達到O(N*logN),   //而快速選擇SELECT演算法則最壞亦能達到O(N)的線性時間複雜度。   //3、而確保快速選擇演算法最壞情況下能做到O(N)的根本保障在於樞紐元元素的選取,   //即採取稍後的2.3節裡的五分化中項的中項,或2.4節裡的中位數的中外位數的樞紐元選擇方法達到O(N)的目的。   //後天老爸生日,孩兒深深祝福。July、updated,2011.05.19。 } else          insert_sort(a, left, right-left+1 );  }//程式碼實現二//copyright @ 飛羽//July、updated,2011.05.11。//三數中值分割法尋找第k小的元素bool median_select(int array[], int left, int right, int k)   {       //第k小元素,實際上應該在陣列中下標為k-1       if (k-1 > right || k-1 < left)              return false;        //三數中值作為樞紐元方法,關鍵程式碼就是下述六行:       int midIndex=(left+right)/2;       if(array[left]<array[midIndex])           swap(array[left],array[midIndex]);       if(array[right]<array[midIndex])           swap(array[right],array[midIndex]);       if(array[right]<array[left])           swap(array[right],array[left]);       swap(array[midIndex], array[right]);        int pos = partition(array, left, right);        if (pos == k-1)   //第k小元素,實際上應該在陣列中下標為k-1          return true;       else if (pos > k-1)           return median_select(array, left, pos-1, k);       else return median_select(array, pos+1, right, k);   }     上述程式使用三數中值作為樞紐元的方法可以使得最壞情況發生的概率幾乎可以忽略不計。然而,稍後,您將看到:通過一種更好的方法,如“五分化中項的中項”,或“中位數的中位數”等方法選取樞紐元,我們將能徹底保證在最壞情況下依然是線性O(N)的複雜度。即,如稍後2.3節所示。

2.3、五分化中項的中項,確保O(N)

    The selection problem requires us to find the kth smallest element in a list S of n elements(要求我們找出含N個元素的表S中的第k個最小的元素). Of particular interest is the special case of finding the median. This occurs when k = |-n/2-|(向上取整).(我們對找出中間元素的特殊情況有著特別的興趣,這種情況發生在k=|-n/2-|的時候)

    In Chapters 1, 6, 7 we have seen several solutions to the selection problem. The solution in Chapter 7 uses a variation of quicksort and runs in O(n) average time(第7章中的解法,即本文上面第1節所述的思路4,用到快速排序的變體並以平均時間O(N)執行). Indeed, it is described in Hoare's original paper on quicksort.

    Although this algorithm runs in linear average time, it has a worst case of O (n2)(但它有一個O(N^2)的最快情況). Selection can easily be solved in O(n log n) worst-case time by sorting the elements, but for a long time it was unknown whether or not selection could be accomplished in O(n) worst-case time. The quickselect algorithm outlined in Section 7.7.6 is quite efficient in practice, so this was mostly a question of theoretical interest.

    Recall that the basic algorithm is a simple recursive strategy. Assuming that n is larger than the cutoff point where elements are simply sorted, an element v, known as the pivot, is chosen. The remaining elements are placed into two sets, S1 and S2. S1 contains elements that are guaranteed to be no larger than v, and S2 contains elements that are no smaller than v. Finally, if k <= |S1|, then the kth smallest element in S can be found by recursively computing the kth smallest element in S1. If k = |S1| + 1, then the pivot is the kth smallest element. Otherwise, the kth smallest element in S is the (k - |S1| -1 )st smallest element in S2. The main difference between this algorithm and quicksort is that there is only one subproblem to solve instead of two(這個快速選擇演算法與快速排序之間的主要區別在於,這裡求解的只有一個子問題,而不是兩個子問題)。

    定理10.9The running time of quickselect using median-of-median-of-five partitioning is O(n)。

    The basic idea is still useful. Indeed, we will see that we can use it to improve the expected number of comparisons that quickselect makes. To get a good worst case, however, the key idea is to use one more level of indirection. Instead of finding the median from a sample of random elements, we will find the median from a sample of medians.

The basic pivot selection algorithm is as follows:    1. Arrange the n elements into |_n/5_| groups of 5 elements, ignoring the (at most four) extra elements.    2. Find the median of each group. This gives a list M of |_n/5_| medians.    3. Find the median of M. Return this as the pivot, v.

    We will use the term median-of-median-of-five partitioning to describe the quickselect algorithm that uses the pivot selection rule given above. (我們將用術語“五分化中項的中項”來描述使用上面給出的樞紐元選擇法的快速選擇演算法)。We will now show that median-of-median-of-five partitioning guarantees that each recursive subproblem is at most roughly 70 percent as large as the original(現在我們要證明,“五分化中項的中項”,得保證每個遞迴子問題的大小最多為原問題的大約70%). We will also show that the pivot can be computed quickly enough to guarantee an O (n) running time for the entire selection algorithm(我們還要證明,對於整個選擇演算法,樞紐元可以足夠快的算出,以確保O(N)的執行時間。看到了沒,這再次佐證了我們的類似快速排序的partition過程的分治方法為O(N)的觀點)(更多詳細的證明,請參考:第三章、尋找最小的k個數)。

2.4、中位數的中位數,O(N)的再次論證

    以下內容來自演算法導論第九章第9.3節全部內容(最壞情況線性時間的選擇),如下(我酌情對之參考原中文版做了翻譯,下文中括號內的中文解釋,為我個人新增):

9.3 Selection in worst-case linear time(最壞情況下線性時間的選擇演算法)

    We now examine a selection algorithm whose running time is O(n) in the worst case(現在來看,一個最壞情況執行時間為O(N)的選擇演算法SELECT). Like RANDOMIZED-SELECT, the algorithm SELECT finds the desired element by recursively partitioning the input array. The idea behind the algorithm, however, is to guarantee a good split when the array is partitioned. SELECT uses the deterministic partitioning algorithm PARTITION from quicksort (see Section 7.1), modified to take the element to partition around as an input parameter(像RANDOMIZED-SELECT一樣,SELECTT通過輸入陣列的遞迴劃分來找出所求元素,但是,該演算法的基本思想是要保證對陣列的劃分是個好的劃分。SECLECT採用了取自快速排序的確定性劃分演算法partition,並做了修改,把劃分主元元素作為其引數).

    The SELECT algorithm determines the ith smallest of an input array of n > 1 elements by executing the following steps. (If n = 1, then SELECT merely returns its only input value as the ith smallest.)(演算法SELECT通過執行下列步驟來確定一個有n>1個元素的輸入陣列中的第i小的元素。(如果n=1,則SELECT返回它的唯一輸入數值作為第i個最小值。))

  1. Divide the n elements of the input array into n/5 groups of 5 elements each and at most one group made up of the remaining n mod 5 elements.
  2. Find the median of each of the n/5 groups by first insertion sorting the elements of each group (of which there are at most 5) and then picking the median from the sorted list of group elements.
  3. Use SELECT recursively to find the median x of the n/5 medians found in step 2. (If there are an even number of medians, then by our convention, x is the lower median.)
  4. Partition the input array around the median-of-medians x using the modified version of PARTITION. Let k be one more than the number of elements on the low side of the partition, so that x is the kth smallest element and there are n-k elements on the high side of the partition.(利用修改過的partition過程,按中位數的中位數x對輸入陣列進行劃分,讓k比劃低去的元素數目多1,所以,x是第k小的元素,並且有n-k個元素在劃分的高區)
  5. If i = k, then return x. Otherwise, use SELECT recursively to find the ith smallest element on the low side if i < k, or the (i - k)th smallest element on the high side if i > k.(如果要找的第i小的元素等於程式返回的k,即i=k,則返回x。否則,如果i<k,則在低區遞迴呼叫SELECT以找出第i小的元素,如果i>k,則在高區間找第(i-k)個最小元素)

(以上五個步驟,即本文上面的第四節末中所提到的所謂“五分化中項的中項”的方法。)

    To analyze the running time of SELECT, we first determine a lower bound on the number of elements that are greater than the partitioning element x. (為了分析SELECT的執行時間,先來確定大於劃分主元元素x的的元素數的一個下界)Figure 9.1 is helpful in visualizing this bookkeeping. At least half of the medians found in step 2 are greater than[1] the median-of-medians x. Thus, at least half of the n/5 groups contribute 3 elements that are greater than x, except for the one group that has fewer than 5 elements if 5 does not divide n exactly, and the one group containing x itself. Discounting these two groups, it follows that the number of elements greater than x is at least:

  

        (Figure 9.1: 對上圖的解釋或稱對SELECT演算法的分析:n個元素由小圓圈來表示,並且每一個組佔一縱列。組的中位數用白色表示,而各中位數的中位數x也被標出。(當尋找偶數數目元素的中位數時,使用下中位數)。箭頭從比較大的元素指向較小的元素,從中可以看出,在x的右邊,每一個包含5個元素的組中都有3個元素大於x,在x的左邊,每一個包含5個元素的組中有3個元素小於x。大於x的元素以陰影背景表示。 )

    Similarly, the number of elements that are less than x is at least 3n/10 - 6. Thus, in the worst case, SELECT is called recursively on at most 7n/10 + 6 elements in step 5.

    We can now develop a recurrence for the worst-case running time T(n) of the algorithm SELECT. Steps 1, 2, and 4 take O(n) time. (Step 2 consists of O(n) calls of insertion sort on sets of size O(1).) Step 3 takes time T(n/5), and step 5 takes time at most T(7n/10+ 6), assuming that T is monotonically increasing. We make the assumption, which seems unmotivated at first, that any input of 140 or fewer elements requires O(1) time; the origin of the magic constant 140 will be clear shortly. We can therefore obtain the recurrence:

         

    We show that the running time is linear by substitution. More specifically, we will show that T(n) ≤ cn for some suitably large constant c and all n > 0. We begin by assuming that T(n) ≤ cn for some suitably large constant c and all n ≤ 140; this assumption holds if c is large enough. We also pick a constant a such that the function described by the O(n) term above (which describes the non-recursive component of the running time of the algorithm) is bounded above by an for all n > 0. Substituting this inductive hypothesis into the right-hand side of the recurrence yields

T(n)

cn/5 + c(7n/10 + 6) + an

cn/5 + c + 7cn/10 + 6c + an

=

9cn/10 + 7c + an

=

cn + (-cn/10 + 7c + an) ,

which is at most cn if

              

Inequality (9.2) is equivalent to the inequality c ≥ 10a(n/(n - 70)) when n > 70. Because we assume that n ≥ 140, we have n/(n - 70) ≤ 2, and so choosing c ≥ 20a will satisfy inequality (9.2). (Note that there is nothing special about the constant 140; we could replace it by any integer strictly greater than 70 and then choose c accordingly.) The worst-case running time of SELECT is therefore linear(因此,此SELECT的最壞情況的執行時間是線性的).

    As in a comparison sort (see Section 8.1), SELECT and RANDOMIZED-SELECT determine information about the relative order of elements only by comparing elements. Recall from Chapter 8 that sorting requires (n lg n) time in the comparison model, even on average (see Problem 8-1). The linear-time sorting algorithms in Chapter 8 make assumptions about the input. In contrast, the linear-time selection algorithms in this chapter do not require any assumptions about the input. They are not subject to the (n lg n) lower bound because they manage to solve the selection problem without sorting.

(與比較排序(演算法導論8.1節)中的一樣,SELECT和RANDOMIZED-SELECT僅通過元素間的比較來確定它們之間的相對次序。在演算法導論第8章中,我們知道在比較模型中,即使在平均情況下,排序仍然要O(n*logn)的時間。第8章得線性時間排序演算法在輸入上做了假設。相反地,本節提到的此類似partition過程的SELECT演算法不需要關於輸入的任何假設,它們不受下界O(n*logn)的約束,因為它們沒有使用排序就解決了選擇問題(看到了沒,道出了此演算法的本質阿))

    Thus, the running time is linear because these algorithms do not sort; the linear-time behavior is not a result of assumptions about the input, as was the case for the sorting algorithms in Chapter 8. Sorting requires (n lg n) time in the comparison model, even on average (see Problem 8-1), and thus the method of sorting and indexing presented in the introduction to this chapter is asymptotically inefficient.(所以,本節中的選擇演算法之所以具有線性執行時間,是因為這些演算法沒有進行排序;線性時間的結論並不需要在輸入上所任何假設,即可得到。.....)

第三節、快速選擇SELECT演算法的實現

  本節,咱們將依據下圖所示的步驟,採取中位數的中位數選取樞紐元的方法來實現此SELECT演算法,

    不過,在實現之前,有個細節我還是必須要提醒你,即上文中2.2節開頭處所述,“陣列元素索引是從“0...i”開始計數的,所以第k小的元素應該是返回a[i]=a[k-1].即k-1=i。換句話就是說,第k小元素,實際上應該在陣列中對應下標為k-1”這句話,我想,你應該明白了:返回陣列中第k小的元素,實際上就是返回陣列中的元素array[i],即array[k-1]。ok,最後請看此快速選擇SELECT演算法的完整程式碼實現(據我所知,在此之前,從沒有人採取中位數的中位數選取樞紐元的方法來實現過這個SELECT演算法):

[cpp] view plain copy print?
  1. //[email protected] yansha && July && 飛羽
  2. //July、updated,2011.05.19.清晨。
  3. //版權所有,引用必須註明出處:http://blog.csdn.net/v_JULY_v。
  4. #include <iostream>
  5. #include <time.h>
  6. usingnamespace std;  
  7. constint num_array = 13;  
  8. constint num_med_array = num_array / 5 + 1;  
  9. int array[num_array];  
  10. int midian_array[num_med_array];  
  11. //氣泡排序(晚些時候將修正為插入排序)
  12. /*void insert_sort(int array[], int left, int loop_times, int compare_times) 
  13. { 
  14.     for (int i = 0; i < loop_times; i++) 
  15.     { 
  16.         for (int j = 0; j < compare_times - i; j++) 
  17.         { 
  18.             if (array[left + j] > array[left + j + 1]) 
  19.                 swap(array[left + j], array[left + j + 1]); 
  20.         } 
  21.     } 
  22. }*/
  23. /* 
  24. //插入排序演算法虛擬碼 
  25. INSERTION-SORT(A)                              cost    times 
  26. 1  for j ← 2 to length[A]                      c1      n 
  27. 2       do key ← A[j]                          c2      n - 1 
  28. 3          Insert A[j] into the sorted sequence A[1 ‥ j - 1].     0...n - 1 
  29. 4          i ← j - 1                           c4      n - 1 
  30. 5          while i > 0 and A[i] > key           c5       
  31. 6             do A[i + 1] ← A[i]               c6       
  32. 7             i ← i - 1                        c7       
  33. 8          A[i + 1] ← key                      c8      n - 1 
  34. */
  35. //已修正為插入排序,如下:
  36. void insert_sort(int array[], int left, int loop_times)  
  37. {  
  38.     for (int j = left; j < left+loop_times; j++)  
  39.     {  
  40.         int key = array[j];  
  41.         int i = j-1;  
  42.         while ( i>left && array[i]>key )  
  43.         {  
  44.             array[i+1] = array[i];  
  45.             i--;  
  46.         }  
  47.         array[i+1] = key;  
  48.     }  
  49. }  
  50. int find_median(int array[], int left, int right)  
  51. {  
  52.     if (left == right)  
  53.         return array[left];  
  54.     int index;  
  55.     for (index = left; index < right - 5; index += 5)  
  56.     {  
  57.         insert_sort(array, index, 4);  
  58.         int num = index - left;  
  59.         midian_array[num / 5] = array[index + 2];  
  60.     }  
  61.     // 處理剩餘元素
  62.     int remain_num = right - index + 1;  
  63.     if (remain_num > 0)  
  64.     {  
  65.         insert_sort(array, index, remain_num - 1);  
  66.         int num = index - left;  
  67.         midian_array[num / 5] = array[index + remain_num / 2];  
  68.     }  
  69.     int elem_aux_array = (right - left) / 5 - 1;  
  70.     if ((right - left) % 5 != 0)  
  71.         elem_aux_array++;  
  72.     // 如果剩餘一個元素返回,否則繼續遞迴
  73.     if (elem_aux_array == 0)  
  74.         return midian_array[0];  
  75.     else
  76.         return find_median(midian_array, 0, elem_aux_array);  
  77. }  
  78. // 尋找中位數的所在位置
  79. int find_index(int array[], int left, int right, int median)  
  80. {  
  81.     for (int i = left; i <= right; i++)  
  82.     {  
  83.         if (array[i] == median)  
  84.             return i;  
  85.     }  
  86.     return -1;  
  87. }  
  88. int q_select(int array[], int left, int right, int k)  
  89. {  
  90.     // 尋找中位數的中位數
  91.     int median = find_median(array, left, right);  
  92.     // 將中位數的中位數與最右元素交換
  93.     int index = find_index(array, left, right, median);  
  94.     swap(array[index], array[right]);  
  95.     int pivot = array[right];  
  96.     // 申請兩個移動指標並初始化
  97.     int i = left;   
  98.     int j = right - 1;    
  99.     // 根據樞紐元素的值對陣列進行一次劃分
  100.     while (true)  
  101.     {    
  102.         while(array[i] < pivot)  
  103.             i++;  
  104.         while(array[j] > pivot)  
  105.             j--;  
  106.         if (i < j)   
  107.             swap(array[i], array[j]);   
  108.         else
  109.             break;     
  110.     }  
  111.     swap(array[i], array[right]);   
  112.     /* 對三種情況進行處理:(m = i - left + 1) 
  113.     1、如果m=k,即返回的主元即為我們要找的第k小的元素,那麼直接返回主元a[i]即可; 
  114.     2、如果m>k,那麼接下來要到低區間A[0....m-1]中尋找,丟掉高區間; 
  115.     3、如果m<k,那麼接下來要到高區間A[m+1...n-1]中尋找,丟掉低區間。 
  116.     */
  117.     int m = i - left + 1;      
  118.     if (m == k)  
  119.         return array[i];  
  120.     elseif(m > k)    
  121.         //上條語句相當於if( (i-left+1) >k),即if( (i-left) > k-1 ),於此就與2.2節裡的程式碼實現一、二相對應起來了。
  122.         return q_select(array, left, i - 1, k);    
  123.     else
  124.         return q_select(array, i + 1, right, k - m);  
  125. }  
  126. int main()  
  127. {  
  128.     //srand(unsigned(time(NULL)));
  129.     //for (int j = 0; j < num_array; j++)
  130.     //array[j] = rand();
  131.     int array[num_array]={0,45,78,55,47,4,1,2,7,8,96,36,45};  
  132.     // 尋找第k最小數
  133.     int k = 4;  
  134.     int i = q_select(array, 0, num_array - 1, k);  
  135.     cout << i << endl;  
  136.     return 0;  
  137. }  
//[email protected] yansha && July && 飛羽//July、updated,2011.05.19.清晨。//版權所有,引用必須註明出處:http://blog.csdn.net/v_JULY_v。#include <iostream>#include <time.h>using namespace std;const int num_array = 13;const int num_med_array = num_array / 5 + 1;int array[num_array];int midian_array[num_med_array];//氣泡排序(晚些時候將修正為插入排序)/*void insert_sort(int array[], int left, int loop_times, int compare_times){ for (int i = 0; i < loop_times; i++) {  for (int j = 0; j < compare_times - i; j++)  {   if (array[left + j] > array[left + j + 1])    swap(array[left + j], array[left + j + 1]);  } }}*//*//插入排序演算法虛擬碼INSERTION-SORT(A)                              cost    times1  for j ← 2 to length[A]                      c1      n2       do key ← A[j]