Binary search: two problems on finding the median of two sorted arrays
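The first problem is Exercise 9.3-8 from Chapter 9 of Introduction to Algorithms. Given two sorted arrays X[n] and Y[n] (assume ascending order), find the median of all 2n numbers in O(lg n) time.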
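The O(lg n) bound is a strong hint that binary search is the way to go. The trick lies in how to handle two sorted arrays at once. There are 2n numbers in total, an even count, so the overall median is determined by two numbers a and b: in the sorted sequence of all 2n numbers, exactly n-1 numbers lie above the pair and n-1 lie below it. Let a >= b and suppose a == X[i], so that i elements of X lie below a. Since n numbers in total must lie below a, the remaining n-i come from Y, namely Y[0..n-1-i]. That leaves two possible sources for b: X[i-1] or Y[n-1-i], where Y[n-1-i] is the largest element of Y that is not greater than a. b is the larger of the two.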
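To sum up, the binary search works as follows: over the interval [v,u) in X, take i=(v+u)/2 and look at X[i]; with j=n-1-i, check whether Y[j] <= X[i] && X[i] <= Y[j+1]. If the check fails, increase or decrease i, which moves j in the opposite direction: X[i] > Y[j+1] means i is too large, and X[i] < Y[j] means i is too small.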
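To make the check concrete, here is a minimal sketch of the test for a single index i. The helper name isUpperMiddle is hypothetical (it does not appear in the original code), and it assumes both arrays have the same length n and are sorted ascending. It only answers whether X[i] is the upper of the two middle elements, guarding the j==n-1 boundary where Y[j+1] does not exist.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the original post): returns true when
// X[i] is the upper of the two middle elements of the merged 2n values.
// Assumes X and Y are sorted ascending and have the same length n.
bool isUpperMiddle(const std::vector<int>& X, const std::vector<int>& Y, int i) {
    int n = static_cast<int>(X.size());
    assert(X.size() == Y.size() && i >= 0 && i < n);
    int j = n - 1 - i;                                // Y[0..j] are the n-i elements of Y placed below X[i]
    bool lowerOk = Y[j] <= X[i];                      // nothing in Y[0..j] exceeds X[i]
    bool upperOk = (j == n - 1) || X[i] <= Y[j + 1];  // Y[j+1], if it exists, is not below X[i]
    return lowerOk && upperOk;
}
```

For X={1,3}, Y={2,4} the check succeeds at i=1 (the merged order is 1,2,3,4 and the upper middle element is 3). For X={1,2}, Y={3,4} it fails for every i, because both middle elements sit on the boundary between the arrays and the upper one is in Y; swapping the roles of the arrays finds it.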
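The implementation is below. Watch the array boundaries, such as i==0 or i==n-1. Note that the code also accepts A[i] when it is the lower of the two middle elements, and that if no middle element is found in A, the search returns false and findmedian repeats it with the arrays swapped.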
bool findmediansingle(int* A, int* B, int n, double& res){
    // find total median in array A
    int u=n, v=0;
    while(v<u){ //[v,u)
        int i = (v+u)/2;
        int j = n-1-i; //index in B[]
        if(A[i] <= B[j]){
            //we need to find the middle two elements among the total merged array
            if(j==0 || A[i] >= B[j-1]){
                //all elements less than A[i] and B[j] add up to mid
                int low = A[i]; //since we focus binary cursor on i, A[i] is fixed as floor of the two middle
                int high = B[j];
                if(i<n-1 && A[i+1]<B[j]){ //A[i+1] is closer to A[i] than B[j]
                    high = A[i+1];
                }
                res = (double)(low + high)/2;
                return true;
            }else{
                v=i+1; //enlarge i to reduce j
                continue;
            }
        }else{ //A[i] > B[j]
            if(j==n-1 || A[i] <= B[j+1]){
                int high = A[i]; //now we fix A[i] as ceil of the two middle
                int low = B[j];
                if(i>0 && B[j] < A[i-1]){ //A[i-1] is closer to A[i] than B[j]
                    low = A[i-1];
                }
                res = (double)(low+high)/2;
                return true;
            }else{
                u = i;
                continue;
            }
        }
    }
    return false;
}

double findmedian(int* A, int* B, int n){
    double res = 0.0;
    if(!findmediansingle(A, B, n, res)){
        findmediansingle(B, A, n, res);
    }
    return res;
}
Test data:
{1}, {1}
{1,2}, {3,4}
{1,3}, {2,4}
{1,4}, {2,3}
------------------------------------------------- divider ------------------------------------------
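Naturally, the second problem uses two arrays of different lengths, A[m] and B[n], and asks for the median of all m+n numbers. The algorithm is the same in principle; what we need to work out is which extra boundary cases appear.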
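First, m+n may be odd. The median is then the single element at index (m+n)/2 (0-based) of the merged order, with exactly (m+n)/2 numbers below it. We can locate it directly, without having to find a second middle element next to it.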
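Second, when m==n and j=n-1-i, i lies in [0,n), so j must also lie in [0,n), which means j is always a valid index. But when m>>n and j=mid-1-i (mid = (m+n)/2), j<0 or j>n-1 is entirely possible and would index out of bounds, so j needs special care here.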
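A small sketch of the index arithmetic shows how quickly j leaves the valid range once the lengths differ. The helper names partnerIndex and partnerInBounds are hypothetical, introduced only for this illustration; they compute j=mid-1-i exactly as described above and report whether it can index B directly.

```cpp
#include <cassert>

// Hypothetical helper: the index in B that is paired with A[i],
// using j = mid-1-i with mid = (m+n)/2 as in the text.
int partnerIndex(int m, int n, int i) {
    assert(m >= 0 && n >= 0 && i >= 0 && i < m);
    int mid = (m + n) / 2;
    return mid - 1 - i;
}

// Hypothetical helper: true when j can be used to index B[0..n-1] directly.
bool partnerInBounds(int n, int j) {
    return j >= 0 && j <= n - 1;
}
```

With m=5 and n=1 (so mid=3), i=0 gives j=2, which is past the end of B; i=2 gives j=0, the only valid value; i=3 gives j=-1 and i=4 gives j=-2. With m=n=4 every i in [0,4) yields a j in [0,4), matching the equal-length case.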
The implementation is as follows:
bool findmediansingle(int *A, int m, int *B, int n, double& res, int tag){
    if(m==0 && n==0){
        return false;
    }else if(n==0){
        res = tag==1 ? A[m/2] : (double)(A[m/2] + A[m/2 - 1])/2;
        return true;
    }else if(m==0){
        res = tag==1 ? B[n/2] : (double)(B[n/2 -1] + B[n/2])/2;
        return true;
    }
    int u=m, v=0, i=0;
    int mid = (m+n)/2;
    if(tag==0){ //even total count
        while(v<u){ //[v,u)
            i = (v+u)/2;
            int j=(mid-1-i); //index in B[]
            if(j<-1){
                u=i; //reduce i to enlarge j
            }else if(j==-1){
                //it means all elements in A[] less than A[i] add up to mid
                if(B[0] >= A[i]){
                    //B[0] is above A[i], outside the mid elements below A[i]; A[i] is ceil of the middle two
                    int high = A[i];
                    int low = i>0 ? A[i-1] : 0;
                    res = (double)(high + low)/2;
                    return true;
                }
                break; //A[i] is too large to be a middle element; the caller retries with the arrays swapped
            }else if(j>n-1){
                //enlarge i to reduce j
                v=i+1;
            }else if(A[i] <= B[j]){
                //we need to find the middle two elements among the total merged array
                if(j==0 || A[i] >= B[j-1]){
                    //all elements less than A[i] and B[j] add up to mid
                    int low = A[i]; //since we focus binary cursor on i, A[i] is fixed as floor of the two middle
                    int high = B[j];
                    if(i<m-1 && A[i+1]<B[j]){ //A[i+1] is closer to A[i] than B[j]
                        high = A[i+1];
                    }
                    res = (double)(low + high)/2;
                    return true;
                }else{
                    v=i+1; //enlarge i to reduce j
                    continue;
                }
            }else{ //A[i] > B[j]
                if(j==n-1 || A[i] <= B[j+1]){
                    int high = A[i]; //now we fix A[i] as ceil of the two middle
                    int low = B[j];
                    if(i>0 && B[j] < A[i-1]){ //A[i-1] is closer to A[i] than B[j]
                        low = A[i-1];
                    }
                    res = (double)(low+high)/2;
                    return true;
                }else{
                    u = i;
                    continue;
                }
            }
        }
        return false;
    }else{ //odd total count
        while(v<u){ //[v,u)
            i = (v+u)/2;
            int j=(mid-1-i); //index in B[]
            if(j<-1){
                u=i; //reduce i to enlarge j
            }else if(j==-1){
                //it means all elements in A[] less than A[i] add up to mid
                if(B[0] >= A[i]){
                    //B[0] is outside the mid elements below A[i]; A[i] is the middle of all
                    res = (double)A[i]; //was cast to char, which truncates values outside the char range
                    return true;
                }
                break; //A[i] is too large to be the median; the caller retries with the arrays swapped
            }else if(j>n-1){
                //enlarge i to reduce j
                v=i+1;
            }else if(A[i] >= B[j]){
                //1.A[i] and B[j] are middle two; 2.max(A[i], B[j]) is middle of all
                if(j==n-1 || A[i] <= B[j+1]){
                    res = (double)A[i];
                    return true;
                }else{
                    u=i; //reduce i
                }
            }else{
                if(i==m-1 || B[j] <= A[i+1]){
                    res = (double)B[j]; //we need max(A[i], B[j])
                    return true;
                }else{
                    v=i+1; //enlarge i
                }
            }
        }
        return false;
    }
}

double findmedian(int* A, int m, int* B, int n){
    double res = 0.0;
    int tag=0;
    if((m+n)%2 == 1) tag=1; //1 for odd total count, 0 for even
    if(!findmediansingle(A,m,B,n,res,tag)){
        findmediansingle(B,n,A,m,res,tag);
    }
    return res;
}
Test data:
{1,3,5,8}, {2,4,6,7}
{1,1}, {2,3,4,5}
{}, {1,2,3,4}
{1,2,3}, {}
{1}, {1}
{1,2,3,5,6}, {4}
{1}, {2,3,4}
{1}, {2,3,4,5}
{1,2,3,4}, {5}
{2}, {1,3,4}
{3}, {1,2,4}
{1,2,3}, {4,5}
{6,7,8}, {1,2,3,4,5}
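To know what each of these cases should return, it helps to have an independent answer to compare against. The following is a minimal sketch of such a reference, assuming both inputs are sorted ascending; the name referenceMedian is hypothetical and not part of the original code. It is not the O(lg n) algorithm: it merges both arrays in O(m+n) and reads the median off the result, which makes it easy to trust.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical reference oracle (not from the original post): merges both
// sorted inputs in O(m+n) time and reads the median off the merged array.
double referenceMedian(const std::vector<int>& a, const std::vector<int>& b) {
    assert(!a.empty() || !b.empty());  // the median of zero elements is undefined
    std::vector<int> merged;
    merged.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(merged));
    std::size_t total = merged.size();
    std::size_t mid = total / 2;
    if (total % 2 == 1) {
        return merged[mid];            // odd count: the single middle element
    }
    // even count: average the two middle elements, summing in double
    // so that two large ints cannot overflow
    return (static_cast<double>(merged[mid - 1]) + merged[mid]) / 2.0;
}
```

On the data above it gives, for example, 4.5 for {1,3,5,8}, {2,4,6,7}, 2.5 for {}, {1,2,3,4}, 3.5 for {1,2,3,5,6}, {4}, and 3 for {1,2,3,4}, {5}. The same function also covers the equal-length test data of the first problem.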
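Summary:
The binary-search idea behind both problems is fairly obvious. The point worth stressing is that, to write bug-free code, you have to design a complete set of test cases (data) with full coverage up front, and then use those test cases to work through every branch of the algorithm step by step.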