
Binary Search: Two Problems on Finding the Median of Two Sorted Arrays

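The first problem is Exercise 9.3-8 from Chapter 9 of Introduction to Algorithms. Given two sorted arrays X[n] and Y[n] (assume ascending order), find the median of all 2n numbers in O(lg n) time.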

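The O(lg n) bound is a hint that points to binary search. The crux of the problem is how to handle two sorted arrays at once. There are 2n numbers in total, an even count, so the overall median is determined by two numbers a and b: in the sorted sequence of all 2n numbers, n-1 numbers lie above the pair and n-1 lie below it. Let a >= b and suppose a == X[i], i.e. within X there are i numbers below a. Then b has two possible sources, X[i-1] or Y[n-1-i], where Y[n-1-i] is the largest element of Y that lies below a; b is the larger of the two.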

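In summary, the binary search works as follows. On X, over the half-open interval [v,u), take i=(v+u)/2 and examine X[i]; with j=n-1-i, check whether Y[j+1] > X[i] && Y[j] < X[i]. If the check fails, increase or decrease i, which moves j in the opposite direction. If no such a exists in X[], repeat the same procedure on Y.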
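The check described above can be sketched in isolation. This is a minimal illustration, not the implementation given in this post: the helper names `isUpperMiddle` and `medianFromUpperMiddle` are hypothetical, the inputs are assumed to be ascending and of equal length n >= 1, and the comparisons are made non-strict so that duplicate values are tolerated.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true when X[i] can serve as a, the upper of the two
// middle elements of the merged 2n values.
// Assumes X and Y are ascending and have the same length n >= 1.
bool isUpperMiddle(const std::vector<int>& X, const std::vector<int>& Y, int i) {
    const int n = static_cast<int>(X.size());
    const int j = n - 1 - i;                 // Y[0..j] must lie at or below X[i]
    const bool belowOk = Y[j] <= X[i];
    const bool aboveOk = (j == n - 1) || (X[i] <= Y[j + 1]);  // boundary: no Y[j+1]
    return belowOk && aboveOk;
}

// Given an i accepted by isUpperMiddle, b is the larger of X[i-1] and Y[j],
// and the median is the average of a == X[i] and b.
double medianFromUpperMiddle(const std::vector<int>& X, const std::vector<int>& Y, int i) {
    const int n = static_cast<int>(X.size());
    const int j = n - 1 - i;
    int b = Y[j];
    if (i > 0) b = std::max(b, X[i - 1]);    // boundary: X[i-1] exists only for i > 0
    return (static_cast<double>(X[i]) + b) / 2.0;
}
```

For X = {1,3} and Y = {2,4}, the check succeeds at i = 1 (a = 3, b = 2). For X = {1,2} and Y = {3,4}, no index of X passes, which is the situation where the search has to be repeated with the roles of the two arrays swapped.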

The implementation follows. Watch the array boundary cases, such as i==0 or i==n-1.

bool findmediansingle(int* A, int* B, int n, double& res){ // find total median in array A
    int u=n, v=0;
    while(v<u){ //[v,u)
        int i = (v+u)/2;
        int j = n-1-i; //index in B[]
        if(A[i] <= B[j]){ //we need to find the middle two elements of the merged array
            if(j==0 || A[i] >= B[j-1]){ //i elements of A and j elements of B lie at or below A[i]: n-1 in total
                int low = A[i]; //the binary cursor is on i, so A[i] is fixed as the lower of the two middle elements
                int high = B[j];
                if(i<n-1 && A[i+1]<B[j]){ //A[i+1] is closer to A[i] than B[j]
                    high = A[i+1];
                }
                res = (double)(low + high)/2;
                return true;
            }else{
                v=i+1; //enlarge i to reduce j
                continue;
            }
        }else{ //A[i] > B[j]
            if(j==n-1 || A[i] <= B[j+1]){
                int high = A[i]; //now we fix A[i] as ceil of the two middle
                int low = B[j];
                if(i>0 && B[j] < A[i-1]){ //A[i-1] is closer to A[i] than B[j]
                    low = A[i-1];
                }
                res = (double)(low+high)/2;
                return true;
            }else{
                u = i;
                continue;
            }
        }
    }
    return false;
}
double findmedian(int* A, int* B, int n){
    double res = 0.0;
    if(!findmediansingle(A, B, n, res)){
        findmediansingle(B, A, n, res);
    }
    return res;
}

Test data:

{1}, {1}

{1,2}, {3,4}

{1,3}, {2,4}

{1,4}, {2,3}

---------------------------------------------------------------------------------------------------

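The natural follow-up is the second problem: two arrays of different lengths, A[m] and B[n], and the median of all m+n numbers. The principle is the same; what we need to work out is which extra boundary cases appear.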

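First, m+n may be odd. The median is then the single element at 0-based position (m+n)/2 of the merged sequence, with (m+n)/2 numbers below it. We locate it directly, and there is no second, lower middle element to consider.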

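Second, when m==n and j=n-1-i, i lies in [0,n), so j must also lie in [0,n); j is always a valid index. But when m>>n and j=mid-1-i (mid = (m+n)/2), j<0 or j>n-1 can easily occur, which would read outside the array. The handling of j therefore needs special care.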
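To make the out-of-range cases concrete, here is a small sketch that only classifies j for a given i. The enum `Partner` and the function `classifyPartner` are hypothetical names introduced for illustration; the four outcomes mirror the order in which the full implementation tests j.

```cpp
#include <cassert>

// Hypothetical classification of the partner index j = mid - 1 - i.
enum class Partner { TooLow, NoneFromB, InRange, TooHigh };

// m and n are the lengths of A and B; i is the candidate index into A.
Partner classifyPartner(int m, int n, int i) {
    const int mid = (m + n) / 2;
    const int j = mid - 1 - i;
    if (j < -1) return Partner::TooLow;      // i is too large: shrink i
    if (j == -1) return Partner::NoneFromB;  // no element of B lies below A[i]
    if (j > n - 1) return Partner::TooHigh;  // i is too small: grow i
    return Partner::InRange;                 // B[j] can be read safely
}
```

With m = 5 and n = 1 (mid = 3), the indices i = 0 and i = 1 give j = 2 and j = 1, both past the end of B; i = 2 gives the only valid j = 0; i = 3 gives j = -1; and i = 4 gives j = -2. With m = n = 3 every i in [0,3) yields a valid j.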

The implementation is as follows:

bool findmediansingle(int *A, int m, int *B, int n, double& res, int tag){
    if(m==0 && n==0){
        return false;
    }else if(n==0){
        res = tag==1 ? A[m/2] : (double)(A[m/2] + A[m/2 - 1])/2;
        return true;
    }else if(m==0){
        res = tag==1 ? B[n/2] : (double)(B[n/2 -1] + B[n/2])/2;
        return true;
    }

    int u=m, v=0, i=0;
    int mid = (m+n)/2;
    if(tag==0){
        while(v<u){ //[v,u)
            i = (v+u)/2;
            int j=(mid-1-i); //index in B[]
            if(j<-1){
                u=i; //reduce i to enlarge j
            }else if(j==-1){ //A[0..i-1] alone would supply all mid elements below A[i]
                if(B[0] >= A[i]){ //B[0] is not below A[i], so A[i] is the upper of the middle two
                    int high = A[i];
                    int low = i>0 ? A[i-1] : 0; //i == mid >= 1 here, so A[i-1] always exists
                    res = (double)(high + low)/2;
                    return true;
                }
                break;
            }else if(j>n-1){ //enlarge i to reduce j
                v=i+1;
            }else if(A[i] <= B[j]){ //we need to find the middle two elements among the total merged array
                if(j==0 || A[i] >= B[j-1]){ //i elements of A and j elements of B lie at or below A[i]: mid-1 in total
                    int low = A[i]; //the binary cursor is on i, so A[i] is fixed as the lower of the two middle elements
                    int high = B[j];
                    if(i<m-1 && A[i+1]<B[j]){ //A[i+1] is closer to A[i] than B[j]
                        high = A[i+1];
                    }
                    res = (double)(low + high)/2;
                    return true;
                }else{
                    v=i+1; //enlarge i to reduce j
                    continue;
                }
            }else{ //A[i] > B[j]
                if(j==n-1 || A[i] <= B[j+1]){
                    int high = A[i]; //now we fix A[i] as ceil of the two middle
                    int low = B[j];
                    if(i>0 && B[j] < A[i-1]){ //A[i-1] is closer to A[i] than B[j]
                        low = A[i-1];
                    }
                    res = (double)(low+high)/2;
                    return true;
                }else{
                    u = i;
                    continue;
                }
            }
        }
        return false;
    }else{
        while(v<u){ //[v,u)
            i = (v+u)/2;
            int j=(mid-1-i); //index in B[]
            if(j<-1){
                u=i; //reduce i to enlarge j
            }else if(j==-1){ //A[0..i-1] alone would supply all mid elements below A[i]
                if(B[0] >= A[i]){ //B[0] is not among the mid elements below A[i], so A[i] is the median of all
                    res = (double)A[i];
                    return true;
                }
                break;
            }else if(j>n-1){ //enlarge i to reduce j
                v=i+1;
            }else if(A[i] >= B[j]){ //1.A[i] and B[j] are middle two; 2.max(A[i], B[j]) is middle of all
                if(j==n-1 || A[i] <= B[j+1]){
                    res = (double)A[i];
                    return true;
                }else{
                    u=i; //reduce i
                }
            }else{
                if(i==m-1 || B[j] <= A[i+1]){
                    res = (double)B[j]; //we need max(A[i], B[j])
                    return true;
                }else{
                    v=i+1; //enlarge i
                }
            }
        }
        return false;
    }
}

double findmedian(int* A, int m, int* B, int n){
    double res = 0.0;
    int tag=0;
    if((m+n)%2 == 1)
      tag=1; //1 for odd total count, 0 for even
    if(!findmediansingle(A,m,B,n,res,tag)){
        findmediansingle(B,n,A,m,res,tag);
    }
    return res;
}

Test data:

{1,3,5,8}, {2,4,6,7}

{1,1}, {2,3,4,5}

{}, {1,2,3,4}

{1,2,3}, {}

{1}, {1}

{1,2,3,5,6}, {4}

{1}, {2,3,4}

{1}, {2,3,4,5}

{1,2,3,4}, {5}

{2}, {1,3,4}

{3}, {1,2,4}

{1,2,3}, {4,5}

{6,7,8}, {1,2,3,4,5}
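To know what each of these cases should return, a slow reference implementation is handy. The sketch below is not the O(lg n) algorithm: it merges both arrays in linear time and reads the middle. The name `medianByMerge` is hypothetical, and the function assumes both inputs are ascending and at least one of them is non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Reference oracle: merge the two ascending inputs, then read the middle.
// Assumes A.size() + B.size() >= 1.
double medianByMerge(const std::vector<int>& A, const std::vector<int>& B) {
    std::vector<int> all;
    all.reserve(A.size() + B.size());
    std::merge(A.begin(), A.end(), B.begin(), B.end(), std::back_inserter(all));
    const std::size_t k = all.size() / 2;
    if (all.size() % 2 == 1) {
        return all[k];                       // odd count: single middle element
    }
    // even count: average of the two middle elements, summed as double
    return (static_cast<double>(all[k - 1]) + all[k]) / 2.0;
}
```

Each pair in the test data can then be checked by comparing the result of `findmedian(A, m, B, n)` with the oracle's value for the same inputs.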

Summary

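The binary-search idea behind both problems is fairly obvious. The point to stress is this: to write bug-free code, you must design a complete set of test cases (data) with full coverage in advance, then use those cases to work through every branch of the algorithm step by step.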