Interview Question: The Top K Problem by Occurrence Count

阿新 • Published: 2019-01-01

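Problem:
Given an array strArr of type String and an integer k, print the k most frequent strings in strict rank order.
Examples:
strArr=["1","2","3","4"], k=2
No.1:1,times:1
No.2:2,times:1
In this case every string occurs equally often, so printing any two of them is acceptable.
strArr=["1","1","2","3"], k=2
No.1:1,times:2
No.2:2,times:1
or
No.1:1,times:2
No.2:3,times:1
Requirement: if strArr has length N, the time complexity should be O(N log k).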

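Solution:
First traverse strArr and count how often each string occurs. For example, with strArr=["a","b","b","a","c"], the traversal produces a hash table that maps each distinct string to its frequency: {a: 2, b: 2, c: 1}.
Each record in the hash table can be turned into an instance of the Node class, which is defined as follows: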

public class Node{
    public String str;
    public int times;

    public Node(String s,int t){
        str=s;
        times=t;
    }
}
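The counting pass and the conversion of hash table records into Node instances can be sketched as follows. This is only an illustration under stated assumptions: the class name FrequencyCountSketch and the helpers countFrequencies and toNodes are hypothetical names that do not appear in the original listing, and Map.merge is used here in place of an explicit containsKey check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencyCountSketch {
    // Same shape as the Node class above: a string and its frequency.
    static class Node {
        final String str;
        final int times;
        Node(String s, int t) { str = s; times = t; }
    }

    // Hypothetical helper: a single pass over the array builds the frequency table.
    static Map<String, Integer> countFrequencies(String[] strArr) {
        Map<String, Integer> freq = new HashMap<>();
        for (String s : strArr) {
            freq.merge(s, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        return freq;
    }

    // Hypothetical helper: one Node per hash table record.
    static List<Node> toNodes(Map<String, Integer> freq) {
        List<Node> nodes = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            nodes.add(new Node(e.getKey(), e.getValue()));
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = countFrequencies(new String[]{"a", "b", "b", "a", "c"});
        System.out.println(freq.get("a") + " " + freq.get("b") + " " + freq.get("c")); // prints "2 2 1"
        System.out.println(toNodes(freq).size()); // prints "3"
    }
}
```

The number of Node instances equals the number of distinct strings, which is at most N.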

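Create as many Node instances as there are records in the hash table and put them into a heap one by one. The procedure is:
(1) Create a min-heap of size k that stores Node instances.
(2) Traverse every record in the hash table. Suppose a record is (s,t), where s is a string and t is its frequency; create a Node instance for it, written (str,times).
a. If the min-heap is not full, add (str,times) to the heap directly and then adjust the heap upward (the heapInsert adjustment). Node instances in the heap are compared by frequency (times): the lower the frequency, the closer to the top.
b. If the min-heap is full, it already holds the k most frequent strings seen so far, so the top of the heap is the least frequent of those k. Call the top element (headStr,minTimes). If minTimes is less than times, str qualifies for the current top k and headStr must leave it, so replace the top (headStr,minTimes) with (str,times) and adjust the heap downward from the top (heapify). If minTimes>=times, str does not qualify, because its frequency is no higher than that of the least frequent string already selected, so nothing is done.
(3) After all records in the hash table have been traversed, the min-heap holds the k most frequent strings. Because the output must follow strict rank order, the k elements still need to be sorted by frequency from highest to lowest.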
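The steps above can also be sketched with the standard library's java.util.PriorityQueue standing in for the hand-written heap. This is an alternative illustration, not the listing used in this article: the class name TopKPriorityQueueSketch and the method topK are hypothetical, and the method returns the ranked entries instead of printing them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopKPriorityQueueSketch {
    // Returns up to k (string, frequency) entries, highest frequency first.
    static List<Map.Entry<String, Integer>> topK(String[] strArr, int k) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (strArr == null || k < 1) {
            return result;
        }
        Map<String, Integer> freq = new HashMap<>();
        for (String s : strArr) {
            freq.merge(s, 1, Integer::sum);
        }

        // Min-heap ordered by frequency: the root is the weakest current candidate.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x.getValue(), y.getValue()));

        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (heap.size() < k) {
                heap.offer(e);                 // heap not full: just insert
            } else if (heap.peek().getValue() < e.getValue()) {
                heap.poll();                   // heap full: evict the current minimum
                heap.offer(e);                 // and admit the more frequent string
            }
        }

        // Final ordering: poll yields ascending frequency, so reverse for the ranking.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }
}
```

Because the heap never holds more than k entries, each offer or poll costs O(log k). Which of several equally frequent strings survives depends on the iteration order of the hash table, which matches the problem statement's allowance for ties.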
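Building the hash table by traversing strArr takes O(N). The hash table holds at most N records, and each record costs O(log k) in heap adjustment when it enters the heap, so updating the min-heap from the records takes O(N log k). Sorting the k records takes O(k log k), so the total time complexity is O(N)+O(N log k)+O(k log k), that is, O(N log k). The full code is as follows: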

package QuestionTest;

import java.util.HashMap;
import java.util.Map;

/**
 * Created by L_kanglin on 2017/4/23.
 * The top K problem by occurrence count
 */
public class Test20 {
    public static  class Node{
        public String str;
        public int times;
        public Node(String s,int t){
            str=s;
            times=t;
        }
    }
    public static void main(String[] args){
        String[] strArr={"a","b","b","a","c"};
        int k=2;
        printTopKAndRank(strArr,k);
    }
    public static void printTopKAndRank(String[] arr,int topK){
        if(arr==null|| topK<1){
            return;
        }
        HashMap<String,Integer> map=new HashMap<String,Integer>();
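        //Build the hash table (string -> frequency)
        //A string seen for the first time starts at 1; otherwise its count is incremented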
        for(int i=0;i!=arr.length;i++){
            String cur = arr[i];
            if(!map.containsKey(cur)){
                map.put(cur,1);
            }else{
                map.put(cur,map.get(cur)+1);
            }
        }
        Node[] heap =new Node[topK];
        int index=0;
        for(Map.Entry<String,Integer> entry:map.entrySet()){
            String str=entry.getKey();
            int times=entry.getValue();
            Node node=new Node(str,times);
            if(index!=topK){
                heap[index]=node;
                heapInsert(heap,index++);
            }else{
                if(heap[0].times<node.times){
                    heap[0]=node;
                    heapify(heap,0,topK);
                }
            }
        }
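        //Sort the elements of the min-heap by frequency, from highest to lowest
        //(i>0 rather than i!=0, so an empty input does not start the loop at i=-1)
        for(int i=index-1;i>0;i--){
            swap(heap,0,i);
            heapify(heap,0,i);
        }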
        //Print up to k records strictly in rank order
        for(int i=0;i!=heap.length;i++){
            if(heap[i]==null){
                break;
            }else{
                System.out.print("No."+(i+1)+": ");
                System.out.print(heap[i].str+",times: ");
                System.out.println(heap[i].times);
            }

        }
    }
    public static void heapInsert(Node[] heap,int index){
        while(index!=0){
            int parent =(index-1)/2;
            if(heap[index].times<heap[parent].times){
                swap(heap,parent,index);
                index=parent;
            }else{
                break;
            }
        }
    }
    public static void heapify(Node[] heap,int index,int heapSize){
        int left=index*2+1;
        int right=index*2+2;
        int smallest=index;
        while(left<heapSize){
            if(heap[left].times<heap[index].times){
                smallest=left;
            }
            if(right<heapSize && heap[right].times<heap[smallest].times){
                smallest=right;
            }
            //Swap with the smaller child, or stop once the node is in place
            if(smallest!=index){
                swap(heap,smallest,index);
            }else{
                break;
            }
            index=smallest;
            left=index*2+1;
            right=index*2+2;
        }
    }
    public static void swap(Node[] heap,int index1,int index2){
        Node tmp=heap[index1];
        heap[index1]=heap[index2];
        heap[index2]=tmp;

    }
}

The output is as follows:

No.1: b,times: 2
No.2: a,times: 2
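The sorting loop in printTopKAndRank relies on a property of heap sort: repeatedly swapping the root of a min-heap with the last slot and shrinking the heap leaves the array in descending order. The sketch below isolates that step on a plain int array. The class name MinHeapSortSketch and the method sortDescending are hypothetical, and the input is assumed to be a valid min-heap already.

```java
import java.util.Arrays;

class MinHeapSortSketch {
    // Sift the value at index down until the min-heap property holds in heap[0..heapSize).
    static void heapify(int[] heap, int index, int heapSize) {
        int left = index * 2 + 1;
        while (left < heapSize) {
            int right = left + 1;
            int smallest = heap[left] < heap[index] ? left : index;
            if (right < heapSize && heap[right] < heap[smallest]) {
                smallest = right;
            }
            if (smallest == index) {
                break; // already in place
            }
            int tmp = heap[index];
            heap[index] = heap[smallest];
            heap[smallest] = tmp;
            index = smallest;
            left = index * 2 + 1;
        }
    }

    // Assumes heap is a valid min-heap; sorts it in place, largest value first.
    static void sortDescending(int[] heap) {
        for (int i = heap.length - 1; i > 0; i--) {
            int tmp = heap[0];   // the current minimum moves to the tail
            heap[0] = heap[i];
            heap[i] = tmp;
            heapify(heap, 0, i); // restore the heap on the first i slots
        }
    }

    public static void main(String[] args) {
        int[] heap = {1, 3, 2, 7, 4}; // a valid min-heap
        sortDescending(heap);
        System.out.println(Arrays.toString(heap)); // prints "[7, 4, 3, 2, 1]"
    }
}
```

Each pass places the smallest remaining value at the end of the shrinking heap, so the largest value ends up at index 0, which is why the printing loop can walk the array from front to back.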
