Writing Sorts for Fun: Partial Sort

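Given an unordered sequence, we sometimes only want the m largest elements under some ordering, or the top few elements in sorted order. Say I have 100 students and only want the ranked list of the top 20; I don't care about the rest. How do I get it? The first thing that comes to mind is of course sort: sort everything once, then take the first 20. Sure enough, sorting is the most conventional approach and the obvious first idea. What I want to introduce here is an algorithm that is faster and more direct for this job: partial sort (partial_sort). It comes from the STL algorithm library. I ran into it while studying the STL source code, it immediately caught my eye, and so I'm sharing it here.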

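partial_sort takes an iterator middle that lies within the range [first, last]. It rearranges [first, last) so that the middle - first smallest elements end up in [first, middle), sorted in the specified order. The remaining elements are placed in [middle, last) with no guaranteed order. So partial_sort does not leave the whole range sorted; the sorted part is never larger than the whole range. When all you need is the first m elements in order, it does roughly n log m comparisons where a full sort does n log n, so it is clearly cheaper when m is small relative to n. The smaller m is, the faster it runs, and when m equals n it amounts to a full sort (a heap sort).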
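
To make the student example concrete, here is a minimal sketch of picking the top k scores with std::partial_sort. The helper name top_k_scores is made up for illustration and is not part of the STL. Passing std::greater<int> as the comparator flips the ordering, so the "smallest" elements under that ordering are the highest scores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not an STL function): returns the k highest scores
// in descending order. Takes the vector by value so the caller's data is kept.
std::vector<int> top_k_scores(std::vector<int> scores, std::size_t k) {
    assert(k <= scores.size());  // middle must lie inside [first, last]
    // With std::greater<int>, the k best scores are sorted in descending
    // order into [begin, begin + k); the tail is left in unspecified order.
    std::partial_sort(scores.begin(),
                      scores.begin() + static_cast<std::ptrdiff_t>(k),
                      scores.end(), std::greater<int>());
    scores.resize(k);  // drop the unordered tail
    return scores;
}
```

Calling top_k_scores with the scores 55, 90, 72, 100, 61, 88 and k = 3 returns 100, 90, 88.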

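How partial_sort works: its prototype lives in the STL algorithm library, and from the code it is easy to see that partial_sort borrows the idea of heap sort for its underlying implementation. The principle goes like this. Suppose we have a sequence of n elements and want the m smallest ones, with m <= n. First take the range [first, middle) holding the first m elements and call make_heap() on it to organize it into a max-heap. Then walk the remaining range [middle, last), comparing each element with the top of the max-heap (the top is the largest element in the heap and sits at the first position, so it is cheap to get). If the visited element is smaller than the heap top, swap the two and re-adjust the heap so that it is a max-heap again. When the traversal ends, [first, middle) holds the m smallest elements, and one final heap sort over that range gives the sorted result.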
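
The steps above can be sketched with the public heap functions alone. This is my own simplified version for illustration, not the STL implementation: the name heap_partial_sort is hypothetical, it only handles std::vector<int>, and it swaps through pop_heap/push_heap instead of using the internal hole-based helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: sorts the m smallest elements of v into v[0..m) in ascending order.
// The order of the remaining elements is unspecified.
void heap_partial_sort(std::vector<int>& v, std::size_t m) {
    assert(m <= v.size());
    if (m == 0) return;  // nothing to select; also avoids popping an empty heap
    auto first = v.begin();
    auto middle = first + static_cast<std::ptrdiff_t>(m);
    std::make_heap(first, middle);           // max-heap over [first, middle)
    for (auto it = middle; it != v.end(); ++it) {
        if (*it < *first) {                  // smaller than the current heap top
            std::pop_heap(first, middle);    // moves the max to middle - 1
            std::iter_swap(middle - 1, it);  // evict the max, take the candidate
            std::push_heap(first, middle);   // restore the max-heap
        }
    }
    std::sort_heap(first, middle);           // ascending order in [first, middle)
}
```

The heap never grows beyond m elements, which is why each replacement costs O(log m) rather than O(log n).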

Here is a demo of how to use the algorithm:

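#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
	// Fill a vector with 10 pseudo-random values in [0, 100).
	vector<int> vc;
	for (int i = 0; i < 10; i++)
	{
		vc.push_back(rand() % 100);
	}

	// Print the original, unsorted sequence.
	for (size_t i = 0; i < vc.size(); i++)
		cout << vc[i] << " ";
	cout << endl;

	// Sort the 4 smallest elements into the first 4 positions.
	partial_sort(vc.begin(), vc.begin() + 4, vc.end());

	// The first 4 values are now ascending; the rest are in unspecified order.
	for (size_t i = 0; i < vc.size(); i++)
		cout << vc[i] << " ";
	cout << endl;

	return 0;
}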
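Output (the exact numbers depend on the platform's rand()): the second line starts with the four smallest values in ascending order, followed by the other six values in unspecified order.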


STL source code:

template <class RandomAccessIterator>
inline void partial_sort(RandomAccessIterator first,
	RandomAccessIterator middle,
	RandomAccessIterator last) {
		__partial_sort(first, middle, last, value_type(first));
}

template <class RandomAccessIterator, class T>
void __partial_sort(RandomAccessIterator first, RandomAccessIterator middle,
	RandomAccessIterator last, T*) {
		make_heap(first, middle); // build a max-heap over [first, middle)
		for (RandomAccessIterator i = middle; i < last; ++i)
			if (*i < *first)    // visit elements outside the heap; a smaller one replaces the heap top
				__pop_heap(first, middle, i, T(*i), distance_type(first));
		sort_heap(first, middle); // sort the final heap
}

heap source code:
<span style="font-size:12px;">template <class RandomAccessIterator>
inline void make_heap(RandomAccessIterator first, RandomAccessIterator last) {
	__make_heap(first, last, value_type(first), distance_type(first));
}

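template <class RandomAccessIterator, class T, class Distance>
void __make_heap(RandomAccessIterator first, RandomAccessIterator last, T*,
	Distance*) {
		if (last - first < 2) return;	// 0 or 1 element is already a heap
		Distance len = last - first;
		Distance parent = (len - 2)/2;	// last node that has a child

		while (true) {
			// Sift the value at parent down within its subtree.
			__adjust_heap(first, parent, len, T(*(first + parent)));
			if (parent == 0) return;	// the root has been processed
			parent--;	// move on to the previous internal node
		}
}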

template <class RandomAccessIterator, class Distance, class T>
void __adjust_heap(RandomAccessIterator first, Distance holeIndex,
	Distance len, T value) {
		Distance topIndex = holeIndex;
		Distance secondChild = 2 * holeIndex + 2;	// right child of the hole
		while (secondChild < len) {
			// Pick the larger of the two children.
			if (*(first + secondChild) < *(first + (secondChild - 1)))
				secondChild--;

			// Move the larger child up into the hole; the hole moves down.
			*(first + holeIndex) = *(first + secondChild);
			holeIndex = secondChild;
			secondChild = 2 * (secondChild + 1);
		}
		if (secondChild == len) {	// only a left child exists
			*(first + holeIndex) = *(first + (secondChild - 1));
			holeIndex = secondChild - 1;
		}
		// Sift value back up from the hole to its final position.
		__push_heap(first, holeIndex, topIndex, value);
}

template <class RandomAccessIterator, class Distance, class T>
void __push_heap(RandomAccessIterator first, Distance holeIndex,
	Distance topIndex, T value) {
		Distance parent = (holeIndex - 1) / 2;	// parent of the hole
		// While the parent is smaller than value, move the parent down.
		while (holeIndex > topIndex && *(first + parent) < value) {
			*(first + holeIndex) = *(first + parent);
			holeIndex = parent;	// the hole moves up to the parent's slot
			parent = (holeIndex - 1) / 2;
		}
		*(first + holeIndex) = value;	// drop value into its final position
}

template <class RandomAccessIterator>
inline void pop_heap(RandomAccessIterator first, RandomAccessIterator last) {
	__pop_heap_aux(first, last, value_type(first));
}

template <class RandomAccessIterator, class T>
inline void __pop_heap_aux(RandomAccessIterator first,
	RandomAccessIterator last, T*) {
		// Store the max at last-1 and re-heapify [first, last-1) with the old last element.
		__pop_heap(first, last-1, last-1, T(*(last-1)), distance_type(first));
}

template <class RandomAccessIterator, class T, class Distance>
inline void __pop_heap(RandomAccessIterator first, RandomAccessIterator last,
	RandomAccessIterator result, T value, Distance*) {
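		*result = *first;	// write the heap top (the max) to result
		// The root is now a hole: sift value into place within [first, last).
		__adjust_heap(first, Distance(0), Distance(last - first), value);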
}
</span>
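
The listing above does not include sort_heap, the last step of partial_sort. As a rough sketch of what that step does, the function below builds a heap and then repeatedly calls the public std::pop_heap on a shrinking range, so each call parks the current maximum just behind the remaining heap. The name heap_sort is my own and the function is restricted to std::vector<int> for brevity.

```cpp
#include <algorithm>
#include <vector>

// Sketch of heap sort through the public heap interface (ascending order).
void heap_sort(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());  // max-heap over the whole vector
    // Each pop_heap moves the max of [begin, last) to last - 1, then the
    // heap shrinks by one, leaving a growing sorted suffix behind it.
    for (auto last = v.end(); last - v.begin() > 1; --last)
        std::pop_heap(v.begin(), last);
}
```

For the input 5, 2, 9, 1, 7 this leaves the vector as 1, 2, 5, 7, 9.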