1. 程式人生 > >概率論中高斯分佈(正態分佈)介紹及C++11中std::normal_distribution的使用

概率論中高斯分佈(正態分佈)介紹及C++11中std::normal_distribution的使用

高斯分佈:最常用的分佈是正態分佈(normal distribution),也稱為高斯分佈(Gaussian distribution):

正態分佈N(x;μ,σ2)呈現經典的”鐘形曲線”的形狀,其中中心峰的x座標由μ給出,峰的寬度受σ控制。

正態分佈由兩個引數控制,μ∈R和σ∈(0,∞)。引數μ給出了中心峰值的座標,這也是分佈的均值:E[x]= μ。分佈的標準差用σ表示,方差用σ2表示。

當我們要對概率密度函式求值時,我們需要對σ平方並且取倒數。當我們需要經常對不同引數下的概率密度函式求值時,一種更高效的引數化分佈的方式是使用引數β∈(0,∞),來控制分佈的精度(precision)(或方差的倒數):

採用正態分佈在很多應用中都是一個明智的選擇。當我們由於缺乏關於某個實數上分佈的先驗知識而不知道該選擇怎樣的形式時,正態分佈是預設的比較好的選擇,其中有兩個原因:

(1)、我們想要建模的很多分佈的真實情況是比較接近正態分佈的。中心極限定理(central limit theorem)說明很多獨立隨機變數的和近似服從正態分佈。這意味著在實際中,很多複雜系統都可以被成功地建模成正態分佈的噪聲,即使系統可以被分解成一些更結構化的部分。

(2)、在具有相同方差的所有可能的概率分佈中,正態分佈在實數上具有最大的不確定性。因此,我們可以認為正態分佈是對模型加入的先驗知識量最少的分佈。

正態分佈可以推廣到Rn空間,這種情況下被稱為多維正態分佈(multivariate normal distribution)。它的引數是一個正定對稱矩陣∑:

引數μ仍然表示分佈的均值,只不過現在是向量值。引數∑給出了分佈的協方差矩陣。和單變數的情況類似,當我們希望對很多不同引數下的概率密度函式多次求值時,協方差矩陣並不是一個很高效的引數化分佈的方式,因為對概率密度函式求值時需要對∑求逆。我們可以使用一個精度矩陣(precision matrix)β進行替代:

我們常常把協方差矩陣固定成一個對角陣。一個更簡單的版本是各向同性(isotropic)高斯分佈,它的協方差矩陣是一個標量乘以單位陣。

正態分佈(normal distribution):又名高斯分佈(Gaussian distribution,以德國數學家卡爾·弗里德里希·高斯的姓冠名),是一個在數學、物理及工程等領域都非常重要的概率分佈,由於這個分佈函式具有很大非常漂亮的性質,使得其在諸多涉及統計科學離散科學等領域的許多方面都有著重大的影響力。比如影象處理中最常用的濾波器型別為Gaussian濾波器(也就是所謂的正態分佈函式)。

若隨機變數X服從一個位置引數為μ、尺度引數為σ的概率分佈,記為:X∽N(μ,σ2),則其概率密度函式為:

正態分佈的數學期望值或期望值μ等於位置引數,決定了分佈的位置;其方差σ2的開平方或標準差σ等於尺度引數,決定了分佈的幅度。

正態分佈的概率密度函式曲線呈鐘形,因此人們又經常稱之為鐘形曲線。我們通常所說的標準正態分佈是位置引數μ=0,尺度引數σ2=1的正態分佈。

以下是對C++11中的正態分佈模板類std::normal_distribution的介紹:

C++11中引入了正態分佈模板類std::normal_distribution,在標頭檔案<random>中。

正態分佈(normal distribution)又名高斯分佈(Gaussian distribution)。若隨機變數X服從一個數學期望為μ、方差為σ2的高斯分佈,記為N(μ, σ2)。其概率密度函式為正態分佈的期望值μ決定了其位置,其標準差σ決定了分佈的幅度。通常所說的標準正態分佈是μ=0, σ=1的正態分佈。

std::normal_distribution:Normal distribution, Random number distribution that produces floating-point values according to a normal distribution, which is described by the following probability density function:

This distribution produces random numbers around the distribution mean (μ) with a specific standard deviation (σ).

The normal distribution is a common distribution used for many kind of processes,since it is the distribution that the aggregation of a large number of independent random variables approximates to, when all follow the same distribution (no matter which distribution).

The distribution parameters, mean (μ) and stddev (σ), are set on construction. To produce a random value following this distribution, call its member function operator().

 normal distribution represents an unbounded distribution.

以下是std::normal_distribution的測試code:

#include "normal_distribution.hpp"
#include <iostream>
#include <random>
#include <string>
#include <chrono>
#include <map>
#include <iomanip>

///////////////////////////////////////////////////////////////
// reference: http://www.cplusplus.com/reference/random/normal_distribution/
int test_normal_distribution_1()
{
{
	const int nrolls = 10000;  // number of experiments
	const int nstars = 100;    // maximum number of stars to distribute

	std::default_random_engine generator;
	std::normal_distribution<double> distribution(5.0, 2.0);

	int p[10] = {};

	for (int i = 0; i<nrolls; ++i) {
		double number = distribution(generator);
		if ((number >= 0.0) && (number<10.0)) ++p[int(number)];
	}

	std::cout << "normal_distribution (5.0,2.0):" << std::endl;

	for (int i = 0; i<10; ++i) {
		std::cout << i << "-" << (i + 1) << ": ";
		std::cout << std::string(p[i] * nstars / nrolls, '*') << std::endl;
	}
}

{ // (1)、normal_distribution::normal_distribution: Constructs a normal_distribution object,
//   adopting the distribution parameters specified either by mean and stddev or by object parm.
//   (2)、normal_distribution::max: Returns the least upper bound of the range of values potentially returned by member operator().
//   (3)、normal_distribution::min: Returns the greatest lower bound of the range of values potentially returned by member operator().
//   (4)、normal_distribution::mean: Returns the mean(μ)parameter associated with the normal_distribution object
//   (5)、normal_distribution::stddev: Returns the standard deviation (σ) associated with the normal_distribution object
//   (6)、normal_distribution::operator(): Returns a new random number that follows the distribution's parameters associated
//    to the object (version 1) or those specified by parm (version 2).

	// construct a trivial random generator engine from a time-based seed:
	unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
	std::default_random_engine generator(seed);

	std::normal_distribution<double> distribution(0.0, 1.0);

	std::cout << "some Normal-distributed(0.0,1.0) results:" << std::endl;
	for (int i = 0; i<10; ++i)
		std::cout << distribution(generator) << std::endl;
	std::cout << "max: " << distribution.max() << std::endl;
	std::cout << "min: " << distribution.min() << std::endl;
	std::cout << "mean: " << distribution.mean() << std::endl;
	std::cout << "stddev: " << distribution.stddev() << std::endl;
}

{ // normal_distribution::param: Distribution parameters
	std::default_random_engine generator;
	std::normal_distribution<double> d1(0.0, 1.0);
	std::normal_distribution<double> d2(d1.param());

	// print two independent values:
	std::cout << d1(generator) << std::endl;
	std::cout << d2(generator) << std::endl;
}

{ // normal_distribution::reset: Resets the distribution,
//   so that subsequent uses of the object do not depend on values already produced by it.
	std::default_random_engine generator;
	std::normal_distribution<double> distribution(0.0, 1.0);

	// print two independent values:
	std::cout << distribution(generator) << std::endl;
	distribution.reset();
	std::cout << distribution(generator) << std::endl;
}

	return 0;
}

///////////////////////////////////////////////////////////////////
// reference: http://en.cppreference.com/w/cpp/numeric/random/normal_distribution
int test_normal_distribution_2()
{
	std::random_device rd;
	std::mt19937 gen(rd());

	// values near the mean are the most likely
	// standard deviation affects the dispersion of generated values from the mean
	std::normal_distribution<> d(5, 2);

	std::map<int, int> hist;
	for (int n = 0; n<10000; ++n) {
		++hist[std::round(d(gen))];
	}
	for (auto p : hist) {
		std::cout << std::fixed << std::setprecision(1) << std::setw(2)
			<< p.first << ' ' << std::string(p.second / 200, '*') << '\n';
	}

	return 0;
}

////////////////////////////////////////////////////////////////
// reference: https://msdn.microsoft.com/en-us/library/bb982827.aspx
static void test(const double m, const double s, const int samples)
{
	using namespace std;
	// uncomment to use a non-deterministic seed  
	//    random_device gen;  
	//    mt19937 gen(rd());  
	mt19937 gen(1701);

	normal_distribution<> distr(m, s);

	cout << endl;
	cout << "min() == " << distr.min() << endl;
	cout << "max() == " << distr.max() << endl;
	cout << "m() == " << fixed << setw(11) << setprecision(10) << distr.mean() << endl;
	cout << "s() == " << fixed << setw(11) << setprecision(10) << distr.stddev() << endl;

	// generate the distribution as a histogram  
	map<double, int> histogram;
	for (int i = 0; i < samples; ++i) {
		++histogram[distr(gen)];
	}

	// print results  
	cout << "Distribution for " << samples << " samples:" << endl;
	int counter = 0;
	for (const auto& elem : histogram) {
		cout << fixed << setw(11) << ++counter << ": "
			<< setw(14) << setprecision(10) << elem.first << endl;
	}
	cout << endl;
}

int test_normal_distribution_3()
{
	using namespace std;

	double m_dist = 0;// 1;
	double s_dist = 1;
	int samples = 10;

	cout << "Use CTRL-Z to bypass data entry and run using default values." << endl;
	cout << "Enter a floating point value for the 'mean' distribution parameter: ";
	//cin >> m_dist;
	cout << "Enter a floating point value for the 'stddev' distribution parameter (must be greater than zero): ";
	//cin >> s_dist;
	cout << "Enter an integer value for the sample count: ";
	//cin >> samples;

	test(m_dist, s_dist, samples);

	return 0;
}

GitHub: https://github.com/fengbingchun/Messy_Test