
Learning Python: the Gaussian distribution in numpy

Introduction

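The normal distribution, also called the Gaussian distribution, is a probability distribution of great importance in mathematics, physics, engineering and many other fields, and it has a major influence on many areas of statistics.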

If a random variable X follows a Gaussian distribution with mean (expected value) μ and variance σ², we write X ~ N(μ, σ²).
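Written out, the notation fixes the mean and the variance of X. The second parameter is the variance, and the standard deviation σ is its square root:

```latex
X \sim \mathcal{N}(\mu, \sigma^2)
\quad\Longrightarrow\quad
\mathbb{E}[X] = \mu, \qquad
\operatorname{Var}(X) = \sigma^2, \qquad
\sigma = \sqrt{\operatorname{Var}(X)} .
```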

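The mean μ determines where the distribution is located, and the standard deviation σ determines how widely it is spread. Because the curve is bell-shaped, it is often called the bell curve.
The standard normal distribution is the normal distribution with μ = 0 and σ = 1 (see the green curve in the figure on the right).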
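The roles of μ (location) and σ (scale) can be checked numerically. The minimal sketch below draws standard normal samples and then shifts them by μ and stretches them by σ. The result behaves like N(μ, σ²). The values μ = 5 and σ = 2, the seed, and the use of NumPy's newer `default_rng` Generator (NumPy 1.17 or later) are illustrative assumptions, not part of the original post.

```python
import numpy as np

# Location-scale property: if Z ~ N(0, 1), then X = mu + sigma * Z ~ N(mu, sigma^2)
rng = np.random.default_rng(42)  # seeded Generator (assumes NumPy >= 1.17) for reproducibility
mu, sigma = 5.0, 2.0             # illustrative values

z = rng.standard_normal(200_000)  # samples from the standard normal N(0, 1)
x = mu + sigma * z                # shift by mu, stretch by sigma

print(z.mean(), z.std())  # close to 0 and 1
print(x.mean(), x.std())  # close to 5 and 2
```

This is the same construction the Matplotlib demo at the end of this post uses to generate its data (`mu + sigma * np.random.randn(437)`).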

As the name suggests, numpy.random is the numpy module for generating random numbers.
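This post uses the legacy functions such as `np.random.normal`. NumPy 1.17 and later also provide a `Generator` object, created with `np.random.default_rng`. The minimal sketch below assumes NumPy 1.17 or later and shows that a fixed seed makes the draws reproducible with either interface. The seed values are arbitrary.

```python
import numpy as np

# np.random.default_rng returns a Generator (assumes NumPy >= 1.17).
# Two generators created from the same seed produce the same stream of numbers.
rng_a = np.random.default_rng(seed=2024)
rng_b = np.random.default_rng(seed=2024)

a = rng_a.normal(loc=0.0, scale=1.0, size=5)
b = rng_b.normal(loc=0.0, scale=1.0, size=5)
print(a)
print(np.array_equal(a, b))  # True: same seed, same samples

# The legacy np.random.normal uses global state, which np.random.seed resets
np.random.seed(0)
c = np.random.normal(0.0, 1.0, 5)
np.random.seed(0)
d = np.random.normal(0.0, 1.0, 5)
print(np.array_equal(c, d))  # True
```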

numpy.random.normal(): Gaussian-distributed random numbers

normal(...) method of mtrand.RandomState instance
    normal(loc=0.0, scale=1.0, size=None)

    Draw random samples from a normal (Gaussian) distribution.

    The probability density function of the normal distribution, first
    derived by De Moivre and 200 years later by both Gauss and Laplace
    independently [2]_, is often called the bell curve because of its
    characteristic shape (see the example below).

    The normal distribution occurs often in nature. For example, it
    describes the commonly occurring distribution of samples influenced
    by a large number of tiny, random disturbances, each with its own
    unique distribution [2]_.

    Parameters
    ----------
    loc : float or array_like of floats
        Mean ("centre") of the distribution.
    scale : float or array_like of floats
        Standard deviation (spread or "width") of the distribution.
    size : int or tuple of ints, optional
        Output shape. If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn. If size is ``None`` (default),
        a single value is returned if ``loc`` and ``scale`` are both scalars.
        Otherwise, ``np.broadcast(loc, scale).size`` samples are drawn.

In short, loc is the mean μ. scale is the standard deviation σ: the variance is σ², and the standard deviation is its square root. size controls how many samples are drawn. As the documentation shows, size can be a single integer or a tuple, and it determines the shape of the output. If you pass a single number such as 30, you get back a one-dimensional array of 30 samples.

    Returns
    -------
    out : ndarray or scalar
        Drawn samples from the parameterized normal distribution.

The return value is a NumPy array (ndarray), which behaves much like a Python list. It is a single scalar when loc and scale are scalars and size is None.

    See Also
    --------
    scipy.stats.norm : probability density function, distribution or
        cumulative density function, etc.

    Notes
    -----
    The probability density for the Gaussian distribution is

    .. math:: p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }}
                     e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },

    where :math:`\mu` is the mean and :math:`\sigma` the standard
    deviation. The square of the standard deviation, :math:`\sigma^2`,
    is called the variance.

This formula is the probability density function of the Gaussian distribution.

    The function has its peak at the mean, and its "spread" increases with
    the standard deviation (the function reaches 0.607 times its maximum at
    :math:`\mu + \sigma` and :math:`\mu - \sigma` [2]_). This implies that
    `numpy.random.normal` is more likely to return samples lying close to
    the mean, rather than those far away.

    References
    ----------
    .. [1] Wikipedia, "Normal distribution",
           http://en.wikipedia.org/wiki/Normal_distribution
    .. [2] P. R. Peebles Jr., "Central Limit Theorem" in "Probability,
           Random Variables and Random Signal Principles", 4th ed., 2001,
           pp. 51, 51, 125.
    Examples
    --------
    Draw samples from the distribution:

    >>> mu, sigma = 0, 0.1 # mean and standard deviation
    >>> s = np.random.normal(mu, sigma, 1000)

    Verify the mean and the standard deviation:

    >>> abs(mu - np.mean(s)) < 0.01
    True

    >>> abs(sigma - np.std(s, ddof=1)) < 0.01
    True

    Display the histogram of the samples, along with
    the probability density function:

    >>> import matplotlib.pyplot as plt
    >>> count, bins, ignored = plt.hist(s, 30, density=True)
    >>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
    ...                np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
    ...          linewidth=2, color='r')
    >>> plt.show()

Here are a few examples of my own.

import numpy as np
n = np.random.normal(loc=0.0, scale=1, size=(3,3,3))
print(n)

This prints the array below. When size is a tuple of three numbers, the result is a 3 × 3 × 3 array (the values are random and change on every run):

[[[ 1.0172058  -0.44269518  0.26462677]
  [-0.35355925  0.6063244   1.26014832]
  [-0.18538448 -0.49259078 -0.62822534]]

 [[ 0.95726339 -0.06239384 -1.56474133]
  [ 1.42287373 -0.50173702  2.1642026 ]
  [-0.54096807  0.2472884  -1.1990265 ]]

 [[ 0.15663884 -0.18501496 -1.80360639]
  [ 0.81581949 -2.73858599  0.34537614]
  [-0.50873844  0.0351258   0.14204044]]]

n = np.random.normal(loc=0.0, scale=1, size=(2,3))
print(n)

When size is a tuple of two numbers, the result is a 2 × 3 array:

[[-0.45735578 -0.53921269 -0.67449221]
 [-0.98068719 -0.37125721 -0.43999013]]

import pprint
n = np.random.normal(loc=0.0, scale=1, size=30)
pprint.pprint(n)

When size is a single integer (here 30), the result is a one-dimensional array of 30 samples:

array([ 0.59407443, -1.36867189, -0.32986369,  0.80101075,  0.72632235,
        0.347779  ,  0.10520544,  0.9492837 ,  2.19274468,  1.59970721,
       -0.95508177, -1.12671986, -0.53202767,  0.25783216, -1.1101487 ,
        0.78002647, -0.14404636, -1.50865102,  1.29681861, -0.67255912,
       -0.97184549, -0.30896753,  0.94493543,  0.686387  ,  0.89299833,
       -0.17019804, -0.12766749, -0.30600834, -0.0332422 , -0.05667029])
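The size description in the docstring also mentions `np.broadcast(loc, scale).size`: `loc` and `scale` may themselves be arrays. The small sketch below shows that behavior with the same `np.random.normal` function. The particular means, standard deviations and seed are arbitrary illustrative values.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility

# Scalar loc and scale with size=None: a single value comes back
single = np.random.normal(0.0, 1.0)
print(np.ndim(single))  # 0

# Array loc and scale are broadcast together; with size=None one sample is
# drawn per (loc, scale) pair, i.e. np.broadcast(loc, scale).size samples
loc = np.array([0.0, 10.0, 100.0])
scale = np.array([1.0, 2.0, 3.0])
one_each = np.random.normal(loc, scale)
print(one_each.shape)  # (3,)

# An explicit size must be broadcast-compatible with loc and scale:
# column j of this 1000 x 3 array is drawn from N(loc[j], scale[j]**2)
many = np.random.normal(loc, scale, size=(1000, 3))
print(many.shape)         # (1000, 3)
print(many.mean(axis=0))  # close to [0, 10, 100]
print(many.std(axis=0))   # close to [1, 2, 3]
```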

Plotting the normal distribution

We can use the matplotlib module to plot the normal distribution.

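import numpy as np
import matplotlib.pyplot as plt

# The normal distribution is also called the Gaussian distribution.
# If a random variable X follows a Gaussian distribution with mean μ and variance σ²,
# we write X ~ N(μ, σ²). The mean μ sets the position of the curve and the
# standard deviation σ sets its spread.
# The standard normal distribution is the special case μ = 0, σ = 1.
mu, sigma = 0, 1  # mean and standard deviation

s = np.random.normal(mu, sigma, 100000)
# Sanity checks on the sample mean and standard deviation;
# with 100000 samples both should print True in almost every run
print(abs(mu - np.mean(s)) < 0.01)
print(abs(sigma - np.std(s, ddof=1)) < 0.01)

# Signature of matplotlib.pyplot.hist in older Matplotlib versions:
# matplotlib.pyplot.hist(
#    x, bins=10, range=None, normed=False,
#    weights=None, cumulative=False, bottom=None,
#    histtype=u'bar', align=u'mid', orientation=u'vertical',
#    rwidth=None, log=False, color=None, label=None, stacked=False,
#    hold=None, **kwargs)
# In current Matplotlib, normed has been replaced by density and hold has been removed.
# That is a lot of parameters, and it also accepts variable keyword arguments (**kwargs).
# Compute and draw the histogram of x. The return value is a tuple (n, bins, patches),
# or ([n0, n1, ...], bins, [patches0, patches1, ...]) if the input contains multiple data.
# x: the data to be sorted into the bins; its values run along the x axis
# bins: integer or array_like, optional; an integer gives the number of bins
#       (i.e. how many bars the histogram has), an array gives the bin edges

count, bins, ignored = plt.hist(s, 300, density=True)  # plot a histogram (bar chart)
# density=True scales the bars so their total area is 1, matching the density curve below

# Evaluate the normal probability density at the bin edges
x = bins
y = 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2) )

plt.plot(x, y, linewidth=1, color='r')
plt.show()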
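To connect the plot with the Notes section of the docstring, the sketch below evaluates the same density formula and checks the claim that the density at μ ± σ is about 0.607 of its peak, that is, exp(-1/2). It also counts how many samples fall within 1, 2 and 3 standard deviations of the mean, which should come out near the familiar 68%, 95% and 99.7%. The helper name `normal_pdf`, the seed and the use of `default_rng` (NumPy 1.17 or later) are assumptions made only for this illustration.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Hypothetical helper: exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

mu, sigma = 0.0, 1.0

# Density one standard deviation from the mean, relative to the peak at the mean
ratio = normal_pdf(mu + sigma, mu, sigma) / normal_pdf(mu, mu, sigma)
print(round(float(ratio), 3))  # 0.607, i.e. exp(-1/2)

# Fraction of samples lying within k standard deviations of the mean
rng = np.random.default_rng(1)
s = rng.normal(mu, sigma, 100_000)
fractions = {k: float(np.mean(np.abs(s - mu) < k * sigma)) for k in (1, 2, 3)}
print(fractions)  # roughly 0.683, 0.954 and 0.997
```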

Code example

"""
=========================================================
Demo of the histogram (hist) function with a few features
=========================================================

In addition to the basic histogram, this demo shows a few optional
features:

    * Setting the number of data bins
    * The ``density`` flag (called ``normed`` in older Matplotlib
      versions), which normalizes bin heights so that the integral of
      the histogram is 1. The resulting histogram is an approximation
      of the probability density function.
    * Setting the face color of the bars
    * Setting the opacity (alpha value).

Selecting different bin counts and sizes can significantly affect the
shape of a histogram. The Astropy docs have a great section on how to
select these parameters:
http://docs.astropy.org/en/stable/visualization/histogram.html
"""

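import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

# example data
mu = 100  # mean of distribution
sigma = 15  # standard deviation of distribution
x = mu + sigma * np.random.randn(437)

num_bins = 50

fig, ax = plt.subplots()

# the histogram of the data; density=True (formerly normed=1) normalizes it to area 1
n, bins, patches = ax.hist(x, num_bins, density=True)

# add a 'best fit' line: the normal density with the known mu and sigma, evaluated
# at the bin edges (computed directly, since matplotlib.mlab.normpdf is not
# available in recent Matplotlib versions)
y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
     np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
ax.plot(bins, y, '--')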
ax.set_xlabel('Smarts')
ax.set_ylabel('Probability density')
ax.set_title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')

# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()
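The demo's docstring says that normalizing the histogram makes its integral equal to 1, and that the choice of bins matters. The minimal sketch below checks the first claim with NumPy's own `np.histogram` rather than the Matplotlib call itself. It then shows how a few of NumPy's built-in bin-selection rules (`np.histogram_bin_edges`, NumPy 1.15 or later) pick different bin counts for the same data. The seed and the selection of rules are illustrative choices.

```python
import numpy as np

# Illustrative data resembling the demo: mean 100, standard deviation 15, 437 points
rng = np.random.default_rng(0)
x = 100 + 15 * rng.standard_normal(437)

# density=True rescales the heights so that sum(height * bin_width) == 1,
# i.e. the histogram integrates to 1 like a probability density
heights, edges = np.histogram(x, bins=50, density=True)
area = np.sum(heights * np.diff(edges))
print(round(area, 6))  # 1.0

# Different bin-selection rules pick different numbers of bins for the same data
counts_by_rule = {}
for rule in ("sturges", "fd", "auto"):
    edges_for_rule = np.histogram_bin_edges(x, bins=rule)
    counts_by_rule[rule] = len(edges_for_rule) - 1
print(counts_by_rule)
```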