Learning Big Data, Day 5: Implementing Least Squares in Python (Part 2)

阿新 • Published: 2019-01-20

1.numpy.random.normal


numpy.random.normal(loc=0.0, scale=1.0, size=None)

Draw random samples from a normal (Gaussian) distribution.

The probability density function of the normal distribution, first derived by De Moivre and 200 years later by both Gauss and Laplace independently [R250], is often called the bell curve because of its characteristic shape (see the example below).

The normal distribution occurs often in nature. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution [R250].

Parameters:


loc : float

Mean (“centre”) of the distribution.

scale : float

Standard deviation (spread or “width”) of the distribution.

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
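As a small illustration of the size parameter described above, the sketch below draws a single value, a vector, and a three-dimensional block. The seed and the variable names (rng_state, single, vector, block) are assumptions made only for this example.

```python
import numpy as np

# Seeded generator, used here only so the sketch is repeatable.
rng_state = np.random.RandomState(0)

# size=None (the default): a single value is returned.
single = rng_state.normal(loc=0.0, scale=1.0)

# size given as an int: a 1-D array with that many samples.
vector = rng_state.normal(loc=0.0, scale=1.0, size=5)

# size given as a tuple (m, n, k): m * n * k samples in that shape.
block = rng_state.normal(loc=0.0, scale=1.0, size=(2, 3, 4))

print(vector.shape, block.shape, block.size)
```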

See also

scipy.stats.distributions.norm: probability density function, distribution or cumulative density function, etc.

Notes

The probability density for the Gaussian distribution is

$p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }}e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },$

where $\mu$ is the mean and $\sigma$ the standard deviation. The square of the standard deviation, $\sigma^2$ , is called the variance.

The function has its peak at the mean, and its “spread” increases with the standard deviation (the function reaches 0.607 times its maximum at $\mu + \sigma$ and $\mu - \sigma$ [R250]). This implies that numpy.random.normal is more likely to return samples lying close to the mean, rather than those far away.
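The 0.607 figure can be checked directly from the density formula: one standard deviation away from the mean, the exponent is -1/2, so the density is exp(-0.5) times its peak. The helper normal_pdf below is a hypothetical name written for this sketch, not a NumPy function.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Gaussian density, written directly from the formula in the Notes."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

mu, sigma = 0.0, 0.1

peak = normal_pdf(mu, mu, sigma)                    # density at the mean
ratio = normal_pdf(mu + sigma, mu, sigma) / peak    # density one sigma away, relative to the peak

print(ratio)  # equals exp(-0.5), roughly 0.607
```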

References

[R250] P. R. Peebles Jr., “Central Limit Theorem” in “Probability, Random Variables and Random Signal Principles”, 4th ed., 2001, pp. 51, 51, 125.

Examples

Draw samples from the distribution:

>>> import numpy as np
>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)

Verify the mean and the variance:


>>> abs(mu - np.mean(s)) < 0.01
True


>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...                np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
...          linewidth=2, color='r')
>>> plt.show()

2.numpy.random.randn

import numpy as np
np.random.randn(2,3)

array([[ 0.59941534,  1.0991949 ,  1.36316028],
       [-0.01979197,  1.30783162, -0.69808199]])

This draws random samples from the standard normal distribution (mean 0, standard deviation 1).
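Because randn samples the standard normal distribution, samples with another mean and standard deviation can be obtained by shifting and scaling. The sketch below checks this empirically; the values of mu and sigma, the seed, and the sample count are arbitrary choices for illustration.

```python
import numpy as np

mu, sigma = 3.0, 0.5          # illustrative values, not from the original post
np.random.seed(0)             # seeded only so the sketch is repeatable

# Shift and scale standard normal samples to get N(mu, sigma^2) samples.
samples = mu + sigma * np.random.randn(100000)

# The sample statistics should land close to the chosen mu and sigma.
print(samples.mean(), samples.std(ddof=1))
```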

3.scipy.optimize.leastsq

Least squares

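import numpy as np
from scipy.optimize import leastsq

# The function to fit: x is the variable, p holds the parameters
def fun(x, p):
    a, b = p
    return a*x + b

# Residual between the fitted values and the real data: p holds the parameters being fitted, x and y are the real data
def residuals(p, x, y):
    return fun(x, p) - y

# A set of real data, generated with a=2, b=1
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y1 = np.array([3, 5, 7, 9, 11, 13], dtype=float)

# Call the fitting routine: the first argument is the residual function, the second is the initial guess, the third holds the extra arguments passed to the residual function
r = leastsq(residuals, [1, 1], args=(x1, y1))

# Print the result: r[0] holds the fitted parameters; with the default full_output setting, r[1] is an integer status flag
print(r[0])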

After running it, the fitted result is

[2. 1.]
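For a straight line, the result can be cross-checked against a closed-form linear least-squares fit. The sketch below uses np.polyfit with degree 1 on the same data; this is only a sanity check under that assumption, not part of the original workflow.

```python
import numpy as np

# Same data as above, generated with a=2, b=1.
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y1 = np.array([3, 5, 7, 9, 11, 13], dtype=float)

# A degree-1 polynomial fit returns [slope, intercept], highest power first.
slope, intercept = np.polyfit(x1, y1, 1)

print(slope, intercept)  # should agree with the leastsq result up to rounding
```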

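In actual use, however, the function I needed to fit was not this simple. One difficulty was that it was a piecewise function: the value of the independent variable has to be checked, and a different formula is applied depending on the result. As an example, take this piecewise function: y = ax + b when x > 3, and y = ax - b when x <= 3. In Python: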

def fun(x, p):
    a, b = p
    if (x > 3):
        return a*x + b
    else:
        return a*x - b

If we keep using the original residual function for the fit, we get this error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

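The reason is simple: fun can now only handle a single value, so passing in an array raises an error. What to do? I was stuck too, so I asked for help on the SciPy mailing list. People there were very helpful and quickly pointed out the problem. I had misunderstood the residual function: what leastsq needs it to return is an array of residuals, one per data point. So the residual function can be changed like this: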

def residuals(p, x, y):
    # Evaluate fun one point at a time and collect the results in an array
    temp = np.zeros(len(x), dtype=float)
    for i in range(0, len(x)):
        temp[i] = fun(x[i], p)
    return temp - y
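An alternative to looping over the points is to make the piecewise function itself accept arrays. The sketch below uses np.where for this; the names piecewise and piecewise_residuals, the synthetic data, and the initial guess are assumptions made for the example, not the author's code.

```python
import numpy as np
from scipy.optimize import leastsq

def piecewise(x, p):
    """y = a*x + b where x > 3, and y = a*x - b elsewhere, for scalar or array x."""
    a, b = p
    return np.where(x > 3, a * x + b, a * x - b)

def piecewise_residuals(p, x, y):
    # Returns one residual per data point, as leastsq expects.
    return piecewise(x, p) - y

# Synthetic data generated with a=2, b=1 (illustrative only).
x_data = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y_data = piecewise(x_data, (2.0, 1.0))

fitted, ier = leastsq(piecewise_residuals, [1.0, 0.5], args=(x_data, y_data))
print(fitted)  # should be close to a=2, b=1
```

Since the branch depends only on x and not on the parameters, the model stays linear in a and b, which is why a simple initial guess is enough here.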

# A second example: fit a polynomial to noisy samples of sin(2*pi*x)
import numpy as np
from scipy.optimize import leastsq # the least-squares routine we want to use
import pylab as pl

m = 9 # number of polynomial coefficients (the fitted polynomial has degree m - 1)

def real_func(x):
    return np.sin(2*np.pi*x) # sin(2 pi x)

def fake_func(p, x):
    f = np.poly1d(p) # polynomial with coefficients p, highest power first
    return f(x)

# Residual function
def residuals(p, y, x):
    return y - fake_func(p, x)

# 9 evenly spaced points in [0, 1], used as x
x = np.linspace(0, 1, 9)
# Many points, used to draw "continuous" curves
x_show = np.linspace(0, 1, 1000)

y0 = real_func(x)
# y after adding normally distributed noise
y1 = [np.random.normal(0, 0.1) + y for y in y0]

# Start from a random set of polynomial coefficients
p0 = np.random.randn(m)

plsq = leastsq(residuals, p0, args=(y1, x))

print('Fitting Parameters:', plsq[0]) # print the fitted parameters

pl.plot(x_show, real_func(x_show), label='real')
pl.plot(x_show, fake_func(plsq[0], x_show), label='fitted curve')
pl.plot(x, y1, 'bo', label='with noise')
pl.legend()
pl.show()
