
Common Statistical Formulas in C# Programming

Being able to apply statistics is like having a secret superpower.

Where most people see averages, you see confidence intervals.

When someone says “7 is greater than 5,” you declare that they're really the same.

In a cacophony of noise, you hear a cry for help.

Unfortunately, not enough programmers have this superpower. That's a shame, because the application of statistics can almost always enhance the display and interpretation of data.

As my modest contribution to developer-kind, I've collected together the statistical formulas that I find to be most useful; this page presents them all in one place, a sort of statistical cheat-sheet for the practicing programmer.

Most of these formulas can be found in Wikipedia, but others are buried in journal articles or in professors' web pages. They are all classical (not Bayesian), and to motivate them I have added concise commentary. I've also added links and references, so that even if you're unfamiliar with the underlying concepts, you can go out and learn more. Wearing a red cape is optional.

Send suggestions and corrections to [email protected]


1. Formulas For Reporting Averages

One of the first programming lessons in any language is to compute an average. But rarely does anyone stop to ask: what does the average actually tell us about the underlying data?

1.1 Corrected Standard Deviation

The standard deviation is a single number that reflects how spread out the data actually is. It should be reported alongside the average (unless the user will be confused).

$$ s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2} $$

Where:

  • $N$ is the number of observations
  • $x_i$ is the value of the $i$th observation
  • $\bar{x}$ is the average value of the $x_i$
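
A minimal C# sketch of the computation (the class and method names here are illustrative, not from any particular library):

```csharp
using System;
using System.Linq;

static class Stats
{
    // Corrected (sample) standard deviation: note the N - 1 denominator,
    // which corrects the bias of dividing by N.
    public static double StdDev(double[] x)
    {
        double mean = x.Average();
        double sumSq = x.Sum(xi => (xi - mean) * (xi - mean));
        return Math.Sqrt(sumSq / (x.Length - 1));
    }
}
```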

1.2 Standard Error of the Mean

From a statistical point of view, the "average" is really just an estimate of an underlying population mean. That estimate has uncertainty that is summarized by the standard error.

$$ SE = \frac{s}{\sqrt{N}} $$
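
In the sketch above, this is one more method on the Stats class:

```csharp
// Standard error of the mean: the uncertainty in the estimated
// mean shrinks with the square root of the sample size.
public static double StandardError(double[] x)
{
    return StdDev(x) / Math.Sqrt(x.Length);
}
```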

1.3 Confidence Interval Around the Mean

A confidence interval reflects the set of statistical hypotheses that won't be rejected at a given significance level. So the confidence interval around the mean reflects all possible values of the mean that can't be rejected by the data. It is a multiple of the standard error added to and subtracted from the mean.

$$ CI = \bar{x} \pm t_{\alpha/2} \cdot SE $$

Where:

  • $\alpha$ is the significance level, typically 5% (one minus the confidence level)
  • $t_{\alpha/2}$ is the $1-\alpha/2$ quantile of a $t$-distribution with $N-1$ degrees of freedom
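
The $t$ quantile requires an inverse CDF, which the .NET base library does not provide; the sketch below assumes the MathNet.Numerics NuGet package and its StudentT.InvCDF method, and builds on the Stats class above:

```csharp
using MathNet.Numerics.Distributions;

// Confidence interval around the mean at significance level alpha.
public static (double Lower, double Upper) ConfidenceInterval(
    double[] x, double alpha = 0.05)
{
    double mean = x.Average();
    double se = StandardError(x);
    // The 1 - alpha/2 quantile of a t-distribution (location 0, scale 1)
    // with N - 1 degrees of freedom.
    double t = StudentT.InvCDF(0.0, 1.0, x.Length - 1, 1 - alpha / 2);
    return (mean - t * se, mean + t * se);
}
```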

1.4 Two-Sample T-Test

A two-sample t-test can tell whether two groups of observations differ in their mean.

The test statistic is given by:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} $$

The hypothesis of equal means is rejected if $|t|$ exceeds the $(1-\alpha/2)$ quantile of a $t$-distribution with degrees of freedom equal to:

$$ df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)} $$
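
This is Welch's unequal-variances form of the test. A sketch, again as methods on the Stats class above:

```csharp
// Corrected sample variance (the square of StdDev above).
public static double Variance(double[] x)
{
    double mean = x.Average();
    return x.Sum(xi => (xi - mean) * (xi - mean)) / (x.Length - 1);
}

// Welch's two-sample t statistic and its approximate degrees of freedom.
public static (double T, double Df) WelchTTest(double[] x1, double[] x2)
{
    double v1 = Variance(x1) / x1.Length;   // s1^2 / n1
    double v2 = Variance(x2) / x2.Length;   // s2^2 / n2
    double t = (x1.Average() - x2.Average()) / Math.Sqrt(v1 + v2);
    // Welch-Satterthwaite approximation for the degrees of freedom.
    double df = (v1 + v2) * (v1 + v2)
              / (v1 * v1 / (x1.Length - 1) + v2 * v2 / (x2.Length - 1));
    return (t, df);
}
```

Compare $|t|$ against StudentT.InvCDF(0.0, 1.0, df, 1 - alpha / 2) from the previous sketch to decide whether to reject.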

2. Formulas For Reporting Proportions

It's common to report the relative proportions of binary outcomes or categorical data, but in general these are meaningless without confidence intervals and tests of independence.

2.1 Confidence Interval of a Bernoulli Parameter

A Bernoulli parameter is the proportion underlying a binary-outcome event (for example, the percent of the time a coin comes up heads). The confidence interval is given by:

$$ CI = \frac{p + \frac{z_{\alpha/2}^2}{2N} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{N} + \frac{z_{\alpha/2}^2}{4N^2}}}{1 + \frac{z_{\alpha/2}^2}{N}} $$

Where:

  • $p$ is the observed proportion of successes
  • $N$ is the number of observations
  • $z_{\alpha/2}$ is the $(1-\alpha/2)$ quantile of a standard normal distribution
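
This is the Wilson score interval, which behaves well even for small $N$ or proportions near 0 or 1. A minimal C# sketch (the default $z = 1.96$ corresponds to a 95% interval):

```csharp
// Wilson score interval for a Bernoulli parameter.
// successes: count of positive outcomes; n: total number of trials;
// z: the 1 - alpha/2 standard-normal quantile (1.96 for ~95%).
public static (double Lower, double Upper) WilsonInterval(
    int successes, int n, double z = 1.96)
{
    double p = (double)successes / n;
    double z2 = z * z;
    double center = p + z2 / (2.0 * n);
    double margin = z * Math.Sqrt(p * (1 - p) / n + z2 / (4.0 * n * n));
    double denom = 1.0 + z2 / n;
    return ((center - margin) / denom, (center + margin) / denom);
}
```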