Critical Values for Statistical Hypothesis Testing and How to Calculate Them in Python

It is common, if not standard, to interpret the results of statistical hypothesis tests using a p-value.

Not all implementations of statistical tests return p-values. In some cases, you must use alternatives, such as critical values. In addition, critical values are used when estimating the expected intervals for observations from a population, such as in tolerance intervals.

In this tutorial, you will discover critical values, why they are important, how they are used, and how to calculate them in Python using SciPy.

After completing this tutorial, you will know:

  • Examples of statistical hypothesis tests and their distributions from which critical values can be calculated and used.
  • How exactly critical values are used on one-tail and two-tail statistical hypothesis tests.
  • How to calculate critical values for the Gaussian, Student’s t, and Chi-Squared distributions.

Let’s get started.

[Image: A Gentle Introduction to Critical Values for Statistical Hypothesis Testing. Photo by Steve Bittinger, some rights reserved.]

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. Why Do We Need Critical Values?
  2. What Is a Critical Value?
  3. How to Use Critical Values
  4. How to Calculate Critical Values

Why Do We Need Critical Values?

Many statistical hypothesis tests return a p-value that is used to interpret the outcome of the test.

Some tests do not return a p-value, requiring an alternative method for interpreting the calculated test statistic directly.

A statistic calculated by a statistical hypothesis test can be interpreted using critical values from the distribution of the test statistic.

Some examples of statistical hypothesis tests and their distributions from which critical values can be calculated are as follows:

  • Z-Test: Gaussian distribution.
  • Student t-Test: Student’s t-distribution.
  • Chi-Squared Test: Chi-Squared distribution.
  • ANOVA: F-distribution.
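Each pairing in this list corresponds to a distribution object in scipy.stats. As a minimal sketch for the ANOVA entry, the snippet below looks up a right-tail critical value from the F-distribution. The degrees of freedom (2 and 27, as for three groups of ten observations each) and the 5% significance level are assumptions chosen only for illustration.

```python
# F-distribution critical value for a one-way ANOVA (illustrative numbers)
from scipy.stats import f

alpha = 0.05   # assumed significance level
dfn = 2        # numerator degrees of freedom: groups - 1 (3 groups assumed)
dfd = 27       # denominator degrees of freedom: observations - groups (30 - 3)

# value below which 95% of F statistics fall when the null hypothesis holds
f_critical = f.ppf(1 - alpha, dfn, dfd)
print('F critical value: %.3f' % f_critical)
```

An F statistic from the ANOVA at or above this value would fall in the rejection region.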

Critical values are also used when defining intervals for expected (or unexpected) observations in distributions. Calculating and using critical values may be appropriate when quantifying the uncertainty of estimated statistics or intervals such as confidence intervals and tolerance intervals.
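As one illustration of the interval use case, here is a minimal sketch of a 95% confidence interval for a sample mean built from a Student’s t critical value. The data values are made up for the example, and the calculation is the usual mean plus or minus the critical value times the standard error.

```python
# confidence interval for a mean using a t critical value (made-up data)
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.4, 4.7, 5.2]  # hypothetical sample
alpha = 0.05
n = len(data)

sample_mean = mean(data)
std_error = stdev(data) / sqrt(n)   # stdev() is the sample standard deviation

# the interval is two-sided, so alpha is split across both tails
t_critical = t.ppf(1 - alpha / 2, n - 1)
lower = sample_mean - t_critical * std_error
upper = sample_mean + t_critical * std_error
print('mean=%.3f, interval=(%.3f, %.3f)' % (sample_mean, lower, upper))
```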

What Is a Critical Value?

A critical value is defined in the context of the population distribution and a probability.

An observation from the population has a value equal to or less than the critical value with the given probability.

We can express this mathematically as follows:

    Pr[X <= critical_value] = probability

Where Pr is the calculation of probability, X are observations from the population, critical_value is the calculated critical value, and probability is the chosen probability.

Critical values are calculated using a mathematical function where the probability is provided as an argument. For most common distributions, the value cannot be calculated analytically; instead, it must be estimated using numerical methods. Historically, tables of pre-calculated critical values were provided in the appendices of statistics textbooks for reference.

Critical values are used in statistical significance testing. The probability is often expressed as a significance level, denoted by the lowercase Greek letter alpha (α), which is the complement of the probability.

    probability = 1 - alpha

Standard alpha values are used when calculating critical values, chosen for historical reasons and continually used for consistency reasons. These alpha values include:

  • 1% (alpha=0.01)
  • 5% (alpha=0.05)
  • 10% (alpha=0.10)
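To connect these alpha levels to actual thresholds, the sketch below converts each one to a probability with probability = 1 - alpha and looks up the matching right-tail critical value. The standard Gaussian distribution is an assumption made here only to keep the example short.

```python
# right-tail Gaussian critical values for the standard alpha levels
from scipy.stats import norm

critical_values = {}
for alpha in [0.01, 0.05, 0.10]:
    probability = 1 - alpha            # probability = 1 - alpha
    critical_values[alpha] = norm.ppf(probability)
    print('alpha=%.2f, probability=%.2f, critical value=%.3f'
          % (alpha, probability, critical_values[alpha]))
```

A smaller alpha pushes the critical value further into the tail, so stronger evidence is needed before the null hypothesis is rejected.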

Critical values provide an alternative and equivalent way to interpret statistical hypothesis tests to the p-value.
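The equivalence can be checked numerically. The sketch below assumes a right-tailed test with a standard Gaussian test statistic and a handful of made-up statistic values. For each one it makes the decision twice, once by comparing the statistic to the critical value and once by comparing the p-value to alpha.

```python
# critical-value and p-value decisions agree for a right-tailed z-test
from scipy.stats import norm

alpha = 0.05
critical = norm.ppf(1 - alpha)

decisions = []
for z in [0.5, 1.2, 1.7, 2.5]:          # hypothetical test statistics
    p_value = norm.sf(z)                 # right-tail probability, 1 - cdf(z)
    reject_by_critical = bool(z >= critical)
    reject_by_p = bool(p_value <= alpha)
    decisions.append((z, reject_by_critical, reject_by_p))
    print('z=%.1f p=%.4f reject(critical)=%s reject(p)=%s'
          % (z, p_value, reject_by_critical, reject_by_p))
```

Both columns agree for every statistic, because the critical value is simply the statistic whose p-value equals alpha.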

How to Use Critical Values

Calculated critical values are used as a threshold for interpreting the result of a statistical test.

The observation values in the population beyond the critical value are often called the “critical region” or the “region of rejection”.

Critical Value: A value appearing in tables for specified statistical tests indicating at what computed value the null hypothesis can be rejected (the computed statistic falls in the rejection region).

One-Tailed Test

A one-tailed test has a single critical value, such as on the left or the right of the distribution.

Often, a one-tailed test has a critical value on the right of the distribution for non-symmetrical distributions (such as the Chi-Squared distribution).

The statistic is compared to the calculated critical value. For a right-tailed test, if the statistic is less than the critical value, we fail to reject the null hypothesis of the statistical test. Otherwise, it is rejected.

We can summarize this interpretation as follows:

  • Test Statistic < Critical Value: Fail to reject the null hypothesis of the statistical test.
  • Test Statistic >= Critical Value: Reject the null hypothesis of the statistical test.
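The rule in this list can be sketched as a small helper. The function name, the degrees of freedom, and the test statistic below are hypothetical values chosen for illustration; only chi2.ppf() comes from SciPy.

```python
# right-tailed decision rule using a Chi-Squared critical value
from scipy.stats import chi2

def right_tailed_decision(statistic, critical_value):
    """Apply the one-tailed rule: reject at or beyond the critical value."""
    if statistic < critical_value:
        return 'fail to reject H0'
    return 'reject H0'

alpha = 0.05
df = 4                      # assumed degrees of freedom
statistic = 11.2            # hypothetical test statistic
critical_value = chi2.ppf(1 - alpha, df)
result = right_tailed_decision(statistic, critical_value)
print('critical=%.3f, statistic=%.3f: %s' % (critical_value, statistic, result))
```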

Two-Tailed Test

A two-tailed test has two critical values, one on each side of the distribution, which is often assumed to be symmetrical (e.g. Gaussian and Student’s t-distributions).

When using a two-tailed test, the significance level (or alpha) used in the calculation of the critical values must be divided by 2. Each critical value then accounts for half of alpha on its side of the distribution.

To make this concrete, consider an alpha of 5%. This would be split to give 2.5% on each side of the distribution, with an acceptance area of 95% in the middle of the distribution.

We can refer to each critical value as the lower and upper critical values for the left and right of the distribution respectively. Test statistic values greater than the lower critical value and less than the upper critical value indicate the failure to reject the null hypothesis. Test statistic values less than or equal to the lower critical value, or greater than or equal to the upper critical value, indicate rejection of the null hypothesis for the test.

We can summarize this interpretation as follows:

  • Lower Critical Value < Test Statistic < Upper Critical Value: Failure to reject the null hypothesis of the statistical test.
  • Test Statistic <= Lower Critical Value OR Test Statistic >= Upper Critical Value: Reject the null hypothesis of the statistical test.

If the distribution of the test statistic is symmetric around a mean of zero, then we can shortcut the check by comparing the absolute (positive) value of the test statistic to the upper critical value.

  • |Test Statistic| < Upper Critical Value: Failure to reject the null hypothesis of the statistical test.

Where |Test Statistic| is the absolute value of the calculated test statistic.
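Both forms of the two-tailed check can be sketched side by side. The example assumes a Student’s t-distributed statistic with 10 degrees of freedom and a few made-up statistic values; the function names are hypothetical.

```python
# two-tailed decision rule using Student's t critical values
from scipy.stats import t

alpha = 0.05
df = 10                                  # assumed degrees of freedom
lower = t.ppf(alpha / 2, df)             # 2.5% in the left tail
upper = t.ppf(1 - alpha / 2, df)         # 2.5% in the right tail

def two_tailed_reject(statistic):
    """Reject when the statistic falls in either tail."""
    return bool(statistic <= lower or statistic >= upper)

def two_tailed_reject_abs(statistic):
    """Shortcut for distributions symmetric around zero."""
    return bool(abs(statistic) >= upper)

statistics = [-3.1, -1.0, 0.4, 2.5]      # hypothetical test statistics
for statistic in statistics:
    print(statistic, two_tailed_reject(statistic), two_tailed_reject_abs(statistic))
```

The two functions give the same answer for every statistic because the lower critical value is the negative of the upper one. That would not hold for a non-symmetric distribution.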

How to Calculate Critical Values

Density functions describe how likely observations are under the distribution. Recall the definitions of the PDF and CDF as follows:

  • Probability Density Function (PDF): Returns the probability density (the relative likelihood) of an observation having a specific value from the distribution.
  • Cumulative Distribution Function (CDF): Returns the probability of an observation being equal to or less than a specific value from the distribution.

In order to calculate a critical value, we require a function that, given a probability (or significance), will return the observation value from the distribution.

Specifically, we require the inverse of the cumulative distribution function: given a probability, it returns the value at or below which observations fall with that probability. This is called the percent point function (PPF), or more generally the quantile function.

  • Percent Point Function (PPF): Returns the value at or below which an observation from the distribution falls with the provided probability.

Let’s make this concrete with three distributions from which it is commonly required to calculate critical values. Namely, the Gaussian distribution, Student’s t-distribution, and the Chi-squared distribution.

We can calculate the percent point function in SciPy using the ppf() function on a given distribution. The same value can also be obtained from the inverse survival function, isf(), by passing the complement of the probability: isf(alpha) is equivalent to ppf(1 - alpha). This is mentioned as you may see this alternate approach in third-party code.
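A quick way to see how ppf() and isf() relate is to compute the same critical value both ways. The sketch below does this for three distributions; the alpha of 5% and the 10 degrees of freedom are assumptions for illustration.

```python
# ppf(1 - alpha) and isf(alpha) return the same critical value
from math import isclose
from scipy.stats import norm, t, chi2

alpha = 0.05
df = 10   # assumed degrees of freedom for the t and Chi-Squared cases

pairs = {
    'Gaussian': (norm.ppf(1 - alpha), norm.isf(alpha)),
    'Student t': (t.ppf(1 - alpha, df), t.isf(alpha, df)),
    'Chi-Squared': (chi2.ppf(1 - alpha, df), chi2.isf(alpha, df)),
}
for name, (from_ppf, from_isf) in pairs.items():
    print('%s: ppf=%.6f isf=%.6f' % (name, from_ppf, from_isf))
```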

Gaussian Critical Values

The example below calculates the percent point function for 95% on the standard Gaussian distribution.

    # Gaussian Percent Point Function
    from scipy.stats import norm
    # define probability
    p = 0.95
    # retrieve value <= probability
    value = norm.ppf(p)
    print(value)
    # confirm with cdf
    p = norm.cdf(value)
    print(p)

Running the example first prints the value, about 1.65, at or below which 95% of the observations from the distribution fall. This value is then confirmed by retrieving the probability of the observation from the CDF, which returns 95%, as expected.

Note that 1.65 is a one-sided value: 95% of the distribution lies below it. It is smaller than the roughly 2 (more precisely, 1.96) standard deviations of the 68–95–99.7 rule, which describes the central 95% of the distribution and leaves 2.5% in each tail.

    1.6448536269514722
    0.95
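The value of about 1.65 applies to a single tail. As a short sketch of the two-tailed counterpart, the snippet below splits the same 5% across both tails. The call to norm.interval() is included only as a cross-check and is made positionally.

```python
# two-tailed Gaussian critical values for alpha = 5%
from scipy.stats import norm

alpha = 0.05
lower = norm.ppf(alpha / 2)        # 2.5% of observations fall below this value
upper = norm.ppf(1 - alpha / 2)    # 97.5% of observations fall below this value
print('lower=%.3f, upper=%.3f' % (lower, upper))

# central interval holding 95% of the distribution, as a cross-check
interval_lower, interval_upper = norm.interval(1 - alpha)
print('interval=(%.3f, %.3f)' % (interval_lower, interval_upper))
```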

Student’s t Critical Values

The example below calculates the percent point function for 95% on the standard Student’s t-distribution with 10 degrees of freedom.

    # Student t-distribution Percent Point Function
    from scipy.stats import t
    # define probability
    p = 0.95
    df = 10
    # retrieve value <= probability
    value = t.ppf(p, df)
    print(value)
    # confirm with cdf
    p = t.cdf(value, df)
    print(p)

Running the example returns the value, about 1.812, at or below which 95% of the observations from the chosen distribution fall. The probability of the value is then confirmed (with minor rounding error) via the CDF.

    1.8124611228107335
    0.949999999999923
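The t critical value depends on the degrees of freedom. As a brief sketch, the snippet below repeats the calculation for several arbitrarily chosen degrees of freedom and compares the results with the Gaussian value for the same probability.

```python
# t critical values shrink toward the Gaussian value as df grows
from scipy.stats import norm, t

p = 0.95
gaussian_value = norm.ppf(p)
dfs = [5, 10, 30, 100, 1000]          # arbitrary degrees of freedom
t_values = [t.ppf(p, df) for df in dfs]

for df, value in zip(dfs, t_values):
    print('df=%4d, t critical value=%.4f' % (df, value))
print('Gaussian critical value=%.4f' % gaussian_value)
```

With few degrees of freedom the t-distribution has heavier tails, so its critical value is larger than the Gaussian one.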

Chi-squared Critical Values

The example below calculates the percent point function for 95% on the Chi-Squared distribution with 10 degrees of freedom.

    # Chi-Squared Percent Point Function
    from scipy.stats import chi2
    # define probability
    p = 0.95
    df = 10
    # retrieve value <= probability
    value = chi2.ppf(p, df)
    print(value)
    # confirm with cdf
    p = chi2.cdf(value, df)
    print(p)

Running the example first calculates the value, about 18.3, at or below which 95% of the observations from the distribution fall. The probability of this observation is confirmed by using it as input to the CDF.

    18.307038053275143
    0.95
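Because the Chi-Squared distribution is not symmetric, a two-sided use of it needs a separate lower and upper critical value; the absolute-value shortcut does not apply. The sketch below assumes an alpha of 5% split across both tails and the same 10 degrees of freedom.

```python
# lower and upper Chi-Squared critical values for a two-sided 5% level
from scipy.stats import chi2

alpha = 0.05
df = 10
lower = chi2.ppf(alpha / 2, df)        # 2.5% of observations fall below this value
upper = chi2.ppf(1 - alpha / 2, df)    # 2.5% of observations fall above this value
print('lower=%.3f, upper=%.3f' % (lower, upper))

# distance of each critical value from the distribution mean (equal to df)
print('below mean: %.3f, above mean: %.3f' % (df - lower, upper - df))
```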

Summary

In this tutorial, you discovered critical values, why they are important, how they are used, and how to calculate them in Python using SciPy.

Specifically, you learned:

  • Examples of statistical hypothesis tests and their distributions from which critical values can be calculated and used.
  • How exactly critical values are used on one-tail and two-tail statistical hypothesis tests.
  • How to calculate critical values for the Gaussian, Student’s t, and Chi-Squared distributions.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
