Simpson’s Paradox: How to Prove Opposite Arguments with the Same Data

阿新 • Published: 2018-12-28

Simpson’s Paradox occurs when trends that appear when a dataset is separated into groups reverse when the data are aggregated. In the restaurant recommendation example, it really is possible for Carlo’s to be recommended by a higher percentage of both men and women than Sophia’s but to be recommended by a lower percentage of all reviewers. Before you declare this to be lunacy, here is the table to prove it.

Carlo’s wins among both men and women but loses overall!

The data clearly show that Carlo’s is preferred when the data are separated, but Sophia’s is preferred when the data are combined!

How is this possible? Looking only at the percentages in the separated data ignores the sample size, that is, the number of respondents answering the question. Each fraction shows the number of users who would recommend the restaurant out of the number asked. Carlo’s has far more responses from men than from women, while the reverse is true for Sophia’s. Since men tend to approve of restaurants at a lower rate, Carlo’s combined rate is weighted toward the lower-rating group, which drags down its overall recommendation rate and produces the paradox.
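The effect of sample size is easy to check with a few lines of arithmetic. The sketch below uses made-up counts, not the figures from the table in this article, chosen only so that Carlo’s has mostly male respondents and Sophia’s mostly female ones. The function names `group_rates` and `overall_rate` are illustrative.

```python
# Hypothetical (recommended, asked) counts per restaurant and group.
# These numbers are made up for illustration; they are not the article's table.
reviews = {
    "Carlo's": {"men": (60, 200), "women": (45, 50)},
    "Sophia's": {"men": (10, 50), "women": (160, 200)},
}


def group_rates(counts):
    """Recommendation rate within each group, ignoring group size."""
    return {group: rec / asked for group, (rec, asked) in counts.items()}


def overall_rate(counts):
    """Pooled rate: total recommendations over total respondents."""
    total_rec = sum(rec for rec, _ in counts.values())
    total_asked = sum(asked for _, asked in counts.values())
    return total_rec / total_asked


for name, counts in reviews.items():
    per_group = ", ".join(
        f"{group}: {rate:.0%}" for group, rate in group_rates(counts).items()
    )
    print(f"{name:9s} {per_group}, overall: {overall_rate(counts):.0%}")
```

With these assumed counts, Carlo’s leads among men (30% vs. 20%) and among women (90% vs. 80%), yet its pooled rate is 42% against 68% for Sophia’s, because 200 of its 250 respondents come from the group that recommends less often.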

To answer the question of which restaurant we should go to, we need to decide whether the data can be combined or whether we should look at the groups separately. Whether or not we should aggregate the data depends on the process generating the data, that is, the causal model of the data. We’ll cover what this means and how to resolve Simpson’s Paradox after we see another example.

Correlation Reversal

Another intriguing version of Simpson’s Paradox occurs when a correlation that points in one direction in stratified groups becomes a correlation in the opposite direction when aggregated for the population. Let’s take a look at a simplified example. Say we have data on the number of hours of exercise per week versus the risk of developing a disease for two sets of patients, those below the age of 50 and those over the age of 50. Here are individual plots showing the relationship between exercise and probability of disease.
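The same reversal can be reproduced numerically. The following is a minimal sketch with synthetic data, not the data behind the plots. It assumes, purely for illustration, that the over-50 patients both exercise more and carry a higher baseline risk. The `pearson` helper is a plain implementation of the Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic (hours of exercise per week, probability of disease) pairs.
# Assumption: the older group exercises more AND has a higher baseline risk.
under_50 = [(1, 0.20), (2, 0.17), (3, 0.16), (4, 0.13), (5, 0.12)]
over_50 = [(5, 0.50), (6, 0.47), (7, 0.46), (8, 0.43), (9, 0.42)]


def correlation(patients):
    """Correlation between exercise hours and disease probability."""
    hours, risk = zip(*patients)
    return pearson(hours, risk)


print("under 50:", round(correlation(under_50), 2))
print("over 50: ", round(correlation(over_50), 2))
print("combined:", round(correlation(under_50 + over_50), 2))
```

Within each age group the correlation is negative (more exercise, lower risk), but pooling the two groups gives a positive correlation, because the difference between the groups outweighs the trend inside each one.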
