
Machine Learning With Statistical And Causal Methods

In November 2014, Bernhard Scholkopf was awarded the Milner Award by the Royal Society for his contributions to machine learning.

In accepting the award, he gave a layman’s presentation of his work on statistical and causal machine learning methods titled “Statistical and causal approaches to machine learning”.

It’s an excellent one-hour talk and I highly recommend that you watch it.

Statistical Learning

On the statistical side, Scholkopf talks about empirical inference and generalisation.

Early in the talk he makes an interesting point about hard inference problems, which motivated his work on kernel machines.

Specifically, he references the problem of classifying DNA sequences from locations, as described in Sonnenburg et al. (2008), “Large Scale Multiple Kernel Learning”. In the paper, the authors show that algorithm performance keeps improving as the amount of available data grows.

He calls this a paradigm-changing fact and characterizes these hard inference problems as having:

  • High dimensionality
  • Complex regularities
  • Little prior knowledge
  • A need for “big data” sets

He finishes the statistical learning part of the talk by describing three key contributions of kernel methods. A kernel:

  • Formalizes the notion of similarity
  • Induces a linear representation of the data in a vector space, no matter where the original data comes from
  • Encodes the function class used for learning: solutions of kernel algorithms can be expressed as kernel expansions
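To make the last point concrete, here is a minimal sketch of a kernel expansion. It is not taken from the talk: kernel ridge regression, the Gaussian (RBF) kernel, the toy sine data and the names `rbf_kernel`, `fit_kernel_ridge` and `predict` are all my own illustrative choices. The learned function has the form f(x) = Σᵢ αᵢ k(xᵢ, x), a weighted sum of similarities to the training points.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def fit_kernel_ridge(X, y, gamma=1.0, lam=1e-3):
    """Solve (K + lam * I) alpha = y for the expansion coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Kernel expansion: f(x) = sum_i alpha_i * k(x_i, x)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy data (an assumption for illustration): noise-free samples of sin(x).
X_train = np.linspace(-3.0, 3.0, 40)[:, None]
y_train = np.sin(X_train[:, 0])

K_train = rbf_kernel(X_train, X_train)
alpha = fit_kernel_ridge(X_train, y_train)

X_test = np.linspace(-2.5, 2.5, 20)[:, None]
y_pred = predict(X_train, alpha, X_test)
max_err = np.max(np.abs(y_pred - np.sin(X_test[:, 0])))
print(f"max abs error on test grid: {max_err:.4f}")
```

Note that the model never works with explicit feature vectors. The data enters only through the kernel values, which is why the same code would work for strings or graphs given a suitable kernel.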

Causal Learning

The second part of the talk covers Scholkopf’s work on causal modeling.

He describes causality, graphical models of causality and how one may infer a causal model from data.

Specifically, he touches on two new approaches to the problem of inferring a causal model:

  • Separating out the cause from the mechanism (independence of noise and functions)
  • Restricting the functional model
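A rough sketch of how these two ideas can combine is the additive noise model, where the effect is assumed to be a function of the cause plus independent noise. The code below is my own toy illustration, not the method as presented in the talk. It fits a regression in each direction and checks how strongly the residuals depend on the input. The synthetic data, the polynomial regressor, the HSIC dependence score with fixed kernel bandwidth, and the names `hsic` and `residual_dependence` are all assumptions.

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC dependence estimate with Gaussian kernels (bandwidth 1)
    on standardized inputs. Larger values suggest stronger dependence."""
    def gram(v):
        v = (v - v.mean()) / v.std()
        return np.exp(-0.5 * (v[:, None] - v[None, :]) ** 2)
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(gram(a) @ H @ gram(b) @ H) / n**2

def residual_dependence(cause, effect, degree=5):
    """Fit effect = f(cause) + residual with a polynomial f, then measure
    how strongly the residual depends on the hypothesized cause."""
    z = (cause - cause.mean()) / cause.std()  # standardize for stability
    coeffs = np.polyfit(z, effect, degree)
    residuals = effect - np.polyval(coeffs, z)
    return hsic(cause, residuals)

# Synthetic ground truth (an assumption): X causes Y via Y = X^3 + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=300)
y = x**3 + rng.uniform(-1.0, 1.0, size=300)

dep_forward = residual_dependence(x, y)   # hypothesis: X -> Y
dep_backward = residual_dependence(y, x)  # hypothesis: Y -> X

print(f"residual dependence, X -> Y: {dep_forward:.5f}")
print(f"residual dependence, Y -> X: {dep_backward:.5f}")
print("inferred direction:", "X -> Y" if dep_forward < dep_backward else "Y -> X")
```

The direction whose residuals look more independent of the input is preferred. A real analysis would use a proper independence test and a more flexible regressor, so treat the score comparison here as illustrative only.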

The most interesting part of this discussion for me was when he touched on his work on viewing semi-supervised learning through the lens of a causal model. This was drawn from his 2012 paper “On causal and anticausal learning”.

He describes two examples:

  • Example 1: Predicting proteins from mRNA sequences. Here X (mRNA) causes Y (protein) and it is a causal problem.
  • Example 2: Predicting class membership from a handwritten digit. Here X (class membership) causes Y (handwritten digit) and it is an anti-causal problem.

The key finding is that modeling P(X) with extra unlabeled data does not help in the first problem, because the distribution of the cause, P(X), is assumed to be independent of the mechanism P(Y|X). In the second problem, modeling P(Y) is helpful because the distribution of the effect, P(Y), is not independent of P(X|Y).

Problems like those in example 2 (predicting the cause X from the effect Y) will benefit from semi-supervised learning techniques. I’m surprised that this finding isn’t talked about more often; perhaps it’s obvious to those deeper in the field.
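To see why the anticausal case can benefit, here is a small sketch on made-up data. It is my own toy setup, not an experiment from the paper. A class label X (the cause) generates a one-dimensional feature Y (the effect), so P(Y) is a two-cluster mixture whose shape reveals where the class boundary should be. The cluster-then-label approach, the Gaussian data and the names `sample`, `two_means` and `accuracy` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Anticausal toy data: class X (cause) generates feature Y (effect)."""
    x = rng.integers(0, 2, size=n)
    y = rng.normal(loc=4.0 * x - 2.0, scale=1.0)  # class means at -2 and +2
    return x, y

def two_means(y, iters=50):
    """Plain 1-D k-means with k=2, initialized at the min and max."""
    centers = np.array([y.min(), y.max()])
    for _ in range(iters):
        assign = np.abs(y[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([y[assign == k].mean() for k in range(2)])
    return np.sort(centers)

def accuracy(pred, truth):
    return float(np.mean(pred == truth))

# A handful of labeled examples (5 per class) and many unlabeled ones.
x_lab = np.array([0] * 5 + [1] * 5)
y_lab = rng.normal(loc=4.0 * x_lab - 2.0, scale=1.0)
_, y_unl = sample(2000)          # labels discarded: only P(Y) is observed
x_test, y_test = sample(2000)

# Supervised baseline: assign to the nearest labeled class mean.
lab_means = np.array([y_lab[x_lab == k].mean() for k in (0, 1)])
sup_pred = np.abs(y_test[:, None] - lab_means[None, :]).argmin(axis=1)
sup_acc = accuracy(sup_pred, x_test)

# Semi-supervised: learn the clusters of P(Y) from unlabeled data, then use
# the labeled points only to decide which cluster is which class.
centers = two_means(y_unl)
class_order = np.argsort(lab_means)  # class with the lower labeled mean first
nearest = np.abs(y_test[:, None] - centers[None, :]).argmin(axis=1)
semi_pred = class_order[nearest]
semi_acc = accuracy(semi_pred, x_test)

print(f"supervised only : {sup_acc:.3f}")
print(f"semi-supervised : {semi_acc:.3f}")
```

In the causal direction there is no comparable shortcut: if the input distribution is independent of the mechanism, knowing it better tells you nothing about how inputs map to outputs.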

Summary

It’s a great video, and I’m sure it will get you excited about two important areas of machine learning.
