
Going Beyond Predictions

The predictions you make with a predictive model do not matter; it is the use of those predictions that matters.

Jeremy Howard was the President and Chief Scientist of Kaggle, the competitive machine learning platform. In 2012 he presented at the O’Reilly Strata conference on what he called the Drivetrain Approach for building “data products” that go beyond just predictions.

In this post you will discover Howard’s Drivetrain Approach and how you can use it to structure the development of systems rather than just make predictions.

The Drivetrain Approach (image from O’Reilly, all rights reserved)

Motivating the Approach

Jeremy Howard was a top Kaggle participant before investing in and joining the company. Talks like Getting In Shape For The Sport Of Data Science give deep insight into Howard’s keen ability to dive into data and build effective models.

By the 2012 Strata talk, Howard had been at Kaggle for a year or two and had seen a lot of competitions and a lot of competitive data scientists. You can’t help but think that his pitch for a more rounded methodology was born out of his frustration with the focus on just predictions and their accuracy.

The predictions are the accessible piece and it makes sense that they are the focus of competitions. I see his Drivetrain Approach as him throwing down the gauntlet and challenging the community to strive for more.

The Drivetrain Approach

In the talk he presents a four-step process for his Drivetrain Approach:

  1. Define Objectives: What outcome am I trying to achieve?
  2. Levers: What inputs can we control?
  3. Data: What data can we collect?
  4. Models: How do the levers influence the objectives?

He singles out data collection as its own step because what he is really referring to is the need for causal data, which most organizations do not collect. This data must be gathered by performing a large number of randomized experiments.

This is key. It goes beyond merely A/B testing a new page title; it involves the evaluation of unbiased behaviour, such as the response to randomly selected recommendations.
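As a rough illustration of what collecting causal data can look like, the sketch below randomly decides whether each simulated customer sees a recommendation and then compares purchase rates between the two groups. The function names (`buys`, `run_experiment`) and the purchase probabilities are assumptions made up for this example; they do not come from Howard’s talk.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def buys(recommended):
    # Hypothetical customer response (made-up numbers): showing a
    # recommendation lifts the purchase probability from 0.10 to 0.15.
    p = 0.15 if recommended else 0.10
    return random.random() < p


def run_experiment(n):
    # Randomly assign each customer to "shown" or "hidden". Because the
    # assignment ignores who the customer is, the difference in purchase
    # rates estimates the causal effect of the recommendation.
    shown, hidden = [], []
    for _ in range(n):
        if random.random() < 0.5:
            shown.append(buys(True))
        else:
            hidden.append(buys(False))
    rate_shown = sum(shown) / len(shown)
    rate_hidden = sum(hidden) / len(hidden)
    return rate_shown - rate_hidden


uplift = run_experiment(40000)
print(f"estimated uplift in purchase rate: {uplift:.3f}")
```

The random assignment is the important design choice. If recommendations were only shown to customers already judged likely to buy, the same comparison would mix up the effect of the recommendation with the effect of the targeting.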

The fourth step, Models, is a pipeline comprising the following sub-processes:

  • Objective: What outcome am I trying to achieve?
  • Raw data: Unbiased causal data
  • Modeller: Statistical model of the causal relationships in the data.
  • Simulator: The ability to plug in ad hoc inputs (move the levers) and evaluate the effects on the objective.
  • Optimizer: The search over inputs (lever values) using the simulator toward maximizing (or minimizing) a desired outcome.
  • Actionable outcome: Achieving the objective with the result
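
The pipeline above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Howard’s implementation: the lever is a hypothetical discount level, the objective is expected profit per customer, and the prices, costs and customer response (`PRICE`, `COST`, `TRUE_RESPONSE`) are invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

PRICE, COST = 100.0, 50.0        # hypothetical unit economics
LEVELS = [0.0, 0.1, 0.2, 0.3]    # lever: discount levels we can offer

# Hidden "real world" purchase probability per discount (made up here).
# The modeller never reads this directly; it only sees collected data.
TRUE_RESPONSE = {0.0: 0.05, 0.1: 0.10, 0.2: 0.25, 0.3: 0.28}


def collect(n):
    # Raw data: every customer gets a randomly chosen discount, so the
    # observed outcomes are unbiased with respect to the lever.
    rows = []
    for _ in range(n):
        discount = random.choice(LEVELS)
        bought = random.random() < TRUE_RESPONSE[discount]
        rows.append((discount, bought))
    return rows


def fit(rows):
    # Modeller: estimate P(purchase | discount) as the conversion rate
    # observed at each discount level.
    model = {}
    for level in LEVELS:
        outcomes = [bought for (discount, bought) in rows if discount == level]
        model[level] = sum(outcomes) / len(outcomes)
    return model


def simulate(model, discount):
    # Simulator: plug in a lever value and evaluate the objective,
    # here the expected profit per customer.
    margin = PRICE * (1 - discount) - COST
    return model[discount] * margin


def optimize(model):
    # Optimizer: search the lever values for the best simulated outcome.
    return max(LEVELS, key=lambda discount: simulate(model, discount))


model = fit(collect(20000))
best = optimize(model)
print("discount with the highest simulated profit:", best)
```

Note that the optimizer picks the discount with the highest simulated profit rather than the one with the highest predicted purchase probability, which is the distinction between using predictions and merely making them.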

Case Studies

The approach is a little abstract, and needs clarification with some examples.

In the presentation, Howard uses Google search as an example:

  • Objective: What webpage do you want to read?
  • Levers: The ordering of the sites you could visit on the SERP.
  • Data: The link network between pages.
  • Model: Not discussed, but one would assume the ongoing experimentation on and refinement of the authority indicators for pages.

Extending this example, Google is very likely performing random experiments in the SERP by injecting other results and seeing how users behave. This would permit a predictive model of the likelihood of clicking to be constructed, user clicks to be simulated, and the SERP to be optimized toward the most clickable entries for a given user. I expect an approach just like this is used for Google’s advertising, which would have been a clearer example.

Howard also suggests marketing as an area for improvement. He comments that the objective is the maximization of customer lifetime value (CLTV). Levers include the recommendation of products, offers, discounts and customer care calls. The causal relationships that could be collected as raw data would be the probability of purchase and the probability of a customer liking a product but not knowing about it.

He also gives the example of a prior start-up of his, the Optimal Decisions Group, for maximizing profit in insurance. He touches on the Google Self-Driving Car as another example, which goes beyond merely route finding as in current GPS displays.

I feel like there is greater opportunity to elaborate on these ideas. I think that if the methodology had been presented more clearly, with a step-by-step example, there would have been a greater response to these ideas.

Summary

The notion of going beyond predictions needs to be repeated often. It is easy to get caught up in a given problem. We talk a lot about defining the problem up front as an attempt to reduce such effects.

Howard’s Drivetrain Approach is a tool that you can use to design a system to solve a complex problem that uses machine learning, rather than use machine learning to make predictions and call it a day.

There is a lot of overlap between these ideas and Response Surface Methodology (RSM). Although not explicitly spelled out, the link is hinted at in a related post from around the same time by Irfan Ahmad, his Taxonomy of Predictive Modeling, which sets out to clarify some of Howard’s terms.
