Quantile regression, from linear models to trees to deep learning

Suppose a real estate analyst wants to predict home prices from factors like home age and distance from job centers. The typical goal will be generating the best home price point estimate given those factors, where “best” typically refers to the smallest squared deviations between predictions and reality.

But what if they want to predict not just a single estimate, but also the likely range? That range is called a prediction interval, and the general method for producing one is known as quantile regression. In this post I’ll describe how this problem is formalized; how to implement it in six linear, tree-based, and deep learning methods (in Python — here’s the Jupyter notebook); and how they perform against real-world data.

Quantile regression minimizes quantile loss

Just as regressions minimize the squared-error loss function to predict a single point estimate, quantile regressions minimize the quantile loss in predicting a certain quantile. The most popular quantile is the median, or the 50th percentile, in which case the quantile loss is simply the sum of absolute errors. Other quantiles can give the endpoints of a prediction interval; for example, a middle-80-percent range is bounded by the 10th and 90th percentiles. The quantile loss differs depending on the quantile evaluated, such that more negative errors are penalized more for higher quantiles and more positive errors are penalized more for lower quantiles.

Before digging into the formula, suppose we’ve made a prediction for a single point with a true value of zero, and our predictions range from -1 to +1; that is, our errors (defined here as predicted minus true value) also range from -1 to +1. This graph shows how the quantile loss varies with the error, depending on the quantile.

Let’s look at each line separately:

  • The medium blue line shows the median, which is symmetric around zero: the loss is zero when the error is zero (a perfect prediction) and grows equally in either direction. Looks good so far: the median aims to bisect the set of predictions, so we want to weight underestimates and overestimates equally. As we’ll see soon, the quantile loss around the median is half the absolute error, so it’s 0.5 at errors of -1 and +1, and 0 at 0.
  • The light blue line shows the 10th percentile, which assigns a lower loss to negative errors and a higher loss to positive errors. The 10th percentile means we think there’s a 10 percent chance that the true value is below that predicted value, so it makes sense to assign less of a loss to underestimates than to overestimates.
  • The dark blue line shows the 90th percentile, which is the reverse pattern from the 10th percentile.

We can also look at this by quantile for under- and over-estimated predictions. The higher the quantile, the more the quantile loss function penalizes underestimates and the less it penalizes overestimates.

Given this intuition, here’s the quantile loss formula (source):
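Written out, this is the standard pinball loss, consistent with the code below. For quantile q, true value y, and prediction f:

$$
L_q(y, f) =
\begin{cases}
q \, (y - f) & \text{if } y \ge f \\
(1 - q) \, (f - y) & \text{if } y < f
\end{cases}
$$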

And in Python code, where we can replace the branched logic with a maximum statement:

import numpy as np

def quantile_loss(q, y, f):
    # q: Quantile to be evaluated, e.g., 0.5 for median.
    # y: True value.
    # f: Fitted (predicted) value.
    e = y - f
    return np.maximum(q * e, (q - 1) * e)
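As a quick sanity check (not from the original notebook), plugging in the single-point example above — true value 0, predictions of -1 and +1 — reproduces the losses described in the bullets: 0.5 for the median at both ends, and asymmetric losses for the 10th and 90th percentiles.

quantile_loss(0.5, 0, 1)   # 0.5: the median's loss is half the absolute error
quantile_loss(0.5, 0, -1)  # 0.5
quantile_loss(0.1, 0, -1)  # 0.1: the 10th percentile penalizes underestimates lightly...
quantile_loss(0.1, 0, 1)   # 0.9: ...and overestimates heavily
quantile_loss(0.9, 0, -1)  # 0.9: the 90th percentile reverses that pattern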

Next we’ll look at the six methods — OLS, linear quantile regression, random forests, gradient boosting, Keras, and TensorFlow — and see how they work with some real data.

The data

This analysis will use the Boston housing dataset, which contains 506 observations representing towns in the Boston area. It includes 13 features alongside the target, the median value of owner-occupied homes. Because the target is a town-level median, quantile regression here describes the distribution across towns (not individual homes): for example, the 10th-percentile prediction is the value below which we’d expect 10 percent of towns’ median home values to fall.

I train the models on 80 percent and test on the remaining 20 percent. For easier visualization, the first set of models uses a single feature: AGE, the proportion of owner-occupied units built prior to 1940. As we might expect, towns with older homes have lower home values, though the relationship is noisy.

For each method, we’ll predict the 10th, 30th, 50th, 70th, and 90th percentiles on the test set.
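A rough sketch of this setup, not the notebook’s exact code (load_boston has since been removed from scikit-learn 1.2+, so recent versions need another data source, and the random_state here is my placeholder):

import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn >= 1.2
from sklearn.model_selection import train_test_split

QUANTILES = [0.1, 0.3, 0.5, 0.7, 0.9]

boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target, name='MEDV')  # town-level median home value

# First set of models uses the single AGE feature; 80/20 train/test split.
X_train, X_test, train_labels, test_labels = train_test_split(
    X[['AGE']], y, test_size=0.2, random_state=0)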

Ordinary least squares

Although OLS predicts the mean rather than the median, we can still calculate prediction intervals from it based on standard errors and the inverse normal CDF:

import numpy as np
from scipy.stats import norm

def ols_quantile(m, X, q):
    # m: Fitted OLS statsmodels model (results object).
    # X: X matrix.
    # q: Quantile.
    mean_pred = m.predict(X)
    se = np.sqrt(m.scale)
    return mean_pred + norm.ppf(q) * se

This baseline approach produces linear and parallel quantiles centered around the mean (which is predicted as the median). A well-calibrated model will show about 80 percent of the dots between the top (90th percentile) and bottom (10th percentile) lines. Note the dots differ from the first scatter plot, as here we’re showing the test set to evaluate out-of-sample predictions.
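As a concrete sketch of that coverage check, continuing the variable names from the split above (whether the notebook adds a constant term may differ):

import statsmodels.api as sm

X_train_c, X_test_c = sm.add_constant(X_train), sm.add_constant(X_test)
ols = sm.OLS(train_labels, X_train_c).fit()
low = ols_quantile(ols, X_test_c, 0.1)
high = ols_quantile(ols, X_test_c, 0.9)
# Share of test observations inside the middle-80-percent interval; ~0.8 if well calibrated.
coverage = ((test_labels >= low) & (test_labels <= high)).mean()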

Linear quantile regression

Linear models extend beyond the mean to the median and other quantiles. Linear quantile regression predicts a given quantile, relaxing OLS’s parallel-trend assumption while still imposing linearity (under the hood, it’s minimizing quantile loss). This is straightforward with statsmodels:

sm.QuantReg(train_labels, X_train).fit(q=q).predict(X_test)  # Provide q.
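Looping over the five quantiles gives one fitted model and one set of test predictions per quantile (a sketch; quantreg_preds is my name, not necessarily the notebook’s):

import statsmodels.api as sm

quantreg_preds = {
    q: sm.QuantReg(train_labels, X_train).fit(q=q).predict(X_test)
    for q in QUANTILES
}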

Random forests

Our first departure from linear models is random forests, a collection of trees. While this model doesn’t explicitly predict quantiles, we can treat each tree as a possible value, and calculate quantiles using its empirical CDF (Ando Saabas has written more on this):

def rf_quantile(m, X, q):
    # m: sklearn random forests model.
    # X: X matrix.
    # q: Quantile.
    rf_preds = []
    for estimator in m.estimators_:
        rf_preds.append(estimator.predict(X))
    # One row per record.
    rf_preds = np.array(rf_preds).transpose()
    return np.percentile(rf_preds, q * 100, axis=1)
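A hedged usage sketch (hyperparameters such as n_estimators are placeholders here, not necessarily the notebook’s settings):

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, train_labels)
rf_preds = {q: rf_quantile(rf, X_test, q) for q in QUANTILES}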

It goes a bit crazy in this case, suggesting overfitting. Since random forests are more commonly used for high-dimensional datasets, we’ll return to them after adding more features to the model.

Gradient boosting

Another tree-based method is gradient boosting, scikit-learn’s implementation of which supports explicit quantile prediction:

ensemble.GradientBoostingRegressor(loss='quantile', alpha=q)
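A hedged usage sketch, fitting one booster per quantile with default hyperparameters (the notebook’s settings may differ):

from sklearn import ensemble

gbm_preds = {
    q: ensemble.GradientBoostingRegressor(loss='quantile', alpha=q)
        .fit(X_train, train_labels)
        .predict(X_test)
    for q in QUANTILES
}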

While not as jumpy as the random forests, it doesn’t look to do great on the one-feature model either.

Keras (deep learning)

Keras is a user-friendly wrapper for neural network toolkits including TensorFlow. We can use deep neural networks to predict quantiles by passing the quantile loss function. The code is somewhat involved, so check out the Jupyter notebook or read more from Sachin Abeywardana to see how it works.
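To make the mechanics concrete, here’s a minimal sketch of passing a quantile loss to a Keras model; the architecture, optimizer, and epoch count are placeholders of mine rather than the notebook’s configuration:

import tensorflow as tf
from tensorflow import keras

def keras_quantile_loss(q):
    # Returns a loss function for quantile q (the same pinball loss as above),
    # which can be passed to model.compile().
    def loss(y_true, y_pred):
        y_true = tf.cast(tf.reshape(y_true, (-1, 1)), y_pred.dtype)
        e = y_true - y_pred
        return tf.reduce_mean(tf.maximum(q * e, (q - 1) * e))
    return loss

def build_quantile_net(q, n_features=1):
    # Hypothetical architecture: two small ReLU hidden layers and a linear output.
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss=keras_quantile_loss(q))
    return model

# One separately trained network per quantile, e.g.:
# net = build_quantile_net(0.9)
# net.fit(X_train, train_labels, epochs=200, verbose=0)
# keras_preds_90 = net.predict(X_test).ravel()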

Underlying most deep nets are piecewise-linear functions with kinks, a result of rectified linear unit (ReLU) activations, and we can see those kinks here visually: Keras predicts more bunching up of home values for towns with about 70 percent of units built before 1940, while fanning out more at the very low and very high ends of AGE. Based on the fit to the test data, this looks like a good prediction.

TensorFlow

One disadvantage of Keras is that each quantile must be trained separately. To leverage patterns common to the quantiles, we have to go to TensorFlow itself. See the Jupyter notebook and Jacob Zweig’s article to learn more about this.
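The core idea is a single network with one output per quantile and a loss that sums the quantile losses across those outputs, so the hidden layers are shared. Here’s a rough sketch of that idea, written with tf.keras for brevity (the notebook builds this in TensorFlow directly, following Jacob Zweig’s approach; layer sizes and names here are mine):

import tensorflow as tf
from tensorflow import keras

QUANTILES = [0.1, 0.3, 0.5, 0.7, 0.9]

def combined_quantile_loss(quantiles):
    # Sums the pinball losses across all quantiles so they are learned jointly.
    q = tf.constant(quantiles, dtype=tf.float32)  # shape: (n_quantiles,)
    def loss(y_true, y_pred):
        # y_pred has one column per quantile; y_true is broadcast against them.
        y_true = tf.cast(tf.reshape(y_true, (-1, 1)), y_pred.dtype)
        e = y_true - y_pred
        return tf.reduce_mean(tf.maximum(q * e, (q - 1) * e))
    return loss

joint_model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(len(QUANTILES)),  # one output unit per quantile, shared hidden layers
])
joint_model.compile(optimizer='adam', loss=combined_quantile_loss(QUANTILES))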

We can see this co-learning across the quantiles in its predictions, where the model learns a common kink rather than separate ones for each quantile. This looks to be a good Occam-inspired choice.

Which did best?

Eyeballing suggests that deep learning did well, linear models did OK, and tree-based methods did poorly, but can we quantify which is best? Yes we can, using quantile loss over the test set.

Recall that the quantile loss differs depending on the quantile. Since we calculated five quantiles, we have five quantile losses for each observation in the test set. Averaging over all quantile-observations confirms the visual intuition: random forests did worst, while TensorFlow did best.
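In code, that’s just the quantile_loss function from earlier averaged over every quantile-observation pair; a sketch, where preds maps each quantile to one method’s test-set predictions (my naming, not the notebook’s):

def avg_quantile_loss(preds, y_true, quantiles):
    # Mean quantile loss across all quantiles and all test observations.
    return np.mean([quantile_loss(q, y_true, preds[q]) for q in quantiles])

# Example: avg_quantile_loss(rf_preds, test_labels, QUANTILES)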

We can also break this out by quantile, revealing that tree-based methods did especially poorly at the 90th percentile, while deep learning did best at the lower quantiles.

Larger datasets give more opportunity to improve over OLS

So random forests were awful for this one-feature dataset, but that’s not what they’re made for. What happens if we add the other 12 features to the Boston housing model?
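Concretely, that just means splitting on all 13 features instead of AGE alone (continuing the hypothetical variable names from the earlier sketch):

from sklearn.model_selection import train_test_split

# Same 80/20 split, now keeping all 13 features rather than AGE alone.
X_train, X_test, train_labels, test_labels = train_test_split(
    X, y, test_size=0.2, random_state=0)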

The tree-based methods made a comeback, and while OLS improved, the gap between OLS and the other non-tree methods grew.