Better Naive Bayes: 12 Tips To Get The Most From The Naive Bayes Algorithm

Naive Bayes is a simple and powerful technique that you should be testing and using on your classification problems.

It is simple to understand, gives good results, and is fast both to train and to make predictions. For these reasons alone you should take a closer look at the algorithm.

In this post you will learn tips and tricks to get the most from the Naive Bayes algorithm.

[Image: Better Naive Bayes. Photo by Duncan Hull, some rights reserved.]

1. Missing Data

Naive Bayes can handle missing data.

Attributes are handled separately by the algorithm at both model construction time and prediction time.

As such, if a data instance has a missing value for an attribute, it can be ignored while preparing the model, and ignored when a probability is calculated for a class value.
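As a minimal sketch of the prediction side, assuming the per-class probabilities have already been estimated, a missing value can simply contribute no factor to the class score. The tables, attribute names and numbers below are hypothetical; the same skip would apply when counting values during training.

```python
# Hypothetical priors and per-class likelihood tables for two categorical
# attributes; every name and number here is an illustrative assumption.
PRIORS = {"yes": 0.6, "no": 0.4}
LIKELIHOODS = {
    "yes": {
        "outlook": {"sunny": 0.2, "rainy": 0.8},
        "windy": {"true": 0.3, "false": 0.7},
    },
    "no": {
        "outlook": {"sunny": 0.7, "rainy": 0.3},
        "windy": {"true": 0.6, "false": 0.4},
    },
}


def class_scores(instance):
    """Return an unnormalized score per class, skipping missing (None) values."""
    scores = {}
    for label, prior in PRIORS.items():
        score = prior
        for attribute, value in instance.items():
            if value is None:
                continue  # a missing attribute simply contributes no factor
            score *= LIKELIHOODS[label][attribute][value]
        scores[label] = score
    return scores


def predict(instance):
    """Pick the class with the largest score."""
    scores = class_scores(instance)
    return max(scores, key=scores.get)


# "windy" is missing, so only the prior and "outlook" are used.
print(predict({"outlook": "sunny", "windy": None}))
```

If every attribute is missing, the prediction falls back to the class with the largest prior.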

2. Use Log Probabilities

Probabilities are often small numbers. To calculate joint probabilities, you need to multiply probabilities together. When you multiply one small number by another small number, you get a very small number.

It is possible to get into difficulty with the precision of your floating point values, such as underflow. To avoid this problem, work in log probability space (take the logarithm of your probabilities).

This works because a Naive Bayes prediction only needs to know which class has the largest probability (the ranking), not the exact probability values. The logarithm is monotonic, so it preserves that ranking, and it turns the product of probabilities into a sum of log probabilities.
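The effect is easy to reproduce. The sketch below uses made-up probabilities (400 attributes per class is an arbitrary assumption) to show the plain product collapsing to zero for both classes while the sum of logs still ranks them.

```python
import math

# Illustrative per-attribute probabilities for two classes.
PROBS_A = [1e-3] * 400
PROBS_B = [2e-3] * 400


def product(probabilities):
    """Multiply probabilities directly; prone to floating point underflow."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def log_score(probabilities):
    """Sum log probabilities instead of multiplying the raw values."""
    return sum(math.log(p) for p in probabilities)


# Both direct products underflow to 0.0, so the classes cannot be ranked.
print(product(PROBS_A), product(PROBS_B))

# The log scores stay finite and class B still ranks above class A.
print(log_score(PROBS_A) < log_score(PROBS_B))
```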

3. Use Other Distributions

To use Naive Bayes with categorical attributes, you calculate the frequency of each observed value within each class.

To use Naive Bayes with real-valued attributes, you can summarize the density of the attribute using a Gaussian distribution. Alternatively you can use another functional form that better describes the distribution of the data, such as an exponential.

Don’t constrain yourself to the distributions used in examples of the Naive Bayes algorithm. Choose distributions that best characterize your data and prediction problem.
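One way to choose between candidate distributions is to fit each to the attribute and compare log-likelihoods. The sketch below assumes a synthetic, strongly skewed attribute (built from exponential quantiles purely for illustration) and uses maximum likelihood fits; the function names are hypothetical.

```python
import math
import statistics

# Synthetic skewed attribute values (quantiles of a unit exponential).
WAITS = [-math.log(1 - (i + 0.5) / 50) for i in range(50)]


def gaussian_log_likelihood(values):
    """Log-likelihood under a Gaussian fitted by maximum likelihood."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return sum(
        -0.5 * math.log(2 * math.pi * stdev**2) - (x - mean) ** 2 / (2 * stdev**2)
        for x in values
    )


def exponential_log_likelihood(values):
    """Log-likelihood under an exponential fitted by maximum likelihood."""
    rate = 1.0 / statistics.fmean(values)
    return sum(math.log(rate) - rate * x for x in values)


# The higher log-likelihood indicates the better-fitting functional form.
print(gaussian_log_likelihood(WAITS), exponential_log_likelihood(WAITS))
```

In practice the comparison would be made per class and ideally on held-out data rather than on the values used for fitting.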

4. Use Probabilities For Feature Selection

Feature selection is the selection of those data attributes that best characterize a predicted variable.

In Naive Bayes, the probabilities for each attribute are calculated independently from the training dataset. You can use a search algorithm to explore the combination of the probabilities of different attributes together and evaluate their performance at predicting the output variable.
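Because the per-attribute probabilities are computed once, a search only has to recombine them. The sketch below is one possible setup, not a prescribed method: it assumes binary categorical attributes, add-one smoothing, a tiny hypothetical dataset, and an exhaustive search scored on a separate validation set.

```python
from collections import Counter, defaultdict
from itertools import combinations


def train(rows, labels):
    """Count class frequencies and per-attribute value frequencies once."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            value_counts[(index, label)][value] += 1
    return class_counts, value_counts


def predict(model, row, subset):
    """Score each class using only the attributes listed in `subset`."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total
        for index in subset:
            # Add-one smoothing for binary attributes (an assumption).
            score *= (value_counts[(index, label)][row[index]] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows, labels, subset):
    hits = sum(predict(model, row, subset) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def best_subset(model, rows, labels, n_attributes):
    """Exhaustively try every non-empty attribute subset; keep the most accurate."""
    candidates = [
        subset
        for size in range(1, n_attributes + 1)
        for subset in combinations(range(n_attributes), size)
    ]
    return max(candidates, key=lambda subset: accuracy(model, rows, labels, subset))


# Hypothetical data: attribute 0 tracks the label, attribute 1 is mostly noise.
TRAIN_ROWS = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y"), ("b", "y"), ("b", "x")]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]
VALID_ROWS = [("a", "y"), ("b", "x"), ("a", "x"), ("b", "y")]
VALID_LABELS = [1, 0, 1, 0]

MODEL = train(TRAIN_ROWS, TRAIN_LABELS)
print(best_subset(MODEL, VALID_ROWS, VALID_LABELS, n_attributes=2))
```

Exhaustive search grows exponentially with the number of attributes, so a greedy forward or backward search would be the usual substitute on wider datasets.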

5. Segment The Data

Is there a well-defined subset of your data that responds well to the Naive Bayes probabilistic approach?

Identifying and separating out segments that are easily handled by a simple probabilistic approach like Naive Bayes can give you increased performance and let you focus on the elements of the problem that are more difficult to model.

Explore different subsets, such as the average or popular cases that are very likely handled well by Naive Bayes.

6. Re-compute Probabilities

Calculating the probabilities for each attribute is very fast.

This benefit of Naive Bayes means that you can re-calculate the probabilities as the data changes. This may be monthly, daily, even hourly.

This is something that may be unthinkable for other algorithms, but should be tested when using Naive Bayes if there is some temporal drift in the problem being modeled.
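For categorical attributes, one cheap way to do this is to keep the model as raw counts and derive probabilities on demand, so each new batch only adds to the counts. The class below is a hypothetical sketch of that idea; the attribute names and batches are made up.

```python
from collections import Counter, defaultdict


class CountingNaiveBayes:
    """Categorical Naive Bayes stored as raw counts so updates are cheap."""

    def __init__(self):
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (attribute, label) -> value counts

    def update(self, rows, labels):
        """Fold a new batch of labeled rows into the existing counts."""
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for attribute, value in row.items():
                self.value_counts[(attribute, label)][value] += 1

    def likelihood(self, attribute, value, label):
        """P(attribute = value | label), computed from the current counts."""
        return self.value_counts[(attribute, label)][value] / self.class_counts[label]


model = CountingNaiveBayes()

# First batch: every churned customer was on the free plan.
model.update([{"plan": "free"}, {"plan": "free"}], ["churn", "churn"])
before = model.likelihood("plan", "free", "churn")

# Later batch: churn now also appears on the pro plan.
model.update([{"plan": "pro"}, {"plan": "pro"}], ["churn", "churn"])
after = model.likelihood("plan", "free", "churn")

print(before, after)
```

This version accumulates all history. If the drift is strong, recomputing the counts from a recent window of data only may track the change better.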

7. Use as a Generative Model

The Naive Bayes method characterizes the problem, which in turn can be used for making predictions about unseen data.

This probabilistic characterization can also be used to generate instances of the problem.

In the case of a numeric vector, the probability distributions can be sampled to create new fictitious vectors.

In the case of text (a very popular application of Naive Bayes), the model can be used to create fictitious input documents.

How might this be useful in your problem?

At the very least you can use the generative approach to help provide context for what the model has characterized.
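For numeric attributes, a minimal sketch of this generative use is to fit one Gaussian per attribute for a class and then sample each attribute independently. The rows, attribute meanings and function names below are illustrative assumptions.

```python
import random
import statistics

# Illustrative training rows for a single class: (height_cm, weight_kg).
ROWS = [(170.0, 65.0), (175.0, 72.0), (168.0, 60.0), (180.0, 80.0), (172.0, 68.0)]


def fit_gaussians(rows):
    """Fit one independent Gaussian (mean, standard deviation) per attribute."""
    return [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*rows)]


def sample_instance(params, rng):
    """Draw each attribute independently, as the naive assumption implies."""
    return [rng.gauss(mean, stdev) for mean, stdev in params]


params = fit_gaussians(ROWS)
rng = random.Random(0)  # fixed seed so the run is repeatable

# Three fictitious instances for this class.
fictitious = [sample_instance(params, rng) for _ in range(3)]
for instance in fictitious:
    print(instance)
```

Because each attribute is sampled on its own, the generated vectors will not reproduce any correlation between attributes that exists in the real data, which is itself a useful view of what the model has and has not captured.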

8. Remove Redundant Features

The performance of Naive Bayes can degrade if the data contains highly correlated features.

This is because highly correlated features are effectively counted twice in the model, over-inflating their importance.

Evaluate the correlation of attributes pairwise with each other using a correlation matrix and remove those features that are the most highly correlated.

Nevertheless, always test your problem before and after such a change and stick with the form of the problem that leads to the better results.
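A pairwise check can be sketched with a plain Pearson correlation over every pair of numeric attributes. The columns below are hypothetical, and the 0.9 threshold is an arbitrary assumption to tune for your problem.

```python
import math
from itertools import combinations

# Hypothetical numeric attributes; height is recorded twice in different units.
COLUMNS = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [150.0 / 2.54, 160.0 / 2.54, 170.0 / 2.54, 180.0 / 2.54, 190.0 / 2.54],
    "weight_kg": [70.0, 55.0, 80.0, 60.0, 75.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def correlated_pairs(columns, threshold=0.9):
    """List attribute pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Candidates for removal: keep one attribute from each flagged pair.
print(correlated_pairs(COLUMNS))
```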

9. Parallelize Probability Calculation

The probabilities for each attribute are calculated independently. This is the independence assumption in the approach and the reason it has the name “naive”.

You can exploit this assumption to further speed up the execution of the algorithm by calculating attribute probabilities in parallel.

Depending on the size of the dataset and your resources, you could do this using different CPUs, different machines or different clusters.
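The structure of such a computation can be sketched with Python's standard `concurrent.futures` module, submitting one task per attribute. The columns are hypothetical and stand for the rows of a single class. A thread pool is used here only to keep the example short; for CPU-bound counting in CPython, a process pool or a distributed framework would be the likelier choice for a real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical categorical columns for the rows of one class.
COLUMNS = {
    "outlook": ["sunny", "sunny", "rainy", "sunny"],
    "windy": ["true", "false", "false", "false"],
}


def value_frequencies(column):
    """Relative frequency of each value in one attribute column."""
    counts = Counter(column)
    total = len(column)
    return {value: count / total for value, count in counts.items()}


def fit_columns(columns, max_workers=4):
    """Compute each attribute's frequencies as an independent task."""
    names = list(columns)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(value_frequencies, (columns[name] for name in names))
        return dict(zip(names, results))


print(fit_columns(COLUMNS))
```

No task reads or writes another attribute's data, which is what makes the work trivially divisible.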

10. Less Data Than You Think

Naive Bayes does not need a lot of data to perform well.

It needs enough data to understand the probabilistic relationship of each attribute in isolation with the output variable.

Given that interactions between attributes are ignored in the model, we do not need examples of these interactions and therefore generally need less data than other algorithms, such as logistic regression.

Further, it is less likely to overfit the training data with a smaller sample size.

Try Naive Bayes if you do not have much training data.

11. Zero Observations Problem

Naive Bayes will not be reliable if there are significant differences in the attribute distributions compared to the training dataset.

An important example of this is the case where a categorical attribute has a value that was not observed in training. In this case, the model will assign a probability of 0 to that value, which zeroes out the whole product for the class and leaves the model unable to make a meaningful prediction.

These cases should be checked for and handled differently. After such cases have been resolved (an answer is known), the probabilities should be recalculated and the model updated.
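One common way to handle such cases, which the post itself does not name, is additive (Laplace) smoothing: a small pseudo-count is added to every possible value so that no probability is exactly zero. The sketch below assumes the number of distinct values for the attribute is known; the counts and names are illustrative.

```python
from collections import Counter

# Training counts of a "color" attribute for one class; "green" was never seen.
COLOR_COUNTS = Counter({"red": 3, "blue": 1})
N_COLORS = 3  # assumed size of the attribute's domain: red, blue, green


def is_unseen(counts, value):
    """Flag values that never occurred in training for this class."""
    return counts[value] == 0


def smoothed_likelihood(counts, value, n_values, alpha=1.0):
    """Additive smoothing: add `alpha` pseudo-observations to every value."""
    total = sum(counts.values())
    return (counts[value] + alpha) / (total + alpha * n_values)


# With alpha=0 this is the plain frequency, which is 0 for an unseen value.
print(smoothed_likelihood(COLOR_COUNTS, "green", N_COLORS, alpha=0.0))

# With alpha=1 the unseen value gets a small but non-zero probability.
print(smoothed_likelihood(COLOR_COUNTS, "green", N_COLORS))
```

The `is_unseen` check can also be used to route such instances for separate handling, with the counts updated once the true answer is known.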

12. It Works Anyway

An interesting point about Naive Bayes is that even when the independence assumption is violated and there are clear known relationships between attributes, it works anyway.

Importantly, this is one of the reasons why you need to spot check a variety of algorithms on a given problem, because the results can very likely surprise you.

Summary

In this post you learned a lot about how to use and get more out of the Naive Bayes algorithm.

Do you have some tricks and tips for using Naive Bayes not covered in this post? Leave a comment.

