27 Sep 2018 R Logistic Regression Study Notes

Logistic regression offers very good model interpretability.
Below are the model interpretation steps I worked out on the adult dataset.

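## Step 1: load data
# Forward slashes avoid backslash-escape errors in Windows paths.
experiment_data <- read.table("C:/Users/data/adult.txt", sep = ",", header = TRUE)
colnames(experiment_data) <- c("age30", "age60", "private", "self_emp", "gov", "edu12", "edu9", "prof", "white", "male", "hours50", "hours30", "us", "outcome")
M <- ncol(experiment_data)      # number of columns (13 predictors + 1 response)
Y <- experiment_data[, 14]      # response: the outcome column
obsX <- experiment_data[, -14]  # predictors: every column except outcome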
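To try the later steps without the original file, a stand-in data frame can be simulated. This is only a sketch: it assumes every column is a 0/1 indicator (which the column names suggest but the post does not state), and the names `sim_data`, `sim_Y`, and `sim_X` are hypothetical.

```r
# Hypothetical stand-in for adult.txt: simulated 0/1 indicators, not the real data.
set.seed(1)
n_obs <- 500
col_names <- c("age30", "age60", "private", "self_emp", "gov", "edu12", "edu9",
               "prof", "white", "male", "hours50", "hours30", "us", "outcome")
sim_data <- as.data.frame(
  matrix(rbinom(n_obs * length(col_names), size = 1, prob = 0.4),
         nrow = n_obs, dimnames = list(NULL, col_names))
)

# Same split as in step 1: response in column 14, predictors in the rest.
sim_Y <- sim_data[, 14]
sim_X <- sim_data[, -14]

str(sim_data)   # structure of the simulated columns
table(sim_Y)    # class balance of the simulated response
```

Because the values are random, any model fitted to `sim_data` only exercises the code path and says nothing about the adult dataset.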
############################################### Algorithm One ###################################################

# Aim: interpret the model

# Logistic regression model

## Step 2: fit the model
# outcome is column 14 of experiment_data, the same values as Y.
glm.fit <- glm(outcome ~ age30 + edu12 + edu9 + prof + white + male + hours50 + hours30,
               data = experiment_data, family = binomial(link = "logit"))

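# Analyze the results of the returned model

summary(glm.fit)
glm.probs <- predict(glm.fit, type = "response")  # fitted probabilities
glm.pred <- ifelse(glm.probs > 0.4, "1", "0")     # classify with a 0.4 cutoff
# type = "response" is already on the probability scale, so the inverse logit
# is applied to the linear predictor (type = "link"), not to glm.probs.
glm.link <- predict(glm.fit, type = "link")
p <- exp(glm.link) / (1 + exp(glm.link))          # equals glm.probs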
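The relationship between the two prediction scales can be checked on a small example. The sketch below uses simulated data (the variables `x`, `y`, `fit`, `eta`, and `prob` are illustrative, not from the post) and shows that the inverse logit of the linear predictor reproduces the fitted probabilities.

```r
set.seed(2)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

eta  <- predict(fit, type = "link")      # linear predictor (log-odds)
prob <- predict(fit, type = "response")  # fitted probabilities

# The inverse logit of the linear predictor reproduces the fitted probabilities.
all.equal(exp(eta) / (1 + exp(eta)), prob)
all.equal(plogis(eta), prob)             # plogis() is the built-in inverse logit
```

Applying the inverse logit to `prob` a second time would squeeze every value into the range 0.5 to about 0.73, which is why the transformation belongs on the link scale.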

# Calculate the in-sample misclassification rate.
mean(ifelse(fitted(glm.fit) < 0.4, 0, 1) != Y)
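The cutoff of 0.4 is one choice among many, and the error rate depends on it. As a rough sketch on simulated data, the hypothetical helper `misclass_rate()` below wraps the same calculation so that several cutoffs can be compared side by side.

```r
set.seed(3)
x <- rnorm(300)
y <- rbinom(300, size = 1, prob = plogis(-1 + 1.5 * x))
fit <- glm(y ~ x, family = binomial)

# In-sample misclassification rate for a given probability cutoff.
# Probabilities at or above the cutoff are classified as 1.
misclass_rate <- function(fit, y, cutoff) {
  mean(as.integer(fitted(fit) >= cutoff) != y)
}

cutoffs <- c(0.3, 0.4, 0.5, 0.6)
rates <- sapply(cutoffs, function(k) misclass_rate(fit, y, k))
data.frame(cutoff = cutoffs, misclassification = rates)
```

These rates are computed on the training data, so they tend to be optimistic compared with the error on new observations.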
# Step 3: stepwise regression
glm.fit1 <- step(glm.fit)  # stepwise selection by AIC
summary(glm.fit1)          # summary of the selected model
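By default `step()` compares candidate models by AIC and, when no scope is given, searches backward from the supplied model. The following sketch on simulated data (the data frame `d` and its columns are made up for illustration) compares the AIC of the full and the selected model.

```r
set.seed(4)
d <- data.frame(x1 = rnorm(400), x2 = rnorm(400), noise = rnorm(400))
d$y <- rbinom(400, size = 1, prob = plogis(0.3 + 1.0 * d$x1 - 0.8 * d$x2))

full_fit <- glm(y ~ x1 + x2 + noise, data = d, family = binomial)
step_fit <- step(full_fit, trace = 0)   # trace = 0 suppresses the search log

formula(step_fit)                                   # terms kept by the search
c(full = AIC(full_fit), stepwise = AIC(step_fit))   # the selected model never has a higher AIC
```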

# Step 4: interpret the model
exp(glm.fit1$coefficients) # relates the odds ratios to x
# ex… coefficients[] # x value at which pi equals 0.5
ratio05 <- glm.fit1$coefficients[] * 0.25 # rate of change of pi with respect to x at pi = 0.5
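For a model with a single predictor, logit(pi) = b0 + b1 * x, so pi equals 0.5 at x = -b0 / b1, and the slope of pi there is b1 * pi * (1 - pi) = b1 / 4. The sketch below illustrates this on simulated data; the single-predictor setup and the names `x_half` and `slope_half` are assumptions made for the example, not the model from the post.

```r
set.seed(5)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = plogis(0.4 + 1.5 * x))
fit <- glm(y ~ x, family = binomial)
b <- coef(fit)

exp(b)   # exp(slope): odds ratio per unit of x; exp(intercept): odds at x = 0
x_half <- unname(-b[1] / b[2])       # x at which the fitted probability is 0.5
slope_half <- unname(b[2] * 0.25)    # d(pi)/dx at pi = 0.5, since pi * (1 - pi) = 0.25

# Check: the fitted probability at x_half is 0.5.
predict(fit, newdata = data.frame(x = x_half), type = "response")
```

With several predictors, the same reading holds for one variable at a time while the others are held fixed.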

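# Step 5: goodness of fit and diagnostics
# Cox-Snell goodness of fit
R2cox <- 1 - exp((glm.fit1$deviance - glm.fit1$null.deviance) / length(Y))
cat("Cox-Snell R2=", R2cox, "\n")
R2nag <- R2cox / (1 - exp((-glm.fit1$null.deviance) / length(Y)))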
# 5.3 Compute the Nagelkerke goodness of fit
cat("Nagelkerke R2=", R2nag, "\n")
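Both pseudo R2 measures can be computed from the residual and null deviance of any binary-response `glm`. A minimal sketch, assuming ungrouped 0/1 data so that the deviance equals minus twice the log-likelihood; the helper `pseudo_r2()` is hypothetical and the data are simulated.

```r
set.seed(6)
x <- rnorm(400)
y <- rbinom(400, size = 1, prob = plogis(-0.2 + 1.0 * x))
fit <- glm(y ~ x, family = binomial)

pseudo_r2 <- function(fit) {
  n <- length(fit$y)                                    # observations used in the fit
  cox <- 1 - exp((fit$deviance - fit$null.deviance) / n)
  max_cox <- 1 - exp(-fit$null.deviance / n)            # upper bound of Cox-Snell R2
  c(cox_snell = cox, nagelkerke = cox / max_cox)
}

pseudo_r2(fit)
```

Nagelkerke's version rescales Cox-Snell by its maximum attainable value, so it can reach 1 while Cox-Snell cannot.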
# 5.4 Residual analysis
plot(residuals(glm.fit1))
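For a `glm` object, `residuals()` returns deviance residuals by default; Pearson residuals are available through the `type` argument. The sketch below, on simulated data, plots deviance residuals against fitted probabilities rather than against the observation index, which is one common alternative view.

```r
set.seed(7)
x <- rnorm(300)
y <- rbinom(300, size = 1, prob = plogis(0.5 * x))
fit <- glm(y ~ x, family = binomial)

dev_res  <- residuals(fit, type = "deviance")   # the default for glm objects
pear_res <- residuals(fit, type = "pearson")

# Squared deviance residuals add up to the residual deviance of the model.
all.equal(sum(dev_res^2), deviance(fit))

plot(fitted(fit), dev_res, xlab = "Fitted probability", ylab = "Deviance residual")
abline(h = 0, lty = 2)
```

With a 0/1 response the points fall on two curves, one per class, so the plot is mostly useful for spotting isolated extreme values.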
# 5.5 Outlier diagnostics
library(car)
influencePlot(glm.fit1)
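`influencePlot()` draws studentized residuals against hat values, with circle areas proportional to Cook's distance. The same three quantities are available in base R, so a tabular version can be sketched without the car package. The data are simulated and `diag_tab` is a hypothetical name.

```r
set.seed(8)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(0.8 * x))
fit <- glm(y ~ x, family = binomial)

diag_tab <- data.frame(
  student = rstudent(fit),        # studentized residuals
  hat     = hatvalues(fit),       # leverage
  cook    = cooks.distance(fit)   # overall influence on the fit
)

# Show the five observations with the largest Cook's distance.
head(diag_tab[order(-diag_tab$cook), ], 5)
```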
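# Step 6: classification table

# True Positive (TP): number of positive cases predicted as positive

# True Negative (TN): number of negative cases predicted as negative

# False Positive (FP): number of negative cases predicted as positive → false alarm (Type I error)

# False Negative (FN): number of positive cases predicted as negative → miss (Type II error)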
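The four counts defined above can be tallied directly from two label vectors. The vectors below are made-up toy labels, with 1 taken as the positive class.

```r
# Hypothetical labels: 1 is the positive class, 0 the negative class.
actual    <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 1)
predicted <- c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0)

TP <- sum(actual == 1 & predicted == 1)
TN <- sum(actual == 0 & predicted == 0)
FP <- sum(actual == 0 & predicted == 1)   # false alarm, Type I error
FN <- sum(actual == 1 & predicted == 0)   # miss, Type II error

c(TP = TP, TN = TN, FP = FP, FN = FN)     # TP = 3, TN = 4, FP = 1, FN = 2
```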

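fitt.pi <- fitted(glm.fit1) # same as predict(glm.fit1, type = "response")
ypred <- 1 * (fitt.pi > 0.5) # multiplying by 1 turns the logical vector into 0/1
length(ypred)
# Rows are actual classes, columns are predicted classes; the first row and
# column are the negative class, so TN = n[1,1], FP = n[1,2], FN = n[2,1], TP = n[2,2].
n <- table(experiment_data$outcome, ypred)
Precision <- n[2,2] / sum(n[,2])    # TP / (TP + FP)
recall <- n[2,2] / sum(n[2,])       # TP / (TP + FN)
ACC <- (n[1,1] + n[2,2]) / sum(n)   # (TP + TN) / total
F1 <- 2 * n[2,2] / (2 * n[2,2] + n[1,2] + n[2,1])  # 2TP / (2TP + FP + FN)
specificity <- n[1,1] / sum(n[1,])  # TN / (TN + FP)
Percentage <- c(n[1,1] / sum(n[1,]), n[2,2] / sum(n[2,]))  # share classified correctly within each actual class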
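Indexing a table by position is easy to get wrong, because the row and column order follows the sorted class labels. One way to make the roles explicit is to index by label instead. The sketch below uses made-up toy labels and assumes the classes are coded 0 and 1, with 1 as the positive class.

```r
actual    <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 1)
predicted <- c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0)

# Fixing the levels keeps the table 2 x 2 even if one class is never predicted.
tab <- table(actual = factor(actual, levels = c(0, 1)),
             predicted = factor(predicted, levels = c(0, 1)))

TP <- tab["1", "1"]; TN <- tab["0", "0"]
FP <- tab["0", "1"]; FN <- tab["1", "0"]

metrics <- c(
  precision   = TP / (TP + FP),
  recall      = TP / (TP + FN),
  specificity = TN / (TN + FP),
  accuracy    = (TP + TN) / sum(tab),
  F1          = 2 * TP / (2 * TP + FP + FN)
)
round(metrics, 3)
```

Precision is undefined when nothing is predicted positive (TP + FP = 0), so a real script may want to guard against division by zero.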