Evaluation Metrics for Binary Classifiers: Definitions and Code
阿新 • Posted: 2018-12-30
I. Definitions
Before looking at the individual metrics, we first write down the confusion matrix for binary classification.
1. Accuracy

| Actual \ Predicted | Positive | Negative |
| --- | --- | --- |
| Positive | TP | FN |
| Negative | FP | TN |
- Definition: for a given test set, the proportion of samples that the classifier labels correctly.
- Formula: accuracy = (TP + TN) / (TP + FP + FN + TN)
2. Precision
- Definition: among the samples judged to be positive (negative), the proportion that are truly positive (negative).
- Formula: P = TP / (TP + FP) for the positive class; P = TN / (TN + FN) for the negative class
3. Recall
- Definition: among all truly positive (negative) samples, the proportion that are classified correctly.
- Formula: R = TP / (TP + FN) for the positive class; R = TN / (TN + FP) for the negative class
4. F1 score
- Definition: the harmonic mean of precision and recall.
- Formula: F1 = 2 * P * R / (P + R)
5. Macro-averaged metrics
- Definition: given n binary confusion matrices, compute precision and recall on each matrix separately, giving (P1, R1), (P2, R2), ..., (Pn, Rn); averaging these yields the macro precision (macro-P) and macro recall (macro-R), from which the macro F1 (macro-F1) follows.
6. Micro-averaged metrics
- Definition: given n binary confusion matrices, first average TP, FN, FP, and TN across the matrices, then compute precision and recall from those mean counts, giving the micro precision (micro-P) and micro recall (micro-R), from which the micro F1 (micro-F1) follows.
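To make the macro/micro distinction concrete before the full implementation, here is a minimal sketch for the positive class only, with made-up counts for two confusion matrices (all variable names and numbers are my own, not from the article):

```python
import numpy as np

# Made-up counts for two confusion matrices, e.g. from two cross-validation folds.
TP = [8, 1]
FP = [2, 1]
FN = [0, 3]

# Macro: compute precision/recall per matrix first, then average the metrics.
P = [tp / (tp + fp) for tp, fp in zip(TP, FP)]   # [0.8, 0.5]
R = [tp / (tp + fn) for tp, fn in zip(TP, FN)]   # [1.0, 0.25]
macro_P, macro_R = np.mean(P), np.mean(R)
macro_F1 = 2 * macro_P * macro_R / (macro_P + macro_R)

# Micro: average the raw counts first, then compute a single precision/recall.
TPm, FPm, FNm = np.mean(TP), np.mean(FP), np.mean(FN)
micro_P = TPm / (TPm + FPm)
micro_R = TPm / (TPm + FNm)
micro_F1 = 2 * micro_P * micro_R / (micro_P + micro_R)

print(round(macro_F1, 3), round(micro_F1, 3))
```

This prints 0.637 0.75: the two averaging orders give different F1 values, because macro averaging weights each matrix equally while micro averaging weights each counted sample equally.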
II. Python implementation
```python
# -*- coding: utf-8 -*-
import numpy

# true = [group 1, group 2, ..., group N] of ground-truth labels
# predict = [group 1, group 2, ..., group N] of predicted labels
def evaluation(true, predict):
    num = len(true)  # number of groups
    (TP, FP, FN, TN) = ([0] * num for i in range(4))  # initialise the counters
    for m in range(num):
        if len(true[m]) != len(predict[m]):  # mismatched lengths are clearly an error
            print("Ground truth and prediction have different numbers of samples.")
        else:
            for i in range(len(true[m])):  # tally the four outcomes for each group
                if predict[m][i] == 1 and true[m][i] == 1:
                    TP[m] += 1.0
                elif predict[m][i] == 1 and true[m][i] == 0:
                    FP[m] += 1.0
                elif predict[m][i] == 0 and true[m][i] == 1:
                    FN[m] += 1.0
                elif predict[m][i] == 0 and true[m][i] == 0:
                    TN[m] += 1.0

    # macro metrics: compute each group's metrics first, then average them
    (accuracy_macro,
     precision1_macro, precision0_macro,
     recall1_macro, recall0_macro) = ([0] * num for i in range(5))
    for m in range(num):
        accuracy_macro[m] = (TP[m] + TN[m]) / (TP[m] + FP[m] + FN[m] + TN[m])
        # guard against zero denominators
        precision1_macro[m] = TP[m] / (TP[m] + FP[m]) if TP[m] + FP[m] else 0
        precision0_macro[m] = TN[m] / (TN[m] + FN[m]) if TN[m] + FN[m] else 0
        recall1_macro[m] = TP[m] / (TP[m] + FN[m]) if TP[m] + FN[m] else 0
        recall0_macro[m] = TN[m] / (TN[m] + FP[m]) if TN[m] + FP[m] else 0
    macro_accuracy = numpy.mean(accuracy_macro)
    macro_precision1 = numpy.mean(precision1_macro)
    macro_precision0 = numpy.mean(precision0_macro)
    macro_recall1 = numpy.mean(recall1_macro)
    macro_recall0 = numpy.mean(recall0_macro)
    # F1 still follows the usual formula, applied to macro-P and macro-R
    if macro_precision1 + macro_recall1 == 0:
        macro_F1_score1 = 0
    else:
        macro_F1_score1 = 2 * macro_precision1 * macro_recall1 / (macro_precision1 + macro_recall1)
    if macro_precision0 + macro_recall0 == 0:
        macro_F1_score0 = 0
    else:
        macro_F1_score0 = 2 * macro_precision0 * macro_recall0 / (macro_precision0 + macro_recall0)

    # micro metrics: average TP, TN, FP, FN first, then compute metrics from the means
    TPM = numpy.mean(TP)
    TNM = numpy.mean(TN)
    FPM = numpy.mean(FP)
    FNM = numpy.mean(FN)
    micro_accuracy = (TPM + TNM) / (TPM + FPM + FNM + TNM)
    micro_precision1 = TPM / (TPM + FPM) if TPM + FPM else 0
    micro_precision0 = TNM / (TNM + FNM) if TNM + FNM else 0
    micro_recall1 = TPM / (TPM + FNM) if TPM + FNM else 0
    micro_recall0 = TNM / (TNM + FPM) if TNM + FPM else 0
    # F1 again follows the usual formula, applied to micro-P and micro-R
    if micro_precision1 + micro_recall1 == 0:
        micro_F1_score1 = 0
    else:
        micro_F1_score1 = 2 * micro_precision1 * micro_recall1 / (micro_precision1 + micro_recall1)
    if micro_precision0 + micro_recall0 == 0:
        micro_F1_score0 = 0
    else:
        micro_F1_score0 = 2 * micro_precision0 * micro_recall0 / (micro_precision0 + micro_recall0)

    print("*****************************macro*****************************")
    print("accuracy", ":%.3f" % macro_accuracy)
    print("%20s" % 'precision', "%12s" % 'recall', "%12s" % 'F1_score')
    print("%5s" % "0", "%14.3f" % macro_precision0, "%12.3f" % macro_recall0, "%12.3f" % macro_F1_score0)
    print("%5s" % "1", "%14.3f" % macro_precision1, "%12.3f" % macro_recall1, "%12.3f" % macro_F1_score1)
    print("%5s" % "avg", "%14.3f" % ((macro_precision0 + macro_precision1) / 2),
          "%12.3f" % ((macro_recall0 + macro_recall1) / 2),
          "%12.3f" % ((macro_F1_score1 + macro_F1_score0) / 2))
    print("*****************************micro*****************************")
    print("accuracy", ":%.3f" % micro_accuracy)
    print("%20s" % 'precision', "%12s" % 'recall', "%12s" % 'F1_score')
    print("%5s" % "0", "%14.3f" % micro_precision0, "%12.3f" % micro_recall0, "%12.3f" % micro_F1_score0)
    print("%5s" % "1", "%14.3f" % micro_precision1, "%12.3f" % micro_recall1, "%12.3f" % micro_F1_score1)
    print("%5s" % "avg", "%14.3f" % ((micro_precision0 + micro_precision1) / 2),
          "%12.3f" % ((micro_recall0 + micro_recall1) / 2),
          "%12.3f" % ((micro_F1_score0 + micro_F1_score1) / 2))

if __name__ == "__main__":
    # a small example: two groups of labels
    true = [[0, 1, 0, 1, 0], [0, 1, 1, 0]]
    predict = [[0, 1, 1, 1, 0], [0, 1, 0, 1]]
    evaluation(true, predict)
```
III. Output
```
*****************************macro*****************************
accuracy :0.650
precision recall F1_score
0 0.750 0.583 0.656
1 0.583 0.750 0.656
avg 0.667 0.667 0.656
*****************************micro*****************************
accuracy :0.667
precision recall F1_score
0 0.750 0.600 0.667
1 0.600 0.750 0.667
  avg          0.675        0.675        0.667
```
As you can see, the tables make the results quite easy to read.
Note that when cross-validation is not used, true and predict each contain only a single group, and the macro and micro metrics then come out identical.
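A quick way to convince yourself of that last point: with a single group, numpy.mean over a one-element list simply returns the element, so the macro path (per-group metric, then mean) and the micro path (mean of counts, then metric) feed the same numbers through the same formulas. A minimal sketch with made-up counts (variable names are my own):

```python
import numpy as np

TP, FP = [3.0], [1.0]  # a single group of hypothetical counts

# Macro path: per-group precision first, then the mean over one group.
macro_P = np.mean([TP[0] / (TP[0] + FP[0])])

# Micro path: mean of the counts over one group, then precision.
micro_P = np.mean(TP) / (np.mean(TP) + np.mean(FP))

print(macro_P, micro_P)  # both are 0.75
```

The same argument applies to recall, accuracy, and the F1 scores derived from them.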