
Quick Start with Spark MLlib, Part 5: Gradient-Boosted Tree (GBT) Regression

(1) Description

Gradient-boosted trees (GBTs) are ensembles of decision trees. A GBT trains decision trees iteratively in order to minimize a loss function. Spark supports GBTs for binary classification and for regression, using both continuous and categorical features; the example below uses the RDD-based spark.mllib API.
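For comparison, here is a minimal sketch of the same task in the newer DataFrame-based spark.ml API. This is an illustrative sketch, not part of the original example: the class name is made up, and it assumes Spark 2.x, where the libsvm data source produces the default "label" and "features" columns.

import org.apache.spark.ml.regression.GBTRegressionModel;
import org.apache.spark.ml.regression.GBTRegressor;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GBTRegressorSketch { // hypothetical class name
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("GBTRegressorSketch").master("local").getOrCreate();
    // The libsvm reader yields "label" and "features" columns by default.
    Dataset<Row> data = spark.read().format("libsvm").load("sample_libsvm_data.txt");
    GBTRegressor gbt = new GBTRegressor()
        .setMaxIter(3)   // counterpart of numIterations in the RDD-based example
        .setMaxDepth(5);
    GBTRegressionModel model = gbt.fit(data);
    model.transform(data).select("prediction", "label").show(10);
    spark.stop();
  }
}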

(2) Test Data

1 153:5 154:63 155:197 181:20 182:254 183:230 184:24 209:20 210:254 211:254 212:48 237:20 238:254 239:255 240:48 265:20 266:254 267:254 268:57 293:20 294:254 295:254 296:108 321:16 322:239 323:254 324:143 350:178 351:254 352:143 378:178 379:254 380:143 406:178 407:254 408:162 434:178 435:254 436:240 462:113 463:254 464:240 490:83 491:254 492:245 493:31 518:79 519:254 520:246 521:38 547:214 548:254 549:150 575:144 576:241 577:8 603:144 604:240 605:2 631:144 632:254 633:82 659:230 660:247 661:40 687:168 688:209 689:31
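The line above is in the standard LIBSVM format that MLUtils.loadLibSVMFile expects: a numeric label followed by sparse index:value feature pairs, i.e. label index1:value1 index2:value2 ..., with 1-based feature indices in the file. Here the label is 1, and the non-zero features are values such as 5 at index 153 and 63 at index 154.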

(3) Test Program
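The program loads sample_libsvm_data.txt, splits it 70/30 into training and test sets, trains a GBT regression model with three boosting iterations, computes the mean squared error on the held-out test set, and finally saves the model to disk and loads it back.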

import java.util.HashMap;
import java.util.Map;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.GradientBoostedTrees;
import org.apache.spark.mllib.tree.configuration.BoostingStrategy;
import org.apache.spark.mllib.tree.model.GradientBoostedTreesModel;
import org.apache.spark.mllib.util.MLUtils;

public class JavaGradientBoostedTreesRegressionExample {
  public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf()
        .setAppName("JavaGradientBoostedTreesRegressionExample").setMaster("local");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    // Load and parse the data file (LIBSVM format).
    String datapath = "sample_libsvm_data.txt";
    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(jsc.sc(), datapath).toJavaRDD();
    // Split the data into training and test sets (30% held out for testing).
    JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.7, 0.3});
    JavaRDD<LabeledPoint> trainingData = splits[0];
    JavaRDD<LabeledPoint> testData = splits[1];

    // Train a GradientBoostedTrees model.
    // The defaultParams for Regression use SquaredError by default.
    BoostingStrategy boostingStrategy = BoostingStrategy.defaultParams("Regression");
    boostingStrategy.setNumIterations(3); // Note: use more iterations in practice.
    boostingStrategy.getTreeStrategy().setMaxDepth(5);
    // An empty categoricalFeaturesInfo indicates that all features are continuous.
    Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<Integer, Integer>();
    boostingStrategy.treeStrategy().setCategoricalFeaturesInfo(categoricalFeaturesInfo);

    final GradientBoostedTreesModel model =
        GradientBoostedTrees.train(trainingData, boostingStrategy);

    // Evaluate the model on test instances and compute the test error.
    JavaPairRDD<Double, Double> predictionAndLabel =
        testData.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
          @Override
          public Tuple2<Double, Double> call(LabeledPoint p) {
            return new Tuple2<Double, Double>(model.predict(p.features()), p.label());
          }
        });
    System.out.println(predictionAndLabel.take(10));
    Double testMSE =
        predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() {
          @Override
          public Double call(Tuple2<Double, Double> pl) {
            Double diff = pl._1() - pl._2();
            return diff * diff;
          }
        }).reduce(new Function2<Double, Double, Double>() {
          @Override
          public Double call(Double a, Double b) {
            return a + b;
          }
        }) / testData.count(); // average over the test set, not the full dataset
    System.out.println("Test Mean Squared Error: " + testMSE);
    System.out.println("Learned regression GBT model:\n" + model.toDebugString());

    // Save and load the model.
    model.save(jsc.sc(), "target/tmp/myGradientBoostingRegressionModel");
    GradientBoostedTreesModel sameModel = GradientBoostedTreesModel.load(jsc.sc(),
        "target/tmp/myGradientBoostingRegressionModel");

    jsc.stop();
  }
}
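Two practical notes, drawn from general Spark MLlib behavior rather than from the original post: model.save writes a directory, and it fails if target/tmp/myGradientBoostingRegressionModel is left over from a previous run, so delete it between runs. And as a quick sanity check (also not in the original program), the reloaded model can be compared against the trained one just before jsc.stop():

// Hypothetical check: the reloaded model should predict identically.
LabeledPoint probe = testData.first();
System.out.println("original: " + model.predict(probe.features()));
System.out.println("reloaded: " + sameModel.predict(probe.features()));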

(4) Test Results

[(0.0,0.0), (1.0,1.0), (1.0,1.0), (0.0,0.0), (0.0,0.0), (1.0,1.0), (1.0,1.0), (0.0,0.0), (1.0,1.0), (0.0,0.0)]
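Each pair is (prediction, label) for one of the first ten test instances, as printed by predictionAndLabel.take(10); for the instances shown, the GBT regressor reproduces the 0/1 labels exactly.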