[Spark] Consuming Kafka in real time with Spark Streaming via the Receiver approach (yarn-cluster)
阿新 · Published: 2018-12-01
1. Start ZooKeeper (e.g. the instance bundled with Kafka):
./bin/zookeeper-server-start.sh config/zookeeper.properties
2. Start the Kafka broker:
./bin/kafka-server-start.sh config/server.properties
3. Start a Kafka console producer (prerequisite: the topic has already been created, e.g. with ./bin/kafka-topics.sh --create --zookeeper master:2181 --replication-factor 1 --partitions 1 --topic test):
./bin/kafka-console-producer.sh --broker-list master:9092 --topic test
4. Start a Kafka console consumer:
./bin/kafka-console-consumer.sh --zookeeper master:2181 --topic test --from-beginning
5. Build a jar with dependencies and upload it to the cluster:
mvn clean assembly:assembly
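For `assembly:assembly` to produce the `jar-with-dependencies` artifact used in step 6, the project's pom.xml needs the Maven assembly plugin configured. A typical configuration might look like the following sketch (the plugin version is an assumption; the main class matches the `--class` argument in step 6):

```xml
<!-- Sketch of a maven-assembly-plugin configuration; version is an assumption -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.6</version>
  <configuration>
    <descriptorRefs>
      <!-- produces the *-jar-with-dependencies.jar fat jar -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <mainClass>com.skyell.streaming.ReceiverFromKafka</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
```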
6. Write a launch script and start the job: sh run_receiver.sh
/usr/local/src/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
  --class com.skyell.streaming.ReceiverFromKafka \
  --master yarn-cluster \
  --executor-memory 1G \
  --total-executor-cores 2 \
  --files $HIVE_HOME/conf/hive-site.xml \
  ./Spark8Pro-2.0-SNAPSHOT-jar-with-dependencies.jar
(Note: --total-executor-cores only applies to standalone and Mesos modes; on YARN, use --num-executors and --executor-cores instead.)
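The post does not show the `ReceiverFromKafka` class itself. A minimal sketch of a Receiver-based consumer is given below, assuming Spark 2.0.2 with the `spark-streaming-kafka-0-8` artifact, whose `KafkaUtils.createStream` is the Receiver-based API; the package/class name and the `test` topic come from the post, while the group id, batch interval, and the count-and-print logic are illustrative assumptions:

```scala
// Hypothetical sketch of the Receiver-based Spark Streaming job.
// Only com.skyell.streaming.ReceiverFromKafka, master:2181, and the
// "test" topic come from the post; everything else is an assumption.
package com.skyell.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ReceiverFromKafka {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReceiverFromKafka")
    val ssc = new StreamingContext(conf, Seconds(5)) // assumed 5s batches

    // The Receiver-based API registers the consumer with ZooKeeper,
    // so it takes the ZK quorum rather than the broker list.
    val zkQuorum = "master:2181"
    val groupId  = "receiver-demo"      // assumed consumer group id
    val topics   = Map("test" -> 1)     // topic -> number of receiver threads

    val lines = KafkaUtils.createStream(ssc, zkQuorum, groupId, topics)
      .map(_._2)                        // keep the message value, drop the key

    lines.count().print()               // simple per-batch sanity output

    ssc.start()
    ssc.awaitTermination()
  }
}
```

A design note: the Receiver approach pulls data continuously through a long-running receiver and tracks offsets in ZooKeeper, which is why step 4's console consumer also connects via `--zookeeper`; the alternative direct (receiver-less) API reads offsets from Kafka itself.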
Monitor the job and view its logs in the YARN ResourceManager UI:
http://master:8088/cluster
To stop the Spark Streaming job, kill the YARN application:
yarn application -kill application_1539421032843_0093