
[Spark] Spark Streaming consuming Kafka in real time via the Receiver approach (Yarn-cluster)

1. Start ZooKeeper
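The post does not show the ZooKeeper command itself; assuming a standard ZooKeeper distribution, it is started from the ZooKeeper install directory with:
./bin/zkServer.sh start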
2. Start the Kafka service (broker), from the Kafka install directory (kafka_2.11-0.10.2.1):
./bin/kafka-server-start.sh config/server.properties
3. Start the Kafka producer (prerequisite: the topic has already been created):
./bin/kafka-console-producer.sh --broker-list master:9092 --topic test
4. Start the Kafka consumer:
./bin/kafka-console-consumer.sh --zookeeper master:2181 --topic test --from-beginning
5. Build the jar and upload the jar with dependencies to the cluster (the -jar-with-dependencies suffix used in step 6 comes from the Maven assembly plugin's jar-with-dependencies descriptor):
mvn clean assembly:assembly
6. Write a launch script and start the job: sh run_receiver.sh
/usr/local/src/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
        --class com.skyell.streaming.ReceiverFromKafka \
        --master yarn-cluster \
        --executor-memory 1G \
        --num-executors 2 \
        --files $HIVE_HOME/conf/hive-site.xml \
        ./Spark8Pro-2.0-SNAPSHOT-jar-with-dependencies.jar
(Note: --total-executor-cores, as in the original script, only applies to standalone/Mesos deployments; on YARN, use --num-executors and --executor-cores instead.)
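The source of com.skyell.streaming.ReceiverFromKafka is not included in the post. Below is a minimal sketch of what a Receiver-based consumer could look like, assuming the spark-streaming-kafka-0-8 dependency (the only Kafka integration that provides the Receiver API), the test topic from step 3, and a hypothetical consumer group id:

package com.skyell.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ReceiverFromKafka {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReceiverFromKafka")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver-based stream: a receiver runs inside an executor and
    // consumer offsets are tracked in ZooKeeper under the group id
    val zkQuorum = "master:2181"     // ZooKeeper quorum, as in step 4
    val groupId = "receiver_group"   // hypothetical consumer group id
    val topics = Map("test" -> 1)    // topic -> number of receiver threads

    val stream = KafkaUtils.createStream(ssc, zkQuorum, groupId, topics)

    // each record is a (key, message) pair; print a sample of messages per batch
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}

With receivers, data received but not yet processed can be lost on driver failure; enabling the write-ahead log (spark.streaming.receiver.writeAheadLog.enable) together with checkpointing is the usual mitigation.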
Monitor the job and view logs:

http://master:8088/cluster
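Besides the ResourceManager UI, and assuming YARN log aggregation is enabled, the aggregated container logs can be fetched from the command line:
yarn application -list
yarn logs -applicationId application_1539421032843_0093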

Stop the Spark Streaming job:
yarn application -kill application_1539421032843_0093
