
Apache Spark 2.3 Running on Kubernetes in Practice


    1. Download the source code and extract it
      Download link
      tar -zxvf v2.3.2.tar.gz
    2. Compile
    cd spark-2.3.2
    build/mvn install -DskipTests
    build/mvn compile -Pkubernetes -pl resource-managers/kubernetes/core -am -DskipTests
    build/mvn install -Pkubernetes -pl resource-managers/kubernetes/core -am -DskipTests
    
    [root@compile spark-2.3.2]# ls assembly/target/scala-2.11/jars/ -la|grep spark-kub*
    -rw-r--r-- 1 root root   381120 Sep 26 09:56 spark-kubernetes_2.11-2.3.2.jar
    
    dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes
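
    To confirm that the Kubernetes module also made it into the packaged distribution, a quick check of the dist/ output (paths assumed from the default make-distribution.sh layout):

    # the kubernetes resource-manager jar should appear among the packaged jars
    ls dist/jars/ | grep kubernetes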

    Build a tarball with R and Hive support

    ./dev/make-distribution.sh --name inspur-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pkubernetes

    An error occurs:

    ++ echo 'Cannot find '\''R_HOME'\''. Please specify '\''R_HOME'\'' or make sure R is properly installed.'
    Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed.

    For this run we are only testing Spark running on Kubernetes, so this problem can be left unresolved for now (a sketch of the fix is shown below in case SparkR is needed).
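
    The R_HOME error above simply means R cannot be found on the build host. A minimal sketch of the fix (the package name and R_HOME path are assumptions for a typical CentOS/RHEL host and may differ on yours):

    # install R and tell the build where it lives; adjust the path to your R installation
    yum install -y R
    export R_HOME=/usr/lib64/R
    # then rerun make-distribution.sh with the -Psparkr/--r options as above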

    3. Build the Docker image
      ./bin/docker-image-tool.sh -r bigdata.registry.com:5000 -t 2.3.2 build
      ./bin/docker-image-tool.sh -r bigdata.registry.com:5000 -t 2.3.2 push

      Since a repository named insight has been created in the local private Harbor registry,
      push the image with the following commands:

      docker tag bigdata.registry.com:5000/spark:2.3.2 bigdata.registry.com:5000/insight/spark:2.3.2
      docker push  bigdata.registry.com:5000/insight/spark:2.3.2
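
      A quick sanity check that the retagged image exists locally and can be pulled back from the registry (the pull can also be run from any Kubernetes node that will schedule Spark pods):

      # list the retagged image locally
      docker images bigdata.registry.com:5000/insight/spark
      # pull it back from Harbor to confirm the push and registry access both work
      docker pull bigdata.registry.com:5000/insight/spark:2.3.2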
    4. Upload examples.jar to the httpd service
      [root@compile spark-2.3.2]# ll dist/examples/jars/spark-examples_2.11-2.3.2.jar 
      -rw-r--r-- 1 root root 1997551 Sep 26 09:56 dist/examples/jars/spark-examples_2.11-2.3.2.jar
      [root@compile spark-2.3.2]# cp dist/examples/jars/spark-examples_2.11-2.3.2.jar /opt/mnt/www/html/spark/
      [root@compile spark-2.3.2]# ll /opt/mnt/www/html/spark/
      -rw-r--r-- 1 root root  1997551 Sep 26 10:26 spark-examples_2.11-2.3.2.jar
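
      Before submitting, it is worth confirming that the jar is reachable over HTTP at the URL that will be passed to spark-submit below (a 200 response with the expected Content-Length means httpd is serving the file):

      # HEAD request against the httpd service hosting the examples jar
      curl -sI http://10.221.129.22/spark/spark-examples_2.11-2.3.2.jar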
    5. Prepare the Kubernetes environment, i.e. set up authorization
      kubectl create serviceaccount spark -nspark
      kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark:spark --namespace=spark

      In --serviceaccount=spark:spark, the first spark is the namespace and the second is the service account.
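
      The authorization can be verified before submitting; kubectl should confirm that the spark service account exists and is allowed to manage pods in the spark namespace:

      kubectl get serviceaccount spark -n spark
      # should print "yes" if the edit role was bound correctly
      kubectl auth can-i create pods -n spark --as=system:serviceaccount:spark:spark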

    6. Test
      bin/spark-submit --master k8s://http://10.221.129.20:8080 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=bigdata.registry.com:5000/insight/spark:2.3.2   --conf spark.kubernetes.namespace=spark   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark http://10.221.129.22/spark/spark-examples_2.11-2.3.2.jar

      Run log:

      2018-09-26 10:27:54 WARN Utils:66 - Kubernetes master URL uses HTTP instead of HTTPS.
      2018-09-26 10:28:25 WARN Config:347 - Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
      2018-09-26 10:28:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
      pod name: spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver
      namespace: default
      labels: spark-app-selector -> spark-74d52904a3794e8986895a12322c5cd9, spark-role -> driver
      pod uid: d9bce33c-c133-11e8-b988-fa163e609d06
      creation time: 2018-09-26T02:28:27Z
      service account name: default
      volumes: spark-init-properties, download-jars-volume, download-files-volume, default-token-7mnhw
      node name: N/A
      start time: N/A
      container images: N/A
      phase: Pending
      status: []
      2018-09-26 10:28:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
      pod name: spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver
      namespace: default
      labels: spark-app-selector -> spark-74d52904a3794e8986895a12322c5cd9, spark-role -> driver
      pod uid: d9bce33c-c133-11e8-b988-fa163e609d06
      creation time: 2018-09-26T02:28:27Z
      service account name: default
      volumes: spark-init-properties, download-jars-volume, download-files-volume, default-token-7mnhw
      node name: master2
      start time: N/A
      container images: N/A
      phase: Pending
      status: []
      2018-09-26 10:28:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
      pod name: spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver
      namespace: default
      labels: spark-app-selector -> spark-74d52904a3794e8986895a12322c5cd9, spark-role -> driver
      pod uid: d9bce33c-c133-11e8-b988-fa163e609d06
      creation time: 2018-09-26T02:28:27Z
      service account name: default
      volumes: spark-init-properties, download-jars-volume, download-files-volume, default-token-7mnhw
      node name: master2
      start time: 2018-09-26T02:28:27Z
      container images: bigdata.registry.com:5000/insight/spark:2.3.2
      phase: Pending
      status: [ContainerStatus(containerID=null, image=bigdata.registry.com:5000/insight/spark:2.3.2, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}), additionalProperties={})]
      2018-09-26 10:28:28 INFO Client:54 - Waiting for application spark-pi to finish...
      2018-09-26 10:28:51 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
      pod name: spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver
      namespace: default
      labels: spark-app-selector -> spark-74d52904a3794e8986895a12322c5cd9, spark-role -> driver
      pod uid: d9bce33c-c133-11e8-b988-fa163e609d06
      creation time: 2018-09-26T02:28:27Z
      service account name: default
      volumes: spark-init-properties, download-jars-volume, download-files-volume, default-token-7mnhw
      node name: master2
      start time: 2018-09-26T02:28:27Z
      container images: bigdata.registry.com:5000/insight/spark:2.3.2
      phase: Pending
      status: [ContainerStatus(containerID=null, image=bigdata.registry.com:5000/insight/spark:2.3.2, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}), additionalProperties={})]
      2018-09-26 10:28:56 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
      pod name: spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver
      namespace: default
      labels: spark-app-selector -> spark-74d52904a3794e8986895a12322c5cd9, spark-role -> driver
      pod uid: d9bce33c-c133-11e8-b988-fa163e609d06
      creation time: 2018-09-26T02:28:27Z
      service account name: default
      volumes: spark-init-properties, download-jars-volume, download-files-volume, default-token-7mnhw
      node name: master2
      start time: 2018-09-26T02:28:27Z
      container images: bigdata.registry.com:5000/insight/spark:2.3.2
      phase: Pending
      status: [ContainerStatus(containerID=null, image=bigdata.registry.com:5000/insight/spark:2.3.2, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=PodInitializing, additionalProperties={}), additionalProperties={}), additionalProperties={})]
      2018-09-26 10:28:57 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
      pod name: spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver
      namespace: default
      labels: spark-app-selector -> spark-74d52904a3794e8986895a12322c5cd9, spark-role -> driver
      pod uid: d9bce33c-c133-11e8-b988-fa163e609d06
      creation time: 2018-09-26T02:28:27Z
      service account name: default
      volumes: spark-init-properties, download-jars-volume, download-files-volume, default-token-7mnhw
      node name: master2
      start time: 2018-09-26T02:28:27Z
      container images: bigdata.registry.com:5000/insight/spark:2.3.2
      phase: Running
      status: [ContainerStatus(containerID=docker://3abe8f7ac19d2f52ed3ba84e32e076268ae0dfde83ff0a75b2359924d3bac412, image=bigdata.registry.com:5000/insight/spark:2.3.2, imageID=docker-pullable://bigdata.registry.com:5000/insight/spark@sha256:0bfd1a27778f97a1ec620446b599d9f1fda882e8c3945a04ce8435356a40efe8, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2018-09-26T02:28:57Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
      2018-09-26 10:29:05 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
      pod name: spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver
      namespace: default
      labels: spark-app-selector -> spark-74d52904a3794e8986895a12322c5cd9, spark-role -> driver
      pod uid: d9bce33c-c133-11e8-b988-fa163e609d06
      creation time: 2018-09-26T02:28:27Z
      service account name: default
      volumes: spark-init-properties, download-jars-volume, download-files-volume, default-token-7mnhw
      node name: master2
      start time: 2018-09-26T02:28:27Z
      container images: bigdata.registry.com:5000/insight/spark:2.3.2
      phase: Failed
      status: [ContainerStatus(containerID=docker://3abe8f7ac19d2f52ed3ba84e32e076268ae0dfde83ff0a75b2359924d3bac412, image=bigdata.registry.com:5000/insight/spark:2.3.2, imageID=docker-pullable://bigdata.registry.com:5000/insight/spark@sha256:0bfd1a27778f97a1ec620446b599d9f1fda882e8c3945a04ce8435356a40efe8, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://3abe8f7ac19d2f52ed3ba84e32e076268ae0dfde83ff0a75b2359924d3bac412, exitCode=1, finishedAt=Time(time=2018-09-26T02:29:04Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2018-09-26T02:28:57Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
      2018-09-26 10:29:05 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
      Container name: spark-kubernetes-driver
      Container image: bigdata.registry.com:5000/insight/spark:2.3.2
      Container state: Terminated
      Exit code: 1
      2018-09-26 10:29:05 INFO Client:54 - Application spark-pi finished.
      2018-09-26 10:29:05 INFO ShutdownHookManager:54 - Shutdown hook called
      2018-09-26 10:29:05 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-53c85221-619e-41c6-8b94-80b950852b7e
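
      Note that the final phase above is Failed with exit code 1: the submission reached Kubernetes, but the driver itself exited with an error (the status output also shows the pod landed in the default namespace with the default service account rather than the spark ones configured above). The driver pod's log and events are the place to look next; pod name and namespace are taken from the status output:

      # driver stdout/stderr, which contains the actual Spark error
      kubectl logs spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver -n default
      # scheduling and image-pull events for the same pod
      kubectl describe pod spark-pi-7b0ffe8a4023370a872acdd679f024b1-driver -n default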

    7. References:
    build-spark
    running-on-kubernetes

    Note:
    Submitting a custom Spark job

    bin/spark-submit     --master k8s://http://10.221.129.20:8080     --deploy-mode cluster     --name rule-engine     --class com.inspur.iot.RuleEngine     --conf spark.executor.instances=1     --conf spark.kubernetes.container.image=bigdata.registry.com:5000/insight/spark:2.3.2     --conf spark.kubernetes.namespace=spark     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark     http://10.221.129.22/spark/iot-stream-app-1.3-SNAPSHOT.jar     --base64=true --rule=c2VsZWN0IHRpbWVTdGFtcCBBcyBrZXksIGNvbmNhdF93cygifCIsIHN0YXRlLnJlcG9ydGVkLnRlbXBlcmF0dXJlLCBjbGllbnRUb2tlbikgYXMgdmFsdWUgZnJvbSB0b3BpY3M= --sample='{"timeStamp":1531381822,"clientToken":"clientId_lamp","state":{"reported":{"temperature":23}}}' --source-type=kafka --source='{"kafka.bootstrap.servers":"isa-kafka-svc.spark:9092","subscribe":"sensor"}' --sink-type=console --verbose
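
    The --rule argument here is a base64-encoded SQL statement (hence --base64=true). A small sketch of how such a value can be inspected and produced with standard GNU coreutils; decoding the rule above should yield a select that builds a key from timeStamp and a value from state.reported.temperature and clientToken:

    # decode the rule passed above to see the SQL it carries
    echo 'c2VsZWN0IHRpbWVTdGFtcCBBcyBrZXksIGNvbmNhdF93cygifCIsIHN0YXRlLnJlcG9ydGVkLnRlbXBlcmF0dXJlLCBjbGllbnRUb2tlbikgYXMgdmFsdWUgZnJvbSB0b3BpY3M=' | base64 -d
    # encode a new (illustrative) rule for submission; -w 0 keeps the output on one line
    echo -n 'select timeStamp as key, value from topics' | base64 -w 0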
