Spark reading from Kafka: offsets in ZK and Kafka become inconsistent

阿新 • Published: 2019-01-21

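In our project we use Spark Streaming to read from Kafka through Kafka's low-level API, so offsets are saved to ZooKeeper (ZK) by hand: the offset in ZK is updated only after a batch has run successfully. If Kafka has a network problem, or a write to ZK does not go through, the offset in ZK and the offset in Kafka can drift apart. When that happens, the offsets in Kafka and in ZK have to be compared.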
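
The "update the offset only after a successful run" rule can be sketched in plain Java. This is a minimal illustration, not the project's code: `OffsetStore`, `InMemoryOffsetStore`, and `BatchRunner` are hypothetical names, and the in-memory map stands in for ZK.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over wherever offsets are persisted (ZK in the post).
interface OffsetStore {
    void save(int partition, long offset);

    Long load(int partition); // null if nothing has been saved yet
}

// In-memory stand-in for ZK, for illustration only.
class InMemoryOffsetStore implements OffsetStore {
    private final Map<Integer, Long> offsets = new HashMap<>();

    @Override
    public void save(int partition, long offset) {
        offsets.put(partition, offset);
    }

    @Override
    public Long load(int partition) {
        return offsets.get(partition);
    }
}

final class BatchRunner {
    private BatchRunner() {
    }

    /**
     * Runs the batch first and saves the offset only if the batch did not throw.
     * On failure the stored offset is left untouched, so the same range is read again.
     */
    static boolean processThenCommit(int partition, long untilOffset, Runnable batch, OffsetStore store) {
        try {
            batch.run();
        } catch (RuntimeException e) {
            return false;
        }
        store.save(partition, untilOffset);
        return true;
    }
}
```

Because the save happens after the batch, a failed save leaves the stored offset behind what was actually processed, which is one way the two sides end up disagreeing.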

PS: Spark can also use checkpoints to store state.

Using checkpoints
Keeping track of the offsets that have been processed.
However, it takes time for Spark to prepare them and store them.
Checkpointing is fairly slow (about 3 s per checkpoint on average).
Highly recommended reading:

http://aseigneurin.github.io/2016/05/07/spark-kafka-achieving-zero-data-loss.html

Logic:
If the offset in ZK is smaller than EarliestOffset or larger than LatestOffset, the offset in ZK is no longer valid, so update it to EarliestOffset. If the offset in ZK lies between EarliestOffset and LatestOffset, use the offset in ZK.
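
The rule above can be written as a small pure function. This is a sketch under assumptions: `OffsetReconciler` and `resolve` are hypothetical names that do not appear in the original code, and both boundaries are treated as valid.

```java
// Hypothetical helper illustrating the reconciliation rule for one partition.
final class OffsetReconciler {
    private OffsetReconciler() {
    }

    /**
     * Picks the offset to resume from.
     *
     * @param zkOffset       offset currently stored in ZK
     * @param earliestOffset earliest offset still available in Kafka
     * @param latestOffset   latest offset in Kafka
     * @return earliestOffset if the ZK value is out of range, otherwise the ZK value
     */
    static long resolve(long zkOffset, long earliestOffset, long latestOffset) {
        boolean outOfRange = zkOffset < earliestOffset || zkOffset > latestOffset;
        return outOfRange ? earliestOffset : zkOffset;
    }
}
```

For example, with EarliestOffset 10 and LatestOffset 100, a ZK offset of 5 or 150 resolves to 10, while a ZK offset of 50 is kept.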

KafkaUtil (reads offsets from Kafka with SimpleConsumer)

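import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetRequest;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.PartitionMetadata;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.consumer.SimpleConsumer;

public class KafkaUtil implements Serializable {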

    private static final long serialVersionUID = -7708717328840L;

    // volatile is required for double-checked locking to be safe
    private static volatile KafkaUtil kafkaUtil = null;

    private KafkaUtil() {
    }

    public static KafkaUtil getInstance() {
        if (kafkaUtil == null) {
            synchronized (KafkaUtil.class) {
                if (kafkaUtil == null) {
                    kafkaUtil = new KafkaUtil();
                }
            }
        }
        return kafkaUtil;
    }

    /**
     * Extracts the hosts from a broker list.
     *
     * @param brokerList comma-separated "host:port" entries
     * @return the host of each broker, in list order
     */
    public String[] getHostFromBrokerList(String brokerList) {
        String[] brokers = brokerList.split(",");
        for (int i = 0; i < brokers.length; i++) {
            brokers[i] = brokers[i].split(":")[0];
        }
        return brokers;
    }

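    /**
     * Extracts the ports from a broker list.
     *
     * @param brokerList comma-separated "host:port" entries
     * @return map from host to port (assumes at most one broker per host)
     */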
    public Map<String, Integer> getPortFromBrokerList(String brokerList) {
        Map<String, Integer> portMap = new HashMap<String, Integer>();
        String[] brokers = brokerList.split(",");
        for (int i = 0; i < brokers.length; i++) {
            String host = brokers[i].split(":")[0];
            Integer port = Integer.valueOf(brokers[i].split(":")[1]);
            portMap.put(host, port);
        }
        return portMap;
    }

    public KafkaTopicOffset topicAndMetadataRequest(String brokerList, String topic) {
        List<String> topics = Collections.singletonList(topic);
        TopicMetadataRequest topicMetadataRequest = new TopicMetadataRequest(topics);
        KafkaTopicOffset kafkaTopicOffset = new KafkaTopicOffset(topic);
        String[] hosts = getHostFromBrokerList(brokerList);
        Map<String, Integer> portMap = getPortFromBrokerList(brokerList);

        for (String host : hosts) {
            SimpleConsumer simpleConsumer = null;
            try {
                simpleConsumer = new SimpleConsumer(host, portMap.get(host), Constant.TIME_OUT, Constant.BUFFERSIZE, Constant.groupId);
                kafka.javaapi.TopicMetadataResponse response = simpleConsumer.send(topicMetadataRequest);
                List<TopicMetadata> topicMetadatas = response.topicsMetadata();
                for (TopicMetadata metadata : topicMetadatas) {
                    for (PartitionMetadata partitionMetadata : metadata.partitionsMetadata()) {
                        kafkaTopicOffset.getLeaderList().put(partitionMetadata.partitionId(), partitionMetadata.leader().host());
                        kafkaTopicOffset.getOffsetList().put(partitionMetadata.partitionId(), 0L);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (simpleConsumer != null) {
                    simpleConsumer.close();
                }

            }
        }

        return kafkaTopicOffset;
    }

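    /**
     * Fetches the earliest or latest offset of each partition of a topic from Kafka.
     *
     * @param brokerList comma-separated "host:port" entries
     * @param topic      topic name
     * @param flag       Constant.EARLIEST_OFFSET or Constant.LATEST_OFFSET
     * @return per-partition offsets; a partition stays at 0 if its leader returned nothing
     */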
    public KafkaTopicOffset getOffset(String brokerList, String topic, String flag) {
        KafkaTopicOffset kafkaTopicOffset = topicAndMetadataRequest(brokerList, topic);
        String[] hosts = getHostFromBrokerList(brokerList);
        Map<String, Integer> portMap = getPortFromBrokerList(brokerList);

        for (String host : hosts) {
            Iterator iterator = kafkaTopicOffset.getOffsetList().entrySet().iterator();
            SimpleConsumer simpleConsumer = null;
            try {
                simpleConsumer = new SimpleConsumer(host, portMap.get(host), Constant.TIME_OUT, Constant.BUFFERSIZE, Constant.groupId);
                while (iterator.hasNext()) {
                    Map.Entry<Integer, Long> entry = (Map.Entry<Integer, Long>) iterator.next();
                    int partitionId = entry.getKey();
                    // Only query the broker that is the leader of this partition
                    if (!kafkaTopicOffset.getLeaderList().get(partitionId).equals(host)) {
                        continue;
                    }

                    TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partitionId);
                    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfoMap = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();

                    if (flag.equals(Constant.EARLIEST_OFFSET)) {
                        requestInfoMap.put(topicAndPartition, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.EarliestTime(), 1));
                    } else if (flag.equals(Constant.LATEST_OFFSET)) {
                        requestInfoMap.put(topicAndPartition, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
                    }

                    OffsetRequest offsetRequest = new OffsetRequest(requestInfoMap, kafka.api.OffsetRequest.CurrentVersion(), Constant.groupId);
                    OffsetResponse offsetResponse = simpleConsumer.getOffsetsBefore(offsetRequest);

                    long[] offset = offsetResponse.offsets(topic, partitionId);
                    if (offset.length > 0) {
                        kafkaTopicOffset.getOffsetList().put(partitionId, offset[0]);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (simpleConsumer != null) {
                    simpleConsumer.close();
                }
            }
        }

        return kafkaTopicOffset;

    }


}
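
KafkaUtil depends on two classes the post does not show, KafkaTopicOffset and Constant. The following is a guess at their shape, inferred only from how they are called above; the field layout and every constant value are assumptions, not the author's code.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction: holds the per-partition leader and offset of one topic.
class KafkaTopicOffset implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String topicName;
    // partition id -> offset
    private final Map<Integer, Long> offsetList = new HashMap<>();
    // partition id -> host of the partition leader
    private final Map<Integer, String> leaderList = new HashMap<>();

    KafkaTopicOffset(String topicName) {
        this.topicName = topicName;
    }

    String getTopicName() {
        return topicName;
    }

    Map<Integer, Long> getOffsetList() {
        return offsetList;
    }

    Map<Integer, String> getLeaderList() {
        return leaderList;
    }
}

// Hypothetical constants; all values below are placeholders, not taken from the post.
final class Constant {
    static final int TIME_OUT = 10000;         // socket timeout in milliseconds
    static final int BUFFERSIZE = 64 * 1024;   // socket receive buffer in bytes
    static final String groupId = "offset-checker";
    static final String EARLIEST_OFFSET = "earliest";
    static final String LATEST_OFFSET = "latest";

    private Constant() {
    }
}
```

With these in the same package as KafkaUtil, a call such as KafkaUtil.getInstance().getOffset(brokerList, topic, Constant.EARLIEST_OFFSET) would return a KafkaTopicOffset whose getOffsetList() maps each partition id to its earliest offset.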
