
Kafka rebalance mechanism and hands-on examples of multiple Consumer consumption modes - Kafka in production

This blog series distills and shares cases drawn from real production environments and offers hands-on guidance for commercial Spark applications; please keep following the series. Copyright notice: this Spark commercial-application series belongs to its author (Qin Kaixin). Reproduction is prohibited; learning from it is welcome.

1 When is a rebalance triggered? What does it do? What does the flow look like?

1.1 When a rebalance is triggered

  • The group's subscription changes. For example, with a regex-based subscription, the group's subscription changes whenever a new topic matching the pattern is created.
  • The partition count of a subscribed topic changes, e.g. partitions are added through the command-line script.
  • Group membership changes: a member joins or leaves the group.

1.2 What a rebalance actually does

In one sentence: when multiple Consumers subscribe to the same Topic, the partitions each consumer is assigned are redistributed according to the partition-assignment strategy.

1.3 Which Broker the Coordinator lives on

The algorithm for locating the Coordinator is the same as the algorithm for locating the target partition of __consumer_offsets.

  • Step 1: determine the target partition: Math.abs(groupId.hashCode()) % 50 (50 is the default partition count of __consumer_offsets); suppose the result is 12.
  • Step 2: find the Broker hosting the leader replica of partition 12 of __consumer_offsets; that broker is the Group Coordinator.
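The two steps above can be sketched in a few lines. This is a minimal sketch: 50 is the default value of the broker setting offsets.topic.num.partitions, and a real cluster may be configured differently; the class and method names are made up for illustration.

```java
// Sketch: computing which __consumer_offsets partition (and hence which
// broker, via that partition's leader replica) coordinates a given group.
public class CoordinatorLocator {

    // Step 1: hash the group id onto an __consumer_offsets partition.
    static int coordinatorPartition(String groupId, int offsetsTopicPartitions) {
        return Math.abs(groupId.hashCode()) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // Step 2 (done against the cluster metadata, not shown here):
        // the leader of this partition is the Group Coordinator.
        System.out.println(coordinatorPartition("test", 50));   // → 48
    }
}
```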

1.4 What the rebalance flow looks like

The overall rebalance flow is shown in the figure below; a few points worth emphasizing:

  • The Coordinator role is played by a Broker.
  • The Group Leader role is played by one of the Consumers.
  • The join-group request (JoinGroup) => used to elect the Group Leader.
  • The sync-group request (SyncGroup) => used to hand the partition-assignment plan to the Coordinator, which then sends the plan back to every Consumer in its response.

1.5 Benefits of the rebalance mechanism

  • The right to assign partitions is delegated to the client-side consumer, so the assignment strategy can be changed without restarting the system.
  • Users can implement their own rack-aware assignment strategy.

1.6 rebalance generation: filtering out stale requests

  • Kafka introduced the rebalance generation precisely to reject invalid offset commits from a Consumer group. If for some reason a consumer commits its offsets late, after it has already been kicked out of the group, the commit it sends carries the old generation and is therefore rejected.

2 rebalance listeners: application-level practice

  • A rebalance listener addresses the case where the user stores offsets in external storage: the listener implements offset saving and offset redirection.

  • onPartitionsRevoked: called before a new round of rebalancing begins; typically used for manual offset commits and for auditing.

  • onPartitionsAssigned: called after the rebalance has completed; typically used to set up the consumption logic.

       import java.util.Arrays;
       import java.util.Collection;
       import java.util.Properties;
       import java.util.concurrent.atomic.AtomicLong;
       import org.apache.kafka.clients.consumer.*;
       import org.apache.kafka.common.TopicPartition;

       Properties props = new Properties();
       props.put("bootstrap.servers", "localhost:9092");
       props.put("group.id", "test");
       props.put("enable.auto.commit", "false");
       props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
       props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
       final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

       // total time spent in rebalances
       final AtomicLong totalRebalanceTimeMs = new AtomicLong(0L);

       // timestamp at which the current rebalance started
       final AtomicLong rebalanceStart = new AtomicLong(0L);

       // 1 register the rebalance listener
       consumer.subscribe(Arrays.asList("test-topic"), new ConsumerRebalanceListener() {

           @Override
           public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
               for (TopicPartition tp : partitions) {

                   // 1 save the current position to external storage (placeholder method)
                   saveToExternalStore(consumer.position(tp));

                   // 2 or commit offsets manually
                   // consumer.commitSync(toCommit);
               }

               rebalanceStart.set(System.currentTimeMillis());
           }

           @Override
           public void onPartitionsAssigned(Collection<TopicPartition> partitions) {

               totalRebalanceTimeMs.addAndGet(System.currentTimeMillis() - rebalanceStart.get());

               for (TopicPartition tp : partitions) {
                   // redirect each partition to the externally stored offset (placeholder method)
                   consumer.seek(tp, readFromExternalStore(tp));
               }
           }
       });

       // 2 message processing (buffer, minBatchSize and insertIntoDb are placeholders)
       while (true) {
           ConsumerRecords<String, String> records = consumer.poll(100);
           for (ConsumerRecord<String, String> record : records) {
               buffer.add(record);
           }
           if (buffer.size() >= minBatchSize) {
               insertIntoDb(buffer);
               consumer.commitSync();
               buffer.clear();
           }
       }
    

3 Balancing consumption inside a Consumer group in practice

3.1 Wrapping each Consumer in a single thread: multiple consumers consume (wastes resources)

Example outline:

  • ConsumerGroup wraps the whole group
  • ConsumerRunnable: each thread maintains its own private KafkaConsumer instance

public class Main {

    public static void main(String[] args) {
        String brokerList = "localhost:9092";
        String groupId = "testGroup1";
        String topic = "test-topic";
        int consumerNum = 3;

        // core externally-facing wrapper
        ConsumerGroup consumerGroup = new ConsumerGroup(consumerNum, groupId, topic, brokerList);
        consumerGroup.execute();
    }
}

import java.util.ArrayList;
import java.util.List;
public class ConsumerGroup {    

    private List<ConsumerRunnable> consumers;
    
    public ConsumerGroup(int consumerNum, String groupId, String topic, String brokerList) {
        consumers = new ArrayList<>(consumerNum);
        for (int i = 0; i < consumerNum; ++i) {
            ConsumerRunnable consumerThread = new ConsumerRunnable(brokerList, groupId, topic);
            consumers.add(consumerThread);
        }
    }
    public void execute() {
        for (ConsumerRunnable task : consumers) {
            new Thread(task).start();
        }
    }
}

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.util.Arrays;
import java.util.Properties;

public class ConsumerRunnable implements Runnable {

    private final KafkaConsumer<String, String> consumer;

    public ConsumerRunnable(String brokerList, String groupId, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokerList);
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "true");        // this example uses automatic offset commits
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        this.consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));   // this example uses automatic partition assignment
    }

    @Override
    public void run() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(200);   
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(Thread.currentThread().getName() + " consumed " + record.partition() +
                        "th message with offset: " + record.offset());
            }
        }
    }
}

3.2 One Consumer with multi-threaded processing inside (heavy load on the consumer)

Example outline:

  • ConsumerHandler: a single Consumer instance; after poll it runs a thread pool in which multiple Processor threads handle the records
  • Processor: the business-logic processing method

Suggestions for further optimization:

  • Have ConsumerHandler use manual offset commits and take responsibility for the final consumer.commitSync(offsets);.
  • Keep a global Map<TopicPartition, OffsetAndMetadata> offsets in ConsumerHandler to track the offsets the Processors have consumed.
  • After finishing a batch of messages, each Processor determines the largest offset it processed and updates the offsets map.
  • ConsumerHandler commits according to offsets and clears the collection after the commit.
  • Register a rebalance listener in ConsumerHandler.
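The offset bookkeeping in the list above can be modeled in a few lines. This is a minimal sketch under simplifying assumptions: it uses a plain partition number and long offset instead of Kafka's TopicPartition/OffsetAndMetadata so that it runs without a broker, and the class name OffsetsTracker is made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the shared offsets map: Processors record their
// progress, and the handler drains the map before calling commitSync.
public class OffsetsTracker {

    // partition -> next offset to commit (largest processed offset + 1)
    private final Map<Integer, Long> offsets = new ConcurrentHashMap<>();

    // Called by a Processor after it finishes a batch on one partition.
    public void update(int partition, long lastProcessedOffset) {
        // The commit position is one past the last processed record;
        // keep only the maximum across all Processors.
        offsets.merge(partition, lastProcessedOffset + 1, Math::max);
    }

    // Called by the ConsumerHandler: snapshot the map for the commit
    // and clear it, mirroring "clear offsets after the commit" above.
    public Map<Integer, Long> drainForCommit() {
        Map<Integer, Long> snapshot = new ConcurrentHashMap<>(offsets);
        offsets.clear();
        return snapshot;
    }

    public static void main(String[] args) {
        OffsetsTracker tracker = new OffsetsTracker();
        tracker.update(0, 41L);   // a Processor finished offset 41 on partition 0
        tracker.update(0, 27L);   // a slower Processor must not move the offset backwards
        System.out.println(tracker.drainForCommit());   // {0=42}
    }
}
```

In the real ConsumerHandler the snapshot would be converted to Map<TopicPartition, OffsetAndMetadata> and passed to consumer.commitSync(offsets). Note that the snapshot-then-clear here is not atomic; a production version would have to guard against updates arriving between the two calls.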

public class Main {
    public static void main(String[] args) {
    
        String brokerList = "localhost:9092,localhost:9093,localhost:9094";
        String groupId = "group2";
        String topic = "test-topic";
        int workerNum = 5;

        ConsumerHandler consumers = new ConsumerHandler(brokerList, groupId, topic);
        consumers.execute(workerNum);
        try {
            Thread.sleep(1000000);
        } catch (InterruptedException ignored) {}
        consumers.shutdown();
    }
}

import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerHandler {

    private final KafkaConsumer<String, String> consumer;
    private ExecutorService executors;
    private volatile boolean stopped = false;

    public ConsumerHandler(String brokerList, String groupId, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokerList);
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));
    }

    public void execute(int workerNum) {

        executors = new ThreadPoolExecutor(workerNum, workerNum, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1000), new ThreadPoolExecutor.CallerRunsPolicy());

        // KafkaConsumer is not thread-safe, so poll on a dedicated thread and
        // close the consumer on that same thread once the loop exits; otherwise
        // the caller would block here forever and shutdown() could never run.
        new Thread(() -> {
            while (!stopped) {
                ConsumerRecords<String, String> records = consumer.poll(200);

                for (final ConsumerRecord<String, String> record : records) {
                    executors.submit(new Processor(record));
                }
            }
            consumer.close();
        }).start();
    }

    public void shutdown() {
        stopped = true;   // lets the polling thread exit and close the consumer
        if (executors != null) {
            executors.shutdown();
            try {
                if (!executors.awaitTermination(10, TimeUnit.SECONDS)) {
                    System.out.println("Timeout.... Ignore for this case");
                }
            } catch (InterruptedException ignored) {
                System.out.println("Other thread interrupted this shutdown, ignore for this case.");
                Thread.currentThread().interrupt();
            }
        }
    }

}

import org.apache.kafka.clients.consumer.ConsumerRecord;
public class Processor implements Runnable {

    private ConsumerRecord<String, String> consumerRecord;

    public Processor(ConsumerRecord<String, String> record) {
        this.consumerRecord = record;
    }

    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() + " consumed " + consumerRecord.partition()
            + "th message with offset: " + consumerRecord.offset());
    }
}   

3.3 Comparing the two approaches

  • First approach: recommended. Single-threaded Consumer wrappers with multiple consumers (at the cost of extra resources) preserve the consumption order within each partition well and avoid thread-switching overhead.
  • Second approach: more complex to implement, and the problem is that the message order within a partition may no longer be maintained; note that message processing is decoupled from message receiving.

4 Consuming specified partitions in practice (Standalone Consumer)

  • A standalone Consumer uses assign to receive messages from an explicit list of partitions; assign and subscribe are mutually exclusive, so you can only use one of the two.

  • Letting multiple Consumer instances consume one Topic through group rebalance is a perfect match.

  • But for precise control, assign is unavoidable.

          Properties props = new Properties();
          props.put("bootstrap.servers", brokerList);
          props.put("group.id", groupId);
          props.put("enable.auto.commit", "false");
          props.put("auto.commit.interval.ms", "1000");
          props.put("session.timeout.ms", "30000");
          props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
          KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

          List<TopicPartition> partitions = new ArrayList<>();

          List<PartitionInfo> allPartitions = consumer.partitionsFor("kaiXinTopic");

          if (allPartitions != null && !allPartitions.isEmpty()) {
              for (PartitionInfo partitionInfo : allPartitions) {

                  partitions.add(new TopicPartition(partitionInfo.topic(), partitionInfo.partition()));
              }

              consumer.assign(partitions);
          }

          // buffer, minBatchSize and insertIntoDb are placeholders
          while (true) {
              ConsumerRecords<String, String> records = consumer.poll(100);
              for (ConsumerRecord<String, String> record : records) {
                  buffer.add(record);
              }
              if (buffer.size() >= minBatchSize) {
                  insertIntoDb(buffer);

                  consumer.commitSync();

                  buffer.clear();
              }
          }
    

Conclusion

This article draws on several hands-on Kafka books and blog posts. A great deal of reference material went into writing it and the wording was reworked throughout; it was hard-won, so please treasure it!

秦凱新 2181119 2123