1. 程式人生 > >Amazon Kinesis Data Streams Adds Enhanced Fan-Out and HTTP/2 for Faster Streaming

Amazon Kinesis Data Streams Adds Enhanced Fan-Out and HTTP/2 for Faster Streaming

A few weeks ago, we launched two significant performance improving features for Amazon Kinesis Data Streams (KDS): enhanced fan-out and an HTTP/2 data retrieval API. Enhanced fan-out allows developers to scale up the number of stream consumers (applications reading data from a stream in real-time) by offering each stream consumer its own read throughput. Meanwhile, the HTTP/2 data retrieval API allows data to be delivered from producers to consumers in 70 milliseconds or better (a 65% improvement) in typical scenarios. These new features enable developers to build faster, more reactive, highly parallel, and latency-sensitive applications on top of Kinesis Data Streams.

Kinesis actually refers to a family of streaming services: Kinesis Video Streams, Kinesis Data Firehose, Kinesis Data Analytics, and the topic of today’s blog post, Kinesis Data Streams (KDS). Kinesis Data Streams allows developers to easily and continuously collect, process, and analyze streaming data in real-time with a fully-managed and massively scalable service. KDS can capture gigabytes of data per second from hundreds of thousands of sources – everything from website clickstreams and social media feeds to financial transactions and location-tracking events.

Kinesis Data Streams are scaled using the concept of a shard. One shard provides an ingest capacity of 1MB/second or 1000 records/second and an output capacity of 2MB/second. It’s not uncommon for customers to have thousands or tens of thousands of shards supporting 10s of GB/sec of ingest and egress. Before the enhanced fan-out capability, that 2MB/second/shard output was shared between all of the applications consuming data from the stream. With enhanced fan-out developers can register stream consumers to use enhanced fan-out and receive their own 2MB/second pipe of read throughput per shard, and this throughput automatically scales with the number of shards in a stream. Prior to the launch of Enhanced Fan-out customers would frequently fan-out their data out to multiple streams to support their desired read throughput for their downstream applications. That sounds like undifferentiated heavy lifting to us, and that’s something we decided our customers shouldn’t need to worry about. Customers pay for enhanced fan-out based on the amount of data retrieved from the stream using enhanced fan-out and the number of consumers registered per-shard. You can find additional info on the

pricing page.

Before we jump into a description of the new API, let’s cover a few quick notes about HTTP/2 and how we use that with the new SubscribeToShard API.

HTTP/2

HTTP/2 is a major revision to the HTTP network protocol that introduces a new method for framing and transporting data between clients and servers. It’s a binary protocol. It enables many new features focused on decreasing latency and increasing throughput. The first gain is the use of HPACK to compress headers. Another useful feature is connection multiplexing which allows us to use a single TCP connection for multiple parallel non-blocking requests. Additionally, instead of the traditional request-response semantics of HTTP, the communication pipe is bidirectional. A server using HTTP/2 can push multiple responses to a client without waiting for the client to request those resources. Kinesis’s SubscribeToShard API takes advantage of this server push feature to receive new records and makes use of another HTTP/2 feature called flow control. Kinesis pushes data to the consumer and keeps track of the number of bytes that have been unacknowledged. The client acknowledges bytes received by sending WINDOW_UPDATE frames to the server. If the client can’t handle the rate of data, then Kinesis will pause the flow of data until a new WINDOW_UPDATE frame is received or until the 5 minute subscription expires.

Now that we have a grasp on SubscribeToShard and HTTP/2 let’s cover how we use this to take advantage of enhanced fan-out!

Using Enhanced Fan-out

The easiest way to make use of enhanced fan-out is to use the updated Kinesis Client Library 2.0 (KCL). KCL will automatically register itself as a consumer of the stream. Then KCL will enumerate the shards and subscribe to them using the new SubscribeToShard API. It will also continuously call SubscribeToShard whenever the underlying connections are terminated. Under the hood, KCL handles checkpointing and state management of a distributed app with a Amazon DynamoDB table it creates in your AWS account. You can see an example of this in the documentation.

The general process for using enhanced fan-out is:

  1. Call RegisterStreamConsumer and provide the StreamARN and ConsumerName (commonly the application name). Save the ConsumerARN returned by this API call. As soon as the consumer is registered, enhanced fan-out is enabled and billing for consumer-shard-hours begins.
  2. Enumerate stream shards and call SubscribeToShard on each of them with the ConsumerARN returned by RegisterStreamConsumer. This establishes an HTTP/2 connection, and KDS will push SubscribeToShardEvents to the listening client. These connections are terminated by KDS every 5 minutes, so the client will need to call SubscribeToShard again if you want to continue receiving events. Bytes pushed to the client using enhanced fan-out are billed under enhanced fan-out data retrieval rates.
  3. Finally, remember to call DeregisterStreamConsumer when you’re no longer using the consumer since it does have an associated cost.

You can see some example code walking through this process in the documentation.

You can view Amazon CloudWatch metrics and manage consumer applications in the console, including deregistering them.

Available Now

Enhanced fan-out and the new HTTP/2 SubscribeToShard API are both available now in all regions for new streams and existing streams. There’s a lot more information than what I’ve covered in this blog post in the documentation. There is a per-stream limit of 5 consumer applications (e.g., 5 different KCL applications) reading from all shards but this can be increased with a  support ticket. I’m excited to see customers take advantage of these new features to reduce the complexity of managing multiple stream consumers and to increase the speed and parallelism of their real-time applications.

As always feel free to leave comments below or on Twitter.

相關推薦

Amazon Kinesis Data Streams Resources

This is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with other AWS services and third-party tools. Amazon Ki

Amazon Kinesis Data Streams getting started

Reducing the time to get actionable insights from data is important to all businesses and customers who employ batch data analytics tools are exp

Amazon Kinesis Data Streams FAQs

Q: What is an Amazon Kinesis Application? An Amazon Kinesis Application is a data consumer that reads and processes data from an Amazon

Building a Data Processing Pipeline with Amazon Kinesis Data Streams and Kubeless

If you’re already running Kubernetes, FaaS (Functions as a Service) platforms on Kubernetes can help you leverage your existing investment in EC2

Amazon Kinesis Data Streams News

Two years ago we introduced Amazon Kinesis, which we now call Amazon Kinesis Streams, to allow customers to build applications that collect,

Amazon Kinesis Data Streams Pricing

Let’s assume that our data producers put 100 records per second in aggregate, and each record is 35KB. In this case, the total data input rate is

Amazon Kinesis Data Streams:AWS

Amazon Kinesis Data Streams (KDS) は、大規模にスケーラブルで持続的なリアルタイムのデータストリーミングサービスです。KDS はウエブサイトクリックストリームやデータべースイベントストリームや金融取引、ソーシャルメディアフィード、ITロゴ、ロケーション追跡イベ

Questions fréquentes (FAQ) sur Amazon Kinesis Data Streams

Q : Qu'est-ce qu'une application Amazon Kinesis ? Une application Amazon Kinesis est un consommateur de données qui lit et traite des do

Вопросы и ответы по Amazon Kinesis Data Streams

Вопрос: Что такое приложение Amazon Kinesis? Приложение Amazon Kinesis – это потребитель данных, который считывает и обрабатывает данные

Цены на Amazon Kinesis Data Streams

Предположим, что всего от источников данных поступает 100 записей в секунду, каждая запись размером 35 КБ. В этом случае общая скорость передачи в

Amazon Kinesis Data Streams 常見問題

問:什麼是 Amazon Kinesis 應用程式? 問:什麼是 Amazon Kinesis Client Library (KCL)? 適用於 Java | Python | Ruby | Node.js | .NET 的 Ama

Amazon Kinesis Data Streams 定價

讓我們假定我們的資料生產者平均每秒輸入 100 個記錄,每個記錄大小為 35KB。在這種情況下,總資料總輸入速率為 3.4MB/秒(100 個記錄/秒*35KB/記錄)。為方便起見,我們假設每次交易的吞吐量和記錄大小全天都是穩定不變的。請注意,我們可以隨時動態調整 Amazon Kinesi

Amazon Kinesis Data Firehose blog posts

Stream data into an Aurora PostgreSQL Database using AWS DMS and Amazon Kinesis Data Firehose In this blog post, we explore a solution to

Streaming CloudWatch Logs to Kinesis Data Streams

Amazon Web Services is Hiring. Amazon Web Services (AWS) is a dynamic, growing business unit within Amazon.com. We are currently hiring So

Amazon Kinesis Data Firehose Features

You can configure Amazon Kinesis Data Firehose to prepare your streaming data before it is loaded to data stores. Simply select an AWS Lambda fun

Amazon Kinesis Data Firehose Pricing

If you send 5,000 records of streaming data per second, each record 7KB in size, to Amazon Kinesis Data Firehose in US-East to be loaded into Amaz

Amazon Kinesis Data Firehose Resources

Reducing the time to get actionable insights from data is important to all businesses and customers who employ batch data analytics tools are ex

Amazon Kinesis Data Analytics_流資料處理分析服務

Amazon Kinesis Data Analytics 是實時處理流資料的一種最簡單的方法,採用的是標準 SQL 且無需瞭解新的程式語言或處理框架。通過 Amazon Kinesis Data Analytics,您能夠使用 SQL 查詢流資料或構建整個流式處理應用程式,以便獲取可行的

Amazon Kinesis Data Firehose_流資料捕獲載入服務

Amazon Kinesis Data Firehose 是將流資料可靠地載入到資料儲存和分析工具的最簡單方式。它可以捕獲、轉換流資料並將其載入到 Amazon S3、Amazon Redshift、Amazon Elasticsearch Service 和 Splunk,讓您可以藉助正在