Stream processing with Kafka and Go June 16 2016
Tamás Michelberger Secret Sauce Partners, Inc
Agenda
Let's talk about Kafka
How to use Kafka from Go
How we use Kafka at SSP
What is Kafka?
From the docs: Kafka is a distributed, partitioned, replicated commit log service.
Producers and consumers
Topics for maintaining a feed of messages
Originally comes from LinkedIn
Is it any good?
A typical microservice architecture
Apache Kafka + Zookeeper = 3.5 million writes per second(http://www.slideshare.net/hyderabadscalability/apache-kafka-zookeeper-35-million-writes-per-second)
Introducing Kafka to the mix
How does it work from a user's perspective?
Topics are the main unit of organization
Topics can have multiple partitions
Partitions are the unit of parallelism
Offsets for keeping track of consumer progress
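The model above can be sketched as a toy in-memory commit log (illustrative Go only, not Kafka code): each partition is an append-only slice, an offset is just an index into it, and a consumer's progress is simply the next offset it will read.

```go
package main

import "fmt"

// partition is an append-only log; an offset is just an index into it.
type partition struct{ messages [][]byte }

func (p *partition) append(msg []byte) int64 {
	p.messages = append(p.messages, msg)
	return int64(len(p.messages) - 1) // offset of the appended message
}

// consumer tracks its own progress as the next offset to read.
type consumer struct{ offset int64 }

func (c *consumer) poll(p *partition) ([]byte, bool) {
	if c.offset >= int64(len(p.messages)) {
		return nil, false // nothing new yet
	}
	msg := p.messages[c.offset]
	c.offset++ // advance: the consumer, not the server, tracks progress
	return msg, true
}

func main() {
	// A "topic" with two partitions; parallelism comes from consuming
	// different partitions independently.
	topic := []*partition{{}, {}}
	topic[0].append([]byte("first"))
	topic[0].append([]byte("second"))

	var c consumer
	for msg, ok := c.poll(topic[0]); ok; msg, ok = c.poll(topic[0]) {
		fmt.Printf("offset %d: %s\n", c.offset-1, msg)
	}
}
```

Real Kafka persists partitions to disk and replicates them, but the offset-as-index mental model carries over directly.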
"Dumb" server and smart clients Do one thing but do it well Server is basically a transport mechanism with some housekeeping Client does: partitioning consumer group orchestration (only in <0.9) o set tracking
Message formats
Messages are just byte arrays; the server never tries to make sense of them
Clients are free to choose whatever format they want
Send some (semi-)structured data such as JSON
Even better: messages with associated schemas: Avro, Protobuf, Thrift
Using Kafka from Go
Producers

import "github.com/Shopify/sarama"

producer, err := sarama.NewAsyncProducer([]string{"localhost:9092"}, nil)
if err != nil {
	log.Fatal(err)
}

go func() {
	for err := range producer.Errors() {
		log.Printf("producer couldn't send message: %v", err)
	}
}()

producer.Input() <- &sarama.ProducerMessage{
	Topic: "mytopic",
	Key:   sarama.StringEncoder("key"),
	Value: sarama.StringEncoder("message content"),
}
Consumers

import "github.com/Shopify/sarama"

consumer, err := sarama.NewConsumer([]string{"localhost:9092"}, nil)
if err != nil {
	log.Fatal(err)
}

partitionConsumer, err := consumer.ConsumePartition("my_topic", 0, sarama.OffsetNewest)
if err != nil {
	log.Fatal(err)
}

for message := range partitionConsumer.Messages() {
	log.Printf("Consumed message offset %d\n", message.Offset)
}
Consumer group

0.8.2 example, but the API for 0.9 is really similar

import "github.com/wvanbergen/kafka/consumergroup"

consumer, err := consumergroup.JoinConsumerGroup(
	"ExampleConsumerGroup",
	[]string{"topic.with.single.partition", "topic.with.multiple.partitions"},	
	[]string{"localhost:2181"}, // Zookeeper
	nil)
if err != nil {
	log.Fatal(err)
}

for event := range consumer.Messages() {
	// Process event
	log.Println(string(event.Value))

	// Ack event
	consumer.CommitUpto(event)
}
Kafka at SSP
Single Kafka node
Handful of topics
Data pipeline for product and transaction feed processing
Slightly under 500,000 messages a day
[Architecture diagrams: Fit Predictor and the product feed; the product consumer and Style Finder]
References
kafka.apache.org/documentation.html (http://kafka.apache.org/documentation.html)
godoc.org/github.com/Shopify/sarama (https://godoc.org/github.com/Shopify/sarama)
Apache Kafka + Zookeeper = 3.5 million writes per second (http://www.slideshare.net/hyderabadscalability/apache-kafkazookeeper-35-million-writes-per-second)
Apache Kafka 0.10: Evaluating Performance in Distributed Systems (https://engineering.heroku.com/blogs/2016-05-27apache-kafka-010-evaluating-performance-in-distributed-systems/)
Thank you Tamás Michelberger Secret Sauce Partners, Inc @tmichelberger (http://twitter.com/tmichelberger)