Scio — data processing at Spotify

EN / Day 3 / 19:00 / Track 2

Scio is an open source Scala API for Apache Beam and Google Cloud Dataflow.

It was created at Spotify to process petabytes of data in both batch and streaming modes, and it has been adopted by dozens of other companies as well.

We'll talk about the evolution of big data at Spotify, from Python, Hadoop, Hive, Storm, and Scalding to today's world of cloud and serverless computing. We'll look behind the scenes at some classic use cases, e.g. Discover Weekly and Wrapped, and at the challenges the company faced.

We'll also talk about some features that make Scio stand out from other Scala big data frameworks, including Spotify's use of Algebird, macros, shapeless, magnolia, etc. to make large-scale data processing easier, safer, and faster.
