Imagine you could create a Spark session anywhere: from a data science notebook on your laptop, from a script running in your batch pipeline, or from an application running inside a container. Spark executors running in the cloud would then automatically connect to your Spark session and distribute your workload, with no cluster provisioning or cluster management required.
This Serverless experience gives users instant access to distributed computing without requiring any knowledge of cluster deployment, resource allocation, security, maintenance, and so on.
Staroid implemented this Serverless Spark experience on Kubernetes and open-sourced its core implementation. We'll talk about the architecture, why we chose Kubernetes, the challenges we faced, and how we solved them. You will also see a working demo that shows what the Serverless Spark experience looks like and how it can benefit your work.