AI-augmented data preparation: Building technology-agnostic data pipelines for modern data stacks with AI

RU / Day 3 / 17:15 / Track 1


Retable DataFrame DSL is a new open-source DSL for data pipelines. On the one hand, it combines best practices of widely used data frameworks such as Spark DataFrames and Python's pandas. On the other hand, it is backend-agnostic: it does not depend on any particular backend technology, so the same pipeline can run either in ELT mode on data warehouses or in ETL mode on data lakes with engines such as Spark.
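To make the ELT/ETL distinction concrete, here is a minimal Python sketch of what "backend-agnostic" can mean. The pipeline is described once as plain data. It is then either compiled to SQL and pushed down to the database (ELT) or executed row by row in the client process (ETL). `Pipeline`, `to_sql` and `run_in_memory` are hypothetical names and are not the Retable DataFrame DSL API. The stdlib `sqlite3` module stands in for a real warehouse.

```python
"""Hypothetical sketch: one declarative pipeline, two execution backends.

Not the Retable DataFrame DSL API; every name here is illustrative.
"""
from __future__ import annotations

import operator
import sqlite3
from dataclasses import dataclass
from typing import Any

# Comparison operators supported by this toy DSL, mapped to Python functions.
_OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass(frozen=True)
class Pipeline:
    """Immutable description of a transformation; building it runs nothing."""
    source: str
    filters: tuple[tuple[str, str, Any], ...] = ()
    columns: tuple[str, ...] = ()

    def where(self, column: str, op: str, value: Any) -> Pipeline:
        if op not in _OPS:
            raise ValueError(f"unsupported operator: {op}")
        return Pipeline(self.source, self.filters + ((column, op, value),), self.columns)

    def select(self, *columns: str) -> Pipeline:
        return Pipeline(self.source, self.filters, tuple(columns))


def to_sql(p: Pipeline) -> tuple[str, list[Any]]:
    """ELT-style backend: generate SQL that runs inside the database.

    Values are bound as parameters; identifiers are NOT quoted (sketch only).
    """
    cols = ", ".join(p.columns) if p.columns else "*"
    sql = f"SELECT {cols} FROM {p.source}"
    params: list[Any] = []
    if p.filters:
        sql += " WHERE " + " AND ".join(f"{c} {op} ?" for c, op, _ in p.filters)
        params = [v for _, _, v in p.filters]
    return sql, params


def run_in_memory(p: Pipeline, rows: list[dict[str, Any]]) -> list[tuple]:
    """ETL-style backend: rows are extracted first, then transformed in Python."""
    out = []
    for row in rows:
        if all(_OPS[op](row[c], v) for c, op, v in p.filters):
            cols = p.columns or tuple(row)
            out.append(tuple(row[c] for c in cols))
    return out


# Usage: the same pipeline definition is handed to both backends.
pipeline = Pipeline("orders").where("amount", ">", 100).select("id", "amount")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 150.0), (3, 300.0)])

sql, params = to_sql(pipeline)
elt_result = conn.execute(sql, params).fetchall()  # transform runs in the database

raw = [{"id": i, "amount": a} for i, a in conn.execute("SELECT id, amount FROM orders")]
etl_result = run_in_memory(pipeline, raw)  # transform runs in the client
```

Because the pipeline is an immutable value rather than code that executes immediately, either backend can consume the same definition. A real DSL of this kind would also need typed column checks and proper identifier quoting, which this sketch omits.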

Evgeny will talk about current trends in the Modern Data Stack, the pros and cons of the old (ETL) and new (ELT) approaches, and the reasons that led the team to create their own DSL. He will also share how they managed to combine a typed interface for building declarative data pipelines, CI/CD practices, scalability, and the ability to run on any stack, whether Spark, Snowflake, or pandas, through code generation.
