Talks SmartData 2020 conference

Evgeny Legky
Retable 
Day 3 / 17:15  / Track 1 / RU / Introduction to technology

AI-augmented data preparation: Building technology-agnostic data pipelines for modern data stacks with AI

Evgeny will talk about current trends in the Modern Data Stack, the pros and cons of the old (ETL) and new (ELT) approaches, and the reasons that led to the creation of their own DSL.

Nadezhda Vesnina
JetBrains 
Day 3 / 20:15  / Track 4 / RU /

Conference closing

Join the conference closing, where we will discuss the most interesting finds of the day, as well as what will be waiting for us tomorrow.

Vladimir Verstov
Yandex.Go 
Day 3 / 17:15  / Track 3 / RU / For practicing engineers

How we develop DMP for Taxi, Food, and Lavka

Vladimir will talk about the motivation for developing your own ETL tool and about transforming ETL and DWH into a DMP. He will share the problems that arise during DMP development and the experience of solving them.

Andrey Kuznetsov
Odnoklassniki 
Day 2 / 10:45  / Track 1 / RU / Introduction to technology

Writing flexible pipelines for data platforms with Dagster

How do you make Spark + Scala jobs and Python apps get along? Andrey will explain why it's worth doing and how to write pipelines with reusable blocks and a flexible architecture using Dagster.

Nikolay Markov
Aligned Research Group 
Day 2 / 12:30  / Track 1 / RU / Hardcore. A complex low-level talk that requires the listener to know the technology.

Working with data at a low level

Let's talk about some technologies that can help you get more out of your machine: JIT, BLAS, and parallelism.

Evgeny Ermakov
Yandex Go 
Nikolay Grebenshchikov
Yandex Go 
Day 1 / 17:15  / Track 4 / RU / For practicing engineers

Highly Normalized Hybrid Model, or how we implemented the storage model

The DWH structure is not very flexible, and modern design approaches help fix this: Data Vault and Anchor modeling. Evgeny and Nikolay will tell you more about what to choose.

Aleksandr Sloutsky
Microsoft 
Gleb Lesnikov
Dodo Engineering 
Day 1 / 17:15  / Track 1 / RU / Introduction to technology

Kusto (Azure Data Explorer): Microsoft's interactive Big Data platform

During this session, Aleksandr will explain what makes Kusto (Azure Data Explorer) different from other solutions, show how complex analysis of live telemetry across billions of records can take seconds, and lift the curtain on the architecture on which Kusto is built.

Vladislav Shishkov
Lamoda 
Day 1 / 19:00  / Track 4 / RU / For practicing engineers

Versioning database structure taking storage as an example

Vladislav will talk about versioning database structure taking Lamoda storage as an example.

Bronislav Zhitnikov
Tinkoff 
Day 1 / 17:15  / Track 3 / RU /

Data initiation in NiFi

We will talk about NiFi as ETL and data initiation for streaming. Bronislav will describe some practices and advice that Tinkoff uses.

Jeff Zhang
Alibaba Group 
Day 1 / 17:15  / Track 2 / EN / Introduction to technology

Flink + Zeppelin: Streaming data analytics platform

In this talk, Jeff will explain how to use Flink on Zeppelin to build your own streaming data analytics platform.

Moon soo Lee
Staroid, Inc. 
Day 1 / 19:00  / Track 2 / EN / Introduction to technology

How we built Serverless Spark experience on Kubernetes

During this session, we'll talk about the architecture, why Staroid used Kubernetes, what the challenges were, and how the company solved them. You will also see a working demo, so you can get an idea of what the Serverless Spark experience looks like and how it can benefit your work.

Roman Korobeynikov
VirtualHealth 
Day 1 / 19:00  / Track 3 / RU / For practicing engineers

On the way from Kafka to NiFi: How not to break and not lose

This talk is about building a fail-safe system for an Apache NiFi cluster using Apache Kafka as an input source.

Evgeny Rizhik
Microsoft 
Day 1 / 19:00  / Track 1 / RU / Hardcore. A complex low-level talk that requires the listener to know the technology.

Kusto (Azure Data Explorer): Architecture and internals

A talk about the principles of building a new database from scratch for working with logs and telemetry.

Alexey Konyaev
CROC 
Day 2 / 12:30  / Track 2 / RU / Introduction to technology

Digitizing a worker in real-time

How data from wearable devices travels to the user interface of the Digital Worker system.

Olga Makarova
ivi 
Maria Nosareva
ivi 
Day 2 / 10:45  / Track 4 / RU / For practicing engineers

Segmentation: A single window of knowledge about a user

Maria and Olga will present a talk on how to build an analytics system that significantly expands business opportunities using the JVM and open source technologies.

Pavel Yakunin
Russian Tech Centre Deutsche Bank 
Day 2 / 12:30  / Track 4 / RU / For practicing engineers

Safe interactive big data at the bank: Business intelligence on Clickhouse

In his talk, Pavel will tell you what caused data fragmentation in his organization, and what typical analytics scenarios suffer as a result. He will also explain why the classic approach did not work for Deutsche Bank and what they learned to do differently.

Nikolay Averin
Miro 
Day 3 / 17:15  / Track 2 / RU / Introduction to technology

SQL migrations to Postgres under load

Migrating a table is not a problem when the database is stopped. But what if you need to migrate while the database is running? Nikolay will cover this in the form of practical tips for PostgreSQL.

Artur Hachuyan
Tazeros 
Day 4 / 12:30  / Track 1 / RU / For practicing engineers

Our repository for web analytics

Using the history of building a repository for an advanced web analytics service as an example, Artur will explain how the storage and reporting system in his project has evolved over the past 5 years.

Maksim Statsenko
Yandex 
Day 4 / 10:45  / Track 2 / RU / Introduction to technology

Review of the big data technologies. Pros and cons

Maksim's talk is about the pros and cons of various solutions for storing data: Cloud Solutions, Bare Metal Solutions, Hadoop, Vertica, ClickHouse, ExaSol, GreenPlum (ArenaDataDB), RDBMS, Teradata, and others.

Andrey Zhukov
S7 Techlab 
Day 4 / 12:30  / Track 2 / RU / Introduction to technology

Enterprise data platform: Data infrastructure as a testing ground for business hypotheses

A talk about S7's experience in building a data platform and how long it took to build.

Stanislav Bogatyrev
NEO Saint Petersburg Competence Center 
Day 2 / 10:45  / Track 3 / RU / Hardcore. A complex low-level talk that requires the listener to know the technology.

NeoFS: Storing object data according to your rules

Stanislav will share an example of how you can replace centralized S3 data storage with a more accessible solution and organize policies so that data processing becomes more efficient. He will also explain where multigraphs, homomorphic cryptography, multi-pass games, zero-knowledge proofs, and other mathematics come in.

Mikhail Maryfich
Mail.Ru Group 
Day 2 / 12:30  / Track 3 / RU / For practicing engineers

CI/CD for ML models and datasets

A not very high-quality DS model is in production, and now there is no way to retrain or update it. To avoid this situation, come and listen to Mikhail's talk.

Neville Li
Spotify 
Day 3 / 19:00  / Track 2 / EN / For practicing engineers

Scio — data processing at Spotify

We'll talk about the evolution of big data at Spotify, from Python, Hadoop, Hive, Storm, and Scalding to today's world of cloud and serverless computing.

Pasha Finkelstein
JetBrains 
Vitaly Khudobakhshov
JetBrains 
Day 3 / 19:00  / Track 3 / RU / Introduction to technology

Kotlin API for Apache Spark: Why we made another API for working with Spark

Pasha and Vitaly will talk about what data engineers choose and why they decided to make an API for one of the most popular frameworks for building pipelines.

Aleksandr Ermakov
Arenadata 
Day 2 / 10:45  / Track 2 / RU / Introduction to technology

Approaches to building a modern data platform. The problems and the concept of implementation

Aleksandr will talk about the main characteristics of the modern data platform, the differences in the DWH architecture, the components used, and the open source distribution of Hadoop.

Phil Laszkowicz
Futurice 
Day 3 / 18:30  / Track 1 / EN / From a partner

How to master time and space

Applying MLOps to a high-performance geospatial data platform for the edge and cloud.

Oleg Chirukhin
JetBrains 
Day 1 / 18:30  / Track 1 / RU / From a partner

Demo: Big Data tools

Join us for a presentation of a new JetBrains product: the Big Data Tools plugin. We will discuss its most significant use cases and provide a short demonstration using real-world examples. All questions will be answered by the developers directly involved in BDT development.

Dmitry Bugaychenko
Sber 
Day 4 / 10:45  / Track 1 / RU / For practicing engineers

Stateful streaming: Cases, patterns, implementations

During this session, we will talk about stream processing, a popular approach to data processing, with a focus on working with state.

Tanya Denisyuk
Yandex 
Day 4 / 14:00  / Track 1 / RU /

SmartData 2020 Virtual Afterparty

A Zoom session where we will gather all the attendees, speakers, program committee members, and experts. We will sum up the highlights of the conference and chat with each other in the informal setting of a merry crowd, like in the good old pre-COVID times. The only difference is that it will be in Zoom because, unfortunately, these are COVID times.

Join via the link below the player!

Alexey Fyodorov
JUG Ru Group 
Sergey Boitsov
JetBrains 
Day 4 / 13:45  / Track 1 / RU /

Conference closing

Join the SmartData closing with the program committee: we will discuss the most interesting talks and discussions, as well as talks worth revisiting after the conference.

Phil Laszkowicz
Futurice 
Day 4 / 12:00  / Track 1 / EN / From a partner

How to master time and space

Applying MLOps to a high-performance geospatial data platform for the edge and cloud.

Alexey Fyodorov
JUG Ru Group 
Vitaly Khudobakhshov
JetBrains 
Day 1 / 16:55  / Track 1 / RU /

Conference opening

Find out what awaits you in the next 4 days. The program committee will talk about the schedule, interesting talks, and the format in which they will be held. The organizers, in turn, will tell you how our platform works, where discussion zones will be held, how to connect to chat rooms, and where to ask questions.

Jacek Laskowski
 
Day 3 / 19:00  / Track 1 / EN / Introduction to technology

The latest and greatest of Delta Lake

This talk is a gentle introduction to the latest and greatest of Delta Lake. You will learn what Delta Lake is and what challenges it aims to solve.

Ksenia Tomak
Dodo Engineering 
Day 3 / 20:15  / Track 3 / RU /

Conference closing

Join the conference closing, where we will discuss the most interesting finds of the day, as well as what will be waiting for us tomorrow.

Oleg Chirukhin
JetBrains 
Day 3 / 20:15  / Track 2 / RU /

Conference closing

Join the conference closing, where we will discuss the most interesting finds of the day, as well as what will be waiting for us tomorrow.

Sergey Boitsov
JetBrains 
Day 3 / 20:15  / Track 1 / RU /

Conference closing

Join the conference closing, where we will discuss the most interesting finds of the day, as well as what will be waiting for us tomorrow.

Vitaliy Bragilevskiy
JetBrains 
Pasha Finkelstein
JetBrains 
Vitaly Khudobakhshov
JetBrains 
Day 2 / 12:00  / Track 1 / RU / From a partner

Round Table: Programming languages in Data Engineering

We'll be discussing a wide variety of languages and technologies that data engineers are currently working with.