Many people in the industry know the situation: a data science (DS) model is deployed quickly, and a month later, when it needs to be retrained on new data or with a new feature, it turns out that the DS team cannot do it.
Taking a model into production means not only packaging it into some container, but also recording how it was trained and monitoring how it performs. A detailed description of how the model was obtained helps prevent the loss of knowledge and experimental results.
Odnoklassniki is building a process in which:
- all training parameters, dependencies, and artifacts are committed to git;
- models are trained automatically in a controlled environment;
- models are reviewed and merged into master;
- models are rolled out to production.
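As a rough illustration of the first point in the list above, a training run could write a small manifest that is committed to git next to the code. This is only a sketch under stated assumptions, not the process used at Odnoklassniki: the function names, the manifest layout, and the idea of storing a SHA-256 hash in place of the artifact itself are all hypothetical.

```python
import hashlib
import json
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large model artifacts do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(params: dict, dependencies: dict, artifacts: list[Path]) -> dict:
    """Collect what is needed to repeat a training run (hypothetical layout)."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "params": params,              # training parameters
        "dependencies": dependencies,  # pinned package versions
        # Only file names and hashes go into git; the files live elsewhere.
        "artifacts": {p.name: sha256_of(p) for p in artifacts},
    }


def write_manifest(manifest: dict, target: Path) -> None:
    """Write JSON with sorted keys so that git diffs stay small and stable."""
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    target.write_text(text, encoding="utf-8")
```

A training script would call `build_manifest` once the model file is written and commit the resulting JSON, so that a reviewer sees any change in parameters or artifacts as an ordinary diff.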
Mikhail will talk about:
- processes and tools;
- how to organize versioned storage of datasets with DVC;
- how to organize rollouts through the repository;
- the path of a model from a JIRA task to production and back;
- how to organize automatic retraining.