Data versioning in machine learning projects – Dmitry Petrov

PyData Berlin 2018

In machine learning projects, it's easy to get lost among many versions of your data files. Data Version Control (DVC) is an open-source tool for data science projects, created to solve the problem of discrepancies between code and data files. It works on top of Git: when you switch between Git branches, it restores not only the source code but also the matching version of the data files.
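The core idea, keeping large data out of Git while Git tracks a small pointer to it, can be illustrated with a content-addressed cache. The sketch below is a simplified, hypothetical model, not DVC's code or file format: the `.ptr` pointer files and the `add`/`checkout` helpers are illustrative names. DVC itself writes `.dvc` metafiles and keeps data in its own cache, but the mechanism is similar: commit the hash, and restore the file that matches the hash after switching branches.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file in chunks so large data files don't need to fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def add(data: Path, cache: Path) -> Path:
    """Copy `data` into a content-addressed cache and write a hypothetical
    `<name>.ptr` pointer file. The pointer (not the data) is what you would commit to Git."""
    digest = file_md5(data)
    target = cache / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copy2(data, target)
    pointer = data.with_name(data.name + ".ptr")
    pointer.write_text(digest + "\n")
    return pointer


def checkout(pointer: Path, cache: Path) -> Path:
    """Restore the data file whose hash is recorded in `pointer`."""
    digest = pointer.read_text().strip()
    source = cache / digest[:2] / digest[2:]
    data = pointer.with_suffix("")  # "train.csv.ptr" -> "train.csv"
    shutil.copy2(source, data)
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"

        data.write_text("id,label\n1,cat\n")
        v1 = add(data, cache).read_text()  # pointer contents "committed" on branch A

        data.write_text("id,label\n1,cat\n2,dog\n")
        add(data, cache)  # pointer contents now reflect branch B

        # Simulate `git checkout` of branch A: Git restores the small pointer file...
        pointer = data.with_name(data.name + ".ptr")
        pointer.write_text(v1)
        # ...and the data step restores the matching large file from the cache.
        checkout(pointer, cache)
        print(data.read_text())
```

Because the pointer holds only a hash, Git history stays small, while every data version remains recoverable from the cache.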

Slides: https://www.slideshare.net/DmitryPetrov15/pydata-berlin-2018-dvcorg

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) nonprofit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The PyData Global Network promotes discussion of best practices, new approaches and emerging technologies for data management, processing, analysis and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences are intended to be accessible and community-focused, with presentations from beginner to advanced level. PyData tutorials and conferences introduce participants to the latest project features as well as cutting-edge use cases.

00:00 Welcome!