Optimizing Backup Performance Using Data Science Techniques

Details

One of the most important tasks for a database administrator is taking (and testing!) backups. As databases get larger and larger, the amount of time it takes to perform a backup can grow as well, to the point where your backups take longer than your available backup window. There are several settings we can use to optimize backup performance, such as buffer counts, maximum transfer size, and the number of files, but trying every combination of settings on a single production-sized database could take weeks or even months. In this talk, we will apply data science techniques to the problem of backup settings optimization and look at different models for approaching the problem and analyzing data. Some statistics background would be helpful, but is not required; the big requirement is a desire to speed up backups.
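To make the "weeks or even months" claim concrete, here is a minimal sketch of how quickly the search space grows, and how sampling a subset of combinations cuts the testing time. The candidate values, the two-hour backup duration, and the sample size of 40 are all hypothetical assumptions chosen for illustration; none of them come from the talk itself.

```python
import itertools
import random

# Hypothetical candidate values for the three settings named in the abstract.
buffer_counts = [7, 15, 30, 60, 128, 256, 512, 1024]
max_transfer_sizes_kb = [64, 128, 256, 512, 1024, 2048, 4096]
file_counts = [1, 2, 4, 6, 8, 10, 12]

# Every combination of the three settings (a full factorial grid).
grid = list(itertools.product(buffer_counts, max_transfer_sizes_kb, file_counts))

# Assumption: one backup of a production-sized database takes about two hours.
ASSUMED_HOURS_PER_BACKUP = 2.0
full_grid_hours = len(grid) * ASSUMED_HOURS_PER_BACKUP
full_grid_days = full_grid_hours / 24

# Instead of testing everything, draw a random subset without replacement.
# The timings collected from this subset could then be fed to a model.
rng = random.Random(42)  # fixed seed so the sample is reproducible
sample = rng.sample(grid, k=40)
sample_hours = len(sample) * ASSUMED_HOURS_PER_BACKUP

print(f"Full grid: {len(grid)} combinations, about {full_grid_days:.1f} days")
print(f"Random sample: {len(sample)} combinations, about {sample_hours / 24:.1f} days")
```

Under these assumptions, a single pass over the full grid takes about a month of back-to-back backups, while the random sample fits in a few days. Simple random sampling is only one possible design; the talk may use a different approach to choosing which combinations to test.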

Kevin Feasel is a Data Platform MVP and Engineering Manager of the Predictive Analytics team at ChannelAdvisor, where he specializes in T-SQL and R development, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com), a contributing author to Tribal SQL (http://www.tribalsql.com), and one of the contributors behind We Speak Linux (https://wespeaklinux.com). A resident of Durham, North Carolina, he can be found cycling the trails along the Triangle whenever the weather's nice enough.