Gradient descent is used to solve a myriad of machine learning problems - fast SVMs, regularized regression, reduced-rank matrix approximation, deep belief networks, and so on. Nesterov's method can speed up convergence for any of these, and it's simple to apply. We'll run through some simple code that illustrates why gradient descent can sometimes be slow, then demonstrates the speedup from Nesterov's method. Warning - this will be a little nerdy, but extremely useful if your algorithm is too slow or won't converge.
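Here's a minimal sketch of that comparison in Python (assuming numpy). It minimizes a deliberately ill-conditioned quadratic, where plain gradient descent crawls along the flat direction. Since the strong-convexity constant is known for this toy problem, the sketch uses the constant-momentum form of Nesterov's method; all the names here (`A`, `mu`, `L`, `step`, `nesterov`) are our own for illustration, not from any library.

```python
import numpy as np

# Toy ill-conditioned quadratic: f(x) = 0.5 * x^T A x, minimized at x = 0.
A = np.diag([1.0, 100.0])   # eigenvalues mu = 1, L = 100 -> condition number 100
mu, L = 1.0, 100.0
step = 1.0 / L              # standard 1/L step size
grad = lambda x: A @ x

def gradient_descent(x0, tol=1e-6, max_iter=10000):
    """Plain gradient descent; slow along the flat (mu = 1) direction."""
    x = x0.copy()
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:
            return x, k
        x = x - step * grad(x)
    return x, max_iter

def nesterov(x0, tol=1e-6, max_iter=10000):
    """Nesterov's accelerated gradient with the constant momentum
    beta = (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)), the variant
    used when the strong-convexity constant mu is known."""
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:
            return x, k
        y = x + beta * (x - x_prev)          # look-ahead (momentum) point
        x_prev, x = x, y - step * grad(y)    # gradient step taken from y
    return x, max_iter

x0 = np.array([1.0, 1.0])
x_gd, iters_gd = gradient_descent(x0)
x_nes, iters_nes = nesterov(x0)
print(f"plain gradient descent: {iters_gd} iterations")
print(f"Nesterov's method:      {iters_nes} iterations")
```

The slowdown in plain gradient descent is exactly the condition number at work: the step size is capped by the steepest curvature (1/L), so progress along the flattest direction shrinks the error only by a factor of roughly (1 - mu/L) per iteration, while the accelerated method improves that to roughly (1 - sqrt(mu/L)).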
References:

- Princeton lecture notes: main theorem
- Princeton lecture notes: background and the needed lemma
- Derivation for a more general function: Tseng (2008), "On Accelerated Proximal Gradient Methods for Convex-Concave Optimization"
- In slide form: UCLA EE236, "7. Fast proximal gradient methods"