Abstract: We will explore techniques for measuring how similar documents are to one another. Specifically, we will look at the intuition behind tf-idf and cosine similarity. With that as a foundation, we will see how to compute these metrics with the Natural Language Toolkit (NLTK).
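To give a feel for the two metrics named in the abstract, here is a minimal sketch in plain Python. It is not the speaker's code and does not use NLTK; the function names, the whitespace tokenizer, the unsmoothed idf formula log(N / df), and the sample sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (illustrative helper)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in this document; idf = log(N / df).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An all-zero vector has no direction to compare.
    return dot / (norm_a * norm_b)


# Made-up sample documents, for illustration only.
docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]
vectors = tf_idf_vectors(docs)
sim_close = cosine_similarity(vectors[0], vectors[1])
sim_far = cosine_similarity(vectors[0], vectors[2])
```

The first two sentences share most of their words, so `sim_close` comes out higher than `sim_far`, which is zero because the first and third sentences have no terms in common. Rare terms such as "mat" and "sofa" get a larger idf weight than terms that appear in several documents.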
Harshvardhan Kelkar is a Software Engineer at Martini Media Inc., where he builds software for the display advertising industry. Prior to that, he worked at BMC Software on building the next-generation Remedy platform. He also likes the Zen of Python (import this).
This event is organized by the Bay Area Python Interest Group (BayPIGgies). Please also see their web page: http://baypiggies.net/