Next Meetup

Keep-Current :: Machine Learning Seminar #2
Machine Learning Seminar #2 - Document Distance
Level: Advanced

This is the second event in a series of seminars for approaching, understanding and working with machine learning from different perspectives. These events are not lectures, but rather discussions that aim to expand our know-how and understanding of machine learning.

One of the best ways to learn and fully understand something is to teach it to others. This is therefore an opportunity for you to 'show off' what you have learnt while deepening your knowledge of the field by teaching it to the other members of the group. This is not a competition, though: gaps in the material can and should be filled by other members of the group. We're here to learn from each other - without judging.

--

We remain in the field of Natural Language Processing, and with this event we move from word representations to documents, focusing on document distance for clustering and classification. We will discuss and explore similarities and differences across various methodologies - from cosine similarity through Word Mover's Distance to Kullback-Leibler divergence and Hellinger distance - in an attempt to better understand the best uses and limitations of these tools.

The seminar format works best if you come prepared. Please check the reading list below and bring your own insights, questions, and perplexities to the table!

Note: The meetup end time is approximate. After the meetup, we will continue to a nearby restaurant for drinks and/or dinner.
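If you want something concrete to experiment with while preparing, here is a minimal sketch of three of the measures named above (cosine similarity, Kullback-Leibler divergence and Hellinger distance) computed on smoothed bag-of-words distributions. It is not part of the seminar material: the two toy sentences, the whitespace tokenisation, the add-one smoothing and all function names are assumptions made purely for illustration. Word Mover's Distance is left out because it needs word embeddings and an optimal-transport solver.

```python
import math
from collections import Counter


def term_distribution(text, vocabulary, smoothing=1.0):
    """Turn a text into a probability distribution over a shared vocabulary.

    Additive smoothing (an assumption of this sketch) keeps every
    probability strictly positive, so the KL divergence stays finite.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[word] + smoothing for word in vocabulary)
    return [(counts[word] + smoothing) / total for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def kl_divergence(p, q):
    """KL(p || q) in nats. In general it is not symmetric in p and q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def hellinger_distance(p, q):
    """Hellinger distance: symmetric and bounded between 0 and 1."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)


# Two toy "documents" (hypothetical example data).
doc_a = "the cat sat on the mat"
doc_b = "the dog sat on the log"

vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))
p = term_distribution(doc_a, vocabulary)
q = term_distribution(doc_b, vocabulary)

print("cosine similarity :", cosine_similarity(p, q))
print("KL(p || q)        :", kl_divergence(p, q))
print("Hellinger distance:", hellinger_distance(p, q))
```

Note that cosine similarity is a similarity (higher means closer), while the other two are dissimilarities (lower means closer). Comparing how they rank the same document pairs is a good starting point for the discussion.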
## Recommended reading list:

# Background - Information Theory:
https://web.stanford.edu/class/stats311/Lectures/lec-02.pdf

# Word-Movers-Distance:
http://proceedings.mlr.press/v37/kusnerb15.pdf - From word embeddings to document distance
https://arxiv.org/pdf/1805.04437.pdf - Cross-lingual Document Retrieval using Regularized Wasserstein Distance
https://medium.com/@stephenhky/word-movers-distance-as-a-linear-programming-problem-6b0c2658592e
(Optional read - an extended method - http://papers.nips.cc/paper/6138-supervised-word-movers-distance)

# Kullback-Leibler Divergence:
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
http://users.softlab.ntua.gr/facilities/public/AD/Text%20Categorization/Using%20Kullback-Leibler%20Distance%20for%20Text%20Categorization.pdf

# Doc2Vec
https://arxiv.org/pdf/1405.4053.pdf - Distributed Representations of Sentences and Documents
https://medium.com/scaleabout/a-gentle-introduction-to-doc2vec-db3e8c0cce5e

# Differences between KL-Divergence, Bhattacharyya and Hellinger distance:
https://stats.stackexchange.com/questions/130432/differences-between-bhattacharyya-distance-and-kl-divergence
as well as the corresponding Wikipedia articles

# Document Clustering
Similarity Measures for Text Document Clustering
http://citeseerx.ist.psu.edu/viewdoc/download?doi=

--

As always, if you have more sources, please share them in the comments or the discussions/forum section of the meetup. We look forward to seeing you!

WeAreDevelopers Office

Doblhofgasse 9 Tür 14 · Wien

Upcoming Meetups

Past Meetups (17)

What we're about

This is the official WeAreDevelopers Meetup.

Our goal is to create a developer community whose members help and enrich each other by sharing the latest news and updates in Software Development, DevOps, Machine (and Deep) Learning, Data Science, Data Engineering and more.

We also host hands-on development sessions for the educational open-source project Keep-Current (https://www.github.com/Keep-Current), which uses Natural Language Processing and recommendation system algorithms to keep you up to date by filtering relevant content for you personally. This lets you 'get your hands dirty' with machine learning, learn and experience software architecture (e.g. hexagonal architecture, clean code), or play around with front-end technologies and different databases (ArangoDB, GraknDB, ScyllaDB, etc.).

Join our technology-specific community groups on Facebook, where we have groups for Machine Learning, Web development and Advanced Software development.

Follow us on Twitter:


And we're also chatting on Slack. Join us here:


Members (650)

Photos (85)
