Past Meetup

Keep-Current :: Machine Learning Seminar #2


35 people went

WeAreDevelopers Office

Doblhoffgasse 9, Tür 14 · Vienna

How to find us

Please note: use the Doblhoffgasse 9 entrance, marked with the sign "WeAreDevelopers - Reception". There's a buzzer on the right side of the main door. We're on the 4th floor, door 14. The meeting room is on the left.


Details

Machine Learning Seminar #2 - Document Distance

Level: Advanced

This is the second event in a series of seminars on approaching, understanding, and working with machine learning from different perspectives.

These events are not lectures, but rather discussions that aim to expand our know-how and understanding of machine learning.

It is well known that the best way to learn and understand something fully is to teach it to others. This is therefore an opportunity for you to 'show off' what you have learnt, while at the same time deepening your knowledge of the field by teaching it to the other members of the group.

Yet, this is not a competition. Gaps in the material can and should be filled by other members in the group. We're here to learn from each other - without judging.

--

We remain in the field of Natural Language Processing; with this event we move from word representations to document representations, focusing on document distance for clustering and classification.

We will discuss and explore the similarities and differences between various methodologies - from cosine similarity through Word Mover's Distance to Kullback-Leibler divergence and Hellinger distance - in an attempt to better understand the best uses and limitations of these tools.
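
To make the starting point concrete, here is a minimal sketch (not part of the seminar material; the toy documents and the use of scikit-learn are our own choices) of the simplest measure on that list, cosine similarity over TF-IDF vectors:

```python
# Minimal sketch: cosine similarity between TF-IDF document vectors.
# Assumes scikit-learn is installed; the toy documents are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat lay on a rug",
    "stock markets fell sharply today",
]

tfidf = TfidfVectorizer().fit_transform(docs)   # sparse (3, vocab) matrix
sims = cosine_similarity(tfidf)                 # pairwise similarity matrix

print(sims.round(2))
# The two 'cat' sentences should score noticeably higher with each other
# than either does with the finance sentence.
```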

The seminar format works best if you come prepared. Please check the reading list below and bring your own insights, questions, and perplexities to the table!

Note: the meetup's end time is approximate. After the meetup, we will continue to a nearby restaurant for drinks and/or dinner.

## Recommended reading list:

# Background - Information Theory:

https://web.stanford.edu/class/stats311/Lectures/lec-02.pdf
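
For quick orientation, the two standard definitions that recur throughout the rest of the readings are the entropy of a discrete distribution and the relative entropy (Kullback-Leibler divergence) between two discrete distributions:

```latex
% Entropy and relative entropy (KL divergence) for discrete distributions P, Q
\[
  H(P) = -\sum_{i} p_i \log p_i ,
  \qquad
  D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} p_i \log \frac{p_i}{q_i} .
\]
```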

# Word Mover's Distance:

http://proceedings.mlr.press/v37/kusnerb15.pdf - From Word Embeddings to Document Distances

https://arxiv.org/pdf/1805.04437.pdf - Cross-lingual Document Retrieval using Regularized Wasserstein Distance

https://medium.com/@stephenhky/word-movers-distance-as-a-linear-programming-problem-6b0c2658592e - Word Mover's Distance as a Linear Programming Problem

(Optional reading - an extension of the method: http://papers.nips.cc/paper/6138-supervised-word-movers-distance - Supervised Word Mover's Distance)
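
If you'd like to experiment with Word Mover's Distance before the seminar, a sketch along the following lines can serve as a starting point. It assumes gensim with its optimal-transport dependency (pyemd or POT, depending on the gensim version) and an internet connection for the pre-trained GloVe vectors; the example sentences echo the ones used in the Kusner et al. paper.

```python
# Sketch: Word Mover's Distance between short documents with gensim.
# Model name is one of the standard gensim-data identifiers; downloading
# it requires an internet connection.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")   # small pre-trained word vectors

doc1 = "obama speaks to the media in illinois".split()
doc2 = "the president greets the press in chicago".split()
doc3 = "the band gave a concert in japan".split()

print(wv.wmdistance(doc1, doc2))  # semantically close -> smaller distance
print(wv.wmdistance(doc1, doc3))  # unrelated topic    -> larger distance
```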

# Kullback-Leibler Divergence:

https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

http://users.softlab.ntua.gr/facilities/public/AD/Text%20Categorization/Using%20Kullback-Leibler%20Distance%20for%20Text%20Categorization.pdf - Using Kullback-Leibler Distance for Text Categorization
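
As a quick numeric illustration (the toy distributions below are invented), scipy's entropy function returns the KL divergence when given two distributions:

```python
# Sketch: KL divergence between two toy unigram distributions.
# scipy.stats.entropy(p, q) returns D_KL(P || Q); note the asymmetry.
import numpy as np
from scipy.stats import entropy

p = np.array([0.6, 0.3, 0.1])   # e.g. word probabilities in document A
q = np.array([0.4, 0.4, 0.2])   # word probabilities in document B

print(entropy(p, q))   # D_KL(P || Q)
print(entropy(q, p))   # D_KL(Q || P) - generally a different number
```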

# Doc2Vec:

https://arxiv.org/pdf/1405.4053.pdf - Distributed Representations of Sentences and Documents

https://medium.com/scaleabout/a-gentle-introduction-to-doc2vec-db3e8c0cce5e - A Gentle Introduction to Doc2Vec
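
A bare-bones Doc2Vec sketch with gensim might look like the following; the corpus and hyper-parameters are placeholders, and real use needs far more data and tuning:

```python
# Sketch: training a tiny Doc2Vec model and inferring a vector for a new
# document. Corpus and hyper-parameters are purely illustrative.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "the cat sat on the mat",
    "dogs and cats make good pets",
    "stock markets fell sharply today",
]
tagged = [TaggedDocument(words=doc.split(), tags=[i])
          for i, doc in enumerate(corpus)]

model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

vec = model.infer_vector("a cat lay on a rug".split())
# model.dv is the gensim 4.x name; older releases expose it as model.docvecs
print(model.dv.most_similar([vec], topn=2))  # nearest training documents
```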

# Differences between KL divergence, Bhattacharyya distance and Hellinger distance:

https://stats.stackexchange.com/questions/130432/differences-between-bhattacharyya-distance-and-kl-divergence

as well as the corresponding Wikipedia articles
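
A few lines of NumPy make the relationships concrete on toy distributions: the Bhattacharyya coefficient is the sum of √(p_i·q_i), the Bhattacharyya distance is -ln(BC), and the Hellinger distance is √(1 - BC):

```python
# Sketch: Bhattacharyya and Hellinger distances next to KL divergence,
# on the same toy distributions. Both Hellinger and Bhattacharyya are
# symmetric, unlike KL; Hellinger is additionally bounded in [0, 1].
import numpy as np
from scipy.stats import entropy

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.4, 0.4, 0.2])

bc = np.sum(np.sqrt(p * q))        # Bhattacharyya coefficient
bhattacharyya = -np.log(bc)        # Bhattacharyya distance
hellinger = np.sqrt(1.0 - bc)      # Hellinger distance, in [0, 1]

print(bhattacharyya, hellinger, entropy(p, q))
```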

# Document Clustering:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.332.4480&rep=rep1&type=pdf - Similarity Measures for Text Document Clustering
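
As a small preparation exercise (invented mini-corpus; scikit-learn is our own choice here), clustering TF-IDF vectors in a way that effectively uses cosine similarity could look like this:

```python
# Sketch: clustering a toy corpus with k-means over TF-IDF vectors.
# Corpus, number of clusters and parameters are purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "the cat sat on the mat",
    "a cat lay on a rug",
    "stock markets fell sharply today",
    "investors worried as markets dropped",
]

# TfidfVectorizer L2-normalises each row by default, so Euclidean k-means
# on these vectors behaves much like clustering by cosine similarity.
X = TfidfVectorizer().fit_transform(docs)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # the two 'cat' sentences and the two finance sentences
               # should land in separate clusters
```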

--

As always, if you have more sources, please share them in the comments or the discussions/forum section of the meetup.

We look forward to seeing you!