Analysis of Lithuanian texts: a case of moon and femininity

Name: Analysis of Lithuanian texts: a case of moon and femininity
Start: 2018-04-05T19:00:00+03:00
End: 2018-04-05T21:00:00+03:00
Location: CUJO

Hosted by Aidis S.

PyData Kaunas

Details

Presenter: Evaldas (KTU / ISM)

Tools used: gensim, fastText

Abstract: The presentation will discuss the machine learning task of text classification. The text corpus was the ASTRA stenograms, containing 110905 Lithuanian parliamentary transcripts from 147 speakers, collected between March 1990 and December 2013. Texts were categorized by the speaker's political partisanship, the speaker's gender, and whether the transcript was recorded around a full-moon date. Four types of pre-processing were considered: original text, lemmatized, morphized, and translated into English. Lemmas and morphemes were obtained using semantika.lt, and the English translation using the Google Translate service. Feature sets investigated: 6 from gensim (3 Doc2Vec variants, LSI, LDA, RP), 1 from fastText (Sent2Vec), and 3 custom-made (morfologija, stilometNER, ontologija). A random forest was used as the base learner as well as the meta-learner (in 7 "stacking" configurations). The experiments reveal which categories, which types of pre-processing, and which feature sets appear to be the most successful for the texts analysed.
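
The abstract does not say how "around a full moon date" was determined. As a rough illustration only, the sketch below labels a calendar date using a mean-lunation approximation: it counts days since a reference new moon (6 January 2000, 18:14 UTC) and checks whether the moon's age is within a window of half a synodic month. The function names, the noon-UTC convention, and the one-day window are assumptions made for this example, not details from the talk, and the approximation can be off by more than half a day compared with true full-moon times.

```python
from datetime import date, datetime, timezone

# Mean length of a lunar cycle (new moon to new moon), in days.
SYNODIC_MONTH_DAYS = 29.530588853
# Reference new moon used as the epoch for this approximation.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def moon_age_days(day: date) -> float:
    """Approximate days since the last new moon, evaluated at noon UTC."""
    noon = datetime(day.year, day.month, day.day, 12, tzinfo=timezone.utc)
    elapsed = (noon - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    # Python's % returns a non-negative result, so dates before 2000 work too.
    return elapsed % SYNODIC_MONTH_DAYS


def near_full_moon(day: date, window_days: float = 1.0) -> bool:
    """Hypothetical label: True if `day` is within `window_days` of a mean full moon."""
    full_moon_age = SYNODIC_MONTH_DAYS / 2.0
    return abs(moon_age_days(day) - full_moon_age) <= window_days


if __name__ == "__main__":
    for d in (date(2018, 3, 31), date(2018, 3, 17)):
        print(d.isoformat(), near_full_moon(d))
```

A label like this could be attached to each transcript by its recording date before training a classifier; widening `window_days` trades label precision for more positive examples.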

Language: EN

Image by presenter :)
