
Analysis of Lithuanian texts: a case of moon and femininity

Photo of Aidis Stukas
Hosted By
Aidis S.

Details

Presenter: Evaldas (KTU / ISM)

Tools used: gensim, fastText

Abstract: The presentation will discuss the machine learning task of text classification. The text corpus was the ASTRA stenograms, containing 110905 Lithuanian parliamentary transcripts from 147 speakers, collected between March 1990 and December 2013. Texts were categorized by the political partisanship of the speaker, the gender of the speaker, and whether the transcript was recorded around a full moon date. Four types of pre-processing were considered: original text, lemmatized text, text converted to morphemes, and text translated to English. Lemmas and morphemes were obtained using semantika.lt, and the English translation using the Google Translate service. Ten feature sets were investigated: 6 from gensim (3 Doc2Vec variants, LSI, LDA, RP), 1 from fastText (Sent2Vec), and 3 custom-made (morfologija, stilometNER, ontologija). Random forest was used as a base-learner as well as a meta-learner (in 7 "stacking" configurations). The experiments reveal which categories, which types of pre-processing and which feature sets appear to be the most successful for the texts analysed.
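The abstract does not say how "around a full moon date" was defined, so the following is only a rough sketch of how such a label could be derived from a transcript date. Everything in it is an assumption rather than the presenter's method: the function names `days_from_full_moon` and `is_near_full_moon`, the one-day window, and the use of a mean synodic month counted from a commonly cited reference new moon (6 January 2000, 18:14 UTC). A mean-cycle estimate can be off by several hours, so a real study would more likely use an ephemeris or a published table of full moon dates.

```python
from datetime import date, datetime, timezone

# Assumption: mean length of one lunar cycle (new moon to new moon), in days.
SYNODIC_MONTH_DAYS = 29.530588853
# Assumption: a commonly cited reference new moon, used as the cycle origin.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def days_from_full_moon(day: date) -> float:
    """Approximate distance in days from noon UTC on `day` to the nearest mean full moon."""
    noon = datetime(day.year, day.month, day.day, 12, tzinfo=timezone.utc)
    elapsed = (noon - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    # Python's % is non-negative for a positive divisor, so dates before
    # the reference also map into the range [0, SYNODIC_MONTH_DAYS).
    moon_age = elapsed % SYNODIC_MONTH_DAYS
    # The mean full moon falls halfway through the cycle.
    return abs(moon_age - SYNODIC_MONTH_DAYS / 2)


def is_near_full_moon(day: date, window_days: float = 1.0) -> bool:
    """Hypothetical binary label: True if `day` lies within `window_days` of a mean full moon."""
    return days_from_full_moon(day) <= window_days


if __name__ == "__main__":
    # Illustrative dates only, not taken from the ASTRA corpus.
    for d in (date(1990, 3, 11), date(2000, 1, 6), date(2013, 12, 17)):
        print(d.isoformat(), round(days_from_full_moon(d), 2), is_near_full_moon(d))
```

Widening or narrowing `window_days` changes how many transcripts fall into the "full moon" class, so the class balance of this category depends directly on that choice.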

Language: EN

Image by presenter :)

PyData Kaunas
CUJO
Juozapavičiaus 31B · Kaunas