
Wikipedia Mining: NLP & Data Science joint meetup



On the 19th, we are having a joint meetup with the Hungarian NLP Meetup group, discussing Wikipedia mining related topics.




The schedule:


Gődény Balázs (Meltwater Group)

Using Wikipedia as a multi-purpose knowledge base for NLP tasks

Given an arbitrary piece of text, we can identify the well-known Wikipedia concepts that the text mentions. The set of such concepts can be taken as the representation of the document. This approach can be a powerful tool in several NLP tasks, such as content categorization, named entity disambiguation, and keyword extraction.
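To make the idea concrete, here is a minimal sketch of representing a document as a set of Wikipedia concepts. It is not the speaker's system: the tiny `TITLE_INDEX`, the `MAX_NGRAM` limit, and the greedy longest-match strategy are all assumptions chosen for illustration. A real index would presumably be built from a Wikipedia dump and would need a disambiguation step.

```python
import re

# Hypothetical miniature index: surface form -> Wikipedia article title.
# Assumption: a real index would be built from titles, redirects and anchors.
TITLE_INDEX = {
    "natural language processing": "Natural language processing",
    "nlp": "Natural language processing",
    "wikipedia": "Wikipedia",
    "named entity": "Named entity",
}
MAX_NGRAM = 3  # longest phrase (in tokens) we try to match


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_concepts(text):
    """Return the set of article titles whose surface forms occur in text."""
    tokens = tokenize(text)
    concepts = set()
    i = 0
    while i < len(tokens):
        # Prefer the longest matching n-gram starting at position i.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in TITLE_INDEX:
                concepts.add(TITLE_INDEX[phrase])
                i += n
                break
        else:
            i += 1  # no concept starts here, move on
    return concepts


doc = "NLP on Wikipedia: natural language processing with named entity data"
print(sorted(extract_concepts(doc)))
# ['Named entity', 'Natural language processing', 'Wikipedia']
```

The resulting set can then serve as a feature representation for categorization or keyword extraction.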


Judit Ács (MTA SZTAKI, Human Language Technologies Group)

Building multilingual dictionaries using Wikipedia

Wikipedia offers an excellent source for multilingual text mining. We created a framework to extract dictionaries in 40 high- and medium-density languages. Dictionary extraction from comparable corpora is a difficult task, and known methods are limited to a few major languages. We extended this scope to 40 languages using language-independent statistical methods.
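One language-independent technique in this area is triangulation, which the comments below mention as a planned feature of the Wiktionary tool: translation pairs between two languages are inferred through a shared pivot language. The sketch below is only an illustration under that assumption; the function name `triangulate` and the toy dictionaries are hypothetical, and the abstract does not say this is the method used in the talk.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Infer (source, target) pairs through a pivot language.

    Each candidate pair is scored by the number of pivot words that
    support it; more supporting pivots suggest a more reliable pair.
    """
    candidates = defaultdict(int)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                candidates[(src, tgt)] += 1
    return dict(candidates)


# Toy data: Hungarian -> English and English -> German.
hu_en = {"kutya": {"dog", "hound"}, "bank": {"bank"}}
en_de = {"dog": {"Hund"}, "hound": {"Hund"}, "bank": {"Bank", "Ufer"}}

for pair, support in sorted(triangulate(hu_en, en_de).items()):
    print(pair, support)
```

In the toy data, the polysemous pivot word "bank" produces the spurious pair ("bank", "Ufer") with a support of 1, while ("kutya", "Hund") is supported by two pivots. Filtering by support is one simple way to reduce such noise.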


Julianna Göbölös-Szabó (MTA SZTAKI, Data Mining Group)

Data mining and Wikipedia

Wikipedia is a rich and varied data source, with many high-quality articles and a meaningful link structure, and it can be exploited in a range of data mining problems. I will sketch two applications. In the first, we exploit the hyperlinks between versions of articles in different languages in order to recommend new links (or editions). Our second line of research focuses on temporal changes in Wikipedia, using several snapshots. In this case the goal is to identify real-world events based on the changes in the graph.
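The first application can be illustrated with a small sketch: if two articles are linked in one language edition and both have counterparts in another edition where the link is missing, that link is a candidate recommendation. This is an assumed simplification, not the speaker's actual method; `recommend_links` and the toy data are hypothetical.

```python
def recommend_links(links_a, links_b, a_to_b):
    """Suggest links for edition B that exist in edition A but not in B.

    links_a, links_b: sets of (source_title, target_title) pairs.
    a_to_b: interlanguage mapping from titles in A to titles in B.
    """
    suggestions = set()
    for src_a, dst_a in links_a:
        src_b, dst_b = a_to_b.get(src_a), a_to_b.get(dst_a)
        # Both endpoints must exist in B and the link must be absent there.
        if src_b and dst_b and (src_b, dst_b) not in links_b:
            suggestions.add((src_b, dst_b))
    return suggestions


en_links = {("Budapest", "Danube"), ("Budapest", "Hungary")}
hu_links = {("Budapest", "Magyarország")}
en_to_hu = {"Budapest": "Budapest", "Danube": "Duna", "Hungary": "Magyarország"}

print(recommend_links(en_links, hu_links, en_to_hu))
# {('Budapest', 'Duna')}
```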


Eszter Simon (RIL-HAS, Language Technology Group), Dávid Nemeskey (MTA SZTAKI, Human Language Technologies Group)

Automatically generated NE tagged corpora for English and Hungarian

Supervised Named Entity Recognizers require large amounts of annotated text. Since manual annotation is a highly costly procedure, reducing the annotation cost is essential. We present a fully automatic, language-independent method that builds NE annotated corpora from Wikipedia and DBpedia.
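As a rough illustration of how such a corpus might be derived, the sketch below turns wikitext links into BIO-tagged tokens: each link target is looked up in an entity-type table, and the anchor text receives the corresponding tag. The `ENTITY_TYPES` table stands in for type information that might come from DBpedia; it, the `tag_sentence` helper, and the whitespace tokenization are assumptions, not the authors' pipeline.

```python
import re

# Hypothetical article -> entity type map (assumption: in practice this
# kind of information might be derived from DBpedia classes).
ENTITY_TYPES = {"Budapest": "LOC", "Paul Erdős": "PER"}

# Matches [[Target]] and [[Target|anchor text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def tag_sentence(wikitext):
    """Convert a wikitext sentence into a list of (token, BIO tag) pairs."""
    tagged = []
    pos = 0
    for m in LINK.finditer(wikitext):
        # Plain text before the link is outside any entity.
        tagged += [(tok, "O") for tok in wikitext[pos:m.start()].split()]
        target, anchor = m.group(1), m.group(2) or m.group(1)
        etype = ENTITY_TYPES.get(target)
        for k, tok in enumerate(anchor.split()):
            if etype is None:
                tagged.append((tok, "O"))  # link to a non-entity article
            else:
                tagged.append((tok, ("B-" if k == 0 else "I-") + etype))
        pos = m.end()
    tagged += [(tok, "O") for tok in wikitext[pos:].split()]
    return tagged


print(tag_sentence("[[Paul Erdős]] was born in [[Budapest]] ."))
# [('Paul', 'B-PER'), ('Erdős', 'I-PER'), ('was', 'O'), ('born', 'O'),
#  ('in', 'O'), ('Budapest', 'B-LOC'), ('.', 'O')]
```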


  • Judit A.

    I uploaded the first version of the Wiktionary parser tool to GitHub:
    Currently it only supports Wiktionary parsing; triangulation will be added later this week.
    Have fun and please send me your feedback.

    June 25, 2013

  • Gábor R.

    Dear All,
    The four sets of slides are now available for download at
    Thanks for coming!

    1 · June 21, 2013

    • Gábor P.

      Thanks a lot! And thanks for the presenters as well!

      June 21, 2013

  • Gábor P.

    Would it be possible to get the slides of the presentations? Or are they already available somewhere and I just can't find them?

    June 19, 2013

    • Judit A.

      They will be made available around tomorrow.

      June 20, 2013

  • Gergely N.

    The presenters should read a book about giving great and enjoyable presentations. We are not here only for information we could read from the slides as well.

    1 · June 19, 2013

    • Godeny B.

      I hugely overestimated how much I can tell in 10 minutes so I ended up hurrying through the material, unfortunately missing some of the most important points I wanted to share. I feel bad about this but at least I learned something that I'm not sure I could have learned from books.

      2 · June 20, 2013

  • Milan A.

    got sick, very sorry i'll have to miss it.

    June 19, 2013

  • Zoltan C. T.

    There will be some beer and pizza!

    June 19, 2013

  • Milan A.

    There is, I think, a relevant webinar which starts at 6PM.

    Is there any chance to have a live stream available before meetup?

    June 14, 2013

    • Zoltan C. T.

      Sure, we can set it up. But maybe the presentation place will be taken till 6, so I would say we can watch it from 6:15

      June 16, 2013

    • Gábor C.

      It would be a great idea! I would really appreciate it, too.

      June 18, 2013

  • Peter B.

    Would it be very complicated to move it to Thursday?

    June 4, 2013
