Enterprise Search - Hadoop - Data Mining


Details
Ágnes Molnár: Entity Extraction – Getting structured information from unstructured content?
Most organizations suffer from an explosion of content of many types from many systems; Enterprise Search can be the bridge across information silos and make all this content easy to find and act on.
Tagging and classification are good ways to improve findability, but they take time and human resources unless you have a good way to automate these steps. In this session, I will demonstrate how to plan and prepare for this level of search solution maturity, how to extract structured information from unstructured content, what the key points and main challenges of entity extraction are, and how to use the extracted information in your search solutions.
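The session covers enterprise-grade tooling, but the core idea of dictionary-based entity extraction can be sketched in a few lines of Python. This is a minimal illustration only; the entity dictionary and sample document below are invented for the example:

```python
import re

# Hypothetical entity dictionary: surface forms mapped to entity types.
ENTITIES = {
    "SharePoint": "PRODUCT",
    "FAST": "PRODUCT",
    "BA Insight": "ORGANIZATION",
    "Europe": "LOCATION",
}

def extract_entities(text):
    """Scan unstructured text and return (surface form, type) pairs."""
    found = []
    for surface, etype in ENTITIES.items():
        # Word-boundary match so "FAST" does not match inside "breakFAST".
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.append((surface, etype))
    return found

doc = "BA Insight architects SharePoint and FAST implementations across Europe."
print(extract_entities(doc))
```

The extracted pairs could then be written back as tags or metadata fields, which is what makes the content searchable across silos.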
Short bio
Agnes Molnar serves as a Senior Search Solutions Consultant for BA Insight (www.bainsight.com). With a strong focus on Enterprise Search and Knowledge Management, she has been working with SharePoint technologies since 2001, and has architected dozens of SharePoint and FAST implementations for commercial and government organizations throughout Europe and the Americas.
In her role at BA Insight, Agnes helps guide the company’s product development team, and oversees enterprise-level deployments of the company’s technologies. A co-author and contributor to several books, including Real World SharePoint 2010 and SharePoint 2010 Unleashed, Agnes is a regular speaker at technical conferences and symposiums around the world. On her blog (www.aghy.hu), Agnes regularly writes on the topics of enterprise search best practices, and SharePoint & FAST technologies.
Zsuzsanna Huczman: What is data mining? - Getting knowledge from structured data.
Every big company regularly runs sales campaigns, calling customers to sell something or make a discounted offer. These campaigns are based on data collected in the background from customers, and from non-customers too. Which customers are called, and which offers are made to them? This decision becomes harder and harder as the number of customers and products grows. This is why data mining comes into focus: there are several data mining algorithms that can help convert big databases into useful information. RapidMiner is open-source data mining software that is easy for anyone to use, and I will show a data mining process with this tool on a real database.
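The kind of model a data mining tool like RapidMiner might learn for campaign targeting can be illustrated with a tiny k-nearest-neighbour classifier in plain Python. This is only a sketch of the idea; the customer records and features below are invented:

```python
import math

# Invented toy customer records: (age, monthly_spend) -> accepted the offer?
history = [
    ((25, 40.0), False),
    ((32, 55.0), False),
    ((45, 120.0), True),
    ((51, 140.0), True),
    ((38, 90.0), True),
]

def predict_buy(age, spend, k=3):
    """Majority vote among the k most similar past customers."""
    dists = sorted(
        (math.dist((age, spend), features), bought)
        for features, bought in history
    )
    votes = [bought for _, bought in dists[:k]]
    return votes.count(True) > k // 2

# Target the campaign only at prospects predicted to respond.
prospects = [(47, 130.0), (23, 35.0)]
targets = [p for p in prospects if predict_buy(*p)]
print(targets)  # [(47, 130.0)]
```

Real tools scale this idea to millions of records and many more features, and let you compare several algorithms on the same data flow.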
Short bio
I received an MSc degree from BME in 2009. During my university years I had the opportunity to take part in a financial project of one of the data mining groups at BME. I have been working at Data Solutions Ltd. (www.datasolutions.hu) for four years as an analyst. Fifteen years ago, the predecessor of this company was among the first consulting companies in the Central European region to specialize in data mining. My special fields are various kinds of open-source data mining software and the telecommunications sector, where I work with IBM SPSS Modeler (previously known as Clementine) and RapidMiner. A year ago I started a blog (adatmagus.blog.hu) whose mission is to familiarize the wider public with data mining.
András Benczúr: The LAWA Project: Towards a Virtual Web Observatory
The LAWA project on Longitudinal Analytics of Web Archive data builds an Internet-based experimental testbed for large-scale data analytics. Its focus is on developing a sustainable infrastructure, scalable methods, and easily usable software tools for aggregating, querying, and analyzing heterogeneous data at Internet scale for a deep understanding of Internet content characteristics (size, distribution, form, structure, evolution, dynamics).
I will show how far this (overly) ambitious project has led us, and what main achievements and blockers we have identified. Some first (but very preliminary) demos are already up at http://vwo.lawa-project.eu:8080/ . Some limitations of current systems for distributed data analysis, especially of Hadoop, have been partly resolved. However, archival institutions still lack an easy-to-deploy, high-quality, and stable Web-scale search solution, and we are now gathering forces in collaboration with the Stratosphere project ( http://www.stratosphere.eu ), and also working on scaling the SZTAKI plagiarism detection service ( http://www.kopi.sztaki.hu ) over the BonFIRE experimental cloud.
Short bio
Andras Benczur received his Ph.D. in applied mathematics at the Massachusetts Institute of Technology in 1997. Since then he has been a researcher at the Institute for Computer Science and Control of the Hungarian Academy of Sciences (MTA SZTAKI), where he has headed the Informatics Laboratory of 30 researchers since 2008. The lab participates in international research and national industry projects in information retrieval and business intelligence. Among other honors, his research on Web information retrieval was recognized by a Yahoo! Faculty Research Grant; he led the KDD Cup 2007 winner team and organized the ECML/PKDD 2010 Discovery Challenge on Web Quality.
Zoltan Toth: Pig: The Good Parts (a case study)
Apache Pig is a platform built on top of Hadoop that helps you quickly analyze large unstructured datasets.
Experience and challenges: a hands-on introduction to Pig through a Prezi case study.
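A Pig script is essentially a declarative data flow over records, built from operators like GROUP and FOREACH. As a rough analogy only (the clickstream records below are invented, and real Pig runs this over HDFS via MapReduce), the same group-and-count step can be sketched in plain Python:

```python
from collections import Counter

# Invented clickstream records: (user_id, action), the kind of
# semi-structured log data Pig is typically pointed at.
records = [
    ("u1", "view"), ("u2", "view"), ("u1", "edit"),
    ("u3", "view"), ("u1", "view"), ("u2", "share"),
]

# Rough equivalent of the Pig data flow:
#   grouped = GROUP records BY action;
#   counts  = FOREACH grouped GENERATE group, COUNT(records);
counts = Counter(action for _, action in records)
print(dict(counts))  # {'view': 4, 'edit': 1, 'share': 1}
```

The point of Pig is that this two-line data flow stays two lines even when the input is terabytes spread across a cluster.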
Short bio
Prior to joining Prezi I worked as a developer for pharmaceutical market research companies. Now, as Senior Data Engineer, I help Prezi arrive at data-driven decisions.
