Boston Data Mining will co-host this event with Boston Predictive Analytics (no need to cross-register).
Featured Speaker - Dr. Rosaria Silipo
6:00 - 6:45 Presentation
6:45 - 7:00 Q&A, Break
7:00 - Hackathon / Hands-On with KNIME
Every company these days (small or big, local or international) wants to make use of its social data. Plenty of pre-packaged applications are available to screen user sentiment or to map user circles, using channel reporting tools, score-carding systems, and predictive analytic techniques (primarily text mining).
Each has its useful aspects, but each also has limitations. In this presentation we will discuss a fourth approach: using a predictive analytic platform, KNIME (http://www.knime.org/), that includes not only text mining but also network analysis and other predictive techniques such as clustering, to overcome the limitations of the previous techniques and to generate new fact-based insight. KNIME [naim] is a user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting. The open integration platform provides over 1000 modules (nodes), including those of the KNIME community (http://tech.knime.org/) and its extensive partner network (http://www.knime.org/partner/becoming-a-partner).
This approach was first used at a major European Telco. However, since the data was proprietary, we replicated the work on publicly available data in order to explain the approach in detail. In this project, text mining and network analytics were combined to provide a better description of each forum user in terms of leadership and sentiment. Using network analytics, an authority score was calculated for each forum user. Text mining was used to measure the attitude of each user in the forum. By combining the authority/follower score with the attitude measure in a scatter plot, we easily detected the most extreme users in terms of attitude. It was then interesting to observe their degree of influence on the other forum participants.
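The combination of an authority score with an attitude measure can be sketched in a few lines of Python. This is only an illustration, not the KNIME workflow presented in the talk: it assumes the authority score is a HITS-style score computed on a "who replies to whom" graph, and it assumes attitude is a simple count of positive versus negative words. The user names, posts, and word lists are all hypothetical.

```python
from collections import defaultdict
import math

# Hypothetical toy data: (author, replied_to) pairs from forum threads.
replies = [
    ("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
    ("alice", "bob"), ("carol", "bob"), ("dave", "carol"),
]

# Hypothetical posts and a tiny illustrative word lexicon (assumptions).
posts = {
    "alice": ["great release", "love the new feature"],
    "bob": ["terrible bug", "bad support, awful docs"],
    "carol": ["good enough"],
    "dave": ["it works"],
}
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"terrible", "bad", "awful"}


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts via HITS power iteration."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the users who reply to this user.
        new_auth = defaultdict(float)
        for src, dst in edges:
            new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: new_auth[n] / norm for n in nodes}
        # Hub: sum of authority scores of the users this user replies to.
        new_hub = defaultdict(float)
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: new_hub[n] / norm for n in nodes}
    return hub, auth


def attitude(texts):
    """(positive - negative) / (positive + negative) word count, 0 if none."""
    words = [w.strip(".,!?").lower() for t in texts for w in t.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


hub, auth = hits(replies)
for user in sorted(posts):
    # Each (authority, attitude) pair is one point of the scatter plot.
    print(f"{user}: authority={auth[user]:.2f} attitude={attitude(posts[user]):+.2f}")
```

Plotting authority against attitude for every user gives the kind of scatter plot described above, in which users with extreme attitude and high authority stand out.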
Outlier identification, though, helps neither with automatic user characterization nor with the description of the remaining, more average users. Therefore, we turned to traditional data analytics to define a few groups with more general user features. We identified a number of different clusters, including a very large cluster of inactive, neutral users, a smaller cluster of positive, very active users, and an even smaller cluster of negative, very active users. Different actions were then devised for the different clusters of users.
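As a rough illustration of this clustering step, the sketch below groups users by activity and attitude. The abstract does not say which clustering algorithm was used; k-means with min-max scaling and a deterministic farthest-point seeding is assumed here purely for the example, and the user features are hypothetical.

```python
from collections import defaultdict
import math

# Hypothetical per-user features: (posts per month, attitude in [-1, 1]).
users = {
    "u1": (1, 0.0), "u2": (0, 0.1), "u3": (2, -0.1),
    "u4": (1, 0.05), "u5": (0, 0.0), "u6": (1, -0.05),
    "u7": (40, 0.8), "u8": (45, 0.9), "u9": (38, 0.7),
    "u10": (42, -0.8), "u11": (39, -0.9),
}


def min_max_scale(points):
    """Rescale each feature to [0, 1] so no feature dominates the distance."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans)) for p in points]


def farthest_point_init(points, k):
    """Deterministic seeding: first point, then the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


names = list(users)
labels = kmeans(min_max_scale(list(users.values())), k=3)

clusters = defaultdict(list)
for name, label in zip(names, labels):
    clusters[label].append(name)

for label, members in sorted(clusters.items()):
    print(label, members)
```

Scaling matters in this sketch: without it, the posting count would swamp the attitude score and the positive and negative active users would fall into the same group.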
Dr. Rosaria Silipo holds a master's degree in electrical engineering from the University of Florence (Italy, 1992) and a doctorate in bioengineering from the Politecnico di Milano (Italy, 1996). Her doctoral work dealt with statistical and machine learning algorithms for the automatic analysis of the electrocardiographic signal and was carried out at the University of Florence (Italy) in cooperation with the Massachusetts Institute of Technology (USA).
She has been awarded two postdoctoral fellowships: one at Siemens (Munich, Germany, [masked]) and one at ICSI at the University of California, Berkeley (USA, [masked]) for the automatic analysis of biomedical signals and speech.
In 2000 she moved into the corporate world, working as a research engineer at Nuance (Menlo Park, USA, [masked]), as a senior developer at Spoken Translation (Berkeley, USA, [masked]), and as the manager of the SAS development group at Viseca (Zurich, Switzerland, [masked]).
Drawing on the extensive experience she had acquired in applying data mining algorithms to industrial products, in 2009 she became a data mining consultant, helping companies to organize, clean, and finally make sense of their data. From time to time she cooperates with KNIME in the development of cutting-edge data mining applications.
Rosaria Silipo is the author of more than 50 scientific publications and of 3 books for data analysis practitioners.