Knowledge Extraction from a Web Forum using KNIME

By Dr. Rosaria Silipo

In recent years, the web has become a primary source of information on almost any subject. Web content is increasingly used to learn about the opinions and preferences of customers and of people in general. This project covers every step of extracting information from the web using KNIME.

To provide a practical example, and at the same time to learn more about the users in the KNIME community, the analysis focuses on data from the KNIME Forum. It is divided into four parts.

The first workflow is a web crawler: it extracts the web content and reorganizes the data to make it suitable for the subsequent analysis. In the second part, a few basic statistical measures are calculated to gain insight into the forum's performance as an indirect measure of the KNIME community's performance. Here users can be posters and commenters at the same time. While the total number of users and of posts over time gives a measure of community growth, the average number of comments per post can be considered a measure of how efficiently the forum answers questions. The topics discussed in the KNIME Forum are another rich source of information: they describe how the users' interests and wishes evolve over time. In the third part, a full workflow classifies topics and detects topic shifts over time. Finally, a fourth workflow examines how forum users interact with each other in different discussion groups. Here, depending on the topic discussed, experts emerge quickly from the user network graph.
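As a rough illustration of the basic statistics described above, the following Python sketch computes posts per month, the cumulative number of distinct users, and the average number of comments per post. It is not the KNIME workflow from the talk: the record layout, the sample data, and the function name `forum_stats` are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: one record per forum post, with its author,
# posting date, and the authors of the comments it received.
posts = [
    {"author": "alice", "date": date(2012, 1, 15), "commenters": ["bob", "carol"]},
    {"author": "bob", "date": date(2012, 1, 20), "commenters": []},
    {"author": "dave", "date": date(2012, 2, 3), "commenters": ["alice", "bob", "alice"]},
]


def forum_stats(posts):
    """Return, per (year, month): number of posts, cumulative distinct
    users seen so far, and the mean number of comments per post."""
    by_month = defaultdict(list)
    for post in posts:
        by_month[(post["date"].year, post["date"].month)].append(post)

    seen_users = set()  # posters and commenters count as the same kind of user
    stats = {}
    for month in sorted(by_month):
        month_posts = by_month[month]
        n_comments = 0
        for post in month_posts:
            seen_users.add(post["author"])
            seen_users.update(post["commenters"])
            n_comments += len(post["commenters"])
        stats[month] = {
            "posts": len(month_posts),
            "total_users": len(seen_users),
            "avg_comments_per_post": n_comments / len(month_posts),
        }
    return stats


if __name__ == "__main__":
    for month, values in forum_stats(posts).items():
        print(month, values)
```

Because a user can be both a poster and a commenter, the sketch keeps a single set of user names so that nobody is counted twice.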

Rosaria Silipo’s short Bio


Dr. Rosaria Silipo holds a master's degree in electrical engineering from the University of Florence (Italy, 1992) and a doctorate in bioengineering from the Politecnico di Milano (Italy, 1996). Her doctoral work dealt with statistical and machine learning algorithms for the automatic analysis of electrocardiographic signals and was carried out at the University of Florence (Italy) in cooperation with the Massachusetts Institute of Technology (USA).

She has been awarded two postdoctoral fellowships for the automatic analysis of biomedical signals and speech: one at Siemens (Munich, Germany, [masked]) and one at ICSI at the University of California, Berkeley (USA, [masked]).

In 2000 she moved into the corporate world, working as a research engineer at Nuance (Menlo Park, USA, [masked]), as a senior developer at Spoken Translation (Berkeley, USA, [masked]), and as the manager of the SAS development group at Viseca (Zurich, Switzerland, [masked]).

Drawing on the extensive experience she had acquired in applying data mining algorithms to industrial products, in 2009 she became a data mining consultant, helping companies organize, clean, and finally make sense of their data. From time to time she cooperates with KNIME on the development of cutting-edge data mining applications.

Rosaria Silipo is the author of more than 50 scientific publications and of 3 books for data analysis practitioners.

  • Rita R.

    Very informative and well delivered

    October 3, 2013

  • Subhankar R.

    Pizza coming..

    October 2, 2013

  • Subhankar R.

    Workflow files for the hands-on session are here.
    You also need to download KNIME.

    October 2, 2013

  • Anil T.

    I run a Biotech company that mines biological data. Looking forward to finding out more about forum text mining.

    September 30, 2013
