
Knowledge Extraction from a Web Forum using KNIME

Hosted by Subhankar "Ray"

Details

By Dr. Rosaria Silipo

In recent years, the web has become a primary source of information on almost any subject. Web content is increasingly used to understand the opinions and preferences of customers and of the general public. This project covers every step of extracting information from the web using KNIME (http://knime.org).

To provide a practical example and, at the same time, learn more about the KNIME user community, this analysis focuses on data from the KNIME Forum. The analysis is divided into four parts.

The first workflow is a web crawler dedicated to extracting web content and reorganizing the data to make it suitable for the subsequent analysis. In the second part, a few basic statistical measures are calculated to gain insight into the forum’s performance, as an indirect measure of the KNIME community’s performance. Here users can be posters and commenters at the same time. While the total number of users and of posts over time measures the community’s growth, the average number of comments per post can be considered a measure of how efficiently the forum answers questions. The topics discussed in the KNIME Forum are another rich source of information: they describe how the users’ interests and wishes evolve over time. In the third part, a full workflow classifies topics and detects topic shifts over time. Finally, a fourth workflow examines how the forum users interact with each other in different discussion groups. Here, depending on the topics discussed, experts emerge quickly from the user network graph.
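The forum measures described above (posts and users over time, average comments per post) can be illustrated with a minimal Python sketch. This is not the KNIME workflow from the talk; the record layout, the `forum_stats` helper, and the sample rows are all hypothetical and made up for illustration only.

```python
from collections import defaultdict

# Hypothetical forum records: (post_id, year, author, commenters).
# These rows are invented for illustration; they are NOT KNIME Forum data.
posts = [
    (1, 2010, "alice", ["bob"]),
    (2, 2010, "bob", []),
    (3, 2011, "carol", ["alice", "bob", "dave"]),
    (4, 2011, "alice", ["carol"]),
]

def forum_stats(posts):
    """Per year: post count, cumulative distinct users, mean comments per post."""
    by_year = defaultdict(list)
    for _post_id, year, author, commenters in posts:
        by_year[year].append((author, commenters))

    seen_users = set()  # posters and commenters are counted together as users
    stats = {}
    for year in sorted(by_year):
        rows = by_year[year]
        for author, commenters in rows:
            seen_users.add(author)
            seen_users.update(commenters)
        n_comments = sum(len(commenters) for _, commenters in rows)
        stats[year] = {
            "posts": len(rows),                                # growth measure
            "cumulative_users": len(seen_users),               # growth measure
            "avg_comments_per_post": n_comments / len(rows),   # answer-efficiency proxy
        }
    return stats

stats = forum_stats(posts)
for year, values in stats.items():
    print(year, values)
```

A user who both posts and comments is counted once, which matches the remark that users can be posters and commenters at the same time.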

Rosaria Silipo’s short Bio

Email: rosariasilipo@yahoo.com

Web site: http://dataminingreporting.weebly.com/

Dr. Rosaria Silipo holds a master’s degree in electrical engineering from the University of Florence (Italy, 1992) and a doctorate in bioengineering from the Politecnico di Milano (Italy, 1996). Her doctoral work dealt with statistical and machine learning algorithms for the automatic analysis of electrocardiographic signals and was developed at the University of Florence (Italy) in cooperation with the Massachusetts Institute of Technology (USA).

She was awarded two postdoctoral fellowships: one at Siemens (Munich, Germany, 1996-1997) and one at ICSI at the University of California, Berkeley (USA, 1997-2000) for the automatic analysis of biomedical signals and speech.

In 2000 she moved into the corporate world as a research engineer at Nuance (Menlo Park, USA, 2000-2002); as a senior developer at Spoken Translation (Berkeley, USA, 2002-2007); and as the manager of the SAS development group at Viseca (Zurich, Switzerland, 2007-2009).

Building on the extensive experience acquired over the years in applying data mining algorithms to industrial products, in 2009 she became a data mining consultant, helping companies organize, clean, and finally make sense of their data. From time to time she cooperates with KNIME on the development of cutting-edge data mining applications.

Rosaria Silipo is the author of more than 50 scientific publications and of 3 books for data analysis practitioners.

Boston AI/LLMs/ChatGPT Developers Group
Cambridge Innovation Center (CIC) - 5th Floor - Havana Room
1 Broadway (crossing of Broadway & 3rd St) · Cambridge, MA