A new feature selection algorithm for healthcare analytics


Details
Abstract: It may be assumed that the best way to make a decision is to consider all the factors that might influence the outcome and weigh their relative importance. However, people who try to take every factor into account tend to overweight peripheral variables at the expense of critical ones. To minimize the risk of overweighting peripheral considerations, the critical variables in a data set can be identified by a feature selection algorithm.
The most common filter feature selection algorithms cannot eliminate redundant variables and may produce inconclusive results, while wrapper feature selection methods are prone to overfitting. A new feature selection algorithm was developed to overcome these limitations of the filter and wrapper approaches. The algorithm evaluates all features simultaneously without an exhaustive search over feature combinations, eliminates redundant variables, and avoids overfitting by selecting features independently of any classifier. The new algorithm should be useful for the analysis of diverse data sets from healthcare, molecular biosciences, or business.
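The redundancy problem with univariate filters can be illustrated with a small sketch. This is not the presenter's algorithm, which is not described here; it only shows, on synthetic data, why a filter that scores each feature against the label on its own gives a redundant copy the same score as the original. The feature names, the toy data generator, and the choice of absolute Pearson correlation as the filter score are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))


def make_toy_data(n=200, seed=0):
    """Hypothetical data: one informative feature, an exact linear copy of it,
    and one weakly informative feature."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    informative = [y + rng.gauss(0, 0.5) for y in labels]
    redundant_copy = [2.0 * x + 1.0 for x in informative]  # adds no new information
    weak = [0.3 * y + rng.gauss(0, 1.0) for y in labels]
    features = {
        "informative": informative,
        "redundant_copy": redundant_copy,
        "weak": weak,
    }
    return features, labels


def filter_rank(features, labels):
    """Univariate filter: score each feature alone by |correlation with label|."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    features, labels = make_toy_data()
    for name, score in filter_rank(features, labels):
        print(f"{name:15s} {score:.3f}")
    # The two top-ranked features carry the same information:
    print("feature-feature correlation:",
          round(pearson(features["informative"], features["redundant_copy"]), 3))
```

Because each feature is scored in isolation, keeping the top two features here selects the original and its copy. Detecting the overlap requires looking at features jointly, which is the gap the abstract says the new algorithm addresses.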
Presenter: Dr. Jiri Perutka
Ph.D. in analytical chemistry. 15+ years of experience in data analysis, predictive modeling, and molecular biology. Developed a computer algorithm for Sigma-Aldrich. Founder of Tar Getronics LLC.
Developed a computer algorithm for chronic disease care optimization. As a test of the algorithm, a model of Alzheimer's disease was built from MRI data (cortical thickness differences 6, 12, and 24 months after the baseline visit). The model indicates that the first region of the human brain to be affected in Alzheimer's disease is the entorhinal cortex. The most affected brain regions in Alzheimer's disease are the temporal pole, the entorhinal cortex, and the parahippocampal region. The model can distinguish healthy controls from Alzheimer's disease patients with 83% accuracy. The accuracy was estimated by 5-fold cross-validation, averaged over 20 repetitions of the fold selection. The algorithm is competitive with other machine learning methods. https://www.linkedin.com/in/jiri-perutka-2479602a
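The evaluation protocol mentioned above, 5-fold cross-validation averaged over 20 repetitions of the fold selection, can be sketched as follows. This is a minimal illustration under stated assumptions: the one-dimensional synthetic data and the midpoint-threshold classifier are hypothetical stand-ins, not the MRI data or the model from the talk, and the function names are invented for this example.

```python
import random
from statistics import mean


def make_toy_data(n=100, seed=1):
    """Hypothetical 1-D data: class 1 is shifted upward by 2 units."""
    rng = random.Random(seed)
    ys = [i % 2 for i in range(n)]
    xs = [2.0 * y + rng.gauss(0, 1.0) for y in ys]
    return xs, ys


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_midpoint(xs, ys):
    """Toy classifier: threshold halfway between the two class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2.0, m1 >= m0


def predict_midpoint(model, x):
    threshold, one_is_high = model
    return int((x > threshold) == one_is_high)


def repeated_kfold_accuracy(xs, ys, k=5, repeats=20, seed=0):
    """Mean accuracy over `repeats` independent random k-fold splits."""
    rng = random.Random(seed)
    n = len(xs)
    repetition_scores = []
    for _ in range(repeats):
        fold_scores = []
        for test_idx in kfold_indices(n, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            # Fit on the training folds only; the held-out fold is never seen.
            model = fit_midpoint([xs[i] for i in train_idx],
                                 [ys[i] for i in train_idx])
            hits = sum(predict_midpoint(model, xs[i]) == ys[i] for i in test_idx)
            fold_scores.append(hits / len(test_idx))
        repetition_scores.append(mean(fold_scores))
    return mean(repetition_scores)


if __name__ == "__main__":
    xs, ys = make_toy_data()
    print(f"repeated 5-fold accuracy: {repeated_kfold_accuracy(xs, ys):.3f}")
```

Repeating the fold selection reduces the dependence of the estimate on any single random partition. If feature selection is part of the pipeline, it would normally be rerun inside each training fold so that the held-out fold does not influence which features are chosen.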
