The interest in our cosy Meetup group and its events keeps growing: a good reason to have another event before summer arrives and everybody leaves Berlin to relax.
The meeting's program is set up!
*) Thomas Vanck talks about Cross Validation and Model Selection
*) Alexander Kagoshima presents a case study: Deep Analytics on Traffic Data
*) Markus Frick asks for input on a real-world problem in the context of medical data: Clustering in Medical Records. The basic question is how to model hierarchical data for machine learning algorithms. Please find some details about the problem at the end of this text.
The event takes place at mbr targeting. Their cosy office is located at Hobrechtstr. 65, close to Hermannplatz.
I am looking forward to meeting you in June.
Details about "Clustering in Medical Records"
Application Scenario: We have a set of patient files (a patient file is all the data a clinic has gathered about a single patient). Such a patient file comprises the following kinds of data (these are the features):
*) demographic data like age, sex and formal data like admission date, length of the clinic stay, etc.
*) laboratory results - one such result has the form: (measurementType, value, unit, abnormalFlag) where measurementType defines the kind of measurement (like blood pressure) and abnormalFlag is something like (very low, low, normal, high, very high).
*) diagnoses - these are ICD-10-encoded diagnoses (e.g. I20.8 stands for angina pectoris); these have a "prefix" format, which means that I25.16 is a refinement of I25.1
*) semantic facts - formally, a semantic fact is represented as a small labelled tree, with the nodes denoting instances of medical concepts and the edges denoting relationships between them. The tree hasObservation:Seizure[hasAttribute:Tonic, hasAttribute:Clonic], i.e. a parent with two children, represents the fact that the patient has had a tonic-clonic seizure. Note that these concepts are organized hierarchically with respect to a "subclass" relationship (effectively a taxonomy).
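To make the feature kinds above concrete, here is a minimal Python sketch of how a lab result, the ICD-10 prefix rule and a semantic fact could be represented. All class, field and function names (LabResult, FactNode, is_refinement) are assumptions made for illustration and are not taken from an existing system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LabResult:
    measurement_type: str  # kind of measurement, e.g. "blood pressure"
    value: float
    unit: str
    abnormal_flag: str     # e.g. "very low", "low", "normal", "high", "very high"


@dataclass
class FactNode:
    relation: str          # label of the edge from the parent, e.g. "hasObservation"
    concept: str           # medical concept instance, e.g. "Seizure"
    children: List["FactNode"] = field(default_factory=list)


def is_refinement(code: str, parent: str) -> bool:
    """True if `code` refines `parent` under the plain prefix rule.

    This is only a string-prefix test; it does not check that either
    string is a valid ICD-10 code.
    """
    return code != parent and code.startswith(parent)


# The tonic-clonic seizure fact from the text: a parent with two children.
seizure = FactNode("hasObservation", "Seizure", [
    FactNode("hasAttribute", "Tonic"),
    FactNode("hasAttribute", "Clonic"),
])
```

With this representation, is_refinement("I25.16", "I25.1") holds, while I20.8 is not a refinement of I25.1.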
On the scientific/product side, we'd like to address a couple of questions with these data:
1.) given a patient and a selection of features, find similar patients (whatever "similarity" means here)
2.) given a condition (e.g. 30 < age < 70) on a single feature or on a set of features, can we learn that condition (i.e. find a classifier)?
3.) given a condition as above, can we refine the condition in such a way that the set of eligible patients remains similar (e.g. can we make the 70 above a 69)?
4.) could we generalize the conditions such that the number of eligible patients crosses a certain threshold (assume that we want to increase the number of eligible patients without giving up
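As a rough illustration of question 3, the sketch below treats a condition as a filter on age and compares the eligible sets before and after changing the upper bound from 70 to 69. The patient ids and ages are made up, and using the Jaccard index as the notion of "remains similar" is an assumption, not a given of the problem.

```python
def eligible(patients, low, high):
    """Ids of all patients satisfying the condition low < age < high."""
    return {pid for pid, age in patients.items() if low < age < high}


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up toy data: patient id -> age.
patients = {"p1": 34, "p2": 69, "p3": 70, "p4": 45, "p5": 82}

original = eligible(patients, 30, 70)  # condition 30 < age < 70
refined = eligible(patients, 30, 69)   # refined condition 30 < age < 69
overlap = jaccard(original, refined)   # 1.0 would mean the set is unchanged
```

On this toy data the refinement drops one patient (the 69-year-old), so the overlap is below 1. The same two helpers could be used for question 4 by widening the bounds and checking len(eligible(...)) against the desired threshold.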
Together with these questions, I'd like to discuss how these problems could/should be modelled, in particular the tree-like features with the underlying taxonomy.
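One possible starting point for that discussion, sketched under strong assumptions: flatten each semantic fact into a set of root-to-node path strings, and add a generalized copy of each path for every superclass of the node's concept. The taxonomy entries below (NeurologicalFinding, SeizureAttribute) are hypothetical, as are the function names; this is just one of several ways to turn trees into features that standard algorithms can consume.

```python
# Hypothetical "subclass" taxonomy: concept -> direct superclass.
TAXONOMY = {
    "Seizure": "NeurologicalFinding",
    "Tonic": "SeizureAttribute",
    "Clonic": "SeizureAttribute",
}


def ancestors(concept):
    """The concept followed by all its superclasses, most specific first."""
    chain = [concept]
    while chain[-1] in TAXONOMY:
        chain.append(TAXONOMY[chain[-1]])
    return chain


def path_features(tree, prefix=""):
    """Flatten a fact tree (relation, concept, children) into path strings.

    Each node contributes one path per concept in its ancestor chain;
    children are attached below the path of the specific concept only.
    """
    relation, concept, children = tree
    features = {f"{prefix}/{relation}:{c}" for c in ancestors(concept)}
    base = f"{prefix}/{relation}:{concept}"
    for child in children:
        features |= path_features(child, base)
    return features


tonic_clonic = ("hasObservation", "Seizure", [
    ("hasAttribute", "Tonic", []),
    ("hasAttribute", "Clonic", []),
])
tonic_only = ("hasObservation", "Seizure", [("hasAttribute", "Tonic", [])])

shared = path_features(tonic_clonic) & path_features(tonic_only)
```

Two facts that differ only in a leaf concept still share their generalized paths, so set-based similarity measures can take the taxonomy into account.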