PhD Student, Intelligent Systems Program, University of Pittsburgh
30 Sep 2020 - Arun Balajiee
Data in clinical practice is biased and can only be collected or acquired once throughout the experiment. The data is high-dimensional, unstructured, and multi-modal (it has to be interpreted from multiple perspectives). The problem that Dr. van der Schaar et al. are trying to solve is to build an automated ML framework that can deploy the best ML model for a clinical dataset while also being interpretable and easy to use for clinicians, medical data scientists, and others who are not experts in machine learning. Some existing model choices are based on the best clinical scores of the candidate models, that is, on picking the model that predicts most accurately on that dataset. But these approaches employ post-hoc methods to identify the best-fit model for a given dataset based on the AUC and other metrics. The goal of Dr. van der Schaar et al. is to build a system that can predict the model in advance, one that is better than any individual model from the ensemble and whose performance can be measured using many metrics.
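To make the post-hoc selection described above concrete, here is a minimal sketch in plain Python. It is not code from the talk or from AutoPrognosis: the function names, the model names, and the toy validation data are all made up for illustration. It scores each already-trained candidate by AUC on held-out labels and keeps the highest-scoring one.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def select_best_model(labels, candidate_scores):
    """Post-hoc selection: evaluate every candidate, then keep the best AUC."""
    aucs = {name: auc(labels, s) for name, s in candidate_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Hypothetical validation labels and predicted risks from two candidates.
labels = [1, 0, 1, 1, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.5, 0.1],
    "model_b": [0.6, 0.7, 0.5, 0.4, 0.3, 0.8],
}

best_name, aucs = select_best_model(labels, candidate_scores)
print(best_name, aucs)
```

Note that every candidate has to be trained and evaluated before the choice can be made, which is exactly the cost that predicting the model in advance is meant to avoid.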
The most direct approach, brute-force selection, is not practically feasible for this use case. Instead, the approach is to break the big problem of building such a system into four sub-problems and attack them individually. The first problem is to build a predictive model to automate risk predictions and get a holistic view of patient health. For this, van der Schaar et al. designed the AutoPrognosis system, covering its design, implementation, and performance analysis. However, to solve this problem, the dimensionality of the training and testing datasets has to be tackled first. For this, van der Schaar et al. discuss a modification to Bayesian optimization techniques with structured kernel learning. The next problem that she tackles is to build the “Survival Model”, which can predict consistently over a period of time and across different datasets. The difficulty here is that there is no “best” survival model for all cases; van der Schaar et al. address this with a quilting strategy. Finally, van der Schaar discussed time series data and individual treatment recommendation.
The entire talk was centered on and inspired by the field of recommender systems, as Dr. van der Schaar personally commented. Towards the end of the talk there was an enriching discussion on recommender systems, the parallels with that field, and its applications to the interpretability, explainability, and trustworthiness of models when designing an automated machine learning system.