SLM develops Statistical Learning Algorithms for their use in medicine

The expanding collection and sharing of health-related data, increases in computational power, and advances in machine learning (ML) are hoped to enable the discovery of better ways to prevent, diagnose, and treat disease. In our clinical field of Radiation Oncology, ML has been applied to outcome prediction, quality assurance, auto-segmentation, image registration, image classification, and treatment planning, and it is poised to become an indispensable tool in our daily clinical workflows.
Despite these advances, Radiation Oncology faces many specific challenges: unique and complex datasets with multiple sources of information (e.g., comorbidities, 4DCT, CBCT, CT, dose, structures, setup, quality assurance, or genetic information), limited clinical outcome data, a lack of standard of care for many disease sites, the interaction of radiation and chemotherapy, limited access to genomics data, and the presence of confounders in many of our clinical datasets. When these challenges are paired with suboptimal algorithms, the indiscriminate deployment of models can compromise medicine's fundamental oath of primum non nocere. For instance, an artificial neural network (a non-interpretable algorithm) developed to triage patients with pneumonia for hospital discharge was found to inadvertently label asthmatic patients as low risk. Deploying this neural network could have had detrimental consequences for these patients, whereas physicians could easily have detected the error had an interpretable algorithm been used. Similar problems have been found in image classification tasks using deep learning, giving physicians a false sense of accuracy (e.g., a model used the label “portable” on X-ray images to predict an increased risk of cardiomyopathy, since patients who cannot move must have their X-rays taken at their beds).
Therefore, to make ML part of everyday clinical practice in Radiation Oncology and Medicine at large, a critical challenge is to increase the robustness and transparency of the models developed. Equally important is creating a set of tools, commissioning procedures, and a quality assurance program that let us detect population shifts relative to the data used to train the algorithms, as well as errors due to the presence of confounders. SLM is devoted to achieving these goals.
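
As an illustration of what detecting a population shift can look like in practice, here is a minimal sketch in Python that compares one feature in the training data against the same feature in newly arriving data using the two-sample Kolmogorov-Smirnov statistic. This is a generic textbook check, not SLM's quality assurance tooling; the function names, the toy "age" data, and the 0.05 significance level are all assumptions made for the example.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(reference, incoming):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    new = sorted(incoming)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the observed values.
    for x in ref + new:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_new = bisect_right(new, x) / len(new)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_new))
    return largest_gap


def shift_detected(reference, incoming, alpha=0.05):
    """Flag a shift when the KS statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(incoming)
    # Asymptotic critical value: c(alpha) * sqrt((n + m) / (n * m)),
    # where c(alpha) = sqrt(-0.5 * ln(alpha / 2)).
    critical = sqrt(-0.5 * log(alpha / 2)) * sqrt((n + m) / (n * m))
    return ks_statistic(reference, incoming) > critical


# Hypothetical toy data: patient ages seen at training time vs. at deployment.
training_ages = [50 + (i % 30) for i in range(200)]
deployment_ages = [60 + (i % 30) for i in range(200)]
print(shift_detected(training_ages, deployment_ages))
```

A check like this looks at one feature at a time and says nothing about confounders; it is only meant to show the kind of monitoring that a commissioning and quality assurance program could include.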

Theoretical Contributions:

In collaboration with the Penn Computer Science Department and the Stanford Statistics Department, we developed MediBoost, an algorithm that improves the accuracy of the most popular decision tree algorithm (CART) while keeping the same topology and, as such, its interpretability. This algorithm was further extended in one of our hallmark publications to show how it unifies two of the most popular frameworks for building ML models: CART and Gradient Boosting. This new framework was called “The Additive Tree” and, due to its impact on the accuracy and interpretability of decision trees, and the importance of the latter in medicine, we believe that it opens a new era of research on Decision Tree algorithms. Additionally, in collaboration with the Berkeley Biostatistics and Statistics Departments, we have developed the Conditional Interpretable Super Learner (CiSL), an algorithm that removes the topological constraints that interpretable algorithms have while still building a transparent model (under preparation for submission). Further, in this work we show for the first time how it is possible to learn in the cross-validation space and improve on widely popular techniques like stacking. We believe that CiSL, given its characteristics, is especially important for the analysis of structured clinical trial data and dynamic treatment allocation. A large part of our future intellectual activity will be dedicated to applying CiSL to Radiation Oncology clinical trials to optimize treatment selection. We have also led a team that has created the Expert Augmented Machine Learning (EAML) framework, the first platform that effectively combines physician and AI knowledge to improve on both. For a detailed description of the algorithm, watch the following YouTube video:

 

To tackle problems that are multimodal in nature, we have developed Representational Gradient Boosting (RGB), the first meta-algorithm of its type that lets users simultaneously optimize state-of-the-art algorithms such as convolutional neural networks (CNNs) and gradient boosting (GB). If interested, here is a video explaining the framework:

 

Recently, we have been interested in providing neural networks (NNs) with feature and architecture selection capabilities. For this, we have developed Lockout, an extension of the path-seeking ideas behind the popular package glmnet to nonlinear functions. With Lockout, the strength of the regularization parameter does not need to be specified; it is found automatically by the optimizer. This opens a new era of effective regularization for NNs. If interested, you can watch a description of the algorithm here:

 

Applied Contributions: 

We have also been widely interested in the applications of Machine Learning to Quality Assurance (QA). In this area, we have pioneered the use of predictive models for QA in Radiation Therapy. Specifically, Dr. Valdes was one of the first authors to apply Machine Learning to Quality Assurance data in Radiation Oncology with the goal of improving patient safety. We developed ML models that predict errors in the imaging systems of the Linacs, a key factor in the delivery of accurate radiation treatments. Additionally, we developed and validated the concept of Virtual IMRT QA, an application that enables safe pre-treatment radiation therapy plan verification. Virtual IMRT QA will play a key role in the safe introduction of Adaptive Radiation Therapy, one of the frontiers for Radiation Therapy in the next decade. A good part of our applied research program is aimed at deploying Virtual IMRT QA into clinical practice and enabling Adaptive Radiation Therapy.