“White box” machine learning

From “A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action,” Cell 177, 1–13 (2019):

Data-driven machine learning activities are poised to transform biological discovery and the treatment of human disease (Camacho et al., 2018, Wainberg et al., 2018, Webb, 2018, Yu et al., 2018a); however, existing techniques for extracting biological information from large datasets frequently encode relationships between perturbation and phenotype in opaque “black-boxes” that are mechanistically uninterpretable and, consequently, can only identify correlative as opposed to causal relationships (Ching et al., 2018). In natural systems, biological molecules are biochemically organized in networks of complex interactions underlying observable phenotypes; biological network models may therefore harbor the potential to provide mechanistic structure to machine learning activities, yielding transparent “white-box” causal insights (Camacho et al., 2018, Yu et al., 2018b).

Chemical and genetic screens are workhorses in modern drug discovery but frequently suffer from poor (1%–3%) hit rates (Roses, 2008). Such low hit rates often underpower the bioinformatic analyses used for causal inference because of limitations in biological information content. Experimentally validated network models possess the potential to expand the biological information content of sparse screening data; however, biological screening experiments are typically performed independently from network modeling activities, limiting subsequent analyses to either post hoc bioinformatic enrichment from screening hits or experimental validation of existing models. Therefore, there is a need to develop biological discovery approaches that integrate biochemical screens with network modeling and advanced data analysis techniques to enhance our understanding of complex drug mechanisms (Camacho et al., 2018, Wainberg et al., 2018, Xie et al., 2017). Here we develop one such approach and apply it to understanding antibiotic mechanisms of action.


Machine learning aims to generate predictive models from sets of training data; such activities are typically comprised of three parts: input data, output data, and the predictive model trained to compute output data from input data (Figure 1A; Camacho et al., 2018). Although modern machine learning methods can assemble high-fidelity input-output associations from training data, the functions comprising the resulting trained models often do not possess tangible biochemical analogs, rendering them mechanistically uninterpretable. Consequently, predictive models generated by such (black-box) machine learning activities are unable to provide direct mechanistic insights into how biological molecules are interacting to give rise to observed phenomena. To address this limitation, we developed a white-box machine learning approach, leveraging carefully curated biological network models to mechanistically link input and output data (Yu et al., 2018b).
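
To make the black-box/white-box distinction concrete, here is a toy sketch in Python. It is my own illustration, not the paper's method or data. It assumes a made-up two-pathway “network model” written as a fixed matrix; `network_map`, `simulate_network`, and the pathway labels are all hypothetical. The only point is that once inputs are passed through a mechanistic model, the fitted coefficients attach to named pathways rather than to anonymous features.

```python
import numpy as np

# Hypothetical screening inputs: 6 perturbations x 3 measured conditions.
rng = np.random.default_rng(0)
perturbations = rng.random((6, 3))

# Hypothetical "network model": a fixed linear map from the 3 conditions
# to the activity of 2 named pathways. A real curated network model would
# be far richer; this matrix is only a stand-in.
PATHWAYS = ["pathway_A", "pathway_B"]
network_map = np.array([
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
])

def simulate_network(inputs):
    """Map raw perturbations to pathway activities (toy stand-in)."""
    return inputs @ network_map

# Mechanistic features produced by the network model.
pathway_activity = simulate_network(perturbations)

# Synthetic phenotype that, by construction, depends only on pathway_A.
phenotype = 2.0 * pathway_activity[:, 0]

# Ordinary least squares on the pathway activities: each coefficient
# now corresponds to a named pathway.
coef, *_ = np.linalg.lstsq(pathway_activity, phenotype, rcond=None)

# Rank pathways by the magnitude of their fitted coefficient.
ranked = sorted(zip(PATHWAYS, coef), key=lambda item: -abs(item[1]))
for name, weight in ranked:
    print(f"{name}: {weight:.3f}")
```

A black-box model fitted directly to `perturbations` could predict `phenotype` just as well here, but its parameters would not point to a pathway. In this toy setup the regression recovers the dependence on `pathway_A` because the phenotype was built that way; real screens are noisy and nothing this clean should be expected.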

h/t Anne Trafton of MIT News, “Painting a Fuller Picture of How Antibiotics Act”:

Markus Covert, an associate professor of bioengineering at Stanford University, says the study is an important step toward showing that machine learning can be used to uncover the biological mechanisms that link inputs and outputs.

“Biology, especially for medical applications, is all about mechanism,” says Covert, who was not involved in the research. “You want to find something that is druggable. For the typical biologist, it hasn’t been meaningful to find these kinds of links without knowing why the inputs and outputs are linked.”