A Machine Learning Approach to Explain Drug Selectivity to Soluble and Membrane Protein Targets.

Freyhult E, Gustafsson MG, Strömbergsson H

Mol Inform 34 (1) 44-52 [2015-01-00; online 2015-01-08]

An improved understanding of the forces that determine the specificity of drugs for their targets is important for drug design and discovery, as well as for gaining knowledge about molecular recognition. Here, we present a machine learning approach that includes all approved drugs with a known protein target. The drugs were characterized using easily interpretable physico-chemical descriptors. Employing the Random Forest method, we were able to predict whether a drug binds to a soluble or a membrane protein with an average accuracy of 84% and an average area under the curve (AUC) of 0.91. The high average performance suggests that general physico-chemical differences exist between drugs that bind to membrane protein targets and drugs that bind to soluble protein targets. Variable importance measures in combination with permutation tests were used to find the most influential descriptors. This resulted in six outstanding descriptors, all of which involve drug flexibility and lipophilicity, suggesting that drugs binding to membrane protein targets are in general more flexible and lipophilic and that, conversely, drugs binding to soluble protein targets are more rigid and hydrophilic. With the notion that ligands in general are blueprints of their protein pockets, we may also draw general conclusions about the protein-pocket properties, which may add to the understanding of molecular recognition.
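The idea of testing a descriptor's influence against a permutation null can be illustrated with a small sketch. This is not the authors' pipeline: instead of Random Forest variable importance, it scores a single descriptor by its AUC for separating the two target classes, and it estimates a p-value by shuffling the class labels. The data are synthetic, and the names `logp_like`, `noise`, `auc` and `permutation_p_value` are hypothetical, chosen only for illustration.

```python
import random


def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=500, seed=0):
    """Fraction of label shufflings whose AUC deviates from 0.5 at least as much as observed."""
    rng = random.Random(seed)
    observed = abs(auc(scores, labels) - 0.5)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(auc(scores, shuffled) - 0.5) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic toy data, NOT taken from the paper.
# Label 1 = drug with a membrane protein target, 0 = soluble protein target.
rng = random.Random(42)
labels = [1] * 30 + [0] * 30
# Assumed: a lipophilicity-like descriptor that is shifted upwards for class 1.
logp_like = [rng.gauss(3.0, 1.0) if y == 1 else rng.gauss(1.0, 1.0) for y in labels]
# Assumed: a descriptor unrelated to the class label.
noise = [rng.gauss(0.0, 1.0) for _ in labels]

for name, values in [("logp_like", logp_like), ("noise", noise)]:
    print(name, round(auc(values, labels), 3), permutation_p_value(values, labels))
```

A descriptor that carries class information should give an AUC far from 0.5 and a small permutation p-value, whereas an uninformative descriptor should not. In a Random Forest setting, the same shuffling logic would be applied to the forest's variable importance scores rather than to single-descriptor AUC values.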

Bioinformatics (NBIS)

Bioinformatics Support and Infrastructure

Bioinformatics Support, Infrastructure and Training

PubMed 27490861

DOI 10.1002/minf.201400121

Crossref 10.1002/minf.201400121