Environ. Sci. Technol. 56 (22) 15508-15517 [2022-11-15; online 2022-10-21]
To achieve the water quality objectives of the zero pollution action plan in Europe, rapid methods are needed to identify the presence of toxic substances in complex water samples. However, only a small fraction of chemicals detected with nontarget high-resolution mass spectrometry (HRMS) can be identified, and even fewer have ecotoxicological data available. We hypothesized that ecotoxicological data could be predicted for unknown molecular features in data-rich HRMS spectra, thereby circumventing the time-consuming steps of molecular identification and rapidly flagging molecules of potentially high toxicity in complex samples. Here, we present MS2Tox, a machine learning method that predicts the toxicity of unidentified chemicals from high-resolution accurate mass tandem mass spectra (MS2). The MS2Tox model for fish toxicity was trained and tested on 647 lethal concentration (LC50) values from the CompTox database and validated for 219 chemicals and 420 MS2 spectra from MassBank. The root mean square error (RMSE) of MS2Tox predictions was below 0.89 log-mM, while the experimental repeatability of LC50 values in CompTox was 0.44 log-mM. MS2Tox allowed accurate prediction of fish LC50 values for 22 chemicals detected in water samples, and empirical evidence suggested the right directionality for another 68 chemicals. Moreover, by incorporating structural information, e.g., the presence of carbonyl-benzene, amide moieties, or hydroxyl groups, MS2Tox outperforms baseline models that use only the exact mass or log KOW.
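
To make the error metric concrete, the following minimal sketch shows how an RMSE in log-mM units could be computed from predicted and measured LC50 values. It assumes that "log-mM" means the base-10 logarithm of a concentration in mM. The function name `rmse_log_mM` and the example values are hypothetical and are not taken from the MS2Tox code or data.

```python
import math


def rmse_log_mM(predicted_mM, measured_mM):
    """RMSE between log10-transformed concentrations, both given in mM.

    Assumption: "log-mM" is read as log10 of the LC50 expressed in mM.
    """
    if len(predicted_mM) != len(measured_mM) or not predicted_mM:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log10(p) - math.log10(m)) ** 2
        for p, m in zip(predicted_mM, measured_mM)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical LC50 values in mM, for illustration only.
predicted = [0.10, 1.0, 10.0]
measured = [1.0, 1.0, 1.0]

error = rmse_log_mM(predicted, measured)
print(f"RMSE = {error:.2f} log-mM")
```

On this scale an error of 1 log-mM corresponds to a tenfold difference in LC50, so the reported RMSE of 0.89 log-mM is somewhat less than one order of magnitude.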