Edlund K, Larsson O, Ameur A, Bunikis I, Gyllensten U, Leroy B, Sundström M, Micke P, Botling J, Soussi T
Proc. Natl. Acad. Sci. U.S.A. 109 (24) 9551-9556 [2012-06-12; online 2012-05-24]
Cancer mutation databases are expected to play central roles in personalized medicine by providing targets for drug development and biomarkers to tailor treatments to each patient. The accuracy of reported mutations is a critical issue that is commonly overlooked, which leads to mutation databases that include a sizable number of spurious mutations, either sequencing errors or passenger mutations. Here we report an analysis of the latest version of the TP53 mutation database, including 34,453 mutations. By using several data-driven methods on multiple independent quality criteria, we obtained a quality score for each report contributing to the database. This score can now be used to filter for high-confidence mutations and reports within the database. Sequencing the entire TP53 gene from various types of cancer using next-generation sequencing with ultradeep coverage validated our approach for curation. In summary, 9.7% of all collected studies, mostly comprising numerous tumors with multiple infrequent TP53 mutations, should be excluded when analyzing TP53 mutations. Thus, by combining statistical and experimental analyses, we provide a curated mutation database for TP53 mutations and a framework for mutation database analysis.