Relating enhancer genetic variation across mammals to complex phenotypes using machine learning.

Kaplow IM, Lawler AJ, Schäffer DE, Srinivasan C, Sestili HH, Wirthlin ME, Phan BN, Prasad K, Brown AR, Zhang X, Foley K, Genereux DP, Zoonomia Consortium, Karlsson EK, Lindblad-Toh K, Meyer WK, Pfenning AR

Science 380 (6643) eabm7993 [2023-04-28; online 2023-04-28]

Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to associate motor cortex and parvalbumin-positive interneuron enhancers with neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype in any large group of species with aligned genomes.

Bioinformatics Support for Computational Resources [Service]

NGI Short read [Service]

NGI Uppsala (SNP&SEQ Technology Platform) [Service]

National Genomics Infrastructure [Service]

PubMed 37104615

DOI 10.1126/science.abm7993

Crossref 10.1126/science.abm7993

mid: NIHMS1897368
pmc: PMC10322212

