Paralogization and New Protein Architectures in Planctomycetes Bacteria with Complex Cell Structures.

Mahajan M, Yee B, Hägglund E, Guy L, Fuerst JA, Andersson SGE

Mol. Biol. Evol. 37 (4) 1020-1040 [2020-04-01; online 2019-12-07]

Bacteria of the phylum Planctomycetes have a unique cell plan with an elaborate intracellular membrane system, thereby resembling eukaryotic cells. The origin and evolution of these remarkable features are debated. To study the evolutionary genomics of bacteria with complex cell architectures, we have resequenced the 9.2-Mb genome of the model organism Gemmata obscuriglobus and sequenced the 10-Mb genome of G. massiliana Soil9, the 7.9-Mb genome of CJuql4, and the 6.7-Mb genome of Tuwongella immobilis, all of which belong to the family Gemmataceae. A gene flux analysis of the Planctomycetes revealed a massive emergence of novel protein families at multiple nodes within the Gemmataceae. The expanded protein families have unique multidomain architectures composed of domains that are characteristic of prokaryotes, such as the sigma factor domain of extracytoplasmic sigma factors, and domains that have proliferated in eukaryotes, such as the WD40, leucine-rich repeat, tetratricopeptide repeat, and Ser/Thr kinase domains. Proteins with identifiable domains in the Gemmataceae are longer and have longer linkers than proteins in most other bacteria, and the analyses suggest that these traits were ancestrally present in the Planctomycetales. A broad comparison of protein length distribution profiles revealed an overlap between the longest proteins in prokaryotes and the shortest proteins in eukaryotes. We conclude that the many similarities between proteins in the Planctomycetales and the eukaryotes are due to convergent evolution and that there is no strict boundary between prokaryotes and eukaryotes with regard to features such as gene paralogy, protein length, and protein domain composition patterns.

Bioinformatics Support for Computational Resources [Service]

NGI Uppsala (Uppsala Genome Center) [Service]

National Genomics Infrastructure [Service]

PubMed 31808939

DOI 10.1093/molbev/msz287

pii: 5663460