Genomic Characterization of the Barnacle Balanus improvisus Reveals Extreme Nucleotide Diversity in Coding Regions

Alm Rosenblad M, Abramova A, Lind U, Ólason P, Giacomello S, Nystedt B, Blomberg A

Mar Biotechnol - (-) - [2021-05-01; online 2021-05-01]

Barnacles are key marine crustaceans in several habitats, and they are a common practical problem because they cause biofouling on man-made marine constructions and ships. Despite their considerable ecological and economic impact, basic genomic knowledge is surprisingly scarce, and a barnacle reference genome is lacking. Here we set out to characterize the genome of the bay barnacle Balanus improvisus (= Amphibalanus improvisus) based on short-read whole-genome sequencing and experimental genome size estimation. We show both experimentally (DNA staining and flow cytometry) and computationally (k-mer analysis) that B. improvisus has a haploid genome size of ~740 Mbp. A pilot genome assembly yielded a total assembly size of ~600 Mbp and was highly fragmented, with an N50 of only 2.2 kbp. Further assembly-based and assembly-free analyses revealed that the very limited assembly contiguity is due to the B. improvisus genome having an extremely high nucleotide diversity (π) in coding regions (average π ≈ 5% and average π in fourfold degenerate sites ≈ 20%) and an overall high repeat content (at least 40%). We also report high variation in the α-octopamine receptor OctA (average π = 3.6%), which might increase the risk that barnacle populations evolve resistance to antifouling agents. The genomic features described here can help in planning a future high-quality reference genome, which is urgently needed to properly explore and understand proteins of interest in barnacle biology and marine biotechnology and to develop better antifouling strategies.
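
For readers unfamiliar with the two summary statistics quoted in the abstract, the following is a minimal illustrative sketch of how an assembly N50 and a nucleotide diversity (π) value are commonly defined. It is not the authors' pipeline: the function names `n50` and `nucleotide_diversity` are hypothetical, the toy inputs are invented for illustration, and π is computed here in its simplest form, as the mean proportion of differing sites over all pairs of pre-aligned, equal-length sequences.

```python
from itertools import combinations


def n50(lengths):
    """Return the contig length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def nucleotide_diversity(seqs):
    """Return pi as the average pairwise proportion of differing sites.

    Assumes the sequences are already aligned and of equal length
    (a simplification; real data need gap and missing-data handling).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions in every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Toy contig lengths and toy aligned sequences, invented for illustration.
    print(n50([5, 4, 3, 2, 1]))
    print(nucleotide_diversity(["ACGT", "ACGA", "TCGA"]))
```

Under this definition, a π of about 5% means that two randomly chosen copies of a coding region differ at roughly one site in twenty, which gives a sense of why a short-read assembler would struggle to merge haplotypes into long contigs.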

Bioinformatics Long-term Support WABI [Collaborative]

Bioinformatics Support, Infrastructure and Training [Collaborative]

NGI Stockholm (Genomics Applications)

NGI Stockholm (Genomics Production)

National Genomics Infrastructure

PubMed 33931810

DOI 10.1007/s10126-021-10033-8

Crossref 10.1007/s10126-021-10033-8