Genomic structure of the horse major histocompatibility complex class II region resolved using PacBio long-read sequencing technology

Viļuma A, Mikko S, Hahn D, Skow L, Andersson G, Bergström TF

Sci Rep 7 (-) 45518 [2017-03-31; online 2017-03-31]

The mammalian Major Histocompatibility Complex (MHC) region contains several gene families characterized by highly polymorphic loci with extensive nucleotide diversity, copy number variation of paralogous genes, and long repetitive sequences. This structural complexity has made it difficult to construct a reliable reference sequence of the horse MHC region. In this study, we used long-read single-molecule, real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) to sequence eight Bacterial Artificial Chromosome (BAC) clones spanning the horse MHC class II region. The final assembly resulted in a 1,165,328 bp continuous, gap-free sequence with 35 manually curated genomic loci, of which 23 were considered to be functional and 12 to be pseudogenes. Compared with the MHC class II region in other mammals, the corresponding region in the horse shows extraordinary copy number variation and a different relative location and directionality of the Eqca-DRB, -DQA, -DQB and -DOB loci. This is the first long-read sequence assembly of the horse MHC class II region with rigorous manual gene annotation, and it will serve as an important resource for association studies of immune-mediated equine diseases and for evolutionary analysis of genetic diversity in this region.

Bioinformatics Compute and Storage [Service]

Bioinformatics Long-term Support WABI [Service]

NGI Uppsala (Uppsala Genome Center) [Service]


PubMed 28361880

DOI 10.1038/srep45518

Crossref 10.1038/srep45518