Data on haplotype-supported immunoglobulin germline gene inference.

Kirik U, Greiff L, Levander F, Ohlin M

Data Brief 13 (-) 620-640 [2017-08-00; online 2017-06-27]

Data defining IGHV (immunoglobulin heavy chain variable) germline gene inference from sequences of IgM-encoding transcriptomes, obtained by Illumina MiSeq sequencing, are described. Such inference is used to establish personalized germline gene sets for in-depth antibody repertoire studies and to detect new antibody germline genes from widely available immunoglobulin-encoding transcriptome data sets. Specifically, the data have been used to validate the inference process, as reported in "Parallel antibody germline gene and haplotype analyses support the validity of immunoglobulin germline gene inference and discovery" (DOI: 10.1016/j.molimm.2017.03.012) (Kirik et al., 2017) [1]. Validation was based on analysis of the association of the inferred germline genes with the donors' different haplotypes, as defined by their different expressed IGHJ alleles and/or IGHD genes/alleles. The data are important for the development of validated germline gene databases containing entries inferred from immunoglobulin-encoding transcriptome sequencing data sets, and for the generation of valid, personalized antibody germline gene repertoires.

Bioinformatics Compute and Storage [Service]

NGI Stockholm (Genomics Applications) [Service]

NGI Stockholm (Genomics Production) [Service]

National Genomics Infrastructure [Service]

PubMed 28725665

DOI 10.1016/j.dib.2017.06.031

Crossref 10.1016/j.dib.2017.06.031

pii: S2352-3409(17)30275-5
pmc: PMC5502703
