Thörnqvist L, Ohlin M
Mol. Immunol. 96 (-) 61-68 [2018-04-00; online 2018-02-28]
Inference of antibody gene repertoires using transcriptome data has emerged as an alternative approach to the complex process of sequencing adaptive immune receptor germline gene loci. The diversity introduced during rearrangement of immunoglobulin heavy chain variable (IGHV), diversity, and joining genes has, however, been identified as potentially affecting inference specificity. In this study, we have addressed this issue by analysing the nucleotide composition of unmutated human immunoglobulin heavy chain-encoding transcripts, focusing on the 3′-most bases of 47 IGHV germline genes. Although transcripts derived from some of the germline genes predominantly incorporated the germline-encoded base even at position 320, the last base of most IGHV genes, transcripts originating in other genes presented other nucleotides to the same extent at this position. In transcripts derived from two of the germline genes, IGHV3-13*01 and IGHV4-30-2*01, the predominating nucleotide (G) was in fact not that of the gene (A). Hence, we suggest that inference of IGHV genes should be limited to bases preceding nucleotide 320, as inference beyond this would jeopardize the specificity of the inference process. The different degree of incorporation of the final base of the IGHV gene directly influences the distribution of amino acids of the ascending strand of the third complementarity determining region of the heavy chain. Thereby it influences the nature of this specificity-determining part of the antibody population. In addition, we also present data that indicate the existence of a common, so far unrecognized allelic variant of IGHV3-7 that carries an A318G difference in relation to IGHV3-7*02.
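The per-position analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that reads have already been assigned to a germline gene and aligned to a common numbering, so that a 1-based alignment position such as 320 maps to the same column in every read. The function names, the gap characters, and the `min_fraction` threshold are hypothetical choices made for illustration only.

```python
from collections import Counter


def base_composition(aligned_reads, position):
    """Return the fraction of each base observed at a 1-based alignment position.

    Reads that are too short, or that carry a gap ('.' or '-') at the
    position, are skipped. Assumes all reads share one alignment numbering.
    """
    idx = position - 1
    counts = Counter(
        read[idx].upper()
        for read in aligned_reads
        if len(read) > idx and read[idx] not in ".-"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


def germline_base_supported(germline, aligned_reads, position, min_fraction=0.5):
    """Check whether the germline base predominates among reads at a position.

    min_fraction is an arbitrary illustrative threshold, not a value
    taken from the study.
    """
    composition = base_composition(aligned_reads, position)
    germline_base = germline[position - 1].upper()
    return composition.get(germline_base, 0.0) >= min_fraction


# Toy data: five-base sequences stand in for full-length alignments,
# so position 5 plays the role of the final germline base.
toy_germline = "ACGTA"
toy_reads = ["ACGTA", "ACGTG", "ACGTG", "ACGT."]
composition = base_composition(toy_reads, 5)
supported = germline_base_supported(toy_germline, toy_reads, 5)
```

In this toy case the non-germline base G predominates at the final position, which is the kind of pattern the abstract reports for IGHV3-13*01 and IGHV4-30-2*01. On real data the same tally would be run per germline gene over unmutated transcripts.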
European Nucleotide Archive: PRJEB18926 https://www.ebi.ac.uk/ena/data/view/PRJEB18926