Thörnqvist L, Ohlin M
Data Brief 19: 337-352 [2018-08; online 2018-05-04]
The highly variable complementarity-determining region 3 (CDR3) of antibodies is generated through recombination of immunoglobulin heavy chain variable (IGHV), diversity, and joining genes. The codons encoding the first residues of CDR3 may be derived directly from the IGHV germline gene, but they may also be generated as part of the rearrangement process. This article presents data on the nucleotide composition of these codons in rearranged genes. This composition indicates how much the IGHV gene contributes to CDR3 diversity. Analyzed data are presented for two unrelated sets of raw sequence data: sequences of antibody heavy chain-encoding transcripts from six allergic subjects (European Nucleotide Archive accession number PRJEB18926), and paired antibody heavy and light chain variable region-encoding transcripts from memory B cells of three subjects (European Nucleotide Archive accession numbers SRX709625, SRX709626, and SRX709627). The nucleotide compositions of the 5'-ends of the CDR3-encoding sequences are presented for transcripts originating from 47 different IGHV alleles. These data have been used (Thörnqvist and Ohlin, 2018) to demonstrate the extent to which the 3'-most bases of IGHV germline genes are incorporated into rearranged immunoglobulin-encoding sequences. They also show the extent to which differences in incorporation affect the specificity of inferring the 3'-end of IGHV genes from immunoglobulin-encoding transcripts. The data have also been used to assess how the observed gene differences affect the composition of the ascending strand of CDR3 in antibodies originating from different IGHV genes (Thörnqvist and Ohlin, 2018).
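A per-allele, per-position nucleotide composition of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that each transcript has already been assigned an IGHV allele and that its CDR3 nucleotide sequence has been extracted starting at the first CDR3 codon. The function name `composition_by_position`, the `(allele, cdr3)` input format, the default of six positions (the first two codons), and the allele labels in the demo are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

BASES = "ACGT"


def composition_by_position(
    records: Iterable[Tuple[str, str]], n_positions: int = 6
) -> Dict[str, List[Dict[str, float]]]:
    """Fraction of A/C/G/T at each of the first n_positions of CDR3, per allele.

    Hypothetical input: (IGHV allele label, CDR3 nucleotide sequence that
    starts at the first CDR3 codon). Non-ACGT characters (e.g. N) are ignored.
    """
    # One Counter per nucleotide position, created lazily for each allele.
    counts = defaultdict(lambda: [Counter() for _ in range(n_positions)])
    for allele, cdr3 in records:
        for i, base in enumerate(cdr3[:n_positions].upper()):
            if base in BASES:
                counts[allele][i][base] += 1

    # Convert raw counts to fractions; positions with no data become all 0.0.
    result: Dict[str, List[Dict[str, float]]] = {}
    for allele, per_position in counts.items():
        rows = []
        for counter in per_position:
            total = sum(counter.values())
            rows.append({b: (counter[b] / total if total else 0.0) for b in BASES})
        result[allele] = rows
    return result


if __name__ == "__main__":
    # Illustrative toy sequences only; these are not data from the study.
    demo = [
        ("allele_A", "GCGAGAGAT"),
        ("allele_A", "GCGAAAGGC"),
        ("allele_B", "GCAAGGTAC"),
    ]
    for allele, rows in composition_by_position(demo).items():
        print(allele, [round(row["G"], 2) for row in rows])
```

Comparing such profiles against the 3'-end of each germline IGHV allele would show at which positions the germline bases are typically retained and at which they are often replaced during rearrangement.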
European Nucleotide Archive: PRJEB18926 https://www.ebi.ac.uk/ena/data/view/PRJEB18926