AIRR community curation and standardised representation for immunoglobulin and T cell receptor germline sets.

Lees WD, Christley S, Peres A, Kos JT, Corrie B, Ralph D, Breden F, Cowell LG, Yaari G, Corcoran M, Karlsson Hedestam GB, Ohlin M, Collins AM, Watson CT, Busse CE, AIRR Community

Immunoinformatics (Amst) 10 [2023-06; online 2023-02-19]

Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.

Drug Discovery and Development (DDD) [Service]

PubMed 37388275

DOI 10.1016/j.immuno.2023.100025

mid: NIHMS1905398
pmc: PMC10310305
pii: 100025

