Scientist Regeneron Pharmaceuticals, New York, United States
Multi-omic single cell sequencing provides a route to improved predictive models for T cell receptor - antigen binding
Peter G. Hawkins*1, Wen Zhang*1, Jing He1, Namita T. Gupta1, Jinrui Liu1, Gabrielle Choonoo1, Se Jeong1, Calvin R. Chen1, Ankur Dhanik1, Stefan Semrau1, Myles Dillon1, Raquel Deering1, Lynn E. Macdonald1, Gavin Thurston1, Gurinder S. Atwal1
1Regeneron Pharmaceuticals Inc., Molecular Profiling and Data Science, Tarrytown, NY
T cells form an essential arm of the adaptive immune response, with their T cell receptors (TCRs) providing specific binding to antigens. There is a huge diversity of both TCRs and antigens [1], which makes relating TCRs measured in a sample to the antigens they bind a mathematically challenging task.
One method to assign antigen specificity to a TCR is to search reference libraries of known antigen-specific TCR sequences for clonotype matches. This requires that the exact clonotype has previously been observed binding to an antigen. Sequence-similarity-based metrics [2,3,4,5] can extend antigen-specificity assignment beyond exact clonotype matches. However, this approach remains limited by the size of the reference dataset linking TCRs to antigens, the requirement to search over the full dataset, and the ability of classical protein distance scoring to accurately label functionally similar TCRs. Experimentally, one can use low-throughput tetramer assays to validate the specificity of a TCR, but this requires strong prior knowledge of the TCR's expected cognate antigen.
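The reference-library lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited work: the sequences and antigen labels in REFERENCE are toy placeholders, the names edit_distance and assign_antigen are hypothetical, and plain Levenshtein distance stands in for the more elaborate similarity metrics of [2,3,4,5].

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy reference library: CDR3 amino-acid sequence -> antigen label.
# These entries are illustrative placeholders, not curated data.
REFERENCE: Dict[str, str] = {
    "CASSLGQAYEQYF": "antigen_A",
    "CASSIRSSYEQYF": "antigen_B",
    "CASSPGTGGNEQFF": "antigen_C",
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a stand-in similarity metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def assign_antigen(
    cdr3: str,
    reference: Dict[str, str],
    max_distance: int = 1,  # assumed threshold, chosen only for illustration
) -> Optional[Tuple[str, int]]:
    """Return (antigen, distance) for the closest reference entry, or None."""
    # Exact clonotype match: a single dictionary lookup.
    if cdr3 in reference:
        return reference[cdr3], 0
    # Similarity fallback: every reference sequence must be scanned.
    best: Optional[Tuple[str, int]] = None
    for ref_seq, antigen in reference.items():
        d = edit_distance(cdr3, ref_seq)
        if d <= max_distance and (best is None or d < best[1]):
            best = (antigen, d)
    return best


if __name__ == "__main__":
    print(assign_antigen("CASSLGQAYEQYF", REFERENCE))  # exact match
    print(assign_antigen("CASSLGQAYEQFF", REFERENCE))  # one substitution away
    print(assign_antigen("CAWWWWWWWWWF", REFERENCE))   # nothing within threshold
```

The fallback loop makes the stated limitations concrete: each query is compared against the whole library, and a query with no sufficiently close reference sequence receives no label.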
Recent developments in highly multiplexed, high-throughput dextramer-TCR binding sequencing provide a method to rapidly associate TCRs and antigens. We show that these assays, as part of a multi-omic approach combined with a novel computational framework to process these data, allow one to build large, reliable TCR-antigen datasets [6]. We harness these datasets to train machine learning models that learn the properties of TCRs with shared antigen specificity [6]. Once trained, these models predict the antigen specificity of new TCRs without the need to search over an entire dataset. We explore the predictive capabilities of these models and their ability to cluster TCRs into groups with different physical properties.
* Authors contributed equally
[1] Mora, Thierry, and Aleksandra M. Walczak. Current Opinion in Systems Biology 18 (2019): 104-110.
[2] Dash, Pradyot, et al. Nature 547.7661 (2017): 89-93.
[3] Glanville, Jacob, et al. Nature 547.7661 (2017): 94-98.
[4] Huang, Huang, et al. Nature Biotechnology 38.10 (2020): 1194-1202.
[5] Zhang, Hongyi, et al. Clinical Cancer Research 26.6 (2020): 1359-1371.
[6] Zhang, Wen, et al. Science Advances 7.20 (2021): eabf5835.