SCEPTR

SCEPTR (Simple Contrastive Embedding of the Primary sequence of T cell Receptors) is a small, fast, and accurate TCR representation model that can be used for alignment-free TCR analysis, including for TCR-pMHC interaction prediction and TCR clustering (metaclonotype discovery). Our manuscript demonstrates that SCEPTR can be used for few-shot TCR specificity prediction with improved accuracy over previous methods.

SCEPTR is a BERT-like transformer-based neural network implemented in PyTorch. The default model provides best-in-class performance with only 153,108 parameters (typical protein language models have tens or hundreds of millions), so SCEPTR runs fast, even on a CPU! And if your computer does have a CUDA-enabled GPU, the sceptr package will automatically detect and use it, giving you blazingly fast performance without the hassle.

sceptr’s API exposes four intuitive functions: calc_cdist_matrix(), calc_pdist_vector(), calc_vector_representations(), and calc_residue_representations(). These are all you need to make full use of the SCEPTR models. What’s even better is that they are fully compliant with pyrepseq’s tcr_metric API, so sceptr will fit snugly into the rest of your repertoire analysis toolkit.
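To give a feel for what the distance functions return, here is a minimal pure-Python sketch that does not call sceptr itself. It assumes the function names follow the SciPy cdist/pdist convention: a full anchors-by-comparisons matrix for cdist, and a condensed vector of unique pairs for pdist. The helper names, the made-up 2-D "embeddings", and the choice of Euclidean distance are all illustrative assumptions, not SCEPTR's actual representations or metric.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cdist_matrix(anchors, comparisons):
    """Full distance matrix: one row per anchor, one column per comparison."""
    return [[euclidean(a, c) for c in comparisons] for a in anchors]


def pdist_vector(instances):
    """Condensed distances over unique pairs, ordered (0,1), (0,2), ..., (1,2), ..."""
    return [euclidean(u, v) for u, v in combinations(instances, 2)]


# Made-up stand-ins for per-TCR vector representations (hypothetical values).
embeddings = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

matrix = cdist_matrix(embeddings[:1], embeddings)  # shape: 1 anchor x 3 comparisons
condensed = pdist_vector(embeddings)               # length: 3 * (3 - 1) / 2 = 3

print(matrix)     # [[0.0, 5.0, 10.0]]
print(condensed)  # [5.0, 10.0, 5.0]
```

With N instances, the condensed vector has N(N-1)/2 entries, which avoids storing the redundant symmetric half of a square matrix when comparing a repertoire against itself.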

Our model SCEPTR outperforms traditional protein language models and sequence alignment models on TCR specificity prediction.

Graphical abstract. Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction. In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance.
