SCEPTR
SCEPTR (Simple Contrastive Embedding of the Primary sequence of T cell Receptors) is a small, fast, and accurate TCR representation model that can be used for alignment-free TCR analysis, including for TCR-pMHC interaction prediction and TCR clustering (metaclonotype discovery). Our manuscript demonstrates that SCEPTR can be used for few-shot TCR specificity prediction with improved accuracy over previous methods.
SCEPTR is a BERT-like transformer-based neural network implemented in PyTorch. The default model provides best-in-class performance with only 153,108 parameters (typical protein language models have tens or hundreds of millions), so SCEPTR runs fast, even on a CPU! And if your computer does have a CUDA-enabled GPU, the sceptr package will automatically detect and use it, giving you blazingly fast performance without the hassle.
sceptr’s API exposes four intuitive functions: calc_cdist_matrix(), calc_pdist_vector(), calc_vector_representations(), and calc_residue_representations(). These are all you need to make full use of the SCEPTR models.
What’s even better is that they are fully compliant with pyrepseq’s tcr_metric API, so sceptr will fit snugly into the rest of your repertoire analysis toolkit.
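For readers unfamiliar with the cdist/pdist naming, here is a minimal sketch of the two output shapes those function names suggest. It does not call sceptr at all: the vectors are made-up toy values rather than SCEPTR embeddings, the helper names cdist_matrix and pdist_vector are hypothetical, and the use of Euclidean distance and of a SciPy-style condensed pair ordering are assumptions made for illustration only.

```python
import math
from itertools import combinations

# Toy stand-ins for fixed-length TCR representations.
# These are hypothetical values, NOT output from a SCEPTR model.
anchors = [[1.0, 0.0], [0.0, 1.0]]
comparisons = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cdist_matrix(xs, ys):
    """Cross distances: one row per item in xs, one column per item in ys."""
    return [[euclidean(x, y) for y in ys] for x in xs]


def pdist_vector(xs):
    """Condensed within-set distances: one entry per unordered pair (i < j)."""
    return [euclidean(xs[i], xs[j]) for i, j in combinations(range(len(xs)), 2)]


cross = cdist_matrix(anchors, comparisons)  # 2 rows x 3 columns
within = pdist_vector(comparisons)          # 3 entries: pairs (0,1), (0,2), (1,2)
```

A cdist-style result suits comparing a query set against a reference set, while a pdist-style result avoids storing the redundant half of a symmetric within-set matrix, which matters when clustering large repertoires.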
Graphical abstract. Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction. In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance.