Usage#
Functional API (Recommended)#
The functional API is accessible from the root module, and is the easiest way to use SCEPTR. When using the functional API, you will be using the default SCEPTR model (see the model variants section below). To begin analysing TCR data with sceptr, you must first load the TCR data into memory in the prescribed format using pandas.
Tip
SCEPTR only recognises TCR V gene symbols that are IMGT-compliant, and also known to be functional (i.e. known pseudogenes or ORFs are not allowed). For easy standardisation of TCR gene nomenclature in your data, as well as filtering your data for functional V/J genes, check out tidytcells.
>>> from pandas import DataFrame
>>> tcrs = DataFrame(
...     data = {
...         "TRAV": ["TRAV38-1*01", "TRAV3*01", "TRAV13-2*01", "TRAV38-2/DV8*01"],
...         "CDR3A": ["CAHRSAGGGTSYGKLTF", "CAVDNARLMF", "CAERIRKGQVLTGGGNKLTF", "CAYRSAGGGTSYGKLTF"],
...         "TRBV": ["TRBV2*01", "TRBV25-1*01", "TRBV9*01", "TRBV2*01"],
...         "CDR3B": ["CASSEFQGDNEQFF", "CASSDGSFNEQFF", "CASSVGDLLTGELFF", "CASSPGTGGNEQYF"],
...     },
...     index = [0,1,2,3]
... )
>>> print(tcrs)
              TRAV                 CDR3A         TRBV            CDR3B
0      TRAV38-1*01     CAHRSAGGGTSYGKLTF     TRBV2*01   CASSEFQGDNEQFF
1         TRAV3*01            CAVDNARLMF  TRBV25-1*01    CASSDGSFNEQFF
2      TRAV13-2*01  CAERIRKGQVLTGGGNKLTF     TRBV9*01  CASSVGDLLTGELFF
3  TRAV38-2/DV8*01     CAYRSAGGGTSYGKLTF     TRBV2*01   CASSPGTGGNEQYF
calc_cdist_matrix#
As the name suggests, calc_cdist_matrix() gives you an easy way to calculate a cross-distance matrix between two sets of TCRs.
>>> import sceptr
>>> cdist_matrix = sceptr.calc_cdist_matrix(tcrs.iloc[:2], tcrs.iloc[2:])
>>> print(cdist_matrix)
[[1.2849896 0.7521934]
 [1.4653426 1.4646543]]
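To make the layout of the result concrete, here is a minimal sketch that reads the matrix printed above. It assumes the usual cross-distance convention (as in scipy's cdist): row i corresponds to the i-th TCR of the first argument and column j to the j-th TCR of the second. The values are copied from the output above into plain Python lists, and the names `query_index`, `reference_index` and `nearest` are illustrative only.

```python
# Values copied from the calc_cdist_matrix() output above.
cdist_matrix = [
    [1.2849896, 0.7521934],
    [1.4653426, 1.4646543],
]

# DataFrame index labels of the two sets that were passed in.
query_index = [0, 1]      # tcrs.iloc[:2]
reference_index = [2, 3]  # tcrs.iloc[2:]

# For each query TCR, find the reference TCR at the smallest distance.
nearest = {}
for q, row in zip(query_index, cdist_matrix):
    j = min(range(len(row)), key=row.__getitem__)
    nearest[q] = (reference_index[j], row[j])

print(nearest)  # {0: (3, 0.7521934), 1: (3, 1.4646543)}
```

Under that assumption, TCR 0 sits closest to TCR 3, which fits the fact that the two share a beta chain V gene and have alpha chain CDR3s differing by a single residue.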
calc_pdist_vector#
If you’re only interested in calculating distances within a set, calc_pdist_vector() gives you a one-dimensional array of within-set distances.
>>> pdist_vector = sceptr.calc_pdist_vector(tcrs)
>>> print(pdist_vector)
[1.4135991 1.2849895 0.75219345 1.4653426 1.4646543 1.287208 ]
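The one-dimensional result can be harder to read than a matrix. The sketch below assumes the vector follows scipy's pdist ordering, i.e. pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3) for four TCRs, and rebuilds a symmetric square matrix from the values printed above using only the standard library. For NumPy arrays, scipy.spatial.distance.squareform performs the same conversion.

```python
from itertools import combinations

# Values copied from the calc_pdist_vector() output above (4 TCRs -> 6 pairs).
pdist_vector = [1.4135991, 1.2849895, 0.75219345, 1.4653426, 1.4646543, 1.287208]
num_tcrs = 4

# Assumed ordering (scipy's pdist convention): upper triangle, row by row.
pairs = list(combinations(range(num_tcrs), 2))
assert len(pairs) == len(pdist_vector)  # n * (n - 1) / 2 entries

# Rebuild a symmetric square matrix with zeros on the diagonal.
square = [[0.0] * num_tcrs for _ in range(num_tcrs)]
for (i, j), d in zip(pairs, pdist_vector):
    square[i][j] = square[j][i] = d

print(square[0][3])  # 0.75219345, the distance between TCR 0 and TCR 3
```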
Tip
The end result of using the calc_cdist_matrix() and calc_pdist_vector() functions is equivalent to first generating sceptr’s TCR representations with calc_vector_representations(), then using scipy’s cdist or pdist functions to get the corresponding matrix or vector, respectively.
But on machines with CUDA-enabled GPUs, calling sceptr’s calc_cdist_matrix() and calc_pdist_vector() functions directly will run faster, as they run all computations on the GPU internally.
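The two-step route described in this tip might look like the sketch below. The helper names `two_step_cdist` and `two_step_pdist` are hypothetical, and the sketch assumes that scipy's default metric (Euclidean) is the intended one, since the tip names cdist and pdist without specifying a metric.

```python
def two_step_cdist(tcrs_a, tcrs_b):
    """Cross-distance matrix computed from explicit representations."""
    # Imported inside the function so that defining this sketch does not
    # itself require sceptr or scipy to be importable.
    import sceptr
    from scipy.spatial.distance import cdist

    reps_a = sceptr.calc_vector_representations(tcrs_a)
    reps_b = sceptr.calc_vector_representations(tcrs_b)
    # Assumption: scipy's default metric (Euclidean) is the right one here.
    return cdist(reps_a, reps_b)


def two_step_pdist(tcrs):
    """Condensed within-set distance vector from explicit representations."""
    import sceptr
    from scipy.spatial.distance import pdist

    reps = sceptr.calc_vector_representations(tcrs)
    # Same assumption about the metric as above.
    return pdist(reps)
```

With the tcrs DataFrame defined earlier, two_step_cdist(tcrs.iloc[:2], tcrs.iloc[2:]) should agree with calc_cdist_matrix() up to floating-point rounding if these assumptions hold.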
calc_vector_representations#
If you want to directly operate on sceptr’s TCR representations, you can use calc_vector_representations().
>>> reps = sceptr.calc_vector_representations(tcrs)
>>> print(reps.shape)
(4, 64)
calc_residue_representations#
The package also gives access to SCEPTR’s internal representations of each individual amino acid residue in the tokenised form of its input TCRs, as output by the penultimate layer of its self-attention stack.
To obtain them, use calc_residue_representations().
Please refer to the documentation for the ResidueRepresentations class for details on how to interpret the output.
>>> res_reps = sceptr.calc_residue_representations(tcrs)
>>> print(res_reps)
ResidueRepresentations[num_tcrs: 4, rep_dim: 64]
Model variants#
The sceptr.variant submodule allows users to access a variety of non-default SCEPTR model variants, and use them for TCR analysis.
The submodule exposes functions which return Sceptr objects with the model state of the chosen variant loaded.
These model instances expose the same functions as those used in the functional API, so you can just plug and play.
For example:
>>> from sceptr import variant
>>> sceptr_tiny = variant.tiny()
>>> tiny_reps = sceptr_tiny.calc_vector_representations(tcrs)
>>> print(tiny_reps.shape)
(4, 16)
Prescribed data format#
SCEPTR expects to receive TCR data in the form of pyrepseq standard format-compliant pandas DataFrames.
All TCR data should be represented as a DataFrame with the following structure and data types.
The column order is irrelevant.
Each row should represent one TCR.
Incomplete rows are allowed (e.g. only beta chain data available) as long as the SCEPTR variant that is being used has at least some partial information to go on.
Extra columns are also allowed.
| Column name | Column datatype | Column contents |
|---|---|---|
| TRAV | str | IMGT symbol for the alpha chain V gene |
| CDR3A | str | Amino acid sequence of the alpha chain CDR3, including the first C and last W/F residues, in all caps |
| TRBV | str | IMGT symbol for the beta chain V gene |
| CDR3B | str | Amino acid sequence of the beta chain CDR3, including the first C and last W/F residues, in all caps |
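The CDR3 conventions in the table can be checked before handing data to sceptr. The following is a minimal sketch, not sceptr's own validation: `looks_like_cdr3` is a hypothetical helper, and it assumes that "amino acid sequence" means the 20 standard one-letter codes. It does not check V gene symbols, and missing values in incomplete rows would need to be skipped separately.

```python
import re

# Assumption: only the 20 standard amino acid one-letter codes are valid.
CDR3_PATTERN = re.compile(r"C[ACDEFGHIKLMNPQRSTVWY]*[FW]")


def looks_like_cdr3(seq):
    """Return True if seq is all caps, starts with C and ends with F or W."""
    return isinstance(seq, str) and CDR3_PATTERN.fullmatch(seq) is not None


cdr3s = ["CAHRSAGGGTSYGKLTF", "CASSEFQGDNEQFF", "cassdgsfneqff", "ASSVGDLLTGELF"]
flags = [looks_like_cdr3(s) for s in cdr3s]
print(flags)  # [True, True, False, False]
```

The third sequence fails because it is lower case, and the fourth because it lacks the leading C.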