sceptr
SCEPTR is a small, fast, and performant TCR representation model for alignment-free TCR analysis. The root module provides easy access to SCEPTR through a functional API which uses the default model.
- sceptr.calc_cdist_matrix(anchors: DataFrame, comparisons: DataFrame) -> ndarray[tuple[Any, ...], dtype[float32]]
Generate a cdist matrix between two collections of TCRs.
- Parameters:
anchors (DataFrame) – DataFrame specifying the first (anchor) collection of input TCRs. It must be in the prescribed format.
comparisons (DataFrame) – DataFrame specifying the second (comparison) collection of input TCRs. It must be in the prescribed format.
- Returns:
A 2D numpy ndarray representing a cdist matrix between TCRs from anchors and comparisons. The returned array will have shape \((X, Y)\) where \(X\) is the number of TCRs in anchors and \(Y\) is the number of TCRs in comparisons.
- Return type:
NDArray[numpy.float32]
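As an illustration of how the \((X, Y)\) layout can be read, row \(i\) of the matrix holds the distances from anchor \(i\) to every comparison TCR. The sketch below finds the closest comparison for each anchor. Note that `nearest_comparison` is a hypothetical helper and not part of sceptr, and the toy matrix contains made-up numbers rather than real SCEPTR output.

```python
def nearest_comparison(cdist_matrix):
    """For each anchor (row), return the index of the closest comparison (column).

    Hypothetical helper, not part of sceptr. Accepts a nested list or any
    2D array whose rows support len() and integer indexing.
    """
    nearest = []
    for row in cdist_matrix:
        # Index of the smallest distance in this row; ties go to the lowest index.
        best = min(range(len(row)), key=lambda j: row[j])
        nearest.append(best)
    return nearest


# Made-up distances for 2 anchors (rows) and 3 comparisons (columns).
toy_cdist = [
    [0.9, 0.2, 1.4],
    [1.1, 1.3, 0.5],
]

print(nearest_comparison(toy_cdist))  # [1, 2]
```

The same function should also accept the array returned by sceptr.calc_cdist_matrix(), since it only iterates over rows and indexes into them.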
- sceptr.calc_pdist_vector(instances: DataFrame) -> ndarray[tuple[Any, ...], dtype[float32]]
Generate a pdist vector of distances between each pair of TCRs in the input data.
- Parameters:
instances (DataFrame) – DataFrame specifying the input TCRs. It must be in the prescribed format.
- Returns:
A 1D numpy ndarray representing a pdist vector of distances between each pair of TCRs in instances. The returned array will have shape \((\frac{1}{2}N(N-1),)\), where \(N\) is the number of TCRs in instances.
- Return type:
NDArray[numpy.float32]
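The length \(\frac{1}{2}N(N-1)\) is the number of unordered pairs among \(N\) TCRs. The sketch below maps a pair of row positions to an offset in such a vector. It assumes the vector is ordered the same way as the output of SciPy's `pdist`, i.e. pairs (0,1), (0,2), ..., (N-2,N-1); this page does not state the ordering, so treat that as an assumption to verify. `condensed_index` is a hypothetical helper and not part of sceptr.

```python
def condensed_index(i, j, n):
    """Offset of the pair (i, j) in a condensed distance vector over n items.

    Hypothetical helper, not part of sceptr. Assumes SciPy-style ordering:
    (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
    """
    if i == j:
        raise ValueError("a pdist vector holds no self-distances")
    if not (0 <= i < n and 0 <= j < n):
        raise IndexError("i and j must be in range(n)")
    if i > j:
        i, j = j, i  # each unordered pair is stored once, so order the pair
    # Pairs contributed by rows before i, plus the offset within row i.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


n = 4
print(n * (n - 1) // 2)          # 6 entries for 4 TCRs
print(condensed_index(0, 1, n))  # 0
print(condensed_index(1, 3, n))  # 4
print(condensed_index(3, 2, n))  # 5
```

If SciPy is available, `scipy.spatial.distance.squareform` performs the equivalent conversion to a full square matrix in one call.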
- sceptr.calc_residue_representations(instances: DataFrame) -> ResidueRepresentations
Map each TCR to a set of amino acid residue-level representations. The residue-level representations are the output of the penultimate self-attention layer, as also used by the average_pooling() variant when generating receptor-level representations.
- Parameters:
instances (DataFrame) – DataFrame specifying the input TCRs. It must be in the prescribed format.
- Returns:
An array of representation vectors for each amino acid residue in the tokenised forms of the input TCRs. For details on how to interpret/use this output, please refer to the documentation for ResidueRepresentations.
- Return type:
ResidueRepresentations
- sceptr.calc_vector_representations(instances: DataFrame) -> ndarray[tuple[Any, ...], dtype[float32]]
Map TCRs to their corresponding vector representations.
- Parameters:
instances (DataFrame) – DataFrame specifying the input TCRs. It must be in the prescribed format.
- Returns:
A 2D numpy ndarray object where every row vector corresponds to a row in instances. The returned array will have shape \((N, 64)\) where \(N\) is the number of TCRs in instances.
- Return type:
NDArray[numpy.float32]
- sceptr.disable_hardware_acceleration() -> None
Instruct SCEPTR to ignore hardware acceleration options and only use the CPU.
By default, SCEPTR will look for available hardware acceleration devices such as CUDA-enabled GPUs and perform computations there. However, in some cases it may be favourable to explicitly keep models on the CPU (e.g. a CUDA-enabled GPU is available but does not have sufficient VRAM for your use case). This function is useful for such scenarios. This setting can be reversed using sceptr.enable_hardware_acceleration().
Note
Toggling this setting will affect the behaviour of the functional API and any new variants instantiated after the function call. However, any variants instantiated before the call will remain unaffected. To disable hardware acceleration for existing model instances, use sceptr.model.Sceptr.disable_hardware_acceleration().
- sceptr.enable_hardware_acceleration() -> None
Instruct SCEPTR to detect and use available hardware acceleration, such as CUDA.
While hardware acceleration is toggled on by default, it can be turned off manually by calling sceptr.disable_hardware_acceleration(). This function allows you to turn the setting back on.
Note
Toggling this setting will affect the behaviour of the functional API and any new variants instantiated after the function call. However, any variants instantiated before the call will remain unaffected. To enable hardware acceleration for existing model instances, use sceptr.model.Sceptr.enable_hardware_acceleration().
- sceptr.setup(species: Literal['homosapiens', 'musmusculus'])
Set up the SCEPTR package for Homo sapiens / Mus musculus TCR data.
Caution
Mus musculus support is considered experimental. All current SCEPTR variants are trained only on Homo sapiens TCR data. Therefore, strictly speaking, Mus musculus TCR data should be considered out of distribution. How well the models work for inferences on Mus musculus TCRs is currently untested.
- Parameters:
species (str) – SCEPTR currently supports "homosapiens" or "musmusculus".
Examples
This experimental feature allows you to send Mus musculus TCR data through SCEPTR and produce representations for them. First, let’s prepare a toy set of two Mus musculus TCRs.
>>> import sceptr
>>> from pandas import DataFrame
>>> musmusculus_tcrs = DataFrame(
...     data = {
...         "TRAV": ["TRAV8D-1*01", "TRAV8-1*01"],
...         "CDR3A": ["CATDPRNNAGAKLTF", "CATETNNNAGAKLTF"],
...         "TRBV": ["TRBV12-1*01", "TRBV12-1*01"],
...         "CDR3B": ["CASSPRDWGSGEQYF", "CASSLGDWGNAEQFF"],
...     },
...     index = [0,1]
... )
>>> print(musmusculus_tcrs)
          TRAV            CDR3A         TRBV            CDR3B
0  TRAV8D-1*01  CATDPRNNAGAKLTF  TRBV12-1*01  CASSPRDWGSGEQYF
1   TRAV8-1*01  CATETNNNAGAKLTF  TRBV12-1*01  CASSLGDWGNAEQFF
Passing Mus musculus TCR data to SCEPTR without doing anything else will result in an error, since the package is set up by default to recognize and process human TCR gene symbols. We must therefore explicitly tell SCEPTR to switch to its experimental Mus musculus mode using the sceptr.setup() function.
>>> sceptr.setup("musmusculus")
SCEPTR is now ready to parse Mus musculus data!
>>> reps = sceptr.calc_vector_representations(musmusculus_tcrs)
>>> print(reps.shape)
(2, 64)
If you want to switch back to Homo sapiens mode, just call sceptr.setup() again with "homosapiens" as the argument.
>>> sceptr.setup("homosapiens")