User Functions
User functions from DigitalCellSorter.core.DigitalCellSorter class.
Note
All of the tools listed in this section are intended to be used from an
instance of the DigitalCellSorter class. For example:
import os
import DigitalCellSorter

DCS = DigitalCellSorter.DigitalCellSorter()
DCS.dataName = 'my_data_name'
DCS.saveDir = os.path.join(os.path.dirname(__file__), 'output', DCS.dataName, '')
# raw_data: pandas.DataFrame, pandas.Series, or path to a csv file with expression data
DCS.prepare(raw_data)
DCS.process()
DCS.makeIndividualGeneExpressionPlot('CCL5')
DCS.makeIndividualGeneTtestPlot('CCL5', analyzeBy='celltype')
cells = DCS.getCells(celltype='T cell')
DCS.makeAnomalyScoresPlot(cells)
# ...
Calling these functions directly from the modules where they are defined may result in undefined behavior.
Description of the package functionality
The main class of DigitalCellSorter. The class includes tools for:
Pre-processing of single cell RNA sequencing data
Quality control
Batch effects correction
Cell anomaly score evaluation
Dimensionality reduction
Clustering
Annotation of cell types
Visualization
Post-processing
Primary tools
Primary tools are used for pre-processing of the input data, quality control, batch correction, dimensionality reduction, clustering and cell type annotation.
Note
We recommend using only the functions prepare(), process(), and
visualize() of the Primary tools. The entire processing workflow is contained
within process().
To modify the workflow, use the other components of the
Primary tools, such as cluster(), project(), etc.
References to DigitalCellSorter class:
| prepare() | Prepare pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure. |
| convert() | Convert index to hugo names; if any names in the index are duplicated, remove duplicates |
| clean() | Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns |
| project() | Project pandas.DataFrame to lower dimensions |
| cluster() | Cluster PCA-reduced data into a desired number of clusters |
| annotate() | Produce cluster voting results, annotate cell types, and update marker expression with cell type labels |
| process() | Process data before using any annotation or visualization functions |
| visualize() | Aggregate of visualization tools of this class. |
Function prepare(): prepare input data for function process()
-
DigitalCellSorter.prepare(obj)[source] Prepare pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.
- Parameters:
- obj: str, pandas.DataFrame, pandas.Series
Expression data in the form of a pandas.DataFrame, a pandas.Series, or the name and path of a csv file with data
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.prepare('data.csv')
Function convert(): convert gene index of a DataFrame prepared by function prepare()
from one naming convention to another
-
DigitalCellSorter.convert(nameFrom=None, nameTo=None, **kwargs)[source] Convert index to hugo names; if any names in the index are duplicated, remove duplicates
- Parameters:
- nameFrom: str, Default ‘alias’
Gene name type to convert from
- nameTo: str, Default ‘hugo’
Gene name type to convert to
- **kwargs: dict
Any parameters that function ‘mergeIndexDuplicates’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.convert()
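To make the conversion step concrete, here is a minimal sketch in plain Python rather than the DigitalCellSorter implementation. The helper `convert_index`, the tiny alias table, the choice to leave unknown names unchanged, and the choice to keep the first occurrence of a duplicated name are all assumptions made for illustration.

```python
def convert_index(genes, alias_to_hugo):
    """Map gene names through a lookup table, then drop duplicated names.

    Hypothetical helper: names missing from the table are left unchanged,
    and the first occurrence of a duplicate is kept (both are assumptions).
    """
    converted = [alias_to_hugo.get(gene, gene) for gene in genes]
    seen = set()
    unique = []
    for gene in converted:
        if gene not in seen:
            seen.add(gene)
            unique.append(gene)
    return unique


# Illustrative alias table; a real one would come from a gene-name database.
alias_to_hugo = {"P53": "TP53", "CD8": "CD8A"}
print(convert_index(["P53", "TP53", "CD8", "CD19"], alias_to_hugo))
# prints ['TP53', 'CD8A', 'CD19']
```

Here 'P53' and 'TP53' collapse into one entry after conversion, which is the situation the duplicate removal is meant to handle.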
Function clean(): validate index, replace missing with zeros,
remove all-zero rows and columns of a DataFrame
-
DigitalCellSorter.clean()[source] Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.clean()
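The cleaning steps named above can be illustrated without the library. This is a minimal sketch in plain Python, not the DigitalCellSorter implementation: the helper `clean_rows`, the list-of-pairs table layout, the index validation rule, and keeping the first occurrence of a duplicated gene are assumptions made for illustration.

```python
import math


def clean_rows(rows):
    """Clean a list of (gene_name, values_per_cell) pairs (hypothetical layout)."""
    seen = set()
    cleaned = []
    for gene, values in rows:
        # Validate the index: skip names that are not non-empty strings (assumed rule).
        if not isinstance(gene, str) or not gene:
            continue
        # Remove index duplicates: keep the first occurrence (assumed policy).
        if gene in seen:
            continue
        seen.add(gene)
        # Replace missing values (None or NaN) with zeros.
        filled = [
            0.0 if v is None or (isinstance(v, float) and math.isnan(v)) else float(v)
            for v in values
        ]
        cleaned.append((gene, filled))
    # Remove all-zero rows (genes).
    cleaned = [(g, v) for g, v in cleaned if any(x != 0.0 for x in v)]
    # Remove all-zero columns (cells); `kept` lists the surviving column positions.
    n_cells = len(cleaned[0][1]) if cleaned else 0
    kept = [j for j in range(n_cells) if any(v[j] != 0.0 for _, v in cleaned)]
    return [(g, [v[j] for j in kept]) for g, v in cleaned], kept


rows = [
    ("CD4", [1, None, 0]),
    ("CD4", [5, 5, 0]),             # duplicated gene name, dropped
    ("CD8A", [0, float("nan"), 0]), # all zero after filling, dropped
    ("CD19", [0, 2, 0]),
]
table, kept = clean_rows(rows)
print(table)  # prints [('CD4', [1.0, 0.0]), ('CD19', [0.0, 2.0])]
print(kept)   # prints [0, 1]
```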
Function normalize(): rescale all cells, log-transform data,
remove constant genes, and sort index of a DataFrame
-
DigitalCellSorter.normalize(median=None)[source] Normalize pandas.DataFrame: rescale all cells, log-transform data, remove constant genes, sort index
- Parameters:
- median: float, Default None
Scale factor; if not provided, it is computed as the median across all cells in the data
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.normalize()
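The normalization steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the helper `normalize_table` and the dictionary layout are hypothetical, each cell is assumed to be rescaled so that its total count equals the scale factor, and log2(x + 1) is an assumed choice of log transform.

```python
import math
import statistics


def normalize_table(table, median=None):
    """table: dict mapping gene name -> list of counts per cell (hypothetical layout).

    Assumes every cell has a non-zero total, e.g. after all-zero cells were removed.
    """
    genes = list(table)
    n_cells = len(table[genes[0]])
    totals = [sum(table[g][j] for g in genes) for j in range(n_cells)]
    # Scale factor: median of the per-cell totals unless provided by the caller.
    if median is None:
        median = statistics.median(totals)
    result = {}
    for g in genes:
        # Rescale each cell so that its total equals the scale factor.
        scaled = [table[g][j] * median / totals[j] for j in range(n_cells)]
        # Log-transform (assumed form: log2 of value plus one).
        logged = [math.log2(x + 1.0) for x in scaled]
        # Remove constant genes: keep a gene only if its values differ across cells.
        if len(set(logged)) > 1:
            result[g] = logged
    # Sort the index (gene names).
    return dict(sorted(result.items()))


normalized = normalize_table({"B": [2, 4], "A": [2, 0], "C": [0, 0]})
print(list(normalized))  # prints ['A', 'B']
```

Gene 'C' is dropped because it is constant across cells, and the remaining genes are returned in sorted order.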
Function project(): project data to lower dimensions
-
DigitalCellSorter.project(PCAonly=False, do_fast_tsne=True)[source] Project pandas.DataFrame to lower dimensions
- Parameters:
- PCAonly: boolean, Default False
Perform Principal component analysis only
- do_fast_tsne: boolean, Default True
Do FI-tSNE instead of “exact” tSNE. This option is ignored if layout is not ‘TSNE’
- Returns:
- tuple
Processed data
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
xPCA, PCs, tSNE = DCS.project()
Function cluster(): cluster PCA-reduced data into a desired number of clusters
-
DigitalCellSorter.cluster()[source] Cluster PCA-reduced data into a desired number of clusters
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.cluster()
Function annotate(): produce cluster voting results
-
DigitalCellSorter.annotate(mapNonexpressedCelltypes=True)[source] Produce cluster voting results, annotate cell types, and update marker expression with cell type labels
- Parameters:
- mapNonexpressedCelltypes: boolean, Default True
If True then cell type coloring will be consistent across all datasets, regardless of what cell types are annotated in all datasets for a given input marker list file.
- Returns:
- dictionary
Voting results, a dictionary in form of: {cluster label: assigned cell type}
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
results = DCS.annotate()
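Since the return value is a plain dictionary of the form {cluster label: assigned cell type}, it can be combined with per-cell cluster labels using ordinary Python. The sketch below uses made-up values and hypothetical variable names; it does not call the library.

```python
from collections import Counter

# Made-up voting results in the documented form {cluster label: assigned cell type}.
voting_results = {0: "T cell", 1: "B cell", 2: "T cell"}

# Made-up cluster assignment of individual cells (hypothetical cell identifiers).
cell_clusters = {"cell_a": 0, "cell_b": 2, "cell_c": 1}

# Look up the assigned cell type of every cell through its cluster label.
cell_types = {cell: voting_results[cluster] for cell, cluster in cell_clusters.items()}

# Count how many cells received each cell type.
cells_per_type = Counter(cell_types.values())

print(cell_types["cell_b"])      # prints T cell
print(cells_per_type["T cell"])  # prints 2
```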
Function process(): main function
-
DigitalCellSorter.process(dataIsNormalized=False, cleanData=True)[source] Process data before using any annotation or visualization functions
- Parameters:
- dataIsNormalized: boolean, Default False
Whether DCS.df_expr is normalized or not
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
Function visualize(): make all default plots to visualize the results
of function process()
-
DigitalCellSorter.visualize()[source] Aggregate of visualization tools of this class.
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.visualize()
Extraction tools
Warning
Use these functions only after process()
References to DigitalCellSorter class:
| getExprOfGene() | Get expression of a gene. |
| getExprOfCells() | Get expression of a set of cells. |
| getCells() | Get cell annotations in the form of pandas.Series |
| getAnomalyScores() | Function to get anomaly score of cells based on some reference set |
| getNewMarkerGenes() | Extract new marker genes based on the cluster annotations |
| getIndexOfGoodQualityCells() | Get index of cells that satisfy the QC criteria |
| getCountsDataframe() | Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series |
Function getExprOfGene(): Get expression of a gene
-
DigitalCellSorter.getExprOfGene(gene, analyzeBy='cluster')[source] Get expression of a gene. Run this function only after function process()
- Parameters:
- gene: str
Name of gene of interest
- analyzeBy: str, Default ‘cluster’
What level of labels to include. Other possible options are ‘label’ and ‘celltype’
- Returns:
- pandas.DataFrame
With expression of the cells of interest
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.getExprOfGene('SDC1')
Function getExprOfCells(): Get expression of a set of cells
-
DigitalCellSorter.getExprOfCells(cells)[source] Get expression of a set of cells. Run this function only after function process()
- Parameters:
- cells: pandas.MultiIndex
2-level Index of cells of interest, must include levels ‘batch’ and ‘cell’
- Returns:
- pandas.DataFrame
With expression of the cells of interest
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.getExprOfCells(cells)
Function getCells(): get cells index by celltype, clusterIndex or clusterName
-
DigitalCellSorter.getCells(celltype=None, clusterIndex=None, clusterName=None)[source] Get cell annotations in the form of pandas.Series
- Parameters:
- celltype: str, Default None
Cell type to extract
- clusterIndex: int, Default None
Index of the cluster to extract
- clusterName: str, Default None
Name of the cluster to extract
- Returns:
- pandas.MultiIndex
Index of labelled cells
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
labels = DCS.getCells()
Function getAnomalyScores(): get anomaly score of cells based on some reference set
-
DigitalCellSorter.getAnomalyScores(trainingSet, testingSet, printResults=False)[source] Function to get anomaly score of cells based on some reference set
- Parameters:
- trainingSet: pandas.DataFrame
With cells to train isolation forest on
- testingSet: pandas.DataFrame
With cells to score
- printResults: boolean, Default False
Whether to print results
- Returns:
- 1d numpy.array
Anomaly score(s) of tested cell(s)
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
scores = DCS.getAnomalyScores(df_expr.iloc[:, 5:], df_expr.iloc[:, :5])
Function getNewMarkerGenes(): extract new markers from the annotated clusters and produce plot of the new markers
-
DigitalCellSorter.getNewMarkerGenes(cluster=None, top=100, zScoreCutoff=None, removeUnknown=False, **kwargs)[source] Extract new marker genes based on the cluster annotations
- Parameters:
- cluster: int, Default None
Cluster #; if provided, genes of only this cluster will be returned
- top: int, Default 100
Upper bound for number of new markers per cell type
- zScoreCutoff: float, Default 0.3
Lower bound for a marker z-score to be significant
- removeUnknown: boolean, Default False
Whether to remove type “Unknown”
- **kwargs: dict
Any parameters that function ‘makePlotOfNewMarkers’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.getNewMarkerGenes()
Function getIndexOfGoodQualityCells(): Get index of cells that satisfy the QC criteria
-
DigitalCellSorter.getIndexOfGoodQualityCells(QCplotsSubDir='QC_plots', **kwargs)[source] Get index of cells that satisfy the QC criteria
- Parameters:
- count_depth_cutoff: float, Default 0.5
Fraction of median to take as count depth cutoff
- number_of_genes_cutoff: float, Default 0.5
Fraction of median to take as number of genes cutoff
- mitochondrial_genes_cutoff: float, Default 3.0
The cutoff is median + standard_deviation * this_parameter
- **kwargs: dict
Any parameters that function ‘makeQualityControlHistogramPlot’ can accept
- Returns:
- pandas.Index
Index of cells
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
index = DCS.getIndexOfGoodQualityCells()
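The three cutoffs described in the parameters can be worked through on toy numbers. The following is a sketch in plain Python, not the library's implementation: the helper `good_quality_cells` and the dictionary inputs are hypothetical, and it assumes that cells pass when count depth and number of genes are at or above their cutoffs and the mitochondrial fraction is at or below its cutoff. Using the population standard deviation is also an assumption.

```python
import statistics


def good_quality_cells(count_depth, n_genes, mito_fraction,
                       count_depth_cutoff=0.5,
                       number_of_genes_cutoff=0.5,
                       mitochondrial_genes_cutoff=3.0):
    """Each input is a dict mapping cell id -> QC measure (hypothetical layout)."""
    # Count depth cutoff: a fraction of the median count depth.
    depth_min = count_depth_cutoff * statistics.median(count_depth.values())
    # Number of genes cutoff: a fraction of the median number of genes.
    genes_min = number_of_genes_cutoff * statistics.median(n_genes.values())
    # Mitochondrial cutoff: median + standard_deviation * parameter.
    mito_values = list(mito_fraction.values())
    mito_max = (statistics.median(mito_values)
                + mitochondrial_genes_cutoff * statistics.pstdev(mito_values))
    # Keep cells that satisfy all three criteria (assumed directions of comparison).
    return [cell for cell in count_depth
            if count_depth[cell] >= depth_min
            and n_genes[cell] >= genes_min
            and mito_fraction[cell] <= mito_max]


count_depth = {"c1": 1000, "c2": 1200, "c3": 100, "c4": 900}
n_genes = {"c1": 500, "c2": 600, "c3": 400, "c4": 100}
mito_fraction = {"c1": 0.05, "c2": 0.04, "c3": 0.05, "c4": 0.06}
print(good_quality_cells(count_depth, n_genes, mito_fraction))
# prints ['c1', 'c2']
```

In this toy input 'c3' fails the count depth cutoff (100 < 0.5 * 950) and 'c4' fails the number of genes cutoff (100 < 0.5 * 450).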
Function getCountsDataframe(): Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series
-
DigitalCellSorter.getCountsDataframe(se1, se2, tagForMissing='N/A')[source] Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series
- Parameters:
- se1: pandas.Series
Series with the first set of items
- se2: pandas.Series
Series with the second set of items
- tagForMissing: str, Default ‘N/A’
Label to assign to non-overlapping items
- Returns:
- pandas.DataFrame
Contains counts
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
df = DCS.getCountsDataframe(se1, se2)
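The idea of cross-counts between two labelings can be shown with plain dictionaries. This sketch is not the library's implementation: the helper `cross_counts` is hypothetical, the inputs are dictionaries rather than pandas.Series, and it assumes that an item present in only one input is counted under the missing-value tag for the other.

```python
from collections import Counter


def cross_counts(se1, se2, tag_for_missing="N/A"):
    """Count how often each pair of labels co-occurs on the same item.

    se1, se2: dicts mapping item -> label (stand-ins for two pandas.Series).
    """
    items = sorted(set(se1) | set(se2))
    return Counter(
        (se1.get(item, tag_for_missing), se2.get(item, tag_for_missing))
        for item in items
    )


se1 = {"c1": "T cell", "c2": "T cell", "c3": "B cell"}
se2 = {"c1": "0", "c2": "0", "c4": "1"}
counts = cross_counts(se1, se2)
print(counts[("T cell", "0")])    # prints 2
print(counts[("B cell", "N/A")])  # prints 1
print(counts[("N/A", "1")])       # prints 1
```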
Visualization tools
Warning
Use these functions only after process()
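References to DigitalCellSorter class:
| makeProjectionPlotAnnotated() | Produce projection plot colored by cell types |
| makeProjectionPlotByBatches() | Produce projection plot colored by batches |
| makeProjectionPlotByClusters() | Produce projection plot colored by clusters |
| makeProjectionPlotsQualityControl() | Produce Quality Control projection plots |
| makeMarkerSubplots() | Produce subplots on each marker and its expression on all clusters |
| makeAnomalyScoresPlot() | Make anomaly scores plot |
| makeIndividualGeneTtestPlot() | Produce individual gene t-test plot of the two-tailed p-value. |
| makeIndividualGeneExpressionPlot() | Produce individual gene expression plot on a 2D layout |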
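References to VisualizationFunctions class:
| makeQualityControlHistogramPlot() | Produce Quality Control histogram plots |
| makeHistogramNullDistributionPlot() | Produce histogram plot of the voting null distributions |
| makeAnnotationResultsMatrixPlot() | Produce voting results matrix plot |
| makeMarkerExpressionPlot() | Produce image on marker genes and their expression on all clusters |
| makeStackedBarplot() | Produce stacked barplot with cell fractions |
| makeSankeyDiagram() | Make a Sankey diagram, also known as ‘river plot’, with two groups of nodes |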
Function makeProjectionPlotAnnotated(): Produce t-SNE plot colored by cell types
-
DigitalCellSorter.makeProjectionPlotAnnotated(**kwargs)[source] Produce projection plot colored by cell types
- Parameters:
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeProjectionPlotAnnotated()
Example output:
Function makeProjectionPlotByBatches(): Produce t-SNE plot colored by batches
-
DigitalCellSorter.makeProjectionPlotByBatches(**kwargs)[source] Produce projection plot colored by batches
- Parameters:
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeProjectionPlotByBatches()
Example output:
Function makeProjectionPlotByClusters(): Produce t-SNE plot colored by clusters
-
DigitalCellSorter.makeProjectionPlotByClusters(**kwargs)[source] Produce projection plot colored by clusters
- Parameters:
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeProjectionPlotByClusters()
Example output:
Function makeQualityControlHistogramPlot(): Produce Quality Control histogram plots
-
DigitalCellSorter.makeQualityControlHistogramPlot(*args, **kwargs)
Example output:
Function makeProjectionPlotsQualityControl(): Produce Quality Control t-SNE plots
-
DigitalCellSorter.makeProjectionPlotsQualityControl(**kwargs)[source] Produce Quality Control projection plots
- Parameters:
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeProjectionPlotsQualityControl()
Example output:
Function makeMarkerSubplots(): Produce subplots on each marker and its expression on all clusters
-
DigitalCellSorter.makeMarkerSubplots(**kwargs)[source] Produce subplots on each marker and its expression on all clusters
- Parameters:
Any parameters that function ‘internalMakeMarkerSubplots’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeMarkerSubplots()
Example output:
Function makeAnomalyScoresPlot(): Make anomaly scores plot
-
DigitalCellSorter.makeAnomalyScoresPlot(cells='All', suffix='', noPlot=False, **kwargs)[source] Make anomaly scores plot
- Parameters:
- cells: pandas.MultiIndex, Default ‘All’
Index of cells of interest
- **kwargs: dict
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
cells = DCS.getCells(celltype='T cell')
DCS.makeAnomalyScoresPlot(cells)
Example output:
Function makeIndividualGeneTtestPlot(): Produce individual gene t-test plot of the two-tailed p-value
-
DigitalCellSorter.makeIndividualGeneTtestPlot(gene, analyzeBy='label', **kwargs)[source] Produce individual gene t-test plot of the two-tailed p-value.
- Parameters:
- gene: str
Name of gene of interest
- analyzeBy: str, Default ‘label’
What level of labels to include. Other possible options are ‘cluster’ and ‘celltype’
- **kwargs: dict
Any parameters that function ‘makeTtestPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.makeIndividualGeneTtestPlot('SDC1')
Example output:
Function makeIndividualGeneExpressionPlot(): Produce individual gene expression plot on a 2D layout
-
DigitalCellSorter.makeIndividualGeneExpressionPlot(genes, **kwargs)[source] Produce individual gene expression plot on a 2D layout
- Parameters:
- genes: str, or list-like
Name of gene of interest. E.g. ‘CD4, CD33’, ‘PECAM1’, [‘CD4’, ‘CD33’]
- hideClusterLabels: boolean, Default False
Whether to hide the clusters labels
- outlineClusters: boolean, Default True
Whether to outline the clusters with circles
- **kwargs: dict
Any parameters that function ‘internalMakeMarkerSubplots’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.makeIndividualGeneExpressionPlot('CD4')
Example output:
Function makeHistogramNullDistributionPlot(): Produce histogram plot of the voting null distributions
-
DigitalCellSorter.makeHistogramNullDistributionPlot(*args, **kwargs)
Example output:
Function makeAnnotationResultsMatrixPlot(): Produce voting results voting matrix plot
-
DigitalCellSorter.makeAnnotationResultsMatrixPlot(*args, **kwargs)
Example output:
Function makeMarkerExpressionPlot(): Produce image on marker genes and their expression on all clusters
-
DigitalCellSorter.makeMarkerExpressionPlot(*args, **kwargs)
Example output:
Function makeStackedBarplot(): Produce stacked barplot with cell fractions
-
DigitalCellSorter.makeStackedBarplot(*args, **kwargs)
Example output:
Function makeSankeyDiagram(): Make a Sankey diagram, also known as ‘river plot’ with two groups of nodes
-
DigitalCellSorter.makeSankeyDiagram(*args, **kwargs)
Example output: