User Functions

User functions of the DigitalCellSorter.core.DigitalCellSorter class.

Note

All of the tools listed in this section are intended to be used from an instance of the DigitalCellSorter class. For example:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.dataName = 'my_data_name'
DCS.saveDir = os.path.join(os.path.dirname(__file__), 'output', DCS.dataName, '')

DCS.prepare(raw_data)

DCS.process()

DCS.makeIndividualGeneExpressionPlot('CCL5')

DCS.makeIndividualGeneTtestPlot('CCL5', analyzeBy='celltype')

cells = DCS.getCells(celltype='T cell')
DCS.makeAnomalyScoresPlot(cells)

# ...

Calling these functions directly from the modules where they are defined may result in undefined behavior.

The main class of DigitalCellSorter. The class includes tools for:

Pre-processing of single cell RNA sequencing data
Quality control
Batch effects correction
Cell anomaly score evaluation
Dimensionality reduction
Clustering
Annotation of cell types
Visualization
Post-processing

Primary tools

Primary tools are used for pre-processing of the input data, quality control, batch correction, dimensionality reduction, clustering and cell type annotation.

Note

We recommend using only the functions prepare(), process(), and visualize() of the Primary tools. The entire processing workflow is contained within process(). If you wish to modify the workflow, use the other components of the Primary tools, such as cluster(), project(), etc.
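
The recommended call order can be sketched as a small helper. It relies only on the three methods named in the note; the helper name `run_default_workflow` and the stand-in class `RecordingSorter` are hypothetical and exist only so that the snippet runs without the package installed.

```python
def run_default_workflow(dcs, raw_data):
    """Run the recommended three-step workflow on a DigitalCellSorter-like object."""
    dcs.prepare(raw_data)   # load and validate the expression data
    dcs.process()           # run the processing workflow
    dcs.visualize()         # run the aggregate of visualization tools


class RecordingSorter:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def prepare(self, obj):
        self.calls.append('prepare')

    def process(self, dataIsNormalized=False, cleanData=True):
        self.calls.append('process')

    def visualize(self):
        self.calls.append('visualize')


sorter = RecordingSorter()
run_default_workflow(sorter, 'data.csv')
print(sorter.calls)
```

With a real instance, the same helper would be called as `run_default_workflow(DCS, raw_data)`.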

References to DigitalCellSorter class:

`prepare`(obj)	Prepare pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.
`convert`([nameFrom, nameTo])	Convert index to HUGO names; if any names in the index are duplicated, remove the duplicates
`clean`()	Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns
`project`([PCAonly, do_fast_tsne])	Project pandas.DataFrame to lower dimensions
`cluster`()	Cluster PCA-reduced data into a desired number of clusters
`annotate`([mapNonexpressedCelltypes])	Produce cluster voting results, annotate cell types, and update marker expression with cell type labels
`process`([dataIsNormalized, cleanData])	Process data before using any annotation or visualization functions
`visualize`()	Aggregate of visualization tools of this class.

DigitalCellSorter.prepare(obj)[source]

Prepare pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.

Parameters:

obj: str, pandas.DataFrame, pandas.Series: Expression data in the form of a pandas.DataFrame, a pandas.Series, or the name and path of a csv file with the data

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.prepare('data.csv')

DigitalCellSorter.convert(nameFrom=None, nameTo=None, **kwargs)[source]

Convert index to HUGO names; if any names in the index are duplicated, remove the duplicates

Parameters:

nameFrom: str, Default ‘alias’: Gene name type to convert from
nameTo: str, Default ‘hugo’: Gene name type to convert to

Any parameters that function ‘mergeIndexDuplicates’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.convert()

DigitalCellSorter.clean()[source]

Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.clean()
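
As a rough illustration of the cleaning steps listed above, the following pandas sketch applies them to a small table. It is not the library's implementation: the function name `clean_expression`, the genes-in-rows layout, reading "validate index" as dropping rows with a missing label, and keeping the first of any duplicated index entries are all assumptions.

```python
import numpy as np
import pandas as pd


def clean_expression(df):
    """Illustrative cleaning; assumes genes in rows and cells in columns."""
    df = df.loc[df.index.notna()]                    # drop rows with a missing gene name
    df = df.loc[~df.index.duplicated(keep='first')]  # keep the first of duplicated gene names
    df = df.fillna(0.0)                              # replace missing values with zeros
    df = df.loc[(df != 0).any(axis=1)]               # remove all-zero rows
    df = df.loc[:, (df != 0).any(axis=0)]            # remove all-zero columns
    return df


raw = pd.DataFrame(
    [[1.0, np.nan, 0.0],
     [2.0, 3.0, 0.0],
     [0.0, 0.0, 0.0],
     [5.0, 1.0, 0.0]],
    index=['CD4', 'CD4', 'CD8A', 'CCL5'],
    columns=['cell_1', 'cell_2', 'cell_3'],
)
cleaned = clean_expression(raw)
print(cleaned)
```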

DigitalCellSorter.normalize(median=None)[source]

Normalize pandas.DataFrame: rescale all cells, log-transform data, remove constant genes, sort index

Parameters:

median: float, Default None: Scale factor, if not provided will be computed as median across all cells in data

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.normalize()
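
The steps named in the description (rescale, log-transform, remove constant genes, sort index) might look roughly like the sketch below. The details are assumptions rather than the library's code: the function name `normalize_expression` is hypothetical, cells are taken to be columns, each cell is rescaled so that its total equals the median total, and log(1 + x) is used as the transform.

```python
import numpy as np
import pandas as pd


def normalize_expression(df, median=None):
    """Illustrative normalization; assumes genes in rows and cells in columns."""
    totals = df.sum(axis=0)              # total counts per cell
    if median is None:
        median = totals.median()         # scale factor computed from the data
    df = df * (median / totals)          # rescale every cell to the same total
    df = np.log1p(df)                    # log-transform; log(1 + x) is an assumption
    df = df.loc[df.std(axis=1) > 0]      # remove genes that are constant across cells
    return df.sort_index()               # sort genes by name


counts = pd.DataFrame(
    {'cell_1': [10.0, 0.0, 30.0],
     'cell_2': [20.0, 0.0, 60.0],
     'cell_3': [5.0, 0.0, 5.0]},
    index=['GENE_B', 'GENE_C', 'GENE_A'],
)
normalized = normalize_expression(counts)
print(normalized)
```

Passing an explicit `median` keeps the scale factor fixed, which can be useful when several datasets should share the same scaling.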

DigitalCellSorter.project(PCAonly=False, do_fast_tsne=True)[source]

Project pandas.DataFrame to lower dimensions

Parameters:

PCAonly: boolean, Default False: Perform Principal component analysis only
do_fast_tsne: boolean, Default True: Do FI-tSNE instead of “exact” tSNE. This option is ignored if layout is not ‘TSNE’

Returns:

tuple: Processed data

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

xPCA, PCs, tSNE = DCS.project()
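
For readers unfamiliar with the PCA step, here is a minimal NumPy sketch of projecting data onto its leading principal components. It only illustrates the general technique and makes no claim about how project() is implemented; the function name `pca_project` is hypothetical, and samples are assumed to be the rows of `X` (an expression table with cells in columns would be transposed first).

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                            # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values come sorted
    components = Vt[:n_components]                     # principal axes, one per row
    return Xc @ components.T, components


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
scores, components = pca_project(X, n_components=2)
print(scores.shape, components.shape)
```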

DigitalCellSorter.cluster()[source]

Cluster PCA-reduced data into a desired number of clusters

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.cluster()

DigitalCellSorter.annotate(mapNonexpressedCelltypes=True)[source]

Produce cluster voting results, annotate cell types, and update marker expression with cell type labels

Parameters:

mapNonexpressedCelltypes: boolean, Default True: If True, cell type coloring will be consistent across all datasets for a given input marker list file, regardless of which cell types are annotated in each dataset.

Returns:

dictionary: Voting results, a dictionary in form of: {cluster label: assigned cell type}

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

results = DCS.annotate()

DigitalCellSorter.process(dataIsNormalized=False, cleanData=True)[source]

Process data before using any annotation or visualization functions

Parameters:

dataIsNormalized: boolean, Default False: Whether DCS.df_expr is normalized or not

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DigitalCellSorter.visualize()[source]

Aggregate of visualization tools of this class.

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.visualize()

Extraction tools

Warning

Use these functions only after process()

References to DigitalCellSorter class:

`getExprOfGene`(gene[, analyzeBy])	Get expression of a gene.
`getExprOfCells`(cells)	Get expression of a set of cells.
`getCells`([celltype, clusterIndex, clusterName])	Get cell annotations in a form of pandas.Series
`getAnomalyScores`(trainingSet, testingSet[, …])	Function to get anomaly score of cells based on some reference set
`getNewMarkerGenes`([cluster, top, …])	Extract new marker genes based on the cluster annotations
`getIndexOfGoodQualityCells`([QCplotsSubDir])	Get index of cells that satisfy the QC criteria
`getCountsDataframe`(se1, se2[, tagForMissing])	Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series

DigitalCellSorter.getExprOfGene(gene, analyzeBy='cluster')[source]

Get expression of a gene. Run this function only after function process()

Parameters:

gene: str: Name of gene of interest
analyzeBy: str, Default ‘cluster’: What level of labels to include. Other possible options are ‘label’ and ‘celltype’

Returns:

pandas.DataFrame: With expression of the cells of interest

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.getExprOfGene('SDC1')

DigitalCellSorter.getExprOfCells(cells)[source]

Get expression of a set of cells. Run this function only after function process()

Parameters:

cells: pandas.MultiIndex: 2-level Index of cells of interest, must include levels ‘batch’ and ‘cell’

Returns:

pandas.DataFrame: With expression of the cells of interest

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.getExprOfCells(cells)
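
The `cells` argument is a 2-level pandas.MultiIndex with levels ‘batch’ and ‘cell’. The sketch below shows how such an index can be built and used to pick columns from an expression table. The table `df_expr`, its cells-in-columns layout, and the batch and cell names are made up for illustration; the selection uses plain pandas rather than the library call.

```python
import pandas as pd

# Toy expression table: genes in rows, (batch, cell) pairs in columns
columns = pd.MultiIndex.from_tuples(
    [('batch_1', 'cell_a'), ('batch_1', 'cell_b'), ('batch_2', 'cell_a')],
    names=['batch', 'cell'],
)
df_expr = pd.DataFrame(
    [[1.0, 0.0, 2.0],
     [0.0, 3.0, 4.0]],
    index=['CD4', 'CCL5'],
    columns=columns,
)

# Index of the cells of interest, with the two required level names
cells = pd.MultiIndex.from_tuples(
    [('batch_1', 'cell_b'), ('batch_2', 'cell_a')],
    names=['batch', 'cell'],
)

subset = df_expr.reindex(columns=cells)  # keep only the requested cells
print(subset)
```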

DigitalCellSorter.getCells(celltype=None, clusterIndex=None, clusterName=None)[source]

Get cell annotations in a form of pandas.Series

Parameters:

celltype: str, Default None: Cell type to extract
clusterIndex: int, Default None: Index of the cluster to extract
clusterName: str, Default None: Name of the cluster to extract

Returns:

pandas.MultiIndex: Index of labelled cells

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

labels = DCS.getCells()

DigitalCellSorter.getAnomalyScores(trainingSet, testingSet, printResults=False)[source]

Function to get anomaly score of cells based on some reference set

Parameters:

trainingSet: pandas.DataFrame: With cells to train isolation forest on
testingSet: pandas.DataFrame: With cells to score
printResults: boolean, Default False: Whether to print results

Returns:

1d numpy.array: Anomaly score(s) of tested cell(s)

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

scores = DCS.getAnomalyScores(df_expr.iloc[:, 5:], df_expr.iloc[:, :5])

DigitalCellSorter.getNewMarkerGenes(cluster=None, top=100, zScoreCutoff=None, removeUnknown=False, **kwargs)[source]

Extract new marker genes based on the cluster annotations

Parameters:

cluster: int, Default None: Cluster #, if provided genes of only this cluster will be returned
top: int, Default 100: Upper bound for number of new markers per cell type
zScoreCutoff: float, Default 0.3: Lower bound for a marker z-score to be significant
removeUnknown: boolean, Default False: Whether to remove type “Unknown”

Any parameters that function ‘makePlotOfNewMarkers’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.getNewMarkerGenes()

DigitalCellSorter.getIndexOfGoodQualityCells(QCplotsSubDir='QC_plots', **kwargs)[source]

Get index of cells that satisfy the QC criteria

Parameters:

count_depth_cutoff: float, Default 0.5: Fraction of median to take as count depth cutoff
number_of_genes_cutoff: float, Default 0.5: Fraction of median to take as number of genes cutoff
mitochondrial_genes_cutoff: float, Default 3.0: The cutoff is median + standard_deviation * this_parameter

Any parameters that function ‘makeQualityControlHistogramPlot’ can accept

Returns:

pandas.Index: Index of cells

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

index = DCS.getIndexOfGoodQualityCells()
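
The three cutoffs described above can be written out as arithmetic on per-cell measures. The following is a sketch under assumptions, not the library's code: the function name `good_quality_index` and the input Series are hypothetical, the comparisons are taken to be inclusive, and a cell is kept only if it passes all three criteria.

```python
import pandas as pd


def good_quality_index(count_depth, number_of_genes, mito_fraction,
                       count_depth_cutoff=0.5,
                       number_of_genes_cutoff=0.5,
                       mitochondrial_genes_cutoff=3.0):
    """Return the index of cells passing all three illustrative QC cutoffs."""
    depth_min = count_depth_cutoff * count_depth.median()        # fraction of median
    genes_min = number_of_genes_cutoff * number_of_genes.median()  # fraction of median
    mito_max = (mito_fraction.median()
                + mitochondrial_genes_cutoff * mito_fraction.std())  # median + std * parameter
    good = ((count_depth >= depth_min)
            & (number_of_genes >= genes_min)
            & (mito_fraction <= mito_max))
    return good[good].index


cell_names = ['c1', 'c2', 'c3', 'c4', 'c5']
count_depth = pd.Series([1000, 1200, 300, 1100, 900], index=cell_names)
number_of_genes = pd.Series([500, 600, 400, 100, 550], index=cell_names)
mito_fraction = pd.Series([0.05, 0.04, 0.06, 0.05, 0.05], index=cell_names)

good_cells = good_quality_index(count_depth, number_of_genes, mito_fraction)
print(list(good_cells))
```

In this toy data, cell `c3` falls below the count depth cutoff and cell `c4` below the number of genes cutoff.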

DigitalCellSorter.getCountsDataframe(se1, se2, tagForMissing='N/A')[source]

Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series

Parameters:

se1: pandas.Series: Series with the first set of items
se2: pandas.Series: Series with the second set of items
tagForMissing: str, Default ‘N/A’: Label to assign to non-overlapping items

Returns:

pandas.DataFrame: Contains counts

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

df = DCS.getCountsDataframe(se1, se2)
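
To make the idea of cross-counts concrete, here is a small pandas sketch that counts how often each pair of labels co-occurs for the same item. It is an illustration only: the function name `cross_counts` is hypothetical, and the assumption is that both Series are indexed by item (for example, by cell) and that items present in only one Series receive the missing tag in the other.

```python
import pandas as pd


def cross_counts(se1, se2, tag_for_missing='N/A'):
    """Count label co-occurrences between two Series indexed by the same kind of item."""
    union = se1.index.union(se2.index)               # every item seen in either Series
    a = se1.reindex(union).fillna(tag_for_missing)   # tag items missing from se1
    b = se2.reindex(union).fillna(tag_for_missing)   # tag items missing from se2
    return pd.crosstab(a, b)


se1 = pd.Series({'c1': 'T cell', 'c2': 'T cell', 'c3': 'B cell'})
se2 = pd.Series({'c1': 'cluster 0', 'c2': 'cluster 1', 'c4': 'cluster 1'})

df_counts = cross_counts(se1, se2)
print(df_counts)
```

A table of this shape (labels of one Series as the index, labels of the other as the columns) is the kind of input that the `df` parameter of makeSankeyDiagram describes.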

Visualization tools

Warning

Use these functions only after process()

References to DigitalCellSorter class:

`makeProjectionPlotAnnotated`(**kwargs)	Produce projection plot colored by cell types
`makeProjectionPlotByBatches`(**kwargs)	Produce projection plot colored by batches
`makeProjectionPlotByClusters`(**kwargs)	Produce projection plot colored by clusters
`makeProjectionPlotsQualityControl`(**kwargs)	Produce Quality Control projection plots
`makeMarkerSubplots`(**kwargs)	Produce subplots on each marker and its expression on all clusters
`makeAnomalyScoresPlot`([cells, suffix, noPlot])	Make anomaly scores plot
`makeIndividualGeneTtestPlot`(gene[, analyzeBy])	Produce individual gene t-test plot of the two-tailed p-value.
`makeIndividualGeneExpressionPlot`(genes, **kwargs)	Produce individual gene expression plot on a 2D layout

References to VisualizationFunctions class:

`makeQualityControlHistogramPlot`(*args, **kwargs)	Function to calculate QC quality cutoff and visualize it on a histogram
`makeHistogramNullDistributionPlot`(*args, …)	Produce histogram plot of the voting null distributions
`makeAnnotationResultsMatrixPlot`(*args, **kwargs)	Produce voting results matrix plot
`makeMarkerExpressionPlot`(*args, **kwargs)	Produce image on marker genes and their expression on all clusters.
`makeStackedBarplot`(*args, **kwargs)	Produce stacked barplot with cell fractions
`makeSankeyDiagram`(*args, **kwargs)	Make a Sankey diagram, also known as ‘river plot’ with two groups of nodes

DigitalCellSorter.makeProjectionPlotAnnotated(**kwargs)[source]

Produce projection plot colored by cell types

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotAnnotated()

DigitalCellSorter.makeProjectionPlotByBatches(**kwargs)[source]

Produce projection plot colored by batches

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotByBatches()

DigitalCellSorter.makeProjectionPlotByClusters(**kwargs)[source]

Produce projection plot colored by clusters

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotByClusters()

DigitalCellSorter.makeQualityControlHistogramPlot(*args, **kwargs)

Function to calculate QC quality cutoff and visualize it on a histogram

Parameters:

subset: pandas.Series: Data to analyze
cutoff: float: Cutoff to display
plotPathAndName: str, Default None: Text to include in the figure title and file name
N_bins: int, Default 100: Number of bins of the histogram
mito: boolean, Default False: Whether the analysis is of the mitochondrial genes fraction
displayMeasures: boolean, Default True: Print vertical dashed lines along with mean, median, and standard deviation
precision: int, Default 4: Number of digits after decimal
quantilePlotCutoff: float, Default 0.99: Distributions are cut to display the range from 0 to quantilePlotCutoff
dpi: int, Default 600: Resolution of the figure image
extension: str, Default ‘png’: Format of the figure file
fontScale: float, Default 1.5: Scale most of the figure fonts
includeTitle: boolean, Default False: Whether to include title on the figure

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

cutoff = DCS.makeQualityControlHistogramPlot(subset, cutoff)

DigitalCellSorter.makeProjectionPlotsQualityControl(**kwargs)[source]

Produce Quality Control projection plots

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotsQualityControl()

DigitalCellSorter.makeMarkerSubplots(**kwargs)[source]

Produce subplots on each marker and its expression on all clusters

Parameters:

Any parameters that function ‘internalMakeMarkerSubplots’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeMarkerSubplots()

DigitalCellSorter.makeAnomalyScoresPlot(cells='All', suffix='', noPlot=False, **kwargs)[source]

Make anomaly scores plot

Parameters:

cells: pandas.MultiIndex, Default ‘All’: Index of cells of interest

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

cells = DCS.getCells(celltype='T cell')

DCS.makeAnomalyScoresPlot(cells)

DigitalCellSorter.makeIndividualGeneTtestPlot(gene, analyzeBy='label', **kwargs)[source]

Produce individual gene t-test plot of the two-tailed p-value.

Parameters:

gene: str: Name of gene of interest
analyzeBy: str, Default ‘label’: What level of labels to include. Other possible options are ‘cluster’ and ‘celltype’

Any parameters that function ‘makeTtestPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeIndividualGeneTtestPlot('SDC1')

DigitalCellSorter.makeIndividualGeneExpressionPlot(genes, **kwargs)[source]

Produce individual gene expression plot on a 2D layout

Parameters:

genes: str, or list-like: Name(s) of gene(s) of interest. E.g. ‘CD4, CD33’, ‘PECAM1’, [‘CD4’, ‘CD33’]
hideClusterLabels: boolean, Default False: Whether to hide the clusters labels
outlineClusters: boolean, Default True: Whether to outline the clusters with circles

Any parameters that function ‘internalMakeMarkerSubplots’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeIndividualGeneExpressionPlot('CD4')

DigitalCellSorter.makeHistogramNullDistributionPlot(*args, **kwargs)

Produce histogram plot of the voting null distributions

Parameters:

dpi: int, Default 600: Resolution of the figure image
extension: str, Default ‘png’: Format of the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeHistogramNullDistributionPlot()

DigitalCellSorter.makeAnnotationResultsMatrixPlot(*args, **kwargs)

Produce voting results matrix plot

Parameters:

dpi: int, Default 600: Resolution of the figure image
extension: str, Default ‘png’: Format of the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeAnnotationResultsMatrixPlot()

DigitalCellSorter.makeMarkerExpressionPlot(*args, **kwargs)

Produce image on marker genes and their expression on all clusters. Uses files generated by function DCS.Vote

Parameters:

dpi: int, Default 600: Resolution of the figure image
extension: str, Default ‘png’: Format of the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeMarkerExpressionPlot()

DigitalCellSorter.makeStackedBarplot(*args, **kwargs)

Produce stacked barplot with cell fractions

Parameters:

clusterName: str, Default None: Label to include at the bar bottom. If None the self.dataName value will be used
legendStyle: boolean, Default False: Use one out of two styles of this figure
includeLowQC: boolean, Default True: Whether to include low quality cells
dpi: int, Default 600: Resolution of the figure image
extension: str, Default ‘png’: Format of the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeStackedBarplot(clusterName)

DigitalCellSorter.makeSankeyDiagram(*args, **kwargs)

Make a Sankey diagram, also known as ‘river plot’ with two groups of nodes

Parameters:

df: pandas.DataFrame: With counts (overlaps)
colormapForIndex: dictionary, Default None: Colors to use for nodes specified in the DataFrame index
colormapForColumns: dictionary, Default None: Colors to use for nodes specified in the DataFrame columns
linksColor: str, Default ‘rgba(100,100,100,0.6)’: Color of the non-overlapping links
title: str, Default ‘’: Title to print on the diagram
interactive: boolean, Default False: Whether to launch interactive JavaScript-based graph
quality: int, Default 4: Proportional to the resolution of the figure to save
nodeLabelsFontSize: int, Default 15: Font size for node labels
nameAppend: str, Default ‘_Sankey_diagram’: Name to append to the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeSankeyDiagram(df)
