User Functions

User functions of the DigitalCellSorter.core.DigitalCellSorter class.

Note

All of the tools listed below in this section are intended to be used from an instance of the DigitalCellSorter class. For example:

import os
import DigitalCellSorter

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.dataName = 'my_data_name'
DCS.saveDir = os.path.join(os.path.dirname(__file__), 'output', DCS.dataName, '')

# raw_data: a pandas.DataFrame, a pandas.Series, or a path to a CSV file
DCS.prepare(raw_data)

# prepare() returns None; process() runs the workflow on the prepared data
DCS.process()

DCS.makeIndividualGeneExpressionPlot('CCL5')

DCS.makeIndividualGeneTtestPlot('CCL5', analyzeBy='celltype')

cells = DCS.getCells(celltype='T cell')
DCS.makeAnomalyScoresPlot(cells)

# ...

Calling these functions directly from the modules where they are defined may result in undefined behavior.

Description of the package functionality

The main class of DigitalCellSorter. The class includes tools for:

  1. Pre-processing of single cell RNA sequencing data

  2. Quality control

  3. Batch effects correction

  4. Cell anomaly score evaluation

  5. Dimensionality reduction

  6. Clustering

  7. Annotation of cell types

  8. Visualization

  9. Post-processing



Primary tools

Primary tools are used for pre-processing of the input data, quality control, batch correction, dimensionality reduction, clustering and cell type annotation.

Note

We recommend using only the functions prepare(), process(), and visualize() of the Primary tools. The entire processing workflow is contained within process(). If you wish to modify the workflow, use the other components of the Primary tools, such as cluster(), project(), etc.

References to DigitalCellSorter class:

prepare(obj)

Prepare a pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.

convert([nameFrom, nameTo])

Convert index to HUGO names; if any names in the index are duplicated, remove the duplicates

clean()

Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns

project([PCAonly, do_fast_tsne])

Project pandas.DataFrame to lower dimensions

cluster()

Cluster PCA-reduced data into a desired number of clusters

annotate([mapNonexpressedCelltypes])

Produce cluster voting results, annotate cell types, and update marker expression with cell type labels

process([dataIsNormalized, cleanData])

Process data before using any annotation or visualization functions

visualize()

Aggregate of visualization tools of this class.

Function prepare(): prepare input data for function process()

DigitalCellSorter.prepare(obj)[source]

Prepare a pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.

Parameters:
obj: str, pandas.DataFrame, pandas.Series

Expression data in the form of a pandas.DataFrame or pandas.Series, or the name and path of a CSV file with data

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.prepare('data.csv')


Function convert(): convert gene index of a DataFrame prepared by function prepare() from one naming convention to another

DigitalCellSorter.convert(nameFrom=None, nameTo=None, **kwargs)[source]

Convert index to HUGO names; if any names in the index are duplicated, remove the duplicates

Parameters:
nameFrom: str, Default ‘alias’

Gene name type to convert from

nameTo: str, Default ‘hugo’

Gene name type to convert to

Any parameters that function ‘mergeIndexDuplicates’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.convert()
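The effect of convert() on the gene index can be illustrated without pandas. This is a sketch under stated assumptions, not the library's code: the expression table is assumed to be a dict mapping gene name to per-cell values, the alias-to-HUGO mapping is made up for illustration, and merging colliding names by element-wise summation is an assumption (the real behavior is whatever 'mergeIndexDuplicates' does). The helper name convert_index is hypothetical.

```python
from typing import Dict, List


def convert_index(expr: Dict[str, List[float]],
                  name_map: Dict[str, str]) -> Dict[str, List[float]]:
    """Hypothetical helper: rename genes and merge names that collide."""
    converted: Dict[str, List[float]] = {}
    for name, values in expr.items():
        # Names without a known mapping are kept unchanged
        new_name = name_map.get(name, name)
        if new_name in converted:
            # Duplicate after conversion: merged by element-wise sum (assumption)
            converted[new_name] = [a + b for a, b in zip(converted[new_name], values)]
        else:
            converted[new_name] = list(values)
    return converted


# Made-up alias -> HUGO mapping, for illustration only
name_map = {"ALIAS1": "GENE1", "ALIAS2": "GENE1"}
expr = {"ALIAS1": [1, 0], "ALIAS2": [2, 3], "GENE2": [0, 5]}
print(convert_index(expr, name_map))  # {'GENE1': [3, 3], 'GENE2': [0, 5]}
```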


Function clean(): validate index, replace missing with zeros, remove all-zero rows and columns of a DataFrame

DigitalCellSorter.clean()[source]

Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.clean()
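The cleaning steps listed above can be sketched on a plain list of (gene, values) rows. This is a minimal illustration, not the implementation of clean(): the row layout, the use of None for missing values, the rule that an empty gene name is invalid, and keeping the first occurrence of a duplicated gene are all assumptions. The helper name clean_matrix is hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, List[Optional[float]]]


def clean_matrix(rows: List[Row]) -> List[Tuple[str, List[float]]]:
    """Hypothetical helper: mimic the clean() steps on (gene, values) rows."""
    # 1. Validate the index: drop rows with an empty or non-string gene name
    rows = [(g, v) for g, v in rows if isinstance(g, str) and g.strip()]
    # 2. Remove index duplicates (this sketch keeps the first occurrence)
    seen = set()
    unique = []
    for gene, values in rows:
        if gene not in seen:
            seen.add(gene)
            unique.append((gene, values))
    # 3. Replace missing values with zeros
    filled = [(g, [0.0 if x is None else float(x) for x in v]) for g, v in unique]
    # 4. Remove all-zero rows (genes)
    filled = [(g, v) for g, v in filled if any(x != 0 for x in v)]
    if not filled:
        return []
    # 5. Remove all-zero columns (cells)
    n_cells = len(filled[0][1])
    keep = [j for j in range(n_cells) if any(v[j] != 0 for _, v in filled)]
    return [(g, [v[j] for j in keep]) for g, v in filled]


rows = [
    ("CD4", [1, None, 0]),
    ("CD4", [5, 5, 0]),     # duplicate gene name, dropped
    ("GAPDH", [0, 0, 0]),   # all-zero row, dropped
    ("", [1, 1, 0]),        # invalid index entry, dropped
    ("CD8A", [0, 2, 0]),
]
print(clean_matrix(rows))  # [('CD4', [1.0, 0.0]), ('CD8A', [0.0, 2.0])]
```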


Function normalize(): rescale all cells, log-transform data, remove constant genes, and sort index of a DataFrame

DigitalCellSorter.normalize(median=None)[source]

Normalize pandas.DataFrame: rescale all cells, log-transform data, remove constant genes, sort index

Parameters:
median: float, Default None

Scale factor; if not provided, it is computed as the median across all cells in the data

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.normalize()
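The four normalization steps can be sketched as follows. This is an illustration under assumptions: the scale factor is taken to be the median total count per cell (the parameter description says only "median across all cells"), the log transform is assumed to be log2(1 + x), and the input is a dict mapping gene to per-cell counts with no all-zero cells. The helper name normalize_matrix is hypothetical; the exact base and pseudocount used by normalize() may differ.

```python
import math
import statistics
from typing import Dict, List, Optional


def normalize_matrix(expr: Dict[str, List[float]],
                     median: Optional[float] = None) -> Dict[str, List[float]]:
    """Hypothetical helper; expr maps gene -> per-cell counts (no all-zero cells)."""
    genes = list(expr)
    n_cells = len(expr[genes[0]])
    # Count depth (total counts) of every cell
    totals = [sum(expr[g][j] for g in genes) for j in range(n_cells)]
    # Scale factor: median count depth unless given explicitly
    scale = statistics.median(totals) if median is None else median
    result = {}
    for g in genes:
        # Rescale each cell to the common depth, then log2(1 + x) (assumed transform)
        values = [math.log2(1.0 + expr[g][j] / totals[j] * scale) for j in range(n_cells)]
        # Remove constant genes: identical value in every cell
        if max(values) > min(values):
            result[g] = values
    # Sort the gene index alphabetically
    return dict(sorted(result.items()))


expr = {"CD8A": [3, 0, 6], "MALAT1": [1, 2, 2], "ACTB": [0, 6, 0]}
print(list(normalize_matrix(expr)))  # ['ACTB', 'CD8A'] (MALAT1 is constant after rescaling)
```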


Function project(): project data to lower dimensions

DigitalCellSorter.project(PCAonly=False, do_fast_tsne=True)[source]

Project pandas.DataFrame to lower dimensions

Parameters:
PCAonly: boolean, Default False

Perform principal component analysis only

do_fast_tsne: boolean, Default True

Use FIt-SNE instead of “exact” t-SNE. This option is ignored if the layout is not ‘TSNE’

Returns:
tuple

Processed data

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

xPCA, PCs, tSNE = DCS.project()
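Per its parameters, project() reduces the data with principal component analysis and, unless PCAonly is True, with t-SNE on top. As a minimal sketch of the PCA step only, the code below finds the first principal component by power iteration on the covariance matrix. It is not necessarily how project() computes PCA, it does not cover t-SNE, and the helper name first_pc and the toy data are hypothetical.

```python
import math
import random
from typing import List, Tuple


def first_pc(X: List[List[float]], n_iter: int = 100,
             seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: rows of X are observations, columns are features."""
    n, d = len(X), len(X[0])
    # Center each feature
    means = [sum(row[k] for row in X) / n for k in range(d)]
    Xc = [[row[k] - means[k] for k in range(d)] for row in X]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue
    # (assumes the data are not constant, otherwise the norm below is zero)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Projection (PC1 score) of every observation
    scores = [sum(r[k] * v[k] for k in range(d)) for r in Xc]
    return v, scores


X = [[t, 2.0 * t] for t in range(-3, 4)]  # points on the line y = 2x
pc1, scores = first_pc(X)
print([round(x, 4) for x in pc1])  # [0.4472, 0.8944]
```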


Function cluster(): cluster PCA-reduced data into a desired number of clusters

DigitalCellSorter.cluster()[source]

Cluster PCA-reduced data into a desired number of clusters

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.cluster()


Function annotate(): produce cluster voting results

DigitalCellSorter.annotate(mapNonexpressedCelltypes=True)[source]

Produce cluster voting results, annotate cell types, and update marker expression with cell type labels

Parameters:
mapNonexpressedCelltypes: boolean, Default True

If True, cell type coloring will be consistent across all datasets, regardless of which cell types are annotated in each dataset for a given input marker list file.

Returns:
dictionary

Voting results, a dictionary in form of: {cluster label: assigned cell type}

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

results = DCS.annotate()
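The returned dictionary maps each cluster label to a cell type. The sketch below produces a result of that shape with a deliberately simplified rule: a cell type's score in a cluster is the fraction of its markers whose mean expression exceeds a threshold, and clusters with weak evidence are labeled 'Unknown'. This is not the annotate() algorithm (the documented voting also involves null distributions, as makeHistogramNullDistributionPlot suggests); the helper name vote and the threshold and min_score values are hypothetical.

```python
from typing import Dict, List


def vote(cluster_expr: Dict[int, Dict[str, float]],
         markers: Dict[str, List[str]],
         threshold: float = 1.0,
         min_score: float = 0.5) -> Dict[int, str]:
    """Hypothetical helper: cluster_expr[cluster][gene] is mean expression in that cluster."""
    results = {}
    for cluster, expr in cluster_expr.items():
        # Score each cell type by the fraction of its markers expressed above threshold
        scores = {
            celltype: sum(expr.get(g, 0.0) > threshold for g in genes) / len(genes)
            for celltype, genes in markers.items()
        }
        best = max(scores, key=scores.get)
        # Weak evidence for every cell type -> label the cluster 'Unknown'
        results[cluster] = best if scores[best] >= min_score else "Unknown"
    return results


markers = {"T cell": ["CD3E", "CD3D"], "B cell": ["MS4A1", "CD79A"]}
cluster_expr = {
    0: {"CD3E": 2.5, "CD3D": 1.8, "MS4A1": 0.1},
    1: {"MS4A1": 3.0, "CD79A": 2.2},
    2: {"CD3E": 0.2, "MS4A1": 0.3},
}
print(vote(cluster_expr, markers))  # {0: 'T cell', 1: 'B cell', 2: 'Unknown'}
```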


Function process(): main function

DigitalCellSorter.process(dataIsNormalized=False, cleanData=True)[source]

Process data before using any annotation or visualization functions

Parameters:
dataIsNormalized: boolean, Default False

Whether DCS.df_expr is normalized or not

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()


Function visualize(): make all default plots to visualize the results of function process()

DigitalCellSorter.visualize()[source]

Aggregate of visualization tools of this class.

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.visualize()


Extraction tools

Warning

Use these functions only after process()

References to DigitalCellSorter class:

getExprOfGene(gene[, analyzeBy])

Get expression of a gene.

getExprOfCells(cells)

Get expression of a set of cells.

getCells([celltype, clusterIndex, clusterName])

Get cell annotations in a form of pandas.Series

getAnomalyScores(trainingSet, testingSet[, …])

Get anomaly scores of cells based on a reference set

getNewMarkerGenes([cluster, top, …])

Extract new marker genes based on the cluster annotations

getIndexOfGoodQualityCells([QCplotsSubDir])

Get index of cells that satisfy the QC criteria

getCountsDataframe(se1, se2[, tagForMissing])

Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series

Function getExprOfGene(): Get expression of a gene

DigitalCellSorter.getExprOfGene(gene, analyzeBy='cluster')[source]

Get expression of a gene. Run this function only after function process()

Parameters:
gene: str

Name of the gene of interest

analyzeBy: str, Default ‘cluster’

What level of labels to include. Other possible options are ‘label’ and ‘celltype’

Returns:
pandas.DataFrame

With expression of the gene of interest

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.getExprOfGene('SDC1')


Function getExprOfCells(): Get expression of a set of cells

DigitalCellSorter.getExprOfCells(cells)[source]

Get expression of a set of cells. Run this function only after function process()

Parameters:
cells: pandas.MultiIndex

Two-level index of cells of interest; must include levels ‘batch’ and ‘cell’

Returns:
pandas.DataFrame

With expression of the cells of interest

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.getExprOfCells(cells)


Function getCells(): get cells index by celltype, clusterIndex or clusterName

DigitalCellSorter.getCells(celltype=None, clusterIndex=None, clusterName=None)[source]

Get cell annotations in a form of pandas.Series

Parameters:
celltype: str, Default None

Cell type to extract

clusterIndex: int, Default None

Index of the cluster to extract

clusterName: str, Default None

Name of the cluster to extract

Returns:
pandas.MultiIndex

Index of labelled cells

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

labels = DCS.getCells()
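The selection performed by getCells() can be pictured as filtering annotated cells keyed by the two-level (batch, cell) index. The sketch below is an assumption-laden illustration: annotations are a plain dict, the field names 'cluster', 'label', and 'celltype' are borrowed from the analyzeBy options, the label strings are invented, and combining several filters with AND is an assumption. The helper name get_cells is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, str]  # (batch, cell), mirroring the two-level index


def get_cells(annotations: Dict[Cell, Dict[str, object]],
              celltype: Optional[str] = None,
              cluster_index: Optional[int] = None,
              cluster_name: Optional[str] = None) -> List[Cell]:
    """Hypothetical helper: return the (batch, cell) keys matching every given filter."""
    selected = []
    for cell, ann in annotations.items():
        if celltype is not None and ann["celltype"] != celltype:
            continue
        if cluster_index is not None and ann["cluster"] != cluster_index:
            continue
        if cluster_name is not None and ann["label"] != cluster_name:
            continue
        selected.append(cell)
    return selected


annotations = {
    ("batch1", "cell1"): {"cluster": 0, "label": "T cell #0", "celltype": "T cell"},
    ("batch1", "cell2"): {"cluster": 1, "label": "B cell #1", "celltype": "B cell"},
    ("batch2", "cell3"): {"cluster": 2, "label": "T cell #2", "celltype": "T cell"},
}
print(get_cells(annotations, celltype="T cell"))  # [('batch1', 'cell1'), ('batch2', 'cell3')]
```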


Function getAnomalyScores(): get anomaly score of cells based on some reference set

DigitalCellSorter.getAnomalyScores(trainingSet, testingSet, printResults=False)[source]

Get anomaly scores of cells based on a reference set

Parameters:
trainingSet: pandas.DataFrame

Cells to train the isolation forest on

testingSet: pandas.DataFrame

With cells to score

printResults: boolean, Default False

Whether to print results

Returns:
1d numpy.array

Anomaly score(s) of tested cell(s)

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

scores = DCS.getAnomalyScores(df_expr.iloc[:, 5:], df_expr.iloc[:, :5])
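The parameters above describe training an isolation forest on one set of cells and scoring another. A compact pure-Python sketch of that technique is given below for intuition, following the original isolation forest formulation (random feature, random split threshold, score 2^(-E[h(x)]/c(n)), higher meaning more anomalous). It is not the library's implementation: getAnomalyScores may rely on a library isolation forest whose score sign and scale differ. Here each cell is assumed to be a tuple of feature values (in the usage line above, cells are DataFrame columns), and all helper names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points: List[Sequence[float]], depth: int, max_depth: int,
               rng: random.Random) -> Dict:
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))          # random feature
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                  # cannot split a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)                   # random threshold within the range
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x: Sequence[float], node: Dict, depth: int = 0) -> float:
    if "size" in node:                            # leaf: add expected remaining depth
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_scores(training: List[Sequence[float]], testing: List[Sequence[float]],
                   n_trees: int = 100, sample_size: int = 256, seed: int = 0) -> List[float]:
    """Scores in (0, 1]; higher means more anomalous. Needs at least 2 training points."""
    rng = random.Random(seed)
    m = min(sample_size, len(training))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(training, m), 0, max_depth, rng) for _ in range(n_trees)]
    mean_paths = [sum(path_length(x, t) for t in trees) / n_trees for x in testing]
    return [2.0 ** (-h / c_factor(m)) for h in mean_paths]


# Training cells: a 10 x 10 grid of 2-feature points; testing: one inside, one far outside
training = [(i / 10, j / 10) for i in range(10) for j in range(10)]
testing = [(0.45, 0.45), (5.0, 5.0)]
scores = anomaly_scores(training, testing)
print(scores[1] > scores[0])  # True: the outlier is isolated in fewer splits
```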


Function getNewMarkerGenes(): extract new markers from the annotated clusters and produce plot of the new markers

DigitalCellSorter.getNewMarkerGenes(cluster=None, top=100, zScoreCutoff=None, removeUnknown=False, **kwargs)[source]

Extract new marker genes based on the cluster annotations

Parameters:
cluster: int, Default None

Cluster number; if provided, only genes of this cluster are returned

top: int, Default 100

Upper bound for number of new markers per cell type

zScoreCutoff: float, Default 0.3

Lower bound for a marker z-score to be significant

removeUnknown: boolean, Default False

Whether to remove type “Unknown”

Any parameters that function ‘makePlotOfNewMarkers’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.getNewMarkerGenes()
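The roles of top and zScoreCutoff can be illustrated with a simple z-score rule: for each gene, standardize its per-cluster mean expression across clusters, keep clusters where the z-score exceeds the cutoff, and retain at most top genes per cluster. This scoring scheme is an assumption made for illustration (getNewMarkerGenes may compute z-scores differently, and it limits markers per cell type rather than per cluster); the helper name new_markers is hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def new_markers(cluster_means: Dict[str, Dict[str, float]],
                top: int = 100,
                z_cutoff: float = 0.3) -> Dict[str, List[str]]:
    """Hypothetical helper: cluster_means[gene][cluster] is mean expression of gene in cluster."""
    candidates: Dict[str, List[Tuple[float, str]]] = {}
    for gene, by_cluster in cluster_means.items():
        values = list(by_cluster.values())
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # a gene with equal means everywhere cannot mark any cluster
        mu = statistics.mean(values)
        for cluster, value in by_cluster.items():
            # z-score of this cluster's mean relative to all clusters
            z = (value - mu) / sd
            if z > z_cutoff:
                candidates.setdefault(cluster, []).append((z, gene))
    # Keep at most `top` genes per cluster, highest z-score first
    return {c: [g for _, g in sorted(pairs, reverse=True)[:top]]
            for c, pairs in candidates.items()}


cluster_means = {
    "CD3E": {"0": 5.0, "1": 0.0, "2": 1.0},
    "MS4A1": {"0": 0.0, "1": 4.0, "2": 0.0},
    "ACTB": {"0": 3.0, "1": 3.0, "2": 3.0},
}
print(new_markers(cluster_means))  # {'0': ['CD3E'], '1': ['MS4A1']}
```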


Function getIndexOfGoodQualityCells(): Get index of cells that satisfy the QC criteria

DigitalCellSorter.getIndexOfGoodQualityCells(QCplotsSubDir='QC_plots', **kwargs)[source]

Get index of cells that satisfy the QC criteria

Parameters:
count_depth_cutoff: float, Default 0.5

Fraction of median to take as count depth cutoff

number_of_genes_cutoff: float, Default 0.5

Fraction of median to take as number of genes cutoff

mitochondrial_genes_cutoff: float, Default 3.0

The cutoff is median + standard_deviation * this_parameter

Any parameters that function ‘makeQualityControlHistogramPlot’ can accept

Returns:
pandas.Index

Index of cells

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

index = DCS.getIndexOfGoodQualityCells()
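The three documented criteria can be sketched as follows: a cell is kept if its count depth and its number of detected genes are at least the given fraction of the respective medians, and its mitochondrial fraction does not exceed median + standard deviation × mitochondrial_genes_cutoff. The data layout, passing the mitochondrial genes explicitly, defining the mitochondrial fraction as a share of counts, using the sample standard deviation, and the inclusive comparisons are assumptions; the helper name good_quality_cells is hypothetical.

```python
import statistics
from typing import Dict, List


def good_quality_cells(counts: Dict[str, List[float]],
                       mito_genes: List[str],
                       count_depth_cutoff: float = 0.5,
                       number_of_genes_cutoff: float = 0.5,
                       mitochondrial_genes_cutoff: float = 3.0) -> List[int]:
    """Hypothetical helper: counts maps gene -> per-cell counts; returns kept cell positions."""
    genes = list(counts)
    n_cells = len(counts[genes[0]])
    depth = [sum(counts[g][j] for g in genes) for j in range(n_cells)]
    n_genes = [sum(counts[g][j] > 0 for g in genes) for j in range(n_cells)]
    # Fraction of counts from mitochondrial genes (assumed definition)
    mito = [sum(counts[g][j] for g in mito_genes) / depth[j] if depth[j] else 0.0
            for j in range(n_cells)]
    # Documented cutoffs: fractions of the median, and median + sd * parameter
    min_depth = count_depth_cutoff * statistics.median(depth)
    min_genes = number_of_genes_cutoff * statistics.median(n_genes)
    max_mito = statistics.median(mito) + mitochondrial_genes_cutoff * statistics.stdev(mito)
    return [j for j in range(n_cells)
            if depth[j] >= min_depth and n_genes[j] >= min_genes and mito[j] <= max_mito]


counts = {
    "MT-CO1": [1, 1, 1, 1, 1],
    "ACTB": [10, 12, 11, 2, 9],
    "CD3E": [5, 0, 4, 0, 6],
    "CD19": [0, 3, 0, 0, 2],
}
print(good_quality_cells(counts, ["MT-CO1"]))  # [0, 1, 2, 4]: cell 3 has too few counts
```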


Function getCountsDataframe(): Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series

DigitalCellSorter.getCountsDataframe(se1, se2, tagForMissing='N/A')[source]

Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series

Parameters:
se1: pandas.Series

Series with the first set of items

se2: pandas.Series

Series with the second set of items

tagForMissing: str, Default ‘N/A’

Label to assign to non-overlapping items

Returns:
pandas.DataFrame

Contains counts

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

df = DCS.getCountsDataframe(se1, se2)
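The cross-counts idea can be sketched with plain dictionaries standing in for the two pandas.Series (cell ID mapped to label). Each cell contributes one count to the (label in se1, label in se2) pair, and a cell present in only one series receives tagForMissing on the other side. Returning a dict keyed by label pairs instead of a DataFrame, and this exact treatment of non-overlapping cells, are assumptions; the helper name cross_counts is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def cross_counts(se1: Dict[Hashable, str], se2: Dict[Hashable, str],
                 tag_for_missing: str = "N/A") -> Dict[Tuple[str, str], int]:
    """Hypothetical helper: se1/se2 map cell ID -> label, like two pandas.Series."""
    counts: Counter = Counter()
    for cell in set(se1) | set(se2):
        # A cell present in only one series gets tag_for_missing on the other side
        counts[(se1.get(cell, tag_for_missing), se2.get(cell, tag_for_missing))] += 1
    return dict(counts)


se1 = {"c1": "T cell", "c2": "T cell", "c3": "B cell"}
se2 = {"c1": "cluster 0", "c2": "cluster 0", "c4": "cluster 1"}
print(cross_counts(se1, se2))
```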


Visualization tools

Warning

Use these functions only after process()

References to DigitalCellSorter class:

makeProjectionPlotAnnotated(**kwargs)

Produce projection plot colored by cell types

makeProjectionPlotByBatches(**kwargs)

Produce projection plot colored by batches

makeProjectionPlotByClusters(**kwargs)

Produce projection plot colored by clusters

makeProjectionPlotsQualityControl(**kwargs)

Produce Quality Control projection plots

makeMarkerSubplots(**kwargs)

Produce subplots on each marker and its expression on all clusters

makeAnomalyScoresPlot([cells, suffix, noPlot])

Make anomaly scores plot

makeIndividualGeneTtestPlot(gene[, analyzeBy])

Produce individual gene t-test plot of the two-tailed p-value.

makeIndividualGeneExpressionPlot(genes, **kwargs)

Produce individual gene expression plot on a 2D layout

References to VisualizationFunctions class:

makeQualityControlHistogramPlot(*args, **kwargs)

Calculate the QC cutoff and visualize it on a histogram

makeHistogramNullDistributionPlot(*args, …)

Produce histogram plot of the voting null distributions

makeAnnotationResultsMatrixPlot(*args, **kwargs)

Produce voting results matrix plot

makeMarkerExpressionPlot(*args, **kwargs)

Produce an image of marker genes and their expression on all clusters.

makeStackedBarplot(*args, **kwargs)

Produce stacked barplot with cell fractions

makeSankeyDiagram(*args, **kwargs)

Make a Sankey diagram, also known as a ‘river plot’, with two groups of nodes

Function makeProjectionPlotAnnotated(): Produce t-SNE plot colored by cell types

DigitalCellSorter.makeProjectionPlotAnnotated(**kwargs)[source]

Produce projection plot colored by cell types

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotAnnotated()


Function makeProjectionPlotByBatches(): Produce t-SNE plot colored by batches

DigitalCellSorter.makeProjectionPlotByBatches(**kwargs)[source]

Produce projection plot colored by batches

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotByBatches()


Function makeProjectionPlotByClusters(): Produce t-SNE plot colored by clusters

DigitalCellSorter.makeProjectionPlotByClusters(**kwargs)[source]

Produce projection plot colored by clusters

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotByClusters()


Function makeQualityControlHistogramPlot(): Produce Quality Control histogram plots

DigitalCellSorter.makeQualityControlHistogramPlot(*args, **kwargs)

Calculate the QC cutoff and visualize it on a histogram

Parameters:
subset: pandas.Series

Data to analyze

cutoff: float

Cutoff to display

plotPathAndName: str, Default None

Text to include in the figure title and file name

N_bins: int, Default 100

Number of bins of the histogram

mito: boolean, Default False

Whether the analysis is of the mitochondrial genes fraction

displayMeasures: boolean, Default True

Display vertical dashed lines marking the mean, median, and standard deviation

precision: int, Default 4

Number of digits after the decimal point

quantilePlotCutoff: float, Default 0.99

Distributions are cut to display the range from 0 to quantilePlotCutoff

dpi: int, Default 600

Resolution of the figure image

extension: str, Default ‘png’

Format of the figure file

fontScale: float, Default 1.5

Scale most of the figure fonts

includeTitle: boolean, Default False

Whether to include title on the figure

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeQualityControlHistogramPlot(subset, cutoff)
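The interplay of N_bins and quantilePlotCutoff can be sketched as binning the data on the range from 0 to the quantilePlotCutoff quantile. This is an illustration under assumptions: the nearest-rank quantile method, equal-width bins, and placing the upper bound in the last bin are choices made here, not confirmed behavior of makeQualityControlHistogramPlot, and the helper name histogram_counts is hypothetical. Drawing the cutoff line and the summary measures is not shown.

```python
import math
from typing import List, Tuple


def histogram_counts(values: List[float], n_bins: int = 100,
                     quantile_plot_cutoff: float = 0.99) -> Tuple[List[int], float]:
    """Hypothetical helper: bin non-negative values on [0, quantile] like the QC histogram."""
    data = sorted(values)
    # Upper display bound: the quantile_plot_cutoff quantile (nearest-rank method, assumed)
    rank = max(1, math.ceil(quantile_plot_cutoff * len(data)))
    upper = data[rank - 1]
    width = upper / n_bins if upper > 0 else 1.0
    counts = [0] * n_bins
    for v in data:
        if 0 <= v <= upper:
            # The value equal to `upper` falls into the last bin
            counts[min(int(v / width), n_bins - 1)] += 1
    return counts, upper


values = list(range(1, 101))
counts, upper = histogram_counts(values, n_bins=10, quantile_plot_cutoff=0.9)
print(upper, sum(counts))  # 90 90
```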


Function makeProjectionPlotsQualityControl(): Produce Quality Control t-SNE plots

DigitalCellSorter.makeProjectionPlotsQualityControl(**kwargs)[source]

Produce Quality Control projection plots

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotsQualityControl()


Function makeMarkerSubplots(): Produce subplots on each marker and its expression on all clusters

DigitalCellSorter.makeMarkerSubplots(**kwargs)[source]

Produce subplots on each marker and its expression on all clusters

Parameters:

Any parameters that function ‘internalMakeMarkerSubplots’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeMarkerSubplots()


Function makeAnomalyScoresPlot(): Make anomaly scores plot

DigitalCellSorter.makeAnomalyScoresPlot(cells='All', suffix='', noPlot=False, **kwargs)[source]

Make anomaly scores plot

Parameters:
cells: pandas.MultiIndex, Default ‘All’

Index of cells of interest

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

cells = DCS.getCells(celltype='T cell')

DCS.makeAnomalyScoresPlot(cells)


Function makeIndividualGeneTtestPlot(): Produce individual gene t-test plot of the two-tailed p-value

DigitalCellSorter.makeIndividualGeneTtestPlot(gene, analyzeBy='label', **kwargs)[source]

Produce individual gene t-test plot of the two-tailed p-value.

Parameters:
gene: str

Name of gene of interest

analyzeBy: str, Default ‘label’

What level of labels to include. Other possible options are ‘cluster’ and ‘celltype’

Any parameters that function ‘makeTtestPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeIndividualGeneTtestPlot('SDC1')
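The two-tailed p-value shown by this plot comes from a t-test. The sketch below computes a Welch t statistic for a gene's expression in one group versus the remaining cells, with a normal approximation for the two-tailed p-value so that it needs only the standard library. Comparing one group against all other cells and the unequal-variance form are assumptions about what the plot tests; for exact p-values the t distribution should be used (for example scipy.stats.ttest_ind with equal_var=False). The helper name welch_t_test is hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def welch_t_test(group: List[float], rest: List[float]) -> Tuple[float, float]:
    """Hypothetical helper: Welch t statistic and an approximate two-tailed p-value."""
    se = math.sqrt(statistics.variance(group) / len(group) +
                   statistics.variance(rest) / len(rest))
    t = (statistics.mean(group) - statistics.mean(rest)) / se
    # Normal approximation of the t distribution; only reasonable for large groups
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, p


group = [5.0, 6.0, 7.0, 6.0, 5.0, 6.0, 7.0, 6.0]
rest = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
t, p = welch_t_test(group, rest)
print(t > 0, p < 0.05)  # True True
```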


Function makeIndividualGeneExpressionPlot(): Produce individual gene expression plot on a 2D layout

DigitalCellSorter.makeIndividualGeneExpressionPlot(genes, **kwargs)[source]

Produce individual gene expression plot on a 2D layout

Parameters:
genes: str or list-like

Name of gene of interest. E.g. ‘CD4, CD33’, ‘PECAM1’, [‘CD4’, ‘CD33’]

hideClusterLabels: boolean, Default False

Whether to hide the clusters labels

outlineClusters: boolean, Default True

Whether to outline the clusters with circles

Any parameters that function ‘internalMakeMarkerSubplots’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeIndividualGeneExpressionPlot('CD4')


Function makeHistogramNullDistributionPlot(): Produce histogram plot of the voting null distributions

DigitalCellSorter.makeHistogramNullDistributionPlot(*args, **kwargs)

Produce histogram plot of the voting null distributions

Parameters:
dpi: int, Default 600

Resolution of the figure image

extension: str, Default ‘png’

Format of the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeHistogramNullDistributionPlot()


Function makeAnnotationResultsMatrixPlot(): Produce voting results matrix plot

DigitalCellSorter.makeAnnotationResultsMatrixPlot(*args, **kwargs)

Produce voting results matrix plot

Parameters:
dpi: int, Default 600

Resolution of the figure image

extension: str, Default ‘png’

Format of the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeAnnotationResultsMatrixPlot()


Function makeMarkerExpressionPlot(): Produce an image of marker genes and their expression on all clusters

DigitalCellSorter.makeMarkerExpressionPlot(*args, **kwargs)

Produce an image of marker genes and their expression on all clusters. Uses files generated by function DCS.Vote

Parameters:
dpi: int, Default 600

Resolution of the figure image

extension: str, Default ‘png’

Format of the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeMarkerExpressionPlot()


Function makeStackedBarplot(): Produce stacked barplot with cell fractions

DigitalCellSorter.makeStackedBarplot(*args, **kwargs)

Produce stacked barplot with cell fractions

Parameters:
clusterName: str, Default None

Label to include at the bottom of the bar. If None, the value of self.dataName is used

legendStyle: boolean, Default False

Select one of the two available styles of this figure

includeLowQC: boolean, Default True

Whether to include low quality cells

dpi: int, Default 600

Resolution of the figure image

extension: str, Default ‘png’

Format of the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeStackedBarplot(clusterName)
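Each stacked bar shows the fraction of cells assigned to each cell type, optionally including low quality cells. The sketch below computes such fractions from a list of per-cell labels. The placeholder label 'Failed QC' for low quality cells is a hypothetical name chosen for illustration, as is the helper name cell_fractions; the plotting itself is not shown.

```python
from collections import Counter
from typing import Dict, List


def cell_fractions(celltypes: List[str], include_low_qc: bool = True,
                   low_qc_label: str = "Failed QC") -> Dict[str, float]:
    """Hypothetical helper: fraction of cells per cell type for one stacked bar."""
    # low_qc_label is a placeholder name for low quality cells (assumption)
    labels = celltypes if include_low_qc else [c for c in celltypes if c != low_qc_label]
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in sorted(counts.items())}


celltypes = ["T cell"] * 6 + ["B cell"] * 2 + ["Failed QC"] * 2
print(cell_fractions(celltypes))                        # {'B cell': 0.2, 'Failed QC': 0.2, 'T cell': 0.6}
print(cell_fractions(celltypes, include_low_qc=False))  # {'B cell': 0.25, 'T cell': 0.75}
```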


Function makeSankeyDiagram(): Make a Sankey diagram, also known as a ‘river plot’, with two groups of nodes

DigitalCellSorter.makeSankeyDiagram(*args, **kwargs)

Make a Sankey diagram, also known as a ‘river plot’, with two groups of nodes

Parameters:
df: pandas.DataFrame

With counts (overlaps)

colormapForIndex: dictionary, Default None

Colors to use for nodes specified in the DataFrame index

colormapForColumns: dictionary, Default None

Colors to use for nodes specified in the DataFrame columns

linksColor: str, Default ‘rgba(100,100,100,0.6)’

Color of the non-overlapping links

title: str, Default ‘’

Title to print on the diagram

interactive: boolean, Default False

Whether to launch interactive JavaScript-based graph

quality: int, Default 4

Proportional to the resolution of the figure to save

nodeLabelsFontSize: int, Default 15

Font size for node labels

nameAppend: str, Default ‘_Sankey_diagram’

Name to append to the figure file

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeSankeyDiagram(df)
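The diagram links two groups of nodes, the DataFrame index labels and the column labels, with widths given by the overlap counts (such as those returned by getCountsDataframe()). The sketch below converts a counts table, represented here as a nested dict, into a node list and source/target/value link arrays. That link layout mirrors the one used by Plotly's Sankey trace, but whether makeSankeyDiagram uses Plotly is not stated in this reference; the helper name sankey_links and the example counts are hypothetical.

```python
from typing import Dict, List, Tuple


def sankey_links(df: Dict[str, Dict[str, int]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Hypothetical helper: df[row_label][column_label] is an overlap count."""
    left = list(df)                                              # index nodes
    right = sorted({col for row in df.values() for col in row})  # column nodes
    nodes = left + right
    links: Dict[str, List[int]] = {"source": [], "target": [], "value": []}
    for i, row_label in enumerate(left):
        for col_label, n in df[row_label].items():
            if n > 0:  # empty overlaps produce no link
                links["source"].append(i)
                links["target"].append(len(left) + right.index(col_label))
                links["value"].append(n)
    return nodes, links


df = {"T cell": {"0": 40, "1": 2}, "B cell": {"1": 30, "0": 0}}
nodes, links = sankey_links(df)
print(nodes)  # ['T cell', 'B cell', '0', '1']
print(links)  # {'source': [0, 0, 1], 'target': [2, 3, 3], 'value': [40, 2, 30]}
```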
