Core class¶

Submodule core

The main class of DigitalCellSorter. The class includes tools for:

Preprocessing of single cell RNA sequencing data
Quality control
Batch effects correction
Cells anomaly score evaluation
Dimensionality reduction
Clustering
Annotation of cell types
Visualization
Post-processing

class DigitalCellSorter(df_expr=None, dataName='dataName', species='Human', geneNamesType='alias', geneListFileName=None, mitochondrialGenes=None, sigmaOverMeanSigma=0.01, nClusters=10, nFineClusters=3, doFineClustering=True, splitFineClusters=False, subSplitSize=100, medianScaleFactor=10000, minSizeForFineClustering=50, clusteringFunction=<class 'sklearn.cluster._agglomerative.AgglomerativeClustering'>, nComponentsPCA=200, nSamples_pDCS=3000, nSamples_Hopfield=200, saveDir='', makeMarkerSubplots=False, availableCPUsCount=2, zScoreCutoff=0.3, subclusteringName=None, doQualityControl=True, doBatchCorrection=False, makePlots=True, useUnderlyingNetwork=True, minimumNumberOfMarkersPerCelltype=10, nameForUnknown='Unassigned', nameForLowQC='Failed QC', matplotlibMode='Agg', countDepthCutoffQC=0.5, numberOfGenesCutoffQC=0.5, mitochondrialGenesCutoffQC=1.5, excludedFromQC=None, countDepthPrecutQC=500, numberOfGenesPrecutQC=250, precutQC=False, minSubclusterSize=25, thresholdForUnknown_pDCS=0.0, thresholdForUnknown_ratio=0.0, thresholdForUnknown_Hopfield=0.0, thresholdForUnknown=0.2, layout='TSNE', safePlotting=True, HopfieldTemperature=0.1, annotationMethod='ratio-pDCS-Hopfield', useNegativeMarkers=True, removeLowQualityScores=True, updateConversionDictFile=True, verbose=1, random_state=None)[source]¶

Bases: DigitalCellSorter.VisualizationFunctions.VisualizationFunctions

Class of Digital Cell Sorter with methods for processing single cell RNA-seq data. Includes analyses and visualization tools.

Parameters:

df_expr: pandas.DataFrame, Default None

Gene expression in the form of a table, where genes are rows and cells/batches are columns

dataName: str, Default ‘dataName’

Name used in output files

geneNamesType: str, Default ‘alias’

Input gene name convention

geneListFileName: str, Default None

Name of the marker genes file

mitochondrialGenes: list, Default None

List of mitochondrial genes to use in quality control

sigmaOverMeanSigma: float, Default 0.01

Threshold to consider a gene constant

nClusters: int, Default 10

Number of clusters

nFineClusters: int, Default 3

Number of fine clusters to determine with the Spectral Co-clustering routine. This option is ignored if doFineClustering is False.

doFineClustering: boolean, Default True

Whether to do fine clustering or not

minSizeForFineClustering: int, Default 50

Minimum number of cells required to do fine clustering of a cluster. This option is ignored if doFineClustering is False.

clusteringFunction: function, Default AgglomerativeClustering

Clustering function to use. Other options: KMeans, {'k_neighbors':40}, etc. Note: the function should have a .fit method and the same input and output. For network-based clustering pass a dictionary {'k_neighbors':40, 'metric':'euclidean', 'clusterExpression':True}; this way the best number of clusters will be determined automatically.

nComponentsPCA: int, Default 200

Number of PCA components

nSamples_pDCS: int, Default 3000

Number of random samples in distribution for pDCS annotation method

nSamples_Hopfield: int, Default 200

Number of repetitions for Hopfield annotation method

saveDir: str, Default os.path.join(‘’)

Directory for output files

makeMarkerSubplots: boolean, Default False

Whether to make subplots on markers

makePlots: boolean, Default True

Whether to make all major plots

availableCPUsCount: int, Default min(12, os.cpu_count())

Number of CPUs used in pDCS method

zScoreCutoff: float, Default 0.3

Z-Score cutoff when setting expression of a cluster as significant

thresholdForUnknown: float, Default 0.2

Threshold when assigning label “Unknown”. This option is used only with a combination of 2 or more annotation methods

thresholdForUnknown_pDCS: float, Default 0.0

Threshold when assigning label “Unknown” in pDCS method

thresholdForUnknown_ratio: float, Default 0.0

Threshold when assigning label “Unknown” in ratio method

thresholdForUnknown_Hopfield: float, Default 0.0

Threshold when assigning label “Unknown” in Hopfield method

annotationMethod: str, Default ‘ratio-pDCS-Hopfield’

Method to use for annotation of cell types to clusters. Options are:

‘pDCS’: main DCS voting scheme with null testing

‘ratio’: simple voting score

‘Hopfield’: Hopfield Network classifier

‘pDCS-ratio’: ‘pDCS’ adjusted with ‘ratio’

‘pDCS-Hopfield’: ‘pDCS’ adjusted with ‘Hopfield’

‘ratio-Hopfield’: ‘ratio’ adjusted with ‘Hopfield’

‘pDCS-ratio-Hopfield’: ‘pDCS’ adjusted with ‘ratio’ and ‘Hopfield’

subclusteringName: str, Default None

Parameter used for certain labels on plots

doQualityControl: boolean, Default True

Whether to remove low quality cells

doBatchCorrection: boolean, Default False

Whether to correct data for batches

minimumNumberOfMarkersPerCelltype: int, Default 10

Minimum number of markers per cell type to keep that cell type in annotation options

nameForUnknown: str, Default ‘Unassigned’

Name to use for clusters where label assignment yielded uncertain results

nameForLowQC: str, Default ‘Failed QC’

Name to use for cells that do not pass quality control

layout: str, Default ‘TSNE’

Projection layout used in visualization. Options are:

‘TSNE’: t-SNE layout L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.

‘PCA’: use two largest principal components

‘UMAP’: use uniform manifold approximation, McInnes, L., Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018

‘PHATE’: use potential of heat diffusion for affinity-based transition embedding, Moon, K.R., van Dijk, D., Wang, Z. et al. Visualizing structure and transitions in high-dimensional biological data. Nat Biotechnol 37, 1482–1492 (2019).

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.clean()
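
As a rough illustration of how thresholdForUnknown and nameForUnknown might interact, the sketch below assigns each cluster its best-scoring cell type and falls back to the unknown label when the best score is below the threshold. This is a plain-Python simplification: the `scores` dictionary, its score scale, and the `assign_labels` helper are assumptions made for illustration, not part of the package.

```python
def assign_labels(scores, threshold=0.2, name_for_unknown='Unassigned'):
    """Pick the best-scoring cell type per cluster (hypothetical helper).

    scores: {cluster: {cell type: score}}; the score scale is an assumption.
    """
    labels = {}
    for cluster, type_scores in scores.items():
        best_type = max(type_scores, key=type_scores.get)
        if type_scores[best_type] < threshold:
            # best score is too weak: use the label for uncertain clusters
            labels[cluster] = name_for_unknown
        else:
            labels[cluster] = best_type
    return labels


scores = {0: {'T cell': 0.7, 'B cell': 0.1},
          1: {'T cell': 0.15, 'B cell': 0.1}}
print(assign_labels(scores))
```

Whether the comparison in the package is strict or inclusive is not stated above; the sketch treats a score equal to the threshold as acceptable.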

Methods:

`KeyInFile`(key, file)	Check if a key exists in an HDF file.
`alignSeries`(se1, se2, tagForMissing)	Align two pandas.Series
`annotate`([mapNonexpressedCelltypes])	Produce cluster voting results, annotate cell types, and update marker expression with cell type labels
`annotateWith_Hopfield_Scheme`(…)	Produce cluster annotation results
`annotateWith_pDCS_Scheme`(df_markers_expr, …)	Produce cluster annotation results
`annotateWith_ratio_Scheme`(df_markers_expr, …)	Produce cluster annotation results
`batchEffectCorrection`([method])	Batch effect correction.
`calculateQCmeasures`()	Calculate Quality Control (QC) measures
`calculateV`(args)	Calculate the voting scores (celltypes by clusters)
`clean`()	Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns
`cluster`()	Cluster PCA-reduced data into a desired number of clusters
`convert`([nameFrom, nameTo])	Convert index to hugo names; if any names in the index are duplicated, remove duplicates
`convertColormap`(colormap)	Convert colormap from the form (1.,1.,1.,1.) to ‘rgba(255,255,255,1.)’
`createReverseDictionary`(inputDictionary)	Efficient way to create a reverse dictionary from a dictionary.
`getAnomalyScores`(trainingSet, testingSet[, …])	Function to get anomaly score of cells based on some reference set
`getCells`([celltype, clusterIndex, clusterName])	Get cell annotations in a form of pandas.Series
`getCountsDataframe`(se1, se2[, tagForMissing])	Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series
`getExprOfCells`(cells)	Get expression of a set of cells.
`getExprOfGene`(gene[, analyzeBy])	Get expression of a gene.
`getHugoName`(gene[, printAliases])	Get gene hugo name(s).
`getIndexOfGoodQualityCells`([QCplotsSubDir])	Get index of cells that satisfy the QC criteria
`getNewMarkerGenes`([cluster, top, …])	Extract new marker genes based on the cluster annotations
`getQualityControlCutoff`(se, cutoff[, …])	Function to calculate QC quality cutoff
`getSubnetworkOfPCN`(subnetworkGenes[, …])	Extract subnetwork of PCN network
`loadAnnotatedLabels`([detailed, …])	Load cell annotations resulting from function ‘annotate’
`loadExpressionData`()	Load processed expression data from the internal HDF storage.
`makeAnomalyScoresPlot`([cells, suffix, noPlot])	Make anomaly scores plot
`makeHopfieldLandscapePlot`([…])	Make and plot Hopfield landscape
`makeIndividualGeneExpressionPlot`(genes, **kwargs)	Produce individual gene expression plot on a 2D layout
`makeIndividualGeneTtestPlot`(gene[, analyzeBy])	Produce individual gene t-test plot of the two-tailed p-value.
`makeMarkerSubplots`(**kwargs)	Produce subplots on each marker and its expression on all clusters
`makeProjectionPlotAnnotated`(**kwargs)	Produce projection plot colored by cell types
`makeProjectionPlotByBatches`(**kwargs)	Produce projection plot colored by batches
`makeProjectionPlotByClusters`(**kwargs)	Produce projection plot colored by clusters
`makeProjectionPlotsQualityControl`(**kwargs)	Produce Quality Control projection plots
`mergeIndexDuplicates`(df_expr[, method, …])	Merge index duplicates
`normalize`([median])	Normalize pandas.DataFrame: rescale all cells, log-transform data, remove constant genes, sort index
`prepare`(obj)	Prepare pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.
`prepareMarkers`([expressedGenes, …])	Get dictionary of markers for each cell type.
`process`([dataIsNormalized, cleanData])	Process data before using any annotation or visualization functions
`project`([PCAonly, do_fast_tsne])	Project pandas.DataFrame to lower dimensions
`propagateHopfield`([sigma, xi, T, tmax, …])	Function is used internally to propagate Hopfield network over a set number of time steps
`qualityControl`(**kwargs)	Remove low quality cells
`readMarkerFile`([mergeFunction, mergeCutoff])	Read markers file, prepare markers
`recordAnnotationResults`(df_marker_cell_type, …)	Record cell type annotation results to spreadsheets.
`recordExpressionData`()	Record expression data to the internal HDF storage.
`visualize`()	Aggregate of visualization tools of this class.
`zScoreOfSeries`(se)	Calculate z-score of pandas.Series and modify the Series in place

Attributes:

`df_expr`
`fileHDFpath`
`geneListFileName`
`saveDir`

property saveDir¶

property fileHDFpath¶

property df_expr¶

property geneListFileName¶

prepare(obj)[source]¶

Prepare pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.

Parameters:

obj: str, pandas.DataFrame, pandas.Series: Expression data in a form of pandas.DataFrame, pandas.Series, or name and path to a csv file with data

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.prepare('data.csv')

convert(nameFrom=None, nameTo=None, **kwargs)[source]¶

Convert index to hugo names; if any names in the index are duplicated, remove duplicates

Parameters:

nameFrom: str, Default ‘alias’: Gene name type to convert from
nameTo: str, Default ‘hugo’: Gene name type to convert to

Any parameters that function ‘mergeIndexDuplicates’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.convert()

clean()[source]¶

Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.clean()

normalize(median=None)[source]¶

Normalize pandas.DataFrame: rescale all cells, log-transform data, remove constant genes, sort index

Parameters:

median: float, Default None: Scale factor, if not provided will be computed as median across all cells in data

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.normalize()
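
The rescaling and log-transform steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the `normalize_cells` helper and its dictionary input are hypothetical, log(1 + x) is an assumed transform, and the removal of constant genes and the index sorting are left out.

```python
import math
import statistics


def normalize_cells(cells, scale=None):
    """Rescale each cell to a common total, then log-transform.

    cells: {cell name: {gene: raw count}} (hypothetical input layout).
    scale: target total per cell; defaults to the median of cell totals.
    log(1 + x) is an assumption; the package may use another base or offset.
    """
    totals = {name: sum(counts.values()) for name, counts in cells.items()}
    if scale is None:
        scale = statistics.median(totals.values())
    normalized = {}
    for name, counts in cells.items():
        factor = scale / totals[name]
        normalized[name] = {gene: math.log1p(value * factor)
                            for gene, value in counts.items()}
    return normalized


cells = {'cell1': {'CD4': 2, 'CD33': 8},
         'cell2': {'CD4': 15, 'CD33': 15}}
result = normalize_cells(cells)
print(result)
```

The sketch assumes no cell has a total count of zero, which is consistent with clean() removing all-zero rows and columns beforehand.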

project(PCAonly=False, do_fast_tsne=True)[source]¶

Project pandas.DataFrame to lower dimensions

Parameters:

PCAonly: boolean, Default False: Perform Principal component analysis only
do_fast_tsne: boolean, Default True: Do FI-tSNE instead of “exact” tSNE. This option is ignored if layout is not ‘TSNE’

Returns:

tuple: Processed data

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

xPCA, PCs, tSNE = DCS.project()

cluster()[source]¶

Cluster PCA-reduced data into a desired number of clusters

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.cluster()

annotate(mapNonexpressedCelltypes=True)[source]¶

Produce cluster voting results, annotate cell types, and update marker expression with cell type labels

Parameters:

mapNonexpressedCelltypes: boolean, Default True: If True, cell type coloring will be consistent across all datasets, regardless of what cell types are annotated in all datasets for a given input marker list file.

Returns:

dictionary: Voting results, a dictionary in form of: {cluster label: assigned cell type}

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

results = DCS.annotate()

process(dataIsNormalized=False, cleanData=True)[source]¶

Process data before using any annotation or visualization functions

Parameters:

dataIsNormalized: boolean, Default False: Whether DCS.df_expr is normalized or not

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

visualize()[source]¶

Aggregate of visualization tools of this class.

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.visualize()

makeProjectionPlotAnnotated(**kwargs)[source]¶

Produce projection plot colored by cell types

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotAnnotated()

makeProjectionPlotByBatches(**kwargs)[source]¶

Produce projection plot colored by batches

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotByBatches()

makeProjectionPlotByClusters(**kwargs)[source]¶

Produce projection plot colored by clusters

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotByClusters()

makeProjectionPlotsQualityControl(**kwargs)[source]¶

Produce Quality Control projection plots

Parameters:

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeProjectionPlotsQualityControl()

makeMarkerSubplots(**kwargs)[source]¶

Produce subplots on each marker and its expression on all clusters

Parameters:

Any parameters that function ‘internalMakeMarkerSubplots’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.makeMarkerSubplots()

makeAnomalyScoresPlot(cells='All', suffix='', noPlot=False, **kwargs)[source]¶

Make anomaly scores plot

Parameters:

cells: pandas.MultiIndex, Default ‘All’: Index of cells of interest

Any parameters that function ‘makeProjectionPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

cells = DCS.getCells(celltype='T cell')

DCS.makeAnomalyScoresPlot(cells)

makeIndividualGeneTtestPlot(gene, analyzeBy='label', **kwargs)[source]¶

Produce individual gene t-test plot of the two-tailed p-value.

Parameters:

gene: str: Name of gene of interest
analyzeBy: str, Default ‘label’: What level of labels to include. Other possible options are ‘cluster’ and ‘celltype’

Any parameters that function ‘makeTtestPlot’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeIndividualGeneTtestPlot('SDC1')

makeIndividualGeneExpressionPlot(genes, **kwargs)[source]¶

Produce individual gene expression plot on a 2D layout

Parameters:

genes: str, or list-like: Name(s) of gene(s) of interest. E.g. ‘CD4, CD33’, ‘PECAM1’, [‘CD4’, ‘CD33’]
hideClusterLabels: boolean, Default False: Whether to hide the clusters labels
outlineClusters: boolean, Default True: Whether to outline the clusters with circles

Any parameters that function ‘internalMakeMarkerSubplots’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeIndividualGeneExpressionPlot('CD4')

makeHopfieldLandscapePlot(meshSamplingRate=1000, plot3D=True, reuseData=False, **kwargs)[source]¶

Make and plot Hopfield landscape

Parameters:

meshSamplingRate: int, Default 1000: Defines quality of sampling around attractor states
plot3D: boolean, Default True: Whether to plot a 3D figure (True) or a 2D figure (False)
reuseData: boolean, Default False: Whether to attempt using precalculated data.

Any parameters that function ‘HopfieldLandscapePlot’ or ‘HopfieldLandscapePlot3D’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.makeHopfieldLandscapePlot()

getAnomalyScores(trainingSet, testingSet, printResults=False)[source]¶

Function to get anomaly score of cells based on some reference set

Parameters:

trainingSet: pandas.DataFrame: With cells to train isolation forest on
testingSet: pandas.DataFrame: With cells to score
printResults: boolean, Default False: Whether to print results

Returns:

1d numpy.array: Anomaly score(s) of tested cell(s)

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

scores = DCS.getAnomalyScores(df_expr.iloc[:, 5:], df_expr.iloc[:, :5])

getHugoName(gene, printAliases=False)[source]¶

Get gene hugo name(s).

Parameters:

gene: str: ‘hugo’ or ‘alias’ name of a gene

Returns:

str: Hugo name if found, otherwise input name

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.getHugoName('CD138')

getExprOfGene(gene, analyzeBy='cluster')[source]¶

Get expression of a gene. Run this function only after function process()

Parameters:

gene: str: Name of gene of interest
analyzeBy: str, Default ‘cluster’: What level of labels to include. Other possible options are ‘label’ and ‘celltype’

Returns:

pandas.DataFrame: With expression of the gene of interest

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.getExprOfGene('SDC1')

getExprOfCells(cells)[source]¶

Get expression of a set of cells. Run this function only after function process()

Parameters:

cells: pandas.MultiIndex: 2-level Index of cells of interest, must include levels ‘batch’ and ‘cell’

Returns:

pandas.DataFrame: With expression of the cells of interest

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

DCS.getExprOfCells(cells)

getCells(celltype=None, clusterIndex=None, clusterName=None)[source]¶

Get cell annotations in a form of pandas.Series

Parameters:

celltype: str, Default None: Cell type to extract
clusterIndex: int, Default None: Index of the cluster to extract
clusterName: str, Default None: Name of the cluster to extract

Returns:

pandas.MultiIndex: Index of labelled cells

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.process()

labels = DCS.getCells()

getIndexOfGoodQualityCells(QCplotsSubDir='QC_plots', **kwargs)[source]¶

Get index of cells that satisfy the QC criteria

Parameters:

count_depth_cutoff: float, Default 0.5: Fraction of median to take as count depth cutoff
number_of_genes_cutoff: float, Default 0.5: Fraction of median to take as number of genes cutoff
mitochondrial_genes_cutoff: float, Default 3.0: The cutoff is median + standard_deviation * this_parameter

Any parameters that function ‘makeQualityControlHistogramPlot’ can accept

Returns:

pandas.Index: Index of cells

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

index = DCS.getIndexOfGoodQualityCells()
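
The two cutoff rules described in the parameters above (a fraction of the median, and median plus a multiple of the standard deviation) can be sketched as follows. This is a minimal plain-Python illustration: the `lower_cutoff` and `mito_cutoff` helpers are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def lower_cutoff(values, fraction=0.5):
    """Cutoff as a fraction of the median (count depth, number of genes)."""
    return fraction * statistics.median(values)


def mito_cutoff(values, n_std=3.0):
    """Cutoff as median + n_std * standard deviation (mitochondrial fraction).

    The population standard deviation is an assumption made for this sketch.
    """
    return statistics.median(values) + n_std * statistics.pstdev(values)


count_depth = [100, 200, 300, 400, 1000]
mito_fraction = [0.02, 0.04, 0.06]
print(lower_cutoff(count_depth))
print(mito_cutoff(mito_fraction))
```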

getQualityControlCutoff(se, cutoff, precut=1.0, mito=False, MakeHistogramPlot=True, **kwargs)[source]¶

Function to calculate QC quality cutoff

Parameters:

se: pandas.Series: With data to analyze
cutoff: float: Parameter for calculating the quality control cutoff
mito: boolean, Default False: Whether the analysis is of the mitochondrial genes fraction
plotPathAndName: str, Default None: Text to include in the figure title and file name
MakeHistogramPlot: boolean, Default True: Whether to make a histogram plot

Any parameters that function ‘makeQualityControlHistogramPlot’ can accept

Returns:

float: Cutoff value

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

cutoff = DCS.getQualityControlCutoff(se, 0.5)

getCountsDataframe(se1, se2, tagForMissing='N/A')[source]¶

Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series

Parameters:

se1: pandas.Series: Series with the first set of items
se2: pandas.Series: Series with the second set of items
tagForMissing: str, Default ‘N/A’: Label to assign to non-overlapping items

Returns:

pandas.DataFrame: Contains counts

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

df = DCS.getCountsDataframe(se1, se2)
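
To illustrate what cross-counts between two labelings look like, the sketch below counts label pairs over the union of items, tagging items missing from either side. It uses plain dictionaries in place of pandas.Series; the `cross_counts` helper and the example labels are hypothetical.

```python
from collections import Counter


def cross_counts(labels1, labels2, tag_for_missing='N/A'):
    """Count how often each pair of labels co-occurs across items.

    labels1, labels2: {item: label}. Items present in only one mapping
    are paired with tag_for_missing.
    """
    counts = Counter()
    for item in sorted(set(labels1) | set(labels2)):
        pair = (labels1.get(item, tag_for_missing),
                labels2.get(item, tag_for_missing))
        counts[pair] += 1
    return dict(counts)


run1 = {'c1': 'T cell', 'c2': 'T cell', 'c3': 'B cell'}
run2 = {'c2': 'T cell', 'c3': 'NK cell', 'c4': 'B cell'}
table = cross_counts(run1, run2)
print(table)
```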

getNewMarkerGenes(cluster=None, top=100, zScoreCutoff=None, removeUnknown=False, **kwargs)[source]¶

Extract new marker genes based on the cluster annotations

Parameters:

cluster: int, Default None: Cluster #, if provided genes of only this cluster will be returned
top: int, Default 100: Upper bound for number of new markers per cell type
zScoreCutoff: float, Default 0.3: Lower bound for a marker z-score to be significant
removeUnknown: boolean, Default False: Whether to remove type “Unknown”

Any parameters that function ‘makePlotOfNewMarkers’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.getNewMarkerGenes()
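
The roles of the top and zScoreCutoff parameters can be illustrated with a small selection routine. This is only a sketch under assumptions: the `top_markers` helper and the per-gene z-score dictionary are hypothetical, and the package's actual ranking may differ.

```python
def top_markers(z_scores, top=100, z_score_cutoff=0.3):
    """Return up to `top` genes whose z-score is at least the cutoff,
    ordered from highest to lowest score (hypothetical helper)."""
    passing = [(gene, z) for gene, z in z_scores.items() if z >= z_score_cutoff]
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return [gene for gene, _ in passing[:top]]


cluster_z = {'CD3E': 2.1, 'CD19': -0.4, 'CD4': 0.9, 'ACTB': 0.1}
print(top_markers(cluster_z, top=2))
```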

classmethod calculateV(args)[source]¶

Calculate the voting scores (celltypes by clusters)

Parameters:

args: tuple

Tuple of sub-arguments

df_M: pandas.DataFrame: Marker cell type DataFrame
df_X: pandas.DataFrame: Markers expression DataFrame
cluster_index: 1d numpy.array: Clustering index
cutoff: float: Significance cutoff, i.e. a threshold for a given marker to be significant
giveSignificant: boolean: Whether to return the significance matrix along with the scores
removeLowQCscores: boolean: Whether to remove low quality scores, i.e. those with less than 10% of markers that are supporting

Returns:

pandas.DataFrame: Contains voting scores per celltype per cluster

Usage:

Function is used internally.

df = calculateV((df_M, df_X, cluster_index, 0.3, False, True))

annotateWith_pDCS_Scheme(df_markers_expr, df_marker_cell_type)[source]¶

Produce cluster annotation results

Parameters:

df_markers_expr: pandas.DataFrame: Data with marker genes by cells expression
df_marker_cell_type: pandas.DataFrame: Data with marker genes by cell types

Returns:

tuple

Usage:

Function should be called internally only

annotateWith_ratio_Scheme(df_markers_expr, df_marker_cell_type)[source]¶

Produce cluster annotation results

Parameters:

df_markers_expr: pandas.DataFrame: Data with marker genes by cells expression
df_marker_cell_type: pandas.DataFrame: Data with marker genes by cell types

Returns:

tuple

Usage:

Function should be called internally only

annotateWith_Hopfield_Scheme(df_markers_expr, df_marker_cell_type)[source]¶

Produce cluster annotation results

Parameters:

df_markers_expr: pandas.DataFrame: Markers expression DataFrame
df_marker_cell_type: pandas.DataFrame: Marker cell type DataFrame

Returns:

tuple

Usage:

Function should be called internally only

recordAnnotationResults(df_marker_cell_type, df_markers_expr, df_L, df_V, dict_expressed_markers, df_null_distributions=None)[source]¶

Record cell type annotation results to spreadsheets.

Parameters:

df_marker_cell_type: pandas.DataFrame: Markers to cell types table
df_markers_expr: pandas.DataFrame: Markers expression in each cluster
df_L: pandas.DataFrame: Annotation scores along with other information
df_V: pandas.DataFrame: Annotation scores along with other information
dict_expressed_markers: dictionary: Dictionary of markers significantly expressed in each cluster
df_null_distributions: pandas.DataFrame, Default None: Table with null distributions

Returns:

None

Usage:

This function is intended to be used internally only

propagateHopfield(sigma=None, xi=None, T=0.2, tmax=200, fractionToUpdate=0.5, mode=4, meshSamplingRate=200, underlyingNetwork=None, typesNames=None, clustersNames=None, printInfo=False, recordTrajectories=True, id=None, printSwitchingFraction=False, path=None, verbose=0)[source]¶

Function is used internally to propagate Hopfield network over a set number of time steps

Parameters:

sigma: pandas.DataFrame, Default None

Markers expression

xi: pandas.DataFrame, Default None

Marker cell type DataFrame

T: float, Default 0.2

Noise (Temperature) parameter

tmax: int, Default 200

Number of steps to iterate through

fractionToUpdate: float, Default 0.5

Fraction of nodes to randomly update at each iteration

mode: int, Default 4

Options are:

1: non-orthogonalized, non-weighted attractors

2: orthogonalized, non-weighted attractors

3: orthogonalized, weighted attractors

4: orthogonalized, weighted attractors, asymmetric and diluted dynamics

meshSamplingRate: int, Default 200

Visualization parameter to control the quality of the color mesh near the attractors

underlyingNetwork: 2d numpy.array, Default None

Network of underlying connections between genes

typesNames: list-like, Default None

Names of cell types

clustersNames: list-like, Default None

Names or identifiers of the clusters

printInfo: boolean, Default False

Whether to print details

recordTrajectories: boolean, Default True

Whether to record trajectories data to files

id: int, Default None

Identifier of this function call

printSwitchingFraction: boolean, Default False

Whether to print fraction of clusters that switch their maximum overlapping attractor

path: str, Default None

Path for saving trajectories data

Returns:

2d numpy.array: Overlaps

Usage:

result = propagateHopfield(sigma=sigma, xi=df_attrs)
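
For readers unfamiliar with Hopfield dynamics, the sketch below iterates a textbook Hopfield network: Hebbian couplings, a random fraction of nodes updated per step, and a temperature that adds noise. It corresponds loosely to the simplest case (non-orthogonalized, non-weighted attractors) and is not the package's implementation; the `propagate` helper, its arguments, and the Glauber-style update rule are assumptions made for illustration.

```python
import math
import random


def propagate(state, patterns, T=0.0, tmax=20, fraction_to_update=0.5, seed=0):
    """Iterate a basic Hopfield network (Hebbian weights, no self-coupling).

    state: list of +1/-1 values; patterns: list of stored +1/-1 attractors.
    Returns the overlap of the final state with each stored pattern.
    """
    rng = random.Random(seed)
    n = len(state)
    state = list(state)
    # Hebbian coupling matrix built from the stored patterns
    W = [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
          for j in range(n)] for i in range(n)]
    n_update = max(1, int(round(fraction_to_update * n)))
    for _ in range(tmax):
        nodes = rng.sample(range(n), n_update)
        # local fields are computed from the state before this step's updates
        fields = {i: sum(W[i][j] * state[j] for j in range(n)) for i in nodes}
        for i, h in fields.items():
            if T == 0:
                # noise-free limit: follow the sign of the local field
                if h != 0:
                    state[i] = 1 if h > 0 else -1
            else:
                # probability of +1; the tanh form avoids overflow for small T
                p_up = 0.5 * (1.0 + math.tanh(h / T))
                state[i] = 1 if rng.random() < p_up else -1
    return [sum(s * x for s, x in zip(state, p)) / n for p in patterns]


pattern = [1, -1, 1, 1, -1, 1]
noisy = [-1, -1, 1, 1, -1, 1]  # first node flipped
overlaps = propagate(noisy, [pattern], T=0.0, tmax=5, fraction_to_update=1.0)
print(overlaps)
```

With zero temperature and all nodes updated at each step, the corrupted state relaxes back to the stored pattern, so the overlap with that pattern is 1.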

classmethod convertColormap(colormap)[source]¶

Convert colormap from the form (1.,1.,1.,1.) to ‘rgba(255,255,255,1.)’

Parameters:

colormap: dictionary: Colormap to convert

Returns:

dictionary: Converted colormap

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

colormap = DCS.convertColormap(colormap)
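
The conversion can be pictured with the following plain-Python sketch. The `to_rgba_string` and `convert_colormap` helpers are hypothetical, and the exact output formatting is an assumption (for instance, Python prints the alpha value as 1.0 rather than 1.).

```python
def to_rgba_string(color):
    """Convert an (r, g, b, a) tuple of floats in [0, 1] to an 'rgba(...)' string."""
    r, g, b, a = color
    return 'rgba(%d,%d,%d,%s)' % (round(r * 255), round(g * 255), round(b * 255), a)


def convert_colormap(colormap):
    """Apply the conversion to every entry of a {name: color} dictionary."""
    return {name: to_rgba_string(color) for name, color in colormap.items()}


converted = convert_colormap({'T cell': (1., 1., 1., 1.),
                              'B cell': (0., 0.2, 1., 0.8)})
print(converted)
```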

classmethod zScoreOfSeries(se)[source]¶

Calculate z-score of pandas.Series and modify the Series in place

Parameters:

se: pandas.Series: Series to process

Returns:

pandas.Series: Processed series

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

se = DCS.zScoreOfSeries(se)
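
As a reminder of what a z-score transform does, here is a minimal sketch on a plain list. Unlike the method above it returns a new list rather than modifying its input, and the use of the population standard deviation and the handling of constant input are assumptions; the `z_score` helper is hypothetical.

```python
import statistics


def z_score(values):
    """Return (x - mean) / std for each value; zeros if the values are constant.

    Population standard deviation is an assumption made for this sketch.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


z = z_score([1.0, 2.0, 3.0])
print(z)
```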

classmethod KeyInFile(key, file)[source]¶

Check if a key exists in an HDF file.

Parameters:

key: str: Key name to check
file: str: HDF file name to check

Returns:

boolean: True if the key is found, False otherwise

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.KeyInFile('df_expr', 'data/file.h5')

getSubnetworkOfPCN(subnetworkGenes, min_shared_first_targets=30)[source]¶

Extract subnetwork of PCN network

Parameters:

subnetworkGenes: list-like: Set of genes that the subnetwork should contain
min_shared_first_targets: int, Default 30: Number of minimum first shared targets to connect two nodes

Returns:

pandas.DataFrame: Adjacency matrix

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

df_subnetwork = DCS.getSubnetworkOfPCN(genes)

alignSeries(se1, se2, tagForMissing)[source]¶

Align two pandas.Series

Parameters:

se1: pandas.Series: Series with the first set of items
se2: pandas.Series: Series with the second set of items
tagForMissing: str: Label to assign to non-overlapping items, e.g. ‘Missing’

Returns:

pandas.DataFrame: Contains two aligned pandas.Series

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

df = DCS.alignSeries(pd.Index(['A', 'B', 'C', 'D']).to_series(), pd.Index(['B', 'C', 'D', 'E', 'F']).to_series(), 'Missing')
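
The alignment in the usage example can be pictured with plain dictionaries standing in for pandas.Series. This is a sketch of the idea (union of the two indices, with a tag filling the gaps); the `align` helper is hypothetical and the row order of the package's output is not specified above.

```python
def align(items1, items2, tag_for_missing='Missing'):
    """Align two {index: value} mappings on the union of their indices."""
    index = sorted(set(items1) | set(items2))
    return [(key,
             items1.get(key, tag_for_missing),
             items2.get(key, tag_for_missing)) for key in index]


se1 = {k: k for k in ['A', 'B', 'C', 'D']}
se2 = {k: k for k in ['B', 'C', 'D', 'E', 'F']}
aligned = align(se1, se2)
for row in aligned:
    print(row)
```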

createReverseDictionary(inputDictionary)[source]¶

Efficient way to create a reverse dictionary from a dictionary. Utilizes pandas.DataFrame.groupby and NumPy array indexing.

Parameters:

inputDictionary: dictionary: Dictionary to reverse

Returns:

dictionary: Reversed dictionary

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

revDict = DCS.createReverseDictionary(Dict)
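
The reversal itself can be shown with a short standard-library sketch. It assumes the input maps each key to an iterable of values (for example, cell types to marker genes), which is not stated above, and it does not use the pandas and NumPy approach the method is described as using; the `reverse_dictionary` helper is hypothetical.

```python
from collections import defaultdict


def reverse_dictionary(input_dictionary):
    """Map each value back to the list of keys that contain it.

    Assumes values are iterables of hashable items, e.g. {cell type: [genes]}.
    """
    reversed_dictionary = defaultdict(list)
    for key, values in input_dictionary.items():
        for value in values:
            reversed_dictionary[value].append(key)
    return dict(reversed_dictionary)


markers = {'T cell': ['CD3E', 'CD4'], 'Monocyte': ['CD14', 'CD4']}
reversed_markers = reverse_dictionary(markers)
print(reversed_markers)
```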

readMarkerFile(mergeFunction='mean', mergeCutoff=0.25)[source]¶

Read markers file, prepare markers

Parameters:

mergeFunction: str, Default ‘mean’

Function used for grouping of the cell sub-types. Options are:

‘mean’: average of the values

‘max’: maximum of the values, effectively a logical OR function

mergeCutoff: float, Default 0.25

Values below cutoff are set to zero. This option is used if mergeFunction is ‘mean’

Returns:

pandas.DataFrame: Celltype/markers matrix

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

df_marker_cell_type = DCS.readMarkerFile()
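
The difference between the two merge options can be illustrated on a small example. The sketch below merges the marker rows of several sub-types into one row; the `merge_subtypes` helper and the list-of-lists layout are hypothetical stand-ins for the marker table.

```python
def merge_subtypes(rows, merge_function='mean', merge_cutoff=0.25):
    """Merge marker rows of several cell sub-types into one row.

    rows: equal-length lists of marker values, one list per sub-type.
    """
    columns = list(zip(*rows))
    if merge_function == 'max':
        # with 0/1 entries this acts as a logical OR across sub-types
        return [max(column) for column in columns]
    if merge_function == 'mean':
        means = [sum(column) / len(column) for column in columns]
        # values below the cutoff are set to zero
        return [m if m >= merge_cutoff else 0.0 for m in means]
    raise ValueError("merge_function must be 'mean' or 'max'")


subtypes = [[1, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1]]
print(merge_subtypes(subtypes, 'mean'))
print(merge_subtypes(subtypes, 'max'))
```

In this example the first marker is present in only one of five sub-types, so its mean falls below the cutoff and is zeroed, whereas the ‘max’ option keeps it.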

mergeIndexDuplicates(df_expr, method='average', printDuplicates=False, verbose=1)[source]¶

Merge index duplicates

Parameters:

df_expr: pandas.DataFrame

Gene expression table

method: str, Default ‘average’

How to deal with index duplicates. Options are:

‘average’: average values of duplicates

‘first’: keep only first of duplicates, discard rest

Returns:

pandas.DataFrame: Gene expression table

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

df_expr = DCS.mergeIndexDuplicates(df_expr)
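
The two merge methods can be sketched on a list of (gene, values) rows standing in for the expression table. The `merge_index_duplicates` helper below is a hypothetical plain-Python illustration, not the package's pandas-based code.

```python
def merge_index_duplicates(rows, method='average'):
    """Merge rows that share a gene name.

    rows: list of (gene, values) pairs, values being equal-length lists.
    """
    grouped = {}
    for gene, values in rows:
        grouped.setdefault(gene, []).append(values)
    merged = {}
    for gene, group in grouped.items():
        if method == 'first':
            # keep only the first of the duplicates
            merged[gene] = list(group[0])
        elif method == 'average':
            # average the duplicates column by column
            merged[gene] = [sum(column) / len(group) for column in zip(*group)]
        else:
            raise ValueError("method must be 'average' or 'first'")
    return merged


rows = [('CD4', [2.0, 0.0]), ('CD4', [4.0, 2.0]), ('CD33', [1.0, 1.0])]
print(merge_index_duplicates(rows, 'average'))
print(merge_index_duplicates(rows, 'first'))
```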

recordExpressionData()[source]¶

Record expression data to the internal HDF storage.

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.recordExpressionData()

loadAnnotatedLabels(detailed=False, includeLowQC=True, infoType='label')[source]¶

Load cell annotations resulting from function ‘annotate’

Parameters:

detailed: boolean, Default False: Whether to give cluster- or celltype- resolution data
includeLowQC: boolean, Default True: Whether to include low quality cells in the output

Returns:

pandas.Series

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.loadAnnotatedLabels()

loadExpressionData()[source]¶

Load processed expression data from the internal HDF storage.

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.loadExpressionData()

prepareMarkers(expressedGenes=None, createColormapForCelltypes=True)[source]¶

Get dictionary of markers for each cell type.

Parameters:

expressedGenes: pandas.Index, Default None: If not None then the marker DataFrame will be intersected with this index, i.e. all non-expressed genes will be filtered from the marker file
createColormapForCelltypes: boolean, Default True: Create (or update) a colormap for cell types based on a marker-celltype matrix. This will make coloring of cell clusters consistent across all plots.

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.prepareMarkers()
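
The effect of intersecting markers with the expressed genes can be sketched as follows. The `filter_markers` helper is hypothetical, and applying a minimum number of markers per cell type at this stage (in the spirit of minimumNumberOfMarkersPerCelltype) is an assumption about where that filter takes place.

```python
def filter_markers(markers, expressed_genes=None, minimum_markers=10):
    """Keep expressed markers only, then drop cell types with too few markers.

    markers: {cell type: [marker genes]} (hypothetical input layout).
    """
    filtered = {}
    for cell_type, genes in markers.items():
        if expressed_genes is not None:
            genes = [g for g in genes if g in expressed_genes]
        if len(genes) >= minimum_markers:
            filtered[cell_type] = genes
    return filtered


markers = {'T cell': ['CD3E', 'CD4', 'CD8A'], 'B cell': ['CD19', 'MS4A1']}
expressed = {'CD3E', 'CD4', 'CD19'}
print(filter_markers(markers, expressed, minimum_markers=2))
```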

calculateQCmeasures()[source]¶

Calculate Quality Control (QC) measures

Parameters:

None

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.calculateQCmeasures()

qualityControl(**kwargs)[source]¶

Remove low quality cells

Parameters:

Any parameters that function ‘getIndexOfGoodQualityCells’ can accept

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.qualityControl()

batchEffectCorrection(method='COMBAT')[source]¶

Batch effect correction.

Parameters:

method: str, Default ‘COMBAT’: Stein, C.K., Qu, P., Epstein, J. et al. Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat. BMC Bioinformatics 16, 63 (2015)

Returns:

None

Usage:

DCS = DigitalCellSorter.DigitalCellSorter()

DCS.batchEffectCorrection()