Developmental Guide
Core Module APIs
Sample Scoring
clep.sample_scoring.limma.do_limma()
Perform data manipulation before limma-based single sample (SS) scoring.
- Parameters
data – Dataframe containing the gene expression values
design – Dataframe containing the design table for the data
alpha – Family-wise error rate
method – Method used for family-wise error correction
control – label used for representing the control in the design table of the data
- Returns
Dataframe containing the Single Sample scores from limma
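To illustrate what a family-wise error correction controlled by alpha does, here is a minimal sketch of a Bonferroni correction in plain Python. It is not taken from the clep source: the helper name bonferroni_reject and the p-values are hypothetical, and the correction methods that do_limma actually accepts are not listed on this page.

```python
from typing import List


def bonferroni_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return True for each hypothesis rejected at family-wise error rate alpha.

    Hypothetical helper for illustration only; not part of clep.
    """
    n_tests = len(p_values)
    if n_tests == 0:
        return []
    # Bonferroni: compare every p-value against alpha divided by the number of tests.
    cutoff = alpha / n_tests
    return [p <= cutoff for p in p_values]


# Made-up p-values for four genes.
example_p_values = [0.001, 0.02, 0.04, 0.30]
rejected = bonferroni_reject(example_p_values, alpha=0.05)
```

With four tests the per-test cutoff is 0.0125, so only the first gene stays significant.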
clep.sample_scoring.ssgsea.do_ssgsea()
Run single sample GSEA (ssGSEA) on filtered gene expression data set.
- Parameters
filtered_expression_data – filtered gene expression values for samples
gene_set – .gmt file containing gene sets
output_dir – output directory
processes – Number of processes
max_size – Maximum allowed number of genes shared by a gene set and the data set
min_size – Minimum allowed number of genes shared by a gene set and the data set
- Returns
ssGSEA results in the respective output directory
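The gene_set, min_size and max_size parameters can be pictured with a small sketch that parses GMT-formatted lines (set name, description, then genes, all tab-separated) and keeps only the gene sets whose overlap with the data set falls within the size bounds. The helper names and gene lists are hypothetical, and treating both bounds as inclusive is an assumption; do_ssgsea may handle this differently.

```python
from typing import Dict, Iterable, Set


def parse_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse GMT lines: set name, description, then genes, tab-separated."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = {gene for gene in fields[2:] if gene}
    return gene_sets


def filter_by_overlap(gene_sets, data_genes, min_size, max_size):
    """Keep gene sets whose overlap with the data set lies in [min_size, max_size].

    Inclusive bounds are an assumption made for this sketch.
    """
    data_genes = set(data_genes)
    return {
        name: genes & data_genes
        for name, genes in gene_sets.items()
        if min_size <= len(genes & data_genes) <= max_size
    }


# Made-up GMT content and data set genes.
gmt_lines = [
    "PATHWAY_A\tdescription\tTP53\tBRCA1\tEGFR\n",
    "PATHWAY_B\tdescription\tTP53\n",
]
kept = filter_by_overlap(parse_gmt(gmt_lines), ["TP53", "BRCA1", "MYC"], 2, 10)
```

Here PATHWAY_B is dropped because only one of its genes appears in the data set.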
clep.sample_scoring.z_score.do_z_score()
Carry out Z-score-based single sample DE analysis.
- Parameters
data – Dataframe containing the gene expression values
design – Dataframe containing the design table for the data
control – label used for representing the control in the design table of the data
threshold – Threshold for choosing patients that are “extreme” w.r.t. the controls.
- Returns
Dataframe containing the Single Sample scores using Z-scores
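As a rough picture of Z-score-based scoring against a control group, the sketch below standardises each sample of a single gene with the control mean and standard deviation, then labels it 1, -1 or 0 depending on the threshold. The helper name, the -1/0/1 labels, the default threshold and the data are assumptions made for illustration; the actual output encoding of do_z_score is not described on this page.

```python
from statistics import mean, stdev


def z_score_labels(values, controls, threshold=2.0):
    """Label samples of one gene as 1 (high), -1 (low) or 0 relative to controls.

    Hypothetical helper; values maps sample name to expression value.
    """
    control_values = [values[sample] for sample in controls]
    mu = mean(control_values)
    sigma = stdev(control_values)  # sample standard deviation, needs >= 2 controls
    if sigma == 0:
        raise ValueError("control values have zero variance")
    labels = {}
    for sample, value in values.items():
        z = (value - mu) / sigma
        if z > threshold:
            labels[sample] = 1
        elif z < -threshold:
            labels[sample] = -1
        else:
            labels[sample] = 0
    return labels


# Made-up expression values for one gene: three controls and three patients.
expression = {"C1": 5.0, "C2": 6.0, "C3": 7.0, "P1": 9.0, "P2": 6.5, "P3": 3.0}
labels = z_score_labels(expression, controls=["C1", "C2", "C3"], threshold=2.0)
```

The controls have mean 6 and standard deviation 1, so P1 (z = 3) and P3 (z = -3) are flagged while P2 (z = 0.5) is not.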
clep.sample_scoring.radical_search.do_radical_search()
Identify the samples with extreme feature values based on either the entire dataset or the control population.
- Parameters
data – Dataframe containing the gene expression values
design – Dataframe containing the design table for the data
threshold – Threshold for choosing patients that are “extreme” w.r.t. the controls
control – label used for representing the control in the design table of the data
control_based – Flag to indicate if the scoring is based on the control population instead of the entire dataset
- Returns
Dataframe containing the Single Sample scores using radical searching
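One possible reading of "extreme" is a tail of the empirical distribution of a reference population. The sketch below follows that reading for a single feature: a sample is labelled -1 or 1 when at most a fraction threshold of the reference values lie at or below it, or at or above it, respectively. The helper name, the labels, the interpretation of threshold as a tail fraction and the data are all assumptions; the exact definition used by do_radical_search is not given on this page.

```python
def extreme_labels(values, controls, threshold=0.25, control_based=False):
    """Label samples of one feature as 1 (upper tail), -1 (lower tail) or 0.

    Hypothetical helper; the reference population is either the controls
    or every sample, depending on control_based.
    """
    if control_based:
        reference = [values[sample] for sample in controls]
    else:
        reference = list(values.values())
    n_reference = len(reference)
    labels = {}
    for sample, value in values.items():
        # Fractions of the reference population at or below / at or above this value.
        lower_tail = sum(1 for r in reference if r <= value) / n_reference
        upper_tail = sum(1 for r in reference if r >= value) / n_reference
        if upper_tail <= threshold:
            labels[sample] = 1
        elif lower_tail <= threshold:
            labels[sample] = -1
        else:
            labels[sample] = 0
    return labels


# Made-up values for one feature; S2 to S4 play the role of controls.
feature = {"S1": 1, "S2": 5, "S3": 6, "S4": 7, "S5": 8, "S6": 20}
dataset_based = extreme_labels(feature, ["S2", "S3", "S4"], control_based=False)
control_based = extreme_labels(feature, ["S2", "S3", "S4"], control_based=True)
```

In this toy data S5 is unremarkable relative to the whole dataset but lies above every control, so it is flagged only in the control-based run.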
KG Generation
clep.embedding.network_generator.do_graph_gen()
Generate patient-feature network given the data using a certain network generation method.
- Parameters
data – Dataframe containing the patient-feature scores
network_gen_method – Method to generate the patient-feature network
gmt – Optional field for the path to the gmt file containing the pathway data
intersection_threshold – Threshold to make edges in Pathway Overlap method
kg_data – Optional field for the knowledge graph in edgelist format stored in a pandas dataframe
folder_path – Optional field for the path to a folder containing multiple knowledge graphs
jaccard_threshold – Threshold to make edges in Interaction Network Overlap method
summary – Flag to indicate if the summary of the patient-feature network must be returned
- Returns
Dataframe containing patient-feature network, and optionally the summary of the patient-feature network
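To make the idea of a patient-feature network concrete, the following sketch turns a table of single sample scores into (source, relation, target) triples, with one edge for every non-zero score. The helper name, the relation labels "up" and "down" and the scores are hypothetical, and this covers only the simplest case; the overlap-based generation methods controlled by intersection_threshold and jaccard_threshold are not modelled here.

```python
def scores_to_edgelist(scores):
    """Build (patient, relation, feature) triples from -1/0/1 scores.

    Hypothetical helper; scores maps patient -> {feature: score}.
    """
    relation_for_score = {1: "up", -1: "down"}  # labels chosen for this sketch
    edges = []
    for patient, feature_scores in scores.items():
        for feature, score in feature_scores.items():
            relation = relation_for_score.get(score)
            if relation is not None:  # a score of 0 produces no edge
                edges.append((patient, relation, feature))
    return edges


# Made-up single sample scores for two patients and two features.
patient_scores = {
    "P1": {"TP53": 1, "EGFR": 0},
    "P2": {"TP53": -1, "EGFR": 1},
}
edgelist = scores_to_edgelist(patient_scores)
```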
KG Embedding
clep.embedding.kge._weighted_splitter()
Split the given edgelist into training, validation and testing sets on the basis of the ratio of relations.
- Parameters
edgelist – Edgelist in the form of (Source, Relation, Target)
train_size – Size of the training data
validation_size – Size of the validation data
- Returns
Tuple containing the train, validation & test splits
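Splitting "on the basis of the ratio of relations" can be sketched as a stratified split: group the triples by relation, split each group with the same fractions, and concatenate the pieces so that every split keeps roughly the original relation proportions. This is an illustration under that assumption, not the clep implementation; the helper name, the seed argument and the toy edgelist are hypothetical.

```python
import random
from collections import defaultdict


def split_by_relation(edgelist, train_size=0.5, validation_size=0.25, seed=0):
    """Split (source, relation, target) triples while preserving relation ratios.

    Hypothetical helper; whatever is left after train and validation is test.
    """
    rng = random.Random(seed)
    by_relation = defaultdict(list)
    for triple in edgelist:
        by_relation[triple[1]].append(triple)

    train, validation, test = [], [], []
    for triples in by_relation.values():
        triples = triples[:]  # copy so the grouped lists are not shuffled in place
        rng.shuffle(triples)
        n_train = int(len(triples) * train_size)
        n_validation = int(len(triples) * validation_size)
        train.extend(triples[:n_train])
        validation.extend(triples[n_train:n_train + n_validation])
        test.extend(triples[n_train + n_validation:])
    return train, validation, test


# Made-up edgelist: eight "up" edges and four "down" edges.
edges = [(f"P{i}", "up", "TP53") for i in range(8)]
edges += [(f"P{i}", "down", "EGFR") for i in range(4)]
train, validation, test = split_by_relation(edges, train_size=0.5, validation_size=0.25)
```

Each split keeps the 2:1 ratio of "up" to "down" edges found in the full edgelist.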
clep.embedding.kge.do_kge()
Carry out knowledge graph embedding (KGE) on the given data.
- Parameters
edgelist – Dataframe containing the patient-feature graph in edgelist format
design – Dataframe containing the design table for the data
out – Output folder for the results
model_config – Configuration file for the KGE models, in JSON format.
return_patients – Flag to indicate if the final data should contain only the patients or also the features
train_size – Size of the training data for KGE, ranging from 0 to 1
validation_size – Size of the validation data for KGE, ranging from 0 to 1. It must be lower than the training size
- Returns
Dataframe containing the embedding from the KGE
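The constraints on train_size and validation_size can be written down as a small check. The sketch below is a hypothetical helper, not part of clep; it additionally assumes that both sizes lie strictly between 0 and 1, that they must not sum to more than 1, and that the remainder is used for testing.

```python
def check_split_sizes(train_size, validation_size):
    """Validate the split fractions and return the implied test fraction.

    Hypothetical helper; the strict bounds and the sum check are assumptions.
    """
    if not (0 < train_size < 1 and 0 < validation_size < 1):
        raise ValueError("sizes must lie between 0 and 1")
    if validation_size >= train_size:
        raise ValueError("validation_size must be lower than train_size")
    test_size = 1 - train_size - validation_size
    if test_size < 0:
        raise ValueError("train_size and validation_size must not exceed 1 in total")
    return test_size


# Half of the triples for training and a quarter for validation leave a quarter for testing.
test_fraction = check_split_sizes(train_size=0.5, validation_size=0.25)
```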
Classification
clep.classification.classify.do_classification()
Perform classification on the embeddings generated in the previous step.
- Parameters
data – Dataframe containing the embeddings
model_name – model that should be used for cross validation
optimizer_name – Optimizer used to optimize the classification
out_dir – Path to the output directory
validation_cv – Number of cross validation steps
scoring_metrics – Scoring metrics tested during cross validation
rand_labels – Boolean variable to indicate if labels must be randomized to check for ML stability
args – Custom arguments to the estimator model
- Returns
Dictionary containing the cross validation results
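The purpose of rand_labels can be illustrated with a label permutation: shuffling the labels breaks any link between embeddings and classes, so a model evaluated on them should score close to chance, and scores well above chance point to a problem in the pipeline. The sketch below shows only the shuffling step; the helper name, the seed argument and the labels are hypothetical, and how do_classification randomizes labels internally is not documented here.

```python
import random


def randomize_labels(labels, seed=0):
    """Return a shuffled copy of the labels, keeping the class counts unchanged.

    Hypothetical helper for illustration; not part of clep.
    """
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Made-up class labels for six patients.
true_labels = ["control", "control", "control", "case", "case", "case"]
random_labels = randomize_labels(true_labels, seed=0)
```

The shuffled list has the same number of cases and controls as the original, so the class balance seen during cross validation is unchanged.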