Developer Guide
Core Module APIs
Sample Scoring
- clep.sample_scoring.limma.do_limma()
Perform data manipulation before limma-based single sample (SS) scoring.
- Parameters
data – Dataframe containing the gene expression values
design – Dataframe containing the design table for the data
alpha – Family-wise error rate
method – Method used for family-wise error correction
control – Label used to represent the control in the design table of the data
- Returns
Dataframe containing the Single Sample scores from limma
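The alpha and method parameters above concern family-wise error correction. As a rough illustration of what such a correction does, the sketch below applies the Holm step-down procedure to a list of p-values in plain Python. It is not taken from the clep source: the function names holm_adjust and significant are hypothetical, and Holm is only one of several correction methods that the method parameter might select.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down multiplier: m for the smallest p-value, then m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def significant(p_values, alpha=0.05):
    """Flag the tests that stay significant at family-wise error rate alpha."""
    return [p <= alpha for p in holm_adjust(p_values)]
```

Calling significant([0.01, 0.04, 0.03, 0.005]) keeps only the first and last tests at alpha = 0.05, because the two middle p-values are pushed above the cut-off once the correction is applied.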
- clep.sample_scoring.ssgsea.do_ssgsea()
Run single sample GSEA (ssGSEA) on a filtered gene expression data set.
- Parameters
filtered_expression_data – Filtered gene expression values for the samples
gene_set – .gmt file containing the gene sets
output_dir – Output directory
processes – Number of processes
max_size – Maximum allowed number of genes from a gene set that are also present in the data set
min_size – Minimum allowed number of genes from a gene set that are also present in the data set
- Returns
ssGSEA results in respective directory
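The min_size and max_size parameters can be read as bounds on how many genes of a gene set also appear in the expression data. The sketch below shows that reading in plain Python. It assumes the standard GMT layout (tab-separated: set name, description, then genes). The helper names parse_gmt_lines and filter_by_overlap are hypothetical and are not part of clep.

```python
def parse_gmt_lines(lines):
    """Parse GMT-formatted lines into {gene_set_name: set_of_genes}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def filter_by_overlap(gene_sets, measured_genes, min_size, max_size):
    """Keep gene sets whose overlap with the data set is within the bounds."""
    measured = set(measured_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & measured
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept
```

With min_size=2 and max_size=3, a gene set sharing only one gene with the data is dropped, as is one sharing four genes.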
- clep.sample_scoring.z_score.do_z_score()
Carry out Z-score-based single sample differential expression (DE) analysis.
- Parameters
data – Dataframe containing the gene expression values
design – Dataframe containing the design table for the data
control – Label used to represent the control in the design table of the data
threshold – Threshold for choosing patients that are “extreme” w.r.t. the controls.
- Returns
Dataframe containing the Single Sample scores using Z_Scores
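A minimal sketch of Z-score-based single sample scoring follows, assuming that each feature is standardised against the mean and standard deviation of the control samples and that values beyond the threshold are labelled +1 or -1. To stay free of third-party packages it uses nested dictionaries rather than the DataFrames the documented function takes. The function name z_score_labels and the default label "Control" are hypothetical.

```python
from statistics import mean, stdev


def z_score_labels(data, design, control="Control", threshold=2.0):
    """Label each value as 1 (high), -1 (low) or 0 relative to the controls.

    data:   {sample: {feature: value}}
    design: {sample: group_label}
    """
    controls = [s for s, label in design.items() if label == control]
    features = list(next(iter(data.values())))

    # Per-feature mean and sample standard deviation of the controls
    # (stdev needs at least two control samples).
    stats = {}
    for f in features:
        values = [data[s][f] for s in controls]
        stats[f] = (mean(values), stdev(values))

    scores = {}
    for sample, row in data.items():
        scores[sample] = {}
        for f in features:
            mu, sigma = stats[f]
            # A constant control feature carries no scale, so score it as 0.
            z = (row[f] - mu) / sigma if sigma > 0 else 0.0
            scores[sample][f] = 1 if z > threshold else -1 if z < -threshold else 0
    return scores
```

The output has the same shape as the input, with every expression value replaced by its label.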
- clep.sample_scoring.radical_search.do_radical_search()
Identify the samples with extreme feature values, based on either the entire dataset or the control population.
- Parameters
data – Dataframe containing the gene expression values
design – Dataframe containing the design table for the data
threshold – Threshold for choosing patients that are “extreme” w.r.t. the controls
control – Label used to represent the control in the design table of the data
control_based – Flag to indicate whether the scoring is based on the control population instead of the entire dataset
- Returns
Dataframe containing the Single Sample scores using radical searching
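One way to picture the control_based switch is as a choice of reference population for an empirical tail test. The sketch below labels a value +1 when fewer than a threshold fraction of reference values lie above it and -1 when fewer than that fraction lie below it. This reading of threshold as a tail fraction is an assumption, as are the function name radical_labels, the dictionary-based inputs, and the default label "Control". None of it is taken from the clep source.

```python
from bisect import bisect_left, bisect_right


def radical_labels(data, design, threshold=0.1, control="Control",
                   control_based=False):
    """Label values in the empirical tails of a reference population.

    data:   {sample: {feature: value}}
    design: {sample: group_label}
    """
    samples = list(data)
    features = list(next(iter(data.values())))
    # Reference population: the controls only, or every sample.
    if control_based:
        ref_samples = [s for s in samples if design[s] == control]
    else:
        ref_samples = samples

    labels = {s: {} for s in samples}
    for f in features:
        reference = sorted(data[s][f] for s in ref_samples)
        n = len(reference)
        for s in samples:
            value = data[s][f]
            below = bisect_left(reference, value) / n          # strictly smaller
            above = (n - bisect_right(reference, value)) / n   # strictly larger
            is_low = below < threshold
            is_high = above < threshold
            # Values in both tails (e.g. a constant feature) are not extreme.
            labels[s][f] = 0 if is_low == is_high else (1 if is_high else -1)
    return labels
```

Because the tails are empirical, the most extreme members of the reference population can themselves be flagged when the threshold is large relative to the population size.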
KG Generation
- clep.embedding.network_generator.do_graph_gen()
Generate a patient-feature network from the data using the chosen network generation method.
- Parameters
data – Dataframe containing the patient-feature scores
network_gen_method – Method to generate the patient-feature network
gmt – Optional field for the path to the gmt file containing the pathway data
intersection_threshold – Threshold to make edges in Pathway Overlap method
kg_data – Optional field for the knowledge graph in edgelist format stored in a pandas dataframe
folder_path – Optional field for the path to a folder containing multiple knowledge graphs
jaccard_threshold – Threshold to make edges in Interaction Network Overlap method
summary – Flag to indicate if the summary of the patient-feature network must be returned
- Returns
Dataframe containing patient-feature network, and optionally the summary of the patient-feature network
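To make the idea of a patient-feature network concrete, the sketch below builds two kinds of edges in plain Python: patient-to-feature edges from non-zero single sample scores, and feature-to-feature edges between gene sets whose Jaccard similarity reaches a threshold. The function names, the use of the score sign as the relation, and the "overlaps" relation label are all hypothetical. The actual network generation methods in clep may construct edges differently.

```python
from itertools import combinations


def patient_feature_edges(scores):
    """Turn {patient: {feature: -1 | 0 | 1}} into (source, relation, target) triples."""
    edges = []
    for patient, row in scores.items():
        for feature, score in row.items():
            if score != 0:  # a zero score means no edge
                edges.append((patient, score, feature))
    return edges


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_edges(gene_sets, jaccard_threshold=0.2):
    """Connect gene sets whose Jaccard similarity reaches the threshold."""
    edges = []
    for left, right in combinations(sorted(gene_sets), 2):
        if jaccard(gene_sets[left], gene_sets[right]) >= jaccard_threshold:
            edges.append((left, "overlaps", right))
    return edges
```

Concatenating the two edge lists gives a single edgelist in (Source, Relation, Target) form, the shape that the embedding step described next consumes.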
KG Embedding
- clep.embedding.kge._weighted_splitter()
Split the given edgelist into training, validation and testing sets on the basis of the ratio of relations.
- Parameters
edgelist – Edgelist in the form of (Source, Relation, Target)
train_size – Size of the training data
validation_size – Size of the validation data
- Returns
Tuple containing the train, validation & test splits
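Splitting "on the basis of the ratio of relations" can be understood as a stratified split: each relation type is divided separately so that all three sets keep roughly the same relation mix. The following sketch shows one way to do this with the standard library. The function name split_by_relation and the seed parameter are hypothetical, and the rounding rule is an assumption rather than the behaviour of the private clep helper.

```python
import random
from collections import defaultdict


def split_by_relation(edgelist, train_size=0.8, validation_size=0.1, seed=0):
    """Split (source, relation, target) triples per relation type.

    Whatever is left after the training and validation shares
    becomes the test set.
    """
    if not 0 < train_size < 1 or not 0 <= validation_size < 1:
        raise ValueError("sizes must be fractions between 0 and 1")
    if train_size + validation_size > 1:
        raise ValueError("train_size + validation_size must not exceed 1")

    rng = random.Random(seed)
    by_relation = defaultdict(list)
    for triple in edgelist:
        by_relation[triple[1]].append(triple)

    train, validation, test = [], [], []
    for relation in sorted(by_relation, key=str):
        triples = by_relation[relation]
        rng.shuffle(triples)
        n_train = round(len(triples) * train_size)
        n_val = round(len(triples) * validation_size)
        train.extend(triples[:n_train])
        validation.extend(triples[n_train:n_train + n_val])
        test.extend(triples[n_train + n_val:])
    return train, validation, test
```

Stratifying by relation matters when one relation is rare: a purely random split could leave it out of the validation or test set entirely.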
- clep.embedding.kge.do_kge()
Carry out knowledge graph embedding (KGE) on the given data.
- Parameters
edgelist – Dataframe containing the patient-feature graph in edgelist format
design – Dataframe containing the design table for the data
out – Output folder for the results
model_config – Configuration file for the KGE models, in JSON format
return_patients – Flag to indicate if the final data should contain only the patients or also the features
train_size – Size of the training data for KGE, as a fraction between 0 and 1
validation_size – Size of the validation data for KGE, as a fraction between 0 and 1. It must be lower than the training size
- Returns
Dataframe containing the embedding from the KGE
Classification
- clep.classification.classify.do_classification()
Perform classification on the embeddings generated in the previous step.
- Parameters
data – Dataframe containing the embeddings
model_name – Model that should be used for cross validation
optimizer_name – Optimizer used to optimize the classification
out_dir – Path to the output directory
validation_cv – Number of cross validation steps
scoring_metrics – Scoring metrics tested during cross validation
rand_labels – Boolean variable to indicate if labels must be randomized to check for ML stability
args – Custom arguments to the estimator model
- Returns
Dictionary containing the cross validation results
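The interplay of validation_cv and rand_labels can be sketched as k-fold cross validation that optionally shuffles the labels first: if a model still scores well on shuffled labels, the result is suspect. The example below is a self-contained stand-in, not the clep implementation. It swaps in a simple nearest-centroid classifier in place of the configurable estimator, and the function names and the "test_accuracy" key are hypothetical.

```python
import random


def k_fold_indices(n_samples, n_folds, rng):
    """Shuffle the sample indices and deal them into n_folds folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(sorted(centroids), key=lambda lab: squared_distance(centroids[lab]))


def cross_validate(embeddings, labels, validation_cv=5, rand_labels=False, seed=0):
    """Return per-fold accuracy; optionally shuffle labels as a sanity check."""
    rng = random.Random(seed)
    labels = list(labels)
    if rand_labels:
        rng.shuffle(labels)  # breaks the link between embeddings and labels

    scores = []
    for fold in k_fold_indices(len(labels), validation_cv, rng):
        held_out = set(fold)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        train_x = [embeddings[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        correct = sum(
            nearest_centroid_predict(train_x, train_y, embeddings[i]) == labels[i]
            for i in fold
        )
        scores.append(correct / len(fold))
    return {"test_accuracy": scores}
```

Comparing the scores from rand_labels=False and rand_labels=True on the same embeddings gives a quick check that the classifier is learning from the labels rather than from an artefact of the data.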