Developer Guide

Core Module APIs

Sample Scoring

clep.sample_scoring.limma.do_limma()[source]

Perform data manipulation before limma-based single sample (SS) scoring.

Parameters
  • data – Dataframe containing the gene expression values

  • design – Dataframe containing the design table for the data

  • alpha – Family-wise error rate

  • method – Method used for family-wise error correction

  • control – label used for representing the control in the design table of the data

Returns

Dataframe containing the Single Sample scores from limma

clep.sample_scoring.ssgsea.do_ssgsea()[source]

Run single sample GSEA (ssGSEA) on filtered gene expression data set.

Parameters
  • filtered_expression_data – filtered gene expression values for samples

  • gene_set – .gmt file containing gene sets

  • output_dir – output directory

  • processes – Number of processes

  • max_size – Maximum number of genes that a gene set may share with the data set

  • min_size – Minimum number of genes that a gene set must share with the data set

Returns

ssGSEA results in respective directory
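The min_size and max_size bounds can be read as limits on the overlap between each gene set and the genes present in the expression data. The following is a minimal sketch of such a filter, assuming the gene sets have already been read from the .gmt file into a dictionary. The helper name filter_gene_sets and the gene set names are hypothetical and are not part of clep or of the underlying ssGSEA implementation.

```python
def filter_gene_sets(gene_sets, data_genes, min_size, max_size):
    """Keep gene sets whose overlap with the data set lies in [min_size, max_size].

    Hypothetical helper for illustration; not part of the clep API.
    """
    data_genes = set(data_genes)
    kept = {}
    for name, genes in gene_sets.items():
        # Only genes that appear in both the gene set and the data count.
        overlap = data_genes.intersection(genes)
        if min_size <= len(overlap) <= max_size:
            kept[name] = sorted(overlap)
    return kept


# Toy gene sets, as they might look after parsing a .gmt file.
gene_sets = {
    "PATHWAY_A": ["TP53", "BRCA1", "EGFR"],
    "PATHWAY_B": ["TP53"],
    "PATHWAY_C": ["MYC", "KRAS", "EGFR", "BRCA1"],
}
data_genes = ["TP53", "BRCA1", "EGFR", "MYC"]

filtered = filter_gene_sets(gene_sets, data_genes, min_size=2, max_size=3)
```

In this toy input PATHWAY_B is dropped because only one of its genes is measured, while PATHWAY_C is kept with the three genes it shares with the data.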

clep.sample_scoring.z_score.do_z_score()[source]

Carry out Z-Score based single sample DE analysis.

Parameters
  • data – Dataframe containing the gene expression values

  • design – Dataframe containing the design table for the data

  • control – label used for representing the control in the design table of the data

  • threshold – Threshold for choosing patients that are “extreme” w.r.t. the controls.

Returns

Dataframe containing the Single Sample scores using Z-Scores
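To illustrate the idea behind control-based Z-scoring, the sketch below standardizes each value against the mean and standard deviation of the control samples and then applies the threshold. It uses plain dictionaries instead of dataframes to stay dependency-free. The helper name z_score_calls and the 1 / -1 / 0 encoding of the result are assumptions made for illustration, not a description of the clep implementation.

```python
from statistics import mean, stdev


def z_score_calls(expression, design, control, threshold):
    """Score each sample per gene as 1, -1 or 0 relative to the controls.

    Hypothetical helper; `expression` maps gene -> {sample: value} and
    `design` maps sample -> label. The 1/-1/0 encoding is an assumption.
    """
    controls = [s for s, label in design.items() if label == control]
    scores = {}
    for gene, values in expression.items():
        ctrl_values = [values[s] for s in controls]
        # Assumes at least two controls whose values are not all identical.
        mu, sigma = mean(ctrl_values), stdev(ctrl_values)
        scores[gene] = {}
        for sample, value in values.items():
            z = (value - mu) / sigma
            scores[gene][sample] = 1 if z > threshold else -1 if z < -threshold else 0
    return scores


design = {"c1": "Control", "c2": "Control", "c3": "Control", "p1": "Case", "p2": "Case"}
expression = {
    "GENE_A": {"c1": 9.0, "c2": 10.0, "c3": 11.0, "p1": 15.0, "p2": 10.5},
    "GENE_B": {"c1": 4.0, "c2": 5.0, "c3": 6.0, "p1": 5.2, "p2": 1.0},
}

scores = z_score_calls(expression, design, control="Control", threshold=2.0)
```

Here p1 is scored 1 for GENE_A and p2 is scored -1 for GENE_B, while every control stays at 0.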

Identify the samples with extreme feature values either based on the entire dataset or control population.

Parameters
  • data – Dataframe containing the gene expression values

  • design – Dataframe containing the design table for the data

  • threshold – Threshold for choosing patients that are “extreme” w.r.t. the controls

  • control – label used for representing the control in the design table of the data

  • control_based – The scoring is based on the control population instead of entire dataset

Returns

Dataframe containing the Single Sample scores using radical searching
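One plausible reading of the control_based flag is that it selects the reference population against which a value is judged extreme. The sketch below handles a single feature and flags a sample when its value falls in the upper or lower tail of the empirical distribution of that reference. The helper name extreme_calls, the interpretation of threshold as a tail percentage, and the 1 / -1 / 0 encoding are all assumptions for illustration.

```python
def extreme_calls(values, design, threshold, control, control_based):
    """Flag samples in the upper (1) or lower (-1) tail of a reference population.

    Hypothetical helper for one feature; `values` maps sample -> value and
    `threshold` is read as a tail percentage. Both are assumptions.
    """
    if control_based:
        reference = [v for s, v in values.items() if design[s] == control]
    else:
        reference = list(values.values())
    cutoff = 1 - threshold / 100
    calls = {}
    for sample, value in values.items():
        # Share of the reference population strictly below / above this value.
        below = sum(r < value for r in reference) / len(reference)
        above = sum(r > value for r in reference) / len(reference)
        calls[sample] = 1 if below >= cutoff else -1 if above >= cutoff else 0
    return calls


design = {"c1": "Control", "c2": "Control", "c3": "Control", "c4": "Control",
          "p1": "Case", "p2": "Case"}
values = {"c1": 1.0, "c2": 2.0, "c3": 3.0, "c4": 4.0, "p1": 10.0, "p2": 2.5}

control_calls = extreme_calls(values, design, threshold=10, control="Control", control_based=True)
dataset_calls = extreme_calls(values, design, threshold=10, control="Control", control_based=False)
```

With the controls as reference, p1 lies above every control and is flagged. With the entire dataset as reference, p1 is itself part of the population, so in a sample this small no value reaches the 10% tail.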

KG Generation

clep.embedding.network_generator.do_graph_gen()[source]

Generate patient-feature network given the data using a certain network generation method.

Parameters
  • data – Dataframe containing the patient-feature scores

  • network_gen_method – Method to generate the patient-feature network

  • gmt – Optional field for the path to the gmt file containing the pathway data

  • intersection_threshold – Threshold to make edges in Pathway Overlap method

  • kg_data – Optional field for the knowledge graph in edgelist format stored in a pandas dataframe

  • folder_path – Optional field for the path to a folder containing multiple knowledge graphs

  • jaccard_threshold – Threshold to make edges in Interaction Network Overlap method

  • summary – Flag to indicate if the summary of the patient-feature network must be returned

Returns

Dataframe containing patient-feature network, and optionally the summary of the patient-feature network
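As a rough illustration of what a patient-feature network in edgelist form can look like, the sketch below turns non-zero scores into (source, relation, target) triples and optionally returns a small summary. The helper name scores_to_edgelist, the use of the score as the relation label, and the contents of the summary are assumptions. The actual network generation methods (for example the pathway overlap or interaction network overlap methods) are not reproduced here.

```python
def scores_to_edgelist(scores, summary=False):
    """Turn non-zero patient-feature scores into (source, relation, target) triples.

    Hypothetical helper; `scores` maps patient -> {feature: score}.
    """
    edges = [
        (patient, str(score), feature)
        for patient, features in scores.items()
        for feature, score in features.items()
        if score != 0
    ]
    if not summary:
        return edges
    # Count distinct patients and features that take part in at least one edge.
    nodes = {edge[0] for edge in edges} | {edge[2] for edge in edges}
    return edges, {"nodes": len(nodes), "edges": len(edges)}


scores = {
    "patient_1": {"GENE_A": 1, "GENE_B": 0},
    "patient_2": {"GENE_A": -1, "GENE_B": 1},
}
edges, info = scores_to_edgelist(scores, summary=True)
```

The zero score for patient_1 and GENE_B produces no edge, so the toy network has three edges over four nodes.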

KG Embedding

clep.embedding.kge._weighted_splitter()[source]

Split the given edgelist into training, validation and testing sets on the basis of the ratio of relations.

Parameters
  • edgelist – Edgelist in the form of (Source, Relation, Target)

  • train_size – Size of the training data

  • validation_size – Size of the validation data

Returns

Tuple containing the train, validation & test splits
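Splitting "on the basis of the ratio of relations" can be understood as a stratified split: each relation type contributes the same fractions to the training, validation and test sets. The following is a minimal sketch of that idea, not the clep implementation. The helper name split_by_relation and the seed parameter are hypothetical, and the test share is assumed to be whatever remains after the other two fractions.

```python
import random
from collections import defaultdict


def split_by_relation(edgelist, train_size, validation_size, seed=0):
    """Split triples so each relation keeps roughly the same share per split.

    Hypothetical helper, not the clep implementation; the test share is
    whatever remains after the train and validation fractions.
    """
    rng = random.Random(seed)
    by_relation = defaultdict(list)
    for triple in edgelist:
        # Triples are (source, relation, target); group them by relation.
        by_relation[triple[1]].append(triple)

    train, validation, test = [], [], []
    for triples in by_relation.values():
        rng.shuffle(triples)
        n_train = round(len(triples) * train_size)
        n_val = round(len(triples) * validation_size)
        train += triples[:n_train]
        validation += triples[n_train:n_train + n_val]
        test += triples[n_train + n_val:]
    return train, validation, test


edgelist = [(f"patient_{i}", "1", "GENE_A") for i in range(10)]
edgelist += [(f"patient_{i}", "-1", "GENE_B") for i in range(20)]

train, validation, test = split_by_relation(edgelist, train_size=0.8, validation_size=0.1)
```

Because the split is done per relation, the 1:2 ratio between the two relation types in the toy edgelist is preserved in each of the three sets.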

clep.embedding.kge.do_kge()[source]

Carry out KGE on the given data.

Parameters
  • edgelist – Dataframe containing the patient-feature graph in edgelist format

  • design – Dataframe containing the design table for the data

  • out – Output folder for the results

  • model_config – Configuration file for the KGE models, in JSON format.

  • return_patients – Flag to indicate if the final data should contain only the patients or also the features

  • train_size – Size of the training data for KGE, as a fraction between 0 and 1

  • validation_size – Size of the validation data for KGE, as a fraction between 0 and 1. It must be lower than the training size

Returns

Dataframe containing the embedding from the KGE

Classification

clep.classification.classify.do_classification()[source]

Perform classification on embeddings generated from previous step.

Parameters
  • data – Dataframe containing the embeddings

  • model_name – model that should be used for cross validation

  • optimizer_name – Optimizer used to optimize the classification

  • out_dir – Path to the output directory

  • validation_cv – Number of cross validation steps

  • scoring_metrics – Scoring metrics tested during cross validation

  • rand_labels – Boolean variable to indicate if labels must be randomized to check for ML stability

  • args – Custom arguments to the estimator model

Returns

Dictionary containing the cross validation results
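The role of validation_cv and rand_labels can be sketched with a small, dependency-free cross validation loop. The nearest-centroid classifier, the accuracy metric, the helper name cross_validate and the seed parameter below are stand-ins chosen for illustration. They are assumptions and do not reflect the estimators, optimizers or scoring metrics that do_classification() actually supports.

```python
import random


def cross_validate(embeddings, labels, validation_cv, rand_labels=False, seed=0):
    """Return per-fold accuracy of a nearest-centroid classifier.

    Hypothetical stand-in for the estimator and metric; with rand_labels=True
    the labels are shuffled first, so scores should drop towards chance.
    """
    rng = random.Random(seed)
    labels = list(labels)
    if rand_labels:
        rng.shuffle(labels)
    order = list(range(len(labels)))
    rng.shuffle(order)
    folds = [order[i::validation_cv] for i in range(validation_cv)]

    accuracies = []
    for fold in folds:
        held_out = set(fold)
        # Per-class centroid from the training samples only; assumes every
        # class still has at least one training sample in each fold.
        centroids = {}
        for label in set(labels):
            rows = [embeddings[i] for i in order if i not in held_out and labels[i] == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        correct = 0
        for i in fold:
            predicted = min(
                centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(embeddings[i], centroids[c])),
            )
            correct += predicted == labels[i]
        accuracies.append(correct / len(fold))
    return {"accuracy": accuracies}


# Two well-separated toy clusters standing in for patient embeddings.
embeddings = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
labels = ["Control"] * 10 + ["Case"] * 10

results = cross_validate(embeddings, labels, validation_cv=5)
shuffled = cross_validate(embeddings, labels, validation_cv=5, rand_labels=True)
```

Comparing the scores obtained with the true labels against those obtained with randomized labels gives a quick check that the classifier is learning from the embeddings rather than from an artifact of the evaluation setup.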