Command Line Interface

CLEP commands.

clep

Run clep.

clep [OPTIONS] COMMAND [ARGS]...

classify

Perform machine-learning classification.

clep classify [OPTIONS]

Options

--data <data>: Required Path to tab-separated gene expression data file

--out <out>: Required Path to the output folder

--model <model>

Required. Choose a classification model

Options: logistic_regression | elastic_net | svm | random_forest | gradient_boost

--optimizer <optimizer>

Required. Optimizer used for the classifier.

Options: grid_search | random_search | bayesian_search

--cv <cv>

Number of cross-validation steps

Default: 5

-m, --metrics <metrics>

Metrics to evaluate during cross-validation (comma-separated)

Options: explained_variance | r2 | max_error | neg_median_absolute_error | neg_mean_absolute_error | neg_mean_squared_error | neg_mean_squared_log_error | neg_root_mean_squared_error | neg_mean_poisson_deviance | neg_mean_gamma_deviance | accuracy | roc_auc | roc_auc_ovr | roc_auc_ovo | roc_auc_ovr_weighted | roc_auc_ovo_weighted | balanced_accuracy | average_precision | neg_log_loss | neg_brier_score | adjusted_rand_score | homogeneity_score | completeness_score | v_measure_score | mutual_info_score | adjusted_mutual_info_score | normalized_mutual_info_score | fowlkes_mallows_score | precision | precision_macro | precision_micro | precision_samples | precision_weighted | recall | recall_macro | recall_micro | recall_samples | recall_weighted | f1 | f1_macro | f1_micro | f1_samples | f1_weighted | jaccard | jaccard_macro | jaccard_micro | jaccard_samples | jaccard_weighted

--randomize: Randomize sample labels to test the stability of and effectiveness of the machine learning algorithm

embedding

List the available vectorization methods.

clep embedding [OPTIONS] COMMAND [ARGS]...

evaluate

Evaluate the embeddings.

clep embedding evaluate [OPTIONS]

Options

--data <data>: Required Path to a set of binned files

--label <label>: Required Label for the set of binned files

generate-network

Generate a network for the given data.

clep embedding generate-network [OPTIONS]

Options

--data <data>: Required. Path to the tab-separated gene expression data file

--out <out>: Required. Path to the output folder

--method <method>

The method used to generate the network

Default: interaction_network

Options: pathway_overlap | interaction_network | interaction_network_overlap

--kg <kg>: Path to the knowledge graph file in TSV format; used when the interaction_network method is chosen

--gmt <gmt>: Path to the GMT file; used when the pathway_overlap method is chosen

--network_folder <network_folder>: Path to the folder containing all the knowledge graph files; used when the interaction_network_overlap method is chosen

--intersect_thr <intersect_thr>

Threshold for creating edges in the pathway_overlap method

Default: 0.1

-rs, --ret_summary

Flag indicating whether the edge summary for patients should be created.

Default: False

--jaccard_thr <jaccard_thr>

Threshold for creating edges in the interaction_network_overlap method

Default: 0.1
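
The two threshold options decide when an overlap is large enough to become an edge. A minimal sketch of that idea follows, assuming the overlap is measured with the Jaccard index (as the name `--jaccard_thr` suggests) and that the comparison is inclusive. Which sets CLEP actually compares, and how it treats ties at the threshold, is not stated here; the gene sets and the helper names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_edges(named_sets, threshold=0.1):
    """Return (name_a, name_b, score) for each pair whose score reaches the threshold."""
    edges = []
    for (name_a, set_a), (name_b, set_b) in combinations(sorted(named_sets.items()), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:  # assumption: inclusive comparison
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical gene sets, for illustration only.
pathways = {
    "A": {"TP53", "MDM2", "ATM"},
    "B": {"TP53", "ATM", "CHEK2"},
    "C": {"INS", "IGF1"},
}
edges = overlap_edges(pathways, threshold=0.1)
print(edges)
```

Raising the threshold yields a sparser network, since fewer pairs share enough members to qualify.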

kge

Perform knowledge graph embedding.

clep embedding kge [OPTIONS]

Options

--data <data>: Required. Path to the tab-separated gene expression data file

--design <design>: Required. Path to the tab-separated experiment design file

--out <out>: Required. Path to the output folder

--all_nodes

Use this flag to return all nodes (not just patients)

Default: False

-m, --model_config <model_config>: Required. Configuration file, in JSON format, for the model used for knowledge graph embedding

--train_size <train_size>

Size of the training data for the knowledge graph embedding model

Default: 0.8

--validation_size <validation_size>

Size of the validation data for the knowledge graph embedding model

Default: 0.1
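
With the defaults, `--train_size 0.8` and `--validation_size 0.1` leave 0.1 of the data unassigned. The sketch below assumes both values are fractions of the full triple set and that the remainder serves as a test set; neither assumption is confirmed by this reference, and `split_sizes` is a hypothetical helper.

```python
def split_sizes(n_triples, train_size=0.8, validation_size=0.1):
    """Return (train, validation, test) counts for a given number of triples."""
    if not 0 < train_size < 1:
        raise ValueError("train_size must be between 0 and 1")
    if not 0 <= validation_size < 1:
        raise ValueError("validation_size must be between 0 and 1")
    if train_size + validation_size > 1:
        raise ValueError("train_size + validation_size must not exceed 1")
    n_train = round(n_triples * train_size)
    n_validation = round(n_triples * validation_size)
    # Assumption: whatever is left over is used for testing.
    n_test = n_triples - n_train - n_validation
    return n_train, n_validation, n_test


sizes = split_sizes(1000)
print(sizes)
```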

sample-scoring

List the available single sample scoring methods.

clep sample-scoring [OPTIONS] COMMAND [ARGS]...

limma

Limma-based Single Sample Scoring

clep sample-scoring limma [OPTIONS]

Options

--data <data>: Required. Path to the tab-separated gene expression data file

--design <design>: Required. Path to the tab-separated experiment design file

--out <out>: Required. Path to the output folder

--alpha <alpha>

Family-wise error rate

Default: 0.05

--method <method>

Method used for testing and adjusting p-values

Default: fdr_bh

--control <control>

Annotated value for the control samples (must start with a letter)

Default: Control
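
The default `fdr_bh` has the same name as the Benjamini-Hochberg option of `statsmodels.stats.multitest.multipletests`; whether CLEP delegates to that function is an assumption. To show what such an adjustment does with `--alpha`, here is a dependency-free sketch of the Benjamini-Hochberg procedure on made-up p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (rejected, adjusted) lists in the original order of p_values."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [p <= alpha for p in adjusted]
    return rejected, adjusted


# Made-up p-values, for illustration only.
rejected, adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20], alpha=0.05)
print(rejected)
print(adjusted)
```

In this example only the first hypothesis survives the adjustment, although three of the four raw p-values are below 0.05.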

radical-search

Radical Searching-based Single Sample Scoring

clep sample-scoring radical-search [OPTIONS]

Options

--data <data>: Required. Path to the tab-separated gene expression data file

--design <design>: Required. Path to the tab-separated experiment design file

--out <out>: Required. Path to the output folder

--control <control>

Annotated value for the control samples (must start with a letter)

Default: Control

--threshold <threshold>

Percentage of samples considered as ‘extreme’ on either side of the distribution

Default: 2.5

-rs, --ret_summary

Flag indicating whether the edge summary for patients should be created.

Default: False

-cb, --control_based: Run Radical Searching with scoring based on the control population instead of the entire dataset
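
To make the `--threshold` percentage concrete, the sketch below marks the lowest and highest 2.5% of samples for a single gene as -1 and +1 and leaves the rest at 0. This is a simplified rank-based reading of "extreme on either side of the distribution", not CLEP's actual implementation, which may handle ties and the reference population (see `--control_based`) differently. `tail_scores` is a hypothetical helper.

```python
def tail_scores(values, threshold=2.5):
    """Score one gene across samples: -1 (low tail), +1 (high tail), 0 otherwise."""
    n = len(values)
    # Number of samples that fall into each tail.
    k = int(n * threshold / 100)
    # Sample indices ordered from the lowest to the highest value.
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for i in order[:k]:
        scores[i] = -1
    for i in order[n - k:]:
        scores[i] = 1
    return scores


# Made-up expression values for 40 samples, for illustration only.
values = [float(v) for v in range(40)]
scores = tail_scores(values, threshold=2.5)
print(scores)
```

With 40 samples and a threshold of 2.5, exactly one sample lands in each tail.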

ssgsea

ssGSEA-based Single Sample Scoring

clep sample-scoring ssgsea [OPTIONS]

Options

--data <data>: Required. Path to the tab-separated gene expression data file

--design <design>: Required. Path to the tab-separated experiment design file

--out <out>: Required. Path to the output folder

--gs <gs>: Required. Path to the .gmt gene set file
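
A .gmt file is tab-separated, with one gene set per line: the set name, a description, then the member genes. The sketch below reads that layout into a dictionary, which can be handy for checking a file before passing it to `--gs`. The set names and genes are made up, and `parse_gmt` is a hypothetical helper rather than part of CLEP.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {set_name: set_of_genes}."""
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line holds: name, description, then one gene per column.
        name, _description, *genes = line.split("\t")
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# Made-up gene sets, for illustration only.
example = "SET_A\tna\tTP53\tMDM2\nSET_B\tna\tINS\tIGF1\tIRS1\n"
gene_sets = parse_gmt(example)
print(gene_sets)
```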

z-score

Z-Score-based Single Sample Scoring

clep sample-scoring z-score [OPTIONS]

Options

--data <data>: Required. Path to the tab-separated gene expression data file

--design <design>: Required. Path to the tab-separated experiment design file

--out <out>: Required. Path to the output folder

--control <control>

Annotated value for the control samples (must start with a letter)

Default: Control

--threshold <threshold>

Threshold for choosing patients that are ‘extreme’ with respect to the controls. If the z-score of a gene exceeds this threshold, the gene is considered either up- or down-regulated.

Default: 2.0
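
The thresholding described above can be sketched for a single gene as follows, assuming the z-score is computed against the mean and sample standard deviation of the control samples and that the threshold applies symmetrically in both directions. These details, and the helper `z_score_calls`, are assumptions for illustration; the values are made up.

```python
from statistics import mean, stdev


def z_score_calls(sample_values, control_values, threshold=2.0):
    """Return +1 (up), -1 (down) or 0 for each sample, relative to the controls."""
    mu = mean(control_values)
    sigma = stdev(control_values)  # assumes at least two non-identical controls
    calls = []
    for x in sample_values:
        z = (x - mu) / sigma
        if z > threshold:
            calls.append(1)
        elif z < -threshold:
            calls.append(-1)
        else:
            calls.append(0)
    return calls


# Made-up expression values for one gene, for illustration only.
controls = [10, 12, 11, 9, 13]
patients = [11, 15, 6]
calls = z_score_calls(patients, controls, threshold=2.0)
print(calls)
```

Here the second patient lies more than two control standard deviations above the control mean and the third more than two below, so they are called up- and down-regulated respectively.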