Command Line Interface

CLEP commands.

clep

Run clep.

clep [OPTIONS] COMMAND [ARGS]...

classify

Perform machine-learning classification.

clep classify [OPTIONS]

Options

--data <data>

Required Path to tab-separated gene expression data file

--out <out>

Required Path to the output folder

--model <model>

Required Choose a classification model

Options

logistic_regression | elastic_net | svm | random_forest | gradient_boost

--optimizer <optimizer>

Required Optimizer used for classifier.

Options

grid_search | random_search | bayesian_search

--cv <cv>

Number of cross-validation steps

Default

5

-m, --metrics <metrics>

Metrics to evaluate during cross-validation (comma-separated)

Options

explained_variance | r2 | max_error | neg_median_absolute_error | neg_mean_absolute_error | neg_mean_squared_error | neg_mean_squared_log_error | neg_root_mean_squared_error | neg_mean_poisson_deviance | neg_mean_gamma_deviance | accuracy | roc_auc | roc_auc_ovr | roc_auc_ovo | roc_auc_ovr_weighted | roc_auc_ovo_weighted | balanced_accuracy | average_precision | neg_log_loss | neg_brier_score | adjusted_rand_score | homogeneity_score | completeness_score | v_measure_score | mutual_info_score | adjusted_mutual_info_score | normalized_mutual_info_score | fowlkes_mallows_score | precision | precision_macro | precision_micro | precision_samples | precision_weighted | recall | recall_macro | recall_micro | recall_samples | recall_weighted | f1 | f1_macro | f1_micro | f1_samples | f1_weighted | jaccard | jaccard_macro | jaccard_micro | jaccard_samples | jaccard_weighted

--randomize

Randomize sample labels to test the stability and effectiveness of the machine learning algorithm
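The required options and their allowed values can be checked before the tool is invoked. The following is a minimal sketch, not part of CLEP: `build_classify_command` is a hypothetical helper, and the file and folder names are placeholders. It only assembles the argument list; `--metrics` is left out because the exact way to pass several metrics is not spelled out above.

```python
import shlex

# Choices copied from the option lists documented above.
MODELS = {"logistic_regression", "elastic_net", "svm", "random_forest", "gradient_boost"}
OPTIMIZERS = {"grid_search", "random_search", "bayesian_search"}


def build_classify_command(data, out, model, optimizer, cv=5, randomize=False):
    """Return the argument list for a `clep classify` call (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if optimizer not in OPTIMIZERS:
        raise ValueError(f"unknown optimizer: {optimizer}")
    cmd = [
        "clep", "classify",
        "--data", data,
        "--out", out,
        "--model", model,
        "--optimizer", optimizer,
        "--cv", str(cv),
    ]
    if randomize:
        cmd.append("--randomize")
    return cmd


# Placeholder paths, not files shipped with CLEP.
cmd = build_classify_command("expression.tsv", "results", "elastic_net", "grid_search", cv=10)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where `clep` is installed.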

embedding

List the available vectorization methods.

clep embedding [OPTIONS] COMMAND [ARGS]...

evaluate

Evaluate the embeddings.

clep embedding evaluate [OPTIONS]

Options

--data <data>

Required Path to a set of binned files

--label <label>

Required Label for the set of binned files

generate-network

Generate a network for the given data.

clep embedding generate-network [OPTIONS]

Options

--data <data>

Required Path to tab-separated gene expression data file

--out <out>

Required Path to the output folder

--method <method>

The method used to generate the network

Default

interaction_network

Options

pathway_overlap | interaction_network | interaction_network_overlap

--kg <kg>

Path to the knowledge graph file in TSV format (used when the interaction_network method is chosen)

--gmt <gmt>

Path to the .gmt file (used when the pathway_overlap method is chosen)

--network_folder <network_folder>

Path to the folder containing all the knowledge graph files (used when the interaction_network_overlap method is chosen)

--intersect_thr <intersect_thr>

Threshold for creating edges in the pathway_overlap method

Default

0.1

-rs, --ret_summary

Flag to indicate if the edge summary for patients must be created.

Default

False

--jaccard_thr <jaccard_thr>

Threshold for creating edges in the interaction_network_overlap method

Default

0.1
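Each method reads its input from a different option, and two of the methods have their own threshold option. The sketch below makes that pairing explicit. The mapping is inferred from the option descriptions above; whether CLEP itself rejects mismatched combinations is an assumption. `build_generate_network_command` is a hypothetical helper and all paths are placeholders.

```python
# Which path option each method reads, per the descriptions above.
METHOD_INPUT_OPTION = {
    "interaction_network": "--kg",
    "pathway_overlap": "--gmt",
    "interaction_network_overlap": "--network_folder",
}
# Which threshold option applies to each method (none for interaction_network).
METHOD_THRESHOLD_OPTION = {
    "pathway_overlap": "--intersect_thr",
    "interaction_network_overlap": "--jaccard_thr",
}


def build_generate_network_command(data, out, source, method="interaction_network",
                                   threshold=None, ret_summary=False):
    """Assemble a `clep embedding generate-network` argument list (hypothetical helper)."""
    if method not in METHOD_INPUT_OPTION:
        raise ValueError(f"unknown method: {method}")
    cmd = [
        "clep", "embedding", "generate-network",
        "--data", data,
        "--out", out,
        "--method", method,
        METHOD_INPUT_OPTION[method], source,
    ]
    if threshold is not None:
        if method not in METHOD_THRESHOLD_OPTION:
            raise ValueError(f"{method} has no threshold option")
        cmd += [METHOD_THRESHOLD_OPTION[method], str(threshold)]
    if ret_summary:
        cmd.append("--ret_summary")
    return cmd


# Placeholder file names.
print(build_generate_network_command("scores.tsv", "out", "pathways.gmt",
                                     method="pathway_overlap", threshold=0.2))
```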

kge

Perform knowledge graph embedding.

clep embedding kge [OPTIONS]

Options

--data <data>

Required Path to tab-separated gene expression data file

--design <design>

Required Path to tab-separated experiment design file

--out <out>

Required Path to the output folder

--all_nodes

Use this flag to return all nodes (not just patients)

Default

False

-m, --model_config <model_config>

Required The configuration file for the model used for knowledge graph embedding in JSON format

--train_size <train_size>

Size of the training data for the knowledge graph embedding model

Default

0.8

--validation_size <validation_size>

Size of the validation data for the knowledge graph embedding model

Default

0.1
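Judging by their defaults of 0.8 and 0.1, the two size options appear to be fractions of the data; this is an inference, not something stated above. Under that assumption, and the further assumption that whatever remains is held out for testing, the split sizes can be sketched as follows. `split_counts` is a hypothetical helper, not a CLEP function.

```python
def split_counts(n_triples, train_size=0.8, validation_size=0.1):
    """Return (train, validation, remainder) counts for fractional split sizes."""
    if not 0 < train_size < 1:
        raise ValueError("train_size must be between 0 and 1")
    if not 0 <= validation_size < 1:
        raise ValueError("validation_size must be between 0 and 1")
    if train_size + validation_size > 1:
        raise ValueError("train_size + validation_size must not exceed 1")
    n_train = round(n_triples * train_size)
    n_validation = round(n_triples * validation_size)
    # Assumption: everything not used for training or validation is test data.
    n_remainder = n_triples - n_train - n_validation
    return n_train, n_validation, n_remainder


print(split_counts(1000))
```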

sample-scoring

List the available single sample scoring methods.

clep sample-scoring [OPTIONS] COMMAND [ARGS]...

limma

Limma-based Single Sample Scoring

clep sample-scoring limma [OPTIONS]

Options

--data <data>

Required Path to tab-separated gene expression data file

--design <design>

Required Path to tab-separated experiment design file

--out <out>

Required Path to the output folder

--alpha <alpha>

Family-wise error rate

Default

0.05

--method <method>

Method used for testing and adjustment of p-values

Default

fdr_bh

--control <control>

Annotated value for the control samples (must start with a letter)

Default

Control
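The default method name `fdr_bh` matches the name that statsmodels' `multipletests` uses for the Benjamini-Hochberg procedure. Whether CLEP delegates to that function is an assumption here. To show what the adjustment does with `--alpha`, the following pure-Python sketch implements Benjamini-Hochberg directly; it is an illustration of the procedure, not CLEP's code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (rejected, adjusted_p) using the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [q <= alpha for q in adjusted]
    return rejected, adjusted


# Made-up p-values for illustration only.
rejected, adjusted = benjamini_hochberg([0.001, 0.02, 0.04, 0.6], alpha=0.05)
print(rejected)
print(adjusted)
```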

ssgsea

ssGSEA-based Single Sample Scoring

clep sample-scoring ssgsea [OPTIONS]

Options

--data <data>

Required Path to tab-separated gene expression data file

--design <design>

Required Path to tab-separated experiment design file

--out <out>

Required Path to the output folder

--gs <gs>

Required Path to the .gmt geneset file

z-score

Z-Score-based Single Sample Scoring

clep sample-scoring z-score [OPTIONS]

Options

--data <data>

Required Path to tab-separated gene expression data file

--design <design>

Required Path to tab-separated experiment design file

--out <out>

Required Path to the output folder

--control <control>

Annotated value for the control samples (must start with a letter)

Default

Control

--threshold <threshold>

Threshold for choosing patients that are ‘extreme’ with respect to the controls. If the z-score of a gene exceeds this threshold in magnitude, the gene is considered either up- or down-regulated.

Default

2.0
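The thresholding described for `--threshold` can be sketched for a single gene as follows. This is an illustration under stated assumptions, not CLEP's implementation: the z-score is taken relative to the mean and sample standard deviation of the controls, and the 1 / -1 / 0 encoding for up-regulated, down-regulated, and unchanged is a choice made for this example. `z_score_labels` is a hypothetical name.

```python
from statistics import mean, stdev


def z_score_labels(patient_values, control_values, threshold=2.0):
    """Label each patient value for one gene relative to the control distribution."""
    mu = mean(control_values)
    sigma = stdev(control_values)  # sample standard deviation (assumption)
    labels = []
    for value in patient_values:
        z = (value - mu) / sigma
        if z > threshold:
            labels.append(1)    # up-regulated
        elif z < -threshold:
            labels.append(-1)   # down-regulated
        else:
            labels.append(0)    # within the control range
    return labels


# Made-up expression values for one gene.
controls = [9.0, 10.0, 11.0]
patients = [13.0, 10.5, 7.0]
print(z_score_labels(patients, controls))
```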