FusionRank is a machine learning framework for prioritizing gene fusions detected from RNA sequencing data. FusionRank annotates the candidate fusions with biological, genomic, and structural features and ranks them according to their predicted oncogenic relevance using a Learning-to-Rank (LambdaRank) model trained on curated fusion databases. FusionRank supports fusion calls with or without breakpoint coordinate information and generates a ranked summary report for downstream review.
- FR_gene_breakpoint_model: Uses fusion partner gene names, chromosomal breakpoint coordinates, and the reported strand of each breakpoint.
- FR_gene_name_model: Uses only fusion partner gene names; intended for cases where breakpoint coordinates are unavailable.
Create a conda environment:

```bash
conda create -n fusionrank python=3.10.14
conda activate fusionrank
conda install -c bioconda bedtools
```

Note: `bedtools` must be installed separately via conda before running FusionRank. It is required for breakpoint-based annotation. If you are using only the gene name-based model (`FR_gene_name_model`), `bedtools` is not required.
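A quick way to confirm the note above on a given machine is to check whether a `bedtools` executable is visible on `PATH`. The snippet below is a minimal sketch using only the Python standard library; the helper name `bedtools_available` is hypothetical and is not part of FusionRank.

```python
import shutil


def bedtools_available() -> bool:
    """Return True if a `bedtools` executable is found on PATH."""
    return shutil.which("bedtools") is not None


if bedtools_available():
    print("bedtools found: breakpoint-based annotation can be used.")
else:
    print("bedtools not found: install it before using FR_gene_breakpoint_model.")
```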
Clone the repository and install from the repository root:

```bash
git clone https://github.com/WGLab/FusionRank.git
cd FusionRank
pip install .
```

Verify installation:

```bash
fusionrank --help
```

Download and extract the reference archive:

```bash
wget https://github.com/WGLab/FusionRank/releases/download/v0.1.0/FR_refs.tar.gz
tar -xzf FR_refs.tar.gz
```

FusionRank requires the following GENCODE v49 files. Download them into the same reference directory.
```bash
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/gencode.v49.basic.annotation.gtf.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/gencode.v49.transcripts.fa.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/gencode.v49.pc_translations.fa.gz
```

Run FusionRank:

```bash
fusionrank <input_file> \
  --sample-prefix <sample_prefix> \
  --output-dir <output_dir> \
  --refs-dir <reference_dir> \
  --model-dir <model_dir>
```

| Parameter | Description |
|---|---|
| `input_file` | Fusion calls as a TSV file (see below for input file format) |
| `--sample-prefix` | Prefix used for output files |
| `--output-dir` | Directory in which the `<sample_prefix>` output folder is created |
| `--refs-dir` | Reference directory |
| `--model-dir` | Directory containing trained model files (see below) |
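Before launching a run, it can be useful to confirm that the directory passed to `--refs-dir` holds the three GENCODE v49 files. The sketch below assumes the files are kept under the names they have in the download URLs (still gzipped); whether FusionRank expects them compressed or decompressed is not stated here, so adjust the names if needed. The helper `missing_gencode_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the GENCODE v49 download URLs.
# Assumption: the files are stored in the reference directory under these names.
GENCODE_FILES = (
    "gencode.v49.basic.annotation.gtf.gz",
    "gencode.v49.transcripts.fa.gz",
    "gencode.v49.pc_translations.fa.gz",
)


def missing_gencode_files(refs_dir):
    """Return the GENCODE file names that are absent from refs_dir."""
    refs = Path(refs_dir)
    return [name for name in GENCODE_FILES if not (refs / name).is_file()]
```

An empty list from `missing_gencode_files("<reference_dir>")` means all three files were found.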
FusionRank provides two pretrained models. Select the appropriate model by specifying its path with `--model-dir`.

Use when chromosomal breakpoint coordinates are available:

```bash
--model-dir /path/to/FR_gene_breakpoint_model
```

Use when only fusion partner gene names are available:

```bash
--model-dir /path/to/FR_gene_name_model
```
Note: Replace the example paths above with the location of the downloaded model directory on your system. FusionRank automatically detects whether breakpoint columns are present in the input file. If breakpoint information is unavailable, breakpoint-dependent annotations are skipped and the gene-level model should be specified.
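The model choice can also be scripted from the input file's header. The following is a minimal sketch, not FusionRank's own detection logic: it picks the model directory depending on whether all six breakpoint columns are present and assembles the command line. The function `build_command` and the `models_root` argument are hypothetical, and the sketch assumes both model directories sit under one parent folder.

```python
import shlex
from pathlib import Path

# Breakpoint columns as named in the input file format.
BREAKPOINT_COLUMNS = (
    "gene1_chr", "gene1_bkp", "gene1_strand",
    "gene2_chr", "gene2_bkp", "gene2_strand",
)


def build_command(input_file, sample_prefix, output_dir, refs_dir, models_root, header):
    """Build a fusionrank argument list, choosing the model from the header."""
    has_breakpoints = all(col in header for col in BREAKPOINT_COLUMNS)
    model = "FR_gene_breakpoint_model" if has_breakpoints else "FR_gene_name_model"
    return [
        "fusionrank", str(input_file),
        "--sample-prefix", sample_prefix,
        "--output-dir", str(output_dir),
        "--refs-dir", str(refs_dir),
        # Assumption: both model directories live under models_root.
        "--model-dir", str(Path(models_root) / model),
    ]


header = ["sample_id", "cancer_type", "gene1", "gene2"]
cmd = build_command("fusions.tsv", "K562", "results", "refs", "models", header)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.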
Input files must be tab-separated (TSV) files containing one fusion candidate per row.
| sample_id | cancer_type | gene1 | gene2 |
|---|---|---|---|
| K562_rep1 | LAML | BCR | ABL1 |
| Sample_02 | LUAD | EML4 | ALK |
Column descriptions
| Column | Description |
|---|---|
| `sample_id` | Unique sample identifier |
| `cancer_type` | TCGA or pediatric cancer acronym |
| `gene1` | 5' fusion partner gene |
| `gene2` | 3' fusion partner gene |
Note: FusionRank supports processing multiple samples within a single input file. Fusion candidates are automatically grouped by `sample_id`, and ranking is performed independently within each sample.
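As an illustration of the gene-name input format and of per-sample grouping, the sketch below writes the two example rows as TSV text and then groups the candidates by `sample_id`. It only mirrors the behaviour described above and is not FusionRank's internal code; `to_tsv` and `group_by_sample` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict

FIELDS = ["sample_id", "cancer_type", "gene1", "gene2"]

rows = [
    {"sample_id": "K562_rep1", "cancer_type": "LAML", "gene1": "BCR", "gene2": "ABL1"},
    {"sample_id": "Sample_02", "cancer_type": "LUAD", "gene1": "EML4", "gene2": "ALK"},
]


def to_tsv(rows):
    """Serialize fusion candidates as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def group_by_sample(tsv_text):
    """Group (gene1, gene2) pairs by sample_id, one group per sample."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[row["sample_id"]].append((row["gene1"], row["gene2"]))
    return dict(groups)


tsv_text = to_tsv(rows)
print(tsv_text)
print(group_by_sample(tsv_text))
```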
The following columns are required only when using the breakpoint-level model (FR_gene_breakpoint_model).
| sample_id | cancer_type | gene1 | gene2 | gene1_chr | gene1_bkp | gene1_strand | gene2_chr | gene2_bkp | gene2_strand |
|---|---|---|---|---|---|---|---|---|---|
| K562_rep1 | LAML | BCR | ABL1 | chr22 | 23290410 | + | chr9 | 130854063 | + |
Breakpoint column descriptions
| Column | Description |
|---|---|
| `sample_id` | Unique sample identifier |
| `cancer_type` | TCGA or pediatric cancer acronym |
| `gene1_chr` | Chromosome of the 5' fusion breakpoint |
| `gene1_bkp` | Genomic coordinate of the 5' breakpoint |
| `gene1_strand` | Strand of the 5' breakpoint (+ or -) |
| `gene2_chr` | Chromosome of the 3' fusion breakpoint |
| `gene2_bkp` | Genomic coordinate of the 3' breakpoint |
| `gene2_strand` | Strand of the 3' breakpoint (+ or -) |
Note: If breakpoint columns are omitted, FusionRank automatically skips breakpoint-dependent annotations and should be run using the gene-level model (`FR_gene_name_model`).
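A simple pre-flight check on the breakpoint columns can catch malformed rows before a run. The sketch below is one possible check, under the assumption that coordinates are positive integers and strands are exactly `+` or `-`; FusionRank's own validation rules may differ, and `breakpoint_problems` is a hypothetical helper.

```python
def breakpoint_problems(row):
    """Return a list of human-readable problems found in one input row."""
    problems = []
    for side in ("gene1", "gene2"):
        chrom = row.get(f"{side}_chr", "")
        bkp = row.get(f"{side}_bkp", "")
        strand = row.get(f"{side}_strand", "")
        if not chrom:
            problems.append(f"{side}_chr is empty")
        # Assumption: breakpoint coordinates are positive integers.
        if not (bkp.isdecimal() and int(bkp) > 0):
            problems.append(f"{side}_bkp is not a positive integer: {bkp!r}")
        if strand not in ("+", "-"):
            problems.append(f"{side}_strand must be + or -: {strand!r}")
    return problems


example = {
    "sample_id": "K562_rep1", "cancer_type": "LAML", "gene1": "BCR", "gene2": "ABL1",
    "gene1_chr": "chr22", "gene1_bkp": "23290410", "gene1_strand": "+",
    "gene2_chr": "chr9", "gene2_bkp": "130854063", "gene2_strand": "+",
}
print(breakpoint_problems(example))
```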
The cancer_type column must contain one of the following cancer acronyms:
| Cancer Acronym | Full Name |
|---|---|
| ACC | Adrenocortical Carcinoma |
| ALL | Acute Lymphoblastic Leukemia |
| BLCA | Bladder Urothelial Carcinoma |
| BRCA | Breast Invasive Carcinoma |
| CCSK | Clear Cell Sarcoma of the Kidney |
| CESC | Cervical Squamous Cell Carcinoma |
| CHOL | Cholangiocarcinoma |
| COAD | Colon Adenocarcinoma |
| DLBC | Diffuse Large B-Cell Lymphoma |
| ESCA | Esophageal Carcinoma |
| GBM | Glioblastoma Multiforme |
| HNSC | Head and Neck Squamous Cell Carcinoma |
| KICH | Kidney Chromophobe |
| KIRC | Kidney Renal Clear Cell Carcinoma |
| KIRP | Kidney Renal Papillary Cell Carcinoma |
| LAML | Acute Myeloid Leukemia |
| LGG | Brain Lower Grade Glioma |
| LIHC | Liver Hepatocellular Carcinoma |
| LUAD | Lung Adenocarcinoma |
| LUSC | Lung Squamous Cell Carcinoma |
| MESO | Mesothelioma |
| NBL | Neuroblastoma |
| OS | Osteosarcoma |
| OV | Ovarian Serous Cystadenocarcinoma |
| PAAD | Pancreatic Adenocarcinoma |
| PCPG | Pheochromocytoma and Paraganglioma |
| PRAD | Prostate Adenocarcinoma |
| READ | Rectum Adenocarcinoma |
| RT | Rhabdoid Tumor |
| SARC | Sarcoma |
| SKCM | Skin Cutaneous Melanoma |
| STAD | Stomach Adenocarcinoma |
| TGCT | Testicular Germ Cell Tumors |
| THCA | Thyroid Carcinoma |
| THYM | Thymoma |
| UCS | Uterine Carcinosarcoma |
| UCEC | Uterine Corpus Endometrial Carcinoma |
| UVM | Uveal Melanoma |
| WT | Wilms Tumor |
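The acronym list above lends itself to a quick check of the `cancer_type` column. This is a minimal sketch that assumes matching is exact and case-sensitive, which may or may not reflect how FusionRank treats the column; `invalid_cancer_types` is a hypothetical helper.

```python
# Acronyms copied from the table above.
ALLOWED_CANCER_TYPES = frozenset(
    """ACC ALL BLCA BRCA CCSK CESC CHOL COAD DLBC ESCA GBM HNSC KICH KIRC KIRP
    LAML LGG LIHC LUAD LUSC MESO NBL OS OV PAAD PCPG PRAD READ RT SARC SKCM
    STAD TGCT THCA THYM UCS UCEC UVM WT""".split()
)


def invalid_cancer_types(values):
    """Return the sorted set of values that are not in the accepted list."""
    # Assumption: comparison is exact and case-sensitive.
    return sorted({v for v in values if v not in ALLOWED_CANCER_TYPES})


print(invalid_cancer_types(["LAML", "LUAD", "XYZ"]))
```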
Results are written to:

```
<output_dir>/<sample_prefix>/
```
Generated files include:
| File | Description |
|---|---|
| `<sample_prefix>_annotated.tsv` | Full feature matrix with all annotation columns |
| `<sample_prefix>_annotated_ranked.tsv` | Annotated fusions with predicted scores and per-sample ranks |
| `<sample_prefix>_summary.tsv` | Evidence-oriented summary with database, drug, biology, and panel annotations |
| `<sample_prefix>_summary.html` | Browser-viewable ranked fusion report with the top 5 candidates highlighted |
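For downstream scripts, the output locations can be derived from the two command-line values. The sketch below simply combines the directory layout and file names listed above; `expected_outputs` is a hypothetical helper, and the `results` and `K562` values are placeholders.

```python
from pathlib import Path

OUTPUT_SUFFIXES = (
    "_annotated.tsv",
    "_annotated_ranked.tsv",
    "_summary.tsv",
    "_summary.html",
)


def expected_outputs(output_dir, sample_prefix):
    """Return the paths FusionRank is described as writing for one run."""
    base = Path(output_dir) / sample_prefix
    return [base / f"{sample_prefix}{suffix}" for suffix in OUTPUT_SUFFIXES]


for path in expected_outputs("results", "K562"):
    status = "found" if path.is_file() else "missing"
    print(f"{path}: {status}")
```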
Please cite FusionRank at the following ...