Overview

μ-cisTarget identifies potential cis-regulatory mutations that affect the binding sites of master transcription factors.

[Figure: Overview of μ-cisTarget]

μ-cisTarget description:

  1. As input, μ-cisTarget takes a gene signature and a list of genomic variations. The gene signature can be derived from the matched transcriptome of the same cancer sample, or it can be a general gene signature of the matching cancer subtype.
  2. Motif discovery on the gene signature yields enriched motifs and candidate transcription factors. Motif discovery can be performed using i-cisTarget or iRegulon.
  3. Variations are selected by their proximity (up to 1 Mb) to the genes in the input gene signature, and are scored with the motifs found in step 2.
  4. Genes with gains of motifs for cancer-type-related factors that are expressed in the sample are added to the inferred gene regulatory network (red edge). An optional filtering step selects only over-expressed cancer-related driver genes as targets.

Input

Gene set/signature

Provide a list of genes of interest. The list is used to infer a gene regulatory network based on motif enrichment analysis, to filter mutations (only the variants close to these genes will be scored), or both.

The gene list can represent, for instance:

  1. genes specifically over-expressed in your sample (compared with other samples of the same cancer type or across different cancer types)
  2. a gene signature related to your sample (e.g. genes underlying a proliferative state in melanoma, when your sample is a known proliferative cell line)

Gene set/signature format:

The gene signature must be supplied as a list of gene identifiers (HGNC gene symbols), separated by newline characters.

For example:

Gene set
KLF7
LOX
CLDN14
GAD1
HEY1
COL1A1
ZEB2
SNAI2
RUNX1
ETS1
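
Before submitting a list, it can help to normalise it locally. The following is a minimal sketch, assuming that blank lines and duplicate symbols are unwanted; how μ-cisTarget itself treats them is not stated here, and `parse_gene_list` is a hypothetical helper name. The same function applies to a TF list, which uses the same format.

```python
def parse_gene_list(text: str) -> list[str]:
    """Return gene symbols from newline-separated text.

    Blank lines are skipped and duplicates are dropped, keeping first-seen order.
    This is local clean-up only; it does not validate symbols against HGNC.
    """
    seen = set()
    genes = []
    for line in text.splitlines():
        symbol = line.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


# A few symbols from the example above, with a blank line and a duplicate added.
raw = "KLF7\nLOX\nCLDN14\n\nGAD1\nLOX\n"
genes = parse_gene_list(raw)
print("\n".join(genes))
```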

Transcription factors (TFs)

Specify the TFs of interest. Motifs directly annotated to these TFs will be scored.

You can either:

  1. use TFs that are already present in your gene list,
  2. or provide a list of TFs (e.g. TFs expressed in your sample).

Transcription factors (TFs) format:

The same format as for the gene list should be used.

Motifs

Specify which motifs will be used to score variations.

Choose from the following options:

  1. Automatically discover enriched motifs in your gene set:
    An iRegulon analysis discovers enriched motifs in the provided set of genes. Only the enriched motifs that are directly annotated for the TFs chosen in the previous step are used for variation scoring.
  2. Use known motifs for your list of TFs:
    All motifs that are directly annotated for the TFs selected in the previous step are used for variation scoring.
  3. Provide a list of motif IDs:
    Provide the motif IDs of interest, e.g. from a previous i-cisTarget or iRegulon analysis.
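
The difference between options 1 and 2 can be illustrated with a small sketch. It assumes a motif-to-TF annotation table is available as a dictionary. The table below is hypothetical except for the cisbp__M1578/SOX4 pair, which appears in the example output on this page, and both function names are illustrative only.

```python
def motifs_for_tfs(motif_to_tfs, chosen_tfs):
    """Option 2: every motif directly annotated for at least one chosen TF."""
    chosen = set(chosen_tfs)
    return sorted(m for m, tfs in motif_to_tfs.items() if chosen & set(tfs))


def enriched_motifs_for_tfs(enriched_motifs, motif_to_tfs, chosen_tfs):
    """Option 1: enriched motifs, restricted to those annotated for a chosen TF."""
    allowed = set(motifs_for_tfs(motif_to_tfs, chosen_tfs))
    return [m for m in enriched_motifs if m in allowed]


# Hypothetical annotation table (only the first entry is taken from this page).
motif_to_tfs = {
    "cisbp__M1578": ["SOX4"],
    "hypothetical__M1": ["TF_A"],
    "hypothetical__M2": ["SOX4", "TF_B"],
}
chosen_tfs = ["SOX4"]
enriched = ["hypothetical__M1", "cisbp__M1578"]  # stand-in for an iRegulon result

option_1 = enriched_motifs_for_tfs(enriched, motif_to_tfs, chosen_tfs)
option_2 = motifs_for_tfs(motif_to_tfs, chosen_tfs)
```

Option 1 yields a subset of option 2, because it additionally requires enrichment in the gene set.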

Motifs format:

Motif IDs provided by iRegulon/i-cisTarget should be used, separated by newline characters.

For example:

Motif IDs
transfac_pro__M01207
transfac_pro__M03569
jaspar__MA0462.1
cisbp__M1578
cisbp__M5383
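
A pasted motif list can be checked before submission. The sketch below assumes, based only on the examples above, that each ID has the form `<collection>__<identifier>` with a double underscore as separator; `group_by_collection` is a hypothetical helper, not part of iRegulon or i-cisTarget.

```python
from collections import defaultdict


def group_by_collection(text: str) -> dict:
    """Group newline-separated motif IDs by the prefix before the first '__'."""
    groups = defaultdict(list)
    for line in text.splitlines():
        motif_id = line.strip()
        if not motif_id:
            continue  # ignore blank lines
        collection, separator, local_id = motif_id.partition("__")
        if not separator or not local_id:
            raise ValueError(f"unexpected motif ID format: {motif_id!r}")
        groups[collection].append(local_id)
    return dict(groups)


motif_text = """transfac_pro__M01207
transfac_pro__M03569
jaspar__MA0462.1
cisbp__M1578
cisbp__M5383
"""
grouped = group_by_collection(motif_text)
```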

Variations

Specify the variations that should be scored. Note that variations are assigned to the genes from your gene set when they lie within 1 Mb of a gene, and only these assigned variations are scored.

Variations format:

Variations should be provided in VCF format (Variant Call Format) in hg19 coordinates.

An example of the first 5 columns:

chromosome  coordinates  mutation ID (not used)  reference sequence  mutated sequence
chr12       40756654     SNV1                    C                   T
chr16       68843162     SNV2                    C                   T
chr3        149286338    SNV3                    A                   C
chr3        40974853     SNV4                    A                   C
chr3        70752234     SNV5                    A                   C
chr7        14242729     DEL1                    ATCAACAGATGTGCGAATAATCTCTACTTCGGGGCCAGTATCAAAAAGAGCAGTAGC  A
chr11       101096706    INS1                    T                   TT
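
The first five columns can be read with a few lines of code. The sketch below assumes tab-separated VCF data lines and derives a mutation type and an ID in the `chrom__position__ref__alt__type` style seen in the output file. The length-based classification rule and the `OTHER` label are assumptions, not the documented behaviour of μ-cisTarget.

```python
def classify(ref: str, alt: str) -> str:
    """Label a variant by comparing reference and mutated allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) > len(alt):
        return "DEL"
    if len(ref) < len(alt):
        return "INS"
    return "OTHER"  # equal-length multi-base change; this label is an assumption


def parse_vcf_line(line: str):
    """Return (chrom, pos, ref, alt) from the first five tab-separated columns.

    The third column (mutation ID) is read but not used.
    """
    chrom, pos, _variant_id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, alt


def mutation_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build an ID of the form chrom__pos__ref__alt__type."""
    return "__".join([chrom, str(pos), ref, alt, classify(ref, alt)])


vcf_lines = [
    "chr12\t40756654\tSNV1\tC\tT",
    "chr11\t101096706\tINS1\tT\tTT",
]
ids = [mutation_id(*parse_vcf_line(line)) for line in vcf_lines]
```

Header lines starting with `#` are not handled here and would need to be skipped before calling `parse_vcf_line`.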

Output

μ-cisTarget log file

Analysis report, including the basic statistics:

μ-cisTarget motifLocator delta scores file

For each variant that was assigned to a gene from the gene list and that has either a wildtype or mutant MotifLocator score ≥ 0.8, the following information is provided:

chrom  start      reference  mutation  mutation type  mutation ID                 associated gene  distance to TSS  motif ID      motif name    directly annotated TFs  wildtype MotifLocator score  mutant MotifLocator score  delta MotifLocator score  wildtype consensus sequence  mutant consensus sequence
chr12  40756654   C          T         SNV            chr12__40756654__C__T__SNV  LRRK2            +137779          cisbp__M1578  cisbp__M1578  SOX4                    0.786153                     0.909462                   0.123309                  TTTCTGTT                     TTTTTGTT
chr16  68843162   C          T         SNV            chr16__68843162__C__T__SNV  CDH1             +72035           cisbp__M1578  cisbp__M1578  SOX4                    0.788243                     0.914018                   0.125775                  ACTCTGTC                     ACTTTGTC
chr3   149286338  A          C         SNV            chr3__149286338__A__C__SNV  WWTR1            +134722          cisbp__M1578  cisbp__M1578  SOX4                    0.763568                     0.904501                   0.140933                  TTTTTTTT                     TTTTTGTT
chr3   40974853   A          C         SNV            chr3__40974853__A__C__SNV   CTNNB1           -266076          cisbp__M1578  cisbp__M1578  SOX4                    0.780635                     0.922858                   0.142223                  CTATTTTT                     CTATTGTT
chr3   70752234   A          C         SNV            chr3__70752234__A__C__SNV   MITF             +963649          cisbp__M1578  cisbp__M1578  SOX4                    0.763266                     0.904501                   0.141235                  TTTTTTTT                     TTTTTGTT
chr7   14242729   ATCAACAGATGTGCGAATAATCTCTACTTCGGGGCCAGTATCAAAAAGAGCAGTAGC  A         DEL            chr7__14242729__ATCAACAGATGTGCGAATAATCTCTACTTCGGGGCCAGTATCAAAAAGAGCAGTAGC__A__DEL  ETV1             -213158          cisbp__M1578  cisbp__M1578  SOX4                    0.783073                     0.904501                   0.121428                  TTTTTGAT                     TTTTTGTT

Note that no threshold was applied to the delta MotifLocator score, so this file includes every variant with a MotifLocator score ≥ 0.8 for the wildtype or the mutant sequence. Subsequent filtering is left to the user.
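
As an illustration of such user-side filtering, the sketch below keeps rows whose mutant score and delta score both reach user-chosen cut-offs. It assumes a tab-separated file without a header line and with columns in the order listed above. The column names in `COLUMNS` are hypothetical labels, and the thresholds in the example call are placeholders chosen only to demonstrate the mechanics, not recommended values.

```python
import csv
import io

# Hypothetical labels for the 16 columns, in the order described above.
COLUMNS = [
    "chrom", "start", "reference", "mutation", "mutation_type", "mutation_id",
    "gene", "distance_to_tss", "motif_id", "motif_name", "tfs",
    "wt_score", "mut_score", "delta_score", "wt_consensus", "mut_consensus",
]


def gain_of_motif(handle, min_mut_score, min_delta):
    """Return rows with mutant score >= min_mut_score and delta >= min_delta."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    return [
        row for row in reader
        if float(row["mut_score"]) >= min_mut_score
        and float(row["delta_score"]) >= min_delta
    ]


# Two rows from the example table above, joined with tabs.
sample = "\n".join([
    "\t".join(["chr12", "40756654", "C", "T", "SNV", "chr12__40756654__C__T__SNV",
               "LRRK2", "+137779", "cisbp__M1578", "cisbp__M1578", "SOX4",
               "0.786153", "0.909462", "0.123309", "TTTCTGTT", "TTTTTGTT"]),
    "\t".join(["chr3", "149286338", "A", "C", "SNV", "chr3__149286338__A__C__SNV",
               "WWTR1", "+134722", "cisbp__M1578", "cisbp__M1578", "SOX4",
               "0.763568", "0.904501", "0.140933", "TTTTTTTT", "TTTTTGTT"]),
])

# Placeholder thresholds for demonstration only.
hits = gain_of_motif(io.StringIO(sample), min_mut_score=0.9, min_delta=0.13)
```

With these placeholder cut-offs only the WWTR1 row is kept, because the LRRK2 row has a delta score below 0.13.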

We recommend the following filtering to detect potential gain-of-motif mutations:

which can be easily performed in command line using the following command:

μ-cisTarget mutation to associated genes file

Output of the mutations to genes assignment process.

The file includes all the mutations that are associated with the genes from the provided gene list (up to 1 Mb).
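
The assignment step can be sketched as a distance check against each gene. The code below is a simplified illustration under stated assumptions: distance is measured from a single TSS per gene, the sign is strand-aware (negative means upstream of the TSS), and the gene names and coordinates are hypothetical. Whether μ-cisTarget measures the 1 Mb window from the TSS or from the gene body is not specified here.

```python
from typing import NamedTuple

MAX_DISTANCE = 1_000_000  # 1 Mb window, as stated in the text


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int      # transcription start site position (hypothetical values below)
    strand: str   # "+" or "-"


def signed_distance(gene: Gene, pos: int) -> int:
    """Distance from the TSS to pos; negative means upstream (assumed convention)."""
    offset = pos - gene.tss
    return offset if gene.strand == "+" else -offset


def assign(variants, genes):
    """Pair every variant with every gene whose TSS lies within MAX_DISTANCE."""
    pairs = []
    for chrom, pos in variants:
        for gene in genes:
            if gene.chrom != chrom:
                continue
            distance = signed_distance(gene, pos)
            if abs(distance) <= MAX_DISTANCE:
                pairs.append((chrom, pos, gene.name, distance))
    return pairs


genes = [
    Gene("GENE_A", "chr1", 1_000_000, "+"),
    Gene("GENE_B", "chr1", 2_500_000, "-"),
]
variants = [("chr1", 1_200_000), ("chr1", 3_000_000), ("chr2", 1_000_000)]
pairs = assign(variants, genes)
```

A variant can be assigned to more than one gene when several genes fall within the window; the nested loop is adequate for short gene lists, while an interval index would be preferable for large inputs.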

Examples

An example of SK-MEL-5 cis-regulatory mutations generating de novo edges in a melanoma proliferative regulatory network

The aim of this analysis was to find potential mutations in the SK-MEL-5 cell line (a melanoma cell line) that create new binding sites for known melanoma master TFs.

To do this, the following input and options were used:

  1. Genes: a proliferative gene signature (Verfaillie et al., 2015) and genes over-expressed in SK-MEL-5 that are also known cancer drivers and related to melanoma.
  2. Transcription Factors: the "provide list of TFs" option was used, with a list of TFs that are expressed in SK-MEL-5 and related to melanoma.
  3. Motifs: the "automatically discover enriched motifs in your gene set" option was used.
  4. Mutations: SK-MEL-5 variants that are not SNPs (not present in dbSNP build 144) and that are assigned to the over-expressed, melanoma-relevant cancer driver genes were used.

μ-cisTarget performed the following steps and detected potential gain-of-motif mutations for a master TF.

  1. Motif discovery (iRegulon) was performed on the gene signature to find enriched motifs and candidate master regulators (TFs).
  2. From the enriched motifs, only the motifs that are directly annotated for the TFs from the TF list were selected and used by MotifLocator.
  3. Mutations were assigned to the genes from the gene list.
  4. The assigned mutations were scored by MotifLocator.
  5. Six candidate mutations were detected that cause a gain of a motif directly annotated for SOX TFs. An example of one of the SNVs is shown in the figure below.

[Figure: SK-MEL-5 example]

Melanoma gene regulatory network with SK-MEL-5 de novo edges. a) Gene regulatory network inferred by iRegulon analysis of a "MITF-high" melanoma gene signature (Verfaillie et al., 2015). Among the enriched motifs are directly annotated motifs for MITF and SOX TFs, which are over-expressed in the SK-MEL-5 cell line (Z-score ≥ 1 compared with other samples in CCLE) and relevant to melanoma. The grey edges represent the links between TFs and target genes (TGs) based on the iRegulon analysis, while red arrows indicate a gain of these motifs caused by SK-MEL-5 mutations close to the over-expressed, relevant driver genes. All the represented TGs are over-expressed in SK-MEL-5 and related to melanoma (green nodes); some of them are also known cancer drivers (orange). b) Example of a candidate SNV in the 5th intron of the CDH1 gene, which creates a new SOX binding site.

Contact

If you have any questions or problems related to μ-cisTarget, please contact us.

Cite us

If you use μ-cisTarget, please cite:

Kalender Atak Z, Imrichova H, Svetlichnyy D, Hulselmans G, Christiaens V, Reumers J, Ceulemans H, Aerts S. Identification of cis-regulatory mutations generating de novo edges in personalized cancer gene regulatory networks. Genome Medicine 9 (2017). doi: 10.1186/s13073-017-0464-7