Command line Usage

Example

The following code is used to generate the example images.

trackplot \
  -e chr1:1270656-1284730:+ \
  -r example/example.sorted.gtf.gz \
  --density example/density_list.tsv \
  --show-site \
  --focus 1272656-1272656:1275656-1277656 \
  --stroke 1275656-1277656:1277856-1278656@blue \
  --sites 1271656,1271656,1272656 \
  --line example/line_list.tsv \
  -o example/example.png \
  --dpi 300 \
  --width 10 \
  --height 1 \
  --barcode example/barcode_list.tsv \
  --heatmap example/heatmap_list.tsv \
  --interval example/interval_list.tsv \
  --customized-junction example/junctions.tsv

Command line arguments

List all the parameters

python main.py --help
# Or
trackplot --help

Parameters:

Usage: main.py [OPTIONS]

  Welcome to use trackplot

Options:
  --version                       Show the version and exit.
  --verbose                       enable debug level log
  --logfile PATH                  save log info into file
  -e, --event TEXT                Event range eg: chr1:100-200:+  [required]
  Common input files configuration: 
    --color-factor INTEGER RANGE  Index of column with color levels (1-based);
                                  NOTE: LUAD|red -> LUAD while be labeled in
                                  plots and red while be the fill color
                                  [default: 1; x>=1]
    --barcode PATH                Path to barcode list file, At list two
                                  columns were required, - 1st The name of bam
                                  file, not the alias of bam;

                                  - 2nd the barcode;

                                  - 3rd The group label, optional;

                                  - 4th The color of each cell type, default
                                  using the color of corresponding bam file.
    --barcode-tag TEXT            The default cell barcode tag label
                                  [default: CB]
    --umi-tag TEXT                The default UMI barcode tag label  [default:
                                  UB]
    -p, --process INTEGER RANGE   How many cpu to use  [1<=x<=12]
    --group-by-cell               Group by cell types in density/line plot
    --remove-duplicate-umi        Drop duplicated UMIs by barcode
    --normalize-format [count|cpm|rpkm]
                                  The normalize format for bam file  [default:
                                  count]
  Output settings: 
    -o, --output PATH             Path to output graph file
    -d, --dpi INTEGER RANGE       The resolution of output file  [default:
                                  300; x>=1]
    --raster                      The would convert heatmap and site plot to
                                  raster image (speed up rendering and produce
                                  smaller files), only affects pdf, svg and PS
    --height FLOAT                The height of single subplot, default adjust
                                  image height by content  [default: 1]
    --width INTEGER RANGE         The width of output file, default adjust
                                  image width by content  [default: 10; x>=0]
    --backend TEXT                Recommended backend  [default: Agg]
  Reference settings: 
    -r, --annotation PATH          Path to gtf file, both transcript and exon
                                  tags are necessary
    --interval PATH               Path to list of interval files in bed
                                  format, 1st column is path to file, 2nd
                                  column is the label [optional]
    --show-id                     Whether show gene id or gene name
    --show-exon-id                Whether show gene id or gene name
    --no-gene                     Do not show gene id next to transcript id
    --domain                      Add domain information into annotation track
    --proxy TEXT                  The http or https proxy for EBI/Uniprot
                                  requests,if `--domain` is True, eg:
                                  http://127.0.0.1:1080
    --timeout INTEGER RANGE       The requests timeout when `--domain` is
                                  True.  [default: 10; x>=1]
    --local-domain TEXT           Load local domain folder and load into
                                  annotation track, download from https://hgdow
                                  nload.soe.ucsc.edu/gbdb/hg38/uniprot/
    --domain-include TEXT         Which domain will be included in annotation
                                  plot
    --domain-exclude TEXT         Which domain will be excluded in annotation
                                  plot
    --remove-empty                Whether to plot empty transcript
    --transcripts-to-show TEXT    Which transcript to show, transcript name or
                                  id in gtf file, eg: transcript1,transcript2
    --choose-primary              Whether choose primary transcript to plot.
    --ref-color TEXT              The color of exons  [default: black]
    --intron-scale FLOAT          The scale of intron  [default: 0.5]
    --exon-scale FLOAT            The scale of exon  [default: 1]
  Density plot settings: 
    --density PATH                The path to list of input files, a tab
                                  separated text file,

                                  - 1st column is path to input file,

                                  - 2nd column is the file category,

                                  - 3rd column is input file alias (optional),

                                  - 4th column is color of input files
                                  (optional),

                                  - 5th column is the library of input file
                                  (optional, only required by bam file),

                                  - 6th column is the number of total reads
                                  (optional, only required by bam file).
    --customized-junction TEXT    Path to junction table column name needs to
                                  be bam name or bam alias.
    --only-customized-junction    Only used customized junctions.
    -t, --threshold INTEGER RANGE
                                  Threshold to filter low abundance junctions
                                  [default: 0; x>=0]
    --density-by-strand           Whether to draw density plot by strand
    --show-site                   Whether to draw additional site plot
    --site-strand [all|+|-]       Which strand kept for site plot, default use
                                  all  [default: all]
    --included-junctions TEXT     The junction id for including, chr1:1-100
    --show-junction-num           Whether to show the number of junctions
    --fill-step [pre|post|mid]    Define step if the filling should be a step
                                  function, i.e. constant in between x.  The
                                  value determines where the step will occur:

                                  - pre: The y value is continued constantly
                                  to the left from every x position,  i.e. the
                                  interval (x[i-1], x[i]] has the value y[i].

                                  - post: The y value is continued constantly
                                  to the right from every x position,  i.e.
                                  the interval [x[i], x[i+1]) has the value
                                  y[i].

                                  - mid: Steps occur half-way between the x
                                  positions."  [default: post]
    --smooth-bin INTEGER          The bin size used to smooth ATAC fragments.
                                  [default: 20]
    --sc-density-height-ratio FLOAT
                                  The relative height of single cell density
                                  plots  [default: 1]
  Line plot settings: 
    --line PATH                   The path to list of input files, a tab
                                  separated text file,

                                  - 1st column is path to input file,

                                  - 2nd column is the file category,

                                  - 3rd column is input file group (optional),

                                  - 4th column is input file alias (optional),

                                  - 5th column is color platte of
                                  corresponding group (optional). - 6th column
                                  is the number of total reads (optional, only
                                  required by bam file).
    --hide-legend                 Whether to hide legend
    --legend-position TEXT        The legend position
    --legend-ncol INTEGER RANGE   The number of columns of legend  [x>=0]
  Heatmap plot settings: 
    --heatmap PATH                The path to list of input files, a tab
                                  separated text file,

                                  - 1st column is path to input file,

                                  - 2nd column is the file category,

                                  - 3rd column is input file group (optional),

                                  - 4th column is color platte of
                                  corresponding group. - 5th column is the
                                  number of total reads (optional, only
                                  required by bam file).
    --clustering                  Enable clustering of the heatmap
    --clustering-method [single|complete|average|weighted|centroid|median|ward]
                                  The clustering method for heatmap  [default:
                                  ward]
    --distance-metric [braycurtis|canberra|chebyshev|cityblock|correlation|cosine|dice|euclidean|hamming|jaccard|jensenshannon|kulsinski|kulczynski1|mahalanobis|matching|minkowski|rogerstanimoto|russellrao|seuclidean|sokalmichener|sokalsneath|sqeuclidean|yule]
                                  The distance metric for heatmap  [default:
                                  euclidean]
    --heatmap-scale               Do scale on heatmap matrix.
    --heatmap-vmin INTEGER        Minimum value to anchor the colormap,
                                  otherwise they are inferred from the data.
    --heatmap-vmax INTEGER        Maximum value to anchor the colormap,
                                  otherwise they are inferred from the data.
    --show-row-names              Show row names of heatmap
    --sc-heatmap-height-ratio FLOAT
                                  The relative height of single cell heatmap
                                  plots  [default: 0.2]
  IGV settings: 
    --igv PATH                    The path to list of input files, a tab
                                  separated text file,

                                  - 1st column is path to input file,

                                  - 2nd column is the file category,

                                  - 3rd column is input file alias (optional),

                                  - 4th column is color of input files
                                  (optional),

                                  - 5th column is exon_id for sorting the
                                  reads (optional).
    --m6a TEXT                    trackplot will load location information
                                  from the given tags and  then highlight the
                                  RNA m6a modification cite at individual
                                  reads.  If there are multiple m6a
                                  modification site, please add tag as follow,
                                  234423,234450
    --polya TEXT                  trackplot will load length of poly(A) from
                                  the given tags and then visualize the
                                  poly(A) part at end of each individual
                                  reads.
    --rs TEXT                     trackplot will load real strand information
                                  of each reads from the given tags and

                                   the strand information is necessary for
                                   visualizing poly(A) part.
    --del-ratio-ignore FLOAT RANGE
                                  Ignore the deletion gap in nanopore or
                                  pacbio reads.

                                  if a deletion region was smaller than
                                  (alignment length) * (del_ratio_ignore),

                                  then the deletion gap will be filled.

                                  currently the del_ratio_ignore was 1.0.
                                  [0.0<=x<=1.0]
  HiC settings: 
    --hic PATH                    The path to list of input files, a tab
                                  separated text file,

                                  - 1st column is path to input file,

                                  - 2nd column is the file category,

                                  - 3rd column is input file alias (optional),

                                  - 4th column is color of input files
                                  (optional),

                                  - 5th column is data transform for HiC
                                  matrix, eg 0, 2, 10 (optional). Same to
                                  `--log`
  Additional annotation: 
    -f, --genome PATH             Path to genome fasta
    --sites TEXT                  Where to plot additional indicator lines,
                                  comma separated int
    --stroke TEXT                 The stroke regions:
                                  start1-end1:start2-end2@color-label, draw a
                                  stroke line at bottom, default color is red
    --link TEXT                   The link: start1-end1:start2-end2@color,
                                  draw a link between two site at bottom,
                                  default color is blue
    --focus TEXT                  The highlight regions: 100-200:300-400
  Motif settings: 
    --motif PATH                  The path to customized bedGraph file, first
                                  three columns is chrom, start and end site,
                                  the following 4 columns is the weight of
                                  ATCG.
    --motif-region TEXT           The region of motif to plot in start-end
                                  format
    --motif-width FLOAT           The width of ATCG characters  [default: 0.8]
  Layout settings: 
    --n-y-ticks INTEGER RANGE     The number of ticks of y-axis  [x>=0]
    --distance-ratio FLOAT        The distance between transcript label and
                                  transcript line  [default: 0.1]
    --annotation-scale FLOAT       The size of annotation plot in final plot
                                  [default: 0.25]
    --stroke-scale FLOAT          The size of stroke plot in final image
                                  [default: 0.25]
  Overall settings: 
    --font-size INTEGER RANGE     The font size of x, y-axis and so on  [x>=1]
    --reverse-minus               Whether to reverse strand of bam/annotation
                                  file
    --hide-y-label                Whether hide y-axis label
    --same-y                      Whether different density/line plots shared
                                  same y-axis boundaries
    --same-y-sc                   Similar with --same-y, but only shared same
                                  y-axis boundaries between same single cell
                                  files
    --log [0|2|10|zscore]         y axis log transformed, 0 -> not log
                                  transform;2 -> log2;10 -> log10
    --title TEXT                  Title
    --font TEXT                   Fonts
  Web settings:
    --start-server                Start web ui instead of running in cmd mode
    --host TEXT                   The ip address binding to
    --port INTEGER                The port to listen on
    --plots PATH                  The path to directory where to save the
                                  backend plot data and logs, required while
                                  using appImage.
    --data PATH                   The path to directory contains all necessary
                                  data files.
  -h, --help                      Show this message and exit.

Detailed description

Common options

1.--color-factor: the index of column to set colors

basic usage: the input file list as follows,

path/to/bam1 LUAD
path/to/bam2 LUSC

Then the --color-factor 2 means trackplot assign two colors to LUAD and LUSC separately automatically.

advanced usage:

path/to/bam1 LUAD|red
path/to/bam2 LUSC|#000000

Then the --color-factor 2 means trackplot assign red color to LUAD and "#000000" to LUSC separately.

Output options

1.-o, --output: the path to output file, the common image format such as pdf, png, jpg and svg are supported. 2.--backend: the backend is used to switch matplotlib plotting backend,

**known issues: **

the Agg, PDF, etc. backends may cause the small protein domains missing in final output image.
the Cairo backend required cairocffi packages, which may have difficulty to install for some users, if then please try to use our docker image or use an alternative backend like Agg/PDF.

Note: Cairo backend may disable the heatmap rasterisation, so use as appropriate.

The recommended combination of backend and image formats please check matplotlib backend

Annotation plot

1.`--domain`: fetch domain information from uniprot and ensemble, then map amino acid coordinate into genomic coordinate.

For each transcript, trackplot firstly get the uniprot id from uniprot website and check whether the length of protein is one third of CDS length. If yes, then fetch the uniprot information from ebi.

The trackplot will present these domains from ['DOMAIN_AND_SITES', 'MOLECULE_PROCESSING', 'PTM', 'STRUCTURAL', 'TOPOLOGY'] in default, and please refer this for the detail of each category.

2. Include or exclude the domain based on category or type

--domain-include: the domain will be included and a domain which is not in the list will not be presented --domain-exclude: the domain will be excluded For example,


# The protein information from PTM, DOMAIN_AND_SITES and helix from STRUCTURAL will be presented.    
--include PTM:DOMAIN_AND_SITES:STRUCTURAL,helix

# The protein information from PTM, DOMAIN_AND_SITES and helix from STRUCTURAL will be excluded.    
--exclude PTM:DOMAIN_AND_SITES:STRUCTURAL,helix

The Category and type of domain refer to here.

here is the command of the tutorial,

trackplot \
  -e 7:107830089-107832297 \
  -r example/Article_figures/FigS1/A/Homo_sapiens.GRCh37.87.gtf.sorted.gz \
  -o domain.raw.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num -t 10 --domain

# domain include test
trackplot \
  -e 7:107830089-107832297 \
  -r example/Article_figures/FigS1/A/Homo_sapiens.GRCh37.87.gtf.sorted.gz \
  -o domain.include.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num -t 10 --domain --domain-include MOLECULE_PROCESSING:PTM,CARBOHYD

# domain exclude test
trackplot \
  -e 7:107830089-107832297 \
  -r example/Article_figures/FigS1/A/Homo_sapiens.GRCh37.87.gtf.sorted.gz \
  -o domain.exclude.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num -t 10 --domain --domain-exclude PTM:TOPOLOGY

3.`--local-domain`: load domain information from a folder that contains bigbed files which download from UCSC

In order to facilitate these people from poor network regions, Sashimi also provides a local mode for domain visualization. First, the user must download the corresponding annotation from UCSC, and collect all bigbed file into a folder which could pass to trackplot with --local-domain.

But the bigbed file from UCSC didn't provide a transcript or uniprot id, Sashimi couldn't map the protein information into the corresponding transcript id.

4.`--interval`: add additional feature track into annotation.

In addition to fetch genomic feature from GTF or GFF file, Sashimi also provides a flexible way to load other features into annotation track. And user could prepare and record custom annotation information into a config file, like this

$ cat example/interval_list.tsv
#file_location  label_of_file
example/PolyASite.chr1.atlas.clusters.2.0.GRCh38.96.simple.bed.gz   polyAS

Then Sashimi receive the custom annotation file with parameter --interval, the additional genomic features will be visualized at the annotation track.

Density plot

Density plot takes bam, bigwig or bgzipped depth file generated using samtools depth as input.

The input file list as follows

# filepath  file_category   label   color
example/bams/1.bam bam
example/bams/2.bam bam  2bam
example/bams/3.bam bam  3bam    blue
example/bws/2.bw    bw bw green
example/bams/sc.bam bam sc

1.`--customized-junction`

This parameter is used to add the custom junctions from users

# junctions corresponding_input_file
junctions 2bam 3bam
chr1:1000-20000 100 200

The first row is title of each column, first column is the junction id
Each row means the different junctions except the first column.
Each column means the counts of the custom junction

** Example: --customized-junction **

trackplot \
  -e chr1:1270656-1284730:+ \
  -r example/example.sorted.gtf.gz \
  --density example/density_list.tsv \
  --customized-junction example/custome_junction.tsv \
  --output example.pdf

2.`--show-site` and `--show-strand`

These two parameters were used to show the density of reads starts by forward and reverse strand separately.

**Example: --show-site, --focus and --sites **

trackplot \
  -e chr1:1270656-1284730:+ \
  -r example/example.sorted.gtf.gz \
  --density example/density_list.tsv \
  --show-site \
  --focus 1272656-1272656:1275656-1277656 \
  --sites 1271656,1272656 \
  --output example.pdf

3. Sample preparetion

Firstly, prepare the example datas

# annotation gtf file
wget -c https://ftp.ensembl.org/pub/release-101/gtf/homo_sapiens/Homo_sapiens.GRCh38.101.chr.gtf.gz

# bam files
aria2c -x 16 https://www.encodeproject.org/files/ENCFF125RUG/@@download/ENCFF125RUG.bam
aria2c -x 16 https://www.encodeproject.org/files/ENCFF854PFR/@@download/ENCFF854PFR.bam
aria2c -x 16 https://www.encodeproject.org/files/ENCFF709LHN/@@download/ENCFF709LHN.bam
aria2c -x 16 https://www.encodeproject.org/files/ENCFF613CGT/@@download/ENCFF613CGT.bam

# bigWig files
aria2c -x 16 https://www.encodeproject.org/files/ENCFF936SHU/@@download/ENCFF936SHU.bigWig
aria2c -x 16 https://www.encodeproject.org/files/ENCFF363UDO/@@download/ENCFF363UDO.bigWig
aria2c -x 16 https://www.encodeproject.org/files/ENCFF051PIE/@@download/ENCFF051PIE.bigWig
aria2c -x 16 https://www.encodeproject.org/files/ENCFF245YUN/@@download/ENCFF245YUN.bigWig
aria2c -x 16 https://www.encodeproject.org/files/ENCFF556EQK/@@download/ENCFF556EQK.bigWig
aria2c -x 16 https://www.encodeproject.org/files/ENCFF476HFB/@@download/ENCFF476HFB.bigWig

aria2c -x 16 http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phastCons470way/hg38.phastCons470way.bw%

Here is the configure file bam.tsv for next section 4, 5, 6,

# we use third column to group the different data to aggregate datasets.
ENCFF125RUG.bam bam PTBP1_KD_agg    #0084d1 frf
ENCFF854PFR.bam bam PTBP1_KD_agg    #0084d1 frf
ENCFF854PFR.bam bam PTBP1_KD_2  #0084d1 frf

4. Filter junction counts and include the specific junction

User could also filter the minimum number of reads supporting the junction, and show the specific junction in plot.

Here is the command,

# panel A, show all junctions

trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.raw.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num

# panel B, show the junctions with more 10 counts.

trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.filter.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num -t 10

# panel C, show the specific junctions.
trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.include.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num \
  --included-junctions chr9:112297917-112330441:-,chr9:112297917-112333470:-,chr9:112330475-112333470:-

5. Sample aggregation

User could aggregate multiple files into one track, like scRNAseq from smart-seq2 or splice-QTL datasets.


trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.agg.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num \
  --included-junctions chr9:112297917-112330441:-,chr9:112297917-112333470:-,chr9:112330475-112333470:-

The PTBP1_KD_agg is the mean of ENCFF854PFR.bam and ENCFF125RUG.bam

6. The intron shrinkage

The shrinkage of intron


# Without shrinkage of the intron (shrinkage_scale: 1)
trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.1.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num \
  --included-junctions chr9:112297917-112330441:-,chr9:112297917-112333470:-,chr9:112330475-112333470:- \
  --intron-scale 1

# with shrinkage of the intron (shrinkage_scale: 0.001)
trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.001.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num \
  --included-junctions chr9:112297917-112330441:-,chr9:112297917-112333470:-,chr9:112330475-112333470:- \
  --intron-scale .01

7. Visualize coverage by cpm or rpkm

We also support to visualize the coverage by normalized values.

Inspired by rpkm_per_region from MISO

here is the configuration bam.tsv file,

# get https://www.encodeproject.org/files/ENCFF854PFR/@@download/ENCFF854PFR.bam
ENCFF854PFR.bam bam PTBP1_KD_2  #0084d1 frf 6830944

# No any normalize
trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.count.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num \
  --included-junctions chr9:112297917-112330441:-,chr9:112297917-112333470:-,chr9:112330475-112333470:- \
  --intron-scale .01 --normalize-format count

# Normalize by rpkm
trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.rpkm.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num \
  --included-junctions chr9:112297917-112330441:-,chr9:112297917-112333470:-,chr9:112330475-112333470:- \
  --intron-scale .01 --normalize-format rpkm

# Normalize by cpm
trackplot \
  -e chr9:112296343-112335026 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --density bam.tsv \
  -o PTBP3.cpm.pdf \
  --dpi 300 \
  --width 6 \
  --height 1 --show-junction-num \
  --included-junctions chr9:112297917-112330441:-,chr9:112297917-112333470:-,chr9:112330475-112333470:- \
  --intron-scale .01 --normalize-format cpm

1.--barcode

Provide a manually curated barcode list to separate bam files by cell types or other groups.

This barcode list as follows:

#bam  barcode cell_type(optional) cell_type(optional)
sc AAACCTGCACCTCGTT-1 AT2 #A6DCC2
sc AAAGATGTCCGAATGT-1 AT2 #A6DCC2
sc AAAGCAATCGTACGGC-1 AT2 #A6DCC2

2.--barocde-tag and --umi-tag

3.The tag to extract barcode and umi from each reads record, here we take the 10x Genomics bam format as default.

4.--group-by-cell

Group by cell types in density/line plot.

Example: --barcode and --group-by-cell

trackplot \
  -e chr1:1270656-1284730:+ \
  -r example/example.sorted.gtf.gz \
  --density example/density_list.tsv \
  --barcode example/barcode_list.tsv \
  --focus 1272656-1272656:1275656-1277656 \
  --sites 1271656,1271656,1272656 \
  --group-by-cell \
  --output example.pdf

Line plot

The line plot is simply another format of density plots.

The input file list as same as density plots

1.--hide-legend, --legend-position and --legend-ncol

These three parameters were used to disable legend, modify legend position and the columns of legend separately.

By default, the position of legend and columns of legend were determined by matplotlib, and the further detailed legend configuration please check matplotlib legend.

**Example: --hide-legend **

trackplot \
  -e chr1:1270656-1284730:+ \
  -o example_without_legend.pdf \
  --dpi 300 \
  --width 10 \
  --height 1 \
  --barcode example/barcode_list.tsv \
  --raster --density-by-strand \
  -r example/example.sorted.gtf.gz \
  --line example/line_list.tsv \
  --hide-legend

Example: --legend-position and --legend-ncol

trackplot \
  -e chr1:1270656-1284730:+ \
  -o example_with_legend.pdf \
  --dpi 300 \
  --width 10 \
  --height 1 \
  -t 100000 \
  --barcode example/barcode_list.tsv \
  --raster --density-by-strand \
  -r example/example.sorted.gtf.gz \
  --line example/line_list.tsv \
  --legend-position right \
  --legend-ncol 2

Please check the documentation in density plot.

Heatmap plot

Heatmap plot takes bam or bigwig as input.

The input file list as follows

# filepath  file_category   group   color
example/bams/1.bam bam  bam
example/bams/2.bam bam  bam
example/bams/3.bam bam  bam
example/bams/4.bam bam  bam
example/bams/0.bam bam  bam
example/bws/1.bw    bw  bw  YlOrBr
example/bws/2.bw    bw  bw  YlOrBr
example/bws/3.bw    bw  bw  YlOrBr
example/bws/4.bw    bw  bw  YlOrBr
example/bws/0.bw    bw  bw  YlOrBr

Example: --heatmap-scale

Raw value

Scaled value

**Example: --heatmap-vmin and --heatmap-vmax to uniform color map **

Read-by-read plot

1.Sashimi.igv module support different format file as input.

A read-by-read plot provides a landscape of aligned reads in a straight and convenient way.

User could pass bed and bam file into Sashimi, and the input config file list as follows

#filepath   file_category   label   color
example/SRX9697989.corrected_reads.bed.gz   igv bed12   blue
example/SRX9697989.corrected_reads.6.bed.gz igv bed6
example/SRX9697989.corrected_reads.3.bed.gz igv bed3    red
example/bams/0.bam  igv bam

here is the plotting command line

trackplot \
  -e 1:10024601-10038168:+ \
  -r example/example.sorted.gtf.gz \
  --igv example/igv.tsv \
  -o igv.1.pdf \
  --dpi 300 \
  --width 10 \
  --height 1

2.Sashimi.igv module load and visualize features from bam tags.

In this topic, Sashimi.igv could load m6A modification (tag, ma:i) and length of polyA (tag, pa:f) tag from bam file, and then present it on each reads.

Here is the subset bam information,


SRR12503063.3513161 16  1   14362   1   ..  *   0   0   ..  ..  NM:i:197    ms:i:1177   AS:i:977    nn:i:0  ts:Z:+  tp:Z:P  cm:i:191    s1:i:837    s2:i:861    de:f:0.089  rl:i:41 pa:f:55.6556    rs:Z:-  ma:i:14368
SRR12503063.4180535 0   1   14363   48  ..  *   0   0   ..  ..  NM:i:117    ms:i:636    AS:i:1093   nn:i:0  ts:Z:-  tp:Z:P  cm:i:245    s1:i:1045   s2:i:1023   de:f:0.0588 rl:i:0  pa:f:90.4319    rs:Z:-  ma:i:14372
SRR12503063.4171820 16  1   14363   51  ..  *   0   0   ..  ..  NM:i:132    ms:i:700    AS:i:1191   nn:i:0  ts:Z:+  tp:Z:P  cm:i:257    s1:i:1182   s2:i:1152   de:f:0.0555 rl:i:0  pa:f:99.63  rs:Z:-  ma:i:14367
SRR12503063.8305358 16  1   14364   23  ..  *   0   0   ..  ..  NM:i:285    ms:i:890    AS:i:716    nn:i:0  ts:Z:+  tp:Z:P  cm:i:99 s1:i:487    s2:i:466    de:f:0.1315 rl:i:33 pa:f:78.8698    rs:Z:-  ma:i:14368
SRR12503063.89603   16  1   14394   2   ..  *   0   0   ..  ..  NM:i:220    ms:i:1064   AS:i:891    nn:i:0  ts:Z:+  tp:Z:P  cm:i:163    s1:i:740    s2:i:741    de:f:0.0992 rl:i:30 pa:f:66.8433    rs:Z:-  ma:i:14401
......

here is the command line,

trackplot \
  -e chr1:13362-29900:+ \
  --igv example/igv.m6a.tsv \
  -o igv.m6a.pdf \
  --dpi 300 \
  --width 5 \
  --height 1 --rs rs --polya pa --m6a ma

In this picture, the red track and blue dot represents the length of poly(A) and m6a modification respectively,

3.Sashimi.igv module also allow sort these reads by specific alternative exon

User could modify the config file as follows,

#filepath   file_category   label   color   focus_exon
example/SRX8994511.example.bed.gz   igv bed12   blue    43100453-43100519,43101366-43101432

here is the command line for plotting

wget -c https://ftp.ensembl.org/pub/release-101/gtf/homo_sapiens/Homo_sapiens.GRCh38.101.chr.gtf.gz

trackplot \
  -r example/Homo_sapiens.GRCh38.101.chr.gtf.gz \
  -e 21:43092956-43107570:+ \
  --igv example/igv.3.tsv \
  --focus 43100453-43100519:43101366-43101432 \
  -o test_igv_plot.3.pdf \
  --dpi 300 \
  --width 6 \
  --height 1

HiC heatmap

HiC toy example 1

Sashimi also support HiC track, and user could prepare Li_et_al_2015.h5 into a config file, then pass to --hic. here is an example config file,

Here is the command line for generating domain

hicFindTADs -m example/Li_et_al_2015.h5 \
--outPrefix example/Li_et_al_2015_thres0.05_delta0.01_fdr \
--thresholdComparisons 0.05 \
--delta 0.01 \
--correctForMultipleTesting fdr \
-p 12

The format of configuration file,

# filepath  file_category   label   color  transform (2,10 and e means log2, log10 and ln.) depth domaininfor
example/Li_et_al_2015.h5    hic Li_hic  RdYlBu_r    2   50000 example/Li_et_al_2015_thres0.05_delta0.01_fdr_domains.bed.gz

here is the plotting command line

trackplot \
  -e X:1500000-2500000:+ \
  --hic example/hic.tsv \
  -o example.hic.png \
  --dpi 300 \
  --width 10 \
  --height 1

for each hic track, a bigger depth means a higher y-axis.

HiC toy example 2

Because Li_et_al_2015.h5 doesn't contain chromosome 1, user could download a new toy dataset and add into example picture.

1.download hic file and convert into h5 format

wget https://encode-public.s3.amazonaws.com/2016/12/01/a241cba5-df2e-45fb-9a8f-5af5587fb02a/ENCFF121YPY.hic
hicConvertFormat -m ENCFF121YPY.hic --inputFormat hic --outputFormat cool -o ENCFF121YPY.cool --resolutions 1000
hicConvertFormat -m ENCFF121YPY_1000.cool --inputFormat cool --outputFormat h5 -o ENCFF121YPY.h5

If you met UnicodeDecodeError when converting hic into h5 format, you can download pairs format file from ENCODE.

Here is an example,

wget https://www.encodeproject.org/files/ENCFF931NQV/@@download/ENCFF931NQV.pairs.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
pairix ENCFF931NQV.pairs.gz
cooler cload pairix -p 16 hg38.chrom.sizes:1000 ENCFF931NQV.pairs.gz ENCFF931NQV_1kb.cool
hicConvertFormat -m ENCFF931NQV_1kb.cool --inputFormat cool --outputFormat h5 -o ENCFF931NQV.h5

2.prepare the config file

# filepath  file_category   label   color  transform depth
example/ENCFF931NQV.h5 hic ENCFF931NQV RdYlBu_r log2 30000

3.run trackplot

trackplot \
    -e chr19:35600000-35800000:+ \
    -r example/example.sorted.2.gtf.gz \
    --hic example/hic.2.tsv \
    -o hic.2.png \
    --dpi 300 \
    --width 10 \
    --height 1

here is the result. results

circRNA plot

The linear and circRNA raw data were downloaded from PRJNA541935.

The command for generating a circRNA coverage plot with highlight the back-splice junction

wget -c https://ftp.ensembl.org/pub/release-101/gtf/homo_sapiens/Homo_sapiens.GRCh38.101.chr.gtf.gz

trackplot \
  -e chr1:925421-944308:+ \
  --density example/circRNA.tsv \
  --stroke 937113-937713@blue \
  -o circRNA.pdf \
  --dpi 300 \
  --width 10 \
  --height 1 \
  -t 10 \
  -r Homo_sapiens.GRCh38.101.chr.gtf.gz \
  --link 925921-943808

Motif plot

trackplot \
  -e chr1:1270656-1270776:+ \
  --density example/density_list.tsv \
  -o motif.pdf \
  --dpi 300 \
  --width 10 \
  --height 1 \
  --raster \
  -r example/example.sorted.gtf.gz \
  --motif example/motif.bed.gz \
  --motif-region 1270756-1270760

The motif weight matrix should be customized bedGraph format as follows:

# chromosome  start end A_weight T_weight C_weight G_weight
chr1  100 101 0.1 0.2 -0.3  -0.4

Then, bgzipped && tabix indexed

here is the result. result

Additional annotation

We also provide multiple annotations, including indicator lines, focus, stroke and sequence.

Example: --focus, --sites and --stroke

trackplot \
  -e chr1:1270656-1284730:+ \
  --focus 1272656-1272656:1275656-1277656 \
  --stroke 1275656-1277656:1277856-1278656@blue \
  --sites 1271656,1271656,1272656 \
  -o additional.pdf \
  --dpi 300 \
  --width 10 \
  --height 1 \
  --raster \
  -r example/example.sorted.gtf.gz \
  --line example/line_list.tsv