DNA features
[1]:
import coolbox
from coolbox.api import *
coolbox.__version__
[1]:
'0.3.9'
[2]:
test_data_dir = "../../../tests/test_data/"
test_interval = "chr9:4500000-5500000"
example_bed6 = f"{test_data_dir}/bed6_chr9_4000000_6000000.bed"
example_bed9 = f"{test_data_dir}/bed9_chr9_4000000_6000000.bed"
example_bed12 = f"{test_data_dir}/bed_chr9_4000000_6000000.bed"
example_tad = f"{test_data_dir}/tad_chr9_4000000_6000000.bed"
BED file
gene style
BED12 records carry block (exon) information, so they can be drawn as fishbone-like gene shapes; BED6 and BED9 records cannot:
[3]:
frame = XAxis()
frame += BED(example_bed6) + Title("BED6")
frame += Spacer(1) + BED(example_tad, labels=False) + Title("BED6 without direction")
frame += Spacer(1) + BED(example_bed12, gene_style='flybase') + Title("BED12: flybase")
frame += Spacer(1) + BED(example_bed12, gene_style='normal') + Title("BED12: normal")
frame.plot("chr9:5000000-5500000")
[3]:
data:image/s3,"s3://crabby-images/f730f/f730fc0af6caa609b217746614ddda4d54f3436c" alt="../_images/_gallery_dna_features_5_0.png"
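The difference between the formats comes from the last three BED12 columns (blockCount, blockSizes, blockStarts), which describe the blocks inside each feature. The sketch below is not part of coolbox; it is a minimal illustration, using a made-up record and a hypothetical helper named `bed12_blocks`, of how those columns translate into absolute block coordinates that a plotting track could draw as thick segments joined by a thin line.

```python
from typing import List, Tuple


def bed12_blocks(line: str) -> List[Tuple[int, int]]:
    """Return absolute (start, end) intervals for the blocks of one BED12 record.

    Hypothetical helper for illustration only; it is not a coolbox function.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("a BED12 record needs 12 tab-separated fields")
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # blockStarts are offsets relative to chromStart.
    return [(chrom_start + s, chrom_start + s + size)
            for s, size in zip(starts, sizes)]


# A made-up record (not taken from the test data) with three blocks.
record = "\t".join([
    "chr9", "1000", "2000", "geneA", "0", "+",
    "1200", "1800", "0", "3", "100,200,300,", "0,400,700,",
])
blocks = bed12_blocks(record)
print(blocks)
```

A BED6 record stops at the strand column, so it has no block information and can only be drawn as a single box.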
layout
The default height of a BED track is 'auto', which means the track grows with the number of rows. The height of a single row can be set with the row_height parameter:
[4]:
frame = XAxis() + BED(example_bed12, row_height=1.0)
frame.plot(test_interval)
[4]:
data:image/s3,"s3://crabby-images/429d7/429d7bf0aa76e1964471c181f943bbd3c3fcd46d" alt="../_images/_gallery_dna_features_8_0.png"
To use a fixed track height instead, set the 'height' parameter:
[5]:
frame = XAxis() + BED(example_bed12, height=8, fontsize=10)
frame.plot(test_interval)
[5]:
data:image/s3,"s3://crabby-images/c6a63/c6a63a7f93321d185f1765400273eb2d410be14f" alt="../_images/_gallery_dna_features_10_0.png"
The number of rows to display can be limited with the num_rows parameter:
[6]:
frame = XAxis() + BED(example_bed12, row_height=1.0, num_rows=5)
frame.plot(test_interval)
[6]:
data:image/s3,"s3://crabby-images/725bd/725bd6bfe42e6507e23558817ab62901def94da5" alt="../_images/_gallery_dna_features_12_0.png"
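To make the relationship between rows, row_height and num_rows more concrete, here is a minimal sketch of one common way to lay out overlapping features: a greedy packing that places each feature in the first row where it does not overlap the previous one. This is an assumption for illustration, not necessarily the algorithm coolbox uses, and the names `assign_rows` and `track_height` are hypothetical.

```python
from typing import List, Optional, Tuple


def assign_rows(intervals: List[Tuple[int, int]]) -> List[int]:
    """Greedily assign a row index to each interval (processed in sorted order)."""
    row_ends: List[int] = []  # end coordinate of the last feature in each row
    rows: List[int] = []
    for start, end in sorted(intervals):
        for i, last_end in enumerate(row_ends):
            if start >= last_end:  # fits after the last feature of row i
                row_ends[i] = end
                rows.append(i)
                break
        else:
            # No existing row has room, so open a new one.
            row_ends.append(end)
            rows.append(len(row_ends) - 1)
    return rows


def track_height(n_rows: int, row_height: float,
                 num_rows: Optional[int] = None) -> float:
    """Height of an auto-sized track, optionally capped at num_rows rows."""
    shown = n_rows if num_rows is None else min(n_rows, num_rows)
    return shown * row_height


# Made-up feature coordinates.
features = [(0, 10), (5, 15), (12, 20), (8, 9)]
rows = assign_rows(features)
n_rows = max(rows) + 1
print(rows, track_height(n_rows, 1.0), track_height(n_rows, 1.0, num_rows=2))
```

Under this model an 'auto' height is simply the number of occupied rows times the row height, and limiting the number of rows caps that product.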
Rows can also be collapsed:
[7]:
frame = XAxis() + BED(example_bed12, display='collapsed')
frame.plot("chr9:5000000-5500000")
[7]:
data:image/s3,"s3://crabby-images/c25d0/c25d0ea655743623a9b98b209986ca65f3fc59bd" alt="../_images/_gallery_dna_features_14_0.png"
This can be used to visualize chromatin states (ChromStates):
[8]:
frame = XAxis() + BED(f"{test_data_dir}/bed_chr9_4000000_6000000_chromstates.bed", display='collapsed') + TrackHeight(0.6)
frame.plot("chr9:5000000-5500000")
[8]:
data:image/s3,"s3://crabby-images/a44cb/a44cb91bdfc1a9ada66471c0dcc4f34c8229c739" alt="../_images/_gallery_dna_features_16_0.png"
GTF file
By default, the GTF track plots only genes, using dna-feature-viewer.
[9]:
example_gtf = f"{test_data_dir}/gtf_chr9_4000000_6000000.gtf"
frame = XAxis() + GTF(example_gtf) + Title("Genes")
frame.plot(test_interval)
[9]:
data:image/s3,"s3://crabby-images/312ef/312ef42113b933b41f3dfff35d9a833cbb32c993" alt="../_images/_gallery_dna_features_19_0.png"
To show transcripts and exons, filter the rows by feature type. The 'name_attr' parameter controls which attribute is used as the label:
[10]:
frame = XAxis()
frame += GTF(example_gtf, row_filter="feature == 'transcript'", name_attr='transcript_id') + TrackHeight(9) + Title("Transcripts")
frame += GTF(example_gtf, row_filter="feature == 'exon'", name_attr='exon_number') + TrackHeight(6) + Title("Exons")
frame.plot("chr9:5780000-6000000")
[10]:
data:image/s3,"s3://crabby-images/1c290/1c290dea9e2d6182d8c92a937a1109312add63a2" alt="../_images/_gallery_dna_features_21_0.png"
Here, an attribute is one of the key-value pairs stored in the attribute column of the GTF file:
[11]:
gtf = GTF(example_gtf)
df = gtf.fetch_data(GenomeRange(test_interval))
df.head(2)
[11]:
| | seqname | source | feature | start | end | score | strand | frame | attribute | feature_name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chr9 | protein_coding | gene | 4490444 | 4587469 | . | + | . | gene_id "ENSG00000106688"; gene_name "SLC1A1";... | SLC1A1 |
| 1 | chr9 | protein_coding | transcript | 4490444 | 4587469 | . | + | . | gene_id "ENSG00000106688"; transcript_id "ENST... | SLC1A1 |
[12]:
df.loc[1, 'attribute']
[12]:
'gene_id "ENSG00000106688"; transcript_id "ENST00000262352"; gene_name "SLC1A1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "SLC1A1-001"; transcript_source "ensembl_havana"; tag "CCDS"; ccds_id "CCDS6452";'
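The attribute string above is a sequence of `key "value";` pairs, which is why a name such as transcript_id can be passed to name_attr. As a rough sketch (the helper `parse_gtf_attributes` is hypothetical and not a coolbox function), the string can be split into a dictionary with a regular expression. The sketch assumes every value is double-quoted, as in this example, and keeps only the last value when a key is repeated.

```python
import re


def parse_gtf_attributes(attribute: str) -> dict:
    """Split a GTF attribute string into a {key: value} dictionary.

    Assumes values are double-quoted; unquoted values are ignored and
    repeated keys keep only their last value.
    """
    pairs = re.findall(r'(\S+)\s+"([^"]*)"\s*;', attribute)
    return dict(pairs)


# The attribute string shown in the output above.
attribute = (
    'gene_id "ENSG00000106688"; transcript_id "ENST00000262352"; '
    'gene_name "SLC1A1"; gene_source "ensembl_havana"; '
    'gene_biotype "protein_coding"; transcript_name "SLC1A1-001"; '
    'transcript_source "ensembl_havana"; tag "CCDS"; ccds_id "CCDS6452";'
)
attrs = parse_gtf_attributes(attribute)
print(attrs["transcript_id"], attrs["gene_name"])
```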
The row_filter parameter can also filter rows by other columns, for example to show only the transcripts of the JAK2 gene:
[13]:
frame = XAxis()
frame += GTF(example_gtf, row_filter="feature == 'transcript';feature_name == 'JAK2'")
frame.plot(test_interval)
[13]:
data:image/s3,"s3://crabby-images/e4937/e4937cb0b719227d386f3be5a1f774255e023979" alt="../_images/_gallery_dna_features_26_0.png"
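The filter string in the cell above joins two conditions with a semicolon. The sketch below mimics that behaviour on plain dictionaries, under the assumption that the semicolon combines conditions with a logical AND and that each condition has the form `column == 'value'`. It is an illustration of the idea only: the helper names are hypothetical, the rows are made up, and coolbox's own implementation may evaluate the expression differently.

```python
import re
from typing import Dict, List, Tuple

# One condition of the form: column == 'value'
CONDITION = re.compile(r"^\s*(\w+)\s*==\s*'([^']*)'\s*$")


def parse_row_filter(expr: str) -> List[Tuple[str, str]]:
    """Parse "a == 'x';b == 'y'" into [("a", "x"), ("b", "y")]."""
    conditions = []
    for part in expr.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        match = CONDITION.match(part)
        if match is None:
            raise ValueError(f"unsupported condition: {part!r}")
        conditions.append((match.group(1), match.group(2)))
    return conditions


def filter_rows(rows: List[Dict[str, str]], expr: str) -> List[Dict[str, str]]:
    """Keep the rows that satisfy every condition (assumed AND semantics)."""
    conditions = parse_row_filter(expr)
    return [row for row in rows
            if all(row.get(col) == value for col, value in conditions)]


# Made-up rows with the two columns used in the filter above.
rows = [
    {"feature": "gene", "feature_name": "JAK2"},
    {"feature": "transcript", "feature_name": "JAK2"},
    {"feature": "transcript", "feature_name": "SLC1A1"},
]
selected = filter_rows(rows, "feature == 'transcript';feature_name == 'JAK2'")
print(selected)
```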