Genomic level annotation

Annotation at the genomic level is handled by the ganno subcommand of TransVar.

Short genomic regions

To annotate a short genomic region in a gene,

$ transvar ganno --ccds -i 'chr3:g.178936091_178936192'

outputs

chr3:g.178936091_178936192   CCDS43171.1 (protein_coding)    PIK3CA  +
   chr3:g.178936091_178936192/c.1633_1664+70/p.E545_R555     from_[cds_in_exon_9]_to_[intron_between_exon_9_and_10]
   C2=donor_splice_site_on_exon_9_at_chr3:178936123_included;start_codon=178936091-178936092-178936093;end_codon=178936121-178936122-178936984;source=CCDS

The result indicates that the start position lies in the coding region while the end position lies in an intron (c.1633_1664+70). Note that no consequence label (CSQN tag) is reported when annotating a region rather than a variant.

For intergenic sites, TransVar also reports the identity of, and the distance to, the nearest upstream and downstream genes. For example, chr6:116991832 is simply annotated as intergenic in the original annotation, whereas TransVar reveals that it lies 1,875 bp downstream of ZUFSP and 10,518 bp upstream of KPNA5, i.e., much closer to ZUFSP. There is no limit on the reported distance. If a site lies at the end of a chromosome, TransVar reports the distance to the telomere instead.
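The flanking-gene report can be pictured as a nearest-neighbour search over gene intervals. The snippet below is a minimal illustration, not TransVar's implementation: the Gene type, the toy coordinates, the distance convention (position minus gene end, or gene start minus position) and the telomere fallback are all assumptions. It also reports left/right in genomic coordinates instead of strand-aware upstream/downstream, and it assumes the position is intergenic.

```python
from typing import NamedTuple, Sequence


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def flanking_genes(pos: int, genes: Sequence[Gene], chrom_length: int):
    """Nearest gene on each side of an intergenic position.

    Returns ((left_name, left_distance), (right_name, right_distance)); if no gene
    exists on one side, the chromosome end ("telomere") is reported instead.
    """
    left = max((g for g in genes if g.end < pos), key=lambda g: g.end, default=None)
    right = min((g for g in genes if g.start > pos), key=lambda g: g.start, default=None)
    left_hit = (left.name, pos - left.end) if left else ("5'-telomere", pos - 1)
    right_hit = (right.name, right.start - pos) if right else ("3'-telomere", chrom_length - pos)
    return left_hit, right_hit


# Hypothetical genes on a hypothetical 10,000 bp chromosome
genes = [Gene("GENE_A", 100, 500), Gene("GENE_B", 2000, 3000)]
print(flanking_genes(600, genes, 10_000))   # (('GENE_A', 100), ('GENE_B', 1400))
print(flanking_genes(3500, genes, 10_000))  # (('GENE_B', 500), ("3'-telomere", 6500))
```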

Long genomic regions

$ transvar ganno -i 'chr19:g.41978629_41983350' --ensembl --refversion mm10
chr19:g.41978629_41983350    ENSMUST00000167927 (nonsense_mediated_decay),ENSMUST00000026170 (protein_coding)        MMS19,UBTD1     -,+
   chr19:g.41978629_41983350/./.     from_[intron_between_exon_1_and_2;MMS19]_to_[intron_between_exon_1_and_2;UBTD1]
   .
chr19:g.41978629_41983350    ENSMUST00000171561 (protein_coding),ENSMUST00000026170 (protein_coding) MMS19,UBTD1     -,+
   chr19:g.41978629_41983350/./.     from_[intron_between_exon_1_and_2;MMS19]_to_[intron_between_exon_1_and_2;UBTD1]
   .
chr19:g.41978629_41983350    ENSMUST00000163398 (nonsense_mediated_decay),ENSMUST00000026170 (protein_coding)        MMS19,UBTD1     -,+
   chr19:g.41978629_41983350/./.     from_[intron_between_exon_1_and_2;MMS19]_to_[intron_between_exon_1_and_2;UBTD1]
   .
chr19:g.41978629_41983350    ENSMUST00000164776 (nonsense_mediated_decay),ENSMUST00000026170 (protein_coding)        MMS19,UBTD1     -,+
   chr19:g.41978629_41983350/./.     from_[intron_between_exon_1_and_2;MMS19]_to_[intron_between_exon_1_and_2;UBTD1]
   .
chr19:g.41978629_41983350    ENSMUST00000026168 (protein_coding),ENSMUST00000026170 (protein_coding) MMS19,UBTD1     -,+
   chr19:g.41978629_41983350/./.     from_[intron_between_exon_1_and_2;MMS19]_to_[intron_between_exon_1_and_2;UBTD1]
   .
chr19:g.41978629_41983350    ENSMUST00000171755 (retained_intron),ENSMUST00000026170 (protein_coding)        MMS19,UBTD1     -,+
   chr19:g.41978629_41983350/./.     from_[intron_between_exon_1_and_2;MMS19]_to_[intron_between_exon_1_and_2;UBTD1]
   .
chr19:g.41978629_41983350    ENSMUST00000169775 (nonsense_mediated_decay),ENSMUST00000026170 (protein_coding)        MMS19,UBTD1     -,+
   chr19:g.41978629_41983350/./.     from_[intron_between_exon_1_and_2;MMS19]_to_[intron_between_exon_1_and_2;UBTD1]
   .
chr19:g.41978629_41983350    ENSMUST00000168484 (nonsense_mediated_decay),ENSMUST00000026170 (protein_coding)        MMS19,UBTD1     -,+
   chr19:g.41978629_41983350/./.     from_[intron_between_exon_1_and_2;MMS19]_to_[intron_between_exon_1_and_2;UBTD1]
   .

The result indicates a 4,722 bp region spanning the promoters of two closely located, oppositely oriented genes, MMS19 and UBTD1. The start and end points lie in the first introns of the two genes.
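Because both endpoints of an HGVS g. range are included, the length of a region is end - start + 1. A small helper (hypothetical, for illustration only) that parses the simple chrom:g.start_end form used in these examples:

```python
import re


def region_length(hgvs_region: str) -> int:
    """Length of an inclusive genomic range written as '<chrom>:g.<start>_<end>'."""
    m = re.fullmatch(r"(?:chr)?(\w+):g\.(\d+)_(\d+)", hgvs_region)
    if m is None:
        raise ValueError(f"not a simple g. range: {hgvs_region}")
    start, end = int(m.group(2)), int(m.group(3))
    return end - start + 1  # both endpoints belong to the region


print(region_length("chr19:g.41978629_41983350"))  # 4722
```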

$ transvar ganno -i '9:g.133750356_137990357' --ccds

outputs

9:g.133750356_137990357      CCDS35165.1 (protein_coding),CCDS6986.1 (protein_coding)        ABL1,OLFM1      +,+
   chr9:g.133750356_137990357/./.    from_[cds_in_exon_7;ABL1]_to_[intron_between_exon_4_and_5;OLFM1]_spanning_[51_genes]
   .
9:g.133750356_137990357      CCDS35166.1 (protein_coding),CCDS6986.1 (protein_coding)        ABL1,OLFM1      +,+
   chr9:g.133750356_137990357/./.    from_[cds_in_exon_7;ABL1]_to_[intron_between_exon_4_and_5;OLFM1]_spanning_[51_genes]
   .

The result indicates that the region spans 53 genes: ABL1 and OLFM1 at the two boundaries plus the 51 genes in between. The region begins in the coding sequence of ABL1 (c.1187A) and ends in an intron of OLFM1 (c.622+6C). The two ABL1 transcripts that can be used to annotate the start position are reported on two separate lines, each line corresponding to one combination of transcripts. This annotation shows not only the extent of the region but also the fine structure at its boundaries.

In the next example, the end position exceeds the length of the chromosome. TransVar truncates the region at the chromosome end and reports upstream and downstream information for the end position.

$ transvar ganno -i '9:g.133750356_1337503570' --ccds

outputs

9:g.133750356_1337503570     CCDS35165.1 (protein_coding),   ABL1,   +
   chr9:g.133750356_141213431/./.    from_[cds_in_exon_7;ABL1]_to_[intergenic_between_EHMT1(484,026_bp_downstream)_and_3'-telomere(0_bp)]_spanning_[136_genes]
   .
9:g.133750356_1337503570     CCDS35166.1 (protein_coding),   ABL1,   +
   chr9:g.133750356_141213431/./.    from_[cds_in_exon_7;ABL1]_to_[intergenic_between_EHMT1(484,026_bp_downstream)_and_3'-telomere(0_bp)]_spanning_[136_genes]
   .
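Truncating at the chromosome end amounts to clamping the end coordinate to the chromosome length. A minimal sketch, assuming 1-based inclusive coordinates; clamp_region is a hypothetical helper and the chromosome length is passed in explicitly here rather than read from a reference.

```python
from typing import Tuple


def clamp_region(start: int, end: int, chrom_length: int) -> Tuple[int, int]:
    """Truncate a 1-based, inclusive region so it does not run past the chromosome end."""
    if start > chrom_length:
        raise ValueError("region starts beyond the end of the chromosome")
    return start, min(end, chrom_length)


# 141213431 is the chr9 end coordinate that appears in the truncated output above.
print(clamp_region(133750356, 1337503570, 141213431))  # (133750356, 141213431)
```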

Genomic variant

Single nucleotide variation (SNV)

A forward annotation of an SNV:

$ transvar ganno --ccds -i 'chr3:g.178936091G>A'

outputs

chr3:g.178936091G>A  CCDS43171.1 (protein_coding)    PIK3CA  +
   chr3:g.178936091G>A/c.1633G>A/p.E545K     inside_[cds_in_exon_9]
   CSQN=Missense;dbsnp=rs104886003(chr3:178936091G>A);codon_pos=178936091-178936092-178936093;ref_codon_seq=GAG;source=CCDS
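The protein change follows from the cDNA position: c.p lies in codon (p - 1) // 3 + 1, at position (p - 1) % 3 + 1 within that codon. The sketch below applies this to c.1633G>A, using the reference codon GAG reported in ref_codon_seq and the standard genetic code; codon_of and substitute are hypothetical helpers, not part of TransVar.

```python
# Standard genetic code; bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_of(c_pos: int) -> tuple:
    """Map a 1-based CDS position to (codon number, 1-based position in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, alt: str) -> str:
    """Replace one base of a codon (pos_in_codon is 1, 2 or 3)."""
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]


codon_no, pos_in_codon = codon_of(1633)               # (545, 1)
ref_codon = "GAG"                                     # ref_codon_seq in the output
alt_codon = substitute(ref_codon, pos_in_codon, "A")  # "AAG"
print(f"p.{CODON_TABLE[ref_codon]}{codon_no}{CODON_TABLE[alt_codon]}")  # p.E545K
```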

Another example:

$ transvar ganno -i "chr9:g.135782704C>G" --ccds

outputs

chr9:g.135782704C>G  CCDS55350.1 (protein_coding)    TSC1    -
   chr9:g.135782704C>G/c.1164G>C/p.L388L     inside_[cds_in_exon_10]
   CSQN=Synonymous;dbsnp=rs770692313(chr9:135782704C>G);codon_pos=135782704-135782705-135782706;ref_codon_seq=CTG;source=CCDS
chr9:g.135782704C>G  CCDS6956.1 (protein_coding)     TSC1    -
   chr9:g.135782704C>G/c.1317G>C/p.L439L     inside_[cds_in_exon_11]
   CSQN=Synonymous;dbsnp=rs770692313(chr9:135782704C>G);codon_pos=135782704-135782705-135782706;ref_codon_seq=CTG;source=CCDS
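TSC1 lies on the reverse strand, so the genomic alleles C>G appear complemented (G>C) in the c. notation. A minimal sketch of that strand conversion (the helper name is hypothetical); for multi-base alleles the sequence is also reversed.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def to_transcript_strand(ref: str, alt: str, strand: str) -> tuple:
    """Express genomic (forward-strand) alleles on the transcript strand."""
    if strand == "+":
        return ref, alt
    # Reverse complement; the reversal only matters for multi-base alleles.
    return ref.translate(COMPLEMENT)[::-1], alt.translate(COMPLEMENT)[::-1]


print(to_transcript_strand("C", "G", "-"))  # ('G', 'C'), as in c.1317G>C
```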

and a nonsense mutation:

$ transvar ganno -i 'chr1:g.115256530G>A' --ensembl

outputs

chr1:g.115256530G>A  ENST00000369535 (protein_coding)        NRAS    -
   chr1:g.115256530G>A/c.181C>T/p.Q61*       inside_[cds_in_exon_3]
   CSQN=Nonsense;codon_pos=115256528-115256529-115256530;ref_codon_seq=CAA;aliases=ENSP00000358548;source=Ensembl

The CSQN field indicates a nonsense mutation.
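For a single-codon change, the Missense, Synonymous and Nonsense labels depend only on the reference and alternative amino acids; here CAA (Q) becomes TAA, a stop codon, hence Nonsense. The function below is a simplified, hypothetical sketch of that decision, not TransVar's full logic, which reports start-codon, stop-codon and splice-site changes under their own labels (e.g. CdsStartSNV, CdsStopSNV, SpliceDonorSNV).

```python
def classify_snv(ref_aa: str, alt_aa: str) -> str:
    """Simplified consequence label for a change within one codon ('*' = stop)."""
    if ref_aa == "*":
        raise ValueError("stop-codon changes are labelled separately (e.g. CdsStopSNV)")
    if ref_aa == alt_aa:
        return "Synonymous"
    if alt_aa == "*":
        return "Nonsense"
    return "Missense"


print(classify_snv("Q", "*"))  # Nonsense   (NRAS p.Q61*)
print(classify_snv("L", "L"))  # Synonymous (TSC1 p.L439L)
print(classify_snv("E", "K"))  # Missense   (PIK3CA p.E545K)
```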

Deletions

A frameshift deletion

$ transvar ganno -i "chr2:g.234183368_234183380del" --ccds

outputs

chr2:g.234183368_234183380del        CCDS2502.2 (protein_coding)     ATG16L1 +
   chr2:g.234183368_234183380del13/c.841_853del13/p.T281Lfs*5        inside_[cds_in_exon_8]
   CSQN=Frameshift;left_align_gDNA=g.234183367_234183379del13;unaligned_gDNA=g.234183368_234183380del13;left_align_cDNA=c.840_852del13;unalign_cDNA=c.841_853del13;source=CCDS
chr2:g.234183368_234183380del        CCDS2503.2 (protein_coding)     ATG16L1 +
   chr2:g.234183368_234183380del13/c.898_910del13/p.T300Lfs*5        inside_[cds_in_exon_9]
   CSQN=Frameshift;left_align_gDNA=g.234183367_234183379del13;unaligned_gDNA=g.234183368_234183380del13;left_align_cDNA=c.897_909del13;unalign_cDNA=c.898_910del13;source=CCDS
chr2:g.234183368_234183380del        CCDS54438.1 (protein_coding)    ATG16L1 +
   chr2:g.234183368_234183380del13/c.409_421del13/p.T137Lfs*5        inside_[cds_in_exon_5]
   CSQN=Frameshift;left_align_gDNA=g.234183367_234183379del13;unaligned_gDNA=g.234183368_234183380del13;left_align_cDNA=c.408_420del13;unalign_cDNA=c.409_421del13;source=CCDS

Note the difference between the left-aligned and the right-aligned identifiers.
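Within a repeat, the same deletion can be placed at several equivalent positions. HGVS recommends the most 3' position, while VCF-style normalization shifts variants to the left; for a reverse-strand gene, the transcript's 3'-most position is the genomic leftmost one. The sketch below shifts a deletion in a toy sequence both ways using 0-based, end-exclusive coordinates (seq[start:end], i.e. HGVS g.(start+1)_end); the function names are hypothetical.

```python
def left_align_deletion(seq: str, start: int, end: int) -> tuple:
    """Shift the deletion of seq[start:end] leftwards as long as the
    sequence remaining after the deletion stays the same."""
    while start > 0 and seq[start - 1] == seq[end - 1]:
        start, end = start - 1, end - 1
    return start, end


def right_align_deletion(seq: str, start: int, end: int) -> tuple:
    """Same idea, shifting rightwards (the HGVS 3' rule on the forward strand)."""
    while end < len(seq) and seq[start] == seq[end]:
        start, end = start + 1, end + 1
    return start, end


ref = "ACAGCAGT"  # toy sequence with a CAG repeat
print(left_align_deletion(ref, 4, 7))   # (1, 4)
print(right_align_deletion(ref, 1, 4))  # (4, 7)
# Both placements leave the same sequence behind:
assert ref[:1] + ref[4:] == ref[:4] + ref[7:]
```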

An in-frame deletion

$ transvar ganno -i "chr2:g.234183368_234183379del" --ccds

outputs

chr2:g.234183368_234183379del        CCDS2502.2 (protein_coding)     ATG16L1 +
   chr2:g.234183368_234183379del12/c.841_852del12/p.T281_G284delTHPG inside_[cds_in_exon_8]
   CSQN=InFrameDeletion;left_align_gDNA=g.234183367_234183378del12;unaligned_gDNA=g.234183368_234183379del12;left_align_cDNA=c.840_851del12;unalign_cDNA=c.841_852del12;left_align_protein=p.T281_G284delTHPG;unalign_protein=p.T281_G284delTHPG;source=CCDS
chr2:g.234183368_234183379del        CCDS2503.2 (protein_coding)     ATG16L1 +
   chr2:g.234183368_234183379del12/c.898_909del12/p.T300_G303delTHPG inside_[cds_in_exon_9]
   CSQN=InFrameDeletion;left_align_gDNA=g.234183367_234183378del12;unaligned_gDNA=g.234183368_234183379del12;left_align_cDNA=c.897_908del12;unalign_cDNA=c.898_909del12;left_align_protein=p.T300_G303delTHPG;unalign_protein=p.T300_G303delTHPG;source=CCDS
chr2:g.234183368_234183379del        CCDS54438.1 (protein_coding)    ATG16L1 +
   chr2:g.234183368_234183379del12/c.409_420del12/p.T137_G140delTHPG inside_[cds_in_exon_5]
   CSQN=InFrameDeletion;left_align_gDNA=g.234183367_234183378del12;unaligned_gDNA=g.234183368_234183379del12;left_align_cDNA=c.408_419del12;unalign_cDNA=c.409_420del12;left_align_protein=p.T137_G140delTHPG;unalign_protein=p.T137_G140delTHPG;source=CCDS

Another example

$ transvar ganno --ccds -i 'chr12:g.53703425_53703427del'

outputs

chr12:g.53703425_53703427del CCDS53797.1 (protein_coding)    AAAS    -
   chr12:g.53703427_53703429delCCC/c.670_672delGGG/p.G224delG        inside_[cds_in_exon_7]
   CSQN=InFrameDeletion;left_align_gDNA=g.53703424_53703426delCCC;unaligned_gDNA=g.53703425_53703427delCCC;left_align_cDNA=c.667_669delGGG;unalign_cDNA=c.669_671delGGG;left_align_protein=p.G223delG;unalign_protein=p.G223delG;source=CCDS
chr12:g.53703425_53703427del CCDS8856.1 (protein_coding)     AAAS    -
   chr12:g.53703427_53703429delCCC/c.769_771delGGG/p.G257delG        inside_[cds_in_exon_8]
   CSQN=InFrameDeletion;left_align_gDNA=g.53703424_53703426delCCC;unaligned_gDNA=g.53703425_53703427delCCC;left_align_cDNA=c.766_768delGGG;unalign_cDNA=c.768_770delGGG;left_align_protein=p.G256delG;unalign_protein=p.G256delG;source=CCDS

Note the difference between the left- and right-aligned identifiers at both the cDNA and protein levels.

An in-frame out-of-phase deletion

$ transvar ganno -i "chr2:g.234183372_234183383del" --ccds

outputs

chr2:g.234183372_234183383del        CCDS2502.2 (protein_coding)     ATG16L1 +
   chr2:g.234183372_234183383del12/c.845_856del12/p.H282_G286delinsR inside_[cds_in_exon_8]
   CSQN=MultiAAMissense;left_align_gDNA=g.234183372_234183383del12;unaligned_gDNA=g.234183372_234183383del12;left_align_cDNA=c.845_856del12;unalign_cDNA=c.845_856del12;source=CCDS
chr2:g.234183372_234183383del        CCDS2503.2 (protein_coding)     ATG16L1 +
   chr2:g.234183372_234183383del12/c.902_913del12/p.H301_G305delinsR inside_[cds_in_exon_9]
   CSQN=MultiAAMissense;left_align_gDNA=g.234183372_234183383del12;unaligned_gDNA=g.234183372_234183383del12;left_align_cDNA=c.902_913del12;unalign_cDNA=c.902_913del12;source=CCDS
chr2:g.234183372_234183383del        CCDS54438.1 (protein_coding)    ATG16L1 +
   chr2:g.234183372_234183383del12/c.413_424del12/p.H138_G142delinsR inside_[cds_in_exon_5]
   CSQN=MultiAAMissense;left_align_gDNA=g.234183372_234183383del12;unaligned_gDNA=g.234183372_234183383del12;left_align_cDNA=c.413_424del12;unalign_cDNA=c.413_424del12;source=CCDS
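Whether a deletion preserves the reading frame depends only on its length modulo 3; whether it is in phase depends on where it starts within a codon. A deletion that starts mid-codon, such as c.845_856del12 (second base of codon 282), alters the codons at both ends, which is why it can surface as a delins at the protein level. A minimal sketch of the two checks (hypothetical helper, CDS deletions only):

```python
def deletion_frame(c_start: int, length: int) -> str:
    """Describe a CDS deletion starting at 1-based c. position c_start."""
    if length % 3:
        return "frameshift"
    phase = (c_start - 1) % 3  # 0 means the deletion starts at a codon's first base
    return "in-frame, in-phase" if phase == 0 else "in-frame, out-of-phase"


print(deletion_frame(841, 13))  # frameshift             (c.841_853del13)
print(deletion_frame(841, 12))  # in-frame, in-phase     (c.841_852del12)
print(deletion_frame(845, 12))  # in-frame, out-of-phase (c.845_856del12)
```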

Insertions

An in-frame insertion of three nucleotides

$ transvar ganno -i 'chr2:g.69741762_69741763insTGC' --ccds

outputs

chr2:g.69741762_69741763insTGC       CCDS1893.2 (protein_coding)     AAK1    -
   chr2:g.69741780_69741782dupCTG/c.1614_1616dupGCA/p.Q546dupQ       inside_[cds_in_exon_12]
   CSQN=InFrameInsertion;left_align_gDNA=g.69741762_69741763insTGC;unalign_gDNA=g.69741762_69741763insTGC;left_align_cDNA=c.1596_1597insCAG;unalign_cDNA=c.1614_1616dupGCA;left_align_protein=p.Y532_Q533insQ;unalign_protein=p.Q539dupQ;phase=2;source=CCDS

Note that the protein-level insertion of Q is properly right-aligned. The left-aligned identifier is also given in the left_align_protein field.
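HGVS describes an insertion that copies the sequence immediately 5' of it as a duplication, after shifting it as far 3' as possible. The sketch below does this on toy sequences with a 0-based gap index (the insertion goes before seq[gap]); the second example is a single-base insertion in a homopolymer. Function names are hypothetical.

```python
def right_align_insertion(seq: str, gap: int, ins: str) -> tuple:
    """Shift an insertion of `ins` before seq[gap] to the right, rotating the
    inserted bases, while the resulting sequence stays unchanged."""
    while gap < len(seq) and seq[gap] == ins[0]:
        ins = ins[1:] + ins[0]
        gap += 1
    return gap, ins


def is_duplication(seq: str, gap: int, ins: str) -> bool:
    """True if the inserted bases copy the reference bases right before the gap."""
    return gap >= len(ins) and seq[gap - len(ins):gap] == ins


ref = "GCATCAGT"
gap, ins = right_align_insertion(ref, 1, "CAG")
print(gap, ins, is_duplication(ref, gap, ins))  # 3 GCA True

homopolymer = "CTTTG"
gap, ins = right_align_insertion(homopolymer, 1, "T")
print(gap, ins, is_duplication(homopolymer, gap, ins))  # 4 T True
```

With 1-based coordinates, a duplication found this way covers g.(gap - len(ins) + 1)_gap.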

A frameshift insertion of two nucleotides

$ transvar ganno -i 'chr7:g.121753754_121753755insCA' --ccds

outputs

chr7:g.121753754_121753755insCA      CCDS5783.1 (protein_coding)     AASS    -
   chr7:g.121753754_121753755insCA/c.1064_1065insGT/p.I355Mfs*10     inside_[cds_in_exon_9]
   CSQN=Frameshift;left_align_gDNA=g.121753753_121753754insAC;unalign_gDNA=g.121753754_121753755insCA;left_align_cDNA=c.1063_1064insTG;unalign_cDNA=c.1063_1064insTG;source=CCDS

An in-frame insertion of six nucleotides

$ transvar ganno -i 'chr17:g.79093270_79093271insGGGCGT' --ccds

outputs

chr17:g.79093270_79093271insGGGCGT   CCDS45807.1 (protein_coding)    AATK    -
   chr17:g.79093282_79093287dupTGGGCG/c.3988_3993dupACGCCC/p.T1330_P1331dupTP        inside_[cds_in_exon_13]
   CSQN=InFrameInsertion;left_align_gDNA=g.79093270_79093271insGGGCGT;unalign_gDNA=g.79093270_79093271insGGGCGT;left_align_cDNA=c.3976_3977insCGCCCA;unalign_cDNA=c.3988_3993dupACGCCC;left_align_protein=p.A1326_P1327insPT;unalign_protein=p.T1330_P1331dupTP;phase=0;source=CCDS

Notice the difference in the inserted sequence when left-alignment and right-alignment conventions are followed.

A frameshift insertion of one nucleotide in a homopolymer

$ transvar ganno -i 'chr7:g.117230474_117230475insA' --ccds

outputs

chr7:g.117230474_117230475insA       CCDS5773.1 (protein_coding)     CFTR    +
   chr7:g.117230479dupA/c.1752dupA/p.E585Rfs*4       inside_[cds_in_exon_13]
   CSQN=Frameshift;left_align_gDNA=g.117230474_117230475insA;unalign_gDNA=g.117230474_117230475insA;left_align_cDNA=c.1747_1748insA;unalign_cDNA=c.1747_1748insA;source=CCDS

Note that the cDNA-level insertion is right-aligned (and therefore reported as a duplication, c.1752dupA), while the left-aligned form is reported as additional information.

An in-frame, in-phase insertion

$ transvar ganno -i 'chr12:g.109702119_109702120insACC' --ccds
chr12:g.109702119_109702120insACC    CCDS31898.1 (protein_coding)    ACACB   +
   chr12:g.109702119_109702120insACC/c.6870_6871insACC/p.Y2290_H2291insT     inside_[cds_in_exon_49]
   CSQN=InFrameInsertion;left_align_gDNA=g.109702118_109702119insCAC;unalign_gDNA=g.109702119_109702120insACC;left_align_cDNA=c.6869_6870insCAC;unalign_cDNA=c.6870_6871insACC;left_align_protein=p.Y2290_H2291insT;unalign_protein=p.Y2290_H2291insT;phase=0;source=CCDS

Block substitutions

A block-substitution that results in a frameshift.

$ transvar ganno -i 'chr10:g.27329002_27329002delinsAT' --ccds
chr10:g.27329002_27329002delinsAT    CCDS41499.1 (protein_coding)    ANKRD26 -
   chr10:g.27329009dupT/c.2266dupA/p.M756Nfs*6       inside_[cds_in_exon_21]
   CSQN=Frameshift;left_align_gDNA=g.27329002_27329003insT;unalign_gDNA=g.27329002_27329003insT;left_align_cDNA=c.2259_2260insA;unalign_cDNA=c.2266dupA;source=CCDS

A block substitution that is in-frame:

$ transvar ganno -i 'chr10:g.52595929_52595930delinsAA' --ccds
chr10:g.52595929_52595930delinsAA    CCDS7243.1 (protein_coding)     A1CF    -
   chr10:g.52595929_52595930delinsAA/c.532_533delinsTT/p.P178L       inside_[cds_in_exon_4]
   CSQN=Missense;codon_cDNA=532-533-534;source=CCDS
chr10:g.52595929_52595930delinsAA    CCDS7241.1 (protein_coding)     A1CF    -
   chr10:g.52595929_52595930delinsAA/c.508_509delinsTT/p.P170L       inside_[cds_in_exon_4]
   CSQN=Missense;codon_cDNA=508-509-510;source=CCDS
chr10:g.52595929_52595930delinsAA    CCDS7242.1 (protein_coding)     A1CF    -
   chr10:g.52595929_52595930delinsAA/c.508_509delinsTT/p.P170L       inside_[cds_in_exon_4]
   CSQN=Missense;codon_cDNA=508-509-510;source=CCDS
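A block substitution (delins) keeps the reading frame only if the net length change, inserted minus deleted bases, is a multiple of 3. The hypothetical helper below applies that rule to the two examples above. It is a simplified sketch that parses only the g.start_end delins form and ignores the strand, since complementing does not change lengths; it also says nothing about the amino-acid change itself.

```python
import re


def delins_effect(hgvs: str) -> str:
    """'in-frame' or 'frameshift' for a genomic delins such as chr10:g.1_2delinsAA."""
    m = re.search(r"g\.(\d+)(?:_(\d+))?delins([ACGT]+)$", hgvs)
    if m is None:
        raise ValueError(f"not a g. delins: {hgvs}")
    start = int(m.group(1))
    end = int(m.group(2) or m.group(1))
    net = len(m.group(3)) - (end - start + 1)  # inserted minus deleted bases
    return "in-frame" if net % 3 == 0 else "frameshift"


print(delins_effect("chr10:g.27329002_27329002delinsAT"))  # frameshift (1 bp -> 2 bp)
print(delins_effect("chr10:g.52595929_52595930delinsAA"))  # in-frame   (2 bp -> 2 bp)
```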

Promoter region

One can define the promoter boundaries through the --prombeg and --promend options. By default, the promoter region extends from 1000 bp upstream of the transcription start site to the transcription start site itself. One could customize this setting to, e.g., [-2000 bp, +1000 bp] relative to the transcription start site by

$ transvar ganno -i 'chr19:g.41978629_41980350' --ensembl --prombeg 2000 --promend 1000 --refversion mm10
chr19:g.41978629_41980350    ENSMUST00000167927 (nonsense_mediated_decay)    MMS19   -
   chr19:g.41978629_41980350/c.115+649_115+2370/.    inside_[intron_between_exon_1_and_2]
   promoter_region_of_[MMS19]_overlaping_237_bp(13.76%);aliases=ENSMUSP00000132483;source=Ensembl
chr19:g.41978629_41980350    ENSMUST00000171561 (protein_coding)     MMS19   -
   chr19:g.41978629_41980350/c.115+649_115+2370/.    inside_[intron_between_exon_1_and_2]
   promoter_region_of_[MMS19]_overlaping_194_bp(11.27%);aliases=ENSMUSP00000130900;source=Ensembl
chr19:g.41978629_41980350    ENSMUST00000163398 (nonsense_mediated_decay)    MMS19   -
   chr19:g.41978629_41980350/c.115+649_115+2370/.    inside_[intron_between_exon_1_and_2]
   promoter_region_of_[MMS19]_overlaping_234_bp(13.59%);aliases=ENSMUSP00000126864;source=Ensembl
chr19:g.41978629_41980350    ENSMUST00000164776 (nonsense_mediated_decay)    MMS19   -
   chr19:g.41978629_41980350/c.115+649_115+2370/.    inside_[intron_between_exon_1_and_2]
   promoter_region_of_[MMS19]_overlaping_215_bp(12.49%);aliases=ENSMUSP00000129478;source=Ensembl
chr19:g.41978629_41980350    ENSMUST00000026168 (protein_coding)     MMS19   -
   chr19:g.41978629_41980350/c.115+649_115+2370/.    inside_[intron_between_exon_1_and_2]
   promoter_region_of_[MMS19]_overlaping_219_bp(12.72%);aliases=ENSMUSP00000026168;source=Ensembl
chr19:g.41978629_41980350    ENSMUST00000171755 (retained_intron)    MMS19   -
   chr19:g.41978629_41980350/c.141+649_141+2370/.    inside_[intron_between_exon_1_and_2]
   promoter_region_of_[MMS19]_overlaping_212_bp(12.31%);source=Ensembl
chr19:g.41978629_41980350    ENSMUST00000169775 (nonsense_mediated_decay)    MMS19   -
   chr19:g.41978629_41980350/c.115+649_115+2370/.    inside_[intron_between_exon_1_and_2]
   promoter_region_of_[MMS19]_overlaping_214_bp(12.43%);aliases=ENSMUSP00000128234;source=Ensembl
chr19:g.41978629_41980350    ENSMUST00000168484 (nonsense_mediated_decay)    MMS19   -
   chr19:g.41978629_41980350/c.115+649_115+2370/.    inside_[intron_between_exon_1_and_2]
   promoter_region_of_[MMS19]_overlaping_221_bp(12.83%);aliases=ENSMUSP00000126881;source=Ensembl

The results show that 11-14% of the target region (194-237 bp, depending on the transcript) overlaps the promoter region of MMS19.
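The overlap figures are the number of shared bases divided by the region length (1,722 bp for g.41978629_41980350). A minimal sketch, assuming --prombeg gives the distance upstream and --promend the distance downstream of the TSS; the TSS coordinate below is hypothetical, chosen only to illustrate the arithmetic for a reverse-strand transcript, where "upstream" points to higher coordinates.

```python
from typing import Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def promoter_interval(tss: int, strand: str, upstream: int = 1000, downstream: int = 0) -> Interval:
    """Promoter window around a TSS; upstream is toward lower coordinates on '+'
    and toward higher coordinates on '-'."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


region = (41978629, 41980350)  # chr19:g.41978629_41980350 from the command above
tss = 41981114                 # hypothetical TSS of a '-' strand transcript
promoter = promoter_interval(tss, "-", upstream=2000, downstream=1000)
bp = overlap_bp(region, promoter)
fraction = 100 * bp / (region[1] - region[0] + 1)
print(f"{bp} bp ({fraction:.2f}%)")  # 237 bp (13.76%)
```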

Splice sites

Consider the ACTB splice donor site chr7:5568790_5568791. A donor site lies on the intron side by definition; since ACTB is on the reverse strand, the adjacent exon starts at chr7:5568792 and extends toward higher coordinates.

The exonic nucleotide immediately before the donor splice site:

$ transvar ganno -i 'chr7:5568792C>G' --ccds

outputs an exonic variant with a missense consequence:

chr7:5568792C>G      CCDS5341.1 (protein_coding)     ACTB    -
   chr7:g.5568792C>G/c.363G>C/p.Q121H        inside_[cds_in_exon_2]
   CSQN=Missense;C2=NextToSpliceDonorOfExon2_At_chr7:5568791;codon_pos=5568792-5568793-5568794;ref_codon_seq=CAG;source=CCDS

The first nucleotide of the canonical donor splice site (on the intron side; this is commonly regarded as the splice site location):

$ transvar ganno -i 'chr7:5568791C>G' --ccds

outputs a splice-site variant:

chr7:5568791C>G      CCDS5341.1 (protein_coding)     ACTB    -
   chr7:g.5568791C>G/c.363+1G>C/.    inside_[intron_between_exon_2_and_3]
   CSQN=SpliceDonorSNV;C2=SpliceDonorOfExon2_At_chr7:5568791;source=CCDS

The second nucleotide of the canonical donor splice site (second on the intron side; still part of the splice site):

$ transvar ganno -i 'chr7:5568790A>G' --ccds

outputs a splice-site variant:

chr7:5568790A>G      CCDS5341.1 (protein_coding)     ACTB    -
   chr7:g.5568790A>G/c.363+2T>C/.    inside_[intron_between_exon_2_and_3]
   CSQN=SpliceDonorSNV;C2=SpliceDonorOfExon2_At_chr7:5568791;source=CCDS

The nucleotide immediately downstream of the canonical donor splice site (the third intronic nucleotide; not part of the splice site):

$ transvar ganno -i 'chr7:5568789C>G' --ccds

outputs a purely intronic variant:

chr7:5568789C>G      CCDS5341.1 (protein_coding)     ACTB    -
   chr7:g.5568789C>G/c.363+3G>C/.    inside_[intron_between_exon_2_and_3]
   CSQN=IntronicSNV;source=CCDS
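The c. offsets above follow from the strand: on the reverse strand, intronic positions past the donor lie at lower genomic coordinates than the last exonic base (chr7:5568792, c.363). A simplified, hypothetical sketch that derives the offset and a label, treating only intron positions +1 and +2 as the canonical donor; it omits the exonic C2 flag that TransVar adds next to the splice site.

```python
def donor_side_annotation(g_pos: int, exon_end_g: int, exon_end_c: int, strand: str):
    """c. notation and a simplified label for an intronic position just past a
    donor boundary. exon_end_g is the genomic coordinate of the last exonic base
    in transcript order and exon_end_c its c. position."""
    offset = g_pos - exon_end_g if strand == "+" else exon_end_g - g_pos
    if offset <= 0:
        raise ValueError("position is not on the intron side of this boundary")
    # The canonical donor dinucleotide is intron positions +1 and +2.
    label = "SpliceDonorSNV" if offset <= 2 else "IntronicSNV"
    return f"c.{exon_end_c}+{offset}", label


for g in (5568791, 5568790, 5568789):
    print(g, donor_side_annotation(g, 5568792, 363, "-"))
# 5568791 ('c.363+1', 'SpliceDonorSNV')
# 5568790 ('c.363+2', 'SpliceDonorSNV')
# 5568789 ('c.363+3', 'IntronicSNV')
```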

UTR region

$ transvar ganno -i 'chr2:25564781G>T' --refseq

results in a UTR-containing CSQN field

chr2:25564781G>T     NM_022552.4 (protein_coding)    DNMT3A  -
   chr2:g.25564781G>T/c.1-27928C>A/. inside_[5-UTR;noncoding_exon_1]
   CSQN=5-UTRSNV;dbxref=GeneID:1788,HGNC:2978,HPRD:04141,MIM:602769;aliases=NP_072046;source=RefSeq
chr2:25564781G>T     NM_175629.2 (protein_coding)    DNMT3A  -
   chr2:g.25564781G>T/c.1-27928C>A/. inside_[5-UTR;intron_between_exon_1_and_2]
   CSQN=IntronicSNV;dbxref=GeneID:1788,HGNC:2978,HPRD:04141,MIM:602769;aliases=NP_783328;source=RefSeq
chr2:25564781G>T     NM_175630.1 (protein_coding)    DNMT3A  -
   chr2:g.25564781G>T/c.1-27928C>A/. inside_[5-UTR;intron_between_exon_1_and_2]
   CSQN=IntronicSNV;dbxref=GeneID:1788,HGNC:2978,HPRD:04141,MIM:602769;aliases=NP_783329;source=RefSeq

Non-coding RNA

With an Ensembl, GENCODE or RefSeq database, one can also annotate non-coding transcripts such as lncRNAs. For example,

$ transvar ganno --gencode -i 'chr1:g.3985200_3985300' --refversion mm10

results in

chr1:g.3985200_3985300       ENSMUST00000194643.1 (lincRNA)  GM37381 -
   chr1:g.3985200_3985300/c.121_221/.        inside_[noncoding_exon_2]
   source=GENCODE
chr1:g.3985200_3985300       ENSMUST00000192427.1 (lincRNA)  GM37381 -
   chr1:g.3985200_3985300/c.685_785/.        inside_[noncoding_exon_1]
   source=GENCODE

or

$ transvar ganno --refseq -i 'chr14:g.20568338_20569581' --refversion mm10

results in

chr14:g.20568338_20569581    NR_033571.1 (lncRNA)    1810062O18RIK   +
   chr14:g.20568338_20569581/c.260-1532_260-289/.    inside_[intron_between_exon_4_and_5]
   dbxref=GeneID:75602,MGI:MGI:1922852;source=RefSeq
chr14:g.20568338_20569581    NM_030180.2 (protein_coding)    USP54   -
   chr14:g.20568338_20569581/c.2188+667_2188+1910/.  inside_[intron_between_exon_15_and_16]
   dbxref=GeneID:78787,MGI:MGI:1926037;aliases=NP_084456;source=RefSeq
chr14:g.20568338_20569581    XM_006519703.3 (protein_coding) USP54   -
   chr14:g.20568338_20569581/c.2359+667_2359+1910/.  inside_[intron_between_exon_16_and_17]
   dbxref=GeneID:78787,MGI:MGI:1926037;aliases=XP_006519766;source=RefSeq
chr14:g.20568338_20569581    XM_011245226.2 (protein_coding) USP54   -
   chr14:g.20568338_20569581/c.1972+667_1972+1910/.  inside_[intron_between_exon_13_and_14]
   dbxref=GeneID:78787,MGI:MGI:1926037;aliases=XP_011243528;source=RefSeq
chr14:g.20568338_20569581    XM_011245225.2 (protein_coding) USP54   -
   chr14:g.20568338_20569581/c.2359+667_2359+1910/.  inside_[intron_between_exon_16_and_17]
   dbxref=GeneID:78787,MGI:MGI:1926037;aliases=XP_011243527;source=RefSeq
chr14:g.20568338_20569581    XM_006519705.3 (protein_coding) USP54   -
   chr14:g.20568338_20569581/c.2188+667_2188+1910/.  inside_[intron_between_exon_15_and_16]
   dbxref=GeneID:78787,MGI:MGI:1926037;aliases=XP_006519768;source=RefSeq
chr14:g.20568338_20569581    XM_011245227.2 (protein_coding) USP54   -
   chr14:g.20568338_20569581/c.2359+667_2359+1910/.  inside_[intron_between_exon_16_and_17]
   dbxref=GeneID:78787,MGI:MGI:1926037;aliases=XP_011243529;source=RefSeq
chr14:g.20568338_20569581    XM_017316224.1 (protein_coding) USP54   -
   chr14:g.20568338_20569581/c.2359+667_2359+1910/.  inside_[intron_between_exon_16_and_17]
   dbxref=GeneID:78787,MGI:MGI:1926037;aliases=XP_017171713;source=RefSeq

or using Ensembl

$ transvar ganno --ensembl -i 'chr1:g.29560_29570'

results in

chr1:g.29560_29570   ENST00000488147 (unprocessed_pseudogene)        WASH7P  -
   chr1:g.29560_29570/c.1_11/.       inside_[noncoding_exon_1]
   promoter_region_of_[WASH7P]_overlaping_1_bp(9.09%);source=Ensembl
chr1:g.29560_29570   ENST00000538476 (unprocessed_pseudogene)        WASH7P  -
   chr1:g.29560_29570/c.237_247/.    inside_[noncoding_exon_1]
   source=Ensembl
chr1:g.29560_29570   ENST00000473358 (lincRNA)       MIR1302-10      +
   chr1:g.29560_29570/c.7_17/.       inside_[noncoding_exon_1]
   source=Ensembl

Coding Start and Stop

The following illustrates deletion of a coding start.

$ transvar ganno -i "chr7:g.5569279_5569288del" --ccds

results in

chr7:g.5569279_5569288del    CCDS5341.1 (protein_coding)     ACTB    -
   chr7:g.5569279_5569288delCATCATCCAT/c.3_12delGGATGATGAT/. inside_[cds_in_exon_1]
   CSQN=CdsStartDeletion;left_align_gDNA=g.5569277_5569286delATCATCATCC;unaligned_gDNA=g.5569279_5569288delCATCATCCAT;left_align_cDNA=c.1_10delATGGATGATG;unalign_cDNA=c.1_10delATGGATGATG;cds_start_at_chr7:5569288_lost;source=CCDS

Deletion of a coding stop

$ transvar ganno -i "chr7:g.5567379_5567380del" --ccds

results in

Coding start loss due to SNP

$ transvar ganno -i "chr7:g.5568911T>A" --refseq

results in

chr7:g.5568911T>A    NM_001101.3 (protein_coding)    ACTB    -
   chr7:g.5568911T>A/c.244A>T/p.M82L inside_[cds_in_exon_3]
   CSQN=Missense;codon_pos=5568909-5568910-5568911;ref_codon_seq=ATG;dbxref=GeneID:60,HGNC:132,HPRD:00032,MIM:102630;aliases=NP_001092;source=RefSeq
chr7:g.5568911T>A    XM_005249818.1 (protein_coding) ACTB    -
   chr7:g.5568911T>A/c.244A>T/p.M82L inside_[cds_in_exon_3]
   CSQN=Missense;codon_pos=5568909-5568910-5568911;ref_codon_seq=ATG;dbxref=GeneID:60,HGNC:132,HPRD:00032,MIM:102630;aliases=XP_005249875;source=RefSeq
chr7:g.5568911T>A    XM_005249819.1 (protein_coding) ACTB    -
   chr7:g.5568911T>A/c.1A>T/.        inside_[cds_in_exon_2]
   CSQN=CdsStartSNV;C2=cds_start_at_chr7:5568911;dbxref=GeneID:60,HGNC:132,HPRD:00032,MIM:102630;aliases=XP_005249876;source=RefSeq
chr7:g.5568911T>A    XM_005249820.1 (protein_coding) ACTB    -
   chr7:g.5568911T>A/c.1-564A>T/.    inside_[5-UTR;noncoding_exon_3]
   CSQN=5-UTRSNV;dbxref=GeneID:60,HGNC:132,HPRD:00032,MIM:102630;aliases=XP_005249877;source=RefSeq

Coding stop loss due to SNP

$ transvar ganno -i "chr7:g.5567379C>A" --refseq

results in

chr7:g.5567379C>A    NM_001101.3 (protein_coding)    ACTB    -
   chr7:g.5567379C>A/c.1128G>T/.     inside_[cds_in_exon_6]
   CSQN=CdsStopSNV;C2=cds_end_at_chr7:5567379;dbxref=GeneID:60,HGNC:132,HPRD:00032,MIM:102630;aliases=NP_001092;source=RefSeq
chr7:g.5567379C>A    XM_005249818.1 (protein_coding) ACTB    -
   chr7:g.5567379C>A/c.1128G>T/.     inside_[cds_in_exon_6]
   CSQN=CdsStopSNV;C2=cds_end_at_chr7:5567379;dbxref=GeneID:60,HGNC:132,HPRD:00032,MIM:102630;aliases=XP_005249875;source=RefSeq
chr7:g.5567379C>A    XM_005249819.1 (protein_coding) ACTB    -
   chr7:g.5567379C>A/c.885G>T/.      inside_[cds_in_exon_5]
   CSQN=CdsStopSNV;C2=cds_end_at_chr7:5567379;dbxref=GeneID:60,HGNC:132,HPRD:00032,MIM:102630;aliases=XP_005249876;source=RefSeq
chr7:g.5567379C>A    XM_005249820.1 (protein_coding) ACTB    -
   chr7:g.5567379C>A/c.762G>T/.      inside_[cds_in_exon_7]
   CSQN=CdsStopSNV;C2=cds_end_at_chr7:5567379;dbxref=GeneID:60,HGNC:132,HPRD:00032,MIM:102630;aliases=XP_005249877;source=RefSeq
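A start or stop codon is lost when a substitution turns it into a codon that no longer serves that role. The sketch below checks single-base changes under two simplifications: only ATG counts as a start codon, and the stop codon used in the example (TAG) is hypothetical, since the output above does not show the reference stop codon. Helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def mutate(codon: str, pos_in_codon: int, alt: str) -> str:
    """Replace the base at 1-based position pos_in_codon (1, 2 or 3)."""
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]


def start_lost(ref_codon: str, pos_in_codon: int, alt: str) -> bool:
    # Simplification: only ATG is treated as a start codon.
    return ref_codon == "ATG" and mutate(ref_codon, pos_in_codon, alt) != "ATG"


def stop_lost(ref_codon: str, pos_in_codon: int, alt: str) -> bool:
    return ref_codon in STOP_CODONS and mutate(ref_codon, pos_in_codon, alt) not in STOP_CODONS


print(start_lost("ATG", 1, "T"))  # True  (c.1A>T turns ATG into TTG)
print(stop_lost("TAG", 3, "T"))   # True  (hypothetical TAG stop becomes TAT)
print(stop_lost("TAA", 2, "G"))   # False (TGA is still a stop codon)
```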

Batch processing

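To illustrate batch processing, consider the following small batch input, whose five columns hold the chromosome, position, reference allele, alternative allele and transcript ID: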

$ cat data/small_batch_input
chr3 178936091       G       A       CCDS43171
chr9 135782704       C       G       CCDS6956
$ transvar ganno -l data/small_batch_input -g 1 -n 2 -r 3 -a 4 -t 5 --ccds
chr3|178936091|G|A|CCDS43171 CCDS43171.1 (protein_coding)    PIK3CA  +
   chr3:g.178936091G>A/c.1633G>A/p.E545K     inside_[cds_in_exon_9]
   CSQN=Missense;dbsnp=rs104886003(chr3:178936091G>A);codon_pos=178936091-178936092-178936093;ref_codon_seq=GAG;source=CCDS
chr9|135782704|C|G|CCDS6956  CCDS6956.1 (protein_coding)     TSC1    -
   chr9:g.135782704C>G/c.1317G>C/p.L439L     inside_[cds_in_exon_11]
   CSQN=Synonymous;dbsnp=rs770692313(chr9:135782704C>G);codon_pos=135782704-135782705-135782706;ref_codon_seq=CTG;source=CCDS

One can also provide HGVS-like input, here with the transcript ID in column 1 (-t 1) and the genomic variant in column 2 (-m 2):

$ cat data/small_batch_hgvs
CCDS43171    chr3:g.178936091G>A
CCDS6956     chr9:g.135782704C>G
$ transvar ganno -l data/small_batch_hgvs -m 2 -t 1 --ccds
CCDS43171|chr3:g.178936091G>A        CCDS43171.1 (protein_coding)    PIK3CA  +
   chr3:g.178936091G>A/c.1633G>A/p.E545K     inside_[cds_in_exon_9]
   CSQN=Missense;dbsnp=rs104886003(chr3:178936091G>A);codon_pos=178936091-178936092-178936093;ref_codon_seq=GAG;source=CCDS
CCDS6956|chr9:g.135782704C>G CCDS6956.1 (protein_coding)     TSC1    -
   chr9:g.135782704C>G/c.1317G>C/p.L439L     inside_[cds_in_exon_11]
   CSQN=Synonymous;dbsnp=rs770692313(chr9:135782704C>G);codon_pos=135782704-135782705-135782706;ref_codon_seq=CTG;source=CCDS

The first column, which restricts annotation to the given transcript, is optional.
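Batch files in either layout can be generated from a variant table. A minimal sketch, assuming tab-separated columns (the listings above show whitespace-separated fields, so check which delimiter your TransVar version expects); the variable and function names are hypothetical.

```python
# Variants from the batch example above: (chromosome, position, ref, alt, transcript)
rows = [
    ("chr3", 178936091, "G", "A", "CCDS43171"),
    ("chr9", 135782704, "C", "G", "CCDS6956"),
]


def to_hgvs_snv(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Genomic HGVS-style SNV identifier, e.g. chr3:g.178936091G>A."""
    return f"{chrom}:g.{pos}{ref}>{alt}"


# Column layout for: transvar ganno -l FILE -g 1 -n 2 -r 3 -a 4 -t 5 --ccds
column_lines = ["\t".join(str(field) for field in row) for row in rows]
# HGVS-like layout for: transvar ganno -l FILE -m 2 -t 1 --ccds
hgvs_lines = [f"{tx}\t{to_hgvs_snv(c, p, r, a)}" for c, p, r, a, tx in rows]

print("\n".join(column_lines))
print("\n".join(hgvs_lines))
```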