| mm9 numbers by annotation | |
| annotations | 21224 |
| annotation is all alternative | 657 |
| annotation is all constitutive | 11090 |
| merged refseq | 105 |
|
|
|
| mm9 numbers by span | |
| spans | 435538 |
| constitutive exons | 148495 |
| constitutive junctions | 120407 |
| alternative junctions | 84604 |
| alternative exon spans | 82032 |
The scripts are packaged in onepipe_template.tgz.
The parameters below are specific to the mm9_f08 build.
Input data for mm9_f08 include UCSC knownGenes.
The scripts are capable of handling refSeq, ESTs, and mRNAs.
1.) Load spans into database : Each exon and junction of each transcript is assigned to a span.
Each span is unique by chrom, start, end, strand, and type.
An isoform list keeps track of the isoforms that contain the span.
Junctions less than 30 bases in length are not used; the surrounding exons are joined together.
2.) Filtering : Remove spans found in only one EST. Not used in mm9_f08.
3.) Filtering : Drop spans with bad splice sites (if not constitutive).
Accepted splice sites:
/GT/ /AG/
/AT/ /AC/
/AT/ /AG/
/GC/ /AG/
/GA/ /AG/
/GT/ /TG/
4.) Filtering : Remove exons shorter than 8 bases (if not constitutive).
5.) Stretching : If the end of an exon span that is terminal for an isoform falls within another exonic region, that end is extended to the next splice site/exon boundary.
The script iterates until no further stretching is required.
6.) Trimming : If the end of an exon span that is terminal for an isoform falls within an intronic region, that end is trimmed to the next splice site/exon boundary, but only if the boundary is less than 50 bases away.
The script iterates until no further trimming is required.
7.) Nonoverlap : Exon spans are broken up into exonic regions and exon-exon junction spans (ej).
8.) No nulls : An error check to make sure every span is assigned to an annotation.
9.) No duplications : An error check to make sure no annotations are duplicated to other chroms or strands.
10.) Annotations : The splicing graph is walked to group spans into linked annotations. As spans are joined, the annotation name is propagated to all linked spans.
11.) hname : A human-friendly name is assigned to each annotation. It must be unique. Order of preference : refSeq, knownGenes, mRNAs.
12.) Walk and label exons : A splicing graph is created for each annotation. It is walked and exonic positions are labeled 5' to 3'.
13.) Walk and label junctions : Junctions for each annotation are labeled by starting position and ending position.
14.) Walk and label alt_regions : Alt_regions are labeled 5' to 3'.
15.) Bed files : Bed files of the annotations and spans are created for display in the browser.
16.) Some hand editing may be required to prevent merging of refSeqs. This is done by moving txStart or txEnd to separate the annotations.
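
The filtering rules in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration, not the packaged scripts: the function names are hypothetical, the first motif of each accepted pair is assumed to be the donor (5') dinucleotide and the second the acceptor (3') dinucleotide, the sequences are assumed to be read on the transcript strand, and "bad" is taken to mean "not in the accepted list".

```python
# Accepted splice-site pairs, copied from the list in step 3.
# ASSUMPTION: first motif = donor (5' end of intron), second = acceptor (3' end).
ACCEPTED_SPLICE_SITES = {
    ("GT", "AG"),
    ("AT", "AC"),
    ("AT", "AG"),
    ("GC", "AG"),
    ("GA", "AG"),
    ("GT", "TG"),
}

MIN_EXON_LENGTH = 8  # threshold from step 4


def keep_junction(donor, acceptor, constitutive):
    """Step 3 (hypothetical helper): constitutive junctions are always kept;
    others are kept only if their splice-site pair is in the accepted list."""
    return constitutive or (donor.upper(), acceptor.upper()) in ACCEPTED_SPLICE_SITES


def keep_exon(length, constitutive):
    """Step 4 (hypothetical helper): constitutive exons are always kept;
    others are kept only if they are at least MIN_EXON_LENGTH bases long."""
    return constitutive or length >= MIN_EXON_LENGTH
```

For example, `keep_junction("CC", "AG", constitutive=False)` rejects the junction, while the same splice sites on a constitutive junction are kept.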
Walk and Label Example
Span Table
The Span Table is created using non-overlapping regions. These regions are used as labels to describe the exon, intron, or junction. The last two numbers in the name correspond to the start and end positions. For example, the three isoforms below would result in the following spans.
| name | constitutive? | alternative region | type of span |
| gene.0.1.1.ex | yes | 0 | Exon |
| gene.1.2.2.ex | no | 1 | Exon |
| gene.1.3.3.in | no | 1 | Intron |
| gene.1.4.4.ex | no | 1 | Exon |
| gene.0.5.5.ex | yes | 0 | Exon |
| gene.2.6.6.ex | no | 2 | Exon |
| gene.2.7.7.in | no | 2 | Intron |
| gene.0.8.8.ex | yes | 0 | Exon |
| Junctions | | | |
| gene.1.1.2.ej | no | 1 | Exon-Exon Junction |
| gene.1.1.5.sj | no | 1 | Intron Splice Junction |
| gene.1.2.4.sj | no | 1 | Intron Splice Junction |
| gene.1.4.5.ej | no | 1 | Exon-Exon Junction |
| gene.2.5.6.ej | no | 2 | Exon-Exon Junction |
| gene.2.5.8.sj | no | 2 | Intron Splice Junction |
| gene.2.6.8.sj | no | 2 | Intron Splice Junction |
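
The naming scheme used in the Span Table can be unpacked with a short Python sketch. It assumes, based on the rows above, that a name has the form annotation.alt_region.start.end.type, that alternative region 0 marks a constitutive span, and that the suffixes ex, in, ej, and sj map to the span types shown in the table. The `SpanName` type and `parse_span_name` function are hypothetical and are not part of the packaged scripts.

```python
from typing import NamedTuple

# Suffix-to-type mapping, taken from the "type of span" column above.
SPAN_TYPES = {
    "ex": "Exon",
    "in": "Intron",
    "ej": "Exon-Exon Junction",
    "sj": "Intron Splice Junction",
}


class SpanName(NamedTuple):
    annotation: str
    alt_region: int
    start: int
    end: int
    span_type: str

    @property
    def constitutive(self):
        # ASSUMPTION: alternative region 0 means the span is constitutive.
        return self.alt_region == 0


def parse_span_name(name):
    """Split a span name such as 'gene.1.2.4.sj' into its fields.

    rsplit is used so that an annotation name containing dots stays intact.
    """
    annotation, alt_region, start, end, suffix = name.rsplit(".", 4)
    return SpanName(annotation, int(alt_region), int(start), int(end), SPAN_TYPES[suffix])
```

Parsing `gene.1.2.4.sj` under these assumptions gives an intron splice junction in alternative region 1 that runs from position 2 to position 4.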