Calculating Annotation Integrity

Gene structure annotation integrity is evaluated by estimating the level of annotation support required to produce a structure that is unlikely to be significantly altered by reannotation. Annotation support is established by evaluating isoform-specific evidence (ISE). Individual elements of the annotated structure (i.e., introns, exons, and UTRs) are compared with each ISE spliced alignment, and verification of these structural elements is used to establish an integrity score, Φ, given by the following formula.

Φ = (0.6 × α) + (0.3 × β) + (0.05 × γ) + (0.05 × ε)
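A minimal Python sketch of this weighted sum follows. The function name `integrity_score` and the range check are illustrative assumptions rather than part of the described method; only the four weights come from the formula.

```python
import math

# Weights from the integrity formula; they sum to 1.
WEIGHTS = {"alpha": 0.6, "beta": 0.3, "gamma": 0.05, "epsilon": 0.05}


def integrity_score(alpha: float, beta: float, gamma: float, epsilon: float) -> float:
    """Return Φ = 0.6α + 0.3β + 0.05γ + 0.05ε (hypothetical helper).

    Every parameter must lie in [0, 1]; out-of-range values are rejected
    so that Φ itself stays within [0, 1].
    """
    params = {"alpha": alpha, "beta": beta, "gamma": gamma, "epsilon": epsilon}
    for name, value in params.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in params.items())


# Floating-point addition of 0.6 + 0.3 + 0.05 + 0.05 is only approximately 1.
assert math.isclose(sum(WEIGHTS.values()), 1.0)
```

For example, a transcript with every intron confirmed (α = 1), 90% ISE coverage (β = 0.9), half of the expected 5′ UTR length (γ = 0.5), and a full-length 3′ UTR (ε = 1) scores approximately 0.945.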

In this equation, α is the fraction of annotated introns confirmed by ISE alignments, β is the fraction of the annotated structure overlapped by at least one ISE, γ is the ratio of the observed 5′ UTR length to its expected length (capped at 1), and ε is the ratio of the observed 3′ UTR length to its expected length (capped at 1). The weights sum to 1 and every parameter is bounded between 0 and 1, so Φ is normalized to the range 0 to 1.

A substantial number of gene annotations represent intronless transcripts. For these annotations, Φ is calculated as above, with α representing the percent deviation of the predicted CDS length from the expected CDS length.

Expected lengths for each UTR and for the CDS are determined empirically. For each, a collection of sequence-verified lengths is evaluated, and the expected length is taken as the length achieved by 95% of the evaluated sequences.
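The individual parameters can be sketched in the same spirit. The Python below is illustrative only and rests on assumptions not stated in the text: introns and exons are inclusive (start, end) coordinate pairs; an annotated intron counts as confirmed only when some ISE alignment contains an intron with identical coordinates; β is measured over annotated exon bases; and "the length achieved by 95% of the evaluated sequences" is read as the largest length that at least 95% of the sequence-verified lengths reach or exceed. All function names are hypothetical. The intronless substitute for α is not sketched, because the mapping from percent CDS-length deviation onto [0, 1] is not specified.

```python
from typing import Iterable, Sequence

Interval = tuple[int, int]  # assumed inclusive (start, end) genomic coordinates


def intron_fraction(annotated: Sequence[Interval], ise_introns: Iterable[Interval]) -> float:
    """alpha: fraction of annotated introns matched exactly by an ISE intron."""
    if not annotated:
        raise ValueError("intronless annotation: alpha is derived from CDS length instead")
    supported = set(ise_introns)
    return sum(intron in supported for intron in annotated) / len(annotated)


def covered_fraction(annotated_exons: Sequence[Interval], ise_exons: Iterable[Interval]) -> float:
    """beta: fraction of annotated exon bases overlapped by at least one ISE exon."""
    covered: set[int] = set()
    for start, end in ise_exons:
        covered.update(range(start, end + 1))
    bases = [pos for start, end in annotated_exons for pos in range(start, end + 1)]
    return sum(pos in covered for pos in bases) / len(bases)


def expected_length(verified_lengths: Sequence[int], percent: int = 95) -> int:
    """Largest length L such that at least `percent`% of verified lengths are >= L."""
    if not verified_lengths:
        raise ValueError("need at least one sequence-verified length")
    ordered = sorted(verified_lengths, reverse=True)
    # Integer ceiling division: number of sequences that must reach L.
    k = -(-percent * len(ordered) // 100)
    return ordered[k - 1]


def capped_ratio(observed: int, expected: int) -> float:
    """gamma / epsilon: observed UTR length over expected length, capped at 1."""
    return min(observed / expected, 1.0)
```

With verified lengths 1 through 100, `expected_length` returns 6, since exactly 95 of the 100 lengths are at least 6; an observed UTR of 3 bases then gives a ratio of 0.5, while any UTR of 6 bases or longer is capped at 1.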