If necessary, open the actual output page of the GeneSeqer@PlantGDB analysis described below.

For large files, most browsers will take a while to process all location tags. During this time, links on the graphic may appear non-functional. Once the browser has completely loaded the page, however, all links will be fully functional.

Below is an example of the GeneSeqer output. Only 60,000 base pairs can be shown on a single page, so larger sequences will be shown in segments. You can choose which segment to view by clicking on the dropdown menu next to the word 'Choose'.
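The paging described above can be sketched in a few lines of Python. This is only an illustration: it assumes the segments are consecutive, non-overlapping, and numbered with 1-based inclusive coordinates, which the page does not state explicitly. The function name page_segments is hypothetical and not part of GeneSeqer@PlantGDB.

```python
def page_segments(seq_length, page_size=60_000):
    """Return 1-based inclusive (start, end) ranges, one per output page.

    Assumption: pages are consecutive and do not overlap. The real
    web service may cut its segments differently.
    """
    segments = []
    start = 1
    while start <= seq_length:
        end = min(start + page_size - 1, seq_length)
        segments.append((start, end))
        start = end + 1
    return segments


# A 150,000 bp sequence would need three pages under these assumptions.
for number, (start, end) in enumerate(page_segments(150_000), start=1):
    print(f"segment {number}: {start}-{end}")
```

Under these assumptions, a sequence of 60,000 base pairs or fewer fits on a single page and the dropdown menu would offer only one segment.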


In the output you will notice that the horizontal lines have different thicknesses. The next section explains what each line means, but the thickness itself is very important: it illustrates the difference between exons and introns (vocabulary review). The thicker rectangular boxes represent exons. The thin lines that link the exons represent introns.

For each output the color scheme should be consistent. To find out what each color stands for, move your cursor over the "PREDICTION SUMMARY" title above the graphic (color-coded output). Doing this will reveal a legend of the color scheme for your output. Typically, the area of your DNA sequence where a gene is believed to be located is indicated by the black bar. The ESTs and cDNAs from the database that have matched the original DNA sequence are shown in red. The orange arrows represent the predicted locations of Open Reading Frames (ORFs). Finally, green arrows represent the locations of possible alternative gene structures. You should look closely at your results to see if you can identify unique alternative gene structures for the DNA sequence you have tested.

The summary graphic (top window)

The summary graphic is clickable; by selecting a structure (colored arrow) within the graphic, the corresponding data file (alignment file) in the lower window will be scrolled to the appropriate section dealing with the element represented by your selection. Colored arrows represent aligned sequences and predicted gene structures according to their unique color.

The alignment file (bottom window)

The alignment file found in the lower pane of the results window is the heart of the GeneSeqer@PlantGDB output. This text shows the base-to-base alignment of the cDNAs and ESTs with your DNA sequence. Predicted introns are shown as strings of periods '.'. Score statistics for the alignment quality as well as the predicted splice site quality are shown for each aligned sequence. In addition, links to the source of each sequence are provided above their respective alignments.
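Because predicted introns appear as strings of periods, they can be located in a copied alignment row with a simple pattern search. The Python sketch below assumes you have pasted one aligned sequence row into a plain string with any coordinate labels removed. The function name intron_runs and the sample row are made up for illustration and are not taken from real GeneSeqer output.

```python
import re


def intron_runs(aligned_row):
    """Return (start, end) string offsets of each run of periods.

    Offsets are 0-based and end-exclusive, and refer to positions in
    the pasted text, not to genomic coordinates.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\.+", aligned_row)]


# Made-up example row: two exon pieces separated by one intron gap.
row = "ATGGC.....TTGA"
for start, end in intron_runs(row):
    print(f"intron gap at text offsets {start}-{end}, length {end - start}")
```

To turn these offsets into positions on your DNA sequence you would still need the coordinates printed alongside the alignment in the results.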

The predicted gene structures and ORFs

The culmination of the GeneSeqer@PlantGDB analysis is the prediction of an accurate gene structure. The quality of this prediction can be assessed by comparing the predicted probable open reading frame (ORF) to known proteins. Predicted ORFs are shown as orange arrows in the summary graphic. Additionally, the longest ORF as well as its translation frame is displayed in the alignment file. The NCBI blastp link following the translated ORF sequence in the alignment file will allow you to more easily find homologs for this predicted gene.
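To make the idea of a "longest ORF" and its "translation frame" concrete, here is a generic Python sketch. It is not GeneSeqer's own procedure, which this page does not describe. It assumes an ORF runs from an ATG codon to the first in-frame stop codon, looks only at the forward strand, and works on a sequence with the introns already removed. The function name longest_orf is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (frame, start, end) of the longest ATG-to-stop ORF, or None.

    frame is 0, 1 or 2; start and end are 0-based, end-exclusive
    offsets into seq, and the stop codon is included in the range.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i  # first start codon opens a candidate ORF
            elif orf_start is not None and codon in STOP_CODONS:
                candidate = (frame, orf_start, i + 3)
                if best is None or candidate[2] - candidate[1] > best[2] - best[1]:
                    best = candidate
                orf_start = None  # look for the next ORF in this frame
    return best


# Made-up example sequence, not taken from the Sorghum BAC.
print(longest_orf("CCATGAAATTTTAGGG"))
```

The three reading frames differ only in where codon counting starts, which is why the same sequence can produce a different ORF, or none at all, in each frame.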

Further investigation & refined analysis

Detailed (Refined) analysis using GeneSeqer@PlantGDB

Interesting gene regions found through the process described above can be further refined through various methods. One such method, demonstrated in this section, involves a detailed look at the evidence (ESTs and cDNAs) supporting a given gene structure. Spliced alignment of "All Plants" ESTs and cDNAs to the restricted region gives insight into possible alternative gene structures, polymorphisms, and differential transcription. To demonstrate this concept, we have chosen the 15kb region extending from base 7500 to base 22500 of the Sorghum bicolor BAC analyzed above. The results are available here. This analysis was done in the same manner as above, with the exceptions that the 7500 to 22500 range was input in step 2 and the "All Plants" EST and cDNA options were chosen in step 3.
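If you want to work with the same restricted region outside the web form, the range can be cut from a sequence held as a string. The Python sketch below assumes the base positions are 1-based and inclusive, as is common for sequence coordinates; this page does not say so explicitly. The function name subregion and the placeholder sequence are hypothetical.

```python
def subregion(sequence, start, end):
    """Return bases start..end of sequence (1-based, inclusive)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range must lie within the sequence")
    return sequence[start - 1:end]


# Placeholder sequence standing in for the BAC; not real data.
bac = "A" * 30_000
region = subregion(bac, 7500, 22500)
print(len(region))
```

Under the inclusive convention the 7500 to 22500 range contains 15,001 bases, which matches the "15kb region" described above.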

As shown by the summary graphic, three distinct gene regions have been characterized. Based on BlastP queries against the NCBI database, as described in the next section, these three gene regions putatively represent a mitochondrial carrier protein, subunit 1 of a cleavage stimulation factor, and a serine threonine kinase. Interestingly, spliced alignment of non-native (non-Sorghum) transcripts alone is responsible for the characterization of the mitochondrial carrier protein in the 7800 to 11800 region shown to the left. Also noteworthy is the apparent alternative gene structure represented by an exon in the 9438 to 9477 region of this gene. Because of the low local alignment similarity of the homologous sequence alignments, the native transcript presumably encoded by this gene region is assumed to lack this exon or to express it as an alternatively spliced product. Investigation of the origin of the transcripts corresponding to each gene structure reveals two (2) transcripts arising from monocotyledons (Secale cereale (rye) gi:10093099; Oryza sativa (rice) gi:27547342) and two (2) transcripts arising from dicotyledons (Solanum tuberosum (potato) gi:17074557; Lycopersicon esculentum (tomato) gi:18260535). In this example, the gene structure lacking the exon in question is supported by spliced alignment of the monocot homologs and thus most likely represents the native Sorghum gene transcript.

Homologous protein alignment using GeneSeqer@PlantGDB

Determining the complete gene structure, representing the entire coding region, of a gene is in some cases not possible using the alignment of transcribed sequences alone. As mentioned above, inclusion of homologous transcripts can increase the coverage of these alignments but is not always sufficient to produce a complete gene structure. For this reason, GeneSeqer@PlantGDB includes an interface allowing the alignment of homologous proteins. These homologs may be determined through the use of the NCBI blastp link provided in the ORF section of the web service results. To perform this analysis on your own results, simply click on the "NCBI blastp" link in the alignment file. The results shown here represent such alignments in the 7800 to 11800 region of the Sorghum BAC used throughout this demonstration.

The 10 putatively homologous proteins aligned in this example were obtained using the "NCBI blastp" link provided in the GeneSeqer@PlantGDB alignment file. Each ORF prediction found in the text results is followed by this link to facilitate searches against the NCBI non-redundant database. In this example, all 10 proteins showed high similarity, as indicated by very low e-values of at most 8e-66. As was shown in the preceding section, homologous alignments of two (2) putative Arabidopsis thaliana proteins suggest an alternative gene structure, while alignment of the other eight (8) protein sequences confirms the predicted native gene structure.

If you have gone through this entire exercise and believe that you have found a unique gene structure (or structures) within your DNA segment, you can contribute that information to the database and be given credit for your discovery! Simply enter your information and the DNA region where you believe the unique gene(s) to be located. Researchers will then verify your findings and enter your information into the database. REPORT YOUR UNIQUE GENES.