Minimum intron length cutoff and its effect on our study
An anonymous reviewer suggested that the minimum intron length in our study be set to 60nt instead of our adopted threshold of 20nt. This suggestion was based on experimental determinations by Goodall & Filipowicz (1). We respectfully disagree with the reviewer's suggestion. The cited experiments used synthetic introns in maize and tobacco protoplast systems and determined the minimum intron size in those systems to be 70-73nt (1). We would argue that genome-wide studies such as the one presented in our manuscript are particularly useful for probing the generality of experimental observations made on a limited number of genes. As our method also relies on experimental evidence (transcript sequences sampled from cDNA libraries and aligned to the genome), we think that our data in fact convincingly show that short introns can be spliced in plants, as they can in animal systems; for example, C. elegans is replete with introns shorter than 40nt (2). Arabidopsis introns shorter than 60nt were also reported previously based on database searches (2,3).
Specifically, we identified 257 and 1034 introns in the size range 20-59nt in Arabidopsis and rice, respectively (corresponding to 0.2% and 0.9% of the total number of introns in our study). Of these short introns, 26% and 13% have high splice site scores, and an additional 9.3% and 9.6% are supported by multiple EST/cDNA transcripts in Arabidopsis and rice, respectively. Two typical cases are shown in Figure A. In the first case, the 40nt putative intron has consensus splice sites. We cannot exclude the possibility that the genome sequence was incorrectly assembled and is missing sequence within the presumed intron bounds, but it seems more reasonable to take alignments such as those shown as evidence for the existence of introns shorter than 60nt.
We recalculated the mean and median intron lengths after removing introns shorter than 60nt, as suggested by the reviewer. The mean and median lengths in Arabidopsis remained unchanged (173nt and 101nt, respectively), and the values increased by only 3-4nt in rice, giving adjusted values of 437nt and 163nt, respectively.
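The recalculation described above amounts to applying a length cutoff before summarizing. The following is a minimal sketch of that step, not the code used in the study; the function name `length_summary` and the toy lengths are hypothetical and serve only as illustration.

```python
from statistics import mean, median


def length_summary(intron_lengths, min_length=0):
    """Return (count, mean, median) for introns of at least min_length nt."""
    kept = [n for n in intron_lengths if n >= min_length]
    if not kept:
        raise ValueError("no introns at or above the cutoff")
    return len(kept), mean(kept), median(kept)


# Hypothetical toy lengths in nt; these are NOT data from the study.
toy_lengths = [25, 40, 88, 101, 120, 173, 950]

# Summary with the adopted 20nt threshold and with the suggested 60nt cutoff.
stats_20 = length_summary(toy_lengths, min_length=20)
stats_60 = length_summary(toy_lengths, min_length=60)
```

In a genome-wide set where short introns make up well under 1% of the total, removing them would be expected to shift the median only slightly, which is the behavior reported above.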
As to the GC-content of short introns, the mean GC-content is 42.9% for Arabidopsis short introns and 55.2% for rice short introns. These percentages are indeed considerably higher than the average GC-content of longer introns (32.7% for Arabidopsis and 37.3% for rice). However, the reported trends in IntronR incidence rates as a function of GC-content remained unchanged after removal of the short introns. As shown in Figure B, high GC-content introns are more likely to be retained in both species and are more prevalent in rice.
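For readers who wish to repeat the comparison of short and longer introns, a minimal sketch follows. It assumes that the mean GC-content is the average of per-intron percentages (the text does not say whether values were averaged per intron or pooled over all bases), and the names `gc_content` and `mean_gc_by_class` are hypothetical.

```python
from statistics import mean


def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def mean_gc_by_class(intron_seqs, cutoff=60):
    """Mean per-intron GC percentage for introns shorter than cutoff and for the rest.

    Assumption: per-intron percentages are averaged, not pooled over bases.
    Returns (short_mean, long_mean); a class with no introns yields None.
    """
    short = [gc_content(s) for s in intron_seqs if len(s) < cutoff]
    longer = [gc_content(s) for s in intron_seqs if len(s) >= cutoff]
    return (mean(short) if short else None, mean(longer) if longer else None)
```

A pooled calculation (total G and C over total bases in each class) would weight long introns more heavily and could give slightly different values.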
Of course, some of the small intron predictions may derive from sequence and/or alignment errors. However, as none of our conclusions are affected by removing the short introns, and the minimum intron size seems clearly to be less than 60nt, we made no changes to the text except for adding a citation to this material as a web supplement (page 17).
A. gi 42469517 aligned to gene model At2g11810
ATCATCACTA AGGTAAGTCA TATCTTTGAA TCCTACCCAT TCATATAAAT AGCACTAAGA 4753308
|||||||||| ||                                            ||||||||
ATCATCACTA AG........ .......... .......... .......... ..CACTAAGA 1132
TTTTATGACA TAAAACAGGC TGGTCCGGGT ACGATTGCGG AAGCACTGAT TTGCGGCCTC 4753368
|||||||||| |||||||||| |||||||||| |||||||||| |||||||||| ||||||||||
TTTTATGACA TAAAACAGGC TGGTCCGGGT ACGATTGCGG AAGCACTGAT TTGCGGCCTC 1192
B. gi 26449531 aligned to gene model At5g26850
GGGAGAGTTT TCACATATCT TCGCTACTGT TGATGAGATT GTACATGCCA TTCTTGATAA 9447203
|||||||||| |||||||||| |||||||||| ||||||||||
GGGAGAGTTT TCACATATCT TCGCTACTGT TGATGAGATT .......... .......... 697
TTACGAGGCA GACATGATTG TTCAGACAAA TGAAGACAGA GAAGAGCAAA ATTGTAACTG 9447263
                    | |||||||||| |||||||||| |||||||||| ||||||||||
.......... .........G TTCAGACAAA TGAAGACAGA GAAGAGCAAA ATTGTAACTG 738
Figure A: Small introns and their flanking sequences. In each panel, the upper sequence represents the Arabidopsis genome, and the lower sequence represents the cDNA. Introns are indicated by dots. For Panel A, the full-length cDNA gi42469517 is derived from gene At2g11810. Alignments of two other transcripts (gi997294 and gi4252843) from the gene show a 66nt intron utilizing a downstream acceptor site. The alternative portion is denoted by gray letters in the alignment. Splicing of the 40nt intron results in a minor transcript isoform that produces a truncated protein. For Panel B, the indicated 39nt intron is the 7th intron in a 19-exon gene structure that encodes a 970-amino-acid protein of unknown function.
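The intron length and terminal dinucleotides in an alignment of this kind can be read off programmatically. The sketch below is an illustration only, not part of the original analysis; the function name `intron_from_alignment` is hypothetical, and it assumes the dotted layout used in Figure A, with the trailing coordinates removed and a single run of dots per alignment.

```python
def intron_from_alignment(genome_rows, cdna_rows):
    """Extract the genomic intron spanned by the run of dots in the cDNA rows.

    Both arguments are lists of alignment rows without the trailing
    coordinate; blanks between 10-base blocks are ignored.
    Returns (intron_sequence, first_two_bases, last_two_bases).
    """
    genome = "".join("".join(row.split()) for row in genome_rows)
    cdna = "".join("".join(row.split()) for row in cdna_rows)
    if len(genome) != len(cdna):
        raise ValueError("genome and cDNA rows differ in length")
    start = cdna.find(".")
    if start == -1:
        raise ValueError("no gap (dots) found in the cDNA rows")
    end = cdna.rfind(".") + 1
    if set(cdna[start:end]) != {"."}:
        raise ValueError("expected a single contiguous run of dots")
    intron = genome[start:end]
    return intron, intron[:2], intron[-2:]


# First row pair of Panel A, copied from the figure above.
panel_a_genome = ["ATCATCACTA AGGTAAGTCA TATCTTTGAA TCCTACCCAT TCATATAAAT AGCACTAAGA"]
panel_a_cdna = ["ATCATCACTA AG........ .......... .......... .......... ..CACTAAGA"]

intron, donor, acceptor = intron_from_alignment(panel_a_genome, panel_a_cdna)
print(len(intron), donor, acceptor)
```

For Panel A this reports a 40nt intron beginning with GT and ending with AG, in agreement with the caption. An intron that spans two row pairs, as in Panel B, can be handled by passing both rows in each list.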
Figure B: Incidence of IntronR as a function of GC-content in introns longer than 60nt. The number at the top of each column represents the actual number of IntronR cases in each bin.
1. Goodall, G. J. & Filipowicz, W. (1990) Plant Mol Biol 14, 727-733.
2. Lim, L. P. & Burge, C. B. (2001) PNAS 98, 11193-11198.
3. Deutsch, M. & Long, M. (1999) Nucleic Acids Res 27, 3219-3228.