Frequently Asked Questions
If you don't find the answer to your question, please use our feedback form (top).
PlantGDB provides sequence data for >70,000 plant species, custom EST assemblies (PUT) for over 150 species, web tools and plant genome browsers, as well as an outreach portal for plant genomics. For more information on PlantGDB, visit our About page or take a brief tour on our Help Home Page.
Use the 'feedback link' at the top right corner of any PlantGDB web page. We will contact you within 24 hours. You are also welcome to contact any of the PlantGDB contacts listed under About.
PlantGDB has been optimized for use with Firefox 3, Safari, or Internet Explorer 7 / 8. Many advanced features require that JavaScript be enabled. If you encounter problems viewing any page at PlantGDB.org, please contact us using our feedback page. Please include a description of what didn't work as expected, and which web browser and operating system you were using. We will do our best to address the problem.
- PlantGDB's Public Plant Sequence data is updated every four months, coinciding with every other GenBank Version Release (December, April, and August). Transcript assemblies (PUT) are updated at this time and are typically made available 2-4 weeks after version update.
- Genome data at PlantGDB are updated periodically when a new genome assembly becomes available, or when transcript data are significantly increased.
- For more information, see FAQ categories 'Plant Sequence and PUT assemblies' and 'Genome Browsers' below.
- Sequence data and metadata are stored on our servers in three primary forms: 1) In MySQL databases, which store metadata and links to other data types; 2) In multiFASTA-formatted sequence files, for sequence retrieval using FASTACMD; 3) In indices for BLAST and GeneSeqer analysis.
- For more information about how to access and download PlantGDB sequence data, see FAQ categories 'Plant Sequence and PUT assemblies' and 'Genome Browsers' below.
PlantGDB's genome focus is on accurate spliced alignments of transcripts to genomes, a critical component of accurate genome annotation. The xGDB genome browser platform used at PlantGDB has unique features that make it useful for viewing and annotating genomes:
- All splicing evidence can be viewed online and reproduced using web tools provided at PlantGDB.
- A community annotation tool (yrGATE) and gene model incongruence-detection system (GAEVAL) are built in, to facilitate genome annotation.
- Each xGDB has powerful BLAST tools and search tools to retrieve upstream sequence for motif analysis.
- xGDB supports the DAS (Distributed Annotation Service) standard for cross-platform data display, and provides both DAS client and DAS server capabilities.
- The complete xGDB code is available as open source software and can be custom-installed on a Linux server.
For more information, see the Genome Browser Help Page.
The most likely reasons are that the selected region is too large, or that the region is very heavily annotated with one track type (typically EST). In either case, the load on the graphics engine causes a long delay in track display. Solutions:
- Re-enter a set of coordinates that span a narrower region and try again.
- If the problem remains, try deselecting the EST track type using the track control, then re-submit the region request.
- If you are unable to solve the problem, please contact us using the Feedback form, describing the region you were attempting to view.
Each genome has a "Downloads" page, accessible from the left panel on the GDB home page. You can also access it directly using this URL: http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Xx where Xx is the Genus/species abbreviation. On this page you will find:
- FASTA files containing all the genomic and aligned data from the current GDB version
- The complete MySQL database, in a flat file format that can be used to recreate the database locally.
- For some genomes, a 0README file is included to describe special data
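The direct download URL can also be built in a script. The following is a minimal Python sketch that assumes only the URL pattern quoted above; the function name download_url is hypothetical, and the code builds the address without contacting the server.

```python
from urllib.parse import urlencode

# Base address taken from the URL pattern quoted in this FAQ.
DOWNLOAD_PAGE = "http://www.plantgdb.org/XGDB/phplib/download.php"


def download_url(gdb_abbrev: str) -> str:
    """Return the Downloads page URL for a genome abbreviation such as 'Zm'."""
    if not gdb_abbrev:
        raise ValueError("a Genus/species abbreviation is required")
    return f"{DOWNLOAD_PAGE}?{urlencode({'GDB': gdb_abbrev})}"


print(download_url("Zm"))
```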
A. Yes, you can retrieve selected up/downstream sequences using the Search ID/Keyword tool:
- From any GDB home page, click Search ID/Keyword on the left side menu
- Enter IDs for one or more sequences (either aligned transcripts/proteins or gene models), or a keyword in quotes
- Optionally, limit search to relevant data type(s) by clicking appropriate selections under Limit Search
- Click Search to retrieve records. This may take a minute or more for large searches.
- On the results page under Retrieve Sequences, select 5' region, enter desired range, and select whether you want to exclude other overlapping genes
- Click the Sequence ID column header checkbox to select all sequences for retrieval (or click individual checkboxes to select a subset). [Note: if the retrieval set is too large the program will error out]
- Click Retrieve FASTA to retrieve the desired sequences. This may take a minute or more for large datasets
B. If you need to retrieve ALL the upstream or downstream sequences from an annotated genome, you will need to download the genome data from PlantGDB and use appropriate tools on your local machine.
Below is a step-by-step guide to the process (you will need access to MySQL and NCBI blastall or a similar package):
- Download the FASTA genome sequence and the genome database .sql from e.g. http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Zm
- Create a local MySQL database from the .sql file and write a MySQL query to retrieve the upstream coordinates from each gene model. You will use the table called chr_gene_annotation, and your queries will look something like this:
select geneId, chr, r_pos + 1 as f_seq_start, r_pos + 1000 as f_seq_end from chr_gene_annotation where strand="f";
select geneId, chr, l_pos - 1000 as r_seq_start, l_pos - 1 as r_seq_end from chr_gene_annotation where strand="r";
- Format the genome FASTA using e.g. formatdb with -o T (see Note below)
- Create scripts to retrieve each sequence range as a FASTA file from each genome/chromosome using fastacmd from the NCBI blastall package (http://www.ncbi.nlm.nih.gov/BLAST/docs/fastacmd.html) or an equivalent tool.
- For fastacmd, the following options apply to blastall versions before 2.2.21. [Note: NCBI has since updated BLAST to BLAST+ 2.2.23, and the command-line syntax has changed.]
- use -d to specify the indexed genome data target
- use -s to specify the chromosome in a multifasta file
- use the -L option to specify the range
- use the -S option to get appropriate strand from f_seq and r_seq if that's important
- use the -o option to give the output file a name according to the geneId (or use some other naming scheme as appropriate)
Example: fastacmd -d /path/to/genome_data -s chr1 -L1000,2000 -S2 -o filename1.fasta
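The scripting step can be sketched in Python as follows. This is an illustration under stated assumptions, not PlantGDB's own tooling. The function names and the sample rows are hypothetical. The windows reproduce the two queries shown above exactly, so whether a window lies 5' or 3' of a gene depends on how l_pos and r_pos are defined in your database. The blastdbcmd argument list is an assumed BLAST+ counterpart that should be checked against blastdbcmd -help. The code only builds command lines; it does not run them.

```python
def flank_window(l_pos, r_pos, strand, size=1000):
    """Return the (start, end) window selected by the FAQ's queries, or None if empty."""
    if strand == "f":
        start, end = r_pos + 1, r_pos + size
    elif strand == "r":
        # Clamp at 1 so a gene near the start of the sequence gives a shorter window.
        start, end = max(1, l_pos - size), l_pos - 1
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return (start, end) if start <= end else None


def fastacmd_args(db, gene_id, chrom, start, end, strand):
    """Legacy BLAST command using the options listed above (-d, -s, -L, -S, -o)."""
    return ["fastacmd", "-d", db, "-s", chrom,
            "-L", f"{start},{end}",
            "-S", "1" if strand == "f" else "2",
            "-o", f"{gene_id}.fasta"]


def blastdbcmd_args(db, gene_id, chrom, start, end, strand):
    """Assumed BLAST+ counterpart; verify the options with blastdbcmd -help."""
    return ["blastdbcmd", "-db", db, "-entry", chrom,
            "-range", f"{start}-{end}",
            "-strand", "plus" if strand == "f" else "minus",
            "-out", f"{gene_id}.fasta"]


# Hypothetical rows, shaped like (geneId, chr, l_pos, r_pos, strand).
rows = [("gene1", "chr1", 5000, 8000, "f"), ("gene2", "chr1", 300, 900, "r")]
for gene_id, chrom, l_pos, r_pos, strand in rows:
    window = flank_window(l_pos, r_pos, strand)
    if window is None:
        continue  # nothing to retrieve for this gene
    print(" ".join(fastacmd_args("/path/to/genome_data", gene_id, chrom, *window, strand)))
```

Each argument list can be passed to subprocess.run(args, check=True) once the database has been formatted. The sketch does not check windows against the chromosome length, so a window near the end of a sequence may need trimming.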
We don't make this data directly available but you can derive it easily from our database tables which are available for download, if you have access to MySQL and a scripting language.
First download the appropriate genome MySQL database from http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Xx where Xx is the Genus/species abbreviation, e.g. Zm for maize. Once you create the database locally you can derive the coordinate as follows:
- Find the table that stores gene model information; it is named either chr_gene_annotation (for chromosome-based browsers) or gseg_gene_annotation (for BAC or scaffold-based browsers).
- The relevant columns are chr (or gseg_gi), l_pos, r_pos, CDSstart, CDSstop and strand.
- A query such as the following will build a tabular output featuring the 3'UTR chr/coordinates, length and direction:
mysql> SELECT geneID, chr, IF(strand="f", CDSstop, l_pos) AS left_position, IF(strand="f", r_pos, CDSstop) AS right_position, IF(strand="f", r_pos-CDSstop, CDSstop-l_pos) AS length, strand FROM chr_gene_annotation;
Once you have the coordinates, you can build a script to retrieve the data from the genome sequence (which is also available from the same download page referenced above), using fastacmd or a scripting language such as Perl or Python.
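As an illustration of the scripting step, the Python sketch below mirrors the IF() logic of the query and slices the region out of a chromosome sequence held in memory. The function names and test sequence are hypothetical. It assumes 1-based inclusive coordinates, so the slice contains length + 1 bases, including the base at CDSstop; whether that base belongs to the UTR depends on the schema's convention, which is not stated here.

```python
# Translation table for complementing DNA, preserving case and N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def utr3_coords(l_pos, r_pos, cds_stop, strand):
    """Mirror the query's IF() logic: return (left_position, right_position, length)."""
    if strand == "f":
        return cds_stop, r_pos, r_pos - cds_stop
    return l_pos, cds_stop, cds_stop - l_pos


def extract(chrom_seq, left, right, strand):
    """Slice 1-based inclusive coordinates; reverse-complement for strand 'r'."""
    piece = chrom_seq[left - 1:right]
    return piece if strand == "f" else piece.translate(_COMPLEMENT)[::-1]


# Hypothetical gene on a toy 10-base sequence.
left, right, length = utr3_coords(l_pos=2, r_pos=9, cds_stop=5, strand="r")
print(left, right, length, extract("AACCGGTTAC", left, right, "r"))
```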
xGDB supports the DAS (Distributed Annotation Service) standard for cross-platform data display, and provides both DAS client and DAS server capabilities. Several PlantGDB genomes have DAS-served data; see DAS Services for details.
For more information on DAS, see the Genome Browser Help Page.
PlantGDB downloads GenBank and UniProt sequence data approximately every four months, corresponding to every other GenBank Release. Sequence data is parsed according to a database schema, and individual sequence files are filtered to detect vector and repeat sequence. When you download FASTA-formatted sequence data from PlantGDB, you may see differences in the masking of repeat or vector regions, but the sequence is otherwise identical.
- PUT = PlantGDB-assembled Unique Transcript. PlantGDB regularly assembles transcript sequences (EST and cDNA) for species with >10,000 sequences in GenBank, as well as by request for smaller or combined datasets. The resulting sequence assemblies (PUTs) are made available for search, download, BLAST, and spliced alignment using GeneSeqer.
- PUT assemblies include both contigs (comprising multiple sequences) and singletons. They are named according to version number, genus_species, and sequence number.
For more information visit the EST Assembly Page (Home>Left Menu>EST Assembly).
You can download sequence for any plant species by going to the Download portal (Home>Download>Sequence). Enter Genus/species and click 'Search'. (For popular species, use the shortcut "Featured Species" on the Home Page left menubar.)
To download PUT assemblies, go to the EST contig Download portal (Home>EST Assembly>Download)
To download large datasets, visit our ftp site at ftp.plantgdb.org where you can download all PUT assemblies or plant sequences using ftp.
PlantGDB's sequence data is updated every 4 months, coinciding with every other GenBank Release (odd numbers). For example, recent updates included V.165 (April 2008) and V.163 (December 2007).
If you visit the Download page for any species, you can retrieve two files:
- Genus_species.PUT_member.txt
- Genus_species.alignment.txt
Both files provide the mapping of ESTs to PUTs.
Alternatively, you can view or retrieve the EST components of an individual PUT from its "Search" page, e.g.
http://www.plantgdb.org/search/display/data.php?Seq_ID=PUT-157a-Oryza_sativa-6232
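The PUT identifier in the example URL follows the naming scheme of version, genus_species, and sequence number. The Python sketch below splits such an identifier and rebuilds the record URL. The pattern is inferred from that single example, so treat it as an assumption rather than a specification; the function names are hypothetical, and no request is sent.

```python
import re
from urllib.parse import urlencode

# Assumed layout, inferred from one example: PUT-<version>-<Genus_species>-<number>
PUT_ID = re.compile(r"^PUT-(?P<version>[^-]+)-(?P<species>.+)-(?P<number>\d+)$")
SEARCH_PAGE = "http://www.plantgdb.org/search/display/data.php"


def parse_put_id(put_id):
    """Split a PUT identifier into (version, genus_species, sequence number)."""
    match = PUT_ID.match(put_id)
    if match is None:
        raise ValueError(f"not a recognised PUT identifier: {put_id!r}")
    return match["version"], match["species"], int(match["number"])


def put_record_url(put_id):
    """Build the record page URL for one PUT, following the example in this FAQ."""
    return f"{SEARCH_PAGE}?{urlencode({'Seq_ID': put_id})}"


print(parse_put_id("PUT-157a-Oryza_sativa-6232"))
print(put_record_url("PUT-157a-Oryza_sativa-6232"))
```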
PlantGDB's taxonomic conventions will always reflect NCBI's current naming system since our data source is GenBank. Check the current taxonomic name for your species using GenBank's Taxonomy browser. It is possible that the genus and/or species name has changed.
