Frequently Asked Questions

If you don't find the answer to your question, please use our feedback form (top).

1. General Questions

PlantGDB provides sequence data for >70,000 plant species, custom EST assemblies (PUT) for over 150 species, web tools and plant genome browsers, as well as an outreach portal for plant genomics. For more information on PlantGDB, visit our About page or take a brief tour on our Help Home Page.

Use the 'feedback link' at the top right corner of any PlantGDB web page. We will contact you within 24 hours. You are also welcome to contact any of the PlantGDB contacts listed under About.

PlantGDB has been optimized for use with Firefox 3, Safari, or Internet Explorer 7 / 8. Many advanced features require that JavaScript be enabled. If you encounter problems viewing any page at PlantGDB.org, please contact us using our feedback page. Please describe what didn't work as expected and which web browser and operating system you were using. We will do our best to address the problem.

  • PlantGDB's Public Plant Sequence data is updated every four months, coinciding with every other GenBank Version Release (December, April, and August). Transcript assemblies (PUT) are updated at this time and are typically made available 2-4 weeks after version update.
  • Genome data at PlantGDB are updated periodically when a new genome assembly becomes available, or when transcript data are significantly increased.
  • For more information, see FAQ categories 'Plant Sequence and PUT assemblies' and 'Genome Browsers' below.
  • Sequence data and metadata are stored on our servers in three primary forms: 1) in MySQL databases, which store metadata and links to other data types; 2) in multiFASTA-formatted sequence files, for sequence retrieval using FASTACMD; 3) in indices for BLAST and GeneSeqer analysis.
  • For more information about how to access and download PlantGDB sequence data, see FAQ categories 'Plant Sequence and PUT assemblies' and 'Genome Browsers' below.
2. Genomes / Genome Browsers

PlantGDB's genome focus is on accurate spliced alignment of transcripts to genomes, a critical component of accurate genome annotation. The xGDB genome browser platform used at PlantGDB has unique features that make it useful for viewing and annotating genomes:

  • All splicing evidence can be viewed online and reproduced using web tools provided at PlantGDB.
  • A community annotation tool (yrGATE) and gene model incongruence-detection system (GAEVAL) are built in, to facilitate genome annotation.
  • Each xGDB has powerful BLAST tools and search tools to retrieve upstream sequence for motif analysis.
  • xGDB supports the DAS (Distributed Annotation Service) standard for cross-platform data display, and provides both DAS client and DAS server capabilities.
  • The complete xGDB code is available as open source software and can be custom-installed on a Linux server.

For more information, see the Genome Browser Help Page.

The most likely reasons are that the chosen region is too large, or that the region is very heavily annotated with one track type (typically EST). In either case, the load on the graphics engine causes a long delay in track display times. Solutions:

  • Re-enter a set of coordinates that span a narrower region and try again.
  • If the problem persists, try deselecting the EST track type using the track control, then resubmit the region request.
  • If you are unable to solve the problem, please contact us using the Feedback form, describing the region you were attempting to view.

Each genome has a "Downloads" page, accessible from the left panel on the GDB home page, or directly at this URL: http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Xx where Xx is the Genus/species abbreviation. On this page you will find:

  • FASTA files containing all the genomic and aligned data from the current GDB version
  • The complete MySQL database, in a flat file format that can be used to recreate the database locally.
  • For some genomes, a 0README file is included to describe special data.
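
As a small illustration of the URL pattern above, the following sketch builds the Downloads page address for a given abbreviation. The function name `download_url` is hypothetical, and `Zm` (maize) is the only abbreviation confirmed in this FAQ.

```python
from urllib.parse import urlencode

# Base address taken from the URL pattern quoted above
BASE = "http://www.plantgdb.org/XGDB/phplib/download.php"


def download_url(gdb_abbrev):
    """Return the Downloads page URL for a Genus/species abbreviation such as 'Zm'."""
    if not gdb_abbrev.isalpha():
        raise ValueError(f"unexpected abbreviation: {gdb_abbrev!r}")
    return f"{BASE}?{urlencode({'GDB': gdb_abbrev})}"


print(download_url("Zm"))
```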

A. Yes, you can retrieve selected up/downstream sequences using the Search ID/Keyword tool:

  1. From any GDB home page, click Search ID/Keyword on the left side menu
  2. Enter IDs for one or more sequences (either aligned transcripts/proteins or gene models), or a keyword in quotes
  3. Optionally, limit search to relevant data type(s) by clicking appropriate selections under Limit Search
  4. Click Search to retrieve records. This may take up to a minute or more for large searches.
  5. On the results page under Retrieve Sequences, select 5' region, enter desired range, and select whether you want to exclude other overlapping genes
  6. Click the Sequence ID column header checkbox to select all sequences for retrieval (or click individual checkboxes to select a subset). [Note: if the retrieval set is too large the program will error out]
  7. Click Retrieve FASTA to retrieve the desired sequences. This may take a minute or more for large datasets

B. If you need to retrieve ALL the upstream or downstream sequences from an annotated genome, you will need to download the genome data from PlantGDB and use appropriate tools on your local machine.

Below is a step-by-step guide to the process you will need to follow (you will need access to MySQL and NCBI blastall or a similar package):

  1. Download the FASTA genome sequence and the genome database .sql from e.g. http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Zm
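  2. Create a local MySQL database from the .sql file and write a MySQL query to retrieve the upstream coordinates for each gene model. Use the table called chr_gene_annotation. Because l_pos is the lower coordinate on both strands, the upstream region lies to the left of l_pos for forward-strand genes and to the right of r_pos for reverse-strand genes (swap the two expressions to get downstream regions instead). Your queries will look something like this:

    select geneId, chr, greatest(l_pos - 1000, 1) as f_seq_start, l_pos - 1 as f_seq_end
    from chr_gene_annotation where strand='f' and l_pos > 1;
    select geneId, chr, r_pos + 1 as r_seq_start, r_pos + 1000 as r_seq_end
    from chr_gene_annotation where strand='r';
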
  3. Format the genome FASTA using e.g. formatdb with -o T (see Note below)
  4. Create scripts to retrieve each sequence range as a FASTA file from each genome/chromosome using blastall's fastacmd (http://www.ncbi.nlm.nih.gov/BLAST/docs/fastacmd.html) or equivalent package.
  5. For fastacmd, the following options apply to blastall versions before 2.2.21. [Note that NCBI has since updated BLAST to BLAST+ 2.2.23, and the command line syntax has changed.]
    • use -d to specify the indexed genome data target
    • use -s to specify the chromosome in a multifasta file
    • use the -L option to specify the range
    • use the -S option to get appropriate strand from f_seq and r_seq if that's important
    • use the -o option to give the output file a name according to the geneId (or use some other naming scheme as appropriate)

Example: fastacmd -d /path/to/genome_data -s chr1 -L1000,2000 -S2 -o filename1.fasta
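
Step 4 (scripting one fastacmd call per gene) could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `fastacmd_command`, the example gene IDs, and the `chr1` sequence identifier are hypothetical, and the rows are assumed to have the shape returned by the coordinate queries above. It builds command lines for the legacy fastacmd options listed in step 5 rather than running them.

```python
import shlex

# fastacmd -S values: 1 = top strand, 2 = bottom strand (reverse complement)
STRAND_FLAG = {"f": "1", "r": "2"}


def fastacmd_command(db_path, gene_id, chrom, start, end, strand):
    """Build one legacy fastacmd command line for a 1-based coordinate range."""
    if start < 1 or end < start:
        raise ValueError(f"bad range for {gene_id}: {start},{end}")
    args = [
        "fastacmd",
        "-d", db_path,              # formatdb-indexed genome
        "-s", chrom,                # sequence ID within the multifasta
        "-L", f"{start},{end}",     # range to extract
        "-S", STRAND_FLAG[strand],  # strand of the extracted range
        "-o", f"{gene_id}.fasta",   # one output file per gene
    ]
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical rows: (geneId, chr, start, end, strand)
rows = [
    ("example_gene_1", "chr1", 1000, 2000, "r"),
    ("example_gene_2", "chr1", 5001, 6000, "f"),
]
for gene_id, chrom, start, end, strand in rows:
    print(fastacmd_command("/path/to/genome_data", gene_id, chrom, start, end, strand))
```

The printed lines can be written to a shell script and run on a machine where the legacy BLAST package is installed.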

We don't make 3' UTR data directly available, but you can derive them easily from our database tables, which are available for download, if you have access to MySQL and a scripting language.
First download the appropriate genome MySQL database from http://www.plantgdb.org/XGDB/phplib/download.php?GDB=Xx where Xx is the Genus/species abbreviation, e.g. Zm for maize. Once you create the database locally you can derive the coordinates as follows:

  • Find the table that stores gene model information; it is named either chr_gene_annotation (for chromosome-based browsers) or gseg_gene_annotation (for BAC or scaffold-based browsers).
  • The relevant columns are chr (or gseg_gi), l_pos, r_pos, CDSstart, CDSstop and strand.
  • A query such as the following will build a tabular output featuring the 3'UTR chr/coordinates, length and direction:
	SELECT geneID, chr, IF(strand='f', CDSstop, l_pos) AS left_position,
	IF(strand='f', r_pos, CDSstop) AS right_position,
	IF(strand='f', r_pos-CDSstop, CDSstop-l_pos) AS length, strand
	FROM chr_gene_annotation;

Once you have the coordinates you can build a script to retrieve the data from the genome sequence (which is also available from the same download page referenced above), using fastacmd or perl, python or similar scripting language.
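
A minimal Python sketch of that retrieval step is shown below. It assumes the chromosome sequence has already been read into a string and that the coordinates are 1-based and inclusive; the function name `extract_region` is hypothetical, and reverse-strand regions are returned as the reverse complement.

```python
# Translation table for complementing DNA (N is left unchanged)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(chrom_seq, left, right, strand):
    """Return chrom_seq[left..right] (1-based, inclusive), oriented by strand."""
    if left < 1 or right > len(chrom_seq) or left > right:
        raise ValueError(f"coordinates out of range: {left}-{right}")
    region = chrom_seq[left - 1:right]
    if strand == "r":
        # Reverse-strand features are reported 5'->3' on the opposite strand
        region = region.translate(_COMPLEMENT)[::-1]
    return region


# Toy sequence standing in for a chromosome
toy = "AAACCCGGGT"
print(extract_region(toy, 2, 5, "f"))
print(extract_region(toy, 2, 5, "r"))
```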

xGDB supports the DAS (Distributed Annotation Service) standard for cross-platform data display, and provides both DAS client and DAS server capabilities. Several PlantGDB genomes have DAS-served data; see DAS Services for details.

For more information on DAS, see the Genome Browser Help Page.

3. PlantGDB Sequences & PUT Assemblies

PlantGDB downloads GenBank and UniProt sequence data approximately every four months, corresponding to every other GenBank Release. Sequence data is parsed according to a database schema, and individual sequence files are filtered to detect vector and repeat sequence. When you download FASTA-formatted sequence data from PlantGDB, you may see differences in the masking of repeat or vector regions, but the sequence is otherwise identical.

  • PUT = PlantGDB-assembled Unique Transcript. PlantGDB regularly assembles transcript sequences (EST and cDNA) for species with >10,000 sequences in GenBank, as well as by request for smaller or combined datasets. The resulting sequence assemblies (PUTs) are made available for search, download, BLAST, and spliced alignment using GeneSeqer.
  • PUT assemblies include both contigs (comprising multiple sequences) and singletons. They are named according to version number, genus_species, and sequence number.
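
To illustrate the naming scheme, the sketch below splits a PUT identifier into its parts. The pattern is an assumption inferred from a single example ID that appears in this FAQ's Search URL (PUT-157a-Oryza_sativa-6232); other PUT IDs may not follow it exactly, and the function name `parse_put_id` is hypothetical.

```python
import re

# Assumed layout: PUT-<version>-<Genus_species>-<sequence number>
_PUT_ID = re.compile(r"^PUT-(?P<version>[^-]+)-(?P<species>.+)-(?P<number>\d+)$")


def parse_put_id(put_id):
    """Split a PUT identifier into (version, genus_species, sequence_number)."""
    match = _PUT_ID.match(put_id)
    if match is None:
        raise ValueError(f"not a recognized PUT ID: {put_id!r}")
    return match["version"], match["species"], int(match["number"])


print(parse_put_id("PUT-157a-Oryza_sativa-6232"))
```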

For more information visit the EST Assembly Page (Home>Left Menu>EST Assembly).

You can download sequence for any plant species by going to the Download portal (Home>Download>Sequence). Enter Genus/species and click 'Search'. (For popular species, use the shortcut "Featured Species" on the Home Page left menubar.)

To download PUT assemblies, go to the EST contig Download portal (Home>EST Assembly>Download)

To download large datasets, visit our ftp site at ftp.plantgdb.org where you can download all PUT assemblies or plant sequences using ftp.

PlantGDB's sequence data is updated every 4 months, coinciding with every other GenBank Release (odd numbers). For example, recent updates included V.165 (April 2008) and V.163 (December 2007).

If you visit the Download page for any species, you can retrieve files named as:

  • Genus_species.PUT_member.txt
  • Genus_species.alignment.txt

Both files provide the mapping of ESTs to PUTs.

Alternatively, you can view or retrieve the EST components of an individual PUT from its "Search" page, e.g.

http://www.plantgdb.org/search/display/data.php?Seq_ID=PUT-157a-Oryza_sativa-6232

PlantGDB's sequence data is updated every 4 months, coinciding with every other GenBank Release (odd numbers). For example, recent updates included V.165 (April 2008) and V.163 (December 2007).

PlantGDB's taxonomic conventions will always reflect NCBI's current naming system since our data source is GenBank. Check the current taxonomic name for your species using GenBank's Taxonomy browser. It is possible that the genus and/or species name has changed.

Latest News

Latest update: August 23, 2010


GenBank Release 179 (8-23)
GenBank Release 179.0 sequence data (close date 8-14-2010) are now being downloaded and parsed at PlantGDB. We expect the 179 version update, including updated PUT assemblies, to be complete by mid-September (August 23, 2010).
CpGAT reports exon origins (8-19)
The CpGAT* tool for automated, real-time gene structure annotation now reports the details of each exon origin (whether evidence ID, PASA assembly, or ab initio-derived) in the GFF3 output file. Access CpGAT from any PlantGDB genome browser.
*Comprehensive plant Gene Annotation Tool, release 1.05
Community Annotation Features (8-5)
We have updated the yrGATE tool and the CommunityCentral database for community annotation. Changes include a more consistent annotation naming scheme and easier-to-navigate annotation tables. PLEASE NOTE that if you have previously annotated genes, the ID has been changed; the former ID is now part of the Description field and can be retrieved using Search. The Community Annotation system is available for all genome browsers at PlantGDB.
New AcDs Tagging Data (8-3)
AcDs Tagging insertion data for maize have been updated to include redundancy information, confirmation data, and Southern blot and iPCR images. See example of a Ds Insertion Line.

What's Coming?

Maize RefGen_v2 genome browser
The revised pseudomolecule assembly of maize inbred B73 (RefGen_V2) was released on March 5, 2010 by the Arizona Genomics Institute. We are busy running spliced alignments of maize EST, cDNA, PUT, and related-species protein sequences in anticipation of releasing a new version of ZmGDB (expected May, 2010).
Manihot esculenta (cassava) genome browser
The recently-released Manihot esculenta draft genome consists of 11,243 scaffolds spanning 416 Mb. PlantGDB will release a cassava genome browser incorporating EST, cDNA, PUT and related-species protein alignments as well as published gene models (expected June, 2010).
Prunus persica (peach) genome browser
The recently-released Prunus persica draft genome consists of 8 pseudomolecules (scaffolds) and an additional 194 nonlinked scaffolds spanning 227.3 Mb. PlantGDB will release a peach genome browser incorporating EST, cDNA, PUT and related-species protein alignments as well as published gene models (expected June, 2010).
