Functional annotation of the potato genome

The potato genome assembly is available through the Potato Genome Sequencing Consortium (v.3; http://potatogenomics.plantbiology.msu.edu/index.html). Genes were predicted ab initio with parameters trained for A. thaliana, and also on the basis of sequence similarity to four other plant genomes (ref). Functional annotation of the predicted genes was done by identifying orthologous and paralogous gene families in 12 sequenced plant species with OrthoMCL (ref). A MapMan mapping file based on sequence identity to Arabidopsis is also available.

On 11 July last year the sequence of the potato (Solanum tuberosum) genome (850 Mbp) was published (ref). During the sequencing project the potato genome consortium ran into several problems caused by sequence heterogeneity, and in the end the genome could only be assembled successfully from a homozygous doubled-monoploid potato clone (S. tuberosum group Phureja). The genome structure of this clone differs greatly from that of the cultivars that are commonly studied, i.e. crop potato cultivars grown for food or as a source of starch for industrial use.

Currently, the functional annotation of the genes predicted in the sequenced genome is relatively poor. In addition, the predicted genes and their annotations need to be transferred from the reference genome to the cultivars under study. The goal is that the genome sequence will advance our understanding of molecular processes in the potato and ultimately facilitate advances in breeding.

Currently no public Gene Ontology (GO) annotation is available for the potato genome. We have set up Blast2GO (ref) in order to retrieve GO terms from the Arabidopsis TAIR resource, but have encountered problems with the MySQL settings. Ideally, we would like to create a system that retrieves functional gene information from several plant genomes based on sequence identity. The heterogeneity and tetraploidy of the potato genome also need to be taken into account, and a robust, unified system for naming gene variants and alleles should be established. Importantly, gene variants and alleles that belong to the same gene need to be linked and annotated appropriately, as do the related transcript and protein variants.
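As an illustration of the identity-based retrieval described above, the following is a minimal Python sketch, not a description of Blast2GO or of any existing pipeline. It assumes that potato genes have already been compared against reference proteins with BLAST and that the hits are available in the standard 12-column tabular format (query, subject, percent identity, ..., bit score). The function names, the 60 % identity cut-off, the gene identifiers and the GO assignments in the example data are all hypothetical and chosen only for demonstration. For each potato gene, the sketch keeps the best-scoring hit per reference species and records which hit supports each transferred GO term.

```python
from collections import defaultdict


def parse_blast_tabular(lines):
    """Yield (query, subject, percent_identity, bitscore) from 12-column BLAST tabular lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield fields[0], fields[1], float(fields[2]), float(fields[11])


def transfer_go_terms(blast_lines, reference_go, min_identity=60.0):
    """Transfer GO terms from the best hit per reference species.

    reference_go maps a reference gene ID to (species, set of GO terms).
    min_identity is an assumed cut-off, not a recommended value.
    Returns {query: {go_term: [(species, subject, identity), ...]}}.
    """
    best = {}  # (query, species) -> (bitscore, subject, identity)
    for query, subject, identity, bitscore in parse_blast_tabular(blast_lines):
        if identity < min_identity or subject not in reference_go:
            continue
        species = reference_go[subject][0]
        key = (query, species)
        if key not in best or bitscore > best[key][0]:
            best[key] = (bitscore, subject, identity)

    transferred = defaultdict(lambda: defaultdict(list))
    for (query, species), (_, subject, identity) in best.items():
        for term in reference_go[subject][1]:
            # keep the supporting hit so each annotation can be traced back
            transferred[query][term].append((species, subject, identity))
    return {query: dict(terms) for query, terms in transferred.items()}


def _hit(query, subject, identity, bitscore):
    """Build one illustrative tabular line; the unused columns hold dummy values."""
    cols = [query, subject, str(identity), "200", "50", "2",
            "1", "200", "1", "200", "1e-50", str(bitscore)]
    return "\t".join(cols)


# Made-up example data: IDs and GO assignments are illustrative only.
EXAMPLE_REFERENCE_GO = {
    "AT1G01010": ("arabidopsis", {"GO:0003700"}),
    "AT1G01020": ("arabidopsis", {"GO:0016020"}),
    "Solyc01g005000": ("tomato", {"GO:0003700", "GO:0006355"}),
}
EXAMPLE_BLAST = [
    _hit("potato_gene_1", "AT1G01010", 72.5, 250),
    _hit("potato_gene_1", "AT1G01020", 65.0, 120),
    _hit("potato_gene_1", "Solyc01g005000", 91.0, 400),
    _hit("potato_gene_2", "AT1G01020", 45.0, 80),  # below the identity cut-off
]

if __name__ == "__main__":
    for gene, terms in transfer_go_terms(EXAMPLE_BLAST, EXAMPLE_REFERENCE_GO).items():
        for term, support in sorted(terms.items()):
            print(gene, term, support)
```

Keeping the supporting hit next to each GO term means that an annotation inferred from several species can be distinguished from one resting on a single match. Alleles of the same gene could later be given as separate query IDs and grouped under a shared gene identifier, once a naming scheme has been agreed on.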

An annotation effort is needed to make the potato genome more useful. In addition, genes should be linked into putative pathways based on sequence identity to genes in other plant species. The heterogeneity and tetraploid nature of the genome need to be considered carefully throughout.

