Test Case Title

Gene prediction modelling for plants

Test Case Acronym


Test Case Class


Contact person

Erik Andreasson and Hans Ronne - SLU Sweden



Test Case Description

There is no established workflow for gene modelling. We would like to test whether an optimal method for creating gene models can be found for plant species. One possibility is to run different pipelines in parallel, then verify and benchmark their output against available organism-specific proteomics data. The idea is to test several plant species to see whether the same workflow suits them equally well. Arabidopsis, rice and Physcomitrella could be used because of their evolutionary divergence and the availability of proteomic data. Potato would be of special interest for one of the actors (Erik Andreasson’s group).
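As an illustration of the benchmarking idea, the following minimal Python sketch ranks pipelines by the fraction of observed peptides that occur in their predicted proteins. It is only one possible scoring scheme and not an agreed method: the function names, the pipeline names, the exact-substring matching and the treatment of isoleucine and leucine as equivalent are all assumptions made for this example.

```python
def normalize(seq):
    """Upper-case a sequence and map I to L (assumption: the peptide
    data cannot distinguish the two isobaric residues)."""
    return seq.upper().replace("I", "L")


def peptide_support(predicted_proteins, peptides):
    """Fraction of unique observed peptides found as an exact substring
    of at least one predicted protein."""
    proteome = [normalize(p) for p in predicted_proteins]
    unique = {normalize(p) for p in peptides}
    if not unique:
        return 0.0
    matched = sum(1 for pep in unique if any(pep in prot for prot in proteome))
    return matched / len(unique)


def rank_pipelines(predictions, peptides):
    """Return (pipeline name, score) pairs sorted from best to worst.

    `predictions` maps a hypothetical pipeline name to the list of
    protein sequences translated from its gene models.
    """
    scores = {
        name: peptide_support(proteins, peptides)
        for name, proteins in predictions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only, not real sequences.
    predictions = {
        "pipeline_a": ["MKTAYIAKQR", "MSLLNDE"],
        "pipeline_b": ["MKTAYIAK"],
    }
    peptides = ["TAYIAK", "LLNDE", "QQQQ"]
    for name, score in rank_pipelines(predictions, peptides):
        print(f"{name}\t{score:.2f}")
```

A real comparison would also need to account for peptide identification confidence and for peptides shared between paralogues, which this sketch ignores.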

Background knowledge

Recently a Nature article was published (Nat Rev vol 13:329-342) discussing various pipelines for gene prediction. This article is a good starting point for discussing these issues and offers good suggestions of possible pipelines.

A related observation from RNA-seq data is that incorrect splicing and splice variants are common in plants. A better understanding of the possibilities and drawbacks of different gene prediction workflows would also help to address this issue.

Initial state of the Test case

Public sequencing and proteomic data are available for Arabidopsis, rice and Physcomitrella. Erik Andreasson’s group has both RNA-seq and proteomics data from potato. Various tools are used for gene prediction, e.g. Augustus and GeneMark.

Desired final state of the Test Case

A benchmark of different pipelines for plant gene prediction and, if possible, a “gold standard” for gene prediction in plants.

Test Case Work Plan

We can help by providing sequences (RNA-seq) and by finding suitable public proteomics data. For potato, proteomics data that could be used for benchmarking exist in-house.


This comparison of workflows could be very useful for the plant community and would establish whether a “gold standard” is possible or whether individual pipeline solutions should be found for each plant species.

LF: test case similar to TC14; a pipeline to benchmark gene prediction could be designed. Perhaps a contest similar to nGASP: http://wiki.wormbase.org/index.php/NGASP Invite people who organised such a contest.
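Where a trusted reference annotation exists, a benchmarking pipeline could score each prediction set with exon-level sensitivity and precision. The sketch below shows one way to compute them in Python. It is an illustration under stated assumptions, not the scoring scheme of any particular contest: exons are represented as hypothetical (sequence id, start, end, strand) tuples and only exact coordinate matches count as correct.

```python
def exon_level_scores(predicted, reference):
    """Return (sensitivity, precision) for exact exon matches.

    Each exon is a (seqid, start, end, strand) tuple; this representation
    is an assumption made for the example.
    """
    pred = set(predicted)
    ref = set(reference)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    precision = true_positives / len(pred) if pred else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    reference = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 500, 600, "+")]
    predicted = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
                 ("chr1", 510, 600, "+"), ("chr1", 700, 800, "+")]
    sens, prec = exon_level_scores(predicted, reference)
    print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Exact matching is deliberately strict: a prediction that misplaces one splice site counts as both a missed exon and a false one, so a fuller benchmark might also report nucleotide-level overlap.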

