Comparing dozens or hundreds of bacterial genomes is a challenging task, especially if those genomes are unfinished drafts. We would like to find a tool that can compare any number of closely related genomes, identify the core set of genes shared by all genomes, and produce, for each genome, a list of additional genes, a list of mutated core genes, and a list of absent genes.

Background knowledge

Draft genomes are usually incomplete, which makes gene prediction and classification difficult. In addition, once the number of draft genomes grows above 50, large-scale analysis probably becomes impossible.

Initial state of the Test Case

A series of draft genomes of closely related bacterial strains or species.

Desired final state of the Test Case

List of genes:

  • core genes found in all genomes
  • mutated core genes in each genome
  • additional genes in each genome
  • missing core genes in each genome

Test Case Work Plan

The participants will have to predict the genes, identify the core genome, and determine for each genome the sets of mutated, additional, and missing genes. The tool should cope with hundreds if not thousands of genomes…
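
As a rough illustration of the desired final state, the four lists could be derived as in the sketch below. It assumes gene prediction and ortholog clustering have already been done, so each genome arrives as a mapping from a cluster identifier to a gene sequence. The function name `summarize_genomes`, the `min_fraction` parameter, and the toy strains are hypothetical. Two simplifications are also assumptions rather than part of the test case: a core gene is called "mutated" when its sequence differs from the most common allele, and a relaxed core (`min_fraction` below 1.0) is used so that a draft genome can be reported as missing a core gene.

```python
from collections import Counter


def summarize_genomes(genomes, min_fraction=1.0):
    """Return (core, report) for a set of genomes.

    genomes: dict mapping genome name -> dict of cluster id -> sequence.
    min_fraction: fraction of genomes a cluster must occur in to count
    as core (1.0 = strict core, hypothetical parameter).
    """
    n = len(genomes)
    # Count in how many genomes each ortholog cluster occurs.
    counts = Counter(cid for genes in genomes.values() for cid in genes)
    core = {cid for cid, c in counts.items() if c / n >= min_fraction}

    # Assumption: the most common allele serves as the reference sequence.
    consensus = {
        cid: Counter(
            genes[cid] for genes in genomes.values() if cid in genes
        ).most_common(1)[0][0]
        for cid in core
    }

    report = {}
    for name, genes in genomes.items():
        present = set(genes)
        report[name] = {
            "additional": sorted(present - core),
            "missing": sorted(core - present),
            "mutated": sorted(
                cid for cid in core & present if genes[cid] != consensus[cid]
            ),
        }
    return sorted(core), report


# Toy input: three made-up strains with made-up cluster ids and sequences.
genomes = {
    "strainA": {"g1": "ATGAAA", "g2": "ATGCCC", "g3": "ATGTTT", "g5": "ATGACG"},
    "strainB": {"g1": "ATGAAA", "g2": "ATGCCC", "g4": "ATGGGG", "g5": "ATGACG"},
    "strainC": {"g1": "ATGAAG", "g2": "ATGCCC"},
}

strict_core, strict_report = summarize_genomes(genomes)
relaxed_core, relaxed_report = summarize_genomes(genomes, min_fraction=0.6)
```

With the strict definition, a core gene is by construction present in every genome, so the "missing" lists are always empty; the relaxed threshold is one possible way to make that list meaningful for incomplete drafts. This sketch holds everything in memory and does not address the scaling to thousands of genomes that the test case asks for.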


LF: This is my case. I would like to invite people involved in clustering (OrthoMCL, TGICL, etc.)

