Test Case Title

Identification of large structural variations

Test Case Acronym

NGSGAPS

Test Case Class

Plants (Animals)

Contact person

nd

Contact

nd

Test Case Description

Genome sequencing of closely related individuals has yielded valuable insights that link genome evolution to phenotypic variation. However, advances in sequencing technology have also led to a growing number of poor-quality draft genomes assembled against reference genomes that can contain highly divergent or haplotypic regions. In particular, the Illumina Genome Analyzer produces millions of short reads that are difficult to assemble. Initiatives such as the Arabidopsis 1001 Genomes Project have so far revealed mainly single nucleotide polymorphisms but have failed to uncover large structural variations relative to the reference genome sequence. Iterative mapping is currently a potential way to address these problems; however, it is time-consuming and requires manual adjustment, which makes it slow for large-scale projects.

Background knowledge

nd

Actors

nd

Initial state of the Test case

More than 100 Arabidopsis accessions have already been sequenced, so the data are available in databases and there is currently no need to generate new data, unless the existing data are of insufficient quality.

Desired final state of the Test Case

Identify the large structural variations within the genomes of the different Arabidopsis accessions using the currently available sequence data. Furthermore, a web resource presenting these data should be made publicly available.

Test Case Work Plan

The project should focus on developing software that, first of all, allows a rapid assessment of the quality of the sequence data. New methods for mapping short reads to a reference genome should be designed, allowing more flexible gap options, variable assembly parameters that depend on the local context of the genomic region, and iterative read mapping.
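A minimal sketch of the first step, the rapid quality check, could look like the following. It assumes FASTQ input with Phred+33 quality encoding (Sanger / Illumina 1.8+) and an arbitrary mean-quality threshold of Q20; `read_fastq`, `summarize` and `QualitySummary` are hypothetical names for illustration, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

# Assumption: Sanger / Illumina 1.8+ quality encoding (Phred+33).
PHRED_OFFSET = 33


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header[1:], seq, qual


@dataclass
class QualitySummary:
    n_reads: int
    mean_quality: float        # mean Phred score over all bases
    frac_reads_passing: float  # reads whose own mean quality >= min_mean_q
    frac_n_bases: float        # fraction of ambiguous 'N' base calls


def summarize(records: Iterable[Tuple[str, str, str]],
              min_mean_q: float = 20.0) -> QualitySummary:
    """Aggregate simple per-base and per-read quality statistics."""
    n_reads = n_pass = total_bases = total_q = n_bases_n = 0
    for _read_id, seq, qual in records:
        scores = [ord(c) - PHRED_OFFSET for c in qual]
        n_reads += 1
        total_bases += len(scores)
        total_q += sum(scores)
        n_bases_n += seq.upper().count("N")
        if scores and sum(scores) / len(scores) >= min_mean_q:
            n_pass += 1
    if n_reads == 0 or total_bases == 0:
        return QualitySummary(n_reads, 0.0, 0.0, 0.0)
    return QualitySummary(
        n_reads=n_reads,
        mean_quality=total_q / total_bases,
        frac_reads_passing=n_pass / n_reads,
        frac_n_bases=n_bases_n / total_bases,
    )


example = ["@r1", "ACGTN", "+", "IIII#", "@r2", "ACGTA", "+", "#####"]
print(summarize(read_fastq(example)))
# → QualitySummary(n_reads=2, mean_quality=17.2, frac_reads_passing=0.5, frac_n_bases=0.1)
```

Because `read_fastq` is a generator, a whole run can be streamed from an open file (`summarize(read_fastq(open(path)))`) without loading it into memory, which keeps the check fast on large Illumina runs.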

Discussion

LF: Identifying large structural variations with unfinished genomes is rather difficult. What is possible is to remap the paired-end or, better, the mate-pair reads and use that information. A number of tools can do this: breakdancer, Hydra, GASV, Tigra-SV, Delly. A general assessment of these tools could be a first step of the project. We could invite the developers of those tools; Ken Chen, Aaron Quinlan, gasv@cs.brown.edu and Tobias Rausch from EMBL would probably be the best.
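The read-pair signal described above can be sketched as follows: each mapped pair is classified by chromosome, mate orientation and insert size against the expected library distribution, and discordant pairs of the same type that lie close together are clustered into candidate events. This is a simplified illustration of the general principle, not the algorithm of any of the tools listed above. It assumes a forward-reverse (FR) paired-end library (mate-pair libraries are commonly reverse-forward and would need the orientation rules flipped), a fixed read length, and an insert-size mean and standard deviation already estimated from the library; `ReadPair`, `classify` and `cluster` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ReadPair:
    # Hypothetical record for a mapped pair: pos is the 0-based leftmost
    # mapped coordinate of each mate, strand is "+" or "-".
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def classify(pair: ReadPair, mu: float, sd: float,
             read_len: int = 100, k: float = 3.0) -> str:
    """Label a pair, assuming an FR library with insert size ~ mu +/- sd."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"   # translocation candidate
    # Order mates so that `left` is the one mapped further upstream.
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left[1] == right[1]:
        return "inversion"          # ++ or --: one mate in inverted sequence
    if left[1] == "-":
        return "everted"            # -+ : typical of tandem duplications
    insert = right[0] + read_len - left[0]
    if insert > mu + k * sd:
        return "deletion"           # mates further apart than expected
    if insert < mu - k * sd:
        return "insertion"          # mates closer together than expected
    return "concordant"


def cluster(pairs: Iterable[ReadPair], mu: float, sd: float,
            read_len: int = 100, window: int = 500,
            min_support: int = 2) -> List[Tuple[str, str, int, int]]:
    """Group same-type discordant pairs whose left mates lie within `window` bp.

    Returns (type, chrom, first_left_pos, n_supporting_pairs) per cluster.
    Interchromosomal pairs are skipped here to keep the sketch short.
    """
    by_type = defaultdict(list)
    for p in pairs:
        label = classify(p, mu, sd, read_len)
        if label not in ("concordant", "interchromosomal"):
            by_type[(label, p.chrom1)].append(min(p.pos1, p.pos2))
    calls = []
    for (label, chrom), starts in sorted(by_type.items()):
        starts.sort()
        group = [starts[0]]
        for s in starts[1:]:
            if s - group[-1] <= window:
                group.append(s)
            else:
                if len(group) >= min_support:
                    calls.append((label, chrom, group[0], len(group)))
                group = [s]
        if len(group) >= min_support:
            calls.append((label, chrom, group[0], len(group)))
    return calls


pairs = [
    ReadPair("chr1", 1000, "+", "chr1", 1200, "-"),  # insert 300: concordant
    ReadPair("chr1", 5000, "+", "chr1", 7000, "-"),  # insert 2100: deletion
    ReadPair("chr1", 5050, "+", "chr1", 7040, "-"),  # insert 2090: deletion
    ReadPair("chr1", 9000, "+", "chr1", 9300, "+"),  # ++: inversion, 1 pair only
]
print(cluster(pairs, mu=300, sd=30))
# → [('deletion', 'chr1', 5000, 2)]
```

Requiring several supporting pairs per cluster (`min_support`) is what filters out isolated mis-mapped pairs, such as the single inversion-type pair in the example; real tools add further evidence (mapping quality, split reads, local assembly) on top of this.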

public/loadedtestcases/tc2.txt · Last modified: 2012/09/28 15:57 by lfalquet