Authors
H Imamura¹; M Domagalska¹; F Dumetz¹; M Vanaerschot¹; J Cotton³; M Berriman³; J Vermeesch²; J C Dujardin¹
¹ Institute of Tropical Medicine, Antwerp, Belgium; ² KU-Leuven, Belgium; ³ Wellcome Trust Sanger Institute
Discussion
In 2011,
the first L. donovani reference genome, based on a clinical isolate from the
Indian subcontinent (BPK282/0cl4), was assembled using a combination of
454 and Illumina sequencing. Using the PacBio RS II sequencer
and P5-C3 chemistry, we have re-sequenced that genome, yielding
approximately 616,900 reads after filtering, with an average length of 8.4 kb (131x
coverage). The reads were assembled with SMRT Analysis tools, final
base and indel correction was carried out with iCORN using Illumina reads, and
annotations were added using Companion. The PacBio assembly comprises 36
chromosomes and, for the first time, a complete maxicircle. Assembly quality also
improved, with the number of gaps reduced from 2,142 in the previous reference to 20.
Complex regions were refined, including the intrachromosomal amplicons on chromosomes 23
(H-locus) and 36 (MPK1) and other repetitive regions such as the HSP70 and miniexon tandem arrays.
Telomeric regions were also improved. In addition, inversions and
misassemblies in the previous reference sequence were corrected, removing a
substantial number of false-positive variant calls. These improvements led
to better gene annotation and provided new insights into previously
poorly characterized regions. Such features are required for a
thorough investigation of the stability and plasticity of the Leishmania
genome at the population and single-cell levels.
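The sequencing yield quoted in this section (about 616,900 filtered reads, 8.4 kb mean length, 131x coverage) can be related through the standard mean-depth relation C = N x L / G, where N is the read count, L the mean read length and G the genome length. The following is a minimal Python sketch, assuming only the rounded figures given in the text. The function names are illustrative and are not part of SMRT Analysis or any other tool mentioned here.

```python
def total_bases(n_reads: int, mean_read_len_bp: float) -> float:
    """Approximate sequenced bases as read count x mean read length."""
    return n_reads * mean_read_len_bp


def mean_coverage(n_reads: int, mean_read_len_bp: float, genome_len_bp: float) -> float:
    """Mean sequencing depth C = N * L / G (Lander-Waterman)."""
    if genome_len_bp <= 0:
        raise ValueError("genome length must be positive")
    return total_bases(n_reads, mean_read_len_bp) / genome_len_bp


def implied_genome_len(n_reads: int, mean_read_len_bp: float, coverage: float) -> float:
    """Same relation rearranged: G = N * L / C."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_bases(n_reads, mean_read_len_bp) / coverage


if __name__ == "__main__":
    # Rounded values quoted in the text (hypothetical inputs for illustration only).
    n_reads, mean_len = 616_900, 8_400
    print(f"total bases: {total_bases(n_reads, mean_len) / 1e9:.2f} Gb")
    print(f"genome length implied by 131x: "
          f"{implied_genome_len(n_reads, mean_len, 131) / 1e6:.1f} Mb")
```

The read count and mean length are rounded, and a reported depth may be computed over a different set of bases (for example subreads or reads mapped to the assembly). Back-calculations like this one are therefore only a rough consistency check.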