Authors
T E Ebenezer2; M Zoltner3; V Hampl; M Ginger; A Jackson; H de-Koning4; J Lukes1; J Dacks; M Lebert; M Carrington2; S Kelly5; M C Field3
1 University of South Bohemia, Czech Republic; 2 University of Cambridge, UK; 3 University of Dundee, UK; 4 University of Glasgow, UK; 5 University of Oxford, UK
Discussion
Euglena gracilis is a major component of the aquatic
ecosystem and, together with closely related species, is ubiquitous. Euglenoids are an
important group of protists, possessing a secondarily acquired plastid,
and are relatives of the Kinetoplastidae, which themselves have global impact
as disease agents. To understand the biology of E. gracilis, and to
provide further insight into the evolution and origins of
Kinetoplastidae, we embarked on sequencing the nuclear genome. Earlier
studies suggested an extensive nuclear DNA content, likely with a high
degree of repetitive sequence, together with significant
extrachromosomal elements. To produce a dataset of coding sequences, we
combined published and new transcriptome data and
embarked on de novo genome sequencing using a combination of 454,
Illumina paired-end libraries and PacBio platforms, together with transcriptomic and proteomic analysis under light and dark adaptation. Preliminary analysis
suggests a surprisingly large genome approaching 2 Gbp, with highly
fragmented architecture and extensive repeat composition. Over 80 % of
the transcriptome maps to the genome, on par with T. brucei and T. cruzi.
As a view of genome architecture, we have analyzed the tubulin and
calmodulin genes together with several large genome contigs, which
highlight potential novel splicing mechanisms. Analysis of the
transcriptome and genome revealed a repertoire of E. gracilis
genes conserved across all eukaryotes (~ 55 %) and others shared with
specific eukaryotic lineages, as well as evidence of lateral gene transfer (LGT). Functional annotation
via a web interface (http://euglenadb.org) suggests that, of 36,526
predicted proteins, 23,833 (~ 46 %) have homology to NCBI-NR entries,
while 23,866 (~ 65 %) have hits against the InterPro database. Overall,
our data suggest that the Euglena genome is a chimera with highly significant proteome changes.