UNIVERSIDADE DE LISBOA
FACULDADE DE CIÊNCIAS
DEPARTAMENTO DE BIOLOGIA VEGETAL
MINING ICE IN GENOMES
COMPARATIVE GENOMICS OF INTEGRATIVE
ELEMENTS IN PROKARYOTIC GENOMES
Leonor Maria dos Santos Silva Tomé Quintais
MASTER'S DEGREE IN MOLECULAR BIOLOGY AND GENETICS
2010
DISSERTATION
Project supervised by Prof. Dr. Eduardo Rocha, Microbial Evolutionary Genomics group,
Institut Pasteur,
and co-supervised by Prof. Dr. Pedro Silva, Departamento de Biologia Vegetal,
Faculdade de Ciências da Universidade de Lisboa
Outline
DISSERTATION SUMMARY
SUMMARY
ABSTRACT
INTRODUCTION
DATA AND METHODS
Identification and Characterization of Elements
RESULTS AND DISCUSSION
Comparison of ICE and T4SS in Proteobacteria
Comparison of ICE and Conjugative Plasmids
FUTURE PERSPECTIVES
REFERENCES
DISSERTATION SUMMARY
Integrative and Conjugative Elements
Horizontal gene transfer (HGT), the transfer of genetic material between unrelated
organisms, is now known to play a fundamental role in the evolution of prokaryotes. HGT can
occur in three distinct ways: transformation, transduction and conjugation, the last of
which is thought to play the most prominent role. For conjugation to occur the cells must
be in direct contact, which is achieved through a multi-protein complex produced by the
donor cell and known as "Mating pair formation" (Mpf). The DNA is generally transferred as
a single strand and is subsequently converted to double-stranded DNA by the replication
machinery of the recipient cell. Essentially two types of elements account for this
decisive role of conjugation in horizontal gene transfer: plasmids and ICEs (integrative
and conjugative elements). Studying these elements is vitally important, since they are
the main vectors for the transmission of, for example, antibiotic resistance, virulence
factors and the production of antimicrobial products (Burrus and Waldor 2004).
This project focuses on ICEs, a group that includes all elements that are transferred by
conjugation and are able to integrate into the genome, regardless of the mechanisms by
which these two processes occur (Burrus and Waldor 2004). Once integrated, ICEs replicate
with the host genome; when their excision is induced, these elements circularize and a
replication step takes place, followed by the transfer of one of the copies to the
recipient cell by conjugation. This copy integrates into the genome of the recipient cell,
and the copy that remains in the donor cell can also reintegrate into its genome. These
elements therefore display features of transposons, phages and plasmids. Like transposons,
they integrate into and excise from the chromosome, but transposons are not transferred
from one cell to another. Like temperate phages, they integrate into the host chromosome
and replicate with it, but phages are transmitted by transduction rather than by
conjugation. Like plasmids, ICEs are transmitted by conjugation, but plasmids do not
depend on the host chromosome to replicate and are maintained as extrachromosomal circular
structures.
ICEs are present in all major bacterial divisions and include, for example, elements
classified as conjugative transposons and pathogenicity islands (Burrus, Pavlovic et al.
2002; Burrus and Waldor 2004). Unlike plasmids, which were discovered with the advent of
molecular biology and have been studied ever since, ICEs have been studied only recently,
and it is not known how many systems exist in genomes, nor what their size or gene content
is.
The core structure of ICEs comprises three modules: maintenance, transmission and
regulation (Toussaint and Merlin 2002). Besides these essential functions, ICEs generally
contain a large number of other genes that confer adaptive potential on the host, as
mentioned above. The maintenance module encodes a recombinase, the protein responsible for
integrating the ICE into the host genome. The recombinase families most widely described
in ICEs are the tyrosine and serine recombinases (Wang, Roberts et al. 2000). However,
recent studies have shown that DDE-type transposases can also perform this role (Brochet,
Da Cunha et al. 2009).
The transmission module encodes the conjugation system, normally a type IV secretion
system (T4SS) (Cascales and Christie 2003). There are four main types of T4SS, three of
them defined on the basis of the incompatibility group of conjugative plasmids: IncF
(plasmid F), IncP (plasmid RP4) and IncI (plasmid R64) (Lawley, Klimke et al. 2003). The
fourth type of T4SS, ICEHin1056, was recently identified in genomic islands (Juhas, Crook
et al. 2007).
The regulation module contains genes that regulate the transfer of the element. Although
little is known about how it works, recent studies show that the presence of tetracycline
or activation of the SOS response induces conjugation (Beaber, Hochhut et al. 2004).
Although all ICEs share a common structure, the fact that the modules and the proteins
they encode can differ greatly gives them plasticity. These elements are also responsible
for the plasticity of the host genome: since integration sites are shared by different
ICEs, and since ICEs may have a broad host range, such sites are a source of intra-species
and inter-genus variability. In addition, ICEs often contain genes and sequences (such as
the recombinase mentioned above and insertion sequences) that facilitate the recruitment
of other genes into the structure of the element. If this integration occurs at a specific
locus, a set of horizontally transferred genes can then be conserved and transmitted
between bacteria (Burrus and Waldor 2004).
This project is the first initiative to identify and quantify ICEs on a large scale. The
analysis used all 1055 prokaryotic genomes sequenced to date.
To identify ICEs, the search focuses on the conjugation mechanism, the T4SS; that is, it
proceeds by identifying homologs of the proteins of this system. This choice is justified
because conjugation is the feature that unequivocally distinguishes ICEs from all other
mobile elements integrated in the genome. Once the elements are identified, recombinases
can be searched for in their vicinity.
The T4SS subfamily responsible for DNA transfer during bacterial conjugation is known as
Mpf/CP ("Mating pair formation/coupling protein"). The typical Mpf (Ti plasmid) is
composed of eleven conserved proteins, VirB1-VirB11, which form the pilus that establishes
contact between the two cells. The coupling protein, also called VirD4, promotes the
transport of the ICE to the Mpf system so that conjugation can occur (Cascales and
Christie 2004; Schröder and Lanka 2005). VirB4, one of the proteins of the secretion
system, appears to be present in the conjugative systems of both Gram-negative and
Gram-positive bacteria, even though the latter do not form a pilus and use adhesins to
establish cell contact (Juhas, Crook et al. 2008). A previous study from our laboratory on
plasmids showed that in 98% of cases the presence of VirB4 corresponds to the presence of
the entire complex (Smillie C., Garcillian M. et al.).
Conjugation requires a relaxase. In plasmids, these proteins are responsible for the
initial cleavage of the DNA, to which they remain attached. It is this nucleoprotein
complex that is recognized by VirD4 and transported to the recipient cell through the T4SS
(Llosa, Gomis-Rüth et al. 2002).
Method
As described above, the identification of ICEs was based on type IV conjugation systems
and on the relaxase. Since the presence of VirB4 is strongly associated with the presence
of the entire complex, the "Mating pair formation/Coupling protein" systems were
identified through VirB4 and VirD4. In Proteobacteria, owing to the amount of data
available (they represent more than 50% of all sequenced prokaryotic genomes) and to the
fact that the T4SS types were identified in this clade, it was possible to identify not
only these two proteins but also the proteins specific to each T4SS.
To complete the analysis, relaxases of the six described families were also identified
(MOBc, MOBf, MOBh, MOBphen, MOBq and MOBv). Relaxases specific to Firmicutes and
Bacteroidetes were also analysed (Flannagan, Zitzow et al. 1994; Xu, Bjursell et al.
2003).
All the MOBs, T4CPs, VirB4 and other T4SS-specific proteins that we used in this study as
the basis for the genome search were identified in plasmids, in the work from our
laboratory mentioned above. The search was carried out on 1055 genomes available in the
NCBI database.
Homologs of the different proteins were searched for in the genomes with HMMER (Durbin,
Eddy et al. 1999), a program that analyses sequences using profile hidden Markov models,
building matrices with position-specific information (HMM profiles). Once all proteins had
been located in the genomes, the T4SS were identified. All specific genes lying within a
maximum distance of 25 positions of each other in the genome were classified as belonging
to the same element. Every element found was evaluated manually, and adjacent elements
that complemented each other were merged. VirB4, T4CP and MOB were then searched for in
the 25 genes upstream and downstream of the elements. Manual evaluation was again
performed. This analysis could only be carried out in Proteobacteria, so in the remaining
organisms only the presence of VirB4, T4CP and MOB was assessed. Once the elements had
been identified, BLASTP hits with a good e-value were also evaluated. If a cluster was
incomplete and the protein that would complete it, found by BLASTP (Altschul, Gish et al.
1990), was nearby, that protein was included in the element. These data were used only to
complement the main analysis, since they produced a large number of false positives.
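The 25-position grouping rule described above can be sketched as follows. This is a minimal illustration and not the pipeline used in this work: it assumes that each hit is represented by the rank of its gene along a single replicon, that clusters are built by chaining consecutive hits (single linkage), and that replicon circularity is ignored. The function name `cluster_hits` and the example positions are hypothetical.

```python
from typing import List

MAX_GAP = 25  # maximum distance, in gene positions, stated in the text


def cluster_hits(positions: List[int], max_gap: int = MAX_GAP) -> List[List[int]]:
    """Group gene positions so that consecutive hits in a cluster are at most max_gap apart."""
    clusters: List[List[int]] = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            # Close enough to the previous hit: extend the current cluster.
            clusters[-1].append(pos)
        else:
            # Too far from the previous hit (or first hit): start a new cluster.
            clusters.append([pos])
    return clusters


# Hypothetical hit positions (gene ranks along one replicon).
hits = [10, 12, 30, 80, 200, 210]
print(cluster_hits(hits))  # [[10, 12, 30], [80], [200, 210]]
```

Chaining means that a cluster can span more than 25 genes in total as long as no two neighbouring hits are further apart than the threshold; a stricter reading of the rule would cap the total span instead.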
Once all elements had been identified, they were classified. The minimum number of genes
that each T4SS type must contain to be considered complete was determined. An element with
a complete T4SS, a MOB, VirB4 and T4CP was designated an ICE. Elements without a MOB but
with a complete T4SS were designated T4SS; that is, they are systems devoted exclusively
to protein secretion, since without a MOB they cannot be mobilized. Incomplete but
possibly mobilizable elements, i.e. those carrying a MOB, were classified as pseudo-ICE
(pICE), since they appear to correspond to ICEs undergoing pseudogenization. Incomplete
elements lacking a MOB were classified as pseudo-T4SS: they are not mobilizable and cannot
secrete proteins.
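The four classes defined above can be written as a small decision rule. The sketch below is one possible reading of the text, not the code used in this work: it assumes that an element carrying a MOB but lacking any other required component counts as a pICE, and that elements without a MOB are split only by the completeness of their T4SS. The function name and boolean arguments are hypothetical.

```python
def classify_element(t4ss_complete: bool, has_mob: bool,
                     has_virb4: bool, has_t4cp: bool) -> str:
    """Assign an element to ICE, T4SS, pICE or pseudo-T4SS (assumed reading of the rules)."""
    if has_mob:
        # Mobilizable: a full ICE only if every other component is present.
        if t4ss_complete and has_virb4 and has_t4cp:
            return "ICE"
        return "pICE"
    # No relaxase: the element cannot be mobilized.
    return "T4SS" if t4ss_complete else "pseudo-T4SS"


print(classify_element(True, True, True, True))     # ICE
print(classify_element(True, False, True, True))    # T4SS
print(classify_element(False, True, True, False))   # pICE
print(classify_element(False, False, True, False))  # pseudo-T4SS
```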
Results - summary
This work led to two main conclusions. In Proteobacteria, the distribution of ICEs and
T4SS depends on genome size. Medium (3-5 Mb) and large (>5 Mb) genomes carry mostly ICEs,
whereas small genomes (<3 Mb) carry mostly T4SS. This can be explained by the fact that
the small-genome group includes endosymbiotic bacteria, which use protein secretion
systems to mediate the interaction with their hosts. However, we found that organisms with
small genomes also carry, contrary to what one would intuitively expect, a high number of
ICEs undergoing pseudogenization.
One of the most widely accepted hypotheses is that ICEs are the main agents of horizontal
gene transfer in Firmicutes, whereas in Proteobacteria this role is played by conjugative
plasmids. In this work we found that in Firmicutes the number of ICEs is indeed markedly
higher than the number of conjugative plasmids. Contrary to expectation, in Proteobacteria
too, although the difference is less pronounced, we identified more ICEs than conjugative
plasmids. We can therefore now state that, in Proteobacteria, ICEs appear to play a role
at least as important as that of conjugative plasmids in horizontal gene transfer,
contrary to what had been assumed so far.
SUMMARY
Integrative and conjugative elements (ICEs) are a very diverse group of mobile genetic
elements characterized by sharing features of phages and plasmids. Like temperate phages,
ICEs integrate into the host genome and depend on it for their replication; like plasmids,
they are transmitted to other cells by conjugation. These elements are therefore
responsible for horizontal gene transfer (HGT) in prokaryotes.
Their structure comprises three modules: maintenance, dissemination and regulation. The
maintenance module encodes the recombinase, the protein responsible for integrating the
ICE into the host genome. The dissemination module includes the conjugative system,
typically a type IV secretion system (T4SS). The regulation module is composed of genes
that regulate the transfer of the elements. However, although the study of ICEs is of
enormous clinical importance, since they transmit traits such as antibiotic resistance,
production of virulence factors or even biofilm production, little is yet known about
their gene content, their size, and the integration and conjugation mechanisms they use.
This is the first study to identify ICEs in all sequenced prokaryotic genomes. In the 1055
available genomes, we identified 315 ICEs and characterized their distribution.
One of the most widely accepted hypotheses about the role of ICEs in HGT holds that they
play a dominant role in Firmicutes, whereas in Proteobacteria conjugative plasmids are the
real agents of horizontal gene transfer. Using data from a previous study from our
laboratory that characterized plasmid mobility, we found that this relationship does not
appear to hold.
Since this is a pioneering study, our results may open new avenues in ICE research.
Keywords: Horizontal Gene Transfer (HGT); Integrative and Conjugative Elements (ICE);
Prokaryotes; Comparative Genomics; Type IV Secretion Systems (T4SS).
ABSTRACT
Integrative conjugative elements (ICEs) are a diverse group of mobile genetic elements
characterized by their dual phage-like and plasmid-like behaviour. Like temperate phages,
ICEs integrate into the host chromosome and replicate with it; like plasmids, they are
transferred by conjugation. These elements contribute to horizontal gene transfer (HGT) in
prokaryotes and are responsible for the transmission of traits such as antibiotic
resistance, virulence factors and biofilm formation. Their core structure can be divided
into three modules: maintenance, dissemination and regulation. The maintenance module
encodes a recombinase, which is responsible for ICE integration into host replicons. The
dissemination module includes the conjugation system, typically a type IV secretion system
(T4SS). The regulation module comprises the genes that regulate ICE transfer. Because the
study of ICEs is very recent, little is known about their cargo content, their size, and
how they conjugate and integrate into the host genome. Studying these elements is
therefore of vital importance.
This is the first large-scale study to identify integrative conjugative elements in all
sequenced prokaryotic genomes. In the 1055 available genomes, we identified 315 ICEs and
characterized their distribution. We also identified T4SS not involved in conjugation and
compared their distribution with that of ICEs. Using data from a previous study from our
laboratory that characterized plasmid mobility, we compared the T4SS involved in the
conjugation of ICEs with those of conjugative plasmids. Our results contradict the
mainstream idea that ICEs are the major contributors to HGT in Firmicutes whereas
conjugative plasmids play that role in Proteobacteria. Because this is a pioneering study,
its results may open new avenues of research in this field.
Keywords: Horizontal Gene Transfer (HGT); Integrative Conjugative Elements (ICE);
Prokaryotes; Comparative Genomics; Type IV Secretion System (T4SS).
INTRODUCTION
It is now widely accepted that horizontal gene transfer (HGT) has deeply shaped the
evolution of prokaryotes. The mechanisms of HGT are transformation, phage transduction
and conjugation. The latter is thought to play the major role in HGT, through both
plasmids and integrative conjugative elements (ICEs). The study of these elements is of
vital importance, since they are known to make their hosts resistant to antibiotics and
heavy metals (Waldor, Tschape et al. 1996; Rice 1998; Boltner, MacMahon et al. 2002;
Whittle, Shoemaker et al. 2002; Davies, Shera et al. 2009), able to synthesize
antimicrobial products (Burrus and Waldor 2004) or able to degrade aromatic compounds
(Ravatn, Studer et al. 1998). More complex traits have also been reported, e.g. the
colonization of new hosts (Sullivan and Ronson 1998), virulence, biofilm formation and
nitrogen fixation (Drenkard and Ausubel 2002; He, Baldini et al. 2004; Davies, Shera et
al. 2009). Because ICEs propagate antibiotic resistance, their investigation is of
particular clinical importance (Hochhut, Lotfi et al. 2001; Mohd-Zain, Turner et al.
2004).
The focus of the current project is on integrative conjugative elements, a diverse group
that includes all self-transmissible elements that integrate and conjugate, regardless of
the mechanisms by which integration and conjugation occur (Burrus and Waldor 2004). They
encode not only the machinery for excision and conjugation, but also complex regulatory
systems that control these processes. An ICE integrates into the host chromosome and
replicates with it; when excision is induced, the element circularizes, replicates and is
transmitted by conjugation to a recipient cell. As a result, one copy of the element is
inserted into the new host chromosome, while the copy that remains in the donor cell can
reintegrate. ICEs thus combine transposon-like, phage-like and plasmid-like features.
Like transposons, they integrate into the chromosome and excise from it, but transposons
are not transferred from one cell to another. Like temperate phages, they integrate into
the host chromosome and replicate with it, but phages are not transmitted by conjugation.
Like plasmids, they are transferred by conjugation, but ICEs depend on the chromosome to
replicate and are not maintained in circular form.
ICEs are present in all major divisions of bacteria and include, for example, elements
classified as conjugative transposons (which normally require minimal sequence
specificity), such as Tn916 (Lu and Churchward 1995), and mobile pathogenicity islands,
such as ICEclcB13 (Ravatn, Studer et al. 1998; Burrus, Pavlovic et al. 2002; Burrus and
Waldor 2004). Unlike plasmids, which were discovered in the early days of molecular
genetics and have been studied ever since, ICEs have been studied only comparatively
recently, and consequently there is a large gap in knowledge regarding them. Some
fundamental questions remain unanswered, such as the number of systems present in genomes,
their size, their gene content and their processes of conjugation and integration. Despite
their importance, there are very few studies on the comparative and evolutionary genomics
of ICEs.
The core structure of an ICE consists of three modules, for maintenance, dissemination and
regulation (Toussaint and Merlin 2002). Apart from these essential functions, ICEs often
carry a large number of unrelated genes that add adaptive functions to bacterial gene
repertoires, as mentioned above.
The maintenance module encodes the proteins responsible for the integration of the ICE
into host replicons, such as chromosomes or plasmids, and for its excision from them.
Integration is mediated by an integrase, which is necessary and sufficient for this
process. This protein is also responsible for the excision of the element from the
chromosome, but in most cases excision requires additional factors. The tyrosine
recombinase family is the most widely described recombinase family in ICEs, and its
prototypical member is the λ phage integrase. This protein recognizes identical or highly
similar sequences in the host chromosome (the attB site) and in the phage (the attP site),
promoting site-specific recombination without deletions or sequence duplications (Kikuchi
and Nash 1979). Several ICE integrases, such as those of the SXT-R391 family, use a
mechanism similar to that of λ phage (Beaber, Hochhut et al. 2002) and promote integration
into the 3’ end of transfer RNA (tRNA) genes. In most of the described cases, integration
occurs at only one particular locus, even though the bacterium possesses multiple copies
of the same tRNA gene. However, exceptions are known, such as the integrase of ICEclcB13
(Gaillard, Vallaeys et al. 2006), which does not depend completely on the typical attB
sequence (Burrus and Waldor 2003; Lee, Auchtung et al. 2007). The tyrosine recombinase
family also includes proteins with a different origin from the λ phage integrase (Rajeev,
Malanowska et al. 2009), such as the integrase of Tn916. This integrase shows less
sequence specificity, integrating for example into AT-rich or bent sequences (Lu and
Churchward 1995). Some proteins responsible for ICE integration do not, however, belong to
the tyrosine recombinase family. This is the case for the DDE-type transposase encoded by
TnGBS2 (Brochet, Da Cunha et al. 2009) and for the serine recombinase encoded by Tn5397
(Wang and Mullany 2000).
As mentioned above, the integrase is necessary but not always sufficient for excision,
which is required to create the circular extrachromosomal form of the ICE that is
transferred to another host. Excision often requires recombination directionality factors
(RDFs). RDFs are small DNA-binding proteins that bias the action of the integrase towards
excision rather than integration by influencing the formation of specific protein-DNA
architectures (Lewis and Hatfull 2001). ICE excision may also be influenced by
environmental factors, as shown for ICEclcB13, whose excision increases in stationary
phase (Ravatn, Studer et al. 1998).
If the host cell replicates after ICE excision, the element can be lost. Therefore, some
ICEs also encode factors that prevent their own loss. One example is a homolog of Soj, a
protein implicated in plasmid maintenance, which is present in the ICE PAPI-1. Wild-type
PAPI-1 is lost in 0.16% of the cells, whereas all host cells lose the element in the
absence of the Soj homolog (Klockgether, Reva et al. 2004; Qiu, Gurkar et al. 2006).
Although the mechanism is not yet fully understood, this protein is expressed only after
excision, and it has therefore been proposed that its role is to stabilize the
extrachromosomal form of the ICE.
The dissemination module encodes the proteins responsible for DNA processing after
excision and for the transfer of one copy of the element. Most models of DNA processing in
ICEs are derived from those of plasmids, in which conjugative DNA processing starts with
the relaxase, a protein that cleaves the DNA at the origin of transfer and thereby
initiates rolling-circle replication. The relaxase remains attached to the single-stranded
DNA (ssDNA), and the resulting nucleoprotein complex is transported to the recipient cell
via the mating pore (Llosa, Gomis-Rüth et al. 2002). Since the copy number of the element
in the donor cell does not increase, ICEs are regarded as not truly replicative. Although
ICEs are thought to transfer as ssDNA, there are exceptions: as with plasmids, some ICEs
from Actinobacteria are transferred as double-stranded DNA (dsDNA) by a different
mobilization mechanism that is active in the mycelia (Grohmann, Muth et al. 2003).
The conjugation system of ICEs is typically a type IV secretion system (T4SS) (Cascales
and Christie 2003). The subfamily of T4SS responsible for DNA transfer during bacterial
conjugation is known as Mpf/CP (Mating pair formation/Coupling Protein, the latter also
called VirD4). The prototypical Mpf (from the Ti plasmid) consists of eleven conserved
proteins, VirB1-VirB11, which form the membrane-spanning complex and the surface pilus
that establishes contact with the recipient bacterium (Schröder and Lanka 2005). VirD4 is
an NTP-binding protein that probably plays two roles in conjugation: first, it is the
first component of the secretion machinery that comes into contact with the nucleoprotein
complex (Cascales and Christie 2004); second, it couples this complex to the secretion
pore formed by the Mpf system (Schröder and Lanka 2005), where it is thought to help
energize the secretion machinery (Schröder, Krause et al. 2002). The only protein of this
complex found to be ubiquitous in the conjugative systems of both Gram-negative and
Gram-positive bacteria is VirB4 (Juhas, Crook et al. 2008), even though the surface
proteins produced by the latter work as non-specific adhesins instead of forming a pilus.
VirB4 is an inner membrane protein that energizes the secretion machinery (Dang, Zhou et
al. 1999; Schröder and Lanka 2005).
There are four major types of T4SS. Three of them were identified based on the
incompatibility group of the representative conjugative plasmids: IncF (plasmid F), IncP
(plasmid RP4) and IncI (plasmid R64) (Lawley, Klimke et al. 2003). The fourth type,
ICEHin1056, was recently identified in genomic islands (Juhas, Crook et al. 2007). These
systems will be referred to as T4SS-F, T4SS-T, T4SS-I and T4SS-G, respectively.
The regulation module comprises the genes that regulate ICE transfer. Although little is
known about their activity, studies have shown that conjugation is induced by the presence
of tetracycline or by activation of the SOS response by DNA-damaging agents (Stevens,
Shoemaker et al. 1990; Beaber, Hochhut et al. 2004).
Although all ICEs have a common backbone, their structure is plastic, as the modules and
the proteins they encode may be very different. They are also responsible for genome
plasticity, because the same integration sites are shared by related ICEs. Since ICEs may
have a broad host range, such sites increase locus variability both within species and
between genera. In addition, ICEs often contain genes and sequences, such as the
above-mentioned recombinases and insertion sequences, which facilitate the recruitment of
other genes to the ICE backbone. If this integration occurs at a specific locus, a cluster
of horizontally transferred genes may be conserved and transmitted between bacteria
(Burrus and Waldor 2004).
The major goal of this project is to quantify and characterize the distribution of ICEs in
the 1055 prokaryotic genomes available. To identify ICEs, we search for the key elements
of the conjugation machinery: the T4SS and the relaxosome. Focusing on conjugation is
reasonable because, among all mobile elements integrated in genomes, such as prophages and
prophage-like elements, the presence of a conjugative apparatus is the defining feature of
ICEs.
Since we have data from a previous study from our laboratory on conjugative plasmids
(Smillie C., Garcillian M. et al.), and because the chosen approach allowed us to
distinguish between complete and incomplete elements, we are able to ask relevant
questions such as: are there differences between the secretion systems used by ICEs and by
conjugative plasmids? Do ICEs really play, as hypothesized, a predominant role in
horizontal gene transfer in Firmicutes, whereas in Proteobacteria this function is
essentially performed by conjugative plasmids?
This study is also a first step towards understanding the secretion systems present in
symbionts – is it possible that they derive from ICE?
The number of sequenced genomes available is increasing exponentially, mainly owing to
next-generation sequencing techniques. But along with the creation of data, new methods
for its analysis must also be developed. The bioinformatics tools now available to handle
biological data may be the key to its efficient integration and may allow new questions to
be formulated. In this project we performed a comparative genomics analysis: we used
well-characterized proteins that we knew could reveal ICEs as templates to search for
their homologs across entire genomes.
This is the first large-scale study to identify integrative conjugative elements in all
the available prokaryotic genomes, and its results may therefore open new avenues of
research in this field.
DATA AND METHODS
Main objective
As described in the Introduction, we identified ICEs by searching for the T4SS and the
relaxosome. In line with the literature mentioned above, we consider that conjugation
involves the Mpf system and the transfer of ssDNA, which is brought to the complex by the
coupling protein, VirD4. Some systems found in Actinobacteria transfer dsDNA in mycelia by
a different mechanism that uses an FtsK-like system; they are not considered here.
A previous study from our laboratory (Smillie C., Garcillian M. et al.) on plasmids of
Proteobacteria showed that in 98% of the cases in which VirB4 is found, the entire complex
is present. Therefore, to identify the Mpf/CP we focus on both VirB4 and VirD4.
In Proteobacteria, because of the amount of information available about the proteins that
constitute the T4SS, it was possible to search not only for VirB4 but also for the other
proteins that are specific to the different T4SS types.
To complete the analysis we searched for the proteins responsible for the initiation of
the conjugative process: relaxases, or MOBs (from mobilization). They are responsible for
the initial cleavage of the DNA and then remain attached to it, forming the nucleoprotein
complex that is transported to the recipient cell via the T4SS. The relaxases are
classified into six families: MOBc, MOBf, MOBh, MOBphen, MOBq and MOBv.
In addition, in Firmicutes and in Bacteroidetes, two MOBs specific to these clades were
also searched for: ORF20 (YP_133675.1), from Enterococcus faecalis, and mobilization
protein B (NP_818960.1), from Bacteroides thetaiotaomicron VPI-5482 (Flannagan, Zitzow et
al. 1994; Xu, Bjursell et al. 2003). The protein designations that start with a YP or NP
prefix are RefSeq accession numbers, i.e., unique identifiers that classify a molecule (in
this case a protein) in the NCBI database. Both prefixes indicate that the molecule is a
protein and that the record results from both automated processing and expert curation.
For proteins with the YP prefix, no corresponding transcript record is provided.
Data
Thanks to the previous study of our laboratory that determined plasmid mobility, we had a
data set of plasmidic VirB4, T4CP, MOBs and, for Proteobacteria, specific T4SS
genes, which we could use to search for ICE in the genomes.
There are four prototypical systems previously described (Cascales and Christie 2003):
Plasmid F (NC_002483) for T4SS type F (T4SS-F), Plasmid Ti (NC_002377) for T4SS type T
(T4SS-T), Plasmid R64 (NC_005014) for T4SS type I (T4SS-I) and ICEHin1056
(NC_008739) for T4SS type G (T4SS-G). These identifiers are also RefSeq
accession numbers, and the NC prefix stands for complete genomic molecules that result
from both automated processing and expert curation.
The prototypical systems mentioned above were in fact described and classified based on
the incompatibility type of the plasmids. It is important to note that mating pair formation and
incompatibility type are two different and independent concepts. The incompatibility type is a
property attributed to plasmids (both conjugative and non-conjugative), often related to each
other, that are unable to stably coexist in the same cell. Even though the Mpf complexes
were identified in these plasmids and their classification derives from there (as F, T and I type),
the Mpf is a type IV secretion system implicated exclusively in conjugation.
The specific genes from the different T4SS systems that we searched for are, respectively:
traN, traU, traE, traH, traK, traL, traV, traW and trbC for the T4SS-F system; virB3, virB6, virB8
and virB9 for the T4SS-T system; traI, traK, traL, traM, traN, traP, traQ, traR, traW and
traY for the T4SS-I system; and, for the T4SS-G system, ICEHIN1056_000310, ICEHIN1056_000410,
ICEHIN1056_000440, ICEHIN1056_000510 and ICEHIN1056_000520.
It is important to note, however, that in the T4SS-I system the role of VirB4 is played by
another ATPase, TraU, with very low sequence similarity to VirB4.
The sequences with more than 95% similarity were removed to facilitate the analysis,
using an end-gap free global alignment, BALI.
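The redundancy filter described above can be pictured as a greedy pass over the sequences. The sketch below is only an illustration: the similarity score comes from Python's `difflib.SequenceMatcher`, a stand-in that is not equivalent to the end-gap free global alignment used in this work, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in score in [0, 1]; NOT the end-gap free global alignment of the thesis.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def remove_redundant(seqs, threshold=0.95):
    """Keep a sequence only if it is at most `threshold` similar to every kept one.

    seqs: list of (name, sequence) pairs, processed in the given order.
    """
    kept = []
    for name, seq in seqs:
        if all(similarity(seq, kept_seq) <= threshold for _, kept_seq in kept):
            kept.append((name, seq))
    return kept

example = [("a", "M" * 100), ("b", "M" * 99 + "K"), ("c", "ACDEFGHIKL" * 10)]
print([name for name, _ in remove_redundant(example)])
```

The result of a greedy filter depends on the input order, so a real analysis would need to fix that order (for instance by sequence length).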
We searched for these proteins in the 1055 genomes available at the NCBI database when we
started the work. Figure 1 shows the clade distribution of these genomes.
Figure 1. Frequency of organisms from each clade in our 1055 genomes database.
Methodological tests
Based on the previous study of our laboratory, which inferred mobilization and conjugation of
plasmids, we initially tried to find ICE using PSI-BLAST (Position-Specific Iterative BLAST)
(Altschul, Madden et al. 1997) to search for MOBs, and BLASTP (Basic Local Alignment
Search Tool for proteins) (Altschul, Gish et al. 1990) to search for both VirB4 and T4CP. Since
PSI-BLAST, due to its matrix-based properties, searches for distant relatives, it was the
most powerful method, in that it is able to identify very weak similarities. However, it is also
the most error prone, for it can create many false positives. If at the end of an iteration
several distant relatives of the protein of interest are found and included in the matrix,
the next round of iteration will retrieve even more evolutionarily distant proteins, and so forth.
Therefore, the ideal situation is that the results converge, i.e., that in a given round of the
iteration only proteins that had already been found are retrieved, terminating the search.
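The convergence criterion can be written down as a small loop. This is a minimal sketch under stated assumptions: `search` is a hypothetical callable standing in for one PSI-BLAST round (it is not a real BLAST API), and the toy `relatives` map only mimics how each round can pull in more distant proteins.

```python
def iterate_until_converged(seed, search, max_rounds=10):
    """Repeat the search until a round retrieves no protein that is not already known.

    search: hypothetical function mapping the current set of proteins to a set of hits.
    Returns (proteins found, rounds performed, converged flag).
    """
    found = set(seed)
    for round_no in range(1, max_rounds + 1):
        new = search(found) - found
        if not new:                     # only already-known proteins: converged
            return found, round_no, True
        found |= new                    # new hits enter the next round's profile
    return found, max_rounds, False

# Toy data: each protein "retrieves" its listed relatives.
relatives = {"q": {"a"}, "a": {"b"}, "b": set()}

def toy_search(profile):
    return set().union(*(relatives.get(p, set()) for p in profile))

print(iterate_until_converged({"q"}, toy_search))
```

With a low `max_rounds` the loop returns without converging, which corresponds to the situation described for most MOB families, where there was no clear way to decide how many iterations to accept.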
As an initial approach, we ran PSI-BLAST for each MOB family against all the genomes
available. Several thousand proteins were found by this methodology, but only the MOBc
family converged. It was not possible to use these results since there was no clear way to
decide, for each family of MOBs, how many iterations to accept. As a result this methodology
was abandoned, and new methods were tried.
We therefore developed another approach to retrieve fewer false positives. The first attempt
was to use BLASTP, since this was the methodology proposed to search for the other proteins.
The use of a single well-known method would simplify the whole subsequent analysis.
Because this method was not used in the previous study on plasmid data, we first had to
validate it. Plasmids were used as a control, and a BLASTP that allowed a maximum
e-value of 0.1 was performed using all the proteins of the different MOB families with less
than 95% similarity. The objective was, since we already knew which proteins to find, to
define for each family a way to distinguish the true from the false positives. Since we had
several proteins from the same family, the idea was to count the number of times a protein in
the family was found across all the BLASTPs against the plasmid data. Taking MOBf as an example,
we performed BLASTP only with those proteins with less than 95% polypeptide sequence
similarity. By doing this we exclude highly similar proteins, which simplifies
the analysis. Of the 155 MOBf proteins identified in plasmids only 76 have less
than 95% similarity, and therefore 76 BLASTPs were performed. If, say, all the true
positives were found in at least 60 of these BLASTPs and the false positives were always found
fewer times, then we could use 60 as a cut-off and, when using this methodology in the genomes,
accept that all the proteins found at least 60 times were true MOBf, and that the ones found in
fewer BLASTPs were false positives. Figure 2 shows an example of the search results obtained in
plasmids, where we have enough data to distinguish between true and false positives. Even
though all 155 real MOBf were found, we also retrieved 161 other proteins, some of them
MOBs from different families: more than 50% of the hits are false positives. We then tried to
define a threshold to separate these two types of proteins.
Figure 2. BLASTP results for MOBf: number of times a protein is found in the 76 BLASTPs performed.
Distribution of the 155 true positives (A) and 161 false positives (B) obtained, according to the number
of times they were retrieved by the 76 independent BLASTPs performed using plasmidic MOBf as
queries.
As we can see in Figure 2A, the true positives are mainly found in at least 65 of the 76
BLASTPs performed; on the other hand, the false positives (Figure 2B) are mainly found in
fewer than 15 BLASTPs. If we define a cut-off of 65, we apparently obtain a good separation.
However, of the 155 MOBf that we know exist in plasmids, 23 are below this number,
meaning that almost 15% of the true MOBf would be missed in our analyses. As for the false
positives, 28 of these proteins are found by at least 65 BLASTPs, meaning that 17% of all
the proteins wrongly retrieved would be included in our results.
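The counting scheme and the evaluation of a cut-off can be sketched as follows. The function names and the toy identifiers are hypothetical; only the figures 23 of 155 and 28 of 161 come from the text.

```python
from collections import Counter

def count_retrievals(blast_hits):
    """blast_hits: one collection of retrieved protein ids per query BLASTP."""
    counts = Counter()
    for hits in blast_hits:
        counts.update(set(hits))        # a protein counts once per query
    return counts

def evaluate_cutoff(counts, true_ids, cutoff):
    """Return (true positives missed, false positives accepted) for a cut-off."""
    missed = sum(1 for p in true_ids if counts.get(p, 0) < cutoff)
    accepted_false = sum(1 for p, n in counts.items()
                         if p not in true_ids and n >= cutoff)
    return missed, accepted_false

# Toy control set: three queries, two known true proteins (t1, t2), one false (f1).
hits = [{"t1", "t2", "f1"}, {"t1", "f1"}, {"t1", "t2"}]
counts = count_retrievals(hits)
print(evaluate_cutoff(counts, {"t1", "t2"}, cutoff=3))

# Error rates reported for MOBf with a cut-off of 65.
print(round(100 * 23 / 155, 1), round(100 * 28 / 161, 1))
```

Raising the cut-off trades missed true positives against accepted false positives, which is why no single threshold gave a fully satisfactory separation.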
Even though this was not a completely satisfactory result we decided to perform the BLASTP
analysis in the genomes not only for the MOB families but also for VirB4 and T4CP, always
using the proteins with less than 95% similarity, and this information was kept as a
backup. We did this because in our attempts to minimize the number of false positives, some
potentially true positives may be lost as well. When possibly incomplete elements are
retrieved by other methodologies, and there is in their vicinity a protein identified by
BLASTP with a good e-value, we can consider this protein a true positive and
include it in the element.
Adopted Methodology
The program we tested next was HMMER (Durbin, Eddy et al. 1999), which analyzes
sequences using profile hidden Markov models (profile HMMs).
Profile HMMs are statistical models of multiple sequence alignments. These profiles capture
position-specific information about how conserved each column of the alignment is, and
which residues are likely to occur at a given position. The multiple alignments of the known
proteins from plasmids necessary to create the HMM profiles were obtained with MUSCLE
(Edgar 2004), and for each type of protein that we searched for (T4SS specific proteins,
MOBs, VirB4 and T4CP) all the plasmidic proteins with less than 95% identity were used to
create the HMM profile. The search of these profiles in the database was then performed as
a “glocal” alignment, i.e., global with respect to the profile, so that the whole profile
must align, but local with respect to the sequence.
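To make the idea of position-specific information concrete, the sketch below computes per-column residue frequencies from a toy alignment. It is a deliberate simplification and not what HMMER does: a real profile HMM also models insertions and deletions and uses priors. The function name and the example alignment are hypothetical.

```python
from collections import Counter

def column_profile(alignment):
    """Per-column residue frequencies of an alignment (gaps '-' are ignored)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(row[i] for row in alignment if row[i] != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile

toy_alignment = ["MKV", "MRV", "M-V"]
for position, freqs in enumerate(column_profile(toy_alignment), start=1):
    print(position, freqs)
```

Columns 1 and 3 of the toy alignment are fully conserved, whereas column 2 is variable; a profile scores a candidate sequence differently at each of these positions.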
In the control tests with plasmids, allowing e-values up to 0.1, this method resulted in very
few or even no false positives. When used on the complete genomes, the number of proteins
obtained was in the expected order of magnitude, and hence we decided to adopt this methodology.
The use of HMMER had not been considered before because it is a slow
method. The new version of HMMER, which was not available at the time of the
analyses, is faster but does not perform “glocal” alignments.
Identification and Characterization of Elements
Once all the proteins of interest had been located in the genomes, we started the identification
of the possible elements. The first step was to verify whether the T4SS specific genes were near
each other, possibly forming functional T4SS systems. In order to do this, an awk script
clustered the proteins located up to 25 genes apart in the genome. All the specific genes were
first listed according to the position they occupy in the genome. Then, for each
identified gene, if its distance to the previous gene in the list was 25 positions or fewer, these
two genes were clustered together. Since in organisms such as those of the genus
Rickettsia the conjugation systems are known to be scattered throughout the genomes
(Weinert, Welch et al. 2009), manual curation was required to join the different clusters
formed automatically in order to create the complete one. Likewise, some clusters
that were not complete and had the complementary genes more than 25
positions away were also merged together.
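The clustering rule can be sketched in a few lines. This is a Python illustration of the rule as described, not the awk script used in this work; positions are assumed to be gene ranks along the replicon, and the circularity of the chromosome is ignored.

```python
def cluster_by_position(positions, max_gap=25):
    """Group gene ranks into clusters: consecutive hits at most `max_gap` genes apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)    # close enough to the previous hit
        else:
            clusters.append([pos])      # start a new cluster
    return clusters

print(cluster_by_position([210, 10, 56, 30, 200]))
```

Here the hits at ranks 30 and 56 are 26 genes apart and end up in different clusters, which is the kind of case that the manual curation step would review.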
Once the identification of the T4SS systems was concluded, we searched within 25 genes
upstream and downstream of the clusters for VirB4, T4CP and MOBs. These results
were also manually curated in order to include proteins that, even if not within 25 genes,
were close enough to be considered part of the element. It is important to remember that the
T4SS systems were described in detail only in Proteobacteria, and therefore the specific
genes were searched only in this clade. In the other organisms the elements were classified
according to the presence or absence of only VirB4, T4CP and MOB.
We decided to complement this analysis with the results from the less restrictive
methodology, BLASTP. For this, we searched for proteins located in the genome up to 25
positions apart and manual curation was performed as described above. This allowed us to
create more complete elements, because we know that some proteins may not be retrieved
with the previous methodology, which is more conservative.
A global view of our results, however, made us realise that some of the elements with
apparently functional T4SS systems were incomplete, lacking for example a T4CP or a MOB.
This could indicate that a pseudogenization process was occurring, and therefore that the
element was no longer functional. In order to test this assumption, we tried
to identify possible pseudogenes in the vicinity using TBLASTN (Altschul, Gish et al. 1990).
This program compares a protein sequence against a database of nucleotide
sequences translated in all six reading frames. The database used to search for
pseudogenes of the different proteins was created using the nucleotide sequences that
covered 50 kb upstream and downstream of the elements lacking that given protein. The fact
that we found pseudogenes with this method does not influence the classification of the
elements, since they are likely to code for non-functional proteins; it only helps to consolidate the
idea that the element was indeed an ICE. There is also the possibility that these pseudogenes
are in fact the product of sequencing errors, but it was beyond the scope of this work to
re-sequence such loci.
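Translation in all six reading frames, the step that lets a protein query match a disrupted coding region, can be illustrated with a short sketch. It only shows the translation itself, not the TBLASTN search; the codon table is the standard one, and the function names are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

print(six_frames("ATGGCC"))
```

A frameshift inside a gene moves the remainder of the protein to another frame, so a protein query may match two different frames of the same locus, which is one signature of a candidate pseudogene.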
We first classified the T4SS systems of the Proteobacteria. For each of the four systems we had
several elements with all or nearly all the proteins we searched for, and elements with very few
of those proteins. This clear bimodal distribution of our hits suggested that there is a
minimum number of proteins required for the system to be functional, so that if some genes
were lost the system would no longer be functional and the other genes would be rapidly lost
as well, which would explain why we find only nearly complete or highly degraded systems. Also,
isolated genes might simply be false positives. We identified five genes that seem to be present
in the vast majority of known T4SS-F and absent in the other systems, four for the T4SS-G, also
four for the T4SS-I and three for the T4SS-T.
Table 1. Element classification in Proteobacteria.
MOB    Complete T4SS    VirB4/TraU    T4CP    Classification
+      Yes              +             +       ICE
+      No               +/-           +/-     pICE
-      Yes              +             +/-     T4SS
-      No               +/-           +/-     pT4SS
After the classification of the T4SS we evaluated the presence or absence of a MOB. If an
element has a MOB it can be mobilized, and can be or have been an ICE. In other words, if
an element presents a complete T4SS, a MOB, a T4CP and a VirB4 it is considered an ICE; if
it has a MOB and some but not all of the other components, we consider it a pseudo-ICE
(pICE), since it presents the relaxase and a conjugation system, even if incomplete. If
there is no MOB in an element, then it can only function as a protein secretion system, a
T4SS or, if not complete, a pseudo-T4SS (pT4SS). This classification is summarized, for the
Proteobacteria clade, in Table 1.
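The rules of Table 1 can be encoded as a small decision function. This is one possible reading of the table rather than the code used in this work: the function name is hypothetical, and an element without a MOB is treated as a T4SS only when both the complete set of specific genes and VirB4/TraU are present, which is an assumption for the combinations the table does not list.

```python
def classify_proteobacteria(mob, complete_t4ss, virb4, t4cp):
    """Classify an element from boolean presence flags, following Table 1.

    Assumes the element was built around at least one conjugation component.
    """
    if mob:
        if complete_t4ss and virb4 and t4cp:
            return "ICE"
        return "pICE"                   # relaxase plus an incomplete system
    if complete_t4ss and virb4:         # T4CP may be present or absent
        return "T4SS"
    return "pT4SS"

rows = [(True, True, True, True), (True, False, True, False),
        (False, True, True, False), (False, False, True, True)]
for row in rows:
    print(row, classify_proteobacteria(*row))
```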
For organisms other than Proteobacteria, where we have less information, the classification
is based only on the presence or absence of VirB4, T4CP and MOB. If all three are
present, the element is considered an ICE; if one of these proteins is absent, the element is a
pICE. In this analysis, Archaea are the exception, since no MOB is known in these
organisms. Therefore, if we did not find a MOB in an element, this fact is not enough for us to
state that there is none, and we consider it to be an ICE-A (ICE from Archaea).
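For the other clades the rule reduces to three presence flags. The sketch below is a hedged reading of the paragraph above: the function name is hypothetical, and the treatment of archaeal elements that lack VirB4 or T4CP (here labelled pICE) is an assumption, since the text only covers the case of a missing MOB.

```python
def classify_other_clade(virb4, t4cp, mob, archaea=False):
    """Classify an element outside Proteobacteria from boolean presence flags."""
    if archaea and virb4 and t4cp:
        return "ICE-A"                  # a missing MOB is not informative in Archaea
    if virb4 and t4cp and mob:
        return "ICE"
    return "pICE"                       # at least one of the three proteins is absent

print(classify_other_clade(True, True, True))
print(classify_other_clade(True, False, True))
print(classify_other_clade(True, True, False, archaea=True))
```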
All the programming was performed in UNIX, and the statistical analyses with JMP.
RESULTS AND DISCUSSION
Complete and Non Complete Elements
Using the classification defined in Data and Methods we found 652 elements: 315 ICE, of
which 198 in Proteobacteria, 243 pICE, of which 114 in Proteobacteria, and 67 T4SS and
27 pT4SS, only in Proteobacteria since these organisms are the only ones where we can
search for the T4SS specific genes. The clade Proteobacteria includes more than half of the
elements we found but it is important to keep in mind that, as shown in Figure 1, this clade
represents more than 50% of the available genomes. These numbers are, therefore, not
enough per se to draw any conclusions regarding a higher frequency of ICE in Proteobacteria.
Comparison of ICE and T4SS in Proteobacteria
Biased distribution of ICE and T4SS
Figure 3. Number of ICE and T4SS present in each genome length category. There are, in
Proteobacteria, 220 genomes with less than 3 Mb (small genomes), 229 genomes with 3 to 5 Mb
(medium genomes), and 98 genomes with more than 5 Mb (large genomes). The distribution of the
number of elements is dependent on the length of the genome (p-value of the chi-square test is
<0.0001).
The first question we tried to answer was whether there were differences in the distribution of
ICE and T4SS in Proteobacteria. The approach was to divide the genomes into three categories,
small (less than 3 Mb), medium (between 3 and 5 Mb) and large (more than 5 Mb), and
compare the number of ICE and T4SS between them. As shown in Figure 3 we observed a
biased distribution, with clear predominance of ICE in medium and large genomes, and the
inverse in small genomes, where not only are there significantly fewer ICE than in the other
categories, but also more T4SS. As a result, in the small genomes there is a predominance
of T4SS, contrary to the remaining genomes. A chi-square test confirms that the differences
are statistically significant, and that the distribution of the elements is indeed dependent on
the size of the genomes.
This result was somewhat expected. Organisms with smaller genomes tend to depend on
symbiotic relationships and undergo little horizontal transfer. Such organisms use T4SS
systems in their interactions with eukaryotic hosts, by exporting effectors to the eukaryotic
cytoplasm. On the other hand, reduced horizontal transfer leads to fewer ICE.
Are T4SS derived from ICE?
Figure 4. Number of complete and non complete elements in each genome length category.
We have shown a biased distribution of the T4SS systems and ICE. In larger genomes, the
constraints on carrying non-functional elements are lower, so it is not unexpected to see pICE
and pT4SS in such organisms. In smaller genomes, however, the constraints are much
higher, and as we can see in Figure 4, the number of pT4SS is very low. The number
of pICE, however, is the highest of the three length categories. Unlike in medium and large
genomes, in the smaller genomes a T4SS>pICE>ICE relation is observed. This observation
led us to the question: is it possible that T4SS are retained in these organisms after the
remainder of the ICE is degraded?
We therefore focused our attention on the organisms that seem to upset this balance:
the symbionts.
Since we do not have habitat information for all the genomes, we performed this first analysis
with a rough selection of animal endosymbionts, the family Rickettsiaceae. Table 2 presents a
more detailed list of elements found in these organisms (15 genomes) and in the other
organisms with small genomes (205 organisms).
Table 2. Elements Present in Organisms with Small Genomes
As we can see in Table 2, 24 of the 25 pICE-F present in organisms with small genomes are
in fact in the 15 genomes of the family Rickettsiaceae. On closer inspection of these elements, we
realized that these pICE-F are highly similar to each other, with the same orientation and
the same genes missing. This result may point to functionality, the element being either an ICE
or a complete T4SS. If an ICE, this would be the first F-type ICE described in these organisms, and
the reason why this particular type is retained could be investigated. This option, however,
contradicts the intuitive expectation that organisms with small genomes have fewer
ICE, since they interact almost exclusively with their hosts and not with other bacteria. If a
T4SS, these would be the first T4SS-F systems ever described that do not play a role in
conjugation. With the data available, however, we cannot yet decide which of the hypotheses
is correct, and we keep the classification as pICE.
Excluding the 25 pICE-F, which appear to have a biological role rather than
being pseudogenized ICE, we observe the relation T4SS > pICE > ICE, with respectively 12,
6 and 4 elements, whereas in the remaining organisms with small genomes the ICEs are the
second most abundant elements.
In Rickettsiales, even though horizontal gene transfer is very limited, the genes of the
T4SS system have been shown not to be the result of vertical inheritance (Weinert, Welch et al.
2009). Given this finding and the relation T4SS>pICE>ICE that we observed in
Rickettsiaceae, our hypothesis of a transition from ICE to T4SS seems plausible.
This study may however be improved by using other endosymbionts. Therefore, more data are
needed regarding the habitat and the bacteria-host interactions.
Comparison of ICE and Conjugative Plasmids
Predominant Role: ICE in Firmicutes, Conjugative Plasmids in Proteobacteria?
One idea often conveyed in the literature, but not yet statistically tested, is that horizontal
transfer is most frequently caused by conjugative plasmids in Proteobacteria, and by ICE in
Firmicutes. Since we have data from both conjugative plasmids (previous work of our
laboratory) and ICE in both clades, we can test this hypothesis.
In order to make this analysis we need to be sure that the data sets are comparable,
because if in one of the clades the frequency of plasmids is lower, this fact alone could
explain the presence of fewer conjugative plasmids in that clade.
Several plasmids were sequenced for their intrinsic biological interest and not with their
corresponding genome, and if we used all the sequenced plasmids to make the comparisons
with ICE the analysis would be biased towards plasmids. As an example, there are 957 plasmids of
Proteobacteria available but only 406 were sequenced with the genomes. Therefore, we first
selected only the plasmids that were sequenced with the genomes, and these are the
plasmids included in our analysis.
In Proteobacteria we have 547 sequenced genomes and 406 plasmids, and in Firmicutes we
have 178 genomes and 119 plasmids. Even though the amount of data is significantly
different, the frequency of genomes and plasmids in both clades is comparable (p-value of the
chi-square test is 0.4396). Therefore, plasmids are equally represented in the two clades and
we can proceed with the analysis.
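The comparison of genome and plasmid counts between the two clades can be worked through as a chi-square test on a 2x2 table. The sketch below uses only the standard library and assumes that no continuity correction is applied, which is an assumption about how the test was run in JMP; the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Table layout: [[a, b], [c, d]]. Returns (statistic, p-value).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: Proteobacteria, Firmicutes; columns: genomes, plasmids.
chi2, p = chi_square_2x2(547, 406, 178, 119)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Under these assumptions the p-value comes out close to 0.44, in line with the value reported above.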
Figure 5. Proportion of ICE and Conjugative Plasmids in Proteobacteria and Firmicutes. The y
axis shows the proportion of ICE and conjugative plasmids according to the rule ICE / (ICE +
Conjugative Plasmids): 1 means only ICE, 0 means only Conjugative Plasmids. The x axis represents
the amount of data, larger for Proteobacteria as expected.
The second step is to verify whether there are in fact more ICE than conjugative plasmids in
Firmicutes. In our study we found 50 ICE in this clade, and only 3 of the 119 plasmids
were classified as conjugative. We observe, therefore, about 16.7 times more ICE than
conjugative plasmids in Firmicutes (proportion shown in Figure 5), in agreement with the
hypothesis that we wish to test.
The third step is to understand whether Proteobacteria indeed have a significantly larger number of
conjugative plasmids than ICE. As we can see in Figure 5, the answer to this question is no.
In fact, we found 198 ICE in these organisms, and only 110 conjugative plasmids. Therefore
our data suggest that ICE are more frequent than conjugative plasmids in both clades,
although the difference is much more pronounced in Firmicutes.
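The proportions plotted in Figure 5 follow directly from the counts given in the text, using the rule ICE / (ICE + Conjugative Plasmids). A minimal worked example (the function name is hypothetical):

```python
def ice_proportion(n_ice, n_conjugative_plasmids):
    """ICE / (ICE + conjugative plasmids): 1 means only ICE, 0 only plasmids."""
    return n_ice / (n_ice + n_conjugative_plasmids)

# (ICE, conjugative plasmids) counts reported in the text.
counts = {"Firmicutes": (50, 3), "Proteobacteria": (198, 110)}
for clade, (ice, plasmids) in counts.items():
    print(clade, round(ice_proportion(ice, plasmids), 3), round(ice / plasmids, 1))
```

Both proportions are above 0.5, so ICE outnumber conjugative plasmids in both clades, with a much larger ratio in Firmicutes.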
It is important to note, however, that in culture the loss of plasmids by segregation is thought
to be more frequent than that of ICE. So it is possible that, even with the precautions taken, the
data are biased because at the moment of sequencing the plasmid has already been lost. In any
case, the data suggest that at best ICE and conjugative plasmids have comparable
frequency in Proteobacteria.
Are there differences in the T4SS systems of Conjugative Plasmids and ICE?
As a final question, since in Proteobacteria we were able to distinguish between the different
T4SS systems present in both conjugative plasmids and ICE, we can try to understand whether
there are differences in their distribution.
Indeed, we show that T4SS-T is not only the predominant type in conjugative plasmids and
ICE, but is also equally distributed in both elements (Figure 6, p-value of Fisher’s exact test is
0.8071). There is, however, an important difference in the frequencies of T4SS-F and T4SS-G
(p-value of Fisher’s exact test is <0.0001 in both cases). In conjugative plasmids T4SS-F is
the second most frequent system, a position occupied by T4SS-G in ICE. Since T4SS-G is
not well characterized from a molecular point of view, it is not yet possible to give a biological
interpretation to these results.
Figure 6. Proportion of the different T4SS types in both conjugative Plasmids and ICE from
Proteobacteria. Of the 110 Conjugative Plasmids, 69 present a T4SS-T, 33 a T4SS-F, 7 a T4SS-I
and 1 a T4SS-G. Of the 198 ICE, 120 present a T4SS-T, 57 a T4SS-G, 21 a T4SS-F and there is
no ICE with T4SS-I. The graphic shows the proportions, according to the rule ICE / (ICE + Conjugative
Plasmids). A Fisher’s exact test was performed to compare each T4SS type in both samples. All
p-values were significant (<0.0001) except the one for T4SS-T, which means that this is the only
system equally distributed in both ICE and Conjugative Plasmids.
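The comparison of a T4SS type between conjugative plasmids and ICE can be sketched as a two-sided Fisher's exact test on the counts of the Figure 6 caption. The implementation below is a plain standard-library version (summing all tables that are no more probable than the observed one); it is an illustration with a hypothetical function name, not the JMP procedure used for the reported p-values.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Rows: conjugative plasmids (n = 110), ICE (n = 198); columns: with / without the type.
tables = {"T4SS-T": (69, 41, 120, 78),
          "T4SS-F": (33, 77, 21, 177),
          "T4SS-G": (1, 109, 57, 141)}
for name, table in tables.items():
    print(name, fisher_exact_two_sided(*table))
```

For T4SS-T the test gives a large p-value, whereas for T4SS-F and T4SS-G the p-values are very small, matching the qualitative pattern described in the text.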
It would be particularly interesting to investigate the cargo content of both ICE and
conjugative plasmids with T4SS-T in order to understand whether their predominance in
both kinds of elements is due to mechanisms that prevent loss, or whether there is a more specific
and as yet unknown reason for the predominance of this system.
FUTURE PERSPECTIVES
There are three main studies that can be performed with the ICEs identified in this project.
The first is the delimitation of the ICEs, i.e., exactly where in the genome the
self-transmissible elements begin and end. A program based on syntenic blocks (groups of genes
found in the same order in different species) is currently being developed in our laboratory.
Such an approach could allow us not only to define the borders of the ICEs, if the genes
surrounding the elements constitute a syntenic block, but also to define the ICEs
themselves, if homologous genes occupy the same position within the element, constituting
one or several syntenic blocks. One possible example would be the identification of
conserved modules across different elements. Using well characterized ICEs as a training
set, the program can be optimized and used to delimit the elements described in this work.
Such an analysis will allow the systematic study of the functions coded in the cargo regions
of ICE, which has never been achieved before in a large-scale study. This is possibly the
most clinically relevant study to be made with the data obtained.
The second possible study, already being performed in our laboratory, is a phylogenetic
analysis using the identified recombinases and T4SS systems. This will allow us to frame the
evolutionary history of ICE and to test their relatedness to phages, plasmids and
transposons.
The third study relates to our hypothesis that the T4SS systems in organisms with small
genomes, particularly those of endosymbionts, are derived from ICEs. It would involve a
phylogenetic analysis of the T4SS machinery used to secrete proteins in these organisms
and the T4SS systems of ICEs.
REFERENCES
Altschul, S. F., W. Gish, et al. (1990). "Basic local alignment search tool." Journal of Molecular Biology
215(3): 403-410.
Altschul, S. F., T. L. Madden, et al. (1997). "Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs." Nucleic Acids Res 25(17): 3389-3402.
Beaber, J. W., B. Hochhut, et al. (2002). "Genomic and Functional Analyses of SXT, an Integrating
Antibiotic Resistance Gene Transfer Element Derived from Vibrio cholerae." J. Bacteriol
184(15): 4259-4269.
Beaber, J. W., B. Hochhut, et al. (2004). "SOS response promotes horizontal dissemination of
antibiotic resistance genes." Nature 427(6969): 72-74.
Boltner, D., C. MacMahon, et al. (2002). "R391: a Conjugative Integrating Mosaic Comprised of
Phage, Plasmid, and Transposon Elements." J. Bacteriol 184(18): 5158-5169.
Brochet, M., V. Da Cunha, et al. (2009). "Atypical association of DDE transposition with conjugation
specifies a new family of mobile elements." Molecular Microbiology 71(4): 948-959.
Burrus, V., G. Pavlovic, et al. (2002). "Conjugative transposons: the tip of the iceberg." Molecular
Microbiology 46(3): 601-610.
Burrus, V. and M. K. Waldor (2003). "Control of SXT Integration and Excision." J. Bacteriol 185(17):
5045-5054.
Burrus, V. and M. K. Waldor (2004). "Shaping bacterial genomes with integrative and conjugative
elements." Research in Microbiology 155(5): 376-386.
Cascales, E. and P. J. Christie (2003). "The versatile bacterial type IV secretion systems." Nat Rev
Micro 1(2): 137-149.
Cascales, E. and P. J. Christie (2004). "Definition of a Bacterial Type IV Secretion Pathway for a DNA
Substrate." Science 304(5674): 1170-1173.
Dang, T. A., X. R. Zhou, et al. (1999). "Dimerization of the Agrobacterium tumefaciens VirB4 ATPase
and the effect of ATP-binding cassette mutations on the assembly and function of the T-DNA
transporter." Molecular Microbiology 32(6): 1239-1253.
Davies, M. R., J. Shera, et al. (2009). "A Novel Integrative Conjugative Element Mediates Genetic
Transfer from Group G Streptococcus to Other β-Hemolytic Streptococci." J. Bacteriol
191(7): 2257-2265.
Drenkard, E. and F. M. Ausubel (2002). "Pseudomonas biofilm formation and antibiotic resistance are
linked to phenotypic variation." Nature 416(6882): 740-743.
Durbin, R., S. Eddy, et al. (1999). Biological Sequence Analysis: Probabilistic Models of Proteins and
Nucleic Acids, Cambridge University Press.
Edgar, R. C. (2004). "MUSCLE: multiple sequence alignment with high accuracy and high throughput."
Nucleic Acids Res 32(5): 1792-1797.
Flannagan, S. E., L. A. Zitzow, et al. (1994). "Nucleotide Sequence of the 18-kb Conjugative
Transposon Tn916 from Enterococcus faecalis." Plasmid 32(3): 350-354.
Gaillard, M., T. Vallaeys, et al. (2006). "The clc Element of Pseudomonas sp. Strain B13, a Genomic
Island with Various Catabolic Properties." J. Bacteriol 188(5): 1999-2013.
Grohmann, E., G. Muth, et al. (2003). "Conjugative Plasmid Transfer in Gram-Positive Bacteria."
Microbiol. Mol. Biol. Rev. 67(2): 277-301.
He, J., R. L. Baldini, et al. (2004). "The broad host range pathogen Pseudomonas aeruginosa strain
PA14 carries two pathogenicity islands harboring plant and animal virulence genes." Proc.
Natl. Acad. Sci. USA 101(8): 2530-2535.
Hochhut, B., Y. Lotfi, et al. (2001). "Molecular Analysis of Antibiotic Resistance Gene Clusters in Vibrio
cholerae O139 and O1 SXT Constins." Antimicrob. Agents Chemother. 45(11): 2991-3000.
Juhas, M., D. W. Crook, et al. (2007). "Novel Type IV Secretion System Involved in Propagation of
Genomic Islands." J. Bacteriol 189(3): 761-771.
Juhas, M., D. W. Crook, et al. (2008). "Type IV secretion systems: tools of bacterial horizontal gene
transfer and virulence." Cellular Microbiology 10(12): 2377-2386.
Kikuchi, Y. and H. A. Nash (1979). "Nicking-closing activity associated with bacteriophage lambda int
gene product." Proc. Natl. Acad. Sci. USA 76(8): 3760-3764.
Klockgether, J., O. Reva, et al. (2004). "Sequence Analysis of the Mobile Genome Island pKLC102 of
Pseudomonas aeruginosa C." J. Bacteriol 186(2): 518-534.
Lawley, T., W. Klimke, et al. (2003). "F factor conjugation is a true type IV secretion system." FEMS
Microbiology Letters 224(1): 1-15.
Lee, C. A., J. M. Auchtung, et al. (2007). "Identification and characterization of int (integrase), xis
(excisionase) and chromosomal attachment sites of the integrative and conjugative element
ICEBs1 of Bacillus subtilis." Molecular Microbiology 66(6): 1356-1369.
Lewis, J. A. and G. F. Hatfull (2001). "Control of directionality in integrase-mediated recombination:
examination of recombination directionality factors (RDFs) including Xis and Cox proteins."
Nucleic Acids Res 29(11): 2205-2216.
Llosa, M., F. X. Gomis-Rüth, et al. (2002). "Bacterial conjugation: a two-step mechanism for DNA
transport." Molecular Microbiology 45(1): 1-8.
Lu, F. and G. Churchward (1995). "Tn916 target DNA sequences bind the C-terminal domain of
integrase protein with different affinities that correlate with transposon insertion
frequency." J. Bacteriol 177(8): 1938-1946.
Mohd-Zain, Z., S. L. Turner, et al. (2004). "Transferable Antibiotic Resistance Elements in
Haemophilus influenzae Share a Common Evolutionary Origin with a Diverse Family of
Syntenic Genomic Islands." J. Bacteriol 186(23): 8114-8122.
Qiu, X., A. U. Gurkar, et al. (2006). "Interstrain transfer of the large pathogenicity island (PAPI-1) of
Pseudomonas aeruginosa." Proc. Natl. Acad. Sci. USA 103(52): 19830-19835.
Rajeev, L., K. Malanowska, et al. (2009). "Challenging a Paradigm: the Role of DNA Homology in
Tyrosine Recombinase Reactions." Microbiol. Mol. Biol. Rev. 73(2): 300-309.
Ravatn, R., S. Studer, et al. (1998). "Chromosomal Integration, Tandem Amplification, and
Deamplification in Pseudomonas putida F1 of a 105-Kilobase Genetic Element Containing the
Chlorocatechol Degradative Genes from Pseudomonas sp. Strain B13." J. Bacteriol 180(17):
4360-4369.
Ravatn, R., S. Studer, et al. (1998). "Int-B13, an Unusual Site-Specific Recombinase of the
Bacteriophage P4 Integrase Family, Is Responsible for Chromosomal Insertion of the 105-Kilobase
clc Element of Pseudomonas sp. Strain B13." J. Bacteriol 180(21): 5505-5514.
Rice, L. B. (1998). "Tn916 Family Conjugative Transposons and Dissemination of Antimicrobial
Resistance Determinants." Antimicrob. Agents Chemother. 42(8): 1871-1877.
Schröder, G., S. Krause, et al. (2002). "TraG-Like Proteins of DNA Transfer Systems and of the
Helicobacter pylori Type IV Secretion System: Inner Membrane Gate for Exported
Substrates?" J. Bacteriol 184(10): 2767-2779.
Schröder, G. and E. Lanka (2005). "The mating pair formation system of conjugative plasmids - A
versatile secretion machinery for transfer of proteins and DNA." Plasmid 54(1): 1-25.
Smillie, C., M. Garcillian, et al. "Unpublished."
Stevens, A. M., N. B. Shoemaker, et al. (1990). "The region of a Bacteroides conjugal chromosomal
tetracycline resistance element which is responsible for production of plasmidlike forms from
unlinked chromosomal DNA might also be involved in transfer of the element." J. Bacteriol
172(8): 4271-4279.
Sullivan, J. T. and C. W. Ronson (1998). "Evolution of rhizobia by acquisition of a 500-kb symbiosis
island that integrates into a phe-tRNA gene." Proc. Natl. Acad. Sci. USA 95: 5145-5149.
Toussaint, A. and C. Merlin (2002). "Mobile Elements as a Combination of Functional Modules."
Plasmid 47(1): 26-35.
Waldor, M. K., H. Tschape, et al. (1996). "A new type of conjugative transposon encodes resistance to
sulfamethoxazole, trimethoprim, and streptomycin in Vibrio cholerae O139." J. Bacteriol
178(14): 4157-4165.
Wang, H. and P. Mullany (2000). "The Large Resolvase TndX Is Required and Sufficient for Integration
and Excision of Derivatives of the Novel Conjugative Transposon Tn5397." J. Bacteriol
182(23): 6577-6583.
Weinert, L. A., J. J. Welch, et al. (2009). "Conjugation genes are common throughout the genus
Rickettsia and are transmitted horizontally." Proc Biol Sci. 276(1673): 3619-3627.
Whittle, G., N. B. Shoemaker, et al. (2002). "The role of Bacteroides
conjugative transposons in the dissemination of antibiotic resistance genes." Cellular and
Molecular Life Sciences 59(12): 2044-2054.
Xu, J., M. K. Bjursell, et al. (2003). "A Genomic View of the Human-Bacteroides thetaiotaomicron
Symbiosis." Science 299(5615): 2074-2076.