Genome Sequencing and Analysis
1953 – Watson and Crick model: DNA is a double helix in which the nitrogenous bases pair through hydrogen bonds
1958 – DNA replication is semiconservative
Lewin, http://www.ergito.com
DNA sequencing

In 1977, the chain-terminator sequencing method was developed.
Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences (USA) 74: 5463–5467, 1977.
DNA sequencing by the Sanger method
Manual sequencing uses radiolabeled
dATP (35-S or 33-P) to label the
DNA. The sample is then split into four
tubes each with an individual ddNTP
present. The samples are then
subjected to acrylamide gel
electrophoresis followed by
autoradiography
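The four-tube procedure described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are not in the slides: the template is written 3'→5', the primer is ignored, and every position yields exactly one terminated fragment. The function names are hypothetical.

```python
# Toy model of manual Sanger sequencing (illustrative assumptions only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template_3to5):
    """Return the newly synthesized strand (5'->3') for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)

def chain_termination_fragments(template_3to5):
    """Map each ddNTP tube (G, A, T, C) to the lengths of the fragments it terminates."""
    new_strand = synthesized_strand(template_3to5)
    tubes = {base: [] for base in "GATC"}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length ends with the dideoxy form of `base`.
        tubes[base].append(length)
    return tubes

def read_gel(tubes):
    """Read the four lanes from the shortest fragment to the longest."""
    bands = sorted((length, base) for base, lengths in tubes.items() for length in lengths)
    return "".join(base for _, base in bands)

tubes = chain_termination_fragments("TACGGATC")
print(tubes)
print(read_gel(tubes))
```

Reading the bands in order of increasing fragment length recovers the synthesized strand, which is the complement of the template.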
[Figure: Manual sequencing; gel lanes labeled G, A, T, C]
THE BEGINNING OF GENOME ANALYSIS
1977 – Sanger and colleagues develop a sequencing technique that uses inhibitors as terminators of DNA synthesis. The first complete genome is sequenced: that of bacteriophage phi-X174 (5375 bases).
Sanger F et al. The nucleotide sequence of bacteriophage phi-X174. Journal of Molecular Biology 125: 225-46, 1977.
Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 74: 5463-5467, 1977.
1980 – The shotgun method is developed
Sanger and collaborators developed the shotgun method for preparing templates for DNA sequencing; this work also introduced the use of bacterial viruses (bacteriophage M13) for cloning and template preparation.
Sanger et al. Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. Journal of Molecular Biology 143: 161-78, 1980.
http://www.yourgenome.org/timeline.html
Early genome sequencing projects
The ability to undertake DNA sequencing on a large scale started a
revolution in biology, as it became theoretically possible to determine the
complete DNA sequence of any organism, and thus obtain a full description
of the complete set of genes, or genome. Initially small genomes of viruses
and organelles were studied (phiX174, 5.4kb; the mitochondrial genome,
16kb; bacteriophage lambda, 50kb; Epstein Barr virus, 172kb; human
cytomegalovirus, 229kb). It became clear that the most efficient means to
determine a complete genome sequence was to break the DNA randomly
into fragments of appropriate size for sequencing, and then to reassemble
the pieces using the DNA sequence itself. Overlaps between the individual
pieces were identified on the basis of matches between independently
sequenced DNA fragments. This process, termed random shotgun
sequencing, was first applied to the whole genome of bacteriophage lambda,
and has since been applied to progressively larger projects as the speed of
sequencing and the power of computers to undertake the assembly process
have increased.
http://www.sanger.ac.uk/HGP/draft2000/early.shtml
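The idea of reassembling random fragments from their overlaps can be sketched as a greedy merge. This is a minimal illustration, not the assembler used in any of the projects cited here: it assumes error-free reads from a single strand, and the function names and the `min_len` threshold are hypothetical.

```python
# Greedy overlap assembly of error-free, same-strand reads (toy sketch).

def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is a prefix of `b`, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap; return the contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads  # no remaining overlaps: these are the contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)

print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

Real assemblers must also handle sequencing errors, reverse-complement reads and repeats, which a greedy merge like this can misassemble.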
1981 – IBM introduces its first personal computer; it had 100 KB of memory and a floppy disk drive, and cost about $3000
1982 – Compaq introduces its first portable computer; it had no hard disk and cost about $3000
1982 – The first genome is sequenced by the whole-genome shotgun (WGS) method
Sanger and collaborators sequence the genome of bacteriophage lambda using the shotgun technique.
Sanger F et al. 1982. Nucleotide sequence of bacteriophage lambda. Journal of Molecular Biology 162: 729–73
Genome size: 48,502 bp
1986 – The first prototype of a DNA sequencing machine is presented.
1987 – Apple introduces the Mac II, with a 40 MB hard disk, at a cost of $5500
1987 – Olson and colleagues develop a method for cloning long regions of genomic DNA (100,000–200,000 base pairs)
Burke et al. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236: 806-812, 1987.
1989 – The first hypertext system, the basis of the World Wide Web, is developed at the European Organisation for Nuclear Research (CERN)
1989 – Creation of the Human Genome Organisation (HUGO)
HUGO, an international association of researchers involved in the Human Genome Project, was created to help coordinate activities and assist in the distribution of data and resources.
1990 – The Department of Energy and the National Institutes of Health, USA, present their formal proposal to sequence the human genome.
1992 – Sulston and Waterston request funding from the Wellcome Trust to sequence 40 Mb of the human genome, a five-year proposal.
The Wellcome Trust and the Medical Research Council, UK, join the Human Genome Project
1994 – Mosaic Communications, later Netscape Communications, is founded
1995 – The first genome of a free-living microorganism is sequenced
The genome of Haemophilus influenzae (1,830,137 base pairs) is sequenced by The Institute for Genomic Research (TIGR)
1997 – MegaBACE 1000 automated sequencer (96 capillaries)
5 runs/day = ~200,000 bp
1998 – The ABI Prism 3700 sequencer is launched, with capacity to analyze eight sets of 96 reactions per day = ~308,000 bp
2001 – MegaBACE 4000 automated sequencer (384 capillaries)
Part of an electropherogram generated by automated DNA sequencing
The MegaBACE 1000 DNA Sequencing
System features:
· Capillary electrophoresis with automated gel matrix replacement, sample
injection, DNA separation, and base calling. Unlike slab-gel systems, the
automation of gel replacement eliminates the need to pour gels, wash glass plates,
and re-track samples after electrophoresis.
· Electrokinetic sample injection automatically loads samples from a 96-well plate
into the capillaries. Automated sample injection eliminates manual sample loading
and sample mix up.
· Energy transfer (ET) dye chemistry kits are the most robust and sensitive kits
available. DYEnamic™ ET primers and DYEnamic ET terminator kits are
formulated with Thermo Sequenase™ DNA polymerase and Thermo Sequenase II
DNA polymerase, respectively, and are optimized for use with the MegaBACE.
· The MegaBACE 1000 uses linear polyacrylamide (LPA) separation matrix. LPA is
an innovation in capillary electrophoresis sieving matrices that enables read lengths
in excess of 800 bases. With long read lengths, the MegaBACE 1000 is ideal for
finishing.
· Flexible, user-friendly analysis software produces accurate base-calling data in a
PHRED-compatible file format.
DYEnamic ET Terminators
The dye terminators feature novel dye-labelled dideoxynucleotides and a new
DNA polymerase. Each dideoxy terminator is labelled with two dyes. One of
these dyes, fluorescein, has a large extinction coefficient at the wavelength
(488 nm) of the argon-ion laser in the instrument. The fluorescein donor dye
absorbs light energy from incident laser light and transfers the collected
energy using radiationless energy transfer to an “acceptor” dye.
Each of the four chain terminators, ddG, ddA, ddT, and ddC, has a different
acceptor dye coupled with the fluorescein donor. The acceptor dyes then emit
light at their characteristic wavelengths. The fluorescence is detected by the
instrument, which allows the nucleotide that caused the termination event to
be identified. Using energy transfer provides a more efficient excitation of the
acceptor dyes than does using direct excitation by the laser, and results in a
sequencing method that is very sensitive and robust.
The acceptor dyes are the same standard rhodamine dyes used in DYEnamic
ET primers: rhodamine 110, rhodamine-6-G, tetramethyl rhodamine, and
rhodamine X. By using the standard rhodamine dyes as acceptors, the
reaction products can be detected using the same filter set as the DYEnamic
ET primers.
The DYEnamic ET primers for the MegaBACE system use 5-carboxy-fluorescein
(FAM) as the donor dye. The acceptor dyes are rhodamine 110 (R110) (C), 6-carboxyrhodamine (REG) (A), N,N,N’,N’-tetramethyl-5-carboxyrhodamine
(TAMRA) (G) and 5-carboxy-X-rhodamine (ROX) (T).
Growth in the number of sequences in GenBank
Year    Base pairs        Sequences
1982    680,338           606
1994    217,102,462       215,273
1995    384,939,485       555,694
1996    651,972,984       1,021,211
1997    1,160,300,687     1,765,847
1998    2,008,761,784     2,837,897
1999    3,841,163,011     4,864,570
2000    11,101,066,288    10,106,023
2001    15,849,921,438    14,976,310
2002    28,507,990,166    22,318,883
2003    36,553,368,485    30,968,418
2004    44,575,745,176    40,604,319
Revised: February 16, 2005.
Genome Projects
Many bacterial genome sequencing projects have been completed or are under way around the world, covering a wide range of species such as Mycobacterium tuberculosis, Vibrio cholerae and numerous other bacteria (248 completed and more than 402 in progress).
Several eukaryotic genomes have also been sequenced.
Genome sequencing projects statistics
Organism             Complete   Draft assembly   In progress   Total
Prokaryotes          248        127              275           650
  Archaea            22         3                11            36
  Bacteria           226        124              264           614
Eukaryotes           19         54               131           204
  Animals            4          20               61            85
    Mammals          2          7                18            27
    Birds            -          1                -             1
    Fishes           -          2                2             4
    Insects          1          6                25            32
    Flatworms        -          -                2             2
    Roundworms       1          2                3             6
    Amphibians       -          -                1             1
    Reptiles         -          -                -             0
    Other animals    -          2                12            14
  Plants             2          -                19            21
    Land plants      2          -                15            17
    Green Algae      -          -                4             4
  Fungi              9          25               20            54
    Ascomycetes      7          21               16            44
    Basidiomycetes   1          3                2             6
    Other fungi      1          1                2             4
  Protists           4          9                27            40
    Apicomplexans    2          5                9             14
    Kinetoplasts     2          1                3             6
    Other protists   -          3                14            19
Total                267        181              406           854
Revised: Aug 16, 2005
http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html
Eukaryotic genomes
Human
C. elegans
Plasmodium falciparum
Arabidopsis thaliana
Schizosaccharomyces pombe
Drosophila melanogaster
Whole Genomes
The Draft Genome of Ciona intestinalis: Insights into Chordate and
Vertebrate Origins Paramvir Dehal et al. Science, 13 December 2002, p.
2157-2167
The Genome Sequence of the Malaria Mosquito Anopheles gambiae
Robert A. Holt et al. Science, 4 October 2002, p. 129-149
Whole-Genome Shotgun Assembly and Analysis of the Genome of Fugu
rubripes Samuel Aparicio et al. Science, 23 August 2002, p. 1301-1310;
published online July 25, 2002, 10.1126/science.1072104
A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica)
Jun Yu et al. Science, 5 April 2002, p. 79-92
A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica)
Stephen A. Goff et al. Science, 5 April 2002, p. 92-100
Miniature Genome in the Marine Chordate Oikopleura dioica
Hee-Chan Seo et al. Science, 21 December 2001, p. 2506
Human genome
Nature 431: 931 (2004)
Genome Sequencing Strategies

Shotgun: the genome is randomly fragmented into small overlapping pieces (2.0–4.0 kb). These fragments are cloned into a plasmid (pUC18). Enough clones are sequenced to generate a set of sequences corresponding to several times the genome (approximately 10X). The sequences (reads) are assembled into complete genomic regions with the aid of computer programs.

Clone by clone: a set of clones from each chromosome or genome region is ordered, each clone is sequenced, and the sequences from each clone are assembled into a larger segment.
Sequencing strategies
Shotgun
Clone by clone
Genome coverage
$C = N \cdot r / G$
C = genome coverage
N = number of reads
r = average read length
G = genome size
Example:
C = 8X
r = 400 bp
G = 1 Mb
$8 = N \cdot 400 / 10^{6}$
N = 20,000
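The coverage relation C = N·r/G can be checked numerically. The sketch below uses hypothetical function names and assumes a uniform average read length; with r = 400 bp and G = 1 Mb, 8X coverage corresponds to 20,000 reads.

```python
import math

def coverage(n_reads, read_len, genome_size):
    """Genome coverage C = N * r / G."""
    return n_reads * read_len / genome_size

def reads_needed(target_coverage, read_len, genome_size):
    """Smallest whole number of reads N such that N * r / G >= target coverage."""
    return math.ceil(target_coverage * genome_size / read_len)

genome_size = 1_000_000  # G = 1 Mb
read_len = 400           # r = 400 bp

print(reads_needed(8, read_len, genome_size))    # reads for 8X coverage
print(coverage(20_000, read_len, genome_size))   # coverage given 20,000 reads
print(reads_needed(10, read_len, genome_size))   # reads for 10X coverage
```

Under the same assumptions, raising the target to 10X requires 25,000 reads.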
Genome coverage: average number of times a nucleotide is represented by a high-quality base in random raw sequence.
Full shotgun coverage: genome coverage in random raw sequence required to
produce finished sequence, usually 8-10 fold.
Partial shotgun coverage: typically 3-6X random coverage of a genome which
produces sequence data of sufficient quality to enable gene identification but which is
not sufficient to produce a finished genome sequence
Contig: contiguous DNA sequence produced from joining overlapping raw sequence
reads.
Scaffold: a group of ordered and orientated contigs known to be physically linked to
each other by paired read information.
Finished sequence: complete sequence of a genome with no gaps and an accuracy of > 99.9%.
Shotgun sequencing strategy
Genomic DNA
Size selection
End repair
Ligation into the vector
Library
Sequencing
Assembly
Contigs
Gap closure and annotation
Gap-closure strategy
PACE - PCR-Assisted Contig Extension
Carraro et al. BioTechniques 34:626-632 (March 2003)
Bioinformatics Lab – LNCC
Dr. Ana Tereza Vasconcelos
Diagram showing the steps of a shotgun genome sequencing project. Fraser et al. (Nature 406, 799–803, 2000)
The H. influenzae genome is 1.83 Mb and contains 1740 open reading frames (ORFs)
The M. genitalium genome is 580,074 bp and contains 470 ORFs