BIOINFORMÁTICA UFMG
Genomics and Bioinformatics
2000: Complete genome or death!
1995: ESTs, even if redundant
The end of an EST
A snapshot of a new transcriptome [otorrin...] [...damonh...]
[Diagram: cDNA synthesis, drawn twice. Labels: start (AUG / ATG), end, (A)20, (A)18, 0(T)18, cDNA (+ strand), cDNA (- strand).]
cDNA (+ strand), first drawing: ATCATGACTTACGGGCGCGCGATxxxxxx
cDNA (+ strand), second drawing: GGCGCGCGATATCCxxxx
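The relationship between the two strands in the diagram can be illustrated with a short Python sketch. It takes the (+) strand fragment from the first drawing (without the trailing x placeholders), locates the first ATG, and derives the (-) strand as the reverse complement. The function names are illustrative and do not come from the slides.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def first_start_codon(seq: str) -> int:
    """Return the 0-based index of the first ATG, or -1 if there is none."""
    return seq.upper().find("ATG")


# (+) strand fragment copied from the slide, x placeholders removed.
plus_strand = "ATCATGACTTACGGGCGCGCGAT"
minus_strand = reverse_complement(plus_strand)
start = first_start_codon(plus_strand)

print("ATG found at index", start)
print("(-) strand:", minus_strand)
```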
Life after PHRED 15
Query: untrimmed read. Sbjct: published sequence

Query: 469  TTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCG 528
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1038 ttaggaggatcgtttttagaatcccctgcaacgttaccacggtggatttcactgactgcg 979

Query: 529  ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga 588
            ||||||||||||||||| || |||||||||||||||||| ||||||||||||||||||||
Sbjct: 978  acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga 919

Query: 589  tgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccata 648
            |||||||||||||| |||||||||| |||| ||||||||||| |||||||||||||||||
Sbjct: 918  tgcatttcgacagacttgacttcagccgaccaaccttgcggaccaaaagtgacgaccata 859

Query: 649  ccaggcttgatgataccagtttcaacgc 676
            ||||||||||||||||||||||||||||
Sbjct: 858  ccaggcttgatgataccagtttcaacgc 831
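To make the mismatches in an alignment row explicit, the following Python sketch compares the query and subject strings of the row covering read positions 529 to 588, ignoring case. The helper name `compare_aligned` is hypothetical, and the sketch assumes an ungapped row such as the ones shown here.

```python
def compare_aligned(query: str, sbjct: str):
    """Return (1-based mismatch columns, fraction of identical columns)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    pairs = zip(query.upper(), sbjct.upper())
    mismatches = [i for i, (q, s) in enumerate(pairs, start=1) if q != s]
    identity = 1 - len(mismatches) / len(query)
    return mismatches, identity


# Row copied from the alignment (read positions 529-588).
query_row = "ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga"
sbjct_row = "acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga"

mismatches, identity = compare_aligned(query_row, sbjct_row)
print("mismatch columns:", mismatches)
print(f"identity: {identity:.1%}")
```

For this row the mismatch columns correspond to the gaps in the row of `|` characters.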
When PHRED meets BLAST
pUC18 (published sequence)
Sequencing reaction:
single pool distributed over 3 96-well plates
3 MegaBACE
3 reads each - 846 reads total
Processing:
MegaBLAST (BLASTn, SWAT)
Phred: a chromatogram analyzer
– trim
– trim_alt: increasing trim_cutoff from 1% up to 25%
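Since the slides move between Phred values and error percentages (for example PHRED 10 as 10% error), a small conversion helper may be useful. The sketch below assumes the standard Phred definition Q = -10 log10(p); the function names are illustrative.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality value."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality value for an error probability (0 < p <= 1)."""
    return -10 * math.log10(p)


# Error rates in the range scanned for trim_cutoff (1% to 25%).
for percent in (1, 5, 10, 16, 25):
    q = error_to_phred(percent / 100)
    print(f"{percent:>2}% error ~ Phred {q:.1f}")
```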
PHRED 10 (10% error): only losses
[Figure: number of bases (y axis, from -500 to 200) versus trim_cutoff parameter value (x axis, from 1% to 25%). Series: Included (trim), Discarded (trim), Included (TrimAlt), Discarded (TrimAlt).]
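How a trim cutoff decides which bases are included can be sketched with a Mott-style trimming routine: each base scores the cutoff minus its error probability, and the highest-scoring contiguous segment is kept. This is a minimal sketch under that assumption and is not taken from phred's source code; `mott_trim` and the example quality values are hypothetical.

```python
def mott_trim(quals, cutoff=0.05):
    """Return (start, end) of the kept segment, end exclusive.

    Each base contributes cutoff - error probability; the segment with
    the highest running sum is kept (maximum-sum subarray search).
    """
    best_start = best_end = 0
    best = 0.0
    run_start = 0
    run = 0.0
    for i, q in enumerate(quals):
        run += cutoff - 10 ** (-q / 10)
        if run <= 0:
            # The running segment is no longer worth extending.
            run = 0.0
            run_start = i + 1
        elif run > best:
            best = run
            best_start, best_end = run_start, i + 1
    return best_start, best_end


# Hypothetical read: three good bases followed by a Phred 8 tip.
quals = [30, 30, 30, 8, 8]
print(mott_trim(quals, cutoff=0.05))  # strict cutoff drops the tip
print(mott_trim(quals, cutoff=0.25))  # loose cutoff keeps the tip
```

Raising the cutoff lets lower-quality bases at the tip contribute a positive score, so more bases are included.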
Error occurrence
[Figure: error rate (y axis, from 0% to 30%) with x axis ticks from 1% to 25%. Series: total miscall, stepwise miscall. Labels on the plot: Trimmed reads, Added bases, % error in sequence, % error in the tip. Values marked on the plot: 16%, 17%, 3%.]
Virtual pUC18 protein: STOP = *
>protein_puc18
RQGFPSHDVVKRRPVPSLHACRSTLEDPRVPSSNS*SWS*LFPV*NCYPLTIPHNIRAGS
IKCKAWGA**VS*LTLIALRSLPAFQSGNLSCQLH**IGQRAGRGGLRIGRSSASSLTDS
LRSVVRLRRAVSAHSKAVIRLSTESGDNAGKNM*AKGQQKARNRKKAALLAFFHRLRPPD
EHHKNRRSSQRWRNPTGL*RYQAFPPGSSLVRSPVPTLPLTGYLSAFLPSGSVALSHSSR
CRYLSSV*VVRSKLGCVHEPPVQPDRCALSGNYRLESNPVRHDLSPLAAATGNRISRARY
VGGATEFLKWWPNYGYTRRTVFGICALLKPVTFGKRVGSS*SGKQTTAGSGGFFVCKQQI
TRRKKGSQEDPLIFSTGSDAQWNENSR*GILVMRLSKRIFT*ILLN*K*SFKSI*SIYE*
TWSDSYQCLISEAPISAICLFRSSIVA*LPVV*ITTIREGLPSGPSAAMIPRDPRSPAPD
LSAINQPAGRAERRSGPATLSASIQSINCCREARVSSSPVNSLRNVVAIATGIVVSRSSF
GMASFSSGSQRSRRVT*SPMLCKKAVSSFGPPIVVRSKLAAVLSLMVMAALHNSLTVMPS
VRCFSVTGEYSTKSF*E*CMRRPSCSCPASIRDNTAPHSRTLKVLIIGKRSSGRKLSRIL
PLLRSSSM*PTRAPN*SSASFTFTSVSG*AKTGRQNAAKKGIRATRKC*ILILFLFQYY*
SIYQGYCLMSGYIFECI*KNKQIGVPRTFPRKVPPDV*ETIIIMTLTYKNRRITRPFRLA
RFGDDGENL*HMQLPETVTACL*ADAGSRQARQGASAGVGGCRGWLNYAASEQIVLRVHH
MRCEIPHRCVRRKYRIRRHSPFRLRNCWEGRSVRASSLLRQLAKGGCAARRLSWV
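A virtual protein of this kind, with `*` marking each stop codon, can be produced by translating a nucleotide sequence codon by codon. The sketch below assumes the standard genetic code and a single reading frame; the function name `translate` and the short input are illustrative, and the slides do not state how the sequence above was generated.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame; stops become '*', unknown codons 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Short example: a start codon, a few residues and a stop.
print(translate("ATGACTTACGGGCGCGCGTAA"))
```

Calling `translate` with `frame` set to 0, 1 and 2 on a sequence and on its reverse complement gives the six frames that a translated search such as tBLASTn considers.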
tBLASTn (BLASTx) scores are maximized with PHRED 8
Variation of the tblastn score for pUC18 sequences trimmed with phred at several cutoff values
[Figure: BLASTx score (y axis, from 0 to 500) versus trim_cutoff parameter value, % error (x axis, from 0 to 24). The values 8 and 15 are marked on the plot.]
Summarizing
PHRED meets BLAST: the error rate in the read tips is 16%
Molecules carry a 3% global error rate
Scores for EST vs. amino acid comparisons are maximized with PHRED 8
Real life: crossmatch ends with X’s
Authors:
– Fabiano Peixoto (CENAPAD)
– Francisco Prosdocimi (Lab Biodiversidade)
– Maurício Mudado (Lab Biodados)