Use of bioinformatics in genomic analysis
[Figure: sequencing ladder. The primer ATCTCGTAGCT is extended one base at a time, giving fragments of increasing length up to ATCTCGTAGCTAGCTACGACGTCTA. Bases read in order: A G C T A C G A C G T C T A. Complementary strand shown beneath, from Start to End: TAGAGCATCGATCGATGCTGCAGATGATGCTAGCATCGGCTAGGCGACG]
Sequence processing
[Figure: chromatogram with a Phred quality axis (10 to 30) above the called bases: acgatctcgctagctgctactgtagccgcgattattcgcgatctacgtatatcgcgatcgatc]
• The Phred program reads the chromatogram and calls the bases
• Each base call carries an error probability (for example, 10% = 0.1)
• The Phred scale resembles the pH scale multiplied by 10:
- error probability of 0.001 = 10^-3 = Phred 30
• Base calling is practically random at the start and at the end of a read, where the error probability is high (low Phred value)
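The relation between error probability and Phred value described above can be sketched as follows. This is a minimal illustration of the standard formula Q = -10 * log10(P); the function names are hypothetical and are not part of the Phred program.

```python
import math


def error_to_phred(p_error: float) -> float:
    """Convert a base-call error probability into a Phred quality value."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def phred_to_error(quality: float) -> float:
    """Convert a Phred quality value back into an error probability."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    # Error probabilities of 10%, 1% and 0.1%, as in the slide's example.
    for p in (0.1, 0.01, 0.001):
        print(f"P(error) = {p:<6} Phred = {error_to_phred(p):.0f}")
    # Error probability implied by a Phred value of 15.
    print(f"Phred 15 corresponds to P(error) = {phred_to_error(15):.4f}")
```

Under this formula, a Phred value of 15 corresponds to an error probability of roughly 3%.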
In the Pursuit of Optimal
Sequence Trimming Parameters
for EST Projects
Fabiano C. Peixoto & J. Miguel Ortega
LCC-CENAPAD
BIOINFORMÁTICA UFMG
Noticed:
• BLAST results
• Phred 15
• Too much trimming
[Figure: Phred quality profile along the read (y axis, 0 to 50) with the Phred 15 threshold marked. Read sequence, with upper and lower case as on the slide:]
TGAAGCTTTCAGCTTCTTTAGGAGGATCGTTTTTAGAATCCCCTGCAAC
GTTACCACGGTGGATTTCACTGACTGCGACGTTCTTAACGTTGAATCCAA
CGttGCTACCAgggagagcctcagtaagtgcttcatgatgcatttcgaca
gaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccatacc
aggcttgatgataccagtttcaacgcctcggggccaggctggcgtgaaca
gggcctagcgggtccgcgggggaagggtcccggctcaatccaccaataga
gcggagctaaagtgacgggggcgcca
Query: 469  TTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCG 528
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1038 ttaggaggatcgtttttagaatcccctgcaacgttaccacggtggatttcactgactgcg 979

Query: 529  ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga 588
            ||||||||||||||||| || |||||||||||||||||| ||||||||||||||||||||
Sbjct: 978  acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga 919

Query: 589  tgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccata 648
            |||||||||||||| |||||||||| |||| ||||||||||| |||||||||||||||||
Sbjct: 918  tgcatttcgacagacttgacttcagccgaccaaccttgcggaccaaaagtgacgaccata 859

Query: 649  ccaggcttgatgataccagtttcaacgc 676
            ||||||||||||||||||||||||||||
Sbjct: 858  ccaggcttgatgataccagtttcaacgc 831
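As a rough illustration of how mismatches in such an alignment can be counted, the sketch below compares one row pair of the alignment above (Query 529-588 against Sbjct 978-919) position by position, ignoring case. The function name is hypothetical, and the sketch assumes an ungapped row; it does not reproduce how BLAST itself scores alignments.

```python
def count_mismatches(query_row: str, subject_row: str) -> tuple[int, int]:
    """Return (mismatches, aligned_length) for one ungapped alignment row."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    mismatches = sum(
        1
        for q, s in zip(query_row.upper(), subject_row.upper())
        if q != s
    )
    return mismatches, len(query_row)


if __name__ == "__main__":
    # Row pair copied from the alignment (Query 529-588, Sbjct 978-919).
    query = "ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga"
    subject = "acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga"
    mismatches, length = count_mismatches(query, subject)
    print(f"{mismatches} mismatches in {length} bases "
          f"({100.0 * mismatches / length:.1f}%)")
```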
Experimental approach
Sequences:
• pUC18 plasmid vector (published sequence)
• Sequencing reaction:
• Single pool - 3 plates (96 samples)
• MegaBACE sequencer
• 3 reads for each plate, esd processing - 846 reads
Processing:
• BLAST (MegaBLAST, as in UniGene)
• Phred
• trim: a chromatogram analyzer
• trim_alt: trim_cutoff parameter from 1% to 25%
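To make the role of a trimming cutoff concrete, the sketch below implements one common quality-trimming approach, the modified Mott algorithm: each base scores (cutoff - error probability), and the highest-scoring contiguous segment is kept. Whether this matches the trim or trim_alt implementations used in the study is an assumption, and the function name and the quality values are hypothetical.

```python
def mott_trim(qualities: list[int], cutoff: float = 0.05) -> tuple[int, int]:
    """Return the half-open (start, end) range of the bases to keep.

    Each base contributes (cutoff - P_error); the kept segment is the
    contiguous run with the largest positive total score.
    """
    best_sum = 0.0
    best_range = (0, 0)  # empty range if no base scores above zero
    run_sum = 0.0
    run_start = 0
    for i, q in enumerate(qualities):
        p_error = 10.0 ** (-q / 10.0)
        run_sum += cutoff - p_error
        if run_sum <= 0.0:
            # The running segment is no longer worth keeping; restart.
            run_sum = 0.0
            run_start = i + 1
        elif run_sum > best_sum:
            best_sum = run_sum
            best_range = (run_start, i + 1)
    return best_range


if __name__ == "__main__":
    # Hypothetical read: poor quality at both ends, good in the middle.
    quals = [5, 8, 30, 35, 40, 30, 9, 4]
    for cutoff in (0.05, 0.25):
        start, end = mott_trim(quals, cutoff)
        print(f"cutoff {cutoff:.0%}: keep bases {start}..{end - 1} "
              f"({end - start} bases)")
```

In this toy read, raising the cutoff from 5% to 25% extends the kept segment into the lower-quality flanks, which is the kind of trade-off a sweep of the cutoff from 1% to 25% explores.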
[Figure: number of bases (y axis, -500 to 200) versus trim_cutoff parameter value (x axis, 1% to 25%). Series: Included (trim), Discarded (trim), Included (TrimAlt), Discarded (TrimAlt).]
[Figure: BLAST gaps/mismatches (% of bases; y axis, 0% to 30%) versus trim_cutoff parameter value (x axis, 1% to 25%). Series: total miscall, stepwise miscall. Labels on the plot: Trim_alt sequence, Additional bases, BLAST; annotated values: 16%, 17%, 3%.]