Use of bioinformatics in genomic analysis

[Figure: animation of a read being called base by base, growing from ATCTCGTAGCT to ATCTCGTAGCTAGCTACGACGTCTA; a second sequence, TAGAGCATCGATCGATGCTGCAGATGATGCTAGCATCGGCTAGGCGACG, is marked with Start and End positions.]

Sequence processing

[Figure: chromatogram with a quality axis (ticks 10, 20, 30) over the called sequence acgatctcgctagctgctactgtagccgcgattattcgcgatctacgtatatcgcgatcgatc]

• The Phred program reads the chromatogram and calls the bases
• Each base call has an error probability (e.g. 10% = 0.1)
• The Phred scale is similar to the pH scale multiplied by 10:
  - an error probability of 0.001 = 10^-3 = Phred 30
• Base calling is practically random at the start and at the end of the read, where the error probability is high (low Phred value)

In the Pursuit of Optimal Sequence Trimming Parameters for EST Projects
Fabiano C. Peixoto & J. Miguel Ortega
LCC-CENAPAD BIOINFORMÁTICA UFMG

Noticed:
• BLAST results
• Phred 15
• Too much trimming

[Figure: quality plot (axis ticks 0 to 50) along the read below, with a "Phred 15" marker]

.TGAAGCTTTCAGCTTCTTTAGGAGGATCGTTTTTAGAATCCCCTGCAAC
GTTACCACGGTGGATTTCACTGACTGCGACGTTCTTAACGTTGAATCCAA
CGttGCTACCAgggagagcctcagtaagtgcttcatgatgcatttcgaca
gaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccatacc
aggcttgatgataccagtttcaacgcctcggggccaggctggcgtgaaca
gggcctagcgggtccgcgggggaagggtcccggctcaatccaccaataga
gcggagctaaagtgacgggggcgcca

Query: 469  TTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCG 528
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1038 ttaggaggatcgtttttagaatcccctgcaacgttaccacggtggatttcactgactgcg 979

Query: 529  ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga 588
            ||||||||||||||||| || |||||||||||||||||| ||||||||||||||||||||
Sbjct: 978  acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga 919

Query: 589  tgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccata 648
            |||||||||||||| |||||||||| |||| ||||||||||| |||||||||||||||||
Sbjct: 918  tgcatttcgacagacttgacttcagccgaccaaccttgcggaccaaaagtgacgaccata 859

Query: 649  ccaggcttgatgataccagtttcaacgc 676
            ||||||||||||||||||||||||||||
Sbjct: 858  ccaggcttgatgataccagtttcaacgc 831

Experimental approach

Sequences:
• pUC18 plasmid vector (published sequence)
• Sequencing reaction:
  - Single pool, 3 plates (96 samples)
  - MegaBACE sequencer
  - 3 reads for each plate, esd processing: 846 reads

Processing:
• BLAST (MegaBLAST, as in UniGene)
• Phred
• trim: a chromatogram analyzer
• trim_alt: trim_cutoff parameter from 1% up to 25%

[Figure: number of bases (y-axis, -500 to 200) versus trim_cutoff parameter value (x-axis, 1% to 25%). Series: Included (trim), Discarded (trim), Included (TrimAlt), Discarded (TrimAlt).]

[Figure: BLAST gaps/mismatches (% of bases; y-axis, 0% to 30%) over x-axis ticks from 1% to 25%. Series: total miscall, stepwise miscall. Plot labels: Trim_alt sequence, Additional bases; data labels 3%, 16%, 17%.]
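
The Phred scale from the sequence-processing slide, and the idea of trimming a read against an error-probability cutoff, can be sketched in Python. The conversion Q = -10 * log10(P) is the standard Phred definition, so Phred 30 corresponds to an error probability of 0.001 and Phred 15 to roughly 0.032. The trimming routine is only a Mott-style maximum-scoring-segment sketch, assumed here as an approximation of what a trim_cutoff value controls; it is not Phred's actual implementation, and all function names are hypothetical.

```python
import math


def error_to_phred(p_error: float) -> float:
    """Phred quality for an error probability: Q = -10 * log10(P)."""
    return -10.0 * math.log10(p_error)


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def mott_trim(quals, cutoff=0.05):
    """Return (start, end) of the slice of the read to keep.

    Assumption: each base scores cutoff - P(error). A running sum is
    reset to zero whenever it is no longer positive; the kept segment
    is the one whose running sum reaches the highest value.
    Returns (0, 0) when no base scores above the cutoff.
    """
    best_sum = 0.0
    best = (0, 0)
    running = 0.0
    start = 0
    for i, q in enumerate(quals):
        running += cutoff - phred_to_error(q)
        if running <= 0:
            # Segment so far is not worth keeping; restart after this base.
            running = 0.0
            start = i + 1
        elif running > best_sum:
            best_sum = running
            best = (start, i + 1)
    return best


if __name__ == "__main__":
    # Hypothetical quality values: poor ends, good middle.
    quals = [5, 5, 30, 30, 30, 30, 5, 5]
    start, end = mott_trim(quals, cutoff=0.05)
    print("keep bases", start, "to", end)
```

Raising the cutoff makes low-quality bases less costly, so the kept segment can only grow or stay the same in score terms; this is the trade-off between included and discarded bases that the experiment explores by varying the cutoff from 1% to 25%.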