BIOINFORMÁTICA UFMG
Genomics and Bioinformatics

2000: Complete genome or death!
1995: ESTs, even if redundant

The end of an EST

A snapshot of a new transcriptome
[otorrin...] [...damonh...]

[Figure: cDNA reads taken from the two ends of a transcript. Labels: AUG, (A)20, (A)18, (T)18.]
start (AUG / ATG):
– cDNA (+ strand): ATCATGACTTACGGGCGCGCGATxxxxxx
– cDNA (- strand)
end:
– cDNA (+ strand): GGCGCGCGATATCCxxxx
– cDNA (- strand)

Life after PHRED 15

Query: non-trimmed read. Sbjct: published sequence.

Query: 469  TTAGGAGGATCGTTTTTAGAATCCCCTGCAACGTTACCACGGTGGATTTCACTGACTGCG 528
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1038 ttaggaggatcgtttttagaatcccctgcaacgttaccacggtggatttcactgactgcg 979

Query: 529  ACGTTCTTAACGTTGAATCCAACGTTGCTACCAgggagagcctcagtaagtgcttcatga 588
            ||||||||||||||||| || |||||||||||||||||| ||||||||||||||||||||
Sbjct: 978  acgttcttaacgttgaagcccacgttgctaccagggagaccctcagtaagtgcttcatga 919

Query: 589  tgcatttcgacagaattgacttcagtcgacaaaccttgcggagcaaaagtgacgaccata 648
            |||||||||||||| |||||||||| |||| ||||||||||| |||||||||||||||||
Sbjct: 918  tgcatttcgacagacttgacttcagccgaccaaccttgcggaccaaaagtgacgaccata 859

Query: 649  ccaggcttgatgataccagtttcaacgc 676
            ||||||||||||||||||||||||||||
Sbjct: 858  ccaggcttgatgataccagtttcaacgc 831

When PHRED meets BLAST

pUC18 (published sequence)
Sequencing reaction: single pool distributed over 3 96-well plates
3 MegaBACE, 3 reads each: 846 reads total
Processing:
– MegaBLAST (BLASTn, SWAT)
– Phred, a chromatogram analyzer
– trim
– trim_alt: increasing trim_cutoff from 1% up to 25%

PHRED 10 (10% error): only losses

[Figure: number of bases (y axis, -500 to 200) against the trim_cutoff parameter value (x axis, 1% to 25%). Series: Included (trim), Discarded (trim), Included (TrimAlt), Discarded (TrimAlt).]

Error occurrence: 16%

[Figure: error rate (y axis, 0% to 30%) against the trim_cutoff parameter value (x axis, 1% to 25%). Series: total miscall, stepwise miscall. Labels: Trimmed reads, Added bases, % error in sequence (3%), % error in the tip (17%).]

Virtual pUC18 protein: STOP = *

>protein_puc18
RQGFPSHDVVKRRPVPSLHACRSTLEDPRVPSSNS*SWS*LFPV*NCYPLTIPHNIRAGS
IKCKAWGA**VS*LTLIALRSLPAFQSGNLSCQLH**IGQRAGRGGLRIGRSSASSLTDS
LRSVVRLRRAVSAHSKAVIRLSTESGDNAGKNM*AKGQQKARNRKKAALLAFFHRLRPPD
EHHKNRRSSQRWRNPTGL*RYQAFPPGSSLVRSPVPTLPLTGYLSAFLPSGSVALSHSSR
CRYLSSV*VVRSKLGCVHEPPVQPDRCALSGNYRLESNPVRHDLSPLAAATGNRISRARY
VGGATEFLKWWPNYGYTRRTVFGICALLKPVTFGKRVGSS*SGKQTTAGSGGFFVCKQQI
TRRKKGSQEDPLIFSTGSDAQWNENSR*GILVMRLSKRIFT*ILLN*K*SFKSI*SIYE*
TWSDSYQCLISEAPISAICLFRSSIVA*LPVV*ITTIREGLPSGPSAAMIPRDPRSPAPD
LSAINQPAGRAERRSGPATLSASIQSINCCREARVSSSPVNSLRNVVAIATGIVVSRSSF
GMASFSSGSQRSRRVT*SPMLCKKAVSSFGPPIVVRSKLAAVLSLMVMAALHNSLTVMPS
VRCFSVTGEYSTKSF*E*CMRRPSCSCPASIRDNTAPHSRTLKVLIIGKRSSGRKLSRIL
PLLRSSSM*PTRAPN*SSASFTFTSVSG*AKTGRQNAAKKGIRATRKC*ILILFLFQYY*
SIYQGYCLMSGYIFECI*KNKQIGVPRTFPRKVPPDV*ETIIIMTLTYKNRRITRPFRLA
RFGDDGENL*HMQLPETVTACL*ADAGSRQARQGASAGVGGCRGWLNYAASEQIVLRVHH
MRCEIPHRCVRRKYRIRRHSPFRLRNCWEGRSVRASSLLRQLAKGGCAARRLSWV

tBLASTn (BLASTx) scores maximize with PHRED 8

Score variation using tblastn on pUC18 sequences trimmed with phred at several cutoff values

[Figure: BLASTx score (y axis, 0 to 500) against the trim_cutoff parameter value, in % error (x axis, 0 to 24). Points labeled 8 and 15.]

Summarizing

– PHRED meets BLAST where errors in the tip are 16%
– Molecules carry 3% global error
– And scores for EST vs aa comparisons maximize
– Real life: crossmatch ends with X's

Authors:
– Fabiano Peixoto (CENAPAD)
– Francisco Prosdocimi (Lab Biodiversidade)
– Maurício Mudado (Lab Biodados)
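
The PHRED values quoted in the slides (8, 10, 15) and the trim_cutoff percentages describe the same quantity on two scales, related by the standard definition Q = -10 log10(P), where P is the per-base error probability. The sketch below converts between the two and adds a simplified error-based trimmer to illustrate why raising the cutoff keeps more bases at the read tips. The function names (`phred_to_error`, `error_to_phred`, `trim_by_error`) are hypothetical, and the trimmer is a maximum-sum-segment illustration under that assumption, not phred's actual trim or trim_alt implementation.

```python
import math


def phred_to_error(q):
    """Per-base error probability for a PHRED quality value q."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """PHRED quality value for a per-base error probability p."""
    return -10 * math.log10(p)


def trim_by_error(quals, cutoff=0.05):
    """Return (start, end) of the segment maximizing sum(cutoff - error).

    Illustrative only: each base scores cutoff minus its error
    probability, and the best-scoring contiguous run is kept.
    The interval is half-open, so quals[start:end] is the kept region.
    """
    best_sum, best = 0.0, (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, q in enumerate(quals):
        cur_sum += cutoff - phred_to_error(q)
        if cur_sum <= 0:
            # Running score exhausted: restart after this base.
            cur_sum, cur_start = 0.0, i + 1
        elif cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i + 1)
    return best


if __name__ == "__main__":
    for q in (8, 10, 15):
        print(f"PHRED {q}: {phred_to_error(q):.1%} error")

    # Toy read: low-quality tips (Q8) around a high-quality core (Q30).
    quals = [8, 8, 30, 30, 30, 30, 8, 8]
    print(trim_by_error(quals, cutoff=0.05))  # tips trimmed
    print(trim_by_error(quals, cutoff=0.25))  # tips kept
```

Under this definition PHRED 8 corresponds to roughly 16% error, PHRED 10 to 10%, and PHRED 15 to about 3%, which lines up with the 16% tip error and 3% global error reported in the summary. On the toy read, the 5% cutoff keeps only the Q30 core, while the 25% cutoff also keeps the Q8 tips.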