Hi Tristan,

Is your only ambiguity the K or R from the tryptic digest? If so, then with BLOSUM80 you get a positive score in either case (+6 for an exact match and +2 for the similar match), so you could put either amino acid there and it should be okay.

Have you had a chance to try the string of X's in between the fragments? My tests show that this works well. Since BLAST is a local alignment tool, you don't have to worry about getting the order of the fragments correct. The 3 fragments will show up as separate HSPs, but the E-value will represent their combined significance.

I have attached a FASTA file which has a protein plucked out of GenBank (a lectin from pea). I then split it into 3 short pieces (Nterm, internal_1, and internal_2) and also made one combined piece (lectin_combined), which has the 3 fragments strung together with 10 X's between them. You can use this with a blastp search of NR, and you'll see the advantage of stringing the parts together.

The command-line parameters I use are:

blastall -p blastp -i lectin_parts.pep -d nr -M BLOSUM80 -g F -F 'mS' -W 2

Cheers,
Joey

>-----Original Message-----
>From: ssml-general-admin at bioinformatics.org [mailto:ssml-general-admin at bioinformatics.org] On Behalf Of Tristan Fiedler
>Sent: Wednesday, December 10, 2003 3:29 PM
>To: Kevin Karplus
>Cc: dmb at mrc-dunn.cam.ac.uk; t.fiedler at umiami.edu; ssml-general at bioinformatics.org
>Subject: Re: [ssml] Finding Matches using N-term & C-term sequences
>
>Greetings and thank you all for the information. I am working it up
>currently.
>
>Using standard FASTA files and BLAST queries, is it possible to indicate
>sequence ambiguities such as:
>
>RLTGVDA[KR]TEIDKLSE
>
>where [KR] means either Lys or Arg at that position?
>
>If possible, this would make my searching *much* simpler, since each of
>the sequence fragments I am working with has a few residues which are
>ambiguous.
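As far as I know, BLAST does not interpret bracket notation such as [KR] in a FASTA query, and the standard one-letter ambiguity codes cover only a few cases, such as B (D or N), Z (E or Q) and X (any residue). Besides simply putting either residue there, as Joey suggests, one workaround is to enumerate every concrete sequence a bracket pattern allows and search each variant. The Python sketch below assumes the pattern uses exactly the `[KR]` style from the question; the function names `expand_ambiguities` and `to_fasta` and the `_variantN` record names are hypothetical.

```python
import itertools
import re

# One token is either a bracketed set of residues, e.g. [KR], or a single residue.
_TOKEN = re.compile(r"\[([A-Z]+)\]|([A-Z])")


def expand_ambiguities(pattern: str) -> list[str]:
    """Return every concrete sequence allowed by a pattern such as 'RLTGVDA[KR]TEIDKLSE'."""
    choices = []
    pos = 0
    while pos < len(pattern):
        m = _TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}: {pattern[pos]!r}")
        # group(1) holds the residues inside brackets; group(2) a single fixed residue.
        choices.append(m.group(1) or m.group(2))
        pos = m.end()
    # Cartesian product: pick one residue from each position's set of allowed residues.
    return ["".join(combo) for combo in itertools.product(*choices)]


def to_fasta(name: str, pattern: str) -> str:
    """Write each expanded variant as its own FASTA record (hypothetical naming scheme)."""
    records = [
        f">{name}_variant{i}\n{seq}"
        for i, seq in enumerate(expand_ambiguities(pattern), start=1)
    ]
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    print(to_fasta("frag1", "RLTGVDA[KR]TEIDKLSE"), end="")
```

Run on the example from the question, this prints two records, RLTGVDAKTEIDKLSE and RLTGVDARTEIDKLSE. The number of variants doubles with every two-way ambiguous position, so enumeration is only practical when each fragment has a handful of ambiguous residues.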
>
>Happy Holidays,
>
>Tristan
>
>
>>
>> Dan,
>>
>> You mentioned the "product of p-values" method for combining hits from
>> one query to different sequences in the same family:
>> @inproceedings{product-of-p-values,
>>   title="Classifying proteins by family using the product of correlated
>>          p-values",
>>   author="Bailey, Timothy L. and Grundy, William N.",
>>   booktitle=recomb99,
>>   month="April 11-14",
>>   year="1999",
>>   pages="10-14",
>>   publisher="ACM Press"
>> }
>>
>> That is a useful technique, but different from what I was proposing,
>> which is to combine search results from independent queries (the
>> peptides) so that different queries bringing up the same sequence will
>> strongly reinforce the signal for that sequence.
>>
>> Perhaps the best bet is to do as Joseph Bedell suggests: concatenate
>> the peptides with XXXXXXXXXX spacers and use the multi-hit functions
>> already written into BLAST. Since the order of the peptides is unknown,
>> 6 searches should be done, one for each ordering of the three peptides.
>>
>> I may be misunderstanding the problem, but I was assuming that the
>> problem was to identify a protein from an organism that did NOT have a
>> genomic sequencing project near completion, hence the need to look for
>> homologs in other organisms (which may not be very similar). If there
>> is some genomic data, the full-length putative homologs may be used to
>> search the genome of the organism for a match. Once a putative homolog
>> is found, an HMM based on its full-length sequence (created using
>> SAM-T2K, or PSI-BLAST and HMMer) could be used for the search and to
>> identify any regions likely to be highly conserved in the protein. The
>> highly conserved regions may allow designing a primer to fish out the
>> gene itself.
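Kevin's two ideas can also be sketched in Python. The first builds one X-spaced query per ordering of the peptides (3 peptides give 3! = 6 orderings). The second is his proposal of letting independent peptide queries reinforce a shared hit by combining their p-values for that subject. For the combination, this sketch swaps in the standard formula for the product of n independent p-values (equivalent to Fisher's method), not the correlated-p-value method of Bailey and Grundy cited above; independence of the peptide hits is an assumption. The E-to-p conversion p = 1 - exp(-E) is the usual Karlin-Altschul relation. The helper names, fragments and E-values are hypothetical.

```python
import itertools
import math

# Ten X's between fragments, matching the spacer length used in the thread.
SPACER = "X" * 10


def ordered_queries(fragments: dict[str, str]) -> list[tuple[str, str]]:
    """Return (name, sequence) for every ordering of the fragments, joined by SPACER."""
    queries = []
    for order in itertools.permutations(fragments):
        name = "_".join(order)
        seq = SPACER.join(fragments[key] for key in order)
        queries.append((name, seq))
    return queries


def evalue_to_pvalue(evalue: float) -> float:
    """Karlin-Altschul relation P = 1 - exp(-E); expm1 keeps small values accurate."""
    return -math.expm1(-evalue)


def combined_pvalue(pvalues: list[float]) -> float:
    """Significance of the product of n independent p-values.

    If p_1..p_n are independent Uniform(0, 1) and x is their product, then
    P(product <= x) = x * sum_{k=0}^{n-1} (-ln x)^k / k!.
    Assumption: the peptide hits are independent (correlated hits need other methods).
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    x = math.prod(pvalues)
    if x == 0.0:
        return 0.0
    t = -math.log(x)
    return x * sum(t**k / math.factorial(k) for k in range(len(pvalues)))


if __name__ == "__main__":
    # Hypothetical fragments; substitute the real peptide sequences.
    frags = {"pepA": "RLTGVDAKTEIDKLSE", "pepB": "AVDFLK", "pepC": "GLSNEAR"}
    for name, seq in ordered_queries(frags):
        print(f">{name}\n{seq}")
    # Hypothetical E-values of one subject hit separately by each peptide query.
    pvals = [evalue_to_pvalue(e) for e in (0.5, 0.8, 0.3)]
    print(f"combined p-value: {combined_pvalue(pvals):.3g}")
```

Three individually weak hits to the same subject combine into a smaller p-value than any one alone, which is the reinforcing effect Kevin describes; if the hits are not independent, this figure will overstate the significance.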
>>
>> Kevin Karplus   karplus at soe.ucsc.edu   http://www.soe.ucsc.edu/~karplus
>> Professor of Computer Engineering, University of California, Santa Cruz
>> Undergraduate and Graduate Director, Bioinformatics
>> Affiliations for identification only.
>>
>>
>
>
>--
>Tristan J. Fiedler, Ph.D.
>Postdoctoral Research Fellow - Walsh Laboratory
>NIEHS Marine & Freshwater Biomedical Sciences Center
>Rosenstiel School of Marine & Atmospheric Sciences
>University of Miami
>
>tfiedler at rsmas.miami.edu
>t.fiedler at umiami.edu (alias)
>305-361-4626
>_______________________________________________
>ssml-general mailing list
>ssml-general at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/ssml-general

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lectin_parts.pep
Type: application/octet-stream
Size: 501 bytes
Desc: lectin_parts.pep
Url : http://bioinformatics.org/pipermail/ssml-general/attachments/20031212/d0ecd480/lectin_parts.obj