Greetings, and thank you all for the information. I am working it up
currently.

Using standard FASTA files and BLAST queries, is it possible to indicate
sequence ambiguities such as:

RLTGVDA[KR]TEIDKLSE

where [KR] means either Lys or Arg at that position? If possible, this
would make my searching *much* simpler, since each of the sequence
fragments I am working with has a few residues which are ambiguous.

Happy Holidays,
Tristan

>
> Dan,
>
> You mentioned the "product of p values" method for combining hits with
> one query to different sequences in the same family:
> @inproceedings{product-of-p-values,
> title="Classifying proteins by family using the product of correlated
> p-values",
> author="Bailey, Timothy L. and Grundy, William N.",
> booktitle=recomb99,
> month="April 11-14",
> year="1999",
> pages="10-14",
> publisher="ACM Press"
> }
>
> That is a useful technique, but different from what I was proposing,
> which is to combine search results from independent queries (the peptides)
> so that different queries bringing up the same sequence will strongly
> reinforce the signal for that sequence.
>
> Perhaps the best bet is to do as Joseph Bedell suggests, and
> concatenate the peptides with XXXXXXXXXX spacers, and use the already
> written multi-hit functions in BLAST. Since the order of the peptides
> is unknown, 6 searches should be done, one for each order of the
> residues.
>
> I may be misunderstanding the problem, but I was assuming that the
> problem was to identify a protein from an organism that did NOT have a
> genomic sequencing project near completion. Thus the need to look for
> homologs in other organisms (which may not be very similar).
> If there is some genomic data, the full-length putative homologs may be
> used to search the genome of the organism for a match. Once a putative
> homolog is found, an HMM based on its full-length sequence (created
> using SAM-T2K or PSI-BLAST and HMMer) could be used for the search,
> and to identify any regions likely to be highly conserved in the
> protein. The highly conserved regions may allow designing a primer to
> fish out the gene itself.
>
> Kevin Karplus karplus at soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
> Professor of Computer Engineering, University of California, Santa Cruz
> Undergraduate and Graduate Director, Bioinformatics
> Affiliations for identification only.
>
>

--
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami

tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626
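
P.S. In case it helps frame the question, here is a rough Python sketch of
one possible workaround. It assumes that a plain FASTA query cannot carry
the [KR] bracket notation, so each ambiguous fragment is expanded into every
concrete sequence it allows. It also sketches the spacer idea from the
quoted message: joining the peptides with XXXXXXXXXX in every order, which
gives the 6 orders mentioned if there are three peptides. The function
names, the record naming scheme, and the example peptides in the test lines
are all made up for illustration; none of this comes from BLAST itself.

```python
import itertools
import re


def expand_ambiguous(pattern):
    """Expand e.g. 'RLTGVDA[KR]TEIDKLSE' into every concrete sequence."""
    # Tokenize into bracketed groups ("[KR]") and runs of plain residues.
    # Anything that is not an upper-case letter or a bracket group is ignored.
    tokens = re.findall(r"\[([A-Z]+)\]|([A-Z]+)", pattern)
    # A bracketed group offers one choice per residue; a plain run is a
    # single fixed choice.
    choices = [list(group) if group else [run] for group, run in tokens]
    return ["".join(parts) for parts in itertools.product(*choices)]


def to_fasta(name, pattern):
    """Write each expansion as its own FASTA record (hypothetical naming)."""
    records = []
    for i, seq in enumerate(expand_ambiguous(pattern), start=1):
        records.append(f">{name}_v{i}\n{seq}")
    return "\n".join(records) + "\n"


def concatenated_queries(peptides, spacer="X" * 10):
    """Join the peptides with X spacers, once for every possible order."""
    return [spacer.join(order) for order in itertools.permutations(peptides)]


if __name__ == "__main__":
    print(to_fasta("fragment1", "RLTGVDA[KR]TEIDKLSE"), end="")
```

The number of expanded sequences is the product of the bracket sizes, so
this only stays manageable while each fragment has a few ambiguous
positions, as is the case here.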