[ssml] Finding Matches using N-term & C-term sequences

Tristan Fiedler tfiedler at rsmas.miami.edu
Wed Dec 10 16:29:11 EST 2003


Greetings, and thank you all for the information.  I am currently
working through it.

Using standard FASTA files and BLAST queries, is it possible to indicate
sequence ambiguities such as:

RLTGVDA[KR]TEIDKLSE

where [KR] means either Lys or Arg at that position?

If possible, this would make my searching *much* simpler, since each of
the sequence fragments I am working with has a few residues which are
ambiguous.
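
In case the bracket notation is not accepted directly, one workaround
is to expand each ambiguous fragment into every concrete sequence it
stands for and write those out as ordinary FASTA records.  The sketch
below is only an illustration of that idea; the function names and the
record naming scheme are made up for this example, and it assumes the
pattern uses only uppercase one-letter codes and [..] groups.

```python
import itertools
import re


def expand_ambiguous(pattern):
    """Expand a pattern such as 'A[KR]T' into every concrete sequence."""
    # Each token is either a bracketed group of alternatives or one residue.
    tokens = re.findall(r"\[([A-Z]+)\]|([A-Z])", pattern)
    choices = [group if group else single for group, single in tokens]
    # The Cartesian product over positions yields every combination.
    return ["".join(combo) for combo in itertools.product(*choices)]


def to_fasta(name, pattern):
    """Format the expanded sequences as FASTA text (hypothetical naming)."""
    sequences = expand_ambiguous(pattern)
    return "".join(
        f">{name}_{i}\n{seq}\n" for i, seq in enumerate(sequences, 1)
    )


if __name__ == "__main__":
    print(to_fasta("fragment1", "RLTGVDA[KR]TEIDKLSE"), end="")
```

The number of records doubles with every two-way ambiguity, so this
stays practical only while each fragment has a few ambiguous residues.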

Happy Holidays,

Tristan


>
> Dan,
>
> You mentioned the "product of p values" method for combining hits with
> one query to different sequences in the same family:
> @inproceedings{product-of-p-values,
> 	title="Classifying proteins by family using the product of correlated
> p-values",
> 	author="Bailey, Timothy L. and Grundy, William N.",
> 	booktitle=recomb99,
> 	month="April 11-14",
> 	year="1999",
> 	pages="10-14",
> 	publisher="ACM Press"
> 	}
>
> That is a useful technique, but different from what I was proposing,
> which is to combine search results from independent queries (the peptides)
> so that different queries bringing up the same sequence will strongly
> reinforce the signal for that sequence.
>
> Perhaps the best bet is to do as Joseph Bedell suggests, and
> concatenate the peptides with XXXXXXXXXX spacers, and use the already
> written multi-hit functions in BLAST.  Since the order of the peptides
> is unknown, 6 searches should be done, one for each order of the
> peptides.
>
> I may be misunderstanding the problem, but I was assuming that the
> problem was to identify a protein from an organism that did NOT have a
> genomic sequencing project near completion.  Thus the need to look for
> homologs in other organisms (which may not be very similar).  If there
> is some genomic data, the full-length putative homologs may be used to
> search the genome of the organism for a match.  Once a putative homolog
> is found, an HMM based on its full-length sequence (created using
> SAM-T2K or PSI-BLAST and HMMer) could be used for the search,
> and to identify any regions likely to be highly conserved in the
> protein.  The highly conserved regions may allow designing a primer to
> fish out the gene itself.
>
> Kevin Karplus 	karplus at soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
> Professor of Computer Engineering, University of California, Santa Cruz
> Undergraduate and Graduate Director, Bioinformatics
> Affiliations for identification only.
>
>
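
The spacer suggestion quoted above (join the peptides with XXXXXXXXXX
and search once per ordering) could be set up roughly as follows.
This is a sketch only: the three peptide strings are placeholders, not
real data, and the function name is invented for the example.  It just
builds the query strings; running the searches is left to BLAST.

```python
from itertools import permutations

# Ten X residues, matching the spacer written in the quoted message.
SPACER = "X" * 10


def concatenated_queries(peptides, spacer=SPACER):
    """Return one spacer-joined query string per ordering of the peptides."""
    return [spacer.join(order) for order in permutations(peptides)]


if __name__ == "__main__":
    # Placeholder peptides for illustration only.
    peptides = ["AAAA", "CCCC", "DDDD"]
    for i, query in enumerate(concatenated_queries(peptides), 1):
        print(f">order_{i}\n{query}")
```

With three peptides this gives the 6 orderings mentioned in the quote;
the count grows factorially with the number of peptides.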


-- 
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami

tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626


