[ssml] Finding Matches using N-term & C-term sequences
Tristan Fiedler
tfiedler at rsmas.miami.edu
Wed Dec 10 16:29:11 EST 2003
Greetings, and thank you all for the information. I am working through it
now.
Using standard fasta files and blast queries, is it possible to indicate
sequence ambiguities such as:
RLTGVDA[KR]TEIDKLSE
where [KR] means either Lys or Arg at that position?
If possible, this would make my searching *much* simpler, since each of
the sequence fragments I am working with has a few residues which are
ambiguous.
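
If bracket notation turns out not to be accepted in a plain FASTA query (as
far as I know, the one-letter ambiguity codes cover only a few fixed pairs
such as B for Asp/Asn and Z for Glu/Gln, plus X for any residue, with nothing
for Lys/Arg), one fallback is to expand each fragment into every concrete
sequence it allows and search with all of them. A minimal sketch, assuming
the bracket syntax shown above; the function names and the FASTA header
scheme are made up for illustration:

```python
import re
from itertools import product


def expand_ambiguous(pattern):
    """Expand bracketed alternatives, e.g. 'A[KR]T' -> ['AKT', 'ART'].

    Assumes uppercase one-letter residue codes; anything that is not a
    plain run of letters or a [..] group is silently ignored.
    """
    # Each token is either a bracket group (first slot) or a literal run.
    tokens = re.findall(r"\[([A-Z]+)\]|([A-Z]+)", pattern)
    choices = [list(group) if group else [literal] for group, literal in tokens]
    return ["".join(parts) for parts in product(*choices)]


def to_fasta(name, pattern):
    """Return one FASTA record per concrete variant of the pattern."""
    variants = expand_ambiguous(pattern)
    return "".join(f">{name}_v{i}\n{seq}\n" for i, seq in enumerate(variants, 1))


if __name__ == "__main__":
    print(to_fasta("frag1", "RLTGVDA[KR]TEIDKLSE"), end="")
```

The number of variants is the product of the bracket sizes, so a fragment
with a few two-way ambiguities stays small (three positions give 8 records).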
Happy Holidays,
Tristan
>
> Dan,
>
> You mentioned the "product of p values" method for combining hits with
> one query to different sequences in the same family:
> @inproceedings{product-of-p-values,
> title="Classifying proteins by family using the product of correlated
> p-values",
> author="Bailey, Timothy L. and Grundy, William N.",
> booktitle=recomb99,
> month="April 11-14",
> year="1999",
> pages="10-14",
> publisher="ACM Press"
> }
>
> That is a useful technique, but different from what I was proposing,
> which is to combine search results from independent queries (the peptides)
> so that different queries bringing up the same sequence will strongly
> reinforce the signal for that sequence.
>
> Perhaps the best bet is to do as Joseph Bedell suggests, and
> concatenate the peptides with XXXXXXXXXX spacers, and use the already
> written multi-hit functions in BLAST. Since the order of the peptides
> is unknown, 6 searches should be done, one for each order of the
> peptides.
>
> I may be misunderstanding the problem, but I was assuming that the
> problem was to identify a protein from an organism that did NOT have a
> genomic sequencing project near completion. Thus the need to look for
> homologs in other organisms (which may not be very similar). If there
> is some genomic data, the full-length putative homologs may be used to
> search the genome of the organism for a match. Once a putative homolog is
> found, an HMM based on its full-length sequence (created using SAM-T2K
> or PSI-BLAST and HMMer) could be used for the search, and to identify
> any regions likely to be highly conserved in the protein. The highly
> conserved regions may allow designing a primer to fish out the gene
> itself.
>
> Kevin Karplus karplus at soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
> Professor of Computer Engineering, University of California, Santa Cruz
> Undergraduate and Graduate Director, Bioinformatics
> Affiliations for identification only.
>
>
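
The spacer idea in the quoted message (joining the peptides with XXXXXXXXXX
and trying each order) can be sketched as below. This only builds the query
sequences; it does not run BLAST. The function names and header labels are
hypothetical, and the peptides in the usage example are placeholders, not
real data:

```python
from itertools import permutations

SPACER = "X" * 10  # the XXXXXXXXXX spacer suggested in the quoted message


def concatenated_queries(peptides, spacer=SPACER):
    """Yield (order, sequence) for every ordering of the peptides."""
    for order in permutations(range(len(peptides))):
        yield order, spacer.join(peptides[i] for i in order)


def queries_as_fasta(peptides):
    """Return one FASTA record per ordering, labelled by peptide order."""
    records = []
    for order, seq in concatenated_queries(peptides):
        label = "-".join(str(i + 1) for i in order)
        records.append(f">order_{label}\n{seq}\n")
    return "".join(records)


if __name__ == "__main__":
    # Placeholder peptides for illustration only.
    print(queries_as_fasta(["AAA", "CCC", "DDD"]), end="")
```

Three peptides give the 6 orderings mentioned above; the count grows as n!,
so this is only practical for a handful of fragments.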
--
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami
tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626