Hi Tristan,

>-----Original Message-----
>From: Tristan Fiedler [mailto:tfiedler at rsmas.miami.edu]
>Sent: Friday, December 12, 2003 10:32 AM
>To: Joseph Bedell
>Subject: Thanks for the lectin example
>
>Hi Joey,
>
>1. If the fragments in the query sequence are in an incorrect order, but
>as you indicated, give a single HSP (presumably connected by hashed lines
>in the graphical output), does this mean that only 1 entry (ie line) will
>exist in the blast output table entitled "Sequences producing significant
>alignments:" ?

Whoops, turns out I was wrong about the order issue. Kevin was right, you
need to get them in the correct order so that they are "consistent" HSPs.
At that point BLAST will do its SUM magic and combine them for a better
E-value. The good news is you know which is the N-terminal one so you just
have to switch around the other 2 fragments (for a total of 8? or do you
know the N-term vs. C-term for those tryptic digests?)

Oh, and they won't give a single HSP, it will be 3 different HSPs, but the
E-values will be the same b/c the significance will have been combined.

>2. I have read up in your book on gapped/ungapped alignments, and do not
>understand, (assuming the correct order of sequence fragments in the query
>and also assuming a true hit exists in the database with regions of
>sufficient similarity to all three fragments) how an ungapped (ie -g F )
>alignment will allow for an incorrect number of 'X's in the query
>sequence.

The -g F will just keep BLAST from gapping around the X's. X is an
ambiguity character and gives a negative score (-1 to -3 in BLOSUM80). If
using gapping, it may actually read across these Xs and try to stitch it
together, which is not what I want for the X's. Another way to avoid this
is to use gapping but just put in 50-100 X's. The idea behind the X's is
not to get one alignment for the 3 frags but to have BLAST think of them as
a single Query sequence so it will combine the significance.

>3.
>Is it possible to set the '-g ' option using the NCBI blastp website?

There seems to be something wrong with the website right now, so I can't
check for sure, but you may be able to set -g F in the "Other Advanced" box.
It doesn't list that as one of the acceptable advanced options, but I'd give
it a try. You should also check "Mask for lookup table only". This is called
soft masking: low-complexity sequence is not used for the initial word
search, but extension is still allowed across the masked region. I almost
always use the soft masking option for protein and especially DNA searches.

>4. When I used the NCBI blastp webpage with :
>
>query
>lectin_combined
>MASLQTQMISFYAIXXXXXKVNSTETTSFLITXXXXXKPQTGGGYLGVFNSAEYD
>
>blosum 80, word size of 2, Expect value of 10,000, Gap Exist 11, Extend 1
>
>I did not retrieve the parent sequence (>gi|490035|gb|CAA01149.1| lectin
>[Pisum sativum]) which concerns me? Could you give any insight on what I
>may be doing incorrectly?

Hmmm. Sounds like you are doing it correctly. Is your first hit
"gi|126148|sp|P02867|LEC_PEA"? If so, that's the same protein. The NR db
combines identical sequences and concatenates their deflines. I think
CAA01149 is probably just several entries down the defline ladder and isn't
shown in the BLAST report. I pulled that out of one of our custom databases
so it's not an NR protein.

>5. In the graphical portion of the blast output, I have noticed that
>sometimes the black bars are *not* connected by the hashed lines. Further
>inspection of these shows that they are (to my understanding) completely
>unrelated. For example, two unconnected black bars referred to a UDP
>sugar Hydrolase (S=24, E=393) and a Favin precursor (S=39.2, E=0.011).
>The sequence alignments were in completely different places of the output
>as well. What's the deal with these unconnected black bars? I must be
>missing something on this.

The bars in the report represent all of the different hits to your query.
Ones connected with lines are HSPs from the same subject. When they aren't
connected by lines, they are different database subjects (as you saw
above).

>6. I had many nice conversations and meals with your co-author Mark
>Yandell at the recent CSHL bioinformatics courses. It's nice to interact
>with yet another co-author!
>

Thanks, Tristan. Good talking to you too. I hope I've helped.

Joey
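
P.S. The query-building step from answers 1 and 2 (keep the known
N-terminal fragment first, try the remaining fragments in each order, and
join them with runs of X) could be sketched roughly as below. This is only
an illustration under assumptions: the function name build_queries, the
FASTA names, and the default spacer of 5 X's (taken from the example query
in question 4) are hypothetical choices, and the sketch only permutes
fragment order, nothing else.

```python
from itertools import permutations


def build_queries(n_term, others, spacer_len=5):
    """Return one combined query per ordering of the non-N-terminal fragments.

    n_term     -- the fragment known to be N-terminal (always placed first)
    others     -- the remaining fragments, whose order is unknown
    spacer_len -- number of X residues placed between fragments (assumed value)
    """
    spacer = "X" * spacer_len
    return [spacer.join((n_term,) + order) for order in permutations(others)]


# Fragments split out of the example query in question 4.
fragments = ["MASLQTQMISFYAI", "KVNSTETTSFLIT", "KPQTGGGYLGVFNSAEYD"]
queries = build_queries(fragments[0], fragments[1:])

# Write each ordering as its own FASTA record (record names are made up).
for i, query in enumerate(queries, start=1):
    print(f">lectin_combined_order{i}")
    print(query)
```

Passing spacer_len=50 (or up to 100) gives the longer X runs suggested in
answer 2 for gapped searches. Each record can then be searched separately
and the combined E-values compared across orderings.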