<html><div style='background-color:'><DIV></DIV>

<DIV></DIV><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt">Hey Bioinformatics Board,</SPAN></B>

<DIV></DIV>

<P class=MsoNormal><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt">     Long time reader, first time poster. I have a big program due tomorrow morning, and I'm having major problems with it. It seems like a simple, basic Perl program that any professional in the bioinformatics field should be able to help with. </SPAN></B></P>

<P class=MsoNormal><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt">    I'd love it if you could just figure out how to do this program and send it back to me asap, by tonight if possible... Tuesday, May 7th. </SPAN></B></P>

<P class=MsoNormal><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt">    Desperate students call for desperate measures.  :)  Please email program to <A href="mailto:StarMachina@hotmail.com">StarMachina@hotmail.com</A></SPAN></B></P>

<P class=MsoNormal><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt"> </SPAN></B></P>

<P class=MsoNormal><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt">    Gracias fellow bioinformaticists...</SPAN></B></P>

<DIV></DIV>

<P class=MsoNormal><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt"></SPAN></B> </P>

<DIV></DIV>

<P class=MsoNormal><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt"></SPAN></B> </P>

<P class=MsoNormal><B><SPAN style="FONT-SIZE: 14pt; mso-bidi-font-size: 12.0pt">Programming Assignment: </SPAN></B></P>

<DIV></DIV>

<P class=MsoNormal>For this assignment you are to write a program which will translate an mRNA sequence into a list of amino acids.<SPAN style="mso-spacerun: yes">  </SPAN>To do this, I have provided you a file which contains the information found in a standard table of the genetic code for mRNA to protein translation.<SPAN style="mso-spacerun: yes">  </SPAN>That file is called Base2Protein.dat.<SPAN style="mso-spacerun: yes">  </SPAN>Each line of the file contains a 3-base RNA sequence and its associated amino acid.<SPAN style="mso-spacerun: yes">  </SPAN>The last line of the file is XXX to indicate the end of the data.<SPAN style="mso-spacerun: yes">  </SPAN></P>

<DIV></DIV>

<P class=MsoNormal> <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>You are also provided a file called rnaSequence which is a randomly generated sequence of the letters A, U, C, and G.</P>

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>Your program is to mimic the action of mRNA to protein translation.<SPAN style="mso-spacerun: yes">  </SPAN>As you know, the three-base sequence AUG is where translation begins.<SPAN style="mso-spacerun: yes">  </SPAN>So your program must find that sequence and translate until one of the STOP sequences is encountered.<SPAN style="mso-spacerun: yes">  </SPAN>There may be several places where AUG occurs, that is, there may be several "proteins" encoded in the file.<SPAN style="mso-spacerun: yes">  </SPAN>Also, at the end of the file it may be that you find an occurrence of AUG but the end of the file is reached before a STOP codon is found.<SPAN style="mso-spacerun: yes">  </SPAN>In that case, you need to inform the user that translation was prematurely halted.</P>

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>Here is the basic strategy you should follow.<SPAN style="mso-spacerun: yes">  </SPAN>All steps should be clearly separate in your program, clearly delimited and commented.</P>

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal><B>step 1.)</B> Open the file Base2Protein.dat for input.<SPAN style="mso-spacerun: yes">  </SPAN>Read through it, creating a hash holding the 3-base sequences and their associated amino acids as you go.<SPAN style="mso-spacerun: yes">  </SPAN>In other words, you want a hash called %codes, where, for example, $codes{UUU} is "Phe".<SPAN style="mso-spacerun: yes">  </SPAN>Now, when you read one line of the file, you need to get the two "tokens", that is, strings, that appear on each line.<SPAN style="mso-spacerun: yes">  </SPAN>(Use <I>split</I> to do this, splitting the line on whitespace.)<SPAN style="mso-spacerun: yes">  </SPAN>Now, if the tokens from the line you just split are in the array @tokens, you want to say $codes{$tokens[0]} = $tokens[1].<B> <SPAN style="mso-spacerun: yes">  </SPAN></B>In the above example, this associates the mRNA codon UUU with the amino acid Phe.<B><o:p></o:p></B></P>
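
<P class=MsoNormal>A minimal sketch of step 1, assuming the file layout described above (one codon and one amino acid per line, with an XXX line marking the end of the data). The sub name read_codes and the in-memory sample are hypothetical, used only so the example runs without the real Base2Protein.dat.</P>

<XMP>
```perl
use strict;
use warnings;

# read_codes is a hypothetical helper name, not part of the assignment.
# It builds the codon table from an already opened filehandle and stops
# at the XXX sentinel line.
sub read_codes {
    my ($fh) = @_;
    my %codes;
    while (my $line = <$fh>) {
        my @tokens = split ' ', $line;    # split on whitespace
        next unless @tokens >= 2;         # skip blank or malformed lines
        last if $tokens[0] eq 'XXX';      # sentinel: end of data
        $codes{ $tokens[0] } = $tokens[1];
    }
    return %codes;
}

# Demo on an in-memory sample. The real program would instead do:
#   open(my $fh, '<', 'Base2Protein.dat') or die "Cannot open Base2Protein.dat: $!";
my $sample = "UUU Phe\nAUG Met\nUAA STOP\nXXX EndOfData\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my %codes = read_codes($fh);
close $fh;

print "$_ => $codes{$_}\n" for sort keys %codes;
```
</XMP>

<P class=MsoNormal>Using split with a single-space string as the separator discards leading whitespace and the trailing newline, so no separate chomp is needed.</P>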

<DIV></DIV>

<P class=MsoNormal><B> <o:p></o:p></B></P>

<DIV></DIV>

<P class=MsoNormal><B>step 2.)</B> Read and translate the sequence in the input file.<SPAN style="mso-spacerun: yes">  </SPAN>This part of the program should read all input from STDIN and write all output to <SPAN style="mso-spacerun: yes"> </SPAN>STDOUT.<SPAN style="mso-spacerun: yes">  </SPAN>You are assuming that the user will redirect the input file at least and maybe the output file as well.<SPAN style="mso-spacerun: yes">  </SPAN>What you need to do is to find the next occurrence of AUG.<SPAN style="mso-spacerun: yes">  </SPAN>At the position where the AUG <I>begins</I>, you need to start getting bases three at a time and, using the hash you created in step 1, send their associated amino acid abbreviations to the output file.<SPAN style="mso-spacerun: yes">  </SPAN></P>
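
<P class=MsoNormal>One possible shape for the core of step 2, as a sketch only. It assumes the whole sequence fits in memory, uses a three-entry stand-in for the %codes hash from step 1, and stops at an unknown codon (the assignment does not say what to do in that case). The names read_sequence and translate_first are hypothetical.</P>

<XMP>
```perl
use strict;
use warnings;

# Tiny illustrative table; the real program would use the hash from step 1.
my %codes = (AUG => 'Met', GGC => 'Gly', UGA => 'STOP');

# Hypothetical helper: slurp a sequence from a filehandle and strip all
# whitespace, newlines included.
sub read_sequence {
    my ($fh) = @_;
    my $seq = do { local $/; <$fh> };
    $seq = '' unless defined $seq;
    $seq =~ s/\s+//g;
    return $seq;
}

# Hypothetical helper: translate from the first AUG at or after $from
# until a STOP codon, an unknown codon, or the end of the sequence.
sub translate_first {
    my ($seq, $codes, $from) = @_;
    my $start = index($seq, 'AUG', $from || 0);
    return () if $start < 0;
    my @protein;
    for (my $pos = $start; $pos + 3 <= length $seq; $pos += 3) {
        my $amino = $codes->{ substr($seq, $pos, 3) };
        last if !defined $amino || $amino eq 'STOP';
        push @protein, $amino;
    }
    return @protein;
}

# The real program would pass \*STDIN; an in-memory handle stands in here.
my $input = "CCAUGG\nGCUGACC\n";
open(my $in, '<', \$input) or die "Cannot open in-memory input: $!";
my $seq = read_sequence($in);
close $in;

print join(' ', translate_first($seq, \%codes)), "\n";   # prints "Met Gly"
```
</XMP>

<P class=MsoNormal>With input redirected on the command line, the same subs would be called as read_sequence(\*STDIN), and print already writes to STDOUT.</P>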

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>Here is a feature of regular expressions that is not mentioned above but is useful here:</P>

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal><SPAN style="mso-spacerun: yes">                   </SPAN>/[AUCG]{3}/</P>

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>denotes a pattern which consists of <I>exactly three</I> occurrences of letters from [AUCG].</P>
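
<P class=MsoNormal>As a small illustration of that pattern, the sketch below uses it with the /g modifier in list context to cut a string into consecutive three-base codons. The sample string is made up for the example.</P>

<XMP>
```perl
use strict;
use warnings;

my $rna = 'AUGGCCUAAGG';

# In list context, a /g match with no capture groups returns every match.
# Each match starts where the previous one ended, so the string is read
# three bases at a time; the leftover "GG" is too short to match.
my @codons = $rna =~ /[AUCG]{3}/g;

print join(' ', @codons), "\n";   # prints "AUG GCC UAA"
```
</XMP>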

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>Your program must process the sequence file in a case-insensitive manner.</P>
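
<P class=MsoNormal>One simple way to meet the case-insensitivity requirement, offered as a sketch rather than the required approach, is to upper-case the sequence once right after reading it, so that every later comparison and hash lookup can assume upper case. The sub name normalize_sequence is hypothetical.</P>

<XMP>
```perl
use strict;
use warnings;

# normalize_sequence is a hypothetical helper name, not part of the assignment.
sub normalize_sequence {
    my ($raw) = @_;
    my $seq = uc $raw;      # 'aug' and 'AuG' both become 'AUG'
    $seq =~ s/\s+//g;       # drop newlines and any other whitespace
    return $seq;
}

print normalize_sequence("ccAugGcc\nuaa\n"), "\n";   # prints "CCAUGGCCUAA"
```
</XMP>

<P class=MsoNormal>The alternative is to add the /i modifier to every pattern and to upper-case each codon before looking it up in %codes, which is easier to get wrong in one place.</P>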

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>Don't forget that there may be more than one "protein" encoded in the input file.<SPAN style="mso-spacerun: yes">  </SPAN>What you want to do is find where the first one starts, then translate it to the output file.<SPAN style="mso-spacerun: yes">  </SPAN>When you are done with the first "protein", that is, when you have encountered a STOP codon, <SPAN style="mso-spacerun: yes"> </SPAN>look for the next occurrence of AUG which is after the place where your translation left off.<SPAN style="mso-spacerun: yes">  </SPAN>Keep doing this until you fail to find another occurrence of AUG or you hit the end of the file during translation (in which case output a message indicating premature end of translation, as described above).</P>
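
<P class=MsoNormal>The loop described above might be sketched as follows. This is one reading of the requirements, not the instructor's solution: it assumes the sequence is already upper case with whitespace removed, uses a small stand-in codon table, marks unknown codons as ???, and separates amino acids with hyphens. The sub name translate_all and the wording of the message are hypothetical.</P>

<XMP>
```perl
use strict;
use warnings;

# Illustrative subset of the codon table; the real program would load the
# full table from Base2Protein.dat.
my %codes = (
    AUG => 'Met',  GCC => 'Ala',  UUU => 'Phe',
    UAA => 'STOP', UAG => 'STOP', UGA => 'STOP',
);

# Hypothetical helper: returns one output line per "protein" found in $seq.
sub translate_all {
    my ($seq, $codes) = @_;
    my @lines;
    my $pos = 0;
    # Look for the next AUG at or after the point where translation left off.
    while ((my $start = index($seq, 'AUG', $pos)) >= 0) {
        my @protein;
        my $stopped = 0;
        $pos = $start;
        while ($pos + 3 <= length $seq) {
            my $amino = $codes->{ substr($seq, $pos, 3) };
            $amino = '???' unless defined $amino;   # assumption: flag unknown codons
            $pos += 3;
            if ($amino eq 'STOP') { $stopped = 1; last; }
            push @protein, $amino;
        }
        my $line = join('-', @protein);
        $line .= ' (translation prematurely halted: end of sequence reached)'
            unless $stopped;
        push @lines, $line;
        last unless $stopped;    # nothing left to scan after a premature halt
    }
    return @lines;
}

print "$_\n" for translate_all('CCAUGGCCUAAGGAUGUUUGC', \%codes);
```
</XMP>

<P class=MsoNormal>On the sample string this prints Met-Ala for the first "protein" and Met-Phe followed by the premature-halt message for the second, because the sequence ends before a STOP codon is reached.</P>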

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>You may work in teams of three on this program.<SPAN style="mso-spacerun: yes">  </SPAN>Let me know who you are working with.<SPAN style="mso-spacerun: yes">  </SPAN>You must write the program one step at a time.<SPAN style="mso-spacerun: yes">  </SPAN>In other words, don't try step 2 until you are certain that step 1 works.<SPAN style="mso-spacerun: yes">  </SPAN>Furthermore, each sub-step must be written and tested in order, i.e., opening and reading the file Base2Protein must be accomplished before you attempt to create the hash.<SPAN style="mso-spacerun: yes">  </SPAN>If you ask for my help at some step – say in creating the hash – I will first insist on seeing an earlier version that opens and reads the file Base2Protein. </P>

<DIV></DIV>

<P class=MsoNormal> <o:p></o:p></P>

<DIV></DIV>

<P class=MsoNormal>This program will be due on Wednesday, May 8th<SPAN style="mso-spacerun: yes"> @ 10am.</SPAN></P>

<DIV></DIV>

<P class=MsoNormal><SPAN style="mso-spacerun: yes"> </SPAN></P>

<DIV></DIV>

<P class=MsoNormal><STRONG>sequence.dat file:</STRONG></P><XMP>GCAGTCGCACCAAATGTGTTTCGAATGAGTTTTGTGAACTAATCCTGATTTCATCGCTACATGCGAACAATAAACCGGCATCACAATCGAGGTCTTTAGTCACTTCCATCGGCTAATTTTAACACTGGCATCACCTGTATGAAGCGACTGAGCTGGGGTCAAGAGAAAGAGACACTCAGAGAGACGCGATTTTGGCGGACTCAATCACAAGGAGCGGGGATGAGGTTTCTTGCACATGCCCAGGGCTCCAGCGTAGATGCCTTGCGAACCCCCTCTGACCGAGTACACCCTTGTGCACCCCCTTTTCCCGGCCCCCTTCGCAACTTGCTTATGCCGTGGTGAGTCTCGGCAACGTAGTGGACCTGCGCAAGCTTTGCGTCGACATCGATACCCACATGTATGGGTAGAACTACCGGAACACCGCCATTCCGACGAAGTACAAATCTCGCCGGTTCCGAGCAGGTTCGGGATTATGGACCAGCCCGTTGTAGGTTAGTAATGTGAACCAGTGTGGACCCGTATCATCTGCTTAGCTTGTTGCAGTGTAGCACGGATGTTTAGATGGCTCAAAACAATTGCTCCTTTTTATAGTAGCCAGATCGACGTTTTTTTTTACCAGGAAAGGCCTATGTCCGCGGAAGTTGACGGCCAGAGTGTGTCTGTCCCTGAAACATCATTATGTTCTGTCAAACCCTGGAGTGAGGAGTGGCTGTCTCGTGTAGCAAATAACTCCTAGGGAGGCAGGTAGTGGGGTCTGTCTGCTATTAGGACATGCTTAGAATAGCGTATCTCTGTTATTCCTAACGCAATTAGGGGCCCGAATCATTTAACTAAGTTGTATCGAGTACTGAACGTGCATCATGTTGTCTTGTATGGTTAATCCGCTCGAACTGGCTTAAGGCACTAGCAGCTGACACCGCGAGTTCGAGATGGTGTGACCAGACCAGTAGAGCCGGATGGGTTCACGAGTGCTATAATGGATGACAAACGCTGACGCGGGCACTCAATCAGCTGCCTCGGTGAGTATGTTTCTCCCAGAAACCTAATCGACCCCCTACACCATCCGCACGTGTCTGGCCGCAAACAACGTTGTGGGAGCCTGGTACGGTCTCACCGCTAGTGTACCGCAGGCCATTCCGATTATGTGAGATATAACTGAGCGATGGTCAGTATGGCAGGAGCGGTCTGCGTGCCTGCAACTAACTGGACCATTTAGTGTTATCAGCACGCTCAGCGATAGCAGGCAGCAATGGTGCAACGTCGTCTCTGAGAGTAAGTAGGAACGCATTTAACCCAGAAATTTACCAGCGCTCACTGAGCCACAGATCACCTTTATAGTTAAGTATCCCGTAGCTCATAGACCGAAACGGAAAGTTCATCACCAGATGTTCCTCGGCTACGCCGGGAACTCCGCATTTGGACGAGGTCGATCGGCCCGTGCTGGTTGTGCTGGAAATCGTCCGGCGTACGAACTATCAAACCATCTTTGCCTGGGCAACTCGAACCGGGCAAACGATGTTAATTCGCCCAATGGTGCATAAATTCGAACCTTGTGTCCATTAGTGAGTGAACGTCTTCACAATCTCCTACACCGCAAAATTACCTCAGCGTAATGTGGAGTCGTGTTAACTACCCCACCTACCGCCACGCTTTAAGGTAAAAGTTGTGTTGGGCTATCGAGGCAGTTGTAGGTCGCTACCTCCAGTCCCTGAGCCTAGCCGTTATATGGAATCCCCACCATGAGAGACGGGAATAAGGTACTTGACTATTTATTGTATATAGTGCGGGAGTGCTTAGTTTCAATGGCGAGCGAAGGGACACCGCGATCTGGCTTCTTGGATGGTGACCCCCAACTTATCGATCACCTTCTCATAGAGTGTAGCAGCTCCGCGGGCCCTACATGAGTCGGACCCATGAAGGGTACAACCGCGCGGATG
CTCACTGAAATCGCTGGCAGGGTAACGTATCAGCACTCTTTAAATAGCATGAGAGGTTACAAA</XMP>

<DIV></DIV>

<P class=MsoNormal><SPAN style="mso-spacerun: yes"><STRONG></STRONG></SPAN> </P>

<DIV></DIV>

<P class=MsoNormal><SPAN style="mso-spacerun: yes"><STRONG>Base2Protein.dat file:</STRONG></SPAN></P><SPAN style="mso-spacerun: yes"><XMP>UUU Phe
UUC Phe
UUA Leu
UUG Leu
UCU Ser
UCC Ser
UCA Ser
UCG Ser
UAU Tyr
UAC Tyr
UAA STOP
UAG STOP
UGA STOP
UGU Cys
UGC Cys
UGG Trp
CUU Leu
CUC Leu
CUA Leu
CUG Leu
CCU Pro
CCC Pro
CCA Pro
CCG Pro
CAU His
CAC His
CAA Gln
CAG Gln
CGU Arg
CGC Arg
CGA Arg
CGG Arg
AUU Ile
AUC Ile
AUA Ile
AUG Met
ACU Thr
ACC Thr
ACA Thr
ACG Thr
AAU Asn
AAC Asn
AAA Lys
AAG Lys
AGU Ser
AGC Ser
AGA Arg
AGG Arg
GUU Val
GUC Val
GUA Val
GUG Val
GCU Ala
GCC Ala
GCA Ala
GCG Ala
GAU Asp
GAC Asp
GAA Glu
GAG Glu
GGU Gly
GGC Gly
GGA Gly
GGG Gly
</DIV>XXX EndOfData</XMP><XMP> </XMP><XMP><STRONG></STRONG> </XMP><XMP><STRONG>rnasequence.dat file:</STRONG></XMP><XMP>UACUGAUACAACCCGUGUGGGAUCCGUCUGGGGUGUCCAGCCGAAGUCGGGACGAUAGCACGUAUCCACCGCCCAAUUACGACACCGAUCUUGAGGGCUGACAGGAACGAUUAGCCGGGGCCACAGUUACGACAAGUGCGUCCUAUCAGUCUAGUUUUGACCUCUCCCUCUCACAGACUCUCUCAUAUCGGGGUUAUUCAGACCGACACCUUCUAUUUUCGUCUUGGGAGGUACACGUAAACUUUAGAACUAUGCUCGUAAGGUAUCCAAAAAGAGUCAAUCUGCACAAAGGUGUACAAAAAGGGGAAAUUAAAAAGGAUACCAGGUAGGCGUAAUGUUGUCUGAGAUUACCAUGCUGUUCAAGUAUACCUAGGGUAUGAUCACGAUCGCAAACACGUGCGUUUGCUCCAGCAAUUCCACAAUAACGGAAUCAUCCUGCACCCGAGAUAAUUGGAAUCUACUUGGAUUUCGGCGUUCAACUAAAUGGUGCUUGGCUGCCGUGUCCAACUGUGUUCAAAUGCGACGAGUAGGCUAGGUGGUACUGUGCUACAUUCGUGGGCUCGUUAGACCCCACCGGUAGAAGGGGGCGCUGCUAACUCGAUCAUGGGGGGGGGCAACUUCCCUUAAGCGUGAAUAUUCCUGGUCAUUAACUCUGUGUGAGUGAAAGUCCCACGACGGCGUGGAGUGACCCAAAGUUCUGUCUUCUGUUAGUGAGAUGUGCUACCCGCCAGAAGCUUUCUUACUUGCUGUUUUGAGUGAGUAGCGGCUUCACGUAGGCUCCGCUAUGCGAGAGUGGCGGAAGCCAUACCGGCUUUUAAAUCCGACGGGCCAGCCUGGUGCGAUCUGCAGUCCAUGUACGACGUGGUGAGGUGCGUUGGCCGAAUAGAUCCAGUUAGGCCUUACAGCUACUAGUCACAAUAUCUGGAUCUCGUUGUGUCAACUCAACUGCUCUAAUUCGUUUGGACAUCUGUAGCGCCGUUCGUCACCCAUAGUCAUAUUUACAGACCGACUAGUAAGAUUGUCUGCGUGGGAGAAACUCCCAAGCCGAUCAAAAAGCACAACGAAUACAUGUGAGUUAAUACCCACCAUGGUGUUUCUAAGUUGCAUUGAGACAAUAGCUGUGCAAUACUUAACGGAAUCGGCGUGUCUCGCGCCAGUCUAUCGUUGACUGCGUUACUUCUAUUGAGUAUGUAAGUACCAGCCAGUUCAACGGGCUGUGGCGACUACAUAGACUAUCGCUACUUACUACCGUUGUACCAUGAUGAGAGUCUCUGCCUGCUUCCAUACGGGCCAAACUCCCGGGCAACUAUAGACAGUCUAACACUCGACAAGGGCGCUGGCCUGCGAAAUGCUAGACGCUCAAUCCCAUUCCCUGGACGACAACUCGUGGAAGAUUAGCAUAAUUUCCAGAAUACGGGUUCAUCUUGAUCGAUUAAAUGUAGUUGGUGUAGUUCCCGAUGAAUUAUGCAUCCAGCGACCCAACGAGGGUAAGUUUACCAGAUCCAAUUUACCCAUCGUGGCCGGAUAAACCGUUGUACGCCCGGAUCCAAGGUGUGAACGGCUGUCUGUCCAUGAGGACACCGAGAAGCACAAUACCCCGGCAAGACUAUGCCGUGUUCUGAUGUGGCCAGCAAAACAAGCAAUAACAUAGGGCCUUGCCCCUGGUGUGGUUUAGCGAUCUUACUGGUGCUUGAUAGCAAGAACUGAAAGUCUAAGCUAAUGGCGCGUUCCGAAAACAACGUCUCUCAUUUCCGCCUUGCAGGUCAGCGGGCGGUGCGCGCUGUAUUUCUGUAGGCUGGGACCGUUAUCUAUCCUUUCACAAUAUCGAGUUAGGAGGUUCGUUGUCAAAAACCAGGCGAUCGACAAGGAGACGCUCUGUG
CUACUAGAAUAUUUAAAGCACGUCUGAUUCAAACGUCCUUUGCACCAAUAUAUUCGUAGACAGUCCCGAUAGUUACUUUGCCAUGCGACUACAGAGGGCCCGCUACGUCUCUUGGCACCC</XMP><XMP> </XMP></SPAN>

<DIV></DIV></div></html>