<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD><TITLE>Clustering EST sequences</TITLE>

<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">

<META content="MSHTML 6.00.2800.1141" name=GENERATOR>

<STYLE></STYLE>

</HEAD>

<BODY bgColor=#ffffff>

<DIV><FONT face=Arial size=2>I don't know of any conversion utilities, but you 
can certainly write a quick conversion in Perl. I'm not familiar with the 
specific layouts, but it sounds like you simply need to truncate each 
row of data properly. That shouldn't be a problem if your field separator is white space 
(or any other specific delimiter, for that matter). </FONT></DIV>
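<DIV><FONT face=Arial size=2>As a minimal sketch of that kind of conversion 
(written in Python rather than Perl), the snippet below keeps only the first 
white-space-delimited field of every line that starts with &gt; and passes all 
other lines through unchanged. It assumes that the sequence name is that first 
field. The names simplify_fasta_headers and convert_file and the sample record 
are hypothetical, made up for illustration only.</FONT></DIV>
<PRE>
```python
def simplify_fasta_headers(lines):
    """Yield FASTA/qual lines with each header cut down to its first token.

    Assumes the sequence name is the first whitespace-delimited field after
    the '>' marker; everything after it on the header line is dropped.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep only the name; an empty header stays a bare '>'.
            name = fields[0] if fields else ""
            yield ">" + name + "\n"
        else:
            # Sequence or quality rows pass through unchanged.
            yield line


def convert_file(src_path, dst_path):
    """Rewrite src_path into dst_path with simplified header lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(simplify_fasta_headers(src))


# Made-up sample record, only to show the effect on a header line.
sample = [">est001.ab1 662 0 662 ABI\n", "ACGTACGTTTGA\n"]
print("".join(simplify_fasta_headers(sample)), end="")
```
</PRE>
<DIV><FONT face=Arial size=2>Because only the lines starting with &gt; are 
touched, the same function should also work on the matching quality file, 
provided its header lines follow the same layout.</FONT></DIV>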

<DIV><FONT face=Arial size=2>If you don't get a better offer, send me a small 
data file of what you need converted, the field delimiter used, and an 
example of what it needs to be converted into. I should be able to write you a Perl 
routine and send it back to you. </FONT></DIV>

<DIV><FONT face=Arial size=2>Scott A. Halpine<BR>Ecologic Complex Systems, 

LLC<BR>4640 Forbes Blvd, Suite 200<BR>Lanham, MD 20706-4885<BR>Phone: 

301-918-3283<BR>Fax: 301-429-8762</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<BLOCKQUOTE 

style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">

  <DIV style="FONT: 10pt arial">----- Original Message ----- </DIV>

  <DIV 

  style="BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: black"><B>From:</B> 

  <A title=A.Bossers@id.dlo.nl href="mailto:A.Bossers@id.dlo.nl">Bossers, A.</A> 

  </DIV>

  <DIV style="FONT: 10pt arial"><B>To:</B> <A 

  title=bio_bulletin_board@bioinformatics.org 

  href="mailto:bio_bulletin_board@bioinformatics.org">bio_bulletin_board@bioinformatics.org</A> 

  </DIV>

  <DIV style="FONT: 10pt arial"><B>Cc:</B> <A 

  title=biodevelopers@bioinformatics.org 

  href="mailto:biodevelopers@bioinformatics.org">biodevelopers@bioinformatics.org</A> 

  </DIV>

  <DIV style="FONT: 10pt arial"><B>Sent:</B> Tuesday, April 01, 2003 6:29 

  AM</DIV>

  <DIV style="FONT: 10pt arial"><B>Subject:</B> [BiO BB] Clustering EST 

  sequences</DIV>

  <DIV><BR></DIV>

  <P><FONT face="News Gothic" size=2>Dear All,</FONT> </P>

  <P><FONT face="News Gothic" size=2>I have a very basic problem, and I 
  wonder how others have solved it.</FONT> </P>

  <P><FONT face="News Gothic" size=2>I want to build a unigene collection from a 
  large EST database. We have chromat files in ABI format, and I use Linux on the 
  Intel platform.</FONT></P>

  <P><FONT face="News Gothic" size=2>I have phred and phrap running, but since 
  phrap was originally designed for genomic sequences, we get lots of 
  misassemblies on poly-A or poly-T stretches.</FONT></P>

  <P><FONT face="News Gothic" size=2>Therefore I installed the TIGR TGICL 
  package, which is designed for EST databases and also runs very well on 
  multi-node machines.</FONT></P>

  <P><FONT face="News Gothic" size=2>However, it takes multi-FASTA files (and, 
  optionally, the corresponding quality files) as input.</FONT> <BR><FONT 
  face="News Gothic" size=2>I wanted to use the phred package to generate the 
  required fasta and qual files. This runs fine, but the fasta file has 
  additional info, separated by spaces, in the &gt;name line. TGICL does not 
  accept these files.</FONT></P>

  <P><FONT face="News Gothic" size=2>Is there an easy Unix (Linux) utility to 
  convert these multi-FASTA files and quality FASTA files into simple &gt;name 
  {CRT} seq files so they can be used as input for TGICL? Or is a conversion 
  utility available to convert/extract phred's phd files into fasta-seq and 
  fasta-qual?</FONT></P>

  <P><FONT face="News Gothic" size=2>Any help would be appreciated,</FONT> </P>

  <P>        <FONT face="News Gothic" 

  size=2>Alex</FONT> </P><BR></BLOCKQUOTE></BODY></HTML>