[BiO BB] Poly A tail length - script help please

Joseph Landman landman at scalableinformatics.com
Tue Sep 9 19:57:34 EDT 2003


First one is free ... 

        #!/usr/bin/perl
        
        use strict;
        
        my ($directory,$directory_handle,$file,@files,$sequence);
        my ($file_handle,$poly_a_tail,$rseq);
        
        $directory = "./";	# directory to open
        if (!(opendir $directory_handle,$directory))
           {
             die "FATAL ERROR: Unable to open directory = ".$directory."\n";
           }
           
        # select only the .seq files, then release the directory handle
        @files = grep { /\.seq$/ } readdir($directory_handle);
        closedir($directory_handle);
        
        # loop over these selected files
        foreach $file (@files)
          {    
            # try to open the file (readdir returns bare names, so prepend the directory)
            if (!(open($file_handle,"<",$directory.$file)))
               {
                 # if we cannot open it, warn the user, and skip to the next file
                 warn "Warning: unable to open file = ".$file.". Skipping.\n";
                 next;
               }
              else
               {
                 # assume one line per file, or we will have to modify this
                 chomp($sequence=<$file_handle>);
                 # now time to bring out the heavy artillery
                 $rseq=reverse $sequence;       # poly-a is now at the head
                 # match A's and/or N's at the front; no match means no tail
                 $poly_a_tail = ($rseq =~ /^([AN]+)/) ? $1 : "";
                 printf "%i %s\n",length($poly_a_tail),$file;   # tell the world ...
                 close($file_handle);
               }
          }
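
The tail measurement can also be pulled out into a small subroutine, which
makes it easier to check on a few known strings before running it over 500
files. What follows is only a sketch under some assumptions: the names
poly_a_tail_length and longest_an_run are made up for illustration, lower-case
bases are treated the same as upper-case ones, and whitespace inside the
sequence is ignored. The first routine measures the run of A/N at the very end
of the sequence (what the script above reports). The second takes the request
literally ("the longest stretch of A or N") and looks anywhere in the sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the run of A/N characters at the end of
# the sequence. Returns 0 when the sequence does not end in A or N.
sub poly_a_tail_length {
    my ($sequence) = @_;
    $sequence =~ s/\s+//g;      # drop newlines and stray whitespace
    return $sequence =~ /([AN]+)$/i ? length($1) : 0;
}

# Hypothetical helper: length of the longest run of A/N anywhere in the
# sequence, not only at the end.
sub longest_an_run {
    my ($sequence) = @_;
    my $longest = 0;
    while ($sequence =~ /([AN]+)/gi) {
        $longest = length($1) if length($1) > $longest;
    }
    return $longest;
}

print poly_a_tail_length("CCGTAAAN"), "\n";    # prints 4
print longest_an_run("AAAACGTAA"), "\n";       # prints 4
```

To collect the results in a file rather than on the screen, redirecting the
script's standard output from the shell should be enough.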



On Tue, 2003-09-09 at 17:00, Tristan Fiedler wrote:
> Thanks for the scripting tips!  I have a 'counting' issue which I need to
> quickly resolve.  A typical sequence input file (5 - 700 bases) looks like
> :
> 
> AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA
> 
> I have over 500 files, named *.seq.  I would like to create a script which :
> 
> a.  runs through all the files,
> b.  counts the length of the 'poly A' tail (defined as the longest stretch
> of A or N)
> c. sends the output to a file, eg.
> 
> 25 1.seq
> 87 2.seq
> 13 3.seq
> 
> Example valid poly A tails :
> 
> AAAANANANANAAANNAAAAAA
> 
> AAAAAAAAAAAAAA
> 
> NNNNNNNNNNNNN
> 
> AAANNNNNNNNNNNAAAAAAAAA
> 
> Thank you so much for your expertise!
> 
> Tristan
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615




